In 2015, one of the world’s most prestigious photojournalism events, the World Press Photo Contest, was shrouded in controversy following the disqualification of 22 entrants, including an overall prize winner, for manipulating their photo entries. News of the disqualifications led to a heated public debate about the role of photo manipulation in photojournalism. World Press Photo responded by issuing a new code of ethics for the forthcoming contest stipulating that entrants “must ensure their pictures provide an accurate and fair representation of the scene they witnessed so the audience is not misled” (World Press Photo). They also introduced new safeguards for detecting manipulated images, including a computerized photo-verification test for entries reaching the penultimate round of the competition. The need for such a verification process highlights the difficulties competition organizers face in trying to authenticate images. If photography experts cannot spot manipulated images, what hope is there for amateur photographers or other consumers of photographic images? This is the question we aimed to answer: to what extent can laypeople distinguish authentic photos from fakes?
Digital imaging and manipulation technology has advanced rapidly in recent decades, and people are taking more photos than ever before. Estimates suggested that one trillion photos would be taken in 2015 alone (Worthington, 2014), and that, on average, more than 350 million photos per day are uploaded to Facebook, which amounts to over 14 million photos per hour, or roughly 4,000 per second (Smith, 2013). As photos have grown in popularity, so too has the frequency with which they are manipulated. Although it is difficult to estimate the prevalence of photo manipulation, a recent global survey of photojournalists found that 76% regard photo manipulation as a serious problem, 51% claim to always or often enhance in-camera or RAW (i.e., unprocessed) files, and 25% admit that they, at least sometimes, alter the content of photos (Hadland, Campbell, & Lambert, 2015). Together, these findings suggest that we are regularly exposed to a mix of real and fake images.
The prevalence and popularity of manipulated images raises two important questions. First, to what extent do manipulated images alter our thinking about the past? We know that images can have a powerful influence on our memories, beliefs, and behavior (e.g., Newman, Garry, Bernstein, Kantner, & Lindsay, 2012; Wade, Garry, Read, & Lindsay, 2002; Wade, Green, & Nash, 2010). Merely viewing a doctored photo and attempting to recall the event it depicts can lead people to remember wholly false experiences, such as taking a childhood hot-air balloon ride or meeting the Warner Brothers character Bugs Bunny at Disneyland (Braun, Ellis, & Loftus, 2002; Sacchi, Agnoli, & Loftus, 2007; Strange, Sutherland, & Garry, 2006). Thus, if people cannot differentiate between real and fake details in photos, manipulations could frequently alter what we believe and remember.
Second, to what extent should photos be admissible as evidence in court? Laws governing the use of photographic evidence in legal cases, such as the Federal Rules of Evidence (1975), have not kept pace with digital change (Parry, 2009). Photos were once difficult to manipulate; the process was complex, laborious, and required expertise. Yet in the digital age, even amateurs can use sophisticated image-editing software to create detailed and compelling fake images. The Federal Rules of Evidence state that the content of a photo can be proven if a witness confirms it is fair and accurate; put another way, neither the person who took the photo, nor anyone who subsequently handled it, nor anyone present when it was taken is required to testify about its authenticity. If people cannot distinguish between original and fake photos, then litigants might use manipulated images to intentionally deceive the court, or even testify about images unaware that they have been changed.
Unfortunately, there is no simple way to prevent people from being fooled by manipulated photos in everyday life or in the criminal arena (Parry, 2009). But the newly emerging field of image forensics is making it possible to better protect against photo fraud (e.g., Farid, 2006). Image forensics uses digital technology to determine image authenticity, and is based on the premise that digital manipulation alters the values of the pixels that make up an image. Put simply, manipulating a photo leaves behind a trace, even if that trace is subtle and invisible to the naked eye (Farid, 2009). Because different types of manipulation (for instance, cloning, retouching, and splicing) affect the underlying pixels in unique and systematic ways, image forensics experts can develop computational methods to reveal image forgeries. Such technological developments are being implemented in several domains, including law, photojournalism, and scientific publishing (Oosterhoff, 2015). The vast majority of image authenticity judgments, however, are still made by eye, and to our knowledge only one published study has explored the extent to which people can detect inconsistencies in images.
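To make this premise concrete, consider error level analysis, one simple forensic idea: an image is recompressed and the per-pixel differences are inspected, because regions edited after the original JPEG compression often recompress differently from the rest of the image. The sketch below is a minimal illustration of that idea, not a reconstruction of any method from the studies cited above; it assumes the Pillow and NumPy libraries, and the input file name is hypothetical.

```python
# A minimal error-level-analysis sketch: manipulation tends to alter the
# pixel values of an image, and recompression can expose the traces.
# Assumes Pillow and NumPy; "photo.jpg" is a hypothetical input file.
import io

import numpy as np
from PIL import Image, ImageChops

def error_levels(path, quality=90):
    """Recompress an image as JPEG and return the per-pixel difference.

    Regions pasted in or edited after the original compression often
    recompress differently, so they can stand out in the difference map.
    """
    original = Image.open(path).convert("RGB")
    buffer = io.BytesIO()
    original.save(buffer, "JPEG", quality=quality)
    buffer.seek(0)
    recompressed = Image.open(buffer).convert("RGB")
    diff = ImageChops.difference(original, recompressed)
    return np.asarray(diff, dtype=np.float32)

ela = error_levels("photo.jpg")
print("mean error level per channel:", ela.mean(axis=(0, 1)))
```

Even a sketch this simple illustrates why many manipulations are, in principle, detectable by machine; the open question is how well the unaided eye performs.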
Farid and Bravo (2010) investigated how well people can make use of three cues that are often indicative of photo tampering: shadows, reflections, and perspective distortion. The researchers created a series of computer-generated scenes consisting of basic geometrical shapes; some scenes, for instance, were consistent with a single light source, whereas others were not. When the inconsistencies were obvious, that is, when shadows ran in opposite directions, observers identified tampering with nearly 100% accuracy. Yet when the inconsistencies were subtle, for instance, when the shadows resulted from two different light positions on the same side of the scene, observers performed only slightly better than chance. These preliminary findings, based on computer-generated scenes of geometric objects, suggest that the human visual system is poor at identifying inconsistencies in such images.
In the current study we examined whether people are similarly poor at detecting inconsistencies within images of real-world scenes. On the one hand, we might expect people to perform even worse when trying to detect manipulations in real-world photos. Research shows that real-world photos typically contain many multi-element objects that can obscure distortions (Bex, 2010; Hulleman & Olivers, 2015). For example, people with the visual impairment metamorphopsia often do not notice any problems with their vision in their everyday experiences, yet the impairment is quite apparent when they view simple stimuli, such as a grid of evenly spaced horizontal and vertical lines (Amsler, 1953; Bouwens & Meurs, 2003). We also know that people find it more difficult to detect certain types of distortion, such as changes to image contrast, in complex real-world scenes than in simpler stimuli (Bex, 2010; Bex, Solomon, & Dakin, 2009). In sum, if people find it particularly difficult to detect manipulations in complex real-world scenes, then we might expect our subjects to perform worse than Farid and Bravo’s (2010) subjects.
On the other hand, there is good reason to predict that people might do well at detecting manipulations in real-world scenes. Visual cognition research suggests that people might detect image manipulations using their knowledge of the typical appearance of real-world scenes. Real-world scenes share common properties, such as the way the luminance values of the pixels are organized and structured (Barlow, 1961; Gardner-Medwin & Barlow, 2001; Olshausen & Field, 2000). Over time, the human visual system has become attuned to these statistical regularities and has expectations about how scenes should look. When an image is manipulated, the structure of its properties changes, which can create a mismatch between what people see and what they expect to see (Craik, 1943; Friston, 2005; Rao & Ballard, 1999; Tolman, 1948). Thus, based on this real-world scene statistics account, we might predict that people can use this “mismatch” as a cue to detect a manipulation. If so, our subjects should perform better than chance at detecting manipulations in real-world scenes.
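One concrete example of such a statistical regularity is that the amplitude spectrum of a typical natural photograph falls off roughly in inverse proportion to spatial frequency. The sketch below, which assumes only NumPy and substitutes a random array for a real photograph, estimates that spectral slope; it is meant merely to show that “how scenes should look” can be quantified, not to suggest that this particular statistic drives observers’ judgments.

```python
# A minimal sketch of one measurable scene statistic: the slope of the
# radially averaged amplitude spectrum, which is near -1 for typical
# natural images. Assumes NumPy; the random array is a stand-in input.
import numpy as np

def amplitude_spectrum_slope(img):
    """Fit the log-log slope of the radially averaged amplitude spectrum."""
    f = np.fft.fftshift(np.fft.fft2(img - img.mean()))
    amp = np.abs(f)
    h, w = img.shape
    y, x = np.indices((h, w))
    r = np.hypot(y - h // 2, x - w // 2).astype(int)
    nbins = min(h, w) // 2
    # Average the amplitude within each integer frequency bin 1..nbins-1.
    totals = np.bincount(r.ravel(), weights=amp.ravel())[1:nbins]
    counts = np.bincount(r.ravel())[1:nbins]
    radial = totals / counts
    freqs = np.arange(1, nbins)
    slope, _ = np.polyfit(np.log(freqs), np.log(radial), 1)
    return slope

# White noise has a flat spectrum (slope near 0); a natural photo would
# typically yield a slope near -1, and manipulation can perturb this.
print("spectral slope:", amplitude_spectrum_slope(np.random.rand(256, 256)))
```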
Although there is a lack of research directly investigating the applied question of people’s ability to detect photo forgeries, people’s ability to detect change in a scene is well studied in visual cognition. Notably, change blindness is the striking finding that, in some situations, people are surprisingly slow, or entirely unable, to detect changes made to, or find differences between, two scenes (e.g., Pashler, 1988; Simons, 1996; Simons & Levin, 1997). In some of the early studies, researchers used very simple stimuli to demonstrate observers’ inability to detect changes made to a scene during an eye movement (saccade) (e.g., Wallach & Lewis, 1966); later work extended this finding to complex real-world scenes (e.g., Grimes, 1996). Researchers have also shown that change blindness occurs even when the eyes are fixated on the scene: The flicker paradigm, for instance, simulates the effects of a saccade or eye blink by inserting a blank screen between sequential, repeated presentations of an original and a changed image (Rensink, O’Regan, & Clark, 1997). It often takes a large number of alternations between the two images before the change is identified. Furthermore, change blindness persists when the original and changed images are shown side by side (Scott-Brown, Baker, & Orbach, 2000), when the change is masked by a camera cut in motion pictures (Levin & Simons, 1997), and even when the change occurs in real-world situations (Simons & Levin, 1998).
Such striking failures of perception suggest that people do not automatically form a complete and detailed visual representation of a scene in memory. Therefore, to detect change, it might be necessary to direct effortful, focused attention to the changed aspect (Simons & Levin, 1998). So which aspects of a scene are most likely to gain focused attention? One suggestion is that attention is guided by salience: the more salient aspects of a scene attract attention and are represented more precisely than less salient aspects. In support of this idea, research has shown that changes to more important objects are more readily detected than changes to less important objects (Rensink et al., 1997). Other findings, however, indicate that observers sometimes miss even large changes to central aspects of a scene (Simons & Levin, 1998). Therefore, the question of what determines scene saliency continues to be explored. Specifically, researchers disagree about whether attentional allocation is influenced more by the low-level visual salience of objects in a scene, such as brightness (e.g., Lansdale, Underwood, & Davies, 2010; Pringle, Irwin, Kramer, & Atchley, 2001; Spotorno & Faure, 2011), or by the high-level semantic meaning of the scene (Stirk & Underwood, 2007).
What other factors affect people’s susceptibility to change blindness? One robust finding in the signal detection literature is that the ability to make accurate perceptual decisions is related to the strength of the signal and the amount of noise (Green & Swets, 1966). Signal detection theory has been applied to change detection. In one study, observers judged whether two sequentially presented arrays of colored dots remained identical or if there was a change (Wilken & Ma, 2004). Crucially, the researchers manipulated the strength of the signal in the change trials by varying the number of colored dots in the display that changed, while noise (total set size) remained constant. Performance improved as a function of the number of dots in the display that changed color—put simply, greater signal resulted in greater change detection.
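For readers unfamiliar with the framework, sensitivity in signal detection theory is commonly summarized by d′, the standardized distance between the noise and signal-plus-noise distributions (Green & Swets, 1966). The short sketch below shows the standard computation using only the Python standard library; the example hit and false alarm rates are invented for illustration.

```python
# Standard signal detection sensitivity measure: d' = z(H) - z(F).
# Uses only the Python standard library; the example rates are made up.
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Distance between signal and noise distributions in SD units.

    Rates of exactly 0 or 1 would need a correction before use here,
    because the inverse normal CDF is undefined at those extremes.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

# An observer who flags 70% of manipulated photos as manipulated, but
# also flags 30% of originals, shows only modest sensitivity:
print(round(d_prime(0.70, 0.30), 2))  # 1.05
```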
Given the lack of research investigating people’s ability to detect photo forgeries, change blindness offers a highly relevant area of research. A key difference between the change blindness research and our current experiments, however, is that our change detection task does not involve a comparison of two images; therefore, representing the scene in memory is not a factor in our research. That is, subjects do not compare the original and manipulated versions of an image. Instead, they make their judgment based on viewing only a single image. This image is either the original, unaltered image or an image that has been manipulated in some way.
In the current study, we explored people’s ability to identify common types of image manipulation that are frequently applied to real-world photos. We distinguished between physically implausible and physically plausible manipulations. For example, a physically implausible image might depict an outdoor scene lit only by the sun in which a person’s shadow runs one way and a car’s shadow runs the other way; such shadows imply the impossible: two suns. In contrast, retouching an unfamiliar face in an image produces a quite plausible result; eliminating spots and wrinkles or whitening teeth does not contradict the physical constraints that govern how faces ought to look. In our study, geometrical and shadow manipulations made up our implausible manipulation category, while airbrushing and addition or subtraction manipulations made up our plausible manipulation category. Our fifth manipulation type, super-additive, presented all four manipulation types in a single image and thus included both categories of manipulation.
We had a number of predictions about people’s ability to detect and locate manipulations in real-world photos. We expected the type of manipulation, implausible versus plausible, to affect performance: people should correctly identify more of the physically implausible manipulations than the physically plausible ones, because implausible manipulations leave evidence within the photo itself that observers can, in principle, use. We also expected people to be better at correctly detecting and locating manipulations that caused more change to the underlying pixels of the photo than manipulations that caused less change.