Binocular vision provides an important source of depth information that contributes to the performance of many everyday tasks (Bradshaw et al., 2004; Hayhoe, Gillam, Chajka, & Vecellio, 2009; Hibbard & Bradshaw, 2003; Keefe, Hibbard, & Watt, 2011; Loftus, Servos, Goodale, Mendarozqueta, & Mon-Williams, 2004; McIntire, Havig, & Geiselman, 2014; Melmoth & Grant, 2006; Patla, Niechwiej, Racco, & Goodale, 2002; Read, Begum, McDonald, & Trowbridge, 2013; Servos, Goodale, & Jakobson, 1992; Watt & Bradshaw, 2003). Binocular cues are valuable because they are very precise (e.g. Harris, 2004; McKee, 1983; Stevenson, Cormack, & Schor, 1989) and also because, unlike many other depth cues, they provide scaled metric depth information. This means that they allow the actual shape, size, and location of objects to be estimated.
The improvement in depth perception provided by stereoscopic cues is important in enhancing the user experience of stereoscopic three-dimensional (S3D) displays. In a review of the performance gains provided by S3D compared with non-stereoscopic displays, McIntire, Havig, and Geiselman (2012) concluded that stereoscopic displays lead to a clear improvement in performance on tasks requiring spatial understanding or the manipulation of objects, and are somewhat useful for tasks requiring judgment of the position or distance of objects and for finding, identifying, or classifying objects. In medical applications, stereoscopic displays have been shown to be particularly useful in improving visualization and diagnosis in medical imaging, and spatial orientation and performance in minimally invasive surgery (Held & Hui, 2011). In each of these cases, the improvements in performance are consistent with the enhanced information about 3D structure provided by stereoscopic cues.
Stereoscopic displays enhance the perceptual experience of depth, as well as improving performance on tasks requiring the use of depth information (McIntire et al., 2012). Ideally, an optimal display would create a 3D experience that was indistinguishable from a direct view of the real world (Banks, Hoffman, Kim, & Wetzstein, 2016). The creation of a realistic 3D experience is particularly important in creating a sense of presence in movies, games, and virtual reality (Freeman & Avons, 2000). The aim of the current study is to determine which aspects of the representation of stereoscopic depth are important in creating a convincing and realistic 3D experience.
Banks, Hoffman, Kim, and Wetzstein (2016) recently reviewed the hardware factors that affect the degree of realism experienced in 3D displays. Stimulus parameters that influence the realism of the 3D experience have also been assessed. These include image quality (Lambooij, IJsselsteijn, Bouwhuis, & Heynderickx, 2011; Seuntiens, Meesters, & IJsselsteijn, 2006) and the separation between the left and right cameras and the focal length used in capturing the images, which together determine the range of binocular disparities and how these relate to other depth cues (IJsselsteijn, de Ridder, & Hamberg, 1998). When complex images such as natural photographs are used as stimuli, the manipulation of binocular disparity tends to create a conflict between the depth specified by binocular disparity and that specified by pictorial cues. This is because, while the depth specified by disparity increases when the camera separation is increased, the depth specified by pictorial cues remains unchanged. This conflict could reduce the realism of the 3D experience, but it can be lessened by using stimuli in which the informativeness of non-stereoscopic depth cues is minimized. In the current study, we examined the effect of manipulating the disparity content of images on apparent depth realism, using simple stimuli that allowed us to easily quantify the effect of binocular cues. We used this approach to focus on those aspects of the representation of stereoscopic depth that can be predicted, on theoretical grounds, to contribute to the realism of the experience.
The first factor under consideration is the magnitude of apparent depth. The relative disparity between two objects increases monotonically as their physical separation in the depth direction increases. If depth were perceived accurately, then perceived depth should increase in just the same way with the disparity present in the image. For relatively small disparities, this is what is observed (Ogle, 1952). While depth is still seen for larger disparities, there is no further increase in the amount of depth seen with the size of the disparity. Depth perception in these two ranges of disparity is referred to as “patent” and “qualitative” stereopsis (Ogle, 1952), respectively. All other things being equal, greater stereoscopic depth might be expected to enhance the 3D experience, by increasing the difference between the stereoscopic and non-stereoscopic experience.
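The monotonic relationship between depth separation and relative disparity can be illustrated numerically. The following is a minimal sketch rather than part of the study: it assumes two points on the midline viewed with symmetric convergence, an interocular distance of 65 mm, and a viewing distance of 0.5 m. The function name and all numerical values are hypothetical choices made for illustration.

```python
import math

def relative_disparity_arcmin(depth_sep_m, view_dist_m, iod_m=0.065):
    """Relative disparity (arc min) between a midline point at view_dist_m
    and a second midline point depth_sep_m farther away.

    iod_m is an assumed interocular distance (65 mm), not a value
    taken from the study.
    """
    # Vergence angle subtended at the two eyes by each point.
    near = 2.0 * math.atan(iod_m / (2.0 * view_dist_m))
    far = 2.0 * math.atan(iod_m / (2.0 * (view_dist_m + depth_sep_m)))
    # Relative disparity is the difference in vergence angles,
    # converted from radians to arc minutes.
    return math.degrees(near - far) * 60.0

# Depth separations (m) behind a point at an assumed 0.5 m viewing distance.
separations = (0.0, 0.01, 0.02, 0.05, 0.10)
disparities = [relative_disparity_arcmin(d, 0.5) for d in separations]
for sep, disp in zip(separations, disparities):
    print(f"{sep:.2f} m -> {disp:.1f} arc min")
```

Under these assumptions, disparity is zero when the two points coincide and grows steadily with their separation in depth, which is the relationship that accurate depth perception would need to track.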
A second factor that is enhanced under stereoscopic viewing, and which might also be expected to create a more convincing 3D experience, is the precision with which depth is represented. This precision also depends on the size of the disparity present in the stimulus. Ogle (1952) found that the standard deviation of errors, for a task in which the depth of a briefly presented target was aligned with that of a reference, increased exponentially with the disparity of the two stimuli relative to fixation. Blakemore (1970) measured relative disparity discrimination thresholds for distinguishing between the depth of a target and reference, for different values of reference disparity. He found that these thresholds increased exponentially up to 90 arc min, the largest value tested. Similar decreases in relative depth sensitivity with pedestal disparity have been reported elsewhere (Badcock & Schor, 1985).
Large depth separations will create disparities beyond the range of qualitative stereopsis (Ogle, 1952). In this range, observers are not able to make even simple judgments as to which of two targets is closer, if fixation is held steady. If, however, observers are free to move their eyes, then accurate relative depth judgments can be made, relying partly on information about the change in convergence (Backus & Matza-Brown, 2003; Brenner & van Damme, 1998). The precision of relative depth judgments based on changes in convergence is not greatly affected by the magnitude of the disparity difference between the target objects (Brenner & van Damme, 1998).
It has previously been proposed that the realism of the 3D experience might relate to the precision with which depth is represented (Hibbard, 2008; Vishwanath, 2005). More specifically, it has been proposed that the quality of perceived depth is associated with the precision with which scaled metric depth is represented (Vishwanath, 2011, 2014).
A final factor that might be expected to affect the realism of the 3D experience is the extent to which the two images can be fused into a single percept. When the disparity in a stimulus is large, we are unable to completely fuse the two images together and as a result see some elements as double, an experience known as diplopia (Hampton & Kertesz, 1983; Qin, Takamatsu, & Nakashima, 2006). The range of single vision beyond which diplopia occurs is smaller than the range of patent stereopsis identified by Ogle (1952). This means that there is a range of disparities which are outside of Panum’s fusional limit and for which stimuli appear diplopic, but for which perceived depth nevertheless scales with disparity. There is also no direct link between the point at which fusion is lost and the precision of depth judgments. Rather, precision decreases at the same rate with increasing disparity both within and beyond Panum’s fusional limit (Wilcox & Allison, 2009). In complex stimuli, the limit of single vision is affected by many factors, including the size of the image (Hampton & Kertesz, 1983; Qin, Takamatsu, & Nakashima, 2006), how many elements it contains (Braddick, 1979; Burt & Julesz, 1980; Tyler, 1973), how long it is presented (Woo, 1974), and whether the observer is free to move their eyes (Mitchell, 1966). While a reduction in fusion might be expected to reduce realism for large disparities, it is unlikely to be able to account for variations in realism across the full range of disparities, particularly when the depth range is small. In the extreme, stimuli with zero or small disparities will have the greatest degree of fusion, but a low rating for creating a realistic depth experience.
Our goal was to test how the realism of the 3D experience from a stereoscopic display is associated with the magnitude and precision of perceived depth and with binocular fusion. Establishing these links is important in understanding the representational correlates of perceptual experience (Allen, 2013; Hibbard, 2008; Peacocke, 1987; Tye, 2002; Vishwanath & Hibbard, 2013; Vishwanath, 2014) and providing an optimal perceptual experience in stereoscopic applications (Häkkinen et al., 2008; IJsselsteijn et al., 1998; IJsselsteijn, de Ridder, & Vliegen, 2000; Lambooij et al., 2011; Seuntiens et al., 2006). To do this, we varied the disparity presented in stereoscopic images and measured how this affects the magnitude and precision of perceived depth, depth realism, and binocular fusion. To isolate the influence of binocular cues and reduce the effects of conflicts with pictorial cues, we used simple random element stereograms as our stimuli. We had a number of predictions about how these manipulations might affect the realism of depth.
We expected that there would be a range of disparities in which patent stereopsis would be apparent, such that apparent depth would increase with disparity. Beyond this range there should be no further increase in perceived depth. If 3D realism is determined by the magnitude of perceived depth, it should increase with disparity, within this range of patent stereopsis.
The sensitivity of observers to differences in depth was expected to decrease with increasing disparity. If 3D realism is determined by the precision of perceived depth, it should decrease with increasing disparity.
We also expected that stimuli would be fused into a coherent percept for disparities within Panum’s fusional limit, but that diplopia would occur for larger disparities. Binocular fusion might be expected to be important in establishing a convincing 3D experience with a sense of solid, 3D objects. We would then expect a convincing 3D experience when stimuli are fused and for this to diminish for large disparities, for which fusion does not occur. Equally, when disparities are small, and the stimuli do not differ substantially from those experienced with a non-stereoscopic display, we would not expect observers to report a strong 3D experience.
These predicted effects of binocular fusion and precision on depth realism are in the opposite direction from the predicted effects of depth magnitude. The predictions from fusion and depth sensitivity, while in the same direction, can also be distinguished. The prediction from fusion is that there will be a range of disparities over which fusion is maintained and for which a convincing 3D experience will be produced. Beyond this range, both fusion and realism will be lost. In contrast, if realism depends on the precision with which depth is represented, we expect depth sensitivity to decrease continuously both within and beyond the fusional limit, so would predict a continuous variation in the degree of realism. In particular, we would expect variation in the degree of realism even with small disparities, where there is no variation in the degree to which fusion is reported.
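The qualitative shapes of these three predictions can be made concrete with a toy sketch. This is an illustration only, not a model fitted to or proposed by the study: the function names, the functional forms (a clipped linear rise, an exponential decay, and a step), and every constant are hypothetical placeholders chosen so that the assumed fusional limit lies below the assumed limit of patent stereopsis.

```python
import math

# All constants are hypothetical placeholders chosen only to make the
# qualitative shapes visible; none are taken from the study.
PATENT_LIMIT = 40.0  # arc min, assumed upper bound of patent stereopsis
FUSION_LIMIT = 20.0  # arc min, assumed fusional limit
DECAY = 30.0         # arc min, assumed scale of the loss of precision

def realism_from_magnitude(disparity):
    """Rises with disparity, then plateaus beyond patent stereopsis."""
    return min(disparity, PATENT_LIMIT) / PATENT_LIMIT

def realism_from_precision(disparity):
    """Falls continuously as depth sensitivity decreases."""
    return math.exp(-disparity / DECAY)

def realism_from_fusion(disparity):
    """Constant while stimuli are fused, lost beyond the fusional limit."""
    return 1.0 if disparity <= FUSION_LIMIT else 0.0

# Print the three predicted realism values for a few disparities (arc min).
for d in (5, 10, 20, 40, 80):
    print(d,
          round(realism_from_magnitude(d), 2),
          round(realism_from_precision(d), 2),
          round(realism_from_fusion(d), 2))
```

In this sketch, the precision and fusion accounts differ at small disparities: the precision-based value already varies between 5 and 10 arc min, whereas the fusion-based value stays constant until the assumed fusional limit is exceeded.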
These experiments, by assessing the influence of disparity parameters on the realism of perceived depth, are related to previous studies (IJsselsteijn et al., 1998; Lambooij et al., 2011; Seuntiens et al., 2006). However, unlike these previous studies, which used natural photographs, we used random circle stereograms to minimize the contribution of pictorial depth cues. This allowed us to separate the effects of disparity manipulations from the effects of conflicts between pictorial and binocular depth cues. Our stimulus manipulations are also linked to classic work which has established the influence of the range of disparities on the way that binocular images are fused together and depth is represented (Burt & Julesz, 1980; Ogle, 1952). In contrast to these studies, we focused on the effects of these manipulations on the subjective experience of the realism of 3D perception.