Seeing and Perceiving 25 (2012) 561–576 brill.com/sp

Contribution of Disparity to the Perception of 3D Shape as Revealed by Bistability of Stereoscopic Necker Cubes

C. J. Erkelens ∗

Helmholtz Institute, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands

Received 24 April 2012; accepted 2 August 2012

Abstract

The Necker cube is a famous demonstration of ambiguity in visual perception of 3D shape. Its bistability is attributed to indecisiveness because monocular cues do not allow the observer to infer one particular 3D shape from the 2D image. A remarkable but not appreciated observation is that Necker cubes are bistable during binocular viewing. One would expect disparity information to veto bistability. To investigate the effect of zero and non-zero disparity on perceptual bistability in detail, perceptual dominance durations were measured for luminance- and disparity-defined Necker cubes. Luminance-defined Necker cubes were bistable for all tested disparities between the front and back faces of the cubes. Absence of an effect of disparity on dominance durations suggested the suppression of disparity information. Judgments of depth between the front and back sides of the Necker cubes, however, showed that disparity affected perceived depth. Disparity-defined Necker cubes were also bistable but dominance durations showed different distributions. I propose a framework for 3D shape perception in which 3D shape is inferred from pictorial cues acting on luminance- and disparity-defined 2D shapes.

© Koninklijke Brill NV, Leiden, 2012

Keywords

Perceptual bistability, disparity, binocular vision

1. Introduction

Visual perception usually gives us a single impression of the visual environment although most information enters our visual system twice, namely, via both eyes. This means that somewhere in the brain the signals from the two eyes must either interact or compete with each other. The invention of random-dot stereograms (Julesz, 1960) has generated undisputed evidence for the existence of binocular interaction: in these stereograms we experience three-dimensional shapes that are not visible in either of their images. Competition is demonstrated by binocular rivalry, which is experienced when the two eyes view very different shapes (Levelt, 1965). The question of how and where binocular interaction and competition occur in the brain is currently under discussion (Blake and Wilson, 2011; Buckthought and Mendola, 2011; Carlson and He, 2000; Knapen et al., 2011; Logothetis, 1998; Logothetis et al., 1996; Orban et al., 2006; Parker, 2007; Polonsky et al., 2000; Tong and Engel, 2001; Tong et al., 1998).

* E-mail: [email protected]

© Koninklijke Brill NV, Leiden, 2012 DOI:10.1163/18784763-00002396

Neurophysiological studies (Parker, 2007) have provided ample evidence that binocular interaction creates disparity signals that, in the absence of other sources of information (cues), are sufficient to induce an unambiguous 3D percept. During the binocular viewing of natural scenes, the retinal images contain cues such as motion, texture, perspective and shade that, in addition to disparity, contribute to the perceived 3D layout of visual scenes. Several studies have demonstrated that human observers make judgments about size, depth and orientation of objects by integrating cues in an almost optimal fashion (Hillis et al., 2004; Jacobs, 1999; Muller et al., 2007; Saunders and Knill, 2001). Observers give weights to cues dependent on their reliability and robustness (Hillis et al., 2004; Knill and Saunders, 2003). Bayesian statistical models have been found suited to approximate the perceptual effects of cue combination in conditions of small conflicts (Hillis et al., 2004; Knill and Saunders, 2003) as well as in conditions of large conflicts between disparity and pictorial cues (Knill, 2007; van Ee et al., 2003). To explain the perceptual results for large conflicts, additional priors related to the 3D shape of objects were included in the Bayesian models. Knill (2007) assumed a mixed prior distribution of 3D shapes (in his study a mixture of objects having circular and elliptical surfaces). Van Ee et al. (2003) assumed alternation between strong and weak rectangularity priors.
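The reliability-dependent weighting mentioned above is commonly formalized as an inverse-variance weighted average of independent Gaussian cue estimates. The sketch below illustrates only that textbook rule; it is not the specific model fitted in any of the cited studies. The function name `combine_cues` and the example slant values are hypothetical.

```python
def combine_cues(estimates, sigmas):
    """Inverse-variance (reliability) weighted average of independent cue estimates.

    estimates: one estimate per cue (e.g. slant in degrees)
    sigmas:    standard deviation of each cue's estimate (assumed Gaussian noise)
    """
    reliabilities = [1.0 / s ** 2 for s in sigmas]   # reliability = 1 / variance
    total = sum(reliabilities)
    weights = [r / total for r in reliabilities]     # weights sum to 1
    combined = sum(w * e for w, e in zip(weights, estimates))
    combined_sigma = (1.0 / total) ** 0.5            # never larger than the best single cue
    return combined, combined_sigma, weights


# Hypothetical example: disparity signals a slant of 30 deg (sigma 2 deg),
# a pictorial cue signals 40 deg (sigma 4 deg).
slant, sigma, weights = combine_cues([30.0, 40.0], [2.0, 4.0])
print(slant, sigma, weights)
```

In this illustration the more reliable cue receives four times the weight of the less reliable one, so the combined estimate lies closer to the disparity-specified slant.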

Conceptually, random ellipse priors and weak rectangularity priors are problematic because they do not specify a particular shape but a class of shapes. As a result the priors lose their predictive power for depth and slant perception. Another conceptual problem is combining disparity and pictorial cues into a single Bayesian framework. Unlike pictorial cues such as perspective and shading, stereopsis in the strict sense, i.e. 3D vision from disparity alone, does not require a prior for 3D shape. Starting from a measurement of 2D shapes, pictorial cues infer specific 3D shapes based on prior experience. The inferred shapes determine the perceived depth and slant of the 3D shapes. Stereopsis is a measurement in 3D (2D shape plus depth) that specifies a particular 3D shape, irrespective of whether the depth measurement is unreliable or not. Here 3D shape is determined by the measurement of disparity and not by a prior that, in turn, determines a certain slant and depth.

Necker cubes (Necker, 1832) are interesting stimuli for studying the interaction between disparity and pictorial cues. Perceptual alternation between two 3D interpretations of the image is attributed to a Bayesian inference (Knill and Richards, 1996) based on the likelihood that the 2D image is produced by one of the two cubes. Apparently the likelihood for perceiving a cube is much higher than that for perceiving any other polyhedral 3D shape or for perceiving the shape as a 2D figure, which all could have given rise to the same 2D image. The presence of zero disparity does not seem sufficient to tip the balance in favor of perceiving the 2D figure because, in many recent studies, perceptual alternations have been reported during binocular viewing of Necker cubes (Alais et al., 2010; Britz et al., 2009; Kanai et al., 2005; Kornmeier and Bach, 2009; Kornmeier et al., 2011; Shannon et al., 2011; Shen et al., 2009; Sundareswara and Schrater, 2008; van Dam and van Ee, 2006; van Ee, 2005). Most researchers do not discuss this fact and make no distinction between perceptual alternations during binocular and monocular viewing. To investigate effects of zero and non-zero disparity on perceptual alternations in detail, perceptual dominance durations were measured during the monocular and binocular viewing of Necker cubes. To manipulate the strength of disparity relative to pictorial cues, dominance durations were measured for monocularly visible and monocularly hidden Necker cubes.

2. Methods

2.1. Experimental Setup

Stimuli were displayed using a conventional Wheatstone stereoscope arrangement (Fig. 1a) consisting of two TFT displays (20′′ LaCie Photon 20 Vision II, 1600 × 1200 pixels, 75 Hz) and two small mirrors (Wismeijer et al., 2008). The mirrors were slanted about the vertical axis at angles of plus and minus 45° with respect to their display. The virtual intersection point of the orthogonal mirrors was aligned with the center of the displays. This arrangement meant that viewing the left and right images on the two screens was geometrically equivalent to binocular viewing of the images, mirrored, on a single frontoparallel screen placed at the same distance from the observer. The straight-ahead viewing distance (eye-mirror-display) was 57 cm. Subjects were seated close to the mirrors so that the left eye could not see the right mirror and vice versa. The display area of the monitors was approximately 35° × 28°. Stimuli were created using custom OpenGL-based software. A chin- and headrest restricted the subjects’ head movements. The whole setup and experimental room were painted matte black and the room was darkened.

Figure 1. (a) Top view of the stimulus set-up. Examples of stereograms containing gray (b) and random-dot (c) Necker cubes embedded in random-dot noise.

2.2. Stimuli

Subjects binocularly viewed two types of Necker cubes drawn in orthogonal projection to prevent perspective cues from influencing the preference for one of the two perceived 3D orientations. One type of Necker cube (Fig. 1b) was homogeneously gray (33.4 cd/m²) and projected on a background of randomly distributed 50% black (0.1 cd/m²) and 50% white (71.9 cd/m²) dots (6′ × 6′) covering the remainder of the display. Dominance durations were measured during both monocular and binocular viewing of the stimuli. Depth judgments were measured separately during binocular viewing only. The gray Necker cubes were monocularly visible to either eye during binocular viewing. The width of the cubes’ edges was 3 dots (18′). The front and back faces (7° × 7°) of the cubes were shifted by 3.5° in both horizontal and vertical directions relative to each other, creating a central square of 3.5° × 3.5°. The relative disparity between the front and back sides of the cubes was varied from −0.4° to 0.4° in steps of 0.2° during binocular testing. The Necker cubes had an overall disparity of 0.3° relative to the background so that they appeared completely in front of the background. A random-dot version of the Necker cube was identical to the gray one except that the cube’s edges consisted of randomly distributed black and white dots rendering the cube invisible during monocular viewing (Fig. 1c). All random-dot Necker cubes had a mean disparity of 0.3° relative to their background.
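The principle behind a disparity-defined stimulus can be illustrated with a minimal random-dot stereogram in which a rectangular region is shifted horizontally in one eye's image. This is a generic sketch, not the custom software used in the study. The function `random_dot_stereogram`, the image size, the region and the sign convention for the shift are all assumptions made for illustration. With 6′ dots, a disparity of 0.3° (18′) would correspond to a shift of three dots if shifts were made in whole dots, which is likewise an assumption.

```python
import random


def random_dot_stereogram(width, height, region, disparity, seed=0):
    """Return (left, right) images as lists of rows of 0/1 dots.

    region:    (x0, y0, x1, y1), half-open rectangle in dot coordinates
    disparity: non-negative horizontal shift (in dots) of the region in the
               right image; the sign convention is an assumption of this sketch
    """
    x0, y0, x1, y1 = region
    if disparity < 0 or x0 - disparity < 0:
        raise ValueError("disparity must be >= 0 and keep the region inside the image")
    rng = random.Random(seed)
    left = [[rng.randint(0, 1) for _ in range(width)] for _ in range(height)]
    right = [row[:] for row in left]          # identical background in both eyes
    for y in range(y0, y1):
        # Copy the region into the right image, shifted by `disparity` dots.
        for x in range(x0, x1):
            right[y][x - disparity] = left[y][x]
        # Fill the strip uncovered by the shift with fresh, uncorrelated dots.
        for x in range(x1 - disparity, x1):
            right[y][x] = rng.randint(0, 1)
    return left, right


# Hypothetical example: a 20 x 10 dot region shifted by 3 dots.
left, right = random_dot_stereogram(40, 20, (10, 5, 30, 15), 3)
```

Because each image on its own is uniform random noise, the shifted region is not visible monocularly; it is defined only by the correlation between the two images.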

2.3. Procedure

Before embarking on experimental trials, subjects were presented with one example of each type of the Necker cubes. They were instructed to look around in the stimulus while keeping their fixation within the central square of the Necker cube. After some explanation, all subjects experienced the stimuli as 3D cubes alternating between two orientations. In the experiments, each trial started with an initial screen that instructed the subject to start the presentation of the stimulus by pressing the space bar of the keyboard that was placed in front of the subject. Stimuli were presented for durations of 3 min during which the subject indicated the perceived orientation of the Necker cube using two keys (left and right arrows for the left- and right-face-in-front orientations of the cube, respectively) of the keyboard. Subjects were allowed to pause between trials as long as they wished. Dominant percept duration was recorded as the time between successive presses on different keys. A dominance duration was excluded from analysis if the dominant percept was ended by the end of the trial. Per stimulus type, stimuli were presented in random order. Each stimulus was presented twice. For each stimulus, the durations of the two perceived orientations (left- and right-face-in-front) were treated separately. All variables taken together led to 6 (subjects) × 11 (stimuli) × 2 (orientations) = 132 distributions of dominance durations. On average, distributions contained 45 data points. Reports during the instruction phase revealed that subjects occasionally judged the depth of the Necker cube differently in the two orientations. In separate sessions, depth judgments were measured as a function of disparity for gray Necker cubes. In a forced-choice manner, subjects indicated which of the two alternating percepts contained most depth. Stimuli were presented 30 times in random order.
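The rule for deriving dominance durations from key presses can be sketched as follows. The event format, the function name `dominance_durations` and the example time stamps are hypothetical; only the rule itself (time between successive presses on different keys, with the percept that is cut off by the end of the trial discarded) is taken from the description above.

```python
def dominance_durations(events, trial_end):
    """Compute dominance durations per orientation from key-press events.

    events:    chronological list of (time_s, key), key being "left" or "right"
    trial_end: trial duration in seconds (180 s for a 3-min trial)
    """
    durations = {"left": [], "right": []}
    current_key, onset = None, None
    for t, key in events:
        if t > trial_end:
            break                      # ignore presses after the trial ended
        if key == current_key:
            continue                   # repeated press: the percept has not changed
        if current_key is not None:
            durations[current_key].append(t - onset)
        current_key, onset = key, t
    # The last percept is terminated by the end of the trial and is therefore
    # excluded: no duration is appended for it.
    return durations


# Hypothetical key presses from the start of one trial.
events = [(1.0, "left"), (4.5, "right"), (5.0, "right"), (9.0, "left"), (12.0, "right")]
print(dominance_durations(events, trial_end=180.0))

# Number of distributions stated in the text.
assert 6 * 11 * 2 == 132
```

Keeping the two orientations in separate lists mirrors the separate treatment of left- and right-face-in-front durations.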

2.4. Analysis

Dominance durations were arranged in cumulative distributions. The advantage of using cumulative distributions is that data points do not have to be summed within arbitrarily selected bin sizes. Distributions of dominance durations resemble the statistical gamma distribution, introduced in the context of percept durations by Levelt (1967). The cumulative distribution function (CDF) of the gamma distribution is given by

$$\mathrm{CDF}(x) = \int_0^x \frac{1}{\lambda^k \Gamma(k)}\, t^{k-1} e^{-t/\lambda}\, dt,$$

where Γ(k) is the continuous extension of (k − 1)! and x and t are time variables. The parameters k and λ are known as the shape and scale parameters of gamma distributions, respectively. The skewness of the distribution depends on k only. Together k and λ determine the mean (m) of the distribution, because m = k · λ. This means that mean dominance duration (m) is not independent of k and λ for data described by gamma distributions. Mean dominance durations were computed to enable comparison of results with existing literature. The means were computed directly from the data and thus obtained independently of k and λ. A mixed-design analysis of variance was performed to compare k, λ and m as a function of disparity (within-subjects factor) and cube orientation (between-subjects factor). Effects of Necker cube type (gray, random-dot) and viewing condition (monocular, binocular) were tested in separate ANOVAs. Analysis of variance was performed to compare the mean percentages of the depth judgments as a function of disparity.
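The quantities used in this analysis can be illustrated with a short sketch: an empirical cumulative distribution that needs no binning, the gamma CDF for shape k and scale λ, and estimates of k and λ. The fitting procedure used in the study is not specified here, so the sketch substitutes a simple method-of-moments estimate (k = mean²/variance, λ = variance/mean), which is an assumption and not the author's method. All function names and the example durations are hypothetical. For a gamma distribution the skewness equals 2/√k, which is why it depends on k only.

```python
import math


def empirical_cdf(durations):
    """Cumulative distribution of the data: no bin size has to be chosen."""
    xs = sorted(durations)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


def gamma_cdf(x, k, lam):
    """Gamma CDF with shape k and scale lam (series for the incomplete gamma function)."""
    if x <= 0:
        return 0.0
    z = x / lam
    term = 1.0 / k
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (k + n)            # next term of sum z^n / (k (k+1) ... (k+n))
        total += term
        n += 1
    return total * math.exp(-z + k * math.log(z) - math.lgamma(k))


def fit_gamma_moments(durations):
    """Method-of-moments estimates of k and lam (an assumed, simple estimator)."""
    n = len(durations)
    mean = sum(durations) / n
    var = sum((d - mean) ** 2 for d in durations) / (n - 1)
    return mean ** 2 / var, var / mean


# Hypothetical dominance durations in seconds.
data = [1.2, 2.5, 3.1, 3.8, 4.4, 5.0, 6.3, 8.9]
k, lam = fit_gamma_moments(data)
print("k =", k, "lambda =", lam, "mean =", k * lam, "skewness =", 2 / math.sqrt(k))
for x, p in empirical_cdf(data):
    print(x, p, gamma_cdf(x, k, lam))
```

With this estimator the product k · λ reproduces the sample mean exactly, which illustrates why m is not independent of k and λ when all three are derived from a gamma description of the data.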

2.5. Subjects

Six subjects participated in all experiments. All were naive with respect to the purpose of the study. The subjects had normal or corrected-to-normal vision. Stereoacuity was assessed by shape detection in a series of selected random-dot stereograms presented on the stereoscope arrangement that was used in the experiments. All subjects were successful in naming the hidden shapes.

3. Results

All subjects reported that the task was easy to perform. Subjects perceived complete orientation alternations for the gray Necker cubes even if the cubes contained non-zero disparity between front and back faces. Figure 2 shows the cumulative frequencies of dominance durations for the individual subjects for gray Necker cubes having zero and one non-zero disparity. The graphs show considerable within-subjects and between-subjects variability in dominance duration.

Figure 2. Cumulative dominance distributions for individual subjects during viewing of gray Necker cubes. Data and fits to gamma functions are grouped by disparity (top: 0°, bottom: −0.2°) and cube orientation (left: left face in front, right: right face in front). Numbers indicate data of individual subjects.

Figure 3(a), (b) and (c) show the mean values of k, λ and m as a function of disparity and Necker cube orientation across all 6 observers. Figure 3(d) shows the results of the depth judgments.

The factor disparity was not significant for k (F(4,40) = 2.25, p = 0.14), λ (F(4,40) = 0.72, p = 0.40) and m (F(4,40) = 1.42, p = 0.24). The factor cube orientation was not significant for k (F(1,10) = 0.01, p = 0.98), λ (F(1,10) = 0.20, p = 0.67) and m (F(1,10) = 1.56, p = 0.24) either. Mean dominance duration (m) ranged between 3.0 and 4.9 s. These values are somewhat higher than the range between 2.0 and 3.2 s that has been reported in the literature (Babich and Standing, 1981; Peterson and Hochberg, 1983; Ross and Ma-Wyatt, 2004; van Ee, 2005). The much larger size of the Necker cubes and the random-dot background are possible causes for the difference.

The factor disparity was highly significant for the depth judgments (F(4,25) = 39.12, p < 10⁻⁹). The interpretation of the depth judgments is that subjects perceived more depth in the Necker cubes if disparity and perspective signaled corresponding orientations. Figure 3(d) shows that the judgments made at the relatively large disparities (−0.4° and 0.4°) contributed most to the perceived depth order.

Figure 3. Mean values of k (a), λ (b) and m (c) for perceiving the left (light gray) and right (dark gray) faces of the Necker cube in front (N = 6). (d) Mean percentages of seeing most depth in the right-face-in-front orientation of the Necker cubes. Error bars represent 1 SEM.
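The summary statistics behind a plot of mean percentages with 1-SEM error bars can be sketched as below. The per-subject counts are invented for illustration and are not data from the study. The helper names `percent_from_count` and `mean_and_sem` are hypothetical; the value of 30 presentations per stimulus is taken from the procedure.

```python
import math


def percent_from_count(count, n_presentations=30):
    """Percentage of presentations on which one orientation was judged deeper."""
    return 100.0 * count / n_presentations


def mean_and_sem(values):
    """Mean and standard error of the mean across subjects."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))  # sample SD
    return mean, sd / math.sqrt(n)


# Hypothetical counts (out of 30) for six subjects at one disparity.
counts = [24, 27, 21, 30, 18, 24]
percentages = [percent_from_count(c) for c in counts]
mean, sem = mean_and_sem(percentages)
print(mean, sem)
```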

The results demonstrate that monocularly visible Necker cubes are bistable even if there is a considerable disparity between the front and back faces of the cubes. Absence of a significant effect of disparity on dominance durations suggests the complete suppression of disparity information. The depth judgments, however, contradict this suggestion and show that disparity influences perceived depth if disparity and perspective specify the same depth order.

Figure 4 shows the cumulative frequencies of dominance durations for the individual subjects for random-dot Necker cubes having zero and one non-zero disparity. The zero-disparity graphs show that within-subjects and between-subjects variability of dominance distributions are similar to those for gray Necker cubes (Fig. 2, top).

Between-subjects variability was extremely large for non-zero disparities (Fig. 4, bottom). For one of the two orientations of the Necker cubes, the subjects reported a conflict between the overall orientation and the occlusion of individual bars surrounding the central area of fixation. In this orientation the Necker cube was experienced as an impossible 3D object. The conflict interfered with the task. Subjects reported that occasionally they were uncertain about the overall orientation or forgot to press the required key at the appropriate instant. Two of the subjects became so confused that they stopped responding altogether. As a result of these problems statistical analysis was limited to dominance distributions for random-dot Necker cubes having zero disparity.

Figure 4. Cumulative dominance distributions for individual subjects during the viewing of random-dot Necker cubes. Data and fits to gamma functions are grouped by disparity (top: 0°, bottom: 0.2°) and cube orientation (left: left face in front, right: right face in front). Numbers indicate data of individual subjects.

The factor stimulus type was highly significant for k (F(2,20) = 14.10, p = 0.0012), λ (F(2,20) = 59.84, p < 10⁻⁶) and m (F(2,20) = 11.05, p = 0.0034). Post-hoc Tukey tests showed that the random-dot Necker-cube results were significantly different from those for the gray and monocularly viewed Necker cubes (all F’s > 7.28, p’s < 0.003). The random-dot Necker-cube k’s were about half the value of the other k’s (Fig. 5(a)), implying that the random-dot Necker-cube distributions were more skewed than the others. On the other hand, the random-dot Necker-cube λ’s were almost four times as large as the other λ’s (Fig. 5(b)). The values of k and λ imply that the random-dot Necker-cube m’s were larger than the other m’s (Fig. 5(c)). Mean dominance duration (m) ranged between 7.5 and 7.7 s for the random-dot Necker cubes. These values are in line with mean percept durations measured for slant bistability (van Ee, 2005). The results for the monocularly and binocularly viewed gray Necker cubes were not significantly different from each other (all F’s < 2.95, p’s > 0.10). The factor cube orientation was not significant for k (F(1,10) = 0.04, p = 0.85), λ (F(1,10) = 0.07, p = 0.79) or for m (F(1,10) = 0.66, p = 0.44).

Figure 5. Mean values of k (a), λ (b) and m (c) for perceiving the left (light gray) and right (dark gray) faces of the Necker cubes in front (N = 6). The binocularly viewed Necker cubes (gray and r(andom-)dot) contained zero disparity. The monocularly viewed Necker cubes (mon) were gray Necker cubes against a random-dot background presented to the right eye. Error bars represent 1 SEM.

The results demonstrate that monocularly invisible Necker cubes are bistable if there is zero disparity between the front and back faces of the cubes. The Necker cubes are still unstable if disparity is different from zero. Stable depth of centrally viewed bars does not stop instability but turns the Necker cube into an impossible figure that is known as the crazy crate in popular literature on optical illusions.

4. Discussion

Bistability of Necker cubes has been reported for various binocular viewing conditions. Wheatstone (1838) observed that a flat Necker-type cube, designed to appear 3D when viewed binocularly, appeared to alternate in depth. Purves and Andrews (1997) showed that a 3D wire frame produced bistable perception during binocular viewing. Furthermore, any viewing of a flat Necker cube using both eyes is a binocular view of a zero-disparity Necker cube. In all these conditions, zero or non-zero disparities should veto changes in perceived depth. However, they do not. These observations may give the false impression that disparity does not affect the perception of Necker cubes at all. Studies of Dosher et al. (1986) and Backus and Haijiang (2007), however, showed that disparity helped in disambiguating rotating Necker cubes if these cubes were viewed for a few seconds. Furthermore, the depth judgments in the present study showed that disparity affected the perceived depth of the bistable Necker cubes. Despite the observed contributions of disparity, the presence of zero as well as non-zero disparity in luminance- and disparity-defined Necker cubes did not stop bistability during long-lasting binocular viewing. Distributions of dominance durations were very similar during monocular and binocular viewing for luminance-defined Necker cubes. Disparity thus seems to have a negligible effect on the bistability of such Necker cubes. The likely explanation is that disparity is fully outweighed by pictorial cues that are supposed to drive bistability (Knill and Richards, 1996). An essential property of pictorial cues is that they act on 2D shape (meaning that the third dimension is not defined). Although the distributions of dominance durations showed different characteristics, disparity-defined Necker cubes were also found to be bistable, in particular if the front and back faces of the cubes were not segregated by relative disparity. Outweighing of disparity by pictorial cues is now impossible because the cubes are defined by disparity alone. An explanation would be that disparity is suppressed too little to prevent the emergence of the cubes and sufficiently to allow bistability. The problem with this explanation is, however, that there are no pictorial cues that can act on the disparity-defined Necker cubes. In order to interfere with disparity, pictorial cues would have to operate on shapes after binocular combination. However, these shapes are supposed to be 3D. Yet another explanation for bistability of disparity-defined Necker cubes is that the processing of disparity causes bistability on its own due to inversion of the disparity sign (Matthews et al., 2011). This explanation is unlikely because, if true, one would expect to find bistability in almost all random-dot stereograms, which is not the case. Binocular bistability is limited to objects that are bistable during monocular viewing. Another counter-argument is that inversion of the disparity sign would not affect the zero-disparity Necker cubes.

The question that has to be answered is: How can bistability result from rivalry between conflicting cues if there is no cue other than disparity that gives rise to seeing a Necker cube? Figure 6 shows a theoretical framework of binocular vision that may explain the current bistability results as well as the cooperation between disparity and monocular cues in the 3D perception of pictures and random-dot stereograms. In the schematic framework, figural shape is mapped onto two or three 2D representations. One representation is for luminance-defined 2D shapes resulting from figure–ground processing of luminance edges that originate from one or both retinal images (the OR box). Since motion is measured by specialized neurons, there is probably also a 2D representation for motion-defined shape (structure from motion). Disparity-defined 3D shapes result from figure–ground processing of disparities, obtained from combining the two retinal images (the AND box). In the model, the non-depth part is stored in a representation for 2D disparity-defined shape. The hypothesis resulting from the current study is that all shape perception follows from neural processes that convert 2D shape into 3D shape. Recently, Pizlo et al. (2010) proposed a process based on veridicality, complexity, symmetry and volume that successfully recovers the 3D shape of an object from a single luminance-defined representation of 2D shape (Li et al., 2009). In the current framework 3D shape results from pictorial cues acting on luminance-, motion- and disparity-defined 2D shapes. The pictorial cues cause the perceptual bistability of Necker cubes. Separate representations for luminance- and disparity-defined shape seem justified by the different dynamics of perceptual alternations for luminance- and disparity-defined Necker cubes that were measured in the current experiments. The inferred 3D shapes determine the 3D layout of the scene because it is only in that particular layout that the inferred 3D shapes are in agreement with the measured 2D shapes. During binocular vision, the 3D layout is also inferred directly via the disparity pathway. Studies of cue combinations have shown that, within certain limits, the perceived 3D layout is a reconcilement between the layouts measured by disparity and inferred from monocular cues (Hillis et al., 2004; Jacobs, 1999; Muller et al., 2007; Saunders and Knill, 2001). Outside the limits, one of the pathways acquires dominance permanently, such as during the viewing of pictures and random-dot stereograms, or temporarily, such as during the viewing of slant-rivalry stimuli (van Ee et al., 2003) and reverspective paintings (Wagner et al., 2009).

Figure 6. Flow chart of representations (discs) and processes (bars) involved in 3D perception.

The proposed flow chart explains the remarkable but not appreciated observation that Necker cubes are bistable during binocular vision. Bistability results from the processing of 3D shape from monocular cues, which occurs identically during monocular and binocular vision. The presence of disparity cannot stop bistability because disparity is not an input for 3D shape processing. Disparity interacts with the output. It combines with the perspective-related depth in one orientation of the Necker cube, affecting the cube’s perceived depth, and it is suppressed in the other orientation. Recently, similar effects were observed in stereoscopically viewed hollow masks (Matthews et al., 2011). An implication of the framework presented here is that purely stereoscopic stimuli such as random-dot stereograms are subjected to the same cognition-based processes as those that convert monocular images into 3D shapes. Cognition-based processing is supported by the observation that a large number of visual illusions persist when they are presented as purely stereoscopic stimuli (Julesz, 1971). The coupling of perceived depth and perceived size in random-dot stereograms offers further support (Bishop, 1996).

The current proposal explains a problematic finding in binocular vision that has been discussed since the nineteenth century (Hering, 1879; von Helmholtz, 1911). The finding was that the range of patent stereopsis lies outside the region of binocular single vision, where fusion of the disparate images occurs (Ogle, 1952). Patent stereopsis is the strong impression of depth with fused or double images. Ogle studied the limits of stereopsis and fusion extensively and measured considerable differences. He also found subjects for whom fusion existed but for whom no stereopsis could be demonstrated. Tyler (1991) proposed a not very convincing physiological basis for fusion and diplopia in terms of activities of monovalent and binocular neurons. The different limits are problematic because fusion and stereopsis are supposed to follow from the same process. Within the framework presented here, the different limits for fusion and stereopsis are easily explained by fusion being a quality of the representation for luminance-defined shape and stereopsis a quality of the representation for disparity-defined shape. Thus, fusion and stereopsis are manifestations of independent representations. The explanation is supported by the fact that diplopia is easily observed for luminance-defined shapes but not for purely disparity-defined shapes. The latter observation was also made in random-dot stereograms where some individual (luminance-defined) dots appeared double while the (disparity-defined) depth figure appeared single (Lee and Dobbins, 2006).

In his book Foundations of Cyclopean Perception, Julesz (1971) presented a flow chart of the visual system. Julesz’s division into peripheral and central processes separated by the cyclopean retina expresses the current view on binocular vision. The peripheral processes receive input from the retinas and provide output to the cyclopean retina that, in turn, is the central input stage to the central processes. The present results favor a model in which peripheral processes produce several representations of 2D shape that in parallel are processed by common central processes to construct a unified 3D percept of the visual scene. The representation for luminance-defined shape is one of the gates to central processing and the representation for disparity-defined shape another one. Probably there are more representations. For instance, a 2D representation for shape related to structure from motion seems another likely candidate (Siegel and Andersen, 1988). In his influential book, Marr (1982) proposed the primal sketch as an intermediate representation in his framework for deriving shape information from images. The primal sketch was supposed to be a representation of generic images, in terms of image primitives, such as bars and edges. The representations of 2D shape in the current framework may be seen as appropriate locations for primal sketches. A noteworthy difference with Marr’s proposal is the inclusion of disparity- and motion-related information. Marr envisioned disparity and motion first to contribute to the 2.5D sketch.

The main search for the neural substrate of stereoscopic vision has been devoted to the perception of depth and not so much of shape (Parker, 2007). Still, a number of neurophysiological reports are suggestive of specific aspects of the current model of binocular integration (Orban et al., 2006). Buckthought and Wilson (2007) showed in a psychophysical study that it is possible to perceive depth and rivalry simultaneously in one spatial location, as long as the components lie in different orientation or spatial frequency bands. Buckthought and Mendola (2011) argued that current models of binocular vision do not predict the perception of simultaneous depth and rivalry, as the binocular false matches and suppression from rivalry would prevent the binocular matching needed for depth perception (Hayashi et al., 2004). The explanation offered by Buckthought and Mendola (2011) was that the neural substrates for the representation of surfaces are distinct from those in which the correspondence problem for depth or rivalry is solved. While early visual areas (V1, V2, V3) are likely to be involved in solving the correspondence problem, possible candidate areas for surface representation include the lateral occipital areas responsive to either depth or rivalry. Applying these ideas to the flow chart of Fig. 6 implies that the processes ‘matching and grouping’ take place in the areas V1, V2 and V3. Larsson and Heeger (2006) described two visual field maps, called LO1 and LO2, in the lateral occipital cortex (LOC). Each map contained a topographic representation whose eccentricity coding was shared with V1, V2 and V3. The topography, stimulus selectivity, and anatomical location indicated that LO1 and LO2 integrate shape information from multiple visual submodalities in retinotopic coordinates. Since the integration of luminance- and disparity-defined shapes requires retinotopic correspondence, LO1, LO2 or related retinotopic areas may be the neural substrates of luminance- and disparity-defined 2D shape, although Larsson and Heeger (2006) did not test these areas with disparity-defined stimuli. LOC has been named as the area that hosts representations of 3D shape (Grill-Spector et al., 2001) and scenes (Macevoy and Epstein, 2011). It is not clear whether LOC is also involved in stereoscopic shape processing. Areas V4 and connected subregions of the inferotemporal cortex (IT) contain neurons that are selective for orientation in depth of disparity-defined stimuli (Orban et al., 2006). Where and how signals from stereoscopic depth are integrated with the monocular representation of 3D shape is not understood at the physiological level (Parker, 2007). The proposed framework for 3D shape perception may be a source of inspiration for neurophysiological studies directed at solving related questions.

References

Alais, D., van Boxtel, J. J., Parker, A. and van Ee, R. (2010). Attending to auditory signals slows visual alternations in binocular rivalry, Vision Research 50, 929–935.

Babich, S. and Standing, L. (1981). Satiation effects with reversible figures, Percept. Mot. Skills 52, 203–210.

Backus, B. T. and Haijiang, Q. (2007). Competition between newly recruited and pre-existing visual cues during the construction of visual appearance, Vision Research 47, 919–924.

Bishop, P. O. (1996). Can random-dot stereograms serve as a model for the perception of depth in relation to real three-dimensional objects? Vision Research 36, 1473–1477.

Blake, R. and Wilson, H. (2011). Binocular vision, Vision Research 51, 754–770.

Britz, J., Landis, T. and Michel, C. M. (2009). Right parietal brain activity precedes perceptual alternation of bistable stimuli, Cereb. Cortex 19, 55–65.

Buckthought, A. and Mendola, J. D. (2011). A matched comparison of binocular rivalry and depth perception with fMRI, J. Vision 11, 1–15.

Buckthought, A. and Wilson, H. R. (2007). Interaction between binocular rivalry and depth in plaid patterns, Vision Research 47, 2543–2556.

Carlson, T. A. and He, S. (2000). Visible binocular beats from invisible monocular stimuli during binocular rivalry, Curr. Biol. 10, 1055–1058.

Dosher, B. A., Sperling, G. and Wurst, S. A. (1986). Tradeoffs between stereopsis and proximity luminance covariance as determinants of perceived 3D structure, Vision Research 26, 973–990.

Grill-Spector, K., Kourtzi, Z. and Kanwisher, N. (2001). The lateral occipital complex and its role in object recognition, Vision Research 41, 1409–1422.

Hayashi, R., Maeda, T., Shimojo, S. and Tachi, S. (2004). An integrative model of binocular vision: a stereo model utilizing interocularly unpaired points produces both depth and binocular rivalry, Vision Research 44, 2367–2380.

Hering, E. (1879). Spatial Sense and Movements of the Eye (1942). American Academy of Optometry, Baltimore, USA.

Hillis, J. M., Watt, S. J., Landy, M. S. and Banks, M. S. (2004). Slant from texture and disparity cues: optimal cue combination, J. Vision 4, 967–992.

Jacobs, R. A. (1999). Optimal integration of texture and motion cues to depth, Vision Research 39, 3621–3629.

Julesz, B. (1971). Foundations of Cyclopean Perception. University of Chicago Press, Chicago, USA.

Julesz, B. (1960). Binocular depth perception of computer generated patterns, Bell Syst. Technol. J. 39, 1125–1162.

Kanai, R., Moradi, F., Shimojo, S. and Verstraten, F. A. (2005). Perceptual alternation induced by visual transients, Perception 34, 803–822.

Knapen, T., Brascamp, J., Pearson, J., van Ee, R. and Blake, R. (2011). The role of frontal and parietal brain areas in bistable perception, J. Neurosci. 31, 10293–10301.

Knill, D. C. (2007). Learning Bayesian priors for depth perception, J. Vision 7, 13.

Knill, D. C. and Richards, W. (Eds) (1996). Perception as Bayesian Inference. Cambridge University Press, Cambridge, MA, USA.

Knill, D. C. and Saunders, J. A. (2003). Do humans optimally integrate stereo and texture information for judgments of surface slant? Vision Research 43, 2539–2558.

Kornmeier, J. and Bach, M. (2009). Object perception: when our brain is impressed but we do not notice it, J. Vision 9, 7.1–10.

Kornmeier, J., Pfaffle, M. and Bach, M. (2011). Necker cube: stimulus-related (low-level) and percept-related (high-level) EEG signatures early in occipital cortex, J. Vision 11, 12.

Larsson, J. and Heeger, D. J. (2006). Two retinotopic visual areas in human lateral occipital cortex, J. Neurosci. 26, 13128–13142.

Lee, H. S. and Dobbins, A. C. (2006). Perceiving surfaces in depth beyond the fusion limit of their elements, Perception 35, 31–39.

Levelt, W. J. (1967). Note on the distribution of dominance times in binocular rivalry, Brit. J. Psychol. 58, 143–145.

Levelt, W. J. M. (1965). On Binocular Rivalry. Royal Van Gorcum, Assen, The Netherlands.

Li, Y., Pizlo, Z. and Steinman, R. M. (2009). A computational model that recovers the 3D shape of an object from a single 2D retinal representation, Vision Research 49, 979–991.

Logothetis, N. K. (1998). Single units and conscious vision, Phil. Trans. Royal Soc. London B 353, 1801–1818.

Logothetis, N. K., Leopold, D. A. and Sheinberg, D. L. (1996). What is rivalling during binocular rivalry? Nature 380, 621–624.

Macevoy, S. P. and Epstein, R. A. (2011). Constructing scenes from objects in human occipitotemporal cortex, Nat. Neurosci. 14, 1323–1329.

Marr, D. (1982). Vision. A Computational Investigation into the Human Representation and Processing of Visual Information. W.H. Freeman and Company, New York, USA.

Matthews, H., Hill, H. and Palmisano, S. (2011). Binocular disparity magnitude affects perceived depth magnitude despite inversion of depth order, Perception 40, 975–988.

Muller, C. M., Brenner, E. and Smeets, J. B. (2007). Living up to optimal expectations, J. Vision 7, 2.

Necker, L. A. (1832). Observations on some remarkable optical phaenomena seen in Switzerland; and on an optical phaenomenon which occurs on viewing a figure of a crystal or geometrical solid, London and Edinburgh Phil. Mag. J. Sci. 1, 329–337.

Ogle, K. N. (1952). Disparity limits of stereopsis, A.M.A. Arch. Ophthalmol. 48, 50–60.

Orban, G. A., Janssen, P. and Vogels, R. (2006). Extracting 3D structure from disparity, Trends Neurosci. 29, 466–473.

Parker, A. J. (2007). Binocular depth perception and the cerebral cortex, Nat. Rev. Neurosci. 8, 379–391.

Peterson, M. A. and Hochberg, J. (1983). Opposed-set measurement procedure: a quantitative analysis of the role of local cues and intention in form perception, J. Exper. Psychol. Hum. Percept. Perform. 9, 183–193.

Pizlo, Z., Sawada, T., Li, Y., Kropatsch, W. G. and Steinman, R. M. (2010). New approach to the perception of 3D shape based on veridicality, complexity, symmetry and volume, Vision Research 50, 1–11.

Polonsky, A., Blake, R., Braun, J. and Heeger, D. J. (2000). Neuronal activity in human primary visual cortex correlates with perception during binocular rivalry, Nat. Neurosci. 3, 1153–1159.

Purves, D. and Andrews, T. J. (1997). The perception of transparent three-dimensional objects, Proc. Natl. Acad. Sci. USA 94, 6517–6522.

Ross, J. and Ma-Wyatt, A. (2004). Saccades actively maintain perceptual continuity, Nat. Neurosci. 7, 65–69.

Saunders, J. A. and Knill, D. C. (2001). Perception of 3D surface orientation from skew symmetry, Vision Research 41, 3163–3183.

Shannon, R. W., Patrick, C. J., Jiang, Y., Bernat, E. and He, S. (2011). Genes contribute to the switching dynamics of bistable perception, J. Vision 11, 8.

Shen, L., Zeng, Z. L., Huang, P. Y., Li, Q., Mu, J., Huang, X. Q., Lui, S., Gong, Q. Y. and Xie, P. (2009). Temporal cortex participates in spontaneous perceptual reversal, Neuroreport 20, 647–651.

Siegel, R. M. and Andersen, R. A. (1988). Perception of three-dimensional structure from motion in monkey and man, Nature 331, 259–261.

Sundareswara, R. and Schrater, P. R. (2008). Perceptual multistability predicted by search model for Bayesian decisions, J. Vision 8, 12.1–19.

Tong, F. and Engel, S. A. (2001). Interocular rivalry revealed in the human cortical blind-spot representation, Nature 411, 195–199.

Tong, F., Nakayama, K., Vaughan, J. T. and Kanwisher, N. (1998). Binocular rivalry and visual awareness in human extrastriate cortex, Neuron 21, 753–759.

Tyler, C. W. (1991). The horopter and binocular fusion, in: Binocular Vision, D. Regan (Ed.), Vol. 9. Vision and Visual Dysfunction Series, J. R. Cronly-Dillon (Ed.), pp. 19–37, The Macmillan Press Ltd, London, UK.

van Dam, L. C. and van Ee, R. (2006). The role of saccades in exerting voluntary control in perceptual and binocular rivalry, Vision Research 46, 787–799.

van Ee, R. (2005). Dynamics of perceptual bistability for stereoscopic slant rivalry and a comparison with grating, house-face, and Necker cube rivalry, Vision Research 45, 29–40.

van Ee, R., Adams, W. J. and Mamassian, P. (2003). Bayesian modeling of cue interaction: bistability in stereoscopic slant perception, J. Opt. Soc. Amer. A 20, 1398–1406.

von Helmholtz, H. (1911). Helmholtz’s Treatise on Physiological Optics (III). Republished by Thoemmes Press, Bristol, UK (2000).

Wagner, M., Ehrenstein, W. H. and Papathomas, T. V. (2009). Vergence in reverspective: percept-driven versus data-driven eye movement control, Neurosci. Lett. 449, 142–146.

Wheatstone, C. (1838). Contributions to the physiology of vision. Part 1: On some remarkable and hitherto unobserved phenomena of binocular vision, Phil. Trans. Royal Soc. London 128, 371–394.

Wismeijer, D. A., van Ee, R. and Erkelens, C. J. (2008). Depth cues, rather than perceived depth, govern vergence, Exper. Brain Res. 184, 61–70.