Joint Attention as the Fundamental Basis of Understanding Perspectives

21


394 Henrike Moll and Andrew N. Meltzoff

signs of autism, simply are able or skillful to engage in (see also Gallagher & Hutto, 2008; Seemann, 2007).

In line with this, we want to bring into focus some of the social-cognitive advances that are enabled by, instead of necessary for, joint attention (a similar argument has been made for imitation; see Meltzoff, 2005; Kannetzky, 2007). The central claim we shall make is that joint attention provides the necessary foundation for the development of perspectivity. Children first learn to take perspectives in the context of joint attentional engagement. Infants, even before their first birthdays, not yet knowing anything about perspectives, can share attention and possibly perspectives (see also Barresi & Moore, 1993) with others. The use of this skill then blossoms into the development of taking and understanding perspectives that follows in the coming months and years.

To develop this argument, we first provide a brief overview of how infants by the end of the first year of life share attention with others and develop a sense for when they are and when they are not in joint attention. We will then examine the beginnings of perspective taking in the domains of knowledge and perception shown by infants and young children between one and three years of age. Finally, we will argue, in agreement with some of the traditional research and contra recent suggestions from infant habituation and looking-time research, that a full appreciation of perspectives is not in place before the age of four to five years, when children come to "confront" different viewpoints, knowing that the selfsame object can be viewed or construed differently (Perner, 1991; Perner, Brandl, & Garnham, 2003).

In a nutshell, the developmental trajectory that young children follow on their way to an adult-like understanding of perspectives has its beginning in the sharing of perceptual experiences and actions of others. In contrast to a widely held view, we argue that this sharing is primary and ontogenetically precedes the attribution of mental states to the other individual. Soon after children first engage in instances of joint attention, they learn to take another person's perspective that differs from their own. They can understand requests and make sense of actions that are performed from perspectives that do not match their own perspective. However, this perspective taking does not yet entail an awareness that there are two different perspectives on an identical object or situation. Such a full acknowledgment of the fact that there may be perspectival differences is only in place once children can explicitly confront two contrasting perspectives on the same object, a skill that seems to be acquired no sooner than at the classic watershed of four to five years. In this chapter, we will describe this developmental pathway from elementary joint attention to a realization that there can be different perspectives or views on the same object.

Establishing Joint Attention

By nine to twelve months, human infants readily engage in early forms of joint attention. They are no longer exclusively occupied with either an object individually (thing, event, situation, etc.) or another person dyadically. Instead, they now often "triangulate" (Davidson, 2001) and share an object of interest with another person triadically (with the possibility that the object place is taken by another [third] person to whom infant and adult attend together). They co-attend with their caretaker to the plane in the sky, the older sibling's activities, or the sound of the siren emerging from the ambulance in the street.

One possible way to begin an episode of joint visual attention, besides showing objects and pointing to things for others, is to adopt another's attentional focus through gaze or point following. Starting at around nine months, infants look where another has just looked or pointed (instead of fixating the pointing finger; see Murphy & Messer, 1977). By around twelve months, infants look past distractors on their visual scan path (Butterworth & Jarrett, 1991) and follow gaze to loci outside of their immediate visual fields, such as behind them (Deak, Flom, & Pick, 2000) or behind opaque barriers (Moll & Tomasello, 2004). Other variations of the gaze-following paradigm suggest that children have an implicit understanding of when joint attention is and is not possible to establish. More specifically, twelve- to fifteen-month-old infants follow another person's head turn to an object less when the person's eyes are closed (Brooks & Meltzoff, 2002, 2005; Tomasello, Hare, Lehmann, & Call, 2007), averted (Corkum & Moore, 1995), blindfolded (Brooks & Meltzoff, 2002; Meltzoff & Brooks, 2008) or when the other's view is impeded by a barrier (Butler, Caron, & Brooks, 2000; Caron, Kiel, Dayton, & Butler, 2002), suggesting that children are aware that joint attention cannot occur under these conditions.

It needs to be emphasized that gaze following is not to be equated with joint attention. Many animal species, such as dogs (Hare & Tomasello, 1999), goats (Kaminski, Riedel, Call, & Tomasello, 2005), ravens (Schloegl, Kotrschal, & Bugnyar, 2007), and apes (Tomasello, Hare, & Agnetta, 1999) follow a conspecific's or human's line of regard and thus monitor where an individual is looking without thereby engaging in joint attention (see Carpenter & Liebal, this volume). In the human case, however, gaze following has a different quality. It often marks the beginning of an episode in which the "jointness" of the attention becomes manifest in "sharing looks" (see Carpenter, Nagell, & Tomasello, 1998), gaze alternation between the object and the co-attender, and shared affect or attitude toward the object or event that is expressed in "knowing smiles" and vocalizations. Unlike in nonhuman animals, where gaze following serves as a mechanism to track where others orient individually, this behavior in humans, and even human infants, often sets the stage for a "meeting of minds" (Bruner, 1995) in an episode of joint attention.

Taken together, this shows that infants around their first birthdays or soon thereafter engage in joint attentional relationships and possess the practical ability to detect when another is ready for joint attention. However, it would be mistaken to think that the earliest instances of infant joint attention imply a sophisticated understanding of another's mental states or perspectives. The reason is that it does not involve any explicit determination of what the other sees, let alone how another perceives a given object from his or her viewpoint. Perspective taking, however, requires some such specification of what the other perceives, knows, or feels, etc., when this differs from the child's own view or experience. Already some months after infants begin to jointly attend to objects with others, they also demonstrate first instances of perspective taking thus conceived. The following is a brief analysis of these first forms of perspective taking in late infancy and early childhood.

First Forms of Perspective Taking

For humans to interact effectively with each other, they need to keep track of what they have and have not shared with whom in the past: what must be recognized as new for the other versus what can count as "common ground," whether this is a jointly witnessed event, a shared previous activity, or the content of the previous discourse. This often does not require any knowledge of what others know propositionally but rather what they know in the sense that is conveyed by "connaître" in French, "kennen" in German, and "conocer" in Spanish, which is probably best translated as "being familiar" or "acquainted" with something from past experience.

Research has shown that even infants have the ability to determine what others know in this sense of the term. In a study by Tomasello and Haberl (2003), twelve- and eighteen-month-old infants and an adult jointly engaged with two novel objects. A third object was presented to the child but was not seen by the adult, who was absent during this time. When the adult returned and made an excited expression coupled with a request for "that one," children at both ages were able to discern which of the three objects was new for the adult because she had not witnessed it earlier. The infants thus recognized what the adult was and was not familiar with, irrespective of their own familiarity with the objects.

Moll and Tomasello (2007) hypothesized that infants this young solve this task only as long as they share the other person's experiences with the known objects in joint engagement. The one-year-olds' joint attention with the adult around the first two objects allowed them to register the adult as knowing these objects later. The unknown object stuck out because infant and adult had not previously shared it together. To test this hypothesis, the authors varied the specific way in which the adult became familiar with the two known objects. In one condition, modeled on Tomasello and Haberl's experimental condition, the adult shared her experience of the two known objects with the infant in joint engagement. In two other conditions, (a) infants observed the adult examine the two known objects individually instead of in joint engagement, or (b) the adult looked on from afar as the infant and the assistant examined the two familiar objects. The adult then left the room while the assistant presented the third object to the infant. In line with the hypothesis, fourteen-month-olds recognized which object was new for the adult upon her return only when they had shared her experience of the familiar objects. In the other conditions in which the objects were not shared, infants failed to identify what the adult was referring to in her request. (By eighteen months, infants knew what the adult had experienced not just through joint attentional engagement but also by observing the adult actively manipulate the known objects on her own.)

More empirical support for the view that infants come to understand what others experience through joint engagement stems from a study by Moll, Carpenter, and Tomasello (2007). They found that fourteen-month-olds did not pass the test when they witnessed, from a third-person perspective, an adult jointly engaging with the familiar objects with another person. Instead, infants had to share the objects with the adult directly in order to register her as knowing them. Importantly, this finding extends to other research paradigms as well. In the context of joint attentional engagement, infants of fourteen months and older were equally able to (1) select an object that was mutually familiar but had been shared in special ways between infant and adult prior to her making an ambiguous request for "it" (Moll, Richter, Carpenter, & Tomasello, 2008) and (2) perceive an adult's expression of excitement as being directed holistically at an entire object versus a part of an object, depending on whether the object was mutually known from previous interactions (Moll, Koring, Carpenter, & Tomasello, 2006).

Joint engagement is thus at least helpful, if not necessary, for infants at fourteen months to register others as becoming familiar with something. This points out the critical importance of the second person. Infants do not learn about the social world mostly from third persons, from "he's" and "she's" whom they observe dispassionately from the outside. Instead, they learn first and foremost from the "you's" with whom they interact and engage in collaborative activities with joint goals and shared attention. As Heal (2005) puts it, "the basic subjects of psychological predicates will be 'us': viz. you and me" (p. 41).

This is not to say, of course, that infants or young children cannot learn from third parties by observing, eavesdropping, and overhearing. For example, eighteen-month-olds regulate their imitation of actions on an object through observing an emotional interaction between two other people (Repacholi & Meltzoff, 2007; Repacholi, Meltzoff, & Olsen, 2008), and they learn novel words by overhearing what third persons say to each other (Floor & Akhtar, 2006). However, at the beginning (and this may only be a few months prior) learning takes place within the "I-thou" (Buber, 1958) relationship. Joint attention therefore seems to be a key to others' minds. On this account, an understanding of others and their attentional states and perspectives is an achievement that develops out of the experience of sharing objects and events with them. What comes first is the sharing of attention and interest, not the understanding of the others' individual attention, particular perspective, and how it differs from one's own. Just as a deeper understanding of others' actions and goals does not precede infant imitation but has its origin there (Meltzoff, 2005, 2007), an advanced understanding of others' attentional states and perspectives emerges out of joint attention.

Overestimating the Shared Perceptual Space

Interestingly, children's susceptibility for joint attention is so strong that they sometimes take a perceptually shared situation for granted, even when no sharing is taking place. Moll, Carpenter, and Tomasello (2011) conducted a study in which two-year-old children shared two objects with an adult one by one. Children were then presented with a third object, but the adult did not see it until the test phase. What was independently varied in a 2 x 2 design was whether (1) the adult was absent (leaving the room after the second object) or copresent (remaining seated across from the child with a visual barrier blocking her view of the third object) and (2) the adult continued to verbally communicate with the child as the child explored the third object, uttering generic expressions like "Oh, look, nice!" The task for children was to identify which of the three objects was new for the adult when she explicitly requested the "one she has not seen before" (the third object) from them later. The result was that the two-year-olds readily selected the target when the adult was absent and silent during the child's exploration of it. However, they were not able to differentiate between what was old and new for the adult when the adult was copresent at the time they were engaged with the target object, even though the adult's visual access to it was blocked. Children in this situation selected objects randomly, independently of whether the adult additionally communicated to them. Communication alone, with no physical copresence, disrupted the children's knowledge-ignorance distinction less (neither did children's performance in this condition differ significantly from the "ideal" situation of an absent and silent adult, nor did it exceed chance level).

What these results indicate is that young children tend to assume shared perceptual experiences when they are mutually engaged with others: they fail to detect others' ignorance of an object. The fact that the adult's copresence hampered the children's detection of her ignorance is in accordance with the view that physical copresence is the main indicator of an experience's being shared (see also Schiffer, 1972). Especially for young children, the primordial sharing scenario is a face-to-face interaction with another individual in close proximity, which is exactly the situation that was simulated in the conditions involving copresence. It needs to be explored in future research what exactly it takes, over and above the simultaneous presence in the room, for children to presume a shared experience. To be sure, another's mere presence is not sufficient for infants to presuppose that anything they perceive is perceived by the other as well; otherwise, children would never point to or show objects to people in their proximity. Perceptual attention has a postural component: one orients toward the thing one attends to, leans over it, approaches it, and so on, and thus an adult may need to (over and above simply "being around") display some of these behaviors for young children to assume that the other has taken notice of something. We also conjecture that the previous joint attentional episode around the first two objects may have contributed to children's behaving as if the adult became familiar with the third object also. A joint attentional sequence, once begun, may have to be clearly closed by turning away, leaving, or notably commencing an occupation with something else for young children to register its termination.

The fact that an effect was observed when an absent adult communicated to children shows that linguistic interaction can lead to an impression of visual sharedness as well. This is in line with the everyday experience that it takes children some years to figure out that their conversational partner on the telephone does in fact not share their visual space and thus cannot see their gestures or understand demonstrative expressions such as "This one here!" Adults sometimes make similar slips of action, such as when they point to the monitor on their laptop while giving a talk to an audience who sits facing them and thus cannot see what the speaker tries to show them. This overestimation also works in the opposite direction, from visual copresence to an impression of an auditorily shared space, as can be seen when car drivers mumble "Go ahead" or "Thank you" to other traffic members when they drive in their cars with the windows shut, acting as though they could be heard because the other is "right there" in front of them (see Epley, Morewedge, & Keysar, 2004).

What the Moll et al. (2011) experiment shows is that, somewhat counterintuitively, children learn what others have or have not become acquainted with before they come to determine what they can see from their specific viewpoints. When an adult left the room entirely, children later knew that the adult was not familiar with whatever object was presented to them during her absence (knowledge-ignorance distinction). However, when the adult remained copresent and co-oriented toward them, they failed to understand that she did not see what they saw due to a barrier's blocking her vision (visual perspective taking). This developmental order, with a broad distinction of others' familiarity versus ignorance of things being in place before the ability to determine what others can see in the here and now, is remarkable as it turns the view that perception is somehow fundamental or primary on its head. Instead, an understanding of "mere visual perception" needs some time to develop and follows an understanding of richer forms of engagement with something. It seems that children start out with an understanding of "engagement" holistically conceived. They recognize whether a person is or was engaged with an object (one way or another), but they do not, at this early stage in their development, understand the specifics of seeing in contrast to hearing or other forms of perceptual engagement. Over the course of development, this holistic grasp of engagement becomes more differentiated and eventually includes knowledge about the "functioning" of, for example, visual versus auditory perception, the respective enabling and defeating conditions that go with the particular senses, and the role that they play in knowledge formation. This is in accordance with experimental findings on the development of visual perspective taking, the flowering of which lies well after the emergence of an ability to distinguish between what has and has not been shared (namely, between two and three years of age), as the following studies suggest.

Visual Perspective Taking

The main theoretical organizing construct in the developmental literature on visual perspective taking has been Flavell's (1992) distinction between "level 1" and "level 2." A child who has reached level 1 can determine what objects another sees from a certain spatial position or where an object has to be placed in order to hide it from a person's view.

A seminal study was conducted by Masangkay, McCluskey, McIntyre, Sims-Knight, Vaughn, and Flavell (1974). In their experiment, an adult held up a card between herself and the child. The side of the card facing the child contained a picture of one animal, for example, a dog, while the side facing the adult showed a different animal, for example, a cat. The children were previously shown both sides of the card and so knew what each side depicted. Children were then asked what they themselves saw and what the adult saw. Most children at the age of 2.5 years and older could say what they saw and what the adult saw. Other studies have provided converging evidence that level 1 perspective taking develops at around 2.5 years of age, but they also point to some limitations at this age. In a study by Flavell, Shipstead, and Croft (1978), children were asked to hide an object from an adult. Three-year-olds, but not 2.5-year-olds, knew where to place a barrier in order to interrupt an adult's visual perception of an object (see also McGuigan & Doherty, 2002). In a study using a search paradigm, Moll and Tomasello (2006) found that twenty-four-month-olds have a nascent understanding of what others can and cannot see from their viewpoint. An adult pretended to be searching for an object. There were two candidate objects in the room, both of which were well visible to and equidistant from the child's position, but behind (from the child's perspective) one of the objects was an opaque barrier that blocked the adult's view of it. The twenty-four-month-olds selected this object significantly in response to the adult's searching but had no preference for it in a control condition.

The second step in Flavell's developmental model is dubbed level 2 perspective taking and has been characterized as the understanding that people may not only see different things but see things differently. At this level, a child can determine, in philosophical terms, the specific "mode of presentation" (Frege, 1892) in which an object is given. Probably the most widely known level 2 test is Piaget and Inhelder's (1956) three-mountain problem in which children are asked to specify how a doll sees a three-dimensional array from various positions by choosing from among a set of photographs. A test that is more suitable for preschoolers is the so-called "turtle task" (Masangkay et al., 1974). In this task, a child and an adult sit on opposite sides of a table with a picture of a turtle between them. The child, who sees the picture right side up, is asked to say how she herself sees the turtle ("right side up") and how the adult sees it ("upside down"). The results showed that children at 4.5 years and older successfully "decentered" from their perspective and acknowledged that the adult saw the picture in a different orientation. The younger children, however, mostly gave egocentric replies, judging that the adult saw the turtle as they did, that is, right side up.

Numerous studies since then have replicated the finding that children below four to five do not engage in level 2 perspective taking. For example, three-year-olds were no better than in the original turtle test when expressions with distinctive features were used (e.g., "standing on its head" instead of "upside down") or when the test was embedded in an ecologically valid event like book reading, with the book orientated "the right way" or "the wrong way" (Flavell, Everett, Croft, & Flavell, 1981). Studies in which the effects of an observer's distance on the appearance of objects were varied also yielded negative results (Pillow & Flavell, 1986). Moreover, even training three-year-olds by systematically presenting them with the perceptual changes following a change of spatial location was insufficient (Taylor & Hort, 1990). This has led researchers to conceive of level 2 as robust and uniform (see Flavell, 1992, for a review).

However, this view has recently been challenged. Moll and Meltzoff (2011) designed a novel level 2 perspective-taking task with color filters (though color filters have been used before; see Flavell, Flavell, & Green, 1983; Gopnik & Astington, 1988; Taylor & Flavell, 1984). In their first experiment, thirty-six-month-old children were shown two identical-looking blue objects. An adult who stood at some distance, facing the children, saw one of the objects through a yellow color filter and the other through a transparent screen. One object was thus seen in the same color by child and adult (blue), whereas the other looked different to them: blue for the child but green for the adult. The adult then looked straight ahead in the direction of the objects and requested either "the blue one" or "the green one" from the children. Importantly, she did not indicate via gaze direction which of the two objects she was referring to. The thirty-six-month-olds significantly selected the correct object in response to both requests. That is, they chose the object that they and the adult saw blue when blue was requested, but they chose the object that only the adult saw green when green was requested. The children thus readily identified another person's way of perceiving an object, whether this matched their own perception or not. In a subsequent study, children of the same age were also able to produce a certain perception in an adult: they knew on which side of a yellow filter to place a blue object for an adult to see it green, even though it kept looking blue from their own perspective.

The pressing question, then, is how these data can be reconciled with the previous findings. As noted above, there was a very strong empirical confirmation, with many replications, of the original finding that children younger than four to five years of age cannot apprehend that others may see things differently. The new color filter tests, in contrast, indicate that even thirty-six-month-olds have such an understanding.

Confronting Perspectives

To resolve this apparent conflict, we argue that Moll and Meltzoff's (2011) color tests do not involve a particular element shared by the suite of social-cognitive tasks that are typically solved between ages four and five. In one theoretical variant, Perner characterizes the cognitive step taken at the "threshold age" as the nascence of the ability to confront two (or more) perspectives on the selfsame object at the same time (see Perner, Stummer, Sprung, & Doherty, 2002). In the false-belief test, children have to acknowledge that the same object can be thought of as located in the drawer or the cupboard, depending on one's epistemic or doxastic perspective. In appearance-reality tasks, two ways of construing a single object have to be confronted: the same thing can look to be one thing (e.g., a rock) but at the same time really be another (a sponge). Likewise, in Doherty and Perner's (1998) "alternative naming game" children have to understand that one and the same object, for example, a rabbit, can come under two different sortals or conceptual perspectives (bunny and rabbit). Analogously, in the turtle task, children have to confront two visual perspectives on the same object and specify how they and the adult see it. Confronting perspectives thus requires children to explicitly acknowledge a certain way of seeing or conceptualizing an object or situation while simultaneously being aware of another view or construal of the very same thing. If children pass such a test, it shows that they know that the same thing can come under different descriptions or can be seen from various points of view at the same time. In perspective taking, in contrast, children can "get away" with a comprehension of what the other is striving for in his or her actions or referring to in his or her speech acts; this can be accomplished without an awareness of the perspectival differences. We will further elaborate on this below.

In accord with Perner's conceptual framework, we propose that three-year-olds can "take" but not yet "confront" visual perspectives, just as they cannot confront perspectives in other realms. The idea is that the thirty-six-month-olds in Moll and Meltzoff's test succeeded because no confrontation of visual perspectives was necessary. Instead, children just needed to take or adopt a perspective that was already specified in the adult's request. In their first experiment, this was achieved by determining which of two potential objects an adult saw in a certain, specified color. In their second experiment, it was achieved by producing a color perception that was again specified by the adult. However, we predicted that when the test is modified such that it involves a spontaneous specification of the way an object is seen by another person when this differs from how they see it at that time, then three-year-olds will fail, even if the material and basic procedure are kept the same and responses can be given nonverbally.

To test this hypothesis, Moll, Meltzoff, Merzsch, and Tomasello (submitted) modified the color filter test so that it involved a confrontation of perspectives. After it was demonstrated to three- and 4.5-year-old children that blue objects appear green when held behind a yellow color filter, a blue picture was placed directly in front of them. An adult then fixated the same picture through the yellow filter, so that the picture looked green to her while it looked blue to the child. In direct analogy to the turtle task, children were asked how they see the picture (blue) and how the adult sees it (green). They responded by pointing to a blue or green color sample in front of them. As hypothesized, the younger children did not succeed in this version of the task. Most three-year-olds claimed that the adult saw the picture blue; only the 4.5-year-olds correctly judged that the adult saw the picture green. Thus, despite the strong superficial similarities with Moll and Meltzoff's color task, the results were more comparable to the superficially dissimilar turtle and other classic level 2 tests (see Masangkay et al., 1974, and Flavell et al., 1981).

As expected, then, three-year-olds do not yet have a full appreciation of visual perspectives, despite their impressive abilities in responding appropriately in many different perspective-taking scenarios. What remains to be done is an attempt to spell out exactly what the cognitive leap occurring sometime between the ages of four and five consists in. It seems to us that the level 1-level 2 dichotomy offers an insufficient description. Three-year-old children readily engage in some level 2 tests (Moll & Meltzoff, 2011), but many tasks that fall under the level 2 description are not solved until much later. We will again draw upon the distinction between taking and confronting to attempt to describe this development.

Children engage in perspective taking when they make sense of another person's action or utterance, even when it is contingent upon a perspective the children themselves do not have. This may involve determining the goal of another person's action or the referent of his or her request, for example, by identifying which of two things a person is searching for because she cannot see it (when the child does see it) or which of two objects someone refers to as being a certain color (when the child sees it in a different color). An analog in the epistemic domain is when children determine an agent's goal as a function of his or her doxastic attitude, again, when the child's own epistemic state differs. This was accomplished by eighteen-month-olds in Buttelmann, Carpenter, and Tomasello's (2009) study, in which infants saw an agent trying to open a box whose content had previously been removed either surreptitiously in the agent's absence or while he was watching attentively. In the first case, they took the agent's goal to be the dislocated content of the box and so showed him where it was moved; in the latter case, they inferred that he must be going for the box for some other reason and so helped him to get it open.

Children also perspective take when they know how to modify the environment in order to enable or produce a certain, desired perspective in another person, for example, by placing an object in a location where it is no longer visible for an adult (McGuigan & Doherty, 2002) or appears in a different color for her (Moll & Meltzoff, 2011). What is common to all these cases is that the perspective was already named or specified by the adult, either verbally (e.g., by asking for a specific color) or through her actions (e.g., searching, trying to open a box, etc.). Children neither had to predict it nor explicitly contrast it with some other perspective on the selfsame object at the same time. They just had to determine what an agent was aiming for or referring to.

Confronting perspectives, on the other hand, requires something in addition to a mere determination of what another is talking about or trying to accomplish. It demands a judgment about how something is seen or construed by someone (self or other) when an alternative view of the same object (thing, situation, state of affairs, etc.) is saliently available to the child at that same moment. The ability to confront perspectives can only be demonstrated in the form of explicit judgments. One critically important implication is that this ability cannot be captured by looking-time procedures or other implicit tests (at least as so far designed); instead, it makes questions of the kind asked in traditional theory of mind tasks vitally important. Yet the children's judgments need not be verbal but can be made, for example, by pointing in a certain direction to predict where someone will go on the basis of his or her belief or pointing to a swatch to indicate in what color a person sees an object when the children themselves perceive it in a different color. Children who cannot yet confront perspectives will respond with whatever perspective first springs to their mind; which one that is depends partly on the cognitive domain and contextual factors.

It may be helpful to distinguish between "mutually exclusive" and "not mutually exclusive" perspectives. Not mutually exclusive perspectives can be held by one person at the same time, such as conceptual perspectives. Mutually exclusive ones cannot be occupied by a single person at a given time. Perceptual and epistemic perspectives are of this kind because an identical surface cannot look blue and green all over simultaneously, and one cannot know something to be true and have a false belief about it at the same time. For this reason, tasks in these domains are structured in such a way that the child's perspective is contrasted with either the simultaneous perspective of another person (transfer of location task, visual perspective tasks) or with the child's own previous belief about the same state of affairs (transfer of content task). Children in these situations will tend to give "egocentric" or "nunocentric" (focusing on the current knowledge) responses, simply because their own/present point of view is most obvious and salient to them. This is why children in these tests often need not report their own current knowledge because the error predictably occurs when the other's/their past perspective has to be made explicit.¹ In appearance-reality tasks or the alternative naming game, the two perspectives are not mutually exclusive in the sense that the same person can conceptualize an object in alternative ways without any changes occurring in the world (e.g., things being moved to other places) or in the person's visuospatial position. Children who fail to confront perspectives mostly settle on one construal; which one this is depends on the task. In appearance-reality, most three-year-olds go with reality when an object's identity is at stake, stating that a glass of milk held behind a red color filter still looks like milk, not fruit punch. In contrast, when the question is about an object's property such as color, they will stick to phenomenology and say that the (white) liquid behind the red color filter not only looks red but really is red (Flavell, Green, & Flavell, 1986). Tasks using the same material can therefore yield opposing response patterns depending on the way the situation is conceptualized. In the alternative naming game, a puppet chooses to call an object by one name and the child, who was previously familiarized with an alternative label, is supposed to name it differently. Here, children tend to simply repeat what the puppet just said, again because they cannot confront two conceptualizations of the same object (Doherty & Perner, 1998).

Perner's distinction that we adopted preserves the idea that there is an important conceptual change between four and five years of age, and it provides a unitary explanation for the various social-cognitive advancements that children are known to make during this time (see Perner, Brandl, & Garnham, 2003). Whether one is concerned with false belief, appearance-reality, or conflicting perceptions, what allows children to understand these things is their ability to simultaneously confront perspectives on the selfsame object.

Concluding Remarks

The major question that has remained open is how children proceed from one step to another and how they become able to take perspectives in the first place. We believe that taking and confronting perspectives both have their roots in infants' ability to engage in joint attention with others. As we have argued elsewhere (see Moll & Meltzoff, 2011; Tomasello & Moll, 2010), the notion of perspective presupposes a shared, single object onto which the different perspectives converge. If you and I look out the window but you focus on a bus in the street whereas I look at a tree, we simply deal with different objects of perception, not different perspectives. In the months leading up to their first birthdays, infants share attention to things with others but, arguably, do not consider at all that they and the other person perceive the objects in different ways. What is of primary interest is that an object of perception becomes shared; from what particular point of view it is seen is secondary. Through some process, not well known but perhaps involving infants' comprehension of adults' communicative acts designed to draw their attention to particular aspects of the shared referent, infants' sharing of attention is enriched to include various aspects and perspectives on the joint object of attention. However, this is still, of course, just taking perspectives, and confronting them requires something more. Again speculatively, it may be that the contrast of perspectives must be jointly attended to for children to become aware of it and notice the "clash" of perspectives.

In any case, the current results suggest that the classic theoretical distinction between level 1 and level 2 perspective taking introduced by Flavell and colleagues (Flavell, 1977, 1992) is in need of revision. It may seem that once a child can take another person's perspective at level 2, the general understanding that two people can simultaneously see the same thing differently should come "for free." It has been taken for granted that the key challenge lies in the ability to transcend one's own perspective and adopt that of another person. However, our data have shown that children can do this fairly early, long before the classic watershed between four and five years of age. To be sure, a critical cognitive step is taken by children during this time, but it has been misconstrued. What children have come to learn when they first solve the false-belief and similar "theory of mind" tasks is not perspective taking but the explicit acknowledgment that a given object may be seen in alternative ways. They thus have acquired the concept of perspective.

Acknowledgments

This work was supported by a Dilthey-Fellowship from the Volkswagen Foundation awarded to the first author and a grant from NSF (SBE-0354453) and ONR (N000140910097) awarded to the second author.

Notes

1. Another's perceptual perspective can be made more salient to children than their own, which leads to "allocentric" responses (see Moll, Meltzoff, Merzsch, & Tomasello, submitted). Children thus do not generally start out as egocentrists in a narrow sense of the word, projecting their own perception onto others. Instead, egocentric responding is but one manifestation among others of children's inability to really understand perspectives and map the perspectives onto specific people: the children lack an understanding of their own perspective as much as they lack an understanding of that of the other.

References

Barresi, J., & Moore, C. (1993). Sharing a perspective precedes the understanding of that perspective. Behavioral and Brain Sciences, 16, 513-514.

Brooks, R., & Meltzoff, A. N. (2002). The importance of eyes: How infants interpret adult looking behavior. Developmental Psychology, 38, 958-966.

Brooks, R., & Meltzoff, A. N. (2005). The development of gaze following and its relation to language. Developmental Science, 8, 535-543.

Bruner, J. (1995). From joint attention to the meeting of minds: An introduction. In C. Moore & P. J. Dunham (Eds.), Joint attention: Its origins and role in development (pp. 1-14). Hillsdale, NJ: Erlbaum.

Buber, M. (1958). I and thou (R. G. Smith, Trans.). New York: Scribner's.

Butler, S. C., Caron, A. J., & Brooks, R. (2000). Infant understanding of the referential nature of looking. Journal of Cognition and Development, 1, 359-377.

Buttelmann, D., Carpenter, M., & Tomasello, M. (2009). Eighteen-month-old infants show false belief understanding in an active helping paradigm. Cognition, 112, 337-342.

Butterworth, G., & Jarrett, N. (1991). What minds have in common is space: Spatial mechanisms serving joint visual attention in infancy. British Journal of Developmental Psychology, 9, 55-72.

Campbell, J. (2005). Joint attention and common knowledge. In N. Eilan, C. Hoerl, T. McCormack, & J. Roessler (Eds.), Joint attention: Communication and other minds (pp. 287-297). Oxford: Oxford University Press.

Caron, A. J., Kiel, E. J., Dayton, M., & Butler, S. C. (2002). Comprehension of the referential intent of looking and pointing between 12 and 15 months. Journal of Cognition and Development, 3, 445-464.

Carpendale, J., & Lewis, C. (2006). How children develop social understanding. Oxford: Blackwell.

Carpenter, M., Nagell, K., & Tomasello, M. (1998). Social cognition, joint attention, and communicative competence from 9 to 15 months of age. Monographs of the Society for Research in Child Development, 63(4, Serial No. 255).

Corkum, V., & Moore, C. (1995). Development of joint visual attention in infants. In C. Moore & P. J. Dunham (Eds.), Joint attention: Its origins and role in development (pp. 61-83). Hillsdale, NJ: Erlbaum.

Davidson, D. (2001). Subjective, intersubjective, objective. Oxford: Oxford University Press.

Deak, G. O., Flom, R. A., & Pick, A. D. (2000). Effects of gesture and target on 12- and 18-month-olds' joint visual attention to objects in front of or behind them. Developmental Psychology, 36, 511-523.

Doherty, M., & Perner, J. (1998). Metalinguistic awareness and theory of mind: Just two words for the same thing? Cognitive Development, 13, 279-305.

Epley, N., Morewedge, C. K., & Keysar, B. (2004). Perspective taking in children and adults: Equivalent egocentrism but differential correction. Journal of Experimental Social Psychology, 40, 760-768.

Flavell, J. H. (1977). The development of knowledge about visual perception. In C. B. Keasey (Ed.), The Nebraska Symposium on Motivation: Vol. 25. Social cognitive development (pp. 43-76). Lincoln: University of Nebraska Press.

Flavell, J. H. (1992). Perspectives on perspective taking. In H. Beilin & P. B. Pufall (Eds.), The Jean Piaget symposium series: Vol. 14. Piaget's theory: Prospects and possibilities (pp. 107-139). Hillsdale, NJ: Erlbaum.

Flavell, J. H., Everett, B. A., Croft, K., & Flavell, E. R. (1981). Young children's knowledge about visual perception: Further evidence for the level 1-level 2 distinction. Developmental Psychology, 17, 99-103.

Flavell, J. H., Flavell, E. R., & Green, F. L. (1983). Development of the appearance-reality distinction. Cognitive Psychology, 15, 95-120.

Flavell, J. H., Green, F. L., & Flavell, E. R. (1986). Development of knowledge about the appearance-reality distinction. Monographs of the Society for Research in Child Development, 51(1, Serial No. 212).

Flavell, J. H., Shipstead, S. G., & Croft, K. (1978). Young children's knowledge about visual perception: Hiding objects from others. Child Development, 49, 1208-1211.

Floor, P., & Akhtar, N. (2006). Can 18-month-old infants learn words by listening in on conversations? Infancy, 9, 327-339.

Frege, G. (1892). Über Sinn und Bedeutung. Zeitschrift für Philosophie und philosophische Kritik, 100, 25-50.

Gallagher, S., & Hutto, D. D. (2008). Understanding others through primary interaction and narrative practice. In J. Zlatev, T. P. Racine, C. Sinha, & E. Itkonen (Eds.), The shared mind: Perspectives on intersubjectivity (pp. 17-38). Amsterdam: Benjamins.

Gopnik, A., & Astington, J. W. (1988). Children's understanding of representational change and its relation to the understanding of false belief and the appearance-reality distinction. Child Development, 59, 26-37.

Hare, B., & Tomasello, M. (1999). Domestic dogs (Canis familiaris) use human and conspecific social cues to locate hidden food. Journal of Comparative Psychology, 113, 173-177.

Heal, J. (2005). Joint attention and understanding the mind. In N. Eilan, C. Hoerl, T. McCormack, & J. Roessler (Eds.), Joint attention: Communication and other minds (pp. 34-44). Oxford: Oxford University Press.

Kaminski, J., Riedel, J., Call, J., & Tomasello, M. (2005). Domestic goats, Capra hircus, follow gaze direction and use social cues in an object choice task. Animal Behaviour, 69, 11-18.

Kannetzky, F. (2007). What makes cultural heredity unique? On action-types, intentionality, and cooperation in imitation. Mind & Language, 22, 592-623.

Masangkay, Z. S., McCluskey, K. A., McIntyre, C. W., Sims-Knight, J., Vaughn, B. E., & Flavell, J. H. (1974). The early development of inferences about the visual percepts of others. Child Development, 45, 357-366.

McGuigan, N., & Doherty, M. J. (2002). The relation between hiding skill and judgment of eye direction in preschool children. Developmental Psychology, 38, 418-427.

Meltzoff, A. N. (2005). Imitation and other minds: The "like me" hypothesis. In S. Hurley & N. Chater (Eds.), Perspectives on imitation: From neuroscience to social science (Vol. 2, pp. 55-77). Cambridge, MA: MIT Press.

Meltzoff, A. N. (2007). "Like me": A foundation for social cognition. Developmental Science, 10, 126-134.

Meltzoff, A. N., & Brooks, R. (2008). Self-experience as a mechanism for learning about others: A training study in social cognition. Developmental Psychology, 44, 1257-1265.

Moll, H., Carpenter, M., & Tomasello, M. (2007). Fourteen-month-olds know what others experience only in joint engagement. Developmental Science, 10, 826-835.

Moll, H., Carpenter, M., & Tomasello, M. (2011). Social engagement leads 2-year-olds to overestimate others' knowledge. Infancy, 16, 248-265.

Moll, H., Koring, C., Carpenter, M., & Tomasello, M. (2006). Infants determine others' focus of attention by pragmatics and exclusion. Journal of Cognition and Development, 7, 411-430.

Moll, H., & Meltzoff, A. N. (2011). How does it look? Level 2 perspective-taking at 36 months of age. Child Development, 82, 661-673.

Moll, H., Meltzoff, A. N., Merzsch, K., & Tomasello, M. (submitted). Taking versus confronting visual perspectives in preschool children.

Moll, H., Richter, N., Carpenter, M., & Tomasello, M. (2008). Fourteen-month-olds know what "we" have shared in a special way. Infancy, 13, 90-101.

Moll, H., & Tomasello, M. (2004). 12- and 18-month-old infants follow gaze to spaces behind barriers. Developmental Science, 7, F1-F9.

Moll, H., & Tomasello, M. (2006). Level 1 perspective-taking at 24 months of age. British Journal of Developmental Psychology, 24, 603-613.

Moll, H., & Tomasello, M. (2007). How 14- and 18-month-olds know what others have experienced. Developmental Psychology, 43, 309-317.

Moore, C., & Corkum, V. (1994). Social understanding at the end of the first year of life. Developmental Review, 14, 349-372.

Murphy, C. M., & Messer, D. J. (1977). Mothers, infants and pointing: A study of a gesture. In H. R. Schaffer (Ed.), Studies in mother-infant interaction (pp. 325-354). London: Academic Press.

Perner, J. (1991). Understanding the representational mind. Cambridge, MA: MIT Press.

Perner, J., Brandl, J. L., & Garnham, A. (2003). What is a perspective problem? Developmental issues in belief ascription and dual identity. Facta Philosophica, 5, 355-378.

Perner, J., Stummer, S., Sprung, M., & Doherty, M. (2002). Theory of mind finds its Piagetian perspective: Why alternative naming comes with understanding belief. Cognitive Development, 17, 1451-1472.

Piaget, J., & Inhelder, B. (1956). The child's conception of space. London: Routledge.

Pillow, B. H., & Flavell, J. H. (1986). Young children's knowledge about visual perception: Projective size and shape. Child Development, 57, 125-135.

Repacholi, B. M., & Meltzoff, A. N. (2007). Emotional eavesdropping: Infants selectively respond to indirect emotional signals. Child Development, 78, 503-521.

Repacholi, B. M., Meltzoff, A. N., & Olsen, B. (2008). Infants' understanding of the link between visual perception and emotion: "If she can't see me doing it, she won't get angry." Developmental Psychology, 44, 561-574.

Schiffer, S. R. (1972). Meaning. Oxford: Oxford University Press.

Schloegl, C., Kotrschal, K., & Bugnyar, T. (2007). Gaze following in common ravens, Corvus corax: Ontogeny and habituation. Animal Behaviour, 74, 769-778.

Seemann, A. (2007). Joint attention, collective knowledge, and the "we" perspective. Social Epistemology, 21, 217-230.

Taylor, M., & Flavell, J. H. (1984). Seeing and believing: Children's understanding of the distinction between appearance and reality. Child Development, 55, 1710-1720.

Taylor, M., & Hort, B. (1990). Can children be trained in making the distinction between appearance and reality? Cognitive Development, 5, 89-99.

Tomasello, M., & Haberl, K. (2003). Understanding attention: 12- and 18-month-olds know what is new for other persons. Developmental Psychology, 39, 906-912.

Tomasello, M., Hare, B., & Agnetta, B. (1999). Chimpanzees, Pan troglodytes, follow gaze direction geometrically. Animal Behaviour, 58, 769-777.

Tomasello, M., Hare, B., Lehmann, H., & Call, J. (2007). Reliance on head versus eyes in the gaze following of great apes and human infants: The cooperative eye hypothesis. Journal of Human Evolution, 52, 314-320.

Tomasello, M., & Moll, H. (2010). The gap is social: Human shared intentionality and culture. In P. Kappeler & J. Silk (Eds.), Mind the gap: Tracing the origins of human universals (pp. 331-349). Berlin: Springer.