How does skilled conditional reasoning develop? Testing structural intuition in young gifted children

Author: Emily Morson, Northwestern University

Abstract

Several theories claim that a high working memory span is needed for accurate conditional reasoning, placing it out of reach for children under 12. Yet prior research (Wolf & Shigaki 1983) suggests gifted 5-8 year olds may reason at adultlike levels, despite age-typical working memory. However, this research did not test gifted children on the problems which are considered impossible for young children, the ones with uncertain solutions. To extend these findings, we compare gifted and typically developing 5-8 year olds; to explain them, we test a new model of skilled conditional reasoning, in which people solve problems by mostly unconscious analysis of their implicitly learned abstract structure. Thus, gifted children should solve abstract problems like “if there is a blicket there is a dax” as accurately as concrete ones like “if the boots have dots the scarf has stripes.” Working memory and executive function (tapped by a rote task) should not exhaustively explain reasoning; substantial variance should be left over after covarying working memory and rote performance. So, despite claims that “reasoning is little more than working memory,” higher-level cognition exists, and is needed to explain skilled conditional reasoning.

Keywords: reasoning, development, conditionals, working memory, implicit learning, gifted

Conditional (if-then) reasoning is perhaps the most widely used form of deductive reasoning. People use it to make predictions (“if I go to the party, Bob will be there”) imagine counterfactual situations (“if the U.S. hadn’t dropped the atomic bomb on Japan, the U.S. would have had to invade Japan”), evaluate beliefs (“if a person criticizes his country during wartime, he is being unpatriotic”), and even discipline children (“if you clean your room, you can have ice cream after dinner”).

There are four ways to add information to an if-then sentence, which produce four types of problems. Suppose we start with a sentence of the form “if a then b,” e.g. “if it rained, then the grass is wet.” Adding that a is true produces a modus ponens (MP) argument:

“If it rained, then the grass is wet.” (if a then b)

“It rained.” (a)

Most children and adults have no difficulty drawing the correct conclusion: the grass is wet (b follows). One meta-analysis found that nearly 100% of participants correctly answer MP (Schroyens, Schaeken, & D’Ydewalle, 2001), as did another involving 2774 participants (Schroyens & Schaeken, 2003, cited in Ali et al, 2010). Children draw MP inferences in everyday conversation and in propositional contexts with concrete materials as early as preschool (Scholnick & Wing, 1991; Chao & Cheng, 2000).

A modus tollens (MT) argument is somewhat harder:

“If it rained, then the grass is wet.” (if a then b)

“The grass is not wet.” (not b)

The correct conclusion is that it did not rain (not a). Adults have little difficulty with this inference; a meta-analysis of 2774 participants found they achieved 72% correct (Schroyens & Schaeken, 2003, cited in Ali et al, 2010), while in another meta-analysis, accuracy levels reached 91% (Schroyens, Schaeken, & D’Ydewalle, 2001). Preschool children draw MT inferences both in everyday conversation (Scholnick & Wing, 1991) and in propositional contexts with concrete materials and simple instructions (Chao & Cheng, 2000).

Affirmation of the consequent (AC) is as follows:

“If it rained, then the grass is wet.” (if a then b)

“The grass is wet.” (b)

Many people mistakenly conclude that it rained, as if the rule were “if b then a.” In fact, the answer can’t be determined from the given information. The grass could be wet because of a sprinkler, for example. Denial of the antecedent (DA) poses similar problems:

“If it rained, then the grass is wet.” (if a then b)

“It did not rain.” (not a)

Once again, the answer cannot be determined. Denying the antecedent means the relationship proposed in the if-then statement does not apply, so one cannot draw any conclusion about the wetness of the grass.
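The contrast between the two certain forms and the two uncertain ones can be checked mechanically. The sketch below is only an illustration and assumes the standard propositional reading of “if a then b” (false only when a is true and b is false); the names `FORMS`, `is_valid`, and `counterexamples` are mine, not part of any cited work. It lists every combination of truth values for a and b and asks whether the proposed conclusion holds whenever both premises hold.

```python
from itertools import product


def implies(a, b):
    """Material conditional: 'if a then b' fails only when a is true and b is false."""
    return (not a) or b


# Each form pairs a minor premise with the conclusion people tend to draw.
# Both are written as functions of the truth values of a and b.
FORMS = {
    "MP": (lambda a, b: a, lambda a, b: b),          # a, therefore b
    "MT": (lambda a, b: not b, lambda a, b: not a),  # not b, therefore not a
    "AC": (lambda a, b: b, lambda a, b: a),          # b, therefore a
    "DA": (lambda a, b: not a, lambda a, b: not b),  # not a, therefore not b
}


def counterexamples(form):
    """Truth assignments (a, b) where both premises hold but the conclusion fails."""
    minor, conclusion = FORMS[form]
    return [
        (a, b)
        for a, b in product([True, False], repeat=2)
        if implies(a, b) and minor(a, b) and not conclusion(a, b)
    ]


def is_valid(form):
    """A form is valid when no counterexample exists."""
    return not counterexamples(form)


for form in FORMS:
    print(form, is_valid(form), counterexamples(form))
```

Under this reading, MP and MT have no counterexamples, while AC and DA share the same one: a false and b true, which is the “no rain, but the sprinkler wet the grass” case described above.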

Both adults (Verschueren, Schaeken & D’Ydewalle, 2001) and children (Barrouillet, Grosset & Lecas, 2000; Barrouillet & Lecas, 1998, 1999; Chao & Cheng, 2000; Wildman & Fletcher, 1979; Markovits, 2000; Markovits et al, 1996; Roberge, 1970; O’Brien & Shapiro, 1968) often give falsely determinate answers to the uncertain forms (AC and DA). For adults, such answers occur from 23% to 89% of the time for AC and from 17% to 82% of the time for DA (Schroyens, Schaeken, & D’Ydewalle, 2001). Adults also seem to find problems where they know the positive truth of the premise or conclusion (MP and AC) easier than ones where they know only what is false (MT and DA). Perhaps as a result, they tend to give determinate inferences more often for MP than MT, and sometimes more often for AC than DA (Schroyens & Schaeken, 2003, cited in Ali et al, 2010). In short, adults show the accuracy pattern MP > MT > DA >= AC (Klauer, Beller & Hutter, 2010), a result many have taken to indicate that they do not reason according to the rules of formal logic1.

Adults often commit errors because they focus on the real-world content of the problem and answer based on factual knowledge rather than logic (unless cued otherwise, see Dias, Roazzi, & Harris, 2005)2. For this reason, providing real-world counterexamples can improve adults’ and teenagers’ performance on uncertain problems (Overton, Byrnes & O’Brien, 1985), perhaps because it uses the same real-world framework participants already use but introduces information they were not using. Children seem even more prone than adults to rely on factual knowledge, and they do not benefit from real-world counterexamples until about age 10 (Markovits et al., 1996). Without experimental intervention, children do not spontaneously give correct answers to the uncertain problems before age 12 (Markovits et al, 1996).

Recent research has overlooked a puzzling fact about certain inferences: they seem intuitive. Not only do we almost always believe MP inferences (i.e. if p then q; p; therefore q), but they seem self-evident. We obtain this answer with seemingly no deliberation, and the answer feels as if it “has to be true.” Indeed, it is not clear what proof could be more convincing than this intuition itself (Rips, 1983; Rips, 1994). Neither do we know how we make the MP inference; it seems inherent in the meaning of the statements themselves rather than a deduction we ourselves draw. If we were to try to explain our reasoning, we would end up simply repeating the premises; the answer seems to come out of the process of comprehending the statements. Several other inferences seem similarly intuitive, such as certain “or” inferences (e.g., “if p or q, then r; p; therefore r” or “a is b or c; a is not b; therefore a is c”; Rips, 1994). If someone did not draw the correct conclusions, we would assume they did not understand the meaning of “if” or “or.”

Even as complex an inference as transitivity (which requires representing two relations between three entities; Halford, 1998; Halford, 1984) seems intuitive. Individuals who implicitly learned that features A and B covaried and, independently, that features B and C covaried, spontaneously began to expect features A and C to covary, despite having no conscious desire to learn these associations and no knowledge of what they had learned; thus, transitivity can be automatic and effortless (Lewicki, Hill & Czyzewska, 1992).

1 Much of the research on conditional reasoning has debated whether adults are “rational.” Inspired by the results of the Wason Selection Task, now rarely used, a number of researchers argued that adults do not truly reason but instead answer based on heuristics (Woodworth & Sells, 1935), real-world knowledge, or pragmatic schemas (Cheng & Holyoak, 1985). The major theoretical programs today all acknowledge that people have some basic competence and mechanisms for reasoning (e.g., Rips, 1994; Johnson-Laird, Byrne & Schaeken, 1992; Oaksford & Chater, 2001; Evans, 2003), but that their actual performance suffers from interference by conversational implicatures (Verschueren, Schroyens, Schaeken & D’Ydewalle, 2001), real-world knowledge (Markovits et al., 1998), and limitations in long-term memory retrieval and working memory span (Markovits & Quinn, 2002; Johnson-Laird, 2001; Markovits, 2002).

2 The phenomenon of “belief bias” provides further evidence that, contrary to standard logic, the content of a problem can be more important for adults than the logical form (e.g., De Neys, 2006). “Belief bias” occurs when the logically correct answer conflicts with a real-world truth, e.g. “If it rained, the grass is dry. It rained. Conclusion: the grass is wet.” The logically correct answer is that the grass is dry, but many adults mistakenly conclude that it is wet because they answer based on real-world knowledge rather than logical form.

These intuitive solutions seem to share certain characteristics: the processing involved is inaccessible to consciousness; it occurs quickly, automatically and seemingly effortlessly; it seems tied to a particular sort of problem (e.g., MP) rather than the content it represents; and the solution feels as if it “has to be true.” Most importantly, they involve structure. People must recognize that they are dealing with, say, an MP problem and retrieve the appropriate answer. How do they do it? The key is that all MP problems share a common logical structure, and usually a common sentence structure as well:

If a, then b.

a.

The same could be said for “or” statements or transitivity. Furthermore, we instantly recognize the answer to an MP problem whether it concerns our friends’ behavior (“if Kate learns a secret, she will repeat it to everyone”), arbitrary associations (“if the boots have polka dots, the scarf has stripes”) or even nonsense words (“if there is a wug, there is a dax”). Thus, we rely on deep logical structure (cued by words such as “if” and “then”) rather than surface features like problem content. Moreover, we learn how to abstract this structure at an early age; three and four year olds draw appropriate MP inferences (Scholnick & Wing, 1991; Chao & Cheng, 2000). Because of the importance of structure and the intuitiveness of the solutions produced, let us call the cognitive process under discussion “structural intuition.”

Because MP problems always have the same logical and linguistic structure (“if a then b; a”) and the same answer (the second clause, b), these problems could theoretically be solved by a simple pattern-recognition process well within the reach of young children, in which the pattern automatically activates the associated answer. Because the association between the structure of the problem and the answer is so automatic, as soon as we see the problem, we can retrieve the answer, thus producing the union between interpretation and solution so often found for MP. Furthermore, the answer seems obvious because it constitutes part of the long term memory structures we use in comprehension.

Thus, while structural intuition and insight both involve a sudden awareness of the solution and a feeling that the answer must be true, they possess these characteristics for different reasons. In insight, people solve a problem by reconceptualizing it, which generally takes time and effort and often occurs after an impasse (Beeman et al, 2004; Subramaniam et al, 2009). The information needed to reconceptualize the problem is activated, but too weakly to immediately enter consciousness. The answer suddenly bursts into consciousness either when people stop focusing on the more strongly-activated dominant interpretation or their attention becomes diffuse enough to pick up on the weakly activated answer (Subramaniam et al, 2009). By contrast, reasoners never need to reinterpret problems like MP. Their intuition consists of interpreting and answering the problem together and immediately. In structural intuition, the answer appears quickly and holistically bundled with the meaning of the conditional sentence because during the comprehension of the conditional, the reasoner instantly recognizes the structure and retrieves the answer that accompanies that structure. The answer feels certain because of its deep establishment in long term memory and its automatic, intimate connection with the interpretive process.

More problems have consistent structure than the few that everyone finds intuitive. For instance, MT problems also have a consistent structure, regardless of content:

If it rained, then the sidewalk is wet.

The sidewalk is not wet.

_________________________

Therefore, it did not rain.

If blicket is true, then dax is true.

Dax is not true.

_________________________

Therefore, blicket is not true.

When we see the if-then sentence and a denial of the second clause, regardless of content, we can automatically deny the first clause as well. The uncertain problems also have consistent structure, although in this case the normative answer to retrieve is “no certain conclusion.” As with MP, any conditional could be solved through simple pattern recognition and long-term memory retrieval.
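To make the pattern-recognition idea concrete, here is a deliberately simple sketch. It is not a claim about how the mind implements structural intuition, and everything in it (the `ANSWERS` table, the `classify` and `solve` functions, the convention of writing negation as a leading “not ”) is a hypothetical simplification of mine. The point is only that once the structure is recognized, the answer can be retrieved from a fixed table without consulting the content at all.

```python
import re

# Hypothetical lookup table: recognized structure -> stored answer template.
ANSWERS = {
    "MP": "{b}",
    "MT": "not {a}",
    "AC": "no certain conclusion",
    "DA": "no certain conclusion",
}

# Matches "if A then B" or "if A, then B"; A and B can be any content.
CONDITIONAL = re.compile(r"^if (?P<a>.+?),? then (?P<b>.+)$", re.IGNORECASE)


def classify(major, minor):
    """Return (form, a, b) for a conditional and a minor premise.

    Assumes negation is written as a leading 'not ' (a simplification).
    """
    match = CONDITIONAL.match(major.strip().rstrip("."))
    if match is None:
        raise ValueError("expected a sentence of the form 'if A then B'")
    a, b = match["a"].strip(), match["b"].strip()
    minor = minor.strip().rstrip(".")
    forms = {a: "MP", f"not {b}": "MT", b: "AC", f"not {a}": "DA"}
    if minor not in forms:
        raise ValueError("minor premise does not match either clause")
    return forms[minor], a, b


def solve(major, minor):
    """Retrieve the stored answer for the recognized structure."""
    form, a, b = classify(major, minor)
    return ANSWERS[form].format(a=a, b=b)


print(solve("if blicket then dax", "not dax"))
print(solve("if rain, then wet grass", "wet grass"))
```

The same four table entries handle rain, boots and scarves, or nonsense words, which is the sense in which the process depends on structure rather than content.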

Indeed, in my informal observations, skilled adult reasoners find a broader range of deductions intuitive, sometimes including MT and the uncertain forms. I suggest that one measure of a reasoner’s skill might be the range of problems for which they experience intuition and feelings of certainty. Thus, structural intuition can apply to different problems for different individuals. It may also apply to different problems at different times for the same individual. That is, a child might develop structural intuition for MP and the other basic inferences by preschool, for MT around age 10, and for AC and DA in high school. In this respect, my claims concur with those of Falmagne (1990), who believes that children abstract the logical structure of specific inferences by encountering instances and receiving feedback either from other speakers or from real-world events, and Braine (1990), who suggests that they learn the meaning of “if” and modus ponens from contingencies between if-then sentences and real-world events, e.g. “If you look in the box, you’ll find your toy.” In keeping with these hypotheses, I suggest that structural intuition is learned, with learning occurring at different rates for different problems. In other words, just as expertise is domain-specific, structural intuition is problem-specific.

How do people learn the association between the structure of conditionals and their answer? Since children receive no explicit teaching of conditional reasoning until at least middle school, they probably learn it implicitly.

Most researchers define implicit learning as acquiring complex, abstract information without a conscious decision to learn it and without complete verbalizable knowledge of what has been learned, or often, that anything has been learned at all (Seger, 1994; Lewicki, Hill & Czyzewska, 1992; Reber, 1967; Reber, 1989; Reber & Lewis, 1977). Indeed, even college students given unlimited time and a large monetary reward cannot detect the patterns they have implicitly learned (Lewicki, Hill & Czyzewska, 1992). Attempts to figure out the rules behind a task seem to interfere with both implicit learning and later performance (e.g., Reber, 1989). Implicitly learned information lasts for weeks and even years, even after any concomitant explicit knowledge has faded (Allen & Reber, 1980; Meulemans, Van der Linden, & Perruchet, 1998), and persists in amnesic patients who can no longer form explicit memories (Seger, 1994). The knowledge abstracted ranges from highly concrete and procedural (e.g., more efficient visual search; Meulemans, Van der Linden, & Perruchet, 1998) to highly abstract (representations of semantic categories; Goschke & Bolte, 2007; Lewicki, Hill & Czyzewska, 1992; DeCaro, Thomas & Beilock, 2008; and artificial grammar structure; Reber, 1967; Reber & Lewis, 1977; Reber, 1989).

Most importantly for our purposes, implicit learning allows children and adults to induce formal structures too complex for them to consciously comprehend. For instance, in an unpublished study, Czyzewska and colleagues found that 4-5 year olds could easily and unconsciously acquire interactions between variables that they lacked the cognitive resources to express verbally (cited in Lewicki, Hill & Czyzewska, 1992).

Implicit learning also applies well to children because it functions in the absence of intentional, strategic learning, and young children appear to lack both learning strategies and a motivation to learn conditional reasoning.

Some researchers would disagree that conditional reasoning could be implicitly learned, claiming that for tasks that can be optimally solved using a verbalizable, logical rule, people instead use explicit hypothesis testing (DeCaro, Thomas & Beilock, 2008). Some studies directly comparing explicit and implicit learning methods for learning rules versus complex feature integration found that the explicit methods led to more effective learning of the rules (DeCaro, Thomas, & Beilock, 2008; Mathews et al, 1989). Yet we already know a case where children can implicitly learn information expressible as explicit rules: grammar learning (Gomez & Gerken 2000, 1999).3 Children absorb grammatical rules easily in early childhood, but comprehend them only painstakingly when explicitly taught in school. Logical rules may function the same way4.

One might resolve the contradiction by dividing tasks up not based on whether they require rules, but on whether explicit processing lies within subjects’ capabilities. Studies of tasks that find an explicit advantage for rules often use adult subjects capable of representing the rules in question; grammar learning uses less verbalizable rules. Whether a task falls into the “representable” or “nonrepresentable” categories, and thus requires explicit or implicit processing, varies depending on the age and working memory span of the participants. For children younger than eight, who due to working memory limitations can process two relations only with difficulty (Markovits, 2002; Andrews & Halford, 1998), the uncertain conditionals (AC & DA) may well fall into the “nonrepresentable” category that requires implicit processing.

Can children really learn somewhat complex inferences like MT, AC, and DA simply by encountering them in their everyday environments? A set of experiments by Falmagne (1990) suggests that they can. When she provided examples of MT along with feedback, 8-11 year olds were able to acquire the form and generalize it to new problems. More impressively, with the same method, 10-11 year olds learned to give “can’t tell” responses to AC problems. Falmagne argues that these children abstracted the form of the inference. It remains to be seen whether these findings generalize to younger children and more naturalistic learning situations; however, they constitute a “proof of concept” that the implicit learning required for structural intuition is possible.

Structural intuition presents a new account of conditional reasoning development directed at explaining why some reasoning problems are so often easy and intuitive, a fact that recent theories, focused on explaining reasoning errors, seem to overlook completely. It may also explain some puzzling findings that pose problems for other theories. Specifically, these theories claim that accurate reasoning can only occur with a high working memory span, but young gifted children may reason at an adult level without possessing anything like the working memory of a teenager or adult. I will present this test case and then explain why structural intuition resolves the paradox.

3 It remains controversial whether children learn the grammar of their native language or whether it comes hardwired as a module, although some evidence suggests that they do learn it. However, infants can implicitly learn artificial grammars (Gomez & Gerken, 2000, 1999), and implicit learning may play a role in non-native language learning (Hulstijn, 2005).

4 By “mental rules,” I do not mean that children necessarily represent logical structures in the form of propositions. However children represent them, logical structures naturally lend themselves to description as “rules,” because a) they have patterns easily expressed in formalisms like “if p then q; p; therefore q,” and b) like production rules, they link a stimulus with a specific response.

Test Case: Conditional reasoning in young gifted children

Preliminary evidence suggests the existence of children who reason at adult levels despite age-appropriate working memory. I will first examine the evidence that gifted children reason at such high levels, and then explain why this poses problems for several prominent theories of conditional reasoning.

Gifted children, as defined by the No Child Left Behind Act, must “give evidence of high achievement capability in areas such as intellectual, creative, artistic, or leadership capacity, or in specific academic fields,” and require “services or activities not ordinarily provided by the school in order to fully develop these capabilities.”5 Most states use similar definitions. Common definitions include the top 3% (Lohman, 2009) or IQ above 130 (Hollingworth, 1938). Gifted children also share a number of cognitive characteristics.

Researchers have claimed gifted children demonstrate an unusual degree of insight (e.g., Sternberg & Davidson, 1984), although their definition of insight differs from the mainstream of insight research (e.g., Beeman et al, 2004, Subramaniam et al, 2009). Gifted children outperform peers at representing problems and developing elaborate solution strategies (review cited in Barfurth et al, 2009). They also excel at sifting relevant information from large amounts of irrelevant data, relating new information to existing knowledge, and synthesizing seemingly isolated pieces of information into a new and useful idea (Davidson & Sternberg, 1984). In these characteristics, gifted children resemble adult experts (Barfurth et al, 2009), and indeed, some have speculated that giftedness is essentially expertise at a younger than usual age (Barfurth et al, 2009). Expertise and insight both resemble structural intuition; however, it remains unclear whether the particular characteristics of gifted children relate directly to structural intuition. The tasks used in these studies were complex problems involving meaningful content, to which children could apply real-world knowledge, making it still more difficult to generalize from them to a conditional reasoning context.

Wolf and Shigaki (1983) tested 160 gifted children, ages 4-11, on an array of increasingly complicated deductive reasoning problems, with the least difficult being MP (if a then b; a) and MT (if a then b; not b).

Children as young as 5 got 73.3% of MP and 56.7% of MT syllogisms correct. By age 7 they answered 98.3% of MP and 90.0% of MT syllogisms correctly. Thus, they performed comparably to adults, nearly 100% of whom answer MP conditionals and up to 91% of whom answer MT conditionals correctly (Schroyens, Schaeken, & D’Ydewalle, 2001). The children in this study all scored above 130 IQ, with an average IQ of 142.4 (SD 11.1), on either the Stanford-Binet or the Wechsler Intelligence Scale for Children.

5 U.S. laws defining giftedness have followed a similar definition since at least 1972. The No Child Left Behind Act is simply the most recent example.

Critically, this study did not include the uncertain forms, AC and DA, which are the ones young children supposedly lack the working memory to solve. Thus, it remains unclear whether gifted children would also perform comparably to adults on the uncertain forms. I believe they may, for two reasons. First, gifted children become skilled reasoners by adolescence and adulthood (Means & Voss, 1996). Second, precocious abstract reasoning ability has been observed from the preschool years on in highly gifted children, although these reports are unfortunately anecdotal (Lovecky, 1994). Young gifted children excel at discovering logical flaws, using analogies to solve problems, and grasping the main point of an idea. For instance, one child, from ages 2 to 4, learned 11 different languages to find out whether there had been a parent language (Feldman, 1986, cited in Lovecky, 1994). Compared to this level of abstract conceptualization, recognizing the structure of MP, MT, AC or DA problems should be trivial.

One might argue that gifted children achieve such high performance because they have the working memory span of a child twelve or older who can solve such problems. Yet research suggests otherwise. A study of 456 gifted 3rd graders’ performance on the WISC-R found that 54.4% scored within the average range on digit span, and many of the 7.9% who scored below average did so on digit span (Wilkinson, 1993). The gifted group used to norm the WISC-IV test scored only 112.5 on the Working Memory subtest, within 1 standard deviation of the mean (Wechsler, 2003). Using the same test, the Gifted Development Center found that gifted children who scored 131.7 on Verbal Comprehension and 126.4 on Perceptual Reasoning scored just 1 standard deviation above the mean on working memory (117.7) (Silverman, Gilman & Falk, 2004). Thus, while gifted children may earn higher than average working memory scores, they do not appear to have the working memory of a child 4-7 years older than they.
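To put the reported scores on a common footing, they can be converted to standard-deviation units. The short sketch below assumes the usual scaling of Wechsler index scores (mean 100, standard deviation 15); the `z_score` helper and the dictionary labels are mine, while the scores themselves are the ones quoted above.

```python
# Assumption: Wechsler index scores are scaled to a mean of 100 and an SD of 15.
MEAN, SD = 100.0, 15.0


def z_score(score):
    """Distance of a score from the population mean, in standard deviations."""
    return (score - MEAN) / SD


# Scores quoted in the text (Wechsler, 2003; Silverman, Gilman & Falk, 2004).
reported_scores = {
    "WISC-IV gifted norm group, Working Memory": 112.5,
    "Gifted Development Center, Working Memory": 117.7,
    "Gifted Development Center, Verbal Comprehension": 131.7,
    "Gifted Development Center, Perceptual Reasoning": 126.4,
}

for label, score in reported_scores.items():
    print(f"{label}: {z_score(score):+.2f} SD")
```

On this scale the two working memory scores sit roughly 0.8 and 1.2 standard deviations above the mean, while the verbal comprehension score is a little over 2 standard deviations above it.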

Working memory relies on the late-developing prefrontal cortex, particularly the dorsolateral prefrontal cortex (Braver et al, 1997), which is involved in maintaining information during a delay (Fuster & Alexander, 1971). In early childhood, children with 121-149 IQ have a relatively thinner frontal cortex (Shaw et al, 2006). They only catch up later with a rapid increase in cortical thickness peaking at about age 11, while their peers with average IQ (83-108) show either a slight increase or a decline. Thus, 5-8 year old gifted children may actually have fewer neural resources to devote to working memory.

It may seem strange that gifted children do not exhibit superior working memory spans, given that working memory and intelligence correlate between .41 and .91 in adults (Kane, Hambrick & Conway, 2005; Conway, Kane & Engle, 2003; Kyllonen & Christal, 1990) and .50 and .82 in children (De Abreu, Conway & Gathercole, 2010; De Jong & Das-Smaal, 1995; Miller & Vernon, 1996; Fry & Hale, 1996). However, because children with over 130 IQ make up only 2% of the population, these studies probably did not sample enough gifted children to draw any conclusions about them. As a small, atypical group, gifted people could show a lower correlation than average without significantly affecting the population as a whole.

Working memory and intelligence might decouple in the gifted for several reasons. First, statistical likelihood may play a role. A person is more likely to score 2 standard deviations above the mean on one ability than on several at once.6 Thus, a child would be more likely to have an overall IQ 2 standard deviations above the mean than both IQ and working memory at this level. In support of this conclusion, similar large differences occur between verbal and performance IQ in the gifted population, increasing with overall IQ (Sweetland, Reina & Tatti, 2006).
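The likelihood argument can be made concrete with a small calculation. The sketch below is an illustration under an assumption the text does not state: that IQ and working memory are jointly normally distributed with some correlation r. The function names are mine. It computes the probability that both abilities exceed the same cutoff by integrating numerically over one variable, using the fact that, given X = x, the other variable is normal with mean r·x and variance 1 − r².

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def upper_tail(z):
    """P(Z > z) for a single standard normal ability score."""
    return 1.0 - STD_NORMAL.cdf(z)


def joint_upper_tail(z, r, steps=20000, upper=10.0):
    """P(X > z and Y > z) for two standard normal scores with correlation r.

    Assumes bivariate normality (an illustrative assumption) and |r| < 1.
    Integrates P(Y > z | X = x) * pdf(x) over x in (z, upper) by the midpoint rule.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    width = (upper - z) / steps
    scale = math.sqrt(1.0 - r * r)  # SD of Y given X
    total = 0.0
    for i in range(steps):
        x = z + (i + 0.5) * width
        p_y_given_x = 1.0 - STD_NORMAL.cdf((z - r * x) / scale)
        total += STD_NORMAL.pdf(x) * p_y_given_x * width
    return total


print(f"one ability above +2 SD: {upper_tail(2.0):.4f}")
for r in (0.0, 0.5, 0.7, 0.9):
    print(f"both above +2 SD, r = {r}: {joint_upper_tail(2.0, r):.5f}")
```

About 2.3% of a normal population scores 2 standard deviations above the mean on any single ability. The probability of clearing that bar on two abilities at once is smaller for any correlation below 1, and it shrinks as the correlation falls.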

6 This statistical likelihood argument assumes that working memory and fluid intelligence are indeed separate abilities. The extremely high correlations between them suggest a strong relationship, but no one doubts that they are distinct abilities.

Second, variability may play a role. Some gifted children have ADHD (Antshel et al, 2007), which involves reduced working memory scores relative to overall IQ (Martinussen, Hayden, Hogg-Johnson & Tannock, 2005). Meanwhile, others, identified for their “mature” behavior and high achievement, might possess unusually well-developed executive functions, including working memory. A population consisting of some members with gifted and others with average working memory scores could average out to roughly 1 standard deviation above the mean, about the level actually observed in the gifted population (Silverman, Gilman & Falk, 2004).

Neuroscientific evidence suggests a third possibility. Some have argued that extreme intelligence is characterized by a thinner, later-developing prefrontal cortex. Functionally, these children trade lower levels of early executive functioning for greater plasticity and cognitive flexibility—thus, greater ability to learn (Schill, Ramscar & Chrysikou, 2009). Some have even argued that such delayed executive function may enable extreme intelligence (Ramscar & Gitcho, 2007; Schill, Ramscar & Chrysikou, 2009). For our purposes, it is sufficient to note that gifted children score only about 1 standard deviation above their peers in working memory, and the prefrontal substrate underlying their working memory performance is actually less mature than their peers’. Thus, in the present study, we expect gifted children will show roughly age-appropriate working memory scores. In statistical terms, any working memory advantage they possess should not contribute significantly to their improved reasoning performance.

A large body of evidence indicates the involvement of working memory in reasoning. Working memory is the brain’s RAM, the amount of data it can hold onto and manipulate at any given time. Its limited capacity varies among individuals and increases throughout childhood (Best, Miller & Jones, 2009; Gathercole, Pickering, Ambridge & Wearing, 2004; Dempster, 1981). Working memory is not a single ability but the product of several independent processes that vary in prominence at different ages yet all produce the same effect. These processes include processing speed, speech rate, inhibition of interference from irrelevant thoughts, and strategies such as rehearsal and chunking. It is still debated whether children get more slots or pack more information into each slot over development. Either way, children gain more “mental workspace” and so can process more information at a time. For our purposes, all that matters is how much “space” they can use at what age.

Working memory appears to involve a phonological loop, which maintains and manipulates verbal representations through means such as rehearsal, and a visual-spatial scratch pad (VSSP), which works with visual or spatial representations (Baddeley, 1974). A representationally-neutral “central executive” forms a third, dominant component of working memory (Baddeley, 1992).

Many studies find moderate correlations between working memory and conditional reasoning accuracy. These range from .17 to .74 depending on study methodology, with the majority between .30 and .50 (Barrouillet & Lecas, 1999; Markovits, Doyon & Simoneau, 2002; Handley et al, 2004; Bacon et al., 2007; Capon, Handley & Dennis, 2003; Handley et al, 2002).

Other evidence for the involvement of working memory in conditional reasoning comes from studies that require participants to reason while simultaneously performing a simple task that loads some component of working memory. Interference of one task on another (particularly interference of the working memory task on the reasoning task) indicates that the tasks share processing resources. In other words, the reasoning task taps the same working memory resources as the interfering one. Using this procedure, Toms and colleagues (1993) found that conditional syllogistic reasoning resisted interference from visuospatial secondary tasks, was mildly disrupted by concurrent articulation (a phonological secondary task), and suffered dramatically under tasks presumed to load the central executive. Thus, verbal working memory seems to play a minor and executive working memory a crucial role in conditional reasoning. A later study by Klauer (1997) confirmed these results. In general, visual-spatial working memory only seems to affect reasoning when the premises themselves involve spatial content (Duyck, Vandierendonck, & DeVooght, 2003), or when individuals deliberately use a spatial strategy such as Venn diagrams (Bacon, Dennis & Newstead, 2007).

More and less competent reasoners appear to differ in the working memory components on which they rely. For less competent reasoners, visual-spatial working memory marginally correlates with performance on concrete problems (e.g., “if it rained then the grass is wet”), but not abstract problems (e.g., “If the malou falls, then it will be dredon”) (Markovits, Doyon & Simoneau, 2002). By contrast, for more competent reasoners, verbal working memory span correlates with both concrete and abstract problems (Markovits, Doyon & Simoneau, 2002). Thus, high verbal working memory helps one solve both concrete and abstract problems, while high visual-spatial working memory only aids in solving concrete ones. Additionally, the pattern of correlations suggests that less competent reasoners rely on visual-spatial working memory while more competent reasoners use verbal working memory. Thus, while working memory correlates moderately with performance regardless of skill, people who use verbal representations tend to reason more successfully.

In addition to these correlations with overall performance, working memory may cause individual differences in interpreting or answering if-then questions. Barrouillet and Lecas (1999) found that adults and children with low spans interpreted “if a then b” as either conjunctive (a and b), or biconditional (a and b, or else not a and not b). By contrast, those with high spans favored the correct conditional interpretation (Barrouillet and Lecas, 1999). Verscheuren and colleagues (2005) found that participants with higher working memory capacity used more counterexamples than those with lower capacity. Retrieval of counterexamples can increase accuracy on indeterminate problems (Overton, Byrnes & O’Brien, 1985).

One can draw two conclusions about the role of working memory in conditional reasoning from this data, a weak one and a strong one. The weak conclusion is simply that working memory facilitates reasoning (Rips, 1983), as it does for most complex tasks (Noel, 2009). However, that need not imply that individual differences in working memory alone explain why some adults reason better than others, let alone why children fail to solve uncertain problems before age 12. Two of the major theories of conditional reasoning7, however, make a stronger claim: that because accurate reasoning can only occur through a slow, laborious process, it cannot occur at all without a large working memory capacity.

7 I have necessarily left out important research on conditional reasoning. Conditional reasoning research consists of an unresolved debate between four types of theories: mental logic (e.g., Rips 1994; Rips, 1983; Braine & O’Brien, 1998), mental models (e.g., Johnson-Laird & Byrne, 2002; Johnson-Laird, Byrne & Schaeken, 1992; Markovits, 2002; Barrouillet & Lecas, 1998), probabilistic (e.g., Oaksford & Chater, 2001; Oaksford & Chater, 2003) and dual-process (e.g., Evans, 2003; Klauer, Beller & Hutter, 2010; Klaczynski & Daniel, 2005; De Neys, 2006; Verscheuren, Schaeken, & D’Ydewalle, 2005). I do not discuss probabilistic and mental logic theories here, because they do not make strong claims about working memory. I focus on mental model theory in particular not only because of its arguments about working memory, but because mental model theorists have conducted much of the recent developmental research (e.g., Markovits et al, 1996; Barrouillet, Groset & Lecas, 2000; Markovits, 2000; Barrouillet & Lecas, 1999; Markovits, Fleury, Quinn, & Venet, 1998).

In dual process theories, human cognition involves two systems. System 1, composed of more ancient brain regions, computes information quickly, unconsciously, and largely automatically, with little working memory load (Evans, 2003). System 2, possibly unique to humans, makes up for its slow processing speed and strict working memory constraints with flexibility and the ability to monitor and correct one’s own thought processes (Evans, 2003). In conditional reasoning, System 1 activates prior knowledge and heuristics (Evans, 2003). None of these processes lead to very accurate solutions. Researchers disagree on what process System 2 conducts. It may implement mental models (Klauer, Beller & Hutter, 2010), analyze the logical form of the conditional (Klauer, Beller & Hutter, 2010), or retrieve counterexamples from long-term memory (Verscheuren, Schaeken & D’Ydewalle, 2005a, 2005b; Markovits & Barrouillet, 2002). Regardless, only this slow, working memory-constrained system is supposed to allow for accurate reasoning. Indeed, some researchers believe that only this system can engage in any kind of abstract, hypothetical thought (Evans, 2003). In practice, people use a combination of System 1 and System 2 processes in reasoning. In some theories, System 2 kicks in after System 1 and acts to “check one’s work.” In others, System 2 must actively inhibit System 1 in order for accurate reasoning to occur (De Neys, 2006). Thus, even when both systems act in tandem, reasoning can be fast, with a low cognitive load, or accurate, but not both. A child with accurate reasoning (a characteristic of System 2) but age-appropriate memory capacity (characteristic of thinkers who rely on System 1) would pose problems for dual process theories. Theoretically, intelligence could allow a child to perform well despite working memory limitations, but that would assume that intelligence does not depend on working memory, exactly the opposite of dual process theory’s claims. Although studies have found that processing characteristic of System 2 is linked to a child’s intelligence (Evans, 2003), theorists cannot invoke intelligence as an explanation so long as they claim that abstract, intelligent thought can only occur in working memory-intensive System 2.

Mental model theory has two versions: the original formulation by Johnson-Laird and a later, developmental reimagining by Markovits, Barrouillet and their colleagues (e.g., Markovits, 2002). In both, reasoning depends on constructing mental simulations (mental models) of the possibilities compatible with the premises, and drawing conclusions based on the full set of models8. People build these mental simulations by constructing tokens representing the entities in the premises. Suppose people hear the conditional “If there is a circle, then there is a triangle.” As part of the sentence comprehension process, they imagine the following:

○ △ (a circle and a triangle)

…

Where the “…” conveys the knowledge that the meaning of “if” can be further unpacked explicitly if needed.

8 According to Johnson-Laird, on a computational level, participants construct a truth table, which is much bulkier than logical rules but equally formal. On a psychological level, participants fill in the cells of the truth table by creating tokens linked by meaningful relationships. One weakness of Johnson-Laird’s theory is its need to link these two levels, especially given that its main appeal is its account of the psychological level rather than its formal underpinning. Perhaps for this reason, Markovits & Barrouillet discuss only the psychological level of mental models.

There are two interpretations of “if-then” statements, which differ in the number of models they contain when fully unpacked. A conditional interpretation of “if there is a circle, there is a triangle” would look like this:

○ △ (a circle and a triangle)

¬○ ¬△ (no circle and no triangle)

¬○ △ (no circle, but a triangle)

The (logically incorrect but common) biconditional interpretation would be:

○ △ (a circle and a triangle)

¬○ ¬△ (no circle and no triangle)

The crucial aspect of mental models is that answers do not need to be deduced, but simply “read off” of the models created. The real work of reasoning thus comes from the model generation process; deduction itself does not exist. Participants will draw a conclusion if it appears in all models, decide it is false if it appears in none, and decide it may or may not follow if it occurs in only some models.

Participants can correctly answer MP from their initial model, simply by noting that it contains a triangle. They cannot do so accurately for MT. Knowing that there is not a triangle eliminates the first model, leaving nothing explicit to reason from. Hence, it would appear that nothing follows, and subjects sometimes answer in exactly this way (Johnson-Laird, Byrne & Schaeken, 1992). Once they flesh out the models, participants can answer correctly with either a conditional or biconditional interpretation. Either way, the lack of a triangle eliminates the models containing triangles, leaving behind only one model, ¬○ ¬△ (no circle and no triangle), which permits the correct conclusion that there is no circle (Johnson-Laird, 1992).

If participants build only two models, they will incorrectly provide certain answers to AC and DA. For AC, they will find the model with a triangle, and read off the presence of a circle. For DA, they will identify the model with no circle and note the lack of a triangle. If they create a third model, however, there will be two models with a triangle and two with no circle. Each pair of models enables conflicting conclusions about the other shapes, leading to the correct answer that nothing follows.
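The read-off procedure described above can be illustrated with a minimal Python sketch. It is not an implementation from the mental model literature: the encoding of each model as a (circle, triangle) pair of booleans, and the names `CONDITIONAL`, `BICONDITIONAL`, `FORMS`, and `read_off`, are assumptions introduced here for illustration only.

```python
# Each model is a (circle, triangle) pair of booleans (hypothetical encoding).
CONDITIONAL = [(True, True), (False, False), (False, True)]  # three models
BICONDITIONAL = [(True, True), (False, False)]               # two models

# Minor premise of each inference form: (index of the known shape, its value).
# Index 0 is the circle (antecedent); index 1 is the triangle (consequent).
FORMS = {
    "MP": (0, True),   # there is a circle
    "MT": (1, False),  # there is no triangle
    "AC": (1, True),   # there is a triangle
    "DA": (0, False),  # there is no circle
}


def read_off(models, form):
    """Keep the models compatible with the minor premise, then read off
    what they agree on about the other shape (if anything)."""
    known, value = FORMS[form]
    other = 1 - known
    remaining = [m for m in models if m[known] == value]
    values = {m[other] for m in remaining}
    if len(values) != 1:
        # Either the surviving models conflict or no model survives.
        return "nothing follows"
    name = ("circle", "triangle")[other]
    return f"there is a {name}" if values.pop() else f"there is no {name}"


for label, models in (("two models", BICONDITIONAL), ("three models", CONDITIONAL)):
    for form in FORMS:
        print(label, form, "->", read_off(models, form))
```

Under these assumptions, the two-model set returns a definite answer for all four forms, whereas the three-model set returns “nothing follows” for AC and DA while leaving MP and MT unchanged.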

In general, the fewer models required for an inference, and the simpler they are, the easier the inference will be, because working memory places a bottleneck on processing (Johnson-Laird, 2001). Mental models cost a great deal of working memory, for several reasons. First, simply generating and maintaining models costs working memory, so the more models participants have to represent, the heavier the working memory load. Second, examining models and integrating them all to draw a conclusion uses working memory. Third, retrieving information related to problem content, and any relevant counterexamples, takes working memory. Because these constraints make it so costly to work with more models, reasoners will tend to make inferences using the minimum number of models possible. Thus, they will favor the two-model biconditional interpretation over the three-model conditional one, and systematically produce certain answers to problems with uncertain solutions, such as AC and DA. In other words, the number of models people construct determines how accurately they produce and evaluate conditionals.

Children have limited working memory spans which increase with age; thus, the number of models they can construct and manipulate also increases with age. However, Johnson-Laird himself does not make specific predictions about what problems children can solve at what age. Markovits, Barrouillet, and colleagues do so by changing some aspects of mental models and specifying a particular order in which they are constructed.

Some aspects of the representations people use to construct models remain unclear and controversial, including how verbal or visual they may be (Klauer, 1997). While Johnson-Laird asserts that meaningful relations link the tokens in mental models, such relations do not appear in the examples he uses (such as the circle-triangle one), and he provides little information on how they are represented or how real-world relational information actually influences the conclusions people draw. Furthermore, while he notes that real-world knowledge can influence the fleshing-out process (Johnson-Laird, Byrne & Schaeken, 1992), he has largely left the details to other researchers. Markovits and Barrouillet eliminate the truth table and define mental models as simply entities linked by relations (Markovits, 2000). Their specification of the meaning of relations draws on computational research by Halford and colleagues on the processing load imposed by relations (Halford, 1984; Halford, 1998). In conditional reasoning, each model is generally a binary relation between premises a and b (unless the premises themselves contain sub-relations, in which case ternary relations may be involved). Halford (1984) found that children over 5 can almost always process two binary relations (one relation between two items), while preschool children could only consider one; thus, only the five year olds understood transitivity (i.e., a = b and b = c, so a = c, or a > b and b > c, so a > c). The ability to process ternary relations (2 relations between three items, e.g. a > b > c) continues developing after age 5; one study found it was present in 15% of four year olds, 41% of five year olds, 63% of six year olds, 72% of seven year olds, and 80% of eight year olds (Andrews & Halford, 1998). Thus, Markovits (2002) predicts that 6-7 year olds making conditional inferences use only two models, while 8-9 year olds can store three, although with some difficulty.

Adding assumptions about the order of construction of mental models allows for specific predictions about which problems children will solve at which age. Suppose that eight year olds can only work with two models. Which models will they build, and what problems will they solve? They might take a biconditional interpretation, as predicted by Johnson-Laird. But nothing in Johnson-Laird’s account prevents them from instead ending up with the following model:

○ △ (a circle and a triangle)

¬○ △ (no circle, but a triangle)

In which case, the conclusion for MP would be “there is a triangle,” the conclusion for MT would be “nothing follows,” the conclusion for AC would be “nothing follows,” and the conclusion for DA would be “there is a triangle.” This pattern of results has not often been reported. To explain why, Markovits argues that children will construct “not p—not q” second because we first draw the conclusions that provide us with the most information. Only after that will they construct the “not p—q” model.

Markovits and Barrouillet can then predict a distinct developmental pattern for each of the four inferences: MP remains high and stable across development; MT increases with age; biconditional responses to AC decrease; and biconditional responses to DA first increase and then decrease. Specifically, in the youngest children, who represent an if-then statement like “if p then q” as meaning simply “p and q,” positive conclusions like “q” or “p” will be drawn more frequently than negative ones like “not q” or “not p.” Thus, they will show higher accuracy for MP and DA than MT and AC. This gap disappears when children can construct two models. When children can construct three models, determinate answers will be drawn more frequently for MP and MT than for AC and DA, thus increasing accuracy (Barrouillet, 2000).

In both these versions of mental model theory, working memory places an absolute performance limit on both adults and children. Any child or adult who lacks the working memory span needed to generate, maintain, and check three mental models cannot answer the uncertain forms correctly without special training. Thus, skilled conditional reasoning would be in principle impossible for children under twelve. Like dual process theory, mental model theory could not accommodate the existence of younger children who reason as if they had three models, but lack the working memory to build them.

If gifted children can solve uncertain conditional reasoning problems, they must do so using a method that places little load on working memory. Thus, they probably use unconscious processes, as these consume relatively little working memory and may operate independently of it. Limitations (of attention, working memory, or multitasking) constitute a defining feature of consciousness (Baars, 1997). By contrast, unconscious processes compute vast quantities of information rapidly and in parallel. As an unconscious process, structural intuition could explain gifted children’s performance. The implicit learning on which it relies is either unaffected by working memory (Unsworth & Engle, 2005) or impeded by it (DeCaro, Thomas & Beilock, 2008), and thus could explain how children learn the structure of conditionals and their associations with answers, despite severe limitations in working memory span.

One might object to this claim, based on research on the relationship between implicit learning and intelligence. We propose that gifted children implicitly abstract the structure of conditionals, while no evidence indicates that typically developing children do so for problems other than MP.

However, several studies have found that intelligence is significantly correlated with performance in explicit but not implicit learning tasks (Reber, Walkenfeld & Hernstadt, 1991; Maybery, Taylor & O’Brien-Malone, 1995; Unsworth & Engle, 2005; McGeorge, Crawford, & Kelly, 1997), or that only one or two limited components of intelligence are correlated with implicit learning, such as verbal reasoning, WAIS-R performance subtests, or processing speed (Kaufman et al, 2010; McGeorge, Crawford & Kelly, 1997).

These findings do not pose a significant problem for our hypothesis, for several reasons. First, as Reber and colleagues (1991) themselves point out, the nonsignificant correlations indicate not that intelligence has no relationship with implicit learning, but that the relationship is weaker than with explicit learning. Indeed, the components of intelligence related to implicit learning tend to load heavily on fluid intelligence—the sort of flexible, content-independent intelligence where gifted children particularly excel (Silverman, 2009; Silverman, Gilman & Falk, 2004).

Furthermore, one study directly compared gifted children with same-age mentally retarded and both older and younger typically developing children, in order to investigate an earlier study’s finding that IQ did not affect implicit learning but age did. Fletcher and colleagues (2000) found that the gifted children performed better on the implicit learning task, though they did not differ in their explicit knowledge of the task. Specifically, not their IQ but their mental age, or developmental level, mediated this improved performance. Thus, gifted children may indeed excel at implicit learning.

But even if these studies found no effect of intelligence on implicit learning, it would not bear on our hypothesis. All these studies use only one or two implicit learning tasks (typically Artificial Grammar Learning or Serial Reaction Time), and examine individual differences on these specific tasks. By contrast, we claim not that gifted children exceed their peers on any particular implicit learning task or even implicit learning ability in general, but merely that they can apply this learning process to a greater range of skills. Thus, even if there were no intelligence-related individual differences on any particular implicit learning task, this would have no bearing on whether gifted children could implicitly learn conditional reasoning skills.

One might wonder whether choosing to study gifted children “begs the question” by choosing a group preselected for advanced reasoning. In fact, it is debatable to what extent a consistent selection basis exists at all.

The legal definition of giftedness incorporates general intelligence, specific academic talent, creativity, artistic ability, and leadership capacity. While broad definitions like this one benefit children by identifying and serving the largest number, they cause problems for researchers. These traits do not overlap perfectly. A child may have any one of these characteristics, or any size subset consisting of any combination of them.

Difficult ethical issues only compound the problem. Local attitudes differ about the relative importance of general intelligence, specific academic talent, creativity, artistic ability, or leadership capacity. Each program must balance the desire for equity (benefitting as many students as possible) with challenge (the need to serve the most highly gifted students) (Van Tassel-Baska, 2000). It must also choose a balance between accelerating already-advanced children and cultivating those with untapped potential for achievement (Van Tassel-Baska, 2000; Lohman, 2009). Different programs balance these objectives differently, identifying children with different cognitive profiles as “gifted.” As a result, while some studies and programs define gifted as the top 2-3 percent of the population (Lohman, 2009), a school system using a broad definition could identify over 15% of its population as gifted (Van Tassel-Baska, 2000?).

Some programs try to reduce this heterogeneity by focusing primarily on intellectual ability and specific academic talent. Many set the cutoff for admission at 130 IQ and up, or the top 2% of the population. Even so, problems of measurement still produce heterogeneity. First, gifted programs may accept many different types of tests, including aptitude tests (i.e., IQ tests), achievement tests, or out-of-level achievement tests (e.g., the SAT taken by a 7th grader). The same percentile rank means something different on an IQ test, a grade-level-based achievement test, and an out-of-level achievement test. Thus, a gifted program may include children with a broad range of ability. Indeed, they generally do, because gifted programs must include large numbers of students to justify their existence (Van Tassel-Baska, 2000). Some researchers estimate that a given program may include children who vary by as much as three standard deviations in a specific intellectual ability (Van Tassel-Baska, 2000). For example, the reading levels of children in a 5th grade gifted program might range from 7th grade to college level (Van Tassel-Baska, 2000).

Even with the same target of 130 IQ or higher, different tests select for different cognitive profiles (Lohman, 2009). For example, the WISC-IV finds more verbally gifted students than the SB-5, while the SB-5 identifies more mathematically gifted ones (Silverman, 2009). While comparisons of verbally and mathematically gifted children at 5-8 years of age are scanty, studies of older students who have taken the SAT show large differences in their cognitive profiles, including working memory (e.g., Benbow & Minor, 1990).

In short, if a consistent basis for selection exists, it would consist of IQ and achievement tests, and these test advanced abstract reasoning ability very little and conditional reasoning not at all.

The Present Study

The present study tests two hypotheses: first, that gifted children will show adultlike performance on conditional reasoning problems, particularly the uncertain ones; second, that they will exhibit structural intuition. A secondary aim is to trace the developmental trajectory of conditional reasoning in gifted children, as this has not been systematically investigated.

Structural intuition has a number of characteristics—speed; effortlessness; feeling of certainty; task-specificity; and understanding of structure. The present study focuses on the latter two because they seem both particularly essential to the phenomenon and potentially controversial. In order to test whether gifted children comprehend the structure of conditionals, we use problems with both concrete and abstract content, but identical logical and grammatical structure, as follows:

If there is an apple in the fruit salad, there is a banana in the fruit salad.

There is an apple in the fruit salad.

__________________________________________________________

There is a banana in the fruit salad.

If there is a gidget in the yanna, there is a wurgle in the yanna.

There is a gidget in the yanna.

_____________________________________________________

There is a wurgle in the yanna.

Children who lack structural intuition and rely on problem content may solve the concrete problems, but will not know how to solve the abstract ones. Thus, they should solve the concrete problems more accurately. By contrast, children with structural intuition can recognize that the abstract problems work the same way as more familiar concrete ones, and so should solve them just as accurately. Several studies indicate that young typically developing children have not developed this structural intuition, so they should solve more concrete problems than abstract ones (Markovits, 2002; Venet & Markovits, 2001). By contrast, we predict gifted children, equipped with structural intuition, will solve concrete and abstract problems equally well.

Because structural intuition relies on abstracting the structure of reasoning problems, it is task-specific. In other words, it involves knowledge that goes beyond domain-general executive and problem-solving abilities. To test this hypothesis, we developed a task designed to draw heavily on working memory and executive function. We then covary performance on this task to determine whether general problem-solving skills exhaustively explain logical ability. If Markovits, Barrouillet, and their colleagues are correct, typically developing children use only domain-general executive function skills such as these to solve reasoning problems (Markovits & Barrouillet, 2002; Markovits et al, 1996; Markovits, 2000), while we predict gifted children will have additional, conditional reasoning-specific resources. Thus, group differences in reasoning accuracy should remain after covarying performance on the rote task.

In designing the rote task, we faced a difficult problem: gifted children excel at turning hard rote problems into easier, more meaningful ones through deep comprehension or content knowledge, essentially turning the task into a measure of insight or crystallized intelligence rather than executive functions. To prevent gifted children from finding such shortcuts and rendering the task less useful as a measure of working memory and executive function, we developed a new task designed to be as rote as possible. We presented participants with 2x2, 3x3, 4x4 and 5x5 grids of randomly arranged letters and asked them to say the letters in alphabetical order as fast as possible. The lack of content (beyond alphabetic knowledge) was designed to prevent creative uses of crystallized knowledge, while the simple instructions and arbitrary arrangement rule were supposed to limit the range of creative problem-solving strategies.
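As a rough illustration of how such stimuli could be produced and scored, the following Python sketch generates one grid of each size and the alphabetized response a participant would be expected to say. It is not the procedure actually used to build the study materials: the function names, the use of uppercase letters, the fixed random seed, and the assumption that no letter repeats within a grid are all introduced here for illustration.

```python
import random
import string


def make_grid(size, rng):
    """Return a size x size grid of letters in random order.
    Assumption: letters are uppercase and do not repeat within a grid."""
    letters = rng.sample(string.ascii_uppercase, size * size)
    return [letters[row * size:(row + 1) * size] for row in range(size)]


def correct_response(grid):
    """The expected spoken response: every letter in the grid, alphabetized."""
    return sorted(letter for row in grid for letter in row)


rng = random.Random(0)  # fixed seed only so the sketch is reproducible
grids = {size: make_grid(size, rng) for size in (2, 3, 4, 5)}

for size, grid in grids.items():
    for row in grid:
        print(" ".join(row))
    print("expected:", " ".join(correct_response(grid)))
    print()
```

A 5x5 grid uses 25 of the 26 letters, so under the no-repeat assumption the largest grid nearly exhausts the alphabet.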

Reasoning Problem Design. In designing the reasoning problems, we faced a dilemma. Because 5-8 year old children do not all know how to read, and individual differences in reading speed and comprehension would confound results anyway, we had to present problems out loud. Recalling the problems while listening to the possible answer choices places a heavy load on working memory, particularly for the abstract problems, with their unfamiliar nonsense words. Problems had to be simple enough to present out loud. We kept problems as short as possible (the concrete ones tended to be longer than the abstract ones), and told children they could ask to hear the problems again, though they rarely did so.

To ensure children understood the task, we asked for a response in the following manner:

If the boots have polka dots, the scarf has stripes.

The scarf does not have stripes.

Do the boots have polka dots?

We had to elicit children’s ideas about the necessity of their answers without using overly complicated vocabulary or overloading their working memory. We also needed to offer children the possibility of an uncertain answer without directly cuing them to give such an answer if they would not otherwise do so. Thus, we offered three answer choices: “definitely yes,” “can’t,” and “not sure.” We did not directly address whether “not sure” meant “the answer is uncertain” or “I don’t know.” However, we provided, and gave the correct answers to, three sample problems, one of whose answers was “not sure.” Whether or not children answered this sample question correctly, we told them the answer and explained why the answer was “not sure,” thus informing them that “not sure” could mean “uncertain” as well as “I don’t know.”

Since our study investigates how children process structure rather than content, we wanted to avoid confounds of believability or real-world knowledge. Clearly, we wanted to avoid conditionals that elicit belief bias, such as this:

If you step on the brakes, the car will speed up.

You step on the brakes.

More subtly, we wanted to avoid conditionals with a meaningful causal, categorical, or other relationship. In our concrete problems, clauses were only arbitrarily related; for instance:

“If the boots have polka dots, the scarf has stripes.” Our problems thus differ from those in most recent studies of conditional reasoning, which often use clauses with causal or categorical relationships.

To generate the abstract problems, we replaced the nouns and adjectives with nonsense words, thus eliminating content while preserving both logical and grammatical structure.
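The substitution can be sketched in a few lines of Python, using the fruit salad problem and its nonsense-word counterpart from the examples given earlier. This is only an illustration of the idea, not the procedure used to write the stimuli; the function name `make_abstract` and the choice to swap the article along with the noun (“an apple” becomes “a gidget”) are assumptions made here.

```python
import re


def make_abstract(sentence, substitutions):
    """Replace each content phrase with its nonsense counterpart,
    matching whole words only, so the sentence frame is untouched."""
    for concrete, nonsense in substitutions.items():
        sentence = re.sub(rf"\b{re.escape(concrete)}\b", nonsense, sentence)
    return sentence


concrete_problem = [
    "If there is an apple in the fruit salad, there is a banana in the fruit salad.",
    "There is an apple in the fruit salad.",
]

# Hypothetical mapping; articles are included so "an" becomes "a" where needed.
substitutions = {
    "an apple": "a gidget",
    "a banana": "a wurgle",
    "fruit salad": "yanna",
}

abstract_problem = [make_abstract(line, substitutions) for line in concrete_problem]
for line in abstract_problem:
    print(line)
```

Because only the content phrases change, the connective “if,” the clause order, and the grammatical frame are identical in the concrete and abstract versions.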

Working Memory span. We dispute the strong claim that accurate conditional reasoning cannot occur without a large working memory span. However, our argument rests on the premise that gifted children have roughly age-appropriate working memory. To test this, we used the Digit Span subtest of the Woodcock Johnson (WJ-III; Woodcock, McGrew & Mather, 2001); digit span is a standard measure of verbal working memory. The test yields a raw score, an estimated age equivalence, a percentile rank, and a standard score.

Group Assignment. For all children not recruited directly from a gifted program, we administered a brief IQ test (Brief Intellectual Ability index from the WJ-III) to determine whether they belonged in the gifted or typically developing group. This test consists of three subtests: Verbal Comprehension, which measures vocabulary; Concept Formation, which measures children’s understanding of “and,” “or” and similar logical relations using colored shapes; and Visual Matching, which measures processing speed. The BIA yields percentile rank and standard (IQ-equivalent) scores for the full scale and for each subtest. We used a conservative cutoff of 130 full-scale IQ or higher to assign children to the gifted group, because researchers and gifted programs frequently use this cutoff.

Methods

Participants

Gifted participants were 12 five year olds (average age: 66 months; range: 61-69 months; 6 females); 19 six year olds (average age: 80 months; range: 72-83 months; 9 females); 15 seven year olds (average age: 90 months; range 84-94 months; 8 females); and 13 eight year olds (average age: 103 months; range: 98-107 months; 9 females).

Typically developing participants were 7 five year olds (average age: 67 months; range: 60-71 months; 4 females); 7 six year olds (average age: 77 months; range: 74-78 months; 3 females); 7 seven year olds (average age: 89 months; range: 84-95 months; 5 females); and 7 eight year olds (average age: 101 months; range: 96-106 months; 5 females). Most had full-scale IQ scores higher than 100 (see Table 1 for their BIA scores).

All participants were right handed, with no neurological conditions or known learning disabilities.

The majority of the gifted participants were recruited from Center for Talent Development, an enrichment program affiliated with Northwestern University’s School of Education and Social Policy. To be eligible for the program, children had to score at or above the 95th percentile on any nationally normed aptitude or achievement test, a fairly typical requirement for a gifted program. We did not obtain test scores from these participants. Two five year olds, four six year olds, two seven year olds, and one eight year old were recruited from local schools, summer camps and a homeschool group, and earned a full scale IQ of 130 (98th percentile) or higher on the Woodcock Johnson III Brief Intellectual Ability test we administered.

Participants not from Center for Talent Development were recruited from Evanston and its surrounding suburbs, an affluent area. Participants were selected from an affluent area in order to avoid confounds from differences in SES. Most participants in both ability groups were white; there were more Asians in the gifted group and more Hispanics in the typically developing group.

Participants were paid $10 per hour for their participation.

Materials

Reasoning Problems. There were 16 conditional reasoning problems, half of which were concrete and half of which were abstract (see Appendix A for the complete list). Across four sets of stimuli, each problem appeared in all of the four forms (MP, MT, AC and DA). The 16 problems appeared in a different random order in each of the four sets.

Rote problems. Rote problems were constructed from strings produced by a random number generator (www.random.org) using numbers 1-26. In the 2x2, 3x3 and 4x4 problems, no letter was allowed to appear more than once. For the 5x5 problems, no letter could appear more than 3 times, and generally no more than four letters could repeat. To avoid confounds, we eliminated or changed strings that contained or consisted of real words or abbreviations. Problems were put together in such a way that participants who simply read the puzzle instead of trying to answer it would get all but 1 or 2 letters out of order.
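The repetition constraints described above can be illustrated with a short Python sketch. It is not the procedure used in the study: it assumes the numbers 1-26 map onto the letters A-Z, it substitutes Python's `random` module for random.org, and the function name `make_rote_string` is hypothetical. The guideline that "generally no more than four letters could repeat" and the screening for real words or abbreviations are not implemented.

```python
import random
import string
from collections import Counter


def make_rote_string(size, rng=None):
    """Return size*size uppercase letters for one rote problem (hypothetical helper)."""
    rng = rng or random.Random()
    n_letters = size * size
    if size <= 4:
        # 2x2, 3x3 and 4x4 problems: no letter appears more than once.
        return rng.sample(string.ascii_uppercase, n_letters)
    # 5x5 problems: redraw until no letter appears more than 3 times.
    while True:
        # Draw numbers 1-26 and map each to a letter (assumed mapping: 1 -> A, 26 -> Z).
        letters = [string.ascii_uppercase[rng.randint(1, 26) - 1]
                   for _ in range(n_letters)]
        if max(Counter(letters).values()) <= 3:
            return letters


if __name__ == "__main__":
    print("".join(make_rote_string(3, random.Random(1))))
```

Passing a seeded `random.Random` makes a set of strings reproducible, which would matter if the four stimulus sets had to be regenerated.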

There were four sets of rote problems, each of which was matched with a set of reasoning problems such that Set 1 of the reasoning problems was presented with Set 1 of the rote problems, and so on. There were 64 rote problems, 16 of which appeared in each set (see Appendix B for example rote problems).

Digit Span. The Digit Span task was taken from the Woodcock-Johnson III.

Participants listen to a CD on which a male speaker presents increasingly long lists of words and numbers, such as “8, horse, sock, 2.” Children must repeat these words and numbers in the order they were heard, while rearranging them in such a way that the words come before the numbers, e.g. “horse, sock, 8, 2.” Participants could take as long as they needed to respond. This task requires children to maintain information in immediate awareness, divide it into groups, reorder it, and shift attentional resources to the new ordered sequences. The instructions, though standardized, are also rather complex, and thus were difficult to understand for some five year olds in the present study.

Children earn 2 points for putting the words in order followed by the numbers in order; 1 point for placing either the words or the digits in the correct order; and 0 points for either a response with digits first or both the words and the digits in the wrong order. Testing ends either when participants reach a ceiling (scoring 0 on the last three items in a group of problems) or when they reach the end of the test, whichever comes first. Points for all problems are added together to produce the raw score. The computerized scoring program then computes an age equivalence score (the age at which the average child would earn the same raw score), and a standard score (normed against age peers in the manner of an IQ test, with a mean of 100 and an SD of 15).
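The item scoring and ceiling rule can be sketched as follows. This Python sketch is one reading of the rule as described here, not the WJ-III scoring program. The function names are hypothetical, and it assumes that 2 points require an exact match to "words in order, then digits in order" and that any response beginning with a digit earns 0.

```python
def score_item(presented, response):
    """Score one item as 0, 1 or 2 points (assumed reading of the rule)."""
    words = [x for x in presented if not x.isdigit()]
    digits = [x for x in presented if x.isdigit()]
    if not response or response[0].isdigit():
        return 0  # digits-first (or empty) responses earn no credit
    if response == words + digits:
        return 2  # words in order, followed by digits in order
    resp_words = [x for x in response if not x.isdigit()]
    resp_digits = [x for x in response if x.isdigit()]
    # One point if either the words or the digits are in the correct order.
    return 1 if (resp_words == words or resp_digits == digits) else 0


def raw_score(item_scores_by_group):
    """Sum item scores, stopping after a group whose last three items scored 0."""
    total = 0
    for group in item_scores_by_group:
        total += sum(group)
        if len(group) >= 3 and all(s == 0 for s in group[-3:]):
            break  # ceiling reached
    return total


if __name__ == "__main__":
    item = ["8", "horse", "sock", "2"]
    print(score_item(item, ["horse", "sock", "8", "2"]))  # full credit
```

Under these assumptions, the example from the text ("horse, sock, 8, 2") earns full credit, while a response with the words swapped but the digits in order earns 1 point.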

BIA. The Brief Intellectual Ability (BIA) test consists of three subtests from the Woodcock-Johnson III: Verbal Comprehension, Concept Formation, and Visual Matching. The computerized scoring program calculates an overall IQ and a standard score (IQ-equivalent) for each of the subtests. Verbal Comprehension measures verbal IQ, and contains the subtests “Picture Vocabulary,” “Synonyms,” “Antonyms,” and “Verbal Analogies.” A typical analogy: “on is to start, as off is to ___.” Scores on the subtests are summed together to produce the raw score for the subtest.

Concept Formation is intended to measure induction of abstract rules based on limited and controlled feedback from the experimenter. In practice with this group of participants, it appears to measure the relational knowledge children bring to the task rather than any new learning. Children are shown assortments of colored shapes, where some are inside boxes and some are outside. Children are told that shapes in the box differ in some way from the shapes outside, and they are asked to state the rule explaining what makes the shapes inside the boxes different. The number of shapes inside the box varies, and the rule changes from one simple feature (like “circle,” “red” or “big”) to a conjunction (e.g., “square and yellow”) or a disjunction (“small or square”). Four sections move systematically from easy to hard, while the fifth includes a random mix of all types of problems, for which children must figure out the appropriate rule under a time limit. Participants get one point for each correct answer, and these points sum to produce their raw score.

In the Visual Matching subtest, children receive a worksheet filled top-to-bottom with matching problems. Each problem is a row of digits containing two identical ones, which the child must find and circle. Problems become increasingly complicated, including an increasing number of look-alike numbers such as 6 and 9 and moving from one to two and three digit numbers. Children must complete as many problems as possible in exactly three minutes, earning one point for each correctly identified pair. Thus, this subtest measured processing speed.

Procedure

All participants first completed the Digit Span task, followed by the rote and reasoning tasks in counterbalanced order. These three tasks took about an hour. Participants not recruited from Center for Talent Development then were given the BIA, which took about half an hour for most participants. The BIA was last because if a child got tired or bored and decided to leave early, this task would hurt the least to skip. Children sat at a low table across from the experimenter during all tasks, and could take breaks between tasks or as needed.

Reasoning Task. During the reasoning task, participants were given the following instructions: “We’re going to do some riddles now. They’re about something imaginary that happens when something else imaginary happens. I’ll tell you if one of the things happens, and you tell me if you think the other thing definitely happens (nodding head), can’t happen (shaking head) or you’re not sure (shrugging shoulders). It’s OK to say not sure. If you need to hear a riddle again just ask and I’ll repeat it. Some of the riddles will be about things you know a lot about, like people, colors or food. Some are about things with funny names that a lot of people don’t know about. Even if you don’t know what those things are, you’ll still be able to answer the riddles.” I then gave the following three examples of how the answer choices should be used:

Jane is at the store.

Fred is at the movies.

Do you think Jane is at the store?

(Correct answer: yes).

Jane is at the store.

Fred is at the movies.

Do you think Jane is with Fred?

(Correct answer: no).

Jane is at the store.

Fred is at the movies.

Do you think Jane goes to the movies that day?

(Correct answer: not sure).

If the child answered correctly, I told them they were right, and why. If they answered incorrectly, I said, “Good try” and told them the answer. For the third example, which most of the participants got wrong, after telling them it was a trick question and the answer was “not sure,” I asked if they could guess why. If they figured it out, I told them so, and if they didn’t know, I told them the question was about the whole day, so Jane could go to the movies, but the question doesn’t say whether she does, so the answer is not sure.

We elicited responses from participants by presenting information about one clause and asking about the other, as follows:

If the boots have polka dots, the scarf has stripes.

The scarf has stripes.

Do the boots definitely have polka dots? Can’t have polka dots? Or are you not sure if they have polka dots?

MP was scored “correct” if participants answered “yes” or “definitely.” MT was scored “correct” if participants answered “no” or “can’t.” Affirmation of the Consequent and Denial of the Antecedent were scored “correct” if participants answered “not sure” or “can.”
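The scoring key above amounts to a lookup from problem form to accepted answers. The Python sketch below is illustrative only; the names `CORRECT_ANSWERS` and `is_correct` are hypothetical, and the normalization of case and apostrophes is an assumption about how spoken answers might be transcribed.

```python
# Accepted answers for each problem form, as listed in the text.
CORRECT_ANSWERS = {
    "MP": {"yes", "definitely"},
    "MT": {"no", "can't"},
    "AC": {"not sure", "can"},
    "DA": {"not sure", "can"},
}


def is_correct(form, answer):
    """Return True if a transcribed answer counts as correct for this form."""
    # Normalize case, surrounding whitespace and curly apostrophes (assumption).
    normalized = answer.strip().lower().replace("\u2019", "'")
    return normalized in CORRECT_ANSWERS[form]


if __name__ == "__main__":
    # The sample problem above is an AC problem, so "not sure" is correct.
    print(is_correct("AC", "not sure"))
```

For the sample problem shown above, which affirms the consequent, an answer of "definitely" would be scored as incorrect.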

Participants were asked the reasons for their answers. Only a few participants were consistently willing and able to do so. Most simply said “don’t know” all the time, while a few were shy and didn’t want to answer the question. Thus, these responses were not analyzed.

Rote task. Participants were shown a set of two example puzzles and told that the letters in the puzzles were in a mixed-up order and their goal was to say the letters in the same order they would be in the alphabet as fast as possible. The first puzzle was a 2x2, and the second a 4x4 that had two A’s in it. Before the child began the second puzzle, I asked if he or she noticed anything different about this puzzle. If they didn’t notice in 2-3 guesses, I told them there were two A’s, that this happened in some of the larger puzzles to be tricky, and they could show that I didn’t fool them by saying “AA” when they saw this. I also told them that if they realized they skipped a letter, they should say that letter and keep going rather than starting over from the beginning. Participants were timed, and were stopped at 5 minutes.

Participants could make three types of errors: omissions, commissions, or wrong order. Participants could earn 2 points per letter: one for saying it and one for putting it in the correct position. Thus, each point was worth 1/8 or .125 in 2x2 problems, 1/18 or .056 in 3x3 problems, 1/32 or .03125 in 4x4 problems, and 1/50 or .02 in 5x5 problems. Each error cost one point, which was subtracted from 100% to find participants’ accuracy on the problem. This scoring system has two consequences. First, participants will score above 50% unless they get all letters missing or out of order. Second, each error is worth less as the problems get larger, thus potentially flattening the accuracy difference between different-sized problems.
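The arithmetic behind these values can be checked with a small Python sketch. It assumes one reading of the scoring rule: an n x n problem has 2 points per letter, and each error removes one point from a perfect score. The function names are hypothetical.

```python
def point_value(size):
    """Share of a problem's accuracy carried by one point in a size x size grid."""
    n_letters = size * size
    return 1 / (2 * n_letters)  # 2 points per letter


def rote_accuracy(size, n_errors):
    """Accuracy after subtracting one point per error (assumed reading)."""
    return max(0.0, 1 - n_errors * point_value(size))


if __name__ == "__main__":
    for size in (2, 3, 4, 5):
        print(size, round(point_value(size), 5))
```

A single error therefore costs 12.5% on a 2x2 problem but only 2% on a 5x5 problem, which is the flattening effect noted above.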

Digit Span & BIA. The Digit Span and BIA were scored using the WJ-III computerized scoring program, which uses age to calculate percentile rank and standard scores. The Digit Span recording sheet itself gives from 0-2 points on each problem, which sum to produce the raw score. The program then calculates estimated age equivalences, percentile rank, and standard scores from the raw score. The standard score compares the child to others of the same age using the same distribution as IQ, while the age equivalence score indicates at which age the average child would obtain the same raw score.

Results

Tests for precocious reasoning & normal working memory in gifted children.

As a test case for structural intuition, I predicted that gifted children would demonstrate precocious conditional reasoning alongside age-appropriate working memory spans. Specifically, they would reason more accurately than typically developing peers, but their working memory spans would not significantly differ. Unfortunately, the results did not support these predictions.

In an ANOVA with Condition and Set as within-subject factors and Ability and Age as between-subject factors, gifted children did not reason significantly better than typically developing peers. This lack of significant effect was likely due not simply to a lack of power, but to almost identical means (see Table 2), as gifted children correctly solved 41%, compared to 39% for typically developing children. Mean accuracy was also nearly identical for concrete and abstract problems (see Table 2).

Gifted children did not appear to understand abstract problems better than typically developing children did; each group solved 41% of them correctly. Thus, if gifted children have unusual facility with abstract thinking at this age, it does not seem to extend to conditional problems with nonsense words.

Contrary to my predictions, gifted children had significantly higher working memory spans than typically developing children, F(1, 80) = 8.320, p = .005. On average, gifted children earned a standard score of 117, just over 1 standard deviation above the mean, in line with previous research (Silverman, Gilman & Falk, 2004; see Table 3).

This mean difference seems to be driven by a wider range for gifted children, with more extreme high scores. The lowest standard scores for both gifted and typically developing children were 78, or almost 1.5 standard deviations below the mean. But whereas the highest score for typically developing children was 132, just over 2 standard deviations above the mean, the highest score for gifted children was 170, over 4 standard deviations above the mean. Indeed, there were two outliers in the gifted group (based on percentile), with standard scores of 160 and 170, respectively. To determine whether these outliers were responsible for the difference between gifted and typically developing children, I removed their data and re-ran the ANOVA (an Ability x Age ANOVA with working memory age equivalence scores as the dependent variable).
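Standard scores convert to standard deviation units using the normative mean of 100 and SD of 15 stated in the Materials section. The Python sketch below is only a convenience for checking such conversions; the function name is hypothetical.

```python
def sd_from_mean(standard_score, mean=100.0, sd=15.0):
    """Distance of a standard score from the normative mean, in SD units."""
    return (standard_score - mean) / sd


if __name__ == "__main__":
    for score in (78, 117, 132, 170):
        print(score, round(sd_from_mean(score), 2))
```

On this scale, 78 lies about 1.47 SD below the mean, 117 about 1.13 SD above it, 132 about 2.13 SD above it, and 170 about 4.67 SD above it.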

Removing the outliers did not eliminate group differences in working memory, F(1, 72) = 8.828, p = .004. An examination of the frequency distributions for standard scores, which are easier to interpret, indicated that gifted children’s working memory scores are positively skewed, both relative to the normal distribution and to the present study’s typically developing group. Computing the bell curves of standard scores for each group (see Figure 1) indicates that typically developing children peak just above the normal mean of 100, with slightly more extreme scores at the high end than the low end of the curve. By contrast, the gifted children peak just below a standard score of 120, with the majority of participants scoring from the mean to 2 standard deviations above it (see Table 3c). Thus, as a group, gifted children do appear to have higher working memory spans than typically developing children (see Tables 3, 4).

Gifted five year olds started out with the working memory span of the average 6 year old; by age 7, they had reached the equivalent of a 10 year old, and by age 8, they had the working memory span of a 14 year old (see Table 3a). In other words, by age 8, they had at least the amount of working memory capacity proposed to be required for solving uncertain problems.

By contrast, typically developing five year olds started with the working memory span of the average 5 year old, also reached the level of a 10 year old by age 7, but did not develop beyond this age level at age 8. Thus, one would predict low accuracy for uncertain problems in even the eight year olds.

In both groups, working memory increased with age in a nonlinear fashion, with nonsignificant increases between some ages and significant jumps between others (see Table 5). While gifted children did not significantly differ in working memory age equivalency from ages 5 to 6 (Student’s t, one-tailed, p = .114), or 6 to 7 (Student’s t, one-tailed, p = .424), they experienced a significant jump in working memory age equivalency between ages 7 and 8 (Student’s t, one-tailed, p = .005), which survived Bonferroni correction for multiple comparisons. By contrast, typically developing children experienced significant growth in working memory age equivalency a year earlier than the gifted group, between ages 6 and 7 (Student’s t, one-tailed, p < .001); this effect also survived Bonferroni correction. The timing of the jump for gifted children is intriguing, given Shaw and colleagues’ (2006) finding that children between 121-149 IQ (like our gifted group) start out with thinner prefrontal cortices at age 7, then undergo a period of rapid growth, peaking by about age 11, compared to either no change or a slight increase in children with lower IQ (like our typically developing group). The jump in working memory from ages 7 to 8 in gifted children in the present study may mark the beginning of this rapid prefrontal development. However, without knowledge of the full trajectory of working memory development in gifted children, or its relationship with prefrontal cortical thickness, this conclusion is only speculative.

Rote task.

Overall, both gifted and typically developing children were much more accurate on the rote task than on the reasoning task; gifted children got 84% correct, compared to 78% for typically developing children (see Table 6).

Because the rote task was designed specifically for the present study, it bears investigating whether this task indeed taps working memory and executive function, as intended. As problems increase in size, they should take reliably longer to solve. They should also have lower accuracy, but not necessarily significantly so, given that any single mistake affects accuracy more on smaller problems than larger ones, leading to decreased accuracy differences between problems. Working memory should correlate highly with accuracy, reaction time, or both, as children must maintain knowledge of which letters they have already said while determining the next one to say.

The rote task does appear to tap working memory and executive function, though it seems to do so somewhat differently for gifted and typically developing children. Difficulty increased reliably with the size of the problems, with all problem sizes differing significantly from the others in RT, F(3, 71) = 91.231, p < .001. Across all participants, working memory correlated moderately with accuracy on all problems (r = .416, p < .001), and on each of the different sizes (see Table 7a,b), with correlations between .2 and .5. These correlations were somewhat lower than predicted; however, the task was not a pure working memory test. Presumably, it taps a number of executive function skills other than working memory, such as controlled attention and self-monitoring, as well as processing speed, which is not surprising given the speeded nature of the task. Higher scores on the processing speed subtest of the BIA were moderately correlated with faster rote problem solving on all problems (r = -.443, p = .030), on 2x2 problems (r = -.425, p = .038), on 3x3 problems (r = -.443, p = .030), and on 4x4 problems (r = -.560, p = .004) for typically developing children. (There were no significant correlations for gifted children, probably because only 7 had BIA scores.)

There was an ability group x age interaction for reaction time, suggesting that gifted and typically developing children may have differed in their approach to the problem over development (see Figure 2). One would predict that as children get older, they would solve the rote problems more quickly, because their processing speed increases, they have to talk themselves through the task less, and their executive function grows more efficient. And, indeed, gifted children got somewhat faster with age. However, typically developing children slowed down with age; they started out faster than the gifted children at age 5, but by age 8, they took much more time to solve the rote problems. One possible explanation is that gifted children perform the task as intended at each age; by contrast, typically developing children use shortcuts and do not really attempt to go through all the processing steps until they reach age 7 or 8, at which point they take longer than gifted children because their executive functions and working memory are less well-developed. Typical shortcuts would include skipping through sections of the alphabet, or simply reading all the letters in the puzzle instead of trying to put them in order, which typically developing five year olds often did when confronted with the largest rote problems. (This “reading” strategy is particularly suboptimal: a child will lose almost half the points on a problem in this way because only 1 or 2 letters end up in the correct position.)

An ability group x age x problem size interaction, F(9, 219) = 2.040, p = .036, supports this interpretation; it was driven by a difference between gifted and typically developing children on the largest (5x5) problems, where gifted children took 84.5 seconds and typically developing children took 27.3. Given that these problems have 25 letters to alphabetize, it is highly unlikely that the typically developing five year olds could earnestly attempt them in 27 seconds.

If typically developing children with lower working memory spans tended to rush through the problems, then they should have lower accuracy, and the whole typically developing group might show a particularly strong correlation between working memory and accuracy, particularly on the hardest problems (where less working memory implies less accuracy). These exact results occurred. Although working memory correlated reliably with accuracy in both gifted and typically developing children (see Table 7; see footnote 9), the correlations were stronger in typical children (between .5 and .8) than in gifted children (between .3 and .4; see Table 8). Working memory also correlated with accuracy on the largest problems in the typically developing children (r = .602, p = .001), but not in the gifted children, exactly what one would predict if only typical children with low working memory spans chose suboptimal strategies to solve the largest problems.

The lower working memory correlations for gifted children are somewhat troubling, as they suggest these problems did not tap working memory very effectively. Given the analysis above, gifted children likely did use executive functions to solve the rote problems, but they may have relied more on ones other than working memory. Without a direct comparison between the rote task and a standardized task designed to measure executive function, one can only speculate on the mechanisms they might have used.

I had feared that the rote task would provide an effective measure of working memory and executive function for the typically developing but not the gifted group, due to gifted children’s tendency to find insightful shortcuts. Instead, gifted children seemed to perform the task as intended, while the younger typically developing children came up with shortcuts that may have reduced its effectiveness at loading their executive functions. Thus, the rote task did not work as well as intended, but not for the expected reasons.

Testing WM-based theories.

Footnote 9: In this case, working memory span was measured by age equivalency. Within ability groups, I found almost identical correlations with raw working memory scores and age equivalency scores, so only the age equivalency scores are reported.

Markovits and colleagues (1996) have argued that children cannot spontaneously solve AC and DA problems until they have the working memory of a 12 year old. To test the importance of this threshold, we divided all participants into groups according to their working memory age equivalency score. The High group consisted of 16 children with a working memory of a child at least 12 years old. We also defined a Low group consisting of 50 children with the working memory of a child younger than 10 and a Middle group consisting of 20 children with the working memory of the average 10-12 year old. Each group contained both gifted and typically developing children. If 12 years old is truly a meaningful threshold, then we should find significant differences between the High group and the Middle group.
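The grouping rule can be stated compactly in code. The Python sketch below is illustrative; the function name is hypothetical, and it assumes age equivalency is expressed in years, with an age equivalent of exactly 12 counted as High and exactly 10 counted as Middle.

```python
def wm_group(age_equivalent_years):
    """Assign a working memory group from a Digit Span age equivalency score."""
    if age_equivalent_years >= 12:
        return "High"    # working memory of a child at least 12 years old
    if age_equivalent_years >= 10:
        return "Middle"  # working memory of the average 10-12 year old
    return "Low"         # working memory of a child younger than 10


if __name__ == "__main__":
    for age in (6.5, 10.0, 11.9, 14.0):
        print(age, wm_group(age))
```

Because the groups are defined by working memory age rather than chronological age, an eight year old and a five year old can fall into the same group.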

The High group reasoned more accurately than the Low group, F(2, 83) = 6.722, p = .002, solving on average 13% more problems (see Table 9). This effect could have been driven by age differences, as more eight year olds than five year olds would earn the same working memory score as a 12 year old. Thus, I covaried age in months. However, the difference between the High and Low groups remained significant after covarying age; controlling for age, the High group solved 10.4% more problems than the Low group. Thus, the High group performed better because of their increased working memory span, not just because of their age.

There were no significant differences between the High group and the Middle group, or between the Middle group and the Low group. Given the lack of difference between the High group and the Middle group, having the working memory of a 12 year old, in and of itself, does not appear to produce better reasoning.

While the Middle and Low groups solved a roughly equal number of concrete and abstract problems, the High group trended towards solving more abstract problems. Thus, while performance increased with working memory age, abstract problems benefited the most from attaining the working memory of a 12 year old (see Figure 3a). However, this trend did not reach significance. Contrary to Markovits and colleagues’ predictions, children in the High group did not solve more AC and DA problems than children with lower working memory ages; however, they solved more MP and MT problems, though this trend was not significant (see Figure 2b).

As in the main analysis, there was a significant main effect of problem type, F(3, 173.421) = 8.874, p < .001. MP problems were easier than AC (mean difference = .275, p < .001) and DA (mean difference = .250, p < .001), and a difference between MP & MT approached significance (mean difference = .131, p = .054). An effect of abstraction x problem type approached significance, F(3, 249) = 2.356, p = .072, power = .587, where MP is higher in the concrete condition (.612 vs. .561 correct) while DA is higher in the abstract condition (.277 vs. .397 correct). Controlling for age did not remove these effects.

There was one respect in which the High group differed meaningfully from the Middle group: consistency in interpreting and answering the problems. We rated each child as having a consistent interpretation if they gave the same answer at least 3 times out of 4, and an inconsistent interpretation if they did not. Notice that a child could have a consistent but erroneous interpretation. For example, a child with a biconditional interpretation would score as perfectly consistent on all problems, but would have low accuracy on AC and DA problems. A child who always guesses “yes,” or one who always guesses “not sure,” would also earn a high consistency score; however, there were only one or two of each in the present study. We rated consistency at three levels: on the MP problems alone; on the problems with certain solutions (MP and MT); and across all problem types (MP, MT, AC and DA). Each of these levels of consistency theoretically corresponds to a successively higher level of competence in reasoning, and indeed, within each age group, more children consistently interpreted MP problems than certain problems, and more children consistently interpreted certain problems than the complete set (see Table 10a-c).
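The consistency rating can be sketched in Python as follows. The sketch assumes four answers per problem form and assumes that a child counts as consistent at a given level only if every form included in that level meets the 3-out-of-4 criterion; the text does not spell out how forms were combined, and the function names are hypothetical.

```python
from collections import Counter


def is_consistent(answers, threshold=3):
    """True if the most frequent answer occurs at least `threshold` times."""
    if not answers:
        return False
    return Counter(answers).most_common(1)[0][1] >= threshold


def consistency_levels(answers_by_form):
    """Rate consistency on MP alone, on certain problems, and on the complete set."""
    mp = is_consistent(answers_by_form["MP"])
    certain = mp and is_consistent(answers_by_form["MT"])
    complete = certain and all(
        is_consistent(answers_by_form[form]) for form in ("AC", "DA")
    )
    return {"MP": mp, "certain": certain, "complete": complete}


if __name__ == "__main__":
    # A biconditional reasoner: consistent everywhere, but wrong on AC and DA.
    biconditional = {
        "MP": ["yes"] * 4,
        "MT": ["no"] * 4,
        "AC": ["yes"] * 4,
        "DA": ["no"] * 4,
    }
    print(consistency_levels(biconditional))
```

The biconditional example shows why consistency is scored separately from accuracy: the child above is rated consistent at every level while answering every AC and DA problem incorrectly.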

Specifically, working memory age influenced consistency for the complete set of problems, F(2, 83) = 5.302, p = .007, and for the certain problems, F(2, 83) = 4.377, p = .016, but not for MP problems alone, perhaps because MP was fairly accurate in all three groups (see Figure 3a,b). For the complete set of problems, the High group was 36% more consistent than the Middle group, p = .035, and 38% more consistent than the Low group, p = .006. For the certain problems, the High group was 40% more consistent than the Middle group, p = .045, and 39% more consistent than the Low group, p = .017. Thus, having the working memory of a 12 year old seems to improve consistency. Perhaps it helps participants recall how they answered similar questions earlier, and align their current answer with previous ones. Perhaps it helps them fully process the premises. Whatever the reason for the relationship between working memory age and consistency, it suggests that consistency should not be taken as evidence for structural intuition in these participants. Structural intuition bypasses working memory resources, whereas consistency seems to rely on them. However, because structural intuition by definition involves a consistent response to problems of a given type, a lack of consistency would count as evidence against structural intuition. We will return to the implications of consistency for structural intuition later.

Testing for Structural Intuition.

Planned analysis: evidence from covarying rote task. I had originally planned to test the task specificity of structural intuition by investigating whether controlling for executive function and working memory (measured by the rote task) would eliminate differences between gifted and typically developing children. However, no significant group effects existed, so task specificity could not be directly tested.

The only significant results from the analysis of variance were a main effect of problem type, F(3, 76) = 6.527, p < .001, where MP was easier than all other problems, and an interaction between abstractness and problem type, F(3, 76) = 4.331, p = .011, where MP was more accurate in the concrete condition and DA in the abstract condition. Covarying accuracy on the rote task reduced the abstractness-problem type interaction below significance, but it did not remove the difference between MP and other problems. Thus, while working memory may influence whether a problem is more easily solved in concrete or abstract form, it does not alone explain why MP is so much easier than other problems. This finding contradicts the explanation in mental model theory that MP is easier because it requires only one model, thus using less working memory. Instead, it may indirectly support the idea that MP may be intuitive in a way other problems often are not (Rips, 1983; Rips, 1994; Braine, 1990), which was the jumping-off point for proposing structural intuition.

Evidence from working memory correlations. Working memory contributed moderately to reasoning for both gifted and typically developing children. In gifted children, it correlated with the total number of problems solved (r = .413, p = .002), as well as with abstract problems (r = .411, p = .002), and a correlation with concrete problems approached significance (r = .264, p = .056). These correlations were higher than predicted.

For typically developing children, working memory correlated with the total number of problems solved (r = .461, p = .016) as well as with concrete problems (r = .390, p = .044), but it did not correlate significantly with abstract problems. Contrary to predictions, working memory did not correlate more highly with accuracy in typically developing than in gifted children.

The lack of correlation between working memory and abstract problem solving in typically developing children, and its presence in gifted children, could be interpreted as supporting structural intuition, but for typically developing rather than gifted children. Structural intuition predicts that abstract problems can be solved through a low-working memory process, provided that children recognize the structure of the problem (thus, if children have structural intuition, correlations with working memory should be low or nonexistent, particularly for abstract problems). This pattern occurred for typically developing, but not gifted, children. Given the similar answer patterns and accuracies for gifted and typically developing children, and the lack of evidence in the literature for structural intuition across all four types of conditionals in typically developing 5-8 year olds, this is probably not the correct interpretation. The lack of working memory correlation for typically developing children also could not have occurred because of floor effects for abstract problems, as they solved these problems at the same rate as concrete ones. One possibility is that working memory plays more of a role for gifted children because they solve abstract problems by imagining elaborate real-world situations, which they compare to the abstract conditional by analogy. As will be discussed later, some of the justifications gifted children gave for their answers support this possibility.

Interestingly, for gifted and typically developing children, the ability to solve abstract problems predicted overall accuracy better than the ability to solve concrete problems did. For gifted children, abstract problems correlated .838 with overall solving, p < .001, compared to .787 for concrete problems, p < .001. For typically developing children, abstract problems correlated .813 with overall solving, p < .001, compared to .688 for concrete problems, p < .001. These results suggest that children who reason better overall tend to do better on abstract problems or, conversely, that children who can solve abstract problems more accurately tend to be better reasoners. This supports my hypothesis that understanding abstract structure contributes to skill at reasoning.

Evidence from IQ Correlations. Verbal ability was related to reasoning ability, both within each ability group and across all participants. For gifted children, verbal ability was related to overall performance, as well as accuracy on concrete and abstract problems. By contrast, for typically developing children, it was related to MP and MT accuracy as well as overall performance. Across all participants, verbal ability was correlated with overall reasoning performance (r = .414, p = .015), as well as with accuracy on the concrete problems (r = .356, p = .039), MP problems (r = .377, p = .028), and MT problems (r = .348, p = .044); the relationship with MP and MT was probably driven by typically developing children.

If MP is an automatic inference and MT a fairly easy and accessible one, it seems odd that they would be sensitive to individual differences in verbal ability. Perhaps these correlations point to the importance of language processing for understanding the meaning of logical connectives like “if…then.” If typically developing children are still solidifying their understanding of “if,” greater verbal ability might lead to stronger concepts and thus greater consistency between problems. Gifted children may have a more solid understanding of MP and MT (and thus no significant correlations with verbal ability), but they are far from ceiling in their performance on concrete and abstract problems; thus, greater verbal ability may improve their consistency.

Oddly, neither overall IQ nor fluid intelligence was related to reasoning for gifted children. This lack of effect could stem from either a small sample size (only 7 took the BIA) or a narrow range of IQ scores in the gifted group. Overall IQ correlated with accuracy across all problems (r = .391, p = .048) and with MP solutions (r = .431, p = .028) for typically developing children, while fluid intelligence also correlated with MP (r = .499, p = .008), again suggesting room for improvement in typically developing children’s MP concepts.

Evidence from consistency. Structural intuition assumes that participants have a representation of the structure of a given problem (e.g., MP), which they use whenever they encounter that problem, regardless of its specific content. Thus, a lack of consistent responding indicates a lack of structural intuition.
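The notion of consistent responding used here can be made concrete with a small sketch. It assumes, as a simplification of my scoring, that a participant counts as consistent on a problem type only when every answer to problems of that type is identical, whether or not that answer is correct. The function name and the example participant are hypothetical and serve only as illustration.

```python
from collections import defaultdict


def consistency_by_type(responses):
    """Return {problem_type: bool} for one participant.

    `responses` is a list of (problem_type, answer) pairs, e.g. ("MP", "yes").
    Assumption: "consistent" means the same answer was given to every
    problem of that type, regardless of whether the answer is correct.
    """
    answers = defaultdict(set)
    for problem_type, answer in responses:
        answers[problem_type].add(answer)
    return {ptype: len(given) == 1 for ptype, given in answers.items()}


# Hypothetical participant (invented data, not taken from the study).
child = [
    ("MP", "yes"), ("MP", "yes"), ("MP", "not sure"),
    ("MT", "yes"), ("MT", "yes"),
]
result = consistency_by_type(child)
# MP is inconsistent (two different answers); MT is consistent even
# though "yes" is the wrong answer to MT.
```

Under this definition a child can be consistent and wrong, which is why consistency is treated separately from accuracy.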

There were low levels of consistency for all participants, particularly for the complete set of problems (see Table 10a). This indicates that children were responding differently to individual problems of each type and ignoring their invariant structure. Even for MP problems, on which even preschoolers display competency, 29 to 58% of gifted children and 25 to 43% of typically developing children responded inconsistently.

One might expect children to develop more consistent interpretations with age; however, consistency did not significantly increase with age for MP problems, F(3, 78) = .813, p = .490, for MT problems, F(3, 78) = 1.035, p = .382, or for the full set, F(3, 78) = .511, p = .676. However, the power to detect age differences between participants consistent on MP problems was only .211, which may have been too low to detect a significant effect.

Gifted children were not significantly more consistent than typically developing children on the full set of problems, F(1, 78) = 2.714, p = .103, on determinate problems, F(1, 78) = .124, p = .725, or even on MP problems, F(1, 78) = .869, p = .354. There were, however, non-significant trends towards group differences across the full set of problems (see Figure 5). Gifted children increased markedly in consistency after age 5, while the typical group stayed roughly the same from ages 5 to 8. Gifted children were also generally more consistent than typically developing children. However, neither of these findings reached significance, perhaps because for children consistent on MP problems, the power to detect main effects of giftedness was only .246, and the power to detect an interaction was .211. Thus, gifted children may in fact be more consistent, and develop consistency more rapidly with age. This could indicate the presence of structural intuition, but it could also signal the development of any consistent process for reasoning, including mental models.

It might appear that filtering out inconsistent responders would reduce within-group variability and uncover significant results not identified by the main ANOVA. However, it did not introduce any new main effects. Eliminating children who responded inconsistently to MP did reveal a new significant interaction between ability group, age and abstractness, F(3, 46) = 2.858, p = .047, which remained marginally significant after controlling for working memory age equivalency.

Specifically, typically developing children who respond consistently solve abstract problems more accurately than concrete ones at age 5, and concrete problems more accurately than abstract ones after age 5. Most likely, five year olds achieve higher accuracy rates for the abstract problems because they are particularly likely to be confused or overloaded by the nonsense words and say “not sure,” which happens to be the correct response to 50% of the problems. Older children may be more likely to attempt the abstract problems rather than say they don’t know, so unless they have an effective strategy for solving such problems, their accuracy may drop. By contrast, for gifted children, the means for concrete and abstract are virtually identical at all ages, and no consistent, interpretable pattern persists across ages (see Figure 6a-d). Controlling for working memory age equivalency reduced this interaction to marginal significance, F(3, 45) = 2.795, p = .051. One could interpret this finding as suggesting that the interaction occurs because of individual differences in working memory span: specifically, that typically developing five year olds tend to say “don’t know” to abstract problems because they lack the working memory to encode and process them. However, the difference between p values with and without the covariate is tiny and probably would not have been meaningful had the interaction not been barely significant to begin with. Covarying working memory actually changes the means and trends very little for either group (see Figure 6a-d), suggesting that the true impact of working memory was quite small and the interaction dropped below the significance threshold for purely statistical reasons.

Evidence from response justifications. Some participants erroneously relied on real-world information to answer the reasoning questions. For example:

If there is an apple in the fruit salad, there is a banana in the fruit salad.

There is a banana in the fruit salad.

Is there an apple in the fruit salad?

A typically developing five year old said there couldn’t be, because he’d never seen a fruit salad with an apple in it. In the MT version of the question, a gifted five year old said there must be an apple, because there “needs to be at least one fruit” in a fruit salad, and in the DA version, another gifted five year old said there had to be a banana for the same reason.

Some of these real-world intrusions were rather creative. A response to one of the most abstract concrete problems:

If there is a green circle, there is a red triangle.

There is not a red triangle.

Is there a green circle?

A typically developing six year old said there “should be, you can’t take away both and throw them away ‘cause people might want them.” This child had apparently interpreted the if-then statement as implying that a red triangle was present, and the second statement as saying that someone took the red triangle and threw it away. Another concrete problem said:

If John likes playing tag, Dave likes playing tag.

Dave likes playing tag.

Does John like playing tag?

A gifted six year old got the correct answer (“not sure”) by a very long and strange chain of reasoning. She started out by supposing that John and Dave were both the same size. If Dave was taller than John, John couldn’t like playing tag. But if John was taller than Dave, he would like tag, because the bigger guy knows he’d keep winning and would want to play. She ended up saying “not sure” because the problem did not provide information about John’s and Dave’s relative heights.

This response is interesting because it suggests some aspects of how gifted children might reason, which could be tested in future studies. Specifically, one could say that she excelled at carrying on a long and complex chain of thought, comparing not one but two different hypothetical situations, and she ended up with a sensible conclusion, given her framing of the problem. On the other hand, she imaginatively drew on irrelevant real-world considerations (the relative heights of the two characters) instead of recognizing that she was in a deductive situation that required her to draw conclusions from the premises exactly as stated. She was logical (able to draw conclusions from the premises) but not metalogical (thinking about the premises and conclusion as such; able to represent the problem as a problem, independently of its content). Thus, this response suggests a great deal of complexity and creativity, but no spontaneous tendency to think abstractly or metalogically.

One question even provoked participants to fill in causal information absent from the problem itself. For instance, in response to:

If Mrs. Jones’s class goes to the fire station, Mr. Smith’s class goes to the zoo.

Mrs. Jones’s class goes to the fire station.

Does Mr. Smith’s class go to the zoo?

A typically developing 8 year old wasn’t sure, because they could have gotten into a car crash, while another gave the same reply because it might be raining. And in response to:

If John likes playing tag, Dave likes playing tag.

John does not like playing tag.

Does Dave like playing tag?

A gifted six year old responded that Dave couldn’t like playing tag because if two people were playing tag and it’s them, and John stops playing, then Dave has no one to play with. Then it’s not fun, so he doesn’t like it.

Some participants also interpreted the abstract problems in unexpected ways. In response to:

If the tibble is bloopish, the basmy is lerfish.

The basmy is lerfish.

Is the tibble bloopish?

A typically developing six year old answered no, saying first that “bloopish” is a color, and then imagining it was something sticky. In response to:

If there is a striggish neave, there is a delkish skell.

There is a striggish neave. Is there a delkish skell?

A gifted six year old was not sure because “two different things can be alive or not alive.” This child had interpreted “there is a” as saying that the nonsense word was a creature that was alive, and “there is not a” as implying that the creature was not alive. For the problem:

If boogle is true, ronee is true.

Ronee is true.

Is boogle true?

The same child said boogle can’t be true because if they were fighting, one has to be true and one has to be lying. Rather than interpreting “boogle” and “ronee” as states of affairs or propositions with truth values, this child interpreted them as people who could be telling the truth or lying.

These interpretations were idiosyncratic, and may not represent those of participants who chose not to explain their answers. However, they suggest that gifted children were not using structural intuition to solve these problems, even the MP ones with whose structure they were presumably familiar. These children engaged in complex trains of thought and interpreted the problems in creative ways, rather than drawing deductions from the problem structure. The current results do not distinguish whether they do so because, like unschooled adults, they fail to understand the pragmatic demands of a laboratory reasoning task (Dias, Roazzi & Harris, 2005), or because they lack structural intuition at this age.

However, the striking tendency of participants to fill in missing causal information may suggest something about how they interpret the meaning of the “if…then” relation. In correct logic, because the content of the antecedent and consequent is irrelevant to deductive validity, the states of reality denoted by the antecedent and consequent could occur because of covariation or contingency (“if there is a sticker on the box, there is a fruit inside”), or even arbitrary coincidence (“if there is an X on the chalkboard, there is a Y on the chalkboard”). By contrast, in everyday usage (from which children might derive their structure-answer mappings), “if…then” may imply a causal relationship between premise and conclusion, not just covariation or contingency. This assumption makes sense in the real world for two reasons. First, in this complex world, two events normally do not consistently covary unless they have some sort of meaningful causal relationship (if only an indirect one). Second, according to the pragmatic criterion of relevance, a person would not choose to mention two events together unless he or she perceived them to be somehow related. Thus, even if problems are selected to have no necessary or causal connection between premise and conclusion (“if the boots have polka dots, the scarf has stripes”; “if John likes playing tag, Dave likes playing tag”), participants will assume that these statements have been joined together into an if-then statement because they are somehow causally related. Children may think there has to be a causal connection between John and Dave (perhaps that they are friends who like to play together), even though they realize that just because one person likes something, another person need not like it.

If deductive statements in a child’s environment all have logically incorrect meanings like this, then a child may implicitly learn an incorrect interpretation of if-then statements. This interpretation, in turn, might interfere with (implicitly) recognizing the consistent structure behind each problem’s causal relation by directing attention away from structure and towards surface features. In other words, participants in the present study may have failed to develop structural intuition because they incorrectly interpret “if a then b” as “there is a causal relationship between a and b.” Because problems differed idiosyncratically in the ease of filling in causal information, the sort of causal information that could be inferred, and the degree to which such filling in would lead to incorrect answers, children could interpret “if-then” statements in a consistent way (as causal relations) but end up with inconsistent answers to particular problem types (e.g., MP). This might explain the surprisingly low consistency in participants’ responses.

Surprising Findings.

Low Accuracy for MP & MT. Accuracy for MP and MT problems was surprisingly low in both gifted and typically developing children. In prior studies of children between preschool and third grade, accuracy rates for MP were between 80-100% and were often similarly high for MT (Wolf & Shigaki, 1983; Chao & Cheng, 2000; Hawkins et al, 1984; Markovits, 2000; Markovits et al., 1998; Markovits et al., 1996; Barrouillet, Grosset & Lecas, 2000; Taplin, Staudenmayer & Taddonio, 1974). In the present study, accuracy rates for MP were only 56% for gifted children (SD: .34) and 53% for typically developing children (SD: .36), while those for MT were only 44% for gifted children (SD: .38) and 37% for typically developing children (SD: .36). The low accuracy rates for gifted children in particular were surprising, as Wolf and Shigaki (1983) found that gifted children as young as 5 answered 73.3% of MP and 56.7% of MT problems correctly, and 8 year olds reached 93.3% accuracy on MP and 91.7% on MT. The lower accuracy in the present study probably stems from the use of abstract problems and the lack of meaningful causal links between the antecedent and consequent in the concrete problems.

Lack of significant age differences. Older children did not reason significantly more accurately than younger ones. However, an effect of age group approached significance, F(3, 76) = 2.266, p = .087. Because the observed power was only .552, children may in fact have gotten more accurate with age, but there was not sufficient power to detect this change.

Positive/Negative errors. Children made two basic types of errors: certainty errors and positive/negative errors. Certainty errors consist either of supplying a determinate answer to a problem with no determinate answer or responding “not sure” to a problem with a determinate answer; both types of certainty errors have been reported both in children and adults (Schroyens, Schaeken, & D’Ydewalle, 2001; Barrouillet, Grosset & Lecas, 2000; Barrouillet & Lecas, 1998, 1999; Chao & Cheng, 2000; Wildman & Fletcher, 1979; Markovits, 2000; Markovits et al, 1996; Roberge, 1970; O’Brien & Shapiro, 1968; Taplin, Staudenmayer & Taddonio, 1974; Byrne, 1989; Bonnefon & Politzer, 2010; Verscheuren et al, 2001; Johnson-Laird, Byrne & Schaeken, 1992). However, children of all ages committed another, much more surprising sort of error. For MP and AC problems, they sometimes answered “no,” as follows:

If the boots have polka dots, the scarf has stripes.

The boots have polka dots. Does the scarf have stripes?

_______________________________________________

No, the scarf can’t have stripes.

If the boots have polka dots, the scarf has stripes.

The scarf has stripes. Do the boots have polka dots?

_____________________________________________

No, the boots can’t have polka dots.

Meanwhile, they answered “yes” to MT and DA problems, as follows:

If the boots have polka dots, the scarf has stripes.

The scarf does not have stripes. Do the boots have polka dots? (MT)

_________________________________________________________

Yes, the boots definitely have polka dots.

If the boots have polka dots, the scarf has stripes.

The boots do not have polka dots. Does the scarf have stripes? (DA)

________________________________________________________

Yes, the scarf definitely has stripes.

Such errors fit neither the logically correct conditional interpretation nor the incorrect but common biconditional one, and it is not immediately obvious how one could comprehend the premises and still produce these answers.
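The claim that these answers fit neither interpretation can be checked by enumeration. The sketch below is my own illustration rather than part of the analysis: it lists the possible worlds allowed by a given reading of "if p then q", filters them by the second premise, and reads off the answer. The function and variable names are hypothetical.

```python
from itertools import product


def answer(interpretation, minor, query):
    """Answer a question about `query` under one reading of "if p then q".

    interpretation: function (p, q) -> bool, the worlds the first premise allows.
    minor: (variable, value), the second premise, e.g. ("p", True) for MP.
    query: the variable asked about, "p" or "q".
    """
    worlds = [
        {"p": p, "q": q}
        for p, q in product([True, False], repeat=2)
        if interpretation(p, q)
    ]
    var, value = minor
    remaining = [w for w in worlds if w[var] == value]
    values = {w[query] for w in remaining}
    if values == {True}:
        return "yes"
    if values == {False}:
        return "no"
    return "not sure"


# Second premise and question for each of the four inference forms.
PROBLEMS = {
    "MP": (("p", True), "q"),
    "MT": (("q", False), "p"),
    "AC": (("q", True), "p"),
    "DA": (("p", False), "q"),
}


def conditional(p, q):
    return (not p) or q


def biconditional(p, q):
    return p == q


conditional_answers = {k: answer(conditional, m, x) for k, (m, x) in PROBLEMS.items()}
biconditional_answers = {k: answer(biconditional, m, x) for k, (m, x) in PROBLEMS.items()}

# The surprising response pattern described above.
observed_errors = {"MP": "no", "MT": "yes", "AC": "no", "DA": "yes"}
```

The conditional reading gives yes, no, not sure, not sure for MP, MT, AC and DA, and the biconditional reading gives yes, no, yes, no. The observed pattern of "no" to MP and AC and "yes" to MT and DA disagrees with both readings on every problem type.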

To trace developmental trajectories in positive/negative and certainty errors, I added up the errors for each group at each age (e.g., gifted five year olds, typically developing five year olds, etc.), assigned them to specific categories (e.g., no for MP, not sure for MP, yes for DA, no for DA, etc.), and calculated the proportion of each of these errors (see Table 11). Positive/negative answers were defined as the sum of no answers to MP, yes answers to MT, no answers to AC, and yes answers to DA. Certainty errors were defined as the sum of not sure answers to MP, not sure answers to MT, and biconditional responses to AC and DA.
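The error categories just defined can be written out as a small lookup. This is a minimal sketch of the scoring scheme rather than the scoring procedure actually used; it assumes three response options ("yes", "no", "not sure") and computes each error type as a proportion of all errors. The trial data are invented for illustration.

```python
from collections import Counter

# Logically correct answers for each inference form.
CORRECT = {"MP": "yes", "MT": "no", "AC": "not sure", "DA": "not sure"}
# Positive/negative errors: "no" to MP or AC, "yes" to MT or DA.
POS_NEG = {"MP": "no", "MT": "yes", "AC": "no", "DA": "yes"}


def classify(problem_type, response):
    """Label one response as correct, positive/negative, or certainty error."""
    if response == CORRECT[problem_type]:
        return "correct"
    if response == POS_NEG[problem_type]:
        return "positive/negative"
    # What remains: "not sure" to MP or MT, and biconditional answers
    # ("yes" to AC, "no" to DA).
    return "certainty"


def error_proportions(trials):
    """Proportion of each error type among all errors (correct trials excluded)."""
    labels = [classify(ptype, resp) for ptype, resp in trials]
    errors = Counter(label for label in labels if label != "correct")
    total = sum(errors.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in errors.items()}


# Hypothetical trials (invented data, not taken from the study).
trials = [("MP", "no"), ("MP", "yes"), ("AC", "yes"), ("DA", "yes"), ("MT", "not sure")]
props = error_proportions(trials)
```

Because every error falls into exactly one of the two categories, the two proportions sum to 1 for any group that made at least one error.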

Because positive/negative errors seemed to demonstrate a lack of comprehension of the premises or their relationship, I predicted that such errors would be most common in five year olds and decrease with age. An Ability x Age ANOVA with positive/negative errors as the dependent variable confirmed this trend; there was a significant main effect of age, F(3, 78) = 7.647, p < .001. Specifically, five year olds committed significantly more positive/negative errors than 6 year olds (mean difference: .279, p = .003), 7 year olds (mean difference: .278, p = .004), and 8 year olds (mean difference: .356, p < .001), none of which differed significantly from each other. Typically developing children did not commit more positive/negative errors than gifted children (.399 vs. .309). There was no group x age interaction. Although typically developing 7 year olds made many more positive/negative errors than gifted 7 year olds (.400 vs. .208), this was not a large enough difference to drive a significant interaction (see Figure 7).

In a two-factor Ability x Age ANOVA, certainty errors increased with age, F(1, 78) = 7.647, p < .001 (see Figure 8). Specifically, five year olds made significantly fewer certainty errors than 6 year olds (mean difference: -.279, p = .003), 7 year olds (mean difference: -.278, p = .004), and 8 year olds (mean difference: -.356, p < .001), while the older children did not differ significantly from each other.

Gifted children did not make significantly fewer certainty errors than typically developing ones; if anything, they committed slightly more (.691 vs. .601), though this difference was not significant. Thus, gifted children do not appear to have an advantage either in understanding that AC and DA problems lack determinate answers (as predicted), or in drawing valid MP and MT inferences despite conflicting real-world information.

I am not sure why participants would give negative answers to positive problems, nor does the reasoning literature clarify matters. However, research on children’s understanding of tautologies and contradictions may explain why participants gave positive answers to negative problems. Morris and Sloutsky (2002) found that when two halves of a statement conflict, young children tend to ignore the second half and draw conclusions based on the first half. For example, they will interpret the tautology “the ball will fall or it will not fall” as simply, “the ball will fall,” and the contradiction “the ball will fall and it will not fall” as “the ball will fall.” This error is called a “cut” because it effectively cuts off half of a statement.

Several of my younger participants, when given DA problems (if a is true, b is true; a is not true) expressed confusion about what I meant (“but you just said a is true”). Repeating the if-then statement with emphasis on the if did not clear up the confusion; their misinterpretation seemed to be due not to a mishearing but to a lack of understanding that “if” implies multiple possibilities. At least some children, then, perceived DA as a contradiction. Following Morris and Sloutsky (2002), they may have performed a cut, as follows.

Problem:

If the boots have polka dots, the scarf has stripes.

The boots do not have polka dots. Does the scarf have stripes?

Interpretation:

The boots have polka dots and the scarf has stripes.

Does the scarf have stripes?

Correct answer (given interpretation):

Yes, the scarf has stripes.

Participants could also have “cut” MT, in a similar fashion.

Problem: If the boots have polka dots, the scarf has stripes.

The scarf does not have stripes. Do the boots have polka dots?

Interpretation: The boots have polka dots, and the scarf has stripes.

Do the boots have polka dots?

Correct answer (given interpretation):

Yes, the boots have polka dots.

Both these cases would lead children to provide positive answers to MT and DA, and a surprisingly large number do so (see Table 12a-c). Fully one third of the gifted five year olds consistently responded yes to DA and to MT; 29% of typically developing five year olds consistently responded “yes” to DA, while fully 57% of them did so for MT. The percentages are still higher when we include participants who responded “yes” half the time (on 2 out of 4 occasions). Overall, 67% of gifted five year olds and 86% of typically developing five year olds make a consistent MT or DA error (see Table 8a-c); a “cut” explains why so many found these responses compelling.
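The "cut" account can be sketched as a procedure. This is a hypothetical model of the proposed misreading, not something the data establish: the child reads "if p then q" as the conjunction "p and q", drops any second premise that conflicts with it, and answers from the first statement alone. The names below are illustrative assumptions.

```python
# Second premise and question for each of the four inference forms.
PROBLEMS = {
    "MP": (("p", True), "q"),
    "MT": (("q", False), "p"),
    "AC": (("q", True), "p"),
    "DA": (("p", False), "q"),
}


def cut_reading(minor, query):
    """Return (answer, was_cut) under the hypothesized "cut" misreading."""
    # Step 1: "if p then q" is taken to assert both p and q.
    beliefs = {"p": True, "q": True}
    var, value = minor
    # Step 2: a second premise that contradicts the conjunction is ignored.
    was_cut = beliefs[var] != value
    # Step 3: the question is answered from the first statement alone.
    answer = "yes" if beliefs[query] else "no"
    return answer, was_cut


cut_results = {name: cut_reading(minor, query) for name, (minor, query) in PROBLEMS.items()}
```

In this sketch every problem receives "yes", with a cut needed only for MT and DA. It therefore reproduces the positive answers to MT and DA but not the "no" answers to MP and AC, which remain unexplained.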

For gifted children, consistent interpretation of MP problems was related to fewer positive/negative errors (r = -.330, p = .016) and more certainty errors (r = .330, p = .016), as was consistent interpretation of the determinate problems (r = -.361, p = .008 and r = .361, p = .008) and of all types of conditionals (r = -.437, p = .001 and r = .437, p = .001). The effects in gifted children appeared to drive a similar relationship across all participants for consistent interpretations of MP (r = -.311, p = .005 and r = .311, p = .005), the determinate problems (r = -.344, p = .002 and r = .344, p = .002), and all types of conditionals (r = -.329, p = .003 and r = .329, p = .003). The relationship between consistency and error type was weaker in typically developing children. There was no correlation across all problem types. Correlations for MP consistency only approached significance (r = -.344, p = .079 and r = .344, p = .079), as did correlations for determinate problems (r = -.342, p = .081 and r = .342, p = .081).

In other words, for gifted children, participants who responded consistently to each type of conditional tended to make fewer positive/negative errors and more certainty errors. In a qualitative sense, the more consistent participants reasoned better. For typically developing children, the same relationship probably existed, but there were not enough participants to detect it.

Interestingly, the relationship between working memory, consistency and error type differs in gifted and typically developing children. For gifted children, children with a higher working memory age interpreted MP and MT problems more consistently (r = .285, p = .039), and made fewer positive/negative errors (r = -.470, p < .001), and more certainty errors (r = .470, p < .001). By contrast, for typically developing children, there was no significant relationship between working memory age and consistency or error type. These results contradict my predictions because they suggest that for gifted children, though not for typically developing children, those with higher working memory reason better.

Discussion

Gifted five to eight year olds made a poor test case for structural intuition. We failed to extend the results of Wolf and Shigaki’s (1983) study, as gifted children did not show adultlike accuracy on the uncertain or even on the certain problems. While gifted children performed better than chance overall, on MP and MT, and on both concrete and abstract problems, these accuracies were only between 40 and 56 percent. As in prior literature, gifted children had working memory spans about 1 SD above the mean, but contrary to our predictions, this represented a significant difference from their typically developing peers. Contrary to my predictions, gifted children were neither significantly more consistent than typically developing children, nor better at solving abstract conditionals.

Like typically developing children, gifted children appear to answer based on real-world knowledge, perhaps because they interpret “if a then b” as “there is a causal connection between a and b.” Furthermore, at least at younger ages, they may oversimplify their interpretation of conditionals, leading to erroneous positive conclusions where they should make negative ones. The most conclusive evidence against structural intuition in gifted children is their highly inconsistent interpretations of all problems, including, surprisingly, MP. Fewer than half of gifted children (between 8 and 41 percent, depending on age) had a consistent interpretation for each problem type, even an incorrect one. Even for MP, a substantial proportion (ranging from 29 to 58 percent) answered inconsistently. Because structural intuition by definition involves retrieval of a specific answer to all instances of a particular type of problem, this inconsistency suggests that these children have not yet developed structural intuition, even for MP.

One might interpret the inconsistency differently, arguing that one might have structural intuition for a problem type and still experience interference on some problems. For example, although MP is generally intuitive for adults, they sometimes answer that “nothing follows” because they have been given extra information that suggests that the antecedent does not apply, as in the following case (Byrne, 1989).

If Ruth has an essay to write, she will study late in the library.

If the library stays open, she will study late in the library.

Ruth has an essay to write.

Subjects may erroneously conclude that nothing follows because they do not know whether the library is open late.

I would argue that these participants’ pattern recognition process has failed and they no longer recognize this as an MP problem. Not all participants make this error. Someone with stronger structural intuition would recognize the MP structure despite the distracting “disabling” information and draw the valid conclusion that Ruth will study late in the library. Thus, one might say that the subjects who make these sorts of errors have a less well-developed structural intuition than those who do not. In a developmental context, a younger child might be less consistent than an older child because his representations of MP structure are weaker and thus more subject to interference. Thus, one can say that participants in the present study have not fully developed structural intuition for most conditionals. I think this is the only principled interpretation one can make, because otherwise we lack empirical grounds for distinguishing children who have no conception of MP whatsoever and answer based on contingent features of the stimuli from children who have weak conceptions of MP that the experimental materials simply do not elicit. Alternatively, one could make the problems easier and look for age differences, but this ultimately would not resolve the competence vs. performance issue. No matter how clear the instructions and concrete the problems, one can always point to interfering factors, because laboratory conditional reasoning tasks are intrinsically artificial, with expectations that differ from those surrounding real-life use of conditionals (Leevers & Harris, 1999).

Oddly, there were no significant age differences in accuracy. While an examination of the means indicated that eight year olds reasoned more accurately and consistently than five year olds, the difference did not reach significance. This may have occurred because of large within-age variability (e.g., some individuals successfully solving all MT problems and others none), compared with small between-age variability. However, significant qualitative age differences exist. Five year olds made significantly more positive/negative errors and significantly fewer certainty errors than older children. Certainty errors, either giving a determinate answer to an indeterminate problem or providing an indeterminate answer to a determinate one, have often been reported (Schroyens, Schaeken, & D’Ydewalle, 2001; Barrouillet, Grosset & Lecas, 2000; Barrouillet & Lecas, 1998, 1999; Chao & Cheng, 2000; Wildman & Fletcher, 1979; Markovits, 2000; Markovits et al., 1996; Roberge, 1970; O’Brien & Shapiro, 1968; Taplin, Staudenmayer & Taddonio, 1974; Byrne, 1989), and could be said to make sense given pragmatic implicatures (Bonnefon & Politzer, 2010; Verschueren et al., 2001) or given an interpretation of “if…then” as meaning “if and only if” (Verschueren et al., 2001; Johnson-Laird, Byrne & Schaeken, 1992; Bonnefon & Politzer, 2010). Positive/negative errors (that is, giving a negative response to MP or AC or a positive response to MT or DA) have not previously been reported, and they make much less sense. Particularly for MP, it is hard to see how someone who understands the premises could draw this sort of conclusion. Five year olds may make more positive/negative errors because they fail to integrate the premises properly; for the negative problems in particular, they may interpret the information as contradictory and then cut the second premise (Morris & Sloutsky, 2002). As children get older they appear to integrate and comprehend the premises better and graduate to making certainty errors.

Structural intuition makes different predictions than mental model and dual process theories about the role of working memory in conditional reasoning. Structural intuition largely bypasses working memory resources, and any correlations found should be quite low and stem from the demands of encoding and recalling aurally presented problems. By contrast, mental model theory assumes that, regardless of the modality of presentation, the reasoning process loads working memory so heavily that certain problems (the uncertain ones) can only be solved accurately with a high working memory span (that of someone at least 12 years old). Dual process theory similarly argues for the necessity of a high working memory span for accurate reasoning, particularly on uncertain problems. The present study does not contradict the claim that such a high working memory span is needed to solve the uncertain problems, because none of the participants solved them successfully. This finding is also compatible with mental model and dual process theories, which claim only that children with the necessary working memory span are able to reason accurately, not that they will do so.

Working memory correlates moderately (between .3 and .5) with overall accuracy and various subsets of problems both for gifted and typically developing children. Children with higher working memory spans do reason better. Specifically, children with the working memory of a 12 year old reasoned more accurately than those with the working memory of a child younger than 10. However, the specific age-12 threshold itself seems unimportant, as children with the working memory of a 12 year old did not reason more accurately than children with the working memory of a 10-12 year old.

Children with the working memory of a person older than 12 also answered more consistently than children with less advanced working memory spans. Markovits, Barrouillet and colleagues do not predict widespread inconsistency at all; rather, children are supposed to move from a consistent conjunctive to a consistent biconditional to a consistent conditional interpretation (Barrouillet, Grosset & Lecas, 2000). However, they might claim that inconsistency stems from fluctuations in available working memory that enable more models to be built at some times than at others. By contrast, I have claimed that working memory may help indirectly, by allowing children to compare their earlier responses to a particular type of problem with their currently pending response. In other words, it enables children to recognize whether or not they are answering consistently, and thus check their thinking before they answer. Because the present study does not provide a sensitive measure of mechanisms, it cannot differentiate between these two possibilities.

We found that MP was solved significantly more accurately than all other problems, and that MP was more accurate in the concrete condition, while DA was more accurate in the abstract condition. Mental model theory would claim that the first effect, at least, was driven by working memory demands: MP requires only one model to solve, while the other problems require 2 or 3 (Markovits, 2002). In fact, neither of these effects was driven by working memory; controlling for working memory did not eliminate them. Thus, working memory does not appear to be as all-determining a factor as mental model theories might predict.
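To make concrete what "controlling for working memory" means, the following is a minimal sketch of a first-order partial correlation. It is a simpler stand-in for the covariance analysis, not the analysis used in the present study. The function name and the example correlation values are hypothetical and chosen only for illustration.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation between x and y, controlling for z.

    Placeholder roles (assumptions for illustration):
      x = a predictor such as problem content,
      y = reasoning accuracy,
      z = working memory span.
    """
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denominator == 0:
        raise ValueError("undefined when |r_xz| or |r_yz| equals 1")
    return (r_xy - r_xz * r_yz) / denominator


# Hypothetical correlations, NOT results from this study.
r_predictor_accuracy = 0.40  # x with y
r_predictor_wm = 0.10        # x with z
r_accuracy_wm = 0.40         # y with z (in the moderate range reported above)

remaining = partial_correlation(r_predictor_accuracy, r_predictor_wm, r_accuracy_wm)
print(f"Association remaining after controlling for working memory: {remaining:.2f}")
```

If the remaining association stays close to the raw one, as it does with these illustrative values, working memory alone does not account for the effect.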

On the whole, the present study suggests a large but potentially nuanced involvement for working memory in reasoning. The results are more consistent with mental model and dual process theory than with structural intuition, but in some cases are weaker than mental model theory in particular might predict.

Participants were far less accurate on MP and MT problems than in other developmental studies with preschoolers through third graders (Wolf & Shigaki, 1983; Chao & Cheng, 2000; Hawkins et al., 1984; Markovits, 2000; Markovits et al., 1998; Markovits et al., 1996; Barrouillet, Grosset & Lecas, 2000; Taplin, Staudenmayer & Taddonio, 1974). This difference may have occurred because I went to great lengths to eliminate causal and other real-world relationships between the antecedent and the consequent, whereas most developmental studies include many problems with such relationships. If children can rely both on real-world knowledge and on any logical competence they possess to solve problems, they will clearly do better than if they are forced to rely on logical competence alone. Thus, the accuracy levels in the present study may indicate the baseline deductive ability of 5-8 year old children. This ability appears to be more reliable than chance, but far from ceiling. Thus, 5-8 year old children are rational, at least on problems with determinate answers, but their deductive ability has much room to develop.

Future directions.

Methodological limitations may have interfered with finding significant differences between gifted and typically developing children. First, the gifted children may simply not have been sufficiently different from the typically developing ones. To get a conservative estimate of giftedness and to maximize the number of children in the typically developing group (who were more difficult to recruit), I used a more conservative IQ cutoff than did Center for Talent Development. Center for Talent Development requires children to score in the 95th percentile, while I required them to score in the 98th percentile. Several children whom I placed in the typically developing group scored between the 95th and 98th percentile, and thus may have overlapped in IQ and other cognitive characteristics with the gifted group. Conversely, one child who I learned participated in Center for Talent Development after I obtained her IQ score earned a full scale IQ of 123 and would otherwise have been placed in the typically developing group. Without obtaining the scores of Center for Talent Development participants, there is no way to know how serious this overlap may be. If I were to do a new version of the present study, I would either give everyone the BIA with the same criterion, or I would simply use the same cutoff as the gifted program from which I recruit.

Perhaps subgroups of gifted children differ in their abstract reasoning ability, and I undersampled the subgroups with a facility for abstract reasoning and oversampled those with no particular talent for it. For instance, for gifted children who received a BIA score, the highest IQ was 142. Dramatic differences in cognitive characteristics, including abstract reasoning ability, have been reported between moderately gifted children with IQs in the 130s to 140s and highly gifted children with IQs above 160 (Lovecky, 1994). A sample with predominantly highly gifted children might have found significant results. However, I cannot draw this conclusion with any certainty because the gifted children in Wolf and Shigaki’s study (1983) earned an average IQ of 143 (SD: 11.1), which would be lower today due to the Flynn effect (see footnote 10; Silverman, 2009; Lohman, 2009). In other words, Wolf and Shigaki’s participants had roughly the same IQ as gifted children in the present study, but reasoned much more accurately.
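The Flynn-effect adjustment mentioned above can be illustrated with a rough linear sketch. It uses the rate of 3 IQ points per decade given in footnote 10. The 28-year gap between norms is a hypothetical placeholder, since the norming dates of the tests involved are not stated here, and a linear correction is only an approximation.

```python
FLYNN_POINTS_PER_DECADE = 3.0  # rate cited in footnote 10


def flynn_adjusted_iq(observed_iq: float,
                      years_between_norms: float,
                      points_per_decade: float = FLYNN_POINTS_PER_DECADE) -> float:
    """Re-express an IQ obtained on older norms relative to newer norms,
    assuming population scores rise linearly over time."""
    return observed_iq - points_per_decade * years_between_norms / 10.0


wolf_shigaki_mean_iq = 143.0      # mean reported for Wolf & Shigaki (1983)
assumed_years_between_norms = 28  # ASSUMPTION: hypothetical gap, not from the source

adjusted = flynn_adjusted_iq(wolf_shigaki_mean_iq, assumed_years_between_norms)
print(f"Adjusted mean IQ under these assumptions: {adjusted:.1f}")
```

Under these assumed values the adjusted mean falls in the mid-130s, which is in line with the statement that the two gifted samples had roughly the same IQ. A different assumed gap would shift the estimate accordingly.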

More importantly, the instructions and answer choices provided may have interfered with performance and also made it harder to assess. First, because the third answer choice, “not sure,” meant both “don’t know” and “nothing follows,” the meaning of these responses was obscured. It may also have inflated accuracy for younger children who admitted confusion on abstract problems, or those of any age who preferred to hedge their bets. The spontaneously used vocabulary of participants suggests a better way to assess certainty in future versions of this study. A number of children spontaneously used phrasing like “can,” “probably,” and “doesn’t have to,” all of which were scored as “not sure.” Such phrases, then, are within children’s repertoire and could be used in future studies to ask more directly about certainty. Thus, a future version of this study should use “can” instead of “not sure” as the third answer choice.

Before beginning the reasoning task, children solved three example problems intended to explain the meaning of the answer choices. These examples may have unwittingly encouraged children to focus on pragmatic rather than logical considerations. In the problem intended to illustrate the “can’t” response, the reason was that a person could not be in two places at once. In the example intended to indicate that “not sure” could mean “nothing follows for certain,” the rationale was that the question asked about an entire day, and the premises only described a particular moment during that day. Thus, these examples may have communicated that we wanted children to make predictions in response to tricky questions about real-world situations. Judging by some participants’ complex and erroneous justifications based on real-world knowledge, they performed exactly as “expected.”

Instructions mentioned that the problems would concern “imaginary” events, but this may not have been a strong enough cue to set aside real-world knowledge. Other studies that give such an instruction emphasize this point much more strongly (Dias & Harris, 1988, 1990; Hawkins et al., 1984; Leevers & Harris, 1999), and future versions of this study probably should as well.

Footnote 10. Whereas mean IQ scores have risen at a rate of 3 points every 10 years for the population as a whole, the gifted validation samples for the WISC-IV and SB-5, who had been identified by scores above 130 on earlier tests, obtained full-scale IQs of 123.5 and 124, respectively (Silverman, 2009). This drop suggests that gifted children have maintained the same intelligence while the population in general has increased, thus causing a drop in IQ relative to the general population.

While these methodological changes would improve the present study, it may be that 5-8 year olds, regardless of ability level, are not the best age group in which to seek evidence for structural intuition. It would be more promising to follow up with a study of adults, testing each of the specific characteristics of structural intuition. For instance, speed could be tested through RTs, effortlessness through effort ratings, and feelings of certainty through ratings (e.g., “Do you feel your answer ‘has to be true’?” accompanied by a 1-7 Likert scale). Awareness of abstract structure and task-specificity could be tested in the same fashion as in the present study, by comparing concrete and abstract problems and by covarying working memory and executive function abilities. With adults, protocol analysis would become more of an option; a large number of repetitions of the premises or answers like “I don’t know how I know, I just know” should characterize structural intuition. If structural intuition is related to skill at reasoning, then greater degrees of speed, effortlessness, feelings of certainty, etc. should correlate with higher accuracy. If at least some adults demonstrate structural intuition, then it might be worth examining increasingly younger age groups to trace its development for specific problems.
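The predicted relationship between speed and accuracy could be checked with an ordinary Pearson correlation. The sketch below is only an illustration of that check: the function is a plain textbook implementation, and the response times and accuracy values are made-up placeholders, not data from any study.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return covariance / math.sqrt(var_x * var_y)


# Made-up per-participant values for illustration only.
mean_rt_seconds = [2.1, 3.4, 2.8, 4.0, 1.9]
proportion_correct = [0.90, 0.60, 0.75, 0.50, 0.95]

r = pearson_r(mean_rt_seconds, proportion_correct)
print(f"r(RT, accuracy) = {r:.2f}")
```

If structural intuition underlies skilled reasoning, faster responses should go with higher accuracy, so the correlation between RT and accuracy should be negative. The same function could be applied to effort or certainty ratings.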

The present study aimed simply to demonstrate the existence of structural intuition, so many details about its mechanisms remain unexplored. Specifically, future studies should explore how the implicit learning process works, what stimuli trigger it and which impede it, why it applies earlier to some problems than others, whether and why it might differ between individuals, and how it interacts with well-studied interpretational factors such as pragmatic implicature. In addition, it would help to specify the maximum range of problems that can be solved using structural intuition. Are there some that no one solves in this way, or can people develop structural intuition for any type of problem, given sufficient exposure?

The roles of logical versus linguistic structure in structural intuition should be untangled, if possible. Does the implicit learning process absorb linguistic structure, logical structure, or both? Potentially, “if-then” sentences might provide the cue to retrieve the appropriate logical structure and answer. One might explore these issues by using sentence structures other than “if a then b,” such as “b follows from a” or “b if a.” Any differences in interpretation may result from a lack of structural intuition due to lack of exposure to such sentences; participants might have to painstakingly map these statements by analogy onto the more familiar “if a then b” statements. Such a study would probably need to use adults, because children might have difficulty simply due to a limited vocabulary or difficulties with complicated sentence structures.

Given the limitations of the present study, there are many fruitful avenues for research on structural intuition. If this process can, in fact, be demonstrated, it could help explain why some deductions seem so intuitive, and perhaps, how skilled reasoning develops.

References

Ali, Nilufa, Schlottmann, Anne, Shaw, Abigail, Chater, Nick & Oaksford, Mike (2010). Causal discounting and conditional reasoning in children. In Oaksford, Mike & Chater, Nick, eds. (2010). Cognition and conditionals: probability and logic in human thinking. Oxford: Oxford University Press.

Allen, R. & Reber, Arthur (1980). Very long term memory for tacit knowledge. Cognition 8:2, 175-185.

Andrews, Glenda & Halford, Graeme S. (1998). Children’s ability to make transitive inferences: the importance of premise integration and relational complexity. Cognitive Development 13, 479-513.

Antshel, Kevin M., Faraone, Stephen V., Stallone, Kimberly, Nave, Andrea, Kaufmann, Felice A., Doyle, Alysa, Fried, Ronna, Seidman, Larry & Biederman, Joseph (2007). Is attention deficit hyperactivity disorder a valid diagnosis in the presence of high IQ? Results from the MGH longitudinal family studies of ADHD. Journal of Child Psychology & Psychiatry 48:7, 687-694.

Baars, Bernard J. (1997). In the theater of consciousness: Global workspace theory, a rigorous scientific theory of consciousness. Journal of Consciousness Studies 4:4, 292-309.

Bacon, Alison, Handley, Simon, Dennis, Ian & Newstead, Stephen (2007). Reasoning strategies: the role of working memory and verbal-spatial ability. European Journal of Cognitive Psychology 20:6, 1065-1088.

Baddeley, Alan (1992). Working memory. Science 255, 556-559.

Baddeley, A. D., & Hitch, G. J. (1974). Working memory. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 8, pp. 47-89). New York: Academic Press.

Barfurth, Marion A., Ritchie, Krista C., Irving, Julie A., & Shore, Bruce M. (2009). A metacognitive portrait of gifted learners. In Shavinina, Larisa V., ed. (2009). International Handbook on Giftedness. Springer Netherlands.

Barrouillet, Pierre, Grosset, Nelly, & Lecas, Jean-Francois (2000). Conditional reasoning by mental models: chronometric & developmental evidence. Cognition 75, 237-266.

Barrouillet, Pierre & Lecas, Jean-Francois (1998). How can mental models theory account for content effects in conditional reasoning? A developmental perspective. Cognition 67, 209-253.

Barrouillet, Pierre & Lecas, Jean-Francois (1999). Mental models in conditional reasoning and working memory. Thinking & Reasoning 5:4, 289-302.

Benbow, Camilla Persson, & Minor, Lola L. (1990). Cognitive profiles of verbally and mathematically precocious students: implications for identification of the gifted. Gifted Child Quarterly 34:1, 21-26.

Best, John R., Miller, Patricia H., & Jones, Lara L. (2009). Executive functions after age 5: changes and correlates. Developmental Review 29, 180-200.

Bonnefon, Jean-Francois, & Politzer, Guy (2010). Pragmatic conditionals, conditional pragmatics, and the pragmatic component of conditional reasoning. In Oaksford, Mike & Chater, Nick, eds. (2010). Cognition and conditionals: probability and logic in human thinking. Oxford: Oxford University Press.

Braine, Martin D.S. (1990). The “natural logic” approach to reasoning. In Overton, Willis F., ed. (1990). Reasoning, necessity, and logic: developmental perspectives. New Jersey: Lawrence Erlbaum Associates.

Braver, Todd S., Cohen, Jonathan D., Nystrom, Leigh E., Jonides, John, Smith, Edward E., & Noll, Douglas C. (1997). A parametric study of prefrontal cortex involvement in human working memory. Neuroimage 5, 49-62.

Byrne, Ruth M.J. (1989). Suppressing valid inferences with conditionals. Cognition 31, 61-83.

Capon, Alison, Handley, Simon & Dennis, Ian (2003). Working memory & reasoning: an individual differences perspective. Thinking & Reasoning 9:3, 203-244.

Chao, Shaw-Jing, & Cheng, Patricia W. (2000). The emergence of inferential rules: The use of pragmatic reasoning schemas by preschoolers. Cognitive Development 15, 39-62.

Cheng, P.W. & Holyoak, K.J. (1985). Pragmatic reasoning schemas. Cog. Psych. 17, 391-416.

Conway, Andrew R.A., Kane, Michael J., & Engle, Randall W. (2003). Working memory capacity & its relation to general intelligence. Trends in Cognitive Sciences 7:12, 547-552.

Cowan, Nelson & Alloway, Tracy (2009). Development of Working Memory in Childhood. Draft of a chapter published in M.L. Courage & N. Cowan (eds). (2009), The development of memory in infancy and childhood. London: Psychology Press.

Davidson, Janet E., & Sternberg, Robert J. (1984). The role of insight in intellectual giftedness. Gifted Child Quarterly 28:2, 58-64.

De Abreu, Pascale M.J. Engel, Conway, Andrew R.A., & Gathercole, Susan E. (2010). Working memory & fluid intelligence in young children. Intelligence 38, 552-561.

DeCaro, Marci S., Thomas, Robin D., & Beilock, Sian L. (2008). Individual differences in category learning: sometimes less working memory capacity is better than more. Cognition 107, 284-294.

DeJong, P.F. & Das-Smaal, E.A. (1995). Attention & intelligence: the validity of the star-counting test. J. of Educ. Psych. 87:1, 80-92.

DeNeys, Wim (2006). Dual processing in reasoning: Two systems but one reasoner. Psychological Science 17:5, 428-433.

Dempster, F.N. (1981). Memory span: sources of individual & developmental differences. Psych. Bulletin 89:1, 63-100.

Dias, M. & Harris, P.L. (1988). The effect of make-believe play on deductive reasoning. British J. of Dev. Psych. 6, 207-221.

Dias, M. & Harris, P.L. (1990). The influence of the imagination on reasoning by young children. British J. of Dev. Psych. 8, 305-318.

Dias, Maria, Roazzi, Antonio & Harris, Paul L. (2005). Reasoning from unfamiliar premises: a study with unschooled adults. Psychological Science 16:7, 550-554.

Duyck, Wouter, Desmet, Timothy, Verbeke, Lieven P.C. & Brysbaert, Marc (2004). WordGen: A tool for word selection and nonword generation in Dutch, English, German, and French. Behavior Research Methods, Instruments & Computers 36:3, 488-499.

Duyck, Wouter, Vandierendonck, Andre, & De Vooght, Gino (2003). Conditional reasoning with a spatial content requires visuo-spatial working memory. Thinking & Reasoning 9:3, 267-287.

Evans, Jonathan St. B.T. (2003). In two minds: dual-process accounts of reasoning. Trends in Cognitive Sciences 7:10, 454-459.

Falmagne, Rachel Joffe (1990). Language and the acquisition of logical knowledge. In Overton, Willis F., ed. (1990). Reasoning, necessity, and logic: developmental perspectives. New Jersey: Lawrence Erlbaum Associates.

Fletcher, Janet, Maybery, Murray T., & Bennett, Sarah (2000). Implicit learning differences: A question of developmental level? Journal of Experimental Psychology: Learning, Memory & Cognition 26:1, 246-252.

Fuster, Joaquin M. & Alexander, Garrett E. (1971). Neuron activity related to short-term memory. Science 173, 652-654.

Gathercole, Susan E. (1999). Cognitive approaches to the development of short-term memory. Trends in Cognitive Sciences 3:11, 410-419.

Gathercole, Susan E., Pickering, Susan J., Ambridge, Benjamin & Wearing, Hannah (2004). The structure of working memory from 4 to 15 years of age. Developmental Psychology 40:2, 177-190.

Gobbo, Camilla & Chi, Michelene (1986). How knowledge is structured and used by expert and novice children. Cognitive Development 1, 221-237.

Gomez, Rebecca L. & Gerken, LouAnn (1999). Artificial grammar learning by 1-year-olds leads to specific and abstract knowledge. Cognition 70, 109-135.

Gomez, Rebecca L. & Gerken, LouAnn (2000). Infant artificial language learning & language acquisition. Trends in Cognitive Sciences 4:5, 178-186.

Goschke, T. & Bolte, A. (2007). Implicit learning of semantic category sequences: response-independent acquisition of abstract sequential regularities. Journal of Experimental Psychology: Learning, Memory & Cognition 33:2, 394-406.

Halford, Graeme (1984). Can young children integrate premises in transitivity & serial order tasks? Cognitive Psychology 16, 65-93.

Halford, Graeme S., Wilson, William H., & Phillips, Steven (1998). Processing capacity defined by relational complexity: Implications for comparative, developmental, & cognitive psychology. Behavioral & Brain Sciences 21:6, 803-864.

Handley, Simon J., Capon, A., Copp, C. & Harper, C. (2002). Conditional reasoning & the Tower of Hanoi: The role of spatial & verbal working memory. British J. of Psych. 93, 501-518.

Handley, Simon J., Capon, A., Beveridge, M., Dennis, I., & Evans, J. St. B.T. (2004). Working memory, inhibitory control, and the development of children’s reasoning. Thinking & Reasoning 10:2, 175-195.

Hawkins, J., Pea, R.D., Glick, J. & Scribner, S. (1984). “Merds that laugh don’t like mushrooms”: evidence for deductive reasoning by preschoolers. Developmental Psychology 20:4, 584-594.

Hollingworth, Leta (1938). An enrichment curriculum for rapid learners at Public School 500: Speyer School. Teachers College Record 39, 296-306.

Hulstijn, Jan H. (2005). Theoretical and empirical issues in the study of implicit and explicit second-language learning. Studies in Second Language Acquisition 27, 129-140.

Johnson-Laird, Philip N. (2001). Mental models and deduction. Trends in Cognitive Sciences 5:10, 434-442.

Johnson-Laird, Philip N. (2010). Mental models and human reasoning. PNAS 107:43, 18243-18250.

Johnson-Laird, Philip N. & Byrne, Ruth M.J. (2002). Conditionals: a theory of meaning, pragmatics, & inference. Psych. Review 109:4, 646-678.

Johnson-Laird, Philip N., Byrne, Ruth M.J., & Schaeken, Walter (1992). Propositional reasoning by model. Psychological Review 99:3, 418-439.

Jung-Beeman, Mark, Bowden, Edward M., Haberman, Jason, Frymiare, Jennifer L., Arambel-Liu, Stella, Greenblatt, Richard, Reber, Paul J., & Kounios, John (2004). Neural activity when people solve verbal problems with insight. PLOS Biology 2:4, 500-510.

Kane, Michael J., Hambrick, David Z., & Conway, Andrew R.A. (2005). Working memory capacity and fluid intelligence are strongly related constructs: A reply to Ackerman, Beier & Boyle (2005).

Kaufman, Scott Barry, DeYoung, Colin G., Gray, Jeremy R., Jimenez, Luis, Brown, Jamie & Mackintosh, Nicholas (2010). Implicit learning as an ability. Cognition 116, 321-340.

Klaczynski, Paul A., & Daniel, David B. (2005). Individual differences in conditional reasoning: a dual-process account. Thinking & Reasoning 11:4, 305-325.

Klauer, Karl Christoph, Stegmaier, Ralph & Meiser, Thorsten (1997). Working memory involvement in propositional & spatial reasoning. Thinking & Reasoning 3:1, 9-47.

Klauer, Karl Christoph, Beller, Sieghard & Hutter, Mandy (2010). Conditional reasoning in context: a dual-source model of probabilistic inference. J. of Exp. Psych: Learning, Memory & Cognition 36:2, 298-323.

Kyllonen, Patrick C. & Christal, Raymond E. (1990). Reasoning ability is (little more than) working-memory capacity?! Intelligence 14, 389-433.

Leevers, Hilary J. & Harris, Paul L. (1999). Persisting effects of instruction on young children’s syllogistic reasoning with incongruent and abstract premises. Thinking & Reasoning 5:2, 145-173.

Lewicki, Pawel, Hill, Thomas & Czyzewska, Maria (1992). Nonconscious acquisition of information. American Psychologist 47:6, 796-801.

Lohman, David F. (2009). Identifying academically talented students: some general principles, two specific procedures. In Shavinina, Larisa, ed. (2009). International Handbook of Giftedness. Ottawa: University of Quebec.

Lovecky, Deirdre V. (1994). Exceptionally gifted children: different minds. Roeper Review 17:2, 116-120.

Markovits, Henry (2000). Mental model analysis of young children’s conditional reasoning with meaningful premises. Thinking & Reasoning 6:4, 335-347.

Markovits, Henry & Barrouillet, Pierre (2002). The development of conditional reasoning: a mental model account. Developmental Review 22, 5-36.

Markovits, Henry, Doyon, Celine, & Simoneau, Michael (2002). Individual differences in working memory & conditional reasoning with concrete & abstract content. Thinking & Reasoning 8:2, 97-107.

Markovits, Henry, Fleury, Marie-Leda, Quinn, Stephanie, & Venet, Michele (1998). The development of conditional reasoning & the structure of semantic memory. Child Development 69:3, 742-755.

Markovits, Henry & Quinn, Stephane (2002). Efficiency of retrieval correlates with “logical” reasoning from causal conditional premises. Memory & Cognition 30:5, 696-706.

Markovits, Henry, Venet, Michele, Janveau-Brennan, Genevieve, Malfait, Nicole, Pion, Nadia & Vadeboncoeur, Isabelle (1996). Reasoning in young children: Fantasy and information retrieval. Child Development 67:6, 2857-2872.

Martinussen, Rhonda, Hayden, Jill, Hogg-Johnson, Sheila, & Tannock, Rosemary (2005). A meta-analysis of working memory impairments in children with Attention-Deficit/Hyperactivity Disorder. Journal of the American Academy of Child & Adolescent Psychiatry 44:4, 377-384.

Mathews, Robert C., Buss, Ray R., Stanley, William B., Blanchard-Fields, Fredda, Ryeul Cho, Jeung, & Druhan, Barry (1989). Role of implicit and explicit processes in learning from examples: a synergistic effect. Journal of Experimental Psychology: Learning, Memory & Cognition 15:6, 1083-1100.

McGeorge, Peter, Crawford, J.R. & Kelly, S.W. (1997). The relationships between psychometric intelligence & learning in an explicit & an implicit task. Journal of Experimental Psychology: Learning, Memory & Cognition 23:1, 239-245.

Means, Mary L., & Voss, James F. (1996). Who reasons well? Two studies of informal reasoning among children of different grade, ability, & knowledge levels. Cognition & Instruction 14:2, 139-178.

Meulemans, Thierry, Van der Linden, Martial, & Perruchet, Pierre (1998). Implicit sequence learning in children. Journal of Experimental Child Psychology 69, 199-221.

Miller, Linda T. & Vernon, Philip A. (1996). Intelligence, reaction time & working memory in 4 to 6 year old children. Intelligence 22, 155-190.

Morris, Bradley J. & Sloutsky, Vladimir (2002). Children’s solutions of logical versus empirical problems: What’s missing and what develops? Cog. Dev. 16, 907-928.

Noel, Marie-Pascale (2009). Counting on working memory when learning to count and to add: a preschool study. Dev. Psych. 45:6, 1630-1646.

Oaksford, Mike & Chater, Nick (2001). The probabilistic approach to human reasoning. Trends in Cognitive Sciences 5:8, 349-357.

Oaksford, Mike & Chater, Nick (2003). Conditional probability and the cognitive science of propositional reasoning. Mind & Language 18:4, 359-379.

O’Brien, Thomas C. & Shapiro, Bernard J. (1968). The development of logical thinking in children. American Ed. Research J. 5:4, 531-542.

Overton, Willis, Byrnes, James P., & O’Brien, David P. (1985). Developmental and individual differences in conditional reasoning: the role of contradiction training and cognitive style. Dev. Psych. 21:4, 692-701.

Perkins, D.N. & Salomon, Gavriel (1989). Are cognitive skills context-bound? Educational Researcher 18:1, 16-25.

Ramscar, M. & Gitcho, N. (2007). Developmental change and the nature of learning in childhood. Trends in Cognitive Sciences 11, 274-279.

Reber, Arthur S. (1967). Implicit learning of artificial grammars. Journal of Verbal Learning & Verbal Behavior 6:6, 855-863.

Reber, Arthur S. (1989). Implicit learning and tacit knowledge. Journal of Experimental Psychology 118:3, 219-235.

Reber, Arthur S. & Lewis, Selma (1977). Implicit learning: an analysis of the form and structure of a body of tacit knowledge. Cognition 5, 333-361.

Reber, Arthur S., Walkenfeld, Faye F., & Hernstadt, Ruth (1991). Implicit and explicit learning: Individual differences and IQ. Journal of Experimental Psychology: Learning, Memory & Cognition 17:5, 888-896.

Ricker, Timothy J., AuBuchon, Angela M., & Cowan, Nelson (2010). Working memory. WIREs Cogn. Sci 1, 1-13.

Rips, Lance J. (1983). Cognitive processes in propositional reasoning. Psychological Review 90:1, 38-71.

Rips, Lance J. (1994). The psychology of proof: Deductive reasoning in human thinking. MIT.

Roberge, James J. (1970). A study of children’s abilities to reason with basic principles of deductive reasoning. Am. Educ. Res. J. 7, 583.

Schoenfeld, Alan H. & Herrmann, Douglas J. (1982). Problem perception & knowledge structure in expert and novice mathematical problem solvers. Journal of Experimental Psychology: Learning, Memory & Cognition 8:5, 484-494.

Scholnick, Ellin Kofsky, & Wing, Clara S. (1995). Logic in conversation: comparative studies of deduction in children & adults. Cognitive Development 10, 319-345.

Scholnick, Ellin Kofsky, & Wing, Clara S. (1991). Speaking deductively: preschoolers’ use of if in conversation & in conditional inference. Dev. Psych. 27:2, 249-258.

Schroyens, Walter J., Schaeken, Walter & D’Ydewalle, Gery (2001). The processing of negations in conditional reasoning: A meta-analytic case study in mental model and/or mental logic theory. Thinking & Reasoning 7:2, 121-172.

Seger, Carol Augart (1994). Implicit learning. Psychological Bulletin 115:2, 163-196.

Shaw, P., Greenstein, D., Lerch, J., Clasen, L., Lenroot, R., Gogtay, N., Evans, A., Rapoport, J., & Giedd, J. (2006). Intellectual ability & cortical development in children & adolescents. Nature 440, 676-679.

Silverman, Linda Kreger (2009). How to use the new IQ tests in selecting gifted students. Executive summary of article “The Measurement of Giftedness” in Shavinina, Larisa (ed). International Handbook on Giftedness. Springer.

Silverman, L. K., Gilman, B. J., & Falk, R. F. (2004, November). Who are the gifted using the new WISC-IV? Paper presented at the 51st annual convention of the National Association for Gifted Children, Salt Lake City, UT.

Storkel, Holly H. (2001). Learning new words: phonotactic probability in language development. Journal of Speech, Language & Hearing Research 44, 1321-1337.

Subramaniam, Karuna, Kounios, John, Parrish, Todd B., & Jung-Beeman, Mark (2009). A brain mechanism for facilitation of insight by positive affect. Journal of Cognitive Neuroscience 21:3, 415-432.

Sweetland, John D., Reina, Jacqueline M. & Tatti, Anne F. (2006). WISC-III Verbal/Performance discrepancies among a sample of gifted children. Gifted Child Quarterly 50:7, 7-10.

Taplin, John E., Staudenmayer, Herman & Taddonio, Judith L. (1974). Developmental changes in conditional reasoning: linguistic or logical? J. of Exp. Child Psych. 17, 360-373.

Thompson-Schill, Sharon L., Ramscar, Michael, & Chrysikou, Evangelia G. (2009). Cognition without control: When a little frontal lobe goes a long way. Current Directions in Psychological Science 18:5, 259-263.

Toms, Margaret, Morris, Neil & Ward, Deborah (1993). Working memory and conditional reasoning. Quarterly Journal of Experimental Psychology A 46:4, 679-699.

Van Tassel-Baska, Joyce (2000). The ongoing dilemma of effective identification practices in gifted education. The Communicator, 31.

Venet, Michele & Markovits, Henry (2001). Understanding uncertainty with abstract conditional premises. Merrill-Palmer Quarterly 47:1, 74-99.

Verschueren, Niki, Schaeken, Walter & D’Ydewalle, Gery (2004). Everyday conditional reasoning with working memory preload. Proceedings of the 26th Annual Meeting of the Cognitive Science Society, 1399-1404. Mahwah, NJ: Erlbaum.

Verschueren, Niki, Schaeken, Walter & D’Ydewalle, Gery (2005). Everyday conditional reasoning: a working memory-dependent tradeoff between counterexample and likelihood use. Memory & Cognition 33:1, 107-119.

Verschueren, Niki, Schaeken, Walter & D’Ydewalle, Gery (2005). A dual-process specification of causal conditional reasoning. Thinking & Reasoning 11:3, 239-278.

Verschueren, N., Schroyens, W., Schaeken, W., & D’Ydewalle, G. (2001). Why do participants draw non-valid inferences in conditional reasoning? Current Psychology Letters 6, 238-246.

Vos, Sandra H., Gunter, Thomas C., Schriefers, Herbert, & Friederici, Angela D. (2001). Syntactic parsing and working memory: the effects of syntactic complexity, reading span, & concurrent load. Language & Cognitive Processes 16:1, 65-103.

Wechsler, D. (2003). Wechsler intelligence scale for children—Fourth edition (WISC-IV). San Antonio, TX: The Psychological Corporation.

Wildman, Terry M. & Fletcher, Harold J. (1979). Processing errors in conditional and biconditional problem solving behavior. Contemporary Ed. Psych. 4, 366-380.

Wilkinson, Cynthia (1993). WISC-R profiles of children with superior intellectual ability. Gifted Child Quarterly 37:2, 84-91.

Wolf, Willavene, & Shigaki, Irene (1983). A developmental study of young gifted children’s conditional reasoning ability. Gifted Child Quarterly 27, 173-179.

Woodcock, Richard W., McGrew, Kevin S., & Mather, Nancy (2001). Woodcock-Johnson III Tests of Cognitive Abilities. Itasca, IL: Riverside Publishing.

Woodworth, R.S. & Sells, S.B. (1935). An atmosphere effect in formal syllogistic reasoning. J. of Exp. Psych. 18:4, 451-460.

Table 1. IQ Scores of Typically Developing Participants

            Overall IQ   Verbal IQ   Fluid IQ   Processing Speed
Mean (SD)   111          106         117        99
Range       87-129       81-136      84-138     62-122

Table 2. Reasoning performance in gifted and typically developing children

               Gifted      Typically Developing
Overall        .41 (.13)   .39 (.13)
Concrete       .40 (.15)   .38 (.15)
Abstract       .41 (.17)   .41 (.21)
MP             .56 (.34)   .53 (.36)
MP Concrete    .63 (.37)   .55 (.43)
MP Abstract    .51 (.41)   .52 (.39)
MT             .44 (.38)   .37 (.36)
MT Concrete    .37 (.42)   .36 (.40)
MT Abstract    .49 (.45)   .36 (.42)
AC             .30 (.30)   .34 (.31)
AC Concrete    .30 (.34)   .33 (.36)
AC Abstract    .28 (.37)   .34 (.38)
DA             .34 (.32)   .30 (.30)
DA Concrete    .28 (.36)   .22 (.29)
DA Abstract    .38 (.39)   .38 (.44)

Table 3a. Working memory scores in gifted and typically developing children

            Raw Score          Age Equiv.         Standard Score
            Gifted   Typical   Gifted   Typical   Gifted   Typical
Mean (SD)   16       11        125      96        117      105
Range       2-34     2-22      50-265   50-149    78-170   78-132

*Where Raw Score = the total number of points earned, Age Equivalent = the age (in months) at which the average child gets the same raw score, and Standard Score is normed based on a child’s age in the same fashion as IQ, with a mean of 100 and a SD of 15.

Table 3b. Working memory development in gifted and typically developing children

Group                  Age   Raw Score   Age Equivalent   Standard Score
Gifted                 5     10 (9)      92 (59)          112 (23)
                       6     15 (8)      117 (51)         118 (21)
                       7     16 (6)      120 (6)          113 (17)
                       8     22 (5)      171 (60)         123 (13)
Typically Developing   5     6 (6)       70 (26)          101 (15)
                       6     8 (3)       78 (14)          99 (11)
                       7     17 (4)      121 (19)         117 (11)
                       8     16 (5)      116 (23)         105 (14)
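The note to Table 3a describes the Standard Score as age-normed with a mean of 100 and an SD of 15. The sketch below illustrates that scaling only. The function name and the norm values in the usage lines are hypothetical, and published tests typically derive standard scores from norm tables rather than from a closed-form formula, so this should be read as an approximation of the idea and not as the scoring procedure used in the study.

```python
def standard_score(raw, age_norm_mean, age_norm_sd, mean=100.0, sd=15.0):
    """Rescale a raw score to an IQ-style standard score.

    age_norm_mean and age_norm_sd are the (hypothetical) raw-score mean
    and SD for children of the same age in a norming sample.
    """
    if age_norm_sd <= 0:
        raise ValueError("age_norm_sd must be positive")
    z = (raw - age_norm_mean) / age_norm_sd  # distance from the age mean in SD units
    return mean + sd * z


# Hypothetical norms: same-age children average 11 raw points with an SD of 5.
print(standard_score(11, 11, 5))  # a child at the age mean
print(standard_score(16, 11, 5))  # a child one SD above the age mean
```

Under these made-up norms, a child scoring at the age mean receives 100 and a child one SD above it receives 115, which is how a standard score can stay flat across ages while raw scores and age equivalents rise.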

Table 4. Working memory age equivalencies after removing gifted outliers

                       5 years old   6 years old   7 years old   8 years old
Gifted                 76 (22)       110 (36)      120 (34)      177 (59)
Typically Developing   70 (26)       78 (16)       121 (20)      116 (23)

Table 5a-b. Working memory development in gifted and typically developing children

a. Percent of gifted and typically developing children at various working memory age levels

                                Gifted   Typically Developing
Below Age 10                    52.8     70.4
  Ages 4-5 (50-71 months)       20.8     37.0
  Age 6 (72-83 months)          5.7      11.1
  Age 7 (84-95 months)          3.8      3.7
  Age 8 (96-107 months)         15.2     3.7
  Age 9 (109-119 months)        7.6      14.8
Ages 10-11 (120-143 months)     22.6     25.9
Above age 12 (144+ months)      24.7     3.7

b. Percent of gifted and typically developing children with working memory standard scores at specific standard deviations from the mean

                                Gifted   Typically Developing
Below 100                       17.0     29.6
  75-84 (> 1 SD below mean)     3.8      3.7
  85-99                         13.2     25.9
100-114                         30.2     40.8
115-129                         35.8     22.2
130-144                         15.1     7.4
145-160                         1.9      0

Figure 1. Unequal working memory standard score distributions in gifted and typically developing children

Table 6. Accuracy and reaction time on the rote task in gifted and typically developing children

                   Gifted       Range        Typically Developing   Range
Overall Accuracy   .84 (.01)    .53-.98      .78 (.03)              .51-.98
2x2 Accuracy       .93 (.01)    .63-1.00     .84 (.03)              .55-1.00
3x3 Accuracy       .82 (.02)    .51-1.00     .77 (.03)              .47-1.00
4x4 Accuracy       .81 (.02)    .50-.99      .78 (.03)              .48-.98
5x5 Accuracy       .79 (.02)    .44-.95      .74 (.03)              .45-.95
Overall RT         50.7 (3.1)   19.4-132.4   49.5 (5.4)             8.8-113.2
2x2 RT             8.4 (.83)    1.7-34.5     11.7 (1.6)             2.8-38.5
3x3 RT             38.2 (3.8)   13.7-154.0   32.4 (3.6)             5.8-80.5
4x4 RT             64.9 (5.3)   15.0-230.5   68.4 (8.8)             10.2-221.0
5x5 RT             90.3 (5.1)   19.8-203.3   85.5 (10.9)            16.3-225.6

Table 7a-b. Working memory correlates with accuracy across all participants

                         Working Memory Age Equivalence   Working Memory Raw Score
Accuracy, all problems   .416 p < .001                    .461 p < .001
2x2 Accuracy             .388 p = .001                    .415 p < .001
3x3 Accuracy             .397 p < .001                    .442 p < .001
4x4 Accuracy             .459 p < .001                    .502 p < .001
5x5 Accuracy             .268 p = .021                    .326 p = .026

                         Working Memory Standard Score
Accuracy, all problems   .255 p = .026
2x2 Accuracy             .236 p = .040
3x3 Accuracy             .267 p = .020
4x4 Accuracy             .270 p = .019
5x5 RT                   .236 p = .043

Figure 2. Reaction time developmental trends in gifted and typically developing children

Table 8. Working memory-Rote task correlations in gifted and typically developing children

                                               Gifted           Typically Developing
Working memory & Overall accuracy              .325, p = .020   .685, p < .001
Working memory & 2x2 accuracy                  .310, p = .027   .538, p = .006
Working memory & 3x3 accuracy                  .310, p = .027   .720, p < .001
Working memory & 4x4 accuracy                  .399, p = .004   .720, p < .001
Working memory & 5x5 accuracy                  None             .602, p = .001
Working memory & 5x5 RT                        None             .353, approaching significance, p = .084
Standard working memory score & 3x3 accuracy   None             .426, p = .034
Standard working memory score & 4x4 accuracy   None             .403, p = .046
Standard working memory score & 5x5 RT         .294, p = .040   None

Table 9. Rates of accurate solution by working memory age

              High (WM age 12+)   Middle (WM age 10-12)   Low (WM below age 10)
MP Concrete   .75 (.37)           .48 (.38)               .61 (.39)
MP Abstract   .72 (.36)           .53 (.38)               .44 (.40)
MT Concrete   .59 (.42)           .38 (.43)               .29 (.38)
MT Abstract   .66 (.44)           .43 (.44)               .39 (.43)
AC Concrete   .22 (.31)           .43 (.41)               .29 (.32)
AC Abstract   .34 (.44)           .30 (.38)               .29 (.35)
DA Concrete   .25 (.37)           .35 (.37)               .23 (.32)
DA Abstract   .41 (.42)           .43 (.41)               .36 (.40)
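The three working memory age groups in Table 9 can be expressed as a simple binning of the age equivalent in months. The sketch below assumes the cut points implied by Table 5a (120 months for age 10 and 144 months for age 12). The function name is hypothetical, and the handling of a child exactly at a boundary is an assumption, since the tables do not state it.

```python
def wm_age_group(age_equivalent_months):
    """Bin a working memory age equivalent (in months) into a Table 9 group.

    Cut points are assumed from Table 5a: 144+ months is "age 12+",
    120-143 months is "age 10-12", and anything lower is "below age 10".
    """
    if age_equivalent_months >= 144:
        return "High"
    if age_equivalent_months >= 120:
        return "Middle"
    return "Low"


# Mean age equivalents for gifted 5, 7 and 8 year olds from Table 3b.
for months in (92, 120, 171):
    print(months, wm_age_group(months))
```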

Figure 3a. Performance on concrete and abstract problems by working memory age

Note: 1 indicates Concrete, 2 stands for Abstract.

Figure 3b. Performance on certain and uncertain problems by working memory age

Note: 1 indicates MP, 2 stands for MT, 3 means AC, and 4 is DA.

Figure 4a-b. Consistent interpretations increase with working memory age. a. Across all problems

b. Across problems with certain solutions

* Note: 1 indicates the High group, 2 the Middle group, and 3 the Low Group.

Table 10a-c. Consistency in gifted and typically developing children

a. Proportion of consistent responses over development in gifted and typically developing children

Group                  Age   Consistency, All Problems   Consistency, MP   Consistency, MP & MT
Gifted                 5     .08                         .42               .25
                       6     .41                         .71               .53
                       7     .33                         .67               .40
                       8     .38                         .54               .46
Typically Developing   5     .14                         .57               .29
                       6     .13                         .75               .38
                       7     .14                         .71               .71
                       8     .14                         .71               .43

Table 10b. Number of participants with a consistent interpretation of MP.

Group                  Age   N
Gifted                 5     5
                       6     12
                       7     10
                       8     7
Typically Developing   5     4
                       6     6
                       7     5
                       8     5

Table 10c. Accuracy levels for children with a consistent interpretation of MP.

              G-5         G-6         G-7         G-8         TD-5        TD-6        TD-7        TD-8
MP Concrete   .90 (.28)   .63 (.43)   .70 (.42)   .86 (.38)   .63 (.48)   .58 (.49)   .70 (.45)   .70 (.45)
MP Abstract   .70 (.45)   .38 (.43)   .80 (.35)   .86 (.20)   .63 (.48)   .58 (.49)   .60 (.42)   .50 (.50)
MT Concrete   .20 (.45)   .29 (.45)   .55 (.44)   .57 (.45)   0 (0)       .58 (.38)   .30 (.50)   .60 (.42)
MT Abstract   .30 (.45)   .46 (.45)   .70 (.48)   .50 (.50)   .50 (.58)   .50 (.45)   .30 (.45)   .60 (.42)
AC Concrete   .20 (.27)   .38 (.38)   .30 (.42)   .07 (.19)   .25 (.29)   .33 (.41)   .30 (.45)   .30 (.27)
AC Abstract   .30 (.27)   .42 (.36)   .10 (.32)   .21 (.39)   .13 (.25)   .25 (.42)   .10 (.22)   .40 (.42)
DA Concrete   .50 (.35)   .17 (.33)   .40 (.39)   .29 (.39)   .13 (.25)   .08 (.20)   .30 (.27)   .30 (.27)
DA Abstract   .30 (.45)   .38 (.38)   .29 (.35)   .31 (.48)   .50 (.58)   .08 (.20)   .30 (.27)   .10 (.28)

Note: G stands for gifted and TD for typically developing.

Figure 5. Consistency rates over development in gifted and typically developing children

Note: The differences shown here between gifted and typically developing children are not statistically significant.

Figure 6a-d. Performance by age on concrete and abstract problems in gifted and typically developing children. a. 5 year olds b. 6 year olds

c. 7 year olds d. 8 year olds

Note: The blue line indicates gifted children and the green line, typically developing children. 1 denotes the Concrete problems and 2 the Abstract problems.

Figure 7a-d. Performance by age on concrete and abstract problems in gifted and typically developing children after controlling for working memory differences a. 5 year olds b. 6 year olds

c. 7 year olds d. 8 year olds

Table 11. Error rates over development in gifted and typically developing children

Group-Age   Errors   Errors/Subject   No-MP   No-AC   Yes-MT   Yes-DA   Pos/Neg Error   NS-MP   NS-MT   Bicond. AC/DA   Certainty Error
G-5         126      10.5             .11     .07     .20      .17      .55             .07     .07     .28             .42
T-5         71       10.1             .10     .10     .28      .20      .68             .08     .04     .25             .37
G-6         168      9.9              .05     .05     .13      .13      .36             .17     .11     .36             .64
T-6         79       9.9              .08     .08     .06      .09      .31             .11     .15     .43             .69
G-7         133      8.9              .05     .03     .09      .07      .24             .11     .14     .51             .76
T-7         72       10.3             .03     .06     .15      .17      .41             .15     .13     .29             .57
G-8         117      9.0              .03     .03     .09      .06      .21             .12     .13     .47             .80
T-8         60       8.6              .05     .03     .13      .08      .29             .17     .08     .55             .72

Figure 7. Positive/Negative errors decrease over development in gifted and typically developing children

Figure 8. Certainty errors increase over development in gifted and typically developing children

Table 12a-c. Positive errors for negative problems by group

a. Cuts for DA Problems by Group

Group                  Age   Consistent Yes   At least Half Yes
Gifted                 5     .33              .58
Gifted                 6     .24              .41
Gifted                 7     .07              .20
Gifted                 8     0                .15
Typically Developing   5     .29              .71
Typically Developing   6     .13              .25
Typically Developing   7     .29              .57
Typically Developing   8     .14              .14

b. Cuts for MT Problems by Group

Group                  Age   Consistent Yes   At least Half Yes
Gifted                 5     .33              .67
Gifted                 6     .24              .29
Gifted                 7     .07              .27
Gifted                 8     .08              .15
Typically Developing   5     .57              .86
Typically Developing   6     0                .13
Typically Developing   7     .43              .57
Typically Developing   8     .14              .29

c. Total Cuts by Group

Group                  Age   Consistent Yes (MT + DA)
Gifted                 5     .67
Gifted                 6     .47
Gifted                 7     .13
Gifted                 8     .08
Typically Developing   5     .86
Typically Developing   6     .14
Typically Developing   7     .71
Typically Developing   8     .29

Appendix A. Conditional Reasoning Problems

Concrete.

If there is a green circle, there is a red triangle.

If there is a big star, there is a small square.

If there is an apple in the fruit salad, there is a banana in the fruit salad.

If the cat is on the mat, the fox is wearing socks.

If the boots have polka dots, the scarf has stripes.

If John likes playing tag, Dave likes playing tag.

If Bill’s favorite class is art, Sue’s favorite class is math.

If Mrs. Jones’s class goes to the fire station, Mr. Smith’s class goes to the zoo.

Abstract.

If there is a blick, there is a snabe.

If the tibble is bloopish, the basmy is lerfish.

If there is a gidget in the yanna, there is a wurgle in the yanna.

If Arv’s favorite is snirt, Boog’s favorite is floom.

If boogle is true, ronee is true.

If Jeed likes playing gooze, Mish likes playing gooze.

If the smits has a mizzle, the cursh has a zade.

If there is a striggish neave, there is a delkish skell.
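Each conditional above can be paired with one of the four inference forms reported in the tables (MP, MT, AC, DA). As an illustration of why MP and MT have certain solutions while AC and DA do not, the following sketch enumerates every truth assignment consistent with "if p, then q" plus the second premise, and reports whether the queried proposition is settled. The response labels "yes", "no" and "not sure" are assumed from the error labels in Table 11, and the function and variable names are hypothetical.

```python
from itertools import product

# Each form: (second premise as a test on (p, q), which proposition is asked about)
FORMS = {
    "MP": (lambda p, q: p, "q"),      # p is true; is q true?
    "MT": (lambda p, q: not q, "p"),  # q is false; is p true?
    "AC": (lambda p, q: q, "p"),      # q is true; is p true?
    "DA": (lambda p, q: not p, "q"),  # p is false; is q true?
}


def normative_answer(form):
    """Return the logically correct response for one inference form."""
    second_premise, asked = FORMS[form]
    possible = set()
    for p, q in product([True, False], repeat=2):
        conditional_holds = (not p) or q  # material reading of "if p, then q"
        if conditional_holds and second_premise(p, q):
            possible.add(p if asked == "p" else q)
    if possible == {True}:
        return "yes"
    if possible == {False}:
        return "no"
    return "not sure"  # both values remain possible, so the solution is uncertain


for form in FORMS:
    print(form, normative_answer(form))
```

For the first concrete problem, MP corresponds to "There is a green circle. Is there a red triangle?" and AC to "There is a red triangle. Is there a green circle?". Treating the conditional as a biconditional would settle AC and DA as well, which is the pattern Table 11 labels "Bicond. AC/DA".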

Appendix B. Rote Problems

Example of a 2x2:

I K

W D

Example of a 3x3:

B C S

J L P

W E F

Example of a 4x4:

C O F G

Q X V H

M A U W

Z I S D

Example of a 5x5:

Y N P I R

G H C F B

T L Q X S

K M J A V

W D Q X U

Set 1
1. E F S P X K N M C B W U Y I D O
2. S D Y F
3. G N Y O E T J H B S I M Q K V W
4. T B H K D S U O C
5. P J T M S K H I D E O Y W G B Q
6. J X O Y
7. R G A D V S C K N
8. D J F A B G Y C L M B V K H R Y T H U P W O S Q E
9. Y S G U
10. K U E Y P U W L G I B X C P H S A N Z D Q G R O F
11. L Y I E
12. Z X W U V R A F B D E T J Q V H N P X O F C K M I
13. F W E S I X L C D Z B Q M G R K
14. M B Y Q
15. U D P A X M W A L O G I E M J S U V T Q H K B E F
16. I E L H C N M K B

Set 2
1. example 5x5
2. G O C X I P S W I V E A N R Y R O H Z L D B Q M V
3. example 3x3
4. example 4x4
5. J X H C I U W B F
6. X H W S D M V U J P C F A E G R
7. example 2x2
8. Z S E I P V O J X H G N W M K U
9. O M V C
10. W H J M U Q S V I N D C O K B E X P S F L T A I G
11. N E I Y
12. R N C Q Z W S P U
13. M W D B F V O L C
14. X E R Y V C G D I H U M R Y T L N K B W Z O S J R
15. S V G X D M J F Q W O R I C L E
16. D Z S P L G U I K R T H E F B Q

Set 3
1. V X W D K O H F B S T Y L P I Q
2. F S N G
3. D V Q P U O E M W F N T P J Y N K A I Z Q V H C K
4. Y H U X W M C E S D P T L I F K
5. W A K C N B Q M D V J I N S L F U O E G L H K P X
6. Q C H R G I M Z F
7. O X N U Q H M E J
8. S O X C Y V P U B L W E Q H M T
9. C J K U
10. Z M C V Y E R G L
11. J N B X D I E Y H
12. E I K V F S P N O
13. N D L P J K G S W B M Y C X V A L E I O G H I T R
14. T N Q I
15. B C Z L
16. K L O C G S U V T E X R B W H J

Set 4
1. T J U Y X E L H D N W I P O F Z
2. N P H A O E U X C T Q K M D W L
3. H J V L G D R A O
4. D P F K W B O I G Q L T V W Y Q M U N H V J E X S
5. W D J T
6. M U W C A O X K B T Y P J W F D F N M V K R O G I
7. U P Z S G K Q I H
8. I C W P
9. X I Z T D E N H Q Y U S K D C P Z U R F M S O G A
10. B W K J
11. L V J C I W D R T L Z V Q I E N F A P R N Y H O G
12. P S I F Q N E A V R S U O J L H M Q Y W X K W C I
13. S F X V H O D P J
14. F H W O T I G D S
15. E Q H C
16. O I J A E S Y L T N K W B R H C