Complexity of geometric inductive reasoning tasks

Contribution to the understanding of fluid intelligence

Ricardo Primi*

Centro de Ciências Humanas e Sociais, University of São Francisco, Rua Alexandre Rodrigues Barbosa, 45,

CEP 13251-900 Itatiba, São Paulo, Brazil

Received 26 January 2000; received in revised form 30 May 2000; accepted 8 August 2000

Abstract

Studies of the complexity of geometric inductive matrix items used to measure fluid intelligence

(Gf) indicate that such complexity may be related to (a) an increase in the number of figures, (b) an

increase in the number of rules relating these figures, (c) the complexity of these rules, and (d) the

perceptual complexity of the stimulus. One limitation of these studies is that complex items present all

of these characteristics simultaneously. Thus, no information regarding their relative importance is

available, nor is it clear whether all these factors have a significant effect on complexity. In the present

study, two matrix tests were created by orthogonally manipulating these four sources of complexity,

and the results show that perceptual organization has the strongest effect, followed by the increase in

the amount of information (figures and rules). These results suggest that Gf is most strongly associated

with that part of the central executive component of working memory that is related to the controlled

attention processing and selective encoding. © 2001 Elsevier Science Inc. All rights reserved.

Keywords: Cognitive processes; Inductive and deductive reasoning; Fluid intelligence; Item content; Item

response theory; Intelligence testing; Item analysis

1. Introduction

Initially defined by Cattell (1941) and further elaborated by Horn and Cattell (1966), fluid

intelligence (Gf) is one of the broad factors of intelligence (Carroll, 1993a, 1993b, 1997; Horn

& Noll, 1997). It is a mental activity that ‘‘involves making meaning out of confusion;

0160-2896/01/$ – see front matter © 2001 Elsevier Science Inc. All rights reserved.

PII: S0160-2896(01)00067-8

* Tel.: +55-193-295-0130.

E-mail address: [email protected] (R. Primi).

Intelligence 30 (2001) 41–70

developing new insights; going beyond the given to perceive that which is not immediately

obvious; forming (largely nonverbal) constructs which facilitate the handling of complex

problems involving many mutually dependent variables’’ (Raven, Raven, & Court, 1998, p. G4).

In the last four decades, the basic component processes that comprise this complex mental

activity have been under study by various cognitive psychologists (Bethell-Fox, Lohman, &

Snow, 1984; Carpenter, Just, & Shell, 1990; Embretson, 1995, 1998; Evans, 1968; Goldman &

Pellegrino, 1984; Gonzales Labra, 1990; Gonzales Labra & Ballesteros Jimenez, 1993; Green

& Kluever, 1992; Hornke & Habon, 1986; Hunt, 1974; Mulholland, Pellegrino, & Glaser,

1980; Primi, 1995; Primi & Rosado, 1995; Primi, Rosado, & Almeida, 1995; Rumelhart &

Abrahamson, 1973; Sternberg, 1977, 1978, 1980, 1984, 1986, 1997; Sternberg & Gardner,

1983). This research has tried to identify the cognitive processes people use to solve geometric

analogy tasks which, according to Marshalek, Lohman, and Snow (1983), are the prototype

tasks to assess Gf. Basically these studies (a) identify the basic component processes and the

strategies that organize them in a complex chain, (b) investigate the correlations between

component and traditional psychometric measures, (c) discover complexity factors underlying

the tasks, and (d) simulate problem-solving behavior using artificial intelligence.

Fig. 1 displays four examples of the geometric 3 × 3 matrix problems used in the present

study. Each of these problems consists of an organized set of geometric figures obeying either

two or four rules; the subject must discover them so that he or she can generalize from them

to decide which of the eight options is the most appropriate to fit into the blank space.

The basic components of the problem-solving behavior involved in such problems can be

organized into three stages. The first stage is associated with the creation of a mental

representation of the attributes of the problem and the rules relating these attributes. In the

literature, these two aspects have received various labels, including encoding and inference

(Sternberg, 1977), perceptual and conceptual analysis (Carpenter et al., 1990), pattern

comparison and decomposition, and transformational analysis and rule generation

(Mulholland et al., 1980). The second stage is associated with the recognition of the parallels between

these rules and a new, but analogous, situation. This component has been variously

denominated as mapping (Sternberg, 1977), perceptual and generalized conceptual analysis

(Carpenter et al., 1990), and rule comparison (Mulholland et al., 1980). The third stage is

associated with the application of the rules to create an appropriate representation to fill the

blank, and the selection of an answer from the options provided. The terms used to

denominate this process of representation generation are application, comparison and

response (Sternberg, 1977), and response generation and selection (Carpenter et al., 1990).
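
The three stages above can be sketched, very schematically, as a processing pipeline. This is an illustrative abstraction of the component model, not any author's implementation; the toy item encoding and all function names are invented for the example.

```python
# Schematic sketch of the three-stage solution model described above.
# Items are simplified to rows of integers; a "rule" is the constant
# difference between adjacent entries. All names here are illustrative.

def encode_and_infer(row):
    """Stage 1: build a representation and infer the rule relating entries."""
    return row[1] - row[0]  # e.g. a pairwise "+2" progression

def map_rule(rule, new_row_start):
    """Stage 2: recognize that the same rule governs the incomplete row."""
    return (new_row_start, rule)

def apply_and_select(mapped, options):
    """Stage 3: generate the missing entry and select it among the options."""
    start, rule = mapped
    answer = start + 2 * rule          # third cell of the incomplete row
    return options.index(answer)

complete_row = [1, 3, 5]               # rule: +2
incomplete_row_start = 2               # incomplete row: 2, 4, _
options = [5, 6, 7, 8]
rule = encode_and_infer(complete_row)
choice = apply_and_select(map_rule(rule, incomplete_row_start), options)
```

Here stage 1 corresponds to encoding/inference, stage 2 to mapping, and stage 3 to application and response selection in the terminologies listed above.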

Recently, research has suggested that Gf is associated with working memory capacity

(Duncan, Emslie, & Williams, 1996; Embretson, 1995, 1998; Engle, Tuholski, Laughlin, &

Conway, 1999; Hunt, 1996, 1999; Jurden, 1995; Kyllonen & Christal, 1990; Prabhakaran,

Smith, Desmond, Glover, & Gabrieli, 1997). According to Baddeley and Hitch (1994),

working memory capacity can be decomposed into memory buffers responsible for storing

speech-based information and visuospatial information (phonological loop and visuospatial

sketchpad) as well as a central executive component responsible for the coordination of the

basic components and attentional control.

Engle et al. (1999) use the term short-term memory to denote memory buffers and suggest

that they are related to the amount of information that can be maintained active at any one

time; the central executive component involves the ability to maintain a representation active

in the face of interference and distraction by controlling the focus of attention (controlled

attention). These authors use structural equation modeling to compare the correlations of

memory tasks with Gf and conclude that the central executive component drives the

relationships between memory tasks and Gf. Hence, Gf is specially related to the central

executive component or controlled attention.

Fig. 1. Examples of experimental test items. The first and second characters represent the number of elements and

number of rules, respectively; the third and fourth constitute a code representing the type of transformation

(SI = simple, SP = spatial, CX = complex, CO = conceptual); and the fifth indicates the type of organization (H:

harmonic, N: nonharmonic).

In the past two decades, a new approach for test construction has been developed that

integrates the procedures of cognitive psychology with those of psychometric methods

(Embretson, 1983, 1985a, 1985b, 1994, 1995, 1996, 1998; Frederiksen, Mislevy, & Bejar,

1993; Whitely, 1980a, 1980b, 1980c). Embretson (1983, 1994, 1998) has proposed a two-part

distinction for construct validation: construct representation that involves the identification of

cognitive components underlying task performance, and nomothetic span, which concerns the

specification of the network of test score correlations with other constructs. Embretson argues

that traditional methods of construct validation involve only the latter, which gives meaning

to test scores by linking them with other measures (nomothetic span); whereas new advances

in cognitive psychology suggested that the meaning of measures can also be established by a

direct understanding of the process, strategies, and knowledge involved in problem-solving

behavior for individual items (construct representation).

An important aspect of construct representation is the determination of item complexity.

This involves the development of a theory by proposing a cognitive model for solving items;

the identification of basic capacities involved in item performance and item characteristics

that pose demands on these capacities; finally, items varying in these characteristics are

produced. The theory is tested by comparing expected item complexity with empirical data.

Although such studies are concerned with the explanation of variability of item complexity,

as items are summed to produce test scores and as item characteristics are understood to exert

differential demands on basic capacities, what is eventually being explained is the test score

variability in reference to individual differences in basic cognitive capacities. In fact, in Item

Response Theory, item complexity and ability are found on the same scale. Carroll (1993a,

1993b) proposed a similar procedure, although he called it behavioral scaling.

While cognitive studies have made important findings about the nature of Gf, only a few

studies use the procedure of construct representation to link item complexity with the findings

of cognitive psychology. As will be discussed later, the few studies in the literature leave

some important questions unanswered. In the light of new research on Gf such as that

involving the search for a link between Gf and brain mechanisms (Crinella & Yu, 1999;

Duncan et al., 1996; Prabhakaran et al., 1997) and that involving a possible explanation for

the increase in Gf test scores in the past few decades (Flynn, 1985, 1998; Neisser, 1998), an

understanding of item complexity will be important for identifying exactly which

aspect of the construct of Gf is being measured by existing tests.

The present study was designed to investigate the source of the complexity of geometric

matrix items using a controlled experiment to provide a solid basis for inferences about the

relative importance of these components. The following section summarizes the research

already available concerning the factors involved in the complexity of geometric matrix items

and links these factors to Gf capacities. The limitations of these previous studies will then be

discussed and the goals of this empirical study presented.

2. Complexity factors and essential capacities of Gf

Complexity factors are those features of a task that define its complexity. These features

are intrinsically related to cognitive capacities that individuals must possess in order to deal

with task demands and solve the problem that is proposed by a particular task. Each

complexity factor constitutes a demand for one or more of the essential cognitive capacities

that comprise Gf. In the literature, four main factors influencing item complexity are

considered: (a) number of elements, (b) number of transformations or rules, (c) type of rules,

and (d) perceptual organization. In this section, each one of these variables will be considered,

and a link with the Gf capacities will be proposed.
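
The orthogonal manipulation of these four factors in the present study's item design can be sketched as a full factorial crossing. The factor codes follow Fig. 1 (e.g., "22SIH"); treating the numbers of elements and rules as each taking the values 2 and 4 is an assumption made for this illustration, not a claim about the exact materials.

```python
from itertools import product

# Full factorial crossing of the four complexity factors. Level codes
# follow the item labels in Fig. 1 (e.g. "22SIH"); the numeric levels
# 2 and 4 for elements and rules are assumed for this sketch.
elements = [2, 4]
rules = [2, 4]
rule_types = ["SI", "SP", "CX", "CO"]   # simple, spatial, complex, conceptual
organization = ["H", "N"]               # harmonic, nonharmonic

design = [f"{e}{r}{t}{o}" for e, r, t, o in
          product(elements, rules, rule_types, organization)]
```

Crossing the factors orthogonally is what lets the study separate their effects: each level of one factor appears equally often with every level of the others.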

2.1. Amount of information: number of elements and rules

The number of elements refers to the number of geometric figures or attributes in an

existing matrix problem, while the number of rules refers to the number of relationships

existing among the different elements or attributes. The role of these variables in the

complexity of geometric inductive reasoning tasks has been investigated mainly by

Mulholland et al. (1980), and also by Sternberg (1977) and Sternberg and Gardner (1983).

Mulholland et al. (1980) created four-term geometric analogies in true–false format (A is

to B as C is to D) by systematically manipulating the number of elements (1, 2, and 3) and the

number of transformations (0, 1, 2, and 3). They observed that when the number of elements

and transformations increases, the processing time increases simultaneously beyond what a

simple additive function would predict. In their words, ‘‘as both elements and transformations

increase, then a solution may require substantial external memory that is unavailable, thus

creating a need for alternative processing strategies that are time consuming with respect to

item solution’’ (p. 265). Based on these data, Mulholland et al. proposed the idea of a memory

management process for solving complex items. They also found that the most difficult items

were those involving multiple transformations of single elements. In such cases, a person has

to perform serial operations on stored representations, as well as storing the results and the

order of the operations performed.
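
The super-additive pattern Mulholland et al. observed can be illustrated with a toy latency model in which an interaction term makes joint increases in elements and transformations cost more than the sum of their separate effects. The coefficients below are arbitrary, chosen purely for illustration, not fitted to any data.

```python
def additive_time(elements, transformations, base=1.0, e_cost=0.5, t_cost=0.8):
    # What a purely additive model would predict.
    return base + e_cost * elements + t_cost * transformations

def interactive_time(elements, transformations, interaction=0.4, **kw):
    # Adds an elements x transformations term: the extra memory-management
    # overhead that appears once both factors are high.
    return additive_time(elements, transformations, **kw) + \
        interaction * elements * transformations

# At 3 elements and 3 transformations the interactive model exceeds the
# additive prediction by interaction * 3 * 3 time units.
gap = interactive_time(3, 3) - additive_time(3, 3)
```

The interaction term is the formal counterpart of the "memory management process" invoked above: it contributes little when either factor is low and dominates when both are high.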

These two variables are both associated with the amount of information that must be

processed in the working memory and are consequently the basic sources of working memory

load. An increased amount of information requires a larger memory buffer to hold all the bits

of procedural and/or declarative information (short-term memory) involved, while

simultaneously requiring a more efficacious organization of goals and encoded elements (central

executive component).

According to Baddeley and Hitch (1994), the working memory is a system that

simultaneously combines storage and processing. The inclusion of processing in working

memory breaks with the traditional concept of storage only, which was implied in the concept

of short-term memory or short-term apprehension-retention (Horn, 1986, 1991; McGrew,

Werder, & Woodcock, 1991). Woodcock (1990, p. 247), commenting on the digit span

subtest of the WISC, affirmed that ‘‘A numbers reversed task appears to require both short-

term memory and fluid reasoning. A numbers forward task, however, is a purer measure of

short-term memory than numbers reversed.’’ This affirmation has led to the conclusion that

tasks that require only storage are not as closely related to Gf as are tasks that require both

storage and processing.

Various measures of working memory have been created including the ABC Numerical

Assignment Test (Kyllonen, 1994), which requires individuals to supply the number assigned

to C, in problems such as the following: A = C + 3, C = B/3, and B = 9. Using such tasks,

Kyllonen and Christal (1990) found correlations around .88 with reasoning tasks, leading to

the conclusion that reasoning seems to be little more than working memory.
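
The sample ABC Numerical Assignment item quoted above (A = C + 3, C = B/3, B = 9) is solved by back-substitution: B is given, C follows from B, and A follows from C. A minimal sketch of such a solver (the representation of equations as callables is an illustrative choice, not part of the original task):

```python
# Minimal back-substitution solver for ABC-style assignment items.
# Each equation defines one variable in terms of already-known ones.
def solve(equations):
    """equations: list of (variable name, function of known values) pairs."""
    known = {}
    pending = list(equations)
    while pending:
        for i, (name, fn) in enumerate(pending):
            try:
                known[name] = fn(known)
            except KeyError:
                continue            # depends on a not-yet-solved variable
            pending.pop(i)
            break
        else:
            raise ValueError("unsolvable system")
    return known

# The item from the text: A = C + 3, C = B / 3, B = 9; the task asks for C.
values = solve([
    ("A", lambda v: v["C"] + 3),
    ("C", lambda v: v["B"] / 3),
    ("B", lambda v: 9),
])
```

Note the working-memory parallel: each solved value must be held active while it serves as input to the next equation, which is exactly the combined storage-and-processing demand discussed above.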

Both the reversed numbers task and the ABC Numerical Assignment Test, as well as

geometric inductive matrix problems, require information to be encoded, retained, and

manipulated or transformed in the working memory to reach a solution. For instance, in

the solving of geometric matrix problems, various aspects requiring integrated storage and

processing are involved. First, the problems include more than one element or attribute, and

all of these attributes must be stored while visual perceptual processing is controlled. A

similar kind of dual processing is required in mapping, when information in the working

memory is used as a guide for perception. Second, when stored attributes prove to be

irrelevant, they must be discarded, which requires a transformation of the information stored

in working memory. Moreover, to create a solution, it is necessary to apply a rule stored in

working memory to produce a new representation and then to store that result in working

memory until the various options can be processed.

Salthouse (1994, p. 536) has distinguished three components of working memory: ‘‘(a)

storage capacity, reflecting the ability to preserve relevant information; (b) processing

efficiency, representing the ability to perform required processing operations rapidly; and

(c) coordination effectiveness, corresponding to the ability to monitor and coordinate

simultaneous activities.’’ These components can be organized into two general capacities:

structural (storage capacity) and operational (processing efficiency and coordination

effectiveness). Salthouse, Babcock, and Shaw (1991) and Salthouse, Legg, Palmon, and Mitchell

(1990) have created numerical, verbal, and visual measures of working memory by

manipulating variables associated with structural and operational capacities. Although the

main concern of these studies was the investigation of decline in working memory capacity

due to age, they show that such measures show moderate correlations with Gf tests like

Raven’s Progressive Matrices.

Several studies (Duncan et al., 1996; Embretson, 1995; Engle et al., 1999; Kyllonen, 1994;

Kyllonen & Christal, 1990; Mulholland et al., 1980; Woodcock, 1990) have presented

evidence that the most important facet of working memory associated with Gf is the central

executive component (Salthouse’s, 1994, operational capacity). In simpler tasks where the

memory load is within the average maximum number of bits of information that can be

maintained active, it may be that only the short-term memory buffer is involved, but in

complex tasks, such as most types of inductive geometric matrix items, these limits are

overstepped, and a strategy for dealing with memory overload must be implemented. Such a

strategy will result in the implementation of a complex mental activity by the central

executive component of working memory, which is responsible for the assembly of numerous

elementary comparison processing loops (Klauer, 1990; Marshalek et al., 1983; Snow,

Kyllonen, & Marshalek, 1984).

The importance of the existence of these loops in the solution process for Raven’s

Advanced Progressive Matrices was identified by Carpenter et al. (1990) by analyzing the

participants’ eye movements while they were solving the problems. They noted that the

analytical decomposition of the problem into smaller subproblems, with the incremental

organization of the solution to each subproblem into a global strategy, was the most

noticeable facet of the solution process. This process requires the ability ‘‘to successfully

generate and manage their problem-solving goals in working memory. . . . The process of

spawning subgoals from goals, and then tracking the ensuing successful and unsuccessful

pursuits of the subgoals on the path to satisfying higher level goals’’ (Carpenter et al., 1990,

p. 428). This global process was called goal management. One can consider the organization

of goals in a hierarchical structure to be a strategy for dealing with the limited capacity of

working memory because, as Carpenter et al. (p. 428) have observed, ‘‘goal management

enables the problem solver to construct a stable intermediate form of knowledge about his or

her progress’’ making it possible to keep a representation active in the face of interference, as

Engle et al. (1999) have argued. Kyllonen and Christal (1990, p. 428) also recognize the

importance of a reasoning strategy in working memory when they propose that ‘‘an additional

important determinant of working memory capacity is the degree to which buffer storage can

be managed through a kind of reasoning process.’’
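
Goal management, as characterized above, can be caricatured as maintaining a stack of pending subgoals and recording each intermediate result so that progress survives interruption. This is a schematic sketch only; the attribute vocabulary and function names are invented for the example.

```python
# Schematic goal-management loop: decompose an item into per-attribute
# subgoals, push them on a stack, and record every intermediate result
# as a stable form of knowledge about progress so far.
def solve_with_goals(attributes, solve_attribute):
    goals = list(reversed(attributes))   # stack of pending subgoals
    knowledge = {}                       # stable intermediate results
    while goals:
        attribute = goals.pop()
        knowledge[attribute] = solve_attribute(attribute)
    return knowledge

progress = solve_with_goals(
    ["shape", "shading", "number"],
    lambda attr: f"rule for {attr} found",
)
```

The dictionary of partial results is what plays the role of Carpenter et al.'s "stable intermediate form of knowledge": interrupting the loop loses at most the current subgoal, not the solutions already banked.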

Another important reasoning strategy to cope with working memory load that has been

found to be associated with Gf is adaptive flexibility. Bethell-Fox et al. (1984) have

demonstrated that analogical problems can be solved by two strategies: constructive

matching and response elimination. Constructive matching refers to the creation of a

mental representation of an answer to a problem and its comparison with existing options,

while response elimination refers to loops involved in the creation of partial solutions,

usually based on a single attribute, and the elimination of incorrect options. These authors

found that constructive matching was used more frequently than response elimination for

simple items and by subjects with high ability; and they concluded that working memory

demands caused a strategy shift. Hence, one important aspect of Gf that may also be related

to the central executive component of working memory is the flexibility to alternate from

one strategy to another in order to handle increased working memory loads when problems

become more complex.
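
The contrast between the two strategies can be sketched on a toy single-attribute item. With multi-attribute figures, response elimination would proceed attribute by attribute, discarding options after each partial solution; this simplified integer version (an illustration, not the Bethell-Fox et al. materials) shows only the structural difference between building the answer first and filtering the options.

```python
# Toy contrast between the two strategies described above. An "item" is
# a complete row plus answer options; the rule is a constant difference.
def constructive_matching(row, options):
    """Build the expected answer first, then look for it among the options."""
    expected = row[-1] + (row[1] - row[0])
    return options.index(expected)

def response_elimination(row, options):
    """Test the options, eliminating those that violate the inferred rule."""
    step = row[1] - row[0]
    remaining = [o for o in options if o - row[-1] == step]
    return options.index(remaining[0])

row = [2, 4, 6]                 # next value should be 8
options = [7, 8, 9, 10]
```

Constructive matching holds a full candidate answer in memory; response elimination trades that load for repeated passes over the options, which is why a strategy shift relieves working memory on complex items.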

In summary, an increase in the amount of information (number of elements and rules) will

constitute an increased demand on the structural and functional working memory

components, that is, the size of the working memory buffers and the implementation of strategies to

organize and optimize information in the available space, respectively. This distinction is

similar to that made in the working memory studies of Just and Carpenter (1992) and

Salthouse et al. (1990, 1991). As the amount of information in complex tests almost always

surpasses the structural capacity of working memory, the most important resource is the

central executive component, since this is responsible for the organization of the flux of

information to cope with memory overload.

2.2. The nature of relationships: type of rules

Type of rules is another source of item complexity considered in the literature; this refers to

the nature or content of the relationships or transformations applied to elements or attributes

(see Fig. 2). In their study of Raven’s Advanced Progressive Matrices, Carpenter et al. (1990)

classified problem rules using an adaptation of the Jacobs and Vandeventer (1972) taxonomy,

which defines 12 categories, based on an analysis of 201 intelligence tests. In Fig. 2, in the

first three columns, three different taxonomies are presented. The first column presents the

taxonomy used in the present study, while the second, presents that of Carpenter et al., and the

third that of Jacobs and Vandeventer. The final column displays examples of each type of rule

identified in these studies. In this column, each row with three geometric figures exemplifies

one particular type of transformation.

Fig. 2. Examples of values for type of relationship and the link to previous research.

The taxonomy of Carpenter et al. (1990) posits four main types of rules: (a)

quantitative pairwise progressions involving the increment or decrement of an attribute

between adjacent elements; (b) figure addition and subtraction, involving the production

of an element by the addition or subtraction of the other two elements; (c) distribution of

three values, in which elements are instances or values of a conceptual attribute; and (d)

distribution of two values, in which element subparts appear in only two of the three

elements in the row. Those who are familiar with the Carpenter et al. taxonomy will note

that this system includes an additional category, constant in a row, which refers to cases

where no attribute change between elements is involved. However, since this type of rule

was not used in the present study, it was not included in Fig. 2. The Jacobs and

Vandeventer (1972) taxonomy, on the other hand, expands the quantitative pairwise

progression and uses different names for the other three.
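
The rule categories above can be made concrete with toy predicates over a row of three "figures". The encodings below (integers for quantitative attributes, frozensets of parts for composite figures) are illustrative choices, not the paper's materials.

```python
# Toy predicates for three of the Carpenter et al. (1990) rule types,
# applied to one row of a matrix. Encodings are illustrative only:
# integers stand for quantitative attributes, frozensets for figure parts.
def pairwise_progression(a, b, c):
    """Quantitative pairwise progression: constant increment across the row."""
    return b - a == c - b

def figure_addition(a, b, c):
    """Figure addition: the third figure is the union of the first two."""
    return c == a | b

def distribution_of_three(a, b, c):
    """Distribution of three values: all three values of an attribute occur."""
    return len({a, b, c}) == 3
```

Note how the first predicate needs only two elements at a time, while the other two must consider the whole row at once, which mirrors the inference-demand contrast drawn below.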

In the present study, these rules were reorganized (column 1 of Fig. 2). The first two levels

separate the quantitative pairwise rules involving increment or decrement transformations

(simple) from those involving spatial transformations (spatial). The third level (complex)

includes figures addition and subtraction, distribution of three values, and distribution of two

values, and attribute addition. The fourth level (conceptual) is actually a subset of the third

involving an expansion of the distribution of three values to simplify computer manipulation

(see Materials and Procedure).

The effect of type of rule on the complexity of geometric analogies was demonstrated by

Whiteley and Schneider (1981), and on the complexity of matrix items by Carpenter et al.

(1990), Embretson (1998), Green and Kluever (1992), and Hornke and Habon (1986). The

Carpenter et al. analysis divided these rules according to their level of abstraction, from the

easiest to the hardest: constant in a row, quantitative pairwise progression, figure addition and

subtraction, distribution of three elements, and distribution of two values. In items involving

pairwise progression rules, inference only requires basic perceptual comparison of two

elements in order to induce a rule that can be generalized to the other elements. In items

involving other types of rules, inference requires the simultaneous consideration of all

elements in order to induce a rule. At the same time, this rule must be based on conceptual

similarity rather than perceptual similarity. For instance, when an individual has to discover

that there are three shapes when the three figures in a row of a matrix are different, they must

be considered as instances of a single concept because, on a concrete level, they are

perceptually different.

Carpenter et al. (1990) have found evidence that individuals employ serial rule induction

and consider simpler rules before more complex ones. This would imply that, for items

composed of abstract rules (e.g., distribution of two values), individuals would first try to

solve the problem by searching for simple relationships (perceptual similarities) before

considering more complex ones (conceptual similarities). It is interesting to note that, in such

a serial induction model, the amount of information (number of rules) is always correlated

with the type of rule; even for an item involving only the single complex rule

‘‘distribution of two values,’’ individuals will have to try at least four rules

before arriving at the correct rule (constant in a row, quantitative pairwise progression, figure

addition and subtraction, and distribution of three elements). Embretson (1998) thus proposes

that different types of rules, which she called levels of relationship, will impose differential

demands on the working memory, hence the type of rule will impose the same demand on Gf

capacities that an increase in the amount of information will.
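
The serial-induction account implies an ordered search through rule hypotheses, from the least to the most abstract. The sketch below also counts how many hypotheses are tried before one fits, which is the mechanism by which rule type and information load become correlated; the rule ordering follows Carpenter et al., but the row encoding is invented for the example.

```python
# Serial rule induction: try rules from simplest to most abstract and
# return the first that fits, plus the number of hypotheses tested.
def constant_in_row(row):
    return len(set(row)) == 1

def pairwise_progression(row):
    return all(isinstance(x, int) for x in row) and \
        row[1] - row[0] == row[2] - row[1] != 0

def distribution_of_three(row):
    return len(set(row)) == 3

RULES = [("constant in a row", constant_in_row),
         ("quantitative pairwise progression", pairwise_progression),
         ("distribution of three values", distribution_of_three)]

def induce(row):
    for attempts, (name, fits) in enumerate(RULES, start=1):
        if fits(row):
            return name, attempts
    return None, len(RULES)

# An abstract row forces the solver past the two simpler hypotheses.
name, attempts = induce(["circle", "square", "triangle"])
```

Because an abstract rule is only reached after the simpler hypotheses fail, items built from abstract rules carry an information-load cost even when they contain a single rule.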

In addition to the fact that type of rules is linked to information load, it is also linked to

abstraction. Abstraction makes it possible to construct representations based on analytically

decomposed fragments of perception thus allowing the reorganization of natural groupings

formed by the concrete characteristics of a stimulus. In terms of cognitive processing,

abstraction may be a result of the process of selective attention, associated with the central

executive component of working memory. Hence, selective attention plays a necessary role in

the creation of abstract representations; the two should be considered as aspects of a single

unitary concept.

Selective attention has been divided into three distinct categories depending on the

direction of the flow of information (Sternberg, 1986). When information must be activated

in long-term memory for transferal to working memory, the process is called selective

comparison; when it moves from stimulus to working memory, the process is called selective

encoding; and when it is selected from the working memory itself, the process is called

selective combination. This distinction is important, because different types of analogical

tasks are classified according to the type of selective attention required. Geometric analogies

require largely selective encoding, while verbal analogies require selective comparison and

deductive tasks, selective combination.

Depending on the type of rule involved in such geometric problems the demand for

selective encoding will be greater because of the presence of various irrelevant attributes that

must be ignored in the inference process. For example, when figures with the same color are

grouped, the difference in shape must be ignored. Furthermore, complex items in which

elements are grouped by conceptual instead of perceptual similarity require the continued

activation of concepts while attention is focused on selected parts of stimuli. This process

may generate an additional demand for the controlled attention of the executive component of

working memory.

2.3. Perceptual organization

The final variable, perceptual organization, has been the focus of the least study. It

involves the gestalt principle of perceptual grouping of visual perceptions, such as grouping

by proximity, similarity, common region, and continuity (Mack, Tang, Tuma, & Rock,

1992; Palmer, 1992; Rock & Palmer, 1990). These principles can either increase or

decrease the complexity of the problem, as was demonstrated by Primi (1995) and Primi

and Rosado (1995).

Perceptual organization is related to ambiguity, contradiction among perceptual and

conceptual groupings, and the number of misleading cues. As can be seen in the sample

items in Fig. 1, the two items in the first column represent items that are relatively less

complex than those in the second column. The term harmony has been used here to refer to

the visual esthetics of such combinations, in analogy to harmony in music, where the sound

resulting from the simultaneous playing of certain musical notes is more pleasing to the ear

than that resulting from other combinations, that is, harmony here is used to refer to the

esthetics resulting from congruency in the arrangement of parts (Webster’s New Collegiate

Dictionary, 1981). Visually harmonious items display perceptual and conceptual combina-

tions that represent congruent relationships between elements, whereas nonharmonic items

tend to portray competitive or conflicting combinations between visual and conceptual

aspects that must be dealt with in reaching a solution.

Fig. 3 helps to explain how item features can be manipulated to create these two levels of

perceptual complexity. The top row reproduces the first row of item 22SIH, a harmonic item,

which can be seen in full form in Fig. 1. The second, third, and fourth rows show the

successive manipulations that were made on the elements in the top row, to produce item

22SIN, which corresponds to the nonharmonic version of item 22SIH. This last row is thus

the same as the first row of item 22SIN.

The first row of Fig. 3 lists the rules that are involved in item 22SIH, that is, shape and

shading transformations (two pairwise quantitative rules, see Fig. 2). The second row shows

the result of the transformation of the second element into a form similar to that found in the

first element (circles). This manipulation increases the likelihood of perceptual groupings

based on similarity of form. The third row shows the result of the second phase of

Fig. 3. Example of transformations applied to harmonic items to create nonharmonic items.


manipulation, which added color to the first element. This manipulation was also designed to

increase the likelihood of perceptual groupings based on similarity of ‘‘color.’’ The fourth

row shows the result of the third phase of manipulation, which interrupts the alignment of the

elements. This transformation interrupts the natural perceptual continuity of the elements,

which makes it difficult to identify which elements should be grouped. Similar manipulations

were used on the second and third rows of item 22SIH to produce item 22SIN. It is important

to note that, in terms of number of elements, number of rules and type of rules, these two

items are identical. The difference is due to perceptual complexity. In previous studies, it has

been found that such transformations have a remarkable impact on the complexity of

nonverbal inductive reasoning tasks (Primi, 1995; Primi & Rosado, 1995).

In their study of Raven’s Advanced Progressive Matrices, Carpenter et al. (1990) observed

the existence of certain misleading cues that increase the complexity of the process of finding

a correspondence among elements or of grouping elements that are governed by the same

rule. This complexity was found in items composed of multiple rules, probably because the

involvement of several rules also implies the presence of several superposed elements that

form perceptually complex figures.

In perceptually complex items, the likelihood of the formation of irrelevant groups of

elements based on perceptual features is increased. Hence, such items impose demands on

selective encoding and abstraction, because certain perceptual groupings must be ignored and

others based on more abstract attributes considered. They also impose demands on goal

management associated with the central executive component, as the irrelevant groupings will

have to be discarded, and it will be necessary to operate on the stored representations or on

fragments of the perceptual field that have proved to be irrelevant. Moreover, they may

impose demands on the visual memory component of working memory (visual scratch pad)

because a discontinued group of elements implies the need to remember their positions in

visual space to orient the subsequent selective encoding process through the indication of the

remaining elements that need to be considered.

2.4. The differential effect of complexity factors

In summary, there is some evidence that all complexity factors of geometric matrix

inductive tasks have an effect on the central executive component of working memory.

Hence, in general, individual differences in Gf are closely related to the capacities associated

with this component of working memory.

Although all complexity factors affect the same component, the manner in which they do

so appears to be different. It may be possible to identify two groups of complexity factors,

one involving the number of elements and number of rules and the other perceptual

organization and type of rules. The first group encompasses variables correlated with the

process of handling simultaneous bits of information in short-term memory (goal management), while the second is composed of variables correlated with the simultaneous control of

visual processing of selective encoding (abstraction) and the management of information in

short-term memory. The second group may also be associated with goal management when

irrelevant information is present.


No evidence for a differential impact from these two groups of variables is found in the

literature. The Carpenter et al. (1990) analysis of Raven’s Advanced Progressive Matrices, for

example, shows that item complexity is basically correlated with the number and type of

rules, but these occur simultaneously in complex items. In comparison with easy items, the

complex items involved more rules (usually three or four), more complex rules (distribution

of two values, e.g., Fig. 2) and misleading cues complicating the process of finding

correspondences. Hence, this collinearity precludes the possibility of identifying unique

contributions to item difficulty.

Two studies in the literature were designed to create matrix items with geometric figures

controlling item features to gain insights into their effects on item complexity. The first was a

study by Hornke and Habon (1986), who developed an item bank with 616 matrix items

involving two elements and two rules each, that is, with no variation in the amount of

information but varying in relation to the other three features (type of rule, direction of

relationships, and perceptual organization). The first variable included eight levels: identity,

addition, subtraction, intersection, unique addition, seriation, variation of open gestalts, and

variations of closed gestalts. Since these authors created their taxonomy from the work of

Jacobs and Vandeventer (1972) and Ward and Fitzpatrick (1973), all but one of the

relationships are accounted for in the examples in Fig. 2. Intersection is the only new

transformation, and it can be described as the inverse of distribution of two values or unique

addition. Unique addition items can be solved by superimposing two elements, examining the

part that does not intersect, and composing the third element with these parts. In intersection,

the process is the same, except that the third element is composed of the parts that do intersect

in the first two elements. The final two types of rules, variation of closed and open gestalts,

correspond to distribution of three values, except for closedness (squares, circles, etc.) or

openness (arrows, lines, etc.) of the elements.

The second variable was the direction of possible relationships by row, by column, or by

row and column, and the third was the arrangement of the elements such that (a) elements

could clearly be perceptually separated components, (b) they could be integrated, that is, they

consisted of two attributes of a unitary element or geometric figure, such as shape and color,

and (c) they could be embedded to appear perceptually as a unitary element although separate

rules are involved in the variation of the subparts. Based on the discussion of complexity

factors in this paper, this variable could be considered to involve perceptual complexity.

The proportion of variance in item difficulty accounted for by these variables was .40. A

visual inspection of the Hornke and Habon (1986) results shows that items composed of

intersection, unique addition, and with embedded elements were the most complex. The

type of rule had a significant effect on complexity, but the perceptual organization of the

elements had a stronger effect. The items whose elements were perceptually separate were

easier than those whose elements formed a whole figure requiring an analytic process to

dissociate the elements.

Certain questions can be raised about this study and the low predictability achieved. First,

in the taxonomy of rule types, various categories for similar types of rules were created (e.g.,

intersection with unique addition, addition with subtraction, variations of closed gestalts with

variation of open gestalts). Second, the processing model postulated a distinct cognitive

operation associated with each category in their taxonomy, but since some of these categories


were very similar, they probably involved similar cognitive operations, therefore, the

potential of the taxonomy for explaining item complexity is reduced, especially since it

may not have exhausted all the variance in the complexity of cognitive operations involved in

Gf. Third, and most important, the authors did not vary the amount of information involved,

although this constitutes an important source of difficulty. The most relevant aspect of this

study, however, was the operationalization of perceptual complexity and the demonstration of

its effect on item difficulty.

The second and more recent study was conducted by Embretson (1995, 1998), who

based her items on the structure identified by Carpenter et al. (1990) for the Raven’s

Advanced Progressive Matrix items; this structure involves variations in the number and

type of rules, as discussed above. Basically, Embretson (1998) coded item structure by

using two variables, the first combining number and type of rules and the second indicating

the perceptual complexity of the stimulus. For the first variable, items were scored by a

weighted sum of the rules involved (1 = identity; 2 = quantitative pairwise progressions;

3 = figure addition/subtraction; 4 = distribution of three values; 5 = distribution of two

values). The second variable consisted of three dummy codes indicating whether the

elements in an item were overlaid (object overlay), whether they were combined appearing

as a unitary object (object fusion), and whether corresponding elements were perceptually

altered (object distortion). Embretson observed that although this last variable did have a

significant effect on item complexity, the first variable had a correlation of .71 with item

complexity. Based on these results she concluded that working memory was the most

important source of item complexity in Gf tests.

Unfortunately, the collinearity of the independent variables makes this conclusion

questionable. It is well known that in multiple regression analysis it is difficult to

identify which of the independent variables is the most important in predicting the

dependent variable when they are correlated (Howell, 1997). In fact, the correlation

between the number of rules and the presence of the distribution of two values (the most

complex rule) in the 36 items of the Advanced Progressive Matrices (the structure used in

the Embretson study) is .64 (N = 36, P < .001; Primi & Castilho, 1996). As was suggested

earlier, the number and types of rules may have differential impact on the central

executive component of working memory, with the former more closely associated to goal

management and the latter to abstraction and selective encoding. The conclusions of the

study give importance to the first component although the second could also have been

supported. With the data available, it is not possible to know if items are more difficult

because they require a more efficient management of goals in short-term memory, because

they require a difficult controlled visual process of selective encoding, or because they

require a combination of the two capacities.

In fact, in another study using the latent response model by Maris (1995), Embretson

(1995) estimated separate ability scores for working memory and general control processing.

In this earlier study, the meaning attributed to working memory differs from that adopted in

the present study as it was linked to the number and type of rules exerting demands on goal

management. On the other hand, general control processing was linked to representational

variability or ambiguity in the meaning of the item stimuli (perceptual complexity) exerting

demands on strategy planning, monitoring, and evaluation. The results of this earlier study


thus suggest a more important role for general control processing in the prediction of

individual differences in Gf.

In summary, all these studies suggest the importance of the role of variables associated

with the amount of information (goal management), rule complexity, and perceptual

organization (selective encoding and abstraction), but they differ in the importance attributed

to the specific sources of item complexity. The Hornke and Habon (1986) study emphasizes

the importance of perceptual organization in association with abstraction and selective

encoding while the Embretson (1995, 1998) studies fail to show conclusive results because

either the role of goal management and/or that of abstraction capacity in the prediction of item

complexity could have been supported. It thus seems that the role of these variables is not yet

fully understood.

One problem is that these two studies present limitations precluding the precise

identification of the effect of each source of difficulty. The former fails to control for

the amount of information, while the latter fails to control for the collinearity of the

structural variables. The present study was thus designed to investigate the effect of each of

the four potential sources of item difficulty discussed earlier. Considering that experimental

control of these variables is necessary in order to investigate the unique contribution of

each, the main goal of the present study was to create an item structure that represented an

orthogonal combination of these four main sources of item complexity so that their impact

on item difficulty and reaction time could be identified. All four variables were expected to

play a role in item complexity; moreover, the variables associated with individual differences in goal management and selective encoding and abstraction capacities were expected

to furnish further insights into the relative contribution of each to item complexity, and

consequently, to an explanation of Gf.

3. Method

3.1. Participants

The participants were 313 undergraduate students, approximately 68% females, with ages

ranging from 17 to 52; about 75% were between 17 and 22 (mean = 21.9, S.D. = 5.8).

3.2. Materials and procedure

Sixty-four items were developed and divided into two sets with 32 items each (Forms A

and B). These forms were structurally identical and were applied to two independent

subgroups of subjects (Form A: n = 122 and Form B: n = 191).

Each item consisted of a 3 × 3 matrix with an empty cell and eight response alternatives. In

the present paper, each geometric figure within a cell is considered to be a term, with the

elements being subparts of a term. The matrix cells are referred to by the letters A to I, with

the alternatives labeled I1 to I8. Four examples of the items used have already been presented

in Fig. 1.


The item structure is defined by four variables: number of elements, number of rules, type

of rule, and perceptual organization. Each item is formed by either two or four elements,

randomly selected from a pool of 59 geometric figures; either two or four rules govern the

relationships between these elements. This leads to four possible combinations of number of

elements and number of rules: two elements involving two rules, two elements involving four

rules, four elements involving two rules, and four elements involving four rules. Except for

the two-element, four-rule combination, in which two transformations were applied to each

element, it was possible to apply a single rule to each element.

Four types of rules were employed: simple (SI), spatial (SP), complex (CX), and

conceptual (CO). The first column of Fig. 2 shows each of these types, called levels, while

the final column shows examples of each level. Considering the Carpenter et al. (1990) and

Jacobs and Vandeventer (1972) taxonomies, more than one type of rule was involved for

each level. The first level includes five types of quantitative pairwise progressions, that is,

increase or decrease of an attribute between adjacent elements: size, shading, number series,

shape, and added element. The second level also includes three types of quantitative pairwise

progressions, but these involve spatial transformations such as movement on a plane, flipping

over, and reversal. Level 3a includes the four more complex transformations: figure addition

and subtraction, distribution of three values, distribution of two values, and attribute addition

(in which an element is composed by combining two attributes from the other two

elements). Level 3b is a subset of Level 3a, involving only distribution of three values of an

attribute. The attributes involved were shading, inclination, color, size, outline, and shape.

Since such transformations are suitable for automatic item generation, specific items

involving these rules were created to investigate their psychometric potential for future

studies concerning computerized online item generation.

Combining these three variables, that is, number of elements (two or four), number of rules

(two or four), and type of rule or levels (simple, spatial, complex, or conceptual), led to the

creation of a basic 16-item set. Since more than one possible option was available for each

rule type, specific rules were randomly selected.
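The orthogonal design described above can be sketched in a few lines of code. This is an illustrative reconstruction, not the author's generator; the label scheme (first digit = number of elements, second digit = number of rules, then rule type and H/N for perceptual organization) is inferred from sample item names such as 22SIH and 42CON.

```python
from itertools import product

# Illustrative sketch of the orthogonal item design:
# 2 element counts x 2 rule counts x 4 rule types = 16 basic structures.
N_ELEMENTS = (2, 4)
N_RULES = (2, 4)
RULE_TYPES = ("SI", "SP", "CX", "CO")  # simple, spatial, complex, conceptual

basic_set = [f"{e}{r}{t}" for e, r, t in product(N_ELEMENTS, N_RULES, RULE_TYPES)]

# Each basic structure appears in a harmonic (H) and a nonharmonic (N)
# version, yielding the 32 items of one test form.
form = [code + org for code in basic_set for org in ("H", "N")]
```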

The original 16-item set was further transformed to create two levels of perceptual

organization: harmonic and nonharmonic. The two items in the first column of Fig. 1 are

examples of harmonic items (22SIH, 42COH), and the two items in the second column their

corresponding nonharmonic transformed versions (22SIN, 42CON). The transformation

employed, exemplified for item 22SIH (Fig. 3), was similar to that used in previous studies,

in which elements are arranged according to gestalt principles of perceptual grouping so that

specific perceptual correspondences among elements are produced (Primi, 1995; Primi &

Rosado, 1995).

For the nonharmonic items, irrelevant perceptual correspondences were created by

manipulating the principles of similarity and continuation (Palmer, 1992; Rock & Palmer,

1990). Manipulating attributes of noncorresponding elements such as color and shape

leads to a perceptual tendency to group them according to similarity. These groupings,

based on the perceptual process, do not conform to any meaningful rule in the problem

(e.g., the group composed by all black elements in item 22SIN of Fig. 1), and constitute

misleading cues. Simultaneously, altering the relative positions of corresponding elements

across a row made it possible to increase the complexity involved in forming relevant


groups due to the interruption of natural perceptual continuity, which would have

facilitated their grouping. This particular transformation may increase the demands on

the visual component of working memory, because the formation of relevant groups

of elements involves the storage of complex, nonsystematic visual patterns of element

positions in visual space.

For the harmonic items, these same principles were used to create a perceptual tendency to

group elements according to the appropriate conceptual rule, that is, corresponding element

attributes such as shape and color were the same and different from other noncorresponding

elements; and, corresponding elements were always aligned in space forming groups by their

good continuity (cf. items 42COH with 42CON in Fig. 1).

The seven distractors were created in such a way as to have: (a) two alternatives with only

one incorrect transformation, (b) two alternatives with two incorrect transformations, (c) two

alternatives with more than two incorrect transformations, and (d) one alternative that was a

copy of the term presented in cell H. The distractors and the correct alternative were randomly

assigned to the eight positions I1 to I8.
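As a purely hypothetical sketch of this layout step (the helper name and the string representation of alternatives are invented for illustration), the key and the seven distractors built by rules (a) through (d) can be shuffled into positions I1 to I8:

```python
import random

# Hypothetical sketch: shuffle the correct alternative and seven distractors
# into randomly assigned response positions I1..I8.
def assign_positions(key, distractors, seed=0):
    if len(distractors) != 7:
        raise ValueError("expected seven distractors")
    options = [key] + list(distractors)
    random.Random(seed).shuffle(options)  # random assignment to the 8 slots
    return {f"I{i + 1}": option for i, option in enumerate(options)}
```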

The 32 items in each form were arranged according to difficulty. It was assumed that (a) a

nonharmonic version would be more complex than a harmonic one, (b) rule type would

decrease in complexity from CO–CX to SP to SI, (c) items with more information (involving

more elements and rules) would be more complex than items with less information, and (d)

the effect of perceptual organization would be relatively greater than the effect of the type

of rule.

The basic 16-item sets were presented once in the harmonic version and a second time in a

nonharmonic version. If items were always presented in this order, however, subjects would

be better prepared to answer the nonharmonic items and learning could interfere with the

effect exerted by perceptual organization, thus influencing the results. The item order was

thus controlled, with half of the subjects receiving the items in the harmonic–nonharmonic order and the other half in the nonharmonic–harmonic order; subjects were randomly assigned to these two groups.

Each item was drawn using Corel Draw 4.0 and exported to a 640 × 480 pixel bitmap file

format. Computer software was developed using Microsoft Visual Basic 4.0 Professional

(Microsoft, 1995a) to manage item presentation and store the responses in Access database

files (Microsoft, 1995b).

In the application sessions, the subjects answered simultaneously in computer laboratories

with 20 work stations (PC-486). Each session was conducted by a trained experimenter who

was a psychology undergraduate student participating as part of the work required to obtain

course credits. The duration of each testing session was about 60 min.

In a typical session, students first received instructions by means of a hypertext

presentation, which included the following: (a) a brief definition of inductive reasoning,

(b) an example of the task format, (c) explanations about how to interact with the computer,

and (d) general orientations concerning the number of items and the existence of only one

correct alternative. After these instructions, the subjects completed three practice items before

starting the experimental test.

Initially, only the first eight cells (ABC, DEF, GH) were presented. To see the response

alternatives, the subjects had to click a button called ‘‘Present the Alternatives.’’ To choose an


alternative, they had to click on it. Clicking on an alternative moved it to the blank space (the

ninth cell I). The subjects could, however, change their minds by clicking on the alternative

now occupying this space, which returned the alternative to its original location. This

‘‘unselecting’’ of a response was called correction response (CR). Subjects could also

eliminate alternatives by clicking on them with the right-hand mouse button, making the

alternative disappear, although an eliminated alternative could be restored by clicking on

the space again with the right-hand mouse button. This reaction was called an elimination

response (ER). After completing an item, subjects had to click on a button called ‘‘next

item’’ to proceed to the next item, but this button was only enabled after the subject had

chosen some alternative for the previous item, which effectively prevented the skipping of

any item. The right-hand corner of the screen contained an indication of the number of items

answered and those remaining.

Basically all responses were recorded, including the reaction time elapsed from the time of

item presentation to the time of the response, precise to the millisecond. These

measurements were obtained by using the timeGetTime function included in the library file

MMSYSTEM.DLL.

3.3. Design

The main goal of the present experiment was to investigate the effect of each structural variable on complexity. Each item is a cell containing a specific combination of

four independent variables: (a) number of elements (two levels), (b) number of rules (two

levels), (c) type of rule (four levels), and (d) type of organization (two levels). The dependent

variables for each of the combinations were reaction time (RT), accuracy (P), and response

elimination (ER). Since each subject answered under all conditions, the design is one of

repeated measures. Moreover, the application of two parallel forms to the independent groups

functioned as a cross validation study.

4. Results

4.1. Psychometric properties

Table 1 shows the descriptive statistics of total and item scores for Forms A and B

separately. It also shows the distribution of item difficulty and point biserial correlations

between item scores and total scores.

As expected, item difficulty varied from easy (.89) to complex (.05), with the total set of 64

items forming a representative sample of a broad spectrum of item complexity. The

correlations between item and total scores varied from moderate to high. Out of the total

of 64 items, 55 presented item–total point biserial correlations higher than .30.

These indices indicate that the items formed a coherent group along a homogeneous scale.

Internal consistency coefficients were high (.84 and .85); the two forms, which were structurally similar, showed equivalent psychometric properties, although Form B was slightly easier than

Form A.
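The indices in Table 1 are standard classical test theory statistics. A minimal sketch on simulated 0/1 responses (313 simulated subjects by 32 items, not the study's data) shows how each is computed:

```python
import numpy as np

# Simulate person-by-item 0/1 responses driven by a common ability.
rng = np.random.default_rng(0)
ability = rng.normal(0.0, 1.0, 313)
prob = 1.0 / (1.0 + np.exp(-ability))            # per-person success probability
responses = (rng.random((313, 32)) < prob[:, None]).astype(int)

p = responses.mean(axis=0)      # item difficulty: proportion correct
total = responses.sum(axis=1)   # total score per person

# Point-biserial: Pearson correlation of a 0/1 item score with the total score.
r_pb = np.array([np.corrcoef(responses[:, j], total)[0, 1] for j in range(32)])

def kr20(x):
    # Kuder-Richardson formula 20: internal consistency for dichotomous items.
    k = x.shape[1]
    pj = x.mean(axis=0)
    return k / (k - 1) * (1 - (pj * (1 - pj)).sum() / x.sum(axis=1).var(ddof=1))
```

Because every simulated item taps the same ability, the point-biserial values come out positive and KR-20 is substantial, mirroring the pattern reported for the real forms.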


4.2. Item complexity analysis

The complexity analysis focused on the Rasch difficulty index as the dependent variable;

this was estimated by the unconditional maximum-likelihood method proposed by Wright

and Stone (1979) and performed using RASCAL software (Assessment Systems Corporation,

1996). About 85.9% of the items fitted the one-parameter model. Because of the variability in

the item–total correlations, the two-parameter model could have been used to provide an

even better fit. However, since the correlation between the difficulty indices obtained for the

two models was very high (.98), the simpler one-parameter model was used.
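As a rough illustration of the one-parameter model behind these indices, the sketch below runs a crude joint maximum-likelihood update for the item difficulties with person abilities held fixed; it is not the RASCAL algorithm or the Wright and Stone procedure.

```python
import numpy as np

def rasch_p(theta, b):
    # Rasch model: P(correct) = logistic(ability - difficulty).
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def estimate_difficulties(responses, theta, n_iter=200, lr=0.5):
    # Gradient ascent on the joint log-likelihood with abilities fixed;
    # difficulties are centered to fix the scale origin.
    b = np.zeros(responses.shape[1])
    for _ in range(n_iter):
        p = rasch_p(theta[:, None], b[None, :])
        b += lr * (p - responses).mean(axis=0)
    return b - b.mean()

# Recover the ordering of three known difficulties on simulated data
# (abilities are assumed known here purely for simplicity).
rng = np.random.default_rng(1)
theta = rng.normal(0.0, 1.0, 500)
b_true = np.array([-1.0, 0.0, 1.0])
data = (rng.random((500, 3)) < rasch_p(theta[:, None], b_true[None, :])).astype(float)
b_est = estimate_difficulties(data, theta)  # increasing, close to b_true
```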

Difficulty indices were analyzed using the general linear model approach. Independent

variable effects were represented by a set of orthogonal linear contrasts (Cohen, 1968;

Howell, 1997), which represented the effects of the structural variables (number of

elements, number of rules, type of rule, and perceptual organization) and the test form.

The goal of this analysis was to predict the item difficulty on the basis of these independent

variables using a stepwise multiple regression technique. All structural variables were

expected to make significant contributions, whereas the test form was not expected to make

a significant contribution.
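The contrast-coding strategy can be illustrated on fabricated data; the effect sizes and R² below are arbitrary choices for the simulation, not the study's estimates.

```python
import numpy as np

# Fabricated illustration: predict item difficulty from effect-coded (+1/-1)
# structural contrasts by ordinary least squares.
rng = np.random.default_rng(2)
n_items = 64

org = np.tile([1.0, -1.0], 32)                      # nonharmonic vs. harmonic
info = rng.permutation(np.repeat([1.0, -1.0], 32))  # stand-in second contrast
X = np.column_stack([np.ones(n_items), org, info])

# Simulated Rasch difficulties: a large organization effect plus noise.
y = 0.8 * org + 0.3 * info + rng.normal(0.0, 0.5, n_items)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)        # intercept and two slopes
resid = y - X @ beta
r2 = 1.0 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()
```

With balanced +1/-1 codes, each slope equals half the difference between the two condition means, which keeps orthogonal contrasts interpretable as separable effects.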

Table 2 presents the results of the stepwise multiple regression displaying the non-

standardized regression coefficients (B), the standard error (S.E.), the standardized regression

Table 1
Descriptive statistics for total and item scores in test forms A and B

Descriptive statistics for total scores     Form A     Form B
Mean                                         17.45      18.94
S.D.                                          6.04       5.90
Min                                              2          1
Max                                             30         30
K-R 20                                         .84        .85

                                            Form A              Form B
Descriptive statistics for items            P        rpb        P        rpb
Mean                                        .54      .41        .59      .41
S.D.                                        .19      .10        .21      .10
Min                                         .05      .20        .06      .11
Max                                         .89      .61        .85      .62

Frequency distributions^a                   Frequency (%)       Frequency (%)
                                            P        rpb        P        rpb
< .10                                       1 (3.1)             1 (3.1)
.10–.20                                              1 (3.1)             1 (3.1)
.21–.30                                     1 (3.1)  4 (12.5)   2 (6.3)  4 (12.5)
.31–.40                                     5 (15.6) 12 (37.5)  1 (3.1)  6 (18.8)
.41–.50                                     6 (18.8) 11 (34.4)  4 (12.5) 15 (46.9)
.51–.60                                     7 (21.9) 4 (12.5)   4 (12.5) 5 (15.6)
.61–.70                                     5 (15.6) 1 (3.1)    8 (25.0) 1 (3.1)
.71–.80                                     3 (9.4)             6 (18.8)
> .81                                       4 (12.5)            5 (15.6)

^a Difficulty index for P and point biserial correlations between item scores and total scores for rpb.


coefficient (β), and the squared multiple correlation (R²). In the first analysis, including all the

64 items, only perceptual organization contributed significantly to the prediction of item

difficulty. The proportion of variance in item difficulty accounted for by the perceptual

manipulations intended to create irrelevant correspondences was .408, which was statistically

significant, F(1,62) = 43.58, P < .0001.

Although perceptual organization had a large effect, more than half of the variance in item

difficulty remained unexplained. A detailed look at the item matrix data revealed two possible

sources of interaction: (a) an interaction of number of elements and number of rules with type of rule and (b) an interaction of number of transformations with number of elements. The effect of the number of

elements and number of rules seemed to be stronger for items involving simple, complex, or

conceptual rules than for those involving spatial ones. Also, items in which more than one

transformation occurred in a single element seemed to be more difficult. Based on these

observations, a second analysis was performed in which the spatial items were excluded and

the contrasts for number of elements and number of rules were replaced by a new contrast

representing the sum of these effects plus an effect representing the number of rules applied to

a single element. This new contrast was termed ‘‘amount of information.’’

The second analysis (Table 2) shows that, in Step 1, with perceptual organization included

in the equation, R² = .534, Finc(1,45) = 51.60, P < .0001. Then, in Step 2, with amount of information included in the equation, R² = .642, Finc(1,44) = 13.29, P < .001. This

second analysis, which basically excluded the spatial items, revealed a greater effect for

perceptual organization, as well as a significant increase of approximately 11% in the

predictability of item complexity attributable to the amount of information. None of the other

variables, including the test form, made a significant contribution.

4.3. Reaction time analysis

The RTs of the 313 subjects answering the 32 items (10,016 observations) varied from 4.62

to 1031.21 s (mean = 79.48, S.D. = 64.86). The distribution of these RTs was positively

skewed, so prior to the analysis, the RT was transformed by the natural logarithmic function.
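The motivation for the transform can be seen on a synthetic right-skewed sample (lognormal, with parameters chosen arbitrarily rather than fitted to the study's RTs):

```python
import numpy as np

# Right-skewed "reaction times" become roughly symmetric after a log transform.
rng = np.random.default_rng(3)
rt = np.exp(rng.normal(np.log(60.0), 0.7, 10_000))  # seconds, positively skewed

def skewness(x):
    z = (x - x.mean()) / x.std()
    return float((z ** 3).mean())

raw_skew = skewness(rt)          # strongly positive
log_skew = skewness(np.log(rt))  # near zero
```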

Table 2
Summary of results of regression analysis predicting item difficulty from structural variables

Structural variables                                                          B      S.E.    β
First analysis^a
  Perceptual organization                                                    .675    .103    .639***
Second analysis^b
  Step 1 (R² = .534)
    Perceptual organization                                                  .794    .111    .731***
  Step 2 (R² = .642, ΔR² = .108***)
    Perceptual organization                                                  .785    .098    .723***
    Number of elements + number of rules + number of rules on same element   .216    .059    .329***

^a 64 items, R² = .408.
^b 48 items (items with spatial rules excluded).
*** P < .001.


A 3 × 4 × 2 ANOVA was performed, with the logarithm of the RT as the dependent

variable, and the amount of information (three levels), the type of rule (four levels), and the

perceptual organization (two levels), as independent variables. The levels of the amount of

information included: (a) Level 1, for items with two rules and two elements, (b) Level 2, for

items with four elements and two rules, and (c) Level 3, for items with four rules (either two

or four elements).

Table 3 shows the results of the ANOVA. All the main effects and interactions were

statistically significant, but their magnitude varied considerably. The RT depended primarily

on the individual subject, that is, on a general between-subject facet representing individual

differences in the mean RT for answering all 32 items. The proportion of the total RT variance

explained by the between-subject source was .367. The next most important effect was due to

perceptual organization. Harmonic items required an average of 59.71 s, whereas nonharmonic items required an average of 91.65 s. The proportion of the variance in total

reaction time accounted for by this variable was .169. Another important effect was due to the

interaction between type of rule and perceptual organization. Harmonic items with simple

transformations required an average of 47.12 s, while conceptual transformations required

52.49 s, spatial transformations required 68.33 s, and complex ones an average of 70.90 s.

But for nonharmonic items, these differences were not as pronounced as for their harmonic

counterparts. The proportion of the total RT variance accounted for by the interaction

between perceptual organization and type of rule was .035. Fig. 4 shows the mean RT for

each combination of independent variables. It seems that increasing the amount of information and rule complexity exerts a more systematic effect for harmonic items than for nonharmonic ones.

Table 3
Results from 3 × 4 × 2 (Amount of information × Type of rule × Type of perceptual organization) repeated measures ANOVA for RT

Source of variance                                         S.S.              df (a)          M.S.           F          η²
Between subjects
  Persons                                                  1138.01           312             3.65                      .367
Within subjects
  Type of perceptual organization                          330.19 (280.93)   1 (312)         330.19 (0.90)  366.71***  .169
  Type of rule                                             90.42 (214.64)    2.73 (852.73)   33.09 (0.25)   131.44***  .046
  Amount of information                                    22.63 (85.89)     1.98 (616.70)   11.45 (0.14)   82.22***   .011
  Type of rule × Type of perceptual organization           69.09 (202.98)    2.66 (831.33)   25.93 (0.24)   106.19***  .035
  Amount of information × Type of perceptual organization  23.18 (75.99)     1.91 (597.35)   12.10 (0.13)   95.16***   .012
  Type of rule × Amount of information                     14.81 (278.78)    5.06 (1578.30)  2.93 (0.18)    16.57***   .007
  Type of perceptual organization × Type of rule
    × Amount of information                                6.29 (262.48)     5.35 (1669.35)  1.18 (0.16)    7.47***    .003
  Total (within subjects)                                  1958.29

Values within parentheses correspond to the sum of squares (S.S.), degrees of freedom (df), and mean squares (M.S.) of the error source, respectively.
(a) The degrees of freedom were corrected using the Greenhouse–Geisser formula to compensate for compound symmetry violation (Howell, 1997).
*** P < .001.
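The effect-size values reported in Table 3 can be reproduced from its sums of squares; a sketch, assuming (as the table's own figures imply) that within-subject effects are scaled by the within-subjects total S.S. and the person effect by the grand total:

```python
# Sums of squares taken from Table 3
ss_persons = 1138.01
ss_within_total = 1958.29
ss_perceptual_org = 330.19
ss_type_of_rule = 90.42

grand_total = ss_persons + ss_within_total

eta2_persons = ss_persons / grand_total          # ~.367, as reported
eta2_org = ss_perceptual_org / ss_within_total   # ~.169, as reported
eta2_rule = ss_type_of_rule / ss_within_total    # ~.046, as reported
```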

4.4. The analysis of elimination

The computerized application of the test made it possible to identify all the eliminations

made while the subjects were taking the experimental test. Table 4 shows the descriptive

statistics for the number of eliminations for each of the 32 items answered by each of the 313

subjects. There was great variability in the use of this resource. Some students did not use it at all, while others used it frequently. Those who did use the resource also revealed great variability. The internal consistency coefficient was high (.97).

Fig. 4. Mean RT for each item classified according to structural definition, amount of information, type of rule, and perceptual organization.

Table 4
Descriptive statistics related to elimination of responses, calculated for each item (by item) and for each student (by student; average of responses to all 32 items)

                                                            By item        By student
Number of observations
  Total                                                     10,016         313
  Total number of items or students where
    at least one elimination occurred                       2318 (23.1%)   132 (42.2%)
Descriptive statistics
  Mean                                                      4.87           82.47
  S.D.                                                      1.60           55.23
  Min                                                       1              1
  Max                                                       7              224

The descriptive statistics were calculated for all observations where at least one elimination occurred.

The correlation between the number of eliminations and ability was r=.51, N = 313,

P < .001, indicating that the subjects who used this resource more frequently tended to have

higher scores. Fig. 5 shows a scatterplot classifying each student in a two-dimensional space

defined by the coordinates of ability (WIT scale, Wright & Stone, 1979) and the total number

of eliminations made for the 32 items.
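The reported r = .51 is an ordinary Pearson product-moment correlation; a self-contained sketch with hypothetical data (the elimination/ability pairs below are illustrative only, not the study's data):

```python
def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical: total eliminations vs. ability (WIT scale) for six subjects
eliminations = [0, 0, 5, 10, 40, 120]
ability = [-1.2, 0.3, -0.5, 0.8, 1.1, 1.9]
r = pearson_r(eliminations, ability)  # positive, as in the study
```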

The subjects located to the right (i.e., those making greater use of the strategy of

elimination) tended to have higher scores, although subjects not using eliminations so

frequently showed greater variation in total score. This suggests that high ability was not

necessarily associated with the elimination of responses, but individuals who used this resource frequently tended to have higher scores, possibly because this strategy may have served to reduce the information overload.

5. Discussion

Geometric inductive matrix items such as those found in Raven’s Advanced Progressive Matrices constitute markers for the assessment of Gf. Cognitive psychological studies have

pointed out that item complexity is associated with (a) an increase in the number of figures,

(b) an increase in the number of rules relating these figures, (c) the complexity of these rules,

and (d) the perceptual complexity of the stimulus.

Fig. 5. Scatterplot of subjects classified according to ability and number of eliminations for 32 items.

One limitation of these studies, however, is that complex items present all of these characteristics simultaneously. Thus, no information regarding their relative importance is furnished, nor is it clear whether all these factors actually

have a significant effect on complexity. Since each feature may relate to a different aspect of

the information processing of Gf, the variables were combined orthogonally in the present

study so that their effects could be investigated more precisely.

Classical psychometric properties indicate that the two experimental tests developed here

constituted good measures of Gf. Moreover, they included problems spanning a wide range of the

complexity continuum. The results obtained here support the systematic use of cognitive

psychology in test development, as was proposed by Embretson (1994, 1998), since this

produced a sound psychometric measure, while simultaneously providing an enhanced

understanding of the cognitive processes associated with item performance.

The major contribution of this study involves the identification of the most important

sources of difficulty in test items that contribute to the construct representation of Gf.

Two variables contributed significantly to an increase in item complexity: perceptual

organization and the amount of information, a variable created by combining number of

elements, number of rules, and number of rules applied to a given element. The study

suggests that the most important effects are due to perceptual organization, which explains

53.4% of the variance in item complexity.

Contrary to Embretson (1995, 1998), the results obtained here do not emphasize the

prevalence of the goal management component of the central executive,

but rather show that abstraction (associated with selective encoding) is a major aspect

affecting item complexity. As was discussed earlier, the variable used to predict item

complexity by Embretson (1998) confounded type of rules (associated with abstraction

capacity), and number of rules (associated with goal management). Although she interpreted

the effect of this composite variable as emphasizing the notion of overload of information

associated with the number of rules, as this variable was correlated with type of rule, another

interpretation of her results is possible, which emphasizes the need for a more abstract

inference due to rule complexity.

Based on the results of Carpenter et al. (1990), Embretson (1998) postulated that rules

would be inferred serially, from simple to complex. This conception led her to suggest that

rule type would have the same cognitive impact as the number of rules, that is, efficacious

goal management would be required to cope with working memory overload. In the present

study, however, type of rule seemed to be related to the need for abstract processing.

Although type of rule did not have a significant effect, perceptual organization, which was postulated to produce similar cognitive demands, was found to have the strongest effect on

complexity. Thus, the results of the present experiment strongly support the importance of the

process of abstraction in Gf.

But one limitation of the study concerns the variable ‘‘type of rule.’’ If abstraction is the

most important ability involved in item solving, and considering that perceptual organization

and type of rules are item features that operationalize item demands for this ability, why did

only perceptual organization have a significant effect? One possible explanation would be

that a great variability might have existed between rules comprising each general type in the

taxonomy developed for the present study. This interpretation is supported by the fact that the

average reaction time for items involving conceptual rules (Level 3b, in Fig. 2) was similar to


that observed for simple items. Therefore, this specific rule, which was considered to be as

complex as the other rules comprising Level 3a (addition of attributes, distribution of two,

addition of elements), turned out to be much simpler than expected. Further studies are

needed to distinguish within each type of rule and combine these individual rules with other

variables to clarify their effects.

This limitation, however, does not preclude the importance of the effect of perceptual

organization. Perceptual complexity was more homogeneously defined than the levels of type

of rules, and this could have been responsible for the large effect of this variable. Due to the

factorial combinations of the independent variables, the main effect that was found for

perceptual organization can be generalized for items with varying amounts of information,

and for items with different types of rules, except for those involving spatial pairwise

progressions. Thus, the effect of perceptual complexity can theoretically be generalized to a

wider universe of matrix items.

Increasing the complexity of perceptual organization complicates the encoding of the

attributes of a problem, thus making the creation of a stable mental representation more

difficult. Certain types of stimulus organization induce the formation of irrelevant groups of

elements or attributes, thus requiring more controlled attentional processing and selective encoding of the flow of information to working memory, so that the solver can focus on abstract relationships while ignoring the concrete attributes that appear simultaneously in the field of perception.

An analytical approach based on the control of attention might help to reduce the

overload of information in working memory, since limiting consideration to relevant

attributes reduces the load caused by irrelevant information. At the same time, such an

approach might help a subject to consider one attribute at a time, thus preventing overload

on and confusion in working memory when various attributes must be considered for a

given item.

The relationship between systematic approach and ability is shown in the use of the

strategy of elimination of alternatives. This may be interpreted as a physical analog of the attention control process, since the elimination of alternatives may be based on the selection of a relevant attribute and the ignoring of irrelevant information in the visual field, thus reducing the amount of information that must be considered. Moreover, the RT

analysis presents evidence that more complex items (more perceptually ambiguous)

generally overload the processing system, consequently requiring more processing time.

This additional time can be associated with the extra time needed for processing the

irrelevant information.

The importance of selective encoding associated with the capacity for abstraction also

suggests that visual processing and the corresponding visual scratch pad of working memory

may be important components of Gf. This interpretation is coherent with factor analytic

studies that have shown the broad Gf factor to be associated with the other broad visual

processing factor (see Carroll, 1993a, 1993b, for a complete review). It is also important for

studies trying to explain the rise observed in intelligence test scores, particularly, on Gf tests,

since one of the hypotheses being considered is that this increase may be associated with

increases in exposure to visual stimulation in recent years (Flynn, 1998). Moreover, such

encoding is also cited in relation to age-related loss in working memory, which is linked to


difficulties in encoding and retention of relevant information, since operational capacities

appear to be unchanged in older adults (Salthouse et al., 1990, 1991).

A second source of complexity, with less impact but still contributing a significant 10.8% to the explanation of the variance in item complexity, is the amount of information that must

be encoded and processed in order to solve a problem. The difficulty arises essentially from

pressure on the working memory capacity, that is, the difficulty involved in processing

several items of information simultaneously.

Various studies have stressed the role of the working memory in the cognitive interpre-

tation of Gf (Carpenter et al., 1990; Duncan et al., 1996; Embretson, 1995, 1998; Kyllonen &

Christal, 1990; Mulholland et al., 1980). The present study supports this interpretation and

suggests that Gf is strongly related to a specific aspect of the central executive component of

working memory. The most complex tasks of Gf tests require the capacity to control selective

encoding in visual processing simultaneously with the management of the information in

short term memory to prevent loss of information due to overload.

These results are also in agreement with Engle et al. (1999), who showed that the general

control process was responsible for the high correlations between working memory and Gf

tasks. In their words:

the critical factor common to measures of working memory capacity and higher level

cognitive tasks is the ability to maintain a representation as active in face of interference from

automatically activated representations competing for selection for action and in the face of

distractions that would otherwise draw attention away from the currently needed

representation (p. 312).

In summary, the present study offers evidence that a very important aspect of Gf is the

abstraction capacity associated with the process of selective encoding. It also corroborates

past findings that the general control process of goal management, which organizes a

hierarchical flow of information to the working memory to compensate for natural limitations

in dealing simultaneously with numerous bits of information, is another important aspect of

Gf. Perhaps the most important contribution of this study is the identification of item features

that produce specific demands for each one of these capacities, as well as the provision of a

method for altering these features operationally so that more carefully controlled tests of Gf

can be produced.

Acknowledgments

This paper is based on the author’s doctoral dissertation, submitted to the University of Sao Paulo (Institute of Psychology) under the supervision of Adail Victorino Castilho. The

research was financed by the Brazilian National Research Council (CNPq). The author

acknowledges the contributions of Gerardo Andanez Prieto, Ronald K. Hambleton, Leandro

S. Almeida, and Linda Gentiy El-Dash for their helpful comments on the draft of the

manuscript, as well as Claudineia Ap. Ferreira de Godoi Veiga, Romilda Simoes de Queiroz,

Roseli Filizatti, Tristana Cezaretto, Erika S. de Souza Barboza, Cristiane Jardim Girioli,

Rosangela Scrich, Kelly Fiorelli Ferro, and Jose Maurıcio Haas Bueno for their valuable

assistance in the collection of the data. The author is especially grateful to Robert J. Sternberg


who contributed invaluable guidance during the author’s stay at Yale University during the

fall semester of 1997.

References

Assessment Systems. (1996). User’s manual for the MicroCat Testing System. St. Paul: ASC.

Baddeley, A. D., & Hitch, G. J. (1994). Developments in the concept of working memory. Neuropsychology, 8 (4),

485–493.

Bethell-Fox, C. E., Lohman, D. F., & Snow, R. E. (1984). Adaptive reasoning: componential and eye movement

analysis of geometric analogy performance. Intelligence, 8, 205–238.

Carpenter, P. A., Just, M. A., & Shell, P. (1990). What one intelligence test measures: a theoretical account of the

processing in the Raven Progressive Matrices test. Psychological Review, 97 (3), 404–431.

Carroll, J. B. (1993a). Human cognitive abilities: a survey of factor-analytic studies. New York: Cambridge

Univ. Press.

Carroll, J. B. (1993b). Test theory and the behavioral scaling of test performance. In: N. Frederiksen,

R. J. Mislevy, & I. I. Bejar (Eds.), Test theory for a new generation of tests (pp. 297–322). Hillsdale, NJ:

Lawrence Erlbaum Associates.

Carroll, J. B. (1997). The three-stratum theory of cognitive abilities. In: D. P. Flanagan, J. L. Genshaft, & P. L.

Harrison (Eds.), Contemporary intellectual assessment: theories, tests, and issues (pp. 122–130). New York:

Guilford Press.

Cattell, R. B. (1941). Some theoretical issues in adult intelligence testing. Psychological Bulletin, 31, 161–179.

Cohen, J. (1968). Multiple regression as a general data-analytic system. Psychological Bulletin, 70 (6),

426–443.

Crinella, F. M., & Yu, J. (1999). Brain mechanisms and intelligence. Psychometric g and executive function.

Intelligence, 27 (4), 299–327.

Duncan, J., Emslie, H., & Williams, P. (1996). Intelligence and the frontal lobe: the organization of goal-directed

behavior. Cognitive Psychology, 30, 257–303.

Embretson, S. (1983). Construct validity: construct representation versus nomothetic span. Psychological Bulletin,

93 (1), 179–197.

Embretson, S. (1985a). Studying intelligence with test theory models. In: D. K. Detterman (Ed.), Current topics in

human intelligence, 1, (pp. 3–17). Norwood, NJ: Ablex.

Embretson, S. (Ed.) (1985b). Test design: developments in psychology and psychometrics. Orlando: Academic Press.

Embretson, S. (1994). Applications of cognitive design systems to test development. In: C. R. Reynolds (Ed.),

Cognitive assessment: a multidisciplinary perspective (pp. 107–135). New York: Plenum.

Embretson, S. (1995). The role of working memory capacity and general control process in intelligence. Intelli-

gence, 20, 169–189.

Embretson, S. (1996). The new rules of measurement. Psychological Assessment, 8 (4), 341–349.

Embretson, S. (1998). A cognitive design system approach to generating valid tests: application to abstract

reasoning. Psychological Methods, 3 (3), 380–396.

Engle, R. W., Tuholski, S. W., Laughlin, J. E., & Conway, A. R. A. (1999). Working memory, short-term memory,

and general fluid intelligence: a latent-variable approach. Journal of Experimental Psychology, General, 128 (3),

309–331.

Evans, T. G. (1968). Program for the solution of a class of geometric-analogy intelligent-test questions. In: M.

Minsky (Ed.), Semantic information processing (pp. 271–353). Cambridge, MA: MIT Press.

Flynn, J. R. (1985). Massive IQ gains in 14 nations: what IQ tests really measure. Psychological Bulletin, 101 (2),

171–191.

Flynn, J. R. (1998). IQ gains over time: toward finding the causes. In: U. Neisser (Ed.), The rising curve

(pp. 25–66). Washington, DC: American Psychological Association.


Frederiksen, N., Mislevy, R. J., & Bejar, I. I. (1993). Test theory for a new generation of tests. Hillsdale, NJ:

Lawrence Erlbaum Associates.

Goldman, S. R., & Pellegrino, J. W. (1984). Deductions about induction: analyses of developmental and individual

differences. In: R. J. Sternberg (Ed.), Advances in the psychology of human intelligence (vol. 2, pp. 149–197).

Hillsdale, NJ: Lawrence Erlbaum Associates.

Gonzales Labra, M. J. (1990). El nivel de abstraccion en las analogias geometricas (The level of abstraction of

geometric analogies). Revista de Psicologia General y Aplicada, 43 (1), 23–32.

Gonzales Labra, M. J., & Ballesteros Jimenez, S. (1993). Analisis componencial de las analogıas geo-

metricas (Componential analysis of geometric analogies). Revista de Psicologia General y Aplicada,

46 (2), 139–147.

Green, K. E., & Kluever, R. C. (1992). Components of item difficulty of Raven’s matrices. Journal of General

Psychology, 119 (2), 189–199.

Horn, J. L. (1986). Theory of fluid and crystallized intelligence. In: R. J. Sternberg (Ed.), Advances in the

psychology of human intelligence (vol. 3, pp. 443–451). Hillsdale, NJ: Lawrence Erlbaum Associates.

Horn, J. L. (1991). Measurement of intellectual capabilities: a review of theory. In: K. S. McGrew, J. K. Werder, &

R. W. Woodcock (Eds.), WJ-R technical manual (pp. 197–245). Allen, TX: DLM.

Horn, J. L., & Cattell, R. B. (1966). Refinement and test of the theory of fluid and crystallized general intelli-

gences. Journal of Educational Psychology, 57 (5), 253–270.

Horn, J. L., & Noll, J. (1997). Human cognitive capabilities: Gf–Gc theory. In: D. P. Flanagan,

J. L. Genshaft, & P. L. Harrison (Eds.), Contemporary intellectual assessment: theories, tests, and issues

(pp. 53–91). New York: Guilford Press.

Hornke, L. F., & Habon, M. W. (1986). Rule-based item bank construction and evaluation within the linear logistic

framework. Applied Psychological Measurement, 10 (4), 369–380.

Howell, D. C. (1997). Statistical methods for psychology. Boston: Duxbury Press.

Hunt, E. (1974). Quote the Raven? Nevermore! In: L. W. Gregg (Ed.), Knowledge and cognition (pp. 129–158). Potomac, MD: Lawrence Erlbaum Associates.

Hunt, E. (1996). Intelligence for the 21st century. Paper presented at the European Society for Cognitive Psycho-

logy and Spanish Society for the Study of Individual Differences, Madrid, Spain.

Hunt, E. (1999). Intelligence and human resources: past, present and future. In: P. L. Ackerman, P. C. Kyllonen, &

R. D. Roberts (Eds.), Learning and individual differences: process, trait and content determinants (pp. 3–28).

Washington, DC: American Psychological Association.

Jacobs, P. I., & Vandeventer, M. (1972). Evaluating the teaching of intelligence. Educational and Psychological

Measurement, 32, 235–248.

Jurden, F. H. (1995). Individual differences in working memory and complex cognition. Journal of Educational

Psychology, 87 (1), 93–102.

Just, M. A., & Carpenter, P. A. (1992). A capacity theory of comprehension: individual differences in working

memory. Psychological Review, 99 (1), 122–149.

Klauer, K. J. (1990). A process theory of inductive reasoning tested by teaching of domain-specific thinking

strategies. European Journal of Psychology of Education, 5 (2), 191–206.

Kyllonen, P. C. (1994). CAM: a theoretical framework for cognitive abilities measurement. In: D. K. Detterman

(Ed.), Current topics in human intelligence theories of intelligence (vol. 4, pp. 307–360). Norwood, NJ:

Ablex.

Kyllonen, P. C., & Christal, R. (1990). Reasoning ability is (little more than) working memory capacity?!

Intelligence, 14, 389–434.

Mack, A., Tang, B., Tuma, S., & Rock, I. (1992). Perceptual organization and attention. Cognitive Psychology, 24,

475–501.

Maris, E. (1995). Psychometric latent response models. Psychometrika, 60 (4), 523–547.

Marshalek, B., Lohman, D. F., & Snow, R. E. (1983). The complexity continuum in the radex and hierarchical

models of intelligence. Intelligence, 7, 107–127.

McGrew, K. S., Werder, J. K., & Woodcock, R. W. (1991). WJ-R technical manual. Allen, TX: DLM.

Microsoft. (1995a). Microsoft Visual Basic version 4.0 — programmer’s guide. Redmond, WA: Microsoft.

Microsoft. (1995b). Guide to data access objects. Redmond, WA: Microsoft.


Mulholland, T. M., Pellegrino, J. W., & Glaser, R. (1980). Components of geometric analogy solution. Cognitive

Psychology, 12, 252–284.

Neisser, U. (1998). Introduction: rising test scores what they mean. In: U. Neisser (Ed.), The rising curve

(pp. 3–22). Washington, DC: American Psychological Association.

Palmer, S. E. (1992). Common region: a new principle of perceptual grouping. Cognitive Psychology, 24,

346–447.

Prabhakaran, V., Smith, J. A. L., Desmond, J. E., Glover, G. H., & Gabrieli, J. D. E. (1997). Neural substrates of

fluid reasoning: an fMRI study of neocortical activation during performance of the Raven’s Progressive

Matrices test. Cognitive Psychology, 33, 43–63.

Primi, R. (1995). Inteligencia, processamento de informacao e teoria da gestalt: um estudo experimental (Intelligence, information processing and gestalt theory: an experimental study). Unpublished Master’s thesis,

Catholic University of Campinas, Campinas.

Primi, R., & Castilho, A. V. (1996). Processos cognitivos e dificuldade dos itens do teste Raven — um estudo

baseado na IRT (Cognitive processes and complexity of Raven test items: a study based on Item Response

Theory). In: Encontro de Tecnicas do Exame Psicologico: Ensino, Pesquisa e Aplicacoes, 2, Sao Paulo. Programa e Resumos (p. 8). Sao Paulo: IP-USP.

Primi, R., & Rosado, E. M. S. (1995). Os princıpios de organizacao perceptual e a atividade inteligente: um estudo

sobre testes de inteligencia (Principles of perceptual organization and intelligent mental activity: a study about

intelligence tests). Estudos de Psicologia, 11 (2), 3–12.

Primi, R., Rosado, E. M. S., & Almeida, L. S. (1995). Resolucao de tarefas de raciocınio analogico:

contributos da teoria da gestalt a compreensao dos problemas subjacentes (Resolution of analogy reasoning

tasks: gestalt contribution to the comprehension of basic cognitive components). In: L. S. Almeida, & I. S.

Ribeiro (Eds.), Avaliacao Psicologica: Formas e Contextos (vol. 3, pp. 559–562). Braga: APPORT

(Associacao dos Psicologos Portugueses).

Raven, J., Raven, J. C., & Court, J. H. (1998). Manual for Raven’s progressive matrices and vocabulary scales:

Section 1: General overview. Oxford: Oxford Psychologists Press.

Rock, I., & Palmer, S. (1990). The legacy of Gestalt psychology. Scientific American, 263, 48–61 (December).

Rumelhart, D. E., & Abrahamson, A. A. (1973). A model for analogical reasoning. Cognitive Psychology, 5,

1–28.

Salthouse, T. A. (1994). The aging of working memory. Neuropsychology, 8 (4), 535–543.

Salthouse, T. A., Babcock, R. L., & Shaw, R. J. (1991). Effects of adult age on structural and operational capacities

in working memory. Psychology and Aging, 6 (1), 118–127.

Salthouse, T. A., Legg, S., Palmon, R., & Mitchell, D. (1990). Memory factors in age-related differences in simple

reasoning. Psychology and Aging, 5 (1), 9–15.

Snow, R. E., Kyllonen, P. C., & Marshalek, B. (1984). The topography of learning and ability correlations.

In: R. J. Sternberg (Ed.), Advances in the psychology of human intelligence (vol. 2, pp. 47–103). Hill-

sdale, NJ: Lawrence Erlbaum Associates.

Sternberg, R. J. (1977). Component processes in analogical reasoning. Psychological Review, 84 (4),

353–378.

Sternberg, R. J. (1978). Isolating the components of intelligence. Intelligence, 2, 117–128.

Sternberg, R. J. (1980). Sketch of a componential subtheory of human intelligence. Behavioral and Brain

Sciences, 3, 573–613.

Sternberg, R. J. (1984). Toward a triarchic theory of human intelligence. Behavioral and Brain Sciences, 7,

269–315.

Sternberg, R. J. (1986). Toward a unified theory of human reasoning. Intelligence, 10, 281–314.

Sternberg, R. J. (1997). The triarchic theory of intelligence. In: D. P. Flanagan, J. L. Genshaft, & P. L.

Harrison (Eds.), Contemporary intellectual assessment: theories, tests, and issues (pp. 92–104). New

York: Guilford Press.

Sternberg, R. J., & Gardner, M. K. (1983). Unities in inductive reasoning. Journal of Experimental Psychology,

General, 112, 80–116.

Ward, J., & Fitzpatrick, T. F. (1973). Characteristics of matrices items. Perceptual and Motor Skills, 36, 987–993.

Webster’s new collegiate dictionary (1981). Springfield, MA: Merriam-Webster.


Whitely, S. E. (1980a). Modeling aptitude test validity from cognitive components. Journal of Educational

Psychology, 72 (6), 750–769.

Whitely, S. E. (1980b). Multicomponent latent trait models for ability tests. Psychometrika, 45 (4), 479–494.

Whitely, S. E. (1980c). Latent trait models in study of intelligence. Intelligence, 4, 97–132.

Whitely, S. E., & Schneider, L. M. (1981). Information structure for geometric analogies: a test theory approach.

Applied Psychological Measurement, 5 (3), 383–397.

Woodcock, R. W. (1990). Theoretical foundations of the WJ-R measures of cognitive ability. Journal of Psycho-

educational Assessment, 8, 231–258.

Wright, B. D., & Stone, M. H. (1979). Best test design. Chicago: MESA.
