Karuza, E., Farmer, T., Fine, A. B., Smith, F. X., and Jaeger, T. F. (under review). Evidence for the fine-tuning of expectations in self-paced non-adjacent dependency learning

Running head: PREDICTION DURING LEARNING

Evidence for the fine-tuning of expectations in self-paced non-adjacent dependency learning

Elisabeth A. Karuza*1, Thomas A. Farmer2, Alex B. Fine3, Francis X. Smith2, & T. Florian Jaeger1,4,5

1Department of Brain and Cognitive Sciences, University of Rochester, Rochester, NY 14627

2Department of Psychology, University of Iowa, Iowa City, IA 52242

3Department of Psychology, The Hebrew University of Jerusalem, Israel

4Department of Linguistics, University of Rochester, Rochester, NY 14627

5Department of Computer Science, University of Rochester, Rochester, NY 14627



Corresponding Author*: Elisabeth A. Karuza

Center for Cognitive Neuroscience
Goddard Laboratories Room 322
University of Pennsylvania
Philadelphia PA 19104
Email: [email protected]

*Author’s current affiliation differs from the byline affiliation

WORD COUNT: 7165

Abstract

As lifelong statistical learners, humans are remarkably sensitive

to the unfolding of elements and events in their surroundings. In

the present work, we examine the bi-directional influence of

prediction-based processing and learning as adult participants

were exposed to a visual artificial grammar containing a non-

adjacent dependency. Using a self-paced moving window display, we

recorded response times as learners progressed through a series

of structured glyph sequences. After accounting for general task

adaptation effects, we quantified the growing influence of

element predictability on those response times. We find that, as

a function of exposure, participants generally processed the



grammar increasingly faster; however, the facilitatory benefit

was significantly greater for the perfectly predictable items of

the grammar. In turn, this progressive processing benefit on

predictable elements was uniquely correlated with off-line

performance on a post-test. Our results indicate that

participants who develop implicit predictions as they learn, and

have their expectations met, achieve higher learning outcomes.

Links between these findings, obtained with novel stimuli in an

experimental context, and the role of prediction in natural

language comprehension are considered.

Introduction



Underlying our ability to interact effectively with the

environment is neural machinery equipped to acquire novel

representations and take advantage of expectations about familiar

ones. In the case of language, this means that the human brain

contends with both learning language and understanding it efficiently.

Though acquisition and comprehension differ in a number of

respects, they are related in that they both hinge on an

exquisite sensitivity to the statistical patterns inherent to

linguistic input (Seidenberg & MacDonald, 1999). A substantial

literature on “statistical learning” has demonstrated that

infants and adults can detect regularities embedded in artificial

languages in order to form representations of previously unknown

structures (see Romberg & Saffran, 2010 for a review). Likewise,

studies of natural language processing offer evidence that stored

knowledge of probabilistic grammatical patterns plays a central

role in on-line comprehension (e.g., Arai & Keller, 2013;

Garnsey, Pearlmutter, Myers, & Lotocky, 1997; Jurafsky, 1996;

Levy, 2008; MacDonald, Pearlmutter, & Seidenberg, 1994;

Trueswell, Tanenhaus, & Garnsey, 1994; for a review, see

MacDonald, 2013). Recurrent in this work is the observation



that, as syntactic structures unfold, downstream elements that

are statistically predictable require fewer processing resources.

Furthermore, once we have developed relatively stable knowledge

of predictive relationships, we can rapidly adapt our

expectations about upcoming input based on context-specific

statistics (e.g., Farmer, Fine, Yan, Cheimariou, & Jaeger, 2014;

Fine, Jaeger, Farmer, & Qian, 2013; Kamide, 2012). Adult

listeners will, for example, show a reduction in ERP surprisal

effects as they are repeatedly exposed to syntactic violations in

their native language (Hanulíková, Van Alphen, Van Goch, & Weber,

2012; see also Hahne & Friederici, 1999).

Thus, expectation-based processing has been implicated as a

key mechanism in language comprehension. While knowledge of

predictive relationships is known to arise early in first

language acquisition (e.g., Bannard & Matthews, 2008; Lew-

Williams & Fernald, 2007), less attention has been paid to how

this knowledge, as it is being acquired, affects processing

efficiency in adults. In the present set of experiments, we offer

new insight into the intersection between the language

comprehension and acquisition literatures. We investigate how



statistical learning mechanisms shape the development of accurate

expectancies made during exposure to unfamiliar input. Modifying

the canonical statistical learning paradigm to allow for the

collection of an on-line index of processing difficulty, we

examine the time-course over which learners develop knowledge of

dependent relationships. These relationships are presented in the

context of an artificial grammar so that participants are without

strong prior knowledge about its structure. We explore properties

of the learning curve that underlies this process and trace, at

an individual differences level, the link between the implicit

generation of expectations during exposure to a novel language

and ultimate learning outcomes. We begin by discussing previous

work that has played a central role in informing the design of

the current study before outlining the details of the experiment.

Learners become attuned to various kinds of statistical

regularities as they extract structure from their surroundings.

Studies of statistical learning commonly involve manipulation of

a specific type of regularity: the transitional probabilities

between adjacent elements. For example, in a word segmentation

task, a high transitional probability between neighboring



syllables might suggest that those syllables form a coherent

chunk (i.e., a word; Saffran, Aslin, & Newport, 1996). However,

contingencies exist not only between adjacent items, but also

between items that are not in direct proximity to one another

(Newport & Aslin, 2004). To offer an example from natural

language: the present progressive in English can be formed by

combining the auxiliary verb is and a main verb marked with the

inflectional suffix -ing. Thus, is and -ing have a high

transitional probability, whereas the intervening main verbs vary

widely (e.g., is eating, is sleeping, is walking, etc.). Along

these lines, Gómez (2002) created an artificial grammar of the

form A-X-B in which pseudowords in the final position of a string

(B) were perfectly predictable given pseudowords in the initial

position (A). In contrast, X items were drawn from a

systematically varied set of possible elements. Gómez

demonstrated that both infants and adults acquired the non-adjacent
dependencies between A and B after a period of passive

auditory exposure.

Here, we capitalize on a well-established behavioral metric

of implicit learning—motor response time—to monitor progressive



changes in the allocation of processing resources during the

acquisition of non-adjacent dependencies. Importantly, canonical

studies of statistical learning tend to assess acquisition of a

statistical regularity by embedding it in an artificial language,

exposing participants to the artificial language, and

administering one off-line post-exposure grammaticality judgment

test to assess whether or not knowledge of the regularity was

acquired (but see, e.g., Hunt & Aslin, 2001; Karuza et al., 2013;

Misyak, Christiansen, & Tomblin, 2010a). As a result, we have

only a surface understanding of the time course of the

acquisition of statistical relationships among units of an

artificial language. To address this limitation, we adopt a self-

paced moving window display used frequently to study word-by-word

reading in the sentence processing literature (Just, Carpenter, &

Woolley, 1982). This paradigm has been used to examine changes in

expectations in natural language processing (e.g., Fine et al.,

2013), and trades on the firmly established inverse relation

between reading times and readers’ expectations (i.e., that a

reader will spend a greater amount of time processing

unpredictable words or syntactic structures, e.g., Frank & Bod,



2011; Garnsey et al., 1997; Levy, 2008; McDonald & Shillcock,

2003; McRae, Spivey-Knowlton, & Tanenhaus, 1998; Smith & Levy,

2013). The moving window display enables us to collect reaction

time (RT) data as participants progress, at their own pace,

through each element in an artificial grammar. On a trial-by-

trial level, we are thus able to measure the incremental cumulative

effect of exposure on the processing difficulty from the earliest

stages of non-adjacent dependency learning onward.

Specifically, we translate the Gómez grammar into the visual

modality using a set of glyphs unfamiliar to subjects, and

instruct participants to advance through each A-X-B triplet by

pressing the space bar. We then measure, on a glyph-by-glyph

basis, how the processing difficulty associated with each element

in the grammar changes over the course of repeated instances of

the non-adjacent dependency. In parallel with successful

learning, participants should develop increasingly accurate

expectations about the nature of upcoming stimuli, which should

manifest as a processing benefit on predictable elements in the

input sequence (i.e., faster motor response time). Thus, we

expect to observe a facilitatory effect, namely a growing decrease in
processing time, on predictable elements (B in the A-X-B sequence)

relative to predictive elements (A) as a function of exposure to

the artificial grammar.

To test these hypotheses, we measured glyph-by-glyph RTs as

participants read sequences of glyphs sampled from one of two

grammars: an A-X-B grammar in which A perfectly predicts B (the

Structured condition), or a control condition in which glyph

frequency was matched, but non-adjacent elements were

comparatively uninformative (Unstructured condition). We find

that in the statistically Structured condition, in which A is

perfectly predictive of B, relative to the Unstructured

condition, which lacks a robust non-adjacent dependency,

processing of B elements was facilitated relative to the

predictive A elements as exposure increased. This effect held

even after completely removing the effect of “task adaptation” (a

general speedup in RTs as subjects become increasingly familiar

with the self-paced moving window paradigm). Finally, we probe

the relationship between on-line facilitatory learning effects

and post-exposure learning performance by asking whether

participants who develop more precise predictions during exposure



(those individuals with the greatest processing benefits on

predictable elements) also exhibit more accurate knowledge of the

language’s grammar on an off-line post-test. Thus, we

demonstrate that statistical learning of non-adjacent

dependencies can be characterized not as a singular passive

process by which the learner merely stores statistical

information, but instead as a bi-directional process in which the

progressive accrual of knowledge about a predictive relationship

leads to the generation of more accurate predictions that

facilitate processing over the time-course of learning.

Additionally, we demonstrate a previously unspecified link

between variability in the on-line development and fine-tuning of

predictions and individual differences in the extent of

acquisition as gauged by post-test familiarity judgments.

Materials and Methods

Participants

68 participants recruited from the University of Rochester

and University of Iowa communities completed the study. All were



native English speakers between the ages of 18 and 30. They

provided informed consent and were compensated financially

(Rochester) or with course credit (Iowa). The experiment lasted

approximately one hour, depending on the pace of the participant.

No participant reported familiarity with the glyph-based writing

system used to generate the stimuli (as indicated by a language

history questionnaire).

Stimuli

All images used in this study were glyphs borrowed from the

Ge’ez script, a syllabic alphabet used for several, mostly

Semitic, languages spoken primarily in Ethiopia and Eritrea,

including Amharic and Tigre (Appendix A; font downloaded from

http://www.senamirmir.org). These stimuli were selected because

we required a large set of visually distinct items that would be

unfamiliar to participants in our experiment.

~~~~~~~~~Figure 1~~~~~~~~~~



Structured condition. Participants in this condition were

exposed to a series of 3-element strings of the form A-X-B. A and

B elements were drawn from a set of 6—3 always occurring as A and

3 always occurring as B—and were paired such that each A-element

always co-occurred with the same B element (i.e., A1-X-B1, A2-X-

B2, A3-X-B3). In contrast, X elements were drawn from a pool of

24 items. Thus, calculated over the entire exposure phase, the

transitional probability between non-adjacent items within a string

(i.e., B|A) was 1.0, but the transitional probability between

adjacent items within a string (i.e., X|A or B|X) was

comparatively low, either 0.04 or 0.33, respectively (Figure 1).

The 3 pairs of A and B elements were combined exhaustively with

the full set of 24 X elements, rendering 72 unique sequences. As

in the original Gómez study, these strings were repeated 6 times

and then randomized to create a list of 432 triplets.
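The transitional probabilities reported for the Structured condition can be checked by rebuilding the exposure list. The Python sketch below is illustrative only: the glyph labels (A1, X1, and so on), the fixed shuffle seed, and the helper name transitional_probability are assumptions introduced here and are not part of the original materials.

```python
from collections import Counter
from itertools import product
import random

# Placeholder labels standing in for the Ge'ez glyphs (assumed names).
A = ["A1", "A2", "A3"]
B = ["B1", "B2", "B3"]
X = [f"X{i}" for i in range(1, 25)]

# Each A is paired with exactly one B; every pair is crossed with all 24 X items.
unique = [(a, x, b) for (a, b), x in product(zip(A, B), X)]
exposure = unique * 6                    # each unique string repeated 6 times
random.Random(0).shuffle(exposure)       # seed chosen arbitrarily for this sketch


def transitional_probability(triplets, first, second):
    """Return p(item at `second` | item at `first`) for every observed pair.

    `first` and `second` are 0-based positions within a triplet.
    """
    pair_counts = Counter((t[first], t[second]) for t in triplets)
    first_counts = Counter(t[first] for t in triplets)
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


p_b_given_a = transitional_probability(exposure, 0, 2)  # non-adjacent
p_x_given_a = transitional_probability(exposure, 0, 1)  # adjacent
p_b_given_x = transitional_probability(exposure, 1, 2)  # adjacent

print(len(unique), len(exposure))                 # 72 432
print(p_b_given_a[("A1", "B1")])                  # 1.0
print(round(p_x_given_a[("A1", "X1")], 2))        # 0.04
print(round(p_b_given_x[("X1", "B1")], 2))        # 0.33
```

Because the probabilities are computed over the whole list, the shuffle order does not affect them; it matters only for trial-by-trial measures.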

Of the triplets tested post-exposure, half matched the A-X-B

form found in the exposure phase (e.g., A1-X4-B1) and half

violated that form because they contained unmatched A and B items

(e.g., A1-X4-B3). By design, the matched strings had occurred 6



times during exposure, while the unmatched strings had not

occurred during exposure.

Unstructured control. In the previous condition, unique A

and B glyphs (of which there were only 6, each appearing 144

times) were far more frequent than unique X glyphs (of which

there were 24, each appearing a total of 18 times). We were not

interested in the effect of such differences in overall symbol

frequency, but rather in differences related to non-adjacent

element predictability. Thus, to examine non-adjacent dependency

learning separately from RT effects potentially attributable to

highly unbalanced glyph frequency, we created an “Unstructured”

control condition that likewise consisted of 72 triplet strings

repeated 6 times each. Individual A, B, and X items were exactly

matched in frequency to the Structured condition, but the sharp

contrast between adjacent and non-adjacent transitional

probabilities was no longer a stable cue to grammatical

structure, which was purposefully lacking in this condition.

Stimuli were engineered such that the transitional probability

between any two adjacent or non-adjacent items was matched.

Furthermore, triplet region was comparatively uninformative in



the Unstructured condition; that is, all A, X, and B glyphs

occurred in each of the 3 presentation slots. In contrast,

recall that in the Structured condition, item B was perfectly

predictable given A (p(B|A) = 1.0), and A and B always occurred in the first

and third regions of the triplets, respectively (Figure 1).

Exactly as in the Structured condition, 6 of the test trials

were seen previously in the exposure phase, and 6 contained a single-glyph
violation on a previously viewed triplet. The 6 test

triplets were drawn at random from the exposure list and the

violations occurred with equal frequency in each triplet region.

Thus, the Unstructured and Structured conditions were as closely

matched as possible, but differed along a critical dimension: the

absence of a strong non-adjacent dependency. For both conditions,

exposure and test lists were presented in one of two randomized

orders.

Random control. To account for general task adaptation in

the absence of any robust statistical regularities, we also

created a completely random (with replacement) condition in which

the frequency and position of glyphs in each triplet string

varied freely throughout the exposure phase. A unique series of



432 triplets was generated for each subject by sampling, at each

triplet region, from a uniform distribution of all 30 possible

elements. Post-test triplets were drawn at random from the

exposure phase, and foils were triplets that never appeared in

the exposure phase.
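The sampling scheme for the Random control can be sketched in a few lines of Python. This is a minimal illustration, not the authors' stimulus script: the glyph labels, the function name random_exposure_list, and the idea of seeding the generator per subject are all assumptions made here.

```python
import random

# 30 placeholder labels standing in for the full glyph inventory (assumed names).
GLYPHS = [f"G{i}" for i in range(1, 31)]


def random_exposure_list(seed, n_triplets=432):
    """Build one subject's list by sampling every triplet region
    independently and uniformly, with replacement, from all 30 glyphs."""
    rng = random.Random(seed)  # hypothetical per-subject seed
    return [
        tuple(rng.choice(GLYPHS) for _ in range(3))
        for _ in range(n_triplets)
    ]


subject_list = random_exposure_list(seed=1)
print(len(subject_list))  # 432
```

Under this scheme neither glyph identity nor position carries information about the other elements of a triplet, which is what allows the condition to isolate general task adaptation.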

Procedure

The experiment consisted of 4 phases: familiarization with

the individual glyphs, exposure to the glyph sequences, an off-

line post-test establishing the extent of learning, and a

debriefing phase (Table 1). Procedures were matched across

conditions.

~~~~~~~~~Table 1~~~~~~~~~~

Familiarization. Participants first completed a brief (~5

min) matching task. This task ensured that RT effects would be

largely reflective of learning, not of surprisal at the

occurrence of a novel glyph (though with such a brief

familiarization phase, it is likely that participants were still



acquiring the inventory of glyphs during the early stages of the

exposure phase). During the matching task, each glyph was flashed

on the computer screen for 2 s. Next, the participant was

presented with three options and asked to select which option

corresponded to the glyph they had just observed. Participants

advanced to the next trial only after a correct response.

Exposure. The exposure phase consisted of 432 triplets and

144 intermittent catch trials (72 response “yes”, 72 response

“no”). Since we were interested in changes in processing time, we

included these catch trials to ensure that participants were

attending to the stimuli during the exposure phase. They required

participants to indicate whether or not they had seen a specific

item in the previous triplet sequence, ensuring that participants

actually “read” the elements. In both the Structured and

Unstructured conditions, 96 catch trials tested on X glyphs and

48 tested on the highly frequent A and B glyphs1. This
distribution was purposefully selected so as not to bring undue
attention to the regularity present in the highly repetitive A
and B elements. In the Random condition, in which triplets were
generated by sampling from a uniform distribution of all
elements, participants were tested on each glyph an equal number
of times. Participants were instructed to pay attention to the
screen and make their best effort. Regardless of condition, they
were informed that stimuli might become familiar over time.
Participants performed three initial practice trials consisting
of number and letter (instead of glyph) sequences.

1 There was a slight difference in the format of the catch trials between the Structured and Unstructured conditions related to the fact that the triplet region (position 1, 2 or 3 in the string) and glyph identity (A, X, or B) were perfectly correlated in the Structured condition. Of the 72 “yes” trials in the Structured condition, 48 probed region 2 and a total of 24 probed regions 1 and 3. In the Unstructured condition, each region was probed equally: 24 times.

The pace of the exposure trials was controlled entirely by
the participant, as shown in Figure 2. RTs were recorded for each
element in the sequence, thus providing a by-element index of
processing difficulty as training
progressed. At the start of each triplet trial, the participant
saw 5 horizontal dashes centered on the computer screen. They
initiated a sequence by pressing the space bar, at which point
the first dash became a small, opaque circle. At the next press
of the space bar, the circle became a dash and the second dash



was replaced by the first glyph (A). With another press of the

space bar, the first glyph became a dash again and the next dash

became the second glyph (X). This process continued until the

trial was completed. The circles in the initial and final

positions were included to reduce any effects on RTs associated

with initiating or ending a trial. Participants were offered a

built-in break option every 96 trials.
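The sequence of displays within one trial can be made concrete with a short Python sketch. It is a text-only approximation under stated assumptions: the function name moving_window_frames and the ASCII stand-ins for the dash, circle, and glyphs are introduced here for illustration and do not reproduce the actual presentation software.

```python
def moving_window_frames(triplet, dash="-", circle="o"):
    """Return the successive displays for one self-paced trial.

    The first frame shows only dashes. Each later frame, reached by one
    press of the space bar, reveals exactly one slot: a circle, then the
    three glyphs in order, then a final circle.
    """
    slots = [circle, *triplet, circle]
    frames = [" ".join([dash] * len(slots))]
    for i, item in enumerate(slots):
        frame = [dash] * len(slots)
        frame[i] = item  # only the current slot is visible
        frames.append(" ".join(frame))
    return frames


for frame in moving_window_frames(("A", "X", "B")):
    print(frame)
# - - - - -
# o - - - -
# - A - - -
# - - X - -
# - - - B -
# - - - - o
```

In this sketch the time between two successive key presses would be the RT attributed to the slot visible during that interval.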

~~~~~~~~~Figure 2~~~~~~~~~~

Post-test. The final phase contained 12 test items. Triplets

were presented in their entirety (i.e., all glyphs appeared

simultaneously in a row). For each trial, participants indicated

whether or not that 3-part sequence had been observed during the

exposure phase by pressing “yes” or “no”. Instructions were

phrased as follows: “Now we'd like you to indicate whether or not

some test sequences seem familiar, like you saw them in the

previous exposure phase. We want you to tell us whether or not

you have already seen each entire sequence.” Post-exposure

learning was estimated as the proportion of correctly accepted



matched strings and correctly rejected unmatched strings out of

all 12 strings.

Debriefing. At the end of the experiment, participants

filled out a debriefing form intended to examine the extent of

their explicit knowledge about the structure of the language

presented to them. In the control conditions, this debriefing

form was essentially meaningless, as there was no strong

dependency to learn. However, in the Structured condition, the form posed
questions such as “Did you notice any patterns?” and “How did you make your
familiarity judgments during the test phase?” These questions enabled us to

determine whether participants had explicit knowledge of the

strong non-adjacent association between the A and B items in the

exposure strings.

Results

Exclusions

Of the original 68 participants, 11 were excluded because

their performance on the catch trial task fell below a pre-

determined threshold of 70% accuracy (thus we could not be sure

whether or not those participants were completing the task as



directed). Therefore, 57 participants are included in the present

analyses: n=19 in the Structured condition, n=18 in the

Unstructured and n=20 in the Random condition2. We excluded
trials with response times exceeding 6 s (data loss 0.3%) or RTs
deviating more than 3 SDs from the mean processing time per glyph
(data loss 2.1%). These criteria parallel standard exclusions
from self-paced reading experiments. All results reported below
also hold without these exclusions.

2 These sample sizes are approximately similar to those used in prior studies of statistical learning in adults (e.g., Karuza et al., 2013; Newport & Aslin, 2004; Saffran, Newport & Aslin, 1996).

Overview of Analyses

Our objective was to evaluate changes throughout the
exposure phase in RTs for each element in the triplet sequences.
We expected that as participants learned the structure of the
A-X-B grammar, we would observe an increasing facilitatory effect
on the predictable B items of the Structured condition (relative
to the predictive A items). In addressing this question, however,
it is important to keep in mind that RTs are also known to
reflect effects unrelated to the learning of statistical
dependencies. First, RTs often differ depending on the

position of words in a sentence (Kuperman, Dambacher, Nuthmann, &
Kliegl, 2010). Second, it is well known that RTs in self-paced

reading experiments are subject to considerable task adaptation

effects (i.e., participants show an overall decrease in motor

response time as the experiment progresses, Fine, Qian, Jaeger, &

Jacobs, 2010; Fine et al., 2013). Since we are interested in

differences in the changes in RTs between triplet regions over

the course of the experiment, these two confounds are potentially

critical. Ruling out task adaptation as an explanation for the

learning effects we predicted in the Structured condition

motivated the Random control condition, which allowed us to

remove such task-general effects from our contrast of primary

interest: RTs in the Structured relative to the Unstructured

condition.

In Step 1 of our analyses, we estimate region-specific and

practice effects against data from only the Random condition,

which does not contain non-adjacent dependencies. These estimates

are used to calculate corrected RTs for the Structured and

Unstructured conditions (in much the same way that length-



corrected RTs are used in self-paced reading studies). These

corrected RTs remove the effects of symbol position and task

practice from the raw recorded RTs.

In Step 2 we analyze corrected RTs by comparing how

differences between these corrected RTs in Regions 1 and 3 change

over the course of the experiment in both conditions. We thus

establish an index of each participant’s ability to pick up on

the predictive relationships during learning.

Finally, Step 3 tests whether this measure of individual

variation in participants’ sensitivity to predictability during

processing predicts their performance on offline grammaticality

judgments during the post-exposure test.

Step 1: General task effects. We begin by visualizing the

effects of task practice and triplet region observed in the

Random condition compared to the Structured and Unstructured
conditions. Figure 3 shows a non-linear generalized additive
model in which log-transformed RTs were regressed onto Trial (1-432),
Region (1, 2, 3), and their interaction for each of

the three experimental conditions. Generalized additive models

are a powerful tool to fit data with unknown degrees of non-



linearity, accommodating highly non-linear relations between a

set of predictors. We note that logarithm-transformed RTs were

used because a) this is a common transform in RT analyses in

order to correct for violations of normality due to the (lower)

boundedness of RT data (for a discussion see Whelan, 2008) and b)

log-transforming the RTs indeed led to normally distributed

residuals. When reporting the analyses below, we specify

instances when lack of a log-transform alters the significance of

our results.

~~~~~~~~~Figure 3~~~~~~~~~~

As evident in Figure 3, RTs are overall faster in Region 3

than in Regions 1 and 2, even in the Random condition, which

lacked robust statistical regularity. Additionally, we observe

an overall decrease in RTs over the course of the experiment,

regardless of triplet region. The impression that RTs are subject

to task-general effects was confirmed by a linear mixed effects

regression against RTs from the Random condition. Using the lmer()

function (library lme4, v. 1.1-7; Bates, Maechler, Bolker, &



Walker, 2014) in R (v. 3.1.1; R Development Core Team, 2014), we

regressed log RTs from the Random condition onto all main effects

and interactions of Trial (1-432) and Region (1, 2 and 3). We

included by-participant random effects with the maximal random

effects structure that still allowed the model to converge. All

predictors were centered to reduce collinearity between main

effects and their interactions (fixed effect correlation rs < 0.6).

In addition to a significant main effect of Trial (β =

−0.001, t = −41.52, p<0.001), we find that Region 3 was

processed more quickly than the average of Regions 1 and 2 (β =

−0.06, t = −7.09, p<0.001). Moreover, a significant interaction

between Trial and Region indicates that this facilitatory effect

on Region 3 grew more pronounced over the course of exposure,

even when there was no underlying triplet structure (β = −0.0001,

t = −7.30, p<0.001)3. In sum, we observe in the Random condition
that participants processed all glyphs increasingly faster over
the course of exposure and that participants showed the greatest
facilitatory effect for Region 3. Most strikingly, we observed
that RTs for the last element in each triplet progressively
diverged from those for the earlier elements over the course of the
experiment, even in the absence of non-adjacent dependency learning.
Note that this pattern of diverging RTs
is precisely our prediction for the Structured condition, in
which we anticipate that processing times on B elements should
continue to decrease relative to A elements as learning unfolds.

3 All p-values for these and future results were calculated using the Satterthwaite approximation method implemented in the lmerTest package (v. 2.0-25; Kuznetsova, Brockhoff, & Christensen, 2014).

Therefore, to avoid anti-conservativity and to ensure that

we were not misinterpreting task-general RT effects as a

signature of learning, we removed these overall task effects from

the RTs before comparing the Structured and Unstructured

conditions. We accomplished this by fitting a generalized

additive model (as in Figure 3) to data from the Random condition

and using that function to derive predicted RTs for the Structured

and Unstructured groups.4 The model was fit with the gam()
function from the mgcv library (v. 1.8-0; Wood, 2011). The
differences between the actual and predicted log RTs (i.e., the
residual log RTs) were used as the dependent variable in the next
step of our analysis, where we compare RT effects in the
Structured and Unstructured conditions. We note that the
significant effects of predictability reported below hold even
without residualization; our approach removes effects that would
otherwise bias the results in favor of our hypothesis. The use of
residualized RTs is standard in, for example, self-paced reading
experiments (e.g., to remove word-length effects, Ferreira &
Clifton, 1986; and practice effects, Fine et al., 2013). To
demonstrate that this procedure successfully removes region and
practice effects, Figure 4 plots these residual log RTs for the
Random condition over time.

4 An alternative approach would be to use a linear mixed effect regression with non-linear terms for the Trial effect. Model comparison over linear mixed effect regressions found that the best fit is obtained for a model with square-root transformed Trial effects. Since the effects reported in Step 2 were robust to the choice of residualization technique, we report the generalized additive model residualization since it is arguably the most conservative approach for our purpose.
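The logic of the residualization step can be illustrated with a deliberately simplified Python sketch. It is not the analysis reported here: in place of the generalized additive model fit with gam(), it substitutes a separate straight-line fit of log RT on Trial for each region, and the toy data, function names, and parameter values are all assumptions made for illustration.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_baseline(random_rows):
    """Fit one line per region to (trial, region, rt_ms) rows taken
    from the Random condition only."""
    by_region = {}
    for trial, region, rt in random_rows:
        xs, ys = by_region.setdefault(region, ([], []))
        xs.append(trial)
        ys.append(math.log(rt))
    return {region: fit_line(xs, ys) for region, (xs, ys) in by_region.items()}


def residual_log_rt(baseline, trial, region, rt):
    """Observed log RT minus the log RT the Random-condition fit predicts."""
    intercept, slope = baseline[region]
    return math.log(rt) - (intercept + slope * trial)


# Toy Random-condition data: a general speed-up plus a faster Region 3.
random_rows = [
    (trial, region, 500.0 * math.exp(-0.001 * trial) * (0.9 if region == 3 else 1.0))
    for trial in range(1, 433)
    for region in (1, 2, 3)
]
baseline = fit_baseline(random_rows)

# An RT that matches the baseline leaves a residual near zero; an RT that is
# faster than the baseline predicts leaves a negative residual.
on_baseline = residual_log_rt(baseline, 100, 3, 500.0 * math.exp(-0.1) * 0.9)
faster = residual_log_rt(baseline, 100, 3, 350.0)
```

A negative residual for a Structured-condition trial would then indicate faster processing than trial number and region alone predict, which is the quantity carried forward to the next step.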

~~~~~~~~~Figure 4~~~~~~~~~~

Step 2: Sensitivity to predictability in non-adjacent

dependency learning. Focusing on contrasts between predictive

and predictable elements in the Structured and Unstructured
groups, we then analyzed the corrected RTs for Regions 1 and 3,

respectively. Recall that in the Structured condition, Region 1

corresponded to the predictive A glyphs and Region 3 corresponded

to the predictable B glyphs. Figure 5 illustrates the change of

corrected RTs over the course of the experiment for both the

Structured and Unstructured groups (cf. Figure 4 for the Random

group). The effect of element predictability in the Structured

condition is clearly visible: as participants in the Structured

group incrementally learn to predict B glyphs from A glyphs, they

increasingly ‘read’ B glyphs faster (Figure 5, left panel). No

such divergence between Regions 1 and 3 is expected, and none is
observed, in the Unstructured condition.

~~~~~~~~~Figure 5~~~~~~~~~~

This result was confirmed through linear mixed effects

regression that included an additional predictor intended to

capture the cumulative probabilities of position-specific glyphs.

This predictor, which we label Predictability, was calculated at

each trial as a moving proportion of instances a given glyph



appeared in a specific triplet region, capturing both time (i.e.,

number of trials as they unfolded) and element probability within

a given region. It was thus reflective of the statistical

regularity to which an ideal learner would likely be sensitive in

extracting the underlying structure of the A-X-B grammar. To be

clear, we do not claim here that participants would exploit this

specific statistic at the exclusion of all others. Rather, we

simply sought a reasonable model of the probabilistic beliefs

that a learner might develop in the process of acquiring the non-

local grammatical dependency. In the Structured condition,

position-specific glyph probabilities were highly informative;

the small set of A and B glyphs were unique to the first and

third regions of the triplet, and they shared a strong, though

non-adjacent, link (p(B | A) = 1.0). In contrast, individual

glyphs in the Unstructured condition occurred with equivalent

probability in all locations within a triplet, and the

transitional probability between the first and final positions

was far less robust (Figure 1). As a result, an ideal learner in

the latter condition would presumably be less likely to exploit

position-specific probabilistic regularities in generating predictions

about the nature of upcoming stimuli.
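
To make the non-adjacent transitional probability concrete, the following minimal sketch estimates p(last | first) from a list of triplets. The function name and the glyph labels (A1, X1, B1, g1, and so on) are hypothetical and stand in for the actual stimuli.

```python
from collections import Counter

def nonadjacent_tp(triplets):
    """Estimate p(last element | first element) from (first, middle, last) triplets."""
    first_counts = Counter(first for first, _, _ in triplets)
    pair_counts = Counter((first, last) for first, _, last in triplets)
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

# Hypothetical Structured-like input: each A glyph is always followed by its B glyph.
structured = [("A1", "X1", "B1"), ("A1", "X2", "B1"), ("A2", "X3", "B2")]
# Hypothetical Unstructured-like input: the same first glyph precedes different final glyphs.
unstructured = [("g1", "g2", "g3"), ("g1", "g3", "g2")]

tp_structured = nonadjacent_tp(structured)
tp_unstructured = nonadjacent_tp(unstructured)
```

In the first list every first-to-last probability is 1.0, whereas in the second the probability mass is split across several final glyphs.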

To test whether participants exhibited differing levels of

sensitivity to position-specific regularities in the Structured

relative to the Unstructured condition, corrected RTs were

regressed onto all main effects and interactions of log-

transformed Predictability, Region (1 or 3) and Condition

(Structured v. Unstructured). The Predictability measure was

derived by computing, at each trial, the cumulative frequency of

each glyph in a given position, then normalizing by the number

of previous trials. We used simple add-1 smoothing, which assumes

that, due to the matching task prior to exposure, each glyph had

been observed once prior to beginning the experiment. The model

included the maximal random effects structure (random by-

participant intercepts and slopes for all within-participant

manipulations, i.e., for the main effects and interaction of

Predictability and Region). Because position-specific glyph

probabilities were largely uninformative in the case of the

Unstructured condition, we anticipated that RTs in the Structured

condition would show a stronger facilitatory effect contingent on

Predictability.
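
The Predictability measure described above can be sketched as a running, smoothed proportion. The code below is an illustration under stated assumptions rather than the exact computation used: it assumes that counts are taken over previous trials only, and that add-1 smoothing adds one pseudo-observation per glyph type to the denominator (the n_glyph_types parameter). The function name and glyph labels are hypothetical.

```python
from collections import defaultdict

def position_predictability(trials, n_glyph_types):
    """For each trial, return the smoothed proportion of previous trials in which
    each glyph occupied its current region (regions are numbered 1 to 3)."""
    counts = defaultdict(int)  # (region, glyph) -> occurrences in previous trials
    result = []
    for n_previous, triplet in enumerate(trials):
        row = []
        for region, glyph in enumerate(triplet, start=1):
            # Add-1 smoothing: every glyph is treated as having been seen once.
            row.append((counts[(region, glyph)] + 1) / (n_previous + n_glyph_types))
        for region, glyph in enumerate(triplet, start=1):
            counts[(region, glyph)] += 1  # update only after scoring the trial
        result.append(row)
    return result

# Hypothetical trial list for illustration.
example = [("A1", "X1", "B1"), ("A1", "X2", "B1"), ("A2", "X1", "B2")]
scores = position_predictability(example, n_glyph_types=4)
```

Taking the logarithm of each value would give the log-transformed predictor that enters the regression.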

Results are summarized in Table 2. We obtained a significant

main effect of Predictability (β = −0.10, t = −5.10, p<0.001) as

well as a significant two-way interaction between Predictability

and Condition (β = −0.06, t = −3.13, p<0.01). These results

indicate that participants differed in their sensitivity to

Predictability between conditions, and that the magnitude of the

facilitatory RT effect was greater in the highly predictable

Structured condition. A subsequent simple effects analysis

revealed that the effect of position-specific Predictability was

significant in the Structured condition (β = −0.16, t =

−5.69, p<0.001), but not the Unstructured condition (β = −0.04, t

= −1.43, p=0.17). This result was expected, as an ideal learner

would likely exploit the highly regular position-specific

statistics in the Structured condition, but less so in the

Unstructured condition. Crucially, the significant three-way

interaction between Predictability, Region, and Condition (β=

−0.04, t = −3.03, p<0.01) demonstrates that participants’

sensitivity to element predictability was not only greater in the

Structured relative to the Unstructured condition, but also that

it was highly dependent on a given glyph’s position in the

triplet. Namely, as exposure progressed (and position-specific

glyph probabilities increased), processing time associated with

the last element in each triplet decreased more quickly than

processing time on the first element in each triplet, and this

difference was greater when the first and last regions contained

a robust non-adjacent dependency (i.e., in the Structured

condition, where the glyphs in Region 3 were perfectly

predictable given the glyphs in Region 1; see Footnote 5).
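
For reference, the structure of this regression can be written out as below. This is a sketch under stated assumptions: the symbols are shorthand introduced here (y for the corrected RT of observation i from participant j, P for log-transformed Predictability, R for Region, C for Condition, u for by-participant random effects), and the numerical coding of Region and Condition is not specified.

```latex
\begin{aligned}
y_{ij} ={}& \beta_0 + \beta_1 P_{ij} + \beta_2 R_{ij} + \beta_3 C_{j}
          + \beta_4 P_{ij} R_{ij} + \beta_5 P_{ij} C_{j} + \beta_6 R_{ij} C_{j}
          + \beta_7 P_{ij} R_{ij} C_{j} \\
        &+ u_{0j} + u_{1j} P_{ij} + u_{2j} R_{ij} + u_{3j} P_{ij} R_{ij}
          + \varepsilon_{ij}
\end{aligned}
```

Condition varies between participants, so it carries only the participant index and has no random slope; the three-way interaction discussed above corresponds to the coefficient on the product of all three predictors.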

~~~~~~~~~Table 2~~~~~~~~~~

Footnote 5. Without log-transforming our RTs, we maintain a significant effect of Predictability. However, the two-way interaction between Predictability and Condition and the three-way interaction between Predictability, Region, and Condition are only marginally significant (β = −32.68, t = −1.97, p=0.06 and β = −19.00, t = −1.63, p=0.12, respectively). As noted above, however, log-transformation of RTs was justified both on prior theoretical grounds and by the distribution of residuals. We further note that the use of corrected RTs is highly conservative, as it effectively attributes all variance that is ambiguous between task adaptation and non-adjacent dependency learning to the former.

Step 3: The relationship between prediction and learning

outcomes. To evaluate the relationship between the development

of predictions during online sequence processing and the

performance on post-exposure offline grammaticality judgments, we

assessed the correlation between participants’ sensitivity to our

Predictability measure during processing and off-line accuracy

scores in the post-exposure familiarity test. For each

participant, a sensitivity-to-predictability score was calculated by summing

the simple fixed effect coefficients for the Predictability

effect and the by-participant random effects from the linear

mixed effect regression (i.e., the Best Linear Unbiased

Predictors; Baayen et al., 2008). We evaluated these scores

separately for Regions 1 and 3. Figure 6B reveals that

sensitivity to glyph predictability is only significantly

correlated with ultimate learning performance on the B elements

in the Structured condition (r = -0.51, p=0.03). That is, the

faster an individual’s RTs decrease on the predictable (B) items

in the triplet sequence, the greater their ultimate performance at

post-test grammaticality judgments. In the Structured condition,

this effect only holds true for the B elements, which are

perfectly predictable given the A glyphs in Region 1. In line

with our expectations, we do not find a significant correlation

between sensitivity to predictability on Region 1 and post-test

performance (r = -0.30, p=0.21), indicating that, although

participants were indeed sensitive to region-specific statistics

associated with the A items, that sensitivity did not predict

their learning outcomes. Finally, as anticipated, we do not

observe a correlation between post-test performance and

sensitivity to glyph probability for either Region 1 (r = 0.20,

p=0.44) or Region 3 (r = 0.26, p=0.30) in the Unstructured

condition.
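
The scoring and correlation steps can be sketched as follows. The numbers are invented purely for illustration and are not the study's data; the function names and the participant labels are likewise hypothetical.

```python
import math

def sensitivity_scores(fixed_slope, random_slopes):
    """Per-participant slope: the fixed-effect coefficient plus that
    participant's random-effect adjustment (the BLUP)."""
    return {pid: fixed_slope + adj for pid, adj in random_slopes.items()}

def pearson_r(xs, ys):
    """Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values (not taken from the experiment).
fixed_slope = -0.2
random_slopes = {"p1": -0.10, "p2": -0.02, "p3": 0.05, "p4": 0.09}
accuracy = {"p1": 0.92, "p2": 0.75, "p3": 0.67, "p4": 0.58}

scores = sensitivity_scores(fixed_slope, random_slopes)
ids = sorted(scores)
r = pearson_r([scores[i] for i in ids], [accuracy[i] for i in ids])
```

Because a more negative slope means a stronger speed-up with increasing Predictability, a negative r indicates that participants with stronger sensitivity scored higher at post-test.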

~~~~~~~~~Figure 6~~~~~~~~~~

Discussion

While statistical learning has been richly supported as a

domain-general mechanism of acquisition, its real-time processing

dynamics have been largely obscured by the overwhelming use of

off-line post-tests (but see, e.g., Misyak et al., 2010a). That

children and adults tap into regularities in their environment

has been demonstrated thoroughly, yet far less attention has been

paid to how that sensitivity impacts the manner in which we

process and interact with novel surroundings. By visualizing

trial-by-trial motor response to patterned input, we have framed

prediction, not as a by-product of learning, but as a concurrent,

interlinked process related to post-exposure learning outcomes.

At their core, the present results suggest that the acquisition

of knowledge about statistical dependencies can be indexed by

participants’ success in generating increasingly precise

expectations about that regularity. These findings enable us to

make contact with prior proposals that error-driven learning plays a

central role in both processing and acquisition; language

processing is assumed to involve prediction and prediction errors

are assumed to lead to learning (e.g., Chang, Dell, Bock, &

Griffin, 2000; Chang, Dell, & Bock, 2006; Fine & Jaeger, 2013;

Jaeger & Snider, 2013; for recent reviews, see also Dell & Chang,

2014; Kleinschmidt & Jaeger, 2015; MacDonald, 2013). Here, we

provide evidence that individuals with the strongest sensitivity

to element predictability, those whose on-line predictions

increasingly aligned with the statistics of the input, ultimately

performed better on an off-line measure of learning. Finally,

these results contribute to a small but growing literature on

non-adjacent pattern learning that is not auditory-linguistic in

nature (e.g., involving perceptually similar tones, Creel,

Newport, & Aslin, 2004; certain types of alternating visual

sequences, Howard & Howard, 1997; or learning in dual-task

conditions, Conway & Christiansen, 2006); we demonstrate that the

processing of non-local dependencies shares a link to learning

that is neither domain nor modality specific.

Prior Work on Prediction and Artificial Grammar Learning

Prior work on artificial grammar processing has capitalized

on similar motor response measures to investigate facilitatory

processing in non-adjacent dependency learning (Amato &

MacDonald, 2010; Misyak et al., 2010a). However, key elements of

the present experimental design and analysis enabled us to expand

on these earlier studies. Misyak et al. (2010a) found strong

evidence of predictive processes in artificial language learning

by employing a mouse-clicking task. In their set-up, participants

were required to match auditorily and visually presented

pseudowords from the Gómez grammar. An increasingly facilitatory

effect on response time was observed for the dependent items in

the grammar, a result we replicate here. However, the present

study also includes control conditions that enable us to account

for potential effects of element frequency or general task

adaptation. Moreover, our self-paced paradigm addresses a

limitation of Misyak et al.: in that experiment, participants were

required to select between successive pairs of visually presented

pseudowords, narrowing the set of items that could occur in a

given position and potentially altering the nature of learning.

In addition, we have tested directly the relationship between on-

line prediction measures and ultimate learning outcomes; Misyak

et al. (2010a) concentrated on the link between prediction during

artificial grammar learning and sentence processing in natural

language (see also Misyak, Christiansen, & Tomblin, 2010b). We

have demonstrated that participants who develop more precise

predictions as they learn, and thus experience the on-line

facilitatory effects associated with having their expectations

met, also achieve higher off-line learning outcomes.

In a related study, Amato & MacDonald (2010) used self-paced

reading as a metric of artificial language learning, focusing

analyses on mean reading times after a period of extensive

training on a complex grammar. Here, we investigate the

cumulative effect of stimulus exposure (i.e., the role of

prediction-based processes throughout learning, as opposed to

resulting from it). Interestingly, the authors found that,

despite facilitatory RT effects on predictable pseudowords,

participants were unsuccessful in performing a sentence

completion task. In contrast, participants in Misyak et al.’s

mouse-clicking study achieved above-chance scores on a similar

completion task with a different underlying grammar. While our

post-test was comparatively more implicit, our debriefing

questionnaire allowed us to determine participants’ level of

awareness of the non-adjacent dependency. In the next section, we

perform a follow-up analysis that probes whether individual

differences in explicit awareness might explain some of the

observed RT effects.

Follow-up Analyses

The role of explicit awareness. In both Amato & MacDonald

(2010) and Misyak et al. (2010a), participants made relatively

explicit off-line judgments about the grammatical structure of

the input. Participants were not asked whether a grammatical or

ungrammatical string “looked familiar” (as they were in the

current experiment), but rather selected an appropriate

pseudoword to fill in a missing spot in a sequence. To reveal any

potential effects of differences in the post-test structure

(i.e., the extent to which learners were tested on explicit

knowledge), we used data from our debriefing questionnaire to

examine the influence of explicit, verbalizable awareness on the

learning process. Participants from the Structured condition were

categorized as “aware” (n=7) if they indicated that the first and

the third elements in each triplet shared a dependent

relationship and “unaware” (n=12) if they did not explicitly

describe this pattern. Participants from the Unstructured

condition, in which there was no non-adjacent dependency to

verbalize, are not included in these analyses. In the Structured

group, we find that participants with explicit knowledge of the

A-B dependency significantly outperformed participants without

this explicit knowledge (mean proportion correct for aware=0.83;

unaware=0.60; t(17)=3.1, p<0.01). This relationship needs to be

interpreted with caution; it is possible that better performance

on the post-test or factors that lead to higher performance on

the post-test make it more likely that participants become

explicitly aware of the structure of the language. In this

context, it is also interesting that lack of explicit knowledge

did not preclude above chance performance on post-test. Post-test

scores in the “unaware” group still, on average, differed

significantly from chance (t(11)=2.4, p=0.04).
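
The two comparisons in this paragraph can be illustrated with the sketch below. The reported degrees of freedom (17 and 11) are consistent with a pooled-variance two-sample t-test and a one-sample t-test, respectively, but the exact tests used are an assumption here, as is a chance level of 0.5. The function names are hypothetical, and any data passed in would need to be per-participant proportions correct.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def sample_var(xs):
    """Unbiased sample variance (n - 1 in the denominator)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def one_sample_t(xs, mu0):
    """t statistic and degrees of freedom for a test of the mean against mu0."""
    n = len(xs)
    t = (mean(xs) - mu0) / math.sqrt(sample_var(xs) / n)
    return t, n - 1

def pooled_t(a, b):
    """Two-sample t statistic with pooled variance; df = len(a) + len(b) - 2."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * sample_var(a) + (nb - 1) * sample_var(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df
```

With 7 aware and 12 unaware participants, pooled_t yields 17 degrees of freedom, and one_sample_t on the 12 unaware participants yields 11.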

We then evaluated the effect of awareness on RTs during

exposure by repeating the linear mixed effects analysis from Step 2

with an additional predictor signifying participants’ explicit

awareness of the non-adjacent dependency (to be clear, this

analysis is limited in that we cannot determine the precise point

at which each participant became aware). Corrected RTs from the

Structured condition were regressed onto all main effects and

interactions of Predictability, Region, and Awareness (“aware” or

“unaware”). All predictors were centered to reduce collinearity

(all rs<0.6). Results indicate that participants with explicit

awareness of the regularity tended to have slower overall RTs,

though this effect was not significant (β = 0.09, t=1.60,

p=0.13). Thus, it is potentially the case that participants who

engaged in more deliberative processing had explicit awareness of

the underlying A-B regularity, but our results are not conclusive

in this respect. We find no significant interaction between

awareness and any of the other predictors (Table 3), and our

original effects of Predictability (β = −0.17, t = −4.25,

p<0.001) and Region (β = −0.06, t = −2.5, p=0.02) are maintained.

Their interaction is marginally significant (β = −0.06, t =

−1.97, p=0.07). Note that the absence of a significant three-way

interaction between Predictability, Region, and Awareness

indicates that the on-line generation of predictions throughout

dependency learning is not necessarily an implicit process. The

previously discussed correlation between post-test performance

and glyph probability further shows that learners who were better

predictors were ultimately better performers, regardless of

explicit knowledge of the dependency.
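
The reason for centering predictors before forming interaction terms can be seen in a small example. The values below are invented for illustration only; the point is that a raw product term tends to correlate strongly with its component predictors, whereas the product of centered predictors does so much less.

```python
import math

def center(xs):
    """Subtract the mean so that the predictor averages zero."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical predictor values (not the study's data).
predictability = [1.0, 2.0, 3.0, 4.0]
awareness = [0.0, 0.0, 1.0, 1.0]

raw_product = [p * a for p, a in zip(predictability, awareness)]
r_raw = pearson_r(predictability, raw_product)

pc, ac = center(predictability), center(awareness)
centered_product = [p * a for p, a in zip(pc, ac)]
r_centered = pearson_r(pc, centered_product)
```

In this toy case the correlation between Predictability and the interaction term is high before centering and close to zero afterwards.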

~~~~~~~~~Table 3~~~~~~~~~~

Conclusions and Outstanding Questions

In sum, we have provided evidence of predictive processing

in the context of exposure to a visual artificial grammar.

Reaction times revealed a progressive facilitatory effect for

predictable items, suggesting that predictions, when they are

favorably resolved with subsequent input, speed up processing of

temporally ordered elements. Furthermore, we observed that those

participants who developed the most accurate predictions, and

thus increasingly experienced their expectations being met,

performed better on a post-test requiring familiarity judgment.

We have therefore demonstrated a link between on-line and off-

line measures of learning. While this correlation does not allow

us to make specific claims about the directionality of that

relationship, it strongly indicates a tight coupling between the

generation of implicit expectations, in this case the speed up on

statistically dependent elements, and a commonly used metric of

learning outcome, familiarity judgments following exposure.

By developing and validating a learning task that enables us

to trace, at an individual differences level, learners’ ability

to generate accurate predictions within a novel environment, we

have opened up exciting possibilities for future research. For

example, one might ask which factors might either speed up or

perturb the process of successfully generating predictions within

various learning contexts (e.g., the complexity or stationarity

of the input). Alternatively, this fine-grained metric of on-line

processing might also be used to predict acquisition in much

larger-scale learning tasks (e.g., second language learning). Are

faster, more accurate predictors better natural language

learners? This question would build on previous observations that

performance on a simple statistical learning task correlates with

second language literacy outcomes (Frost, Siegelman, Narkiss, &

Afek, 2013).

Of final note, our paradigm has provided evidence for the

formation of predictions when learners lack strong prior

expectations about underlying structure (because we used a set of

unfamiliar glyphs). Self-paced processing paradigms have also

been used to examine how expectations based on prior natural

language experience (e.g., from one’s native language) can be

adapted to unexpected distributions during comprehension. Fine et

al. (2013), for example, found that a priori infrequent syntactic

structures, which typically incur a processing cost, are read

increasingly faster in a context in which they are more probable

(i.e., the structures become expected). It remains an important

outstanding question whether statistical learning, as examined in

artificial worlds with novel stimuli, and adaptation or priming

effects in native language comprehension rely on a common

learning mechanism.

Acknowledgments

This research was supported by NSF Career Grant IIS-1150028 to

TFJ, and NSF GRFs to EAK and ABF.

References

Altmann, G.T.M., & Mirkovic, J. (2009). Incrementality and

prediction in human sentence

processing. Cognitive Science, 33, 1–27.

Amato, M.S., & MacDonald, M.C. (2010). Sentence Processing in an

Artificial Language:

Learning and Using Combinatorial Constraints. Cognition, 116,

143–148.

Arai, M., & Keller, F. (2013). The use of verb-specific

information for prediction in sentence

processing. Language and Cognitive Processes, 28, 525–560.

Baayen, R.H., Davidson, D.J., & Bates, D.M. (2008). Mixed effects

modeling with crossed

random effects for subjects and items. Journal of Memory and

Language, 59, 390–412.

Bannard, C., & Matthews, D. (2008). Stored word sequences in

language learning. Psychological

Science, 19, 241–248.

Bates, D., Maechler, M., Bolker, B. & Walker, S. (2014). lme4:

Linear mixed-effects models

using Eigen and S4. R package version 1.1-7, <URL:

http://CRAN.R-project.org/package=lme4>.

Chang, F., Dell, G. S., Bock, K., & Griffin, Z. M. (2000).

Structural priming as implicit learning:

A comparison of models of sentence production. Journal of

Psycholinguistic Research, 29, 217–229.

Chang, F., Dell, G. S., & Bock, K. (2006). Becoming syntactic.

Psychological Review, 113, 2,

234–272.

Conway, C.M., & Christiansen, M.H. (2006). Statistical learning

within and between modalities: Pitting abstract against

stimulus-specific representations. Psychological Science, 17, 905–

912.

Creel, S. C., Newport, E. L., & Aslin, R. N. (2004). Distant

melodies: Statistical learning of non-

adjacent dependencies in tone sequences. Journal of Experimental

Psychology: Learning, Memory, and Cognition, 30, 1119–1130.

Dell, G.S., & Chang, F. (2014). The P-Chain: Relating sentence

production and its disorders to

comprehension and acquisition. Philosophical Transactions of the

Royal Society B, 369, 1471–2970.

Farmer, T. A., Fine, A. B., Yan, S., Cheimariou, S., & Jaeger, T.

F. (2014). Syntactic expectation

adaptation in the eye-movement record. In P. Bello, M.

Guarini, M. McShane, & B. Scassellati (Eds.), Proceedings of the

36th Annual Meeting of the Cognitive Science Society (pp. 2181–

2186). Austin, TX: Cognitive Science Society.

Ferreira, F., & Clifton, C. E. (1986). The independence of

syntactic processing. Journal of

Memory and Language, 25, 348–368.

Fine, A.B., & Jaeger, T.F. (2013). Evidence for implicit learning

in syntactic

comprehension. Cognitive Science, 37, 578–591.

Fine, A. B., Jaeger, T. F., Farmer, T. A., & Qian, T. (2013).

Rapid expectation adaptation during

syntactic comprehension. PLoS ONE, 8, e77661.

Fine, A.B., Qian, T., Jaeger, T.F., & Jacobs, R.A. (2010).

Syntactic Adaptation in Language

Comprehension. In Proceedings of the 2010 Workshop on Cognitive

Modeling and Computational Linguistics (pp. 18–26). Uppsala, Sweden:

Association for Computational Linguistics.

Frost, R., Siegelman, N., Narkiss, A., & Afek, L. (2013). What

predicts successful literacy

acquisition in a second language? Psychological Science, 24, 1243–

1252.

Garnsey, S. M., Pearlmutter, N. J., Myers, E., & Lotocky, M.

A. (1997). The contributions of

verb bias and plausibility to the comprehension of

temporarily ambiguous sentences. Journal of Memory and

Language, 37, 58–93.

Gómez, R. L. (2002). Variability and detection of invariant

structure. Psychological Science,

13, 431–436.

Hahne, A., & Friederici, A. D. (1999). Electrophysiological

evidence for two steps in syntactic

analysis: Early automatic and late controlled processes.

Journal of Cognitive Neuroscience, 11, 194–205.

Hanulíková, A., van Alphen, P. M., van Goch, M. M., & Weber, A.

(2012). When one person’s

mistake is another’s standard usage: the effect of foreign

accent on syntactic processing. Journal of Cognitive Neuroscience, 24,

878–887.

Howard, J.H., & Howard, D.V. (1997). Age differences in implicit

learning of higher order

dependencies in serial patterns. Psychology and Aging, 12, 634–

656.

Hunt, R.H., & Aslin, R.N. (2001). Statistical learning in a

serial reaction time task: Simultaneous

extraction of multiple statistics. Journal of Experimental

Psychology: General, 130, 658–680.

Jaeger, T. F., & Snider, N. (2013). Alignment as a consequence of

expectation adaptation:

Syntactic priming is affected by the prime’s prediction

error given both prior and recent experience. Cognition, 127,

57–83.

Jurafsky, D. (1996). A Probabilistic model of lexical and

syntactic access and

disambiguation. Cognitive Science, 20, 137–194.

Just, M.A., Carpenter, P.A., & Woolley, J.D. (1982). Paradigms and

processes in reading

comprehension. Journal of Experimental Psychology: General, 111, 228–

238.

Kamide, Y. (2012). Learning individual talkers' structural

preferences. Cognition, 124, 66–71.

Karuza, E.A., Newport, E.L., Aslin, R.N., Starling, S.J.,

Tivarus, M.E., & Bavelier, D. (2013).

Neural correlates of statistical learning in a word

segmentation task: An fMRI study. Brain and Language, 127, 46–

54.

Kleinschmidt, D.F., & Jaeger, T.F. (2015). Robust speech

perception: Recognize the familiar,

generalize to the similar, and adapt to the

novel. Psychological Review, 122, 148–203.

Kuperman, V., Dambacher, M., Nuthmann, A., & Kliegl, R. (2010).

The effect of word position

on eye-movements in sentence reading: A cross-corpora

analysis. Quarterly Journal of Experimental Psychology, 63, 1838–1857.

Kuznetsova, A., Brockhoff, P.B., & Christensen, R.H.B. (2015).

lmerTest: Tests in Linear

Mixed Effects Models. R package version 2.0-25. <URL:

http://CRAN.R-project.org/package=lmerTest>.

Levy, R. (2008). Expectation-based syntactic comprehension.

Cognition, 106, 1126–1177.

Lew-Williams, C. & Fernald, A. (2007). Young children learning

Spanish make rapid use of

grammatical gender in spoken word recognition. Psychological

Science, 18, 193–198.

MacDonald, M. C., Pearlmutter, N. J., & Seidenberg, M. S. (1994).

The lexical nature of

syntactic ambiguity resolution. Psychological Review, 101, 676–

703.

MacDonald, M. C. (2013). How language production shapes language

form and

comprehension. Frontiers in Psychology, 4, 1–16.

McDonald, S. A., & Shillcock, R. C. (2003). Low-level predictive

inference in reading: The

influence of transitional probabilities on eye

movements. Vision Research, 43, 1735–1751.

McRae, K., Spivey-Knowlton, M. J., & Tanenhaus, M. K. (1998).

Modeling the influence of

thematic fit (and other constraints) in on-line sentence

comprehension. Journal of Memory and Language, 38, 283–312.

Misyak, J.B., Christiansen, M.H., & Tomblin, J.B. (2010a).

Sequential expectations: The role of

prediction-based learning in language. Topics in Cognitive Science,

2, 138–153.

Misyak, J. B., Christiansen, M. H., & Tomblin, J. B. (2010b). On-

line individual differences in

statistical learning predict language processing. Frontiers in

Psychology, 1, 1–9.

Newport, E. L., & Aslin, R. N. (2004). Learning at a distance I.

Statistical learning of non-

adjacent dependencies. Cognitive Psychology, 48, 127–162.

Nissen, M. J., & Bullemer, P. (1987). Attentional requirements of

learning: evidence from

performance measures. Cognitive Psychology, 19, 1–32.

R Core Team (2014). R: A language and environment for statistical

computing. R Foundation for

Statistical Computing, Vienna, Austria. URL

http://www.R-project.org/.

Romberg, A.R. & Saffran, J.R. (2010). Statistical learning and

language acquisition. Wiley

Interdisciplinary Reviews: Cognitive Science, 1, 906–914.

Saffran, J., Aslin, R., & Newport, E. (1996). Statistical

learning by 8-month-old infants. Science,

274, 1926–1928.

Saffran, J.R., Newport, E.L., & Aslin, R.N. (1996). Word

segmentation: The role of

distributional cues. Journal of Memory and Language, 35, 606–621.

Seidenberg, M. S., & MacDonald, M. C. (1999). A probabilistic

constraints approach to

language acquisition and processing. Cognitive Science, 23, 569–

588.

Smith, N.J., & Levy, R. (2013). The effect of word predictability

on reading time is logarithmic.

Cognition, 128, 302–319.

Trueswell, J. C., Tanenhaus, M. K., & Garnsey, S. (1994).

Semantic influences on parsing: Use

of thematic role information in syntactic ambiguity

resolution. Journal of Memory and Language, 33, 285–318.

Whelan, R. (2008). Effective analysis of reaction time data. The

Psychological Record, 58, 475–482.

Wood, S.N. (2011). Fast stable restricted maximum likelihood and

marginal likelihood

estimation of semiparametric generalized linear models.

Journal of the Royal Statistical Society (B), 73, 3–36.

Figure 1. Directed graph displaying the underlying triplet

structure of each of the three language conditions: Structured,

Unstructured, and Random. Edges represent the mean transitional

probabilities between pairs of items in each triplet region.

Nodes represent the glyphs that appear in each region. Note how

region and glyph type (A, X, or B) are perfectly correlated only

in the highly regular Structured condition; in the other two

conditions, glyphs appear in each of the three presentation

slots and transitional probabilities do not vary across edges.

Table 1. Experiment design, including ordering of tasks, number

of trials, and behavioral data collected.

Phase               Task                                     Trial N   Measure
1. Familiarization  Glyph matching                           30        N/a
2. Exposure         A. Self-paced presentation of triplets   432       Processing time/glyph (ms)
                    B. Intermittent catch trials             144       Accuracy
3. Post-test        Familiarity judgments                    12        Accuracy
4. Debriefing       Record explicit knowledge about the      7         Self-reported awareness
                    structure of the languages

Figure 2. Example of a single triplet trial. Each trial began

with a row of dashes. Participants advanced each item in the

sequence by pressing the space bar. Response times between the

initiation of successive elements were recorded, revealing the

duration each glyph was present on the screen.

Figure 3. Change in log-transformed RTs for each Region, smoothed

with a generalized additive function in light of high degrees of

non-linearity present in the data. In each of the three

conditions, Random (top plot), Structured (middle plot), and

Unstructured (lower plot), we observe a pronounced facilitatory

effect for Region 3 (blue), motivating the residualization

approach we employed. Note, however, the robust learning

signature unique to the Structured condition: higher overall RTs

for the unpredictable Region 2 glyphs (purple) coupled with a

growing difference in processing time for Region 1 and Region 3

(shown in orange and blue, respectively). Shading represents 95%

confidence intervals.

Figure 4. Corrected (residual log-transformed) RTs for the Random

condition plotted across experimental trials. Values centered on

0 indicate that general task and region-specific effects were

successfully removed. Error bars represent 95% confidence

intervals.

Figure 5. Corrected (residual log-transformed) RTs plotted across

trials for the Structured (left panel) and Unstructured (right panel)

conditions. The facilitatory effect on Region 3 is strikingly

clear for the Structured condition. That is, as participants

learn the non-adjacent dependency of the A-X-B grammar, they

“read” B items increasingly faster over time (blue). In contrast,

there is no evidence of prediction-based learning in the

Unstructured condition (Regions 1 and 3 do not dissociate over

time). Error bars represent 95% confidence intervals.

Table 2. Coefficients (and corresponding t-values) for each

predictor in the model presented in Step 2 of our analyses comparing

the Structured and Unstructured conditions. Significant values (determined using the

Satterthwaite approximation and corresponding to p<0.05) are

bolded.

Predictor                          Estimate   t-value
Predictability                     −0.10      −5.10
Region                             0.005      0.23
Condition                          0.03       0.67
Predictability*Region              −0.02      −1.85
Predictability*Condition           −0.06      −3.13
Region*Condition                   −0.02      −1.00
Predictability*Region*Condition    −0.04      −3.03


Figure 6. Relationship between RT sensitivity to increasing

element probability and off-line grammaticality judgments. We

find no significant correlation between the effect of Region 1

Predictability and ultimate off-line learning performance in

either condition (top plot, A). However, the bottom plot (B)

demonstrates a significant correlation between sensitivity to

Predictability for Region 3 and post-test performance that is

specific to the Structured condition (red). That is, the faster

an individual’s RTs decrease on the perfectly predictable (B) items, the

greater their ultimate performance at post-test grammaticality

judgments. Shaded areas represent 95% confidence intervals.

Table 3. Coefficients (and corresponding t-values) for each

predictor in a model examining the effect of explicit awareness

of grammatical structure on residual-log RTs. Significant values

(determined using the Satterthwaite approximation and

corresponding to p<0.05) are bolded.

Predictor                          Coefficient   t-value
Predictability                     −0.17         −4.25
Region                             −0.06         −2.50
Awareness                          0.09          1.60
Predictability*Region              −0.06         −1.97
Predictability*Awareness           −0.04         −1.00
Region*Awareness                   −0.0006       −0.02
Predictability*Region*Awareness    −0.003        −0.08


Appendix A
