Memory & Cognition, 2000, 28 (4), 648-656
When does inconsistency hurt? On the relation between phonological consistency effects
and the reliability of sublexical units
HEIKE MARTENSEN, ERIC MARIS, and TON DIJKSTRA
University of Nijmegen, Nijmegen, The Netherlands
Phonological consistency describes to what extent a letter string in one word is pronounced the same way in other words. Phonological reliability describes to what extent a sublexical unit is usually consistent throughout a language. The relationship between the two concepts was investigated by comparing five sublexical units (onset consonants, vowel, end consonants, and the concatenation of the vowel with the beginning or end consonants) in Dutch and English. The units were compared with respect to their reliability and to how their consistency was related to naming errors and latencies. In a regression analysis, naming latencies and errors on genuine Dutch words (consistent) and imported words (inconsistent) were predicted by the phonological consistency of the five units. The same was done for two sets of English naming data. Consistency had a much stronger effect in Dutch than in English naming studies when all five units were considered. The special role of the vowel plus end consonants, which has been found in English naming data, could not be demonstrated in Dutch. In both languages, the size of consistency effects mirrors the reliability of the five units.
In languages with an alphabetic writing system, a systematic relation exists between graphemes and phonemes for each word. Traditionally, it has been assumed that the pronunciation of a written word is generated in one of two ways: either by clustering letters into graphemes and translating them into single sounds, or by associating the written word as a whole with its complete phonological code (see, e.g., Coltheart, 1978). However, almost as old as this tradition is the notion that the translation might take place at the level of sublexical units. Such units consist of more than one grapheme or phoneme but are smaller than the whole word; examples are morphemes, syllables, and consonant and vowel clusters. In the following, we will refer to the consonants at the beginning of a monosyllabic word as the onset, to the vowel as the nucleus, and to the consonants following the vowel as the coda. The concatenation of nucleus and coda will be called the body, and the concatenation of the onset and the nucleus the oncleus.1 As an example, consider the word SHARP: its onset is SH, the nucleus is A, the coda is RP, the body is ARP, and the oncleus is SHA. In this paper, we will examine the role of these sublexical units in reading words aloud in two languages, English and Dutch. We will also consider possible reasons for the differential reliance on particular sublexical units in each language. First, we will consider results from English naming studies and relate them to statistical regularities in the English language. Later, we will discuss the same language statistics for Dutch and formulate hypotheses for the outcome of a Dutch naming experiment.
We are deeply grateful to Rebecca Treiman, John Mullennix, Ranka Bijeljac-Babic, and Daylene Richmond-Welty for sharing the data from two naming studies with us. We also thank Rebecca Treiman, Debra Jared, David Plaut, Robert Lorch, Jonathan Grainger, and Herbert Schriefers for their helpful comments on earlier drafts of this paper. Correspondence concerning this article should be addressed to H. Martensen, Nijmegen Institute of Cognition and Information, University of Nijmegen, P. O. Box 9104, 6500 HE Nijmegen, The Netherlands (e-mail: [email protected]).
In English, there is evidence that the onset and the body of monosyllabic words play an important role in reading aloud. Bowey (1990, 1993) showed that priming a low-frequency word with its body decreases the naming latency more than priming it with another equally large segment (e.g., priming BITE with ITE vs. priming GRIN with RIN). Treiman and Zukowski (1988) demonstrated that the pronunciation of vowels in nonwords is affected more by the following consonants than by the preceding ones. Treiman and Chafetz (1987) compared words with slashes inserted between onset and body to words with slashes inserted between oncleus and coda. Lexical decisions were faster on stimuli like CL//AIM than on stimuli like CLAI//M.
These findings are reflected in the convention of expressing phonological consistency in terms of the word body (Andrews, 1982; Bowey, 1996; Brown, 1987; Brown & Watson, 1994; Glushko, 1979; Jared, 1997; Jared, McRae, & Seidenberg, 1990; Kay & Bishop, 1987; Seidenberg, Waters, Barnes, & Tanenhaus, 1984; Taraban & McClelland, 1987; Ziegler, Stone, & Jacobs, 1997). Phonological consistency expresses to what extent letter strings that are spelled identically across words are also pronounced identically. In the following, words that share an orthographic sublexical unit are called neighbors with respect to that unit (for instance, MINT and LINT are body neighbors because they share their body). Two neighbors are called phonologically consistent if the unit involved is also pronounced the same (for example, MINT and LINT are consistent with respect to their body, whereas MINT and PINT are inconsistent). Neighbors that are pronounced consistently will also be called friends, and neighbors that are pronounced inconsistently will be called enemies.
Copyright 2000 Psychonomic Society, Inc.
Treiman, Mullennix, Bijeljac-Babic, and Richmond-Welty (1995) connected the notion of phonological consistency to the role that different sublexical units play in reading words aloud. These authors analyzed to what extent naming latencies from two large-scale English naming studies could be predicted from the consistency of five sublexical units: onset, nucleus, coda, body, and oncleus. The analysis tested whether readers rely on one particular sublexical unit. It did so by estimating how much naming latencies are prolonged if the unit in question has an inconsistent pronunciation. Inconsistencies in the onset and the body prolonged naming latencies more than did inconsistencies in any other unit. In the following, this pattern of results will be called the onset-body pattern. In this study, we examine the results of a similar analysis of Dutch naming data and relate the outcome in each language to its statistical regularities.
Statistical Regularities in English
Treiman et al. (1995) and Bernstein and Treiman (in press) suggested that the onset-body pattern results from certain regularities in the English language that readers pick up. They showed that the onset and the body are the most informative units with respect to three different aspects: the correspondence between spelling and sound, the co-occurrence of orthographic units, and the co-occurrence of phonological units.
A thorough analysis of the correspondence between spelling and sound in different sublexical units was given in the first part of the paper by Treiman et al. (1995). They studied the phonological reliability of a sublexical unit. Phonological reliability indicates how well a reader can generate the correct pronunciation of a sublexical unit without taking other context letters into consideration. A written sublexical unit is reliable if it has one dominant pronunciation that is correct in most cases, and other pronunciations either do not exist or occur in only very few words. For example, the body INT is pronounced /ɪnt/ in almost all cases (HINT, MINT, LINT, etc.). There is only one alternative pronunciation, /aɪnt/, and it has a rather low probability because it appears in just one word (PINT).
This makes /ɪnt/ a very good guess of how to pronounce INT if one does not have any additional information. Therefore, INT is rather reliable. Compared with this, the body OUGH is very unreliable. There are many possible pronunciations (BOUGH, COUGH, TOUGH, etc.), and moreover, none of them is dominant. Without knowing the context letters, it is impossible to say how this sublexical unit is pronounced. Therefore, OUGH is phonologically unreliable.
Treiman et al. (1995) measured phonological reliability with the H-statistic (so called by Fitts & Posner, 1967; introduced as an information statistic by Shannon & Weaver, 1949). It is calculated as follows:
$$H(u) = \sum_{j=1}^{J} p_{uj} \log_2\!\left(\frac{1}{p_{uj}}\right) \qquad (1)$$
where $J$ is the number of possible pronunciations for the sublexical unit $u$, and $p_{uj}$ denotes the proportion of words in which unit $u$ is associated with pronunciation $j$. The term $p_{uj}\log_2(1/p_{uj})$ is added for each possible pronunciation of a sublexical unit. The value of $H$ is large if there are many equally likely pronunciations (i.e., the unit is phonologically unreliable). It is small if there is one pronunciation with a very high probability and other pronunciations are very unlikely or do not exist (i.e., the unit is reliable).
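A minimal Python sketch of Equation 1 follows. It assumes that each word containing unit u contributes one pronunciation label. The function name `h_statistic` and the word counts below are hypothetical illustrations, not data from either lexicon.

```python
import math
from collections import Counter


def h_statistic(pronunciations):
    """Equation 1: H(u) = sum over j of p_uj * log2(1 / p_uj).

    `pronunciations` holds one pronunciation label per word that contains
    the sublexical unit u (hypothetical input format).
    """
    counts = Counter(pronunciations)
    n = sum(counts.values())
    # p_uj = c / n, so log2(1 / p_uj) = log2(n / c)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


# Hypothetical counts, for illustration only:
int_body = ["as in MINT"] * 9 + ["as in PINT"]      # one dominant pronunciation
ough_body = ["as in BOUGH", "as in COUGH", "as in TOUGH", "as in DOUGH"]

print(round(h_statistic(int_body), 3))   # 0.469 (fairly reliable)
print(h_statistic(ough_body))            # 2.0 (four equally likely pronunciations)
print(h_statistic(["as in MINT"] * 5))   # 0.0 (only one pronunciation)
```

H grows in two situations: when more alternative pronunciations exist, and when the existing ones are more evenly likely. This is why the OUGH-like case scores far higher than the INT-like case.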
Treiman et al. (1995) calculated the H-statistic for each of the five sublexical units (onset, nucleus, coda, body, and oncleus) of an exhaustive collection of all monosyllabic words with a CVC phonological structure. It turned out that the mean H-statistic for bodies was lower than that for onclei. The relation between spelling and sound is more reliable in the body than in the oncleus. Thus, relying on the body will lead to the correct pronunciation more often than relying on the oncleus.
The special role of the body with respect to the co-occurrences of orthographic units was demonstrated by Bernstein and Treiman (in press). They counted the number of different onsets, codas, and nuclei in the English writing system, as well as the number of different bodies and onclei. There are more codas than onsets, but nevertheless, there are fewer bodies than onclei. The nucleus is more restrictive with respect to the coda than with respect to the onset. In other words, the body constrains which letters can be combined more strongly than the oncleus does. As an example of such a constraint, consider the nucleus AI and the coda K: Both appear in several words, but there is no word with the body AIK.
A similar analysis with respect to the pattern of co-occurrences of phonological units was provided by Kessler and Treiman (1997). They calculated the expected frequencies of bodies and onclei under the assumption of statistical independence. These calculations were based on the frequency of their constituent parts. For example, the vowel /o/ appears with medium frequency in the nucleus, and the consonant /l/ appears with medium frequency in the coda. Therefore, the expected frequency for the body /ol/ is also medium. However, /ol/ is a high-frequency combination (COAL, BOWL, DOLE, etc.). Likewise, the vowel /æ/ appears with medium frequency in the nucleus. Therefore, the body /æl/ also is expected to have a medium frequency. Yet, there are only two words (PAL and SHALL) with that body. Kessler and Treiman showed that for about 30% of the words, the body is either more or less frequent than would be expected if nuclei and codas were randomly combined. Only for 7% of the words does the frequency of the oncleus deviate from the frequency that would be expected when onset and nucleus are randomly combined. Kessler and Treiman concluded that the nucleus gives information about the coda but not about the onset. Like the English orthography, the phonology induces an onset-body structure.
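The independence baseline can be sketched as follows. The sketch assumes that the expected count of a body is (nucleus count × coda count) / N over a list of (onset, nucleus, coda) phoneme triples. This is our simplified reading of the logic described above, not necessarily Kessler and Treiman's exact computation. The toy lexicon and the function name are hypothetical.

```python
from collections import Counter


def body_observed_over_expected(words):
    """Ratio of observed to expected body counts under nucleus-coda independence.

    `words` is a list of (onset, nucleus, coda) tuples. Expected count of a
    body (v, c) = N * p(v) * p(c) = count(v) * count(c) / N.
    A ratio above 1 means over-represented; below 1, under-represented.
    """
    n = len(words)
    nuclei = Counter(v for _, v, _ in words)
    codas = Counter(c for _, _, c in words)
    bodies = Counter((v, c) for _, v, c in words)  # missing keys count as 0
    return {
        (v, c): bodies[(v, c)] / (nuclei[v] * codas[c] / n)
        for v in nuclei
        for c in codas
    }


# Hypothetical toy lexicon of phoneme triples (ASCII stand-ins for IPA).
toy = [("k", "o", "l"), ("b", "o", "l"), ("d", "o", "l"), ("s", "o", "t"),
       ("p", "ae", "l"), ("k", "ae", "t"), ("b", "ae", "t"), ("m", "ae", "t")]
ratios = body_observed_over_expected(toy)
print(ratios[("o", "l")])   # 1.5: more frequent than independence predicts
print(ratios[("ae", "l")])  # 0.5: less frequent than independence predicts
```

Applying the same computation to (onset, nucleus) pairs gives the corresponding baseline for onclei.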
Treiman and her co-workers (Bernstein & Treiman, in press; Treiman et al., 1995) concluded that "fluent readers appear to have internalized the statistical regularities of English; they implicitly know that VC2 units [bodies] are better guides to pronunciation than C1V units [onclei]" (Treiman et al., 1995, p. 124). However, the body is more informative than the oncleus with respect to three aspects: (1) spelling-sound correspondence, (2) co-occurrences in orthography, and (3) co-occurrences in phonology. It remains unclear which of these aspects is responsible for the specific reliance on onset and body observed in English readers. Alternatively, the onset-body pattern may be due to a combination of the three. To evaluate the relative contribution of each aspect, it is interesting to analyze naming latencies obtained from languages in which the onset-body structure is not induced by all three aspects. In this paper, we analyze the correspondence between spelling and sound and the co-occurrences in orthography and phonology in Dutch.
Analysis of the Relation Between Spelling and Sound in Dutch
In English, the correspondence between spelling and sound is much stronger in the body than in the oncleus. To compare English and Dutch with respect to the spelling-sound correspondence, we calculated the phonological reliability for the five sublexical units in Dutch. Like Treiman et al. (1995), we used the H-statistic (Fitts & Posner, 1967) as a quantification of reliability. Our reference lexicon contained 2,671 monosyllabic Dutch words. (For the details of our reference lexicon and the assignment of letters and phonemes to the sublexical units, see Appendix A.) The H-statistic was calculated according to Equation 1. The results are presented in Table 1, together with the results of Treiman et al.'s analysis of English words.
Table 1
H-Statistic for Sublexical Units of English (Treiman, Mullennix, Bijeljac-Babic, & Richmond-Welty, 1995) and Dutch Words
                     English         Dutch
Sublexical Unit      M      SD       M      SD
Onset               .12    .24      .01    .05
Nucleus             .73    .68      .06    .06
Coda                .26    .39      .02    .14
Oncleus             .30    .43      .02    .07
Body                .14    .32      .02    .05
Note: Means (high values indicate low reliability) are calculated across all words. N(English) = 1,329; N(Dutch) = 1,671.
Compared with English, the pronunciation of written Dutch is rather straightforward. For the majority of sublexical units, only one pronunciation exists, in which case the H-statistic is zero. The few exceptions are mostly due to words imported from other languages. The nucleus is slightly less reliable than the other sublexical units. However, compared with English, all sublexical units in Dutch are nearly equally unambiguous. As a consequence, a strategy based on the correspondence between spelling and sound will induce an onset-body pattern in English word naming, but not in Dutch. In Dutch, the smaller sublexical units are already reliable. Readers might therefore make no use of larger units (i.e., body and oncleus) or, if they do, use both body and oncleus to an equal extent. Consequently, if the onset-body pattern in English naming data is caused by the correspondence between spelling and sound, there should not be an onset-body pattern in Dutch naming latencies.
Analysis of Co-occurrence Patterns in Dutch Orthography and Phonology
In English, a stronger relation exists between nucleus and coda than between nucleus and onset, both in phonology and in orthography. This makes the body the more informative unit in both systems. To investigate whether the same holds for Dutch, we analyzed the co-occurrences among orthographic units and among phonological units in our Dutch lexicon in the same way as Bernstein and Treiman (in press) did.
The reference lexicon was the same as that in the previous analysis. We counted how many different cases existed of every type of sublexical unit. The results are presented in Table 2.
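A minimal sketch of this count follows. It assumes each word in the lexicon is already segmented into an (onset, nucleus, coda) triple; Appendix A describes the paper's actual segmentation. Onclei and bodies are stored as pairs rather than concatenated strings, so that different segmentations of the same letters stay distinct. The helper name and the toy lexicon are hypothetical.

```python
def count_distinct_units(lexicon):
    """Count distinct onsets, nuclei, codas, onclei, and bodies.

    `lexicon` is an iterable of (onset, nucleus, coda) tuples.
    """
    units = {name: set() for name in ("onset", "nucleus", "coda", "oncleus", "body")}
    for onset, nucleus, coda in lexicon:
        units["onset"].add(onset)
        units["nucleus"].add(nucleus)
        units["coda"].add(coda)
        units["oncleus"].add((onset, nucleus))  # onset + nucleus
        units["body"].add((nucleus, coda))      # nucleus + coda
    return {name: len(found) for name, found in units.items()}


# Hypothetical orthographic segmentations:
toy = [("SH", "A", "RP"), ("H", "I", "NT"), ("M", "I", "NT"),
       ("P", "I", "NT"), ("L", "I", "NT"), ("SH", "I", "P")]
print(count_distinct_units(toy))
# {'onset': 5, 'nucleus': 2, 'coda': 3, 'oncleus': 6, 'body': 3}
```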
For phonological units, there are more different codas than different onsets, but there are fewer different bodies than onclei. For the larger sublexical units, body and oncleus, constraints exist on the possible combinations of the smaller sublexical units onset, nucleus, and coda. This is shown in the columns on the right-hand side of Table 2. Only 44% of the possible combinations of onset and nucleus phonemes actually occur as onclei in our phonological lexicon. Likewise, only 23% of the possible combinations of nuclei and codas occur as bodies in our lexicon.
For orthographic units, the same pattern of results was obtained. There are more codas than onsets, yet there are fewer bodies than onclei. Of the possible combinations of onset and nucleus, 28% occur as oncleus, whereas only 13% of the possible combinations of nucleus and coda occur as body in our lexicon.
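The percentages are obtained by dividing the number of attested combinations by the product of the constituent counts, as explained in the note to Table 2. The small Python check below uses the counts reported in Table 2; the function name is ours. Note that 648/1,449 comes to 44.7%, which Table 2 reports as 44%.

```python
def percent_attested(n_attested, n_first, n_second):
    """Share of all logically possible pairings that actually occur, in percent."""
    return 100 * n_attested / (n_first * n_second)


# Counts from Table 2: (attested, first constituent, second constituent)
TABLE_2 = {
    ("oncleus", "phonology"): (648, 63, 23),     # onsets x nuclei
    ("oncleus", "orthography"): (709, 66, 38),
    ("body", "phonology"): (460, 23, 86),        # nuclei x codas
    ("body", "orthography"): (616, 38, 129),
}

for (unit, level), counts in TABLE_2.items():
    print(f"{unit:8s} {level:12s} {percent_attested(*counts):5.1f}%")
```

This prints 44.7%, 28.3%, 23.3%, and 12.6%, in line with the rounded values in the table.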
From these results, we can conclude that the constraints on the body are stronger than those on the oncleus. We can also see that both larger units, body and oncleus, put some constraint on the smaller units. Thus, with respect to its orthography and its phonology, the structure of Dutch resembles that of English. If the constraints on co-occurrences within one of these two levels (orthography, phonology) cause the onset-body pattern in English naming data, the same pattern should be found in Dutch naming data.
Table 2
Number of Different Cases of Each Sublexical Unit in Phonology and in Orthography
                     Absolute Number of           Percent of
                     Different Cases              Possible Combinations
Sublexical Unit      Phonology   Orthography      Phonology   Orthography
Onset                   63           66
Nucleus                 23           38
Coda                    86          129
Oncleus                648          709              44%          28%
Body                   460          616              23%          13%
Note: The percentages are calculated with respect to the product of the absolute numbers of the constituent units. For instance, there are 63 different phonological onsets and 23 different nuclei, which results in 1,449 possible combinations to form an oncleus. However, only 648 (44%) of these combinations actually occur in natural language.
A Regression Analytical Study in Dutch
We wanted to establish whether the onset-body pattern found in English naming data could be replicated in Dutch. This should be the case when readers' reliance on sublexical units is determined by either orthographic or phonological constraints. Conversely, the onset-body pattern should not be found in Dutch when only the relation between orthography and phonology is essential for the reliance on different sublexical units. In that case, Dutch readers should rely no more on the body than on the oncleus, and altogether they should rely on the small units (onset, nucleus, and coda) to a greater extent than do English readers.
Like Treiman et al. (1995), we measured the effect of a unit's phonological consistency on naming latencies and errors to establish whether readers relied on this particular unit. To do this, one major obstacle had to be overcome: How do we select phonologically inconsistent words in Dutch, a language that was chosen because of its highly consistent spelling-to-sound system? The pronunciation of genuine Dutch words is straightforward. However, several frequently used words imported from other languages (mostly English and French) have an exceptional spelling-to-sound relation.
However, the number of inconsistent words is rather small. Moreover, the consistencies of the different units are not independent of each other. This is especially true for the large units (oncleus and body) and their smaller constituents (onset, nucleus, and coda). The large units sometimes resolve the inconsistency of the small units, but very often a large unit is simply inconsistent when one of its components is. These dependencies make it impossible to form sufficiently large groups of Dutch words that differ only in the consistency of one particular unit. Such groups would also need to be comparable with respect to the consistency of all other units and possible additional variables (e.g., word frequency). This problem can be dealt with in a regression-analytical design, in which variations in naming latencies and errors are explained by the degree of phonological consistency of different units and by word frequency. There are two ways to handle dependencies between predictors: simultaneous and hierarchical testing.
Simultaneous testing should be applied when there is no further knowledge about the origin of the dependencies. All the predictors are entered into the regression equation simultaneously. For each predictor, we calculate the amount of variance it accounts for that cannot be explained by any of the other predictors. In this method, all variance that can be explained by more than one predictor is ignored.
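A minimal NumPy sketch of simultaneous testing follows. Each predictor's unique contribution (the squared semipartial correlation, sr²) is the drop in R² when that predictor is removed from the full model. The data are synthetic, and the function names are hypothetical. The sketch works on a single set of scores, whereas the paper additionally tested these contributions across participants.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (with intercept)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    residuals = y - X1 @ beta
    return 1.0 - residuals @ residuals / np.sum((y - y.mean()) ** 2)


def unique_contributions(X, y):
    """sr^2 for each column: R^2(full) - R^2(full without that column).

    Variance shared by two or more predictors is not credited to any of them.
    """
    full = r_squared(X, y)
    return [full - r_squared(np.delete(X, k, axis=1), y) for k in range(X.shape[1])]


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 1] += 0.8 * X[:, 0]          # make predictors 0 and 1 correlated
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=200)
sr2 = unique_contributions(X, y)
print([round(v, 3) for v in sr2], round(r_squared(X, y), 3))
```

Predictors 0 and 1 are correlated, so their sr² values sum to clearly less than the full R². The difference is the shared variance that simultaneous testing sets aside.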
However, when theoretical considerations allow variance shared between two predictors to be attributed to one of them, one can also apply hierarchical testing. We divided our predictors into three priority levels: (1) word frequency, (2) consistency of the small units (onset, nucleus, and coda), and (3) consistency of the large units (oncleus and body). In this way, variance that could be attributed either to word frequency or to another variable was always attributed to word frequency. For the small units, we only considered the variance they could explain over and above what word frequency could also explain. For the large units, we only considered the variance they could explain over and above what word frequency and the consistency of the small units already explained. With this hierarchical testing scheme, we could establish to what extent the consistency of the larger units was needed to explain the naming latencies.
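The three-step scheme can be sketched with NumPy as a sequence of nested least-squares fits. The increment at each step is the gain in R² over the previous step. The column roles (frequency, small units, large units) and the synthetic data are hypothetical stand-ins for the paper's predictors.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (with intercept)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    residuals = y - X1 @ beta
    return 1.0 - residuals @ residuals / np.sum((y - y.mean()) ** 2)


def hierarchical_increments(blocks, y):
    """Enter blocks of predictors in priority order; return the R^2 gain per step.

    Variance shared with an earlier block is credited to that earlier block.
    """
    entered, previous, gains = [], 0.0, []
    for block in blocks:
        entered.append(block)
        current = r_squared(np.column_stack(entered), y)
        gains.append(current - previous)
        previous = current
    return gains


rng = np.random.default_rng(1)
n = 213
frequency = rng.normal(size=(n, 1))
small = rng.normal(size=(n, 3))                            # "onset", "nucleus", "coda"
large = small[:, [0, 2]] + 0.3 * rng.normal(size=(n, 2))   # "oncleus", "body": overlap with onset, coda
y = 0.3 * frequency[:, 0] + small @ np.array([1.0, 0.3, 0.6]) + rng.normal(size=n)
steps = hierarchical_increments([frequency, small, large], y)
print([round(100 * g, 1) for g in steps])   # increments in percent, as in Table 3
```

Reversing the block order would credit the shared variance to the large units instead. This is why the order of entry has to be justified on theoretical grounds.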
From the relation between spelling and sound in Dutch, we derived the hypothesis that the small units would be sufficient to explain naming latencies. However, if the co-occurrence patterns in orthography or phonology are responsible for the way words are clustered in reading, two predictions follow. The large units should explain variance in naming latencies that cannot be explained by the small units, and the body should explain more variance than the oncleus.
METHOD
Materials
The reference lexicon for the calculation of the consistency measures was the same as that in the analyses reported in the introduction (see Appendix A for details). From this large pool, 191 genuine Dutch words were selected for the naming experiment. All the words were monosyllabic nouns and three to five letters long. None of them had an indecent meaning or connotation.
Furthermore, we added every word from the reference lexicon that satisfied the selection criteria named above and had a proportion of consistent neighbors smaller than .50 for at least one of the units. This resulted in 28 additional words, all of which had been imported from other languages (e.g., TANK or JEEP). For 17 of these words, the body also appeared in genuine Dutch words.2
Participants
Thirty students (26 female, 4 male) from Nijmegen University participated in the experiment. The mean age was 23.8 years (SD = 3.42 years). The participants were either paid or received course credit for their participation. All the participants were native speakers of Dutch and fluent readers, and all had normal or corrected-to-normal vision.
Apparatus and Procedure
Each trial began with a 300-msec presentation of a star in the middle of the screen. After a blank screen of 700 msec, the target word was presented. The word remained on the monitor until the subject responded. The following trial began after an interval of 1,000 msec.
Each participant saw 225 test items. The order was generated randomly for each participant, with the restriction that no two imported words were presented consecutively. The critical items were presented in blocks of 50. In addition, there were 20 practice items at the beginning of the experiment and 5 filler items at the beginning of each block. Naming latencies on practice and filler items were not recorded. The participants were instructed to read the words aloud as quickly and as accurately as possible, with accuracy being emphasized over speed.
The experiment was conducted on an Apple Macintosh LC II. The words were presented in 16-point Times font on a 12-in. monitor. Errors were recorded by the experimenter. A voice key was used to register the beginning of the speech signal.
RESULTS
Data Cleaning
Trials with technical problems with the voice key were discarded. These problems involved premature triggering, because a participant made some noise, or delayed triggering, because a participant did not speak loudly enough. The mean percentage of such trials was 1.7% (SD = 2.4%) per person. The remaining trials formed the basis on which the percentage of pronunciation errors was calculated for each item. The mean percentage of pronunciation errors was 1.6% (SD = 5%). Only response times for trials without errors were included in the following analyses of response times.
Because the result of a regression analysis is sensitive to outliers, we excluded some items with extreme latencies or extreme error values. Because the imported words also had extreme consistency values, we used different exclusion criteria for genuine Dutch words and for imported words. We calculated the mean latencies and errors separately for imported and for genuine Dutch words. A genuine Dutch word was discarded if its mean error or mean latency was more than 3.47 standard deviations above the mean for genuine Dutch words. An imported word was excluded if its mean error or latency was more than 2.93 standard deviations above the mean for imported words. The critical values differ because a separate Bonferroni correction was applied within each group. As a result, the overall risk of excluding one item unjustly was .05 for genuine Dutch words as well as for imported words. In this way, two imported and four genuine words were excluded from the analysis.
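Under one reading of this rule, the cutoff is the one-tailed standard-normal quantile at 1 - .05/n, where n is the number of items in a group. This reading is our assumption, since the exact procedure is not spelled out. The Python sketch below reproduces the 3.47 criterion for the 191 genuine Dutch words. For 28 imported words, the same rule gives about 2.91, slightly below the reported 2.93, so the authors' exact item count or procedure may have differed. The function names are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def bonferroni_z_cutoff(n_items, family_alpha=0.05):
    """One-tailed z cutoff so that the chance of wrongly excluding
    at least one of n_items is at most family_alpha (Bonferroni)."""
    return NormalDist().inv_cdf(1 - family_alpha / n_items)


def items_to_exclude(item_means, cutoff):
    """Indices of items whose mean lies more than `cutoff` SDs above the group mean."""
    m, sd = mean(item_means), stdev(item_means)
    return [i for i, x in enumerate(item_means) if (x - m) / sd > cutoff]


print(round(bonferroni_z_cutoff(191), 2))  # 3.47
print(round(bonferroni_z_cutoff(28), 2))   # 2.91
```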
Regression Analyses
To establish the effect of phonological consistency, we conducted two regression analyses: hierarchical and simultaneous. In both cases, naming latencies and errors were predicted by word frequency and the consistency of the five sublexical units. To test whether the contributions of single variables or groups of variables were significant, we conducted repeated measures regression analyses (Lorch & Myers, 1990). These analyses provide a basis for generalizing the observed effects to the participant population. In essence, taking individual differences into account amounts to conducting an analysis for each participant and calculating t tests on the resulting beta-weights. Lorch and Myers describe the procedure for testing the contribution of a group of several variables added in a hierarchical regression analysis.3
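The following is a minimal NumPy sketch of the repeated measures regression idea summarized above. It fits the same least-squares model separately for each participant. It then tests each regression weight against zero with a one-sample t test across participants (df = participants - 1, hence t(29) for 30 participants). For simplicity, the sketch assumes every participant contributes a value for every item, whereas error trials were excluded in the experiment. The data and the function name are hypothetical, and the sketch does not cover Lorch and Myers's procedure for testing groups of predictors.

```python
import numpy as np


def per_participant_t(X, Y):
    """X: (n_items, k) item predictors; Y: (n_participants, n_items) scores.

    Returns mean weights, one-sample t values, and degrees of freedom.
    """
    X1 = np.column_stack([np.ones(X.shape[0]), X])
    # One OLS fit per participant; drop the intercept column of the weights.
    weights = np.array([np.linalg.lstsq(X1, y, rcond=None)[0] for y in Y])[:, 1:]
    n = weights.shape[0]
    mean_w = weights.mean(axis=0)
    se = weights.std(axis=0, ddof=1) / np.sqrt(n)
    return mean_w, mean_w / se, n - 1


rng = np.random.default_rng(2)
X = rng.normal(size=(213, 2))                   # e.g., two consistency predictors
baseline = rng.normal(600, 30, size=(30, 1))    # participant-specific speed (msec)
Y = baseline - 20 * X[:, 0] + rng.normal(0, 50, size=(30, 213))
mean_w, t, df = per_participant_t(X, Y)
print(np.round(mean_w, 1), np.round(t, 2), df)
```

The participant-specific baseline is absorbed by each participant's intercept. Consequently, only the variability of the slopes across participants enters the t tests.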
The predictors were word frequency and the phonological consistency of the onset, nucleus, coda, oncleus, and body. We took the word-form frequency of occurrence from CELEX (Baayen, Piepenbrock, & van Rijn, 1993) and log-transformed it. The consistency of each unit was calculated by dividing the number of friends with respect to that particular unit by the total number of neighbors with respect to that unit.4 In accordance with Treiman et al. (1995), we did not count a word as its own neighbor, and we assigned unique words the consistency value 0. Appendix B gives the mean, standard deviation, and degree of collinearity for each predictor.
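The consistency measure can be sketched in Python as follows. The lexicon format (each word mapped to a spelling and a pronunciation per unit) and the ASCII pronunciation labels are hypothetical. Two rules are taken from the text: a word is not its own neighbor, and a word without neighbors gets the value 0.

```python
def consistency(word, unit, lexicon):
    """Proportion of a word's neighbors for `unit` that are also friends.

    lexicon maps each word to {unit: (spelling, pronunciation)}.
    Neighbors share the unit's spelling; friends also share its pronunciation.
    """
    spelling, sound = lexicon[word][unit]
    neighbors = [w for w, units in lexicon.items()
                 if w != word and units[unit][0] == spelling]  # exclude the word itself
    if not neighbors:
        return 0.0                                              # unique words get 0
    friends = sum(1 for w in neighbors if lexicon[w][unit][1] == sound)
    return friends / len(neighbors)


LEXICON = {
    "MINT": {"onset": ("M", "m"), "body": ("INT", "Int")},
    "LINT": {"onset": ("L", "l"), "body": ("INT", "Int")},
    "HINT": {"onset": ("H", "h"), "body": ("INT", "Int")},
    "PINT": {"onset": ("P", "p"), "body": ("INT", "aInt")},
    "MOSS": {"onset": ("M", "m"), "body": ("OSS", "Os")},
}
print(consistency("MINT", "body", LEXICON))   # 2 friends / 3 neighbors
print(consistency("PINT", "body", LEXICON))   # 0.0: all neighbors are enemies
print(consistency("HINT", "onset", LEXICON))  # 0.0: no onset neighbors
```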
Hierarchical Regression Analysis
As was described in the introduction, the predictors were entered in three steps. In Step 1, only word frequency was entered; in Step 2, the consistencies of the onset, the nucleus, and the coda were added; and in Step 3, the consistencies of the oncleus and the body were entered. The results from this analysis are given in Table 3. For each step, the table gives the increase in explained variance owing to the added predictors. All the reported percentages of explained variance in naming latencies and in errors were calculated on the mean naming latency (error) for each item, aggregated over participants.
For naming latencies, Step 1 shows that word frequency had a small but significant effect. In Step 2, the consistencies of the small units (the onset, nucleus, and coda) were added as predictors. This increased the percentage of explained variance by 41%. In Step 3, consistencies of the oncleus and the body were added to the predictors. This increased the explained variance by a small, although significant, amount. Apparently, the small units are almost sufficient to explain the variance in naming latencies.5
Table 3
Dutch Words: Explained Variance in a Hierarchical Regression Analysis
                                                       Latency                Error
Step  Predictors                                    R²     R² Increase     R²     R² Increase
1     Word frequency                                 4.1    4.1*            4.0    4.0
2     Word frequency, onset, nucleus, coda          44.5   41.4*           40.6   36.6
3     Word frequency, onset, nucleus, coda,         46.0    1.5*           41.7    1.1
      oncleus, body
Note: For the onset, nucleus, coda, oncleus, and body, the predictor was the consistency of that particular unit. Explained variance (R²) is given as a percentage. Significance testing on errors was omitted. *p(R² increase | H0) < .01.
A word of caution is necessary. The order of entry of the predictors into the hierarchical regression analysis was based on theoretical considerations. A reversed order produced a different picture, because 29.3% of the variance in naming latencies could be explained by both small and large units. However, the small units explained 11% of the variance in naming latency that cannot be explained by the large units. By contrast, the large units explained only 1.5% of the variance that cannot be explained by the small units. This difference was significant [t(29) = 5.79, p < .01].
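The partition behind these figures can be written out explicitly. In our notation, $R^2_{F}$ is the value for word frequency alone, $R^2_{FS}$ and $R^2_{FL}$ are the values for frequency plus the small or the large units, and $R^2_{FSL}$ is the value for all six predictors. The unique parts ($\Delta$) and the shared part ($C$) are then:

$$\Delta_{\text{small}} = R^2_{FSL} - R^2_{FL}, \qquad \Delta_{\text{large}} = R^2_{FSL} - R^2_{FS}, \qquad C = \left(R^2_{FSL} - R^2_{F}\right) - \Delta_{\text{small}} - \Delta_{\text{large}}$$

With the latency values in Table 3, $\Delta_{\text{large}} = 46.0 - 44.5 = 1.5$ and $C \approx 46.0 - 4.1 - 11 - 1.5 = 29.4$. This agrees with the reported 29.3% up to the rounding of the 11% figure.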
The results of the error analysis basically confirmed the results found for latencies. The consistency of the small units (onset, nucleus, and coda) explained a large percentage of the variance in the errors, and the extra contribution of the large units (oncleus and body) was relatively small.
Simultaneous Regression Analysis
All six predictors were entered into the regression analysis simultaneously. For each predictor, we calculated how much variance it could explain above what could be explained by any of the other predictors. The results are presented in Table 4.
With respect to naming latencies, the body did not have a stronger effect than the oncleus. In fact, the opposite was true: The oncleus explained a larger percentage than the body. However, a t test on the unique contributions (sr²) of the oncleus and the body showed that the difference was not significant [t(29) = 0.034]. Furthermore, both large units made only a small unique contribution, as compared with the contributions of the onset and the coda.
In the error analysis, the results with respect to the body and the oncleus were the same as those for the latency analysis. Among the small units, it was the coda, rather than the onset, that had the strongest effect on errors.
Table 4
Dutch Words: Explained Variance in a Simultaneous Regression Analysis
Predictor           Latency    Error
Word frequency        0.2       0.2
Onset                 8.7†      0.0
Nucleus               0.4*      0.0
Coda                  2.6†     13.9
Oncleus               0.9†      0.5
Body                  0.4†      0.4
Note: For the onset, nucleus, coda, oncleus, and body, the predictor was the consistency of that particular unit. Variance explained uniquely by each predictor (sr²) is given as a percentage. Significance testing on errors was omitted. *p(sr² | H0) < .05. †p(sr² | H0) < .01.
Reanalysis of English Naming Data
The structure of the present study was similar to the study of Treiman et al. (1995). However, that study included a large number of other variables in the regression analyses (e.g., frequency of the sublexical units, bigram frequencies, and variables characterizing the onset phoneme). Owing to the relatively small number of items in our study (213), we could not conduct a regression analysis with 42 predictors. To allow a comparison between our Dutch data and the English data presented in Treiman et al., we reanalyzed two sets of English naming data (Seidenberg & Waters, 1989, cited by Treiman et al., 1995, and Treiman et al., 1995). We conducted hierarchical and simultaneous regression analyses on these data, as described above (see Tables 5 and 6). We had to omit significance testing, because this requires the nonaggregated data matrix, whereas we had only the mean item latencies and errors, aggregated over participants.
The most striking result is that all six predictors together accounted for a much smaller percentage of the variance in naming latency for English than for Dutch. This result reflects the much more complex relation between spelling and sound in English. Step 2 shows that the small units, which had a large impact in Dutch, explained a comparatively small amount of variance in English naming latencies. The onset-body pattern can be seen in the variance uniquely explained by each variable: In all four analyses, the onset and the body were the two units with the largest percentage of variance uniquely explained.
DISCUSSION
We conducted a naming experiment in Dutch to establish whether English and Dutch readers rely on the same sublexical units of written words. This was done by measuring the effects of phonological consistency for five sublexical units: the onset, nucleus, coda, oncleus, and body. We compared the effects of these variables on Dutch and English naming data.
For Dutch naming latencies, we found that the consistency of the small units (the onset, nucleus, and coda) explained a large proportion of the variance. The consistency of the large units (the oncleus and body) added very little to the proportion of explained variance. Because a large proportion of the explained variance is common to the small and the large units, we cannot conclude that Dutch readers ignore the large units. However, a substantial proportion of variance can be explained only by the small units and not by the large units, whereas almost no variance can be explained only by the large units and not by the small units. The simple assumption that Dutch readers process the onset, nucleus, and coda separately thus explains the data almost as well as the more complex assumption that readers rely on both small and large units. When the unique contribution of each unit's phonological consistency above all the other variables is considered, the onset and coda are the units whose consistencies have the largest effects on naming latencies (8.7% and 2.6% of unique variance; Table 4). The oncleus and the body both made much smaller unique contributions, and numerically the body contributed even less than the oncleus (0.4% vs. 0.9%). Therefore, the onset-body pattern reported for English latencies (Treiman et al., 1995) was not replicated in Dutch. The analysis of errors confirmed these results.
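The argument about variance common to the small and the large units can be made concrete with a set-wise variance partition (sometimes called commonality analysis). The sketch below is one way to compute it, assuming OLS, word frequency as a control that is always in the model, and synthetic data in which the large units are built from the small ones; it is not the authors' analysis code, and `partition` is a hypothetical helper.

```python
import numpy as np

def r_squared(y, X):
    """R^2 of an OLS fit of y on the columns of X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def partition(y, X, base, small, large):
    """Split the variance explained beyond `base` into parts unique to `small`,
    unique to `large`, and common to both sets of columns."""
    r_base = r_squared(y, X[:, base])
    r_small = r_squared(y, X[:, base + small])
    r_large = r_squared(y, X[:, base + large])
    r_all = r_squared(y, X[:, base + small + large])
    return {
        "base": r_base,
        "unique_small": r_all - r_large,
        "unique_large": r_all - r_small,
        "common": r_small + r_large - r_all - r_base,
    }

# Synthetic data: oncleus and body are noisy sums of their constituents,
# and the "latency" depends only on the small units.
rng = np.random.default_rng(1)
n = 213
freq, onset, nucleus, coda = rng.normal(size=(4, n))
oncleus = onset + nucleus + 0.5 * rng.normal(size=n)
body = nucleus + coda + 0.5 * rng.normal(size=n)
X = np.column_stack([freq, onset, nucleus, coda, oncleus, body])
y = 0.2 * freq + onset + nucleus + coda + rng.normal(size=n)

parts = partition(y, X, base=[0], small=[1, 2, 3], large=[4, 5])
print({k: round(100 * v, 1) for k, v in parts.items()})
```

The four parts add up to the R² of the full model. A negative "common" value can occur when predictors act as suppressors, so the parts need not all be positive.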
The reanalysis of the English data showed the onset-body pattern for both data sets (Seidenberg & Waters, 1989, cited by Treiman et al., 1995; Treiman et al., 1995). For latencies as well as for errors, onset and body are the two units that explain the largest proportion of unique variance. However, when the English and Dutch naming data are compared, the most striking difference is the small proportion of variance in English naming latencies (13% and 15%) that can be explained by the same six variables that accounted for 46% of the variance in Dutch naming latencies. Whereas, in relative terms, the body had a stronger impact in English than in Dutch, in absolute terms the effect of the body in English was only slightly larger than in Dutch. The main difference between the two languages concerns the effect of consistency of the small units: Dutch readers relied to a large extent on the onset, nucleus, and coda, whereas for English readers these units seemed to play a comparatively small role.
Table 5
English Words: Explained Variance in a Hierarchical Regression Analysis
                                               McGill Data                 Wayne State Data
                                          Latency        Error          Latency        Error
Step  Predictors                          R²   Incr.    R²   Incr.     R²   Incr.    R²   Incr.
1     Word frequency                      1.1   1.1     0.5   0.5      5.1   5.1     4.6   4.6
2     Word frequency, onset,             13.3  12.2     2.1   1.6     11.6   6.5     9.6   5.0
      nucleus, coda
3     Word frequency, onset, nucleus,    14.9   1.6     3.5   1.4     13.0   1.4    12.7   3.1
      coda, oncleus, body
Note. For the onset, nucleus, coda, oncleus, and body, the predictor was the consistency of that particular unit. Explained variance (R²) and its increase over the previous step (Incr.) are given as percentages. Significance testing was omitted.
The differences between the Dutch and the English results might, in part, be due to the nature of the inconsistent words. Because Dutch is such a consistent language, inconsistency is inevitably confounded with foreign origin. Most of the inconsistent words have irregular plural forms, and they are acquired at a later age than genuine Dutch words. However, all our inconsistent words are commonly used in Dutch and have an entry in the standard dictionary, and for most of them there is no appropriate genuine Dutch alternative.
Another possible explanation for the differences between the Dutch and English results concerns the extreme consistency and frequency values of the Dutch words. All the words were either very consistent or very inconsistent, which may have brought out consistency effects more strongly. The Dutch words also had a lower frequency than the English words, especially the phonologically inconsistent ones. In English, the inconsistent words are often highly frequent (e.g., HAVE), whereas in Dutch the inconsistent words are of medium or low frequency. In many studies, consistency effects were larger for low-frequency words than for high-frequency words (Seidenberg et al., 1984; Taraban & McClelland, 1987; but see Jared, 1997). This may, in part, explain why we found such strong consistency effects in our Dutch data. However, the differences between the Dutch and the English results clearly cannot all be attributed to the different selection of words. This is evident from an analysis of a subset of the English naming data: We selected words from the English word pool that were matched to our Dutch words in word frequency and in onset and body consistency. A perfect match was not possible, but the selected items were almost as extreme in frequency and consistency as our Dutch items. Even so, the variance explained in the same types of analysis as reported in Tables 5 and 6 increased by only 4% and 1%, respectively. We therefore conclude that the differences between the Dutch and English results cannot be explained by the specific characteristics of the Dutch items used in this study. Apparently, the different pattern of consistency effects reflects a difference in phonological processing between Dutch and English readers.
Table 6
English Words: Explained Variance in a Simultaneous Regression Analysis
                      McGill Data          Wayne State Data
Predictor          Latency   Error      Latency   Error
Word frequency       2.2      0.9         6.0      6.0
Onset                7.3      0.9         4.6      1.6
Nucleus              0.0      0.1         0.1      0.2
Coda                 0.2      0.0         0.1      0.0
Oncleus              0.6      0.0         0.6      0.1
Body                 1.0      1.3         0.8      3.0
Note. For the onset, nucleus, coda, oncleus, and body, the predictor was the consistency of that particular unit. Variance explained uniquely by each predictor (sr²) is given as a percentage. Significance testing was omitted.
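How much a high-frequency exception such as HAVE counts depends on whether consistency is computed over word types or over tokens (see Note 5). The sketch below assumes that consistency is the proportion of friends among all words sharing the orthographic unit, with the target counted as its own friend; the frequencies are invented, the pronunciations are ASCII stand-ins, and `consistency` is a hypothetical helper.

```python
def consistency(unit, pron, lexicon, token=False):
    """Proportion of friends among all words that share the orthographic unit.

    lexicon: (word, unit_spelling, unit_pronunciation, frequency) tuples.
    Friends share the unit's spelling and pronunciation; enemies share only its spelling.
    The target word itself is counted as a friend (an assumption of this sketch).
    """
    friends = enemies = 0.0
    for _word, spelling, unit_pron, freq in lexicon:
        if spelling != unit:
            continue
        weight = freq if token else 1.0   # token measure: weight each word by its frequency
        if unit_pron == pron:
            friends += weight
        else:
            enemies += weight
    total = friends + enemies
    if total == 0:
        raise ValueError(f"no words with unit {unit!r}")
    return friends / total

# Body -AVE with invented frequencies.
lexicon = [
    ("HAVE", "AVE", "aev", 100.0),
    ("GAVE", "AVE", "eiv", 20.0),
    ("SAVE", "AVE", "eiv", 10.0),
    ("CAVE", "AVE", "eiv", 2.0),
]
print(consistency("AVE", "aev", lexicon))              # type consistency of HAVE: 0.25
print(consistency("AVE", "eiv", lexicon))              # type consistency of GAVE: 0.75
print(consistency("AVE", "aev", lexicon, token=True))  # token consistency of HAVE
print(consistency("AVE", "eiv", lexicon, token=True))  # token consistency of GAVE
```

With type counting, the body of HAVE is inconsistent and that of GAVE is fairly consistent; with token weighting, the single frequent exception dominates, so HAVE's body rises to about 0.76 and GAVE's drops to 32/132, roughly 0.24.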
To conclude, Dutch readers relied strongly on the small units, whereas English readers did so only to a very limited extent. Overall, phonological consistency had a much larger effect in Dutch than in English. Relative to the overall amount of variance explained by consistency, the large units, especially the body, had a stronger effect in English than in Dutch.
This pattern of consistency effects does not follow the co-occurrence patterns in orthography or phonology, because these patterns are the same in Dutch and English: Within each level (orthographic and phonological), the body is the most informative unit in both languages. However, in Dutch the effect of body consistency is smaller than the effect of all the other units (except the nucleus). We conclude that the pattern of consistency effects cannot be explained by co-occurrences in orthography or phonology.
When we consider phonological reliability (the correspondence between orthography and phonology) in Dutch and English, the pattern of phonological reliability in each language agrees with the pattern of consistency effects. In Dutch, all the units are very reliable. The large units, oncleus and body, do not differ from each other, and they are no more reliable than their constituents, the onset, nucleus, and coda. This is reflected in the fact that the consistency of the small units suffices to explain a large proportion of variance in Dutch naming latencies. The oncleus and body explain very little additional variance, and they do not differ from each other in the size of their consistency effects. In English, the body is more reliable than its two constituents, and it is more reliable than the oncleus. This is reflected in the fact that the phonological consistencies of the onset and the body are the two predictors that explain the highest proportion of unique variance in English naming latencies. Moreover, the overall size of consistency effects in English is much smaller than that in Dutch, which agrees with the low phonological reliability of all units in English, as compared with Dutch.
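One simple way to operationalize the reliability of a unit, assumed here for illustration and not necessarily the measure used in this study, is the mean type consistency of that unit across all words in the lexicon: a unit whose spellings almost always map onto one pronunciation scores near 1. The helper name and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

def reliability(pairs):
    """Mean type consistency of one sublexical unit across a lexicon.

    pairs: one (unit_spelling, unit_pronunciation) tuple per word.
    Each word's consistency is the share of words with the same unit spelling
    that also share its pronunciation (the word itself included).
    """
    counts = defaultdict(Counter)
    for spelling, pron in pairs:
        counts[spelling][pron] += 1
    per_word = [counts[s][p] / sum(counts[s].values()) for s, p in pairs]
    return sum(per_word) / len(per_word)

# Toy bodies: -AVE is inconsistent (HAVE vs. GAVE, SAVE, CAVE), -ASH is consistent.
bodies = [("AVE", "eiv")] * 3 + [("AVE", "aev")] + [("ASH", "aesh")] * 2
print(reliability(bodies))
print(reliability([("A", "a")] * 4))  # a fully consistent unit
```

Applied separately to onsets, nuclei, codas, oncleus, and bodies, such a measure allows the units' reliabilities to be compared within and across languages.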
We conclude that the co-occurrence patterns in orthography and phonology are not responsible for the onset-body pattern. Rather, it is the reliability of the correspondence between spelling and sound that determines whether, and how strongly, consistency affects naming. For sublexical units that are generally phonologically reliable, consistency has a stronger effect than for sublexical units that are mostly unreliable. In other words, inconsistency hurts most when you do not expect it.
REFERENCES
ANDREWS, S. (1982). Phonological recoding: Is the regularity effect consistent? Memory & Cognition, 10, 565-575.
BAAYEN, R. H., PIEPENBROCK, R., & VAN RIJN, H. (1993). The CELEX lexical database [Computer software]. Philadelphia: University of Pennsylvania, Linguistic Data Consortium.
BERNSTEIN, S. E., & TREIMAN, R. (in press). The special role of rimes in the processing of printed and spoken English. In R. Smyth (Ed.), Birdtracks in the sand: A Festschrift for Bruce Derwing. New York: Benjamins.
BOWEY, J. A. (1990). Orthographic onsets and rimes as functional units of reading. Memory & Cognition, 18, 419-427.
BOWEY, J. A. (1993). Orthographic rime priming. Quarterly Journal of Experimental Psychology, 40A, 247-271.
BOWEY, J. A. (1996). Phonological recoding of nonword orthographic rime primes. Journal of Experimental Psychology: Learning, Memory, & Cognition, 22, 117-131.
BROWN, G. D. A. (1987). Resolving inconsistency: A computational model of word naming. Journal of Memory & Language, 26, 1-23.
BROWN, G. D. A., & WATSON, F. L. (1994). Spelling-to-sound effects in single-word reading. British Journal of Psychology, 85, 181-202.
COLTHEART, M. (1978). Lexical access in simple reading tasks. In G. Underwood (Ed.), Strategies of human information processing (pp. 151-216). London: Academic Press.
FITTS, P. M., & POSNER, M. I. (1967). Human performance. Belmont, CA: Brooks/Cole.
GLUSHKO, R. J. (1979). The organization and synthesis of orthographic knowledge in reading aloud. Journal of Experimental Psychology: Human Perception & Performance, 5, 674-691.
JARED, D. (1997). Spelling-sound consistency affects the naming of high-frequency words. Journal of Memory & Language, 36, 505-529.
JARED, D., McRAE, K., & SEIDENBERG, M. S. (1990). The basis of consistency effects in word naming. Journal of Memory & Language, 29, 687-715.
KAY, J., & BISHOP, D. (1987). Anatomical differences between nose, palm, and foot, or, the body in question: Further dissection of the processes of sub-lexical spelling-sound translation. In M. Coltheart (Ed.), Attention and performance XII: The psychology of reading (pp. 449-469). Hillsdale, NJ: Erlbaum.
KESSLER, B., & TREIMAN, R. (1997). Syllable structure and the distribution of phonemes in English syllables. Journal of Memory & Language, 37, 295-311.
LORCH, R. F., & MYERS, J. L. (1990). Regression analyses of repeated measures data in cognitive research. Journal of Experimental Psychology: Learning, Memory, & Cognition, 16, 149-157.
PATTERSON, K. E., & MORTON, J. (1985). From orthography to phonology: An attempt at an old interpretation. In K. E. Patterson, J. C. Marshall, & M. Coltheart (Eds.), Surface dyslexia: Neuropsychological and cognitive studies of phonological reading (pp. 335-359). Hillsdale, NJ: Erlbaum.
SEIDENBERG, M. S., WATERS, G., BARNES, M. A., & TANENHAUS, M. K. (1984). When does irregular spelling or pronunciation influence word recognition? Journal of Verbal Learning & Verbal Behavior, 23, 383-404.
SHANNON, C. E., & WEAVER, W. (1949). The mathematical theory of communication. Urbana: University of Illinois Press.
TARABAN, R., & McCLELLAND, J. L. (1987). Conspiracy effects in word pronunciation. Journal of Memory & Language, 26, 603-631.
TREIMAN, R., & CHAFETZ, J. (1987). Are there onset- and rime-like units in printed words? In M. Coltheart (Ed.), Attention and performance XII: The psychology of reading (pp. 281-298). Hillsdale, NJ: Erlbaum.
TREIMAN, R., MULLENIX, J., BIJELJAC-BABIC, R., & RICHMOND-WELTY, E. D. (1995). The special role of rimes in the description, use, and acquisition of English orthography. Journal of Experimental Psychology: General, 124, 107-136.
TREIMAN, R., & ZUKOWSKI, A. (1988). Units in reading and spelling. Journal of Memory & Language, 27, 466-477.
ZIEGLER, J. C., STONE, G. O., & JACOBS, A. M. (1997). What is the pronunciation for -ough and the spelling for /u/? A database for computing feedforward and feedback consistency in English. Behavior Research Methods, Instruments, & Computers, 29, 600-618.
NOTES
1. There is a confusing variety of names for the concatenated units. We adopted the term body from Patterson and Morton (1985). Other authors (e.g., Treiman, Mullenix, Bijeljac-Babic, & Richmond-Welty, 1995) call this sublexical unit the rime. We decided not to do so, since the word rime is focused on the phonology of a word, whereas the present paper concerns phonology, orthography, and the link between these two systems. We thank David Plaut for suggesting the term oncleus.
2. The terms friends and enemies were introduced by Taraban and McClelland (1987).
3. Actually, 225 words were presented in total. Six words were not part of the reference lexicon, because they included a plural form or had a word-form frequency lower than 1 per million. Naming data for these words were not analyzed.
4. No significance testing on errors was performed because numerical-statistical problems were encountered in extending Lorch and Myers's (1990) repeated measures regression analysis to binary data. Logistic regression analyses were calculated on the errors of each participant. For half of the participants (14 out of Ju), no finite logistic regression weights could be computed, and for the participants for whom the weights could be computed, the asymptotic standard errors of the regression weights were enormous (an average of 1,025). Rather than using these extremely unreliable estimates in further computations (e.g., testing the significance of their mean), we decided to omit statistical testing altogether.
5. We also carried out the following analyses with a token measure of consistency (i.e., using the sum of the word frequencies of friends and enemies instead of their number). The total variance that could be explained was lower. For latencies, the oncleus had a higher weight than in the type analysis. Otherwise, the results were similar.
6. We also conducted the analysis reported in Table 3 with length and the response-voice-key asynchrony included. Length did not correlate significantly with naming latencies. The asynchrony between response and voice-key registration (measured on 3 extra participants, whose responses were digitized) explained 2.7% (p < .01), 1.5% (n.s.), and 1.3% (n.s.) of unique variance in Steps 1, 2, and 3, respectively. The proportions of unique variance for the other variables were practically unchanged (they changed by at most 0.7%).
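Note 4 refers to Lorch and Myers's (1990) approach to repeated measures regression: a regression is fitted separately for each participant, and the participants' coefficients are then tested against zero. The sketch below shows that logic for latencies under simple assumptions (OLS, synthetic data, a one-sample t statistic per predictor); it is not the authors' code, and `lorch_myers` is a hypothetical name. For binary errors the per-participant step becomes a logistic regression, and when a participant's few errors are perfectly separated by the predictors, the maximum likelihood weights diverge, which is one common cause of the kind of problem described in Note 4.

```python
import numpy as np

def lorch_myers(participants):
    """participants: list of (X, y) pairs, X = items x predictors, y = latencies.

    Fits OLS per participant, then tests each mean slope against zero.
    Returns (mean slopes, t statistics, degrees of freedom).
    """
    slopes = []
    for X, y in participants:
        X1 = np.column_stack([np.ones(len(y)), X])
        beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
        slopes.append(beta[1:])                  # drop the intercept
    B = np.array(slopes)                         # participants x predictors
    n = B.shape[0]
    mean = B.mean(axis=0)
    se = B.std(axis=0, ddof=1) / np.sqrt(n)      # standard error of the mean slope
    return mean, mean / se, n - 1

# Synthetic data: the same 213 items (two predictors) for 20 participants;
# only the first predictor affects latency.
rng = np.random.default_rng(2)
X = rng.normal(size=(213, 2))
data = []
for _ in range(20):
    y = 600 + 30 * X[:, 0] + 50 * rng.normal(size=213)
    data.append((X, y))

mean, t, df = lorch_myers(data)
print(mean, t, df)
```

A two-sided p value could then be obtained from the t distribution with `df` degrees of freedom, for example with `scipy.stats.t.sf(abs(t), df) * 2` if SciPy is available.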
APPENDIX A
The Reference Lexicon
All calculations were based on a reference lexicon that contained all 2,671 wordforms (i.e., headwords like GO or inflected forms like WENT, GONE, GOES) that satisfied the following conditions: (1) The frequency of occurrence must be higher than 1 per million; (2) the wordform must be at least three letters long; (3) the wordform must be monosyllabic; and (4) the wordform must not be a plural or a genitive.
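The four inclusion conditions can be expressed as a simple filter. The `WordForm` record and its field names are hypothetical stand-ins (CELEX stores these properties in its own columns and formats), and the frequencies below are invented.

```python
from dataclasses import dataclass

@dataclass
class WordForm:
    """Hypothetical record for one wordform; not the CELEX file layout."""
    spelling: str
    freq_per_million: float
    n_syllables: int
    is_plural: bool = False
    is_genitive: bool = False

def in_reference_lexicon(w: WordForm) -> bool:
    """Apply the four inclusion conditions of the reference lexicon."""
    return (
        w.freq_per_million > 1                   # (1) more than 1 occurrence per million
        and len(w.spelling) >= 3                 # (2) at least three letters
        and w.n_syllables == 1                   # (3) monosyllabic
        and not (w.is_plural or w.is_genitive)   # (4) no plurals or genitives
    )

candidates = [
    WordForm("DOEN", 250.0, 1),
    WordForm("GA", 300.0, 1),                    # too short
    WordForm("HUIZEN", 40.0, 2, is_plural=True), # plural, two syllables
    WordForm("JAS", 0.5, 1),                     # too infrequent
]
lexicon = [w for w in candidates if in_reference_lexicon(w)]
print([w.spelling for w in lexicon])
```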
The spelling, phonological transcription, and wordform frequency of each word were imported from CELEX (Baayen et al., 1993). The wordform frequency (as opposed to the headword frequency) does not include related wordforms. For instance, the frequency of DOEN (to do) does not include the frequencies of DOE (first pers. sing.), DOET (second and third pers. sing.), DEED (did), and GEDAAN (done). These related wordforms have their own lexical entries. Frequencies of homographs are collated, but only if they are also homophones. For example, the frequency of WEER combines its occurrences in the meaning of again and in the meaning of weather, but POOL /po:l/ (pole) and POOL /pu:l/ (pool) have separate entries.
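The collation rule for homographs amounts to keying frequencies on the pair of spelling and pronunciation rather than on spelling alone. In this sketch the input format is assumed, the frequencies for WEER are invented, and the transcriptions are simplified.

```python
from collections import defaultdict

def collate(entries):
    """Sum frequencies of entries that share both spelling and pronunciation.

    entries: (spelling, pronunciation, frequency) tuples, one per meaning.
    Homographs that are also homophones are merged; homographs with
    different pronunciations keep separate totals.
    """
    totals = defaultdict(float)
    for spelling, pron, freq in entries:
        totals[(spelling, pron)] += freq
    return dict(totals)

entries = [
    ("WEER", "we:r", 60.0),   # 'again'   (invented frequency)
    ("WEER", "we:r", 15.0),   # 'weather' (invented frequency)
    ("POOL", "po:l", 3.0),    # 'pole'
    ("POOL", "pu:l", 2.0),    # 'pool'
]
print(collate(entries))
```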
Division Into Onset, Nucleus, and Coda
Letters and phonemes were assigned to the units onset, nucleus, and coda as follows: Every vowel belongs to the nucleus; every consonant in front of a vowel belongs to the onset; every consonant after the vowel(s) belongs to the coda. The following extra rules were applied: (1) The letter J is considered a vowel if it follows an I (e.g., in DIJK); otherwise, it is considered a consonant (e.g., in JAS); (2) the letter Y is considered a consonant when it stands at the beginning of a word (e.g., YEN); otherwise, it is considered a vowel (e.g., in BOY); (3) the letter U is added to the onset if it follows a Q; in all other cases, it is added to the nucleus; (4) a word-final letter E is added to the nucleus with a special sign marking the difference between the orthographic codes for the nucleus of BEEN and the nucleus of CREME; (5) the phoneme /j/ is added to the nucleus if it follows a vowel (e.g., in MAIS, which is pronounced /majs/); it is added to the onset if it appears at the beginning of the word (e.g., in /jas/).
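The orthographic part of these rules can be sketched as follows. This is an illustrative reimplementation, not the authors' code: it assumes monosyllabic spellings, treats only the first run of vowel letters as the nucleus, and interprets rule 4 as applying to a final E that follows a consonant and an earlier vowel, writing the special sign as a hypothetical `+E` suffix. Rule 5 concerns phonemic transcriptions and is not implemented.

```python
VOWEL_LETTERS = set("AEIOU")

def is_vowel(word, i):
    """Classify letter i of an upper-case word, applying extra rules (1)-(3)."""
    ch = word[i]
    prev = word[i - 1] if i > 0 else ""
    if ch == "J":
        return prev == "I"          # rule 1: J is a vowel only after I (DIJK vs. JAS)
    if ch == "Y":
        return i > 0                # rule 2: Y is a consonant only word-initially (YEN vs. BOY)
    if ch == "U" and prev == "Q":
        return False                # rule 3: U after Q belongs to the onset
    return ch in VOWEL_LETTERS

def segment(word, final_e_marker="+E"):
    """Split a monosyllabic spelling into (onset, nucleus, coda)."""
    word = word.upper()
    n = len(word)
    # Rule 4 (as interpreted here): a word-final E after a consonant, with an
    # earlier vowel, joins the nucleus with a marker, so CREME differs from BEEN.
    split_e = (
        n >= 3
        and word.endswith("E")
        and not is_vowel(word, n - 2)
        and any(is_vowel(word, i) for i in range(n - 1))
    )
    if split_e:
        word = word[:-1]
    flags = [is_vowel(word, i) for i in range(len(word))]
    if not any(flags):
        raise ValueError(f"no vowel letter in {word!r}")
    start = flags.index(True)       # everything before the first vowel is the onset
    end = start
    while end < len(word) and flags[end]:
        end += 1                    # the first run of vowel letters is the nucleus
    onset, nucleus, coda = word[:start], word[start:end], word[end:]
    if split_e:
        nucleus += final_e_marker
    return onset, nucleus, coda

for w in ["sharp", "dijk", "jas", "yen", "boy", "quiz", "been", "creme"]:
    print(w, segment(w))
```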
APPENDIX B
Table B1
Predictor Statistics for Dutch and English Words
                                 Dutch                          English
Predictor           Mean    SD   Shared Variance      Mean    SD   Shared Variance
Word frequency      0.92   0.58        0.7            1.25   0.84        2.3
Onset               0.96   0.17       25.3            0.94   0.16        6.4
Nucleus             0.89   0.28       67.2            0.61   0.30       51.2
Coda                0.97   0.16       28.2            0.92   0.19       16.5
Oncleus             0.84   0.35       62.7            0.56   0.37       46.9
Body                0.88   0.32       57.8            0.81   0.34       18.2
Note. For the onset, nucleus, coda, oncleus, and body, the predictor was the consistency of that particular unit. Shared variance gives the percentage of variance for each predictor that could also be explained by the other predictors. The numbers for the English words were calculated on all the words used by Treiman, Mullenix, Bijeljac-Babic, and Richmond-Welty (1995) and/or Seidenberg and Waters (1989), cited by Treiman et al. The deviations for the subsets actually used in both studies are less than 1%.
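Shared variance, as defined in the note to Table B1, can be computed by regressing each predictor on all the others and expressing the resulting R² as a percentage. A minimal sketch, assuming OLS with an intercept and synthetic data in which a body-like predictor is built from nucleus- and coda-like predictors; `shared_variance` is a hypothetical helper.

```python
import numpy as np

def shared_variance(X):
    """Percentage of each column's variance explained by the other columns (OLS R^2 x 100)."""
    n, k = X.shape
    shared = []
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ beta
        ss_total = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        shared.append(100.0 * (1.0 - (resid @ resid) / ss_total))
    return shared

# Synthetic predictors: frequency is independent; body is nearly nucleus + coda.
rng = np.random.default_rng(3)
nucleus, coda, frequency = rng.normal(size=(3, 213))
body = nucleus + coda + 0.1 * rng.normal(size=213)
X = np.column_stack([frequency, nucleus, coda, body])
print([round(v, 1) for v in shared_variance(X)])
```

Equivalently, `1 / (1 - shared / 100)` is the predictor's variance inflation factor, a common index of multicollinearity.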
(Manuscript received August 10, 1998; revision accepted for publication July 20, 1999.)