Journal of Experimental Psychology: Learning, Memory, and Cognition
1995, Vol. 21, No. 2, 509-514

Copyright 1995 by the American Psychological Association, Inc.
0278-7393/95/$3.00

Unsettling Questions About Semantic Ambiguity in Connectionist Models: Comment on Joordens and Besner (1994)

Michael E. J. Masson and Ron Borowsky
University of Victoria

S. Joordens and D. Besner (1994) described an attempt to simulate a semantic ambiguity advantage in lexical decision using a connectionist model (Masson, 1991) that was based on a Hopfield (1982) network. The question of the validity of the ambiguity advantage is briefly considered, and the assumptions behind the simulation results reported by Joordens and Besner are critically examined. The model used by Joordens and Besner is compared with other connectionist models, and alternative methods of simulating lexical decisions with this class of models are discussed. It is concluded that further empirical evidence is required and that a number of modeling alternatives need to be explored before strong conclusions can be made about the validity of the semantic ambiguity advantage and about the best way to model the effect.

Representing and processing ambiguous words is a challenge for distributed memory models, which are also known as parallel distributed processing or connectionist models. This class of models represents lexical knowledge in weights associated with links that connect a set of processing units to one another and instantiates a known word by evoking its unique pattern of activation across the processing units. The instantiation of a word as a pattern of activation across an entire collection of units contrasts with the classic view of lexical representation in which each word (or word meaning) is represented by a single unit or node in a network (e.g., Anderson, 1983; Collins & Loftus, 1975; Neely, 1977). Semantically ambiguous words pose an interesting problem for distributed memory models because one orthographic pattern must be mapped onto two different patterns of activation among units that represent meaning. Joordens and Besner (1994) pointed out that in distributed memory models the two alternative semantic interpretations of an ambiguous orthographic pattern may compete and thereby make processing less efficient. In a localist representation scheme, however, an ambiguous orthographic input can activate multiple meaning nodes simultaneously (e.g., Kintsch, 1988; Seidenberg, 1985), so inefficiency need not result.

Joordens and Besner (1994) also noted that inefficient processing of semantically ambiguous words, apparently inherent in distributed memory models, is at odds with empirical data that have shown an advantage for ambiguous over unambiguous words in the lexical decision task (e.g., Jastrzembski, 1981; Kellas, Ferraro, & Simpson, 1988; Millis & Button, 1989). Using the distributed memory model developed by Masson (1991), Joordens and Besner successfully simulated the ambiguity advantage, but only under certain conditions. In this comment, we briefly discuss the reliability of the empirical effect of semantic ambiguity, examine the approach taken by Joordens and Besner in their simulation of the lexical decision task, and consider the prospects of alternative simulation approaches.

Michael E. J. Masson and Ron Borowsky, Department of Psychology, University of Victoria, Victoria, British Columbia, Canada.

Preparation of this article was supported by a research grant and a postdoctoral fellowship, both from the Natural Sciences and Engineering Research Council of Canada.

Correspondence concerning this article should be addressed to Michael E. J. Masson, Department of Psychology, University of Victoria, P.O. Box 3050, Victoria, British Columbia V8W 3P5, Canada. Electronic mail may be sent via Internet to [email protected].

The Semantic Ambiguity Advantage

The semantic ambiguity advantage in lexical decision that Joordens and Besner (1994) attempted to simulate has been the subject of some controversy in the literature. There is debate over whether the processing advantage for ambiguous words is reliable or an artifact of some confounding factor. Rueckl (1995) provides good coverage of this issue in his commentary, so we make only a few remarks here. The inconsistency in the literature reviewed by Rueckl suggests that the ambiguity advantage may be attributable to a factor that, in some experiments, is confounded with ambiguity. Indeed, in some of our work with the naming task (e.g., Borowsky & Masson, 1994), we found that when ambiguous and unambiguous words were carefully matched on a variety of factors that could influence response latency, ambiguous words failed to produce an ambiguity advantage even though those same ambiguous words generated an advantage in naming latency in an earlier experiment that used less closely matched unambiguous words (Fera, Joordens, Balota, Ferraro, & Besner, 1992). It is not yet clear, however, whether the ambiguity advantage generally is a product of some confounding factor, or whether the effect is likely to appear in some tasks (e.g., lexical decision) but not others (e.g., naming). Therefore, the empirical debate continues.

Given the controversy over empirical results, it seems particularly important to build viable theoretical accounts to guide empirical explorations of the effect. Herein lies the primary value of the Joordens and Besner (1994) article. Using a Hopfield (1982) network, they discovered a possible basis for an ambiguity advantage that they refer to as a proximity effect. We have more to say about this observation in the next section but for now simply note that their efforts have revealed a potential explanation for the ambiguity advantage. In the remainder of this article, we examine the simulation results reported by Joordens and Besner and consider alternative modeling approaches.

Simulation of the Semantic Ambiguity Advantage

The model used by Joordens and Besner (1994), and originally developed by Masson (1991), consists of two processing modules, one representing the orthographic pattern of a word and another representing its conceptual meaning. Word identification is simulated by instantiating the orthographic pattern of the word in the orthographic module, then asynchronously updating (i.e., in random order with replacement) the units in the conceptual module until they settle into a stable pattern that corresponds to the meaning of the word. In the Joordens and Besner application of the model, settling of the units in the conceptual module serves as the basis for a positive lexical decision. The number of updating cycles needed to establish a stable pattern of activation is taken as simulated response latency.
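The settling procedure just described can be illustrated with a minimal sketch. It is not the code used by Masson (1991) or by Joordens and Besner (1994). It assumes a single fully connected network in which the orthographic units are clamped, Hebbian weights formed as a sum of outer products, a simple threshold activation in which a net input of zero leaves a unit unchanged, and a latency measure that counts individual unit updates. The function names, pattern sizes, and the `max_updates` cap are hypothetical choices made for illustration.

```python
import random

def hebbian_weights(patterns):
    """Sum of outer products over learned patterns, with no self-connections."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w

def threshold(net, current):
    """Threshold activation; a net input of zero leaves the unit unchanged (assumption)."""
    if net > 0:
        return 1
    if net < 0:
        return -1
    return current

def settle(w, orth, n_con, rng, max_updates=10000):
    """Clamp the orthographic units, start the conceptual units at random,
    and update randomly chosen conceptual units until none would change."""
    n_orth = len(orth)
    state = list(orth) + [rng.choice((-1, 1)) for _ in range(n_con)]
    con_units = range(n_orth, n_orth + n_con)

    def net_input(i):
        return sum(w[i][j] * state[j] for j in range(len(state)))

    def is_stable():
        return all(threshold(net_input(i), state[i]) == state[i] for i in con_units)

    updates = 0
    while updates < max_updates and not is_stable():
        i = rng.choice(con_units)  # random order, with replacement
        state[i] = threshold(net_input(i), state[i])
        updates += 1
    return state[n_orth:], updates

# One unambiguous "word": an orthographic pattern paired with one meaning.
orth = [1, -1, 1, -1, 1, -1, 1, -1]
meaning = [1, 1, -1, -1, 1, -1, -1, 1]
w = hebbian_weights([orth + meaning])
settled, n_updates = settle(w, orth, len(meaning), random.Random(0))
print(settled == meaning, n_updates)
```

With a single learned word and as many orthographic as conceptual units, the clamped orthographic input always outweighs the conceptual units, so the conceptual module settles into the learned meaning. The number of updates depends on the random starting pattern, which is the property that the proximity argument relies on.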

Joordens and Besner (1994) found that after learning ambiguous words (orthographic patterns that are mapped onto two different conceptual patterns on different learning trials) the model often failed to settle into one of the appropriate conceptual patterns of an ambiguous word. Instead the model settled into a blend, representing a mixture of the two learned conceptual patterns. On those occasions when the network did settle into one of the two known meanings of an ambiguous word, however, it did so faster (in fewer updating cycles) on average than when unambiguous words were tested. It is in this sense that Joordens and Besner were able to simulate the semantic ambiguity advantage.

Proximity and Item-Selection Effects

The reason for the processing advantage for ambiguous words stems from the fact that the random pattern of activation in which the conceptual module was placed at the start of a trial happened, on some proportion of the trials, to be closer than expected by chance to one of the two meanings of an ambiguous word (see Figures 4 and 8 of Joordens & Besner, 1994). Joordens and Besner referred to this phenomenon as the proximity effect. Proximity refers to the percentage of units in the conceptual module that, at the start of a simulated trial, are in the state that corresponds to a (or the) meaning of the orthographic pattern that has been instantiated in the orthographic module. Because conceptual units are set to a random pattern at the start of a trial, there is a 50% probability that a given unit will be in the state (1 or -1) that matches a particular meaning of the target word. By chance, then, one would expect the proximity between the starting pattern in the conceptual module and a particular meaning of a word to be 50%.

The virtue of an ambiguous word lies in having two valid conceptual patterns, as opposed to only one pattern as is the case with unambiguous words. It is more likely that the randomly selected starting state of the conceptual units will have greater than 50% proximity to one or the other meaning of an ambiguous word than to the single meaning of an unambiguous word. With greater proximity, it is generally true that fewer updating cycles will be required for units in a module to settle. By increasing the number of conceptual units in their third simulation, Joordens and Besner (1994) demonstrated the law of large numbers, inasmuch as it was less likely that a high proximity value (expressed as percentage of matching units) would be obtained. With a smaller proximity advantage, a smaller effect of ambiguity on settling time was obtained (see Figure 7 of Joordens & Besner, 1994).
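The proximity argument can be checked with a small Monte Carlo sketch. It is an illustration only, not a reproduction of the simulations of Joordens and Besner (1994). It assumes that meanings and starting states are independent random patterns of 1 and -1, and it scores an ambiguous word by the better of its two proximities. The function name, unit counts, and trial count are hypothetical.

```python
import random

def mean_best_proximity(n_units, n_meanings, n_trials, rng):
    """Average, over trials, of the highest proportion of conceptual units
    whose random starting state matches one of the word's meanings."""
    total = 0.0
    for _ in range(n_trials):
        start = [rng.choice((-1, 1)) for _ in range(n_units)]
        best = 0.0
        for _ in range(n_meanings):
            meaning = [rng.choice((-1, 1)) for _ in range(n_units)]
            matches = sum(1 for s, m in zip(start, meaning) if s == m)
            best = max(best, matches / n_units)
        total += best
    return total / n_trials

rng = random.Random(1)
for n_units in (20, 200):
    unambiguous = mean_best_proximity(n_units, 1, 2000, rng)
    ambiguous = mean_best_proximity(n_units, 2, 2000, rng)
    # The difference is the proximity advantage for two-meaning words.
    print(n_units, round(unambiguous, 3), round(ambiguous, 3),
          round(ambiguous - unambiguous, 3))
```

Under these assumptions the single-meaning proximity stays near 50%, the best-of-two proximity lies above it, and the gap narrows as the number of conceptual units grows, which is the law-of-large-numbers pattern noted above.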

Although the proximity effect is a potentially valid account of the ambiguity advantage in lexical decision, the problem Joordens and Besner (1994) encountered with conceptual units settling into blend states led to a complication in their assessment of the proximity effect. They excluded from consideration those trials on which the conceptual units settled into blend states. Because the settling outcome was related to the original starting pattern (proximity), Joordens and Besner effectively selected for those trials involving ambiguous items that had particularly high proximity values. When they selected for trials involving unambiguous words that had proximity values comparable to those of successfully settled ambiguous words, performance measured in cycles to settle was indistinguishable for ambiguous and unambiguous words. In essence, by considering trials only when they settled into a known pattern of meaning, Joordens and Besner introduced an item-selection effect. Our concern is that rather than constituting an indication of a genuine processing advantage for ambiguous words, the simulation results reported by Joordens and Besner may reflect an item-selection artifact.

Settling as a Criterion for Lexical Decision

The blend effect that occurred with ambiguous words in the Joordens and Besner simulations highlights the importance of the assumption (made by Joordens & Besner, 1994, and by Masson, 1991) that lexical decision is based on the settling of the conceptual units into a stable state. The use of this basis for lexical decision raises a number of questions about the processes involved in various word-processing tasks and raises the issue of selective versus multiple activation of the meanings of ambiguous words.

Meaning selection. By assuming that a lexical decision response is produced when all units in the conceptual module have settled into the states appropriate to the target word, and by eliminating ambiguous-word trials on which a blend state is reached, one is assuming that participants select a particular meaning of an ambiguous word before making a lexical decision. The meaning selection that is implied by this assumption can be compared with what appears to occur during reading comprehension. From studies that measure eye-fixation duration during reading comprehension, we know some important facts about time spent reading an ambiguous word. In particular, in neutral contexts, participants spend more time viewing an ambiguous word with two equally frequent meanings than either an unambiguous word or an ambiguous word with one highly dominant meaning (Duffy, Morris, & Rayner, 1988; Rayner & Frazier, 1989). These results suggest that when substantial computation of meaning is involved, ambiguous words take longer to process. The apparent reason for extended processing time is that the participant must select among meanings of a word that are nearly equal in strength. Comparison of the empirical data from lexical decision and reading comprehension studies, then, indicates that the ambiguity effect goes in opposite directions in these two paradigms. If it is assumed that the settling of conceptual units into one meaning of an ambiguous word corresponds both to the basis for a lexical decision and to the selection of word meaning during reading comprehension, a serious paradox may exist. Our suspicion is that the resolution of this paradox lies in a different basis for making lexical decisions that does not involve full settling of conceptual units.

Aside from the paradox involving lexical decision and reading comprehension, Joordens and Besner's (1994) emphasis on the settling of the conceptual units as a basis for lexical decision highlights an important problem with the model. The problem is that when presented with an ambiguous orthographic pattern, the conceptual units often settled into a pattern representing a blend of the two meanings associated with the orthographic pattern. This problem is intriguing because we know that people have no trouble thinking of a particular meaning of an ambiguous word, or even alternating between two possible meanings. The model's inability to settle consistently on a single meaning of an ambiguous word suggests that either there is something lacking in the representational scheme that has been implemented or that an additional processing mechanism is required. It does not appear likely that the problem rests entirely with the representational scheme. Even with a localist representation, there is no obvious means of selecting between two equally strong meanings of a word. Rather, we suspect that an additional mechanism would have to be implemented to allow the model to select a single meaning consistently. Such a mechanism might take advantage of the conceptual module's starting pattern, perhaps through a modified activation function (a simple threshold function was used in the simulations reported by Joordens and Besner), so that even a slight proximity advantage for one meaning could turn the tide in favor of that interpretation and avoid falling into a blend pattern. Alternatively, contextual information (e.g., schematic information activated by a sentence or by a part of text in which a word appears) could be invoked to influence the path taken by the conceptual units in reaching a stable state (e.g., Sharkey, 1990).

Semantic priming. A second difficulty with the use of settling in the conceptual module as a basis for lexical decision is that this criterion is incompatible with recent semantic priming results. In the lexical decision task a semantically related prime can reduce latency in response to a subsequent target, even if an unrelated word to which a response must be made is inserted between the prime and the target (Davelaar & Coltheart, 1975; McNamara, 1992; Meyer, Schvaneveldt, & Ruddy, 1972). The distributed memory model used by Joordens and Besner (1994), and a variant of it, can account for semantic priming if it is assumed that semantically related words have similar conceptual patterns of activation (Masson, 1991, 1995). A priming effect is obtained because once a related prime word has been processed, the conceptual units are left in a pattern that is more similar to the target word's conceptual pattern than when an unrelated prime is used. Thus, a form of proximity effect is created, whereby processing of the target is given a head start by the work done by the related prime.

If each word that requires a response involves full settling of the conceptual module, however, there is no way for the influence of a prime to survive the effects of an intervening word. The entire pattern of activation in the conceptual module created by the prime would be eradicated by the intervening word. Therefore, the requirement that the meaning module settle completely during a lexical decision trial seems untenable.

Settling in an orthographic module. An alternative to basing lexical decisions on settling in a conceptual module is to use settling with the orthographic module. Kawamoto, Farrar, and Kello (1994) presented a distributed memory model of lexical decision that takes this approach. Their model consists of an orthographic and a meaning (conceptual) module, and uses an error-correction learning algorithm (unlike the Hebbian learning rule used in the simulations reported by Joordens & Besner, 1994). During learning, connection weights are changed according to the degree of discrepancy between actual and target activation levels of units in the network. When the algorithm is applied to the learning of an ambiguous word, connections between orthographic units are more strongly affected than in the case of learning an unambiguous word. This occurs because for ambiguous words, different patterns of activation in the meaning units are related to one orthographic pattern. Therefore, connection weights between meaning and orthographic units are not consistently altered from one learning trial to the next, but instead the changes vary depending on which meaning of the ambiguous word is involved. The connection weights between orthographic units must compensate for the lack of consistency in changes to the meaning-to-orthography connection weights, thereby resulting in more potent changes to connection weights between orthographic units when ambiguous words are learned. A complementary effect occurs with the connection weights between orthographic and meaning units. That is, the weights are more strongly affected by presentation of unambiguous words because of the consistent mapping between an orthographic pattern and a single meaning pattern.

To simulate lexical decision, Kawamoto et al. (1994) assumed that the orthographic units must settle into a stable pattern of activation. These units settle more quickly for ambiguous words because the connections between them are better tuned to the patterns of activation for ambiguous words than for unambiguous words. On the other hand, a disadvantage for ambiguous words would result if instantiation of a pattern of activation in the meaning module were used to produce a lexical decision response. This result would occur because the connection weights between orthographic and meaning units are better tuned to patterns representing unambiguous words.

The method of simulating lexical decision chosen by Kawamoto et al. (1994) in their model runs counter to what Joordens and Besner (1994) envisioned, because Joordens and Besner emphasized the role of word meaning in producing the ambiguity advantage in word identification. Moreover, Fera, Ferraro, and Besner (1993) reported that the magnitude of the ambiguity advantage in lexical decision increases as a function of increasing orthographic and phonological overlap between word and nonword stimuli. Assuming that such overlap forces participants to rely more on semantic information to discriminate between words and nonwords, a semantic locus for the ambiguity processing advantage is implied. The Kawamoto et al. model would predict a reversal in the ambiguity effect if forced to monitor activation in the meaning units to make lexical decisions, rather than an increase in the effect as Fera et al. (1993) appear to have demonstrated.

Discriminating between words and nonwords. A question not addressed by Joordens and Besner (1994) in their simulation is whether the settling criterion would allow the model to distinguish between words and nonwords. They did not report any simulations involving presentation of nonword stimuli or involving the model's ability to discriminate these items from learned words. Our recent experience with a variant of the model used by Joordens and Besner (1994) indicates that when the model is presented with an orthographic pattern that is different from any of the learned patterns (i.e., a nonword), the conceptual units will perform a gradient descent into a stable state that corresponds to the meaning of some known word (Borowsky & Masson, 1994). If this were to happen consistently, the model would be unable to discriminate between words and nonwords on the basis of successful settling of the conceptual units.

One could include an additional mechanism to permit word-nonword discrimination. For example, Hinton and Shallice (1991) simulated lexical decision in a connectionist model by first allowing units in a semantic module to settle under the influence of activation from orthographic units. The settled pattern of activation in the semantic module was then compared with the semantic patterns of known words. Because units in this model take on continuous activation values and typically do not reach maximum possible activation, the match between a settled pattern and a target pattern is not exact, unlike models whose units take on only binary values. If the settled pattern was sufficiently similar to a known pattern, it was classified as a word. A similar approach, in which the pattern of activation in an orthographic module is used, has been taken by Seidenberg and McClelland (1989) and by Plaut and Shallice (1993). In these models, the similarity between the input pattern and the computed pattern of activation in the orthographic units is used as the basis for lexical decisions. Input patterns of learned words usually produce computed orthographic patterns that are more similar to the input pattern, thereby permitting discrimination between words and nonwords. In the Plaut and Shallice model, the semantic module sends activation to the orthographic units, so there is a potential basis for generating ambiguity effects in that model. As far as we know, however, none of these models have been applied to the task of comparing lexical decisions about ambiguous and unambiguous words, so it is not clear whether they would produce an ambiguity advantage.

Alternatives to Settling

Although Joordens and Besner (1994) and the developers of the other models we have just discussed elected to simulate lexical decisions by having at least one set of processing units reach a stable state, we agree with Rueckl (1995) that other promising approaches deserve to be explored as well. One method involves measuring the number of activated features in a meaning module and the other involves assessing the goodness of fit between the state of the processing units and the connection weights in the network.

Number of activated semantic features. As an alternative to a fully distributed representation, Rueckl (1995) suggested the use of a coding scheme in which each unit represents a semantic feature. The presence of a semantic feature in the meaning of a word would be coded with 1's and absence coded with 0's. This suggestion amounts to a sparse coding scheme inasmuch as any single word's meaning would include very few of the entire set of possible semantic features. Semantic features become activated as a word's orthographic pattern is encoded. Nonword orthographic patterns should activate very few semantic features because they have not been experienced in prior learning episodes, although some features are likely to be activated because of orthographic similarity between nonwords and words. Therefore, one might use the number of activated semantic features as a criterion for discriminating between words and nonwords. Ambiguous words (i.e., orthographic patterns), by virtue of having been associated with two different meanings, might activate a greater number of semantic features than would unambiguous words. If this turns out to be the case, ambiguous words should reach criterion sooner than unambiguous words, thus generating an ambiguity advantage in lexical decision latency.

Goodness of fit. The Hopfield (1982) network that forms the basis of the model used by Joordens and Besner (1994) can be characterized as a system containing basins of attraction (representing learned states) into which the network tends to move when units are updated. The basin into which the network moves depends on the initial pattern of activation (i.e., the orthographic input and the random starting pattern in the conceptual units). The scheme that is used to move the network from one state to another and eventually into a basin of attraction has a cost function associated with it, which Hopfield called an energy function. The process of moving into a basin of attraction can be quantified as finding a minimum of the energy function. This function essentially is a measure of the goodness of fit between the current states of the processing units and the connection weights that link them. For example, in the Hopfield network used by Joordens and Besner, each unit can take on one of two possible values, 1 or -1. Two units that have a strong positive connection contribute to a good fit, or low energy, if they are in the same state (i.e., both 1, or both -1). Two units with a strong negative connection make a similar contribution if they are in different states (i.e., one unit is 1 and the other is -1). The energy function is defined as

$$E = -\sum w_{ij} s_i s_j,$$

where $w_{ij}$ represents the connection weight between two units $i$ and $j$, and $s_i$ and $s_j$ represent the activation values, or states, of the two units (±1).

We suggest that the energy of the network (or components of it) can be taken as a metric of familiarity. The motivation for this suggestion is as follows. As the network descends into a basin of attraction, the value of E decreases. Once the bottom of the basin (a learned state) is reached, the energy value is at a minimum. Thus, energy can be used to track how close the network is to a known or familiar state. In the task of lexical decision, we believe that participants essentially are making speeded familiarity judgments and that the energy of the Hopfield (1982) network provides an approximation to the feeling of familiarity that is produced when viewing a word. In preliminary work with a modified version of the Hopfield network used by Masson (1991) and by Joordens and Besner (1994), we have found that measuring energy provides a reliable basis for discriminating between words and nonwords (Masson, 1994). We currently are using this approach to explore the ambiguity advantage in lexical decision (Borowsky & Masson, 1994).
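As a minimal sketch of how energy tracks distance from a learned state, the following computes the energy function for a small network. It assumes Hebbian weights formed from a single learned pattern and a sum that counts each pair of units once; both are illustrative assumptions rather than details taken from the simulations discussed here, and the function names are hypothetical.

```python
def hebbian_weights(patterns):
    """Sum of outer products over learned patterns, with no self-connections."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w

def energy(w, state):
    """E = -sum of w[i][j] * s[i] * s[j], counting each pair (i < j) once (assumption)."""
    n = len(state)
    return -sum(w[i][j] * state[i] * state[j]
                for i in range(n) for j in range(i + 1, n))

learned = [1, -1, 1, 1, -1, -1, 1, -1]
w = hebbian_weights([learned])

one_flipped = list(learned)
one_flipped[0] = -one_flipped[0]      # one unit away from the learned state
half_flipped = [-s if k < 4 else s for k, s in enumerate(learned)]  # 50% proximity

print(energy(w, learned), energy(w, one_flipped), energy(w, half_flipped))
```

For this eight-unit example the learned state has the lowest energy, a state one unit away is higher, and a state at 50% proximity is higher still. A familiarity criterion of the kind suggested above would compare such a value against a threshold instead of waiting for the units to settle.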

Assumptions About Learning

A final issue we wish to raise in the context of modeling a semantic ambiguity advantage concerns assumptions about learning. The approach taken by Joordens and Besner (1994) and by others (e.g., Kawamoto et al., 1994) assumes that unambiguous and ambiguous words follow the same learning process, whereby on a particular learning trial an orthographic pattern is associated with a single semantic pattern (presumably determined by contextual constraints). It is possible, however, that an ambiguity advantage, if genuine, depends in part on a process that occurs during learning episodes. In particular, it could be the case that when an ambiguous word is encountered, each of its known meanings is activated at least to some degree. If multiple activation occurs, each meaning of an ambiguous word might have its representation (including its connection to its orthographic pattern) strengthened, although the meaning that fits and is selected on the basis of the context would receive greater benefit. In the model used by Joordens and Besner, it was assumed that during learning, presentation of an ambiguous orthographic pattern always was associated with only one of the two possible meanings. A greater potential for producing an ambiguity advantage might be generated by assuming that each presentation of an ambiguous orthographic pattern strengthens all relevant meanings. We do not yet know whether a learning process of this sort plays a role in the ambiguity advantage, but consideration of models of the sort used by Joordens and Besner draws these possibilities to our attention.
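The contrast between the two learning assumptions can be made concrete with a toy Hebbian sketch. Everything in it is hypothetical: the four-unit patterns, the alternating sequence of contexts, and in particular the `spillover` parameter, which stands for the fraction of strengthening assumed to reach meanings that were not selected by context. Setting `spillover` to 0 corresponds to associating the orthographic pattern with a single meaning on each trial. The sketch shows only how the orthography-to-meaning weights differ under the two schemes; it does not simulate lexical decision and makes no claim about whether an ambiguity advantage would result.

```python
def hebbian_update(weights, ortho, meaning, rate):
    """Add rate * o_i * m_j to each orthography-to-meaning weight."""
    for i, o in enumerate(ortho):
        for j, m in enumerate(meaning):
            weights[i][j] += rate * o * m


def learn(ortho, meanings, trials, spillover):
    """Train on a sequence of trials.

    trials: indices of the meaning selected by context on each trial.
    spillover: hypothetical fraction of strengthening given to the
        non-selected meanings (0.0 gives single-meaning learning).
    """
    weights = [[0.0] * len(meanings[0]) for _ in ortho]
    for selected in trials:
        for k, meaning in enumerate(meanings):
            rate = 1.0 if k == selected else spillover
            if rate:
                hebbian_update(weights, ortho, meaning, rate)
    return weights


def support(weights, ortho, meaning):
    """How strongly the orthographic pattern drives a given meaning pattern."""
    return sum(weights[i][j] * o * m
               for i, o in enumerate(ortho) for j, m in enumerate(meaning))


ortho = [1, -1, 1, -1]          # one ambiguous orthographic pattern
meaning_a = [1, 1, -1, -1]
meaning_b = [1, -1, 1, -1]      # orthogonal to meaning_a
trials = [0, 1, 0, 1]           # context alternates between the two meanings

single = learn(ortho, [meaning_a, meaning_b], trials, spillover=0.0)
multiple = learn(ortho, [meaning_a, meaning_b], trials, spillover=0.25)

print(support(single, ortho, meaning_a))    # 32.0
print(support(multiple, ortho, meaning_a))  # 40.0
```

With the same number of exposures, the connection between the orthographic pattern and each meaning is stronger when non-selected meanings also receive some strengthening, while the selected meaning still receives the larger benefit on any given trial.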

Conclusion

The simulation results described by Joordens and Besner (1994) provide an interesting demonstration of how proximity effects might overcome the competition inherent in ambiguous stimuli represented in a distributed memory system. The constraints associated with their simulations of the ambiguity advantage illustrate both the shortcomings of the rather simple model with which they were working and some directions for future development. We have discussed a number of promising extensions to the Hopfield (1982) network that Joordens and Besner used and have compared that type of model to other models that have been used to simulate word identification. It is particularly interesting to note that diverse assumptions used in different models have been applied to the lexical decision task. In some cases it is assumed that a lexical decision is based on orthographic representations, and in other cases the decision depends on semantic representations. The ambiguity advantage may originate in either or both of these locations. On the other hand, additional empirical work is needed before the ambiguity advantage is unequivocally established as valid. The possibility that the effect may turn out not to be genuine serves as a clear signal of the importance of maintaining close links between formal models and empirical data. It would be ironic if models were found to readily produce an ambiguity advantage that did not truly exist.

References

Anderson, J. R. (1983). The architecture of cognition. Cambridge, MA: Harvard University Press.

Borowsky, R., & Masson, M. E. J. (1994). Semantic ambiguity effects in word identification revisited. Manuscript submitted for publication.

Collins, A. M., & Loftus, E. F. (1975). A spreading-activation theory of semantic processing. Psychological Review, 82, 407-428.

Davelaar, E., & Coltheart, M. (1975). Effects of interpolated items on the association effect in lexical decision tasks. Bulletin of the Psychonomic Society, 6, 269-272.

Duffy, S. A., Morris, R. K., & Rayner, K. (1988). Lexical ambiguity and fixation times in reading. Journal of Memory and Language, 27, 429-446.

Fera, P., Ferraro, F. R., & Besner, D. (1993, July). Resolving the ambiguous role of semantics in lexical decision: Evidence from semantic ambiguity. Paper presented at the 3rd Annual Meeting of the Canadian Society for Brain, Behavior, and Cognitive Science, Toronto, Ontario, Canada.

Fera, P., Joordens, S., Balota, D. A., Ferraro, F. R., & Besner, D. (1992, November). Ambiguity in meaning and phonology: Effects on naming. Paper presented at the 33rd Annual Meeting of the Psychonomic Society, St. Louis, MO.

Hinton, G. E., & Shallice, T. (1991). Lesioning an attractor network: Investigations of acquired dyslexia. Psychological Review, 98, 74-95.

Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, USA, 79, 2554-2558.

Jastrzembski, J. E. (1981). Multiple meanings, number of related meanings, frequency of occurrence, and the lexicon. Cognitive Psychology, 13, 278-305.

Joordens, S., & Besner, D. (1994). When banking on meaning is not (yet) money in the bank: Explorations in connectionist modeling. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 1051-1062.

Kawamoto, A. H., Farrar, W. T., & Kello, C. (1994). When two meanings are better than one: Modeling the ambiguity advantage using a recurrent distributed network. Journal of Experimental Psychology: Human Perception and Performance, 20, 1233-1247.

Kellas, G., Ferraro, F. R., & Simpson, G. B. (1988). Lexical ambiguity and the timecourse of attentional allocation in word recognition. Journal of Experimental Psychology: Human Perception and Performance, 14, 601-609.

Kintsch, W. (1988). The role of knowledge in discourse comprehension: A construction-integration model. Psychological Review, 95, 163-182.

Masson, M. E. J. (1991). A distributed memory model of context effects in word identification. In D. Besner & G. W. Humphreys (Eds.), Basic processes in reading: Visual word recognition (pp. 233-263). Hillsdale, NJ: Erlbaum.

Masson, M. E. J. (1994, February). Beyond conjecture: Contextual influences on the perception of words and objects. Paper presented at the annual meeting of the Lake Ontario Visionary Establishment, Niagara Falls, Ontario, Canada.

Masson, M. E. J. (1995). A distributed memory model of semantic priming. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 3-23.

McNamara, T. P. (1992). Theories of priming: I. Associative distance and lag. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 1173-1190.

Meyer, D. E., Schvaneveldt, R. W., & Ruddy, M. G. (1972, November). Activation of lexical memory. Paper presented at the 13th Annual Meeting of the Psychonomic Society, St. Louis, MO.

Millis, M. L., & Button, S. B. (1989). The effect of polysemy on lexical decision time: Now you see it, now you don't. Memory & Cognition, 17, 141-147.

Neely, J. H. (1977). Semantic priming and retrieval from lexical memory: Roles of inhibitionless spreading activation and limited-capacity attention. Journal of Experimental Psychology: General, 106, 226-254.

Plaut, D. C., & Shallice, T. (1993). Deep dyslexia: A case study of connectionist neuropsychology. Cognitive Neuropsychology, 10, 377-500.

Rayner, K., & Frazier, L. (1989). Selection mechanisms in reading lexically ambiguous words. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 779-790.

Rueckl, J. G. (1995). Ambiguity and connectionist networks: Still settling into a solution—Commentary on Joordens and Besner (1994). Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 501-508.

Seidenberg, M. S. (1985). The time course of phonological code activation in two writing systems. Cognition, 19, 1-30.

Seidenberg, M. S., & McClelland, J. L. (1989). A distributed, developmental model of word recognition and naming. Psychological Review, 96, 523-568.

Sharkey, N. E. (1990). A connectionist model of text comprehension. In D. A. Balota, G. B. Flores d'Arcais, & K. Rayner (Eds.), Comprehension processes in reading (pp. 487-514). Hillsdale, NJ: Erlbaum.

Received March 16, 1994
Revision received May 13, 1994
Accepted May 16, 1994