Running head: EFFECTS OF SCRIPT ON WORD ASSOCIATION PROCESSES 1
Effects of Script Type on Word Association Processes in Mandarin Chinese
Master Thesis
By
Fei Teng (u864402)
Supervisor: dr. E.A. Keuleers
Second reader: dr. P.A. Vogt
Cognitive Science and Artificial Intelligence
Tilburg University
August, 2018
Abstract
This study investigated the effect of different Chinese scripts on word association
processes. In a controlled experiment, associations to images of objects accompanied either
by a description in Chinese simplified characters or by one in the alphabetic pinyin script
were contrasted with associations to the same objects in a control condition showing only
images. We recruited 92 native Chinese speakers and asked them to provide three
associations, written in Chinese simplified characters, to each stimulus. We then calculated
both the phonetic and the semantic similarity between each stimulus and its associations.
Phonetic similarity was calculated as the Levenshtein distance between phonetic
transcriptions, while semantic similarity was calculated using word embeddings specifically
generated from a Chinese language corpus. The results showed that word
associations produced in the character condition had lower semantic similarity to the
stimuli than associations produced in the control condition, which in turn did not differ
from the pinyin condition. Finally, the results did not show significant effects on phonetic similarity.
Keywords: writing systems, word association, distributional semantics, Mandarin
Chinese, logographic script, pinyin
Acknowledgments
Six years ago, I came to the Netherlands to do my studies. Six years later, I am on
my way to finishing my master’s thesis. This thesis is the result of my graduation project
for the Master program Cognitive Science and Artificial Intelligence (CSAI) at Tilburg
University. It marks the last milestone of my student period.
This thesis would not have been possible without the help of several people. First
and foremost, I would like to express my gratitude to my supervisor Dr. Emmanuel
Keuleers for a great amount of effort and time taken in providing me with his thoughtful
and knowledgeable feedback. We spent a lot of time discussing the project and writing the
script together. You have opened the door to psycholinguistics: a subject that I had never
thought about before but that turned out to fascinate me in the end. Thank you for
providing me with valuable information and for being there for me throughout the whole project.
Additionally, I am very grateful to all the participants who took the time and effort to help
me without hesitation. Furthermore, I would like to thank my friends for taking my mind
off the project whenever necessary. Thanks to all the lovely people that I met here in
Eindhoven, Tilburg and during my trips.
Finally, I would like to express my deepest gratitude to my parents. This study would
not even have been started without your unfailing support. Thank you for encouraging me
throughout the years I have spent on my studies and throughout the process of researching
and writing this thesis.
Fei Teng
Tilburg, August 2018
Contents
Abstract
Introduction
Theoretical Background
The development of Chinese writing until the 1950s
From traditional to simplified Chinese characters
The six categories of simplified Chinese characters
Romanizations of Chinese
Cognitive processing of written Mandarin Chinese
Word association and measuring semantic similarity
The current study
Method
Design
Participants
Procedure
Materials
Measurements
Semantic similarity
Phonetic similarity
Results
Descriptive analysis
Mixed-Effects Analysis
Discussion and Conclusion
Principal findings
Limitations and Recommendations
Conclusion
References
Appendices
Introduction
To what extent can the way we write our language influence the way we think? The
current study will try to answer this question by studying word associations to stimuli
presented in two different writing systems, or scripts, for Mandarin Chinese. Mandarin
Chinese provides a unique test case for the effect of writing systems because it can be
written using either Chinese simplified characters or pinyin, an alphabetic writing system
which, just like the writing system English speakers are used to, relies on a mapping
between letters and sounds. By contrast, most Chinese simplified characters are
composed of phonetic and semantic radicals. These radicals can have orthographic,
phonological, and semantic properties (Tong, Tong, & McBride, 2017). A Chinese
character consists of three tiers (Shen & Ke, 2007): a character contains one or more
radicals and each radical is composed of strokes. Although the phonetic radical of a
compound character provides a cue to the sound of the word represented by the character,
the connection between phonetic radical and sound is not predictable (Cao et al., 2013;
Guan, Liu, Chan, Ye, & Perfetti, 2011).
The relationship between script and thought can be seen as a special case of the more
general relation between language and thought. In the 1940s, one of the linguists who
started to pay special attention to the relationship between language and thought was
Benjamin Lee Whorf. Whorf studied Hopi, a Native American language that is spoken in
northeastern Arizona. He noticed that some linguistic structures found in Hopi
were very different from those in what he called SAE (Standard Average European)
languages. One of the examples Whorf gave is that in SAE languages, time and objects are
counted in the same way. In SAE, one can therefore as easily say that 6 days are “more”
than 5 days as one can say that 6 boxes are “more” than 5 boxes. However, in Hopi, there is
no such “objectification of time” (Penhallurick, 2010, p. 125), and one can only say that the
6th day is “later” than the 5th (Whorf, 2012), but not “more”. Whorf argued that the Hopi
language makes the Hopi experience the world differently (Hussein, 2012). According to
the Whorfian hypothesis (Whorf, 1956), language frames the way in which the users of that
language talk about their universe, and the same principle holds true for all languages and
people.
One of the major arguments against the Whorfian hypothesis is that it cannot be
proven whether language shapes thought or whether it is the other way around. Pinker
(1995) claims that language is a reflection of thought, but not vice versa. Pinker uses the
example of the word “spring” and points out that when people think about “spring”, they
are not confused about whether they mean a season or something that goes “boing”. In
addition, Pinker (1995, p. 136) argues that “if one word can correspond to two thoughts,
then thoughts cannot be words”. Moreover, some argue that if the Whorfian hypothesis were
correct, it would be impossible to become bilingual or to become a translator, because
it would be impossible for us to view the world in different ways at the same time
(Al-Samarrai, 2007).
Research into the relationship between language and thought has mostly looked at
whether the way language users experience the world is influenced by the way their spoken
language is structured. In contrast, there has been little investigation into the connection
between the way a language is written and how its users experience the world. In a prior
study, S. Chen (2017) showed Chinese native speakers stimuli consisting of an image and a
corresponding word written either in Chinese simplified characters or in pinyin, an
alphabetical script. She then asked participants to come up with three associations to
these stimuli, either in Chinese simplified characters or in pinyin. The results showed that
reading Chinese characters led to associations that differed semantically and phonetically
from those produced when reading pinyin versions of the same words, but that the script the participants were asked
to respond in did not affect the results. While Chen’s study strongly suggested that the
way words are written can affect our associations to those words, one shortcoming of the
study was that it did not include a baseline condition where only images were shown.
Hence, the study was unable to identify to what extent associations in each of the scripts
differed from a neutral presentation of images without writing. In other words, due to the
study’s design, it was unable to identify the extent to which each of the scripts influenced
association processes.
The present study builds on Chen’s study and improves its design. Its main research
question is “To what extent do associations to images accompanied by words written in
simplified characters and pinyin differ from associations to the same images without
script?”. We will try to answer this research question by presenting two groups of
participants with stimuli in either Chinese simplified characters or pinyin, and comparing
their associations to those of a group of participants who see only images. As in Chen’s
study, we will use numerical word embeddings derived from semantic vector spaces to
calculate the semantic similarity between stimuli and associations. Phonetic similarity will
be evaluated using a traditional string distance measure.
The remainder of this thesis is structured as follows. In the second chapter, we will
discuss the theoretical background of this study. The third chapter will cover the
methodological framework. In the fourth chapter, the results of this study will be
presented. Finally, the fifth chapter will summarize and discuss the main findings, as well as
the strengths and limitations of this study.
Theoretical Background
The development of Chinese writing until the 1950s
The Chinese language dates back at least 6000 years. Written Chinese character
inscriptions have been found on turtle shells from the Shang dynasty (1766 BC - 1123 BC),
which means that the written language has existed for more than 3000 years
(Roberts, 1999). Over time, the writing system has been influenced by political changes,
but the basic principles of the language and its written characters have remained the same.
The development of Chinese script started with the oracle bone script (used from 1500 to
1000 BC), which is characterized by sharp and square lines. Next, characters were inscribed
on bronze ware (used from 1100 to 770 BC) in a script that looks similar to the oracle bone
script. Subsequently, the script became widely standardized after Qin Shihuang unified China.
During this period, the seal script (used from 770 to 207 BC), which has smooth, curved
lines, was commonly used. This was followed by the clerical script (used from 206 BC to AD
220) during the Han Dynasty, which was developed to facilitate faster writing. Finally, the
regular script (or standard script, used from ca. AD 200) was one of the last major styles
to develop. Today, these characters are referred to as “traditional Chinese characters”.
From traditional to simplified Chinese characters
In mainland China, the traditional Chinese character script was seen as too
complicated and archaic. In the 1950s, in order to improve literacy, the government of
mainland China issued official documents containing simplified characters and began
promoting them for use in daily life. According to Mills (1956), simplified Chinese script
differs in several ways from the traditional characters: it uses fewer strokes, it has fewer
characters in common use, and it merges some characters with similar sound and meaning.
With the passage of time, simplified Chinese characters became the formal and
standard way of writing in mainland China, while traditional Chinese characters are still in
use in Taiwan and Hong Kong. Although using simplified Chinese characters has helped to
improve literacy (Xie, 2009), there is still discussion about reintroducing traditional
Chinese characters in mainland China. On 9 April 2009, the People’s Daily argued that the
state should reintroduce traditional Chinese characters in mainland China because the use of
simplified characters hinders communication between different Chinese-speaking regions.
For example, people from mainland China have problems reading documents with
traditional characters, while people from Taiwan have problems reading documents
with simplified characters.
The six categories of simplified Chinese characters
Chinese characters, including simplified ones, are traditionally divided into six categories (in
Mandarin: liu shu, “six scripts”). This classification is known from Xu Shen’s
dictionary (in Mandarin: shuowen jiezi). The three main categories are pictographs (in
Mandarin: xiangxing), ideographs (in Mandarin: zhishi), and phono-semantic compounds
(in Mandarin: xingsheng). The other three categories are compound ideographs (in
Mandarin: huiyi), transfer characters (in Mandarin: zhuanzhu), and rebus (phonetic loan)
characters (in Mandarin: jiajie).
The most common of the six character categories is the phono-semantic compound
category, in which the character combines a semantic element or radical with a phonetic
element that indicates the pronunciation of the whole character, sometimes exactly and
often only approximately. For example, the character ‘chopstick’ (in Mandarin: kuai, 筷) is
composed of the radical ‘bamboo’ (in Mandarin: zhu, 竹) and the phonetic element ‘kuai’ (快).
The combination forms the character ‘kuai’ and refers to an object made of bamboo.
According to the Institute of Language Teaching and Research (Dictionary, 1986), over 70%
of Chinese words are compounds. In addition, there are roughly 600 Chinese pictographs,
characters that directly represent objects (Z. Zhou, 2014). These pictographs, for instance
‘person’ (in Mandarin: ren, 人), are generally the oldest characters in Chinese. Ideographs
were developed after pictographs and are often intended to symbolize abstract concepts. For
example, ‘one’ is indicated by one horizontal line (in Mandarin: yi, 一) and ‘two’ by two
horizontal lines (in Mandarin: er, 二). Compound ideographs are combinations of two or more
pictographic or ideographic parts. For example, the compound ideographs for grove (in
Mandarin: lin, 林) and forest (in Mandarin: sen, 森) are formed from two and three instances,
respectively, of wood (in Mandarin: mu, 木). The last two types of characters are used less
frequently. Transfer characters are interchangeable with other characters that have the same
radical and a similar etymology. For example, the Chinese word for father can be written
either “父” (in Mandarin: fu) or “爸” (in Mandarin: ba). Finally, rebus characters are
characters borrowed to write another homophonous or near-homophonous morpheme (i.e., one
with a different meaning but the same or a similar pronunciation), comparable with “4” as a
rebus for English “for” (Gu, 2011). For example, the character “永” (in Mandarin: yong),
associated with the meaning ‘swim’ (now written “泳”), was borrowed as a rebus for the
homophonous word ‘forever’.
Romanizations of Chinese
Romanization refers to the representation of Chinese pronunciation using the Roman
(Latin) script. Earlier romanizations of Mandarin Chinese include the Wade-Giles and the
Yale systems. The Wade-Giles (Wade) system, which was developed by Thomas Wade
during the mid-19th century, is still in use in Hong Kong and Taiwan and was also popular
in Western countries. Because Wade-Giles was created to render the sounds of all
the different Chinese dialects (not just Mandarin Chinese), it requires readers to memorize
special pronunciations for certain characters. The later Yale system (named after
Yale University) uses English spelling conventions to represent Chinese sounds and thus
requires no special training. It was widely used in American textbooks until the late 1970s
(Benjamin, 1997).
The pinyin system was developed in the 1950s by Chinese linguists and officially
adopted in mainland China in 1979. Like the Yale system, pinyin is a romanization
of the Beijing dialect of Mandarin in which most letters are pronounced more or less
as an English speaker would expect. “Pin Yin” in Mandarin Chinese can be literally
translated as “spell sound”. In mainland China, pinyin is used as a tool to teach the
Mandarin pronunciation of Chinese characters from the early stage of primary school.
Chinese language books or magazines for children are often annotated with pinyin above
the simplified Chinese characters.
While pinyin was originally a teaching aid, it is now also used for other purposes. For
example, pinyin can be found on traffic signs and billboards. With the rapid
developments in science and technology, pinyin has also become very useful, because it is
the most common way to type Chinese characters on a keyboard.
Although mobile phones with a touchscreen also allow users to draw characters, this is
more time-consuming, and the device’s ability to recognize the right character depends on
the handwriting skills of the user. Still, handwriting input is preferred by many people born
before the 1980s. In addition, a large number of senior citizens never learned either pinyin
or standard Mandarin Chinese.
In today’s society, pinyin plays an important and unique role in mainland China, but
it is still complementary to Chinese characters. From time to time, there is discussion
about replacing Chinese characters altogether with pinyin. The American Sinologist Victor
Henry Mair advocates for writing Chinese in an alphabetic script (i.e. pinyin) because he
sees advantages for Chinese education, computerization, and lexicography (Mair, 1986).
Cognitive processing of written Mandarin Chinese
Reading and writing involve the interpretation and expression of language using
specific symbols, which must be learned. This learning involves complex cognitive
processes, such as visual, orthographic, phonological and semantic processing (Tan, Spinks,
Eden, Perfetti, & Siok, 2005). One way of distinguishing between the cognitive processing
of an alphabetic writing system, like pinyin, and a logographic writing system, like
simplified Chinese characters, is to take the perspective of the “dual-route” model
(Coltheart, 1981; Marshall & Newcombe, 1973). In this perspective, there are two routes
that words can take when they are read: the “visual” route, which maps words from their
written form (orthography) directly to their meaning (semantics), and the “phonological”
route, which first translates the written form to sound (phonology), which is then mapped
to meaning in the same way as spoken words. When languages use an alphabetic writing
system, both the phonological route and the direct visual route are in principle
possible, and, because written symbols map closely onto sounds, readers can derive a
phonological form from the visual input without knowing the meaning (Harm & Seidenberg,
2004). For a logographic writing system, such as Chinese simplified characters or Japanese
Kanji, there is no such systematic relation between orthography and phonology, and the
only route available is the visual one, mapping form directly to meaning (Hino, Kusunose,
Lupker, & Jared, 2013; X. Zhou & Marslen-Wilson, 2000).
From the perspective of the dual route model, one expects to find these differences
reflected both in the acquisition and in the behavioral and neural processing of simplified
characters and pinyin. The evidence discussed below points to different contributions of the
two scripts to acquisition and to consistent differences in the processing of Chinese
simplified characters and pinyin.
M. J. Chen and Yuen (1991) compared visual processing in children from mainland
China, Taiwan and Hong Kong. A crucial difference between those groups is that children
in mainland China first learn pinyin and then simplified Chinese characters, while in Hong
Kong learning follows the traditional approach without an alphabetic foundation, and
Taiwanese children learn traditional Chinese characters in combination with pinyin. In the
study by M. J. Chen and Yuen (1991), children performed different tasks using Chinese
characters (simplified in mainland China, traditional in Taiwan and Hong Kong).
Crucially, in a pseudohomophone naming task, children from mainland China and Taiwan
performed better than Hong Kong children, suggesting that pinyin training helps people
pronounce unfamiliar words because it contributes to the extraction of phonological
information for the character.
Other evidence shows that learning Chinese using only pinyin has a negative
impact on students’ later acquisition of Chinese. Mushangwe and Chisoni (2015)
used a character recognition survey to study Zimbabwean students learning Mandarin
Chinese. Their results showed that using only pinyin in teaching negatively affects
vocabulary acquisition, which they attributed to the fact that each Chinese character carries
a unique meaning. They suggested that the use of pinyin in teaching should always be accompanied
by Chinese characters and that pinyin should only be used as a supplementary system to
Chinese characters.
Cao et al. (2013) found that, in comparison to writing pinyin, writing Chinese
characters activated bilateral superior parietal lobules and bilateral lingual gyri in both a
lexical decision task and an implicit writing task. They suggested that writing characters
provided a better representation of the visual-spatial structure of the character and its
orthography while writing pinyin provided a better connection with phonology by
activating the right inferior frontal gyrus.
Y. Chen, Fu, Iversen, Smith, and Matthews (2002) studied reading in Chinese using
fMRI. In this study, participants were shown pairs of either real Chinese characters or
pinyin syllables in meaningless combinations. In a phonological lexical decision task, they
were asked to decide whether each visually presented pair “sounded like” a real word. To
make the right decision, participants had to translate the script into sound. The results of this
study showed that reading Chinese characters and pinyin activates a common brain
network including the inferior frontal, middle, and inferior temporal gyri, the inferior and
superior parietal lobules and the extrastriate areas. They also found that reading pinyin
led to a greater activation in the inferior parietal cortex bilaterally, the precuneus, and the
anterior middle temporal gyrus, while reading characters led to a greater activation in the
left fusiform gyrus, the bilateral cuneus, the posterior middle temporal, the right inferior
frontal gyrus, and the bilateral superior frontal gyrus.
In another study that investigated the extent to which different Chinese scripts
influence brain activation, Fu, Chen, Smith, Iversen, and Matthews (2002) asked participants
to silently read either pinyin or simplified characters. The results of this fMRI
study showed that some brain regions were associated with processing of pinyin and
simplified characters, independently of surface form, while other regions were specifically
associated with the processing of one script or the other.
Cao et al. (2013) investigated whether learning to write Chinese in different types of
script influences the brain’s reading network. In this study, English speaking students in an
introductory Chinese class were taught Chinese words by seeing their corresponding
simplified characters, pinyin and English translation. After each instruction, they were
asked to write down the simplified character (character-writing condition) or the pinyin
(pinyin-writing condition). After the instruction phase, fMRI data were collected in a passive
viewing task, a lexical decision task, and an “implicit writing” task, all involving
simplified characters. Based on the involvement of different brain networks in the different
conditions, Cao et al. (2013) suggested that writing characters during
learning leads to better representation of the visual structure of the character and its
orthography and to more interaction with sensorimotor information during character
recognition. In addition, they suggested that writing characters during learning leads to
higher activation of brain networks involving semantics during recognition, while writing
pinyin improves connections to brain networks involving phonology.
Word association and measuring semantic similarity
The differences in acquisition and processing between simplified Chinese characters
and pinyin offer a possible basis for more pronounced differences in how words in these two
writing systems are experienced. Exploring this in more detail, however, requires a
task that can explicitly reveal thought processes.
To explore how words are related to other words in the mind of language users, there
are several research approaches, such as collecting slips of the tongue, speech error analysis,
and word association (WA) tests. Word association tests, which were initially used as a
psychological tool to study the subconscious mind (Hui, 2011), are among the oldest
and most common research tools for revealing thought processes. Word association tasks play an
important role in psycholinguistic research, especially in the field of lexical retrieval
(Church & Hanks, 1990).
According to Sturrock (2008), word association tasks can be divided into phonetic
and semantic association tasks. In a phonetic association task, participants may for
instance be instructed to come up with words that sound similar to a given stimulus. For
example, the words hill and kill are clearly phonetically similar because they differ
only in their first letter. In a semantic association test, participants are instructed
to come up with words that they think of after exposure to a target word. In this task, the
relation between the target word and its associations is not strictly defined. In traditional
linguistic terms, it can take many different forms, such as relatedness, synonymy,
antonymy, hypernymy, etc. (Sigurd, 2009). The only commonality between all these cases is
that the words are somehow related in meaning.
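The phonetic side of such comparisons can be made concrete with the Levenshtein distance, which this thesis uses for phonetic similarity between phonetic transcriptions: the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The following is a minimal sketch, not the script used in this study; applying it to toneless pinyin strings and normalizing by the length of the longer string (the hypothetical `phonetic_similarity` helper) are our assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # prev is the previous row of the dynamic-programming table:
    # prev[j] is the distance between the prefix of a processed so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def phonetic_similarity(a: str, b: str) -> float:
    """Hypothetical normalization: 1 - distance / length of the longer
    string, so identical strings score 1.0 and fully different ones 0.0."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))


print(levenshtein("hill", "kill"))          # 1
print(phonetic_similarity("hill", "kill"))  # 0.75
print(levenshtein("shu", "shui"))           # 1 (toneless pinyin: 'book' vs 'water')
```

Because the Levenshtein distance never exceeds the length of the longer string, the normalized score always lies between 0 and 1, which makes scores comparable across words of different lengths.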
One way to measure semantic similarity, without explicitly taking into account the
formal relationship between words, is rooted in the distributional hypothesis, which
states that words that occur in similar contexts tend to be semantically similar (Harris,
1954). A well-known version of the hypothesis was given by Firth (1957), who wrote: “You
shall know a word by the company it keeps”. Another version is given by Wittgenstein
(1953), who claims that the meaning of a word is defined by the way it is used. More
formally, Lenci (2008) states that the degree of semantic similarity between two linguistic
terms A and B is a function of the similarity of the linguistic context in which A and B can
appear.
Distributional semantics models (DSMs) (Landauer & Dumais, 1997; Schütze, 1998)
are implementations of this distributional hypothesis. The aim of these models is to derive
numerical vector representations for words based on the contexts in which these words occur
in text corpora. With these vector representations, words can be treated as points in a
multidimensional space, and the similarity between a pair of words can then be equated to
the similarity between the vectors representing the words (Mandera, Keuleers, &
Brysbaert, 2017). For example, both car and vehicle will often appear with the same words,
such as wheels, gasoline, and engine, so the numerical vectors representing both words
will be very similar.
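The car/vehicle example can be reproduced with a toy count-based model. This sketch is illustrative only and rests on hypothetical choices: a made-up six-sentence corpus, a symmetric window of two words, raw co-occurrence counts, and cosine similarity as the comparison between vectors. Real DSMs are trained on large corpora and typically reweight the counts or learn dense embeddings instead.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus; real models are trained on millions of words.
corpus = [
    "the car has four wheels and an engine",
    "the vehicle has wheels and runs on gasoline",
    "the car runs on gasoline",
    "the vehicle has an engine",
    "she ate a ripe banana",
    "a ripe banana is sweet",
]


def context_vectors(sentences, window=2):
    """For every word, count how often each other word occurs
    within `window` positions of it (a sparse context vector)."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u: Counter, v: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


vecs = context_vectors(corpus)
print(round(cosine(vecs["car"], vecs["vehicle"]), 3))  # 0.671
print(round(cosine(vecs["car"], vecs["banana"]), 3))   # 0.0
```

Here car and vehicle share the contexts the and has, whereas banana shares none of car's contexts, so only the first pair receives a non-zero similarity.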
When these vector representations are derived using neural network models, they are
usually called word embeddings (Vulić & Mrkšić, 2017; Levy & Goldberg, 2014;
Bojanowski, Grave, Joulin, & Mikolov, 2016; Pennington, Socher, & Manning, 2014;
Mikolov, Sutskever, Chen, Corrado, & Dean, 2013). In the current study, we will use word
embeddings to calculate the semantic similarity between stimuli and responses.
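As a minimal sketch of how such a stimulus-response comparison might be computed, assume that every word has already been mapped to an embedding vector. The three-dimensional vectors below are made-up placeholders (real embeddings typically have a few hundred dimensions), and the hypothetical `mean_similarity` helper, which averages over the associations given to a stimulus and skips words missing from the embedding vocabulary, is an illustrative choice rather than the procedure reported in this thesis.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_similarity(stimulus: str, associations: List[str],
                    embeddings: Dict[str, Vector]) -> Optional[float]:
    """Average cosine similarity between a stimulus word and the
    associations given to it. Words missing from the embedding vocabulary
    are skipped; None is returned when nothing can be compared."""
    if stimulus not in embeddings:
        return None
    scores = [cosine(embeddings[stimulus], embeddings[word])
              for word in associations if word in embeddings]
    return sum(scores) / len(scores) if scores else None


# Made-up 3-dimensional vectors, for illustration only.
toy_embeddings = {
    "筷子": [0.9, 0.1, 0.0],  # chopsticks (stimulus)
    "碗": [0.8, 0.2, 0.1],    # bowl
    "饭": [0.7, 0.3, 0.0],    # rice
    "飞机": [0.0, 0.1, 0.9],  # airplane
}

print(mean_similarity("筷子", ["碗", "饭", "飞机"], toy_embeddings))
print(mean_similarity("筷子", ["不存在"], toy_embeddings))  # None: word not in the toy vocabulary
```

In this toy setup, an unrelated association such as 飞机 ('airplane') pulls the average down, which is the kind of difference in semantic similarity the study compares across conditions.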
The current study
As we discussed in the introduction and illustrated in Figure 1, the study by S. Chen
(2017) asked Chinese native speakers to provide word associations to images accompanied
by either Chinese simplified characters or pinyin. The reason Chen presented images
together with the written stimuli was to remove ambiguity. Mandarin Chinese is highly
homophonous and therefore a pinyin transcription can correspond to many different words.
Presenting the image made it clear which word the written stimulus referred to. In the
study, participants wrote their word associations either in Chinese simplified
characters or in pinyin. The study then compared the semantic and phonetic similarity
between stimulus and associations in the two reading conditions and in the two writing
conditions. The results showed that when participants read pinyin script, the semantic
similarity between stimulus and associations was significantly lower than when participants
read simplified characters, and that the corresponding phonetic similarities were
significantly higher. However, no differences in either phonetic or semantic similarity could
be attributed to the script that participants used to write the associations.
Figure 1. An overview of the comparison between the current study and the S. Chen
(2017) study
An important gap in the study by S. Chen (2017) is that the results only allow us to
say that there is a difference between the script conditions, but not which of the script
conditions may influence thought compared to a neutral situation. A control condition, that
is, a third condition without script, is crucial for testing the extent to which script
influences word association processes. Therefore, the current study adapts Chen’s study by
adding a control condition in which participants only see images. Unlike the study by
S. Chen (2017), the current study will ask participants to write their responses only in
Chinese simplified characters. This decision is motivated by the absence of an effect of
response script in Chen’s experiment.
To summarize, the main research question of the current study is: “To what extent
do associations to images accompanied by words written in simplified characters and pinyin
differ from associations to the same images without script?” To answer this question,
participants will be asked to write down their associations in Chinese simplified characters
to stimuli presented in three different conditions: images accompanied by a description in
Chinese simplified characters (condition 1); images accompanied by a description in pinyin
(condition 2); or images unaccompanied by a description (condition 3).
In order to determine whether the different conditions have an effect on the generated
word associations, we will calculate the semantic similarity and phonetic similarity between
the stimulus and its associations. Importantly, both the character condition and the pinyin
condition will be compared to the control condition. There will be no direct comparison of
the pinyin condition to the simplified character condition, because the explicit goal of this
thesis is to test the influence of each script against a baseline condition in which no
written language is presented.
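In a regression, comparing each script condition with the control while leaving the character and pinyin conditions uncompared corresponds to treatment (dummy) coding with the control condition as the reference level. The Results report such treatment contrasts. A minimal sketch, with hypothetical condition labels and column names:

```python
# Treatment (dummy) coding with the image-only control as reference level.
# The labels "character", "pinyin", "control" and the "is_*" column names
# are hypothetical, chosen only for illustration.
CONDITIONS = ("character", "pinyin", "control")
REFERENCE = "control"

def treatment_code(condition: str) -> dict[str, int]:
    """Return the two dummy variables for one observation.

    Each non-reference level gets its own 0/1 column; the control level
    is coded 0 on both, so each coefficient estimates that script's
    difference from the control condition.
    """
    if condition not in CONDITIONS:
        raise ValueError(f"unknown condition: {condition!r}")
    return {
        f"is_{level}": int(condition == level)
        for level in CONDITIONS
        if level != REFERENCE
    }

for c in CONDITIONS:
    print(c, treatment_code(c))
```

The control condition is coded 0 on both dummies. In a model containing only this factor, the intercept therefore estimates the control mean, and each dummy's coefficient estimates the difference between one script condition and the control.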
Based on the literature showing behavioral and neuropsychological differences in
processing pinyin and Chinese characters (e.g., S. Chen, 2017; Cao et al., 2013), we have
the following hypotheses (note that since the subcomponents of both hypotheses will be
tested using orthogonal contrasts, the subcomponents are not separated into different
hypotheses):
1. Word associations produced in the Chinese simplified character condition will be
semantically more similar to the stimuli than word associations produced in the
control condition. Word associations produced in the pinyin condition will be
semantically less similar to the stimuli than word associations produced in the
control condition.
2. Word associations produced in the pinyin condition will be phonetically more similar
to the stimuli than word associations produced in the control condition. Word
associations produced in the Chinese simplified character condition will be
phonetically less similar to the stimuli than word associations produced in the control
condition.
An additional difference between the study by S. Chen (2017) and the current one is that
to compute the semantic similarity between stimuli and associations, S. Chen (2017) used
off-the-shelf word embeddings, based on a Wikipedia corpus (Al-Rfou, Perozzi, & Skiena,
2013). For the current study, we will generate new word embeddings based on a corpus of
Chinese movie subtitles (Lison & Tiedemann, 2016), which are more closely related to
typical human dialogue (Brysbaert & New, 2009).
Method
Design
The experiment used a between-subjects design with word script as a factor with
three levels (simplified characters, pinyin, and control), and the semantic and phonetic
similarity between the stimulus and the responses provided by participants as dependent
variables.
Participants
Ninety-two Chinese native speakers (46 female, 46 male) took part in the study.
Their age ranged from 16 to 80 years (M = 33.63, SD = 11.22). Participants were
randomly assigned to one of the three conditions. At the end of the experiment, there were
30 participants in the character condition, 30 participants in the pinyin condition and 32
participants in the control (images-only) condition. Since the study required native
Chinese speakers, participants were recruited in Chongqing, China, with the help of the
author’s family members.
Procedure
The experiment was conducted on paper (A4 paper size). The bundle provided to
each participant contained 7 pages. The first two pages consisted of general information
about the experiment and a consent form in English and in Chinese (see Appendices A, B,
and C). The participants were instructed to read the consent form carefully and to sign it.
We also asked participants to fill in their age and gender. They were then told that they
would be presented with a list of stimuli, and that for each stimulus they would need to
write three associated words in Chinese simplified characters. In the character and pinyin
conditions, a picture with the intended meaning of each stimulus was displayed together
with the written description; in the control condition, only these images were displayed.
Participants were asked to finish the experiment within 30 minutes. After finishing,
participants handed in the answer sheet. Each participant performed the experiment
independently and was therefore unaware of the other conditions. Examples of the
instructions, the informed consent form, and the answer sheets can be found in the
Appendix.
Materials
As shown in Figure 2, each trial consisted of a row with four columns. The leftmost
column was reserved for the stimulus, while the three columns next to the stimulus were
reserved for the answers provided by the participants. Above each row, an instruction
sentence in Chinese was displayed.
Figure 2. A fragment of the answer sheet, presenting ‘tree’ (shu) in the pinyin condition
(translation of the instruction: “Please write down three associated Chinese characters or
words based on the pinyin and the picture”)
Images for the stimuli were selected from the stimuli of an ongoing project
involving the thesis supervisor. To fit each image into its block on the answer sheet, we
first reduced it to 250 x 150 (mm) and then, depending on the condition, added the
character or pinyin description to the upper left corner of the image. Figure 3 shows one
example image for each of the three conditions.
Figure 3. An example of the stimulus ‘tree’ (树)
a: Simplified character condition, b: Pinyin condition, c: Control condition
Before the experiment, we asked several volunteers to perform a pilot test in order to
determine whether the envisaged number of stimuli would fit in the time allotted for the
experiment. We prepared 30 stimuli and asked three participants to provide three
associated words in Chinese simplified characters for each. All three participants needed
nearly the full 30 minutes to finish, and slightly more time was needed in the control
condition (images only) than in the other two conditions. Since participants would also
have to read the instructions and write down their age and gender, we reduced the number
of stimuli to 25.
As discussed before, there are different types of Chinese simplified characters. Because
about 80% to 90% of Chinese simplified characters are semantic-phonetic compounds (Kang,
1993; Zhu, 1987), we only used semantic-phonetic compounds in the experiment.
Semantic-phonetic compound characters consist of two parts: a ‘semantic’ element and a
‘phonetic’ element.
According to Feldman and Siok (1999), approximately 75% of semantic-phonetic
compound characters have their semantic radical on the left. In most such compounds, the
left radical functions as the semantic element and the right radical as the phonological
element. We therefore selected characters using three criteria: 1) semantic-phonetic
compounds; 2) preferably a left-right structure with the left radical as the semantic
element and the right radical as the phonological element; and 3) sufficient familiarity to
participants. To satisfy the final criterion, we inspected the word frequency of each
potential stimulus. Based on a corpus of film and television subtitles (Cai & Brysbaert,
2010), the frequency rank of the stimuli in our study ranged from 255 to 1889 (out of 5939)
(M = 869, SD = 449).
Measurements
Semantic similarity. The semantic similarity between stimuli and responses was
always computed between the stimulus in its simplified character form, even if the
participants did not see it, and the associations, which were always written in characters.
Because our approach to generating the vector space differed from the one taken by S.
Chen (2017), we computed semantic similarity in both ways.
First, we generated word embeddings with the word2vec model (Mikolov et al., 2013) in
Gensim (Řehůřek & Sojka, 2010), using a vector size of 100, a window of 5 and the CBOW
algorithm (Gensim’s defaults) with 4 worker threads, on the OpenSubtitles2018 corpus
(Lison & Tiedemann, 2016), which contains 191.4M words from film and television
subtitles² in simplified Chinese characters. To be able to compare our results to those of
S. Chen (2017), we also used the off-the-shelf semantic spaces for Chinese provided by the
Polyglot project (Al-Rfou et al., 2013) that were used in her study. Polyglot contains word
embeddings for 117 languages, based on Wikipedia articles, generated using the curriculum
learning method (Bengio, Louradour, Collobert, & Weston, 2009).
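The “window: 5” and “CBOW” settings determine what the model learns from. For every word in the corpus, CBOW (continuous bag of words) tries to predict that word from the words around it, up to the window size on either side. The sketch below only builds those (context, target) pairs with a fixed window. It assumes the subtitles were already segmented into words, which this section does not describe. It also omits what Gensim adds on top, such as randomly shrinking the window for each target word, averaging the context vectors, and negative sampling.

```python
def cbow_training_pairs(tokens: list[str], window: int = 5) -> list[tuple[list[str], str]]:
    """Build (context, target) pairs as in CBOW with a fixed symmetric window.

    For each position, the context is up to `window` tokens on each side;
    CBOW then learns to predict the target from (the average of) the
    context word vectors.
    """
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        context = left + right
        if context:  # a one-word sentence yields no training pair
            pairs.append((context, target))
    return pairs

# Hypothetical pre-segmented subtitle line: "I saw a big tree in the park".
sentence = ["我", "在", "公园", "看到", "一", "棵", "大", "树"]
for context, target in cbow_training_pairs(sentence, window=2)[:3]:
    print(target, context)
```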
Phonetic similarity. Before computing phonetic similarity, we converted all stimuli
and responses from their simplified character representation into pinyin, using the open
source software pinyin 0.4.0³. Then, we computed the phonetic similarity between each
stimulus and its associations using the Levenshtein distance (Levenshtein, 1966). The
Levenshtein distance quantifies the distance between two strings as the minimal number of
deletions, insertions, or substitutions required to transform one string into the other. A
similarity is obtained from this distance as (l − d)/l, where d is the Levenshtein distance
between the two strings and l is the length of the longer of the two strings. Since d can
never exceed l, this similarity ranges from 0 to 1, with 1 for identical strings.
² http://www.opensubtitles.org/
³ https://pypi.org/project/pinyin/
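The distance and the similarity formula above can be sketched in a few lines. This is a minimal illustration, not the thesis's code. The pinyin strings are typed by hand rather than produced by the pinyin package. Writing tones as trailing digits (as in ‘gu3’) is an assumption, since this section does not say whether tones were kept when computing distances.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimal number of insertions, deletions and substitutions turning a into b."""
    # Dynamic programming over prefixes, keeping only the previous row.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]

def phonetic_similarity(a: str, b: str) -> float:
    """(l - d) / l, with d the Levenshtein distance and l the longer length."""
    longest = max(len(a), len(b))  # l in the formula
    if longest == 0:
        return 1.0  # assumption: two empty strings count as identical
    return (longest - levenshtein(a, b)) / longest

# Hand-typed numbered pinyin: 树 shu4 (tree), 书 shu1 (book), 水 shui3 (water).
print(phonetic_similarity("shu4", "shu1"))   # 0.75
print(phonetic_similarity("shu4", "shui3"))  # 0.6
```

Normalising by the longer string keeps similarities comparable between short and long transcriptions, for example between one-syllable stimuli and two-syllable responses.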
Results
Descriptive analysis
We collected results from 92 participants. Due to an error in generating the response
sheet from the stimuli, the stimulus ‘鼓’ (gu3) was included twice. We therefore removed
the results for the second presentation of ‘鼓’ from each response sheet, which reduced the
number of stimuli to 24.
For each pair of stimulus and response, we calculated semantic similarity based on
two semantic spaces: Opensubtitles (Gensim) and Wikipedia (Polyglot). Because some
participants failed to provide three associations for some stimuli, 75 observations were
missing (6549 instead of the 92 × 24 × 3 = 6624 possible observations). Figure 4
summarizes the semantic similarity obtained with both semantic spaces. Surprisingly, as
Figure 4 shows, the average semantic similarity was highest in the pinyin condition for
both spaces.
Figure 4. Box-and-whisker plot showing the distribution of semantic similarity between
stimuli and associations in the three conditions (1: Chinese simplified characters, 2: pinyin,
3: control), using two different vector spaces (Opensubtitles [Gensim] vs Wikipedia
[Polyglot]). The bottom and top of a box show the first and third quartiles. The line in the
middle of a box is the median. The whiskers extend from the box to at most 1.5 times the
interquartile range. The blue dots show outliers beyond this range.
Mixed-Effects Analysis
In order to answer the research question, we compared both the character condition
and the pinyin condition to the control condition, fitting separate linear mixed-effects
models for each semantic space and for phonetic similarity.
The fixed-effect part of all three models contains main effects for condition (i.e.,
reading different scripts), response order (1 to 3, from the first response to the last), and
presentation order (1 to 24), as well as all two-way and three-way interactions between
these variables. All models used random intercepts for participants and stimuli (the base
model). To check whether random slopes were necessary, we compared the base model
with a model that added a by-participant random slope for response order [2nd model] and
with a model that added both by-participant and by-stimulus random slopes for response
order [3rd model]. ANOVA model comparison tests showed that, in all cases, the 2nd model
should be preferred over both the base model and the 3rd model. Therefore, all of the
analyses reported below were done with the 2nd model’s random-effects structure. Degrees
of freedom and p-values were calculated using Satterthwaite’s method (Satterthwaite, 1946).
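For concreteness, the selected 2nd model can be written out as follows. The notation is our own, not the thesis's. Here i indexes participants, j stimuli and r response order. C_i and P_i are treatment-coded dummies for the character and pinyin conditions, with control as the reference. R_r is response order and O_{ij} is presentation order. Both are assumed to be entered as numeric predictors, which matches their single numerator degree of freedom in Tables 1 to 3. The thesis does not state whether the by-participant intercept and slope were allowed to correlate, so Σ_u is left unspecified.

$$
\begin{aligned}
y_{ijr} ={}& \beta_0 + \beta_1 C_i + \beta_2 P_i + \beta_3 R_r + \beta_4 O_{ij} \\
&+ \beta_5 C_i R_r + \beta_6 P_i R_r + \beta_7 C_i O_{ij} + \beta_8 P_i O_{ij} + \beta_9 R_r O_{ij} \\
&+ \beta_{10} C_i R_r O_{ij} + \beta_{11} P_i R_r O_{ij} \\
&+ u_{0i} + u_{1i} R_r + w_{0j} + \varepsilon_{ijr},
\end{aligned}
\qquad
(u_{0i}, u_{1i})^\top \sim \mathcal{N}(\mathbf{0}, \Sigma_u),\;
w_{0j} \sim \mathcal{N}(0, \sigma_w^2),\;
\varepsilon_{ijr} \sim \mathcal{N}(0, \sigma^2).
$$

Under this reading, the base model drops the term u_{1i} R_r, and the 3rd model adds a by-stimulus slope w_{1j} R_r to the random part.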
Table 1 summarizes the results of the analysis on semantic similarity using the
Opensubtitles (Gensim) space. As shown in Table 1, the overall effect of condition was
significant. Analysis with treatment contrasts showed that semantic similarity was
significantly lower in the Chinese character condition than in the control condition
(t(171) = -3.341, p = .001). Semantic similarity was also significantly affected by response
order and presentation order, by the interaction between the Chinese character condition
and response order (t(216.2) = 3.432, p < .001), and by the interaction between response
order and presentation order. Figure 5a shows the partial effects of reading condition and
response order on semantic similarity using the Opensubtitles (Gensim) semantic space.
Table 1
Summary of analysis of fixed effects on semantic similarity. Type III analysis of variance
table with Satterthwaite’s method, Opensubtitles (Gensim) semantic space
Effect | Sum Sq | Mean Sq | NumDF | DenDF | F value | Pr(>F)
Condition | 0.60 | 0.30 | 2.00 | 170.89 | 7.62 | 0.0007
Response order | 3.75 | 3.75 | 1.00 | 242.23 | 95.80 | 0.0000
Presentation order | 1.59 | 1.59 | 1.00 | 6033.47 | 40.52 | 0.0000
Condition × Response order | 0.50 | 0.25 | 2.00 | 242.22 | 6.39 | 0.0020
Condition × Presentation order | 0.01 | 0.01 | 2.00 | 6038.76 | 0.19 | 0.8289
Response order × Presentation order | 0.24 | 0.24 | 1.00 | 6031.93 | 6.01 | 0.0143
Condition × Response order × Presentation order | 0.06 | 0.03 | 2.00 | 6031.15 | 0.73 | 0.4823
Table 2 summarizes the results of the analysis on semantic similarity using the
Wikipedia (Polyglot) semantic space. The pattern of results was similar to the one found
using the Opensubtitles (Gensim) semantic space. The overall effect of condition was
significant. Analysis with treatment contrasts showed that semantic similarity was
significantly lower in the Chinese character condition than in the control condition (t(271)
= -3.110, p = .002), but there was no difference in semantic similarity between the pinyin
condition and the control condition. Semantic similarity was also influenced by response
order and presentation order, and by the interaction between the Chinese character
condition and response order (t(358) = 2.517, p = .012). Figure 5b shows the partial
effects of reading condition and response order on semantic similarity using the Wikipedia
(Polyglot) semantic space.
Table 2
Summary of analysis of fixed effects on semantic similarity. Type III analysis of variance
table with Satterthwaite’s method, Wikipedia (Polyglot) word space
Effect | Sum Sq | Mean Sq | NumDF | DenDF | F value | Pr(>F)
Condition | 0.53 | 0.26 | 2.00 | 222.68 | 4.79 | 0.0092
Response order | 1.65 | 1.65 | 1.00 | 360.80 | 29.88 | 0.0000
Presentation order | 0.37 | 0.37 | 1.00 | 5613.63 | 6.64 | 0.0100
Condition × Response order | 0.44 | 0.22 | 2.00 | 360.79 | 3.98 | 0.0196
Condition × Presentation order | 0.12 | 0.06 | 2.00 | 5613.59 | 1.12 | 0.3263
Response order × Presentation order | 0.02 | 0.02 | 1.00 | 5625.98 | 0.29 | 0.5912
Condition × Response order × Presentation order | 0.11 | 0.06 | 2.00 | 5625.92 | 1.04 | 0.3531
As Table 3 shows, the effect of condition on phonetic similarity was not significant.
Analysis with treatment contrasts showed that neither reading Chinese simplified characters
nor reading pinyin resulted in differences from the control condition. Phonetic similarity
was significantly influenced by response order, while the interaction between condition and
response order was only marginal (p = .078). Figure 5c shows the partial effects of reading
condition and response order on phonetic similarity.
Table 3
Summary of analysis of fixed effects on phonetic similarity. Type III analysis of variance
table with Satterthwaite’s method
Effect | Sum Sq | Mean Sq | NumDF | DenDF | F value | Pr(>F)
Condition | 0.07 | 0.03 | 2.00 | 161.96 | 2.68 | 0.0718
Response order | 0.58 | 0.58 | 1.00 | 215.64 | 45.94 | 0.0000
Presentation order | 0.00 | 0.00 | 1.00 | 6387.46 | 0.33 | 0.5638
Condition × Response order | 0.07 | 0.03 | 2.00 | 215.65 | 2.58 | 0.0784
Condition × Presentation order | 0.02 | 0.01 | 2.00 | 6392.47 | 0.69 | 0.5008
Response order × Presentation order | 0.00 | 0.00 | 1.00 | 6376.14 | 0.06 | 0.8079
Condition × Response order × Presentation order | 0.01 | 0.01 | 2.00 | 6376.10 | 0.50 | 0.6078
Figure 5. Partial effects plot showing the interaction between condition and response order.
Lines show condition: (1) Simplified characters; (2) Pinyin; (3) Control. Panels show
measure: (a) Semantic similarity based on Opensubtitles (Gensim); (b) Semantic similarity
based on Wikipedia (Polyglot); (c) Phonetic similarity.
Discussion and Conclusion
The current study tried to answer the question “To what extent do associations to
images accompanied by words written in simplified characters and pinyin differ from
associations to the same images without script?”. We asked participants in this study to
provide three associations to images accompanied by a description in simplified Chinese
characters, a description in pinyin, or no written description. The results from the
experiment were in line with a previous study by S. Chen (2017) in terms of the existence
of an effect of word script on semantic similarity. However, our results did not have the
same direction: word associations produced in the character condition had lower semantic
similarity to the stimuli than for the control condition, which itself did not differ from the
pinyin condition. Although we did not find statistically unambiguous effects of script type
on the phonetic similarity between stimulus and associations, the results point to a
decrease in phonetic similarity for associations to stimuli accompanied by simplified
characters compared to control stimuli.
Principal findings
The aim of the current study was to gain a deeper understanding of the relation
between a language’s written form and the way its users process that language. Because
Mandarin Chinese can be written using either Chinese simplified characters or pinyin, it
provides a unique tool for testing the effect of writing systems.
The results from this experimental study showed that Chinese native speakers
produced less semantically similar word associations when they responded to images
accompanied by a written description in Chinese simplified characters. The effect did not
appear for stimuli presented in pinyin. This was the opposite of what we predicted in
hypothesis 1.
A potential explanation for the absence of an effect of the pinyin stimuli may lie in
the way in which both scripts are commonly used. As we discussed before, Chinese
speakers predominantly use Chinese simplified characters in their daily life. Even if pinyin
is used as an intermediate step in writing, such as when typing Chinese characters on
mobile phones, reading pinyin without the corresponding Chinese characters is an unusual
task for Chinese speakers. In this sense, the pinyin stimuli may have had less effect on
word association processes because they are processed less ‘automatically’ and are
therefore more likely to be ignored by participants. This could have led to less attention
being given to the written part of the stimulus in the pinyin condition. In the character
condition, on the other hand, the character part of the stimulus is more likely to be
processed completely automatically, and it therefore contributes to association processes
independently of the image part of the stimulus. One way to test this explanation is to
conduct an experiment that compares a condition in which pinyin occurs together with an
image (condition 1) to a condition in which pinyin occurs without an image (condition 2)
and a condition in which characters occur without an image (condition 3). If the above
explanation is correct, condition 2 should force participants to pay attention to the pinyin
part of the stimulus and should result in associations that differ from those in condition 1
but resemble those in condition 3.
When we started this study, we were aware that the semantic space that was used in
the study by Chen had a shortcoming: Wikipedia pages use more formal language than we
use in daily life. A smaller issue was that the spaces from the Polyglot project were
constructed using the curriculum learning method (Bengio et al., 2009), which is not as
widely used as a training method. Therefore, we decided to make a semantic space using
data from film and television subtitles, which are more closely related to typical human
conversation, and to use Gensim to generate the semantic space using the CBOW training
method with the word2vec (Mikolov et al., 2013) algorithm. At the same time, we wanted
to be able to rule out that differences between our results and those of S. Chen (2017)
were due to differences in the semantic space. Therefore, we also ran the analyses using
the same semantic space that was used in the study by Chen. Both spaces showed a very
similar pattern of effects, from which we conclude that (1) the results do not depend on
which of the two semantic spaces we chose, and (2) they are consistent across two quite
different corpora (film subtitles and Wikipedia) and two different training methods.
Although in hypothesis 2 we predicted that phonetic similarity in the pinyin
condition would be higher than in the control condition and that phonetic similarity in the
character condition would be lower than in the control condition, the results did not
confirm this. Still, the results are not inconsistent with the study by S. Chen (2017), as the
difference in phonetic similarity between the simplified character condition and the control
condition was marginally significant.
Throughout all analyses, we found a significant main effect of response order, which
reflects a general decrease in both semantic and phonetic similarity from the first
association to the last one. For semantic similarity, the interaction effect of response order
and condition showed that this decrease was more pronounced for the second and third
association than for the first one. We also found significant main effects of presentation
order on semantic similarity. Strangely, when measured using the Opensubtitles (Gensim)
space, the overall semantic similarity of the associations went down as participants
proceeded in the experiment, but the opposite happened when using the Wikipedia
(Polyglot) space. At this point, we cannot offer a reasonable explanation for this finding,
because doing so would at least require running both training methods on both corpora.
Finally, a significant interaction between presentation order and response order on
semantic similarity was present in the analysis using the Opensubtitles (Gensim) space,
showing that the effect of presentation order was strongest for the first association and
became gradually less pronounced for subsequent associations.
Limitations and Recommendations
Some limitations of the study are worth pointing out. In comparison with the
previous study (S. Chen, 2017), the current study only included semantic-phonetic
compounds. It is possible that those characters activated both semantic and phonetic
associations at the same time, while characters with only semantic radicals may lead
primarily to semantic associations. Future studies could investigate to what extent the type
of character influences word association processes.
For the character condition, the results showed that word associations had a lower
semantic similarity to the stimuli than in the control condition, while the associations in
the pinyin condition did not differ from the control condition. As we discussed in the
second chapter, Chinese characters can be seen as the standard representation of the
Chinese language, while pinyin can be seen as a non-standard representation. Future
studies could apply the same research approach to another language. For example, in
Korean, Hanja (Chinese characters) can be seen as the non-standard representation, while
the standard representation is Hangul (a phonetic system). Such an experiment could test
whether the effects remain when Chinese characters are the non-standard rather than the
standard representation.
The participants were mainly from the southern part of China, while pinyin is based
on the pronunciation in Beijing and China’s Northern Mandarin dialects. It is possible that
these participants were more likely to ignore the pinyin and therefore generate less
phonetically similar associations in the pinyin condition. Therefore, a recommendation for
future research is to compare this group with participants from Northern China, where the
dialect is closer to standard Mandarin Chinese.
Finally, future research could compute the similarity of Chinese characters in a
different way, based on the radicals and strokes in each character: the more radicals and
strokes two characters share, the more similar they would be. Another method could use
the visual similarity between Chinese characters, as proposed by C.-L. Liu et al. (2011),
who adopted and extended the basic concepts of Cangjie, a well-established Chinese input
method. The Cangjie method defines 24 basic elements in Chinese characters and the rules
to decompose Chinese characters into these basic elements (M. Liu, Rus, Liao, & Liu,
2016). The similarity between characters could then be based on the total number of
matched elements between two characters or on the structure and location of the matched
elements.
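As an illustration of the first proposal, the sketch below counts the components two characters share and turns that count into a Dice coefficient (2 × shared / total components). Both the decompositions and the choice of the Dice coefficient are our own assumptions for illustration. They are not Cangjie codes and not the method of C.-L. Liu et al. (2011).

```python
from collections import Counter

def component_overlap(a: list[str], b: list[str]) -> tuple[int, float]:
    """Count shared components and return them with a Dice coefficient.

    Components are compared as multisets, so a component that occurs twice
    in both characters counts twice.
    """
    shared = sum((Counter(a) & Counter(b)).values())
    total = len(a) + len(b)
    dice = 2 * shared / total if total else 1.0
    return shared, dice

# Hand-written, illustrative decompositions (not Cangjie codes):
# 树 (shu4, tree) = 木 + 又 + 寸; 村 (cun1, village) = 木 + 寸; 鼓 (gu3, drum) = 壴 + 支
tree = ["木", "又", "寸"]
village = ["木", "寸"]
drum = ["壴", "支"]

print(component_overlap(tree, village))  # (2, 0.8)
print(component_overlap(tree, drum))     # (0, 0.0)
```

A measure based on the structure and location of matched elements, as the second proposal suggests, would additionally need to record where each component sits within the character.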
Overall, our research provides an example of how word association processes can
provide insight into the relation between written language and thought. We believe this is
a promising approach that deserves more attention from researchers.
Conclusion
The current study tried to gain a deeper understanding of the relation between
written language and thought processes by taking an experimental approach using word
associations. The results of the current study demonstrated that written language can
influence word associations, but were only partially in line with our initial hypotheses.
Adding to previous evidence of behavioural and neuropsychological differences in
processing Chinese characters and pinyin, this study also shows that different writing
systems can lead to cognitively different ways of representing language. This suggests that
replacing Chinese characters with pinyin, or vice versa, would result in the loss of a unique
double way of representing language and reduce the richness of the Chinese language
system.
References
Al-Rfou, R., Perozzi, B., & Skiena, S. (2013, August). Polyglot: Distributed word
representations for multilingual NLP. In Proceedings of the seventeenth conference on
computational natural language learning (pp. 183–192). Sofia, Bulgaria: Association
for Computational Linguistics. Retrieved from
http://www.aclweb.org/anthology/W13-3520
Al-Samarrai, P. D. N. (2007). Language and culture a philosophical hypothesis. Journal of
Tikrit University for the Humanities, 14(4), 434–447.
Bengio, Y., Louradour, J., Collobert, R., & Weston, J. (2009). Curriculum learning. In
Proceedings of the 26th annual international conference on machine learning (pp.
41–48).
Benjamin, A. (1997). History and prospect of Chinese romanization. Chinese
Librarianship.
Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2016). Enriching word vectors with
subword information. arXiv preprint arXiv:1607.04606.
Brysbaert, M., & New, B. (2009). Moving beyond Kučera and Francis: A critical evaluation
of current word frequency norms and the introduction of a new and improved word
frequency measure for American English. Behavior Research Methods, 41(4), 977–990.
Cai, Q., & Brysbaert, M. (2010). SUBTLEX-CH: Chinese word and character frequencies
based on film subtitles. PLoS ONE, 5(6), e10729.
Cao, F., Vu, M., Chan, L., Ho, D., Lawrence, J. M., Harris, L. N., … Perfetti, C. A. (2013).
Writing affects the brain network of reading in Chinese: A functional magnetic
resonance imaging study. Human Brain Mapping, 34(7), 1670–1684.
Chen, M. J., & Yuen, J. C.-K. (1991). Effects of pinyin and script type on verbal
processing: Comparisons of China, Taiwan, and Hong Kong experience. International
Journal of Behavioral Development, 14(4), 429–448.
Chen, S. (2017). Does script shape thought? The effects of Chinese simplified character
and Pinyin scripts on Chinese native speakers’ reading and writing production
(Master’s thesis). Retrieved from http://arno.uvt.nl/show.cgi?fid=142435
Chen, Y., Fu, S., Iversen, S. D., Smith, S. M., & Matthews, P. M. (2002). Testing for dual
brain processing routes in reading: a direct contrast of Chinese character and pinyin
reading using fMRI. Journal of Cognitive Neuroscience, 14(7), 1088–1098.
Church, K. W., & Hanks, P. (1990). Word association norms, mutual information, and
lexicography. Computational Linguistics, 16(1), 22–29.
Coltheart, M. (1981). Disorders of reading and their implications for models of normal
reading. Visible Language, 15(3), 245.
Dictionary, M. C. F. (1986). Modern Chinese frequency dictionary. Beijing: Beijing
Language Institute Publisher.
Feldman, L. B., & Siok, W. W. (1999). Semantic radicals contribute to the visual
identification of Chinese characters. Journal of Memory and Language, 40(4),
559–576.
Firth, J. R. (1957). A synopsis of linguistic theory, 1930-1955. Studies in linguistic
analysis.
Fu, S., Chen, Y., Smith, S., Iversen, S., & Matthews, P. (2002). Effects of word form on
brain processing of written Chinese. NeuroImage, 17(3), 1538–1548.
Gu, S. (2011). A cultural history of the Chinese language. McFarland.
Guan, C. Q., Liu, Y., Chan, D. H. L., Ye, F., & Perfetti, C. A. (2011). Writing strengthens
orthography and alphabetic-coding strengthens phonology in learning to read
Chinese. Journal of Educational Psychology, 103(3), 509.
Harm, M. W., & Seidenberg, M. S. (2004). Computing the meanings of words in reading:
cooperative division of labor between visual and phonological processes.
Psychological Review, 111(3), 662.
Harris, Z. S. (1954). Distributional structure. Word, 10(2-3), 146–162.
Hino, Y., Kusunose, Y., Lupker, S. J., & Jared, D. (2013). The processing advantage and
disadvantage for homophones in lexical decision tasks. Journal of Experimental
Psychology: Learning, Memory, and Cognition, 39(2), 529.
Hui, L. (2011). An investigation into the L2 mental lexicon of Chinese English learners by
means of word association. Chinese Journal of Applied Linguistics, 34(1), 62–76.
Hussein, B. A.-S. (2012). The Sapir-Whorf hypothesis today. Theory and Practice in
Language Studies, 2(3), 642–646.
Kang, J. (1993). Analysis of semantics of semantic-phonetics compound characters in
modern Chinese. Information analysis of usage of characters in modern Chinese,
68–83.
Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato's problem: The latent
semantic analysis theory of acquisition, induction, and representation of knowledge.
Psychological Review, 104(2), 211.
Lenci, A. (2008). Distributional semantics in linguistic and cognitive research. Italian
Journal of Linguistics, 20(1), 1–31.
Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions, and
reversals. In Soviet Physics Doklady (Vol. 10, pp. 707–710).
Levy, O., & Goldberg, Y. (2014). Dependency-based word embeddings. In Proceedings of
the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2:
Short Papers) (Vol. 2, pp. 302–308).
Lison, P., & Tiedemann, J. (2016). OpenSubtitles2016: Extracting large parallel corpora
from movie and TV subtitles.
Liu, C.-L., Lai, M.-H., Tien, K.-W., Chuang, Y.-H., Wu, S.-H., & Lee, C.-Y. (2011).
Visually and phonologically similar characters in incorrect Chinese words: Analyses,
identification, and applications. ACM Transactions on Asian Language Information
Processing (TALIP), 10(2), 10.
Liu, M., Rus, V., Liao, Q., & Liu, L. (2016). Encoding and ranking similar Chinese
characters (Tech. Rep.). Chongqing University. Accessed online 08/2018.
Mair, V. H. (1986). The need for an alphabetically arranged general usage dictionary of
Mandarin Chinese: a review article of some recent dictionaries and current
lexicographical projects (No. 1). Order from Dept. of Oriental Studies, University of
Pennsylvania/CU.
Mandera, P., Keuleers, E., & Brysbaert, M. (2017). Explaining human performance in
psycholinguistic tasks with models of semantic similarity based on prediction and
counting: A review and empirical validation. Journal of Memory and Language, 92,
57–78.
Marshall, J. C., & Newcombe, F. (1973). Patterns of paralexia: A psycholinguistic
approach. Journal of Psycholinguistic Research, 2(3), 175–199.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed
representations of words and phrases and their compositionality. In Advances in
Neural Information Processing Systems (pp. 3111–3119).
Mills, H. C. (1956). Language reform in China: Some recent developments. The Journal of
Asian Studies, 15(4), 517–540.
Mushangwe, H., & Chisoni, G. (2015). A critical analysis of the use of pinyin as a
substitute of Chinese characters. Journal of Language Teaching and Research, 6(3),
685–694.
Penhallurick, R. (2010). Studying the English language. Macmillan International Higher
Education.
Pennington, J., Socher, R., & Manning, C. (2014). GloVe: Global vectors for word
representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural
Language Processing (EMNLP) (pp. 1532–1543).
Pinker, S. (1995). Language acquisition. Language: An invitation to cognitive science, 1,
135–182.
Řehůřek, R., & Sojka, P. (2010). Software framework for topic modelling with large corpora.
Roberts, J. (1999). A concise history of China. Harvard University Press. Retrieved from
https://books.google.nl/books?id=gWKDgM3_z-oC
Satterthwaite, F. E. (1946). An approximate distribution of estimates of variance
components. Biometrics Bulletin, 2(6), 110–114.
Schütze, H. (1998). Automatic word sense discrimination. Computational Linguistics,
24(1), 97–123.
Shen, H. H., & Ke, C. (2007). Radical awareness and word acquisition among nonnative
learners of Chinese. The Modern Language Journal, 91(1), 97–111.
Sigurd, B. (2009). Computer simulation of word associations and crossword solving.
Working Papers in Linguistics, 45, 157–166.
Sturrock, J. (2008). Structuralism. John Wiley & Sons.
Tan, L. H., Spinks, J. A., Eden, G. F., Perfetti, C. A., & Siok, W. T. (2005). Reading
depends on writing, in Chinese. Proceedings of the National Academy of Sciences of
the United States of America, 102(24), 8781–8785.
Tong, X., Tong, X., & McBride, C. (2017). Radical sensitivity is the key to understanding
Chinese character acquisition in children. Reading and Writing, 30(6), 1251–1265.
Vulić, I., & Mrkšić, N. (2017). Specialising word vectors for lexical entailment. arXiv
preprint arXiv:1710.06371.
Whorf, B. L. (1956). Language and thought. Cambridge: MIT Press.
Whorf, B. L. (2012). Language, thought, and reality: Selected writings of Benjamin Lee
Whorf.
Wittgenstein, L. (1953). Philosophische Untersuchungen, translated by Elizabeth Anscombe
as Philosophical Investigations. Oxford: Basil Blackwell.
Xie, J. (2009). Guanyu fantizi yu jiantizi de ruogan sikao [Some reflections on traditional
and simplified characters]. Yanjiang Xueyuan Xuebao, 4, 45–49.
Zhou, X., & Marslen-Wilson, W. (2000). The relative time course of semantic and
phonological activation in reading Chinese. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 26(5), 1245.
Zhou, Z. (2014). The Six Principles of Chinese Writing and Their Application to Design
As Design Idea. Studies in Literature and Language, 8(3), 84.
Zhu, Y. (1987). Analysis of cuing functions of the phonetic in modern China. Unpublished
manuscript, East China Normal University.
Appendices
The following pages show the answer sheets that were prepared for participants to write
down their responses.
Appendix A: Answer sheets for the simplified character condition
Appendix B: Answer sheets for the pinyin condition
Appendix C: Answer sheets for the control condition
Appendix A: Example answer sheet, character condition
Subject information and Consent Form
Welcome to the experiment! (Chinese translation on the back)
This document gives you information about the study on the effects of word form on language processing. Before the study begins, it is important that you learn about the procedure followed in this study and that you give your informed consent for voluntary participation. Please read this document carefully.
Aim and benefit of the study
The aim of this study is to examine word association processes in the Chinese language. This study is conducted by Fei Teng, a student under the supervision of dr. Emmanuel Keuleers at the Department of Cognitive Science and Artificial Intelligence.
Procedure
During this study, you will write down three associated words in Chinese for each keyword.
Risk
The study does not involve any risks or detrimental side effects.
Duration
The study will last approximately 30 minutes.
Voluntary participation
Your participation is completely voluntary. You can refuse to participate without giving any reasons, and you can stop your participation at any time during the study. You can also withdraw your permission to use your experimental data up to 24 hours after the study is finished. None of this will have any negative consequences whatsoever.
Further information
If you want more information about this study, you can ask Fei Teng. If you have any complaints about this study, please contact the supervisor, Emmanuel Keuleers.
I, (NAME), have read and understood this consent form and have had the opportunity to ask questions. I agree to participate voluntarily in this research study carried out by the Department of Cognitive Science and Artificial Intelligence of Tilburg University.
Your information
1. Your age: ____ years old
2. Your gender: ☐ Female ☐ Male
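Informed Consent Form (Chinese version, translated into English)
Welcome to this experiment!
This document provides information about this experiment, "Studying the effects of word form on language processing". Before the study begins, we would like you to understand the procedure to be followed in this experiment, and we provide you with this informed consent form for voluntary participation. Please read this document carefully.
Aim and benefit of the study
The aim of this experiment is to study the association processes of Chinese character forms and pinyin. The study is supervised by Professor Emmanuel Keuleers of the Department of Cognitive Science and Artificial Intelligence and carried out by Fei Teng, a student in the Department of Cognitive Science and Artificial Intelligence.
Procedure
In this experiment, you will write down three associated words in Chinese for each keyword.
Risk
The study does not involve any risks or harmful side effects.
Duration
The study will last approximately 30 minutes.
Voluntariness
Your participation is completely voluntary; you may refuse to participate without giving any reason. You may stop participating at any time during the experiment. You may also withdraw your permission to use your experimental data up to 24 hours after the experiment ends. None of this will have any negative consequences.
Further information
If you would like more information about this study, you can contact Fei Teng (email: [email protected]). If you have any complaints about this experiment, please contact Professor Emmanuel Keuleers (email: [email protected]).
I, (name), have read and understood this document and have been given the opportunity to ask questions. I agree to participate voluntarily in this study conducted by the Department of Cognitive Science and Artificial Intelligence of Tilburg University.
Your information
3. Your age: ____ years old
4. Your gender: ☐ Female ☐ Male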
Prompt printed with each of the 24 stimuli (six per page): "Please write down three Chinese words that you think are related to this Chinese word and picture."
Appendix B: Example answer sheet, pinyin condition
Subject information and Consent Form
Welcome to the experiment! (Chinese translation on the back)
This document gives you information about the study on the effects of word form on language processing. Before the study begins, it is important that you learn about the procedure followed in this study and that you give your informed consent for voluntary participation. Please read this document carefully.
Aim and benefit of the study
The aim of this study is to examine word association processes in the Chinese language. This study is conducted by Fei Teng, a student under the supervision of dr. Emmanuel Keuleers at the Department of Cognitive Science and Artificial Intelligence.
Procedure
During this study, you will write down three associated words in Chinese for each keyword.
Risk
The study does not involve any risks or detrimental side effects.
Duration
The study will last approximately 30 minutes.
Voluntary participation
Your participation is completely voluntary. You can refuse to participate without giving any reasons, and you can stop your participation at any time during the study. You can also withdraw your permission to use your experimental data up to 24 hours after the study is finished. None of this will have any negative consequences whatsoever.
Further information
If you want more information about this study, you can ask Fei Teng. If you have any complaints about this study, please contact the supervisor, Emmanuel Keuleers.
I, (NAME), have read and understood this consent form and have had the opportunity to ask questions. I agree to participate voluntarily in this research study carried out by the Department of Cognitive Science and Artificial Intelligence of Tilburg University.
Your information
1. Your age: ____ years old
2. Your gender: ☐ Female ☐ Male
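Informed Consent Form (Chinese version, translated into English)
Welcome to this experiment!
This document provides information about this experiment, "Studying the effects of word form on language processing". Before the study begins, we would like you to understand the procedure to be followed in this experiment, and we provide you with this informed consent form for voluntary participation. Please read this document carefully.
Aim and benefit of the study
The aim of this experiment is to study the association processes of Chinese character forms and pinyin. The study is supervised by Professor Emmanuel Keuleers of the Department of Cognitive Science and Artificial Intelligence and carried out by Fei Teng, a student in the Department of Cognitive Science and Artificial Intelligence.
Procedure
In this experiment, you will write down three associated words in Chinese for each keyword.
Risk
The study does not involve any risks or harmful side effects.
Duration
The study will last approximately 30 minutes.
Voluntariness
Your participation is completely voluntary; you may refuse to participate without giving any reason. You may stop participating at any time during the experiment. You may also withdraw your permission to use your experimental data up to 24 hours after the experiment ends. None of this will have any negative consequences.
Further information
If you would like more information about this study, you can contact Fei Teng (email: [email protected]). If you have any complaints about this experiment, please contact Professor Emmanuel Keuleers (email: [email protected]).
I, (name), have read and understood this document and have been given the opportunity to ask questions. I agree to participate voluntarily in this study conducted by the Department of Cognitive Science and Artificial Intelligence of Tilburg University.
Your information
3. Your age: ____ years old
4. Your gender: ☐ Female ☐ Male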
Prompt printed with each of the 24 stimuli (six per page): "Please write down three Chinese words that you think are related to this pinyin and picture."
Appendix C: Example answer sheet, control condition
Subject information and Consent Form
Welcome to the experiment! (Chinese translation on the back)
This document gives you information about the study on the effects of word form on language processing. Before the study begins, it is important that you learn about the procedure followed in this study and that you give your informed consent for voluntary participation. Please read this document carefully.
Aim and benefit of the study
The aim of this study is to examine word association processes in the Chinese language. This study is conducted by Fei Teng, a student under the supervision of dr. Emmanuel Keuleers at the Department of Cognitive Science and Artificial Intelligence.
Procedure
During this study, you will write down three associated words in Chinese for each keyword.
Risk
The study does not involve any risks or detrimental side effects.
Duration
The study will last approximately 30 minutes.
Voluntary participation
Your participation is completely voluntary. You can refuse to participate without giving any reasons, and you can stop your participation at any time during the study. You can also withdraw your permission to use your experimental data up to 24 hours after the study is finished. None of this will have any negative consequences whatsoever.
Further information
If you want more information about this study, you can ask Fei Teng. If you have any complaints about this study, please contact the supervisor.
I, (NAME), have read and understood this consent form and have had the opportunity to ask questions. I agree to participate voluntarily in this research study carried out by the Department of Cognitive Science and Artificial Intelligence of Tilburg University.
Your information
1. Your age: ____ years old
2. Your gender: ☐ Female ☐ Male
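Informed Consent Form (Chinese version, translated into English)
Welcome to this experiment!
This document provides information about this experiment, "Studying the effects of word form on language processing". Before the study begins, we would like you to understand the procedure to be followed in this experiment, and we provide you with this informed consent form for voluntary participation. Please read this document carefully.
Aim and benefit of the study
The aim of this experiment is to study the association processes of Chinese character forms and pinyin. The study is supervised by Professor Emmanuel Keuleers of the Department of Cognitive Science and Artificial Intelligence and carried out by Fei Teng, a student in the Department of Cognitive Science and Artificial Intelligence.
Procedure
In this experiment, you will write down three associated words in Chinese for each keyword.
Risk
The study does not involve any risks or harmful side effects.
Duration
The study will last approximately 30 minutes.
Voluntariness
Your participation is completely voluntary; you may refuse to participate without giving any reason. You may stop participating at any time during the experiment. You may also withdraw your permission to use your experimental data up to 24 hours after the experiment ends. None of this will have any negative consequences.
Further information
If you would like more information about this study, you can contact Fei Teng (email: [email protected]). If you have any complaints about this experiment, please contact Professor Emmanuel Keuleers (email: [email protected]).
I, (name), have read and understood this document and have been given the opportunity to ask questions. I agree to participate voluntarily in this study conducted by the Department of Cognitive Science and Artificial Intelligence of Tilburg University.
Your information
3. Your age: ____ years old
4. Your gender: ☐ Female ☐ Male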
Prompt printed with each of the 24 stimuli (six per page): "Please write down three Chinese words that you think are related to this picture."