
Running head: EFFECTS OF SCRIPT ON WORD ASSOCIATION PROCESSES 1

Effects of Script Type on Word Association Processes in Mandarin Chinese

Master Thesis

By

Fei Teng (u864402)

Supervisor: dr. E.A. Keuleers

Second reader: dr. P.A. Vogt

Cognitive Science and Artificial Intelligence

Tilburg University

August, 2018


Abstract

This study investigated the effect of different Chinese scripts on word association processes. In a controlled experiment, associations to images of objects accompanied by a description in either simplified Chinese characters or the alphabetic pinyin script were contrasted with associations to the same objects in a control condition showing only images. We recruited 92 native Chinese speakers and asked them to provide three associations, written in simplified Chinese characters, to each stimulus. We then calculated both the phonetic and the semantic similarity between each stimulus and its associations. Phonetic similarity was calculated as the Levenshtein distance between phonetic transcriptions, while semantic similarity was calculated using word embeddings trained specifically on a Chinese-language corpus. The results showed that word associations produced in the character condition were semantically less similar to the stimuli than those produced in the control condition, which in turn did not differ from the pinyin condition. Finally, no significant effects of phonetic similarity were found.

Keywords: writing systems, word association, distributional semantics, Mandarin Chinese, logographic script, pinyin


Acknowledgments

Six years ago, I came to the Netherlands to study. Now, six years later, I am about to finish my master’s thesis. This thesis is the result of my graduation project for the Master’s program Cognitive Science and Artificial Intelligence (CSAI) at Tilburg University, and it marks the last milestone of my time as a student.

This thesis would not have been possible without the help of several people. First and foremost, I would like to express my gratitude to my supervisor, Dr. Emmanuel Keuleers. Thank you for the considerable time and effort you invested in giving me thoughtful and knowledgeable feedback. We spent a lot of time discussing the project and writing the script together. You opened the door to psycholinguistics for me, a subject I had never thought about before but that turned out to fascinate me. Thank you for sharing your knowledge and for being there for me throughout the whole project.

Additionally, I am very grateful to all the participants who took the time and effort to help me without hesitation. Furthermore, I would like to thank my friends for taking my mind off the project whenever necessary, and all the lovely people I met in Eindhoven and Tilburg and during my travels.

Finally, I would like to express my deepest gratitude to my parents. This study would never even have started without your unfailing support. Thank you for encouraging me throughout my years of study and throughout the process of researching and writing this thesis.

Fei Teng

Tilburg, August 2018


Contents

Abstract

Introduction

Theoretical Background

  The development of Chinese writing until the 1950s

  From traditional to simplified Chinese characters

  The six categories of simplified Chinese characters

  Romanizations of Chinese

  Cognitive processing of written Mandarin Chinese

  Word association and measuring semantic similarity

  The current study

Method

  Design

  Participants

  Procedure

  Materials

  Measurements

    Semantic similarity

    Phonetic similarity

Results

  Descriptive analysis

  Mixed-Effects Analysis

Discussion and Conclusion

  Principal findings

  Limitations and Recommendations

  Conclusion

References

Appendices


Introduction

To what extent can the way we write our language influence the way we think? The current study addresses this question by studying word associations to stimuli presented in two different writing systems, or scripts, for Mandarin Chinese. Mandarin Chinese offers a unique opportunity for testing the effect of writing systems because it can be written using either simplified Chinese characters or pinyin, an alphabetic writing system that, just like the writing system English speakers are used to, relies on a mapping between letters and sounds. Most simplified Chinese characters, by contrast, are composed of phonetic and semantic radicals. These radicals can have orthographic, phonological, and semantic properties (Tong, Tong, & McBride, 2017). A Chinese character has a three-tier structure (Shen & Ke, 2007): a character contains one or more radicals, and each radical is composed of strokes. Although the phonetic radical of a compound character provides a cue to the pronunciation of the word the character represents, the connection between phonetic radical and sound is not reliably predictable (Cao et al., 2013; Guan, Liu, Chan, Ye, & Perfetti, 2011).
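The three-tier structure can be pictured as a small nested data structure. The Python sketch below is illustrative only and not part of the study’s method: the class and function names are hypothetical, pronunciations are written in numbered-tone pinyin (an assumed notation), and the stroke tier is reduced to stroke counts. It encodes 筷 (kuai4, ‘chopsticks’) as the 6-stroke bamboo radical plus the 7-stroke phonetic radical 快 (kuai4), and then checks characters sharing the phonetic radical 青 (qing1) to show why such a radical is a cue rather than a reliable predictor of sound.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical encoding of the three tiers: character -> radicals -> strokes.
# Only the number of strokes is stored for the lowest tier.

@dataclass
class Radical:
    form: str      # the radical as written
    role: str      # "semantic" or "phonetic"
    strokes: int   # number of strokes in the radical

@dataclass
class Character:
    form: str
    pinyin: str              # numbered-tone pinyin, e.g. "kuai4"
    radicals: List[Radical]

    def stroke_count(self) -> int:
        # The stroke tier rolls up into the character tier via its radicals.
        return sum(r.strokes for r in self.radicals)

    def phonetic(self) -> Radical:
        return next(r for r in self.radicals if r.role == "phonetic")

chopsticks = Character("筷", "kuai4", [
    Radical("⺮", "semantic", 6),  # bamboo radical (竹, zhu2)
    Radical("快", "phonetic", 7),  # 快, kuai4
])

# Pronunciations of two phonetic radicals (numbered-tone pinyin).
PHONETIC_SOUND = {"快": "kuai4", "青": "qing1"}

# Characters that all contain the phonetic radical 青 (qing1).
qing_family = {"清": "qing1", "情": "qing2", "请": "qing3", "精": "jing1"}

def share_sound(char_pinyin: str, phonetic_pinyin: str, ignore_tone: bool = False) -> bool:
    """True if a character is pronounced like its phonetic radical."""
    if ignore_tone:
        return char_pinyin.rstrip("012345") == phonetic_pinyin.rstrip("012345")
    return char_pinyin == phonetic_pinyin

print(chopsticks.stroke_count())
print(share_sound(chopsticks.pinyin, PHONETIC_SOUND[chopsticks.phonetic().form]))
exact = [c for c, p in qing_family.items() if share_sound(p, PHONETIC_SOUND["青"])]
toneless = [c for c, p in qing_family.items()
            if share_sound(p, PHONETIC_SOUND["青"], ignore_tone=True)]
print(exact, toneless)
```

In this toy family only 清 matches 青 exactly; 情 and 请 match only when tone is ignored, and 精 differs even in its initial consonant.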

The relationship between script and thought can be seen as a special case of the more general relation between language and thought. In the 1940s, one of the linguists who started to pay special attention to the relationship between language and thought was Benjamin Lee Whorf. Whorf studied Hopi, a Native American language spoken in northeastern Arizona. He noticed that some linguistic structures found in Hopi were very different from those in what he called SAE (Standard Average European) languages. One of Whorf’s examples is that in SAE languages, time and objects are counted in the same way. In SAE, one can therefore as easily say that six days are “more” than five days as one can say that six boxes are “more” than five boxes. In Hopi, however, there is no such “objectification of time” (Penhallurick, 2010, p. 125): one can only say that the sixth day is “later” than the fifth, but not “more” (Whorf, 2012). Whorf argued that the Hopi language leads its speakers to experience the world differently (Hussein, 2012). According to the Whorfian hypothesis (Whorf, 1956), language frames the way in which its users think about their universe, and the same principle holds for all languages and their speakers.

One of the major arguments against the Whorfian hypothesis is that it cannot be proven whether language shapes thought or whether it is the other way around. Pinker (1995) claims that language is a reflection of thought, but not vice versa. He uses the example of the word “spring” and notes that when people think about “spring”, they are not confused about whether they mean a season or something that goes “boing”. In addition, Pinker (1995, p. 136) argues that “if one word can correspond to two thoughts, then thoughts cannot be words”. Moreover, some argue that if the Whorfian hypothesis were correct, it would be impossible to become bilingual or to become a translator, because we cannot view the world in different ways at the same time (Al-Samarrai, 2007).

Research into the relationship between language and thought has mostly examined whether the way language users experience the world is influenced by the way their spoken language is structured. In contrast, there has been little investigation into the connection between the way a language is written and how its users experience the world. In a prior study, S. Chen (2017) showed native Chinese speakers stimuli consisting of an image and a corresponding word written either in simplified Chinese characters or in pinyin, an alphabetic script. She then asked participants to come up with three associations to these stimuli, written either in simplified Chinese characters or in pinyin. The results showed that reading Chinese characters led to semantically and phonetically different associations than reading pinyin versions of the same words, but that the script in which participants were asked to respond did not affect the results. While Chen’s study strongly suggests that the way words are written can affect our associations to those words, one shortcoming was that it did not include a baseline condition in which only images were shown. Hence, the study could not determine to what extent associations in each script differed from associations to a neutral presentation of images without writing. In other words, its design did not allow it to identify the extent to which each script influenced association processes.

The present study builds on Chen’s study and improves its design. Its main research question is: “To what extent do associations to images accompanied by words written in simplified characters and pinyin differ from associations to the same images without script?” We will try to answer this question by presenting two groups of participants with stimuli in either simplified Chinese characters or pinyin, and comparing their associations to those of a group of participants who see only images. As in Chen’s study, we will use word embeddings, numerical vectors derived from semantic vector spaces, to calculate the semantic similarity between stimuli and associations. Phonetic similarity will be evaluated using a traditional string distance measure.

The remainder of this thesis is structured as follows. The second chapter discusses the theoretical background of this study. The third chapter covers the methodological framework. The fourth chapter presents the results. Finally, the fifth chapter summarizes and discusses the main findings, as well as the strengths and limitations of this study.


Theoretical Background

The development of Chinese writing until the 1950s

The Chinese language dates back at least 6000 years. Written Chinese inscriptions have been found on turtle shells from the Shang dynasty (1766 BC to 1123 BC), which means that the written language has existed for more than 3000 years (Roberts, 1999). Over time, the writing system has been influenced by political changes, but the basic principles of the language and its written characters have remained the same.

The development of Chinese script started with the oracle bone script (used from 1500 to 1000 BC), which is characterized by sharp, square lines. Next came the inscriptions on bronze ware (used from 1100 to 770 BC), which look similar to the oracle bone script. Subsequently, the script became widely standardized after Qin Shihuang unified China. During this period, the seal script (used from 770 to 207 BC), which has smooth, curved lines, was commonly used. It was followed during the Han dynasty by the clerical script (used from 206 BC to AD 220), which was developed to facilitate faster writing. Finally, the regular script (or standard script, used from ca. AD 200) was one of the last major styles to develop. Today, these characters are referred to as “traditional Chinese characters”.

From traditional to simplified Chinese characters

In mainland China, the traditional Chinese script was seen as too complicated and archaic. In the 1950s, in order to improve literacy, the government of mainland China issued official documents containing simplified characters and began promoting them for use in daily life. According to Mills (1956), simplified Chinese script differs from traditional characters in several ways: it uses fewer strokes, it has fewer characters in common use, and it merges some characters with similar sound and meaning.

Over time, simplified Chinese characters became the formal and standard way of writing in mainland China, while traditional Chinese characters are still used in Taiwan and Hong Kong. Although simplified characters have helped to improve literacy (Xie, 2009), there is still discussion about reintroducing traditional characters in mainland China. On 9 April 2009, People’s Daily argued that the state should reintroduce traditional Chinese characters in mainland China because the use of simplified characters hinders communication between different Chinese-speaking regions. For example, people from mainland China have difficulty reading documents written in traditional characters, while people from Taiwan have difficulty reading documents written in simplified characters.

The six categories of simplified Chinese characters

Simplified Chinese characters can be divided into six categories (in Mandarin: liu shu, “six scripts”). This classification is known from Xu Shen’s dictionary (in Mandarin: shuowen jiezi). The three main categories are pictographs (in Mandarin: xiangxing), ideographs (in Mandarin: zhishi), and phono-semantic compounds (in Mandarin: xingsheng). The other three categories are compound ideographs (in Mandarin: huiyi), transfer characters (in Mandarin: zhuanzhu), and rebus (phonetic loan) characters (in Mandarin: jiajie).

The most common of the six categories is the phono-semantic compound, in which a semantic element or radical is combined with a phonetic element whose pronunciation is identical or similar to that of the whole character. For example, the character for ‘chopstick’ (in Mandarin: kuai, 筷) is composed of the radical ‘bamboo’ (in Mandarin: zhu, 竹) and the phonetic element ‘kuai’ (快). The combination is pronounced ‘kuai’ and refers to an object made of bamboo. According to the Institute of Language Teaching and Research (Dictionary, 1986), over 70% of Chinese words are compounds.

In addition, there are roughly 600 Chinese pictographs, characters that directly depict the objects they represent (Z. Zhou, 2014). These pictographs, for instance ‘person’ (in Mandarin: ren, 人), are generally the oldest Chinese characters. Ideographs were developed after pictographs and are often intended to symbolize abstract concepts. For example, ‘one’ is indicated by one horizontal line (in Mandarin: yi, 一) and ‘two’ by two horizontal lines (in Mandarin: er, 二). Compound ideographs combine two or more pictographic or ideographic parts. For example, the compound ideographs for grove (in Mandarin: lin, 林) and forest (in Mandarin: sen, 森) consist of two and three copies, respectively, of the character for wood (in Mandarin: mu, 木). The last two types of characters are not frequently used. Transfer characters are interchangeable with other characters that have the same radical and a similar etymology. For example, the Chinese word for father can be written either “父” (in Mandarin: fu) or “爸” (in Mandarin: ba). Finally, rebus characters are characters borrowed from another homophonous or near-homophonous morpheme (i.e., one with a different meaning but the same or a similar pronunciation), comparable to “4” as a rebus for English “for” (Gu, 2011). For example, ‘forever’ (in Mandarin: yong) is written with the character “永”, borrowed from the homophonous word ‘swim’ (in Mandarin: yong), which is now written “泳”.

Romanizations of Chinese

Romanization refers to the representation of Chinese pronunciation using the Roman (Latin) script. Earlier romanizations of Mandarin Chinese include the Wade-Giles and the Yale systems. The Wade-Giles (Wade) system, developed by Thomas Wade in the mid-19th century, is still in use in Hong Kong and Taiwan and was also popular in Western countries. Because Wade-Giles was created to render the sounds of all the different Chinese dialects (not just Mandarin Chinese), it requires readers to memorize special pronunciations for certain characters. The later Yale system (named after Yale University) uses English spelling conventions to represent Chinese sounds and thus requires no special training. It was widely used in American textbooks until the late 1970s (Benjamin, 1997).

The pinyin system was developed in the 1950s by Chinese linguists and officially adopted in mainland China in 1979. Like the Yale system, pinyin is a romanization of the Beijing dialect of Mandarin in which most letters are pronounced more or less as an English speaker would expect. The name “pinyin” can be translated literally as “spell sound”. In mainland China, pinyin is used to teach the Mandarin pronunciation of Chinese characters from the early stages of primary school, and Chinese books and magazines for children are often annotated with pinyin above the simplified characters.

While pinyin was originally a teaching aid, it is now also used for other purposes. For example, pinyin can be found on traffic signs and billboards. With the spread of computers and mobile devices, pinyin has also become the most popular way to type Chinese characters on a keyboard. Although touchscreen mobile phones also allow users to draw characters, this is more time-consuming, and the device’s ability to recognize the right character depends on the user’s handwriting. Still, handwriting input is preferred by many people born before the 1980s. In addition, a large number of senior citizens never learned either pinyin or standard Mandarin Chinese.
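Because Mandarin Chinese is highly homophonous, a pinyin string typed on a keyboard usually matches several characters, and the user has to pick the intended one. The toy lookup below sketches that selection step. The mini-lexicon, its glosses, and the function names are hypothetical, tones are omitted (as is usual when typing pinyin), and real input methods rely on far larger dictionaries and rank candidates by context.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical mini-lexicon: character -> (toneless pinyin, English gloss).
LEXICON: Dict[str, Tuple[str, str]] = {
    "是": ("shi", "to be"),
    "十": ("shi", "ten"),
    "时": ("shi", "time"),
    "事": ("shi", "matter"),
    "市": ("shi", "city"),
    "筷": ("kuai", "chopsticks"),
    "快": ("kuai", "fast"),
    "块": ("kuai", "piece"),
}

def build_index(lexicon: Dict[str, Tuple[str, str]]) -> Dict[str, List[str]]:
    """Invert the lexicon so each pinyin string maps to its candidate characters."""
    index: Dict[str, List[str]] = defaultdict(list)
    for char, (pinyin, _gloss) in lexicon.items():
        index[pinyin].append(char)
    return index

def candidates(index: Dict[str, List[str]], typed: str) -> List[str]:
    """Return every character the typed pinyin could stand for (empty if none)."""
    return index.get(typed.strip().lower(), [])

index = build_index(LEXICON)
print(candidates(index, "shi"))
print(candidates(index, "kuai"))
```

Even in this eight-character lexicon, typing “shi” yields five candidates, which is the same ambiguity that motivated pairing pinyin stimuli with images.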

In today’s society, pinyin plays an important and unique role in mainland China, but it remains a complement to Chinese characters rather than a replacement for them. From time to time, there is discussion about replacing Chinese characters altogether with pinyin. The American sinologist Victor Henry Mair, for instance, advocates writing Chinese in an alphabetic script (i.e., pinyin) because he sees advantages for Chinese education, computerization, and lexicography (Mair, 1986).

Cognitive processing of written Mandarin Chinese

Reading and writing involve the interpretation and expression of language using specific symbols, which must be learned. This learning involves complex cognitive processes, such as visual, orthographic, phonological, and semantic processing (Tan, Spinks, Eden, Perfetti, & Siok, 2005). One way of distinguishing between the cognitive processing of an alphabetic writing system, like pinyin, and a logographic writing system, like simplified Chinese characters, is to take the perspective of the “dual route” model (Coltheart, 1981; Marshall & Newcombe, 1973). In this perspective, there are two routes that words can take when they are read: the “visual” route, which maps words from their written form (orthography) directly to their meaning (semantics), and the “phonological” route, which first translates the written form into sound (phonology), which is then mapped to meaning in the same way as for spoken words. In an alphabetic writing system, both the phonological route and the direct visual route are in principle available, and, because written symbols map tightly onto sounds, a phonological form can be derived from the visual input without knowing the meaning (Harm & Seidenberg, 2004). In a logographic writing system, such as simplified Chinese characters or Japanese kanji, there is no comparably systematic relation between orthography and phonology, and the only route available is the visual one, mapping form directly to meaning (Hino, Kusunose, Lupker, & Jared, 2013; X. Zhou & Marslen-Wilson, 2000).
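The asymmetry between the two routes can be illustrated with a deliberately simplified toy reader. This is not an implementation of Coltheart’s model: the lexicons, the placeholder letter-to-sound table (each Latin letter simply maps to itself), and all function names are hypothetical. The point is only that an alphabetic string can always be assembled into a sound, even when it has no meaning, whereas a character with no letter-to-sound rules can be read only through lexical lookup.

```python
from typing import Optional, Tuple

# Hypothetical lexicon for the visual route: written form -> meaning.
VISUAL_LEXICON = {"ma": "mother", "妈": "mother", "人": "person"}

# Hypothetical spoken-word lexicon used at the end of the phonological route: sound -> meaning.
SPOKEN_LEXICON = {"ma": "mother", "ren": "person"}

# Placeholder letter-to-sound table: each Latin letter maps to itself.
# A real model would use grapheme-phoneme correspondence rules instead.
LETTER_TO_SOUND = {ch: ch for ch in "abcdefghijklmnopqrstuvwxyz"}

def visual_route(form: str) -> Optional[str]:
    """Map a familiar written form directly to its meaning; None if the form is unknown."""
    return VISUAL_LEXICON.get(form)

def phonological_route(form: str) -> Optional[Tuple[str, Optional[str]]]:
    """Assemble a sound from the letters, then look up its meaning as for a spoken word.

    Returns (sound, meaning), where meaning is None for a pronounceable nonword.
    Returns None when some symbol has no letter-to-sound rule, as for logographic characters.
    """
    if not form or any(ch not in LETTER_TO_SOUND for ch in form):
        return None
    sound = "".join(LETTER_TO_SOUND[ch] for ch in form)
    return sound, SPOKEN_LEXICON.get(sound)

for form in ["人", "ren", "lun"]:
    print(form, visual_route(form), phonological_route(form))
```

Here 人 is understood only via the visual route, “ren” is understood via sound even though it is not in the visual lexicon, and “lun” can be pronounced without being understood.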

From the perspective of the dual route model, one expects these differences to be reflected in the acquisition as well as in the behavioral and neural processing of simplified characters and pinyin. The evidence discussed below points to different contributions of the two scripts to acquisition and to consistent differences in how they are processed.

M. J. Chen and Yuen (1991) compared visual processing in children from mainland China, Taiwan, and Hong Kong. A crucial difference between these groups is that children in mainland China first learn pinyin and then simplified Chinese characters, children in Hong Kong follow the traditional approach without an alphabetic foundation, and children in Taiwan learn traditional Chinese characters in combination with pinyin. In the study by M. J. Chen and Yuen (1991), children performed different tasks using Chinese characters (simplified in mainland China, traditional in Taiwan and Hong Kong). Crucially, in a pseudohomophone naming task, children from mainland China and Taiwan performed better than children from Hong Kong, suggesting that pinyin training helps people pronounce unfamiliar words because it supports the extraction of phonological information from characters.

Other evidence shows that learning Chinese using only pinyin has a negative impact on students’ subsequent acquisition of Chinese. Mushangwe and Chisoni (2015) studied Zimbabwean students acquiring Mandarin Chinese by means of a character recognition survey. Their results showed that using only pinyin in teaching negatively affects vocabulary acquisition, because each Chinese character carries a unique meaning. They suggested that pinyin should always be accompanied by Chinese characters in teaching and should be used only as a supplementary system.

Cao et al. (2013) found that, compared to writing pinyin, writing Chinese characters activated the bilateral superior parietal lobules and bilateral lingual gyri in both a lexical decision task and an implicit writing task. They suggested that writing characters provides a better representation of the visual-spatial structure of the character and its orthography, while writing pinyin, which activated the right inferior frontal gyrus, provides a better connection with phonology.

Y. Chen, Fu, Iversen, Smith, and Matthews (2002) studied reading in Chinese using fMRI. In this study, participants were shown pairs of syllables written either as real Chinese characters in a meaningless combination or in pinyin, and were asked to decide, in a phonological lexical decision task, whether each visually presented pair “sounded like” a real word. To make the right decision, participants had to translate script into sound. The results showed that reading Chinese characters and reading pinyin activate a common brain network including the inferior frontal, middle, and inferior temporal gyri, the inferior and superior parietal lobules, and the extrastriate areas. The authors also found that reading pinyin led to greater activation in the inferior parietal cortex bilaterally, the precuneus, and the anterior middle temporal gyrus, while reading characters led to greater activation in the left fusiform gyrus, the bilateral cuneus, the posterior middle temporal gyrus, the right inferior frontal gyrus, and the bilateral superior frontal gyrus.

In another fMRI study investigating the extent to which different Chinese scripts influence brain activation, Fu, Chen, Smith, Iversen, and Matthews (2002) asked participants to silently read either pinyin or simplified characters. The results showed that some brain regions were involved in processing both pinyin and simplified characters, independently of surface form, while other regions were specifically associated with the processing of one script or the other.

Cao et al. (2013) investigated whether learning to write Chinese in different scripts influences the brain’s reading network. In this study, English-speaking students in an introductory Chinese class were taught Chinese words by seeing the corresponding simplified characters, pinyin, and English translations. After each instruction, they were asked to write down either the simplified character (character-writing condition) or the pinyin (pinyin-writing condition). After the training, fMRI data were collected in a passive viewing task, a lexical decision task, and an “implicit writing” task, all involving simplified characters. Based on the involvement of different brain networks in the different conditions, Cao et al. (2013) suggested that writing characters during learning leads to a better representation of the visual structure of the character and its orthography, and to more interaction with sensorimotor information during character recognition. In addition, they suggested that writing characters during learning leads to higher activation of brain networks involving semantics during recognition, while writing pinyin improves connections to brain networks involving phonology.

Word association and measuring semantic similarity

The differences in acquisition and processing between simplified Chinese characters and pinyin offer a possible basis for more pronounced differences in how words in these two writing systems are experienced. Exploring this in more detail, however, requires a task that can explicitly reveal thought processes.

There are several research approaches for exploring how words are related to other words in the minds of language users, such as collecting and analyzing slips of the tongue and other speech errors, and word association (WA) tests. Word association tests, which were initially used as a psychological tool to study the subconscious mind (Hui, 2011), are among the oldest and most common research tools for revealing thought processes. Word association tasks play an important role in psycholinguistic research, especially in the field of lexical retrieval (Church & Hanks, 1990).

According to Sturrock (2008), word association tasks can be divided into phonetic and semantic association tasks. In a phonetic association task, participants may, for instance, be instructed to come up with words that sound similar to a given stimulus. For example, the words hill and kill are clearly phonetically similar, because they differ only in their first letter. In a semantic association task, participants are instructed to come up with words that they think of after exposure to a target word. In this task, the relation between the target word and its associations is not strictly defined. In traditional linguistic terms, it can take many different forms, such as relatedness, synonymy, antonymy, and hypernymy (Sigurd, 2009). The only thing all these cases have in common is that the words are somehow related in meaning.
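Phonetic similarity of the hill/kill kind can be quantified with a string edit distance; the current study computes phonetic similarity as the Levenshtein distance between phonetic transcriptions. The function below is a standard dynamic-programming implementation of that distance, offered as a sketch. Converting it into a similarity score between 0 and 1 by dividing by the length of the longer string is one common normalization and is an assumption here, not necessarily the one used in the analysis.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    # previous[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b; only two rows are kept in memory.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        previous = current
    return previous[-1]

def phonetic_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string (1.0 = identical)."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))

print(levenshtein("hill", "kill"), phonetic_similarity("hill", "kill"))
print(levenshtein("kuai", "kuan"))
```

The same function applies unchanged to pinyin transcriptions such as “kuai” and “kuan”, which also differ by a single substitution.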

One way to measure semantic similarity without explicitly taking into account the formal relationship between words is rooted in the distributional hypothesis, which states that words that occur in similar contexts tend to be semantically similar (Harris, 1954). A well-known version of the hypothesis was given by Firth (1957), who wrote: “You shall know a word by the company it keeps”. Another version is given by Wittgenstein (1953), who claims that the meaning of a word is defined by the way it is used. More formally, Lenci (2008) states that the degree of semantic similarity between two linguistic terms A and B is a function of the similarity of the linguistic contexts in which A and B can appear.
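The distributional hypothesis can be made concrete by recording, for each word in a corpus, which words occur near it. The sketch below counts context words within a fixed window; the toy corpus, the window size of 2, and the function name are illustrative assumptions, and real models are trained on corpora of many millions of words.

```python
from collections import Counter, defaultdict
from typing import Dict, List

def context_counts(sentences: List[str], window: int = 2) -> Dict[str, Counter]:
    """Count, for every word, the words occurring within `window` positions of it."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, target in enumerate(tokens):
            start, end = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:  # a word is not its own context
                    counts[target][tokens[j]] += 1
    return counts

# Hypothetical toy corpus.
corpus = [
    "the car needs gasoline for the engine",
    "the vehicle needs gasoline for the engine",
    "the car has four wheels",
    "the vehicle has four wheels",
    "the cat drinks milk",
]
counts = context_counts(corpus)
print(sorted(set(counts["car"]) & set(counts["vehicle"])))
print(sorted(set(counts["car"]) & set(counts["cat"])))
```

In this toy corpus, car and vehicle end up with identical sets of context words, while car and cat share only “the”.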

Distributional semantics models (DSMs; Landauer & Dumais, 1997; Schütze, 1998) are implementations of the distributional hypothesis. The aim of these models is to derive numerical vector representations for words based on the contexts in which the words occur in text corpora. With these vector representations, words can be treated as points in a multidimensional space, and the similarity between a pair of words can then be equated to the similarity between the vectors representing them (Mandera, Keuleers, & Brysbaert, 2017). For example, car and vehicle will both often appear with the same words, such as wheels, gasoline, and engine, so the numerical vectors representing the two words will be very similar.
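A common way to compare two such vectors is the cosine of the angle between them, which depends on the direction of the vectors rather than on their length (and hence less on raw word frequency). The paragraph above does not name a specific measure, so cosine similarity is an assumption here; the four-dimensional count vectors over the hypothetical context words wheels, gasoline, engine, and milk are made up for illustration.

```python
import math
from typing import Sequence

def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v: 1 = same direction, 0 = orthogonal."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / norm

# Made-up co-occurrence counts with the context words (wheels, gasoline, engine, milk).
car = [8, 5, 6, 0]
vehicle = [7, 4, 6, 0]
cat = [0, 0, 0, 9]

print(cosine_similarity(car, vehicle))
print(cosine_similarity(car, cat))
```

Car and vehicle point in almost the same direction (a cosine just below 1), whereas car and cat share no contexts here and have a cosine of 0.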

When these vector representations are derived using neural network models, they are usually called word embeddings (Vulić & Mrkšić, 2017; Levy & Goldberg, 2014; Bojanowski, Grave, Joulin, & Mikolov, 2016; Pennington, Socher, & Manning, 2014; Mikolov, Sutskever, Chen, Corrado, & Dean, 2013). In the current study, we will use word embeddings to calculate the semantic similarity between each stimulus and its associations.
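In the present design, each stimulus word is compared with each of a participant’s associations. The sketch below assumes that the embeddings are available as a plain dictionary from word to vector and that cosine similarity is used. The two-dimensional vectors and example words are made up, and returning None for words without a vector is a hypothetical way of handling out-of-vocabulary responses, not the study’s documented preprocessing.

```python
import math
from typing import Dict, List, Optional, Sequence

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return sum(a * b for a, b in zip(u, v)) / norm

def association_similarities(stimulus: str, associations: List[str],
                             embeddings: Dict[str, Sequence[float]]) -> List[Optional[float]]:
    """Similarity of each association to the stimulus; None where a word has no vector."""
    if stimulus not in embeddings:
        return [None] * len(associations)
    target = embeddings[stimulus]
    return [cosine(target, embeddings[w]) if w in embeddings else None
            for w in associations]

# Hypothetical two-dimensional "embeddings", for illustration only.
toy_embeddings = {
    "筷子": [0.9, 0.1],  # chopsticks
    "吃饭": [0.8, 0.3],  # eat a meal
    "碗": [0.7, 0.2],    # bowl
    "快": [0.1, 0.9],    # fast (sounds like the first syllable of 筷子)
}

sims = association_similarities("筷子", ["吃饭", "碗", "未知词"], toy_embeddings)
known = [s for s in sims if s is not None]
print(sims, sum(known) / len(known))
```

With these toy vectors, 吃饭 and 碗 come out close to 筷子, whereas the homophone 快 would not, mirroring the distinction between semantic and phonetic relatedness that the study measures separately.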

The current study

As discussed in the introduction and illustrated in Figure 1, S. Chen (2017) asked native Chinese speakers to provide word associations to images accompanied by either simplified Chinese characters or pinyin. Chen presented images together with the written stimuli to remove ambiguity: Mandarin Chinese is highly homophonous, and a pinyin transcription can therefore correspond to many different words. Presenting the image made it clear which word the written stimulus referred to. Participants provided written word associations either in simplified Chinese characters or in pinyin. The study then compared the semantic and phonetic similarity between stimulus and associations in the two reading conditions and in the two writing conditions. The results showed that when participants read pinyin, the semantic similarity between stimulus and associations was significantly lower than when they read simplified characters, and the corresponding phonetic similarities were significantly higher. However, no differences in either phonetic or semantic similarity could be attributed to the script that participants used to write the associations.

Figure 1. An overview of the comparison between the current study and the S. Chen (2017) study

An important gap in the study by S. Chen (2017) is that its results only allow us to say that the script conditions differ, but not which of them influences thought compared to a neutral situation. A third, control condition is crucial for testing the extent to which script influences word association processes. Therefore, the current study adapts Chen’s study by adding a control condition in which participants will see only images. Unlike the study by S. Chen (2017), the current study will ask participants to write their responses only in simplified Chinese characters. This decision is motivated by the absence of an effect of response script in Chen’s experiment.

To summarize, the main research question of the current study is: “To what extent do associations to images accompanied by words written in simplified characters and pinyin differ from associations to the same images without script?” To answer this question, participants will be asked to write down their associations, in simplified Chinese characters, to stimuli presented in three different conditions: images accompanied by a description in simplified Chinese characters (condition 1), images accompanied by a description in pinyin (condition 2), or images without a description (condition 3).

In order to determine whether the different conditions have an effect on the generated

word associations, we will calculate the semantic similarity and phonetic similarity between

the stimulus and its associations. Importantly, both the character condition and the pinyin

condition will be compared to a control condition. There will be no direct comparison of

the pinyin condition to the simplified character condition, because the thesis explicitly aims to test the influence of each script against a baseline condition in which no written language is presented.

Based on the literature documenting behavioral and neuropsychological differences in processing pinyin and Chinese characters (e.g., Cao et al., 2013; S. Chen, 2017), we have

the following hypotheses (note that since the subcomponents of both hypotheses will be

tested using orthogonal contrasts, the subcomponents are not separated into different

hypotheses):

1. Word associations produced in the Chinese simplified character condition will be

semantically more similar to the stimuli than word associations produced in the

control condition. Word associations produced in the pinyin condition will be

semantically less similar to the stimuli than word associations produced in the

control condition.

2. Word associations produced in the pinyin condition will be phonetically more similar

to the stimuli than word associations produced in the control condition. Word

associations produced in the Chinese simplified character condition will be


phonetically less similar to the stimuli than word associations produced in the control

condition.

An additional difference between the study by S. Chen (2017) and the current one is that

to compute the semantic similarity between stimuli and associations, S. Chen (2017) used

off-the-shelf word embeddings, based on a Wikipedia corpus (Al-Rfou, Perozzi, & Skiena,

2013). For the current study, we will generate new word embeddings based on a corpus of

Chinese movie subtitles (Lison & Tiedemann, 2016), which are more closely related to

typical human dialogue (Brysbaert & New, 2009).


Method

Design

The experiment used a between-subjects design with word script as a factor with

three levels (simplified characters, pinyin, and control), and with the semantic and phonetic similarity between the stimulus and the responses provided by participants as dependent variables.

Participants

Ninety-two Chinese native speakers (46 female, 46 male) took part in the study.

Their age ranged from 16 to 80 years (M = 33.63, SD = 11.22). Participants were

randomly assigned to one of the three conditions. At the end of the experiment, there were

30 participants in the character condition, 30 participants in the pinyin condition and 32

participants in the control (images-only) condition. Since the study required native Chinese speakers, participants were recruited in Chongqing, China, with help from the author's family members.

Procedure

The experiment was conducted on paper (A4 paper size). The bundle provided to

each participant contained 7 pages. The first two pages consisted of general information

about the experiment and a consent form in English and in Chinese (see Appendices A, B, and C). The participants were instructed to read the consent form carefully and to sign it.

We also asked participants to fill in their age and gender. They were then instructed that

they would be presented with a list of stimuli for each of which they would need to write

three associated words, in Chinese simplified characters. In the character and pinyin

conditions, a picture with the intended meaning of each stimulus was displayed together

with the written description; in the control condition, only these images were displayed.


Participants were asked to finish the experiment within 30 minutes. After finishing

the experiment, participants had to hand in the answer sheet. The participants performed

the experiment independently and were unaware of the other conditions. An example of

the instructions, the informed consent form, and the answer sheets can be found in the

Appendix.

Materials

As shown in Figure 2, each trial consisted of a row with four columns. The leftmost

column was reserved for the stimulus, while the three columns next to the stimulus were

reserved for the answers provided by the participants. Above each row, an instruction

sentence in Chinese was displayed.

Figure 2. A fragment of the answer sheet, presenting ‘tree’ (shu) in the pinyin condition

(Translation of the instruction: “Please write down three associated Chinese characters or words based on the pinyin and the picture”)

Images for the stimuli were selected based on the stimuli from an ongoing project

involving the thesis supervisor. To fit each image into its block, we first reduced it to 250 x 150 (mm) and, in the character and pinyin conditions, added the corresponding script to the upper left corner of the image. Three example images that were used in the different conditions are shown in Figure 3.

Before the experiment, we asked several volunteers to perform a pilot test in order to determine whether the envisaged number of stimuli would fit in the time allotted for the experiment. We prepared 30 stimuli and asked three participants to provide three associated words in Chinese simplified character form. All three participants spent nearly the same amount of time (30 minutes) to finish the experiment. In the control condition (i.e., images only), participants spent slightly more time than in the other two conditions. Since participants also needed to read the instructions and write down their age and gender, we decided to reduce the number of stimuli to 25.

Figure 3. An example of the stimulus ‘tree’ (树). (a) Simplified character condition; (b) pinyin condition; (c) control condition.

As discussed before, there are different types of Chinese simplified characters. Because about 80% to 90% of Chinese simplified characters are semantic-phonetic compounds (Kang, 1993; Zhu, 1987), we only used semantic-phonetic compounds in the experiment. Semantic-phonetic compound characters consist of two parts: a semantic element and a phonetic element.

According to Feldman and Siok (1999), approximately 75% of semantic-phonetic

compound characters have their semantic radicals on the left. For compound characters,

most left radicals function as semantic elements while the right radicals function as the

phonological elements. We selected the characters using three criteria: 1) a semantic-phonetic compound; 2) preferably a left-right structure, with the left radical as the semantic element and the right radical as the phonological element; and 3) sufficient familiarity to participants. To satisfy the final criterion, we inspected the word frequency of each potential stimulus, so that only frequently used characters would be presented. Based on a corpus of film and television subtitles (Cai & Brysbaert, 2010), the frequency rank of the stimuli in our study ranged from 255 to 1889 (out of 5939) (M = 869, SD = 449).

Measurements

Semantic similarity. The semantic similarity between a stimulus and a response was always computed between the stimulus in its simplified character form, even in conditions where participants did not see that form, and the association, which was always written in characters. Because our approach to generating the word embeddings differed from that of S. Chen (2017), differences between her results and ours could stem from the choice of semantic space. Therefore, we computed semantic similarity in both ways.

First, we generated word embeddings with the word2vec model (Mikolov et al., 2013) in Gensim (Řehůřek & Sojka, 2010), using its default settings (vector size: 100, window: 5, algorithm: CBOW) with 4 worker threads, on the Opensubtitles2018 corpus (Lison & Tiedemann, 2016), which contains 191.4M words from film and television subtitles in simplified Chinese characters.² To be able to compare our results to those of S. Chen (2017), we also used the off-the-shelf semantic spaces for Chinese provided by the Polyglot project (Al-Rfou et al., 2013) that were used in her study. Polyglot contains word embeddings for 117 languages, based on Wikipedia articles, and these embeddings were generated using the curriculum learning method (Bengio, Louradour, Collobert, & Weston, 2009).
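The text above does not name the measure used to compare embedding vectors. The sketch below assumes cosine similarity, the usual choice for word2vec-style spaces (Gensim's `KeyedVectors.similarity` returns this value), and shows how the stimulus, always in its character form, could be compared with each written association. The `embeddings` dictionary is a hypothetical stand-in for a trained Opensubtitles or Polyglot space.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (range -1 to 1)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def stimulus_similarities(
    embeddings: Dict[str, Sequence[float]],
    stimulus: str,
    associations: List[str],
) -> List[Optional[float]]:
    """Similarity of each association to the stimulus (both in character form).

    Associations missing from the (hypothetical) vocabulary yield None
    instead of a score; a missing stimulus raises KeyError.
    """
    target = embeddings[stimulus]
    return [
        cosine_similarity(target, embeddings[word]) if word in embeddings else None
        for word in associations
    ]
```

With a trained Gensim model, the same number would come from `model.wv.similarity(stimulus, word)`. Returning `None` for out-of-vocabulary associations is a choice made for this sketch; how such responses were handled is not described above.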

Phonetic similarity. Before computing phonetic similarity, we converted all stimuli and responses from their simplified character representation into pinyin, using the open source software pinyin 0.4.0.³ Then, we computed the phonetic similarity between each stimulus and its associations using the Levenshtein distance (Levenshtein, 1966). The Levenshtein distance quantifies the distance between two strings as the minimal number of deletions, insertions, or substitutions required to transform one string into another. A similarity is obtained from this distance as (l − d)/l, where d is the Levenshtein distance between the two strings and l is the length of the longer of the two strings. Because d can never exceed l, this similarity ranges from 0 to 1, with 1 for identical strings.

2 http://www.opensubtitles.org/

3 https://pypi.org/project/pinyin/
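The phonetic measure described above can be sketched in a few lines of Python: a standard dynamic-programming Levenshtein distance followed by the (l − d)/l normalization. The sketch assumes the pinyin strings have already been produced by the conversion step and carry tone numbers (e.g., `shu4`); whether tones were kept during conversion is not stated, so that format is an assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimal number of insertions, deletions, or substitutions turning a into b."""
    # Dynamic programming over prefixes, keeping only the previous row.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def phonetic_similarity(a: str, b: str) -> float:
    """(l - d) / l, with d the Levenshtein distance and l the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical (an assumption)
    return (longest - levenshtein(a, b)) / longest
```

For example, `shu4` and `shu1` differ by one substitution over four characters, which gives a similarity of 0.75.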


Results

Descriptive analysis

We collected results from 92 participants. Due to an error in generating the response sheet from the stimuli, the stimulus ‘鼓’ (gu3) was included twice. Therefore, we removed the responses to the second presentation of ‘鼓’ from each response sheet, which reduced the total number of stimuli to 24.

For each pair of stimulus and response, we calculated semantic similarity in two semantic spaces: Opensubtitles (Gensim) and Wikipedia (Polyglot). With 92 participants, 24 stimuli, and three associations per stimulus, 6624 observations were expected; because some participants failed to provide three associations for some stimuli, 75 observations were missing, leaving 6549. Figure 4 summarizes the semantic similarity obtained in both spaces. Surprisingly, the average semantic similarity in the pinyin condition was the highest in both spaces.

Mixed-Effects Analysis

In order to answer the research question, we compared both the character condition and the pinyin condition to the control condition, using separate linear mixed-effects models for each semantic space and for phonetic similarity.
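To make the treatment-contrast coding concrete, the sketch below builds the fixed-effects predictors for a single observation, with the control condition as the reference level, so that the character and pinyin coefficients are each a comparison with control. It assumes that response order and presentation order entered the models as numeric covariates, which the single numerator degree of freedom for each in Table 1 suggests. The column names are hypothetical. In R's lme4/lmerTest notation, which the Type III tables with Satterthwaite's method suggest were used, the preferred model would read roughly `similarity ~ condition * response_order * presentation_order + (1 + response_order | participant) + (1 | stimulus)`; this formula is a reconstruction, not a quotation.

```python
from itertools import combinations

# Treatment (dummy) coding with the control condition as the reference level:
# each non-reference condition gets an indicator column; control is all zeros.
CONDITIONS = ("character", "pinyin", "control")


def design_row(condition: str, response_order: int, presentation_order: int) -> dict:
    """Fixed-effects predictors for one stimulus-response observation.

    Includes main effects plus all two- and three-way interactions among
    condition, response order (1-3) and presentation order (1-24), with the
    two order variables entered as numeric covariates (an assumption).
    """
    if condition not in CONDITIONS:
        raise ValueError(f"unknown condition: {condition}")
    main = {
        "char": float(condition == "character"),
        "pinyin": float(condition == "pinyin"),
        "resp": float(response_order),
        "pres": float(presentation_order),
    }
    row = {"(Intercept)": 1.0, **main}
    # Interaction columns are products of predictors; the two condition
    # dummies are mutually exclusive, so they are never multiplied together.
    for r in (2, 3):
        for combo in combinations(main, r):
            if "char" in combo and "pinyin" in combo:
                continue
            value = 1.0
            for name in combo:
                value *= main[name]
            row[":".join(combo)] = value
    return row
```

This yields 12 coefficients: an intercept, two for condition, one each for the two order variables, five two-way interaction terms, and two three-way terms.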

The fixed-effects part of all three models contained main effects of condition (i.e., the script that was read), response order (1 to 3, from the first response to the last), and presentation order (1 to 24), as well as all second- and third-order interactions between these variables. All models used random intercepts for participants and stimuli. To check whether random slopes were necessary, we compared this base model with a model that added a by-participant random slope for response order (2nd model) and with a model that added random slopes for response order by both participant and stimulus (3rd model). ANOVA model comparison tests showed that, in all cases, the 2nd model should be preferred over the base model and the 3rd model. Therefore, all analyses reported below used the random-effects structure of the 2nd model. Degrees of freedom and p-values were calculated using Satterthwaite’s method (Satterthwaite, 1946).

Figure 4. Box-and-whisker plot showing the distribution of semantic similarity between stimuli and associations in the three conditions (1: Chinese simplified characters, 2: pinyin, 3: control), using two different vector spaces (Opensubtitles [Gensim] vs. Wikipedia [Polyglot]). The bottom and top of each box show the first and third quartiles, and the line in the middle of the box is the median. The whiskers extend up to 1.5 times the interquartile range beyond the box. The blue dots show outliers beyond this range.

Table 1 summarizes the results of the analysis of semantic similarity in the Opensubtitles (Gensim) space. As shown in Table 1, the overall effect of condition was significant. Analysis with treatment contrasts showed that semantic similarity was significantly lower in the Chinese character condition than in the control condition (t(171) = −3.341, p = .001). Semantic similarity was also significantly affected by response order and presentation order, by the interaction between the Chinese character condition and response order (t(216.2) = 3.432, p < .001), and by the interaction between response order and presentation order. Figure 5a shows the partial effects of reading condition and response order on semantic similarity in the Opensubtitles (Gensim) semantic space.

Table 1

Summary of analysis of fixed effects on semantic similarity. Type III analysis of variance

table with Satterthwaite’s method, Opensubtitles (Gensim) semantic space

Sum Sq Mean Sq NumDF DenDF F value Pr(>F)

Condition 0.60 0.30 2.00 170.89 7.62 0.0007

Response order 3.75 3.75 1.00 242.23 95.80 0.0000

Presentation order 1.59 1.59 1.00 6033.47 40.52 0.0000

Condition × Response order 0.50 0.25 2.00 242.22 6.39 0.0020

Condition × Presentation order 0.01 0.01 2.00 6038.76 0.19 0.8289

Response order × Presentation order 0.24 0.24 1.00 6031.93 6.01 0.0143

Condition × Response order ×


Table 2 summarizes the results of the analysis of semantic similarity in the Wikipedia (Polyglot) semantic space. The pattern of results was similar to the one found in the Opensubtitles (Gensim) semantic space. The overall effect of condition was significant. Analysis with treatment contrasts showed that semantic similarity was significantly lower in the Chinese character condition than in the control condition (t(271) = −3.110, p = .002), but there was no difference in semantic similarity between the pinyin condition and the control condition. Semantic similarity was also influenced by response order and presentation order, and by the interaction between the Chinese character condition and response order (t(358) = 2.517, p = .012). Figure 5b shows the partial effects of reading condition and response order on semantic similarity in the Wikipedia (Polyglot) semantic space.

Table 2

Summary of analysis of fixed effects on semantic similarity. Type III analysis of variance

table with Satterthwaite’s method, Wikipedia (Polyglot) semantic space

Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
Condition 0.53 0.26 2.00 222.68 4.79 0.0092

Response order 1.65 1.65 1.00 360.80 29.88 0.0000







As Table 3 shows, the effect of condition on phonetic similarity was not significant. Analysis with treatment contrasts showed that neither reading Chinese simplified characters nor reading pinyin resulted in differences from the control condition. Phonetic similarity was, however, significantly influenced by response order, and there was an interaction effect between condition and response order. Figure 5c shows the partial effects of reading condition and response order on phonetic similarity.


Table 3

Summary of analysis of fixed effects on phonetic similarity. Type III analysis of variance

table with Satterthwaite’s method

Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
Condition 0.07 0.03 2.00 161.96 2.68 0.0718

Response order 0.58 0.58 1.00 215.64 45.94 0.0000









Figure 5. Partial effects plot showing the interaction between condition and response order.

Lines show condition: (1) Simplified characters; (2) Pinyin; (3) Control. Panels show

measure: (a) Semantic similarity based on Opensubtitles (Gensim); (b) Semantic similarity

based on Wikipedia (Polyglot); (c) Phonetic similarity.


Discussion and Conclusion

The current study tried to answer the question “To what extent do associations to

images accompanied by words written in simplified characters and pinyin differ from

associations to the same images without script?”. We asked participants in this study to

provide three associations to images accompanied by a description in simplified Chinese

characters, a description in pinyin, or no written description. The results from the

experiment were in line with a previous study by S. Chen (2017) in terms of the existence

of an effect of word script on semantic similarity. However, the direction of the effect differed: word associations produced in the character condition had lower semantic similarity to the stimuli than those produced in the control condition, which itself did not differ from the

pinyin condition. Although we did not find statistically unambiguous effects of script type

on the phonetic similarity between stimulus and associations, the results point to a

decrease in phonetic similarity for associations to stimuli accompanied by simplified

characters compared to control stimuli.

Principal findings

The aim of the current study was to gain a deeper understanding of the relation

between a language’s written form and the way its users process that language. Because

Mandarin Chinese can be written using either Chinese simplified characters or pinyin, it

provides a unique tool for testing the effect of writing systems.

The results from this experimental study showed that Chinese native speakers produced less semantically similar word associations when they responded to images accompanied by a written description in Chinese simplified characters. The effect did not appear for stimuli presented in pinyin. This pattern contradicts hypothesis 1, which predicted higher semantic similarity in the character condition and lower semantic similarity in the pinyin condition than in the control condition.

A potential explanation for why the pinyin stimuli did not affect semantic similarity may lie in the way in which both scripts are commonly used. As we discussed before, Chinese speakers predominantly use Chinese simplified characters in their daily life. Even though pinyin is used as an intermediate step in writing, such as when typing Chinese characters on mobile phones, reading pinyin without the corresponding Chinese characters is an unusual task for Chinese speakers. In this sense, the pinyin stimuli may have had less effect on word association processes because they are processed less ‘automatically’ and are therefore more likely to be ignored by participants. This could have led participants to pay less attention to the written part of the stimulus in the pinyin condition. In the character condition, on the other hand, the character part of the stimulus is more likely to be processed automatically, so that it contributes to association processes independently of the image part of the stimulus. One way to test this explanation is to conduct an experiment comparing a condition in which pinyin occurs together with an image (condition 1), a condition in which pinyin occurs without an image (condition 2), and a condition in which characters occur without an image (condition 3). If the above explanation is correct, condition 2 should force participants to pay attention to the pinyin part of the stimulus and should result in associations that differ from those in condition 1 but resemble those in condition 3.

When we started this study, we were aware that the semantic space that was used in

the study by Chen had a shortcoming: Wikipedia pages use more formal language than we

use in daily life. A smaller issue was that the spaces from the Polyglot project were

constructed using the curriculum learning method (Bengio et al., 2009), which is not as

widely used as a training method. Therefore, we decided to make a semantic space using

data from film and television subtitles, which are more closely related to typical human

conversation, and to use Gensim to generate the semantic space using the CBOW training

method with the word2vec (Mikolov et al., 2013) algorithm. At the same time, we wanted

to be able to rule out that differences between our results and those of S. Chen (2017)

would be due to differences in the semantic space we used. Therefore, we also ran analyses

using the same semantic space that was used in the study by Chen. Both spaces showed a very similar pattern of effects, which suggests that (1) the results do not depend on which of the two semantic spaces we chose and (2) the results are consistent across two quite different language varieties (film subtitles vs. Wikipedia articles) and training methods.

Although hypothesis 2 predicted that phonetic similarity would be higher in the pinyin condition and lower in the character condition than in the control condition, the results did not confirm this. Still, the results are not inconsistent with the study by S. Chen (2017), as the difference in phonetic similarity between the simplified character condition and the control condition was marginally significant.

Throughout all analyses, we found a significant main effect of response order, which

can clearly be attributed to a general decrease in both semantic and phonetic similarity

from the first association to the last one. For semantic similarity, the interaction effect of

response order and condition showed that this decrease was more pronounced for the

second and third association than for the first one. We also found significant main effects

for presentation order on semantic similarity. Strangely, when measured using the

Opensubtitles (Gensim) space, the overall semantic similarity of the associations went

down as participants proceeded in the experiment, but the opposite happened when using

the Wikipedia (Polyglot) space. At this point, we cannot offer a reasonable explanation for this finding, because doing so would at least require applying both training methods to both corpora.

Finally, a significant interaction between presentation order and response order on

semantic similarity was present in the analysis using the Opensubtitles (Gensim) space,

showing that the effect of presentation order was strongest for the first association and

became gradually less pronounced for subsequent associations.


Limitations and Recommendations

Some limitations of the study are worth pointing out. In comparison with the

previous study (S. Chen, 2017), the current study only included semantic-phonetic

compounds. It is possible that those characters activated both semantic and phonetic

associations at the same time, while characters with only semantic radicals primarily lead

to semantic associations. Future studies could investigate to what extent the type of character influences word association processes.

For the character condition, the results showed that word associations had lower semantic similarity to the stimuli than in the control condition, but the associations in the pinyin condition did not differ from those in the control condition. As we discussed in the second chapter, Chinese characters can be seen as the standard representation of the Chinese language, while pinyin can be seen as a non-standard representation. We suggest that a future study could apply the same research approach to another language.

For example, in Korean, Hanja (Chinese characters) can be seen as the non-standard

representation, while the standard representation is Hangul (a phonetic system). Such an

experiment could test whether the effects remain when Chinese characters are the

non-standard representation rather than the standard.

The participants were mainly from the southern part of China, while pinyin is based

on the pronunciation in Beijing and China’s Northern Mandarin dialects. It is possible that

these participants were more likely to ignore the pinyin and therefore generate less

phonetically similar associations in the pinyin condition. Therefore, a recommendation for

future research is to compare this group with participants from Northern China, where the

dialect is closer to standard Mandarin Chinese.

Finally, we suggest that future research can implement a different way of computing

similarity of Chinese characters, based on the radicals and strokes in the character. The

more radicals and strokes are shared by two characters, the higher the similarity of those

two characters would be. Another method could use the visual similarity between Chinese


characters, as proposed by C.-L. Liu et al. (2011). They adopted and extended the basic

concepts of Cangjie, a well-established Chinese input method. The Cangjie method defines 24 basic

elements in Chinese characters and the rules to decompose Chinese characters into these

basic elements (M. Liu, Rus, Liao, & Liu, 2016). The similarity between characters could

then be based on the total number of matched elements between two characters or on the

structure and the location of the matched elements.
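As a minimal sketch of the first variant (counting matched elements), the function below treats each character as a multiset of components and divides the number of shared components by the size of the larger decomposition, analogous to the (l − d)/l normalization used for phonetic similarity. The entries in `DECOMPOSITIONS` are simplified, hypothetical examples chosen for illustration; they are not Cangjie codes, and a real implementation would take its components from a Cangjie table or from the encoding of C.-L. Liu et al. (2011). The sketch ignores the structure and location of the matched elements.

```python
from collections import Counter
from typing import Sequence


def component_similarity(parts_a: Sequence[str], parts_b: Sequence[str]) -> float:
    """Matched components divided by the size of the larger decomposition (0 to 1)."""
    a, b = Counter(parts_a), Counter(parts_b)
    matched = sum((a & b).values())  # multiset intersection: min count per component
    larger = max(sum(a.values()), sum(b.values()))
    return matched / larger if larger else 1.0  # two empty inputs count as identical


# Hypothetical, simplified decompositions for illustration only (not Cangjie codes).
DECOMPOSITIONS = {
    "树": ["木", "又", "寸"],
    "村": ["木", "寸"],
    "林": ["木", "木"],
}

if __name__ == "__main__":
    for x, y in [("树", "村"), ("林", "村")]:
        print(x, y, round(component_similarity(DECOMPOSITIONS[x], DECOMPOSITIONS[y]), 3))
```

Under these illustrative decompositions, 树 and 村 share two of three components (similarity 2/3), whereas 林 and 村 share one of two (similarity 0.5).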

Overall, our research illustrates how word association processes can provide insight into the relation between written language and thought. We believe this is a promising approach that deserves more attention from researchers.

Conclusion

The current study tried to gain a deeper understanding of the relation between

written language and thought processes by taking an experimental approach using word

associations. The results of the current study showed that written language can influence word associations, but they were only partially in line with our initial hypotheses. On top of previous evidence of behavioral and neuropsychological differences in processing Chinese characters and pinyin, this study suggests that different writing systems can lead to cognitively different ways of representing language. This implies that replacing Chinese characters with pinyin, or vice versa, could result in the loss of a unique dual way of representing language and reduce the richness of the Chinese language system.


References

Al-Rfou, R., Perozzi, B., & Skiena, S. (2013, August). Polyglot: Distributed word

representations for multilingual NLP. In Proceedings of the seventeenth conference on

computational natural language learning (pp. 183–192). Sofia, Bulgaria: Association

for Computational Linguistics. Retrieved from

http://www.aclweb.org/anthology/W13-3520

Al-Samarrai, P. D. N. (2007). Language and culture a philosophical hypothesis. Journal of

Tikrit University for the Humanities, 14(4), 434–447.

Bengio, Y., Louradour, J., Collobert, R., & Weston, J. (2009). Curriculum learning. In

Proceedings of the 26th annual international conference on machine learning (pp.

41–48).

Benjamin, A. (1997). History and prospect of Chinese romanization. Chinese

Librarianship.

Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2016). Enriching word vectors with

subword information. arXiv preprint arXiv:1607.04606.

Brysbaert, M., & New, B. (2009). Moving beyond Kučera and Francis: A critical evaluation

of current word frequency norms and the introduction of a new and improved word

frequency measure for American English. Behavior research methods, 41(4), 977–990.

Cai, Q., & Brysbaert, M. (2010). SUBTLEX-CH: Chinese word and character frequencies

based on film subtitles. PLoS ONE, 5(6), e10729.

Cao, F., Vu, M., Chan, L., Ho, D., Lawrence, J. M., Harris, L. N., … Perfetti, C. A. (2013).

Writing affects the brain network of reading in Chinese: A functional magnetic

resonance imaging study. Human Brain Mapping, 34(7), 1670–1684.

Chen, M. J., & Yuen, J. C.-K. (1991). Effects of pinyin and script type on verbal

processing: Comparisons of China, Taiwan, and Hong Kong experience. International

Journal of Behavioral Development, 14(4), 429–448.

Chen, S. (2017). Does script shape thought? : the effects of Chinese simplified character


and Pinyin scripts on Chinese native speakers’ reading and writing production

(Master’s thesis, Tilburg University). Retrieved from

http://arno.uvt.nl/show.cgi?fid=142435

Chen, Y., Fu, S., Iversen, S. D., Smith, S. M., & Matthews, P. M. (2002). Testing for dual

brain processing routes in reading: a direct contrast of Chinese character and pinyin

reading using fMRI. Journal of Cognitive Neuroscience, 14(7), 1088–1098.

Church, K. W., & Hanks, P. (1990). Word association norms, mutual information, and

lexicography. Computational linguistics, 16(1), 22–29.

Coltheart, M. (1981). Disorders of reading and their implications for models of normal

reading. Visible language, 15(3), 245.

Dictionary, M. C. F. (1986). Modern Chinese frequency dictionary. Beijing: Beijing

Language Institute Publisher.

Feldman, L. B., & Siok, W. W. (1999). Semantic radicals contribute to the visual

identification of Chinese characters. Journal of Memory and Language, 40(4),

559–576.

Firth, J. R. (1957). A synopsis of linguistic theory, 1930-1955. Studies in linguistic

analysis.

Fu, S., Chen, Y., Smith, S., Iversen, S., & Matthews, P. (2002). Effects of word form on

brain processing of written Chinese. NeuroImage, 17(3), 1538–1548.

Gu, S. (2011). A cultural history of the Chinese language. McFarland.

Guan, C. Q., Liu, Y., Chan, D. H. L., Ye, F., & Perfetti, C. A. (2011). Writing strengthens

orthography and alphabetic-coding strengthens phonology in learning to read

Chinese. Journal of Educational Psychology, 103(3), 509.

Harm, M. W., & Seidenberg, M. S. (2004). Computing the meanings of words in reading:

cooperative division of labor between visual and phonological processes.

Psychological review, 111(3), 662.

Harris, Z. S. (1954). Distributional structure. Word, 10(2-3), 146–162.


Hino, Y., Kusunose, Y., Lupker, S. J., & Jared, D. (2013). The processing advantage and

disadvantage for homophones in lexical decision tasks. Journal of Experimental

Psychology: Learning, Memory, and Cognition, 39(2), 529.

Hui, L. (2011). An investigation into the L2 mental lexicon of Chinese English learners by

means of word association. Chinese journal of applied linguistics, 34(1), 62–76.

Hussein, B. A.-S. (2012). The Sapir-Whorf hypothesis today. Theory and Practice in

Language Studies, 2(3), 642–646.

Kang, J. (1993). Analysis of semantics of semantic-phonetics compound characters in

modern Chinese. Information analysis of usage of characters in modern Chinese,

68–83.

Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato’s problem: The latent

semantic analysis theory of acquisition, induction, and representation of knowledge.

Psychological review, 104(2), 211.

Lenci, A. (2008). Distributional semantics in linguistic and cognitive research. Italian

journal of linguistics, 20(1), 1–31.

Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions, and

reversals. In Soviet physics doklady (Vol. 10, pp. 707–710).

Levy, O., & Goldberg, Y. (2014). Dependency-based word embeddings. In Proceedings of

the 52nd annual meeting of the association for computational linguistics (volume 2:

Short papers) (Vol. 2, pp. 302–308).

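Lison, P., & Tiedemann, J. (2016). OpenSubtitles2016: Extracting large parallel corpora

from movie and TV subtitles. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). Portorož, Slovenia: European Language Resources Association.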

Liu, C.-L., Lai, M.-H., Tien, K.-W., Chuang, Y.-H., Wu, S.-H., & Lee, C.-Y. (2011).

Visually and phonologically similar characters in incorrect Chinese words: Analyses,

identification, and applications. ACM Transactions on Asian Language Information

Processing (TALIP), 10(2), 10.

Liu, M., Rus, V., Liao, Q., & Liu, L. (2016). Encoding and ranking similar Chinese


characters (Tech. Rep.). Chongqing University. Accessed online 08/2018.

Mair, V. H. (1986). The need for an alphabetically arranged general usage dictionary of Mandarin Chinese: A review article of some recent dictionaries and current lexicographical projects (No. 1). Order from Dept. of Oriental Studies, University of Pennsylvania/CU.

Mandera, P., Keuleers, E., & Brysbaert, M. (2017). Explaining human performance in psycholinguistic tasks with models of semantic similarity based on prediction and counting: A review and empirical validation. Journal of Memory and Language, 92, 57–78.

Marshall, J. C., & Newcombe, F. (1973). Patterns of paralexia: A psycholinguistic approach. Journal of Psycholinguistic Research, 2(3), 175–199.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems (pp. 3111–3119).

Mills, H. C. (1956). Language reform in China: Some recent developments. The Journal of Asian Studies, 15(4), 517–540.

Mushangwe, H., & Chisoni, G. (2015). A critical analysis of the use of pinyin as a substitute of Chinese characters. Journal of Language Teaching and Research, 6(3), 685–694.

Penhallurick, R. (2010). Studying the English language. Macmillan International Higher Education.

Pennington, J., Socher, R., & Manning, C. (2014). GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1532–1543).

Pinker, S. (1995). Language acquisition. Language: An Invitation to Cognitive Science, 1, 135–182.


Řehůřek, R., & Sojka, P. (2010). Software framework for topic modelling with large corpora.

Roberts, J. (1999). A concise history of China. Harvard University Press. Retrieved from https://books.google.nl/books?id=gWKDgM3_z-oC

Satterthwaite, F. E. (1946). An approximate distribution of estimates of variance components. Biometrics Bulletin, 2(6), 110–114.

Schütze, H. (1998). Automatic word sense discrimination. Computational Linguistics, 24(1), 97–123.

Shen, H. H., & Ke, C. (2007). Radical awareness and word acquisition among nonnative learners of Chinese. The Modern Language Journal, 91(1), 97–111.

Sigurd, B. (2009). Computer simulation of word associations and crossword solving. Working Papers in Linguistics, 45, 157–166.

Sturrock, J. (2008). Structuralism. John Wiley & Sons.

Tan, L. H., Spinks, J. A., Eden, G. F., Perfetti, C. A., & Siok, W. T. (2005). Reading depends on writing, in Chinese. Proceedings of the National Academy of Sciences of the United States of America, 102(24), 8781–8785.

Tong, X., Tong, X., & McBride, C. (2017). Radical sensitivity is the key to understanding Chinese character acquisition in children. Reading and Writing, 30(6), 1251–1265.

Vulić, I., & Mrkšić, N. (2017). Specialising word vectors for lexical entailment. arXiv preprint arXiv:1710.06371.

Whorf, B. L. (1956). Language and thought. Cambridge: MIT Press.

Whorf, B. L. (2012). Language, thought, and reality: Selected writings of Benjamin Lee Whorf.

Wittgenstein, L. (1953). Philosophische Untersuchungen, translated by Elizabeth Anscombe as Philosophical Investigations. Oxford: Basil Blackwell.

Xie, J. (2009). Guanyu fantizi yu jiantizi de ruogan sikao. Yanjiang Xueyuan Xuebao, 4, 45–49.

Zhou, X., & Marslen-Wilson, W. (2000). The relative time course of semantic and phonological activation in reading Chinese. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26(5), 1245.

Zhou, Z. (2014). The six principles of Chinese writing and their application to design as design idea. Studies in Literature and Language, 8(3), 84.

Zhu, Y. (1987). Analysis of cuing functions of the phonetic in modern China. Unpublished manuscript, East China Normal University.


Appendices

Answer sheets prepared for participants to write down their responses (see the following pages).

Appendix A: Answer sheets for the simplified character condition

Appendix B: Answer sheets for the pinyin condition

Appendix C: Answer sheets for the control condition

Appendix A: Example answer sheet, character condition

Subject Information and Consent Form

Welcome to the experiment! (For the Chinese translation, see the reverse side.)

This document gives you information about the study “The effects of word form on language processing”. Before the study begins, it is important that you learn about the procedure followed in this study and that you give your informed consent for voluntary participation. Please read this document carefully.

Aim and benefit of the study

The aim of this study is to examine word association processes in the Chinese language. This study is carried out by Fei Teng, a student under the supervision of dr. Emmanuel Keuleers at the department of Cognitive Science and Artificial Intelligence.

Procedure

During this study, you will be asked to write down three associated words in Chinese for each keyword.

Risk

The study does not involve any risks or detrimental side effects.

Duration

The study will last approximately 30 minutes.

Voluntary participation

Your participation is completely voluntary. You can refuse to participate without giving any reason, and you can stop your participation at any time during the study. You can also withdraw your permission to use your experimental data up to 24 hours after the study is finished. None of this will have any negative consequences whatsoever.

Further information

If you want more information about this study, you can ask Fei Teng. If you have any complaints about this study, please contact the supervisor, Emmanuel Keuleers.

I, (NAME), have read and understood this consent form and have had the opportunity to ask questions. I agree to participate voluntarily in this research study carried out by the department of Cognitive Science and Artificial Intelligence of Tilburg University.

Your information

1. Your age: ____ years old
2. Your gender:

☐ Female ☐ Male


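Informed Consent Form

Welcome to this experiment!

This document provides information about the experiment “The effects of word form on language processing”. Before the study begins, we would like you to understand the procedure to be followed in this experiment, and we provide you with this informed consent form for voluntary participation. Please read this document carefully.

Aim and benefit of the study

The aim of this experiment is to study the association processes of Chinese characters and pinyin. This study is carried out by Fei Teng, a student of the department of Cognitive Science and Artificial Intelligence, under the supervision of dr. Emmanuel Keuleers of the same department.

Procedure

In this experiment, you will write down three associated words in Chinese for each keyword.

Risk

The study does not involve any risks or harmful side effects.

Duration

The study will last approximately 30 minutes.

Voluntary participation

Your participation is completely voluntary. You can refuse to participate without giving any reason, and you can stop participating at any time during the experiment. You can also withdraw your permission to use your experimental data up to 24 hours after the experiment ends. None of this will have any negative consequences.

Further information

If you would like more information about this study, you can contact Fei Teng. If you have any complaints about this experiment, please contact dr. Emmanuel Keuleers. I, (NAME), have read and understood this document and have had the opportunity to ask questions. I agree to participate voluntarily in this study conducted by the department of Cognitive Science and Artificial Intelligence of Tilburg University.

Your information

1. Your age: ____ years old
2. Your gender:

☐ Female ☐ Male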


Please write down three Chinese words that you associate with this Chinese word and picture.







Appendix B: Example answer sheet, pinyin condition









Further information

If you want more information about this study, you can ask Fei Teng. If you have any complaints about this study, please contact the supervisor, Emmanuel Keuleers.

I, (NAME), have read and understood this consent form and have had the opportunity to ask questions. I agree to participate voluntarily in this research study carried out by the department of Cognitive Science and Artificial Intelligence of Tilburg University.

Your information


☐ Female ☐ Male












☐ Male


Please write down three Chinese words that you associate with this pinyin word and picture.







Appendix C: Example answer sheet, control condition









Further information

If you want more information about this study, you can ask Fei Teng. If you have any complaints about this study, please contact the supervisor.

I, (NAME), have read and understood this consent form and have had the opportunity to ask questions. I agree to participate voluntarily in this research study carried out by the department of Cognitive Science and Artificial Intelligence of Tilburg University.

Your information


☐ Female ☐ Male












☐ Male


Please write down three Chinese words that you associate with this picture.






