The prosody of formulaic expressions in the IBM/Lancaster Spoken English Corpus

28
e prosody of formulaic expressions in the IBM/Lancaster Spoken English Corpus* Phoebe M. S. Lin City University of Hong Kong is article examines the distribution of the nucleus around selected formulaic expressions in the IBM/Lancaster Spoken English Corpus (SEC). e study reveals the presence of a positional bias such that formulaic expressions found at the end of intonation units are more likely to receive the nucleus than those found at the beginning. Amongst the formulaic expressions located at the end of intonation units, 70 percent have the nucleus assigned to the last lexical word of the expressions. For the remaining cases, the obligatory nuclei are found either on the lexical words immediately preceding the expressions or on the first words, the degree words or the flexible slots within the expressions. e study shows how prosodically annotated corpora may facilitate research on the prosody of formulaic expressions. At the same time, it also raises awareness of the issues confronting this new research avenue. Keywords: speech prosody, formulaic language, idioms, phraseology, prosodically annotated corpora 1. Introduction Formulaic language is one of the fastest growing research areas in linguistics. It has been drawing attention from across linguistic disciplines in recent years because it challenges many theories built on the assumption that a word is the unit of linguistic analysis and language processes. Formulaic language has been given various names in the literature, for example, ‘idiomatic expressions’, ‘multiword expressions’, ‘conventional expressions’, ‘speech formulas’, ‘conversa- tional routines’, ‘prefabricated units’, ‘prefabs’, ‘phraseological units’, ‘collocations’, ‘n-grams’ and ‘lexical bundles’. ere is little consensus over the exact meanings and differences between individual terms and there is no sign that the debate will be resolved soon, so each study tends to provide its own definition for the term of International Journal of Corpus Linguistics 18:4 (2013), 561–588. doi 10.1075/ijcl.18.4.05lin issn 1384–6655 / e-issn 1569–9811 © John Benjamins Publishing Company

Transcript of The prosody of formulaic expressions in the IBM/Lancaster Spoken English Corpus

The prosody of formulaic expressions in the IBM/Lancaster Spoken English Corpus*

Phoebe M. S. LinCity University of Hong Kong

This article examines the distribution of the nucleus around selected formulaic expressions in the IBM/Lancaster Spoken English Corpus (SEC). The study reveals the presence of a positional bias such that formulaic expressions found at the end of intonation units are more likely to receive the nucleus than those found at the beginning. Amongst the formulaic expressions located at the end of intonation units, 70 percent have the nucleus assigned to the last lexical word of the expressions. For the remaining cases, the obligatory nuclei are found either on the lexical words immediately preceding the expressions or on the first words, the degree words or the flexible slots within the expressions. The study shows how prosodically annotated corpora may facilitate research on the prosody of formulaic expressions. At the same time, it also raises awareness of the issues confronting this new research avenue.

Keywords: speech prosody, formulaic language, idioms, phraseology, prosodically annotated corpora

1. Introduction

Formulaic language is one of the fastest growing research areas in linguistics. It has been drawing attention from across linguistic disciplines in recent years because it challenges many theories built on the assumption that a word is the unit of linguistic analysis and language processes. Formulaic language has been given various names in the literature, for example, ‘idiomatic expressions’, ‘multiword expressions’, ‘conventional expressions’, ‘speech formulas’, ‘conversa-tional routines’, ‘prefabricated units’, ‘prefabs’, ‘phraseological units’, ‘collocations’, ‘n-grams’ and ‘lexical bundles’. There is little consensus over the exact meanings and differences between individual terms and there is no sign that the debate will be resolved soon, so each study tends to provide its own definition for the term of

International Journal of Corpus Linguistics 18:4 (2013), 561–588. doi 10.1075/ijcl.18.4.05linissn 1384–6655 / e-issn 1569–9811 © John Benjamins Publishing Company

562 Phoebe M. S. Lin

choice. Following Wray (2002), ‘formulaic language’ is used here as an umbrella term to cover all types of word combinations that are, or appear to be, prefab-ricated (i.e. not generated word for word based purely on grammar rules). The decision to align with Wray’s (2002) psycholinguistically-oriented definition of the term was made due to the fact that prosodic structure is underpinned by psy-cholinguistic and cognitive factors such as information structure (i.e. the distinc-tion between given and new information) and the capacity of short term memory (see Chafe 1987; Lin 2010a, 2012).

As formulaic language is a mixed category covering a diversity of non-rule-based language, it needs to be clarified that this study addresses only a short list of formulaic expressions extracted automatically using Wmatrix (Rayson 2003, 2009) and then filtered manually based on certain criteria (see Section 4). There are clearly many more examples of formulaic language than those covered by this study. Moreover, the list of expressions covered here may appear to fall under different grammatical categories, including phrases, clauses and other sentence fragments. It is true that many formulaic expressions can be analysed from a grammatical point of view (see Heid 2011) and that grammar is often used to predict speech prosody (see Altenberg 1987, 1990; Knowles & Lawrence 1987). The intention of this article, however, is to explore if it is possible to look at speech prosody using an alternative approach that does not involve grammatical analy-sis. This is because, despite the remarkable success of grammar-based approaches to speech prosody prediction, developers of text-to-speech systems that use such approaches have reported efficiency problems due to the “heavy – and perhaps unrealistic – demands” (Altenberg 1990: 286) that the sophisticated grammatical analyses place on the parser. If a stable connection can be found between speech prosody and formulaic expressions, then formulaic expressions may be used to predict speech prosody. This, in turn, may help to reduce natural language pro-cessing (NLP) systems’ dependence on POS tagging and parsing when predicting speech prosody.

Many corpus-based studies have revealed that formulaic language is ubiqui-tous in our everyday speech (Altenberg 1998, De Cock 1998, Erman & Warren 2000, Sinclair 1991). However, relatively little attention has been given to the prosody of formulaic language until recently. As Lin (2012, 2013) shows, speech prosody plays a fundamental role in the acquisition, memory and meaning of formulaic language in first language (L1) acquisition. The contextual meaning of a formulaic expression is encoded in the tone of voice with which the expres-sion is delivered. The positive denotation of That’s great!, for example, can be overturned if the expression is delivered using a sarcastic tone. With spoken communication taking up an estimated 90 percent of an average person’s daily linguistic encounter in the L1 (Ronald Carter 2012, personal communication),

The prosody of formulaic expressions in the IBM/Lancaster SEC 563

we are exposed more often to spoken formulaic expressions than to written for-mulaic expressions. If most of the formulaic expressions we know have been acquired from and are used in speech, the phonological representation of formu-laic expressions should, in theory, play a fundamental role in the lexical storage and retrieval. Citing evidence from child language research (e.g. Peters 1977), Lin (2012) argues that the prosodic forms of formulaic expressions take prece-dence over their segmental forms in the child L1 acquisition process (see Lin 2012, 2013 for further discussions).

The prosody of formulaic language is a wide avenue of research that awaits exploration. This article reports on a study that sets out to explore the distribution of the nucleus around formulaic expressions in a prosodically annotated corpus. It aims to answer two specific questions:

i. Do formulaic expressions often receive the nucleus?ii. When placement of the nucleus in formulaic expressions is obligatory,

where will the nucleus be placed within the expressions?

The following discussion begins with an outline of existing research on the pros-ody of formulaic language (Section 2). Despite the scarcity of empirical research, there have been many suggestions and predictions about the prosodic features of formulaic language (see Lin 2010a for an in-depth discussion). Particular attention will be given to suggestions related to the distribution of the nucleus. Section 3 discusses issues confronting studies that use prosodically annotated corpora for investigating the prosody of formulaic language and explains this study’s choice to use the Lancaster/IBM Spoken English Corpus (SEC). Section 4 presents the methods for extracting formulaic expressions from the corpus and data analysis. Sections 5 and 6 discuss, respectively, the study’s findings con-cerning the two research questions and their implications for the next stages of research on the prosody of formulaic language. The article ends with concluding remarks in Section 7.

2. Formulaic expressions, prosodic fixedness and tonicity

It has long been suggested that formulaic expressions have a fixed prosody (Aijmer 1996, Baker & McCarthy 1988, Lin 2010a, Peters 1983, Wray 2002). However, interpretations differ as to what the fixed prosody of formulaic expres-sions means, not to mention the lack of consensus over the definitions and scope of formulaic language. Based on the idea that formulaic expressions are likely to be stored and processed as holistic units in the brain, many studies interpret the fixed prosody of formulaic expressions as the alignment of formulaic expressions

564 Phoebe M. S. Lin

with intonation units (Aijmer 1996, Altenberg & Eeg-Olofsson 1990, Baker & McCarthy 1988, Moon 1997).1 As Baker & McCarthy (1988) suggest, formulaic expressions are less likely to cross intonation unit boundaries and those that are the length of a clause will normally occupy only one intonation unit. Internal pauses within these formulaic expressions are also thought to be less likely (Aijmer 1996, Dahlmann 2009, Erman 2007, Wray 2004). For Aijmer (1996: 14), ‘prosodic fixedness’ also refers to the restricted choice of tones with which the expressions are delivered. For instance, her research shows that thank you is deliv-ered with a falling tone 70 percent of the time. In children’s speech, Peters (1977, 1983) observes that formulaic expressions are articulated with great fluency, even at the expense of articulatory accuracy.

Frequency of occurrence also has an effect on the rhythm of formulaic expres-sions. As Bybee (2002), citing Boyland (1996), explains, the fluency of speech pro-duction as a neuromotor behaviour will improve through practice. As formulaic expressions tend to be used frequently in spontaneous speech, their articulation becomes more fluent. In certain phonological contexts, phonological reduction may occur to shorten the articulation time even further (see Bybee 2002 for a review). Examples include don’t you, last year, I don’t know and I don’t care. These frequent phrases are expected to have a fast rhythm.

Focusing on describing the rules of English intonation, Wells (2006) notes the existence of various expressions whose nucleus placement patterns are con-sidered “idiomatic” because they appear to deviate from general English pro-sodic rules. He provides the cliché it’s not what he said, it’s the way that he said it as an example. His original prosodic annotations on the example are repro-duced below:

(1) It’s ‘not what he ˅said | it’s the ‘way that he \said it.

As shown in Example (1), the nucleus is assigned to the word said in both into-nation units. The first said carries a fall-rise tone, whereas the second said car-ries a falling tone. According to Wells (2006), the case is unusual because the nucleus is normally assigned to the word representing new information in an intonation unit (i.e. way), not to a word representing given information (i.e. the repeated word said).2 From our perspective, this practice in phonology of throw-ing examples that deviate from general English prosodic rules into the idiomatic-ity wastebasket is debatable. This is not only because the definition of idiomaticity is ad-hoc, but also because the practice has assumed that there is no regularity behind the prosody of formulaic language.

The present study is particularly interested in revealing the regularity behind the tonicity (i.e. the placement of the nucleus) of formulaic expressions, if it exists. According to Ashby (2006), such regularity is present in semantically opaque

The prosody of formulaic expressions in the IBM/Lancaster SEC 565

idioms (which are considered a class of formulaic language, according to our broad definition). For idioms that permit both figurative and literal interpreta-tions (e.g. my ears are burning), he suggests that the unmarked, figurative inter-pretation involves the avoidance of narrow focus in the non-compositional parts of the idioms.3 In the sentences it was raining cats and dogs and John has a chip on his shoulder, the non-compositional parts are respectively cats and dogs and chip on shoulder. The word raining lies outside the non-compositional part of the idiom because rain

makes a perfectly straightforward contribution to the meaning [of the idiom… As for John has a chip on his shoulder,] have is presumably outside the non-com-positional part, and the one’s [his] is just some kind of formal placeholder which is necessitated by the attempt to put the idiom into a ‘citation’ form for inclusion in a lexicon. (Ashby 2006: 1594)

Ashby’s (2006) definition and identification of the compositional and non-com-positional parts of an idiom seem vague. However, assuming that his proposed distinction between the compositional and the non-compositional parts is feasi-ble, his theory would offer a simple answer to the search for any regularity behind the tonicity of semantically opaque idioms. If a speaker intends to suggest that cats and dogs are literally falling from heaven, narrow focus will be introduced to cats and dogs through giving pitch prominence to the word cats, the word and or both. Otherwise, for the unmarked figurative interpretation, the expected pro-sodic pattern is to use broad focus, which means that the nucleus falls only on the word dogs.

Going beyond Ashby’s (2006) work on semantically opaque idioms, Lin (2010b) hypothesises that the constraints of tonicity in formulaic language, as broadly defined, may be realised as a lower tendency for formulaic expressions to receive stress. Her idea is that if formulaic expressions are processed as if they are single units, the norm should be for only one stress to be assigned to each expres-sion. To test the hypothesis, Lin (2010b) examines the tonicity of all formulaic expressions used in a short extract of an academic lecture, where the expressions were identified using native speakers’ collective intuitive judgement. The results appear to support the hypothesis as words within formulaic expressions are found to be distinctly less likely to receive stress regardless of their word class (i.e. lexical or functional).

While Lin (2010b) sheds light on how holisticity may influence the tonicity of formulaic language, there are other lesser known perspectives on the tonicity pat-terns of formulaic expressions. These include the concepts of ‘predictability’ and ‘semantic emptiness’. According to Wells (2006), ‘semantic emptiness’ influences tonicity patterns in the sense that lexical words that are semantically “empty” are

566 Phoebe M. S. Lin

unlikely to receive the nucleus. That is to say, these words tend to be deaccented. While the examples Wells (2006) provides to illustrate this general English pro-sodic rule are single word items such as things and people, there is reason to believe that the scope of the ‘semantic emptiness’ rule may extend beyond the single word level. A class of formulaic expressions commonly called ‘fillers’ (such as you know) may be prime candidates for semantically “empty” phrases that tend to be deac-cented in naturally occurring speech.

Similarly, ‘predictability’ may explain deaccentuation in a class of formulaic expressions called ‘collocations’. According to Bolinger (1972), predictable words are more likely to be deaccented than less predictable ones. Using Bolinger’s (1972) original examples, in the frame I have NP to V, if the NP and V form a pair of V-NP collocations (e.g. make-point, solve-problem, cover-topics, eat-food), the nucleus will not be assigned to the verb (i.e. make, solve, cover, eat) even though the last lexical word is the default position for the nucleus (i.e. broad focus, see Note 3). Conversely, if the NP and V are not collocates (e.g. emphasise-point, computerise-problem, elucidate-topics, carry-food), the nucleus will remain in the default position and be assigned to the verb.

As research on the prosody of formulaic language is still in its infancy, many of the suggestions discussed above remain to be tested empirically. To this end, prosodically annotated corpora offer a very good testing ground for these specu-lations concerning the fixed prosody of formulaic expressions.

3. SEC and the challenges of a corpus-based approach to the prosody of formulaic language

Prosodically annotated spoken corpora are quite rare in the field because prosodic annotation is a very labour-intensive and costly procedure. Some well-known prosodically annotated corpora to-date include: the London-Lund Corpus of Spoken English (LLC); the Intonational Variation in English (IViE) corpus; the Hong Kong Corpus of Spoken English (HKCSE); Santa Barbara Corpus of Spoken American English (SBCSAE); the Switchboard Corpus; and the SEC.4 Each of these corpora has been annotated with different types of prosodic information depending on their corpus design. For instance, the LLC provides information about tonality (i.e. the division of utterances into intonation units), tonicity and pitch change, but not rhythm. The SBCSAE provides tonality and temporal data, but not tonicity and pitch change. Each prosodically annotated corpus is suited to answer different research questions.

While prosodically annotated corpora open up many new opportunities for testing existing generalisations and rules in phonology, at present there are

The prosody of formulaic expressions in the IBM/Lancaster SEC 567

many challenges that hinder the use of these corpora to investigate the prosody of formulaic language. The most notable obstacle is their typically small size.5 Automatic formulaic language extraction tools often rely on the frequency of the co-occurrence of word combinations to extract candidates. Since word combina-tions are highly variable, the tools usually rely on the large size of the corpora to advance the robustness of the extraction results (aka their ‘psycholinguistic validity’, see Dahlmann & Adolphs 2007, De Cock 1998, Schmitt et al. 2004). Combining multiple prosodically annotated corpora in order to maximise corpus size is not an option because the different prosodic annotation systems adopted often render the corpora incomparable. This problem of corpus size has so far limited not only the types of investigations that have been conducted into the prosody of formulaic expressions, but also the research results’ level of robust-ness. For instance, corpus investigations into the prosody of formulaic expres-sions rarely go beyond the description of only the most frequent expressions in speech (e.g. thank you in Aijmer 1996 and I don’t know why in Lin & Adolphs 2009). When observations are made about the prosodic features of formulaic expressions that are less frequent in a prosodically annotated corpus, the obser-vations can only be tentative because there are often insufficient instances of the same expressions to warrant definitive conclusions. That said, we call for more, rather than fewer, corpus-based investigations into the prosody of formulaic lan-guage because a greater understanding of the significance and implications of the prosody of formulaic language will lead to more resources for expanding prosodic annotated corpora (see Lin 2012).

Amongst the many prosodically annotated corpora available, this study chose the SEC due to the availability of the original sound recordings and its extensive prosodic annotations. Since the launch of the corpus in the 1990s, many more layers of prosodic annotations have been added to the original SEC prosodic annotations by different research teams under a number of projects, including The Machine Readable Spoken English Corpus (MARSEC, Roach et al. 1993), and Aix-MARSEC (Auran et al. 2004). In the end, the corpus provides not only tonicity, tonality and pitch change data, but also data of temporal alignment even to the phoneme level. It can be said that the SEC is one of the most finely anno-tated prosodic corpora available and one of the richest resources for investigating English prosody.

The 52,637-word SEC collects naturally occurring speech from 11 categories including: commentary; news broadcast; lecture type I; lecture type II; religious broadcast; magazine-style reporting; fiction; poetry; dialogue; propaganda; and miscellaneous. The corpus was annotated jointly by phonologists Gerald Knowles and Briony Williams using a prosodic annotation system that follows the “British nuclear tone approach” (Cruttenden 1997: 28). Like other systems which follow

568 Phoebe M. S. Lin

the same approach (e.g. Halliday 1967, O’Connor & Arnold 1973), SEC’s system reflects only the phonologist’s perception of the prosodic features and excludes acoustic measurements. The system records 4 types of information in fine detail, including two types of intonation unit boundaries (i.e. major versus minor bound-aries), the presence of an intonation unit-internal pause (˄), 9 types of tonetic stress marks and a symbol (∙) for marking syllables that have rhythmic promi-nence but not independent tone movement (see Appendix 1 for the SEC prosodic annotation symbols and Knowles et al. 1996 for details about the system). The different types of ‘stressed’ syllables can be arranged in a hierarchy of prominence. Unstressed syllables aside, at the lowest level are syllables that are marked with (∙). In the annotators’ words, these syllables are “stressed but unaccented” (Knowles et al. 1996: 3). At a higher level in the hierarchy are syllables with independent tone movement and rhythmic prominence. These ‘accented’ syllables are the ones that carry the tonetic stress marks. Amongst all the accented syllables within an intonation unit, there can only be one nucleus. This nucleus has the highest level of prominence within its intonation unit and is taken to be the last accented syl-lable of every intonation unit.

In order to check the reliability of the prosodic annotation system, 9 percent of the SEC (totalling 4,680 words from 24 texts) was annotated by both phonolo-gists. Their annotations on these “overlap passages” were compared in detail by Pickering et al. (1996). According to the SEC’s prosodic annotation system, the insertion of an intonation unit boundary is guided by the presence of such fea-tures as a period of silence, significant lengthening of a preceding segment in con-junction with an intonation discontinuity and the absence of assimilatory effects. Pickering et al.’s (1996) examination shows that the phonologists’ agreement in the insertion of intonation unit boundaries reaches 69 percent (i.e. 810 out of a total of 1,178 intonation units). Disagreement in annotations arises in 368 cases (31 percent of the total) because the phonologists’ preferences differ: Knowles prefers shorter intonation units and has inserted boundaries in the overlap pas-sages where Williams has not in 274 cases (23 percent of the total). Williams, on the other hand, never makes use of the intonation unit-internal pause marker (˄) while Knowles marks 70 of them. There is clearly room for minimising gaps due to individual preferences and improving the 69 percent agreement between the phonologists. However, the SEC prosodic annotation system’s level of reliability appears acceptable.

The prosody of formulaic expressions in the IBM/Lancaster SEC 569

4. Methods for extracting formulaic expressions and data analysis

To extract formulaic expressions in the SEC, this study used Rayson’s (2003, 2009) corpus analysis tool, Wmatrix, because it operates by imposing 18,971 pre-defined formulaic expression templates (Paul Rayson 2011, personal communi-cation) on a corpus after it has undergone part-of-speech (POS) and semantic tagging using CLAWS and SEMTAG respectively (for further discussion, see Piao et al. 2005a, 2005b). Wmatrix’s use of pre-defined templates to extract formulaic expressions means that it is equipped to handle morphological, syntactic and compositional flexibility of formulaic expressions. The fact that the tool does not rely on the frequent co-occurrence of word forms to identify formulaic expres-sions is particularly desirable for all studies that examine formulaic language in prosodically annotated corpora because the robustness of the extraction results will not be affected by the typically small size of prosodically annotated corpora. That said, Wmatrix’s performance in sampling meaningful formulaic expres-sions is dependent upon the quality of the POS tagging, the semantic tagging and the number and accuracy of the formulaic expression templates. Because so many automatic analytical steps are involved in the extraction process, the margins of error may increase accordingly (Dahlmann 2009). Therefore, it is not surprising to see that the output of Wmatrix contains items in which formulaic-ity may appear debatable.

The SEC was submitted to Wmatrix for automatic formulaic language extraction. After POS and semantic tagging, Wmatrix extracted 1,580 (types of) expressions from the SEC with length ranging from 2-word to 5-word. As this article is primarily interested in longer formulaic expressions, 2-word expres-sions (e.g. for example, Hong Kong, you know, sort of) were excluded from the analysis. This resulted in a list of 380 expressions. This list of expressions was further filtered manually to exclude expressions about time or numbers (e.g. four and a half hours, three thousand and fifty five, for two or three times, three out of four), expressions that are proper nouns (e.g. People’s Liberation Army, Hong Kong and Shanghai Bank, Bank of China, New York Times, Mayor of Duisburg, Tunbridge Wells Central, James Bonecrusher Smith), compound nouns (e.g. inner city area, British high commission, custom and excise, chief of police, state of mind) and place names (e.g. north of the border, south of England, Madrid in Lyons, southwest of China, city of Haifa). These expressions clearly qualify as multiword expressions. However, they have not been included at this stage because their composition appears less restricted compared to other extracted expressions. In theory, there could be an endless list of quantity expressions, proper nouns and place names.

570 Phoebe M. S. Lin

In the end, the data of this study are 218 types (or 339 tokens) of formulaic expressions of at least 3 words long in the SEC (see Appendix 2). Concordance lines of these 339 tokens of formulaic expressions were extracted manually from the corpus with the context size set to one intonation unit to the left and right of the node intonation unit embedding the expression. Using an electronic spread-sheet, the formulaic expression and the nucleus of the node intonation units were highlighted on each prosodically annotated concordance line. Instances of these marked concordance lines are shown in Examples (2) to (11) (with the nuclei underlined and the Wmatrix-defined formulaic expressions in bold):

(2) | to ˋClitheroe | in ˉorder to ∙spend the ˏevening | in the ˉtown ˎlibrary | (SEC M06)

(3) | ∙walking ˉtwo ˏmiles | in ∙order to ∙work an iˉllegal ∙twelve hour ˬday | from (six aˏm | until ˉsix pˎm || (SEC M06)

(4) | shall be imˏposed upon any ˏperson | in ˉorder to entitle ˉhim | to be adˉmitted | (SEC M05)

(5) || they ˄ ˇwill have to ∙study | in ∙order to ∙get through the eˋxams | in ∙order to ∙get ˄ a good ˋjob || (SEC J06)

(6) | in ∙order to ∙get through the eˋxams | in ∙order to ∙get ˄ a good ˋjob || ∙em ˄ the ˋpeople themˬselves | (SEC J06)

(7) | then you had to be ˋvery very good at ˋarguing in Chiˋnese || in ˍ order to | ˇyou know | (SEC J06)

(8) || [RG] in ̄ what ̱ ways did you have to ̀ change | in ∙order to | get ̀ used to that kind of ∙culture || (SEC J06)

(9) | ̀ will have to ∙study in order to get through the eˋxams in order to get | a good ˋjob || (SEC J06)

(10) | ˋand those of contemporary ˋcritics | in ˉorder to conˬsider | ˋwhat in ∙these ˇterms | (SEC D01)

(11) | and ∙possibly ˬalso en∙courage ˋimports | in ˍorder to conˇserve | doˬmestic re∙sources | (SEC C01)

All the marked concordance lines were then classified according to the position of the formulaic expression in the intonation (i.e. whether it occupies the whole of an intonation unit, or it is located at the beginning, in the middle or at the end of an intonation unit). As seen in Examples (2) to (11), the expression in order to is located at the beginning of the node intonation unit 7 out of 10 times. It occupies the whole of the intonation unit 2 times, is found in the middle of an intonation unit once, but never at the end of an intonation unit.

The prosody of formulaic expressions in the IBM/Lancaster SEC 571

5. Findings

This section presents the findings that address the two research questions of this study: “do formulaic expressions often receive the nucleus?” (Section 5.1); and “where will the nucleus be placed within the expressions, when the nucleus on the formulaic expressions is obligatory?” (Section 5.2).

5.1 Do formulaic expressions often receive the nucleus?

To answer the question about whether formulaic expressions often receive the nucleus when they are used in utterances, we must first deal with the issue of the position of formulaic expressions in utterances. This is because the pros-ody of lexical items is inextricably linked to their position in an utterance. The beginnings of intonation units are often characterised by higher pitch and faster rhythm and the endings of intonation units by lower pitch and slower rhythm (see Cruttenden 1997). In terms of tonicity, words at the end of intonation units are much more likely to receive the nucleus than other words in the same intonation units. This is not only because broad focus is the default tonicity pattern, but also because the assignment of the nucleus begins from the right to the left. In other words, the penultimate word of an intonation unit will only be considered if the ultimate word is deemed unsuitable to receive the nucleus. This leftward search will continue until a suitable word to receive the nucleus is found.

In the light of this positional bias inherent in English prosody, it only makes sense if the tonicity of formulaic expressions is considered along with their positions in intonation units. The distribution of the positions of the 339 tokens of formulaic expressions extracted from the SEC in their intonation units is shown in Table 1.

Table 1. Distribution of formulaic expressions by their positions in intonation units

Beginning Whole IU End Middle of IU TOTAL

64 66 111 98 339

Considering the effect of the positional bias, it is clear that the answer to the research question “do formulaic expressions often receive the nucleus?” depends on the distribution of the positions of formulaic expressions in the dataset. The likelihood that formulaic expressions receive the nucleus will increase with the number of formulaic expressions found at the end of or occupying the whole of the intonation units (and vice versa).

To bypass the problem due to the positional bias, it is important to consider in isolation the tonicity of formulaic expressions found at the end of or occupying

572 Phoebe M. S. Lin

the whole of the intonation units. These two types of expressions are much more likely to receive the nucleus than the others. Therefore, they offer us the unique opportunity to observe the interaction between formulaicity and tonicity.

Amongst the formulaic expressions found at the end of or occupying the whole of intonation units, 70 percent of the time the nucleus is assigned to the last lexical word of the formulaic expressions (see Table 2). These cases meet our general expectation. Examples (12) to (14) demonstrate these cases from the SEC:

(12) | I’d ̄ like to ̱ draw your aˏttention | to something a ̀ lot of us ∙take for ̀ granted | our ˋhearing || (SEC K02)

(13) | and you ˉcan’t beˋlieve || I ˍknow we take this for ˋgranted | in ˇEngland | that | ̀ oh I’ll just go ̀ home for || but the ̀ nation’s ̌ diplomats | ̄ can’t take ∙that for ˋgranted | and ˇtime | ˉmay not ∙be on their ˋside || (SEC J06)

(14) | for the ˎinterest ∙factor | that ∙has to be ˏtaken into aˏccount | ˎin such ∙calcuˏlations | ↑that ˍgovernments | (SEC C01)

Table 2. Does the nucleus fall on the last lexical word of formulaic expressions?

Nucleus falls on the last lexical word

Nucleus does not fall on the last lexical word

Total

Whole IU 46 (70%) 20 (30%) 66End 78 (70%) 33 (30%) 111Total 124 53 177

However, as Table 2 shows, 53 (= 20 + 33) of the formulaic expressions do not display the default tonicity pattern where the nucleus is assigned to the last lexi-cal word of the intonation unit. These expressions do not receive the nucleus on their last lexical word even though the odds are all in their favour due to the influence of the positional bias. In Examples (15) to (21) below, it is clear that the nuclei fall on a lexical word preceding the formulaic expressions (i.e. identity, fate, change, required, better, clearly and lost) instead of the last words of the expressions (i.e. like, words, world, day, whole, mind and past). Many reasons could explain why these formulaic expressions were deaccented in context. It may be because of the speakers’ intention to contrast or emphasise the seman-tics of a word located before the intonation unit-ending formulaic expressions. However, it is also possible to view the deaccentuation pattern from the perspec-tive of the ‘semantic emptiness’ theory described earlier. Put simply, formulaic expressions may be deaccented because they are semantically “light-weight”. The argument for the semantic emptiness of formulaic expressions was first put for-ward by Coulmas (1981). The discussion section will provide further informa-tion on this argument.

The prosody of formulaic expressions in the IBM/Lancaster SEC 573

(15) ˉsort of | like the ˋinternal passport for Chiˋnese an iˋdentity ∙card if you ∙like || and | theoˇretically we were (SEC J06)

(16) of the ˋpresent ∙motions of the ˍparticles of ˎmatter || our ˇfate in ∙other ∙words | is ˏwritten | ∙in the ˎatoms | (SEC D02)

(17) || ˋthings ∙do ˬchange in ∙this part of the ∙world | in the ∙most unbeˇlievable | ˎways || a ∙forty year old (SEC A02)

(18) | ˉseeking ˎaccess | to the ˍPNC ˬmeetings | were reˇquired the ∙other ∙day | to ˋfill in ˬforms | to aˉpply for (SEC A02)

(19) | fiˏnancial | and ˇincome ∙funds | have done ˇbetter on the ∙whole | ↑though ˇsome of our ∙leaders | have (SEC F03)

(20) | a ˋproject | would ˋshape it∙self | ˉquite ˎclearly in my ∙mind || he ∙turned ˋrestlessly a∙way from (SEC G05)

(21) ˏPollock | of the ˍmain union EIʹS | says ∙teachers only ˇlost ground in the ∙past | by ˎnot ∙striking || (SEC B03)

5.2 When placement of the nucleus in formulaic expressions is obligatory, where will the nucleus be placed within the expressions?

In addressing the first question, it was found that some formulaic expressions were deaccented even though the positional bias favours their accentuation. The extended question, then, is where is the nucleus placed when its placement within a formulaic expression is obligatory?

An examination of our data from the SEC reveals diverse tonicity patterns of formulaic expressions. In addition to examples of formulaic expressions that avoid accentuation altogether, three alternative tonicity patterns emerged. These three patterns include the assignment of the nucleus to the first word of the for-mulaic expressions, to the degree words in the expressions or to the flexible slots of the expressions. These patterns may appear to be an exciting finding, especially because so little has been mentioned in the empirical literature about the tonicity of formulaic expressions. However, they need to be considered with caution due to the small sample size of this study. The three tonicity patterns should not be taken as absolute rules either because there may well be examples that lie outside of the patterns. It is also noteworthy that in the majority (70 percent) of cases, the nucleus does fall on the last lexical word of formulaic expressions that are either found at the end of or occupying the whole of intonation units. With all these conditions in mind, the three tonicity patterns pertaining to the remaining 30 percent of the cases are presented below, in the hope that they could broaden the basis of further discussions about the prosodic features of formulaic expressions.

574 Phoebe M. S. Lin

In corpus linguistics, it is particularly important that more and larger prosodi-cally annotated corpora become available so that these initial observations can be examined more vigorously in a quantitative manner. Then, a more objective assessment can be made of the frequency of occurrence and the relative impor-tance of the three tonicity patterns reported below.

When the placement of the nucleus within a formulaic expression is obliga-tory, the first pattern observed in our data is that some formulaic expressions have the nucleus assigned to their first word regardless of the first word’s word class (i.e. lexical or functional). Examples from the SEC include more or less, late in life and across the road (see Examples (22) to (26)). While both late in life and across the road generate only one concordance line in the corpus, there are more instances of more or less.

(22) || ↑but | ∙strangely eˋnough this was | ˋmore or less the ∙same for | European ˋmale teachers || I remember (SEC J06)

(23) | I ˋwas treated | ˋreasonably ˉwell | and I was ∙treated ˇmore or less | as an ∙honorary ˋmale || er || (SEC J06)

(24) | not ˇreally || it was | the ˋtobe was | ˇmore or less | uniˋversal || and it was ˋawfully ˋhot || ˇmost of them (SEC J06)

(25) ˬdifficult ∙for him | to mainʹˋtain this be∙lief || ˉquite ˇlate in ∙life | he ∙wrote his ∙famous ˍnovel (SEC D02)

(26) I ˋcame round a ˋbend in the ˇroad | and ∙there was a ˋtree lying aˋcross the road || I only ˋjust managed to (SEC J02)

The formulaic expression more or less indeed presents an interesting case. Between the three words of the formulaic expression the tendency to place more emphasis on the first word more than the other words or and less is obvious, as the three concordance lines above show. That is why in the above concordance lines the nucleus is assigned to the word more. However, this tendency for the first word (i.e. more) to attract more emphasis holds even in cases where the nucleus lies outside of the formulaic expression, as the following concordance lines of more or less (i.e. Examples (27) to (29)) illustrate.

(27) || [HK] ̌ yes | it was ̀ more or ∙less a ̀ private ˬ lesson | I had ̀ two young Iˇtalian ∙brothers | I ∙use (SEC J06)

(28) | that’s ˋnot ∙really very ˎmuch || it’s ˇmore or ∙less sub^sistence ˏreally || [RG] ˉthat’s the ˎtrouble | with ∙the (SEC J06)

(29) || [HK] I ˋdon’t ˬthink so | I ∙think the Suda∙nese ∙more or less ∙left en ˋmasse || [RG] ˋyou worked in the (SEC J06)

The prosody of formulaic expressions in the IBM/Lancaster SEC 575

Recalling the discussion on the hierarchy of prominence in Section 3, it is clear that the word more in these concordance lines has a consistently higher level of prominence than the word less. For instance, in the first concordance line, more is an accented syllable with both pitch and rhythmic prominence while less is a stressed but unaccented syllable with only rhythmic prominence.

The second tonicity pattern of formulaic expressions observed in our data involves the nucleus falling on the degree words (e.g. some and certain) of expres-sions containing these words (e.g. in some respect, to a certain extent). This obser-vation has been mentioned in Wells’ (2006) introspective research. Instances from our corpus data are shown in Examples (30) to (33):

(30) || ↑which | I ˉdidn’t | ˍlike in ˎsome respects || ˋobviously it was | very conˋvenient | but | to be ˋtreated ˋdi (SEC J06)

(31) || the ˇsame de∙lusion | is ∙shared to a ˇcertain ex∙tent | by ˋSWAPO | who aˋppear to have ˋtotal (SEC A09)

(32) || but ˋwhile ˇsome changes o∙ccur | a ˬgreat deal | reˍmains the ˎsame || that ˋform I ∙mentioned ˎearlier for (SEC A02)

(33) | and ˋeverybody thought that | ↑given a ˇlittle bit of ∙luck | England ˇmight well have | not necessarily (SEC F04)

The third and final tonicity pattern involves formulaic expressions that are more flexible. These expressions typically contain a flexible slot in the middle (e.g. as far as X be concerned, on the X side of things). In our data, the nuclei are often assigned to the flexible slots of these expressions. In the following concordance lines from our data (i.e. Examples (34) to (38)), these two expressions also display a tendency to align with intonation unit boundaries (see Lin & Adolphs 2009 for a discussion on the correspondence between formulaic expressions and intonation unit boundaries).

(34) ˋreally should have been the ˇshowpiece | as far as ˇEngland ∙cricket was con∙cerned | but it ˉwent wrong (SEC F04)

(35) ˋhand baggage is the order of the ˋday || so as ˋfar as seˇcurity is con∙cerned || Beiˋrut is a ∙lost ˋcause || (SEC A08)

(36) the ˇbiggest | and most ˋpopular | ˬwin | as ∙far | as the ˇlocal crowd was con∙cerned | was of course ˉthat | (SEC J01)

(37) || there ˋwas a ∙problem ˋalso | as ∙far as ˇEngland was con∙cerned | was that they ˉhadn’t ˍhad | any (SEC J01)

(38) ∙what I ∙had already ˋdone || but on the ˋsocial side of ∙things | it’s a ˋlovely ˋcountry to ˬgo to | if you (SEC J06)

576 Phoebe M. S. Lin

6. Discussion

This article raises questions concerning the prosody of formulaic expressions. The starting point of the present investigation is whether formulaic expressions have a fixed prosody just as they are (semi) fixed in terms of lexical constitution and grammar. Amongst the many possible interpretations of what a fixed prosody may mean (Section 2), this study chose to examine the tonicity patterns of for-mulaic expressions.

As Altenberg (1987: 190) puts it, tonicity is basically “speaker-selected: it reflects the speaker’s way of ‘packaging’ his message (chunking) and his selection of an idea as the main point of interest (focus)”. The assignment of the nucleus (and other types of stresses) can be rather difficult to predict because it depends on a speaker’s choices in the immediate speech context. Against this background, it is surprising that formulaic expressions influence tonicity choices in naturally occurring speech. Their effect presents itself in such a way that some formulaic expressions avoid accentuation despite their location at the end of intonation units. Other expressions attract the nucleus to their first words, their degree words and their flexible slots. These are the four tonicity patterns illustrated in Section 5.

While the small size of most prosodically annotated corpora is an issue that needs to be tackled in the long term, the present investigation has shed some light on our understanding of some aspects of formulaic expressions. The remaining part of this article will be devoted to the discussion of three issues brought up in the findings, namely the positional bias of formulaic expressions, the relationship between semantic weight and tonicity patterns, and the relationship between flex-ibility and tonicity patterns.

6.1 Positional bias and formulaic expressions

This article argues that the prosody of formulaic expressions should be consid-ered along with the position of the expressions in intonation units. This argument was advanced on the grounds that English prosody has always been inextricably linked to textual position. The beginnings of intonation units (and utterances) are often associated with higher pitch and faster rhythm whereas their endings with lower pitch and slower rhythm (see Cruttenden 1997, Wichmann 2000). Regarding tonicity, previous research (Quirk et al. 1964, Crystal 1969) has shown that the nucleus has a very high tendency (i.e. 80 to 95 percent) to fall on the last lexical word of an intonation unit.

The prosody of formulaic expressions in the IBM/Lancaster SEC 577

The high tendency for the nucleus to fall on the last lexical word of intonation units, predictably, also applies to our corpus data. Our attention, however, has turned to those formulaic expressions which are located at the end or take up the whole of intonation units and yet do not receive the nucleus. Favoured by their position in the intonation units, these formulaic expressions should have a high chance of receiving the nucleus. Yet, examples were found in which the nucleus is shifted to the lexical word preceding the formulaic expressions. Explanations as to why these formulaic expressions avoid accentuation will be discussed in Section 6.2. For now, our attention should return to the observation that connects the position of formulaic expressions and their tonicity patterns.

In the corpus linguistics literature, the position of words and phrases has been the centre of discussion under the topic of textual colligation. According to Hoey (2005: 13), “[e]very word is primed to occur in, or avoid, certain positions within the discourse; these are its textual colligations”. This idea of the positional prefer-ence of words and phrases has been explored at the textual level in a number of studies (e.g. Hoey 2004, 2009; Hoey & O’Donnell 2008; Mahlberg & O’Donnell 2008; Römer 2010). However, this is one of the first studies to discuss how the position of formulaic expressions influences their prosodic patterns. In light of the close and long-standing link between textual position and prosodic patterns, there is great room for further research to extend the notion of textual colligation beyond the current textual level to the prosodic level. This study is a first step in this new direction of research.

6.2 Relationship between semantic weight and tonicity patterns

Amongst other observations, this study captures a small group of formulaic expressions that are deaccented even though their location at the end of intona-tion units strongly favours their receiving the nucleus. These expressions include if you like, on the whole and in other words. Concordance lines showing their con-textual use in our corpus were presented earlier.

Many reasons can be proposed to explain why these expressions, contrary to expectation, are deaccented in their context of use (e.g. the need to introduce emphasis or contrast to a previous word or the function of the expressions as parenthetical structures in context). One of the less-discussed possible explana-tions, however, is the idea of ‘semantic emptiness’. As mentioned in Section 2, ‘semantic emptiness’ is a general rule in the phonological literature that seman-tically “empty” words such as things and people are not usually accented. If the semantic emptiness theory can be extended to phrases, then it may be argued

578 Phoebe M. S. Lin

that the reason why if you like, on the whole and in other words are deaccented in context is that they are “empty” too. This idea, that these formulaic expres-sions are semantically “empty” in context, needs to be interpreted intelligently. The semantic weight discussed here is more concerned with the meaningfulness of the expressions in their immediate speech context than their inherent (dic-tionary) meaning. That is why a more accurate term may be ‘pragmatic weight’ (or ‘pragmatic meaningfulness’). As far as formulaic expressions are concerned, what is pragmatically meaningful (or meaningless) can vary depending on the context and the speaker. For instance, due to the highly frequent use of if you like in British English (i.e. 14.2 occurrences per million words in the British National Corpus compared to 5.6 times per million words in the Corpus of Contemporary American English), the expression may have weakened in pragmatic meaning when it is used amongst British English speakers (see Coulmas 1981, Malinowski 1923 [1989] for discussions on the inverse relation between frequency of occur-rence and meaningfulness). As the meaningfulness of a formulaic expression is low, there is little point to assign the nucleus and thus draw listeners’ attention to the formulaic expression. This is why accentuation goes to the lexical word pre-ceding the expression.

The argument that the prosodic pattern of formulaic expressions might be related to their pragmatic meaningfulness is not new in the literature. In an inves-tigation into the rhythm of formulaic expressions, Lin (2010b) observes how a university lecturer slowed down the rhythm of some of the formulaic expressions in his speech to prevent these expressions from being treated by listeners as prag-matically meaningless (due to their high frequency of use in lectures) and thus overlooked. If Lin’s (2010b) theory is true, it will mean that language users may be aware of the inverse relation between frequency of occurrence and meaningful-ness (cf. the idea of ‘clichés’) as well as the fact that the manipulation of the pro-sodic features of formulaic expressions is a possible intervention to draw listeners’ attention to those expressions that may otherwise be perceived as pragmatically meaningless.

Taken together, the observations in the present study and Lin’s (2010b) study can have significant implications for our perspective on discourse. Instead of look-ing at lexis, text and speech prosody in isolation, the two studies together reveal how formulaic expressions, their pragmatic meaningfulness and their prosodic pattern may interact and form a complex relationship. This complex relationship deserves further exploration in future studies.

The prosody of formulaic expressions in the IBM/Lancaster SEC 579

6.3 Relationship between flexibility and tonicity patterns

The final interesting point for discussion here is the group of formulaic expres-sions that have flexible slots (e.g. as far as X be concerned and on the X side of things). These expressions present an interesting case as it is their flexible slots in the middle that attract the nucleus whether or not the last words of these expres-sions are lexical. In the phonological literature, lexical words have a much higher tendency than functional words to receive sentence stress.

From the perspective of ‘predictability’, it is possible to explain why the flex-ible slots of semi-fixed formulaic expressions tend to receive the nucleus. In com-parison to the inflexible frame of the semi-fixed expressions, the flexible slot is less predictable because it often varies. This lower predictability means that the flexible slot is inherently more useful to emphasise for the listener than the inflex-ible frame. As these expressions that contain flexible slots tend to be longer, it is likely that they form their own coherent intonation units.6 In that case, because each intonation unit should have at least one nucleus, the nucleus naturally goes to the item that most deserves emphasis, that is, the flexible slot.

From the speaker’s point of view, it is most reasonable for the flexible slot of a formulaic expression to receive the nucleus because the very function of this kind of expression (e.g. as far as X be concerned and on the X side of things) is to highlight to listeners that the preceding or succeeding proposition should be con-sidered from the perspective of the entity/concept presented in the flexible slot. The flexible slot is at the centre of the formulaic expression, whereas the frame is at the periphery. A close examination of the concordance lines of as far as X be concerned and on the X side of things shown above will support this argument.

It is important to note, however, that the predictability of the items in the flex-ible slots of formulaic expressions such as as far as X be concerned may vary. For instance, with 358 instances of as far as I’m/am/was concerned amongst a total of 2,121 concordance lines of as far as X be concerned in the British National Corpus, the predictability of the flexible slot is highest when the slot is filled by the pro-noun I. When the predictability of the flexible slot is very high, the likelihood of the flexible slot receiving the nucleus might vary. To test this theory that links the predictability level of the flexible slot and its likelihood of receiving the nucleus requires a sizable prosodic corpus with many instances of the same formulaic expression. Given the zero occurrence of as far as I’m/am/was concerned in the SEC, we can only speculate on the relationship between flexibility and tonicity patterns of formulaic expressions based on the concordance lines of as far as X be concerned retrieved in the SEC. Further study is needed to test the theory using a larger size corpus.

580 Phoebe M. S. Lin

7. Conclusion

The vast topic of the prosody of formulaic language requires further investiga-tions. This article has presented the first study that examines the prosody of formulaic language in a prosodically annotated corpus. It has demonstrated the presence of regularities behind the tonicity of formulaic language and illustrated four patterns that emerge from the SEC. This focus on prosody is a new avenue in formulaic language research. At the same time, the study has also shown how prosodically annotated corpora may facilitate research on the prosody of formu-laic language.

From a theoretical point of view, the article has also offered new perspectives on the prosody of formulaic language. Firstly, it has drawn attention to ‘holistic-ity’, ‘pragmatic meaningfulness’ and ‘predictability’ as factors affecting the pro-sodic shapes of formulaic language. Secondly, it has proposed an examination of the prosody of formulaic language from the perspective of textual colligation. While existing research tends to examine the positional preference of words and phrases at the textual level, it may be worth exploring textual colligation at a pho-nological level where we may examine the interaction between formulaic expres-sions’ preferred positions and their prosody. Taken together, the study has given some insight into the complexity of the problem with the tonicity of formulaic language. Whether a formulaic expression receives the nucleus in its immedi-ate speech context depends on its position in the intonation unit, its ‘holistic-ity’, ‘pragmatic meaningfulness’ and ‘predictability’. Beyond these factors that are internal to the formulaic expressions, there are many more layers of influence at the global discourse level (e.g. information structure and grammatical structures) which may also affect tonicity patterns. More research is needed to disentangle these competing influences on discourse prosody. In this respect, prosodically annotated corpora offer many opportunities.

Notes

* The author would like to thank the Editor and the anonymous reviewers for their helpful suggestions for improving the article.

1. The intonation unit, which can be defined basically in terms of a single intonation contour, has also been variably called ‘tone unit’ (Quirk et al. 1964), ‘intonation group’ (Cruttenden 1997), ‘tone group’ (Brown et al. 1980, Halliday 1967, Knowles et al. 1996, Wichmann 2000), and ‘intonational phrase’ (Selkirk 1984, Nespor & Vogel 1986).

The prosody of formulaic expressions in the IBM/Lancaster SEC 581

2. Wells (2006) suggests that there is no logical explanation for this example and the case must, therefore, be regarded as formulaic. However, as one of the anonymous reviewers points out, if we accept that there are two senses of say in the example, with the first say meaning “state, convey information” and the second say meaning “use one’s voice, make vocal sounds”, then the second say is not a repetition and so it still deserves to receive the nucleus.

3. Many British approaches to English tonicity (e.g. Cruttenden 1997; Crystal 1969, 1975; Quirk et al. 1985; Tench 1996) assume that there is one nucleus in every intonation unit and the default position of the nucleus is the last lexical word of an intonation unit (known as ‘broad focus’). However, for emphasis or contrast, the nucleus may be assigned to other places in an utterance (known as ‘narrow focus’). The assumption that broad focus is the default tonicity pattern of English is shown to be valid. In empirical studies by Quirk et al. (1964) and Crystal (1969, cited in Altenberg 1987), the nucleus is assigned to the last lexical word of an intonation unit 80 to 95 percent of the time.

4. For the London-Lund Corpus of Spoken English (LLC), see http://khnt.hit.uib.no/icame/manuals/LONDLUND/INDEX.HTM. For the Intonational Variation in English (IViE) cor-pus, see http://www.phon.ox.ac.uk/files/apps/IViE/. For the Hong Kong Corpus of Spoken English (HKCSE), see http://langbank.engl.polyu.edu.hk/HKCSE/. For Santa Barbara Corpus of Spoken American English (SBCSAE), see http://www.linguistics.ucsb.edu/research/santa-barbara-corpus. For the Switchboard Corpus, see http://www.ldc.upenn.edu/Catalog/readme_files/switchboard.readme.html (all accessed July 2013).

5. Nowadays, prosodically annotated corpora tend to have a small size (typically about 50,000 words), and there is often a trade-off between the size of the corpus and the level of detail in the prosodic annotation.

6. As the semi-fixed formulaic expressions (i.e. as far as X be concerned and on the X side of things) are close to the natural length of intonation units (i.e. 5 to 6 words according to Chafe 1988), these expressions have a high chance to form their own coherent intonation units (see Lin & Adolphs 2009 and Lin 2010b).

References

Aijmer, K. 1996. Conversational Routines in English. London/New York: Longman.Altenberg, B. & Eeg-Olofsson, M. 1990. “Phraseology in spoken English: Presentation of a proj-

ect”. In J. Aarts & W. Meijs (Eds.), Theory and Practice in Corpus Linguistics. Amsterdam: Rodopi, 1–26.

Altenberg, B. 1987. Prosodic Patterns in Spoken English: Studies in the Correlation between Prosody and Grammar for Text-to-speech Conversion. Lund: Lund University Press.

Altenberg, B. 1990. “Predicting text segmentation into tone units”. In J. Svartvik (Ed.), The London-Lund Corpus of Spoken English: Description and Research. Lund: Lund University Press, 275–286.

Altenberg, B. 1998. “On the phraseology of spoken English: The evidence of recurrent word-combinations”. In A. P. Cowie (Ed.), Phraseology: Theory, Analysis and Applications. Oxford, England: Clarendon Press, 101–122.

582 Phoebe M. S. Lin

Ashby, M. 2006. “Prosody and idioms in English”. Journal of Pragmatics, 38 (10), 1580–1597.Auran, C., Bouzon, C. & Hirst, D. 2004. “The Aix-MARSEC project: An evolutive database

of spoken British English”. In Proceedings of Speech Prosody 2004. Nara: Speech Prosody Special Interest Group (SProSIG) of the International Speech Communication Association (ISCA), 561–564.

Baker, M. & McCarthy, M. 1988. “Multiword units and things like that”. In Mimeograph. Birmingham: University of Birmingham, 1–35.

Bolinger, D. 1972. “Accent is predictable (if you’re a mind-reader)”. Language, 48 (3), 633–644.Boyland, J. T. 1996. Morphosyntactic Change in Progress: A Psycholinguistic Treatment. Unpub-

lished PhD dissertation, University of California Berkeley, Berkeley, USA.Brown, G., Currie, K. L. & Kenworthy, J. 1980. Questions of Intonation. London: Croom Helm.Bybee, J. 2002. “Phonological evidence for exemplar storage of multiword sequences”. Studies in

Second Language Acquisition, 24 (2), 215–221.Chafe, W. L. 1987. “Cognitive constraints on information flow”. In R. S. Tomlin (Ed.), Coherence

and Grounding in Discourse. Amsterdam: John Benjamins, 21–51.Chafe, W. L. 1988. “Punctuation and the prosody of written language”. Written Communication,

5 (4), 395–426.Coulmas, F. 1981. “Introduction: Conversational routine”. In F. Coulmas (Ed.), Conversational

Routine: Explorations in Standardized Communication Situations and Prepatterned Speech. The Hague: Mouton, 1–17.

Cruttenden, A. 1997. Intonation (2 Ed.). Cambridge: Cambridge University Press.Crystal, D. 1969. Prosodic Systems and Intonation in English. Cambridge: Cambridge University

Press.Crystal, D. 1975. The English Tone of Voice: Essays on Intonation, Prosody and Paralanguage.

London: Edward Arnold.Dahlmann, I. 2009. Towards a Multi-word Unit Inventory of Spoken Discourse. Unpublished

PhD dissertation, The University of Nottingham, Nottingham, UK.Dahlmann, I. & Adolphs, S. 2007. “Pauses as an indicator of psycholinguistically valid Multi-

Word Expressions (MWEs)?” In Proceedings of the Workshop on a Broader Perspective on Multiword Expressions, at ACL 2007, 45th Annual Meeting of the Association for Com-putational Linguistics, Prague, 28 June 2007. ACL Special Interest Group on the Lexicon (SIGLEX), 49–56.

De Cock, S. 1998. “A recurrent word combination approach to the study of formulae in the speech of native and non-native speakers of English”. International Journal of Corpus Lin-guistics, 3 (1), 59–80.

Erman, B. & Warren, B. 2000. “The idiom principle and the open choice principle”. Text, 20 (1), 29–62.

Erman, B. 2007. “Cognitive processes as evidence of the idiom principle”. International Journal of Corpus Linguistics, 12 (1), 25–53.

Halliday, M. A. K. 1967. Intonation and Grammar in British English. The Hague: Mouton.Heid, U. 2011. “German noun+verb collocations in the sentence context: Morphosyntactic

properties contributing to idiomaticity”. In T. Herbst, S. Faulhaber & P. Uhrig (Eds.), The Phraseological View of Language. Berlin: De Gruyter Mouton, 283–312.

Hoey, M. & O’Donnell, M. B. 2008. “Lexicography, grammar, and textual position”. Interna-tional Journal of Lexicography, 21 (3), 293–309.

The prosody of formulaic expressions in the IBM/Lancaster SEC 583

Hoey, M. 2004. “Textual colligation: A special kind of lexical priming”. In K. Aijmer & B.  Altenberg (Eds.), Advances in Corpus Linguistics: Papers from the 23rd International Conference on English Language Research on Computerized Corpora (ICAME 23), Göteborg 22–26 May 2002. Amsterdam: Rodopi, 171–194.

Hoey, M. 2005. Lexical priming. London: Routledge.Hoey, M. 2009. “Corpus-driven approaches to grammar: The search for common ground”. In

U. Römer & R. Schulze (Eds.), Exploring the Lexis-grammar Interface. Amsterdam: John Benjamins, 33–47.

Knowles, G. & Lawrence, L. 1987. “Automatic intonation assignment”. In R. Garside, G. Leech & G. Sampson (Eds.), The Computational Analysis of English: A Corpus-based Approach. London: Longman, 139–148.

Knowles, G., Williams, B. & Taylor, L. 1996. A Corpus of Formal British English Speech: The Lancaster/IBM Spoken English Corpus. London/New York: Longman.

Lin, P. M. S. & Adolphs, S. 2009. “Sound evidence: Phraseological units in spoken corpora”. In A. Barfield & H. Gyllstad (Eds.), Researching Collocations in Another Language: Multiple Interpretations. Basingstoke: Palgrave Macmillan, 34–48.

Lin, P. M. S. 2010a. “The phonology of formulaic sequences: A review”. In D. Wood (Ed.), Per-spectives on Formulaic Language: Acquisition and Communication. London: Continuum, 174–193.

Lin, P. M. S. 2010b. The Prosody of Formulaic Language. Unpublished PhD dissertation, Univer-sity of Nottingham, Nottingham, UK.

Lin, P. M. S. 2012. “Sound evidence: The missing piece of the jigsaw in formulaic language research”. Applied Linguistics, 33 (3), 342–347.

Lin, P. M. S. 2013. “More than music to our ears: The value of the phonological interface in a comprehensive understanding of vocabulary acquisition and knowledge”. In A. N. Archibald (Ed.), Multilingual Theory and Practice in Applied Linguistics: Proceedings of the 45th Annual Meeting of the British Association for Applied Linguistics, 6–8 September 2012, University of Southampton. London: Scitsiugnil Press, 155–158.

Malinowski, B. 1923 [1989]. “The problem of meaning in primitive languages”. In C. K. Ogden & I. A. Richards (Eds.), The Meaning of Meaning: A Study of the Influence of Language upon Thought and of the Science of Symbolism. San Diego: Harcourt Brace Jovanovich, 296–336.

Mahlberg, M. & O’Donnell, M. B. 2008. “A fresh view of the structure of hard news stories”. In S. Neumann & E. Steiner (Eds.), Online Proceedings of the 19th European Systemic Func-tional Linguistics Conference and Workshop, Saarbrücken, 23–25 July 2007.

Moon, R. 1997. “Vocabulary connections: Multi-word items in English”. In N. Schmitt & M. McCarthy (Eds.), Vocabulary: Description, Acquisition and Pedagogy. Cambridge/New York: Cambridge University Press, 40–63.

Nespor, M. & Vogel, I. 1986. Prosodic Phonology. Dordrecht, Holland: Foris.O’ Connor, J. D. & Arnold, G. F. 1973. Intonation of Colloquial English: A Practical Handbook

(2 Ed.). London: Longman.Peters, A. M. 1977. “Language learning strategies: Does the whole equal the sum of the parts?”

Language, 53 (3), 560–573.Peters, A. M. 1983. The Units of Language Acquisition. Cambridge: Cambridge University Press.Piao, S. S. L., Archer, D., Mudraya, O., Rayson, P., Garside, R., McEnery, A. & Wilson, P. 2005a.

“A large semantic lexicon for corpus annotation”. In Corpus Linguistics 2005, July 14–17, 2005. Birmingham.

584 Phoebe M. S. Lin

Piao, S. S. L., Rayson, P., Archer, D. & McEnery, T. 2005b. “Comparing and combining a seman-tic tagger and a statistical tool for MWE extraction”. Computer Speech & Language, 19 (4), 378–397.

Pickering, B., Williams, B. & Knowles, G. 1996. “Analysis of transcriber differences in the SEC”. In G. Knowles, A. Wichmann & P. R. Alderson (Eds.), Working with Speech: Perspectives on Resaerch into the Lancaster/IBM Spoken English Corpus. London: Longman, 61–86.

Quirk, R., Duckworth, A. P., Svartvik, J., Rusiecki, J. P. L. & Colin, A. J. T. 1964. “Studies in the correspondence of prosodic to grammatical features in English”. In H. G. Lunt (Ed.), Pro-ceedings of the Ninth International Congress of Linguists. The Hague: Mouton, 679–691.

Quirk, R., Greenbaum, S., Leech, G. & Svartvik, J. 1985. A Comprehensive Grammar of the Eng-lish Language. London: Longman.

Rayson, P. 2003. Matrix: A Statistical Method and Software Tool for Linguistic Analysis through Corpus Comparison. Unpublished PhD dissertation, Lancaster University, Lancaster, UK.

Rayson, P. 2009. Wmatrix: A Web-based Corpus Processing Environment. Lancaster: Computing Department, Lancaster University.

Roach, P., Knowles, G., Varadi, T. & Arnfield, S. 1993. “MARSEC: A Machine-Readable spoken English corpus”. Journal of International Phonetic Association, 23 (2), 47–53.

Römer, U. 2010. “Establishing the phraseological profile of a text type: The construction of meaning in academic book reviews”. English Text Construction, 3 (1), 95–119.

Schmitt, N., Grandage, S. & Adolphs, S. 2004. “Are corpus-derived recurrent clusters psycho-linguistically valid?” In N. Schmitt (Ed.), Formulaic Sequences: Acquisition, Processing and Use. Amsterdam: John Benjamins, 127–152.

Selkirk, E. 1984. Phonology and Syntax: The Relation between Sound and Structure. Cambridge, MA: MIT Press.

Sinclair, J. McH. 1991. Corpus, Concordance and Collocation. Oxford: Oxford University Press.Tench, P. 1996. The Intonation Systems of English. London: Cassell.Wells, J. C. 2006. English Intonation: An Introduction. Cambridge: Cambridge University Press.Wichmann, A. 2000. Intonation in Text and Discourse. Harlow: Longman.Wray, A. 2002. Formulaic Language and the Lexicon. Cambridge: Cambridge University Press.Wray, A. 2004. “‘Here’s one I prepared earlier’: Formulaic language learning on television”. In

N. Schmitt (Ed.), Formulaic Sequences: Acquisition, Processing and Use. Amsterdam: John Benjamins, 249–268.

Appendix 1

The prosodic annotation symbols used in the SEC are shown below.| Minor intonation unit boundary|| Major intonation unit boundary˄ Intonation unit-internal pauseˋ High fallˎ Low fallʹ High riseˏ Low rise

The prosody of formulaic expressions in the IBM/Lancaster SEC 585

ˉ High levelˍ Low levelˇ High fall riseˬ Low fall rise^ Rise fall∙ Stressed but unaccented↑ Significant rise in pitch↓ Significant drop in pitch

Appendix 2

Formulaic expressions considered in this study were classified into four groups according to the procedures outlined in Figure 1. An expression was put under Groups 1 to 3 if all instances of the expression conform to the same pattern. Groups 1 and 2 are for expressions found at the end of or occupying the whole of intonation units and Group 3 are for expressions that are not. Amongst expressions found at the end of or occupying the whole of intonation units, those that demonstrate the default tonicity pattern (i.e. the nucleus is assigned to the last lexical word of the expression) were put under Group 1; those that do not were put under Group 2. All the remaining expressions were assigned to Group 4. Tables 3 to 6 list the expressions in each group, with their raw frequencies of occurrence presented in brackets.

Form

ulai

c ex

pres

sion

All instances ofthe expression

conform tothe same pattern

Found at the end oroccupying the wholeof intonation units

Demonstrates the defaulttonicity pattern (Group 1),

71 types/85 tokens

Does NOT demonstrate thedefault tonicity pattern (Group 2),

28 types/39 tokensNOT found at the endor occupying the whole of

intonation units (Group 3),94 types/124 tokens

Instances of theexpression DO

NOT conform tothe same pattern

(Group 4)

Found at the end ofor occupying the whole

of intonation units

Demonstrates the defaulttonicity pattern,

21 types/39 tokens

Does NOT demonstrate thedefault tonicity pattern,

13 types/14 tokensNOT found at the endor occupying the whole of

intonation units,20 types/38 tokens

Figure 1. Procedures for classifying the formulaic expressions

586 Phoebe M. S. Lin

Table 3. Group 1: Formulaic expressions found at the end of or occupying the whole of intonation units and with all instances demonstrating the default tonicity patterns

above his head (1)all at one time (1)as far as they know (2)as I said (1)as I say (2)as you can imagine (1)at a loose end (1)at the same time (1)back and forth (1)blocks of flats (1)break the mould (1)by the way (1)cause and effect (1)changed their minds (1)cups of tea (1)dead of night (1)does its best (1)doesn’t matter (1)down the road (3)first of all (1)for some time (1)for the first time (3)for the second time (1)for the sixth time (1)

goes on and on (1)gone our own way (1)good as gold (1)had in mind (1)here and there (1)hold his breath (2)I must admit (3)in a row (2)in a while (1)in his eyes (1)in his mind (1)in the dark (1)in the slightest (1)is not clear (1)lo and behold (1)makes a difference (1)miss the point (1)moaning and groaning (1)more and more (2)not at all (1)of the day (2)on an equal basis (1)on his side (1)on its back (1)

on my way (1)on the ground (1)on the horizon (1)on the increase (1)on the spot (1)on the streets (1)on the table (1)on their side (1)on your back (1)once and for all (1)played a part (1)prove his point (1)some time ago (1)speak for itself (1)stone’s throw (1)take for granted (3)take it back (1)taken for granted (1)taken into account (1)thank you very much indeed (1)top of the list (1)under one roof (1)up to date (1)

Table 4. Group 2: Formulaic expressions found at the end of or occupying the whole of intonation units and with all of the instances demonstrating the non-default tonicity patterns

across the road (1)all the time (1)and so on (8)are up to (1)as far as England cricket was concerned (1)as far as England was concerned (1)as far as security is concerned (1)as far as the local crowd was concerned (1)as it were (1)

as you have said (1)at this point (1)get an idea (1)get up and go (1)in a way (2)in his view (1)in my opinion (1)in some respects (1)in the first place (3)

in the long run (1)in this case (1)keep a check on (1)late in life (1)on the social side of things (1)on the whole (2)one of those things (1)point of view (1)takes into account (1)to a certain extent (1)

The prosody of formulaic expressions in the IBM/Lancaster SEC 587

Table 5. Group 3: Formulaic expressions that are not found at the end of or occupying the whole of intonation units

a little bit (2)a number of (7)am prepared to (1)are prepared to (2)as far as (1)as he said (1)as long as (2)as much as (1)as much as possible (1)as opposed to (1)as she said (1)as soon as (4)as you were saying (1)at that time of day (1)brushing up on (1)by and large (1)came up to (1)clapped his hands (1)come down to (1)do you think (2)doing their bit (1)for a minute (1)for a time (1)from side to side (1)get hold of (1)getting on for (1)go it alone (1)go out with (1)going out of their way (1)had enough of (1)in addition to (1)in common with (1)

in comparison to (1)in favour of (2)in front of (4)in line with (2)in place of (2)in pursuit of (1)in relation to (1)in reply to (1)in response to (1)in spite of (1)in support of (1)in the light of (1)in the way (1)in which case (1)keep track of (2)keeping an eye on (1)let go of (2)live up to (1)look up to (1)made aware of (1)made fun of (1)made up for (1)made use of (1)middle of the road (1)might as well (1)nothing to do with (1)of its own (1)of their own (1)on behalf of (2)on his back (1)on its own (1)on its way (1)

on my plate (1)on the edge (1)on the far side (1)on the other side (1)on the part of (1)on the side (1)on the verge of (1)on top of (1)on your hands (1)on your way (1)out of touch with (1)over the edge (1)quite a lot (2)’re prepared to (1)’s got to (1)something to do with (1)taking advantage of (2)the height of (1)there and then (1)think much of (2)to do with (4)to some extent (1)took advantage of (1)ups and downs (1)’ve got to (3)’ve no idea (1)waited their turn (1)was up with (1)whether or not (1)with reference to (1)

588 Phoebe M. S. Lin

Table 6. Group 4: Formulaic expressions found in mixed situations

Formulaic expressions not considered (%)

Formulaic expressions considered

With default tonicity (%)

down the line (2) 0 50have no idea (2) 0 50on the other hand (2) 0 50part of the world (2) 0 50the other day (3) 0 67at the time (6) 17 83as a whole (5) 20 80in other words (4) 25 50in the end (8) 25 75if you like (3) 33 33as soon as possible (3) 33 67at that time (3) 33 67that’s it (3) 33 67a great deal (2) 50 0for the time being (2) 50 0in my mind (2) 50 0at the moment (4) 50 50I’m afraid (2) 50 50over the hill (2) 50 50thank you very much (2) 50 50in the past (5) 60 20on their way (3) 67 33more or less (7) 71 0again and again (4) 75 25in order to (10) 80 10

Author’s addressPhoebe LinDepartment of Chinese, Translation and LinguisticsCity University of Hong KongTat Chee AvenueKowloon, Hong Kong SAR

[email protected]