
Recent quantitative changes in the use of modals and quasi-modals in the Hong

Kong, British and American printed press: Exploring the potential of Factiva®

for the diachronic investigation of World Englishes

Dirk Noël

The University of Hong Kong

Johan van der Auwera

University of Antwerp

Abstract: This chapter uses Factiva® to examine recent evolution in the occurrence

of verbal modal expressions in the Hong Kong newspaper South China Morning Post,

in contrast with American and British newspapers. Previous research established that

the frequency of the modals is decreasing in both British and American English

generally, but increasing within the pages of TIME Magazine. The present research

found that American newspaper data on the recent quantitative evolution of the

modals parallel the TIME data to a degree, while British newspaper data correspond

with data considered to be representative of British English in its entirety. The

quantitative evolutions of both the modals and the quasi-modals in the Hong Kong

newspaper resemble the ones in the British newspapers.

Keywords: modals and quasi-modals, constructional attrition, newspaper language,

Hong Kong English, British and American English, Factiva®

1. Background1

This article ties in with the part of the burgeoning “recent/current change in English”

research2 that is concerned with the general decline, in British and American English,

of the frequency of the modal auxiliaries and the concomitant overall rise in these two

varieties of the frequency of a mixed bag of verbal modal constructions we will refer

to with the term “quasi-modals”3 (Leech 2003; Smith 2003; Mair & Leech 2006;

Leech et al. 2009; Leech 2013; Smith & Leech 2013; Bowie et al. 2013). More

specifically, it links up with Peter Collins’s expansion of this research to other

Englishes, additional “inner-circle” ones (Australian and New Zealand English), as

well as a good selection of “outer-circle” ones (Philippine, Singapore and Hong Kong

English, bundled as Southeast Asian varieties, plus Indian and Kenyan English in

Collins 2009a, with Jamaican English added in Collins & Yao 2012).

1 We are grateful for the feedback received from audiences of oral presentations of this work in the

Research Seminar Series of the School of English of the University of Hong Kong on 3 October 2013,

and at the Englishes Today 2013 conference held in Vigo, Spain, on 18-19 October 2013, as well as for

comments received from the editor of this volume and a number of anonymous referees. Dirk Noël’s

research was supported by an allocation from the General Research Fund of the Hong Kong Research

Grants Council (Project Code HKU748213H).

2 See Leech et al. (2009) and Aarts et al. (2013), as well as the eight chapters grouped in a section

entitled “Observing recent change through electronic corpora” in Nevalainen and Traugott (2012).

3 This term can be traced back to Hakutani and Hargis (1972). Other terms that have been used are

“semi-modals”, “periphrastic modals”, “lexical modals”, “emerging modals” and, most recently,

“emergent modals”. This last one is the term used in Geoffrey Leech’s latest contributions to this strand

of research (Leech 2013; Smith & Leech 2013). It is modelled on Krug’s (2000) term “emerging

modals” and justified as a deviation from the term he used in earlier work, “semi-modals”, by saying

that it “reflects a difference in the set of verbal idioms included in this rather ill-defined category”

(Leech 2013, note 2). Smith and Leech (2013: 80, note 20) recognize, however, that the members of

this category can only be called “new” constructions in relative terms. We will adopt the somewhat less

loaded term “quasi-modals” for that reason.

The original observations of the attrition of modal auxiliaries and the

simultaneous growth in the frequency of the quasi-modals were first and foremost

based on data from the Brown family corpora, i.e. on a comparison of frequencies in

corpora containing texts published in the early 1960s and the early 1990s which are

considered to be representative of written British and American English (Leech 2003;

Smith 2003). The claim of the declining frequency of the modals was called into

question by Millar (2009), however, on the basis of frequency data from every single

volume of TIME Magazine published between 1923 and 2006. Leech (2011)

subsequently added extra data points to his two original ones, including data from the

turn of the 20th century, the early 1930s and the second half of the first decade of the

21st century, and insisted that “the modals ARE declining” in “the language as a

whole”, attributing the differences between his and Millar’s results to “the limitation

of Millar’s study to one very particular genre” (Leech 2011: 549). And very recently,

Smith and Leech (2013: 83) have recognized explicitly that “the genre factor deserves

more attention”, which they illustrate with the example of differences in the frequency

climb of have to in the different subcorpora of an extended British branch of the

Brown family corpora, the highest climb being observable in the Fiction subcorpus of

each corpus, and the lowest in the Learned (academic writing) subcorpus.

Previously, only differences in frequencies from spoken and written corpora

had been taken into account in the work by Leech and his associates. In addition to

the Brown family corpora, Leech (2003) also looked at two “mini” spoken corpora

and observed that both the fall in the frequency of the modals and the rise in the

frequency of the quasi-modals during the second half of the 20th century were steeper

in spoken language than in written language. Mair and Leech (2006) reported that a

comparison of frequency data from written and spoken American English corpora

showed quasi-modals to have a frequency of 62.5% of that of the modals in spoken

language, compared to only 17% in written language. It is on the basis of such results

that the two observed general frequency changes are argued to occur first in the

spoken language and that they are attributed to “colloquialization” when they happen

in the written language (e.g. in Leech 2013). It is on the basis of such results as well

that Collins (2009a,b) felt justified in drawing diachronic implications from frequency

differences between the spoken and written parts of the various national components

of the International Corpus of English (ICE) he made use of in his research. On the

assumption that the developments observed to be taking place in American and

British English are also taking place in other Englishes, the higher the speech to

writing ratio of the quasi-modals, the more closely these varieties resemble the

situation in American English, which is found to be “leading the way” in the rise of

the quasi-modals (Collins & Yao 2012: 46), and the more advanced these varieties are

consequently considered to be in this evolution. Conversely, a high speech to writing

ratio of the modals suggests “that they retain a degree of vitality, and consequently

that their rate of decline may be less marked – or at least delayed” (Collins & Yao

2012: 46).

It goes without saying, though, that claims about language change are best

supported by diachronic data. To date, the compilation of diachronic corpora or

Brown family-like corpora which are representative of specific New Englishes “as a

whole” has only got under way for a couple of such national varieties, however. On

the other hand, one conclusion one could draw from the exchange between Millar


(2009) and Leech (2011) is that it does not always make equal sense to make

undifferentiated statements about change in a national variety “as a whole”. Given the

likely importance of “the genre factor” for rates of change, or even direction of

change, descriptive adequacy will in fact be increased by a genre-specific approach.

As part of a more bottom-up tactic, there is no need, therefore, to sit around waiting

for (or to roll up one’s sleeves and start compiling) representative and well-balanced

diachronic corpora of new English varieties before embarking on investigations of

(parts of) their linguistic evolution.4 Other electronic resources might be available that

lend themselves to diachronic linguistic research even though they have not been

designed for that purpose. One of the objectives of this contribution is to explore the

usefulness of one such resource, Factiva®, a business information and research tool

owned by Dow Jones & Co, which to our knowledge has not to date been exploited in

published linguistic research.

The main objective of the research reported on here, however, is to find out

whether the frequency of use of verbal modal expressions has recently gone through

the same evolution in Hong Kong English as in the two main metropolitan English

varieties, or “supervarieties”, i.e. British and American English.5 We are focussing on

Hong Kong not just because this is where one of the two authors of this contribution

resides and works, but also because Hong Kong English shows up in a special way in

Collins’s (2009a: 285-286) results: of all the national varieties he looked at, Hong

Kong English not only uses the modals that were included in his investigation most

frequently, almost doubling the frequency of either British or American English, but it

also outnumbers all other varieties except one, American English, in its use of the

quasi-modals that were counted. In other words, judging by these results, Hong Kong

English appears to be a highly “modalized” variety. We will specifically focus on the

language of printed news, not just because of the desirability of a genre-based

approach and the availability of data, but also because it will allow comparison with

Millar’s (2009) results.

In the next section we will outline and motivate our research questions in

greater detail. We will then say a bit more about our research tool, Factiva®, and how

we have used it to find answers to our questions (section 3), which will be followed

by a presentation and discussion of the results of the study (section 4). We will

conclude with a short summary and an evaluation of the usefulness of Factiva® for

the investigation of the evolution of linguistic variation between national varieties of

English (section 5).

2. Problem

If the speech to writing ratios of modals and quasi-modals supplied in Collins and

Yao (2012: 46-47) are anything to go by,6 Hong Kong English is quite conservative

both in its retention of modals in speech, the average ratio being higher than in British

English, and in its adoption of quasi-modals, the average ratio being lower than that of

British English. Such extrapolations of synchronic facts can only be treated as

hypotheses about frequency evolutions, though, and need to be backed up by

diachronic data. The ratios are, moreover, based on frequencies in the spoken and

written sections of the ICE corpora7 used in their entirety, resulting in an abstraction

that can be far removed from what is happening in certain communicative context

types, e.g. from what journalists are doing in their newspapers or news magazines.

4 Mukherjee and Schilk (2012: 196) point out a possible conflict between ideals and reality in corpus-based

diachronic World Englishes research: “Ideally, what should be envisaged is the compilation of

diachronic corpora of New Englishes so that the description of divergence (or convergence) between

varieties across time can be based on direct evidence. It remains to be seen, however, for which

varieties adequate and sufficient data of earlier stages are available.”

5 In doing so, we are adding to the growing body of knowledge on Hong Kong English. Two volumes

on the topic are Bolton (2002) and Setter et al. (2010).

6 The figures supplied in Collins and Yao (2012) are different from those found in Collins (2009a) but

Collins (pers. comm.) has confirmed the more recent ones to be more accurate. It should also be

pointed out that they are based on counts for four modals (must, should, will, shall) and four quasi-modals

(have to, have got to, be going to, want to) only.

7 For American English a comparable corpus was used because at the time ICE-US was not in

existence. (At the time of writing the present article only the written part of the corpus has been made

available.)

Conversely, as Leech (2011: 549) has argued, Millar (2009: 194) appears to

suggest that his finding of an overall pattern of growth of the modals in TIME

Magazine can be generalized, in spite of an explicit statement that “[t]he patterns of

change observed in the TIME Corpus cannot be claimed directly to hold for the

English language as a whole” (Millar 2009: 206). Leech (2011: 550) warns against

such generalizations with the example of the frequency of the progressive in the

Brown and Frown corpora: their Learned (academic writing) sections attest to a

frequency drop of over 20% during the 30 years separating them (1961 – 1991), while

in the two corpora in their entirety there is an increase of slightly over 10%. Biber and

Gray (2013: 106) link up with this exchange between Millar and Leech in an article

that “challenge[s] the assumption that historical change should be documented for the

language as a whole”, arguing that “change should be studied relative to particular

registers, rather than attempting a kind of average for English” because “register is

crucially important as a mediating factor for historical developments”. They support

this with two case studies, one on the 20th-century evolution of the use of nouns in

different sub-registers of academic writing, and one which is of potential relevance

for what follows because it involves a comparison of the frequency evolution of a

number of constructions (direct and indirect quotation, the passive, noun + of-phrase,

and noun–noun sequences) in TIME Magazine and The New York Times. Another

recent study that connects with the discussion between Millar and Leech is one by

Bowie, Wallis and Aarts (2013), who likewise conclude that “different types of texts

may be undergoing different changes” (Bowie et al. 2013: 90).

Naturally, the relevance of text type/category, genre or register (and

sub-genre/sub-register)8 distinctions is not restricted to the supervarieties of English. As

Mukherjee and Schilk (2012: 194) point out, “while it remains useful to compare

varieties of English in their entireties with each other to identify overarching

intervarietal differences, there is a growing awareness [in New Englishes research]

that no variety is a monolithic entity and that intravarietal variation exists in all new

Englishes, for example between speech and writing […] and between individual

registers […]”. In tune with this awareness, our aim in this chapter is to contribute to

the bottom-up empirical study of the frequency evolution of modals and quasi-modals

in Hong Kong English with a study of this evolution in the Hong Kong broadsheet

press, which in practice amounts to the South China Morning Post newspaper. In

order to determine the significance of this evolution we will contrast it with the

evolution in two British newspapers, The Times and The Guardian, and two American

ones, The New York Times and The Washington Post. Another reference point is

Millar’s (2009) TIME Magazine-based study. Though news magazines may constitute

a different sub-genre from newspapers (cf. the Biber & Gray 2013 case study

mentioned above), our New York Times and Washington Post data will show whether

this is a relevant distinction in relation to the development we are interested in here.

Our point of departure, however, is Leech’s (2013) most recent data drawn from the

“extended” Brown family corpora.

8 We are treating these as synonymous terms and will continue to use “(sub-)genre”.

These data show a consistent drop in the frequency of the modals as a group

(will, would, can, could, may, might, shall, should, must and need) and a consistent

increase in the frequency of the group of quasi-modals considered (BE able to, BE

going to, HAVE to, HAVE got to, WANT to, NEED to, BE supposed to and had better)9 from

the start of the 20th century to the first decade of the 21st, both in British and in

American English, with the modals group dropping faster than the quasi-modals

group is increasing.10 The frequency of the modals group has always been, and is still,

several times that of the quasi-modals group, though the difference is getting smaller.

In American English the decline in the frequency of the modals is steeper than in

British English, but there is no statistically significant difference between the

increases of the quasi-modals groups in American and British English. Important for

what follows is that the difference between the 1990s (1991) and 21st-century (2006)

data points is consistent with the longer-term evolution.

Turning to individual modals, Leech (2013: 96-99) first repeats the earliest,

2003, observations based on the original Brown family corpora, so comparing 1960s

and 1990s data points only. In American English every modal becomes less frequent,

with medium or low-frequency ones, particularly may, must and shall, dropping the

most. In British English, on the other hand, two high-frequency ones, can and could,

display a slight frequency rise. The American data are then complemented with data

from the Corpus of Historical American English (COHA) for the 1910–2010 period,

and the Corpus of Contemporary American English (COCA) for the 1990–2010

period. The COHA data reveal that the four most common modals, would, will, can

and could, “have on the whole maintained the same frequency over the [20th] century”,

and that the “twentieth-century record of gradual decline can actually be attributed to

the steady frequency loss of the seven lower-frequency modals, may, should, must,

might, shall, ought (to) and need(n’t)” (Leech 2013: 103). Important for what follows

is the following statement: “the decline which has affected the less common modals

over the century has in the last decade or so [2000-2010] begun to impinge on the four

commonest modals, so that the overall picture is of accelerating decline of frequency”

(Leech 2013: 103). It follows that would, will, can and could are no longer

maintaining their 20th-century frequency. The COCA data, which also include spoken

data, do not reveal a drop in the frequency of can and could between 1990 and 2010,

however. They do for will and would, but Leech’s (2013: 104) Figure 6 shows the

decline of would to have halted in the second half of the first decade of the present

century. Leech (2013: 104) also mentions that might, and also can and could, “have

survived at roughly the same level of frequency” during the 1990–2010 period.

Leech (2013) did not look for quasi-modals in the COHA and COCA corpora.

The explanation for this is that he turned to these corpora to check Millar’s (2009)

TIME corpus-based claim that the frequency of the modal group was growing rather

than dropping, while Millar did not disagree with him on the frequency development

of the quasi-modal group. About the quasi-modals, Millar (2009: 204) simply states

that all the ones he looked for (HAVE to, WANT to, BE going to, used to, NEED to, HAVE

got to and had better) “have risen considerably in frequency” in TIME Magazine

between 1923 and 2006, pointing out that in the case of need to “the increase is

almost tenfold”. Calculating the difference between the figures he presents in his

Table 8 for the 1990s and 2000s (because this difference will be relevant for a

comparison with our data), we end up with the results in the penultimate column of

Table 1.

9 Small caps indicate lemma forms that can inflect.

10 Leech (2013: 99) does not provide exact frequencies, only a chart, but for reasons we will explain in

the next section, token frequencies are not relevant to this study.

Quasi-modal    1990s    2000s    Difference    Significance
HAVE to          554      660       +19.13%    p < 0.01
WANT to          474      649       +36.92%    p < 0.0001
BE going to      173      227       +31.21%    p < 0.01
NEED to          134      223       +66.42%    p < 0.0001
used to          125      140       +12.00%    n.s.
HAVE got to       23       30       +30.43%    n.s.
had better        11       12        +9.09%    n.s.

Table 1: Percentage difference between Millar’s (2009: 204) 1990s and 2000s frequencies (per million words) for quasi-modals (n.s. = no statistically significant difference, i.e. p > 0.05)11

We can observe that the four highest-frequency ones of the quasi-modals considered

kept on rising in TIME Magazine between the last decade of the previous and the first

decade of the current century, NEED to the most, the increase of the lower-frequency

ones not being statistically significant.
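The percentage differences in Table 1 follow from simple arithmetic on the two decade figures: the change from the 1990s to the 2000s expressed as a share of the 1990s value. The following is a minimal sketch of that calculation in Python; the function and variable names are ours and purely illustrative, and the sketch does not reproduce the significance test, which is described in section 3.

```python
def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, as a percentage of old."""
    return (new - old) / old * 100


# Frequencies per million words (1990s, 2000s) as listed in Table 1,
# which takes them from Millar (2009: 204).
table1 = {
    "HAVE to": (554, 660),
    "WANT to": (474, 649),
    "BE going to": (173, 227),
    "NEED to": (134, 223),
    "used to": (125, 140),
    "HAVE got to": (23, 30),
    "had better": (11, 12),
}

# Percentage difference per quasi-modal, rounded to two decimals.
differences = {
    expr: round(pct_change(old, new), 2) for expr, (old, new) in table1.items()
}

for expr, diff in differences.items():
    print(f"{expr:<12} {diff:+.2f}%")
```

The same function can be applied to any pair of normalized frequencies, provided both are expressed on the same basis (here, per million words).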

Where Millar (2009) differs from Leech (2003, 2013) is on the matter of the

overall frequency evolution of the modal group, which Millar found to be a rising

rather than a falling development, with an overall frequency increase of 22.9%

between 1923 and 2006. But not all modals are observed to grow more frequent.

Those that definitely do are can, could and may, two of which, can and could, were

not found to decrease by Leech (2013) either, but to maintain their frequencies in the

COHA data. However, shall, ought and must “show a considerable decline in

frequency” in TIME Magazine as well (Millar 2009: 199). If we again calculate the

difference between the 1990s and 2000s figures, presented in Millar’s Table 3, what

we get is the percentage difference in the penultimate column of Table 2.

11 See the Methodology section of the chapter (section 3) for the statistical significance testing method

that was employed.

Modal       1990s       2000s    Difference    Significance
will     2,273.23    2,362.52        +3.93%    n.s.
would    1,797.03    1,693.19        -5.78%    n.s.
can      1,475.95    1,777.07       +20.40%    p < 0.0001
could    1,378.39    1,342.56        -2.60%    n.s.
may        937.08      931.91        -0.55%    n.s.
should     521.46      593.27       +13.77%    p < 0.05
might      474.23      433.34        -8.62%    n.s.
must       306.69      250.59       -18.29%    p < 0.05
ought       34.90       27.65       -20.86%    n.s.
shall       16.09        9.26       -42.45%    n.s.
Total    9,215.05    9,421.36        +2.24%    n.s.

Table 2: Percentage difference between Millar’s (2009: 199) 1990s and 2000s frequencies (per million words) for modals

Judging by the data in Table 2, the growth of the modal group in TIME Magazine has

slowed down considerably. In fact, one can hardly consider there to be any collective

growth any more. Only can appears to continue to increase in frequency around the

turn of the century, and the frequency of should has increased more in this period than

in the whole 1923-2006 period considered by Millar. Could and may seem to have

stabilised. Must is still falling. The large percentages in the case of shall and ought are

deceptive: there is no statistically significant decrease any more.

We can now ask to what extent the recent frequency evolution of modals and

quasi-modals in American, British and, of course, Hong Kong newspapers conforms

to these data. This very general question can be broken down into a more specific

series of logically ordered and motivated questions:

1. Do the American newspaper data conform more to Millar’s TIME Magazine

data than to Leech’s American language-as-a-whole data or vice versa? If the former

is the case, genre can indeed be said to play a role in the frequency evolution of verbal

modal expressions.12 However, lack of conformity between the newspaper data and

the TIME data might also point to the relevance of sub-genres. This leads us to

question 2.

2. Do the American newspaper data conform to each other more than they do to

the TIME Magazine data? If they do, newspapers constitute a sub-genre of the news

genre that can be relevantly distinguished in the frequency evolution of verbal modal

expressions.

3. Do the British newspaper data conform more to Leech’s British language-as-a-

whole data or to Millar’s TIME Magazine data? If the former, the geographical factor

is more important than the genre factor in the frequency evolution of verbal modal

expressions. This can be confirmed or disconfirmed by question 4.

4. Do the British newspaper data conform to each other more than they do to

American newspaper data? If so, the frequency evolution of verbal modal expressions

in newspapers is national variety-specific.

5. The relative importance of variety and genre having been established through

the answers to questions 1 to 4, we can finally ask: To what extent do the Hong Kong

newspaper data on the frequency evolution of verbal modal expressions

conform to the British and American ones?

12 We are using the term “genre” because this is the term used in the research we are connecting with.

Newspapers can be argued to comprise many different genres, however.

3. Methodology

As announced above, frequency data were drawn from five newspapers: two

American ones, The New York Times (henceforth, NYT) and The Washington Post

(henceforth, WP); two British ones, The Times and The Guardian; and the Hong Kong

South China Morning Post (henceforth, SCMP). These were all searched using

Factiva®, an information service product of the American publishing and financial

information firm Dow Jones & Company, which became a subsidiary of Rupert

Murdoch’s News Corp in 2007. On its homepage (www.dowjones.com/factiva/, last

accessed on 1 July 2013), Factiva is described as “the world’s most important

collection of news”, which “unlocks the paywall to critical business facts” by

accessing “thousands of sources in 28 languages from nearly 200 countries”.

Basically, it is a search engine that targets certain sources and whose search results

are texts that hopefully contain the “business facts” its corporate or academic

subscribers are looking for. However, since its search interface is, naturally, a text-

based one, there is no reason why, in academic institutions with a subscription to the

product, its use should be restricted to scholars in business looking for business facts.

In fact, with its options to search for “All of these words”, “At least one of these

words”, “None of these words” and “This exact phrase”, this interface will look very

familiar and useful to many a corpus linguist.

As will be obvious from the above, Factiva was not designed for linguistic

research, though, and the query results consequently do not come in the form of

concordance lines or token frequency counts. As just implied, they come in the form

of texts in which the search terms occur, but the search engine also counts these texts.

In other words, it provides a frequency measure that is known in corpus linguistics as

the “dispersion” of the query expression, albeit a very primitive version of it. In recent

literature dedicated to various (more sophisticated) measures of dispersion it is

considered to be a better measure of the entrenchment of an expression in a language

than raw token frequency because it disregards frequency that results from repeated

use of the query expression in only a subset of texts (see Gries 2008, 2010; Chesley &

Baayen 2010). This was not a relevant consideration in our choice of data source,

however, and indeed the kind of quantitative information Factiva can provide also has

drawbacks. In the case of frequent expressions, for instance, many occurrences will

remain uncounted, since if they occur more than once in a text they are only counted

once, and consequently the quantities generated cannot simply be put next to the

frequency-per-million-words data cited above. However, this is not to say that the two

different kinds of data do not allow any kind of comparison. If one may assume that

verbal modal expressions are not restricted by ideational text content in the same way

as certain lexis, relative differences and fluctuations in token frequency are expected

to be paralleled by relative differences and fluctuations in dispersion rates, provided

the data sources are large enough. In the Results section of this paper we will not

only compare dispersion data drawn from Factiva, we will also put these next to

frequency data from Millar’s study, because sadly Factiva does not provide access to

TIME Magazine and consequently dispersion data could not be collected for it in the

same way. The reader should keep in mind the different nature of the TIME data, but

we believe their comparison is justified, for the reason stated in this paragraph.
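The difference between the two kinds of count discussed in this paragraph can be illustrated with a minimal sketch. The three toy "texts" below are invented for the purpose and the function names are ours; Factiva itself only returns the second kind of figure.

```python
import re


def token_frequency(texts, expression):
    """Count every occurrence of the expression across all texts."""
    pattern = re.compile(r"\b" + re.escape(expression) + r"\b", re.IGNORECASE)
    return sum(len(pattern.findall(text)) for text in texts)


def text_dispersion(texts, expression):
    """Count the texts that contain the expression at least once."""
    pattern = re.compile(r"\b" + re.escape(expression) + r"\b", re.IGNORECASE)
    return sum(1 for text in texts if pattern.search(text))


# Invented toy data, for illustration only.
texts = [
    "You must go. You must stay. You must decide.",
    "She might call, and he must answer.",
    "Nothing modal here.",
]

print(token_frequency(texts, "must"))   # 4 tokens in total
print(text_dispersion(texts, "must"))   # 2 texts contain the expression
```

The repeated use of must in the first text inflates the token count but not the dispersion count, which is why the two measures cannot simply be put side by side.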


Very useful to the researcher of linguistic change is that the Factiva search

interface allows one to specify a date range for the sources accessed. The digital age is

still very new, however, and that means we cannot go back in time very far when

using this search engine to access digital versions of newspapers. In the case of the

SCMP, for instance, the furthest we can go back is 1989, and Factiva only started the

indexing of the two other Hong Kong English-language newspapers, The Standard
and China Daily Hong Kong Edition, in 2002 and 2010 respectively. This is the reason

we could only include one Hong Kong data source in this study.13 It is also the reason
we can only cover the last period included in Millar’s (2009) and Leech’s (2011, 2013)
research, roughly corresponding to the last decade of the 20th century and the first
decade of the 21st.

To do so we picked three data points: 1990, 2000 and 2010. Each of these

three volumes of each of the five newspapers we are interested in was searched for the

textual occurrence of the modals can, could, may, might, must, ought to, shall, should,

will and would, and for a set of quasi-modals that is the union of the sets mentioned in

Millar (2009) and Leech (2013), viz. (BE) going to, HAVE to, (HAVE) got to, NEED to,

WANT to, BE supposed to, used to and BE able to. Only full forms of the contractible

forms were included in the searches. The query expressions for the periphrastic forms

only included the invariable parts of these forms, e.g. “supposed to” in the case of BE

supposed to. Need(n’t) was not included in the modal set because the Factiva search

system does not make it possible to automatically separate need from need to, but as

shown by Millar’s (2009) and Leech’s (2013) data need is the modal with the lowest

frequency, so that its omission will not have had a great impact on the results.14

The quantitative data Factiva can generate are limited to the count of the texts

in which the query term occurs. It does not tell one how many texts were searched. To

be able to compare and interpret the figures generated we therefore needed to find a

way to limit our searches to a known number of texts. We decided to do this by first

searching each of the three volumes of each publication for both a definite and an

indefinite article, which produced a figure representing the total number of texts in

each volume. We then included definite and indefinite articles in the queries for each

modal and quasi-modal expression, which limited these searches to the texts earlier

determined to constitute the total number of texts. The dispersion rate generated could

subsequently be expressed as a percentage share of the total number of texts. Our

comparisons, in the next section, of the dispersion of individual expressions between

data sets, i.e. between publications and between different volumes of each publication,

will be based on these percentages. As in the research by Leech and Millar we

connect with, the log likelihood test was used to determine the statistical significance

of dispersion rate differences, making use of Paul Rayson’s online log likelihood

calculator (http://ucrel.lancs.ac.uk/llwizard.html).
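The arithmetic behind this procedure can be sketched as follows. The sketch assumes the two-cell log likelihood formula and the critical values documented for Rayson's calculator, applied here to text counts rather than token counts; the function names and the example figures are ours and are purely hypothetical, not data from this study.

```python
import math


def dispersion_percentage(texts_with_expression, total_texts):
    """Share of texts (in %) that contain the expression at least once."""
    return 100.0 * texts_with_expression / total_texts


def log_likelihood(a, c, b, d):
    """Log likelihood for two data sets.

    a, b: number of texts containing the expression in data sets 1 and 2
    c, d: total number of texts in data sets 1 and 2
    """
    e1 = c * (a + b) / (c + d)  # expected count in data set 1
    e2 = d * (a + b) / (c + d)  # expected count in data set 2
    ll = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # a zero cell contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2.0 * ll


# Critical values for 1 degree of freedom.
CRITICAL_VALUES = [
    (15.13, "p < 0.0001"),
    (10.83, "p < 0.001"),
    (6.63, "p < 0.01"),
    (3.84, "p < 0.05"),
]


def significance_label(ll):
    """Return the strictest significance level reached by an LL value."""
    for threshold, label in CRITICAL_VALUES:
        if ll >= threshold:
            return label
    return "not significant"


# Hypothetical figures: 300 of 1,000 texts in one volume, 250 of 1,000 in another.
ll = log_likelihood(300, 1000, 250, 1000)
print(dispersion_percentage(300, 1000), dispersion_percentage(250, 1000))
print(round(ll, 2), significance_label(ll))
```

In this sketch the total number of texts per volume plays the role that corpus size in words plays in token-based studies.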

Another kind of quantitative data we will make reference to for the

comparison of data sets are two ratios: the ratio of the total of the dispersion figures

for all modals to the total number of texts, and the ratio of the total of the dispersion

figures for all quasi-modals to the total number of texts. The higher these ratios, the

higher is the dispersion of the group as a whole. The smaller the difference between

the two ratios, the more the dispersion rates of the two groups have come together.
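As a minimal sketch of how these two ratios are obtained, consider the following. The counts are invented and cover only a few expressions; the variable and function names are ours.

```python
def group_ratio(text_counts, total_texts):
    """Sum of the per-expression dispersion figures divided by the total number of texts."""
    return sum(text_counts.values()) / total_texts


# Invented dispersion figures (number of texts containing each expression).
modal_counts = {"will": 600, "would": 550, "can": 300}
quasi_modal_counts = {"have to": 120, "want to": 90}
total_texts = 1000

modal_ratio = group_ratio(modal_counts, total_texts)
quasi_modal_ratio = group_ratio(quasi_modal_counts, total_texts)

print(round(modal_ratio, 2))                      # 1.45
print(round(quasi_modal_ratio, 2))                # 0.21
print(round(modal_ratio - quasi_modal_ratio, 2))  # 1.24, the gap between the groups
```

Because one text can contain several different modals, the ratio can exceed 1; it is a sum of text shares, not a share of texts.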

13 As a Hong Kong edition of a “mainland” Chinese newspaper the status of China Daily Hong Kong Edition as a Hong Kong newspaper is doubtful anyway.

14 Since the Factiva search engine was not designed for corpus linguistic research purposes, it does not filter out the nouns can, must, might and will, nor cases of going to and used to where the infinitive is purposive, or where to is a preposition. There is some noise in the quantitative data, therefore.


4. Results and discussion

4.1. “American English”

In line with the logical order of the research questions formulated at the end of section

2, we will first present and discuss the American data from The New York Times

(Table 3) and The Washington Post (Table 4).

[@@ TABLES 3 AND 4 TO BE INSERTED AROUND HERE]

A first observation we can make is a reassuring one with respect to the amount of trust

we can invest in our dispersion data vis-à-vis token frequency data. The frequency

rank order of the expressions listed in Tables 3 and 4 closely resembles that of

Leech’s (2003, 2013) and Millar’s (2009) data. In the case of the modals this is:

would/will > can > could > may > should > must/might > ought > shall. The slashes

indicate variation between Leech’s (2003) 1990s language-as-a-whole Frown data and

Millar’s 1990s–2000s TIME data. In Frown would is the most frequent modal and in

TIME will heads the list. Must is more frequent than might in Frown, but it is the other

way round in TIME.

The rank order in our WP data exactly mirrors that of TIME. In the case of

NYT, there is no significant difference between will and would in 1990, but would

instead of will comes out on top from 2000 (p < 0.001) as a result of a rise in the

frequency of would, the frequency of will staying stable. Judging by Leech’s (2013:

104) Figure 6, which represents the frequency change of the modals in the later 20th

and early 21st century based on COCA data, will was first on top of would, but they

changed places in the latter half of the first decade of the present century as a result of

a drop in the frequency of will. The position of would with relation to will in NYT is

therefore the same as in American English “as a whole”, unlike in WP and TIME, but

apparently as a result of a different change. In WP will has risen and would has

dropped, as they have done slightly, but not statistically significantly, in TIME

between the 1990s and 2000s (see Table 2). As to the relative positions of must and

might, these are the same in NYT, WP and TIME, might being the more frequent

modal, but judging by that same Figure 6 in Leech (2013), this is also what turned out

to be the case in the language as a whole as a result of a drop of must.

Continuing this comparison of the dispersion/frequency evolution of the

modals in American English during the previous two decades we can observe that can

increases in NYT, WP and TIME, while it stays stable in the COCA data. Could rises

in both NYT and WP, while it does not in TIME and COCA. However, it should be

observed perhaps that unlike in NYT, where there is a steady increase, the overall

increase between 1990 and 2010 in WP is the result of a sharp increase between 1990

and 2000 followed by a less steep decrease between 2000 and 2010. May first rises

but then drops more steeply in WP and first rises only slightly and then stays stable in

NYT, while it stays stable in TIME and drops in COCA. Should increases in NYT, WP

and TIME, but decreases in COCA. Might rises in both NYT and WP, as it does very

slightly in COCA, but drops non-significantly in TIME. Must seems to drop everywhere,

as do ought and shall, but the latter two do not do so in a statistically significant

fashion in TIME, and the drop of the first one is not statistically significant in WP.


The picture that emerges from this is the following. When comparing the start

point and the endpoint of the twenty-year period under consideration, the evolution of

the dispersion of the modals in the previous two decades is more often the same in

NYT and WP than it is different, but the development is more linear in NYT than in

WP. When the end result is the same in both papers, the evolution is not consistently

in line with that in TIME. When the two newspapers do not show the same evolution,

neither of them is consistently in line with either TIME or the language as a whole. In

other words, as far as the very recent evolution of individual modals in the American

printed press is concerned we can conclude there to be more sub-genre consistency

between the two newspapers than genre consistency with TIME. This is visualized in

Table 5, in which the shaded cells show where the two newspapers are similar and

which of their similarities are shared with TIME.

NYT WP TIME

will

would

can

could *

may

should *

must

might

shall

ought

Table 5: Similarities in the direction of the dispersion/frequency evolution of the

modals from 1990 to 2010 between American printed news publications.

= rising dispersion/frequency (p < 0.0001, unless specified otherwise)

= falling dispersion/frequency (p < 0.0001, unless specified otherwise)

= no statistically significant difference

* p < 0.001 p < 0.05

However, when we consider the modals as a group by looking at the modals-to-text
ratios,15 we notice that there is a steady increase in the use of modals in NYT

(from 2.81 to 2.99), and that there is an overall increase in WP as well (from 2.93 to

3.1) but that this is the result of a fairly steep rise (from 2.93 to 3.43) between 1990

and 2000 followed by a more gentle fall (from 3.43 to 3.1) between 2000 and 2010.

WP had more modals than NYT in 1990, and the difference between the two

newspapers had grown bigger in 2000, but they had converged considerably by 2010.

It remains to be seen whether the change in the evolution in WP is a case of genre

convergence, but the fact is that the moderate increase between the start and the end

of the twenty-year period considered conforms more to the development in TIME (see

Table 2) than that in American English “as a whole”.

Turning to the field of the quasi-modals, we can observe exactly the same

developments. There is an overall increase in the use of quasi-modal expressions

between 1990 and 2010, which in the case of NYT is a steady increase (from a ratio of
0.98 over 1.17 to 1.21) and in the case of WP again the result of a sharper increase
(from 1.14 to 1.48) followed by a less steep decline (from 1.48 to 1.22). Here too NYT
is behind WP at the start of the period, but the two newspapers have very much
converged in 2010.

15 Given that we are dealing with dispersion figures rather than token frequencies, this should be read as “texts with at least one modal of a certain kind”-to-text ratio.

The same story holds for most of the quasi-modals when considered

individually. Again notice that the frequency rank order of the individual expressions

based on our dispersion data mirrors the token frequency-based rank order that can be

compiled from the data presented in Leech (2003) and Millar (2009): HAVE to > WANT

to > BE going to > NEED to > used to > got to > had better. It is especially striking

when comparing the two newspapers how the sometimes somewhat differing

evolutions (made visible in Table 6) lead to a very similar situation at the end of the

twenty-year period, with all of the quasi-modals being used in a very similar share of

the totality of texts. NEED to is the sharpest climber in both publications, which is

consistent with the TIME data. Inconsistent with TIME, however, as can be seen when

we compare the 2000 and 2010 figures in Tables 3 and 4 with Table 1, is that both

HAVE to and WANT to stopped climbing in both papers in the current century, HAVE to

going down even between 2000 and 2010 in both papers, and WANT to going down in

WP and staying stable in NYT.

NYT WP TIME

have to °

got to

want to

going to °

need to

used to

supposed to

had better

able to °

Table 6: Similarities in the direction of the dispersion/frequency evolution of the

quasi-modals from 1990 to 2010 between American printed news publications.16

= rising dispersion/frequency (p < 0.0001, unless specified otherwise)

= falling dispersion/frequency (p < 0.0001, unless specified otherwise)

= no statistically significant difference

° p < 0.01 p < 0.05

Since Millar (2009) and Leech (2011, 2013) do not point out differences

between the language as a whole and the news genre with relation to the frequency

evolution of the quasi-modals we can conclude the discussion here by saying, with

relation to our first research question, that the evolution in the two American

newspapers considered is on the whole not inconsistent with the general trend of a

continued rise in the frequency of the quasi-modals as a group. The only “abnormality”

is that HAVE to and WANT to have very recently stopped climbing. This is a

development that is displayed by both newspapers, and it is left to be determined

whether this is a sub-genre-specific feature or a “language-as-a-whole” phenomenon.

16

There are no data on supposed to and able to in Millar (2009).


Looking at the differences between the modal and quasi-modal ratios in

Tables 3 and 4, we see that the gap hardly gets smaller in NYT and even grows a bit
bigger in WP. This is definitely not what is expected to be going on in the language as a

whole, where the frequencies of both groups are predicted to be coming closer to each

other, based on the comparison of 1960s and 1990s data (Leech 2003, 2013).

We can conclude this comparison of American newspaper data with TIME

data and data on American English as a whole by answering the first two research

questions listed at the end of section 2 with 1) yes, the newspaper data conform more

to the TIME data than the language-as-a-whole data, and 2) yes, there is a certain

degree of conformity between the two newspapers which distinguishes them from

TIME. (Sub-)genre is therefore a factor in the recent frequency evolution of verbal

modal expressions.

4.2. “British English”

Turning now to the Times and Guardian data, presented in Tables 7 and 8

respectively, we can observe a number of striking differences with the American data.

[@@ TABLES 7 AND 8 TO BE INSERTED AROUND HERE (THESE TABLES

CAN BE FOUND AT THE END OF THIS DOCUMENT)]

The most important of the observations that can be made is that, judging by the

modals-to-text ratios, the dispersion of the modal group as a whole has gone down in

both British newspapers between 1990 and 2010. In The Times this is a linear

decrease from 2.56 over 2.45 to 2.35. In The Guardian the ratio first rises a little from

2.95 to 3.2, but then drops quite sharply to a level below the 1990 level, 2.6. Looking

at the individual modals, in The Times the occurrence of all modals except two has

consistently dropped. The occurrence of may has remained stable (at slightly above

26%) and that of can first rose sharply from 26% in 1990 to almost 36% in 2000 and

then dropped only a little to slightly over 34%. In The Guardian the situation is more

varied. Some modals first increase their dispersion before dropping to a level below

the 1990 one, as in the cases of could, may, must and ought. Others hover at the same

level (the differences between 1990 and 2000 not being statistically significant)

before dropping, as in the case of will, should and shall. Can and might first rise and

then drop, but not to a level below the 1990 one. Only would drops both between

1990 and 2000 (p < 0.001) and between 2000 and 2010 (p < 0.0001). If we again

disregard the figures for 2000, however, and confine the comparison to the

direction of the evolutions between 1990 and 2010, the two British newspapers turn

out to be very similar to each other, and to be markedly different from the American

newspapers and TIME. This is visualized in Table 9, in which the shaded cells show

where the two British newspapers have evolved in a similar fashion, and which of

these similarities are shared with the American publications.


Times Guardian NYT WP TIME

will

would

can

could *

may

should *

must

might

shall

ought *

Table 9: Similarities in the direction of the dispersion/frequency evolution of the

modals from 1990 to 2010 between British and American printed news publications.

= rising dispersion/frequency (p < 0.0001, unless specified otherwise)

= falling dispersion/frequency (p < 0.0001, unless specified otherwise)

= no statistically significant difference

* p < 0.001 p < 0.05

Returning to the modals-to-text ratios, and comparing them with the American

ones, we can observe that The Times uses modals the least of all four newspapers

considered so far, its highest rate (2.56) being lower than the lowest rate in the

American newspapers. The Guardian ends up using the modals less than the

American papers as well in 2010, but started from a position comparable to them in

1990. Note that based on Leech’s (2003: 228) language-as-a-whole data British texts

are not expected to contain fewer modals than American ones, so that we could be

dealing with a difference between these national varieties which is genre-specific,

though we cannot determine here whether and to what extent the genre factor might

be operative on both sides of the Atlantic.

We can conclude that as far as the overall evolution of the modals is

concerned, the two British newspapers are very different from the two American ones

in that they both display a drop, which is consistent with the development in both

British and American English “as a whole”. The evolution in The Guardian

resembled that of TIME and the American newspapers more before the turn of the

century, but then “followed” the evolution in The Times.

Turning to the quasi-modals, we can observe a linear increase in the dispersion

of the group in The Times, moving from a ratio of 0.65 over 0.82 to 0.91, while we

can again detect a rise-fall pattern in The Guardian, from 0.81 over 1.23 to 1.02, but

this time the last ratio is not lower than the first one. As in the case of the modals,

these ratios are considerably lower than the American ones, which confirms that the

British newspapers are less modalized than the American ones. As far as individual

quasi-modal expressions are concerned, we can note that the most frequent ones have

not increased any more in the current century. This is true of HAVE to and WANT to in

both newspapers, and also of NEED to in The Guardian.17 Remember that the increase
of the former two had also halted in the two American papers in this century, but that
the available data do not allow us to decide whether this is sub-genre or language-specific.

17 NEED to might also have stabilized in The Times if it had not been for an overuse of the expression need to know in the 2010 volume as a result of its repetition in various section titles.

If we again disregard the year 2000 data, the picture that emerges is

as shown in Table 10, which reveals the evolution over the twenty-year period in the

two British newspapers to be quite similar to each other and also, to a slightly smaller

degree, to the American NYT, but quite different from the other American

publications.

Times Guardian NYT WP TIME

have to ° °

got to

want to

going to °

need to

used to

supposed to

had better

able to ° °

Table 10: Similarities in the direction of the dispersion/frequency evolution of the

quasi-modals from 1990 to 2010 between British and American printed news

publications.

= rising dispersion/frequency (p < 0.0001, unless specified otherwise)

= falling dispersion/frequency (p < 0.0001, unless specified otherwise)

= no statistically significant difference

° p < 0.01 p < 0.05

Unlike in the American newspapers, the differences between the modal and

quasi-modal ratios grow considerably smaller in the two British newspapers, which is

what is expected to be going on still in the language as a whole, though it has only

been observed yet through a comparison of 1960s and 1990s data (Leech 2003, 2013).

Returning to the research questions listed at the end of section 2, we can

conclude this sub-section by answering the third question with yes, the British

newspaper data conform more to the language-as-a-whole data than to the TIME data,

and the fourth one with yes, the British newspapers conform more to each other than

to the American newspapers, both in their level of modalization and in the direction of

change. Both the occurrence and the frequency evolution of verbal modal expressions

in newspapers are therefore specific to national varieties.

4.3. “Hong Kong English”

We have established in the two previous sections that there are not only national

differences between the degree of modalization of newspapers which are unlikely to

be ascribable to differences in the presence of modals and quasi-modals between the

two varieties “in their entirety”, but that there are also very clear national differences

in the quantitative evolution of the modals (if not the quasi-modals to the same extent)

in newspapers which cannot be attributed to differences in that evolution in the

national varieties “as a whole”. (Sub-)genre is therefore a factor both in the degree of

modalization and in the evolution of modalization. Our analysis of the SCMP data


presented in Table 11 cannot therefore lead to conclusions about Hong Kong English

generally.

[@@ TABLE 11 TO BE INSERTED AROUND HERE (THESE TABLES CAN BE

FOUND AT THE END OF THIS DOCUMENT)]

As far as the general degree of modalization is concerned, the SCMP comes

closer to the British newspapers than the American ones, both in terms of their modal

group and their quasi-modal group ratios. This numerical closeness to the British data

may be seen to confirm the genre factor because things look somewhat different in the

frequency counts in the entire writing sections of the ICE corpora presented in the

study by Collins and Yao (2012: 45) already referred to above, which indicated Hong
Kong English to be the highest-scoring of all varieties considered in terms of the use

of modals, with a frequency count (of 6656 tokens per million words) considerably

higher than that of British English (6089), which in turn is higher than the American

English one (4843), and its quasi-modals count (1740) being mid-way between the

British English (1534) and the American English (1987) ones. Looking at the

sequence of both ratios in Table 11, we notice that the size of the modal group

diminished between 1990 and 2010 and that the quasi-modal group became more

sizeable, but that neither evolution happened linearly. Both the modals and the quasi-

modals 2000 ratios being the lowest ones, the SCMP was less modalized at the turn of

the century than it was ten years before or ten years later. This fall-rise pattern is quite

unique in that it was not observed in any of the other newspapers we looked at, but

just as we refrained from speculating about the possible causes of the rise-fall patterns

observed above in WP and The Guardian we will not comment on this lack of

linearity.

Considering the entire span of the twenty-year period, the dispersion of the

modal group has dropped and this conforms to what happened in the two British

newspapers rather than the two American ones. The drop in the Hong Kong paper

(0.15) is smaller than in The Times (0.21) and The Guardian (0.35), however. In

spite of the overall drop, the dispersion of one modal, viz. can, rose considerably (up

from 29.23% to 36.15%) and that of may increased slightly (from 25.51% to 27.63%).

The former happened in the other four papers and in TIME as well, the latter did not,

nor did it in TIME in the 1990s and 2000s, as can be seen in Table 2, though it was a

big riser over the course of the whole period considered by Millar (2009: 199). The

decrease of the modal group as a whole is mainly due to a sharp fall of will (from

64.51% to 56.26%) and would (from 59.1% to 49.28%). These fall steeply in the

British papers as well, unlike in the American ones, but in the British papers the

contribution to the drop of the group is more evenly shared across its members.

Table 12 compares the dispersion/frequency evolution of the individual

modals from 1990 to 2010 in the six printed news publications considered in this

chapter. The shaded cells in the table are an indication of similarities in the evolution

between SCMP and the other publications and they reveal the evolution in SCMP to

be very similar to that in the two British papers, the developments in The Times

corresponding most often to those in SCMP (only three white cells indicating

dissimilarities). Since there is a high degree of similarity between the pattern of

change in The Times and The Guardian, and since the patterns in the American

publications are quite different, we could say that the SCMP displays a “British”

pattern of change, though it remains to be seen whether this should be argued to be

the result of British influence.


SCMP Times Guardian NYT WP TIME

will

would

can

could *

may

should *

must

might

shall

ought *

Table 12: Similarities in the direction of the dispersion/frequency evolution of the

modals from 1990 to 2010 between six printed news publications.

= rising dispersion/frequency (p < 0.0001, unless specified otherwise)

= falling dispersion/frequency (p < 0.0001, unless specified otherwise)

= no statistically significant difference

* p < 0.001 p < 0.05

Turning to the quasi-modal group, we have already observed that the overall

pattern of change is one of a dispersion increase, which is true of all four newspapers

considered in the two previous sections, but the overall level of use in SCMP

approaches that in the two British papers more. Given the low level of modalization in

2000 the increase happens later than in the other papers, however. Disregarding
the 2000 figures, the dispersion of HAVE to did not rise between 1990 and 2010,

which is also the case in most of the other papers (NYT being the exception). The

dispersion of the two other “big ones”, WANT to and NEED to, follows the normal rise

pattern, but the lower dispersion one BE going to shows a slight decrease (p < 0.001).

The shaded cells in Table 13 make visible that the evolution of the quasi-modal group

in SCMP corresponds most with the evolution in the two British newspapers.


SCMP Times Guardian NYT WP TIME

have to ° ° °

got to

want to

going to * °

need to

used to

supposed to

had better

able to ° °

Table 13: Similarities in the direction of the dispersion/frequency evolution of the

quasi-modals from 1990 to 2010 between six printed news publications.

= rising dispersion/frequency (p < 0.0001, unless specified otherwise)

= falling dispersion/frequency (p < 0.0001, unless specified otherwise)

= no statistically significant difference

* p < 0.001

° p < 0.01 p < 0.05

Again ignoring the figure for 2000, the level of use of the quasi-modal group

as a whole rises less in SCMP (an increase of only 0.11) than in the two British papers

(+0.26 in The Times and +0.21 in The Guardian) and the difference between the level

of use of the modal group and the quasi-modal group becomes only marginally

smaller (-0.26, compared to -0.47 in The Times and -0.56 in The Guardian). In other

words, the two groups appear to be converging less in SCMP than in the two British

papers.
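The convergence figures for the two British papers can be retraced from the 1990 and 2010 ratios quoted in section 4.2. The short sketch below does so; the function and variable names are ours, and the SCMP is left out because its ratios are given in Table 11 rather than in the running text.

```python
# (1990 ratio, 2010 ratio) per group, as quoted in section 4.2.
ratios = {
    "The Times": {"modal": (2.56, 2.35), "quasi": (0.65, 0.91)},
    "The Guardian": {"modal": (2.95, 2.60), "quasi": (0.81, 1.02)},
}


def gap_change(modal, quasi):
    """Change in the modal/quasi-modal gap between the first and the last data point."""
    gap_start = modal[0] - quasi[0]
    gap_end = modal[1] - quasi[1]
    return round(gap_end - gap_start, 2)


for paper, groups in ratios.items():
    modal_change = round(groups["modal"][1] - groups["modal"][0], 2)
    quasi_change = round(groups["quasi"][1] - groups["quasi"][0], 2)
    print(paper, modal_change, quasi_change, gap_change(groups["modal"], groups["quasi"]))
# The Times -0.21 0.26 -0.47
# The Guardian -0.35 0.21 -0.56
```

A negative gap change means that the two groups have moved closer together; the larger its absolute value, the stronger the convergence.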

We can conclude that the quantitative evolution of verbal modal expressions

in the Hong Kong newspaper, though resembling the British development more than

the American one, is also markedly different from that in the British papers.

5. Summary and conclusion

This chapter has aimed to contribute to a bottom-up investigation of recent

grammatical change in Hong Kong English through a study of the evolution of the

frequency of use of verbal modal expressions in a Hong Kong newspaper, the South

China Morning Post. This focus on a single (sub-)genre, if not a single source, is

justified by the realization that language change can vary across genres, as was made

clear by the difference in the results of Leech (2003, 2011, 2013) and Millar (2009)

with relation to the 20th-century evolution of the frequency of modal auxiliaries. We

have established, through a comparison of Millar’s TIME Magazine data with data on

the quantitative evolution of the modals between 1990 and 2010 in two American and

two British newspapers, that the dispersion of the modals as a group is increasing in

American printed news publications generally, in contrast to frequency data drawn

from corpora that are meant to be representative of American English as a whole, and

unlike in the British printed press. Consequently, there is a national variety-specific

genre factor at work here. Moreover, in the specifics of this evolution, the two

American newspapers resemble each other more than they do TIME Magazine.

Likewise, the British newspapers are more like each other than they are like the


American newspapers. We can therefore speak of variety-specific sub-genre

differences in the quantitative evolution of modal auxiliaries.

On the whole the Hong Kong newspaper data resemble the British ones quite

closely in the level of occurrence of the modals and quasi-modals, which stands in

contrast to variety-as-a-whole evidence presented in earlier research. Another

similarity is that they also display a general drop in the dispersion of the modals. The

Hong Kong newspaper differs from the British ones in a number of specifics, however.

Not all verbal modal expressions evolve in the same direction in the Hong Kong and

British press, and the convergence of the modal and quasi-modal groups is smaller in

the Hong Kong paper.

This study has revealed the commercial information tool Factiva® to be an

effective instrument for the collection of temporally distinct quantitative data for

certain text genres, which in combination with its global coverage makes it an

excellent tool for the investigation of genre-specific diachronic variation between

different national varieties of English. A major strength is the amount of data it can

generate. Major restrictions are both the very limited number of genres and the as yet

fairly narrow time-span it can cover. The former limitation is not likely to be removed;

the latter is getting better little by little, day by day.

References

Aarts, Bas, Joanne Close, Geoffrey Leech & Sean Wallis (eds.). 2013. The Verb

Phrase in English: Investigating Recent Language Change with Corpora.

Cambridge: Cambridge University Press.

Biber, Douglas & Bethany Gray. 2013. Being specific about historical change: The

influence of sub-register. Journal of English Linguistics 41(2): 104-134.

Bolton, Kingsley (ed.). 2002. Hong Kong English: Autonomy and Creativity. Hong

Kong: Hong Kong University Press.

Bowie, Jill, Sean Wallis & Bas Aarts. 2013. Contemporary change in modal usage in

spoken British English: Mapping the impact of “genre”. In English Modality:

Core, Periphery and Evidentiality, Juana I. Marín-Arrese, Marta Carretero

Lapeyre, Jorge Arús Hita & Johan van der Auwera (eds), 57-94. Berlin: De

Gruyter.

Chesley, Paula & R. Harald Baayen. 2010. Predicting new words from newer words:

Lexical borrowings in French. Linguistics 48(4): 1343-1374.

Collins, Peter. 2009a. Modals and quasi-modals in world Englishes. World Englishes

28: 281-292.

Collins, Peter. 2009b. Modals and Quasi-modals in English. Amsterdam: Rodopi.

Collins, Peter & Xinyue Yao. 2012. Modals and quasi-modals in New Englishes. In

Mapping Unity and Diversity World-wide: Corpus-based Studies of New

Englishes, Marianne Hundt & Ulrike Gut (eds), 35-53. Amsterdam: Benjamins.

Gries, Stefan Th. 2008. Dispersions and adjusted frequencies in corpora. International

Journal of Corpus Linguistics 13(4): 403-437.

Gries, Stefan Th. 2010. Dispersions and adjusted frequencies in corpora: Further

explorations. In Corpus-linguistic Applications: Current Studies, New

Directions, Stefan Th. Gries, Stefanie Wulff & Mark Davies (eds), 197-212.

Amsterdam: Rodopi.

Hakutani, Yoshinobu & Charles H. Hargis. 1972. The syntax of modal constructions

in English. Lingua 30: 301-332.


Krug, Manfred. 2000. Emerging English Modals: A Corpus-based Study of

Grammaticalization. Berlin: Mouton de Gruyter.

Leech, Geoffrey N. 2003. Modality on the move: The English modal auxiliaries 1961-

1992. In Modality in Contemporary English, Roberta Facchinetti, Manfred

Krug & Frank R. Palmer (eds), 223-240. Berlin: Mouton de Gruyter.

Leech, Geoffrey. 2011. The modals ARE declining: Reply to Neil Millar’s “Modal

verbs in TIME: Frequency changes 1923–2006”, International Journal of

Corpus Linguistics 14:2 (2009), 191-220. International Journal of Corpus

Linguistics 16(4): 547-564.

Leech, Geoffrey. 2013. Where have all the modals gone? An essay on the declining

frequency of core modal auxiliaries in recent standard English. In English

Modality: Core, Periphery and Evidentiality, Juana I. Marín-Arrese, Marta

Carretero Lapeyre, Jorge Arús Hita & Johan van der Auwera (eds), 95-115.

Berlin: De Gruyter.

Leech, Geoffrey, Marianne Hundt, Christian Mair & Nicholas Smith. 2009. Change in

Contemporary English. Cambridge: Cambridge University Press.

Mair, Christian & Geoffrey N. Leech. 2006. Current changes in English syntax. In

Handbook of English Linguistics, Bas Aarts & April McMahon (eds), 318-342.

Oxford: Blackwell.

Millar, Neil. 2009. Modal verbs in TIME: Frequency changes 1923–2006.

International Journal of Corpus Linguistics 14(2): 191-220.

Mukherjee, Joybrato & Marco Schilk. 2012. Exploring variation and change in New

Englishes: Looking into the International Corpus of English (ICE) and beyond.

In The Oxford Handbook of the History of English, Terttu Nevalainen &

Elizabeth Closs Traugott (eds), 189-199. Oxford: Oxford University Press.

Nevalainen, Terttu & Elizabeth Closs Traugott (eds.). 2012. The Oxford Handbook of

the History of English. Oxford: Oxford University Press.

Setter, Jane, Cathy S. P. Wong & Brian H. S. Chan. 2010. Hong Kong English.

Edinburgh: Edinburgh University Press.

Smith, Nicholas. 2003. Changes in the modals and semi-modals of strong obligation

and epistemic necessity in recent British English. In Modality in

Contemporary English, Roberta Facchinetti, Manfred Krug & Frank R. Palmer

(eds), 241-267. Berlin: Mouton de Gruyter.

Smith, Nicholas & Geoffrey Leech. 2013. Verb structures in twentieth-century British

English. In The Verb Phrase in English: Investigating Recent Language

Change with Corpora, Bas Aarts, Joanne Close, Geoffrey Leech & Sean

Wallis (eds), 68-98. Cambridge: Cambridge University Press.


| | 1990 | 1990 % | 2000 | 2000 % | 2010 | 2010 % |
|---|---|---|---|---|---|---|
| will | 28153 | 53.44 | 34100 | 54.25 | 27766 | 53.39 |
| would | 27817 | 52.8 | 35013 | 55.7 | 29036 | 55.83 |
| can | 21524 | 40.85 | 27871 | 44.34 | 24875 | 47.83 |
| could | 19587 | 37.18 | 25244 | 40.16 | 22366 | 43.01 |
| may | 17671 | 33.54 | 21620 | 34.39 | 17784 | 34.2 |
| might | 11557 | 21.94 | 14231 | 22.64 | 13194 | 25.37 |
| must | 8421 | 15.98 | 9092 | 14.46 | 6974 | 13.41 |
| should | 12297 | 23.34 | 15126 | 24.06 | 12665 | 24.35 |
| shall | 404 | 0.77 | 504 | 0.8 | 256 | 0.49 |
| ought to | 800 | 1.52 | 921 | 1.47 | 511 | 0.98 |
| M ratio | 2.81 | | 2.92 | | 2.99 | |
| have to | 15178 | 28.81 | 20201 | 32.14 | 16027 | 30.82 |
| got to | 1040 | 1.97 | 1778 | 2.83 | 1237 | 2.38 |
| want to | 11917 | 22.62 | 18456 | 29.36 | 15168 | 29.17 |
| going to | 6472 | 12.28 | 9744 | 15.5 | 8473 | 16.29 |
| need to | 5786 | 10.98 | 8514 | 13.54 | 9394 | 18.06 |
| used to | 4396 | 8.34 | 5591 | 8.89 | 4567 | 8.78 |
| supposed to | 1321 | 2.51 | 2177 | 3.46 | 1956 | 3.76 |
| had better | 110 | 0.21 | 134 | 0.21 | 109 | 0.21 |
| able to | 5247 | 9.96 | 6806 | 10.83 | 5746 | 11.05 |
| Q ratio | 0.98 | | 1.17 | | 1.21 | |
| ratio diff. | 1.83 | | 1.75 | | 1.7 | |
| text total | 52685 | | 62859 | | 52006 | |

Table 3: The New York Times
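The derived rows of these tables can be checked against the raw counts. The sketch below, in Python, uses the 1990 column of Table 3 and assumes, because the printed figures are arithmetically consistent with it, that each percentage is an item's count per 100 texts, that the M and Q ratios are the summed group counts divided by the text total, and that the ratio difference is M minus Q. The function and variable names are ours and purely illustrative.

```python
# 1990 counts copied from Table 3 (The New York Times).
MODALS_1990 = {
    "will": 28153, "would": 27817, "can": 21524, "could": 19587,
    "may": 17671, "might": 11557, "must": 8421, "should": 12297,
    "shall": 404, "ought to": 800,
}
QUASI_MODALS_1990 = {
    "have to": 15178, "got to": 1040, "want to": 11917, "going to": 6472,
    "need to": 5786, "used to": 4396, "supposed to": 1321,
    "had better": 110, "able to": 5247,
}
TEXT_TOTAL_1990 = 52685


def per_hundred_texts(count: int, text_total: int) -> float:
    """Assumed reading of the '%' columns: count per 100 texts."""
    return round(100 * count / text_total, 2)


def group_ratio(counts: dict, text_total: int) -> float:
    """Assumed reading of the M/Q ratio rows: summed counts per text."""
    return round(sum(counts.values()) / text_total, 2)


m_ratio = group_ratio(MODALS_1990, TEXT_TOTAL_1990)
q_ratio = group_ratio(QUASI_MODALS_1990, TEXT_TOTAL_1990)
ratio_diff = round(m_ratio - q_ratio, 2)

print(per_hundred_texts(MODALS_1990["will"], TEXT_TOTAL_1990))  # 53.44
print(m_ratio, q_ratio, ratio_diff)  # 2.81 0.98 1.83
```

Under these assumptions the computed values match the 1990 figures printed in Table 3; the same check can be run on any other column by substituting its counts and text total.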


| | 1990 | 1990 % | 2000 | 2000 % | 2010 | 2010 % |
|---|---|---|---|---|---|---|
| will | 32233 | 57.35 | 31198 | 66.48 | 22726 | 60.1 |
| would | 30135 | 53.62 | 27797 | 59.23 | 19542 | 51.68 |
| can | 24201 | 43.06 | 26263 | 55.96 | 20403 | 53.96 |
| could | 21917 | 39 | 21753 | 46.35 | 15292 | 40.44 |
| may | 18990 | 33.79 | 18230 | 38.85 | 11935 | 31.56 |
| might | 10530 | 18.74 | 10624 | 22.64 | 9346 | 24.72 |
| must | 9527 | 16.95 | 8808 | 18.77 | 6181 | 16.35 |
| should | 15251 | 27.14 | 14723 | 31.37 | 11180 | 29.57 |
| shall | 532 | 0.95 | 463 | 0.99 | 254 | 0.67 |
| ought to | 1369 | 2.44 | 1245 | 2.65 | 519 | 1.37 |
| M ratio | 2.93 | | 3.43 | | 3.1 | |
| have to | 18218 | 32.41 | 17914 | 38.17 | 11356 | 30.03 |
| got to | 1862 | 3.31 | 1846 | 3.93 | 1052 | 2.78 |
| want to | 15161 | 26.97 | 17505 | 37.3 | 11171 | 29.54 |
| going to | 9597 | 17.08 | 10137 | 21.6 | 6504 | 17.2 |
| need to | 6904 | 12.28 | 9327 | 19.88 | 7857 | 20.78 |
| used to | 4335 | 7.71 | 4421 | 9.42 | 2870 | 7.59 |
| supposed to | 1804 | 3.21 | 2120 | 4.52 | 1218 | 3.22 |
| had better | 131 | 0.23 | 147 | 0.31 | 68 | 0.18 |
| able to | 5817 | 10.35 | 6133 | 13.07 | 4181 | 11.06 |
| Q ratio | 1.14 | | 1.48 | | 1.22 | |
| ratio diff. | 1.79 | | 1.95 | | 1.88 | |
| text total | 56204 | | 46928 | | 37811 | |

Table 4: The Washington Post


| | 1990 | 1990 % | 2000 | 2000 % | 2010 | 2010 % |
|---|---|---|---|---|---|---|
| will | 19386 | 54.88 | 33057 | 50.4 | 45630 | 49.72 |
| would | 18029 | 51.04 | 28422 | 43.33 | 38436 | 41.88 |
| can | 9206 | 26.06 | 23596 | 35.97 | 31456 | 34.28 |
| could | 12727 | 36.03 | 21911 | 33.4 | 30312 | 33.03 |
| may | 9263 | 26.22 | 17329 | 26.42 | 24352 | 26.54 |
| might | 5508 | 15.59 | 10126 | 15.44 | 12845 | 14 |
| must | 5190 | 14.69 | 8710 | 13.28 | 11488 | 12.52 |
| should | 10101 | 28.59 | 15866 | 24.19 | 19795 | 21.57 |
| shall | 665 | 1.88 | 895 | 1.36 | 750 | 0.82 |
| ought to | 396 | 1.12 | 670 | 1.02 | 697 | 0.76 |
| M ratio | 2.56 | | 2.45 | | 2.35 | |
| have to | 8739 | 24.74 | 17239 | 26.28 | 21956 | 23.92 |
| got to | 238 | 0.67 | 1005 | 1.53 | 1753 | 1.91 |
| want to | 3939 | 11.15 | 11892 | 18.13 | 16803 | 18.31 |
| going to | 1751 | 4.96 | 5330 | 8.13 | 8730 | 9.51 |
| need to | 3422 | 9.69 | 7451 | 11.36 | 20056 | 21.85 |
| used to | 1462 | 4.14 | 4154 | 6.33 | 5661 | 6.17 |
| supposed to | 270 | 0.76 | 931 | 1.42 | 1489 | 1.62 |
| had better | 46 | 0.13 | 121 | 0.18 | 127 | 0.14 |
| able to | 2996 | 8.48 | 5815 | 8.87 | 7304 | 7.96 |
| Q ratio | 0.65 | | 0.82 | | 0.91 | |
| ratio diff. | 1.91 | | 1.63 | | 1.44 | |
| text total | 35326 | | 65592 | | 91773 | |

Table 7: The Times


| | 1990 | 1990 % | 2000 | 2000 % | 2010 | 2010 % |
|---|---|---|---|---|---|---|
| will | 14034 | 62.79 | 26791 | 62.05 | 42031 | 50.97 |
| would | 13032 | 58.31 | 24288 | 56.26 | 36934 | 44.78 |
| can | 7517 | 33.63 | 21444 | 49.67 | 33232 | 40.3 |
| could | 9209 | 41.21 | 18686 | 43.28 | 29523 | 35.8 |
| may | 6953 | 31.11 | 14169 | 32.82 | 22485 | 27.26 |
| might | 3661 | 16.38 | 9892 | 22.91 | 16106 | 19.53 |
| must | 3799 | 17 | 7722 | 17.89 | 11240 | 13.63 |
| should | 7089 | 31.72 | 13669 | 31.66 | 20901 | 25.34 |
| shall | 371 | 1.66 | 662 | 1.53 | 899 | 1.09 |
| ought to | 294 | 1.32 | 688 | 1.59 | 863 | 1.05 |
| M ratio | 2.95 | | 3.2 | | 2.6 | |
| have to | 6760 | 30.25 | 16090 | 37.27 | 23374 | 28.34 |
| got to | 223 | 1 | 1320 | 3.06 | 2125 | 2.58 |
| want to | 3286 | 14.7 | 12060 | 27.93 | 18734 | 22.72 |
| going to | 1618 | 7.24 | 5711 | 13.23 | 10213 | 12.38 |
| need to | 2573 | 11.51 | 7137 | 16.53 | 13790 | 16.72 |
| used to | 1210 | 5.41 | 4098 | 9.49 | 6201 | 7.52 |
| supposed to | 314 | 1.4 | 1210 | 2.8 | 1651 | 2 |
| had better | 45 | 0.2 | 119 | 0.28 | 111 | 0.13 |
| able to | 2126 | 9.51 | 5390 | 12.48 | 7614 | 9.23 |
| Q ratio | 0.81 | | 1.23 | | 1.02 | |
| ratio diff. | 2.14 | | 1.97 | | 1.58 | |
| text total | 22349 | | 43173 | | 82470 | |

Table 8: The Guardian


| | 1990 | 1990 % | 2000 | 2000 % | 2010 | 2010 % |
|---|---|---|---|---|---|---|
| will | 12954 | 64.51 | 19384 | 53.74 | 12663 | 56.26 |
| would | 11869 | 59.1 | 19105 | 52.96 | 11093 | 49.28 |
| can | 5869 | 29.23 | 9327 | 25.86 | 8138 | 36.15 |
| could | 6933 | 34.52 | 11162 | 30.94 | 7517 | 33.4 |
| may | 5123 | 25.51 | 8036 | 22.28 | 6219 | 27.63 |
| might | 2891 | 14.4 | 4090 | 11.34 | 2915 | 12.95 |
| must | 2473 | 12.31 | 3447 | 9.56 | 2774 | 12.32 |
| should | 6006 | 29.91 | 7988 | 22.14 | 6091 | 27.06 |
| shall | 215 | 1.07 | 204 | 0.57 | 143 | 0.64 |
| ought to | 144 | 0.72 | 120 | 0.33 | 61 | 0.27 |
| M ratio | 2.71 | | 2.3 | | 2.56 | |
| have to | 5434 | 27.06 | 6654 | 18.45 | 5783 | 25.69 |
| got to | 145 | 0.72 | 204 | 0.57 | 146 | 0.65 |
| want to | 2712 | 13.5 | 5091 | 14.11 | 4339 | 19.28 |
| going to | 1181 | 5.88 | 1854 | 5.14 | 1139 | 5.06 |
| need to | 2089 | 10.4 | 3806 | 10.55 | 3573 | 15.87 |
| used to | 784 | 3.9 | 1397 | 3.87 | 1278 | 5.68 |
| supposed to | 179 | 0.89 | 389 | 1.08 | 380 | 1.69 |
| had better | 15 | 0.07 | 44 | 0.12 | 26 | 0.12 |
| able to | 1898 | 9.45 | 2633 | 7.3 | 2072 | 9.21 |
| Q ratio | 0.72 | | 0.61 | | 0.83 | |
| ratio diff. | 1.99 | | 1.69 | | 1.73 | |
| text total | 20082 | | 36073 | | 22509 | |

Table 11: South China Morning Post