
Author’s post-review version. Published in ‘Ye whom the charms of grammar please’: Studies

in English Language History in Honour of Leiv Egil Breivik, publisher: Peter Lang, editors:

Kari E. Haugland, Kevin McCafferty, Kristian A. Rusten, pp. 27-54. ISBN: 978-3-0343-1779-5.

http://www.peterlang.com/index.cfm?event=cmp.ccc.seitenstruktur.detailseiten&seitentyp=produkt&pk=81061&cid=5&concordeid=431779

Do not redistribute without permission.

In search of the S (curve) in there¹

Gard B. Jenset

TORCH - The Oxford Research Centre in the Humanities, University of Oxford

Abstract: The present article considers the evolution of existential there in Old English, with a

focus on the mechanisms underlying the propagation of the change once the initial innovation

had taken place. Drawing on the modeling approach taken by Blythe and Croft (2012), it is

argued that social and structural factors colluded in the early stage propagation of existential

there. Alongside grammatical factors, it is argued that author prestige constituted a social

factor in the propagation of existential there. While this factor on its own is incapable of

explaining the emergence of existential there, it is shown that texts with one identified author

from the late Old English use the lexeme at higher rates. From a methodological perspective,

the article illustrates the value of advanced quantitative statistical modeling as a means to

integrate corpus data and historical linguistic theory.

¹ The author would like to thank Jóhanna Barðdal, Tonya Kim Dewey, Barbara McGillivray,

and two anonymous reviewers for their helpful comments and suggestions; nevertheless,

responsibility for the final article rests with the author alone.


1 INTRODUCTION

The Present-day English distinction between existential and locative there, exemplified in (1)

and (2) with data from the spoken part of the Corpus of Contemporary American English or

COCA (Davies 2008), goes back to Old English (Breivik 1990). It is widely accepted that the

existential use of there diachronically originated from the locative adverb there, plausibly

through the interplay of syntactic, semantic, and pragmatic factors (Breivik 1990; 1997, Jenset

2010; 2014).

(1) there was a brand-new, just-made key left at the crime scene in the lock. (Existential)

(2) she was there, stuck in the house in New York. (Locative)

However, an important and as yet unsettled question is how this innovation, once it had

occurred, propagated through the speech community. The essential question explored by the

present paper is: how did Old English þær (for convenience discussed in its modern form

there), once the initial reanalysis as a grammatical subject had taken place, survive and affirm

this new status?

1.1 The evolution of existential there in Old English

In Old English, a number of different constructions were available for expressing existential

or presentational propositions. Breivik (1990) defines the three primary constructions as

follows: type D with there, type C without there but requiring one in later stages of English

(i.e. a zero pronoun), and type A/B, which consists of miscellaneous affirmative main clause

constructions. The types are exemplified in (3)–(5) below using examples from Breivik (1990:

198–199).


(3) Micel yfelnyss wæs on iudeiscum mannum. (type A/B)

Much evil was on Jewish men;

‘There was much evil in Jewish men.’ (ÆChom I XI:317–19)

(4) Wæs bi éastan þære ceastre welneah sumo cirice… (type C)

Was by east that town quite-near some church…;

‘There was close to the east close to the town a church…’ (Bede 62:2–3)

(5) and ðær wæs án cyrce of scíndendum golde. (type D)

and there was one church of shining gold;

‘and there was a church of shining gold’ (ÆChom I XIII:196–97)

As established by Breivik (1990), the type C construction was gradually overtaken in

frequency by the type D construction with there over the course of the Old and Middle

English periods. Following work in Construction Grammar (Goldberg 1995, Croft 2001), the

constructions in (3)–(5) can be considered form-meaning pairs in their own right, and the

reorganization, including changes in relative frequency, in the speech community can be seen

as constructional change (Hilpert 2013: 16). This theoretical positioning is not mere window

dressing, but serves to highlight an important point: rather than casting the emergence of

existential there as a question of the insertion of a lexeme into a grammatical slot, the

implication is that the entire construction should be investigated as a unit in its own right, with

distinct patterns of change.

1.2 Mechanisms of change

The present study follows Croft (2000: 185) in defining change as modified replication (i.e.

innovation), followed by differential replication of the modified variant, essentially defining

change as ‘innovation + replication’. It is probably fair to say that more has been written on

the re-analysis, or innovation, than on the replication (or propagation), as far as existential


there is concerned. Syntactic, semantic, and pragmatic factors are all plausible causes for the

initial innovation; however, that is not to say that they are equally capable of explaining the

propagation of the innovation.

The question of which mechanisms operate to propagate linguistic change, and how

they operate, is still far from settled (Blythe & Croft 2012:269). Blythe and Croft distinguish

two broad categories, namely social factors and the frequency of interaction between

interlocutors (2012: 269–270). So far, explanations for the evolution of existential there have

tended to fall in the frequency category. Breivik (1990) suggests that the propagation of the

change, once the initial reanalysis had been achieved, passed through an essentially

deterministic typological process whereby the combined pressure from pragmatic context and

syntactic changes would push the frequency of use from one typological category to another

(1990: 247–249). Other studies dealing with related problems have tended to take a similar

approach. Williams (2000) explains the sudden disappearance of empty expletive subjects in

Middle English by the disappearance of verb initial word order; an explanation which is

analogous to the weight given to the shift from verb second to verb third order in Breivik

(1990).

Later studies on the topic have tended to build on the same line of argumentation.

Breivik (1997) adds a valuable cognitive linguistic dimension to the question of reanalysis,

and Breivik & Swan (2000) extend the argument in the spirit of frequency of interaction by

proposing that the grammaticalization of there proceeded through ‘repeated use in local

syntactic contexts’ (2000:27). Jenset (2010) builds on the same line of reasoning by arguing

that differentiated frequencies of use involving both the linguistic context and there would

gradually shift the cognitive interpretation in favor of existential there via perceptual bias.

The argumentation nevertheless rests on a view which can be considered deterministic in that

there is no room for social factors like identity or prestige (Baxter et al. 2009:258).


However, any deterministic model of the development of existential there must face

the questions posed in Baxter et al. (2009) and Blythe & Croft (2012), namely: does it fit the

data? It is well known that many linguistic phenomena follow an S-curve in their diachronic

development, leading some to consider it a default pattern for language change (Chambers

2002:361). See also the discussion in Blythe & Croft (2012: 278–281). The S-curve is attested

in the development of existential there by Breivik (1990: 226), a result corroborated by

findings in Jenset (2010: 273) with a partially different and larger dataset.
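As a point of reference, the S-curve is commonly idealized as a logistic function. The following Python sketch assumes that idealization. The function name `logistic` and the parameter values (midpoint 0, rate 1) are hypothetical: they are not estimated from the there data and are not taken from Blythe & Croft (2012). The sketch only illustrates the slow-fast-slow profile that the term S-curve refers to.

```python
import math


def logistic(t, midpoint, rate):
    """Proportion of the incoming variant at time t under a logistic idealization."""
    return 1.0 / (1.0 + math.exp(-rate * (t - midpoint)))


# Hypothetical parameters: midpoint at t = 0, rate of 1 per time step.
times = list(range(-6, 7))
curve = [logistic(t, midpoint=0.0, rate=1.0) for t in times]

# Step-to-step increases: small at the start, largest around the midpoint,
# and small again as the change nears completion.
steps = [later - earlier for earlier, later in zip(curve, curve[1:])]

for t, p in zip(times, curve):
    print(f"t={t:+d}  proportion={p:.3f}")
```

The list `steps` makes the characteristic shape explicit: the rate of change is not constant, which is what distinguishes an S-curve from a linear increase.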

Blythe and Croft (2012) present a mathematical model of linguistic change where they

simulate the different assumptions made by various attempts to explain the propagation of

change, i.e. the S-curve. Their results are clear: only an explanatory model that involves a

differential social weighting of competing variants will result in an S-curve under any

reasonable assumptions. A key point in their argumentation is that the variants themselves are

associated with social valuations. According to Blythe and Croft’s model, the S-curve is in

other words not the result of social accommodation and interaction as argued by others

(Nevalainen & Raumolin-Brunberg 2003: 53–55).

The question of the diachronic evolution of existential there can then be broken down

into two sub-questions, corresponding to the notions of innovation or actuation, and

propagation or diffusion (Croft 2000:4–5):

1. What led to the reanalysis of locative there as an existential subject?

2. What led to the propagation of this reanalysis?

Explaining the S-curve properly falls under the domain of question 2, which will serve as the

focus for the present study. However, the study will refrain from going beyond Old English,

both for reasons of space but also because the early phase of propagation is crucial under

Blythe & Croft’s model (2012: 292).


1.3 Data and organization

The present study draws its data from the York-Toronto-Helsinki Parsed Corpus of Old

English Prose or YCOE (Taylor et al. 2003), and more specifically from a dataset that was

collected and described in more detail in Jenset (2010). The dataset has been enriched with

more information for the purposes of the present study, as explained below in section 4. Since

the YCOE annotation does not distinguish between existential and locative uses of there, the

distinction between them has been made based on a machine learning algorithm. Jenset (2010:

254–260) describes the procedure that was used for that study, but for the present study a

new, improved model was employed (Jenset To appear), yielding improved results more

similar to those reported by Breivik (1990).

The paper is organized as follows: section 2 provides more details on the evolutionary

model that underlies Blythe and Croft’s mathematical model. Section 3 presents an

exploratory step discussing potential candidates for social factors that could plausibly be

involved in the evolution of existential there. Section 4 discusses a statistical model of the

data and illustrates how social and linguistic factors can be accommodated side by side as

explanatory factors. Section 5 pulls the threads from sections 3 and 4 together, while section 6

presents the summary conclusions.

2 REPLICATION MECHANISMS

Blythe & Croft (2012) build on Croft’s evolutionary framework for language change (2000)

in their assumptions. They see language as a complex, adaptive system in which speakers

interact and where the speakers’ past behavior constrains current and future behavior as a


consequence of competing factors (Blythe & Croft 2009:47–48). The competing factors can

be considered selectors, or competing differential causes, acting upon replicators (linguistic

structures, i.e. that which is being replicated or passed on). The speaker is modeled in terms of

an interactor, i.e. a vehicle through which the environment can cause differentiated replication

(Croft 2000: 20–40).

Building upon this framework, Blythe & Croft (2012) propose a typology of

mechanisms for propagation of language change in a way that is suitable for mathematical

modeling. Their four mechanisms are as follows:

1. Neutral evolution: linguistic change without any social mechanisms. As an example,

Blythe and Croft give the fixation on the most frequent variant in a situation with fluctuation

between different variants (2012: 275–276).

2. Neutral interactor selection: change driven by the frequency of interaction among

interlocutors, i.e. the strength of the ties between them. Social network structures, without any

social differentiation assigned to the linguistic variants themselves, are given as an example

under which this change could occur (2012: 273–274).

3. Weighted interactor selection: change driven by the asymmetry in social prestige between

innovators and followers. The crucial property of this mechanism is the identity of the group

using a given variant, not the variant itself (2012: 274–275).

4. Replicator selection: change driven by the prestige directly associated with linguistic forms

themselves. In other words, the linguistic forms are not considered prestigious because they

are associated with one particular group; instead the forms (or replicators) take on the prestige

directly, possibly through socio-economic differences and social mobility (2012: 272–273).


An important property of these four mechanism-specifications is that they are

accompanied by very detailed specifications, glossed over in the presentation above, that are

implemented in mathematical models simulating the behavior of speakers under the various

conditions. Crucially, Blythe and Croft find that only under the assumptions of replicator

selection can a convincing S-curve easily be arrived at (2012: 291–292). Their argument is

not that mechanisms 1-3 cannot be attested or are not relevant; merely that mechanisms 1-3

do not, under their simulations, result in a propagation pattern resembling an S-curve. Since

even a purely frequency-driven, neutral change must still operate among speakers

communicating with each other, the model implication of this is that each speaker is given the

same weight or influence on the other during interaction. Again, the point is not that such

scenarios do not take place, but that obtaining an S-curved pattern of change proves difficult

under any reasonable assumptions.

Blythe and Croft note that the change rate of an innovation will depend on how that

innovation is distributed in the community (2012: 297). If the propagation depends only on the

identity of the innovator, the propagation tends to follow a different path than in replicator

selection, i.e. when the variant itself is associated with a differential social weighting (2012:

297–298).

In short, Blythe and Croft specify the assumptions under which an S-curve can be

achieved (or expected) in painstaking detail, and reach the conclusion that some sort of social

weighting of the variant itself is required for the innovation to not only spread initially but

also to complete the full S-curve trajectory. Taking Blythe and Croft’s approach as a start, the

following section will explore differential distributions of existential there in YCOE based on

social variables.


3 IN SEARCH OF DIFFERENCES

Given the argumentation in Blythe & Croft (2012), it is reasonable to look for social variables

that might explain the observed S-curve found in the evolution of existential there. However,

the possibility that frequency of interaction is involved should not be dismissed out of hand.

Since frequency of interaction is very hard to estimate from a written historical corpus, corpus

size was used as a proxy. If the driving mechanism behind frequency of interaction is a higher

degree of interaction with some speakers and a lower degree of interaction with others (Blythe

& Croft 2012: 270–271) then the total corpus size might serve as a possible indicator variable.

Although some degree of chance is involved, there were doubtlessly ‘social preferences

attitudes’ (Labov 2001: 191) at work with respect to who wrote what during the Middle Ages.

The argument can also be extended to which texts were preserved.

With this in mind, the proportion of existential uses of there (out of all instances of

there) was calculated for the Old English, Middle English, and Early Modern English data.

The corpus was measured in the number of sentences rather than words, since this

corresponds directly to how instances of there were collected (Jenset 2010). However, a

Spearman rank correlation test, a robust non-parametric test of correlation, showed that there

was no correlation between corpus size and the proportion of existential uses of there in the

corpus (p = 1, not significant). Furthermore, it is worth noting that mechanism 1 (neutral

evolution by frequency of use) is less plausible when it comes to existential there, since the

data attest to a move away from the most frequent variant (type C) to a minority variant (type

D, i.e. the construction with there). Combined, these considerations strengthen the case for the

Blythe-Croft model with respect to the S-curve observed in the evolution of existential there.

The question is: which social variable (or variables) was involved? In answering this question

the focus will be on the Old English period.
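For readers unfamiliar with the test, the following Python sketch shows how a Spearman rank correlation coefficient is obtained: both variables are converted to ranks, and a Pearson correlation is computed on the ranks. The function names and the input values are hypothetical placeholders and are not the corpus sizes or proportions used in the study. The sketch computes only the coefficient, not the associated p-value.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the ranks of x and y (undefined if either is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical placeholder data: three corpus sizes and three proportions.
toy_sizes = [100, 200, 300]
toy_proportions = [0.2, 0.5, 0.3]
print(spearman_rho(toy_sizes, toy_proportions))
```

Because only ranks enter the calculation, the coefficient is insensitive to the scale on which corpus size is measured, which is one reason the test is considered robust.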


The natural categories to consider are of course social class, sex and gender, social

context, and ethnicity (Trudgill 2002: 373). However, in historical corpus linguistics the

parameters of the social context, as much as the linguistic variables themselves, are strictly

determined by who wrote and what was preserved (Schneider 2002: 89–90). Since the

material in question originated within a narrowly circumscribed literary class that was

essentially the purview of men, the concern here will be with social context and ethnicity. The

YCOE documentation provides information which can serve as indicators of social context

(viz. genre, and the text itself which at least to some degree reflects one or more authors) as

well as ethnicity (viz. dialect).

3.1 Exploring corpus data through dispersion

Having identified some candidate variables for social differentiation of variants, the next

question is how to choose between them. The problem can be reformulated as follows: which

social variable stands sufficiently out from the others with respect to existential uses of there

to make it a plausible explanatory factor for the social differentiation assumed to underlie the

observed S-curve?

In corpus linguistics variation is typically reported in terms of frequency of co-

occurrence or a derived measure of association such as Z-score or mutual information (MI),

see Schmitt (2010: 124–132) for an overview. However, Gries (2008) points out that co-

occurrence in itself can be extremely problematic when the dispersion of the item(s) in

question is not properly taken into account. If a term t occurs 100 times in a corpus with 100

parts we would expect to find more or less 1 instance in each corpus part if the term is a

general one. If all the 100 instances of t were found in only one of the 100 corpus parts, it

might suggest that it has a specialized meaning or that it has taken on certain special

connotations. This argumentation can be applied directly to the question of existential uses of


there and social context, where the corpus parts are the various categories within each

indicator of social context and ethnicity.

Ethnicity and social context are not recorded directly in the corpus data, but the corpus

meta-data provides some relevant information. Ethnicity is no clear-cut, unproblematic

category which stands in a one to one relationship with language (Fought 2002). However, it

is probably fair to say that both language and ethnicity express identity in ways that

sometimes overlap, and that language can express national but also regional ethnicity

(Chambers 2002: 362–364). The corpus meta-data provides information about the dialect of

Old English used, which, together with information about the genre and the text itself, are

used as categories over which the dispersion of existential there is gauged.

Several measures of dispersion exist (Gries 2008: 406–410). The present study uses

the measure introduced in Gries (2008), DEVIATION OF PROPORTIONS or DP, since it is

intuitive, robust and conceptually simple, and has a degree of sensitivity that many other

measures lack (Gries 2008: 417–419). The fundamental idea is to compare the size of each

corpus part (relative to the overall corpus) to the proportion of term t in that part, and to sum

the differences to obtain a single value. In more technical terms, we can define DP as the sum

of the absolute differences between observed and expected proportions divided by 2, see also

the formula below:

$$DP = \frac{1}{2} \sum_{i} \left| OP_i - EP_i \right|$$

The expected proportion (EP) is the size of each corpus part relative to the size of the corpus,

whereas the observed proportion (OP) is the number of instances of term t in the corpus part

(denoted by subscript i), relative to all instances of t in the corpus as a whole. A low DP value


indicates that the term is spread fairly evenly across the corpus parts in question, whereas a

higher DP value indicates that the term is spread unevenly, that is, it tends to lump together in

some parts of the corpus. To interpret the magnitude of DP in more detail, Gries (2008: 420–

421) provides DP results applied to data sampled from the BNC. The results range from 0.08

for the indefinite article a (obviously very evenly dispersed) to 0.99 for rare and specialized

lexemes like macari and mamluks. Of greater practical importance is perhaps the

establishment of cutoff points in between these values near the theoretical endpoints of 0 and

1. In his data, Gries classifies intermediate DP values as those between the first (0.27) and

third (0.93) quartiles, with observed intermediate values ranging from 0.41 to 0.80. It should

be clear that this taxonomy is a heuristic, but it is nevertheless valuable as a baseline for

comparison when judging the magnitude of DP values.
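The definition of DP given above can be sketched in a few lines of Python. The function name `dp` and the part size of 1,000 sentences are assumptions made for illustration. The two test cases reproduce the thought experiment from the start of this section: a term that occurs 100 times in a corpus with 100 equally sized parts, once spread evenly and once concentrated in a single part.

```python
def dp(part_sizes, term_counts):
    """Half the sum of absolute differences between observed and expected proportions."""
    if len(part_sizes) != len(term_counts):
        raise ValueError("one size and one count are needed per corpus part")
    total_size = sum(part_sizes)
    total_count = sum(term_counts)
    if total_size == 0 or total_count == 0:
        raise ValueError("corpus and term frequency must be non-zero")
    # Expected proportion: size of each part relative to the whole corpus.
    expected = [size / total_size for size in part_sizes]
    # Observed proportion: share of all instances of the term found in each part.
    observed = [count / total_count for count in term_counts]
    return sum(abs(o - e) for o, e in zip(observed, expected)) / 2.0


# 100 equally sized corpus parts (the size of 1,000 is a hypothetical value).
parts = [1000] * 100

even = dp(parts, [1] * 100)             # one instance in every part
lumped = dp(parts, [100] + [0] * 99)    # all instances in a single part

print(f"evenly spread: {even:.2f}")
print(f"lumped in one part: {lumped:.2f}")
```

The evenly spread term yields a DP of 0, and the fully concentrated term yields a DP of 0.99, close to the theoretical maximum of 1.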

3.2 The dispersion of existential there in Old English

Using the formula above, a DP value for existential uses of there in the YCOE material was

calculated based on the following corpus parts (DP value in parentheses): dialect (0.12),

religious vs. secular texts (0.19), genre (0.25), text (0.40). The plot in figure 1 shows these

values plotted against a horizontal line representing a cutoff between the first and second

quartiles in the data presented in Gries (2008: 420). Like the subsequent analyses in this

paper, the calculations and the creation of the plot were done with the statistical package R (R

Development Core Team 2011). For exploratory purposes we can loosely define the cutoff

between the first and second quartiles as the dividing line between a small and a medium

deviance from the expected proportions. The DP value for individual texts clearly stands out

from the other three categories being the only one to fall into the intermediate range suggested

by Gries (2008: 420–421). Since a higher DP value represents a more uneven dispersion, the

interpretation is as follows: existential uses of there are fairly evenly distributed over all the


parts of the corpus when we divide it up based on dialect, genre, or whether or not we have a

religious text. However, if we divide the corpus by individual texts, we find that existential

uses of there are distributed less evenly.
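The comparison against the heuristic cutoff can be restated as a small Python sketch. The DP values and the first-quartile value of 0.27 are those reported above; the variable names are illustrative assumptions.

```python
# DP values for existential there, by the variable used to partition the corpus.
dp_by_partition = {
    "dialect": 0.12,
    "religious vs. secular": 0.19,
    "genre": 0.25,
    "text": 0.40,
}

# Heuristic cutoff: the first quartile reported for the data in Gries (2008).
FIRST_QUARTILE = 0.27

above_cutoff = [name for name, value in dp_by_partition.items() if value > FIRST_QUARTILE]
print(above_cutoff)
```

Only the partition by individual text exceeds the cutoff.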

Figure 1: Barplot showing the difference in magnitude of DP values for existential there.

Each bar represents one of the four social context variables that could be derived from the

corpus meta-data. A higher value indicates that occurrences of existential there are unevenly

distributed among the corpus parts represented by the category. The dotted horizontal line

represents a heuristic cutoff point in the form of the value of the first quartile in the data

discussed by Gries (2008).

The plot in figure 1 suggests that existential there is found disproportionally in some texts

compared to others, in other words: some authors would use the lexeme more often than

others. Common function words and light verbs fall in the lowest range of DP values in Gries’

data (2008: 421), and this is where we would intuitively expect existential uses of there to fall


too. Since the dispersion of existential uses of there by text falls in the intermediate range of

DP values, we can hypothesize that something induced different usage rates of the lexeme in

different authors. I propose that this ‘something’ was a difference in social valuation, cf. the

argument in Blythe & Croft (2012).

The DP metric should primarily be considered an exploratory tool. It gives an

indication of uneven distributions, but it does not inform us directly as to why that might be

the case. Furthermore, as the discussion above indicates, the DP measure of dispersion is not a

formal test statistic associated with a null-hypothesis test that can be expressed as a p-value. A

formal null-hypothesis test of significance requires a formalized hypothesis to be meaningful

and interpretable, and this makes it clear precisely why the DP measure is very useful in data

exploration. Initially, i.e. without further information, our best guess would be that a lexeme is

evenly distributed. However, as a formalized hypothesis test this assumption is highly

underspecified. It is reasonable to suspect that a lexeme might be unevenly distributed in

some way, but without specific assumptions to test (either based on theory or previous

research) it is simply premature to employ a formal null-hypothesis test. Instead, the DP

measure quantifies the degree to which the observed data differs from the simplistic a priori

assumption. Based on the patterns highlighted by the DP measure, it is then possible to

construct a more realistic hypothesis which is more readily testable with formalized tests and

models.

It would strengthen the assumption made above if the dispersion of existential uses of

there could be compared to a competing variant with a lower DP value. The dominant type of

existential construction in Old English was type C, i.e. the one without there (Breivik 1990:

227). The construction is characterized by a verb-initial linear order, although the verb might

be preceded by a negative particle or an adverbial of time or place (Breivik 1990:193). As the

examples in Breivik (1990: 199–200) attest to, the construction is sufficiently heterogeneous


to make automated searches in YCOE difficult, since the corpus is not annotated for

constructions. However, one sub-type is reasonably easy to identify, namely cases with an

initial negative particle with a cliticized form of be followed by a nominative NP, which is a

sub-type of existential constructions without there, cf. also the definition of Breivik’s type C

pattern (Breivik 1990: 192–194) exemplified by (4) in section 1 above. An example of this

construction from Wulfstan’s Homilies, found in YCOE, is shown in (6) below:

(6) Nis nan swá yfel scaþa swá is déofol silf;

not-is none so evil enemy as is devil self;

‘There is no enemy so evil as the devil himself.’ (cowulf,WHom_16b:29.1366)

Searching YCOE for this pattern using CorpusSearch 2.0 yielded 633 sentences. Employing

the same formula for DP as above, the resulting DP value for cliticized type C constructions

by text was 0.29. In other words, the cliticized type C existential construction is more evenly

distributed among the texts in the corpus than the type D construction with existential there.

This result strengthens the assumptions made above, since it demonstrates that the, in

diachronic terms, outgoing cliticized type C variant (Breivik 1990: 226–227) was generally

more widely used compared to the new variant with there in YCOE. Incidentally, the result

also demonstrates that the differences between categories in figure 1 are not an artifact of the

choice of text as the category used to partition the corpus. Although this difference does not in

itself establish that there was a difference in social valuation of the variants, it strongly

suggests that the difference in Old English between existential constructions with and without

there is not merely a question of magnitude as established by Breivik (1990: 224). The two

competing constructions also differ in their distribution among authors, i.e. who used it. The


next section deals with the question of how this difference can be linked to the social

valuation or weighting of competing variants.

4 AUTHORS, NORMS, AND CONSTRUCTIONS

The explorative steps outlined in the previous section indicated that Old English authors

differed in their use of existential there, exceeding the differences arising from genres or

dialects. A similar, but weaker effect could be seen for existential constructions lacking an

existential pronoun. This raises the question of whether the two are in a complementary

distribution. This question is interesting since it offers a glimpse into the social aspects of the

grammatical system. If the distinction cannot be explained by grammatical or stylistic factors,

we have some provisional reason for exploring explanations based on prestige, alongside

author idiosyncrasies.

Section 3 demonstrated that the use of existential there in Old English is fairly

heterogeneously dispersed among the texts in the YCOE corpus. This contrasts with an

example of the dominant existential construction without there whose dispersion is much

more homogeneous. Furthermore, it was suggested that this highlighted not only a difference

in how much the two constructions were used, but in who used them, i.e. a form of social

differentiation. Although the argument presented above is valid as far as it goes, it does not

present direct evidence that differential social weighting was involved. In short, although the

argument is plausible, it could certainly be supported better. The present section discusses

how the link between text/author, rates of existential uses of there, and differential social

weighting can be made more explicit.

4.1 Standardization and differential social weighting


At the end of the tenth century and beginning of the eleventh century, attempts were

being made to standardize both Old English spelling and grammar by the so-called

‘Winchester School’, notably represented by Ælfric (Gneuss 1972, Gretsch 2009). Although

this standard, or ‘preferred usage’, seems to have been disseminated with some degree of

systematicity (Irvine 2006: 49), it is perhaps more accurate to think of it as a supra-regional

model concentrated around centers of learning and scholarship, as was the case for other elite-

driven prestige variants elsewhere in Europe (Salmons 2012: 159). Nevertheless,

standardization toward a socially prestigious norm would provide a plausible context for the

social weighting required in Blythe and Croft’s model. The next section describes how we can

measure the degree of such standardization efforts.

A good case for the importance of standardization would present itself if texts that

can be linked to standardization efforts could be shown to be using the construction with

existential there more often than expected, and if those texts are less likely to use the type C

construction. In short: are the type C and type D constructions in complementary distribution,

and are any texts linked to standardization efforts aligned with those distributions? To decide

whether constructions with or without there were in a complementary distribution in Old

English, the data should preferably include a full range of the various types of existential

constructions, some of which are not easily retrieved from a corpus without manual

inspection. For this reason, data from Breivik (1990:198) will be considered. The data

consists of Old English clauses classified as existential and belonging to one of the three types

exemplified in (3)–(5) above: with existential there (type D), with a zero pronoun (type C), or

other clauses (type A/B).


Table 1: Frequency counts of three types of existential constructions in Old English texts: Bede’s Ecclesiastical

History, the Blickling Homilies, the Anglo-Saxon Chronicle, the Book of Exeter, the Old English Orosius, and

Ælfric’s Catholic Homilies. After Breivik (1990: 198).

             Other (A/B)    Ø (C)    There (D)
Bede              14          97         1
Blickling         13          33        12
ASC                6          31         4
Exeter             8          29         4
OR                13          73        21
Ælfric            27         123        39

The frequencies in table 1 can be conveniently visualized in an association plot, where the

dotted lines represent expected frequencies, the height of the bars represents (positive or

negative) deviations from the expected frequencies, and the width of the bars represents the

amount of evidence supporting the observation. For more details on association plots see

Jenset (2010: 87–88) and references therein.
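
The quantities behind such a plot can be sketched in a few lines of Python. This is a minimal illustration, not the code used for figure 2: it assumes the usual construction in which bar height follows the Pearson residual, (observed − expected) / √expected, under independence of text and construction type. The row labels and the function name are chosen here for illustration only.

```python
# Counts from table 1 (after Breivik 1990: 198); columns: A/B, C, D.
counts = {
    "Bede":      [14, 97, 1],
    "Blickling": [13, 33, 12],
    "ASC":       [6, 31, 4],
    "Exeter":    [8, 29, 4],
    "Orosius":   [13, 73, 21],
    "Aelfric":   [27, 123, 39],
}
types = ["A/B", "C", "D"]


def pearson_residuals(table):
    """Return {text: [residual per column]} under row/column independence."""
    n = sum(sum(row) for row in table.values())
    col_totals = [sum(col) for col in zip(*table.values())]
    out = {}
    for text, row in table.items():
        row_total = sum(row)
        # Expected count = row total * column total / grand total.
        expected = [row_total * c / n for c in col_totals]
        out[text] = [(o - e) / e ** 0.5 for o, e in zip(row, expected)]
    return out


residuals = pearson_residuals(counts)
for text, res in residuals.items():
    print(text, ", ".join(f"{t}: {r:+.2f}" for t, r in zip(types, res)))
```

A positive residual means the construction is more frequent in that text than independence would predict; a negative residual means it is less frequent.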

Figure 2: Association plot displaying deviations from expected frequencies for three types of

existential constructions in Old English texts, based on the frequencies in table 1. Gray bars

represent positive deviations; black bars negative deviations.

The plot in figure 2 is interesting for several reasons. Firstly, it shows that existential

clauses with an initial NP (Breivik’s type A/B) share no particular relationship with the other

two types as far as distributions are concerned: there is no symmetry between over- or under-

representation of type A/B clauses and the other two types. Furthermore, deviations from the

expected values are small. Secondly, we see that existential constructions with there (type D)

and those with zero pronouns (type C) are indeed in complementary distribution. When one of

the two is overrepresented in a text, the other is underrepresented. Furthermore, the strength

of the evidence as shown by the height and width of the bars lends plausibility to the

interpretation that this reflects real differences. Finally, we see that for compiled works such

as the Anglo-Saxon Chronicle and the Book of Exeter, the differences are much smaller than

for single-author works, such as Ælfric’s works and the Old English translation of Bede. This

raises the question of whether such differences have arisen simply because of author

idiosyncrasies which have cancelled themselves out in the multi-author works, or because of a

more systematic difference.

The differences and similarities can be summarized more succinctly using a technique

called CORRESPONDENCE ANALYSIS (CA). CA is a technique for reducing the variation in

frequencies in a multivariate matrix to its most salient patterns of association in a manner that

lends itself to visualization in a two-dimensional graph, a biplot. For further technical details

see Greenacre (2007); see Baayen (2008: 128–136) or Jenset & McGillivray (2012) for

introductions to CA with a more linguistic focus.

Figure 3: CA biplot of the frequencies in table 1. The plot captures the data well, explaining

100% of the variation in the data. The Blickling Homilies, the Old English Orosius, and

Ælfric’s Catholic Homilies are notable in their tendency to use existential constructions with

there.

The CA biplot in figure 3, created with the ca package in R (Greenacre & Nenadic

2007), summarizes the most salient patterns in the data in a two-dimensional subspace. The

horizontal axis, which by definition is the most influential dimension (in this case accounting

for 88.1% of the total variation), is defined by the opposition between constructions with there

and those with the zero pronoun. The translation of Bede, the Book of Exeter, and the

Anglo-Saxon Chronicle all tend to use the type C construction, while Ælfric, the Old English

Orosius, and the Blickling Homilies tend to use the type D construction with existential there.

The second (vertical) dimension accounts for a mere 11.9% of the total variation, and is hence

of less importance for this analysis: the dominant feature of the data is the opposition

between there and the zero pronoun.
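
The share of variation per dimension can be checked from table 1 without any CA library. The sketch below is not the ca package's implementation. It relies on the fact that a table with three columns has at most two non-trivial dimensions, so the two principal inertias are the roots of a quadratic built from the trace and the principal 2 × 2 minors of the cross-product of the standardized residuals. The function name is an assumption made here; with the table 1 counts the shares should come out at roughly 88% and 12%.

```python
import math

# Counts from table 1; rows: Bede, Blickling, ASC, Exeter, Orosius, Aelfric.
counts = [
    [14, 97, 1],
    [13, 33, 12],
    [6, 31, 4],
    [8, 29, 4],
    [13, 73, 21],
    [27, 123, 39],
]


def ca_inertia_shares(rows):
    """Total inertia and the shares of the two CA dimensions (3 columns only)."""
    k = len(rows[0])
    assert k == 3, "the quadratic shortcut below only holds for three columns"
    n = sum(map(sum, rows))
    row_tot = [sum(r) for r in rows]
    col_tot = [sum(c) for c in zip(*rows)]
    # Standardized residuals: (p_ij - r_i * c_j) / sqrt(r_i * c_j).
    s = [[(o / n - rt * ct / n ** 2) / math.sqrt(rt * ct / n ** 2)
          for o, ct in zip(row, col_tot)]
         for row, rt in zip(rows, row_tot)]
    # Cross-product matrix M = S^T S; its eigenvalues are 0 and the two
    # principal inertias.
    m = [[sum(r[a] * r[b] for r in s) for b in range(k)] for a in range(k)]
    total = sum(m[a][a] for a in range(k))  # trace = total inertia
    minors = sum(m[a][a] * m[b][b] - m[a][b] ** 2
                 for a in range(k) for b in range(a + 1, k))
    # Non-zero eigenvalues solve: lambda^2 - total * lambda + minors = 0.
    disc = math.sqrt(max(total ** 2 - 4 * minors, 0.0))
    lam1, lam2 = (total + disc) / 2, (total - disc) / 2
    return total, lam1 / total, lam2 / total


total, share1, share2 = ca_inertia_shares(counts)
print(f"total inertia: {total:.4f}")
print(f"dimension 1: {share1:.1%}, dimension 2: {share2:.1%}")
```

Because only two non-trivial dimensions exist for a six-by-three table, a two-dimensional biplot necessarily accounts for all of the inertia; the informative figure is how that total is split between the two axes.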

The biplot in figure 3 is based on data that were collected and

annotated independently of the data used in the present study, even if there is some overlap in

the sources. For this reason, the data from Breivik (1990) provide an excellent comparison

case. In short, observing the same pattern in another, partially overlapping, dataset annotated

with different methods, should lead to increased confidence in the result.

Turning to the data from YCOE, a small excerpt of which can be seen in table 2, we would

expect to see more or less the same pattern as in figure 3 when constructing a new biplot

based on the YCOE data. The data are annotated somewhat differently in YCOE, and the

categories differ somewhat. The categories used in this analysis are frequencies per author

or manuscript of locative adverbs, of existential there, of forms of be co-occurring with a

locative adverb or existential there, and lexemes other than be occurring in the same position.

The resulting 4 × 60 matrix is difficult to interpret directly, and we instead turn to the CA

biplot in figure 4.

Table 2: Example excerpt from the matrix containing frequencies of locative adverbs,

existential there, and their contexts by text in YCOE. The full matrix has 60 rows.

                     coadrian    cobede    coblick    cobyrhtf
LOCATIVE                 4         437       270         47
EXISTENTIAL              0          18        15         10
NON-BE (context)         4         423       242         36
BE (context)             0          32        43         21
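
The structure of the excerpt can be inspected with a short Python sketch. It covers only the four texts shown in table 2, not the full matrix, and the dictionary layout and variable names are assumptions made for illustration. In the excerpt, the two form rows and the two context rows add up to the same total for each text, which is consistent with reading them as two classifications of the same tokens; that reading is an inference from the excerpt rather than something stated in the text.

```python
# Frequencies from the table 2 excerpt (four of the YCOE texts).
excerpt = {
    "coadrian": {"LOCATIVE": 4,   "EXISTENTIAL": 0,  "NON-BE": 4,   "BE": 0},
    "cobede":   {"LOCATIVE": 437, "EXISTENTIAL": 18, "NON-BE": 423, "BE": 32},
    "coblick":  {"LOCATIVE": 270, "EXISTENTIAL": 15, "NON-BE": 242, "BE": 43},
    "cobyrhtf": {"LOCATIVE": 47,  "EXISTENTIAL": 10, "NON-BE": 36,  "BE": 21},
}

profiles = {}
for text, f in excerpt.items():
    forms = f["LOCATIVE"] + f["EXISTENTIAL"]   # classification by form
    contexts = f["NON-BE"] + f["BE"]           # classification by context
    profiles[text] = {
        "tokens": forms,
        "margins_agree": forms == contexts,
        "existential_share": f["EXISTENTIAL"] / forms,
        "be_share": f["BE"] / contexts,
    }

for text, p in profiles.items():
    print(f"{text}: n={p['tokens']}, "
          f"existential={p['existential_share']:.3f}, be={p['be_share']:.3f}")
```

Relative shares of this kind, rather than raw counts, are what CA compares across texts, which is why a short text such as cobyrhtf can stand out despite its low absolute frequencies.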

The biplot in figure 4 is a good visualization from the point of view of variation accounted

for: all the variation can be captured in two dimensions, with the first (horizontal) dimension

accounting for 85.8%. The alignment of existential there with forms of be is taken as a proxy

for the type D construction, cf. also the argument from Construction Grammar presented in

section 1. In the center of the biplot we find locative adverbs and non-be contexts, alongside

the majority of texts. The horizontal dimension is defined by the opposition between

existential there occurring together with forms of be, and the other categories. On the right

hand side of the plot, associated with existential there and be, we find among others the Old

English Orosius (coorosiu), the West-Saxon Gospels (cowsgosp), the homilies of Wulfstan

(cowulf), Byrhtferth’s Manual (cobyrhtf), and a number of texts by Ælfric. This result is

strikingly similar to the results in figure 3 based on the independently collected and analyzed

data from Breivik (1990).

Figure 4: CA biplot of the YCOE frequencies exemplified in table 2. The biplot captures the

data well, explaining 100% of the variation. Texts on the right-hand side of the biplot are

more associated with the existential there constructions.

Notably, other texts from the same period, i.e. the first half of the eleventh century, are

not particularly associated with existential there. These include translations of the dialogues

of Gregory the Great (cogregdC and cogregdH, visible in the lower half of the plot in figure 4),

as well as parts of the Anglo-Saxon Chronicle from the same period (located in the center, not

visible). The pattern in the plot cannot easily be attributed to a specific genre: in addition to

the Gospels, we find homilies (Wulfstan, Ælfric), instructional material (Byrhtferth), and

letters and prefaces (Ælfric).

4.2 Authors and authority

A striking feature of the plot in figure 4 is that many of the texts that are associated with

existential there are texts written or translated by a single identified author. If such an

association was found for one author, it might plausibly be considered an idiosyncratic feature

associated with that author. However, this is less plausible when the same association is found

in a number of texts by different authors. To estimate the effect, if any, of a single identified

author, a BINARY LOGISTIC REGRESSION model was used. This approach allows us to

simultaneously compare the association between identified authors and the use of existential

there to structural variables, such as grammatical features. The relationship between a binary

outcome, in this case an existential use of there or a locative adverb (the most frequent of

which is there), can be considered a function of a number of predictor variables, and by using

a regression model we can investigate the specific properties of this function. For more

detailed discussions and examples of regression models in historical linguistics, see Baayen

(2008: 218–222), Gries & Hilpert (2010), or Jenset (2010; 2014).

The following model was used:

Model: Response (probability of existential there) modeled as depending on:

fixed effects: co-occurrence with be (BeContext) + overall log-transformed complexity of the

sentence (LogComplexity) + interaction between co-occurrence with be and complexity + co-

occurrence with a nominative NP (NomNP) + single author (OneAuthor).
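
Written out as an equation, the specification above corresponds to the following linear predictor on the log-odds scale. This is a sketch of a standard binary logistic regression: the β symbols are notation introduced here, and the form assumes no terms beyond those listed (it requires the amsmath package for \operatorname and \text).

```latex
\[
\operatorname{logit}\bigl(P(\text{existential \textit{there}})\bigr)
  = \beta_0
  + \beta_1\,\mathit{BeContext}
  + \beta_2\,\mathit{LogComplexity}
  + \beta_3\,(\mathit{BeContext} \times \mathit{LogComplexity})
  + \beta_4\,\mathit{NomNP}
  + \beta_5\,\mathit{OneAuthor},
\qquad
\operatorname{logit}(p) = \ln\frac{p}{1-p}
\]
```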

The model was a good fit to the data judging from diagnostic plots of residuals vs. fitted

values (not shown). Nagelkerke’s pseudo R² was 0.75, indicating that the model covers much

of the variation in the data and further corroborating the impression of a good fit. Finally, a

likelihood ratio test (Baayen 2008: 253) comparing models with and without the OneAuthor

predictor indicated that the inclusion of the predictor in the model is warranted. However, it is

worth noting that in a model where OneAuthor is the only predictor, the fit is poorer and

the predictor itself is not significant. This implies that this particular predictor only makes an

explanatory contribution when included alongside predictors representing grammatical

variables. Put differently, whatever social differentiation it represents, it fills in a gap left by

the predictors representing grammatical structure, but it is insufficient on its own.
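
The two diagnostics mentioned above can be sketched as follows. The underlying log-likelihoods are not reported in the article, so the values in the usage lines are hypothetical placeholders chosen only to make the example run; the function names are likewise assumptions. The likelihood ratio test assumes that the two models differ by a single parameter (here, OneAuthor), so the statistic is referred to a chi-square distribution with one degree of freedom.

```python
import math


def lr_test_1df(loglik_reduced, loglik_full):
    """Likelihood ratio test for nested models differing by one parameter."""
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    # Upper tail of the chi-square distribution with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def nagelkerke_r2(loglik_null, loglik_model, n):
    """Nagelkerke's pseudo R-squared: Cox-Snell rescaled to a maximum of 1."""
    cox_snell = 1.0 - math.exp(2.0 * (loglik_null - loglik_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * loglik_null / n)
    return cox_snell / max_cox_snell


# HYPOTHETICAL log-likelihoods and sample size, for illustration only.
ll_null, ll_without_author, ll_with_author, n_obs = -1000.0, -404.0, -400.0, 5000

stat, p_value = lr_test_1df(ll_without_author, ll_with_author)
r2 = nagelkerke_r2(ll_null, ll_with_author, n_obs)
print(f"LR statistic = {stat:.2f}, p = {p_value:.4f}, Nagelkerke R2 = {r2:.2f}")
```

A small p-value indicates that the fuller model fits significantly better than the reduced one, which is the sense in which the inclusion of a predictor is warranted.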

Table 3: Summary of the logistic regression model. The coefficients indicate the effect each

predictor has on the log odds of finding an instance of existential there rather than a

locative adverb.

                              Coef.      Std. Error    z-value    p-value
Intercept                    -6.4741     0.3491        -18.55     < 0.0001
BeContext                     1.7300     0.4454          3.88       0.0001
LogComplexity                 0.6685     0.1980          3.38       0.0007
NomNP                         1.7092     0.2462          6.94     < 0.0001
OneAuthor                     0.6667     0.2304          2.89       0.0038
BeContext × LogComplexity     3.6093     0.3657          9.87     < 0.0001

The logistic regression model is summed up in table 3. The analysis shows that the

OneAuthor variable, like the other variables, is statistically significant, with a margin of error

that is smaller than the effect itself, but how does this compare to the other predictors in the

model? Apart from the intercept, all the coefficients are positive, which means that they

increase the probability of existential there. However, the intercept is a baseline: the

likelihood of existential there when all the predictors are kept at their reference levels (false

for binary variables and zero for numeric ones). Hence, the intercept is less interesting than

the degree to which the predictors are associated with an increased likelihood of existential

there from this baseline. The coefficients in the table are log-transformed odds-ratios, which

complicates interpretation. However, a simple rule of thumb is to divide the coefficient by

four, since this will provide an approximation to the maximum impact, or effect, that the

variable in question has on the outcome (Gelman & Hill 2007: 82). Since the model specifies

an interaction term for BeContext and LogComplexity, this coefficient needs to be taken into

account for those predictors. Adding the BeContext and the interaction coefficients, we get

5.3393 which, divided by four and rounded off, corresponds to a 133% increase in the

probability of existential there. For LogComplexity (again, with the interaction term added)

we get an increase of about 100%, whereas co-occurrence with a nominative NP is associated

with a 43% increase in the likelihood of existential there. Finally, texts written by one,

identified author represent a 17% increase in the likelihood of existential there.
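
The arithmetic in this paragraph can be reproduced from the coefficients in table 3. The sketch below is an illustration, not the original analysis: the function names and the predictor values passed to predicted_probability are assumptions made here. Note that the divide-by-four figure is an upper bound on the change in predicted probability for a one-unit change in a predictor, so values above 1 are best read as indicating a very steep effect rather than as literal changes in probability.

```python
import math

# Coefficients from table 3 (log-odds scale).
coef = {
    "Intercept": -6.4741,
    "BeContext": 1.7300,
    "LogComplexity": 0.6685,
    "NomNP": 1.7092,
    "OneAuthor": 0.6667,
    "BeContext:LogComplexity": 3.6093,
}


def divide_by_four(*names):
    """Rule-of-thumb upper bound on the probability change (sum of coefs / 4)."""
    return sum(coef[name] for name in names) / 4.0


def inv_logit(x):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def predicted_probability(be, log_complexity, nom_np, one_author):
    """Predicted probability of existential there for given predictor values."""
    eta = (coef["Intercept"]
           + coef["BeContext"] * be
           + coef["LogComplexity"] * log_complexity
           + coef["BeContext:LogComplexity"] * be * log_complexity
           + coef["NomNP"] * nom_np
           + coef["OneAuthor"] * one_author)
    return inv_logit(eta)


print("BeContext + interaction:    ", round(divide_by_four("BeContext", "BeContext:LogComplexity"), 2))
print("LogComplexity + interaction:", round(divide_by_four("LogComplexity", "BeContext:LogComplexity"), 2))
print("NomNP:                      ", round(divide_by_four("NomNP"), 2))
print("OneAuthor:                  ", round(divide_by_four("OneAuthor"), 2))
print("OneAuthor odds ratio:       ", round(math.exp(coef["OneAuthor"]), 2))
print("baseline probability:       ", round(predicted_probability(0, 0, 0, 0), 4))
```

Exponentiating a coefficient gives the corresponding odds ratio, an alternative reading that does not depend on where on the probability scale a given clause falls.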

In summary, the present section has shown that the type C and D constructions are in

complementary distribution, that results based on partially independent data from Breivik

(1990) and YCOE lead to very similar conclusions, and that late Old English texts with one

identifiable author are particularly likely to use the type D construction with there.

Furthermore, these authors both represent individual prestige and can be linked to

standardization efforts (Gneuss 1972, Hogg 2002: 120). The next section will discuss these

results in light of the Blythe-Croft model presented in section 2.

5 DISCUSSION

The exploratory steps in section 3 above indicated that individual authors used existential

there at different rates. In section 4, we could further pinpoint this to single-authored texts in

the late Old English period in the eleventh century using CA. With a logistic regression

model, it was found that the variable indicating whether a text was written by a single,

identified author was a significant, albeit small, predictor of the occurrence of existential

there. Furthermore, the OneAuthor predictor was only significant when included alongside

predictors representing grammatical variables. The analysis also revealed that several authors

followed the same pattern, reducing the likelihood that this was caused by individual

idiosyncrasies. Finally, the differences found among texts from the same time period suggest

that the use of type C and type D constructions was not

completely homogeneous in the late Old English period, leaving room for other explanations

such as social prestige and normative efforts.

The analysis presented above lends plausibility to the proposal made by Blythe &

Croft (2012) regarding the importance of social factors for explaining the S-shaped trajectory

often found in linguistic change. More importantly, it tests the proposal and provides a more

nuanced picture by measuring the relative importance of the variables involved. As such, the

study testifies to the importance of both social and linguistic factors, and highlights the need

for multivariate statistical methods in historical linguistics. While the grammatical and

cognitive factors discussed by Breivik (1990; 1997) and Jenset (2010; 2014) regarding the

evolution of existential there are doubtlessly important, the fact that they manifest themselves

in larger effect sizes (cf. table 3 above) should not lead us to a situation where social factors

are dismissed. As the present study attests, grammatical and social factors can be studied

side by side; the Blythe-Croft model provides the theoretical motivation for doing so.

The results from the present study also raise the pertinent question of what kinds of

effect sizes should be expected in theoretically motivated empirical explorations in historical

linguistics. It is perhaps easy to focus on large effects, and sometimes this is doubtlessly

correct. However, the gradual nature of linguistic change coupled with the need to coordinate

linguistic norms across speakers and communities for effective communication suggests that

variables with smaller effects (such as the OneAuthor predictor in the model discussed in

section 4) should also be considered, simply because relatively large changes might have

modest causes in a complex system.

It should be emphasized, however, that the analysis above is based on Blythe &

Croft’s richly elaborated model, which again has explicit theoretical motivations. Thus, it is

not merely the convenient singling out of a small but statistically significant variable. It is

precisely within models such as Blythe and Croft’s or larger frameworks such as evolutionary

approaches to linguistic change that small effects can be interpreted meaningfully and can

provide valuable contributions. It is also worth pointing out that the present analysis based on

Blythe & Croft (2012) complements, rather than replaces, the analyses put forth in Breivik

(1990; 1997) and Jenset (2010; 2014). A key premise underlying the present study is the need

for simultaneously considering the multiple factors involved in both the innovation and

propagation stages of constructional change.

In the case of the constructional change taking place regarding the type C and type D

existential constructions in Old English, the analysis above coupled with the Blythe-Croft

model suggests pockets, or centers, where certain circles might plausibly have considered the

there-construction a norm conveying some prestige. According to the framework, the prestige

need not be shared by all speakers in the area; it is to be expected that linguistic prestige in the

Middle Ages as manifested in written manuscripts would have been limited to certain circles

rather than the general population (Salmons 2012: 159). As the construction gained conventional

status, its association with different groups within a speech community would have gradually

increased (Croft 2000: 177).

The analysis above leaves unanswered the question of how the social differentiation

possibly involved in the replication (or propagation) of existential there in Old English might

have influenced later stages of the history of English. This omission is intentional, since it

falls outside the scope of the current article. Furthermore, the dialect diversity and lack of

standardization in Middle English following the Norman Conquest (Corrie 2006) would

require a somewhat different approach from that taken in the present paper. However, it is not

implausible that the texts written in Old English might have had some influence on the

language of later texts. In the Midlands, the area where standardization of Middle English

began from the fourteenth century, a link to the literary tradition of Old English was

maintained through the study and copying of older texts (Corrie 2006: 109–110). While this is

admittedly somewhat speculative, the preliminary results discussed in the present study can be recast as a

testable prediction for later stages of English: if a differential social weighting is crucially

involved in the propagation of existential there, we would expect its increasing probability to

coincide with the increasing social prestige (and standardization) of Midlands English after

the middle of the fourteenth century (Corrie 2006). However, the prediction is not merely a

question of geography. It fits well with an evolutionary model of social propagation (Croft

2000: 193–194) where communicative interaction ensures that novel forms spread between

population centers, a model which is also congruent with the results presented for Old English

above.

6 CONCLUSIONS

The investigation above makes a strong case for the view that differential social

weighting, or replicator selection, played a non-trivial role in the early propagation of the Old

English existential construction with there, i.e. type D, to the detriment of the type C

construction. This result contributes to answering the question of how there, once it had been

reanalyzed as an existential subject, kept on being reanalyzed, with the construction it

appeared in eventually replacing the former majority variant. For Old English, the period

studied here, it seems clear that Blythe and Croft’s model of replicator selection can lead to

new insights about the process of constructional change. We can find empirically motivated

indicators of differential social weighting in the data, possibly driven by the social prestige

associated with some groups and writers.

Crucially, the explanation developed above does not exclude or do away with

linguistic variables. Rather, the differential social weighting is operationalized and adapted to

an empirical framework which allows us to keep linguistic and social variables side by side.

As such, the analysis presented in the present paper builds on the results from Breivik (1990)

by demonstrating how structural factors involved in an innovation might be qualified by

social factors in the process of propagation. Blythe and Croft’s model is elegant and

persuasive, but it is still just a model. Empirical studies are required to ascertain to what

degree the map fits the terrain. In looking for social factors to explain linguistic change, it is

important to not throw the linguistic baby out with the bath water. The interaction, or perhaps

competition, between social and linguistic variables needs detailed investigations along the

different temporal points of the S-curve. This is required not only to establish the relative

importance of the variables in the ongoing propagation of change, but also to gain a deeper

understanding of how social factors interact with syntactic, semantic, and pragmatic factors in

driving linguistic change.

REFERENCES

Baayen, R. Harald. 2008. Analyzing linguistic data: A practical introduction to statistics

using R. Cambridge: Cambridge University Press.

Baxter, Gareth J., Richard A. Blythe, William Croft & Alan J. McKane. 2009. Modeling

language change: An evaluation of Trudgill’s theory of the emergence of New

Zealand English. Language Variation and Change 21(2), 257–296.

Blythe, Richard A. & William Croft. 2012. S-curves and the mechanisms of propagation in

language change. Language 88(2), 269–304.

Blythe, Richard A. & William A. Croft. 2009. The speech community in evolutionary

language dynamics. Language Learning 59(s1), 47–63.

Breivik, Leiv Egil. 1990. Existential there: a synchronic and diachronic study. 2nd edn. Oslo:

Novus.

Breivik, Leiv Egil. 1997. There in space and time. In Heinrich Ramisch & Kenneth Wynne

(eds.), Language in time and space: studies in honour of Wolfgang Viereck on the

occasion of his 60th birthday, 32–45. Stuttgart: Franz Steiner Verlag.

Breivik, Leiv Egil & Toril Swan. 2000. The desemanticisation of existential there in a

synchronic-diachronic perspective. In Christiane Dalton-Puffer & Nikolaus Ritt (eds.),

Words: Structure, meaning, function–A Festschrift for Dieter Kastovsky, 19–34.

Berlin: Mouton de Gruyter.

Chambers, J. K. 2002. Patterns of Variation including Change. In Chambers et al. (eds.), 349–

372.

Chambers, J. K., Peter Trudgill & Natalie Schilling-Estes (eds.). 2002. The handbook of

language variation and change. Malden, MA.: Blackwell Publishing.

Corrie, Marilyn. 2006. Middle English - Dialects and Diversity. In Mugglestone (ed.), 86–

119.

Croft, William. 2000. Explaining language change: An evolutionary approach. London:

Longman.

Croft, William. 2001. Radical Construction Grammar: Syntactic theory in typological

perspective. Oxford: Oxford University Press.

Davies, Mark. 2008. The Corpus of Contemporary American English (COCA): 410+ million

words, 1990-present. Brigham Young University. http://www.americancorpus.org/.

Fought, Carmen. 2002. Ethnicity. In Chambers et al. (eds.), 444–472.

Gelman, Andrew & Jennifer Hill. 2007. Data analysis using regression and multilevel /

hierarchical models. Cambridge: Cambridge University Press.

Gneuss, Helmut. 1972. The origin of Standard Old English and Æthelwold’s school at

Winchester. Anglo-Saxon England 1, 63–83.

Goldberg, Adele E. 1995. Constructions: A construction grammar approach to argument

structure. Chicago: University of Chicago Press.

Greenacre, Michael. 2007. Correspondence analysis in practice. 2nd edn. Boca Raton, FL.:

Chapman & Hall/CRC.

Greenacre, Michael & Oleg Nenadic. 2007. ca: Simple, Multiple and Joint Correspondence

Analysis. http://www.carme-n.org/.

Gretsch, Mechthild. 2009. Ælfric, Language and Winchester. In Hugh Magennis & Mary

Swan (eds.), A companion to Ælfric, 109–138. Leiden: Brill.

Gries, Stefan Th. 2008. Dispersions and adjusted frequencies in corpora. International

Journal of Corpus Linguistics 13(4), 403–437.

Gries, Stefan Th. & Martin Hilpert. 2010. Modeling diachronic change in the third person

singular: a multifactorial, verb-and author-specific exploratory approach. English

Language and Linguistics 14(3), 293–320.

Hilpert, Martin. 2013. Constructional Change in English: Developments in Allomorphy, Word

Formation, and Syntax. Cambridge: Cambridge University Press.

Hogg, Richard. 2002. An introduction to Old English. Oxford: Oxford University Press.

Irvine, Susan. 2006. Beginnings and Transitions: Old English. In Mugglestone (ed.), 32–60.

Jenset, Gard B. To appear. When data grow on trees: Random Forests in historical corpus

linguistics.

Jenset, Gard B. 2010. A Corpus-based Study on the Evolution of There: Statistical Analysis

and Cognitive Interpretation. Ph.D. dissertation, University of Bergen.

http://hdl.handle.net/1956/4444.

Jenset, Gard B. 2014. Mapping meaning with distributional methods: a diachronic corpus-

based study of existential there. Journal of Historical Linguistics 3(2), 272–306.

Jenset, Gard B. & Barbara McGillivray. 2012. Multivariate analyses of affix productivity in

translated English. In Michael P. Oakes & Meng Ji (eds.), Quantitative Methods in

Corpus-Based Translation Studies, 301–323. Amsterdam: John Benjamins Publishing

Company.

Labov, William. 2001. Principles of Linguistic Change, vol 2: Social Factors. Oxford:

Blackwell.

Mugglestone, Lynda (ed.). 2006. The Oxford History of English. Oxford: Oxford University

Press.

Nevalainen, Terttu & Helena Raumolin-Brunberg. 2003. Historical sociolinguistics. London:

Longman.

R Development Core Team. 2011. R: A Language and Environment for Statistical

Computing. Vienna. http://www.r-project.org.

Salmons, Joe. 2012. A history of German: what the past reveals about today’s language.

Oxford: Oxford University Press.

Schmitt, Norbert. 2010. Researching Vocabulary: A Vocabulary Research Manual.

Basingstoke: Palgrave Macmillan.

Schneider, Edgar W. 2002. Investigating variation and change in written documents. In

Chambers et al. (eds.), 67–96.

Taylor, Ann, Anthony Warner, Susan Pintzuk & Frank Beths. 2003. The York-Toronto-

Helsinki Parsed Corpus of Old English Prose.

Trudgill, Peter. 2002. Social Differentiation. In Chambers et al. (eds.), 373–374.

Williams, Alexander. 2000. Null subjects in Middle English existentials. In S. Pintzuk, G.

Tsoulos & A. Warner (eds.), Diachronic syntax: Models and mechanisms, 285–310.

Oxford: Oxford University Press.