2014. Deshors, Sandra C. "A case for a unified treatment of EFL and ESL: A multifactorial approach"....

This is a contribution from English World-Wide 35:3 © 2014. John Benjamins Publishing Company



English World-Wide 35:3 (2014), 277–305. doi 10.1075/eww.35.3.02des issn 0172–8865 / e-issn 1569–9730 © John Benjamins Publishing Company

A case for a unified treatment of EFL and ESL: A multifactorial approach*

Sandra C. Deshors
New Mexico State University

This multifactorial corpus-based study focuses on dative alternation constructions (Mark gave his daughter a gift versus Mark gave a gift to his daughter) and contrasts 1,313 give occurrences in ditransitive and prepositional dative constructions across native, learner (EFL) and world (ESL) Englishes. Using cluster analysis and regression modeling, I analyze how grammatical contexts constrain syntactic choices in EFL and ESL and how speakers with different instructional backgrounds develop different variation patterns in their own English variety. The regression model reveals that the English variety factor accounts significantly for syntactic variation. In addition, the study identifies a prototypical prepositional dative construction in non-native English, which serves as a default construction for learners in more complex grammatical contexts. This study stresses the importance of reaching beyond structural linguistic differences by investigating processing (dis)similarities between EFL and ESL and shows the usefulness of a cognitive theoretical framework as a unified approach to cross-varietal variation.

Keywords: World and Learner Englishes, dative alternation, give, multifactorial approaches, cluster analysis, logistic regression

1. Background of the study

1.1 Combining ESL and EFL: Some preliminaries

The question of what distinguishes and unites ESL (i.e. indigenized varieties of English spoken in countries like Singapore or Hong Kong) and EFL (i.e. foreign varieties of English spoken in countries such as France or Germany) has recently

* I wish to thank, in alphabetical order, Sandra Götz, Stefan Th. Gries, Marianne Hundt, Mark Waltermire and two anonymous reviewers for their valuable comments on a previous version of this paper. All errors are mine alone.


rapidly captured the attention of corpus linguists concerned with modeling non-native English varieties. However, the notion that different types of Englishes are best approached together is not new. As early as 1982, Kachru points out that “[i]n the international context, it is more realistic to consider a spectrum of Englishes which vary widely ranging from standard native varieties to standard non-native varieties” (Kachru 1982: 53). In 1986, Sridhar and Sridhar (1986) identified a “paradigm gap” between the EFL and ESL research areas and proposed to approach the two variants in more integrated ways.1 Despite Sridhar and Sridhar’s call for a rapprochement of research into world and learner Englishes, for over 25 years, the two fields continued to be investigated with no real contact between them (Gilquin 2011).

Recent publications such as Mukherjee and Hundt’s (2011) edited volume on Exploring Second-Language Varieties of English and Learner Englishes, however, show that the overall approach to non-native variants is currently undergoing a significant shift, and increasing research effort is being dedicated to bringing together the EFL and ESL research fields, as advocated by Sridhar and Sridhar (1986). In that regard, studies such as Nesselhauf (2009), Gilquin and Granger (2011) and Götz and Schilk (2011), along with discussion forums such as Hundt and Mukherjee (2011a), have already started to show the relevance of comprehensive approaches to non-native Englishes for the field of English linguistics as a whole and, crucially, they have contributed to setting up an agenda for the development of a unified approach to second and foreign English variants. This development has now become an urgent and necessary step in the collective effort to bridge the paradigm gap:

[S]ince both learner Englishes and second-language varieties are typically non-native forms of English that emerge in language contact situations and that are acquired (more or less) in institutionalized contexts, it is high time that they were described and compared on an empirical basis in order to draw conceptual and theoretical conclusions with regard to their form, function and acquisition. (Hundt and Mukherjee 2011b: 2)

Recently, corpus linguists in both EFL and ESL research have taken an active part in trying to find fruitful ways of quantitatively exploring large-scale world and learner English corpus data. As Gilquin (2011: 639) notes, “combining these corpora makes it possible for the first time ever, to systematically compare features of WE [World Englishes] and of LE [Learner Englishes]”. The field of corpus-based

1. Note that henceforth I use the neutral term “variant” to refer to native English, EFL and ESL as some readers may argue against EFL representing a full-fledged non-native variety. In citations, the term variety should therefore be understood as variant.



English variety research is already benefiting from a fast growing number of contrastive studies which have so far confirmed the usefulness of combining ESL and EFL research to further our understanding of both variants. In the field of phraseology, for instance, Nesselhauf (2009) shows how integrating the two areas has helped identify similarities of the phraseology of institutionalized second language and foreign learner variants that previously had gone almost unnoticed.

At a more theoretical level, combining EFL and ESL variants is not necessarily a straightforward exercise. As Kachru (1982: 54) explains, the two variants are distinct types of non-native English; and while second language Englishes (or world Englishes) are essentially institutionalized variants, foreign language Englishes (or learner Englishes) are primarily performance Englishes. More specifically, institutionalized variants have “ontological status” (Kachru 1982: 55) in the sense that they have, for instance, an extended range of uses in the sociolinguistic context of a nation, an extended register and style range, a process of nativization of the registers and styles, amongst other characteristics.2 In contrast, performance variants are used as a foreign language and they have no social status. Unlike institutionalized varieties, performance varieties are highly restricted in terms of their functional range in specific social contexts. This distinction between ESL and EFL has been found to be reflected linguistically. For instance, in the context of formulaic sequences in spoken ENL, ESL and EFL, Götz and Schilk (2011: 97) observe that ESL and EFL speakers do not share the same repertoire of formulae. Compared to users of English as an institutionalized second language variety, EFL speakers use a different and more restricted range of formulae. However, despite representing different types of non-native Englishes, EFL and ESL variants share, at least for a period of time, the performance status as “an institutionalized variety always starts as a performance variety, with various characteristics slowly giving it a different status” (Kachru 1982: 55).

This common characteristic between EFL and ESL represents a crucial aspect for the development of an integrated account of world and learner Englishes. Such common ground allows us to approach the two variants solely from the perspective of their linguistic process, which is an important point since, as Kachru (1982) explains, such a linguistic process is one of two processes (along with the attitudinal process) that need to be accounted for to model non-native language.3 In

2. The term “nativization” refers to a particular phase in the process of emergence of a new variety of English during which the use of English becomes “a major practical issue and an expression of new identity” (Schneider 2003: 247).

3. The two processes differ in nature in that the attitudinal process refers to linguistic norm and the linguistic process refers to linguistic behavior. In the words of Kachru (1982: 55–56), “[i]n attitudinal terms, a majority of L2 speakers should identify themselves with the modifying label



addition, as performance variants (at least initially), EFL and ESL have a common starting point in their acquisition process which, as Biewer (2011: 28) notes, entails that their “cognitive processes and learning strategies will be similar at the beginning of the learning process”. Crucially, Biewer (2011: 28) asserts that “[t]his fact helps to explain many of the features common to L2 varieties in general”.

Against this background, the present study raises the question of the extent to which those cognitive processes and learning strategies differ (or not) in the case of more advanced learners and proficient ESL speakers. More generally, Biewer’s (2011) study calls for further investigation on whether theoretical approaches inspired by cognitive linguistics generally represent potentially insightful background domains to integrate EFL and ESL and ultimately bridge the paradigm gap. Finally, this type of approach raises important methodological questions, namely to what extent corpus data and corpus methods allow us to reach beyond structural linguistic differences, to what degree they are compatible with cognitively-inspired analytical frameworks and, more generally, whether those methods represent useful resources to describe and model EFL and ESL in a unified way.

1.2 Existing ways of exploring (dis)similarities across EFL and ESL

Over the past three to four years, contrastive studies of non-native language variants have benefited from significant methodological developments. Gradually, those developments have helped us draw a clearer picture of the nature of the (dis)similarities between EFL and ESL. The relevant literature points toward three types of corpus-based approaches so far generally adopted across studies: traditional basic quantitative studies (e.g. Nesselhauf 2009), studies that combine quantitative and qualitative approaches (e.g. Gilquin 2011, Laporte 2012) and finally, state-of-the-art quantitative studies that apply sophisticated statistical techniques to study linguistic phenomena within their linguistic context of utterance (e.g. Szmrecsanyi and Kortmann 2011). In what follows, I will present how these approaches have been applied to contrast learner and world Englishes studies and I will briefly discuss their advantages and limitations.

Nesselhauf (2009) investigates co-selection phenomena (such as PLAY A ROLE versus PLAY A PART and the noun-complementation of collocations such as HAVE + INTENTION + of -ing versus to + infinitive) across four variants of world Englishes (specifically India, Singapore, Jamaica and Kenya) and

which marks the non-nativeness of a model: For example, Indian English speakers, Lankan English speakers […] A person may be a user of Indian English in his linguistic behavior but may not consider it the ‘norm’ for his linguistic performance. There is thus a confusion between linguistic norm and linguistic behavior”.



four variants of learner English (specifically German, French, Finnish and Polish). Methodologically, the study consists of a traditional frequentist analysis where frequencies of occurrence of the co-selection patterns in focus across the second and foreign English variants are studied in terms of their over- and underuses compared to a native British norm. From an exploratory perspective, studies such as Nesselhauf (2009) provide a useful starting point in the overall effort to explore areas of (dis)similarities across non-native variants. For instance, Nesselhauf (2009: 22) observes that all the various types of co-selection phenomena investigated in her paper “have been found to occur across institutionalized L2 and learner varieties”, and “co-selection phenomena that only display a low degree of idiomaticity and culture-boundedness tend to have similar characteristics across L2 and learner varieties”. However, such studies tend to call for further-reaching and more fine-grained analyses of the specific grammatical contexts that contribute to observed (dis)similarities between indigenized and learner Englishes.

Such finer-grained resolution is achieved in Gilquin (2011) or Laporte (2012), two studies that combine quantitative and qualitative perspectives. Gilquin (2011) consists of an analysis of phrasal verbs across native British English, all EFL variants from the International Corpus of Learner English and four ESL variants (Kenyan, Tanzanian, Indian and Singaporean English) from the International Corpus of English. The combined approach allows Gilquin to identify shared innovations in the way phrasal verbs are used across EFL and ESL. Interestingly, her findings lead her to ask “how […] such similarities [can] be explained, considering the contact between all these varieties is extremely unlikely to have taken place” (Gilquin 2011: 641). Her question reinforces the point made earlier in relation to exploring EFL and ESL in terms of their linguistic development. Laporte (2012) analyzes the high-frequency verb MAKE across four world Englishes (Kenyan English, Indian English, Jamaican English and Singaporean English), four learner Englishes (Japanese-, Russian-, French-, and Dutch-English interlanguage) and native English. The study identifies varietal trends based on (i) overall frequency results and (ii) by semantic category. Interestingly, Laporte’s (2012) results yield unexplained discrepancies between the quantitative and qualitative results, strongly suggesting the need to integrate the quantitative and qualitative approaches. This can be achieved by adopting a method allowing for richly annotated data (i.e. simulating a qualitative approach) to be analyzed quantitatively. With such an approach, the analyst is in a position to assess whether those discrepancies might be part of a wider and perhaps more coherent picture.

Szmrecsanyi and Kortmann (2011) illustrate the third type of study, namely the large-scale corpus investigation of a broad range of grammatical features. Szmrecsanyi and Kortmann (2011) very clearly demonstrate the strong descriptive power of this type of approach and the usefulness of integrating grammatical



contexts in contrastive studies of EFL and ESL.4 Szmrecsanyi and Kortmann (2011) take a typological perspective on EFL and ESL and explore part-of-speech (POS) frequencies in order to analyze and compare degrees of grammatical analyticity and syntheticity in five world Englishes (East Africa, Hong Kong, India, Philippines and Singapore), eleven learner Englishes (Bulgarian, Czech, Dutch, Finnish, French, German, Italian, Polish, Russian, Spanish and Swedish), and across three standard British English registers (school essays, university essays and speech). With this approach, the authors recognize the strikingly different typological profiles of EFL and ESL. However, despite its indisputable analytic power, the main disadvantage of this approach is that by nature it is not compatible with cognitive approaches to language use and therefore has little explanatory power in relation to speakers’ linguistic choices. In addition, although the approach includes a wide range of contextual linguistic elements (as POS), the method does not account for any possible interactions between those elements. In other words, this approach prevents the analyst from studying accurately how grammatical features combine (i.e. interact) with one another during language production. This is an important disadvantage because (i) such combinations reflect the psychological reality of language production as modeled in usage-based approaches to language acquisition, and (ii) recent multifactorial work in EFL demonstrates that combination patterns characterize learner language and contribute significantly to its non-nativeness. In what follows, I briefly present such recent multifactorial analyses of learner language and I show how this type of approach can further help close the paradigm gap.

1.3 Regressions and interactions: A multifactorial solution to the paradigm gap

In the field of learner corpus research, a rapidly growing number of corpus-based studies such as Deshors and Gries (2014) or Gries and Wulff (2014) demonstrate that multifactorial methods (particularly those including logistic regression modeling) can help us to:

1. identify grammatical features that trigger non-native linguistic patterns, and
2. explain the existence of those patterns in learner Englishes.

4. I acknowledge previous work by Szmrecsanyi and Kortmann (2009) that adopts sophisticated methodological approaches such as multidimensional scaling and cluster analysis to analyze 76 grammatical features across 60 non-standard varieties or groups of English varieties: native vernaculars, distinctive ethnic, regional, and social varieties, English-based pidgins and creoles, and major ESL varieties. Despite its interesting methodological aspect, I do not include Szmrecsanyi and Kortmann’s (2009) study in the current discussion as it is experimental in nature (specifically questionnaire-based) rather than corpus-based.



In a nutshell, multifactorial regressions are statistical models that help us predict a dependent variable (i.e. a particular linguistic outcome) on the basis of several independent variables or predictors. More concretely, with logistic regression modeling we can assess how the grammatical contexts of linguistic items (i.e. the interactions between their co-occurring semantic and morpho-syntactic features) systematically vary across native and learner language. According to Gries (p.c., Feb. 2013), “[t]his type of approach […] is extremely powerful in how it allows researchers to investigate the impact of multiple predictors on a linguistic choice simultaneously but is unfortunately still very much underutilized”. From a cognitive or usage-based perspective, this approach is also appropriate as language learning is characterized as clouds in multidimensional exemplar space (Gries and Deshors 2014: 18).
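To make the notion of an interaction between predictors more concrete, the following minimal Python sketch shows how a binary logistic regression turns predictor values into a probability, and how an interaction term lets the effect of one predictor depend on another. The predictor names (`non_native`, `recipient_is_np`) and all coefficient values are hypothetical and invented for illustration; they are not estimates from any study discussed here.

```python
import math

def predict_prob(intercept, coefs, features):
    """Return P(outcome = 1) under a binary logistic regression."""
    # Linear predictor: intercept plus the weighted sum of predictor values.
    log_odds = intercept + sum(coefs[name] * value for name, value in features.items())
    # The logistic function maps log-odds onto a probability between 0 and 1.
    return 1.0 / (1.0 + math.exp(-log_odds))

def with_interaction(features, a, b):
    """Add the product of predictors a and b as an explicit interaction term."""
    out = dict(features)
    out[f"{a}:{b}"] = features[a] * features[b]
    return out

# Hypothetical coefficients (illustration only, not fitted to real data).
# Outcome 1 is taken here to be the prepositional dative, 0 the ditransitive.
INTERCEPT = -1.0
COEFS = {
    "non_native": 0.4,
    "recipient_is_np": 0.3,
    "non_native:recipient_is_np": 0.9,  # extra effect when both conditions hold
}

native_pronoun = with_interaction(
    {"non_native": 0, "recipient_is_np": 0}, "non_native", "recipient_is_np")
learner_np = with_interaction(
    {"non_native": 1, "recipient_is_np": 1}, "non_native", "recipient_is_np")

p_native_pronoun = predict_prob(INTERCEPT, COEFS, native_pronoun)
p_learner_np = predict_prob(INTERCEPT, COEFS, learner_np)
print(round(p_native_pronoun, 3), round(p_learner_np, 3))
```

In this toy setting the interaction coefficient is what allows the model to express that a grammatical context may shift a choice more strongly in one variant than in another, which a model with main effects alone could not capture.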

Deshors and Gries (2014) illustrate this approach with a focus on modal constructions with may/can in French-English interlanguage. The authors use logistic regression to analyze 3,700 occurrences of may/can based on their co-occurrence with twenty-two semantic and morphosyntactic factors (a total of 98 linguistic factor levels). This approach allows Deshors and Gries to show that although French English learners’ uses of may and can are similar to native uses (and can be reliably predicted), there are also differences to native uses which ultimately reflect cognitive processes such as resorting to can as a default when the linguistic environment becomes too complex. For instance, the authors’ results indicate that with respect to the grammatical features “clause type” and “verb semantics”, learners prefer can in subordinate clauses and may in main and coordinate clauses; and they prefer can with verbs denoting an abstract process and may with copula verbs. Overall, these results are in line with previous work on how grammatical complexity affects lexico-syntactic choices (see Rohdenburg 1996; see Gries and Wulff (2013) for a similar multifactorial analysis of genitive alternation across Chinese and German English learners).

The powerful nature of regression models to study non-native linguistic patterns has recently been further demonstrated by Gries and Deshors (2014) who developed a protocol called MuPDAR (Multifactorial Prediction and Deviation Analysis with Regressions), so far “[t]he most fine-grained approach […] that offers the most unprecedented level of precision in the analysis of learner language” (Gries and Adelman 2014: 4). The approach involves a two-step regression-analytic procedure aimed at computing, for an individual learner’s linguistic choice, what choice a native speaker would make in the very same linguistic situation as a learner. Broadly, the two regressions allow the analyst to determine (i) what grammatical factors determine a particular linguistic outcome in native language, (ii) whether a learner made a native-like linguistic choice given the exact same linguistic context, and (iii) in the case of a mismatch between a native and non-native



choice, to what extent a learner’s choice is off target.5 Against this background, the present paper argues that regression-based approaches provide a way to make a significant methodological contribution to reducing the paradigm gap between second and foreign language Englishes.
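Points (ii) and (iii) of the procedure described above can be sketched in a few lines of Python. The sketch assumes that a regression fitted on native data has already produced, for a given learner context, the probability that a native speaker would choose the prepositional dative. The function name, the 0.5 cut-off and the deviation measure (distance of that probability from the cut-off) are assumptions made for illustration and should not be read as the exact computations of the MuPDAR protocol.

```python
def compare_to_native(p_native_prep, learner_choice, cutoff=0.5):
    """Compare one learner choice with the choice predicted for a native speaker.

    p_native_prep  -- probability of the prepositional dative in this context,
                      as predicted by a model fitted on native data (assumed given)
    learner_choice -- 'prep' or 'ditr', the construction the learner actually used
    """
    if learner_choice not in ("prep", "ditr"):
        raise ValueError("learner_choice must be 'prep' or 'ditr'")
    if not 0.0 <= p_native_prep <= 1.0:
        raise ValueError("p_native_prep must be a probability")

    # (ii) Which construction would the native model predict in this context?
    native_choice = "prep" if p_native_prep >= cutoff else "ditr"
    nativelike = learner_choice == native_choice

    # (iii) Hypothetical deviation score: zero for native-like choices, otherwise
    # the distance between the native probability and the cut-off.
    deviation = 0.0 if nativelike else abs(p_native_prep - cutoff)
    return native_choice, nativelike, deviation

# A learner uses a ditransitive where the native model strongly expects
# a prepositional dative.
result = compare_to_native(0.9, "ditr")
print(result)
```

Under these assumptions, a second regression could then take the deviation scores as its dependent variable in order to explore which grammatical factors are associated with non-native-like choices.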

1.4 The relevance of the dative alternation for an integrated approach to EFL and ESL

From a multifactorial perspective, the dative alternation, specifically the alternation between ditransitive and prepositional syntactic structures, as illustrated below in (1) and (2), represents an interesting phenomenon to investigate.

(1) Mark gave his daughter a gift (ditransitive)

(2) Mark gave a gift to his daughter (prepositional dative)

Existing work on the dative alternation in native English shows that grammatical contexts and processing factors influence speakers’ syntactic choice of one structure over the other (see Green 1974, Ransom 1979, Collins 1995, Gries 2003a, Gries 2003b, amongst others).6 It follows that, from a corpus linguistics perspective, the main objectives are:

– to determine which linguistic contextual features influence speakers’ choice of a particular syntactic construction over the other, and

– to predict which of the alternative dative structures (ditransitive or prepositional) a speaker is likely to choose to convey a message about a particular event.

Bresnan et al. (2007) illustrate this approach with their analysis of 2,360 native English spoken instances of dative alternations as used in the full Switchboard collection of recorded telephone conversations. Their study models speakers’ choices of the two syntactic structures based on fourteen processing explanatory factors assumed to have a quantitative influence on dative syntax (semantic class,

5. See Gries and Wulff (2013) for an application of the MuPDAR approach to prenominal adjective order and see Gries and Adelman (2013) for an application of the approach to subject realization in Japanese conversation by native and non-native speakers.

6. In the current study, and in line with Bernaisch et al. (2014), processing factors are considered as factors related to the notion of processing effort and the processing cost that they incur on speakers (see Gries 2003b). In that respect, Bernaisch et al. (2014: 15) note, for instance, that length factors “are related to processing in that, on the whole, longer material requires more processing cost than shorter material”. Similarly, pronouns are considered more accessible than full lexical noun phrases.



accessibility of recipient and theme, pronominality of recipient and theme, definiteness of recipient and theme, animacy of recipient, person of recipient, number of recipient and theme, concreteness of theme, structural parallelism in dialogue and length difference). The authors use binary logistic regression modeling and successfully predict speakers’ syntactic choices from multiple variables with 94% accuracy. The regression results indicate that, except for the factor “number of recipient”, all model predictors influence speakers’ syntactic choices in a way that is statistically significant. The study further shows that “the effects of discourse accessibility, animacy, definiteness and syntactic weight on dative construction is not reducible to syntactic complexity in parsing” (Bresnan et al. 2007: 82). Similarly, Bresnan and Hay (2008) focus on American and New Zealand Englishes and investigate whether the two English variants differ in their probabilities of syntactic choices over space and time. The authors find that animacy has a significant effect on the syntax of GIVE and they note that “the quantitative differences in animacy […] appear to reflect the dynamics of high-level choices that change grammar in subtle, gradient ways” (Bresnan and Hay 2008: 11).

In the field of world Englishes, Bernaisch et al. (2014) also adopt a multifactorial approach to analyze and contrast dative constructions across written native and South Asian Englishes. The study is based on the South Asian Varieties of English (SAVE) Corpus, a corpus of newspapers including six national components: Bangladesh, India, the Maldives, Nepal, Pakistan and Sri Lanka. The data from the SAVE corpus were obtained from the on-line archives of two leading English-medium national newspapers, and the texts were produced by highly proficient English speakers. Bernaisch et al. (2014) used the periodicals section of the British National Corpus (BNC) as the native reference corpus. The study specifically focuses on the dative alternation of the verb GIVE. The authors manually annotated and analyzed a total of 1,871 occurrences of the verb in its ditransitive, prepositional and passivized constructions based on eleven explanatory factors: transitivity (ditransitive versus prepositional dative), country (SAVE/BNC component), length of recipient and length of patient (in words), recipient and patient animacy (animate versus inanimate), recipient and patient accessibility (given versus new), recipient and patient pronominality (pronoun versus noun phrase), and patient semantics (abstract versus concrete versus informational). Statistically, Bernaisch et al. (2014) contrast with Bresnan et al. (2007) in that they explore the data using two related approaches: first, a method of conditional inference trees (with the function ctree from the R package party) and second, they fit a classification tree to their data based on the random forests method (with randomForest for R). This type of approach is based on the recursive inspection of a data set to determine which independent variable should be used to split up the data so as



to best predict the known outcomes of the dependent variable. Ultimately, these approaches allow analysts to model speakers’ decision patterns.
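The core idea of recursive partitioning, namely choosing the variable that best splits the data with respect to the outcome, can be illustrated with a small Python sketch. Note that the sketch scores candidate splits with Gini impurity, as in classical classification trees; this is a simplification swapped in for illustration, since conditional inference trees select splits on the basis of significance tests instead. The six toy observations are invented and do not come from any corpus.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of outcome labels (0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, outcome, predictors):
    """Return the predictor with the lowest weighted impurity, plus all scores."""
    scores = {}
    for pred in predictors:
        # Group the outcome labels by the levels of the candidate predictor.
        groups = {}
        for row in rows:
            groups.setdefault(row[pred], []).append(row[outcome])
        # Weight each group's impurity by its share of the data.
        scores[pred] = sum(len(g) / len(rows) * gini(g) for g in groups.values())
    return min(scores, key=scores.get), scores

# Invented toy data: six annotated occurrences of GIVE.
rows = [
    {"recpronominality": "pronoun", "patanimacy": "inanimate", "transitivity": "ditr"},
    {"recpronominality": "pronoun", "patanimacy": "animate", "transitivity": "ditr"},
    {"recpronominality": "pronoun", "patanimacy": "inanimate", "transitivity": "ditr"},
    {"recpronominality": "np", "patanimacy": "inanimate", "transitivity": "prep"},
    {"recpronominality": "np", "patanimacy": "animate", "transitivity": "prep"},
    {"recpronominality": "np", "patanimacy": "inanimate", "transitivity": "ditr"},
]

winner, scores = best_split(rows, "transitivity", ["recpronominality", "patanimacy"])
print(winner, scores)
```

A full tree would apply the same selection step again within each resulting subset until a stopping criterion is met.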

Bernaisch et al.’s (2014) study reveals that (i) the syntactic pattern of GIVE is influenced by the factors of pronominality of the recipient, length of the recipient, semantic class of GIVE and length of the patient, and (ii) that a number of the factors found to be relevant in British English are at play in South Asian variants. The authors conclude that “processing-related factors all seem to be at work similarly across the English varieties” and that those factors “may therefore be shared by all Englishes” (Bernaisch et al. 2014: 16). Despite its insightful results, Bernaisch et al.’s (2014) study calls for further exploration of the influence of the variable country (i.e. English variants) over speakers’ choices of syntactic constructions using regression analysis as such a technique may yield finer differentiation between English variants. This view is mainly based on Cutler (2010) who identifies a number of disadvantages to tree-based classification approaches: (i) classification trees need more data than parametric procedures like logistic regressions, (ii) classification trees are unstable (i.e. small changes in the data can completely change the fitted tree) and (iii) they are only moderately accurate for prediction and classification while there are more accurate classifiers and regression procedures available.

To the best of my knowledge, no multifactorial study of the dative alternation involving learner language has so far been conducted. Given that indigenized and learner Englishes represent distinct types of non-native Englishes, this work investigates to what extent the grammatical contexts of use of the two GIVE constructions influence differently the syntactic choices of ESL and EFL writers. More specifically, this study seeks to pinpoint the exact grammatical contextual factors (e.g. voice, patient animacy, length of the recipient, etc.) that play a determinant role in EFL and ESL writers’ syntactic choices. The study builds on Bernaisch et al. (2014) by studying further the relevance of the factor country by (i) applying logistic regression modeling and (ii) by extending the study to other non-native English speakers, namely EFL speakers.

2. Method

In this section, I first present the corpus data used for this study, explain how those data were explored, and present the factors (or predictors) against which the data were annotated and analyzed. In a second step, I turn to the statistical aspect of the analysis and present the specifics of the approach selected for this study.



2.1 Material and coded variables

The present study contrasts the uses of GIVE in the ditransitive and prepositional dative constructions across six English variants: three ESL variants (Hong Kong, Indian and Singapore English), two EFL variants (French– and German–English interlanguage) and native British English. To explore the ESL variants, I used the International Corpus of English (ICE) (see Greenbaum 1996). The native data were extracted from the British component of ICE. For the learner Englishes, I used the written subsection of the International Corpus of Learner English (ICLE; Granger et al. 2009) which provides essays produced by advanced learners of English with different mother tongues and who are in their third and fourth year at university. The French and German subsections of ICLE were chosen as representative of learner Englishes (228,081 and 435,000 words, respectively). To ensure comparability across the ICLE and ICE corpora, the ESL and native data were limited to the student writing section of ICE (approximately 120,000 words in total), and samples of five hundred GIVE occurrences were extracted from each subcorpus. No distinction across writers was made on the basis of age or educational background.

All material was extracted and statistically analyzed using the software R (see R Development Core Team 2010). Overall, five hundred instances of GIVE within their grammatical contexts of occurrence were randomly extracted from each of the six corpora and checked manually for their syntactic relevance. Monotransitive constructions and non-verbal occurrences were systematically discarded. Table 1 shows the overall distribution of the two GIVE constructions across the six corpora.

Table 1. Distribution of the two dative constructions of GIVE in the six corpora

COUNTRY       ICE-GB  ICE-HK  ICE-IND  ICE-SIN  ICLE-FR  ICLE-GE  TOTAL
Ditransitive     142     200      137      137       91      129    836
PrepDative        62      96      130       76       79       34    477
TOTAL            204     296      267      213      170      163   1313
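As a reading aid for Table 1, the raw counts can be converted into the share of prepositional datives per corpus. The short Python sketch below does only this arithmetic on the figures reported in the table; it is descriptive and makes no claim about statistical significance. The variable and function names are chosen here for illustration.

```python
# Counts from Table 1: (ditransitive, prepositional dative) per corpus.
COUNTS = {
    "ICE-GB": (142, 62),
    "ICE-HK": (200, 96),
    "ICE-IND": (137, 130),
    "ICE-SIN": (137, 76),
    "ICLE-FR": (91, 79),
    "ICLE-GE": (129, 34),
}

def prep_share(ditransitive, prep_dative):
    """Proportion of prepositional datives among all retained GIVE occurrences."""
    return prep_dative / (ditransitive + prep_dative)

shares = {corpus: prep_share(*counts) for corpus, counts in COUNTS.items()}

# Column totals, which should match the TOTAL column of Table 1.
total_ditr = sum(ditr for ditr, _ in COUNTS.values())
total_prep = sum(prep for _, prep in COUNTS.values())

# List the corpora from the highest to the lowest prepositional dative share.
for corpus, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{corpus:8s} {share:.3f}")
print("overall ", round(total_prep / (total_ditr + total_prep), 3))
```

Proportions of this kind only describe the raw distribution; whether differences between corpora hold once the grammatical context is taken into account is a question for the multifactorial analysis.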

Each match was annotated against twelve grammatical factors, which is a total of thirty-four linguistic factor levels (see Table 2 for a list of all the factors included in the coding process). The annotation scheme is based on Bernaisch et al. (2014) so as to later facilitate comparisons on verb-complementation patterns across English variants. With regard to the factors patanimacy and recanimacy, I adopted a binary encoding scheme (animate versus inanimate) despite the fact that “animacy is, at least cognitively, more of a continuum than a dichotomy” (Schilk et al. 2013: 7). Similarly to Schilk et al. (2013), the relatively small size of the dataset and the large number of different factors motivated the choice of a binary animacy scale over a finer-grained scale such as the one proposed by Zaenen et al. (2004).



Table 2. Overview of the factors used in the annotation of GIVE in ditransitive and prepositional dative constructions

| Factors | Levels | Variable description |
|---|---|---|
| transitivity | ditransitive, prepositional dative | the syntactic pattern of GIVE |
| country | ice-gb, ice-hk, ice-ind, ice-sin, icle-fr, icle-ge | the English variety/corpus subsection from which the GIVE occurrence was extracted |
| voice | active, passive, no voice | - |
| reclength | continuous factor | length of the recipient in words |
| patlength | continuous factor | length of the patient in words |
| recanimacy | animate, inanimate | animacy of the recipient |
| patanimacy | animate, inanimate | animacy of the patient |
| recaccessibility | given, new | whether the recipient is mentioned for the first time or whether it was already mentioned in the preceding ten lines |
| pataccessibility | given, new | whether the patient is mentioned for the first time or whether it was already mentioned in the preceding ten lines |
| recpronominality | np, pronoun | whether the recipient is expressed with a noun phrase or a pronoun |
| patpronominality | np, pronoun | whether the patient is expressed with a noun phrase or a pronoun |
| patsemantics | abstract, concrete, informational | semantics of the patient |

2.2 Statistical evaluation: Hierarchical cluster analysis (HAC) and binary logistic regression

The statistical analysis consists of two main steps. First, the grammatical contexts of the two constructions were analyzed using a cluster analysis approach. In a second step, the uses of the two constructions were modeled using binary logistic regression modeling. The HAC approach is an exploratory method which provides a way to explore the cross-varietal similarities and the differences between ditransitive and prepositional dative uses of GIVE based on a large number of contextual clues. This method allows us to compute behavioral profiles of GIVE constructions. A behavioral profile provides a “comprehensive inventory of elements co-occurring with a word within the confines of a single clause or sentence in actual speech or writing” (Divjak and Gries 2009: 277). As such, it provides, in the current analysis, a form-specific summary of the semantic and morpho-syntactic behavior of ditransitive and prepositional dative GIVE in each sub-corpus. Techniques like HAC are hypothesis-generating. The individual profiles of GIVE occurrences were computed across the data (i.e. ditransitive_ICE-GB, prepdative_ICE-GB, ditransitive_ICE-HK, prepdative_ICE-HK, ditransitive_ICE-IND, prepdative_ICE-IND, ditransitive_ICE-SIN, prepdative_ICE-SIN, ditransitive_ICLE-FR, prepdative_ICLE-FR, ditransitive_ICLE-GE, and prepdative_ICLE-GE) using Gries’ (2009) R script Behavioral Profiles 1.01, in relation to the identified semantic and morpho-syntactic predictors.
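To make the notion of a behavioral profile concrete, the sketch below turns annotated tokens into one vector of within-factor relative frequencies per construction and corpus. It is an illustration only: it is written in Python rather than R, it is not the Behavioral Profiles 1.01 script, and the four tokens and the function name are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical annotated tokens: (corpus, construction, {factor: level}).
tokens = [
    ("ICE-GB", "ditransitive", {"voice": "active", "recpronominality": "pronoun"}),
    ("ICE-GB", "ditransitive", {"voice": "active", "recpronominality": "np"}),
    ("ICE-GB", "prepdative", {"voice": "passive", "recpronominality": "np"}),
    ("ICLE-FR", "prepdative", {"voice": "active", "recpronominality": "np"}),
]

def behavioral_profiles(rows):
    """Map each construction/corpus pair to the relative frequency of each factor level."""
    level_counts = defaultdict(Counter)  # (profile, factor) -> counts of levels
    for corpus, construction, annotation in rows:
        profile = f"{construction}_{corpus}"
        for factor, level in annotation.items():
            level_counts[(profile, factor)][level] += 1
    profiles = defaultdict(dict)
    for (profile, factor), counter in level_counts.items():
        n = sum(counter.values())
        for level, count in counter.items():
            # Relative frequency of this level among all levels of the same factor.
            profiles[profile][f"{factor}={level}"] = count / n
    return dict(profiles)

profiles = behavioral_profiles(tokens)
```

Levels that never occur in a profile are simply absent here; before profiles are compared they would need to be aligned on a common set of factor levels, with missing levels counted as zero.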

The output of a HAC analysis is a dendrogram featuring clusters that exhibit high intra-cluster similarity and low inter-cluster similarity and which are, ultimately, all part of a single cluster, the original data set. In keeping with previous studies (e.g. Divjak and Gries 2006), I chose the Canberra metric as a measure of (dis)similarity and Ward’s rule as an amalgamation strategy. For Divjak and Gries (2006: 37), the advantage of the Canberra metric is that it “handles the comparatively large number of zero occurrences of particular features best”. The cluster analyses were later validated on the basis of a bootstrap resampling scheme carried out with the R function pvclust. This resampling consists of sampling repeatedly and randomly, with replacement, from the entire data sample.
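The Canberra metric sums, over all features, the absolute difference between two values divided by the sum of their absolute values. A minimal sketch follows, assuming that features on which both profiles are zero are simply skipped; the function name and the example vectors are illustrative, and the handling of double zeros in a particular R implementation may differ.

```python
def canberra(x, y):
    """Canberra distance between two equal-length numeric vectors.

    Terms where both values are zero are skipped to avoid dividing 0 by 0.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    total = 0.0
    for a, b in zip(x, y):
        denominator = abs(a) + abs(b)
        if denominator > 0:
            total += abs(a - b) / denominator
    return total

# Two hypothetical profile vectors over the same three factor levels.
profile_a = [0.5, 0.5, 0.0]
profile_b = [0.25, 0.75, 0.0]
distance = canberra(profile_a, profile_b)
```

Because each term is bounded by 1, a feature that is present in one profile and absent in the other contributes the maximum amount regardless of its frequency, which is why the metric is sensitive to zero occurrences.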

In contrast, binary logistic regression is an approach that focuses on the dependent variable (here transitivity) and its relation to individual predictors. In the present study, logistic regression modeling helps identify possible correlations between the predictors and native and non-native speakers’ syntactic choices. The initial regression model included all the independent variables listed in Table 2, as well as the dependent variable transitivity and, crucially, their interactions with the variable country, as illustrated in (3). Those interactions are crucial to the analysis as they will serve to pinpoint which linguistic factor levels influence speakers differently across the learner variants.

(3) transitivity ~ country + voice + reclength + patlength + recanimacy + recaccessibility + pataccessibility + recpronominality + patsemantics + country:voice + country:reclength + country:patlength + country:recanimacy + country:recaccessibility + country:pataccessibility + country:recpronominality + country:patsemantics (see footnote 7)

7. Because of issues of collinearity, patpronominality and patanimacy and their interactions with country were taken out of the statistical analysis.

The final and minimally adequate regression model was identified through stepwise model selection by AIC, using the R function stepAIC() from the MASS package. I used glm() in R to fit a generalized linear model. During the selection process, non-significant terms were removed from the model, starting with statistically non-significant interactions, followed by individual factors that did not take part in significant interactions. In order to assess the reliability of the regression model, I used the bootstrap technique and computed estimates of the model performance. Model validation assesses whether the values predicted by the model are likely to accurately predict responses on new data. The bootstrap validation was performed using the R function validate() in the Design package. The validation process was based on 200 bootstrap runs.
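The logic of a bootstrap validation can be sketched as follows: a model is refitted on each resample, its performance on the resample is compared with its performance on the original data, and the average difference (the optimism) is subtracted from the apparent performance. The example is a simplified illustration under stated assumptions: the data are invented, the "model" is a majority-vote rule on a single predictor rather than a logistic regression, accuracy stands in for the indices a validation routine would report, and none of the names come from the original analysis.

```python
import random

# Toy data, invented for illustration: (recipient is a pronoun?, prepositional dative?).
data = [(True, 0)] * 40 + [(True, 1)] * 5 + [(False, 0)] * 15 + [(False, 1)] * 30

def fit(sample):
    """Stand-in 'model': predicted class per predictor level, by majority vote."""
    model = {}
    for level in (True, False):
        outcomes = [y for x, y in sample if x == level]
        # Fall back to class 0 if a level is absent from the sample.
        model[level] = int(sum(outcomes) * 2 > len(outcomes)) if outcomes else 0
    return model

def accuracy(model, sample):
    return sum(model[x] == y for x, y in sample) / len(sample)

def optimism_corrected_accuracy(sample, runs=200, seed=1):
    rng = random.Random(seed)
    apparent = accuracy(fit(sample), sample)
    optimism = 0.0
    for _ in range(runs):
        boot = [rng.choice(sample) for _ in sample]  # resample with replacement
        boot_model = fit(boot)
        # Optimism: performance on the resample minus performance on the original data.
        optimism += accuracy(boot_model, boot) - accuracy(boot_model, sample)
    return apparent, apparent - optimism / runs

apparent, corrected = optimism_corrected_accuracy(data)
```

A large gap between the apparent and the corrected value would indicate that the model is fitted too closely to the sample at hand.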

3. Results

3.1 Cluster analysis

The HAC analysis yielded the results presented in Figure 1. This output includes Approximately Unbiased (AU) as well as Bootstrap Probability (BP) p-values. AU p-values are computed by multiscale bootstrap resampling, which, according to Suzuki and Shimodaira (2011), provides a better approximation to unbiased p-values compared to BP values. Henceforth, I therefore focus exclusively on AU p-values. AU values are calculated for all clusters in the analysis and serve as an indicator of how strongly individual clusters are supported by the data. Clusters highly supported by the data will tend to have large p-values. Each cluster in the dendrogram is represented by a horizontal line, and the distance between clusters is indicated by the length of vertical lines. Reading the tree plot from bottom to top, forms clustered early will be more similar than forms clustered late.

As expected, the dendrogram clearly distinguishes between the ditransitive and the prepositional dative constructions. Each construction is clustered independently of the other, both sub-clusters being strongly supported by the data with AU p-values of 95% for the ditransitive and 96% for the prepositional dative. More interesting, however, is the combination of sub-clusters within the two main construction clusters. Starting with ditransitive constructions on the right-hand side of the dendrogram, we observe that ditransitive patterns are separated into two sub-clusters. The first sub-cluster contains all world Englishes plus the German–English learner variant; the data support it with an AU p-value of 76%. The second sub-cluster consists of native English and the French–English learner variant, which the data do not support as strongly as the first sub-cluster (AU p-value of 64%). With regard to prepositional dative constructions, we also observe two sub-clusters. The most strongly supported is the {ICLE-FRprepdative{ICE-GBprepdative{ICE-INDprepdative ICE-SINprepdative}}} sub-cluster with an AU p-value of 78%, followed by the {ICE-HKprepdative ICLE-GEprepdative} sub-cluster with an AU p-value of 67%.

An interesting aspect of the above results is that the two learner Englishes do not cluster together for either of the two investigated constructions: the French–English learner variant emerges as more similar to native English and the German–English variant as more similar to world Englishes. With respect to the prepositional dative particularly, German–English interlanguage is most similar to Hong Kong English whereas with ditransitives, it is most similar to Singapore English. With regard to French–English interlanguage, the variant shares more salient similarities with a wider range of world Englishes with prepositional dative constructions than it does with ditransitive constructions. Generally, the HAC results confirm that at least structurally it makes sense to investigate EFL and ESL variants together, under the umbrella of non-native Englishes. In what follows, I turn to the logistic regression analysis to explore the nature of the dissimilarities between the two GIVE constructions across the English variants. I first briefly discuss the final regression model, and then I focus on the individual grammatical features that influence the different types of speakers in their choice between a ditransitive and a prepositional dative GIVE construction.

[Figure 1: cluster dendrogram with AU/BP values (%) for the ditransitive and prepositional dative profiles of the six corpora; vertical axis: Height; distance: Canberra; cluster method: Ward]

Figure 1. Distribution of the two dative constructions of GIVE in the six corpora



3.2 Logistic regression

The final GLM regression model reveals a highly significant correlation between the predictors and speakers’ choice of GIVE constructions (Likelihood ratio = 1038.7, df = 42, p < 10^−190), a correspondingly strong correlation (R² = 0.75) and a very high classification accuracy (89.7%, C = 0.95). Table 3 shows a summary of the (marginally) significant predictors and interaction terms identified by the model including their coefficients, significance levels and confidence intervals.

Table 3. Overview of the final GLM model applied to the six English varieties investigated

| Effects | Estimate/Coefficient | p | 2.5% | 97.5% |
|---|---|---|---|---|
| intercept | 3.36110 | *** | 1.39816 | 5.39488 |
| patlength | −1.85904 | *** | −2.99023 | −1.05927 |
| recpronominality = pronoun | −2.35993 | *** | −2.93781 | −1.80855 |
| pataccessibility = given | −1.69067 | ms | −3.63329 | 0.07759 |
| recaccessibility = given | 2.88971 | ** | 1.29226 | 4.86300 |
| voice = passive | 1.85303 | ms | 0.09987 | 3.83338 |
| country = icle-fr * reclength | 2.00145 | * | 0.46620 | 4.06208 |
| country = ice-hk * recaccessibility = given | −2.94854 | ** | −5.04361 | −1.17850 |
| country = ice-ind * recaccessibility = given | −2.58498 | ** | −4.73298 | −0.73757 |
| country = ice-sin * recaccessibility = given | −2.21067 | * | −4.41743 | −0.28479 |
| country = icle-ge * recaccessibility = given | −2.68958 | * | −4.99545 | −0.62072 |
| country = ice-hk * patlength | 1.18780 | * | 0.34911 | 2.33718 |
| country = ice-ind * patlength | 1.23620 | * | 0.38765 | 2.39011 |
| country = ice-sin * patlength | 0.91127 | ms | −0.03148 | 2.10377 |
| country = icle-ge * patlength | 1.09029 | * | 0.16993 | 2.27417 |
| country = ice-ind * voice = passive | −2.96790 | ** | −5.22324 | −0.89176 |
| country = icle-fr * voice = passive | −2.98605 | * | −6.04237 | −0.18113 |
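The coefficients in Table 3 are on the log-odds scale, so exponentiating them gives odds ratios, and an interaction term is read by adding it to the corresponding main effect. The sketch below works through two cases. It rests on assumptions that the table itself does not state: that the modeled outcome is the prepositional dative, that ICE-GB is the reference level of country, and that factors are treatment-coded. The coefficient values are copied from Table 3; the names are illustrative.

```python
import math

# Coefficients copied from Table 3 (log-odds scale).
coef = {
    "recpronominality=pronoun": -2.35993,
    "recaccessibility=given": 2.88971,
    "ice-hk:recaccessibility=given": -2.94854,
}

def odds_ratio(log_odds):
    """Convert a log-odds coefficient into an odds ratio."""
    return math.exp(log_odds)

# Pronominal recipients: odds of a prepositional dative relative to NP recipients.
or_pronoun = odds_ratio(coef["recpronominality=pronoun"])

# Assuming treatment coding with ICE-GB as reference, the effect of a given
# recipient in ICE-HK is the main effect plus the interaction term.
given_effect_gb = coef["recaccessibility=given"]
given_effect_hk = coef["recaccessibility=given"] + coef["ice-hk:recaccessibility=given"]

print(f"Odds ratio, pronoun vs. NP recipient: {or_pronoun:.3f}")
print(f"Log-odds effect of a given recipient, ICE-GB: {given_effect_gb:.5f}")
print(f"Log-odds effect of a given recipient, ICE-HK: {given_effect_hk:.5f}")
```

Under these assumptions, the sizeable positive effect of a given recipient in the native data is almost entirely cancelled in the Hong Kong data.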

The bootstrap validation reveals some overfitting: the R² decreases by 0.048, indicating that the prediction for the test set is less accurate than for the training set. The apparent Somers’ Dxy is 0.91 and the bias-corrected Dxy is 0.88; the maximum absolute error in predicted probability is estimated to be 0.04. Overall, these results suggest that findings based on the regression model should be viewed as tentative and representing a first step in our effort to study indigenized and learner Englishes in a unified way using a regression approach.
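Somers’ Dxy and the concordance index C are related by Dxy = 2(C − 0.5), so the two Dxy values above can be restated on the scale of the C value reported for the final model. The following short sketch does that conversion; the function name is illustrative.

```python
def c_from_dxy(dxy):
    """Concordance index C from Somers' Dxy, using Dxy = 2 * (C - 0.5)."""
    return 0.5 + dxy / 2

apparent_c = c_from_dxy(0.91)   # from the apparent Dxy
corrected_c = c_from_dxy(0.88)  # from the bias-corrected Dxy

print(f"Apparent C: {apparent_c:.3f}")
print(f"Bias-corrected C: {corrected_c:.3f}")
```

The apparent value matches the reported C of 0.95 up to rounding, and the bias-corrected value is only slightly lower.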

Based on Table 3, some predictors influence speakers’ syntactic choices differently across English variants (e.g. length of the patient and the recipient, accessibility of the recipient and voice) and some do not (e.g. pronominality of the recipient). With all types of speakers, the pronominality of the recipient (i.e. whether the recipient is mentioned through a noun phrase or a pronoun) contributes highly significantly to the speakers’ syntactic choice. Although this finding is not surprising, given that it was already presented in Bresnan et al. (2007) for native English and in Bernaisch et al. (2014) for ESL, it is, however, the first time that it is observed in EFL (at least to my knowledge) and that the pronominality of the recipient emerges as a strong determinant of syntactic choice, independent of speaker type.

The second finding concerns the interactions between the variable country and the four predictors reclength (i.e. length of the recipient), recaccessibility (i.e. recipient accessibility), patlength (i.e. length of the patient) and voice. Broadly, this result indicates that the syntactic choices of native, EFL and ESL speakers vary statistically significantly on the basis of length of the recipient and the patient, accessibility of the recipient and voice. It is interesting to note that although previous studies on native English (Bresnan et al. 2007) and South Asian Englishes (Bernaisch et al. 2014) found semantic predictors to influence speakers’ syntactic choices, such an influence is not observed to distinguish the native English, EFL and ESL variants. Below, I briefly discuss the nature of the effects (main effects and interactions) presented in Table 3.

3.2.1 Monofactorial results

Figure 2 is a graphic representation of the main effects of the predictors recpronominality (upper panel) and pataccessibility (lower panel) on the prepositional dative GIVE construction, all types of speakers considered.

The upper panel in Figure 2 shows the strong correlation between the pronominality of the recipient and the choice of a prepositional dative. Speakers use prepositional dative constructions more frequently when the recipient is a noun phrase. In the lower panel of Figure 2, we see that all speakers prefer prepositional constructions when the patient is mentioned for the first time. Although these results are not new and support Bernaisch et al.’s (2014) findings, they serve to confirm the reliability of the regression method. Furthermore, these results strongly suggest that, in relation to recipient pronominality and patient accessibility, more advanced English learners, proficient ESL speakers and native English speakers share similar psycholinguistic processes. Although follow-up work is necessary to confirm this, the regression results provide a useful starting point for experimental validation at a later stage.

3.2.2 Multifactorial results

The multifactorial results focus on the four interactions involving the variable country (i.e. English variants): country:recaccessibility, country:voice, country:patlength, and country:reclength. Beginning with country:recaccessibility, recall from Table 3 that all non-native English populations apart from French English learners (i.e. all ESL speakers and German English learners) make syntactic choices that differ significantly from the native norm. It is interesting to note that there is some similarity in the way recaccessibility affects non-native speakers’ syntactic choices, regardless of whether those speakers are EFL or ESL users. Overall, non-native speakers use prepositional datives with new recipients more frequently than native speakers do (see Figure 3). Native speakers prefer the prepositional dative construction with given recipients (compared to new recipients): they use ditransitive constructions with new recipients approximately ninety percent of the time, whereas they use prepositional constructions seventy percent of the time with given recipients. In contrast, English learners prefer prepositional constructions with new recipients (up to approximately forty-five percent of the time in the case of German English learners) and ditransitive constructions with previously mentioned recipients (with the exception of Indian writers, whose ditransitive constructions with given recipients are slightly less frequent than those of native speakers).

[Figure 2: two effect plots. Upper panel: the main effect of recpronominality (x-axis: recipient pronominality, pronoun versus np). Lower panel: the main effect of pataccessibility (x-axis: patient accessibility, given versus new). Both panels show the predicted probability of the prepositional dative on a scale from 0.0 to 1.0]

Figure 2. The main effect of recpronominality (upper panel) and pataccessibility (lower panel) across native English, ESL and EFL on the predicted probability of prepositional constructions (versus ditransitive constructions) with all other predictors in the model



With voice, two populations of non-native English speakers (Indian and French) diverge (statistically) significantly from the native norm (see Figure 4). Overall, native speakers have a strong tendency to select prepositional dative constructions with passive structures, which contrasts sharply with their preference to use active and non-finite structures (e.g. he wanted me to give the book to him) in ditransitive structures. The data show that Indian and French English speakers diverge significantly from this pattern by using prepositional dative constructions much more frequently than native speakers with active voice and non-finite structures. Overall, neither Indian nor French English speakers show any strong preference for one construction over the other in passive and active voice.

[Figure 3: five panels, COUNTRY X RECACCESS for ICEGB, ICLEGE, ICESINGAP, ICEINDIA and ICEHK; x-axis: Recaccessibility (New, Given); each panel shows the proportions of PrepDat and Ditr on a scale from 0.0 to 1.0]

Figure 3. The interaction country:recaccessibility



With patlength, three non-native variants (Indian and Hong Kong English and German–English interlanguage) yield patterns significantly different from the native British English (see Figure 5). Based on Figure 5, both native and non-native speakers have in common a preference for longer patients in ditransitive constructions. However, the interaction shows that both types of speakers have different cut-off points (i.e. maximum length of a patient) for a prepositional dative construction. With four-word-long patients, native speakers exclusively choose a ditransitive construction whereas non-native speakers use prepositional dative constructions with patients of up to six words long before they exclusively opt for a ditransitive construction.

With reclength, only the French–English interlanguage differs significantly from the native baseline (see Figure 6). Overall, in both native and French learner English, speakers prefer using longer recipients in prepositional dative constructions. However, what characterizes this interaction is that, compared to native speakers, French English learners disprefer recipients that are longer than two words in ditransitive constructions. Native speakers’ choice of a prepositional construction increases sharply when recipients include six or more words, whereas French English learners exclusively use a prepositional dative construction when recipients include four words or more.

[Figure 4: three panels, COUNTRY X VOICE for ICEGB, ICEINDIA and ICLEFR; x-axis: Voice (Active, No voice, Passive); each panel shows the proportions of PrepDat and Ditr on a scale from 0.0 to 1.0]

Figure 4. The interaction country:voice

[Figure 5: PATLENGTH*COUNTRY effect plot; x-axis: Patlength (in words); y-axis: probability of prepositional dative; one line per COUNTRY]

Figure 5. The interaction country:patlength

4. Discussion

This study explored an innovative way of contrasting native, second and foreign English variants with a view to ultimately reducing the existing paradigm gap in the research of EFL and ESL. The main motivation behind the present work was the recognition that, although the methodologies currently applied in the field of corpus research on English variants are gradually developing and increasing their descriptive power, they have so far not provided a way to study (i) whether/how grammatical contexts constrain speakers’ syntactic choices differently in native English, EFL and ESL, and (ii) whether/how speakers with different instructional backgrounds (EFL versus ESL speakers) develop different syntactic variation patterns in their own English variant and what those distributional differences suggest with respect to the underlying motivation of those differences.

In order to address this methodological gap, the present study draws on state-of-the-art methods currently used in learner corpus research and adopts a multifactorial approach to corpus annotation as well as cluster analysis and binary logistic regression techniques to analyze the data statistically. A main benefit of this type of approach is that it has provided a way to quantitatively study the grammatical contexts of ditransitive and prepositional GIVE constructions at a high level of granularity and in an unprecedented multidimensional way (i.e. based on eleven semantic and morphological factors). In addition, and in sharp contrast with previous EFL-ESL studies, the regression technique has provided a way to reach beyond the linguistic structure of the variants and understand how different types of non-native speakers make (dis)similar syntactic choices and what linguistic factors motivate those different choices.

[Figure 6: RECLENGTH*COUNTRY effect plot; x-axis: Reclength (in words); y-axis: probability of prepositional dative; one line per COUNTRY]

Figure 6. The interaction country:reclength

The present study also allows us to address the currently debated question of whether EFL and ESL represent discrete types of variants or a continuum (Hundt and Mukherjee 2011a). The results of the cluster analysis reveal no clear-cut distinction between learner Englishes and world Englishes and thereby support Gilquin and Granger’s (2011: 56) position that “the distinction between EFL and ESL should be viewed as a continuum”. In that regard, the cluster analysis reveals that while the two learner variants investigated (i.e. French and German learner Englishes) show close similarities with world Englishes (for instance the German English variant was observed to be most similar to Hong Kong English with prepositional constructions and most similar to Singapore English with ditransitive constructions), the two learner Englishes do not cluster together with either of the two constructions investigated. This is an important result as it indicates that, within the EFL–ESL continuum, individual world and learner variants are intermingled rather than grouped together according to ‘type’ and positioned distinctively closer or further away from the native variant. In that respect, the present results differ from Nesselhauf’s (2009) account which finds that world Englishes seem to occupy an intermediate position between learner Englishes and native Englishes.

The multifactorial approach adopted in this study (i.e. the computation of behavioral profiles and the subsequent cluster and logistic regression analyses) has helped further our knowledge and understanding of what distinguishes and what unites learner and world Englishes. It has provided a way to pinpoint the exact grammatical features that not only influence speakers’ choices of one syntactic structure over the other, but also those that influence speakers differently across English variants. The results show that grammatical and processing factors involved in ditransitive and prepositional dative constructions interact differently in native English, EFL and ESL, thus leading to different variant-specific characterizations of those constructions. Specifically, the logistic regression results indicate that the factors contributing most to syntactic variation across native English, ESL and EFL are recipient accessibility, length of recipient and patient and voice.



A particularly interesting aspect of the analysis concerns the specific case of prepositional constructions. To a certain extent, the regression results allow us to profile prototypical prepositional dative constructions in non-native Englishes, akin to Bernaisch et al.’s (2014) “protostructions” in South Asian English variants. These are “abstract combinations of (cross-varietally stable) features with a high cue validity, or preference or predictive power, for a particular syntactic construction, or pattern” (Bernaisch et al. 2014: 14). One way of identifying such protostructions is to use statistical analyses which allow us “to assemble the abstract combination of features that are associated with a particular construction, regardless of whether this combination of features is in fact instantiated in the data” (Bernaisch et al. 2014: 14). With this procedure, analysts can compare constructions in cross-varietally different corpora on the basis of abstract combinations of features and their constructional preferences, as reflected in regression coefficients. The data of my study suggest that EFL and ESL speakers share, to a certain extent, prepositional dative protostructions including four specific cross-varietally stable features: reclength, recaccessibility, patlength and voice (see Table 4).

Table 4. Combination of stable features in protostructions for prepositional dative in EFL and ESL

| | reclength | recaccessibility | patlength | voice |
|---|---|---|---|---|
| Prep dative | 2+ words | given | 6+ words | passive |

Interestingly, the above protostruction supports Nam et al.’s (2013) finding that not all linguistic factors recorded in the data are needed in explaining and modeling sentence construction. My study demonstrates that Nam et al.’s (2013) finding not only applies to native and second language English but is also relevant to EFL. Furthermore, the current work confirms Nam et al.’s (2013: 16) finding that “attributes corresponding to patient, namely patient pronominality and animacy may not be required”.8 Based on the above protostruction, one can hypothesize that in second and foreign language English, prepositional dative constructions serve as default constructions when grammatical contexts are more complex and therefore more cognitively taxing. This view is mainly based on the findings that prepositional constructions in EFL and ESL are systematically found with factor levels that are harder to process such as new recipients, passive voice, longer patients and longer recipients. In other words, greater variation across native and non-native Englishes tends to be observed when factors incur a higher cognitive load on the speaker. Interestingly, this result is in line with Deshors and Gries’ (2014) previously discussed finding that the uses of can (rather than may) by non-native English speakers reflect the cognitive process of resorting to a default term in complex grammatical contexts.

8. In contrast with the current work, Nam et al.’s (2013) study is based on three complementation patterns of GIVE: ditransitive, prepositional dative and monotransitive. As pointed out by an anonymous reviewer, the case of monotransitive complementation patterns in ESL varieties is one that needs consideration given that, as Mukherjee and Hoffman (2006) show, such a pattern represents the most frequent one for GIVE in Indian English. Therefore the question of how to approach monotransitive patterns arises alongside the ditransitive and prepositional patterns. While Nam et al. (2013) investigate the three complementation patterns simultaneously using a multinomial regression technique, it may be argued that, semantically, ditransitive and prepositional dative constructions on the one hand and monotransitive constructions on the other are not equivalent and therefore should not be treated as such in a multinomial regression model.

Overall, the regression results demonstrate the usefulness of adopting methodological approaches compatible with cognitively-inspired theoretical frameworks in order to narrow the paradigm gap. While previous studies such as Laporte (2012: 19) note that “similar developments and cognitive processes are perhaps at play across both EFL and ESL acquisition”, the current work provides a first set of corpus-based evidence of the existence of such processes. In addition, the present study stresses the importance of reaching beyond structural linguistic differences and investigating processing (dis)similarities between EFL and ESL variants so as to integrate the two variants in ways that are psychologically relevant. The current work also helps to shed light on Laporte’s (2012: 18) previously unexplained finding that “despite quantitatively different trends, there are significant qualitative similarities across both types of varieties, which further blurs the distinction between the concepts of EFL and ESL”. By combining quantitative and fine-grained perspectives in a single statistical analysis, the multifactorial approach helps in recognizing that homogeneous patterns across EFL and ESL are those that tend to involve processing routines common to both types of non-native speakers. Ultimately, this finding brings some empirical evidence that processing factors are at play across EFL and ESL and that, from a cognitive perspective, the two fields of study can contribute to one another.

As the first study of its kind to bridge the EFL-ESL paradigm gap using a multifactorial methodological approach, the present work opens up various avenues for follow-up research on (dis)similarities across non-native English variants, despite the study being based on a relatively small data set with some degree of internal variability (see the wide confidence intervals in Table 3). Although, to some extent, this has reduced the degree of reliability of the regression model, the results point to animacy as a factor to further investigate. While Bresnan and Hay (2008) find that the factor influences the syntax of GIVE specifically in New Zealand English, the present analysis finds no implication of animacy in cross-variety variation. It therefore follows that animacy seems to be a strong determinant of syntactic choice in native Englishes but not so much in non-native Englishes. Future studies should further investigate the factor in non-native English variants by analyzing larger data sets that allow analysts to adopt finer-grained (i.e. non-binary) coding schemes.

Now that the current analysis has demonstrated how rewarding regression modeling is when used to study EFL and ESL in a unified way, even more powerful regression-based multifactorial approaches should be considered in order to investigate more closely the notion of “error versus innovations” (that is, forms that differ from those usually used in native contexts but that are acceptable in a different ESL context) (Kachru 1991; see also Gilquin and Granger 2011, Groves 2010 and van Rooy 2011). One such approach is the previously mentioned MuPDAR approach (see Section 1.3 for a brief overview and Gries and Deshors (2014) for a detailed presentation). Statistically, the MuPDAR approach differs in one crucial way from the regression approach adopted in the current work: it involves computing two logistic regressions instead of one. A first regression is fitted to the native data, from which a regression equation is derived. That equation allows analysts to predict native speakers’ linguistic choices. If the fit of that first regression is good, then that regression is applied to the non-native data. The second regression will return, for every choice in the non-native data, a prediction of prepositional dative construction or ditransitive construction. In a nutshell, this approach allows analysts to answer the following question: given all the features of the contextual situation that the non-native speaker is in right now, what would a native speaker use, a prepositional or a ditransitive construction? In cases where native and non-native speakers’ linguistic choices are not in line, analysts will be able to: (i) compute exactly by how much the non-native speakers are off target (e.g. in concordance line 4, the native speaker chose X, the non-native speaker chose Y and the non-native speaker was ‘off target’ by more than 35%), (ii) quantify to what degree EFL speakers are more or less off target compared to ESL speakers and (iii) address the fact that “despite differences in terms of norm, use and acquisitional setting, the distinction between EFL and ESL seems to be one of degree” (Laporte 2012: 19).
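The logic of fitting a regression to the native data and then applying it to the non-native data can be illustrated with a minimal sketch. It is not the implementation presented in Gries and Deshors (2014): the two binary predictors, the toy data, the plain gradient-descent fit and the deviation measure (the distance of the native-based predicted probability from 0.5 whenever the non-native choice differs from the predicted one) are all assumptions made for illustration only, and the further modeling of the deviations themselves is not shown.

```python
import math

def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_prob(w, x):
    """Predicted probability of a prepositional dative (coded 1)."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit a logistic regression by batch gradient descent; w[0] is the intercept."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_prob(w, xi) - yi
            grad[0] += err
            for j, xij in enumerate(xi):
                grad[j + 1] += err * xij
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def deviation(p_prep, observed):
    """0 if the non-native choice matches the native-like prediction,
    otherwise the distance of the predicted probability from 0.5
    (an assumed operationalization of being 'off target')."""
    predicted = 1 if p_prep >= 0.5 else 0
    return 0.0 if predicted == observed else abs(p_prep - 0.5)

# Hypothetical binary predictors: [recipient_is_pronoun, theme_is_long]
# Outcome: 1 = prepositional dative, 0 = ditransitive (invented toy data)
X_native = [[1, 1]] * 4 + [[1, 0]] * 2 + [[0, 1]] * 2 + [[0, 0]] * 4
y_native = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0]

# Step 1: fit the regression on the native data only
w_native = fit_logistic(X_native, y_native)

# Step 2: apply the native model to each (invented) non-native choice
nonnative = [
    ("EFL", [1, 1], 1), ("EFL", [0, 0], 1), ("EFL", [1, 1], 1),
    ("ESL", [1, 1], 0), ("ESL", [0, 0], 0), ("ESL", [0, 0], 1),
]
devs = {"EFL": [], "ESL": []}
for variety, x, choice in nonnative:
    devs[variety].append(deviation(predict_prob(w_native, x), choice))

# Step 3: compare how far off target each variety is on average
mean_dev = {v: sum(d) / len(d) for v, d in devs.items()}
print(mean_dev)
```

In an actual study, the first step would more plausibly rely on an established statistical package and on the full set of annotated predictors. The sketch only shows how per-choice deviations can be aggregated by variety to address point (ii) above.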

As the current analysis provides valuable insights into the grammatical features that influence EFL and ESL speakers’ choices of GIVE constructions, it also opens the door for further research on the dative alternation with verbs other than GIVE, specifically verbs such as BRING and SEND (Gries 2013, p.c.). As demonstrated in Gries and Stefanowitsch (2004), GIVE tends to prefer ditransitives, whereas BRING tends to prefer prepositional datives and SEND tends to alternate freely between the two constructions. A follow-up study involving those three lexical verbs would help us assess to what extent the above preference patterns are relevant to EFL and ESL speakers.



In sum, the present study has contributed to the ongoing collective effort of bringing together the EFL and ESL research areas, and it has shown the potential of using multifactorial approaches to continue to gradually close the existing paradigm gap between EFL and ESL variants. More concretely, this study has led to the identification of new patterns of similarity between native English, ESL and EFL, patterns which can be anticipated to be cognitively motivated. Finally, I hope to have shown the usefulness of adopting cognitive analytical frameworks as suitable background domains to integrate ESL and EFL in meaningful ways, as well as the power of corpus linguistics in closing the EFL-ESL paradigm gap at least a little more.

References

Bernaisch, Tobias, Stefan Th. Gries, and Joybrato Mukherjee. 2014. “The Dative Alternation in South Asian English(es): Modelling Predictors and Predicting Prototypes”. English World-Wide 35: 7–31. <http://www.linguistics.ucsb.edu/faculty/stgries/research/ToApp_TB-STG-JM_DatAltInSAsEngl_EWW.pdf> (accessed October 21, 2013) DOI: 10.1075/eww.35.1.02ber

Biewer, Carolin. 2011. “Modal Auxiliaries in Second Language Varieties: A Learner’s Perspective”. In Joybrato Mukherjee and Marianne Hundt, eds. Exploring Second-Language Varieties of English and Learner Englishes: Bridging the Paradigm Gap. Amsterdam: John Benjamins, 7–33. DOI: 10.1075/scl.44.02bie

Bresnan, Joan, Anna Cueni, Tatiana Nikitina, and R. Harald Baayen. 2007. “Predicting the Dative Alternation”. In Gerlof Bouma, Irene Krämer, and Joost Zwarts, eds. Cognitive Foundations of Interpretation. Amsterdam: Royal Netherlands Academy of Science, 69–94.

Bresnan, Joan, and Jennifer Hay. 2008. “Gradient Grammar: An Effect of Animacy on the Syntax of Give in New Zealand and American English”. Lingua 118: 245–259. DOI: 10.1016/j.lingua.2007.02.007

Collins, Peter. 1995. “The Indirect Object Construction in English: An Informational Approach”. Linguistics 33: 35–49. DOI: 10.1515/ling.1995.33.1.35

Cutler, Richard. 2010. Tree-based Methods for Classification and Regression. NESCent Workshop on Tree-based methods for Classification and regression. <https://www.nescent.org/wg/cart/images/c/ca/Presentation1.pdf> (accessed June 13, 2013)

Deshors, Sandra C., and Stefan Th. Gries. 2014. “A Case for the Multifactorial Assessment of Learner Language: The Uses of ‘May’ and ‘Can’ in French-English Interlanguage”. In Dylan Glynn and Justyna Robinson, eds. Corpus Methods for Semantics: Quantitative Studies in Polysemy and Synonymy. Amsterdam: John Benjamins, 179–204.

Divjak, Dagmar S., and Stefan Th. Gries. 2006. “Ways of Trying in Russian: Clustering Behavioral Profiles”. Corpus Linguistics and Linguistic Theory 2: 23–60. DOI: 10.1515/CLLT.2006.002

Divjak, Dagmar S., and Stefan Th. Gries. 2009. “Corpus-Based Cognitive Semantics: A Contrastive Study of Phasal Verbs in English and Russian”. In Katarzyna Dziwirek and Barbara Lewandowska-Tomaszczyk, eds. Studies in Cognitive Corpus Linguistics. Frankfurt am Main: Peter Lang, 273–296.

Gilquin, Gaëtanelle. 2011. “Corpus Linguistics to Bridge the Gap between World Englishes and Learner Englishes”. In L. Ruiz Miyares and M.R. Álvarez Silva, eds. Comunicación social en el siglo XXI, Vol. II. Santiago de Cuba: Centro de Lingüística Aplicada, 638–642.

Gilquin, Gaëtanelle, and Sylviane Granger. 2011. “From EFL to ESL: Evidence from the International Corpus of Learner English”. In Joybrato Mukherjee and Marianne Hundt, eds. Exploring Second-Language Varieties of English and Learner Englishes: Bridging the Paradigm Gap. Amsterdam: John Benjamins, 55–78. DOI: 10.1075/scl.44.04gra

Götz, Sandra, and Marco Schilk. 2011. “Formulaic Sequences in Spoken ENL, ESL and EFL: Focus on British English, Indian English and Learner English”. In Joybrato Mukherjee and Marianne Hundt, eds. Exploring Second-Language Varieties of English and Learner Englishes: Bridging the Paradigm Gap. Amsterdam: John Benjamins, 79–100. DOI: 10.1075/scl.44.05sch

Granger, Sylviane, Estelle Dagneaux, Fanny Meunier, and Magali Paquot. 2009. International Corpus of Learner English. Handbook and CD-ROM. Version 2. Louvain-la-Neuve: Presses Universitaires de Louvain.

Green, Georgia M. 1974. Semantic and Syntactic Irregularity. Bloomington: Indiana University Press.

Greenbaum, Sidney, ed. 1996. Comparing English Worldwide: The International Corpus of English. Oxford: Clarendon Press.

Gries, Stefan Th. 2003a. Multifactorial Analysis in Corpus Linguistics: A Study of Particle Placement. London: Continuum.

Gries, Stefan Th. 2003b. “Towards a Corpus-Based Identification of Prototypical Instances of Constructions”. Annual Review of Cognitive Linguistics 1: 1–27. DOI: 10.1075/arcl.1.02gri

Gries, Stefan Th. 2009. BehavioralProfiles 1.01. A program for R 2.7.1 and higher.

Gries, Stefan Th, and Allison S. Adelman. 2014. “Subject Realization in Japanese Conversation by Native and Non-Native Speakers: Exemplifying a New Paradigm for Learner Corpus Research”. Yearbook of Corpus Linguistics and Pragmatics 2014: New Empirical and Methodological Paradigms. Cham: Springer, 35–54.

Gries, Stefan Th, and Sandra C. Deshors. 2014. “Using Regressions to Explore Deviations between Corpus Data and a Standard/Target: Two Suggestions”. Corpora 9: 109–136.

Gries, Stefan Th, and Anatol Stefanowitsch. 2004. “Extending Collostructional Analysis: A Corpus-Based Perspective on ‘Alternations’”. International Journal of Corpus Linguistics 9: 97–129. DOI: 10.1075/ijcl.9.1.06gri

Gries, Stefan Th, and Stefanie Wulff. 2013. Differences in Prenominal Adjective Order by Native Speakers and Learners: A Two-Step Regression-Analytic Procedure. Paper presented at the 2013 conference of the American Association for Corpus Linguistics, California State University, San Diego, January 18, 2013.

Gries, Stefan Th, and Stefanie Wulff. 2013. “The Genitive Alternation in Chinese and German ESL Learners: Towards a Multifactorial Notion of Context in Learner Corpus Research”. International Journal of Corpus Linguistics.

Groves, Julie. 2010. “Error or Feature? The Issue of Interlanguage Deviations in Non-Native Varieties of English”. Hong Kong Baptist University Papers in Applied Language Studies 14: 108–129.

Hundt, Marianne, and Joybrato Mukherjee. 2011a. “Discussion Forum: New Englishes and Learner Englishes – Quo Vadis?”. In Joybrato Mukherjee and Marianne Hundt, eds. Exploring Second-Language Varieties of English and Learner Englishes: Bridging the Paradigm Gap. Amsterdam: John Benjamins, 209–217. DOI: 10.1075/scl.44.11muk

Hundt, Marianne, and Joybrato Mukherjee. 2011b. “Introduction: Bridging a Paradigm Gap”. In Joybrato Mukherjee and Marianne Hundt, eds. Exploring Second-Language Varieties of English and Learner Englishes: Bridging the Paradigm Gap. Amsterdam: John Benjamins, 1–7. DOI: 10.1075/scl.44.01muk

Kachru, Braj B., ed. 1982. The Other Tongue: English across Cultures. Urbana and Chicago: University of Illinois Press.

Kachru, Braj B. 1991. “Liberation linguistics and the Quirk concern”. English Today 25: 3–13. DOI: 10.1017/S026607840000523X

Laporte, Samantha. 2012. “Mind the Gap! Bridge between World Englishes and Learner Englishes in the Making”. English Text Construction 5: 265–292. DOI: 10.1075/etc.5.2.05lap

Mukherjee, Joybrato, and Sebastian Hoffmann. 2006. “Describing Verb-Complementational Profiles of New Englishes: A Pilot Study of Indian English”. English World-Wide 27: 147–173. DOI: 10.1075/eww.27.2.03muk

Mukherjee, Joybrato, and Marianne Hundt, eds. 2011. Exploring Second-Language Varieties of English and Learner Englishes: Bridging the Paradigm Gap. Amsterdam: John Benjamins. DOI: 10.1075/scl.44

Nam, Christopher, Sach Mukherjee, Marco Schilk, and Joybrato Mukherjee. 2013. “Statistical analysis of varieties of English”. Journal of the Royal Statistical Society 176: 777–793. DOI: 10.1111/j.1467-985X.2012.01062.x

Nesselhauf, Nadja. 2009. “Co-Selection Phenomena across New Englishes: Parallels (and Differences) to Foreign Learner Varieties”. English World-Wide 30: 1–26. DOI: 10.1075/eww.30.1.02nes

R Development Core Team. 2010. R: A Language and Environment for Statistical Computing. Foundation for Statistical Computing. Vienna, Austria. <http://R-project.org> (accessed January 1, 2010)

Ransom, Elizabeth. 1979. “Definiteness and Animacy Constraints on Passives and Double Object Constructions in English”. Glossa 13: 215–240.

Rohdenburg, Günter. 1996. “Cognitive Complexity and Increased Grammatical Explicitness in English”. Cognitive Linguistics 7: 149–182. DOI: 10.1515/cogl.1996.7.2.149

Schilk, Marco, Joybrato Mukherjee, Christopher Nam, and Sach Mukherjee. 2013. “Complementation of Ditransitive Verbs in South Asian Englishes: A Multifactorial Analysis”. Corpus Linguistics and Linguistic Theory 0: 1–39. DOI: 10.1515/cllt-2013-0003

Schneider, Edgar W. 2003. “The Dynamics of New Englishes: From Identity Construction to Dialect Birth”. Language 79: 233–281. DOI: 10.1353/lan.2003.0136

Sridhar, Kamal K., and S. N. Sridhar. 1986. “Bridging the Paradigm Gap: Second Language Acquisition Theory and Indigenized Varieties of English”. World Englishes 5: 3–14. DOI: 10.1111/j.1467-971X.1986.tb00636.x

Suzuki, Ryota, and Hidetoshi Shimodaira. 2011. Hierarchical Clustering with P-Values via Multiscale Bootstrap Resampling. <http://www.is.titech.ac.jp/~shimo/prog/pvclust/pvclust.pdf> (accessed October 4, 2013).

Szmrecsanyi, Benedikt, and Bernd Kortmann. 2009. “The Morphosyntax of Varieties of English Worldwide: A Quantitative Perspective”. Lingua 119: 1643–1663. DOI: 10.1016/j.lingua.2007.09.016

Szmrecsanyi, Benedikt, and Bernd Kortmann. 2011. “Typological Profiling: Learner Englishes versus L2 Varieties of English”. In Joybrato Mukherjee and Marianne Hundt, eds. Exploring Second-Language Varieties of English and Learner Englishes: Bridging the Paradigm Gap. Amsterdam: John Benjamins, 167–207. DOI: 10.1075/scl.44.09kor

Van Rooy, B. 2011. “A Principled Distinction between Error and Conventionalized Innovation in African Englishes”. In Joybrato Mukherjee and Marianne Hundt, eds. Exploring Second-Language Varieties of English and Learner Englishes: Bridging the Paradigm Gap. Amsterdam: John Benjamins, 189–207. DOI: 10.1075/scl.44.10roo

Zaenen, Annie, Jean Carletta, Gregory Gerretson, Joan Bresnan, Andrew Koontz-Garboden, Tatiana Nikitina, M. Catherine O’Connor, and Tom Wasow. 2004. “Animacy encoding in English: why and how”. Proceedings of the 2004 ACL Workshop on Discourse Annotation. Barcelona: 118–125.

Author’s address

Sandra C. Deshors
Department of Languages and Linguistics
New Mexico State University
MSC 3L, Las Cruces
New Mexico 88003
USA
