Explorations into variation across Slavic: taking a bottom-up approach



Ruprecht von Waldenfels (University of Bern)

Explorations into variation across Slavic: Taking a bottom-up approach*

Abstract

The paper describes three studies concerned with inner-Slavic variation in the use of different functional categories, two of which involve verbal aspect and one of which involves reflexive coding. The leading interest behind this research is (a) to understand convergences and divergences between the Slavic languages from a usage-based perspective and (b) to develop a method for empirically approaching this issue using a parallel corpus. This research focuses on variation in a set of closely related standard language varieties of a single genus, and may be considered intermediate between a dialectological and a typological approach.

1. Introduction

1.1. Aims and approach

This paper presents a corpus-driven comparison of the Slavic standard languages using parallel texts. It is part of a larger project aiming to model inner-Slavic convergences and divergences. The overarching questions are: how can the Slavic languages be divided into meaningful groups not with respect to morphological and syntactic isoglosses, as is traditionally done, but with respect to patterns of use of grammatical and other categories in actual texts? Secondly, how can we account for the resulting grouping in terms of genealogical relatedness, areal contingency, common contact histories or other factors?

These questions are approached starting from concrete grammatical or lexical categories, which are compared with respect to their use in translated text segments of the same text. The general idea is: if two categories in two languages are predominantly used in corresponding translated segments, then they are functionally similar; conversely, if such categories are mostly used in segments which are not translations of each other, then they are deemed functionally not similar. We thus use parallel texts to investigate the cross-linguistic similarity of categories in a bottom-up fashion: we take their distribution in translation as the point of departure, rather than a definition of their functions.

* I would like to thank William A. Kretzschmar, Annemarie Verkerk, Bernhard Wälchli, and Deborah Edwards for valuable comments on an earlier draft of this paper.

The present article presents three studies using this approach. These studies are conducted on the basis of thirteen Slavic and two non-Slavic versions of the same text, namely Mikhail Bulgakov’s Russian classic Master i Margarita. Two of these studies are concerned with cross-Slavic variation of verbal aspect, one is concerned with reflexive marking. With respect to the two aspect studies, the main objective is to test the method and compare its results with the existing literature. With respect to reflexive coding, an initial hypothesis that language contact with German may have had an influence leading to inner-Slavic divergence is formulated and tested on the basis of the parallel data.

The remainder of this introductory section gives an overview of the Slavic languages before discussing the position of this research between dialectological and typological approaches and introducing the parallel corpus ParaSol. Section 2 outlines the methodology in more detail; Section 3 presents the case studies and Section 4 contains a summary and conclusions.

1.2. The Slavic languages: A sketch of dimensions of diversity

Slavic comprises a set of quite closely related languages that have developed from their common ancestor, Proto-Slavic, in the course of the last 1500 years under rather diverse sociohistorical and contact linguistic conditions. The basic genealogical distinction is between East, West, and South Slavic (see Figure 1; Appendix, p. 462). South Slavic is sometimes additionally split into an Eastern (Bulgarian, Macedonian) and a Western Group (Slovenian and the different varieties of Bosnian/Serbian/Croatian). This classic genealogical classification is primarily based on morphological and phonological criteria (see Sussex and Cubberley 2006: 42–59 for a detailed overview).1

Neighboring Slavic languages are usually mutually comprehensible: average speakers of Serbian or Bulgarian will understand standard Macedonian, as will speakers of Slovak with regard to Polish, Czech and perhaps Ukrainian.

1 A refined version is put forth by Mareš (1980) who posits two intersecting divisions into South vs. North and East vs. West Slavic languages, and four resulting major groups (North East, North West, South East, South West).


The Slavic languages differ widely with respect to the influence of other languages on them. All Slavic languages of the West have been in intense contact with German, albeit to differing degrees. The strongest influence of German has surely been on Upper and Lower Sorbian, two languages spoken in two small communities in an area that has shifted in its majority from Slavic to German in the course of the last millennium. German has also had an immense impact on Czech, Slovak and Slovenian, languages spoken in territories that for centuries were part of the Austro-Hungarian Empire. The same is true for the westernmost parts of the Croatian/Bosnian/Serbian continuum. While Polish has also been in close contact with German, the sociohistorical circumstances were quite different in this case, since Poland was never part of a German-dominated state until the partitions at the end of the 18th century.

Polish itself has had a strong influence on Belarusian and Ukrainian in the east of its territory as well as on Russian in the Northeast, where Finno-Ugric and Turkic substrate influence has also played a role.

In the Southeast we find the Slavic Balkan languages Bulgarian and Macedonian as well as the Torlak varieties of Serbian that are members of the Balkan linguistic area together with Romanian, Albanian and Greek varieties.

Religiously, the Slavic-speaking region divides into an Orthodox East and a Catholic West, with very different cultural influences. Latin was the liturgical and prestige language in the West, while Old Church Slavonic fulfilled this function in the East; the border runs right through the dialect continuum where Croatian/Bosnian/Serbian is the standard language. Here and to the South-East, we also find Islamic traditions stemming from the time when this region was part of the Ottoman Empire. Today’s Slavic literary languages differ in the age of their codification (from the Middle Ages to the 20th century), in sociolinguistic situations (from near diglossic situations to far-reaching shifts to a prestige standard) as well as in the size of the respective communities (from fewer than 50 000 to over 200 million). For overviews, see Sussex and Cubberley (2006), chapters 2 and 11.

Notwithstanding their diversity, the Slavic languages are quite similar, due to the relatively short period of divergent development. Much of the diachronic development can be traced on the basis of existing texts. We have a fairly good understanding of Proto-Slavic due to the early codification of Old Church Slavonic at a time when the Slavic dialects were still mutually comprehensible. Due to these facts, the Slavic languages are well suited to microstudies concerning the variation and change of functional categories.


1.3. Comparing Slavic standard languages: Between dialectology and intragenealogic typology

The approach presented in this article is based on parallel translations into Slavic standard varieties. Contrasting languages in this way is situated between linguistic typology, interested in linguistic variation across geographic and genealogical groupings, and dialectology, concerned with variation within a set of closely related varieties.

The Slavic languages are especially amenable to such an intermediate approach because they include a remarkable number of quite similar standard languages. In this respect, the current approach has a lot in common with dialectology. As with dialects, intelligibility between languages gradually diminishes with geographic distance: while neighboring languages are usually mutually comprehensible, languages on opposite fringes of the area such as Slovenian and Russian, or Polish and Bulgarian, are not.

A major difference between dialectology and the approach presented in this paper, however, is the much finer granularity of dialectal variation. Even if one includes questionable cases of small, weakly standardized languages, hardly more than twenty standard varieties2 can be counted across the Slavic realm. This is considerably fewer than the number of dialects or dialect groupings in the same area.

Another major distinction concerns the language varieties themselves and the general differences that exist between dialectal, spoken varieties and codified, written varieties. Due to differences in the history of the standardization3 of the Slavic languages, the relations of standard and vernacular varieties in Slavic are rather diverse and a straightforward mapping from the standard languages to the dialects beneath is not possible. The effects of restricting the study to standard languages are therefore rather complex and a variationist study will have different results depending on whether it is based on standard languages or on vernacular dialects (see Murelli 2011 for a typologically oriented perspective on non-standard variation in Europe).

2 Sussex and Cubberley (2006: 2), for example, list 19 national languages and subnational varieties. See Wingender (forthc.) for an overview of the discussion concerning types of Slavic standard languages.

3 Factors that play a role are, among others, puristic strategies directed against the influence of former roof languages (e.g., of German in relation to Czech or Slovene), strategies aimed at bridging the gap to such roof languages (e.g., to Church Slavonic in relation to vernacular Russian), strategies emphasizing differences to neighboring Slavic Abstand languages (e.g., Macedonian with respect to Bulgarian), as well as an explicit orientation towards particular vernaculars (e.g., Neoštokavian for Bosnian/Serbian/Croatian).


There are two important advantages that standard languages possess for a comparative approach: first, standard languages are more accessible than dialects: they have a less variable norm, native speakers are easier to come by, there is a wealth of rather sophisticated literature that can be tapped into, and there is an abundance of written material that can be analyzed. Secondly, and crucially for the current approach, standard languages are used in translation, and translated texts are a source of rich data concerning similarities and differences across languages, data that is based on speakers’ choices made independently of the researcher’s research agenda.

Like many typological approaches, the current approach abstracts away from problems of the representativity of the data source for the speech community as a whole and consequently speaks of doculects, varieties reconstructable from some kind of documentation, rather than of languages. As in typology, this abstraction is arguably justified by the diversity of the data and the difficulties in procuring it: while it would perhaps be desirable to use a parallel corpus with translations into a greater number of Slavic varieties, such a corpus would simply be extremely costly to produce. As in typological approaches, the current restriction to standard languages is thus in a sense a convenience sample (see Wälchli and Waldenfels 2013 for an assessment of the diversity of the sample used in this study).

If the current approach can be likened to a dialectology of standard varieties, it may also be considered a form of intragenealogic typology (“intragenetic comparison” in Greenberg 1980 as surveyed extensively in Croft 2003: 248–251). Describing this type of research, Kibrik (1998, 2005) and Daniel (2010: 60) stress the point that for some kinds of typological questions, the comparative study of closely related languages can be more fruitful than the study of a large and heterogeneous sample (which is usually used in linguistic typology). This is because in closely related languages, it is much easier to come up with comparable data, and small variations in use can be studied that are important for the understanding of the issues involved. Such investigations are especially fruitful with respect to categories such as tense, aspect or mood, which are non-discrete and allow for intermediate values; microstudies of language family diversity concerning such categories add to the understanding of the dynamics of such categories that are very difficult to assess in larger typological samples (Kibrik 2005: 193). A small and constrained sample can provide insight into the effects of language contact and other factors in language change in general and in the genealogical groupings in particular. These are, broadly speaking, aims which the current approach is also concerned with.

If this approach of contrasting Slavic standard languages is less granular than a dialectological approach, it is at the same time more granular than the typical intragenealogic typological one. A case in point is Verkerk (this volume), who focuses on a family, Indo-European, rather than on a genus such as Slavic. As a consequence and in contrast to both a dialectological and the current intermediate approach, all of her varieties are mutually incomprehensible. The question of which linguistic entities are comparable is therefore more difficult to solve than in the case of the comparison of closely related varieties, be they standard languages or dialects. For example, we will see below that the identification of reflexive elements and verbal aspect across languages is comparatively straightforward in the current sample of Slavic languages.

1.4. A parallel corpus for intragenealogic typology: ParaSol

The case studies reported here are based on parallel texts taken from ParaSol, a Parallel Corpus of Slavic and Other Languages compiled at Bern and Regensburg University. This corpus consists of original and translated texts in almost all Slavic and many other European languages. It is available for querying through a web-based public interface (http://parasol.unibe.ch; see Waldenfels 2006, 2011 for a description).

The texts in the corpus are annotated for lemmata and morphosyntactic information in as far as tools for the languages in question are available. The texts are aligned on a sentence level, that is, each sentence is linked to its translation in the other versions. All procedures, such as sentence boundary recognition and alignment, are done automatically (and therefore involve a certain error rate). For the case studies reported in this paper, only one text was used: Mikhail Bulgakov’s novel Master i Margarita and eleven translations into other Slavic languages as well as two into Modern Greek (aspect studies) and German (reflexive study).

The examination of the variation across translations is used as a proxy for the variation in the languages involved; see Wälchli and Cysouw (2007) on the use of parallel texts in typology and Waldenfels (2012b) on some problems and advantages of using them for linguistic research in general.

Parallel texts have the advantage that they are in many respects semantically comparable across different language versions. When studying highly context-dependent categories such as aspectual or modal categories, this comparability allows one to start with a minimum of theoretic assumptions and move bottom-up in the study of the variation of a category.

When examining the translations of a certain text into many languages, we can never be sure whether a given translation choice is indicative of aspects of the linguistic system we are interested in, or whether it stems, rather, from pragmatic reinterpretations, intralinguistic variation or indeed idiosyncratic peculiarities due to priming by the source language. Using massively parallel texts and a large number of heterogeneous examples enables one to treat these factors as basically resulting in statistical noise. Therefore it is advisable to employ exploratory quantitative methods, as used also in several other papers in the present volume. In the following section, the procedures for such an analysis are introduced.

2. Overview of the methodology

2.1. Variables under investigation

Two of the case studies involve verbal aspect, a category that is entrenched in all Slavic languages. In general, every Slavic verb form is of either perfective or imperfective aspect. While the core functions of aspect relate to temporal and aspectual components of the denoted situation, it can also be used to express pragmatic and other meanings depending on the environment it is used in. Here, we examine aspect use in imperative forms and with negated past events.

The third study involves the use of reflexive morphemes. “Reflexive” is used as a cover term here, because the marker in question is used in other functions beside expressing reflexive situations. In this case, we are thus dealing with a morphological rather than functional category.

2.2. Procedures

The following general procedure applies in principle to any linguistic variable that could be operationalized for the present approach. The procedure consists of the following steps:

1. The set of instances in the corpus that involve the linguistic variable, i.e., the envelope of variation (Labov 2004: 7), is extracted.

   In the case of the two aspect studies below, this is done by querying the Russian original text for imperatives and negated past verb forms, and extracting them along with their translations. Here, all the imperatives or negated past forms represent the envelope of variation for aspect use: in general, any Slavic verb form has a defined value for verbal aspect.

   With other variables, the envelope of variation is much larger. For example, any verb form may involve reflexive marking. Therefore, for the reflexive study, all Russian sentences with a verb form are extracted along with their translations first, and then all those sentences that do not involve a reflexive element in any of the languages are discarded, leaving only those that contain a reflexive in at least one version.

2. These results are filtered to exclude mistakes and irrelevant cases and each occurrence is assigned the appropriate label (i.e., imperfective or perfective in the case of aspect).

   In this step, the exact procedures greatly depend on the variable in question. Problems may result from errors in the automatic morphological analysis that the query depends on: for example, the result set may contain forms erroneously identified as imperative. Since the texts are aligned on a sentence, rather than word level, the extracted segments may contain more than one instantiation of the variable (e.g., more than one imperative in a sentence); these instantiations need to be correctly linked to each other across the text versions.

   Translation effects may need to be assessed in this step, since the conceptualization of a situation may be rather different in different translations (for example, a negated event may be framed as a positive one, e.g., leave now! may be equivalent to don’t stay!; see Waldenfels 2012b for a discussion).

3. The results are transformed into a matrix for further analysis.

   The columns in the resulting matrix relate to instances of the envelope of variation (in the case studies below, imperatives, negated past forms, and verb forms that may or may not be reflexive); the rows relate to the texts in different languages. Each cell thus contains the value for a specific context in a specific translation. Figure 2 contains an illustration.

   Figure 2. Aspect values in corresponding imperatives as a matrix (excerpt). Each cell contains either no value (?), perfective (p) or imperfective (i) aspect, or missing data (-).

4. The result set is evaluated using different quantitative and qualitative approaches.

   In the case studies below, the result sets are visualized in neighbor net graphs generated with the software package SplitsTree (Huson and Bryant 2006). For the visualization in such graphs, the matrices in question are transformed into distance matrices by computing the Hamming distance between the aggregated doculect values. In the computation of Hamming distances, for each pair of rows in Figure 2, the proportion of non-identical values between the rows is computed and interpreted as a distance: if, say, the Russian and the Polish text have different values in 250 of 500 values, the distance is calculated to be 0.5; if they differ in 100 of 500 instances, the distance is much smaller at 0.2.

The resulting distance matrix is transformed to a neighbor net, which shows possible groupings resulting from these distance data and visualizes the distances: the shortest path between two nodes along the net is proportional to their distance in the matrix. An illustrative example concerning a complex net of distances is shown using data from German autobahn distances between cities in Figure 3. For more information on this type of visualization and distance matrix computation, see Nichols and Warnow (2008) and the introduction to this volume.
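The distance computation described in step 4 can be illustrated with a minimal Python sketch. It is not the code used for the studies: the function names, the doculect labels and the toy rows are hypothetical, and skipping positions where either row has no value (?) or missing data (-) is an assumption, since the text does not state how such cells enter the distance.

```python
from itertools import combinations

# Assumption: cells coded "?" (no value) or "-" (missing data) are not compared.
MISSING = {"?", "-"}

def hamming_distance(row_a, row_b):
    """Proportion of comparable positions at which two rows differ."""
    pairs = [(a, b) for a, b in zip(row_a, row_b)
             if a not in MISSING and b not in MISSING]
    if not pairs:
        return float("nan")  # nothing to compare
    return sum(a != b for a, b in pairs) / len(pairs)

def distance_matrix(matrix):
    """Pairwise distances for a dict mapping doculect label -> list of values."""
    names = sorted(matrix)
    dist = {(name, name): 0.0 for name in names}
    for a, b in combinations(names, 2):
        dist[(a, b)] = dist[(b, a)] = hamming_distance(matrix[a], matrix[b])
    return dist

# Hypothetical toy matrix: rows are doculects, columns are imperative contexts.
toy = {
    "RU": list("iippi"),
    "PL": list("ipppi"),
    "CZ": list("pppp-"),
}
dist = distance_matrix(toy)
print(dist[("RU", "PL")])  # 1 of 5 comparable values differ
print(dist[("RU", "CZ")])  # 2 of 4 comparable values differ
```

A table of this kind would then have to be exported in whatever input format the neighbor net software expects; that step is not shown here.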

Figure 3. A neighbor net graph used as a graphical representation of a distance matrix: highway distances between German cities.

Bootstrapping is employed to validate the visualization in the graph. Bootstrapping consists of repeatedly recalculating the distance matrix with randomly selected parts of data withheld and then basing the graph only on those structures which are recurrent, thereby factoring out chance relationships.
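The logic of this validation step can be sketched in Python under several stated assumptions. The sketch resamples matrix columns with replacement, which is the usual form of the bootstrap and leaves a random part of the contexts out of each replicate; whether this matches the exact resampling scheme used in the studies is not stated in the text. As a simple stand-in for recurrent structures in a neighbor net, it records how often each doculect has the same nearest neighbor. All names and the toy data are hypothetical.

```python
import random
from collections import Counter

MISSING = {"?", "-"}  # assumption: such cells are skipped, as they carry no value

def hamming(row_a, row_b):
    """Proportion of comparable positions at which two rows differ."""
    pairs = [(a, b) for a, b in zip(row_a, row_b)
             if a not in MISSING and b not in MISSING]
    return sum(a != b for a, b in pairs) / len(pairs) if pairs else float("inf")

def nearest_neighbor(matrix, name):
    """Label of the doculect closest to `name` (ties: alphabetically first)."""
    others = sorted(n for n in matrix if n != name)
    return min(others, key=lambda n: hamming(matrix[name], matrix[n]))

def bootstrap_support(matrix, replicates=200, seed=0):
    """Share of replicates in which each nearest-neighbor relation recurs."""
    rng = random.Random(seed)
    n_cols = len(next(iter(matrix.values())))
    counts = {name: Counter() for name in matrix}
    for _ in range(replicates):
        # Draw columns with replacement; some contexts drop out of each replicate.
        cols = [rng.randrange(n_cols) for _ in range(n_cols)]
        sample = {name: [row[c] for c in cols] for name, row in matrix.items()}
        for name in sample:
            counts[name][nearest_neighbor(sample, name)] += 1
    return {name: {nb: c / replicates for nb, c in ctr.items()}
            for name, ctr in counts.items()}

# Hypothetical toy data: A and B always agree, C always differs from both.
toy = {"A": list("pppppp"), "B": list("pppppp"), "C": list("iiiiii")}
support = bootstrap_support(toy, replicates=50)
print(support["A"])
```

Relations that recur in nearly all replicates would be treated as robust, while those that appear only occasionally would be attributed to chance.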

Summarizing, the procedure used here involves a bottom-up approach to the comparison of category use across languages. As is the case in general in corpus-driven research, the use of corpora makes it possible to start with a clean slate: rather than departing from an initial hypothesis concerning which contexts are relevant to variation, the assessment of variation is based in principle on all uses of the variable found in the corpus. The resulting graph is an empirically founded visualization of similarities and differences in category use across translations. This visualization is then interpreted in light of the literature and investigated in more detail as necessary.
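Steps 1 to 3 of the procedure, as described for the reflexive study, can be sketched along the following lines. This is an illustration only: the data structure, the labels "r" (reflexive) and "n" (not reflexive), the doculect codes and the function names are all hypothetical, and the manual checking of step 2 is not modeled.

```python
def has_reflexive(context, marked="r"):
    """True if at least one version of the aligned context is reflexive."""
    return any(value == marked for value in context.values())

def build_matrix(contexts, doculects, missing="-"):
    """Rows are doculects, columns are contexts; absent segments become '-'."""
    return {d: [c.get(d, missing) for c in contexts] for d in doculects}

# Hypothetical aligned contexts: one dict per verb form in the source text,
# mapping a doculect code to the label assigned to its translation.
contexts = [
    {"RU": "r", "PL": "r", "DE": "n"},
    {"RU": "n", "PL": "n", "DE": "n"},  # no reflexive in any version: discarded
    {"RU": "n", "PL": "r"},             # no aligned German segment
]

kept = [c for c in contexts if has_reflexive(c)]
matrix = build_matrix(kept, ["RU", "PL", "DE"])
print(matrix)
```

For the aspect studies the filtering function would not be needed, since every imperative or negated past form already belongs to the envelope of variation.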

3. Case studies on aspect

3.1. Variation in Slavic aspect

Although the study of Slavic aspect has been very important in the development of the study of aspect in general, as Dahl (1985: 69) remarks, “if one looks at Slavic aspects from a typological perspective, it becomes clear that the Slavic systems are in fact rather idiosyncratic in many ways”. In Slavic, aspect is an inherent category of verbal lexemes: in principle, any verb is either of perfective or imperfective aspect and this determines the distribution and use of this verb in major ways. While the distinction is present in every Slavic language, the distribution of aspectual forms across the paradigm varies and the details of aspectual marking vary from language to language.4 Aspect is especially interesting for the given approach since it is both frequent and highly context dependent, which calls for an approach that is based on actual textual usage.

In recent years, variation in the use of this category across Slavic languages has increasingly moved into focus (Dickey 2000; Petruchina 2000; Dickey and Kresin 2009; Benacchio 2010). Many questions are still unsolved. However, it is clear that with respect to at least two functional domains there is systematic variation in Slavic. The first is iterativity: use of perfective aspect in iterative contexts is categorically excluded in some languages, such as Russian, Bulgarian or Polish, but possible in others, such as Czech, Slovak or Slovenian. The second domain concerns the so-called “general factual” use of the imperfective aspect found in some, but not all Slavic languages. Here, imperfective past forms are used in contexts where perfective past forms would be expected, resulting in a statement that stresses that something has taken place in general without relevance of the specific circumstances of time and space (“bare fact”).

4 For example, there is no variation across Slavic in the fact that phase verbs govern only imperfective verbs (e.g., Slovenian začel je pisati (IPFV) vs. *začel je napisati (PFV) ‘began to write’). In contrast, only in the East and West Slavic languages is the analytical future restricted to imperfective verbs (e.g., Polish będzie pisał vs. *będzie napisał ‘will write’). Generally, in all Slavic languages some inflectional forms are restricted to verbs of a certain aspect.

Dickey (2000) shows that these contrasts are areally distributed and claims that the variation has a common conceptual background. He distinguishes aspect of the Western type, which he analyzes as being generally tied to the situation type as viewed by the speaker, and aspect of the Eastern type, where the relation to other situations is more prominent. The Eastern Slavic languages Russian, Ukrainian and Belarusian as well as the Eastern South Slavic language Bulgarian form the core of the Eastern group with respect to aspect usage, while the West Slavic languages Czech, Slovak and Western South Slavic Slovenian form the core of the Western group. Polish and Bosnian/Croatian/Serbian are said to be intermediate between the two, with Polish tending more towards the Eastern, and Bosnian/Croatian/Serbian tending more towards the Western group.

The present study concerns aspect in negated past events and in the imperative, two rather non-prototypical contexts of use where variation across Slavic may be expected.

3.2. Aspect in the imperative: Background

Aspect use in the imperative in Russian has received a fair amount of attention in the literature, because aspect can fulfill specific non-actional pragmatic functions in this environment (see Padučeva 1996 for an overview). The most conspicuous case is what Rassudova (1968) calls signal k dejstviju, a ‘signal to start action’ for Russian. Normally, in a telic context, the speaker uses a perfective imperative to convey the fact that the speaker wants the hearer to perform some action. For example, in (1), the speaker asks the interlocutor to turn on the TV, and frames this in the default aspect for this telic situation, namely, the perfective:

(1) Vključite (PFV) televizor, segodnja interesnaja peredača.
    Turn.on.PFV.IMP.2PL TV.ACC today interesting.NOM.SG.F show.SG.NOM.F
    ‘Turn on the TV, there’s something interesting on.’ (Rassudova 1968: 104, my transl.)

However, the speaker may also use the imperfective aspect, even though the situation is basically telic. This then results in a sense of expectedness of the request; the speaker only gives the “signal to start action”. In the next example, the request is expected; the interlocutors have already agreed on a show they plan to see and the speaker simply gives the signal to start with this contextually definite action:

(2) Vključajte (IPFV) televizor, uže sem’ časov.
    Turn.on.IPFV.IMP.2PL TV.ACC already 7 o’clock
    Peredača načinaetsja.
    Show.NOM.SG begins.IPFV.3SG
    ‘Turn on the TV, it’s already 7 o’clock – the show starts.’ (Rassudova 1968: 104, my transl.)

The opposition between the two uses is grounded in interlocutor relations: the speaker knows that the hearer knows that both expect the request to take place and both cooperate in the second example; in the first example, the command to turn on the TV is unexpected and no contextual grounding is available. Wiemer (2008) summarizes the variation in this respect in the following way:

“[…] imperfective verbs are used if the speaker supposes that the action in question is expected […] e.g. because it belongs to the relevant script or because it has already been introduced; perfective verbs, in contrast, are used if the speaker does not suppose this and the situation in question is in this respect considered new or unexpected” (Wiemer 2008, my translation)5

It is easy to see how this leads to politeness effects. In situations of some formality, for example, a host will ask a guest to take off their coat. Although this is a telic event, this request will be given in imperfective aspect – razdevajtes’ (IPFV), not razden’tes’ (PFV) – reflecting the fact that the host may expect the guest to expect this request due to their cooperative construal of the situation: the command is therefore expected and it would be impolite to convey any other impression.

5 “Als Quintessenz darf man ansehen, daß im unnegierten Imperativ ipf. Verben dann gewählt werden, wenn der Sprecher voraussetzt, daß die betreffende Handlung sich bereits von selbst versteht, z. B. weil sie zum Skriptwissen gehört oder weil die Handlung vorher schon einmal erwähnt worden ist, pf. Verben hingegen dann, wenn der Sprecher meint, dies nicht voraussetzen zu können und die jeweilige Situation in diesem Sinne neu bzw. unerwartet ist.” (Wiemer 2008)

However, in a different situation, where this request is in fact unexpected, the same request may be rather impolite or outright rude, due to the fact that the speaker, by using the imperfective form, conveys an interpretation that the request is backed by the situational background. By doing that, the speaker assumes authority over the hearer and threatens their negative face in the sense of Brown and Levinson (1987); see Benacchio (2002) for a politeness perspective on aspect use in the Russian imperative.

Taking the literature on Russian as her starting point, Benacchio (2004, 2005, 2010) uses questionnaire data to examine to what extent this and other aspectual contrasts in the imperative are relevant in other Slavic languages. She finds that the Eastern Slavic languages Russian, Ukrainian and Belarusian form one group, while the other languages fall into transitional cases such as Bulgarian and Polish as well as a core group of Sorbian, Czech, Slovak and Slovene that are the most dissimilar to the Eastern group (Benacchio 2010: 186). In these westernmost languages, the pragmatic opposition of perfective and imperfective aspect demonstrated above does not play a role at all, while in the intermediate languages, a lesser degree of opposition with a different function is noted. Benacchio hypothetically links this contrast inside Slavic to the influence of German on the Western languages (Benacchio 2010: 181). Benacchio’s results are thus largely compatible with Dickey’s (2000) geographic pattern.

Benacchio also expands the perspective to include Modern Greek, a language where aspect is also relevant in the imperative. Based on work with the Slavic questionnaire adapted to Greek data, she finds that Modern Greek largely behaves like the Western Group of Slavic, that is, the use of the imperfective aspect is much more limited than in the Eastern languages.

3.2. Aspect in the imperative in the parallel corpus

I now turn to the present study (which was reported in Waldenfels 2012a in a more detailed but partly preliminary version). Adapting the general procedural steps outlined in 2.2. above, the envelope of variation is defined as the set of all imperatives. Therefore, all imperatives in the Russian original version of the novel are extracted along with their translated segments in Belarusian, Ukrainian, Polish (two translations), Czech, Slovak, Slovene, Croatian, Serbian (two translations), Macedonian, Bulgarian and Modern Greek.


For reasons of economy, a preliminary classification is then done on the basis of those translations where automatic annotation of morphosyntactic information is available. Since the Czech, Slovak, Polish, Russian, Slovenian and Bulgarian translations are tagged and information concerning aspect does not have to be looked up in dictionaries, the imperative forms in these languages are easily annotated with respect to the aspect of the imperatives involved. If all these versions agree in their usage of aspect, the context in question is deemed to be part of a non-variable core of Slavic aspect. Only if there is some variation in these core languages is the aspect of the imperatives in the remaining versions (Belarusian, Ukrainian, Croatian, two Serbian translations, Macedonian) also annotated with the help of standard dictionaries (which is much more time-consuming).
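The preliminary classification can be pictured as a simple agreement check over the tagged versions. The following is only a minimal sketch of that idea: the version labels, the value names "PFV"/"IPFV" and the treatment of missing forms are assumptions made for illustration and are not taken from the actual annotation workflow.

```python
from typing import Dict, Optional

# Hypothetical labels for the six tagged ("core") versions named in the text.
CORE = ("ru", "pl", "cs", "sk", "sl", "bg")

def classify_context(aspects: Dict[str, Optional[str]]) -> str:
    """Classify one imperative context by the aspect used in the core versions.

    `aspects` maps a version label to "PFV", "IPFV" or None (no usable
    imperative form in that version; ignoring such gaps is an assumption).
    Returns "PFV" or "IPFV" if all available core versions agree,
    "variable" if they disagree, and "no data" if no core value exists.
    """
    values = {aspects.get(version) for version in CORE} - {None}
    if not values:
        return "no data"
    if len(values) == 1:
        return next(iter(values))
    return "variable"

# Core aspect values of example (3): Czech, Slovak and Slovene use the perfective.
example_3 = {"ru": "IPFV", "pl": "IPFV", "cs": "PFV",
             "sk": "PFV", "sl": "PFV", "bg": "IPFV"}
uniform = {version: "PFV" for version in CORE}

print(classify_context(example_3))  # variable
print(classify_context(uniform))    # PFV
```

Only contexts classified as "variable" would then be passed on to the more costly dictionary-based annotation of the remaining versions.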

How can we be certain this approach does not miss a significant number of cases where the core languages agree, but the other languages do not? To test this possibility, a random set of 41 supposedly uniform contexts was chosen (22 perfective, 19 imperfective cases). Then, the aspect values of these contexts in the non-core languages were established; of an estimated 150 aspect values6 across these languages, only six values, i.e., at most 4 %, differed from the value established on the basis of the tagged translations. All of these cases concerned isolated, not systematic differences in a variety of contexts. Taking into account the high amount of essentially random variation expected in translation, this is an acceptable error rate.

Altogether, the approach yielded 362 non-negated imperative contexts, 194 (54 %) of which were consistently perfective, and 49 (14 %) which were consistently imperfective. 119 instances (33 %) were not consistent across Slavic. Only the non-consistent examples are used in the subsequent quantitative analysis, since only these contribute to groupings: with respect to the consistent cases, the translations behave alike.

Table 1. Imperative: aspect use across translations.

Total   Perfective   Imperfective   Variation
362     194          49             119

6 The values for six versions (Belarusian, Ukrainian, Croatian, two Serbian translations, Macedonian) were extrapolated; for 41 contexts, this amounts to 246 aspect values. Allowing for missing data and omissions, a number of 150 extrapolated values is therefore a conservative estimate.
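The arithmetic behind the validation sample can be retraced in a few lines. The sketch below only restates the figures given in the text and in footnote 6; the variable names are hypothetical.

```python
contexts = 41            # sampled contexts that are uniform in the core versions
non_core_versions = 6    # Belarusian, Ukrainian, Croatian, Serbian (x2), Macedonian
deviating = 6            # values that differ from the core value
estimated_values = 150   # conservative estimate of available values (footnote 6)

upper_bound = contexts * non_core_versions         # values if nothing were missing
rate_conservative = deviating / estimated_values   # share under the estimate
rate_full = deviating / upper_bound                # share if all values were present

print(upper_bound)                  # 246
print(f"{rate_conservative:.1%}")   # 4.0%
print(f"{rate_full:.1%}")           # 2.4%
```

Depending on how many of the 246 possible values are actually attested, the share of deviating values thus lies between roughly 2.4 % and 4 %.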


One non-consistent example is given below. It involves an instance where the Russian imperfective imperative otvecaj ‘answer!’ is well-grounded in the situational context: it repeats a question just asked and can therefore be understood as an adequate “signal to act”, which generally calls for the use of an imperfective command in Russian (see above Section 3.2.):

(3) a. … ty kogda=libo govoril chto=nibud’ o velikom kesare? Otvecaj (IPFV)!
       You when=INDF say.PST.SG something about great.LOC caesar.LOC Answer.IPFV.IMP.2SG!
       ‘… did you ever say anything about the great Caesar? Answer!’ (Russian)
    b. Vidpovidaj (IPFV)! (Ukrainian)
    c. Adkazvaj (IPFV)! (Belarusian)
    d. Odpowiadaj (IPFV)! (both Polish translations)
    e. Odpovez (PFV)! (Czech)
    f. Odpovedz (PFV)! (Slovak)
    g. Odgovori (PFV)! (Slovene)
    h. Odgovori (PFV)! (Croatian)
    i. Odgovaraj (IPFV)! (both Serbian translations)
    j. Odgovaraj (IPFV)! (Macedonian)
    k. Otgovarjaj (IPFV)! (Bulgarian)
       ‘Answer!’

This pattern is consistent with the expectation based on both Dickey (2000) and Benacchio (2010) that translations into the languages of Dickey’s posited Western aspectual group (i.e., Czech, Slovak and Slovene) use perfective aspect where Dickey’s Eastern aspectual group (East Slavic Russian, Belarusian, Ukrainian and South East Slavic Bulgarian and Macedonian) consistently shows the imperfective. Polish and Bosnian/Croatian/Serbian are identified as languages of a transitional group, and here we find that most of the translations, except the Croatian translation, pattern with the Russian original in using the imperfective in this particular context.

The crucial question is: how typical is this example for the overall pattern of aspect use across Slavic? In order to abstract away from individual instances, the procedures outlined in Section 2.2. above were applied: all non-consistent examples were coded in a matrix of contexts intersected with language versions of the text and the overall differences between these versions were aggregated into a distance matrix. This was visualized in a neighbor net graph, shown in Figure 4. A further graph, shown in Figure 5, is based on 100-fold bootstrapping and a significance level of 97 %, that is, only those splits that are present in 97 of 100 trials with part of the data withheld are shown.
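How a matrix of contexts and versions can be aggregated into pairwise distances may be sketched as follows. The distance used here, the share of contexts in which two versions choose a different aspect, is an assumption for the purpose of illustration, as is the decision to skip contexts with missing values; apart from the first context, which follows example (3), the toy values are invented.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

# version label -> aspect value per context (None = no usable form)
Matrix = Dict[str, List[Optional[str]]]

def distance(a: List[Optional[str]], b: List[Optional[str]]) -> float:
    """Share of contexts in which two versions differ, ignoring missing values."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return float("nan")
    return sum(x != y for x, y in pairs) / len(pairs)

def distance_matrix(m: Matrix) -> Dict[Tuple[str, str], float]:
    """Pairwise distances for all versions, keyed by alphabetically sorted pairs."""
    return {(u, v): distance(m[u], m[v]) for u, v in combinations(sorted(m), 2)}

toy: Matrix = {
    "Russian": ["IPFV", "IPFV", "PFV"],
    "Polish":  ["IPFV", "PFV",  "PFV"],
    "Czech":   ["PFV",  "PFV",  None],
}

d = distance_matrix(toy)
print(d[("Czech", "Russian")])  # 1.0
print(d[("Czech", "Polish")])   # 0.5
```

A matrix of this kind is what a tool such as SplitsTree takes as input; the network construction itself is not reproduced here.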

Recall that the shortest paths between nodes in these graphs are proportional to their distances as calculated on the basis of aspect usage, while the actual placement of the nodes is arbitrary. In Figure 5, the Serbian, Croatian and Macedonian translations thus share the same distances to all the other translations in addition to the distances to each other; the fact that the two Serbian translations (Serbian/1 and Serbian/2) are on different sides of the near horizontal line is therefore irrelevant. With respect to the other languages, Serbian, Croatian and Macedonian behave as a group, but with internal differentiation. Such differentiation is expected in any case, since even texts written in the same language are not grouped identically, as the two Polish translations show. Generally, texts written in languages very closely related to the original show the most congruity on all levels of the text and therefore have the potential to be grouped closer together than two translations of the same text into the same, but more distantly related language.

Figure 4. Neighbor net graph of aspect usage in imperatives in Slavic parallel texts.

Figure 5. Neighbor net graph of aspect usage in the imperative; 100-fold bootstrapping.
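The logic of the bootstrapping step can be illustrated with a much simpler stand-in: contexts are resampled with replacement, and one counts in how many resamples a given grouping survives. The sketch below does not reproduce the split support computed by SplitsTree; it checks only whether version a stays closer to version b than to version c, and the function names, the distance measure and the toy values are assumptions.

```python
import random
from typing import Dict, List

def distance(a: List[str], b: List[str]) -> float:
    """Share of contexts in which two versions use a different aspect."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def grouping_support(m: Dict[str, List[str]], a: str, b: str, c: str,
                     trials: int = 100, seed: int = 0) -> float:
    """Share of bootstrap resamples in which a is closer to b than to c."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(m[a])
    hits = 0
    for _ in range(trials):
        # draw n context indices with replacement
        idx = [rng.randrange(n) for _ in range(n)]
        ra, rb, rc = ([m[v][i] for i in idx] for v in (a, b, c))
        if distance(ra, rb) < distance(ra, rc):
            hits += 1
    return hits / trials

# Invented toy values: two versions agree everywhere, the third never does.
toy = {
    "Russian":   ["IPFV", "IPFV", "IPFV", "PFV"],
    "Ukrainian": ["IPFV", "IPFV", "IPFV", "PFV"],
    "Czech":     ["PFV",  "PFV",  "PFV",  "IPFV"],
}

support = grouping_support(toy, "Russian", "Ukrainian", "Czech")
print(support)  # 1.0
```

Under the threshold mentioned in the text, a grouping would be retained only if its support reached 0.97, i.e., if it was present in at least 97 of 100 trials.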

The graphs show a very similar picture to that found in the literature: a divide of the translations essentially into two maximally distant groups: namely the Eastern Slavic languages on the one hand and Slovene, Slovak and Czech on the other. Bosnian/Croatian/Serbian, Macedonian and Bulgarian are intermediate; in accordance with the findings of Benacchio (2010), the Polish version is closer to the Eastern Slavic languages in aspect use in the imperative than to the likewise geographically adjacent Czech and Slovak. The results of the more qualitative questionnaire-based study and the present corpus-driven study thus converge, which can be seen as a confirmation of both approaches and of the validity of their results.

3.2. Enlarging the scope to further languages

Since ParaSol also includes translations of Master i Margarita into Modern Greek, an expansion to this language is possible in a straightforward way, as in Benacchio’s questionnaire-based study. Here, the advantages of the approach come into view quite clearly. For the inclusion of Modern Greek, all non-consistent contexts were also annotated for the aspect of the form used in the Modern Greek translation with the help of a competent linguist.7

These data were then added to the matrix of Slavic aspect usage and visualized using SplitsTree. Figure 6 shows that the results are very similar to those of Benacchio (2010), i.e., the Modern Greek translation is shown to be similar to the translations in the languages of the Western Group.

However, it should be recalled that only those contexts are taken into account where Slavic exhibits variation. Cases where the Slavic varieties involve the same aspect value are ignored since they do not, by definition, contribute anything to the differentiation of the Slavic languages. The expansion of this data basis to other languages such as Modern Greek is only a valid approach if that language conforms to the Slavic pattern in this part of the data, too; otherwise, we are ignoring a substantial part of the evidence that makes it different from Slavic as a whole. Note that this is also a potential problem with questionnaires, where one needs to ensure that the questions capture all areas that are relevant in the languages the questionnaire is applied to.

7 I am indebted to Yannis Kakridis, Bern, for the annotation of most of the Greek data.


In order to ascertain whether Greek uses the same aspect as Slavic where there is no inner-Slavic variation, the random sample used above to investigate the uniformity of the Slavic translations was annotated for the aspect values in the Greek translation. The results are given in Table 2.

Table 2. Greek aspect usage in a random sample of contexts where core Slavic translations use imperfective (19 cases) or perfective (22 cases) aspect.

As Table 2 shows, the Greek translation is clearly not consistent with Slavic where imperfective aspect is concerned.8 In fact, imperfective aspect is used in Greek in only five out of 41 cases, suggesting that in general, perfective aspect in the imperative is much more pervasive in Greek than in Slavic.

In conclusion, this study cannot confirm Benacchio’s (2010) result that Greek is more like Czech and Slovene with regard to aspect usage in the imperative. The likeness of the Western aspect group to Greek in our data is only present as long as one restricts the focus to those contexts where there is cross-Slavic variation; it disappears if one takes into account the whole range of data.

8 According to a chi square test the difference in distribution of perfective and imperfective aspect in the Greek and Slavic sample is highly significant with a probability of p<0.001 for identity.

             Slavic imperfective   Slavic perfective   Total
Greek IPFV   4                     1                   5
Greek PFV    15                    21                  36
Total        19                    22                  41
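The significance test mentioned in footnote 8 can be retraced approximately from the marginal totals of Table 2. The sketch below rests on assumptions not stated in the text: that the test compares the Greek distribution (5 imperfective, 36 perfective) with the Slavic one (19 imperfective, 22 perfective) as if they were two independent samples, and that a Pearson chi-square test without continuity correction was used.

```python
import math
from typing import List, Tuple

def chi_square_2x2(table: List[List[int]]) -> Tuple[float, float]:
    """Pearson chi-square statistic and p-value (1 degree of freedom) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, r in enumerate(row_totals):
        for j, k in enumerate(col_totals):
            expected = r * k / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with one degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Rows: Greek, Slavic; columns: imperfective, perfective (marginals of Table 2).
counts = [[5, 36], [19, 22]]
chi2, p = chi_square_2x2(counts)
print(round(chi2, 2))  # 11.55
print(p < 0.001)       # True
```

Under these assumptions the result agrees with the reported p<0.001; a continuity correction would yield a somewhat larger p-value.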

Figure 6. Neighbor net graph of aspect use in the imperative including a Modern Greek translation; 100-fold bootstrapping.


This case illustrates an important advantage of our approach: using parallel texts, we have at our disposal data from a complex domain that were produced independently of our research question. This has several advantages. First of all, the data can be reused as new hypotheses are developed. Secondly, we do not have to provide for all possible questions and hypotheses during data acquisition. A potential methodological problem of using questionnaires, as with any other approach that involves hypothesis-led primary data acquisition, is that one runs the risk of using skewed data because the wrong questions were asked at an early stage in research. This risk can be avoided using parallel texts.

3.3. Aspect in negated past events

The second case study involves a further study of aspect usage in a different restricted environment, namely with negated past events. This environment was chosen because negation often, but not always, leads to aspect shift in comparison to non-negated utterances. I will restrict myself to a general outline of the issue.

Simplifying the issue, we can say that as a general default across Slavic, telic situations, that is, situations involving an intrinsic boundary such as winning a race, opening a door or kicking a ball, are normally expressed with perfective verb forms. Atelic situations, in contrast, are situations that do not intrinsically involve a boundary, such as running, working, or loving. Such situations are generally prone to be expressed with imperfective verb forms in Slavic.

Using this pattern as a starting point, it is not immediately clear which of the two classes a negated event belongs to. For example, the situation of closing a door is clearly telic, i.e., it involves some activity that results in a change-of-state after which the door is closed and would be expressed with a perfective verb form as a default. However, not closing a door is more difficult: on the one hand, it can be seen as a state without an internal change-of-state and may therefore be expected to be expressed by an imperfective form. On the other hand, it may be seen as a telic event insofar as it happens at a specific point in time: he didn’t close the door at that point and it thus stayed open. Here, negation concerns non-action at “a specific juncture” (Forsyth 1970: 104); in a sense, refraining from closing is equivalent to a positive, telic situation with a specific resulting state-of-affairs.

The interaction between predicate negation and aspect is thus complex and variation across Slavic may be expected. Note that cross-linguistically, perfective aspect is not generally less compatible with negation than imperfective aspect (Miestamo and van der Auwera 2011). The question has not, to my knowledge, been studied from a cross-Slavic perspective, with the exception of Dickey and Kresin (2009), who include a preliminary investigation of six Slavic languages as an appendix to a comparative study of aspect and negation in Russian and Czech. This preliminary study is, like the present study, based on parallel translations and confirms the East-West division concerning aspect arrived at in Dickey (2000).

I now turn to the present study. Extracting the data for an investigation of aspect in negated past events is more complex than for the imperative study, since the set of constructions and contexts that negated past verb forms are embedded in is larger and more diverse than imperative constructions. Therefore, conditionals, composite predicates, negative polar questions, existential predications and other contexts that were likely to influence aspect choice were excluded from analysis. All past tense forms were taken into account indiscriminately, even though their inventory varies across Slavic.

Aside from these decisions, the procedure followed was the same as in the imperative study; again, the study took Russian negated past verb forms in Master i Margarita as the point of departure and compared aspect use in these forms as well as in the Slavic translations. Of 755 cases, 352 proved relevant.

About one third (130 cases) involved non-variation in the use of perfective aspect, as in the next example (4). The Slovene translation is added for illustration; here, all versions involve perfective aspect:

(4) a. Nuzno li govorit’, cto ona ne vernulas’? (Russian)
       Necessary Q say.INF COMP she NEG return.PFV.PST.SG.F
    b. Komaj da je vredno omenjati, da se ni vrnila! (Slovene)
       Hardly COMP is necessary remark.INF COMP REFL NEG return.PFV.SG.F
       ‘Needless to say, she never came back!’ (lit. ‘is it necessary to say that she didn’t come back?’)

Table 3. Negative past word forms: aspect use across translations.

Total Perfective Imperfective Variation

352 130 100 122


Use of the perfective may be explained here by the non-occurrence of a telic event at a specific juncture; contrary to what could have also happened, the person did not return.

Less than a third of the attestations (100 cases) involved non-variation in the use of imperfective aspect, as in (5):

(5) a. Nikto ne znal o nasej svjazi […]. (Russian)
       Nobody NEG knew.IPFV.PST.SG.M of our.LOC relationship.LOC
    b. Nikt nie wiedział o naszym związku […]. (Polish)
       Nobody NEG knew.IPFV.PST.SG.M of our.LOC relationship.LOC
       ‘No one knew of our liaison’

Here, an atelic state (knowing) is involved and, as may be expected, imperfective aspect, which is the default case for atelic situations, is used in all of the translations.

Finally, in more than a third of the contexts (122 cases), we find variation. These examples were again visualized using SplitsTree; the result after bootstrapping is shown in Figure 7.

The results are, as was found for the imperative, broadly compatible with the East-West divide in aspect usage found in Dickey (2000). In fact, three groups seem to exist: an Eastern group, including the Eastern Slavic language versions and perhaps the Bulgarian text; an intermediate group, including both Bosnian/Croatian/Serbian and Polish; and a Western group including Czech, Slovak and Slovene. However, this latter group is very diverse and is united less by similarities than by common dissimilarities in regard to the other translations: the distance between, say, Slovenian and Czech is as large as the distance between Slovenian and Belarusian. Dickey and Kresin’s (2009: 168) tentative finding that Polish is less similar to Russian in this environment than in others is supported here. As a next step, an expansion of the data basis and more qualitative work is necessary, especially involving a careful examination of individual contexts.

Figure 7. Neighbor net graph of aspect use in negated past events after 100-fold bootstrapping.

3.4. Reflexive marking

3.4.1. Background

The third case study concerns the use of reflexive marking. Reflexive or middle marking is found in the Slavic languages in both lexical and grammatical functions. Besides being used to signal diathesis categories such as reflexive, reciprocal, passive or impersonal, it is a widespread derivational or, more generally, lexical morpheme; in fact, this is clearly the most frequent function in terms of text token frequency. However, as Kemmer (1993) argues, use of the reflexive morpheme in lexical function is not arbitrary but generally subject to cross-linguistic patterns: there are natural classes of middle marked situations. Along the lines of these “middle marked situations”, there is cross-linguistic variation in the extent to which languages employ middle marking. In a related proposal, Nichols, Peterson, and Barnes (2004) distinguish lexically dominantly transitivizing and detransitivizing languages, with reflexivization or middle marking being a detransitivizing strategy.

Since German uses reflexive marking for detransitive situations less often than Russian, it seems a worthwhile question to ask how closely the Slavic languages match each other in the use of the reflexive marker and whether the different degrees of German influence on the Slavic languages are reflected in this domain. Therefore the German, rather than the Greek, translation was added, in order to examine possible effects of language contact. In addition to the approach relying on manual coding, a more automatic quantitative extraction method was employed for this case.

All Slavic languages possess a set of reflexive morphemes of both light (se, sa, so, si, -sja) and heavy (sobie, sebe, etc.) type. In the East Slavic languages, the light morphemes are postfixes attached to the end of the word form; in all other languages, they are (partly second position) clitics. The heavy forms are never clitics. In Polish and the East Slavic languages Russian, Ukrainian and Belarusian, only accusative light morphemes are found; in the other languages, we also have dative reflexives. Note that I speak of these markers as reflexive solely for purposes of convenience; reflexive marking is but one of their functions, others including detransitivizing derivation, passive, impersonal and various other diathesis functions (for a recent overview, see Fehrmann, Junghanns, and Lenertová 2010).

The reflexive light morpheme (se, -sja, etc.) is systematically used to derive intransitive from transitive lexemes in Slavic, e.g., Russian slomat’ ‘break (trans.)’ vs. slomat’sja ‘break (intrans.)’. In general, because of the pervasiveness of this type, the Slavic languages are considered to be intransitivizing languages that derive intransitive verbs on the basis of transitive verbs rather than vice versa, according to the typology put forward by Nichols, Peterson, and Barnes (2004). Aside from this derivational use, many verbs are reflexiva tantum, e.g., Russian reflexive ulybat’sja ‘to smile’, which does not have a counterpart in a non-reflexive verb *ulybat’. In fact, according to a count by Kalasnikova and Say (2006), the group of reflexiva tantum forms the majority of reflexive usages in Russian.

It has to be noted that the amount of variation in the translation of reflexives is much greater than in the case of verbal aspect; this is, of course, tied to the fact that aspect is much more integrated into the grammatical system while the reflexive marker is, most often, a lexically conditioned morpheme. As such, the choice between lexical items of one or the other class is much more variable than the choice of a grammatical category, which may be chosen in a particular context irrespective of lexical filling (e.g., compare the use of the grammatical category tense against the lexical category of particle verbs).

The idea that the distribution of reflexive or middle marking essentially follows semantic criteria means that in some cases we may expect the translations to agree on using reflexive verbs but to differ in the actual choice of lexical item. This finds some support in the data, viz. (6) involving the Russian reflexive verb posatnut’sja ‘to sway’:

(6) a. Arestovannyj posatnul-sja, no sovladal s soboju […] i otvetil chriplo: (Russian)
       Arrested.NOM.SG sway.PST.SG-REFL but got.hold.PST.SG with REFL.INS and answer.PST.SG hoarsely
    b. Aresztowany zachwiał się, ale przemógł słabość […] i ochryple powiedział: (Polish)
       Arrested.NOM.SG sway.PST.3SG REFL but overcame weakness and hoarsely say.PST.3SG
    c. Der Gefangene wankte, doch er riß sich zusammen […] und antwortete heiser: (German)
       The prisoner sway.PST.3SG, but he pull.PST.3SG REFL together and answer.PST.3SG hoarsely
    d. Vezen zavrávoral, ale pak se s námahou ovládl […] a odpovedel chraplave: (Czech)
       Prisoner sway.PST.3SG but again REFL with might.INS control.PST.3SG and answer.PST.3SG hoarsely
       ‘The arrested man swayed, but got hold of himself [, his colour returned, he caught his breath] and answered hoarsely’

Example (6) involves a reflexive verb in all versions except the Czech, the Croatian and one of the two Serbian translations. However, this is not due to an inherited reflexive verb; in almost all of these cases etymologically different verbs were chosen. The following reflexive forms are involved: Belarusian pachisnuusja, Ukrainian chytnuvsja, Polish zachwiał się (both translations), Slovak zatackal sa, Slovene zamajal se, Serbian zatetura se, Macedonian zanisa se, Bulgarian oljulja se. The non-reflexive cases involve Czech zavrávoral, Croatian je zateturao, Serbian posrnu (note that the verb zateturati may be used in Bosnian/Croatian/Serbian either with or without the reflexive element to denote the meaning in question).

Cases like these suggest that the reflexive morpheme is semantically conditioned, that is, they may reflect a general propensity of the Slavic languages to mark certain situations as middle situations in the sense of Kemmer (1993), a propensity not shared to the same extent by German. Note that the German translation, in accordance with expectation, indeed does not involve a reflexive verb; neither do three of the twelve Slavic versions. This may or may not reflect the influence of German on the make-up of the lexicon of these Slavic languages; based on this example alone, this is difficult to judge.

However, the hypothetical propensity to mark middle situations is evidently not the only factor driving the use of a reflexive vs. a non-reflexive verb in these translations. Example (7) illustrates a case where this choice between reflexive and non-reflexive verb is at the same time the choice between a more basic and a semantically more elaborate verb:

(7) a. Kirpic ni s togo ni s sego […] nikomu i nikogda na golovu ne svalitsja. (Russian)
       Brick NEG from this.GEN NEG from that.GEN nobody.DAT and never on head.ACC NEG falls.3SG
    b. Caglina ni z tago ni z sjago […] nikomu i nikoli na golau ne upadze. (Belarusian)
       Brick NEG from this.GEN NEG from that.GEN nobody.DAT and never on head.ACC NEG falls.3SG
    c. Cigla se nikada i nikome […] jos nije srusila na glavu tek onako. (Serbian)
       Brick REFL never and nobody.DAT yet NEG.be.3SG fall.PST.SG on head.ACC that easily.
       ‘“No brick,” [the stranger interrupted imposingly,] “will ever fall on anyone’s head just out of the blue.”’

The Russian original employs the reflexive verb svalit’sja ‘fall down’, which refers to something heavy that falls down with some impetus; the verb is derived from the transitive svalit’ ‘throw down’. This word fits very well into the given context: the setting is the Stalinist Moscow of the thirties, where the devil has come to visit and is now explaining to two citizens that nothing happens without a reason, especially not so-called accidents. The fact that the verb used stands in a derivational relationship with a volitional transitive verb seems to emphasize the tension between non-caused, accidental motion and a potential (divine or other) force causing the incident.

In this case, however, only three of the translations follow the Russian example in the choice of a detransitive reflexive marked verb: the Serbian translation adduced above uses srusiti se, which is derived from srusiti ‘throw’, while the Bulgarian translation involves the analogue se stovari. The Ukrainian translation faithfully follows Russian in using the cognate form valit’sja. All other translations choose cognates of the Slavic basic word for falling, e.g., Belarusian upadac as illustrated in (7b). Note that in Russian, the option to use the verb upast’ rather than svalit’sja would likewise have been available.9

One can speculate why the basic verb and not a more elaborate and expressive reflexive candidate was chosen in so many of the Slavic translations. However, it seems clear that the choice between such alternatives is much more subject to individual decisions of the translator than the choice of aspect, which is more dependent on a general construal of the situation. It is for this reason, I suggest, that the number of examples necessary to arrive at a clear, interpretable picture seems to be larger than in the study of aspect above. This will be seen shortly.

9 Interestingly, the verb pasti ‘fall’ was also used with a reflexive marker without a difference in meaning in Old Church Slavonic (Yannis Kakridis, p.c.).

A second frequent case in the sample concerns inherited divergence of lexemes across the Slavic languages. This is reflected in the next example. In all but one translation, cognate items are involved, which, however, are consistently non-reflexive in all Western and South Slavic languages, while only Russian, Belarusian and Ukrainian, that is, the Eastern Slavic languages, involve the reflexive marker with this lexeme (only Russian and Polish are glossed to avoid repetition):10

(8) a. My tut ostaem-sja. (Russian)
       We here stay.1PL-REFL
    b. My tut zastaem-sja. (Belarusian)
    c. Mi tut lisajemo-sja. (Ukrainian)
    d. My tutaj zostajemy. (Polish)
       We here stay.1PL
    e. My tu zůstaneme. (Czech)
    f. My ostaneme tu. (Slovak)
    g. Wostanjemy tu. (Upper Sorbian)
    h. Midva ostaneva tu. (Slovenian)
    i. Mi ostajemo ovde. (Croatian)
    j. Nie ostanuvame ovde. (Macedonian)
    k. Nie ostavame tuk. (Bulgarian)
       ‘We stay here.’

I now turn to the study itself. Here, the absolute number of instances of the variable is much higher than in the two aspect studies. Of the 10 278 sentences in the Russian original, 7701 sentences, that is, 75 %, involve a reflexive marker in either the original or at least one Slavic translation. Since we are also interested in possible language contact effects, we also include German as far as this is possible. The number of sentences which involve reflexives in either a Slavic language or German is only slightly higher at n=7853.11

10 This example is taken from the Ostrovskij-Subcorpus of ParaSol for convenience.

11 These numbers were approximated by filtering for the relevant reflexive morphemes; in German, this involves only third person reflexives, since first and second person reflexive pronouns do not take a dedicated form.
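A filter of the kind mentioned in footnote 11 might look roughly as follows. This is a deliberately crude sketch and not the procedure used on ParaSol: the regular expressions, the version labels and the function names are assumptions, the Russian pattern covers only transliterated forms in -sja (not the variant -s’), and surface matching of this sort will produce false positives.

```python
import re
from typing import Dict

# Hypothetical surface patterns for a few versions (light accusative forms only).
REFLEXIVE_PATTERNS = {
    "Russian": re.compile(r"\b\w+-?sja\b", re.IGNORECASE),  # postfix -sja
    "Polish": re.compile(r"\bsi[eę]\b", re.IGNORECASE),     # clitic się
    "Czech": re.compile(r"\bse\b", re.IGNORECASE),          # clitic se
    "German": re.compile(r"\bsich\b", re.IGNORECASE),       # third person only
}

def has_reflexive(sentence: str, version: str) -> bool:
    """True if the sentence contains a form matching the version's pattern."""
    return REFLEXIVE_PATTERNS[version].search(sentence) is not None

def any_reflexive(aligned: Dict[str, str]) -> bool:
    """True if at least one version of an aligned sentence matches."""
    return any(has_reflexive(text, version) for version, text in aligned.items())

# Aligned versions of example (8) and the German clause of example (6c).
example_8 = {
    "Russian": "My tut ostaem-sja.",
    "Polish": "My tutaj zostajemy.",
    "Czech": "My tu zůstaneme.",
}
print(has_reflexive(example_8["Russian"], "Russian"))  # True
print(has_reflexive(example_8["Polish"], "Polish"))    # False
print(any_reflexive(example_8))                        # True
print(has_reflexive("doch er riß sich zusammen", "German"))  # True
```

Counting the aligned sentences for which such a check succeeds gives an approximation of the number of sentences that involve a reflexive marker in at least one version.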


Two different approaches were attempted to assess the overall similarity and dissimilarity of reflexive marking. First, a procedure largely analogous to the one described above for the two aspect studies was performed using data from Master i Margarita. To do this, a random sample of 450 cases was taken from the set of sentences where, either in one of the Slavic versions or in German, a reflexive form was used. This sample was manually annotated to reflect the reflexive marking in equivalent forms across language versions. In principle, any use of a (light) accusative reflexive was taken into account, that is, the distinction between lexical and grammatical functions of the reflexive marking was disregarded. Dative reflexives were not taken into account. Extensive filtering was necessary, excluding, among others, lexically overly complex cases, alignment mistakes, and those isolated cases of reflexive verb usage where free translations made the comparison across translations problematic; cf. the motto of the novel, taken from Goethe’s Faust, where only the Belarusian translation of all Slavic translations employs a reflexive form for reasons of rhyme:

(9) a. Castka sily toj lichoj, Dabro
Part power.GEN this.GEN bad.GEN good.NOM
utvaraec=ca z jakoj. (Belarusian)
create.3SG=REFL with which.INS

b. Ja – cast’ toj sily, cto vecno chocet
I part that.GEN power.GEN COMP always want.3SG
zla i vecno soversaet blago. (Russian)
bad.GEN and always do.3SG good.ACC

c. Ein Teil von jener Kraft, die stets das Böse will
A part of that power which always the evil wants
und stets das Gute schafft. (German)
and always the good creates
‘I am part of that power which eternally wills evil and eternally works good.’

Examples like these, as well as numerous mistakes, were filtered out. All in all, 196 examples remained that were taken into consideration. All of these examples were classified as either reflexive or non-reflexive for each language and the resulting matrix was input into SplitsTree. Note that only 17 examples – less than 10 % – were uniformly reflexive across all versions; variation was therefore more widespread than in the aspect study even in this manually cleaned sample. Since variation was pervasive, the complete data were used during visualization.
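The classification of examples as reflexive or non-reflexive per language version can be illustrated with a small sketch. It assumes that each version is a row of 0/1 values (1 = reflexive, 0 = non-reflexive) over the same aligned examples, and that dissimilarity is measured as the share of examples on which two versions disagree (a normalized Hamming distance). The paper does not state which distance was used, and the toy values below are invented for illustration; they are not corpus data.

```python
from itertools import combinations

# Hypothetical toy annotation: 1 = reflexive, 0 = non-reflexive.
# Rows are language versions, columns are aligned examples.
# The values are invented for illustration only.
annotation = {
    "Russian":    [1, 0, 1, 1, 0, 1],
    "Belarusian": [1, 1, 1, 1, 0, 1],
    "Czech":      [1, 1, 0, 1, 1, 1],
    "Slovak":     [1, 1, 0, 1, 1, 0],
}


def hamming_share(a, b):
    """Share of aligned examples on which two versions differ."""
    if len(a) != len(b):
        raise ValueError("rows must cover the same examples")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(rows):
    """Symmetric dict-of-dicts with all pairwise distances."""
    names = list(rows)
    dist = {n: {m: 0.0 for m in names} for n in names}
    for n, m in combinations(names, 2):
        d = hamming_share(rows[n], rows[m])
        dist[n][m] = dist[m][n] = d
    return dist


def uniform_columns(rows):
    """Indices of examples that are marked identically in every version."""
    columns = zip(*rows.values())
    return [i for i, col in enumerate(columns) if len(set(col)) == 1]


distances = distance_matrix(annotation)
invariant = uniform_columns(annotation)
```

Counting the invariant columns corresponds to the observation that only 17 of the 196 examples showed no variation, which leaves 179 variable examples.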


Table 4. Reflexive: aspect use across translations.

Figure 8 shows the resulting graph. Most groupings are expected – the two Polish translations, all the Bosnian/Croatian/Serbian translations, Czech and Slovak are grouped together. However, Belarusian is not grouped with East Slavic, but nearer to Czech and Slovak. This seems to be due to a single frequent verb cluster: Only Czech, Slovak and Belarusian use reflexive verbs for asking (e.g., Czech ptát se), which was present in six cases; excluding these cases moved Belarusian into the Eastern Slavic group (this is not shown here). No Slavic language is grouped with German.

It is clear that at this point an attempt to interpret the graph suffers from the problem that we do not know to what extent it reflects variation in the data that should be treated as noise.

Variation found here is much larger than for aspect and the picture arrived at is less systematic. Figure 9 shows the result of a bootstrapping validation of the data: here, most of the structure is removed. Belarusian continues to be grouped with Czech and Slovak; the other groupings are those to be expected from sheer linguistic similarity of the target languages. German is simply equidistant from all Slavic translations.
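The idea behind such a bootstrapping check can be sketched as follows. The sketch resamples the examples (columns of the annotation matrix) with replacement and counts how often two versions remain each other's nearest neighbour. This is a simplified stand-in for the validation of splits performed in SplitsTree, not a reimplementation of it; the function names and the toy data are hypothetical.

```python
import random


def hamming_share(a, b):
    """Share of aligned examples on which two versions differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nearest_neighbour(rows, name):
    """Closest other version; ties are broken alphabetically."""
    others = [m for m in rows if m != name]
    return min(others, key=lambda m: (hamming_share(rows[name], rows[m]), m))


def bootstrap_support(rows, pair, replicates=100, seed=0):
    """Share of resampled matrices in which `pair` are mutual nearest neighbours."""
    rng = random.Random(seed)
    n_cols = len(next(iter(rows.values())))
    a, b = pair
    hits = 0
    for _ in range(replicates):
        # Draw as many examples as in the original sample, with replacement.
        cols = [rng.randrange(n_cols) for _ in range(n_cols)]
        sample = {name: [vals[c] for c in cols] for name, vals in rows.items()}
        if nearest_neighbour(sample, a) == b and nearest_neighbour(sample, b) == a:
            hits += 1
    return hits / replicates


# Invented toy rows (1 = reflexive, 0 = non-reflexive), not corpus data.
toy = {
    "Russian":    [1, 0, 1, 1, 0, 1, 0, 1],
    "Belarusian": [1, 1, 1, 1, 0, 1, 0, 1],
    "Czech":      [1, 1, 0, 1, 1, 1, 0, 0],
    "Slovak":     [1, 1, 0, 1, 1, 0, 0, 0],
}
support = bootstrap_support(toy, ("Czech", "Slovak"), replicates=100)
```

A grouping that rests on only a handful of examples, such as the six verbs of asking mentioned above, can be expected to receive unstable support under this kind of resampling.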

Total    Random sample    After filtering    Variation
7852     450              196                179

Figure 8. Neighbor net graph of reflexive; manual annotation (196 cases).


Since the graph is much less informative than the aspect graphs we have seen above in regard to internal consistency, a second set-up was attempted. In this approach, all segments of the texts where either the original or any of its Slavic translations contains a reflexive element were extracted. Note that German could not be included, as the reflexive use of the first and second person plural cannot be reliably distinguished on the basis of their form alone (i.e., mich, dich, uns may or may not be used with reflexive subject reference).

From the resulting set, all cases with multiple uses of the reflexive element in any of the languages were excluded as it would not have been possible to automatically decide which reflexive elements correspond to which element in the other languages. This condition disfavors large aligned segments, making the correspondence across languages as specific as possible. Altogether, of 7701 instances with reflexives in a Slavic language, 4326 were used. In this way, a large subset of all examples with reflexive elements was extracted, with some confidence that the uses of the reflexive element actually relate to equivalent cases across the texts.
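The filtering condition described above can be sketched as follows. The sketch assumes that every aligned segment has already been reduced to a count of reflexive elements per language version; the segment records, the language codes and the function names are hypothetical, and the way reflexive elements are actually detected in the corpus is not shown.

```python
def usable_segments(segments):
    """Keep segments with a reflexive in some version and at most one per version.

    Each segment maps a language code to the number of reflexive
    elements found in that version of the aligned segment.
    """
    kept = []
    for seg in segments:
        counts = list(seg.values())
        has_reflexive = any(c >= 1 for c in counts)
        unambiguous = all(c <= 1 for c in counts)
        if has_reflexive and unambiguous:
            kept.append(seg)
    return kept


def to_binary_rows(segments, languages):
    """One 0/1 row per language version, one column per kept segment."""
    return {lang: [1 if seg.get(lang, 0) else 0 for seg in segments]
            for lang in languages}


# Invented toy segments, not corpus data.
segments = [
    {"ru": 1, "pl": 1, "cs": 0},  # kept: single reflexive in ru and pl
    {"ru": 2, "pl": 1, "cs": 1},  # dropped: two reflexives in ru
    {"ru": 0, "pl": 0, "cs": 0},  # dropped: no reflexive anywhere
    {"ru": 0, "pl": 1, "cs": 1},  # kept
]
kept = usable_segments(segments)
rows = to_binary_rows(kept, ["ru", "pl", "cs"])
```

In the study, 4326 of 7701 instances, about 56 %, passed this condition.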

While this approach is of course more error-prone than the manual evaluation, the resulting set is much larger. This set of examples was then transformed into a matrix and input into SplitsTree, analogous to the approach described above.
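How such a matrix might be handed to a network program can be sketched as below. The sketch writes a pairwise distance matrix as a NEXUS file with a taxa block and a distances block, a plain-text layout that SplitsTree reads. Whether the study passed distances or the raw 0/1 characters is not stated, so the choice of a distances block is an assumption, as are the function name and the exact FORMAT options, which should be checked against the SplitsTree version in use. Version names must be single tokens without spaces.

```python
def to_nexus_distances(names, dist):
    """Render a symmetric distance dict-of-dicts as NEXUS text."""
    lines = [
        "#NEXUS",
        "",
        "BEGIN taxa;",
        f"  DIMENSIONS ntax={len(names)};",
        "  TAXLABELS " + " ".join(names) + ";",
        "END;",
        "",
        "BEGIN distances;",
        f"  DIMENSIONS ntax={len(names)};",
        "  FORMAT triangle=both diagonal labels;",
        "  MATRIX",
    ]
    for n in names:
        # Full square matrix: one labelled row per version.
        row = " ".join(f"{dist[n][m]:.4f}" for m in names)
        lines.append(f"  {n} {row}")
    lines += ["  ;", "END;"]
    return "\n".join(lines) + "\n"


# Invented toy distances, not corpus data.
names = ["Czech", "Slovak", "Russian"]
dist = {
    "Czech":   {"Czech": 0.0, "Slovak": 0.1, "Russian": 0.4},
    "Slovak":  {"Czech": 0.1, "Slovak": 0.0, "Russian": 0.5},
    "Russian": {"Czech": 0.4, "Slovak": 0.5, "Russian": 0.0},
}
nexus_text = to_nexus_distances(names, dist)
```

The returned string can be written to a file with `open(path, "w", encoding="utf-8")` and opened in the network program.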

Figure 9. Neighbor net graph of reflexive; manual annotation with 100-fold bootstrapping (97 % significance).


The resulting graph, given in Figure 10, is very different from what was found for the two aspect studies above. Rather than displaying two poles, the translations are assembled in a circle that closely resembles the genealogical picture of Slavic. The Eastern Slavic languages (Belarusian, Russian and Ukrainian) and the South Slavic languages (Bulgarian, Macedonian, Serbian and Croatian) cluster most clearly. The South Slavic language Slovenian takes an intermediate position between West Slavic Czech and Slovak and the other South Slavic languages, which closely reflects the actual geographic and genealogical relations. Polish, as can likewise be expected, is positioned between Czech/Slovak and the Eastern Slavic languages.

This rather clear picture is attained despite two major sources of statistical noise in the data – the high amount of variation introduced by the translation process, and automatic extraction, which is naturally more error-ridden than manual extraction. This noise is thus effectively drowned out in the large number of examples and a systematic relationship is found in the data. Bootstrapping (not shown here) does not affect the picture, since the data set is large and varied.

Figure 10. Neighbor net graph resulting from an automatic extraction of 4326 examples of reflexive marking. German not taken into account.

Note that this procedure yields all cases of reflexive marking, i.e. not only lexical reflexives, but also grammatical voice marking (passive or impersonal). However, based on Kalasnikova and Say’s (2006) study on Russian as well as manual inspection of other Slavic languages, we assume that lexical reflexives are so much more frequent than grammatical reflexives that the graph basically reflects only the distribution of lexical reflexives.

The result attained suggests that proximity of the pattern of reflexive verb distribution in translation is, by and large, parallel to the genealogical proximity of the languages. In other words, these data suggest that language contact has not shaped the use of reflexive verbs in a major way, and that differences are rather due to a general drift of the languages away from a common ancestor.

This may be a plausible result, but it is by no means the only interpretation. Perhaps the investigation of lexical material generally tends to yield a clear genealogical picture. It may also be that issues in alignment quality, which are not controlled for in the automatic approach, lead to a disproportionate effect of a genealogical factor. Overall, further research is necessary to investigate in a more detailed fashion the impact that peculiarities of the data set, general properties of the method and characteristics of the investigated variables have on this result.

4. Conclusions

In this paper, three case studies were presented involving a corpus-based approach to inner-Slavic convergence and divergence. In general, it seems this approach can easily be adapted to be used in other areas such as dialectology, typology or register studies.

In the case of the two aspect studies, a limited number of manually extracted examples yielded results that were, by and large, compatible with other, more qualitative studies (Dickey 2000; Dickey and Kresin 2009; Benacchio 2010). This suggests that manual extraction is both feasible and revealing with grammatical functional categories that are obligatory to a large extent. In the case of the imperative study, especially, this approach gave good results because (a) imperatives were most often translated by imperatives, and (b) aspect is an obligatory category of all verbal forms in Slavic. The amount of variability introduced by translation was therefore limited and rather systematic. With negated past events, variability was higher, since translation did not always involve equivalent verbal forms; consequently, more examples needed to be filtered out. However, the approach still yielded internally consistent graphs, and a clear pattern along an East to West spread of the data emerged in both cases. In further research, these data need to be investigated in more detail in order to ascertain what the factors driving this bipartition are: can we find evidence for a contact-induced development of the aspect category, for example, or are we just dealing with random drift leading to divergence in the language family?

The third case study concerned the more lexical category of reflexive marking. In this case, the greater degree of freedom given in the use of lexical elements leads to more variability in translation. This makes it harder to find systematic relations in a limited number of hand-annotated examples. Here, a more automatic procedure involving the extraction of over 4000 examples proved more effective than manual annotation. The picture attained was consistent with the known genealogical relationship of the languages. This could be interpreted in several ways. It may be that language contact, e.g., with German in the case of the West and Southwest Slavic languages, had a rather weak impact on the derivational type of the languages in question; but it may also be that the method favors the genealogical factor in the data for other reasons, such as an inherently higher amount of variation in the use of lexical features or because of undesired effects of an automatic extraction procedure.

The conclusions that can be drawn from these case studies are thus rather preliminary. Many details remain to be understood and more research into the properties of these methods is necessary in order to arrive at a clearer interpretation of their results. However, such a quantitative approach to divergence and convergence in the use of grammatical and lexical categories across Slavic on the basis of parallel texts promises to reveal tendencies and groupings that have shaped the Slavic languages beyond genealogical relatedness found in the phonological and morphological patterns.

References

Benacchio, Rosanna 2002 Konkurencija vidov, vezlivost’ i etiket v russkom imperative [Aspectual Competition, Politeness and Etiquette in the Russian Imperative]. Russian Linguistics 26(2): 149–178.

Benacchio, Rosanna 2004 Glagol’nyj vid v imperative v juznoslavjanskich jazykach [Verbal aspect in the imperative in the South Slavic languages]. In: Sokrovennye smysli. Slovo. Tekst. Kul’tura, 267–275. Moskva: Jazyki slavjanskoj kul’tury.

Benacchio, Rosanna 2005 Glagol’nyj vid v imperative v cesskom i slovackom jazykach [Verbal aspect in the imperative in Czech and Slovak]. In: Jazyk. Licnost’. Tekst, 191–200. Moskva: Jazyki slavjanskoj kul’tury.

Benacchio, Rosanna 2010 Vid i kategorija vezlivosti v slavjanskom imperative. Sravnitel’nyj analiz [Aspect and the category of politeness in the Slavic imperative. A comparative analysis]. München, Berlin: Kubon und Sagner.

Brown, Penelope and Stephen C. Levinson 1987 Politeness. Some Universals in Language Usage. Cambridge: Cambridge University Press.

Cysouw, Michael and Bernhard Wälchli (eds.) 2007 Parallel Texts: Using translational equivalents in linguistic typology. Special Issue of Sprachtypologie und Universalienforschung STUF 60(2).


Croft, William 2003 Typology and Universals. Second edition. Cambridge: Cambridge University Press.

Dahl, Östen 1985 Tense and Aspect Systems. Oxford, New York: Basil Blackwell.

Daniel, Michael 2010 Linguistic typology and the study of language. In: Jae Jung Song (ed.), The Oxford Handbook of Linguistic Typology, 43–68. Oxford: Oxford University Press.

Dickey, Stephen M. 2000 Parameters of Slavic Aspect. A Cognitive Approach. Stanford: CSLI Publications.

Dickey, Stephen M. and Susan C. Kresin 2009 Verbal aspect and negation in Russian and Czech. Russian Linguistics 33(2): 121–176.

Fehrmann, Dorothee, Uwe Junghanns and Denisa Lenertová 2010 Two reflexive markers in Slavic. Russian Linguistics 34: 203–238.

Forsyth, John 1970 A Grammar of Aspect: Usage and Meaning in the Russian Verb. Cambridge: Cambridge University Press.

Huson, Daniel H. and David Bryant 2006 Application of phylogenetic networks in evolutionary studies. Molecular Biology and Evolution 23: 254–267.

Kalasnikova, Ksenja V. and Sergej S. Say 2006 Sistemnye otnosenija mezdu klassami russkich refleksivnych glagolov v svjazi s ich castotnymi charakteristikami (po dannym korpusnogo issledovanija) [Systematic relations between classes of Russian reflexive verbs and their frequencies (on the basis of a corpus study)]. In: Viktor S. Chrakovskij, Sergej Ju. Dmitrenko, and Natalja M. Zaika (eds.), Problemy tipologii i obscej lingvistiki [Problems of typology and general linguistics], 56–64. St. Petersburg: Nestor – Istorija.

Kemmer, Suzanne 1993 The Middle Voice. Amsterdam, Philadelphia: John Benjamins.

Kibrik, Aleksandr E. 1998 Does intragenetic typology make sense? In: Winfried Boeder, Christoph Schroeder, Karl Heinz Wagner, and Wolfgang Wildgen (eds.), Sprache im Raum und Zeit. In memoriam Johannes Bechert. Band 2, 61–68. Tübingen: Gunter Narr.

Kibrik, Aleksandr E. 2005 Konstanty i peremennye jazyka [Constants and variables of language]. St. Petersburg: Aletejja.

Labov, William 2004 Quantitative reasoning in linguistics. In: Ulrich Ammon, Norbert Dittmar, Klaus J. Mattheier, and Peter Trudgill (eds.), Sociolinguistics / Soziolinguistik: An International Handbook of the Science of Language and Society. 2nd edition 1, 6–22. Berlin, New York: Walter de Gruyter.

Mares, Frantisek 1980 Die Tetrachotomie und doppelte Dichotomie der slavischen Sprachen. Wiener slavistisches Jahrbuch 26: 33–45.

Miestamo, Matti and Johan van der Auwera 2011 Negation and perfective vs. imperfective aspect. In: Jesse Mortelmans, Tanja Mortelmans, and Walter De Mulder (eds.), From Now to Eternity, 65–84. (Cahiers Chronos 22.) Amsterdam, New York: Rodopi.

Murelli, Adriano 2011 Relative Constructions in European Non-Standard Varieties. Berlin/New York: de Gruyter (Empirical Approaches to Language Typology 50.)

Nichols, Johanna, David A. Peterson, and Jonathan Barnes 2004 Transitivizing and detransitivizing languages. Linguistic Typology 8(2): 149–211.

Nichols, Johanna and Tandy Warnow 2008 Tutorial on computational linguistic phylogeny. Language and Linguistics Compass 2(5): 760–820.

Paduceva, Elena 1996 Semanticeskie issledovanija [Semantic investigations]. Moskva: Skola Jazyki russkoj kul’tury.


Petruchina, Elena 2000 Aspektual’nye kategorii glagola v russkom jazyke v sopostavlenii s cesskim, slovackim, pol’skim i bolgarskim jazykami [Aspectual categories of the verb in Russian in contrast to Czech, Slovak, Polish and Bulgarian]. Moskva: Izdatel’stvo Moskovskogo universiteta.

Rassudova, Ol’ga P. 1968 Upotreblenie vidov glagola v russkom jazyke [The use of verbal aspect in Russian]. Moskva: Izdatel’stvo Moskovskogo universiteta.

Sussex, Roland and Paul Cubberley 2006 The Slavic Languages. (Cambridge Language Surveys.) Cambridge: Cambridge University Press.

von Waldenfels, Ruprecht 2006 Compiling a parallel corpus of Slavic languages. Text strategies, tools and the question of lemmatization in alignment. In: Bernhard Brehmer, Vladislava Zdanova, and Rafał Zimny (eds.), Beiträge der Europäischen Slavistischen Linguistik (POLYSLAV) 9, 123–138. München: Sagner.

von Waldenfels, Ruprecht 2011 Recent developments in ParaSol: Breadth for depth and XSLT based web concordancing with CWB. In: Daniela Majchráková and Radovan Garabík (eds.), Natural Language Processing, Multilinguality. Proceedings of Slovko 2011, 156–162. Bratislava: Tribun.

von Waldenfels, Ruprecht 2012a Aspect in the imperative across Slavic – a corpus driven pilot study. In: Atle Grønn and Anna Pazelskaya (eds.), The Russian Verb. Oslo Studies in Language 4: 141–154.

von Waldenfels, Ruprecht 2012b Polish tea is Czech coffee: advantages and pitfalls in using a parallel corpus in linguistic research. In: Andrea Ender, Adrian Leemann, and Bernhard Wälchli (eds.), Methods in Contemporary Linguistics, 262–283. Berlin, Boston: Mouton de Gruyter.

Wälchli, Bernhard and Ruprecht von Waldenfels 2013 Measuring morphosemantic language distance in parallel texts. In: Anju Saxena and Lars Borin (eds.), Approaches to Measuring Linguistic Differences. Berlin, Boston: Mouton de Gruyter.

Wiemer, Björn 2008 Zur innerslavischen Variation bei der Aspektwahl und der Gewichtung ihrer Faktoren. In: Karl Gutschmidt, Ulrike Jekutsch, Sebastian Kempgen, and Ludger Udolph (eds.), Deutsche Beiträge zum 14. Internationalen Slavistenkongress, Ohrid 2008, 383–409. (Die Welt der Slaven. Sammelbände / Sborniki 30.) München: Sagner.

Wingender, Monika forthc. Typen slavischer Standardsprachen. In: Sebastian Kempgen, Peter Kosta, and Tilman Berger (eds.), Die slavischen Sprachen / The Slavic Languages. Ein internationales Handbuch zu ihrer Struktur, ihrer Geschichte und ihrer Erforschung / An International Handbook of their Structure, their History and their Investigation. Band 2 / Volume 2. Berlin, New York: de Gruyter.