Unpacking Slavic Modality (Divjak, Szymor, Lyashevskaya, Ovsjannikova, Clancy, Stanojevic)


Unpacking Slavic Modality

Dagmar Divjak, Nina Szymor, Olga Lyashevskaya,

Masha Ovsjannikova, Steven Clancy,

Mateusz-Milan Stanojević

Accounts of modality

Roots

philosophy: modal logic

pragmatics: speech acts

linguistics: English vs typological (Perkins 1983, Huddleston 1988, Sweetser 1990, Bybee et al. 1994, van der Auwera and Plungian 1998, Palmer 2001, Hengeveld 2004, Nuyts 2006)

Issues

Lack of agreement on

1. number of modal types to distinguish

? 2, 3, 4

2. labels and definitions of each type

Modal types (van der Auwera & Plungian 1998)

modality = “semantic domains that involve possibility and necessity as paradigmatic variants, that is, as constituting a paradigm with two possible choices”.

1. Participant-internal - participant’s internal abilities and needs, e.g. Boris can get by with sleeping five hours a night

2. Participant-external - possibilities and necessities influenced by factors external to the participant, e.g. To get to the station, you can take bus 66

3. Deontic - permissions and obligations imposed on the participant by social/moral/legal norms, e.g. John may leave now

4. Epistemic - a proposition is judged to be uncertain or probable relative to some judgement(s), e.g. John may have arrived

Interrater (dis)agreement

modality_type (van der Auwera & Plungian 1998):

88% for možno (OL-MO)

72% for powinien (NS-DD)

67% for móc (NS-AS)

63% for musieć (NS-AS)

63% for wolno (NS-DD)

60% for można (NS-AS)
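
The figures above read as raw agreement between pairs of annotators. A minimal sketch of how such a percentage can be computed, with Cohen's kappa added as a chance-corrected companion measure. The two label sequences are invented for illustration and are not data from the study; the slides do not say whether kappa was calculated.

```python
from collections import Counter

def percent_agreement(a, b):
    """Share of items on which two annotators chose the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: product of the two annotators' label proportions.
    p_expected = sum(counts_a[l] * counts_b[l] for l in set(a) | set(b)) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical annotations of eight sentences for modality type.
rater_1 = ["deontic", "external", "external", "epistemic",
           "internal", "external", "deontic", "external"]
rater_2 = ["deontic", "external", "deontic", "epistemic",
           "external", "external", "deontic", "external"]

agreement = percent_agreement(rater_1, rater_2)  # 6 of 8 items match
kappa = cohens_kappa(rater_1, rater_2)
print(f"agreement = {agreement:.0%}, kappa = {kappa:.2f}")
```

Raw agreement is easy to read but depends on how skewed the label distribution is, which is why a chance-corrected measure is often reported next to it.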

Can the corpus help?

Can empirical data and quantitative methods, incl. visualization tools, provide an objective viewpoint on previous intuitive analyses of modal types and functions?

Classifying language data

Linguistically adequate

Psychologically realistic

Toward a usage-based classification

? (how) do modality types correlate with usage

→ track behavior of words in modal contexts

→ inherently multivariate nature of language

behavior profiles: morphology, syntax, semantics

Modal functions

NECESSITY - the modal refers to the state or fact of being required or necessary, e.g.: To switch the TV on, you have to press this button. [it’s necessary if you want to watch the TV, but you’re not obliged to do it].

POSSIBILITY - the modal refers to the state or fact of being able to occur/exist, being achievable, being do-able, e.g. You can lose weight by following these steps.

PERMISSION - the modal refers to the action of officially allowing someone to do a particular thing; consent or authorization, e.g. You may come in.

PROBABILITY - the modal refers to the quality or state of being probable; the extent to which something is likely to happen or be the case, e.g. He must have left because his train leaves in 15 minutes.

cf. also PROHIBITION, OBLIGATION, ABILITY, IMPOSSIBILITY

Aims

• quantify the intuitive clarity of the proposed classifications
  o determine how different classifications of modality correlate with aspects of usage

• across a number of Slavic languages
  o Pl, Rus, Cz (Cr)

Variables [13 variables; 44 levels]

Form [7]: polarity (2), aspect of the infinitive (2), ellipsis (2), voice (2), subject case (2), subject presence (2), modal word (4-7*)

Meaning [6]: type (4), function (4), source (2), subject semantics (4), infinitive semantics (9), SoA applicability (2)
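
The variable and level counts on this slide can be checked with simple arithmetic. A small sketch, assuming the numbers in parentheses are the number of levels per variable and that the modal word has between 4 and 7 levels depending on the language; the dictionary names are illustrative only.

```python
# Levels per variable as listed on the slide.
FORM = {
    "polarity": 2,
    "aspect of the infinitive": 2,
    "ellipsis": 2,
    "voice": 2,
    "subject case": 2,
    "subject presence": 2,
}
MODAL_WORD_LEVELS = (4, 7)  # language-dependent: 4 to 7 modal words

MEANING = {
    "type": 4,
    "function": 4,
    "source": 2,
    "subject semantics": 4,
    "infinitive semantics": 9,
    "SoA applicability": 2,
}

# Six fixed formal variables + the modal word + six meaning variables.
n_variables = len(FORM) + 1 + len(MEANING)

fixed_levels = sum(FORM.values()) + sum(MEANING.values())
min_levels = fixed_levels + MODAL_WORD_LEVELS[0]
max_levels = fixed_levels + MODAL_WORD_LEVELS[1]

print(n_variables, min_levels, max_levels)
```

On this reading the total of 44 levels corresponds to a language with seven modal words; with four modal words the total is 41.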

Quantitative corpus research tools

• multiple correspondence analysis
  o visual exploration + variance explained
  o which labels tend to cluster together
  o libraries {ca} and {rgl} in R

• polytomous logistic regression
  o which combination of variables is the best predictor
  o library {polytomous} in R
  o anova: whether the model with an added variable is significantly better than the one without

Type and function learning

• four ʻfacultiesʼ (Pl, Rus, Cz, Cr)

• 7 formal + 4 meaningful ʻteachersʼ

• each teacher teaches two or more classes of students (labels, values of variables)

• the regression model ascribes the number of teaching hours (weight)

• after training, each student votes for her class
  – {deontic, external, internal, epistemic}
  – {necessity, permission, possibility, probability}
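
The teaching metaphor can be read as a weighted vote: every observed variable level contributes a weight to each candidate class, and the class with the highest total wins. A minimal sketch of that prediction step, assuming a generic multinomial (softmax) formulation. The weights are invented for illustration, and the {polytomous} library may combine its per-class models differently.

```python
import math

CLASSES = ["necessity", "permission", "possibility", "probability"]

# Hypothetical weights: (variable, level) -> contribution to each class.
WEIGHTS = {
    ("polarity", "negative"): {
        "necessity": 0.2, "permission": 1.1,
        "possibility": -0.3, "probability": -0.8,
    },
    ("aspect", "perfective"): {
        "necessity": 0.5, "permission": 0.4,
        "possibility": 0.1, "probability": -0.2,
    },
}

def class_probabilities(observation):
    """Sum the weights of the observed levels and normalise with softmax."""
    scores = {c: 0.0 for c in CLASSES}
    for feature in observation.items():
        for cls, weight in WEIGHTS.get(feature, {}).items():
            scores[cls] += weight
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    norm = sum(exps.values())
    return {c: e / norm for c, e in exps.items()}

def vote(observation):
    """Return the class with the highest estimated probability."""
    probs = class_probabilities(observation)
    return max(probs, key=probs.get)

example = {"polarity": "negative", "aspect": "perfective"}
probs = class_probabilities(example)
winner = vote(example)
print(winner, round(probs[winner], 3))
```

In a fitted model the weights are estimated from the annotated corpus sample rather than set by hand.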

Evaluating the fit of the model

• accuracy, precision, recall, R2 likelihood

• baseline (most frequent category is always chosen)

Rows: observed; columns: predicted

              necessity  permission  possibility  probability  Total
necessity           111          12            3            0    126
permission           13        1048           60            0   1121
possibility           0           5          187            0    192
probability           0           2            4           55     61
Total               124        1067          254           55   1500
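
The measures listed on this slide can be derived from the table directly. A minimal sketch, assuming rows are observed and columns are predicted categories, as the slide's axis labels indicate. R2 likelihood is left out because it needs the model's log-likelihoods, which the table does not contain.

```python
LABELS = ["necessity", "permission", "possibility", "probability"]

# Counts from the slide's example table: rows observed, columns predicted.
CONFUSION = [
    [111, 12, 3, 0],
    [13, 1048, 60, 0],
    [0, 5, 187, 0],
    [0, 2, 4, 55],
]

total = sum(sum(row) for row in CONFUSION)

# Accuracy: correctly predicted items (the diagonal) over all items.
accuracy = sum(CONFUSION[i][i] for i in range(len(LABELS))) / total

# Baseline: always predict the most frequent observed category.
baseline = max(sum(row) for row in CONFUSION) / total

def precision(i):
    """Of all items predicted as class i, the share that really is class i."""
    predicted = sum(row[i] for row in CONFUSION)
    return CONFUSION[i][i] / predicted if predicted else None

def recall(i):
    """Of all items observed as class i, the share predicted as class i."""
    observed = sum(CONFUSION[i])
    return CONFUSION[i][i] / observed if observed else None

print(f"accuracy = {accuracy:.2%}, baseline = {baseline:.2%}")
for i, label in enumerate(LABELS):
    print(f"{label}: precision = {precision(i):.3f}, recall = {recall(i):.3f}")
```

Comparing accuracy with the baseline shows how much the model adds beyond always guessing the largest category.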

Unpacking Polish, Russian, and Czech modality

Results:

Disclaimer:

• a pilot study

• the first round of annotation

• this is a first impression and will need considerably more work

Polish, Russian, Czech modals used in this study

1359 obs.

997 obs.

1500 obs.

[Figure: TYPE by word, by aspect, by polarity, by source. Plot labels: Participant-internal, Participant-external, Epistemic, Deontic]

[Figure: FUNCTION by word, by aspect, by polarity. Plot labels: Possibility, Necessity, Probability, Permission/prohibition]

Regression for type

Prediction accuracy: 79.61%
= slightly better than always picking the most frequent item [71.37%]

R2 likelihood: 28.27%

NB: if deontic is a subtype of external, 90% is covered

            Deontic  External         Internal  Epistemic
Deontic         112        76                0          0
External         69       970 [=71.37%]      0          0
Internal          1        67                0          0
Epistemic         0        64                0          0
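
The NB about treating deontic as a subtype of external can be checked by collapsing the two categories in the table and recomputing accuracy. A minimal sketch using the counts from the slide; it assumes the fourth column is Epistemic, matching the row order, and the helper names are illustrative.

```python
LABELS = ["Deontic", "External", "Internal", "Epistemic"]

# Counts from the slide's table for type (fourth column read as Epistemic).
MATRIX = [
    [112, 76, 0, 0],
    [69, 970, 0, 0],
    [1, 67, 0, 0],
    [0, 64, 0, 0],
]

def accuracy(matrix):
    """Diagonal cells over all cells."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total

def collapse(matrix, labels, mapping):
    """Merge categories by relabelling rows and columns via `mapping`."""
    new_labels = []
    for label in labels:
        target = mapping.get(label, label)
        if target not in new_labels:
            new_labels.append(target)
    index = {label: i for i, label in enumerate(new_labels)}
    merged = [[0] * len(new_labels) for _ in new_labels]
    for i, row_label in enumerate(labels):
        for j, col_label in enumerate(labels):
            r = index[mapping.get(row_label, row_label)]
            c = index[mapping.get(col_label, col_label)]
            merged[r][c] += matrix[i][j]
    return new_labels, merged

acc_before = accuracy(MATRIX)
merged_labels, merged = collapse(MATRIX, LABELS, {"Deontic": "External"})
acc_after = accuracy(merged)

print(f"four types: {acc_before:.2%}; deontic merged into external: {acc_after:.2%}")
```

Because only diagonal cells and the grand total enter the calculation, the result does not depend on whether rows or columns hold the observed categories.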

Regression for function

Prediction accuracy: 87.27% [most frequent choice = 47.76%]

R2 likelihood: 70%

              necessity  permission  possibility  probability
necessity           691          72            7            0
permission            0         128           16            0
possibility           0          14          367            0
probability          33           0           31            0

Modal functions

map more directly onto pointers within the sentence

? or are we missing something

1. forced choice: categorizations equally difficult

2. sorting: categories align “naturally” with functional classification (// word meaning)

? how about other languages

Russian function ~ word, polarity, aspect, source, subject ellipsis [65.7%]

[Figure. Plot labels: Necessity, Permission, Possibility, Probability]

Russian type ~ word, polarity, aspect, source, subject ellipsis [64.9%]

[Figure. Plot labels: External, Epistemic, Internal, Deontic]

Russian - regression

Russian type ~ word, polarity, aspect, source, subject ellipsis

Accuracy: 83.13% [baseline 75%]

R2: 53.52%

Type        Deontic  External          Internal  Epistemic
Deontic          12       111                 3          0
External         13      1048 [=69.86%]      60          0
Internal          0         5               187          0
Epistemic         0         2                59          0

Russian function ~ word, polarity, aspect, source

Accuracy: 85.74% [baseline 49%]

R2: 64.34%

Function                 Necessity  Permission  Possibility  Probability
Necessity                      471           0            7            0
Permission                      18         110           61            1
Possibility [=49.47%]           11          35          675           21
Probability                      0           0           60           30

Czech function ~ word, polarity, aspect [68.2%]

[Figure. Plot labels: Necessity, Probability, Possibility, Permission]

Czech type ~ word, polarity, aspect [56.7%]

[Figure. Plot labels: Epistemic, Deontic, External, Internal]

Czech regression

Czech type ~ all variables except subject ellipsis

Accuracy: 74.46% [baseline 58.26%]

R2: 29.1%

Type        Deontic  External  Internal  Epistemic
Deontic         442        35         0          9
External         71       151         0          4
Internal         24        28         0          0
Epistemic        33        12         0         37

Czech function ~ all variables except subject ellipsis

Accuracy: 78.25% [baseline 39.65%]

R2: 46.61%

Function     Necessity  Permission  Possibility  Probability
Necessity          282          28           14           19
Permission          22         159           20            4
Possibility         16          22          177            4
Probability         19           4           12           44

Discussion

● “words”: different inventories of modal words
  o Cz: mainly “proper” verbs; Pl/Rus: adverbials

● not all properties (equally) relevant for all languages (in same way)
  e.g. subject ellipsis (categorical vs probabilistic)

● labels: “type” difficult to reach agreement on
  o supports our hypothesis ;-)

Modal functions:

less is more:

● usage points to 4 functions with 2 anchorage points ~ possibility & necessity, cf. van der Auwera & Plungian (1998)

! variations between languages in detail of relationship with sub-functions

Conclusions