Unpacking Slavic Modality
Dagmar Divjak, Nina Szymor, Olga Lyashevskaya,
Masha Ovsjannikova, Steven Clancy,
Mateusz-Milan Stanojević
Accounts of modality
Roots
philosophy: modal logic
pragmatics: speech acts
linguistics: English vs typological (Perkins 1983, Huddleston 1988, Sweetser 1990, Bybee et al. 1994,
van der Auwera and Plungian 1998, Palmer 2001, Hengeveld 2004, Nuyts 2006)
Issues
Lack of agreement on
1. number of modal types to distinguish
? 2, 3, 4
2. labels and definitions of each type
Modal types (van der Auwera & Plungian 1998)
modality = “semantic domains that involve possibility and necessity as
paradigmatic variants, that is, as constituting a paradigm with two possible choices”.
1. Participant-internal - participant’s internal abilities and needs,
e.g. Boris can get by with sleeping five hours a night
2. Participant-external - possibilities and necessities influenced by
factors external to the participant, e.g. To get to the station, you can
take bus 66
3. Deontic - permissions and obligations imposed on the participant
by social/moral/legal norms, e.g. John may leave now
4. Epistemic - a proposition is judged to be uncertain or probable
relative to some judgement(s), e.g. John may have arrived
Interrater (dis)agreement
modality_type (Auwera & Plungian 1998):
88% for možno (OL-MO)
72% for powinien (NS-DD)
67% for móc (NS-AS)
63% for musieć (NS-AS)
63% for wolno (NS-DD)
60% for można (NS-AS)
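The percentages above read as raw agreement between two annotators, although the slides do not say how they were computed. A minimal sketch, assuming that reading, is given below. The label sequences are made up for illustration, and Cohen's kappa is an addition that the slides do not report; it is included only because raw agreement does not correct for chance.

```python
from collections import Counter

def percent_agreement(a, b):
    """Share of items on which two annotators chose the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Chance-corrected agreement (not reported in the slides)."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Expected agreement if both annotators labelled independently
    p_expected = sum(counts_a[l] * counts_b[l] for l in set(a) | set(b)) / (n * n)
    if p_expected == 1.0:
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)

# Hypothetical annotations of eight sentences by two raters
rater_1 = ["deontic", "external", "external", "epistemic",
           "internal", "external", "deontic", "external"]
rater_2 = ["deontic", "external", "deontic", "epistemic",
           "external", "external", "deontic", "external"]

agreement = percent_agreement(rater_1, rater_2)  # 6 of 8 items match
kappa = cohens_kappa(rater_1, rater_2)
print(f"agreement = {agreement:.2%}, kappa = {kappa:.3f}")
```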
Can the corpus help?
Can empirical data and
quantitative methods,
incl. visualization tools,
provide an objective viewpoint
on previous intuitive analyses
of modal types and functions?
Toward a usage-based classification
? (how) do modality types correlate with usage
→ track behavior of words in modal contexts
→ inherently multivariate nature of language
behavior profiles: morphology, syntax, semantics
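A behavior profile can be pictured as the relative frequency of each annotated property value per modal word. The sketch below is only an illustration of that idea: the function name `behavior_profiles`, the variable names, and the six annotated rows are all hypothetical and are not taken from the study's data.

```python
from collections import Counter, defaultdict

def behavior_profiles(rows, word_key="word"):
    """Relative frequency of every (variable, value) pair per word."""
    counts = defaultdict(Counter)
    totals = Counter()
    for row in rows:
        word = row[word_key]
        totals[word] += 1
        for variable, value in row.items():
            if variable != word_key:
                counts[word][(variable, value)] += 1
    return {
        word: {pair: c / totals[word] for pair, c in pair_counts.items()}
        for word, pair_counts in counts.items()
    }

# Made-up annotated corpus examples (one dict per sentence)
rows = [
    {"word": "musieć", "polarity": "affirmative", "aspect": "perfective", "function": "necessity"},
    {"word": "musieć", "polarity": "affirmative", "aspect": "imperfective", "function": "necessity"},
    {"word": "musieć", "polarity": "negative", "aspect": "perfective", "function": "probability"},
    {"word": "musieć", "polarity": "affirmative", "aspect": "perfective", "function": "necessity"},
    {"word": "wolno", "polarity": "negative", "aspect": "imperfective", "function": "permission"},
    {"word": "wolno", "polarity": "affirmative", "aspect": "imperfective", "function": "permission"},
]

profiles = behavior_profiles(rows)
for word, profile in profiles.items():
    print(word, sorted(profile.items()))
```

Each word ends up as a vector of proportions, which is what makes the multivariate comparison across words possible.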
Modal functions
NECESSITY - the modal refers to the state or fact of being required or
necessary, e.g.: To switch the TV on, you have to press this button. [it’s
necessary if you want to watch the TV, but you’re not obliged to do it].
POSSIBILITY - the modal refers to the state or fact of being able to occur/exist,
being achievable, being do-able, e.g. You can lose weight by following these
steps.
PERMISSION - the modal refers to the action of officially allowing someone to
do a particular thing; consent or authorization, e.g. You may come in.
PROBABILITY - the modal refers to the quality or state of being probable; the
extent to which something is likely to happen or be the case,
e.g. He must have left because his train leaves in 15 minutes.
cf. also PROHIBITION, OBLIGATION, ABILITY, IMPOSSIBILITY
Aims
• quantify the intuitive clarity of the proposed classifications
  o determine how different classifications of modality correlate with aspects of usage
• across a number of Slavic languages
  o Pl, Rus, Cz (Cr)
Variables [13 variables; 44 levels]
Form [7]: polarity (2), aspect of the infinitive (2),
ellipsis (2), voice (2), subject case (2), subject
presence (2), modal word (4-7*)
Meaning [6]: type (4), function (4), source (2),
subject semantics (4), infinitive semantics (9),
SoA applicability (2)
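The variable and level counts can be checked with a small bookkeeping sketch. It assumes that the total of 44 levels corresponds to the upper bound of 7 modal words; the dictionary names are illustrative only.

```python
# Number of levels per variable, as listed on the slide.
# Assumption: "modal word (4-7*)" is counted at its upper bound of 7.
FORM = {
    "polarity": 2,
    "aspect of the infinitive": 2,
    "ellipsis": 2,
    "voice": 2,
    "subject case": 2,
    "subject presence": 2,
    "modal word": 7,
}
MEANING = {
    "type": 4,
    "function": 4,
    "source": 2,
    "subject semantics": 4,
    "infinitive semantics": 9,
    "SoA applicability": 2,
}

n_variables = len(FORM) + len(MEANING)
n_levels = sum(FORM.values()) + sum(MEANING.values())

# Type and function are the outcomes to be predicted; the rest are predictors.
outcomes = {"type", "function"}
n_predictors = len(FORM) + len(set(MEANING) - outcomes)

print(n_variables, n_levels, n_predictors)  # 13 44 11
```

The 11 predictors correspond to the 7 formal and 4 meaningful variables that remain once type and function are set aside as outcomes.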
Quantitative corpus research tools
• multiple correspondence analysis
  o visual exploration + variance explained
  o which labels tend to cluster together
  o libraries {ca} and {rgl} in R
• polytomous logistic regression
  o which combination of variables is the best predictor
  o library {polytomous} in R
  o anova: whether the model with an added variable is
    significantly better than the model without it
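One common way to compare a model with and without an added variable is a likelihood-ratio test. The sketch below assumes that this is the kind of comparison meant by "anova" here; it does not reproduce the R implementation. The log-likelihoods and the number of extra parameters in the example are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Closed-form starting points for the regularized upper incomplete gamma Q(a, z)
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(z))   # Q(1/2, z)
    else:
        a, q = 1.0, math.exp(-z)              # Q(1, z)
    # Recurrence: Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return min(q, 1.0)

def likelihood_ratio_test(loglik_reduced, loglik_full, extra_params):
    """Test statistic and p-value for two nested models."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    return statistic, chi2_sf(statistic, extra_params)

# Hypothetical log-likelihoods of a model without and with one added variable
statistic, p_value = likelihood_ratio_test(-820.0, -805.0, extra_params=3)
print(f"LR statistic = {statistic:.1f}, p = {p_value:.2e}")
```

A small p-value would indicate that the added variable improves the fit by more than would be expected by chance.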
Type and function learning
• four ʻfacultiesʼ (Pl, Rus, Cz, Cr)
• 7 formal + 4 meaningful ʻteachersʼ
• each teacher teaches two or more classes of
  students (labels, values of variables)
• the regression model ascribes the number of
  teaching hours (weight)
• after training, each student votes for her class
  – {deontic, external, internal, epistemic}
  – {necessity, permission, possibility, probability}
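The teacher and student metaphor can be made concrete with a small voting sketch. It assumes a softmax (multinomial) formulation, which may differ from how the {polytomous} library estimates and combines its models; all weights and level names below are invented for illustration.

```python
import math

FUNCTIONS = ["necessity", "permission", "possibility", "probability"]

def class_probabilities(active_levels, weights, classes):
    """Sum the weights of the active variable levels per class, then apply softmax."""
    scores = {
        c: sum(weights.get((level, c), 0.0) for level in active_levels)
        for c in classes
    }
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}

# Hypothetical "teaching hours": weight of a variable level for a class
weights = {
    ("word=musieć", "necessity"): 2.0,
    ("word=musieć", "probability"): 1.0,
    ("aspect=perfective", "necessity"): 0.5,
    ("polarity=negative", "permission"): 0.5,
}

# One sentence, described by the levels of its variables
sentence = ["word=musieć", "polarity=affirmative", "aspect=perfective"]

probabilities = class_probabilities(sentence, weights, FUNCTIONS)
predicted = max(probabilities, key=probabilities.get)
print(predicted, {c: round(p, 3) for c, p in probabilities.items()})
```

The class with the highest probability is the one the sentence "votes" for; comparing these votes with the annotated labels yields the confusion tables used for evaluation.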
Evaluating the fit of the model
• accuracy, precision, recall, R2 likelihood
• baseline (most frequent category is always chosen)
(rows: observed; columns: predicted)
            necessity permission possibility probability Total
necessity   111 12 3 0 126
permission  13 1048 60 0 1121
possibility 0 5 187 0 192
probability 0 2 4 55 61
Total       124 1067 254 55 1500
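The evaluation measures can be computed directly from the example table. The sketch below assumes the standard definitions of accuracy, precision and recall and takes the baseline to be the share of the most frequent observed category; R2 likelihood is left out because it requires the model deviances, which the slides do not give.

```python
LABELS = ["necessity", "permission", "possibility", "probability"]

# Rows: observed category; columns: predicted category (example table from the slide)
CONFUSION = [
    [111,   12,   3,  0],
    [ 13, 1048,  60,  0],
    [  0,    5, 187,  0],
    [  0,    2,   4, 55],
]

def evaluate(matrix, labels):
    """Accuracy, most-frequent-category baseline, and per-class precision/recall."""
    size = len(labels)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(size)) for j in range(size)]
    correct = sum(matrix[i][i] for i in range(size))
    nan = float("nan")
    per_class = {
        label: {
            "precision": matrix[i][i] / col_totals[i] if col_totals[i] else nan,
            "recall": matrix[i][i] / row_totals[i] if row_totals[i] else nan,
        }
        for i, label in enumerate(labels)
    }
    return {
        "accuracy": correct / total,
        "baseline": max(row_totals) / total,
        "per_class": per_class,
    }

report = evaluate(CONFUSION, LABELS)
print(f"accuracy = {report['accuracy']:.3f}, baseline = {report['baseline']:.3f}")
for label, scores in report["per_class"].items():
    print(f"{label}: precision = {scores['precision']:.3f}, recall = {scores['recall']:.3f}")
```

For this example table the accuracy is 1401/1500 and the baseline is 1121/1500, so the model has to be judged against a baseline that is already high.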
Unpacking Polish, Russian, and Czech modality
Results:
Disclaimer:
• a pilot study
• the first round of annotation
• this is a first impression and will need considerably more work
[Figure: TYPE by word by aspect by polarity by source; plotted labels: Participant-internal, Participant-external, Epistemic, Deontic]
Regression for type
Prediction accuracy: 79.61%
= slightly better than always picking most frequent item [71.37%]
R2 likelihood: 28.27%
NB: if deontic is treated as a subtype of external, 90% is covered
Type Deontic External Internal Epistemic
Deontic 112 76 0 0
External 69 970 [=71.37%] 0 0
Internal 1 67 0 0
Epistemic 0 64 0 0
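The effect of treating deontic as a subtype of external can be checked by merging the two categories in the table and recomputing accuracy. The sketch assumes that rows are observed and columns are predicted categories, and that the fourth column is Epistemic (the column header is duplicated in the source); the function names are illustrative.

```python
TYPES = ["deontic", "external", "internal", "epistemic"]

# Counts from the table for the type regression
# (assumed layout: rows observed, columns predicted)
TYPE_CONFUSION = [
    [112,  76, 0, 0],
    [ 69, 970, 0, 0],
    [  1,  67, 0, 0],
    [  0,  64, 0, 0],
]

def accuracy(matrix):
    """Share of observations on the diagonal."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total

def merge_labels(matrix, labels, source, target):
    """Fold category `source` into `target` for both rows and columns."""
    new_labels = [label for label in labels if label != source]
    position = {
        label: new_labels.index(target if label == source else label)
        for label in labels
    }
    merged = [[0] * len(new_labels) for _ in new_labels]
    for i, row_label in enumerate(labels):
        for j, col_label in enumerate(labels):
            merged[position[row_label]][position[col_label]] += matrix[i][j]
    return merged, new_labels

acc_four_types = accuracy(TYPE_CONFUSION)
merged, merged_labels = merge_labels(TYPE_CONFUSION, TYPES, "deontic", "external")
acc_merged = accuracy(merged)
print(f"four types: {acc_four_types:.2%}; deontic merged into external: {acc_merged:.2%}")
```

With these counts the four-way accuracy is 1082/1359 and the merged accuracy is 1227/1359, which agrees with the reported 79.61% and the 90% in the note.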
Regression for function
Prediction accuracy: 87.27% [most frequent choice = 47.76%]
R2 likelihood: 70%
necessity permission possibility probability
necessity 691 72 7 0
permission 0 128 16 0
possibility 0 14 367 0
probability 33 0 31 0
Modal functions
map more directly onto pointers within the sentence
? or are we missing something
1. forced choice: categorizations equally difficult
2. sorting: categories align “naturally” with
functional classification (// word meaning)
? how about other languages
[Figure: Russian function ~ word, polarity, aspect, source, subject ellipsis [65.7%]; plotted labels: Necessity, Permission, Possibility, Probability]
[Figure: Russian type ~ word, polarity, aspect, source, subject ellipsis [64.9%]; plotted labels: External, Epistemic, Internal, Deontic]
Russian - regression
Russian type ~ word, polarity, aspect, source, subject ellipsis
Accuracy: 83.13% [baseline 75%]
R2: 53.52%
Type Deontic External Internal Epistemic
Deontic 12 111 3 0
External 13 1048 [=69.86%] 60 0
Internal 0 5 187 0
Epistemic 0 2 59 0
Russian function ~ word, polarity, aspect, source
Accuracy: 85.74% [baseline 49%]
R2: 64.34%
Function Necessity Permission Possibility Probability
Necessity 471 0 7 0
Permission 18 110 61 1
Possibility [=49.47%] 11 35 675 21
Probability 0 0 60 30
[Figure: Czech function ~ word, polarity, aspect [68.2%]; plotted labels: Necessity, Probability, Possibility, Permission]
[Figure: Czech type ~ word, polarity, aspect [56.7%]; plotted labels: Epistemic, Deontic, External, Internal]
Czech regression
Czech type ~ all variables except subject ellipsis
Accuracy: 74.46% [baseline 58.26%]
R2: 29.1%
Type Deontic External Internal Epistemic
Deontic 442 35 0 9
External 71 151 0 4
Internal 24 28 0 0
Epistemic 33 12 0 37
Czech function ~ all variables except subject ellipsis
Accuracy: 78.25% [baseline 39.65%]
R2: 46.61%
Function Necessity Permission Possibility Probability
Necessity 282 28 14 19
Permission 22 159 20 4
Possibility 16 22 177 4
Probability 19 4 12 44
Discussion
● “words”: different inventories of modal words
  o Cz: mainly “proper” verbs; Pl/Rus: adverbials
● not all properties are (equally) relevant for all
  languages (in the same way)
  e.g. subject ellipsis (categorical vs probabilistic)
● labels: “type” is difficult to reach agreement on
  o supports our hypothesis ;-)
Conclusions
Modal functions: less is more
● usage points to 4 functions with 2 anchorage points
  ~ possibility & necessity, cf. Auwera & Plungian (1998)
! variations between languages in the detail
  of the relationship with sub-functions