'Most' vs 'More Than Half': An Alternatives Explanation - ACL ...

10
‘Most’ vs ‘More Than Half’: An Alternatives Explanation Fausto Carcassi Jakub Szymanik Institute for Logic, Language and Computation Department of Linguistics, University of Amsterdam Spuistraat 134, 1012 VB Amsterdam [email protected] [email protected] Abstract While ‘most’ and ‘more than half’ are gener- ally assumed to be truth-conditionally equiva- lent, the former is usually interpreted as con- veying greater proportions than the latter. Pre- vious work has attempted to explain this dif- ference in terms of pragmatic strengthening or variation in meanings. In this paper, we pro- pose a novel explanation that keeps the truth- conditions equivalence. We support this expla- nation with a computational model of usage in the Rational Speech Act framework. We find that the difference in typical proportions associated with the two expressions can be ex- plained with previously independently moti- vated semantic and pragmatic mechanisms. 1 Introduction According to their standard analysis, sentences ‘most As are B’ and ‘more than half of As are B’ (with A and B referring to two sets A and B) are verified by the same As and Bs, i.e. are truth- conditionally equivalent. Specifically, ‘most As are B’ is analysed as conveying that the size of A B is greater than the size of A - B, whereas ‘More than half of As are B’ is analysed as conveying that the size of A B is greater than half the size of A (Hackl, 2009): 1 JmostK (A)(B) ⇐⇒ |A B| > |A - B| (1) JMTHK (A)(B) ⇐⇒ |A B| > 1 2 |A| (2) In contrast to this assumption, the behaviours of ‘most’ and ‘more than half’ differ in several ways. The differences between ‘most’ and ‘more than half’ call for an explanation. Early work has fo- cused on the different behaviour of the two expres- sions with respect to their upper bounds (Ariel, 1 In the following, we assume for simplicity that A and B are two finite sets. For a slightly more general definition in terms of measure functions see Solt (2016). 2003) or their cognitive encoding, which has been argued to lead to different verification procedures (Hackl, 2009). More recent work has focused on the more general fact that ‘most’ typically conveys proportions higher than ‘more than half’. Follow- ing Deni ´ c and Szymanik (2020), we can categorize the explanations for this difference in two classes. First, the pragmatic strengthening hypotheses claim that while the two expressions are truth- conditionally identical, ‘most’ is pragmatically strengthened, resulting in a threshold higher than ‘more than half’. This strengthening can happen, e.g., through a scalar implicature or through an R-implicature. Solt (2016) argues for the latter op- tion and claims that the reason why ‘most’ receives a different interpretation to begin with is a (non truth-conditional) difference in the types of scales underlying the two expressions. On the other hand, lexical meaning hypotheses attempt to explain these differences in terms of a truth-conditional differ- ence in their logical forms, which can come from, e.g., conventionalization of implicatures or from ‘most’ being a vague quantifier. More recent work has produced experimental evidence supporting the hypothesis of a semantic difference between the two expressions: Ramotowska et al. (2019) have observed a difference in decision times and be- haviour of subjects verifying sentences with the two quantifiers that are consistent with the model in which threshold for ”more than half” is 50% and threshold for ”most” is higher. Deni ´ c and Szymanik (2020) have reported that the thresholds of ”most” does not change under the downward monotone environment and remains higher that the threshold for ”more than half”. This finding also suggest that the difference between the two quantifiers is due to semantics. In this paper, we propose a novel explanation of the difference between ‘most’ and ‘more than half’. We argue that two independently needed mecha- 334 Proceedings of the Society for Computation in Linguistics (SCiL) 2021, pages 334-343. Held on-line February 14-19, 2021

Transcript of 'Most' vs 'More Than Half': An Alternatives Explanation - ACL ...

‘Most’ vs ‘More Than Half’: An Alternatives Explanation

Fausto Carcassi Jakub SzymanikInstitute for Logic, Language and Computation

Department of Linguistics, University of AmsterdamSpuistraat 134, 1012 VB Amsterdam

[email protected] [email protected]

Abstract

While ‘most’ and ‘more than half’ are gener-ally assumed to be truth-conditionally equiva-lent, the former is usually interpreted as con-veying greater proportions than the latter. Pre-vious work has attempted to explain this dif-ference in terms of pragmatic strengthening orvariation in meanings. In this paper, we pro-pose a novel explanation that keeps the truth-conditions equivalence. We support this expla-nation with a computational model of usagein the Rational Speech Act framework. Wefind that the difference in typical proportionsassociated with the two expressions can be ex-plained with previously independently moti-vated semantic and pragmatic mechanisms.

1 Introduction

According to their standard analysis, sentences‘most As are B’ and ‘more than half of As areB’ (with A and B referring to two sets A and B)are verified by the same As and Bs, i.e. are truth-conditionally equivalent. Specifically, ‘most As areB’ is analysed as conveying that the size of A ∩Bis greater than the size of A− B, whereas ‘Morethan half of As are B’ is analysed as conveying thatthe size of A ∩B is greater than half the size of A(Hackl, 2009):1

JmostK(A)(B) ⇐⇒ |A ∩B| > |A−B| (1)

JMTHK(A)(B) ⇐⇒ |A ∩B| > 1

2|A| (2)

In contrast to this assumption, the behaviours of‘most’ and ‘more than half’ differ in several ways.

The differences between ‘most’ and ‘more thanhalf’ call for an explanation. Early work has fo-cused on the different behaviour of the two expres-sions with respect to their upper bounds (Ariel,

1In the following, we assume for simplicity that A and Bare two finite sets. For a slightly more general definition interms of measure functions see Solt (2016).

2003) or their cognitive encoding, which has beenargued to lead to different verification procedures(Hackl, 2009). More recent work has focused onthe more general fact that ‘most’ typically conveysproportions higher than ‘more than half’. Follow-ing Denic and Szymanik (2020), we can categorizethe explanations for this difference in two classes.

First, the pragmatic strengthening hypothesesclaim that while the two expressions are truth-conditionally identical, ‘most’ is pragmaticallystrengthened, resulting in a threshold higher than‘more than half’. This strengthening can happen,e.g., through a scalar implicature or through anR-implicature. Solt (2016) argues for the latter op-tion and claims that the reason why ‘most’ receivesa different interpretation to begin with is a (nontruth-conditional) difference in the types of scalesunderlying the two expressions. On the other hand,lexical meaning hypotheses attempt to explain thesedifferences in terms of a truth-conditional differ-ence in their logical forms, which can come from,e.g., conventionalization of implicatures or from‘most’ being a vague quantifier. More recent workhas produced experimental evidence supporting thehypothesis of a semantic difference between thetwo expressions: Ramotowska et al. (2019) haveobserved a difference in decision times and be-haviour of subjects verifying sentences with thetwo quantifiers that are consistent with the modelin which threshold for ”more than half” is 50% andthreshold for ”most” is higher. Denic and Szymanik(2020) have reported that the thresholds of ”most”does not change under the downward monotoneenvironment and remains higher that the thresholdfor ”more than half”. This finding also suggest thatthe difference between the two quantifiers is due tosemantics.

In this paper, we propose a novel explanation ofthe difference between ‘most’ and ‘more than half’.We argue that two independently needed mecha-

334Proceedings of the Society for Computation in Linguistics (SCiL) 2021, pages 334-343.

Held on-line February 14-19, 2021

nisms in the pragmatic interpretation of quantifierssuffice to predict the difference between the twoexpressions, without assuming a difference in scalestructures or truth conditions. The first mechanismis the tendency of the listener to guess points cen-tral to a category, in order to minimize the expecteddistance between their own guess and the speaker’sobservation. The second mechanism is the struc-tural theory of conceptual alternatives, which letsthe alternative set of an utterance depend on thestructure of the concept conveyed by the utterance.We show that these mechanisms make the correctpredictions with a computational model of prag-matics, the Rational Speech Act model.

2 Solt’s account

In order to identify the differences between ‘most’and ‘more than half’, Solt (2016) considers all ap-pearances of the two expressions as quantifiers inthe nominal domain in the Corpus of ContemporaryAmerican English (COCA) (Davies, 2017). Whilevarious differences emerge when comparing thetwo expressions, in order to compare the typicalproportions for which the two expressions are usedthose appearances were selected which included aspecific percentage (n = 54 for ‘more than half’and n = 141 for ‘most’). The corpus data showsthat (1) ‘more than half’ is mostly used for per-centages in the 50%-65% range, (2) ‘most’ has amuch flatter distribution which covers the whole50%-100% range.

The difference between the upper bounds andthat between the lower bounds of the typical setsof ‘most’ and ‘more than half’ receive separate ex-planations in Solt’s account. In order to explainthe difference in lower bounds, Solt proposes thatthe scales that underlie the two expressions are dif-ferent. According to Solt, ‘more than half’ usesa ratio scale, while ‘most’ uses an ordinal scale.2

While two points on a ratio scale can be comparedin terms of the proportion between them, pointson an ordinal scale can only be compared in termsof which one is the greater or lower of the two(Stevens, 1946). Therefore, to say that the scaleunderlying ‘most’ is an ordinal scale means that anexpression such as ‘most A B’ only requires us todetermine whether the size of A∩B is greater thanthe size of A−B. On the other hand, ‘more thanhalf of As B’ requires us to compare the size of

2Technically, Solt argues that ‘most’ also allows for semi-ordered scales. We return to semi-ordered scales below.

A∩B to a proportion of the size of A, namelyA/2.Since ratio scales allow arbitrarily precise compar-isons between points, Solt’s account predicts thelower bound of ‘more than half’ to be closer to 0.5than the lower bound of ‘most’, as observed in thecorpus data.

On the other hand, Solt accounts for the differ-ence in the upper bounds of the two expressionswith scalar implicatures. Solt points out that ‘morethan half’ has a rich set of alternative utterances,including ‘more than two thirds’ and ‘more thanthree quarters’. On the other hand, the alternativeutterances to ‘most’ are more sparse, including ‘all’.Since the set of alternative utterances is more fine-grained for ‘more than half’ than for ‘most’, scalarimplicatures constrain the upper bound of the for-mer to be lower than the latter. Solt proposes thatthe reason why the two expressions have differentsets of alternatives is the different types of scalesunderlying them. In the next section, we introducean alternative account that explains the differencebetween ‘most’ and ‘more than half’ without as-suming a difference in the scales underlying thetwo expressions.

3 Two mechanisms in the pragmatics ofquantifiers

In this section, we present our account in non for-mal terms. Our account explains the difference intypical set between ‘most’ and ‘more than half’,and is based on two phenomena relating to the in-terpretation of quantifiers. The first is the idea thatthe listener attempts to minimize the differencebetween their own guess and the speaker’s observa-tion, the second is the fact that different conceptualstructures cause different sets of alternatives. Wenext consider these two mechanisms in turn.

3.1 Distance-minimizing listeners

Some semantic domains, such as nationality, foot-ball teams, or personal identity, are not usuallystructured by relations of similarity. For instance,it is nonsensical to claim that Billy the kid is closer,in terms of his identity, to Jesse James than DocHolliday.3 On the other hand, the members of somesemantic domains, such as the domain of numbers,colors, or proportions, enter in relations of similar-ity to each other. For instance, two shades of blue

3There are feature with respect to which two individualsmight be more or less close to each other, but this does notconcern their identity as such.

335

are closer to each other than either of them is to ashade of red.

In many cases, when communication happens indomains structured by distance, and the listener’stask is to construct a representation of the worldstate given a description produced by the speaker,4

communicative success is not simply a function ofwhether the listener’s representation is identical ornot to the true world state. Rather, success is a(ninverse) function of the similarity between the truestate of the world and the listener’s guess. In otherwords, communication is more or less successfuldepending on whether the listener’s guess is moreor less dissimilar (respectively) from the true worldstate.

In this perspective, it is a sensible strategy for alistener to not simply sample from the set of possi-ble world states given their probability after receiv-ing the message, but rather to attempt to minimizethe expected distance between their guess and thetrue world state. For instance, if the speaker utters‘blue’, the listener might select a shade of blue thatis located around the center of the blue category,because a point near the center of the category willhave a lower expected distance to the true worldstate than a point that is around the margin of thecategory. Previous literature supports this idea thatlisteners should tend to guess the center of a cate-gory when communicative success depends on thesimilarity between true state and listener’s guess,e.g. Jager et al. (2011) showed that the optimalstrategy for such so-called simmax signaling gamesinvolves a listener that guesses the central point inthe category.

Consistently with previous literature,5 we willassume in the rest of this paper that communica-tion with quantifiers happens on a semantic domainstructured by a distance, namely the scales of pro-portions and numbers.6 Moreover, we claim thatin communication with quantifiers, communica-tive success is of the graded type presented above.For instance, if 1/2 of the As are B, the commu-nication is more successful if the listener guesses|A ∩B|/|A| = 0.6 than if the speaker guesses 0.9.This implies that a rational listener does not guess

4This communicative setup is called descriptive by Franke(2014), who opposes it to referential communicative games. Inthe following, we limit ourselves to discussions of descriptivecommunication.

5See chapter 2 of Carcassi (2020) for an overview.6In the following, we will focus on the scale of proportions,

but what we say can be easily generalized to the scale ofnumbers.

a proportion simply by sampling from the posteriorover proportions conditional on the received signal,but rather they attempt to minimize the expecteddistance between their guess and the true state ofthe world.

The tendency for the listener to guess a state thatminimizes the expected distance to the speaker’sobservation, when in a scalar semantic domain, isnot only a result about rational agents, but alsoaligns with the way we use quantifiers in practice.For instance, imagine receiving the signal ‘between50 and 100’, and creating a representation of theworld state. Even within the part of the scale ofintegers covered by the expression—e.g. numbersbetween 50 and 100—the guess does not happenuniformly. Rather, we intuitively tend to guess aninteger that is around the center of the category, i.e.around 75. In other words, we are less likely toselect a number close to the category boundaries,such as 99. As we discuss in more detail below,the situation is subtler when multiple possible ut-terances are involved.

3.2 The structural account of alternativesAs discussed above, Solt’s explanation for the dif-ferent comparison sets of ‘most’ and ‘more thanhalf’, which produces in her account the differentupper bounds, is based on the difference betweenthe scale types. We propose an alternative expla-nation for the difference in the sets of alternativeutterances which does not rely on scale structure.In particular, we rely on the structural account ofalternatives (Katzir, 2007; Fox and Katzir, 2011;Trinh and Haida, 2015) to explain why ‘most’ and‘more than half’ have different sets of alternativeutterances.

The structural theory of alternatives starts withthe idea of a structural alternative. ψ is a struc-tural alternative to φ (ψ . φ) iff ψ is structurallyat most as complex as φ, i.e. ψ can be obtainedfrom φ through a “finite series of deletions, con-tractions, and replacements of constituents of φ”with constituents of the same category taken fromthe lexicon (Katzir, 2007). The core idea is to de-fine the set Astr(φ) of utterances alternative to φas follows:

Astr(φ) = {ψ|ψ . φ} (3)

In words, the set of utterances that enter in thecalculation of implicatures for φ is the set of utter-ances that are structurally at most as complex asφ.

336

While the original criterion for alternatives inKatzir (2007) is a syntactic one, there is emerg-ing theoretical and experimental evidence that itis best characterized as acting not on the syntacticstructure, but rather on the conceptual structure ofutterances (Chemla, 2007; Buccola et al., 2018).The structural account of conceptual alternatives isstill being developed, however in the following welimit ourselves to using the basic idea in equation3, as it is all we need for the present purposes.

In the following, we make two crucial assump-tions about the way alternatives are generated forthe expressions under consideration. First, notevery expression of the form ‘a b’ is considered,where a is a cardinal number and b an ordinal num-ber such that a ≤ b. If every a and b were con-sidered, the set of alternatives to ‘one half’ wouldbe the set of rational numbers in the unit interval.Various factors plausibly restrict the set of consid-ered numbers. First, the listener can generally as-sume that the speaker has noisy measurement of thetrue proportion, and therefore only produces utter-ances implying at most a certain level of granular-ity. Moreover, the communicative aims generallydo not require transmission of precise proportions.The idea that the notion of alternative is graded anddepends on the complexity of the concept, devel-oped in Buccola et al. (2018), could also accountfor why ‘five sixth’ seems to compete with simplerfractions such as ‘two thirds’, while the oppositeis not true; the difference would depend on the dif-ferent conceptual complexity of different fractions.We do not develop this idea further in the presentpaper.7

The second assumption we make is that ‘most’and ‘more than half’ are conceptually structuredas proposed by Hackl (2009), i.e. as in equations 1and 2 above. The main consequence of this assump-tion is that ‘two thirds’ and structurally equivalentexpressions are alternatives to ‘half’ according tothe criterion in equation 3.

Under the two assumptions just discussed, thecriterion defined in equation 3 has the correctconsequences for the cases at hand. Namely,Astr(‘most’) contains ‘all’ and does not contain‘more than three quarters’. On the other hand,Astr(‘(one) half’) contains e.g. ‘three quarters’.

In this section, we have presented two mecha-nisms that play a role in the way quantifiers are

7Thanks to Milica Denic for recommending the literatureon conceptual alternatives.

interpreted. These two mechanisms have alreadybeen discussed in the literature in other contexts.While our approach relies on a difference in con-ceptual structures, the account remains a pragmaticaccount insofar as conceptual structure affects theresults only indirectly by causing a difference in al-ternative sets, rather than directly as in Solt (2016).The main contribution of this paper is thereforeto show how these two mechanisms can, withoutrecourse to the scale types discussed by Solt, ex-plain the difference between ‘most’ and ‘more thanhalf’ with respect to the proportions that the twoexpressions typically convey.

In the next section, we model the mechanismswe discussed. We consider a pragmatic speakerwho picks ‘most’ or ‘more than half’ not simply asa function of their extension on the scale of propor-tions, but instead also implicitly selecting the setof alternatives that will allow a pragmatic listenerto choose a proportion that is as close as possibleto the speaker’s observation. Since a pragmaticlistener guesses points closer to 0.5 for a rich alter-native set such as the one induced by ‘more thanhalf’, the speaker selects ‘more than half’ for suchproportions. On the other hand, since the listenerwill guess alternatives close to 0.75 for ‘most’, thespeaker produces ‘more than half’ for such pro-portions. In order to formalize this intuition, inthe next section we present the RSA modellingframework for pragmatic language use.

4 An RSA model of the two mechanisms

4.1 Basic RSA model

The RSA framework is meant to model the processof recursive mindreading that lies behind the prag-matic interpretation or production of utterances(Goodman and Stuhlmuller, 2013; Franke, 2014;Frank et al., 2017). RSA models usually start witha pragmatic listener who interprets utterances basedon the simulated behaviour of a pragmatic speaker.The pragmatic speaker in turn given an observationtends to choose the most useful utterance for a lit-eral listener who interprets it based solely on itsliteral meaning. We will first explain the simplesttype of RSA model, and then a modification thatwill be useful to model numerals.

The simplest RSA model starts with a set of utter-ances u and a set of possible states s. The meaningof each utterance can be encoded as the set of thosestates that verify the utterance. The pragmatic lis-tener L1 receives an utterance u and calculates a

337

Figure 1: Simple RSA model with three possible ut-terances u (y-axis) and three states s (x-axis). L1 cal-culates a scalar implicature for utterances u1 and u2(α = 4). The left, central, and right plots correspondto L0, S1, and L1 respectively. Note that the color indi-cates the probability of guessing a state given a signalforL0 andL1, and the probability of producing a signalgiven a state for S1.

posterior over states by Bayesian update, combin-ing their prior over states with the probability thatthe pragmatic speaker S1 would have produced theutterance given each state:

pL1(s|u) ∝ pL1(s)pS1(u|s) (4)

The pragmatic speaker in turn observes a stateand produces an utterance that aims at optimizingthe utility U(u|s) for a literal listener L0 given thestate, while minimizing the utterance cost:

pS1(u|s) ∝ exp(αU(u|s)− c(u)) (5)

The utility U(u|s) is the negative surprisal of thestate given the utterance, so that the speaker favoursutterances that make the state less surprising forthe literal listener:

U(u|s) = log(pL0(s|u)) (6)

Finally, the probability that literal listener L0

attributes to each state given an utterance is simply0 if the utterance is not verified by the state, andproportional to the prior for the state otherwise:

pL0(s|u) ∝{p(s) if s verifies u0 otherwise

(7)

Figure 1 shows L0, S1, and L1 in this simple RSAmodel. The crucial phenomenon that can be ob-served in figure 1 is that L1 calculates a scalarimplicature: although utterance u1 is, in its literalsense, compatible with both s1 and s2, S1 tendsto produce u1 mostly for s2, because when s1 isobserved S1 tends to use the more useful signal u1.Therefore, when hearing u1 L1 is more likely toguess s2.

4.2 Distance based listeners

In the simple RSA models above, the successof communication is binary, solely a function ofwhether the listener’s guess coincides with thespeaker’s observed state. This is plausible in caseswhere the set of states has no internal structure.However, as discussed above in the case wherea notion of distance is well-defined on the set ofstates, the listener might not be simply trying toguess the speaker’s observation, but rather mightstrive to minimize the (expected) distance betweenthe state they select and the speaker’s observation.8

In order to model the effects of a well-defineddistance D on the set of states, we modify the lis-tener L1 so that instead of selecting a state by sam-pling from their posterior distribution given thesignal, they try to minimize the expected distancebetween their selection s and the true state. There-fore, we define the choice probability for listenersas follows:9

pCL(s|u) ∝ exp

(−ρ

i∈states

pL1(i|u)D(i, s)

)

(8)where ρ is the parameter of a softmax functionwhich determines how strongly the listener tendsto minimize the expected distance and pL1 is de-fined as above in equation 4. The listener describedin equation 8 tends therefore to minimize the ex-pected linear distance function. Figure 2 showsthe effects of this modification of the model for20 states, when D(sn, sm) = |n −m|. The rightplot shows that in this modified RSA model, L1

tends to guess points that are located centrally inthe category, after the category has been restrictedby scalar implicature.

4.3 Varying sets of alternatives

The modification to the basic RSA model aboveis an implementation of the first mechanism dis-cussed in section 3. The second mechanism con-

8Cf previous work where the effects of a distance structureaffects the speaker but not the listener, such as Franke (2014).It is worth noticing that our model does not have more degreesof freedom than Franke’s model: while we introduce one moreparameter than the basic RSA model to regulate the listener’stendency to minimize expected distance, Franke introducesone parameter to regulate the amount of pragmatic slack. Wedo not investigate in this work the differences between the twoapproaches.

9We apply this modification only to L1, assuming that theattempt to minimize distance is something above and beyondthe literal reading of the signals. We leave to future work aninvestigation of the effects of modifying both listeners.

338

Figure 2: RSA model with a distance-minimizing L1.The model displayed in the plot use a language withthree utterances and 20 states. The listener L1 donot simply guess the signal observed by the speakerby sampling their posterior, but rather attempt to mini-mize the expected distance between their guess and thespeaker’s observation (α = 4, ρ = 2). See figure 1 formore detail.

cerns the way that the comparison set depends onthe speaker’s utterance.

In the basic RSA framework, the set of possi-ble utterances considered by the pragmatic speakerand the pragmatic listener that the speaker modelsare identical. However, according to the structuralaccount of alternatives discussed above the set ofutterances considered by the listener depends onthe actual utterance that is picked by the speaker.For instance, if the speaker utters ‘101’, the listenerwill consider all alternative utterances that are atmost at a similar level of granularity as 101, suchas 91 and 100. However, if the speaker utters ‘100’,the listener in the model considers an alternativesset containing e.g. only 90 and 100, but not 101.

In order to model this in the RSA model, weintroduce a speaker S2. S2, much like S1, tends toselect the signal that minimizes the listener’s sur-prise for the real state given the signal. However,the set of alternative utterances considered by L1

is not independent of the signal received by L1.Instead, the set of alternative utterances consideredby L1 (and therefore by the lower levels S1 andL0) depends on the actual utterance of S2, as de-scribed in the section on the structural account ofalternatives above. The model below is therefore aproduction model.

This picture of alternatives is in many respectsa simplification. For instance, it is likely that fromthe point of the listener there is uncertainty as tothe set of alternatives that ought to be consideredin the context. More complex discussions of is-sues related to granularity and alternatives can befound in the literature, see e.g. Bastiaanse (2011)for numerals. However, these more complex mod-els are not needed to explain the issue at hand, and

Figure 3: Structure of modified RSA model. The set ofalternative utterances considered by L1 is not fixed, butrather depends on the received utterance. Moreover, L0

and L1 do not simply guess a state based on their pos-terior probability given the received signal, but rathertend to guess a state that is expected to be close to thespeaker’s observation.

therefore we leave investigation of the subtleties tofuture work.

In sum, the only requirements for this model toapply are (1) that the listener is trying to minimisethe distance between their guess and the speaker’sobservation, and (2) that some terms have an alter-natives set that is more granular than other ones.In particular, it applies even if the two expressionsthat induce different granularities are synonymous.

In this section, we have formalized the twomechanisms discussed in section 3 within the RSAframework. The resulting model is summarizedin natural language in figure 3. In the resultingmodel, structurally different expressions induce thepragmatic listener to consider different sets of al-ternative utterances. Moreover, the listener doesnot simply guess uniformly from the enriched partof the parameter space, but rather tends to guesspoints that are central in the pragmatically enrichedcategory. Therefore, even intensionally equivalentexpressions will be used differently, as long as theyare structurally different. In the next section, weshow how this model applies to the specific case ofthe contrast between ‘most’ and ‘more than half’.

5 An alternatives account of ‘most’ vs‘most than half’

5.1 An RSA model of the contrastIn the following, we will model communicationwith quantifiers by applying the RSA model de-scribed above to the following simple referentialcommunication task, modelled after Pezzelle et al.

339

Utterance ExtensionAll (∀) {1}

Most (1/2, 1]None (¬∃) {0}

Some (∃) (0, 1]MT a half (> 1/2) (1/2, 1]

MT one third (> 1/3) (1/3, 1]MT two thirds (> 2/3) (2/3, 1]

LT a half (< 1/2) [0, 1/2)LT one third (< 1/3) [0, 1/3)

LT two thirds (< 2/3) [0, 2/3)

Table 1: Meaning of each signal in the model.MT=‘more than’, LT=‘less than’.

(2018) where communication was set up similarlyin a production task. A speaker observes two sets,A and B, and attempts to communicate to a lis-tener which proportion of A is also in B in the waymodelled by the modified RSA model introducedabove. As possible signals, we have chosen thelexically simple Aristotelian quantifiers and someminimal set of alternatives for ‘more than half’.The literal meaning of each quantifier in the modelcorresponds to a portion of the scale of proportions(see table 1). This set of utterances is closed undersubstitution of ‘more’ by ‘less’, and by (seman-tically meaningful) substitutions of one, two, andthree (both their cardinal and ordinal versions) witheach other.

As in the modified RSA model presented above,the alternatives considered by the pragmatic lis-tener depend on the speaker’s utterance. For in-stance, if the speaker uttered ‘some’ the listenerwould consider a set of alternatives containing ‘all’but not ‘more than two thirds’, while if the speakeruttered ‘more than one third’ both ‘all’ and ‘morethan two thirds’ would be possible options for thelistener. In the present case, the utterances abovecan be divided in two groups, the first containing‘all’, ‘most’, ‘none’, and ‘some’, and the secondcontaining the remaining utterances. Each utter-ance in the first group contains all other utterancesin that group as alternatives, and none of the utter-ances in the second group. Each of the utterancesin the second group contains all utterances in itsset of alternatives.10

In order to isolate the effects of the account of10While previous work has argued that utterances with dif-

ferent monotonicity profiles do not appear in the same set ofalternatives (e.g. Horn, 1989), Katzir (2007) has argued thestructural theory of alternatives can lift this restriction.

Figure 4: The plots shows the results with |A| = 100,α = 3 and ρ = 1. Each plot shows the behaviour of adifferent agent in the RSA model.

alternatives discussed above from the consequencesof utterance cost, we assume that signals have nocost. Moreover, to keep the results as simple aspossible S2 can only produce ‘all’, ‘most’, ‘none’,‘some’, ‘more than a half’ and ‘less than a half’.In order for the speaker to be able to calculate adistribution over utterances given any state, therehas to be at least one utterance to refer to each statein each set of alternative utterances.

The results of the model are shown in figure4. L0 guesses uniformly within the categories ex-pressed by each signal considered by S2. L0 treats‘most’ and ‘more than half’ identically, guessinguniformly among the states between 51 and 100.Finally, S0 selects the maximum for ‘every’ andthe minimum for ‘none’.

With S1, the set of alternatives for each signalmatters (second plot from top in figure 4). Morespecifically, while the lower bound for both ‘most’and ‘more than half’ are similar for S1, their upperbounds are different as a consequence of the dif-ferent ways that the respective set of alternativescover the scale. ‘More than half’ implicates lessthan two thirds, and therefore tends to not be usedfor proportions higher than two thirds, while mostonly implicates ‘not all’. Note that while the sixsignals are plotted together in figure 4, the distri-bution for each signal is computed independentlywith a possibly different set of alternatives utter-ances. Therefore, S1 does not suffice to explain thedifference between ‘most’ and ‘more than half’.

340

L1 tends to pick the central point in the cate-gories as produced by S1 (third plot from top infigure 4). Therefore, L1 tends to guess points closerto the middle of the scale for ‘more than half’ thanfor ‘most’, because the former is produced by S1for a range of proportions closer to the scale’s mid-point. Finally, the pragmatic speaker S2 tends topick ‘more than half’ for signals closer to the mid-point of the scale than ‘most’ (bottom plot in figure4).

The results in figure 4, while qualitatively cor-rect, are quantitatively surprising in two respects.First, the upper bound of ‘most’ is lower in theresults shown in figure 4 than in the data presentedby Solt. This is a consequence of the unusuallylow bound for ‘all’, which goes down to 80%. Theexact value of this bound changes depending onthe exact set of utterances available to the speaker.For instance, the availability of ‘almost all’ wouldpush the lower bound for ‘all’ higher up on thescale. The second difference is that the upperbound of ‘more than half’ goes higher than in thedata presented by Solt. The positions of the in-volved bounds are sensitive to the parameter values.For instance, figure 5 shows a parameter settingthat makes predictions closes to Solt’s data. More-over, the upper bound of ‘more than half’ is alsosensitive to the set of alternatives. For instance,including proportions such as ‘three fourths’ wouldpush the upper bound of ‘more than half’ closer to0.5.

5.2 Ignoring the mechanisms

In order to see what role each of the two mecha-nisms play in the predictions of the model above,it is instructive to observe the consequences of ig-noring each of the two mechanisms.

When both mechanisms are ignored, the resultis a simple RSA model as described in 4.1. Inparticular, we consider an RSA model with differ-ent costs for the signals. The results are shownin figure 6. When both mechanisms are ignoredand production costs are implemented, speaker S2does not introduce any substantial innovation overthe listener L1, and therefore we stop the compu-tation at the level of L1. The effect of cost in thissetting is simply to make S1’s production proba-bility for ‘more than half’ uniformly lower thanthe one of ‘most’ for any given state. However,the difference in cost cannot be exploited by L1

to draw an inference about the state observed by

Figure 5: The plots shows the results with |A| = 100,α = 5 and ρ = 0.5. This parameter setting producesresults closer to Solt’s observed typical sets for ‘most’and ‘more than half’.

S1. In sum, a difference in cost alone without themechanisms discussed above cannot be exploitedby a pragmatic speaker to convey different infor-mation with truth-conditionally equivalent signals.For similar reasons, when only the first mechanism,namely the structural account of alternatives, is ig-nored, all utterances compete with each other, andtherefore S2 does not introduce interesting results.Again, since the symmetry is not broken by thedifferent set of alternatives, ‘most’ and ‘more thanhalf’ end up conveying identical information to S1.

When only the second mechanism—the distance-minimizing listener—is ignored, we still obtain thecrucial result that ‘more than half’ is generally usedto convey proportions closer to the scale’s midpointthan ‘half’. However, the results differ from sec-tion 4 in two crucial ways. First, the speaker S2ends up producing each signal with uniform proba-bility within the pragmatically enriched category,as shown in figure 8. For instance, ‘more than half’is produced with uniform probability for propor-tions above 1/2 and below 1/3. Second, the modelwithout distance-minimizing L2 predicts that thespeaker would use ‘all’ and ‘none’ exclusively forthe maximum and minimum of the scale respec-tively. These two consequences contradict bothSolt’s data and the data in Pezzelle et al. (2018).In the data, the production probabilities for ‘morethan half’ resembles a Gaussian distribution ratherthan a uniform distribution, ‘all’ and ‘none’ are

341

Figure 6: Results without both mechanisms. We stopthe computation at the level of pragmatic listener L1.S1 is less likely to produce ‘more than half’ for anygiven state, because of its higher cost. However, L1

does not derive any difference between the informationconveyed by ‘most’ and ‘more than half’, so that thetwo lines perfectly overlap for L1. In this plot, α = 2.While the speaker in this model can produce all signals,for ease of comparison with the previous plots we onlyplot the 6 signals that the speaker could produce in theprevious models. We model the cost of each utterancesimply as the number of words in the utterance: ‘all’,‘most’, ‘none’, and ‘some’ get cost 1, while all othersignals get cost 4.

used for signals close to the scale’s extremes ratherthan exclusively to the extremes.

In this section, we have shown that the two mech-anisms discussed in section 3 are not only inde-pendently motivated, but are also both needed tomake sense of the difference in typical sets between‘most’ and ‘more than half’.

6 Conclusions

‘Most’ and ‘more than half’, while traditionallyassumed to be truth-conditionally equivalent, aretypically associated with different proportions. Inthe most developed explanation of this difference,Solt (2016) introduces a difference between thestructures of the scales used by the two expressions.In contrast, in this paper we proposed a novel ac-count of the difference that is based on indepen-dently motivated mechanisms and does not rely ondifferent scale structures. Moreover, we analysedthe predictions of the account by implementing itin a popular computational model of pragmatic rea-soning, the RSA model. Further and especiallyexperimental work is needed to compare the em-pirical accuracy of the different emerging accounts

Figure 7: No distance minimizing.

Figure 8: Results without the second mechanism: whenthe distance-minimizing listener is substituted with thepragmatic listener of equation 4, speaker S2 has a uni-form probability of producing ‘more than half’ acrossits whole pragmatically enriched domain. Only L1 andS2 are plotted as L0 and S1 are essentially the same asin figure 4.

of the differences between ‘most’ and ‘more thanhalf’.

The work we presented can be extended in var-ious possible directions. First, a similar modelcould be used to account for the usage of modi-fier numerals, since a similar contrast to the onediscussed here can be found e.g. between ‘morethan 100’ and ‘more than 101’, where the typicalguessed number for the former utterance is higherthan the for the latter. Second, the model above canbe used to fit experimental production data withquantifiers. A third possible development wouldlook at whether the model predicts that the quanti-fiers’ thresholds stay the same even in downwardentailing contexts, as suggested by the experimen-tal data in Denic and Szymanik (2020). Lastly, inthe models presented in this paper we only consid-ered a small set of alternative proportions for ‘half’,namely ‘one third’ and ‘two thirds’. However, itwould be valuable to study the predictions of themodel when more alternative utterances, contain-ing more complex proportions, are included. Weleave all these exciting possible developments tofuture work.

Acknowledgments

The authors have received funding from theEuropean Research Council under the Euro-pean Union’s Seventh Framework Programme(FP/2007–2013)/ERC Grant Agreement n. STG

342

716230 CoSaQ.

ReferencesMira Ariel. 2003. Does most mean ‘more than half’?

Annual Meeting of the Berkeley Linguistics Society,29(1):17–30.

Harald Bastiaanse. 2011. The Rationality of RoundInterpretation. In Vagueness in Communication,Lecture Notes in Computer Science, pages 37–50,Berlin, Heidelberg. Springer.

Brian Buccola, Manuel Kriz, and Emmanuel Chemla.2018. Conceptual alternatives: Competition in lan-guage and beyond.

Fausto Carcassi. 2020. The Cultural Evolution ofScalar Categorization. PhD Thesis, University ofEdinburgh, Edinburgh.

Emmanuel Chemla. 2007. French both: a gap in thetheory of antipresupposition.

Mark Davies. 2017. Corpus of Contemporary Ameri-can English (COCA).

Milica Denic and Jakub Szymanik. 2020. Differencesin thresholds between ’most’ and ’more than half’:Semantics or pragmatics? In Sinn Und Bedeutung25, London. OSF.

Danny Fox and Roni Katzir. 2011. On the characteri-zation of alternatives. Natural Language Semantics,19(1):87–107.

Michael C. Frank, Andres Gomez Emilsson, BenjaminPeloquin, Noah D. Goodman, and Christopher Potts.2017. Rational speech act models of pragmatic rea-soning in reference games.

Michael Franke. 2014. Typical use of quantifiers: Aprobabilistic speaker model. Proceedings of theAnnual Meeting of the Cognitive Science Society,36(36).

Noah D. Goodman and Andreas Stuhlmuller. 2013.Knowledge and Implicature: Modeling LanguageUnderstanding as Social Cognition. Topics in Cog-nitive Science, 5(1):173–184.

Martin Hackl. 2009. On the grammar and processingof proportional quantifiers: Most versus more thanhalf. Natural Language Semantics, 17(1):63–98.

Laurence R. Horn. 1989. A Natural History of Nega-tion. University of Chicago Press, Chicago.

Gerhard Jager, Lars P. Metzger, and Frank Riedel.2011. Voronoi languages: Equilibria in cheap-talkgames with high-dimensional types and few signals.Games and Economic Behavior, 73(2):517–537.

Roni Katzir. 2007. Structurally-defined alternatives.Linguistics and Philosophy, 30(6):669–690.

Sandro Pezzelle, Raffaella Bernardi, and Manuela Pi-azza. 2018. Probing the mental representation ofquantifiers. Cognition, 181:117–126.

Sonia Ramotowska, Shane Steinert-Threlkeld, Leen-dert van Maanen, and Jakub Szymanik. 2019. Most,but not more than half, is proportion-dependent andsensitive to individual differences. In Proceedingsof Sinn Und Bedeutung 24, pages 165–182, Os-nabrueck.

Stephanie Solt. 2016. On measurement and quantifi-cation: The case of most and more than half. Lan-guage, 92(1):65–100.

S. S. Stevens. 1946. On the Theory of Scales of Mea-surement. Science, 103(2684):677–680.

Tue Trinh and Andreas Haida. 2015. Constraining thederivation of alternatives. Natural Language Seman-tics, 23(4):249–270.

343