On calculating the reliability of the comparative method at long and medium distances: Afroasiatic...

44
is is a contribution from Journal of Historical Linguistics 2:2 © 2012. John Benjamins Publishing Company is electronic file may not be altered in any way. e author(s) of this article is/are permitted to use this PDF file to generate printed copies to be used by way of offprints, for their personal use only. Permission is granted by the publishers to post this file on a closed server which is accessible to members (students and staff) only of the author’s/s’ institute, it is not permitted to post this PDF on the open internet. For any other use of this material prior written permission should be obtained from the publishers or through the Copyright Clearance Center (for USA: www.copyright.com). Please contact [email protected] or consult our website: www.benjamins.com Tables of Contents, abstracts and guidelines are available at www.benjamins.com John Benjamins Publishing Company

Transcript of On calculating the reliability of the comparative method at long and medium distances: Afroasiatic...

This is a contribution from Journal of Historical Linguistics 2:2© 2012. John Benjamins Publishing Company

This electronic file may not be altered in any way.The author(s) of this article is/are permitted to use this PDF file to generate printed copies to be used by way of offprints, for their personal use only.Permission is granted by the publishers to post this file on a closed server which is accessible to members (students and staff) only of the author’s/s’ institute, it is not permitted to post this PDF on the open internet.For any other use of this material prior written permission should be obtained from the publishers or through the Copyright Clearance Center (for USA: www.copyright.com). Please contact [email protected] or consult our website: www.benjamins.com

Tables of Contents, abstracts and guidelines are available at www.benjamins.com

John Benjamins Publishing Company

Journal of Historical Linguistics 2:2 (2012), 239–281. doi 10.1075/jhl.2.2.04ratissn 2210–2116 / e-issn 2210-2124 © John Benjamins Publishing Company

On calculating the reliability of the comparative method at long and medium distancesAfroasiatic comparative lexica as a test case

Robert R. RatcliffeTokyo University of Foreign Studies

The high degree of contradiction and incompatibility between two indepen-dently produced Afroasiatic comparative lexica (Ehret 1995, Orel & Stolbova 1995) calls into question the reliability of the comparative method at deep time depths. The discrepancy could only have arisen if one or both sources contain a large number of chance or spurious matches. This article first documents the discrepancy between the two comparative lexica, and then attempts to explain it. The central proposal is that the evaluation of a proposed reconstruction must go beyond qualitative evaluations of individual proposed cognate sets and incor-porate quantitative tools for evaluating the probable degree of chance matches within the reconstruction as a whole.

Keywords: Afroasiatic, comparative method, phonological reconstruction, lexical reconstruction, probabilistic evaluation, probability theory in historical linguistics

1. Introduction

The appearance in the same year of two independently produced Afroasiatic comparative reconstructions, Ehret (1995) and Orel & Stolbova (1995), offers a rare chance to test empirically the claim that the comparative method can be re-liably applied across long distances. If the method is reliable, we would expect that different practicioners confronted with the same primary data would arrive at broadly similar results. It is disturbing to discover therefore, that the reconstruc-tions are divergent to the point of being mutually incompatible. The present article first documents the discrepancy between the sources, then attempts to account for it. The central proposal is that the evaluation of a proposed reconstruction

© 2012. John Benjamins Publishing CompanyAll rights reserved

240 Robert R. Ratcliffe

must go beyond evaluation of specific proposals and primary data, and extend to the probabilistic evaluation of the particular method of comparison that a com-parativist has employed in arriving at the results. Using probability theory, the paper presents a set of metrics for explicitly calculating the expected number of chance resemblances likely to be present in a proposed reconstruction, given the comparativist’s stated assumptions about such factors as sound-correspondences, word-structure, semantic range and sub-classification. The conclusion is that both sources probably incorporate a large number of chance resemblances, but for dif-ferent reasons, hence the discrepancy between them.

2. Comparing comparative lexica

The question of the possibility of identifying and reconstructing language families at historical time depths deeper than those assumed for established families like Indo-European has been fiercely debated. Some have argued that the comparative method simply does not apply beyond a certain point (e.g., Nichols 1996, Campbell 2004). Others contend that there is in principle no reason why the method should have a cut-off point, and that deeper relationships can be, and perhaps have been identified (Greenberg 2000, Ruhlen 1994, Bomhard 1998, etc.).

One way to test the reliability of the comparative method would be to under-take the following experiment. Take a set of languages for which a relationship has been suggested, but for which regular sound correspondences and a recon-structed proto-language sound system have not yet been established. Furnish two libraries in different places with all the available and relevant information on the languages (dictionaries, grammars, texts). Take two teams of researchers trained in the comparative method, put them in the libraries, keep them in isolation from each other and see what they come up with. If it is a reliable procedure then two trained practitioners of it confronted with the same body of data should come up with broadly similar results — repeatability of experiments might reasonably be expected, as in natural science. Of course, finding qualified volunteers for such an experiment might prove quite difficult.

The world of comparative linguistics is fortunate, therefore, that something very close to this experiment came to be performed by accident. The year 1995 saw the publication of two Afroasiatic comparative etymological dictionaries, produced in relative isolation from each other: Ehret (1995) and Orel & Stolbova (1995). Both works were produced by trained linguists, with knowledge of sev-eral of the relevant languages, making a good-faith effort to apply the comparative method. Both works hold strictly to the principle of regular sound correspondence in identifying cognates. And in both cases the reconstructed sound system which

© 2012. John Benjamins Publishing CompanyAll rights reserved

On calculating the reliability of the comparative method at long and medium distances 241

the correspondences go back to are consistent with the uniformitarian principle: both the size of the inventory and the types of contrast reconstructed are consis-tent with what is found in living languages.

It is surprising, and indeed disturbing, therefore, to discover that the two lexica are widely divergent, as anyone who looks carefully at the two will soon discover. After much time sifting through the two sources I was able to find only 70 cases out of 1,011 entries in Ehret (1995) and 2,762 entries in Orel & Stolbova (1995) where the sources agree in connecting an item in one Afroasiatic branch with an item in another branch. (Of these 27 were also identified in an earlier Afroasiatic lexicon, Cohen 1947). By contrast I found nearly two-and-a-half times as many cases, 167, where the proposals of cognacy in the sources contradict each other, that is where both sources take up a lexical item in one branch but connect it with a distinct (internally non-cognate) lexical item in another branch. The vast major-ity of the proposals in each source find no reflection at all in the other source.

One might suppose that all these remaining proposals are complementary, such that the Proto-Afroasiatic lexicon could be said to consist of all of Ehret’s 1,011 proposals plus all of Orel & Stolbova’s 2,672 proposals, minus the contradic-tory sets. But such a conclusion would not be methodologically justified. The high ratio of contradictory to agreeing sets is indicative of a deep discrepancy between the two sources. Close inspection of the contradictory cases shows that the con-tradictions arise from different assumptions about sound correspondences, differ-ent assumptions about root-structure and morphological derivation, or different assumptions about what constitutes semantic similarity. Each set of assumptions may be internally consistent and justifiable within each source, but they are not compatible with each other. Therefore, even when there is no direct contradic-tion, proposals of cognacy in one source which crucially rely upon assumptions about sound correspondences, root-structure, or semantics which are contradic-tory with the assumptions about these matters made by the other source must be judged incompatible in principle. By this criterion I have judged that about 555 of Ehret’s (1995) entries, more than half the total, as incompatible with the Afroasiatic reconstruction of Orel & Stolbova (1995). In short the two sources have reconstructed mutually unrecognizable proto-languages.

This result would seem to indicate that the comparative method is not terribly reliable at the time depth of Afroasiatic. More realistically, I think, it indicates that what is called the comparative method is actually a rather inexplicit, not terribly rigorous, procedure that allows for considerable latitude of application. In order to evaluate the reliability of a proposed reconstruction (or to evaluate competing reconstructions), therefore, we need a way to critically evaluate the reliability of the comparative method itself, that is, of the particular comparative methodology that has been employed in each specific case.

© 2012. John Benjamins Publishing CompanyAll rights reserved

242 Robert R. Ratcliffe

It is traditional to approach a problem like this as a qualitative or data-based problem. That is, the traditional approach to evaluating a proposed reconstruction is to focus on evaluating the data that goes into the individual cognate proposals. This is entirely appropriate up to a point. Proposals based on significant errors in the primary data will of course have no value. But a purely qualitative evaluation is necessarily incomplete. That is because a reconstruction is not a substance (not a datum, not an observable). A reconstruction is a hypothesis. It is easy to lose sight of this distinction because superficially reconstructions take the same form as de-scriptive linguistic statements. Epistemologically, however, they are quite different. If a fieldworker asserts that language X has a phoneme /p/ or a word pik meaning ‘fish’, these are in principle falsifiable statements which can be confirmed or dis-confirmed by observation. If a comparativist asserts that proto-language X had a phoneme *p or word *pik meaning ‘fish’ these are hypothetical statements, which can never be confirmed as either true or false, insofar as direct observation of the proto-language is impossible. Such statements can only be evaluated as probable or improbable — in other words, in terms of their probability. Descriptive linguis-tics is a historical discipline: It is concerned with the documentation of human culture. Philology, paleography, archaeology, epigraphy (our sources of evidence about languages no longer spoken and for language history) are likewise histori-cal disciplines. But reconstruction on the basis of the comparative method is not a historical discipline. While a judicious historian is careful to avoid extrapolating beyond what can be documented, reconstruction is by definition an attempt to extrapolate beyond the data. Reconstruction is a matter of reasoning about prob-abilities in order to draw inferences about the undocumented past. “Reasoning about probabilities” falls within the purview of a branch of applied mathematics called probability theory.

I would like to submit that in dealing with questions of long- and medium-dis-tance comparison, the field of historical linguistics could benefit immensely from recognizing that it combines two distinct fields with distinct research methodolo-gies: documentation of linguistic data (where a historicist methodology is appro-priate) and probabilistic reasoning beyond the documentation (which is properly a branch of applied mathematics). The field could further benefit from trying, as far as this is practically possible, to separate issues of historical documentation (e.g. accuracy of data and sources) from issues of probabilistic reasoning, and from refining and improving the tools for dealing with issues of the second type. This can be done by moving our discussions of these issues from the domain of common-sense, ordinary-language modal logic (“It seems likely/unlikely to me that …”) into the more explicit domain of probability theory.

Mathematical tools have been applied to problems in historical linguistics with increasing sophistication in recent years (Bender 1969, Embleton 1986, Ringe

© 2012. John Benjamins Publishing CompanyAll rights reserved

On calculating the reliability of the comparative method at long and medium distances 243

1992, 1998, Kessler 2001, Zuraw 2003, McMahon & McMahon 2005, Holman et al. 2008, Gray et al. 2009, etc.). However most of these studies have been principally concerned with issues of language classification (and secondarily with issues of sub-classification and dating). We are concerned here with something different: Evaluating the reliability of a reconstruction. Yet probabilistic tools developed for classification issues can be adapted to the problem at hand because the same cen-tral question is involved in each: What is the expected number of chance matches (similarities, correspondences, etc. depending upon one’s criteria) which can be expected to be found in the comparison of two (or more) languages? That is how to calculate the expectation of chance.

In a classification study the expectation, once calculated, serves as the mea-sure against which the actual number of found matches (the observation) can be compared. If the observation exceeds the expectation by a statistically significant amount, one concludes that the similarity is not due to chance. If this similar-ity is also not due to universal patterns of onomatopoeia, sound-symbolism, etc., one concludes that it must be due to historical factors (either common genetic inheritance or borrowing). In a classification study, the size of the expectation is not important; it is only the size of the difference between expectation and ob-servation that matters. In trying to evaluate the reliability of a reconstruction, by contrast, we are really only interested in calculating the size of the expectation. The question of how reliable a reconstruction is can be posed as a question about how many chance resemblances or spurious cognates are likely to be present in it. The comparative method employed in a given case is reliable to the extent that it has a low expectation of chance matches. A reconstructive methodology which can be expected to yield a high-number of chance matches has to be judged unreliable, regardless of how many matches are actually found.

The problem in trying to apply these ideas to the evaluation of a proposed re-construction is that comparativists normally publish only their results and rarely provide an explicit statement of the method of comparison used to arrive at these results. Kessler (2001) succinctly explains the problem:

… traditional historical [linguistic] work does not proceed along lines that invite statistical analysis. In a canonical statistical study, it is important that one select in advance (i.e. before even looking at the languages) the attributes that one is going to study and the metric for evaluating those attributes. (Kessler 2001: 18)

It is only by clearly specifying in advance how a comparison is going to be made that the expected number of chance matches can be explicitly calculated. In other words the expectation of chance is a function of how a comparison is conduct-ed. Ringe (1992, 1998) and Kessler (2001) have elaborated a method for calcu-lating chance matches, but they have done so for only one way of conducting a

© 2012. John Benjamins Publishing CompanyAll rights reserved

244 Robert R. Ratcliffe

comparison, namely using a 100- or 200-word Swadesh list and matching each word on the list in one language with only one semantic equivalent in each other language compared. As an illustration of the application of mathematical tools to comparative linguistics this is fine. Yet no one disputes that it is an artificially narrow and restricted version of the comparative method, quite far removed from actual practice, as critics have pointed out (Baxter 1998).

The crucial point for the problem at hand is that any method of conducting a comparison can be expected to turn up a certain number of chance resemblances or spurious cognates, and this number can in principle be calculated. The expected number of chance matches is a function of how a researcher goes about search-ing for cognates. What is found is a function of how it was looked for. There is no absolute number of expected random similarities which occur or exist, between two languages. There is only a number of random similarities that can be expected to be found, depending on how the search was conducted. If we were to program a computer to look for possible cognates, one of the first things we would have to do is write an algorithm to calculate the number of chance matches expected given the particular search parameters.

If everything is known about how a linguist has conducted a comparison it is in theory a simple matter to calculate the expected number of chance matches in that scholar’s results. In practice, however, comparative linguists do not normally specify how they have conducted a search for cognates. Therefore any attempt to evaluate a particular scholar’s reconstruction probabilistically will require us to look at the reconstruction as a whole in terms of what it reveals about the author’s search methodology. In other words, rather than trying to evaluate each proposed cognate set individually, we will want to look at the total complex of proposed cognates in terms of what it reveals about the comparativist’s assumptions about phonology, morphology, semantics, etc. and how these assumptions have affected the search strategy. Our evaluation should also take into account bibliographies, indices, introductory statements and any other useful indications of methodology that a researcher may have provided.

In the case at hand, one would not wish to deny the importance of a data-based critique. There are indeed some simple mistakes in the supporting primary data in each of the lexica, as previous reviewers have pointed out (Kammerzell 1996, Kaye 1996, Diakonoff & Kogan 1997, Kogan 2002). This by itself, however, is hardly sufficient to explain the discrepancy between them. Another potential source of discrepancy may simply be an emphasis on different data. While both researchers seem to have relied on much the same primary and secondary sources, judging by the bibliographies, there is reason to think that the researchers have given differ-ent priority or emphasis to different branches. There is a clear skew or bias toward Cushitic in Ehret (1995) and toward Chadic in Orel & Stolbova (1995), such that

© 2012. John Benjamins Publishing CompanyAll rights reserved

On calculating the reliability of the comparative method at long and medium distances 245

a high number of Ehret’s proposals involve a comparison of Cushitic with one of the other branches, while a high number of Orel & Stolbova’s proposals involve a comparison between Chadic and another branch. This no doubt reflects the fact that Ehret had previously worked primarily on Cushitic, while Orel & Stolbova had previously done work on Chadic. This difference of focus or perspective on different branches can partially explain the discrepancy, in particular the higher proportion of complementary to agreeing sets. But it does not explain the contra-dictions and incompatibilities. In cases where proposed cognate sets are contra-dictory, one or both sets must reflect mere accidental similarity. In cases where contradictory sound changes are proposed, then the lexical items supporting the proposed correspondence in one or the other source must be due to accidental similarity. Therefore, there is good reason to suspect that many seemingly plau-sible reconstructions in one or both sources are based on “cognate sets” which actually reflect chance or accidental similarity.

In what follows I will present the results of my comparison of the two lexica, and then show how probabilistic tools can be used to evaluate and explain the dis-crepancy between them. I will consider how to calculate in principle the relative costs (in terms of raised expectation of chance matches) of different ways of con-ducting a comparison. I will also attempt to determine how allowing for leeway in different areas may vary the number of chance matches. Finally I will show how it might be possible to estimate the expected number of chance matches likely to be present in a proposed set of reconstructions, based on what the researcher making the proposal reveals about the method of conducting the search.

The conclusion for the specific case is that both Ehret (1995) and Orel & Stolbova (1995) have adopted methods and assumptions which have insured a high expected number of chance matches. Yet because the methods and assump-tions adopted are different in each case, the range of chance or random results ex-tends in different directions, leading to discrepancy, incompatibility and in some cases outright contradiction in the two proposed reconstructions.

There turn out to be four particular features of one or both reconstructions which stand out as probable sources of chance matches or spurious cognates: direct comparison of languages in different branches in the absence of solid reconstruc-tion at the branch level; use of data from well-documented languages without ety-mological analysis; broad semantic leeway and/or highly abstract lexico-semantic reconstruction; and reconstruction of Proto-Afroasiatic root-morphemes as pri-marily biconsonantal CVC. In the case of Orel & Stolbova (1995) the main poten-tial source of chance matches are broad semantic leeway and the comparison of many languages with little or no regard for branch- and sub-branch level recon-struction. While Ehret relies much more on branch-level reconstructions, he has also been able to achieve a high degree of chance results due to equally broad but

© 2012. John Benjamins Publishing CompanyAll rights reserved

246 Robert R. Ratcliffe

different semantic leeway, rigorous application of the biconsonantal hypothesis, which allows him to ignore third consonants, and the liberal exploitation of a pan-chronic, non-etymological, Arabic dictionary.

Many linguists might regard each of these four features as problematic on general principles. The first two might be judged as problematic deviations from best practice, but they are to some extent unavoidable given the current state of scholarship. Some linguists might reject a reconstruction containing the second two features as violations of the uniformitarian principle. A language in which the core lexicon consists of a small number of abstract concepts (in Ehret’s case almost exclusively verbs), with more concrete meanings such as body parts derived by af-fixation might strike some as typologically unusual.

There are two problems with critiques based on best practice and uniformitar-ian assumptions. First, not all such principles have the same epistemological solid-ity. In many cases we do not really know how sound they are. Arguments based on what sorts of structure are likely or unlikely, or what direction of change is likely or unlikely in languages are themselves based on probabilistic reasoning over statisti-cal trends. Rejection of a reconstruction on uniformitarian grounds presupposes reference to some notion of what kind of structures are most frequent or least frequent in living languages. Ideally this should be determined by systematic typo-logical sampling, but in practice it is usually determined by a linguist’s own, neces-sarily limited experience, which may be biased by over-exposure to languages of a particular family or area.

Second, such critiques can only lead to qualitative evaluations (good or bad), and hence to complete rejection or complete acceptance of the proposal being evaluated. But what is really needed is an evaluation costly vs. less costly along a quantitative continuum. This allows for a more subtle evaluation, perhaps allow-ing us to reject some aspects of a reconstruction, while retaining others, and also bringing out more explicitly what areas of the data-based historical side of the problem are most in need of further research. For example, we might all agree that ignoring vowels in comparison (which neither Ehret nor Orel & Stolbova do) is bad practice, but in probabilistic terms it turns out not to be very costly. On the other hand ignoring anything but the first two consonants and reconstructing to a CVC target may or may not be bad practice. The judgment depends on many fac-tors. But it is demonstrably extremely costly.

Again we may all agree that etymological research on the literary languages within Afroasiatic and low-level reconstruction of the numerous branches and sub-sub-branches are ideally prerequisite to a broader reconstruction of Afroasiatic. But etymological dictionaries are not yet available for many languages, and low-level reconstruction is proceeding at an uneven pace. A qualitative approach can only lead us either to reject any attempt at reconstructing proto-Afroasiatic in the

© 2012. John Benjamins Publishing CompanyAll rights reserved

On calculating the reliability of the comparative method at long and medium distances 247

absence of these resources or to ignore the problem altogether. But such method-ological purism does not reckon with the practical realities of scholarship. Given the paucity of researchers in the field, etymological dictionaries and low-level re-constructions may be slow in coming. A quantitative approach allows us to ask how costly it is to proceed in the absence of these resources, and draws attention to areas where future research can be most fruitfully employed. Hence rather than critiquing the sources by simply pointing out these problems, I will focus on ex-ploring the quantifiable methodological implications of each of them. In different ways each increases the expected number of chance matches and allows a measur-able expectation of randomness into the results.

3. Measuring the discrepancy

Before proceeding to the methodological discussion, I wish to make explicit my own method for comparing the two comparative lexica, and to elaborate on the nature of the discrepancy which appears to exist between them.

With a language grouping as large as Afroasiatic, it is reasonable to ask wheth-er the same languages and sources have been looked at — the extent to which the experiment imagined in the introduction has actually been realized. Most of the primary sources (language descriptions) and all of the secondary sources (lower-level reconstructions) cited by Ehret (1995) are also cited by Orel & Stolbova (1995). Orel & Stolbova’s (1995) bibliography is much larger. But Ehret’s (1995) bibliography is not intended to provide a complete list of his primary sources. He relies heavily on prior branch-level reconstructions, notably his own prior work on South-Cushitic and Cushitic (Ehret 1980, 1987), and for Chadic, Jungraithmayr & Shimizu (1981) and Newman & Ma (1966) supplemented by Shuh’s (1981) dic-tionary of Ngizm.1 All of these sources are also cited by Orel & Stolbova (1995), who however indicate Dolgopolskij (1973), which Ehret (1995) does not cite, as their basic source for Cushitic. For Egyptian and Omotic the same sources appear to have been used, notably Faulkner (1962) and Cerny (1976) for Egyptian, and Bender (1988) for Omotic. For Semitic, Ehret relies principally on his own internal reconstruction of Arabic supplemented by data from Modern South Arabian lan-guages. Orel & Stolbova’s treatment of Semitic is more conventional and draws on a larger number of languages. One major language for which different sources ap-pear to have been used is Classical Arabic, where Ehret’s main source is Steingass (1884), while Orel & Stolbova’s main source is Biberstein-Kazimirski (1860). A greater problem than this difference, however, is that both dictionaries are quite old and neither is based on etymological principles. I come back to the method-ological implications of this below.

© 2012. John Benjamins Publishing CompanyAll rights reserved

248 Robert R. Ratcliffe

Beyond any question of different sources, Ehret (1995) and Orel & Stolbova (1995) do appear to have given greater weight or importance to different sub-branch-es, such that one gets the impression of a Cushitic bias in Ehret (1995) and a Chadic bias in Orel & Stolbova (1995). This impression can be made more explicit through the following observations. Of the first 100 entries in Ehret (1995), 77 are support-ed by a Cushitic-Semitic comparison, and this is the highest number of pairwise comparisons. The next highest numbers are provided by Cushitic-Egyptian (51), Cushitic-Chadic (46) and only then Semitic-Chadic (45). For the first 200 entries in Orel & Stolbova (1995) the highest number of pairwise comparisons is Semitic-West Chadic (49), followed by Semitic-East Chadic (38), Semitic-Central Chadic (33), Egyptian-West Chadic (29) and only then by Semitic-Lowland East Cushitic (27).

A further complication in comparing the lexica is that the sources adopt slight-ly different, though not incompatible, sub-classifications. Ehret (1995) maintains a standard six branch division (Semitic, Egyptian, Cushitic, Omotic, Chadic, and Berber), but he chooses to ignore Berber data in the comparison.2,3 Orel & Stolbova (1995) question, but do not necessarily reject, the reality of a Cushitic sub-branch, and present ‘Cushitic’ data under the headings of six sub-families: Beja, Agaw, ‘East Cushitic’ [single quotation marks in original], Dahalo (a single language), Mogogodo (a single language, also known as Yaaku), and Rift (the South Cushitic of Ehret 1995 and other sources). While Chadic is recognized as a single branch by Orel & Stolbova (1995), Chadic data is likewise grouped under three headings: West, Central, and East. A further difficulty for the evaluator is that Ehret often gives reconstructed forms only without attestation of actual language data.

In the following discussion I will use the term “Agreeing” to describe cases where both sources agree in connecting a particular word in one branch with a particular word in at least one other. It is important to emphasize that this category refers to proposed cognate sets, not to the form reconstructed as underlying the cognate set. There are often considerable differences in reconstruction even when the cognate sets are agreeing, as example (1a) in Table 1 indicates.4

Table 1. Agreeing

branch A B C D

E x z

OS x z

Ex. (1a) Semitic Berber Egyptian Chadic Cushitic

E600

*maaw *mwt mwt, mt *mətə EC *umaawə

‘die’ Ar. mawt Ng. mət *-am-w(t)

© 2012. John Benjamins Publishing CompanyAll rights reserved

On calculating the reliability of the comparative method at long and medium distances 249

Table 1. (continued)OS1751

*mawut, *mu:t *mVt *mt, mwt W, E *mawut *mut

‘die’ Ar. mwt Kby. emmet Hs. mutu Rnd. amut

etc. etc. etc.

Ex. (1b) Semitic Berber Egyptian Chadic Cushitic Omotic

E51

*pir *pr(r) p3 *pərə *pa/i/ur *pa/ir

‘fly’ ‘flee’ ‘fly’ ‘fly, jump’ ‘fly, jump’ ‘fly’

pri J. pir

‘go up’

OS1981

*pir *pVr *fVr pry *pir/pVr *fir

‘fly’ Ug. pr ‘fly’ ‘soar’ ‘soar’ Ag. *fir

‘fly’ Hs. fi:ra Bed. fir

Ar. frr Ang. pi:r

‘flee’ etc.

etc.

The term “Complementary” describes two cases. The first is where connections are proposed between words not treated by the other source, but where these pro-posals are not incompatible in principle. The second case is where both sources take up a word in one branch but connect it with proposed cognates in different branches, as in Table 2, example (2a), below where the same Semitic (Arabic) form is taken up and example (2b) where the same Cushitic (Agaw) form is taken up.

Table 2. Complementary (case 2)

branch A B C D

E x y w

OS x z

Ex. (2a) Semitic Egyptian Chadic Cushitic Omotic

E585

*mak Ar. makk mkw *m-k Ym. makt

‘eat up’ ‘suck out’ ‘food’ ‘gullet’ ‘be hungry’

OS1790

*muk *muk *muk

‘suck’ Ar. mkk Ang. muk

© 2012. John Benjamins Publishing CompanyAll rights reserved

250 Robert R. Ratcliffe

Table 2. (continued)‘suck’ ‘sip’

Ex. (2b) Semitic Egyptian Chadic Cushitic Omotic

E322

*kum Ar. kamm km *kum *kum

‘add together’ ‘assemble’ ‘to total’ ‘multitude’

‘increase’ kmyt Ag. ‘cattle’

‘herd of cattle’

OS1479

*kom W. *kwam Ag. *kim

‘cattle’ ‘cow’ ‘cattle, cow’

C. *kum

‘meat’

E. kwama

‘buffalo’

The term “Contradictory” defines cases where both sources treat the same word in one branch but connect it with a different words or internally non-cognate words in another branch, as illustrated in Table 3.

Table 3. Contradictory

branch A B C D

E x z

OS x b

Ex. (3a) Semitic Egyptian Cushitic

E57

*pax MSA *pxð phr PEC *bax

‘bend’ ‘thigh’ ‘turn’ ‘bow’ (n.)

OS1931

*paxud Mhr. faxed, etc. xpd Som baʔudo

‘leg, thigh’ ‘thigh’ ‘thigh’ ‘hip’

Ex. (3b) Semitic Berber Chadic Cushitic

E88

*fat’ *ptʕ *ps′r *fat′a

‘excrete’ ‘excrete’ ‘urine’ ‘fresh dung’

Ar. fatʕfatʕ

© 2012. John Benjamins Publishing CompanyAll rights reserved

On calculating the reliability of the comparative method at long and medium distances 251

Table 3. (continued)‘drop excrement’

Ar. Fatʕʔ

‘break wind’

Ar. fatʕħ

‘give birth’

OS2003

*putʃ’ Ar. faðʕðʕ Ahg. tafəzʕzʕa Wch. *putʃ′i-ar

‘urine’ ‘horse urine’ ‘urine’ Hs. fits’aarii

‘urine’

Ex. (3c) Semitic Egyptian Chadic Cushitic

E425

*k′eer/k′oor *qr qrt Ng. gəriɗ PSC k′eer

‘cut into’ ‘cut’ ‘depression’ ‘cave’, ‘cut meat’

‘hollow in a tree’

OS1556

*kʕar Ar. qaʕar Ng. kara Dhl. k′eer

‘cut ‘cut trees’ ‘cut’ ‘cut’

Note that Mehri is a Modern South Arabian language. Hence it appears that in example (3a) the same Semitic forms are being referred to, although both the Egyptian and Cushitic forms are different. In example (3b) the Chadic forms seem to agree; the Semitic forms do not. Note that in any case different sound corre-spondences are assumed, making the proposals incompatible in principle. In ex-ample (3c) the South Cushitic form is the same, but the proposed Chadic cognates are different, and the proposed Semitic cognates are probably also different. It is not clear which Semitic words Ehret (1995) has in mind here since the appendix does not list this root. He refers to Ehret (1989), but this publication does not list *qr ‘cut’, although eight other roots *qr with various meanings are reconstructed.

Note that it is theoretically possible for a set to be simultaneously agreeing and contradictory.

Table 4. Agreeing and contradictory

branch A B C

E x y z

OS x w z

Ex. (4) Semitic Egyptian Cushitic

© 2012. John Benjamins Publishing CompanyAll rights reserved

252 Robert R. Ratcliffe

E126

*dagw Ar. daʤʤ dg3, dgs PSC dakw

‘walk about’ ‘walk along’ ‘walk’ ‘be going’

OS619

*dag *dig dg3 Ag. *dig

‘go’ ‘go slowly’ ‘go’ ‘come close’

Ar. dgg HEC *dag

‘come’

These cases are few and I have counted them as agreeing. Thus example (4) in Table 4 is counted as agreeing, since the proposed Semitic–Egyptian cognates agree, although the proposed Cushitic cognates appear not to.

Finally the term “incompatible in principle” describes the case where one source proposes connections between words not treated by the other source, but where the validity of these connections depends upon proposed sound correspon-dences or assumptions about grammatical structure or semantics not admitted by the other source. The categories of agreeing, contradictory, and complemen-tary require no further explanation. But the category of incompatible in princi-ple (henceforth IIP) involves sometimes rather subtle judgment (on which other evaluators might possibly disagree). Hence the basis for making such judgments should be laid out in detail.

3.1 Incompatible in principle

There are three different causes of incompatibility between the sources: incompat-ible sound correspondences, incompatible assumptions about root structure, and incompatible assumptions about semantics. The category accounting for the larg-est number of IIP sets is incompatible sound correspondences. It is worth noting, however, that most contradictory sets are made possible by different assumptions about root structure or semantics.

3.1.1 Incompatible sound correspondencesIf a cognate set depends upon a sound correspondence contradictory with the sound correspondences proposed in the other source it is IIP. The consonant systems reconstructed by the two sources are superficially fairly similar. But the correspondence sets which the reconstructions reflect are sometimes contradic-tory. A system of 42 consonants is reconstructed by Ehret (1995) and one of 33 by Orel & Stolbova (1995). The 11 consonants reconstructed by Ehret but not by Orel & Stolbova (1995) are three nasals (palatal, velar, and labio-velar ɲ, ŋ, ŋʷ),

© 2012. John Benjamins Publishing CompanyAll rights reserved

On calculating the reliability of the comparative method at long and medium distances 253

two sibilants (z and ʃ), a labial ejective (p’), and a labio-velar series of three stops (kʷ, gʷ, k’ʷ) and two fricatives (xʷ, ɣʷ). The two consonants reconstructed by Orel & Stolbova (1995) but not by Ehret are two uvular stops, unvoiced and ejective (q, q’).

This leaves a set of 30 identical or nearly identical reconstructed consonants. For 12 of these there is more or less complete agreement in the proposed corre-spondence sets, as presented in Figure 1:

b t dh�

ʔћ

mr l

w j

Figure 1. Agreeing sound correspondence sets

If some leeway is allowed, the total number of more-or-less agreeing sets can be augmented as follows. Reconstructed /s/ in both sources reflects the same corre-spondence sets, although Ehret obscures this by reconstructing /s/ here for Proto-Semitic, rather than the standard Semiticist /ʃ/ (which Orel & Stolbova 1995 main-tain). On both analyses a p~f distinction is reconstructed on the basis of Egyptian and Chadic with merger assumed in Semitic and (for Orel & Stolbova) Berber, al-though Orel & Stolbova assume a set of mergers and splits in Cushitic and Omotic which perhaps prevent complete agreement. The correspondence sets for the velar stops are also the same (essentially no sound change assumed for any branch) ex-cept for Ehret’s labio-velars, which can be collapsed with the plain velars for com-parison purposes. The dental nasal /n/ corresponds to /n/ in all branches in both approaches. Ehret’s five nasals are based on Cushitic, and supported somewhat by Chadic. The palatal, velar, and labio-velar nasals are assumed to have merged with /n/ in Egyptian and Semitic. These four nasals can thus be treated as a single set for purposes of comparison.

That still leaves 17 proposed correspondence sets (= proposed reconstructed segments) in Ehret (1995) and 14 in Orel & Stolbova (1995). These include all the reconstructed affricates, lateral fricatives, velar or uvular fricatives, and emphatics (ejectives on both reconstructions) other than k’, as well as two sibilants for Ehret and two uvular stops for Orel & Stolbova. Of these only the ejective palatal affri-cate ʧ’ unequivocally reflects the same correspondence set (reflecting Arabic ðʕ). Ehret’s lateral fricative ɬ, however, reflects essentially the same correspondence set as Orel & Stolbova’s (1995) alveolar-lateral affricate tɬ’ (reflecting proto-Semitic ɬ in both cases), with a discrepancy only in Omotic, which is in any case involved in few proposed cognate sets.

© 2012. John Benjamins Publishing CompanyAll rights reserved

254 Robert R. Ratcliffe

This gives a compatible set of 21 reconstructed segments, corresponding with 27 segments in Ehret (1995) as indicated in Figure 2 (with Ehret’s extra segments enclosed in brackets, and Orel & Stolbova’s equivalent to Ehret’s ɬ in parentheses).

p b t d k g k′[ kw gw k′ w ]

f s h(tɬ) ʧ′ɬ

� ŋŋw

m n [ ][ ]

r lw j

ʔ

ћ

Figure 2. Compatible sound correspondence sets

The remaining 15 reconstructed segments in Ehret (1995) and 12 in Orel & Stolbova (1995) represent proposed correspondence sets which cannot be recon-ciled with any proposed correspondent set in the other source:

– Ehret: p', ʃ, ʦ, z, dz, ʧ, ʤ, t', s', tl', dl, ɣ, ɣʷ, x, xʷ– Orel & Stolbova: ʦ, dz, ʧ, ʤ, t', ʦ', ɬ, tɬ', ʁ, χ, q, q'

Entries reconstructed as beginning with these segments account for a total num-ber of 276 entries in Ehret (1995) or roughly 27% of the total of 1,011 entries: p' (18), ʃ (23), ʦ (6), z (28), dz (8), ʧ (6), “ʧ or ts” (13), ʤ (7), “ʤ or dz” (12), t' (22), s' (17), tl' (32), dl (26), ɣ(24), ɣʷ(9), x(16), xʷ(9).

The problematic segments in Orel & Stolbova (1995) account for 450 entries or 17% of the total of 2672 entries: ʦ (41), dz (45), ʧ (35), ʤ (32), t' (48), ʦ' (49), ɬ (28), tɬ' (12), ʁ (24), χ (92), q (23), q' (21). Note that besides /χ/ the average number of supporting entries for each correspondence in Orel & Stolbova (1995) is below the average of 81.

Entries beginning with these segments have been excluded as IIP and account for the largest number of IIP entries (276/555 = 50% of all IIP entries, 27% of all entries). The second largest number of IIP entries consists of proposed cognates which have one of these consonants elsewhere in the word (131 entries or 24% of all IIP entries, 13% of all entries).

3.1.2 Incompatible assumptions about root structureBoth Ehret (1995) and Orel & Stolbova (1995) allow 2-to-3 matches, that is cases where a word with two consonants in one language is proposed as cognate with a word with three consonants in another. However, the justification for this is differ-ent in each case. Orel & Stolbova (1995) allow for the loss of certain consonants, notably sonorants and gutturals (velar fricatives and post-velar consonants), in all environments, and allow for a small number of consonants to be treated as

© 2012. John Benjamins Publishing CompanyAll rights reserved

On calculating the reliability of the comparative method at long and medium distances 255

fossilized prefixes or suffixes. Ehret takes the more radical view that all third con-sonants are fossilized suffixes. This allows him further to recognize 2-out-of-3 matches, cases where the first two consonants in a three-consonant word corre-spond with the first two consonants in a three-consonant word in another lan-guage, although the third consonants do not correspond.

– Permitted by Ehret and Orel & Stolbova (3-to-3 and 2-to-3) ABC ABC ABC AB

– Permitted by Orel & Stolbova, but not by Ehret (2-to-3) ABC ABC BC AC

– Permitted by Ehret but not by Orel & Stolbova (2-out-of-3) ABC ABD

I have regarded proposed correspondences which rely on the second two types of matches as IIP. Since I took Ehret (1995) as the starting point for comparison, only the last type occurred, however. This accounts for 125 cases or roughly 23% of all IIP entries, 12% of all entries.

3.1.3 IIP or contradictory because of semantic assumptionsThe category of semantically incompatible includes two types of problems, first for a word with many meanings or many translation equivalents, the sources may choose to take different ones as basic and search other languages accordingly. For example, for the Arabic root b-k-r, Orel & Stolbova (1995) take the Arabic bakr ‘young camel’ and connect it with a Berber form *bVkVr meaning ‘lamb’ or ‘kid’. Ehret (1995) cites Arabic bakar ‘morning’ and connects it with an Egyptian bk3 ‘to be bright’, Cushitic (Dahalo ɓakkeeð) ‘kindle’, Chadic *b-kǝ ‘roast’, and Omotic *bak ‘star’ (E11 *bâk ‘shine’, OS196 *bakVr ‘young animal’). If one ignores the se-mantics, these entries are complementary. But I have made a judgment that in this case the semantics are too far apart for one source simply to incorporate the other’s proposal. Hence I judge these proposals IIP.

The second problem, not unconnected with the first, is that the authors have different assumptions about what sort of vocabulary to reconstruct for the proto-language. As this example also illustrates, Ehret has tendency to reconstruct verbs wherever possible, while Orel & Stolbova (1995) prefer concrete nouns.

I have used semantic incompatibility sparingly in deciding IIP, only judging 23 examples (4% of IIP entries) as IIP on this criterion. But it is clear that many of the contradictory sets also result from different semantic assumptions, meaning

© 2012. John Benjamins Publishing CompanyAll rights reserved

256 Robert R. Ratcliffe

that the category could be more widely applied (cf. example 2b, above, classed as complementary).

3.2 Measuring the discrepancy: Results

If the comparative method is truly a reliable and consistent procedure even when applied to a language grouping as broad and diverse as Afroasiatic, we would ex-pect the bulk of the entries to be agreeing or complementary. In fact, however, this is far from being the case. Specifically I was able to find only 70 agreeing sets, as against 167 contradictory sets. The full lists of these is given in Appendices 1 and 2. Of Ehret’s (1995) 1,011 entries, 555 are incompatible in principle with Orel & Stolbova (1995), while only 214 are complementary. (There remain another 1,000 entries in Orel & Stolbova 1995 which are either complementary or IIP with Ehret, but which I have not evaluated).

Thus slightly less than seven percent of the cognate sets proposed by Ehret (1995) are also proposed by Orel & Stolbova (1995). Slightly more than anoth-er 21% are complementary. From the reverse perspective, since Orel & Stolbova (1995) have roughly two and a half times as many entries as Ehret (1995), only a little more than two percent of the cognate sets proposed by Orel & Stolbova (1995) are also acknowledged by Ehret (1995). Die-hard opponents of long-dis-tance comparison may leap to the conclusion that the method is 93% to 98% inac-curate even at medium depths, but such a conclusion would be premature. Still, the fact remains that two sets of scholars have been able to reconstruct mutu-ally unrecognizable proto-languages, and this demands an explanation. If Orel & Stolbova’s (1995) reconstructions are correct then Ehret (1995) has hundreds of spurious cognates. Contrariwise if Ehret’s reconstructions are correct, then Orel & Stolbova (1995) must be full of spurious cognates. Of course both sets of re-constructions could be flawed to some degree, in which case both sources would contain a number of spurious cognates.

4. Degree of randomness

From the point of view of the specialist interested in Afroasiatic, a perfectly appro-priate response to this situation would be to undertake to sift through and evaluate each proposed cognate set, rejecting, accepting, or modifying as necessary. Such work is indeed ongoing (see for example Diakonoff & Kogan 1997, Kogan 2002). From the point of view of comparative linguistic methodology, however, the situ-ation raises a question of more general interest: If it is possible for trained and knowledgeable scholars to produce hundreds of plausible, but necessarily spurious

© 2012. John Benjamins Publishing CompanyAll rights reserved

On calculating the reliability of the comparative method at long and medium distances 257

cognates (in one case or the other), what does that tell us about the pitfalls of the comparative method at medium to long range time-depths?

It shows clearly that there is latitude for application of the comparative method and that differences in application may lead to radical differences in result. In oth-er words the number of spurious matches found is to a large extent a consequence of how a researcher conducts the search for cognates. Furthermore it should be possible to calculate the degree of randomness in any comparison, where degree of randomness is defined as the expected number of accidental matches (leaving the term “match” undefined for the moment, as it depends upon the criteria ad-opted by a given researcher), if we know everything about how a researcher has conducted the comparison.

The relevant tool is probability theory. The average expected number of chance occurrences of an event is the probability of the event (P) times the number of tri-als (T). (The probability of a flipped coin coming up heads is 50% or .5. If the coin is flipped 100 times the expected number of chance occurrences of heads is 100 x .5 = 50.) The probability of a match, for example, the probability that semantically equivalent words in different languages will start with the same consonant or with consonants hypothesized to correspond, is a function of the relative frequency of the segments compared (Ringe 1992, Kessler 2001). The number of trials in a com-parative study is the number of comparisons made. In conducting a comparison, there are various assumptions and procedures which a researcher may adopt, which are not inherently illegitimate, but which nonetheless lead to increasing the prob-ability of a match or to increasing the effective number of comparisons made, and hence to increasing the degree of randomness. Let us consider what these are and how they may have affected the results of the two Afroasiatic comparative lexica.

4.1 General Calculations

4.1.1 Probability of a MatchTo begin with the probability of finding any segment in any given position in a word is equivalent to the frequency of the segment in that position in that lan-guage (Ringe 1992, Rosenfelder n.d.). If 5% of the words in a given language start with d- then the probability that the word for ‘dog’ or any other given word will start with d- is 5%, or 1/20. Under the formula of PT, therefore, if one takes 20 words in English starting with d- and compare them with exactly one semantic equivalent in this hypothetical language one would expect to find on average that one of these 20 will also start with d-. The number of words compared is the num-ber of trials. If 100 English words starting with d- are compared, then (on average) five words starting with d- in the hypothetical language can be expected. (Note

© 2012. John Benjamins Publishing CompanyAll rights reserved

258 Robert R. Ratcliffe

that the relevant notion of frequency depends on type counts rather than token counts, i.e. dictionary entries rather than running text).

Without calculating the frequency of each consonant (or vowel), we can calcu-late the average frequency of all consonants (or vowels) in a language by simply di-viding one by the number of consonants (or vowels, respectively) in the consonant (or vowel) inventory of that language. I will use average frequency here for pur-poses of illustration, then briefly consider the effects of variable frequencies. If a cognate is defined as a word containing two or more correspondences, and if both languages contain 20 consonants, each having the same frequency of occurrence in all positions, and correspondences are assumed to be exactly 1-to-1 between the two languages, then the chance of a match on C1 (on average for any of the 20 C’s) is .05 (1/20) and the probability of a match on C1 and C2 is .05 x .05 = .0025 (1/400), and the probability of a match on C1, C2, and C3 is .000125 (1/8000).

– 20 C’s C CC CCC .05 (1/20) .0025 (1/400) .000125 (1/8000)

If the consonant inventories are larger the probability of a chance match is smaller.

– For 25 consonants the probabilities are C CC CCC .04 (1/25) .0016 (1/625) .000064 (1/15,625)

– For 30 consonants the probabilities are C CC CCC .0333 (1/30) .00111(1/900) .0000370 (1/27,000)

– For 40 consonants the probabilities are C CC CCC .025 (1/40) .000625 (1/1600) .000015625 (1/64,000)

If vowel correspondences are also required the probability of a chance match de-creases further. For a five vowel system the chance of a match on the first vowel is (on average) 1/5 or .2 and for a match on two vowels is 1/25 or .04. For a three vow-el system the probabilities are 1/3 (.33) and 1/9 (.11). The possibilities for CVC, CVCV, CVCC, and CVCVC matches are as follows:

– 20C+3V CVC CVCV CVCC CVCVC .000833 .000278 .0000417 .0000139 (1/1,200) (1/3,600) (1/24,000) (1/72,000)

© 2012. John Benjamins Publishing CompanyAll rights reserved

On calculating the reliability of the comparative method at long and medium distances 259

– 20C+5V .0005 .0001 .000025 .000005 (1/2,000) (1/10,000) (1/40,000) (1/200,000)

– 25C+3V .000533 .000178 .0000457 .00000711 (1/1,875) (1/5,625) (1/21,875) (1/140,625)

– 25C+5V .00032 .000064 .0000128 .00000256 (1/3,125) (1/15,625) (1/78,125) (1/390,625)

– 30C+3C .00037 .000123 .0000123 .00000412 (1/2,700) (1/8,100) (1/81,000) (1/243,000)

– 30C+5V .000222 .0000444 .00000741 .00000148 (1/4,500) (1/22,500) (1/135,000) (1/675,000)

Of course the distribution of the phonemes of a language through its words does not usually show a consistent distribution: Some segments are more frequent than others. It should be clear that one is more likely to get a chance match with more frequent consonants than with less frequent ones. What is the range of frequency between the most frequent and least frequent consonants (or vowels or segments) in a language? As far as I am aware there has never been any systematic typological research into this question cross-linguistically, although it is a question of consid-erable theoretical and practical interest. For purposes of illustration, however, I can report my prior calculation of such frequencies for modern written Arabic, based on the 3,775 entries in Wehr (1979). In word-initial position the most fre-quent consonant (/n/) has a distribution (6.99%) just under twice the theoretical average for 28 consonants (3.57%), while the least frequent consonant (/ðʕ/) has a distribution slightly more than one-tenth the average (0.48%).

Taking these figures for purposes of illustration, if high-frequency consonants are about twice as frequent as average, then obviously the probability of finding a chance match on a single high-frequency consonant will be about twice that of the probability of a chance match on a consonant with a distribution close to the average. The probability of a chance match on a sequence of two high-frequency consonants will be four times higher than the probability of a chance match on a sequence of two consonants close to the average distribution. And the probabil-ity of a chance match on a sequence of three high-frequency consonants will be eight times higher than that for a sequence of three consonants close to average

© 2012. John Benjamins Publishing CompanyAll rights reserved

260 Robert R. Ratcliffe

distribution. Similarly if rare consonants are about one-tenth as frequent as the average, then the probability of a chance match on a sequence of one, two, or three rare consonants will be ten-times, 100-times and 1,000-times less likely than the comparable sequence involving consonants close to the average. Chance prob-abilities for the case of a 25-consonant inventory would be as follows:

C CC CCC

High .08(1/12.5) .0064 (1/156) .0000512 (1/1953)

Avg. .04 (1/25) .0016 (1/625) .000064 (1/15,625)

Low .004 (1/250) .00016 (1/62,500) .000000064(1/15,62500)

A number of methodological points emerge from these figures. First, consonants are more important than vowels. That is, insofar as consonant inventories are larg-er than vowel inventories in a language, which seems to be universally the case (Haspelmath et al. 2005), the probability of a chance match on consonants is lower than that on vowels. It is thus safer to ignore vowels than consonants in the initial analysis. The cost (in terms of increased probability of a chance match) of ignoring one consonant out of three in three-consonant words is higher than the cost of ig-noring the vowels. Ignoring vowels increases the probability of a chance match usu-ally by an order of magnitude or less, but ignoring the third consonants when three are present increases the probability of a chance match by two orders of magnitude. This is an important point in the current context, because it relates to the age-old de-bate as to whether Proto-Afroasiatic roots should be reconstructed as biconsonantal or tri-consonantal. Regardless of the merits of either these positions, the search for cognates based on the assumption of biconsonantal C-C or CVC roots has a higher expectation of chance results than does the triconsonantal alternative, and this ex-pectation does not decrease significantly even if vowel correspondences in proposed CVC roots are required. Both Ehret (1995) and, less consistently, Orel & Stolbova (1995) reconstruct to a CVC target, rather than to a C-C-C or CvC(v)C target.

Second, insofar as distribution of consonants is more or less even through the languages compared, chance matches are less likely between languages having large consonant inventories than for those with small ones. This should ensure less randomness in Afroasiatic, where consonant inventories tend to be large.

Third, where distribution is uneven, chance matches between statistically in-frequent consonants are less likely than for frequent ones.

4.1.2 Increasing the probability of a (chance) matchThere are a number of steps a comparatist might take, which while not inherently illegitimate, have the effect of increasing the probability of a chance match, and allowing more randomness into the results. The two most obvious of these are

© 2012. John Benjamins Publishing CompanyAll rights reserved

On calculating the reliability of the comparative method at long and medium distances 261

allowing many-to-one correspondences and systematically ignoring some feature of phonological structure.

4.1.2.1 Many-to-one and many-to-many correspondences. Since mergers and splits do occur, it is not inherently illegitimate to allow many-to-one or many-to-many correspondences, but this increases the probability of a chance match. For example, under the scenario above of a language with 20 consonants the chance of finding a match on one of them is 1/20 (.05). But if this consonant inventory is dis-tributed through four places and five manners of articulation, and the researcher allows a consonant in one language to match with any consonant at the same place of articulation in the other, then the chance of a match is 1/4 (.25). Here the pos-sibility of a match on both one and two is 1/16 (.0125) and of one, two and three is 1/64 (.0156). If we were to take 20 words in English starting with d- and compare them with exactly one semantic equivalent in this hypothetical language we would expect to find on average that five (20 times .25) of these 20 would start with a dental of some kind.

4.1.2.2 Ignoring features of structure. Another way to increase the probability of a chance match is to ignore certain features of phonological structure. This too can be perfectly legitimate. For example, because developments affecting stress or tone can be complicated, a researcher may choose to ignore them in an initial stage of comparison. In fact ignoring such features has a low cost in terms of increased probability of a chance match because such features generally have few values, for stress either stressed or unstressed, for register tone, usually either high or low. As already noted, for essentially the same reason, ignoring vowels is less costly than ignoring consonants.

Another way to increase the probability of a chance match is to treat as cognate cases where only two segments (out of three or more in the word) correspond. I have already noted in Section 2.1.2 that both Orel & Stolbova (1995) and Ehret (1995) do this to some extent and in different ways. Since loss of segments is not at all unusual, and fusion of an affix with a stem is possible, there is nothing inherently wrong with allowing for this possibility, but it increases the probability of a match, and hence the degree of randomness, considerably. The issues which are particu-larly relevant here are the costs of allowing 2-out-of-3 (ABC = ABD) and one type of 2-to-3 (ABC = AB) as in Ehret (1995), and the cost of allowing several types of 2-to-3 matches (ABC = AB, ABC = AC, ABC = BC) as in Orel & Stolbova (1995).

The chance of finding a 2-out-of-3 match for the first two consonants (ABC = ABD) is no different from the chance of finding a 2-to-3 match (AB = ABD). In either case C3 is ignored so the chance of finding a match on C3 is 100%. For 25 consonants, the average probability is .04 x .04 x 1.0 = .0016. This is of course also

© 2012. John Benjamins Publishing CompanyAll rights reserved

262 Robert R. Ratcliffe

the same as the probability of finding a 2-to-2 match (AB = AB). What changes with 2-out-of-3 matching is not the probability but the number of trials. The num-ber of potential comparisons that can be made increases. I will elaborate on this in the following sub-section.

If multiple types of 2-to-3 matching is allowed, as in Orel & Stolbova (1995), then for a two-consonant stem in one language C1 can be compared with either C1 or C2 in the other, and C2 can be compared with either C2 or C3. In the first case the chance that the first consonant will match either C1 or C2 is the frequency of C1 plus the frequency of C2. In the above case with 25 consonants evenly dis-tributed this would be

.04 + 04 = .08 (or 1/12.5).

Multiplying this by the probability that the final consonants will also match gives the probability of finding either an AC = ABC or BC = ABC match. Again in the case of 25 consonants this would be

08 x .04 = .0032 (1/312.5).

If C2 is also allowed to match either C2 or C3 the probability of a match on one consonant is again, in the hypothetical case, .08 and the probability of a chance match of either AB = ABC, BC = ABC or AC = ABC is

.08 x .08 = .0064 (1/156.25).

The point here is that all types of 2-to-3 or 2-out-of-3 comparison allow a prob-ability of a chance match two orders of magnitude higher than that for a 3-out-of-3 match. Thus under the formula of PT, if the same number of comparisons are made, at least a hundred times more spurious cognates are expected under the as-sumption of 2-to-3 or 2-out-of-3 matches than under the assumption of strict 3-to-3 matches. Of course this is mitigated somewhat by the fact that there are always fewer possible two-consonant words than three-consonant words in any language.

4.1.3 Increasing the number of trialsThe number of trials in a comparative study is the number of comparisons made. If we accept only 1-to-1 comparison, each word in language A compared only with its exact semantic equivalent (assuming it exists) in language B, then for two 1000-word word-lists or dictionaries containing the same entries there will be 1000 tri-als. Assuming all the words in the list have three consonants and we only accept as cognate those words in which all three consonants correspond, the probability of a chance match, in the example case of 25 consonants equally distributed, is .000064, and the average expected number of chance resemblances or spurious “cognates” is .000064 x 1000 = .064, i.e. less than one. The chance of finding k matches in

© 2012. John Benjamins Publishing CompanyAll rights reserved

On calculating the reliability of the comparative method at long and medium distances 263

n trials is given by the binomial equation: (n*pk*(1−p)n−k)÷*(k!(n−k)!.5 By this equation the chance of finding no chance matches (k = 0) under these conditions is .938, or just under 94%; the chance of finding one is .06 and the chance of finding more than one falls under .002 (or two chances in 1,000). If however, we allow the third consonant to be ignored, as in Ehret (1995), then the probability of a chance match is .0016, the expected number of chance matches is 1.6, i.e. around two. The chance of finding five or more matches is approximately .024 (or about one in 42) and the chance of finding six or more is .0062 (or about one in 160). Statisticians normally define the threshold of statistical significance as p < .05 or p < .01, i.e. only a result with a chance probability of more than one in 20 or more than one in 100 is regarded as significant. We can see that these thresholds are crossed in this case with as few as five or six matches, respectively.

The expected number of chance matches increases dramatically, however, if semantic leeway is allowed into the comparison. Again since semantic shifts do occur, it is not inherently illegitimate to allow for them. But if, for example, one compares each word in the hypothetical 1,000 word list above not with its exact semantic equivalent only but with a range of ten semantically close terms in the other word list, the number of comparisons then increases to 10,000, with the ex-pected number of chance matches also increasing ten-fold. Thus under the hypo-thetical case elaborated above, 0.64 3-to-3 matches and 16 2-to-3 matches are now expected. If the semantic range is 100 the number of expected chance “cognates” again increases ten-fold, with 6.4 3-to-3 and 160 2-to-3 matches expected in the example case. Note that as the average number of expected chance matches in-creases, the number of matches needed to cross the threshold of statistical signifi-cance also increases, but at a higher rate. Hence while in the preceding paragraph two 3-to-3 and six 2-to-3 matches were judged to be statistically significant at the p < .01 level, when a 1,000 comparisons were made, with 10,000 comparisons this threshold is crossed with four and 26 matches, respectively.6

How much semantic range do comparatists normally allow for, and how much have Ehret (1995) and Orel & Stolbova (1995) allowed for? Researchers rarely spell out how they have gone about searching for cognates, how many comparisons they have made before they have found anything. But often this can be deduced from the results. One interesting cause and effect relation is that broad semantic leeway in making the comparison leads to the reconstruction of forms with vague seman-tics and also not unusually to the reconstruction of a large numbers of synonyms.

This can be illustrated by example. Suppose one compares the word for ‘bird’ in one language not only with the equivalent for ‘bird’ in the other language, but with the words for ‘sparrow’, ‘pigeon’, ‘vulture’, etc. If there are 50 “bird-words” in each language, and one compares each of these in one language with the fifty in the other, there is a total of 2500 (50 x 50) comparisons made. Of course if one finds

© 2012. John Benjamins Publishing CompanyAll rights reserved

264 Robert R. Ratcliffe

apparent cognates on the basis of assuming semantic equivalence between pairs like ‘swallow’~‘hawk’, ‘parrot’~‘quail’, ‘kite’~‘ostrich’, ‘butterfly’~‘pelican’, etc. one can only reconstruct a superordinate term like ‘bird’ or ‘kind of bird’. If several such matches are found, the researcher ends up reconstructing several synonyms for ‘bird’.

This example may sound exaggerated, but in fact it seems to be exactly what Orel & Stolbova (1995) have done in this particular case. The index under ‘bird’ indicates 52 sets reconstructed with the meaning ‘bird’. This includes proposed cognate sets with the following meanings.

Table 5. Orel & Stobova (1995) ‘bird’ words

No. Set

10 Sem., Eg. ‘kind of bird’, ECh. ‘duck’

320 Eg. ‘duck’, ECh. ‘hen’

356 Eg. ‘falcon’, CCh. ‘vulture’, ‘hen’, ECh. ‘great bustard’, Ag. ‘kind of bird’

397 Sem. ‘swallow’, Rift ‘hawk’

432 Sem. ‘sparrow’, Ch. ‘guinea fowl’

443 Eg. ‘kite’, Ch. ‘parakeet’

714 Ch. ‘guinea fowl’, Rift ‘stork’

736 Sem. ‘thrush’, Ch. ‘kite’

748 Sem. ‘parrot’, WCh. ‘quail’

1261 Sem. ‘kite’, ECh. ‘ostrich’

1505 Sem. ‘crane’, CCh. ‘dove’, ‘falcon’

1539 Eg. ‘cuckoo’, WCh. ‘rooster’, CCh. ‘hen’

1598 CCh. ‘hawk’, ECh. ‘dove’

2072 Eg. ‘goose’, HEC ‘crow’

2090 Sem. ‘crane’, ECh. ‘vulture’

2190 Berber ‘butterfly’, ‘small bird’, Ch. ‘guinea fowl’, Bed. ‘pelican’

It is quite possible that there are several valid cognate sets among these 52. Certainly the material provides an interesting starting point for a future mono-graph on Afroasiatic birds (along the lines of Indo-European trees, Friedrich 1970). However it is not plausible that Afroasiatic had 52 synonyms for the generic term ‘bird’ and that these later differentiated in the individual languages. Natural languages do not allow such a high degree of synonymy. The reconstruction of synonyms is an artefact of method — specifically of allowing broad semantic lee-way in making comparisons.

This kind of semantic leeway is a consistent feature of the Orel & Stolbova (1995) study. I found that on average Orel & Stolbova (1995) reconstruct ten items

© 2012. John Benjamins Publishing CompanyAll rights reserved

On calculating the reliability of the comparative method at long and medium distances 265

for each word in the 100 word Swadesh list, the highest number being the 52 for ‘bird’. In trying to determine how many comparisons have been made, the highest number found rather than the average should be taken as a guide. If 50 matches were found, at least 50 items were looked at (although admittedly not necessar-ily in any one language). If ten items were found, more than ten must have been looked at. There are roughly 1600 entries in Orel & Stolbova’s (1995) appendix, and 2,672 items reconstructed, many of them synonymous. We might assume that these figures reflect word lists of about 2,000 words. If 2,000 words are compared, with each word in one language tested against 50 semantically similar ones in the other, then the total number of trials or comparisons is 100,000. If all the compari-sons were 2-out-of-3, and the number of consonants in each language compared is 25, then based on the probabilities calculated above, 640 chance 2-of-3 matches are expected, and if all matches are 3-of-3 then 6.4 chance matches are expected.

These calculations are based on the assumption that comparison involves two languages, or two branches. Another way to increase the number of comparisons, and hence the average number of chance matches, is to directly compare languages of different branches without attempting a reconstruction to the branch level. This is equivalent to reconstructing a variety of synonyms for the proto-language of the branch. Take the example of words for ‘hen’ in Chadic. The Orel & Stolbova (1995) entries under ‘bird’ include 25 words for ‘hen’ in 21 Chadic languages (Kwang, Kera-2, Gisiga, Sumray, Mofu, Gude, Munjuk, Musgum, Sibine, Bolewa-2, Dera, Tangale-2, Pero, Ngamo, Gulfey, Sura, Angas, Montol-2, Nanchere, Kabalay, Mokilko) with four languages apparently having two synonyms (although the two Montol words kier and kiyǝ‚ analyzed as going back to different proto-Afroasiatic etyma, no. 1593 and no. 1598, respectively, look suspiciously like mere transcrip-tional variants by different field workers). These are grouped into 13 cognate sets, reflecting five words for ‘hen’ reconstructed for proto-East-Chadic, five recon-structed for proto-Central-Chadic, and four reconstructed for proto-West-Chad-ic. In two cases, the Central Chadic and East Chadic sets are grouped together as cognates, but without any suggested proto-Chadic reconstruction. If this example is typical it means the number of comparisons, and hence the number of expected chance correspondences, between Chadic and any non-Chadic language or proto-branch language should be multiplied by 13.

Of course this example may not be typical. Many of the words taken back to different etymons look similar, suggesting that there may be considerably fewer than 13 distinct etymons here. Nonetheless there are a number of quite distinct items here, and this degree of lexical diversity within Chadic is by no means atypi-cal. Jungraithmayr & Ibriszimow (1993: xiii) observe that even between Hausa and Tangale (both West Chadic) only 20% of the Swadesh list vocabulary is cognate and between Sura and Migama (West and East, respectively) 26%. Given the great

© 2012. John Benjamins Publishing CompanyAll rights reserved

266 Robert R. Ratcliffe

divergence among Chadic languages, and that Orel & Stolbova (1995) maintain a separate comparison on the basis of East, Central, and West Chadic we should as-sume that at least three different comparisons have been made and that in most cases (roughly 80%) etymologically distinct items are involved. This would mean 240,000 to 300,000 trials, resulting in 1,536 to 1,800 chance 2-out-of-3 matches and 15 to 18 3-to-3 matches. Note that in the actual case of ‘hen’ above only two of the thir-teen comparisons involve a 3-to-3 match (entries 301 with Egyptian and 943 with Oromo), but one of these (943) involves a reduplication of the first consonant and looks suspiciously like an onomatopoeia. Note too that only two of the ‘hen’ com-parisons (and neither of the 3-to-3) involve comparison with more than one non-Chadic branch (356 involving Egyptian and Cushitic and 1,539 involving Egyptian

Table 6. Chadic ‘hen’ in Orel & Stolbova (1995)

No. Cognates Branchesa

301 ECh. *bwagur, Kwang bogor-to, Kera dəbərgə, compared with Eg. bd3 ‘duck’

1–3/3–2

356 CCh. *bwak, Gisiga bokoy with ECh. ʔabuka ‘great bustard’, Eg. byk ‘falcon’, Agaw bik ‘kind of bird’

2–2/2&2/3–1

748 ECh. *dur, Sumray dure:, with WCh. durwa ‘quail’, Sem. durr ‘parrot’ 1–2/3–1

943 CCh. *gwa-gwar, Mofu gwagwar, ECh. *gu-gur, Kera gu-gur, with LEC Oromo gogorrii ‘guinea fowl’

1–3/3–2

965 CCh. *gVya, Gude gyagya, with Eg. d-wy.t ‘kind of bird’ 1–2/3–1

988 CCh. *yVgur, Munjuk yugur, Musgum yugur, igur, ECh. *gurVy, Sibine gəray, with Eg. gry ‘poultry’

1–2/3–3

1088 WCh. *Hyabi, Bolewa yawi, Dera ya:we, Tangale yabe, Pero yabe, Ngamo yabi, with Eg. ʕbw ‘kind of bird’

1–1/3?–5

1478 WCh. *kwam, Tangale kom, Bolewa kom, with Sem. kumVy ‘water-fowl’

1–2/3–2

1539 CCh. *kwak, Gulfey kwaku, with ECh. Bid. keeke ‘bird’, WCh. Fyer ‘rooster’, Eg. kk ‘cuckoo’, Sem. kakVy ‘bird’

2–2/2&2/3–1

1593 WCh. *keyar << kewari, Montol kier, with CCh. Mofu kwerekwere ‘duck’, Sem. kariʕ ‘kind of bird’

1–2/3–1

1598 WCh. *keway, Sura kwɛɛ, kyɛ, Angas ki, Montol kiy, with ECh. *kway ‘bird’, CCh. *kuy ‘hawk’, Eg. *key ‘bird’

1–2/2–3

2443 ECh. *tur, Nanchere turoba, Kabalay turo, with CCh. Gisiga turo ‘partridge’, Sem taaʔir

1–2/3–2

2494 ECh. *ʔwas, Mokilko ʔosso, with Eg. (MK) wʃ3.t ‘poultry’ 1–2/3–1a Figures in this column indicate number of branches outside of Chadic for which a cognate is proposed/type of match/number of distinct Proto-E, W, or C Chadic etymons proposed by Orel & Stolbova (1995) to underly the attested forms.

© 2012. John Benjamins Publishing CompanyAll rights reserved

On calculating the reliability of the comparative method at long and medium distances 267

and Semitic). Six comparisons are based on some Chadic language and Egyptian only; four comparisons are based on some Chadic language and Semitic only; and one comparison is based on some Chadic language and Cushitic (Oromo) only. Given that Orel & Stolbova (1995) also do not reconstruct Proto-Cushitic, but treat it as seven independent sub-branches, the same methodological problem arises here.

Another way to increase the number of comparisons, in effect another way to broaden semantic leeway, is to ignore derivational and etymological history. If one considers not just the most basic or etymologically oldest sense of a word but secondary semantic shifts, dialect variant meanings, derived words no longer closely semantically connected with the source, etc., one can increase the number of comparisons, and hence the degree of randomness. Conversely, a careful use of etymological and derivational history to restrict the number of comparisons can reduce the degree of randomness.

This too can be best illustrated by example. Suppose someone looks up a word in an Arabic-English bilingual dictionary, takes all the English equivalents, and looks them up in an English–Somali dictionary (a not unrealistic description of what a comparatist might actually do), then the number of trials is a factor not of the number of Arabic words being compared, but of the number of English trans-lation equivalents. Since Arabic dictionaries are traditionally panchronic, incor-porating material from a variety of sources, time periods, and places, without any indication of provenance or date of attestation, each word frequently has a variety of meanings and a variety of English translations. Further, since Arabic is a mor-phologically rich language, in which exploitation of native derivational resources rather than borrowing has been the preferred means of expanding the lexical stock, each root entry typically contains a variety of deverbal nouns, denominal verbs, secondary derived verb forms, lexicalized participles, etc., which have acquired distinct semantics from the root. This observation applies most emphatically to Steingass’s Arabic dictionary which Ehret (1995) has used as his primary source for Semitic. Note Kaye’s (1996: 280) observation “It was not a reliable dictionary in 1884, and to use it today does not make much sense.” Ehret’s declared reason for using this dictionary is revealing:

Definitions of Arabic words are commonly cited here from Steingass (1884), not because his is a very good dictionary, but because his meticulously detailed gloss-es are especially useful in the comparative semantic analysis. (Ehret 1995: 73)

Based on analyzing every 200th page (pp. 200, 400, 600, 800, 1,000, and 1,200) in Steingass’s dictionary, I calculated the following averages:

– avg. no. of Arabic words/page: 23– avg. no. of three-consonant root entries/page: 8

© 2012. John Benjamins Publishing CompanyAll rights reserved

268 Robert R. Ratcliffe

– avg. no. of Eng. translation equivalents per/word: 5.2– avg. no. of Eng. translation equivalents for all words under a root entry: 13.8

Since the dictionary has more than 1,200 pages, if these averages hold good, there should be 9,600 root entries in the dictionary. If only the first or most basic meaning of each of these is taken and compared against its translation equivalent in Somali or any other language, then a maximum of 9,600 trials have been made. In the hy-pothetical case of 25 consonants evenly distributed, this should yield an average of 0.61 3-to-3 chance “cognates”. (In other words a chance cognate should be found for two out of every three languages compared.) It should also yield for any language about 15 cases of chance matches on just the first two consonants. If a range of five meanings is taken into account for each Arabic entry the number of trials increases five times, and hence the number of expected chance matches increases to three (3-to-3) and 77 (first two only). If a range of 14 meanings is considered, the expected number of chance matches increases to nine and 215, respectively.

If only the first two consonants are considered as constituting an entry, as un-der Ehret’s root theory, then the range of translation equivalents increases dramat-ically, thereby increasing the semantic range associated with each two-consonant “root”. But since the total number of Arabic root-entries will decrease proportion-ally this step will not normally increase the number of trials (since we are in effect defining the number of trials as the number of Arabic entries times the number of translation equivalents). Yet the total possible number of initial two consonant sequences in Arabic is only 600 (Total possible number of 2C combinations in Arabic is 282 = 784 minus 184 excluded homorganic combinations = 600), and not all actually occur. Thus under the assumption of the biconsonantal root and a liberal use of a source like Steingass, one can conservatively expect that about 36% (215/600) of putative Arabic “biconsonantal” roots will have purely chance “cognates” in any language in the world for which a suitably large English–to–x dictionary exists.

Under this way of increasing semantic leeway, too, one might expect semanti-cally vague reconstructions and many synonyms. This is exactly what is found in Ehret (1995). There are 84 instances of synonymous entries, reflecting 286 re-constructions. There are also many near synonyms, e.g. ‘be moist’ (four entries), ‘be wet’ (another four), ‘be very wet’ (one), ‘become wet’ (four), ‘become thor-oughly wet’ (one); or ‘raise’ (seven), ‘rise’ (20), ‘rise up’ (two), ‘rise above’ (one). Interestingly, in complete contradistinction to Orel & Stolbova (1995) the vast majority of these (65 of 84) are verbs. There are only 15 synonymous noun en-tries, and no reconstruction at all for “hen” or “bird” or most of the nouns in the Swadesh list. One strongly suspects this bias reflects the traditional organization of Arabic dictionaries, where a verb stem is always given as the head entry, even

© 2012. John Benjamins Publishing CompanyAll rights reserved

On calculating the reliability of the comparative method at long and medium distances 269

where a noun of the same root is etymologically primitive. If so it indicates a use of the Arabic sources much in the way outlined above.

The question raised at the outset can now be answered in general terms at least. Both Ehret (1995) and Orel & Stolbova (1995) have adopted assumptions which have insured a high probability of a chance matches and have used sources in such a way as to insure a high number of comparisons. The two things together insure a high number of spurious “cognates” which are in fact mere accidental re-semblances — a high degree of randomness. Yet because the assumptions adopted and the use of sources is slightly different, the degree of randomness extends, as it were, in a different direction in each case.

5. Implications and Prospects

The essential paradox of long and medium range comparison can be stated as follows. Over time words are lost from languages, obscuring evidence of cognate-ship. Genuine cognates that remain are subject to semantic shifts which may make them difficult to recognize. At the same time loss of segments, fusion of segments, and odd changes like metathesis make regular sound correspondences increasing-ly difficult to identify. In order to compensate for these factors, a comparatist may relax the criteria for a correspondence, and broaden the search for cognates by allowing more and more semantic leeway. The result of this, however, is necessar-ily to allow in a high degree of “noise” or purely chance matches, until eventually anything found becomes statistically meaningless. Thus the more we adapt our methodology to deal realistically with what actually happens in language change, the less realistic our reconstructions are likely to be.

One consequence of this logic is that the goal of a comparative methodology should not be to increase the number of proposed cognates by exploiting ever more ambitious and imaginative search strategies. Rather we should be focusing on ways to constrain the search strategy so as to find fewer (spurious) cognates. Respecting the Neogrammarian principle of regular sound change is one way to constrain the search parameters. It is, of course, not literally the case that “sound change is regular and admits of no exceptions”. At best regular sound change is a strong statistical trend. But by assuming that it is absolute, we exclude the possibil-ity of one-to-many or many-to-many sound correspondences, except those that can reasonably be attributed to merger or environmentally conditioned split, and thereby restrict the search space. At a minimum this is what is meant by the tra-ditional comparative method. But the discrepancy between the Afroasiatic lexica suggests that this minimal standard is not sufficient to ensure reliable results, and that further constraints on search methodology are necessary.

© 2012. John Benjamins Publishing CompanyAll rights reserved

270 Robert R. Ratcliffe

In particular, four issues stand out as major sources of randomness in one or both lexica: absence of reconstruction of intermediate nodes; use of data from well-documented languages without etymological analysis; broad and/or highly abstract lexico-semantic reconstruction; and reconstruction of Afroasiatic root morphemes as primarily biconsonantal CVC. While each of these points might be criticized from the point of view of traditional notions of best practice and uni-formitarian assumptions, each of these is also to some extant justifiable either on theoretical grounds or on purely pragmatic considerations of the still preliminary nature of comparative Afroasiatic research. It is therefore worthwhile considering briefly the methodological consequences of each in terms of increased degree of randomness and further considering how these consequences can be mitigated in future research.

5.1 Use of data from many languages of a single sub-branch without reconstructing at the sub-branch level

Comparison across branches in the absence of intermediate node reconstruction is notably a problem that relates to Orel & Stolbova (1995). Ehret (1995) apparently cannot be criticized for this failing, although it is difficult to be sure, since he does not publish the evidence upon which his branch level reconstructions are based. As noted above, this procedure allows for a choice of potential cognates among many languages of a given sub-branch, increasing the number of comparisons and hence the degree of randomness. The solution is, of course, more attention to re-construction at the small family and sub-branch level. Of course, it is sometimes the case that a peripheral language within a sub-group retains a conservative form, while the majority of the languages in the sub-group have innovated together. Reconstruction purely on the basis of the internal information of the sub-group would probably lead us to reconstruct the majority (innovative) form and to ig-nore the actually conservative peripheral form in these cases. This is a consequence which simply has to be accepted in the preliminary stages of comparison, in the same way that the Neogrammarian principle of regular sound change will initially force us to ignore some legitimate cognates, for example place names or personal names, which have not changed according to the regular sound laws.

In the years since the two Afroasiatic lexica appeared, considerable progress has actually been made in lower-level reconstruction. For Chadic there is now Jungraithmayr & Ibrisimov’s (1993) sifting of Chadic lexical roots, which actu-ally appeared before the two sources under discussion, although neither team ap-parently incorporated it. For the Angas-Sura subgroup of West Chadic there is now Takác’s (2004) comparative dictionary. For Semitic Cohen et al.’s (1994–2012) dictionary of Semitic synonyms has now reached ten fasicles. Militarev & Kogan

© 2012. John Benjamins Publishing CompanyAll rights reserved

On calculating the reliability of the comparative method at long and medium distances 271

(2000, 2005) have also produced two volumes of a comparative Semitic etymologi-cal dictionary organized on semantic principles. For Berber there is Kossmann’s (1999) reconstruction and for South Cushitic Kiessling (2002). For Agaw (Central Cushitic) there is Appleyard’s (2006) comparative dictionary.

5.2 Panchronic use of the sources, choosing meanings from various time depths

Panchronic use of sources relates especially to Ehret’s use of Arabic data. But possi-bly Orel & Stolbova (1995) have also followed this practice in their use of Egyptian data. (See the entries under ‘water’, where several different items are reconstructed for Proto-Afroasiatic based on different stages of Egyptian: 1860 *ni, Egyptian (Middle Kingdom) nwy.t, n.t; 1877 *nin or *nun Egyptian (Pyramid texts) nnw; 2586 *yawin, Egyptian (Greek papyri) i’wny ‘waters’.) This procedure too leads to an increase in the number of comparisons that can be made.

Arabic and Egyptian have a special status within Afroasiatic, Egyptian because it has a longer documented history than any other language in the group (or in the world for that matter, except possibly Chinese), Arabic because it is far and away the most thoroughly documented Afroasiatic language. But this breadth and depth of attestation presents pitfalls if attention is not paid to etymology.

To illustrate the problem by an analogy from archeology, the historical lin-guist who works with a dictionary like Steingass (1884), or most Classical Arabic dictionaries is in the position of an archeologist presented with a truckload of artefacts stripped from a variety of sites and historical strata and thrown together at random. Without the crucial information about provenance and time depth, it is difficult or impossible to interpret the items in a historically meaningful way. A major desideratum of the field is an etymological dictionary of Arabic. In the meantime the solution might be to work with existing etymological dictionaries of languages like Hebrew (Klein 1988) and Ge’ez (Leslau 1987) and the recent comparative Semitic lexica referred to above and to avoid using Arabic meanings which can not be confirmed elsewhere in Semitic.

5.3 Broad semantic reconstruction

Both sources reconstruct a large number of homonyms and synonyms. About 40% of the entries in Orel & Stolbova (1995) are part of a group of synonyms, as are about 28% of those in Ehret (1995), not counting near synonyms. In other words only 60% and 72%, respectively, of the reconstructed items represent the unique item reconstructed for that meaning. There are also a large number of homonyms. I calculated that in Orel & Stolbova (1995) 858 reconstructed items or 32% of the

© 2012. John Benjamins Publishing CompanyAll rights reserved

272 Robert R. Ratcliffe

total are part of a group of homonyms. The most prolific of these is *mar, which is given as the reconstruction for 14 distinct cognate sets. Ehret also reconstructs a high degree of homonymy for Proto-Semitic, as he acknowledges:

The analytical groupings of roots distinguished by this approach reveal more sharply than ever before the pervasive polysemy and homonymy of a large pro-portion of Semitic verb roots … it becomes clear that in pre-proto Semitic, be-cause of the loss of stem vowel distinctions in verbs, a great many previously dis-tinct roots had fallen together. (Ehret 1995: 2)

Ehret attempts to avoid this at the Afroasiatic level by reconstructing distinctive vowels and tones for otherwise identical two-consonant stems.

Reconstruction of excessive synonyms and homonyms is generally considered inconsistent with best practice. Presumably this reflects uniformitarian assump-tions about what sort of thing is possible in actual languages. For functional rea-sons a language can only tolerate a certain amount of synonymy and homonymy, since homonymy creates ambiguity, which interferes with communication, and synonymy, which is a form of redundancy, imposes unneccesary burdens on memory. However, exactly how much synonymy and homonymy languages toler-ate has never been systematically studied cross-linguistically, and it is difficult to see how this even could be done, although Krupa (1968) presents an interesting attempt to quantify synonymy for one language, Maori.

But the methodological implications of a high degree of synonymy and hom-onymy in a reconstruction is clear. Excessive synonymy results from allowing a broad range of semantic equivalence. This also increases the number of compari-sons made, and hence the degree of randomness. Excessive homonymy results from ignoring features of phonological structure (usually third consonants). For example in Orel & Stolbova (1995) we find Arabic qarn ‘horn’, qaar-at ‘hill’, qarra ‘be cold’, and qaraʔa ‘read’, reconstructed as *qar ‘horn’, *qar ‘hill’, *qar ‘be cold’, and *qar ‘shout’, respectively, with the extra segmental material which distinguishes the words reconstructed out. Ignoring this material increases the probability of finding a match, hence the degree of randomness. Thus the following correlations can be set up:

– broad semantic leeway = increased number of trials = reconstruction of syn-onyms

– ignoring phonological material (vowels, third consonants, etc.) = increased probability of a match = reconstruction of homonyms

Thus the number of homonyms and synonyms that a researcher reconstructs is an important clue to understanding how the researcher conducted the initial com-parison, and where randomness or error might be expected in that researcher’s

© 2012. John Benjamins Publishing CompanyAll rights reserved

On calculating the reliability of the comparative method at long and medium distances 273

results. Probabilistic reasoning supports traditions of best practice on this point. Obviously, constraining the range of semantic leeway in the initial comparison is the best way to avoid this problem, although it may be premature to expect any consensus on how this can best be accomplished.

5.4 The biradical hypothesis and morphological reconstruction

As already noted, both sources make extensive use of the biradical hypothesis, re-constructing predominately CVC roots for proto-Afroasiatic. As a rule this means reconstructing lexical relationships in the attested languages as going back to deri-vational relationships in the proto-language. (In at least one case Orel & Stolbova 1995 also reconstruct a derivational relationship — an Arabic singular-plural pair in nos. 1568 and 1589 — as going back to lexical ones in Proto-Afroasiatic.)

The biradical hypothesis has a venerable history in Afroasiatic studies, and a full discussion of it is beyond the scope of the present paper. It is grounded in two types of observations. First in Semitic languages, especially in Classical Arabic, one frequently finds sets of words which share two consonants and some element of meaning, e.g. Classical Arabic qatʕaʕ ‘cut’ and qatʕtʕ ‘cut, clip’. Second, early students of “Hamitic” languages recorded an impression that CVC roots predominate in these languages. On the first point Zaborski (1991) has cogently argued that an ex-planation in terms of primitive biradicals is neither the only possible nor even the most plausible explanation. Analogy, sound-symbolic attraction, and inter-dialec-tical borrowing can explain the observed patterns. With regard to the second point, initial impressions have largely been overturned by further research. Jungraithmayr (1983) argues on comparative Chadic grounds that in those Chadic languages where mono- and bi-consonantal stems appear to be frequent this is most likely to reflect an innovative state (a result of consonant loss due to sound change) rather than a conservative or “primitive” state directly reflecting proto-Chadic.

There are also uniformitarian grounds for skepticism about Ehret’s particular version of the biradical hypothesis. Ehret reconstructs a proto-language in which the basic vocabulary consists of polysemous verbal roots with general meanings, while verbs with more specific meanings and almost all nouns are derived by suf-fixation. Further, all consonants in Ehret’s Proto-Afroasiatic can serve as suffixes. My impression is that in living languages non-derived words are specific and con-crete, and derived words are more likely to be abstract. I also have the impression that the affixational morphology of languages tend to exploit a relatively small and unmarked subset of the consonant inventory of the language rather than the whole inventory. This objection may be weak, of course, in the absence of a systematic typological study of these questions.

© 2012. John Benjamins Publishing CompanyAll rights reserved

274 Robert R. Ratcliffe

Finally, reconstructing a system of CVC roots plus affixes raises a number of problems in explaining how one gets from this system to the actual morphologi-cal systems of the attested languages. The pattern-based morphology of Semitic presupposes (one might even say imposes) triconsonantal roots. With CVC stems one can have at most a minimal system of ablaut or apophony but no real system of morphological patterns. If one reconstructs pattern-based morphology for proto-Afroasiatic, then triconsonantalism necessarily follows. If one does not, then the way in which this system emerged in Semitic must be explained. Another problem involves the well-known homorganic co-occurrence restrictions on Semitic roots, found to some extent elsewhere in Afroasiatic (Greenberg 1950, Bender 1978). If each consonant could have functioned as a suffix, then each suffix would also have to have had at least one allomorph at a different point of articulation. A complex system of dissimilation rules would be needed to account for the observed distri-bution. In short it seems impossible to separate issues of lexical and phonological reconstruction from issues of morphological reconstruction.

While the question of the inherent plausibility of the biradical hypothesis is complex and difficult to resolve, the methodological consequences of fully exploit-ing this hypothesis in the preliminary stage of cognate search are clearly disas-trous in terms of allowing a high degree of chance results into the reconstruction. Ehret’s justification for his use of the hypothesis is revealing:

With respect to triconsonantal roots in Semitic, a[n] … explanation of the third consonant as lexicalized pre-proto-Semitic suffixal morphemes has now been put forward (Ehret 1989) … It has been applied here without apology because, quite simply it works. (Ehret 1995: 2)

As the above calculations have shown, such a procedure should indeed work quite well as a way of generating chance matches.

When considering in general the problem of biradicalism vs. triradicalism, the following calculations should be taken into account. The ratio of possible two-consonant roots to three-consonant roots is x2 :: x3, where x is the number of con-sonants in the inventory. Unless the language has a rich vowel and/or supraseg-mental inventory, two-consonant stems will not provide an adequate vocabulary. If distribution of actual words is proportional to potential, then one would expect to find that two-consonant words are only about 4% of total two- and three-con-sonant words, as the following Table 7 indicates, based on consonant inventories from 20 to 30.

For Modern Standard Arabic (with 28 consonants), as a real-world example, biconsonantals constitute 2.05% of all bi- and tri-consonantal stems.

If two genetically related languages each have 4% biconsonantals and 96% triconsonantals, it is logical to assume that the common proto-language had the

© 2012. John Benjamins Publishing CompanyAll rights reserved

On calculating the reliability of the comparative method at long and medium distances 275

same proportion. However, pure chance matching would yield a completely differ-ent proportion. Assume that lexical resources and the comparatist’s imagination is such as to allow 200,000 comparisons, 4% of these or 8,000 should involve a bicon-sonantal, and 96% or 192,000 should involve a triconsonantal. For a 25 consonant inventory the chance of finding a 3-out-of-3 match is .000064. Multiplying this by 192,000 yields 12.3 expected matches. For multiple 2-to-3 matches as in Orel & Stolbova (1995) the chance of a match is .0064, which multiplied by 8,000 yields 51 expected chance matches involving a biconsonantal. In other words, if one starts by comparing two languages in which the proportion of two-consonant words to three-consonant words is 4% :: 96%, the proportion of chance matches involv-ing two-consonant words will be in a proportion of 51 :: 12 or 81% :: 19% to those involving 3-to-3 matches in three-consonant words. (This proportion is constant regardless of the number of comparisons.)

If 2-out-of-3 matches are allowed as in Ehret (1995), the probability of find-ing a match is lower (.0016) but 100% of the items are available for comparison. Thus on the basis of 200,000 comparisons the expected number of chance matches involving only two consonants will be 320. The ratio of this number to the 12 ex-pected chance 3-to-3 matches is 96% :: 4%, the exact opposite of what was started with, and what would be expected from a judicious reconstruction. In other words the fact that Ehret (1995) and Orel & Stolbova (1995) both reconstruct a large number of biconsonantals is likely to be an artefact of method — specifically that the methodology employed in searching for cognates has let in a large number of chance matches.

6. Final observations

I sincerely hope that it is obvious that none of this is intended to disparage the work of Ehret (1995) or Orel & Stolbova (1995). One can only find cognates by looking for them, hence no progress can be made without work of this kind. Yet an inescapable, although perhaps discouraging, conclusion of the preceding analysis

Table 7. Distribution of number of consonants in the vocabulary

X = no. of C’s in inventory

X2 = total no. pos-sible 2C stems

X3 = total no. of pos-sible 3Cstems

X2/(x2 + x3) x 100 = 2C stems as % of total

20 400 8,000 4.76%

25 625 15,625 3.85%

28 784 21,952 3.45%

30 900 27,900 3.23%

© 2012. John Benjamins Publishing CompanyAll rights reserved

276 Robert R. Ratcliffe

is that the more effort and imagination a comparatist expends, the greater the de-gree of randomness will be. In order to avoid wasting effort, much could be gained if comparatists were to indicate explicitly how a comparison was conducted, i.e., how broad a search was conducted, how much semantic and phonetic leeway was allowed — in the process not merely in the result. Only then can an evaluator get a clear idea of the probable validity of the proposed reconstruction.

In terms of longer distance comparison, the preceding discussion may have some bearing on the issue raised by Baxter (1998) in his critique of Ringe (1998) and Oswalt (1998). Baxter argues:

They [Ringe and Oswalt] purport to evaluate the degree of chance resemblance between vocabulary lists directly, without reference to the content of any pre-ex-isting proposals about how the particular languages are related; the results would be the same, no matter what Illic Svityc, Dolgopolsky or anyone else actually claimed about correspondences and cognates. In fact their methods are presented not as tests of hypotheses, but as direct tests on the data themselves … To take the results as relevant to any hypothesis other than the ones actually being tested is a misunderstanding of probability theory. (Baxter 1998: 218)

Baxter’s point is well-taken. My own logic leads me to believe that there is no way to determine how many chance matches exist or occur between two languages. One can only ask how many chance matches will be found given a particular method of conducting a comparison. Oswalt and Ringe each show how many matches can be found using a particular and explicitly outlined method of conducting a com-parison. A fair evaluation of Illič-Svityč (1971–1984) or Dolgopolsky (1998) in the same terms, would require us to know explicitly how Illič-Svityč or Dolgopolsky had conducted their comparisons.

Finally let me make clear that none of the preceding is intended to call into question the validity of Afroasiatic as a genetic grouping. For my own part I am convinced the proposal of an Afroasiatic family remains the most plausible ex-planation for the detailed similarities found in the languages of the (proposed) family in morphology, including detailed paradigmatic resemblances and shared anomalies in both the verb and pronouns (Castellino 1962, Diakonoff 1965) and the noun (Greenberg 1955, Ratcliffe 1992), as well as many similarities in basic vocabulary. But whether the Proto-Afroasiatic lexicon and phonology can ever be reconstructed with the same confidence as for Proto-Indo-European remains entirely in doubt. There may simply not be enough remaining shared lexicon to allow for reconstruction of the full Proto-Afroasiatic phoneme inventory. At the very least it should be clear that Bomhard’s (1998) evaluation of the state of affairs in Afroasiatic is excessively optimistic:

© 2012. John Benjamins Publishing CompanyAll rights reserved

On calculating the reliability of the comparative method at long and medium distances 277

… the Afroasiatic parent language must be placed as far back as 10,000 BCE, or perhaps even earlier, according to some scholars. This extremely ancient date notwithstanding, the major sound correspondences have been determined with great accuracy (cf. Diakonoff 1992; Ehret 1995), excellent progress is being made in reconstructing the common lexicon (cf. Ehret 1995, Orel & Stolbova 1995) … (Bomhard 1998: 21)

Likewise, of course, any attempt to conduct a lexical/phonological comparison of Afroasiatic with any other large family in the hope of finding yet deeper genetic relationships must be considered premature.

Abbreviations

E Ehret (1995) J. JeguOS Orel & Stolbova (1995) Kby. KabyleAg. Agaw Mhr. MehriAhg Ahaggar MSA Modern South ArabianAng Angas Ng. NgizmAr. Arabic PSC Proto-South CushticBed. Beja or Bedawie Rnd. RendilleBil Bilin Som. SomaliDhl Dahalo Ug. UgariticHEC Highland East Cushitic Ym. YemHs. Hausa

Notes

1. An alternative and more recent reconstruction for South Cushitic has been offered by Kiessling (2002). Jungraithmayr & Shimizu (1981) has been superseded by Jungraithmayr & Ibrisimov (1993) which was apparently unavailable to either Ehret or Orel & Stolbova.

2. Although both sources accept Omotic as an independent branch of Afroasiatic, the status of Omotic is inn fact controversial. Zaborski (1986) argues that it should be classified within Cushitic (as West Cushitic). More recent work (Theil 2010) suggests that it may be an indepen-dent language family which does not belong to Afroasiatic at all.

3. The justification he gives for ignoring the Berber data is the presumed innovative nature of Berber phonology and the absence of a reconstruction for proto-Berber. The second problem has now largely been answered by the appearance of Kossmann (1999).

4. See the list of abbreviations below. Transcription systems have been regularized to IPA to facilitate comparison, except that a standard Egyptologist transcription is retained for Egyptian, where there is uncertainty about the phonetic realization of many phonemes. Strings of

© 2012. John Benjamins Publishing CompanyAll rights reserved

278 Robert R. Ratcliffe

consonants without vowels, not preceded by an asterisk, reflect dictionary lemmata (“conso-nantal roots”) in the case of Semitic and the recoverable data in the case of Egyptian, where the script does not record vowels.

5. A good approximation where the probability is low and the number of trials is high, as here, can be achieved by the formula of the Poisson distribution, which is easier to calculate: ((npk)/k!)*e−np.

6. These figures were estimated using the Poisson distribution referred to in the previous note.

References

Appleyard, David. 2006. A Comparative Dictionary of Agaw Languages. Cologne: R. Köppe.Baxter, William H. 1998. Response to Oswalt and Ringe. In Salmons & Joseph, eds., 217–236.Bender, Lionel. 1969. Chance CVC Correspondences in Unrelated Languages. Language

45.519–531.Bender, Lionel. 1978. Consonant Co-Occurrence Restrictions in Afroasiatic Verb Roots. Atti del

Secondo Congresso Internazionale di Linguistica Camito-Semitica ed. by Pelio Fronzaroli, 9–19. Florence: Instituto di linguistica e di lingue orientali universita di firenze.

Bender, Lionel. 1988. Proto-Omotic Phonology and Lexicon. Cushitic–Omotic. Papers from the International Symposium on Cushitic and Omotic Languages, Cologne Jan. 6–9, 1986 ed. by Marianne Bechhaus-Guerst & Fritz Serzisko, 121–162. Hamburg: Buske Verlag.

Biberstein-Kazimirski, Albert de. 1860. Dictionaire arabe–française. Paris: Maisonneuve.Bomhard, Allan R. 1998. Nostratic, Eurasiatic, and Indo-European. In Salmons & Joseph, eds.,

17–50.Campbell, Lyle. 2004. Beyond the Comparative Method? Selected Papers from the International

Conference of Historical Linguistics ed. by Barry Blake & Kate Burridge, 33–57. Amsterdam: John Benjamins.

Castellino, Giorgio R. 1962. The Akkadian Personal Pronouns and Verbal System in Light of Semitic and Hamitic. Leiden: E.J. Brill.

Cerny, Jaroslav. 1976. Coptic Etymological Dictionary. Cambridge: Cambrige University Press.Cohen, David, François Bron & Antoine Lonnet. 1994–2012. Dictionaire des racines sémitique

ou attestées dans les langues sémitiques, Vol. 1–10. Paris: Champion.Cohen, Marcel. 1947. Essai comparatif sur le vocabulaire et la phonétique du Hamito-Sémitique.

Paris: Champion.Diakonoff, Igor M. 1965. Semito-Hamitic Languages: An Essay in Classification. Moscow: Nauka.Diakonoff, Igor M. 1992. Proto-Afrasian and Old Akkadian: A Study in Historical Phonetics.

Princeton: Institute of Semitic Studies.Diakonoff, Igor & Leonid Kogan. 1997. Addenda et Corrigenda to Hamito-Semitic Etymological

Dictionary by V. Orel and O. Stolbova. Zeitschrift für deutsches morgenlandisches Gesell-schaft. 146:1.25–38.

Dolgopolsky, Ahron B. 1973. Sravnitel’no-istoricheskaja fonetika kushitiskix jazykov. Moscow: Nauka.

Ehret, Christopher. 1980. The Historical Reconstruction of Southern Cushitic Phonology and Vocabulary. Cologne: Dietich Reimer Verlag.

© 2012. John Benjamins Publishing CompanyAll rights reserved

On calculating the reliability of the comparative method at long and medium distances 279

Ehret, Christopher. 1987. Proto-Cushitic Reconstruction. Sprache und Geschichte in Afrika 8.7–180.

Ehret, Christopher. 1989. The Origin of the Third Consonants in Semitic Roots: An Internal Reconstruction (Applied to Arabic). Journal of Afroasiatic Languages 2:2.109–202.

Ehret, Christopher. 1995. Reconstructing Proto-Afroasiatic (Proto-Afrasian): Vowels, Tone, Consonants, and Vocabulary. Berkeley: University of California Press.

Embleton, Sheila. 1986. Statistics in Historical Linguistics. Bochum: Brockmeyer.Faulkner, Raymond O. 1962. A Concise Dictionary of Middle Egyptian. Oxford: Griffith

Institute.Friedrich, Paul. 1970. Proto-Indo-European Trees: The Arboreal System of a Primitive People.

Chicago: University of Chicago Press.Greenberg, Joseph. 1950. The Patterning of Root Morphemes in Semitic. Word 6.162–181.Greenberg, Joseph. 1955. Internal A-Plurals in Afroasiatic. Afrikanistische Studien Diedrich

Westermann zum 80. Geburtstag gewidmet ed. by Johannes Lukas, 198–204. Berlin: Akademie Verlag.

Greenberg, Joseph. 2000. Indo-European and Its Closest Relatives: The Eurasiatic Language Family. Stanford: Stanford University Press.

Gray, Russell D., Alexei J. Drummond & Simon J. Greenhill. 2009. Language Phylogenies Reveal Expansion Pulses and Pauses in Pacific Settlement. Science 323.479–483.

Haspelmath, Martin, Matthew S. Dryer, David Gil & Bernard Comrie. 2005. World Atlas of Linguistic Structures. Oxford: Oxford University Press.

Holman, Eric, Søren Wichmann, Cecil Brown, Viveka Velupillai, André Miller & Dik Bakker. 2008. Explorations in Automated Language Classification. Folia Linguistica 42:2.331–354.

Illič-Svitič, Vadislav M. 1971–1984. Opyt sravnenija nostraticheskikkh jazykov (semitoxamitskij, kartvel’skij, indoevropejskij, ural’skij, dravidskij, altagskij). 3 vols. Moscow: Nauka.

Jungraithmayr, Herrmann. 1983. On Mono- and Triradicality in Early and Present-Day Chadic. Studies in Chadic and Afroasiatic Linguistics ed. by Ekkehard Wolff & Hilke Meyer-Bahlberg, 139–156. Hamburg: Buske Verlag.

Jungraithmayr, Herrmann & Dymitr Ibriszimow. 1993. Chadic Lexical Roots. 2 vols. Berlin: Dietrich Reimer Verlag.

Jungraithmayr, Herrmann & Kiyoshi Shimizu. 1981. Chadic Lexical Roots. Berlin: Dietrich Reimer Verlag.

Kammerzell, Frank. 1996. Review of Orel & Stolbova (1995). Indogermanische Forschung 101.268–290.

Kaye, Alan. 1996. Review of Ehret (1995). Canadian Journal of Linguistics 41:3.278–282.Kessler, Brett. 2001. The Significance of Word Lists. Stanford: CSLI Publications.Kiessling, Roland. 2002. Die Reconstruktion der südkuschitischen Sprachen (West-Rift): von wys-

temlinguistischen Manifestationen zum geschellschaftlichen Rahmen des Sprachwandels; with an English Summary. Cologne: R. Köppe.

Klein, Ernest. 1988. A Comprehensive Etymological Dictionary of the Hebrew Language. New York: Prentice Hall.

Kogan, Leonid. 2002. Addenda et Corrigenda to the Hamito-Semitic Etymological Dictionary (HSED) by V. Orel and O. Stolbova (II). Journal of Semitic Studies 47:1.183–202.

Kossmann, Maarten. 1999. Essai sur la phonologie du proto-berbère. Cologne: R. Köppe.Krupa, Viktor. 1968. The Maori Language. Moscow: Nauka.Leslau, Wolf. 1987. Comparative Dictionary of Ge’ez. Wiesbaden: Otto Harrasowitz.

© 2012. John Benjamins Publishing CompanyAll rights reserved

280 Robert R. Ratcliffe

McMahon, April & Robert McMahon. 2005. Language Classification by Numbers. Oxford: Oxford University Press.

Militarev, Alexander & Leonid Kogan. 2000. Semitic Etymological Dictionary, Vol. 1: Anatomy of Man and Animals. Alter Orient und Altes Testament, Bd. 278/1. Muenster: Ugarit-Verlag.

Militarev, Alexander & Leonid Kogan. 2005. Semitic Etymological Dictionary, Vol. 2: Animal Names. Alter Orient und Altes Testament, Bd. 278/2. Muenster: Ugarit-Verlag.

Newman, Paul & Roxanna Ma. 1966. Chadic Comparative Phonology and Lexicon. Journal of African Languages 5:3.218–251.

Nichols, Johanna. 1996. The Comparative Method as Heuristic. The Comparative Method Reviewed ed. by Mark Durie & Malcom Ross, 39–71. Oxford: Oxford University Press.

Orel, Vladimir & Olga V. Stolbova. 1995. Hamito-Semitic Etymological Dictionary: Materials for a Reconstruction. Leiden: E.J. Brill.

Oswalt, Robert L. 1998. A Probabilistic Evaluation of North Eurasiatic Nostratic. In Salmons & Joseph, eds., 199–216.

Ratcliffe, Robert R. 1992. The ‘Broken’ Plural Problem in Arabic, Semitic, and Afroasiatic. New Haven, CT: Yale University PhD dissertation.

Ringe, Donald. 1992. On Calculating the Factor of Chance in Language Comparison. Philadelphia: The American Philosophical Society.

Ringe, Donald. 1998. A Probabilistic Evaluation of Indo-Uralic. In Salmons & Joseph, eds., 153–198.

Rosenfelder, Mark. [n.d.] How Likely are Chance Resemblances between Languages. Available at http://www.zompist.com/chance.htm.

Ruhlen, Merritt. 1994. The Origin of Language: Tracing the Evolution of the Mother Tongue. New York: Wiley & Sons.

Salmons, Joseph C. & Brian D. Joseph, eds. 1998. Nostratic: Sifting the Evidence. Amsterdam: John Benjamins.

Schuh, Russell. 1981. A Dictionary of Ngizm. Berkeley: University of California Press.Steingass, Francis. 1884 [1978]. Arabic–English Dictionary. New Delhi: Curzon.Takács, Gabor. 2004. Comparative Vocabulary of the Angas-Sura Languages. Berlin: Dietrich

Reimer.Theil, Rolf. 2010. Is Omotic Afroasiatic? A Critical Discussion. Available at http://www.uio.no/

studier/emnel/hf/iln/LING2110/v07/L%20Omotif%20Afroasiatic.pdf.Wehr, Hans. 1979. A Dictionary of Modern Written Arabic, ed. by J. Milton Cowan. Wiesbaden:

Harrasowitz.Zaborski, Andrzej. 1986. Can Omotic be Reclassified as West Chadic? Ethiopian Semitic Studies:

Proceedings of the Sixth International Conference, Tel-Aviv, 1980 ed. by Gideon Goldberg, 525–530. Rotterdam: Bakelma.

Zaborski, Andrzej. 1991. Biconsonantal Roots and Triconsonantal Roots in Semitic: Solutions and Prospects. Semitic Studies in Honor of Wolf Leslau on the Occasion of his Eighty-Fifth Birthday, November 14th, 1991 ed. by Alan Kaye, Vol. II, 1675–1703. Wiesbaden: Harrasowitz.

Zuraw, Kie. 2003. Probability in Language Change. Probabilistic Linguistics ed. by Rens Bod, Jennifer Hay & Stefanie Jannedy, 139–176. Cambridge, MA: MIT Press.

© 2012. John Benjamins Publishing CompanyAll rights reserved

On calculating the reliability of the comparative method at long and medium distances 281

Appendices 1 and 2

Appendices can be found online at: http://dx.doi.org/10.1075/jhl.2.2.04rat.additional

Author’s address

Robert R. RatcliffeTokyo University of Foreign Studies3-11-1, Asahi-cho, Fuchu-shiTokyo 183-8534Japan

[email protected]