Decipherment of the Indus Valley Script: In Search of an “Entering Wedge”


Steven Bonta

Chennai

May 2015

[Draft only: Not for Citation]

My esteemed colleagues, today I want to present to you some of my recent findings regarding possible

interpretations of certain common Harappan signs. I do not intend to go over previously ploughed

ground, except by way of a brief introduction and a few illustrations. I do intend to talk a little bit about

my methodology, or rather, certain techniques and assumptions I have developed as a result of what

the evidence seems to be trying to tell us. As I mentioned yesterday, the goal at this stage of the work

would appear to be finding an “entering wedge,” some clue or clues that will lead to progress on solving

this persistent puzzle.

First of all, a word on a method of decipherment that cannot work. Epigraphers sometimes call it the

“comparative method,” which is confusing to a linguist like myself, for whom the term “comparative

method” refers to a time-honored method of historical reconstruction of a vanished language --

such as Proto-Indo-European, the hoary ancestor of both Sanskrit and English -- of which we have no

direct records. The “comparative method” involves comparing cognate words in different languages and

trying to discern regularities among them, in order to ascertain what the ancestor word might have

looked like. In this way, we can reconstruct a great deal of what Proto-Indo-European, or Proto-

Dravidian, or whatever, must have looked like.

For epigraphers, however, the term “comparative method” means to look at signs and compare them to

real-world objects in an attempt to ascertain their meaning, usually assigning sound values to those

signs that correspond to words or roots in the assumed language underlying the writing system. This has

been the approach used by almost all serious investigators of the Harappan script, and by a good many

would-be decipherers of other scripts, and it has not worked. It can have some use for interpretation

after other facts have been established, but when used as the baseline methodology for decipherment,

the “comparative method” again and again has yielded forced, indefensible readings, and often outright

gibberish.

Why is this so? Because the “comparative method” in and of itself has little utility if the identity of the

underlying language is not known beforehand; yet this is something that must be established by a

reasonable evidentiary standard before any other tricks can be employed.

So our first, and perhaps most important task, is to assemble incontrovertible evidence as to the

underlying language or languages we are dealing with. How is this to be done? One important sub-

branch of linguistics is called “language typology.” This is the study and classification of languages

according to structural and grammatical features – canonical order of subject, verb, and object, for

example – rather than family or genetic relationship. Thus, for example, the Indo-Aryan languages of

northern South Asia and Sri Lanka are genetically more closely related to English, French, Spanish,

Russian, and a host of other European languages than they are to Dravidian languages like Tamil and

Telugu. This is because the Indo-Aryan languages, like most of the languages of Europe and some from

the Middle East and Central Asia, all belong to the Indo-European language family, whereas the

Dravidian languages stem from a completely different stock, the Dravidian family. Some have claimed to

discern evidence that, in the pre-Indo-European past, Indo-European, Dravidian, Altaic, and Afroasiatic

languages were all part of a supergroup called the “Nostratic” languages, but that is another story.

On the other hand, if we look at modern Indo-Aryan languages from a typological standpoint, we find

that they are much more similar to language groups like Dravidian and Altaic, because they have evolved

to the point where they have more structural commonalities with these languages (their near neighbors)

than with far-distant European tongues. For example, in Dravidian as well as Indo-Aryan, the canonical

word ordering is normally SOV, or Subject-Object-Verb, meaning that the direct object normally

precedes the verb in this part of the world, whereas in Europe, it normally follows the verb. Another such

commonality is the pervasive use of postpositions, like par in Hindi (a Dravidian trait, assumed to have been

adopted under the long-term influence of Dravidian). Even as far back as early Sanskrit, many prepositions

derived from Proto-Indo-European (and PIE was prepositional) were already being used postpositionally.

So while the genetic relationship of modern Indo-Aryan languages to the rest of the Indo-European

language family has not changed, the typology of the languages has. And it is the typology of a language

that will enable us to home in on its genetic identity. For example, suppose I find coded texts in an

unknown modern language, and I am able to establish that this language has SOV canonical word order,

uses postpositions, uses a correlative construction to embed quoted clauses, and exhibits ergativity or

object-agreement marking in past tense verbs. The only language in the entire world that I’m aware of

that meets these criteria is Hindi (and possibly a few other very closely-related northern Indo-Aryan

languages). I have now taken an enormous step towards decipherment, and I can now invoke whatever

additional knowledge I may have of the language and its cultural and geographical context.
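The narrowing-down step just described can be pictured as a simple feature filter. The following is only a minimal sketch: the feature names are my own labels, the Hindi row simply restates the four traits named above, and "Language B" and "Language C" are hypothetical placeholders rather than descriptions of real languages.

```python
# Toy typological profiles. Feature names and the two placeholder rows are
# illustrative assumptions; this is not a real typological database.
CANDIDATES = {
    "Hindi": {"SOV", "postpositions", "correlatives", "ergative_past"},
    "Language B": {"SOV", "postpositions"},   # hypothetical
    "Language C": {"SVO", "prepositions"},    # hypothetical
}

def matching_languages(observed, candidates=CANDIDATES):
    """Return (sorted) the candidates whose profile contains every observed feature."""
    return sorted(name for name, feats in candidates.items() if observed <= feats)

# All four traits established: only one candidate survives.
print(matching_languages({"SOV", "postpositions", "correlatives", "ergative_past"}))
# Only two traits established: the field is still open.
print(matching_languages({"SOV", "postpositions"}))
```

The point of the sketch is that every additional typological trait we can establish shrinks the set of candidate languages, which is why the traits have to be established before sound values are guessed.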

Now how does this apply to the Indus Valley script? Simply in that, by carefully analyzing the behavior of

the signs relative to one another, we may be able to put together a typological portrait of the language

under consideration. This is not a new idea; investigators have been arguing for decades over supposed

evidence for postpositions and case affixes in the script, and we will have to address those issues as

well, eventually. But there is a very important reason that no one has been able to persuade anyone else

of the rightness or wrongness of their interpretation, and that is that nobody has taken the time to try

to figure out what the different domains within the text actually mean.

For example, consider the following inscriptions, which represent four canonical domains or fields within

typical (or Type I) Harappan inscriptions (4-digit index numbers are from Mahadevan 1977):

Table 1: Full Type I inscriptions with all four fields:

T C M I

(1087)

(2446)

(2335)

(4005)

(2325)

(1077)

(1142)

(4023)

(2271)

In earlier work, I have labeled these domains in various ways, but for now, let me call them the initial

field or I-field (I), the metrology field or M-field (M), the core field or C-field (C), and terminal field or T-

field (T). Certain of these fields can be absent, and the others still be recognizable; in this table, we see

only examples of inscriptions containing all four fields.

Briefly, I would like to focus on what I call the metrology field, which so often consists of one or two “oval”

signs , , , and , one or more “fish” signs , , , , , , and , the so-called

“mortar and pestle” sign right adjacent to a triple stroke , and the occasional other stroke numeral

sign. (Terms like “mortar and pestle” and “fish” are used for descriptive convenience only; whether the

apparent shape of any of these signs is directly relevant to the sound value or values they may possess

will be considered further on.)

Now this metrology or M-field is highly rigid and non-random; where oval sign(s), fish sign(s), and the

“mortar and pestle/triple stroke” pairing are all present (as with 2446 in Table 1), they nearly always

occur in that sequence. Where an initial field is absent, the M-field often stands in a sort of prefix-like

position to the rest of the inscription. The very nature of the signs participating in this field and their

rigid distributional behavior tell us right away that these are not random “sound signs,” but that they

have some specialized function. As I have elsewhere said and shown in great detail, I think the evidence

is overwhelming that these sign clusters represent notations of value, weight, property, and assets

generally; hence my terming them the “metrology field.” For those unfamiliar with these

arguments, I have laid them out in great detail in my monograph and referred to them in yesterday’s

talk; I summarize them here again briefly.

Firstly, the “fish” grapheme is very frequently associated with various stroke numeral-like signs,

showing beyond any reasonable doubt that, in very many of its uses, it is related to numbers in some

respect. Table 2 shows a few examples of such occurrences:

Table 2: Fish grapheme with stroke numerals

(1041)

(1365)

(1376, 2374)

(2594)

(3246)

(3351)

(4073)

(4141)

(9822)

(2524)

(1019)

(2229)

(7243)

(4171, 4873)

(1314)

(4356)

(4377)

(2128)

(2233)

(4116)

(4009)

(7224)

Second, the six fish sign compounds , , , , , and based on the fish grapheme , which

participate in clusters of up to three, behave distributionally like units of measurement, presumably

weights. Together with the fish grapheme itself, they therefore most probably represent a series of seven standard weights, conventionalized

multiples of some fundamental unit probably represented by the fish grapheme , which may also

occur with various stroke numerals, as shown in Table 2, when not representing amounts that can be

reckoned in terms of these standardized multiples. The series of “oval” signs associated with the “fish”

signs probably represent a second series of weights, presumably a heavier or lighter series of three or

four, depending on whether and are separate signs, or merely allographs, as I believe.

None of this is to say, by the way, that certain of these metrological signs do not serve elsewhere in

different functions; but it is to say that, when they are found in the context of a metrology field, they

must be interpreted as such and not as, say, random syllables from a logosyllabic inventory.

Now this understanding will have an enormous impact on how we regard the script, the inscriptions,

and, most importantly, the potential typology of the language or languages under consideration. This is

because we can in effect cordon off metrology or M-fields (and other such numerical and metrological

notations as may be found) and disregard them for the purposes of assessing the language itself, or at

least, its typology. There is one important feature of M-fields which may turn out to be extremely

important in helping us to ascertain what language we are dealing with, but we will set it aside for the

moment.

The problem that many other investigators have had, in trying to assess the typology of the Harappan

language as evidenced by the structural traits of the inscriptions, is that they have failed to make this

crucial differentiation and to recognize that the metrology field must be treated separately from other

fields. This is also the probable source of the error in the claim that the Indus script cannot be writing

because it is not random enough; that statement is actually true if you treat the rigidly ordered, very

frequent M-fields on equal footing with the rest of the script. Even to a casual investigator, the

anomalously regular distributions associated with M-fields are obvious; the trick is to recognize that we

must segregate these from the rest of the inscriptional material in order to find out anything about the

language we are dealing with.
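The effect of failing to segregate M-fields can be shown with a toy calculation. The sketch below is an illustration only, not the procedure used in any published entropy study of the script: the sign labels, the field tags, and the four-inscription corpus are all invented, and inscriptions are assumed to be transcribed in reading order as (sign, field) pairs.

```python
from collections import Counter
from math import log2

def conditional_entropy(sequences):
    """H(next sign | previous sign) in bits, from within-inscription bigrams."""
    bigrams = Counter()
    for seq in sequences:
        bigrams.update(zip(seq, seq[1:]))
    total = sum(bigrams.values())
    firsts = Counter()
    for (prev, _), n in bigrams.items():
        firsts[prev] += n
    return sum(-(n / total) * log2(n / firsts[prev])
               for (prev, _), n in bigrams.items())

def strip_fields(inscription, drop=("M",)):
    """Remove signs tagged as belonging to the given fields."""
    return [sign for sign, field in inscription if field not in drop]

# Invented corpus: a rigid M-field (O, F, P) followed by a variable C-field.
m_field = [("O", "M"), ("F", "M"), ("P", "M")]
corpus = [
    m_field + [("a", "C"), ("b", "C")],
    m_field + [("a", "C"), ("c", "C")],
    m_field + [("b", "C"), ("a", "C")],
    m_field + [("b", "C"), ("c", "C")],
]

with_m = conditional_entropy([[s for s, _ in ins] for ins in corpus])
without_m = conditional_entropy([strip_fields(ins) for ins in corpus])
print(with_m, without_m)  # 0.5 1.0
```

In this toy corpus the rigidly ordered M-field halves the measured conditional entropy, even though the remaining material is unchanged, which is the distortion the segregation of M-fields is meant to remove.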

So consider now the following initial or I-fields and core or C-fields, completely dissociated from any

metrology fields that may occur with them, as well as inscriptions that do not have the identifiable fields

typical of Type I inscriptions, which we will term Type II inscriptions (M and H index numbers are from

Joshi and Parpola 1987):

Table 3: Type II (non-ordered) inscriptions compared with I-fields and C-fields in isolation:

Type II inscriptions:

(M-1)

(M-3)

(M-6)

(H-29)

(1018)

(1038)

(1056)

(4024)

I-fields:

(from 1087)

(from 1065)

(from 2325)

(from 1033)

(from 1077)

(from 1142)

(from 2018)

(from 2271)

(from many inscriptions)

C-fields:

(from many inscriptions)

(from many inscriptions)

(from many inscriptions)

(from many inscriptions)

(from many inscriptions)

Even from this very small part of the data, you can see a much higher degree of randomness, something

much closer to what we would expect for a writing system. There are still patterns, of course – for

example, the three “juncture” signs , , and are much more common after I-fields than after Type

II inscriptions, and are never found in C-fields, and C-fields themselves are more repetitive than either I-

fields or Type II inscriptions -- but for now, we hope to discern patterns within these three fields that will

tell us about the typology of the language we are trying to read.

From the sheer number of signs involved, it seems pretty clear that we are dealing with some kind of

logosyllabic writing system, a significant component of which consists of signs representing complex syllables of

various kinds, such as CV, CVC, and so forth.

Before proceeding further, we must reasonably ask: aside from the metrology field, what other

information is likely to be included in these inscriptions? The most obvious of all is names and titles,

both of persons and families or sodalities, as well as of places. Quite apart from anything else, it’s very

hard to imagine that names and titles are not to be found on objects such as these. Not all of them

appear to have names and titles – many appear to be purely numerical or metrological -- and that

inescapable fact has been a stumbling block (see Table 4 for some inscriptions that appear to be

numerical/metrological, with no other content, as well as Table 2 above). But the seals were, at least in

some cases, personal items, so it stands to reason that some of them would include the names of

owners, or the names of individuals or sodalities to which assets could be ascribed.

Table 4: Inscriptions from seals with pure numerical/metrological content

(1365)

(4171, 4873)

(4377)

(4009)

(1220, etc.)

(2858, etc.)

(1243, etc.)

(2127)

(1411, etc.)

(2008, etc.)

(7072)

(1025, etc.)

The two fields where we would expect to find names and titles are the initial and core fields; M-fields we

have already identified as metrology/asset related, and T-fields seem to have other functions, discussed

at length in my monograph. Additionally, we will consider Type II inscriptions, which have no discernible

internal field structure.

At this point, I should mention that, in my monograph published last year, I argued that I-fields (called

PCs or Prefix Clusters in that work) are best interpreted as indicative of goods and property. I have since

modified my view, because the data seems to point in another direction.

I argued previously that many of the Type I inscriptions are likely of the general form, “such and such

amount/assets belong to so and so;” this general interpretation is suggested by the presence of the

metrology fields. The part of the notation indicating assets is likely to be not only highly stylized but also

abbreviated, much as measurements and accountancy-type information have always been recorded.

Names and titles, however, are likely to be rendered with more fidelity to the sound shape of the

underlying language. Therefore, it is in the I-fields, C-fields, and Type II inscriptions that we are most

likely to encounter character strings whose patterning will actually tell us something about the

underlying language.

First of all, what are the three “juncture signs” , , and likely to signify? The most likely answer is

noun case, or something akin to that. The most likely cases are the genitive and the dative, because

these are the two cases that we find associated with possession in South Asian languages. However, we

must be cautious, because the notion of “case” is something recognized both by modern linguists and by

classical Indian grammarians, like Pāṇini, but may not have been a grammatical category acknowledged

by the Harappans. So while the Harappan signary may have signs that happen to coincide with case

affixes, the signs themselves may actually denote notions associated with case, like possession,

partitivity, and so on. For now, I will – along with various other investigators over the years – assume

that the three juncture signs represent properties or relationships that are marked by a case or cases.

Whether they represent three different cases, or (more likely, in my opinion) three different forms of

the same case or case-based relationship, we cannot yet determine. We will denote the general

meaning of these three signs by the word CASE.

So what we have now is a series of inscriptions of the form [Character string]-CASE (nearly all I-fields,

and some Type II inscriptions), or merely [Character string] (many Type II inscriptions and all C-fields).

Because of the length of such character strings, and because they frequently (though not always) end

with a sign denoting case, directly or indirectly, we conclude that these strings represent either simple

or compound nouns, which denote individual, family, caste, and tribal names or titles, and possibly place

names as well. Examining just these strings that we have assumed to be simple or complex nouns, we

can now look for typological characteristics that may give clues to the likely identity of the underlying

language.
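The working template [Character string]-CASE can be stated mechanically. This is a minimal sketch under stated assumptions: J1, J2, and J3 are hypothetical labels for the three juncture signs, the other sign labels are invented, and sequences are given in reading order, so that a juncture sign, if present, comes last.

```python
# Hypothetical labels standing in for the three juncture signs.
JUNCTURE_SIGNS = {"J1", "J2", "J3"}

def parse_nominal(signs):
    """Split a sign sequence (reading order) into (stem, gloss).

    The gloss is "CASE" if the final sign is a juncture sign, else None.
    """
    if signs and signs[-1] in JUNCTURE_SIGNS:
        return signs[:-1], "CASE"
    return signs, None

print(parse_nominal(["s1", "s2", "J2"]))  # (['s1', 's2'], 'CASE')
print(parse_nominal(["s1", "s2"]))        # (['s1', 's2'], None)
```

Treating all three juncture signs as one gloss mirrors the provisional assumption in the text; nothing here decides whether they mark one case or three.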

One example of a typological contrast between Dravidian and Indo-Aryan is the propensity for forming

noun compounds. As all students of Sanskrit soon discover, early Indo-Aryan was a language rich in

compounds – so much so that the classical grammarians devised mnemonic names for all the different

classes of compound, names like karmadharaya, tatpurusha, and bahuvrihi. These compounds consisted

of nouns, verb stems, and modifiers strung together in various configurations to form elaborate

compound words. This capacity becomes more elaborate in the classical language; in general, the norm

in the Vedic language is for compounds to have two elements, but compounds with more than two

elements are also encountered. It is worth noting that Indo-Aryan’s propensity for forming compounds

is especially pronounced with names, both of gods and men. So the mysterious Vedic deity

Hiranyagarbha consists of two elements, hiranya, ‘golden’, and garbha, ‘womb, egg.’ Depending on

whether this compound is interpreted as a noun or an attributive, this name could mean ‘golden womb,’

‘golden embryo,’ ‘golden egg,’ or – because garbha was also used as a grammaticalized suffix that

meant ‘containing’ or ‘possessing’ – ‘possessing gold.’ Moreover, both of these elements – hiranya and

garbha – may occur as separate words. We still encounter, all over Hindu and Buddhist portions of

South Asia, surnames and given names that are in fact Indo-Aryan compounds. To cite just a few

examples: Vishnuśarma, Mahādeva, Maitripala, Gopala, Gunasena, Rājasinha, Gunaratna, Premadāsa,

Candrasekara, and on and on. The Sri Lankan author of several papers describing the gypsies I’m

studying in Sri Lanka has the formidable Tamilized surname Thananjayarajasingam, which contains four

different Indo-Aryan words in a single compound name: sthana, jaya, rāja, and sinha. Certain words are

particularly popular in old Indo-Aryan names, including elements like mahā, rāja, sinha, sena, śarma,

varma, pala, deva, guna, and many others. Moreover, we find that some of these elements can be

combined in various additive ways to form names with more than two elements; thus Rājasinha is

certainly an admissible surname of considerable antiquity – but so is Mahārājasinha. Besides these, the

attested number of noun compounds used as given names, surnames, and tribal names in Indian

antiquity runs into the many thousands, many of which are conveniently listed in venerable modern

sources like Monier-Williams’ dictionary.

The situation with Dravidian languages is significantly different. Authentic Dravidian nouns show

comparatively little tendency to form compounds, particularly compounds of the noun + noun, noun +

verb stem and adjective + noun varieties so typical of Sanskrit and early Indo-Aryan generally. This can

easily be discerned by comparing a piece of Sangam Tamil literature with Sanskrit literature. Dravidian

names, surnames, god names, and the like are similarly uncompounded (although certain words of Indo-Aryan origin, like

Tamil cāmi, derived from Indo-Aryan svāmin, are found nowadays in compounded deity names). This is

not to say that noun compounds are completely absent even from early Dravidian, but they are a much

less important feature. For all intents and purposes, early Indo-Aryan was typologically a language

very much given to forming elaborate nominal and adjectival compounds. Dravidian, on the other hand,

was not. Hence evidence for the presence or absence of nominal compounds in the Indus Valley script

would be a very significant typological clue.

Consider now the data in Table 5. We term “autonomous” any sign sequence that can occur as a full I-

field, C-field, or Type II inscription. Such a sequence will then denote either a single lexeme or a

compound of more than one lexeme.

Table 5: Evidence for Possible Noun Compounds

I. Autonomous Sign Sequences:

Inscription Autonomous Sequence

3122, etc.

2600, etc.

4331, etc.

2144, etc.

2380, etc.

1013, etc.

2556

4522

2691

II. Combinations of Autonomous Sign Sequences:

Inscription Autonomous Sign Sequences in Combination

4419 +

1058 +

4119 +

1322 +

4334, etc. + +

2288 +

1010 +

These sets of inscriptions illustrate very clearly that a significant number of Harappan core fields appear

to be composed of smaller units that are recognizable as potentially independent lexemes, because they

occur as such in other contexts. We note that the terminal field, typified by the so-called “jar sign” , is

a separate element as well, and clearly apart from the core fields that precede it.

In particular, note inscription 4334, an inscription with multiple occurrences in the Harappan corpus.

This inscription consists, in addition to the terminal sign , of three different C-field sign sequences run

together (in addition to an M-field at the beginning), each of which occurs elsewhere as an independent,

stand-alone element. In other words, because each of the three C-field subsequences in 4334 ( ,

, and ) is found as an isolated element elsewhere, they must have the value of potentially

independent lexemes – presumably all nouns, since they, like other inscriptions in such contexts, are

likely names and titles. Their presence in this compound inscription is a particularly vivid illustration of

the likelihood of nominal compounds in the language underlying the script – more specifically, of

compound names and titles of the type alluded to earlier.
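The test applied to 4334 (can a core field be exhaustively divided into sequences that are each attested on their own?) can be sketched as a small segmentation routine. The sign labels and the inventory of autonomous sequences below are invented placeholders, not transcriptions of the actual inscriptions.

```python
def segmentations(field, autonomous):
    """Yield every way to split `field` (a tuple of sign labels) into
    consecutive pieces that are all members of `autonomous`."""
    if not field:
        yield []
        return
    for cut in range(1, len(field) + 1):
        head = field[:cut]
        if head in autonomous:
            for rest in segmentations(field[cut:], autonomous):
                yield [head] + rest

# Hypothetical inventory of sequences attested as stand-alone fields.
autonomous = {("s1", "s2"), ("s3",), ("s4", "s5")}
core_field = ("s1", "s2", "s3", "s4", "s5")

print(list(segmentations(core_field, autonomous)))
# [[('s1', 's2'), ('s3',), ('s4', 's5')]]
```

A core field with at least one such segmentation into two or more pieces is a candidate compound; a field with several competing segmentations would need to be flagged as ambiguous rather than counted as evidence.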

We have now gained one important typological clue – the probable formation of nominal compounds –

from distributional patterns in initial and core fields within Type I inscriptions, and in Type II inscriptions

generally. It is now worthwhile to ask whether we can garner any additional clues from other

inscriptional fields.

Because the metrological field is so frequently encountered, let us examine this field in some greater

detail. These fields are of variable complexity, and some signs, like the “staff” sign or signs / , which

appear to be metrological in their own right by virtue of frequently appearing left adjacent to various

stroke numerals (see Table 4 above), nevertheless do not participate in the clustering patterns

associated with the normal metrology field. We will discuss the “staff” sign in further detail elsewhere;

for now, consider the structure of a few typical, complete metrology fields in Table 6:

Table 6: Inscriptions with “complete” M-fields (oval sequence + fish sequence + “mortar and pestle”/triple-stroke pair; M-field underlined)

(4028)

(2446)

(2426)

(4015)

(2541)

(1802)

In Table 6, we see several examples of “full” metrology fields, which consist of three distinct elements,

each of which may occur by itself or in combination with either of the other two to form “reduced”

metrology fields. When all three of these elements or subfields occur in a metrology field, they almost

always observe a canonical order. Rightmost, that is, in initial position within the metrology field, occur

up to two of three or four signs involving a common oval motif or circumgraph ( / , , and rarely,

). Left adjacent to this “oval cluster” or subfield is a cluster of up to three different signs involving a

fish-like motif or circumgraph, which most investigators, myself included, have termed “fish signs.” As

already mentioned, there are a total of seven different fish signs that participate in such clusters ( , ,

, , , , and ), one of which is the basic fish grapheme also found in non-metrological

contexts, and the other six of which appear to be compounds involving the fish grapheme, which are

found in metrology contexts only. Two of these compound signs, as we have shown elsewhere, are

written as two graphemically separate elements – a fish grapheme left-adjacent to a double bar ( )

and a rake-like sign ( ), respectively, while the others have more the appearance of ligatured single

signs.

These two subfields – the oval cluster and the fish cluster – both have the appearance of series of weights,

for reasons that I have discussed elsewhere and need no repetition here. As already mentioned, we

might posit that the fish sign class represents a series of seven graded weights, and the oval sign class a

separate but related series of an additional three or four graded weights. For both of these classes, we

observe that there is never an instance of gemination within the M-field, although the fish grapheme

itself ( ) does geminate occasionally when occurring in other fields.

The third subfield in the metrology field is somewhat anomalous. It usually occurs in leftmost or final

position within the metrology field, and, as already mentioned, always consists of two signs, the first or

right member of the pair resembling a stylized mortar and pestle, and the second or left member of the

pair the numeral-like sign consisting of three long strokes: . This sign pair does not appear to be

part of a series of related measures, like the fish series and the oval series – for one thing, unlike the fish

and oval series, it is not part of a series of signs built on a common graphological motif – but instead

appears to be some kind of term or word associated with metrology. When in combination with a

fish/oval series, it behaves almost like a term somehow significative of weighing or measuring, instead

of some unit of mensuration per se – as a modifying term, so to speak. But this pair may also stand with

nothing besides a left-adjacent terminal sign , and no accompanying metrological cluster.
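The canonical ordering of the three subfields lends itself to a mechanical check. The sketch below is only an illustration: it assumes each M-field has already been transcribed in reading order as a list of class labels, the labels OVAL, FISH, and PAIR are hypothetical stand-ins for the sign classes, and the sample data are invented rather than drawn from the corpus.

```python
# Canonical reading order of M-field subfields as described in the text:
# oval sign(s), then fish sign(s), then the "mortar and pestle"/triple-stroke pair.
CANONICAL = ["OVAL", "FISH", "PAIR"]  # hypothetical class labels

def is_canonical(m_field):
    """True if the subfield classes never step backwards in canonical rank."""
    ranks = [CANONICAL.index(cls) for cls in m_field]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))

def canonical_share(m_fields):
    """Fraction of M-fields that respect the canonical order."""
    return sum(is_canonical(f) for f in m_fields) / len(m_fields)

sample = [
    ["OVAL", "FISH", "FISH", "PAIR"],  # full field
    ["FISH", "PAIR"],                  # reduced field
    ["PAIR", "FISH"],                  # deviant order
    ["OVAL", "OVAL", "FISH"],          # reduced field
]
print(canonical_share(sample))  # 0.75
```

Run over a real transcription, a share close to 1 would quantify the "almost always" of the description, and the deviant cases could be listed for individual inspection.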

There is a term in South Asian languages whose uses would appear to square with the distributional

characteristics of , the Indo-Aryan pleonastic number-measurer mā-tra-m (also found as mā-trā

and occasionally mā-tri), whose Dravidian equivalent is probably to be found with the root aḷ- (Tamil

aḷavu, ‘number, amount, measure’; see Burrow and Emeneau 1984: 27-28). mā-tra-m is a peculiar word,

meaning literally ‘measure’ and ‘amount’ (and is in fact very distantly cognate with English words like

‘meter’ and ‘measure’), but also denoting a specific, fundamental unit, a conventional “measure” such

as was also found in Mesopotamian systems of weights and measures. This word is often difficult to

translate to English since, when it follows a particular number or amount, it has the sense of “exactly” or

“the exact amount of;” thus we have, from Monier-Williams (1990: 804), the following discussion of the

uses of mā-tra-m:

Measure, quantity, sum, size, duration, measure of any kind (whether of height, depth, breadth,

length, distance, time or number, e.g. aṅgula-mātram, a finger’s [aṅgula] breadth … artha-mātram,

a certain sum of money … krośa-mātram, at the distance of a Kos [krośa] … māsa-mātre,

in a month … śata-mātram, a hundred in number …. [additional definition] amounting (only) to

(pleonastically after numerals; cf. tri-m°).

The feminine form of this word, mā-trā-, is discussed as follows in Monier-Williams:

[M]easure (of any kind), quantity, size, duration, number, degree &c. RV. &c. &c. … unit of

measure, foot … unit of time, moment … the full measure of anything (= mātra) … right or

correct measure … a minute portion, particle, atom, trifle …. [M]aterials, property, goods,

household, furniture, money, wealth, substance, livelihood (also pl.)

These uses of mā-tra-m/mā-trā show that this term can denote both measurement and property of

many kinds; its use as a pleonastic indicator of number or amount is familiar to advanced students of

Sanskrit, because it crops up quite frequently in the literature used in this way.

Tamil aḷavu appears to be the closest attested Dravidian synonym to mā-tra-m/mā-trā; its meanings

include (Winslow 1989: 49): “measure, degree, proportion, quantity, magnitude, size, capacity, bound,

limit, compass, number, standard…the proper or required measure, the complement number, size,

weight, &c.” It thus appears that in both Indo-Aryan and Dravidian, a pleonastic number-measurer was

used. We hypothesize that the pair represented this term (as a pleonasm, translatable as ‘exactly,

in the amount of’, etc.) when left-adjacent to a fish/oval cluster, and signified simply either ‘a [standard]

measure’ or ‘goods, property’ when it did not. I shall gloss this pair, for now, as [MEAS] when occurring

in the pleonastic sense, and MEASURE when intended to signify either a conventional measure or

property generally. Note that, in both Indo-Aryan and Dravidian, this term would normally follow a

number or unit of measurement indicated, which squares with the tendency of to occur as the

leftmost element in a metrological cluster.

A few examples of glossed in context are shown in Table 7, with entire MCs underlined as before,

and divided into a fish/oval cluster and :

Table 7: as measure/pleonastic number-measurer:

(2015, 2575, 5089, 7229) TC MEASURE

(4015) TC [MEAS] MC

(4028) TC [MEAS] MC IC

(2426) TC CC [MEAS] MC

(1456) TC CC [MEAS] MC IC

(2446) TC CC [MEAS] MC PC

(2541) TC CC [MEAS] MC

(2654) TC CC [MEAS] MC IC

We note that there are a very few instances when the pair is not the leftmost member of an MC,

but these are not the norm. Given that the canonical order of signs in the script is never absolute, even

in patterned contexts, and given also that the script, although generally (but not always) written in a

“linear” style, is nonetheless probably not a “linear” script in the sense that Linear A or B are, such

occasional deviations from a general pattern should not be held up as absolute disproof. In our view, the

facts that 1) is always apparently associated with metrological contexts; 2) does not appear to be

a functional pair or compound along the lines of, say, , , or (and hence does not appear to be

analogous in function with the “fish” series or the “oval” series), but instead has the appearance of two

discrete syllables or lexemes; and 3) may stand alone (with a left-adjacent TC) to denote an autonomous

pair as well as in association with fish/oval clusters, are all distributional and graphological traits

suggestive of the characteristically South Asian pleonastic number-measurer.

But is there anything about this pair of signs that is suggestive of either the Dravidian or the Indo-Aryan

term? We first note that the sign is one of the more frequent and randomly-occurring signs in the

Harappan signary, occurring frequently in both initial clusters and Type 2 inscriptions. This circumstance

suggests that likely has the value of a high-frequency syllable or other common sound. Table 8 gives

a sampling of such occurrences:

Table 8: in Contexts Other Than

(2105)

(2107)

(2284)

(1464)

(1038)

(1142)

(2440)

(3105)

(1337)

(1310)

(5078)

(1340)

Note that occurs in initial, medial, and final position within the initial, core, and Type 2 sequences in

which it occurs, suggesting by its comparatively random distribution that it represents a common sound

that appears in diverse phonotactic contexts.
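
A positional tally of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the labels S1, S2, etc. are hypothetical placeholders rather than real sign numbers, the toy sequences are not drawn from any concordance, and "initial" simply means first in whatever reading order the lists encode.

```python
from collections import Counter

def positional_profile(sign, sequences):
    """Count how often `sign` is initial, medial, final, or alone in a sequence."""
    counts = Counter()
    for seq in sequences:
        last = len(seq) - 1
        for i, s in enumerate(seq):
            if s != sign:
                continue
            if last == 0:
                counts["sole"] += 1      # one-sign sequence
            elif i == 0:
                counts["initial"] += 1
            elif i == last:
                counts["final"] += 1
            else:
                counts["medial"] += 1
    return counts

# Toy sequences of placeholder labels (hypothetical, NOT real inscriptions).
toy = [
    ["S1", "S2", "S3"],
    ["S4", "S1", "S5"],
    ["S6", "S7", "S1"],
    ["S1"],
]
profile = positional_profile("S1", toy)
print(dict(profile))
```

A sign whose counts are spread fairly evenly across positions would fit the "common sound" profile described above; a sign confined to one position would not.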

The triple stroke , while looking like a stroke numeral, sometimes appears in contexts aside from

in which it seems to have non-numeric or phonetic value; a sampling of such occurrences is shown on

Table 9:

Table 9: in Probable Non-Numeric Contexts Aside from :

(4482)

(2109)

(4474)

(5480)

Hence we should most likely regard as a brief syllabic sequence, probably of two syllables, with

each one corresponding to each of the two signs.

The three-stroke sign is suggestive of the number three. In Dravidian, the root form of this numeral is

mu- or mun-, while in Indo-Aryan, it is tri- or tra-. Comparing these potential values with the Dravidian

and Indo-Aryan terms for ‘measurement’ or ‘amount’ -- aḷa- (Burrow and Emeneau 1984, entry 295) and

mātra, respectively, we see immediately an interesting possible correspondence, namely, that the Indo-

Aryan term ends in the affix –tra or –trā, a homophone for one of the forms of the root for the numeral

‘three.’ The word mātra is composed of two elements, the root mā-, ‘measure,’ and the rather

productive suffix –tra used to create both agent nouns and nouns “expressing the instrument or means”

(MacDonell 1990: 257) in Sanskrit; other common Sanskrit nouns derived from verbal roots that employ

this suffix include kshetra-, ‘field,’ pātra-, ‘cup,’ vastra-, ‘garment,’ mantra-, ‘prayer,’ kshatra-, ‘power,

dominion,’ yantra-, ‘instrument, device,’ gotra-, ‘family, lineage, race,’ and mitra-, ‘friend’ (ibid., p. 258).

It is therefore natural to wonder whether the long three-stroke sign might correspond to the syllable

–tra- and/or –trā-, with other word-final occurrences corresponding to uses in other words ending in the

common syllable –tra. This would imply a syllabic value of -mā- (as well as –ma-, if we assume that

vowel length is not distinguished by Harappan characters) for . Given the relative frequency and

random distribution of this sign, a common syllabic value like –ma- would be expected.

Such an interpretation would imply an Indo-Aryan rather than a Dravidian interpretation for the

Harappan language. Since I am a Dravidianist who has studied Tamil for more than two decades, whose

PhD work involved the documentation of a dialect of Sri Lankan Tamil, and who is currently researching

a dialect of Telugu -- another Dravidian language -- spoken by Sri Lankan gypsies, I have to confess that a

conclusion of this sort leaves me with mixed emotions. It will likely be rejected without a hearing by

some scholars in the West, for whom a consensus has grown up over the last 70 years or so that, if the

Harappan language is related to any living language, it must be Dravidian. An Indo-Aryan solution, it is

widely maintained outside of India, is ruled out because we supposedly know that the Indo-Aryans

arrived in the Subcontinent sometime after the middle of the second millennium BCE. For a long time, it

was assumed that the Indo-Aryans replaced the Harappans in northern South Asia, and may have even

been responsible for their demise.

But what, in fact, do we really know? The Vedas are essentially ahistorical; attempts by non-Hindu,

Western scholars to discern clues as to their historical context – wars between light and dark-skinned

races, for example – are highly speculative, even though they are now consecrated by scholarly

tradition. The fact, often cited by Western Indologists, that many in India who equate the Indus Valley

civilization with the Hindu mystical geography of the Land of Seven Rivers, the Sapta-Sindhava of the

Vedic rishis, are motivated not by science but by cultural chauvinism, is no doubt correct in many

instances, but it does not mean that they are mistaken. Likewise, of course, the fact that the Dravidian

hypothesis has been assumed by generations of competent scholars in the West (as well as some here in

India, like our esteemed colleague Professor Mahadevan) but has so far failed to yield any progress

towards decipherment does not necessarily mean that the Dravidian hypothesis is wrong.

What I am suggesting is that we must all look at what the script itself is trying to tell us by examining the

internal evidence of how the signs are distributed with respect to one another. And so far, the

distributional evidence seems to suggest that the underlying language is very amenable to the formation

of noun compounds, and possesses a pair of signs associated with mensuration clusters for which a very

nice explanation would be afforded by assuming a pair of Indo-Aryan root values.

In my recently published monograph, I mentioned some other signs whose graphology combined with

distributional characteristics would be neatly explained by assuming Indo-Aryan rather than Dravidian

root values. However, in the interest of time, I leave aside the remainder of that material to proceed

with some newer material, and to discuss matters pertaining to methodology.

One thing that I published in my monograph, which I no longer think is correct, is that the initial

character fields (which I termed “prefix clusters” in the monograph) were likely to contain descriptions

of commodities, property, and assets, for which the metrology clusters gave specific amounts. This claim

was motivated in part by the fact that any inscription of the type “such and such belongs to so and so”

could be expected to describe not only amounts, but also the composition of assets. It was also

motivated by the fact that certain signs associated with mensuration, like the aforementioned “fish”

and “staff” signs and also crop up in initial clusters.

However, if we consider the normal syntax of expressions of ownership in South Asian languages, we

find that the subject or owner normally precedes that which is owned or possessed, with the predicate

or verb in final position. Thus, e.g., in Tamil we can say:

avanukku oru puttakam irukku

him (Dative) a book is (‘He has a book,’ literally, ‘to him a book is’)

The subject (avanukku) – which in Tamil and other Dravidian languages is typically in the dative case –

appears first, followed by the thing possessed, ‘a book’ (oru puttakam), and the possessive copula ‘is’

(irukku) is sentence final.

Sinhala, an Indo-Aryan language, has the same syntactic order:

eyāTǝ potak tiyenǝwa

him (Dative) book is

We find the same patterning in ancient Indo-Aryan, except that the genitive rather than the dative case

sometimes marks the subject, since the dative case was often merged with the genitive in the later

classical period. In Pāli, for example, we might have:

mama hirañña suvaṇṇam atthi

me gold suvarna is (‘I have a suvarna of gold’)

We also see instances where the possessor follows the possessed, but these appear to be in embedded

clauses or for emphasis. So we cannot necessarily rule out that the asset possessed might appear first,

but it appears typologically less likely as the normal, default syntactic order for South Asian languages.

What I sought to explain in my monograph was why certain signs clearly associated with mensuration

(the “fish” and “staff” signs and , as mentioned above) are found quite consistently in initial

fields.

In my monograph, I unwittingly furnished a possible explanation. There,

operating on the assumption that the language of the inscriptions might after all turn out to be Indo-

Aryan, I noted that a very common and ancient unit of measurement was the pala, which, besides

denoting a unit of weight equivalent to four karshas, also meant ‘straw.’ The “staff” signs and

resemble pieces of straw, and on this basis, I tentatively assigned to these two signs (which I hold to be

graphological variants of the same sign, although not all investigators agree) the sound value pala to

accommodate this provisional interpretation. This being the case, I assumed – because this sign is often

found in initial clusters, and nearly always in penultimate position, right adjacent to a juncture sign,

which we assume to represent case or a case-like relationship – that such contexts also signified

mensurable assets or property, and in such contexts the “staff” sign either signified the pala as a unit of

reckoning or, perhaps, was a classifier of dry goods generally or some such.

Upon further reflection, however, I was concerned about the matter of typology mentioned earlier, that

owner tends syntactically to precede rather than to follow the object owned. Additionally, there was the

vexing matter of Type 2 inscriptions, which pattern like initial fields, and are too diverse to simply be

explained as notations of assets, or so it would appear. Since Type 2 inscriptions, as mentioned

previously, are overwhelmingly likely to contain bearers’ or owners’ names, does it not seem likely that

initial fields do as well?

On this new assumption, I re-examined the data for the “staff” sign, to ascertain whether there was another

possible explanation for the presence of this sign in initial fields. Some of this data is shown in Table 10.

Table 10: Staff Signs and in I-Fields, C-Fields, and Type 2 Inscriptions (outside of M-fields)

2666

3083

1060

1087

2018

9091

1150

4351

4161, etc.

2176

9011

1186

2241

2271

1058

Inspecting this data, we see right away that and , when occurring outside M-fields (which M-fields,

for these two signs, usually appear to take the form of or + various stroke numerals; these fields are

independent of M-fields involving fish and oval clusters; see examples on Table 4 above), typically occur

in final position in a Type 2 inscription or C-field, and in penultimate position right-adjacent to a juncture

sign in an I-field. Note also that it occurs several times left-adjacent to the “bowman” sign . We even

have one instance (9091) of occurring both in an I-field and an M-field.

It happens that there is a word in Indo-Aryan, pāla-, ‘protector, guard, keeper,’ which differs from pala-

only in having a long (ā) rather than short (a) root vowel. More significantly, this word occurs with very

high frequency as the final element in compound Indo-Aryan names, many of which persist into the

present day – Gunapala, Senapala, Dharmapala, Dhanapala, and Chandrapala, e.g. – while many others

figured in antiquity, like Devapāla, Mahīpāla, Brāhmaṇapāla, and dozens of others recorded by Monier-

Williams. It is even part of the name of the modern country Nepal (IA nepāla). If we make the rather

natural assumption, mentioned previously, that vowel length was not differentiated by the Indus

signary, and identify and with the sound value pAla (where uppercase A represents both long and

short a), then we suddenly have a neat explanation for why this sign might appear in both M-fields and

as the (usually) final element of what may well be names and titles, in I-fields and C-fields.

Moreover, if we note the sign pair / , which is one of the commonest pairings involving the

“staff” sign outside of M-fields, we observe a possibly significant coincidence. In my monograph, before

embarking on this line of inquiry, I identified the “bowman” sign with the rather arcane sound value

DHAN2, for reasons which I don’t want to go over in detail, but which amount to an argument (which I

still think is valid) that and appear to have similar or identical sound values, and that that sound

value is likely to be similar to the Indo-Aryan root for ‘money, grain’ (dhan-/dhān-) and ‘bow’ (dhan-). A

literal sound value of either dhAna- or dhAnya- seems to be a plausible reading for .

It just so happens that one of the most common names involving –pāla, in both ancient and modern times, is

dhanapāla, which means not only ‘guardian of the treasury, treasurer’ (attested in the Vedas, according

to Monier-Williams) but was also the name of a king, a grammarian, a writer, and many other people.

What’s more, dhānyapāla is also attested as a family name in Monier-Williams.

If this interpretation of the “staff” sign is correct, we have a valuable potential “entering wedge” for

determining other possible sign values, by comparing fields ending in and with other attested names

ending in –pāla.

Fortunately, we now have available a useful tool for investigating potential sound values, in the form of

an online, searchable version of Monier-Williams’ Sanskrit dictionary. Put online by the University of

Cologne (http://www.sanskrit-lexicon.uni-koeln.de/scans/MWScan/tamil/index.html), this version is

searchable by exact character string, prefix character string, and substring, features that are invaluable

for ascertaining possible sound values in association with known or strongly suspected sound values. For

example, a search of the substring pala yields a total of 803 results, including both pāla and pala, since

this online version of Monier-Williams represents long vowels in uppercase and short vowels in

lowercase, whereas the search program is not case-sensitive. This turns out to be a useful feature for

our purposes, however unintended, if it turns out, as I believe, that vowel length is not represented in

Harappan writing.
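
The effect of a case-insensitive substring search on a transliteration that writes long vowels in uppercase can be sketched as follows. This is a minimal illustration, not the Cologne search program itself: the function name and the short headword list are assumptions, and the spellings follow only the convention stated above (uppercase for long vowels), applied to words already mentioned in this paper.

```python
def substring_search(headwords, query):
    """Return headwords containing `query`, ignoring case.

    Because long vowels are uppercase in this toy transliteration,
    ignoring case also ignores vowel length (A matches a).
    """
    q = query.lower()
    return [w for w in headwords if q in w.lower()]

# Toy headword list (hypothetical sample; uppercase = long vowel).
headwords = ["pala", "pAla", "dhanapAla", "devapAla", "nepAla", "mAtra", "pAtra"]

hits = substring_search(headwords, "pala")
print(hits)
```

The query pala retrieves both pala and every item containing pAla, which is the length-neutral behavior the search tool happens to provide.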

In any event, poring over the entries for a given search can be an exhaustive enterprise, since it can

involve inspecting hundreds of entries and recording adjacent syllables, then searching for those

adjacent syllables as substrings and comparing their behavior with candidate Harappan signs.
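
Part of this bookkeeping could in principle be automated. The sketch below tallies whatever precedes a given ending in a list of headwords; it is an assumption-laden illustration only. It records the whole preceding element rather than dividing it into syllables, it ignores vowel length by lowercasing, and the four names are simply those cited earlier in this paper, respelled with uppercase for long vowels.

```python
from collections import Counter

def preceding_elements(headwords, ending):
    """Tally what precedes `ending` in headwords that end with it.

    Comparison is case-insensitive, so vowel length (uppercase = long
    in this toy transliteration) is ignored.
    """
    e = ending.lower()
    tally = Counter()
    for w in headwords:
        lw = w.lower()
        if lw.endswith(e) and len(lw) > len(e):
            tally[lw[: -len(e)]] += 1
    return tally

# Names mentioned in the text, in a hypothetical uppercase-long spelling.
names = ["dhanapAla", "dhAnyapAla", "devapAla", "mahIpAla", "pAla"]
tally = preceding_elements(names, "pala")
print(dict(tally))
```

Each preceding element found this way could then be searched as a substring in its own right and compared against the behavior of candidate signs.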

We should note, for those interested in pursuing the potentialities of a Dravidian solution, that the

University of Cologne has also put online a large Tamil dictionary at the same URL, searchable with the

same parameters, and Burrow and Emeneau’s magisterial Dravidian Etymological Dictionary has been

online and searchable for years. Tools such as these – searchable databases of lexicons – are

indispensable for this particular task, where we are confronted with very large numbers of very short

inscriptions, and the potential for perceiving patterns is enhanced by the ability to look at many

inscriptions and compare them with large assortments of lexical items.

Finally, a word on another question that may have some bearing on the typology of the language

underlying the Indus script. The Harappan signs exhibit a number of features besides overt signs, such as

superscripting, gemination, and elaboration via the addition of internal or external hatch marks, which

could denote grammatical properties.

A feature of grammar found to a greater or lesser extent in every language is agreement, or in other

words, the use of morphology to mark properties assigned to related words. If a writing system marks

some grammatical trait that exhibits agreement in the underlying language, we might expect the

graphology signifying such agreement to display regular repetition. In other words, suppose the

grammatical property [feminine gender] exhibits adjective-noun agreement in a hypothetical language,

as it does in many Indo-European languages ancient and modern, and the writing system uses a sign X to

denote [feminine gender]; we then might expect to see repetitions of sign X whenever a sequence

(feminine) adjective-(feminine) noun is represented. If Y represents some adjective and Z a noun in the

feminine gender, we would observe the sequence YX ZX (assuming that adjectives precede, or are

represented as preceding, nouns). Finding sign X frequently in such repetitive contexts might then be

construed as evidence that X denotes some grammatical trait subject to agreement.
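
The YX ZX test can be stated mechanically. The following sketch counts, for each sign, how often it fills both X slots of a four-sign window Y X Z X. It assumes that inscriptions are available as lists of sign labels in reading order and that the marker follows the word it marks; the labels and toy sequences are hypothetical and are not taken from the corpus.

```python
from collections import Counter

def agreement_candidates(sequences):
    """Count signs X occurring in the pattern Y X Z X (Y and Z differ from X)."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - 3):
            y, x1, z, x2 = seq[i:i + 4]
            if x1 == x2 and y != x1 and z != x1:
                counts[x1] += 1
    return counts

# Toy sequences of placeholder labels (hypothetical, NOT real inscriptions).
toy = [
    ["Y", "X", "Z", "X"],
    ["P", "X", "Q", "X", "R"],
    ["Y", "Z", "X"],
]
counts = agreement_candidates(toy)
print(dict(counts))
```

A high count alone would not establish agreement, since a common sign may recur by chance; the count would need to be weighed against the overall frequency of the sign.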

Nor need grammatical properties be represented only by discrete signs. The Indus script, as noted

above, appears to have a number of systemic properties whereby signs may undergo positional and

graphological modifications.

One prominent feature of the Indus script is the occurrence of many doubled or “geminate” signs, two

of which ( and , as against and ) actually occur more frequently than their non-geminate

counterparts. Most such geminates are of the repetitive, side-by-side variety, as for instance , ,

and , but some are mirror-image doublets along either the horizontal ( , e.g.) or vertical ( ,

e.g.) axis. Some signs, like and , have the appearance of “overlapping” geminates, though they

may in fact be separate signs unrelated to . There are, furthermore, doublets that appear to be quasi-

geminate, that is, pairings of two signs that have the same fundamental graphology but differ by some

added graphological character; such “quasi-geminates” include , , and . Finally,

gemination of ligatured features sometimes seems to occur, especially where is the dominant

grapheme. There are two occurrences of , three of , 21 of , (as well as one apiece of the

sequences and , where the signs in question are not technically ligatured, but which give

every appearance of being analogous to , etc.).

It is of course possible that gemination is not a graphological operation at all, but that instead doubled

signs are either purely distinct signs by convention or that their occurrence is simply representative of

repeated sounds or lexemes. The second possibility seems extremely unlikely, inasmuch as two common

signs ( and noted above) occur far more frequently doubly than singly, a circumstance that would

seem to disallow the notion that, e.g., is merely a repeated syllable or word – though of course it

is possible and even likely that at least some doubling of signs is incidental and not systematic. The first

possibility, that geminate signs are completely distinct from their non-geminate counterparts, is

plausible, but also unlikely.
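
The frequency comparison underlying this argument (doubled versus single occurrences of the same sign) can be sketched as below. The sketch assumes that a side-by-side geminate is encoded as two adjacent identical labels, which may not match how a given concordance numbers such signs; runs of three or more are ignored, and the labels and toy sequences are hypothetical.

```python
from collections import Counter

def doubled_vs_single(sequences):
    """For each sign, count maximal runs of length 1 (single) and 2 (doubled)."""
    single, doubled = Counter(), Counter()
    for seq in sequences:
        i = 0
        while i < len(seq):
            j = i
            # Extend j to the end of the run of identical signs starting at i.
            while j + 1 < len(seq) and seq[j + 1] == seq[i]:
                j += 1
            run = j - i + 1
            if run == 1:
                single[seq[i]] += 1
            elif run == 2:
                doubled[seq[i]] += 1
            i = j + 1
    return single, doubled

# Toy sequences of placeholder labels (hypothetical, NOT real inscriptions).
toy = [["A", "A", "B"], ["C", "A", "A"], ["A", "B"], ["B", "B", "C"]]
single, doubled = doubled_vs_single(toy)
print(dict(single), dict(doubled))
```

In this toy data the sign A is doubled twice and single once, the kind of imbalance that would tell against reading the doubled form as a mere repeated syllable.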

But if gemination does represent some significant linguistic property, what might that property be? The

data is probably insufficient to make a confident determination, but one clue might be the co-

occurrence of various geminates within the same inscription, sometimes (though not always) adjacent.

Table 11 shows some examples of co-occurring geminates:

Table 11: Co-Occurring Geminates and Quasi-Geminates:

(8108-8115)

(9902)

(2436)

(4589, 5490)

(2035)

(2043)

(2405)

(4003)

In light of data such as this, which shows co-occurring geminate signs, gemination might be regarded as

denoting some grammatical property, such as gender, that is susceptible to agreement, but given the

paucity of examples, a surer determination will have to await the publication of more texts.

Another interesting property of the script is the rather conspicuous tendency to modify signs by means

of either internal or external “bristles” (sometimes also called “hatching;” I assume, perhaps too readily,

that both internal “hatching” and external “bristles” likely denote the same property). Some sign pairs

with similar graphology except for the presence versus the absence of bristles are shown on Table 12:

Table 12: Some Signs with “Bristled” and “Hatched” correlates:

For at least some of these signs, there is distributional evidence that their similar graphology is not

coincidental, but that these sign pairs really do have a shared meaning element. In the case of the pair

, we actually find them in juxtaposition in several instances:

(4320, 5263)

(4589, 5490)

We also find one occurrence of the pair :

(4188)

is too scarce a sign in the available corpus to furnish much useful information, but it is worth noting

that, in one of the three occurrences recorded in the Mahadevan concordance, it occurs in a context

where is also known to occur, namely, right-adjacent to :

(4359)

occurs only six times, but one of them is in a context characteristic of and several of its

apparent compounds ( , , and ): right-adjacent to :

(4410)

Both and are rather frequent signs with rather divergent patterns of distribution, but at least one

common context (right-adjacent to ) suggests a possible relationship:

(2005)

(4010)

(2651)

(4599)

(1337)

Although none of these hatched signs occurs with anywhere near the frequency of their unhatched

counterparts, the evidence does suggest that many of them are in fact compound signs whereof the

hatching/bristles represent some additive property.

A very interesting inscription recorded by Mahadevan is

___ ___ (4707)

This incomplete inscription is potentially of interest because of the fact that elsewhere the hatched

pairing occurs quite frequently, at least in proportion to the total number of occurrences of

and :

Table 13: Occurrences of :

(4074)

(5336)

(5335)

(4679)

(4372)

(2157)

(2018)

(2603)

(1140)

If we assume that the sequence is merely the hatched variant of , then we might also

conclude that the internal hatching denotes some grammatical property being applied to both and

, if these two signs represent separate lexemes. Table 14 shows evidence that / and /

may indeed have discrete lexemic values:

Table 14: / and as Discrete Lexemes

(1328) T C M I

___ (5065) T C I

(4522; second line ) ? T C

(2360) T C

(1283, 6226, 7043) T C M

(4343, 5237 – 5240; second line ) T C M I

The fact that , , and may all occur as individual signs corresponding to C-fields suggests

strongly their potential value as lexemes. Although we have no unambiguous examples of as a

discrete lexeme, the evidence does suggest that is to as is to .

It is also worth noting that, although we apparently have the unhatched/hatched pair / , never

occurs adjacent to , but does six times, in the sequence (4672, e.g.).

I hypothesize that the hatching/bristling probably denotes some grammatical property that may be

found on adjacent or agreeing lexemes, and that a good candidate for the property in question is

plurality. That is, we would read as [singular ] and as [plural ]. The property [plurality] for

hatching/bristling is suggested by the iconicity of its graphology; what more likely denotatum for

numerous small bristles or internal markings than the plural number? A few other examples of possible

co-occurring hatched/bristled signs (underscored), which may denote discrete lexemes exhibiting plural

agreement, are shown on Table 15:

Table 15: Other Co-occurrences of Hatching/Bristling

(5293; second line )

(4419: second line )

(1058)

(2677)

(9232)

Gemination and bristling/elaboration are but two significant aspects of the Indus Valley script that

appear to be systemic rather than incidental. I have so far been unable to discern unifying patterns for

signs with superscripts of various kinds; and there are possibly other such graphological features that

also convey meaning, which I have not noticed. What grammatical properties these features may

represent – gender, number, or what have you – is difficult to determine. However, if we assume that

case is marked by overt affix signs, like the juncture signs, then we might assume that the other two

major characteristics found in both Indo-Aryan and Dravidian, namely, number and gender, are marked

in some other way; as we suggested above, hatching/bristling might represent plurality.

The meaning of many of the signs and features of the Indus script may remain obscure for a long time to

come, given how comparatively little data we have to work with. But that does not mean that a

decipherment sufficient to enable us to read the inscriptions and to identify the language or languages

that underlie them is impossible. Quite the contrary; I believe that the idiosyncratic patterning of many

of the signs and the fields in which they tend to occur should be more than sufficient to furnish the long-

sought “entering wedge.” As with other decipherments, we cannot ever expect the work to be

complete, but we can hope that, over successive iterations, we can at least gain the assurance that we

are headed in the right direction.

Sources:

Joshi, Jagat Pati and Parpola, Asko 1987. Corpus of Indus Seals and Inscriptions, 2 vols. Helsinki.

Mahadevan, Iravatham 1977. The Indus Script: Texts, Concordances, and Tables. New Delhi.