
A Microsatellite Study to Disentangle the Ambiguity of Linguistic, Geographic, Ethnic and Genetic Influences on Tribes of India to Get a Better Clarity of the Antiquity and Peopling of South Asia

S. Krithika, Suvendu Maji, and T.S. Vasulu*

Biological Anthropology Unit, Indian Statistical Institute, Kolkata, West Bengal 700 108, India

KEY WORDS peopling of India; Tibeto-Burman; Austro-Asiatic; Dravidian

ABSTRACT An understanding of the genetic affinity and the past history of the tribal populations of India requires the untangling of the confounding influences of language, ethnicity, and geography on the extant diverse tribes. The present study examines the genetic relationship of linguistically (Dravidian, Austro-Asiatic, and Tibeto-Burman) and ethnically (Australian and East Asian) diverse tribal populations (46) inhabiting different regions of the Indian subcontinent. For this purpose, we have utilized the published data on allele frequency of 15 autosomal STR loci from our study on six Adi sub-tribes of Arunachal Pradesh and compared them with the reported allele frequency data, for nine common autosomal STR loci, of 40 other tribes. Phylogenetic and principal component analyses exhibit geography-based clustering of Tibeto-Burman speakers and separation of the Mundari and Mon-Khmer speaking Austro-Asiatic populations. The combined analyses of all 46 populations show clustering of the groups belonging to the same ethnicity and inhabiting contiguous geographic regions, irrespective of their different languages. These results help us to reconstruct and understand three plausible scenarios of the antiquity of Indian tribal populations: the Dravidian and Austro-Asiatic (Mundari) tribes were possibly derived from common early settlers; the Tibeto-Burman tribes possibly belonged to a different ancestry; and the Mon-Khmer speaking Austro-Asiatic populations share a common ancestry with some of the Tibeto-Burman speakers. Am J Phys Anthropol 139:533–546, 2009. © 2009 Wiley-Liss, Inc.

Additional Supporting Information may be found in the online version of this article.

Grant sponsor: Indian Statistical Institute, India.

*Correspondence to: Dr. T.S. Vasulu, Biological Anthropology Unit, Indian Statistical Institute, 203 BT Road, Kolkata, West Bengal 700 108, India. E-mail: [email protected]

Received 10 April 2008; accepted 1 December 2008

DOI 10.1002/ajpa.21018
Published online 10 March 2009 in Wiley InterScience (www.interscience.wiley.com).

© 2009 WILEY-LISS, INC.

AMERICAN JOURNAL OF PHYSICAL ANTHROPOLOGY 139:533–546 (2009)

The co-existence of a large number of ethnically and linguistically diverse indigenous tribal groups epitomizes a distinctive feature of the Indian subcontinent (Thapar, 1966; Fuchs, 1973; Singh, 1994; Census of India, 2001). These groups, supposedly the earliest inhabitants, hold potential clues to the peopling of India in the past (Thapar, 1966; Cavalli-Sforza et al., 1994). They are also expected to provide further insights into the past genetic history of human evolution, as molecular genetic studies have shown that the Indian subcontinent served as a major corridor and a transit stage for the migration and expansion of humans out of Africa to other regions of the globe, especially to Southeast Asia and Oceania (Cann et al., 1987; Excoffier and Langaney, 1989; Ingman et al., 2000; Templeton, 2002; Basu et al., 2003; Kivisild et al., 2003; Barnabas et al., 2006; Chaubey et al., 2007). In view of their importance, an investigation of the genetic affinity and diversity of these culturally and linguistically diverse regional tribes of the subcontinent is crucial.

India harbors about 532 scheduled tribes, which represent around 8% of the total Indian population, are distributed over wide geographical regions, and exhibit enormous diversity in terms of demographic parameters such as population size and distribution; cultural parameters such as habits, customs, beliefs, and religion; subsistence pattern; language; and ethnicity (Singh, 1994; Bhasin and Walter, 2001). On the basis of language, a majority of the tribes can be categorized into three major linguistic families: Austro-Asiatic (AA), Dravidian (DR), and Tibeto-Burman (TB), while Indo-European speaking tribes are very few. On the basis of their morphological or physical features, the tribes fall into two ethnic categories: aboriginal Australian and East-Asian specific types (Singh, 1994; Gadgil et al., 1998; Bhasin and Walter, 2001). Tibeto-Burman speaking populations, chiefly inhabiting the northern and northeastern terrains, show East Asian physical features, while the majority of the Dravidian speaking tribal populations, widely distributed in the Southern, Western, Central, and Eastern zones, share physical features akin to those of Australian tribes (Sarkar, 1958; Malhotra, 1978; Malhotra and Vasulu, 1993; Gadgil et al., 1998). However, the Austro-Asiatic speaking populations, spread over the eastern and northeastern regions of the country, belong to both ethnic groups. Most of the Mundari branch of Austro-Asiatic speaking populations show physical features similar to those of the Dravidian linguistic family, while the Mon-Khmer branch of Austro-Asiatic speaking populations (Shompen and Nicobarese of the Nicobar Islands; Khasi tribe of Meghalaya) exhibit East Asian features (Bareh, 1997; Kumar and Reddy, 2003). The existence of two morphologically distinct groups within a single linguistic family (Austro-Asiatic), and of tribes of the same ethnic entity speaking languages of two different linguistic families (Austro-Asiatic and Dravidian), is an interesting observation, but it poses problems for investigating the peopling of India as well as the genetic affinity and antiquity of the tribes. Therefore, understanding this ambiguity between the linguistic and ethnic affiliation of the populations is important for obtaining clarity on the genetic affinity among the diverse tribes belonging to the Austro-Asiatic, Dravidian, and Tibeto-Burman linguistic families. Recently, these issues pertaining to the relationship between genetics and language were addressed, with the help of molecular genetic markers, among both global and regional populations. Cavalli-Sforza et al. (1994) observed a good agreement between linguistic and genetic diversity across the continents. On the basis of multiple microsatellite loci, Belle and Barbujani (2007) examined the association between geography and language in shaping human genetic diversity among 52 world populations. At the regional level, for example in Europe, Y-chromosome markers have been used to examine the genetic affinity between two modern Hungarian-speaking populations (Csanyi et al., 2008), and the association between Y-chromosome diversity (Y-SNP) and linguistic affiliation of Austronesian- and Papuan-speaking communities in the Solomon Islands was also investigated (Cox and Lahr, 2006).
Similarly, Hunley and Long (2005) found inconsistency between genetic structure (based on mitochondrial DNA diversity) and linguistic classification and also provided evidence of gene flow across linguistic boundaries among the native North American populations. Recently, the association between the Central Asian contribution and the language replacement episode was examined based on Alu insertion polymorphisms in the case of the Balkans of Anatolia (Berkman et al., 2008). In India, Rosenberg et al. (2006) investigated the association between language, geography, and microsatellite diversity in some caste groups.

In recent years, a large number of molecular genetic studies have attempted to address issues concerning the population structure, antiquity, origin, and affinity of Indian castes and tribes. A majority of these studies are concerned with mitochondrial and Y-chromosome markers among different regional populations, and a few with autosomal STR markers. A comprehensive overview of the findings of these studies can be found in recent publications (e.g., Basu et al., 2003; Endicott et al., 2007; Reddy, 2008; Krithika et al., 2008). In this regard, the issues concerning the antiquity and past genetic history of the tribal populations and the confounding influences of region, language, and ethnicity have remained elusive. This study investigates the ambiguous influence of language and ethnicity among the Tibeto-Burman, Dravidian, and Austro-Asiatic speaking populations, so as to get better clarity on the possible scenarios of the antiquity of the tribes and the peopling of the Indian subcontinent.

The situation among the Indian tribes leads us to propose two possible paradigms to explain the observed ambiguity, especially between the linguistic and ethnic affiliation of the tribal populations. Overall, genetic changes (of morphological characters) occur over long periods of thousands of years (as a result of adaptive processes), while cultural traits, such as language, can change rapidly over a short period (several generations) as a result of cultural contact or diffusion. Therefore, the confounding influences of linguistic, geographic, and ethnic affiliations on the genetic status of the tribes, which pose problems for understanding their antiquity and the peopling of the Indian subcontinent, can be explained by two possible scenarios: (A) initially, the early migrants or settlers had a common ethnicity and spoke a common language; in due course of time, these populations dispersed to different geographical regions, and some groups possibly retained their language while other groups are likely to have acquired different languages as a result of cultural diffusion, separation, and isolation; and (B) initially, diverse ethnic groups speaking different languages settled over contiguous geographical regions at different times, and in due course of time their languages were probably overlaid by an adopted or acquired common local language, but the diverse endogamous groups retained their biological or ethnic identity.

These two possibilities (represented in Fig. 1) can best explain the situation among the three linguistic families, which exhibit diverse physical features. Therefore, it can be hypothesized that tribes which are morphologically dissimilar but belong to a common linguistic family could be the result of cultural diffusion, and they are expected to show the least genetic relatedness. In contrast, tribes of different linguistic families that show close morphological similarity (or affinity) are expected to show close genetic affinity. This has been investigated in the present study, based on our published autosomal STR data (Krithika et al., 2005, 2007a,b) on the six Adi tribal groups (of Tibeto-Burman language) and on autosomal STR data compiled from the literature on 40 other tribes belonging to three different linguistic families (Austro-Asiatic (AA), Dravidian (DR), and Tibeto-Burman (TB)) and representing either East Asian or aboriginal-Australian physical features.

Fig. 1. Paradigm showing the possible scenarios of the past genetic history of the peopling of the Indian subcontinent. A: An ancestral population of the same ethnicity, after settling in different regions, in due course of time (t) adopts two different languages as a result of geographical separation and isolation. B: Two ethnically diverse ancestral groups settle in a contiguous region and in due course of time (t) retain their identity but adopt a common language.


MATERIALS AND METHODS

Populations

The study includes 46 Indian tribes constituting two different physical features and representing three linguistic families: Austro-Asiatic, Dravidian, and Tibeto-Burman. The studied groups comprise 13 AA populations from eastern and northeastern regions; 12 DR speaking groups from eastern, southern, central, and western regions; and 21 TB speaking populations from northern, eastern, and northeastern regions of the Indian subcontinent. The geographical distribution of the studied populations, their sample sizes, ethnic and linguistic affiliations, subsistence patterns, and data sources are given in Table 1, and their geographical locations are shown in Figure 2. The current study utilizes our recently published allele frequency data, of 15 autosomal STR markers, on the six Adi tribes of Arunachal Pradesh; the population and methodology details are given in our previous publications (Krithika et al., 2005, 2007a,b). The study was funded by the Indian Statistical Institute and approved by the ‘Indian Statistical Institute Review Committee for Protection of Research Risk to Humans.’

Further, to understand the phylogenetic relationships between the different TB-, AA-, and DR-speaking populations of India, the published STR data of the six Adi sub-tribes were compared with the reported allele frequency data (for nine common STR loci) of 40 linguistically and geographically diverse tribal populations (Chattopadhyay et al., 2001; Kashyap et al., 2002; Sahoo and Kashyap, 2002; Sarkar and Kashyap, 2002; Trivedi et al., 2002; Gaikwad and Kashyap, 2003; Maity et al., 2003; Rajkumar and Kashyap, 2003; Sitalaximi et al., 2003; Langstieh et al., 2004; Banerjee et al., 2005; Singh et al., 2006; Thangaraj et al., 2006; Bindu et al., 2007).

Statistical analyses

As the objectives of the present study are to investigate the affinity and diversity within and between the linguistic and ethnic groups, the data set has been categorized into three sets: (a) Single (Within) linguistic group [SLG] dataset: comprising each of the three linguistic groups separately; (b) Paired (Between) linguistic groups [PLG] dataset: consisting of a pair of linguistic groups (results not shown); and (c) Multiple (Combined) linguistic groups [MLG] dataset: the combined set of all three linguistic groups.

On the basis of the allele frequencies of the nine common STR loci, the locus-wise genetic diversity (GST) (Nei, 1973, 1987) and the population-wise average heterozygosity were estimated. To assess possible divergence of the allele frequencies of the nine STR loci from Hardy-Weinberg equilibrium (HWE), the unbiased estimates of the expected homozygote and heterozygote frequencies, the likelihood ratio, and the exact test values were considered for each population. Initially, to understand the allele range and its variation across the groups, certain preliminary analyses, including the study of loci exhibiting the maximum and minimum number of alleles as well as the locus-wise most and least common alleles, were carried out and compared across linguistic groups [results not shown]. To further investigate the extent of microsatellite diversity across linguistic groups, the unique alleles were also analyzed, and the significance of the difference in the occurrence of these alleles was tested by a contingency chi-square test. The difference was also investigated after excluding the two populations (Kunabhi, a Dravidian tribe of Karnataka, and Garo, a Tibeto-Burman tribe of Meghalaya) that showed the maximum number of unique alleles.

For each data set, to understand the genetic relatedness between the studied groups, pair-wise genetic distances between populations were computed using the modified Cavalli-Sforza distance (DA) and the standard genetic distance (DST) measures (Nei et al., 1983) with the software DISPAN (Ota, 1993). Subsequently, the conventional rectangular form of two phylogenetic trees, the unweighted pair group method with arithmetic mean (UPGMA) tree and the neighbor-joining (NJ) tree, were constructed based on the DA and DST distance measures by employing the software DISPAN. To check the reliability and consistency of the clustering pattern of the obtained dendrograms, 1,000 and 10,000 bootstrap replications were separately performed for the different data sets. As the dendrograms obtained using both sets of replications were consistent, we present here the phylogenetic trees based on 1,000 replications, with percentage bootstrap values shown in the trees. Bootstrap values above 70% have been considered for recognizing the clusters. To further explore the topology of the obtained phylogenetic trees, including the positions and lengths of the branches, branching patterns, and cluster formation, the radiation form of the trees was also constructed using the phylogenetic software Mega v3.1 (Kumar et al., 2004). As the DA distance measure is the most efficient for obtaining correct phylogenetic trees under various evolutionary conditions and is also least affected by small sample size (Takezaki and Nei, 1996), and because the UPGMA and NJ phylogenies depict a similar pattern of relationship between the populations, our discussions are based on the DA–NJ trees. The phylogenetic analyses were done separately for the studied 46 tribes and also for 44 tribes, after eliminating two Karnataka populations (Halakki and Kunabhi) because of their disputed tribal status.

Further, Principal Component Analysis (PCA) was performed to examine the clustering pattern of the three linguistic groups. The PCA plots based on components 1 and 2 (and components 1 and 3) were constructed for the ‘single’ and ‘multiple’ linguistic groups datasets (component 1–2, 2–3, and 1–3 and 3D plot). This analysis was performed on the DA distance matrix of the studied groups using SPSS software (Version 11.0), Chicago, IL.
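The two distance measures named above can be illustrated with a minimal sketch. It assumes the usual textbook definitions: DA as one minus the locus-averaged sum of the square roots of the products of allele frequencies, and the standard distance in its uncorrected form, the negative log of the normalized gene identity. The estimator actually applied by DISPAN (for example any sample-size correction) is not described in the text, so the functions `da_distance` and `standard_distance`, the data layout, and the toy frequencies are all illustrative assumptions rather than the study's implementation.

```python
import math


def da_distance(pop_x, pop_y):
    """D_A distance of Nei et al. (1983) between two populations.

    Each population is a dict: locus name -> {allele label: frequency}.
    Only loci present in both populations are used.
    """
    loci = sorted(set(pop_x) & set(pop_y))
    if not loci:
        raise ValueError("the two populations share no loci")
    total = 0.0
    for locus in loci:
        fx, fy = pop_x[locus], pop_y[locus]
        # Alleles missing from either population contribute sqrt(0) = 0.
        total += sum(math.sqrt(fx[a] * fy[a]) for a in set(fx) & set(fy))
    return 1.0 - total / len(loci)


def standard_distance(pop_x, pop_y):
    """Nei's standard genetic distance, without sample-size correction."""
    loci = sorted(set(pop_x) & set(pop_y))
    if not loci:
        raise ValueError("the two populations share no loci")
    jx = jy = jxy = 0.0
    for locus in loci:
        fx, fy = pop_x[locus], pop_y[locus]
        jx += sum(p * p for p in fx.values())
        jy += sum(p * p for p in fy.values())
        jxy += sum(fx[a] * fy[a] for a in set(fx) & set(fy))
    if jxy == 0.0:
        return math.inf  # no allele is shared at any locus
    # The 1/(number of loci) factors cancel in the ratio.
    return -math.log(jxy / math.sqrt(jx * jy))


# Hypothetical two-locus toy data (not taken from the study).
pop_a = {"locus1": {"10": 0.5, "11": 0.5}, "locus2": {"7": 1.0}}
pop_b = {"locus1": {"10": 0.5, "11": 0.5}, "locus2": {"8": 1.0}}

print(da_distance(pop_a, pop_b))
print(standard_distance(pop_a, pop_b))
```

In the toy data the two populations are identical at the first locus and share no allele at the second, so DA is one half. A full pairwise matrix over the 46 populations would be obtained by applying either function to every pair, and that matrix is the input that tree-building methods such as UPGMA or NJ require.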

RESULTS

Extent of microsatellite diversity

The analysis of unique alleles at the nine STR loci revealed a higher frequency of these alleles among the DR (31.9%) and TB speakers (22.9%) but a strikingly lower percentage among the AA speaking tribes (8.7%). Among TB populations, the Garo tribe of Meghalaya showed the maximum number of unique alleles (14), followed by lower Adi Pasi (9); among DR tribes, Kunabhi and Halakki from Karnataka showed the highest number of unique alleles (15 and 12). Unlike the TB and DR populations, AA populations showed only three unique alleles, which were observed among Maram, Munda, and Santal.

In the combined set of 46 populations, 21.3% of unique alleles were observed, and the maximum number of unique alleles (10 and 9) were observed among the two DR speaking tribes of Karnataka, Halakki and Kunabhi. The occurrence of the unique alleles across the populations was found to be statistically significant (χ² = 24.104, d.f. 2) when all the populations were included and also when the two populations (Kunabhi and Garo-Meghalaya) exhibiting the maximum number of unique alleles were excluded (χ² = 14.977, d.f. 2).

TABLE 1. Sample size, geographical distribution, ethnic and linguistic affiliations and the subsistence patterns of the studied 46 populations

| Name of the population | Sample Size | Geographic Distribution | Ethnic Status | Linguistic Affiliation* | Traditional Occupation | Microsatellite Data Source |
|---|---|---|---|---|---|---|
| Adi Pasi (upper) | 121 | Arunachal Pradesh | Mongoloid | TB, North-Assam, Tani | Hunting-gathering, shifting cultivation | Krithika et al., 2007a |
| Adi Pasi (lower) | 203 | Arunachal Pradesh | Mongoloid | TB, North-Assam, Tani | Hunting-gathering, shifting cultivation | Krithika et al., 2005 |
| Adi Minyong | 33 | Arunachal Pradesh | Mongoloid | TB, North-Assam, Tani | Hunting-gathering, shifting cultivation | Krithika et al., 2007a |
| Hmar-Mizoram | 80 | Mizoram | Mongoloid | TB, Kuki-Chin-Naga, Kuki-Chin, Central | Shifting cultivation | Maity et al., 2003 |
| Mara | 90 | Mizoram | Mongoloid | TB, Kuki-Chin-Naga, Kuki-Chin, Southern | Shifting cultivation | Maity et al., 2003 |
| Lai | 92 | Mizoram | Mongoloid | TB, Kuki-Chin-Naga, Kuki-Chin, Central | Shifting cultivation | Maity et al., 2003 |
| Lusei | 92 | Mizoram | Mongoloid | TB, Kuki-Chin-Naga, Kuki-Chin, Central | Shifting cultivation | Maity et al., 2003 |
| Bhutia | 75 | Sikkim | Mongoloid | TB, Himalayish, Tibeto-Kanauri, Tibetic, Tibetan, Southern | Shifting cultivation | Kashyap et al., 2002 |
| Lepcha | 48 | Sikkim | Mongoloid | TB, Himalayish, Tibeto-Kanauri, Lepcha | Agriculture | Kashyap et al., 2002 |
| Naga | 106 | Manipur | Mongoloid | TB, Kuki-Chin-Naga, Naga, Tangkhul | Shifting cultivation | Chattopadhyay et al., 2001 |
| Kuki | 105 | Manipur | Mongoloid | TB, Kuki-Chin-Naga, Kuki-Chin, Northern | Shifting cultivation | Chattopadhyay et al., 2001 |
| Hmar-Manipur | 101 | Manipur | Mongoloid | TB, Kuki-Chin-Naga, Kuki-Chin, Central | Shifting cultivation | Chattopadhyay et al., 2001 |
| Garo-West Bengal | 110 | West Bengal | Mongoloid | TB, Jingpho-Konyak-Bodo, Konyak-Bodo-Garo, Bodo-Garo, Garo | Shifting cultivation | Chattopadhyay et al., 2001 |
| Adi Panggi | 110 | Arunachal Pradesh | Mongoloid | TB, North-Assam, Tani | Hunting-gathering, shifting cultivation | Krithika et al., 2007b |
| Adi Komkar | 63 | Arunachal Pradesh | Mongoloid | TB, North-Assam, Tani | Hunting-gathering, shifting cultivation | Krithika et al., 2007b |
| Adi Padam | 50 | Arunachal Pradesh | Mongoloid | TB, North-Assam, Tani | Hunting-gathering, agriculture | Krithika et al., 2007b |
| Garo-Meghalaya | 128 | Meghalaya | Mongoloid | TB, Jingpho-Konyak-Bodo, Konyak-Bodo-Garo, Bodo-Garo, Garo | Shifting cultivation, agriculture | Langstieh et al., 2004 |
| Ladakh Buddhist | 156 | Ladakh | Mongoloid | TB, Himalayish, Tibeto-Kanauri, Tibetic, Tibetan, Western, Ladakhi | Priesthood | Trivedi et al., 2002 |
| Argon | 51 | Ladakh | Mongoloid | TB | Trade and commerce | Trivedi et al., 2002 |
| Drokpa | 33 | Ladakh | Mongoloid | TB | Trade and agriculture | Trivedi et al., 2002 |
| Balti | 67 | Ladakh | Mongoloid | TB, Himalayish, Tibeto-Kanauri, Tibetic, Tibetan, Western | Trade and agriculture | Trivedi et al., 2002 |
| Lyngnam | 156 | Meghalaya | Mongoloid | AA, Mon-Khmer, Northern Mon-Khmer, Khasian | Shifting cultivation | Langstieh et al., 2004 |
| Nongtrai | 90 | Meghalaya | Mongoloid | AA, Mon-Khmer, Northern Mon-Khmer, Khasian | Shifting cultivation | Langstieh et al., 2004 |
| Maram | 96 | Meghalaya | Mongoloid | AA, Mon-Khmer, Northern Mon-Khmer, Khasian | Agriculture | Langstieh et al., 2004 |
| Khynriam | 146 | Meghalaya | Mongoloid | AA, Mon-Khmer, Northern Mon-Khmer, Khasian | Agriculture | Langstieh et al., 2004 |
| Pnar | 100 | Meghalaya | Mongoloid | AA, Mon-Khmer, Northern Mon-Khmer, Khasian | Agriculture | Langstieh et al., 2004 |
| Bhoi | 90 | Meghalaya | Mongoloid | AA, Mon-Khmer, Northern Mon-Khmer, Khasian | Shifting cultivation | Langstieh et al., 2004 |
| War Khasi | 80 | Meghalaya | Mongoloid | AA, Mon-Khmer, Northern Mon-Khmer, Khasian | Horticulture | Langstieh et al., 2004 |
| War Jaintia | 46 | Meghalaya | Mongoloid | AA, Mon-Khmer, Northern Mon-Khmer, Khasian | Horticulture | Langstieh et al., 2004 |
| Juang | 50 | Orissa | Australoid | AA, Munda, South Munda, Kharia-Juang | Shifting cultivation | Sahoo and Kashyap, 2002 |
| Saora | 35 | Orissa | Australoid | AA, Munda, South Munda, Koraput Munda, Sora-Juray-Gorum, Sora-Juray | Shifting cultivation | Sahoo and Kashyap, 2002 |
| Lodha | 99 | West Bengal | Australoid | AA, Munda, South Munda, Koraput Munda, Sora-Juray-Gorum, Sora-Juray | Hunting-gathering | Singh et al., 2006 |
| Munda | 64 | Jharkhand | Australoid | AA, Munda, North Munda, Kherwari, Mundari | Agriculture | Banerjee et al., 2005 |
| Santal | 61 | Jharkhand | Australoid | AA, Munda, North Munda, Kherwari, Santali | Agriculture | Banerjee et al., 2005 |
| Paroja | 78 | Orissa | Australoid | DR, South-Central, Gondi-Kui, Konda-Kui, Konda | Agriculture | Sahoo and Kashyap, 2002 |
| Irular | 54 | Tamil Nadu | Australoid | DR, Southern, Tamil-Kannada, Tamil-Kodagu, Tamil-Malayalam, Tamil | Hunting-gathering | Sitalaximi et al., 2003 |
| Dheria Gond | 36 | Chattisgarh | Australoid | DR, South-Central, Gondi-Kui, Gondi | Agriculture | Sarkar and Kashyap, 2002 |
| Madia Gond | 45 | Maharashtra | Australoid | DR, South-Central, Gondi-Kui, Gondi | Hunting-gathering, shifting cultivation | Gaikwad and Kashyap, 2003 |
| Mahadeo Koli | 45 | Maharashtra | Australoid | DR | Agriculture | Gaikwad and Kashyap, 2003 |
| Kuruva | 60 | Karnataka | Australoid | DR | Pastoralism | Rajkumar and Kashyap, 2003 |
| Chenchu | 100 | Andhra Pradesh | Australoid | DR, South-Central, Telugu | Hunting-gathering | Bindu et al., 2005 |
| Naikpod Gond | 104 | Andhra Pradesh | Australoid | DR, South-Central, Gondi-Kui, Gondi | Agriculture | Bindu et al., 2005 |
| Yerukula | 101 | Andhra Pradesh | Australoid | DR, Southern, Telugu | Hunting-gathering | Bindu et al., 2005 |
| Halakki | 44 | Karnataka | Australoid | DR | Hunting-gathering, agriculture | Thangaraj et al., 2006 |
| Kunabhi | 40 | Karnataka | Australoid | DR | Hunting-gathering, agriculture | Thangaraj et al., 2006 |
| Oraon | 60 | Jharkhand | Australoid | DR, Northern | Agriculture | Banerjee et al., 2005 |

TB, Tibeto-Burman; AA, Austro-Asiatic; DR, Dravidian.
*Source: www.ethnologue.com

The average heterozygosity values exhibited a wider range in TB (from 0.7480 in Adi Panggi to 0.8416 in Garo of Meghalaya) than in AA and DR tribes. The locus-wise GST values ranged from 3.16% (D21S11) to 6.27% (D13S317) among TB speakers, from 2.24% (D7S820) to 4.77% (D18S51) among AA groups, and from 2.14% (D3S1358) to 6.09% (FGA) among DR speakers. The populations showed a high degree of genetic differentiation, the average GST values being 3.77% among AA speakers, 4.49% among DR speakers, and 4.65% among TB groups. In the MLG dataset, the locus-wise GST values ranged from 3.76% at D21S11 to 5.63% at FGA, and the average GST value was found to be 4.82%.
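As a rough illustration of how the heterozygosity and GST figures reported above are defined, the sketch below computes the expected heterozygosity of a population and Nei's GST at one locus as (HT − HS)/HT, where HS is the mean within-population heterozygosity and HT the heterozygosity of the pooled (mean) allele frequencies. It assumes equal population weights and no sample-size correction, which may differ from the unbiased estimators used in the study; the function names and the toy frequencies are hypothetical.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity at one locus: 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def average_heterozygosity(population):
    """Mean expected heterozygosity over loci.

    `population` maps locus name -> {allele label: frequency}.
    """
    values = [expected_heterozygosity(f) for f in population.values()]
    return sum(values) / len(values)


def gst(locus_freqs):
    """G_ST at one locus from a list of per-population {allele: frequency} dicts.

    Populations are weighted equally; no sample-size correction is applied.
    """
    n = len(locus_freqs)
    if n == 0:
        raise ValueError("at least one population is required")
    hs = sum(expected_heterozygosity(f) for f in locus_freqs) / n
    alleles = set().union(*locus_freqs)
    mean_freqs = {a: sum(f.get(a, 0.0) for f in locus_freqs) / n for a in alleles}
    ht = expected_heterozygosity(mean_freqs)
    # With no variation at all (ht == 0) differentiation is taken as zero.
    return 0.0 if ht == 0.0 else (ht - hs) / ht


# Hypothetical single-locus toy data for two populations (not from the study).
toy_locus = [{"A": 0.5, "B": 0.5}, {"A": 1.0}]
print(gst(toy_locus))
```

For the toy locus, HS is 0.25 and HT is 0.375, so GST is one third; values of a few percent, as reported for the nine STR loci, indicate that most of the diversity lies within rather than between populations.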

Genetic affinity among the linguistic groups

Single (within) linguistic group datasets. Genetic affinity within each linguistic group was independently investigated to understand the influence of geographic, ethnic, and linguistic affiliations. The NJ phylogenetic trees (based on the DA distance measure) showing the pattern of clustering within the three linguistic groups are shown in Figure 3A–C.

The DA-NJ dendrogram among the TB populations (Fig. 3A) showed three major clusters, with a high bootstrap value (91%) for the Adi sub-cluster. The first major cluster comprised 10 tribes from Arunachal Pradesh and Mizoram and hence can be designated the ‘Arunachal-Mizoram’ cluster. This cluster consisted of two sub-clusters: the six Adi subpopulations of Arunachal Pradesh formed one (with a 91% bootstrap value) and the four Mizoram populations (Hmar, Mara, Lai, and Lusei) formed the other. Interestingly, among the Adi samples, Adi Pasi-upper and Adi Minyong form a close cluster with a high (73%) bootstrap value. The second major cluster, of six populations, can be recognized as the ‘Ladakh-Sikkim’ cluster, as it contained populations from Ladakh (Buddhist, Argon, Drokpa, and Balti) and Sikkim (Bhutia and Lepcha). Within this cluster, Ladakh Buddhist and Argon form a close cluster with an 82% bootstrap value. The third major cluster can be called the ‘Manipur-Garo’ group; it included five populations, three from Manipur (Hmar, Naga, and Kuki) and two Garo populations, one from West Bengal and the other from Meghalaya, which is distinctly different from the others, with a long branch and a high bootstrap value (88%). Overall, the geographically proximate TB populations in different regions clustered together, indicating their genetic affinity, except for the grouping of the two Garo tribes, which live in two different geographical regions (West Bengal and Meghalaya).

The clustering pattern among the AA populations (Fig. 3B) showed a clear distinction between the Mon-Khmer-speaking populations of the north-east (Meghalaya) and the Mundari-speaking populations of the eastern region (Orissa, Jharkhand, and West Bengal); the latter show a higher bootstrap value (98%) and greater branch length than the Mon-Khmer group. The Mon-Khmer populations

Fig. 2. Map of India showing the location of the studied populations in different geographical regions, with details of linguistic and ethnic affiliations. Legend categories (linguistic; ethnic): Tibeto-Burman; Mongoloid, Dravidian; Australoid, Austro-Asiatic; Mongoloid, Austro-Asiatic; Australoid.


exhibited wide diversity and formed two groups, in contrast to the Mundari populations, which formed a distinct close cluster of five populations. In the case of the DR populations (Fig. 3C), two clusters were observed with lower bootstrap values (except for the sub-cluster of Chenchu and Naikpod Gond, 76%). Dheria Gond (Chhattisgarh) and Kuruva (Karnataka) formed one cluster. The other cluster consisted of two sub-clusters: the first is formed by Chenchu (Andhra Pradesh), Naikpod Gond (Andhra Pradesh), and Paroja (Orissa), and the second includes Irular (Tamil Nadu) and Oraon (Jharkhand), with Mahadeo Koli (Maharashtra) as an outlier. The remaining four Dravidian populations, Yerukula (Andhra Pradesh), Madia Gond (Maharashtra), and Halakki and Kunabhi of Karnataka, stood out as outliers, and the branch lengths of Halakki and Kunabhi were distinctly different from the others. However, the clustering pattern obtained from the PCA plots differs from that of the NJ dendrograms for the three linguistic groups (TB, AA, and DR populations).

The PCA plot (components 1 and 2) for the TB-speaking tribes (Fig. 4A) showed a wide scatter of populations that hardly corresponds with their geographical location, except for the three Manipur populations, which form a close cluster that is distinctly separated from the others. The Garo from West Bengal (b1) is separated from the Garo of Meghalaya (g1), and Adi Minyong (a3) is separated from the other Adi groups, although these groups clustered with higher bootstrap values in the NJ dendrogram (Fig. 3A). Also, the two TB populations from the northern region, Ladakh Buddhists (l1) and Argon (l2), which formed a close cluster with a high bootstrap value (82%) in the NJ tree (Fig. 3A), are placed separately in the PCA plot. In the case of the AA tribes, the PCA plot (Fig. 4B) also illustrated a close cluster of the Mundari-speaking tribes, distinctly separated from the Mon-Khmer groups, which are scattered over a wide space. War Khasi (m7) is placed away from War Jaintia (m8), in contrast to the NJ tree (Fig. 3B), where the two formed a close cluster; however, these two groups deviate from the rest of the AA populations, as seen in the NJ tree. The PCA plot of the 12 DR populations showed a distinct separation of the two tribes of Karnataka, Halakki and Kunabhi, from the other 10 tribes, which formed a close cluster (not shown). A cross-examination suggested that Halakki and Kunabhi are misclassified as tribes, so the PCA plot was redrawn with 10 populations, excluding the Halakki and Kunabhi groups (Fig. 4C); this plot showed a wide dispersal of the populations, differing from the pattern observed in the NJ tree (Fig. 3C). However, Chenchu and Naikpod Gond are closely placed in the PCA plot, in agreement with the higher bootstrap value (76%) observed in the dendrogram. Overall, the clustering of DR tribes does not reflect their geographic proximity, unlike in the case of the AA and TB populations. A pairwise

Fig. 3. DA-NJ phylogenetic trees depicting the genetic relationships in the ‘single linguistic group’ dataset. The values given at the nodes are the bootstrap values (%). A: Tibeto-Burman (TB); B: Austro-Asiatic (AA); C: Dravidian (DR).


comparison of the three linguistic groups was also performed (results not shown), and it yielded clustering patterns similar to those of the multiple linguistic group data sets. The clustering patterns obtained from the PCA plots based on components 2–3 and 1–3 were also examined for the linguistic groups. Among the Adi samples, Adi Minyong (a3) is closer to Kuki (n2) than the other Adi samples, as in the PCA 1–2 plot. These plots showed clustering patterns similar to those obtained from the plots based on components 1 and 2 (see Supporting Information figures).
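The NJ trees discussed above (Fig. 3) are built from a matrix of pairwise DA distances. As a minimal sketch of that first step, the code below computes Nei's DA distance, DA = 1 − (1/r) Σ_loci Σ_alleles √(x·y), for every pair of populations. The population labels and allele frequencies are hypothetical, and the tree-building and bootstrap steps, which would normally be done with dedicated phylogenetic software, are not shown.

```python
from itertools import combinations
from math import sqrt


def da_distance(pop_x, pop_y):
    """Nei's D_A distance between two populations.

    Each population is a list of loci; each locus is a list of allele
    frequencies given in the same allele order for both populations.
    """
    if len(pop_x) != len(pop_y):
        raise ValueError("populations must be typed at the same loci")
    total = 0.0
    for locus_x, locus_y in zip(pop_x, pop_y):
        # Sum of sqrt(x_i * y_i) over the alleles of one locus
        total += sum(sqrt(x * y) for x, y in zip(locus_x, locus_y))
    return 1.0 - total / len(pop_x)


def pairwise_da(populations):
    """Upper-triangle D_A matrix as a dict keyed by (name_a, name_b)."""
    return {
        (a, b): da_distance(populations[a], populations[b])
        for a, b in combinations(populations, 2)
    }


# Hypothetical populations typed at two loci (3 and 2 alleles)
toy_pops = {
    "P1": [[0.5, 0.5, 0.0], [0.2, 0.8]],
    "P2": [[0.4, 0.6, 0.0], [0.3, 0.7]],
    "P3": [[0.0, 0.1, 0.9], [0.9, 0.1]],
}
for pair, dist in pairwise_da(toy_pops).items():
    print(pair, round(dist, 4))
```

In this toy example P1 and P2, which have similar frequencies, are separated by a much smaller distance than either is from P3, which is the kind of contrast that an NJ tree then turns into short and long branches.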

Multiple linguistic group data sets. All 46 studied tribes were considered together to investigate the genetic relationships between them. The dendrogram (see Fig. 5), comprising 21 TB, 13 AA, and 12 DR populations, illustrated that geography and ethnicity play a more crucial role than language in determining the genetic relatedness between these populations. For example, AA and DR populations of common ethnicity from Orissa (Juang, Saora, and Paroja) formed a single cluster, and a similar trend was observed for the Jharkhand populations (Oraon, Munda, and Santal). Populations inhabiting the same geographic region (e.g. Arunachal

Fig. 4. PCA plots depicting the genetic relationships in the ‘single linguistic group’ dataset. A: Tibeto-Burman. a1, Adi Pasi: upper; a2, Adi Pasi: lower; a3, Adi Minyong; z1, Hmar: Mizoram; z2, Mara; z3, Lai; z4, Lusei; s1, Bhutia; s2, Lepcha; n1, Naga; n2, Kuki; n3, Hmar: Manipur; b1, Garo: West Bengal; a4, Adi Panggi; a5, Adi Komkar; a6, Adi Padam; g1, Garo: Meghalaya; l1, Ladakh Buddhist; l2, Argon; l3, Drokpa; l4, Balti. B: Austro-Asiatic. m1, Lyngngam; m2, Nongtrai; m3, Maram; m4, Khynriam; m5, Pnar; m6, Bhoi; m7, War Khasi; m8, War Jaintia; o1, Juang; o2, Saora; b1, Lodha; j1, Munda; j2, Santal. C: Dravidian. o1, Paroja; t1, Irular; c1, Dheria Gond; m1, Madia Gond; m2, Mahadeo Koli; k1, Kuruva; a1, Chenchu; a2, Naikpod Gond; a3, Yerukula; j1, Oraon.

Fig. 5. DA-NJ phylogenetic trees depicting the genetic relationships in the ‘multiple linguistic groups’ dataset (AA, DR and TB populations). The values at the nodes are the bootstrap values (%).


Pradesh (6), Manipur (3), and Meghalaya (9)) were found to form independent clusters. All the Mon-Khmer-speaking AA tribes separated from the DR and Mundari-speaking AA tribes. However, the bootstrap values are generally low except for two clusters: the Adi cluster of six populations (87%) and the three Manipur populations together with the Garo (WB) population (77%). The PCA plot (based on components 1 and 2), excluding Halakki and Kunabhi (see Fig. 6), showed two clusters placed separately: towards the positive end of component 2 are the Mon-Khmer-speaking AA populations from Meghalaya (north-east), which are separated from the Mundari-speaking AA populations of the eastern region. War Khasi (g8), which formed a separate cluster with War Jaintia (g9) in the NJ tree (Fig. 3B), is placed distinctly apart from the Meghalaya group; similarly, Garo (WB), which clustered with the Manipur populations in the NJ tree (Fig. 3A), is separated from the TB-AA (Mundari)-DR cluster. The pattern of clustering obtained from the PCA based on components 2 and 3 is almost the same as that based on components 1 and 2; both show a distinct separation of the Mon-Khmer-speaking AA groups from the rest of the populations. This pattern is, however, less distinct in the PCA plots based on components 1 and 3 (see Supporting Information figures). The 3-D PCA plot (not shown), however, shows the separation of the Mon-Khmer-speaking AA groups from the rest of the studied populations, as in the PCA plots of components 1–2, 2–3, and 1–3.
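To make the PCA step concrete, the following is a minimal sketch of how populations can be projected onto the first two principal components of an allele-frequency matrix. It is written in plain Python with power iteration and deflation only to stay dependency-free; in practice a numerical library would be used. It assumes PCA on the covariance matrix of column-centered frequencies, which may differ from the settings used for Figs. 4 and 6. All names and the toy frequencies are hypothetical.

```python
import math


def center_columns(rows):
    """Subtract each column's mean so every allele frequency has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix (alleles x alleles) of column-centered data."""
    n = len(centered)
    cols = list(zip(*centered))
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in cols]
            for ci in cols]


def leading_eigenpair(matrix, iterations=500):
    """Power iteration: dominant eigenvalue and unit eigenvector (symmetric matrix)."""
    size = len(matrix)
    # Non-uniform start: a constant vector lies in the null space of a
    # covariance matrix built from frequencies that sum to one.
    vec = [float(i + 1) for i in range(size)]
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        nxt = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    # Rayleigh quotient gives the eigenvalue for the converged unit vector
    value = sum(v * sum(m * w for m, w in zip(row, vec))
                for v, row in zip(vec, matrix))
    return value, vec


def pca_scores(rows, n_components=2):
    """Return per-population scores and the variance of each component."""
    centered = center_columns(rows)
    cov = covariance(centered)
    components, variances = [], []
    for _ in range(n_components):
        value, vec = leading_eigenpair(cov)
        components.append(vec)
        variances.append(value)
        # Deflation: remove the component just found before the next search
        cov = [[c - value * vi * vj for c, vj in zip(row, vec)]
               for row, vi in zip(cov, vec)]
    scores = [[sum(x * w for x, w in zip(row, comp)) for comp in components]
              for row in centered]
    return scores, variances


# Hypothetical frequencies of three alleles in four populations (two groups)
toy_freqs = {
    "A1": [0.70, 0.20, 0.10],
    "A2": [0.65, 0.25, 0.10],
    "B1": [0.10, 0.30, 0.60],
    "B2": [0.15, 0.25, 0.60],
}
scores, variances = pca_scores(list(toy_freqs.values()))
for name, (pc1, pc2) in zip(toy_freqs, scores):
    print(f"{name}: PC1={pc1:+.3f} PC2={pc2:+.3f}")
```

In the toy data the two groups fall on opposite sides of component 1, while component 2 carries only a small share of the variance. The sign of each component is arbitrary, so only relative positions in such a plot are meaningful.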

DISCUSSION

The tribal populations of India pose hurdles to the investigation of their genetic affinity because of their wide diversity in language and ethnicity. The intriguing questions in this regard are: were the three linguistic groups derived from diverse ancestral groups? Or were they derived from a common ancestry of early founders but adopted different languages in due course of time, as a result of isolation and cultural diffusion? These questions led us to formulate plausible hypotheses to explain the antiquity of the Indian tribal populations. We have investigated the confounding influence of language (AA, DR, and TB tribes) on the genetic structure of the populations, based on autosomal STR markers, to obtain better clarity on their genetic affinity and to infer which of the proposed paradigms (see Introduction) can better explain the possible antiquity and peopling of the Indian subcontinent. The phylogenies and the PCA plots obtained for each of the three linguistic groups show distinct patterns, which help us to unravel the affinity and diversity among populations within each linguistic group and depict plausible scenarios of the early settlers and their past genetic history.

Genetic relationship within the three linguistic families

Tibeto-Burman linguistic group. Among the three linguistic groups (TB, AA, and DR), the TB-speaking tribes retain folklore traditions and cultural artifacts that describe their possible origin (from a common Tibetan ancestry, the Tani group), migration and distribution in different parts of the sub-Himalayan mountain ranges (Sen, 1985). The geophysical terrain (mountain valleys and rivers) of the eastern ranges of the Himalayas might have acted as a geographical barrier, and cultural aspects (group identity, tribal warfare, etc.) could have posed hurdles to migration, thereby preventing gene flow from other regions and from other populations. This should probably have resulted in regional genetic differentiation among the TB tribes in their respective geographical regions. This expectation is supported by the results obtained from the phylogenetic trees and the PCA plot distribution of the 21 TB populations. The contiguous tribes from a common geographical region (e.g. Adi sub-tribes from the Siang river valley of Arunachal Pradesh; tribes of Mizoram; or tribes from Manipur) form a close cluster, in agreement with the expectation from the ethno-historical information and isolation due to physical and cultural barriers.

Dravidian linguistic group. The clustering patterns obtained from the phylogenetic trees and the PCA plot show clustering of the geographically dispersed tribes with small branch lengths, except for Halakki and Kunabhi of Karnataka, which show longer branch lengths, in support of their disputed tribal status. These two populations reportedly show different physical features and practice fishing as an occupation, unlike other DR-speaking tribes [personal communication with Prof. K.C. Malhotra]. Overall, it can thus be inferred that the Dravidian tribes, who share similar physical features and cultural attributes and are distributed over wide geographic regions, possibly shared a common ancestry in the remote past and later spread to different geographic regions. This inference, however, requires further validation.

Austro-Asiatic linguistic group. In the case of either TB- or DR-speaking tribes distributed over wide geographical regions, considerable homogeneity can be observed in linguistic affiliation, physical appearance, and cultural

Fig. 6. PCA plots depicting the genetic relationships in the ‘multiple linguistic groups’ dataset. a1, Adi Pasi: upper; a2, Adi Pasi: lower; a3, Adi Minyong; z1, Hmar: Mizoram; z2, Mara; z3, Lai; z4, Lusei; s1, Bhutia; s2, Lepcha; n1, Naga; n2, Kuki; n3, Hmar: Manipur; b1, Garo: West Bengal; a4, Adi Panggi; a5, Adi Komkar; a6, Adi Padam; g1, Garo: Meghalaya; l1, Ladakh Buddhist; l2, Argon; l3, Drokpa; l4, Balti; g2, Lyngngam; g3, Nongtrai; g4, Maram; g5, Khynriam; g6, Pnar; g7, Bhoi; g8, War Khasi; g9, War Jaintia; o1, Juang; o2, Saora; b2, Lodha; j1, Munda; j2, Santal; o3, Paroja; t1, Irular; c1, Dheria Gond; h1, Madia Gond; h2, Mahadeo Koli; k1, Kuruva; d1, Chenchu; d2, Naikpod Gond; d3, Yerukula; j3, Oraon.


traits. On the contrary, the two branches of AA speakers show considerable heterogeneity. Whereas the Mundari branch of the AA family is located in the eastern region, the Mon-Khmer branch is found in the northeastern region (Meghalaya) and the Nicobar Islands. The two branches also possess distinct cultural practices (the Mundari being patrilineal and patrilocal, while the Mon-Khmer of Meghalaya are matrilineal and matrilocal) and differ widely in their physical appearance, viz. East-Asian and aboriginal-Australian (Bareh, 1997; Kumar and Reddy, 2003). As per the proposed hypothesis of cultural diffusion versus biological determinism, we expect little genetic affinity between the Mundari and the Mon-Khmer branches. The clustering pattern observed in the phylogenetic trees and the PCA plot shows a distinct separation of the Mundari and the Mon-Khmer (of Meghalaya) branches, which form separate clusters. Interestingly, all the Mundari-speaking tribes form a close cluster with smaller branch lengths, whereas the Mon-Khmer speakers exhibit considerable genetic diversity. This clustering pattern gives credence to the proposed paradigm of differential origin of the two sub-families (the Mon-Khmer and Mundari tribes of the AA language family), and we suggest that their common linguistic affiliation might plausibly be the result of cultural diffusion that occurred in the past. In this regard, the Wikipedia ethnologic information on the Mon-Khmer group indicates their possible ancestral migration from southern Tibet around three to four thousand years ago (Gordon, 1914); however, no such information about the antiquity and the origin of the Mundari (as well as DR) tribes is available from the ethnographic accounts (Thurston and Rangachari, 2001; Thurston, 2005). Thus, the ethnographic accounts, which indicate the diverse historical migration and settlement history of the two sub-families of the AA linguistic family, support the results of our study.

Genetic relationship between TB, AA, and DR tribes

In general, the antiquity of the tribal populations is ambiguous, except perhaps among some TB populations, where it is retained through their folklore tradition. In view of the common physical features and cultural similarities between the DR speakers and the Mundari branch of the AA speakers, it seems likely that they might have been derived from a common ancestry in the remote past. Similarly, the TB-speaking tribes and the Mon-Khmer-speaking AA tribes also share such commonality in their culture and physical features, thereby invoking the possibility of their descent from a common ancestry. The phylogenies and the PCA plot obtained from the pairwise comparison of TB, AA, and DR tribes indeed support the proposed hypothesis of close genetic affinity between the same ethnic groups, despite diverse linguistic backgrounds. The DR tribes cluster with the Mundari-speaking AA tribes, while the Mon-Khmer group separates out. Interestingly, the Mundari-speaking AA tribes exhibit affinity with the Adi-Mizo cluster of TB populations, while the rest of the TB populations cluster with the Mon-Khmer-speaking AA tribes. This, however, requires further clarification in view of the ethno-historical records of the Mon-Khmer group and of the other TB tribes, especially the tribes of Manipur and Sikkim.

In recent years, there have been several molecular genetic studies on regional castes and tribes of India. A majority of these studies are region based, examining a few selected population samples in a given region, and are mostly based on mitochondrial DNA and Y-chromosome markers and on very few autosomal STR markers (Basu et al., 2003; Endicott et al., 2007; Krithika et al., 2008; Reddy, 2008). Concerning the antiquity of the tribal groups in India, Roychoudhury et al. (2001) studied a few samples from the three linguistic groups (AA, DR, and TB) and reported extensive sharing of an mtDNA RSP haplotype across the tribes, which was later reported by Basu et al. (2003) as well. The above studies (which included the Mundari group of AA) suggested the remote antiquity of the Austro-Asiatic groups. The study of mtDNA variation among regional tribes by Cordaux et al. (2003, 2004a,b) revealed that the northeastern tribes (TB) are distinct from other tribes and are closely related to Southeast Asian populations. The study also found sharing of a common mtDNA haplotype across the tribes. Kumar et al. (2006) examined the mitochondrial DNA 9-bp deletion/insertion among 31 Austro-Asiatic populations (Mundari and Mon-Khmer), and their results suggested multiple origins/migrations and a distinctive origin of Austro-Asiatic populations, the Mundari groups (of African origin) probably being the earliest settlers. Also, Kumar et al. (2007) assayed Y-STR and SNP markers among 25 AA groups and Southeast Asian populations and detected a strong paternal genetic link between the Indian AA and the studied Southeast Asian populations. Thus the mtDNA and Y-chromosome studies support our findings (based on autosomal STR markers) that Mon-Khmer- and Mundari-speaking AA tribes are genetically distinct and that central, eastern and southern AA tribes show close affinity in spite of their geographical separation. Overall, the study shows that linguistic affiliation does not reflect the genetic closeness among the AA, DR, and TB tribes. The linguistic similarity in the case of the Mundari-speaking AA and Mon-Khmer-speaking AA groups and the dissimilarity between the Mundari-speaking AA and DR tribes are better explained as a result of cultural and geographical influences.

Association between genetic and linguistic boundaries

In general, there is fairly good agreement between linguistic and genetic affiliation at the global level (Cavalli-Sforza et al., 1994; Belle and Barbujani, 2007). Belle and Barbujani (2007) examined 377 microsatellite loci among 52 world populations to understand the association between geography, language, and genetics. Overall, the results suggested a greater association of genetic diversity with geography than with language affiliation. But this might be different in some regional populations, depending on the history of settlement, migration and demographic events, which might produce disagreement between linguistic and genetic diversity. Disagreement between language and genetics (as observed in the AA, DR, and TB groups of this study) was observed among Austronesian- and Papuan-speaking communities from the Solomon Islands. Cox and Lahr (2006) found no association between language affiliation and Y-chromosome variation in the Solomon Islands. Similarly, in a study based on mtDNA sequences of 17 North American populations, Hunley and Long (2005) found a departure of the genetic structure from the linguistic classification, and further analyses indicated strong evidence for gene flow across linguistic boundaries. In Europe, the Hungarians linguistically


belong to the Finno-Ugric language family, but historical information suggests that over 1,000 years the ancient ethnic components of the Hungarian population could have changed as a result of admixture with other tribes. This has been investigated (Csanyi et al., 2008) by comparing Y-chromosome markers among two modern Hungarian-speaking populations and ancient Hungarian skeletal samples of the 10th century. The results (a high frequency of haplogroup J in both Szeklers and Hungarians) confirmed that the modern populations are genetically closely related and are similar to populations from Central Europe and the Balkans.

In the case of the AA, DR, and TB tribes of India, we lack information regarding the origin, historical migration and settlement history of the Mundari and Dravidian tribes. Elucidation of such events could possibly explain the ambiguity between linguistic and genetic affiliations. The TB-speaking tribes, however, have folklore describing their possible origin, history of settlement and migration, including internal tribal warfare. Such events cannot be traced in the case of the DR and Mundari-speaking AA tribes. The origin, settlement history, and original language of the Mundari, Dravidian, Andaman and Nicobar tribes are not known. A majority of DR tribes now speak the local language as a result of their interaction with caste populations. Chaubey et al. (2008) reported an mtDNA study of the Mushar community (in UP and eastern parts of India), who were earlier (Mundari) Austro-Asiatic speakers but now speak Hindi, with virtually no influence of the local people on their genetic makeup. In the case of the Khasi (Mon-Khmer) of Meghalaya, the ethnographic information indicates their historical migration from across the Himalayan Mountains (Bareh, 1997). This supports their separate and diverse origin, derived from an ancestral population not common to the Mundari group. The separate clustering of the Mon-Khmer Khasi group from the Mundari group is in agreement with the historical accounts of the Khasi group.

Antiquity and possible past (genetic) history of the tribes

In terms of antiquity, there have been various competing hypotheses: some scholars believed that the Austro-Asiatics were the early inhabitants, while others considered the Dravidian speakers to be the early settlers of the subcontinent (Rishley, 1915; Buxton, 1925; Rapson, 1955; Sarkar, 1958; Pattanayak, 1998). On the other hand, Tibeto-Burman-speaking tribes are supposed to be relatively later migrants from Tibet and Myanmar (Guha, 1935). The fossil hominid remains obtained from the Mesolithic sites (especially from the Gangetic plain), which retained the characteristics of the later Pleistocene populations from other parts of Europe, Africa, and Asia, are, however, consistent with the hypothesis that people speaking the Austro-Asiatic languages were probably the early migrants into the subcontinent (Gordon, 1958; Kennedy, 1984). In this regard, various lines of paleoanthropological, linguistic, cultural, and biological evidence (obtained from anthropometric, classical genetic, and molecular genetic studies) were considered to address the issues pertaining to the peopling of the Indian subcontinent and to corroborate different hypotheses (Rishley, 1915; Rapson, 1955; Malhotra and Vasulu, 1993; Cavalli-Sforza et al., 1994; Gadgil et al., 1998; Majumder, 1998; Pattanayak, 1998; Bhasin and Walter, 2001; Kumar and Reddy, 2003). Some studies based on a few regional tribes (Roychoudhury, 1977; Malhotra, 1978; Majumder and Mukherjee, 1993; Malhotra and Vasulu, 1993; Majumder, 1998; Bhasin and Walter, 2001) have shown that geography, language, and ethnicity play a significant role in determining the genetic relatedness among the different populations; however, the dichotomy between the linguistic and ethnic affiliation of the populations has hardly been examined in inferring their possible antiquity.

Therefore, an understanding of the past genetic history of the peopling of the Indian subcontinent is hampered by the lack of clarity about the confounding factors of linguistic, geographic, and ethnic diversity of the tribes, especially the existence of two morphologically distinct tribes belonging to the same linguistic family and of two linguistically different tribes sharing common physical features. To address these factors, we propose two alternative possible scenarios of the past historical events of the peopling of the Indian subcontinent. These two paradigms are based on the ‘inequality relationship’ of evolutionary change, which varies between the biological and the cultural systems of human populations: morphological changes are slower than linguistic transformations, in the absence of intervening rapid demographic and cultural factors. Which of the scenarios is best suited to the peopling of the Indian subcontinent? And what can be inferred from the results of the current study? The results of the phylogenetic trees and the PCA plot (showing the separation of the Mundari and Mon-Khmer branches of the AA linguistic family, the clustering of Mundari-speaking AA populations with some of the DR tribes, and the close affinity of some of the TB-speaking tribes with the Mon-Khmer-speaking AA tribes) help us to understand the possible scenarios of the past genetic history of these tribal populations, which can be described under the following alternative models (see Fig. 7).

Case A: The derived tribes had retained their common ethnicity and language from the early settlers. The early settlers, who had a common language and culture in the remote past, retained their ethnic and cultural identity in the course of their settlement in different geographical regions and continued to exist with slight modifications as a result of dispersal and isolation. This can best explain the situation of the TB- and DR-speaking tribes. The two groups of tribal populations show clear distinctions in their geographical distribution, physical features, and cultural and linguistic traits. Their historical migration and antiquity are different: the DR speakers are supposed to be ancient, while the TB speakers have a comparatively recent origin and historical migration (Buxton, 1925; Guha, 1935; Sarkar, 1958). The ancestors of the DR tribes are supposed to have spoken a proto-Dravidian language, whereas the TB tribes were probably derived from a common ancestry that spoke a common Sino-Tibetan language.

Case B: The derived sub-tribes had retained a common ethnicity but acquired different languages. The early settlers, during their settlement in different regions, had differentiated into sub-tribes, which retained their ethnicity but acquired or adopted different languages. The close genetic affinity observed between the morphologically related DR tribes and the Mundari-speaking AA tribes indicates their probable common ancestry, and most possibly they had a common language, but in due course of their dispersal to


different geographical regions had adopted different languages.

Case C: Sub-populations derived from (two) different ancestries had retained their separate ethnicity but adopted a common language. Another possible situation in the past history of the peopling of the Indian subcontinent is that early founders of different origin and historical background settled in a contiguous or the same geographical region and acquired the local language but retained their ethnicity. The Mundari- and Mon-Khmer-speaking AA tribes were probably derived from different early settlers but acquired or adopted the AA language as a result of cultural diffusion, while retaining their ethnicity and other cultural practices with the least admixture between them.

The aforesaid three scenarios describe the past (genetic) history or course of events that can explain the persisting biological diversity among the tribal populations of the subcontinent. The results also indicate that the TB populations were different from the DR or the AA speakers and belonged to a different ancestry. The ancestors of the ‘DR,’ ‘AA,’ and ‘TB’ populations represent different waves of migration that entered the Indian subcontinent from different directions at different time periods. Further studies based on mitochondrial and Y-chromosome markers of the Indian tribes are expected to provide further clarity on the inferences based on autosomal markers.

ACKNOWLEDGMENTS

This study is essentially a part of the Indian Statistical Institute project undertaken in collaboration with the Central Forensic Science Laboratory (CFSL), Kolkata. We thank the Directors of both Institutes for logistic support. We acknowledge the Adi tribes for their participation and cooperation in the study. We express our sincere gratitude to the officials of the Government of Arunachal Pradesh, especially of the Siang districts. We thank Dr. Kashyap, CFSL, for providing the laboratory facilities and the required materials to carry out the experiments and Dr. R Trivedi, CFSL, for technical support. We also thank the research scholars of CFSL for their help in laboratory experiments.

LITERATURE CITED

Banerjee J, Trivedi R, Kashyap VK. 2005. Polymorphism at 15 short tandem repeat AmpFlSTR Identifiler loci in three aboriginal populations of India: an assessment in human identification. J Forensic Sci 50:1229–1234.

Bareh H. 1997. The history and culture of the Khasi People. Guwahati: Spectrum Publishers.

Barnabas S, Shouche Y, Suresh CG. 2006. High-resolution mtDNA studies of the Indian population: implications for paleolithic settlement of the Indian subcontinent. Ann Hum Genet 70:42–58.

Basu A, Mukherjee N, Roy S, Sengupta S, Banerjee S, Chakraborty M, Dey B, Roy M, Roy B, Bhattacharyya NP, Roychoudhury S, Majumder PP. 2003. Ethnic India: a genomic view, with special reference to peopling and structure. Genome Res 13:2277–2290.

Belle EMS, Barbujani G. 2007. Worldwide analysis of multiple microsatellites: language diversity has a detectable influence on DNA diversity. Am J Phys Anthrop 133:1137–1146.

Berkman CC, Dinc H, Sekeryapan C, Togan I. 2008. Alu insertion polymorphisms and an assessment of the genetic contribution of Central Asia to Anatolia with respect to the Balkans. Am J Phys Anthrop 136:11–18.

Bhasin MK, Walter H. 2001. Genetics of castes and tribes of India. Delhi: Kamla-Raj Enterprises.

Bindu GH, Trivedi R, Kashyap VK. 2007. Allele frequency distribution based on 17 STR markers in three major Dravidian linguistic populations of Andhra Pradesh, India. Forensic Sci Int 170:76–85.

Buxton LHD. 1925. The peoples of Asia. London: Kegan Paul, Trench & Trubner.

Cann RL, Stoneking M, Wilson AC. 1987. Mitochondrial DNA and human evolution. Nature 325:31–36.

Cavalli-Sforza LL, Menozzi P, Piazza A. 1994. The history and geography of human genes. Princeton: Princeton University Press.

Census of India. 2001. Provisional population tables, paper 1, series 1. New Delhi: Office of Registrar General, Government of India Publication.

Chattopadhyay P, Ranjan D, Kashyap VK. 2001. Population data for nine fluorescent based STR loci among four important tribal populations of India. J Forensic Sci 46:184–188.

Chaubey G, Metspalu M, Karmin M, Thangaraj K, Rootsi S, Parik J, Solnik A, Selvi Rani D, Singh VK, Naidu BP, Reddy AG, Metspalu E, Singh L, Kivisild T, Villems R. 2008. Language shift by indigenous population: a model genetic study in South Asia. In: Reddy BM, editor. Trends in molecular anthropology. Delhi: Kamla-Raj Enterprises. p 41–50.

Chaubey G, Metspalu M, Kivisild T, Villems R. 2007. Peopling of South Asia: investigating the caste-tribe continuum in India. Bioessays 29:91–100.

Fig. 7. Reconstruction of the possible antiquity of Indian tribes to explain the dichotomy between ethnicity and language. Case A: Ancestral populations continue to retain, in due course of time (t), their ethnicity (E) and cultural and linguistic identity (L) after their settlement. Case B: Ancestral populations, after settling in different geographical regions, continue their ethnicity but adopt different languages (L1, L2) as a result of cultural influences. Case C: Diverse ancestral ethnic groups, after their settlement in a contiguous region, in due course of time (t) adopt a common language (L) as a result of cultural diffusion.


Cordaux R, Aunger R, Bentley G, Nasidze I, Sirajuddin SM,Stoneking M. 2004b. Independent origins of Indian caste andtribal paternal lineages. Curr Biol 14:231–235.

Cordaux R, Saha N, Bentley GR, Aunger R, Sirajuddin SM,Stoneking M. 2003. Mitochondrial DNA analysis revealsdiverse histories of tribal populations from India. Eur J HumGenet 11:253–264.

Cordaux R, Weiss G, Saha N, Stoneking M. 2004a. The north-east Indian passageway: a barrier or corridor for humanmigrations? Mol Biol Evol 21:1525–33.

Cox MP, Lahr MM. 2006. Y-Chromosomal diversity is inverselyassociated with language affiliation in paired Austronesianand Papuan-speaking communities from Solomon Islands. AmJ Hum Biol 18:35–50.

Csanyi B, Bogacsi-Szabo E, Tomory Gy, Czibula A, Priskin K,Csosz A, Mende B, Lango P, Csete K, Zsolnai A, Downes CS,Rasko I. 2008. Y-chromosome analysis of ancient Hungarianand two modern Hungarian-speaking populations from theCarpathian Basin. Ann Hum Genet 72:519–534.

Endicott P, Metspalu M, Kivisild T. 2007. Genetic evidence on modern human dispersals in South Asia: Y chromosome and mitochondrial DNA perspectives: The world through the eyes of two haploid genomes. In: Petragalia DM, Allchin A, editors. The evolution and history of human populations in South Asia. The Netherlands: Springer. p 229–244.

Excoffier L, Langaney A. 1989. Origin and differentiation of human mitochondrial DNA. Am J Hum Genet 44:73–85.

Fuchs S. 1973. The aboriginal tribes of India. London: Macmillan.

Gadgil M, Joshi N, Manoharan S, Patil S, Prasad UVS. 1998. Peopling of India. In: Balasubramanian D, Rao NA, editors. The human heritage. Hyderabad: Hyderabad University Press. p 100–129.

Gaikwad S, Kashyap VK. 2003. Genetic diversity in four tribal groups of western India: a survey of polymorphism in 15 STR loci and their application in human identification. Forensic Sci Int 134:225–231.

Gordon DH. 1958. The pre-historic background of Indian culture. Mumbai: Bhulabai Memorial Institute.

Gordon PRT. 1914. The Khasis. New Delhi: Macmillan.

Guha BS. 1935. The racial affinities of the people of India. In: Census of India, 1931. Simla: Government of India Press.

Hunley K, Long JC. 2005. Gene flow across linguistic boundaries in native North American populations. Proc Natl Acad Sci USA 102:1312–1317.

Ingman M, Kaessmann H, Paabo S, Gyllensten U. 2000. Mitochondrial genome variation and the origin of modern humans. Nature 408:708–713.

Kashyap VK, Guha S, Trivedi R. 2002. Concordance study on 15 STR loci in three major populations of Himalayan State Sikkim. J Forensic Sci 47:1163–1167.

Kennedy KAR. 1984. Biological adaptations and affinities of Mesolithic South Asians. In: Lukas JR, editor. The people of south Asia. The biological anthropology of India, Pakistan and Nepal. New York: Plenum Press. p 29–57.

Kivisild T, Rootsi S, Metspalu M, Mastana S, Kaldma K, Parik J, Metspalu E, Adojaan M, Tolk HV, Stepanov V, Golge M, Usanga E, Papiha SS, Cinnioglu C, King R, Cavalli-Sforza L, Underhill PA, Villems R. 2003. The genetic heritage of the earliest settlers persists both in Indian tribal and caste populations. Am J Hum Genet 72:313–332.

Krithika S, Maji S, Vasulu TS. 2008. Molecular genetic perspective of Indian populations: a Y-chromosome scenario. In: Bhasin V, Bhasin MK, editors. Anthropology today: trends, scope and applications. Delhi: Kamla-Raj Enterprises. p 385–392.

Krithika S, Trivedi R, Kashyap VK, Vasulu TS. 2005. Genetic diversity at 15 microsatellite loci among the Adi Pasi population of Adi tribal cluster in Arunachal Pradesh, India. Leg Med (Tokyo) 7:306–310.

Krithika S, Trivedi R, Kashyap VK, Vasulu TS. 2007a. Genotype profile for fifteen tetranucleotide repeat loci in two Tibeto-Burman speaking tribal populations of Arunachal Pradesh, India. J Forensic Sci 52:239–241.

Krithika S, Trivedi R, Kashyap VK, Vasulu TS. 2007b. Allele frequency distribution at 15 autosomal STR loci in Panggi, Komkar and Padam sub tribes of Adi, a Tibeto-Burman speaking population of Arunachal Pradesh, India. Leg Med (Tokyo) 9:210–217.

Kumar V, Langstieh BT, Biswas S, Babu JP, Rao TN, Thangaraj K, Reddy AG, Singh L, Reddy BM. 2006. Asian and non-Asian origins of Mon-Khmer- and Mundari-speaking Austro-Asiatic populations of India. Am J Hum Biol 18:461–469.

Kumar V, Reddy AN, Babu JP, Rao TN, Langstieh BT, Thangaraj K, Reddy AG, Singh L, Reddy BM. 2007. Y-chromosome evidence suggests a common paternal heritage of Austro-Asiatic populations. BMC Evol Biol 7:47.

Kumar V, Reddy BM. 2003. Status of Austro-Asiatic groups in the peopling of India: an exploratory study based on the available prehistoric, linguistic and biological evidences. J Biosci 28:507–522.

Kumar S, Tamura K, Nei M. 2004. MEGA3: integrated software for Molecular Evolutionary Genetics Analysis and sequence alignment. Briefings Bioinformatics 5:150–163.

Langstieh BT, Reddy BM, Thangaraj K, Kumar V, Singh L. 2004. Genetic diversity and relationships among the tribes of Meghalaya compared to other Indian and Continental populations. Hum Biol 76:569–590.

Maity B, Nunga SC, Kashyap VK. 2003. Genetic polymorphism revealed by 13 tetrameric and 2 pentameric STR loci in four Mongoloid tribal populations. Forensic Sci Int 132:216–222.

Majumder PP. 1998. People of India: biological diversity and affinities. Evol Anthropol 6:100–110.

Majumder PP, Mukherjee BN. 1993. Genetic diversity and affinities among Indian populations: an overview. In: Majumder PP, editor. Human population genetics. New York: Plenum. p 255–275.

Malhotra KC. 1978. Morphological composition of the people of India. J Hum Evol 7:45–63.

Malhotra KC, Vasulu TS. 1993. Structure of human populations in India. In: Majumder PP, editor. Human population genetics. New York: Plenum. p 207–233.

Nei M. 1973. Analysis of gene diversity in subdivided populations. Proc Natl Acad Sci USA 70:3321–3323.

Nei M. 1987. Molecular evolutionary genetics. New York: Columbia University Press.

Nei M, Tajima F, Tateno Y. 1983. Accuracy of estimated phylogenetic trees from molecular data. J Mol Evol 19:153–170.

Ota T. 1993. Dispan: genetic distance and phylogenetic analysis. Pennsylvania: Institute of Molecular Evolutionary Genetics, Pennsylvania State University, University Park.

Pattanayak DP. 1998. The language heritage of India. In: Balasubramanian D, Rao NA, editors. The Indian human heritage. Hyderabad: University Press. p 95–99.

Rajkumar R, Kashyap VK. 2003. Evaluation of 15 biparental STR loci in human identification and genetic study of the Kannada-speaking groups of India. Am J Forensic Med Pathol 24:187–192.

Rapson EJ. 1955. People and languages. In: Rapson EJ, editor. Cambridge history of India, vol. 1, Ancient India. Delhi: S Chand. p 33–57.

Reddy B, editor. 2008. Trends in molecular anthropology. Delhi: Kamla-Raj Enterprises.

Rishley HH. 1915. The people of India. Calcutta: Thacker Spink.

Rosenberg NA, Mahajan S, Gonzalez-Quevedo C, Blum MGB, Nino-Rosales L, Ninis V, Das P, Hegde M, Molinari L, Zapata G, Weber W, Belmont W. 2006. Low levels of genetic divergence across geographically and linguistically diverse populations of India. PLoS Genetics 2:2052–2061.

Roychoudhury AK. 1977. Genetic diversity in Indian populations. Hum Genet 46:99–106.

Roychoudhury S, Roy S, Basu S, Banerjee R, Viswanathan H, Usha Rani MV, Sil SK, Mitra M, Majumder PP. 2001. Genomic structures and population histories of linguistically distinct tribal groups of India. Hum Genet 109:339–350.

Sahoo S, Kashyap VK. 2002. Genetic variation at 15 autosomal microsatellite loci in the three highly endogamous tribal populations of Orissa, India. Forensic Sci Int 130:189–193.



Sarkar SS. 1958. Race and race movements in India. In: Chatterjee SK, editor. The cultural heritage of India. Calcutta: Ramakrishna Mission Institute of Culture. p 17–32.

Sarkar N, Kashyap VK. 2002. Genetic diversity at two pentanucleotide STR and thirteen tetranucleotide STR loci by multiplex PCR in four predominant population groups of central India. Forensic Sci Int 128:196–201.

Sen S. 1985. Folklore in North-East India. New Delhi: Omsons Publications.

Singh A, Trivedi R, Kashyap VK. 2006. Genetic polymorphism at 15 tetrameric short tandem repeat loci in four aboriginal tribal populations of Bengal. J Forensic Sci 51:183–187.

Singh KS. 1994. People of India: the scheduled tribes, Vol. III. Delhi: Oxford University Press.

Sitalaximi T, Trivedi R, Kashyap VK. 2003. Autosomal microsatellite profile of three socially diverse ethnic Tamil populations of India. J Forensic Sci 48:211–214.

Takezaki N, Nei M. 1996. Genetic distances and reconstruction of phylogenetic trees from microsatellite data. Genetics 144:389–399.

Templeton A. 2002. Out of Africa again and again. Nature 416:45–51.

Thangaraj K, Chaubey G, Singh VK, Reddy AG, Pavate PP, Singh L. 2006. Genetic profile of nine autosomal STR loci among Halakki and Kunabhi populations of Karnataka, India. J Forensic Sci 51:190–192.

Thapar R. 1966. A history of India, Vol. 1. Middlesex: Penguin.

Thurston E. 2005. Ethnographic survey in South India (Reprint). Delhi: Nidhi Book Enclave. ISBN 81-902086-5-9.

Thurston E, Rangachari S. 2001. Castes and tribes of Southern India (Reprint). New Delhi, Madras: Asian Educational Services.

Trivedi R, Chattopadhyay P, Maity B, Kashyap VK. 2002. Genetic polymorphism at nine microsatellite loci in four high altitude Himalayan desert human populations. Forensic Sci Int 127:150–155.

