Admixture Indicative Interval (AII): a new approach to assess trends in genetic admixture


Géraud Gourjon • Bérengère Saliba-Serre • Anna Degioanni

Received: 25 April 2012 / Accepted: 13 September 2014 / Published online: 20 September 2014

© Springer International Publishing Switzerland 2014

Abstract Genetic admixture is a dynamic and diachronic process that takes place over a large number of generations. Consequently, a single admixture rate cannot represent such an event, and several estimates can help to take its dynamics into account. We developed an Admixture Indicative Interval (AII), which provides a mathematical way to address this problem by integrating several admixture estimators and their respective accuracies into a single metric that indicates a trend in genetic admixture. To illustrate the value of AIIs in admixture studies, AIIs were calculated using seven estimators on two sets of simulated SNP data generated under two different admixture scenarios, and were then calculated from several published admixed population datasets: a Comorian population and several Puerto Rican and Colombian populations for recent admixture events, as well as European populations representing the Neolithic/Paleolithic admixture for an older event. Our method provides intervals that properly take the variability and accuracy of admixture estimates into account. In all actual and simulated datasets, the AII lies within the intuitive interval and, by means of a double-weighting step, is not biased by divergent points. The large quantity of heterogeneous parental contribution estimates is synthesized into a few AIIs, which turn out to be more manageable and meaningful than a plethora of variable point estimates. This improves admixture studies, allowing a better understanding of migratory flows. Furthermore, it offers a better assessment of admixture than the arithmetic mean, and it facilitates comparisons between regions, between samples, and between studies of the same population.

Keywords Genetic admixture · Admixture rates · Interval computation · Estimation method · Human peopling

Introduction

In the search for the history of human populations, admixture events are of great importance for understanding the demic movements and gene flows that have led to present-day populations. Admixture is a dynamic process that runs through many generations in human populations. During and, above all, after the event, several evolutionary forces affect the gene pools of the parental and admixed populations. Consequently, there may be a clear discrepancy between the genetic pattern observed in the generation immediately following the mixing and the genetic structure of the populations sampled today. Moreover, stochastic sampling error may increase divergences from ancestral allele frequencies (McEvoy et al. 2006).

However, knowledge of parental contributions has always played a significant role in the analysis of the gene pools of admixed populations, in particular Latin or Central American populations (Fejerman et al. 2005; Morera et al. 2003; Sans 2000; Silva et al. 2010; Via et al. 2011) and Hispanic-American populations (Bertoni et al. 2003; Erdei et al. 2011; Rojas et al. 2010), where admixture events have been studied to better understand the dynamics of colonization or for medical purposes. From least-squares methods to recent ABC (approximate Bayesian computation) and Bayesian clustering methods, many estimators have been developed to assess contributions to the gene pool of an admixed population (Bertorelle and Excoffier 1998; Cavalli-Sforza et al. 1994; Chakraborty 1975; Chakraborty et al. 1992; Chikhi et al. 2001; Elston 1971; Excoffier et al. 2005; Giovannini et al. 2009; Glass and Li 1953; Helgason et al. 2000; Krieger et al. 1965; Lathrop 1982; Long 1991; Roberts and Hiorns 1962; Sousa et al. 2009; Wang 2003, 2006) and the genome ancestry of an individual (Pritchard et al. 2000; Falush et al. 2003; Tang et al. 2005). These methods rely on several assumptions (no sampling error, Hardy–Weinberg equilibrium, no natural selection, limited or no migration, molecular information, etc.) and take into account some of the evolutionary forces that may have modified allele frequencies. See Degioanni and Gourjon (2010) for a complete review of population admixture estimation methods.

Electronic supplementary material The online version of this article (doi:10.1007/s10709-014-9792-3) contains supplementary material, which is available to authorized users.

G. Gourjon (✉) · A. Degioanni

CNRS, MCC – MMSH, LAMPEA UMR 7269, Aix-Marseille Université, 5 rue du Château de l'Horloge, BP 647, 13094 Aix-en-Provence Cedex 2, France

e-mail: [email protected]

B. Saliba-Serre

Faculté de médecine de Marseille, Aix-Marseille Université, CNRS, EFS, ADÉS UMR 7268, 51 boulevard Pierre Dramard, 13344 Marseille Cedex 15, France

Genetica (2014) 142:473–482

DOI 10.1007/s10709-014-9792-3

According to the authors of admixture estimation methods (see the original papers) and Choisy et al. (2004), under optimal conditions (i.e. admixture events that are neither too old nor too recent, and highly differentiated parental populations), all methods and associated programs would be applicable to appropriate data (and vice versa) and could lead to fairly accurate estimates with little bias. However, based on simulated data, Choisy et al. (2004) found that some estimates (especially for Chakraborty's Gene Identity method and the Coalescence-Based method of Bertorelle and Excoffier) frequently fall outside the [0, 1] interval, in particular for parental populations that are not very different, for old admixture events, for a low number of loci, and for small samples. In other words, when these optimal conditions were not met, which is the case for all human population admixture events, “substantial bias […] occurred and […] confidence intervals partly lost their credibility” (Choisy et al. 2004).

In addition to factors intrinsic to the software, the main variability in admixture rate estimates is introduced by the experimenter's choices (parental populations, markers, admixed population samples, values for parameters such as migration rate, genetic drift, mutation rate, sample size, etc.) [see Gourjon (2010) and Gourjon et al. (2011) for more information] and by social behaviors [directional mating or societal selection such as the “meaning of Black” (Edgar et al. 2009)], both of which strongly influence admixture estimation. Even if the Approximate Bayesian Computation approach (Excoffier et al. 2005; Sousa et al. 2009) and clustering methods (Pritchard et al. 2000) have renewed the field, the major problems described here remain.

Our previous study on the Comorian population (Gourjon et al. 2011) illustrates this great variability in admixture contribution estimates. All our results support the conclusion that point estimates alone could lead to a misinterpretation of the migration history and the admixture event. Indeed, since genetic admixture is not a single-generation event, a single point estimate cannot explain this complex process, especially given the bias and the large variability of estimates described above. In addition, parental contributions may vary among generations (Roberts and Hiorns 1962; Cavalli-Sforza et al. 1994), following complex spatiotemporal processes (Durand et al. 2009; François et al. 2010).

Considering all the problems described, only two paths can be followed in admixture studies: either stop using admixture estimation methods because of their lack of accuracy under actual conditions, or, as suggested by Gourjon et al. (2011), estimate admixture rates with several estimators in order to limit bias. Indeed, using several markers and combinations of parental populations or admixed population samples, as well as several estimation methods, helps to assess the actual dynamics by providing an “admixture pattern”. However, it is hard to quantify a parental contribution or a genomic ancestry from such a large number of heterogeneous estimates. We developed a method providing an Admixture Indicative Interval (AII), which addresses this problem mathematically. Rather than simply gathering all point estimates into an intuitive interval, our procedure yields a trend in genetic admixture that takes all estimates and their respective accuracies into account. The AII can be applied to population admixture estimates as well as to individual genomic ancestry estimates. Note that, since the AII relies entirely on admixture estimator values, which are taken as original data (with their assumptions and known biases), our method does not provide a “new” estimate. It is designed as a statistical tool to properly handle several datasets and to support anthropological interpretations.

First, we tested our method on simulated data obtained from two different admixture scenarios. Second, we calculated AIIs using data from several actual admixed populations to highlight the various applications of our method: a Comorian population (Gourjon et al. 2011) and several admixed populations from Puerto Rico (Choudhry et al. 2006; Via et al. 2011), Colombia (Rojas et al. 2010), and Europe (Belle et al. 2006), the European populations being considered the result of the Neolithic–Paleolithic admixture.

Materials and methods

Basic procedures of Admixture Indicative Interval

The Admixture Indicative Interval (AII) is determined following a three-step strategy. In the first step, each estimator is weighted according to its accuracy and its convergence with the others. In the second step, the weighted estimators are used to create a distribution of M, the admixture contribution estimate. The last step consists of determining the interval that includes the most statistically likely values of M.

In the following procedure, $M_i$ refers to the value of the ith estimator, with $i \in \{1, \ldots, I\}$ and I the total number of estimators ($I \geq 3$). For each parental population (PP), these three steps must be carried out to calculate the corresponding Admixture Indicative Interval (hence there are as many AIIs as parental populations).

First step: weights computation

a. Accuracy The measure of dispersion is normalized using the coefficient of variation (CV), i.e. the ratio of the standard deviation to $M_i$. It therefore reflects accuracy better than other measures of dispersion, especially for low admixture rates, which makes it useful for comparing one estimator with another. The lower the CV, the more accurate the estimator, and thus the more informative and representative of the actual admixture rate it is compared with an estimator with a high CV. Indeed, an overly conservative method is of limited interest in an admixture study.

To weight an estimator, we use the inverse of the coefficient of variation, $1/CV$, assigning a higher weight to a more accurate estimator. Finally, to obtain a relative weight between estimators, $1/CV$ is expressed for each estimator as a percentage of the sum of $1/CV$ over all estimators:

$$Wa_{M_i} = \frac{1/CV_i}{\sum_{j=1}^{I} 1/CV_j} \times 100$$

with $Wa_{M_i}$ the relative accuracy weight of the ith admixture estimator ($M_i$) compared with the other estimators, and $CV_i$ the corresponding coefficient of variation.
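A minimal Python sketch of the accuracy weights, assuming each estimator is summarized by a strictly positive estimate $M_i$ and a non-zero standard deviation (the CV is undefined otherwise). The function name `accuracy_weights` is hypothetical; the example values are the scenario (i) PP1 estimates and standard deviations from Table 1.

```python
def accuracy_weights(estimates, sds):
    """Relative accuracy weights Wa_{M_i}, in percent, from the inverse coefficient of variation."""
    # CV_i = sd_i / M_i, hence 1/CV_i = M_i / sd_i: a lower CV gives a larger weight.
    inverse_cv = [m / s for m, s in zip(estimates, sds)]
    total = sum(inverse_cv)
    return [100.0 * x / total for x in inverse_cv]


# Scenario (i), parental population 1: estimates M1..M7 and standard deviations (Table 1).
pp1_m = [67.54, 75.0, 86.85, 84.45, 81.87, 86.55, 79.04]
pp1_sd = [2.17, 7.65, 0.48, 1.23, 13.95, 16.58, 7.09]
wa = accuracy_weights(pp1_m, pp1_sd)
print([round(w, 2) for w in wa])
```

Because the weights are percentages, they always sum to 100; here the very precise M3 (86.85 ± 0.48) receives more than half of the total weight.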

b. Convergence One can hypothesize that if several estimators point to the same range of M while another estimator points to a significantly different value, the actual contribution is more likely to lie in this convergent range than around the divergent value. Consequently, a higher weight should be given to an estimator whose value is closer to the other $M_i$.

To assess the convergence between estimators, Temporary Intervals ($TI_i$) are created, each centered on an $M_i$. First, a distance matrix between all pairs of $M_i$ is constructed, each term of the matrix being the absolute difference between two estimator values. Then, for each i, the mean absolute difference $d_i$ between $M_i$ and the $(I - 1)$ other estimators $M_{i'}$ ($i' \in \{1, \ldots, I\}$) is calculated:

$$d_i = \frac{1}{I - 1} \sum_{i'=1}^{I} \lvert M_i - M_{i'} \rvert$$

The $TI_i$ is delimited by $[M_i - d_i;\ M_i + d_i]$.
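The distance-matrix step and the temporary intervals can be sketched as follows. The function name `temporary_intervals` is hypothetical, and the example again uses the scenario (i) PP1 estimates from Table 1; the smallest lower bound and the largest upper bound of the $TI_i$ give the Global Interval used next.

```python
def temporary_intervals(estimates):
    """TI_i = [M_i - d_i, M_i + d_i], with d_i the mean absolute difference to the other estimators."""
    n = len(estimates)
    intervals = []
    for m_i in estimates:
        # Row i of the distance matrix; its diagonal term |M_i - M_i| is 0, so summing
        # the whole row and dividing by I - 1 averages over the I - 1 other estimators.
        d_i = sum(abs(m_i - m_j) for m_j in estimates) / (n - 1)
        intervals.append((m_i - d_i, m_i + d_i))
    return intervals


# Scenario (i), parental population 1: estimates M1..M7 from Table 1.
pp1_m = [67.54, 75.0, 86.85, 84.45, 81.87, 86.55, 79.04]
tis = temporary_intervals(pp1_m)
for i, (lo, hi) in enumerate(tis, start=1):
    print(f"TI{i} = [{lo:.4f}, {hi:.4f}]")
print("GI =", (round(min(lo for lo, _ in tis), 4), round(max(hi for _, hi in tis), 4)))
```

For this population, the lower bound comes from TI1 (around the divergent M1) and the upper bound from TI3, which should reproduce the GI = [52.7867, 94.6250] quoted in the Results.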

A Global Interval (GI) is set up from the lowest limit (L) to the uppermost limit (U) among all $TI_i$. One moves along the GI with an increment $\delta$ ($\delta \in \mathbb{R}_+^*$). The values taken by M are noted $m_k$, with $m_k = L + k \cdot \delta$, k varying from 0 to $\frac{U - L}{\delta}$, so that $m_k$ varies from $m_0 = L$ to $m_{(U-L)/\delta} = U$.

Along this GI, for each $m_k$, the number of $TI_i$ including this $m_k$, noted $Wc_{m_k}$, is the convergence weight of $m_k$. For instance, if three $TI_i$ include a given $m_k$, $Wc_{m_k} = 3$.
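To illustrate stepping along the GI and counting convergence weights, the sketch below uses three hypothetical temporary intervals expressed in percent and a coarse increment δ = 5, so that every grid point is exactly representable in floating point. The intervals are treated as closed, which is an assumption about how endpoints are counted; the function names are hypothetical.

```python
import math


def grid(lower, upper, delta):
    """Grid points m_k = L + k * delta, for k = 0 .. floor((U - L) / delta)."""
    # The tiny tolerance guards against (U - L) / delta landing just below an integer.
    k_max = math.floor((upper - lower) / delta + 1e-9)
    return [lower + k * delta for k in range(k_max + 1)]


def convergence_weight(m, intervals):
    """Wc_{m_k}: number of (closed) temporary intervals that contain m."""
    return sum(1 for lo, hi in intervals if lo <= m <= hi)


# Hypothetical temporary intervals (percent) for three estimators.
tis = [(20.0, 40.0), (25.0, 35.0), (30.0, 60.0)]
L = min(lo for lo, _ in tis)  # lower limit of the Global Interval
U = max(hi for _, hi in tis)  # upper limit of the Global Interval
points = grid(L, U, 5.0)
wc = [convergence_weight(m, tis) for m in points]
print(list(zip(points, wc)))
```

With real estimates and a small δ such as 0.0001, grid points rarely fall exactly on an interval bound, so the closed-versus-open choice has little practical effect.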

Second step: weights’ distribution

For each $m_k$, $Wa_{m_k}$ is the sum of the $Wa_{M_i}$ of the $M_i$ whose corresponding $TI_i$ includes $m_k$:

$$Wa_{m_k} = \sum_{i=1}^{I} \mathbb{1}_{TI_i}(m_k) \cdot Wa_{M_i}$$

where $\mathbb{1}_{TI_i}(m_k)$ is the indicator function of $TI_i$: $\mathbb{1}_{TI_i}(m_k) = 1$ if $m_k \in TI_i$ and $\mathbb{1}_{TI_i}(m_k) = 0$ otherwise.

Hence, since $Wa_{M_i}$ is a percentage, the sum equals 100 at a given $m_k$ if the number of $TI_i$ including $m_k$ equals I. The total weight of a given $m_k$, $W_{m_k}$, is obtained by:

$$W_{m_k} = Wa_{m_k} \cdot Wc_{m_k}$$

A piecewise constant function corresponding to the Weights' Distribution, $WD(m_k) = W_{m_k}$, is constructed.
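A sketch of the second step on three hypothetical temporary intervals (in percent) with hypothetical accuracy weights that sum to 100. The function name `weight_distribution` is hypothetical, and the intervals are assumed to be closed.

```python
def weight_distribution(points, intervals, wa):
    """W_{m_k} = Wa_{m_k} * Wc_{m_k} at each grid point m_k (closed intervals assumed)."""
    weights = []
    for m in points:
        inside = [lo <= m <= hi for lo, hi in intervals]    # indicator 1_{TI_i}(m_k)
        wa_m = sum(w for w, hit in zip(wa, inside) if hit)  # Wa_{m_k}: summed accuracy weights
        wc_m = sum(inside)                                  # Wc_{m_k}: number of TI_i containing m_k
        weights.append(wa_m * wc_m)
    return weights


tis = [(20.0, 40.0), (25.0, 35.0), (30.0, 60.0)]  # hypothetical TI_i, in percent
wa = [50.0, 30.0, 20.0]                           # hypothetical Wa_{M_i}, summing to 100
points = [20.0 + 5.0 * k for k in range(9)]       # m_k with L = 20 and delta = 5
wd = weight_distribution(points, tis, wa)
for m, w in zip(points, wd):
    print(m, w)
```

Where all three intervals overlap, $Wa_{m_k}$ reaches 100 and the total weight peaks at 100 · 3 = 300, which is how the double weighting rewards regions where accurate estimators converge.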

Third step: Admixture Indicative Interval

The WD delimits an area representing the cumulative sum of weights along the GI. This area is split into three equal parts (thirds). Assuming that the WD peak points to the most likely $m_k$ value, the AII is the interval delimited by the two $m_k$ values that bound the central third (see Results for an example).
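The three steps can be combined into a single function. This is a sketch under stated assumptions, not the authors' implementation: instead of sampling the weights' distribution on a grid with increment δ, it integrates the piecewise-constant WD exactly between consecutive $TI_i$ endpoints, which should agree with a fine grid to within about δ. It assumes strictly positive estimates and standard deviations and closed temporary intervals; the function name is hypothetical.

```python
def admixture_indicative_interval(estimates, sds):
    """Sketch of the three AII steps for one parental population.

    Assumes strictly positive estimates and standard deviations, and closed
    temporary intervals. The piecewise-constant weights' distribution (WD) is
    integrated exactly between interval endpoints instead of on a delta grid.
    """
    n = len(estimates)
    if n < 3:
        raise ValueError("the AII needs at least three estimators")

    # Step 1a: accuracy weights Wa_{M_i}, as a percent of the summed 1/CV.
    inverse_cv = [m / s for m, s in zip(estimates, sds)]
    total_inverse_cv = sum(inverse_cv)
    wa = [100.0 * x / total_inverse_cv for x in inverse_cv]

    # Step 1b: temporary intervals TI_i = [M_i - d_i, M_i + d_i].
    tis = []
    for m_i in estimates:
        d_i = sum(abs(m_i - m_j) for m_j in estimates) / (n - 1)
        tis.append((m_i - d_i, m_i + d_i))

    # Step 2: WD only changes at TI endpoints, so evaluate W = Wa * Wc once per piece.
    cuts = sorted({bound for ti in tis for bound in ti})
    pieces = []  # (start, end, W on that piece)
    for start, end in zip(cuts, cuts[1:]):
        mid = (start + end) / 2.0
        inside = [lo <= mid <= hi for lo, hi in tis]
        wa_m = sum(w for w, hit in zip(wa, inside) if hit)
        pieces.append((start, end, wa_m * sum(inside)))

    # Step 3: the AII bounds are where the cumulated area reaches 1/3 and 2/3.
    total_area = sum((end - start) * w for start, end, w in pieces)

    def position(target):
        cumulated = 0.0
        for start, end, w in pieces:
            area = (end - start) * w
            if w > 0 and cumulated + area >= target:
                return start + (target - cumulated) / w
            cumulated += area
        return cuts[-1]

    return position(total_area / 3.0), position(2.0 * total_area / 3.0)


# Scenario (i), parental population 1 (Table 1): estimates M1..M7 and standard deviations.
pp1_m = [67.54, 75.0, 86.85, 84.45, 81.87, 86.55, 79.04]
pp1_sd = [2.17, 7.65, 0.48, 1.23, 13.95, 16.58, 7.09]
lo, hi = admixture_indicative_interval(pp1_m, pp1_sd)
print(f"AII = [{lo:.4f}, {hi:.4f}]")
```

Run on the scenario (i) PP1 values of Table 1, this sketch should give bounds close to the [81.0002, 84.9515] reported in the Results. Exact integration avoids looping over hundreds of thousands of grid points while following the same central-third rule.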

Datasets and statistical programs

The admixture contribution estimates used to build our Admixture Indicative Intervals were obtained from simulated and published actual data.

We first simulated two trihybrid admixture scenarios using SIMCOAL2, an extension of SIMCOAL (Excoffier et al. 2000). Since all admixture estimation methods assume an instantaneous event in their models, we designed our simulated models to fit this assumption: (i) a single instantaneous admixture between three parental populations, 25 generations ago; (ii) two separate instantaneous admixture events: a unilateral gene flow, 25 generations ago, from parental population 3 (PP3) to parental population 1 (PP1) and, five generations later, an introgression from parental population 2 (PP2) into the admixed population resulting from the first event. Simulation parameters are available in Online Resource 1. Admixture contributions were estimated from the frequencies of 30 SNPs (see Online Resources 2 and 3 for the SNP data resulting from scenarios (i) and (ii), respectively). Output files were processed using Arlequin v3.5 (Excoffier and Lischer 2010).

Second, we applied the Admixture Indicative Interval method to calculate parental contributions to published actual populations:

• A Comorian population (Gourjon et al. 2011): we built AIIs for the same parental population combination (South-East African Bantu, Saudi, and Indonesian populations) by combining the estimators' values for each marker type: classical serological markers (seven blood groups) and uniparental molecular markers (SNPs from mtDNA and the Y chromosome).

• Several admixed samples from Puerto Rico: AIIs were built from individual ancestry (IA) estimates of admixed samples from six different regions (Via et al. 2011) and six different clinic recruitment locations (Choudhry et al. 2006). These IA estimates were calculated from 93 and 44 autosomal AIMs, respectively.

• Several admixed samples from Colombia (Rojas et al. 2010): we built AIIs by combining parental contributions (calculated from 11 autosomal AIMs) to 15 admixed (mestizo) populations.

• Several European populations resulting from the Neolithic–Paleolithic admixture event (Belle et al. 2006): we built AIIs for the Paleolithic parental contributions in each European admixed population separately, parental contributions being estimated from multilocus STR data.

To estimate parental contributions for the simulated datasets, up to seven population admixture estimators ($M_i$) were used (Krieger et al. 1965; Roberts and Hiorns 1965; Chakraborty 1985; Long 1991; Chakraborty et al. 1992; Bertorelle and Excoffier 1998; Wang 2003), respectively MK, MRH, MC, ML, MLC, MY, MW, implemented in five well-tested software programs (ADMIX95, ADMIX, Admix 2.0, LEADMIX and Mistura). Input files for the admixture estimation software were created using AdFiT v1.7 (Gourjon and Degioanni 2009). Since our goal was not to assess the efficiency and accuracy of a given estimator, we anonymized the estimators for the simulated data and report results with the estimators in no particular order (M1 to M7; one estimator gave unusable results).

Since admixture estimation methods are based on different assumptions and admixture models, we wondered whether a given method or a particular assumption would significantly and systematically influence the AII calculation. Given that no statistical test exists to quantify the influence of a particular estimator, we recalculated new intervals by removing a different estimator each time, to assess this putative influence and the robustness of the AII. This means that for an AII computed from seven estimates, we recalculated seven other AIIs, each without a different $M_i$.
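The leave-one-out check can be sketched by recomputing the interval once per removed estimator. The compact `aii` helper below repeats the three steps, integrating the piecewise-constant WD exactly between interval endpoints rather than on a δ grid; both function names are hypothetical. Because the method requires $I \geq 3$, this check needs at least four estimators.

```python
def aii(estimates, sds):
    """Compact AII sketch: accuracy and convergence weights, then the central third of WD."""
    n = len(estimates)
    inverse_cv = [m / s for m, s in zip(estimates, sds)]  # assumes positive M_i and sd_i
    total_inverse_cv = sum(inverse_cv)
    wa = [100.0 * x / total_inverse_cv for x in inverse_cv]
    tis = []
    for m_i in estimates:
        d_i = sum(abs(m_i - m_j) for m_j in estimates) / (n - 1)
        tis.append((m_i - d_i, m_i + d_i))
    cuts = sorted({bound for ti in tis for bound in ti})
    pieces = []  # (start, end, W = Wa * Wc on that piece)
    for start, end in zip(cuts, cuts[1:]):
        inside = [lo <= (start + end) / 2.0 <= hi for lo, hi in tis]
        weight = sum(w for w, hit in zip(wa, inside) if hit) * sum(inside)
        pieces.append((start, end, weight))
    total_area = sum((end - start) * w for start, end, w in pieces)

    def position(target):
        cumulated = 0.0
        for start, end, w in pieces:
            if w > 0 and cumulated + (end - start) * w >= target:
                return start + (target - cumulated) / w
            cumulated += (end - start) * w
        return cuts[-1]

    return position(total_area / 3.0), position(2.0 * total_area / 3.0)


def leave_one_out(estimates, sds, labels):
    """AII recomputed once per estimator, each time without that estimator."""
    results = {}
    for i, label in enumerate(labels):
        kept = [j for j in range(len(estimates)) if j != i]
        results[label] = aii([estimates[j] for j in kept], [sds[j] for j in kept])
    return results


# Scenario (i), parental population 1 (Table 1).
pp1_m = [67.54, 75.0, 86.85, 84.45, 81.87, 86.55, 79.04]
pp1_sd = [2.17, 7.65, 0.48, 1.23, 13.95, 16.58, 7.09]
robustness = leave_one_out(pp1_m, pp1_sd, [f"M{i}" for i in range(1, 8)])
for label, (lo, hi) in robustness.items():
    print(f"without {label}: [{lo:.4f}, {hi:.4f}]")
```

The row without M1 can be compared with the AII = [82.9845, 85.7785] reported in the Discussion for this population.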

Results

Simulated data

Admixture rates for scenarios (i) and (ii) are presented in Table 1. As the M4 estimator did not give exploitable results, it was excluded when building the AIIs for scenario (ii). We describe the construction of the AII for scenario (i) below. A full example is presented in Online Resource 4.

First step

Starting from PP1, $Wa_{M_i}$ is computed for each estimator (accuracy weight table in Online Resource 4). A distance matrix is built, which allows the seven $TI_i$ to be delimited.

Table 1 Admixture rates from simulated data

M1 M2 M3 M4 M5 M6 M7

Scenario (i)

PP1 67.54±2.17 75±7.65 86.85±0.48 84.45±1.23 81.87±13.95 86.55±16.58 79.04±7.09

PP2 13.05±1.48 14.65±5.31 9.87±0.29 7.13±0.61 3.72±7.64 9.24±8.59 15.71±2.35

PP3 19.41±2.28 10.35±7.93 3.28±0.53 8.42±1.31 14.41±10.19 4.21±15.99 5.25±1.35

Scenario (ii)

PP1 3.50±2.02 7.18±5.11 2.05±0.27 0.01±0.01 4.26±4.66 1.17±9.29 6.71±0.17

PP2 59.44±3.29 63.12±9.50 61.82±0.48 99.99±0.01 55.9±11.87 63.58±16.30 73.86±0.41

PP3 37.06±2.97 29.70±8.48 36.13±0.36 0.00±0.01 39.83±14.75 35.25±15.49 19.43±0.50

Admixture rates in italic are not reliable values


The Global Interval is defined by the lowest and the highest bounds among all $TI_i$ ($m_k$ line in Online Resource 4). Thus, GI = [52.7867, 94.6250]. Given an increment $\delta = 0.0001$, $m_k = 52.7867 + 0.0001k$, with $0 \leq k \leq 418383$, $m_k$ varying from $m_0 = 52.7867$ to $m_{(U-L)/\delta} = 94.625$.
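The step count above follows from $(U - L)/\delta$. A short check with Python's `decimal` module, which keeps the division exact and so avoids the binary floating-point round-off that could push the quotient just below an integer:

```python
from decimal import Decimal

# Global Interval bounds and increment quoted above for scenario (i), PP1.
L = Decimal("52.7867")
U = Decimal("94.6250")
delta = Decimal("0.0001")

k_max = int((U - L) / delta)     # number of increments from m_0 = L to the last grid point
m_last = L + k_max * delta       # m_{k_max}
print(k_max, m_last, k_max + 1)  # last index, last grid point, number of grid points
```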

Second step

Since $WD(m_k) = W_{m_k}$ is a piecewise constant function, $W_{m_k}$ needs to be calculated only at the lower and upper limits of the different pieces, all $m_k$ in a given piece having the same $W_{m_k}$.

Third step

The $WD(m_k)$ delimits an area that has been split into three thirds (see Fig. 1). The AII is defined by the lower ($m_k$ = 81.0002) and upper ($m_k$ = 84.9515) limits of the central area.

Admixture Indicative Intervals

The same procedure was followed for each parental population in order to obtain three AIIs (admixture contributions of parental populations 1, 2 and 3) for scenario (i) and then for scenario (ii). The Admixture Indicative Intervals are given in Table 2.

Application on actual datasets

Table 3 presents the AIIs obtained from the Bantu, Saudi and Indonesian contributions to the Comorian admixed population according to the different markers, the AIIs obtained from the Paleolithic contributions to the different European admixed populations, and the AIIs obtained from the African, European, and Native American contributions to the Puerto Rican and Colombian admixed populations.

Figures 1 to 4 in Online Resource 5 are graphical representations of the AIIs, with the $M_i$ values provided for comparison and to highlight the variability of the estimates.

Influence of admixture estimation methods on the AII calculation

The Admixture Indicative Intervals calculated without one

estimator each time are given in Online Resource 6.

Discussion

Efficiency of AII method

Current population admixture estimation methods provide point estimates that may differ from one another. This issue is clearly illustrated in the studies of Gourjon et al. (2011) and Belle et al. (2006), in which different methods applied to the same datasets led to significantly different admixture contribution estimates. How can we identify which estimate corresponds to the actual contribution of a parental population when the dynamic parameters and the influence of evolutionary forces are unknown? Currently, considering the number of biological and societal factors influencing such an event and its dynamics in human populations, no method can reach this actual value. Even ABC and clustering methods cannot include all these parameters. In addition, these recent methods offer a synchronic assessment of parental contributions instead of a diachronic one (Gourjon and Degioanni 2012), leading to an incorrect picture of the ancestral parental contributions.

Fig. 1 Weights distribution and Admixture Indicative Interval based on parental population 1 from scenario (i)

Table 2 Admixture Indicative Interval (simulated data)

PP1 PP2 PP3

Scenario (i) [81.0002, 84.9515] [9.6071, 11.5613] [7.6429, 10.3783]

Scenario (ii) [4.2622, 5.9199] [62.3189, 65.3810] [32.4109, 35.6880]

To avoid misinterpreting a historical admixture event, Gourjon et al. (2011) suggested using several admixture estimation methods to obtain a genetic admixture pattern instead of a single point estimate. Even if most of the estimates converge to an interval that can be identified intuitively, it is essential to use a simple and reliable method to build this interval mathematically. Although the AII does not necessarily correspond to the actual admixture contribution, given that it depends on the admixture estimation methods, their various assumptions and their admixture models, it is the most suitable representation of the variability and convergence of the estimates.

To test our new method, we simulated two different scenarios. We used seven admixture estimators that take various evolutionary forces into account and provide slightly different admixture contribution estimates. In both scenarios, the AIIs were located around the convergent values and were only slightly affected by divergent values. For example, in scenario (i), the $M_i$ for PP1 varied from 67.54 to 86.85 %, converging around 80–85 %. Despite a divergent value M1 = 67.54 %, the AII for PP1 was located in the same range (AII = [81.0002, 84.9515] with M1 and AII = [82.9845, 85.7785] without M1). For the other parental populations and the other scenario, the AII behaved in the same way.

When applied to actual populations, our method again provided intervals that take into account all admixture estimates as well as their respective accuracies, while, as a general rule, divergent points did not significantly affect them. The large quantity of results obtained from different markers in the Gourjon et al. (2011) study was synthesized into a few AIIs, which turn out to be more meaningful than a plethora of variable point estimates that are hardly exploitable for explaining the Comorian admixture event. For instance, the Indonesian contribution from mtDNA SNP markers ranged from M = 5.27 % to M = 16.71 %, while the AII gave an indicative trend of [8.909, 10.8941], centered around convergent M values with low dispersion (MB = 5.27 ± 1.42, MY = 8.34 ± 0.94, and MRH = 8.62 ± 3.29), whereas MK, ML, and MYL had a lower weight (greater dispersion and/or divergent values). For the dihybrid admixture model, we observed that, while the $M_i$ ranged around 60–80 %, the divergent MY = 96.82 ± 1.36 did not greatly affect the AII (AII = [74.699, 82.8553] and AII = [69.4578, 73.1593] with and without MY, respectively). However, this divergent value with a low standard deviation significantly increased the length of the interval (see below). Consequently, a divergent estimate that shows a low dispersion may deserve careful examination when interpreting results.

For the old admixture event (Belle et al. 2006), the diversity of estimates among the admixed European samples and methods led the authors to discuss the estimates one by one, or to use approximate mean admixture contributions that were determined intuitively and without mathematical support. The Admixture Indicative Intervals were consistent with the results of Belle et al. (2006) and lay in the range of convergent estimates (Online Resource 5, Figure 2). In the French, Orcadian, Russian, and European populations, only one estimator (mW in Belle et al. 2006) gave a value divergent from the other estimates (for the French, MW = 34.1 ± 6.4, while MRH = 59.7 ± 2.3 and MLC = 55.2 ± 2.5; for the Orcadians, MW = 37.3 ± 5.8, while MRH = 59.6 ± 3.2 and MLC = 60.3 ± 3.4). In both cases, we observed that the AII lay in the range of convergent values, with mW mainly affecting the length of the AII; removing it shifted the AII toward higher bounds (for the French, AII = [51.5437, 57.6022] with all estimators and AII = [55.0634, 58.638] without MW; for the Orcadians, AII = [54.9489, 59.5586] with all estimators and AII = [58.1905, 59.8396] without MW).

Table 3 Admixture Indicative Interval (actual populations)

Populations/markers Admixture Indicative Intervals

Gourjon et al. (2011) Bantu Saudi Indonesian

mtDNA [87.367, 88.5319] [2.1836, 3.4632] [8.909, 10.8941]

Y-Chromosome [70.8624, 72.0991] [24.2742, 24.8286] [3.3333, 3.9503]

Blood Group TriH [69.6184, 71.877] [21.5113, 26.4661] [4.7171, 8.2111]

Blood Group DiH [74.699, 82.8553] [24.1466, 30.3714] N.A.

Belle et al. (2006) Paleolithic Paleolithic

French [51.5437, 57.6022] Orcadians [54.9489, 59.5586]

N. Italians [61.0382, 65.8128] Russians [61.0325, 65.4394]

Sardinians [57.5839, 63.2999] Adygei [73.0127, 75.287]

Tuscans [66.1532, 70.1252] Europeans [60.6813, 64.6123]

Via et al. (2011) European African Native American

Puerto Rico [58.3351, 67.4745] [17.3174, 21.3468] [13.964, 16.1071]

Choudhry et al. (2006) European African Native American

Puerto Rico Case [63.5224, 66.0278] [14.6934, 17.4734] [19.8806, 20.4625]

Puerto Rico Control [61.4674, 68.1159] [16.3646, 20.0505] [19.3013, 22.0276]

Rojas et al. (2010) European African Native American

Colombia [10.2794, 20.5409] [40.4494, 47.5594] [45.6887, 53.3915]

Generally, the length of the AII depends on the number of convergent values (and implicitly on the length of the Global Interval) and on the accuracy of these estimates. The closer the estimates are to each other, the shorter the AII. In Online Resource 5, Figure 1a, we observed short AIIs for the Y-chromosome and mtDNA markers, which provided aggregated $M_i$. Conversely, wide AIIs were obtained for the dihybrid event because of a large spread of admixture contributions. We must highlight that, with a low standard deviation, a convergent $M_i$ reduces the length of the AII, whereas a divergent $M_i$ increases it. Nevertheless, even if a divergent value shows a low standard deviation, the double weighting in the first step prevents a significant increase in the length of the AII.

Finally, in all the cases studied, the sum of the centers of the AII was close to 100 % ($\sum \mathrm{AII} = 102.62$ and $\sum \mathrm{AII} = 102.99$ for scenarios (i) and (ii), respectively; $\sum \mathrm{AII}(\text{mtDNA}) = 100.67$, $\sum \mathrm{AII}(\text{Y-Chrom}) = 99.97$, $\sum \mathrm{AII}(\text{BloodGroupTriH}) = 101.2$, and $\sum \mathrm{AII}(\text{BloodGroupDiH}) = 106.04$ for the Comorian sample). This indicates that the parental contributions derived from the AII are consistent with an actual admixture event.
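The check above is simple arithmetic on interval midpoints. As a sketch, the snippet below applies it to the Comorian blood-group intervals computed with all estimators (Bantu, Saudi, Indonesian), which are reported under point 1 of the next subsection. Their centers sum to 101.20, which matches the value reported for the trihybrid blood-group analysis. The paper does not specify a tolerance around 100 %, so none is imposed here. The function name is hypothetical.

```python
# AII bounds (percent) for the Comorian blood-group data, all estimators
blood_group_trihybrid = {
    "Bantu": (69.6184, 71.877),
    "Saudi": (21.5113, 26.4661),
    "Indonesian": (4.7171, 8.2111),
}


def sum_of_centers(aiis: dict) -> float:
    """Sum the midpoints of the AII of every parental population."""
    return sum((lo + hi) / 2 for lo, hi in aiis.values())


total = sum_of_centers(blood_group_trihybrid)
print(f"sum of AII centers: {total:.2f} %")  # prints: sum of AII centers: 101.20 %
```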

Methodological aspects to be discussed

We wish to stress several other methodological points:

1. Influence of admixture estimation methods on AII: because our interval is calculated from different statistical methods, one may wonder whether a given method, together with the assumptions of its model, has a significant influence on the AII. Generally speaking, we observed (Online Resource 6) that removing any single estimator does not substantially affect the AII. Removing a highly divergent estimator will, of course, shorten the Global Interval and may therefore shorten the AII. Most of the time, however, the interval remains in the same range of values (Online Resource 6). Likewise, the bounds of the central third of the Weight Distribution move away from the divergent value (see above for the French and Orcadians in the study of Belle et al. 2006). As previously reported, we strongly recommend removing a highly divergent estimator from the analysis of the admixture event.

This raises the key question of whether a given method could drive the AII. A case-by-case analysis of Online Resource 6 reveals that no estimator systematically biases the AII. In the Comorian study, several estimators, each in turn, slightly modified the interval. For the blood-group data, MRH had the greatest influence on the Bantu contribution (AII = [69.6184, 71.877] with all estimators and AII = [71.3276, 72.2989] without MRH). MB had the greatest influence on the Saudi and Indonesian contributions (respectively, AII = [21.5113, 26.4661] with all estimators and AII = [26.0189, 30.0331] without MB; AII = [4.7171, 8.2111] with all estimators and AII = [2.6038, 3.4868] without MB). For mtDNA, MY seems to be the most "influential". For the Neolithic/Paleolithic contributions (Belle et al. 2006), the most influential estimator is MW for the French, Orcadians, Russians, and Sardinians (see the previous paragraph for the French and Orcadians). For Russians, AII = [61.0325, 65.4394] with all estimators and AII = [64.2899, 65.9296] without MW. For Sardinians, AII = [57.5389, 63.2999] with all estimators and AII = [61.0883, 64.7815] without MW.

One can conclude that no single estimator systematically drives the AII calculation. Consequently, neither the assumptions of a particular model nor a specific admixture estimation method has an undue influence on the AII.

2. Admixture estimation methods can yield negative or null values, in which case the coefficient of variation cannot be used. A negative estimate could be interpreted either as a null or negligible contribution from the given parental population, or as the result of an improper choice of parental populations, loci, parameters, etc. Because of the risk of erroneous interpretation, a conservative approach is to remove such values from any analysis. By extension, since this parental population does not seem to have contributed to the admixture event, it should be removed from any estimation. To include such an Mi in the AII, it would have to be set to a value close to 0 (e.g. 0.00001). However, such a value has a very large CV and therefore a negligible weight compared with the other Mi, so we recommend excluding it. Similarly, a null standard deviation means that the Mi has not been properly calculated, and it must be removed as well.

3. Although the AII can be built from as few as three estimators, we suggest using at least four Mi. Indeed, the AII is of special interest when numerous results make the trend in admixture unclear. Additionally, the larger the number of Mi, the smaller the impact of a divergent or inaccurate Mi.

4. Markers: different types of markers (molecular, uniparental, classical) follow different admixture patterns and therefore lead to different estimates of admixture contributions. Building an AII from Mi obtained from heterogeneous types of markers is therefore meaningless.

5. Increment d: in the given example, we advised using an incremental value of 0.0001 in the first step. A lower value (e.g. 0.00001 or less) significantly extends the computation time without increasing the accuracy of the AII. In scenario (i), for parental population 1, the AII is the same for d = 0.00001 and d = 0.0001 (AII = [81.0002, 84.9515]). A slightly higher value (from 0.001 up to 0.01) shortens the computation time without lowering the accuracy of the AII (AII = [80.99, 84.94] for d = 0.001). A value d > 0.01 gives a highly biased AII (AII = [73.5, 78.3] for d = 0.1), so we advise users against such incremental values.
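The exclusion rules in points 2 and 3 can be applied as a screening pass before any weighting. The sketch below is a minimal illustration under our own assumptions: each estimate is stored as a (Mi, SD) pair, and the function name `screen_estimates` and all values are hypothetical. It removes individual estimators only. Dropping a whole parental population, as point 2 also suggests, is left to the user.

```python
import warnings
from typing import Dict, Tuple

Estimate = Tuple[float, float]  # (M_i in percent, standard deviation); assumed layout


def screen_estimates(estimates: Dict[str, Estimate]) -> Dict[str, Estimate]:
    """Hypothetical pre-screening following points 2 and 3.

    Drops estimates with M_i <= 0 (coefficient of variation unusable) and
    estimates with SD <= 0 (not properly calculated), then checks how many
    estimators remain.
    """
    kept = {name: (m, sd) for name, (m, sd) in estimates.items() if m > 0 and sd > 0}
    if len(kept) < 3:
        raise ValueError(f"{len(kept)} usable estimates; an AII needs at least 3")
    if len(kept) < 4:
        warnings.warn("fewer than 4 estimates: the AII may be sensitive to a single Mi")
    return kept


# Invented values, for illustration only
example = {
    "M1": (62.0, 3.0),
    "M2": (-4.0, 5.0),  # negative estimate: removed
    "M3": (58.0, 0.0),  # null standard deviation: removed
    "M4": (60.0, 4.0),
    "M5": (65.0, 5.0),
}
kept = screen_estimates(example)  # three estimates remain, so a warning is issued
print(sorted(kept))  # prints: ['M1', 'M4', 'M5']
```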

Other applications to actual populations

Our method was developed to combine several estimators into a single trend in admixture. By the same logic, an AII can be calculated by combining the estimates from several samples within a region or a country. A simple arithmetic mean ignores the convergence and accuracy of each estimate and can therefore be biased by a single divergent value. The AII, by contrast, takes these properties into account and provides an overview of admixture contributions across the whole region or country. It therefore offers a useful alternative for displaying a global admixture trend. For example, Via et al. (2011) estimated the genomic ancestry of six regional samples across the island of Puerto Rico and reported a mean ancestry for the whole island (M(Native Am.) = 15.2 %; M(African) = 21.2 %; M(European) = 63.7 %). The corresponding Admixture Indicative Intervals (AII(Native Am.) = [13.964, 16.1071]; AII(African) = [17.3174, 21.3468]; AII(European) = [58.3351, 67.4745]; Online Resource 5, figure 3) include these means. Moreover, the AII for the African contribution is not significantly affected by the higher African contribution in the East sample (M = 31.8 %, while all other estimates ranged from 16 to 21 %).
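Checking whether a point estimate falls inside, below, or above a reference AII is the basic operation behind this kind of regional comparison. The Python sketch below applies it to the Puerto Rico figures quoted above. The function name `position` is our own and hypothetical. The same check could be used to place a single region, such as a Colombian department, against a country-level AII.

```python
from typing import Tuple


def position(value: float, interval: Tuple[float, float]) -> str:
    """Say whether a point estimate lies below, within or above an AII."""
    lo, hi = interval
    if value < lo:
        return "below"
    if value > hi:
        return "above"
    return "within"


# Island-wide AII for Puerto Rico computed from the estimates of Via et al. (2011)
aii = {
    "Native Am.": (13.964, 16.1071),
    "African": (17.3174, 21.3468),
    "European": (58.3351, 67.4745),
}
island_means = {"Native Am.": 15.2, "African": 21.2, "European": 63.7}

for pop, mean in island_means.items():
    print(pop, position(mean, aii[pop]))  # all three means lie within their AII
print("East sample, African:", position(31.8, aii["African"]))  # above
```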

Similarly, combining estimates from several samples or regions makes it possible to compare a given subsample or region with the whole sample or country. Rojas et al. (2010) estimated admixture rates in 15 mestizo (admixed) subpopulations across Colombia. The Colombian population is highly admixed, and 86 % of individuals identified themselves as being of mixed ancestry (Rojas et al. 2010). Properly comparing each region with the full Colombian population helps to better understand colonization dynamics. For this purpose, the AII provides a reference interval for assessing whether a given region followed the general trend in admixture or a different introgression history (Online Resource 5, figure 4). For example, the AII for autosomal estimates shows that the Cundinamarca population follows the general trend for the Native American and European components but has lower African ancestry than the Colombian population as a whole.

Moreover, the AII makes it possible to compare studies and to determine whether trends in a given parental contribution overlap. Several points emerge from the AII obtained from the studies of Via et al. (2011) and Choudhry et al. (2006) on Puerto Rican populations (Online Resource 5, figure 3). The estimates of Via and colleagues lead to wider AII, implying more heterogeneous parental contributions among Puerto Rican populations than in the study by Choudhry and colleagues. For example, AII(Native) = [13.964, 16.1071] for Via et al. versus AII(Native) = [19.8806, 20.4625] for Choudhry et al. Interestingly, the intervals for the Native American and African contributions do not overlap between the two studies. By contrast, the European AII calculated from the estimates of Choudhry and colleagues lies almost entirely within that of Via and colleagues. We observed that the case–control means in the study by Choudhry and colleagues are affected by divergent and inaccurate estimates for the Carolina population, owing to its very small sample size (mean = 65.32 % instead of 60.2 % when this population is removed). Conversely, the AII are not biased by these values (AII(European) = [64.8833, 68.5528] instead of AII(European) = [61.4674, 68.1159]), and the mean computed without the biased values is closer to the AII centers (66.72 % instead of 64.79 %).
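Statements such as "do not overlap" and "almost included" can be made quantitative with two interval operations: the intersection of two AII, and the fraction of one AII that lies inside another. The sketch below applies both operations to the Puerto Rico intervals quoted above, using the Choudhry et al. European interval [61.4674, 68.1159]. The function names are our own and hypothetical.

```python
from typing import Optional, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound), in percent


def overlap(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the intersection of two intervals, or None if they are disjoint."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def share_inside(a: Interval, b: Interval) -> float:
    """Fraction of interval a's length that lies inside interval b."""
    inter = overlap(a, b)
    return 0.0 if inter is None else (inter[1] - inter[0]) / (a[1] - a[0])


via = {"Native": (13.964, 16.1071), "European": (58.3351, 67.4745)}
choudhry = {"Native": (19.8806, 20.4625), "European": (61.4674, 68.1159)}

print("Native overlap:", overlap(via["Native"], choudhry["Native"]))  # None
share = share_inside(choudhry["European"], via["European"])
print("European share inside Via:", round(share, 3))  # 0.904
```

About 90 % of this European interval lies inside the interval derived from Via et al., whereas the two Native American intervals are disjoint.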

Conclusion

Admixture Indicative Intervals proved able to summarize heterogeneous admixture rates properly by taking into account both the convergence between estimators and their respective accuracy. This double weighting yields a robust method that provides a trend in admixture instead of isolated point estimates. In addition, the AII offers a useful alternative to the arithmetic mean by providing a reference interval for comparative purposes. Besides its obvious application to population admixture studies, the AII can be calculated from parental contributions as well as from individual ancestry estimates (as in Via et al. 2011).

The AII method has been implemented in the AII.v2 software (freely downloadable at http://AdFiT.free.fr).

Acknowledgments We would like to thank Sandrine Cabut for her help with the statistical analyses.

References

Belle E, Landry PA, Barbujani G (2006) Origins and evolution of the

Europeans’ genome: evidence from multiple microsatellite loci.

Proc Biol Sci 273(1594):1595–1602

Bertorelle G, Excoffier L (1998) Inferring admixture proportions from

molecular data. Mol Biol Evol 15(10):1298–1311

Bertoni B, Budowle B, Sans M, Barton S, Chakraborty R (2003)

Admixture in Hispanics: distribution of ancestral population

contributions in the Continental United States. Hum Biol

75(1):1–11

Cavalli-Sforza LL, Menozzi P, Piazza A (1994) The history and

geography of human genes. Princeton University Press,

Princeton

Chakraborty R (1975) Estimation of race admixture—a new method.

Am J Phys Anthropol 42(3):507–511

Chakraborty R (1985) Gene identity in racial hybrids and estimation

of admixture rates. In: Ahuja YR, Neel JV (eds) Genetic

differentiation in human and other animal populations. Indian

Anthropological Association, Delhi, pp 171–180

Chakraborty R, Kamboh MI, Nwankwo M, Ferrell RE (1992)

Caucasian genes in American blacks: new data. Am J Hum

Genet 50(1):145–155

Chikhi L, Bruford MW, Beaumont MA (2001) Estimation of

admixture proportions: a likelihood-based approach using Mar-

kov chain Monte Carlo. Genetics 158(3):1347–1362

Choisy M, Franck P, Cornuet J (2004) Estimating admixture

proportions with microsatellites: comparison of methods based

on simulated data. Mol Ecol 13(4):955–968

Choudhry S, Burchard E, Borrell L, Tang H, Gomez I, Naqvi M,

Nazario S, Torres A, Casal J, Martinez-Cruzado J, Ziv E, Avila

P, Rodriguez-Cintron W, Risch N (2006) Ancestry–environment

interactions and asthma risk among Puerto Ricans. Am J Respir

Crit Care Med 174(10):1088–1093

Degioanni A, Gourjon G (2010) Le mélange dans les populations

humaines: modèles et méthodes d’estimation. Anthropologie

48(1):41–56

Durand E, Jay F, Gaggiotti OE, François O (2009) Spatial inference

of admixture proportions and secondary contact zones. Mol Biol

Evol 26(9):1963–1973

Edgar H (2009) Biohistorical approaches to “race” in the United

States: Biological distances among African Americans, Euro-

pean Americans, and their ancestors. Am J Phys Anthropol

139(1):58–67

Elston RC (1971) The estimation of admixture in racial hybrids. Ann

Hum Genet 35(1):9–17

Erdei E, Sheng H, Maestas E, Mackey A, White KA, Li L, Dong Y,

Taylor J, Berwick M, Morse DE (2011) Self-reported ethnicity

and genetic ancestry in relation to oral cancer and pre-cancer in

Puerto Rico. PLoS One 6(8):e23950

Excoffier L, Lischer HE (2010) Arlequin suite ver 3.5: a new series of

programs to perform population genetics analyses under Linux

and Windows. Mol Ecol Resour 10(3):564–567

Excoffier L, Novembre J, Schneider S (2000) SIMCOAL: a general

coalescent program for the simulation of molecular data in

interconnected populations with arbitrary demography. J Hered

91(6):506–509

Excoffier L, Estoup A, Cornuet JM (2005) Bayesian analysis of an

admixture model with mutations and arbitrarily linked markers.

Genetics 169(3):1727–1738

Falush D, Stephens M, Pritchard JK (2003) Inference of population

structure using multilocus genotype data: linked loci and

correlated allele frequencies. Genetics 164(4):1567–1587

Fejerman L, Carnese F, Goicoechea A, Avena S, Dejean C, Ward R

(2005) African ancestry of the population of Buenos Aires. Am J

Phys Anthropol 128(1):164–170

François O, Currat M, Ray N, Han E, Excoffier L, Novembre J (2010)

Principal component analysis under population genetic models

of range expansion and admixture. Mol Biol Evol

27(6):1257–1268

Giovannini A, Zanghirati G, Beaumont MA, Chikhi L, Barbujani G

(2009) A novel parallel approach to the likelihood-based

estimation of admixture in population genetics. Bioinformatics

25(11):1440–1441

Glass B, Li C (1953) The dynamics of racial intermixture: an analysis

based on the American Negro. Am J Hum Genet 5(1):1–20

Gourjon G (2010) L’estimation du mélange génétique dans les

populations humaines. PhD Thesis. University of Aix-Marseille

Gourjon G, Degioanni A (2009) AdFiT v1.7 (Admixture File Tool):

input files creating tool for genetic admixture estimation

software. Bull et Mém de la Société d’Anthropologie de Paris

21(3–4):223–229

Gourjon G, Degioanni A (2012) Ancestry and admixture rates: the

matter of evolution in admixture study. J Biol Res (formerly

Bollettino della Società Italiana di Biologia Sperimentale)

85(1):147–150

Gourjon G, Boetsch G, Degioanni A (2011) Gender and population

history: sex bias revealed by studying genetic admixture of

Ngazidja population (Comoro Archipelago). Am J Phys Anthro-

pol 144(4):653–660

Helgason A, Sigurðardóttir S, Nicholson J, Sykes B, Hill EW, Bradley DG, Bosnes V, Gulcher JR, Ward R, Stefánsson K

(2000) Estimating Scandinavian and Gaelic ancestry in the male

settlers of Iceland. Am J Hum Genet 67(3):697–717

Krieger H, Morton NE, Mi MP, Azevedo E, Freire-Maia A, Yasuda N

(1965) Racial admixture in north-eastern Brazil. Ann Hum Genet

29(2):113–125

Lathrop G (1982) Evolutionary trees and admixture: phylogenetic

inference when some populations are hybridized. Ann Hum

Genet 46(Pt 3):245–255

Long J (1991) The genetic structure of admixed populations. Genetics

127(2):417–428

McEvoy B, Brady C, Moore LT, Bradley DG (2006) The scale and

nature of Viking settlement in Ireland from Y-chromosome

admixture analysis. Eur J Hum Genet 14(12):1288–1294

Morera B, Barrantes R, Marin-Rojas R (2003) Gene admixture in the

Costa Rican population. Ann Hum Genet 67(1):71–80

Pritchard J, Stephens M, Donnelly P (2000) Inference of population

structure using multilocus genotype data. Genetics

155(2):945–959

Roberts DF, Hiorns RW (1962) The dynamics of racial intermixture.

Am J Hum Genet 14:261–277

Roberts DF, Hiorns RW (1965) Methods of analysis of the genetic

composition of a hybrid population. Hum Biol 37:38–43

Rojas W, Parra M, Campo O, Caro M, Lopera J, Arias W, Duque C, Naranjo A, García J, Vergara C, Lopera J, Hernandez E, Valencia A, Caicedo Y, Cuartas M, Gutierrez J, Lopez S, Ruiz-Linares A, Bedoya G (2010) Genetic make up and structure of Colombian populations by means of uniparental and biparental DNA markers. Am J Phys Anthropol 143(1):13–20

Sans M (2000) Admixture studies in Latin America: from the 20th to

the 21st century. Hum Biol 72(1):155–177

Silva MC, Zuccherato LW, Soares-Souza GB, Vieira M, Cabrera L,

Herrera P, Balqui J, Romero C, Jahuira H, Gilman RH, Martins

ML, Tarazona-Santos E (2010) Development of two multiplex

mini-sequencing panels of ancestry informative SNPs for studies

in Latin Americans: an application to populations of the State of

Minas Gerais (Brazil). Genet Mol Res 9(4):2069–2085

Sousa V, Fritz M, Beaumont MA, Chikhi L (2009) Approximate

Bayesian computation without summary statistics: the case of

admixture. Genetics 181(4):1507–1519

Tang H, Peng J, Wang P, Risch N (2005) Estimation of individual

admixture: analytical and study design considerations. Genet

Epidemiol 28(4):289–301

Via M, Gignoux CR, Roth LA, Fejerman L, Galanter J, Choudhry S,

Toro-Labrador G, Viera-Vera J, Oleksyk TK, Beckman K, Ziv

E, Risch N, Burchard EG, Martínez-Cruzado JC (2011) History

shaped the geographic distribution of genomic admixture on the

island of Puerto Rico. PLoS One 6(1):e16513

Wang J (2003) Maximum-likelihood estimation of admixture pro-

portions from genetic data. Genetics 164(2):747–765

Wang J (2006) A coalescent-based estimator of admixture from DNA

sequences. Genetics 173(3):1679–1692
