Admixture Indicative Interval (AII): a new approach to assess trends in genetic admixture
Geraud Gourjon • Berengere Saliba-Serre •
Anna Degioanni
Received: 25 April 2012 / Accepted: 13 September 2014 / Published online: 20 September 2014
© Springer International Publishing Switzerland 2014
Abstract Genetic admixture is a dynamic and diachronic
process that takes place over a great number of
generations. Consequently, a single admixture rate cannot
represent such an event, and several estimates are needed to
take its dynamics into account. We developed an Admixture
Indicative Interval (AII), which provides a mathematical
way to address this problem by integrating several admixture
estimators and their respective accuracy into a single
metric, and which provides a trend in genetic admixture. To
illustrate the value of AII in admixture studies, AII were
calculated using seven estimators on two sets of simulated
SNP data generated under two different admixture scenarios,
and were then calculated from several published
admixed population datasets: a Comorian population and
several Puerto-Rican and Colombian populations for recent
admixture events, as well as European populations representing
the Neolithic/Paleolithic admixture for an older
event. Our method provides intervals that properly take the
variability and accuracy of admixture estimates into
account. The AII lies in the intuitive interval in all actual
and simulated datasets and, by means of a double-weighting
step, is not biased by divergent points. The great quantity
of heterogeneous parental contributions is synthesized by a
few AII, which turn out to be more manageable and
meaningful than a large number of variable point estimates. This
offers an improvement in admixture studies, allowing a
better understanding of migratory flows. Furthermore, it
offers a better assessment of admixture than the arithmetic
mean, and enhances comparisons between regions, between
samples, and between studies of the same population.
Keywords Genetic admixture · Admixture rates · Interval
computation · Estimation method · Human peopling
Introduction
In search of the history of human populations, admixture
events are of great importance to better understand demic
movements and gene flows that have led to current-day
populations. Admixture is a dynamic process which
runs through many generations in human populations.
During and, above all, after this event, several evolutionary
forces affect the gene pools of the parental and admixed
populations. Consequently, a clear discrepancy may exist between
the genetic pattern observed in the generation immediately
following the mixing and the genetic structure of the
populations sampled today. Moreover, stochastic sampling error
may increase divergences from ancestral allele frequencies
(McEvoy et al. 2006).
However, the knowledge of parental contributions has
always played a significant role in the analysis of the gene
pool of admixed populations, in particular in Latin or
Central American populations (Fejerman et al. 2005; Mo-
rera et al. 2003; Sans 2000; Silva et al. 2010; Via et al.
2011) and in Hispanic-American populations (Bertoni et al.
2003; Erdei et al. 2011; Rojas et al. 2010), where
Electronic supplementary material The online version of this article (doi:10.1007/s10709-014-9792-3) contains supplementary material, which is available to authorized users.
G. Gourjon (✉) · A. Degioanni
CNRS, MCC – MMSH, LAMPEA UMR 7269, Aix-Marseille
Université, 5 rue du Château de l’Horloge, BP 647,
13094 Aix-en-Provence Cedex 2, France
e-mail: [email protected]
B. Saliba-Serre
Faculté de médecine de Marseille, Aix-Marseille Université,
CNRS, EFS, ADES UMR 7268, 51 boulevard Pierre Dramard,
13344 Marseille Cedex 15, France
Genetica (2014) 142:473–482
DOI 10.1007/s10709-014-9792-3
admixture events have been studied to obtain a better
understanding of the dynamics of colonization or for
medical purposes. From least-squares methods to recent ABC
(approximate Bayesian computation) and Bayesian clustering
methods, many estimators have been developed to
assess contributions to the genetic pool of an admixed pop-
ulation (Bertorelle and Excoffier 1998; Cavalli-Sforza et al.
1994; Chakraborty 1975; Chakraborty et al. 1992; Chikhi
et al. 2001; Elston 1971; Excoffier et al. 2005; Giovannini
et al. 2009; Glass and Li 1953; Helgason et al. 2000; Krieger
et al. 1965; Lathrop 1982; Long 1991; Roberts and Hiorns
1962; Sousa et al. 2009; Wang 2003, 2006) and the genome
ancestry of an individual (Pritchard et al. 2000; Falush et al.
2003; Tang et al. 2005). These methods rely on several
assumptions (no sampling error, Hardy–Weinberg equilibrium,
no natural selection, limited or no migration,
molecular information, etc.), taking into account some of
the evolutionary forces which may have modified allele
frequencies. See Degioanni and Gourjon (2010) for a complete
review of population admixture estimation methods.
According to the authors of admixture estimation methods (see
original papers) and Choisy et al. (2004), under optimal
conditions (i.e. admixture events that are neither too old nor
too recent, and highly differentiated parental populations) all methods
and associated programs would be applicable when using
appropriate data (or reciprocally) and could lead to rather
accurate estimates with little bias. However, based on
simulated data, Choisy et al. (2004) found that some estimates
(especially for Chakraborty’s Gene Identity
method and the Coalescence-Based method of Bertorelle
and Excoffier) frequently fall outside the [0, 1] interval, in particular
for parental populations that are not very different, for old
admixture events, for a low number of loci, and for small
samples. In other words, when these optimal conditions
are not met, which is the case for all human population admixture
events, ‘‘substantial bias […] occurred and […] confidence
intervals partly lost their credibility’’ (Choisy et al. 2004).
In addition to software-intrinsic factors, the main variability
in admixture rate estimates is introduced by the
experimenter’s choices (parental populations, markers,
admixed population samples, values for parameters such as
migration rate, genetic drift, mutation rate, sample size,
etc.) [see Gourjon (2010) and Gourjon et al. (2011) for
more information] and by social behaviors [directional mating
or societal selection like the ‘‘meaning of Black’’
(Edgar et al. 2009)], both of which strongly influence admixture
estimation. Even if the Approximate Bayesian Computation
approach (Excoffier et al. 2005; Sousa et al. 2009) and
clustering methods (Pritchard et al.) have updated the
field, the major problems described here still remain.
Our previous study on the Comorian population (Gourjon
et al. 2011) illustrates this great variability in admixture
contribution estimates. All our results support the
conclusion that point estimates alone can lead to a
misinterpretation of the migration history and the admixture
event. Indeed, since genetic admixture does not
consist of a single-generation event, a single point admixture
estimate cannot explain this complex process, especially
given the bias and the large estimate variability described
above. In addition, parental contributions may vary among
generations (Roberts and Hiorns 1962; Cavalli-Sforza et al.
1994), following complex spatiotemporal processes (Durand
et al. 2009; Francois et al. 2010).
Considering all the problems described, only two paths are open
in admixture studies: either to stop using admixture
estimation methods because of their lack of accuracy in
actual conditions or, as suggested by Gourjon et al.
(2011), to estimate admixture rates with several estimators
in order to limit bias. Indeed, the use of several markers
and combinations of parental populations or admixed
population samples, as well as several estimation methods,
helps to assess the actual dynamics by providing an
‘‘admixture pattern’’. However, it is hardly manageable to
quantify a parental contribution or a genomic ancestry from
such a number of heterogeneous estimates. We developed a
method providing an Admixture Indicative Interval (AII),
which addresses this problem mathematically. Instead of
gathering all point estimates into an intuitive interval, our
procedure yields a trend in genetic admixture which
takes all estimates and their respective accuracy into
account. The AII can be applied to population admixture
estimates as well as to individual genomic ancestry estimates.
It should be noted that, since the AII relies entirely on
admixture estimator values, which are taken as original data
(with their assumptions and known biases), our method does
not provide a ‘‘new’’ estimation. The method is designed as
a statistical tool to properly handle several estimates
and to support anthropological interpretations.
Firstly, we tested our method on simulated data obtained
from two different admixture scenarios. Secondly, we calcu-
lated AII using data from several actual admixed populations
to highlight the various applications of our method: a Com-
orian population (Gourjon et al. 2011), and several admixed
populations from Puerto-Rico (Choudhry et al. 2006; Via et al.
2011), Colombia (Rojas et al. 2010), and Europe (Belle et al.
2006), these European populations being considered as the
result of the Neolithic–Paleolithic admixture.
Materials and methods
Basic procedures of Admixture Indicative Interval
The Admixture Indicative Interval (AII) is determined
in three steps. In the first step, each
estimator is weighted according to its accuracy and its
convergence with the others. In the second step, the weighted
estimators are used to build a distribution of M, the admixture
contribution estimate. The last step consists in determining
the bounds of the interval which includes the most
statistically likely M value.
In the following procedure, Mi refers to the value of
the ith estimator (with $i \in \{1, \ldots, I\}$, and I, the total
number of estimators, $I \geq 3$). For each parental population
(PP), these three steps have to be completed to calculate the
corresponding Admixture Indicative Interval (there are as many
AII as parental populations).
First step: weights computation
a. Accuracy The measure of dispersion is normalized by
using the coefficient of variation (CV): the ratio of the
standard deviation to Mi. It therefore gives a better
understanding of the accuracy than other measures of dispersion,
especially for low admixture rates, which
is useful when comparing one estimator to another. The lower the
CV, the more accurate the estimator, and the
more informative and representative of the actual
admixture rate it is compared with an estimator with a high CV value.
Indeed, an overly conservative method is of limited interest in
an admixture study.
To weight an estimator, we use the inverse of the
coefficient of variation, 1/CV, assigning a higher weight to a
more accurate estimator. Finally, to obtain a relative weight
between estimators, 1/CV is expressed for each estimator as a
percentage of the sum of the 1/CV values from all estimators:
$$Wa_{M_i} = \frac{1/CV_i}{\sum_{i'=1}^{I} 1/CV_{i'}} \times 100$$
with $Wa_{M_i}$ the relative accuracy weight of the ith admixture
estimator (Mi) in comparison to the other estimators,
and $CV_i$ the corresponding coefficient of variation.
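The accuracy weighting can be sketched in a few lines of Python. This is only an illustration: the function name `accuracy_weights` is hypothetical, and the ± values of Table 1 (PP1, scenario (i)) are assumed to be standard deviations.

```python
def accuracy_weights(estimates, sds):
    """Relative accuracy weights (in percent): 1/CV of each estimator
    divided by the sum of 1/CV over all estimators, times 100."""
    # CV_i = sd_i / M_i, hence 1/CV_i = M_i / sd_i (needs M_i > 0 and sd_i > 0).
    inverse_cv = [m / sd for m, sd in zip(estimates, sds)]
    total = sum(inverse_cv)
    return [100.0 * value / total for value in inverse_cv]


# PP1, scenario (i), M1..M7 as listed in Table 1.
# Assumption: the values after "±" are standard deviations.
M = [67.54, 75.0, 86.85, 84.45, 81.87, 86.55, 79.04]
SD = [2.17, 7.65, 0.48, 1.23, 13.95, 16.58, 7.09]

wa = accuracy_weights(M, SD)
for index, weight in enumerate(wa, start=1):
    print(f"M{index}: {weight:.2f} %")
```

With these inputs the weights sum to 100 and the estimator with the smallest CV (M3) receives the largest share.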
b. Convergence One can hypothesize that if several
estimators point to the same range of M, while another
estimator points to a significantly different value, the actual
contribution is more likely to lie within this convergent trend
than around the divergent value. Consequently, a higher
weight has to be given to an estimator whose value
is closer to the other Mi.
To assess the convergence between estimators, Temporary
Intervals (TIi) are created and centered on each Mi.
Then, a distance matrix between all pairs of Mi is constructed,
each term of the matrix being equal to the absolute
difference between two estimator values. Thus, for each
i, the mean absolute difference $d_i$ between Mi and the (I −
1) other estimators $M_{i'}$ ($i' \in \{1, \ldots, I\}$) is calculated:
$$d_i = \frac{1}{I - 1} \sum_{i'=1}^{I} |M_i - M_{i'}|$$
The TIi is delimited by $[M_i - d_i;\ M_i + d_i]$.
A Global Interval (GI) is set up from the lowest limit (L)
to the uppermost limit (U) among all TIi. One moves along
the GI with an incremental value d ($\forall d,\ d \in \mathbb{R}^{*}_{+}$). Values
taken by M are noted $m_k$, with $m_k = L + k \times d$, k varying
from 0 to $(U - L)/d$, and $m_k$ varying from $m_0 = L$ to $m_{(U-L)/d} = U$.
Along this GI, for each $m_k$, the number of TIi including
this $m_k$, noted $Wc_{m_k}$, is the convergence weight of $m_k$. For
instance, if three TIi include a given $m_k$, $Wc_{m_k} = 3$.
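As an illustration of the convergence step, the following Python sketch builds the Temporary Intervals, the Global Interval and the convergence weight for the PP1 estimates of scenario (i) in Table 1. The function names are hypothetical, and interval bounds are assumed to be inclusive.

```python
def temporary_intervals(estimates):
    """One (lower, upper) Temporary Interval per estimator, centred on M_i,
    with half-width d_i = mean absolute difference to the other estimators."""
    count = len(estimates)
    intervals = []
    for m_i in estimates:
        # |M_i - M_i| = 0, so summing over all estimators and dividing
        # by (I - 1) gives the mean over the other estimators.
        d_i = sum(abs(m_i - other) for other in estimates) / (count - 1)
        intervals.append((m_i - d_i, m_i + d_i))
    return intervals


def global_interval(intervals):
    """Lowest lower bound (L) and highest upper bound (U) among all TI."""
    return min(low for low, _ in intervals), max(high for _, high in intervals)


def convergence_weight(m_k, intervals):
    """Number of Temporary Intervals that include m_k (bounds assumed inclusive)."""
    return sum(1 for low, high in intervals if low <= m_k <= high)


M = [67.54, 75.0, 86.85, 84.45, 81.87, 86.55, 79.04]  # PP1, scenario (i), Table 1
tis = temporary_intervals(M)
lower_limit, upper_limit = global_interval(tis)
print(round(lower_limit, 4), round(upper_limit, 4))
print(convergence_weight(80.0, tis))
```

For these seven estimates the Global Interval agrees with the one reported in the Results, GI = [52.7867, 94.6250].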
Second step: weights’ distribution
For each $m_k$, $Wa_{m_k}$ is the sum of the $Wa_{M_i}$ of the Mi whose
corresponding TIi includes $m_k$:
$$Wa_{m_k} = \sum_{i=1}^{I} I_{TI_i}(m_k) \times Wa_{M_i}$$
where $I_{TI_i}(m_k)$ is the indicator function of TIi: $I_{TI_i}(m_k) = 1$ if
$m_k \in TI_i$ and $I_{TI_i}(m_k) = 0$ otherwise.
Therefore, since $Wa_{M_i}$ is a percentage, the sum is equal
to 100 at a given $m_k$ if the number of TIi including $m_k$ is
equal to I. The total weight of a given $m_k$, $W_{m_k}$, is obtained
by:
$$W_{m_k} = Wa_{m_k} \cdot Wc_{m_k}$$
A piecewise constant function, which corresponds to the
Weights’ Distribution $WD(m_k) = W_{m_k}$, is then constructed.
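The total weight at a given point can be sketched as follows. The function name `total_weight` is hypothetical, the ± values of Table 1 are assumed to be standard deviations, and interval bounds are assumed to be inclusive.

```python
def total_weight(m_k, estimates, sds):
    """Total weight at m_k: summed accuracy weight of the estimators whose
    Temporary Interval contains m_k, multiplied by the number of such intervals."""
    count = len(estimates)
    # Accuracy weights in percent, from 1/CV = M / sd.
    inverse_cv = [m / sd for m, sd in zip(estimates, sds)]
    wa = [100.0 * value / sum(inverse_cv) for value in inverse_cv]
    # Half-widths d_i of the Temporary Intervals.
    half = [sum(abs(m - other) for other in estimates) / (count - 1)
            for m in estimates]
    # Indicator of "m_k lies in TI_i" for each estimator.
    inside = [m - d <= m_k <= m + d for m, d in zip(estimates, half)]
    wa_mk = sum(weight for weight, hit in zip(wa, inside) if hit)
    wc_mk = sum(inside)
    return wa_mk * wc_mk


M = [67.54, 75.0, 86.85, 84.45, 81.87, 86.55, 79.04]  # PP1, scenario (i), Table 1
SD = [2.17, 7.65, 0.48, 1.23, 13.95, 16.58, 7.09]      # assumed standard deviations

print(total_weight(80.0, M, SD))   # all seven TI overlap here: 100 x 7
print(total_weight(60.0, M, SD))   # only the TI of M1 covers this value
```

Where all I Temporary Intervals overlap, the accuracy weights sum to 100 and the total weight reaches its maximum of 100 × I.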
Third step: Admixture Indicative Interval
The WD delimits an area representing the cumulated sum
of weights along the GI. This area is split into three equal parts.
Assuming that the WD peak points to the most likely $m_k$ value,
the AII is the interval delimited by the two $m_k$ values which
correspond to the bounds of the central third (see Results for an
example).
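The three steps can be put together in a short Python sketch. It is not the authors' implementation: instead of stepping along the GI with an increment d, it integrates the piecewise constant WD exactly between consecutive TI bounds, which should give the same bounds up to the grid resolution. The function name is hypothetical and the ± values of Table 1 are assumed to be standard deviations.

```python
def admixture_indicative_interval(estimates, sds):
    """Bounds of the central third of the area under the weights' distribution."""
    count = len(estimates)
    inverse_cv = [m / sd for m, sd in zip(estimates, sds)]
    wa = [100.0 * value / sum(inverse_cv) for value in inverse_cv]
    half = [sum(abs(m - other) for other in estimates) / (count - 1)
            for m in estimates]
    tis = [(m - d, m + d) for m, d in zip(estimates, half)]

    # WD is constant between consecutive TI bounds: evaluate it at each midpoint.
    cuts = sorted({bound for ti in tis for bound in ti})
    pieces = []  # (start, end, total weight on the piece)
    for start, end in zip(cuts, cuts[1:]):
        middle = (start + end) / 2
        hits = [w for w, (low, high) in zip(wa, tis) if low <= middle <= high]
        pieces.append((start, end, sum(hits) * len(hits)))

    total_area = sum((end - start) * weight for start, end, weight in pieces)

    def position_at(fraction):
        """Value of m at which the cumulated area reaches fraction * total_area."""
        target, cumulated = fraction * total_area, 0.0
        for start, end, weight in pieces:
            area = (end - start) * weight
            if weight > 0 and cumulated + area >= target:
                return start + (target - cumulated) / weight
            cumulated += area
        return cuts[-1]

    return position_at(1 / 3), position_at(2 / 3)


M = [67.54, 75.0, 86.85, 84.45, 81.87, 86.55, 79.04]  # PP1, scenario (i), Table 1
SD = [2.17, 7.65, 0.48, 1.23, 13.95, 16.58, 7.09]      # assumed standard deviations
low, high = admixture_indicative_interval(M, SD)
print(round(low, 4), round(high, 4))
```

For PP1 in scenario (i) this sketch lands on the interval reported in the Results, AII = [81.0002, 84.9515], to within rounding.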
Datasets and statistical programs
Admixture contribution estimates allowing to build our
Admixture Indicative Interval have been obtained from
simulated and published actual data.
We firstly simulated 2 trihybrid admixture scenarios using
SIMCOAL2, an extension of SIMCOAL (Excoffier et al.
2000). Since all admixture estimation methods consider an
instantaneous event in their models, we decided to fit this
assumption in our simulated models. (i) A sole instantaneous
admixture between 3 parental populations, 25 generations
ago. (ii) Two separated instantaneous admixture events: a
unilateral gene flow, 25 generations ago, from the parental
population 3 (PP3) to the parental population 1 (PP1) and,
five generations later, an introgression from the parental
population 2 (PP2) in the admixed population resulting from
the first event. Simulation parameters are available in Online
Resource 1. Admixture contributions were estimated from
the frequencies of 30 SNPs (See Online Resource 2 and
Online Resource 3, for SNPs data resulting from scenario (i)
and (ii), respectively). Output files have been processed
using Arlequin v3.5 (Excoffier and Lischer 2010).
Secondly, we applied the Admixture Indicative Interval
method to calculate parental contributions to published
actual populations:
• A Comorian population (Gourjon et al. 2011): we built
AII for the same parental population combination (South-
East Africa Bantu, Saudi, and Indonesian populations)
by combining estimators’ values for each marker type:
classical serologic markers (seven blood groups) and
uniparental molecular markers (SNPs from mtDNA and
Y-Chromosome).
• Several admixed samples from Puerto-Rico: AII were
built from individual ancestry (IA) estimates of
admixed samples from six different regions (Via et al.
2011) and six different clinic recruitment locations
(Choudhry et al. 2006). These IA were calculated from
93 and 44 autosomal AIMs, respectively.
• Several admixed samples from Colombia (Rojas et al.
2010): we built AII by combining parental contributions
(calculated from 11 autosomal AIMs) to 15 admixed
(mestizo) populations.
• Several European populations resulting from the Neo-
lithic–Paleolithic admixture event (Belle et al. 2006): we
built AII for the Paleolithic parental contributions in all
European admixed populations separately, parental con-
tributions being estimated from multilocus STRs data.
To estimate parental contributions for simulated data-
sets, seven population admixture estimators (Mi) have been
used at most (Krieger et al. 1965; Roberts and Hiorns 1965;
Chakraborty 1985; Long 1991; Chakraborty et al. 1992;
Bertorelle and Excoffier 1998; Wang 2003), respectively
MK, MRH, MC, ML, MLC, MY, MW, implemented in five
well-tried software programs (ADMIX95, ADMIX, Admix
2.0, LEADMIX and Mistura). Input files for admixture
estimation software have been created using AdFiT v1.7
(Gourjon and Degioanni 2009). Since our goal was not to
assess efficiency and accuracy of a given estimator, we
anonymized them for simulated data and we provided
results without any order in estimators (M1 to M7, since one
estimator gave useless results).
Since admixture estimation methods are based on different
assumptions and admixture models, we wondered whether a
given method and a particular assumption would significantly
and systematically influence the AII calculation.
Given that no statistical test exists to quantify the influence
of a particular estimator, we recalculated new intervals by
removing one different estimator each time to assess this
putative influence and the AII robustness. This means that
for an AII computed from seven estimates, we recalculated
seven other AII, each without one different Mi.
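This leave-one-out check can be sketched in Python as below. The helper `aii` is a compact, hypothetical re-implementation of the three steps (exact integration of the piecewise constant WD rather than a grid with increment d), and the ± values of Table 1 are assumed to be standard deviations.

```python
def aii(estimates, sds):
    """Central third of the area under the weights' distribution (sketch)."""
    count = len(estimates)
    inverse_cv = [m / sd for m, sd in zip(estimates, sds)]
    wa = [100.0 * value / sum(inverse_cv) for value in inverse_cv]
    half = [sum(abs(m - o) for o in estimates) / (count - 1) for m in estimates]
    tis = [(m - d, m + d) for m, d in zip(estimates, half)]
    cuts = sorted({bound for ti in tis for bound in ti})
    pieces = []
    for start, end in zip(cuts, cuts[1:]):
        middle = (start + end) / 2
        hits = [w for w, (low, high) in zip(wa, tis) if low <= middle <= high]
        pieces.append((start, end, sum(hits) * len(hits)))
    total = sum((end - start) * weight for start, end, weight in pieces)

    def position_at(fraction):
        target, cumulated = fraction * total, 0.0
        for start, end, weight in pieces:
            area = (end - start) * weight
            if weight > 0 and cumulated + area >= target:
                return start + (target - cumulated) / weight
            cumulated += area
        return cuts[-1]

    return position_at(1 / 3), position_at(2 / 3)


def leave_one_out(estimates, sds):
    """Recompute the AII once per estimator, each time leaving that estimator out."""
    results = []
    for skipped in range(len(estimates)):
        kept_m = estimates[:skipped] + estimates[skipped + 1:]
        kept_sd = sds[:skipped] + sds[skipped + 1:]
        results.append(aii(kept_m, kept_sd))
    return results


M = [67.54, 75.0, 86.85, 84.45, 81.87, 86.55, 79.04]  # PP1, scenario (i), Table 1
SD = [2.17, 7.65, 0.48, 1.23, 13.95, 16.58, 7.09]      # assumed standard deviations
reduced = leave_one_out(M, SD)
for index, (low, high) in enumerate(reduced, start=1):
    print(f"without M{index}: [{low:.4f}, {high:.4f}]")
```

For PP1 in scenario (i), the interval obtained without M1 is close to the value given in the Discussion, AII = [82.9845, 85.7785].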
Results
Simulated data
Admixture rates for scenario (i) and (ii) are presented in
Table 1. As the M4 estimator did not give exploitable
results, it was excluded when building AII for scenario (ii).
We describe the construction of the AII for scenario (i)
below. A full example is presented in Online Resource
4.
First step
Starting from PP1, the $Wa_{M_i}$ is computed for each
estimator (Accuracy weight table in Online Resource 4). A
distance matrix is built, which is used to delimit the seven TIi.
Table 1 Admixture rates from simulated data
              M1           M2           M3           M4           M5            M6            M7
Scenario (i)
PP1           67.54±2.17   75±7.65      86.85±0.48   84.45±1.23   81.87±13.95   86.55±16.58   79.04±7.09
PP2           13.05±1.48   14.65±5.31   9.87±0.29    7.13±0.61    3.72±7.64     9.24±8.59     15.71±2.35
PP3           19.41±2.28   10.35±7.93   3.28±0.53    8.42±1.31    14.41±10.19   4.21±15.99    5.25±1.35
Scenario (ii)
PP1           3.50±2.02    7.18±5.11    2.05±0.27    0.01±0.01    4.26±4.66     1.17±9.29     6.71±0.17
PP2           59.44±3.29   63.12±9.50   61.82±0.48   99.99±0.01   55.9±11.87    63.58±16.30   73.86±0.41
PP3           37.06±2.97   29.70±8.48   36.13±0.36   0.00±0.01    39.83±14.75   35.25±15.49   19.43±0.50
Admixture rates in italic are not reliable values
The Global Interval is defined by the lowest and the highest
bounds among all TIi (mk line in Online Resource 4). Thus,
GI = [52.7867, 94.6250]. Given an incremental value
d = 0.0001, $m_k = 52.7867 + 0.0001k$, with $0 \le k \le 418383$,
$m_k$ varying from $m_0 = 52.7867$ to $m_{(U-L)/d} = 94.625$.
Second step
Considering that $WD(m_k) = W_{m_k}$ is a piecewise constant function,
$W_{m_k}$ can be calculated for the lower and upper limits of the different
pieces only, all $m_k$ having the same $W_{m_k}$ in a given piece.
Third step
The $WD(m_k)$ delimits an area that has been split into three thirds
(see Fig. 1). The AII is defined by the lower ($m_k = 81.0002$)
and upper ($m_k = 84.9515$) limits of the central area.
Admixture Indicative Intervals
The same procedure was followed for each parental population
in order to obtain three AII (admixture contributions of
parental populations 1, 2 and 3) for scenario (i) and then for
scenario (ii). Admixture Indicative Intervals are given in
Table 2.
Application on actual datasets
Table 3 presents the AII obtained from Bantu, Saudi and
Indonesian contributions to the Comorian admixed
population, according to different markers, the AII obtained
from Paleolithic contributions to the different European
admixed populations, and the AII obtained from African, European,
and Native American contributions to the Puerto-Rican and
Colombian admixed populations.
Figures 1 to 4 in Online Resource 5 are graphical rep-
resentations of AII, with Mi values provided for a com-
parative purpose and to highlight the estimates’ variability.
Influence of admixture estimation methods on the AII
calculation
The Admixture Indicative Intervals calculated without one
estimator each time are given in Online Resource 6.
Discussion
Efficiency of AII method
Current population admixture estimation methods provide
point estimates which may differ from one another. This
issue is clearly illustrated in the Gourjon et al. (2011) and
Belle et al. (2006) studies, in which different methods
applied to the same datasets led to significantly different
admixture contribution estimates. How can we identify which
estimate corresponds to the actual parental population’s
contribution when the dynamic parameters and the influence of
evolutionary forces are unknown? Currently, considering
the number of biological and societal factors influencing
Fig. 1 Weights distribution and Admixture Indicative Interval based on parental population 1 from scenario (i)
Table 2 Admixture Indicative Interval (simulated data)
                PP1                  PP2                  PP3
Scenario (i)    [81.0002, 84.9515]   [9.6071, 11.5613]    [7.6429, 10.3783]
Scenario (ii)   [4.2622, 5.9199]     [62.3189, 65.3810]   [32.4109, 35.6880]
such an event and its dynamics in human populations, no
method can reach this actual value. Even ABC and clustering
methods cannot include all these parameters.
In addition, these recent methods offer a synchronic
assessment of parental contributions instead of a diachronic
one (Gourjon and Degioanni 2012), leading to an incorrect
picture of the ancestral parental contributions.
To avoid misinterpreting a historical admixture
event, Gourjon et al. (2011) suggested using several
admixture estimation methods to obtain a genetic admixture
pattern instead of a single point estimate. Even if most of the
estimates converge to an interval that can be identified
intuitively, it is essential to use a simple and reliable
method to build this interval mathematically. Although the AII
does not necessarily correspond to the actual admixture
contribution, given that it depends on admixture estimation
methods, their various assumptions and their admixture
models, it is a well-suited representation of the estimates’
variability and convergence.
To test our new method, we simulated two different
scenarios. We used seven admixture estimators which take
various evolutionary forces into account and provide
slightly different admixture contribution estimates. In both
scenarios, the AII were located around the convergent values
and were only slightly affected by divergent values. For
example, in scenario (i), Mi for PP1 varied from 67.54 to
86.85 %, converging around 80–85 %. Despite the
divergent value M1 = 67.54 %, the AII for PP1 was located in
the same range (AII = [81.0002, 84.9515] with M1 and
AII = [82.9845, 85.7785] without M1). For the other parental
populations and scenario, the AII behaved in the same way.
When applied to actual populations, our method again
provided intervals that take into account all
admixture estimates as well as their respective accuracy,
while, as a general rule, divergent points did not significantly
affect them. The great quantity of results obtained from
different markers in the Gourjon et al. (2011) study was
synthesized by a few AII, which turn out to be more meaningful
than a large number of variable point estimates, which are hardly
usable to explain the Comorian
admixture event properly. For instance, the Indonesian
contribution from SNP mtDNA markers ranged from
M = 5.27 % to M = 16.71 %, while the AII gave an indicative
trend equal to [8.909, 10.8941], centered around convergent
M values with low dispersion (MB = 5.27 ± 1.42,
MY = 8.34 ± 0.94, and MRH = 8.62 ± 3.29), whereas MK,
ML, and MYL have a lower weight (greater dispersion and/
or divergent values). For the dihybrid admixture model, we
observed that, while Mi ranged around 60–80 %, the
divergent MY = 96.82 ± 1.36 did not greatly affect the AII
(AII = [74.699, 82.8553] and AII = [69.4578, 73.1593]
with and without MY, respectively). However, this divergent
value with a low standard deviation significantly increased
the length of the interval (see below). Consequently,
it may be worthwhile to examine carefully a divergent
estimate that shows a low dispersion when interpreting
results.
For the old admixture event (Belle et al. 2006), the
diversity in estimates among admixed European samples
and methods led the authors to discuss estimates one by
one, or to use approximate mean admixture contributions
which were determined intuitively and without
Table 3 Admixture Indicative Interval (actual populations)
Populations/markers      Admixture Indicative Intervals
Gourjon et al. (2011)    Bantu                Saudi                Indonesian
mtDNA                    [87.367, 88.5319]    [2.1836, 3.4632]     [8.909, 10.8941]
Y-Chromosome             [70.8624, 72.0991]   [24.2742, 24.8286]   [3.3333, 3.9503]
Blood Group TriH         [69.6184, 71.877]    [21.5113, 26.4661]   [4.7171, 8.2111]
Blood Group DiH          [74.699, 82.8553]    [24.1466, 30.3714]   N.A.
Belle et al. (2006)      Paleolithic                               Paleolithic
French                   [51.5437, 57.6022]   Orcadians            [54.9489, 59.5586]
N. Italians              [61.0382, 65.8128]   Russians             [61.0325, 65.4394]
Sardinians               [57.5839, 63.2999]   Adygei               [73.0127, 75.287]
Tuscans                  [66.1532, 70.1252]   Europeans            [60.6813, 64.6123]
Via et al. (2011)        European             African              Native American
Puerto Rico              [58.3351, 67.4745]   [17.3174, 21.3468]   [13.964, 16.1071]
Choudhry et al. (2006)   European             African              Native American
Puerto Rico Case         [63.5224, 66.0278]   [14.6934, 17.4734]   [19.8806, 20.4625]
Puerto Rico Control      [61.4674, 68.1159]   [16.3646, 20.0505]   [19.3013, 22.0276]
Rojas et al. (2010)      European             African              Native American
Colombia                 [10.2794, 20.5409]   [40.4494, 47.5594]   [45.6887, 53.3915]
mathematical support. Admixture Indicative Intervals were
consistent with the Belle et al. (2006) results and lay in the
range of convergent estimates (Online Resource 5, Figure
2). In the French, Orcadian, Russian, and European populations,
only one estimator (mW in Belle et al. 2006) gave a
value divergent from the other estimates (for French,
MW = 34.1 ± 6.4 while MRH = 59.7 ± 2.3 and
MLC = 55.2 ± 2.5; for Orcadians, MW = 37.3 ± 5.8
while MRH = 59.6 ± 3.2 and MLC = 60.3 ± 3.4). In both
cases, we observed that the AII was set in the range of convergent
values, MW mainly affecting the length of the AII and
shifting its bounds (for French, AII = [51.5437,
57.6022] with all estimators and AII = [55.0634, 58.638]
without MW; for Orcadians, AII = [54.9489, 59.5586] with
all estimators and AII = [58.1905, 59.8396] without MW).
Generally, the length of the AII depends on the number
of convergent values (and implicitly the length of the
Global Interval) and on the accuracy of these estimates.
The closer the estimates are to each other, the
shorter the AII. In Online Resource 5, Figure 1a,
we observed narrow AII for Y-Chromosome and mtDNA
markers, which provided clustered Mi. Conversely, wide
AII were constructed for the dihybrid event because of the
great spread of admixture contributions. We have to
highlight that, with a low standard deviation, a convergent
Mi reduces the length of the AII whereas a divergent Mi
increases it. Nevertheless, even if a divergent value shows
a low standard deviation, the double-weighting in the first
step prevents a significant increase in the length of the AII.
Finally, in all studied cases, we noticed that the sum
of the center values of the AII is around 100 %
($\sum AII = 102.62$ and $\sum AII = 102.99$ for scenario (i) and
(ii), respectively; $\sum AII_{(mtDNA)} = 100.67$, $\sum AII_{(Y\text{-}Chrom)} = 99.97$,
$\sum AII_{(BloodGroupTriH)} = 101.2$, and $\sum AII_{(BloodGroupDiH)} = 106.04$
for the Comorian sample), implying that the
parental indicative contributions obtained from the AII are
consistent with an actual admixture event.
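The sum of the AII center values is a simple calculation, sketched below for two of the reported cases, using the interval bounds of Tables 2 and 3. The function name `center_sum` is hypothetical.

```python
def center_sum(intervals):
    """Sum of the midpoints of a list of (lower, upper) intervals."""
    return sum((low + high) / 2 for low, high in intervals)


# AII bounds for PP1, PP2 and PP3 in scenario (ii), from Table 2.
scenario_ii = [(4.2622, 5.9199), (62.3189, 65.3810), (32.4109, 35.6880)]
# AII bounds for the Bantu, Saudi and Indonesian contributions (mtDNA), from Table 3.
mtdna = [(87.367, 88.5319), (2.1836, 3.4632), (8.909, 10.8941)]

print(round(center_sum(scenario_ii), 2))
print(round(center_sum(mtdna), 2))
```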
Methodological aspects to be discussed
We want to stress some other methodological points:
1. Influence of admixture estimation methods on AII:
because our interval is calculated from different
statistical methods, one may wonder whether a given
method, and inherently some of its assumptions and its
model, would have a significant influence on the AII.
Generally speaking, we observed (Online
Resource 6) that removing any estimator does not
substantially affect the AII value. Obviously, removing
a highly divergent estimator will decrease the Global
Interval length and consequently may decrease the AII
length. Most of the time, however, the interval remains
in the same range of values (Online Resource 6). In the
same way, the bounds of the Weight Distribution’s
central third will move in the opposite direction from
the divergent value (see above for French and
Orcadians in the Belle et al. (2006) study). As previously
reported, we highly recommend removing a very
divergent estimator from the analysis of the admixture event.
This raises the important question of whether a given method
could influence the AII. A detailed analysis of Online
Resource 6 reveals that no estimator
systematically biases the AII. For the Comoros study,
we observed that, one after another, several estimators can
slightly modify the interval. For the blood group data and
the Bantu contribution, MRH is the one that has the greatest
influence (AII = [69.6184, 71.877] with all estimators and
AII = [71.3276, 72.2989] without MRH), while for the Saudi
and Indonesian contributions, it is MB (respectively,
AII = [21.5113, 26.4661] with all estimators and
AII = [26.0189, 30.0331] without MB; AII = [4.7171,
8.2111] with all estimators and AII = [2.6038, 3.4868]
without MB). For mtDNA, MY seems to be the most
‘‘influential’’. For Neolithic/Paleolithic contributions (Belle
et al. 2006), the most influential estimator is
MW for French, Orcadians, Russians, and Sardinians (see
previous paragraph for French and Orcadians). For Russians
and Sardinians, AII = [61.0325, 65.4394] with all
estimators and AII = [64.2899, 65.9296] without MW;
AII = [57.5389, 63.2999] with all estimators and
AII = [61.0883, 64.7815] without MW, respectively.
One can conclude that no specific estimator has a systematic
influence on the AII calculation and consequently that no model
assumption or specific admixture estimation method
has a particular influence on the AII.
2. Admixture estimation methods may give negative or
null values. In this case, the coefficient of variation
cannot be used. A negative estimate could be interpreted
as a null or negligible contribution from the
given parental population, or as the result of an
improper choice of parental populations, loci, parameters,
etc. Because of the risk of an erroneous
interpretation, a conservative approach is to
remove this kind of value from any analysis. By
extension, since this parental population does not seem
to have contributed to the admixture event, one should
remove this parental population from any estimation.
To be able to include such an Mi in our AII, this Mi has
to be set to a value close to 0 (0.00001 for instance).
However, considering that it displays a high CV and
consequently has a negligible weight compared to the
other Mi, we recommend excluding it. Along the same
lines, a null standard deviation means that the Mi has
not been properly calculated and it has to be removed
as well.
3. Although the AII has to be built from at least 3
estimators, we suggest using at least four Mi. Indeed,
the AII is of special interest when a trend in admixture
is not obvious because of numerous results. Additionally,
the impact of a divergent or inaccurate Mi decreases as
the number of Mi increases.
4. Markers: because different types of markers (molecular,
uniparental, classical) follow different admixture
patterns, they lead to different admixture
contribution estimates. Building an AII from Mi
obtained from heterogeneous types of markers is therefore
meaningless.
5. Increment d: in the given example, we advised using an
incremental value of 0.0001 in the first step. A lower value
(e.g. 0.00001 or less) significantly extends the computation
time without increasing the accuracy of the AII: in scenario (i),
for parental population 1, the AII is the same for d = 0.00001
and d = 0.0001 (AII = [81.0002, 84.9515]). A slightly higher
value (i.e. 0.001 < d < 0.1) shortens the computation time
without lowering the accuracy of the AII (AII = [80.99, 84.94]
for d = 0.001). A d > 0.01 gives a highly biased AII
(AII = [73.5, 78.3] for d = 0.1), so we advise users against
using such incremental values.
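The screening rules of points 2 and 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the AII.v2 implementation: the `Estimate` class, the function names and the numerical values are hypothetical, and the coefficient of variation is taken as SD divided by Mi.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    name: str   # estimator label (hypothetical)
    m: float    # admixture contribution Mi, in percent
    sd: float   # standard deviation reported for Mi


def coefficient_of_variation(est: Estimate) -> float:
    """CV = SD / M; only meaningful for a strictly positive M."""
    return est.sd / est.m


def usable_estimates(estimates, minimum=3):
    """Drop negative or null Mi and Mi with a null SD, then check the count."""
    kept = [e for e in estimates if e.m > 0 and e.sd > 0]
    if len(kept) < minimum:
        raise ValueError(
            f"only {len(kept)} usable estimates; at least {minimum} are required"
        )
    return kept


# Made-up values for illustration only; they come from no dataset.
sample = [
    Estimate("A", 62.0, 3.1),
    Estimate("B", -1.5, 2.0),   # negative estimate: excluded
    Estimate("C", 58.0, 0.0),   # null SD: excluded
    Estimate("D", 65.0, 6.5),
    Estimate("E", 60.0, 1.2),
]

kept = usable_estimates(sample)
for e in kept:
    print(e.name, round(coefficient_of_variation(e), 3))
```

Passing `minimum=4` would enforce the suggested number of estimators rather than the strict minimum of three.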
Other applications on actual populations
Our method was developed to combine several estimators into
a single trend in admixture. This implies that an AII can also
be calculated by combining several samples from a region or a
country. An arithmetic mean can be biased by a single divergent
estimate because it ignores the convergence and accuracy of the
estimators; the AII, in contrast, provides an overview of
admixture contributions in the whole region or country while
taking the properties of the estimators into account. It thus
offers an interesting alternative for displaying a global
admixture trend. For example, Via et al. (2011) calculated the
genomic ancestry of six regional samples across the island of
Puerto Rico and provided a mean ancestry for the whole island
(Native American: M = 15.2 %; African: M = 21.2 %; European:
M = 63.7 %). The corresponding Admixture Indicative Intervals
(Native American: AII = [13.964, 16.1071]; African:
AII = [17.3174, 21.3468]; European: AII = [58.3351, 67.4745];
Online Resource 5, figure 3) include these means, and the AII
for the African contribution is not significantly affected by
the higher African contribution in the East sample (M = 31.8 %,
while all other estimates ranged from 16 to 21 %).
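As a quick check of the Puerto Rican figures quoted above, the following Python sketch locates a value relative to an interval. The intervals, the island-wide means and the East sample value are those reported in the text; the `position` helper and the variable names are hypothetical and are not part of AII.v2.

```python
# AII reported in the text for the Via et al. (2011) samples, in percent.
AII = {
    "Native American": (13.964, 16.1071),
    "African": (17.3174, 21.3468),
    "European": (58.3351, 67.4745),
}

# Island-wide mean ancestries reported by Via et al. (2011), in percent.
MEAN = {"Native American": 15.2, "African": 21.2, "European": 63.7}

# African contribution estimated in the East sample, in percent.
EAST_AFRICAN = 31.8


def position(value, interval):
    """Return 'below', 'within' or 'above' for a value and a closed interval."""
    low, high = interval
    if value < low:
        return "below"
    if value > high:
        return "above"
    return "within"


for ancestry, mean in MEAN.items():
    print(ancestry, "mean is", position(mean, AII[ancestry]), "the AII")

print("East sample African estimate is", position(EAST_AFRICAN, AII["African"]), "the AII")
```

The same helper can be used to place a regional estimate relative to a reference interval built for a whole country.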
Similarly, combining estimates from several samples or
regions is useful for comparing a given subsample or region
with the whole sample or country. Rojas et al. (2010)
estimated admixture rates in 15 mestizo (admixed)
subpopulations across Colombia. Colombia's population is
highly admixed, and 86 % of individuals identified themselves
as being of mixed ancestry (Rojas et al. 2010). Properly
comparing each region with the full Colombian population
helps to better understand colonization dynamics. For this
purpose, the AII provides a reference interval for assessing
whether a given region followed the general trend in admixture
or a different introgression history (Online Resource 5,
figure 4). For example, the AII for autosomal estimates shows
that, even though the Cundinamarca population follows the
general trend for the Native American and European
contributions, it exhibits a lower African ancestry than the
whole Colombian population.
Moreover, the AII makes it possible to compare studies and to
determine whether the trends in a given parental contribution
overlap. Several points can be raised from the AII obtained
from the studies of Via et al. (2011) and Choudhry et al. (2006)
on Puerto Rican populations (Online Resource 5, figure 3). The
estimates of Via and colleagues lead to wider AII, implying
more heterogeneous parental contributions among Puerto Rican
populations than in the study of Choudhry and colleagues: for
example, the Native American AII is [13.964, 16.1071] in the
former and [19.8806, 20.4625] in the latter. Interestingly, the
intervals for the Native American and African contributions do
not overlap between the two studies, whereas the European AII
calculated from the estimates of Choudhry and colleagues is
almost entirely included in the one from Via and colleagues. We
observed that the means for case controls in Choudhry and
colleagues are affected by divergent and inaccurate estimates
for the Carolina population, which result from a very limited
sample size (mean = 65.32 % instead of 60.2 % when removing
this population). Conversely, the AII are not biased by these
values (European AII = [64.8833, 68.5528] instead of
[61.4674, 68.1159]), and the mean without the biased values is
closer to the AII centers (66.72 % instead of 64.79 %).
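The comparisons made in this paragraph reduce to three simple operations on intervals: width, center and overlap. The Python sketch below applies them to the intervals quoted in the text; the helper functions and variable names are hypothetical, and the two European intervals are simply labelled in the order in which the text quotes them.

```python
def width(interval):
    """Length of a closed interval (low, high)."""
    low, high = interval
    return high - low


def center(interval):
    """Midpoint of a closed interval (low, high)."""
    low, high = interval
    return (low + high) / 2


def overlaps(a, b):
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Native American AII quoted in the text, in percent.
via_native = (13.964, 16.1071)          # Via et al. (2011)
choudhry_native = (19.8806, 20.4625)    # Choudhry et al. (2006)

# European AII quoted in the text, in the order given.
european_first = (64.8833, 68.5528)
european_second = (61.4674, 68.1159)

print("widths:", round(width(via_native), 4), round(width(choudhry_native), 4))
print("Native American intervals overlap:", overlaps(via_native, choudhry_native))
print("centers:", round(center(european_first), 2), round(center(european_second), 2))
```

The two centers obtained this way match the values of 66.72 % and 64.79 % given in the text.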
Conclusion
Admixture Indicative Intervals proved able to properly
summarize heterogeneous admixture rates by taking into account
both the convergence between estimators and their respective
accuracy. This double weighting leads to a robust method that
offers a trend in admixture instead of isolated point estimates.
In addition, the AII offers an interesting alternative to the
arithmetic mean by providing a reference value for comparative
purposes. Finally, although the AII method applies most
obviously to population admixture studies, the AII can be
calculated from parental contributions as well as from
individual ancestry estimates (as in Via et al. 2011).
The AII method has been implemented in the AII.v2 software
(freely downloadable from http://AdFiT.free.fr).
Acknowledgments We would like to thank Sandrine Cabut for her
help with the statistical analyses.
References
Belle E, Landry PA, Barbujani G (2006) Origins and evolution of the
Europeans’ genome: evidence from multiple microsatellite loci.
Proc Biol Sci 273(1594):1595–1602
Bertorelle G, Excoffier L (1998) Inferring admixture proportions from
molecular data. Mol Biol Evol 15(10):1298–1311
Bertoni B, Budowle B, Sans M, Barton S, Chakraborty R (2003)
Admixture in Hispanics: distribution of ancestral population
contributions in the Continental United States. Hum Biol
75(1):1–11
Cavalli-Sforza LL, Menozzi P, Piazza A (1994) The history and
geography of human genes. Princeton University Press,
Princeton
Chakraborty R (1975) Estimation of race admixture—a new method.
Am J Phys Anthropol 42(3):507–511
Chakraborty R (1985) Gene identity in racial hybrids and estimation
of admixture rates. In: Ahuja YR, Neel JV (eds) Genetic
differentiation in human and other animal populations. Indian
Anthropological Association, Delhi, pp 171–180
Chakraborty R, Kamboh MI, Nwankwo M, Ferrell RE (1992)
Caucasian genes in American blacks: new data. Am J Hum
Genet 50(1):145–155
Chikhi L, Bruford MW, Beaumont MA (2001) Estimation of
admixture proportions: a likelihood-based approach using Mar-
kov chain Monte Carlo. Genetics 158(3):1347–1362
Choisy M, Franck P, Cornuet J (2004) Estimating admixture
proportions with microsatellites: comparison of methods based
on simulated data. Mol Ecol 13(4):955–968
Choudhry S, Burchard E, Borrell L, Tang H, Gomez I, Naqvi M,
Nazario S, Torres A, Casal J, Martinez-Cruzado J, Ziv E, Avila
P, Rodriguez-Cintron W, Risch N (2006) Ancestry–environment
interactions and asthma risk among Puerto Ricans. Am J Respir
Crit Care Med 174(10):1088–1093
Degioanni A, Gourjon G (2010) Le melange dans les populations
humaines: modeles et methodes d’estimation. Anthropologie
48(1):41–56
Durand E, Jay F, Gaggiotti OE, Francois O (2009) Spatial inference
of admixture proportions and secondary contact zones. Mol Biol
Evol 26(9):1963–1973
Edgar H (2009) Biohistorical approaches to “race” in the United
States: Biological distances among African Americans, Euro-
pean Americans, and their ancestors. Am J Phys Anthropol
139(1):58–67
Elston RC (1971) The estimation of admixture in racial hybrids. Ann
Hum Genet 35(1):9–17
Erdei E, Sheng H, Maestas E, Mackey A, White KA, Li L, Dong Y,
Taylor J, Berwick M, Morse DE (2011) Self-reported ethnicity
and genetic ancestry in relation to oral cancer and pre-cancer in
Puerto Rico. PLoS One 6(8):e23950
Excoffier L, Lischer HE (2010) Arlequin suite ver 3.5: a new series of
programs to perform population genetics analyses under Linux
and Windows. Mol Ecol Res 10(3):564–567
Excoffier L, Novembre J, Schneider S (2000) SIMCOAL: a general
coalescent program for the simulation of molecular data in
interconnected populations with arbitrary demography. J Hered
91(6):506–509
Excoffier L, Estoup A, Cornuet JM (2005) Bayesian analysis of an
admixture model with mutations and arbitrarily linked markers.
Genetics 169(3):1727–1738
Falush D, Stephens M, Pritchard JK (2003) Inference of population
structure using multilocus genotype data: linked loci and
correlated allele frequencies. Genetics 164(4):1567–1587
Fejerman L, Carnese F, Goicoechea A, Avena S, Dejean C, Ward R
(2005) African ancestry of the population of Buenos Aires. Am J
Phys Anthropol 128(1):164–170
Francois O, Currat M, Ray N, Han E, Excoffier L, Novembre J (2010)
Principal component analysis under population genetic models
of range expansion and admixture. Mol Biol Evol
27(6):1257–1268
Giovannini A, Zanghirati G, Beaumont MA, Chikhi L, Barbujani G
(2009) A novel parallel approach to the likelihood-based
estimation of admixture in population genetics. Bioinformatics
25(11):1440–1441
Glass B, Li C (1953) The dynamics of racial intermixture: an analysis
based on the American Negro. Am J Hum Genet 5(1):1–20
Gourjon G (2010) L’estimation du melange genetique dans les
populations humaines. PhD Thesis. University of Aix-Marseille
Gourjon G, Degioanni A (2009) AdFiT v1.7 (Admixture File Tool):
input files creating tool for genetic admixture estimation
software. Bull et Mem de la Societe d’Anthropologie de Paris
21(3–4):223–229
Gourjon G, Degioanni A (2012) Ancestry and admixture rates: the
matter of evolution in admixture study. J Biol Res (formerly
Bollettino della Societa Italiana di Biologia Sperimentale)
85(1):147–150
Gourjon G, Boetsch G, Degioanni A (2011) Gender and population
history: sex bias revealed by studying genetic admixture of
Ngazidja population (Comoro Archipelago). Am J Phys Anthro-
pol 144(4):653–660
Helgason A, Sigurðardottir S, Nicholson J, Sykes B, Hill EW,
Bradley DG, Bosnes V, Gulcher JR, Ward R, Stefansson K
(2000) Estimating Scandinavian and Gaelic ancestry in the male
settlers of Iceland. Am J Hum Genet 67(3):697–717
Krieger H, Morton NE, Mi MP, Azevedo E, Freire-Maia A, Yasuda N
(1965) Racial admixture in north-eastern Brazil. Ann Hum Genet
29(2):113–125
Lathrop G (1982) Evolutionary trees and admixture: phylogenetic
inference when some populations are hybridized. Ann Hum
Genet 46(Pt3):245–255
Long J (1991) The genetic structure of admixed populations. Genetics
127(2):417–428
McEvoy B, Brady C, Moore LT, Bradley DG (2006) The scale and
nature of Viking settlement in Ireland from Y-chromosome
admixture analysis. Eur J Hum Genet 14(12):1288–1294
Morera B, Barrantes R, Marin-Rojas R (2003) Gene admixture in the
Costa Rican population. Ann Hum Genet 67(1):71–80
Pritchard J, Stephens M, Donnelly P (2000) Inference of population
structure using multilocus genotype data. Genetics
155(2):945–959
Roberts DF, Hiorns RW (1962) The dynamics of racial intermixture.
Am J Hum Genet 14:261–277
Roberts DF, Hiorns RW (1965) Methods of analysis of the genetic
composition of a hybrid population. Hum Biol 37:38–43
Rojas W, Parra M, Campo O, Caro M, Lopera J, Arias W, Duque C,
Naranjo A, García J, Vergara C, Lopera J, Hernandez E,
Valencia A, Caicedo Y, Cuartas M, Gutierrez J, Lopez S, Ruiz-
Linares A, Bedoya G (2010) Genetic make up and structure of
Colombian populations by means of uniparental and biparental
DNA markers. Am J Phys Anthropol 143(1):13–20
Sans M (2000) Admixture studies in Latin America: from the 20th to
the 21st century. Hum Biol 72(1):155–177
Silva MC, Zuccherato LW, Soares-Souza GB, Vieira M, Cabrera L,
Herrera P, Balqui J, Romero C, Jahuira H, Gilman RH, Martins
ML, Tarazona-Santos E (2010) Development of two multiplex
mini-sequencing panels of ancestry informative SNPs for studies
in Latin Americans: an application to populations of the State of
Minas Gerais (Brazil). Genet Mol Res 9(4):2069–2085
Sousa V, Fritz M, Beaumont MA, Chikhi L (2009) Approximate
Bayesian computation without summary statistics: the case of
admixture. Genetics 181(4):1507–1519
Tang H, Peng J, Wang P, Risch N (2005) Estimation of individual
admixture: analytical and study design considerations. Genet
Epidemiol 28(4):289–301
Via M, Gignoux CR, Roth LA, Fejerman L, Galanter J, Choudhry S,
Toro-Labrador G, Viera-Vera J, Oleksyk TK, Beckman K, Ziv
E, Risch N, Burchard EG, Martínez-Cruzado JC (2011) History
shaped the geographic distribution of genomic admixture on the
island of Puerto Rico. PLoS One 6(1):e16513
Wang J (2003) Maximum-likelihood estimation of admixture pro-
portions from genetic data. Genetics 164(2):747–765
Wang J (2006) A coalescent-based estimator of admixture from DNA
sequences. Genetics 173(3):1679–1692