Genome-Wide Characterization of Arabian Peninsula Populations: Shedding Light on the History of a Fundamental Bridge between Continents

Veronica Fernandes,*,1,2 Nicolas Brucato,3 Joana C. Ferreira,1,2,4 Nicole Pedro,1,2 Bruno Cavadas,1,2 François-Xavier Ricaut,3 Farida Alshamali,5 and Luisa Pereira1,2,6

1 i3S – Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Porto, Portugal
2 IPATIMUP – Instituto de Patologia e Imunologia Molecular da Universidade do Porto, Porto, Portugal
3 Laboratoire Evolution & Diversité Biologique (EDB UMR 5174), Université de Toulouse Midi-Pyrénées, CNRS, IRD, UPS, 118 route de Narbonne, Bat 4R1, 31062 Toulouse cedex 9, France
4 Instituto de Ciências Biomédicas Abel Salazar (ICBAS), Universidade do Porto, Porto, Portugal
5 Department of Forensic Sciences and Criminology, Dubai Police General Headquarters, Dubai, United Arab Emirates
6 Faculdade de Medicina da Universidade do Porto, Porto, Portugal

*Corresponding author: E-mail: vfernandes@ipatimup.pt.

Associate editor: Connie Mulligan

© The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com

Mol. Biol. Evol. 36(3):575–586 doi:10.1093/molbev/msz005 Advance Access publication January 21, 2019

Abstract

The Arabian Peninsula (AP) was an important crossroads between Africa, Asia, and Europe, being the cradle of the structure defining these main human population groups, and a continuing path for their admixture. The screening of 741,000 variants in 420 Arabians and 80 Iranians allowed us to quantify the dominant sub-Saharan African admixture in the west of the peninsula, whereas South Asian and Levantine/European influence was stronger in the east, leading to a rift between western and eastern sides of the Peninsula. Dating of the admixture events indicated that Indian Ocean slave trade and Islamization periods were important moments in the genetic makeup of the region. The western–eastern axis was also observable in terms of positive selection of diversity conferring lactose tolerance, with the West AP developing local adaptation and the East AP acquiring the derived allele selected in European populations and existing in South Asia. African selected malaria resistance through the DARC gene was enriched in all Arabian genomes, especially in the western part. Clear European influences associated with skin and eye color were equally frequent across the Peninsula.

Key words: Arabian Peninsula, genome-wide characterization, admixture, selection.

Introduction

The Arabian Peninsula (AP) holds answers to long-standing questions on the out-of-Africa (OOA) migration and the beginning of the continental genetic diversity structure of the modern human species. Two main OOA models occur in the literature (Quintana-Murci et al. 1999; Lahr and Foley 2005; Macaulay et al. 2005): one states that the migration took place through the terrestrial Levantine link between Africa and Southwest Asia; the other claims that a small group of East Africans crossed the Red Sea, reaching south AP and following eastward. Archaeologists and geneticists are trying to discover evidence of the role played by the AP in this and other major events of human evolution. Genetic research in the AP is hampered by the impossibility of conducting ancient DNA studies, which have contributed to breakthroughs in other regions of the world (Fu et al. 2016; Skoglund et al. 2017). In the absence of ancient biological human samples, genetics is restricted to studying extant humans from the AP and neighboring populations. Genomic evidence strongly supports a single successful dispersal OOA around 50–70 thousand years ago (ka) (Soares et al. 2012; Malaspinas et al. 2016; Mallick et al. 2016), although the route followed remains under debate.

Complete genomes are still limited at a population level, and in the AP context they are available for only three subpopulations from Qatar (104 samples [Rodriguez-Flores et al. 2016]). Meanwhile, genome-wide chips containing thousands or millions of single nucleotide polymorphisms (SNPs) are becoming a useful tool in evaluating global human diversity, in elucidating admixture events, and in mapping selection along the genome (Li et al. 2008; Rosenberg et al. 2010). However, they are not as secure in dating events that took place in prehistoric periods (Fernandes et al. 2015). The molecular dating is based on linkage disequilibrium (LD) decay, estimating the time since admixture events by the decrease in haplotype size due to recombination. The time-window is limited to around 200 generations (4 ka), after which recombination destroys reliably detectable haplotypes (Hellenthal et al. 2014). Early methods, such as ROLLOFF and ALDER (Moorjani et al. 2011; Loh et al. 2013), identified only a mean value of admixture age, merging the signals of multiple migration events. Improvements by Gravel et al. (2013) and Hellenthal et al. (2014) have begun to disentangle the multiple events by organizing blocks in a distribution of sizes, with bigger blocks meaning younger admixture events, whereas smaller blocks indicate older admixture events. AP individuals screened for genome-wide chips (10 Yemeni, 15 Yemeni Jews, 20 Saudi, and 14 Emirati) (Li et al. 2008; Hellenthal et al. 2014) have so far allowed the identification of the following admixtures and corresponding ages: 6–25% sub-Saharan African input in the Arabian pool 8–37 generations ago (Fernandes et al. 2015); and 12.9% sub-Saharan African input by 1530 CE (Common Era) in Yemeni, 22.3% by 1278 CE in Saudi, and 22.8% by 746 CE and 4% by 1754 CE in Emirati (Hellenthal et al. 2014). Other genome-wide data sets were characterized for 168 Qatari (Hunter-Zinck et al. 2010), revealing 3 clear clusters consistent with Arabian origin, eastern or Persian origin, and African admixture; and for 90 Yemeni (Vyas et al. 2017), which reinforced the evidence that Levantine and southern Arabian populations bear similar genetic relationships to both African and non-African populations.
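The relationship between block size and admixture age described above can be sketched numerically. The snippet below is a minimal illustration, not the procedure implemented in ROLLOFF, ALDER, or GLOBETROTTER: it assumes the simplest single-pulse model, in which ancestry tract lengths are exponentially distributed with mean 100/g cM after g generations, and a generation time of 20 years (the value implied by equating 200 generations with 4 ka). The function names and the reference year are hypothetical.

```python
GENERATION_YEARS = 20  # assumption: 200 generations ~ 4 ka, as implied in the text


def mean_tract_length_cm(generations):
    """Expected ancestry tract length (cM) under a single-pulse model.

    Assumes one crossover per Morgan per generation, so tract lengths are
    exponential with mean 1/g Morgans = 100/g cM.
    """
    if generations <= 0:
        raise ValueError("generations must be positive")
    return 100.0 / generations


def generations_from_mean_tract(mean_cm):
    """Invert the model: generations since admixture from the mean tract length."""
    if mean_cm <= 0:
        raise ValueError("mean tract length must be positive")
    return 100.0 / mean_cm


def generations_to_calendar_year(generations, reference_year=2019,
                                 generation_years=GENERATION_YEARS):
    """Convert generations before the reference year into a calendar year."""
    return reference_year - generations * generation_years
```

Under these assumptions, an event 200 generations old leaves tracts averaging only 0.5 cM, which is why older events become hard to detect with chip data, whereas events 8–37 generations old leave tracts of several cM.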

The admixture ages estimated from nuclear DNA are considerably younger than the ones estimated from mitochondrial DNA (mtDNA). The mtDNA block is never broken up by recombination and, in theory, there is no time-window limit for age estimates of mtDNA most recent common ancestors. We and other authors were able to identify admixture events between the Levant and Arabia, most likely via the Gulf corridor, in the Late Glacial and, to a lesser extent, the immediate postglacial/Neolithic (Cerny et al. 2011; Al-Abri et al. 2012; Fernandes et al. 2015; Vyas et al. 2016). Lineages from the time of the OOA, affiliated in L African haplogroups, are still elusive in the AP. However, their direct descendants N1, N2, and X had a relict distribution that suggested an ancient ancestry within the AP, which most likely spread from the Gulf Oasis region toward the Levant and Europe at 55–24 ka (Fernandes et al. 2012). MtDNA pointed out that the main “back-to-Africa” migrations occurred in the Late Glacial period for introductions into East Africa, while the Neolithic was more significant for migrations toward North Africa (Musilova et al. 2011; Hodgson et al. 2014; Fernandes et al. 2015).

In the eighth millennium, a remarkable maritime trade system was developed between Arabia, Africa, Levant, and India (Cerny et al. 2009; Fernandes et al. 2015). It reached its peak in the last 4 ka with the appearance of several early kingdoms, which dominated the Indian Ocean trading network. It was so important that it left a genetic imprint in the African Swahili Corridor (Brucato et al. 2018) and in several populations in Pakistan and India (Laso-Jadart et al. 2017). Across the Red Sea, the maritime traffic was dominant in the Egyptian Pre-Dynastic period, and it was intensified throughout the Arab slave trade, established from 2.5 ka to very recent times (Kivisild et al. 2004). The Arab slave trade transported 2,400,000 Africans, from Nubia to Zanzibar, to the Levant, the AP, and even India and China. This trade was mainly focused on female slaves (with a 2:1 female to male ratio), who became domestic servants, entertainers, and/or concubines (Segal 2002; Lovejoy 2011; Fernandes et al. 2015). The trade also played an important role in the spread of Islam after AD 630, outward from Mecca, and the subsequent rapid expansion of the Arab Empire toward both the Atlantic and Pacific Oceans (Hogarth 1904).

Selection screenings along the genome, based on chip data sets, are revealing that admixture events also allow descendant populations to acquire genetic adaptations that parental populations evolved in the original habitat (Triska et al. 2015; Patin et al. 2017). As malaria has been such a major selection driver in the African context (Roche et al. 2017), East African populations of Arabs and Nubians (mainly of Eurasian descent) have an African-enriched background for the DARC gene region (Triska et al. 2015), a pattern shared with the Makranis of Pakistan (Laso-Jadart et al. 2017). This acquisition of adaptation by admixture was also detected along the Bantu expansion in southern Africa (Patin et al. 2017), with the western Bantu wave acquiring the HLA selection from Pygmies and the eastern Bantu receiving the input of the northern East African LCT-selected region. It remains to be elucidated to what extent this kind of event played a role in shaping AP genetic diversity.

In this work, we performed a systematic genome-wide characterization of the AP by screening 741,000 SNPs in around 420 individuals from Saudi Arabia, Yemen, Oman, and the United Arab Emirates (UAE), and another 80 from Iran. Having in mind the time-window this data set can shed light on, we focused on the last 4,000 years, highlighted in this careful high-resolution sampling of AP diversity. By applying admixture mapping and haplotype-based screening of selection, we provided insight into local and acquired adaptations of AP populations.

Results and Discussion

Population Structure in the AP

We tested the relationships of AP individuals with other populations (“restricted data set” including relevant neighboring populations; supplementary table S1, Supplementary Material online) using ADMIXTURE. This analysis identified six ancestral populations as the best fit for the diversity profiles of the analyzed populations (fig. 1A; other K plots and cross-validation graph are displayed in supplementary figs. S1 and S2, Supplementary Material online). AP populations presented a high proportion of an Arabian/North African component (in blue), especially in the west of the Peninsula: 65% in Saudi, 61% in Yemen and Bedouin, 36% in Oman, and 32% in the UAE. The second highest component in the AP was a Levant/Caucasian component (in red; reaching 72% in Druze): 28% in the UAE and Oman, 20% in Yemen, 18% in Bedouin, and 16% in Saudi Arabia. European background (in peach) was higher in Levantine populations (around 8–17%) than in Arabia (1.5–3.3%), whereas sub-Saharan African (light and dark green) ancestry was 11% in Yemen, the UAE, and Oman, and then between 4% and 7% in Saudi Arabian and Levantine populations, and vestigial (0–2%) in Samaritans and Druze. The Iranian population had a substantial pool from South Asia (28%; light pink), consistent with its Persian ancestry, and this South Asian component was similarly frequent in the UAE and Oman (23–26%), falling to below 10% frequency in all other Arabian and Levant populations.

Principal component analysis (PCA) revealed the same population structure (supplementary fig. S3, Supplementary Material online). The first two principal components separated the individuals along well-established geographical axes: PC1 explained most of the diversity (79.9%) and organized the populations in sub-Saharan African versus other, with admixed Arabian and Levantine individuals in between; PC2 (9.2%) separated Arabians and Levantines from Europeans and South Asians. Most AP populations displayed a cluster with Bedouin individuals and were very close to Levantine populations, indicating that these individuals were less admixed with non-Arabian populations. A variable fraction of AP individuals displayed signs of admixture with North and sub-Saharan African populations, spreading across the X axis. In Iran, Oman, and the UAE, some admixed individuals were more dispersed along the Y axis, concentrating between Levantine and South Asian populations.

FIG. 1. Population structure inferred by ADMIXTURE analysis (A), in which each individual is represented by a vertical (100%) stacked column of genetic component proportions shown in color for K = 6. Shared IBD fragments between pairs of individuals from the Arabian Peninsula, Levant, and Egypt + East Africa, filtering for IBD < 2 cM (B), IBD 2–5 cM (C), IBD 10–20 cM (D), and IBD > 20 cM (E). Populations are represented by a circle proportional to the sample size of the populations. The Levantine populations are represented in pink, AP in green, and Egyptian + East African in black. Shared IBD fragments are represented by a line, in pink to Levant and in green to AP. The maps were adapted via Wikimedia Commons.

Gene Flow in the Region and Dating of Admixture

The algorithms based on haplotype sharing were applied to the “extended data set” (supplementary table S2, Supplementary Material online). Refined identity-by-descent (IBD) (Browning and Browning 2013) (fig. 1B–E) was applied to AP, Levantine, Egyptian, and East African populations and allowed us to identify shared IBD segments. Since the length of shared haplotypes decreases through time, segment lengths of <2 cM and 2–5 cM are indicative of old common ancestry. These results attested to the genetic continuity in the region, given the high share of haplotypes. Although not highly differentiated, the Levant seemed to share a lower amount of IBD with Kenya, Somalia, and Ethiopia when compared with what the AP shares with these populations. When filtering for shared IBD lengths of 10–20 cM and >20 cM (younger shared history), the southern sharing (AP and East Africa) was dominant.
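The link between segment length and age can be illustrated with a short sketch. It bins IBD segments into the four length classes used in fig. 1B–E and attaches a rough age using the textbook expectation that a segment inherited from a common ancestor g generations back has a mean length of 100/(2g) cM. That expectation is an assumption made here for illustration only, and the function names and input format are hypothetical. Lengths between 5 and 10 cM fall outside the classes listed in the text and are left unclassified.

```python
from collections import Counter


def ibd_length_class(length_cm):
    """Return the length class from fig. 1B-E, or None for lengths between 5 and 10 cM."""
    if length_cm < 0:
        raise ValueError("segment length must be non-negative")
    if length_cm < 2:
        return "<2 cM"
    if length_cm <= 5:
        return "2-5 cM"
    if length_cm < 10:
        return None  # not one of the classes reported in the text
    if length_cm <= 20:
        return "10-20 cM"
    return ">20 cM"


def approx_generations_to_ancestor(length_cm):
    """Rough age: a segment of length L cM suggests g ~ 100 / (2 * L) generations."""
    if length_cm <= 0:
        raise ValueError("segment length must be positive")
    return 100.0 / (2.0 * length_cm)


def count_shared_segments(segments):
    """Count segments per (population pair, length class).

    `segments` is an iterable of (pop_a, pop_b, length_cm) tuples; the pair is
    stored in sorted order so that (A, B) and (B, A) are counted together.
    """
    counts = Counter()
    for pop_a, pop_b, length_cm in segments:
        label = ibd_length_class(length_cm)
        if label is not None:
            pair = tuple(sorted((pop_a, pop_b)))
            counts[(pair, label)] += 1
    return counts
```

With this approximation, a 2 cM segment points to an ancestor roughly 25 generations back, whereas a 20 cM segment points to only two or three generations, which matches the reading of long segments as younger shared history.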

The estimated effective migration surfaces (EEMS) plot (fig. 2A) revealed a barrier between West and East AP populations. East Africa shared a migration surface with Saudi Arabia and Yemen populations, whereas another genetic corridor was evident between the Levant, Iran, and the East AP. These results were corroborated by Wright’s FST metric (supplementary fig. S4, Supplementary Material online), with Saudi Arabia being genetically closer to Yemen (0.002) and the UAE closer to Oman (0.001) than the two sides of the Peninsula were to each other (0.006). In terms of FST pairwise comparisons between AP and South Asian populations, the UAE displayed the lowest values (0.003 with Iran, 0.007 with Makrani, and 0.009 with Balochi), and Iran was closer to the Levant and the East AP (0.003 with Syria, Jordan, and Lebanon; 0.003 with the UAE; and 0.004 with Oman) than to South Asian populations (0.005 with Balochi, 0.006 with Pathan, and 0.032 with India). All AP populations showed a lower distance from the Levant (0.004–0.008) than from Europe (0.012–0.027), East Africa (0.084–0.110), and finally West Africa (0.095–0.123). The same pattern of west-east diversity in the Peninsula was observed for the f3 statistics (supplementary fig. S5 and supplementary table S3, Supplementary Material online), with detection of African and Levant/European admixture in all Arabian populations and admixture with South Asian ancestry limited to the East AP.
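Pairwise FST values such as those quoted above can be computed from allele frequencies. The sketch below uses Hudson's estimator in its ratio-of-averages form, a common choice for SNP chip data. The text states only that Wright's FST was used, so the estimator, the function name, and the input format are assumptions rather than the authors' pipeline.

```python
def hudson_fst(freqs_1, freqs_2, n_1, n_2):
    """Hudson's FST between two populations (ratio of averages over SNPs).

    freqs_1, freqs_2: per-SNP frequencies of the same allele in each population.
    n_1, n_2: number of sampled chromosomes (2 x diploid individuals) per population.
    """
    if len(freqs_1) != len(freqs_2) or not freqs_1:
        raise ValueError("need two equally long, non-empty frequency lists")
    if n_1 < 2 or n_2 < 2:
        raise ValueError("need at least two chromosomes per population")
    numerator = 0.0
    denominator = 0.0
    for p1, p2 in zip(freqs_1, freqs_2):
        # Squared frequency difference, corrected for finite sample size.
        numerator += ((p1 - p2) ** 2
                      - p1 * (1 - p1) / (n_1 - 1)
                      - p2 * (1 - p2) / (n_2 - 1))
        # Probability that two alleles drawn from different populations differ.
        denominator += p1 * (1 - p2) + p2 * (1 - p1)
    if denominator == 0:
        raise ValueError("no SNP is polymorphic across the two populations")
    return numerator / denominator
```

Summing numerators and denominators across SNPs before dividing keeps rare variants from dominating the estimate. Values close to zero, like the 0.001–0.006 reported within the Peninsula, indicate very similar allele frequencies.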

The clustered coancestry matrix from the fineSTRUCTURE and ChromoPainter analyses, which explore patterns of haplotype sharing among groups, classified AP and Iranian individuals in 12 clusters, attesting to their diverse genetic reservoir (supplementary fig. S6, Supplementary Material online). Proportions of ancestries reflected the ones obtained in ADMIXTURE: mostly, AP and Iranian individuals were clustered with Levant/North African populations (92% from Yemen, 92% Iran, 92% the UAE, 84% Saudi Arabia, and 80% Oman) and, to a lesser extent, with South Asians (4–8% of UAE and Oman individuals) and sub-Saharan Africans, including with Comoro, Madagascar, and Swahili populations (12% Oman, 12% Saudi Arabia, and 4% Iran).

FIG. 2. Estimated effective migration surfaces gradient map around the Arabian Peninsula (A). The color scale reveals low (light gray) and high (dark gray) genetic barriers between populations localized on a grid of 500 demes. The scale represents log values of the effective migration rates (m). Each deme (black dot) is proportional to the sample sizes of the populations. Map was generated with the R package worldmap. Dates of admixture events (B) in AP and Iranian populations, as estimated by GLOBETROTTER. Pie charts reflect the admixture proportions of ancestries listed in the legend. Open circles in the date of admixture refer to one event, whereas filled circles refer to multiple events, and lines represent the 95% confidence interval.

Overall, GLOBETROTTER results highlighted recent admixture events in the AP region (fig. 2B and supplementary table S4, Supplementary Material online). Between 400 and 1,000 years before present, admixture events took place with East African ancestry in all AP populations. The non-African background was similar to a European background in all populations except in the UAE and Iran, where a closeness to South Asian ancestry was identified. The sub-Saharan African proportion was twice as high in the West AP as in the East AP. In Saudi Arabia, the UAE, and Oman, multidate events were more probable, but the second event was even younger (67, 153, and 160 BP, respectively) and involved the same ancestries as the older event.

Local Ancestry Inference

The RFMix software package (Maples et al. 2013) was used to analyze the genomes of admixed Arabian and Iranian populations and to detect regions showing excess of a given parental population (sub-Saharan African or South Asian, in detriment of the main European/Levant background). Results revealed statistically significant enrichment of a particular ancestry in certain regions of the genome, which are associated with selection signals due to adaptive traits such as malaria resistance, skin pigmentation, and lactose tolerance (fig. 3A and B, and supplementary figs. S7 and S8, Supplementary Material online, show results for all tested populations; supplementary table S5, Supplementary Material online).

The region of the genome with the highest proportion of sub-Saharan African ancestry (>4 SD) observed in Arabian and Iranian populations was located on chromosome 1, in a gene-rich region containing the DARC gene and several olfactory receptor genes, such as OR10J1, OR10J3, and OR10J5. This sub-Saharan African input was higher in Yemen (a mean of 0.61; supplementary fig. S9, Supplementary Material online) than in other regions of the Peninsula (0.49 Oman, 0.35 the UAE, 0.28 Saudi Arabia, and 0.16 in Iran). Another region highly enriched in all populations for sub-Saharan African ancestry was located on chromosome 9, containing genes coding for forkhead box D4 (FOXD4L4 and FOXD4L2) and spermatogenesis-associated proteins (FAM75A5 and FAM75A7).

The RFMix analysis indicated a South Asian-enriched region on chromosome 2 containing the RAB3GAP1, ZRANB3, LCT, MCM6, and CXCR4 genes. They have been previously associated with lactase persistence, the ability to digest milk into adulthood, which is an adaptive trait mostly found in European populations and a classic example of genetic adaptation in humans (Tishkoff et al. 2007; Campbell and Tishkoff 2008; Triska et al. 2015). This region reached highly significant values (>4 SD) in Saudi and Yemen and was also significant in the UAE and Iran (3 SD < X < 4 SD). Another South Asian-enriched region (3 SD < X < 4 SD) was detected on chromosome 5 in Yemen, Saudi Arabia, and the UAE, containing the SLC45A2 gene, associated with skin, eye, and hair pigmentation. Nevertheless, the most significant South Asian-enriched signal detected in all populations (>4 SD) was located on chromosome 6, containing many genes related to the immune system (HLA-B, HLA-DR, HLA-DQ, MICA, MICB, and TNF).
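The enrichment thresholds used above (>4 SD, and 3 SD < X < 4 SD) amount to standardizing the local ancestry proportion of each genomic window against the genome-wide distribution. A minimal sketch of that step follows. The window representation, function name, and labels are hypothetical, and parsing of RFMix output is deliberately left out.

```python
from statistics import mean, pstdev


def flag_enriched_windows(ancestry_by_window):
    """Classify windows by how far their ancestry proportion exceeds the genome-wide mean.

    ancestry_by_window: dict mapping a window label to the mean proportion of one
    parental ancestry (for example sub-Saharan African) across individuals.
    Returns a dict mapping each window to (z_score, label), where label is
    ">4 SD", "3-4 SD", or None.
    """
    values = list(ancestry_by_window.values())
    if len(values) < 2:
        raise ValueError("need at least two windows")
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("ancestry proportion is constant across windows")
    result = {}
    for window, proportion in ancestry_by_window.items():
        z = (proportion - mu) / sd
        if z > 4:
            label = ">4 SD"
        elif z > 3:
            label = "3-4 SD"
        else:
            label = None
        result[window] = (z, label)
    return result
```

Only positive deviations are flagged, because the question here is which regions carry an excess of a given parental ancestry.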

Selection in the AP and Iran

When checking positive selection signals in the AP and Iran by using integrated haplotype score (iHS) and cross-population extended haplotype homozygosity (XP-EHH) measures, several genomic regions displayed significant values (fig. 3C–F and supplementary figs. S10–S13 and supplementary tables S6–S9, Supplementary Material online).
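Both statistics are log ratios of integrated haplotype homozygosity (iHH) that are standardized before thresholds such as >3 SD or >4 SD are applied. The sketch below shows only that final arithmetic, assuming the iHH values have already been computed by a dedicated tool. The function names are hypothetical, and the allele-frequency-binned standardization normally applied to iHS is omitted for brevity.

```python
import math
from statistics import mean, pstdev


def unstandardized_xpehh(ihh_pop, ihh_ref):
    """ln(iHH_pop / iHH_ref) for one SNP; positive values point to selection in `pop`."""
    if ihh_pop <= 0 or ihh_ref <= 0:
        raise ValueError("iHH values must be positive")
    return math.log(ihh_pop / ihh_ref)


def unstandardized_ihs(ihh_ancestral, ihh_derived):
    """ln(iHH_ancestral / iHH_derived) for one SNP within a single population.

    Sign conventions differ between tools; this follows the ancestral/derived ratio.
    """
    if ihh_ancestral <= 0 or ihh_derived <= 0:
        raise ValueError("iHH values must be positive")
    return math.log(ihh_ancestral / ihh_derived)


def standardize(scores):
    """Convert raw scores to z-scores using the genome-wide mean and SD."""
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    mu = mean(scores)
    sd = pstdev(scores)
    if sd == 0:
        raise ValueError("scores are constant")
    return [(s - mu) / sd for s in scores]
```

In an XP-EHH-European comparison, for example, `ihh_pop` would come from an AP population and `ihh_ref` from the European reference, and the standardized score would then be compared against the 3 SD and 4 SD thresholds.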

In particular, several signals previously associated with selection in non-African populations were detected. East AP and Iranian populations exhibited a strong signal of positive selection (>4 SD) in the XP-EHH-European comparison on the chromosome 2 region containing the R3HDM1 and LCT genes, associated in Europeans with food conversion efficiency and metabolism of lactose (Zhao et al. 2015). The SLC24A5 gene on chromosome 15, which is strongly associated with skin pigmentation variation among European and South Asian populations (Basu Mallick et al. 2013), presented signs of positive selection in all populations in iHS (>4 SD for Yemen, Saudi Arabia, and the UAE; >3 SD for Oman and Iran) and XP-EHH-South Asian (>3 SD for all populations) analyses. Another skin pigmentation-related gene, SLC45A2, which is located on chromosome 5 and is known to be strongly associated with this phenotype in European populations (Candille et al. 2012; Hernandez-Pacheco et al. 2017), displayed signs of positive selection in all populations for the XP-EHH-European comparison (>4 SD in Saudi Arabia and >3 SD in remaining populations). The same occurred for the chromosome 4 region containing the TLR1, TLR6, and TLR10 genes (role in pathogen recognition and activation of innate immunity [Laayouni et al. 2014]; >4 SD in Yemen and Saudi Arabia and >3 SD in remaining populations) and for the chromosome 12 region containing the ATXN2, ACAD10, and ALDH2 genes (associated with hypertension [Russo et al. 2018]; >4 SD in Saudi Arabia and >3 SD in remaining populations).

In the comparison with African populations (XP-EHH-African), significant values (>3 SD) were observed for the chromosome 1 region containing the DARC gene in Saudi Arabia and Iran. Amongst the strongest signals of positive selection was the chromosome 21 region containing the C21orf34, MIR99A, and MIRLET7C genes, detected in all populations (>4 SD) both in iHS and XP-EHH-African tests. This signal was previously identified as amongst the strongest selection signals in non-Africans (Pickrell et al. 2009), and recently C21orf34 was identified as MIR99AHG, a miRNA gene involved in the regulation of hematopoiesis and oncogenes for the development of myeloid leukemia (Emmrich et al. 2014). In addition, a signal on the chromosome 10 region containing the SPAG6 gene was observed in all populations in iHS (except Yemen) and XP-EHH-African (stronger in Oman and the UAE) analyses. This gene is essential for the structural integrity of the central microtubule of sperm stability and flagellar motility and has been previously identified as being under selection before the separation between European and Asian populations (Racimo 2016). The significant signal in iHS score on the chromosome 7 region containing several genes (CYP3A5, ZSCAN25, ATP5J2-PTCD1, CPSF4, ATP5J2, ZNF789, and ZKSCAN5) was also detected in all populations at the XP-EHH-African comparison. The CYP3A5 gene is involved in cholesterol metabolism and steroid biosynthesis in the liver and has been identified to be under positive selection in African and non-African populations (Wagh et al. 2012). The CYP3A5*1/*3 polymorphism was also described as being under selective pressure for water retention and risk for salt-sensitive hypertension in equatorial populations (Thompson et al. 2004).

FIG. 3. Circos plots highlighting significantly (>4 SD) enriched local admixed and positively selected segments in Arabian Peninsula and Iranian populations. (A) Enriched sub-Saharan African ancestry. (B) Enriched South Asian ancestry. (C) iHS selection metrics. (D) XP-EHH versus European ancestry. (E) XP-EHH versus sub-Saharan African ancestry. (F) XP-EHH versus South Asian ancestry.

The iHS metrics identified a positively selected region on chromosome 6 in all populations, which contains candidate genes involved in the human immunodeficiency virus (HIV) stages of the viral lifecycle: the ZNRD1-AS1 and TRIM26 genes. ZNRD1-AS1 plays a role in the regulation of cell proliferation and is essential for the completion of the HIV lifecycle. Its rs3132130 SNP was described to confer host resistance to HIV-1 acquisition by causing a loss of nuclear factor binding and decreasing the ZNRD1 promoter activity (An et al. 2014). Although with unknown function, the tripartite motif (TRIM) proteins may contribute to innate immunity against retroviruses, affecting both early and late stages of the retroviral life cycle (Ozato et al. 2008; Uchil et al. 2008). Another significantly iHS-selected region in all populations except the Saudi Arabians was identified on the chromosome 22 region containing the TTC38, PKDREJ, GTSE1, and PPARA genes. The PPARA gene encodes a transcription factor that regulates lipid and glucose metabolism during food deficiency, and a recent study (Tekola-Ayele et al. 2015) found a specific signal of recent positive selection of this gene in Ethiopia, suggesting a metabolic adaptation to high-altitude hypoxia.

Conclusions

The detailed genome-wide characterization of Arabian and Iranian populations allowed us to establish, for the first time, the palimpsest of admixture and selection along the genomes of the inhabitants of the main bridge between African, Asian, and European population groups. Admixture events comprised up to 20% of sub-Saharan African ancestry in the West and 20% of South Asian ancestry in the East of the AP, enriching the local predominant Arabian-, North African-, Levantine-, and Caucasian-like components. The sub-Saharan African input decreased toward the East, whereas the South Asian input decreased in the direction of the West of the Peninsula, testifying to the continuous genetic exchange with Africa and Asia, as was evident in the IBD sharing analysis. Of course, there are specific subgroups that depart from these general characteristics, such as the Qatari Bedouins, who represent a “coldspot” of admixture in the AP (Rodriguez-Flores et al. 2016), whereas the Hadrami in Yemen are a “hotspot” of admixture (Vyas et al. 2017). EEMS analysis identified higher exchanges of genes between Africa and the West AP, as well as between the East AP and the Levant/South Asia, than between the West and the East AP. This result agrees with the previously documented degree of isolation between West and East AP populations, identified when studying uniparental markers (Alshamali et al. 2009; Fernandes et al. 2012, 2015). We previously found relict maternal lineages (60 ka) of the original OOA successful settlement by modern humans in the AP, and most likely they spread to South Asia and the Levant/Europe from the Gulf corridor (Fernandes et al. 2012). However, current LD decay–based dates only indicated recent admixture events in the eastern side of the Peninsula. Similarly, younger ages of admixture in the West AP reflected the Islamization of North Africa as the most probable main contributor of this region. The exchanges across the Red Sea detected for the autosomal data matched the Arab slave trade and maritime dominance.

Recent studies have provided evidence that admixed populations acquired positively selected genomic regions from their parental populations or, in other words, that the distribution of admixed blocks across the genomes of descendants was not even (Triska et al. 2015; Laso-Jadart et al. 2017; Patin et al. 2017). The acquired selected regions can be inferred from the concomitant identification of the same genomic regions in the selection metrics and in the local ancestry enrichment analysis. AP populations displayed a jigsaw puzzle of African-, European-, and South Asian-selected genes linked to response to infectious diseases, skin pigmentation, lactase persistence, and food conversion.

The strongest African-related enrichment signal was for the DARC gene, which codes for the Duffy antigen, a membrane-bound chemokine receptor used by Plasmodium vivax to infect red blood cells. Plasmodium vivax causes a chronic form of malaria and is the most widespread type of malaria outside Africa (McManus et al. 2017). The derived C allele in the promoter, which causes erythroid-specific suppression of gene expression, has been described as conferring resistance against Plasmodium vivax malaria, and thus it is almost fixed in African populations (Triska et al. 2015; Laso-Jadart et al. 2017; McManus et al. 2017). The AP is known to have had a diverse malaria epidemiology in recent times, from a likely absence of transmission in Kuwait to an intense parasite exposure, similar to conditions in many parts of Africa, among residents of Saudi Arabia and Yemen (Snow et al. 2013). The enrichment signal detected here in the genomes of Arabians supported the higher malaria pressure in the West AP. The enrichment was strong enough to allow detection of selection in the XP-EHH metrics in Saudi Arabia and Iran.

The most significant European/South Asian-related enrichments in AP populations were linked to the lactose tolerance-specific RAB3GAP1/LCT/MCM6 region on chromosome 2, skin pigmentation associated with the SLC24A5 gene, and the HLA region on chromosome 6. The LCT gene -13915*G allele

Admixture and Selection in Arabian Peninsula. doi:10.1093/molbev/msz005 MBE


(rs41380347) was described as being present at high frequency in the AP and the Levant and estimated to have originated 4 ka (±2 ka) in the AP (Bayoumi et al. 2016), possibly as a result of the domestication of the Arabian camel at 6 ka (Ranciaro et al. 2014). By contrast, the European -13910*T selected allele (rs4988235) is also the one most commonly observed in South Asia (Gallego Romero et al. 2012). We could not verify frequencies of the rs41380347 SNP, which was not part of the chip genotyped here, but we checked the linked rs4988243 SNP. This last SNP confirmed the higher frequencies in the West versus the East AP (43.2% in Yemen, 46.1% in Saudi Arabia, 17.9% in Oman, 19.1% in the UAE, and 17.4% in Iran; notice that, although rs41380347 and rs4988243 are linked, this last SNP has overall frequencies 20% higher than the first). The genotyped European rs4988235 SNP presented higher frequencies of the derived allele in the East than in the West AP (0.6%, 1.3%, 12.5%, 18.2%, and 10.3%, respectively). The haplotype backgrounds of the Arabian and European derived alleles are related and distant from the African one (Ranciaro et al. 2014), which explains the high non-African enrichment detected in the RFMix analysis in the West AP (the Arabian haplotypes) and the selection signal in the XP-EHH-European comparison in the East AP and Iran (the European haplotypes). This is an interesting case generated by the isolation between the two sides of the Peninsula, where the West developed local positive selection of a lactose-tolerance variant, whereas the East acquired the European/South Asian variant. Previously, this difference had been reported for ethnic groups within Oman (Bayoumi et al. 2016), where frequencies of the -13910*T and -13915*G alleles diverged between Arabs of northern and southern Oman, who migrated from Yemen (0–1% and 14–72%, respectively), and Omanis of Asian origin (16% and 0%, respectively). Here, we verified that this pattern extends to the entire Peninsula.

In relation to the genetic diversity linked to skin color, the European/South Asian-selected derived SNPs are equally frequent in both the West and East AP (rs1426654 in the SLC24A5 gene: 10.1% in Yemen, 13.3% in Saudi Arabia, 13.2% in Oman, 13.5% in the UAE, and 11.8% in Iran; rs183671 in the SLC45A2 gene: 35.4%, 29.4%, 22.7%, 29.0%, and 40.1%, respectively; rs1667394 in the OCA2/HERC2 gene: 38.1%, 28.8%, 30.7%, 33.5%, and 34.9%, respectively).

The palimpsest of complex interactions between cross and local selective pressures and demographic factors identified here in the AP attests to the rich genetic dynamics taking place in bridging zones of the globe. In the near future, whole genomes of Arabians across the entire peninsula (with more precise geographical information than we could present here) and more sensitive selection measures will add information on the role played by this geographic region in the adaptation of modern humans to non-African habitats. Improvements to dating techniques applicable to complete genomes, such as the multiple sequentially Markovian coalescent approach, will also help answer the question of whether the AP was only a passageway to more distant regions or one of the oldest inhabited regions OOA. Complete genomes may indeed shed light on the period lost in the mtDNA pool due to the absence of the direct OOA descendants. The first steps have already been taken, and a few Arabian complete genomes are becoming available (John et al. 2015; Mallick et al. 2016; Rodriguez-Flores et al. 2016).

Materials and Methods

Population Samples, Genome-Wide Genotyping, and Published Data

DNA samples analyzed in this study were collected from 420 individuals of the AP (Saudi Arabia, Yemen, Oman, and the UAE) and from 80 individuals of Iran (further information provided in supplementary table S1, Supplementary Material online). Individuals were residing in Dubai at the time of sampling (between 1991 and 2000), representing a widespread collection of birth origins from Saudi Arabia, Yemen, Oman, the UAE, and Iran. This study obtained ethical approval from the Ethics Committee of the University of Porto, Portugal (17/CEUP/2012). The individuals were genotyped with the Illumina Human Omni Express Bead Chip (OmniExpress), containing 741,000 SNPs. Quality checks were applied using PLINK (Purcell et al. 2007; Chang et al. 2015), and 22 individuals were excluded for lacking more than 10% of genotypes. A preliminary PCA revealed that some UAE and Oman individuals (five individuals from each population) had considerable sub-Saharan African ancestry, an indicator that they were most probably recent immigrants from Africa (supplementary fig. S14, Supplementary Material online). Hence, these individuals were removed from the analyses, leading to a final set of 468 individuals. Markers with genotyping call rates lower than 95% were removed; consequently, a total of 627,435 autosomal SNPs passed the quality control check. The genotypes from the individuals analyzed in this study can be accessed from the EGA repository (European Genome-Phenome Archive, accession number EGAS00001003335). In order to perform a fine-scale population structure analysis of our complete data, the 468 individuals were merged with relevant available data sets (supplementary table S1, Supplementary Material online). As several population genetic analyses assume independent markers, SNPs were pruned for pairwise LD in PLINK (www.cog-genomics.org/plink/1.9/; Purcell et al. 2007; Chang et al. 2015) by removing any SNP that had an r² > 0.4 with another SNP within a 50-SNP sliding window with a step of 20 SNPs. After pruning, the “restricted data set” included 176,309 autosomal SNPs and 1,782 individuals (supplementary table S1, Supplementary Material online), and this data set was used in the analyses based on allele frequencies (PCA, ADMIXTURE, FST, and f3 statistic). Another data set was created by randomly selecting a maximum of 25 individuals from each population, but extending the number and the geographical distribution of populations; this data set, consisting of 171,728 SNPs and 2,056 individuals, was designated as the “extended data set” (supplementary table S2, Supplementary Material online). The homogenization of the number of samples per group avoided statistical biases due to overrepresentation of some populations and made it feasible to run computationally demanding algorithms such as EEMS,


fineSTRUCTURE, Chromopainter, GLOBETROTTER, and IBD. All genotypes from the unpruned “extended data set” were phased with SHAPEIT v2.r790 (Delaneau et al. 2011), using the 1000 Genomes phased data (Auton et al. 2015) as a reference panel and the HapMap phase 2 genetic map (Frazer et al. 2007). We did not merge our data with relevant data sets from Qatar and Yemen (Hunter-Zinck et al. 2010; Vyas et al. 2017), as these were screened with Affymetrix chips bearing a low number of overlapping SNPs. The Qatari complete genomes (Rodriguez-Flores et al. 2016) were specifically selected and therefore do not constitute a random population sample comparable with our populations.
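The windowed LD pruning described above (r² > 0.4, 50-SNP window, step of 20 SNPs) can be illustrated with a minimal sketch. It is not PLINK's implementation: the function names are hypothetical, genotypes are assumed to be coded as 0/1/2 allele counts with no missing data, and the later SNP of a correlated pair is always dropped, whereas PLINK decides which SNP to remove using allele frequencies.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 counts)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # monomorphic marker: no measurable LD
    return cov * cov / (var_x * var_y)


def prune_ld(genotypes, window=50, step=20, threshold=0.4):
    """Greedy sliding-window pruning; returns the indices of retained SNPs.

    genotypes: list of per-SNP genotype vectors, ordered along a chromosome.
    Assumption (simplification): within each window, the later SNP of any
    pair with r^2 above the threshold is removed.
    """
    keep = [True] * len(genotypes)
    start = 0
    while start < len(genotypes):
        end = min(start + window, len(genotypes))
        for i in range(start, end):
            if not keep[i]:
                continue
            for j in range(i + 1, end):
                if keep[j] and r_squared(genotypes[i], genotypes[j]) > threshold:
                    keep[j] = False
        start += step
    return [i for i, kept in enumerate(keep) if kept]


# Toy example: SNP 1 duplicates SNP 0, SNP 2 is only weakly correlated.
snps = [
    [0, 1, 2, 1, 0, 2],
    [0, 1, 2, 1, 0, 2],
    [1, 1, 0, 2, 1, 1],
]
retained = prune_ld(snps)
```

In this toy input, the duplicated marker is removed and the weakly correlated one (r² = 0.125) is kept.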

Population Structure, Differentiation, and Inbreeding

PCA, which infers worldwide axes of human genetic variation from the allele frequencies of various populations, was carried out on the pruned “restricted data set” using the smartpca tool included in the EIGENSOFT package (Patterson et al. 2006; Price et al. 2006). We ran ADMIXTURE (Alexander et al. 2009) to decompose the genetic ancestries of the pruned “restricted data set” using maximum likelihood for K = 2 to K = 12 components, with the optimal K (K = 6) estimated through cross-validation of the logistic regression. Wright's FST metric was calculated using VCFtools (Danecek et al. 2011). The THREEPOP program from TreeMix was used to calculate f3 statistics (Reich et al. 2009; Patterson et al. 2012; Pickrell and Pritchard 2012), which test between alternative scenarios of admixture. To analyze and visualize spatial population structure from georeferenced genetic samples, and to define genetic barriers and corridors, the EEMS method (Petkova et al. 2016) was used. Geographic coordinates and a genetic dissimilarity matrix between populations were used to set up a map defining a grid of 500 demes, and 3,000,000 Markov chain Monte Carlo (MCMC) iterations were run before checking for convergence. Plots were generated in R following the EEMS v1 manual.
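To make the logic of the f3 test concrete, the following minimal sketch computes the uncorrected statistic f3(C; A, B) as the average over SNPs of (c − a)(c − b), where a, b, and c are allele frequencies in the two candidate sources and the target. It is an illustration only: the function name and the toy frequencies are hypothetical, and the THREEPOP program additionally handles finite sample sizes and provides standard errors, which are omitted here.

```python
def f3(target, source1, source2):
    """Uncorrected f3(target; source1, source2), averaged over SNPs.

    Each argument is a list of allele frequencies (between 0 and 1),
    one value per SNP, in the same SNP order.
    A clearly negative value is evidence that the target is admixed
    between populations related to the two sources.
    """
    terms = [(c - a) * (c - b) for c, a, b in zip(target, source1, source2)]
    return sum(terms) / len(terms)


# Toy frequencies (hypothetical): the "mixed" population sits midway
# between two differentiated populations at every SNP.
pop_a = [0.1, 0.9, 0.2]
pop_b = [0.9, 0.1, 0.8]
mixed = [0.5, 0.5, 0.5]

f3_mixed_as_target = f3(mixed, pop_a, pop_b)   # negative: consistent with admixture
f3_source_as_target = f3(pop_a, mixed, pop_b)  # positive: no admixture signal
```

Only the sign is interpreted in this sketch; deciding whether a negative value is significant requires the standard error, for example from a block jackknife over the genome.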

Inference and Dating of Population Admixture

Population structure of the phased “extended data set” was evaluated using the fineSTRUCTURE v2.0.7 package (Lawson et al. 2012). fineSTRUCTURE captures information on population structure provided by patterns of haplotype similarity, calculated with Chromopainter v2.0 (Lawson et al. 2012), and performs a model-based Bayesian clustering of genotypes. From the results, a coancestry heat map and a dendrogram were inferred to visualize the number of statistically defined clusters that describe the data. The clusters obtained by fineSTRUCTURE were then used to assign the populations as target or donor/surrogate when testing for admixture. Mutational rates and effective population size parameters were first estimated with an Expectation-Maximization algorithm, running Chromopainter v2 on all 22 autosomes for the entire data set with 10 iterations (Lawson et al. 2012). The weighted average of these parameters, according to the SNP coverage of each chromosome and the number of individuals, was then used to compute the chromosomal painting. Two Arabian groups were identified as target populations: the western group, represented by Yemen and Saudi Arabia, and the eastern group, represented by Oman and the UAE. We ran chromosomal painting for each group. Due to the possibility of common ancestry, we excluded the Levantine populations from the surrogate populations. The painted chromosomes obtained for each cluster were used in GLOBETROTTER v1.0 (Hellenthal et al. 2014) to estimate the ratios and dates of the potential admixture events in the region. Coancestry curves were estimated for each target population, standardizing against both a null (NULL 1) and a standard (NULL 0) individual, and consistency between the estimated parameters was checked. A total of 100 bootstrap resamplings were performed to estimate the probability values of the admixture events considering the “NULL” individual, by calculating the proportion of nonsensical inferred dates (<1 or >400 generations) produced by the model (NULL 1). The “best-guess” scenario given by GLOBETROTTER v1.0 (Hellenthal et al. 2014) was considered for each target population. Dates of admixture, given in generations, were converted to chronological time using a generation interval of 29 years (Langergraber et al. 2012).
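The two arithmetic steps at the end of this procedure, the bootstrap-based probability value and the conversion of generations into calendar time, can be sketched as follows. This is a hedged illustration rather than GLOBETROTTER code: the function names are hypothetical, and the reference year from which generations are counted back is an assumption that the text does not state.

```python
GENERATION_YEARS = 29  # generation interval used in the text (Langergraber et al. 2012)


def generations_to_year(generations, reference_year):
    """Calendar year of an event that occurred `generations` before `reference_year`.

    reference_year is a hypothetical parameter: the text does not say
    which year the dates were counted back from.
    """
    return reference_year - generations * GENERATION_YEARS


def nonsensical_fraction(bootstrap_dates, low=1, high=400):
    """Proportion of bootstrap date estimates outside [low, high] generations.

    Dates below 1 or above 400 generations are treated as nonsensical,
    as described in the text; the proportion serves as the probability value.
    """
    bad = sum(1 for date in bootstrap_dates if date < low or date > high)
    return bad / len(bootstrap_dates)


# Toy values (hypothetical): 10 generations before the year 2000,
# and four bootstrap replicates of which two are out of range.
example_year = generations_to_year(10, reference_year=2000)
example_p = nonsensical_fraction([0.5, 12, 30, 450])
```

With a 29-year interval, each additional generation moves the inferred date back by 29 years, so the choice of reference year shifts all dates by the same amount without changing their order.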

Haplotype sharing between pairs of individuals was estimated by the refined IBD algorithm of Beagle v4.0 (Browning and Browning 2007) and visualized with Cytoscape v3.2.1 (Shannon et al. 2003). All maps in the present study were generated using Global Mapper v15 (http://www.bluemarblegeo.com/products/global-mapper.php).

Local Ancestry Inference

RFMix (Maples et al. 2013), which employs an LD model between markers to infer the ancestry of each genomic segment from a mixture of putative ancestral haplotype panels, was run on the unpruned and phased data set from the four AP populations plus Iran. The phased data from the 1000 Genomes populations of Great Britain (representing European ancestry), Yoruba from Nigeria (sub-Saharan African ancestry), and Indian Telugu (South Asian ancestry) were used as parental populations. Information on ancestry was obtained for each locus along the chromosomes for every individual, and these values were averaged in each population. In order to identify regions with a significantly higher proportion of a given parental ancestry, we followed previously published studies (Bryc et al. 2015), considering only regions outside the range defined by the genome mean ± 4 SD. The karyograms of the local ancestry across the 22 chromosomes were generated in R using the ggbio package (Yin et al. 2012).
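The outlier rule stated above (regions outside the genome mean ± 4 SD) amounts to a simple threshold on the population-averaged ancestry proportions. A minimal sketch follows; the function name and the toy values are hypothetical, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption.

```python
from statistics import mean, pstdev


def ancestry_outliers(proportions, n_sd=4):
    """Indices of loci whose ancestry proportion lies outside mean ± n_sd * SD.

    proportions: population-averaged proportion of one parental ancestry
    at each locus along the genome.
    Assumption: the SD is the population standard deviation over all loci.
    """
    mu = mean(proportions)
    sd = pstdev(proportions)
    lower = mu - n_sd * sd
    upper = mu + n_sd * sd
    return [i for i, p in enumerate(proportions) if p < lower or p > upper]


# Toy genome (hypothetical): 99 loci at 10% of a given ancestry
# and one locus strongly enriched at 60%.
toy_proportions = [0.10] * 99 + [0.60]
enriched_loci = ancestry_outliers(toy_proportions)
```

Because a single extreme value among n loci cannot exceed roughly the square root of n − 1 standard deviations, a 4 SD cutoff only becomes informative when many loci are analyzed, as is the case genome-wide.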

Analysis of Selection

iHS (Voight et al. 2006) was estimated with the SelScan package (Szpiech and Hernandez 2014) in the phased data set from the four AP populations plus Iran. Scores were calculated for each SNP within the population as a whole and standardized within each of 100 bins of allele frequency using the norm tool. In order to identify loci potentially affected by positive selection, a percentile score was assigned based on the proportion of extreme iHS values in each segment. The SelScan tool (Szpiech and Hernandez 2014) was also used to estimate cross-population extended haplotype homozygosity (XP-EHH) in each AP population by comparing it with Great Britain, Yoruba from Nigeria, or Indian Telugu. Consistent with other tests for selection, only markers with MAF > 1% were used for the XP-EHH computation. The obtained XP-EHH values were normalized by subtracting the genome-wide mean XP-EHH and dividing by the standard deviation (Szpiech and Hernandez 2014). Genomic regions with percentile score exceeding 4 SD and 3 SD were reported.
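The XP-EHH normalization described above is a genome-wide z-score, after which values beyond a chosen number of standard deviations are reported. The sketch below illustrates this step only; it is not SelScan's norm tool, the function names are hypothetical, and treating both tails (absolute values) as extreme is an assumption.

```python
from statistics import mean, pstdev


def normalize(scores):
    """Subtract the genome-wide mean and divide by the standard deviation."""
    mu = mean(scores)
    sd = pstdev(scores)
    if sd == 0:
        raise ValueError("scores have zero variance and cannot be normalized")
    return [(s - mu) / sd for s in scores]


def flag_extreme(scores, threshold):
    """Indices of markers whose normalized score exceeds the threshold in SD units.

    Assumption: both tails are flagged by comparing absolute values.
    """
    return [i for i, z in enumerate(normalize(scores)) if abs(z) > threshold]


# Toy scan (hypothetical): 99 markers with raw score 0 and one at 10.
raw_scores = [0.0] * 99 + [10.0]
beyond_3sd = flag_extreme(raw_scores, threshold=3)
beyond_4sd = flag_extreme(raw_scores, threshold=4)
```

The same `normalize` function could be applied separately within allele-frequency bins to mimic the binned standardization used for iHS, although the binning itself is not shown here.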

Selected SNPs were mapped to genes within a 5 kb flanking region, and significantly selected regions were represented in Circos plots generated in R (R Development Core Team 2018) using the shinyCircos graphical interface (Yu et al. 2018).

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.

Acknowledgments

This work was financed by FEDER (Fundo Europeu de Desenvolvimento Regional) funds through COMPETE 2020 (Operacional Programme for Competitiveness and Internationalisation, POCI), Portugal 2020, and by Portuguese funds through FCT (Fundação para a Ciência e a Tecnologia)/Ministério da Ciência, Tecnologia e Inovação in the framework of the project “Biomedical anthropological study in Arabian Peninsula based on high throughput genomics” (POCI-01-0145-FEDER-016609). V.F. holds a postdoctoral grant from FCT (SFRH/BPD/114927/2016).

References

Al-Abri A, Podgorna E, Rose JI, Pereira L, Mulligan CJ, Silva NM, Bayoumi R, Soares P, Cerny V. 2012. Pleistocene–Holocene boundary in Southern Arabia from the perspective of human mtDNA variation. Am J Phys Anthropol. 149(2):291–298.

Alexander DH, Novembre J, Lange K. 2009. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19(9):1655–1664.

Alshamali F, Pereira L, Budowle B, Poloni ES, Currat M. 2009. Local population structure in Arabian Peninsula revealed by Y-STR diversity. Hum Hered. 68(1):45–54.

An P, Goedert JJ, Donfield S, Buchbinder S, Kirk GD, Detels R, Winkler CA. 2014. Regulatory variation in HIV-1 dependency factor ZNRD1 associates with host resistance to HIV-1 acquisition. J Infect Dis. 210(10):1539–1548.

Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, Abecasis GR. 2015. A global reference for human genetic variation. Nature 526(7571):68–74.

Basu Mallick C, Iliescu FM, Mols M, Hill S, Tamang R, Chaubey G, Goto R, Ho SYW, Gallego Romero I, Crivellaro F, et al. 2013. The light skin allele of SLC24A5 in South Asians and Europeans shares identity by descent. PLoS Genet. 9(11):e1003912.

Bayoumi R, De Fanti S, Sazzini M, Giuliani C, Quagliariello A, Bortolini E, Boattini A, Al-Habori M, Al-Zubairi AS, Rose JI, et al. 2016. Positive selection of lactase persistence among people of Southern Arabia. Am J Phys Anthropol. 161(4):676–684.

Browning BL, Browning SR. 2013. Improving the accuracy and efficiency of identity-by-descent detection in population data. Genetics 194(2):459–471.

Browning SR, Browning BL. 2007. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am J Hum Genet. 81(5):1084–1097.

Brucato N, Fernandes V, Mazieres S, Kusuma P, Cox MP, Ng'ang'a JW, Omar M, Simeone-Senelle MC, Frassati C, Alshamali F, et al. 2018. The Comoros show the earliest Austronesian gene flow into the Swahili corridor. Am J Hum Genet. 102(1):58–68.

Bryc K, Durand EY, Macpherson JM, Reich D, Mountain JL. 2015. The genetic ancestry of African Americans, Latinos, and European Americans across the United States. Am J Hum Genet. 96(1):37–53.

Campbell MC, Tishkoff SA. 2008. African genetic diversity: implications for human demographic history, modern human origins, and complex disease mapping. Annu Rev Genomics Hum Genet. 9:403–433.

Candille SI, Absher DM, Beleza S, Bauchet M, McEvoy B, Garrison NA, Li JZ, Myers RM, Barsh GS, Tang H, et al. 2012. Genome-wide association studies of quantitatively measured skin, hair, and eye pigmentation in four European populations. PLoS One 7(10):e48294.

Cerny V, Mulligan CJ, Fernandes V, Silva NM, Alshamali F, Non A, Harich N, Cherni L, El Gaaied AB, Al-Meeri A, et al. 2011. Internal diversification of mitochondrial haplogroup R0a reveals post-last glacial maximum demographic expansions in South Arabia. Mol Biol Evol. 28(1):71–78.

Cerny V, Pereira L, Kujanova M, Vasikova A, Hajek M, Morris M, Mulligan CJ. 2009. Out of Arabia: the settlement of island Soqotra as revealed by mitochondrial and Y chromosome genetic diversity. Am J Phys Anthropol. 138(4):439–447.

Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. 2015. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4:7.

Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, et al. 2011. The variant call format and VCFtools. Bioinformatics 27(15):2156–2158.

Delaneau O, Marchini J, Zagury JF. 2011. A linear complexity phasing method for thousands of genomes. Nat Methods. 9(2):179–181.

Emmrich S, Streltsov A, Schmidt F, Thangapandi VR, Reinhardt D, Klusmann JH. 2014. LincRNAs MONC and MIR100HG act as oncogenes in acute megakaryoblastic leukemia. Mol Cancer 13:171.

Fernandes V, Alshamali F, Alves M, Costa MD, Pereira JB, Silva NM, Cherni L, Harich N, Cerny V, Soares P, et al. 2012. The Arabian cradle: mitochondrial relicts of the first steps along the southern route out of Africa. Am J Hum Genet. 90(2):347–355.

Fernandes V, Triska P, Pereira JB, Alshamali F, Rito T, Machado A, Fajkosova Z, Cavadas B, Cerny V, Soares P, et al. 2015. Genetic stratigraphy of key demographic events in Arabia. PLoS One 10(3):e0118625.

Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, Belmont JW, Boudreau A, Hardenbol P, Leal SM, et al. 2007. A second generation human haplotype map of over 3.1 million SNPs. Nature 449(7164):851–861.

Fu Q, Posth C, Hajdinjak M, Petr M, Mallick S, Fernandes D, Furtwangler A, Haak W, Meyer M, Mittnik A, et al. 2016. The genetic history of Ice Age Europe. Nature 534(7606):200–205.

Gallego Romero I, Basu Mallick C, Liebert A, Crivellaro F, Chaubey G, Itan Y, Metspalu M, Eaaswarkhanth M, Pitchappan R, Villems R, et al. 2012. Herders of Indian and European cattle share their predominant allele for lactase persistence. Mol Biol Evol. 29(1):249–260.

Gravel S, Zakharia F, Moreno-Estrada A, Byrnes JK, Muzzio M, Rodriguez-Flores JL, Kenny EE, Gignoux CR, Maples BK, Guiblet W, et al. 2013. Reconstructing Native American migrations from whole-genome and whole-exome data. PLoS Genet. 9(12):e1004023.

Hellenthal G, Busby GBJ, Band G, Wilson JF, Capelli C, Falush D, Myers S. 2014. A genetic atlas of human admixture history. Science 343(6172):747–751.

Hernandez-Pacheco N, Flores C, Alonso S, Eng C, Mak ACY, Hunstman S, Hu D, White MJ, Oh SS, Meade K, et al. 2017. Identification of a novel locus associated with skin colour in African-admixed populations. Sci Rep. 7:44548.

Hodgson JA, Mulligan CJ, Al-Meeri A, Raaum RL. 2014. Early back-to-Africa migration into the horn of Africa. PLoS Genet. 10(6):e1004393.


Hogarth DG. 1904. The penetration of Arabia: a record of the development of western knowledge concerning the Arabian Peninsula. New York: Cambridge Library Collection.

Hunter-Zinck H, Musharoff S, Salit J, Al-Ali KA, Chouchane L, Gohar A, Matthews R, Butler MW, Fuller J, Hackett NR, et al. 2010. Population genetic structure of the people of Qatar. Am J Hum Genet. 87(1):17–25.

John SE, Thareja G, Hebbar P, Behbehani K, Thanaraj TA, Alsmadi O. 2015. Kuwaiti population subgroup of nomadic Bedouin ancestry: whole genome sequence and analysis. Genom Data 3:116–127.

Kivisild T, Reidla M, Metspalu E, Rosa A, Brehm A, Pennarun E, Parik J, Geberhiwot T, Usanga E, Villems R. 2004. Ethiopian mitochondrial DNA heritage: tracking gene flow across and around the gate of tears. Am J Hum Genet. 75(5):752–770.

Laayouni H, Oosting M, Luisi P, Ioana M, Alonso S, Ricaño-Ponce I, Trynka G, Zhernakova A, Plantinga TS, Cheng S-C, et al. 2014. Convergent evolution in European and Rroma populations reveals pressure exerted by plague on Toll-like receptors. Proc Natl Acad Sci U S A. 111(7):2668–2673.

Lahr MM, Foley R. 2005. Multiple dispersals and modern human origins. Evol Anthropol. 3(2):48–60.

Langergraber KE, Prufer K, Rowney C, Boesch C, Crockford C, Fawcett K, Inoue E, Inoue-Muruyama M, Mitani JC, Muller MN, et al. 2012. Generation times in wild chimpanzees and gorillas suggest earlier divergence times in great ape and human evolution. Proc Natl Acad Sci U S A. 109(39):15716–15721.

Laso-Jadart R, Harmant C, Quach H, Zidane N, Tyler-Smith C, Mehdi Q, Ayub Q, Quintana-Murci L, Patin E. 2017. The genetic legacy of the Indian Ocean slave trade: recent admixture and post-admixture selection in the Makranis of Pakistan. Am J Hum Genet. 101(6):977–984.

Lawson DJ, Hellenthal G, Myers S, Falush D. 2012. Inference of population structure using dense haplotype data. PLoS Genet. 8(1):e1002453.

Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S, Cann HM, Barsh GS, Feldman M, Cavalli-Sforza LL, et al. 2008. Worldwide human relationships inferred from genome-wide patterns of variation. Science 319(5866):1100–1104.

Loh PR, Lipson M, Patterson N, Moorjani P, Pickrell JK, Reich D, Berger B. 2013. Inferring admixture histories of human populations using linkage disequilibrium. Genetics 193(4):1233–1254.

Lovejoy PE. 2011. Transformations in slavery: a history of slavery in Africa. Cambridge: Cambridge University Press.

Macaulay V, Hill C, Achilli A, Rengo C, Clarke D, Meehan W, Blackburn J, Semino O, Scozzari R, Cruciani F, et al. 2005. Single, rapid coastal settlement of Asia revealed by analysis of complete mitochondrial genomes. Science 308(5724):1034.

Malaspinas A-S, Westaway MC, Muller C, Sousa VC, Lao O, Alves I, Bergstrom A, Athanasiadis G, Cheng JY, Crawford JE, et al. 2016. A genomic history of Aboriginal Australia. Nature 538(7624):207.

Mallick S, Li H, Lipson M, Mathieson I, Gymrek M, Racimo F, Zhao M, Chennagiri N, Nordenfelt S, Tandon A, et al. 2016. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538(7624):201.

Maples BK, Gravel S, Kenny EE, Bustamante CD. 2013. RFMix: a discriminative modeling approach for rapid and robust local-ancestry inference. Am J Hum Genet. 93(2):278–288.

McManus KF, Taravella AM, Henn BM, Bustamante CD, Sikora M, Cornejo OE. 2017. Population genetic analysis of the DARC locus (Duffy) reveals adaptation from standing variation associated with malaria resistance in humans. PLoS Genet. 13(3):e1006560.

Moorjani P, Patterson N, Hirschhorn JN, Keinan A, Hao L, Atzmon G, Burns E, Ostrer H, Price AL, Reich D. 2011. The history of African gene flow into Southern Europeans, Levantines, and Jews. PLoS Genet. 7(4):e1001373.

Musilova E, Fernandes V, Silva NM, Soares P, Alshamali F, Harich N, Cherni L, Gaaied AB, Al-Meeri A, Pereira L, et al. 2011. Population history of the Red Sea: genetic exchanges between the Arabian Peninsula and East Africa signaled in the mitochondrial DNA HV1 haplogroup. Am J Phys Anthropol. 145:592–598.

Ozato K, Shin D-M, Chang T-H, Morse HC. 2008. TRIM family proteins and their emerging roles in innate immunity. Nat Rev Immunol. 8(11):849–860.

Patin E, Lopez M, Grollemund R, Verdu P, Harmant C, Quach H, Laval G, Perry GH, Barreiro LB, Froment A, et al. 2017. Dispersals and genetic adaptation of Bantu-speaking populations in Africa and North America. Science 356(6337):543–546.

Patterson N, Moorjani P, Luo Y, Mallick S, Rohland N, Zhan Y, Genschoreck T, Webster T, Reich D. 2012. Ancient admixture in human history. Genetics 192(3):1065.

Patterson N, Price AL, Reich D. 2006. Population structure and eigenanalysis. PLoS Genet. 2(12):e190.

Petkova D, Novembre J, Stephens M. 2016. Visualizing spatial population structure with estimated effective migration surfaces. Nat Genet. 48(1):94–100.

Pickrell JK, Coop G, Novembre J, Kudaravalli S, Li JZ, Absher D, Srinivasan BS, Barsh GS, Myers RM, Feldman MW, et al. 2009. Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 19(5):826–837.

Pickrell JK, Pritchard JK. 2012. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. 8(11):e1002967.

Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. 2006. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 38(8):904–909.

Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, et al. 2007. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 81(3):559–575.

Quintana-Murci L, Semino O, Bandelt H-J, Passarino G, McElreavey K, Santachiara-Benerecetti AS. 1999. Genetic evidence of an early exit of Homo sapiens sapiens from Africa through eastern Africa. Nat Genet. 23(4):437.

R Development Core Team. 2018. R: a language and environment for statistical computing. Vienna (Austria): R Foundation for Statistical Computing.

Racimo F. 2016. Testing for ancient selection using cross-population allele frequency differentiation. Genetics 202(2):733–750.

Ranciaro A, Campbell MC, Hirbo JB, Ko W-Y, Froment A, Anagnostou P, Kotze MJ, Ibrahim M, Nyambo T, Omar SA, et al. 2014. Genetic origins of lactase persistence and the spread of pastoralism in Africa. Am J Hum Genet. 94(4):496–510.

Reich D, Thangaraj K, Patterson N, Price AL, Singh L. 2009. Reconstructing Indian population history. Nature 461(7263):489–494.

Roche B, Rougeron V, Quintana-Murci L, Renaud F, Abbate JL, Prugnolle F. 2017. Might interspecific interactions between pathogens drive host evolution? The case of Plasmodium species and Duffy-negativity in human populations. Trends Parasitol. 33(1):21–29.

Rodriguez-Flores JL, Fakhro K, Agosto-Perez F, Ramstetter MD, Arbiza L, Vincent TL, Robay A, Malek JA, Suhre K, Chouchane L, et al. 2016. Indigenous Arabs are descendants of the earliest split from ancient Eurasian populations. Genome Res. 26(2):151–162.

Rosenberg NA, Huang L, Jewett EM, Szpiech ZA, Jankovic I, Boehnke M. 2010. Genome-wide association studies in diverse populations. Nat Rev Genet. 11(5):356–366.

Russo A, Di Gaetano C, Cugliari G, Matullo G. 2018. Advances in the genetics of hypertension: the effect of rare variants. Int J Mol Sci. 19(3):688.

Segal R. 2002. Islam's black slaves: the other black diaspora. London: Atlantic Books.

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T. 2003. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13(11):2498–2504.

Skoglund P, Thompson JC, Prendergast ME, Mittnik A, Sirak K, Hajdinjak M, Salie T, Rohland N, Mallick S, Peltzer A, et al. 2017. Reconstructing prehistoric African population structure. Cell 171(1):59–71.e21.


Snow RW, Amratia P, Zamani G, Mundia CW, Noor AM, Memish ZA, Al Zahrani MH, Al Jasari A, Fikri M, Atta H. 2013. The malaria transition on the Arabian Peninsula: progress toward a malaria-free region between 1960–2010. Adv Parasitol. 82:205–251.

Soares P, Alshamali F, Pereira JB, Fernandes V, Silva NM, Afonso C, Costa MD, Musilova E, Macaulay V, Richards MB, et al. 2012. The expansion of mtDNA haplogroup L3 within and out of Africa. Mol Biol Evol. 29(3):915–927.

Szpiech ZA, Hernandez RD. 2014. selscan: an efficient multithreaded program to perform EHH-based scans for positive selection. Mol Biol Evol. 31(10):2824–2827.

Tekola-Ayele F, Adeyemo A, Chen G, Hailu E, Aseffa AA-O, Davey GA-O, Newport MJ, Rotimi CA-OX. 2015. Novel genomic signals of recent selection in an Ethiopian population. Eur J Hum Genet. 23(8):1085–1092.

Thompson EE, Kuttab-Boulos H, Witonsky D, Yang L, Roe BA, Di Rienzo A. 2004. CYP3A variation and the evolution of salt-sensitivity variants. Am J Hum Genet. 75(6):1059–1069.

Tishkoff SA, Reed FA, Ranciaro A, Voight BF, Babbitt CC, Silverman JS, Powell K, Mortensen HM, Hirbo JB, Osman M, et al. 2007. Convergent adaptation of human lactase persistence in Africa and Europe. Nat Genet. 39(1):31–40.

Triska P, Soares P, Patin E, Fernandes V, Cerny V, Pereira L. 2015. Extensive admixture and selective pressure across the Sahel Belt. Genome Biol Evol. 7(12):3484–3495.

Uchil PD, Quinlan BD, Chan W-T, Luna JM, Mothes W. 2008. TRIM E3 ligases interfere with early and late stages of the retroviral life cycle. PLoS Pathog. 4(2):e16.

Voight BF, Kudaravalli S, Wen X, Pritchard JK. 2006. A map of recent positive selection in the human genome. PLoS Biol. 4(3):e72.

Vyas DN, Al-Meeri A, Mulligan CJ. 2017. Testing support for the northern and southern dispersal routes out of Africa: an analysis of Levantine and southern Arabian populations. Am J Phys Anthropol. 164(4):736–749.

Vyas DN, Kitchen A, Miro-Herrans AT, Pearson LN, Al-Meeri A, Mulligan CJ. 2016. Bayesian analyses of Yemeni mitochondrial genomes suggest multiple migration events with Africa and Western Eurasia. Am J Phys Anthropol. 159(3):382–393.

Wagh K, Bhatia A, Alexe G, Reddy A, Ravikumar V, Seiler M, Boemo M, Yao M, Cronk L, Naqvi A, et al. 2012. Lactase persistence and lipid pathway selection in the Maasai. PLoS One 7(9):e44751.

Yin T, Cook D, Lawrence M. 2012. ggbio: an R package for extending the grammar of graphics for genomic data. Genome Biol. 13(8):R77.

Yu Y, Ouyang Y, Yao W. 2018. shinyCircos: an R/Shiny application for interactive creation of Circos plot. Bioinformatics 34(7):1229–1231.

Zhao F, McParland S, Kearney F, Du L, Berry DP. 2015. Detection of selection signatures in dairy and beef cattle using high-density genomic information. Genet Sel Evol. 47:49.

Fernandes et al. · doi:10.1093/molbev/msz005 MBE


migration events. Improvements by Gravel et al. (2013) and Hellenthal et al. (2014) have begun to disentangle the multiple events by organizing blocks in a distribution of sizes, with bigger blocks meaning younger admixture events, whereas smaller blocks indicate older admixture events. AP individuals screened for genome-wide chips (10 Yemeni, 15 Yemeni Jews, 20 Saudi, and 14 Emirati) (Li et al. 2008; Hellenthal et al. 2014) have so far allowed the identification of the following admixtures and corresponding ages: 6–25% sub-Saharan African input in the Arabian pool 8–37 generations ago (Fernandes et al. 2015); and 12.9% sub-Saharan African input by 1530 CE (Common Era) in Yemeni, 22.3% by 1278 CE in Saudi, and 22.8% by 746 CE and 4% by 1754 CE in Emirati (Hellenthal et al. 2014). Other genome-wide data sets were characterized for 168 Qatari (Hunter-Zinck et al. 2010), revealing three clear clusters consistent with Arabian origin, eastern or Persian origin, and African admixture, and for 90 Yemeni (Vyas et al. 2017), which reinforced the evidence that Levantine and southern Arabian populations bear similar genetic relationships to both African and non-African populations.
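As a rough illustration of how an admixture age expressed in generations relates to calendar dates like those quoted above, the sketch below converts a generation count into an approximate year. The generation time (28 years) and the reference sampling year (2000) are assumptions chosen only for illustration, and the function name is hypothetical; none of them is taken from the cited studies.

```python
def generations_to_year(generations, generation_time=28.0, sampling_year=2000):
    """Approximate calendar year (CE) of an admixture event.

    generation_time and sampling_year are illustrative assumptions,
    not values reported in the text.
    """
    return sampling_year - generations * generation_time


# The 8-37 generation range quoted for the sub-Saharan African input.
for g in (8, 37):
    print(g, "generations ago ->", round(generations_to_year(g)), "CE")
```

Changing the assumed generation time shifts every date proportionally, which is one reason dates from different studies are best compared in generations.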

The admixture ages estimated from nuclear DNA are considerably younger than the ones estimated from mitochondrial DNA (mtDNA). The mtDNA block is never interrupted by LD and, in theory, there is no time-window limit for age estimates of mtDNA most recent common ancestors. We and other authors were able to identify admixture events between the Levant and Arabia, most likely via the Gulf corridor, in the Late Glacial and, to a lesser extent, the immediate postglacial/Neolithic (Cerny et al. 2011; Al-Abri et al. 2012; Fernandes et al. 2015; Vyas et al. 2016). Lineages from the time of the OOA, affiliated in L African haplogroups, are still elusive in the AP. However, their direct descendants N1, N2, and X had a relict distribution that suggested an ancient ancestry within the AP, which most likely spread from the Gulf Oasis region toward the Levant and Europe at 55–24 ka (Fernandes et al. 2012). MtDNA pointed out that the main “back-to-Africa” migrations occurred in the Late Glacial period for introductions into East Africa, while the Neolithic was more significant for migrations toward North Africa (Musilova et al. 2011; Hodgson et al. 2014; Fernandes et al. 2015).

In the eighth millennium, a remarkable maritime trade system was developed between Arabia, Africa, Levant, and India (Cerny et al. 2009; Fernandes et al. 2015). It reached its peak in the last 4 ka with the appearance of several early kingdoms, which dominated the Indian Ocean trading network. It was so important that it left a genetic imprint in the African Swahili Corridor (Brucato et al. 2018) and in several populations in Pakistan and India (Laso-Jadart et al. 2017). Across the Red Sea, the maritime traffic was dominant in the Egyptian Pre-Dynastic period, and it was intensified throughout the Arab slave trade, established from 2.5 ka to very recent times (Kivisild et al. 2004). The Arab slave trade transported 2,400,000 Africans, from Nubia to Zanzibar, to the Levant, the AP, and even India and China. This trade was mainly focused on female slaves (with a 2:1 female to male ratio), who became domestic servants, entertainers, and/or concubines (Segal 2002; Lovejoy 2011; Fernandes et al. 2015). The trade also played an important role in the spread of Islam after AD 630, outward from Mecca, and the subsequent rapid expansion of the Arab Empire toward both the Atlantic and Pacific Oceans (Hogarth 1904).

Selection screenings along the genome based on chip data sets are revealing that admixture events also allow descendant populations to acquire genetic adaptations that parental populations evolved in the original habitat (Triska et al. 2015; Patin et al. 2017). As malaria has been such a major selection driver in the African context (Roche et al. 2017), East African populations of Arabs and Nubians (mainly Eurasian descendant) have an African-enriched background for the DARC gene region (Triska et al. 2015), a pattern shared with the Makranis of Pakistan (Laso-Jadart et al. 2017). This acquisition of adaptation by admixture was also detected along the Bantu expansion in southern Africa (Patin et al. 2017), with the western Bantu wave acquiring the HLA selection from Pygmies and the eastern Bantu receiving the input of the northern East African LCT selected region. It remains to be elucidated to what extent this kind of event played a role in shaping AP genetic diversity.

In this work, we performed a systematic genome-wide characterization of the AP by screening 741,000 SNPs in around 420 individuals from Saudi Arabia, Yemen, Oman, and the United Arab Emirates (UAE), and another 80 from Iran. Having in mind the time window this data set can shed light on, we focused on the last 4,000 years, highlighted in this careful high-resolution sampling of AP diversity. By applying admixture mapping and haplotype-based screening of selection, we provided insight into local and acquired adaptations of AP populations.

Results and Discussion

Population Structure in the AP

We tested the relationships of AP individuals with other populations (“restricted data set” including relevant neighboring populations; supplementary table S1, Supplementary Material online) using ADMIXTURE. This analysis identified six ancestral populations as the best fit for the diversity profiles of the analyzed populations (fig. 1A; other K plots and cross-validation graph are displayed in supplementary figs. S1 and S2, Supplementary Material online). AP populations presented a high proportion of an Arabian/North African component (in blue), especially in the west of the Peninsula: 65% in Saudi, 61% in Yemen and Bedouin, 36% in Oman, and 32% in the UAE. The second highest component in the AP was a Levant/Caucasian component (in red; reaching 72% in Druze): 28% in the UAE and Oman, 20% in Yemen, 18% in Bedouin, and 16% in Saudi Arabia. European background (in peach) was higher in Levantine populations (around 8–17%) than in Arabia (1.5–3.3%), whereas sub-Saharan African (light and dark green) ancestry was 11% in Yemen, the UAE, and Oman, then between 4% and 7% in Saudi Arabian and Levantine populations, and vestigial (0–2%) in Samaritans and Druze. The Iranian population had a substantial pool from South Asia (28%; light pink), consistent with its Persian ancestry, and this South Asian component was similarly frequent in the UAE and Oman (23–26%), falling to below 10% frequency in all other Arabian and Levant populations.

Principal Component Analysis (PCA) revealed an identical population structure (supplementary fig. S3, Supplementary Material online). The first two principal components separated the individuals along well-established geographical axes. PC1 explained most of the diversity (79.9%) and organized the populations in sub-Saharan African versus other, with admixed Arabian and Levantine individuals in between. PC2 (9.2%) separated Arabians and Levantines from Europeans and South Asians. Most AP populations displayed a cluster with Bedouin individuals and were very close to Levantine populations, indicating that these individuals were less admixed with non-Arabian populations. A variable fraction of AP individuals displayed signs of admixture with North and sub-Saharan African populations, spreading across the X axis. In Iran, Oman, and the UAE, some admixed individuals were more dispersed along the Y axis, concentrating between Levantine and South Asian populations.

FIG. 1. Population structure inferred by ADMIXTURE analysis (A), in which each individual is represented by a vertical (100%) stacked column of genetic component proportions, shown in color for K = 6. Shared IBD fragments between pairs of individuals from the Arabian Peninsula, Levant, and Egypt + East Africa, filtering for IBD < 2 cM (B), IBD 2–5 cM (C), IBD 10–20 cM (D), and IBD > 20 cM (E). Populations are represented by a circle proportional to the sample size of the populations. The Levantine populations are represented in pink, AP in green, and Egyptian + East African in black. Shared IBD fragments are represented by a line, in pink to Levant and in green to AP. The maps were adapted via Wikimedia Commons.


Gene Flow in the Region and Dating of Admixture

The set of algorithms based on haplotype sharing was tested in the “extended data set” (supplementary table S2, Supplementary Material online). Refined Identity-by-descent (IBD) (Browning and Browning 2013) (fig. 1B–E) was applied to AP, Levantine, Egyptian, and East African populations and allowed us to identify shared IBD segments. Since the length of shared haplotypes decreases throughout time, segment lengths of <2 cM and 2–5 cM are indicative of old common ancestry. These results attested the genetic continuity in the region, given the high share of haplotypes. Although not highly differentiated, the Levant seemed to share a lower amount of IBD with Kenya, Somalia, and Ethiopia when compared with what the AP shares with these populations. When filtering for shared IBD lengths of 10–20 cM and >20 cM (younger shared history), the southern sharing (AP and East Africa) was dominant.
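The length filters described above can be sketched as a simple binning step applied to a list of shared segments. This is only an illustration: the function names, the input layout (population pair plus segment length in cM), and the handling of the bin boundaries are assumptions, and the example segments are invented rather than taken from the study.

```python
from collections import Counter


def ibd_length_class(length_cm):
    """Assign a shared IBD segment (length in cM) to one of the four bins of fig. 1B-E."""
    if length_cm < 2:
        return "<2 cM"          # oldest shared ancestry
    if length_cm <= 5:
        return "2-5 cM"
    if 10 <= length_cm <= 20:
        return "10-20 cM"
    if length_cm > 20:
        return ">20 cM"         # youngest shared history
    return "unbinned (5-10 cM)"  # lengths outside the four reported bins


def count_sharing(segments):
    """Count segments per (population pair, length bin).

    segments: iterable of (pop_a, pop_b, length_cm); pair order is ignored.
    """
    counts = Counter()
    for pop_a, pop_b, length_cm in segments:
        pair = tuple(sorted((pop_a, pop_b)))
        counts[(pair, ibd_length_class(length_cm))] += 1
    return counts


# Hypothetical segments, for illustration only.
example = [("Yemen", "Ethiopia", 12.0), ("Ethiopia", "Yemen", 15.5), ("Syria", "Yemen", 1.2)]
for key, n in sorted(count_sharing(example).items()):
    print(key, n)
```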

The estimated effective migration surfaces (EEMS) plot (fig. 2A) revealed a barrier between West and East AP populations. East Africa shared a migration surface with Saudi Arabia and Yemen populations, whereas another genetic corridor was evident between the Levant, Iran, and the East AP. These results were corroborated by Wright’s FST metric (supplementary fig. S4, Supplementary Material online), with Saudi Arabia being genetically closer to Yemen (0.002) and the UAE closer to Oman (0.001) than between both sides of the Peninsula (0.006). In terms of FST pairwise comparisons between AP and South Asian populations, the UAE displayed the lowest values (0.003 with Iran, 0.007 with Makrani, and 0.009 with Balochi), and Iran was closer to the Levant and the East AP (0.003 with Syria, Jordan, and Lebanon; 0.003 with the UAE; and 0.004 with Oman) than to South Asian populations (0.005 with Balochi, 0.006 with Pathan, and 0.032 with India). All AP populations showed a lower distance from the Levant (0.004–0.008) than from Europe (0.012–0.027), East Africa (0.084–0.110), and finally West Africa (0.095–0.123). The same pattern of west–east diversity in the Peninsula was observed for the f3 statistics (supplementary fig. S5 and supplementary table S3, Supplementary Material online), with detection of African and Levant/European admixture in all Arabian populations and admixture with South Asian ancestry limited to the East AP.
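To make the pairwise FST values above more concrete, here is a minimal sketch of one way to estimate FST from per-SNP allele frequencies. It uses Hudson's estimator in its ratio-of-averages form as a simple stand-in; it is not the Weir and Cockerham estimator that Vcftools reports, so values would not match the published ones exactly. The function name and the toy frequencies are hypothetical.

```python
def hudson_fst(freqs_pop1, freqs_pop2, n_chrom1, n_chrom2):
    """Hudson's FST (ratio of averages) from per-SNP allele frequencies.

    freqs_pop1, freqs_pop2: allele frequencies of the same SNPs in two populations.
    n_chrom1, n_chrom2: number of sampled chromosomes (2 x individuals) per population.
    Assumes at least one SNP is polymorphic across the two populations.
    """
    numerator = 0.0
    denominator = 0.0
    for p1, p2 in zip(freqs_pop1, freqs_pop2):
        # Squared frequency difference, corrected for finite sample size.
        numerator += ((p1 - p2) ** 2
                      - p1 * (1 - p1) / (n_chrom1 - 1)
                      - p2 * (1 - p2) / (n_chrom2 - 1))
        # Probability that two alleles drawn from different populations differ.
        denominator += p1 * (1 - p2) + p2 * (1 - p1)
    return numerator / denominator


# Toy frequencies for three SNPs in two hypothetical populations of 25 individuals each.
print(hudson_fst([0.10, 0.50, 0.80], [0.15, 0.45, 0.70], 50, 50))
```

Summing numerator and denominator over SNPs before dividing, rather than averaging per-SNP ratios, keeps rare variants from dominating the estimate.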

The clustered coancestry matrix from the fineSTRUCTURE and the ChromoPainter analyses, which explores patterns of haplotype sharing among groups, classified AP and Iranian individuals in 12 clusters, attesting their diverse genetic reservoir (supplementary fig. S6, Supplementary Material online). Proportions of ancestries reflected the ones obtained in ADMIXTURE: AP and Iranian individuals were mostly clustered with Levant/North African populations (92% from Yemen, 92% Iran, 92% the UAE, 84% Saudi Arabia, and 80% Oman) and, to a lesser extent, with South Asians (4–8% of UAE and Oman individuals) and sub-Saharan Africans, including with Comoro, Madagascar, and Swahili populations (12% Oman, 12% Saudi Arabia, and 4% Iran).

FIG. 2. Estimated effective migration surfaces gradient map around the Arabian Peninsula (A). The color scale reveals low (light gray) and high (dark gray) genetic barriers between populations localized on a grid of 500 demes. The scale represents log values of the effective migration rates (m). Each deme (black dot) is proportional to the sample sizes of the populations. Map was generated with the R package worldmap. Dates of admixture events (B) in AP and Iranian populations as estimated by GLOBETROTTER. Pie charts reflect the admixture proportions of ancestries listed in the legend. Open circles in the date of admixture refer to one event, whereas filled circles refer to multiple events, and lines represent the 95% confidence interval.

Overall, GLOBETROTTER results highlighted recent admixture events in the AP region (fig. 2B and supplementary table S4, Supplementary Material online). Between 400 and 1,000 years before present, admixture events took place with East African ancestry in all AP populations. The non-African background was similar to a European background in all populations except in the UAE and Iran, where a closeness to South Asian ancestry was identified. The sub-Saharan African proportion in the West AP was double that in the East AP. In Saudi Arabia, the UAE, and Oman, multidate events were more probable, but the second event was even younger (67, 153, and 160 BP, respectively) and involved the same ancestries as the older event.

Local Ancestry Inference

The RFMix software package (Maples et al. 2013) was used to analyze the genomes of admixed Arabian and Iranian populations and to detect regions showing excess of a given parental population (sub-Saharan African or South Asian, in detriment of the main European/Levant background). Results revealed statistically significant enrichment of a particular ancestry in certain regions of the genome, which are associated with selection signals due to adaptive traits such as malaria resistance, skin pigmentation, and lactose tolerance (fig. 3A and B; supplementary figs. S7 and S8, Supplementary Material online, show results for all tested populations; supplementary table S5, Supplementary Material online).

The region of the genome with the highest proportion of sub-Saharan African ancestry (>4 SD) observed in Arabian and Iranian populations was located on chromosome 1, in a gene-rich region containing the DARC gene and several olfactory receptor genes, such as OR10J1, OR10J3, and OR10J5. This sub-Saharan African input was higher in Yemen (a mean of 0.61; supplementary fig. S9, Supplementary Material online) than in other regions of the Peninsula (0.49 Oman, 0.35 the UAE, 0.28 Saudi Arabia, and 0.16 in Iran). Another region highly enriched in all populations for sub-Saharan African ancestry was located in chromosome 9, containing genes coding for forkhead box D4 (FOXD4L4 and FOXD4L2) and spermatogenesis-associated proteins (FAM75A5 and FAM75A7).

The RFMix analysis indicated a South Asian-enriched region on chromosome 2 containing the RAB3GAP1, ZRANB3, LCT, MCM6, and CXCR4 genes. They have been previously associated with lactase persistence, the ability to digest milk into adulthood, which is an adaptive trait mostly found in European populations and a classic example of genetic adaptation in humans (Tishkoff et al. 2007; Campbell and Tishkoff 2008; Triska et al. 2015). This region reached highly significant values (>4 SD) in Saudi and Yemen and was also significant in the UAE and Iran (3 SD < X < 4 SD). Another South Asian-enriched region (3 SD < X < 4 SD) was detected on chromosome 5 in Yemen, Saudi Arabia, and the UAE, containing the SLC45A2 gene, associated with skin, eye, and hair pigmentation. Nevertheless, the most significant South Asian-enriched signal detected in all populations (>4 SD) was located in chromosome 6, containing many genes related with the immune system (HLA-B, HLA-DR, HLA-DQ, MICA, MICB, and TNF).
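The ">4 SD" and "3 SD < X < 4 SD" thresholds used above can be read as deviations of a window's ancestry proportion from the genome-wide mean. The sketch below shows that idea under stated assumptions: the input is one mean ancestry proportion per genomic window (for example, averaged over individuals from local ancestry calls), and the function name and toy values are hypothetical. It is not the authors' actual post-processing of RFMix output.

```python
from statistics import mean, pstdev


def enriched_windows(ancestry_by_window, threshold_sd=4.0):
    """Return (window index, z-score) for windows enriched above the threshold.

    ancestry_by_window: per-window proportion of one ancestry (0..1), in genome order.
    A window is reported when it exceeds the genome-wide mean by more than
    threshold_sd standard deviations.
    """
    values = list(ancestry_by_window)
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return []  # no variation across windows, nothing can be enriched
    hits = []
    for index, value in enumerate(values):
        z = (value - mu) / sd
        if z > threshold_sd:
            hits.append((index, z))
    return hits


# Toy genome: 99 windows at 10% of a given ancestry and one window at 60%.
toy = [0.10] * 99 + [0.60]
print(enriched_windows(toy))
```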

Selection in the AP and Iran

When checking positive selection signals in the AP and Iran by using integrated haplotype score (iHS) and cross-population extended haplotype homozygosity (XP-EHH) measures, several genomic regions displayed significant values (fig. 3C–F, supplementary figs. S10–S13, and supplementary tables S6–S9, Supplementary Material online).

In particular, several signals previously associated with selection in non-African populations were detected. East AP and Iranian populations exhibited a strong signal of positive selection (>4 SD) in the XP-EHH-European comparison on the chromosome 2 region containing the R3HDM1 and LCT genes, associated in Europeans with food conversion efficiency and metabolism of lactose (Zhao et al. 2015). The SLC24A5 gene on chromosome 15, which is strongly associated with skin pigmentation variation among European and South Asian populations (Basu Mallick et al. 2013), presented signs of positive selection in all populations in iHS (>4 SD for Yemen, Saudi Arabia, and the UAE; >3 SD for Oman and Iran) and XP-EHH-South Asian (>3 SD for all populations) analyses. Another skin pigmentation-related gene, SLC45A2, which is located on chromosome 5 and is known to be strongly associated with this phenotype in European populations (Candille et al. 2012; Hernandez-Pacheco et al. 2017), displayed signs of positive selection in all populations for the XP-EHH-European comparison (>4 SD in Saudi Arabia and >3 SD in remaining populations). The same occurred for the chromosome 4 region containing the TLR1, TLR6, and TLR10 genes (role in pathogen recognition and activation of innate immunity [Laayouni et al. 2014]; >4 SD in Yemen and Saudi Arabia and >3 SD in remaining populations) and for the chromosome 12 region containing the ATXN2, ACAD10, and ALDH2 genes (associated with hypertension [Russo et al. 2018]; >4 SD in Saudi Arabia and >3 SD in remaining populations).

In the comparison with African populations (XP-EHH-African), significant values (>3 SD) were observed for the chromosome 1 region containing the DARC gene in Saudi Arabia and Iran. Amongst the strongest signals of positive selection was the chromosome 21 region containing the C21orf34, MIR99A, and MIRLET7C genes, detected in all populations (>4 SD) in both iHS and XP-EHH-African tests. This signal was previously identified as amongst the strongest selection signals in non-Africans (Pickrell et al. 2009), and recently C21orf34 was identified as MIR99AHG, a miRNA gene involved in the regulation of hematopoiesis and oncogenes for the development of myeloid leukemia (Emmrich et al. 2014). In addition, a signal on the chromosome 10 region containing the SPAG6 gene was observed in all populations in iHS (except Yemen) and XP-EHH-African (stronger in Oman and the UAE) analyses. This gene is essential for the structural integrity of the central microtubule of sperm stability and flagellar motility and has been previously identified as being under selection before the separation between European and Asian populations (Racimo 2016). The significant signal in iHS score on the chromosome 7 region containing several genes (CYP3A5, ZSCAN25, ATP5J2-PTCD1, CPSF4, ATP5J2, ZNF789, and ZKSCAN5) was also detected in all populations in the XP-EHH-African comparison. The CYP3A5 gene is involved in cholesterol metabolism and steroid biosynthesis in the liver and has been identified to be under positive selection in African and non-African populations (Wagh et al. 2012). The CYP3A5*1/*3 polymorphism was also described as being under selective pressure for water retention and risk for salt-sensitive hypertension in equatorial populations (Thompson et al. 2004).

FIG. 3. Circos plots highlighting significantly (>4 SD) enriched local admixed and positively selected segments in Arabian Peninsula and Iranian populations. (A) Enriched sub-Saharan African ancestry. (B) Enriched South Asian ancestry. (C) iHS selection metrics. (D) XP-EHH versus European ancestry. (E) XP-EHH versus sub-Saharan African ancestry. (F) XP-EHH versus South Asian ancestry.

The iHS metrics identified a positively selected region on chromosome 6 in all populations, which contains candidate genes involved in stages of the human immunodeficiency virus (HIV) lifecycle: the ZNRD1-AS1 and TRIM26 genes. ZNRD1-AS1 plays a role in the regulation of cell proliferation and is essential for the completion of the HIV lifecycle. Its rs3132130 SNP was described to confer host resistance to HIV-1 acquisition by causing a loss of nuclear factor binding and decreasing the ZNRD1 promoter activity (An et al. 2014). Although with unknown function, the tripartite motif (TRIM) proteins may contribute to the innate immunity against retroviruses, affecting both early and late stages of the retroviral life cycle (Ozato et al. 2008; Uchil et al. 2008). Another significantly iHS-selected region in all populations, except the Saudi Arabians, was identified on the chromosome 22 region containing the TTC38, PKDREJ, GTSE1, and PPARA genes. PPARA encodes a transcription receptor that regulates lipid and glucose metabolism during food deficiency, and a recent study (Tekola-Ayele et al. 2015) found a specific signal of recent positive selection of this gene in Ethiopia, suggesting a metabolic adaptation to high-altitude hypoxia.
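For readers less familiar with the cross-population test used throughout this section, the sketch below shows the final step of an XP-EHH scan: taking the log ratio of integrated haplotype homozygosity (iHH) between a target and a reference population at each SNP and standardizing it genome-wide, so that scores can be compared with SD thresholds. The iHH values are assumed to be precomputed (integrating EHH decay around each core SNP is omitted), and the function name and numbers are hypothetical. iHS is built similarly from ancestral versus derived haplotypes within one population, with standardization usually done within derived-allele-frequency bins, which is not shown.

```python
import math
from statistics import mean, pstdev


def standardized_xpehh(ihh_target, ihh_reference):
    """Standardized XP-EHH scores from precomputed per-SNP iHH values.

    Positive scores indicate longer haplotypes (candidate selection) in the
    target population; negative scores, in the reference population.
    """
    raw = [math.log(t / r) for t, r in zip(ihh_target, ihh_reference)]
    mu = mean(raw)
    sd = pstdev(raw)
    if sd == 0:
        raise ValueError("no variation in log iHH ratios; cannot standardize")
    return [(x - mu) / sd for x in raw]


# Hypothetical iHH values at four SNPs; only the last SNP differs between populations.
scores = standardized_xpehh([1.0, 1.0, 1.0, 4.0], [1.0, 1.0, 1.0, 1.0])
print([round(s, 3) for s in scores])
```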

Conclusions

The detailed genome-wide characterization of Arabian and Iranian populations allowed us to establish, for the first time, the palimpsest of admixture and selection along the genomes of these inhabitants of the main bridge between African, Asian, and European population groups. Admixture events comprised up to 20% of sub-Saharan African ancestry in the West and 20% of South Asian ancestry in the East of the AP, enriching the local predominant Arabian-, North African-, Levantine-, and Caucasian-like components. The sub-Saharan African input decreased toward the East, whereas the South Asian input decreased in the direction of the West of the Peninsula, testifying to the continuous genetic exchange with Africa and Asia, as was evident in the IBD sharing analysis.

Of course, there are specific subgroups that depart from these general characteristics, such as the Qatari Bedouins, who represent a “coldspot” of admixture in the AP (Rodriguez-Flores et al. 2016), whereas the Hadrami in Yemen are a “hotspot” of admixture (Vyas et al. 2017). EEMS analysis identified higher exchanges of genes between Africa and the West AP, as well as between the East AP and the Levant/South Asia, than between the West and the East AP. This result agrees with the previously documented degree of isolation between West and East AP populations identified when studying uniparental markers (Alshamali et al. 2009; Fernandes et al. 2012, 2015). We previously found relict maternal lineages (60 ka) of the original OOA successful settlement by modern humans in the AP, and most likely they spread to South Asia and the Levant/Europe from the Gulf corridor (Fernandes et al. 2012). However, current LD decay–based dates only indicated recent admixture events in the eastern side of the Peninsula. Similarly, younger ages of admixture in the West AP reflected the Islamization of North Africa as the most probable main contributor of this region. The exchanges across the Red Sea detected for the autosomal data matched the Arab slave trade and maritime dominance.

Recent studies have provided evidence that admixed populations acquired positively selected genomic regions from their parental populations or, in other words, that the distribution of admixed blocks across the genomes of descendants was not even (Triska et al. 2015; Laso-Jadart et al. 2017; Patin et al. 2017). The acquired selected regions can be inferred from the concomitant identification of the same genomic regions in the selection metrics and in the local ancestry enrichment analysis. AP populations displayed a jigsaw puzzle of African-, European-, and South Asian-selected genes linked to response to infectious diseases, skin pigmentation, lactase persistence, and food conversion.
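The "concomitant identification" described above amounts to intersecting two lists of genomic intervals: regions flagged by the selection metrics and regions flagged by the local ancestry enrichment. A minimal sketch follows; the function name, the closed-interval convention, and the coordinates are assumptions made for illustration and do not correspond to real positions.

```python
def overlapping_regions(selected, enriched):
    """Intersect two lists of (chromosome, start, end) intervals.

    Intervals are treated as closed; the overlap of each intersecting pair is
    returned as (chromosome, overlap_start, overlap_end).
    """
    hits = []
    for chrom_s, start_s, end_s in selected:
        for chrom_e, start_e, end_e in enriched:
            same_chrom = chrom_s == chrom_e
            if same_chrom and start_s <= end_e and start_e <= end_s:
                hits.append((chrom_s, max(start_s, start_e), min(end_s, end_e)))
    return hits


# Hypothetical coordinates, for illustration only.
selection_hits = [("chr2", 100, 200), ("chr15", 10, 20)]
ancestry_hits = [("chr2", 150, 300), ("chr6", 10, 20)]
print(overlapping_regions(selection_hits, ancestry_hits))
```

The nested loop is quadratic in the number of intervals, which is adequate for the handful of significant regions per population; sorted sweeps or interval trees would be preferable for genome-wide lists.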

The strongest African-related enrichment signal was for the DARC gene, which codes for the Duffy antigen, a membrane-bound chemokine receptor used by Plasmodium vivax to infect red blood cells. Plasmodium vivax causes a chronic form of malaria and is the most widespread type of malaria outside Africa (McManus et al. 2017). The derived C allele mutation in the promoter, which causes erythroid-specific suppression of gene expression, is described as conferring resistance against Plasmodium vivax malaria, and thus it is almost fixed in African populations (Triska et al. 2015; Laso-Jadart et al. 2017; McManus et al. 2017). The AP is known to have a diverse malaria epidemiology in recent times, from a likely absence of transmission in Kuwait to an intense parasite exposure, similar to conditions in many parts of Africa, among residents of Saudi Arabia and Yemen (Snow et al. 2013). The enrichment signal detected here in the genomes of Arabians supported the higher malaria pressure in the West AP. The enrichment was strong enough to allow detection of selection in the XP-EHH metrics in Saudi Arabia and Iran.

The most significant European/South Asian-related enrichments in AP populations were linked to the lactase tolerance-specific chromosome 2 RAB3GAP1/LCT/MCM6 region, skin pigmentation associated with the SLC24A5 gene, and the HLA region on chromosome 6. The LCT gene -13915*G allele (rs41380347) was described as being present at high frequency in the AP and the Levant and estimated to have originated 4 ka (±2 ka) in the AP (Bayoumi et al. 2016), possibly as a result of the domestication of the Arabian camel at 6 ka (Ranciaro et al. 2014). By contrast, the European -13910*T selected allele (rs4988235) is also the most commonly observed in South Asia (Gallego Romero et al. 2012). We could not verify frequencies of the rs41380347 SNP, which was not part of the chip genotyped here, but we checked the linked rs4988243 SNP. This last SNP testified to the higher frequencies in the West versus the East AP (43.2% in Yemen, 46.1% in Saudi Arabia, 17.9% in Oman, 19.1% in the UAE, and 17.4% in Iran; notice that although rs41380347 and rs4988243 are linked, this last SNP has overall frequencies 20% higher than the first). The genotyped European rs4988235 SNP presented higher frequencies of the derived allele in the East than in the West AP (0.6%, 1.3%, 12.5%, 18.2%, and 10.3%, respectively). The haplotype backgrounds of Arabian and European derived alleles are related and distant from the African one (Ranciaro et al. 2014), which explains the high non-African enrichment detected in RFMix analysis in the West AP (the Arabian haplotypes) and the selection signal in the XP-EHH-European comparison in the East AP and Iran (the European haplotypes). This is an interesting case generated by the isolation between the two sides of the Peninsula, where the West developed a local positive selection of a lactose tolerant variant, whereas the East acquired the European/South Asian variant. Previously, this difference had been reported for ethnic groups within Oman (Bayoumi et al. 2016), where frequencies of -13910*T and -13915*G alleles diverged between Arabs of northern and southern Oman, who migrated from Yemen (0–1% and 14–72%, respectively), and Omanis of Asian origin (16% and 0%, respectively). Here, we verified that this pattern is extended to the entire Peninsula.

In relation to the genetic diversity linked to skin color, the European/South Asian-selected derived SNPs are equally frequent in both the West and East AP (rs1426654 in the SLC24A5 gene: 10.1% in Yemen, 13.3% in Saudi Arabia, 13.2% in Oman, 13.5% in the UAE, and 11.8% in Iran; rs183671 in the SLC45A2 gene: 35.4%, 29.4%, 22.7%, 29.0%, and 40.1%, respectively; rs1667394 in the OCA2/HERC2 gene: 38.1%, 28.8%, 30.7%, 33.5%, and 34.9%, respectively).

The palimpsest of complex interactions between cross and local selective pressures and demographic factors identified here in the AP testifies to the rich genetic dynamics taking place in bridging zones of the globe. In the near future, whole genomes of Arabians across the entire peninsula (with more precise geographical information than the one we could present here) and more sensitive selection measures will add information on the role played by this geographic region in the adaptation of modern humans to non-African habitats. Improvements to dating techniques applicable to complete genomes, such as the multiple sequentially Markovian coalescent approach, will also help in answering the question of whether the AP was only a passageway to reach further distances or one of the oldest inhabited regions OOA. Complete genomes may indeed shed light on the lost period in the mtDNA pool due to the absence of the direct OOA descendants. The first steps have already been taken, and a few Arabian complete genomes have begun to be available (John et al. 2015; Mallick et al. 2016; Rodriguez-Flores et al. 2016).

Materials and Methods

Population Samples, Genome-Wide Genotyping, and Published Data

DNA samples analyzed in this study were collected from 420 individuals of the AP (Saudi Arabia, Yemen, Oman, and the UAE) and from 80 individuals of Iran (further information provided in supplementary table S1, Supplementary Material online). Individuals were residing in Dubai at the time of sampling (between years 1991 and 2000), representing a widespread collection of birth origins from Saudi Arabia, Yemen, Oman, the UAE, and Iran. This study obtained ethical approval from the Ethics Committee of the University of Porto, Portugal (17/CEUP/2012). The individuals were genotyped with the Illumina Human Omni Express Bead Chip (OmniExpress) containing 741,000 SNPs. Quality checks were applied using PLINK (Purcell et al. 2007; Chang et al. 2015), and 22 individuals were excluded for lacking more than 10% of genotypes. A preliminary PCA revealed that some UAE and Oman individuals (five individuals from each population) had a considerable sub-Saharan African ancestry, an indicator that they were most probably recent immigrants from Africa (supplementary fig. S14, Supplementary Material online). Hence, these individuals were removed from the analyses, leading to a final set of 468 individuals. Markers with genotyping call rates lower than 95% were removed; consequently, a total of 627,435 autosomal SNPs passed the quality control check. The genotypes from the individuals analyzed in this study can be accessed from the EGA repository (European Genome-Phenome Archive; accession number EGAS00001003335). In order to perform a fine-scale population structure analysis of our complete data, the 468 individuals were merged with relevant available data sets (supplementary table S1, Supplementary Material online). As several population genetic analyses assume independent markers, SNPs were pruned for pairwise LD in PLINK (www.cog-genomics.org/plink/1.9/; Purcell et al. 2007; Chang et al. 2015) by removing any SNP that had an r² > 0.4 with another SNP within a 50-SNP sliding window with a step of 20 SNPs. After pruning, the “restricted data set” included 176,309 autosomal SNPs and 1,782 individuals (supplementary table S1, Supplementary Material online), and this data set was used in the analyses based on allele frequencies (PCA, ADMIXTURE, FST, and f3 statistic). Another data set was created by randomly selecting a maximum of 25 individuals from each population, but extending the number and the geographical distribution of populations; this data set, consisting of 171,728 SNPs and 2,056 individuals, was designated as the “extended data set” (supplementary table S2, Supplementary Material online). The homogenization of the number of samples per group avoided statistical biases due to overrepresentation of some populations and allowed us to run computationally demanding algorithms such as EEMS, fineSTRUCTURE, Chromopainter, GLOBETROTTER, and IBD. All genotypes from the unpruned “extended data set” were phased with SHAPEIT v2.r790 (Delaneau et al. 2011), using the 1000 Genomes phased data (Auton et al. 2015) as a reference panel and the HapMap phase 2 genetic map (Frazer et al. 2007). We did not merge our data with relevant data sets from Qatar and Yemen (Hunter-Zinck et al. 2010; Vyas et al. 2017), as these were screened through Affymetrix chips bearing a low number of overlapping SNPs. Qatari complete genomes from Rodriguez-Flores et al. (2016) were specifically selected, not making a random population sample that could be comparable with our populations.
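The LD pruning step described in this section (r² > 0.4, 50-SNP window, step of 20 SNPs) can be illustrated with a small self-contained sketch. In PLINK 1.9 this kind of windowed pruning is provided by the `--indep-pairwise` option; the exact command line used in the study is not given, so the code below is only an approximation of the idea. It computes r² as the squared Pearson correlation of genotype dosages (0/1/2) and, as a simplifying assumption, always drops the later SNP of a correlated pair. Function names and toy genotypes are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treat as unlinked
    return sxy * sxy / (sxx * syy)


def ld_prune(genotypes, window=50, step=20, r2_max=0.4):
    """Greedy windowed LD pruning; returns indices of SNPs that are kept.

    genotypes: list of per-SNP dosage lists (same individuals, genomic order).
    """
    removed = set()
    n_snps = len(genotypes)
    start = 0
    while start < n_snps:
        in_window = [i for i in range(start, min(start + window, n_snps))
                     if i not in removed]
        for pos, i in enumerate(in_window):
            if i in removed:
                continue
            for j in in_window[pos + 1:]:
                if j not in removed and r_squared(genotypes[i], genotypes[j]) > r2_max:
                    removed.add(j)  # simplification: always drop the later SNP
        start += step
    return [i for i in range(n_snps) if i not in removed]


# Toy data: SNP 1 duplicates SNP 0 (r2 = 1); SNP 2 is only weakly correlated (r2 = 0.25).
toy = [[0, 1, 2, 1, 0, 2], [0, 1, 2, 1, 0, 2], [2, 0, 1, 0, 2, 1]]
print(ld_prune(toy))
```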

Population Structure, Differentiation, and Inbreeding

PCA, which infers worldwide axes of human genetic variation from the allele frequencies of various populations, was carried out on the pruned "restricted data set" using the smartpca tool included in the EIGENSOFT package (Patterson et al. 2006; Price et al. 2006). We ran ADMIXTURE (Alexander et al. 2009) to decompose genetic ancestries of the pruned "restricted data set" using maximum likelihood for components K = 2 to K = 12, with the optimal K estimated through cross-validation of the logistic regression (K = 6). Wright's FST metric was calculated using VCFtools (Danecek et al. 2011). The THREEPOP program from TreeMix was used to calculate f3 statistics (Reich et al. 2009; Patterson et al. 2012; Pickrell and Pritchard 2012), which test between alternative scenarios of admixture. In order to analyze and visualize spatial population structure from georeferenced genetic samples, and to define genetic barriers and corridors, the EEMS method (Petkova et al. 2016) was used. Geographic coordinates and a genetic dissimilarity matrix between populations were used to set up a map defining a grid of 500 demes, and 3,000,000 Markov chain Monte Carlo (MCMC) iterations were run before checking for convergence. Plots were generated in R following the EEMS v1 manual.
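To make the logic of the f3 test concrete, the sketch below computes a naive f3(target; source1, source2) as the mean over SNPs of (c - a)(c - b), where c, a, and b are allele frequencies in the target and the two sources. A clearly negative value is the signature of a target admixed between populations related to the two sources. This is an assumption-laden simplification: it omits the finite-sample bias correction and the standard errors that dedicated programs report, and the function name and toy frequencies are hypothetical.

```python
import numpy as np

def f3(target, source1, source2):
    """Naive f3(target; source1, source2) from per-SNP allele frequencies."""
    c, a, b = (np.asarray(x, dtype=float) for x in (target, source1, source2))
    return float(np.mean((c - a) * (c - b)))

# Toy allele frequencies (hypothetical): the target is a 50/50 mixture of the sources
src_a = np.array([0.1, 0.8, 0.3, 0.6])
src_b = np.array([0.7, 0.2, 0.9, 0.4])
mixed = 0.5 * src_a + 0.5 * src_b

f3_admixed = f3(mixed, src_a, src_b)    # negative: consistent with admixture
f3_unadmixed = f3(src_a, mixed, src_b)  # positive: no evidence that src_a is admixed
```

For a perfect two-way mixture the statistic equals minus one quarter of the mean squared frequency difference between the sources, which is why the first value is negative while the second is positive.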

Inference and Dating of Population Admixture

Population structure of the phased "extended data set" was evaluated using the fineSTRUCTURE v2.0.7 package (Lawson et al. 2012). fineSTRUCTURE captures information on population structure provided by patterns of haplotype similarity, calculated with Chromopainter v2.0 (Lawson et al. 2012), and performs a model-based Bayesian clustering of genotypes. From the results, a coancestry heat map and a dendrogram were inferred to visualize the number of statistically defined clusters that describe the data. The clusters obtained by fineSTRUCTURE were then used to assign the populations as target or donor/surrogate when testing for admixture. Mutational rates and effective population size parameters were first estimated with an Expectation-Maximization algorithm, running Chromopainter v2 on all 22 autosomes for the entire data set with 10 iterations (Lawson et al. 2012). The weighted average of these parameters, according to the SNP coverage of each chromosome and the number of individuals, was then used to compute the chromosomal painting. Two Arabian groups were identified as target populations: the western group, represented by Yemen and Saudi Arabia, and the eastern group, represented by Oman and the UAE. We ran chromosomal painting for each group. Due to the possibility of common ancestry, we excluded the Levantine populations from the surrogate populations. The painted chromosomes obtained for each cluster were used in GLOBETROTTER v1.0 (Hellenthal et al. 2014) to estimate the ratios and dates of the potential admixture events in the region. Coancestry curves were estimated for each target population, standardizing against both a null (NULL 1) and a standard (NULL 0) individual, and consistency between each estimated parameter was checked. A total of 100 bootstrap resamplings were performed to estimate the probability values of the admixture events considering the "NULL" individual, by calculating the proportion of nonsensical inferred dates (<1 or >400 generations) produced by the model (NULL 1). The "best-guess" scenario given by GLOBETROTTER v1.0 (Hellenthal et al. 2014) was considered for each target population. Dates of admixture given in generations were converted to chronological time using a generation interval of 29 years (Langergraber et al. 2012).
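Two of the arithmetic steps in this paragraph are simple enough to sketch: the bootstrap probability value, taken as the proportion of resampled dates falling outside the plausible range (<1 or >400 generations), and the conversion of generations to years with the 29-year generation interval. The function names and the toy bootstrap values below are hypothetical and are not part of GLOBETROTTER.

```python
GENERATION_YEARS = 29  # generation interval used in the text (Langergraber et al. 2012)

def generations_to_years(generations, years_per_generation=GENERATION_YEARS):
    """Convert an admixture date in generations to years."""
    return generations * years_per_generation

def admixture_p_value(bootstrap_dates, low=1, high=400):
    """Proportion of bootstrap dates (in generations) that are < low or > high."""
    nonsensical = sum(1 for d in bootstrap_dates if d < low or d > high)
    return nonsensical / len(bootstrap_dates)

# Hypothetical bootstrap dates: two of the four fall outside the 1-400 range
toy_dates = [0.5, 10, 20, 450]
p_value = admixture_p_value(toy_dates)
years = generations_to_years(20)
```

For instance, an event inferred at 20 generations (a hypothetical value) would correspond to 580 years under this generation interval.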

Haplotype sharing between pairs of individuals was estimated by the refined IBD algorithm of Beagle v4.0 (Browning and Browning 2007) and visualized with Cytoscape v3.2.1 (Shannon et al. 2003). All maps in the present study were generated using Global Mapper v15 (http://www.bluemarblegeo.com/products/global-mapper.php).

Local Ancestry Inference

RFMix (Maples et al. 2013), which employs an LD model between markers to infer the ancestry of each genome segment from a mixture of putative ancestral haplotype panels, was used on the unpruned and phased data set from the four AP populations plus Iran. The phased data from the 1000 Genomes populations of Great Britain (representing European ancestry), Yoruba from Nigeria (sub-Saharan African ancestry), and Indian Telugu (South Asian ancestry) were used as parental populations. Information on ancestry was obtained for each locus along chromosomes for every individual, and these values were averaged in each population. In order to identify regions that had a significantly higher proportion of a given parental ancestry, we followed previously published studies (Bryc et al. 2015), considering only regions outside the range defined by the genome mean ± 4SD. The karyograms of the local ancestry across the 22 chromosomes were generated in R using the ggbio package (Yin et al. 2012).
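The outlier criterion above (regions outside the genome mean ± 4SD) can be illustrated with a short sketch. It assumes that the population-averaged ancestry proportion is already available as one value per genomic window; the function name, the input layout, and the toy values are hypothetical and do not reproduce RFMix output.

```python
import numpy as np

def flag_ancestry_outliers(ancestry, n_sd=4.0):
    """Indices of windows whose ancestry proportion lies outside mean +/- n_sd * SD."""
    x = np.asarray(ancestry, dtype=float)
    mean, sd = x.mean(), x.std()
    return np.flatnonzero(np.abs(x - mean) > n_sd * sd)

# Toy genome (hypothetical): 99 windows at 10% of a given ancestry, one window at 90%
toy_ancestry = [0.1] * 99 + [0.9]
outliers = flag_ancestry_outliers(toy_ancestry)
```

Only the last window is flagged in this toy case. Lowering `n_sd` to 3 gives the less stringent 3SD threshold also used when reporting results.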

Analysis of Selection

iHS (Voight et al. 2006) was estimated with the SelScan package (Szpiech and Hernandez 2014) in the phased data set from the four AP populations plus Iran. Scores were calculated for each SNP within the population as a whole and standardized within each of 100 bins of allele frequency using the norm tool. In order to identify loci potentially affected by positive selection, a percentile score was assigned based on the proportion of extreme iHS values in each segment. The SelScan tool (Szpiech and Hernandez 2014) was also used to estimate cross-population extended haplotype homozygosity (XP-EHH) in each AP population by comparing with Great Britain, Yoruba from Nigeria, or Indian Telugu. Consistent with other tests for selection, only markers with MAF > 1% were used for XP-EHH computation. The obtained XP-EHH values were normalized by subtracting the genome-wide mean XP-EHH and dividing by the standard deviation (Szpiech and Hernandez 2014). Genomic regions with percentile scores exceeding 4SD and 3SD were reported.
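The two normalizations described above can be sketched as follows: raw iHS-like scores are converted to z-scores within allele-frequency bins, and XP-EHH-like scores are converted to z-scores against the genome-wide mean and standard deviation. This is an illustrative sketch rather than the code of the norm tool; the function names, the equal-width binning, and the toy values are assumptions.

```python
import numpy as np

def standardize_in_bins(scores, freqs, n_bins=100):
    """Z-score raw scores within equal-width allele-frequency bins."""
    scores = np.asarray(scores, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    # Bin index in 0..n_bins-1; a frequency of exactly 1.0 is placed in the last bin
    bins = np.minimum((freqs * n_bins).astype(int), n_bins - 1)
    out = np.full(scores.shape, np.nan)
    for b in np.unique(bins):
        mask = bins == b
        sd = scores[mask].std()
        if sd > 0:  # bins with no variance are left as NaN
            out[mask] = (scores[mask] - scores[mask].mean()) / sd
    return out

def normalize_genome_wide(scores):
    """Subtract the genome-wide mean and divide by the standard deviation."""
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.mean()) / scores.std()

# Toy values (hypothetical): two SNPs near frequency 0.10 and two near 0.50
toy_scores = [1.0, 3.0, 10.0, 20.0]
toy_freqs = [0.101, 0.105, 0.505, 0.509]
binned = standardize_in_bins(toy_scores, toy_freqs)
genome_wide = normalize_genome_wide([1.0, 2.0, 3.0])
```

Binning by frequency matters because unstandardized iHS depends on the frequency of the derived allele, so scores are only comparable among SNPs of similar frequency.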

Selected SNPs were mapped to genes within a 5-kb flanking region, and significantly selected regions were represented in Circos plots generated in R (R Development Core Team 2018) using the shinyCircos graphical interface (Yu et al. 2018).

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.

Acknowledgments

This work was financed by FEDER (Fundo Europeu de Desenvolvimento Regional) funds through the COMPETE 2020 Operational Programme for Competitiveness and Internationalisation (POCI), Portugal 2020, and by Portuguese funds through FCT (Fundação para a Ciência e a Tecnologia)/Ministério da Ciência, Tecnologia e Inovação, in the framework of the project "Biomedical anthropological study in Arabian Peninsula based on high throughput genomics" (POCI-01-0145-FEDER-016609). V.F. has a postdoc grant through FCT (SFRH/BPD/114927/2016).

References

Al-Abri A, Podgorna E, Rose JI, Pereira L, Mulligan CJ, Silva NM, Bayoumi R, Soares P, Cerny V. 2012. Pleistocene–Holocene boundary in Southern Arabia from the perspective of human mtDNA variation. Am J Phys Anthropol. 149(2):291–298.

Alexander DH, Novembre J, Lange K. 2009. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19(9):1655–1664.

Alshamali F, Pereira L, Budowle B, Poloni ES, Currat M. 2009. Local population structure in Arabian Peninsula revealed by Y-STR diversity. Hum Hered. 68(1):45–54.

An P, Goedert JJ, Donfield S, Buchbinder S, Kirk GD, Detels R, Winkler CA. 2014. Regulatory variation in HIV-1 dependency factor ZNRD1 associates with host resistance to HIV-1 acquisition. J Infect Dis. 210(10):1539–1548.

Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, Abecasis GR. 2015. A global reference for human genetic variation. Nature 526(7571):68–74.

Basu Mallick C, Iliescu FM, Mols M, Hill S, Tamang R, Chaubey G, Goto R, Ho SYW, Gallego Romero I, Crivellaro F, et al. 2013. The light skin allele of SLC24A5 in South Asians and Europeans shares identity by descent. PLoS Genet. 9(11):e1003912.

Bayoumi R, De Fanti S, Sazzini M, Giuliani C, Quagliariello A, Bortolini E, Boattini A, Al-Habori M, Al-Zubairi AS, Rose JI, et al. 2016. Positive selection of lactase persistence among people of Southern Arabia. Am J Phys Anthropol. 161(4):676–684.

Browning BL, Browning SR. 2013. Improving the accuracy and efficiency of identity-by-descent detection in population data. Genetics 194(2):459–471.

Browning SR, Browning BL. 2007. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am J Hum Genet. 81(5):1084–1097.

Brucato N, Fernandes V, Mazieres S, Kusuma P, Cox MP, Ng'ang'a JW, Omar M, Simeone-Senelle MC, Frassati C, Alshamali F, et al. 2018. The Comoros show the earliest Austronesian gene flow into the Swahili corridor. Am J Hum Genet. 102(1):58–68.

Bryc K, Durand EY, Macpherson JM, Reich D, Mountain JL. 2015. The genetic ancestry of African Americans, Latinos, and European Americans across the United States. Am J Hum Genet. 96(1):37–53.

Campbell MC, Tishkoff SA. 2008. African genetic diversity: implications for human demographic history, modern human origins, and complex disease mapping. Annu Rev Genomics Hum Genet. 9:403–433.

Candille SI, Absher DM, Beleza S, Bauchet M, McEvoy B, Garrison NA, Li JZ, Myers RM, Barsh GS, Tang H, et al. 2012. Genome-wide association studies of quantitatively measured skin, hair, and eye pigmentation in four European populations. PLoS One 7(10):e48294.

Cerny V, Mulligan CJ, Fernandes V, Silva NM, Alshamali F, Non A, Harich N, Cherni L, El Gaaied AB, Al-Meeri A, et al. 2011. Internal diversification of mitochondrial haplogroup R0a reveals post-last glacial maximum demographic expansions in South Arabia. Mol Biol Evol. 28(1):71–78.

Cerny V, Pereira L, Kujanova M, Vasikova A, Hajek M, Morris M, Mulligan CJ. 2009. Out of Arabia – the settlement of island Soqotra as revealed by mitochondrial and Y chromosome genetic diversity. Am J Phys Anthropol. 138(4):439–447.

Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. 2015. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4:7.

Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, et al. 2011. The variant call format and VCFtools. Bioinformatics 27(15):2156–2158.

Delaneau O, Marchini J, Zagury JF. 2011. A linear complexity phasing method for thousands of genomes. Nat Methods. 9(2):179–181.

Emmrich S, Streltsov A, Schmidt F, Thangapandi VR, Reinhardt D, Klusmann JH. 2014. LincRNAs MONC and MIR100HG act as oncogenes in acute megakaryoblastic leukemia. Mol Cancer. 13:171.

Fernandes V, Alshamali F, Alves M, Costa MD, Pereira JB, Silva NM, Cherni L, Harich N, Cerny V, Soares P, et al. 2012. The Arabian cradle: mitochondrial relicts of the first steps along the southern route out of Africa. Am J Hum Genet. 90(2):347–355.

Fernandes V, Triska P, Pereira JB, Alshamali F, Rito T, Machado A, Fajkosova Z, Cavadas B, Cerny V, Soares P, et al. 2015. Genetic stratigraphy of key demographic events in Arabia. PLoS One 10(3):e0118625.

Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, Belmont JW, Boudreau A, Hardenbol P, Leal SM, et al. 2007. A second generation human haplotype map of over 3.1 million SNPs. Nature 449(7164):851–861.

Fu Q, Posth C, Hajdinjak M, Petr M, Mallick S, Fernandes D, Furtwangler A, Haak W, Meyer M, Mittnik A, et al. 2016. The genetic history of Ice Age Europe. Nature 534(7606):200–205.

Gallego Romero I, Basu Mallick C, Liebert A, Crivellaro F, Chaubey G, Itan Y, Metspalu M, Eaaswarkhanth M, Pitchappan R, Villems R, et al. 2012. Herders of Indian and European cattle share their predominant allele for lactase persistence. Mol Biol Evol. 29(1):249–260.

Gravel S, Zakharia F, Moreno-Estrada A, Byrnes JK, Muzzio M, Rodriguez-Flores JL, Kenny EE, Gignoux CR, Maples BK, Guiblet W, et al. 2013. Reconstructing Native American migrations from whole-genome and whole-exome data. PLoS Genet. 9(12):e1004023.

Hellenthal G, Busby GBJ, Band G, Wilson JF, Capelli C, Falush D, Myers S. 2014. A genetic atlas of human admixture history. Science 343(6172):747–751.

Hernandez-Pacheco N, Flores C, Alonso S, Eng C, Mak ACY, Hunstman S, Hu D, White MJ, Oh SS, Meade K, et al. 2017. Identification of a novel locus associated with skin colour in African-admixed populations. Sci Rep. 7:44548.

Hodgson JA, Mulligan CJ, Al-Meeri A, Raaum RL. 2014. Early back-to-Africa migration into the horn of Africa. PLoS Genet. 10(6):e1004393.

Hogarth DG. 1904. The penetration of Arabia: a record of the development of western knowledge concerning the Arabian Peninsula. New York: Cambridge Library Collection.

Hunter-Zinck H, Musharoff S, Salit J, Al-Ali KA, Chouchane L, Gohar A, Matthews R, Butler MW, Fuller J, Hackett NR, et al. 2010. Population genetic structure of the people of Qatar. Am J Hum Genet. 87(1):17–25.

John SE, Thareja G, Hebbar P, Behbehani K, Thanaraj TA, Alsmadi O. 2015. Kuwaiti population subgroup of nomadic Bedouin ancestry – whole genome sequence and analysis. Genom Data. 3:116–127.

Kivisild T, Reidla M, Metspalu E, Rosa A, Brehm A, Pennarun E, Parik J, Geberhiwot T, Usanga E, Villems R. 2004. Ethiopian mitochondrial DNA heritage: tracking gene flow across and around the gate of tears. Am J Hum Genet. 75(5):752–770.

Laayouni H, Oosting M, Luisi P, Ioana M, Alonso S, Ricaño-Ponce I, Trynka G, Zhernakova A, Plantinga TS, Cheng S-C, et al. 2014. Convergent evolution in European and Rroma populations reveals pressure exerted by plague on Toll-like receptors. Proc Natl Acad Sci U S A. 111(7):2668–2673.

Lahr MM, Foley R. 2005. Multiple dispersals and modern human origins. Evol Anthropol. 3(2):48–60.

Langergraber KE, Prufer K, Rowney C, Boesch C, Crockford C, Fawcett K, Inoue E, Inoue-Muruyama M, Mitani JC, Muller MN, et al. 2012. Generation times in wild chimpanzees and gorillas suggest earlier divergence times in great ape and human evolution. Proc Natl Acad Sci U S A. 109(39):15716–15721.

Laso-Jadart R, Harmant C, Quach H, Zidane N, Tyler-Smith C, Mehdi Q, Ayub Q, Quintana-Murci L, Patin E. 2017. The genetic legacy of the Indian Ocean slave trade: recent admixture and post-admixture selection in the Makranis of Pakistan. Am J Hum Genet. 101(6):977–984.

Lawson DJ, Hellenthal G, Myers S, Falush D. 2012. Inference of population structure using dense haplotype data. PLoS Genet. 8(1):e1002453.

Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S, Cann HM, Barsh GS, Feldman M, Cavalli-Sforza LL, et al. 2008. Worldwide human relationships inferred from genome-wide patterns of variation. Science 319(5866):1100–1104.

Loh PR, Lipson M, Patterson N, Moorjani P, Pickrell JK, Reich D, Berger B. 2013. Inferring admixture histories of human populations using linkage disequilibrium. Genetics 193(4):1233–1254.

Lovejoy PE. 2011. Transformations in slavery: a history of slavery in Africa. Cambridge: Cambridge University Press.

Macaulay V, Hill C, Achilli A, Rengo C, Clarke D, Meehan W, Blackburn J, Semino O, Scozzari R, Cruciani F, et al. 2005. Single, rapid coastal settlement of Asia revealed by analysis of complete mitochondrial genomes. Science 308(5724):1034.

Malaspinas A-S, Westaway MC, Muller C, Sousa VC, Lao O, Alves I, Bergstrom A, Athanasiadis G, Cheng JY, Crawford JE, et al. 2016. A genomic history of Aboriginal Australia. Nature 538(7624):207.

Mallick S, Li H, Lipson M, Mathieson I, Gymrek M, Racimo F, Zhao M, Chennagiri N, Nordenfelt S, Tandon A, et al. 2016. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538(7624):201.

Maples BK, Gravel S, Kenny EE, Bustamante CD. 2013. RFMix: a discriminative modeling approach for rapid and robust local-ancestry inference. Am J Hum Genet. 93(2):278–288.

McManus KF, Taravella AM, Henn BM, Bustamante CD, Sikora M, Cornejo OE. 2017. Population genetic analysis of the DARC locus (Duffy) reveals adaptation from standing variation associated with malaria resistance in humans. PLoS Genet. 13(3):e1006560.

Moorjani P, Patterson N, Hirschhorn JN, Keinan A, Hao L, Atzmon G, Burns E, Ostrer H, Price AL, Reich D. 2011. The history of African gene flow into Southern Europeans, Levantines, and Jews. PLoS Genet. 7(4):e1001373.

Musilova E, Fernandes V, Silva NM, Soares P, Alshamali F, Harich N, Cherni L, Gaaied AB, Al-Meeri A, Pereira L, et al. 2011. Population history of the Red Sea – genetic exchanges between the Arabian Peninsula and East Africa signaled in the mitochondrial DNA HV1 haplogroup. Am J Phys Anthropol. 145:592–598.

Ozato K, Shin D-M, Chang T-H, Morse HC. 2008. TRIM family proteins and their emerging roles in innate immunity. Nat Rev Immunol. 8(11):849–860.

Patin E, Lopez M, Grollemund R, Verdu P, Harmant C, Quach H, Laval G, Perry GH, Barreiro LB, Froment A, et al. 2017. Dispersals and genetic adaptation of Bantu-speaking populations in Africa and North America. Science 356(6337):543–546.

Patterson N, Moorjani P, Luo Y, Mallick S, Rohland N, Zhan Y, Genschoreck T, Webster T, Reich D. 2012. Ancient admixture in human history. Genetics 192(3):1065.

Patterson N, Price AL, Reich D. 2006. Population structure and eigenanalysis. PLoS Genet. 2(12):e190.

Petkova D, Novembre J, Stephens M. 2016. Visualizing spatial population structure with estimated effective migration surfaces. Nat Genet. 48(1):94–100.

Pickrell JK, Coop G, Novembre J, Kudaravalli S, Li JZ, Absher D, Srinivasan BS, Barsh GS, Myers RM, Feldman MW, et al. 2009. Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 19(5):826–837.

Pickrell JK, Pritchard JK. 2012. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. 8(11):e1002967.

Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. 2006. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 38(8):904–909.

Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, et al. 2007. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 81(3):559–575.

Quintana-Murci L, Semino O, Bandelt H-J, Passarino G, McElreavey K, Santachiara-Benerecetti AS. 1999. Genetic evidence of an early exit of Homo sapiens sapiens from Africa through eastern Africa. Nat Genet. 23(4):437.

R Development Core Team. 2018. R: a language and environment for statistical computing. Vienna (Austria): R Foundation for Statistical Computing.

Racimo F. 2016. Testing for ancient selection using cross-population allele frequency differentiation. Genetics 202(2):733–750.

Ranciaro A, Campbell MC, Hirbo JB, Ko W-Y, Froment A, Anagnostou P, Kotze MJ, Ibrahim M, Nyambo T, Omar SA, et al. 2014. Genetic origins of lactase persistence and the spread of pastoralism in Africa. Am J Hum Genet. 94(4):496–510.

Reich D, Thangaraj K, Patterson N, Price AL, Singh L. 2009. Reconstructing Indian population history. Nature 461(7263):489–494.

Roche B, Rougeron V, Quintana-Murci L, Renaud F, Abbate JL, Prugnolle F. 2017. Might interspecific interactions between pathogens drive host evolution? The case of Plasmodium species and Duffy-negativity in human populations. Trends Parasitol. 33(1):21–29.

Rodriguez-Flores JL, Fakhro K, Agosto-Perez F, Ramstetter MD, Arbiza L, Vincent TL, Robay A, Malek JA, Suhre K, Chouchane L, et al. 2016. Indigenous Arabs are descendants of the earliest split from ancient Eurasian populations. Genome Res. 26(2):151–162.

Rosenberg NA, Huang L, Jewett EM, Szpiech ZA, Jankovic I, Boehnke M. 2010. Genome-wide association studies in diverse populations. Nat Rev Genet. 11(5):356–366.

Russo A, Di Gaetano C, Cugliari G, Matullo G. 2018. Advances in the genetics of hypertension: the effect of rare variants. Int J Mol Sci. 19(3):688.

Segal R. 2002. Islam's black slaves: the other black diaspora. London: Atlantic Books.

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T. 2003. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13(11):2498–2504.

Skoglund P, Thompson JC, Prendergast ME, Mittnik A, Sirak K, Hajdinjak M, Salie T, Rohland N, Mallick S, Peltzer A, et al. 2017. Reconstructing prehistoric African population structure. Cell 171(1):59–71.e21.

Snow RW, Amratia P, Zamani G, Mundia CW, Noor AM, Memish ZA, Al Zahrani MH, Al Jasari A, Fikri M, Atta H. 2013. The malaria transition on the Arabian Peninsula: progress toward a malaria-free region between 1960–2010. Adv Parasitol. 82:205–251.

Soares P, Alshamali F, Pereira JB, Fernandes V, Silva NM, Afonso C, Costa MD, Musilova E, Macaulay V, Richards MB, et al. 2012. The expansion of mtDNA haplogroup L3 within and out of Africa. Mol Biol Evol. 29(3):915–927.

Szpiech ZA, Hernandez RD. 2014. selscan: an efficient multithreaded program to perform EHH-based scans for positive selection. Mol Biol Evol. 31(10):2824–2827.

Tekola-Ayele F, Adeyemo A, Chen G, Hailu E, Aseffa AA-O, Davey GA-O, Newport MJ, Rotimi CA-OX. 2015. Novel genomic signals of recent selection in an Ethiopian population. Eur J Hum Genet. 23(8):1085–1092.

Thompson EE, Kuttab-Boulos H, Witonsky D, Yang L, Roe BA, Di Rienzo A. 2004. CYP3A variation and the evolution of salt-sensitivity variants. Am J Hum Genet. 75(6):1059–1069.

Tishkoff SA, Reed FA, Ranciaro A, Voight BF, Babbitt CC, Silverman JS, Powell K, Mortensen HM, Hirbo JB, Osman M, et al. 2007. Convergent adaptation of human lactase persistence in Africa and Europe. Nat Genet. 39(1):31–40.

Triska P, Soares P, Patin E, Fernandes V, Cerny V, Pereira L. 2015. Extensive admixture and selective pressure across the Sahel Belt. Genome Biol Evol. 7(12):3484–3495.

Uchil PD, Quinlan BD, Chan W-T, Luna JM, Mothes W. 2008. TRIM E3 ligases interfere with early and late stages of the retroviral life cycle. PLoS Pathog. 4(2):e16.

Voight BF, Kudaravalli S, Wen X, Pritchard JK. 2006. A map of recent positive selection in the human genome. PLoS Biol. 4(3):e72.

Vyas DN, Al-Meeri A, Mulligan CJ. 2017. Testing support for the northern and southern dispersal routes out of Africa: an analysis of Levantine and southern Arabian populations. Am J Phys Anthropol. 164(4):736–749.

Vyas DN, Kitchen A, Miro-Herrans AT, Pearson LN, Al-Meeri A, Mulligan CJ. 2016. Bayesian analyses of Yemeni mitochondrial genomes suggest multiple migration events with Africa and Western Eurasia. Am J Phys Anthropol. 159(3):382–393.

Wagh K, Bhatia A, Alexe G, Reddy A, Ravikumar V, Seiler M, Boemo M, Yao M, Cronk L, Naqvi A, et al. 2012. Lactase persistence and lipid pathway selection in the Maasai. PLoS One 7(9):e44751.

Yin T, Cook D, Lawrence M. 2012. ggbio: an R package for extending the grammar of graphics for genomic data. Genome Biol. 13(8):R77.

Yu Y, Ouyang Y, Yao W. 2018. shinyCircos: an R/Shiny application for interactive creation of Circos plot. Bioinformatics 34(7):1229–1231.

Zhao F, McParland S, Kearney F, Du L, Berry DP. 2015. Detection of selection signatures in dairy and beef cattle using high-density genomic information. Genet Sel Evol. 47:49.

Persian ancestry, and this South Asian component was similarly frequent in the UAE and Oman (23–26%), falling to below 10% frequency in all other Arabian and Levant populations.

Principal component analysis (PCA) revealed an identical population structure (supplementary fig. S3, Supplementary Material online). The first two principal components separated the individuals along well-established geographical axes. PC1 explained most of the diversity (79.9%) and separated sub-Saharan African populations from all others, with admixed Arabian and Levantine individuals in between. PC2 (9.2%) separated Arabians and Levantines from Europeans and South Asians. Most AP populations formed a cluster with Bedouin individuals and were very close to Levantine populations, indicating that these individuals were less admixed with non-Arabian populations. A variable fraction of AP individuals displayed signs of admixture with North and sub-Saharan African populations, spreading across the X axis. In Iran, Oman, and the UAE, some admixed individuals were more dispersed along the Y axis, concentrating between Levantine and South Asian populations.
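The proportion of diversity attributed to each principal component, as quoted above for PC1 and PC2, is the share of total variance captured by that axis. The sketch below shows the computation with a plain singular value decomposition of a centered genotype matrix. It is a generic PCA and an assumption for illustration: it omits the per-SNP normalization described by Patterson et al. (2006), and the function name and toy matrix are hypothetical.

```python
import numpy as np

def pca_explained_variance(genotypes, n_components=2):
    """Return (PC coordinates, variance share per PC) for a genotype matrix.

    genotypes: array of shape (n_individuals, n_snps) holding 0/1/2 dosages.
    """
    x = np.asarray(genotypes, dtype=float)
    x = x - x.mean(axis=0)                      # center each SNP
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    variance = s ** 2                           # proportional to each PC's variance
    ratio = variance / variance.sum()
    coords = u[:, :n_components] * s[:n_components]
    return coords, ratio[:n_components]

# Toy data (hypothetical): two groups of two individuals fixed for different alleles
toy_genotypes = np.array([[0, 0, 0], [0, 0, 0], [2, 2, 2], [2, 2, 2]])
coords, ratio = pca_explained_variance(toy_genotypes)
```

In this toy case all variation lies between the two groups, so the first component carries the whole variance and places the groups on opposite sides of the axis.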

FIG. 1. Population structure inferred by ADMIXTURE analysis (A), in which each individual is represented by a vertical (100%) stacked column of genetic component proportions shown in color for K = 6. Shared IBD fragments between pairs of individuals from the Arabian Peninsula, Levant, and Egypt + East Africa, filtering for IBD < 2 cM (B), IBD 2–5 cM (C), IBD 10–20 cM (D), and IBD > 20 cM (E). Populations are represented by a circle proportional to the sample size of the populations. The Levantine populations are represented in pink, AP in green, and Egyptian + East African in black. Shared IBD fragments are represented by a line, in pink to Levant and in green to AP. The maps were adapted via Wikimedia Commons.

Gene Flow in the Region and Dating of Admixture

The set of algorithms based on haplotype sharing was tested in the "extended data set" (supplementary table S2, Supplementary Material online). Refined identity-by-descent (IBD) (Browning and Browning 2013) (fig. 1B–E) was applied to AP, Levantine, Egyptian, and East African populations and allowed us to identify shared IBD segments. Since the length of shared haplotypes decreases over time, segment lengths of <2 cM and 2–5 cM are indicative of old common ancestry. These results attested to the genetic continuity in the region, given the high level of haplotype sharing. Although not highly differentiated, the Levant seemed to share a lower amount of IBD with Kenya, Somalia, and Ethiopia than the AP shares with these populations. When filtering for shared IBD lengths of 10–20 cM and >20 cM (younger shared history), the southern sharing (AP and East Africa) was dominant.
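The link between segment length and age used above can be made explicit with a small sketch. Under a common approximation, a segment inherited from a common ancestor g generations back has an expected length of 100/(2g) cM, so short segments point to older shared ancestry. The code also tallies segments per population pair in the length classes of fig. 1B–E. The function names, the handling of class boundaries, and the toy segments are assumptions for illustration and are not part of Beagle.

```python
from collections import Counter

def expected_ibd_length_cm(generations):
    """Expected IBD segment length (cM) for a common ancestor g generations back."""
    return 100.0 / (2.0 * generations)

def ibd_length_class(length_cm):
    """Assign a segment length (cM) to one of the reported length classes."""
    if length_cm < 2:
        return "<2 cM"
    if length_cm <= 5:
        return "2-5 cM"
    if 10 <= length_cm <= 20:
        return "10-20 cM"
    if length_cm > 20:
        return ">20 cM"
    return None  # segments between 5 and 10 cM fall outside the reported classes

def count_sharing(segments):
    """Count segments per (unordered population pair, length class).

    segments: iterable of (population1, population2, length_cm) tuples.
    """
    counts = Counter()
    for pop1, pop2, length_cm in segments:
        label = ibd_length_class(length_cm)
        if label is not None:
            counts[(tuple(sorted((pop1, pop2))), label)] += 1
    return counts

# Hypothetical segments, for illustration only
toy_segments = [("Yemen", "Ethiopia", 12.0), ("Ethiopia", "Yemen", 15.5),
                ("Oman", "Lebanon", 1.2), ("Oman", "Iran", 7.0)]
sharing = count_sharing(toy_segments)
```

With this approximation, a 2 cM segment corresponds to an ancestor roughly 25 generations back, whereas a 20 cM segment corresponds to only a few generations.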

The estimated effective migration surfaces (EEMS) plot (fig. 2A) revealed a barrier between West and East AP populations. East Africa shared a migration surface with Saudi Arabia and Yemen populations, whereas another genetic corridor was evident between the Levant, Iran, and the East AP. These results were corroborated by Wright's FST metric (supplementary fig. S4, Supplementary Material online), with Saudi Arabia being genetically closer to Yemen (0.002) and the UAE closer to Oman (0.001) than the two sides of the Peninsula were to each other (0.006). In terms of FST pairwise comparisons between AP and South Asian populations, the UAE displayed the lowest values (0.003 with Iran, 0.007 with Makrani, and 0.009 with Balochi), and Iran was closer to the Levant and the East AP (0.003 with Syria, Jordan, and Lebanon; 0.003 with the UAE; and 0.004 with Oman) than to South Asian populations (0.005 with Balochi, 0.006 with Pathan, and 0.032 with India). All AP populations showed a lower distance from the Levant (0.004–0.008) than from Europe (0.012–0.027), East Africa (0.084–0.110), and, finally, West Africa (0.095–0.123). The same pattern of west–east diversity in the Peninsula was observed for the f3 statistics (supplementary fig. S5 and supplementary table S3, Supplementary Material online), with detection of African and Levant/European admixture in all Arabian populations and admixture with South Asian ancestry limited to the East AP.
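For readers who want to see what a pairwise FST value measures, the sketch below uses the textbook heterozygosity definition, FST = (HT - HS)/HT, summed over biallelic SNPs for two populations weighted equally. This is a deliberate simplification and not the estimator implemented in VCFtools, which follows Weir and Cockerham and accounts for sample sizes; the function name and the toy frequencies are hypothetical.

```python
import numpy as np

def wright_fst(p1, p2):
    """Textbook FST = (HT - HS) / HT from per-SNP allele frequencies in two populations."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)                            # expected heterozygosity, pooled
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0  # mean within populations
    return float((h_t - h_s).sum() / h_t.sum())                  # ratio of sums across SNPs

# Toy frequencies (hypothetical)
fst_same = wright_fst([0.5, 0.2], [0.5, 0.2])   # identical populations
fst_fixed = wright_fst([1.0], [0.0])            # fixed difference
fst_mild = wright_fst([0.4], [0.6])             # modest frequency difference
```

Identical populations give 0 and a fixed difference gives 1, so the values of 0.001 to 0.006 reported within the Peninsula correspond to very small allele-frequency differences.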

The clustered coancestry matrix from the fineSTRUCTURE and ChromoPainter analyses, which explores patterns of haplotype sharing among groups,

FIG. 2. Estimated effective migration surfaces gradient map around the Arabian Peninsula (A). The color scale reveals low (light gray) and high (dark gray) genetic barriers between populations localized on a grid of 500 demes. The scale represents log values of the effective migration rates (m). Each deme (black dot) is proportional to the sample sizes of the populations. Map was generated with the R package worldmap. Dates of admixture events (B) in AP and Iranian populations, as estimated by GLOBETROTTER. Pie charts reflect the admixture proportions of ancestries listed in the legend. Open circles in the date of admixture refer to one event, whereas filled circles refer to multiple events, and lines represent the 95% confidence interval.

classified AP and Iranian individuals into 12 clusters, attesting to their diverse genetic reservoir (supplementary fig. S6, Supplementary Material online). Proportions of ancestries reflected the ones obtained in ADMIXTURE: AP and Iranian individuals were mostly clustered with Levant/North African populations (92% from Yemen, 92% Iran, 92% the UAE, 84% Saudi Arabia, and 80% Oman) and, to a lesser extent, with South Asians (4–8% of UAE and Oman individuals) and sub-Saharan Africans, including Comoro, Madagascar, and Swahili populations (12% Oman, 12% Saudi Arabia, and 4% Iran).

Overall, GLOBETROTTER results highlighted recent admixture events in the AP region (fig. 2B and supplementary table S4, Supplementary Material online). Between 400 and 1,000 years before present, admixture events took place with East African ancestry in all AP populations. The non-African background was similar to a European background in all populations except in the UAE and Iran, where a closeness to South Asian ancestry was identified. The sub-Saharan African proportion was twice as high in the West AP as in the East AP. In Saudi Arabia, the UAE, and Oman, multidate events were more probable, but the second event was even younger (67, 153, and 160 BP, respectively) and involved the same ancestries as the older event.

Local Ancestry Inference

The RFMix software package (Maples et al. 2013) was used to analyze the genomes of admixed Arabian and Iranian populations and to detect regions showing an excess of a given parental population (sub-Saharan African or South Asian, to the detriment of the main European/Levant background). Results revealed statistically significant enrichment of a particular ancestry in certain regions of the genome, which are associated with selection signals due to adaptive traits such as malaria resistance, skin pigmentation, and lactose tolerance (fig. 3A and B; supplementary figs. S7 and S8, Supplementary Material online, show results for all tested populations; supplementary table S5, Supplementary Material online).

The region of the genome with the highest proportion of sub-Saharan African ancestry (>4SD) observed in Arabian and Iranian populations was located on chromosome 1, in a gene-rich region containing the DARC gene and several olfactory receptor genes, such as OR10J1, OR10J3, and OR10J5. This sub-Saharan African input was higher in Yemen (a mean of 0.61; supplementary fig. S9, Supplementary Material online) than in the other populations (0.49 in Oman, 0.35 in the UAE, 0.28 in Saudi Arabia, and 0.16 in Iran). Another region highly enriched in all populations for sub-Saharan African ancestry was located on chromosome 9, containing genes coding for forkhead box D4 (FOXD4L4 and FOXD4L2) and spermatogenesis-associated proteins (FAM75A5 and FAM75A7).

The RFMix analysis indicated a South Asian-enriched region on chromosome 2 containing the RAB3GAP1, ZRANB3, LCT, MCM6, and CXCR4 genes. These genes have previously been associated with lactase persistence, the ability to digest milk into adulthood, which is an adaptive trait mostly found in European populations and a classic example of genetic adaptation in humans (Tishkoff et al. 2007; Campbell and Tishkoff 2008; Triska et al. 2015). This region reached highly significant values (>4SD) in Saudi Arabia and Yemen and was also significant in the UAE and Iran (3SD < X < 4SD). Another South Asian-enriched region (3SD < X < 4SD) was detected on chromosome 5 in Yemen, Saudi Arabia, and the UAE, containing the SLC45A2 gene, associated with skin, eye, and hair pigmentation. Nevertheless, the most significant South Asian-enriched signal detected in all populations (>4SD) was located on chromosome 6, containing many genes related to the immune system (HLA-B, HLA-DR, HLA-DQ, MICA, MICB, and TNF).

Selection in the AP and Iran

When checking positive selection signals in the AP and Iran by using the integrated haplotype score (iHS) and cross-population extended haplotype homozygosity (XP-EHH) measures, several genomic regions displayed significant values (fig. 3C–F; supplementary figs. S10–S13 and supplementary tables S6–S9, Supplementary Material online).

In particular, several signals previously associated with selection in non-African populations were detected. East AP and Iranian populations exhibited a strong signal of positive selection (>4 SD) in the XP-EHH-European comparison in the chromosome 2 region containing the R3HDM1 and LCT genes, which are associated in Europeans with food conversion efficiency and lactose metabolism (Zhao et al. 2015). The SLC24A5 gene on chromosome 15, which is strongly associated with skin pigmentation variation among European and South Asian populations (Basu Mallick et al. 2013), presented signs of positive selection in all populations in the iHS (>4 SD for Yemen, Saudi Arabia, and the UAE; >3 SD for Oman and Iran) and XP-EHH-South Asian (>3 SD for all populations) analyses. Another skin pigmentation-related gene, SLC45A2, which is located on chromosome 5 and is known to be strongly associated with this phenotype in European populations (Candille et al. 2012; Hernandez-Pacheco et al. 2017), displayed signs of positive selection in all populations for the XP-EHH-European comparison (>4 SD in Saudi Arabia and >3 SD in the remaining populations). The same occurred for the chromosome 4 region containing the TLR1, TLR6, and TLR10 genes (role in pathogen recognition and activation of innate immunity [Laayouni et al. 2014]; >4 SD in Yemen and Saudi Arabia and >3 SD in the remaining populations) and for the chromosome 12 region containing the ATXN2, ACAD10, and ALDH2 genes (associated with hypertension [Russo et al. 2018]; >4 SD in Saudi Arabia and >3 SD in the remaining populations).

In the comparison with African populations (XP-EHH-African), significant values (>3 SD) were observed for the chromosome 1 region containing the DARC gene in Saudi Arabia and Iran. Among the strongest signals of positive selection was the chromosome 21 region containing the C21orf34, MIR99A, and MIRLET7C genes, detected in all populations (>4 SD) in both the iHS and XP-EHH-African tests. This signal was previously identified as among the strongest selection signals in non-Africans (Pickrell et al. 2009), and recently C21orf34 was identified as MIR99AHG, a miRNA gene

Admixture and Selection in Arabian Peninsula. doi:10.1093/molbev/msz005 MBE


FIG. 3. Circos plots highlighting significantly (>4 SD) enriched local admixed and positively selected segments in Arabian Peninsula and Iranian populations. (A) Enriched sub-Saharan African ancestry. (B) Enriched South Asian ancestry. (C) iHS selection metrics. (D) XP-EHH versus European ancestry. (E) XP-EHH versus sub-Saharan African ancestry. (F) XP-EHH versus South Asian ancestry.

Fernandes et al. doi:10.1093/molbev/msz005 MBE


involved in the regulation of hematopoiesis and oncogenes for the development of myeloid leukemia (Emmrich et al. 2014). In addition, a signal in the chromosome 10 region containing the SPAG6 gene was observed in all populations in the iHS (except Yemen) and XP-EHH-African (stronger in Oman and the UAE) analyses. This gene is essential for the structural integrity of the central microtubule of sperm, its stability, and flagellar motility, and has been previously identified as being under selection before the separation between European and Asian populations (Racimo 2016). The significant iHS signal in the chromosome 7 region containing several genes (CYP3A5, ZSCAN25, ATP5J2-PTCD1, CPSF4, ATP5J2, ZNF789, and ZKSCAN5) was also detected in all populations in the XP-EHH-African comparison. The CYP3A5 gene is involved in cholesterol metabolism and steroid biosynthesis in the liver and has been identified as being under positive selection in African and non-African populations (Wagh et al. 2012). The CYP3A5*1/*3 polymorphism was also described as being under selective pressure for water retention and risk for salt-sensitive hypertension in equatorial populations (Thompson et al. 2004).

The iHS metrics identified a positively selected region on chromosome 6 in all populations, which contains candidate genes involved in stages of the human immunodeficiency virus (HIV) lifecycle: the ZNRD1-AS1 and TRIM26 genes. ZNRD1-AS1 plays a role in the regulation of cell proliferation and is essential for the completion of the HIV lifecycle. Its rs3132130 SNP was described to confer host resistance to HIV-1 acquisition by causing a loss of nuclear factor binding and decreasing the ZNRD1 promoter activity (An et al. 2014). Although their function is unknown, the tripartite motif (TRIM) proteins may contribute to innate immunity against retroviruses, affecting both early and late stages of the retroviral life cycle (Ozato et al. 2008; Uchil et al. 2008). Another region significantly selected in iHS in all populations except the Saudi Arabians was identified on chromosome 22, containing the TTC38, PKDREJ, GTSE1, and PPARA genes. The PPARA gene encodes a transcription receptor that regulates lipid and glucose metabolism during food deficiency, and a recent study (Tekola-Ayele et al. 2015) found a specific signal of recent positive selection of this gene in Ethiopia, suggesting a metabolic adaptation to high-altitude hypoxia.

Conclusions

The detailed genome-wide characterization of Arabian and Iranian populations allowed us to establish, for the first time, the palimpsest of admixture and selection along the genomes of the inhabitants of the main bridge between African, Asian, and European population groups. Admixture events comprised up to 20% of sub-Saharan African ancestry in the West and 20% of South Asian ancestry in the East of the AP, enriching the locally predominant Arabian-, North African-, Levantine-, and Caucasian-like components. The sub-Saharan African input decreased toward the East, whereas the South Asian input decreased toward the West of the Peninsula, testifying to the continuous genetic exchange with Africa and Asia, as was evident in the IBD sharing analysis. Of course, there are specific subgroups that depart from these general characteristics, such as the Qatari Bedouins, who represent a "coldspot" of admixture in the AP (Rodriguez-Flores et al. 2016), whereas the Hadrami in Yemen are a "hotspot" of admixture (Vyas et al. 2017). EEMS analysis identified higher exchanges of genes between Africa and the West AP, as well as between the East AP and the Levant/South Asia, than between the West and the East AP. This result agrees with the previously documented degree of isolation between West and East AP populations identified when studying uniparental markers (Alshamali et al. 2009; Fernandes et al. 2012, 2015). We previously found relict maternal lineages (60 ka) of the original successful OOA settlement by modern humans in the AP, and most likely they spread to South Asia and the Levant/Europe from the Gulf corridor (Fernandes et al. 2012). However, current LD decay-based dates only indicated recent admixture events in the eastern side of the Peninsula. Similarly, younger ages of admixture in the West AP reflected the Islamization of North Africa as the most probable main contributor to this region. The exchanges across the Red Sea detected for the autosomal data matched the Arab slave trade and maritime dominance.

Recent studies have provided evidence that admixed populations acquired positively selected genomic regions from their parental populations; in other words, the distribution of admixed blocks across the genomes of descendants was not even (Triska et al. 2015; Laso-Jadart et al. 2017; Patin et al. 2017). The acquired selected regions can be inferred from the concomitant identification of the same genomic regions in the selection metrics and in the local ancestry enrichment analysis. AP populations displayed a jigsaw puzzle of African-, European-, and South Asian-selected genes linked to response to infectious diseases, skin pigmentation, lactase persistence, and food conversion.

The strongest African-related enrichment signal was for the DARC gene, which codes for the Duffy antigen, a membrane-bound chemokine receptor used by Plasmodium vivax to infect red blood cells. Plasmodium vivax causes a chronic form of malaria and is the most widespread type of malaria outside Africa (McManus et al. 2017). The derived C allele mutation in the promoter, which causes erythroid-specific suppression of gene expression, has been described as conferring resistance against Plasmodium vivax malaria, and thus it is almost fixed in African populations (Triska et al. 2015; Laso-Jadart et al. 2017; McManus et al. 2017). The AP is known to have had a diverse malaria epidemiology in recent times, from a likely absence of transmission in Kuwait to an intense parasite exposure, similar to conditions in many parts of Africa, among residents of Saudi Arabia and Yemen (Snow et al. 2013). The enrichment signal detected here in the genomes of Arabians supported the higher malaria pressure in the West AP. The enrichment was strong enough to allow detection of selection in the XP-EHH metrics in Saudi Arabia and Iran.

The most significant European/South Asian-related enrichments in AP populations were linked to the lactose tolerance-specific chromosome 2 RAB3GAP1/LCT/MCM6 region, skin pigmentation associated with the SLC24A5 gene, and the HLA region on chromosome 6. The LCT gene -13915*G allele (rs41380347) was described as being present at high frequency in the AP and the Levant and was estimated to have originated 4 ka (±2 ka) in the AP (Bayoumi et al. 2016), possibly as a result of the domestication of the Arabian camel at 6 ka (Ranciaro et al. 2014). By contrast, the European -13910*T selected allele (rs4988235) is also the most commonly observed in South Asia (Gallego Romero et al. 2012). We could not verify frequencies of the rs41380347 SNP, which was not part of the chip genotyped here, but we checked the linked rs4988243 SNP. This last SNP attested to the higher frequencies in the West versus the East AP (43.2% in Yemen, 46.1% in Saudi Arabia, 17.9% in Oman, 19.1% in the UAE, and 17.4% in Iran; notice that, although rs41380347 and rs4988243 are linked, this last SNP has overall frequencies 20% higher than the first). The genotyped European rs4988235 SNP presented higher frequencies of the derived allele in the East than in the West AP (0.6%, 1.3%, 12.5%, 18.2%, and 10.3%, respectively). The haplotype backgrounds of the Arabian and European derived alleles are related and distant from the African one (Ranciaro et al. 2014), which explains the high non-African enrichment detected in the RFMix analysis in the West AP (the Arabian haplotypes) and the selection signal in the XP-EHH-European comparison in the East AP and Iran (the European haplotypes). This is an interesting case generated by the isolation between the two sides of the Peninsula, where the West developed a local positive selection of a lactose-tolerant variant, whereas the East acquired the European/South Asian variant. Previously, this difference had been reported for ethnic groups within Oman (Bayoumi et al. 2016), where frequencies of the -13910*T and -13915*G alleles diverged between Arabs of northern and southern Oman, who migrated from Yemen (0–1% and 14–72%, respectively), and Omanis of Asian origin (16% and 0%, respectively). Here, we verified that this pattern extends to the entire Peninsula.

In relation to the genetic diversity linked to skin color, the European/South Asian-selected derived SNPs are equally frequent in both the West and East AP (rs1426654 in the SLC24A5 gene: 10.1% in Yemen, 13.3% in Saudi Arabia, 13.2% in Oman, 13.5% in the UAE, and 11.8% in Iran; rs183671 in the SLC45A2 gene: 35.4%, 29.4%, 22.7%, 29.0%, and 40.1%, respectively; rs1667394 in the OCA2/HERC2 gene: 38.1%, 28.8%, 30.7%, 33.5%, and 34.9%, respectively).

The palimpsest of complex interactions between cross and local selective pressures and demographic factors identified here in the AP attests to the rich genetic dynamics taking place in bridging zones of the globe. In the near future, whole genomes of Arabians across the entire Peninsula (with more precise geographical information than we could present here) and more sensitive selection measures will add information on the role played by this geographic region in the adaptation of modern humans to non-African habitats. Improvements to dating techniques applicable to complete genomes, such as the multiple sequentially Markovian coalescent approach, will also help in answering the question of whether the AP was only a passageway to reach further distances or one of the oldest inhabited regions OOA. Complete genomes may indeed shed light on the lost period in the mtDNA pool due to the absence of the direct OOA descendants. The first steps have already been taken, and a few complete Arabian genomes have begun to be available (John et al. 2015; Mallick et al. 2016; Rodriguez-Flores et al. 2016).

Materials and Methods

Population Samples, Genome-Wide Genotyping, and Published Data

DNA samples analyzed in this study were collected from 420 individuals from the AP (Saudi Arabia, Yemen, Oman, and the UAE) and from 80 individuals from Iran (further information provided in supplementary table S1, Supplementary Material online). Individuals were residing in Dubai at the time of sampling (between 1991 and 2000), representing a widespread collection of birth origins from Saudi Arabia, Yemen, Oman, the UAE, and Iran. This study obtained ethical approval from the Ethics Committee of the University of Porto, Portugal (17/CEUP/2012). The individuals were genotyped with the Illumina Human Omni Express Bead Chip (OmniExpress), containing 741,000 SNPs. Quality checks were applied using PLINK (Purcell et al. 2007; Chang et al. 2015), and 22 individuals were excluded for lacking more than 10% of genotypes. A preliminary PCA revealed that some UAE and Oman individuals (five individuals from each population) had considerable sub-Saharan African ancestry, an indicator that they were most probably recent immigrants from Africa (supplementary fig. S14, Supplementary Material online). Hence, these individuals were removed from the analyses, leading to a final set of 468 individuals. Markers with genotyping call rates lower than 95% were removed; consequently, a total of 627,435 autosomal SNPs passed the quality control check. The genotypes from the individuals analyzed in this study can be accessed from the EGA repository (European Genome-Phenome Archive; accession number EGAS00001003335). In order to perform a fine-scale population structure analysis of our complete data, the 468 individuals were merged with relevant available data sets (supplementary table S1, Supplementary Material online). As several population genetic analyses assume independent markers, SNPs were pruned for pairwise LD in PLINK (www.cog-genomics.org/plink/1.9/; Purcell et al. 2007; Chang et al. 2015) by removing any SNP that had an r² > 0.4 with another SNP within a 50-SNP sliding window with a step of 20 SNPs. After pruning, the "restricted data set" included 176,309 autosomal SNPs and 1,782 individuals (supplementary table S1, Supplementary Material online), and this data set was used in the analyses based on allele frequencies (PCA, ADMIXTURE, FST, and f3 statistic). Another data set was created by randomly selecting a maximum of 25 individuals from each population but extending the number and the geographical distribution of populations; this data set, consisting of 171,728 SNPs and 2,056 individuals, was designated as the "extended data set" (supplementary table S2, Supplementary Material online). The homogenization of the number of samples per group avoided statistical biases due to overrepresentation of some populations and allowed us to run computationally demanding algorithms such as EEMS,


fineSTRUCTURE, Chromopainter, GLOBETROTTER, and IBD. All genotypes from the unpruned "extended data set" were phased with SHAPEIT v2.r790 (Delaneau et al. 2011), using the 1000 Genomes phased data (Auton et al. 2015) as a reference panel and the HapMap phase 2 genetic map (Frazer et al. 2007). We did not merge our data with relevant data sets from Qatar and Yemen (Hunter-Zinck et al. 2010; Vyas et al. 2017), as these were screened through Affymetrix chips bearing a low number of overlapping SNPs. Qatari complete genomes from Rodriguez-Flores et al. (2016) were specifically selected and thus do not form a random population sample that could be compared with our populations.
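
As a minimal sketch of the capped random sampling used to build the "extended data set," the following Python snippet keeps at most 25 individuals per population. The function name, the input layout (pairs of sample identifier and population label), and the fixed random seed are illustrative assumptions and are not taken from the original pipeline.

```python
import random
from collections import defaultdict


def cap_per_population(samples, max_per_pop=25, seed=0):
    """Return {population: [sample ids]} with at most max_per_pop ids each.

    samples: iterable of (sample_id, population) pairs (hypothetical layout).
    Populations at or below the cap are kept whole; larger ones are
    randomly downsampled without replacement.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    by_pop = defaultdict(list)
    for sample_id, population in samples:
        by_pop[population].append(sample_id)

    kept = {}
    for population, ids in by_pop.items():
        ids = sorted(ids)  # deterministic order before sampling
        if len(ids) > max_per_pop:
            ids = sorted(rng.sample(ids, max_per_pop))
        kept[population] = ids
    return kept
```

Fixing the seed is only a convenience for reproducibility; the text does not state how the random draw was performed.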

Population Structure, Differentiation, and Inbreeding

PCA, which infers worldwide axes of human genetic variation from the allele frequencies of various populations, was carried out on the pruned "restricted data set" by using the smartpca tool included in the EIGENSOFT package (Patterson et al. 2006; Price et al. 2006). We ran ADMIXTURE (Alexander et al. 2009) to decompose genetic ancestries of the pruned "restricted data set" using maximum likelihood for components K = 2 to K = 12, with the optimal K estimated through cross-validation of the logistic regression (K = 6). Wright's FST metric was calculated using VCFtools (Danecek et al. 2011). The THREEPOP program from TreeMix was used to calculate f3 statistics (Reich et al. 2009; Patterson et al. 2012; Pickrell and Pritchard 2012), which test between alternative scenarios of admixture. In order to analyze and visualize spatial population structure from georeferenced genetic samples and to define genetic barriers and corridors, the EEMS method (Petkova et al. 2016) was used. Geographic coordinates and a genetic dissimilarity matrix between populations were used to set up a map defining a grid of 500 demes, and 3,000,000 Markov chain Monte Carlo (MCMC) iterations were run before checking for convergence. Plots were generated in R following the EEMS v1 manual indications.
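
To illustrate what an f3 statistic measures, the sketch below computes the plain moment form f3(C; A, B), that is, the average over SNPs of (c - a)(c - b), from population allele frequencies. It is a simplified illustration under stated assumptions, not the THREEPOP implementation: it omits the finite-sample bias correction and the block-based standard error that a full analysis would need, and the function name and input layout are hypothetical.

```python
def f3_naive(target, source_a, source_b):
    """Average of (c - a) * (c - b) across SNPs.

    Each argument is a list of allele frequencies (0..1) for the same SNPs,
    in the same order. A clearly negative value suggests that the target is
    admixed between populations related to the two sources.
    """
    if not target or not (len(target) == len(source_a) == len(source_b)):
        raise ValueError("frequency lists must be non-empty and equally long")
    total = sum((c - a) * (c - b)
                for c, a, b in zip(target, source_a, source_b))
    return total / len(target)


# Toy frequencies: the target sits midway between the two sources,
# so every per-SNP term is negative.
example = f3_naive([0.5, 0.5], [0.1, 0.9], [0.9, 0.1])
```

In practice the significance of a negative f3 is judged from a Z-score, which requires a standard error that this sketch does not estimate.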

Inference and Dating of Population Admixture

Population structure of the phased "extended data set" was evaluated using the fineSTRUCTURE v2.0.7 package (Lawson et al. 2012). fineSTRUCTURE captures information on population structure provided by patterns of haplotype similarity, calculated with Chromopainter v2.0 (Lawson et al. 2012), and performs a model-based Bayesian clustering of genotypes. From the results, a coancestry heat map and a dendrogram were inferred to visualize the number of statistically defined clusters that describe the data. The clusters obtained by fineSTRUCTURE were then used to assign the populations as target or donor/surrogate when testing for admixture. Mutational rates and effective population size parameters were first estimated with an Expectation-Maximization algorithm running Chromopainter v2 on all 22 autosomes for the entire data set with 10 iterations (Lawson et al. 2012). The weighted average of these parameters, according to the SNP coverage of each chromosome and the number of individuals, was then used to compute the chromosomal painting. Two Arabian groups were identified as target populations: the western group, represented by Yemen and Saudi Arabia, and the eastern group, represented by Oman and the UAE. We ran chromosomal painting for each group. Due to the possibility of common ancestry, we excluded the Levantine populations from the surrogate populations. The painted chromosomes obtained for each cluster were used in GLOBETROTTER v1.0 (Hellenthal et al. 2014) to estimate the ratios and dates of the potential admixture events in the region. Coancestry curves were estimated for each target population, standardizing against both a null (NULL: 1) and a standard (NULL: 0) individual, and consistency between the estimated parameters was checked. A total of 100 bootstrap resamplings were performed to estimate the probability values of the admixture events considering the "NULL" individual, by calculating the proportion of nonsensical inferred dates (<1 or >400 generations) produced by the model (NULL: 1). The "best-guess" scenario given by GLOBETROTTER v1.0 (Hellenthal et al. 2014) was considered for each target population. Dates of admixture given in generations were converted to chronological time using a generation interval of 29 years (Langergraber et al. 2012).
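
The two arithmetic steps described above can be sketched as follows: the empirical probability value is the share of bootstrap dates falling outside 1 to 400 generations, and a date in generations is multiplied by the 29-year generation interval. This is a minimal illustration; the function names and the toy bootstrap values are hypothetical and do not reproduce GLOBETROTTER output.

```python
YEARS_PER_GENERATION = 29  # generation interval stated in the text


def generations_to_years(generations, years_per_generation=YEARS_PER_GENERATION):
    """Convert an admixture date in generations to years before sampling."""
    return generations * years_per_generation


def nonsensical_fraction(bootstrap_dates, low=1, high=400):
    """Share of bootstrap dates (in generations) that are < low or > high."""
    if not bootstrap_dates:
        raise ValueError("at least one bootstrap date is required")
    bad = sum(1 for g in bootstrap_dates if g < low or g > high)
    return bad / len(bootstrap_dates)


# Toy values only: two of the four dates fall outside the accepted range.
toy_dates = [0.5, 10, 20, 500]
toy_p = nonsensical_fraction(toy_dates)
```

A date of 10 generations, for example, corresponds to 290 years before the sampling period under this interval.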

Haplotype sharing between pairs of individuals was estimated by the refined IBD algorithm of Beagle v4.0 (Browning and Browning 2007) and visualized with Cytoscape v3.2.1 (Shannon et al. 2003). All maps in the present study were generated using Global Mapper v15 (http://www.bluemarblegeo.com/products/global-mapper.php).

Local Ancestry Inference

The RFMix software (Maples et al. 2013), which employs an LD model between markers to infer ancestry for each segment of the genome from a mixture of putative ancestral panels of haplotypes, was used on the unpruned and phased data set from the four AP populations plus Iran. The phased data from the 1000 Genomes populations of Great Britain (representing European ancestry), Yoruba from Nigeria (sub-Saharan African ancestry), and Indian Telugu (South Asian ancestry) were used as parental populations. Information on ancestry was obtained for each locus along chromosomes for every individual, and these values were averaged in each population. In order to identify regions that had a significantly higher proportion of a given parental ancestry, we followed previously published studies (Bryc et al. 2015), considering only regions outside the range defined by the genome mean ± 4 SD. The karyograms of the local ancestry across the 22 chromosomes were generated in R using the ggbio package (Yin et al. 2012).
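
The outlier rule described above (population-averaged ancestry outside the genome mean ± 4 SD) can be sketched in a few lines of Python. This is an illustration under assumptions: the input is taken to be one averaged ancestry proportion per locus, the population form of the standard deviation is used because the text does not specify one, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def enriched_loci(ancestry, n_sd=4.0):
    """Return (index, z) for loci whose ancestry deviates by more than n_sd SD.

    ancestry: per-locus proportion of one parental ancestry, already
    averaged over the individuals of a population.
    """
    mu = mean(ancestry)
    sd = pstdev(ancestry)  # assumption: population SD over all loci
    if sd == 0:
        return []  # no variation, so nothing can be an outlier
    return [(i, (x - mu) / sd)
            for i, x in enumerate(ancestry)
            if abs(x - mu) > n_sd * sd]


# Toy genome: 99 loci at 10% of the ancestry and one locus at 60%.
toy_hits = enriched_loci([0.10] * 99 + [0.60])
```

Lowering `n_sd` to 3 would reproduce the less stringent 3 SD band reported alongside the 4 SD threshold in the Results.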

Analysis of Selection

iHS (Voight et al. 2006) was estimated with the SelScan package (Szpiech and Hernandez 2014) on the phased data set from the four AP populations plus Iran. Scores were calculated for each SNP within the population as a whole and standardized within each of 100 bins of allele frequency using the norm tool. In order to identify loci potentially affected by positive selection, a percentile score was assigned based on the proportion of extreme iHS values in each segment. The SelScan tool (Szpiech and Hernandez 2014) was also used to estimate cross-population extended haplotype homozygosity in each AP population by comparing with Great Britain, Yoruba from Nigeria, or Indian Telugu. Consistent with other tests for selection, only markers with MAF > 1% were used for XP-EHH computation. The obtained XP-EHH values were normalized by subtracting the genome-wide mean XP-EHH and dividing by the standard deviation (Szpiech and Hernandez 2014). Genomic regions with percentile scores exceeding 4 SD and 3 SD were reported.
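
The two normalizations described above can be sketched as follows: iHS values are standardized within allele-frequency bins, whereas XP-EHH values are standardized against the genome-wide mean and standard deviation. This is a simplified stand-in written for illustration, not the SelScan norm tool; the function names, the equal-width binning, and the use of the population standard deviation are assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_in_bins(scores, freqs, n_bins=100):
    """Z-score each value against the SNPs in the same frequency bin."""
    bins = defaultdict(list)
    for i, f in enumerate(freqs):
        # Equal-width bins on [0, 1]; a frequency of 1.0 goes to the last bin.
        bins[min(int(f * n_bins), n_bins - 1)].append(i)

    out = [0.0] * len(scores)
    for idxs in bins.values():
        vals = [scores[i] for i in idxs]
        mu, sd = mean(vals), pstdev(vals)
        for i in idxs:
            out[i] = (scores[i] - mu) / sd if sd > 0 else 0.0
    return out


def zscore(values):
    """Subtract the overall mean and divide by the overall SD."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]
```

Binning by frequency matters for iHS because the raw score depends on the derived allele frequency; XP-EHH compares two populations at the same SNP, so a single genome-wide standardization is used.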

Selected SNPs were mapped to genes within a 5-kb flanking region, and significantly selected regions were represented in circos plots generated in R (R Development Core Team 2018) using the shinyCircos graphical interface (Yu et al. 2018).
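
The SNP-to-gene assignment described above amounts to an interval test with a 5-kb margin on each side of the gene. The sketch below shows one way to express it; the function name, the tuple layout of the gene table, and the coordinates are hypothetical and serve only as an example.

```python
def genes_near_snp(chrom, pos, genes, flank=5000):
    """Names of genes whose span, extended by `flank` bp, contains the SNP.

    genes: iterable of (name, chromosome, start, end) tuples, with start
    and end given in base pairs on the same coordinate system as pos.
    """
    return [name
            for name, g_chrom, start, end in genes
            if g_chrom == chrom and start - flank <= pos <= end + flank]


# Hypothetical gene table used only to exercise the function.
toy_genes = [
    ("GENE_A", "1", 10_000, 20_000),
    ("GENE_B", "1", 30_000, 40_000),
    ("GENE_C", "2", 10_000, 20_000),
]
near = genes_near_snp("1", 24_000, toy_genes)
```

A SNP lying within 5 kb of two neighboring genes is assigned to both, which is consistent with the multi-gene regions listed in the Results.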

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.

Acknowledgments

This work was financed by FEDER (Fundo Europeu de Desenvolvimento Regional) funds through the COMPETE 2020 Operational Programme for Competitiveness and Internationalisation (POCI), Portugal 2020, and by Portuguese funds through FCT (Fundação para a Ciência e a Tecnologia)/Ministério da Ciência, Tecnologia e Inovação in the framework of the project "Biomedical anthropological study in Arabian Peninsula based on high throughput genomics" (POCI-01-0145-FEDER-016609). V.F. has a postdoc grant through FCT (SFRH/BPD/114927/2016).

References

Al-Abri A, Podgorna E, Rose JI, Pereira L, Mulligan CJ, Silva NM, Bayoumi R, Soares P, Cerny V. 2012. Pleistocene–Holocene boundary in Southern Arabia from the perspective of human mtDNA variation. Am J Phys Anthropol. 149(2):291–298.

Alexander DH, Novembre J, Lange K. 2009. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19(9):1655–1664.

Alshamali F, Pereira L, Budowle B, Poloni ES, Currat M. 2009. Local population structure in Arabian Peninsula revealed by Y-STR diversity. Hum Hered. 68(1):45–54.

An P, Goedert JJ, Donfield S, Buchbinder S, Kirk GD, Detels R, Winkler CA. 2014. Regulatory variation in HIV-1 dependency factor ZNRD1 associates with host resistance to HIV-1 acquisition. J Infect Dis. 210(10):1539–1548.

Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, Abecasis GR. 2015. A global reference for human genetic variation. Nature 526(7571):68–74.

Basu Mallick C, Iliescu FM, Mols M, Hill S, Tamang R, Chaubey G, Goto R, Ho SYW, Gallego Romero I, Crivellaro F, et al. 2013. The light skin allele of SLC24A5 in South Asians and Europeans shares identity by descent. PLoS Genet. 9(11):e1003912.

Bayoumi R, De Fanti S, Sazzini M, Giuliani C, Quagliariello A, Bortolini E, Boattini A, Al-Habori M, Al-Zubairi AS, Rose JI, et al. 2016. Positive selection of lactase persistence among people of Southern Arabia. Am J Phys Anthropol. 161(4):676–684.

Browning BL, Browning SR. 2013. Improving the accuracy and efficiency of identity-by-descent detection in population data. Genetics 194(2):459–471.

Browning SR, Browning BL. 2007. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am J Hum Genet. 81(5):1084–1097.

Brucato N, Fernandes V, Mazieres S, Kusuma P, Cox MP, Ng'ang'a JW, Omar M, Simeone-Senelle MC, Frassati C, Alshamali F, et al. 2018. The Comoros show the earliest Austronesian gene flow into the Swahili corridor. Am J Hum Genet. 102(1):58–68.

Bryc K, Durand EY, Macpherson JM, Reich D, Mountain JL. 2015. The genetic ancestry of African Americans, Latinos, and European Americans across the United States. Am J Hum Genet. 96(1):37–53.

Campbell MC, Tishkoff SA. 2008. African genetic diversity: implications for human demographic history, modern human origins, and complex disease mapping. Annu Rev Genomics Hum Genet. 9:403–433.

Candille SI, Absher DM, Beleza S, Bauchet M, McEvoy B, Garrison NA, Li JZ, Myers RM, Barsh GS, Tang H, et al. 2012. Genome-wide association studies of quantitatively measured skin, hair, and eye pigmentation in four European populations. PLoS One 7(10):e48294.

Cerny V, Mulligan CJ, Fernandes V, Silva NM, Alshamali F, Non A, Harich N, Cherni L, El Gaaied AB, Al-Meeri A, et al. 2011. Internal diversification of mitochondrial haplogroup R0a reveals post-last glacial maximum demographic expansions in South Arabia. Mol Biol Evol. 28(1):71–78.

Cerny V, Pereira L, Kujanova M, Vasikova A, Hajek M, Morris M, Mulligan CJ. 2009. Out of Arabia: the settlement of island Soqotra as revealed by mitochondrial and Y chromosome genetic diversity. Am J Phys Anthropol. 138(4):439–447.

Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. 2015. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4:7.

Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, et al. 2011. The variant call format and VCFtools. Bioinformatics 27(15):2156–2158.

Delaneau O, Marchini J, Zagury JF. 2011. A linear complexity phasing method for thousands of genomes. Nat Methods. 9(2):179–181.

Emmrich S, Streltsov A, Schmidt F, Thangapandi VR, Reinhardt D, Klusmann JH. 2014. LincRNAs MONC and MIR100HG act as oncogenes in acute megakaryoblastic leukemia. Mol Cancer 13:171.

Fernandes V, Alshamali F, Alves M, Costa MD, Pereira JB, Silva NM, Cherni L, Harich N, Cerny V, Soares P, et al. 2012. The Arabian cradle: mitochondrial relicts of the first steps along the southern route out of Africa. Am J Hum Genet. 90(2):347–355.

Fernandes V, Triska P, Pereira JB, Alshamali F, Rito T, Machado A, Fajkosova Z, Cavadas B, Cerny V, Soares P, et al. 2015. Genetic stratigraphy of key demographic events in Arabia. PLoS One 10(3):e0118625.

Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, Belmont JW, Boudreau A, Hardenbol P, Leal SM, et al. 2007. A second generation human haplotype map of over 3.1 million SNPs. Nature 449(7164):851–861.

Fu Q, Posth C, Hajdinjak M, Petr M, Mallick S, Fernandes D, Furtwangler A, Haak W, Meyer M, Mittnik A, et al. 2016. The genetic history of Ice Age Europe. Nature 534(7606):200–205.

Gallego Romero I, Basu Mallick C, Liebert A, Crivellaro F, Chaubey G, Itan Y, Metspalu M, Eaaswarkhanth M, Pitchappan R, Villems R, et al. 2012. Herders of Indian and European cattle share their predominant allele for lactase persistence. Mol Biol Evol. 29(1):249–260.

Gravel S, Zakharia F, Moreno-Estrada A, Byrnes JK, Muzzio M, Rodriguez-Flores JL, Kenny EE, Gignoux CR, Maples BK, Guiblet W, et al. 2013. Reconstructing Native American migrations from whole-genome and whole-exome data. PLoS Genet. 9(12):e1004023.

Hellenthal G, Busby GBJ, Band G, Wilson JF, Capelli C, Falush D, Myers S. 2014. A genetic atlas of human admixture history. Science 343(6172):747–751.

Hernandez-Pacheco N, Flores C, Alonso S, Eng C, Mak ACY, Hunstman S, Hu D, White MJ, Oh SS, Meade K, et al. 2017. Identification of a novel locus associated with skin colour in African-admixed populations. Sci Rep. 7:44548.

Hodgson JA, Mulligan CJ, Al-Meeri A, Raaum RL. 2014. Early back-to-Africa migration into the horn of Africa. PLoS Genet. 10(6):e1004393.


Hogarth DG. 1904. The penetration of Arabia: a record of the development of western knowledge concerning the Arabian Peninsula. New York: Cambridge Library Collection.

Hunter-Zinck H, Musharoff S, Salit J, Al-Ali KA, Chouchane L, Gohar A, Matthews R, Butler MW, Fuller J, Hackett NR, et al. 2010. Population genetic structure of the people of Qatar. Am J Hum Genet. 87(1):17–25.

John SE, Thareja G, Hebbar P, Behbehani K, Thanaraj TA, Alsmadi O. 2015. Kuwaiti population subgroup of nomadic Bedouin ancestry: whole genome sequence and analysis. Genom Data. 3:116–127.

Kivisild T, Reidla M, Metspalu E, Rosa A, Brehm A, Pennarun E, Parik J, Geberhiwot T, Usanga E, Villems R. 2004. Ethiopian mitochondrial DNA heritage: tracking gene flow across and around the gate of tears. Am J Hum Genet. 75(5):752–770.

Laayouni H, Oosting M, Luisi P, Ioana M, Alonso S, Ricaño-Ponce I, Trynka G, Zhernakova A, Plantinga TS, Cheng S-C, et al. 2014. Convergent evolution in European and Rroma populations reveals pressure exerted by plague on Toll-like receptors. Proc Natl Acad Sci U S A. 111(7):2668–2673.

Lahr MM, Foley R. 2005. Multiple dispersals and modern human origins. Evol Anthropol. 3(2):48–60.

Langergraber KE, Prufer K, Rowney C, Boesch C, Crockford C, Fawcett K, Inoue E, Inoue-Muruyama M, Mitani JC, Muller MN, et al. 2012. Generation times in wild chimpanzees and gorillas suggest earlier divergence times in great ape and human evolution. Proc Natl Acad Sci U S A. 109(39):15716–15721.

Laso-Jadart R, Harmant C, Quach H, Zidane N, Tyler-Smith C, Mehdi Q, Ayub Q, Quintana-Murci L, Patin E. 2017. The genetic legacy of the Indian Ocean slave trade: recent admixture and post-admixture selection in the Makranis of Pakistan. Am J Hum Genet. 101(6):977–984.

Lawson DJ, Hellenthal G, Myers S, Falush D. 2012. Inference of population structure using dense haplotype data. PLoS Genet. 8(1):e1002453.

Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S, Cann HM, Barsh GS, Feldman M, Cavalli-Sforza LL, et al. 2008. Worldwide human relationships inferred from genome-wide patterns of variation. Science 319(5866):1100–1104.

Loh PR, Lipson M, Patterson N, Moorjani P, Pickrell JK, Reich D, Berger B. 2013. Inferring admixture histories of human populations using linkage disequilibrium. Genetics 193(4):1233–1254.

Lovejoy PE. 2011. Transformations in slavery: a history of slavery in Africa. Cambridge: Cambridge University Press.

Macaulay V, Hill C, Achilli A, Rengo C, Clarke D, Meehan W, Blackburn J, Semino O, Scozzari R, Cruciani F, et al. 2005. Single, rapid coastal settlement of Asia revealed by analysis of complete mitochondrial genomes. Science 308(5724):1034.

Malaspinas A-S, Westaway MC, Muller C, Sousa VC, Lao O, Alves I, Bergstrom A, Athanasiadis G, Cheng JY, Crawford JE, et al. 2016. A genomic history of Aboriginal Australia. Nature 538(7624):207.

Mallick S, Li H, Lipson M, Mathieson I, Gymrek M, Racimo F, Zhao M, Chennagiri N, Nordenfelt S, Tandon A, et al. 2016. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538(7624):201.

Maples BK, Gravel S, Kenny EE, Bustamante CD. 2013. RFMix: a discriminative modeling approach for rapid and robust local-ancestry inference. Am J Hum Genet. 93(2):278–288.

McManus KF, Taravella AM, Henn BM, Bustamante CD, Sikora M, Cornejo OE. 2017. Population genetic analysis of the DARC locus (Duffy) reveals adaptation from standing variation associated with malaria resistance in humans. PLoS Genet. 13(3):e1006560.

Moorjani P, Patterson N, Hirschhorn JN, Keinan A, Hao L, Atzmon G, Burns E, Ostrer H, Price AL, Reich D. 2011. The history of African gene flow into Southern Europeans, Levantines, and Jews. PLoS Genet. 7(4):e1001373.

Musilova E, Fernandes V, Silva NM, Soares P, Alshamali F, Harich N, Cherni L, Gaaied AB, Al-Meeri A, Pereira L, et al. 2011. Population history of the Red Sea: genetic exchanges between the Arabian Peninsula and East Africa signaled in the mitochondrial DNA HV1 haplogroup. Am J Phys Anthropol. 145:592–598.

Ozato K, Shin D-M, Chang T-H, Morse HC. 2008. TRIM family proteins and their emerging roles in innate immunity. Nat Rev Immunol. 8(11):849–860.

Patin E, Lopez M, Grollemund R, Verdu P, Harmant C, Quach H, Laval G, Perry GH, Barreiro LB, Froment A, et al. 2017. Dispersals and genetic adaptation of Bantu-speaking populations in Africa and North America. Science 356(6337):543–546.

Patterson N, Moorjani P, Luo Y, Mallick S, Rohland N, Zhan Y, Genschoreck T, Webster T, Reich D. 2012. Ancient admixture in human history. Genetics 192(3):1065.

Patterson N, Price AL, Reich D. 2006. Population structure and eigenanalysis. PLoS Genet. 2(12):e190.

Petkova D, Novembre J, Stephens M. 2016. Visualizing spatial population structure with estimated effective migration surfaces. Nat Genet. 48(1):94–100.

Pickrell JK, Coop G, Novembre J, Kudaravalli S, Li JZ, Absher D, Srinivasan BS, Barsh GS, Myers RM, Feldman MW, et al. 2009. Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 19(5):826–837.

Pickrell JK Pritchard JK 2012 Inference of population splits and mixturesfrom genome-wide allele frequency data PLoS Genet8(11)e1002967

Price AL Patterson NJ Plenge RM Weinblatt ME Shadick NA Reich D2006 Principal components analysis corrects for stratification ingenome-wide association studies Nat Genet 38(8)904ndash909

Purcell S Neale B Todd-Brown K Thomas L Ferreira MA Bender DMaller J Sklar P de Bakker PI Daly MJ et al 2007 PLINK a tool set forwhole-genome association and population-based linkage analysesAm J Hum Genet 81(3)559ndash575

Quintana-Murci L Semino O Bandelt H-J Passarino G McElreavey KSantachiara-Benerecetti AS 1999 Genetic evidence of an early exit ofHomo sapiens sapiens from Africa through eastern Africa Nat Genet23(4)437

R Development Core Team 2018 R a language and environment forstatistical computing Vienna (Austria)R Foundation for StatisticalComputing

Racimo F 2016 Testing for ancient selection using cross-populationallele frequency differentiation Genetics 202(2)733ndash750

Ranciaro A Campbell MC Hirbo JB Ko W-Y Froment A Anagnostou PKotze MJ Ibrahim M Nyambo T Omar SA et al 2014 Geneticorigins of lactase persistence and the spread of pastoralism inAfrica Am J Hum Genet 94(4)496ndash510

Reich D Thangaraj K Patterson N Price AL Singh L 2009Reconstructing Indian population history Nature461(7263)489ndash494

Roche B Rougeron V Quintana-Murci L Renaud F Abbate JLPrugnolle F 2017 Might interspecific interactions between patho-gens drive host evolution The case of Plasmodium species andDuffy-negativity in human populations Trends Parasitol 33(1)21ndash29

Rodriguez-Flores JL Fakhro K Agosto-Perez F Ramstetter MD Arbiza LVincent TL Robay A Malek JA Suhre K Chouchane L et al 2016Indigenous Arabs are descendants of the earliest split from ancientEurasian populations Genome Res 26(2)151ndash162

Rosenberg NA Huang L Jewett EM Szpiech ZA Jankovic I Boehnke M2010 Genome-wide association studies in diverse populations NatRev Genet 11(5)356ndash366

Russo A Di Gaetano C Cugliari G Matullo G 2018 Advances in thegenetics of hypertension the effect of rare variants Int J Mol Sci19(3)688

Segal R 2002 Islamrsquos black slaves the other black diaspora LondonAtlantic Books

Shannon P Markiel A Ozier O Baliga NS Wang JT Ramage D Amin NSchwikowski B Ideker T 2003 Cytoscape a software environmentfor integrated models of biomolecular interaction networks GenomeRes 13(11)2498ndash2504

Skoglund P Thompson JC Prendergast ME Mittnik A Sirak K HajdinjakM Salie T Rohland N Mallick S Peltzer A et al 2017 Reconstructingprehistoric African population structure Cell 171(1)59ndash71 e21

Admixture and Selection in Arabian Peninsula . doi:10.1093/molbev/msz005 MBE

Snow RW, Amratia P, Zamani G, Mundia CW, Noor AM, Memish ZA, Al Zahrani MH, Al Jasari A, Fikri M, Atta H. 2013. The malaria transition on the Arabian Peninsula: progress toward a malaria-free region between 1960–2010. Adv Parasitol. 82:205–251.

Soares P, Alshamali F, Pereira JB, Fernandes V, Silva NM, Afonso C, Costa MD, Musilova E, Macaulay V, Richards MB, et al. 2012. The expansion of mtDNA haplogroup L3 within and out of Africa. Mol Biol Evol. 29(3):915–927.

Szpiech ZA, Hernandez RD. 2014. selscan: an efficient multithreaded program to perform EHH-based scans for positive selection. Mol Biol Evol. 31(10):2824–2827.

Tekola-Ayele F, Adeyemo A, Chen G, Hailu E, Aseffa A, Davey G, Newport MJ, Rotimi C. 2015. Novel genomic signals of recent selection in an Ethiopian population. Eur J Hum Genet. 23(8):1085–1092.

Thompson EE, Kuttab-Boulos H, Witonsky D, Yang L, Roe BA, Di Rienzo A. 2004. CYP3A variation and the evolution of salt-sensitivity variants. Am J Hum Genet. 75(6):1059–1069.

Tishkoff SA, Reed FA, Ranciaro A, Voight BF, Babbitt CC, Silverman JS, Powell K, Mortensen HM, Hirbo JB, Osman M, et al. 2007. Convergent adaptation of human lactase persistence in Africa and Europe. Nat Genet. 39(1):31–40.

Triska P, Soares P, Patin E, Fernandes V, Cerny V, Pereira L. 2015. Extensive admixture and selective pressure across the Sahel Belt. Genome Biol Evol. 7(12):3484–3495.

Uchil PD, Quinlan BD, Chan W-T, Luna JM, Mothes W. 2008. TRIM E3 ligases interfere with early and late stages of the retroviral life cycle. PLoS Pathog. 4(2):e16.

Voight BF, Kudaravalli S, Wen X, Pritchard JK. 2006. A map of recent positive selection in the human genome. PLoS Biol. 4(3):e72.

Vyas DN, Al-Meeri A, Mulligan CJ. 2017. Testing support for the northern and southern dispersal routes out of Africa: an analysis of Levantine and southern Arabian populations. Am J Phys Anthropol. 164(4):736–749.

Vyas DN, Kitchen A, Miro-Herrans AT, Pearson LN, Al-Meeri A, Mulligan CJ. 2016. Bayesian analyses of Yemeni mitochondrial genomes suggest multiple migration events with Africa and Western Eurasia. Am J Phys Anthropol. 159(3):382–393.

Wagh K, Bhatia A, Alexe G, Reddy A, Ravikumar V, Seiler M, Boemo M, Yao M, Cronk L, Naqvi A, et al. 2012. Lactase persistence and lipid pathway selection in the Maasai. PLoS One 7(9):e44751.

Yin T, Cook D, Lawrence M. 2012. ggbio: an R package for extending the grammar of graphics for genomic data. Genome Biol. 13(8):R77.

Yu Y, Ouyang Y, Yao W. 2018. shinyCircos: an R/Shiny application for interactive creation of Circos plot. Bioinformatics 34(7):1229–1231.

Zhao F, McParland S, Kearney F, Du L, Berry DP. 2015. Detection of selection signatures in dairy and beef cattle using high-density genomic information. Genet Sel Evol. 47:49.

Gene Flow in the Region and Dating of Admixture

The set of algorithms based on haplotype sharing was tested on the “extended data set” (supplementary table S2, Supplementary Material online). Refined identity-by-descent (IBD) (Browning and Browning 2013) (fig. 1B–E) was applied to AP, Levantine, Egyptian, and East African populations and allowed us to identify shared IBD segments. Since the length of shared haplotypes decreases over time, segment lengths of <2 cM and 2–5 cM are indicative of old common ancestry. These results attested to the genetic continuity in the region, given the high share of haplotypes. Although not highly differentiated, the Levant seemed to share a lower amount of IBD with Kenya, Somalia, and Ethiopia than the AP shares with these populations. When filtering for shared IBD lengths of 10–20 cM and >20 cM (younger shared history), the southern sharing (AP and East Africa) was dominant.
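The length classes described above can be illustrated with a small tally of shared segments per population pair. This is only a sketch under stated assumptions: the input tuples, the function names, and the toy segment lengths are hypothetical and do not reproduce the Refined IBD output format or any value from this study.

```python
from collections import Counter


def ibd_length_class(length_cm):
    """Return the length class (in cM) of one IBD segment.

    Classes follow the ranges named in the text; segments between
    5 and 10 cM are not discussed there, so they map to None.
    """
    if length_cm < 2:
        return "<2 cM"
    if length_cm <= 5:
        return "2-5 cM"
    if 10 <= length_cm <= 20:
        return "10-20 cM"
    if length_cm > 20:
        return ">20 cM"
    return None


def count_shared_segments(segments):
    """Count segments per (population pair, length class).

    `segments` is an iterable of (pop_a, pop_b, length_cm) tuples;
    this layout is an assumption made for the example.
    """
    counts = Counter()
    for pop_a, pop_b, length_cm in segments:
        length_class = ibd_length_class(length_cm)
        if length_class is not None:
            pair = tuple(sorted((pop_a, pop_b)))  # order-independent pair key
            counts[(pair, length_class)] += 1
    return counts


# Made-up toy segments, for illustration only.
toy_segments = [
    ("Yemen", "Ethiopia", 1.4),
    ("Ethiopia", "Yemen", 12.0),
    ("Oman", "Iran", 3.2),
    ("Oman", "Iran", 7.5),
]
counts = count_shared_segments(toy_segments)
```

Short classes (<2 cM, 2–5 cM) would then be read as older shared ancestry and long classes (10–20 cM, >20 cM) as younger shared history, as in the paragraph above.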

The estimated effective migration surfaces (EEMS) plot (fig. 2A) revealed a barrier between West and East AP populations. East Africa shared a migration surface with Saudi Arabia and Yemen populations, whereas another genetic corridor was evident between the Levant, Iran, and the East AP. These results were corroborated by Wright’s FST metric (supplementary fig. S4, Supplementary Material online), with Saudi Arabia being genetically closer to Yemen (0.002) and the UAE closer to Oman (0.001) than the two sides of the Peninsula were to each other (0.006). In terms of FST pairwise comparisons between AP and South Asian populations, the UAE displayed the lowest values (0.003 with Iran, 0.007 with Makrani, and 0.009 with Balochi), and Iran was closer to the Levant and the East AP (0.003 with Syria, Jordan, and Lebanon; 0.003 with the UAE; and 0.004 with Oman) than to South Asian populations (0.005 with Balochi, 0.006 with Pathan, and 0.032 with India). All AP populations showed a lower distance from the Levant (0.004–0.008) than from Europe (0.012–0.027), East Africa (0.084–0.110), and finally West Africa (0.095–0.123). The same pattern of west-east diversity in the Peninsula was observed for the f3 statistics (supplementary fig. S5 and supplementary table S3, Supplementary Material online), with detection of African and Levant/European admixture in all Arabian populations and admixture with South Asian ancestry limited to the East AP.
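To make the pairwise FST values easier to interpret, the following sketch computes the textbook Wright definition, FST = (HT - HS) / HT, for two populations from biallelic allele frequencies, combining SNPs as a ratio of sums. It is an illustration only: the function name and toy frequencies are hypothetical, and it is not the Weir and Cockerham estimator that Vcftools implements, so it would not reproduce the reported values exactly.

```python
def wright_fst(freqs_a, freqs_b):
    """Wright's FST between two populations from per-SNP allele frequencies.

    HT is the expected heterozygosity of the pooled population and HS the
    mean expected heterozygosity within populations (equal weights assumed).
    """
    numerator = 0.0
    denominator = 0.0
    for p_a, p_b in zip(freqs_a, freqs_b):
        p_bar = (p_a + p_b) / 2
        h_total = 2 * p_bar * (1 - p_bar)
        h_within = (2 * p_a * (1 - p_a) + 2 * p_b * (1 - p_b)) / 2
        numerator += h_total - h_within
        denominator += h_total
    # SNPs fixed for the same allele in both populations carry no information.
    return numerator / denominator if denominator > 0 else 0.0


# Toy frequencies, for illustration only.
identical = wright_fst([0.5], [0.5])        # no differentiation
fixed_diff = wright_fst([0.0], [1.0])       # complete differentiation
mild = wright_fst([0.4], [0.6])             # small frequency difference
```

Values close to 0, such as those between neighboring AP populations, indicate that almost all of the variation is shared within populations.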

The clustered coancestry matrix from the fineSTRUCTURE and ChromoPainter analyses, which explore patterns of haplotype sharing among groups, classified AP and Iranian individuals into 12 clusters, attesting to their diverse genetic reservoir (supplementary fig. S6, Supplementary Material online). Proportions of ancestries reflected the ones obtained in ADMIXTURE: AP and Iranian individuals mostly clustered with Levant/North African populations (92% from Yemen, 92% Iran, 92% the UAE, 84% Saudi Arabia, and 80% Oman) and, to a lesser extent, with South Asians (4–8% of UAE and Oman individuals) and sub-Saharan Africans, including Comoro, Madagascar, and Swahili populations (12% Oman, 12% Saudi Arabia, and 4% Iran).

FIG. 2. Estimated effective migration surfaces gradient map around the Arabian Peninsula (A). The color scale reveals low (light gray) and high (dark gray) genetic barriers between populations localized on a grid of 500 demes. The scale represents log values of the effective migration rates (m). Each deme (black dot) is proportional to the sample sizes of the populations. Map was generated with the R package worldmap. Dates of admixture events (B) in AP and Iranian populations, as estimated by GLOBETROTTER. Pie charts reflect the admixture proportions of ancestries listed in the legend. Open circles in the date of admixture refer to one event, whereas filled circles refer to multiple events, and lines represent the 95% confidence interval.

Overall, GLOBETROTTER results highlighted recent admixture events in the AP region (fig. 2B and supplementary table S4, Supplementary Material online). Between 400 and 1,000 years before present (BP), admixture events took place with East African ancestry in all AP populations. The non-African background was similar to a European background in all populations except in the UAE and Iran, where a closeness to South Asian ancestry was identified. The sub-Saharan African proportion in the West AP was double that in the East AP. In Saudi Arabia, the UAE, and Oman, multidate events were more probable, but the second event was even younger (67, 153, and 160 BP, respectively) and involved the same ancestries as the older event.

Local Ancestry Inference

The RFMix software package (Maples et al. 2013) was used to analyze the genomes of admixed Arabian and Iranian populations and to detect regions showing an excess of a given parental population (sub-Saharan African or South Asian, to the detriment of the main European/Levant background). Results revealed statistically significant enrichment of a particular ancestry in certain regions of the genome that are associated with selection signals due to adaptive traits such as malaria resistance, skin pigmentation, and lactose tolerance (fig. 3A and B; supplementary figs. S7 and S8, Supplementary Material online, show results for all tested populations; supplementary table S5, Supplementary Material online).
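The >3SD and >4SD thresholds quoted for the enrichment results can be read as standardized deviations from the genome-wide mean ancestry proportion. The sketch below shows one way such a threshold could be applied; it assumes per-window mean ancestry proportions and a population standard deviation, neither of which is specified in the text, and the function name and toy values are hypothetical.

```python
from statistics import mean, pstdev


def flag_enriched(window_ancestry, n_sd=3.0):
    """Return indices of windows whose ancestry proportion exceeds
    the genome-wide mean by more than `n_sd` standard deviations."""
    mu = mean(window_ancestry)
    sd = pstdev(window_ancestry)
    if sd == 0:
        return []  # no variation across windows, nothing can be enriched
    return [
        index
        for index, value in enumerate(window_ancestry)
        if (value - mu) / sd > n_sd
    ]


# Toy genome: 99 windows at a background proportion and one outlier window.
toy_windows = [0.10] * 99 + [0.60]
enriched_4sd = flag_enriched(toy_windows, n_sd=4.0)
```

In this toy example only the last window passes the 4SD threshold; the background windows sit slightly below the mean.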

The region of the genome with the highest proportion of sub-Saharan African ancestry (>4SD) observed in Arabian and Iranian populations was located on chromosome 1, in a gene-rich region containing the DARC gene and several olfactory receptor genes such as OR10J1, OR10J3, and OR10J5. This sub-Saharan African input was higher in Yemen (a mean of 0.61; supplementary fig. S9, Supplementary Material online) than in other regions of the Peninsula (0.49 Oman, 0.35 the UAE, 0.28 Saudi Arabia, and 0.16 in Iran). Another region highly enriched in all populations for sub-Saharan African ancestry was located on chromosome 9, containing genes coding for forkhead box D4 (FOXD4L4 and FOXD4L2) and spermatogenesis-associated proteins (FAM75A5 and FAM75A7).

The RFMix analysis indicated a South Asian-enriched region on chromosome 2 containing the RAB3GAP1, ZRANB3, LCT, MCM6, and CXCR4 genes. They have been previously associated with lactase persistence, the ability to digest milk into adulthood, which is an adaptive trait mostly found in European populations and a classic example of genetic adaptation in humans (Tishkoff et al. 2007; Campbell and Tishkoff 2008; Triska et al. 2015). This region reached highly significant values (>4SD) in Saudi Arabia and Yemen and was also significant in the UAE and Iran (3SD < X < 4SD). Another South Asian-enriched region (3SD < X < 4SD) was detected on chromosome 5 in Yemen, Saudi Arabia, and the UAE, containing the SLC45A2 gene associated with skin, eye, and hair pigmentation. Nevertheless, the most significant South Asian-enriched signal detected in all populations (>4SD) was located on chromosome 6, containing many genes related to the immune system (HLA-B, HLA-DR, HLA-DQ, MICA, MICB, and TNF).

Selection in the AP and Iran

When checking positive selection signals in the AP and Iran by using the integrated haplotype score (iHS) and cross-population extended haplotype homozygosity (XP-EHH) measures, several genomic regions displayed significant values (fig. 3C–F, supplementary figs. S10–S13, and supplementary tables S6–S9, Supplementary Material online).
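Both measures build on extended haplotype homozygosity (EHH), the probability that two randomly chosen haplotypes carrying a given core allele are identical over an interval around it. iHS contrasts the integrated EHH of the ancestral and derived alleles within one population, and XP-EHH contrasts integrated EHH between populations. The sketch below computes EHH only, on made-up haplotypes; the function name and data layout are hypothetical, and the integration and genome-wide standardization steps are omitted.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, upto):
    """Extended haplotype homozygosity between positions `core` and `upto`.

    `haplotypes` are equal-length sequences of 0/1 alleles, all carrying
    the same core allele. EHH is the number of identical haplotype pairs
    over the interval divided by the number of all pairs.
    """
    low, high = sorted((core, upto))
    n = len(haplotypes)
    if n < 2:
        return 0.0
    groups = Counter(tuple(h[low:high + 1]) for h in haplotypes)
    identical_pairs = sum(comb(count, 2) for count in groups.values())
    return identical_pairs / comb(n, 2)


# Four made-up haplotypes sharing allele 0 at the core position (index 0).
toy_haplotypes = [
    [0, 1, 1],
    [0, 1, 1],
    [0, 1, 0],
    [0, 0, 0],
]
decay = [ehh(toy_haplotypes, core=0, upto=x) for x in range(3)]
```

EHH starts at 1 at the core and decays with distance as recombination and mutation break the shared haplotype; a slow decay around one allele is the pattern these tests interpret as recent positive selection.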

In particular, several signals previously associated with selection in non-African populations were detected. East AP and Iranian populations exhibited a strong signal of positive selection (>4SD) in the XP-EHH-European comparison on the chromosome 2 region containing the R3HDM1 and LCT genes, associated in Europeans with food conversion efficiency and metabolism of lactose (Zhao et al. 2015). The SLC24A5 gene on chromosome 15, which is strongly associated with skin pigmentation variation among European and South Asian populations (Basu Mallick et al. 2013), presented signs of positive selection in all populations in iHS (>4SD for Yemen, Saudi Arabia, and the UAE; >3SD for Oman and Iran) and XP-EHH-South Asian (>3SD for all populations) analyses. Another skin pigmentation-related gene, SLC45A2, which is located on chromosome 5 and is known to be strongly associated with this phenotype in European populations (Candille et al. 2012; Hernandez-Pacheco et al. 2017), displayed signs of positive selection in all populations for the XP-EHH-European comparison (>4SD in Saudi Arabia and >3SD in the remaining populations). The same occurred for the chromosome 4 region containing the TLR1, TLR6, and TLR10 genes (role in pathogen recognition and activation of innate immunity [Laayouni et al. 2014]; >4SD in Yemen and Saudi Arabia and >3SD in the remaining populations) and for the chromosome 12 region containing the ATXN2, ACAD10, and ALDH2 genes (associated with hypertension [Russo et al. 2018]; >4SD in Saudi Arabia and >3SD in the remaining populations).

In the comparison with African populations (XP-EHH-African), significant values (>3SD) were observed for the chromosome 1 region containing the DARC gene in Saudi Arabia and Iran. Among the strongest signals of positive selection was the chromosome 21 region containing the C21orf34, MIR99A, and MIRLET7C genes, detected in all populations (>4SD) in both iHS and XP-EHH-African tests. This signal was previously identified as among the strongest selection signals in non-Africans (Pickrell et al. 2009), and recently C21orf34 was identified as MIR99AHG, a miRNA gene involved in the regulation of hematopoiesis and oncogenes for the development of myeloid leukemia (Emmrich et al. 2014). In addition, a signal on the chromosome 10 region containing the SPAG6 gene was observed in all populations in iHS (except Yemen) and XP-EHH-African (stronger in Oman and the UAE) analyses. This gene is essential for the structural integrity of the central microtubule of sperm, stability, and flagellar motility and has been previously identified as being under selection before the separation between European and Asian populations (Racimo 2016). The significant signal in the iHS score on the chromosome 7 region containing several genes (CYP3A5, ZSCAN25, ATP5J2-PTCD1, CPSF4, ATP5J2, ZNF789, and ZKSCAN5) was also detected in all populations in the XP-EHH-African comparison. The CYP3A5 gene is involved in cholesterol metabolism and steroid biosynthesis in the liver and has been identified as being under positive selection in African and non-African populations (Wagh et al. 2012). The CYP3A5*1/*3 polymorphism was also described as being under selective pressure for water retention and risk for salt-sensitive hypertension in equatorial populations (Thompson et al. 2004).

FIG. 3. Circos plots highlighting significantly (>4SD) enriched local admixed and positively selected segments in Arabian Peninsula and Iranian populations. (A) Enriched sub-Saharan African ancestry. (B) Enriched South Asian ancestry. (C) iHS selection metrics. (D) XP-EHH versus European ancestry. (E) XP-EHH versus sub-Saharan African ancestry. (F) XP-EHH versus South Asian ancestry.

The iHS metrics identified a positively selected region on chromosome 6 in all populations, which contains candidate genes involved in stages of the human immunodeficiency virus (HIV) lifecycle: the ZNRD1-AS1 and TRIM26 genes. ZNRD1-AS1 plays a role in the regulation of cell proliferation and is essential for the completion of the HIV lifecycle. Its rs3132130 SNP was described to confer host resistance to HIV-1 acquisition by causing a loss of nuclear factor binding and decreasing the ZNRD1 promoter activity (An et al. 2014). Although their function is unknown, the tripartite motif (TRIM) proteins may contribute to innate immunity against retroviruses, affecting both early and late stages of the retroviral life cycle (Ozato et al. 2008; Uchil et al. 2008). Another region significantly selected in iHS in all populations except the Saudi Arabians was identified on chromosome 22, containing the TTC38, PKDREJ, GTSE1, and PPARA genes. The PPARA gene is a transcription receptor that regulates lipid and glucose metabolism during food deficiency, and a recent study (Tekola-Ayele et al. 2015) found a specific signal of recent positive selection of this gene in Ethiopia, suggesting a metabolic adaptation to high-altitude hypoxia.

Conclusions

The detailed genome-wide characterization of Arabian and Iranian populations allowed us to establish, for the first time, the palimpsest of admixture and selection along the genomes of these inhabitants of the main bridge between African, Asian, and European population groups. Admixture events comprised up to 20% of sub-Saharan African ancestry in the West and 20% of South Asian ancestry in the East of the AP, enriching the local predominant Arabian-North African-Levantine and Caucasian-like components. The sub-Saharan African input decreased toward the East, whereas the South Asian input decreased in the direction of the West of the Peninsula, testifying to the continuous genetic exchange with Africa and Asia, as was evident in the IBD sharing analysis.

Of course, there are specific subgroups that depart from these general characteristics, such as the Qatari Bedouins, who represent a “coldspot” of admixture in the AP (Rodriguez-Flores et al. 2016), whereas the Hadrami in Yemen are a “hotspot” of admixture (Vyas et al. 2017). EEMS analysis identified higher exchanges of genes between Africa and the West AP, as well as between the East AP and the Levant/South Asia, than between the West and the East AP. This result agrees with the previously documented degree of isolation between West and East AP populations identified when studying uniparental markers (Alshamali et al. 2009; Fernandes et al. 2012, 2015). We previously found relict maternal lineages (60 ka) of the original successful OOA settlement by modern humans in the AP, and most likely they spread to South Asia and the Levant/Europe from the Gulf corridor (Fernandes et al. 2012). However, current LD decay–based dates only indicated recent admixture events in the eastern side of the Peninsula. Similarly, younger ages of admixture in the West AP reflected the Islamization of North Africa as the most probable main contributor of this region. The exchanges across the Red Sea detected for the autosomal data matched the Arab slave trade and maritime dominance.

Recent studies have provided evidence that admixed populations acquired positively selected genomic regions from their parental populations; in other words, the distribution of admixed blocks across the genomes of descendants was not even (Triska et al. 2015; Laso-Jadart et al. 2017; Patin et al. 2017). The acquired selected regions can be inferred from the concomitant identification of the same genomic regions in the selection metrics and in the local ancestry enrichment analysis. AP populations displayed a jigsaw puzzle of African, European, and South Asian-selected genes linked to response to infectious diseases, skin pigmentation, lactase persistence, and food conversion.

The strongest African-related enrichment signal was for the DARC gene, which codes for the Duffy antigen, a membrane-bound chemokine receptor used by Plasmodium vivax to infect red blood cells. Plasmodium vivax causes a chronic form of malaria and is the most widespread type of malaria outside Africa (McManus et al. 2017). The derived C allele mutation in the promoter, which causes erythroid-specific suppression of gene expression, is described as conferring resistance against Plasmodium vivax malaria, and thus it is almost fixed in African populations (Triska et al. 2015; Laso-Jadart et al. 2017; McManus et al. 2017). The AP is known to have had a diverse malaria epidemiology in recent times, from a likely absence of transmission in Kuwait to an intense parasite exposure, similar to conditions in many parts of Africa, among residents of Saudi Arabia and Yemen (Snow et al. 2013). The enrichment signal detected here in the genomes of Arabians supported the higher malaria pressure in the West AP. The enrichment was strong enough to allow detection of selection in the XP-EHH metrics in Saudi Arabia and Iran.

The most significant European/South Asian-related enrichments in AP populations were linked to the lactose tolerance-specific chromosome 2 RAB3GAP1/LCT/MCM6 region, skin pigmentation associated with the SLC24A5 gene, and the HLA region on chromosome 6. The LCT gene -13915*G allele (rs41380347) was described as being present at high frequency in the AP and the Levant and estimated to have originated 4 ka (±2 ka) in the AP (Bayoumi et al. 2016), possibly as a result of the domestication of the Arabian camel at 6 ka (Ranciaro et al. 2014). By contrast, the European -13910*T selected allele (rs4988235) is also the most commonly observed in South Asia (Gallego Romero et al. 2012). We could not verify frequencies of the rs41380347 SNP, which was not part of the chip genotyped here, but we checked the linked rs4988243 SNP. This last SNP confirmed the higher frequencies in the West versus the East AP (43.2% in Yemen, 46.1% in Saudi Arabia, 17.9% in Oman, 19.1% in the UAE, and 17.4% in Iran; notice that although rs41380347 and rs4988243 are linked, this last SNP has overall frequencies 20% higher than the first). The genotyped European rs4988235 SNP presented higher frequencies of the derived allele in the East than in the West AP (0.6%, 1.3%, 12.5%, 18.2%, and 10.3%, respectively). The haplotype backgrounds of the Arabian and European derived alleles are related and distant from the African one (Ranciaro et al. 2014), which explains the high non-African enrichment detected in the RFMix analysis in the West AP (the Arabian haplotypes) and the selection signal in the XP-EHH-European comparison in the East AP and Iran (the European haplotypes). This is an interesting case generated by the isolation between the two sides of the Peninsula, where the West developed a local positive selection of a lactose-tolerant variant, whereas the East acquired the European/South Asian variant. Previously, this difference had been reported for ethnic groups within Oman (Bayoumi et al. 2016), where frequencies of the -13910*T and -13915*G alleles diverged between Arabs of northern and southern Oman, who migrated from Yemen (0–1% and 14–72%, respectively), and Omanis of Asian origin (16% and 0%, respectively). Here, we verified that this pattern extends to the entire Peninsula.

In relation to the genetic diversity linked to skin color, the European/South Asian-selected derived SNPs are equally frequent in both the West and East AP (rs1426654 in the SLC24A5 gene: 10.1% in Yemen, 13.3% in Saudi Arabia, 13.2% in Oman, 13.5% in the UAE, and 11.8% in Iran; rs183671 in the SLC45A2 gene: 35.4%, 29.4%, 22.7%, 29.0%, and 40.1%, respectively; rs1667394 in the OCA2/HERC2 gene: 38.1%, 28.8%, 30.7%, 33.5%, and 34.9%, respectively).

The palimpsest of complex interactions between cross and local selective pressures and demographic factors identified here in the AP testifies to the rich genetic dynamics taking place in bridging zones of the globe. In the near future, whole genomes of Arabians across the entire peninsula (with more precise geographical information than we could present here) and more sensitive selection measures will add information on the role played by this geographic region in the adaptation of modern humans to non-African habitats. Improvements to dating techniques applicable to complete genomes, such as the multiple sequentially Markovian coalescent approach, will also help in answering the question of whether the AP was only a passageway to reach further distances or one of the oldest inhabited regions OOA. Complete genomes may indeed shed light on the lost period in the mtDNA pool due to the absence of the direct OOA descendants. The first steps have already been taken, and a few Arabian complete genomes have begun to be available (John et al. 2015; Mallick et al. 2016; Rodriguez-Flores et al. 2016).

Materials and Methods

Population Samples, Genome-Wide Genotyping, and Published Data

DNA samples analyzed in this study were collected from 420 individuals of the AP (Saudi Arabia, Yemen, Oman, and the UAE) and from 80 individuals of Iran (further information provided in supplementary table S1, Supplementary Material online). Individuals were residing in Dubai at the time of sampling (between 1991 and 2000), representing a widespread collection of birth origins from Saudi Arabia, Yemen, Oman, the UAE, and Iran. This study obtained ethical approval from the Ethics Committee of the University of Porto, Portugal (17/CEUP/2012). The individuals were genotyped with the Illumina Human Omni Express Bead Chip (OmniExpress), containing 741,000 SNPs. Quality checks were applied using PLINK (Purcell et al. 2007; Chang et al. 2015), and 22 individuals were excluded for lacking more than 10% of genotypes. A preliminary PCA revealed that some UAE and Oman individuals (five individuals from each population) had a considerable sub-Saharan African ancestry, an indicator that they were most probably recent immigrants from Africa (supplementary fig. S14, Supplementary Material online). Hence, these individuals were removed from the analyses, leading to a final set of 468 individuals. Markers with genotyping call rates lower than 95% were removed; consequently, a total of 627,435 autosomal SNPs passed the quality control check. The genotypes from the individuals analyzed in this study can be accessed from the EGA repository (European Genome-Phenome Archive, accession number EGAS00001003335). In order to perform a fine-scale population structure analysis of our complete data, the 468 individuals were merged with relevant available data sets (supplementary table S1, Supplementary Material online). As several population genetic analyses assume independent markers, SNPs were pruned for pairwise LD in PLINK (www.cog-genomics.org/plink/1.9/; Purcell et al. 2007; Chang et al. 2015) by removing any SNP that had an r² > 0.4 with another SNP within a 50-SNP sliding window with a step of 20 SNPs. After pruning, the “restricted data set” included 176,309 autosomal SNPs and 1,782 individuals (supplementary table S1, Supplementary Material online), and this data set was used in the analyses based on allele frequencies (PCA, ADMIXTURE, FST, and f3 statistic). Another data set was created by randomly selecting a maximum of 25 individuals from each population but extending the number and the geographical distribution of populations; this data set, consisting of 171,728 SNPs and 2,056 individuals, was designated as the “extended data set” (supplementary table S2, Supplementary Material online). The homogenization of the number of samples per group avoided statistical biases due to overrepresentation of some populations and allowed us to run computationally demanding algorithms such as EEMS, fineSTRUCTURE, Chromopainter, GLOBETROTTER, and IBD. All genotypes from the unpruned “extended data set” were phased with SHAPEIT v2r79044 (Delaneau et al. 2011), using the 1000 Genomes phased data (Auton et al. 2015) as a reference panel and the HapMap phase 2 genetic map (Frazer et al. 2007). We did not merge our data with relevant data sets from Qatar and Yemen (Hunter-Zinck et al. 2010; Vyas et al. 2017), as these were screened through Affymetrix chips bearing a low number of overlapping SNPs. Qatari complete genomes from Rodriguez-Flores et al. (2016) were specifically selected and thus do not constitute a random population sample that could be comparable with our populations.
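The LD pruning step described above (r² > 0.4, 50-SNP window, 20-SNP step) can be sketched as a greedy sliding-window filter. PLINK exposes window-based pruning through its `--indep-pairwise` option; the exact command used is not stated in the text. The code below is a simplified illustration rather than PLINK's algorithm: it always drops the later SNP of a correlated pair, and the function names, genotype layout (one list of 0/1/2 dosages per SNP), and toy data are assumptions.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treat as unlinked
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return cov * cov / (var_x * var_y)


def ld_prune(genotypes, window=50, step=20, r2_max=0.4):
    """Return indices of SNPs kept after greedy window-based LD pruning."""
    n_snps = len(genotypes)
    keep = [True] * n_snps
    start = 0
    while start < n_snps:
        end = min(start + window, n_snps)
        for i in range(start, end):
            if not keep[i]:
                continue
            for j in range(i + 1, end):
                if keep[j] and r_squared(genotypes[i], genotypes[j]) > r2_max:
                    keep[j] = False  # simplification: drop the later SNP
        if end == n_snps:
            break
        start += step
    return [i for i in range(n_snps) if keep[i]]


# Toy data: SNP 1 duplicates SNP 0, SNP 2 is uncorrelated with SNP 0.
toy_genotypes = [
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],
    [0, 0, 0, 2, 2, 2],
]
kept = ld_prune(toy_genotypes, window=50, step=20, r2_max=0.4)
```

On the toy data the duplicated SNP is removed and the two approximately independent SNPs are retained, which is the property the allele frequency-based analyses rely on.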

Population Structure, Differentiation, and Inbreeding

PCA, which infers worldwide axes of human genetic variation from the allele frequencies of various populations, was carried out on the pruned “restricted data set” by using the smartpca tool included in the EIGENSOFT package (Patterson et al. 2006; Price et al. 2006). We ran ADMIXTURE (Alexander et al. 2009) to decompose genetic ancestries of the pruned “restricted data set” using maximum likelihood for components K = 2 to K = 12, with the optimal K estimated through cross-validation of the logistic regression (K = 6). Wright’s FST metric was calculated using Vcftools (Danecek et al. 2011). The THREEPOP program from TreeMix was used to calculate f3 statistics (Reich et al. 2009; Patterson et al. 2012; Pickrell and Pritchard 2012), which test between alternative scenarios of admixture. In order to analyze and visualize spatial population structure from georeferenced genetic samples and to define genetic barriers and corridors, the method of EEMS (Petkova et al. 2016) was used. Geographic coordinates and a genetic dissimilarity matrix between populations were used to set up a map defining a grid of 500 demes, and 3,000,000 Markov chain Monte Carlo (MCMC) iterations were run before checking for convergence. Plots were generated in R following the EEMS v1 manual indications.
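The logic of the f3 admixture test can be shown with the naive form of the statistic, f3(C; A, B), the average over SNPs of (c - a)(c - b), where a, b, and c are allele frequencies in the two reference populations and the target. A clearly negative value is taken as evidence that C is admixed between populations related to A and B. The sketch below is an illustration only: it omits the finite-sample bias correction and the block-jackknife standard errors that programs such as THREEPOP apply, and the function name and toy frequencies are hypothetical.

```python
def f3(target, source_a, source_b):
    """Naive f3(target; source_a, source_b) from per-SNP allele frequencies."""
    terms = [
        (c - a) * (c - b)
        for c, a, b in zip(target, source_a, source_b)
    ]
    return sum(terms) / len(terms)


# Toy frequencies: the target sits halfway between two divergent sources.
source_a = [0.1, 0.9]
source_b = [0.9, 0.1]
admixed_target = [0.5, 0.5]
f3_admixed = f3(admixed_target, source_a, source_b)

# A target identical to one source gives exactly zero.
f3_unadmixed = f3(source_a, source_a, source_b)
```

In the toy example the intermediate target yields a negative value, whereas a target identical to one of the sources does not.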

Inference and Dating of Population Admixture
Population structure of the phased “extended data set” was evaluated using the fineSTRUCTURE v2.0.7 package (Lawson et al. 2012). fineSTRUCTURE captures information on population structure provided by patterns of haplotype similarity, calculated with Chromopainter v2.0 (Lawson et al. 2012), and performs a model-based Bayesian clustering of genotypes. From the results, a coancestry heat map and a dendrogram were inferred to visualize the number of statistically defined clusters that describe the data. The clusters obtained by fineSTRUCTURE were then used to assign the populations as target or donor/surrogate when testing for admixture. Mutational rates and effective population size parameters were first estimated with an Expectation-Maximization algorithm, running Chromopainter v2 on all 22 autosomes for the entire data set with 10 iterations (Lawson et al. 2012). The average of these parameters, weighted according to the SNP coverage of each chromosome and the number of individuals, was then used to compute the chromosomal painting. Two Arabian groups were identified as target populations: the western group, represented by Yemen and Saudi Arabia, and the eastern group, represented by Oman and the UAE. We ran chromosomal painting for each group. Due to the possibility of common ancestry, we excluded the Levantine populations from the surrogate populations. The painted chromosomes obtained for each cluster were used in GLOBETROTTER v1.0 (Hellenthal et al. 2014) to estimate the ratios and dates of the potential admixture events in the region. Coancestry curves were estimated for each target population, standardizing both against a null (NULL 1) and against a standard (NULL 0) individual, and consistency between the estimated parameters was checked. A total of 100 bootstrap resamplings were performed to estimate the probability values of the admixture events considering the “NULL” individual, by calculating the proportion of nonsensical inferred dates (<1 or >400 generations) produced by the model (NULL 1). The “best-guess” scenario given by GLOBETROTTER v1.0 (Hellenthal et al. 2014) was considered for each target population. Dates of admixture given in generations were converted to chronological time using a generation interval of 29 years (Langergraber et al. 2012).
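The two arithmetic steps at the end of this procedure can be sketched as follows. The 29-year generation interval and the <1 or >400 generation bounds come from the text; the function names and the example bootstrap dates are hypothetical, and the sketch does not reproduce GLOBETROTTER itself.

```python
GENERATION_YEARS = 29  # generation interval stated in the text


def generations_to_years(generations, years_per_generation=GENERATION_YEARS):
    """Convert an admixture date in generations to years before sampling."""
    return generations * years_per_generation


def nonsensical_date_fraction(bootstrap_dates, low=1, high=400):
    """Proportion of bootstrap dates falling outside [low, high] generations.

    This proportion serves as an empirical probability value for the
    admixture event: the smaller it is, the stronger the evidence.
    """
    if not bootstrap_dates:
        raise ValueError("at least one bootstrap replicate is required")
    bad = sum(1 for date in bootstrap_dates if date < low or date > high)
    return bad / len(bootstrap_dates)


# Hypothetical bootstrap dates (in generations) for one target population.
example_dates = [0.5, 10, 20, 500]
example_p = nonsensical_date_fraction(example_dates)
example_years = generations_to_years(10)
```

With 100 bootstrap replicates, as used here, the smallest nonzero value this proportion can take is 0.01.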

Haplotype sharing between pairs of individuals was estimated by the refined IBD algorithm of Beagle v4.0 (Browning and Browning 2007) and visualized with Cytoscape v3.2.1 (Shannon et al. 2003). All maps in the present study were generated using Global Mapper v15 (http://www.bluemarblegeo.com/products/global-mapper.php).

Local Ancestry Inference
The RFMix software (Maples et al. 2013), which employs an LD model between markers to infer the ancestry of each segment of the genome from a mixture of putative ancestral panels of haplotypes, was used in the unpruned and phased data set from the four AP populations plus Iran. The phased data from the 1000 Genomes populations of Great Britain (representing European ancestry), Yoruba from Nigeria (sub-Saharan African ancestry), and Indian Telugu (South Asian ancestry) were used as parental populations. Information on ancestry was obtained for each locus along the chromosomes for every individual, and these values were averaged in each population. In order to identify regions with a significantly higher proportion of a given parental ancestry, we followed previously published studies (Bryc et al. 2015), considering only regions outside the range defined by the genome mean ± 4 SD. The karyograms of the local ancestry across the 22 chromosomes were generated in R using the ggbio package (Yin et al. 2012).
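The averaging and outlier steps described above can be sketched in a few lines. This is an illustrative sketch rather than the authors' pipeline: it assumes the per-locus ancestry calls for one parental ancestry have already been exported as one list of values per individual, and the function names and toy values are hypothetical.

```python
from statistics import mean, pstdev


def population_mean_ancestry(individuals):
    """Average one ancestry's per-locus value across individuals.

    `individuals` is a list of equal-length lists, one per individual.
    """
    return [mean(locus_values) for locus_values in zip(*individuals)]


def outlier_loci(ancestry_by_locus, n_sd=4.0):
    """Return (index, z-score) for loci outside genome mean +/- n_sd * SD."""
    mu = mean(ancestry_by_locus)
    sd = pstdev(ancestry_by_locus)
    if sd == 0:
        return []  # no variation along the genome, nothing can be an outlier
    scored = ((i, (v - mu) / sd) for i, v in enumerate(ancestry_by_locus))
    return [(i, z) for i, z in scored if abs(z) > n_sd]


# Toy genome (hypothetical): 99 background loci and one enriched locus.
toy_genome = [0.1] * 99 + [0.9]
toy_hits = outlier_loci(toy_genome)
```

Positive z-scores mark enrichment of the ancestry in question; the same function with `n_sd=3.0` gives the less stringent threshold also reported in the Results.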

Analysis of Selection
iHS (Voight et al. 2006) was estimated with the SelScan package (Szpiech and Hernandez 2014) in the phased data set from the four AP populations plus Iran. Scores were calculated for each SNP within the population as a whole and standardized within each of 100 bins of allele frequency using the norm tool. In order to identify loci potentially affected by positive selection, a percentile score was assigned based on the proportion of extreme iHS values in each segment. The SelScan tool (Szpiech and Hernandez 2014) was also used to estimate cross-population extended haplotype homozygosity (XP-EHH) in each AP population by comparison with Great Britain, Yoruba from Nigeria, or Indian Telugu. Consistent with the other tests for selection, only markers with MAF > 1% were used for XP-EHH computation. The obtained XP-EHH values were normalized by subtracting the genome-wide mean XP-EHH and dividing by the standard deviation (Szpiech and Hernandez 2014). Genomic regions with percentile score exceeding 4 SD and 3 SD were reported.
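The two normalizations described above amount to z-scoring, either genome-wide (XP-EHH) or within allele-frequency bins (iHS). The sketch below illustrates that arithmetic only; it is not the code of the SelScan norm tool, and the function names, the equal-width binning rule, and the toy inputs are assumptions made for the example.

```python
from statistics import mean, pstdev


def standardize(values):
    """Genome-wide z-scores: subtract the mean and divide by the SD."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        raise ValueError("scores have zero variance")
    return [(v - mu) / sd for v in values]


def standardize_by_frequency_bin(scores, freqs, n_bins=100):
    """Z-score each score against SNPs in the same allele-frequency bin."""
    bins = {}
    for i, freq in enumerate(freqs):
        # Equal-width bins on [0, 1]; a frequency of 1.0 goes to the last bin.
        bin_id = min(int(freq * n_bins), n_bins - 1)
        bins.setdefault(bin_id, []).append(i)
    result = [0.0] * len(scores)
    for indices in bins.values():
        in_bin = [scores[i] for i in indices]
        mu, sd = mean(in_bin), pstdev(in_bin)
        for i in indices:
            # A bin without variation carries no signal; leave its scores at 0.
            result[i] = (scores[i] - mu) / sd if sd > 0 else 0.0
    return result


# Toy inputs (hypothetical): two SNPs near frequency 0.10, two near 0.50.
toy_scores = [1, 3, 10, 20]
toy_freqs = [0.101, 0.105, 0.502, 0.508]
toy_norm = standardize_by_frequency_bin(toy_scores, toy_freqs)
```

Binning by frequency matters for iHS because the raw statistic depends on the frequency of the derived allele, so a genome-wide z-score would mix SNPs that are not comparable.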

Selected SNPs were mapped to genes within a 5-kb flanking region, and significantly selected regions were represented in circos plots generated in R (R Development Core Team 2018) using the shinyCircos graphical interface (Yu et al. 2018).
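The mapping of SNPs to genes within a 5-kb flank is a simple interval lookup, sketched below. The gene names and coordinates are placeholders invented for the example, and the tuple layout is an assumption; a real analysis would read gene coordinates from an annotation file for the relevant genome build.

```python
def genes_near_snp(snp_chrom, snp_pos, genes, flank=5000):
    """Return names of genes whose span, extended by `flank` bp, covers the SNP.

    `genes` is an iterable of (name, chromosome, start, end) tuples.
    """
    return [
        name
        for name, chrom, start, end in genes
        if chrom == snp_chrom and start - flank <= snp_pos <= end + flank
    ]


# Placeholder annotation (hypothetical names and coordinates).
toy_genes = [
    ("GENE_A", "1", 10_000, 20_000),
    ("GENE_B", "1", 40_000, 50_000),
]
hit_a = genes_near_snp("1", 23_000, toy_genes)  # within 5 kb of GENE_A
hit_none = genes_near_snp("1", 30_000, toy_genes)  # more than 5 kb from both
```

A SNP can fall within the flank of more than one gene, which is why the function returns a list rather than a single name.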

Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.

Acknowledgments
This work was financed by FEDER – Fundo Europeu de Desenvolvimento Regional funds through COMPETE 2020 – Operacional Programme for Competitiveness and Internationalisation (POCI), Portugal 2020, and by Portuguese funds through FCT – Fundação para a Ciência e a Tecnologia/Ministério da Ciência, Tecnologia e Inovação, in the framework of the project “Biomedical anthropological study in Arabian Peninsula based on high throughput genomics” (POCI-01-0145-FEDER-016609). V.F. has a postdoc grant through FCT (SFRH/BPD/114927/2016).

References
Al-Abri A, Podgorna E, Rose JI, Pereira L, Mulligan CJ, Silva NM, Bayoumi R, Soares P, Cerny V. 2012. Pleistocene–Holocene boundary in Southern Arabia from the perspective of human mtDNA variation. Am J Phys Anthropol. 149(2):291–298.

Alexander DH, Novembre J, Lange K. 2009. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19(9):1655–1664.

Alshamali F, Pereira L, Budowle B, Poloni ES, Currat M. 2009. Local population structure in Arabian Peninsula revealed by Y-STR diversity. Hum Hered. 68(1):45–54.

An P, Goedert JJ, Donfield S, Buchbinder S, Kirk GD, Detels R, Winkler CA. 2014. Regulatory variation in HIV-1 dependency factor ZNRD1 associates with host resistance to HIV-1 acquisition. J Infect Dis. 210(10):1539–1548.

Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, Abecasis GR. 2015. A global reference for human genetic variation. Nature 526(7571):68–74.

Basu Mallick C, Iliescu FM, Mols M, Hill S, Tamang R, Chaubey G, Goto R, Ho SYW, Gallego Romero I, Crivellaro F, et al. 2013. The light skin allele of SLC24A5 in South Asians and Europeans shares identity by descent. PLoS Genet. 9(11):e1003912.

Bayoumi R, De Fanti S, Sazzini M, Giuliani C, Quagliariello A, Bortolini E, Boattini A, Al-Habori M, Al-Zubairi AS, Rose JI, et al. 2016. Positive selection of lactase persistence among people of Southern Arabia. Am J Phys Anthropol. 161(4):676–684.

Browning BL, Browning SR. 2013. Improving the accuracy and efficiency of identity-by-descent detection in population data. Genetics 194(2):459–471.

Browning SR, Browning BL. 2007. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am J Hum Genet. 81(5):1084–1097.

Brucato N, Fernandes V, Mazieres S, Kusuma P, Cox MP, Ng’ang’a JW, Omar M, Simeone-Senelle MC, Frassati C, Alshamali F, et al. 2018. The Comoros show the earliest Austronesian gene flow into the Swahili corridor. Am J Hum Genet. 102(1):58–68.

Bryc K, Durand EY, Macpherson JM, Reich D, Mountain JL. 2015. The genetic ancestry of African Americans, Latinos, and European Americans across the United States. Am J Hum Genet. 96(1):37–53.

Campbell MC, Tishkoff SA. 2008. African genetic diversity: implications for human demographic history, modern human origins, and complex disease mapping. Annu Rev Genomics Hum Genet. 9:403–433.

Candille SI, Absher DM, Beleza S, Bauchet M, McEvoy B, Garrison NA, Li JZ, Myers RM, Barsh GS, Tang H, et al. 2012. Genome-wide association studies of quantitatively measured skin, hair, and eye pigmentation in four European populations. PLoS One 7(10):e48294.

Cerny V, Mulligan CJ, Fernandes V, Silva NM, Alshamali F, Non A, Harich N, Cherni L, El Gaaied AB, Al-Meeri A, et al. 2011. Internal diversification of mitochondrial haplogroup R0a reveals post-last glacial maximum demographic expansions in South Arabia. Mol Biol Evol. 28(1):71–78.

Cerny V, Pereira L, Kujanova M, Vasikova A, Hajek M, Morris M, Mulligan CJ. 2009. Out of Arabia – the settlement of island Soqotra as revealed by mitochondrial and Y chromosome genetic diversity. Am J Phys Anthropol. 138(4):439–447.

Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. 2015. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4:7.

Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, et al. 2011. The variant call format and VCFtools. Bioinformatics 27(15):2156–2158.

Delaneau O, Marchini J, Zagury JF. 2011. A linear complexity phasing method for thousands of genomes. Nat Methods 9(2):179–181.

Emmrich S, Streltsov A, Schmidt F, Thangapandi VR, Reinhardt D, Klusmann JH. 2014. LincRNAs MONC and MIR100HG act as oncogenes in acute megakaryoblastic leukemia. Mol Cancer 13:171.

Fernandes V, Alshamali F, Alves M, Costa MD, Pereira JB, Silva NM, Cherni L, Harich N, Cerny V, Soares P, et al. 2012. The Arabian cradle: mitochondrial relicts of the first steps along the southern route out of Africa. Am J Hum Genet. 90(2):347–355.

Fernandes V, Triska P, Pereira JB, Alshamali F, Rito T, Machado A, Fajkosova Z, Cavadas B, Cerny V, Soares P, et al. 2015. Genetic stratigraphy of key demographic events in Arabia. PLoS One 10(3):e0118625.

Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, Belmont JW, Boudreau A, Hardenbol P, Leal SM, et al. 2007. A second generation human haplotype map of over 3.1 million SNPs. Nature 449(7164):851–861.

Fu Q, Posth C, Hajdinjak M, Petr M, Mallick S, Fernandes D, Furtwangler A, Haak W, Meyer M, Mittnik A, et al. 2016. The genetic history of Ice Age Europe. Nature 534(7606):200–205.

Gallego Romero I, Basu Mallick C, Liebert A, Crivellaro F, Chaubey G, Itan Y, Metspalu M, Eaaswarkhanth M, Pitchappan R, Villems R, et al. 2012. Herders of Indian and European cattle share their predominant allele for lactase persistence. Mol Biol Evol. 29(1):249–260.

Gravel S, Zakharia F, Moreno-Estrada A, Byrnes JK, Muzzio M, Rodriguez-Flores JL, Kenny EE, Gignoux CR, Maples BK, Guiblet W, et al. 2013. Reconstructing Native American migrations from whole-genome and whole-exome data. PLoS Genet. 9(12):e1004023.

Hellenthal G, Busby GBJ, Band G, Wilson JF, Capelli C, Falush D, Myers S. 2014. A genetic atlas of human admixture history. Science 343(6172):747–751.

Hernandez-Pacheco N, Flores C, Alonso S, Eng C, Mak ACY, Hunstman S, Hu D, White MJ, Oh SS, Meade K, et al. 2017. Identification of a novel locus associated with skin colour in African-admixed populations. Sci Rep. 7:44548.

Hodgson JA, Mulligan CJ, Al-Meeri A, Raaum RL. 2014. Early back-to-Africa migration into the horn of Africa. PLoS Genet. 10(6):e1004393.


Hogarth DG. 1904. The penetration of Arabia: a record of the development of western knowledge concerning the Arabian Peninsula. New York: Cambridge Library Collection.

Hunter-Zinck H, Musharoff S, Salit J, Al-Ali KA, Chouchane L, Gohar A, Matthews R, Butler MW, Fuller J, Hackett NR, et al. 2010. Population genetic structure of the people of Qatar. Am J Hum Genet. 87(1):17–25.

John SE, Thareja G, Hebbar P, Behbehani K, Thanaraj TA, Alsmadi O. 2015. Kuwaiti population subgroup of nomadic Bedouin ancestry – whole genome sequence and analysis. Genom Data 3:116–127.

Kivisild T, Reidla M, Metspalu E, Rosa A, Brehm A, Pennarun E, Parik J, Geberhiwot T, Usanga E, Villems R. 2004. Ethiopian mitochondrial DNA heritage: tracking gene flow across and around the gate of tears. Am J Hum Genet. 75(5):752–770.

Laayouni H, Oosting M, Luisi P, Ioana M, Alonso S, Ricaño-Ponce I, Trynka G, Zhernakova A, Plantinga TS, Cheng S-C, et al. 2014. Convergent evolution in European and Rroma populations reveals pressure exerted by plague on Toll-like receptors. Proc Natl Acad Sci U S A. 111(7):2668–2673.

Lahr MM, Foley R. 2005. Multiple dispersals and modern human origins. Evol Anthropol. 3(2):48–60.

Langergraber KE, Prufer K, Rowney C, Boesch C, Crockford C, Fawcett K, Inoue E, Inoue-Muruyama M, Mitani JC, Muller MN, et al. 2012. Generation times in wild chimpanzees and gorillas suggest earlier divergence times in great ape and human evolution. Proc Natl Acad Sci U S A. 109(39):15716–15721.

Laso-Jadart R, Harmant C, Quach H, Zidane N, Tyler-Smith C, Mehdi Q, Ayub Q, Quintana-Murci L, Patin E. 2017. The genetic legacy of the Indian Ocean slave trade: recent admixture and post-admixture selection in the Makranis of Pakistan. Am J Hum Genet. 101(6):977–984.

Lawson DJ, Hellenthal G, Myers S, Falush D. 2012. Inference of population structure using dense haplotype data. PLoS Genet. 8(1):e1002453.

Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S, Cann HM, Barsh GS, Feldman M, Cavalli-Sforza LL, et al. 2008. Worldwide human relationships inferred from genome-wide patterns of variation. Science 319(5866):1100–1104.

Loh PR, Lipson M, Patterson N, Moorjani P, Pickrell JK, Reich D, Berger B. 2013. Inferring admixture histories of human populations using linkage disequilibrium. Genetics 193(4):1233–1254.

Lovejoy PE. 2011. Transformations in slavery: a history of slavery in Africa. Cambridge: Cambridge University Press.

Macaulay V, Hill C, Achilli A, Rengo C, Clarke D, Meehan W, Blackburn J, Semino O, Scozzari R, Cruciani F, et al. 2005. Single, rapid coastal settlement of Asia revealed by analysis of complete mitochondrial genomes. Science 308(5724):1034.

Malaspinas A-S, Westaway MC, Muller C, Sousa VC, Lao O, Alves I, Bergstrom A, Athanasiadis G, Cheng JY, Crawford JE, et al. 2016. A genomic history of Aboriginal Australia. Nature 538(7624):207.

Mallick S, Li H, Lipson M, Mathieson I, Gymrek M, Racimo F, Zhao M, Chennagiri N, Nordenfelt S, Tandon A, et al. 2016. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538(7624):201.

Maples BK, Gravel S, Kenny EE, Bustamante CD. 2013. RFMix: a discriminative modeling approach for rapid and robust local-ancestry inference. Am J Hum Genet. 93(2):278–288.

McManus KF, Taravella AM, Henn BM, Bustamante CD, Sikora M, Cornejo OE. 2017. Population genetic analysis of the DARC locus (Duffy) reveals adaptation from standing variation associated with malaria resistance in humans. PLoS Genet. 13(3):e1006560.

Moorjani P, Patterson N, Hirschhorn JN, Keinan A, Hao L, Atzmon G, Burns E, Ostrer H, Price AL, Reich D. 2011. The history of African gene flow into Southern Europeans, Levantines, and Jews. PLoS Genet. 7(4):e1001373.

Musilova E, Fernandes V, Silva NM, Soares P, Alshamali F, Harich N, Cherni L, Gaaied AB, Al-Meeri A, Pereira L, et al. 2011. Population history of the Red Sea – genetic exchanges between the Arabian Peninsula and East Africa signaled in the mitochondrial DNA HV1 haplogroup. Am J Phys Anthropol. 145:592–598.

Ozato K, Shin D-M, Chang T-H, Morse HC. 2008. TRIM family proteins and their emerging roles in innate immunity. Nat Rev Immunol. 8(11):849–860.

Patin E, Lopez M, Grollemund R, Verdu P, Harmant C, Quach H, Laval G, Perry GH, Barreiro LB, Froment A, et al. 2017. Dispersals and genetic adaptation of Bantu-speaking populations in Africa and North America. Science 356(6337):543–546.

Patterson N, Moorjani P, Luo Y, Mallick S, Rohland N, Zhan Y, Genschoreck T, Webster T, Reich D. 2012. Ancient admixture in human history. Genetics 192(3):1065.

Patterson N, Price AL, Reich D. 2006. Population structure and eigenanalysis. PLoS Genet. 2(12):e190.

Petkova D, Novembre J, Stephens M. 2016. Visualizing spatial population structure with estimated effective migration surfaces. Nat Genet. 48(1):94–100.

Pickrell JK, Coop G, Novembre J, Kudaravalli S, Li JZ, Absher D, Srinivasan BS, Barsh GS, Myers RM, Feldman MW, et al. 2009. Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 19(5):826–837.

Pickrell JK, Pritchard JK. 2012. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. 8(11):e1002967.

Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. 2006. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 38(8):904–909.

Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, et al. 2007. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 81(3):559–575.

Quintana-Murci L, Semino O, Bandelt H-J, Passarino G, McElreavey K, Santachiara-Benerecetti AS. 1999. Genetic evidence of an early exit of Homo sapiens sapiens from Africa through eastern Africa. Nat Genet. 23(4):437.

R Development Core Team. 2018. R: a language and environment for statistical computing. Vienna (Austria): R Foundation for Statistical Computing.

Racimo F. 2016. Testing for ancient selection using cross-population allele frequency differentiation. Genetics 202(2):733–750.

Ranciaro A, Campbell MC, Hirbo JB, Ko W-Y, Froment A, Anagnostou P, Kotze MJ, Ibrahim M, Nyambo T, Omar SA, et al. 2014. Genetic origins of lactase persistence and the spread of pastoralism in Africa. Am J Hum Genet. 94(4):496–510.

Reich D, Thangaraj K, Patterson N, Price AL, Singh L. 2009. Reconstructing Indian population history. Nature 461(7263):489–494.

Roche B, Rougeron V, Quintana-Murci L, Renaud F, Abbate JL, Prugnolle F. 2017. Might interspecific interactions between pathogens drive host evolution? The case of Plasmodium species and Duffy-negativity in human populations. Trends Parasitol. 33(1):21–29.

Rodriguez-Flores JL, Fakhro K, Agosto-Perez F, Ramstetter MD, Arbiza L, Vincent TL, Robay A, Malek JA, Suhre K, Chouchane L, et al. 2016. Indigenous Arabs are descendants of the earliest split from ancient Eurasian populations. Genome Res. 26(2):151–162.

Rosenberg NA, Huang L, Jewett EM, Szpiech ZA, Jankovic I, Boehnke M. 2010. Genome-wide association studies in diverse populations. Nat Rev Genet. 11(5):356–366.

Russo A, Di Gaetano C, Cugliari G, Matullo G. 2018. Advances in the genetics of hypertension: the effect of rare variants. Int J Mol Sci. 19(3):688.

Segal R. 2002. Islam’s black slaves: the other black diaspora. London: Atlantic Books.

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T. 2003. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13(11):2498–2504.

Skoglund P, Thompson JC, Prendergast ME, Mittnik A, Sirak K, Hajdinjak M, Salie T, Rohland N, Mallick S, Peltzer A, et al. 2017. Reconstructing prehistoric African population structure. Cell 171(1):59–71.e21.


Snow RW, Amratia P, Zamani G, Mundia CW, Noor AM, Memish ZA, Al Zahrani MH, Al Jasari A, Fikri M, Atta H. 2013. The malaria transition on the Arabian Peninsula: progress toward a malaria-free region between 1960–2010. Adv Parasitol. 82:205–251.

Soares P, Alshamali F, Pereira JB, Fernandes V, Silva NM, Afonso C, Costa MD, Musilova E, Macaulay V, Richards MB, et al. 2012. The expansion of mtDNA haplogroup L3 within and out of Africa. Mol Biol Evol. 29(3):915–927.

Szpiech ZA, Hernandez RD. 2014. selscan: an efficient multithreaded program to perform EHH-based scans for positive selection. Mol Biol Evol. 31(10):2824–2827.

Tekola-Ayele F, Adeyemo A, Chen G, Hailu E, Aseffa AA-O, Davey GA-O, Newport MJ, Rotimi CA-OX. 2015. Novel genomic signals of recent selection in an Ethiopian population. Eur J Hum Genet. 23(8):1085–1092.

Thompson EE, Kuttab-Boulos H, Witonsky D, Yang L, Roe BA, Di Rienzo A. 2004. CYP3A variation and the evolution of salt-sensitivity variants. Am J Hum Genet. 75(6):1059–1069.

Tishkoff SA, Reed FA, Ranciaro A, Voight BF, Babbitt CC, Silverman JS, Powell K, Mortensen HM, Hirbo JB, Osman M, et al. 2007. Convergent adaptation of human lactase persistence in Africa and Europe. Nat Genet. 39(1):31–40.

Triska P, Soares P, Patin E, Fernandes V, Cerny V, Pereira L. 2015. Extensive admixture and selective pressure across the Sahel Belt. Genome Biol Evol. 7(12):3484–3495.

Uchil PD, Quinlan BD, Chan W-T, Luna JM, Mothes W. 2008. TRIM E3 ligases interfere with early and late stages of the retroviral life cycle. PLoS Pathog. 4(2):e16.

Voight BF, Kudaravalli S, Wen X, Pritchard JK. 2006. A map of recent positive selection in the human genome. PLoS Biol. 4(3):e72.

Vyas DN, Al-Meeri A, Mulligan CJ. 2017. Testing support for the northern and southern dispersal routes out of Africa: an analysis of Levantine and southern Arabian populations. Am J Phys Anthropol. 164(4):736–749.

Vyas DN, Kitchen A, Miro-Herrans AT, Pearson LN, Al-Meeri A, Mulligan CJ. 2016. Bayesian analyses of Yemeni mitochondrial genomes suggest multiple migration events with Africa and Western Eurasia. Am J Phys Anthropol. 159(3):382–393.

Wagh K, Bhatia A, Alexe G, Reddy A, Ravikumar V, Seiler M, Boemo M, Yao M, Cronk L, Naqvi A, et al. 2012. Lactase persistence and lipid pathway selection in the Maasai. PLoS One 7(9):e44751.

Yin T, Cook D, Lawrence M. 2012. ggbio: an R package for extending the grammar of graphics for genomic data. Genome Biol. 13(8):R77.

Yu Y, Ouyang Y, Yao W. 2018. shinyCircos: an R/Shiny application for interactive creation of Circos plot. Bioinformatics 34(7):1229–1231.

Zhao F, McParland S, Kearney F, Du L, Berry DP. 2015. Detection of selection signatures in dairy and beef cattle using high-density genomic information. Genet Sel Evol. 47:49.


classified AP and Iranian individuals in 12 clusters, attesting to their diverse genetic reservoir (supplementary fig. S6, Supplementary Material online). Proportions of ancestries reflected the ones obtained in ADMIXTURE: AP and Iranian individuals were mostly clustered with Levant/North African populations (92% from Yemen, 92% Iran, 92% the UAE, 84% Saudi Arabia, and 80% Oman) and, to a lesser extent, with South Asians (4–8% of UAE and Oman individuals) and sub-Saharan Africans, including Comoro, Madagascar, and Swahili populations (12% Oman, 12% Saudi Arabia, and 4% Iran).

Overall, GLOBETROTTER results highlighted recent admixture events in the AP region (fig. 2B and supplementary table S4, Supplementary Material online). Between 400 and 1,000 years before present, admixture events with East African ancestry took place in all AP populations. The non-African background was similar to a European background in all populations except the UAE and Iran, where a closeness to South Asian ancestry was identified. The sub-Saharan African proportion was twice as high in the West AP as in the East AP. In Saudi Arabia, the UAE, and Oman, multidate events were more probable, but the second event was even younger (67, 153, and 160 BP, respectively) and involved the same ancestries as the older event.

Local Ancestry Inference
The RFMix software package (Maples et al. 2013) was used to analyze the genomes of admixed Arabian and Iranian populations and to detect regions showing an excess of a given parental population (sub-Saharan African or South Asian, to the detriment of the main European/Levant background). Results revealed statistically significant enrichment of a particular ancestry in certain regions of the genome, which are associated with selection signals due to adaptive traits such as malaria resistance, skin pigmentation, and lactose tolerance (fig. 3A and B; supplementary figs. S7 and S8, Supplementary Material online, show results for all tested populations; supplementary table S5, Supplementary Material online).

The region of the genome with the highest proportion of sub-Saharan African ancestry (>4 SD) observed in Arabian and Iranian populations was located on chromosome 1, in a gene-rich region containing the DARC gene and several olfactory receptor genes such as OR10J1, OR10J3, and OR10J5. This sub-Saharan African input was higher in Yemen (a mean of 0.61; supplementary fig. S9, Supplementary Material online) than in other regions of the Peninsula (0.49 Oman, 0.35 the UAE, 0.28 Saudi Arabia) and in Iran (0.16). Another region highly enriched for sub-Saharan African ancestry in all populations was located on chromosome 9, containing genes coding for forkhead box D4 (FOXD4L4 and FOXD4L2) and spermatogenesis-associated proteins (FAM75A5 and FAM75A7).

The RFMix analysis indicated a South Asian-enriched region on chromosome 2 containing the RAB3GAP1, ZRANB3, LCT, MCM6, and CXCR4 genes. They have been previously associated with lactase persistence, the ability to digest milk into adulthood, which is an adaptive trait mostly found in European populations and a classic example of genetic adaptation in humans (Tishkoff et al. 2007; Campbell and Tishkoff 2008; Triska et al. 2015). This region reached highly significant values (>4 SD) in Saudi Arabia and Yemen and was also significant in the UAE and Iran (3 SD < X < 4 SD). Another South Asian-enriched region (3 SD < X < 4 SD) was detected on chromosome 5 in Yemen, Saudi Arabia, and the UAE, containing the SLC45A2 gene, associated with skin, eye, and hair pigmentation. Nevertheless, the most significant South Asian-enriched signal, detected in all populations (>4 SD), was located on chromosome 6 and contained many genes related to the immune system (HLA-B, HLA-DR, HLA-DQ, MICA, MICB, and TNF).

Selection in the AP and Iran
When checking positive selection signals in the AP and Iran by using the integrated haplotype score (iHS) and cross-population extended haplotype homozygosity (XP-EHH) measures, several genomic regions displayed significant values (fig. 3C–F, supplementary figs. S10–S13, and supplementary tables S6–S9, Supplementary Material online).

In particular, several signals previously associated with selection in non-African populations were detected. East AP and Iranian populations exhibited a strong signal of positive selection (>4 SD) in the XP-EHH-European comparison on the chromosome 2 region containing the R3HDM1 and LCT genes, associated in Europeans with food conversion efficiency and metabolism of lactose (Zhao et al. 2015). The SLC24A5 gene on chromosome 15, which is strongly associated with skin pigmentation variation among European and South Asian populations (Basu Mallick et al. 2013), presented signs of positive selection in all populations in the iHS (>4 SD for Yemen, Saudi Arabia, and the UAE; >3 SD for Oman and Iran) and XP-EHH-South Asian (>3 SD for all populations) analyses. Another skin pigmentation-related gene, SLC45A2, which is located on chromosome 5 and is known to be strongly associated with this phenotype in European populations (Candille et al. 2012; Hernandez-Pacheco et al. 2017), displayed signs of positive selection in all populations for the XP-EHH-European comparison (>4 SD in Saudi Arabia and >3 SD in the remaining populations). The same occurred for the chromosome 4 region containing the TLR1, TLR6, and TLR10 genes (role in pathogen recognition and activation of innate immunity [Laayouni et al. 2014]; >4 SD in Yemen and Saudi Arabia and >3 SD in the remaining populations) and for the chromosome 12 region containing the ATXN2, ACAD10, and ALDH2 genes (associated with hypertension [Russo et al. 2018]; >4 SD in Saudi Arabia and >3 SD in the remaining populations).

In the comparison with African populations (XP-EHH-African), significant values (>3 SD) were observed for the chromosome 1 region containing the DARC gene in Saudi Arabia and Iran. Amongst the strongest signals of positive selection was the chromosome 21 region containing the C21orf34, MIR99A, and MIRLET7C genes, detected in all populations (>4 SD) in both the iHS and XP-EHH-African tests. This signal was previously identified as amongst the strongest selection signals in non-Africans (Pickrell et al. 2009), and recently C21orf34 was identified as MIR99AHG, a miRNA gene


FIG. 3. Circos plots highlighting significantly (>4 SD) enriched local admixed and positively selected segments in Arabian Peninsula and Iranian populations. (A) Enriched sub-Saharan African ancestry. (B) Enriched South Asian ancestry. (C) iHS selection metrics. (D) XP-EHH versus European ancestry. (E) XP-EHH versus sub-Saharan African ancestry. (F) XP-EHH versus South Asian ancestry.


involved in the regulation of hematopoiesis and acting as an oncogene in the development of myeloid leukemia (Emmrich et al. 2014). In addition, a signal on the chromosome 10 region containing the SPAG6 gene was observed in all populations in the iHS (except Yemen) and XP-EHH-African (stronger in Oman and the UAE) analyses. This gene is essential for the structural integrity of the sperm central microtubule, and for sperm stability and flagellar motility, and has been previously identified as being under selection before the separation between European and Asian populations (Racimo 2016). The significant iHS signal on the chromosome 7 region containing several genes (CYP3A5, ZSCAN25, ATP5J2-PTCD1, CPSF4, ATP5J2, ZNF789, and ZKSCAN5) was also detected in all populations in the XP-EHH-African comparison. The CYP3A5 gene is involved in cholesterol metabolism and steroid biosynthesis in the liver and has been identified as being under positive selection in African and non-African populations (Wagh et al. 2012). The CYP3A5*1/*3 polymorphism was also described as being under selective pressure for water retention and risk of salt-sensitive hypertension in equatorial populations (Thompson et al. 2004).

The iHS metrics identified a positively selected region on chromosome 6 in all populations, which contains candidate genes involved in stages of the human immunodeficiency virus (HIV) life cycle: the ZNRD1-AS1 and TRIM26 genes. ZNRD1-AS1 plays a role in the regulation of cell proliferation and is essential for the completion of the HIV life cycle. Its rs3132130 SNP was described as conferring host resistance to HIV-1 acquisition by causing a loss of nuclear factor binding and decreasing the ZNRD1 promoter activity (An et al. 2014). Although their function is unknown, the tripartite motif (TRIM) proteins may contribute to innate immunity against retroviruses, affecting both early and late stages of the retroviral life cycle (Ozato et al. 2008; Uchil et al. 2008). Another region significantly selected in iHS in all populations except the Saudi Arabians was identified on chromosome 22, containing the TTC38, PKDREJ, GTSE1, and PPARA genes. The PPARA gene encodes a nuclear receptor that regulates lipid and glucose metabolism during food deficiency, and a recent study (Tekola-Ayele et al. 2015) found a specific signal of recent positive selection of this gene in Ethiopia, suggesting a metabolic adaptation to high-altitude hypoxia.

Conclusions
The detailed genome-wide characterization of Arabian and Iranian populations allowed us to establish, for the first time, the palimpsest of admixture and selection along the genomes of the inhabitants of this main bridge between African, Asian, and European population groups. Admixture events comprised up to 20% of sub-Saharan African ancestry in the West and 20% of South Asian ancestry in the East of the AP, enriching the locally predominant Arabian-, North African/Levantine-, and Caucasian-like components. The sub-Saharan African input decreased toward the East, whereas the South Asian input decreased toward the West of the Peninsula, testifying to the continuous genetic exchange with Africa and Asia, as was evident in the IBD sharing analysis.

Of course, there are specific subgroups that depart from these general characteristics, such as the Qatari Bedouins, who represent a “coldspot” of admixture in the AP (Rodriguez-Flores et al. 2016), whereas the Hadrami in Yemen are a “hotspot” of admixture (Vyas et al. 2017). EEMS analysis identified higher exchanges of genes between Africa and the West AP, as well as between the East AP and the Levant/South Asia, than between the West and the East AP. This result agrees with the previously documented degree of isolation between West and East AP populations identified when studying uniparental markers (Alshamali et al. 2009; Fernandes et al. 2012, 2015). We previously found relict maternal lineages (60 ka) of the original successful OOA settlement by modern humans in the AP, and most likely they spread to South Asia and the Levant/Europe from the Gulf corridor (Fernandes et al. 2012). However, current LD decay–based dates only indicated recent admixture events in the eastern side of the Peninsula. Similarly, younger ages of admixture in the West AP reflected the Islamization of North Africa as the most probable main contributor of this region. The exchanges across the Red Sea detected for the autosomal data matched the Arab slave trade and maritime dominance.

Recent studies have provided evidence that admixed populations acquired positively selected genomic regions from their parental populations or, in other words, that the distribution of admixed blocks across the genomes of descendants was not even (Triska et al. 2015; Laso-Jadart et al. 2017; Patin et al. 2017). The acquired selected regions can be inferred from the concomitant identification of the same genomic regions in the selection metrics and in the local ancestry enrichment analysis. AP populations displayed a jigsaw puzzle of African-, European-, and South Asian-selected genes linked to response to infectious diseases, skin pigmentation, lactase persistence, and food conversion.

The strongest African-related enrichment signal was for the DARC gene, which codes for the Duffy antigen, a membrane-bound chemokine receptor used by Plasmodium vivax to infect red blood cells. Plasmodium vivax causes a chronic form of malaria and is the most widespread type of malaria outside Africa (McManus et al. 2017). The derived C allele mutation in the promoter, which causes erythroid-specific suppression of gene expression, has been described as conferring resistance against Plasmodium vivax malaria, and thus it is almost fixed in African populations (Triska et al. 2015; Laso-Jadart et al. 2017; McManus et al. 2017). The AP is known to have had a diverse malaria epidemiology in recent times, from a likely absence of transmission in Kuwait to an intense parasite exposure, similar to conditions in many parts of Africa, among residents of Saudi Arabia and Yemen (Snow et al. 2013). The enrichment signal detected here in the genomes of Arabians supported the higher malaria pressure in the West AP. The enrichment was strong enough to allow detection of selection in the XP-EHH metrics in Saudi Arabia and Iran.

The most significant European/South Asian-related enrichments in AP populations were linked to the lactase tolerance-specific chromosome 2 RAB3GAP1/LCT/MCM6 region, to skin pigmentation associated with the SLC24A5 gene, and to the HLA region on chromosome 6. The LCT gene -13915*G allele (rs41380347) was described as being present at high frequency in the AP and the Levant and was estimated to have originated 4 ka (±2 ka) in the AP (Bayoumi et al. 2016), possibly as a result of the domestication of the Arabian camel at 6 ka (Ranciaro et al. 2014). By contrast, the European -13910*T selected allele (rs4988235) is also the most commonly observed in South Asia (Gallego Romero et al. 2012). We could not verify the frequencies of the rs41380347 SNP, which was not part of the chip genotyped here, but we checked the linked rs4988243 SNP. This last SNP confirmed the higher frequencies in the West versus the East AP (43.2% in Yemen, 46.1% in Saudi Arabia, 17.9% in Oman, 19.1% in the UAE, and 17.4% in Iran; notice that although rs41380347 and rs4988243 are linked, this last SNP has overall frequencies 20% higher than the first). The genotyped European rs4988235 SNP presented higher frequencies of the derived allele in the East than in the West AP (0.6%, 1.3%, 12.5%, 18.2%, and 10.3%, respectively). The haplotype backgrounds of the Arabian and European derived alleles are related and distant from the African one (Ranciaro et al. 2014), which explains the high non-African enrichment detected in the RFMix analysis in the West AP (the Arabian haplotypes) and the selection signal in the XP-EHH-European comparison in the East AP and Iran (the European haplotypes). This is an interesting case generated by the isolation between the two sides of the Peninsula, where the West developed local positive selection of a lactose-tolerance variant, whereas the East acquired the European/South Asian variant. Previously, this difference had been reported for ethnic groups within Oman (Bayoumi et al. 2016), where frequencies of the -13910*T and -13915*G alleles diverged between Arabs of northern and southern Oman, who migrated from Yemen (0–1% and 14–72%, respectively), and Omanis of Asian origin (16% and 0%, respectively). Here, we verified that this pattern extends to the entire Peninsula.

In relation to the genetic diversity linked to skin color, the European/South Asian-selected derived SNPs are equally frequent in both the West and East AP (rs1426654 in the SLC24A5 gene: 10.1% in Yemen, 13.3% in Saudi Arabia, 13.2% in Oman, 13.5% in the UAE, and 11.8% in Iran; rs183671 in the SLC45A2 gene: 35.4%, 29.4%, 22.7%, 29.0%, and 40.1%, respectively; rs1667394 in the OCA2/HERC2 gene: 38.1%, 28.8%, 30.7%, 33.5%, and 34.9%, respectively).

The palimpsest of complex interactions between cross and local selective pressures and demographic factors identified here in the AP testifies to the rich genetic dynamics taking place in bridging zones of the globe. In the near future, whole genomes of Arabians across the entire peninsula (with more precise geographical information than we could present here) and more sensitive selection measures will add information on the role played by this geographic region in the adaptation of modern humans to non-African habitats. Improvements to dating techniques applicable to complete genomes, such as the multiple sequentially Markovian coalescent approach, will also help in answering the question of whether the AP was only a passageway to reach further distances or one of the oldest inhabited regions OOA. Complete genomes may indeed shed light on the lost period in the mtDNA pool due to the absence of the direct OOA descendants. The first steps have already been taken, and a few Arabian complete genomes have begun to be available (John et al. 2015; Mallick et al. 2016; Rodriguez-Flores et al. 2016).

Materials and Methods

Population Samples, Genome-Wide Genotyping, and Published Data
DNA samples analyzed in this study were collected from 420 individuals of the AP (Saudi Arabia, Yemen, Oman, and the UAE) and from 80 individuals of Iranian populations (further information provided in supplementary table S1, Supplementary Material online). Individuals were residing in Dubai at the time of sampling (between 1991 and 2000), representing a widespread collection of birth origins from Saudi Arabia, Yemen, Oman, the UAE, and Iran. This study obtained ethical approval from the Ethics Committee of the University of Porto, Portugal (17/CEUP/2012). The individuals were genotyped with the Illumina Human Omni Express Bead Chip (OmniExpress) containing 741,000 SNPs. Quality checks were applied using PLINK (Purcell et al. 2007; Chang et al. 2015), and 22 individuals were excluded for lacking more than 10% of genotypes. A preliminary PCA revealed that some UAE and Oman individuals (five individuals from each population) had considerable sub-Saharan African ancestry, an indicator that they were most probably recent immigrants from Africa (supplementary fig. S14, Supplementary Material online). Hence, these individuals were removed from the analyses, leading to a final set of 468 individuals. Markers with genotyping call rates lower than 95% were removed; consequently, a total of 627,435 autosomal SNPs passed the quality control check. The genotypes from the individuals analyzed in this study can be accessed from the EGA repository (European Genome-Phenome Archive; accession number EGAS00001003335). In order to perform a fine-scale population structure analysis of our complete data, the 468 individuals were merged with relevant available data sets (supplementary table S1, Supplementary Material online). As several population genetic analyses assume independent markers, SNPs were pruned for pairwise LD in PLINK (www.cog-genomics.org/plink/1.9/; Purcell et al. 2007; Chang et al. 2015) by removing any SNP that had an r² > 0.4 with another SNP within a 50-SNP sliding window with a step of 20 SNPs. After pruning, the “restricted data set” included 176,309 autosomal SNPs and 1,782 individuals (supplementary table S1, Supplementary Material online), and this data set was used in the analyses based on allele frequencies (PCA, ADMIXTURE, FST, and f3 statistic). Another data set was created by randomly selecting a maximum of 25 individuals from each population but extending the number and the geographical distribution of populations; this data set, consisting of 171,728 SNPs and 2,056 individuals, was designated the “extended data set” (supplementary table S2, Supplementary Material online). The homogenization of the number of samples per group avoided statistical biases due to overrepresentation of some populations and allowed us to run computationally demanding algorithms such as EEMS, fineSTRUCTURE, Chromopainter, GLOBETROTTER, and IBD. All genotypes from the unpruned “extended data set” were phased with SHAPEIT v2.r790 (Delaneau et al. 2011) using the 1000 Genomes phased data (Auton et al. 2015) as a reference panel and the HapMap phase 2 genetic map (Frazer et al. 2007). We did not merge our data with relevant data sets from Qatar and Yemen (Hunter-Zinck et al. 2010; Vyas et al. 2017), as these were screened through Affymetrix chips bearing a low number of overlapping SNPs. Qatari complete genomes (Rodriguez-Flores et al. 2016) were specifically selected and thus do not constitute a random population sample that would be comparable with our populations.
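The per-population cap used to build the extended data set can be illustrated with a short Python sketch. It is not the authors' script: the function name, the random seed, and the toy sample identifiers are hypothetical, and only the cap of 25 individuals per population comes from the text.

```python
import random
from collections import defaultdict


def downsample_per_population(sample_to_pop, max_per_pop=25, seed=0):
    """Keep at most `max_per_pop` randomly chosen samples per population.

    `sample_to_pop` maps a sample ID to its population label.
    Returns a dict: population label -> sorted list of retained sample IDs.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    by_pop = defaultdict(list)
    for sample_id, pop in sample_to_pop.items():
        by_pop[pop].append(sample_id)

    kept = {}
    for pop, ids in by_pop.items():
        ids = sorted(ids)  # deterministic order before sampling
        if len(ids) > max_per_pop:
            ids = sorted(rng.sample(ids, max_per_pop))
        kept[pop] = ids
    return kept


# Toy input (invented IDs): 40 samples from one population, 10 from another.
samples = {f"YEM{i:03d}": "Yemen" for i in range(40)}
samples.update({f"OMN{i:03d}": "Oman" for i in range(10)})
subset = downsample_per_population(samples)
```

Populations below the cap are kept whole, so only overrepresented groups are thinned.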

Population Structure, Differentiation, and Inbreeding
PCA, which infers worldwide axes of human genetic variation from the allele frequencies of various populations, was carried out on the pruned “restricted data set” using the smartpca tool included in the EIGENSOFT package (Patterson et al. 2006; Price et al. 2006). We ran ADMIXTURE (Alexander et al. 2009) to decompose genetic ancestries of the pruned “restricted data set” using maximum likelihood for components K = 2 to K = 12, with the optimal K estimated through cross-validation of the logistic regression (K = 6). Wright’s FST metric was calculated using VCFtools (Danecek et al. 2011). The THREEPOP program from TreeMix was used to calculate f3 statistics (Reich et al. 2009; Patterson et al. 2012; Pickrell and Pritchard 2012), which test between alternative scenarios of admixture. In order to analyze and visualize spatial population structure from georeferenced genetic samples and to define genetic barriers and corridors, the EEMS method (Petkova et al. 2016) was used. Geographic coordinates and a genetic dissimilarity matrix between populations were used to set up a map defining a grid of 500 demes, and 3,000,000 Markov chain Monte Carlo (MCMC) iterations were run before checking for convergence. Plots were generated in R following the EEMS v1 manual indications.
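As a rough illustration of what an f3 statistic measures, the Python sketch below computes the naive estimator f3(C; A, B) as the mean over SNPs of (c - a)(c - b), where a, b, and c are allele frequencies in the two candidate source populations and in the target. This is a simplification and not the THREEPOP implementation: it omits the finite-sample bias correction and the block-jackknife standard errors, and the toy frequencies are invented for the example.

```python
def f3(target, source1, source2):
    """Naive f3(target; source1, source2) from per-SNP allele frequencies.

    Each argument is a list of allele frequencies (0..1) for the same SNPs.
    A clearly negative value suggests that the target is admixed between
    populations related to the two sources.
    """
    if not (len(target) == len(source1) == len(source2)):
        raise ValueError("all frequency lists must cover the same SNPs")
    if not target:
        raise ValueError("at least one SNP is required")
    products = [(c - a) * (c - b) for c, a, b in zip(target, source1, source2)]
    return sum(products) / len(products)


# Toy frequencies: the target sits exactly midway between the two sources.
pop_a = [0.1, 0.2, 0.9, 0.8]
pop_b = [0.9, 0.8, 0.1, 0.2]
mixed = [0.5, 0.5, 0.5, 0.5]

f3_admixed = f3(mixed, pop_a, pop_b)    # negative: consistent with admixture
f3_unadmixed = f3(pop_a, pop_b, mixed)  # positive: no admixture signal
```

In practice the sign alone is not interpreted; a Z-score from resampling blocks of SNPs is needed to judge significance.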

Inference and Dating of Population Admixture
Population structure of the phased “extended data set” was evaluated using the fineSTRUCTURE v2.07 package (Lawson et al. 2012). fineSTRUCTURE captures information on population structure provided by patterns of haplotype similarity calculated with Chromopainter v2.0 (Lawson et al. 2012) and performs a model-based Bayesian clustering of genotypes. From the results, a coancestry heat map and a dendrogram were inferred to visualize the number of statistically defined clusters that describe the data. The clusters obtained by fineSTRUCTURE were then used to assign the populations as target or donor/surrogate when testing for admixture. Mutation rate and effective population size parameters were first estimated with an Expectation-Maximization algorithm running Chromopainter v2 on all 22 autosomes for the entire data set with 10 iterations (Lawson et al. 2012). The weighted average of these parameters, according to the SNP coverage of each chromosome and the number of individuals, was then used to compute the chromosomal painting. Two Arabian groups were identified as target populations: the western group, represented by Yemen and Saudi Arabia, and the eastern group, represented by Oman and the UAE. We ran chromosomal painting for each group. Due to the possibility of common ancestry, we excluded the Levantine populations from the surrogate populations. The painted chromosomes obtained for each cluster were used in GLOBETROTTER v1.0 (Hellenthal et al. 2014) to estimate the ratios and dates of the potential admixture events in the region. Coancestry curves were estimated for each target population, standardizing against both a null (NULL 1) and a standard (NULL 0) individual, and consistency between each estimated parameter was checked. A total of 100 bootstrap resamplings were performed to estimate the probability values of the admixture events, considering the “NULL” individual, by calculating the proportion of nonsensical inferred dates (<1 or >400 generations) produced by the model (NULL 1). The “best-guess” scenario given by GLOBETROTTER v1.0 (Hellenthal et al. 2014) was considered for each target population. Dates of admixture given in generations were converted to chronological time using a generation interval of 29 years (Langergraber et al. 2012).
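The two arithmetic steps at the end of this paragraph, the bootstrap-based probability and the conversion from generations to years, can be sketched in Python as follows. The function names and the toy bootstrap dates are hypothetical; the 29-year generation interval and the <1 or >400 generation bounds are taken from the text. The sketch returns years before sampling and does not attempt a calendar date, which would require assuming a reference year.

```python
GENERATION_YEARS = 29  # generation interval stated in the text


def generations_to_years(generations, years_per_generation=GENERATION_YEARS):
    """Convert an admixture date in generations to years before sampling."""
    return generations * years_per_generation


def nonsensical_fraction(bootstrap_dates, low=1, high=400):
    """Proportion of bootstrap dates (in generations) outside [low, high].

    This proportion plays the role of an empirical p-value for the
    presence of admixture, as described for the NULL-individual runs.
    """
    if not bootstrap_dates:
        raise ValueError("at least one bootstrap date is required")
    bad = sum(1 for g in bootstrap_dates if g < low or g > high)
    return bad / len(bootstrap_dates)


# Toy values (invented): four bootstrap dates, two of them nonsensical.
p_value = nonsensical_fraction([0.5, 10, 20, 450])
years_ago = generations_to_years(10)
```

With 100 bootstrap resamplings, as in the text, the smallest nonzero value this proportion can take is 0.01.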

Haplotype sharing between pairs of individuals was estimated by the refined IBD algorithm of Beagle v4.0 (Browning and Browning 2007) and visualized with Cytoscape v3.2.1 (Shannon et al. 2003). All maps in the present study were generated using Global Mapper v15 (http://www.bluemarblegeo.com/products/global-mapper.php).

Local Ancestry Inference
RFMix (Maples et al. 2013), software that employs an LD model between markers to infer the ancestry of each segment of the genome from a mixture of putative ancestral panels of haplotypes, was used on the unpruned and phased data set from the four AP populations plus Iran. The phased data from the 1000 Genomes populations of Great Britain (representing European ancestry), Yoruba from Nigeria (sub-Saharan African ancestry), and Indian Telugu (South Asian ancestry) were used as parental populations. Information on ancestry was obtained for each locus along the chromosomes for every individual, and these values were averaged in each population. In order to identify regions with a significantly higher proportion of a given parental ancestry, we followed previously published studies (Bryc et al. 2015), considering only regions outside the range defined by the genome mean ± 4 SD. The karyograms of local ancestry across the 22 chromosomes were generated in R using the ggbio package (Yin et al. 2012).
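The mean ± 4 SD criterion for enriched regions can be sketched in Python as below. This is an illustrative reading of the rule, not the authors' pipeline: the function name and the toy ancestry proportions are invented, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption.

```python
from statistics import mean, pstdev


def enriched_windows(ancestry, n_sd=4.0):
    """Return (index, value) for loci outside genome mean +/- n_sd * SD.

    `ancestry` holds the population-averaged proportion of one parental
    ancestry at each locus along the genome.
    """
    mu = mean(ancestry)
    sd = pstdev(ancestry)  # assumption: population SD over all loci
    if sd == 0:
        return []  # no variation, so nothing can be an outlier
    upper = mu + n_sd * sd
    lower = mu - n_sd * sd
    return [(i, v) for i, v in enumerate(ancestry) if v > upper or v < lower]


# Toy genome (invented): 99 loci at 10% African ancestry and one at 90%.
toy_ancestry = [0.10] * 99 + [0.90]
flagged = enriched_windows(toy_ancestry)
```

Because the outlier itself inflates the standard deviation, a threshold this strict only flags loci that deviate strongly from a fairly uniform genomic background.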

Analysis of Selection
iHS (Voight et al. 2006) was estimated with the SelScan package (Szpiech and Hernandez 2014) in the phased data set from the four AP populations plus Iran. Scores were calculated for each SNP within the population as a whole and standardized within each of 100 bins of allele frequency using the norm tool. In order to identify loci potentially affected by positive selection, a percentile score was assigned based on the proportion of extreme iHS values in each segment. The SelScan tool (Szpiech and Hernandez 2014) was also used to estimate cross-population extended haplotype homozygosity (XP-EHH) in each AP population by comparing it with Great Britain, Yoruba from Nigeria, or Indian Telugu. Consistent with other tests for selection, only markers with MAF > 1% were used for XP-EHH computation. The obtained XP-EHH values were normalized by subtracting the genome-wide mean XP-EHH and dividing by the standard deviation (Szpiech and Hernandez 2014). Genomic regions with percentile scores exceeding 4 SD and 3 SD were reported.
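The two normalizations described above can be sketched in plain Python. This is a simplified illustration and not the code of the norm tool: the function names, the equal-width binning rule, the use of the population standard deviation, and the toy scores are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscores(values):
    """Subtract the mean and divide by the SD (genome-wide normalization)."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def standardize_within_frequency_bins(scores, freqs, n_bins=100):
    """Standardize scores separately within equal-width allele-frequency bins.

    `scores` are unstandardized per-SNP statistics (for example iHS) and
    `freqs` the matching allele frequencies in [0, 1].
    """
    bins = defaultdict(list)
    for idx, freq in enumerate(freqs):
        bin_id = min(int(freq * n_bins), n_bins - 1)  # freq == 1.0 -> last bin
        bins[bin_id].append(idx)

    out = [0.0] * len(scores)
    for idxs in bins.values():
        for idx, z in zip(idxs, zscores([scores[i] for i in idxs])):
            out[idx] = z
    return out


# Toy data (invented): two SNPs near 10% frequency, two near 50%.
raw = [1.0, 3.0, 10.0, 30.0]
freq = [0.101, 0.105, 0.501, 0.509]
binned = standardize_within_frequency_bins(raw, freq)

# Genome-wide normalization, as described for XP-EHH.
xpehh_norm = zscores([1.0, 2.0, 3.0])
```

Binning by frequency matters for iHS because the unstandardized score depends on allele frequency, whereas the XP-EHH values are normalized against the genome-wide distribution as a whole.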

Selected SNPs were mapped to genes within 5 kb flanking regions, and significantly selected regions were represented in Circos plots generated in R (R Development Core Team 2018) using the shinyCircos graphical interface (Yu et al. 2018).

Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.

Acknowledgments
This work was financed by FEDER (Fundo Europeu de Desenvolvimento Regional) funds through the COMPETE 2020 Operacional Programme for Competitiveness and Internationalisation (POCI), Portugal 2020, and by Portuguese funds through FCT (Fundação para a Ciência e a Tecnologia/Ministério da Ciência, Tecnologia e Inovação) in the framework of the project “Biomedical anthropological study in Arabian Peninsula based on high throughput genomics” (POCI-01-0145-FEDER-016609). V.F. has a postdoc grant through FCT (SFRH/BPD/114927/2016).

References
Al-Abri A, Podgorna E, Rose JI, Pereira L, Mulligan CJ, Silva NM, Bayoumi R, Soares P, Cerny V. 2012. Pleistocene–Holocene boundary in Southern Arabia from the perspective of human mtDNA variation. Am J Phys Anthropol. 149(2):291–298.

Alexander DH, Novembre J, Lange K. 2009. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19(9):1655–1664.

Alshamali F, Pereira L, Budowle B, Poloni ES, Currat M. 2009. Local population structure in Arabian Peninsula revealed by Y-STR diversity. Hum Hered. 68(1):45–54.

An P, Goedert JJ, Donfield S, Buchbinder S, Kirk GD, Detels R, Winkler CA. 2014. Regulatory variation in HIV-1 dependency factor ZNRD1 associates with host resistance to HIV-1 acquisition. J Infect Dis. 210(10):1539–1548.

Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, Abecasis GR. 2015. A global reference for human genetic variation. Nature 526(7571):68–74.

Basu Mallick C, Iliescu FM, Mols M, Hill S, Tamang R, Chaubey G, Goto R, Ho SYW, Gallego Romero I, Crivellaro F, et al. 2013. The light skin allele of SLC24A5 in South Asians and Europeans shares identity by descent. PLoS Genet. 9(11):e1003912.

Bayoumi R, De Fanti S, Sazzini M, Giuliani C, Quagliariello A, Bortolini E, Boattini A, Al-Habori M, Al-Zubairi AS, Rose JI, et al. 2016. Positive selection of lactase persistence among people of Southern Arabia. Am J Phys Anthropol. 161(4):676–684.

Browning BL, Browning SR. 2013. Improving the accuracy and efficiency of identity-by-descent detection in population data. Genetics 194(2):459–471.

Browning SR, Browning BL. 2007. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am J Hum Genet. 81(5):1084–1097.

Brucato N, Fernandes V, Mazieres S, Kusuma P, Cox MP, Ng’ang’a JW, Omar M, Simeone-Senelle MC, Frassati C, Alshamali F, et al. 2018. The Comoros show the earliest Austronesian gene flow into the Swahili corridor. Am J Hum Genet. 102(1):58–68.

Bryc K, Durand EY, Macpherson JM, Reich D, Mountain JL. 2015. The genetic ancestry of African Americans, Latinos, and European Americans across the United States. Am J Hum Genet. 96(1):37–53.

Campbell MC, Tishkoff SA. 2008. African genetic diversity: implications for human demographic history, modern human origins, and complex disease mapping. Annu Rev Genomics Hum Genet. 9:403–433.

Candille SI, Absher DM, Beleza S, Bauchet M, McEvoy B, Garrison NA, Li JZ, Myers RM, Barsh GS, Tang H, et al. 2012. Genome-wide association studies of quantitatively measured skin, hair, and eye pigmentation in four European populations. PLoS One 7(10):e48294.

Cerny V, Mulligan CJ, Fernandes V, Silva NM, Alshamali F, Non A, Harich N, Cherni L, El Gaaied AB, Al-Meeri A, et al. 2011. Internal diversification of mitochondrial haplogroup R0a reveals post-last glacial maximum demographic expansions in South Arabia. Mol Biol Evol. 28(1):71–78.

Cerny V, Pereira L, Kujanova M, Vasikova A, Hajek M, Morris M, Mulligan CJ. 2009. Out of Arabia: the settlement of island Soqotra as revealed by mitochondrial and Y chromosome genetic diversity. Am J Phys Anthropol. 138(4):439–447.

Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. 2015. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4:7.

Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, et al. 2011. The variant call format and VCFtools. Bioinformatics 27(15):2156–2158.

Delaneau O, Marchini J, Zagury JF. 2011. A linear complexity phasing method for thousands of genomes. Nat Methods. 9(2):179–181.

Emmrich S, Streltsov A, Schmidt F, Thangapandi VR, Reinhardt D, Klusmann JH. 2014. LincRNAs MONC and MIR100HG act as oncogenes in acute megakaryoblastic leukemia. Mol Cancer. 13:171.

Fernandes V, Alshamali F, Alves M, Costa MD, Pereira JB, Silva NM, Cherni L, Harich N, Cerny V, Soares P, et al. 2012. The Arabian cradle: mitochondrial relicts of the first steps along the southern route out of Africa. Am J Hum Genet. 90(2):347–355.

Fernandes V, Triska P, Pereira JB, Alshamali F, Rito T, Machado A, Fajkosova Z, Cavadas B, Cerny V, Soares P, et al. 2015. Genetic stratigraphy of key demographic events in Arabia. PLoS One 10(3):e0118625.

Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, Belmont JW, Boudreau A, Hardenbol P, Leal SM, et al. 2007. A second generation human haplotype map of over 3.1 million SNPs. Nature 449(7164):851–861.

Fu Q, Posth C, Hajdinjak M, Petr M, Mallick S, Fernandes D, Furtwangler A, Haak W, Meyer M, Mittnik A, et al. 2016. The genetic history of Ice Age Europe. Nature 534(7606):200–205.

Gallego Romero I, Basu Mallick C, Liebert A, Crivellaro F, Chaubey G, Itan Y, Metspalu M, Eaaswarkhanth M, Pitchappan R, Villems R, et al. 2012. Herders of Indian and European cattle share their predominant allele for lactase persistence. Mol Biol Evol. 29(1):249–260.

Gravel S, Zakharia F, Moreno-Estrada A, Byrnes JK, Muzzio M, Rodriguez-Flores JL, Kenny EE, Gignoux CR, Maples BK, Guiblet W, et al. 2013. Reconstructing Native American migrations from whole-genome and whole-exome data. PLoS Genet. 9(12):e1004023.

Hellenthal G, Busby GBJ, Band G, Wilson JF, Capelli C, Falush D, Myers S. 2014. A genetic atlas of human admixture history. Science 343(6172):747–751.

Hernandez-Pacheco N, Flores C, Alonso S, Eng C, Mak ACY, Hunstman S, Hu D, White MJ, Oh SS, Meade K, et al. 2017. Identification of a novel locus associated with skin colour in African-admixed populations. Sci Rep. 7:44548.

Hodgson JA, Mulligan CJ, Al-Meeri A, Raaum RL. 2014. Early back-to-Africa migration into the horn of Africa. PLoS Genet. 10(6):e1004393.

Hogarth DG. 1904. The penetration of Arabia: a record of the development of western knowledge concerning the Arabian Peninsula. New York: Cambridge Library Collection.

Hunter-Zinck H, Musharoff S, Salit J, Al-Ali KA, Chouchane L, Gohar A, Matthews R, Butler MW, Fuller J, Hackett NR, et al. 2010. Population genetic structure of the people of Qatar. Am J Hum Genet. 87(1):17–25.

John SE, Thareja G, Hebbar P, Behbehani K, Thanaraj TA, Alsmadi O. 2015. Kuwaiti population subgroup of nomadic Bedouin ancestry: whole genome sequence and analysis. Genom Data. 3:116–127.

Kivisild T, Reidla M, Metspalu E, Rosa A, Brehm A, Pennarun E, Parik J, Geberhiwot T, Usanga E, Villems R. 2004. Ethiopian mitochondrial DNA heritage: tracking gene flow across and around the gate of tears. Am J Hum Genet. 75(5):752–770.

Laayouni H, Oosting M, Luisi P, Ioana M, Alonso S, Ricaño-Ponce I, Trynka G, Zhernakova A, Plantinga TS, Cheng S-C, et al. 2014. Convergent evolution in European and Rroma populations reveals pressure exerted by plague on Toll-like receptors. Proc Natl Acad Sci U S A. 111(7):2668–2673.

Lahr MM, Foley R. 2005. Multiple dispersals and modern human origins. Evol Anthropol. 3(2):48–60.

Langergraber KE, Prufer K, Rowney C, Boesch C, Crockford C, Fawcett K, Inoue E, Inoue-Muruyama M, Mitani JC, Muller MN, et al. 2012. Generation times in wild chimpanzees and gorillas suggest earlier divergence times in great ape and human evolution. Proc Natl Acad Sci U S A. 109(39):15716–15721.

Laso-Jadart R, Harmant C, Quach H, Zidane N, Tyler-Smith C, Mehdi Q, Ayub Q, Quintana-Murci L, Patin E. 2017. The genetic legacy of the Indian Ocean slave trade: recent admixture and post-admixture selection in the Makranis of Pakistan. Am J Hum Genet. 101(6):977–984.

Lawson DJ, Hellenthal G, Myers S, Falush D. 2012. Inference of population structure using dense haplotype data. PLoS Genet. 8(1):e1002453.

Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S, Cann HM, Barsh GS, Feldman M, Cavalli-Sforza LL, et al. 2008. Worldwide human relationships inferred from genome-wide patterns of variation. Science 319(5866):1100–1104.

Loh PR, Lipson M, Patterson N, Moorjani P, Pickrell JK, Reich D, Berger B. 2013. Inferring admixture histories of human populations using linkage disequilibrium. Genetics 193(4):1233–1254.

Lovejoy PE. 2011. Transformations in slavery: a history of slavery in Africa. Cambridge: Cambridge University Press.

Macaulay V, Hill C, Achilli A, Rengo C, Clarke D, Meehan W, Blackburn J, Semino O, Scozzari R, Cruciani F, et al. 2005. Single, rapid coastal settlement of Asia revealed by analysis of complete mitochondrial genomes. Science 308(5724):1034.

Malaspinas A-S, Westaway MC, Muller C, Sousa VC, Lao O, Alves I, Bergstrom A, Athanasiadis G, Cheng JY, Crawford JE, et al. 2016. A genomic history of Aboriginal Australia. Nature 538(7624):207.

Mallick S, Li H, Lipson M, Mathieson I, Gymrek M, Racimo F, Zhao M, Chennagiri N, Nordenfelt S, Tandon A, et al. 2016. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538(7624):201.

Maples BK, Gravel S, Kenny EE, Bustamante CD. 2013. RFMix: a discriminative modeling approach for rapid and robust local-ancestry inference. Am J Hum Genet. 93(2):278–288.

McManus KF, Taravella AM, Henn BM, Bustamante CD, Sikora M, Cornejo OE. 2017. Population genetic analysis of the DARC locus (Duffy) reveals adaptation from standing variation associated with malaria resistance in humans. PLoS Genet. 13(3):e1006560.

Moorjani P, Patterson N, Hirschhorn JN, Keinan A, Hao L, Atzmon G, Burns E, Ostrer H, Price AL, Reich D. 2011. The history of African gene flow into Southern Europeans, Levantines, and Jews. PLoS Genet. 7(4):e1001373.

Musilova E, Fernandes V, Silva NM, Soares P, Alshamali F, Harich N, Cherni L, Gaaied AB, Al-Meeri A, Pereira L, et al. 2011. Population history of the Red Sea: genetic exchanges between the Arabian Peninsula and East Africa signaled in the mitochondrial DNA HV1 haplogroup. Am J Phys Anthropol. 145:592–598.

Ozato K, Shin D-M, Chang T-H, Morse HC. 2008. TRIM family proteins and their emerging roles in innate immunity. Nat Rev Immunol. 8(11):849–860.

Patin E, Lopez M, Grollemund R, Verdu P, Harmant C, Quach H, Laval G, Perry GH, Barreiro LB, Froment A, et al. 2017. Dispersals and genetic adaptation of Bantu-speaking populations in Africa and North America. Science 356(6337):543–546.

Patterson N, Moorjani P, Luo Y, Mallick S, Rohland N, Zhan Y, Genschoreck T, Webster T, Reich D. 2012. Ancient admixture in human history. Genetics 192(3):1065.

Patterson N, Price AL, Reich D. 2006. Population structure and eigenanalysis. PLoS Genet. 2(12):e190.

Petkova D, Novembre J, Stephens M. 2016. Visualizing spatial population structure with estimated effective migration surfaces. Nat Genet. 48(1):94–100.

Pickrell JK, Coop G, Novembre J, Kudaravalli S, Li JZ, Absher D, Srinivasan BS, Barsh GS, Myers RM, Feldman MW, et al. 2009. Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 19(5):826–837.

Pickrell JK, Pritchard JK. 2012. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. 8(11):e1002967.

Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. 2006. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 38(8):904–909.

Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, et al. 2007. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 81(3):559–575.

Quintana-Murci L, Semino O, Bandelt H-J, Passarino G, McElreavey K, Santachiara-Benerecetti AS. 1999. Genetic evidence of an early exit of Homo sapiens sapiens from Africa through eastern Africa. Nat Genet. 23(4):437.

R Development Core Team. 2018. R: a language and environment for statistical computing. Vienna (Austria): R Foundation for Statistical Computing.

Racimo F. 2016. Testing for ancient selection using cross-population allele frequency differentiation. Genetics 202(2):733–750.

Ranciaro A, Campbell MC, Hirbo JB, Ko W-Y, Froment A, Anagnostou P, Kotze MJ, Ibrahim M, Nyambo T, Omar SA, et al. 2014. Genetic origins of lactase persistence and the spread of pastoralism in Africa. Am J Hum Genet. 94(4):496–510.

Reich D, Thangaraj K, Patterson N, Price AL, Singh L. 2009. Reconstructing Indian population history. Nature 461(7263):489–494.

Roche B, Rougeron V, Quintana-Murci L, Renaud F, Abbate JL, Prugnolle F. 2017. Might interspecific interactions between pathogens drive host evolution? The case of Plasmodium species and Duffy-negativity in human populations. Trends Parasitol. 33(1):21–29.

Rodriguez-Flores JL, Fakhro K, Agosto-Perez F, Ramstetter MD, Arbiza L, Vincent TL, Robay A, Malek JA, Suhre K, Chouchane L, et al. 2016. Indigenous Arabs are descendants of the earliest split from ancient Eurasian populations. Genome Res. 26(2):151–162.

Rosenberg NA, Huang L, Jewett EM, Szpiech ZA, Jankovic I, Boehnke M. 2010. Genome-wide association studies in diverse populations. Nat Rev Genet. 11(5):356–366.

Russo A, Di Gaetano C, Cugliari G, Matullo G. 2018. Advances in the genetics of hypertension: the effect of rare variants. Int J Mol Sci. 19(3):688.

Segal R. 2002. Islam’s black slaves: the other black diaspora. London: Atlantic Books.

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T. 2003. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13(11):2498–2504.

Skoglund P, Thompson JC, Prendergast ME, Mittnik A, Sirak K, Hajdinjak M, Salie T, Rohland N, Mallick S, Peltzer A, et al. 2017. Reconstructing prehistoric African population structure. Cell 171(1):59–71.e21.

Snow RW, Amratia P, Zamani G, Mundia CW, Noor AM, Memish ZA, Al Zahrani MH, Al Jasari A, Fikri M, Atta H. 2013. The malaria transition on the Arabian Peninsula: progress toward a malaria-free region between 1960–2010. Adv Parasitol. 82:205–251.

Soares P, Alshamali F, Pereira JB, Fernandes V, Silva NM, Afonso C, Costa MD, Musilova E, Macaulay V, Richards MB, et al. 2012. The expansion of mtDNA haplogroup L3 within and out of Africa. Mol Biol Evol. 29(3):915–927.

Szpiech ZA, Hernandez RD. 2014. selscan: an efficient multithreaded program to perform EHH-based scans for positive selection. Mol Biol Evol. 31(10):2824–2827.

Tekola-Ayele F, Adeyemo A, Chen G, Hailu E, Aseffa A, Davey G, Newport MJ, Rotimi C. 2015. Novel genomic signals of recent selection in an Ethiopian population. Eur J Hum Genet. 23(8):1085–1092.

Thompson EE, Kuttab-Boulos H, Witonsky D, Yang L, Roe BA, Di Rienzo A. 2004. CYP3A variation and the evolution of salt-sensitivity variants. Am J Hum Genet. 75(6):1059–1069.

Tishkoff SA, Reed FA, Ranciaro A, Voight BF, Babbitt CC, Silverman JS, Powell K, Mortensen HM, Hirbo JB, Osman M, et al. 2007. Convergent adaptation of human lactase persistence in Africa and Europe. Nat Genet. 39(1):31–40.

Triska P, Soares P, Patin E, Fernandes V, Cerny V, Pereira L. 2015. Extensive admixture and selective pressure across the Sahel Belt. Genome Biol Evol. 7(12):3484–3495.

Uchil PD, Quinlan BD, Chan W-T, Luna JM, Mothes W. 2008. TRIM E3 ligases interfere with early and late stages of the retroviral life cycle. PLoS Pathog. 4(2):e16.

Voight BF, Kudaravalli S, Wen X, Pritchard JK. 2006. A map of recent positive selection in the human genome. PLoS Biol. 4(3):e72.

Vyas DN, Al-Meeri A, Mulligan CJ. 2017. Testing support for the northern and southern dispersal routes out of Africa: an analysis of Levantine and southern Arabian populations. Am J Phys Anthropol. 164(4):736–749.

Vyas DN, Kitchen A, Miro-Herrans AT, Pearson LN, Al-Meeri A, Mulligan CJ. 2016. Bayesian analyses of Yemeni mitochondrial genomes suggest multiple migration events with Africa and Western Eurasia. Am J Phys Anthropol. 159(3):382–393.

Wagh K, Bhatia A, Alexe G, Reddy A, Ravikumar V, Seiler M, Boemo M, Yao M, Cronk L, Naqvi A, et al. 2012. Lactase persistence and lipid pathway selection in the Maasai. PLoS One 7(9):e44751.

Yin T, Cook D, Lawrence M. 2012. ggbio: an R package for extending the grammar of graphics for genomic data. Genome Biol. 13(8):R77.

Yu Y, Ouyang Y, Yao W. 2018. shinyCircos: an R/Shiny application for interactive creation of Circos plot. Bioinformatics 34(7):1229–1231.

Zhao F, McParland S, Kearney F, Du L, Berry DP. 2015. Detection of selection signatures in dairy and beef cattle using high-density genomic information. Genet Sel Evol. 47:49.

Fernandes et al. doi:10.1093/molbev/msz005 MBE


FIG. 3. Circos plots highlighting significantly (>4 SD) enriched local admixed and positively selected segments in Arabian Peninsula and Iranian populations. (A) Enriched sub-Saharan African ancestry. (B) Enriched South Asian ancestry. (C) iHS selection metrics. (D) XP-EHH versus European ancestry. (E) XP-EHH versus sub-Saharan African ancestry. (F) XP-EHH versus South Asian ancestry.


involved in the regulation of hematopoiesis and oncogenes for the development of myeloid leukemia (Emmrich et al. 2014). In addition, a signal on the chromosome 10 region containing the SPAG6 gene was observed in all populations in the iHS (except Yemen) and XP-EHH-African (stronger in Oman and the UAE) analyses. This gene is essential for the structural integrity of the central microtubule of sperm, stability, and flagellar motility, and has been previously identified as being under selection before the separation between European and Asian populations (Racimo 2016). The significant iHS signal on the chromosome 7 region containing several genes (CYP3A5, ZSCAN25, ATP5J2-PTCD1, CPSF4, ATP5J2, ZNF789, and ZKSCAN5) was also detected in all populations in the XP-EHH-African comparison. The CYP3A5 gene is involved in cholesterol metabolism and steroid biosynthesis in the liver and has been identified as being under positive selection in African and non-African populations (Wagh et al. 2012). The CYP3A5*1/*3 polymorphism was also described as being under selective pressure for water retention and risk for salt-sensitive hypertension in equatorial populations (Thompson et al. 2004).

The iHS metrics identified a positively selected region on chromosome 6 in all populations, which contains candidate genes involved in stages of the human immunodeficiency virus (HIV) lifecycle: the ZNRD1-AS1 and TRIM26 genes. ZNRD1-AS1 plays a role in the regulation of cell proliferation and is essential for the completion of the HIV lifecycle. Its rs3132130 SNP was described to confer host resistance to HIV-1 acquisition by causing a loss of nuclear factor binding and decreasing the ZNRD1 promoter activity (An et al. 2014). Although their function is unknown, the tripartite motif (TRIM) proteins may contribute to innate immunity against retroviruses, affecting both early and late stages of the retroviral life cycle (Ozato et al. 2008; Uchil et al. 2008). Another significantly iHS-selected region in all populations except the Saudi Arabians was identified on the chromosome 22 region containing the TTC38, PKDREJ, GTSE1, and PPARA genes. The PPARA gene encodes a transcription factor that regulates lipid and glucose metabolism during food deficiency, and a recent study (Tekola-Ayele et al. 2015) found a specific signal of recent positive selection of this gene in Ethiopia, suggesting a metabolic adaptation to high-altitude hypoxia.

Conclusions

The detailed genome-wide characterization of Arabian and Iranian populations allowed us to establish, for the first time, the palimpsest of admixture and selection along the genomes of these inhabitants of the main bridge between African, Asian, and European population groups. Admixture events comprised up to 20% of sub-Saharan African ancestry in the West and 20% of South Asian ancestry in the East of the AP, enriching the locally predominant Arabian-, North African-, Levantine-, and Caucasian-like components. The sub-Saharan African input decreased toward the East, whereas the South Asian input decreased toward the West of the Peninsula, testifying to the continuous genetic exchange with Africa and Asia, as was evident in the IBD sharing analysis.

Of course, there are specific subgroups that depart from these general characteristics, such as the Qatari Bedouins, who represent a "coldspot" of admixture in the AP (Rodriguez-Flores et al. 2016), whereas the Hadrami in Yemen are a "hotspot" of admixture (Vyas et al. 2017). EEMS analysis identified higher exchanges of genes between Africa and the West AP, as well as between the East AP and the Levant/South Asia, than between the West and the East AP. This result agrees with the previously documented degree of isolation between West and East AP populations identified when studying uniparental markers (Alshamali et al. 2009; Fernandes et al. 2012, 2015). We previously found relict maternal lineages (60 ka) of the original successful OOA settlement by modern humans in the AP, and most likely they spread to South Asia and the Levant/Europe from the Gulf corridor (Fernandes et al. 2012). However, current LD decay-based dates only indicated recent admixture events in the eastern side of the Peninsula. Similarly, younger ages of admixture in the West AP reflected the Islamization of North Africa as the most probable main contributor of this region. The exchanges across the Red Sea detected for the autosomal data matched the Arab slave trade and maritime dominance.

Recent studies have provided evidence that admixed populations acquired positively selected genomic regions from their parental populations or, in other words, that the distribution of admixed blocks across the genomes of descendants was not even (Triska et al. 2015; Laso-Jadart et al. 2017; Patin et al. 2017). The acquired selected regions can be inferred from the concomitant identification of the same genomic regions in the selection metrics and in the local ancestry enrichment analysis. AP populations displayed a jigsaw puzzle of African-, European-, and South Asian-selected genes linked to response to infectious diseases, skin pigmentation, lactase persistence, and food conversion.

The strongest African-related enrichment signal was for the DARC gene, which codes for the Duffy antigen, a membrane-bound chemokine receptor used by Plasmodium vivax to infect red blood cells. Plasmodium vivax causes a chronic form of malaria and is the most widespread type of malaria outside Africa (McManus et al. 2017). The derived C allele mutation in the promoter, which causes erythroid-specific suppression of gene expression, has been described as conferring resistance against Plasmodium vivax malaria, and thus it is almost fixed in African populations (Triska et al. 2015; Laso-Jadart et al. 2017; McManus et al. 2017). The AP is known to have had a diverse malaria epidemiology in recent times, from a likely absence of transmission in Kuwait to an intense parasite exposure, similar to conditions in many parts of Africa, among residents of Saudi Arabia and Yemen (Snow et al. 2013). The enrichment signal detected here in the genomes of Arabians supported the higher malaria pressure in the West AP. The enrichment was strong enough to allow detection of selection in the XP-EHH metrics in Saudi Arabia and Iran.

The most significant European/South Asian-related enrichments in AP populations were linked to the lactose tolerance-specific chromosome 2 RAB3GAP1/LCT/MCM6 region, skin pigmentation associated with the SLC24A5 gene, and the HLA region on chromosome 6. The LCT gene -13915*G allele (rs41380347) was described as being present at high frequency in the AP and the Levant and estimated to have originated 4 ka (±2 ka) in the AP (Bayoumi et al. 2016), possibly as a result of the domestication of the Arabian camel at 6 ka (Ranciaro et al. 2014). By contrast, the European -13910*T selected allele (rs4988235) is also the most commonly observed in South Asia (Gallego Romero et al. 2012). We could not verify frequencies of the rs41380347 SNP, which was not part of the chip genotyped here, but we checked the linked rs4988243 SNP. This last SNP testified to the higher frequencies in the West versus the East AP (43.2% in Yemen, 46.1% in Saudi Arabia, 17.9% in Oman, 19.1% in the UAE, and 17.4% in Iran; notice that although rs41380347 and rs4988243 are linked, this last SNP has overall frequencies 20% higher than the first). The genotyped European rs4988235 SNP presented higher frequencies of the derived allele in the East than in the West AP (0.6%, 1.3%, 12.5%, 18.2%, and 10.3%, respectively). The haplotype backgrounds of the Arabian and European derived alleles are related and distant from the African one (Ranciaro et al. 2014), which explains the high non-African enrichment detected in the RFMix analysis in the West AP (the Arabian haplotypes) and the selection signal in the XP-EHH-European comparison in the East AP and Iran (the European haplotypes). This is an interesting case generated by the isolation between the two sides of the Peninsula, where the West developed a local positive selection of a lactose tolerance variant, whereas the East acquired the European/South Asian variant. Previously, this difference had been reported for ethnic groups within Oman (Bayoumi et al. 2016), where frequencies of the -13910*T and -13915*G alleles diverged between Arabs of northern and southern Oman, who migrated from Yemen (0–1% and 14–72%, respectively), and Omanis of Asian origin (16% and 0%, respectively). Here, we verified that this pattern extends to the entire Peninsula.

In relation to the genetic diversity linked to skin color, the European/South Asian-selected derived SNPs are equally frequent in both the West and East AP (rs1426654 in the SLC24A5 gene: 10.1% in Yemen, 13.3% in Saudi Arabia, 13.2% in Oman, 13.5% in the UAE, and 11.8% in Iran; rs183671 in the SLC45A2 gene: 35.4%, 29.4%, 22.7%, 29.0%, and 40.1%, respectively; rs1667394 in the OCA2/HERC2 gene: 38.1%, 28.8%, 30.7%, 33.5%, and 34.9%, respectively).

The palimpsest of complex interactions between cross and local selective pressures and demographic factors identified here in the AP testifies to the rich genetic dynamics taking place in bridging zones of the globe. In the near future, whole genomes of Arabians across the entire peninsula (with more precise geographical information than we could present here) and more sensitive selection measures will add information on the role played by this geographic region in the adaptation of modern humans to non-African habitats. Improvements to dating techniques applicable to complete genomes, such as the multiple sequentially Markovian coalescent approach, will also help in answering the question of whether the AP was only a passageway to reach further distances or one of the oldest inhabited regions OOA. Complete genomes may indeed shed light on the lost period in the mtDNA pool due to the absence of the direct OOA descendants. The first steps have already been taken, and a few Arabian complete genomes have begun to be available (John et al. 2015; Mallick et al. 2016; Rodriguez-Flores et al. 2016).

Materials and Methods

Population Samples, Genome-Wide Genotyping, and Published Data

DNA samples analyzed in this study were collected from 420 individuals of the AP (Saudi Arabia, Yemen, Oman, and the UAE) and from 80 individuals of Iran (further information provided in supplementary table S1, Supplementary Material online). Individuals were residing in Dubai at the time of sampling (between 1991 and 2000), representing a widespread collection of birth origins from Saudi Arabia, Yemen, Oman, the UAE, and Iran. This study obtained ethical approval from the Ethics Committee of the University of Porto, Portugal (17/CEUP/2012). The individuals were genotyped with the Illumina Human Omni Express Bead Chip (OmniExpress) containing 741,000 SNPs. Quality checks were applied using PLINK (Purcell et al. 2007; Chang et al. 2015), and 22 individuals were excluded for lacking more than 10% of genotypes. A preliminary PCA revealed that some UAE and Oman individuals (five individuals from each population) had considerable sub-Saharan African ancestry, an indicator that they were most probably recent immigrants from Africa (supplementary fig. S14, Supplementary Material online). Hence, these individuals were removed from the analyses, leading to a final set of 468 individuals. Markers with genotyping call rates lower than 95% were removed; consequently, a total of 627,435 autosomal SNPs passed the quality control check. The genotypes from the individuals analyzed in this study can be accessed from the EGA repository (European Genome-Phenome Archive, accession number EGAS00001003335). In order to perform a fine-scale population structure analysis of our complete data, the 468 individuals were merged with relevant available data sets (supplementary table S1, Supplementary Material online). As several population genetic analyses assume independent markers, SNPs were pruned for pairwise LD in PLINK (www.cog-genomics.org/plink/1.9/; Purcell et al. 2007; Chang et al. 2015) by removing any SNP that had an r² > 0.4 with another SNP within a 50-SNP sliding window with a step of 20 SNPs. After pruning, the "restricted data set" included 176,309 autosomal SNPs and 1,782 individuals (supplementary table S1, Supplementary Material online), and this data set was used in the analyses based on allele frequencies (PCA, ADMIXTURE, FST, and f3 statistic). Another data set was created by randomly selecting a maximum of 25 individuals from each population but extending the number and the geographical distribution of populations; this data set, consisting of 171,728 SNPs and 2,056 individuals, was designated as the "extended data set" (supplementary table S2, Supplementary Material online). The homogenization of the number of samples per group avoided statistical biases due to overrepresentation of some populations and made it possible to run computationally demanding algorithms such as EEMS, fineSTRUCTURE, Chromopainter, GLOBETROTTER, and IBD. All genotypes from the unpruned "extended data set" were phased with SHAPEIT v2r79044 (Delaneau et al. 2011), using the 1000 Genomes phased data (Auton et al. 2015) as a reference panel and the HapMap phase 2 genetic map (Frazer et al. 2007). We did not merge our data with relevant data sets from Qatar and Yemen (Hunter-Zinck et al. 2010; Vyas et al. 2017), as these were screened through Affymetrix chips bearing a low number of overlapping SNPs. Qatari complete genomes from Rodriguez-Flores et al. (2016) were specifically selected and therefore do not constitute a random population sample comparable with our populations.
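The two missingness filters described above (individuals lacking more than 10% of genotypes, then markers with call rates below 95%) can be sketched in a few lines. This is only an illustrative toy in plain Python, not the PLINK procedure actually used; the genotype encoding, the function names `keep_individuals` and `keep_markers`, and the toy matrix are assumptions made for the example.

```python
from typing import List, Optional, Sequence

# Toy genotype encoding (an assumption for this sketch, not PLINK's file format):
# 0, 1 or 2 copies of an allele; None marks a missing call.
Genotype = Optional[int]


def keep_individuals(genotypes: Sequence[Sequence[Genotype]],
                     max_missing: float = 0.10) -> List[int]:
    """Indices of individuals (rows) missing no more than max_missing of their genotypes."""
    kept = []
    for i, row in enumerate(genotypes):
        missing_fraction = sum(g is None for g in row) / len(row)
        if missing_fraction <= max_missing:
            kept.append(i)
    return kept


def keep_markers(genotypes: Sequence[Sequence[Genotype]],
                 individuals: Sequence[int],
                 min_call_rate: float = 0.95) -> List[int]:
    """Indices of markers (columns) whose call rate among kept individuals is >= min_call_rate."""
    kept = []
    for j in range(len(genotypes[0])):
        called = sum(genotypes[i][j] is not None for i in individuals)
        if called / len(individuals) >= min_call_rate:
            kept.append(j)
    return kept


# Hypothetical toy data: 3 individuals x 10 markers.
toy = [
    [0, 1, 2, 1, 0, 2, 1, 0, 1, 2],        # complete
    [None, None, 1, 0, 2, 1, 1, 0, 2, 1],  # 20% missing, excluded
    [1, 0, None, 2, 1, 0, 2, 1, 0, 1],     # 10% missing, kept
]
individuals = keep_individuals(toy)
markers = keep_markers(toy, individuals)
```

Filtering individuals before markers mirrors the order in which the text reports the two steps; with real data the same thresholds would normally be applied through the genotyping software itself.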

Population Structure, Differentiation, and Inbreeding

PCA, which infers worldwide axes of human genetic variation from the allele frequencies of various populations, was carried out in the pruned "restricted data set" by using the smartpca tool included in the EIGENSOFT package (Patterson et al. 2006; Price et al. 2006). We ran ADMIXTURE (Alexander et al. 2009) to decompose genetic ancestries of the pruned "restricted data set" using maximum likelihood for components K = 2 to K = 12, with the optimal K estimated through cross-validation of the logistic regression (K = 6). Wright's FST metric was calculated using VCFtools (Danecek et al. 2011). The THREEPOP program from TreeMix was used to calculate f3 statistics (Reich et al. 2009; Patterson et al. 2012; Pickrell and Pritchard 2012), which test between alternative scenarios of admixture. In order to analyze and visualize spatial population structure from georeferenced genetic samples and to define genetic barriers and corridors, the method of EEMS (Petkova et al. 2016) was used. Geographic coordinates and a genetic dissimilarity matrix between populations were used to set up a map defining a grid of 500 demes, and 3,000,000 Markov chain Monte Carlo (MCMC) iterations were run before checking for convergence. Plots were generated in R following EEMS v1 manual indications.
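As a reminder of what the f3 statistic measures, the sketch below computes the plain (uncorrected) form f3(C; A, B), the average over SNPs of (c - a)(c - b), where a, b, and c are allele frequencies in the two candidate sources and the target. A clearly negative value is the classical signature of the target being admixed between populations related to A and B. This is a toy illustration with invented frequencies; it omits the standard errors and any sample-size correction that dedicated software provides, and the function name `f3` is an assumption of the example.

```python
from typing import Sequence


def f3(target: Sequence[float],
       source1: Sequence[float],
       source2: Sequence[float]) -> float:
    """Uncorrected f3(target; source1, source2): mean over SNPs of (c - a) * (c - b)."""
    if not (len(target) == len(source1) == len(source2)):
        raise ValueError("all populations must be typed at the same SNPs")
    products = [(c - a) * (c - b) for c, a, b in zip(target, source1, source2)]
    return sum(products) / len(products)


# Hypothetical allele frequencies at three SNPs.
pop_a = [0.1, 0.8, 0.3]
pop_b = [0.9, 0.2, 0.7]
pop_c = [0.5, 0.5, 0.5]  # sits between A and B at every SNP

f3_admixed = f3(pop_c, pop_a, pop_b)   # negative: C looks like a mixture of A and B
f3_unadmixed = f3(pop_a, pop_b, pop_c)  # positive: A is not intermediate
```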

Inference and Dating of Population Admixture

Population structure of the phased "extended data set" was evaluated using the fineSTRUCTURE v207 package 18 (Lawson et al. 2012). fineSTRUCTURE captures information on population structure provided by patterns of haplotype similarity, calculated with Chromopainter v20 18 (Lawson et al. 2012), and performs a model-based Bayesian clustering of genotypes. From the results, a coancestry heat map and a dendrogram were inferred to visualize the number of statistically defined clusters that describe the data. The clusters obtained by fineSTRUCTURE were then used to assign the populations as target or donor/surrogate when testing for admixture. Mutational rates and effective population size parameters were first estimated with an Expectation-Maximization algorithm, running Chromopainter v2 on all 22 autosomes for the entire data set with 10 iterations (Lawson et al. 2012). The weighted average of these parameters, according to the SNP coverage of each chromosome and the number of individuals, was then used to compute the chromosomal painting. Two Arabian groups were identified as target populations: the western group, represented by Yemen and Saudi Arabia, and the eastern group, represented by Oman and the UAE. We ran chromosomal painting for each group. Due to the possibility of common ancestry, we excluded the Levantine populations from the surrogate populations. The painted chromosomes obtained for each cluster were used in GLOBETROTTER v10 26 (Hellenthal et al. 2014) to estimate the ratios and dates of the potential admixture events in the region. Coancestry curves were estimated for each target population, both standardizing against a null (NULL 1) and standard (NULL 0) individual, and consistency between each estimated parameter was checked. A total of 100 bootstrap resamplings were performed to estimate the probability values of the admixture events considering the "NULL" individual, by calculating the proportion of nonsensical inferred dates (<1 or >400 generations) produced by the model (NULL 1). The "best-guess" scenario given by GLOBETROTTER v10 26 (Hellenthal et al. 2014) was considered for each target population. Dates of admixture given in generations were converted to chronological time using a generation interval of 29 years (Langergraber et al. 2012).
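The last two steps of this paragraph are simple arithmetic and can be sketched as follows. The bootstrap probability is taken as the proportion of resampled dates that are nonsensical (<1 or >400 generations), and generations are scaled by the 29-year interval. How the resulting span is anchored to a calendar year is not stated in the text, so the sketch returns years before sampling; the function names and the bootstrap values are invented for illustration.

```python
from typing import Sequence

GENERATION_YEARS = 29  # generation interval quoted in the text


def generations_to_years(generations: float,
                         generation_years: float = GENERATION_YEARS) -> float:
    """Convert an admixture date in generations to years before sampling.

    Assumption: a plain product; any calendar-year anchoring is left to the caller.
    """
    return generations * generation_years


def bootstrap_p_value(bootstrap_dates: Sequence[float],
                      low: float = 1.0,
                      high: float = 400.0) -> float:
    """Proportion of bootstrap dates (in generations) outside the plausible range."""
    nonsensical = sum(1 for g in bootstrap_dates if g < low or g > high)
    return nonsensical / len(bootstrap_dates)


# Hypothetical bootstrap replicates of an admixture date, in generations.
dates = [0.5, 12.0, 15.0, 450.0]
p_value = bootstrap_p_value(dates)     # 2 of 4 replicates are nonsensical
years_ago = generations_to_years(10)   # 10 generations at 29 years each
```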

Haplotype sharing between pairs of individuals was estimated by the refined IBD algorithm of Beagle v4026 27 (Browning and Browning 2007) and visualized with Cytoscape v32160 (Shannon et al. 2003). All maps in the present study were generated using Global Mapper v15 (http://www.bluemarblegeo.com/products/global-mapper.php).

Local Ancestry Inference

RFMix (Maples et al. 2013) software, which employs an LD model between markers to infer ancestry for each segment of the genome between a mixture of putative ancestral panels of haplotypes, was used in the unpruned and phased data set from the four AP populations plus Iran. The phased data from the 1000 Genomes populations of Great Britain (representing European ancestry), Yoruba from Nigeria (sub-Saharan African ancestry), and Indian Telugu (South Asian ancestry) were used as parental populations. Information on ancestry was obtained for each locus along chromosomes for every individual, and these values were averaged in each population. In order to identify regions which had a significantly higher proportion of a given parental ancestry, we followed previously published studies (Bryc et al. 2015), considering only regions outside the range defined by the genome mean ± 4 SD. The karyograms of the local ancestry across the 22 chromosomes were generated in R using the ggbio package (Yin et al. 2012).
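The genome mean ± 4 SD criterion can be illustrated with a minimal sketch that takes the population-averaged proportion of one parental ancestry at each locus and returns the loci falling outside that range. Whether the population or sample standard deviation was used is not stated in the text, so the population form is an assumption here, as are the function name `ancestry_outliers` and the toy values.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def ancestry_outliers(ancestry: Sequence[float],
                      n_sd: float = 4.0) -> Dict[str, List[int]]:
    """Indices of loci whose average ancestry lies outside genome mean +/- n_sd * SD."""
    mu = mean(ancestry)
    sd = pstdev(ancestry)  # assumption: population SD over all loci
    upper = mu + n_sd * sd
    lower = mu - n_sd * sd
    return {
        "enriched": [i for i, x in enumerate(ancestry) if x > upper],
        "depleted": [i for i, x in enumerate(ancestry) if x < lower],
    }


# Hypothetical per-locus averages: 99 loci at 10% of a given ancestry, one locus at 60%.
toy_ancestry = [0.10] * 99 + [0.60]
outliers = ancestry_outliers(toy_ancestry)
```

With so strict a cut-off, only loci that depart sharply from the genome-wide background are retained, which is why the toy example needs a large number of background loci for a single peak to stand out.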

Analysis of Selection

iHS (Voight et al. 2006) was estimated in the SelScan package (Szpiech and Hernandez 2014) in the phased data set from the four AP populations plus Iran. Scores were calculated for each SNP within the population as a whole and standardized within each of 100 bins of allele frequency using the norm tool. In order to identify loci potentially affected by positive selection, a percentile score was assigned based on the proportion of extreme iHS values in a segment. The SelScan (Szpiech and Hernandez 2014) tool was also used to estimate cross-population extended haplotype homozygosity (XP-EHH) in each AP population by comparing with Great Britain, Yoruba from Nigeria, or Indian Telugu. Consistent with other tests for selection, only markers with MAF > 1% were used for XP-EHH computation. The obtained XP-EHH values were normalized by subtracting the genome-wide mean XP-EHH and dividing by the standard deviation (Szpiech and Hernandez 2014). Genomic regions with percentile score exceeding 4 SD and 3 SD were reported.
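Both normalizations described above are z-score transformations: iHS within bins of allele frequency, XP-EHH genome-wide. The sketch below is a plain-Python illustration of that idea, not the norm tool's implementation; the equal-width binning rule, the use of the population standard deviation, the function name `standardize_in_bins`, and the toy scores are assumptions. Setting `n_bins=1` reduces it to the genome-wide normalization.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import List, Sequence


def standardize_in_bins(scores: Sequence[float],
                        freqs: Sequence[float],
                        n_bins: int = 100) -> List[float]:
    """Z-score each value against the SNPs that share its allele-frequency bin."""
    bins = defaultdict(list)
    for idx, f in enumerate(freqs):
        # Equal-width bins on [0, 1]; a frequency of exactly 1.0 goes in the last bin.
        bins[min(int(f * n_bins), n_bins - 1)].append(idx)

    out = [0.0] * len(scores)
    for members in bins.values():
        values = [scores[i] for i in members]
        mu, sd = mean(values), pstdev(values)
        for i in members:
            # A bin with no variation carries no information; report 0.
            out[i] = (scores[i] - mu) / sd if sd > 0 else 0.0
    return out


# Hypothetical unstandardized scores and derived-allele frequencies for four SNPs.
raw = [1.0, 3.0, 10.0, 14.0]
freq = [0.101, 0.105, 0.502, 0.509]

binned = standardize_in_bins(raw, freq, n_bins=100)    # two bins of two SNPs each
genome_wide = standardize_in_bins(raw, freq, n_bins=1)  # XP-EHH-style normalization
```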

Selected SNPs were mapped to genes within a 5 kb flanking region, and significantly selected regions were represented in Circos plots generated in R (R Development Core Team 2018) using the shinyCircos graphical interface (Yu et al. 2018).
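Mapping a SNP to genes within a 5 kb flanking region amounts to an interval lookup, sketched below. The gene names and coordinates are invented placeholders rather than real annotations, and the `Gene` tuple and `genes_near_snp` function are hypothetical helpers introduced only for this example.

```python
from typing import List, NamedTuple, Sequence


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int  # first base of the gene
    end: int    # last base of the gene


def genes_near_snp(chrom: str, pos: int, genes: Sequence[Gene],
                   flank: int = 5000) -> List[str]:
    """Names of genes on the same chromosome whose span, extended by flank bp, covers pos."""
    return [g.name for g in genes
            if g.chrom == chrom and g.start - flank <= pos <= g.end + flank]


# Hypothetical annotation with placeholder coordinates.
annotation = [
    Gene("GENE_A", "1", 10_000, 20_000),
    Gene("GENE_B", "1", 30_000, 40_000),
]

hit_a = genes_near_snp("1", 24_000, annotation)     # within 5 kb downstream of GENE_A only
hit_both = genes_near_snp("1", 25_000, annotation)  # exactly 5 kb from both genes
hit_none = genes_near_snp("2", 24_000, annotation)  # different chromosome
```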

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.

Acknowledgments

This work was financed by FEDER - Fundo Europeu de Desenvolvimento Regional funds through COMPETE 2020 - Operacional Programme for Competitiveness and Internationalisation (POCI), Portugal 2020, and by Portuguese funds through FCT - Fundação para a Ciência e a Tecnologia/Ministério da Ciência, Tecnologia e Inovação in the framework of the project "Biomedical anthropological study in Arabian Peninsula based on high throughput genomics" (POCI-01-0145-FEDER-016609). V.F. has a postdoc grant through FCT (SFRH/BPD/114927/2016).

References

Al-Abri A, Podgorna E, Rose JI, Pereira L, Mulligan CJ, Silva NM, Bayoumi R, Soares P, Cerny V. 2012. Pleistocene–Holocene boundary in Southern Arabia from the perspective of human mtDNA variation. Am J Phys Anthropol. 149(2):291–298.

Alexander DH, Novembre J, Lange K. 2009. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19(9):1655–1664.

Alshamali F, Pereira L, Budowle B, Poloni ES, Currat M. 2009. Local population structure in Arabian Peninsula revealed by Y-STR diversity. Hum Hered. 68(1):45–54.

An P, Goedert JJ, Donfield S, Buchbinder S, Kirk GD, Detels R, Winkler CA. 2014. Regulatory variation in HIV-1 dependency factor ZNRD1 associates with host resistance to HIV-1 acquisition. J Infect Dis. 210(10):1539–1548.

Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, Abecasis GR. 2015. A global reference for human genetic variation. Nature 526(7571):68–74.

Basu Mallick C, Iliescu FM, Mols M, Hill S, Tamang R, Chaubey G, Goto R, Ho SYW, Gallego Romero I, Crivellaro F, et al. 2013. The light skin allele of SLC24A5 in South Asians and Europeans shares identity by descent. PLoS Genet. 9(11):e1003912.

Bayoumi R, De Fanti S, Sazzini M, Giuliani C, Quagliariello A, Bortolini E, Boattini A, Al-Habori M, Al-Zubairi AS, Rose JI, et al. 2016. Positive selection of lactase persistence among people of Southern Arabia. Am J Phys Anthropol. 161(4):676–684.

Browning BL, Browning SR. 2013. Improving the accuracy and efficiency of identity-by-descent detection in population data. Genetics 194(2):459–471.

Browning SR, Browning BL. 2007. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am J Hum Genet. 81(5):1084–1097.

Brucato N, Fernandes V, Mazieres S, Kusuma P, Cox MP, Ng'ang'a JW, Omar M, Simeone-Senelle MC, Frassati C, Alshamali F, et al. 2018. The Comoros show the earliest Austronesian gene flow into the Swahili corridor. Am J Hum Genet. 102(1):58–68.

Bryc K, Durand EY, Macpherson JM, Reich D, Mountain JL. 2015. The genetic ancestry of African Americans, Latinos, and European Americans across the United States. Am J Hum Genet. 96(1):37–53.

Campbell MC, Tishkoff SA. 2008. African genetic diversity: implications for human demographic history, modern human origins, and complex disease mapping. Annu Rev Genomics Hum Genet. 9:403–433.

Candille SI, Absher DM, Beleza S, Bauchet M, McEvoy B, Garrison NA, Li JZ, Myers RM, Barsh GS, Tang H, et al. 2012. Genome-wide association studies of quantitatively measured skin, hair, and eye pigmentation in four European populations. PLoS One 7(10):e48294.

Cerny V, Mulligan CJ, Fernandes V, Silva NM, Alshamali F, Non A, Harich N, Cherni L, El Gaaied AB, Al-Meeri A, et al. 2011. Internal diversification of mitochondrial haplogroup R0a reveals post-last glacial maximum demographic expansions in South Arabia. Mol Biol Evol. 28(1):71–78.

Cerny V, Pereira L, Kujanova M, Vasikova A, Hajek M, Morris M, Mulligan CJ. 2009. Out of Arabia - the settlement of island Soqotra as revealed by mitochondrial and Y chromosome genetic diversity. Am J Phys Anthropol. 138(4):439–447.

Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. 2015. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4:7.

Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, et al. 2011. The variant call format and VCFtools. Bioinformatics 27(15):2156–2158.

Delaneau O, Marchini J, Zagury JF. 2011. A linear complexity phasing method for thousands of genomes. Nat Methods. 9(2):179–181.

Emmrich S, Streltsov A, Schmidt F, Thangapandi VR, Reinhardt D, Klusmann JH. 2014. LincRNAs MONC and MIR100HG act as oncogenes in acute megakaryoblastic leukemia. Mol Cancer. 13:171.

Fernandes V, Alshamali F, Alves M, Costa MD, Pereira JB, Silva NM, Cherni L, Harich N, Cerny V, Soares P, et al. 2012. The Arabian cradle: mitochondrial relicts of the first steps along the southern route out of Africa. Am J Hum Genet. 90(2):347–355.

Fernandes V, Triska P, Pereira JB, Alshamali F, Rito T, Machado A, Fajkosova Z, Cavadas B, Cerny V, Soares P, et al. 2015. Genetic stratigraphy of key demographic events in Arabia. PLoS One 10(3):e0118625.

Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, Belmont JW, Boudreau A, Hardenbol P, Leal SM, et al. 2007. A second generation human haplotype map of over 3.1 million SNPs. Nature 449(7164):851–861.

Fu Q, Posth C, Hajdinjak M, Petr M, Mallick S, Fernandes D, Furtwangler A, Haak W, Meyer M, Mittnik A, et al. 2016. The genetic history of Ice Age Europe. Nature 534(7606):200–205.

Gallego Romero I, Basu Mallick C, Liebert A, Crivellaro F, Chaubey G, Itan Y, Metspalu M, Eaaswarkhanth M, Pitchappan R, Villems R, et al. 2012. Herders of Indian and European cattle share their predominant allele for lactase persistence. Mol Biol Evol. 29(1):249–260.

Gravel S, Zakharia F, Moreno-Estrada A, Byrnes JK, Muzzio M, Rodriguez-Flores JL, Kenny EE, Gignoux CR, Maples BK, Guiblet W, et al. 2013. Reconstructing Native American migrations from whole-genome and whole-exome data. PLoS Genet. 9(12):e1004023.

Hellenthal G, Busby GBJ, Band G, Wilson JF, Capelli C, Falush D, Myers S. 2014. A genetic atlas of human admixture history. Science 343(6172):747–751.

Hernandez-Pacheco N, Flores C, Alonso S, Eng C, Mak ACY, Hunstman S, Hu D, White MJ, Oh SS, Meade K, et al. 2017. Identification of a novel locus associated with skin colour in African-admixed populations. Sci Rep. 7:44548.

Hodgson JA, Mulligan CJ, Al-Meeri A, Raaum RL. 2014. Early back-to-Africa migration into the horn of Africa. PLoS Genet. 10(6):e1004393.


Hogarth DG. 1904. The penetration of Arabia: a record of the development of western knowledge concerning the Arabian Peninsula. New York: Cambridge Library Collection.

Hunter-Zinck H, Musharoff S, Salit J, Al-Ali KA, Chouchane L, Gohar A, Matthews R, Butler MW, Fuller J, Hackett NR, et al. 2010. Population genetic structure of the people of Qatar. Am J Hum Genet. 87(1):17–25.

John SE, Thareja G, Hebbar P, Behbehani K, Thanaraj TA, Alsmadi O. 2015. Kuwaiti population subgroup of nomadic Bedouin ancestry - whole genome sequence and analysis. Genom Data. 3:116–127.

Kivisild T, Reidla M, Metspalu E, Rosa A, Brehm A, Pennarun E, Parik J, Geberhiwot T, Usanga E, Villems R. 2004. Ethiopian mitochondrial DNA heritage: tracking gene flow across and around the gate of tears. Am J Hum Genet. 75(5):752–770.

Laayouni H, Oosting M, Luisi P, Ioana M, Alonso S, Ricaño-Ponce I, Trynka G, Zhernakova A, Plantinga TS, Cheng S-C, et al. 2014. Convergent evolution in European and Rroma populations reveals pressure exerted by plague on Toll-like receptors. Proc Natl Acad Sci U S A. 111(7):2668–2673.

Lahr MM, Foley R. 2005. Multiple dispersals and modern human origins. Evol Anthropol. 3(2):48–60.

Langergraber KE, Prufer K, Rowney C, Boesch C, Crockford C, Fawcett K, Inoue E, Inoue-Muruyama M, Mitani JC, Muller MN, et al. 2012. Generation times in wild chimpanzees and gorillas suggest earlier divergence times in great ape and human evolution. Proc Natl Acad Sci U S A. 109(39):15716–15721.

Laso-Jadart R, Harmant C, Quach H, Zidane N, Tyler-Smith C, Mehdi Q, Ayub Q, Quintana-Murci L, Patin E. 2017. The genetic legacy of the Indian Ocean slave trade: recent admixture and post-admixture selection in the Makranis of Pakistan. Am J Hum Genet. 101(6):977–984.

Lawson DJ, Hellenthal G, Myers S, Falush D. 2012. Inference of population structure using dense haplotype data. PLoS Genet. 8(1):e1002453.

Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S, Cann HM, Barsh GS, Feldman M, Cavalli-Sforza LL, et al. 2008. Worldwide human relationships inferred from genome-wide patterns of variation. Science 319(5866):1100–1104.

Loh PR, Lipson M, Patterson N, Moorjani P, Pickrell JK, Reich D, Berger B. 2013. Inferring admixture histories of human populations using linkage disequilibrium. Genetics 193(4):1233–1254.

Lovejoy PE. 2011. Transformations in slavery: a history of slavery in Africa. Cambridge: Cambridge University Press.

Macaulay V, Hill C, Achilli A, Rengo C, Clarke D, Meehan W, Blackburn J, Semino O, Scozzari R, Cruciani F, et al. 2005. Single, rapid coastal settlement of Asia revealed by analysis of complete mitochondrial genomes. Science 308(5724):1034.

Malaspinas A-S, Westaway MC, Muller C, Sousa VC, Lao O, Alves I, Bergstrom A, Athanasiadis G, Cheng JY, Crawford JE, et al. 2016. A genomic history of Aboriginal Australia. Nature 538(7624):207.

Mallick S, Li H, Lipson M, Mathieson I, Gymrek M, Racimo F, Zhao M, Chennagiri N, Nordenfelt S, Tandon A, et al. 2016. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538(7624):201.

Maples BK, Gravel S, Kenny EE, Bustamante CD. 2013. RFMix: a discriminative modeling approach for rapid and robust local-ancestry inference. Am J Hum Genet. 93(2):278–288.

McManus KF, Taravella AM, Henn BM, Bustamante CD, Sikora M, Cornejo OE. 2017. Population genetic analysis of the DARC locus (Duffy) reveals adaptation from standing variation associated with malaria resistance in humans. PLoS Genet. 13(3):e1006560.

Moorjani P, Patterson N, Hirschhorn JN, Keinan A, Hao L, Atzmon G, Burns E, Ostrer H, Price AL, Reich D. 2011. The history of African gene flow into Southern Europeans, Levantines, and Jews. PLoS Genet. 7(4):e1001373.

Musilova E, Fernandes V, Silva NM, Soares P, Alshamali F, Harich N, Cherni L, Gaaied AB, Al-Meeri A, Pereira L, et al. 2011. Population history of the Red Sea - genetic exchanges between the Arabian Peninsula and East Africa signaled in the mitochondrial DNA HV1 haplogroup. Am J Phys Anthropol. 145:592–598.

Ozato K, Shin D-M, Chang T-H, Morse HC. 2008. TRIM family proteins and their emerging roles in innate immunity. Nat Rev Immunol. 8(11):849–860.

Patin E, Lopez M, Grollemund R, Verdu P, Harmant C, Quach H, Laval G, Perry GH, Barreiro LB, Froment A, et al. 2017. Dispersals and genetic adaptation of Bantu-speaking populations in Africa and North America. Science 356(6337):543–546.

Patterson N, Moorjani P, Luo Y, Mallick S, Rohland N, Zhan Y, Genschoreck T, Webster T, Reich D. 2012. Ancient admixture in human history. Genetics 192(3):1065.

Patterson N, Price AL, Reich D. 2006. Population structure and eigenanalysis. PLoS Genet. 2(12):e190.

Petkova D, Novembre J, Stephens M. 2016. Visualizing spatial population structure with estimated effective migration surfaces. Nat Genet. 48(1):94–100.

Pickrell JK, Coop G, Novembre J, Kudaravalli S, Li JZ, Absher D, Srinivasan BS, Barsh GS, Myers RM, Feldman MW, et al. 2009. Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 19(5):826–837.

Pickrell JK, Pritchard JK. 2012. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. 8(11):e1002967.

Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. 2006. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 38(8):904–909.

Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, et al. 2007. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 81(3):559–575.

Quintana-Murci L Semino O Bandelt H-J Passarino G McElreavey KSantachiara-Benerecetti AS 1999 Genetic evidence of an early exit ofHomo sapiens sapiens from Africa through eastern Africa Nat Genet23(4)437

R Development Core Team 2018 R a language and environment forstatistical computing Vienna (Austria)R Foundation for StatisticalComputing

Racimo F 2016 Testing for ancient selection using cross-populationallele frequency differentiation Genetics 202(2)733ndash750

Ranciaro A Campbell MC Hirbo JB Ko W-Y Froment A Anagnostou PKotze MJ Ibrahim M Nyambo T Omar SA et al 2014 Geneticorigins of lactase persistence and the spread of pastoralism inAfrica Am J Hum Genet 94(4)496ndash510

Reich D Thangaraj K Patterson N Price AL Singh L 2009Reconstructing Indian population history Nature461(7263)489ndash494

Roche B Rougeron V Quintana-Murci L Renaud F Abbate JLPrugnolle F 2017 Might interspecific interactions between patho-gens drive host evolution The case of Plasmodium species andDuffy-negativity in human populations Trends Parasitol 33(1)21ndash29

Rodriguez-Flores JL Fakhro K Agosto-Perez F Ramstetter MD Arbiza LVincent TL Robay A Malek JA Suhre K Chouchane L et al 2016Indigenous Arabs are descendants of the earliest split from ancientEurasian populations Genome Res 26(2)151ndash162

Rosenberg NA Huang L Jewett EM Szpiech ZA Jankovic I Boehnke M2010 Genome-wide association studies in diverse populations NatRev Genet 11(5)356ndash366

Russo A Di Gaetano C Cugliari G Matullo G 2018 Advances in thegenetics of hypertension the effect of rare variants Int J Mol Sci19(3)688

Segal R 2002 Islamrsquos black slaves the other black diaspora LondonAtlantic Books

Shannon P Markiel A Ozier O Baliga NS Wang JT Ramage D Amin NSchwikowski B Ideker T 2003 Cytoscape a software environmentfor integrated models of biomolecular interaction networks GenomeRes 13(11)2498ndash2504

Skoglund P Thompson JC Prendergast ME Mittnik A Sirak K HajdinjakM Salie T Rohland N Mallick S Peltzer A et al 2017 Reconstructingprehistoric African population structure Cell 171(1)59ndash71 e21

Admixture and Selection in Arabian Peninsula. doi:10.1093/molbev/msz005. MBE


Snow RW, Amratia P, Zamani G, Mundia CW, Noor AM, Memish ZA, Al Zahrani MH, Al Jasari A, Fikri M, Atta H. 2013. The malaria transition on the Arabian Peninsula: progress toward a malaria-free region between 1960–2010. Adv Parasitol. 82:205–251.

Soares P, Alshamali F, Pereira JB, Fernandes V, Silva NM, Afonso C, Costa MD, Musilova E, Macaulay V, Richards MB, et al. 2012. The expansion of mtDNA haplogroup L3 within and out of Africa. Mol Biol Evol. 29(3):915–927.

Szpiech ZA, Hernandez RD. 2014. selscan: an efficient multithreaded program to perform EHH-based scans for positive selection. Mol Biol Evol. 31(10):2824–2827.

Tekola-Ayele F, Adeyemo A, Chen G, Hailu E, Aseffa AA-O, Davey GA-O, Newport MJ, Rotimi CA-OX. 2015. Novel genomic signals of recent selection in an Ethiopian population. Eur J Hum Genet. 23(8):1085–1092.

Thompson EE, Kuttab-Boulos H, Witonsky D, Yang L, Roe BA, Di Rienzo A. 2004. CYP3A variation and the evolution of salt-sensitivity variants. Am J Hum Genet. 75(6):1059–1069.

Tishkoff SA, Reed FA, Ranciaro A, Voight BF, Babbitt CC, Silverman JS, Powell K, Mortensen HM, Hirbo JB, Osman M, et al. 2007. Convergent adaptation of human lactase persistence in Africa and Europe. Nat Genet. 39(1):31–40.

Triska P, Soares P, Patin E, Fernandes V, Cerny V, Pereira L. 2015. Extensive admixture and selective pressure across the Sahel Belt. Genome Biol Evol. 7(12):3484–3495.

Uchil PD, Quinlan BD, Chan W-T, Luna JM, Mothes W. 2008. TRIM E3 ligases interfere with early and late stages of the retroviral life cycle. PLoS Pathog. 4(2):e16.

Voight BF, Kudaravalli S, Wen X, Pritchard JK. 2006. A map of recent positive selection in the human genome. PLoS Biol. 4(3):e72.

Vyas DN, Al-Meeri A, Mulligan CJ. 2017. Testing support for the northern and southern dispersal routes out of Africa: an analysis of Levantine and southern Arabian populations. Am J Phys Anthropol. 164(4):736–749.

Vyas DN, Kitchen A, Miro-Herrans AT, Pearson LN, Al-Meeri A, Mulligan CJ. 2016. Bayesian analyses of Yemeni mitochondrial genomes suggest multiple migration events with Africa and Western Eurasia. Am J Phys Anthropol. 159(3):382–393.

Wagh K, Bhatia A, Alexe G, Reddy A, Ravikumar V, Seiler M, Boemo M, Yao M, Cronk L, Naqvi A, et al. 2012. Lactase persistence and lipid pathway selection in the Maasai. PLoS One 7(9):e44751.

Yin T, Cook D, Lawrence M. 2012. ggbio: an R package for extending the grammar of graphics for genomic data. Genome Biol. 13(8):R77.

Yu Y, Ouyang Y, Yao W. 2018. shinyCircos: an R/Shiny application for interactive creation of Circos plot. Bioinformatics 34(7):1229–1231.

Zhao F, McParland S, Kearney F, Du L, Berry DP. 2015. Detection of selection signatures in dairy and beef cattle using high-density genomic information. Genet Sel Evol. 47:49.


involved in the regulation of hematopoiesis and oncogenes for the development of myeloid leukemia (Emmrich et al. 2014). In addition, a signal on the chromosome 10 region containing the SPAG6 gene was observed in all populations in iHS (except Yemen) and XP-EHH-African (stronger in Oman and the UAE) analyses. This gene is essential for the structural integrity of the central microtubule of sperm, stability, and flagellar motility, and has been previously identified as being under selection before the separation between European and Asian populations (Racimo 2016). The significant iHS signal on the chromosome 7 region containing several genes (CYP3A5, ZSCAN25, ATP5J2-PTCD1, CPSF4, ATP5J2, ZNF789, and ZKSCAN5) was also detected in all populations in the XP-EHH-African comparison. The CYP3A5 gene is involved in cholesterol metabolism and steroid biosynthesis in the liver and has been identified as being under positive selection in African and non-African populations (Wagh et al. 2012). The CYP3A5*1/*3 polymorphism was also described as being under selective pressure for water retention and risk of salt-sensitive hypertension in equatorial populations (Thompson et al. 2004).

The iHS metrics identified a positively selected region on chromosome 6 in all populations, which contains candidate genes involved in stages of the human immunodeficiency virus (HIV) lifecycle: the ZNRD1-AS1 and TRIM26 genes. ZNRD1-AS1 plays a role in the regulation of cell proliferation and is essential for the completion of the HIV lifecycle. Its rs3132130 SNP was described to confer host resistance to HIV-1 acquisition by causing a loss of nuclear factor binding and decreasing the ZNRD1 promoter activity (An et al. 2014). Although their function is unknown, the tripartite motif (TRIM) proteins may contribute to innate immunity against retroviruses, affecting both early and late stages of the retroviral life cycle (Ozato et al. 2008; Uchil et al. 2008). Another region significantly selected in iHS in all populations except the Saudi Arabians was identified on chromosome 22, containing the TTC38, PKDREJ, GTSE1, and PPARA genes. The PPARA gene encodes a transcription receptor that regulates lipid and glucose metabolism during food deficiency, and a recent study (Tekola-Ayele et al. 2015) found a specific signal of recent positive selection of this gene in Ethiopia, suggesting a metabolic adaptation to high-altitude hypoxia.

Conclusions

The detailed genome-wide characterization of Arabian and Iranian populations allowed us to establish, for the first time, the palimpsest of admixture and selection along the genomes of the inhabitants of the main bridge between African, Asian, and European population groups. Admixture events comprised up to 20% of sub-Saharan African ancestry in the West and 20% of South Asian ancestry in the East of the AP, enriching the locally predominant Arabian-, North African-, Levantine-, and Caucasian-like components. The sub-Saharan African input decreased toward the East, whereas the South Asian input decreased toward the West of the Peninsula, testifying to the continuous genetic exchange with Africa and Asia, as was evident in the IBD sharing analysis.

Of course, there are specific subgroups that depart from these general characteristics, such as the Qatari Bedouins, who represent a "coldspot" of admixture in the AP (Rodriguez-Flores et al. 2016), whereas the Hadrami in Yemen are a "hotspot" of admixture (Vyas et al. 2017). EEMS analysis identified higher exchanges of genes between Africa and the West AP, as well as between the East AP and the Levant/South Asia, than between the West and the East AP. This result agrees with the previously documented degree of isolation between West and East AP populations identified when studying uniparental markers (Alshamali et al. 2009; Fernandes et al. 2012, 2015). We previously found relict maternal lineages (60 ka) of the original successful OOA settlement by modern humans in the AP, and most likely they spread to South Asia and the Levant/Europe from the Gulf corridor (Fernandes et al. 2012). However, current LD decay–based dates only indicated recent admixture events in the eastern side of the Peninsula. Similarly, younger ages of admixture in the West AP reflected the Islamization of North Africa as the most probable main contributor of this region. The exchanges across the Red Sea detected for the autosomal data matched the Arab slave trade and maritime dominance.

Recent studies have provided evidence that admixed populations acquired positively selected genomic regions from their parental populations; in other words, the distribution of admixed blocks across the genomes of descendants was not even (Triska et al. 2015; Laso-Jadart et al. 2017; Patin et al. 2017). The acquired selected regions can be inferred from the concomitant identification of the same genomic regions in the selection metrics and in the local ancestry enrichment analysis. AP populations displayed a jigsaw puzzle of African-, European-, and South Asian-selected genes linked to response to infectious diseases, skin pigmentation, lactase persistence, and food conversion.

The strongest African-related enrichment signal was for the DARC gene, which codes the Duffy antigen, a membrane-bound chemokine receptor used by Plasmodium vivax to infect red blood cells. Plasmodium vivax causes a chronic form of malaria and is the most widespread type of malaria outside Africa (McManus et al. 2017). The derived C allele mutation in the promoter, which causes erythroid-specific suppression of gene expression, is described as conferring resistance against Plasmodium vivax malaria, and thus it is almost fixed in African populations (Triska et al. 2015; Laso-Jadart et al. 2017; McManus et al. 2017). The AP is known to have had a diverse malaria epidemiology in recent times, from a likely absence of transmission in Kuwait to an intense parasite exposure, similar to conditions in many parts of Africa, among residents of Saudi Arabia and Yemen (Snow et al. 2013). The enrichment signal detected here in the genomes of Arabians supported the higher malaria pressure in the West AP. The enrichment was strong enough to allow detection of selection in the XP-EHH metrics in Saudi Arabia and Iran.

The most significant European/South Asian-related enrichments in AP populations were linked to the lactase tolerance-specific chromosome 2 RAB3GAP1/LCT/MCM6 region, skin pigmentation associated with the SLC24A5 gene, and the HLA region on chromosome 6. The LCT gene -13915*G allele (rs41380347) was described as being present at high frequency in the AP and the Levant, and estimated to have originated 4 ka (±2 ka) in the AP (Bayoumi et al. 2016), possibly as a result of the domestication of the Arabian camel at 6 ka (Ranciaro et al. 2014). By contrast, the European -13910*T selected allele (rs4988235) is also the most commonly observed in South Asia (Gallego Romero et al. 2012). We could not verify frequencies of the rs41380347 SNP, which was not part of the chip genotyped here, but we checked the linked rs4988243 SNP. This last SNP testified to the higher frequencies in the West versus the East AP (43.2% in Yemen, 46.1% in Saudi Arabia, 17.9% in Oman, 19.1% in the UAE, and 17.4% in Iran; notice that, although rs41380347 and rs4988243 are linked, this last SNP has overall frequencies 20% higher than the first). The genotyped European rs4988235 SNP presented higher frequencies of the derived allele in the East than in the West AP (0.6%, 1.3%, 12.5%, 18.2%, and 10.3%, respectively). The haplotype backgrounds of the Arabian and European derived alleles are related and distant from the African one (Ranciaro et al. 2014), which explains the high non-African enrichment detected in the RFMix analysis in the West AP (the Arabian haplotypes) and the selection signal in the XP-EHH-European comparison in the East AP and Iran (the European haplotypes). This is an interesting case generated by the isolation between the two sides of the Peninsula, where the West developed a local positive selection of a lactose tolerant variant, whereas the East acquired the European/South Asian variant. Previously, this difference had been reported for ethnic groups within Oman (Bayoumi et al. 2016), where frequencies of the -13910*T and -13915*G alleles diverged between Arabs of northern and southern Oman, who migrated from Yemen (0–1% and 14–72%, respectively), and Omanis of Asian origin (16% and 0%, respectively). Here, we verified that this pattern extends to the entire Peninsula.

In relation to the genetic diversity linked to skin color, the European/South Asian-selected derived SNPs are equally frequent in both the West and East AP (rs1426654 in the SLC24A5 gene: 10.1% in Yemen, 13.3% in Saudi Arabia, 13.2% in Oman, 13.5% in the UAE, and 11.8% in Iran; rs183671 in the SLC45A2 gene: 35.4%, 29.4%, 22.7%, 29.0%, and 40.1%, respectively; rs1667394 in the OCA2/HERC2 gene: 38.1%, 28.8%, 30.7%, 33.5%, and 34.9%, respectively).

The palimpsest of complex interactions between cross and local selective pressures and demographic factors identified here in the AP testifies to the rich genetic dynamics taking place in bridging zones of the globe. In the near future, whole genomes of Arabians across the entire peninsula (with more precise geographical information than we could present here) and more sensitive selection measures will add information on the role played by this geographic region in the adaptation of modern humans to non-African habitats. Improvements to dating techniques applicable to complete genomes, such as the multiple sequentially Markovian coalescent approach, will also help answer the question of whether the AP was only a passageway to reach further distances or one of the oldest inhabited regions OOA. Complete genomes may indeed shed light on the lost period in the mtDNA pool due to the absence of the direct OOA descendants. The first steps have already been taken, and a few Arabian complete genomes have begun to be available (John et al. 2015; Mallick et al. 2016; Rodriguez-Flores et al. 2016).

Materials and Methods

Population Samples, Genome-Wide Genotyping, and Published Data

DNA samples analyzed in this study were collected from 420 individuals of the AP (Saudi Arabia, Yemen, Oman, and the UAE) and from 80 individuals of Iran (further information provided in supplementary table S1, Supplementary Material online). Individuals were residing in Dubai at the time of sampling (between 1991 and 2000), representing a widespread collection of birth origins from Saudi Arabia, Yemen, Oman, the UAE, and Iran. This study obtained ethical approval from the Ethics Committee of the University of Porto, Portugal (17CEUP2012). The individuals were genotyped with the Illumina Human Omni Express Bead Chip (OmniExpress) containing 741,000 SNPs. Quality checks were applied using PLINK (Purcell et al. 2007; Chang et al. 2015), and 22 individuals were excluded for lacking more than 10% of genotypes. A preliminary PCA revealed that some UAE and Oman individuals (five individuals from each population) had considerable sub-Saharan African ancestry, an indicator that they were most probably recent immigrants from Africa (supplementary fig. S14, Supplementary Material online). Hence, these individuals were removed from the analyses, leading to a final set of 468 individuals. Markers with genotyping call rates lower than 95% were removed; consequently, a total of 627,435 autosomal SNPs passed the quality control check. The genotypes from the individuals analyzed in this study can be accessed from the EGA repository (European Genome-Phenome Archive; accession number EGAS00001003335). In order to perform a fine-scale population structure analysis of our complete data, the 468 individuals were merged with relevant available data sets (supplementary table S1, Supplementary Material online). As several population genetic analyses assume independent markers, SNPs were pruned for pairwise LD in PLINK (www.cog-genomics.org/plink/1.9/; Purcell et al. 2007; Chang et al. 2015) by removing any SNP that had an r2 > 0.4 with another SNP within a 50-SNP sliding window with a step of 20 SNPs. After pruning, the "restricted data set" included 176,309 autosomal SNPs and 1,782 individuals (supplementary table S1, Supplementary Material online), and this data set was used in the analyses based on allele frequencies (PCA, ADMIXTURE, FST, and f3 statistic). Another data set was created by randomly selecting a maximum of 25 individuals from each population but extending the number and the geographical distribution of populations; this data set, consisting of 171,728 SNPs and 2,056 individuals, was designated as the "extended data set" (supplementary table S2, Supplementary Material online). The homogenization of the number of samples per group avoided statistical biases due to overrepresentation of some populations and allowed us to run computationally demanding algorithms such as EEMS, fineSTRUCTURE, Chromopainter, GLOBETROTTER, and IBD. All genotypes from the unpruned "extended data set" were phased with SHAPEIT v2r79044 (Delaneau et al. 2011) using the 1000 Genomes phased data (Auton et al. 2015) as a reference panel and the HapMap phase 2 genetic map (Frazer et al. 2007). We did not merge our data with relevant data sets from Qatar and Yemen (Hunter-Zinck et al. 2010; Vyas et al. 2017), as these were screened through Affymetrix chips bearing a low number of overlapping SNPs. Qatari complete genomes from Rodriguez-Flores et al. (2016) were specifically selected, and so do not constitute a random population sample that could be compared with our populations.
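The two missingness filters described above (individuals lacking more than 10% of genotypes, markers with call rates below 95%) can be illustrated with a minimal Python sketch. This is not the PLINK procedure used in the study: the function names, the list-of-lists genotype layout, and the choice to filter individuals before markers are assumptions made only for illustration.

```python
from typing import List, Optional, Tuple

# A genotype call: 0, 1, or 2 copies of an allele; None marks a missing call.
Genotype = Optional[int]


def missing_rate(calls: List[Genotype]) -> float:
    """Fraction of missing calls in a list of genotypes."""
    return sum(c is None for c in calls) / len(calls)


def qc_filter(
    matrix: List[List[Genotype]],
    max_sample_missing: float = 0.10,    # exclude individuals above this rate
    min_marker_call_rate: float = 0.95,  # exclude markers below this rate
) -> Tuple[List[int], List[int]]:
    """Return (kept individual indices, kept marker indices).

    matrix[i][j] is the genotype of individual i at marker j.
    Assumption: individuals are filtered first, then markers are
    evaluated on the retained individuals only.
    """
    kept_ind = [i for i, row in enumerate(matrix)
                if missing_rate(row) <= max_sample_missing]
    if not kept_ind:
        return [], []
    kept_markers = []
    for j in range(len(matrix[0])):
        column = [matrix[i][j] for i in kept_ind]
        if 1.0 - missing_rate(column) >= min_marker_call_rate:
            kept_markers.append(j)
    return kept_ind, kept_markers
```

With real array data the same logic would normally be delegated to a dedicated tool; the sketch only makes the two thresholds explicit.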

Population Structure, Differentiation, and Inbreeding

PCA, which infers worldwide axes of human genetic variation from the allele frequencies of various populations, was carried out in the pruned "restricted data set" by using the smartpca tool included in the EIGENSOFT package (Patterson et al. 2006; Price et al. 2006). We ran ADMIXTURE (Alexander et al. 2009) to decompose genetic ancestries of the pruned "restricted data set" using maximum likelihood for components K = 2 to K = 12, with the optimal K estimated through cross-validation of the logistic regression (K = 6). Wright's FST metric was calculated using Vcftools (Danecek et al. 2011). The THREEPOP program from TreeMix was used to calculate f3 statistics (Reich et al. 2009; Patterson et al. 2012; Pickrell and Pritchard 2012), which test between alternative scenarios of admixture. In order to analyze and visualize spatial population structure from georeferenced genetic samples, and to define genetic barriers and corridors, the method of EEMS (Petkova et al. 2016) was used. Geographic coordinates and a genetic dissimilarity matrix between populations were used to set up a map defining a grid of 500 demes, and 3,000,000 Markov chain Monte Carlo (MCMC) iterations were run before checking for convergence. Plots were generated in R following EEMS v1 manual indications.
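As a rough illustration of what an f3 statistic measures, the sketch below computes the plain average of (c - a)(c - b) over SNPs, where c, a, and b are allele frequencies in a target population and two candidate source populations. A clearly negative value is the classical indication that the target is admixed between groups related to the two sources. This is a simplified, assumed form: it omits the finite-sample bias correction and the block-based standard errors that dedicated software provides, and the function name is hypothetical rather than part of THREEPOP.

```python
from statistics import mean
from typing import Sequence


def f3_naive(target: Sequence[float],
             source_a: Sequence[float],
             source_b: Sequence[float]) -> float:
    """Uncorrected f3(target; A, B): mean over SNPs of (c - a) * (c - b).

    Inputs are per-SNP allele frequencies in the same SNP order.
    No bias correction and no standard error are computed here.
    """
    if not (len(target) == len(source_a) == len(source_b)):
        raise ValueError("frequency vectors must have the same length")
    if len(target) == 0:
        raise ValueError("at least one SNP is required")
    return mean([(c - a) * (c - b)
                 for c, a, b in zip(target, source_a, source_b)])
```

A target whose frequencies lie between those of the two sources at most SNPs yields a negative value, whereas a target that has only drifted away from one source yields a positive one.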

Inference and Dating of Population Admixture

Population structure of the phased "extended data set" was evaluated using the fineSTRUCTURE v2.0.7 package (Lawson et al. 2012). fineSTRUCTURE captures information on population structure provided by patterns of haplotype similarity, calculated with Chromopainter v2.0 (Lawson et al. 2012), and performs a model-based Bayesian clustering of genotypes. From the results, a coancestry heat map and a dendrogram were inferred to visualize the number of statistically defined clusters that describe the data. The clusters obtained by fineSTRUCTURE were then used to assign the populations as target or donor/surrogate when testing for admixture. Mutational rates and effective population size parameters were first estimated with an Expectation-Maximization algorithm, running Chromopainter v2 on all 22 autosomes for the entire data set with 10 iterations (Lawson et al. 2012). The weighted average of these parameters, according to the SNP coverage of each chromosome and the number of individuals, was then used to compute the chromosomal painting. Two Arabian groups were identified as target populations: the western group, represented by Yemen and Saudi Arabia, and the eastern group, represented by Oman and the UAE. We ran chromosomal painting for each group. Due to the possibility of common ancestry, we excluded the Levantine populations from the surrogate populations. The painted chromosomes obtained for each cluster were used in GLOBETROTTER v1.0 (Hellenthal et al. 2014) to estimate the ratios and dates of the potential admixture events in the region. Coancestry curves were estimated for each target population, standardizing against both a null (NULL 1) and a standard (NULL 0) individual, and consistency between each estimated parameter was checked. A total of 100 bootstrap resamplings were performed to estimate the probability values of the admixture events, considering the "NULL" individual, by calculating the proportion of nonsensical inferred dates (<1 or >400 generations) produced by the model (NULL 1). The "best-guess" scenario given by GLOBETROTTER v1.0 (Hellenthal et al. 2014) was considered for each target population. Dates of admixture given in generations were converted to chronological time using a generation interval of 29 years (Langergraber et al. 2012).
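Two of the arithmetic steps above are simple enough to sketch: the conversion of an admixture date from generations to years with the 29-year generation interval, and the proportion of bootstrap dates falling outside the 1 to 400 generation range. The function names are hypothetical, and the code is only an illustration of the arithmetic, not of GLOBETROTTER itself.

```python
from typing import Sequence

GENERATION_YEARS = 29  # generation interval used in the text


def generations_to_years(generations: float,
                         years_per_generation: float = GENERATION_YEARS) -> float:
    """Convert a date expressed in generations ago into years ago."""
    return generations * years_per_generation


def nonsensical_fraction(bootstrap_dates: Sequence[float],
                         low: float = 1.0,
                         high: float = 400.0) -> float:
    """Proportion of bootstrap dates (in generations) that are <low or >high."""
    if len(bootstrap_dates) == 0:
        raise ValueError("at least one bootstrap date is required")
    bad = sum(1 for g in bootstrap_dates if g < low or g > high)
    return bad / len(bootstrap_dates)
```

For example, an event inferred at 10 generations ago corresponds to 290 years before the sampling date under this interval.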

Haplotype sharing between pairs of individuals was estimated by the refined IBD algorithm of Beagle v4026 27 (Browning and Browning 2007) and visualized with Cytoscape v32160 (Shannon et al. 2003). All maps in the present study were generated using Global Mapper v15 (http://www.bluemarblegeo.com/products/global-mapper.php).

Local Ancestry Inference

RFMix (Maples et al. 2013) software, which employs an LD model between markers to infer ancestry for each segment of the genome between a mixture of putative ancestral panels of haplotypes, was used in the unpruned and phased data set from the four AP populations plus Iran. The phased data from the 1000 Genomes populations of Great Britain (representing European ancestry), Yoruba from Nigeria (sub-Saharan African ancestry), and Indian Telugu (South Asian ancestry) were used as parental populations. Information on ancestry was obtained for each locus along chromosomes for every individual, and these values were averaged in each population. In order to identify regions which had a significantly higher proportion of a given parental ancestry, we followed previously published studies (Bryc et al. 2015), considering only regions outside the range defined by the genome mean ± 4SD. The karyograms of the local ancestry across the 22 chromosomes were generated in R using the ggbio package (Yin et al. 2012).
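The enrichment criterion (regions outside the genome mean ± 4SD) can be sketched as follows, assuming the population-averaged proportion of one parental ancestry is already available per locus. The function name is hypothetical, and the use of the population rather than the sample standard deviation is an assumption, since the text does not specify it.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def outlier_loci(ancestry: Sequence[float],
                 n_sd: float = 4.0) -> Tuple[List[int], List[int]]:
    """Return (indices above mean + n_sd*SD, indices below mean - n_sd*SD).

    ancestry[i] is the population-averaged proportion of one parental
    ancestry at locus i. Assumption: the SD is the population SD
    computed over all loci, including any outliers.
    """
    mu = mean(ancestry)
    sd = pstdev(ancestry)
    upper = mu + n_sd * sd
    lower = mu - n_sd * sd
    high = [i for i, v in enumerate(ancestry) if v > upper]
    low = [i for i, v in enumerate(ancestry) if v < lower]
    return high, low
```

Loci in the first list would be candidates for enrichment of the ancestry in question, and loci in the second for depletion.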

Analysis of Selection

iHS (Voight et al. 2006) was estimated with the SelScan package (Szpiech and Hernandez 2014) in the phased data set from the four AP populations plus Iran. Scores were calculated for each SNP within the population as a whole and standardized within each of 100 bins of allele frequency using the norm tool. In order to identify loci potentially affected by positive selection, a percentile score was assigned based on the proportion of extreme iHS values in a segment. The SelScan (Szpiech and Hernandez 2014) tool was also used to estimate cross-population extended haplotype homozygosity (XP-EHH) in each AP population by comparing with Great Britain, Yoruba from Nigeria, or Indian Telugu. Consistent with other tests for selection, only markers with MAF > 1% were used for XP-EHH computation. The obtained XP-EHH values were normalized by subtracting the genome-wide mean XP-EHH and dividing by the standard deviation (Szpiech and Hernandez 2014). Genomic regions with percentile score exceeding 4SD and 3SD were reported.
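The two normalizations described above can be sketched in a few lines: a genome-wide z-score for XP-EHH, and a z-score computed separately within allele-frequency bins for iHS. This is a simplified stand-in for the norm tool, not its implementation: the function names, the equal-width binning on the interval [0, 1], and the use of the population standard deviation are all assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def zscores(values: Sequence[float]) -> List[float]:
    """Subtract the overall mean and divide by the overall SD."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("standard deviation is zero; cannot normalize")
    return [(v - mu) / sd for v in values]


def standardize_in_bins(scores: Sequence[float],
                        freqs: Sequence[float],
                        n_bins: int = 100) -> List[float]:
    """Z-score each value against the other SNPs in its frequency bin.

    freqs[i] is the allele frequency (0..1) of SNP i. Bins are
    equal-width; a bin with zero variance is assigned 0.0.
    """
    bins: Dict[int, List[int]] = defaultdict(list)
    for idx, f in enumerate(freqs):
        bins[min(int(f * n_bins), n_bins - 1)].append(idx)
    out = [0.0] * len(scores)
    for idxs in bins.values():
        vals = [scores[i] for i in idxs]
        mu, sd = mean(vals), pstdev(vals)
        for i in idxs:
            out[i] = (scores[i] - mu) / sd if sd > 0 else 0.0
    return out
```

Binning by frequency matters for iHS because the unstandardized score depends on the frequency of the derived allele, so a single genome-wide mean and SD would not be comparable across SNPs.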

Selected SNPs were mapped to genes within a 5 kb flanking region, and significantly selected regions were represented in circos plots generated in R (R Development Core Team 2018) using the shinyCircos graphical interface (Yu et al. 2018).

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.

Acknowledgments

This work was financed by FEDER – Fundo Europeu de Desenvolvimento Regional funds through COMPETE2020 – Operacional Programme for Competitiveness and Internationalisation (POCI), Portugal 2020, and by Portuguese funds through FCT – Fundação para a Ciência e a Tecnologia/Ministério da Ciência, Tecnologia e Inovação in the framework of the project "Biomedical anthropological study in Arabian Peninsula based on high throughput genomics" (POCI-01-0145-FEDER-016609). VF has a postdoc grant through FCT (SFRH/BPD/114927/2016).

References

Al-Abri A, Podgorna E, Rose JI, Pereira L, Mulligan CJ, Silva NM, Bayoumi R, Soares P, Cerny V. 2012. Pleistocene–Holocene boundary in Southern Arabia from the perspective of human mtDNA variation. Am J Phys Anthropol. 149(2):291–298.

Alexander DH, Novembre J, Lange K. 2009. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19(9):1655–1664.

Alshamali F, Pereira L, Budowle B, Poloni ES, Currat M. 2009. Local population structure in Arabian Peninsula revealed by Y-STR diversity. Hum Hered. 68(1):45–54.

An P, Goedert JJ, Donfield S, Buchbinder S, Kirk GD, Detels R, Winkler CA. 2014. Regulatory variation in HIV-1 dependency factor ZNRD1 associates with host resistance to HIV-1 acquisition. J Infect Dis. 210(10):1539–1548.

Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, Abecasis GR. 2015. A global reference for human genetic variation. Nature 526(7571):68–74.

Basu Mallick C, Iliescu FM, Mols M, Hill S, Tamang R, Chaubey G, Goto R, Ho SYW, Gallego Romero I, Crivellaro F, et al. 2013. The light skin allele of SLC24A5 in South Asians and Europeans shares identity by descent. PLoS Genet. 9(11):e1003912.

Bayoumi R, De Fanti S, Sazzini M, Giuliani C, Quagliariello A, Bortolini E, Boattini A, Al-Habori M, Al-Zubairi AS, Rose JI, et al. 2016. Positive selection of lactase persistence among people of Southern Arabia. Am J Phys Anthropol. 161(4):676–684.

Browning BL, Browning SR. 2013. Improving the accuracy and efficiency of identity-by-descent detection in population data. Genetics 194(2):459–471.

Browning SR, Browning BL. 2007. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am J Hum Genet. 81(5):1084–1097.

Brucato N, Fernandes V, Mazieres S, Kusuma P, Cox MP, Ng'ang'a JW, Omar M, Simeone-Senelle MC, Frassati C, Alshamali F, et al. 2018. The Comoros show the earliest Austronesian gene flow into the Swahili corridor. Am J Hum Genet. 102(1):58–68.

Bryc K, Durand EY, Macpherson JM, Reich D, Mountain JL. 2015. The genetic ancestry of African Americans, Latinos, and European Americans across the United States. Am J Hum Genet. 96(1):37–53.

Campbell MC, Tishkoff SA. 2008. African genetic diversity: implications for human demographic history, modern human origins, and complex disease mapping. Annu Rev Genomics Hum Genet. 9:403–433.

Candille SI, Absher DM, Beleza S, Bauchet M, McEvoy B, Garrison NA, Li JZ, Myers RM, Barsh GS, Tang H, et al. 2012. Genome-wide association studies of quantitatively measured skin, hair, and eye pigmentation in four European populations. PLoS One 7(10):e48294.

Cerny V, Mulligan CJ, Fernandes V, Silva NM, Alshamali F, Non A, Harich N, Cherni L, El Gaaied AB, Al-Meeri A, et al. 2011. Internal diversification of mitochondrial haplogroup R0a reveals post-last glacial maximum demographic expansions in South Arabia. Mol Biol Evol. 28(1):71–78.

Cerny V, Pereira L, Kujanova M, Vasikova A, Hajek M, Morris M, Mulligan CJ. 2009. Out of Arabia – the settlement of island Soqotra as revealed by mitochondrial and Y chromosome genetic diversity. Am J Phys Anthropol. 138(4):439–447.

Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. 2015. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4:7.

Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, et al. 2011. The variant call format and VCFtools. Bioinformatics 27(15):2156–2158.

Delaneau O, Marchini J, Zagury JF. 2011. A linear complexity phasing method for thousands of genomes. Nat Methods. 9(2):179–181.

Emmrich S, Streltsov A, Schmidt F, Thangapandi VR, Reinhardt D, Klusmann JH. 2014. LincRNAs MONC and MIR100HG act as oncogenes in acute megakaryoblastic leukemia. Mol Cancer 13:171.

Fernandes V, Alshamali F, Alves M, Costa MD, Pereira JB, Silva NM, Cherni L, Harich N, Cerny V, Soares P, et al. 2012. The Arabian cradle: mitochondrial relicts of the first steps along the southern route out of Africa. Am J Hum Genet. 90(2):347–355.

Fernandes V, Triska P, Pereira JB, Alshamali F, Rito T, Machado A, Fajkosova Z, Cavadas B, Cerny V, Soares P, et al. 2015. Genetic stratigraphy of key demographic events in Arabia. PLoS One 10(3):e0118625.

Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, Belmont JW, Boudreau A, Hardenbol P, Leal SM, et al. 2007. A second generation human haplotype map of over 3.1 million SNPs. Nature 449(7164):851–861.

Fu Q, Posth C, Hajdinjak M, Petr M, Mallick S, Fernandes D, Furtwangler A, Haak W, Meyer M, Mittnik A, et al. 2016. The genetic history of Ice Age Europe. Nature 534(7606):200–205.

Gallego Romero I, Basu Mallick C, Liebert A, Crivellaro F, Chaubey G, Itan Y, Metspalu M, Eaaswarkhanth M, Pitchappan R, Villems R, et al. 2012. Herders of Indian and European cattle share their predominant allele for lactase persistence. Mol Biol Evol. 29(1):249–260.

Gravel S, Zakharia F, Moreno-Estrada A, Byrnes JK, Muzzio M, Rodriguez-Flores JL, Kenny EE, Gignoux CR, Maples BK, Guiblet W, et al. 2013. Reconstructing Native American migrations from whole-genome and whole-exome data. PLoS Genet. 9(12):e1004023.

Hellenthal G, Busby GBJ, Band G, Wilson JF, Capelli C, Falush D, Myers S. 2014. A genetic atlas of human admixture history. Science 343(6172):747–751.

Hernandez-Pacheco N, Flores C, Alonso S, Eng C, Mak ACY, Hunstman S, Hu D, White MJ, Oh SS, Meade K, et al. 2017. Identification of a novel locus associated with skin colour in African-admixed populations. Sci Rep. 7:44548.

Hodgson JA, Mulligan CJ, Al-Meeri A, Raaum RL. 2014. Early back-to-Africa migration into the horn of Africa. PLoS Genet. 10(6):e1004393.


Hogarth DG. 1904. The penetration of Arabia: a record of the development of western knowledge concerning the Arabian Peninsula. New York: Cambridge Library Collection.

Hunter-Zinck H, Musharoff S, Salit J, Al-Ali KA, Chouchane L, Gohar A, Matthews R, Butler MW, Fuller J, Hackett NR, et al. 2010. Population genetic structure of the people of Qatar. Am J Hum Genet. 87(1):17–25.

John SE, Thareja G, Hebbar P, Behbehani K, Thanaraj TA, Alsmadi O. 2015. Kuwaiti population subgroup of nomadic Bedouin ancestry - whole genome sequence and analysis. Genom Data. 3:116–127.

Kivisild T, Reidla M, Metspalu E, Rosa A, Brehm A, Pennarun E, Parik J, Geberhiwot T, Usanga E, Villems R. 2004. Ethiopian mitochondrial DNA heritage: tracking gene flow across and around the gate of tears. Am J Hum Genet. 75(5):752–770.

Laayouni H, Oosting M, Luisi P, Ioana M, Alonso S, Ricaño-Ponce I, Trynka G, Zhernakova A, Plantinga TS, Cheng S-C, et al. 2014. Convergent evolution in European and Rroma populations reveals pressure exerted by plague on Toll-like receptors. Proc Natl Acad Sci U S A. 111(7):2668–2673.

Lahr MM, Foley R. 2005. Multiple dispersals and modern human origins. Evol Anthropol. 3(2):48–60.

Langergraber KE, Prufer K, Rowney C, Boesch C, Crockford C, Fawcett K, Inoue E, Inoue-Muruyama M, Mitani JC, Muller MN, et al. 2012. Generation times in wild chimpanzees and gorillas suggest earlier divergence times in great ape and human evolution. Proc Natl Acad Sci U S A. 109(39):15716–15721.

Laso-Jadart R, Harmant C, Quach H, Zidane N, Tyler-Smith C, Mehdi Q, Ayub Q, Quintana-Murci L, Patin E. 2017. The genetic legacy of the Indian Ocean slave trade: recent admixture and post-admixture selection in the Makranis of Pakistan. Am J Hum Genet. 101(6):977–984.

Lawson DJ, Hellenthal G, Myers S, Falush D. 2012. Inference of population structure using dense haplotype data. PLoS Genet. 8(1):e1002453.

Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S, Cann HM, Barsh GS, Feldman M, Cavalli-Sforza LL, et al. 2008. Worldwide human relationships inferred from genome-wide patterns of variation. Science 319(5866):1100–1104.

Loh PR, Lipson M, Patterson N, Moorjani P, Pickrell JK, Reich D, Berger B. 2013. Inferring admixture histories of human populations using linkage disequilibrium. Genetics 193(4):1233–1254.

Lovejoy PE. 2011. Transformations in slavery: a history of slavery in Africa. Cambridge: Cambridge University Press.

Macaulay V, Hill C, Achilli A, Rengo C, Clarke D, Meehan W, Blackburn J, Semino O, Scozzari R, Cruciani F, et al. 2005. Single, rapid coastal settlement of Asia revealed by analysis of complete mitochondrial genomes. Science 308(5724):1034.

Malaspinas A-S, Westaway MC, Muller C, Sousa VC, Lao O, Alves I, Bergstrom A, Athanasiadis G, Cheng JY, Crawford JE, et al. 2016. A genomic history of Aboriginal Australia. Nature 538(7624):207.

Mallick S, Li H, Lipson M, Mathieson I, Gymrek M, Racimo F, Zhao M, Chennagiri N, Nordenfelt S, Tandon A, et al. 2016. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538(7624):201.

Maples BK, Gravel S, Kenny EE, Bustamante CD. 2013. RFMix: a discriminative modeling approach for rapid and robust local-ancestry inference. Am J Hum Genet. 93(2):278–288.

McManus KF, Taravella AM, Henn BM, Bustamante CD, Sikora M, Cornejo OE. 2017. Population genetic analysis of the DARC locus (Duffy) reveals adaptation from standing variation associated with malaria resistance in humans. PLoS Genet. 13(3):e1006560.

Moorjani P, Patterson N, Hirschhorn JN, Keinan A, Hao L, Atzmon G, Burns E, Ostrer H, Price AL, Reich D. 2011. The history of African gene flow into Southern Europeans, Levantines, and Jews. PLoS Genet. 7(4):e1001373.

Musilova E, Fernandes V, Silva NM, Soares P, Alshamali F, Harich N, Cherni L, Gaaied AB, Al-Meeri A, Pereira L, et al. 2011. Population history of the Red Sea - genetic exchanges between the Arabian Peninsula and East Africa signaled in the mitochondrial DNA HV1 haplogroup. Am J Phys Anthropol. 145:592–598.

Ozato K, Shin D-M, Chang T-H, Morse HC. 2008. TRIM family proteins and their emerging roles in innate immunity. Nat Rev Immunol. 8(11):849–860.

Patin E, Lopez M, Grollemund R, Verdu P, Harmant C, Quach H, Laval G, Perry GH, Barreiro LB, Froment A, et al. 2017. Dispersals and genetic adaptation of Bantu-speaking populations in Africa and North America. Science 356(6337):543–546.

Patterson N, Moorjani P, Luo Y, Mallick S, Rohland N, Zhan Y, Genschoreck T, Webster T, Reich D. 2012. Ancient admixture in human history. Genetics 192(3):1065.

Patterson N, Price AL, Reich D. 2006. Population structure and eigenanalysis. PLoS Genet. 2(12):e190.

Petkova D, Novembre J, Stephens M. 2016. Visualizing spatial population structure with estimated effective migration surfaces. Nat Genet. 48(1):94–100.

Pickrell JK, Coop G, Novembre J, Kudaravalli S, Li JZ, Absher D, Srinivasan BS, Barsh GS, Myers RM, Feldman MW, et al. 2009. Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 19(5):826–837.

Pickrell JK, Pritchard JK. 2012. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. 8(11):e1002967.

Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. 2006. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 38(8):904–909.

Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, et al. 2007. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 81(3):559–575.

Quintana-Murci L, Semino O, Bandelt H-J, Passarino G, McElreavey K, Santachiara-Benerecetti AS. 1999. Genetic evidence of an early exit of Homo sapiens sapiens from Africa through eastern Africa. Nat Genet. 23(4):437.

R Development Core Team. 2018. R: a language and environment for statistical computing. Vienna (Austria): R Foundation for Statistical Computing.

Racimo F. 2016. Testing for ancient selection using cross-population allele frequency differentiation. Genetics 202(2):733–750.

Ranciaro A, Campbell MC, Hirbo JB, Ko W-Y, Froment A, Anagnostou P, Kotze MJ, Ibrahim M, Nyambo T, Omar SA, et al. 2014. Genetic origins of lactase persistence and the spread of pastoralism in Africa. Am J Hum Genet. 94(4):496–510.

Reich D, Thangaraj K, Patterson N, Price AL, Singh L. 2009. Reconstructing Indian population history. Nature 461(7263):489–494.

Roche B, Rougeron V, Quintana-Murci L, Renaud F, Abbate JL, Prugnolle F. 2017. Might interspecific interactions between pathogens drive host evolution? The case of Plasmodium species and Duffy-negativity in human populations. Trends Parasitol. 33(1):21–29.

Rodriguez-Flores JL, Fakhro K, Agosto-Perez F, Ramstetter MD, Arbiza L, Vincent TL, Robay A, Malek JA, Suhre K, Chouchane L, et al. 2016. Indigenous Arabs are descendants of the earliest split from ancient Eurasian populations. Genome Res. 26(2):151–162.

Rosenberg NA, Huang L, Jewett EM, Szpiech ZA, Jankovic I, Boehnke M. 2010. Genome-wide association studies in diverse populations. Nat Rev Genet. 11(5):356–366.

Russo A, Di Gaetano C, Cugliari G, Matullo G. 2018. Advances in the genetics of hypertension: the effect of rare variants. Int J Mol Sci. 19(3):688.

Segal R. 2002. Islam's black slaves: the other black diaspora. London: Atlantic Books.

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T. 2003. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13(11):2498–2504.

Skoglund P, Thompson JC, Prendergast ME, Mittnik A, Sirak K, Hajdinjak M, Salie T, Rohland N, Mallick S, Peltzer A, et al. 2017. Reconstructing prehistoric African population structure. Cell 171(1):59–71.e21.


Snow RW, Amratia P, Zamani G, Mundia CW, Noor AM, Memish ZA, Al Zahrani MH, Al Jasari A, Fikri M, Atta H. 2013. The malaria transition on the Arabian Peninsula: progress toward a malaria-free region between 1960–2010. Adv Parasitol. 82:205–251.

Soares P, Alshamali F, Pereira JB, Fernandes V, Silva NM, Afonso C, Costa MD, Musilova E, Macaulay V, Richards MB, et al. 2012. The expansion of mtDNA haplogroup L3 within and out of Africa. Mol Biol Evol. 29(3):915–927.

Szpiech ZA, Hernandez RD. 2014. selscan: an efficient multithreaded program to perform EHH-based scans for positive selection. Mol Biol Evol. 31(10):2824–2827.

Tekola-Ayele F, Adeyemo A, Chen G, Hailu E, Aseffa AA-O, Davey GA-O, Newport MJ, Rotimi CA-OX. 2015. Novel genomic signals of recent selection in an Ethiopian population. Eur J Hum Genet. 23(8):1085–1092.

Thompson EE, Kuttab-Boulos H, Witonsky D, Yang L, Roe BA, Di Rienzo A. 2004. CYP3A variation and the evolution of salt-sensitivity variants. Am J Hum Genet. 75(6):1059–1069.

Tishkoff SA, Reed FA, Ranciaro A, Voight BF, Babbitt CC, Silverman JS, Powell K, Mortensen HM, Hirbo JB, Osman M, et al. 2007. Convergent adaptation of human lactase persistence in Africa and Europe. Nat Genet. 39(1):31–40.

Triska P, Soares P, Patin E, Fernandes V, Cerny V, Pereira L. 2015. Extensive admixture and selective pressure across the Sahel Belt. Genome Biol Evol. 7(12):3484–3495.

Uchil PD, Quinlan BD, Chan W-T, Luna JM, Mothes W. 2008. TRIM E3 ligases interfere with early and late stages of the retroviral life cycle. PLoS Pathog. 4(2):e16.

Voight BF, Kudaravalli S, Wen X, Pritchard JK. 2006. A map of recent positive selection in the human genome. PLoS Biol. 4(3):e72.

Vyas DN, Al-Meeri A, Mulligan CJ. 2017. Testing support for the northern and southern dispersal routes out of Africa: an analysis of Levantine and southern Arabian populations. Am J Phys Anthropol. 164(4):736–749.

Vyas DN, Kitchen A, Miro-Herrans AT, Pearson LN, Al-Meeri A, Mulligan CJ. 2016. Bayesian analyses of Yemeni mitochondrial genomes suggest multiple migration events with Africa and Western Eurasia. Am J Phys Anthropol. 159(3):382–393.

Wagh K, Bhatia A, Alexe G, Reddy A, Ravikumar V, Seiler M, Boemo M, Yao M, Cronk L, Naqvi A, et al. 2012. Lactase persistence and lipid pathway selection in the Maasai. PLoS One 7(9):e44751.

Yin T, Cook D, Lawrence M. 2012. ggbio: an R package for extending the grammar of graphics for genomic data. Genome Biol. 13(8):R77.

Yu Y, Ouyang Y, Yao W. 2018. shinyCircos: an R/Shiny application for interactive creation of Circos plot. Bioinformatics 34(7):1229–1231.

Zhao F, McParland S, Kearney F, Du L, Berry DP. 2015. Detection of selection signatures in dairy and beef cattle using high-density genomic information. Genet Sel Evol. 47:49.


(rs41380347) was described as being present at high frequency in the AP and the Levant and estimated to have originated 4 ka (±2 ka) in the AP (Bayoumi et al. 2016), possibly as a result of the domestication of the Arabian camel at 6 ka (Ranciaro et al. 2014). By contrast, the European 13910T selected allele (rs4988235) is also the most commonly observed in South Asia (Gallego Romero et al. 2012). We could not verify the frequencies of the rs41380347 SNP, which was not part of the chip genotyped here, but we checked the linked rs4988243 SNP. This SNP confirmed the higher frequencies in the West versus the East AP (43.2% in Yemen, 46.1% in Saudi Arabia, 17.9% in Oman, 19.1% in the UAE, and 17.4% in Iran; note that although rs41380347 and rs4988243 are linked, the latter has overall frequencies 20% higher than the former). The genotyped European rs4988235 SNP presented higher frequencies of the derived allele in the East than in the West AP (0.6%, 1.3%, 12.5%, 18.2%, and 10.3%, respectively). The haplotype backgrounds of the Arabian and European derived alleles are related and distant from the African one (Ranciaro et al. 2014), which explains the high non-African enrichment detected in the RFMix analysis in the West AP (the Arabian haplotypes) and the selection signal in the XP-EHH comparison with Europeans in the East AP and Iran (the European haplotypes). This is an interesting case generated by the isolation between the two sides of the Peninsula, where the West developed local positive selection of a lactose-tolerant variant, whereas the East acquired the European/South Asian variant. Previously, this difference had been reported for ethnic groups within Oman (Bayoumi et al. 2016), where frequencies of the 13910T and 13915G alleles diverged between Arabs of northern and southern Oman, who migrated from Yemen (0–1% and 14–72%, respectively), and Omanis of Asian origin (16% and 0%, respectively). Here, we verified that this pattern extends to the entire Peninsula.

In relation to the genetic diversity linked to skin color, the European/South Asian-selected derived SNPs are equally frequent in both the West and East AP (rs1426654 in the SLC24A5 gene: 10.1% in Yemen, 13.3% in Saudi Arabia, 13.2% in Oman, 13.5% in the UAE, and 11.8% in Iran; rs183671 in the SLC45A2 gene: 35.4%, 29.4%, 22.7%, 29.0%, and 40.1%, respectively; rs1667394 in the OCA2/HERC2 gene: 38.1%, 28.8%, 30.7%, 33.5%, and 34.9%, respectively).

The palimpsest of complex interactions between cross and local selective pressures and demographic factors identified here in the AP testifies to the rich genetic dynamics taking place in bridging zones of the globe. In the near future, whole genomes of Arabians across the entire peninsula (with more precise geographical information than we could present here) and more sensitive selection measures will add information on the role played by this geographic region in the adaptation of modern humans to non-African habitats. Improvements to dating techniques applicable to complete genomes, such as the multiple sequentially Markovian coalescent approach, will also help to answer the question of whether the AP was only a passageway to reach further distances or one of the oldest inhabited regions OOA. Complete genomes may indeed shed light on the lost period in the mtDNA pool due to the absence of the direct OOA descendants. The first steps have already been taken, and a few Arabian complete genomes have begun to be available (John et al. 2015; Mallick et al. 2016; Rodriguez-Flores et al. 2016).

Materials and Methods

Population Samples, Genome-Wide Genotyping, and Published Data

DNA samples analyzed in this study were collected from 420 individuals of the AP (Saudi Arabia, Yemen, Oman, and the UAE) and from 80 individuals of Iranian populations (further information provided in supplementary table S1, Supplementary Material online). Individuals were residing in Dubai at the time of sampling (between 1991 and 2000), representing a widespread collection of birth origins from Saudi Arabia, Yemen, Oman, the UAE, and Iran. This study obtained ethical approval from the Ethics Committee of the University of Porto, Portugal (17/CEUP/2012). The individuals were genotyped with the Illumina Human Omni Express Bead Chip (OmniExpress), containing 741,000 SNPs. Quality checks were applied using PLINK (Purcell et al. 2007; Chang et al. 2015), and 22 individuals were excluded for lacking more than 10% of genotypes. A preliminary PCA revealed that some UAE and Oman individuals (five individuals from each population) had considerable sub-Saharan African ancestry, an indicator that they were most probably recent immigrants from Africa (supplementary fig. S14, Supplementary Material online). Hence, these individuals were removed from the analyses, leading to a final set of 468 individuals. Markers with genotyping call rates lower than 95% were removed; consequently, a total of 627,435 autosomal SNPs passed the quality control check. The genotypes from the individuals analyzed in this study can be accessed from the EGA repository (European Genome-Phenome Archive; accession number EGAS00001003335). In order to perform a fine-scale population structure analysis of our complete data, the 468 individuals were merged with relevant available data sets (supplementary table S1, Supplementary Material online). As several population genetic analyses assume independent markers, SNPs were pruned for pairwise LD in PLINK (www.cog-genomics.org/plink/1.9/; Purcell et al. 2007; Chang et al. 2015) by removing any SNP that had an r² > 0.4 with another SNP within a 50-SNP sliding window with a step of 20 SNPs. After pruning, the “restricted data set” included 176,309 autosomal SNPs and 1,782 individuals (supplementary table S1, Supplementary Material online), and this data set was used in the analyses based on allele frequencies (PCA, ADMIXTURE, FST, and f3 statistic). Another data set was created by randomly selecting a maximum of 25 individuals from each population but extending the number and the geographical distribution of populations; this data set, consisting of 171,728 SNPs and 2,056 individuals, was designated as the “extended data set” (supplementary table S2, Supplementary Material online). The homogenization of the number of samples per group avoided statistical biases due to overrepresentation of some populations and made it feasible to run computationally demanding algorithms such as EEMS, fineSTRUCTURE, Chromopainter, GLOBETROTTER, and IBD. All genotypes from the unpruned “extended data set” were phased with SHAPEIT v2.r790 (Delaneau et al. 2011) using the 1000 Genomes phased data (Auton et al. 2015) as a reference panel and the HapMap phase 2 genetic map (Frazer et al. 2007). We did not merge our data with relevant data sets from Qatar and Yemen (Hunter-Zinck et al. 2010; Vyas et al. 2017), as these were screened through Affymetrix chips bearing a low number of overlapping SNPs. Qatari complete genomes from Rodriguez-Flores et al. (2016) were specifically selected and therefore do not constitute a random population sample comparable with our populations.
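The filtering and subsampling steps described above can be illustrated with a minimal Python sketch. It is not the PLINK workflow used in the study: the function names, the list-of-lists genotype layout (with `None` marking a missing call), and the random seed are assumptions made purely for illustration. Only the thresholds (10% missingness per individual, 95% call rate per marker, at most 25 individuals per population) come from the text.

```python
import random
from collections import defaultdict


def filter_individuals(genotypes, max_missing=0.10):
    """Indices of individuals whose fraction of missing genotypes is <= max_missing."""
    keep = []
    for i, row in enumerate(genotypes):
        missing = sum(g is None for g in row) / len(row)
        if missing <= max_missing:
            keep.append(i)
    return keep


def filter_markers(genotypes, min_call_rate=0.95):
    """Indices of markers whose call rate across individuals is >= min_call_rate."""
    n_ind = len(genotypes)
    keep = []
    for j in range(len(genotypes[0])):
        called = sum(row[j] is not None for row in genotypes)
        if called / n_ind >= min_call_rate:
            keep.append(j)
    return keep


def cap_per_population(labels, max_n=25, seed=0):
    """Randomly keep at most max_n individuals per population label."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    by_pop = defaultdict(list)
    for idx, pop in enumerate(labels):
        by_pop[pop].append(idx)
    chosen = []
    for members in by_pop.values():
        if len(members) > max_n:
            members = rng.sample(members, max_n)
        chosen.extend(members)
    return sorted(chosen)
```

In practice the individual filter would be applied first and the marker filter recomputed on the retained individuals, mirroring the order in which the exclusions are reported.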

Population Structure, Differentiation, and Inbreeding

PCA, which infers worldwide axes of human genetic variation from the allele frequencies of various populations, was carried out in the pruned “restricted data set” by using the smartpca tool included in the EIGENSOFT package (Patterson et al. 2006; Price et al. 2006). We ran ADMIXTURE (Alexander et al. 2009) to decompose genetic ancestries of the pruned “restricted data set” using maximum likelihood for components K = 2 to K = 12, with the optimal K estimated through cross-validation of the logistic regression (K = 6). Wright's FST metric was calculated using Vcftools (Danecek et al. 2011). The THREEPOP program from TreeMix was used to calculate f3 statistics (Reich et al. 2009; Patterson et al. 2012; Pickrell and Pritchard 2012), which test between alternative scenarios of admixture. In order to analyze and visualize spatial population structure from georeferenced genetic samples and to define genetic barriers and corridors, the method of EEMS (Petkova et al. 2016) was used. Geographic coordinates and a genetic dissimilarity matrix between populations were used to set up a map defining a grid of 500 demes, and 3,000,000 Markov chain Monte Carlo (MCMC) iterations were run before checking for convergence. Plots were generated in R following the EEMS v1 manual indications.
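As an illustration of what the f3 statistic measures, the following Python sketch computes the basic quantity f3(C; A, B), the average over SNPs of (c − a)(c − b), where a, b, and c are allele frequencies in the two candidate source populations and the target. A clearly negative value is the signal of admixture in C. This is a simplified sketch, not the THREEPOP implementation: the function name is hypothetical, and the finite-sample correction and block-jackknife standard errors that a full analysis relies on are omitted.

```python
def f3(target, source1, source2):
    """Uncorrected f3(target; source1, source2) from per-SNP allele frequencies."""
    if not (len(target) == len(source1) == len(source2)):
        raise ValueError("frequency vectors must have the same length")
    terms = [(c - a) * (c - b) for c, a, b in zip(target, source1, source2)]
    return sum(terms) / len(terms)
```

For a target whose frequencies lie between those of the two sources at most SNPs, the product (c − a)(c − b) is negative, which is why intermediate (admixed) populations yield negative f3 values.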

Inference and Dating of Population Admixture

Population structure of the phased “extended data set” was evaluated using the fineSTRUCTURE v2.0.7 package (Lawson et al. 2012). fineSTRUCTURE captures information on population structure provided by patterns of haplotype similarity, calculated with Chromopainter v2.0 (Lawson et al. 2012), and performs a model-based Bayesian clustering of genotypes. From the results, a coancestry heat map and a dendrogram were inferred to visualize the number of statistically defined clusters that describe the data. The clusters obtained by fineSTRUCTURE were then used to assign the populations as target or donor/surrogate when testing for admixture. Mutational rates and effective population size parameters were first estimated with an Expectation-Maximization algorithm, running Chromopainter v2 on all 22 autosomes for the entire data set with 10 iterations (Lawson et al. 2012). The weighted average of these parameters, according to the SNP coverage of each chromosome and the number of individuals, was then used to compute the chromosomal painting. Two Arabian groups were identified as target populations: the western group, represented by Yemen and Saudi Arabia, and the eastern group, represented by Oman and the UAE. We ran chromosomal painting for each group. Due to the possibility of common ancestry, we excluded the Levantine populations from the surrogate populations. The painted chromosomes obtained for each cluster were used in GLOBETROTTER v1.0 (Hellenthal et al. 2014) to estimate the ratios and dates of the potential admixture events in the region. Coancestry curves were estimated for each target population, standardizing against both a null (NULL 1) and a standard (NULL 0) individual, and consistency between each estimated parameter was checked. A total of 100 bootstrap resamplings were performed to estimate the probability values of the admixture events, considering the “NULL” individual, by calculating the proportion of nonsensical inferred dates (<1 or >400 generations) produced by the model (NULL 1). The “best-guess” scenario given by GLOBETROTTER v1.0 (Hellenthal et al. 2014) was considered for each target population. Dates of admixture given in generations were converted to chronological time using a generation interval of 29 years (Langergraber et al. 2012).
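The two arithmetic steps at the end of this procedure, the bootstrap-based probability value and the conversion from generations to years, can be sketched in Python as follows. The function names are hypothetical and the code does not reproduce GLOBETROTTER itself; it only encodes the rules stated above (dates below 1 or above 400 generations count as nonsensical; one generation is taken as 29 years).

```python
GENERATION_YEARS = 29  # generation interval used in the text (Langergraber et al. 2012)


def generations_to_years(generations, generation_years=GENERATION_YEARS):
    """Convert an admixture date in generations to years before sampling."""
    return generations * generation_years


def bootstrap_p_value(bootstrap_dates, lower=1, upper=400):
    """Proportion of bootstrap dates (in generations) that are nonsensical,
    i.e. below `lower` or above `upper`."""
    nonsensical = sum(d < lower or d > upper for d in bootstrap_dates)
    return nonsensical / len(bootstrap_dates)
```

As a worked example, an event dated to 10 generations corresponds to 10 × 29 = 290 years before the sampling date.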

Haplotype sharing between pairs of individuals was estimated by the refined IBD algorithm of Beagle v4.0 (Browning and Browning 2007) and visualized with Cytoscape v3.2.1 (Shannon et al. 2003). All maps in the present study were generated using Global Mapper v15 (http://www.bluemarblegeo.com/products/global-mapper.php).

Local Ancestry Inference

RFMix (Maples et al. 2013) software, which employs an LD model between markers to infer ancestry for each segment of the genome between a mixture of putative ancestral panels of haplotypes, was used in the unpruned and phased data set from the four AP populations plus Iran. The phased data from the 1000 Genomes populations of Great Britain (representing European ancestry), Yoruba from Nigeria (sub-Saharan African ancestry), and Indian Telugu (South Asian ancestry) were used as parental populations. Information on ancestry was obtained for each locus along chromosomes for every individual, and these values were averaged in each population. In order to identify regions that had a significantly higher proportion of a given parental ancestry, we followed previously published studies (Bryc et al. 2015), considering only regions outside the range defined by the genome mean ± 4 SD. The karyograms of the local ancestry across the 22 chromosomes were generated in R using the ggbio package (Yin et al. 2012).
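The outlier criterion described above (regions outside the genome mean ± 4 SD) can be expressed as a short Python sketch. The input is assumed to be a list of per-locus ancestry proportions for one parental ancestry, already averaged over the individuals of a population; the function name and the use of the population standard deviation (rather than the sample standard deviation) are assumptions, since the text does not specify them.

```python
import statistics


def ancestry_outliers(mean_ancestry, n_sd=4.0):
    """Return (enriched, depleted) locus indices lying outside mean +/- n_sd * SD."""
    mu = statistics.mean(mean_ancestry)
    sd = statistics.pstdev(mean_ancestry)  # assumption: population SD over all loci
    upper = mu + n_sd * sd
    lower = mu - n_sd * sd
    enriched = [i for i, v in enumerate(mean_ancestry) if v > upper]
    depleted = [i for i, v in enumerate(mean_ancestry) if v < lower]
    return enriched, depleted
```

Because the threshold is defined relative to the genome-wide distribution, the same function can be applied separately to the European, sub-Saharan African, and South Asian ancestry tracks of each population.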

Analysis of Selection

iHS (Voight et al. 2006) was estimated in the SelScan package (Szpiech and Hernandez 2014) in the phased data set from the four AP populations plus Iran. Scores were calculated for each SNP within the population as a whole and standardized within each of 100 bins of allele frequency using the norm tool. In order to identify loci potentially affected by positive selection, a percentile score was assigned based on the proportion of extreme iHS values in each segment. The SelScan (Szpiech and Hernandez 2014) tool was also used to estimate cross-population extended haplotype homozygosity (XP-EHH) in each AP population by comparison with Great Britain, Yoruba from Nigeria, or Indian Telugu. Consistent with other tests for selection, only markers with MAF > 1% were used for XP-EHH computation. The obtained XP-EHH values were normalized by subtracting the genome-wide mean XP-EHH and dividing by the standard deviation (Szpiech and Hernandez 2014). Genomic regions with percentile score exceeding 4 SD and 3 SD were reported.
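The two normalization schemes mentioned above can be sketched in Python as follows. This is an illustrative re-implementation under stated assumptions, not the code of the selscan norm tool: the function names are hypothetical, bins are taken to be equal-width intervals of allele frequency, the population standard deviation is used, and bins with fewer than two SNPs or zero variance are left unstandardized (`None`).

```python
import statistics
from collections import defaultdict


def standardize_in_frequency_bins(scores, freqs, n_bins=100):
    """Standardize scores (e.g. unstandardized iHS) within allele-frequency bins."""
    bins = defaultdict(list)
    for i, f in enumerate(freqs):
        b = min(int(f * n_bins), n_bins - 1)  # frequency 1.0 falls in the last bin
        bins[b].append(i)
    out = [None] * len(scores)
    for idx in bins.values():
        vals = [scores[i] for i in idx]
        if len(vals) < 2:
            continue
        mu = statistics.mean(vals)
        sd = statistics.pstdev(vals)
        if sd == 0:
            continue
        for i in idx:
            out[i] = (scores[i] - mu) / sd
    return out


def normalize_genome_wide(scores):
    """Normalize scores (e.g. XP-EHH) by the genome-wide mean and SD."""
    mu = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    return [(s - mu) / sd for s in scores]
```

Binning by allele frequency matters for iHS because the unstandardized score depends on the frequency of the derived allele, whereas XP-EHH compares two populations at the same site and is normalized once across the genome.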

Selected SNPs were mapped to genes within a 5 kb flanking region, and significantly selected regions were represented in circos plots generated in R (R Development Core Team 2018) using the shinyCircos graphical interface (Yu et al. 2018).

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.

Acknowledgments

This work was financed by FEDER - Fundo Europeu de Desenvolvimento Regional funds through the COMPETE 2020 - Operacional Programme for Competitiveness and Internationalisation (POCI), Portugal 2020, and by Portuguese funds through FCT - Fundação para a Ciência e a Tecnologia/Ministério da Ciência, Tecnologia e Inovação in the framework of the project “Biomedical anthropological study in Arabian Peninsula based on high throughput genomics” (POCI-01-0145-FEDER-016609). V.F. has a postdoc grant through FCT (SFRH/BPD/114927/2016).

References

Al-Abri A, Podgorna E, Rose JI, Pereira L, Mulligan CJ, Silva NM, Bayoumi R, Soares P, Cerny V. 2012. Pleistocene–Holocene boundary in Southern Arabia from the perspective of human mtDNA variation. Am J Phys Anthropol. 149(2):291–298.

Alexander DH, Novembre J, Lange K. 2009. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19(9):1655–1664.

Alshamali F, Pereira L, Budowle B, Poloni ES, Currat M. 2009. Local population structure in Arabian Peninsula revealed by Y-STR diversity. Hum Hered. 68(1):45–54.

An P, Goedert JJ, Donfield S, Buchbinder S, Kirk GD, Detels R, Winkler CA. 2014. Regulatory variation in HIV-1 dependency factor ZNRD1 associates with host resistance to HIV-1 acquisition. J Infect Dis. 210(10):1539–1548.

Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, Abecasis GR. 2015. A global reference for human genetic variation. Nature 526(7571):68–74.

Basu Mallick C, Iliescu FM, Mols M, Hill S, Tamang R, Chaubey G, Goto R, Ho SYW, Gallego Romero I, Crivellaro F, et al. 2013. The light skin allele of SLC24A5 in South Asians and Europeans shares identity by descent. PLoS Genet. 9(11):e1003912.

Bayoumi R, De Fanti S, Sazzini M, Giuliani C, Quagliariello A, Bortolini E, Boattini A, Al-Habori M, Al-Zubairi AS, Rose JI, et al. 2016. Positive selection of lactase persistence among people of Southern Arabia. Am J Phys Anthropol. 161(4):676–684.

Browning BL, Browning SR. 2013. Improving the accuracy and efficiency of identity-by-descent detection in population data. Genetics 194(2):459–471.

Browning SR, Browning BL. 2007. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am J Hum Genet. 81(5):1084–1097.

Brucato N, Fernandes V, Mazieres S, Kusuma P, Cox MP, Ng'ang'a JW, Omar M, Simeone-Senelle MC, Frassati C, Alshamali F, et al. 2018. The Comoros show the earliest Austronesian gene flow into the Swahili corridor. Am J Hum Genet. 102(1):58–68.

Bryc K, Durand EY, Macpherson JM, Reich D, Mountain JL. 2015. The genetic ancestry of African Americans, Latinos, and European Americans across the United States. Am J Hum Genet. 96(1):37–53.

Campbell MC, Tishkoff SA. 2008. African genetic diversity: implications for human demographic history, modern human origins, and complex disease mapping. Annu Rev Genomics Hum Genet. 9:403–433.

Candille SI, Absher DM, Beleza S, Bauchet M, McEvoy B, Garrison NA, Li JZ, Myers RM, Barsh GS, Tang H, et al. 2012. Genome-wide association studies of quantitatively measured skin, hair, and eye pigmentation in four European populations. PLoS One 7(10):e48294.

Cerny V, Mulligan CJ, Fernandes V, Silva NM, Alshamali F, Non A, Harich N, Cherni L, El Gaaied AB, Al-Meeri A, et al. 2011. Internal diversification of mitochondrial haplogroup R0a reveals post-last glacial maximum demographic expansions in South Arabia. Mol Biol Evol. 28(1):71–78.

Cerny V, Pereira L, Kujanova M, Vasikova A, Hajek M, Morris M, Mulligan CJ. 2009. Out of Arabia - the settlement of island Soqotra as revealed by mitochondrial and Y chromosome genetic diversity. Am J Phys Anthropol. 138(4):439–447.

Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. 2015. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4:7.

Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, et al. 2011. The variant call format and VCFtools. Bioinformatics 27(15):2156–2158.

Delaneau O, Marchini J, Zagury JF. 2011. A linear complexity phasing method for thousands of genomes. Nat Methods. 9(2):179–181.

Emmrich S, Streltsov A, Schmidt F, Thangapandi VR, Reinhardt D, Klusmann JH. 2014. LincRNAs MONC and MIR100HG act as oncogenes in acute megakaryoblastic leukemia. Mol Cancer. 13:171.

Fernandes V, Alshamali F, Alves M, Costa MD, Pereira JB, Silva NM, Cherni L, Harich N, Cerny V, Soares P, et al. 2012. The Arabian cradle: mitochondrial relicts of the first steps along the southern route out of Africa. Am J Hum Genet. 90(2):347–355.

Fernandes V, Triska P, Pereira JB, Alshamali F, Rito T, Machado A, Fajkosova Z, Cavadas B, Cerny V, Soares P, et al. 2015. Genetic stratigraphy of key demographic events in Arabia. PLoS One 10(3):e0118625.

Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, Belmont JW, Boudreau A, Hardenbol P, Leal SM, et al. 2007. A second generation human haplotype map of over 3.1 million SNPs. Nature 449(7164):851–861.

Fu Q, Posth C, Hajdinjak M, Petr M, Mallick S, Fernandes D, Furtwangler A, Haak W, Meyer M, Mittnik A, et al. 2016. The genetic history of Ice Age Europe. Nature 534(7606):200–205.


Purcell S Neale B Todd-Brown K Thomas L Ferreira MA Bender DMaller J Sklar P de Bakker PI Daly MJ et al 2007 PLINK a tool set forwhole-genome association and population-based linkage analysesAm J Hum Genet 81(3)559ndash575

Quintana-Murci L Semino O Bandelt H-J Passarino G McElreavey KSantachiara-Benerecetti AS 1999 Genetic evidence of an early exit ofHomo sapiens sapiens from Africa through eastern Africa Nat Genet23(4)437

R Development Core Team 2018 R a language and environment forstatistical computing Vienna (Austria)R Foundation for StatisticalComputing

Racimo F 2016 Testing for ancient selection using cross-populationallele frequency differentiation Genetics 202(2)733ndash750

Ranciaro A Campbell MC Hirbo JB Ko W-Y Froment A Anagnostou PKotze MJ Ibrahim M Nyambo T Omar SA et al 2014 Geneticorigins of lactase persistence and the spread of pastoralism inAfrica Am J Hum Genet 94(4)496ndash510

Reich D Thangaraj K Patterson N Price AL Singh L 2009Reconstructing Indian population history Nature461(7263)489ndash494

Roche B Rougeron V Quintana-Murci L Renaud F Abbate JLPrugnolle F 2017 Might interspecific interactions between patho-gens drive host evolution The case of Plasmodium species andDuffy-negativity in human populations Trends Parasitol 33(1)21ndash29

Rodriguez-Flores JL Fakhro K Agosto-Perez F Ramstetter MD Arbiza LVincent TL Robay A Malek JA Suhre K Chouchane L et al 2016Indigenous Arabs are descendants of the earliest split from ancientEurasian populations Genome Res 26(2)151ndash162

Rosenberg NA Huang L Jewett EM Szpiech ZA Jankovic I Boehnke M2010 Genome-wide association studies in diverse populations NatRev Genet 11(5)356ndash366

Russo A Di Gaetano C Cugliari G Matullo G 2018 Advances in thegenetics of hypertension the effect of rare variants Int J Mol Sci19(3)688

Segal R 2002 Islamrsquos black slaves the other black diaspora LondonAtlantic Books

Shannon P Markiel A Ozier O Baliga NS Wang JT Ramage D Amin NSchwikowski B Ideker T 2003 Cytoscape a software environmentfor integrated models of biomolecular interaction networks GenomeRes 13(11)2498ndash2504

Skoglund P Thompson JC Prendergast ME Mittnik A Sirak K HajdinjakM Salie T Rohland N Mallick S Peltzer A et al 2017 Reconstructingprehistoric African population structure Cell 171(1)59ndash71 e21

Admixture and Selection in Arabian Peninsula doi101093molbevmsz005 MBE

585

Dow

nloaded from httpsacadem

icoupcomm

bearticle3635755288780 by guest on 25 July 2022

Snow RW Amratia P Zamani G Mundia CW Noor AM Memish ZA AlZahrani MH Al Jasari A Fikri M Atta H 2013 The malaria transitionon the Arabian Peninsula progress toward a malaria-free regionbetween 1960ndash2010 Adv Parasitol 82205ndash251

Soares P Alshamali F Pereira JB Fernandes V Silva NM Afonso C CostaMD Musilova E Macaulay V Richards MB et al 2012 The expan-sion of mtDNA haplogroup L3 within and out of Africa Mol BiolEvol 29(3)915ndash927

Szpiech ZA Hernandez RD 2014 selscan an efficient multithreadedprogram to perform EHH-based scans for positive selection MolBiol Evol 31(10)2824ndash2827

Tekola-Ayele F Adeyemo A Chen G Hailu E Aseffa AA-O Davey GA-ONewport MJ Rotimi CA-OX 2015 Novel genomic signals of recentselection in an Ethiopian population Eur J Hum Genet23(8)1085ndash1092

Thompson EE Kuttab-Boulos H Witonsky D Yang L Roe BA Di RienzoA 2004 CYP3A variation and the evolution of salt-sensitivity var-iants Am J Hum Genet 75(6)1059ndash1069

Tishkoff SA Reed FA Ranciaro A Voight BF Babbitt CC Silverman JSPowell K Mortensen HM Hirbo JB Osman M et al 2007Convergent adaptation of human lactase persistence in Africa andEurope Nat Genet 39(1)31ndash40

Triska P Soares P Patin E Fernandes V Cerny V Pereira L 2015Extensive admixture and selective pressure across the Sahel BeltGenome Biol Evol 7(12)3484ndash3495

Uchil PD Quinlan BD Chan W-T Luna JM Mothes W 2008 TRIM E3ligases interfere with early and late stages of the retroviral life cyclePLoS Pathog 4(2)e16

Voight BF Kudaravalli S Wen X Pritchard JK 2006 A map of recentpositive selection in the human genome PLoS Biol 4(3)e72

Vyas DN Al-Meeri A Mulligan CJ 2017 Testing support for the northernand southern dispersal routes out of Africa an analysis of Levantineand southern Arabian populations Am J Phys Anthropol164(4)736ndash749

Vyas DN Kitchen A Miro-Herrans AT Pearson LN Al-Meeri A MulliganCJ 2016 Bayesian analyses of Yemeni mitochondrial genomes sug-gest multiple migration events with Africa and Western Eurasia AmJ Phys Anthropol 159(3)382ndash393

Wagh K Bhatia A Alexe G Reddy A Ravikumar V Seiler M Boemo MYao M Cronk L Naqvi A et al 2012 Lactase persistence and lipidpathway selection in the Maasai PLoS One 7(9)e44751

Yin T Cook D Lawrence M 2012 ggbio an R package for extending thegrammar of graphics for genomic data Genome Biol 13(8)R77

Yu Y Ouyang Y Yao W 2018 shinyCircos an RShiny application forinteractive creation of Circos plot Bioinformatics 34(7)1229ndash1231

Zhao F McParland S Kearney F Du L Berry DP 2015 Detection ofselection signatures in dairy and beef cattle using high-density ge-nomic information Genet Sel Evol 4749

Fernandes et al doi101093molbevmsz005 MBE

586

Dow

nloaded from httpsacadem

icoupcomm

bearticle3635755288780 by guest on 25 July 2022

fineSTRUCTURE, Chromopainter, GLOBETROTTER, and IBD. All genotypes from the unpruned "extended data set" were phased with SHAPEIT v2.r790 (Delaneau et al. 2011), using the 1000 Genomes phased data (Auton et al. 2015) as a reference panel and the HapMap phase 2 genetic map (Frazer et al. 2007). We did not merge our data with relevant data sets from Qatar and Yemen (Hunter-Zinck et al. 2010; Vyas et al. 2017), as these were screened with Affymetrix chips bearing a low number of overlapping SNPs. The Qatari complete genomes from Rodriguez-Flores et al. (2016) were specifically selected and so do not constitute a random population sample comparable with our populations.

Population Structure, Differentiation, and Inbreeding

PCA, which infers worldwide axes of human genetic variation from the allele frequencies of various populations, was carried out on the pruned "restricted data set" using the smartpca tool included in the EIGENSOFT package (Patterson et al. 2006; Price et al. 2006). We ran ADMIXTURE (Alexander et al. 2009) to decompose the genetic ancestries of the pruned "restricted data set" using maximum likelihood for components K = 2 to K = 12, with the optimal K estimated through cross-validation of the logistic regression (K = 6). Wright's FST metric was calculated using VCFtools (Danecek et al. 2011). The THREEPOP program from TreeMix was used to calculate f3 statistics (Reich et al. 2009; Patterson et al. 2012; Pickrell and Pritchard 2012), which test between alternative scenarios of admixture. To analyze and visualize spatial population structure from georeferenced genetic samples, and to define genetic barriers and corridors, the EEMS method (Petkova et al. 2016) was used. Geographic coordinates and a genetic dissimilarity matrix between populations were used to set up a map defining a grid of 500 demes, and 3,000,000 Markov chain Monte Carlo (MCMC) iterations were run before checking for convergence. Plots were generated in R following the EEMS v1 manual.
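As a rough illustration of the f3 test described above, the sketch below computes a naive f3(C; A, B) as the mean over SNPs of (c - a)(c - b), where a, b, and c are allele frequencies in two candidate source populations and a target. It is a simplification and not the THREEPOP implementation: it omits the finite-sample bias correction and the block-jackknife standard errors, and the function name and toy frequencies are hypothetical. A clearly negative value is the pattern expected when the target is a mixture of populations related to A and B.

```python
from statistics import mean


def naive_f3(target, source_a, source_b):
    """Naive f3(target; source_a, source_b) from per-SNP allele frequencies.

    No bias correction or standard error is computed (simplifying assumption).
    """
    if not (len(target) == len(source_a) == len(source_b)):
        raise ValueError("frequency lists must have the same length")
    return mean((c - a) * (c - b) for c, a, b in zip(target, source_a, source_b))


# Toy allele frequencies (hypothetical, for illustration only).
freq_a = [0.10, 0.80, 0.30, 0.60]
freq_b = [0.90, 0.20, 0.70, 0.40]
# A target constructed as an equal mixture of A and B.
freq_c = [(a + b) / 2 for a, b in zip(freq_a, freq_b)]

f3 = naive_f3(freq_c, freq_a, freq_b)  # negative for this constructed mixture
```

In practice, significance is judged from a Z-score over genomic blocks rather than from the sign of the point estimate alone.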

Inference and Dating of Population Admixture

Population structure of the phased "extended data set" was evaluated using the fineSTRUCTURE v2.07 package (Lawson et al. 2012). fineSTRUCTURE captures information on population structure provided by patterns of haplotype similarity, calculated with Chromopainter v2.0 (Lawson et al. 2012), and performs a model-based Bayesian clustering of genotypes. From the results, a coancestry heat map and a dendrogram were inferred to visualize the number of statistically defined clusters that describe the data. The clusters obtained by fineSTRUCTURE were then used to assign the populations as target or donor/surrogate when testing for admixture. Mutation rate and effective population size parameters were first estimated with an Expectation-Maximization algorithm, running Chromopainter v2 on all 22 autosomes for the entire data set with 10 iterations (Lawson et al. 2012). The average of these parameters, weighted by the SNP coverage of each chromosome and the number of individuals, was then used to compute the chromosomal painting. Two Arabian groups were identified as target populations: the western group, represented by Yemen and Saudi Arabia, and the eastern group, represented by Oman and the UAE. We ran chromosomal painting for each group. Because of the possibility of common ancestry, we excluded the Levantine populations from the surrogate populations. The painted chromosomes obtained for each cluster were used in GLOBETROTTER v1.0 (Hellenthal et al. 2014) to estimate the ratios and dates of the potential admixture events in the region. Coancestry curves were estimated for each target population, standardizing against both a null (NULL 1) and a standard (NULL 0) individual, and consistency between the estimated parameters was checked. A total of 100 bootstrap resamplings were performed to estimate the probability values of the admixture events, considering the "NULL" individual, by calculating the proportion of nonsensical inferred dates (<1 or >400 generations) produced by the model (NULL 1). The "best-guess" scenario given by GLOBETROTTER v1.0 (Hellenthal et al. 2014) was considered for each target population. Dates of admixture, given in generations, were converted to chronological time using a generation interval of 29 years (Langergraber et al. 2012).
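The two arithmetic steps at the end of this paragraph can be sketched as follows: the bootstrap probability value is taken as the share of resampled dates falling outside 1 to 400 generations, and a date in generations is converted to years with the 29-year interval. This is a minimal sketch under stated assumptions, not GLOBETROTTER code: the function names and the toy bootstrap dates are hypothetical, and the conversion is a plain multiplication that does not model any convention for adding a generation or anchoring to a sampling year.

```python
GENERATION_YEARS = 29  # generation interval quoted in the text


def generations_to_years(generations, generation_years=GENERATION_YEARS):
    """Convert an admixture date in generations to years (plain multiplication)."""
    return generations * generation_years


def bootstrap_p_value(bootstrap_dates, low=1, high=400):
    """Proportion of bootstrap dates that are nonsensical (< low or > high generations)."""
    if not bootstrap_dates:
        raise ValueError("need at least one bootstrap estimate")
    nonsensical = sum(1 for d in bootstrap_dates if d < low or d > high)
    return nonsensical / len(bootstrap_dates)


# Hypothetical bootstrap dates in generations (the study used 100 resamplings).
toy_dates = [12.4, 15.1, 0.6, 13.8, 450.0, 14.2, 11.9, 16.3, 13.0, 12.7]

p_value = bootstrap_p_value(toy_dates)
years_ago = generations_to_years(14)
```

With the toy list, two of the ten dates fall outside the accepted range, so the sketch returns 0.2; a date of 14 generations corresponds to 406 years.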

Haplotype sharing between pairs of individuals was estimated by the refined IBD algorithm of Beagle v4.0 (Browning and Browning 2007) and visualized with Cytoscape v3.2.1 (Shannon et al. 2003). All maps in the present study were generated using Global Mapper v15 (http://www.bluemarblegeo.com/products/global-mapper.php).

Local Ancestry Inference

RFMix (Maples et al. 2013), software that employs an LD model between markers to infer the ancestry of each genome segment from a mixture of putative ancestral panels of haplotypes, was run on the unpruned and phased data set from the four AP populations plus Iran. The phased data from the 1000 Genomes populations of Great Britain (representing European ancestry), Yoruba from Nigeria (sub-Saharan African ancestry), and Indian Telugu (South Asian ancestry) were used as parental populations. Information on ancestry was obtained for each locus along the chromosomes for every individual, and these values were averaged within each population. To identify regions with a significantly higher proportion of a given parental ancestry, we followed previously published studies (Bryc et al. 2015), considering only regions outside the range defined by the genome mean ± 4 SD. The karyograms of local ancestry across the 22 chromosomes were generated in R using the ggbio package (Yin et al. 2012).
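The outlier rule in this paragraph (regions outside the genome mean ± 4 SD) can be sketched as below. This is an illustration under stated assumptions, not the pipeline used in the study: the function name, the region labels, and the toy ancestry proportions are hypothetical, and the population standard deviation is used because the text does not say which estimator was applied.

```python
from statistics import mean, pstdev


def outlier_regions(ancestry_by_region, n_sd=4.0):
    """Return regions whose mean ancestry lies outside genome mean +/- n_sd * SD."""
    values = list(ancestry_by_region.values())
    genome_mean = mean(values)
    genome_sd = pstdev(values)  # population SD (assumption)
    lower = genome_mean - n_sd * genome_sd
    upper = genome_mean + n_sd * genome_sd
    return {
        region: value
        for region, value in ancestry_by_region.items()
        if value < lower or value > upper
    }


# Hypothetical population-averaged African ancestry per genomic region.
toy_ancestry = {f"region_{i}": 0.10 for i in range(30)}
toy_ancestry["region_30"] = 0.90  # one strongly enriched region

enriched = outlier_regions(toy_ancestry)
```

A 4 SD cut-off can only be exceeded when many regions are compared, which is the case for genome-wide scans; with a handful of regions, no value can sit that far from the mean.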

Analysis of Selection

iHS (Voight et al. 2006) was estimated with the SelScan package (Szpiech and Hernandez 2014) on the phased data set from the four AP populations plus Iran. Scores were calculated for each SNP within the population as a whole and standardized within each of 100 bins of allele frequency using the norm tool. To identify loci potentially affected by positive selection, a percentile score was assigned based on the proportion of extreme iHS values in the segment. The SelScan tool (Szpiech and Hernandez 2014) was also used to estimate cross-population extended haplotype homozygosity (XP-EHH) in each AP population by comparison with Great Britain, Yoruba from Nigeria, or Indian Telugu. Consistent with the other tests for selection, only markers with MAF > 1% were used for the XP-EHH computation. The XP-EHH values obtained were normalized by subtracting the genome-wide mean XP-EHH and dividing by the standard deviation (Szpiech and Hernandez 2014). Genomic regions with percentile scores exceeding 4 SD and 3 SD were reported.
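The two normalizations described above differ only in the group over which the mean and standard deviation are taken: allele-frequency bins for iHS, the whole genome for XP-EHH. The sketch below shows both. It is a minimal illustration and not the selscan norm program: the function names and toy scores are hypothetical, the population standard deviation is an assumption, and a bin with no spread is assigned zero for simplicity.

```python
from collections import defaultdict
from statistics import mean, pstdev


def genome_wide_zscores(scores):
    """Subtract the genome-wide mean and divide by the standard deviation."""
    mu, sd = mean(scores), pstdev(scores)
    if sd == 0:
        raise ValueError("scores have zero variance")
    return [(s - mu) / sd for s in scores]


def standardize_in_frequency_bins(scores, freqs, n_bins=100):
    """Standardize each score against SNPs in the same allele-frequency bin."""
    bins = defaultdict(list)
    for index, freq in enumerate(freqs):
        bins[min(int(freq * n_bins), n_bins - 1)].append(index)
    standardized = [0.0] * len(scores)
    for indices in bins.values():
        values = [scores[i] for i in indices]
        mu, sd = mean(values), pstdev(values)
        for i in indices:
            standardized[i] = (scores[i] - mu) / sd if sd > 0 else 0.0
    return standardized


# Hypothetical unstandardized scores, for illustration only.
toy_xpehh = [0.0, 1.0, 2.0, 3.0, 4.0]
xpehh_z = genome_wide_zscores(toy_xpehh)

toy_ihs = [1.0, 3.0, 10.0, 20.0]
toy_freqs = [0.101, 0.105, 0.505, 0.509]  # two SNPs in each of two bins
ihs_z = standardize_in_frequency_bins(toy_ihs, toy_freqs)
```

Binning by frequency keeps a common allele from being compared against rare ones, whose unstandardized scores have a different expected distribution.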

Selected SNPs were mapped to genes within a 5 kb flanking region, and significantly selected regions were represented in circos plots generated in R (R Development Core Team 2018) using the shinyCircos graphical interface (Yu et al. 2018).
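Mapping a SNP to genes within a 5 kb flank amounts to an interval check against gene coordinates extended on both sides. The sketch below assumes a simple list of (name, chromosome, start, end) tuples; the gene names and coordinates are hypothetical placeholders, not real annotation, and the linear scan stands in for whatever annotation tool was actually used.

```python
FLANK_BP = 5_000  # 5 kb flanking region, as stated in the text


def genes_near_snp(chrom, pos, genes, flank=FLANK_BP):
    """Return names of genes whose span, extended by `flank` bp, contains the SNP."""
    hits = []
    for name, gene_chrom, start, end in genes:
        if gene_chrom == chrom and start - flank <= pos <= end + flank:
            hits.append(name)
    return hits


# Hypothetical gene coordinates (illustration only).
toy_genes = [
    ("GENE_A", "1", 100_000, 120_000),
    ("GENE_B", "1", 200_000, 210_000),
    ("GENE_C", "2", 100_000, 120_000),
]

inside_gene = genes_near_snp("1", 110_000, toy_genes)
within_flank = genes_near_snp("1", 124_000, toy_genes)
intergenic = genes_near_snp("1", 150_000, toy_genes)
```

For genome-wide SNP sets, a sorted index or interval tree would replace the linear scan, but the inclusion rule stays the same.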

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.

Acknowledgments

This work was financed by FEDER (Fundo Europeu de Desenvolvimento Regional) funds through the COMPETE 2020 Operacional Programme for Competitiveness and Internationalisation (POCI), Portugal 2020, and by Portuguese funds through FCT (Fundação para a Ciência e a Tecnologia)/Ministério da Ciência, Tecnologia e Inovação in the framework of the project "Biomedical anthropological study in Arabian Peninsula based on high throughput genomics" (POCI-01-0145-FEDER-016609). V.F. has a postdoc grant through FCT (SFRH/BPD/114927/2016).

References

Al-Abri A, Podgorna E, Rose JI, Pereira L, Mulligan CJ, Silva NM, Bayoumi R, Soares P, Cerny V. 2012. Pleistocene–Holocene boundary in Southern Arabia from the perspective of human mtDNA variation. Am J Phys Anthropol. 149(2):291–298.

Alexander DH, Novembre J, Lange K. 2009. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19(9):1655–1664.

Alshamali F, Pereira L, Budowle B, Poloni ES, Currat M. 2009. Local population structure in Arabian Peninsula revealed by Y-STR diversity. Hum Hered. 68(1):45–54.

An P, Goedert JJ, Donfield S, Buchbinder S, Kirk GD, Detels R, Winkler CA. 2014. Regulatory variation in HIV-1 dependency factor ZNRD1 associates with host resistance to HIV-1 acquisition. J Infect Dis. 210(10):1539–1548.

Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, Abecasis GR. 2015. A global reference for human genetic variation. Nature 526(7571):68–74.

Basu Mallick C, Iliescu FM, Mols M, Hill S, Tamang R, Chaubey G, Goto R, Ho SYW, Gallego Romero I, Crivellaro F, et al. 2013. The light skin allele of SLC24A5 in South Asians and Europeans shares identity by descent. PLoS Genet. 9(11):e1003912.

Bayoumi R, De Fanti S, Sazzini M, Giuliani C, Quagliariello A, Bortolini E, Boattini A, Al-Habori M, Al-Zubairi AS, Rose JI, et al. 2016. Positive selection of lactase persistence among people of Southern Arabia. Am J Phys Anthropol. 161(4):676–684.

Browning BL, Browning SR. 2013. Improving the accuracy and efficiency of identity-by-descent detection in population data. Genetics 194(2):459–471.

Browning SR, Browning BL. 2007. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am J Hum Genet. 81(5):1084–1097.

Brucato N, Fernandes V, Mazieres S, Kusuma P, Cox MP, Ng'ang'a JW, Omar M, Simeone-Senelle MC, Frassati C, Alshamali F, et al. 2018. The Comoros show the earliest Austronesian gene flow into the Swahili corridor. Am J Hum Genet. 102(1):58–68.

Bryc K, Durand EY, Macpherson JM, Reich D, Mountain JL. 2015. The genetic ancestry of African Americans, Latinos, and European Americans across the United States. Am J Hum Genet. 96(1):37–53.

Campbell MC, Tishkoff SA. 2008. African genetic diversity: implications for human demographic history, modern human origins, and complex disease mapping. Annu Rev Genomics Hum Genet. 9:403–433.

Candille SI, Absher DM, Beleza S, Bauchet M, McEvoy B, Garrison NA, Li JZ, Myers RM, Barsh GS, Tang H, et al. 2012. Genome-wide association studies of quantitatively measured skin, hair, and eye pigmentation in four European populations. PLoS One 7(10):e48294.

Cerny V, Mulligan CJ, Fernandes V, Silva NM, Alshamali F, Non A, Harich N, Cherni L, El Gaaied AB, Al-Meeri A, et al. 2011. Internal diversification of mitochondrial haplogroup R0a reveals post-last glacial maximum demographic expansions in South Arabia. Mol Biol Evol. 28(1):71–78.

Cerny V, Pereira L, Kujanova M, Vasikova A, Hajek M, Morris M, Mulligan CJ. 2009. Out of Arabia: the settlement of island Soqotra as revealed by mitochondrial and Y chromosome genetic diversity. Am J Phys Anthropol. 138(4):439–447.

Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. 2015. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4:7.

Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, et al. 2011. The variant call format and VCFtools. Bioinformatics 27(15):2156–2158.

Delaneau O, Marchini J, Zagury JF. 2011. A linear complexity phasing method for thousands of genomes. Nat Methods. 9(2):179–181.

Emmrich S, Streltsov A, Schmidt F, Thangapandi VR, Reinhardt D, Klusmann JH. 2014. LincRNAs MONC and MIR100HG act as oncogenes in acute megakaryoblastic leukemia. Mol Cancer 13:171.

Fernandes V, Alshamali F, Alves M, Costa MD, Pereira JB, Silva NM, Cherni L, Harich N, Cerny V, Soares P, et al. 2012. The Arabian cradle: mitochondrial relicts of the first steps along the southern route out of Africa. Am J Hum Genet. 90(2):347–355.

Fernandes V, Triska P, Pereira JB, Alshamali F, Rito T, Machado A, Fajkosova Z, Cavadas B, Cerny V, Soares P, et al. 2015. Genetic stratigraphy of key demographic events in Arabia. PLoS One 10(3):e0118625.

Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, Belmont JW, Boudreau A, Hardenbol P, Leal SM, et al. 2007. A second generation human haplotype map of over 3.1 million SNPs. Nature 449(7164):851–861.

Fu Q, Posth C, Hajdinjak M, Petr M, Mallick S, Fernandes D, Furtwangler A, Haak W, Meyer M, Mittnik A, et al. 2016. The genetic history of Ice Age Europe. Nature 534(7606):200–205.

Gallego Romero I, Basu Mallick C, Liebert A, Crivellaro F, Chaubey G, Itan Y, Metspalu M, Eaaswarkhanth M, Pitchappan R, Villems R, et al. 2012. Herders of Indian and European cattle share their predominant allele for lactase persistence. Mol Biol Evol. 29(1):249–260.

Gravel S, Zakharia F, Moreno-Estrada A, Byrnes JK, Muzzio M, Rodriguez-Flores JL, Kenny EE, Gignoux CR, Maples BK, Guiblet W, et al. 2013. Reconstructing Native American migrations from whole-genome and whole-exome data. PLoS Genet. 9(12):e1004023.

Hellenthal G, Busby GBJ, Band G, Wilson JF, Capelli C, Falush D, Myers S. 2014. A genetic atlas of human admixture history. Science 343(6172):747–751.

Hernandez-Pacheco N, Flores C, Alonso S, Eng C, Mak ACY, Hunstman S, Hu D, White MJ, Oh SS, Meade K, et al. 2017. Identification of a novel locus associated with skin colour in African-admixed populations. Sci Rep. 7:44548.

Hodgson JA, Mulligan CJ, Al-Meeri A, Raaum RL. 2014. Early back-to-Africa migration into the horn of Africa. PLoS Genet. 10(6):e1004393.

Hogarth DG. 1904. The penetration of Arabia: a record of the development of western knowledge concerning the Arabian Peninsula. New York: Cambridge Library Collection.

Hunter-Zinck H, Musharoff S, Salit J, Al-Ali KA, Chouchane L, Gohar A, Matthews R, Butler MW, Fuller J, Hackett NR, et al. 2010. Population genetic structure of the people of Qatar. Am J Hum Genet. 87(1):17–25.

John SE, Thareja G, Hebbar P, Behbehani K, Thanaraj TA, Alsmadi O. 2015. Kuwaiti population subgroup of nomadic Bedouin ancestry: whole genome sequence and analysis. Genom Data 3:116–127.

Kivisild T, Reidla M, Metspalu E, Rosa A, Brehm A, Pennarun E, Parik J, Geberhiwot T, Usanga E, Villems R. 2004. Ethiopian mitochondrial DNA heritage: tracking gene flow across and around the gate of tears. Am J Hum Genet. 75(5):752–770.

Laayouni H, Oosting M, Luisi P, Ioana M, Alonso S, Ricaño-Ponce I, Trynka G, Zhernakova A, Plantinga TS, Cheng S-C, et al. 2014. Convergent evolution in European and Rroma populations reveals pressure exerted by plague on Toll-like receptors. Proc Natl Acad Sci U S A. 111(7):2668–2673.

Lahr MM, Foley R. 2005. Multiple dispersals and modern human origins. Evol Anthropol. 3(2):48–60.

Langergraber KE, Prufer K, Rowney C, Boesch C, Crockford C, Fawcett K, Inoue E, Inoue-Muruyama M, Mitani JC, Muller MN, et al. 2012. Generation times in wild chimpanzees and gorillas suggest earlier divergence times in great ape and human evolution. Proc Natl Acad Sci U S A. 109(39):15716–15721.

Laso-Jadart R, Harmant C, Quach H, Zidane N, Tyler-Smith C, Mehdi Q, Ayub Q, Quintana-Murci L, Patin E. 2017. The genetic legacy of the Indian Ocean slave trade: recent admixture and post-admixture selection in the Makranis of Pakistan. Am J Hum Genet. 101(6):977–984.

Lawson DJ, Hellenthal G, Myers S, Falush D. 2012. Inference of population structure using dense haplotype data. PLoS Genet. 8(1):e1002453.

Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S, Cann HM, Barsh GS, Feldman M, Cavalli-Sforza LL, et al. 2008. Worldwide human relationships inferred from genome-wide patterns of variation. Science 319(5866):1100–1104.

Loh PR, Lipson M, Patterson N, Moorjani P, Pickrell JK, Reich D, Berger B. 2013. Inferring admixture histories of human populations using linkage disequilibrium. Genetics 193(4):1233–1254.

Lovejoy PE. 2011. Transformations in slavery: a history of slavery in Africa. Cambridge: Cambridge University Press.

Macaulay V, Hill C, Achilli A, Rengo C, Clarke D, Meehan W, Blackburn J, Semino O, Scozzari R, Cruciani F, et al. 2005. Single, rapid coastal settlement of Asia revealed by analysis of complete mitochondrial genomes. Science 308(5724):1034.

Malaspinas A-S, Westaway MC, Muller C, Sousa VC, Lao O, Alves I, Bergstrom A, Athanasiadis G, Cheng JY, Crawford JE, et al. 2016. A genomic history of Aboriginal Australia. Nature 538(7624):207.

Mallick S, Li H, Lipson M, Mathieson I, Gymrek M, Racimo F, Zhao M, Chennagiri N, Nordenfelt S, Tandon A, et al. 2016. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538(7624):201.

Maples BK, Gravel S, Kenny EE, Bustamante CD. 2013. RFMix: a discriminative modeling approach for rapid and robust local-ancestry inference. Am J Hum Genet. 93(2):278–288.

McManus KF, Taravella AM, Henn BM, Bustamante CD, Sikora M, Cornejo OE. 2017. Population genetic analysis of the DARC locus (Duffy) reveals adaptation from standing variation associated with malaria resistance in humans. PLoS Genet. 13(3):e1006560.

Moorjani P, Patterson N, Hirschhorn JN, Keinan A, Hao L, Atzmon G, Burns E, Ostrer H, Price AL, Reich D. 2011. The history of African gene flow into Southern Europeans, Levantines, and Jews. PLoS Genet. 7(4):e1001373.

Musilova E, Fernandes V, Silva NM, Soares P, Alshamali F, Harich N, Cherni L, Gaaied AB, Al-Meeri A, Pereira L, et al. 2011. Population history of the Red Sea: genetic exchanges between the Arabian Peninsula and East Africa signaled in the mitochondrial DNA HV1 haplogroup. Am J Phys Anthropol. 145:592–598.

Ozato K, Shin D-M, Chang T-H, Morse HC. 2008. TRIM family proteins and their emerging roles in innate immunity. Nat Rev Immunol. 8(11):849–860.

Patin E, Lopez M, Grollemund R, Verdu P, Harmant C, Quach H, Laval G, Perry GH, Barreiro LB, Froment A, et al. 2017. Dispersals and genetic adaptation of Bantu-speaking populations in Africa and North America. Science 356(6337):543–546.

Patterson N, Moorjani P, Luo Y, Mallick S, Rohland N, Zhan Y, Genschoreck T, Webster T, Reich D. 2012. Ancient admixture in human history. Genetics 192(3):1065.

Patterson N, Price AL, Reich D. 2006. Population structure and eigenanalysis. PLoS Genet. 2(12):e190.

Petkova D, Novembre J, Stephens M. 2016. Visualizing spatial population structure with estimated effective migration surfaces. Nat Genet. 48(1):94–100.

Pickrell JK, Coop G, Novembre J, Kudaravalli S, Li JZ, Absher D, Srinivasan BS, Barsh GS, Myers RM, Feldman MW, et al. 2009. Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 19(5):826–837.

Pickrell JK, Pritchard JK. 2012. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. 8(11):e1002967.

Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. 2006. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 38(8):904–909.

Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, et al. 2007. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 81(3):559–575.

Quintana-Murci L, Semino O, Bandelt H-J, Passarino G, McElreavey K, Santachiara-Benerecetti AS. 1999. Genetic evidence of an early exit of Homo sapiens sapiens from Africa through eastern Africa. Nat Genet. 23(4):437.

R Development Core Team. 2018. R: a language and environment for statistical computing. Vienna (Austria): R Foundation for Statistical Computing.

Racimo F. 2016. Testing for ancient selection using cross-population allele frequency differentiation. Genetics 202(2):733–750.

Ranciaro A, Campbell MC, Hirbo JB, Ko W-Y, Froment A, Anagnostou P, Kotze MJ, Ibrahim M, Nyambo T, Omar SA, et al. 2014. Genetic origins of lactase persistence and the spread of pastoralism in Africa. Am J Hum Genet. 94(4):496–510.

Reich D, Thangaraj K, Patterson N, Price AL, Singh L. 2009. Reconstructing Indian population history. Nature 461(7263):489–494.

Roche B, Rougeron V, Quintana-Murci L, Renaud F, Abbate JL, Prugnolle F. 2017. Might interspecific interactions between pathogens drive host evolution? The case of Plasmodium species and Duffy-negativity in human populations. Trends Parasitol. 33(1):21–29.

Rodriguez-Flores JL, Fakhro K, Agosto-Perez F, Ramstetter MD, Arbiza L, Vincent TL, Robay A, Malek JA, Suhre K, Chouchane L, et al. 2016. Indigenous Arabs are descendants of the earliest split from ancient Eurasian populations. Genome Res. 26(2):151–162.

Rosenberg NA, Huang L, Jewett EM, Szpiech ZA, Jankovic I, Boehnke M. 2010. Genome-wide association studies in diverse populations. Nat Rev Genet. 11(5):356–366.

Russo A, Di Gaetano C, Cugliari G, Matullo G. 2018. Advances in the genetics of hypertension: the effect of rare variants. Int J Mol Sci. 19(3):688.

Segal R. 2002. Islam's black slaves: the other black diaspora. London: Atlantic Books.

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T. 2003. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13(11):2498–2504.

Skoglund P, Thompson JC, Prendergast ME, Mittnik A, Sirak K, Hajdinjak M, Salie T, Rohland N, Mallick S, Peltzer A, et al. 2017. Reconstructing prehistoric African population structure. Cell 171(1):59–71.e21.

Snow RW, Amratia P, Zamani G, Mundia CW, Noor AM, Memish ZA, Al Zahrani MH, Al Jasari A, Fikri M, Atta H. 2013. The malaria transition on the Arabian Peninsula: progress toward a malaria-free region between 1960–2010. Adv Parasitol. 82:205–251.

Soares P, Alshamali F, Pereira JB, Fernandes V, Silva NM, Afonso C, Costa MD, Musilova E, Macaulay V, Richards MB, et al. 2012. The expansion of mtDNA haplogroup L3 within and out of Africa. Mol Biol Evol. 29(3):915–927.

Szpiech ZA, Hernandez RD. 2014. selscan: an efficient multithreaded program to perform EHH-based scans for positive selection. Mol Biol Evol. 31(10):2824–2827.

Tekola-Ayele F, Adeyemo A, Chen G, Hailu E, Aseffa A, Davey G, Newport MJ, Rotimi C. 2015. Novel genomic signals of recent selection in an Ethiopian population. Eur J Hum Genet. 23(8):1085–1092.

Thompson EE, Kuttab-Boulos H, Witonsky D, Yang L, Roe BA, Di Rienzo A. 2004. CYP3A variation and the evolution of salt-sensitivity variants. Am J Hum Genet. 75(6):1059–1069.

Tishkoff SA, Reed FA, Ranciaro A, Voight BF, Babbitt CC, Silverman JS, Powell K, Mortensen HM, Hirbo JB, Osman M, et al. 2007. Convergent adaptation of human lactase persistence in Africa and Europe. Nat Genet. 39(1):31–40.

Triska P, Soares P, Patin E, Fernandes V, Cerny V, Pereira L. 2015. Extensive admixture and selective pressure across the Sahel Belt. Genome Biol Evol. 7(12):3484–3495.

Uchil PD, Quinlan BD, Chan W-T, Luna JM, Mothes W. 2008. TRIM E3 ligases interfere with early and late stages of the retroviral life cycle. PLoS Pathog. 4(2):e16.

Voight BF, Kudaravalli S, Wen X, Pritchard JK. 2006. A map of recent positive selection in the human genome. PLoS Biol. 4(3):e72.

Vyas DN, Al-Meeri A, Mulligan CJ. 2017. Testing support for the northern and southern dispersal routes out of Africa: an analysis of Levantine and southern Arabian populations. Am J Phys Anthropol. 164(4):736–749.

Vyas DN, Kitchen A, Miro-Herrans AT, Pearson LN, Al-Meeri A, Mulligan CJ. 2016. Bayesian analyses of Yemeni mitochondrial genomes suggest multiple migration events with Africa and Western Eurasia. Am J Phys Anthropol. 159(3):382–393.

Wagh K, Bhatia A, Alexe G, Reddy A, Ravikumar V, Seiler M, Boemo M, Yao M, Cronk L, Naqvi A, et al. 2012. Lactase persistence and lipid pathway selection in the Maasai. PLoS One 7(9):e44751.

Yin T, Cook D, Lawrence M. 2012. ggbio: an R package for extending the grammar of graphics for genomic data. Genome Biol. 13(8):R77.

Yu Y, Ouyang Y, Yao W. 2018. shinyCircos: an R/Shiny application for interactive creation of Circos plot. Bioinformatics 34(7):1229–1231.

Zhao F, McParland S, Kearney F, Du L, Berry DP. 2015. Detection of selection signatures in dairy and beef cattle using high-density genomic information. Genet Sel Evol. 47:49.

population extended haplotype homozygosity in each APpopulation by comparing with Great Britain Yoruba fromNigeria or Indian Telugu Consistently with other tests forselection only markers with MAFgt 1 were used for XP-EHH computation The obtained XP-EHH values were nor-malized by subtracting genome-wide mean XP-EHH and di-viding by standard deviation (Szpiech and Hernandez 2014)Genomic regions with percentile score exceeding 4SD and3SD were reported

Selected SNPs were mapped to genes in 5 kb flanking re-gion and significantly selected regions were represented incircos Plots generated in R (R Development Core Team 2018)using the shinyCircos graphical interface (Yu et al 2018)

Supplementary MaterialSupplementary data are available at Molecular Biology andEvolution online

AcknowledgmentsThis work was financed by FEDERmdashFundo Europeu deDesenvolvimento Regional funds through COMPETE2020mdashOperacional Programme for Competitiveness andInternationalisation (POCI) Portugal 2020 and byPortuguese funds through FCTmdashFundac~ao para a Ciencia ea TecnologiaMinisterio da Ciencia Tecnologia e Inovac~ao inthe framework of the project ldquoBiomedical anthropologicalstudy in Arabian Peninsula based on high throughputgenomicsrdquo (POCI-01-0145-FEDER-016609) VF has a postdocgrant through FCT (SFRHBPD1149272016)

References
Al-Abri A, Podgorna E, Rose JI, Pereira L, Mulligan CJ, Silva NM, Bayoumi R, Soares P, Cerny V. 2012. Pleistocene–Holocene boundary in Southern Arabia from the perspective of human mtDNA variation. Am J Phys Anthropol. 149(2):291–298.

Alexander DH, Novembre J, Lange K. 2009. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19(9):1655–1664.

Alshamali F, Pereira L, Budowle B, Poloni ES, Currat M. 2009. Local population structure in Arabian Peninsula revealed by Y-STR diversity. Hum Hered. 68(1):45–54.

An P, Goedert JJ, Donfield S, Buchbinder S, Kirk GD, Detels R, Winkler CA. 2014. Regulatory variation in HIV-1 dependency factor ZNRD1 associates with host resistance to HIV-1 acquisition. J Infect Dis. 210(10):1539–1548.

Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, Abecasis GR. 2015. A global reference for human genetic variation. Nature 526(7571):68–74.

Basu Mallick C, Iliescu FM, Mols M, Hill S, Tamang R, Chaubey G, Goto R, Ho SYW, Gallego Romero I, Crivellaro F, et al. 2013. The light skin allele of SLC24A5 in South Asians and Europeans shares identity by descent. PLoS Genet. 9(11):e1003912.

Bayoumi R, De Fanti S, Sazzini M, Giuliani C, Quagliariello A, Bortolini E, Boattini A, Al-Habori M, Al-Zubairi AS, Rose JI, et al. 2016. Positive selection of lactase persistence among people of Southern Arabia. Am J Phys Anthropol. 161(4):676–684.

Browning BL, Browning SR. 2013. Improving the accuracy and efficiency of identity-by-descent detection in population data. Genetics 194(2):459–471.

Browning SR, Browning BL. 2007. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am J Hum Genet. 81(5):1084–1097.

Brucato N, Fernandes V, Mazieres S, Kusuma P, Cox MP, Ng'ang'a JW, Omar M, Simeone-Senelle MC, Frassati C, Alshamali F, et al. 2018. The Comoros show the earliest Austronesian gene flow into the Swahili corridor. Am J Hum Genet. 102(1):58–68.

Bryc K, Durand EY, Macpherson JM, Reich D, Mountain JL. 2015. The genetic ancestry of African Americans, Latinos, and European Americans across the United States. Am J Hum Genet. 96(1):37–53.

Campbell MC, Tishkoff SA. 2008. African genetic diversity: implications for human demographic history, modern human origins, and complex disease mapping. Annu Rev Genomics Hum Genet. 9:403–433.

Candille SI, Absher DM, Beleza S, Bauchet M, McEvoy B, Garrison NA, Li JZ, Myers RM, Barsh GS, Tang H, et al. 2012. Genome-wide association studies of quantitatively measured skin, hair, and eye pigmentation in four European populations. PLoS One 7(10):e48294.

Cerny V, Mulligan CJ, Fernandes V, Silva NM, Alshamali F, Non A, Harich N, Cherni L, El Gaaied AB, Al-Meeri A, et al. 2011. Internal diversification of mitochondrial haplogroup R0a reveals post-last glacial maximum demographic expansions in South Arabia. Mol Biol Evol. 28(1):71–78.

Cerny V, Pereira L, Kujanova M, Vasikova A, Hajek M, Morris M, Mulligan CJ. 2009. Out of Arabia – the settlement of island Soqotra as revealed by mitochondrial and Y chromosome genetic diversity. Am J Phys Anthropol. 138(4):439–447.

Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. 2015. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4:7.

Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, et al. 2011. The variant call format and VCFtools. Bioinformatics 27(15):2156–2158.

Delaneau O, Marchini J, Zagury JF. 2011. A linear complexity phasing method for thousands of genomes. Nat Methods. 9(2):179–181.

Emmrich S, Streltsov A, Schmidt F, Thangapandi VR, Reinhardt D, Klusmann JH. 2014. LincRNAs MONC and MIR100HG act as oncogenes in acute megakaryoblastic leukemia. Mol Cancer. 13:171.

Fernandes V, Alshamali F, Alves M, Costa MD, Pereira JB, Silva NM, Cherni L, Harich N, Cerny V, Soares P, et al. 2012. The Arabian cradle: mitochondrial relicts of the first steps along the southern route out of Africa. Am J Hum Genet. 90(2):347–355.

Fernandes V, Triska P, Pereira JB, Alshamali F, Rito T, Machado A, Fajkosova Z, Cavadas B, Cerny V, Soares P, et al. 2015. Genetic stratigraphy of key demographic events in Arabia. PLoS One 10(3):e0118625.

Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, Belmont JW, Boudreau A, Hardenbol P, Leal SM, et al. 2007. A second generation human haplotype map of over 3.1 million SNPs. Nature 449(7164):851–861.

Fu Q, Posth C, Hajdinjak M, Petr M, Mallick S, Fernandes D, Furtwangler A, Haak W, Meyer M, Mittnik A, et al. 2016. The genetic history of Ice Age Europe. Nature 534(7606):200–205.

Gallego Romero I, Basu Mallick C, Liebert A, Crivellaro F, Chaubey G, Itan Y, Metspalu M, Eaaswarkhanth M, Pitchappan R, Villems R, et al. 2012. Herders of Indian and European cattle share their predominant allele for lactase persistence. Mol Biol Evol. 29(1):249–260.

Gravel S, Zakharia F, Moreno-Estrada A, Byrnes JK, Muzzio M, Rodriguez-Flores JL, Kenny EE, Gignoux CR, Maples BK, Guiblet W, et al. 2013. Reconstructing Native American migrations from whole-genome and whole-exome data. PLoS Genet. 9(12):e1004023.

Hellenthal G, Busby GBJ, Band G, Wilson JF, Capelli C, Falush D, Myers S. 2014. A genetic atlas of human admixture history. Science 343(6172):747–751.

Hernandez-Pacheco N, Flores C, Alonso S, Eng C, Mak ACY, Hunstman S, Hu D, White MJ, Oh SS, Meade K, et al. 2017. Identification of a novel locus associated with skin colour in African-admixed populations. Sci Rep. 7:44548.

Hodgson JA, Mulligan CJ, Al-Meeri A, Raaum RL. 2014. Early back-to-Africa migration into the horn of Africa. PLoS Genet. 10(6):e1004393.


Hogarth DG. 1904. The penetration of Arabia: a record of the development of western knowledge concerning the Arabian Peninsula. New York: Cambridge Library Collection.

Hunter-Zinck H, Musharoff S, Salit J, Al-Ali KA, Chouchane L, Gohar A, Matthews R, Butler MW, Fuller J, Hackett NR, et al. 2010. Population genetic structure of the people of Qatar. Am J Hum Genet. 87(1):17–25.

John SE, Thareja G, Hebbar P, Behbehani K, Thanaraj TA, Alsmadi O. 2015. Kuwaiti population subgroup of nomadic Bedouin ancestry – whole genome sequence and analysis. Genom Data. 3:116–127.

Kivisild T, Reidla M, Metspalu E, Rosa A, Brehm A, Pennarun E, Parik J, Geberhiwot T, Usanga E, Villems R. 2004. Ethiopian mitochondrial DNA heritage: tracking gene flow across and around the gate of tears. Am J Hum Genet. 75(5):752–770.

Laayouni H, Oosting M, Luisi P, Ioana M, Alonso S, Ricaño-Ponce I, Trynka G, Zhernakova A, Plantinga TS, Cheng S-C, et al. 2014. Convergent evolution in European and Rroma populations reveals pressure exerted by plague on Toll-like receptors. Proc Natl Acad Sci U S A. 111(7):2668–2673.

Lahr MM, Foley R. 2005. Multiple dispersals and modern human origins. Evol Anthropol. 3(2):48–60.

Langergraber KE, Prufer K, Rowney C, Boesch C, Crockford C, Fawcett K, Inoue E, Inoue-Muruyama M, Mitani JC, Muller MN, et al. 2012. Generation times in wild chimpanzees and gorillas suggest earlier divergence times in great ape and human evolution. Proc Natl Acad Sci U S A. 109(39):15716–15721.

Laso-Jadart R, Harmant C, Quach H, Zidane N, Tyler-Smith C, Mehdi Q, Ayub Q, Quintana-Murci L, Patin E. 2017. The genetic legacy of the Indian Ocean slave trade: recent admixture and post-admixture selection in the Makranis of Pakistan. Am J Hum Genet. 101(6):977–984.

Lawson DJ, Hellenthal G, Myers S, Falush D. 2012. Inference of population structure using dense haplotype data. PLoS Genet. 8(1):e1002453.

Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S, Cann HM, Barsh GS, Feldman M, Cavalli-Sforza LL, et al. 2008. Worldwide human relationships inferred from genome-wide patterns of variation. Science 319(5866):1100–1104.

Loh PR, Lipson M, Patterson N, Moorjani P, Pickrell JK, Reich D, Berger B. 2013. Inferring admixture histories of human populations using linkage disequilibrium. Genetics 193(4):1233–1254.

Lovejoy PE. 2011. Transformations in slavery: a history of slavery in Africa. Cambridge: Cambridge University Press.

Macaulay V, Hill C, Achilli A, Rengo C, Clarke D, Meehan W, Blackburn J, Semino O, Scozzari R, Cruciani F, et al. 2005. Single, rapid coastal settlement of Asia revealed by analysis of complete mitochondrial genomes. Science 308(5724):1034.

Malaspinas A-S, Westaway MC, Muller C, Sousa VC, Lao O, Alves I, Bergstrom A, Athanasiadis G, Cheng JY, Crawford JE, et al. 2016. A genomic history of Aboriginal Australia. Nature 538(7624):207.

Mallick S, Li H, Lipson M, Mathieson I, Gymrek M, Racimo F, Zhao M, Chennagiri N, Nordenfelt S, Tandon A, et al. 2016. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538(7624):201.

Maples BK, Gravel S, Kenny EE, Bustamante CD. 2013. RFMix: a discriminative modeling approach for rapid and robust local-ancestry inference. Am J Hum Genet. 93(2):278–288.

McManus KF, Taravella AM, Henn BM, Bustamante CD, Sikora M, Cornejo OE. 2017. Population genetic analysis of the DARC locus (Duffy) reveals adaptation from standing variation associated with malaria resistance in humans. PLoS Genet. 13(3):e1006560.

Moorjani P, Patterson N, Hirschhorn JN, Keinan A, Hao L, Atzmon G, Burns E, Ostrer H, Price AL, Reich D. 2011. The history of African gene flow into Southern Europeans, Levantines, and Jews. PLoS Genet. 7(4):e1001373.

Musilova E, Fernandes V, Silva NM, Soares P, Alshamali F, Harich N, Cherni L, Gaaied AB, Al-Meeri A, Pereira L, et al. 2011. Population history of the Red Sea – genetic exchanges between the Arabian Peninsula and East Africa signaled in the mitochondrial DNA HV1 haplogroup. Am J Phys Anthropol. 145:592–598.

Ozato K, Shin D-M, Chang T-H, Morse HC. 2008. TRIM family proteins and their emerging roles in innate immunity. Nat Rev Immunol. 8(11):849–860.

Patin E, Lopez M, Grollemund R, Verdu P, Harmant C, Quach H, Laval G, Perry GH, Barreiro LB, Froment A, et al. 2017. Dispersals and genetic adaptation of Bantu-speaking populations in Africa and North America. Science 356(6337):543–546.

Patterson N, Moorjani P, Luo Y, Mallick S, Rohland N, Zhan Y, Genschoreck T, Webster T, Reich D. 2012. Ancient admixture in human history. Genetics 192(3):1065.

Patterson N, Price AL, Reich D. 2006. Population structure and eigenanalysis. PLoS Genet. 2(12):e190.

Petkova D, Novembre J, Stephens M. 2016. Visualizing spatial population structure with estimated effective migration surfaces. Nat Genet. 48(1):94–100.

Pickrell JK, Coop G, Novembre J, Kudaravalli S, Li JZ, Absher D, Srinivasan BS, Barsh GS, Myers RM, Feldman MW, et al. 2009. Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 19(5):826–837.

Pickrell JK, Pritchard JK. 2012. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. 8(11):e1002967.

Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. 2006. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 38(8):904–909.

Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, et al. 2007. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 81(3):559–575.

Quintana-Murci L, Semino O, Bandelt H-J, Passarino G, McElreavey K, Santachiara-Benerecetti AS. 1999. Genetic evidence of an early exit of Homo sapiens sapiens from Africa through eastern Africa. Nat Genet. 23(4):437.

R Development Core Team. 2018. R: a language and environment for statistical computing. Vienna (Austria): R Foundation for Statistical Computing.

Racimo F. 2016. Testing for ancient selection using cross-population allele frequency differentiation. Genetics 202(2):733–750.

Ranciaro A, Campbell MC, Hirbo JB, Ko W-Y, Froment A, Anagnostou P, Kotze MJ, Ibrahim M, Nyambo T, Omar SA, et al. 2014. Genetic origins of lactase persistence and the spread of pastoralism in Africa. Am J Hum Genet. 94(4):496–510.

Reich D, Thangaraj K, Patterson N, Price AL, Singh L. 2009. Reconstructing Indian population history. Nature 461(7263):489–494.

Roche B, Rougeron V, Quintana-Murci L, Renaud F, Abbate JL, Prugnolle F. 2017. Might interspecific interactions between pathogens drive host evolution? The case of Plasmodium species and Duffy-negativity in human populations. Trends Parasitol. 33(1):21–29.

Rodriguez-Flores JL, Fakhro K, Agosto-Perez F, Ramstetter MD, Arbiza L, Vincent TL, Robay A, Malek JA, Suhre K, Chouchane L, et al. 2016. Indigenous Arabs are descendants of the earliest split from ancient Eurasian populations. Genome Res. 26(2):151–162.

Rosenberg NA, Huang L, Jewett EM, Szpiech ZA, Jankovic I, Boehnke M. 2010. Genome-wide association studies in diverse populations. Nat Rev Genet. 11(5):356–366.

Russo A, Di Gaetano C, Cugliari G, Matullo G. 2018. Advances in the genetics of hypertension: the effect of rare variants. Int J Mol Sci. 19(3):688.

Segal R. 2002. Islam's black slaves: the other black diaspora. London: Atlantic Books.

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T. 2003. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13(11):2498–2504.

Skoglund P, Thompson JC, Prendergast ME, Mittnik A, Sirak K, Hajdinjak M, Salie T, Rohland N, Mallick S, Peltzer A, et al. 2017. Reconstructing prehistoric African population structure. Cell 171(1):59–71.e21.


Snow RW, Amratia P, Zamani G, Mundia CW, Noor AM, Memish ZA, Al Zahrani MH, Al Jasari A, Fikri M, Atta H. 2013. The malaria transition on the Arabian Peninsula: progress toward a malaria-free region between 1960–2010. Adv Parasitol. 82:205–251.

Soares P, Alshamali F, Pereira JB, Fernandes V, Silva NM, Afonso C, Costa MD, Musilova E, Macaulay V, Richards MB, et al. 2012. The expansion of mtDNA haplogroup L3 within and out of Africa. Mol Biol Evol. 29(3):915–927.

Szpiech ZA, Hernandez RD. 2014. selscan: an efficient multithreaded program to perform EHH-based scans for positive selection. Mol Biol Evol. 31(10):2824–2827.

Tekola-Ayele F, Adeyemo A, Chen G, Hailu E, Aseffa AA-O, Davey GA-O, Newport MJ, Rotimi CA-OX. 2015. Novel genomic signals of recent selection in an Ethiopian population. Eur J Hum Genet. 23(8):1085–1092.

Thompson EE, Kuttab-Boulos H, Witonsky D, Yang L, Roe BA, Di Rienzo A. 2004. CYP3A variation and the evolution of salt-sensitivity variants. Am J Hum Genet. 75(6):1059–1069.

Tishkoff SA, Reed FA, Ranciaro A, Voight BF, Babbitt CC, Silverman JS, Powell K, Mortensen HM, Hirbo JB, Osman M, et al. 2007. Convergent adaptation of human lactase persistence in Africa and Europe. Nat Genet. 39(1):31–40.

Triska P, Soares P, Patin E, Fernandes V, Cerny V, Pereira L. 2015. Extensive admixture and selective pressure across the Sahel Belt. Genome Biol Evol. 7(12):3484–3495.

Uchil PD, Quinlan BD, Chan W-T, Luna JM, Mothes W. 2008. TRIM E3 ligases interfere with early and late stages of the retroviral life cycle. PLoS Pathog. 4(2):e16.

Voight BF, Kudaravalli S, Wen X, Pritchard JK. 2006. A map of recent positive selection in the human genome. PLoS Biol. 4(3):e72.

Vyas DN, Al-Meeri A, Mulligan CJ. 2017. Testing support for the northern and southern dispersal routes out of Africa: an analysis of Levantine and southern Arabian populations. Am J Phys Anthropol. 164(4):736–749.

Vyas DN, Kitchen A, Miro-Herrans AT, Pearson LN, Al-Meeri A, Mulligan CJ. 2016. Bayesian analyses of Yemeni mitochondrial genomes suggest multiple migration events with Africa and Western Eurasia. Am J Phys Anthropol. 159(3):382–393.

Wagh K, Bhatia A, Alexe G, Reddy A, Ravikumar V, Seiler M, Boemo M, Yao M, Cronk L, Naqvi A, et al. 2012. Lactase persistence and lipid pathway selection in the Maasai. PLoS One 7(9):e44751.

Yin T, Cook D, Lawrence M. 2012. ggbio: an R package for extending the grammar of graphics for genomic data. Genome Biol. 13(8):R77.

Yu Y, Ouyang Y, Yao W. 2018. shinyCircos: an R/Shiny application for interactive creation of Circos plot. Bioinformatics 34(7):1229–1231.

Zhao F, McParland S, Kearney F, Du L, Berry DP. 2015. Detection of selection signatures in dairy and beef cattle using high-density genomic information. Genet Sel Evol. 47:49.


