Evolutionary and biomedical insights from the rhesus macaque genome

Post on 26-Apr-2023

2 views 0 download

Transcript of Evolutionary and biomedical insights from the rhesus macaque genome

RESEARCH ARTICLE

Evolutionary and Biomedical Insightsfrom the Rhesus Macaque GenomeRhesus Macaque Genome Sequencing and Analysis Consortium*†

The rhesus macaque (Macaca mulatta) is an abundant primate species that diverged from theancestors of Homo sapiens about 25 million years ago. Because they are genetically andphysiologically similar to humans, rhesus monkeys are the most widely used nonhuman primate inbasic and applied biomedical research. We determined the genome sequence of an Indian-originMacaca mulatta female and compared the data with chimpanzees and humans to reveal thestructure of ancestral primate genomes and to identify evidence for positive selection and lineage-specific expansions and contractions of gene families. A comparison of sequences from individualanimals was used to investigate their underlying genetic diversity. The complete description of themacaque genome blueprint enhances the utility of this animal model for biomedical research andimproves our understanding of the basic biology of the species.

Rhesus macaques (Macaca mulatta) (1)are one of the most frequently en-countered and thoroughly studied of all

nonhuman primates (table S1.1). They have abroad geographic distribution that reaches fromAfghanistan and India across Asia to theChinese shore of the Pacific Ocean. As an OldWorld monkey (superfamily Cercopithecoidea,family Cercopithecidae), this species is closelyrelated to humans and shares a last commonancestor from about 25 million years ago (Mya)(2). The two species often live in close associa-tion, and macaques exhibit complex and in-tensely social behavioral repertoires.

The relationship between humans and ma-caques is even more important because biomedicalresearch has come to depend on these primatesas animal models. Compared with rodents, whichare separated from humans by more than 70million years (2, 3), macaques exhibit greatersimilarity to human physiology, neurobiology,and susceptibility to infectious and metabolicdiseases. Critical progress in biomedicineattributed to macaques includes the identifica-tion of the “rhesus factor” blood groups andadvances in neuroanatomy and neurophysiol-ogy. Most important, their response to infectiousagents related to human pathogens, includingsimian immunodeficiency virus and influenza,has made macaques the preferred model forvaccine development. Lesser-known contribu-tions of these animals include their early use inthe U.S. space program—a rhesus monkey waslaunched into space more than a dozen yearsbefore any chimpanzee.

The cynomolgus macaque (M. fascicularis),pigtailed macaque (M. nemestrina), and Japa-nese macaque (M. fuscata) have all contributed

to research, but the rhesus macaque has beenused most widely. Taxonomists recognize sixM. mulatta subspecies (1), which differ substan-tially in their geographical range, body size, and avariety of morphological, physiological, and be-havioral characteristics. North American researchcolonies include animals representing both Indianand Chinese subspecies, although India ended theexportation of these animals in the 1970s.

With the advent of whole-genome sequenc-ing, a highly accurate human genome sequenceand a draft of the chimpanzee genome havebeen generated and compared. The chimpanzeeshared a common ancestor with humans ap-proximately 6 Mya (4, 5), and the major impactof the chimpanzee genome sequence data hasbeen in their direct comparison with data fromthe human genome. However, the chimpanzeedata have major limitations. First, because thealignable sequence is only 1 to 2% differentfrom that of the human, there is no informative“signal” to distinguish conserved elements fromthe overall high background level of conserva-tion. This is exacerbated by the fact that thechimpanzee genome was an incomplete draft,containing sequence errors that could potentiallymask true divergence. Second, the differencesthat are found between humans and chimpan-zees are difficult to assign as specific to eitherthe chimpanzee or the human. As a result, thechimpanzee analyses have on their own providedrelatively few answers to the fundamental ques-tion of the nature of the specific molecularchanges that make us human.

By contrast, the genome of the rhesus ma-caque has diverged farther from our own, with anaverage human-macaque sequence identity of~93%. Figure 1 shows the inferred common an-cestor for all three species, as well as a commonancestor that predated the human-chimpanzeedivergence. A characteristic that is found inhumans but not in the chimpanzee can be rec-

ognized as a loss in the chimpanzee if it ispresent in the macaque, or it can be recognizedas a gain in the human if it is absent in macaque.In principle, this three-way comparison shouldmake it possible to pinpoint many changes andidentify specific underlying mutational mecha-nisms, which could have been critically impor-tant during the past 25 million years in shapingthe biology of the three primate species.

We examined the basic elements of the rhesusmacaque genome and undertook reconstructionof the major changes in the human-chimpanzee–rhesus macaque (HCR) trio. The regions of thegenome that were duplicated in macaque werethen identified and correlated with other ge-nome features. Individual macaque genes werestudied, and the orthologous genes in the HCRtrio were aligned to reveal evidence for the ac-tion of selection on individual loci. Additionalanimals from other populations were alsosampled by DNA sequencing to study theirgenetic diversity. Throughout, complementarymethods were applied and the different resultscombined in order to represent the most com-plete picture of macaque biology. For a visualrepresentation of some of the insights gainedfrom the genome and more information aboutthe importance of the macaque as a modelorganism, see the poster in this issue (6).

Sequencing the GenomeTo generate a draft genome sequence for therhesus macaque, whole-genome shotgunsequences were assembled. The bulk of thesequencing used DNA from a single M. mulattafemale, whereas DNA from an unrelated malewas used to construct a bacterial artificial chro-mosome (BAC) library to provide BAC endsequences and to aid in selective finishing. We

The Rhesus Macaque Genome

human macaquechimpanzee

25 Mya

~6 Mya

14

43

5

HC Ancestor

HCR Ancestor

Fig. 1. Evolutionary triangulation in the hu-man, chimpanzee and rhesus macaque lineages(lineage-specific breaks), showing a summary ofchromosomal breakpoints on a microscopic scale(Fig. 3) (7). Circled numbers indicate numbers oflineage-specific breaks.

*To whom correspondence should be addressed. Richard A.Gibbs, E-mail: agibbs@bcm.edu†All authors with their contributions and affiliations appearat the end of this paper.

13 APRIL 2007 VOL 316 SCIENCE www.sciencemag.org222

used several whole-genome shotgun librarieswith different insert sizes (~3.0, 10, 35, and180 kb) to generate a total of 18.4 Gb of rawDNA sequence through standard fluorescentSanger sequencing technologies. Initial assembliesto the intermediate scaffold stage were carried outby the three different assembly methods: Atlas–whole-genome shotgun, parallel contig assemblyprogram (PCAP), and the Celera Assembler (7).These were compared by means of more than 200metrics, including gross sequence statistics, agree-ment with finished sequence, utility for genepredictions in the Ensembl pipeline, and accuracyof alignment to the human genome. The threeunpolished assemblies were found to be largelysimilar and of high quality, so all were used incombination with other genome data for thesubsequent assembly and placement of long se-quence segments on the macaque chromosomes(tables S2.1 to S2.4).

To produce an optimal representation of thegenome, the three intermediate assemblies weremerged (Fig. 2). Melding the assemblies in-volved mapping the Atlas–whole-genomeshotgun and PCAP data to the Celera Assem-bler output, which had longer contiguity than theother two data sets at this stage of the process.There was little difference between assemblies atthe sequence contig level, at which robustsequence alignments guide the reconstructions,so we focused our attention instead on contigsthat were joined into scaffolds. Additional pairsof Celera Assembler scaffolds were joined basedon their mapping to the other two macaqueassemblies. Analysis of the output showed thatthis composite assembly was superior to any ofits components (table S2.4).

During assembly, a comparison with thehuman genome sequence [National Center forBiotechnology Information (NCBI) accessioncode bld35] identified a small number (<100)of obvious inconsistencies, such as improperjoins of different chromosomes. These scaf-folds were therefore split at the misassembly

point. The human map was also used to helpplace large merged scaffolds onto the ma-caque chromosomes (8, 9) [the chromosomenumbering of Rogers et al. (8) was used] at thehighest level of the assembly process. Giventhat the human data were only used to splitscaffolds and that de novo macaque assemblieswere always given precedence over the mappingto the human genome in the macaque assemblymerging and chromosome assignment process,the final product should not be regarded as a“humanized assembly.”

The total length of the combined genomeassembly was approximately 2.87 Gb (Table 1).This incorporated ~14.9 Gb of raw sequence,which represents about a 5.2-fold coverage ofthe macaque genome. Comparison with ex-pressed sequence tag (EST) sequence data andapproximately 1.8 Mb of finished sequence (see“Selected sequence finishing,” below) indicatedthat ~98% of the available genome was rep-resented. No misassemblies were identified inthat comparison. Contigs showed an N50 (min-imum length of contigs representing half ofthe total length of the assembly) of >25 kb; theN50 for sequence scaffolds was >24 Mb.GenBank accession codes are available online(table S2.5).

Selected sequence finishing. The rhesusmacaque genome assembly is a draft DNAsequence, and it contains many gaps. A higherdata quality with greater contiguity was desiredat several genomic regions that attracted addi-tional interest. In these cases, individual BACclones were isolated, and data quality was im-proved by sequence “finishing.” Many of theseBACs were in regions of pronounced genomeduplication, whereas others were gene-rich. Allfinished BACs, their gene content, and theirgenome coordinates are listed in table S2.6.

Overview of Genome FeaturesGeneral organization and content. The ma-caque genome is organized into 20 autosomes

and the XY sex chromosomes. With the ex-ception of 48 breakpoints (Fig. 1)—includingthree fusions, one fission, and breakpointsinduced by inversions that are each detectablethrough chromosome staining, by radiation hy-brid mapping, or by comparative linkage map-ping—there is a superficial similarity betweenthe macaque and human chromosomes (8–11).Several chromosomes in the macaque are alsomore acrocentric than their human counter-parts, but many from the two species are dif-ficult to distinguish.

Nucleotide sequences that aligned betweenthe human and rhesus average 93.54% iden-tity. If, however, small insertions and deletionsare included in the calculation, identity is re-duced to 90.76%. Considering regions that aredifficult to align, such as lineage-specific in-terspersed repeat elements, would furtherdecrease the level of computed identity. More-over, evolutionary distances exhibit local fluc-tuations, as in other mammals (3), and lessdivergence was observed in chromosome X(94.26% identity of aligned bases). The GC-content of the rhesus in aligned bases was notnotably lower than that of the human (40.71%versus 40.74%).

Gene content. A human-centric approachwas used to generate new macaque gene sets(table S3.1 and fig. S3.1). These sets include (i)Ensembl (12) gene models based primarily onthe alignment of the human Uniprot and RefSeqresources with the current assembly to definethe overall gene model, followed by the intro-duction of the macaque-specific sequences(mainly as lineage-specific paralogs) in thatframework; (ii) Gnomen (NCBI) models thatinclude the consideration of the available(~50,000) macaque ESTs along with the humanRefSeq; and (iii) Nscan data that includemultiple-species alignments along with cDNAalignments (13). Overall, ~20,000 loci werepredicted by our methods in which at least oneexon was found by two additional predictors.An additional ~5000 loci were each predictedby a single method, but manual inspection of asubset of these loci shows that they are enrichedin gene-prediction errors, mainly due to mis-classification of evidence (e.g., cDNAs fromuntranslated regions that were classified ascontaining protein coding). On average, high-confidence orthologs have 97.5% identity be-tween the human and macaque at both the nu-

Fig. 2. Assembly by three methodsof the rhesus macaque genome.WGS, whole-genome shotgun.BCM-HGSC, Baylor College of Med-icine Human Genome SequencingCenter; WashU-GSC, WashingtonUniversity Genome SequencingCenter; JCVI, J. Craig Venter Insti-tute. QA/QC, quality assurance andquality control.

ATLASBCM-HGSC

P-CAPWashU-GSC

Celera AssemblerJCVI

WGSReads

Polishing and QA/QC

Merging Process

Scaffold Analysis (>200 metrics)

6x4x

Preliminaryassembly

Assembly RheMac2

Table 1. M. mulatta assembly statistics. Totalbases, excluding gaps, number 2,871,189,834.

Contigs Scaffolds

Total number 301,039 122,580N50 size in bp 25,707 24,345,431Number to N50 32,114 36Largest in bp 219,335 98,200,701

www.sciencemag.org SCIENCE VOL 316 13 APRIL 2007 223

SPECIALSECTION

cleotide and amino acid sequence levels. (Thenucleotide and amino acid percentages agreebecause roughly one-third of nucleotide differ-ences within coding regions change an aminoacid.)

Overall repetitive landscape. Repeat ele-ments account for ~50% of the genomes ofall sequenced primates (14) (Table 2). Similarto the human, the rhesus macaque containsabout 320,000 recognizable copies from morethan 100 different families of DNA transposonsand more than half a million recognizable copiesof endogenous retroviruses (ERVs). In general,the DNA transposons show no new lineages,but the ERVs demonstrate a complex phylogenyand many examples of new and expanded fam-ily members, some resulting from horizontaltransmission. In addition, we conservativelyestimate that ~20,000 L1s [a family of longinterspersed elements (LINEs)], and ~110,000Alu elements [a primate-specific family of shortinterspersed elements (SINEs)], were specifical-ly acquired in the Old World monkey lineage.These two retrotransposon families accountedfor most lineage-specific insertions and haveplayed a major role in shaping genomicarchitecture. Among them, rhesus macaque–specific subsets (derived from the L1PA5lineage and AluY) are frequently polymorphicand can be assayed by polymerase chain re-

action (PCR) genotyping analyses for geneticstudies (15).

Determining Ancestral Genome StructureCytogenetically visible rearrangements. Themostnotable genomic differences among the HCR trioare the presence of cytogenetically visible rear-rangements. The human and chimpanzee karyo-types are distinguishable by one chromosomefusion and nine cytogenetically visible pericentricinversions (16); with the use of the macaque asan outgroup, all of these breakpoints (exceptthose induced by two inversions) have now beencharacterized at the DNA sequence level (17).Analysis of genomic sequence confirms that 14breakpoints, corresponding to seven inversions,occurred in the chimpanzee lineage, as indicatedin Fig. 1. (Five of the inversions are summarizedin table S4.1.) The pericentric inversions of hu-man chromosomes 1 and 18 and the fusion cre-ating human chromosome 2 are specific to thehuman. Comparison of the reconstructed human-chimpanzee ancestral genome and the rhesusgenome reveals 43 breakpoints on the micro-scopic scale (Figs. 1 and 3).

Submicroscopic rearrangements. Previousanalyses [reviewed in (14)] have indicated thatprimate genomes harbor more structural dif-ferences than visible by cytogenetic staining.Analysis of these events is complicated by two

issues: the draft state of the genomes and thepresence of extensive segmental duplications.We analyzed these structural rearrangements byusing the distance between orthologous blocksin each species to infer the ancestral genomestructure and determine where rearrangementsoccurred on the phylogenetic tree. We excludedevents smaller than 10 kilobase pairs (kbp),which are mostly due to retroposon insertions,and focused on cytogenetically undetectablebreakpoints induced by insertions, deletions,inversions, and complex rearrangements of sizesbetween 10 kbp and 4 Mbp. Data were com-bined from inversion detection and ancestral re-constructions by the contiguous ancestral regionsmethod (18) and gap detection by the genomictriangulation method (19), which further inte-grates data from genomic sequence comparisons(20) and comparative maps (8, 9, 21). The anal-ysis revealed more than 1000 rearrangement-induced breakpoints through the HCR lineages,of which 820 occur between rhesus and thereconstructed human-chimpanzee ancestor (Fig. 3and fig. S4.1). Each chromosome therefore con-stitutes a complex mosaic, with multiple changesintroduced to orthologous counterparts. Whenrhesus macaque is compared with the human-chimpanzee ancestor, the X chromosome exhib-its three times more rearrangements per mega-base than the autosomes. This is both statisticallysignificant and consistent with a slightly morethan threefold difference observed in the hu-man lineage following the branching off ofchimpanzee (19). Given that a slower rate ofvariability at the single-nucleotide level in the Xchromosome compared with autosomes has beeninterpreted as support for speciation models, thisdifference is worthy of further investigation (22).

Duplications in the Genome andGene Family ExpansionsGenomic Duplications. Segmental duplicationof genomic regions and the genes they contain

Table 2. Summary of repeat content of the rhesus macaque genome compared with the human andchimpanzee genomes. hg18, human genome version 18; panTro2, Pan troglodytes version 2;rheMac2, rhesus macaque version 2; LTR, long terminal repeat; MIR, mammalian interspersedrepeat. SVA is a composite repetitive element named after its main components, SINE, variablenumber of tandem repeats, and Alu; includes SVA precursor elements.

Species DNA LTR/ERVLINE SINE

SVAL1 L2 Alu MIR

hg18 355,000 506,000 572,000 363,000 1,144,000 584,000 3400panTro2 305,000 453,000 558,000 315,000 1,111,000 553,000 4400rheMac2 327,000 432,000 531,000 298,000 1,094,000 539,000 150

Fig. 3. Chromosomalbreakpoints betweenrhesus macaque andthe human-chimpanzeeancestor. Each chromo-some is represented bya white bar (left) and acolored bar (right). Atotal of 820 thin hori-zontal lines in the whitebars represent submicro-scopic breakpoints (10-kbp to 4-Mbp range)detected by genomictriangulation (19), and43 thick black lines inthe colored bars repre-sent breakpoints on amicroscopic scale (>4 Mbp) (7). Numbers above each bar show the total lines within the bar.

13 APRIL 2007 VOL 316 SCIENCE www.sciencemag.org224

The Rhesus Macaque Genome

are well known in mammals and are postulatedto drive fundamental processes, including thebirth of new genes and the subsequent expan-sion of gene families (23). To discover duplica-tions in the macaque genome, we used a batteryof different complementary approaches. Two ofthese, whole-genome assembly comparison (24)and BLASTZ (25) analysis of segmentalduplications, depended directly on the assembly.We used a third method, whole-genome shotgunsequence detection (26), that calculated depth ofcoverage of the raw shotgun sequence reads rela-tive to the assembly. A fourth procedure wascreated on the basis of BAC end sequence readscombined with BACs that were directly mappedbymeans of the pooled genomic indexing method(21). The common interspersed repeat familieswere not considered in any of these analyses.

The first two approaches identified approx-imately 35.0 Mb of a recently duplicated se-quence in the macaque assembly. A further ~15Mb were collapsed in the assembly anddiscovered by whole-genome shotgun sequencedetection (fig. S5.1 and table S5.1). Adjustingfor these collapsed duplications and the over-all assembly coverage, we estimate that ap-proximately 66.7 Mb or 2.3% of the macaquegenome consists of segmental duplication(Fig. 4)—this proportion is substantially lowerthan that of either the human or chimpanzeegenome (5 to 6%) (26, 27).

The pooled genomic indexing and BAC endsequence read methods suggested slightly high-

er levels of overall duplication, on the basis offluorescence in situ hybridization analysis ofrandomly selected large-insert BAC clones (28).However, this estimate was still less than the4.8% recently estimated for the baboon genome(28). Overall, we consider 2.3% to be the lowerbound of duplicated genomic DNA in themacaque genome.

As with the human and chimpanzee, theanalysis of the macaque assembly revealed anenrichment of segmental duplications near gaps,centromeres, and telomeres (14, 29). The studyalso identified segmental duplications that con-tain genes of high biological significance. Forexample, the CCL3L1-CCL4 gene region [forwhich copy-number variation in humans is cor-related with susceptibility to HIV infection (30)],cytochrome P450 (associated with toxicity re-sponse), KRAB-C2H2 zinc finger (a developmentalregulatory transcription factor), olfactory recep-tor (smell), human leukocyte antigen (HLA), andother immune and autoantigen gene families wereall observed in regions of genome duplication.

Expansion of gene families. Two approacheswere used to study gene family structure directlywithin the draft genome sequence: (i) a statisticalapproach, based on a likelihood model of genegain and loss across the mammalian tree (31)and (ii) hybridization of whole genomic DNA tocDNA arrays [a variation of array-based com-parative genomic hybridization (array CGH)] toobserve changes in gene content directly (32).The results are shown in Tables 3 and 4.

The statistical approach revealed that 1358genes were gained by duplication along themacaque lineage. This method simultaneouslyestimates rates of change along individual lin-eages and generates a quantitative assessmentof confidence in rate differences among lin-eages. Iterative modeling revealed higher ratesin primates, relative to other mammals. Therates are similar to those obtained by inde-pendent methods in both humans (33) androdents (3).

We identified 108 gene families, computa-tionally predicted to have changed in sizeamong the primates, evolving at a significantlyhigher rate than the overall primate rates of genegain and loss (all P < 0.0001, Table 3). Morethan 60% of the macaque-specific expansions dis-play evidence of positive selection in their codingsequences, supporting the notion that this ratedisparity may be driven by natural selection.

Gene copy-number estimates by genomichybridization (cDNA array CGH) (32) iden-tified 51 genes (124 cDNAs) with copy-numberincreases in the macaque, relative to the human(Table 4 and table S5.2). Of these array CGH-predicted macaque-specific increases, 33% (17out of 51) were also found by computationalanalysis of gene family gains and losses. Aseparate analysis found that 55% (28 out of 51)are increased in copy number as estimated byBLAST-like Alignment Tool (BLAT)–based (34)predictions from the rheMac2 assembly. In con-trast, when random sets of genes (cDNAs) werechosen for BLAT queries, only 1.45% suggestcopy-number increases (P < 0.0001).

The genome-wide acceleration identified inprimates may be due to an explosion in thenumber of Alu transposable elements in theprimate ancestor, which may have allowed anincrease in the rates of nonallelic homologousrecombination, leading to higher rates of bothduplication and deletion (35). Alternatively, therates of duplicate gene fixation may be due tothe small population size in primates (36)relative to rodents.

Particular expanded gene families. Expan-sion of individual gene families may help toidentify processes that distinguish biologicalfeatures among organisms. One example in hu-mans is the preferentially expressed antigen ofmelanoma (PRAME) gene family that consistsof a single gene on chromosome 22q11.22 and acluster of several dozen genes on chromosome1p36.21. PRAME and PRAME-like genes areactively expressed in cancers but normally man-ifest testis-specific expression and may thus havea role in spermatogenesis. The genomic organi-zation is complicated; the cluster on humanchromosome 1 exhibits copy-number variationin human populations (37, 38) and, together witha similar orthologous cluster on mouse chromo-some 4, apparently arose by translocation notlong before the divergence of primates and

50 100 150 200chr1

50 100 150chr2

50 100 150chr3

50 100 150chr4

50 100 150chr5

50 100 150chr6

50 100 150chr7

50 100chr8

50 100chr9

50chr10

50 100chr11

50 100chr12

50 100chr13

50 100chr14

50 100chr15

50chr16

50chr17

50chr18

50chr19

50chr20

50 100 150chrX

RheMac2 AssemblyWGAC All (> 1kb > 90%)34.97 Mb non redundantIntra: 24.73 Mb nr (blue)Inter: 13.74 Mb nr (red)

Fig. 4. Global pattern of macaque segmental duplications. The statistics are based on all WGAC duplications(> 90%, >1 kb in length), whereas the figure displays only those between 90 and 95% sequence identityand >10 kb in length for simplicity. Red lines indicate interchromosomal (Inter) duplications, blue ticks showintrachromosomal (Intra) events, and purple bars show centromeric, acrocentric, and/or large-gap regions.WGAC, whole-genome assembly comparison. nr, nonredundant.

www.sciencemag.org SCIENCE VOL 316 13 APRIL 2007 225

SPECIALSECTION

rodents, about 85 Mya (39) (Fig. 5 and fig S5.2).After that translocation event, the human andmouse gene clusters expanded independently.Evidence for positive selection has been foundin these genes, and two segmental duplicationspostdating human-chimpanzee divergenceadded about a dozen genes to the human cluster.

To properly resolve evolutionary changes inthe PRAME gene family, we further sequencedsix macaque BAC clones to achieve a higherdata quality, and we assembled them into asingle contig (table S2.6). These eight PRAMEgenes were compared with human and chimpan-zee genes identified from the latest assembliesfor both species. We estimated a phylogeny forall identified genes, designating the mouse genecluster and the human PRAME gene on chro-mosome 22 as outgroups. We then reconciledthis gene tree with the species tree by maximumparsimony. Our reconstruction reveals exten-sive duplication early in primate evolution (Fig.5B, branch a), in recent chimpanzee evolution(Fig. 5B, branch d), and, most notably, in recenthuman evolution (Fig. 5B, branch e). ThePRAME gene cluster appears to have beenmuch less dynamic on the macaque lineage(Fig. 5B, branch b) and in early hominins (thehuman and chimpanzee branch, Fig. 5B,branch c). A large inverted tandem duplicationoccurred on the macaque lineage shortly afterdivergence from the human lineage, but noadditional large-scale rearrangements are evi-dent. The relative quiescence in macaque al-lows us to identify older duplications that aredifficult to discern in the exceedingly complexhuman self-alignments (7).

The inferred PRAME gene tree shows pro-nounced differences in evolutionary rates acrossbranches, as well as some quite long branchesthat suggest bursts of adaptive change. Usingmaximum likelihood methods, we found evi-dence of positive selection on several of thesebranches (Fig. 5A). This positive selection,combined with the highly variable pattern ofgene duplication and expansion, suggests thatthe PRAME gene family has played a key rolein species evolution.

We identified a second segment of extensivegenomic duplications concentrated at the telo-mere of macaque chromosome 9, orthologous toa human locus at 10p15.3 and observed by multi-ple approaches to be distributed throughout themacaque genome. The genes phosphofructokinase-platelet form (PFKP) and DIP2C were ex-panded in this region and yielded the highestarray CGH macaque-to-human ratios in the ge-nome (average log2 ratios of 3.30 and 2.54,respectively). DIP2C is implicated in segmenta-tion patterning, although its relevance to ma-caque evolution is currently obscure. PFKP isimportant in sugar (fructose) metabolism, raisingthe possibility that the pronounced copy-numberexpansion in macaque may be relevant to the

Table 3. Gene families with significant copy-number expansions (P < 0.0001) in the human and theidentical statistic for the rhesus macaque. Gene family ID, identification numbers from Ensembl version 41.Family size, number of gene copies in the current genome assemblies. Gains and losses, number of genesgained and lost since the human’s split with chimpanzee or the macaque’s split with human-chimpanzeelineage. IG, immunoglobulin; IGE, immunoglobulin E; Pre, precursor; MHC, major histocompatibilitycomplex; TCR, T cell receptor; ENV, envelope; ATP, adenosine 5´-triphosphate.

Gene family ID Description Family size Gains Losses

Expanded in humanENSF00000000020 IG heavy chain V region 42 10 0ENSF00000000073 Receptor 56 16 0ENSF00000000233 Peptidyl prolyl cis trans isomerase 38 9 0ENSF00000000312 Histone H2b 28 7 0ENSF00000000597 Golgin subfamily A 49 26 0ENSF00000000664 Ankyrin repeat domain 33 9 0ENSF00000000822 Unknown 15 9 0ENSF00000000841 Tripartite motif 21 7 1ENSF00000000936 Centaurin gamma 15 9 0ENSF00000001036 Cold inducible RNA binding 22 8 0ENSF00000001546 Ubiquitin carboxyl terminal hydrolase 16 13 2ENSF00000001599 Leucine-rich repeat 14 7 0ENSF00000001665 DNA mismatch repair PMS2 12 5 0ENSF00000001738 Unknown 15 7 0ENSF00000001920 40S ribosomal S26 13 7 1ENSF00000001974 Unknown 17 3 0ENSF00000002160 Double homeobox 15 13 0ENSF00000002570 Keratin associated 5 7 2 0ENSF00000003683 Unknown 5 3 0ENSF00000004835 Ambiguous 13 9 0

Expanded in macaqueENSF00000000014 HLA class I 17 12 0ENSF00000000037 HLA class I 16 10 0ENSF00000000070 Keratin type I 65 30 0ENSF00000000077 Histone H3 32 11 0ENSF00000000085 IG kappa chain V region 47 22 2ENSF00000000138 Keratin type II 39 10 0ENSF00000000150 Taste receptor type 2 23 9 0ENSF00000000178 Aldo keto reductase family 1 19 9 0ENSF00000000397 Ral guanine nucleotide dissoc stim. 19 10 1ENSF00000000432 Killer cell IG receptor Pre MHC class I 9 3 0ENSF00000000630 TCR beta chain V region Pre 18 9 0ENSF00000000705 ENV polyprotein 13 11 0ENSF00000000766 60S ribosomal l7A 26 17 1ENSF00000000773 Ribosomal l7 23 12 0ENSF00000000826 60S ribosomal l23A 20 6 0ENSF00000001027 60S ribosomal l17 12 3 0ENSF00000001077 Nucleoplasmin 17 9 0ENSF00000001211 67-kD laminin 18 10 0ENSF00000001235 Nonhistone chromosomal HMG 17 24 12 0ENSF00000001236 60S ribosomal l31 23 11 0ENSF00000001249 60S ribosomal l12 16 8 0ENSF00000001359 USP6 N terminal 14 10 0ENSF00000001460 Prohibitin 7 4 0ENSF00000001671 60S ribosomal l32 10 6 0ENSF00000001861 40S ribosomal S10 9 5 0ENSF00000002239 60S ribosomal l19 8 5 0ENSF00000002279 40S ribosomal S17 8 4 0ENSF00000002476 60S ribosomal l18 7 4 0ENSF00000002633 IGE binding 19 14 0ENSF00000003321 Argininosuccinate synthase 9 6 0ENSF00000003395 10-kD heat shock protein 11 8 0ENSF00000004083 ATP synthase subunit G 4 3 0ENSF00000007347 Unknown 7 3 0

13 APRIL 2007 VOL 316 SCIENCE www.sciencemag.org226

The Rhesus Macaque Genome

high-fruit diet common among macaques. As withother array CGH copy-number estimates, the func-tional status of the additional copies is not known.Six of the individual macaque BACs that mappedto the region revealed related duplicated se-quences on rhesus chromosome 3, which formedfrom the fusion of orthologs of human chromo-somes 7 and 21, suggesting that these genesmay have played a role in this expansion.

Another macaque-specific increase involvesthe 22 HLA-related genes located in the regionorthologous to human chromosome 6p21 (tableS5.4). A previous study found that HLA genecopy number was higher in the macaque than inthe human (40), and our results confirm andextend this finding, demonstrating that themacaque HLA copy number is greater than thatfound for the human as well as all four great apespecies (fig S5.3). This finding also suggeststhat, although the macaque has been extensivelyused to model the human immune response,there may be substantial and previously un-appreciated differences in HLA function be-tween these species. Notably, the copy numberof another immune system–related gene cluster,immunoglobulin lambda-like (IGL) at 22q11.23,is also predicted to be increased in the macaque(table S5.4). Members of the IGL locus encodelight chain subunits that are part of the Pre–Bcell receptor; do not undergo rearrangements;and, when mutated, can result in B cell de-ficiency and agammaglobulinemia. Additionalknown genes predicted by array CGH to havemarkedly increased copy numbers in the ma-caque relative to the human include DHFR,ATP5J2, DNAJC8, ADFP, andMAT2B. Overall,the main characteristics of the set of amplifiedgenes were their diversity and the wide varietyof genomic regions they occupied.

Orthologous RelationshipsThe macaque genome has also allowed for adetailed study of more subtle changes that haveaccumulated within orthologous primate genes.The average human gene differs from itsortholog in the macaque by 12 nonsynonymousand 22 synonymous substitutions, whereas itdiffers from its ortholog in the chimpanzee byfewer than three nonsynonymous and fivesynonymous substitutions. Similarly, 89% ofhuman-macaque orthologs differ at the aminoacid level, as compared with only 71% ofhuman-chimpanzee orthologs. Thus, the chim-panzee and human genomes are in many waystoo similar for characterizing protein-codingevolution in primates, but the added divergenceof the macaque helps substantially in clarifyingthe signatures of natural selection.

General characteristics of orthologous genes.We developed an automatic pipeline to identify10,376 trios of HCR genes to which we couldassign a high confidence of 1:1:1 orthology.For comparison, we also identified 6762 hu-

Table 4. Genes identified as expanded in copy number in the macaque, relative to the human, bythe array CGH method. The leftmost column represents IMAGE cDNA clones that show array CGH–predicted copy number increases in the rhesus macaque relative to the human. The middle twocolumns list corresponding gene names and array CGH log2 macaque-to-human ratios. Therightmost column presents BLAT-predicted copy numbers based on rheMac2 and hg18 genomeassemblies.

IMAGE clone Gene Average log2array CGH ratio

rheMac2/hg18BLAT-predictedcopy numbers

41109/1900937 PFKP 3.30 3/2†454926/1862434 DIP2C 2.54 4/2†1475421/757369 EST 1.74 7/4†50877/110020 EST 1.48 3/4795258/1574131/191978 ATP5J2 1.42 29/10†824545/278888 EST 1.37 0/12457916/322067 DNAJC8 1.37 9/6†504421/435036 ADFP 1.27 6/4†769921/146882 UBE2C 1.17 8/3†155620/154809 IGL 1.14 8/101985794*/1470105 EST 1.14 1/632083/270786 FLJ30436 1.13 2/3306344/773260 MAT2B 1.11 3/2†884480/194908 COX7C/PRO2463 1.09 13/4†244205*/462961/768172/824776*/123971 DHFR 1.05 14/13†72745/1626871* HLA 1.02 1/51493107/1637726 LTB4DH/EST 0.97 3/2†163407/843374 STOM 0.96 2/2258666/428043 PSMB7/EST 0.96 3/2†112498/824894 EST 0.95 0/032231/770984 EST/FLJ12442 0.95 3/2†981713*/953542*/981925* EST 0.95 0/01636233/814459 C9orf23 0.93 4/2†529185/609265 SELK 0.93 6/4†298965/1472754/512003* COX6B 0.93 7/4†322561/240620* EST 0.92 31/22†208656/415195 FLJ20294 0.92 2/2840698/39977 FLJ20254/MAPRE3 0.90 4/2†773287/1635681 NDUFA2 0.89 4/2†756763*/725401* EST 0.85 1/180742/80694 EST 0.85 0/01415672*/1558664* EST 0.84 0/0323806/38029 EST 0.83 2/2595547/997889* EST 0.83 1/1953654*/953643* EST 0.83 0/0783035*/783249* EST 0.83 1/1884272*/1415750* H3F3A 0.83 45/40†322175/210873 EST/PPY2 0.81 1/11569731/1569604 EST 0.79 4/4292982/129431 EST 0.78 1/2112785/361565* RoXaN/GLUD1 0.78 7/4†292452/450327 SMBP 0.78 2/21606275/1534633 Corf129/STOM 0.77 3/2†212847*/1415750* EST 0.76 22/16†664121*/745347* PIG7/EST 0.76 4/2†982122/982113/121546/503715 EST/FLJ14668 0.73 9/6†950688/811603 EST/ATP6V1G1 0.73 4/3†327202/194384 EST/BTF3 0.72 1/1897007/897676* EST 0.71 1/1301388/825470 TOP2A 0.69 6/2†590390/756469* RoXaN 0.62 3/2†*Consistent with computational analysis of gene family gains and losses. †BLAT-based copy-number estimates of rheMac2and hg18 genome assemblies that are consistent with array CGH predictions.

www.sciencemag.org SCIENCE VOL 316 13 APRIL 2007 227

SPECIALSECTION

man, macaque, mouse, and rat quartets; 5641HCR, mouse, and rat quintets; and 5286 HCR,mouse, and dog quintets. Because the humangene models are by far the best characterizedfor primates, we first identified a set of 21,256known human protein-coding genes derivedfrom a union of the RefSeq (41), Vega (42),and University of California–Santa Cruz KnownGenes (43) collections. These genes were thenmapped to synteny-based genome-wide multiplealignments (44, 45) and subjected to a series ofrigorous filters to eliminate spurious annotations,paralogous alignments, genes that have becomepseudogenized in one or more species, and geneswith incompletely conserved exon-intron struc-tures (7). The genes that pass all filters represent1:1:1 orthologs in which aligned protein-codingbases are highly likely to encode proteins in allspecies, with identical reading frames.

Despite the draft quality of the chimpanzeeand macaque assemblies, the majority of humangenes mapped through syntenic alignments tothe chimpanzee (93% of genes) and macaque(89%) genomes (Fig. 6) (7), and most of thesegenes were completely alignable in their codingregions. Fairly large fractions of human genes,however, were discarded because of apparentframe-shift insertions and deletions (indels) ornonconserved exon-intron structures with re-spect to their putative chimpanzee or macaqueorthologs. On the basis of 81 finished BACScovering 294 genes, we estimate that, out of

5526 genes failing the filters for alignmentcompleteness, frame-shift indels, and conservedexon-intron structure, 2138 (39%) were discardedcompletely because of flaws in the macaqueassembly; the remaining 3388 (61%) were dis-carded either because of genuine changes togenes or because of annotation or alignmenterrors (7). Another 2261 genes passed thehuman-macaque filters but failed the human-chimpanzee filters, and a large majority of thesefailures were probably due to flaws in thechimpanzee assembly. Altogether, we estimatethat finished genomes for the macaque andchimpanzee would allow the number of genes inhigh-confidence orthologous trios to be in-creased by at least 23%, to ~12,800 (7). No-tably, our conservative ortholog sets may createa bias against fast-evolving genes and thereforemay lead to underestimates of average levels ofdivergence and the prevalence of positiveselection.

Alignments of the 10,376 orthologous trioswere used to estimate the ratio of the rates ofnonsynonymous and synonymous substitu-tions per gene (denoted w), with continuous-time Markov models of codon evolution andmaximum likelihood methods for parameterestimation (46–48). This yielded a meanestimate of w = 0.247 (median 0.144), closeto the value of 0.23 estimated for human andchimpanzee genes (29). About 9.8% of all genesshow no nonsynonymous changes in the three

species, and 2.8% have w > 1, suggesting thatthey are under positive selection. Consistentwith previous studies (49), certain classes ofgenes exhibit unusually large or small w values,such as those assigned to the gene ontology (50)category “immune response,” which have an wdistribution shifted significantly toward largervalues, and those assigned to the “transcriptionfactor activity” category, which have a distri-bution shifted toward smaller values (fig. S6.1).

Our estimates for w in primates are consid-erably larger than previously reported estimatesfor rodents, which have a median of 0.11 (3),and larger than similar estimates from primate-versus-rodent comparisons (29) (Fig. 7). Tocompare the average rates of evolution ofprotein-coding genes in primates with those inother mammals, we estimated a separate valueof w for each branch of a five-species phylog-eny, pooling data from all 5286 one-to-oneorthologs for these species (fig. S6.2). Weobtained similar estimates of w for the hu-man (w = 0.169) and chimpanzee (w = 0.175)lineages, but substantially smaller estimates forthe branches leading to nonprimate mammals(w = 0.104 to 0.128), suggesting a reduction inpurifying selection in hominins (29). Theestimate of w for the macaque lineage (w =0.124) is substantially smaller than the esti-mates for the human and chimpanzee and iscloser to the estimates for the mouse and dog,perhaps reflecting the larger population size of

Fig. 5. Organization of the PRAME gene cluster in the HCR lineages. (A)Maximum-likelihood phylogeny for PRAME-like genes in the human (H),chimpanzee (P), and rhesus macaque (M) genomes. Colored circles indicateinferred duplication events, partial genes are shown in italics, and branches

showing significant evidence of positive selection are colored orange (Pvalues are shown above orange lines). Scale bar, 0.05 substitutions per site.(B) Another view of the same phylogeny, showing the duplication history inthe context of the species tree (7).

13 APRIL 2007 VOL 316 SCIENCE www.sciencemag.org228

The Rhesus Macaque Genome

macaques compared with the other primates.The estimates for the internal branches betweenthe most recent common ancestors of thehuman and mouse and of the human andmacaque, as well as the most recent commonancestors of the human and macaque and of thehuman and chimpanzee, are nearly equal to themacaque estimate. This suggests that protein-coding sequence evolution in macaques mayhave occurred at a typical primate rate, whereasit is the elevated rates in hominins that may beanomalous.

When primate and rodent w of individualgenes were compared, primate orthologs werefound to be evolving more rapidly by a 3:2ratio. This asymmetry was also evident amonggenes showing substantial differences in primatew (wp), on the basis of human-macaque align-ments, and rodent w (wr), deduced from mouse-rat alignments. According to a strict Bonferronicorrection for multiple testing, 22 genes showedstatistically significant wp > wr, whereas onlythree genes showed wr > wp (McNemar P <0.001). If multiple testing criteria are relaxed,the bias toward larger wp is more notable (144versus 8; tables S6.1 and S6.2). Cases of wp >wr generally reflect an increase in wp, whereascases of wr > wp result both from an increasein wr and a decrease in wp. The genes showingstatistically significant wp > wr are enrichedfor functions in sensory perception of smelland taste as well as for regulation of tran-scription (7).

Positive selection. Taking advantage of theadditional phylogenetic information provided bythe macaque genome, we performed a genome-wide scan for positive selection, using our10,376 HCR orthologous trios and likelihoodratio tests (LRTs) (51–53). Four differentLRTs were performed: test TA, for positiveselection across all branches of the phylogeny,and tests TH, TC, and TM for positive se-lection on the individual branches to human,

chimpanzee, and macaque, respectively. Ourmethods use an unrooted tree and cannotdistinguish between the branches to macaqueand the human-chimpanzee ancestor; for con-venience, we refer to the combined branch asthe macaque branch. In all cases, variationamong sites in w was allowed and, to reducethe number of parameters to estimate per gene,the branch-length proportions and transition-transversion ratio (k) were estimated by pool-ing data from genes of similar G+C content(7). Test TA identified 67 genes, and tests TH,TC, and TM identified 2, 14, and 131 genes(false-discovery rate (FDR) < 0.1 in all cases),respectively. The large number of genes identi-fied for the macaque branch is partly a reflectionof its greater length compared with the chim-panzee and human branches (7).

These four sets of genes overlap consider-ably, particularly among their highest scoringpredictions (Table 5 and table S6.3). Theirunion contains 178 genes, or 1.7% of all genestested. The two genes identified by TH—thoseencoding the leukocyte immunoglobulin-likereceptor LILRB1 and hypothetical proteinLOC399947—were also identified by TA,

and the gene for LILRB1 was identified byTC as well, indicating evidence of positiveselection on multiple branches. However, 12out of 14 genes identified by TC were notidentified by the other tests, indicating possi-ble lineage-specific selection in the chimpan-zee. These include sex comb on midleg-like1 (SCML1) and protamine 1 (PRM1), whichwere previously identified in an analysis thatcould not distinguish between selection on thehuman and chimpanzee branches (52). Inaddition, 99 genes were identified by TM butnot the other tests. These genes may be underlineage-specific selection in the macaque and/ormay have experienced positive selection on thebranch leading to the most recent commonancestor of the human and chimpanzee.

The genes identified by our tests for positiveselection are enriched for several categoriesfrom the gene ontology (50) and ProteinAnalysis Through Evolutionary Relationships(PANTHER) (54) classification systems that aresimilar to those observed in previous genome-wide scans for positive selection (52, 53). Theseinclude defense response, immune response, Tcell–mediated immunity, signal transduction,

Fig. 6. Numbers of hu-man genes passing suc-cessive filters in theorthology analysis pipe-line. Genes are requiredto fall in regions of large-scale synteny betweengenomes, to have com-pletely aligned codingregions, not to haveframe-shift indels or al-tered gene structures,and not to show signsof recent duplication.

21256

0

5000

10000

15000

20000

humanchimp

humanmacaque

humanmouse

humanrat

humandog

humanchimp

macaque

humanmacaquemouse

rat

humanchimp

macaquemouse

rat

humanchimp

macaquemouse

dog

filter 1: lack of synteny

filter 2: incomplete alignments

filter 3: frame-shift indels

filter 4: changes in exon-intron structure

filter 5: recent duplication

remaining 1:1 orthologs

Fig. 7. Distributions of w in pri-mates versus rodents. Histogram ofestimates of w = dN/dS for human,chimpanzee, and macaque versusestimates for mouse and rat in5641 orthologous quintets, showinga pronounced shift toward largervalues in primates (P = 2.2 × 10−16,Mann Whitney test). Genes withdN = 0 or dS = 0 are counted inthe relative frequencies but notshown.

Freq

uenc

y

ω

0.09

0.08

0.07

0.06

0.05

0.04

0.03

0.02

0.01

00.001 0.01 0.1 1 10

RodentsPrimates

www.sciencemag.org SCIENCE VOL 316 13 APRIL 2007 229

SPECIALSECTION

and cell adhesion (tables S6.4 to S6.7). Amongthe genes in these categories are severalimmunoglobulin-like genes, including thosethat encode the leukocyte-associated inhib-itory receptors LILRB1 and LAIR1 (locatedin a cluster on chromosome 19), the T cellsurface glycoprotein CD3 epsilon chain pre-cursor CD3E, and the intercellular adhesionmolecule 1 precursor ICAM1. Other identi-fied genes associated with cell adhesion and/or signal transduction include those that en-code DSG1, a calcium-binding transmembranecomponent of desmosomes, and the trans-membrane protein TSPAN8 (which has gainedan exon by duplication in the macaque ge-nome). Genes encoding membrane proteins ingeneral are strongly overrepresented; otherexamples include the genes that encode con-nexin 40.1, active in cell communication, andOPN1SW, the gene encoding blue-sensitiveopsin.

In addition, we observed strong enrichmentsfor new categories such as iron ion binding [e.g.,the beta globin (HBB), lactotransferrin (LTF), andcytochrome B-245 heavy chain genes (CYBB)]and oxidoreductase activity (e.g., KRTAP5-8and KRTAP5-4, which encode keratin-associatedproteins, and NDUFS5, which encodes a sub-unit of the nicotinamide adenine dinucleotideubiquinone oxidoreductase). Two keratin genes,which are important for hair-shaft formation,

are present among the top-scoring genes;these genes could conceivably have comeunder positive selection as a result of mateselection or climate change. Genes classifiedas part of the extracellular region, which in-clude the keratin genes, are in general over-represented. Many of the identified genesfrom this category encode secreted proteins,such as the interferon alpha 8 precursor IFNA8,which exhibits antiviral activity; the interleukin8 precursor IL8, a mediator of inflammatoryresponse; and CRISP1, which is expressed inthe epididymis and plays a role at fertilization insperm-egg fusion.

We found only weak enrichments forgenes involved in apoptosis and spermato-genesis (52), but we did see a significant ex-cess of high likelihood ratios among genesinvolved in fertilization. Other categories thatshow an excess of high likelihood ratios butthat are not enriched for genes identified byour tests include blood coagulation, responseto wounding, and related categories; epider-mis morphogenesis; KRAB-box transcriptionfactor; and olfactory receptor activity (tablesS6.6 and S6.7). Their elevated likelihoodratios may reflect either weak positive selec-tion or relaxation of constraint.

The inclusion of the macaque genome sub-stantially improves statistical power to detectpositive selection in primates, compared with

previous scans that used only the human andchimpanzee genomes (29, 52). By examiningabout 8000 human-chimpanzee alignments witha similar LRT, Nielsen et al. (52) were able toidentify only 35 genes with nominal P < 0.05,and when considering multiple comparisons,they were able to establish only that a 5% falsediscovery rate set was nonempty. By contrast,the use of the macaque genome allows theidentification of 15 genes under positive selec-tion in hominins and an additional 163 underselection on one or more other branches of thephylogeny, with FDR < 0.1. We estimate thatincluding the macaque genome makes test TAabout three times as powerful. However, includ-ing macaque rather than mouse (53) as anoutgroup improves the power of test TH onlymarginally (7).

The genes identified by the LRTs are gen-erally randomly distributed in the genome, andno significant clustering was observed whentested (P = 0.24), although small clusters werefound on human chromosomes 11 and 19 (7).Chromosome 11, with 10 genes identified bytest TA, has more than twice the expectednumber of genes under positive selection, butthis enrichment is not significant after correctingfor multiple comparisons [P = 0.10, Fisher’sexact test and Holm correction (7)]. However, asignificant enrichment was observed for genesoverlapping segmental duplications that oc-

Table 5. Selected genes from top 40 showing evidence of positive selection inprimates. Accession, the number of the reference transcript for each gene (human).Chr, human chromosome on which reference gene resides. P value, nominal P

value for test TA (7). Genes shown have FDR < 0.04. Test, the test (other than testTA) that detected the given gene. The Dup column has a checkmark if a geneoverlaps a segmental duplication preceding the human/macaque divergence.

Accession Gene name Chr Description P value Test Dup

AB126077 KRTAP5-8 11 Keratin-associated protein 5-8 6.20 × 10–16 TM ✓NM_006669 LILRB1 19 Leukocyte immunoglobulin-like receptor 7.20 × 10–14 TH, TC ✓NM_001942 DSG1 18 Desmoglein 1 preproprotein 1.10 × 10–10 –NM_173523 MAGEB6 X Melanoma antigen family B, 6 5.30 × 10–8 TC ✓NM_054032 MRGPRX4 11 G protein–coupled receptor MRGX4 5.60 × 10–8 TM ✓NM_000397 CYBB X Cytochrome b-245, beta polypeptide 1.50 × 10–7 TMNM_001911 CTSG 14 Cathepsin G preproprotein 1.50 × 10–7 TMNM_000735 CGA 6 Glycoprotein hormones, alpha polypeptide 1.20 × 10–6 TMNM_001012709 KRTAP5-4 11 Keratin-associated protein 5-4 2.70 × 10–6 TM ✓NM_000201 ICAM1 19 Intercellular adhesion molecule 1 precursor 2.70 × 10–6 TMNM_001131 CRISP1 6 Acidic epididymal glycoprotein-like 1 isoform 1 1.60 × 10–5 TMNM_002287 LAIR1 19 Leukocyte-associated immunoglobulin-like 3.10 × 10–5 TM ✓NM_153368 CX40.1 10 Connexin40.1 4.90 × 10–5 –NM_018643 TREM1 6 Triggering receptor expressed on myeloid cells 6.30 × 10–5 TMNM_000300 PLA2G2A 1 Phospholipase A2, group IIA 1.30 × 10–4 –BC020840 TCRA 14 T cell receptor alpha chain C region 1.50 × 10–4 –NM_000733 CD3E 11 CD3E antigen, epsilon polypeptide 1.50 × 10–4 TMNM_001014975 CFH 1 Complement factor H isoform b precursor 1.50 × 10–4 –NM_001423 EMP1 12 Epithelial membrane protein 1 1.50 × 10–4 TMNM_001424 EMP2 16 Epithelial membrane protein 2 1.50 × 10–4 TMNM_002170 IFNA8 9 Interferon, alpha 8 1.50 × 10–4 –NM_030766 BCL2L14 12 BCL2-like 14 isoform 2 1.50 × 10–4 –NM_006464 TGOLN2 2 Trans-golgi network protein 2 1.80 × 10–4 TMNM_014317 PDSS1 10 Prenyl diphosphate synthase, subunit 1 1.80 × 10–4 –NM_000518 HBB 11 Beta globin 2.00 × 10–4 TM

13 APRIL 2007 VOL 316 SCIENCE www.sciencemag.org230

The Rhesus Macaque Genome

curred before the human-macaque divergence(P = 0.006, Fisher’s exact test), suggesting anincreased likelihood of adaptive evolutionfollowing gene duplication. Four of the top fivegenes identified by test TA overlap segmentalduplications that predate the human-macaquedivergence (Table 5).

Genetic Variation in MacaquesThe use of rhesus macaques as animal modelsof human physiology can be greatly enhancedby an improved understanding of their under-lying genetic variation. To explore rhesus ge-netic diversity and to create resources for furthergenetic studies, we generated a total of 26.2 Mbof whole-genome shotgun sequence from 16unrelated individuals (eight of Chinese originand eight of Indian origin, table S7.1). We nextidentified 26,479 single-base differences [puta-tive single-nucleotide polymorphisms (SNPs)]through comparison with the reference genome.Overall, we found approximately one SNP perkilobase, which is on average close to that foundin similar human studies. There was a surprisingdifference of 50% in overall diversity betweenthe autosomes and the X chromosome (Fig. 8A);we expected a value of 75%. This expectationwas based on differences in effective chro-mosome population sizes, given that femaleshave two X chromosomes and males carryonly one. The reduction in diversity could bedue to recent selective sweeps of positivelyselected recessive mutations on the X chromo-some (55).

We also found that the frequency of thewhole-genome shotgun SNPs differed substan-tially among the animals from the different pop-ulations (0.95/kb in Indian rhesus and 1.06/kbin Chinese rhesus), and there was suggestivevariation in SNP density within their subpopu-lations (SD = 0.0275/kb for Chinese macaques;SD = 0.0527/kb for Indian macaques). Togetherwith complementary data from PCR analysis ofpolymorphic L1 and Alu element insertions (figs.S7.1 and S7.2) that showed population sub-

structure, this prompted additional experimentsin which 48 animals from the two populationswere surveyed by PCR-direct DNA sequencing.Details and most conclusions from that studyhave been reported by Hernandez et al. (56),including a demonstration that >67% of SNPsdiscovered by direct sequencing are private toeach subpopulation. The strong population dif-ferentiation is reflected in fixation index (FST)values (a measure of population differentiation)and a marked difference in Watterson’s (57) es-timate of the population mutation rate betweenthe two groups. Here, we observed that the pop-ulation differences are also reflected in differen-tial distribution of Tajima’s D statistic and inlinkage disequilibrium across sampled regions(Fig. 8, B and C). Each of these statistics furtherreflects the possibilities of sweeps of naturalselection or major differences in populationhistories that must be factored into ongoinggenetic studies. These initial insights into theunderlying patterns of variation within individ-ual animals will therefore provide the basis forfuture genetic analyses. In addition to their utilityfor identification of individual animals, the SNPmarkers will be invaluable for larger-scale pop-ulation studies.

Male mutation bias. A comparison of human-rhesus substitution rates (calculated at interspersedrepetitive elements) between the X chromosomeand the autosomes yielded an estimate of themale-to-female mutation rate ratio (a) of 2.87(95% CI = 2.37 to 3.81; table S7.2). This valueis lower than a = 6 estimated for the human andchimpanzee (58) but higher than a = 2 esti-mated for the mouse and rat (3, 59). Thus, thisargues against a uniform magnitude of malemutation bias in mammals (5) and supports acorrelation between male mutation bias andgeneration time (60, 61).

Human Disease Orthologs in the MacaqueWhile the general morphological and phys-iological similarities between humans and ma-caques greatly enhance the utility of the latter as

a model organism, specific differences in theirunderlying coding sequences can also providebiological insights. By comparing human dis-ease genes with their macaque equivalents, weidentified numerous instances in which the alleleobserved in the macaque corresponds to thedisease allele in the human. These occurrencessuggest that the human disease variants couldbe either persistent (i.e., ancestral) or recurringsequences that represent the recapitulation ofancestral states that may once have been pro-tective, but which now result in adverse conse-quences for human health (62).

To identify the ancestral disease-associatedalleles in human, we screened the macaque andchimpanzee assemblies for the presence of anyof the 64,251 different disease-causing ordisease-associated mutations collected in theHuman Gene Mutation Database (63, 64). Atotal of 229 substitutions were identified forwhich the amino acid considered to be mutantin human corresponded to the wild-type aminoacid present in macaque, chimpanzee, and/or areconstructed ancestral genome (Table 6) (65)(see table S8.1 for a full list).

One surprising result of the analysis wasthe identification of several human loci that,when mutated, give rise to profound clinicalphenotypes, including severe mental retarda-tion. For example, the macaque data revealeddeleterious alleles in the ornithine transcarba-mylase (OTC) and phenylalanine hydroxylase(PAH) genes, which are associated in humanwith OTC deficiency and phenylketonuria. Inhumans, these mutations greatly perturb thenormal serum amino acid levels. Direct exam-ination of macaque blood revealed lowerconcentrations of cystine and cysteine than inthe human and slightly higher concentrations ofglycine than in the human, but no increase inphenylalanine or ammonia, which might havebeen a predicted result of these changes (tablesS8.2 and S8.3). Although the effect of theobserved alleles might be greatly influenced bycompensatory mutations (66) or other environ-

B CA

2.4 1.6 0.8 0 0.8 1.6 2.4

ChineseIndian

Tajima's D

#am

plic

ons

010

2030

4050

60

Autosome X

ChineseIndian

SN

Ps

per

kb

0.0

0.5

1.0

1.5

2.0

2 3 4 5 6 7 8 9 10 11 12

ChineseIndian

haplotypes per block

bloc

ks

05

1015

2025

3035

Fig. 8. SNP within rhesus macaques. (A) SNP densities per kilobase foreight Chinese (blue) and eight Indian (red) individuals in autosomes andthe X chromosome. Error bars indicate standard error with variancecalculated across individual-chromosome replicates. (B) Distribution of

Tajima’s D statistic across 166 amplicons for each population (n = 38 forIndian and n = 9 for Chinese individuals). (C) The distribution of thenumber of haplotypes per haplotype block (determined using the four-gamete test) across five regions.

www.sciencemag.org SCIENCE VOL 316 13 APRIL 2007 231

SPECIALSECTION

mental factors, it remains a possibility that thebasic metabolic machinery of the macaque mayexhibit functionally important differences withrespect to our own (Fig. 9).

Ancestral mutations were also identified inthe N-alpha-acetylglucosaminidase (NAGLU)gene that gives rise to mucopolysaccharidoses(Sanfillipo syndrome), which is also charac-terized by profound mental retardation. Theiroccurrence invites further investigation of thecontribution of this and related genes to thephenotypic differences between macaques andhumans, and the potential for further explo-ration of these monkeys as models for thisdisorder.

We also identified a human mutation asso-ciated with Stargardt disease and maculardystrophy that matches an ancestral allele byreplacing lysine with glutamine at position 223of the human ABCA4 protein (Fig. 9). Umedaet al. (67) reported the presence of the glutaminein a cynomolgus monkey, and all other euthe-rian mammals as well as the predicted boreoeu-therian ancestral sequence have glutamine at thisposition. Furthermore, glutamine is present atthis residue in Xenopus, thereby implying con-servation through some 300 million years ofvertebrate evolution. Thus, it may be inferredthat the ancestral glutamine has been replacedby lysine in humans. Similarly, one CFTR mu-tation [Phe87→Leu87 (Phe87Leu)] is present notonly in most mammals (Fig. 9) but also inFugu, also implying extensive conservationthrough vertebrate evolution.

Impact of a Genomic Sequence onBiological StudiesIn addition to its impact on comparative andgenetic studies, the genome sequence reportedhere heralds a new era in laboratory studies ofmacaque biology. The full potential for moreprecise definition of this animal model and itsgene content is not yet realized, but the value ofthe new sequence in guiding DNA microarraysfor studying macaque gene expression has al-ready become clear (68). Previously, human ormacaque EST-based arrays had been used forexpression studies (69). The most recentlyreleased microarray now adds probes designedby alignment of the 3′ untranslated regions of23,000 human RefSeq genes to the sequencesfrom the initial macaque genome release (Janu-ary 2005, Mmul0.1, approximate genome cov-erage of 3.5-fold). The vast majority of theprobes on this array (98.5%) now match thecurrent macaque genome release with highconfidence and represent 18,690 unique genomicloci. These provide a representation of recog-nized functional pathways with an enhancementabout three times that of the previous data, andoverall more uniform and robust hybridizationsignals compared with those of previous micro-arrays (69) (tables S9.1 to S9.3).

The power of global transcriptional profil-ing with advanced macaque-specific reagentshas been demonstrated in studies of virulenceand pathogenicity of influenza from historicpandemic strains, as well as from emergingagents of zoonotic origin. We infected macaqueswith the human influenza strain A/Texas/36/9(70) and compared the expression changesobserved in lung tissues to those seen in wholeblood during the course of infection. Figure 10shows a differential time course of expressionbetween interferon-induced genes and genes inthe inflammation pathway, in different tissues(table S9.4). The increased expression in lungtissue shortly after infection reflects the earlyinnate response, whereas genes associatedwith the reemergence of the inflammationpattern at day 7 implicate a transition to anadaptive immune response. These kinds ofstudies will be crucial for elucidating all of thetransitions from innate to adaptive immune re-sponses and are fully enabled by the macaque-specific microarrays developed from the genomesequences.

We expect many more immediate examplesof the impact of other tools developed from thefinished macaque genome. For example, therequirement for improvements in PCR-basedmethods is shown by a recent report on thelarge-scale cloning of terminal exons for ma-caque genes, in which the use of human primerswas successful, on average, in 67% of cases(71). Only a native sequence can allow suffi-cient precision for these types of highly specificassays. A similar increase of activity in studiesof the macaque proteome can be predicted,given that early efforts in macaque proteomics

have had to rely on human reference sequencesfor analyzing liquid chromatography and tan-dem mass spectrometry data (70).

DiscussionThe draft genomic sequence reported here hasalready moved the macaque from a model thathas been much studied at the level of physiol-ogy, behavior, and ecology to a whole-organismsystem that can be interrogated at the level ofthe single DNA base. This transformation isevident in the literature as well as in this specialsection (15, 19, 57, 72).

Additional general conclusions emergedfrom this study. First, the data make it con-ceivable to define completely all of the opera-tional components of the pathways underlyingthe individual biological systems that togetherconstitute the functioning adult macaque. Forexample, a complete description of all the dif-ferent macaque immune function componentswill enable an even more thoughtful use ofrhesus macaques in areas such as AIDS researchand for vaccine production.

Second, we were struck by the high valueof adding regions of genome finishing to thedraft sequence for the comparative analysesof genes and duplicated structures. This pro-vides an argument for future finished primategenomes.

Third, the data now provide new oppor-tunities to explore the basic biology of thishighly successful species. Rhesus macaquesretain a broad geographic distribution withreasonably healthy population numbers andwidely studied ecology and ethology. The ge-netic resources generated in this study will

Table 6. Examples of human mutations that cause inherited disease and match an ancestral ornonhuman primate state. Chr:start-stop shows the address in the March 2006 human assembly.Name is the name used by the Human Gene Mutation Database (64). The notation “N>A:CHMT”means that N is the consensus human amino acid, A is the disease-associated form, C is in thecurrent chimp assembly, H is in the inferred human-chimp ancestor, M is in rhesus, and T is in theinferred human-rhesus ancestor (the mouse and dog were used as outgroup species) (73).

Chr:start-stop Strand Name ReplacementN>A:CHMT Gene Disease

chr1:94270150-94270152 - CM014300 R>Q:RRQR ABCA4 Stargardt diseasechr1:94316821-94316823 - CM015072 H>R:RRRR ABCA4 Stargardt diseasechr1:94337037-94337039 - CM042258 K>Q:QQQQ ABCA4 Stargardt diseasechr6:26201158-26201160 + HM030028 V>A:VVAA HFE Hemochromatosischr7:116936418-116936420 + CM940237 F>L:FFLL CFTR Cystic fibrosischr7:117054872-117054874 + CM941984 K>R:KKRK CFTR Cystic fibrosischr12:101761685-101761687 - CM962547 Y>H:YYHY PAH Phenylketonuriachr12:101784521-101784523 - CM941128 I>T:IITI PAH Phenylketonuriachr13:51413354-51413356 - CM044579 V>A:AAAA ATP7B Wilson diseasechr13:112843266-112843268 + CM021094 D>E:DDED F10 Factor X deficiencychr17:37948991-37948993 + CM040465 R>Q:RRQQ NAGLU Sanfilippo syndrome Bchr19:43656115-43656117 + CM064230 S>G:GGGG RYR1 Malignant hyperthermiachrX:38111528-38111530 + CM941115 R>H:RRHH OTC Ornithine hyperammonemiachrX:38125613-38125615 + CM961052 T>M:MTTT OTC Ornithine hyperammonemiachrX:138458220-138458222 + CM045148 E>K:EEKK F9 Hemophilia B

13 APRIL 2007 VOL 316 SCIENCE www.sciencemag.org232

The Rhesus Macaque Genome

undoubtedly form the basis of many analysesof population variability and inter-populationdiversity.

Finally, the genomic rearrangements, dupli-cations, gene-specific expansions, and measure-ments of the impact of natural selection presented

here have revealed the rich and heterogeneousgenomic changes that have occurred during theevolution of the human, chimpanzee, and ma-caque. The marked diversity of the types ofchange that have occurred demonstrate a majorfeature of primate evolution: The aggregation ofchanges that we see, even in closely relatedspecies, does not reflect smooth, progressive,and orderly genomic divergence. Models ofabrupt or punctuated evolution already acknowl-edge that smooth and continuous change isdifficult to achieve on an evolutionary timescale, but this study provides a notable exampleof the operation of this principle in our closerelatives.

References and Notes1. C. Groves, Primate Taxonomy (Smithsonian Institution

Press, Washington, DC, 2001).2. S. Kumar, S. B. Hedges, Nature 392, 917 (1998).3. R. A. Gibbs et al., Nature 428, 493 (2004).4. F. C. Chen, W. H. Li, Am. J. Hum. Genet. 68, 444

(2001).5. N. Patterson, D. J. Richter, S. Gnerre, E. S. Lander, D. Reich,

Nature 441, 1103 (2006).6. L. M. Zahn, B. R. Jasny, Eds., poster from the special issue

on the Macaque Genome, Science 316, following p. 246(13 April 2007); interactive online (www.sciencemag.org/sciext/macaqueposter/).

7. Materials, methods, and additional discussion areavailable on Science Online.

8. J. Rogers et al., Genomics 87, 30 (2006).9. W. J. Murphy et al., Genomics 86, 383 (2005).

10. B. Dutrillaux, Hum. Genet. 48, 251 (1979).11. J. Wienberg, R. Stanyon, A. Jauch, T. Cremer,

Chromosoma 101, 265 (1992).12. T. J. Hubbard et al., Nucleic Acids Res. 35, D610

(2007).13. M. J. van Baren, M. R. Brent, Genome Res. 16, 678

(2006).14. International Human Genome Sequencing Consortium,

Nature 409, 860 (2001).15. K. Han et al., Science 316, 238 (2007).16. J. J. Yunis, O. Prakash, Science 215, 1525 (1982).17. H. Kehrer-Sawatzki, D. N. Cooper, Hum. Genet. 120, 759

(2007).18. J. Ma et al., Genome Res. 16, 1557 (2006).19. R. A. Harris et al., Science 316, 235 (2007).20. K. J. Kalafus, A. R. Jackson, A. Milosavljevic, Genome Res.

14, 672 (2004).21. A. Milosavljevic et al., Genome Res. 15, 292 (2005).22. N. H. Barton, Curr. Biol. 16, R647 (2006).23. J. A. Bailey, E. E. Eichler, Nat. Rev. Genet. 7, 552

(2006).24. J. A. Bailey, A. M. Yavor, H. F. Massa, B. J. Trask,

E. E. Eichler, Genome Res. 11, 1005 (2001).25. S. Schwartz et al., Genome Res. 13, 103 (2003).26. J. A. Bailey et al., Science 297, 1003 (2002).27. Z. Cheng et al., Nature 437, 88 (2005).28. X. She et al., Genome Res. 16, 576 (2006).29. The Chimpanzee Sequencing and Analysis Consortium,

Nature 437, 69 (2005).30. E. Gonzalez et al., Science 307, 1434 (2005).31. M. W. Hahn, B. T. De, J. E. Stajich, C. Nguyen, N. Cristianini,

Genome Res. 15, 1153 (2005).32. A. Fortna et al., PLoS Biol. 2, E207 (2004).33. M. Lynch, J. S. Conery, J. Struct. Funct. Genomics 3, 35

(2003).34. W. J. Kent, Genome Res. 12, 656 (2002).35. J. A. Bailey, E. E. Eichler, Cold Spring Harb. Symp. Quant.

Biol. 68, 115 (2003).36. M. Lynch, M. O’Hely, B. Walsh, A. Force, Genetics 159,

1789 (2001).37. A. J. Iafrate et al., Nat. Genet. 36, 949 (2004).

Fig. 9. Ancestral diseasemutations. Examples ofhuman mutations thatmatch the sequences ofchimp and/or macaqueare shown. (A) Genes inwhich the ancestral alleleis now the disease-associated allele in hu-mans. (B) An instancein which the mutantallele in humans is thenormal allele in ma-caque. The amino acidsequences predicted forthe boreoeutherian an-cestor (65) are given onthe top row of eachalignment block. Iden-

tities are shown as dots and differences are given asletters (73). The position of the mutation in humans isboxed in orange, and the box extends through the rele-vant comparisons.

Fig. 10. Application of rhesus-specific microarrays. A microarray based on the rhesus macaquedraft genome was used to analyze gene expression in a macaque model of human influenzainfection. Gray bars measure an overall response for indicated functional categories, based oncorresponding heat maps, and reveal a significant rebound in expression at day 7 for genesassociated with the inflammatory response, when compared to interferon induction. Red, increasedexpression; green, reduced expression. Details are given in (7, 70).

www.sciencemag.org SCIENCE VOL 316 13 APRIL 2007 233

SPECIALSECTION

38. J. Sebat et al., Science 305, 525 (2004).39. Z. Birtle, L. Goodstadt, C. Ponting, BMC Genomics 6, 120

(2005).40. R. Daza-Vamenta, G. Glusman, L. Rowen, B. Guthrie,

D. E. Geraghty, Genome Res. 14, 1501 (2004).41. K. D. Pruitt, T. Tatusova, D. R. Maglott, Nucleic Acids Res.

35, D61 (2007).42. J. L. Ashurst et al., Nucleic Acids Res. 33, D459 (2005).43. F. Hsu et al., Bioinformatics 22, 1036 (2006).44. M. Blanchette et al., Genome Res. 14, 708 (2004).45. W. J. Kent, R. Baertsch, A. Hinrichs, W. Miller, D. Haussler,

Proc. Natl. Acad. Sci. U.S.A. 100, 11484 (2003).46. N. Goldman, Z. Yang, Mol. Biol. Evol. 11, 725 (1994).47. R. Nielsen, Z. Yang, Genetics 148, 929 (1998).48. Z. Yang, Comput. Appl. Biosci. 13, 555 (1997).49. D. Graur, L. Li, Fundamentals of Molecular Evolution

(Sinauer Associates, Tel Aviv, Israel, ed. 2, 2000).50. M. Ashburner et al., Nat. Genet. 25, 25 (2000).51. Z. Yang, R. Nielsen, Mol. Biol. Evol. 19, 908 (2002).52. R. Nielsen et al., PLoS Biol. 3, e170 (2005).53. A. G. Clark et al., Science 302, 1960 (2003).54. H. Mi et al., Nucleic Acids Res. 33, D284 (2005).55. A. J. Betancourt, Y. Kim, H. A. Orr, Genetics 168, 2261

(2004).56. R. D. Hernandez et al., Science 316, XXXX (2007).57. G. A. Watterson, Theor. Popul. Biol. 7, 256 (1975).58. J. Taylor, S. Tyekucheva, M. Zody, F. Chiaromonte,

K. D. Makova, Mol. Biol. Evol. 23, 565 (2006).59. K. D. Makova, S. Yang, F. Chiaromonte, Genome Res. 14,

567 (2004).60. A. Bartosch-Harlid, S. Berlin, N. G. Smith, A. P. Moller,

H. Ellegren, Evolution Int. J. Org. Evolution 57, 2398 (2003).61. M. P. Goetting-Minesky, K. D. Makova, J. Mol. Evol. 63,

537 (2006).62. J. V. Neel, Am. J. Hum. Genet. 14, 353 (1962).63. P. D. Stenson et al., Hum. Mutat. 21, 577 (2003).64. Human Gene Mutation Database at the Institute of

Medical Genetics in Cardiff (www.hgmd.org; October2006 release).

65. M. Blanchette, E. D. Green, W. Miller, D. Haussler,Genome Res. 14, 2412 (2004).

66. L. Gao, J. Zhang, Trends Genet. 19, 678 (2003).67. S. Umeda et al., Invest. Ophthalmol. Vis. Sci. 46, 683 (2005).68. J. D. Chismar et al., Biotechniques 33, 516 (2002).69. J. C. Wallace et al., BMC Genomics 8, 28 (2007).70. T. Baas et al., J. Virol. 80, 10813 (2006).71. E. R. Spindel et al., BMC Genomics 6, 160 (2005).72. M. Ventura et al., Science 316, 243 (2007).73. Single-letter abbreviations for the amino acid residues

are as follows: A, Ala; C, Cys; D, Asp; E, Glu; F, Phe;G, Gly; H, His; I, Ile; K, Lys; L, Leu; M, Met; N, Asn; P, Pro;Q, Gln; R, Arg; S, Ser; T, Thr; V, Val; W, Trp; and Y, Tyr.

74. This project was supported by National Human GenomeResearch Institute grants to the Baylor College ofMedicine Human Genome Sequencing Center (U54HG003273), Washington University Genome SequencingCenter (U54 HG003079), and the J. Craig Venter Institute(U54 HG003068). We thank members of the NHGRI stafffor their ongoing efforts: A. Felsenfeld, J. Peterson,M. Guyer, and W. Lu. Additional acknowledgments ofsupport are available online (7). We thank the CaliforniaNational Primate Research Center (NPRC), Oregon NPRC,Southwest NPRC, and Yerkes NPRC for contributingbiological samples used in this study.

Rhesus Macaque Genome Sequencing andAnalysis ConsortiumProject Leader: Richard A. Gibbs1,2

White paper: Jeffrey Rogers,3 Michael G. Katze,4 RogerBumgarner,4 Richard A. Gibbs,1,2 George M. Weinstock1,2

Principal investigators: Richard A. Gibbs,1,2 Elaine R. Mardis,5

Karin A. Remington,6 Robert L. Strausberg,6 J. Craig Venter,6

George M. Weinstock,1,2 Richard K. Wilson5

Analysis leaders: Mark A. Batzer,7 Carlos D. Bustamante,8 Evan E.Eichler,9 Richard A. Gibbs,1,2 Matthew W. Hahn,10 Ross C.Hardison,11 Kateryna D. Makova,11 Webb Miller,11 AleksandarMilosavljevic,1,2 Robert E. Palermo,4 Adam Siepel,8 James M.Sikela,12 George M. Weinstock1,2

Genome sequencing: Tony Attaway,1,2 Stephanie Bell,1,2 Kelly E.Bernard,5 Christian J. Buhay,1,2 Mimi N. Chandrabose,1,2 MarvinDao,1,2 Clay Davis,1,2 Kimberly D. Delehaunty,5 Yan Ding,1,2

Huyen H. Dinh,1,2 Shannon Dugan-Rocha,1,2 Lucinda A. Fulton,5

Ramatu Ayiesha Gabisi,1,2 Toni T. Garner,1,2 Richard A. Gibbs,1,2

Jennifer Godfrey,5 Alicia C. Hawes,1,2 Judith Hernandez,1,2 SandraHines,1,2 Michael Holder,1,2 Jennifer Hume,1,2 Shalini N.Jhangiani,1,2 Vandita Joshi,1,2 Ziad Mohid Khan,1,2 Ewen F.Kirkness6 (leader), Andrew Cree,1,2 R. Gerald Fowler,1,2 SandraLee,1,2 Lora R. Lewis,1,2 Zhangwan Li,1,2 Yih-shin Liu,1,2 Stephanie M.Moore,1,2 Donna Muzny1,2 (leader), Lynne V. Nazareth1,2 (leader),Dinh Ngoc Ngo,1,2 Geoffrey O. Okwuonu,1,2 Grace Pai,6 DavidParker,1,2 Heidie A. Paul,1,2 Cynthia Pfannkoch,6 Craig S. Pohl,5

Yu-Hui Rogers,6 San Juana Ruiz,1,2 Aniko Sabo,1,2 JirehSantibanez,1,2 Brian W. Schneider,1,2 Scott M. Smith,5 EricaSodergren,1,2 Amanda F. Svatek,1,2 Teresa R. Utterback,1,2 SelinaVattathil,1,2 Wesley Warren5 (leader), George M. Weinstock,1,2

Courtney Sherell White1,2

Genome assembly: Asif T. Chinwalla5 (leader), Yucheng Feng,5

Aaron L. Halpern,6 LaDeana W. Hillier,5 Xiaoqiu Huang,13 Ewen F.Kirkness,6 Pat Minx,5 Joanne O. Nelson,5 Kymberlie H. Pepin,5

Xiang Qin,1,2 Karin A. Remington,6 Granger G. Sutton6 (leader), EliVenter,6 Brian P. Walenz,6 John W. Wallis,5 George M. Weinstock,1,2

Kim C. Worley1,2 (leader), Shiaw-Pyng Yang5

Mapping: LaDeana W. Hillier,5 Steven M. Jones,14 Marco A.Marra,14 Mariano Rocchi,15 Jacqueline E. Schein,14 John W. Wallis5

Sequence finishing: Christian J. Buhay,1,2 Yan Ding,1,2 ShannonDugan-Rocha,1,2 Alicia C. Hawes,1,2 Judith Hernandez,1,2 MichaelHolder,1,2 Jennifer Hume,1,2 Ziad Mohid Khan,1,2 Zhangwan Li,1,2

Dinh Ngoc Ngo,1,2 Aniko Sabo1,2

Assembly comparison: Robert Baertsch,16 Asif T. Chinwalla,5

Laura Clarke,17 Miklós Csürös,18 Jarret Glasscock,5 R. AlanHarris,1,2 Paul Havlak,1,2 LaDeana W. Hillier,5 Andrew R.Jackson,1,2 Huaiyang Jiang,1,2 Yue Liu,1,2 David N. Messina,5

Xiang Qin,1,2 Yufeng Shen,1,2 Henry Xing-Zhi Song,1,2 George M.Weinstock1,2 (leader), Kim C. Worley1,2 (leader), Todd Wylie,5

Lan Zhang1,2

Gene prediction: Ewan Birney,17 Laura Clarke17

Repetitive elements: Mark A. Batzer7 (leader), Kyudong Han,7Miriam K. Konkel,7 Jungnam Lee,7 Webb Miller,11 Arian F. A.Smit,19 Brygg Ullmer,20 Hui Wang,7 Jinchuan Xing7,21

Ancestral genomes and segmental duplications: RichardBurhans,11 Ze Cheng,9 Miklós Csürös,18 Evan E. Eichler,9 R. AlanHarris,1,2 Andrew R. Jackson,1,2 John E. Karro,11 Jian Ma,22

Aleksandar Milosavljevic1,2 (leader), Brian Raney,22 Xinwei She9

Gene duplication/gene families: Michael J. Cox,12 Jeffery P.Demuth,10 Laura J. Dumas,12 Matthew W. Hahn10 (leader),Sang-Gook Han,10 Janet Hopkins,12 Anis Karimpour-Fard,23

Young H. Kim,24 Jonathan R. Pollack,24 James M. Sikela12

(leader)PRAME Gene Family Analysis: Webb Miller11 (leader), DonnaMuzny,1,2 Brian Raney,22 Aniko Sabo,1,2 Adam Siepel,8 TomasVinar8

Orthologous genes: Charles Addo-Quaye,11 Jeremiah Degenhardt,8

Alexandra Denby,8 Melissa J. Hubisz,25 Amit Indap,8 CarolinKosiol,8 Bruce T. Lahn,25,26 Heather A. Lawson,11 Alison Marklein,8

Rasmus Nielsen,27 Adam Siepel8 (leader), Eric J. Vallender,25,26

Tomas Vinar8

Population genetics: Mark A. Batzer7 (leader), Carlos D.Bustamante8 (leader), Andrew G. Clark,28 Jeremiah Degenhardt,8

Betsy Ferguson,29 Richard A. Gibbs,1,2 Matthew W. Hahn,10

Kyudong Han,7 Ryan D. Hernandez,8 Kashif Hirani,1,2 AmitIndap,8 Hildegard Kehrer-Sawatzki,30 Jessica Kolb,30 Miriam K.Konkel,7 Jungnam Lee,7 Lynne V. Nazareth,1,2 Shobha Patil,1,2

Ling-Ling Pu,1,2 Jeffrey Rogers,3 Yanru Ren,1,2 David GlennSmith,3 Brygg Ullmer,20 Hui Wang,7 David A. Wheeler,1,2

Jinchuan Xing7,21

Sex chromosome evolution: Kateryna D. Makova,11 Ian Schenck11Human disease orthologs: Edward V. Ball,31 Rui Chen,1,2

David N. Cooper,31 Belinda Giardine,11 Richard A. Gibbs,1,2 Ross C.Hardison11 (leader), Fan Hsu,22 W. James Kent,22 Arthur Lesk,11

Webb Miller,11 David L. Nelson,2 William E. O’Brien,2 Kay Prüfer,32

Peter D. Stenson31

Additional biological impact of genomic sequence: Michael G.Katze,4 Robert E. Palermo4 (leader), James C. Wallace4

Macaque sample collection: Hui Ke,33 Xiao-Ming Liu,34 PengWang,33 Andy Peng Xiang,33 Fan Yang33

Genome browser: Robert Baertsch,16 Galt P. Barber,22 DavidHaussler35,16 (leader), Donna Karolchik,22 Andy D. Kern,22

Robert M. Kuhn,22 Kayla E. Smith,22 Ann S. Zwieg22

1Human Genome Sequencing Center, Baylor College ofMedicine, Houston, TX 77030, USA. 2Department of Mo-lecular and Human Genetics, Baylor College of Medicine,Houston, TX 77030, USA. 3Department of Genetics, South-west Foundation for Biomedical Research, San Antonio, TX78227, USA. 4Department of Microbiology, University ofWashington, Seattle, WA 98195, USA. 5Genome SequencingCenter, Washington University, St. Louis, MO 63108, USA. 6J.Craig Venter Institute, 9704 Medical Center Drive, Rockville,MD 20850, USA. 7Department of Biological Sciences,Biological Computation and Visualization Center, Center forBioModular Multi-scale Systems, Louisiana State University,Baton Rouge, LA 70803, USA. 8Department of BiologicalStatistics and Computational Biology, Cornell University,Ithaca, NY 14853, USA. 9Department of Genome Sciences,University of Washington, Seattle, WA 98195, USA. 10Depart-ment of Biology and School of Informatics, Indiana University,Bloomington, IN 47405, USA. 11Center for ComparativeGenomics and Bioinformatics, Pennsylvania State University,University Park, PA 16802, USA. 12Human Medical Geneticsand Neuroscience Programs, Department of Pharmacology,University of Colorado at Denver and Health Sciences Center,Aurora, CO 80045, USA. 13Department of Computer Science,Iowa State University, Ames, IA 50011, USA. 14GenomeSciences Centre, British Columbia Cancer Agency, 570 West7th Avenue, Vancouver, BC, Canada. 15Department of Geneticsand Microbiology, University of Bari, Bari, Italy. 16Departmentof Bioinformatics, University of California Santa Cruz, SantaCruz, CA 95060, USA. 17The Wellcome Trust Sanger Institute,Wellcome Trust Genome Campus, Hinxton, Cambridgeshire,CB10 1SA, UK. 18Département d’Informatique et de RechercheOpérationnelle, Université de Montréal, Montréal, QC H3C3J7, Canada. 19Institute for Systems Biology, 1441 North 34thStreet, Seattle, WA 98103–8904, USA. 20Center for Computa-tion and Technology, Department of Computer Sciences,Louisiana State University, Baton Rouge, LA 70803, USA.21Eccles Institute of Human Genetics, University of Utah, SaltLake City, UT 84112, USA. 22Center for Biomolecular Scienceand Engineering, University of California Santa Cruz, SantaCruz, CA 95064, USA. 23Department of Preventative Medicineand Biometrics, University of Colorado at Denver and HealthSciences Center, Aurora, CO 80045, USA. 24Department ofPathology, Stanford University, Stanford, CA 94305, USA.25Department of Human Genetics, University of Chicago, Chi-cago, IL 60637, USA. 26Howard Hughes Medical Institute,Department of Human Genetics, University of Chicago,Chicago, IL 60637, USA. 27Institute of Biology, University ofCopenhagen, Copenhagen DK-1017, Denmark. 28Departmentof Molecular Biology and Genetics, Cornell University, Ithaca,NY 14853, USA. 29Genetics Research and Informatics Pro-gram, Oregon National Primate Research Center, Beaverton,OR 97006, USA. 30Institute of Human Genetics, University ofUlm, Ulm, 89081, Germany. 31Institute of Medical Genetics,Cardiff University, Heath Park, Cardiff, CF14 4XN, UK.32Department Evolutionary Genetics, Max Planck Institute forEvolutionary Anthropology, Leipzig, 04103, Germany. 33Centrefor Stem Cell Biology and Tissue Engineering, Sun Yat-senUniversity, Guangzhou 510080, China. 34South-China PrimateResearch and Development Center, Guangzhou 510080,China. 35Howard Hughes Medical Institute, Santa Cruz, CA95060, USA.

Supporting Online Materialwww.sciencemag.org/cgi/content/full/316/5822/222/DC1Materials and MethodsSOM TextFigs. S1.1 to S7.2Tables S1.1 to S9.4References and Notes

22 December 2006; accepted 16 March 200710.1126/science.1139247

13 APRIL 2007 VOL 316 SCIENCE www.sciencemag.org234

The Rhesus Macaque Genome