Man's Place in Hominoidea Revealed by Mitochondrial DNA Genealogy

J Mol Evol (1992) 35:32-43. Journal of Molecular Evolution. © Springer-Verlag New York Inc. 1992

Satoshi Horai,1 Yoko Satta,1 Kenji Hayasaka,1 Rumi Kondo,1 Tadashi Inoue,2 Takafumi Ishida,3 Seiji Hayashi,4 and Naoyuki Takahata1

1 National Institute of Genetics, Mishima 411, Japan 2 College of Agriculture and Veterinary Medicine, Nihon University, Kanagawa 252, Japan 3 Faculty of Science, the University of Tokyo, Tokyo 113, Japan 4 Faculty of Science, Nagoya University, Nagoya 464, Japan

Summary. Molecular biology has resurrected C. Darwin and T.H. Huxley's question about the origin of humans, but the precise branching pattern and dating remain controversial. To settle this issue, a large amount of sequence information is required. We determined mitochondrial (mt) DNA sequences for five hominoids: pygmy and common chimpanzees, gorilla, orangutan, and siamang. The common region compared with the known human sequence is 4759 bp long, encompassing genes for 11 transfer RNAs and 6 proteins. Because of the high substitution rates in mammalian mtDNA and an unprecedentedly large region compared, the sequence differences clearly indicate that the closest relatives to human are chimpanzees rather than gorilla. For dating the divergences of human, chimpanzee, and gorilla, we used only unsaturated parts of sequence differences in which the mtDNA genealogy is not obscured by multiple substitutions. The result suggests that gorilla branched off 7.7 ± 0.7 million years (Myr) ago and human 4.7 ± 0.5 Myr ago; the time difference between these divergences being as long as 3 Myr.

Key words: Hominoid phylogeny -- Mitochondrial DNA -- Nucleotide substitutions -- Molecular clock -- Phylogenetic trees -- Divergence times

Introduction

Ever since Darwin, man's place in nature, from either a zoocentric or anthropocentric perspective, has been a cardinal question in building comprehensive systems in biology (Darwin 1859; Huxley 1894; Gould 1980). The discovery of a molecular clock (constancy of molecular evolutionary rate) has revolutionized the field (Zuckerkandl and Pauling 1965; Sarich and Wilson 1967), and it has become widely accepted that human and the African apes share a Pliocene ancestor, much more recent than previously thought (Pilbeam 1984; Mellars and Stringer 1989; Stringer 1990). The precise branching pattern (cladogram) and dating in hominoid diversification are nevertheless highly controversial (Goodman et al. 1983; Foran et al. 1988; Djian and Green 1989; Gibbons 1990). This controversy reflects the stochastic nature of the molecular clock and the fact that human, chimpanzee, and gorilla might have diverged within a short period of evolutionary time (trichotomy). To resolve the trichotomy problem, it is essential to find a number of nucleotide substitutions that can be assigned to internodal branches in the cladogram of hominoids. Although the longest DNA sequences now available (the ψη-globin gene and its flanking region; 11,483 bp) assign about 8-14 substitutions that can support the human-chimpanzee clade, the likelihood is not significantly higher than that of the human-gorilla or chimpanzee-gorilla clade (Goodman et al. 1989). This is due to relatively slow rates of nuclear DNA evolution, and the trichotomy remains an open question. As for ourselves, the significance in this molecular systematic pursuit is to provide a basis for better understanding of the evolution of morphological and behavioral traits of our own species, Homo sapiens sapiens.

To this end, and because a large number of sequence differences is required, we used mitochondrial (mt) DNA, which is known to evolve much more rapidly (Brown et al. 1982; Hixon and Brown 1986). We sequenced a common region of 4938 bp length for pygmy (Pan paniscus) and common (Pan troglodytes) chimpanzees, gorilla (Gorilla gorilla), orangutan (Pongo pygmaeus), and siamang (Hylobates syndactylus). This region contains the complete genes for NADH dehydrogenase subunit 2 (ND2), cytochrome oxidase subunits I and II (COI and COII), ATPase 8, portions of two genes for ND1 and ATPase 6, and 11 interspersed tRNAs. Here we focus on the phylogenetic implication of these sequences together with the known sequence of human (Anderson et al. 1981) and present other molecular aspects elsewhere.

Offprint requests to: S. Horai

Materials and Methods

Extraction and Cloning of mtDNA. We purified mtDNA from cultured cells of a common chimpanzee and an orangutan, whereas relevant segments of mtDNA were amplified from a pygmy chimpanzee, a gorilla, and a siamang by means of the polymerase chain reaction (PCR) (Saiki et al. 1988). The total DNAs were used as templates in the PCR. A preliminary study by Southern hybridization analysis showed that a human EcoRI site [bp 4121 by Anderson et al.'s (1981) numbering] and a PstI site (bp 9020) are conserved in the hominoid species except for siamang. We then prepared a clone library from each species by digesting mtDNA with EcoRI alone, EcoRI plus HindIII, and HindIII plus PstI, and cloned the resultant fragments in plasmid vectors. We cleaved these fragments from the recombinant plasmids under the above enzyme combinations and purified them by agarose gel electrophoresis. For siamang, PCR-amplified fragments, ranging from bp 3694 to bp 9912, were used for subsequent enzyme digestions. The fragments recovered from the gels were further digested with HaeIII and/or AluI and subcloned in the SmaI-cleaved vector, M13mp10, in order to prepare a single-stranded template DNA.
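The digestion strategy above depends on where each enzyme's recognition sequence occurs in the mtDNA. As a minimal sketch, the following locates the recognition sequences of the enzymes named in the text within a sequence string. The recognition sequences are the standard ones for these enzymes; the function names and the toy sequence are hypothetical and are not taken from the paper's data or software.

```python
# Recognition sequences (5'->3') of the enzymes named in the text.
# All five are palindromic, so scanning one strand is sufficient.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "PstI": "CTGCAG",
    "HaeIII": "GGCC",
    "AluI": "AGCT",
}


def find_sites(sequence, site):
    """Return 0-based start positions of every (possibly overlapping) match."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def site_map(sequence):
    """Map each enzyme name to the start positions of its recognition sequence."""
    sequence = sequence.upper()
    return {
        name: find_sites(sequence, site)
        for name, site in RECOGNITION_SITES.items()
    }


# Hypothetical toy sequence for illustration only.
toy = "GAATTCAAGGCCTTAGCTCTGCAG"
print(site_map(toy))
```

The sketch reports where each recognition sequence starts, not the exact cut position within it, which differs between enzymes.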

DNA Sequencing. Sequencing reactions were performed by the dideoxynucleotide chain-termination method (Sanger et al. 1977) using 32P-dCTP (Amersham) and Sequenase version 2.0 (USB Co.). We sequenced both double-stranded DNA from the original plasmid clones and the single-stranded DNA. Because the sizes of most subcloned fragments that were cleaved with HaeIII or AluI were less than 300 bp, we could read the full length of any insert in one sequencing reaction. These fragment sequences were connected and assembled by GENETYX (Software Development Co., Ltd., Japan). The sequences of 4.9 kb in length were aligned together with the human homologue to compute the pairwise sequence differences.
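The last step, computing pairwise sequence differences from an alignment, can be sketched as follows. This is an illustration under stated assumptions, not the GENETYX procedure: columns with a gap in either sequence are simply skipped, and the function names and toy alignment are hypothetical.

```python
from itertools import combinations


def pairwise_differences(seq_a, seq_b, gap="-"):
    """Count mismatches between two aligned sequences, skipping gapped columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != gap and b != gap and a != b
    )


def difference_matrix(alignment):
    """Return {(name1, name2): count} for every pair of aligned sequences."""
    return {
        (n1, n2): pairwise_differences(alignment[n1], alignment[n2])
        for n1, n2 in combinations(alignment, 2)
    }


# Hypothetical toy alignment, not the sequences of Fig. 1.
toy_alignment = {
    "HUM": "ACGT-ACGTA",
    "CHI": "ACGTTACGTG",
    "GOR": "ACATTACCTG",
}
for pair, count in difference_matrix(toy_alignment).items():
    print(pair, count)
```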

Results and Discussion

Mode of Substitution

The nucleotide sequences for the six hominoid species are shown in Fig. 1. The length of the aligned sequence is 4938 bp, which corresponds to the homologous region of bp 4121-9025 in the human mtDNA sequence (Anderson et al. 1981). The actual length is, respectively, 4905 bp in human and common chimpanzee, 4904 bp in pygmy chimpanzee, 4909 bp in gorilla, 4928 bp in orangutan, and 4910 bp in siamang. The portions correspond to 140 bp at the 3' end of ND1 and 499 bp at the 5' end of ATPase 6.
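The lengths quoted above can be cross-checked with simple arithmetic: an inclusive coordinate range of first to last spans last - first + 1 bp, and the number of gap positions a species carries in the alignment is the aligned length minus its ungapped length. The short sketch below does this with the figures stated in the text and in the Fig. 1 legend; the function names are hypothetical.

```python
ALIGNED_LENGTH = 4938  # aligned length stated in the text (bp)

# Ungapped sequence lengths stated in the text (bp).
ACTUAL_LENGTH = {
    "human": 4905,
    "common chimpanzee": 4905,
    "pygmy chimpanzee": 4904,
    "gorilla": 4909,
    "orangutan": 4928,
    "siamang": 4910,
}


def inclusive_span(first_bp, last_bp):
    """Length of a coordinate range that includes both end points."""
    return last_bp - first_bp + 1


def gap_positions(aligned_length, actual_length):
    """Number of alignment columns in which a sequence carries a gap."""
    return aligned_length - actual_length


human_span = inclusive_span(4121, 9025)    # human mtDNA coordinates
nd1_portion = inclusive_span(1, 140)       # ND1 range in the Fig. 1 legend
atp6_portion = inclusive_span(4440, 4938)  # ATPase 6 range in the Fig. 1 legend
gaps = {sp: gap_positions(ALIGNED_LENGTH, n) for sp, n in ACTUAL_LENGTH.items()}

print(human_span, nd1_portion, atp6_portion)
print(gaps)
```

The human span equals the stated human length, and the two partial-gene ranges equal the stated 140 bp and 499 bp.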

Excluding two incompletely determined codons in ND1 and ATPase 6, small overlapping parts, and noncoding parts in which insertion/deletion is frequent, we examined sequence differences in the remaining region (4759 bp), classifying them into transitions (AG/TC) and transversions (V). Transitions were divided into two types, AG and TC, because of the compositional bias. Thus, in the tRNA region, there are three different categories. In the protein-coding region, we treated three codon positions separately and classified them further into synonymous (S) and nonsynonymous (N). At the first positions, there are AG nonsynonymous (AGN), TC nonsynonymous (TCN), TC synonymous (TCS), and transversional, nonsynonymous (VN) differences. Similarly, at the second positions, there are AGN, TCN, and VN differences, whereas, at the third positions, there are AGS, TCS, VS, and VN. For convenience, these symbols are used together with a subscript when coding positions are specified; for example, TCS1 means TC synonymous change at the first codon positions.
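The classification scheme can be illustrated with a short sketch that labels a single-nucleotide codon difference using the symbols defined above (for example TCS1 or VN3). It assumes the vertebrate mitochondrial genetic code (NCBI translation table 2) and handles only codon pairs that differ at exactly one position; how codons with multiple differences were treated is not described in this section. The function names are hypothetical.

```python
BASES = "TCAG"
# Vertebrate mitochondrial genetic code (NCBI translation table 2).
# Codons are ordered TTT, TTC, TTA, TTG, TCT, ... with bases in the order T, C, A, G.
AMINO_ACIDS = (
    "FFLLSSSSYY**CCWW"
    "LLLLPPPPHHQQRRRR"
    "IIMMTTTTNNKKSS**"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def substitution_type(x, y):
    """Return 'AG' or 'TC' for transitions and 'V' for transversions."""
    if x == y:
        raise ValueError("bases are identical")
    pair = frozenset((x, y))
    if pair == frozenset("AG"):
        return "AG"
    if pair == frozenset("TC"):
        return "TC"
    return "V"


def classify_codon_difference(codon_a, codon_b):
    """Label a single-site codon difference, e.g. 'TCS1' or 'VN3'."""
    diffs = [i for i in range(3) if codon_a[i] != codon_b[i]]
    if len(diffs) != 1:
        raise ValueError("codons must differ at exactly one position")
    pos = diffs[0]
    kind = substitution_type(codon_a[pos], codon_b[pos])
    synonymous = CODON_TABLE[codon_a] == CODON_TABLE[codon_b]
    return f"{kind}{'S' if synonymous else 'N'}{pos + 1}"


print(classify_codon_difference("TTA", "CTA"))  # Leu vs Leu, first position
print(classify_codon_difference("ATT", "ATA"))  # Ile vs Met, third position
```

Under this genetic code every third-position transition is synonymous, which agrees with the absence of AGN and TCN categories at the third positions.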

Table 1 shows sequence differences thus classified in the tRNA and protein-coding regions. The high substitution rate of mtDNA raises several cautions. Synonymous transitions such as AGS3, TCS1, and TCS3 level off rapidly and are in some cases saturated even between human and chimpanzees. This is consistent with well-known high transition rates in mammalian mtDNA, the ratio of AG/TC changes to V changes being about 10 (Brown et al. 1982) or more. However, the kinetic behaviors of various types of synonymous changes (AGS3, TCS1, TCS3, and VS3) differ from one another. Such differences may be accounted for by their different saturation levels. Usually, the content of G residues at the third codon positions is extremely low (3-10% depending on genes as well as lineages: Anderson et al. 1981; Brown et al. 1982), so the saturation level of AG transitions at the third positions must be low and attained rapidly. This situation is similar to that for TC transitions at the first positions in which Leu codons (TTR and CTR) are involved. Because there are only 132.3 such Leu codons on average, some 40 TC differences (Table 1) imply that more than one-third of Leu codons have undergone single TC transitions. The slower leveling-off in synonymous TC transitions and particularly in synonymous transversions is due to relative abundances of A, C, and T residues.
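Why a low G content implies a low saturation level can be illustrated with a simple two-state model. This is an assumption introduced here for illustration, not the authors' analysis: sites are taken to alternate only between A and G, with G at equilibrium frequency freq_g among those sites. Under a reversible two-state Markov process, the expected fraction of such sites that differ between two sequences approaches 2 * freq_g * (1 - freq_g) as time grows. The function names, rate, and time values are hypothetical.

```python
import math


def saturation_level(freq_g):
    """Expected fraction of A/G sites that differ once differences have leveled off."""
    return 2.0 * freq_g * (1.0 - freq_g)


def expected_difference(freq_g, total_rate, path_time):
    """Expected fraction of differing A/G sites under the two-state model.

    total_rate: sum of the A->G and G->A rates per site per unit time.
    path_time: total time along the path joining the two sequences
               (twice the divergence time).
    """
    return saturation_level(freq_g) * (1.0 - math.exp(-total_rate * path_time))


# Hypothetical values: a skewed composition versus an even one.
for freq_g in (0.05, 0.5):
    level = saturation_level(freq_g)
    early = expected_difference(freq_g, total_rate=1.0, path_time=0.5)
    print(f"freq_g={freq_g}: plateau={level:.3f}, at path_time=0.5: {early:.3f}")
```

With G at 5% the plateau is 0.095, compared with 0.5 for an even composition, so a strongly skewed composition caps the observable difference at a low value.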

[Fig. 1, pages 34-37: aligned nucleotide sequences (positions 1-4938) of HUM, CHI, PYG, GOR, ORA, and SIA; sequence panels omitted.]

Fig. 1. Nucleotide sequences for 4.9-kb fragments of mtDNA from six species of hominoids: HUM = human; CHI = common chimpanzee; PYG = pygmy chimpanzee; GOR = gorilla; ORA = orangutan; SIA = siamang. The nucleotide sequences for the human are derived from Anderson et al. (1981). The whole nucleotide sequence for the human is shown on the top. For the other species, only nucleotides different from those in the human sequence are shown. A gap is denoted by a dash (-). The 17 genes encompass the following nucleotides: ND1, 1-140; tRNA Ile, 143-211; tRNA Gln, 209-280; tRNA Met, 282-349; ND2, 350-1390; tRNA Trp, 1392-1460; tRNA Ala, 1470-1538; tRNA Asn, 1540-1613; tRNA Cys, 1646-1713; tRNA Tyr, 1713-1779; COI, 1798-3336; tRNA Ser(UCN), 3339-3410; tRNA Asp, 3412-3480; COII, 3481-4161; tRNA Lys, 4207-4277; ATPase 8, 4279-4482; ATPase 6, 4440-4938.

A l t h o u g h these trans i t ions m a y be o f l i tt le phy- l ogene t i c va lue , it s h o u l d be n o t e d that large differences are of ten f o u n d in the c o m p a r i s o n w i t h orangutan, rather than s iamang , w h i c h we c ho s e as an

out -group spec ies ( P i l b e a m 1984) . T h i s h o l d s true also for VS3 differences, w h i c h do n o t p r o v i d e any clear e v i d e n c e o f l e v e l i n g - o f f a m o n g h o m i n o i d s . D e - spite this s l ow rate, the largest difference is f o u n d

38

Table 1. Sequence differences in the protein-coding (above the diagonal) and tRNA (below the diagonal) regions sequenced among six hominoid species

Common Pygmy chimp chimp Human Gorilla Orangutan Siamang

Common AG 10/2/(40) 18/5/(91) 27/9/(92) 49/7/(100) 31/5/(82) chimp TC 5(4)/4/(74) 4(20)/10/(201) 9(23)/9/(220) 26(40)/43/(255) 24(41)/38/(298) (C) V 0/0/1(8) 2/0/1(14) 6/3/5(41) 22/9/14(79) 35/7/14(109)

Pygmy AG 4 16/3/(82) 27/7/(82) 45/5/(96) 26/3/(78) chimp TC 7 7(18)/14/(204) 11(24)/13/(211) 26(41)/45/(266) 25/(39)/40/(286) (P) V 1 2/0/2(14) 6/3/6(41) 22/9/12(80) 35/7/15(105)

Human AG 10 10 27/4/(93) 49/6/(99) 26/2/(96) (H) TC 18 17 9(37)/15/(238) 25/(45)/44/(256) 22(37)/40/(300)

V 1 0 6/3/4(41) 24/9/13(77) 35/7/14(110) Gorilla AG 13 15 16 45/7/(96) 30/5/(88)

(G) TC 21 22 25 22(45)/42/(268) 21(40)/37/(301) V 3 2 2 24/12/15(80) 35/10/16(115)

Orangutan AG 25 25 25 23 48/4/(90) (O) TC 36 37 40 41 28(42)/47/(271)

V 4 3 3 5 29/14/15(125) Siamang AG 23 23 25 26 31

(S) TC 39 36 47 43 47 V 8 7 7 9 10

Each element shows AG/TC transitions and transversions (V). For the protein-coding region, the differences are given for the first, second, and third codon positions (from left to right, separated by slashes). Differences at the first and third codon positions can be either synonymous (in parentheses) and nonsynonymous so that these are also given separately. Abbreviations in parentheses followed with species names are also applied to those in Tables 2 and 3, and Fig. 2

also between orangutan and siamang (notice a large value between orangutan and siamang: Table 1). If most synonymous changes are selectively neutral (Kimura 1983; Nei 1987), these findings (except in ND2) suggest that mutation rates, whether transitional or transversional, have increased in the orangutan lineage.

Nonsynonymous and tRNA changes, against which selective constraints would be stronger than against synonymous ones, show no evidence of leveling-off among hominoids. Yet, it is apparent that some protein genes (ND2, ATPase 8, ATPase 6) in the orangutan lineage have accumulated those changes more rapidly than in any other lineage. Furthermore, orangutan tRNAs, in particular Asn, Tyr, Ser (UCN), have many unique substitutions, not found even in the comparison among mammals including cow (Anderson et al. 1982) and mouse (Bibb et al. 1981). Therefore, one possibility for these enhanced rates is a relaxation of functional constraints against molecules (Kimura 1983, 1987; Li et al. 1985). We examined this possibility through codon usage patterns and amino acid compositions (Table 2). Using the maximum parsimony method (Felsenstein 1990), we identified five amino acids that are most often involved in the enhanced nonsynonymous changes in the orangutan lineage. They are Ile, Met, Val, Thr, and Ala, all of which, except Thr, are hydrophobic and nonpolar. The SOAP profile (Kyte and Doolittle 1982) showed that the hydropathy of orangutan proteins is well conserved. These analyses suggest that protein functions in the orangutan lineage have not been altered greatly. Although it is unclear that such a functional conservation can be applied to tRNA, an increased mutation rate is a more likely explanation for the enhanced substitution rate in the orangutan lineage. Obviously, this hypothesis explains the elevated substitution rates in the tRNAs as well. Keeping these results in mind, we carried out analyses of mtDNA genealogy.
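A hydropathy profile of the kind referred to above can be sketched as a sliding-window average over the Kyte-Doolittle scale. This is only an illustration: the window width of 9 residues, the function names, and the toy peptides are assumptions, and the sketch is not the SOAP program itself.

```python
# Kyte-Doolittle hydropathy indices (Kyte and Doolittle 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(protein):
    """Average hydropathy index over a whole peptide."""
    return sum(KD[aa] for aa in protein) / len(protein)


def hydropathy_profile(protein, window=9):
    """Sliding-window mean hydropathy; the window width is an assumption."""
    if window < 1 or window > len(protein):
        raise ValueError("window must be between 1 and the peptide length")
    return [
        mean_hydropathy(protein[i:i + window])
        for i in range(len(protein) - window + 1)
    ]


# Hypothetical peptides differing only by Ile <-> Val exchanges.
original = "MTILAVIAGLLS"
replaced = "MTVLAIVAGLLS"
shift = mean_hydropathy(replaced) - mean_hydropathy(original)
print(f"mean hydropathy shift: {shift:+.3f}")
```

Because Ile, Val, Met, and Ala all carry positive indices on this scale, exchanges among them move the profile only slightly, which is consistent with the conserved hydropathy the text reports.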

Resolution of Trichotomy

We used neighbor-joining (NJ) (Saitou and Nei 1987) and maximum likelihood (ML) (Felsenstein 1990) methods, which are suited for variable substitution rates among lineages. If we analyze the whole region, ignoring possible heterogeneous substitution rates along DNA sequences, the number of substitutions assigned to the internodal branch between the most recent common ancestor of human-chimpanzee and that of human-chimpanzee-gorilla becomes larger than 50 (Table 3). Furthermore, the maximum parsimony (Felsenstein 1990) analysis shows that the human-chimpanzee clade is 26 ± 9.4 more parsimonious than the chimpanzee-gorilla clade and the maximum likelihood analysis shows that the former is $e^{53.9} \approx 10^{23}$ times as likely as the latter. Although these estimates may well be model-dependent, the implication is that a majority of sequence differences

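[Fig. 2 tree diagram: tips are common chimpanzee (C), pygmy chimpanzee (P), human (H), gorilla (G), orangutan (O), and siamang (S); internal nodes are numbered 1-4; labeled branches are (1,C), (1,P), (1,2), (2,H), (2,3), (3,G), (3,4), (4,O), and (4,S); the time axis marks 13 Myr, Tg, Th, and Tc.]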

Fig. 2. Genealogy of six hominoid mtDNAs. As a cladogram, all tree-making methods best support this mtDNA genealogy, irrespective of genes or data sets as in Table 3. Nodes are numbered 1-4, and a branch length between X and Y (either nodes or tips) is indicated by (X, Y). The gorilla splitting time (Tg in text) was estimated from the divergence time between orangutan and the African apes (13 Myr ago) multiplied by the proportion of branch length, $\frac{(H,G)}{(3,4) + (H,G)}$, where $(H,G) = \frac{1}{2}[(2,3) + (2,H) + (3,G)]$, and then the human divergence time (Th in text) was obtained by $\frac{(2,H)\,T_g}{(2,H) + (2,3)}$. Here, a constant rate is assumed in both lineages leading to gorilla and human. In the chimpanzee lineage, the rate appears to be constant but somewhat retarded, so $T_c = \frac{(C,P)\,T_h}{(1,2) + (C,P)}$, where $(C,P) = \frac{1}{2}[(1,C) + (1,P)]$. The maximum sampling errors of estimated divergence times were computed by modifying the method in Takahata and Tajima (1991).
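The dating recipe in this legend can be followed numerically with a short Python sketch. It assumes the legend's fractions read as (H,G)/[(3,4) + (H,G)] for the gorilla split, (2,H)Tg/[(2,H) + (2,3)] for the human split, and (C,P)Th/[(1,2) + (C,P)] for the chimpanzee split; the branch lengths are the DATA3 (NJ) row of Table 3, and the dictionary keys and function name are hypothetical.

```python
# Branch lengths (numbers of substitutions), DATA3 by NJ, from Table 3.
DATA3_NJ = {
    "1,C": 20.9, "1,P": 21.5, "1,2": 19.2,
    "2,H": 45.5, "2,3": 29.7, "3,G": 84.0,
    "3,4": 54.9, "4,O": 205.5, "4,S": 233.3,
}


def divergence_times(b, calibration_myr=13.0):
    """Return (Tg, Th, Tc) in Myr from branch lengths and the 13-Myr anchor."""
    # Average depth of node 3 measured along the human and gorilla lineages.
    hg = 0.5 * (b["2,3"] + b["2,H"] + b["3,G"])
    tg = calibration_myr * hg / (b["3,4"] + hg)
    # Node 2 sits on the human lineage in proportion to (2,H).
    th = b["2,H"] * tg / (b["2,H"] + b["2,3"])
    # Average depth of node 1 along the two chimpanzee lineages.
    cp = 0.5 * (b["1,C"] + b["1,P"])
    tc = cp * th / (b["1,2"] + cp)
    return tg, th, tc


tg, th, tc = divergence_times(DATA3_NJ)
print(f"Tg = {tg:.2f} Myr, Th = {th:.2f} Myr, Tc = {tc:.2f} Myr")
```

Under this reading, Tg and Th land on the reported 7.7 and 4.7 Myr after rounding, while Tc comes out marginally below the reported 2.5 Myr, which may simply reflect rounding of intermediate values in the original calculation.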

undoubtedly support the human-chimpanzee clade, provided that the mtDNA genealogy is topologically identical to the hominoid ancestry (see below). Thus, we conclude that chimpanzees are the closest extant relatives to human. With relatively small numbers of nucleotide differences, the same conclusion was drawn from data on the COII gene (Ruvolo et al. 1991), the ψη- (Koop et al. 1986; Miyamoto et al. 1987) and ε-globin genes (Koop et al. 1989), the rRNA gene (Gonzalez et al. 1990), and the immunoglobulin-ε pseudogene (Ueda et al. 1989). This is also supported by DNA-DNA hybridization (Sibley and Ahlquist 1984; Caccone and Powell 1989; Sibley et al. 1990).
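The likelihood ratio quoted for the two competing clades is easier to appreciate in powers of ten. The sketch below only converts the natural-log difference of 53.9 given in the text; the function name is hypothetical.

```python
import math


def log10_likelihood_ratio(delta_ln_likelihood):
    """Convert a difference in natural-log likelihoods to a base-10 exponent."""
    return delta_ln_likelihood / math.log(10)


# 53.9 is the log-likelihood difference quoted in the text.
exponent = log10_likelihood_ratio(53.9)
print(f"e^53.9 is about 10^{exponent:.1f}")
```

The exponent is a little above 23, in line with the order of magnitude stated in the text.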

Divergence Times

The dating of species divergences is a subtler problem. It depends more strongly on the inference of multiple-hit substitutions. Although there are a number of statistical studies on such inferences (Kimura 1983; Nei 1987), we think that information is still too meager to construct accurate methods for correcting extensive multiple hits. For long sequences that include many different genes, another complication might occur due to gene-specific evolutionary rates and/or to extensive multiple-hit substitutions in some regions. To avoid these problems and obtain reliable dating, it seems necessary to restrict an analysis to relatively conserved regions for which we are, to a certain extent, confident of what happened in the evolutionary process. In fact, if we consider all types of differences, the estimates of branch lengths in the mtDNA genealogy by MP, ML, and NJ differ from each other to no small extent (the first three rows in Table 3). For this reason, AG-TCS3 and TCS1 differences were excluded in NJ, whereas all information at the third codon positions was discarded in MP and ML (but not TCS1 differences for technical reasons).
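To illustrate why multiple hits matter for dating, the sketch below applies Kimura's two-parameter correction, which treats transitions and transversions separately. The source does not say which correction formula it used, so this is a stand-in example only; the function name and the input proportions are hypothetical.

```python
import math


def k2p_distance(p_transitions, q_transversions):
    """Kimura two-parameter distance from observed proportions P and Q.

    P and Q are the fractions of sites showing transitional and
    transversional differences. Used here only as a familiar example of a
    multiple-hit correction, not as the correction applied in the paper.
    """
    a = 1.0 - 2.0 * p_transitions - q_transversions
    b = 1.0 - 2.0 * q_transversions
    if a <= 0.0 or b <= 0.0:
        raise ValueError("differences are saturated; distance is undefined")
    return -0.5 * math.log(a) - 0.25 * math.log(b)


# Hypothetical proportions: 10% transitions and 5% transversions observed.
observed = 0.10 + 0.05
corrected = k2p_distance(0.10, 0.05)
print(f"observed {observed:.3f}, corrected {corrected:.3f}")
```

The corrected distance exceeds the observed one, and the gap widens rapidly as differences approach saturation, which is the situation the text avoids by restricting the analysis to conserved sites.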

However, it is statistically undesirable to divide data into too many categories. The first data set we collectively used for dating hominoid diversifications consists of the tRNAs and the first and second codon positions (DATA1). Even for these conserved regions, MP tends to underestimate long branch lengths and overestimate recent divergence times. For this reason, we did not use it for dating. It turned out that synonymous changes at the first positions of Leu codons have substantial effects on the estimation. Because such changes are nearly saturated among hominoids (Table 1), we became skeptical of including them. We therefore examined another data set, DATA2, which consists of the tRNA and nonsynonymous differences. As mentioned, VS3 differences do not show any leveling-off. However, because VS3 differences alone are subject to large sampling errors due to the small number of relevant sites (481), we added them to DATA2 to form DATA3. DATA2 and DATA3 could be analyzed only by NJ.

Under the assumption that the divergence time between orangutan and the African apes is 13 Myr ago (Pilbeam 1984), we estimated the divergence times of gorilla (Tg), human (Th), and between common and pygmy chimpanzees (Tc) to be

             Tg    Th    Tc
DATA1 (ML)   7.0   4.3   2.1
DATA1 (NJ)   8.1   5.6   2.8
DATA2 (NJ)   7.6   5.2   2.7
DATA3 (NJ)   7.7   4.7   2.5


Table 2. Codon usages in the protein-coding region (ND1, ND2, COI, COII, ATPase 8, and ATPase 6)

Amino acid  Codon  C    P    H    G    O    S
Phe         TTT    +2   -3   27   ±0   -6   -1
            TTC    -2   +5   47   ±0   +10  +2
Ser         TCT    -4   -4   17   ±0   -3   -3
            TCC    +2   +1   29   +4   ±0   +4
            TCA    +6   +6   21   -3   +6   ±0
            TCG    -1   ±0   3    ±0   -3   ±0
*Leu        TTA    -3   -6   24   ±0   ±0   +3
            TTG    ±0   +1   3    ±0   +3   +1
Leu         CTT    +1   +5   23   -1   -11  +1
            CTC    -3   -6   47   ±0   +10  +2
            CTA    +4   +5   91   -2   -2   ±0
            CTG    ±0   -2   14   +5   +2   -7
Pro         CCT    ±0   +5   16   ±0   -4   +5
            CCC    -1   -6   54   -3   -6   -16
            CCA    +3   +3   19   +4   +11  +13
            CCG    -3   -3   4    -4   ±0   -2
Ile         ATT    +8   +9   39   +4   +10  +19
            ATC    -10  -13  77   -4   -2   -17
Thr         ACT    +4   +4   19   +7   -1   +8
            ACC    -1   -2   59   -5   -6   -16
            ACA    ±0   +5   44   +4   +7   +14
            ACG    -4   -3   5    -5   +1   -2
Met         ATA    -1   ±0   71   -6   -17  -9
            ATG    +1   -1   13   +1   -2   -4
Val         GTT    +4   +4   9    ±0   -6   -3
            GTC    ±0   ±0   22   -2   +10  +4
            GTA    ±0   -1   29   ±0   -4   +3
            GTG    -1   +1   4    +1   -2   -1
Ala         GCT    -4   -2   20   -1   +2   -4
            GCC    +2   +3   35   +8   +6   +17
            GCA    ±0   -4   29   -1   +1   -7
            GCG    +1   -1   4    -2   -3   -3

The absolute number of codons used is given only for H (human), and the relative gains/losses in the remaining species are shown by ± signs. Codon usage patterns differ from lineage to lineage, but most changes occur between synonymous codons (e.g., 13 gains of TAT and 13 losses of TAC in gorilla). Notable exceptions are Ile, Met, Val, and His in the orangutan lineage (see text). Each of the underlined codons is read by a single tRNA through Watson-Crick pairing, which is most often used in a two-codon group, whereas the C residue rather than the A is often preferred at the third positions of four-codon groups. The anticodons indicated by the asterisks have modified Us in the first anticodon positions.
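The layout of Table 2, absolute counts for one reference species and signed gains/losses for the others, can be reproduced with a small Python sketch. The function names and the two toy coding sequences are hypothetical; real input would be the concatenated in-frame protein-coding genes.

```python
from collections import Counter


def codon_counts(cds):
    """Count in-frame codons; trailing bases that do not fill a codon are dropped."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def usage_relative_to(reference, other):
    """Signed gain/loss of each codon in `other` relative to `reference`."""
    codons = sorted(set(reference) | set(other))
    return {codon: other[codon] - reference[codon] for codon in codons}


# Hypothetical toy sequences: one TAC in the reference becomes TAT.
reference_cds = "TATTACTACTTC"
other_cds = "TATTATTACTTC"
ref_counts = codon_counts(reference_cds)
diff = usage_relative_to(ref_counts, codon_counts(other_cds))
for codon, change in diff.items():
    print(f"{codon}  reference={ref_counts[codon]}  change={change:+d}")
```

A gain in one codon matched by an equal loss in its synonymous partner, as in this toy case, is the pattern the table note describes for TAT and TAC in gorilla.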

Myr ago (see legend in Fig. 2 for this estimation method). ML obtained a little shorter divergence times than NJ for the same DATA1. This difference appears to be caused by the problematic synonymous changes in Leu codons. Indeed, if we base ML only on the tRNAs and the second codon positions, we have Tg = 7.6, Th = 5.3, and Tc = 2.8, which are close to those obtained by NJ. The dating from DATA3 does not differ greatly from that of DATA2, in both of which there is evidence for a molecular clock (Table 3). In terms of sampling errors, however, DATA3 is obviously most preferable (Takahata and Tajima 1991).

To conclude, we chose DATA3 and used the NJ dating above. That is, Tg = 7.7 ± 0.7 Myr ago, Th = 4.7 ± 0.5 Myr ago, and Tc = 2.5 ± 0.5 Myr ago. The Tg is similar to the estimates from DNA-DNA hybridization (Sibley and Ahlquist 1984; Caccone and Powell 1989; Sibley et al. 1990), the ψη-globin genes (Koop et al. 1986; Miyamoto et al. 1987; Goodman et al. 1990), and the rRNA gene (Gonzalez et al. 1990), whereas the Th is similar to the estimates from the rRNA gene (Gonzalez et al. 1990) and immunoglobulin-ε pseudogene (Ueda et al. 1989). The maximum likelihood estimates based on the 896-bp mtDNA (Brown et al. 1982) are much shorter, Tg = 5.1 Myr ago and Th = 3.9 Myr ago (Hasegawa and Kishino 1991), which is probably due to the relatively small region compared. The present dating is different from that for the COII gene (Ruvolo et al. 1991). Ruvolo et al. (1991) assumed Th = 6 Myr ago, which we wanted to know, and their estimate was largely based on the synonymous differences in this conserved gene, which we excluded. We agree, however, that human, chimpanzee, and gorilla were diversified from each other

Table 2. Continued

Amino acid  Codon  C    P    H    G    O    S
Tyr         TAT    +3   +6   10   +13  +3   +7
            TAC    -4   -7   37   -13  -6   -7
Cys         TGT    ±0   ±0   1    +1   ±0   ±0
            TGC    ±0   +1   3    ±0   ±0   -1
Ter         TAA    .    .    .    .    .    .
            TAG    .    .    .    .    .    .
*Trp        TGA    +1   ±0   35   ±0   -2   -3
            TGG    -1   ±0   3    +1   +2   +3
His         CAT    +3   +3   7    +3   ±0   +2
            CAC    -3   -3   26   -3   +5   -1
Arg         CGT    ±0   +1   5    -2   -3   ±0
            CGC    ±0   -2   9    ±0   +4   -1
            CGA    -1   -1   11   +1   -1   ±0
            CGG    +1   +1   0    ±0   ±0   ±0
*Gln        CAA    +4   +6   28   +1   ±0   +4
            CAG    -2   -4   5    ±0   +1   -2
Asn         AAT    +5   +3   15   +4   -1   -3
            AAC    -8   -4   42   -5   -4   +4
Ser         AGT    +1   +1   2    +1   -1   +1
            AGC    +2   ±0   10   -1   +2   -2
*Lys        AAA    +2   +2   33   -2   +3   -4
            AAG    -3   -3   5    +1   -4   ±0
Ter         AGA    .    .    .    .    .    .
            AGG    .    .    .    .    .    .
Asp         GAT    +2   ±0   5    +5   +1   +1
            GAC    -2   ±0   22   -4   +1   -1
Gly         GGT    -3   -2   10   +1   -1   ±0
            GGC    +5   +5   31   -2   +5   -1
            GGA    -4   -1   29   +1   -4   ±0
            GGG    +2   -2   5    ±0   ±0   +3
*Glu        GAA    +5   +3   22   +3   +1   ±0
            GAG    -5   -3   7    -3   -2   ±0

Table 3. Estimated branch lengths (in terms of the number of substitutions) based on the cladogram of hominoids in Fig. 2

Data set                 Method  (1,C)  (1,P)  (1,2)  (2,H)  (2,3)  (3,G)  (3,4)  (4,O)  (4,S)
Whole region (4759 bp)   MP      89.2   71.5   124.5  201.5  117.6  228.0  192.3  380.6  435.9
                         ML      91.1   75.0   118.8  235.1  95.8   281.2  191.8  450.5  538.0
                         NJ      89.0   76.3   111.1  232.9  51.5   286.9  115.5  443.6  492.5
DATA1 (3423 bp)          MP      20.2   16.8   24.5   45.5   37.3   63.8   65.5   158.5  145.0
                         ML      19.6   17.9   19.4   53.1   33.0   70.8   66.9   178.1  157.3
                         NJ      19.1   18.4   19.5   52.0   23.1   72.7   45.3   173.5  155.6
DATA2 (3789 bp)          NJ      16.6   17.7   16.2   38.3   18.0   59.8   41.9   152.0  141.1
DATA3 (4270 bp)          NJ      20.9   21.5   19.2   45.5   29.7   84.0   54.9   205.5  233.3

DATA1 consists of the tRNA, first codon, and second codon positions; DATA2 consists of the tRNA and nonsynonymous sites; and DATA3 includes VS3 in DATA2. For the whole region and DATA1, there is considerable variation in the estimates. Such variation is mostly due to frequent synonymous changes and reflects the difficulties in inferring multiple-hit substitutions accurately. All methods indicated that the evolutionary rate of the orangutan mtDNA is enhanced and that the synonymous substitution rate is somewhat retarded in the lineage leading to chimpanzees. However, DATA2 and DATA3 show rough constancy of evolutionary rates among human, gorilla, pygmy, and common chimpanzees. In any case, because there are heterogeneities in sequence differences from gene to gene and in nucleotide compositions, we had to deal with them. When using the DNAML in PHYLIP (Felsenstein 1990), we set an empirical base frequency for each region (codon positions or tRNAs) and looked for the ratio of transitions to transversions so as to maximize the likelihood (6.0 at the first and second codon positions and 13 for 11 tRNAs). To correct multiple-hit substitutions from synonymous and nonsynonymous differences in NJ (Saitou and Nei 1987), we also took into account the transition bias and heterogeneous substitution rates along sequences. After these region-by-region corrections, we computed the total number of substitutions for larger specified data sets. Abbreviations MP, ML, and NJ indicate the maximum parsimony, maximum likelihood, and neighbor-joining methods, respectively.
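The "rough constancy" of rates noted for DATA2 can be checked informally by summing branch lengths from node 3 (the common ancestor of human, chimpanzees, and gorilla in Fig. 2) to each tip. The sketch below does this for the DATA2 (NJ) row; the function name and dictionary keys are hypothetical, and this path-length comparison is an illustration rather than the clock test used in the paper.

```python
# Branch lengths (numbers of substitutions), DATA2 by NJ, from Table 3.
DATA2_NJ = {
    "1,C": 16.6, "1,P": 17.7, "1,2": 16.2,
    "2,H": 38.3, "2,3": 18.0, "3,G": 59.8,
}


def depths_from_node3(b):
    """Total branch length from node 3 to each African-ape and human tip."""
    return {
        "H": b["2,3"] + b["2,H"],
        "G": b["3,G"],
        "C": b["2,3"] + b["1,2"] + b["1,C"],
        "P": b["2,3"] + b["1,2"] + b["1,P"],
    }


depths = depths_from_node3(DATA2_NJ)
for tip, depth in depths.items():
    print(f"node 3 -> {tip}: {depth:.1f} substitutions")
```

Under a strict clock the four totals would be equal; here the human and gorilla paths are similar and the two chimpanzee paths are somewhat shorter, in keeping with the retarded chimpanzee rate mentioned in the note.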


with substantial separation times (Sibley and Ahlquist 1984; Gonzalez et al. 1990; Ruvolo et al. 1991). The time difference between gorilla and human thus appears to be 3 Myr and amounts to 200,000 generations if the generation time of the ancestral species was 15 years. Whether different estimates of this time period are attributed to ancestral polymorphism (Ruvolo et al. 1991), a potential cause of discordance between sequence genealogy and species relatedness, depends on the ancestral population size (Pamilo and Nei 1988; Takahata 1989). If it is of the same order of magnitude ($10^4$) as the estimated current human population size (Nei 1987), the effect must be small, and it is likely that mtDNA genealogy is identical to species relatedness (Takahata 1989). Note, however, that if the ancestral population size turns out to be on the order of $10^5$ (Takahata et al. 1992), molecular phylogeny of hominoids may differ from gene to gene. On the other hand, the time period since human first branched off amounts to some 300,000 generations. During these generations the hominid line additionally underwent at least four or five major changes in morphological traits (Pilbeam 1984; Lewin 1988; Stringer 1990). Our results do not clarify the causes and mechanisms of human evolution, but establishing the phylogeny and times provides the necessary background for such research.
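The dependence on ancestral population size can be made concrete with a short Python sketch. It converts the time spans to generations using the 15-year generation time given in the text, and then applies the standard neutral three-species result that a gene tree disagrees with the species tree with probability (2/3)exp(-T/2N) for a diploid nuclear locus. The choice of that formula is an assumption made for illustration (the text cites Pamilo and Nei 1988 and Takahata 1989 without stating a formula), and the function names are hypothetical.

```python
import math


def generations(myr, generation_years=15.0):
    """Number of generations in a span of `myr` million years."""
    return myr * 1e6 / generation_years


def discordance_probability(t_generations, n_effective):
    """Chance that a gene tree differs from the species tree.

    Assumes the standard neutral result for three species and a diploid
    nuclear locus, (2/3) * exp(-T / (2N)); for maternally inherited mtDNA
    the effective size is smaller, so the chance would be lower still.
    """
    return (2.0 / 3.0) * math.exp(-t_generations / (2.0 * n_effective))


t_internode = generations(3.0)  # gorilla split to human split
print(f"internode: {t_internode:,.0f} generations")
print(f"since human branched off: {generations(4.7):,.0f} generations")
for n_e in (1e4, 1e5):
    p = discordance_probability(t_internode, n_e)
    print(f"N = {n_e:.0e}: discordance probability = {p:.2g}")
```

With an ancestral size near $10^4$ the chance of discordance is negligible, whereas at $10^5$ it becomes appreciable, which is the contrast the paragraph draws.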

Acknowledgments. This paper is dedicated to the late Allan C. Wilson, who pioneered and made outstanding contributions to molecular systematics. We thank J.F. Crow and J. Klein for comments on an early version of this paper and S. Ueda and O. Takenaka for providing the gorilla and pygmy chimpanzee genomic DNA.

References

Anderson S, Bankier AT, Barrell BG, de Bruijn MHL, Coulson AR, Drouin J, Eperon IC, Nierlich DP, Roe BA, Sanger F, Schreier PH, Smith AJH, Staden R, Young IG (1981) Sequence and organization of the human mitochondrial genome. Nature 290:457-465

Anderson S, de Bruijn MHL, Coulson AR, Eperon IC, Sanger F, Young IG (1982) Complete sequence of bovine mitochondrial DNA: conserved features of the mammalian mitochondrial genome. J Mol Biol 156:683-717

Bibb MJ, Van Etten RA, Wright CT, Walberg MW, Clayton DA (1981) Sequence and gene organization of mouse mitochondrial DNA. Cell 26:167-180

Brown WM, Prager EM, Wang A, Wilson AC (1982) Mitochondrial DNA sequences of primates: tempo and mode of evolution. J Mol Evol 18:225-239

Caccone A, Powell JR (1989) DNA divergence among hominoids. Evolution 43:925-942

Darwin C (1859) The origin of species by means of natural selection. John Murray, London

Djian P, Green H (1989) Vectorial expansion of the involucrin gene and the relatedness of the hominoids. Proc Natl Acad Sci USA 86:8447-8451

Felsenstein J (1990) PHYLIP manual version 3.3. University Herbarium, University of California, Berkeley

Foran DR, Hixson JE, Brown WM (1988) Comparison of ape and human sequences that regulate mitochondrial DNA transcription and D-loop DNA synthesis. Nucleic Acids Res 16:5841-5861

Gibbons A (1990) Our chimp cousins get that much closer. Science 250:376

Gonzalez IL, Sylvester JE, Smith TF, Stambolian D, Schmickel RD (1990) Ribosomal RNA gene sequences and hominoid phylogeny. Mol Biol Evol 7:203-219

Goodman M, Braunitzer G, Stangl A, Schrank B (1983) Evidence on human origin from haemoglobins of African apes. Nature 303:546-548

Goodman M, Koop BF, Czelusniak J, Fitch DHA, Tagle DA, Slightom JL (1989) Molecular phylogeny of the family of apes and humans. Genome 31:316-335

Gould SJ (1980) Our natural place. In: Hen's teeth and horse's toes. W.W. Norton, New York, p 241

Hasegawa M, Kishino H (1991) DNA sequence analysis and evolution of Hominoidea. In: Kimura M, Takahata N (eds) New aspects of the genetics of molecular evolution. Springer-Verlag, Tokyo, p 303

Hixson JE, Brown WM (1986) A comparison of the small ribosomal RNA genes from the mitochondrial DNA of great apes and humans: sequence, structure, evolution, and phylogenetic implications. Mol Biol Evol 3:1-18

Huxley TH (1894) Evolution and ethics and other essays. D. Appleton, New York

Kimura M (1983) The neutral theory of molecular evolution. Cambridge University Press, Cambridge

Kimura M (1987) Molecular evolutionary clock and the neutral theory. J Mol Evol 26:24-33

Koop BF, Goodman M, Xu P, Chan K, Slightom JL (1986) Primate η-globin DNA sequences and man's place among the great apes. Nature 319:234-238

Koop BF, Tagle DA, Goodman M, Slightom JL (1989) A molecular view of primate phylogeny and important systematic and evolutionary questions. Mol Biol Evol 6:580-612

Kyte J, Doolittle RF (1982) A simple method for displaying the hydropathic character of a protein. J Mol Biol 157:105-132

Lewin R (1988) In the age of mankind. Smithsonian Book, Washington DC

Li W-H, Luo C-C, Wu C-I (1985) Evolution of DNA sequences. In: MacIntyre RJ (ed) Molecular evolutionary genetics. Plenum, New York, pp 1-94

Mellars P, Stringer C (1989) The human revolution: behavioral and biological perspectives on the origin of modern humans. Princeton University Press, Princeton NJ

Miyamoto MM, Slightom JL, Goodman M (1987) Phylogenetic relations of humans and African apes from DNA sequences in the ψη-globin region. Science 238:369-373

Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York

Pamilo P, Nei M (1988) Relationships between gene trees and species trees. Mol Biol Evol 5:568-583

Pilbeam DR (1984) The descent of hominoids and hominids. Sci Am 250:60-69

Ruvolo M, Disotell TR, Allard MW, Brown WM, Honeycutt RL (1991) Resolution of the African hominoid trichotomy by use of a mitochondrial gene sequence. Proc Natl Acad Sci USA 88:1570-1574

Saiki RK, Gelfand DH, Stoffel S, Scharf SH, Higuchi R, Horn GT, Mullis KB, Erlich HA (1988) Primer-directed enzymatic amplifications of DNA with a thermostable DNA polymerase. Science 239:487-491

Saitou N, Nei M (1987) The neighbour-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406-425

Sanger F, Nicklen S, Coulson AR (1977) DNA sequence with chain terminating inhibitors. Proc Natl Acad Sci USA 74:5463-5467

Sarich VM, Wilson AC (1967) Immunological time scale for hominoid evolution. Science 158:1200-1203

Sibley CG, Ahlquist JE (1984) The phylogeny of the hominoid primates, as indicated by DNA hybridization. J Mol Evol 20:2-15

Sibley CG, Comstock JA, Ahlquist JE (1990) DNA hybridization evidence of hominoid phylogeny: a reanalysis of the data. J Mol Evol 30:202-236

Stringer CB (1990) The emergence of modern humans. Sci Am 263:68-74

Takahata N (1989) Gene genealogy in three related populations: consistency probability between gene and population trees. Genetics 122:957-966


Takahata N, Tajima F (1991) Sampling errors in phylogeny. Mol Biol Evol 8:494-502

Takahata N, Satta Y, Klein J (1992) Polymorphism and balancing selection at the major histocompatibility complex loci. Genetics 130:925-938

Ueda S, Watanabe Y, Saitou N, Omoto K, Hayashida H, Miyata T, Hisajima H, Honjo T (1989) Nucleotide sequences of immunoglobulin-epsilon pseudogenes in man and apes and their phylogenetic relationships. J Mol Biol 205:85-90

Zuckerkandl E, Pauling L (1965) Evolutionary divergence and convergence in proteins. In: Bryson V, Vogel HJ (eds) Evolving genes and proteins. Academic Press, New York, pp 97-166

Received September 8, 1991/Revised and accepted February 22, 1992
