The Draft Genome of Ciona intestinalis: Insights into Chordate and Vertebrate Origins


Paramvir Dehal,1* Yutaka Satou,2* Robert K. Campbell,3,4 Jarrod Chapman,1 Bernard Degnan,5 Anthony De Tomaso,6 Brad Davidson,7 Anna Di Gregorio,7 Maarten Gelpke,1 David M. Goodstein,1 Naoe Harafuji,7 Kenneth E. M. Hastings,8 Isaac Ho,1 Kohji Hotta,9 Wayne Huang,1 Takeshi Kawashima,10 Patrick Lemaire,11 Diego Martinez,1 Ian A. Meinertzhagen,12 Simona Necula,1 Masaru Nonaka,13 Nik Putnam,1 Sam Rash,1 Hidetoshi Saiga,14 Masanobu Satake,15 Astrid Terry,1 Lixy Yamada,2 Hong-Gang Wang,16 Satoko Awazu,2 Kaoru Azumi,17 Jeffrey Boore,1 Margherita Branno,18 Stephen Chin-bow,19 Rosaria DeSantis,18 Sharon Doyle,1 Pilar Francino,1 David N. Keys,1,7 Shinobu Haga,9 Hiroko Hayashi,9 Kyosuke Hino,2 Kaoru S. Imai,2 Kazuo Inaba,20 Shungo Kano,2,18 Kenji Kobayashi,2 Mari Kobayashi,2 Byung-In Lee,1 Kazuhiro W. Makabe,2 Chitra Manohar,1 Giorgio Matassi,18 Monica Medina,1 Yasuaki Mochizuki,2 Steve Mount,21 Tomomi Morishita,9 Sachiko Miura,9 Akie Nakayama,2 Satoko Nishizaka,9 Hisayo Nomoto,9 Fumiko Ohta,9 Kazuko Oishi,9 Isidore Rigoutsos,19 Masako Sano,9 Akane Sasaki,2 Yasunori Sasakura,2 Eiichi Shoguchi,2 Tadasu Shin-i,9 Antoinetta Spagnuolo,18 Didier Stainier,22 Miho M. Suzuki,23 Olivier Tassy,11 Naohito Takatori,2 Miki Tokuoka,2 Kasumi Yagi,2 Fumiko Yoshizaki,13 Shuichi Wada,2 Cindy Zhang,1 P. Douglas Hyatt,24 Frank Larimer,24 Chris Detter,1 Norman Doggett,25 Tijana Glavina,1 Trevor Hawkins,1 Paul Richardson,1 Susan Lucas,1 Yuji Kohara,9† Michael Levine,7,26† Nori Satoh,2† Daniel S. Rokhsar1,7,26†

The first chordates appear in the fossil record at the time of the Cambrian explosion, nearly 550 million years ago. The modern ascidian tadpole represents a plausible approximation to these ancestral chordates. To illuminate the origins of chordates and vertebrates, we generated a draft of the protein-coding portion of the genome of the most studied ascidian, Ciona intestinalis. The Ciona genome contains ~16,000 protein-coding genes, similar to the number in other invertebrates, but only half that found in vertebrates. Vertebrate gene families are typically found in simplified form in Ciona, suggesting that ascidians contain the basic ancestral complement of genes involved in cell signaling and development. The ascidian genome has also acquired a number of lineage-specific innovations, including a group of genes engaged in cellulose metabolism that are related to those in bacteria and fungi.

Introduction

There has been considerable debate regarding the evolutionary origins of the chordates and, within them, of vertebrates, and the relationships between the first chordates and invertebrate deuterostomes (1–3). There is a consensus that the three major deuterostome phyla (echinoderms, hemichordates, and chordates) arose from a common ancestor more than 550 million years ago. The chordates subsequently diverged into three subphyla: urochordates (also known as tunicates), cephalochordates, and vertebrates (Fig. 1). It is generally believed that cephalochordates and vertebrates are sister groups, with the urochordate lineage most basal among the chordates.

The most prevalent modern urochordates are the ascidians, known familiarly as sea squirts. The analysis of ascidians offers the simultaneous prospects of providing insights into two compelling problems in evolutionary biology: the origins of the chordates from an ancestral deuterostome, and the origins of the vertebrates from a simple chordate.

Ascidians, or sea squirts, are sessile, hermaphroditic marine invertebrates that live in shallow ocean waters around the world. The adults are simple filter feeders encased in a fibrous tunic supporting incurrent and outcurrent siphons, and were classified by Aristotle as related to mollusks (Fig. 2A). The close kinship of ascidians with vertebrates was discovered over 130 years ago by Kowalevsky (4), who recognized that the ascidian larva has the general appearance of a vertebrate tadpole (Fig. 2L). Most notably, the tail of the ascidian tadpole contains a prominent notochord and a dorsal tubular nerve cord. These findings elevated ascidians to the ranks of the chordates and convinced Darwin that the ascidian tadpole is a legitimate relative of vertebrates, including humans (5).

1 U.S. Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA. 2 Department of Zoology, Graduate School of Science, Kyoto University, Kyoto 606-8502, Japan. 3 Marine Biological Laboratories, Woods Hole, MA 02543, USA. 4 Serono Reproductive Biology Institute, One Technology Place, Rockland, MA 02370, USA. 5 Department of Zoology and Entomology, University of Queensland, Brisbane, Qld 4072, Australia. 6 Department of Pathology, Stanford University School of Medicine, Stanford, CA 93950, USA. 7 Department of Molecular and Cellular Biology, Division of Genetics, 401 Barker Hall, University of California, Berkeley, CA 94720, USA. 8 Montreal Neurological Institute, McGill University, Montreal, H3A 2T5, Canada. 9 National Institute of Genetics, Mishima 411-8540, Japan. 10 Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan. 11 LGPD, IBDM, Case 907, Campus de Luminy, F-13288 Marseille Cedex 09, France. 12 Life Sciences Centre, Dalhousie University, Halifax, Nova Scotia, B3H 4J1, Canada. 13 Department of Biological Sciences, Graduate School of Science, University of Tokyo, Tokyo 113-0033, Japan. 14 Department of Biological Sciences, Graduate School of Science, Tokyo Metropolitan University, Hachiohji, Tokyo 192-0397, Japan. 15 Department of Molecular Immunology, Institute of Development, Aging and Cancer, Tohoku University, Seiryo-machi, Aoba-ku, Sendai 980-8575, Japan. 16 Drug Discovery Program, Moffitt Cancer Center, 12902 Magnolia Drive, Tampa, FL 33612, USA. 17 Department of Biochemistry, Graduate School of Pharmaceutical Sciences, Hokkaido University, Sapporo 060-0812, Japan. 18 Stazione Zoologica “Anton Dohrn,” Villa Comunale, 80121 Napoli, Italy. 19 IBM Watson Research Center, Post Office Box 218, Yorktown Heights, NY 10598, USA. 20 Asamushi Marine Biological Station, Graduate School of Science, Tohoku University, Asamushi, Aomori 039-3501, Japan. 21 University of Maryland, College Park, MD 20742, USA. 22 Department of Biochemistry and Biophysics, Programs in Developmental Biology, Genetics, and Human Genetics, University of California, San Francisco, CA 94143, USA. 23 Wellcome Trust Centre for Cell Biology, Institute of Cell and Molecular Biology, University of Edinburgh, Edinburgh EH9 3JR, UK. 24 Oak Ridge National Laboratory, Post Office Box 2008, Oak Ridge, TN 37831, USA. 25 Los Alamos National Laboratory, Los Alamos, NM 87545, USA. 26 Center for Integrative Genomics, University of California, Berkeley 94720, USA.

*These authors contributed equally to this work.
†To whom correspondence should be addressed. E-mail: [email protected] (Y.K.), [email protected] (M.L.), [email protected] (N.S.), [email protected] (D.R.)

RESEARCH ARTICLES

www.sciencemag.org SCIENCE VOL 298 13 DECEMBER 2002 2157

CORRECTED 30 MAY 2003; SEE LAST PAGE

In addition to their unique evolutionary position as invertebrate chordates, ascidians provide a simple experimental system to investigate the molecular mechanisms underlying cell-fate specification during chordate development (6–10). First, the ascidian tadpole is composed of only about 2500 cells, and there is extensive information on the cell lineage of most major tissues and organs (11, 12). Second, the blastomeres of early embryos are large and easy to manipulate; they permit the detailed visualization of differential gene expression during development (6–10). Third, embryogenesis is rapid (~18 hours from fertilization to a free-swimming tadpole at 18°C) (Fig. 2, B to L) and the entire life cycle takes less than 3 months, thereby facilitating genetic analyses (13, 14). Finally, it is possible to introduce transgenic DNAs into developing embryos by using simple electroporation methods (15, 16) that permit the simultaneous transformation of hundreds or even thousands of synchronously developing embryos. The method has been used to characterize cis-regulatory DNAs (15, 16), and produce mutant phenotypes by misexpressing or overexpressing a variety of regulatory genes that encode transcription factors (17) or signaling molecules (18).

The genome of the ascidian Ciona intestinalis is among the smallest of any experimentally accessible chordate; at an estimated 160 million base pairs (19), it is nearly 20 times smaller than the human genome. Among the chordates, only larvaceans (another group of urochordates) have a smaller genome (20), but they are not easily manipulated using modern molecular methods. It has been proposed that large-scale gene duplications occurred in the vertebrate lineage after its divergence from cephalochordates and urochordates (21). The ensuing increase in gene number seems to underlie some of the novel and increasingly complex developmental processes seen in vertebrates. These expansions suggest that the more basal urochordates and cephalochordates contain an approximation to the “ancestral complement” of nonduplicated chordate genes. Here we report the draft sequence of the protein-encoding portion of the C. intestinalis genome and compare it with the gene complement of other animals to gain insight into the evolutionary origins of chordates and vertebrates.

Sequencing and Global Analysis of the Ciona intestinalis Genome

To sequence the Ciona genome, we used a whole-genome shotgun approach (22, 23). Most of the sequence data in this study was derived from a single individual in Half Moon Bay, California. Sperm was isolated from this individual, and DNA was extracted and sheared to create 3-kb fragments, which were subsequently cloned into plasmids and end-sequenced to cover the genome eightfold (24). More than 2 million sequence fragments were obtained. Bacterial artificial chromosome (BAC) and cosmid libraries were end-sequenced to 0.2× genome coverage to provide longer range linking information. The DNA for these libraries was obtained from a pool of Japanese individuals (BAC) and a different Californian (cosmid).

A high level of allelic polymorphism was found in the sequences from the single individual, with 1.2% of the nucleotides differing between alleles, nearly 15-fold more than in humans (24) and three-fold more than in fugu (22, 23). Allelic polymorphism is easily distinguished from sequencing error, because at eightfold shotgun coverage, each allele is redundantly sampled by multiple shotgun fragments (Fig. 3). Observed variations include single-nucleotide polymorphisms (SNPs), as well as small insertions and deletions. The variation is not uniform, with local peaks as high as 10 to 15% polymorphic sites within a window of 100 nucleotides. The high degree of allelic variation is in part a consequence of the large effective population size that results from the breeding mode of ascidians, which release sperm and eggs into the ocean for external fertilization.
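The running-window measurement described above (and plotted in Fig. 3A) can be sketched as follows. This is a minimal illustration, not the analysis code used for the paper: the function name `window_polymorphism` and the toy variant positions are hypothetical, and the only value taken from the text is the 100-nucleotide window.

```python
def window_polymorphism(variable_sites, seq_length, window=100):
    """Return the fraction of polymorphic positions in each running window.

    variable_sites: iterable of 0-based positions that differ between alleles.
    seq_length:     length of the consensus segment being scanned.
    window:         window width in nucleotides (100 in the text).
    """
    is_variable = [0] * seq_length
    for pos in variable_sites:
        is_variable[pos] = 1

    if seq_length < window:
        return []

    # Count the first window, then slide one nucleotide at a time,
    # adding the site that enters and subtracting the site that leaves.
    count = sum(is_variable[:window])
    fractions = [count / window]
    for start in range(1, seq_length - window + 1):
        count += is_variable[start + window - 1] - is_variable[start - 1]
        fractions.append(count / window)
    return fractions


# Toy data (hypothetical): a 300-nt segment with one cluster of variants.
toy_sites = [5, 120, 125, 130, 131, 140, 150, 155, 160, 170, 180, 190, 250]
rates = window_polymorphism(toy_sites, seq_length=300)
peak = max(rates)  # highest local polymorphism rate in any 100-nt window
```

In this toy segment the densest window holds 11 variable sites, so the local peak is 11%, which falls in the 10 to 15% range the text reports for the most polymorphic windows.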

Genome assembly was carried out using the JAZZ suite of assembly tools developed for large whole-genome shotgun projects (23–25). The output of JAZZ is a set of sequence “contigs” (segments of the genome reconstructed from overlapping shotgun fragments) linked together into sequence “scaffolds” (reconstructed sequence with internal gaps of measurable size) (22). For haploid genomes, or diploid genomes with a low rate of polymorphism (like human, or inbred strains of laboratory animals like flies and mice), overlapping shotgun fragments derived from a given region of the genome differ only by random sequencing error. The extensive polymorphisms found in pufferfish (0.4%) (23) and Ciona (1.2%), however, produce higher levels of sequence mismatch (Fig. 3). To overcome this problem we tolerated imperfect alignments that were consistent with the order and orientation of the paired end sequences derived from each plasmid insert (23–25).

Fig. 1. Phylogeny of bilaterian animals (1–3). Ciona intestinalis is a member of the urochordates, the most primitively branching clade of chordates. Chordates in turn are deuterostomes, one of two great divisions (along with protostomes) of bilaterian animals.

Fig. 2. The sea squirt Ciona intestinalis. (A) Adults with incurrent and outcurrent siphons. The white duct is the sperm duct, while the orange duct paralleling it is the egg duct. (B to L) Embryogenesis. (B) Fertilized egg, (C) 2-cell embryo, (D) 4-cell embryo, (E) 16-cell embryo, (F) 32-cell embryo, (G) gastrula (about 150 cells), (H) neurula, (I to K) tailbud embryos, and (L) tadpole larva. Embryos were dechorionated to show their outer morphology clearly. (M) A juvenile a few days after metamorphosis, showing the internal structures. ds, digestive system; en, endostyle; ht, heart; os, neuronal complex; and pg, pharyngeal gill.

The assembled sequence is a mosaic of the maternal and paternal haplotypes of the sequenced individual. Using the consensus as a reference, one can extract long stretches of either haplotype (Fig. 3), as will be described elsewhere (25). The distinctiveness of the two haplotypes raises the concern that some highly polymorphic regions of the genome could be represented twice in the assembly. To eliminate any residual redundancy, we aligned the initial assembly with itself using BLASTN and removed small sequence scaffolds that contain more than 95% identity with other scaffolds. These identified artifacts typically involved small scaffolds (shorter than 3 kb). For uniformity we removed all scaffolds shorter than 3 kb from the present assembly release, which is deposited in GenBank under project accession number AABS00000000, version AABS01000000. The removed scaffolds are available at www.jgi.doe.gov/ciona. After removing these identified duplications from the assembly, we estimate that up to 2 to 3% sequence redundancy persists. In some cases, short, ~500-base pair (bp) duplications are present on either side of an intrascaffold gap due to allelic variation; this problem has been corrected in subsequent assemblies, which are continually improving with additional data and algorithmic developments. Updated assemblies are available at www.jgi.doe.gov/ciona.
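The redundancy screen just described can be sketched as a simple filter. The 95% identity threshold and the 3-kb length cutoff come from the text; everything else is an assumption made for illustration: the `Scaffold` record, the `best_identity` field (imagined as parsed from a BLASTN self-alignment), and the `small_cutoff_bp` parameter that stands in for the unstated definition of a "small" scaffold.

```python
from dataclasses import dataclass

MIN_LENGTH_BP = 3000   # scaffolds shorter than this are dropped (from the text)
MAX_IDENTITY = 0.95    # identity threshold for redundancy (from the text)


@dataclass
class Scaffold:
    name: str
    length_bp: int
    # Best identity to any *other* scaffold; how this is derived from the
    # self-alignment is an assumption and is not specified here.
    best_identity: float


def filter_scaffolds(scaffolds, small_cutoff_bp=10_000):
    """Split scaffolds into kept names and removed names with a reason.

    small_cutoff_bp is a hypothetical stand-in for "small"; the text only
    says that the redundant scaffolds were typically shorter than 3 kb.
    """
    kept, removed = [], {}
    for s in scaffolds:
        if s.length_bp < MIN_LENGTH_BP:
            removed[s.name] = "shorter than 3 kb"
        elif s.length_bp < small_cutoff_bp and s.best_identity > MAX_IDENTITY:
            removed[s.name] = "redundant small scaffold"
        else:
            kept.append(s.name)
    return kept, removed


# Hypothetical example scaffolds.
example = [
    Scaffold("s1", 250_000, 0.40),
    Scaffold("s2", 2_000, 0.99),
    Scaffold("s3", 2_500, 0.10),
    Scaffold("s4", 8_000, 0.97),
    Scaffold("s5", 8_000, 0.90),
]
kept, removed = filter_scaffolds(example)
```

Applying the length rule first mirrors the statement that all scaffolds under 3 kb were removed for uniformity, whether or not they were flagged as redundant.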

The C. intestinalis assembly presented here spans 116.7 Mbp of nonrepetitive sequence in 2501 scaffolds longer than 3 kbp. Sixty Mbp, or about half the assembly, is reconstructed in only 177 scaffolds longer than 190 kbp. More than 85% of the assembled sequence (total of 104.1 Mbp) is found in 905 scaffolds longer than 20 kb, and therefore longer than the span of a typical gene (5 to 10 kbp). These scaffolds include 4264 contigs that account for 100.9 Mbp. There are 3359 captured gaps that total 3.2 Mbp, which account for less than 3% of the net scaffold sequence. Families of high-copy-number tandem repeats identified from initial sample sequencing (ribosomal RNAs, clusters of tRNAs, and so forth) were withheld from assembly (24) and account for nearly 11% of the raw data, for an additional 16 to 18 Mbp of genome sequence; these are typically problematic to assemble. Another 15 to 17 Mbp of genome sequence (10% of the raw shotgun data) remains unassembled. This shotgun sequence contains additional low-copy transposable elements and repeats, and a relatively small amount of unique sequence. We estimate (24) that the C. intestinalis genome is ~153 to 159 Mbp in total length, in accord with earlier estimates (19). The genome is notably AT rich (65%) compared with the human genome (26, 27). Available mapping data do not yet allow us to place the assembled sequence on Ciona’s 14 chromosomes.
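The assembly figures in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only. The numbers are those quoted in the text; the `n50` helper is a generic contiguity statistic added here as an assumption about how a statement like "half the assembly is in scaffolds longer than 190 kbp" would be computed from a list of scaffold lengths, which the text does not provide.

```python
def n50(lengths):
    """Smallest length L such that pieces of length >= L hold at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Figures quoted in the text (Mbp unless noted).
assembled_mbp = 116.7
large_scaffold_mbp = 104.1     # in scaffolds longer than 20 kb
n_large_scaffolds = 905
n_contigs = 4264
contig_mbp = 100.9
gap_mbp = 3.2

# Share of the assembly held in scaffolds longer than 20 kb.
fraction_in_large = large_scaffold_mbp / assembled_mbp

# Each scaffold with k contigs has k - 1 internal gaps, so the total number
# of captured gaps is contigs minus scaffolds.
n_gaps = n_contigs - n_large_scaffolds

# Contig sequence plus gap sequence should add up to the scaffold span.
span_check = round(contig_mbp + gap_mbp, 1)

# Toy scaffold lengths (hypothetical) to show how n50 behaves.
toy_n50 = n50([10, 8, 6, 4, 2])
```

The contig and gap counts are internally consistent: 4264 contigs in 905 scaffolds imply 3359 gaps, and 100.9 Mbp of contigs plus 3.2 Mbp of gaps gives the stated 104.1 Mbp.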

We can confirm the completeness of the euchromatic assembly by comparing it with other samplings of Ciona sequence. From the current collection of 5647 full-length cDNA sequences from the Kyoto project, 5359 aligned with the assembled sequence using BLAST (28, 29). These results suggest that 95% of the protein-coding genes are contained in the assembly. Similarly, 146 of 151 (96.7%) previously known Ciona genes were recovered (24). These figures are common for draft whole-genome shotgun projects, because certain segments of the genome cannot be cloned and/or sequenced using present methods, or remain unassembled. Because the assembly captures about 95% of genes, a gene missing from the current assembly is probably, though not certainly, absent from the genome itself. Two of the 151 known genes appear to be misassembled in the present draft, suggesting that a few hundred genes might be interrupted in this manner.
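The completeness estimates above follow directly from the quoted counts. The short calculation below is a sketch of that arithmetic. The extrapolation of misassembled genes assumes that the 151 known genes are a representative sample and uses the total of 15,852 gene models reported in the gene complement section of the paper.

```python
# Counts quoted in the text.
cdna_total, cdna_aligned = 5647, 5359
known_total, known_recovered, known_misassembled = 151, 146, 2

# Total gene models, as reported in the gene complement section.
gene_models = 15852

# Fraction of full-length cDNAs found in the assembly.
cdna_recovery = cdna_aligned / cdna_total

# Fraction of previously known genes recovered.
known_recovery = known_recovered / known_total

# Scaling the misassembly rate among known genes to the whole gene set
# (assumes the known genes are representative).
expected_interrupted = gene_models * known_misassembled / known_total
```

The two recovery rates come out near 95% and 97%, and the extrapolation gives roughly 210 interrupted genes, in line with the phrase "a few hundred".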

Annotation of gene content

The genome sequence is complemented by an extensive collection of over 480,000 5′ and 3′ expressed sequence tags (ESTs) and cDNAs produced from a large-scale EST project in Kyoto (http://ghost.zool.kyoto-u.ac.jp/indexr1.html; see also supporting online material). These were essential for gene identification and provide a resource for future functional studies. ESTs are available from cDNA libraries derived from 10 different developmental stages or adult tissues, including fertilized egg, early cleaving embryos, gastrula/neurula-stage embryos, tailbud embryos, mature larvae, and whole juvenile adults, as well as adult heart, neural complex, endostyle, and testis (28, 29). A partially redundant set of 5647 cDNA clones have been sequenced by directed walking.

To model genes in the Ciona genome, the assembled draft sequence was translated in all reading frames and compared with that of all proteins in GenBank using ungapped BLASTX (30) with default parameters. Best hits at each locus were selected using a parsing scheme that finds collinear sequence-pair alignments with the highest sum of scores. To augment the EST collection, gene models based on these homologies with known proteins were created with GeneWise (31). The resulting partial gene models are typically incomplete at the 5′ and 3′ ends due to reduced amino acid conservation near the beginning and end of proteins. The cDNA and EST sequences were combined with these partial homology-based models to obtain a more complete set of gene models. Ultimately, each predicted peptide was analyzed for domain content using InterPro (32) and aligned with the human, fly, and worm proteomes along with the remainder of the GenBank proteins.
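Translation "in all reading frames" means three offsets on each strand, six frames in total. The following is a minimal sketch of that step, not the BLASTX implementation: it assumes the standard nuclear genetic code, marks incomplete or ambiguous codons as "X", and uses hypothetical function names.

```python
from itertools import product

# Standard genetic code, codons enumerated with base order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map frame labels (+1..+3, -1..-3) to translated peptides."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


frames = six_frame_translations("ATGGCCTAA")
```

For the nine-base toy input, frame +1 reads Met-Ala-stop. A search tool would compare each of the six peptides against the protein database and keep the best-scoring collinear hits at each locus.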

Fig. 3. Allelic polymorphism in the C. intestinalis genome. (A) The number of variable nucleotides in a running 100-bp window across a 100-kb segment of genomic sequence. Note the wide variation in polymorphism rate on this scale. (B) Multiple sequence alignment of six shotgun sequence fragments showing sequence-level variation in the blue box in (A). The top line (“c”) represents the assembled consensus sequence, with polymorphic positions highlighted in red and by “#” above. Each subsequent line shows individual shotgun-sequencing fragments aligned with the consensus. Dots indicate match to the consensus at high-quality positions, and “o” indicates matches to the consensus at low-quality positions; ACTG indicates discrepancies with consensus, and dashes indicate nucleotides that are in the consensus but missing from a sequence fragment. Evidently the first four fragments come from one haplotype (“A”) and the next two come from the other (“B”), because the discrepancies with the consensus fall into two distinct sets. Note the presence of single-nucleotide variation, and insertions or deletions of various lengths.

An annotation workshop (33) was held at the U.S. Department of Energy Joint Genome Institute to bring together experts from the worldwide ascidian research community to review gene models, assign putative names and functional annotations, and examine the Ciona genome from the perspective of metabolic and regulatory networks and pathways. Automated analyses were refined by inspection of gene families and alignments with known genes in other organisms at the workshop and subsequently in a distributed collaboration over the Web. To date, nearly 2000 genes have been examined in this manner. This work is ongoing and can be accessed (and contributed to) at www.jgi.doe.gov/ciona.

Gene complement and global comparisons

A total of 15,852 distinct gene models were obtained. These models have been stringently screened to eliminate any residual redundancies that may remain in the assembly (24), which may also have the effect of removing copies of some recently duplicated genes. Genes that span the boundaries of two scaffolds are counted twice in this tally; by examining known genes we estimate that 5% of the genes may be split in this manner (24). The gene complement in this simple chordate is thus comparable in size to that of the other completely sequenced invertebrates [14,000 in fruit flies (22) and 19,000 in nematodes (34)] and roughly half the number in vertebrates [31,000 for pufferfish (23) and 30,000 to 35,000 for human (26, 27)]. Three-quarters of the predicted Ciona genes are supported by EST evidence.

The overall organization of the protein-coding genes in the Ciona genome is intermediate between that of protostomes and vertebrates. Ciona genes are generally compact and densely packed. The average gene density is somewhat higher than in Drosophila (one gene per 7.5 kb versus one per 9 kb in fly), and far greater than the density in human (one gene per 100 kb). Ciona genes contain an average of 6.8 exons per gene (versus 5 in Drosophila and 8.8 in human). Notable large genes in Ciona include the putative ortholog of the vertebrate cardiac ryanodine receptor, which spans 63 kb and encodes a protein of 4978 amino acids in 113 exons that matches its human counterpart at 51% of amino acids over the entire length of both peptides. The intron size distribution (24) has a sharp peak near 60 bp, compared with 87 bp in human and 59 bp in fly. Ciona exhibits a broad second peak near 300 bp that accounts for the majority of introns, in contrast to the monotonic decrease in the intron size distribution observed in fly and human. The exon size distribution (24) peaks at 125 bp, close to the human peak and somewhat less than the Drosophila peak at 150 bp.
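The ryanodine receptor figures give a quick worked example of gene structure. The calculation below is a rough sketch under stated simplifications: it ignores untranslated regions and the stop codon, treats all 113 exons as coding, and assumes the 63-kb span consists only of those exons and the 112 introns between them.

```python
# Figures quoted in the text for the putative ryanodine receptor ortholog.
gene_span_bp = 63_000
protein_length_aa = 4978
exon_count = 113

# Three nucleotides per amino acid (stop codon and UTRs ignored).
coding_bp = protein_length_aa * 3

# Average exon and intron sizes under the simplifications above.
mean_exon_bp = coding_bp / exon_count
intron_count = exon_count - 1
mean_intron_bp = (gene_span_bp - coding_bp) / intron_count

# Share of the gene span that is coding sequence.
coding_fraction = coding_bp / gene_span_bp

# Gene density comparison from the text: one gene per 7.5 kb in Ciona
# versus one per 100 kb in human.
density_ratio = 100 / 7.5
```

The mean exon comes out near 132 bp, close to the 125-bp peak of the exon size distribution, while the mean intron of roughly 430 bp sits above the broad 300-bp peak, as expected for an unusually large gene. The density figures imply that Ciona genes are packed about 13 times more tightly than human genes.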

The draft Ciona genome allows us to infer gene origins by comparison with the genomes of flies, nematodes, pufferfish, and mammals. Nearly 60% of predicted Ciona gene models (9883) have a detectable protostome homolog, as found by Smith-Waterman alignment of the predicted Ciona protein over more than 60% of the length of fly and/or nematode peptides (24). These are presumably ancient bilaterian genes, with core physiological and/or developmental roles common to all animals.

A few hundred Ciona genes have stronger similarity to fly and/or worm genes than to any genes in the current vertebrate set (24). These are candidates for ancient bilaterian genes that were preserved through the ancestral chordate but lost along the vertebrate lineage. Alternative hypotheses are, of course, that these genes have evolved rapidly in vertebrates and no longer have recognizable similarity, or that these genes have been horizontally transferred from protostomes to ascidians. These are mostly genes of unknown function in fly and worm, but include genes encoding chitin synthase, phytochelatin synthase (involved in metal detoxification), and hemocyanin (the primary oxygen carrier in protostomes).

Conversely, nearly one-sixth (2570) of Ciona genes lack a clear protostome homolog yet possess a recognizable vertebrate counterpart as measured by a Smith-Waterman alignment over 60% of the length of a fugu or human protein. Given the phylogeny shown in Fig. 1, the most straightforward interpretation is that these genes arose in their modern form on the deuterostome branch some time before the last common chordate ancestor. (An alternative hypothesis is that these genes are of bilaterian origin but have rapidly diverged after the split between protostomes and deuterostomes, or were even lost in flies and nematodes.) Because the known deuterostome genes are essentially all from chordates, we cannot know at this time how many genes in this set are chordate- rather than deuterostome-specific. This ambiguity will be partly removed when the sea urchin genome becomes available (35), which will allow more accurate assessment of the origins of these genes along the deuterostome branch.

Finally, nearly one-fifth of the present set of Ciona genes (3399) have no clear homolog in fly, worm, pufferfish, or human by the criterion of an alignment over 60% of the target protein (24). Some of these are no doubt simply poorly modeled or fragmentary genes. Most (80%) are supported by one or more ESTs from the Kyoto set, however, and 72% have partial alignments with known proteins (BLOSUM62 score above 150, but aligning over less than 60% of the target protein). Although some of these genes will no doubt join the protostome or chordate ranks upon further refinement of the genome, this set contains candidates for tunicate- or ascidian-specific genes (i.e., emerging along the urochordate lineage since its divergence from the common ancestor with vertebrates), and/or rapidly evolving genes whose homologs cannot be reliably detected using simple alignment methods.
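The three-way partition of the gene set described in the preceding paragraphs can be sketched as a simple decision rule on alignment coverage. The 60% coverage criterion and the three reported counts come from the text; the function name, the category labels, and the idea of passing in a precomputed best coverage per target group are assumptions made for illustration.

```python
COVERAGE_CUTOFF = 0.60  # alignment must span more than 60% of the target protein


def classify_gene(protostome_coverage, vertebrate_coverage):
    """Assign a gene model to one of the three classes discussed in the text.

    Each argument is the best fraction of a target protein covered by a
    Smith-Waterman alignment (fly/nematode and fugu/human respectively).
    """
    if protostome_coverage > COVERAGE_CUTOFF:
        return "bilaterian"
    if vertebrate_coverage > COVERAGE_CUTOFF:
        return "deuterostome_or_chordate"
    return "no_clear_homolog"


# Counts reported in the text for each class.
reported = {
    "bilaterian": 9883,
    "deuterostome_or_chordate": 2570,
    "no_clear_homolog": 3399,
}
total_models = sum(reported.values())
shares = {label: count / total_models for label, count in reported.items()}
```

The three reported classes add up to the 15,852 gene models, with shares of roughly 62%, 16%, and 21%, so the partition covers the gene set exactly once.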

The Ciona gene set can also be used as a point of comparison to identify genes in vertebrates that appeared between the base of the chordate clade and the radiation of bony vertebrates, as represented by teleost fish and tetrapods. These are genes found in fugu and humans, but absent in Ciona, and include a wide range of neural genes, as discussed below. Other vertebrate-specific innovations that apparently emerged in the interval between the divergence of the urochordates and the tetrapods include the bulk of the core genes of the adaptive immune system. The tumor necrosis factor (TNF) gene family is widely expanded in vertebrates relative to Ciona, as are the genes involved in steroid hormone metabolism, as described below.

We used InterProScan (36) to assess the domain content of the C. intestinalis proteome, and compared it with that of other animal genomes (24). Notable expansions in humans relative to Ciona are found in genes containing zinc-finger DNA binding domains, immunoglobulin domains, and the rhodopsin-like heterotrimeric GTP-binding protein (G protein)–coupled receptor family. The marked expansion of immunoglobulins and sensory transmembrane proteins reflects the two most dramatic innovations in the vertebrate lineage: adaptive immunity and the nervous system.

The overall twofold difference in gene number between ascidians and vertebrates is thought to reflect gene duplications in the vertebrate lineage since the common chordate ancestor (21). The extent to which modern ascidians represent the ancestral chordate depends in part on the extent to which Ciona genes can be unambiguously assigned to individual genes and gene families in vertebrates. Conversely, expansions in the ascidian lineage reflect divergences from the ancestral gene complement of chordates. To examine this question, we grouped Ciona and human genes into clusters representing gene families and subfamilies (24). This analysis demonstrates that there are many more lineage-specific duplications in vertebrates than in ascidians. Some examples of these are discussed below. These results suggest that the gene complement of Ciona is a reasonable approximation to that of the ancestral chordate.
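Once genes are grouped into families, the comparison of lineage-specific duplications reduces to counting copies per lineage. The sketch below is a simplified illustration under stated assumptions: the function name and category labels are hypothetical, the clustering step itself is not reproduced, and the three example families use only copy numbers quoted elsewhere in this article (one Ciona versus three vertebrate genes for TnI and for TnT, four Ciona versus one mammalian gene for Tbx6).

```python
from collections import Counter


def classify_family(n_ciona: int, n_vertebrate: int) -> str:
    """Label a gene family by its copy number in each lineage (illustrative labels)."""
    if n_ciona < 0 or n_vertebrate < 0 or (n_ciona == 0 and n_vertebrate == 0):
        raise ValueError("a family needs at least one gene and non-negative counts")
    if n_ciona == 0:
        return "vertebrate_only"
    if n_vertebrate == 0:
        return "ciona_only"
    if n_ciona == 1 and n_vertebrate == 1:
        return "one_to_one"
    if n_ciona == 1:
        return "vertebrate_expansion"   # one Ciona gene, several vertebrate paralogs
    if n_vertebrate == 1:
        return "ascidian_expansion"     # several Ciona genes, one vertebrate gene
    return "many_to_many"


# (Ciona copies, vertebrate copies); counts taken from the article text.
families = {
    "TnI": (1, 3),
    "TnT": (1, 3),
    "Tbx6": (4, 1),
}

tally = Counter(classify_family(c, v) for c, v in families.values())

if __name__ == "__main__":
    for label, count in sorted(tally.items()):
        print(label, count)
```

A genome-wide tally of this kind, in which `vertebrate_expansion` greatly outnumbers `ascidian_expansion`, is the pattern the paragraph above reports.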

Comparison of select gene families in ascidians and vertebrates

Vertebrate-specific gene duplication events are responsible for the increased gene number relative to protostomes such as Drosophila and Caenorhabditis elegans (21). But protostomes are often too distant from vertebrates to permit direct assignments of orthology on a gene-by-gene basis, which is confounded by protostome-specific divergences and duplications. When comparing Ciona to vertebrates, however, we often find that the sea squirt genome appears to represent the ancestral gene content, in the sense that a paralogous family in vertebrates is represented by a single gene in Ciona (24). However, apparent lineage-specific duplications in ascidians are also found. Thus, Ciona combines both ancestral and derived features. We illustrate this for several families of developmental signaling molecules and transcription factors below.

R E S E A R C H A R T I C L E S

13 DECEMBER 2002 VOL 298 SCIENCE www.sciencemag.org 2160

Fgf genes. The fibroblast growth factor (FGF) gene family comprises secreted proteins that control cell proliferation and differentiation (37). There are at least 22 members of the FGF family in mammals. In contrast, only one Fgf gene (branchless) has been identified in Drosophila and just two (egl-17 and let-756) in C. elegans. The invertebrate FGF proteins (35 to 84 kD) are considerably larger than vertebrate FGFs (17 to 34 kD), and the relationships between the vertebrate and invertebrate FGF proteins are not fully understood.

The Ciona cDNA project has identified six Fgf genes in C. intestinalis (38). Phylogenetic analysis indicates that two of the genes correspond to vertebrate Fgf8/17/18 and Fgf11/12/13/14, respectively. Three of the Ciona Fgf genes represent orthologs of vertebrate Fgf3/7/10/22, Fgf4/5/6, and Fgf9/16/20, respectively. The sixth Ciona Fgf gene cannot be assigned to any of the vertebrate Fgf genes. A survey of the C. intestinalis genome identified all six Fgf genes, but no others. This analysis of Fgf duplication and divergence is typical of the Ciona genome. The number of gene family members in Ciona falls somewhere between those in invertebrate and vertebrate species, and in most cases each of the Ciona genes clearly corresponds to one or more vertebrate genes.

Smad genes. Smad transcription factors are the key mediators of bone morphogenetic protein (BMP) and transforming growth factor–β (TGF-β)/activin signaling in nematodes, flies, and vertebrates (39). There are eight different Smad genes in mouse and human. These fall into three distinct classes: I-Smads (inhibitory Smads: Smad6 and Smad7), R-Smads (receptor-regulated Smads: Smad1, Smad2, Smad3, Smad5, and Smad8/9), and C-Smad (the common Smad: Smad4). Two of the R-Smads (Smad2 and Smad3) function downstream of TGF-β/activin, while three of the R-Smads (Smad1, Smad5, and Smad8/9) function downstream of BMP. The I-Smads and C-Smad mediate both BMP and TGF-β/activin signaling.

The C. intestinalis genome contains five Smad genes. Like vertebrates, Ciona contains two Smad2/3 genes that may function downstream of TGF-β/activin, although they were duplicated independently in the sea squirt genome. (That is, the two Ciona Smad2/3 genes cannot be placed in a one-to-one correspondence with the vertebrate Smad2/3 genes.) Except for Smad2/3, there is at least one Ciona gene for each of the three classes, including one for the I-Smads, one for the remaining R-Smads, and one for C-Smad. As in the case of the Fgf genes, Ciona contains a single copy of genes that are duplicated in vertebrates. It is notable that both types of Smads, which function downstream of BMP and TGF-β/activin, are seen in the Ciona genome.

T-box genes. Members of the T-box family of transcription factors share an evolutionarily conserved DNA-binding domain first identified in the product of the mouse Brachyury (T) gene (40). Mammals have at least 18 T-box genes, whereas Drosophila and C. elegans possess 8 and 20 T-box genes, respectively (41). Many of the T-box genes in Drosophila and C. elegans arose from lineage-specific duplication events, and it is difficult to align them with their vertebrate counterparts.

The C. intestinalis genome contains 10 T-box genes. Six of these are single-copy representatives of six of the seven major subfamilies found in mammals: Brachyury, Tbx1/10, Tbx2/3/4/5, Tbx15/18/22, Tbx20, and Eomes/Tbr/Tbx21. The remaining subfamily, which is represented by a single gene in mammals, Tbx6, has four copies in Ciona. (These copies are distinct at the nucleotide level, and are not artifacts of sequence polymorphism.) The vertebrate Tbx2/3/4/5 subfamily can be divided into two subsets: Tbx2/3 and Tbx4/5. The Ciona T-box gene that represents the Tbx2/3/4/5 subfamily is actually more closely related to Tbx2/3 than to Tbx4/5. These observations suggest that Tbx4/5 arose after the separation of the ascidian and vertebrate lineages. This makes sense because Tbx4/5 has been implicated in the development of vertebrate appendages, which are obviously lacking in Ciona. There are also no Tbx4/5 orthologs present in the genomes of Drosophila and C. elegans (41, 42).

Vertebrate Characters Viewed from the Ciona intestinalis Genome

Classic studies suggest that ascidians possess organs homologous to the vertebrate thyroid, pineal, and gill slits (43, 44). Comparative genome analyses are broadly consistent with these homologies, as described in subsequent sections.

Genes for the endocrine system in Ciona intestinalis

Metazoans use a number of common mechanisms of cell-fate specification during development. In contrast, there seems to be little conservation of the mechanisms that control physiological processes (43, 44). Although many endocrine organs and hormone-receptor systems are conserved among vertebrates, it is difficult to identify corresponding systems in protostomes. The C. intestinalis genome provides a unique opportunity to determine which of the vertebrate endocrine systems may have been present in ancient chordates.

Inspection of the C. intestinalis genome identifies a number of genes and gene families implicated in vertebrate endocrine processes. In particular, the Ciona genome contains genes encoding all the major endocrine receptors that bind peptide or protein ligands, with the exception of the growth hormone family of cytokine receptors. For example, genes encoding the gonadotropin-releasing pathway, the insulin signaling system, and a glycoprotein hormone are all found in the Ciona genome. These basic receptor families are also represented in protostome genomes. The Ciona genome, however, also contains genes for the synthesis of thyroid hormones, including a Na/I symporter and thyroid peroxidase, as well as a thyroid hormone receptor (see below); similar genes are not observed outside the chordate clade (45). In contrast to the conservation of the thyroid hormone system between ascidians and vertebrates, the Ciona genome lacks clear orthologs for some of the P450 enzymes essential for the synthesis of vertebrate steroid hormones such as androgens (CYP17), estrogens (CYP19), and corticosteroids and mineralocorticoids (CYP21) (46). It also lacks genes encoding steroid hormone receptors (see below).

Although Ciona lacks steroid hormones, it contains multiple genes that encode nuclear receptors (NRs). Figure 4 shows a molecular phylogenetic analysis of NRs (47). There is a single NR gene in Ciona that is equally related to the vitamin D receptors, ecdysone receptors, RORs/Rev-erbs, retinoid X receptors, COUP, HNF4, TR2, TR4, NR4A orphan receptors, FTZ-F1 orphan receptors, and GCNF1 orphan receptor (Fig. 4, black lines). These NR genes are common to bilaterian animals including flies, ascidians, and humans. In contrast, genes encoding thyroid hormone receptors, retinoic acid receptors, and peroxisome proliferator-activated receptors are evident in both Ciona and vertebrate genomes, but not in the fly genome (Fig. 4, green lines). These genes are likely chordate (or possibly deuterostome) innovations. Finally, as mentioned above, the Ciona genome does not contain genes encoding steroid receptors such as estrogen receptors, androgen receptors, mineralocorticoid receptor, glucocorticoid receptor, and progesterone receptor; these genes thus appear to represent vertebrate innovations (Fig. 4, red line). However, a member of the estrogen-related receptor family (orphan receptors that do not bind any known, naturally occurring steroid hormone) is present in Ciona, as well as in fly and human.


Innovation in ascidian endocrine systems?

The Ciona genome contains genes with combinations of two or three different protein domains that have not been reported in other organisms. Glycoprotein hormone receptors (GLHRs) are mosaic proteins that contain an extracellular domain with multiple leucine-rich repeats (LRRs) and a G protein–coupled receptor (GPCR) transmembrane and cytosolic domain. Protostome genomes contain orthologs of these hormone receptors, and there is also an ortholog in C. intestinalis. The Ciona genome also contains a single ortholog of a vertebrate family of orphan receptors related to GLHR: the leucine-rich repeat G protein–coupled receptors, or LGRs (48). In addition, the Ciona genome contains a large, previously unidentified family of GLHR-related receptors. A further example predicts a polypeptide composed of 333 amino acid residues. This protein contains a caspase domain in the NH2-terminal half, whereas the COOH-terminal half closely resembles deltex and rhysin, which are involved in Notch signaling (49). This is not due to an artifact of the gene prediction program because the corresponding mRNA was identified in the EST collection.

The ascidian endostyle and the vertebrate thyroid gland

One of the most prominent classical homologies is the proposed relationship between the vertebrate thyroid gland and the ascidian endostyle, based on their common use of iodine, and continuity with the endostyle of cephalochordates (43, 44). The vertebrate thyroid secretes the potent iodine-containing metabolic hormones triiodothyronine and thyroxine in response to signals from the pituitary and in a feedback loop with the hypothalamus, and these hormones play important roles in vertebrate development, maturation, and homeostasis by regulating gene expression through a nuclear hormone receptor family transcription factor (45). The ascidian endostyle is an iodine-sequestering strip that secretes mucus into the pharyngeal feeding apparatus of the adult sea squirt (Fig. 2, A and M). We sought evidence for thyroid-specific proteins in the Ciona genome.

Thyroid hormone in vertebrates is produced by the action of thyroid peroxidase (Tpo), which iodinates tyrosine residues on the carrier protein thyroglobulin, using hydrogen peroxide produced by a pair of thyroid oxidases (also known as dual oxidases, Duox1/2) (50). Clear homologs of Tpo, and orthologs of Duox1 and Duox2, are encoded in the Ciona genome, but no homolog of thyroglobulin was found, suggesting that a different source of tyrosine is used in Ciona. Tpo is a member of the myeloperoxidase family that uses hydrogen peroxide for various purposes across bilaterians. Notably, multiple Tpo homologs are encoded in the Ciona genome, suggesting an ascidian-specific expansion of this gene that is found in only a single copy in vertebrates. No evidence was found for thyroxine-binding globulin or the other carrier molecules of vertebrate blood.

In vertebrates, the conversion of thyroxine to the active hormone triiodothyronine is catalyzed by a pair of iodothyronine deiodinases expressed in the thyroid (type I) and in body tissues (type II and type I); a third related enzyme (type III) inactivates both forms of thyroid hormone as a means of regulating hormone action (51). The Ciona genome encodes a pair of closely related enzymes that are slightly more closely related to type I/III than to type II. Phylogenetic analysis shows that the Ciona enzymes are more closely related to each other than to the vertebrate enzymes, suggesting that the original deiodinase may have multiplied independently in the ascidian and vertebrate lineages, or that the split between I/III and II occurred close to the last common chordate ancestor. This is a recurring theme in our analysis of the Ciona genome: Ascidians and vertebrates have mobilized the same raw materials present in the ancestral chordate genome to their unique purposes and needs.

Apoptosis genes in Ciona intestinalis

Apoptosis is triggered by the activation of cysteine proteases called caspases (52, 53). Caspases are initially synthesized as inactive zymogens, but become activated by proteolytic processing during apoptosis. CED-3 is the only nematode caspase involved in apoptosis. In contrast, there are at least 14 distinct caspases in mammals, including four "initiator" (-2, -8, -9, and -10) and three "effector" caspases (-3, -6, and -7). The Ciona genome contains 11 caspases; 4 correspond to initiator caspases and 7 to effector caspases.

In C. elegans, CED-3 becomes activated by autoprocessing through interaction with the adapter protein CED-4 in cells fated to die during development. In vertebrates, however, there are two major pathways for the activation of caspases, involving mitochondria (intrinsic pathway) or death receptors (extrinsic). The intrinsic pathway is controlled by a CED-4 ortholog called Apaf-1, and by anti-apoptotic proteins Bcl-2 and Bcl-XL, which prevent the release of apoptogenic molecules from mitochondria and thereby block caspase activation (52, 53). Vertebrates also contain two proteins, Bax and Bok, that promote these events. Two Bcl-2 family members, CED-9 and EGL-1, have been identified in C. elegans, but there are no counterparts of the vertebrate Bax and Bok genes in worms. The Ciona genome contains three genes related to CED-4 and Apaf-1, as well as one copy of the BCL-XL–like anti-apoptotic gene. In contrast to the situation seen in C. elegans, Ciona also contains two genes that encode proteins with similarities to the pro-apoptotic proteins Bax and Bok.

The extrinsic pathway of caspase activation depends on TNF receptors and is not present in C. elegans. Vertebrates contain a variety of TNF receptors, including TNFR1, Fas, DR-3, DR4, and DR5. Four putative members of the TNF family and three possible TNFR-like genes have been detected in the Ciona genome. These results suggest that Ciona may use both intrinsic and extrinsic pathways for caspase activation.

Fig. 4. A molecular phylogenetic tree of nuclear receptor (NR) proteins generated by the neighbor-joining method with both the ligand-binding domain and the DNA-binding domain. The tree was constructed using all of the Ciona NRs, human NRs, and Drosophila NRs. Black lines show the presence of the genes in the fly, ascidian, and human genomes. Green lines show chordate innovations, i.e., genes found in ascidians and vertebrates but not in worms and flies. Red lines show vertebrate innovation, i.e., genes found in vertebrates but not in ascidians. A dotted line shows ascidian lineage-specific innovation, in which the gene was lost only in this lineage. Within a branch indicated by thick lines, two Ciona genes were found, suggesting the branches can be further divided into two or more subfamilies. TR, thyroid hormone receptors; VDR, vitamin D receptors; LXR, liver X receptor; FXR, farnesoid X-activated receptor; RAR, retinoic acid receptors; PPAR, peroxisome proliferator-activated receptors; ROR, RAR-related orphan receptor; Rev-erb; RXR, retinoid X receptors; HNF4, hepatocyte nuclear factor 4; COUP-TF, chicken ovalbumin upstream promoter transcription factor; Drosophila Tailless, a fly nuclear receptor; TR2/4, TR2 and TR4 orphan receptors; ER, estrogen receptor; ERR, estrogen-related receptor; SR, steroid hormone receptor; FTZF, fushi tarazu factor; GCNF, germ cell nuclear factor; and NR4A, nuclear receptor 4A. Note that ECR (ecdysone receptor) is the Drosophila ortholog of the vertebrate LXRs/FXRs, and HR96 is the Drosophila ortholog of vertebrate VDRs.


Immunity-related genes in Ciona intestinalis

All Metazoa possess a variety of innate mechanisms to resist infection by pathogens. In contrast, the lymphocyte-based adaptive immune system seems to have suddenly arisen in the jawed vertebrate lineage (54, 55). It is still not clear, however, how this highly sophisticated system involving hundreds of specific genes has evolved. The genome-wide identification of immunity-related genes in nonvertebrate chordates is expected to help elucidate the evolution of both the innate and adaptive immune systems in vertebrates.

A systematic search of the C. intestinalis genome failed to identify any of the pivotal genes implicated in adaptive immunity, such as immunoglobulin, T cell receptor, and major histocompatibility complex (MHC) class I and II genes, although we cannot exclude the possibility that Ciona has highly divergent orthologs of one or more of these genes. A more convincing "negative" result was obtained by analyzing the genes that encode the 20S proteasome, which destroys misfolded proteins (56). Eukaryotic 20S proteasomes are composed of 14 different gene products; three possess catalytic activity. Mammals contain a second copy of each of the genes that encode these three catalytic subunits. These duplicated genes encode components of an immunoproteasome that is essential for the presentation of antigen to T cells. The Ciona genome contains orthologs for each of the 14 vertebrate proteasome genes, but none for the immunoproteasome-specific genes. These observations strongly suggest that Ciona lacks the antigen-presenting system for T cells. Putative Ciona homologs of the vertebrate MHC-encoded genes do not exhibit an extensive linkage among them, nor syntenic conservation with the vertebrate MHC.

Although there is no evidence for adaptive immunity, a search of the Ciona genome reveals a variety of genes that are likely to mediate innate immunity. There are a large number of possible complement genes, including C1q-like and C6-like genes, three Toll-like receptor genes, and a variety of lectin genes. No interleukin or interleukin-receptor genes were identified except for an interleukin-1 (IL-1) receptor and an IL-17 receptor gene. It is possible that Ciona has evolved distinctive innate-immunity genes, because a search of the protein domains found in vertebrate innate-immunity genes identifies a number of Ciona genes that contain these domains in previously unknown combinations.

Muscle-related genes in Ciona intestinalis

Vertebrates produce multiple isoforms of many muscle contractile proteins, whose differential expression contributes to the functional specializations of different muscle cell types such as fast skeletal, slow skeletal, cardiac, and smooth muscles. Ascidians also develop several distinct muscle types, including sarcomeric muscle in the tail of the larva and in the adult heart, and nonsarcomeric, but troponin-regulated, muscle in the adult body wall (siphon and mantle) (6).

A search of the C. intestinalis genome suggests that in most cases, the vertebrate gene families encoding multiple isoforms of thick- and thin-filament proteins arose in the vertebrate lineage following the ascidian/vertebrate divergence. For example, vertebrates contain three differentially expressed genes (fast skeletal, slow skeletal, and cardiac) for each of the troponin subunits TnI and TnT. However, the Ciona genome contains one TnI gene and one TnT gene, each of which is equally related to all three of the corresponding vertebrate isoforms. Thus, the vertebrate TnT and TnI gene families appear to have arisen within the vertebrate lineage following the ascidian/vertebrate divergence by duplication of a single ancestral TnI or TnT gene, respectively. For some muscle proteins, e.g., tropomyosin, sarcomeric myosin heavy chain, and essential (alkali) myosin light chain, the Ciona genome contains a family of two or more related genes. In each of these cases, however, the vertebrate and ascidian gene families appear to have arisen independently; the vertebrate genes form a clade that does not include any of the Ciona genes, suggesting duplication from a single ancestral gene following the ascidian/vertebrate divergence. Thus, in terms of molecular genetic specializations and tissue evolution, vertebrate and ascidian muscles have followed largely separate trajectories.

An example of lineage-specific specialization concerns vertebrate smooth muscle. The Ciona genome does not appear to contain smooth muscle–specific genes such as those encoding smooth-muscle myosin heavy chain or actin. There are two major clades of class II myosin heavy chain genes in the metazoa: a sarcomeric clade and a nonmuscle/smooth muscle clade that in vertebrates includes two nonmuscle myosins and the myosin that is specifically expressed in smooth muscle (57). The Ciona genome has six class II myosin genes; five fall into the metazoan sarcomeric clade and one falls into the nonmuscle/smooth muscle clade. The latter gene falls outside of a clade containing the vertebrate nonmuscle and smooth muscle genes. These relationships suggest that the ascidian/vertebrate common ancestor had a single class II nonmuscle myosin gene, and that the evolutionary invention of smooth muscle–specific myosin, through duplication/divergence of this gene, occurred in the vertebrate lineage after the ascidian/vertebrate divergence. A similar situation exists for the smooth muscle–specific actins. Vertebrates contain four closely related genes encoding muscle-type actins: two for smooth muscle (vascular and enteric) and two for sarcomeric muscle (cardiac and skeletal) (58). The smooth muscle actins differ at two diagnostic amino acid residues from the sarcomeric actins (59). The Ciona genome encodes six muscle-type actins, all of which contain the residues diagnostic of sarcomeric muscle actins. The absence of smooth muscle–specific actin and myosin genes in the Ciona genome supports the concept that the tissue called smooth muscle in the vertebrates may be a unique characteristic that arose within that group (60).

Origins of the vertebrate heart

The ascidian heart, although rudimentary in structure, exhibits an embryonic origin similar to that of vertebrate hearts (6). In most ascidians, a beating heart is first detected after metamorphosis; however, the fusion of bilateral heart primordia occurs during embryogenesis. The Ciona genome contains orthologs of many of the critical regulatory genes implicated in vertebrate heart development. These include myocardin, the GATA transcription factors, Nkx2.5 and its paralogs, the basic helix-loop-helix (bHLH) factors dHand and eHand, and the Mef2A, B, C, and D genes (61). The presence of these multiple paralogs, many of which seem to exhibit overlapping functions, has complicated efforts to decipher vertebrate heart developmental genetics (61). As seen for a number of vertebrate gene families and subfamilies, Ciona contains only single copies of these genes. There is only one Nkx2 ortholog, one bHLH Hand ortholog, and one MEF-2 ortholog. Additionally, there are only two putative orthologs of the vertebrate GATA factors, one of which shows relatively poor conservation within the DNA binding domain. These observations raise the possibility that research into Ciona heart genetics could be used to assess the functions of vertebrate heart genes without the complications arising from redundancies in gene function.

Genes of the Spemann Organizer

The Spemann organizer coordinates the formation of axial and paraxial meso-endoderm and the induction of neural tissues in a variety of vertebrate embryos. The organizer is a source of secreted antagonists that inhibit two major families of signaling molecules: BMPs and Wnts (62). Several BMP and Wnt genes are present in vertebrate and invertebrate genomes, including Ciona. However, Drosophila and C. elegans lack obvious Wnt antagonists and contain only one BMP antagonist, which is related to the vertebrate Chordin gene. In contrast, the Ciona genome contains a spectrum of Wnt and BMP antagonists, similar to those seen in vertebrates. These include Wnt antagonists related to the vertebrate sFRP and Dickkopf gene families, and BMP antagonists related to chordin, noggin, and the DAN/gremlin families. Ascidians also possess a Nodal gene; these encode TGF-β signaling molecules that are an essential component of the vertebrate organizer, but absent in worms and flies. The vertebrate organizer is also a source of Wnt-11, which regulates morphogenesis through convergent extension movements (63). Wnt-11 is present in Ciona but not in the Drosophila and C. elegans genomes. Hence, the Ciona genome contains the full complement of signaling molecules produced by the Organizer. In addition, a number of the transcription factors that regulate the expression of the signaling molecules are also conserved in Ciona, including goosecoid, blimp1, hex, and otx.

The presence of most Organizer genes in Ciona does not necessarily imply that it contains a functional Organizer. First, none of the conserved factors are strictly specific to the vertebrate organizer. All have additional functions later in the vertebrate life cycle. Second, we do not know whether ascidian orthologs are expressed in the dorsal mesoderm at the onset of gastrulation. Third, experimental evidence does not support a role for the dorsal mesoderm in the specification of paraxial tissues such as the tail muscles. These tissues can develop in an autonomous manner when separated from the dorsal mesoderm (64). Finally, BMPs may be required for the development of the notochord in ascidians, but block notochord development in vertebrates. It is possible that the last common chordate ancestor had a functional Organizer that degenerated in ascidians owing to the limited need for long-range signaling in embryos with relatively small cell numbers. Alternatively, ancestral chordates may have used Organizer molecules for other processes and subsequently recruited them into the Organizer in the cephalochordate/vertebrate lineages.

Neural genes

The central nervous system of the ascidian tadpole is a chordate nervous system in miniature, with a 300-cell cerebral vesicle containing several sensory systems, and a hollow dorsal neural tube with a four-cell circumference above a 40-cell notochord; 10 motor neurons innervate the paired tail muscles (65). Several neural genes known in vertebrates have clear homologs in Ciona and should now be viewed more broadly as chordate characters. Ciliated ependymal cells and Reissner's fiber in the neural tube appear to bind biogenic amines as in vertebrates, as supported by the presence of a SCO-spondin gene in Ciona. Claudins involved in vertebrate tight junctions are also present, as are muscle-type acetylcholine receptors, the first example of such a receptor outside vertebrates (66). Noelin, a secreted factor that endows neural crest competence in vertebrates, is also present, although no evidence for neural crest cells is known in ascidians; an ortholog of the tyrosinase gene involved in melanin production in neural crest cells is also found, although such genes are also known in invertebrates (67).

Phototransduction differs dramatically in invertebrates and vertebrates. In vertebrates, a dark current of sodium ions is shut off by a reduction in cyclic guanosine monophosphate triggered by the photoisomerization of rhodopsin (68). By contrast, no dark current is present in invertebrates, and light absorption triggers release of intracellular calcium by way of a phospholipase C pathway to open cation channels (69). Three rhodopsin photoreceptors found in the Ciona genome are closely related to the deep brain/pineal opsin of vertebrates, supporting the ancient common origin of the vertebrate eye as derived from the pineal system. Orthologs of many of the components of the vertebrate phototransduction cascade are present in the Ciona genome, suggesting that the dark current–based vertebrate scheme is found in this invertebrate. Thus, the dark-current approach to phototransduction is likely a chordate (or possibly deuterostome) rather than vertebrate character. It is easy to speculate that the ancestral bilaterian had a primitive light-sensing capability that was harnessed for amplification to distinct intracellular messenger systems in protostomes and deuterostomes.

The Ciona genome apparently lacks many genes involved in the transmission of long-range axonal signals and/or long-range guidance cues, presumably related to the compactness of the few-millimeter-long tadpole. In particular, genes involved in the formation and maintenance of myelin are absent, as are neurotrophins and their receptor, plus neurexophilins and other neuronal glycoproteins involved in axon guidance. The machinery for epinephrine synthesis (expressed in neural crest–derived cells of vertebrates) is missing, as are the enzymes for synthesis of melatonin and histamine. Only weak matches to olfactory receptors were identified, although Ciona presumably has chemosensory systems involved in larval attachment. Finally, although Ciona exhibits circadian rhythms, no clear ortholog of the period gene was identified.

Lineage-Specific Innovations in Ascidians

It is unclear to what extent ascidians represent an ancestral, basal chordate or have diverged from ancient chordates through the acquisition of lineage-specific innovations (1–3). For example, the metamorphosis of tadpoles into sessile adults is a hallmark of the ascidians (6); this process is not seen in other chordate lineages, not even other Urochordates (70). Here we consider three aspects of ascidian evolution in light of the Ciona genome: genes that are conserved in other animals, but appear to be missing in Ciona; genes that are found in multiple copies in Ciona but present in single copies in vertebrates; and genes that are unique to ascidians and not seen in other animals.

Genes missing in the Ciona intestinalis genome
Hox genes are characterized by their clustered organization and by collinearity between gene order within the cluster and sequential patterns of expression during development (71). They are classified into 13 paralogy groups on the basis of sequence similarity (72). Vertebrates have multiple Hox gene clusters, each containing about 10 different genes. By contrast, all invertebrates that have been examined contain a single Hox cluster (with the exception of Drosophila, in which the ancestral complex is split into the ANT-C and BX-C). In protostomes, the cluster usually contains 10 or fewer genes because there are only one or two genes from paralogy groups 9 to 13. The sea urchin, an invertebrate deuterostome, possesses a single Hox cluster that contains every class of Hox gene, but with only a single Hox-4/5 gene, and three genes related to the posterior Hox genes 9 to 13 (73). The cephalochordate Amphioxus has a single cluster that contains each of the 13 Hox genes, as well as an additional 14th gene (74, 75).

Fig. 5. The diagram represents the different scaffolds containing the Ciona Hox genes. Red arrows indicate Hox genes and the orientation of transcription. The black rectangles correspond to non-Hox genes. The dashed lines indicate intervals extending to the ends of the respective scaffolds that lack Hox genes but contain additional non-Hox genes. A predicted reverse transcriptase gene is detected at the end of scaffold 1.

There are nine Hox genes in the C. intestinalis genome (76, 77). The Ciona Hox genes are located on five different scaffolds containing Hox 1, Hox 2 to 4, Hox 5 and 6, Hox 10, and Hox 12 and 13. Hox 7, 8, 9, and 11 are apparently absent, while Hox 12 and 13 are divergently transcribed (Fig. 5). The sum of the five scaffolds is ~980 kb; if located in a single complex, this would be considerably larger than any of the vertebrate Hox complexes (each of the human complexes is ~125 kb), and even larger than the one present in sea urchins (~500 kb) (73). The loss of the genes in paralogy groups 7, 8, and 9 may be specific to the entire urochordate lineage, because these genes appear to be absent in the larvacean genome as well (78).
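The bookkeeping in this paragraph can be checked with a few lines of code. The sketch below is illustrative only: the scaffold labels are hypothetical placeholders (the text does not name the five scaffolds), while the paralogy-group assignments and the approximate sizes in kilobases are those quoted above.

```python
# Paralogy groups 1..13, as described in the text.
ALL_GROUPS = set(range(1, 14))

# Hypothetical scaffold labels; the gene groupings per scaffold follow the text.
CIONA_HOX_SCAFFOLDS = {
    "scaffold_a": [1],
    "scaffold_b": [2, 3, 4],
    "scaffold_c": [5, 6],
    "scaffold_d": [10],
    "scaffold_e": [12, 13],
}


def hox_summary(scaffolds):
    """Return (present, missing) paralogy groups as sorted lists."""
    present = sorted(g for groups in scaffolds.values() for g in groups)
    missing = sorted(ALL_GROUPS - set(present))
    return present, missing


present, missing = hox_summary(CIONA_HOX_SCAFFOLDS)

# Approximate sizes quoted in the text, in kb.
CIONA_TOTAL_KB = 980
HUMAN_CLUSTER_KB = 125
SEA_URCHIN_CLUSTER_KB = 500

ratio_vs_human = CIONA_TOTAL_KB / HUMAN_CLUSTER_KB
ratio_vs_urchin = CIONA_TOTAL_KB / SEA_URCHIN_CLUSTER_KB

print(f"{len(present)} Hox genes present: {present}")
print(f"missing paralogy groups: {missing}")
print(f"size relative to a human cluster: {ratio_vs_human:.2f}x")
print(f"size relative to the sea urchin cluster: {ratio_vs_urchin:.2f}x")
```

Under these figures the nine genes and four absent groups account for all 13 paralogy groups, and the combined scaffolds are nearly eight times the size of a human complex and about twice that of the sea urchin cluster.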

In addition to the apparent loss of Hox 7, 8, 9, and 11, a number of other genes common to Drosophila and vertebrates are apparently missing from the Ciona genome. These include the gene encoding histidine decarboxylase, which is required for histamine synthesis in vertebrates and Drosophila. Moreover, genes encoding orthologs of the Drosophila nuclear receptor tailless (Fig. 4, dotted line) and the circadian rhythm protein clock were not found in the Ciona genome. LIM class homeobox genes comprise six major groups: Lim1, Lim3, Islet, apterous, LIMX, and Lhx6/7 (79). Whereas flies, mice, and humans contain one or more genes in each class, Ciona contains members of the first five classes but appears to lack Lhx6/7 genes. Of course, these apparent missing genes are subject to the caveat that the genome sequence is not complete.

Gene duplications found only in the Ciona intestinalis genome
A number of gene duplication events appear to be unique to the ascidian lineage (Fig. 6). Several are observed for genes that encode transcription factors, including FoxA, Pax2/5/8, and Prox. Prox is a member of the atypical class of homeobox genes (80). Nematodes (ceh-26), flies (prospero), mice (PROX1), and humans (Prox1) contain a single copy of the gene, which is involved in eye development. In contrast, the Ciona Prox gene was duplicated into two genes that are arranged in tandem on the same scaffold, within 20 kb of each other. As mentioned earlier, the single Tbx6 of vertebrates appears to be duplicated into four genes in Ciona. Similarly, the single ancestral gene encoding gonadotropin-releasing hormone receptor has also duplicated into four genes.

Genes that suggest functional innovations
Urochordates are also called tunicates because the adult body is enclosed by a fibrous tunic, which in some species is thick and tough (6). The matrix of the tunic contains fibers composed largely of a cellulose-like carbohydrate called tunicin (81). Because cellulose is typically produced only by plants and bacteria, its presence in ascidians is a curious lineage-specific evolutionary innovation.

Cellulose synthesis and degradation are controlled by a variety of enzymes, including cellulose synthases and endoglucanases, respectively (82). The Ciona genome contains at least one potential cellulose synthase and several endoglucanases (Fig. 7). Most of the endoglucanases are related to the Korrigan genes of Arabidopsis, which are essential for the biosynthesis of the plant cell wall (83). However, the closest matches are seen for the endoglucanases present in termites and wood-eating cockroaches. Most of these latter endoglucanases are encoded in the genomes of bacterial and fungal symbionts and are essential for the use of wood as a food source. However, there is also evidence that some of the endoglucanases present in termites may be endogenous and presumably arose from a horizontal gene-transfer event between symbiont and host (84).

There is little doubt that the endoglucanases present in Ciona are endogenous genes and not due to contamination by potential symbionts. For example, one of the Ciona endoglucanase genes is present on scaffold 11, which is ~680 kb in length (Fig. 7). This gene is known to be expressed, because a corresponding cDNA exists. There are conserved predicted genes located both 5′ and 3′ of the endoglucanase. The 5′ gene encodes a putative glutamate/aspartate transporter, whereas the 3′ gene encodes a putative RNA splicing factor. Thus, while the endoglucanase gene is not conserved in worms, flies, or humans, it is flanked by authentic “animal” genes.

Fig. 6. Lineage-specific gene duplications in Ciona. Amphioxus, zebrafish (Danio), mouse, and humans each appear to contain single copies of the Tbx6 gene (top). However, the Ciona genome contains four Tbx6 genes (red lines); these are contained as duplicated genes in a single scaffold. Similarly, there are two copies of the Prox homeobox gene in Ciona (red lines), but only single copies of this gene in mouse, human, zebrafish, worms, and flies (bottom).

Fig. 7. One of the Ciona endoglucanase genes is flanked by conserved “animal” genes, an aspartate/glutamate transporter on the left and a splicing factor on the right. The Korrigan-1 gene from Arabidopsis was used to BLAST the Ciona genome (red rectangles, top). A predicted gene was identified on scaffold 11, which is 680 kb in length; a portion of the scaffold is shown in this figure. A predicted gene model contains extensive homology with the Korrigan gene used for the BLAST analysis (green rectangles represent predicted exons). It is flanked by at least two additional predicted genes. There are identified cDNAs for the putative endoglucanase and the splicing factor on the right (gray rectangles below the predicted exons in green). Moreover, both the genes on the left and right are conserved in flies, worms (blue), and humans (teal). The rectangles and gaps correspond to exons and introns, respectively.

Although it is possible that the genomes of wood-eating animals have acquired endogenous endoglucanase genes, no animal reported to date has been shown to contain a bona fide cellulose synthase gene. There appears to be at least one such gene in the Ciona genome that is similar to the cellulose synthases seen in a number of nitrogen-fixing bacteria, such as Anabaena and Sinorhizobium (85). Future studies will determine whether the encoded protein functions in cellulose biosynthesis. If so, this would represent a dramatic example of horizontal gene transfer.

Hemoglobins are oxygen-transporting proteins found in almost all representatives of every animal phylum (86). It is notable, therefore, that genes encoding hemoglobins are not found in the Ciona genome. Instead, Ciona contains two candidate genes encoding an alternate oxygen transport protein, hemocyanin, also found in arthropods and mollusks. Hemocyanins have oxygen-binding centers that are structurally similar to those seen in hemoglobins and tyrosinases (87). The Ciona hemocyanins are type III copper proteins and have conserved amino acid residues in the copper-binding center, although they are divergent from those of arthropods and mollusks. Genes encoding myoglobin and hemerythrin are, by contrast, not found in the Ciona genome.

Conclusions
Our initial reading of the Ciona genome provides new insights into the evolutionary origins of the vertebrates. The last shared ancestor of modern chordates probably possessed single-copy genes for the present-day complement of gene families engaged in a variety of signaling and regulatory processes seen in vertebrate development, including FGFs, Smads, and T-box genes. There were also rudiments of key vertebrate organ systems including the heart and the pineal and thyroid glands. Though useful as an approximation of the ancestral chordates, ascidians have experienced notable lineage-specific evolution, including the remarkable acquisition of genes that control cellulose metabolism.

There are two recurring themes in our analysis of the ascidian genome and its relationship to the vertebrate genome. First, we find repeatedly that a family or subfamily of vertebrate genes has only a single representative in Ciona. The implication is that the Ciona gene content in these families corresponds to the complement of the ancestral chordate. Second, there were notable cases of gene families with multiple members in both Ciona and vertebrates that could not be placed in easy correspondence with one another. In these cases, the gene content of the ancestral chordate was apparently mobilized and diversified independently in the two lineages, providing examples of macroevolutionary change within the different branches of the chordate phylum.

The streamlined nature of the Ciona genome should have an enormous impact on unraveling complex developmental processes in vertebrates. For example, information about the function of the major vertebrate FGF subfamilies can be obtained in Ciona through the use of simple gene-disruption methods (e.g., morpholinos), unhampered by the complications associated with functional redundancies often encountered in large gene families.

Further analysis of the Ciona genome will also provide considerable information about the regulation of vertebrate genes. The Ciona cDNA projects have characterized transcripts for more than three-quarters of the genes (28). Systematic in situ hybridization assays are being performed to determine the spatial and temporal expression of each gene during embryonic and larval development. Thus far, there is information for nearly one-third of the genes in the Ciona genome (http://ghost.zool.kyoto-u.ac.jp/indexr1.html) (88, 89). Finally, all of this information (the gene-disruption data and gene-expression profiles) will be compiled along with large-scale screens for cis-regulatory DNAs to determine complete gene-regulation networks underlying the development of basic chordate features such as the neural tube and notochord (90, 91).

References and Notes
1. H. Gee, Before the Backbone. Views on the Origin of the Vertebrates (Chapman & Hall, London, 1996).
2. B. K. Hall, Evolutionary Developmental Biology (Chapman & Hall, London, ed. 2, 1998).
3. A. Adoutte et al., Proc. Natl. Acad. Sci. U.S.A. 97, 4453 (2000).
4. A. Kowalevsky, Mem. Acad. St. Petersbourg Ser. 7 10, 1 (1866).
5. C. Darwin, The Descent of Man (Murray, 1871).
6. N. Satoh, Developmental Biology of Ascidians (Cambridge Univ. Press, New York, ed., 1994).
7. N. Satoh, Differentiation 68, 1 (2001).
8. J. C. Corbo, A. Di Gregorio, M. Levine, Cell 106, 535 (2001).
9. H. Nishida, Int. Rev. Cytol. 176, 24 (1997).
10. W. R. Jeffery, Int. Rev. Cytol. 203, 3 (2001).
11. E. G. Conklin, J. Acad. Nat. Sci. (Philadelphia) 13, 1 (1905).
12. H. Nishida, Dev. Biol. 121, 526 (1987).
13. Y. Nakatani, R. Moody, W. C. Smith, Development 126, 3293 (1999).
14. P. Sordino et al., Sarsia 85, 173 (2000).
15. J. C. Corbo, M. Levine, R. W. Zeller, Development 124, 589 (1999).
16. A. Di Gregorio, M. Levine, Differentiation 70, 132 (2002).
17. H. Takahashi et al., Genes Dev. 13, 1519 (1999).
18. K. S. Imai, N. Satoh, Y. Satou, Development 129, 3441 (2002).

19. M. W. Simmen, S. Leitgeb, V. H. Clark, S. J. M. Jones, A. Bird, Proc. Natl. Acad. Sci. U.S.A. 95, 4437 (1998).
20. H.-C. Seo et al., Science 294, 2506 (2001).
21. P. W. H. Holland, J. Garcia-Fernandez, N. A. Williams, A. Sidow, Development 1994 (Suppl.), 125 (1994).

22. M. D. Adams et al., Science 287, 2185 (2000).
23. S. Aparicio et al., Science 297, 1301 (2002).
24. Supplemental methods and data are available on Science Online.
25. N. Putnam, J. Chapman, I. Ho, D. Rokhsar, unpublished results.
26. E. S. Lander et al., Nature 409, 860 (2001).
27. J. C. Venter et al., Science 291, 1304 (2001).
28. Y. Satou et al., Genesis 33, 153 (2002).
29. W. J. Kent, Genome Res. 12, 656 (2002).
30. S. F. Altschul et al., J. Mol. Biol. 215, 403 (1990).
31. E. Birney, R. Durbin, Genome Res. 10, 547 (2000).
32. N. J. Mulder et al., Brief Bioinform. 3, 225 (2002).
33. E. Pennisi, Science 287, 2182 (2000).
34. The C. elegans genome consortium, Science 282, 2012 (1998).
35. R. A. Cameron et al., Proc. Natl. Acad. Sci. U.S.A. 97, 9514 (2000).
36. E. M. Zdobnov, R. Apweiler, Bioinformatics 17, 847 (2001).
37. D. M. Ornitz, N. Itoh, Genome Biol. 2, REVIEWS 3005.1 (2001).
38. Y. Satou, K. S. Imai, N. Satoh, Dev. Genes Evol. 212, 432 (2002).
39. C. S. Hill, Int. J. Biochem. Cell Biol. 31, 1249 (1999).
40. B. G. Herrmann, S. Labeit, A. Poustka, T. R. King, H. Lehrach, Nature 343, 617 (1990).
41. V. E. Papaioannou, Int. Rev. Cytol. 207, 1 (2001).
42. I. Ruvinsky, J. J. Gibson-Brown, Development 127, 5233 (2000).
43. A. S. Romer, The Vertebrate Body (Saunders, Philadelphia, ed. 4, 1970).
44. C. K. Weichert, Elements of Chordate Anatomy (McGraw-Hill, New York, 1953).
45. D. Kirsten, Neonatal Netw. 19, 11 (2000).
46. A. M. Capponi, Trends Endocrinol. Metab. 13, 118 (2002).
47. V. Laudet, J. Mol. Endocrinol. 19, 207 (1997).
48. P. Joost, A. Methner, Genome Biol., in press.
49. K. Brennan, P. Gardner, Bioessays 24, 405 (2002).
50. L. Lacroix et al., Thyroid 11, 1017 (2001).
51. J. T. Dunn, A. D. Dunn, Thyroid 11, 407 (2001).
52. V. Cryns, J. Yuan, Genes Dev. 12, 1551 (1998).
53. G. S. Salvesen, V. M. Dixit, Cell 91, 443 (1997).
54. J. A. Hoffmann, F. C. Kafatos, C. A. Janeway, R. A. Ezekowitz, Science 284, 1313 (1999).
55. M. F. Flajnik, M. Kasahara, Immunity 15, 351 (2001).
56. H. D. Ulrich, Curr. Top. Microbiol. Immunol. 268, 137 (2002).
57. J. R. Sellers, Biochim. Biophys. Acta 1496, 3 (2000).
58. J. Vandekerckove, K. Weber, J. Mol. Biol. 179, 391 (1984).
59. S. Kovilur, J. W. Jacobson, R. L. Beach, W. R. Jeffery, C. R. Tomlinson, J. Mol. Evol. 36, 361 (1993).
60. S. Oota, N. Saitou, Mol. Biol. Evol. 16, 856 (1999).
61. R. M. Cripps, E. N. Olson, Dev. Biol. 246, 14 (2002).
62. R. Harland, J. Gerhart, Annu. Rev. Cell Dev. Biol. 13, 611 (1997).
63. C. P. Heisenberg et al., Nature 405, 76 (2000).
64. T. H. Meedel, J. R. Whittaker, Dev. Biol. 105, 479 (1984).
65. I. A. Meinertzhagen, Y. Okamura, Trends Neurosci. 24, 401 (2001).
66. S. Tsukita, M. Furuse, M. Itoh, Nature Rev. Mol. Cell. Biol. 2, 285 (2001).
67. J. H. Christiansen, E. G. Coles, D. G. Wilkinson, Curr. Opin. Cell Biol. 12, 719 (2000).
68. L. Stryer, Annu. Rev. Neurosci. 9, 87 (1986).
69. C. S. Zuker, Proc. Natl. Acad. Sci. U.S.A. 93, 571 (1996).
70. A. Nishino, N. Satoh, Genesis 29, 36 (2001).
71. D. Duboule, Development (Suppl.), 143 (1994).
72. T. R. Burglin, in Guidebook to the Homeobox Genes, D. Duboule, Ed. (Oxford Univ. Press, New York, 1994), pp. 27–64.
73. P. Martinez, J. P. Rast, C. Arenas-Mena, E. H. Davidson, Proc. Natl. Acad. Sci. U.S.A. 96, 1469 (1999).

74. J. Garcia-Fernandez, P. W. H. Holland, Nature 370, 563 (1994).
75. D. E. Ferrier, C. Minguillon, P. W. H. Holland, J. Garcia-Fernandez, Evol. Dev. 2, 284 (2000).
76. A. Di Gregorio et al., Gene 156, 253 (1995).
77. M. Gionti et al., Dev. Genes Evol. 207, 515 (1998).
78. D. Chourrout, R. Di Lauro, personal communication.
79. O. Hobert, H. Westphal, Trends Genet. 16, 75 (2000).
80. S. I. Tomarev, Int. J. Dev. Biol. 41, 835 (1997).
81. G. Krishnan, Indian J. Exp. Biol. 13, 172 (1975).
82. S. M. Read, T. Bacic, Science 295, 59 (2002).
83. J. Zuo et al., Plant Cell 12, 1137 (2000).
84. N. Lo et al., Curr. Biol. 10, 801 (2000).
85. D. R. Nobles, D. K. Romanovicz, R. M. Brown Jr., Plant Physiol. 127, 529 (2001).
86. R. C. Hardison, Proc. Natl. Acad. Sci. U.S.A. 93, 5675 (1996).
87. K. E. van Holde, K. I. Miller, H. Decker, J. Biol. Chem. 276, 15563 (2001).
88. Y. Satou et al., Development 128, 2893 (2001).
89. T. Kusakabe et al., Dev. Biol. 242, 188 (2002).
90. N. Harafuji, D. N. Keys, M. Levine, Proc. Natl. Acad. Sci. U.S.A. 99, 6802 (2002).
91. D. N. Keys et al., in preparation.
92. This work was performed under the auspices of the U.S. Department of Energy’s Office of Science, Biological and Environmental Research Program; by the University of California, Lawrence Livermore National Laboratory under contract no. W-7405-Eng-48, Lawrence Berkeley National Laboratory under contract no. DE-AC03-76SF00098, and Los Alamos National Laboratory under contract no. W-7405-ENG-36; and by MEXT, Japan (grants 12201001 to Y.K., 12202001 to N.S.), Japan Society for the Promotion of Science (to Y.S.), Human Frontier Science Program (to N.S. and M.L.), and NIH (HD-37105 and NSF IBN-9817258 to M.L.).

Supporting Online Material
www.sciencemag.org/cgi/content/full/298/5601/2157/DC1
SOM Text
Tables S1 to S9
Figures S1 and S2
References

1 November 2002; accepted 20 November 2002

The Cortical Topography of Tonal Structures Underlying Western Music
Petr Janata,1,2* Jeffrey L. Birk,1 John D. Van Horn,2,3 Marc Leman,4 Barbara Tillmann,1,2 Jamshed J. Bharucha1,2

Western tonal music relies on a formal geometric structure that determines distance relationships within a harmonic or tonal space. In functional magnetic resonance imaging experiments, we identified an area in the rostromedial prefrontal cortex that tracks activation in tonal space. Different voxels in this area exhibited selectivity for different keys. Within the same set of consistently activated voxels, the topography of tonality selectivity rearranged itself across scanning sessions. The tonality structure was thus maintained as a dynamic topography in cortical areas known to be at a nexus of cognitive, affective, and mnemonic processing.

The use of tonal music as a stimulus for probing the cognitive machinery of the human brain has an allure that derives, in part, from the geometric properties of the theoretical and cognitive structures involved in specifying the distance relationships among individual pitches, pitch classes (chroma), pitch combinations (chords), and keys (1–3). These distance relationships shape our perceptions of music and allow us, for example, to notice when a pianist strikes a wrong note. One geometric property of Western tonal music is that the distances among major and minor keys can be represented as a tonality surface that projects onto the doughnut shape of a torus (1, 4). A piece of music elicits activity on the tonality surface, and harmonic motion can be conceptualized as displacements of the activation focus on the tonality surface (3). The distances on the surface also help govern expectations that actively arise while one listens to music. Patterns of expectation elicitation and fulfillment may underlie our affective responses to music (5).

Two lines of evidence indicate that the tonality surface is represented in the human brain. First, when one subjectively rates how well each of 12 probe tones, drawn from the chromatic scale (6), fits into a preceding tonal context that is established by a single chord, chord progression, or melody, the rating depends on the relationship of each tone to the instantiated tonal context. Nondiatonic tones that do not occur in the key are rated as fitting poorly, whereas tones that form part of the tonic triad (the defining chord of the key) are judged as fitting best (2). Probe-tone profiles obtained in this manner for each key can then be correlated with the probe-tone profile of every other key to obtain a matrix of distances among the 24 major and minor keys. The distance relationships among the keys readily map onto the surface of the torus (4). Thus, there is a direct correspondence between music-theoretic and cognitive descriptions of the harmonic organization of tonal music (7).
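The correlation step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure used in the cited studies: the two 12-element profiles are hypothetical toy ratings (3 for tonic-triad tones, 2 for other diatonic tones, 1 for nondiatonic tones) rather than measured listener data, the minor profile assumes the natural minor scale, and taking distance as 1 minus the Pearson correlation is one possible convention.

```python
from math import sqrt

# Hypothetical toy profiles indexed by semitones above the tonic (NOT measured data).
MAJOR_TOY = [3, 1, 2, 1, 3, 2, 1, 3, 1, 2, 1, 2]
MINOR_TOY = [3, 1, 2, 3, 1, 2, 1, 3, 2, 1, 2, 1]


def rotate(profile, k):
    """Transpose a profile up by k semitones (the tonic moves to pitch class k)."""
    k %= 12
    return profile[-k:] + profile[:-k]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Rows 0-11: major keys on C, C#, ..., B; rows 12-23: minor keys in the same order.
profiles = [rotate(MAJOR_TOY, k) for k in range(12)]
profiles += [rotate(MINOR_TOY, k) for k in range(12)]

# 24 x 24 matrix of correlations between every pair of key profiles.
corr = [[pearson(p, q) for q in profiles] for p in profiles]

# One possible distance convention (an assumption): highly correlated keys are close.
distance = [[1.0 - r for r in row] for row in corr]

print(f"C major vs G major:  r = {corr[0][7]:.3f}")
print(f"C major vs F# major: r = {corr[0][6]:.3f}")
```

Even with these toy ratings, keys a fifth apart (C and G major) correlate positively while keys a tritone apart (C and F# major) correlate negatively, which is the kind of distance structure that the text describes as mapping onto the torus.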

Second, electroencephalographic studies of musical expectancy (8–11) have examined the effect of melodic and harmonic context violations on one or more components of event-related brain responses that index the presence and magnitude of context violations. Overall, the cognitive distance of the probe event from the established harmonic context correlates positively with the amplitudes of such components. These effects appear even in listeners without any musical training (9, 11). The perceptual and cognitive structures that facilitate listening to music may thus be learned implicitly (2, 12–15).

The prefrontal cortex has been implicated in the manipulation and evaluation of tonal information (10, 11, 16–18). However, the regions that track motion on the tonality surface have not been identified directly. When presented with a stimulus that systematically moves across the entire tonality surface, will some populations of neurons respond selectively to one region of the surface and other populations respond selectively to another region of the surface?

Identification of tonality-tracking brain areas. In order to identify cortical sites that were consistently sensitive to activation changes on the tonality surface, eight musically experienced listeners (see “subjects” in supporting online text) underwent three scanning sessions each, separated by 1 week on average, in which they performed two perceptual tasks during separate runs. During each run, they heard a melody that systematically modulated through all 12 major and 12 minor keys (see “stimuli and tasks” in supporting online text) (Fig. 1 and audio S1). A timbre deviance detection task required listeners to respond whenever they heard a note played by a flute instead of the standard clarinet timbre, whereas a tonality violation detection task required listeners to respond whenever they perceived notes that violated the local tonality (Fig. 1D). The use of two tasks that required attentive listening to the same melody but different perceptual analyses facilitated our primary goal of identifying cortical areas that exhibit tonality tracking that is largely independent of the specific task that is being performed (see “scanning procedures” in supporting online

1Department of Psychological and Brain Sciences, 2Center for Cognitive Neuroscience, 3Dartmouth Brain Imaging Center, Dartmouth College, Hanover, NH 03755, USA. 4Institute for Psychoacoustics and Electronic Music, Ghent University, Ghent, Belgium.

*To whom correspondence should be addressed. E-mail: [email protected]

ERRATUM (post date 14 February 2003)

CORRECTIONS AND CLARIFICATIONS

TECHNICAL COMMENTS: Response to Comment on “Otolith δ18O record of mid-Holocene sea surface temperatures in Peru” by C. F. T. Andrus et al. (10 Jan. 2003, www.sciencemag.org/cgi/content/full/299/5604/203b). Reference (1) should have been cited in the first sentence, which should read “We find the arguments of Bearez et al. (1) unconvincing.” All subsequent references should be renumbered as number + 1. The numbers in the reference list are correct.