
Phylogenetic Analysis of Vertebrate Fibrillar Collagen Locates the Position of Zebrafish α3(I) and Suggests an Evolutionary Link Between Collagen α Chains and Hox Clusters

Ghislaine Morvan-Dubois,1 Dominique Le Guellec,1 Robert Garrone,1 Louise Zylberberg,2 Laure Bonnaud2,*

1 Institut de Biologie et Chimie des Protéines, Equipe “Matrice Extracellulaire et Développement,” CNRS UMR 5086, 7 passage du Vercors, 69367 Lyon, France

2 Université Paris 7, Equipe “Formations Squelettiques,” CNRS UMR 8570, Case 7077, 2 place Jussieu, 75251 Paris Cedex 05, France

Received: 9 August 2001 / Accepted: 14 May 2003

Abstract. Type I collagen in tetrapods is usually a heterotrimeric molecule composed of two α1 chains and one α2 chain. In some teleosts, a third α chain has been identified by chromatography, suggesting that type I collagen can also exist as an α1(I)α2(I)α3(I) heterotrimer. We prepared from zebrafish three distinct cDNAs, identified as those encoding the collagen α1(I), α2(I), and α3(I) chains. In this study of the evolution of fibrillar collagen α chains and their relationships, an exhaustive phylogenetic analysis of vertebrate fibrillar collagen sequences showed that each α chain forms a monophyletic cluster. Results obtained with the newly isolated zebrafish sequences showed that the α3(I) chain is phylogenetically close to the α1(I) chain and support the hypothesis that the α3(I) chain arose from a duplication of the α1(I) gene. This duplication might have occurred during the duplication of the actinopterygian genome, soon after the divergence of actinopterygians and sarcopterygians, a hypothesis supported by the demonstration of syntenic evolution between a set of fibrillar collagen genes and the Hox clusters in mammals. An evolutionary scenario is proposed in which the phylogenetic relationships of the α chains of vertebrate fibrillar collagens could be related to Hox cluster history.

Key words: Collagen α3(I) chain; Zebrafish; Phylogenetic analysis; Hox cluster

Introduction

Collagens are proteins presumed to have appeared in the late Proterozoic (about 800 Ma), and they allowed the emergence of organisms with stout bodies (Ohno 1996). These proteins are the major structural components of the metazoan extracellular matrix, which plays an active role in a wide variety of complex processes. The collagen composition of the matrix varies from tissue to tissue (Mayne and Burgeson 1987; Vuorio and de Crombrugghe 1990). The various collagens are referred to as collagen types. More than 20 collagen types and at least 38 polypeptide chains have been identified, and this number is still increasing (Myllyharju and Kivirikko 2001). Three polypeptide chains (α chains), each forming a left-handed helix, are intertwined to form a right-handed triple helix. Within the triple-helical region of each chain, every third amino acid is a glycine. Thus, the sequence of an α chain can be written as (Gly–X–Y)ₙ, where X and Y are amino acids other than glycine and n varies according to the collagen type and domain. Proline is often found in the X position and 4-hydroxyproline in the Y position (Fietzek and Kuhn 1976). This triple-helical domain, shared at least in part by all collagens, is flanked by two terminal globular domains, the N- and C-propeptides. Each collagen molecule is composed either of three genetically identical α chains, forming a homotrimer such as type II collagen [α1(II)]₃ or type III collagen [α1(III)]₃, or of α chains encoded by two or three different genes, forming a heterotrimer such as type I collagen [α1(I)]₂α2(I) or type VI collagen α1(VI)α2(VI)α3(VI).

J Mol Evol (2003) 57:501–514 DOI: 10.1007/s00239-003-2502-x

*Present address: Institut Jacques Monod, Tour 43, CNRS, Universités Paris 6 et 7, Equipe “Evolution du Développement des Nématodes,” 2 place Jussieu, 75005 Paris, France

Correspondence to: Louise Zylberberg; email: [email protected]

Collagens are divided into two major groups on the basis of three criteria: the nature of their supramolecular aggregates, the length of their α chains, and the continuity of their Gly–X–Y domains. On this basis, fibril-forming (fibrillar) collagens are distinguished from non-fibril-forming (nonfibrillar) collagens (Vuorio and de Crombrugghe 1990; van der Rest 1991). Nonfibrillar collagens form various supramolecular structures; their triple-helical regions vary in length and contain non-triple-helical interruptions. The fibrillar collagens (Table 1) form cross-striated fibrils and have conserved protein structures, not only in the helical domain but also in the nonhelical terminal domains. All collagen α chains are encoded by distinct genes except the α3(XI) chain, which is a posttranslationally overmodified product of the gene encoding the α1(II) chain (Eyre and Wu 1987). Notably, the COL3A1 and COL5A2 genes lie close to one another, in a tail-to-tail orientation, on human chromosome 2 (Valkkila et al. 2001), whereas the remaining five collagen genes that have been mapped are not linked (Table 1).

The nonfibrillar collagens are encoded by genes without obvious conservation of intron–exon organization (Vuorio and de Crombrugghe 1990; Exposito et al. 1991; van der Rest and Garrone 1991). In contrast, fibrillar collagen genes are conserved and are thought to have derived from a single ancestral gene by gene duplication (Mayne and Burgeson 1987; Vuorio and de Crombrugghe 1990; Exposito et al. 2000). In the fibrillar collagens, the helical domain of each α chain contains a little more than 1000 amino acids. The C-propeptide, composed of around 240–275 amino acids, is connected to the main triple helix by a short telopeptide; it is the most conserved domain and has therefore been assumed to play a critical role in the first steps of procollagen assembly, from chain recognition and initial interaction to trimerization and stabilization of the molecule (Vuorio and de Crombrugghe 1990). The two terminal globular domains, the N-propeptide (for collagens I, II, and III) and the C-propeptide, are cleaved during extracellular processing, whereas two short non-triple-helical domains, the N- and C-telopeptides, remain attached to the helical domain. The functional diversity of collagens is achieved by posttranslational modifications that can vary with the tissue in which a given collagen is found. These extensive modifications involve various enzymes, many of which are specific to collagen (Prockop et al. 1979a, b). Hydroxylation of specific prolyl and lysyl residues and glycosylation of hydroxylysyl residues are intracellular steps. Differences in the extent of lysine hydroxylation and glycosylation distinguish the type I collagen of hard and soft tissues (Veis and Sabsay 1987). Extracellular steps include at least two proteolytic steps involved in the conversion of procollagen to collagen, and an oxidative deamination.

Five types of fibrillar collagens have been identified in tetrapods: types I, II, III, V, and XI. They fall into two subgroups. The first subgroup contains types I, II, and III, considered major collagens; they constitute the main component of the extracellular matrix and differ in the appearance and diameter of the fibrils they form. The second subgroup includes types V and XI, which are present in smaller amounts and are considered minor collagens. They associate with type I and type II collagens, respectively, to form heterotypic fibrils: types I/V in the cornea (Birk et al. 1989) and types II/XI in cartilage (Vaughan et al. 1988; Mendler et al. 1989). However, according to further studies, types V and XI should no longer be considered separate collagen types but members of a larger collagen family within which different combinations of α chains can exist (Mayne et al. 1993; Imamura et al. 2000).

Table 1. Fibrillar collagen types in vertebratesᵃ

| Type | Constituent chain | Chain composition | Gene | Human chromosomal location | Occurrence |
|---|---|---|---|---|---|
| I | α1 | [α1(I)]₃ | COL1A1 | 17q21.3-q22 | Skin, bones, teeth, tendons, blood vessels, etc. |
| | α2 | [α1(I)]₂α2(I) | COL1A2 | 7q21.3-q22 | |
| | α3 | α1(I)α2(I)α3(I) | COL1A3 | – | |
| II | α1 | [α1(II)]₃ | COL2A1 | 12q13-q14 | Cartilages, cornea |
| III | α1 | [α1(III)]₃ | COL3A1 | 2q24.3-q31 | Skin, blood vessels |
| V | α1 | [α1(V)]₃ | COL5A1 | 9q34.2-q34.3 | Associated with type I collagen; specific localization depending on the chain composition |
| | α2 | [α1(V)]₂α2(V) | COL5A2 | 2q24.3-q31 | |
| | α3 | α1(V)α2(V)α3(V) | COL5A3 | 19p13.2 | |
| | α4 | [α1(V)]₂α4(V) | COL5A4 | – | |
| XI | α1 | [α1(XI)]₂α2(V) | COL11A1 | 1p21 | Cartilages |
| | α2 | α1(XI)α2(XI) | COL11A2 | 6p21.2 | |
| | α3 | α3(XI) | COL2A1 | 12q13-q14 | |

ᵃ Chromosomal locations of the α-chain genes refer to human chromosomes.

The characteristics of each type of fibrillar collagen are shared by the various groups of vertebrates; nevertheless, some differences are observed between tetrapods and other vertebrates. Notably, fibrillar type III collagen, often associated with type I collagen, has so far been identified only in tetrapods. Fibrillar type I collagen is the most ubiquitous, and hence the most studied; it often consists of two α1(I) chains and one α2(I) chain in a heterotrimeric structure. However, it is also found as an (α1)₃ homotrimer in birds (Jimenez et al. 1977) and mammals (Uitto 1979). It can also be composed of three different chains, as in actinopterygians: early biochemical analyses of the skin of a teleost, the cod Gadus morhua, revealed the presence of an α3(I) chain (Piez 1965). Further biochemical analyses reported a third α chain in type I collagen in almost all groups of teleosts (Kimura 1985, 1992; Kimura et al. 1987; Ramshaw et al. 1988; Matsui et al. 1991; Zylberberg et al. 1992a; Saito et al. 1998). This third α chain was also identified in the acipensiform chondrostean Acipenser transmontanus, the white sturgeon (Kimura 1992). Peptide analyses indicated that the α3(I) chain is much more similar to the α1(I) chain than to the α2(I) chain (Piez 1965; Kimura 1992; Zylberberg et al. 1992a). This third chain was not found in the sarcopterygian dipnoan Lepidosiren paradoxa, a lobe-finned fish (Kimura 1992). To date, therefore, the third α chain has been identified only in actinopterygians.

When present in actinopterygians, the three α chains of type I collagen are thought to be encoded by three different genes. The α3(I) chain could originate from a duplication of the α1(I) gene (Kimura 1992; Zylberberg et al. 1992a). This hypothesis is supported by a nucleotide-level comparison of the three α(I) genes of the rainbow trout, which shows the greatest similarity between the α1(I) and α3(I) genes (Saito et al. 2001). Most studies of the phylogenetic relationships of fibrillar collagen genes have been based on a taxonomically limited dataset including only mammals. This may reflect the imbalance between the abundant literature on mammalian collagen sequences and the sparse data from other vertebrates.

To gather more data within actinopterygians, we carried out a molecular identification of the three α chains of type I collagen of zebrafish (Danio rerio), using an exhaustive dataset of all sequences available in the EMBL database and in ZFIN, a zebrafish-specific database. New data on teleost type I collagen were integrated in order to determine the phylogenetic position of the α3(I) chain and to assess relationships within the fibrillar collagens.

Moreover, the present study points to similarities between the history of the fibrillar collagen genes and that of the Hox genes in mammals (Bailey et al. 1997). Hox genes encode proteins that help determine the anterior–posterior axis during the development of bilaterians. A single ancestral Hox complex is found in basal chordates such as amphioxus (cephalochordates) (Garcia-Fernandez and Holland 1996; Ferrier et al. 2000). In amphioxus, the cluster comprises 12 genes, probably as a result of gene loss from an ancestor that possessed 13 genes (Ferrier and Holland 2001). In extant mammals, the four Hox clusters, each comprising 9 to 11 genes, appear to have arisen from the single ancestral cluster by duplication events and gene loss (Ohno 1996; Aparicio 2000). Five fibrillar collagen genes are linked to the four Hox clusters, and the collagen genes and the clusters are thought to have duplicated concomitantly (Bailey et al. 1997). Actinopterygians, especially teleosts, show further duplication of Hox clusters, leading to seven Hox clusters in the zebrafish. Thus, on the basis of the hypothesis of concomitant duplication in mammals and our new data on collagen gene evolution in Danio, we suggest an evolutionary scenario for the fibrillar collagen genes in relation to Hox cluster duplication. More information on the history of collagen could improve our understanding of the emergence of mineralized tissues during vertebrate evolution.

Materials and Methods

Cloning of Zebrafish cDNA Encoding Type I Collagen

An oligo(dT)-primed zebrafish cDNA library from embryos 18–40 h postfertilization (a gift from Prof. Chambon), constructed in the lambda ZAP II vector (Stratagene), was screened with the HF677 human α1(I) collagen cDNA probe, labeled with [α-32P]dCTP (Amersham, France) using a random-priming labeling kit (Promega). Two clones (collz2400 and collz3500) were isolated and subcloned into pBluescript SK+ (Stratagene). DNA sequencing was performed by the dideoxynucleotide chain termination method (Sanger et al. 1977). A third clone (collz360) was obtained by RT-PCR on 72-h zebrafish mRNA, using degenerate primers (forward, gtactgggtggacccyraccagggcn; reverse, ttggagccctgsggaggmagrgmcttcttcagg) designed from trout sequences (Saito et al. 1998). Screening EST banks with the Basic Local Alignment Search Tool (BLAST) identified one clone, fd02a10.y1, which overlapped collz360. The fd02a10.y1 data were retrieved from the Zebrafish Information Network (ZFIN), Zebrafish International Resource Center, University of Oregon, Eugene, OR 97403-5274 (World Wide Web URL: http://zfin.org/, June 22, 2000). The fd02a10.y1 clone was purchased from RZPD (http://rzpd.de) and sequenced. The cDNA obtained was named collz1500.
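The two RT-PCR primers above are written in IUPAC ambiguity code (y = C/T, r = A/G, s = C/G, m = A/C, n = any base), so each is really a mixture of oligonucleotides. As a minimal sketch, assuming only the standard IUPAC nucleotide codes, the Python snippet below counts and enumerates the sequences in each mixture; the names `degeneracy` and `expand` are illustrative and not part of the original protocol.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide ambiguity codes (lower case, matching the primers in the text).
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag", "y": "ct", "s": "cg", "w": "at", "k": "gt", "m": "ac",
    "b": "cgt", "d": "agt", "h": "act", "v": "acg", "n": "acgt",
}

FORWARD = "gtactgggtggacccyraccagggcn"
REVERSE = "ttggagccctgsggaggmagrgmcttcttcagg"


def degeneracy(primer):
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.lower())


def expand(primer):
    """Every concrete sequence in the degenerate primer mixture."""
    choices = (IUPAC[base] for base in primer.lower())
    return ["".join(combo) for combo in product(*choices)]


print(degeneracy(FORWARD), degeneracy(REVERSE))  # 16 16
```

Both primers are 16-fold degenerate (forward: y × r × n = 2 × 2 × 4; reverse: s × m × r × m = 2⁴), which is modest enough for the mixture to work as a single PCR primer pair.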

Sequence Analysis

Nucleotide sequences of the following collagen α-chain genes were taken from EMBL (species, chain, accession number): Bos taurus α1(III), L47641. Chrysophrys major α1(V/XI), AB045975. Cynops pyrrhogaster α1(I), AB015438; α1(II), AB022046. Danio rerio α1(I), AJ318212; α2(I), AJ318213; α3(I), AJ318214; α1(II), U23822. Equus caballus α1(II), U62528. Gallus gallus α1(V), AF137273; α1(XI), M88593. Homo sapiens α1(I), K01228; α2(I), J03464; α1(III), X14420; α1(V), M76729; α2(V), Y14690; α3(V), AF177941. Mus musculus α2(I), X58251; α1(V), AB009993; α2(V), NM007737; α3(V), AF176645; α1(XI), D38162. Oncorhynchus mykiss α1(I), AB052835; α2(I), AB052837; α3(I), AB052836. Rana catesbeiana α1(I), AB015440; α2(I), D88764. Rattus norvegicus α1(I), Z78279; α1(II), L48440; α1(III), AJ005395; α1(V), AF272662; α2(V), AJ224880; α4(V), AF272661. Xenopus laevis α1(I), AB034701; α1(II), M63595.

Nucleotide and amino acid alignments of all collagen gene sequences were performed with the Clustal X package (Thompson et al. 1997) and corrected by eye on the basis of known structural constraints (e.g., positions of cysteine residues) using the Se-Al program (Rambaut 1996).

cDNAs encoding the helical and C-propeptide/C-telopeptide domains were available for almost all species. For some species the data were partial (only one domain had been sequenced), and analyses were therefore performed separately for each domain. Preliminary analyses included all available vertebrate sequences, with the length of the alignment set by the shortest sequence; in these analyses, sequence length had no influence on tree topology. However, results obtained for the different regions can be compared only when the same taxa are represented in each region. The phylogenetic analyses shown here were therefore performed on a reduced dataset including the longest sequences, and partial sequences were discarded.

The number of nonsynonymous substitutions (i.e., those leading to an amino acid change) per nonsynonymous site (Ka) and the number of synonymous substitutions (i.e., those leaving the amino acid unchanged) per synonymous site (Ks) were calculated for all observed pairwise combinations of collagen types. The Ka/Ks ratio is a measure of the selective pressure on a protein (Li 1993): values well below 1 indicate that amino acid changes are removed by purifying selection.
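To make the Ka and Ks definitions concrete, the sketch below estimates both quantities for two aligned, gap-free, in-frame coding sequences. It swaps in the simpler Nei–Gojobori (1986) counting method with a Jukes–Cantor correction for the Li (1993) procedure cited above, so it will not reproduce Table 3 exactly. The function names and the convention of counting point mutations to stop codons as nonsynonymous are assumptions of this sketch.

```python
import itertools
import math

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_sites(codon):
    """Nei-Gojobori synonymous sites of one codon (between 0 and 3).

    Each position contributes the fraction of its three possible point
    mutations that keep the amino acid; mutations to a stop codon are
    counted as nonsynonymous here (conventions differ between programs).
    """
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    sites += 1 / 3
    return sites


def codon_differences(c1, c2):
    """(synonymous, nonsynonymous) differences averaged over every order in which
    the differing positions could have mutated; paths through stop codons are skipped."""
    diff = [i for i in range(3) if c1[i] != c2[i]]
    if not diff:
        return 0.0, 0.0
    syn_total = non_total = 0.0
    valid_paths = 0
    for order in itertools.permutations(diff):
        current, syn, non, ok = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                ok = False
                break
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
        if ok:
            syn_total += syn
            non_total += non
            valid_paths += 1
    if valid_paths == 0:  # every path crosses a stop codon: call all changes nonsynonymous
        return 0.0, float(len(diff))
    return syn_total / valid_paths, non_total / valid_paths


def jukes_cantor(p):
    """Correct a proportion of differing sites for multiple hits."""
    if p == 0:
        return 0.0
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(seq1, seq2):
    """Return (Ka, Ks) for two aligned, gap-free, in-frame coding sequences."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if not seq1 or len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned, gap-free and in frame")
    s_sites = n_sites = s_diffs = n_diffs = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if CODE[c1] == "*" or CODE[c2] == "*":
            raise ValueError("stop codons are not allowed")
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        s_sites += s
        n_sites += 3 - s
        sd, nd = codon_differences(c1, c2)
        s_diffs += sd
        n_diffs += nd
    ka = jukes_cantor(n_diffs / n_sites)
    ks = jukes_cantor(s_diffs / s_sites) if s_sites > 0 else math.nan
    return ka, ks


print(ka_ks("ATGCTGAAA", "ATGCTCAAA"))  # one synonymous change: Ka is 0, Ks is positive
```

The Ka/Ks ratio then follows as `ka / ks`; on real collagen sequences it would be computed separately for the helical domain and the C-propeptide, as in Table 3.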

Because it is difficult both to ascertain the homology of sites in variable regions and to obtain a comparable dataset for each pairwise comparison, sites with a gap were excluded from the sequences before the analyses. Inclusion or exclusion of gaps did not change the topology of the trees: the relationships between the groups were not modified, but bootstrap values were lower when gaps were included (these characters were not informative and only added “noise” to the analyses). All phylogenetic analyses were therefore performed excluding gaps.
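The gap-exclusion step described above (often called complete deletion) amounts to dropping every alignment column in which any sequence carries a gap. A minimal Python sketch, assuming the alignment is held as a dictionary of equal-length strings with '-' or '.' as gap characters; the function name and toy data are hypothetical.

```python
def drop_gap_columns(alignment, gap_chars="-."):
    """Remove every column in which at least one sequence has a gap (complete deletion)."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    (length,) = lengths
    keep = [
        i for i in range(length)
        if all(seq[i] not in gap_chars for seq in alignment.values())
    ]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


toy = {"taxon_a": "AC-GT", "taxon_b": "ACAG-", "taxon_c": "ACAGT"}
print(drop_gap_columns(toy))  # {'taxon_a': 'ACG', 'taxon_b': 'ACG', 'taxon_c': 'ACG'}
```

This works column by column; for codon-based statistics such as Ka/Ks, one would normally remove whole codons instead so that the reading frame is preserved.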

Trees were constructed using neighbor-joining (NJ) and parsimony (PAUP4 [Swofford 1998]) methods. Bootstrap resampling analyses were performed in all cases, and bootstrap values higher than 70% were considered significant. Only the trees inferred by NJ analysis are shown, because the two methods led to similar results concerning the respective positions of each α-chain group. Parsimony analyses using PAUP4 showed a low consistency index (CI, under 50%), even when gaps were excluded, and a high homoplasy index (near 60%).
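The sketch below illustrates the two ingredients named above, neighbor-joining (Saitou and Nei 1987) and bootstrap resampling of alignment columns, in plain Python. It is not PAUP4's implementation: it uses uncorrected p-distances, returns the topology only (no branch lengths), and scores support for a single split; all names are hypothetical. A split counts as recovered if its taxa, or their complement, form a cluster in the returned tree, which makes the check independent of where the unrooted tree happens to be rooted.

```python
import random
from itertools import combinations


def p_distance(s1, s2):
    """Uncorrected proportion of differing sites between two aligned sequences."""
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def neighbor_joining(names, dist):
    """Neighbor-joining on dist, a dict mapping frozenset({x, y}) -> distance.
    Returns the unrooted topology as nested tuples (branch lengths are not kept)."""
    nodes = list(names)
    d = dict(dist)
    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence r_i: summed distance from node i to every other node.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        i, j = min(combinations(nodes, 2),
                   key=lambda pair: (n - 2) * d[frozenset(pair)] - r[pair[0]] - r[pair[1]])
        u = (i, j)
        for k in nodes:
            if k != i and k != j:
                d[frozenset((u, k))] = 0.5 * (d[frozenset((i, k))] + d[frozenset((j, k))]
                                              - d[frozenset((i, j))])
        nodes = [k for k in nodes if k != i and k != j] + [u]
    return tuple(nodes)


def leaf_sets(tree):
    """Leaf sets of every internal node of a nested-tuple tree (root last)."""
    sets = []

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        sets.append(leaves)
        return leaves

    walk(tree)
    return sets


def has_split(tree, taxa):
    """True if the tree separates `taxa` from all remaining leaves."""
    sets = leaf_sets(tree)
    everything = sets[-1]
    side = frozenset(taxa)
    return side in sets or (everything - side) in sets


def bootstrap_support(alignment, taxa, replicates=100, seed=1):
    """Fraction of NJ trees built from column-resampled alignments that contain the split."""
    names = list(alignment)
    length = len(next(iter(alignment.values())))
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        columns = [rng.randrange(length) for _ in range(length)]
        pseudo = {name: "".join(alignment[name][c] for c in columns) for name in names}
        dist = {frozenset((a, b)): p_distance(pseudo[a], pseudo[b])
                for a, b in combinations(names, 2)}
        if has_split(neighbor_joining(names, dist), taxa):
            hits += 1
    return hits / replicates
```

With the 70% threshold used above, a split would be called significant when `bootstrap_support(...)` exceeds 0.7.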

Results and Discussion

Sequence Structure

We isolated three different cDNAs, named collz1500, collz2400, and collz3500, ranging from 1.5 to 3.5 kb in length. The deduced amino acid sequences showed that the three cDNAs encoded the C-terminal region of the helical domain, the C-telopeptide, and the C-propeptide of three different fibrillar collagen α chains (Fig. 1). The main characteristics of the fibrillar collagens were identified in the zebrafish sequences (length, positions of cysteine residues, absence of interruptions in the helical domain). Cysteine residues form intra- and interchain disulfide bonds, which are essential for the stabilization of the procollagen molecule (Dion and Myers 1987; Koivu 1987; Lees and Bulleid 1994). In an attempt to identify each clone as a specific zebrafish α chain of type I collagen, the number and positions of cysteine residues in the three clones isolated from zebrafish cDNA were compared with the sequences available in the databank. The eight cysteine residues in the collz2400 sequence were characteristic of α1(I) chains. In the collz3500 sequence, the seven cysteine residues, with a serine replacing the cysteine found at the second position in the α1 chain, were specific to the α2(I) chain. In the collz1500 sequence, the seven cysteine residues, with a serine replacing the cysteine at the third position, were characteristic of an α3(I) chain, as observed in trout (Saito et al. 1998) and sole (data not shown) sequences. The C-telopeptide also had features that allowed discrimination between the α chains (Fig. 1). In brief, collz1500 corresponds to an α3(I) chain, collz2400 to an α1(I) chain, and collz3500 to an α2(I) chain.

Fig. 1. Amino acid alignment of the C-propeptide and C-telopeptide regions of the α chains of type I, type II, and type III collagens. Deduced amino acid sequences of amphibians (Rana, Xenopus), zebrafish, rainbow trout, and human are compared using ClustalW software. The alignment in the C-telopeptide region is corrected by eye. The arrow indicates the BMP1 cleavage site between the C-propeptide and the C-telopeptide. Conserved residues, such as the aspartic acid at the cleavage site and the cysteine residues, are in bold. Conserved hydrophobic zones (dark gray) presumably play a role in the primary interaction between the three C-propeptide regions of the α chains, and the contiguous hydrophilic zone (light gray) putatively plays a role in the specificity of this interaction.

The three isolated cDNAs encoded partial α chains of type I collagen and included sequences encoding the whole C-propeptide region. It should be pointed out that the C-propeptides and C-telopeptides of teleost sequences are more conserved than the helical domain in the various collagen types (Sicot et al. 1997). Thus, no specific amino acid sequence would be suitable for identifying a given type of α chain within the helical domain, which is composed of repetitive (Gly–X–Y)ₙ motifs, whereas comparisons of C-propeptide/C-telopeptide amino acid sequences revealed specific structural and functional characteristics of each α chain (Fig. 1).

The C-telopeptide region included the cleavage site of the molecule. Although C-telopeptides showed great variability in size and composition among mammals, amphibians, and teleosts (Fig. 1), common features characterized each α chain. α2(I) C-telopeptides were 15 amino acids long, with three Y residues and numerous G residues; α1(I) C-telopeptides were 25 or 26 amino acids long, with a QEKA signature; and α3(I) C-telopeptides of zebrafish and trout were 25–29 amino acids long, with a conserved QEKGPDP motif similar to the QEKAPDP motif of fish α1(I) C-telopeptides. The cleavage site sequences differed between α-chain types, although they are cleaved by the same C-proteinase (BMP1) (Fig. 1). The conserved amino acids were the aspartic acid on the C-terminal side of the A–D cleavage site (Kessler et al. 2001) and an aromatic residue at position −2 or −3. The chain-specific length and amino acid composition of the C-telopeptide could be the sign of specific constraints. This domain is probably under functional constraint, since its deletion prevents the formation of stable procollagen molecules (Alvares et al. 1999). Because of its functional importance and its specificity, this domain was included in the phylogenetic analyses.

The C-propeptide plays a role in the recognition of the α chains for trimerization (Doyle and Smith 1998). It showed little variability in size and presented conserved regions at both the nucleotide and amino acid levels. One of the most conserved zones is an eight-amino-acid hydrophobic motif (Fig. 1), which could have a role in the primary interaction between α chains (Lees et al. 1997). Specific recognition could be ensured by adjacent sequences composed of hydrophilic amino acids. C-propeptide sequences were conserved within each type of α chain but differed among α-chain types. The amino acid alignment showed that the zebrafish α3(I) chain shared most characteristic features with the α1(I) chain.

The percentage identities among zebrafish, human, and trout type I collagen α-chain sequences are shown in Table 2. The highest percentages were observed between the zebrafish and trout α1(I) chains (93%) and between the zebrafish and trout α2(I) chains (91%). The α3 chains of trout and zebrafish were more divergent (83% identity) than the α1 and α2 chains, which suggests that α3 is less constrained than the α1 and α2 chains.

In an attempt to evaluate the selective pressure on the different α chains, we calculated Ka, Ks, and Ka/Ks values for each pairwise comparison of chains of the various collagen types included in this study. Ka/Ks values were in the same range for the C-propeptide and helical domains, from 0.1 to 0.3. Nevertheless, Ka values obtained from helical-domain sequences were higher than those obtained from the C-propeptide region. Table 3 shows the values obtained by comparing zebrafish with trout type I α chains. Both synonymous (Ks) and nonsynonymous (Ka) substitutions occurred more frequently in the helical domain than in the C-propeptide, and, relative to Ks, Ka was proportionally higher in the helical domain than in the C-propeptide. Surprisingly, substitutions were tolerated more readily in the repeated Gly–X–Y motifs of the helical domain than in the C-propeptide; this range of possible variation in the helical domain apparently had no consequence on the function of the entire molecule. Selective pressure was stronger on the C-propeptide than on the helical domain, as attested by the lower C-propeptide Ka/Ks ratio in the α1(I) and α2(I) chains. In the α3(I) chain, Ka/Ks was almost the same for the two domains, and the C-propeptide Ka/Ks was 0.229, versus 0.096 for α1(I) and 0.109 for α2(I) (Table 3). The C-propeptide of the α3(I) chain was therefore less constrained than those of the α1(I) and α2(I) chains. This was confirmed by a higher number of nonsynonymous substitutions in the α3(I) chain than in the α1(I) and α2(I) chains (Table 3), which also reflects a fast evolutionary rate in the α3(I) chain. In mammals, the heterotrimer [α1(I)]₂α2(I) is known to be essential for the functioning of the molecule. In teleosts, the α3(I) chain was identified in the skin, bones, and muscles, but it was less abundant in muscle (Saito et al. 2001). According to these authors, the α3(I) chain is thought to be responsible for the lower denaturation temperature of trout collagen in the skin than in the muscles. Indeed, the thermal stability of collagen is determined in part by the total Gly–Pro–Pro triplet content, which is highest in the vertebrate α1(I) chain compared with the α2(I) chain and, in trout, the α3(I) chain. The lower thermal stability of trout skin collagen could thus be due to the presence of an α3(I) chain replacing an α1(I) chain in the heterotrimeric structure. Moreover, the low number of Gly–Pro–Pro triplets might reflect weak constraints on the α3(I) gene sequences. However, further studies are necessary to clarify the specific role of the α3(I) chain in building collagen fibrils and to determine its function.
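Both the (Gly–X–Y)ₙ periodicity of the helical domain and its Gly–Pro–Pro triplet content, discussed above in relation to thermal stability, can be read directly from a deduced protein sequence. The sketch below assumes a one-letter amino acid string covering part of a helical domain: it picks the offset at which glycine recurs most often, reports the fraction of triplets that start with Gly (1.0 for an uninterrupted helix), and counts GPP triplets. At the protein level the Y-position proline is usually hydroxylated, so a GPP triplet in the deduced sequence corresponds to Gly–Pro–Hyp in mature collagen. Function names and the toy sequences are hypothetical.

```python
def gly_frame(seq):
    """Offset (0, 1 or 2) at which glycine recurs most often every third residue."""
    counts = [sum(1 for i in range(frame, len(seq), 3) if seq[i] == "G") for frame in range(3)]
    return max(range(3), key=counts.__getitem__)  # ties resolve to the smallest offset


def triplets(seq, frame):
    """Consecutive complete triplets starting at the given offset."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def gly_xy_summary(seq):
    """Summarize the Gly-X-Y repeat and GPP content of a helical-domain fragment."""
    seq = seq.upper()
    frame = gly_frame(seq)
    trips = triplets(seq, frame)
    if not trips:
        raise ValueError("sequence too short to contain a triplet")
    return {
        "frame": frame,
        "triplets": len(trips),
        "gly_fraction": sum(t[0] == "G" for t in trips) / len(trips),
        "gpp": sum(t == "GPP" for t in trips),
    }


print(gly_xy_summary("GPPGAPGPPGEK"))
# {'frame': 0, 'triplets': 4, 'gly_fraction': 1.0, 'gpp': 2}
```

A `gly_fraction` below 1.0 flags an interruption of the repeat, the feature that separates nonfibrillar from fibrillar helical domains; comparing `gpp / triplets` between chains gives a rough proxy for the triplet-content comparison made above.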

Phylogeny

Our phylogenetic analyses were based on the helical-domain and C-propeptide/C-telopeptide regions of more than 30 vertebrate sequences (see Materials and Methods). All the trees obtained had a similar topology, with robust groups supported by high bootstrap values. The phylogenetic information obtained from each domain differed, possibly reflecting the specific role of each domain (see above). Hence, to compare the informative content of the two main parts of the gene, phylogenetic analyses of the helical and C-propeptide/C-telopeptide domains were performed separately, using nucleotide and amino acid sequences.

Amino acid alignments in the helical domain were unambiguous, since the boundary in the C-terminal region was defined by the richness in proline and the beginning of the C-telopeptide. Trees obtained from nucleotides and from amino acids had similar general topologies (data not shown), but their bootstrap values were lower than those obtained from the C-propeptide region (see below). The helical domain was not a good candidate for phylogenetic analysis, because there was very little variation in the amino acid sequences and the nucleotide sequences were saturated at third codon positions.

As shown above, the C-propeptide region was more constrained, especially at the third codon position, leading to fewer synonymous substitutions. This domain was therefore more appropriate for establishing the phylogeny of distant species. The trees obtained from C-propeptide/C-telopeptide nucleotide and amino acid sequences were almost identical. The main difference between the two trees concerned the position of the α3(I) chain with respect to that of the α1(I) chain. To focus on α1(I) relationships, the parts of the trees shown in Figs. 2 and 3 include only the α chains of type I and type II collagens; the α chains of the other fibrillar collagens were nevertheless taken into account when building the entire trees. The nucleotide-based tree (Fig. 2) showed a monophyletic cluster of teleost α3(I) chains linked with teleost α1(I) chains, which suggests that the α3 lineage arose after the divergence between actinopterygians and sarcopterygians. The amino acid-based tree (Fig. 3) showed an α3(I) group as the sister group of a monophyletic α1(I) cluster; this configuration suggests an earlier divergence of the α3 lineage, with a secondary loss of the α3(I) gene in tetrapods.
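The contrast between nucleotide and amino acid trees, and the codon-position analyses that follow, rely on splitting an in-frame alignment by codon position. A minimal sketch, assuming gap-free coding sequences whose length is a multiple of three: `codon_positions` builds first+second or first+third position datasets, and `position_p_distances` gives the proportion of differing sites at each codon position, a quick way to see that third positions change most often and therefore saturate first. Function names and example sequences are hypothetical.

```python
def codon_positions(seq, keep):
    """Keep only the given 1-based codon positions, e.g. (1, 2) or (1, 3)."""
    keep = sorted(set(keep))
    if not keep or any(p not in (1, 2, 3) for p in keep):
        raise ValueError("codon positions must be chosen from 1, 2, 3")
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(seq[i + p - 1] for i in range(0, len(seq), 3) for p in keep)


def position_p_distances(seq1, seq2):
    """Proportion of differing sites at codon positions 1, 2 and 3."""
    if not seq1 or len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned, gap-free and in frame")
    result = []
    for p in (1, 2, 3):
        a, b = codon_positions(seq1, (p,)), codon_positions(seq2, (p,))
        result.append(sum(x != y for x, y in zip(a, b)) / len(a))
    return tuple(result)


print(codon_positions("ATGCTGAAA", (1, 2)))  # ATCTAA
print(position_p_distances("ATGCTGAAA", "ATGCTCAAG"))
```

Applying `codon_positions` to every row of an alignment yields the reduced datasets on which separate trees can be built.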

Table 2. Percentage identity based on amino acid sequences among the α chains of type I collagen of Danio rerio, Oncorhynchus mykiss, and Homo sapiens (a)

                              α1(I), D. rerio   α2(I), D. rerio   α3(I), D. rerio
α1(I), Oncorhynchus mykiss    93                61                78
α1(I), Homo sapiens           84                60                74
α2(I), Oncorhynchus mykiss    61                91                59
α2(I), Homo sapiens           61                76                58
α3(I), Oncorhynchus mykiss    80                59                83

(a) These percentages were calculated from the partial helical domain (173 amino acids), the C-telopeptide (15 to 24 amino acids), and the C-propeptide (242 to 245 amino acids). Gaps were not taken into account in these percentages.
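The footnote states that gaps were not taken into account. One common way to implement that convention, sketched below in Python under the assumption that both sequences come from the same multiple alignment, is to drop every column in which either sequence carries a gap and to count identities over the remaining columns. The function name and the toy collagen-like fragments are hypothetical, and the footnote does not say whether columns gapped only in other sequences of the alignment were also excluded.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percentage of identical residues over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be taken from the same alignment (equal length)")
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    # Hypothetical Gly-X-Y-like fragments; the gapped column is ignored.
    print(percent_identity("GPPGAPGPR", "GPAGAP-PK"))
```

In the toy example, six of the eight ungapped columns match, giving 75%.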

Table 3. Number of synonymous (Ks) and nonsynonymous (Ka) substitutions per site calculated from C-propeptide (CPRO) regions and helical domains (HD) of the α chains of type I collagen in Danio rerio and Oncorhynchus mykiss (a)

Comparison                                            Region   Ka       Ks       Ka/Ks
α1(I), Danio rerio vs. α1(I), Oncorhynchus mykiss     CPRO     0.0423   0.4422   0.0957
                                                      HD       0.0881   0.6582   0.1338
α2(I), Danio rerio vs. α2(I), Oncorhynchus mykiss     CPRO     0.0557   0.5128   0.10865
                                                      HD       0.1516   0.5919   0.2561
α3(I), Danio rerio vs. α3(I), Oncorhynchus mykiss     CPRO     0.1199   0.5237   0.2289
                                                      HD       0.1797   0.8381   0.2144

(a) Ka and Ks were evaluated from a nucleotide alignment of the type I and type II collagens of vertebrates.
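The last column of Table 3 is simply Ka divided by Ks for each comparison. The short Python sketch below recomputes it from the tabulated values; it does not re-estimate Ka and Ks themselves, which requires a codon-based counting method applied to the alignment. Ratios well below 1, as in all six comparisons, are conventionally read as evidence of purifying selection, and the higher α3(I) C-propeptide ratio is in line with the suggestion that this gene is less constrained. The dictionary layout and function name are illustrative choices, not part of the original analysis.

```python
from typing import Dict, Tuple

# (chain, region) -> (Ka, Ks, Ka/Ks as printed in Table 3), zebrafish vs. rainbow trout.
TABLE_3: Dict[Tuple[str, str], Tuple[float, float, float]] = {
    ("alpha1(I)", "CPRO"): (0.0423, 0.4422, 0.0957),
    ("alpha1(I)", "HD"): (0.0881, 0.6582, 0.1338),
    ("alpha2(I)", "CPRO"): (0.0557, 0.5128, 0.10865),
    ("alpha2(I)", "HD"): (0.1516, 0.5919, 0.2561),
    ("alpha3(I)", "CPRO"): (0.1199, 0.5237, 0.2289),
    ("alpha3(I)", "HD"): (0.1797, 0.8381, 0.2144),
}


def ka_ks_ratio(ka: float, ks: float) -> float:
    """Ratio of nonsynonymous to synonymous substitutions per site."""
    if ks <= 0:
        raise ValueError("Ks must be positive for the ratio to be defined")
    return ka / ks


if __name__ == "__main__":
    for (chain, region), (ka, ks, printed) in TABLE_3.items():
        ratio = ka_ks_ratio(ka, ks)
        print(f"{chain:10s} {region:4s} Ka/Ks = {ratio:.4f} (table: {printed})")
```

Recomputed this way, each ratio agrees with the printed value to within 1e-4.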


We tested the impact of nucleotide position (within a codon) on phylogenetic relationships. Analyses were performed using the first and third codon positions and using the first two codon positions. The resulting trees were identical in their general topology to those based on all nucleotides. The main difference, as expected, concerned the tips of the tree, i.e., the position of the teleost α3(I) gene relative to the α1(I) gene, whose corresponding chains were supposed to have diverged most recently. Analysis with the first and third codon positions led to a tree similar to that based on all nucleotides (Fig. 2). In contrast, analysis performed with the first two codon positions led to a tree similar to the amino acid-based tree (Fig. 3). Substitutions at the third codon position had no consequence for the relationships observed with amino acid sequences, probably because they were more often synonymous or, when not, led to conservative amino acid changes. Thus, the position of the teleost α3(I) chain in the nucleotide tree might be attributed to substitutions at the third position.

Relationships between these closely related species were better defined using nucleotide sequences, with relationships and nodal support determined essentially by the third codon position. The relationships of chains that diverged earlier were determined at the amino acid level, i.e., by the first and second bases of each codon, the third base contributing only noise and saturation at this level of variation. The variability of the third codon position was sufficient to resolve and discriminate among closely related species, but this position was not necessary to establish relationships between actinopterygians and sarcopterygians. Moreover, the amino acid sequence and the secondary structure reflected the constraints on the protein. The grouping of all α(I) chains observed in the amino acid tree reflected their essential function, whereas the common origin of the teleost α1(I) and α3(I) genes was revealed only by nucleotide analysis, the third codon position carrying all of this information.

The phylogenetic relationships of all fibrillar collagen types were represented in a consensus tree (Fig. 4) built from the trees obtained from separate analyses of amino acid and nucleotide sequences of the C-propeptide/C-telopeptide and helical domains. The robustness of the nodes in the resulting phylogenies (as measured by bootstrap values) was very surprising, considering the supposedly very ancient divergence of these chains. This could be explained by the constraints imposed on a molecule so essential to the metazoan condition. However, phylogenetic relationships sometimes reflect a bias in the data: it has been shown that G+C content can affect phylogenetic results and can be associated with long branches and with the inferred phylogenetic position (Foster and Hickey 1999). All the sequences included in this analysis had G+C values ranging from 40 to 62%, the range of values also found in the human genome (Eyre-Walker and Hurst 2001), and the three domains (helical, C-propeptide, and C-telopeptide) had comparable G+C percentages. It should be noted that the glycine residue imposed at every third position in the helical domain did not lead to a higher G+C rate. The α chain sequences of a given species did not show identical G+C percentages; the range of variation within a species could be wide. Likewise, neither groups A and B (Fig. 4) nor an individual α chain or collagen type could be identified by a specific G+C percentage. The variation in G+C content that distinguishes cold- and warm-blooded vertebrates (Bernardi 2000) was not found in the collagen sequences and did not influence the grouping of the various α chains. Consequently, the structure of the phylogenetic trees obtained should reflect the evolutionary relationships between the various collagen α chains, and the groupings observed were not a consequence of bias in base composition. In particular, the grouping of the α3(I) chains of Danio and Oncorhynchus was not due to a G+C bias, because their G+C contents (57.43 and 58.25%) were in the middle of the range of variation.

Fig. 2. Part of the phylogenetic tree obtained from C-propeptide/C-telopeptide nucleotide sequences, showing the relationships of the α chains of type I and type II collagens. The tree was obtained by neighbor-joining analysis (735 nucleotides, excluding gaps).

Fig. 3. Part of the phylogenetic tree obtained from C-propeptide/C-telopeptide amino acid sequences, showing the relationships of the α chains of type I and type II collagens. The tree was obtained by neighbor-joining analysis (245 amino acids, excluding gaps). The teleost α3(I) chains appear as a sister group to all other vertebrate α1(I) chains, whereas in Fig. 2 the teleost α3(I) chains appear as a sister group to the teleost α1(I) chains only.

Fig. 4. Consensus tree built from phylogenetic trees obtained with amino acid and nucleotide sequences. The trees were obtained by neighbor-joining analysis (gaps excluded), and the robustness of the nodes was tested by bootstrapping. The numbers on this consensus tree indicate the lowest bootstrap value obtained for the same node. Branch lengths are arbitrary. Group A includes genes of α chains located near Hox clusters and genetically linked to them, whereas group B is composed of genes not linked to them.

The consensus tree obtained from nucleotide and amino acid data defined well-supported relationships among the α chains of fibrillar collagens (Fig. 4). This tree showed that the α chains formed two main clusters. The first, named group A, included types I, II, and III and the α2(V) chain; the second, named group B, contained type XI and the α1(V), α3(V), and α4(V) chains. Groups A and B were supported by high bootstrap values. Type II did not appear as monophyletic with parsimony analysis (data not shown). However, with NJ analysis, type II and type III each formed a monophyletic cluster. The various α chains of type I and type V did not form monophyletic groups. Our study provides the first evidence of the relationships among the α chain clusters within groups A and B. Most α chains constituted monophyletic clusters supported by high bootstrap values: α1(II), α1/α3(I), α2(I), α1(III), α2(V), α1(XI), α2(XI) (data not shown), α1(V), and α3/α4(V). However, we observed that α2(V) was not the sister group of α1(V) or α3/α4(V), and that α2(I) was not the sister group of α1(I). The inclusion of the α3(I) chain in the same group as the α1(I) chain supports a closer link of the α3(I) chain with the α1(I) chain than with any other collagen α chain.

Our data provide evidence that the nomenclature of vertebrate fibrillar collagens, based on biochemical characteristics, does not correlate with their phylogenetic relationships (Sicot et al. 1997). This nomenclature is based on functional observations and does not take evolutionary concepts into account: a collagen type is not necessarily composed of α chains belonging to the same monophyletic cluster. Analyses including α chain sequences from a wide systematic range are thus very helpful for understanding the appearance and evolution of collagen chains in vertebrates. Fibrillar collagens are assumed to have arisen from a single ancestor that possessed collagen α chains (Yamada et al. 1980; Sandell and Boyd 1990; Ohno 1996; Exposito et al. 2000). Our phylogenetic tree shows that the relationships within each α chain cluster are consistent with the systematic framework: mammals are closer to amphibians than to teleosts. This means that all collagen types and almost all the α chains of group A were present before the radiation of the osteichthyans. The following results concern group A, which covered a broader systematic range than group B.
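Statements such as "the α3(I) chain is included in the same group as the α1(I) chain" are claims about monophyly, which can be checked mechanically once a tree is written as nested groups. The Python sketch below encodes, as hypothetical five-tip simplifications, the two alternative placements of the teleost α3(I) chains described for Fig. 2 (nucleotides) and Fig. 3 (amino acids). The tip labels (Dr = Danio rerio, Om = Oncorhynchus mykiss, Hs = Homo sapiens), the reduced topologies and the function names are illustrative assumptions, not the published trees. A set of tips is treated as monophyletic when it equals the full leaf set of some node in the rooted tree.

```python
from typing import FrozenSet, Iterable, Set, Tuple, Union

# A rooted tree is either a tip label or a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """All tip labels below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Leaf sets of every internal node, including the root."""
    if isinstance(tree, str):
        return set()
    found = {leaves(tree)}
    for child in tree:
        found |= clades(child)
    return found


def is_monophyletic(tree: Tree, taxa: Iterable[str]) -> bool:
    """True if the taxa are exactly the descendants of one node."""
    return frozenset(taxa) in clades(tree)


# Hypothetical reductions of the placements described for Figs. 2 and 3.
NUCLEOTIDE_TREE: Tree = ((("Dr_a1(I)", "Om_a1(I)"), ("Dr_a3(I)", "Om_a3(I)")), "Hs_a1(I)")
AMINO_ACID_TREE: Tree = (("Dr_a3(I)", "Om_a3(I)"), (("Dr_a1(I)", "Om_a1(I)"), "Hs_a1(I)"))

TELEOST_A1_A3 = {"Dr_a1(I)", "Om_a1(I)", "Dr_a3(I)", "Om_a3(I)"}
ALL_A1 = {"Dr_a1(I)", "Om_a1(I)", "Hs_a1(I)"}

if __name__ == "__main__":
    for name, tree in [("nucleotide", NUCLEOTIDE_TREE), ("amino acid", AMINO_ACID_TREE)]:
        print(name, is_monophyletic(tree, TELEOST_A1_A3), is_monophyletic(tree, ALL_A1))
```

In both reductions the two α3(I) tips form a clade, as the text reports; what differs is whether that clade nests among the teleost α1(I) chains (nucleotide tree) or sits outside all α1(I) chains (amino acid tree).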

The α2(V) and α1(III) genes occupy a basal position; their ancestor most likely appeared very early (Bailey et al. 1997). Type III collagen, found only in tetrapods, might have been lost in the actinopterygian lineage. We cannot exclude that the gene coding for α1(III) is present in teleosts, but as a pseudogene so modified that neither the gene nor the protein could be identified. This could be checked by searching for type III collagen in other actinopterygians, such as the sturgeon or Polypterus.

The various α chains of type I collagen appeared successively. The α2(I) chain appeared first, followed by the α1/α3 chains. Zebrafish and trout α3(I) chains were grouped together within the α1(I) cluster: the emergence of the α3(I) chain probably occurred only in the actinopterygian lineage, after the tetrapod divergence and before the separation of salmoniforms (Oncorhynchus mykiss) and cypriniforms (Danio rerio), which occurred in the late Jurassic period. The α1(II) lineage arose after α2(I).

The relationships between the α chains could be related to duplication and/or polyploidization events from an ancestor of the group A collagens. These events could be correlated with the evolution of other genes, which might involve genetic linkage between them.

Collagen and Hox Cluster Evolution

Some coding genes, including collagens, have been used to study the sequence of duplication events of the Hox clusters (Hughes 1999). First, the separation of the fibrillar collagen genes into two groups, A and B, could be explained by their genetic relations to Hox clusters. In mammals, group A includes genes of α chains located near Hox clusters and genetically linked to them, whereas group B is composed of genes not linked to them (Bailey et al. 1997). In group A, at least one α chain gene is associated with each Hox cluster (four Hox clusters have been identified in mammals). Bailey et al. (1997) have shown that α2(I) is genetically linked to the Hox A cluster, α1(I) to Hox B, α1(II) to Hox C, and α2(V) and α1(III) to Hox D. The variability in collagen types (or chains) could thus be considered a witness to the duplication events of the Hox clusters, and the evolution of α chains would then be directly related to the evolution of the Hox clusters.

More precisely, and according to this last assumption, our phylogenetic trees suggest that an ancestral collagen present before the evolutionary radiation of osteichthyans duplicated into an α2(V) plus α1(III) ancestor on one side and a type I/II ancestor on the other. In the second lineage, two successive duplications occurred: the first giving rise to the α2(I) lineage and the second to the α1(I) and α1(II) lineages. As previously reported, the mammalian COL3A1 and COL5A2 genes are located on chromosome 2 in a tail-to-tail orientation (Valkkila et al. 2001). However, because of the sequence differences between the COL5A2 and COL3A1 genes and the greater number of Alu insertions in the introns of the COL5A2 gene, it is not certain that they derived from a common ancestral collagen gene (Valkkila et al. 2001). Our phylogenetic results did not recover type III as a sister group of α2(V). Considering the absence of type III in actinopterygians in particular, a duplication of the COL5A2 gene cannot explain the presence of the COL3A1 gene on the same chromosome; this location is considered to be the result of a "stochastic" event (Weill et al. 1987).

The linkage of these genes in mammals and the phylogenetic relationships obtained in our study enabled us to deduce Hox cluster duplications from the sequence of appearance of collagen genes (Fig. 5). Our phylogenetic consensus tree suggests that the Hox D lineage appeared first, followed by the Hox A lineage and, finally, by the divergence between Hox B and Hox C. These results agree with the hypothesis of three rounds of duplication and with the sequence [D(A(B,C))] proposed by Bailey et al. (1997) in a study based on four collagen α chain genes. This number of duplications is consistent with the results obtained by cladistic analysis of other coding genes (Hughes 1999).

Fig. 5. Correlation between phylogenetic relationships of collagen types (deduced from our study) and Hox clusters (deduced from their supposed linkage).

In zebrafish, as already found in mammals, Hox clusters arose by duplication (Bailey et al. 1997). Fibrillar collagen genes also derived from a single gene by duplication, whether concomitantly or not. Comparative mapping (Postlethwait et al. 2000; Woods et al. 2000) and the characterization of a somatic cell hybrid panel (Chevrette et al. 2000) established that in zebrafish the α2(I) gene belongs to the same linkage group as hox aa (LG19) and that the α1(I) gene belongs to the same linkage group as hox bb (LG12). These data support our hypothesis of a concomitant duplication of some collagen genes and the Hox clusters. The α1(II) gene belongs to linkage group 8, different from those of hox ca (LG23) and hox cb (LG11). This apparent absence of a genetic link between the α1(II) gene and the Hox C cluster can be explained by the frequent chromosomal rearrangements that occurred in the actinopterygian lineage (Woods et al. 2000). In zebrafish, the A, B, C, and D Hox clusters duplicated by polyploidization, leading to eight clusters, reduced to seven most probably by the loss of one cluster (Aparicio 2000). The genome duplication probably occurred by tetraploidization after the tetrapod–actinopterygian divergence. It probably arose in an ancestral actinopterygian and was followed by successive recombinations and deletions in varying proportions in the various groups of teleosts (Aparicio et al. 1997; Holland 1997). The α2(V) and α3(I) genes have not yet been shown to be linked to Hox clusters, but on the basis of the phylogenetic trees obtained, the α3(I) gene seems to have appeared after the divergence of the Hox B/Hox C clusters, either by tandem duplication of the α1(I) gene on the same chromosome or by polyploidization. In the latter case, the α3(I) gene could be located on another chromosome, homologous to that containing the α1(I) gene. The α3(I) gene could be located near the cluster identified as hox bb in zebrafish (Amores et al. 1998). The acipensiforms could help to determine the timing of the genome duplication, since these actinopterygians appeared early after the actinopterygian divergence and already have an α3(I) chain (Kimura 1992). Study of their Hox clusters, which unfortunately have not yet been identified, would help to establish the hypothetical link between the duplication of the α1(I)/α3(I) genes and the Hox clusters (Bruce et al. 2001). The position of the α3(I) gene relative to the α1(I) gene in relation to polyploidization, and the hypothesis of linked duplication events in collagen and Hox cluster genes, could also be tested by chromosomal localization of collagen genes using FISH (fluorescence in situ hybridization) in Danio rerio.

In zebrafish, the duplication of genes simultaneously with that of Hox clusters has also been established for evx genes (evx1, evx2, eve1) (Amores et al. 1998; Soderberg et al. 2000). Each evx gene is genetically or physically linked with a Hox cluster, and these genes probably appeared by duplication of chromosomes carrying an ancestral evx gene (Sordino et al. 1996; Amores et al. 1998). Those evx duplicates could undergo subfunctionalization (Avaron et al. 2003) or acquire new functions (Borday et al. 2001). Indeed, a new function could have been acquired when duplicates underwent positive selection (Van de Peer et al. 2001). Likewise, various collagens with new functional capacities could have appeared by duplication of an ancestral collagen gene. Type I and type II fibrillar collagens are both involved in the mineralization process. From our phylogeny, type II collagen seems to have appeared after type I collagen. These two fibrillar collagens constitute the main organic components of bone and cartilage, respectively. Whether cartilage appeared earlier than bone is a debated question in studies dealing with the earliest vertebrates (Janvier 1996). Vertebrates have two skeletons: the dermal skeleton (or exoskeleton), which develops in the skin without a cartilaginous precursor (reviewed by Zylberberg et al. 1992b), and the endoskeleton, in which cartilage precedes bone in ontogeny. The association of calcified cartilage and bone is considered to be an ancient feature that regressed but is not completely lost in extant chondrichthyans, which retain the ability to ossify their endoskeleton (Peignoux-Deville et al. 1982). Although the dermal skeleton was once thought to precede the endoskeleton in the vertebrate lineage, recent work suggests that the common ancestor of all craniates already had an "exclusively cartilaginous endoskeleton" (Janvier 1996). Recently, cartilaginous tissues have been described in the remains of one of the earliest craniates (Shu et al. 1999). Cartilage appeared very early in the craniate lineage, whereas mineralization is thought to be a relatively recent innovation that took place after the origin of craniates (Janvier 1999). The successive appearance of type I and type II collagens does not discriminate between the hypotheses about the order of appearance of the two processes leading to bone and cartilage, respectively. Indeed, type I collagen is the most ubiquitous fibrillar collagen and also exists in unmineralized connective tissues, such as the dermis. Furthermore, type II collagen is synthesized in tissues other than cartilage, such as the vitreous (Linsenmayer and Little 1978; Mayne et al. 1993) and the chick notochord (Linsenmayer et al. 1973). Conversely, in hagfish and lampreys, some tissues show the morphological and structural characteristics of cartilage and are considered cartilaginous, but they are noncollagenous. Noncollagenous cartilage is also present in Cephalochordata, Arthropoda, and Mollusca. Thus, it could be hypothesized that cartilage differentiated very early in the evolution of the Bilateria (Wright et al. 2001), but that type II collagen, as an essential constituent of craniate cartilage, is a recent innovation.
This is congruent with the phylogeny obtained, in which the α1(II) chain appeared after the α2(I) chain. Thus, the specific expression of collagen types may be an adaptive phenomenon. Further studies of this phenomenon should clarify the underlying structure–function relationships.

We conclude that the α3(I) gene shows characteristics which may indicate that it is less constrained than the α1(I) and α2(I) genes, perhaps in relation to its functional properties. Phylogenetic analyses show that the α3(I) chain is included in the same cluster as the α1(I) chain and support a closer link of the α3(I) chain with the α1(I) chain than with any other collagen α chain. These data are consistent with the hypothesis that the α3(I) chain identified in the actinopterygian lineage arose from a duplication of the α1(I) gene. Given the linkage of fibrillar collagen genes and Hox clusters established in mammals, the phylogenetic trees built from vertebrate fibrillar collagen sequences suggest the relationships [D(A(B,C))] among Hox clusters and support the hypothesis of a third round of duplication in the actinopterygian lineage.
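The inference of [D(A(B,C))] combines two ingredients stated in the Discussion: the linkage of group A collagen genes to mammalian Hox clusters reported by Bailey et al. (1997) (α2(I) with Hox A, α1(I) with Hox B, α1(II) with Hox C, α2(V) and α1(III) with Hox D), and the duplication scenario in which an α2(V) plus α1(III) lineage separated from a type I/II lineage, within which α2(I) branched off before α1(I) and α1(II) diverged. The Python sketch below replays that reasoning by relabeling each gene with its linked cluster and collapsing any clade whose tips all map to the same cluster. The collapsing rule, the tuple encoding and the function names are illustrative assumptions; the output uses parenthetical (Newick-like) notation, in which (D,(A,(B,C))) is the same topology as [D(A(B,C))].

```python
from typing import Dict, Tuple, Union

# A rooted tree is either a tip label or a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]

# Gene-to-cluster linkages in mammals, as cited from Bailey et al. (1997).
HOX_LINKAGE: Dict[str, str] = {
    "a2(V)": "D",
    "a1(III)": "D",
    "a2(I)": "A",
    "a1(I)": "B",
    "a1(II)": "C",
}

# Group A duplication scenario as described in the Discussion.
COLLAGEN_TREE: Tree = (("a2(V)", "a1(III)"), ("a2(I)", ("a1(I)", "a1(II)")))


def relabel(tree: Tree, linkage: Dict[str, str]) -> Tree:
    """Replace genes by linked Hox clusters; merge a clade whose tips share one cluster."""
    if isinstance(tree, str):
        return linkage[tree]
    children = tuple(relabel(child, linkage) for child in tree)
    if all(isinstance(c, str) for c in children) and len(set(children)) == 1:
        return children[0]
    return children


def to_newick(tree: Tree) -> str:
    """Parenthetical notation without branch lengths."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(child) for child in tree) + ")"


if __name__ == "__main__":
    print(to_newick(relabel(COLLAGEN_TREE, HOX_LINKAGE)))
```

The result depends on treating the two Hox D-linked genes as one lineage: if α2(V) and α1(III) were instead encoded as successive basal branches, this simple rule would return (D,(D,(A,(B,C)))), which still places the Hox D lineage first.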

Acknowledgments. The authors are grateful to Prof. Chambon for providing the zebrafish cDNA library and to Drs. Kimura and Sato for the trout cDNA clone encoding the collagen α3(I) chain. We are indebted to Dr. M. Laurin for his helpful advice and to Dr. A. Kovoor for his valuable suggestions and review of our English. We thank Drs. J. Bonaventure, M. Girondot, and F. Meunier for constructive discussions. This work was supported by a "Bonus Qualité Recherche" grant from Lyon I University.

References

Alvares K, Siddiqui F, Malone J, Veis A (1999) Assembly of the type 1 procollagen molecule: selectivity of the interactions between the alpha 1(I)- and alpha 2(I)-carboxyl propeptides. Biochemistry 38:5401–5411

Amores A, Force A, Yan YL, Joly L, Amemiya C, Fritz A, Ho RK, Langeland J, Prince V, Wang YL, Westerfield M, Ekker M, Postlethwait JH (1998) Zebrafish hox clusters and vertebrate genome evolution. Science 282:1711–1714

Aparicio S (2000) Vertebrate evolution: recent perspectives from fish. Trends Genet 16:54–56

Aparicio S, Hawker K, Cottage A, Mikawa Y, Zuo L, Venkatesh B, Chen E, Krumlauf R, Brenner S (1997) Organization of the Fugu rubripes Hox clusters: Evidence for continuing evolution of vertebrate Hox complexes. Nature Genet 16:79–83

Avaron F, Thaeron C, Beck C, Borday V, Geraudie J, Casane D, Laurenti P (2003) Comparison of even skipped-related gene expression pattern in Vertebrates shows an association between expression domain loss and modification of selective constraints on sequences. Evol Dev 5:145–146

Bailey WJ, Kim J, Wagner GP, Ruddle FH (1997) Phylogenetic reconstruction of vertebrate Hox cluster duplications. Mol Biol Evol 14:843–853

Bernardi G (2000) Isochores and the evolutionary genomics of vertebrates. Gene 241:3–17

Birk DE, Fich JM, Barbiaz JP, Linsenmayer TF (1989) Collagen type I and type V are present in the same fibril in the avian corneal stroma. J Cell Biol 106:988–1008

Borday V, Thaeron C, Avaron F, Brulfert A, Casane D, Laurenti P, Geraudie J (2001) evx1 transcription in bony fin rays segment boundaries leads to a reiterated pattern during zebrafish fin development and regeneration. Dev Dynam 220:91–98

Bruce AE, Oates AC, Prince VE, Ho RK (2001) Additional hox clusters in the zebrafish: divergent expression patterns belie equivalent activities of duplicate hoxB5 genes. Evol Dev 3:127–144

Chevrette M, Joly L, Tellis P, Knapik EW, Miles J, Fishman M, Ekker M (2000) Characterization of a zebrafish/mouse somatic cell hybrid panel. Genomics 64:119–126

Dion AS, Myers JC (1987) COOH-terminal propeptides of the major human procollagens. Structural, functional and genetic comparisons. J Mol Biol 193:127–143

Doyle SA, Smith BD (1998) Role of the pro-alpha2(I) COOH-terminal region in assembly of type I collagen: Disruption of two intramolecular disulfide bonds in pro-alpha2(I) blocks assembly of type I collagen. J Cell Biochem 71:233–242

Exposito JY, Le Guellec D, Lu Q, Garrone R (1991) Short chain collagens in sponges are encoded by a family of closely related genes. J Biol Chem 266:21923–21928

Exposito J, Cluzel C, Lethias C, Garrone R (2000) Tracing the evolution of vertebrate fibrillar collagens from an ancestral alpha chain. Matrix Biol 19:275–279

Eyre-Walker A, Hurst LD (2001) The evolution of isochores. Nat Rev Genet 2:549–555

Eyre D, Wu JJ (1987) Type XI or 1α2α3α collagen. In: Mayne R, Burgeson RE (eds) Structure and function of collagen types. Academic Press, Orlando, FL, pp 261–279

Ferrier DE, Holland PW (2001) Ancient origin of the Hox gene cluster. Nature Rev Genet 2:33–38

Ferrier DE, Minguillon C, Holland PW, Garcia-Fernandez J (2000) The amphioxus Hox cluster: Deuterostome posterior flexibility and Hox14. Evol Dev 2:284–293

Fietzek PP, Kuhn K (1976) The primary structure of collagen. Int Rev Connect Tissue Res 7:1–60

Foster PG, Hickey DA (1999) Compositional bias may affect both DNA-based and protein-based phylogenetic reconstructions. J Mol Evol 48:284–290

Garcia-Fernandez J, Holland PW (1996) Amphioxus Hox genes: Insights into evolution and development. Int J Dev Biol Suppl:71S–72S

Holland PW (1997) Vertebrate evolution: Something fishy about Hox genes. Curr Biol 7:R570–R572

Hughes AL (1999) Phylogenies of developmentally important proteins do not support the hypothesis of two rounds of genome duplication early in vertebrate history. J Mol Evol 48:565–576

Imamura Y, Scott IC, Greenspan DS (2000) The pro-alpha3(V) collagen chain. Complete primary structure, expression domains in adult and developing tissues, and comparison to the structures and expression domains of the other types V and XI procollagen chains. J Biol Chem 275:8749–8759

Janvier P (1996) Early vertebrates. Oxford University Press, Oxford, pp 274–279

Janvier P (1999) Catching the first fish. Nature 402:21–22

Jimenez SA, Bashey RI, Benditt M, Yankowski R (1977) Identification of collagen alpha1(I) trimer in embryonic chick tendons and calvaria. Biochem Biophys Res Commun 78:1354–1361

Kessler E, Fichard A, Chanut-Delalande H, Brusel M, Ruggiero F (2001) Bone morphogenetic protein-1 (BMP-1) mediates C-terminal processing of procollagen V homotrimer. J Biol Chem 276:27051–27057

Kimura S (1985) The interstitial collagens in fish. In: Bairati A, Garrone R (eds) Biology of invertebrate and lower vertebrate collagens. NATO Series A, 3. Plenum Press, New York, pp 397–408

Kimura S (1992) Wide distribution of the skin type I collagen alpha 3 chain in bony fish. Comp Biochem Physiol B 102:255–260

Kimura S, Ohno Y, Miyauchi Y, Uchida N (1987) Fish skin type I collagen: Wide distribution of an alpha 3 subunit in teleosts. Comp Biochem Physiol B 88:27–34

Koivu J (1987) Identification of disulfide bonds in carboxy-terminal propeptides of human type I procollagen. FEBS Lett 212:229–232

Lees JF, Bulleid NJ (1994) The role of cysteine residues in the folding and association of the COOH-terminal propeptide of types I and III procollagen. J Biol Chem 269:24354–24360

Lees JF, Tasab M, Bulleid NJ (1997) Identification of the molecular recognition sequence which determines the type-specific assembly of procollagen. EMBO J 16:908–916

Li WH (1993) Unbiased estimation of the rates of synonymous and nonsynonymous substitution. J Mol Evol 36:96–99

Linsenmayer TF, Little CD (1978) Embryonic neural retina collagen: In vitro synthesis of high molecular weight forms of type II plus a new genetic type. Proc Natl Acad Sci USA 75:3235–3239

Linsenmayer TF, Trelstad RL, Gross J (1973) The collagen of chick embryonic notochord. Biochem Biophys Res Commun 53:39–45

Matsui R, Ishida M, Kimura S (1991) Characterization of an α3 chain from the skin type I collagen of chum salmon (Oncorhynchus keta). Comp Biochem Physiol 99B:171–174

Mayne R, Burgeson RE (1987) Structure and function of collagen types. Academic Press, Orlando, FL

Mayne R, Brewton RG, Mayne PM, Baker JR (1993) Isolation and characterization of the chains of type V/type XI collagen present in bovine vitreous. J Biol Chem 268:9381–9386

Mendler M, Eich-Bender SG, Vaughn L, Winterhalter KH, Bruckner P (1989) Cartilage contains mixed fibrils of collagen types II, IX, and XI. J Cell Biol 108:191–197

Myllyharju J, Kivirikko KI (2001) Collagens and collagen-related diseases. Ann Med 33:7–21

Ohno S (1996) The notion of the Cambrian pananimalia genome. Proc Natl Acad Sci USA 93:8475–8478

Peignoux-Deville J, Lallier F, Vidal B (1982) Evidence for the presence of osseous tissue in dogfish vertebrae. Cell Tissue Res 222:605–614

Piez KA (1965) Characterization of a collagen from codfish skin containing three chromatographically different alpha chains. Biochemistry 4:2590–2596

Postlethwait JH, Woods IG, Ngo-Hazelett P, Yan YL, Kelly PD, Chu F, Huang H, Hill-Force A, Talbot WS (2000) Zebrafish comparative genomics and the origins of vertebrate chromosomes. Genome Res 10:1890–1902

Prockop DJ, Kivirikko KI, Tuderman L, Guzman NA (1979a) The biosynthesis of collagen and its disorders. N Engl J Med 301:13–23

Prockop DJ, Kivirikko KI, Tuderman L, Guzman NA (1979b) The biosynthesis of collagen and its disorders. N Engl J Med 301:77–85

Rambaut A (1996) Se-Al: Sequence Alignment Editor, version 1.0. Oxford University, Oxford

Ramshaw JA, Werkmeister JA, Bremner HA (1988) Characterization of type I collagen from the skin of blue grenadier (Macruronus novaezelandiae). Arch Biochem Biophys 267:497–502

Saito M, Kunasaki N, Hirono I, Aoki T, Ishida M, Urano N, Kimura S (1998) Partial characterization of cDNA clones encoding the three distinct pro-α chains of type I collagen from rainbow trout. Fish Sci 64:780–786

Saito M, Takenouchi Y, Kunisaki N, Kimura S (2001) Complete primary structure of rainbow trout type I collagen consisting of alpha1(I)alpha2(I)alpha3(I) heterotrimers. Eur J Biochem 268:2817–2827

Sandell LJ, Boyd CD (1990) Conserved and divergent sequence and functional elements within collagen genes. In: Sandell LJ, Boyd CD (eds) Extracellular matrix genes. Academic Press, San Diego, pp 1–56

Sanger F, Nicklen S, Coulson AR (1977) DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci USA 74:5463–5467

Sharman AC, Holland PW (1998) Estimation of Hox gene cluster number in lampreys. Int J Dev Biol 42:617–620

Shu DG, Luo HL, Conway SM, Zhang XL, Hu SX, Chen L, Han J, Zhu M, Li Y, Chen LZ (1999) Lower Cambrian vertebrates from south China. Nature 402:42–46

Sicot FX, Exposito JY, Masselot M, Garrone R, Deutsch J, Gaill F (1997) Cloning of an annelid fibrillar-collagen gene and phylogenetic analysis of vertebrate and invertebrate collagens. Eur J Biochem 246:50–58

Soderberg C, Wraith A, Ringvall M, Yan YL, Postlethwait JH, Brodin L, Larhammar D (2000) Zebrafish genes for neuropeptide Y and peptide YY reveal origin by chromosome duplication from an ancestral gene linked to the homeobox cluster. J Neurochem 75:908–918

Sordino P, Duboule D, Kondo T (1996) Zebrafish Hoxa and Evx-2 genes: Cloning, developmental expression and implications for the functional evolution of posterior Hox genes. Mech Dev 59:165–175

Swofford DL (1998) PAUP*: Phylogenetic analysis using parsimony (* and other methods), Version 4. Sinauer Associates, Sunderland, MA

Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG (1997) The CLUSTAL X windows interface: Flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 25:4876–4882

Uitto J (1979) Collagen polymorphism: Isolation and partial characterization of alpha 1(I)-trimer molecules in normal human skin. Arch Biochem Biophys 192:371–379

Valkkila M, Melkoniemi M, Kvist L, Kuivaniemi H, Tromp G, Ala-Kokko L (2001) Genomic organization of the human COL3A1 and COL5A2 genes: COL5A2 has evolved differently than the other minor fibrillar collagen genes. Matrix Biol 20:357–366

Van de Peer Y, Taylor JS, Braasch I, Meyer A (2001) The ghost of selection past: rates of evolution and functional divergence of anciently duplicated genes. J Mol Evol 53:436–446

van der Rest M (1991) Collagens of bone. In: Hall BK (ed) Bone. CRC Press, Boca Raton, pp 187–237

van der Rest M, Garrone R (1991) Collagen family of proteins. FASEB J 5:2814–2823

Vaughan L, Mendler M, Huber S, Bruckner P, Winterhalter KH, Irwin MI, Mayne R (1988) D-periodic distribution of collagen type IX along cartilage fibrils. J Cell Biol 106:991–997

Veis A, Sabsay B (1987) The collagen of mineralized matrices. In: Peck WA (ed) Bone and mineral research 5. Elsevier, Amsterdam, pp 1–63

Vuorio E, de Crombrugghe B (1990) The family of collagen genes. Annu Rev Biochem 59:837–872

Weill D, Bernard M, Gargano S, Ramirez F (1987) The proα2(V) collagen gene is evolutionary related to the major fibrillar-forming collagens. Nucleic Acids Res 15:181–198

Woods IG, Kelly PD, Chu F, Ngo-Hazelett P, Yan YL, Huang H, Postlethwait JH, Talbot WS (2000) A comparative map of the zebrafish genome. Genome Res 10:1903–1914

Wright GM, Keeley FW, Robson P (2001) The unusual cartilaginous tissues of jawless craniates, cephalochordates and invertebrates. Cell Tissue Res 304:165–174

Yamada Y, Avvedimento VE, Mudryj M, Ohkubo H, Vogeli G, Irani M, Pastan I, de Crombrugghe B (1980) The collagen gene: Evidence for its evolutionary assembly by amplification of a DNA segment containing an exon of 54 bp. Cell 22:882–892

Zylberberg L, Bonaventure J, Cohen-Solal L, Hatmann DJ, Bereiter-Hahn J (1992a) Organization and characterization of fibrillar collagens in fish scales in situ and in vitro. J Cell Sci 103:273–285

Zylberberg L, Geraudie J, Meunier F, Sire JY (1992b) Biomineralization in the integumental skeleton of the living lower vertebrates. In: Hall BK (ed) Bone. CRC Press, Boca Raton, FL, Vol 4, pp 171–224