Evolution of collagen IV genes from a 54-base pair exon: A role for introns in gene evolution



J Mol Evol (1990) 30:479--488

Journal of Molecular Evolution © Springer-Verlag New York Inc. 1990

Evolution of Collagen IV Genes from a 54-Base Pair Exon: A Role for Introns in Gene Evolution

Giovanna Butticè,1 Paul Kaytes,2 Jeanine D'Armiento,1,* Gabriel Vogeli,2 and Markku Kurkinen1

1 Department of Medicine, University of Medicine and Dentistry of New Jersey, Robert Wood Johnson Medical School, Piscataway, New Jersey 08854, USA
2 The Upjohn Company, Kalamazoo, Michigan 49001, USA

Summary. The exon structure of the collagen IV gene provides a striking example for collagen evolution and the role of introns in gene evolution. Collagen IV, a major component of basement membranes, differs from the fibrillar collagens in that it contains numerous interruptions in the triple helical Gly-X-Y repeat domain. We have characterized all 47 exons in the mouse α2(IV) collagen gene and find two 36-, two 45-, and one 54-bp exons as well as one 99- and three 108-bp exons encoding the Gly-X-Y repeat sequence. All these exon sizes are also found in the fibrillar collagen genes. Strikingly, of the 24 interruption sequences present in the α2-chain of mouse collagen IV, 11 are encoded at the exon/intron borders of the gene, part of one interruption sequence is encoded by an exon of its own, and the remaining interruptions are encoded within the body of exons. In such "fusion exons" the Gly-X-Y encoding domain is also derived from 36-, 45-, or 54-bp sequence elements. These data support the idea that collagen IV genes evolved from a primordial 54-bp coding unit. We furthermore interpret these data to suggest that the interruption sequences in collagen IV may have evolved from introns, presumably by inactivation of splice site signals, following which intronic sequences could have been recruited into exons. We speculate that this mechanism could provide a role for introns in gene evolution in general.

* Current address: Department of Biochemistry, University of Medicine and Dentistry of New Jersey, Robert Wood Johnson Medical School, Piscataway, New Jersey 08854, USA

Offprint requests to: G. Butticè

Key words: Gene evolution -- Exon/intron structure -- Collagen IV -- Basement membrane

Introduction

Collagen genes provide an interesting system to study gene evolution. At least 22 separate genes are known to code for the different collagen types (for a review see Boyd et al. 1990). Genes encoding the fibrillar collagen types I, II, and III contain many discrete-size exons, most of them 54 bp long, in addition to the 45-, 99-, 108-, and 162-bp exons. It is very likely that these collagen genes evolved from an ancestral 54-bp coding unit (Yamada et al. 1980, 1984; Chu et al. 1984). Significantly, the exon organization of these genes has been conserved between species and the different fibrillar collagen types.

To date there has been no evidence for a 54-bp coding unit in nonfibrillar collagen genes or, for that matter, in invertebrate collagen genes. Based on the structure of seven collagen genes from Caenorhabditis elegans, Fields (1988) argued that evolution of different collagens could have at least two separate branches: one for the vertebrate fibrillar collagen types and one for the nonfibrillar collagen types including invertebrate collagens. It is important to note, however, that the above evolutionary relationship of vertebrate and invertebrate collagens as well as the different collagen types was based on the presence or absence of a 54-bp exon in the collagen genes studied so far. In contrast to these notions, Runnegar (1985) had suggested earlier, based on the structure of collagen genes col-1 and col-2 from C. elegans, that a 54-bp sequence may have been involved in the evolution of these genes. However, direct evidence for a 54-bp origin of the nematode collagen genes is lacking. Because nematode collagens represent a large gene family (there are perhaps 150 separate collagen genes in C. elegans; Cox et al. 1989), it is possible that the primordial 54-bp coding unit was obliterated during the gene amplification process. By characterizing the exon structure of the mouse α2(IV) collagen gene, we now report direct evidence for the 54-bp exon in a nonfibrillar collagen gene and show that many of the other exons in this gene can also be derived from a 54-bp sequence element for the Gly-X-Y repeat.

Collagen IV is a major component of basement membranes, a specialized form of extracellular matrix found between the epithelial and connective tissue cells (Timpl and Dziadek 1986). The complete amino acid sequences for the α1(IV) and α2(IV) chain from mouse (Nath et al. 1986; Wood et al. 1988; Muthukumaran et al. 1989; Saus et al. 1989) and human (Brazel et al. 1987, 1988; Soininen et al. 1987; Hostikka and Tryggvason 1988) have been determined. Collagen IV differs from the collagen types I-III in that it does not assemble into ordered fibrils but instead forms loose network-like structures via its amino- and carboxyl-terminal domains (Dolz et al. 1988). Another characteristic feature of collagen IV is the numerous interruptions in the triple helical Gly-X-Y domain. There are 21 interruptions in the α1(IV) chain and 24 in the α2(IV) chain. These interruption sequences are thought to play an important role in the assembly and structure of collagen IV molecules (Hofman et al. 1984).

In order to gain insight into the evolution and structure of collagen IV genes, we have characterized all 47 exons in the mouse α2(IV) collagen gene, which represents more than 90 kb of contiguous DNA sequence. Strikingly, among the exons, we find exon sizes identical to the fibrillar collagen genes including one 54-bp exon. We have also noted an intriguing pattern for the Gly-X-Y repeat sequence in collagen IV and other nonfibrillar collagen genes. Often this repeat is encoded by an element derived from 36-, 45-, or 54-bp sequences. This pattern is observed for nonfibrillar collagens from sea urchin, nematode, fruit fly, and human, suggesting an important role for these sequence elements in the evolution of nonfibrillar collagens. Based on these data, we suggest that collagen IV and, indeed, other collagen genes have evolved from a primordial 54-bp coding unit.

While examining the exon structure of the α2(IV) collagen gene, we furthermore noted that the interruption encoding sequences were often located at exon/intron borders of the gene. Of the 24 interruption sequences present, 11 are encoded at the exon/intron borders of the gene, 13 are encoded within the body of exons, and part of one interruption sequence is encoded by an exon of its own. We interpret these data to suggest that the interruption sequences in collagen IV may have evolved from introns, presumably by inactivation of splice site signals in the early gene.

Materials and Methods

Genomic Cloning. Using appropriate complementary (c)DNA probes covering the entire sequence of mouse α2(IV) collagen (Saus et al. 1989), the complete gene was isolated in seven overlapping lambda clones. The mouse genomic library constructed with Charon 4A vector was a kind gift of Tom Maniatis.

Exon Sequencing. EcoRI fragments of the α2(IV) collagen gene were cloned in mp18/19 (Pharmacia) or pBS+ vectors (Stratagene). Based on the cDNA sequence, 17-mer oligonucleotides were synthesized on a Biosearch DNA synthesizer (New Brunswick) and used as probes to identify exon-containing clones. These oligonucleotides were then used as primers (Strauss et al. 1986) in single- (Sanger et al. 1977) or double-strand (Lim and Pène 1988) sequencing using 35S-dATP (Biggin et al. 1983) and T7 DNA polymerase (Tabor and Richardson 1987), according to the protocols from United States Biochemicals. Thirty-four exons were sequenced completely, 10 exons (exons 1, 8, 9, 21, 24, 31, 36, 44, 46, and 47) were sequenced across one border, and the sizes for four exons (exons 4, 11, 25, and 39) were deduced. In comparison with the cDNA sequence we found no sequence differences in the exon sequences.

Results

A 54-bp Exon in the α2(IV) Collagen Gene

The exon structure in the fibrillar collagen gene family shows a close evolutionary relationship and has been maintained between the different collagen types and species (Yamada et al. 1980, 1984; Chu et al. 1984). In contrast, no obvious relationship has been found for the nonfibrillar collagen genes. Initially, we and others (Kurkinen et al. 1985; Sakurai et al. 1986; Soininen et al. 1986) had concluded from the sequence analysis of a few exons that collagen IV had followed a different path of evolution compared to the fibrillar collagens. We have now characterized 47 exons in the mouse α2(IV) collagen gene and thus can go back and address the question of collagen IV evolution in more detail. As shown in Table 1, 17 exons in the gene are found to encode the Gly-X-Y repeat sequence alone (underlined in Table 1). The other exons represent "fusion exons" and encode both the Gly-X-Y repeat and the interruption sequences that are a characteristic feature of collagen IV. Among the Gly-X-Y encoding exons we find one 54-, two 45-, two 36-, one 72-, one 99-, and three 108-bp exons. All these exon sizes are found in the fibrillar collagen genes as well, with the exception of the 72-bp exon. The 36-bp exons coding for Gly-X-Y repeats have been described in the N-propeptide coding domain of fibrillar genes, and in the triple helical coding domain of the α2(XI) collagen gene. The other Gly-X-Y encoding exons have sizes not related to those in the fibrillar collagen genes (63, 64, 107, 117, 135, 144, and 171 bp). The presence of exons of the same sizes as found in fibrillar collagen genes leaves no doubt that the collagen IV genes also evolved from a primordial 54-bp coding unit.

Table 1. Exons in the mouse α2(IV) collagen gene

Exon no.   Size (bp)   5' codon   3' codon
 1          91+
 2          89                     0
 3          55          0          0
 4          81          0          0
 5         135          0          0
 6          45          0          0
 7         117          0          0
 8          72          0          0
 9          36          0          0
10          63          0          0
11          78          0          0
12          93          0          0
13          36          0          0
14          51          0          0
15          45          0          0
16          54          0          0
17          67          0          1
18         111          1          1
19         150          1          1
20          92          1          1
21         155          1          0
22          73          0          1
23         107          1          1
24         202          1          1
25          60          1          1
26          57          1          1
27         108          1          1
28         222          1          1
29         162          1          1
30         171          1          1
31         144          1          1
32         123          1          1
33         182          1          0
34          64          0          1
35          75          1          1
36         108          1          1
37         108          1          1
38          72          1          1
39         126          1          1
40         117          1          1
41         162          1          1
42          99          1          1
43         147          1          1
44         117          1          1
45         192          1          1
46         287          1          0
47         255+         0

Exon positions are numbered from the 5' end of the gene. The first three exons encode a 5' untranslated mRNA sequence and the signal peptide (Kaytes et al. 1988). Because the transcription start site has not yet been determined for the α2(IV) collagen gene, exon 1 is at least 91 bp in length. Twenty-two exons are fusion exons and encode both the Gly-X-Y repeat and interruption sequences (see Fig. 1), whereas the other exons (underlined) encode the Gly-X-Y repeat without interruptions. Exon 25 codes for a 20-residue noncollagenous sequence, part of a 24-residue interruption. The last three exons encode the end of the Gly-X-Y repeat sequence and the noncollagenous carboxyl-peptide sequence. In the right-hand columns, numbers 0 and 1 refer to the intron phase, either between codons (0) or after the first nucleotide of the codon (1). A characteristic feature of collagen IV genes is that most of the exons start with a split Gly codon (see text footnote). Note, however, that at the 5' end of the gene, all exons start with an intact Gly codon. The significance of this difference in collagen IV evolution is unclear at present.
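The intron-phase convention used in Table 1 (0 = between codons, 1 = after the first nucleotide of a codon) links the phases at the two ends of an exon to its length. The following is a minimal sketch of that bookkeeping, not part of the original analysis; the function name `three_prime_phase` is hypothetical, and only a handful of rows from Table 1 are used for illustration.

```python
# Intron phase bookkeeping, following the convention in the Table 1 legend:
# phase 0 = intron lies between codons,
# phase 1 = intron lies after the first nucleotide of a codon.

def three_prime_phase(five_prime_phase: int, exon_length: int) -> int:
    """Return the phase of the intron that follows an exon, given the phase
    of the intron that precedes it and the exon length in base pairs.
    (Hypothetical helper, introduced here only for illustration.)"""
    return (five_prime_phase + exon_length) % 3

# (exon number, size in bp, 5' phase, 3' phase) for selected rows of Table 1.
rows = [
    (16, 54, 0, 0),
    (17, 67, 0, 1),
    (18, 111, 1, 1),
    (21, 155, 1, 0),
    (22, 73, 0, 1),
    (33, 182, 1, 0),
    (34, 64, 0, 1),
    (46, 287, 1, 0),
]

for exon, size, p5, p3 in rows:
    predicted = three_prime_phase(p5, size)
    print(f"exon {exon}: {size} bp, 5' phase {p5}, "
          f"predicted 3' phase {predicted}, tabulated {p3}")
```

For an exon whose length is a multiple of 3, the phase is carried through unchanged, which is why a run of exons that start with a split Gly codon also end with one.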

36-, 45-, and 54-bp Sequence Elements Encoding the Gly-X-Y Repeat

Mouse α2(IV) collagen contains 24 noncollagenous sequences that interrupt the triple helical Gly-X-Y repeat domain (Saus et al. 1989). The interruption encoding nucleotide sequences are distributed among 23 exons and are "fused" to the Gly-X-Y encoding nucleotide sequences. An intriguing feature of these fusion exons is that the Gly-X-Y encoding portion of the exon appears to be derived from 36-, 45-, or 54-bp sequence elements. A schematic description of the fusion exons in the α2(IV) collagen gene is presented in Fig. 1. Altogether, among the fusion exons we find 20 elements for 36- (six cases), 45- (five cases), or 54-bp (nine cases) sequences. The abundance of the 54-bp sequence elements in the α2(IV) collagen gene clearly shows that it is evolutionarily related to the fibrillar collagens and further supports the conclusion that the collagen IV genes were also derived from a primordial 54-bp coding unit.
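The size comparison with the fibrillar collagen genes and the element counts just given can be tallied directly from Table 1. The sketch below is illustrative only: the variable names are hypothetical, the exon sizes are transcribed from Table 1, and the set of fibrillar exon sizes is the one named in the text (with 36 bp included because it is described for the N-propeptide coding domain). A size match by itself does not show that an exon encodes Gly-X-Y alone; the two 162-bp exons, for instance, are not among the Gly-X-Y-only exons listed in the text.

```python
from collections import Counter

# Exon sizes (bp) for exons 1-47, transcribed from Table 1.
# Exons 1 and 47 are lower bounds (listed as 91+ and 255+).
EXON_SIZES = [
    91, 89, 55, 81, 135, 45, 117, 72, 36, 63,
    78, 93, 36, 51, 45, 54, 67, 111, 150, 92,
    155, 73, 107, 202, 60, 57, 108, 222, 162, 171,
    144, 123, 182, 64, 75, 108, 108, 72, 126, 117,
    162, 99, 147, 117, 192, 287, 255,
]

# Exon sizes that the text attributes to fibrillar collagen genes.
FIBRILLAR_SIZES = {36, 45, 54, 99, 108, 162}

size_counts = Counter(EXON_SIZES)
shared = {s: size_counts[s] for s in sorted(FIBRILLAR_SIZES) if size_counts[s]}
print("Table 1 exons with fibrillar sizes:", shared)

# Sequence elements reported within the fusion exons (size in bp -> cases).
fusion_elements = {36: 6, 45: 5, 54: 9}
print("Elements in fusion exons:", sum(fusion_elements.values()))
```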

Common Sequence Elements Encode the Gly-X-Y Repeat in the Nonfibrillar Collagen Genes

Several other collagen types contain interruptions in the Gly-X-Y repeat sequence. It was of interest to examine the exons from these genes with respect to the interruptions and the Gly-X-Y encoding sequence elements discussed above. First, as a reference for this analysis, Fig. 2A shows that the 123-bp exon (exon 32 in Table 1) in the mouse α2(IV) collagen gene (Kurkinen et al. 1985), which contains the sequence for an 11-residue interruption, is derived from 36- and 54-bp sequence elements¹ for the Gly-X-Y repeat. Second, in the chicken α1(IX) and α2(IX) collagen genes, homologous 78-bp exons encode a two-residue interruption located in the middle of the Gly-X-Y sequence (Lozano et al. 1985). In these exons the Gly-X-Y repeats are derived from a 36-bp sequence element. In the chicken α2(IX) collagen gene, an exon encoding a 15-residue interruption contains a 144-bp sequence element for the Gly-X-Y repeat (Lozano et al. 1985). Chicken collagen X contains eight interruptions in the Gly-X-Y repeat sequence and, interestingly, is encoded by one long open reading frame without introns (Ninomiya et al. 1986). In this gene, the Gly-X-Y repeats are encoded by 45-, 72-, 99-, 108-, 126-, 135-, and 540-bp sequence elements. Note that 540 bp is exactly 10 × 54 bp (see Fig. 2B). Third, exon sequences in the invertebrate collagen genes appear to follow the same pattern. In the nematode C. elegans an exon in the collagen gene col-2 encodes five interruptions (Kramer et al. 1982) and contains 72-, 81-, 90-, and 117-bp elements for the Gly-X-Y repeat. The 72-bp element is also found in col-6 and col-8 genes. In addition, the col-6 and col-7 genes contain a 99-bp and the col-19 gene contains a 171-bp sequence element for the Gly-X-Y repeat (see Fields 1988). In the α1(IV) collagen gene from the fruit fly Drosophila melanogaster (Blumberg et al. 1988) the exons contain 36-, 45-, 54-, 72-, 117-, and 171-bp sequence elements for the Gly-X-Y repeat interspersed between the interruption sequences (there are 21 interruptions in the Drosophila collagen IV) or exon/intron borders. As an example, exon 7, which is 969 bp in length, encodes two interruptions and the exon ends with a 45-bp element for the Gly-X-Y repeat. Exon 8 is 1133 bp in length and encodes four interruptions, and, strikingly, this exon begins with a 54-bp element for the Gly-X-Y repeat (see Fig. 2C). Finally, one exon characterized in the collagen IV-like gene from the sea urchin Strongylocentrotus purpuratus (Venkatesan et al. 1986) encodes a two-residue interruption flanked by 90- and 117-bp sequence elements for the Gly-X-Y repeat. In summary, the above examples demonstrate that the Gly-X-Y repeat is often derived from similar sequence elements and suggest an apparent relationship among the nonfibrillar collagens from various species. Furthermore, it is striking that most of the sequence elements described here for these genes (36, 45, 54, 72, 108, 117, 135, 144, and 171 bp) are also found as separate exons in the mouse α2(IV) collagen gene (see Table 1).

¹ Note, however, that often the Gly codon is split after the first nucleotide (Kurkinen et al. 1985; Sakurai et al. 1986; Soininen et al. 1986), which is a highly conserved feature of collagen IV genes, as it is also found in collagen IV genes from Drosophila (Blumberg et al. 1988) and C. elegans (Guo and Kramer 1989). Consequently, the 5' and 3' sequence elements discussed here are one nucleotide shorter or longer, respectively. For clarity and illustration purposes the sequence elements are depicted with intact Gly codons. This procedure, however, does not affect the results and conclusions of this paper.

Fig. 1. Interruptions in the Gly-X-Y repeat of α2(IV) collagen are often encoded at the exon/intron borders. In this schematic presentation we show 22 exons that contain sequences for interruptions. The other exons (not shown) encode Gly-X-Y repeats without interruptions and the amino- and carboxyl-terminal peptides (see Table 1). The exons are numbered from the 5' end of the gene, and their sizes are indicated in base pairs. Boxed numbers refer to the number of amino acids encoded. Shaded boxes denote the interruption sequence. Clear boxes denote the Gly-X-Y encoding part of the exon, and the numbers above refer to the nucleotide sequence in base pairs. Small numbers at the exon borders denote intact (0) or split (1) codons. As an example, the nucleotide and amino acid sequence (capital letters) and flanking intron sequence (small letters) for exon 8 are shown. Open triangles denote the intron splice sites. This exon encodes 12 residues (36 bp) for the Gly-X-Y repeat sequence followed by a 12-residue interruption sequence. Note that in 11 out of 22 exons, the interruption is encoded at the end or beginning of the exon.
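The statement that most of these sequence elements recur as separate exons in Table 1 is easy to check, and so is their relation to the 9-bp Gly-X-Y triplet. The sketch below is an illustration added under stated assumptions: the names `TABLE1_SIZES`, `ELEMENTS`, and `TRIPLET_BP` are hypothetical, and the exon sizes are transcribed from Table 1.

```python
# Distinct exon sizes (bp) occurring in Table 1 for the mouse alpha2(IV) gene.
TABLE1_SIZES = {
    91, 89, 55, 81, 135, 45, 117, 72, 36, 63, 78, 93, 51, 54, 67, 111,
    150, 92, 155, 73, 107, 202, 60, 57, 108, 222, 162, 171, 144, 123,
    182, 64, 75, 126, 99, 147, 192, 287, 255,
}

# Sequence-element sizes (bp) listed in the summary sentence of the text.
ELEMENTS = [36, 45, 54, 72, 108, 117, 135, 144, 171]

TRIPLET_BP = 9  # one Gly-X-Y triplet = three codons = 9 bp

for size in ELEMENTS:
    triplets, remainder = divmod(size, TRIPLET_BP)
    print(f"{size:>3} bp = {triplets} Gly-X-Y triplets "
          f"(remainder {remainder}); also an exon size in Table 1: "
          f"{size in TABLE1_SIZES}")

# The longest chicken collagen X element as a multiple of the 54-bp unit.
print(divmod(540, 54))
```

Every listed element is a whole number of Gly-X-Y triplets, and the 540-bp element corresponds to ten 54-bp units, as the text notes.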


Evolution of Interruption Sequences in Collagen IV

How did the interruption sequences evolve in collagen IV and what was the mechanism to generate an exon structure for the α2(IV) collagen gene that shows no obvious relationship to fibrillar collagen genes? When we analyzed the location of the DNA sequences encoding the interruptions in relation to the DNA sequences encoding the Gly-X-Y repeats within each exon, an intriguing pattern emerged (see Fig. 1). Eleven out of 24 interruptions are encoded at the exon/intron borders. Thirteen interruptions are encoded in the body of exons and a part of one interruption is encoded by a single exon of its own. Based on these observations and the fact that in many exons the Gly-X-Y encoding portion is derived from 36-, 45-, or 54-bp sequence elements, we suggest that DNA sequences coding for interruptions were recruited in the primordial gene from flanking intron sequences. Accordingly, this could explain why the sequence elements of 54, 45, and 36 bp are still found a total of 20 times within these exons and suggests that they represented the exon sizes in the primordial collagen IV gene. Incidentally, two 36-, two 45-, and one 54-bp exon encoding the Gly-X-Y repeat are found in the present-day α2(IV) collagen gene (see Table 1).

In order to find evidence for the intronic origin of interruption sequences we examined the mouse α2(IV) collagen nucleotide sequence (Saus et al. 1989) in detail. When we compared the nucleotide sequences for the interruptions (148 amino acids in total), the Gly-X-Y repeats (1275 amino acids in 425 Gly-X-Y triplets), and the noncollagenous carboxyl-terminal peptide (227 amino acids), we found no differences in the codon usage or amino acid composition that were specific for the interruptions (data not shown). The GC contents in these regions are 51, 63, and 59%, respectively. The low GC content of interruption sequences, however, is in line with their putative intronic origin.
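A GC-content comparison of this kind amounts to counting G and C nucleotides in each region. The sketch below shows the calculation only; the function name `gc_content` and both example sequences are hypothetical placeholders and are not taken from the α2(IV) sequence.

```python
def gc_content(seq: str) -> float:
    """Return the GC content of a nucleotide sequence as a percentage."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)

# Hypothetical example sequences (placeholders, not real collagen IV data).
gly_x_y_like = "GGTCCTCCTGGA"       # GC-rich, Gly-Pro-Pro-Gly-like codons
interruption_like = "ATTGCATTAGCA"  # AT-richer stretch

print(f"{gc_content(gly_x_y_like):.1f}%")
print(f"{gc_content(interruption_like):.1f}%")

# Consistency of the residue count quoted in the text: 425 triplets x 3.
print(425 * 3)
```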

Other evidence for the intronic origin of interruptions is provided by the 24-residue-long interruption sequence present in the α2(IV) collagen chain (Saus et al. 1989). In the α1(IV) collagen chain the homologous interruption sequence is only four residues long (Muthukumaran et al. 1989). It is of interest that the sequence difference of 20 amino acids is encoded exactly by one exon (exon 25) in the α2(IV) collagen gene. The remaining four residues of this interruption are contributed by the adjacent exon 26 (see Fig. 1). We interpret these data to suggest that in the α2(IV) collagen gene an intron sequence of 60 nucleotides was recruited to become exon 25, resulting in the insertion of 20 additional residues in the α2(IV) chain.

Fig. 2. Common sequence elements encoding the Gly-X-Y repeat. A In this schematic presentation, the 123-bp exon of the mouse α2(IV) collagen gene is shown to be derived from 36- and 54-bp sequence elements for the Gly-X-Y repeat. Because of the split Gly codons, the numbers of Gly-X-Y repeat residues are 11 2/3 (35 bp) and 18 1/3 (55 bp). B Part of the chicken collagen X gene is shown to demonstrate the 72- and 45-bp elements for the Gly-X-Y repeat. C Shown schematically are the ends and beginnings of exons 7 and 8, respectively, of the Drosophila collagen IV gene. Note the 45- and 54-bp sequence elements for the Gly-X-Y repeat. Symbols and definitions are as in Fig. 1.
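The arithmetic behind the 123-bp exon in Fig. 2A and behind exon 25 can be written out explicitly. This is a small worked sketch using only the numbers quoted in the text and the Fig. 2 legend; the variable names and the helper `as_residues` are hypothetical.

```python
# Exon 32 (123 bp): a 36-bp element, an 11-residue interruption, a 54-bp element.
interruption_bp = 3 * 11                      # 11 residues = 33 bp

# Depicted with intact Gly codons, as in the figures.
idealized = [36, interruption_bp, 54]

# With the split Gly codons, the 5' element is one nucleotide shorter and
# the 3' element one nucleotide longer (see the text footnote).
actual = [36 - 1, interruption_bp, 54 + 1]

print(sum(idealized), sum(actual))            # both should equal the exon size

def as_residues(bp: int) -> str:
    """Express a length in bp as whole residues plus thirds of a codon."""
    whole, rest = divmod(bp, 3)
    return f"{whole} {rest}/3" if rest else str(whole)

print(as_residues(actual[0]), as_residues(actual[2]))

# Exon 25: 60 nucleotides encode 20 residues of the 24-residue interruption;
# the remaining 4 residues come from the adjacent exon 26.
print(60 // 3, 24 - 60 // 3)
```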

In another approach we reasoned that if the interruption encoding sequences of the collagen IV gene represented remnants of introns, then their nucleotide sequence might still reveal similarity to the consensus intron sequence at the 5' and 3' splice sites. Accordingly, we compiled 10-nucleotide-long consensus sequences and compared them with the interruption encoding sequences. Furthermore, we looked for 5' site homology in exons encoding interruptions at their end, and conversely, 3' site homology in exons encoding interruptions at their beginning. In total, we found two sequences with 80% homology to the consensus 3' splice site sequence. These examples are described in Fig. 3. In the interruption sequence encoded by exon 14, two nucleotide changes are required to distinguish it from the consensus 3' splice site sequence. Similarly, in the interruption sequence encoded by exon 39 one nucleotide deletion and one nucleotide change are required to mutate the presumptive splice site. Remarkably, the consensus 3' splice sequence was found to have 80% homology only once more in the entire sequence of 6184 nucleotides for the mRNA of the α2(IV) collagen chain. These results apparently are in line with the notion that interruption sequences originate from introns.

Fig. 3. Splice site sequence homology. Shown are two examples where the nucleotide sequences for interruptions are 80% identical with the consensus intron 3' splice site sequence. Top line: nucleotide sequences for the α2(IV) collagen. Capital letters refer to the in-frame codons in the Gly-X-Y triplet and small letters are the nucleotide sequence for the interruptions. Bottom line: consensus 3' splice site sequence for the mouse α2(IV) collagen gene. The consensus is [c/t]3c[c/a]cagGG where the two GG nucleotides represent a consensus 5' exon sequence. The above consensus is based on 24 intron sequences flanking the Gly-X-Y encoding exon sequences. In each position the nucleotide, or pair of nucleotides shown, has a frequency of 0.5 or higher. The sequence agG is invariant. Open triangles denote the splice site in the consensus sequence. Exons are pictured as in Fig. 1.
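A search of this kind can be sketched as a sliding-window comparison against the 10-nucleotide consensus [c/t]3c[c/a]cagGG given in the Fig. 3 legend. The code below is an illustration under stated assumptions, not the procedure the authors used: the function names and the test sequence are hypothetical, and the comparison is ungapped, so a match that requires a one-nucleotide deletion (as described for exon 39) would not be scored the same way.

```python
# Consensus 3' splice site from the Fig. 3 legend, [c/t]3 c [c/a] c a g G G,
# written as the set of nucleotides allowed at each of the 10 positions.
CONSENSUS = [
    {"C", "T"}, {"C", "T"}, {"C", "T"},
    {"C"}, {"C", "A"}, {"C"}, {"A"}, {"G"}, {"G"}, {"G"},
]

def count_matches(window: str) -> int:
    """Number of positions at which a 10-nt window agrees with the consensus."""
    return sum(base in allowed
               for base, allowed in zip(window.upper(), CONSENSUS))

def scan(seq: str, min_matches: int = 8):
    """Return (offset, window, matches) for every ungapped window that
    matches the consensus in at least min_matches positions
    (8 of 10 corresponds to 80% identity)."""
    width = len(CONSENSUS)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        matches = count_matches(window)
        if matches >= min_matches:
            hits.append((i, window, matches))
    return hits

# Hypothetical test sequence: a perfect consensus match embedded in filler.
test_seq = "AAAA" + "TTTCCCAGGG" + "AAAA"
for offset, window, matches in scan(test_seq):
    print(offset, window, f"{matches}/10")
```

Using an integer match count rather than a floating-point fraction keeps the 80% threshold exact.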

Discussion

Evolution of Collagen IV Genes from a 54-bp Exon: Common Sequence Elements for the Gly-X-Y Repeat in Nonfibrillar Collagen Genes

Based on the partial exon structure for the chicken α2(I) collagen gene, Yamada et al. (1980) suggested that fibrillar collagen genes and, indeed, all collagen genes evolved from a primordial 54-bp coding unit. Sequence data from other fibrillar collagen genes have provided strong support for this notion. Indeed, all fibrillar collagen genes (types I, II, and III) are composed of 45-, 54-, 99-, 108-, and 162-bp exons, and this is true for all the species examined so far (Yamada et al. 1980, 1984; Chu et al. 1984). Furthermore, an interesting feature of these genes is that their exon structure has been conserved, that is, the arrangement of the different exons in the gene has been maintained across species and the different fibrillar collagen types. This result is in line with the idea that these genes evolved by gene duplication from one common, presumably full-size, gene for fibrillar collagens. In this respect, an intriguing finding is that all fibrillar collagen genes (in human) are located on different chromosomes (Solomon et al. 1985). It is thought that this situation was selected during evolution to reduce homologous recombination between the collagen genes. Incidentally, similar arguments also apply to explain the dispersal of collagen coding sequences into multiple exons to prevent homologous recombination between the highly repetitive sequences of the Gly-X-Y encoding exons. There are several observations, however, that argue against the general validity of the above notions. First, collagen IV genes, in human and mouse, are linked head-to-head on the same chromosome (Burbelo et al. 1988; Kaytes et al. 1988; Pöschl et al. 1988; Soininen et al. 1988). Second, the chicken collagen X gene is contained in one long open reading frame without introns (Ninomiya et al. 1986), and the Drosophila collagen IV gene has only four introns in the Gly-X-Y encoding region (Blumberg et al. 1988). Taken together, these data suggest that different chromosomal locations and the numerous introns of the collagen genes may have some other purpose than that originally suggested.

In this paper we present evidence to support the idea that collagen IV genes evolved from a 54-bp primordial coding unit. First, Table 1 shows that the mouse α2(IV) collagen gene contains two 36-, two 45-, and one 54-bp exon as well as one 72-, one 99-, and three 108-bp exons encoding the Gly-X-Y repeat sequence. Strikingly, except for the 72-bp exon, all these exon sizes are also found in the fibrillar collagen genes. Second, a number of other exons in the α2(IV) collagen gene represent fusion exons and encode the Gly-X-Y repeat and interruption sequences, a characteristic feature of collagen IV. In these exons, the Gly-X-Y encoding domain is also derived from 36-, 45-, or 54-bp sequence elements. Figure 1 shows that the 54-bp element is found nine times in the fusion exons. Moreover, exon 45, which is 192 bp in length (see Table 1), encodes the end of the helical Gly-X-Y sequence and the beginning of the COOH-terminal peptide. In this exon the Gly-X-Y encoding domain is a 54-bp sequence element.¹ Thus, in total, there are eleven 54-bp elements for the Gly-X-Y repeat in the α2(IV) collagen gene.

O f interest is that examinat ion o f the published data for other nonfibrillar collagen or invertebrate collagen gene structures f rom sea urchin, nematode , fruit fly, and h u m an reveals a similar pat tern for the Gly-X-Y encoding sequence elements including those o f 36, 45, and 54 bp in length. In addit ion, partial sequence o f a collagen-like gene f rom the sea urchin Paracentrotus lividus revealed two 54-, one 99-, and one 198-bp exons for Gly-X-Y repeats (Saitta et al. 1989). Although at present it cannot be ruled out that these exons represent a pseudogene for sea urchin collagen, the above sequence data


Fig. 4. Evolution of collagen IV genes from a 54-bp exon. In this schematic drawing boxes illustrate exons and introns are depicted as thin lines. Numbers refer to the size of exons in base pairs. To account for the abundance of 36-, 45-, and 54-bp sequence elements for the Gly-X-Y repeats, it is thought that the early collagen IV gene evolved from a 54-bp coding unit by homologous recombination within these exons. In the next phase of evolution introns are shown to be lost during exon fusion either completely or partially, or to become part of the exons. This mechanism could account for the evolution of noncollagenous interruptions in collagen IV. See text for further discussion.

support the idea that invertebrate collagen genes may also have evolved from a 54-bp coding unit. Furthermore, several observations suggest that this collagen-like gene is transcriptionally active. First, all the exons characterized are flanked by typical intron consensus sequences for correct exon splicing and translation of the Gly-X-Y sequence in frame. Second, the exon sequences hybridized at high stringency to a 6-kb mRNA that was developmentally regulated in sea urchin (Saitta et al. 1989). Consistent with these notions, another collagen gene from sea urchin has recently been partially characterized and shown to be similar in exon structure to those encoding vertebrate fibrillar collagens (D'Alessio et al. 1989).

The Drosophila collagen IV gene contains nine exons (Blumberg et al. 1988) and clearly does not conform to the exon structure of the mouse α2(IV) collagen gene. However, as we have shown here in a closer analysis, the Gly-X-Y repeats appear to be encoded by or derived from a few characteristic sequence elements, including those of 36, 45, and 54 bp in length. The observation that in the Drosophila collagen IV gene exon 7 ends with a 45-bp and exon 8 begins with a 54-bp sequence element for the Gly-X-Y repeat (see Fig. 2C) gives further support to the notion that nonfibrillar collagens and invertebrate collagens may have evolved from a 54-bp coding unit. It is possible that the much longer Gly-X-Y encoding sequences in this gene are the result of fusions between the shorter sequence elements.

Intronic Origin of the Interruption Sequences in Collagen IV

We have proposed here that the numerous noncollagenous sequences that interrupt the Gly-X-Y repeat domain of collagen IV may have evolved from intronic sequences. This notion could readily explain the frequent occurrence of the interruption encoding sequences at exon/intron borders of the gene. Such recruitment of intron sequences could have happened by the inactivation of an ancestral 5' or 3' splice site, leaving the splicing apparatus scanning for the next consensus splice site and thus introducing intronic sequences into the Gly-X-Y encoding exons. The other class of exons, where the interruptions are encoded within the body of the exons, could represent examples of intron loss between two exons that had already acquired intron sequences at either their 5' or 3' end. Alternatively, such interruptions may also reflect an incomplete loss of intronic sequences during a fusion of two exons encoding the Gly-X-Y repeat. Whatever the process might have been, one has to postulate that previously existing splice sites have been inactivated.

For comparison, in another model we may consider that the interruptions in collagen IV evolved from the Gly-X-Y repeats by mutations in the Gly codon. At first sight this model would easily explain the evolution of the three-residue interruptions AFP, ALP, and AVP present in the mouse α2(IV) chain (Saus et al. 1989). In these examples, one nucleotide change is required to mutate a Gly codon into a codon for Ala found in these interruptions. One common feature of the above interruption sequences is that they all are encoded in the body of exons, in exons 29, 41, and 44, which are 162, 162, and 117 bp in length, respectively (Fig. 1). The frequency of Pro in the Y-position (0.36 = 162/425) and Phe, Leu, and Val in the X-position (0.11, 0.11, and 0.01, respectively) of α2(IV) Gly-X-Y repeats further supports the idea that the above interruptions could have evolved from the Gly-X-Y sequence. However, there are a total of 45 interruption sequences in the α1- and α2-chains of collagen IV and only 8 of them are multiples of 3 residues in length (3 of 3, 3 of 6, 1 of 9, and 1 of 12 residues). Consequently, to explain the evolution of the 2-residue interruptions (a total of 17 in the collagen IV chains), for example, one would have to assume deletion of a Gly codon 17 times, a very unlikely event. Finally, the evolution of the longer interruption sequences would require multiple deletions and/or insertions of Gly codons. Based on these simple arguments, it seems unlikely that mutations, deletions, or insertions of a Gly codon could have played any significant role in the evolution of interruption sequences of collagen IV.
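The counting argument above can be restated as a short tally. The Python sketch below uses only the numbers quoted in the text (45 interruptions in total, 8 whose length is a multiple of three residues, and 17 two-residue interruptions); the variable names are illustrative, and the count of interruptions with other lengths is simply the remainder, not a figure given in the text.

```python
from fractions import Fraction

# Numbers quoted in the text for the alpha1- and alpha2-chains together.
total_interruptions = 45
# Interruption length in residues -> number of interruptions of that length.
multiples_of_three = {3: 3, 6: 3, 9: 1, 12: 1}
two_residue = 17

# Interruptions that a simple Gly-codon point mutation could account for.
n_triplet_like = sum(multiples_of_three.values())
frac_triplet_like = Fraction(n_triplet_like, total_interruptions)

# Remainder: interruptions that are neither a multiple of three residues
# nor two residues long (derived here, not stated in the text).
n_other = total_interruptions - n_triplet_like - two_residue
```

On these numbers fewer than one in five interruptions has a length compatible with simple substitution of Gly codons, which is the basis for regarding that model as insufficient.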

Because collagen IV genes are thought to have evolved by gene duplication, it is striking that the interruption sequences are totally unrelated between the α1- and α2-chains (Saus et al. 1989). The sequence similarity is only 4% (6/148), whereas the overall similarity of the two chains is 42%. Had the interruption sequences evolved, by whatever mechanism, before the gene duplication, it is difficult to understand how they became so different in the two chains. Another possibility is that the interruption sequences of collagen IV evolved independently after the gene duplication and were selected by position and approximate length but not by exact sequence. Our speculation that interruptions evolved from intron sequences, which diverged much faster than the coding sequences, is in line with the latter possibility.

The idea that interruptions in the Gly-X-Y repeat sequence of collagen IV evolved from intron sequences immediately implies that the early collagen IV gene existed and was expressed without interruption encoding sequences. Indeed, if this was the case, then the removal of interruption encoding sequences should not interfere with the reading frame of the exons. Evidence for this notion is provided by the data shown in Fig. 3. Based on the 80% nucleotide homology between the consensus 3' splice site sequence and that encoding the interruption sequence, the first example assumes that the splice site in the ancestral gene was in front of the intact Gly codon and, accordingly, that the exon size was 36 bp. If this exon (exon 14) was expressed and spliced in the correct frame in the early gene, it should have been flanked by exons with intact Gly codons as well. Indeed, the flanking exons are 36 and 45 bp in length and they both have intact Gly codons (see Table 1). In the second example, it is assumed that the ancestral exon (exon 39) was 54 bp in length and had split Gly codons on both ends. Strikingly, this exon is flanked by exons with split Gly codons as well (see Table 1). These two examples suggest that exons, which in the present-day gene encode interruption sequences, could have been expressed in the correct frame, as is typical for Gly-X-Y encoding exons, in the early collagen IV gene.
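The reading-frame requirement in the two examples above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not an analysis of the actual sequences: an exon is described only by its length and by the number of bases at its 5' end that complete a codon begun in the previous exon. The function names and the 2-base overhang used for the exon 39 model are hypothetical; the text states only that the codons at both ends are split.

```python
def exon_overhangs(length_bp, start_overhang):
    """Return (start_overhang, end_overhang) for an exon whose first
    `start_overhang` bases finish a codon begun in the upstream exon.
    The end overhang is the number of bases left over at the 3' end."""
    end_overhang = (length_bp - start_overhang) % 3
    return start_overhang, end_overhang


def junction_in_frame(upstream_end_overhang, downstream_start_overhang):
    """True if the leftover bases on both sides of a splice junction
    together make up a whole codon (or there are none on either side)."""
    return (upstream_end_overhang + downstream_start_overhang) % 3 == 0


# Exon 14 model: 36-bp ancestral exon beginning with an intact Gly codon.
start14, end14 = exon_overhangs(36, 0)

# Exon 39 model: 54-bp ancestral exon with split codons at both ends.
# The 2-base start overhang is a hypothetical value for illustration.
start39, end39 = exon_overhangs(54, 2)

# An exon with intact codons joins in frame to neighbors with intact
# codons; an exon with split codons needs neighbors split the same way.
intact_to_intact = junction_in_frame(end14, 0)
split_to_split = junction_in_frame(end39, start39)
intact_to_split = junction_in_frame(end14, start39)
```

In this model an ancestral exon is spliced in frame only when its neighbors carry the matching kind of border, intact next to intact and split next to split, which is the pattern reported for the exons flanking exons 14 and 39.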

In comparison with other genes, mutational events taking place at the exon/intron borders of collagen IV genes are much easier to detect, as they will interrupt the characteristic Gly-X-Y repeat sequence. Because collagen IV does not form distinct fibrillar structures, the selective pressure to maintain primary structure may have been considerably lower than in the case of fibril-forming collagen types, allowing interruptions to be incorporated in the Gly-X-Y repeat sequence. However, it is likely that the proposed role of introns in gene evolution is not a property of collagen IV alone but should be found in the evolution of other genes as well. As first pointed out by Craik et al. (1983), in the serine proteinase family and the dihydrofolate reductase family, sequence divergence of homologous enzymes, either deletion or insertion of a few amino acids, is often located at the exon/intron borders. As an explanation for this observation, the authors proposed an intron sliding model for the extension or contraction of exons. According to this model, intron splice sites are movable to create deletions or insertions of amino acids in the protein sequence. Another example of intron sliding is provided by the thyroglobulin gene. Thyroglobulin has evolved from sequences that were repeated many times, and, interestingly, the exons show sequence variations at their 5' or 3' ends. From these data it was proposed that such variation was brought into play by intron sequences that became incorporated into exons in a process the authors termed "exonization" (Parma et al. 1987). Finally, the repeating structure of the fibronectin gene suggests that part of it evolved from a common exon sequence, which subsequently was modified by intron movement to generate variations in the exon sequences (Odermatt et al. 1985).

We have noted here that one exon in the mouse α2(IV) collagen gene (exon 25 in Table 1) encodes a 20-residue noncollagenous sequence, a part of a 24-residue interruption sequence. This results in the insertion of an additional 20 amino acids in the α2(IV) collagen chain not present in the α1(IV) chain. We have taken this as an example of the evolution of an intron sequence into an exon. It would be illuminating to characterize the corresponding intron region in the α1(IV) collagen gene. Sequence comparison should allow one to determine whether this intron-like exon was recently acquired in the α2(IV) collagen gene or whether it was lost from the α1(IV) collagen gene. In the latter case we would expect to find a region with significant nucleotide similarity to exon 25 of the α2(IV) collagen gene.

Recently, two reports have described examples that are reminiscent of the intron-like exon of collagen IV cited above. First, the β- and γ-crystallin genes have evolved from a common ancestor gene, and sequence comparison of the genes has suggested that the second exon in the βB1 gene originated from or has recruited intronic sequences (den Dunnen et al. 1986). Second, the lenses of rodents contain an additional αA-crystallin form called αAins-crystallin, which is identical to αA-crystallin except for the insertion of 23 amino acids encoded by a single exon. No evidence for the expression of αAins-crystallin has been found in lens proteins from primates including humans. A surprising observation, however, is that in the human αA-crystallin gene the first intron contains a pseudoexon for the 23 residues found in the αAins-crystallin (Jaworski and Piatigorsky 1989). In this pseudoexon, the downstream splice site is missing and in the upstream intron there is an uncharacteristic adenosine at position -5. Apart from these differences the pseudoexon is 86% identical in nucleotide sequence with the mouse exon, which is expressed in αAins-crystallin, suggesting that in the early gene this pseudoexon was expressed as well. From the nucleotide sequence comparison it was calculated that inactivation of this exon in the primate lineage occurred some 30-40 million years ago (Jaworski and Piatigorsky 1989). Thus, these data represent an example where an exon is recruited into an intron sequence during gene evolution, presumably as a result of point mutation in the splice site.

In general, there is good correspondence between exons and different domains or functions of the protein. In addition, amino acids exposed on the surface of a protein are usually encoded at the exon/intron borders, that is, at the beginning or at the end of exons (Craik et al. 1982). These relationships between protein and gene structure are intriguing and presumably were dictated by the evolutionary process itself. For example, during protein evolution, loss or gain of domain functions could have been manipulated by changing the surface residues, as these presumably are the targets for molecular interactions. In principle, this could be accomplished by insertions and deletions or by multiple point mutations of the gene. However, a more plausible mechanism would be a point mutation or inactivation of intron splice sites followed by intron sliding to delete or insert new sequence information. In this respect, it is of obvious evolutionary advantage to have surface residues encoded at the exon/intron borders. Moreover, it is of interest to note that this mechanism could also explain the striking correspondence between exons and different protein domains. In this model, amplification of one exon would first generate an early gene with several similar exons followed by sequence divergence. Then intron sliding at the exon/intron borders would rapidly modify the existing exon structure to result in the generation of new domain functions in the early proteins. Thus, according to this model, different protein domains could have evolved from exons that were always separated by introns.

In summary, Fig. 4 depicts a scenario for the evolution of the collagen IV genes. In the early gene, amplification and recombination of a 54-bp exon resulted in exon sizes of 36, 45, and 54 bp for the Gly-X-Y repeat sequence. Subsequently, recruitment of intronic sequences, exon fusions, and incomplete loss of introns during exon fusions resulted in the evolution of the complex exon structure and the presence of interruption sequences in collagen IV. In this scenario, one role for intronic sequences is thought to be a means of increasing exon sequence variation and, consequently, enhancing the evolution of different domain functions for collagen IV.

Acknowledgments. We thank Francine Mittleman for typing this manuscript. This work was supported by NIH grant GM 34090.

References

Biggin MD, Gibson TJ, Hong GR (1983) Buffer gradient gels and 35S label as an aid to rapid DNA sequence determination. Proc Natl Acad Sci USA 80:3963-3965

Blumberg B, MacKrell AJ, Fessler JH (1988) Drosophila basement membrane procollagen α1(IV). J Biol Chem 263:18328-18337

Boyd C, Byers P, Sundell L (eds) (1990) Biology of extracellular matrix. Academic Press, New York (in press)

Brazel D, Oberbaumer I, Dieringer H, Babel W, Glanville RW, Deutzman R, Kuhn K (1987) Completion of the amino acid sequence of the α1-chain of human basement membrane collagen type IV reveals 21 non-triplet interruptions located within the collagenous domain. Eur J Biochem 168:529-536

Brazel D, Pollner R, Oberbaumer I, Kuhn K (1988) Human basement membrane collagen (type IV). Eur J Biochem 172:35-42

Burbelo P, Martin G, Yamada Y (1988) α1(IV) and α2(IV) collagen genes are regulated by a bidirectional promoter and a shared enhancer. Proc Natl Acad Sci USA 85:9679-9682

Chu M-L, de Wet W, Bernard M, Ding J-F, Morabito M, Myers J, Williams C, Ramirez F (1984) Human proα1(I) collagen gene structure reveals evolutionary conservation of a pattern of introns and exons. Nature 310:337-340

Cox G, Fields C, Kramer J, Rosenzweig B, Hirsh D (1989) Sequence comparisons of developmentally regulated collagen genes of Caenorhabditis elegans. Gene 76:331-344

Craik CS, Sprang S, Fletterick R, Rutter WJ (1982) Intron-exon splice junctions map at protein surfaces. Nature 299:180-182

Craik CS, Rutter WJ, Fletterick R (1983) Splice junctions: association with variations in protein structure. Science 220:1125-1129

D'Alessio M, Ramirez F, Suzuki I, Solursh M, Gambino R (1989) Structure and developmental expression of sea urchin fibrillar collagen gene. Proc Natl Acad Sci USA 86:9303-9307

den Dunnen JT, Moorman RJM, Lubsen NH, Schoenmakers JGG (1986) Intron insertions and deletions in the β/γ-crystallin gene. Proc Natl Acad Sci USA 83:2855-2859

Dolz R, Engel J, Kuhn K (1988) Folding of collagen IV. Eur J Biochem 178:357-365

Fields C (1988) Domain organization and intron position in Caenorhabditis elegans collagen genes. The 54-bp module hypothesis revisited. J Mol Evol 28:55-63

Guo X, Kramer JM (1989) The two C. elegans basement membrane (type IV) collagen genes are located on separate chromosomes. J Biol Chem 264:17574-17582

Hofman H, Voss T, Kuhn K, Engel J (1984) Localization of flexible sites in thread-like molecules from electron micrographs. Comparison of interstitial, basement membranes and intima collagens. J Mol Biol 172:325-343

Hostikka SL, Tryggvason K (1988) The complete primary structure of the α2 chain of human type IV collagen and comparison with the α1(IV) chain. J Biol Chem 263:19488-19493

Jaworski CJ, Piatigorsky J (1989) A pseudo-exon in the functional human αA-crystallin gene. Nature 337:752-754

Kaytes PJ, Wood L, Theriault N, Kurkinen M, Vogeli G (1988) Head-to-head arrangement of murine type IV collagen genes. J Biol Chem 263:19274-19277

Kramer JM, Cox GN, Hirsh D (1982) Comparisons of the completed sequences of two collagen genes from Caenorhabditis elegans. Cell 30:599-606

Kurkinen M, Bernard MP, Barlow DP, Chow LT (1985) Characterization of 64, 123, and 182 base pair exons in the mouse α2(IV) collagen gene. Nature 317:177-179

Lim HM, Pene JJ (1988) Optimal conditions for supercoil DNA sequencing with the Escherichia coli DNA polymerase. Gene Technol 5:32-39

Lozano G, Ninomiya Y, Thompson H, Olsen BR (1985) A distinct class of vertebrate collagen genes encodes chicken type IX collagen polypeptides. Proc Natl Acad Sci USA 82:4050-4054

Muthukumaran G, Blumberg B, Kurkinen M (1989) The complete primary structure for the α1-chain of mouse collagen IV. Differential evolution of collagen IV domains. J Biol Chem 264:6310-6317

Nath P, Laurent M, Horn E, Sobel ME, Zon G, Vogeli G (1986) Isolation of an α1 type IV collagen cDNA clone using a synthetic oligodeoxynucleotide. Gene 43:301-304

Ninomiya Y, Gordon M, van der Rest M, Schmid T, Linsenmayer T, Olsen BR (1986) The developmentally regulated type X collagen gene contains a long open reading frame without introns. J Biol Chem 261:5041-5050

Odermatt E, Tamkun JW, Hynes RO (1985) Repeating molecular structure of the fibronectin gene: relationship to protein structure and subunit variation. Proc Natl Acad Sci USA 82:6571-6575

Parma J, Christopher D, Pohl V, Vassart G (1987) Structural organization of the 5' region of the thyroglobulin gene. Evidence for intron loss and "exonization" during evolution. J Mol Biol 196:769-779

Poschl E, Pollner R, Kuhn K (1988) The genes for the α1(IV) and α2(IV) chains of human basement membrane collagen type IV are arranged head-to-head and separated by a bidirectional promoter of unique structure. EMBO J 7:2687-2695

Runnegar B (1985) Collagen gene construction and evolution. J Mol Evol 22:141-149

Saitta B, Buttice G, Gambino R (1989) Isolation of a putative collagen-like gene from the sea urchin Paracentrotus lividus. Biochem Biophys Res Commun 158:633-639

Sakurai Y, Sullivan M, Yamada Y (1986) α1 type IV collagen gene evolved differently from fibrillar collagen genes. J Biol Chem 261:6654-6657

Sanger F, Nicklen S, Coulson AR (1977) DNA sequencing with chain terminating inhibitors. Proc Natl Acad Sci USA 74:5463-5467

Saus J, Quinones S, MacKrell AJ, Blumberg B, Muthukumaran G, Pihlajaniemi T, Kurkinen M (1989) The complete primary structure of mouse α2(IV) collagen. Alignment with mouse α1(IV) collagen. J Biol Chem 264:6318-6324

Soininen R, Tikka L, Chow L, Pihlajaniemi T, Kurkinen M, Prockop DJ, Boyd CD, Tryggvason K (1986) Large introns in the 3' end of the gene for the proα1(IV) chain of human basement membrane collagen. Proc Natl Acad Sci USA 83:1568-1572

Soininen R, Haka-Risku T, Prockop DJ, Tryggvason K (1987) Complete primary structure of the α1 chain of human basement membrane (type IV) collagen. FEBS Lett 225:188-194

Soininen R, Huotari M, Hostikka SL, Prockop DJ, Tryggvason K (1988) The structural genes for α1 and α2 chains of human type IV collagen are divergently encoded on opposite DNA strands and have an overlapping promoter region. J Biol Chem 263:17217-17220

Solomon E, Hiorns LR, Spurr N, Kurkinen M, Barlow D, Hogan BLM, Dalgleish R (1985) Chromosomal assignments of the genes coding for human types II, III, and IV collagen: a dispersed gene family. Proc Natl Acad Sci USA 82:3330-3334

Strauss EC, Kobori JA, Siu G, Hood LE (1986) Specific-primer-directed DNA sequencing. Anal Biochem 154:353-360

Tabor S, Richardson CC (1987) DNA sequence analysis with a modified bacteriophage T7 DNA polymerase. Proc Natl Acad Sci USA 84:4767-4771

Timpl R, Dziadek M (1986) Structure, development, and molecular pathology of basement membranes. Int Rev Exp Pathol 29:1-113

Venkatesan M, de Pablo F, Vogeli G, Simpson RT (1986) Structure and developmentally regulated expression of a Strongylocentrotus purpuratus collagen gene. Proc Natl Acad Sci USA 83:3351-3355

Wood L, Theriault N, Vogeli G (1988) cDNA clones completing the nucleotide and derived amino acid sequence of the α1 chain of basement membrane (type IV) collagen from mouse. FEBS Lett 227:5-8

Yamada Y, Avvedimento VE, Mudryj M, Ohkubo H, Vogeli G, Irani M, Pastan I, de Crombrugghe B (1980) The collagen gene: evidence of its evolutionary assembly by amplification of a DNA segment containing an exon of 54 bp. Cell 22:887-893

Yamada Y, Liau G, Mudryj M, Obici S, de Crombrugghe B (1984) Conservation of sizes for one but not another class of exons in two chick collagen genes. Nature 310:333-337

Received August 20, 1989/Revised November 17, 1989

Note Added in Proof

The human α1(IV) collagen gene has two 54-bp exons (Soininen et al. 1989) and contains four 54-bp elements for the Gly-X-Y repeat sequence.

Soininen R, Huotari M, Ganguly A, Prockop DJ, Tryggvason K (1989) Structural organization of the gene for the α1 chain of human type IV collagen. J Biol Chem 264:13565-13571