
Volume 8 Number 13 1980 Nucleic Acids Research

Sequence rearrangement and duplication of double stranded fibronectin cDNA probably occurring during cDNA synthesis by AMV reverse transcriptase and Escherichia coli DNA polymerase I

John B.Fagan, Ira Pastan and Benoit de Crombrugghe

Laboratory of Molecular Biology, National Cancer Institute, National Institutes of Health, Bethesda, MD 20205, USA

Received 8 April 1980

ABSTRACT

Two cloned cDNAs derived from the mRNA for cell fibronectin have been sequenced, providing evidence that transcription with AMV reverse transcriptase or Escherichia coli DNA polymerase I may not always result in double stranded cDNA that is exactly homologous with its mRNA template. Instead, the sequences of these cloned cDNAs are consistent with the duplication and rearrangement of sequences during synthesis of double stranded cDNA.

INTRODUCTION

We have constructed a set of recombinant plasmids with inserts complementary to the fibronectin mRNA (1). Cell fibronectin plays important roles in cell adhesion and in maintaining normal cell morphology (2-4). Our primary purpose in constructing recombinant plasmids containing fibronectin cDNA sequences was to develop a specific hybridization probe which could be used to quantitate the changes in fibronectin mRNA levels that occur during avian sarcoma virus (ASV) transformation of chick embryo fibroblasts (CEF). This work has been reported elsewhere (1).

In examining the structures of two cloned cDNAs derived from the mRNA for cell fibronectin, we have obtained evidence that transcription catalyzed by AMV reverse transcriptase or E. coli DNA polymerase I may not always result in double stranded cDNA that is exactly homologous with its mRNA template. Instead, we have obtained DNA sequence data suggesting that duplication and rearrangement of sequences occurred during the synthesis of double stranded cDNA by AMV reverse transcriptase and E. coli DNA polymerase I.

© IRL Press Limited, 1 Falconberg Court, London W1V 5FG, U.K.


MATERIALS AND METHODS

Plasmid Construction

The construction and identification of recombinant plasmids with inserts complementary to fibronectin mRNA are described elsewhere (1). In short, single stranded cDNA was transcribed from a fibronectin mRNA template, purified from CEF as reported (5). The first strand was synthesized with AMV reverse transcriptase using oligo(dT)10 as primer. This single stranded cDNA was made double stranded with DNA polymerase I. After digestion with S1 nuclease to create flush ends, synthetic decanucleotide "linkers" carrying the sequence recognized by the restriction endonuclease Hind III (6, 7) were ligated to the double stranded cDNA with T4 DNA ligase. The plasmid pBR322 and the cDNA-linker complex were each digested with Hind III to expose complementary single stranded ends. After treatment of the linearized plasmid with bacterial alkaline phosphatase to remove phosphate from the 5' ends and thereby prevent self-ligation, the cDNA-linker complex was ligated to the plasmid with T4 DNA ligase. E. coli cells were transformed with the resultant population of recombinant plasmids, and transformants carrying plasmids with sequences complementary to fibronectin mRNA were identified as described (1).
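Hind III recognizes the palindrome 5' AAGCTT 3' and cuts between the first A and AGCTT (A^AGCTT), so every product carries a 5' AGCT overhang; this is why the inserts in Fig. 2 begin with AGCTT. The Python sketch below is a minimal, top-strand-only illustration of locating Hind III sites and splitting a sequence at them. The function names and the toy sequence are hypothetical, and the complementary strand (with its own 4-nt overhang) is not modeled.

```python
# Minimal sketch: locate Hind III sites (A^AGCTT) on the top strand of a DNA
# string and split it at the cut positions. Only the top strand is modeled;
# the complementary strand and its 5' AGCT overhang are not tracked.

HINDIII_SITE = "AAGCTT"
CUT_OFFSET = 1  # Hind III cuts between the first A and AGCTT


def hindiii_cut_positions(seq: str) -> list[int]:
    """Return 0-based top-strand cut positions for every Hind III site."""
    seq = seq.upper()
    positions = []
    start = seq.find(HINDIII_SITE)
    while start != -1:
        positions.append(start + CUT_OFFSET)
        start = seq.find(HINDIII_SITE, start + 1)
    return positions


def hindiii_fragments(seq: str) -> list[str]:
    """Split the top strand at each Hind III cut position."""
    cuts = [0] + hindiii_cut_positions(seq) + [len(seq)]
    return [seq[a:b] for a, b in zip(cuts, cuts[1:])]


if __name__ == "__main__":
    # Hypothetical toy sequence: a short "insert" between two Hind III sites.
    toy = "GGAAGCTTCCCAAGCTTGG"
    print(hindiii_cut_positions(toy))  # [3, 12]
    print(hindiii_fragments(toy))      # ['GGA', 'AGCTTCCCA', 'AGCTTGG']
```

The excised middle fragment starts with AGCTT, matching the ends of the printed inserts.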

Preparation of DNA Fragments

The cDNA inserts of the plasmids pFN200 and pFN600 were excised from the plasmids with Hind III and isolated by centrifugation of 120 µg of DNA through 5-20 percent sucrose gradients in 10 mM Tris-Cl pH 7.5, 1 mM Na-EDTA for 14 hours at 26.5 × 10³ rpm at 4°C in a Beckman SW 27 rotor.

DNA Sequencing

The strategies for sequencing the inserts of pFN200 and pFN600 are presented in Fig. 1. The labeling of fragments with ³²P by polynucleotide kinase, the preparation and electrophoretic isolation of fragments labeled at a single end, and the base-specific cleavage reactions were according to Maxam and Gilbert (8). Thin sequencing gels (0.4 mm) were prepared and run as described by Sanger and Coulson (9).
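In the Maxam and Gilbert procedure (8), a fragment labeled at one 5' end is cleaved chemically at particular bases, and the lengths of the labeled pieces mark where those bases lie. The Python sketch below is an idealized illustration only. It assumes a four-lane scheme with G, G+A, C+T and C lanes (one common arrangement), exactly one cleavage per molecule, and a gel that resolves every length; the function names are hypothetical. Because cleavage destroys the base at 0-based position i, the surviving labeled piece is i nucleotides long, so reading bands from shortest to longest gives the sequence 5' to 3' from the labeled end.

```python
# Idealized sketch of reading a Maxam-Gilbert ladder (assumes one cleavage per
# molecule, equal band intensities and single-nucleotide gel resolution).

LANES = ("G", "G+A", "C+T", "C")

# Lanes that show a band when a given base is cleaved (four-lane scheme assumed).
LANES_FOR_BASE = {
    "G": ("G", "G+A"),
    "A": ("G+A",),
    "C": ("C+T", "C"),
    "T": ("C+T",),
}


def simulate_lanes(seq: str) -> dict[str, set[int]]:
    """Labeled-fragment lengths (nt from the labeled 5' end) in each lane.

    Cleavage destroys the base at 0-based position i, so the labeled
    piece that remains is i nucleotides long.
    """
    lanes = {name: set() for name in LANES}
    for i, base in enumerate(seq.upper()):
        for lane in LANES_FOR_BASE.get(base, ()):
            lanes[lane].add(i)
    return lanes


def read_ladder(lanes: dict[str, set[int]], n: int) -> str:
    """Call bases for lengths 0..n-1, shortest (closest to the label) first."""
    calls = []
    for length in range(n):
        in_lane = {name for name in LANES if length in lanes[name]}
        if {"G", "G+A"} <= in_lane:
            calls.append("G")
        elif "G+A" in in_lane:
            calls.append("A")
        elif {"C", "C+T"} <= in_lane:
            calls.append("C")
        elif "C+T" in in_lane:
            calls.append("T")
        else:
            calls.append("N")  # no band at this length: base cannot be called
    return "".join(calls)


if __name__ == "__main__":
    fragment = "AGCTTTGGCACTTACAG"  # first 17 nt of the inserts in Fig. 2
    ladder = simulate_lanes(fragment)
    print(sorted(ladder["G"]))                  # [1, 6, 7, 16]
    print(read_ladder(ladder, len(fragment)))   # AGCTTTGGCACTTACAG
```

Real gels add complications this sketch ignores, such as unreadable bands close to the label and uneven cleavage efficiency.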

RESULTS AND DISCUSSION

When the inserts of two plasmids containing fibronectin cDNAs were characterized by DNA sequence analysis, we found that they contained long inverted repeat sequences. The complete sequence of the insert from pFN200 and the partial sequence of that from pFN600 are presented in Fig. 2. The partial sequence of the insert from pFN600 has been previously reported (1).


[Figure 1 image: restriction maps of the two inserts. Panel a sites: R I, Hind III, Hind III, Hae III. Panel b sites: Hinc II, Taq I, Hinf I, Hinf I, Alu I, Hinc II.]

Figure 1

Panel a

Sequencing strategy for the insert from pFN200. The Hind III sites define the ends of the cDNA insert. The Hae III and R I sites are in flanking pBR322 sequences. Two kinase reactions were carried out. In the first, the purified insert was kinased, the strands were separated electrophoretically, and both strands were sequenced. In the second, the whole plasmid was digested with R I, kinased, and redigested with Hae III. The labeled R I-Hae III fragment was then isolated and sequenced.

Panel b

Sequencing strategy for the insert from pFN600. Four kinase reactions were carried out. In the first, the insert was cut with Hinf I, kinased, and recut with Hinc II. In the second, the initial digestion was with Alu I and the second digestion with Hinc II. In the third, the initial digestion was with Taq I and the second with Hinc II. In the fourth, the uncut insert was kinased at each end and then cut with Taq I. Bars a and b indicate the regions of the insert for which the sequence is presented in Fig. 2.
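Most of these strategies rest on the same bookkeeping: polynucleotide kinase labels both 5' ends of a fragment, and a second restriction cut separates the two labels so that each product is labeled at one end only, as the Maxam-Gilbert reactions require. The Python sketch below models only that bookkeeping. The fragment length and cut position in the example are hypothetical (not taken from the paper), the class and function names are illustrative, and single-stranded overhangs are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledPiece:
    start: int        # top-strand coordinate, inclusive
    end: int          # top-strand coordinate, exclusive
    labeled_end: str  # "left": label on the top strand's 5' end;
                      # "right": label on the bottom strand's 5' end


def recut_end_labeled(length: int, cut: int) -> list[LabeledPiece]:
    """Fragment of `length` bp labeled at both 5' ends, then cut once at `cut`.

    Each product keeps the label at only one end, so each yields a single
    readable ladder. Overhangs are ignored.
    """
    if not 0 < cut < length:
        raise ValueError("the second cut must fall inside the fragment")
    return [LabeledPiece(0, cut, "left"), LabeledPiece(cut, length, "right")]


if __name__ == "__main__":
    # Hypothetical numbers: a 400-bp labeled fragment recut 150 bp from its left end.
    for piece in recut_end_labeled(400, 150):
        print(piece)
```

The left product is read along the top strand from its left end; the right product is read along the bottom strand from its right end.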

Panel a (pFN200)

5'...AGCTTTGGCACTTACAGTATAAAAATAATCACTGATCATAATTACACCAAATTCCTCTTTG
TCAACTGCCCACTAAGTGTCTTCAATACATTTTATTCCCATTTAAAAACACTTAGTGGGCAGTTGA
CAAAGAGGAATTTGGTGTAATTATGATCAGTGATTATTTTTATACTGTAAGTGCCAAAGCT...3'

Panel b (pFN600)

5'...AGCTTTGGCACTTACAGTATAAAAATAATCACTGATCATAATTACACCAAATTCCTCTTTGTCAACTGCCCACTA
AGTGTCTTCAATACATTTTATTCCCATTTAAAAACACTTGAAGGTCAGGGGAACAAAACTGATAAATAACAGTAGGAGAT
ACTAAATCACAAACTGGTGGGGGATCAGAACGTCGAGGGGGTGGGAGAGAGTTGGAATTGAAAGGAAACCATACTATGCA
*GACTC ... ~140 BASES ... TGATGTTTAAAATATGCACAGTCCTGATTCTTTCTCCATGATCCTGTAGCTTTAGT
ATCTCATACTGTTATTTATCAGTTTTGTTCCCCTGACCTTCAAGTGTTTTTAAATGGGAATAAAATGTATTGAAGACACT
TAGTGGGCAGTTGACAAAGAGGAATTTGGTGTAATTATGATCAGTGATTATTTTTATACTGTAAGTGCCAAAGCT...3'

[The restriction-site labels, the arrows marking the limits of the inverted repeats, the + marker, and the underlining of the original figure are not reproduced.]

Figure 2

The sequences of the cDNA inserts from pFN200 (panel a) and pFN600 (panel b). The inserts were excised from the plasmids by digestion with Hind III and purified by sucrose gradient centrifugation, as described (1). Sequence analysis was carried out as described in MATERIALS AND METHODS using the strategies presented in Fig. 1. The arrows above the sequence indicate the limits of the inverted repeats. * indicates the one base in the sequenced portion of the insert of pFN600 whose identity is not clear. + indicates the single base in the rightward inverted repeat which is unambiguously different from the corresponding base in the leftward repeat. Eco R I digestion was carried out in 20 mM Tris-Cl pH 8.5, 2 mM MgCl2, allowing cleavage at the sequence AATT.

[Figure 3 diagram: the pFN600 insert is drawn as a central domain of ~300 bp between two 161-bp flanking domains, each flanking domain divided into 114 bp (80 + 28 + 6) shared with pFN200 at the outer end and 47 bp not shared; the pFN200 insert is drawn as 80-, 28-, and 80-bp domains.]

Figure 3

Diagram depicting the relationship between the sequences of the pFN200 and pFN600 inserts. See text for description.

The sequence organization of these inserts and the relationships between their structures are presented in Fig. 3. The salient features of these structures are as follows. (1) Each insert consists of three domains: a central domain of unique sequence flanked by two identical sequences that are exact inverted repeats of each other. The sizes of the central and flanking domains are 28 and 80 base pairs, respectively, for pFN200 and ~300 and 161 base pairs, respectively, for pFN600. Thus, each insert is symmetrical, and the flanking domains of each strand of these inserts are complementary and could base-pair to form stem and loop structures. (2) The leftward inverted repeat, the central domain, and 6 base pairs of the rightward repeat of pFN200 comprise the first 114 base pairs of the inverted repeat of pFN600. This portion of pFN200 is repeated exactly in pFN600. The remaining 47 base pairs of the inverted repeat of pFN600 are not homologous to the sequence of the pFN200 insert. Thus, the extreme ends of these two inserts are exactly homologous, while the sequence of the central region of the pFN600 insert is unrelated to that of the pFN200 insert. The insert of a third plasmid, which is derived from the same region of the fibronectin mRNA, has been characterized by restriction analysis (data not shown) and does not contain inverted repeats.
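The domain structure of pFN200 described above can be checked directly against the sequence printed in Fig. 2a. The Python sketch below assumes that the three printed lines of panel a form one contiguous top strand (188 nt, Hind III ends included); the helper names are illustrative. It splits the insert into 80-, 28- and 80-nt domains and confirms that the two flanks are exact reverse complements of each other, with the hexamers AAGTGT and ACACTT bordering the central domain.

```python
# Sketch: check the inverted-repeat organization of the pFN200 insert (Fig. 2a).
# The sequence is the printed top strand, read 5' to 3', Hind III ends included.

PFN200 = (
    "AGCTTTGGCACTTACAGTATAAAAATAATCACTGATCATAATTACACCAAATTCCTCTTTG"
    "TCAACTGCCCACTAAGTGTCTTCAATACATTTTATTCCCATTTAAAAACACTTAGTGGGCAGTTGA"
    "CAAAGAGGAATTTGGTGTAATTATGATCAGTGATTATTTTTATACTGTAAGTGCCAAAGCT"
)

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def split_domains(seq: str, flank: int) -> tuple[str, str, str]:
    """Split into left flank, central domain and right flank."""
    return seq[:flank], seq[flank:len(seq) - flank], seq[len(seq) - flank:]


if __name__ == "__main__":
    left, loop, right = split_domains(PFN200, 80)
    print(len(PFN200), len(left), len(loop), len(right))  # 188 80 28 80
    print(right == revcomp(left))  # True: the flanks are exact inverted repeats
    print(left[-6:], loop, right[:6])
```

Because only the central 28 nt break the symmetry, the whole insert is not its own reverse complement, which is what allows each single strand to fold into a stem and loop.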

There are three possible sources for the inverted repeats that we have observed: (1) these structures could be present in the fibronectin mRNA; (2) they could have been generated by sequence rearrangement and duplication during cDNA synthesis; (3) they could have been generated during plasmid replication. Although the sequence data do not conclusively rule out any of these possibilities, two lines of reasoning argue that these inverted repeats were most likely generated during cDNA synthesis.

First, one of the three plasmids lacks an inverted repeat structure, so the presence of inverted repeats is neither a general property of the products of this region of the fibronectin mRNA nor a general property of this particular cDNA preparation. Second, if the inverted repeat sequence of pFN200 were actually present in fibronectin mRNA, the homology between the pFN200 and pFN600 inserts would be expected to be oriented in a way consistent with sequential transcription from the same region of fibronectin mRNA. Instead, sequences homologous to pFN200 are found at the extreme ends of the pFN600 insert and are organized in such a way as to preclude their sequential transcription from the same region of the fibronectin mRNA from which pFN200 was transcribed.
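The orientation argument can be illustrated with the two sequenced flanking domains of pFN600 (Fig. 2b). The Python sketch below encodes only the 161-nt flanks as printed (the central ~300 bp, partly unsequenced, is omitted; helper names are illustrative). It shows that the 114 nt shared with pFN200 lie at the left end in one orientation and, as their reverse complement, at the right end, and it lists the positions where the right flank departs from an exact inverted repeat of the left flank, for comparison with the single base marked + in Fig. 2.

```python
# Sketch: compare the two sequenced flanking domains of the pFN600 insert
# (Fig. 2b). The central ~300 bp is omitted; only the 161-nt flanks are used.

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# First 114 nt of pFN200 (left flank, central domain, 6 nt of the right flank).
SHARED_114 = (
    "AGCTTTGGCACTTACAGTATAAAAATAATCACTGATCATAATTACACCAAATTCCTCTTTG"
    "TCAACTGCCCACTAAGTGTCTTCAATACATTTTATTCCCATTTAAAAACACTT"
)

# pFN600 flanking domains as printed, read 5' to 3' on the top strand.
PFN600_LEFT = (
    "AGCTTTGGCACTTACAGTATAAAAATAATCACTGATCATAATTACACCAAATTCCTCTTTGTCAACTGCCCACTA"
    "AGTGTCTTCAATACATTTTATTCCCATTTAAAAACACTTGAAGGTCAGGGGAACAAAACTGATAAATAACAGTAGGAGAT"
    "ACTAAA"
)
PFN600_RIGHT = (
    "TTTAGT"
    "ATCTCATACTGTTATTTATCAGTTTTGTTCCCCTGACCTTCAAGTGTTTTTAAATGGGAATAAAATGTATTGAAGACACT"
    "TAGTGGGCAGTTGACAAAGAGGAATTTGGTGTAATTATGATCAGTGATTATTTTTATACTGTAAGTGCCAAAGCT"
)


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatch_positions(a: str, b: str) -> list[int]:
    """0-based positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    # The shared block sits at the left end in one orientation...
    print(PFN600_LEFT.startswith(SHARED_114))          # True
    # ...and at the right end as its reverse complement.
    print(PFN600_RIGHT.endswith(revcomp(SHARED_114)))  # True
    # Positions where the right flank is not an exact inverted repeat.
    print(mismatch_positions(PFN600_RIGHT, revcomp(PFN600_LEFT)))  # [11]
```

The single mismatch falls within the 47 nt that are not shared with pFN200.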

Two possibilities for generating these inserts in the absence of sequence rearrangements and duplications during cDNA synthesis are: (1) The cDNAs for these two plasmids might have been transcribed from different fibronectin mRNA species, one species having an inverted repeat colinear with the pFN200 insert and one species having an inverted repeat colinear with the pFN600 insert. A third mRNA species might be postulated to account for the plasmid whose insert lacks an inverted repeat. (2) A single fibronectin mRNA species might contain four copies of the inverted repeat sequence common to pFN200 and pFN600. These copies would occur as two pairs in different arrangements, one pair colinear with the sequence of the pFN200 insert and the other colinear with that of the pFN600 insert. The alternative to these possibilities is that the inverted repeat structures of the pFN200 and pFN600 inserts were generated by duplication and rearrangement of sequences present only once in the fibronectin mRNA. We prefer this latter explanation, since precedents exist for such nonsequential transcription (10, 11, 12). This could occur during synthesis of the first strand by AMV reverse transcriptase, during synthesis of the second strand by E. coli DNA polymerase I, or possibly during replication of the plasmid.

[Figure 4 schematic. The mRNA is drawn as ...AAGUGU...ACACUU...(A)n and the cDNA as TTCACA...TGTGAA (dT)n. Step labels, in order: Reverse Transcriptase; Region High in A+U; Dissociation of Nascent cDNA and Loop Formation; Reverse Transcriptase; Denaturation with NaOH and Self Priming; DNA Polymerase I. The final duplex carries AAGTGT and ACACTT on either side of the loop sequence.]

Figure 4

Model for the generation of inverted repeats during the reverse transcriptase reaction. A detailed description is presented in the text.

One mechanism by which these inserts could be generated during the reverse transcriptase reaction is presented in Fig. 4. During the reaction, a region of fibronectin mRNA exceptionally high in A and U, such as that found in the loop domains of pFN200 and pFN600, would result in unstable base-pairing between the mRNA template and the nascent cDNA molecule. This would allow the nascent chain to dissociate from the mRNA template and fold back on itself to form a loop with a short base-paired stem, as shown in Fig. 4c. The short base-paired stem could then be extended by reverse transcriptase to form the longer stem shown in Fig. 4d. The formation of fold-back structures such as these is not uncommon during the reverse transcriptase reaction (13, 14). In fact, the routine procedure for synthesizing double stranded cDNA depends on "self-priming" from loops such as these for the second strand reaction (15). Furthermore, the tendency of reverse transcriptase to "jump" from one template to another is well documented (11) and is an essential component of the mechanism of proviral DNA synthesis. The likelihood that jumping occurred during cDNA synthesis was increased by the absence of actinomycin D from these reactions. The model presented in Fig. 4 predicts that formation of the pFN200 insert would depend upon the presence in the fibronectin mRNA of short, complementary sequences flanking the 28 bases which correspond to the central loop domain of the pFN200 insert and are AT rich. Such complementary sequences are found in exactly the predicted locations within the insert of pFN600, which we assume to reflect accurately the sequence of the corresponding region of fibronectin mRNA. These complementary sequences are underlined in Fig. 2b. Within the insert of pFN600, the 28 base-pair domain corresponding to the loop region of the pFN200 insert is flanked by the sequences 5' AAGTGT 3' and 5' ACACTT 3', which are exactly complementary. We have included these sequences in the model presented in Fig. 4. In the final steps of this model, the stem and loop structure of Fig. 4d would be denatured, as shown in Fig. 4e, to become the template for synthesis of double stranded cDNA having the inverted repeat structures of pFN200 and pFN600.
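The sequence bookkeeping of this fold-back model can be sketched as string operations. In the Python sketch below (function names are illustrative), the 3'-terminal hexamer of a growing strand pairs with the nearest upstream copy of its complement, and the strand is then extended by copying, as a reverse complement, everything upstream of that pairing site. This is a string-level illustration under stated assumptions: it does not track which strand is cDNA, nor enzyme polarity. Starting from the 114 nt that pFN200 shares with pFN600, read as printed in Fig. 2 and ending in ACACTT, it regenerates the 188-nt pFN200 insert.

```python
# String-level sketch of the fold-back step of the model (strand identity and
# enzyme polarity are not modeled; function names are illustrative).

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# First 114 nt of the pFN600 insert as printed in Fig. 2b; they end with
# ACACTT and contain AAGTGT just upstream of the 28-nt AT-rich domain.
SHARED_114 = (
    "AGCTTTGGCACTTACAGTATAAAAATAATCACTGATCATAATTACACCAAATTCCTCTTTG"
    "TCAACTGCCCACTAAGTGTCTTCAATACATTTTATTCCCATTTAAAAACACTT"
)

# The complete pFN200 insert as printed in Fig. 2a.
PFN200 = (
    "AGCTTTGGCACTTACAGTATAAAAATAATCACTGATCATAATTACACCAAATTCCTCTTTG"
    "TCAACTGCCCACTAAGTGTCTTCAATACATTTTATTCCCATTTAAAAACACTTAGTGGGCAGTTGA"
    "CAAAGAGGAATTTGGTGTAATTATGATCAGTGATTATTTTTATACTGTAAGTGCCAAAGCT"
)


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def fold_back_extend(nascent: str, stem: int = 6) -> str:
    """Model a fold-back followed by self-primed extension.

    The last `stem` bases pair with the nearest upstream complementary
    `stem`-mer; the 3' end is then extended using the part of the same
    strand upstream of that pairing site as template.
    """
    tail = nascent[-stem:]
    partner = revcomp(tail)
    site = nascent.rfind(partner, 0, len(nascent) - stem)
    if site == -1:
        raise ValueError("no upstream complementary sequence for the 3' end")
    return nascent + revcomp(nascent[:site])


if __name__ == "__main__":
    product = fold_back_extend(SHARED_114)
    print(len(product))       # 188
    print(product == PFN200)  # True
```

The same string result would follow from the slippage model of Fig. 5, since both models use the same pair of complementary hexamers; the sketch does not distinguish between them.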

[Figure 5 schematic. The cDNA is drawn as TTCACA...TGTGAA (dT)n. Step labels, in order: First Strand Synthesis from mRNA Template with Reverse Transcriptase; Self-Priming; DNA Polymerase I; Region High in A+T, Slippage and Self Priming at New Site; DNA Polymerase I. The final duplex carries AAGTGT and ACACTT on either side of the loop sequence.]

Figure 5

Model for the generation of inverted repeats during the DNA polymerase I reaction. A detailed description is presented in the text.

Fig. 5 presents a mechanism by which these inserts could be generated during the DNA polymerase I reaction. During transcription of self-primed single stranded cDNA into double stranded cDNA by DNA polymerase, a region exceptionally high in A and T would result in unstable base-pairing and would increase the probability of slippage, as shown in Fig. 5b. Slippage would be followed by transcription from a new point on the template, as shown in Fig. 5d, resulting in a double stranded cDNA with an inverted repeat structure. As in the reverse transcriptase model presented in Fig. 4, the DNA polymerase model of Fig. 5 predicts the existence and location of the complementary hexanucleotide sequences 5' AAGTGT 3' and 5' ACACTT 3', which we have found to flank the 28 base-pair domain of the pFN600 insert that corresponds to the central loop domain of the pFN200 insert.
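Both models start from the premise that the 28-nt domain is rich in A and T. As a simple illustration, the Python sketch below computes the A+T fraction of the central domain of pFN200 and of its 80-nt left flank, using the sequences printed in Fig. 2a; the helper name is illustrative. A composition count like this only quantifies the printed sequence. It is not an estimate of duplex stability, and it does not by itself show that the region is exceptional within the mRNA.

```python
# Sketch: A+T content of the pFN200 central domain versus its flanking domain.

PFN200_LEFT = (
    "AGCTTTGGCACTTACAGTATAAAAATAATCACTGATCATAATTACACCAAATTCCTCTTTG"
    "TCAACTGCCCACTAAGTGT"
)
PFN200_LOOP = "CTTCAATACATTTTATTCCCATTTAAAA"


def at_fraction(seq: str) -> float:
    """Fraction of positions that are A or T."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "AT" for base in seq) / len(seq)


if __name__ == "__main__":
    print(f"central domain: {at_fraction(PFN200_LOOP):.3f}")  # 0.786 (22 of 28)
    print(f"left flank:     {at_fraction(PFN200_LEFT):.3f}")  # 0.650 (52 of 80)
```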

O'Hare et al. (10) have reported the existence of a similar, although shorter, inverted repeat at the end of a cloned cDNA derived from ovalbumin mRNA. They suggest similar mechanisms for generating inverted repeats of this nature. These authors also point out the similarity of these mechanisms to that proposed for generation of the mini-insertion element IS2-6 (12). It is worth pointing out that, on the basis of this mechanism, the inverted repeats of pFN200 and pFN600 could also have been generated during replication of the recombinant plasmids.

Other investigators have observed that the error rate for transcription by reverse transcriptase is relatively high, and have suggested that this may be one reason for the high rate of mutation of avian tumor viruses to defective forms (16). Our results do not provide an accurate measure of transcriptional fidelity for reverse transcriptase. However, the high degree of homology between pFN200 and pFN600 suggests that transcription was relatively accurate for these short sequences. The predominant alteration that we have observed to occur during cDNA synthesis was not inaccurate transcription, but sequence rearrangement and duplication. The resultant cloned cDNA is, therefore, not suitable for use in determining the sequence of the mRNA template from which it was derived. It is, however, quite suitable for use as a hybridization probe for mRNA quantitation. Results of these studies are presented elsewhere (1).

REFERENCES

1. Fagan, J.B., Sobel, M.E., Yamada, K.M., de Crombrugghe, B., and Pastan, I. In preparation.

2. Yamada, K.M., Yamada, S.S., and Pastan, I. (1976) Proc. Natl. Acad. Sci. USA 73, 1217-1221

3. Willingham, M.C., Yamada, K.M., Yamada, S.S., Pouyssegur, J., and Pastan, I. (1977) Cell 10, 375-380

4. Ali, I.U., Hautner, V.M., Lanza, R., and Hynes, R.O. (1977) Cell 11, 115-126

5. Fagan, J.B., Yamada, K.M., de Crombrugghe, B., and Pastan, I. (1979) Nucleic Acids Research 6, 3471-3480

6. Scheller, R.H., Dickerson, R.E., Boyer, H.W., Riggs, A.D., and Itakura, K. (1977) Science 196, 177-180

7. Bahl, C.P., Marian, K.J., Wu, R., Stawinski, J., and Narang, S.A. (1977) Gene 1, 81

8. Maxam, A.M. and Gilbert, W. (1977) Proc. Natl. Acad. Sci. USA 74, 560-564

9. Sanger, F. and Coulson, A.R. (1977) FEBS Lett 87, 107-110

10. O'Hare, K., Breathnach, R., Benoist, C., and Chambon, P. (1979) Nucleic Acids Research 7, 321-324

11. Gilboa, E., Mitra, S.W., Goff, S., and Baltimore, D. (1979) Cell 18, 93-100

12. Ghosal, D. and Saedler, H. (1978) Nature 275, 611-617

13. Taylor, J.M., Faras, A.J., Varmus, H.E., Levinson, W.E., and Bishop, J.M. (1972) Biochemistry 11, 2343-2351

14. Leis, J.P. and Hurwitz, J. (1972) Proc. Natl. Acad. Sci. USA 69, 2331-2335

15. Efstratiadis, A., Kafatos, F.C., Maxam, A.M., and Maniatis, T. (1976) Cell 7, 279-288

16. Gopinathan, K.P., Weymouth, L.A., Kunkel, T.A., and Loeb, L.A. (1979) Nature 278, 857-859