The 5’ ends of Escherichia coli lac mRNA



J. Mol. Biol. (1985) 182, 241-248

The 5’ Ends of Escherichia coli lac mRNA

Vincent J. Cannistraro and David Kennell

Department of Microbiology and Immunology, Box 8093 Washington University School of Medicine

St Louis, MO 63110, U.S.A.

(Received 4 May 1984, and in revised form 9 October 1984)

We identified the predominant 5’ ends of an mRNA in Escherichia coli to the exact nucleotides. There are four such ends of lac mRNA in fully induced cells. About 70% of the molecules have the reported major in vitro end, A-A-U-U-G (at +1), which is located 38 nucleotides before the A-U-G translation start. Another 15% start with A-U-U-G at +2, and about 8% start with A-U-U-A-G at -52. A fourth class of molecules begins with either A-G, C-A-G, A-C-A-G, or a weak A-C-A-C-A-G (at +24), observed only once. The origins of this latter set (≤10% of the total) are not known, but they could represent “ragged” ends of the mRNA when it is degraded to the beginning of the ribosome-protected region of the message. The A-U-U-A-G molecules are probably initiated from an upstream promoter whose position would coincide with the cAMP-CRP DNA binding site for the major promoter.

1. Introduction

The βGalase† gene is the first of three structural genes of the lac operon. Using a DNA fragment for in vitro synthesis, Maizels (1973) reported that the major transcription product contained 38 nucleotides preceding the A-U-G translation start of the βGalase message. As is true for all Escherichia coli mRNAs examined so far, this “leader” includes a purine-rich sequence that is important for ribosome recognition in the -5 to -15 region (Shine & Dalgarno, 1974; Steitz & Jakes, 1975). However, E. coli leaders contain many more nucleotides than are necessary for this specificity (of 27 mRNA leaders surveyed that do not have known transcription attenuators, the mean length was 62 nucleotides and the median length 42; Kennell, 1985). Besides its significance for translation initiation, the leader RNA may include specific sites for degradation or processing before the start of the message. Certain lacOc mutations (Sadler & Smith, 1971; Smith & Sadler, 1971), which would disrupt a putative secondary structure in the first 21 nucleotides, cause a faster decay of the βGalase mRNA as well as more frequent translation initiations (Cannistraro & Kennell, 1979). Also, there may be an RNase III recognition site in this structure since we had observed a translation-related defect for βGalase synthesis in RNase III⁻ strains (Talkad et al., 1978) and the putative stem-loop structure showed similarities with RNase III sites in bacteriophage T7 mRNA (Cannistraro & Kennell, 1979). Sites for RNase III recognition exist before several messages, e.g. from T7 (Dunn & Studier, 1973), T4 (Barkay & Goldfarb, 1982) and λ (Lozeron et al., 1977) bacteriophages.

† Abbreviations used: βGalase, β-galactosidase (EC 3.2.1.23) coded by the lacZ gene; message, mRNA coding for a single, functional polypeptide (a polycistronic mRNA carries several messages); lac, lactose operon; trp, tryptophan synthesis operon; gal, galactose operon; bp, base-pair; CRP, cyclic AMP receptor protein.

If processing or inactivating cleavages (Blundell & Kennell, 1974; Achord & Kennell, 1974; Lim & Kennell, 1979, 1980) do occur near the start of a message, they might be identified if the ends generated by such events were present for a sufficient time. Their detection would also be dependent on the sensitivity of the assay for specific ends. Almost all transcription start points have been determined using purified components in an in vitro system. Such systems would not detect processing or degradation sites and, of course, the origin of RNA synthesis in the cell might even be different. The in vivo start of the trp mRNA was inferred from the presence of an oligonucleotide consistent with the most abundant of three or four start oligonucleotides found in in vitro synthesis (Squires et al., 1976). The gal mRNA was shown by S1 nuclease mapping to have 5’ ends at two approximate positions five nucleotides apart (Aiba et al., 1981), and since then this mapping procedure has been used to identify approximate start points for several mRNAs. We developed a protocol that can detect 5’ ends of the induced lac mRNA that represent < 5% as many molecules as does the most abundant species (Cannistraro et al., 1985a,b). Furthermore, as opposed to the useful S1 nuclease mapping technique (Berk & Sharp, 1977; Weaver & Weissman, 1979), the procedure unambiguously identifies the end to the exact nucleotide. Also, in the S1 nuclease technique RNA molecules whose 3’ ends are upstream from the 5’ end of the DNA probe, which could result from termination or processing, would go undetected, i.e. the 32P at the 5’ end of the DNA would be lost in the S1 digestion.

0022-2836/85/060241-08 $03.00/0

© 1985 Academic Press Inc. (London) Ltd.

In this paper we identify for the first time the predominant in vivo 5’ ends of an E. coli mRNA to the exact nucleotides. We chose mRNA from the lac operon to relate the results to earlier in vivo work discussed above. The DNA probe used to purify the start region of the lac mRNA was a 789 bp HpaI fragment from plasmid pMC3 that was constructed by M. Calos. It contains the last 223 bp of the lacI gene, 87 bp of intervening sequences before the original in vitro lacZ transcription start site of Maizels (1973) and the subsequent 479 bp of lac DNA. E. coli HfrC, proC, metB, used in the earlier study (Cannistraro & Kennell, 1979), was grown in minimal medium and induced for ten minutes with isopropylthiogalactoside and cAMP. The RNA was pulse-labeled with [5-3H]uridine during the last minute to follow hybrid yield. The 5’ ends of purified lac mRNA were labeled with 32P from [γ-32P]ATP in the T4 polynucleotide kinase reaction. The RNA was then digested with T1 RNase and the [32P]oligonucleotides were separated and sequenced. A complete discussion of all steps will be presented elsewhere (Cannistraro et al., 1985a,b).
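The logic of the end-labeling assay can be illustrated with a short sketch. T1 RNase cleaves after each G residue, so after digestion only the 5’-terminal oligonucleotide of each molecule carries the 32P label, and its sequence locates the 5’ end. The leader sequence below is an assumption of this sketch (the lac leader from +1 to the A-U-G at +39 to +41, written as RNA), and the function names are hypothetical; it is not a description of the authors' procedure.

```python
# Assumed lac leader (RNA), from +1 through the A-U-G translation start at +39 to +41.
# This string is an assumption of the sketch, not data taken from the experiments.
LAC_LEADER = "AAUUGUGAGCGGAUAACAAUUUCACACAGGAAACAGCUAUG"


def t1_digest(rna: str) -> list[str]:
    """Split an RNA after every G, mimicking a complete T1 RNase digest."""
    fragments, current = [], ""
    for base in rna:
        current += base
        if base == "G":
            fragments.append(current)
            current = ""
    if current:  # trailing piece that does not end in G
        fragments.append(current)
    return fragments


def labeled_end(rna: str, start: int) -> str:
    """Return the 5'-terminal T1 oligonucleotide of a molecule whose
    5' end lies at 1-based position `start` of `rna`."""
    return t1_digest(rna[start - 1:])[0]


if __name__ == "__main__":
    # Candidate 5' ends within the assumed leader.
    for start in (1, 2, 24, 26, 27, 28):
        print(f"+{start}: {'-'.join(labeled_end(LAC_LEADER, start))}")
```

Under the assumed sequence, molecules beginning at +1 and +2 yield A-A-U-U-G and A-U-U-G, and ends between +24 and +28 yield the family A-C-A-C-A-G, A-C-A-G, C-A-G and A-G, all of which terminate at the same G.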

It should be noted that for various technical reasons, the ratio of 32P incorporated onto two different 5’ ends may not accurately measure the ratio of the numbers of those ends. For example, the rate of labeling of an “internal” 5’ end or of a blunt double-stranded end may be five times slower than that of an “external” (not in a double strand) end (Berkner & Folk, 1980). To obviate this complication we used about an 800-fold excess of T4 polynucleotide kinase with respect to the units required to label the pmoles of ends completely. Similarly, some molecules may not hybridize as efficiently as others, so the hybrid reactions were continued to the maximum observed levels. Nonetheless, such reservations should be kept in mind in deriving ratios of molecules from these data.

2. Results and Discussion

(a) The observed in vivo 5’ ends

There were at least four ends that accounted for >90% of the 5’ ends of mRNA that form hybrid to this DNA (Fig. 1). About 70% of the molecules started with the same A-A-U-U-G first observed in vitro by Maizels (1973). Another 15 to 20% started with A-U-U-G. These two sequences defined the 5’ ends of 80 to 90% of the RNA molecules for this part of the lac operon. S1 nuclease mapping of in vivo RNA gives a lac band consistent with an end near these positions (Reznikoff, personal communication). A third end was A-U-U-A-G. It was at a lower concentration (~8%), but present in all experiments. This end was not obvious at first because our oligonucleotide separation scheme

Figure 1. End oligonucleotides that identify 5’ ends of lac mRNA molecules in E. coli cells. E. coli HfrC, proC, metB cells were harvested during exponential growth in minimal medium after 10 min induction of the lac operon by isopropylthiogalactoside and cAMP (Cannistraro & Kennell, 1979). The lac mRNA was purified by 2 rounds of hybridization to a 789 bp fragment of DNA containing the end of lacI to +479 of lacZ, where +1 is the major in vitro lacZ transcription start, A-A-U-U-G (Maizels, 1973). The lac RNA was 5’ end-labeled, digested with T1 RNase, which cleaves after each G residue, and the resulting oligonucleotides were separated by electrophoresis in a 20% polyacrylamide gel, and each band was then electrophoresed on cellulose acetate. Each fraction was treated by partial alkali hydrolysis and its sequence determined by electrophoresis on cellulose acetate followed by homochromatography (Brownlee, 1972) on a PEI sheet (Cannistraro et al., 1985a,b). (a) A short exposure of the PEI sheet with the 5’ oligonucleotide from the predominant lac molecules in the cell (A-A-U-U-G). (b) The same sheet exposed longer to show the appearance of the A-U and A-U-U fragments with the A, A-U-U-A and A-U-U-A-G fragments coincident with the A, A-A-U-U and A-A-U-U-G oligonucleotides of A-A-U-U-G. (c) An isomer band of the same 5’ oligonucleotides showing almost equal amounts of the two starts, A-A-U-U-G and A-U-U-A-G. A complete treatment with pancreatic RNase of the band gave the 2 spots on the right (A-U and A-A-U). (d) Another very weak isomer band gave a pure A-U-U-A-G oligonucleotide that required very long exposure. The oligonucleotides in (b) to (d) were obtained in 3 completely independent experiments. (e) Shows the A-U-U-G that accounts for about 20% of all the lac mRNA molecules; this end is seen in every experiment as a pure band. (f) Shows the weak A-C-A-G end. Not shown are C-A-G, a stronger A-G and a very weak A-C-A-C-A-G seen once but too weak to photograph.

Also shown in (c) are the expected approximate positions of the four 5’ [32P], 3’ P-diphosphomononucleotides (○), which from left to right would be: pUp, pGp, pAp and pCp. The smaller 2’ and 3’ isomers of the same 5’ [32P], 2’,3’ P-diphospho-oligonucleotides separate from each other in one or both dimensions. The most separated product is pAp, which happens to be the 5’ nucleotide in all the panels. The separations are less the larger the oligonucleotide, as seen for pApAp compared to pAp in (a) and (c). Note that A separates mostly in the homochromatography dimension (vertical), while pApCp separates more in the electrophoresis dimension (see (f)). Rather than interfering with the sequencing, these differences add another useful variable.


fractionates on the basis of size and composition, so that A-U-U-A-G was usually obscured in the same band containing the much stronger A-A-U-U-G. It only became obvious accidentally because some of the specific 3’,5’-diphospho-oligonucleotides apparently exist in different isomer forms in the conditions used here and can separate from each other in both the 20% polyacrylamide sizing gel as well as during the second-dimension electrophoresis at pH 3.5. In most cases some A-U-U-A-G migrated in the main A-A-U-U-G band, and upon longer exposure of the sequence pattern, the less intense A-U and A-U-U spots could be seen (Fig. 1(a) and (b)). Also shown in Figure 1(c) are the sequence analyses for a band that contained approximately equal amounts of the two oligonucleotides. Figure 1(d) shows a band of pure A-U-U-A-G.

The preceding three ends were reproducible and each represents a unique sequence in the 789 nucleotides represented by this DNA probe. Another end is less specific. It consists of a small family of oligonucleotides: A-G, C-A-G, A-C-A-G, and a very weak A-C-A-C-A-G found only in one experiment. Excluding the hexamer, none of the smaller fragments is unique; A-C-A-G appears five times in the 789 nucleotides. We think these small oligonucleotides must be from the in vivo ends of lac mRNA, since the major source of spurious RNA comes from very small oligonucleotides derived from treatment with pancreatic RNase of the first hybrid that might stick to the nitrocellulose filter during the second hybridization. The last step before T1 RNase digestion is fractionation of the [5’-32P]RNA through two successive Bio-Gel P10 columns (Bio-Rad, Inc.), which remove oligonucleotides of < 20 bases (Cannistraro et al., 1985b). Also, a strain with a complete lac deletion gave no specific (5’-32P)-labeled ends.

However, it might be expected that small [5’-32P]oligomers that represent < 5% of the total molecules could simply result from the summation of all the random ends of the decaying lac mRNA population. This is also an unlikely complication, for the following reason. During steady-state induction, we estimated that approximately one-third of all βGalase mRNA molecules are complete, one-third are decaying and do not have the original 5’ end, while of the one-third that are nascent, about three-quarters have their original 5’ end. Thus, about 60% of all lac molecules still have the original 5’ end (Kennell & Riezman, 1977). Considering only the first 479 nucleotides in the present study, the numbers of molecules missing the original 5’ ends would be decreased by a factor of about 479/3063 so that all the decaying 5’ ends would only account for ≤10% of the total. Divided equally between the four nucleotides, the total of all random ends starting with A would only be about 3% of the major start! Even the dinucleotide A-G must represent a significant end since the total A-G from random statistics would only be 2 × 0.25 = 0.5% of the total ends. Consistent with this calculation, we did not detect any U-G or C-G within the resolution of our analysis (about 1% of the total ends).
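The order of magnitude of this estimate can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the text (60% of molecules with the original end, 479 of 3063 nucleotides probed, a ≤10% upper bound) and assumes that decay-generated ends are spread evenly along the message and that the four bases are equally likely; the constant and function names are hypothetical. The text works with rounded values, so the sketch reproduces the scale of the quoted percentages rather than the exact numbers.

```python
# Figures quoted in the text; the names are hypothetical labels for this sketch.
FRACTION_WITH_ORIGINAL_END = 0.60  # molecules that still have the original 5' end
PROBE_LENGTH_NT = 479              # lac mRNA nucleotides covered by the probe
MESSAGE_LENGTH_NT = 3063           # message length used for scaling in the text
UPPER_BOUND_RANDOM_ENDS = 0.10     # the "<= 10% of the total" bound


def random_end_fraction(with_original: float, probe_nt: int, message_nt: int) -> float:
    """Fraction of molecules expected to have a decay-generated 5' end
    inside the probed region, assuming ends are spread evenly along the message."""
    return (1.0 - with_original) * probe_nt / message_nt


def per_sequence_fraction(total: float, length: int) -> float:
    """Share of random ends that begin with one specific sequence of
    `length` nucleotides, assuming the four bases are equally likely."""
    return total * 0.25 ** length


if __name__ == "__main__":
    estimate = random_end_fraction(
        FRACTION_WITH_ORIGINAL_END, PROBE_LENGTH_NT, MESSAGE_LENGTH_NT
    )
    print(f"random ends within the probe: {estimate:.1%}")
    print(f"ends starting with A (from the 10% bound): "
          f"{per_sequence_fraction(UPPER_BOUND_RANDOM_ENDS, 1):.1%}")
    print(f"ends starting with A-G (from the 10% bound): "
          f"{per_sequence_fraction(UPPER_BOUND_RANDOM_ENDS, 2):.2%}")
```

Even starting from the 10% upper bound, a specific dinucleotide end is expected well below 1% of the total, which is the point of the argument above.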

(b) Comparing in vivo and in vitro observations

The A-A-U-U-G end must represent the major transcription start in vivo as was found in the in vitro reaction. However, in that original reaction with lac UV5 DNA, about 15% of the starts were G-A-A-U-U-G with an unprimed reaction and all started with G with a GpA primer (Maizels, 1973). In the later in vitro reactions of Majors (1975) with wild-type lac DNA, the starts were about evenly distributed between G-A-A-U-U-G and A-A-U-U-G. If the G start occurs in cells, we think it must be < 10% of the total 5’ ends in the cell. A pancreatic RNase digest should release G-A-A-U as a strong specific band and such a digest did not give this oligonucleotide.

A number of reasons can be suggested for this difference between the in vitro and in vivo results. First, of course, the in vitro reaction may not mimic the conditions inside a cell and the G start may represent an artifact. Another explanation that we think more likely is that the polymerase can start with G, but that the frequency of such starts relative to the A start is limited by the effective ratio of free GTP to ATP for the reaction, as was shown in subsequent in vitro experiments (Carpousis et al., 1982). The level of ATP in most cells is much higher than GTP with a ratio of 2.5 or more in E. coli (Bagnara & Finch, 1974; Friesen et al., 1975). The in vitro reactions contained equimolar concentrations of ATP and GTP, which could have distorted the relative frequency of G starts. If correct, this would mean that the G start does occur in vivo but at a level too low to detect unambiguously with present methods. A third reason for an in vitro end to be missing in the cell is that it may be on molecules that are processed or degraded rapidly in vivo.

The A-U-U-G end was not observed in the original in vitro work (Maizels, 1973; Majors, 1975). However, Carpousis et al. (1982) did observe a very minor (~6% of total) A-U start in vitro from lac UV5 and other mutant promoters. They did not show that it was derived from an A-U-U-G-ended molecule but favored that view. Again, two possible explanations can be considered for a potential disagreement between observations in vitro and in vivo, which bear on the possible origins of this end. First, it could represent a “slippage” transcription start. As proposed by Siebenlist et al. (1980) to explain why either G or A could be a lac mRNA start in vitro, the polymerase-DNA complex may have some flexibility with respect to the exact initiating nucleotide. Thus, if the G start is very low in vivo because the GTP/ATP ratio is so low, then their explanation, if correct, would mean that the start from this promoter could be G-A-A-U-U-G, A-A-U-U-G or A-U-U-G. A second possibility relates to the same basic consideration noted before. A-U-U-G may not be a start at all but may be generated in the cell by cleavage of the terminal A from A-A-U-U-G or by cleavage of the upstream transcript (below). An end from processing would not be seen in the in vitro system. Theoretically, release of the terminal A would disrupt the putative stem-loop structure at the start of the molecule (Cannistraro & Kennell, 1979).

Another in vitro start reported by Carpousis et al. (1982) was G-U-G-A-G, commencing at +5. Like the A-U start, it was synthesized very infrequently from lac UV5 DNA but became a major end with other mutant DNAs. At the physiological ATP/GTP ratio, it became even lower. Wild-type DNA was not used in their experiments, so this end might be present in vivo but, if so, at a very low level.

(c) An upstream promoter

The A-U-U-A-G end is derived from a sequence that starts at -52 and has not been reported from in vitro studies. However, Reznikoff et al. (1982) observed an extremely weak transcript initiated at about -45 in an in vitro reaction containing 25% glycerol, but it was inhibited by cAMP-CRP. It is conceivable, but we think unlikely, that this end results from a specific cleavage of a transcript that is a readthrough of the lacI mRNA. Besides being perhaps 100-fold lower in amount than the A-U-U-A-G transcript (Gilbert & Müller-Hill, 1970), the lacI mRNA terminates in vitro only about ten nucleotides past the -52 position (Horowitz & Platt, 1982) and does so in vivo at this same position as well as at -18 and -79 (Cone et al., 1983); the -79 termination would not include the -52 sequence and the other two cleaved lacI mRNA transcripts would probably be too short to form stable hybrids. Also, these terminations were in the absence of cAMP-CRP, and it was proposed that in their presence, the lacI mRNA would terminate further upstream (Cone et al., 1983).

It is more likely that this is an upstream transcription start, although it cannot be concluded one way or the other from comparisons of its upstream sequences to those of other promoters. There are “consensus” sequences in the -10 and -35 regions of a transcription initiation site that are probably important for RNA polymerase recognition (reviewed by Siebenlist et al., 1980; Rosenberg & Court, 1979; Hawley & McClure, 1983). A frequent hexamer in the -10 region is T-A-T-A-A-T and in the -35 region, T-T-G-A-C-A. At -63 to -58 (or -11 to -6 before the A-U-U-A-G) the sequence T-T-A-G-C-T would agree for the two boundary nucleotides but not for the others. However, there can be considerable variation among promoters in the nucleotide at any one position (except possibly the boundary T at -6), with the greatest conservation seeming to be for the two boundary T residues. The internal nucleotides at each position in the -63 to -58 sequence are among the most frequent of the discrepant ones in these promoter surveys.

Furthermore, there are several bona fide promoters with agreement in only two out of six in the -10 region, e.g. the malT, leu, ilvGEDA and thr promoters. There is better agreement with the -35 consensus sequence that, as is commonly the case, is 19 bp separated from the -10 region. At -36 to -31 before the A-U-U-A-G end is G-T-G-A-G-C, which agrees in three of six positions with the consensus hexamer (0.16 probability for random chance agreement). There is considerable ambiguity observed at all positions in the -35 region also, with several promoters showing agreement in only two out of six.
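The hexamer comparisons above can be reproduced with a short sketch that counts position-by-position agreement with each consensus and evaluates a simple chance-agreement probability. The binomial model (each position matching independently with probability 0.25) is an assumption of this sketch, as are the function names; whether the 0.16 quoted in the text was derived in exactly this way is not stated.

```python
from math import comb

# Consensus hexamers and candidate sequences as given in the text.
CONSENSUS_MINUS_10 = "TATAAT"
CONSENSUS_MINUS_35 = "TTGACA"
CANDIDATE_MINUS_10 = "TTAGCT"  # -63 to -58, upstream of the A-U-U-A-G end
CANDIDATE_MINUS_35 = "GTGAGC"  # -36 to -31 before the A-U-U-A-G end


def matches(candidate: str, consensus: str) -> int:
    """Count positions at which the candidate agrees with the consensus."""
    return sum(a == b for a, b in zip(candidate, consensus))


def prob_at_least(k: int, n: int = 6, p: float = 0.25) -> float:
    """Binomial probability of k or more matches in n positions,
    assuming independent positions with match probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    print("-10 matches:", matches(CANDIDATE_MINUS_10, CONSENSUS_MINUS_10))
    print("-35 matches:", matches(CANDIDATE_MINUS_35, CONSENSUS_MINUS_35))
    print(f"P(at least 3 of 6 by chance) = {prob_at_least(3):.3f}")
```

Under this model, agreement in three or more of six positions has a chance probability of about 0.17, close to the value quoted above, and agreement in two or more is expected almost half the time, consistent with the caution that such comparisons argue neither for nor against a promoter.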

An important consideration in such an assessment analysis is that the most studied promoters are generally the stronger ones. If these sequences do affect the frequencies of initiation, then it might not be surprising that a weak promoter could be weak precisely because it showed relatively poor agreement. For example, the internal promoter in the trp operon is relatively weakly expressed and it also shows agreement in only two out of six in the -10 region (with only one boundary T) and three out of six in the -35 region. Thus, we do not think the agreement or lack of agreement to expected consensus sequences argues for or against the A-U-U-A-G end originating from transcription initiation.

There is precedent for more than one promoter for an operon. An early example of such a situation was the three promoters for the start of the early T7 mRNA (Dunn & Studier, 1973; Minkley & Pribnow, 1973). Since then many cases have been reported. There are at least two interesting features of certain operons with multiple promoters. First, each promoter can be regulated by a specific set of signals that differs from those controlling the other promoters. For example, transcription can initiate in the gal operon at either of two promoters, P1 or P2, that are about 5 bp apart (Musso et al., 1977). All three gene products are produced in about equimolar amounts when galactose is used as an energy source with initiation from P1 stimulated by cAMP. However, in the absence of cAMP only the first message for epimerase continues to be made with initiation being from P2 (Taniguchi et al., 1979).

A second interesting feature of multiple-promoter control is that the activity of one promoter with the same polarity can restrict the activity of the other. In the gal operon not only does cAMP-CRP activate P1 but it also represses initiation at P2 and probably does so by direct competition for binding with RNA polymerase at the P2 binding site (Musso et al., 1977). In a second type of competition the transcript from a distant upstream promoter can inhibit or “occlude” promoter initiation (Adhya & Gottesman, 1982; Hausler & Somerville, 1979).

Multiple promoter interactions may play a role in lac operon expression also. In an in vitro system in the absence of cAMP-CRP, RNA polymerase binds preferentially to a site located 22 bp before the principal (P1) binding site (Malan & McClure, 1984; Malan et al., 1984) and under certain reaction conditions this binding can result in initiation of a transcript from this position in vitro (5’-U-U-C-C-G-; Reznikoff et al., 1982; Malan & McClure, 1984). Addition of cAMP-CRP caused binding to, and transcription from, the primary (+1) promoter and these authors proposed that binding at the upstream site interferes with binding at the usual promoter. cAMP-CRP would repress the upstream binding and position the RNA polymerase at the primary site (McClure et al., 1982).

We found no sign of transcripts starting with U-U-C-C in vivo (cAMP was present). Since almost all starts found so far are with A, or less often with G (Siebenlist et al., 1980), such an unusual end should have been apparent here if it were derived from only 1 or 2% of the molecules, unless of course, it had a very short half-life in the cell.

The significance of a transcription initiation from the -52 site for lac operon expression is not known. In analogy with the preceding examples, the RNA polymerase binding site for the -52 transcript could compete directly for cAMP-CRP binding at the -50 to -70 binding site for cAMP-CRP (Dickson et al., 1975), while the transcript itself could interfere with initiation downstream.

[Figure 2: sequence diagrams (a) to (d) showing the 5’ ends of lac mRNA with putative secondary structures, the ribosome-protected region and the start of the β-galactosidase coding sequence (Thr-Met-Ile-Thr-Asp-Ser-Leu-).]

Figure 2. The 5’ ends of mRNA from the lac operon in E. coli cells as inferred from the results in this paper. Putative secondary structures are drawn. The nucleotide sequence of the region was determined in earlier investigations (Maizels, 1973; Gilbert & Maxam, 1973; Dickson et al., 1975). The start of the most abundant end (a) is designated +1. The lacI gene ends at -88. The A-U-G translation start is underlined as well as the proposed ribosome recognition sequence at +28 to +31. The secondary structure at the start of (a) has a calculated free energy of -3.6 kcal mol⁻¹ at 25°C and is the most thermodynamically stable structure found by the computer analysis of Zuker & Stiegler (1981). The absence of the first A gives close to zero free energy for that structure so (b) is shown as a linear molecule. If formed, the most stable structure from computer analysis would still have the A-G-G-A and A-U-G positions in a single-stranded region with a short double strand between 4 to 8 and 22 to 26.

It is not known if the molecule in (c) terminates before synthesis of the translation start at +39 to +41. Computer analysis for the most thermodynamically stable structure gives a similar stem-loop region but with the A-G-G-A ribosome recognition sequence even more inaccessible due to other possible pairings; in particular the G,G at 29, 30 are paired to the C,C at -42, -43 with smaller stem-loops in between them. The molecule(s) shown in (d) were not identified conclusively, but there is a small family of molecules whose ends are A-G, C-A-G, A-C-A-G and possibly A-C-A-C-A-G. They are depicted here with reference to a possible significance discussed in the text. The approximate relative numbers of each class are: (a) about 70%; (b) about 20%; (c) about 8%; (d) ≤10%. Sequence hyphens and base-pair center dots have been omitted for clarity.


Conversely, instead of interfering with expression of other promoters, this upstream promoter could serve to provide some basal level of expression in the uninduced and/or low cAMP state. It was shown some years ago that some of the transcribing polymerases from an upstream fusion do read through the repressor-bound lac DNA (Silverstone et al., 1969). Such a transcript would include the operator RNA and thus generate a very stable stem-loop structure (about -16 kcal mol⁻¹; 1 cal = 4.184 J) whose 3’ end would include the A-G-G-A sequence for ribosome initiation (Shine & Dalgarno, 1974; Steitz & Jakes, 1975) (Fig. 2). It might be translated poorly, as in the case of trp-lac fusion strains that would generate such a structure (Reznikoff et al., 1974; Ming et al., 1984). As opposed to the fusion case, the transcription, which is not accompanied by any upstream translation from trp mRNA, might terminate prematurely or pause (Maizels, 1973). It is of interest that a weak translation start (A-U-G at -24 to -26 with A-G-G for ribosome recognition at -38 to -40) would generate a 13 amino acid peptide terminated by U-A-A at +14 to +16 and disrupt the structure to expose the normal A-G-G-A translation initiation site for βGalase. cAMP was present in our inductions, and it will be interesting to observe the effects of different physiological conditions on the synthesis of the A-U-U-A-G transcript.

(d) A possible end from degradation

What may be a family of molecules that end with any one of the members in an A-C-A-C-A-G sequence could originate from any one of a few positions on the DNA or the members might be from more than one position. However, one possibility can be mentioned that is worth further testing. These molecules could originate from the region ending with the first two nucleotides of the Shine-Dalgarno A-G-G-A sequence. This region defines the upstream limit of the sequence protected by an initiating ribosome in vitro (Maizels, 1974). The net wave of degradation in the 5’ to 3’ direction (Kennell & Riezman, 1977) could be transiently impeded when it reaches the part of the mRNA associated with ribosomes with variable penetration reflected by the imprecise definition of the end nucleotide.

(e) Implications of a negative result

The original purpose of this study was to identify specific 5’ ends that could define processing or inactivation sites for the βGalase mRNA. It is possible that the A-U-U-G end results from processing but, alternatively, it could be another initiation start resulting from slippage of the polymerase from the major A-A-U-U-G start. The only other possible end that could reflect the degradation process is a ragged set of ends that might originate from the most upstream ribosome-protected mRNA. The sum of these latter ends is still < 10% of the major one. Although significantly above the amount predicted for random ends from the entire 479 nucleotides of the major lacZ mRNA, this level, distributed into different end nucleotides, is still near the limits of detection by the present methods. The absence of any obvious end resulting from cleavage was disappointing, but this negative result may be the signpost directing us to the correct mechanism of mRNA degradation. With many mRNAs now sequenced, it can be shown fairly rigorously that there is no obvious consensus sequence or structure at the start of all messages that could be the target for inactivation by a common messenger RNase (Kennell, 1985). It is definite from the present results that the majority of molecules still have the original transcription end. Various observations will be presented in more detail (Kennell, 1985) to argue that loss of the original 5’ end is the primary and limiting event in the entire decay process and that there probably is no specific enzyme activity for this loss.

This work was supported by research grants from the NIH (AI-13492) and NSF (PCM-8214024).

References

Achord, D. & Kennell, D. (1974). J. Mol. Biol. 90, 581-599.

Adhya, S. & Gottesman, M. (1982). Cell, 29, 939-944.

Aiba, H., Adhya, S. & de Crombrugghe, B. (1981). J. Biol. Chem. 256, 11905-11910.

Bagnara, A. S. & Finch, L. R. (1974). Eur. J. Biochem. 41, 421-430.

Barkay, T. & Goldfarb, A. (1982). J. Mol. Biol. 162, 299-315.

Berk, A. J. & Sharp, P. A. (1977). Cell, 12, 721-732.

Berkner, K. L. & Folk, W. R. (1980). In Methods in Enzymology (Grossman, L. & Moldave, K., eds), vol. 65, part I, pp. 28-36, Academic Press, New York.

Blundell, M. & Kennell, D. (1974). J. Mol. Biol. 83, 143-161.

Brownlee, G. G. (1972). Determination of Sequences in RNA (Work, T. S. & Work, E., eds), North-Holland, Amsterdam and Elsevier, New York.

Cannistraro, V. J. & Kennell, D. (1979). Nature (London), 277, 407-409.

Cannistraro, V. J., Strominger, M. B., Wice, B. M. & Kennell, D. E. (1985a). J. Biochem. Biophys. Methods, in the press.

Cannistraro, V. J., Wice, B. M. & Kennell, D. E. (1985b). J. Biochem. Biophys. Methods, in the press.

Carpousis, A. J., Stefano, J. E. & Gralla, J. D. (1982). J. Mol. Biol. 157, 619-633.

Cone, K. C., Sellitti, M. A. & Steege, D. A. (1983). J. Biol. Chem. 258, 11296-11304.

Dickson, R. C., Abelson, J. N., Barnes, W. M. & Reznikoff, W. S. (1975). Science, 187, 27-35.

Dunn, J. J. & Studier, F. W. (1973). Proc. Nat. Acad. Sci., U.S.A. 70, 3296-3300.

Friesen, J. D., Fiel, N. P. & von Meyenberg, K. (1975). J. Biol. Chem. 250, 304-309.

Gilbert, W. & Maxam, A. (1973). Proc. Nat. Acad. Sci., U.S.A. 70, 3581-3585.

Gilbert, W. & Müller-Hill, B. (1970). In The Lactose Operon (Beckwith, J. R. & Zipser, D., eds), pp. 93-109, Cold Spring Harbor Laboratory Press, New York.

Hausler, B. & Somerville, R. (1979). J. Mol. Biol. 127, 353-356.

Hawley, D. K. & McClure, W. R. (1983). Nucl. Acids Res. 11, 2237-2255.

Horowitz, H. & Platt, T. (1982). J. Biol. Chem. 257, 11740-11746.

Kennell, D. & Riezman, H. (1977). J. Mol. Biol. 114, 1-21.

Kennell, D. E. (1985). In From Gene to Protein: Steps Dictating the Maximal Level of Gene Expression (Reznikoff, W. S. & Gold, L., eds), Benjamin/Cummings, Menlo Park, California, in the press.

Lim, L. W. & Kennell, D. (1979). J. Mol. Biol. 135, 369-390.

Lim, L. W. & Kennell, D. (1980). J. Mol. Biol. 141, 227-233.

Lozeron, H. A., Anevski, P. J. & Apirion, D. (1977). J. Mol. Biol. 109, 359-365.

Maizels, N. (1973). Proc. Nat. Acad. Sci., U.S.A. 70, 3585-3589.

Maizels, N. (1974). Nature (London), 249, 647-649.

Majors, J. (1975). Proc. Nat. Acad. Sci., U.S.A. 72, 4394-4398.

Malan, T. P. & McClure, W. R. (1984). Cell, 39, 173-180.

Malan, T. P., Kolb, A., Buc, H. & McClure, W. R. (1984). J. Mol. Biol. 180, 881-910.

McClure, W. R., Hawley, D. K. & Malan, T. P. (1982). In Promoters: Structure and Function (Rodriguez, R. L. & Chamberlin, M. J., eds), pp. 111-120, Praeger, New York.

Ming, Y. X., Munson, L. M. & Reznikoff, W. S. (1984). J. Mol. Biol. 172, 355-362.

Minkley, E. & Pribnow, D. (1973). J. Mol. Biol. 77, 255-277.

Musso, R. E., DiLauro, R., Adhya, S. & de Crombrugghe, B. (1977). Cell, 12, 847-854.

Reznikoff, W. S., Michels, C. A., Cooper, T. G., Silverstone, A. E. & Magasanik, B. (1974). J. Bacteriol. 117, 1231-1239.

Reznikoff, W. S., Maquet, L. E., Munson, L. M., Johnson, R. C. & Mandecki, W. (1982). In Promoters: Structure and Function (Rodriguez, R. L. & Chamberlin, M. J., eds), pp. 89-95, Praeger, New York.

Rosenberg, M. & Court, D. (1979). Annu. Rev. Genet. 13, 319-353.

Sadler, J. R. & Smith, T. F. (1971). J. Mol. Biol. 62, 139-169.

Shine, J. & Dalgarno, L. (1974). Proc. Nat. Acad. Sci., U.S.A. 71, 1342-1346.

Siebenlist, U., Simpson, R. B. & Gilbert, W. (1980). Cell, 20, 269-281.

Silverstone, A. E., Magasanik, B., Reznikoff, W. S., Miller, J. H. & Beckwith, J. R. (1969). Nature (London), 221, 1012-1014.

Smith, T. F. & Sadler, J. R. (1971). J. Mol. Biol. 104, 285-298.

Squires, C., Lee, F., Bertrand, K., Squires, C. L., Bronson, M. J. & Yanofsky, C. (1976). J. Mol. Biol. 103, 351-389.

Steitz, J. A. & Jakes, K. (1975). Proc. Nat. Acad. Sci., U.S.A. 72, 4734-4738.

Talkad, V., Achord, D. & Kennell, D. (1978). J. Bacteriol. 135, 528-541.

Taniguchi, T., O’Neill, M. & de Crombrugghe, B. (1979). Proc. Nat. Acad. Sci., U.S.A. 76, 5090-5094.

Weaver, R. F. & Weissman, C. (1979). Nucl. Acids Res. 7, 1175-1193.

Zuker, M. & Stiegler, P. (1981). Nucl. Acids Res. 9, 133-148.

Edited by M. Gottesman