The complete nucleotide sequence of the ilvGMEDA cluster of Escherichia coli K-12


Gene, 56 (1987) 185-198

Elsevier


GEN 02069

The complete nucleotide sequence of the ilvGMEDA cluster of Escherichia coli K-12

(Recombinant DNA; codon usage; threonine deaminase; yeast mitochondria)

James L. Cox, Betty J. Cox, Vincenzo Fidanza and David H. Calhoun

Department of Microbiology, Mount Sinai School of Medicine, New York, NY 10029 (U.S.A.)

Received 5 February 1987

Revised 4 May 1987

Accepted 4 May 1987

SUMMARY

The ilvGMEDA gene cluster of Escherichia coli K-12 has been the focus of intensive genetic and biochemical analysis for the past 30 years. Genetic regulation of the ilvGMEDA cluster involves attenuation, internal promoters, internal Rho-dependent termination sites, a site of polarity in the ilvG pseudogene of the wild-type organism, and autoregulation by the ilvA gene product, the biosynthetic L-threonine deaminase. We have now completed the nucleotide sequence of the 6600-bp cluster and have analyzed it, along with the ilvYC, ilvBN, and ilvIH genes, for codon frequencies and possible evolutionary relationships. The isoleucine content of each of the gene products of the ilvGMEDA cluster is quite similar (less than a two-fold variation), thus excluding one possible interpretation of the isoleucine-specific downstream amplification phenomenon. There is no evidence for retrograde evolution in the cluster since no significant homologies are detectable among genes that catalyze sequential reactions of the pathway. A highly significant homology does exist, however, between the threonine deaminases of yeast mitochondria and E. coli. The sequence at the boundary of the ilvA and ilvD genes is TAATAATG, so that the second TAA stop codon of ilvD overlaps the ATG initiation codon of ilvA.

INTRODUCTION

Correspondence to: Dr. David H. Calhoun, Division of Biochemistry, Department of Chemistry, The City College of New York, Convent Avenue and 138th Street, New York, NY 10031 (U.S.A.), Tel. (212)690-8306.

Abbreviations: aa, amino acid(s); bp, base pair(s); kb, kilobase or 1000 bp; nt, nucleotide(s); ORF, open reading frame; P2 index, frequency of utilization of 16 specific codons (see Table II, footnote g); PolIk, Klenow (large) fragment of E. coli DNA polymerase I; Q/D ratio, the quartet to duet ratio for sextet codons (see Table II, footnote c); RBS, ribosome-binding site; Rho, termination factor for E. coli; SDS, sodium dodecyl sulfate.

0378-1119/87/$03.50 © 1987 Elsevier Science Publishers B.V. (Biomedical Division)

The biosynthesis of isoleucine, valine, and leucine in E. coli K-12 involves the participation of parallel pathways, genes located in several chromosomal sites, and elaborate, but well-integrated, sets of controls at the level of enzyme activity and synthesis (Umbarger, 1983). The largest cluster of genes, ilvGMEDA, is located at 85 min and has several interesting genetic regulatory features which are presently the subject of active investigation. Approximately half (3819 bp) of the cluster has previously been sequenced, including 278 bp 5' to the mRNA start point, ilvG (1646 bp) and ilvM (260 bp) (Lawther et al., 1981), ilvE (929 bp) (Kuramitsu et al., 1985) and 241 bp of the 3'-end of ilvA (Wek and Hatfield, 1986). We report here the sequence of ilvD (1853 bp) and ilvA (1544 bp), thus completing the sequence of the 6600-bp cluster, and present an overall analysis from several perspectives of these five ilv genes along with the six other sequenced ilv genes.
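
The gene lengths quoted above can be cross-checked against the boundary coordinates printed on the map in Fig. 1. The following Python sketch does that arithmetic. The dictionary name, the function name, and the convention that a length is simply the end coordinate minus the start coordinate are assumptions made for illustration; they are not taken from the paper.

```python
# Boundary coordinates (nt) as printed on the Fig. 1 map of the ilvGMEDA cluster.
# Assumption: length = end - start, which is the convention that reproduces the
# lengths quoted in the text.
GENE_BOUNDARIES = {
    "ilvG": (271, 1917),
    "ilvM": (1917, 2177),
    "ilvE": (2197, 3126),
    "ilvD": (3191, 5044),
    "ilvA": (5044, 6588),
}


def gene_lengths(boundaries):
    """Return {gene: length in bp} from (start, end) coordinate pairs."""
    return {gene: end - start for gene, (start, end) in boundaries.items()}


for gene, length in gene_lengths(GENE_BOUNDARIES).items():
    print(f"{gene}: {length} bp")
```

Under this convention the computed lengths agree with the values given in the text for all five genes.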

MATERIALS AND METHODS

(a) Vectors and methods

The 2262-bp HindIII-EcoRI fragment containing ilvD and ilvA coding sequences (see Fig. 1) was cloned into M13mp8 and M13mp11 (Messing, 1983) for nucleotide sequence analysis by the method of Sanger et al. (1977). Deletions were introduced using the method of Dale et al. (1985). A synthetic oligodeoxynucleotide primer that was used to obtain one segment of the sequence (Fig. 1) was prepared on a SAM I synthesizer (Biosearch, San Rafael, CA) using the phosphoramidite method as described by the manufacturer. The 1657-bp HindIII segment containing ilvE and ilvD coding sequences (see Fig. 1) was cloned into M13mp8 in both orientations, and the indicated deletions were constructed as described above. Restriction enzymes, PolIk, and DNA ligase were obtained from New England Biolabs or Boehringer Mannheim. International Biotechnologies Incorporated provided reagents for constructing deletions using the method of Dale et al. (1985).

(b) Sequence analysis

Nucleotide sequences were collected using a model GP7 sonic digitizer (Gaf Bar Instruments) and overlaps were generated using the MERGE routine of MicroGenie version 3.2 (Beckman Instruments). Computerized data banks of amino acid (NBRF) and nucleotide sequences (EMBL and NIH) were searched in June, 1986 using the XFASTN, XFASTP, and IFIND protocols available on Bionet (Intelligenetics, Inc., Mountain View, CA). The MicroGenie program was used to search for inverted repeat sequences, to calculate codon frequencies, amino acid composition, and molecular weights, and for pairwise comparisons of the various ilv genes and peptides as well as for comparisons of the threonine deaminases from E. coli and yeast (see RESULTS, section d).

[Fig. 1 map: linear scale from nt 1 to nt 6588 showing genes G, M, E, D and A, with boundary coordinates at nt 33, 131, 271, 1917, 2177, 2197, 3126, 3191, 5044 and 6588; the frameshift (FS) site at nt 1250; selected KpnI, PvuII, HindIII, SmaI, XhoI, SalI, EcoRI and PvuI sites; and the numbered deletion clones used for sequencing.]

Fig. 1. The ilvGMEDA cluster. The figure depicts the five structural genes and the start point of transcription preceding ilvG (controlled by PG) at nt position 1, the leader peptide (33-131 bp) associated with the attenuator and the attenuated (ATN) transcript (185 bp), the site of the naturally occurring frameshift (FS) at 1250 bp in ilvG, the internal site of transcription initiation (controlled by PE) preceding the ilvE gene, some selected restriction enzyme sites derived directly from the nucleotide sequence, and the sequencing strategy we used to complete the sequence of the cluster. The second TAA stop codon for ilvD overlaps by one base pair the ATG start codon of ilvA at nt position 5044. The deletion clones used to obtain the sequence are indicated (e.g., 41, 42, 15, etc.) and are shown above and below the HindIII-HindIII and HindIII-EcoRI segments. The sequence obtained from the segment designated 214 was obtained using the full-length template with a specifically constructed primer designed to confirm the overlap between clones 22 and 213.

We used three published methods to analyze codon usage. The Q/D ratio (Grantham et al., 1981) analyzes the quartet-to-duet ratios among sextet codons. The Grosjean and Fiers (1982) method analyzes the frequency of eight modulatory codons that are less common in highly expressed coding regions. The P2 index (Gouy and Gautier, 1982) is based on the frequency of utilization of 16 specific codons that correlates with the level of expression in E. coli.
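
As an illustration of the first of these measures, the Python sketch below counts codons in a coding sequence and forms a quartet-to-duet ratio for the three sextet amino acids (Leu, Ser, Arg). The function names are hypothetical, and the plain ratio of summed counts is an assumption: the exact normalization used by Grantham et al. (1981) is not described in this paper.

```python
from collections import Counter

# Sextet amino acids: four codons share a two-base prefix (the quartet) and the
# remaining two codons form the duet.
SEXTETS = {
    "Leu": {"quartet": ["CTT", "CTC", "CTA", "CTG"], "duet": ["TTA", "TTG"]},
    "Ser": {"quartet": ["TCT", "TCC", "TCA", "TCG"], "duet": ["AGT", "AGC"]},
    "Arg": {"quartet": ["CGT", "CGC", "CGA", "CGG"], "duet": ["AGA", "AGG"]},
}


def count_codons(cds):
    """Count in-frame codons of a coding sequence (DNA or RNA alphabet)."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def quartet_duet_ratio(counts, amino_acid):
    """Ratio of quartet to duet codon counts; None if no duet codon is used.

    Assumption: a simple ratio of summed counts, chosen for illustration.
    """
    groups = SEXTETS[amino_acid]
    quartet = sum(counts[codon] for codon in groups["quartet"])
    duet = sum(counts[codon] for codon in groups["duet"])
    return quartet / duet if duet else None


# Leucine codon counts for ilvD as listed in Table I.
ilvd_leu = Counter({"TTA": 1, "TTG": 3, "CTT": 8, "CTC": 10, "CTA": 0, "CTG": 30})
print(quartet_duet_ratio(ilvd_leu, "Leu"))
```

The same `count_codons` output could feed the other two measures once their codon lists are supplied; those lists are not reproduced here because the paper does not enumerate them in this section.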

RESULTS AND DISCUSSION

(a) Nucleotide sequence analysis

The ilvGMEDA cluster and the strategy we used to complete the sequence analysis are indicated in Fig. 1. Sequences of the 5'-end of the cluster (bp 1-3490) and 233 bp at the 3'-end of the cluster (bp 6125-6358) have previously been reported (see INTRODUCTION). To complete the sequence, we subcloned the 1659-bp HindIII segment and the 2262-bp HindIII-EcoRI segment (Fig. 1) into phages M13mp8 and M13mp11 and constructed nested sets of deletion derivatives for sequence analysis using the enzymatic method (Sanger et al., 1981).

Our sequence, which was obtained several times and from both strands, overlaps and confirms that reported by Kuramitsu et al. (1985) at the 5'-end, and it extends to the EcoRI site near the 3'-terminus of ilvA (Wek and Hatfield, 1986). We find only two long ORFs, and these code for peptides of the size expected for the ilvD- and ilvA-coded gene products.
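
A search for long ORFs of the kind described above can be sketched in a few lines of Python. This is an illustrative scan only, not the MicroGenie procedure: the function name, the `min_codons` threshold, and the choice to scan just the three frames of the strand supplied are all assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=100):
    """Return (start, end, frame) for ATG-to-stop ORFs on the given strand.

    start is the 0-based index of the A of the ATG, end is the index just past
    the stop codon, and frame is start % 3. Only ORFs with at least min_codons
    sense codons (the ATG included, the stop excluded) are reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


# Toy example: a 6-codon ORF embedded in flanking sequence.
toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
print(find_orfs(toy, min_codons=3))
```

To examine the complementary strand, the same function could be applied to the reverse complement of the sequence.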

(b) The ilvD gene

An ORF that encodes a 615-aa peptide of 64219 Da extends from nt position 3191 to tandem stop codons at nt position 5039-5044 (Tables I and II; Fig. 2). Although all attempts to purify the ilvD-coded dihydroxyacid dehydrase, which is very labile in cell extracts, have been unsuccessful (D. Patin and D.H.C., unpublished results), its Mr has been estimated using two methods. First, we used (Gray et al., 1981) a set of λdilv phages that contains combinations of the ilv genes (i.e., GEDA, EDA, DA, and A) for infection of UV-irradiated E. coli cells in the presence of [35S]methionine. The presence of the ilvD gene in the phage correlated with the synthesis of a 65-kDa protein, as detected by autoradiography of 10-15% SDS gels (Gray et al., 1981). In addition, we used maxicells (Gray et al., 1981; 1982) to further localize the coding region for this peptide (and for threonine deaminase; see below). Thus, there is good agreement between the predicted and observed Mrs for dihydroxyacid dehydrase, and the location of the coding sequences is as previously predicted (Gray et al., 1981).
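
A predicted molecular weight of this kind can be approximated from an amino acid composition. The Python sketch below applies standard average residue masses to the ilvD column of Table II. The residue-mass table and the function name are not from the paper, and MicroGenie's own mass values may differ slightly, so the result should only be expected to land close to the reported 64219 Da.

```python
# Standard average residue masses in Da (amino acid minus one water).
# These values are an assumption of this sketch, not taken from the paper.
RESIDUE_MASS = {
    "Ala": 71.08, "Arg": 156.19, "Asn": 114.10, "Asp": 115.09, "Cys": 103.14,
    "Gln": 128.13, "Glu": 129.12, "Gly": 57.05, "His": 137.14, "Ile": 113.16,
    "Leu": 113.16, "Lys": 128.17, "Met": 131.19, "Phe": 147.18, "Pro": 97.12,
    "Ser": 87.08, "Thr": 101.10, "Trp": 186.21, "Tyr": 163.18, "Val": 99.13,
}
WATER = 18.02  # one water is added back for the free N- and C-termini

# Amino acid counts for the ilvD product, as listed in Table II (column D).
ILVD_COMPOSITION = {
    "Ala": 56, "Arg": 34, "Asn": 20, "Asp": 40, "Cys": 15, "Gln": 20,
    "Glu": 27, "Gly": 61, "His": 12, "Ile": 32, "Leu": 52, "Lys": 31,
    "Met": 22, "Phe": 15, "Pro": 29, "Ser": 38, "Thr": 32, "Trp": 4,
    "Tyr": 12, "Val": 47,
}


def molecular_weight(composition):
    """Sum residue masses weighted by count and add one water."""
    return sum(RESIDUE_MASS[aa] * n for aa, n in composition.items()) + WATER


print(round(molecular_weight(ILVD_COMPOSITION)))
```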

There is a relatively long segment of 64 bp separating the UAA stop codon of ilvE (nt position 3126) from the first in-frame AUG start codon (nt position 3191) in the ilvD coding region (Fig. 2). Physical studies suggest that an unstructured region of 72-84 nt of single-stranded RNA is required for stimulation of Rho-dependent NTP hydrolysis and transcription termination by Rho factor (reviewed in von Hippel et al., 1984). Thus, this region is a candidate for a Rho-dependent site of polarity, which would be consistent with previous evidence that one or more such sites may exist within this cluster (Smith et al., 1976). However, we do note that an inverted repeat is present in the sequence immediately following the ilvE stop codon (indicated in Fig. 2) that potentially could form a particularly stable stem-and-loop structure (ΔG = -25.0 kcal/mol), so that only 36 nt would remain distal to this stem to serve as a Rho-recognition structure. A consensus RBS is appropriately situated upstream from the AUG codon at nt position 3177-3181.
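
The two calculations behind this paragraph, the length of the intergenic spacer and a search for an inverted repeat that could form a stem-and-loop, can be sketched as follows. The code is illustrative only: the function names and default parameters are hypothetical, the search finds perfect repeats without computing a free energy, and it is assumed that position 3126 is the last nt of the ilvE stop codon and 3191 the first nt of the ilvD AUG.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def spacer_length(upstream_end, downstream_start):
    """Number of nt strictly between two 1-based positions."""
    return downstream_start - upstream_end - 1


def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, stem=6, min_loop=3, max_loop=8):
    """Return (left_start, right_start, left_arm) for perfect inverted repeats.

    Indices are 0-based. A hit means the right arm equals the reverse
    complement of the left arm, separated by a loop of min_loop..max_loop nt.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        left = seq[i:i + stem]
        target = reverse_complement(left)
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            if seq[j:j + stem] == target:
                hits.append((i, j, left))
    return hits


# Spacer between the ilvE stop codon and the ilvD start codon (Fig. 1 numbering).
print(spacer_length(3126, 3191))

# Toy sequence with one 6-nt stem and a 4-nt loop.
toy = "TT" + "GACCTG" + "AAAA" + "CAGGTC" + "TT"
print(find_inverted_repeats(toy))
```

A fuller treatment would allow mismatches and G-U pairs and would score candidate stems by free energy, as the ΔG values in Fig. 2 imply.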

The sequence TAATAATG occurs at the boundary of ilvD and ilvA, so that the ilvD gene ends with tandem TAA stop codons, and the ATG codon for ilvA overlaps the second stop codon by 1 nt. A

TABLE I
Codon usage for ilv-coded peptides a

Codon Amino ilv loci                                                                                                      E. coli
      acid  G         M         E         D         A         Y         C         I         H         B         N         (199 genes) b
TTT   Phe   8 (1.5)   1 (1.1)   3 (1.0)   5 (0.8)   8 (1.6)   6 (2.0)   4 (0.8)   6 (1.1)   2 (1.2)   10 (1.8)  5 (5.2)   (1.6)
TTC   Phe   9 (1.7)   2 (2.3)   10 (3.2)  10 (1.7)  17 (3.3)  5 (1.7)   12 (2.4)  7 (1.2)   1 (0.6)   7 (1.2)   0 (0.0)   (2.0)
TTA   Leu   16 (3.1)  4 (4.5)   1 (0.3)   1 (0.2)   4 (0.8)   3 (1.0)   0 (0.0)   7 (1.2)   7 (4.3)   3 (0.5)   0 (0.0)   (0.8)
TTG   Leu   9 (1.7)   1 (1.1)   0 (0.0)   3 (0.5)   6 (1.2)   6 (2.0)   2 (0.4)   13 (2.3)  2 (1.2)   12 (2.1)  0 (0.0)   (1.0)
CTT   Leu   4 (0.8)   0 (0.0)   0 (0.0)   8 (1.3)   4 (0.8)   1 (0.3)   2 (0.4)   7 (1.2)   5 (3.1)   1 (0.2)   2 (2.1)   (0.8)
CTC   Leu   4 (0.8)   0 (0.0)   3 (1.0)   10 (1.7)  10 (1.9)  4 (1.3)   2 (0.4)   4 (0.7)   1 (0.6)   4 (0.7)   1 (1.0)   (0.9)
CTA   Leu   1 (0.2)   0 (0.0)   0 (0.0)   0 (0.0)   3 (0.6)   2 (0.7)   0 (0.0)   1 (0.2)   0 (0.0)   0 (0.0)   1 (1.0)   (0.2)
CTG   Leu   25 (4.8)  2 (2.3)   17 (5.5)  30 (5.0)  32 (6.2)  18 (6.0)  39 (7.9)  17 (3.0)  3 (1.9)   27 (4.8)  5 (5.2)   (5.9)
ATT   Ile   0 (0.0)   0 (0.0)   7 (2.3)   12 (2.0)  7 (1.4)   11 (3.7)  12 (2.4)  16 (2.8)  6 (3.7)   21 (3.7)  3 (3.1)   (2.4)
ATC   Ile   19 (3.6)  3 (3.4)   13 (4.2)  19 (3.2)  13 (2.5)  3 (1.0)   16 (3.3)  12 (2.1)  4 (2.5)   24 (4.3)  3 (3.1)   (3.2)
ATA   Ile   1 (0.2)   1 (1.1)   0 (0.0)   1 (0.2)   0 (0.0)   0 (0.0)   0 (0.0)   2 (0.4)   2 (1.2)   2 (0.4)   1 (1.0)   (0.2)
ATG   Met   19 (3.6)  4 (4.5)   8 (2.6)   22 (3.7)  12 (2.3)  8 (2.7)   21 (4.3)  23 (4.1)  3 (1.9)   21 (3.7)  4 (4.1)   (2.7)
GTT   Val   14 (2.7)  3 (3.4)   9 (2.9)   14 (2.3)  5 (1.0)   3 (1.0)   6 (1.2)   11 (1.9)  4 (2.5)   9 (1.6)   3 (3.1)   (2.4)
GTC   Val   8 (1.5)   4 (4.5)   1 (0.3)   10 (1.7)  9 (1.7)   1 (0.3)   4 (0.8)   17 (3.0)  5 (3.1)   6 (1.1)   2 (2.1)   (1.3)
GTA   Val   3 (0.6)   1 (1.1)   4 (1.3)   9 (1.5)   6 (1.2)   4 (1.3)   8 (1.6)   11 (1.9)  2 (1.2)   1 (0.2)   2 (2.1)   (1.3)
GTG   Val   14 (2.7)  3 (3.4)   11 (3.5)  14 (2.3)  20 (3.9)  18 (6.0)  13 (2.6)  17 (3.0)  8 (5.0)   18 (3.2)  3 (3.1)   (2.4)
TCT   Ser   2 (0.4)   0 (0.0)   2 (0.6)   6 (1.0)   5 (1.0)   1 (0.3)   10 (2.0)  7 (1.2)   1 (0.6)   2 (0.4)   0 (0.0)   (1.1)
TCC   Ser   5 (1.0)   0 (0.0)   8 (2.6)   10 (1.7)  4 (0.8)   1 (0.3)   6 (1.2)   7 (1.2)   2 (1.2)   3 (0.5)   1 (1.0)   (1.1)
TCA   Ser   4 (0.8)   2 (2.3)   1 (0.3)   5 (0.8)   3 (0.6)   2 (0.7)   1 (0.2)   2 (0.4)   2 (1.2)   2 (0.4)   0 (0.0)   (0.5)
TCG   Ser   3 (0.6)   2 (2.3)   5 (1.6)   6 (1.0)   5 (1.0)   6 (2.0)   2 (0.4)   8 (1.4)   4 (2.5)   3 (0.7)   0 (0.0)   (0.7)
CCT   Pro   5 (1.0)   0 (0.0)   0 (0.0)   4 (0.7)   0 (0.0)   0 (0.0)   0 (0.0)   2 (0.4)   0 (0.0)   5 (0.9)   0 (0.0)   (0.5)
CCC   Pro   2 (0.4)   0 (0.0)   0 (0.0)   3 (0.5)   2 (0.4)   4 (1.3)   1 (0.2)   7 (1.2)   1 (0.6)   5 (0.9)   0 (0.0)   (0.3)
CCA   Pro   5 (1.0)   2 (2.3)   4 (1.3)   5 (0.8)   4 (0.8)   1 (0.3)   4 (0.8)   8 (1.4)   1 (0.6)   5 (0.9)   0 (0.0)   (0.7)
CCG   Pro   18 (3.5)  0 (0.0)   9 (2.9)   17 (2.8)  15 (2.9)  16 (5.4)  12 (2.4)  11 (1.9)  1 (0.6)   22 (3.9)  3 (3.1)   (2.6)
ACT   Thr   5 (1.0)   0 (0.0)   2 (0.6)   2 (0.3)   1 (0.2)   4 (1.3)   1 (0.2)   9 (1.6)   0 (0.0)   4 (0.7)   1 (1.0)   (1.1)
ACC   Thr   20 (3.8)  3 (3.4)   9 (2.9)   20 (3.3)  10 (1.9)  4 (1.3)   17 (3.5)  11 (1.9)  6 (3.7)   17 (3.0)  2 (2.1)   (2.5)
ACA   Thr   2 (0.4)   2 (2.3)   0 (0.0)   4 (0.7)   0 (0.0)   0 (0.0)   2 (0.4)   0 (0.0)   2 (1.2)   4 (0.7)   1 (1.0)   (0.5)
ACG   Thr   2 (0.4)   0 (0.0)   4 (1.3)   6 (1.0)   5 (1.0)   7 (2.3)   2 (0.4)   11 (1.9)  1 (0.6)   4 (0.7)   1 (1.0)   (1.1)
GCT   Ala   9 (1.7)   1 (1.1)   4 (1.3)   6 (1.0)   6 (1.2)   1 (0.3)   9 (1.8)   7 (1.2)   1 (0.6)   6 (1.1)   1 (1.0)   (2.0)
GCC   Ala   14 (2.7)  6 (6.8)   5 (1.6)   17 (2.8)  15 (2.9)  5 (1.7)   8 (1.6)   9 (1.6)   1 (0.6)   24 (4.3)  1 (1.0)   (2.3)
GCA   Ala   19 (3.6)  2 (2.3)   6 (1.9)   8 (1.3)   3 (0.6)   6 (2.0)   8 (1.6)   11 (1.9)  3 (1.9)   9 (1.6)   0 (0.0)   (2.1)
GCG   Ala   17 (3.3)  0 (0.0)   13 (4.2)  25 (4.2)  36 (7.0)  14 (4.7)  25 (5.1)  23 (4.1)  4 (2.5)   32 (5.7)  1 (1.0)   (3.5)
TAT   Tyr   5 (1.0)   0 (0.0)   6 (1.9)   5 (0.8)   5 (1.0)   1 (0.3)   6 (1.2)   11 (1.9)  1 (0.6)   5 (0.9)   0 (0.0)   (1.3)
TAC   Tyr   7 (1.3)   0 (0.0)   5 (1.6)   7 (1.2)   9 (1.7)   2 (0.7)   11 (2.2)  7 (1.2)   2 (1.2)   5 (0.9)   0 (0.0)   (1.5)
TAA   End   0 (0.0)   0 (0.0)   1 (0.3)   1 (0.2)   0 (0.0)   0 (0.0)   1 (0.2)   1 (0.2)   1 (0.6)   1 (0.2)   1 (1.0)   (0.2)
TAG   End   0 (0.0)   0 (0.0)   0 (0.0)   0 (0.0)   1 (0.2)   0 (0.0)   0 (0.0)   0 (0.0)   0 (0.0)   0 (0.0)   0 (0.0)   (0.02)
CAT   His   9 (1.7)   2 (2.3)   4 (1.3)   4 (0.7)   7 (1.4)   6 (2.0)   2 (0.4)   14 (2.5)  1 (0.6)   10 (1.8)  3 (3.1)   (0.9)
CAC   His   8 (1.5)   2 (2.3)   4 (1.3)   8 (1.3)   9 (1.7)   4 (1.3)   5 (1.0)   4 (0.7)   1 (0.6)   4 (0.7)   0 (0.0)   (1.2)
CAA   Gln   13 (2.5)  4 (4.5)   3 (1.0)   7 (1.2)   6 (1.2)   2 (0.7)   3 (0.6)   12 (2.1)  3 (1.9)   9 (1.6)   2 (2.1)   (1.2)
CAG   Gln   19 (3.6)  3 (3.4)   7 (2.3)   13 (2.2)  13 (2.5)  11 (3.7)  18 (3.7)  19 (3.4)  5 (3.1)   25 (4.4)  6 (6.2)   (3.2)
AAT   Asn   5 (1.0)   6 (6.8)   3 (1.0)   5 (0.8)   2 (0.4)   3 (1.0)   3 (0.6)   10 (1.8)  2 (1.2)   9 (1.6)   2 (2.1)   (1.3)
AAC   Asn   10 (1.9)  0 (0.0)   7 (2.3)   15 (2.5)  12 (2.3)  5 (1.7)   13 (2.6)  7 (1.2)   1 (0.6)   10 (1.8)  5 (5.2)   (2.7)
AAA   Lys   14 (2.7)  1 (1.1)   10 (3.2)  21 (3.5)  16 (3.1)  7 (2.3)   28 (5.7)  16 (2.8)  6 (3.7)   14 (2.5)  2 (2.1)   (3.9)
AAG   Lys   3 (0.6)   0 (0.0)   2 (0.6)   10 (1.7)  5 (1.0)   4 (1.3)   6 (1.2)   7 (1.2)   2 (1.2)   4 (0.7)   2 (2.1)   (1.2)
GAT   Asp   18 (3.5)  1 (1.1)   10 (3.2)  24 (4.0)  13 (2.5)  11 (3.7)  17 (3.5)  18 (3.2)  5 (3.1)   17 (3.0)  3 (3.1)   (3.0)
GAC   Asp   15 (2.9)  2 (2.3)   6 (1.9)   16 (2.7)  16 (3.1)  1 (0.3)   9 (1.8)   9 (1.6)   2 (1.2)   12 (2.1)  6 (6.2)   (2.5)
GAA   Glu   17 (3.3)  3 (3.4)   16 (5.2)  22 (3.7)  33 (6.4)  15 (5.0)  28 (5.7)  13 (2.3)  8 (5.0)   21 (3.7)  2 (2.1)   (4.7)
GAG   Glu   8 (1.5)   0 (0.0)   6 (1.9)   5 (0.8)   5 (1.0)   9 (3.0)   13 (2.6)  9 (1.6)   4 (2.5)   9 (1.6)   2 (2.1)   (1.9)
TGT   Cys   5 (1.0)   0 (0.0)   2 (0.6)   5 (0.8)   1 (0.2)   2 (0.7)   3 (0.6)   2 (0.4)   0 (0.0)   7 (1.2)   2 (2.1)   (0.4)
TGC   Cys   4 (0.8)   2 (2.3)   1 (0.3)   10 (1.7)  6 (1.2)   2 (0.7)   3 (0.6)   8 (1.4)   0 (0.0)   2 (0.4)   0 (0.0)   (0.6)
TGA   End   0 (0.0)   1 (1.1)   0 (0.0)   0 (0.0)   0 (0.0)   1 (0.3)   0 (0.0)   0 (0.0)   0 (0.0)   0 (0.0)   0 (0.0)   (0.06)
TGG   Trp   6 (1.2)   0 (0.0)   6 (1.9)   4 (0.7)   2 (0.4)   3 (1.0)   5 (1.0)   12 (2.1)  0 (0.0)   4 (0.7)   1 (1.0)   (1.1)
CGT   Arg   8 (1.5)   3 (3.4)   8 (2.6)   9 (1.5)   6 (1.2)   2 (0.7)   14 (2.8)  8 (1.4)   5 (3.1)   8 (1.4)   2 (2.1)   (2.9)
CGC   Arg   4 (0.8)   3 (3.4)   10 (3.2)  15 (2.5)  19 (3.7)  16 (5.4)  7 (1.4)   9 (1.6)   6 (3.7)   12 (2.1)  3 (3.1)   (2.2)
CGA   Arg   4 (0.8)   0 (0.0)   0 (0.0)   6 (1.0)   0 (0.0)   1 (0.3)   0 (0.0)   5 (0.9)   0 (0.0)   0 (0.0)   0 (0.0)   (0.2)
CGG   Arg   3 (0.6)   1 (1.1)   0 (0.0)   2 (0.3)   4 (0.8)   3 (1.0)   0 (0.0)   1 (0.2)   2 (1.2)   3 (0.5)   0 (0.0)   (0.3)
AGT   Ser   3 (0.6)   1 (1.1)   0 (0.0)   3 (0.5)   0 (0.0)   1 (0.3)   0 (0.0)   4 (0.7)   2 (1.2)   3 (0.5)   0 (0.0)   (0.5)
AGC   Ser   5 (1.0)   3 (3.4)   4 (1.3)   8 (1.3)   7 (1.4)   6 (2.0)   2 (0.4)   4 (0.7)   3 (1.9)   9 (1.6)   3 (3.1)   (1.5)
AGA   Arg   0 (0.0)   0 (0.0)   0 (0.0)   0 (0.0)   1 (0.2)   0 (0.0)   0 (0.0)   1 (0.2)   0 (0.0)   1 (0.2)   0 (0.0)   (0.1)
AGG   Arg   0 (0.0)   0 (0.0)   0 (0.0)   2 (0.3)   1 (0.2)   0 (0.0)   0 (0.0)   2 (0.4)   0 (0.0)   0 (0.0)   0 (0.0)   (0.06)
GGT   Gly   16 (3.1)  1 (1.1)   11 (3.5)  20 (3.3)  10 (1.9)  4 (1.3)   17 (3.5)  13 (2.3)  3 (1.9)   12 (2.1)  0 (0.0)   (3.3)
GGC   Gly   19 (3.6)  0 (0.0)   12 (3.9)  28 (4.7)  29 (5.6)  8 (2.7)   27 (5.5)  15 (2.6)  7 (4.3)   26 (4.6)  3 (3.1)   (3.1)
GGA   Gly   1 (0.2)   0 (0.0)   5 (1.6)   4 (0.7)   2 (0.4)   0 (0.0)   0 (0.0)   7 (1.2)   1 (0.6)   3 (0.5)   0 (0.0)   (0.5)
GGG   Gly   6 (1.2)   0 (0.0)   2 (0.6)   9 (1.5)   5 (1.0)   3 (1.0)   2 (0.4)   15 (2.6)  3 (1.9)   4 (0.7)   0 (0.0)   (0.8)

a The codons are arranged (top to bottom) in the order T, C, A, G for the first, second, and third positions, respectively. The number of amino acids for each codon is indicated and the percentage of the total codon usage for the protein coded by the gene in each column is given in parentheses. The ilv loci (G, M, E, etc.) are indicated for each column.

b This compilation of 199 E. coli genes was prepared from the data presented by Maruyama et al. (1986), and is based on release No. 38 (November, 1985) of the GenBank database.

c Modulatory codons (Grosjean and Fiers, 1982).

3091

,

3150

GATAAATGGGGCTGGTTAGATCAAGTTAATCAATARATACGGGACGGCACGCA

DKWGWLDQVNQ'

>)>>>>>>>)> 3210

CCGTCCCAT;'TACGAGACAGACACTG~~T~T~G~ATGCCT~GTACCGTTCCGC

c<<<<<<<<<<

MPKYRSA

3270

CACCACCACTCATGGTCGTATATGGCGGGTGCTCGTGCGT

TTTHGRNMAGARALWRATGM

3330

GACCGACGCCGATTTCGGTAGCCGATTATCGCGGTTGT;TTTGT

TDADFGKPIIAVVNSFTQFV

3390

ACCGGGTCACGTCCATCTGGCGATCTCG;;TAARCTGGT~GCCG~CAAATTG~GCGGC

PGHVHLRDLGXLVAEQIEAA

3450

TGGCGGCGTTGCCAAAGAGTTCAACACCATTGCGGTGGATGATGGGATTGCCATGGGCCA

GGVAKEFNTIAVDDGIAMGH

3510

CGGGGGGATGCTTTATTCACTGCCATCTCGCGRACTGAT;GCTGATTCCGTTGAGTATAT

G

GM

L YSLPSRELIADSVEYM

3570

GGTCAACGCCCACTGCGCCGACGCCATGGTCTGCATCTCTATCACCCC

VNAHCADAMVCISNCDKITP

3630

GGGGATGCTGATGGCTTCCSTGCGCCTGAATRTTCCGGTGCCGGCGGCCC

GMLMASLRLNIPVIFVSGGP

3690

GATGGAGGCCGGGAARACCACTTTCCGATCAGATCATCGC

MEAGKTKLSDQI

IKLDLVDA

3750

GATGATCCAGGGCGCAGACCCGAAAGTATCTGACTCCCAGCGTTC

MIQGADPKVSDSQSDQVERS

3810

CGCGTGTCCGACCTGCGGT;CCTGCTCCGGGATGTTTAC~GCT~CTC~TG~CTGCCT

ACPTCGSCSGMFTANSMNCL

3870

GACCGPAGCGCTGGGCCTGTCGCAGCCGGGCAACGGCTCGGC~CGCTGCTGGC~CCCACGCCGA

TEALGLSQPGNGSLLATHAD

3930

CCGTAAGCAGCTGTTCCTTATGCTGGTAAACGCATTGTTCGTTATTA

RXQLFLNAGKRIVELTKRYY

3990

CGAGC~~GACGAAAGT;;CACTGCCGCGTAATATCGCCAGT~GGCGGCGTTTG~

EQNDESALPRNIASKAAFEN

4050

CGCCATGACGCTGGATATCECGATGGGTGdATCGACTARCCTGCTGGC

AMTLDIAMGGSTNTVLHLLA

CG

AC

CA GC

GC

CG

AT

GC

CC

GC

TA

AT

AT

AAAATACG

‘$G = -25.0

4110

GGCGGCGCAGGAAGCGGAAATCGACTTCACCATGAGTGATATCGAT~GCTTTCCCGC~

AAQEAEIDFTMSDI

DKLSRK

4170

GGTTCCACAGCTGTGTRRRGTTGCGCCGAGCACCCAG~TACCATATGG~GATGTTCA

VPQLCKVAPSTQKYHMEDVH

4230

CCGTGCTGGTGGTGTTATCGGTATTCTCGGCGAACTGGATCG

RAGGVIGILGELDRAGLLNR

4290

TGATGTG~ACGTACTTGGCCTGACGTTGCCGCTG

DVXNVLGLTLPQTLEQYDVH

GCTGACCCAGGATGACGCGGT AAAAAA'

4350

TATGTTCCGCGCiGGTCCTGCAGGCATTCGTAC

LTQDDAVKNMFRAGPAGIRT

4410

CACACAGGCATTCTCGCAAGATTGCCGTTGGGATACGCTGGACGACGATCGCGCC~TGG

TQAFSQDCRWDTLDDDRANG

.

4470

CTGTATCCGCTCGCTGGARCACGCCTACAGC~GACGGCGGCCTGGCGGTGCTCTACGG

CIRSLEHAYSKDGGLAVLYG

4530

TAACTTTGCGGAAAACGGCTGCATCGTGAARACGGCAGGCGTCGATGACAGCATCCTCkA

NFAENGCIVKTAGVDDS

I L

K

4590

ATTCACCGGCCCGGCGRRAGTGTACGRRRGCCAGGACGAGCGGTAG~GCGATTCTCGG

FTGPAKVYESQDDAVEAILG

4650

CGGTAAAGTTGTCGCCGGAGATGTGCTAGTAATTCGCTATGGCGGTCC

GKVVAGDVVVIRYEGPKGGP

4710

GGGGATGCA;;GAAATGCTCTACCCAACCA;;CTTCCTGAC

GMQEMLYPTSFLKSMGLGKA

4770

CTGTGCGCTGATCACCGACGGTCGTTTCTCTGGTGGCACCTCTGG~CTTTCCATCGGCCA

CALITDGRFSGGTSGLSIGH

4830

CGTCTCACCEGAAGCGGCAAGCGGCGGCAGCAGCATTGGCCTGATTG~GATGGTGACCTGAT

v

s

P

EAASGGS

I G L

I

EDGDLI

4890

CGCTATCGACATCCCGRACCGTGGCATTCAGTTACAGGTlGCGGC

AIDIPNRGIQLQVSDAELAA

49so

GCGTCGTGAAGCGCAGGACGCTCGAGGTGACAAAGCCTGGACGCCGAAAAATCGTGRACG

R

REAQDARGDKAWTPKNRER

5010

TCAGGTCTCCTTTGCCCTGCGTGCTTATGCCAGCRGGCGC

QVSFALRAYASLATSADKGA

--------

.

5070

GGTGCGCGATAAATCGAAACTGGGGGTTAATARTGGCTGGCTGAC~CGC~CCCCTGTCCGGT

VRDKSKLGG**M

ADSQPLSG

.

>>>>>>>>>>>>

.<i<<<<<<<<

GCTCCGGAAGGTGCCGAATATTTARGRGCAGCAGTGCTGCGCGCGCCGGTTTACGAGGCGGCG

APEGAEYLRAVLRAPVYEAA

T T

T

GA

GC

CG

CA

G

G

CG

GC

CG

C

GC

CG

GC TA

GT

GC

GG

TT

AG

= -1

9.0

<<

<

. CAGGTTACGCCGCTACAAAiAA

5190

TGGAAAAACTGTCGTCGCGTCTTGATAACGTCATTCTG

QVTPLQKMEKLSSRLDNVIL

5250

GTGAAGCGCGAAGATCGCCAGCCAGTGCACAGCTTTAAGCTGCGCGGCG~ATACGCCATG

VKREDRQPVHSFKLRGAYAM

5310

ATGGCGGGCCTGACGGAAGAAG~GCGCACGGCGTGATCACTGCTTCTGCGGGTAAC

MAGLTEEQKAHGVITASAGN

5370

CACGCGCAGGGCGTCGCGTTTTCTTCTGCGCGGTTAGGCGG

HAQGVAFSSARLGVK

AL

I

VM 5430

CCAACCGCCACCGCCGACATCARAGTCGACGCGGTGCGCGGCTTCGGCGGCGRAGTGCTG

PTATADI

KVDAVRGFGGEVL

5490

CTCCACGGCGCGAACTTTGATGAAGCGAIL;GCCRAAGCG~TCG~CTGT~ACAGCAGCAG

LHGANFDEAKAKAIELSQQQ

5590

GGGTTCACCTGGGTGCCGCCGTTCGACCATCCGACCATCCGATGGTGATTGCCGGGC~GGCACGCTG

GFTWVPPFDHPMVIAGQGTL

5610

GCGCTGGAACTGCTCCAGCAGGACGCCCATCTCGACCGCGTATTTGTGC~AGTCGGCGGC

AL

EL

L

QQDAHLDRVFVPVGG

5670

GGCGGTCTGGCTGCTGGCGTGGCGGTGCTdATCAAACAAGTG

GGLAAGVAVLIKQLMPQIKV

.

5730

ATCGCCGTAGkRGCGGRTCCGCCTGCCTGRARGCAGCGCTGGATGCGGGTCATCCG

IAVEAEDSACLKAALDAGHP

5790

GTTGATCTGCCGCGCGTAGGGCTATTTGCTG~GGCGTAGCGGT~CGCATCGGTGAC

VDLPRVGLFAEGVAVKR

I

G

D

5890

GAAACCTTCCGTTTATGCCAGGAGTATCTCGACGACATCATCACCGTCGATAGCGATGCG

ETFRLCQEYLDDI

ITVDSDA

. 5910

ATCTGTGCGGCGATGAAGGATTTATTCGGATGTGCGCGCGGTGGCGG~CCCTCTGGC

ICAAMKDLFEDVRAVAEPSG

5970

GCGCTGGCGCTGGCGGGAATG ' AAAAAATA'iATCGCCCTGtACAACATTC&GGCGAACGG

ALALAGMKXY

IALHNXRGER

6030

CTGGCGCATATTCTTTCCGGTGCCAACGTGAACTTCCACG

LAHXLSGANVNFHGLRYVSE

6090

CGCTGCGAACTGGGCGRACAGCGTGMGCGG

R

c

ELGEQREALLAVTIPEEK

, 6150

GGCAGCTTCCTCARATTCTGCCAACTGCTGCGGGCGTT

GSFLKFCQLLGGRSVTEFNY

6210

CGTTTTGCCGATGCC~CGCCTGCATCTTTGTCGGTGTGCGCCTGAGCCGCGGCCTC

RFADAKNACIFVGVRLSRGL

6270

GAAGAGCGCAARG~TTTTGCAGATGCTCAACGACGGCGGCTACAGCGTGGTTGATCTC

EERKEILQMLNDGGYSVVDL .

. 6330

TCCGACGACGAAATGGCGRAGCTACACGTGCGCTATATGGTCGGCGGACGTCCATCGCAT

SDDEMAKLHVRYMVGGRPSH

6390

CCGTTGCAGGAACGCCTCTACAGCTTCGA;LTTCCCGGAATCACCGGGCGCGCTGCTGCGC

PLQERLYSFEFPESPGALLR

6450

TTCCTCAACACGCTGGGTACGTACTGGAACATTTCTTTGTTCCACTATCGCAGCCATGGC

FLNTLGTYWNISLFHYRSHG

6510

ACCGACTACGGGCGCGTACTGGCGGCGTTCGRRCTTGGCGACCATG~CCGGATTTCG~

TDYGRVLAAFELGDHEPDFE

6570

ACCCGGCTGAATGAGCTGGGCTACGATTGCCACGACGAAACCCGGCGTTCAGG

TRLNELGYDCHDETNNPAFR

6630

TTCTTTTTGGCGGGTTAGGGAAAAATGCCTGATAGCGCTTCGCTTATCAGGCCTACCCGC

FFLAG'

>>>>>>>>>>>

<<<<<<<<<<<<

6690

GCGACAACGTCATTTGTGGTTCGGCAAAATCTTCTTCCAG~TGCCTC~TTAGCGGCTCATG

l

>>>>>>>

6710

TAGCCGCTTTTTCTGCGCAC

<cc<<<<

T

C

T

:: GC

AT T

TA

AT

GC

c":

CG

AATGCCTA

AG = -19.6

(il.u% stop)

T

A

G

C

T

TA

CG

GC

zj:

GC

TTATTT

A G = -15.6

(FlyA

stop

)

Fig. 2. Nucleotide sequence of the ilvD and ilvA genes. The numbering is as shown in Fig. 1. Stop codons are indicated by asterisks. Areas of inverted repeats are indicated (>, <) and the potential stem and loop structures are indicated in the right margin, along with the calculated free energy. The potential stem-and-loop structures at the ilvE-ilvD and ilvD-ilvA boundaries are of unknown significance, while those centered around nt positions 6590-6700 resemble Rho-independent transcription termination sites for ilvC (transcribed from right to left) and ilvA (transcribed from left to right), respectively. The asterisk at nt position 6641 is centered at the TGA stop codon of the ilvC gene product coded by the complementary strand. The overlined regions indicate an RBS for the ilvD mRNA at nt positions 3177-3180, the TAATAA stop codons for ilvD and the ATG initiation codon for ilvA at nt positions 5039-5046.
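
The one-nucleotide overlap between the second ilvD stop codon and the ilvA start codon can be verified directly from the boundary sequence TAATAATG and the coordinates given in the legend. The Python sketch below is illustrative; the function and variable names are hypothetical, and it assumes that nt 5039 is the first base of the eight-base boundary sequence, as the range 5039-5046 implies.

```python
BOUNDARY = "TAATAATG"
FIRST_NT = 5039  # assumed position of the first base of BOUNDARY (legend: 5039-5046)


def codon_positions(seq, codon, first_nt):
    """Return 1-based (start, end) nt ranges for every occurrence of codon."""
    hits = []
    idx = seq.find(codon)
    while idx != -1:
        hits.append((first_nt + idx, first_nt + idx + 2))
        idx = seq.find(codon, idx + 1)
    return hits


stops = codon_positions(BOUNDARY, "TAA", FIRST_NT)  # tandem ilvD stop codons
start = codon_positions(BOUNDARY, "ATG", FIRST_NT)  # ilvA initiation codon

# Overlap in nt between the last stop codon and the start codon.
overlap = stops[-1][1] - start[0][0] + 1
print(stops, start, overlap)
```

With these coordinates the ATG begins at nt 5044, the position given for the ilvA start codon in the Fig. 1 legend.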

TABLE II
Amino acid composition of ilv-coded peptides a

ilv loci: G, M, E, D, A, Y, C, I, H, B, N; E. coli (199 proteins) b

Amino
acid    G           M           E           D           A           Y           C           I
Ala     59 (11.3)   9 (10.2)    28 (9.0)    56 (9.3)    60 (11.7)   26 (8.7)    50 (10.2)   50 (8.8)
Arg     19 (3.6)    7 (8.0)     18 (5.8)    34 (5.7)    31 (6.0)    22 (7.4)    21 (4.3)    26 (4.6)
Asn     15 (2.9)    6 (6.8)     10 (3.2)    20 (3.3)    14 (2.7)    8 (2.7)     16 (3.3)    17 (3.0)
Asp     33 (6.3)    3 (3.4)     16 (5.2)    40 (6.7)    29 (5.6)    12 (4.0)    26 (5.3)    27 (4.8)
Cys     9 (1.7)     2 (2.3)     3 (1.0)     15 (2.5)    7 (1.4)     4 (1.3)     6 (1.2)     10 (1.8)
Gln     32 (6.1)    7 (8.0)     10 (3.2)    20 (3.3)    19 (3.7)    13 (4.4)    21 (4.3)    31 (5.5)
Glu     25 (4.8)    3 (3.4)     22 (7.1)    27 (4.5)    38 (7.4)    24 (8.1)    41 (8.3)    22 (3.9)
Gly     42 (8.1)    1 (1.1)     30 (9.7)    61 (10.2)   46 (8.9)    15 (5.0)    46 (9.3)    50 (8.8)
His     17 (3.3)    4 (4.5)     8 (2.6)     12 (2.0)    16 (3.1)    10 (3.4)    7 (1.4)     18 (3.2)
Ile     20 (3.8)    4 (4.5)     20 (6.5)    32 (5.3)    20 (3.9)    14 (4.7)    28 (5.7)    30 (5.3)
Leu     59 (11.3)   7 (8.0)     21 (6.8)    52 (8.7)    59 (11.5)   34 (11.4)   45 (9.1)    49 (8.6)
Lys     17 (3.3)    1 (1.1)     12 (3.9)    31 (5.2)    21 (4.1)    11 (3.7)    34 (6.9)    23 (4.1)
Met     19 (3.6)    4 (4.5)     8 (2.6)     22 (3.7)    12 (2.3)    8 (2.7)     21 (4.3)    23 (4.1)
Phe     17 (3.3)    3 (3.4)     13 (4.2)    15 (2.5)    25 (4.9)    11 (3.7)    16 (3.3)    13 (2.3)
Pro     30 (5.8)    2 (2.3)     13 (4.2)    29 (4.8)    21 (4.1)    21 (7.0)    17 (3.5)    28 (4.9)
Ser     22 (4.2)    8 (9.1)     20 (6.5)    38 (6.3)    24 (4.7)    17 (5.7)    21 (4.3)    32 (5.6)
Thr     29 (5.6)    5 (5.7)     15 (4.8)    32 (5.3)    16 (3.1)    15 (5.0)    22 (4.5)    31 (5.5)
Trp     6 (1.2)     0 (0.0)     6 (1.9)     4 (0.7)     2 (0.4)     3 (1.0)     5 (1.0)     12 (2.1)
Tyr     12 (2.3)    0 (0.0)     11 (3.5)    12 (2.0)    14 (2.7)    3 (1.0)     17 (3.5)    18 (3.2)
Val     39 (7.5)    11 (12.5)   25 (8.1)    47 (7.8)    40 (7.8)    26 (8.7)    31 (6.3)    56 (9.9)
Acidic  58 (11.1)   6 (6.8)     38 (12.3)   67 (11.2)   67 (13.0)   36 (12.1)   67 (13.6)   49 (8.6)
Basic   36 (6.9)    8 (9.1)     30 (9.7)    65 (10.9)   52 (10.1)   33 (11.1)   55 (11.2)   49 (8.6)
Arom    35 (6.7)    3 (3.4)     30 (9.7)    31 (5.2)    41 (8.0)    17 (5.7)    38 (7.7)    43 (7.6)
Hydro   172 (33.0)  29 (33.0)   104 (33.5)  184 (30.7)  172 (33.4)  99 (33.2)   163 (33.1)  201 (35.4)

9 (5

.6)

13

(8.1

)

3 (1

.9)

7 (4

.3)

0 (0

.0)

8 (5

.0)

12

(7.5

)

14

(8.7

)

2 (1

.2)

12

(7.5

)

18 (

11.2

)

8 (5

.0)

3 (1

.9)

3 (1

.9)

3 (1

.9)

14

(8.7

)

9 (5

.6)

0 (0

.0)

3 (1

.9)

19 (

11.8

)

19 (

11.8

)

21

(13.

0)

6 (3

.7)

58 (

36.0

)

71 (

12.6

)

24

(4.3

)

19

(3.4

)

29

(5.2

)

9 (1

.6)

34

(6.0

)

30

(5.3

)

45

(8.0

)

14

(2.5

)

47

(8.3

)

47

(8.3

)

18

(3.2

)

21

(3.7

)

17

(3.0

)

37

(6.6

)

23

(4.1

)

29

(5.2

)

4 (0

.7)

10

(1.8

)

34

(6.0

)

59 (

10.5

)

42

(7.5

)

31

(5.5

)

180

(32.

0)

3 (3

.1)

(9.9

)

5 (5

.2)

(5.7

)

7 (7

.2)

(4.0

)

9 (9

.3)

(5.5

)

2 (2

.1)

(1.0

)

8 (8

.2)

(4.4

)

4 (4

.1)

(6.6

)

3 (3

.1)

(4.7

)

3 (3

.1)

(2.1

) 7

(7.2

) (5

.8)

9 ‘(

9.3)

(9

.6)

4 (4

.1)

(5.1

)

4 (4

.1)

(5.8

)

5 (5

.2)

(3.6

)

3 (3

.1)

(4.1

)

4 (4

.1)

(5.6

)

5 (5

.2)

(5.2

)

1 (1

.0)

(1.1

)

0 (0

.0)

(2.8

)

10 (

10.3

) (7

.4)

13 (

13.4

) 12

.1

9 (9

.3)

10.8

6 (6

.2)

7.5

36 (

37.1

) 36

.1

(Tab

le

II.

cont

inue

d)

M,

56,4

95

9704

34

098

6421

9 56

202

33

175

54

074

62

033

1753

4 60

447

1108

4 N

A’

Q’/

D

: arg

e

19/O

7/

O

1 s/o

32

12

2912

20

/o

21/o

28

13

13/o

23

/l 5/

O

NA

leu

3412

5 21

5 20

/l 48

14

49/1

0 25

19

4312

29

120

919

3211

5 8/

O

NA

ser

14/8

41

4 16

14

21/l

1 17

17

10/s

19

12

2418

91

5 1

l/12

l/3

NA

Mod

’ 3.

1%

2.3%

2.

3 %

4%

3.

1%

3.0%

0.

8%

5.5%

5.

0%

2.3%

3.

1%

NA

P2”

0.61

0.

38

0.56

0.

39

0.46

0.

29

0.58

0.

43

0.40

0.

59

0.48

N

A

il T

he

num

ber

of a

min

o ac

ids

and

its p

erce

ntag

e (i

n pa

rent

hese

s)

in e

ach

gene

pr

oduc

t is

pre

sent

ed.

In a

dditi

on,

the

M,,

Q

/D

ratio

, %

mod

ulat

or

codo

ns,

and

the

P2 i

ndex

ar

e in

dica

ted.

See

RE

SUL

TS

AN

D

DIS

CU

SSIO

N,

sect

ion

e, f

or

deta

ils.

b M

aruy

ama

et a

l. (1

986)

. Se

e T

able

I,

foo

tnot

e b.

’ N

ot

avai

labl

e.

d A

cidi

c (A

sp

+ G

lu),

ba

sic

(Arg

+

Lys

), ar

om

(aro

mat

ic:

Phe

+ T

rp

+ T

yr),

hy

dro

(hyd

roph

obic

: ar

omat

ic

+ Il

e +

Leu

+

Met

+

Val

).

e T

he

Q/D

va

lue

(qua

rtet

/due

t)

is c

alcu

late

d (G

rant

ham

et

al.,

19

81)

for

amin

o ac

ids

with

si

x co

dons

. Fo

r ex

ampl

e,

the

ilvG

gen

e us

es

19 A

rg

codo

ns

(8 +

4 +

4 +

3)

from

th

e qu

arte

t

(CT

G,

CG

C,

CG

A,

and

CG

G,

resp

ectiv

ely)

an

d no

ne

(0 +

0)

from

th

e du

et

(AG

A

and

AG

G,

resp

ectiv

ely)

(T

able

I)

.

r M

od

(mod

ulat

ory

codo

ns:

CU

A,

AU

A,

CG

A,

CG

G,

AG

A,

AG

G,

GG

A,

and

GG

G)

are

rare

ly

used

in

hig

hly

expr

esse

d ge

nes

(Gro

sjea

n an

d Fi

ers,

19

82).

Fo

r ex

ampl

e,

ilvG

has

16

mod

ulat

ory

codo

ns

and

521

tota

l co

dons

(1

6/52

1 =

3.1%

) (T

able

I)

.

g T

he

P2

inde

x (G

ouy

and

Gau

tier,

1982

) w

as

calc

ulat

ed

as

follo

ws:

(T

TC

+

AT

C

+ C

CT

+

GC

T

+ T

AC

+

AA

C

+ C

GT

+

GG

T)/

[(T

TC

+

AT

C

+ C

CT

+

GC

T

+ T

AC

+

AA

C

+

CG

T

+ G

GT

) +

(IT

T,

AT

T,

CC

C,

GC

C,

TA

T,

AA

T,

CG

C,

and

GG

C)]

. T

he

valu

e fo

r ilv

G i

s (S

S)/[

(SS)

+

(57)

] =

0.61

.
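The composite rows of Table II (acidic, basic, arom, hydro) follow directly from the per-residue counts and the class definitions in footnote d. The Python sketch below is an illustration only, not the authors' program; the function name `class_totals` is an assumption, and the counts are the ilvG values transcribed from the table.

```python
# Amino acid counts for the ilvG product, transcribed from Table II.
ILVG_COUNTS = {
    "Ala": 59, "Arg": 19, "Asn": 15, "Asp": 33, "Cys": 9,
    "Gln": 32, "Glu": 25, "Gly": 42, "His": 17, "Ile": 20,
    "Leu": 59, "Lys": 17, "Met": 19, "Phe": 17, "Pro": 30,
    "Ser": 22, "Thr": 29, "Trp": 6, "Tyr": 12, "Val": 39,
}

# Residue classes as defined in footnote d of Table II.
AROMATIC = ("Phe", "Trp", "Tyr")
CLASSES = {
    "acidic": ("Asp", "Glu"),
    "basic": ("Arg", "Lys"),
    "arom": AROMATIC,
    "hydro": AROMATIC + ("Ile", "Leu", "Met", "Val"),
}


def class_totals(counts):
    """Return {class: (count, percent of all residues)} for one peptide."""
    total = sum(counts.values())
    totals = {}
    for name, residues in CLASSES.items():
        n = sum(counts[r] for r in residues)
        totals[name] = (n, round(100.0 * n / total, 1))
    return totals


if __name__ == "__main__":
    for name, (n, pct) in class_totals(ILVG_COUNTS).items():
        print(f"{name}: {n} ({pct})")
```

The same function can be applied to any other column of the table by supplying that column's counts.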


similar overlap is seen in the trpEDCBA cluster
(Yanofsky et al., 1981) at the trpE-trpD and trpB-trpA
junctions. In trp, this arrangement correlates with
the pairs of genes whose products are associated in
enzyme complexes and, at least for the trpB-trpA
pair, with translational coupling of the two
polypeptides. The ilvD- and ilvA-coded peptides
have not been observed to aggregate in cell extracts,
and the potential presence of translational coupling
has not yet been tested. In bacteriophage λ, the
sequence ATGA occurs at the boundaries of a
cluster of several genes; it provides both terminator
(TGA) and (re)initiator (ATG) codons (Kröger and
Hobom, 1982). This overlapping sequence differs
from that of the bacterial genes, and the significance
of the sequence in bacteriophage λ is uncertain.
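The two kinds of junction discussed here (TAATAATG at ilvD-ilvA and ATGA in bacteriophage λ) differ in how many nucleotides the stop and start codons share. A minimal Python sketch for locating such overlaps is given below; the function name `stop_start_overlaps` and the 0-based coordinate convention are assumptions made for illustration.

```python
STOP_CODONS = ("TAA", "TAG", "TGA")


def stop_start_overlaps(seq):
    """Find stop codons that share 1 or 2 nt with an ATG.

    Returns a list of (stop_pos, stop_codon, atg_pos, shared_nt) tuples,
    with 0-based positions of the first base of each codon.
    """
    seq = seq.upper()
    hits = []
    for atg in range(len(seq) - 2):
        if seq[atg:atg + 3] != "ATG":
            continue
        # A stop codon overlaps this ATG if it starts within 2 nt of it.
        first = max(0, atg - 2)
        last = min(len(seq) - 3, atg + 2)
        for stop in range(first, last + 1):
            codon = seq[stop:stop + 3]
            if codon in STOP_CODONS:
                shared = 3 - abs(stop - atg)
                hits.append((stop, codon, atg, shared))
    return hits


if __name__ == "__main__":
    print(stop_start_overlaps("TAATAATG"))  # ilvD-ilvA boundary
    print(stop_start_overlaps("ATGA"))      # bacteriophage lambda junction
```

For TAATAATG the second TAA and the ATG share a single A, whereas in ATGA the ATG and TGA share two nucleotides, which is the difference noted in the text.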

(c) The ilvA gene

An ORF extending from an AUG at nt position
5044 to a UAG stop codon at nt position 6588 encodes
a protein of 515 aa with an M_r of 56202 (Fig. 2,
Tables I and II). We previously purified the ilvA gene
product, threonine deaminase (Calhoun et al., 1973),
and its subunit M_r was estimated to be 51800 by
equilibrium centrifugation in 6 M guanidine HCl,
and 53000 on SDS-10% polyacrylamide gels. The
subunit size estimate for this peptide on 10% to 15%
polyacrylamide gradient-SDS gels in maxicells
(Gray et al., 1981) was 53000-55000. The predicted
and observed M_r estimates are, therefore, in good
agreement.

An inverted repeat sequence is situated near the
5’-end of the ilvA coding region at nt positions
5192-5221. The ΔG for this potential stem and loop
is -19.0 kcal/mol, which should provide considerable
stability. The occurrence of the inverted repeats
in the ilvE-ilvD intercistronic region and near the
ilvD-ilvA boundary does not appear to be simply a
random event. A computer search for inverted
repeats in the ilvGMEDA coding region (nt positions
271-6588) using various limits for stem length, %
match, loop length, and loopout size within the stem
revealed dozens of inverted repeats, but the two with
the greatest predicted stabilities are those described
above. For comparison, the estimated free energies
for the base-paired regions of the ilv leader mRNA
are -26 to -28 kcal/mol, and those for the postulated
(Wek and Hatfield, 1986) Rho-independent terminators
for ilvC and ilvA are -19.6 and -15.6 kcal/mol,
respectively (Fig. 2). Therefore, it is reasonable to suspect
that these intercistronic inverted repeat sequences have
some functional significance that is yet to be
determined.
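The kind of search described above can be illustrated with a much simplified Python sketch. It is not the program used in this work: it reports only perfect stems of a fixed length, ignores the % match and loopout limits, and makes no free-energy estimate. The function names and default limits are assumptions chosen for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return s.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, stem_len=6, min_loop=3, max_loop=8):
    """Return (start, stem_len, loop_len) for perfect inverted repeats.

    start is the 0-based position of the left arm; the right arm begins
    at start + stem_len + loop_len and is the reverse complement of the
    left arm.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem_len - min_loop + 1):
        target = revcomp(seq[i:i + stem_len])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem_len + loop
            if j + stem_len > len(seq):
                break
            if seq[j:j + stem_len] == target:
                hits.append((i, stem_len, loop))
    return hits


if __name__ == "__main__":
    # Toy sequence: 6-bp stem (GCCGAT / ATCGGC) around a 4-nt loop.
    print(find_inverted_repeats("AAGCCGATTTTTATCGGCAA"))
```

Ranking the candidates by predicted stability, as was done for the ilvGMEDA region, would require an additional free-energy calculation that this sketch does not attempt.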

(d) Database searches

The nucleic acid and protein databases have

undergone significant increases in size during the

past few years, so we searched for homologous

sequences not only for ilvD and ilvA, but also for ilvG,

ilvM, ilvE, ilvY, ilvB, ilvN, and ilvC. We conducted

separate searches of the EMBL and NIH nucleotide

databases using both strands of the ilv genes as query

sequences, and we searched the NBRF library using

the deduced ilv peptide sequences.

A significant homology was detected between the
peptide (Fig. 3) and nucleotide sequences of the ilvA
gene of E. coli and the ILV1 gene of yeast (Kielland-
Brant et al., 1984), which code for biosynthetic
threonine deaminases. The region of homology
encompasses the entire ilvA gene and all of the yeast
ILV1 gene except for the 150 to 160 bp at the 5’-end.
The yeast gene, in comparison to the E. coli gene, has
additional sequences at the 5’-end, and these
presumably encode the approx. 50 aa of the
N-terminal peptide that is processed in association
with mitochondrial import (Kielland-Brant et al.,
1984). A similar situation was found (Falco et al.,
1985) when the yeast ILV2-encoded acetolactate
synthase was compared to the E. coli enzyme, in that
the yeast enzyme contains an additional 90 aa at the
N-terminal end.

With one possible exception, no additional significant
homologies were detected between the ilv nucleotide
and amino acid sequences and those in the
databases. The possible exception, detected using
the FASTN procedure (Lipman and Pearson, 1985),
is a 989-bp overlap with 47% identity corresponding
to the 5’-end of the ilvC gene and the 3’-end of the
ilvIH region (the distal 26% of ilvI and all of ilvH).
While it is intriguing that there is some homology
between these two ilv loci, the extent of nucleotide
sequence similarity is of borderline significance, and
no significant homology was detected when the
predicted amino acid sequences were compared. In
addition, when more stringent algorithms were used
to test for relatedness among the ilv genes (see
RESULTS AND DISCUSSION, section b), this weak
homology was not detected.
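For orientation, percent identity over an aligned overlap can be computed as in the Python sketch below. This is an assumed, generic definition (identical positions divided by alignment length, gaps counted as mismatches) and is not taken from the FASTN documentation; the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned columns holding the same, non-gap character."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    print(round(percent_identity("ACGT-A", "ACCTTA"), 1))
```

Two unrelated DNA sequences of uniform base composition are expected to agree at about 25% of ungapped positions, and permitting gaps raises that background, which is consistent with treating a 47% match at the nucleotide level as borderline.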


Fig. 3. Comparison of the amino acid sequences of the threonine deaminases coded by the E. coli K-12 ilvA gene (abscissa) and the yeast
ILV1 gene (ordinate) using the MATRIX routine (see MATERIALS AND METHODS, section b). Note that extrapolation of the line
indicating homology intersects the ILV1 gene of yeast at approximately aa residue 50, indicating that the yeast peptide has additional
N-terminal residues relative to the ilvA-coded peptide of E. coli (see RESULTS AND DISCUSSION, section d). A similar pattern of
homology can be found at the nucleotide level (not shown).

(e) Codon utilization and amino acid composition

Table I presents the codon utilization for each of

the eleven ilv loci that have been sequenced, and the

average codon utilization for 199 E. coli genes

(Maruyama et al., 1986). Table II presents the amino
acid composition of ilv-coded peptides, as well as
criteria (see MATERIALS AND METHODS, section b) thought to correlate with the level of gene expression,
including the Q/D ratio (Grantham et al., 1981), the
frequency of modulator codons (Grosjean and Fiers,
1982), and the P2 index (Gouy and Gautier, 1982).


The codon usage for ilv-coded peptides is quite
similar to the average of 199 E. coli peptides
(Table I). It may be significant that the TTA codon
for leucine is markedly more frequent in peptides
coded by ilvG (3.1%), ilvM (4.5%) and ilvH (4.3%)
than the average for E. coli (0.8%), and the ATT
codon for isoleucine is much less frequent for ilvG (0%) and
Gove (0%) than the average (2.4%). The tRNAs
corresponding to these codons are of average
abundance, and the significance of this observation,
particularly in the context of the role of the intracellular
concentration of leucine and isoleucine (along
with valine) in regulating expression of the
ilvGMEDA cluster (Freundlich et al., 1962), is
unknown.
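Codon frequencies of the kind compared here (a codon's count as a percentage of all codons in a gene) can be tabulated as in the following Python sketch. It is a generic illustration under the assumption that the ORF is supplied in frame; the function name `codon_usage` is hypothetical, and the toy ORF in the example is not an ilv sequence.

```python
from collections import Counter


def codon_usage(orf):
    """Return {codon: (count, percent of all codons)} for an in-frame ORF."""
    orf = orf.upper().replace("U", "T")
    if len(orf) == 0 or len(orf) % 3 != 0:
        raise ValueError("ORF length must be a non-zero multiple of 3")
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    counts = Counter(codons)
    total = len(codons)
    return {codon: (n, 100.0 * n / total) for codon, n in counts.items()}


if __name__ == "__main__":
    # Toy ORF: ATG TTA TTA CTG TAA
    usage = codon_usage("ATGTTATTACTGTAA")
    print(usage["TTA"])
```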

The amino acid composition of the ilv-coded
peptides (Table II) is, without exception, typical of
the average amino acid composition for E. coli. This
observation is particularly significant with regard to
the average content of leucine, isoleucine, and valine
for the genes in the ilvGMEDA cluster in the context
of the phenomenon of downstream amplification
(Smith et al., 1976). It was originally noted (Kline
et al., 1974) that the extent of derepression, as
measured by enzyme assay of the ilvEDA gene products,
was similar for all three enzymes during limitation
for valine or leucine. In contrast, limitation for
isoleucine resulted in a gradient of derepression, with
derepression greater for ilvD than ilvE, and greatest
for ilvA. At that time it was not possible to distinguish
between two alternative explanations for this result,
namely (i) the presence of isoleucine-specific differential
control for ilvE, ilvD and ilvA, and (ii) the
presence of progressively lower isoleucine composition
in the peptides coded by ilvE, ilvD, and ilvA,
respectively. Our results make it possible to exclude
the second hypothesis. Two other observations are
of interest in this regard. First, rho mutations (e.g.,
rho221) mimic isoleucine limitation, in that the
derepression measured by enzyme assay is about
twice as great for ilvD as for ilvE, while that for ilvA is about
5-10-fold greater than for ilvE (Smith et al., 1976).
Second, the isoleucine-specific effect is seen even
when expression is mediated exclusively from site(s)
downstream from the ilvG promoter (Gray et al.,
1982). The molecular basis for these effects is currently
under investigation.
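The spread in isoleucine content across the cluster can be checked with simple arithmetic on the Table II percentages. The Python sketch below is illustrative only; the dictionary holds the isoleucine percentages listed in Table II for the five ilvGMEDA products, and the helper name `fold_variation` is an assumption.

```python
# Isoleucine content (% of residues) from Table II.
ILE_PERCENT = {
    "ilvG": 3.8,
    "ilvM": 4.5,
    "ilvE": 6.5,
    "ilvD": 5.3,
    "ilvA": 3.9,
}


def fold_variation(values):
    """Ratio of the largest to the smallest value."""
    values = list(values)
    return max(values) / min(values)


if __name__ == "__main__":
    fold = fold_variation(ILE_PERCENT.values())
    print(f"fold variation in Ile content: {fold:.2f}")
```

The ratio is below two, which is small next to the 5-10-fold difference in derepression between ilvA and ilvE cited in this section.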

The Q/D ratios (Grantham et al., 1981) for
arginine, leucine, and serine correlate with levels of
expression, and are higher in highly expressed genes
(61/1, 63/6, and 48/11, respectively) than in genes
with low levels of expression (57/13, 71/24, and
46/22, respectively). By these criteria the expression
of most of the ilv genes (Table II) is predicted to be
generally moderate, but the ilvC gene, with ratios of
21/0, 43/2, and 19/2, clearly fits into the highest
expression category.
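A Q/D pair for one amino acid is obtained by summing codon counts over its four-codon and two-codon families. The Python sketch below assumes the standard genetic code for the three six-codon amino acids; the function name `quartet_duet` is hypothetical, and the example counts are the ilvG arginine values quoted in footnote e of Table II.

```python
# Quartet and duet codon families for the six-codon amino acids.
SIX_CODON_FAMILIES = {
    "Arg": (("CGT", "CGC", "CGA", "CGG"), ("AGA", "AGG")),
    "Leu": (("CTT", "CTC", "CTA", "CTG"), ("TTA", "TTG")),
    "Ser": (("TCT", "TCC", "TCA", "TCG"), ("AGT", "AGC")),
}


def quartet_duet(codon_counts, amino_acid):
    """Return (quartet_total, duet_total) for Arg, Leu or Ser."""
    quartet, duet = SIX_CODON_FAMILIES[amino_acid]
    q = sum(codon_counts.get(codon, 0) for codon in quartet)
    d = sum(codon_counts.get(codon, 0) for codon in duet)
    return q, d


if __name__ == "__main__":
    # ilvG arginine codon counts from footnote e of Table II.
    ilvg_arg = {"CGT": 8, "CGC": 4, "CGA": 4, "CGG": 3, "AGA": 0, "AGG": 0}
    q, d = quartet_duet(ilvg_arg, "Arg")
    print(f"{q}/{d}")
```

Reporting the pair rather than a quotient avoids division by zero when the duet is unused, as it is for arginine in ilvG and ilvC.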

Modulatory codons (Grosjean and Fiers, 1982)
correspond to tRNAs of lowest abundance in vivo
and are quite rare in genes with high levels of expression
(six modulator codons in 13 genes) but are
about 10-fold more common in genes expressed at a
lower level (68 modulator codons in 16 genes)
(Grantham et al., 1981). By this criterion most ilv
genes (Table II) fall into the intermediate expression
category, but, as seen with the Q/D ratios, the ilvC
gene is predicted to be the most highly expressed ilv
gene, with only 4/492 (0.8%) modulator codons.
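The modulator-codon percentage is a simple fraction of total codons. The following Python sketch uses the codon set listed in footnote f of Table II (written with T in place of U); the function name `modulator_percent` is an assumption for illustration.

```python
# Modulatory codons from footnote f of Table II, in DNA notation.
MODULATOR_CODONS = frozenset(
    ("CTA", "ATA", "CGA", "CGG", "AGA", "AGG", "GGA", "GGG")
)


def modulator_percent(codon_counts):
    """Return (modulator_count, total_codons, percent) for one gene."""
    total = sum(codon_counts.values())
    if total == 0:
        raise ValueError("no codons supplied")
    mod = sum(
        n for codon, n in codon_counts.items()
        if codon.upper().replace("U", "T") in MODULATOR_CODONS
    )
    return mod, total, 100.0 * mod / total


if __name__ == "__main__":
    # The two worked values quoted in the text and in Table II.
    print(round(100.0 * 16 / 521, 1))  # ilvG
    print(round(100.0 * 4 / 492, 1))   # ilvC
```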

Finally, the P2 ratios (Gouy and Gautier, 1982)
vary from about 0.6 to 0.9 for the most highly
expressed genes (e.g., tufA, ompA, and lpp), and from
about 0.3 to 0.4 for the most weakly expressed genes
(e.g., regulatory proteins such as those specific for
trp, lac, and ara). By this measurement the ilv genes
(Table II) are generally intermediate, with some (ilvG,
ilvE, ilvC, and ilvB) at the lower extreme of the highly
expressed group. The ilvY gene, which codes for the
activator protein for the ilvC gene and is the only
regulatory protein in this group (Umbarger, 1983),
scores lowest by these criteria, in accord with
expectations.
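The P2 index defined in footnote g of Table II can be written as a short function. The Python sketch below follows that definition; the function name `p2_index` is hypothetical, and the example lumps the ilvG totals (88 and 57) onto single placeholder codons because the per-codon counts of Table I are not reproduced here.

```python
# Codon sets from footnote g of Table II.
P2_PREFERRED = ("TTC", "ATC", "CCT", "GCT", "TAC", "AAC", "CGT", "GGT")
P2_ALTERNATE = ("TTT", "ATT", "CCC", "GCC", "TAT", "AAT", "CGC", "GGC")


def p2_index(codon_counts):
    """P2 = preferred / (preferred + alternate) over the 16 codons above."""
    preferred = sum(codon_counts.get(c, 0) for c in P2_PREFERRED)
    alternate = sum(codon_counts.get(c, 0) for c in P2_ALTERNATE)
    if preferred + alternate == 0:
        raise ValueError("none of the P2 codons are present")
    return preferred / (preferred + alternate)


if __name__ == "__main__":
    # Placeholder distribution: only the two class totals for ilvG are known.
    lumped_ilvg = {"TTC": 88, "TTT": 57}
    print(round(p2_index(lumped_ilvg), 2))
```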

The absolute number of molecules per cell for
most of the ilv-coded peptides is not known, and the
levels vary in response to control signals in non-uniform
patterns (e.g., see above in this section).
However, our previous studies of [35S]methionine
labelling of proteins following infection of lysogenic
hosts (to prevent readthrough from phage promoters)
with λdilv phage, in which the ilvGMEDA and
ilvYC genes were monitored (Gray and Calhoun,
1982), can now be interpreted more quantitatively,
since the results presented here (Table II) reveal that
the methionine composition for these gene products
varies by less than a factor of two. In those studies
we found that the ilvC gene product was produced at
roughly 5-10-fold higher levels than ilvGEDA, while
the ilvY-coded peptide was not detected, thus providing
the expected correlation based on codon usage
(Table I).


(f) Searches for homology among the ilv genes

We conducted dot matrix analyses of all pairwise
combinations (see MATERIALS AND METHODS,
section b) of ilv nucleotide and amino acid sequences
to test for relatedness among these genes, and to test
for potential repeated sequences within each gene.
The analysis detected the previously reported
homologies between the ilvGM, ilvBN, and ilvIH loci
(Squires et al., 1983; Wek et al., 1985) that encode
isozymes of acetohydroxy acid synthase. No other
statistically significant homologies or internal repeats
were detected for the nucleotide or amino acid
sequences. These results do not indicate the presence
of retrograde evolution (Horowitz, 1945), which
invokes a sequential recruitment during evolution of
new enzymes in reverse order of the biochemical
pathway. It is not known if the very weak homology
noted in section d, above, between ilvC and ilvIH
reflects an ancestral relationship. Evidence for
retrograde evolution is found for some pathways (e.g., the
β-ketoadipate pathway in bacterial genera; Yeh
et al., 1982) but not others (e.g., the trp operon of
E. coli; Yanofsky et al., 1981). Therefore,
alternative hypotheses, such as enzyme recruitment
based upon substrate ambiguity (Jensen, 1976),
remain viable for the genes in
the ilvGMEDA cluster.
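A dot matrix comparison of the general kind used here can be sketched in a few lines of Python. This is not the MATRIX routine itself: it simply scores identities in a sliding window and records a dot where the score reaches a threshold. The function name, window size, and threshold are assumptions chosen for illustration, and the example peptide is arbitrary.

```python
def dot_matrix(seq_a, seq_b, window=10, min_matches=8):
    """Return (i, j) pairs where windows starting at seq_a[i] and seq_b[j]
    share at least min_matches identical positions (0-based)."""
    a, b = seq_a.upper(), seq_b.upper()
    dots = []
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            matches = sum(1 for k in range(window) if a[i + k] == b[j + k])
            if matches >= min_matches:
                dots.append((i, j))
    return dots


if __name__ == "__main__":
    peptide = "MKTAYIAKQRQISFVK"      # arbitrary example sequence
    extended = "GGGGG" + peptide      # same peptide with 5 extra N-terminal residues
    for i, j in dot_matrix(peptide, extended, window=10, min_matches=10):
        print(i, j)
```

Related sequences give a run of dots along a diagonal. When one sequence carries extra N-terminal residues the diagonal is displaced along that axis, which is the pattern described for the yeast and E. coli threonine deaminases in Fig. 3.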

ADDENDUM

After this manuscript was submitted for publication
we received a copy of a manuscript (Lawther
et al., 1987) containing the sequence of the E. coli
K-12 ilvD and ilvA genes. A comparison of these
independently determined sequences revealed twelve
differences at eight sites, including three additional
bases in our sequence between nt positions 4410 and
4500 that encode part of ilvD. We communicated
with these authors, and their review of their sequencing
data confirmed our sequence; a correction
will be submitted by these authors (R.P. Lawther,
personal communication).

ACKNOWLEDGEMENT

This research was supported by Public Health
Service grant GM23182 from the National Institutes
of Health.

REFERENCES

Calhoun, D.H., Rimerman, R.A. and Hatfield, G.W.: Threonine deaminase from Escherichia coli, I. Purification and properties. J. Biol. Chem. 248 (1973) 3511-3516.
Dale, R.M.K., McClure, B.A. and Houchins, J.P.: A rapid single-stranded cloning strategy for producing a sequential series of overlapping clones for use in DNA sequencing: application to sequencing the corn mitochondrial 18S rDNA. Plasmid 13 (1985) 31-40.
Falco, S.C., Dumas, K.S. and Livak, K.J.: Nucleotide sequence of the yeast ILV2 gene which encodes acetolactate synthase. Nucl. Acids Res. 13 (1985) 4011-4027.
Freundlich, M., Burns, R.O. and Umbarger, H.E.: Control of isoleucine, valine, and leucine biosynthesis, I. Multi-valent repression. Proc. Natl. Acad. Sci. USA 48 (1962) 1804-1808.
Gouy, M. and Gautier, C.: Codon usage in bacteria: correlation with gene expressivity. Nucl. Acids Res. 10 (1982) 7055-7074.
Grantham, R., Gautier, C., Gouy, M., Jacobzone, M. and Mercier, R.: Codon catalogue usage is a genome strategy modulated for gene expressivity. Nucl. Acids Res. 9 (1981) r43-r73.
Gray, J.E., Bennett, D.C., Umbarger, H.E. and Calhoun, D.H.: Physical and genetic localization of ilv regulatory sites in λdilv bacteriophages. J. Bacteriol. 149 (1982) 1071-1081.
Gray, J.E. and Calhoun, D.H.: Absence of significant membrane localization of the proteins coded by the ilvGEDAC genes of Escherichia coli K-12. J. Bacteriol. 151 (1982) 119-126.
Gray, J.E., Patin, D.W. and Calhoun, D.H.: Identification of the protein products of the rrnC, ilv, rho region of the Escherichia coli K-12 chromosome. Mol. Gen. Genet. 183 (1981) 428-430.
Gray, J.E., Wallen, J.W. and Calhoun, D.H.: Identification of a protein of 15000 daltons related to isoleucine-valine biosynthesis in Escherichia coli K-12. J. Bacteriol. 151 (1982) 127-134.
Grosjean, H. and Fiers, W.: Preferential codon usage in prokaryotic genes: the optimal codon-anticodon interaction energy and the selective codon usage in efficiently expressed genes. Gene 18 (1982) 199-209.
Horowitz, N.H.: On the evolution of biochemical synthesis. Proc. Natl. Acad. Sci. USA 31 (1945) 153-157.
Jensen, R.A.: Enzyme recruitment in evolution of new function. Annu. Rev. Microbiol. 30 (1976) 409-425.
Kielland-Brant, M., Holmberg, S., Petersen, J.G.L. and Nilsson-Tillgren, T.: Nucleotide sequence of the gene for threonine deaminase (ILV1) of Saccharomyces cerevisiae. Carlsberg Res. Commun. 49 (1984) 567-575.
Kline, E.L., Brown, C.S., Coleman Jr., W.G. and Umbarger, H.E.: Regulation of isoleucine-valine biosynthesis in an ilvDAC deletion strain of Escherichia coli K-12. Biochem. Biophys. Res. Commun. 57 (1974) 1144-1151.
Kröger, M. and Hobom, G.: A chain of interlinked genes in the ninR region of bacteriophage lambda. Gene 20 (1982) 25-38.
Kuramitsu, S., Ogawa, T., Ogawa, H. and Kagamiyama, H.: Branched chain amino acid aminotransferase of Escherichia coli: nucleotide sequence of the ilvE gene and the deduced amino acid sequence. J. Biochem. 97 (1985) 993-999.
Lawther, R.P., Calhoun, D.H., Adams, C.W., Hauser, C.A., Gray, J. and Hatfield, G.W.: Molecular basis of valine resistance in Escherichia coli K-12. Proc. Natl. Acad. Sci. USA 78 (1981) 922-925.
Lawther, R.P., Wek, R.C., Lopes, J.M., Pereira, R., Tallion, B.E. and Hatfield, G.W.: The complete nucleotide sequence of the ilvGMEDA operon of Escherichia coli K-12. Nucl. Acids Res. 15 (1987) 2137-2155.
Lipman, D.J. and Pearson, W.R.: Rapid and sensitive protein similarity searches. Science 227 (1985) 1435-1441.
Maruyama, T., Gojobori, T., Aota, S.-I. and Ikemura, T.: Codon usage tabulated from the GenBank sequence data. Nucl. Acids Res. 14 (1986) r151-r197.
Messing, J.: New M13 vectors for cloning. Methods Enzymol. 101 (1983) 20-89.
Sanger, F., Nicklen, S. and Coulson, A.R.: DNA sequencing with chain terminating inhibitors. Proc. Natl. Acad. Sci. USA 74 (1977) 5463-5467.
Smith, J.M., Smolin, D.E. and Umbarger, H.E.: Polarity and regulation of the ilv gene cluster in Escherichia coli K-12. Mol. Gen. Genet. 148 (1976) 111-124.
Squires, C.H., DeFelice, M., Devereux, J. and Calvo, J.M.: Molecular structure of ilvIH and its evolutionary relationship to ilvG in Escherichia coli K-12. Nucl. Acids Res. 11 (1983) 5299-5313.
Umbarger, H.E.: The biosynthesis of isoleucine and valine and its regulation. In Herrmann, K.M. and Somerville, R.L. (Eds.), Amino Acid Biosynthesis and Genetic Regulation. Addison-Wesley, London, 1983, pp. 245-266.
von Hippel, P.H., Bear, D.G., Morgan, W.D. and McSwiggen, J.A.: Protein-nucleic acid interactions in transcription: a molecular analysis. Annu. Rev. Biochem. 53 (1984) 389-446.
Wek, R.C., Hauser, C.A. and Hatfield, G.W.: The nucleotide sequence of the ilvBN operon of Escherichia coli: sequence homologies of the acetohydroxy acid synthase isozymes. Nucl. Acids Res. 13 (1985) 3995-4010.
Wek, R.C. and Hatfield, G.W.: Nucleotide sequence and in vivo expression of the ilvY and ilvC genes in Escherichia coli K-12: transcription from divergent overlapping promoters. J. Biol. Chem. 261 (1986) 2441-2450.
Yanofsky, C., Platt, T., Crawford, I.P., Nichols, B.P., Christie, G.E., Horowitz, H., VanCleemput, M. and Wu, A.M.: The complete nucleotide sequence of the tryptophan operon of Escherichia coli. Nucl. Acids Res. 9 (1981) 6647-6668.
Yeh, W.K., Shih, C. and Ornston, L.N.: Overlapping evolutionary affinities revealed by comparison of amino acid compositions. Proc. Natl. Acad. Sci. USA 79 (1982) 3794-3797.

Communicated by S.R. Kushner.