Gene, 56 (1987) 185-198
Elsevier
GEN 02069
The complete nucleotide sequence of the ilvGMEDA cluster of Escherichia coli K-12
(Recombinant DNA; codon usage; threonine deaminase; yeast mitochondria)
James L. Cox, Betty J. Cox, Vincenzo Fidanza and David H. Calhoun
Department of Microbiology, Mount Sinai School of Medicine, New York, NY 10029 (U.S.A.)
Received 5 February 1987
Revised 4 May 1987
Accepted 4 May 1987
SUMMARY
The ilvGMEDA gene cluster of Escherichia coli K-12 has been the focus of intensive genetic and biochemical
analysis for the past 30 years. Genetic regulation of the ilvGMEDA cluster involves attenuation, internal
promoters, internal Rho-dependent termination sites, a site of polarity in the ilvG pseudogene of the wild-type
organism, and autoregulation by the ilvA gene product, the biosynthetic L-threonine deaminase. We have now
completed the nucleotide sequence of the 6600-bp cluster and have analyzed it, along with the ilvYC, ilvBN,
and ilvIH genes, for codon frequencies and possible evolutionary relationships. The isoleucine content of each
of the gene products of the ilvGMEDA cluster is quite similar (less than a two-fold variation), thus excluding
one possible interpretation of the isoleucine-specific downstream amplification phenomenon. There is no
evidence for retrograde evolution in the cluster since no significant homologies are detectable among genes that
catalyze sequential reactions of the pathway. A highly significant homology does exist, however, between the
threonine deaminases of yeast mitochondria and E. coli. The sequence at the boundary of the ilvA and ilvD genes
is TAATAATG, so that the second TAA stop codon of ilvD overlaps the ATG initiation codon of ilvA.
INTRODUCTION
The biosynthesis of isoleucine, valine, and leucine in E. coli K-12 involves the participation of parallel
pathways, genes located in several chromosomal sites, and elaborate, but well-integrated, sets of controls
at the level of enzyme activity and synthesis (Umbarger, 1983). The largest cluster of genes, ilvGMEDA, is
located at 85 min and has several interesting genetic regulatory features which are presently the subject of
active investigation. Approximately half (3819 bp) of the cluster has previously been sequenced, including
278 bp 5' to the mRNA start point, ilvG (1646 bp) and ilvM (260 bp) (Lawther et al., 1981), ilvE (929 bp)
(Kuramitsu et al., 1985) and 241 bp of the 3'-end of ilvA (Wek and Hatfield, 1986). We report here the
sequence of ilvD (1853 bp) and ilvA (1544 bp), thus completing the sequence of the 6600-bp cluster, and
present an overall analysis from several perspectives of these five ilv genes along with the six other
sequenced ilv genes.
Correspondence to: Dr. David H. Calhoun, Division of Biochemistry, Department of Chemistry, The City College
of New York, Convent Avenue and 138th Street, New York, NY 10031 (U.S.A.), Tel. (212)690-8306.
Abbreviations: aa, amino acid(s); bp, base pair(s); kb, kilobase or 1000 bp; nt, nucleotide(s); ORF, open
reading frame; P2 index, frequency of utilization of 16 specific codons (see Table II, footnote g); PolIk,
Klenow (large) fragment of E. coli DNA polymerase I; Q/D ratio, the quartet to duet ratio for sextet codons
(see Table II, footnote e); RBS, ribosome-binding site; Rho, termination factor for E. coli; SDS, sodium
dodecyl sulfate.
0378-1119/87/$03.50 © 1987 Elsevier Science Publishers B.V. (Biomedical Division)
MATERIALS AND METHODS
(a) Vectors and methods
The 2262-bp HindIII-EcoRI fragment containing ilvD and ilvA coding sequences (see Fig. 1) was cloned into
M13mp8 and M13mp11 (Messing, 1983) for nucleotide sequence analysis by the method of Sanger et al. (1977).
Deletions were introduced using the method of Dale et al. (1985). A synthetic oligodeoxynucleotide primer
that was used to obtain one segment of the sequence (Fig. 1) was prepared on a SAM I synthesizer (Biosearch,
San Rafael, CA) using the phosphoramidite method as described by the manufacturer. The 1657-bp HindIII
segment containing ilvE and ilvD coding sequences (see Fig. 1) was cloned into M13mp8 in both orientations,
and the indicated deletions were constructed as described above. Restriction enzymes, PolIk, and DNA ligase
were obtained from New England Biolabs or Boehringer Mannheim. International Biotechnologies Incorporated
provided reagents for constructing deletions using the method of Dale et al. (1985).
(b) Sequence analysis
Nucleotide sequences were collected using a model GP7 sonic digitizer (Gaf Bar Instruments) and overlaps
were generated using the MERGE routine of MicroGenie version 3.2 (Beckman Instruments). Computerized data
banks of amino acid (NBRF) and nucleotide sequences (EMBL and NIH) were searched in June, 1986 using the
XFASTN, XFASTP, and IFIND protocols available on Bionet (Intelligenetics, Inc., Mountain View, CA). The
MicroGenie program was used to search for inverted repeat sequences, to calculate codon frequencies, amino
acid composition, and molecular weights, and for pairwise comparisons of the various ilv genes and peptides
as well as for comparisons of the threonine deaminases from E. coli and yeast (see RESULTS AND DISCUSSION,
section d).
[Fig. 1 map: scale 1-6588 bp; genes G, M, E, D, A; marked positions 33, 131, 271, 1917, 2177, 2197, 3126,
3191, 5044, 6588; restriction sites and deletion clones as described in the legend.]
Fig. 1. The ilvGMEDA cluster. The figure depicts the five structural genes and the start point of
transcription preceding ilvG (controlled by PG) at nt position 1, the leader peptide (33-131 bp) associated
with the attenuator and the attenuated (ATN) transcript (185 bp), the site of the naturally occurring
frameshift (FS) at 1250 bp in ilvG, the internal site of transcription initiation (controlled by PE)
preceding the ilvE gene, some selected restriction enzyme sites derived directly from the nucleotide
sequence, and the sequencing strategy we used to complete the sequence of the cluster. The second TAA stop
codon for ilvD overlaps by one base pair the ATG start codon of ilvA at nt position 5044. The deletion clones
used to obtain the sequence are indicated (e.g., 41, 42, 15, etc.) and are shown above and below the
HindIII-HindIII and HindIII-EcoRI segments. The sequence obtained from the segment designated 214 was
obtained using the full-length template with a specifically constructed primer designed to confirm the
overlap between clones 22 and 213.
We used three published methods to analyze
codon usage. The Q/D ratio (Grantham et al., 1981)
analyzes the quartet-to-duet ratios among sextet
codons. The Grosjean and Fiers (1982) method
analyzes the frequency of eight modulatory codons
that are less common in highly expressed coding
regions. The P2 index (Gouy and Gautier, 1982) is
based on the frequency of utilization of 16 specific
codons that correlates with the level of expression in
E. coli.
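The three codon-usage measures just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the codon sets are the ones listed in the Table II footnotes (written with T in place of U), while the function names (`count_codons`, `qd_counts`, `modulatory_percent`, `p2_index`) are hypothetical and are not taken from MicroGenie or from the cited papers.

```python
from collections import Counter

# Codon sets as given in the Table II footnotes (DNA alphabet).
P2_NUMERATOR = ("TTC", "ATC", "CCT", "GCT", "TAC", "AAC", "CGT", "GGT")
P2_OTHER = ("TTT", "ATT", "CCC", "GCC", "TAT", "AAT", "CGC", "GGC")
MODULATORY = ("CTA", "ATA", "CGA", "CGG", "AGA", "AGG", "GGA", "GGG")
# Sextet amino acids: (quartet codons, duet codons).
SEXTETS = {
    "Arg": (("CGT", "CGC", "CGA", "CGG"), ("AGA", "AGG")),
    "Leu": (("CTT", "CTC", "CTA", "CTG"), ("TTA", "TTG")),
    "Ser": (("TCT", "TCC", "TCA", "TCG"), ("AGT", "AGC")),
}


def count_codons(seq):
    """Count in-frame codons of a coding sequence (RNA or DNA letters)."""
    s = seq.upper().replace("U", "T")
    usable = len(s) - len(s) % 3  # ignore an incomplete trailing codon
    return Counter(s[i:i + 3] for i in range(0, usable, 3))


def qd_counts(counts):
    """Return {amino acid: (quartet count, duet count)} for Arg, Leu, Ser.

    `counts` is expected to be a Counter, so missing codons count as zero.
    """
    return {
        aa: (sum(counts[c] for c in quartet), sum(counts[c] for c in duet))
        for aa, (quartet, duet) in SEXTETS.items()
    }


def modulatory_percent(counts):
    """Percentage of all codons that belong to the eight modulatory codons."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 100.0 * sum(counts[c] for c in MODULATORY) / total


def p2_index(counts):
    """P2 as defined in Table II, footnote g: numerator / (numerator + other)."""
    numerator = sum(counts[c] for c in P2_NUMERATOR)
    denominator = numerator + sum(counts[c] for c in P2_OTHER)
    return numerator / denominator if denominator else float("nan")


# Arg codon counts quoted for ilvG in Table II, footnote e.
ilvG_arg = Counter({"CGT": 8, "CGC": 4, "CGA": 4, "CGG": 3, "AGA": 0, "AGG": 0})
print(qd_counts(ilvG_arg)["Arg"])  # (19, 0)
```

Working from codon counts rather than raw sequence keeps the sketch usable with tabulated data such as Table I; `count_codons` covers the case where a coding sequence is available instead.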
RESULTS AND DISCUSSION
(a) Nucleotide sequence analysis
The ilvGMEDA cluster and the strategy we used to complete the sequence analysis are indicated in Fig. 1.
Sequences of the 5'-end of the cluster (bp 1-3490) and 233 bp at the 3'-end of the cluster (bp 6125-6358)
have previously been reported (see INTRODUCTION). To complete the sequence, we subcloned the 1659-bp HindIII
segment and the 2262-bp HindIII-EcoRI segment (Fig. 1) into phages M13mp8 and M13mp11 and constructed nested
sets of deletion derivatives for sequence analysis using the enzymatic method (Sanger et al., 1981).
Our sequence, which was obtained several times and from both strands, overlaps and confirms that reported by
Kuramitsu et al. (1985) at the 5'-end, and it extends to the EcoRI site near the 3'-terminus of ilvA (Wek and
Hatfield, 1986). We find only two long ORFs, and these code for peptides of the size expected for the ilvD-
and ilvA-coded gene products.
(b) The ilvD gene
An ORF that encodes a 615-aa peptide of 64219 Da extends from nt position 3191 to tandem stop codons at nt
position 5039-5044 (Tables I and II; Fig. 2). Although all attempts to purify the ilvD-coded dihydroxyacid
dehydrase, which is very labile in cell extracts, have been unsuccessful (D. Patin and D.H.C., unpublished
results), its Mr has been estimated using two methods. First, we used (Gray et al., 1981) a set of λdilv
phages that contains combinations of the ilv genes (i.e., GEDA, EDA, DA, and A) for infection of
UV-irradiated E. coli cells in the presence of [35S]methionine. The presence of the ilvD gene in the phage
correlated with the synthesis of a 65-kDa protein, as detected by autoradiography of 10-15% SDS gels (Gray
et al., 1981). In addition, we used maxicells (Gray et al., 1981; 1982) to further localize the coding region
for this peptide (and for threonine deaminase; see below). Thus, there is good agreement between the
predicted and observed Mr values for dihydroxyacid dehydrase, and the location of the coding sequences is as
previously predicted (Gray et al., 1981).
There is a relatively long segment of 64 bp separating the UAA stop codon of ilvE (nt position 3126) from the
first in-frame AUG start codon (nt position 3191) in the ilvD coding region (Fig. 2). Physical studies
suggest that an unstructured region of 72-84 nt of single-stranded RNA is required for stimulation of
Rho-dependent NTP hydrolysis and transcription termination by Rho factor (reviewed in von Hippel et al.,
1984). Thus, this region is a candidate for a Rho-dependent site of polarity, which would be consistent with
previous evidence that one or more such sites may exist within this cluster (Smith et al., 1976). However, we
do note that an inverted repeat is present in the sequence immediately following the ilvE stop codon
(indicated in Fig. 2) that potentially could form a particularly stable stem-and-loop structure
(ΔG = -25.0 kcal/mol), so that only 36 nt would remain distal to this stem to serve as a Rho-recognition
structure. A consensus RBS is appropriately situated upstream from the AUG codon at nt position 3177-3181.
The sequence TAATAATG occurs at the boundary of ilvD and ilvA, so that the ilvD gene ends with tandem TAA
stop codons, and the ATG codon for ilvA overlaps the second stop codon by 1 nt. A
TABLE I
Codon usage for ilv-coded peptides a

Codon      G         M         E         D         A         Y         C         I         H         B         N         E. coli (199 genes) b
TTT Phe    8 (1.5)   1 (1.1)   3 (1.0)   5 (0.8)   8 (1.6)   6 (2.0)   4 (0.8)   6 (1.1)   2 (1.2)   10 (1.8)  5 (5.2)   (1.6)
TTC Phe    9 (1.7)   2 (2.3)   10 (3.2)  10 (1.7)  17 (3.3)  5 (1.7)   12 (2.4)  7 (1.2)   1 (0.6)   7 (1.2)   0 (0.0)   (2.0)
TTA Leu    16 (3.1)  4 (4.5)   1 (0.3)   1 (0.2)   4 (0.8)   3 (1.0)   0 (0.0)   7 (1.2)   7 (4.3)   3 (0.5)   0 (0.0)   (0.8)
TTG Leu    9 (1.7)   1 (1.1)   0 (0.0)   3 (0.5)   6 (1.2)   6 (2.0)   2 (0.4)   13 (2.3)  2 (1.2)   12 (2.1)  0 (0.0)   (1.0)
CTT Leu    4 (0.8)   0 (0.0)   0 (0.0)   8 (1.3)   4 (0.8)   1 (0.3)   2 (0.4)   7 (1.2)   5 (3.1)   1 (0.2)   2 (2.1)   (0.8)
CTC Leu    4 (0.8)   0 (0.0)   3 (1.0)   10 (1.7)  10 (1.9)  4 (1.3)   2 (0.4)   4 (0.7)   1 (0.6)   4 (0.7)   1 (1.0)   (0.9)
CTA Leu c  1 (0.2)   0 (0.0)   0 (0.0)   0 (0.0)   3 (0.6)   2 (0.7)   0 (0.0)   1 (0.2)   0 (0.0)   0 (0.0)   1 (1.0)   (0.2)
CTG Leu    25 (4.8)  2 (2.3)   17 (5.5)  30 (5.0)  32 (6.2)  18 (6.0)  39 (7.9)  17 (3.0)  3 (1.9)   27 (4.8)  5 (5.2)   (5.9)
ATT Ile    0 (0.0)   0 (0.0)   7 (2.3)   12 (2.0)  7 (1.4)   11 (3.7)  12 (2.4)  16 (2.8)  6 (3.7)   21 (3.7)  3 (3.1)   (2.4)
ATC Ile    19 (3.6)  3 (3.4)   13 (4.2)  19 (3.2)  13 (2.5)  3 (1.0)   16 (3.3)  12 (2.1)  4 (2.5)   24 (4.3)  3 (3.1)   (3.2)
ATA Ile c  1 (0.2)   1 (1.1)   0 (0.0)   1 (0.2)   0 (0.0)   0 (0.0)   0 (0.0)   2 (0.4)   2 (1.2)   2 (0.4)   1 (1.0)   (0.2)
ATG Met    19 (3.6)  4 (4.5)   8 (2.6)   22 (3.7)  12 (2.3)  8 (2.7)   21 (4.3)  23 (4.1)  3 (1.9)   21 (3.7)  4 (4.1)   (2.7)
GTT Val    14 (2.7)  3 (3.4)   9 (2.9)   14 (2.3)  5 (1.0)   3 (1.0)   6 (1.2)   11 (1.9)  4 (2.5)   9 (1.6)   3 (3.1)   (2.4)
GTC Val    8 (1.5)   4 (4.5)   1 (0.3)   10 (1.7)  9 (1.7)   1 (0.3)   4 (0.8)   17 (3.0)  5 (3.1)   6 (1.1)   2 (2.1)   (1.3)
GTA Val    3 (0.6)   1 (1.1)   4 (1.3)   9 (1.5)   6 (1.2)   4 (1.3)   8 (1.6)   11 (1.9)  2 (1.2)   1 (0.2)   2 (2.1)   (1.3)
GTG Val    14 (2.7)  3 (3.4)   11 (3.5)  14 (2.3)  20 (3.9)  18 (6.0)  13 (2.6)  17 (3.0)  8 (5.0)   18 (3.2)  3 (3.1)   (2.4)
TCT Ser    2 (0.4)   0 (0.0)   2 (0.6)   6 (1.0)   5 (1.0)   1 (0.3)   10 (2.0)  7 (1.2)   1 (0.6)   2 (0.4)   0 (0.0)   (1.1)
TCC Ser    5 (1.0)   0 (0.0)   8 (2.6)   10 (1.7)  4 (0.8)   1 (0.3)   6 (1.2)   7 (1.2)   2 (1.2)   3 (0.5)   1 (1.0)   (1.1)
TCA Ser    4 (0.8)   2 (2.3)   1 (0.3)   5 (0.8)   3 (0.6)   2 (0.7)   1 (0.2)   2 (0.4)   2 (1.2)   2 (0.4)   0 (0.0)   (0.5)
TCG Ser    3 (0.6)   2 (2.3)   5 (1.6)   6 (1.0)   5 (1.0)   6 (2.0)   2 (0.4)   8 (1.4)   4 (2.5)   4 (0.7)   0 (0.0)   (0.7)
CCT Pro    5 (1.0)   0 (0.0)   0 (0.0)   4 (0.7)   0 (0.0)   0 (0.0)   0 (0.0)   2 (0.4)   0 (0.0)   5 (0.9)   0 (0.0)   (0.5)
CCC Pro    2 (0.4)   0 (0.0)   0 (0.0)   3 (0.5)   2 (0.4)   4 (1.3)   1 (0.2)   7 (1.2)   1 (0.6)   5 (0.9)   0 (0.0)   (0.3)
CCA Pro    5 (1.0)   2 (2.3)   4 (1.3)   5 (0.8)   4 (0.8)   1 (0.3)   4 (0.8)   8 (1.4)   1 (0.6)   5 (0.9)   0 (0.0)   (0.7)
CCG Pro    18 (3.5)  0 (0.0)   9 (2.9)   17 (2.8)  15 (2.9)  16 (5.4)  12 (2.4)  11 (1.9)  1 (0.6)   22 (3.9)  3 (3.1)   (2.6)
ACT Thr    5 (1.0)   0 (0.0)   2 (0.6)   2 (0.3)   1 (0.2)   4 (1.3)   1 (0.2)   9 (1.6)   0 (0.0)   4 (0.7)   1 (1.0)   (1.1)
ACC Thr    20 (3.8)  3 (3.4)   9 (2.9)   20 (3.3)  10 (1.9)  4 (1.3)   17 (3.5)  11 (1.9)  6 (3.7)   17 (3.0)  2 (2.1)   (2.5)
ACA Thr    2 (0.4)   2 (2.3)   0 (0.0)   4 (0.7)   0 (0.0)   0 (0.0)   2 (0.4)   0 (0.0)   2 (1.2)   4 (0.7)   1 (1.0)   (0.5)
ACG Thr    2 (0.4)   0 (0.0)   4 (1.3)   6 (1.0)   5 (1.0)   7 (2.3)   2 (0.4)   11 (1.9)  1 (0.6)   4 (0.7)   1 (1.0)   (1.1)
GCT Ala    9 (1.7)   1 (1.1)   4 (1.3)   6 (1.0)   6 (1.2)   1 (0.3)   9 (1.8)   7 (1.2)   1 (0.6)   6 (1.1)   1 (1.0)   (2.0)
GCC Ala    14 (2.7)  6 (6.8)   5 (1.6)   17 (2.8)  15 (2.9)  5 (1.7)   8 (1.6)   9 (1.6)   1 (0.6)   24 (4.3)  1 (1.0)   (2.3)
GCA Ala    19 (3.6)  2 (2.3)   6 (1.9)   8 (1.3)   3 (0.6)   6 (2.0)   8 (1.6)   11 (1.9)  3 (1.9)   9 (1.6)   0 (0.0)   (2.1)
GCG Ala    17 (3.3)  0 (0.0)   13 (4.2)  25 (4.2)  36 (7.0)  14 (4.7)  25 (5.1)  23 (4.1)  4 (2.5)   32 (5.7)  1 (1.0)   (3.5)
TAT Tyr    5 (1.0)   0 (0.0)   6 (1.9)   5 (0.8)   5 (1.0)   1 (0.3)   6 (1.2)   11 (1.9)  1 (0.6)   5 (0.9)   0 (0.0)   (1.3)
TAC Tyr    7 (1.3)   0 (0.0)   5 (1.6)   7 (1.2)   9 (1.7)   2 (0.7)   11 (2.2)  7 (1.2)   2 (1.2)   5 (0.9)   0 (0.0)   (1.5)
TAA End    0 (0.0)   0 (0.0)   1 (0.3)   1 (0.2)   0 (0.0)   0 (0.0)   1 (0.2)   1 (0.2)   1 (0.6)   1 (0.2)   1 (1.0)   (0.2)
TAG End    0 (0.0)   0 (0.0)   0 (0.0)   0 (0.0)   1 (0.2)   0 (0.0)   0 (0.0)   0 (0.0)   0 (0.0)   0 (0.0)   0 (0.0)   (0.02)
CAT His    9 (1.7)   2 (2.3)   4 (1.3)   4 (0.7)   7 (1.4)   6 (2.0)   2 (0.4)   14 (2.5)  1 (0.6)   10 (1.8)  3 (3.1)   (0.9)
CAC His    8 (1.5)   2 (2.3)   4 (1.3)   8 (1.3)   9 (1.7)   4 (1.3)   5 (1.0)   4 (0.7)   1 (0.6)   4 (0.7)   0 (0.0)   (1.2)
CAA Gln    13 (2.5)  4 (4.5)   3 (1.0)   7 (1.2)   6 (1.2)   2 (0.7)   3 (0.6)   12 (2.1)  3 (1.9)   9 (1.6)   2 (2.1)   (1.2)
CAG Gln    19 (3.6)  3 (3.4)   7 (2.3)   13 (2.2)  13 (2.5)  11 (3.7)  18 (3.7)  19 (3.4)  5 (3.1)   25 (4.4)  6 (6.2)   (3.2)
AAT Asn    5 (1.0)   6 (6.8)   3 (1.0)   5 (0.8)   2 (0.4)   3 (1.0)   3 (0.6)   10 (1.8)  2 (1.2)   9 (1.6)   2 (2.1)   (1.3)
AAC Asn    10 (1.9)  0 (0.0)   7 (2.3)   15 (2.5)  12 (2.3)  5 (1.7)   13 (2.6)  7 (1.2)   1 (0.6)   10 (1.8)  5 (5.2)   (2.7)
AAA Lys    14 (2.7)  1 (1.1)   10 (3.2)  21 (3.5)  16 (3.1)  7 (2.3)   28 (5.7)  16 (2.8)  6 (3.7)   14 (2.5)  2 (2.1)   (3.9)
AAG Lys    3 (0.6)   0 (0.0)   2 (0.6)   10 (1.7)  5 (1.0)   4 (1.3)   6 (1.2)   7 (1.2)   2 (1.2)   4 (0.7)   2 (2.1)   (1.2)
GAT Asp    18 (3.5)  1 (1.1)   10 (3.2)  24 (4.0)  13 (2.5)  11 (3.7)  17 (3.5)  18 (3.2)  5 (3.1)   17 (3.0)  3 (3.1)   (3.0)
GAC Asp    15 (2.9)  2 (2.3)   6 (1.9)   16 (2.7)  16 (3.1)  1 (0.3)   9 (1.8)   9 (1.6)   2 (1.2)   12 (2.1)  6 (6.2)   (2.5)
GAA Glu    17 (3.3)  3 (3.4)   16 (5.2)  22 (3.7)  33 (6.4)  15 (5.0)  28 (5.7)  13 (2.3)  8 (5.0)   21 (3.7)  2 (2.1)   (4.7)
GAG Glu    8 (1.5)   0 (0.0)   6 (1.9)   5 (0.8)   5 (1.0)   9 (3.0)   13 (2.6)  9 (1.6)   4 (2.5)   9 (1.6)   2 (2.1)   (1.9)
TGT Cys    5 (1.0)   0 (0.0)   2 (0.6)   5 (0.8)   1 (0.2)   2 (0.7)   3 (0.6)   2 (0.4)   0 (0.0)   7 (1.2)   2 (2.1)   (0.4)
TGC Cys    4 (0.8)   2 (2.3)   1 (0.3)   10 (1.7)  6 (1.2)   2 (0.7)   3 (0.6)   8 (1.4)   0 (0.0)   2 (0.4)   0 (0.0)   (0.6)
TGA End    0 (0.0)   1 (1.1)   0 (0.0)   0 (0.0)   0 (0.0)   1 (0.3)   0 (0.0)   0 (0.0)   0 (0.0)   0 (0.0)   0 (0.0)   (0.06)
TGG Trp    6 (1.2)   0 (0.0)   6 (1.9)   4 (0.7)   2 (0.4)   3 (1.0)   5 (1.0)   12 (2.1)  0 (0.0)   4 (0.7)   1 (1.0)   (1.1)
CGT Arg    8 (1.5)   3 (3.4)   8 (2.6)   9 (1.5)   6 (1.2)   2 (0.7)   14 (2.8)  8 (1.4)   5 (3.1)   8 (1.4)   2 (2.1)   (2.9)
CGC Arg    4 (0.8)   3 (3.4)   10 (3.2)  15 (2.5)  19 (3.7)  16 (5.4)  7 (1.4)   9 (1.6)   6 (3.7)   12 (2.1)  3 (3.1)   (2.2)
CGA Arg c  4 (0.8)   0 (0.0)   0 (0.0)   6 (1.0)   0 (0.0)   1 (0.3)   0 (0.0)   5 (0.9)   0 (0.0)   0 (0.0)   0 (0.0)   (0.2)
CGG Arg c  3 (0.6)   1 (1.1)   0 (0.0)   2 (0.3)   4 (0.8)   3 (1.0)   0 (0.0)   1 (0.2)   2 (1.2)   3 (0.5)   0 (0.0)   (0.3)
AGT Ser    3 (0.6)   1 (1.1)   0 (0.0)   3 (0.5)   0 (0.0)   1 (0.3)   0 (0.0)   4 (0.7)   2 (1.2)   3 (0.5)   0 (0.0)   (0.5)
AGC Ser    5 (1.0)   3 (3.4)   4 (1.3)   8 (1.3)   7 (1.4)   6 (2.0)   2 (0.4)   4 (0.7)   3 (1.9)   9 (1.6)   3 (3.1)   (1.5)
AGA Arg c  0 (0.0)   0 (0.0)   0 (0.0)   0 (0.0)   1 (0.2)   0 (0.0)   0 (0.0)   1 (0.2)   0 (0.0)   1 (0.2)   0 (0.0)   (0.1)
AGG Arg c  0 (0.0)   0 (0.0)   0 (0.0)   2 (0.3)   1 (0.2)   0 (0.0)   0 (0.0)   2 (0.4)   0 (0.0)   0 (0.0)   0 (0.0)   (0.06)
GGT Gly    16 (3.1)  1 (1.1)   11 (3.5)  20 (3.3)  10 (1.9)  4 (1.3)   17 (3.5)  13 (2.3)  3 (1.9)   12 (2.1)  0 (0.0)   (3.3)
GGC Gly    19 (3.6)  0 (0.0)   12 (3.9)  28 (4.7)  29 (5.6)  8 (2.7)   27 (5.5)  15 (2.6)  7 (4.3)   26 (4.6)  3 (3.1)   (3.1)
GGA Gly c  1 (0.2)   0 (0.0)   5 (1.6)   4 (0.7)   2 (0.4)   0 (0.0)   0 (0.0)   7 (1.2)   1 (0.6)   3 (0.5)   0 (0.0)   (0.5)
GGG Gly c  6 (1.2)   0 (0.0)   2 (0.6)   9 (1.5)   5 (1.0)   3 (1.0)   2 (0.4)   15 (2.6)  3 (1.9)   4 (0.7)   0 (0.0)   (0.8)

a The codons are arranged (top to bottom) in the order T, C, A, G for the first, second, and third positions,
respectively. The number of amino acids for each codon is indicated and the percentage of the total codon
usage for the protein coded by the gene in each column is given in parentheses. The ilv loci (G, M, E, etc.)
are indicated for each column.
b This compilation of 199 E. coli genes was prepared from the data presented by Maruyama et al. (1986), and
is based on release No. 38 (November, 1985) of the GenBank database.
c Modulatory codons (Grosjean and Fiers, 1982).
&
3091
,
3150
GATAAATGGGGCTGGTTAGATCAAGTTAATCAATARATACGGGACGGCACGCA
DKWGWLDQVNQ'
>)>>>>>>>)> 3210
CCGTCCCAT;'TACGAGACAGACACTG~~T~T~G~ATGCCT~GTACCGTTCCGC
c<<<<<<<<<<
MPKYRSA
3270
CACCACCACTCATGGTCGTATATGGCGGGTGCTCGTGCGT
TTTHGRNMAGARALWRATGM
3330
GACCGACGCCGATTTCGGTAGCCGATTATCGCGGTTGT;TTTGT
TDADFGKPIIAVVNSFTQFV
3390
ACCGGGTCACGTCCATCTGGCGATCTCG;;TAARCTGGT~GCCG~CAAATTG~GCGGC
PGHVHLRDLGXLVAEQIEAA
3450
TGGCGGCGTTGCCAAAGAGTTCAACACCATTGCGGTGGATGATGGGATTGCCATGGGCCA
GGVAKEFNTIAVDDGIAMGH
3510
CGGGGGGATGCTTTATTCACTGCCATCTCGCGRACTGAT;GCTGATTCCGTTGAGTATAT
G
GM
L YSLPSRELIADSVEYM
3570
GGTCAACGCCCACTGCGCCGACGCCATGGTCTGCATCTCTATCACCCC
VNAHCADAMVCISNCDKITP
3630
GGGGATGCTGATGGCTTCCSTGCGCCTGAATRTTCCGGTGCCGGCGGCCC
GMLMASLRLNIPVIFVSGGP
3690
GATGGAGGCCGGGAARACCACTTTCCGATCAGATCATCGC
MEAGKTKLSDQI
IKLDLVDA
3750
GATGATCCAGGGCGCAGACCCGAAAGTATCTGACTCCCAGCGTTC
MIQGADPKVSDSQSDQVERS
3810
CGCGTGTCCGACCTGCGGT;CCTGCTCCGGGATGTTTAC~GCT~CTC~TG~CTGCCT
ACPTCGSCSGMFTANSMNCL
3870
GACCGPAGCGCTGGGCCTGTCGCAGCCGGGCAACGGCTCGGC~CGCTGCTGGC~CCCACGCCGA
TEALGLSQPGNGSLLATHAD
3930
CCGTAAGCAGCTGTTCCTTATGCTGGTAAACGCATTGTTCGTTATTA
RXQLFLNAGKRIVELTKRYY
3990
CGAGC~~GACGAAAGT;;CACTGCCGCGTAATATCGCCAGT~GGCGGCGTTTG~
EQNDESALPRNIASKAAFEN
4050
CGCCATGACGCTGGATATCECGATGGGTGdATCGACTARCCTGCTGGC
AMTLDIAMGGSTNTVLHLLA
CG
AC
CA GC
GC
CG
AT
GC
CC
GC
TA
AT
AT
AAAATACG
‘$G = -25.0
4110
GGCGGCGCAGGAAGCGGAAATCGACTTCACCATGAGTGATATCGAT~GCTTTCCCGC~
AAQEAEIDFTMSDI
DKLSRK
4170
GGTTCCACAGCTGTGTRRRGTTGCGCCGAGCACCCAG~TACCATATGG~GATGTTCA
VPQLCKVAPSTQKYHMEDVH
4230
CCGTGCTGGTGGTGTTATCGGTATTCTCGGCGAACTGGATCG
RAGGVIGILGELDRAGLLNR
4290
TGATGTG~ACGTACTTGGCCTGACGTTGCCGCTG
DVXNVLGLTLPQTLEQYDVH
GCTGACCCAGGATGACGCGGT AAAAAA'
4350
TATGTTCCGCGCiGGTCCTGCAGGCATTCGTAC
LTQDDAVKNMFRAGPAGIRT
4410
CACACAGGCATTCTCGCAAGATTGCCGTTGGGATACGCTGGACGACGATCGCGCC~TGG
TQAFSQDCRWDTLDDDRANG
.
4470
CTGTATCCGCTCGCTGGARCACGCCTACAGC~GACGGCGGCCTGGCGGTGCTCTACGG
CIRSLEHAYSKDGGLAVLYG
4530
TAACTTTGCGGAAAACGGCTGCATCGTGAARACGGCAGGCGTCGATGACAGCATCCTCkA
NFAENGCIVKTAGVDDS
I L
K
4590
ATTCACCGGCCCGGCGRRAGTGTACGRRRGCCAGGACGAGCGGTAG~GCGATTCTCGG
FTGPAKVYESQDDAVEAILG
4650
CGGTAAAGTTGTCGCCGGAGATGTGCTAGTAATTCGCTATGGCGGTCC
GKVVAGDVVVIRYEGPKGGP
4710
GGGGATGCA;;GAAATGCTCTACCCAACCA;;CTTCCTGAC
GMQEMLYPTSFLKSMGLGKA
4770
CTGTGCGCTGATCACCGACGGTCGTTTCTCTGGTGGCACCTCTGG~CTTTCCATCGGCCA
CALITDGRFSGGTSGLSIGH
4830
CGTCTCACCEGAAGCGGCAAGCGGCGGCAGCAGCATTGGCCTGATTG~GATGGTGACCTGAT
v
s
P
EAASGGS
I G L
I
EDGDLI
4890
CGCTATCGACATCCCGRACCGTGGCATTCAGTTACAGGTlGCGGC
AIDIPNRGIQLQVSDAELAA
49so
GCGTCGTGAAGCGCAGGACGCTCGAGGTGACAAAGCCTGGACGCCGAAAAATCGTGRACG
R
REAQDARGDKAWTPKNRER
5010
TCAGGTCTCCTTTGCCCTGCGTGCTTATGCCAGCRGGCGC
QVSFALRAYASLATSADKGA
--------
.
5070
GGTGCGCGATAAATCGAAACTGGGGGTTAATARTGGCTGGCTGAC~CGC~CCCCTGTCCGGT
VRDKSKLGG**M
ADSQPLSG
.
>>>>>>>>>>>>
.<i<<<<<<<<
GCTCCGGAAGGTGCCGAATATTTARGRGCAGCAGTGCTGCGCGCGCCGGTTTACGAGGCGGCG
APEGAEYLRAVLRAPVYEAA
T T
T
GA
GC
CG
CA
G
G
CG
GC
CG
C
GC
CG
GC TA
GT
GC
GG
TT
AG
= -1
9.0
<<
<
. CAGGTTACGCCGCTACAAAiAA
5190
TGGAAAAACTGTCGTCGCGTCTTGATAACGTCATTCTG
QVTPLQKMEKLSSRLDNVIL
5250
GTGAAGCGCGAAGATCGCCAGCCAGTGCACAGCTTTAAGCTGCGCGGCG~ATACGCCATG
VKREDRQPVHSFKLRGAYAM
5310
ATGGCGGGCCTGACGGAAGAAG~GCGCACGGCGTGATCACTGCTTCTGCGGGTAAC
MAGLTEEQKAHGVITASAGN
5370
CACGCGCAGGGCGTCGCGTTTTCTTCTGCGCGGTTAGGCGG
HAQGVAFSSARLGVK
AL
I
VM 5430
CCAACCGCCACCGCCGACATCARAGTCGACGCGGTGCGCGGCTTCGGCGGCGRAGTGCTG
PTATADI
KVDAVRGFGGEVL
5490
CTCCACGGCGCGAACTTTGATGAAGCGAIL;GCCRAAGCG~TCG~CTGT~ACAGCAGCAG
LHGANFDEAKAKAIELSQQQ
5590
GGGTTCACCTGGGTGCCGCCGTTCGACCATCCGACCATCCGATGGTGATTGCCGGGC~GGCACGCTG
GFTWVPPFDHPMVIAGQGTL
5610
GCGCTGGAACTGCTCCAGCAGGACGCCCATCTCGACCGCGTATTTGTGC~AGTCGGCGGC
AL
EL
L
QQDAHLDRVFVPVGG
5670
GGCGGTCTGGCTGCTGGCGTGGCGGTGCTdATCAAACAAGTG
GGLAAGVAVLIKQLMPQIKV
.
5730
ATCGCCGTAGkRGCGGRTCCGCCTGCCTGRARGCAGCGCTGGATGCGGGTCATCCG
IAVEAEDSACLKAALDAGHP
5790
GTTGATCTGCCGCGCGTAGGGCTATTTGCTG~GGCGTAGCGGT~CGCATCGGTGAC
VDLPRVGLFAEGVAVKR
I
G
D
5890
GAAACCTTCCGTTTATGCCAGGAGTATCTCGACGACATCATCACCGTCGATAGCGATGCG
ETFRLCQEYLDDI
ITVDSDA
. 5910
ATCTGTGCGGCGATGAAGGATTTATTCGGATGTGCGCGCGGTGGCGG~CCCTCTGGC
ICAAMKDLFEDVRAVAEPSG
5970
GCGCTGGCGCTGGCGGGAATG ' AAAAAATA'iATCGCCCTGtACAACATTC&GGCGAACGG
ALALAGMKXY
IALHNXRGER
6030
CTGGCGCATATTCTTTCCGGTGCCAACGTGAACTTCCACG
LAHXLSGANVNFHGLRYVSE
6090
CGCTGCGAACTGGGCGRACAGCGTGMGCGG
R
c
ELGEQREALLAVTIPEEK
, 6150
GGCAGCTTCCTCARATTCTGCCAACTGCTGCGGGCGTT
GSFLKFCQLLGGRSVTEFNY
6210
CGTTTTGCCGATGCC~CGCCTGCATCTTTGTCGGTGTGCGCCTGAGCCGCGGCCTC
RFADAKNACIFVGVRLSRGL
6270
GAAGAGCGCAARG~TTTTGCAGATGCTCAACGACGGCGGCTACAGCGTGGTTGATCTC
EERKEILQMLNDGGYSVVDL .
. 6330
TCCGACGACGAAATGGCGRAGCTACACGTGCGCTATATGGTCGGCGGACGTCCATCGCAT
SDDEMAKLHVRYMVGGRPSH
6390
CCGTTGCAGGAACGCCTCTACAGCTTCGA;LTTCCCGGAATCACCGGGCGCGCTGCTGCGC
PLQERLYSFEFPESPGALLR
6450
TTCCTCAACACGCTGGGTACGTACTGGAACATTTCTTTGTTCCACTATCGCAGCCATGGC
FLNTLGTYWNISLFHYRSHG
6510
ACCGACTACGGGCGCGTACTGGCGGCGTTCGRRCTTGGCGACCATG~CCGGATTTCG~
TDYGRVLAAFELGDHEPDFE
6570
ACCCGGCTGAATGAGCTGGGCTACGATTGCCACGACGAAACCCGGCGTTCAGG
TRLNELGYDCHDETNNPAFR
6630
TTCTTTTTGGCGGGTTAGGGAAAAATGCCTGATAGCGCTTCGCTTATCAGGCCTACCCGC
FFLAG'
>>>>>>>>>>>
<<<<<<<<<<<<
6690
GCGACAACGTCATTTGTGGTTCGGCAAAATCTTCTTCCAG~TGCCTC~TTAGCGGCTCATG
l
>>>>>>>
6710
TAGCCGCTTTTTCTGCGCAC
<cc<<<<
T
C
T
:: GC
AT T
TA
AT
GC
c":
CG
AATGCCTA
AG = -19.6
(il.u% stop)
T
A
G
C
T
TA
CG
GC
zj:
GC
TTATTT
A G = -15.6
(FlyA
stop
)
Fig. 2. Nucleotide sequence of the ilvD and ilvA genes. The numbering is as shown in Fig. 1. Stop codons are
indicated by asterisks. Areas of inverted repeats are indicated (>, <) and the potential stem and loop
structures are indicated in the right margin, along with the calculated free energy. The potential
stem-and-loop structures at the ilvE-ilvD and ilvD-ilvA boundaries are of unknown significance, while those
centered around nt positions 6590-6700 resemble Rho-independent transcription termination sites for ilvC
(transcribed from right to left) and ilvA (transcribed from left to right), respectively. The asterisk at nt
position 6641 is centered at the TGA stop codon of the ilvC gene product coded by the complementary strand.
The overlined regions indicate an RBS for the ilvD mRNA at nt positions 3177-3180, the TAATAA stop codons for
ilvD and the ATG initiation codon for ilvA at nt positions 5039-5046.
TABLE II
Amino acid composition of ilv-coded peptides a

Amino     ilv loci                                                                                                                            E. coli (199
acid      G           M           E           D           A           Y           C           I           H           B           N           proteins) b
Ala       59 (11.3)   9 (10.2)    28 (9.0)    56 (9.3)    60 (11.7)   26 (8.7)    50 (10.2)   50 (8.8)    9 (5.6)     71 (12.6)   3 (3.1)     (9.9)
Arg       19 (3.6)    7 (8.0)     18 (5.8)    34 (5.7)    31 (6.0)    22 (7.4)    21 (4.3)    26 (4.6)    13 (8.1)    24 (4.3)    5 (5.2)     (5.7)
Asn       15 (2.9)    6 (6.8)     10 (3.2)    20 (3.3)    14 (2.7)    8 (2.7)     16 (3.3)    17 (3.0)    3 (1.9)     19 (3.4)    7 (7.2)     (4.0)
Asp       33 (6.3)    3 (3.4)     16 (5.2)    40 (6.7)    29 (5.6)    12 (4.0)    26 (5.3)    27 (4.8)    7 (4.3)     29 (5.2)    9 (9.3)     (5.5)
Cys       9 (1.7)     2 (2.3)     3 (1.0)     15 (2.5)    7 (1.4)     4 (1.3)     6 (1.2)     10 (1.8)    0 (0.0)     9 (1.6)     2 (2.1)     (1.0)
Gln       32 (6.1)    7 (8.0)     10 (3.2)    20 (3.3)    19 (3.7)    13 (4.4)    21 (4.3)    31 (5.5)    8 (5.0)     34 (6.0)    8 (8.2)     (4.4)
Glu       25 (4.8)    3 (3.4)     22 (7.1)    27 (4.5)    38 (7.4)    24 (8.1)    41 (8.3)    22 (3.9)    12 (7.5)    30 (5.3)    4 (4.1)     (6.6)
Gly       42 (8.1)    1 (1.1)     30 (9.7)    61 (10.2)   46 (8.9)    15 (5.0)    46 (9.3)    50 (8.8)    14 (8.7)    45 (8.0)    3 (3.1)     (4.7)
His       17 (3.3)    4 (4.5)     8 (2.6)     12 (2.0)    16 (3.1)    10 (3.4)    7 (1.4)     18 (3.2)    2 (1.2)     14 (2.5)    3 (3.1)     (2.1)
Ile       20 (3.8)    4 (4.5)     20 (6.5)    32 (5.3)    20 (3.9)    14 (4.7)    28 (5.7)    30 (5.3)    12 (7.5)    47 (8.3)    7 (7.2)     (5.8)
Leu       59 (11.3)   7 (8.0)     21 (6.8)    52 (8.7)    59 (11.5)   34 (11.4)   45 (9.1)    49 (8.6)    18 (11.2)   47 (8.3)    9 (9.3)     (9.6)
Lys       17 (3.3)    1 (1.1)     12 (3.9)    31 (5.2)    21 (4.1)    11 (3.7)    34 (6.9)    23 (4.1)    8 (5.0)     18 (3.2)    4 (4.1)     (5.1)
Met       19 (3.6)    4 (4.5)     8 (2.6)     22 (3.7)    12 (2.3)    8 (2.7)     21 (4.3)    23 (4.1)    3 (1.9)     21 (3.7)    4 (4.1)     (5.8)
Phe       17 (3.3)    3 (3.4)     13 (4.2)    15 (2.5)    25 (4.9)    11 (3.7)    16 (3.3)    13 (2.3)    3 (1.9)     17 (3.0)    5 (5.2)     (3.6)
Pro       30 (5.8)    2 (2.3)     13 (4.2)    29 (4.8)    21 (4.1)    21 (7.0)    17 (3.5)    28 (4.9)    3 (1.9)     37 (6.6)    3 (3.1)     (4.1)
Ser       22 (4.2)    8 (9.1)     20 (6.5)    38 (6.3)    24 (4.7)    17 (5.7)    21 (4.3)    32 (5.6)    14 (8.7)    23 (4.1)    4 (4.1)     (5.6)
Thr       29 (5.6)    5 (5.7)     15 (4.8)    32 (5.3)    16 (3.1)    15 (5.0)    22 (4.5)    31 (5.5)    9 (5.6)     29 (5.2)    5 (5.2)     (5.2)
Trp       6 (1.2)     0 (0.0)     6 (1.9)     4 (0.7)     2 (0.4)     3 (1.0)     5 (1.0)     12 (2.1)    0 (0.0)     4 (0.7)     1 (1.0)     (1.1)
Tyr       12 (2.3)    0 (0.0)     11 (3.5)    12 (2.0)    14 (2.7)    3 (1.0)     17 (3.5)    18 (3.2)    3 (1.9)     10 (1.8)    0 (0.0)     (2.8)
Val       39 (7.5)    11 (12.5)   25 (8.1)    47 (7.8)    40 (7.8)    26 (8.7)    31 (6.3)    56 (9.9)    19 (11.8)   34 (6.0)    10 (10.3)   (7.4)
Acidic d  58 (11.1)   6 (6.8)     38 (12.3)   67 (11.2)   67 (13.0)   36 (12.1)   67 (13.6)   49 (8.6)    19 (11.8)   59 (10.5)   13 (13.4)   (12.1)
Basic d   36 (6.9)    8 (9.1)     30 (9.7)    65 (10.9)   52 (10.1)   33 (11.1)   55 (11.2)   49 (8.6)    21 (13.0)   42 (7.5)    9 (9.3)     (10.8)
Arom d    35 (6.7)    3 (3.4)     30 (9.7)    31 (5.2)    41 (8.0)    17 (5.7)    38 (7.7)    43 (7.6)    6 (3.7)     31 (5.5)    6 (6.2)     (7.5)
Hydro d   172 (33.0)  29 (33.0)   104 (33.5)  184 (30.7)  172 (33.4)  99 (33.2)   163 (33.1)  201 (35.4)  58 (36.0)   180 (32.0)  36 (37.1)   (36.1)
Mr        56495       9704        34098       64219       56202       33175       54074       62033       17534       60447       11084       NA c
Q/D e
  arg     19/0        7/0         18/0        32/2        29/2        20/0        21/0        23/3        13/0        23/1        5/0         NA
  leu     34/25       2/5         20/1        48/4        49/10       25/9        43/2        29/20       9/9         32/15       8/0         NA
  ser     14/8        4/4         16/4        27/11       17/7        10/5        19/2        24/8        9/5         11/12       1/3         NA
Mod f     3.1%        2.3%        2.3%        4%          3.1%        3.0%        0.8%        5.5%        5.0%        2.3%        3.1%        NA
P2 g      0.61        0.38        0.56        0.39        0.46        0.29        0.58        0.43        0.40        0.59        0.48        NA

a The number of amino acids and its percentage (in parentheses) in each gene product is presented. In
addition, the Mr, Q/D ratio, % modulator codons, and the P2 index are indicated. See RESULTS AND DISCUSSION,
section e, for details.
b Maruyama et al. (1986). See Table I, footnote b.
c Not available.
d Acidic (Asp + Glu), basic (Arg + Lys), arom (aromatic: Phe + Trp + Tyr), hydro (hydrophobic: aromatic +
Ile + Leu + Met + Val).
e The Q/D value (quartet/duet) is calculated (Grantham et al., 1981) for amino acids with six codons. For
example, the ilvG gene uses 19 Arg codons (8 + 4 + 4 + 3) from the quartet (CGT, CGC, CGA, and CGG,
respectively) and none (0 + 0) from the duet (AGA and AGG, respectively) (Table I).
f Mod (modulatory codons: CUA, AUA, CGA, CGG, AGA, AGG, GGA, and GGG) are rarely used in highly expressed
genes (Grosjean and Fiers, 1982). For example, ilvG has 16 modulatory codons and 521 total codons
(16/521 = 3.1%) (Table I).
g The P2 index (Gouy and Gautier, 1982) was calculated as follows: (TTC + ATC + CCT + GCT + TAC + AAC +
CGT + GGT)/[(TTC + ATC + CCT + GCT + TAC + AAC + CGT + GGT) + (TTT, ATT, CCC, GCC, TAT, AAT, CGC, and GGC)].
The value for ilvG is (88)/[(88) + (57)] = 0.61.
similar overlap is seen in the trpEDCBA cluster (Yanofsky et al., 1981) at the trpE-trpD and trpB-trpA
junctions. This circumstance in trp correlates with the pairs of genes whose products are associated in
enzyme complexes, and, at least for the trpB-trpA pair, the presence of translational coupling of the two
polypeptides. The ilvD- and ilvA-coded peptides have not been observed to aggregate in cell extracts, and the
potential presence of translational coupling has not yet been tested. In bacteriophage λ, the sequence ATGA
occurs at the boundaries of a cluster of several genes; it provides both terminator (TGA) and (re)initiator
(ATG) codons (Kroger and Hobom, 1982). This overlapping sequence differs from that of the bacterial genes and
the significance of the sequence in bacteriophage λ is uncertain.
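The two kinds of stop/start overlap discussed here (TAATAATG at the ilvD-ilvA boundary and ATGA in the phage) can be checked with a short Python sketch. It is an illustration only: the function name `stop_start_overlaps` and the `frame` parameter (the offset of the upstream reading frame within the snippet) are hypothetical, and the snippet is assumed to be a short boundary sequence rather than a whole gene.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_start_overlaps(boundary, frame=0):
    """List (stop_index, shared_nt) for each upstream in-frame stop codon.

    boundary: short DNA snippet spanning the end of the upstream gene and
              the start of the downstream gene.
    frame:    index of the first complete upstream codon in the snippet.
    shared_nt is the number of nucleotides the stop codon shares with the
    first ATG found in the snippet.
    """
    s = boundary.upper()
    start = s.find("ATG")
    if start < 0:
        return []
    overlaps = []
    for i in range(frame, len(s) - 2, 3):
        if s[i:i + 3] in STOP_CODONS:
            # Length of the intersection of [i, i+3) and [start, start+3).
            shared = max(0, min(i + 3, start + 3) - max(i, start))
            overlaps.append((i, shared))
    return overlaps


print(stop_start_overlaps("TAATAATG"))       # [(0, 0), (3, 1)]
print(stop_start_overlaps("ATGA", frame=1))  # [(1, 2)]
```

For TAATAATG the first TAA shares nothing with the ATG and the second shares one nucleotide, as stated in the text; for ATGA the TGA and ATG share two nucleotides.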
(c) The ilvA gene
An ORF extending from an AUG at nt position 5044 to a UAG stop codon at nt position 6588 encodes a protein of
515 aa with an Mr of 56202 (Fig. 2, Tables I and II). We previously purified the ilvA gene product, threonine
deaminase (Calhoun et al., 1973) and its subunit Mr was estimated to be 51800 by equilibrium centrifugation
in 6 M guanidine HCl, and 53000 on SDS-10% polyacrylamide gels. The subunit size estimate for this peptide on
10% to 15% polyacrylamide gradient-SDS gels in maxicells (Gray et al., 1981) was 53000-55000. The predicted
and observed Mr estimates are, therefore, in good agreement.
An inverted repeat sequence is situated near the
5’-end of the ilvA coding region at nt positions
5192-5221. The ΔG for this potential stem and loop
is -19.0 kcal/mol, which should provide considerable
stability. The occurrence of the inverted repeats
in the ilvE-ilvD intercistronic region and near the
ilvD-ilvA boundary does not appear to be simply a
random event. A computer search for inverted
repeats in the ilvGMEDA coding region (nt positions
271-6588) using various limits for stem length, %
match, loop length, and loopout size within the stem
revealed dozens of inverted repeats, but the two with
the greatest predicted stabilities are those described
above. For comparison, the estimated free energies
for the base-paired regions of the ilv leader mRNA
are -26 to -28 kcal/mol, and -19.6 and -15.6 kcal/mol for
the postulated (Wek and Hatfield, 1986) Rho-independent
terminators for ilvC and ilvA, respectively
(Fig. 2). Therefore, it is reasonable to suspect
that these intercistronic inverted repeat sequences
have some functional significance that is yet to be determined.
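The kind of search described above can be illustrated with a minimal sketch. It is not the program used in this work: it finds only perfect inverted repeats (no mismatches or loopouts), it does not estimate ΔG, and the function names, parameter defaults, and toy sequence are hypothetical.

```python
# Minimal sketch of a perfect inverted-repeat (stem-loop) search.
# Assumptions: DNA alphabet ACGT only, exact stem pairing, fixed stem length.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq: str, stem: int = 6,
                          min_loop: int = 3, max_loop: int = 10):
    """Return (start, stem, loop) tuples; start is the 1-based position
    of the left arm, loop is the number of unpaired nt between the arms."""
    seq = seq.upper()
    hits = []
    last_start = len(seq) - 2 * stem - min_loop
    for i in range(last_start + 1):
        # The right arm must equal the reverse complement of the left arm.
        target = reverse_complement(seq[i:i + stem])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            if seq[j:j + stem] == target:
                hits.append((i + 1, stem, loop))
    return hits


# Toy sequence: left arm GCGCAT, 4-nt loop, right arm ATGCGC.
toy = "TTGCGCATAAAAATGCGCTT"
print(find_inverted_repeats(toy))
```

Ranking candidates by predicted stability, as was done in the text, would additionally require a free-energy model, which is deliberately left out here.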
(d) Database searches
The nucleic acid and protein databases have
undergone significant increases in size during the
past few years, so we searched for homologous
sequences not only for ilvD and ilvA, but also for ilvG,
ilvM, ilvE, ilvY, ilvB, ilvN, and ilvC. We conducted
separate searches of the EMBL and NIH nucleotide
databases using both strands of the ilv genes as query
sequences, and we searched the NBRF library using
the deduced ilv peptide sequences.
A significant homology was detected between the
peptide (Fig. 3) and nucleotide sequences of the ilvA
gene of E. coli and the ILV1 gene of yeast (Kielland-
Brant et al., 1984), which code for biosynthetic
threonine deaminases. The region of homology
encompasses the entire ilvA gene and all of the yeast
ILV1 gene except for the 150 to 160 bp at the 5’-end.
The yeast gene, in comparison to the E. coli gene, has
additional sequences at the 5’-end, and these
presumably encode the approx. 50 aa of the
N-terminal peptide that is processed in association
with mitochondrial import (Kielland-Brant et al.,
1984). A similar situation was found (Falco et al.,
1985) when the yeast ILV2-encoded acetolactate
synthase was compared to the E. coli enzyme, in that
the yeast enzyme contains an additional 90 aa at the
N-terminal end.
With one possible exception, no additional significant
homologies were detected between the ilv nucleotide
and amino acid sequences and those in the
databases. The possible exception, detected using
the FASTN procedure (Lipman and Pearson, 1985),
is a 989-bp overlap with 47% identity corresponding
to the 5’-end of the ilvC gene and the 3’-end of the
ilvIH region (the distal 26% of ilvI and all of ilvH).
While it is intriguing that there is some homology
between these two ilv loci, the extent of nucleotide
sequence similarity is of borderline significance, and
no significant homology was detected when the
predicted amino acid sequences were compared. In
addition, when more stringent algorithms were used
to test for relatedness among the ilv genes (see
RESULTS AND DISCUSSION, section b), this weak
homology was not detected.
Fig. 3. Comparison of the amino acid sequence of the threonine deaminases coded by the E. coli K-12 ilvA gene (abscissa) and the yeast
ILV1 gene (ordinate) using the MATRIX routine (see MATERIALS AND METHODS, section b). Note that extrapolation of the line
indicating homology intersects the ILV1 gene of yeast at approximately aa residue 50, indicating that the yeast peptide has additional
N-terminal residues relative to the ilvA coded peptide of E. coli (see RESULTS AND DISCUSSION, section d). A similar pattern of
homology can be found at the nucleotide level (not shown).
(e) Codon utilization and amino acid composition
Table I presents the codon utilization for each of
the eleven ilv loci that have been sequenced, and the
average codon utilization for 199 E. coli genes
(Maruyama et al., 1986). Table II presents the amino
acid composition of ilv-coded peptides, as well as
criteria (see MATERIALS AND METHODS, section b) thought to correlate with the level of gene expression,
including the Q/D ratio (Grantham et al., 1981), the
frequency of modulator codons (Grosjean and Fiers,
1982), and the P2 index (Gouy and Gautier, 1982).
The codon usage for ilv-coded peptides is quite
similar to the average of 199 E. coli peptides
(Table I). It may be significant that the TTA codon
for leucine is markedly more frequent in peptides
coded by ilvG (3.1%), ilvM (4.5%) and ilvH (4.3%)
than the average for E. coli (0.8%), and the ATT
codon for isoleucine is much less frequent in ilvG (0%) and
Gove (0%) than the average (2.4%). The tRNAs
corresponding to these codons are of average
abundance, and the significance of this observation,
particularly in the context of the role of the intracellular
concentration of leucine and isoleucine (along
with valine) in regulating expression of the
ilvGMEDA cluster (Freundlich et al., 1962), is
unknown.
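Codon percentages of the kind quoted above can be tabulated as in the following sketch. It assumes an in-frame coding sequence and expresses each codon as a percentage of all codons supplied, including any stop codon; whether Table I counts the stop codon is not stated here, and the function name and toy sequence are hypothetical.

```python
from collections import Counter


def codon_percentages(cds: str) -> dict:
    """Return {codon: percent of all codons} for an in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")  # accept RNA or DNA notation
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("sequence length must be a non-zero multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    counts = Counter(codons)
    return {codon: 100.0 * n / len(codons) for codon, n in counts.items()}


# Toy example (not an ilv sequence): ATG TTA TTA ATT TAA
usage = codon_percentages("ATGTTATTAATTTAA")
print(usage["TTA"], usage["ATT"])
```

A codon absent from the sequence is simply missing from the returned dictionary, so `usage.get("ATT", 0.0)` is the convenient way to read a 0% entry.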
The amino acid composition of the ilv-coded
peptides (Table II) is, without exception, typical of
the average amino acid composition for E. coli. This
observation is particularly significant with regard to
the average content of leucine, isoleucine, and valine
for the genes in the ilvGMEDA cluster in the context
of the phenomenon of downstream amplification
(Smith et al., 1976). It was originally noted (Kline
et al., 1974) that the extent of derepression, as
measured by enzyme assay of the ilvEDA gene pro-
ducts, was similar for all three enzymes during limita-
tion for valine or leucine. In contrast, limitation for
isoleucine resulted in a gradient of derepression, with
derepression greater for ilvD than ilvE, and greatest
for ilvA. At that time it was not possible to distinguish
between two alternative explanations for this result,
namely (i) the presence of isoleucine-specific differential
control for ilvE, ilvD and ilvA, and (ii) the
presence of progressively lower isoleucine composition
in the peptides coded by ilvE, ilvD, and ilvA,
respectively. Our results make it possible to exclude
the second hypothesis, since the isoleucine content of
these peptides varies by less than two-fold (Table II).
Two other observations are
of interest in this regard. First, rho mutations (e.g.,
rho221) mimic isoleucine limitation, in that the
derepression measured by enzyme assay is about
twice as great for ilvD as for ilvE, while that for ilvA is about
5-10-fold greater than for ilvE (Smith et al., 1976).
Second, the isoleucine-specific effect is seen even
when expression is mediated exclusively from site(s)
downstream from the ilvG promoter (Gray et al.,
1982). The molecular basis for these effects is currently
under investigation.
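The test of hypothesis (ii) amounts to comparing the isoleucine content of the deduced peptides. A minimal sketch of that comparison follows; the function names are hypothetical and the percentages in the usage lines are placeholders, not values from Table II.

```python
def residue_percent(peptide: str, residues: str = "I") -> float:
    """Percent of positions in a one-letter peptide occupied by `residues`."""
    peptide = peptide.upper()
    if not peptide:
        raise ValueError("empty peptide")
    hits = sum(peptide.count(r) for r in residues)
    return 100.0 * hits / len(peptide)


def fold_variation(values) -> float:
    """Ratio of the largest to the smallest value in a collection."""
    values = list(values)
    return max(values) / min(values)


# Placeholder isoleucine percentages for three peptides (hypothetical numbers).
ile = {"peptide_1": 5.0, "peptide_2": 6.0, "peptide_3": 8.0}
print(fold_variation(ile.values()))  # below 2.0 would argue against hypothesis (ii)
```

A progressive decline in isoleucine content large enough to explain a 5-10-fold gradient would show up as a fold variation well above two.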
The Q/D ratios (Grantham et al., 1981) for
arginine, leucine, and serine correlate with levels of
expression, and are higher in highly expressed genes
(61/1, 63/6, and 48/11, respectively) than in genes
with low levels of expression (57/13, 71/24, and
46/22, respectively). By these criteria the expression
of most of the ilv genes (Table II) is predicted to be
generally moderate, but the ilvC gene, with ratios of
21/0, 43/2, and 19/2, clearly fits into the highest
expression category.
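The Q/D counts can be sketched as follows, on the assumption that Q and D denote the quartet (four-codon family) and duet (two-codon family) codons of the three six-codon amino acids. That reading, the function name, and the toy sequence are assumptions made for illustration rather than details taken from the text.

```python
# Assumed definition: Q = quartet codons, D = duet codons for Arg, Leu, Ser.
QUARTET_PREFIX = {"Arg": "CG", "Leu": "CT", "Ser": "TC"}
DUET_CODONS = {
    "Arg": {"AGA", "AGG"},
    "Leu": {"TTA", "TTG"},
    "Ser": {"AGT", "AGC"},
}


def quartet_duet_counts(cds: str) -> dict:
    """Return {amino acid: (Q, D)} codon counts for an in-frame sequence."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore any incomplete trailing codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    result = {}
    for aa, prefix in QUARTET_PREFIX.items():
        q = sum(1 for c in codons if c.startswith(prefix))
        d = sum(1 for c in codons if c in DUET_CODONS[aa])
        result[aa] = (q, d)
    return result


# Toy sequence: CGT CGC AGA CTG TTA TCT AGC
print(quartet_duet_counts("CGTCGCAGACTGTTATCTAGC"))
```

The counts are reported as pairs rather than as a quotient because D can be zero, as in the 21/0 value quoted for arginine in ilvC.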
Modulator codons (Grosjean and Fiers, 1982)
correspond to tRNAs of lowest abundance in vivo
and are quite rare in genes with high levels of expression
(six modulator codons in 13 genes) but are
about 10-fold more common in genes expressed at a
lower level (68 modulator codons in 16 genes)
(Grantham et al., 1981). By this criterion most ilv
genes (Table II) fall into the intermediate expression
category, but, as seen with the Q/D ratios, the ilvC
gene is predicted to be the most highly expressed ilv
gene, with only 4/492 (0.8%) modulator codons.
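The figures quoted in this paragraph can be checked with a few lines of arithmetic. The per-gene comparison below ignores differences in gene length, which is a simplifying assumption; the helper names are hypothetical.

```python
def per_gene(n_codons: int, n_genes: int) -> float:
    """Average number of modulator codons per gene."""
    return n_codons / n_genes


def percent(part: int, total: int) -> float:
    """Express part/total as a percentage."""
    return 100.0 * part / total


high = per_gene(6, 13)    # highly expressed set: 6 modulator codons in 13 genes
low = per_gene(68, 16)    # weakly expressed set: 68 modulator codons in 16 genes
fold = low / high         # roughly the "about 10-fold" difference in the text
ilvC = percent(4, 492)    # modulator codons in ilvC, in percent

print(round(fold, 1), round(ilvC, 2))
```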
Finally, the P2 ratios (Gouy and Gautier, 1982)
vary from about 0.6 to 0.9 for the most highly
expressed genes (e.g., tufA, ompA, and lpp), and from
about 0.3 to 0.4 for the most weakly expressed genes
(e.g., regulatory proteins such as those specific for
trp, LX, and ura). By this measurement the ilv genes
(Table II) are generally intermediate, with some (ilvG,
ilvE, ilvC, and ilvB) at the lower extreme of the highly
expressed group. The ilvY gene, which codes for the
activator protein for the ilvC gene and is the only
regulatory protein in this group (Umbarger, 1983),
scores lowest by these criteria, in accord with
expectations.
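A P2 value can be computed from a coding sequence along the following lines. The second codon group (TTT, ATT, CCC, GCC, TAT, AAT, CGC, GGC) is taken from the definition given with Table II; only part of the first group is legible there, so TTC, ATC, CCT, and GCT are assumed from the WWC + SSU pattern of Gouy and Gautier (1982). The function name and toy sequence are hypothetical.

```python
# First group: codons counted in the numerator (partly assumed, see lead-in).
FAVOURED = {"TTC", "ATC", "TAC", "AAC", "CCT", "GCT", "CGT", "GGT"}
# Second group: synonymous alternatives, counted only in the denominator.
ALTERNATIVE = {"TTT", "ATT", "TAT", "AAT", "CCC", "GCC", "CGC", "GGC"}


def p2_index(cds: str) -> float:
    """Return favoured / (favoured + alternative) for an in-frame sequence."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore any incomplete trailing codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    favoured = sum(1 for c in codons if c in FAVOURED)
    alternative = sum(1 for c in codons if c in ALTERNATIVE)
    if favoured + alternative == 0:
        raise ValueError("no informative codons in sequence")
    return favoured / (favoured + alternative)


# Toy sequence: ATG TTC ATC TTT GGT GGC (3 favoured, 2 alternative codons)
print(p2_index("ATGTTCATCTTTGGTGGC"))
```

Codons outside both groups, such as ATG in the toy sequence, do not affect the result.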
The absolute number of molecules per cell for
most of the ilv-coded peptides is not known, and the
levels vary in response to control signals in non-uniform
patterns (e.g., see above in this section).
However, our previous studies of [35S]methionine
labelling of proteins following infection of lysogenic
hosts (to prevent readthrough from phage promoters)
with λdilv phage in which the ilvGMEDA and
ilvYC genes were monitored (Gray and Calhoun,
1982) can now be interpreted more quantitatively
since the results presented here (Table II) reveal that
the methionine composition for these gene products
varies by less than a factor of two. In those studies
we found that the ilvC gene product was produced at
roughly 5-10-fold higher levels than ilvGEDA, while
the ilvY coded peptide was not detected, thus providing
the expected correlation based on codon usage
(Table I).
(f) Searches for homology among the ilv genes
We conducted dot matrix analyses of all pairwise
combinations (see MATERIALS AND METHODS,
section b) of ilv nucleotide and amino acid sequences
to test for relatedness among these genes, and to test
for potential repeated sequences within each gene.
The analysis detected the previously reported
homologies between the ilvGM, ilvBN, and ilvIH loci
(Squires et al., 1983; Wek et al., 1985) that encode
isozymes of acetohydroxy acid synthases. No other
statistically significant homologies or internal repeats
were detected for the nucleotide or amino acid
sequences. These results do not indicate the presence
of retrograde evolution (Horowitz, 1985), which
invokes a sequential recruitment during evolution of
new enzymes in reverse order of the biochemical
pathway. It is not known if the very weak homology
noted in section d above between ilvC and ilvIH
reflects an ancestral relationship. Evidence for
retrograde evolution is found for some (e.g., the
β-ketoadipate pathway in bacterial genera; Yeh
et al., 1982) but not other (e.g., the trp operon of
E. coli; Yanofsky et al., 1981) pathways. Therefore,
alternative hypotheses, such as enzyme recruitment
based upon substrate ambiguity (Jensen, 1976), can
be considered viable for the genes in
the ilvGMEDA cluster.
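The dot matrix comparisons referred to in this section can be illustrated with a generic sliding-window sketch. It is not the MATRIX routine itself: the window size, match threshold, function names, and toy peptides are all hypothetical.

```python
from collections import Counter


def dot_matrix(seq_a: str, seq_b: str, window: int = 10, min_matches: int = 7):
    """Return (i, j) pairs (0-based) where the windows starting at i in seq_a
    and j in seq_b share at least `min_matches` identical positions."""
    dots = []
    for i in range(len(seq_a) - window + 1):
        wa = seq_a[i:i + window]
        for j in range(len(seq_b) - window + 1):
            wb = seq_b[j:j + window]
            if sum(x == y for x, y in zip(wa, wb)) >= min_matches:
                dots.append((i, j))
    return dots


def diagonal_offsets(dots) -> Counter:
    """Count dots per diagonal; the key j - i is the shift of seq_b vs seq_a."""
    return Counter(j - i for i, j in dots)


# Toy peptides: b is a with five extra N-terminal residues.
a = "MKTAYIAKQR"
b = "GGGGG" + a
dots = dot_matrix(a, b, window=5, min_matches=5)
print(diagonal_offsets(dots).most_common(1))
```

A single dominant diagonal with a positive offset corresponds to the situation in Fig. 3, where one peptide carries additional N-terminal residues relative to the other; unrelated sequences give only scattered dots.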
ADDENDUM
After this manuscript was submitted for publi-
cation we received a copy of a manuscript (Lawther
et al., 1987) containing the sequence of the E. coli
K-12 ilvD and ilvA genes. A comparison of these
independently determined sequences revealed twelve
differences at eight sites, including three additional
bases in our sequence between nt positions 4410 and
4500 that encode part of ilvD. We communicated
with these authors and their review of their se-
quencing data confirmed our sequence; a correction
will be submitted by these authors (R.P. Lawther,
personal communication).
ACKNOWLEDGEMENT
This research was supported by Public Health
Service grant GM23182 from the National Institutes
of Health.
REFERENCES
Calhoun, D.H., Rimerman, R.A. and Hatfield, G.W.: Threonine
deaminase from Escherichia coli, I. Purification and proper-
ties. J. Biol. Chem. 248 (1973) 3511-3516.
Dale, R.M.K., McClure, B.A. and Houchins, J.P.: A rapid single-
stranded cloning strategy for producing a sequential series of
overlapping clones for use in DNA sequencing: application
to sequencing the corn mitochondrial 18S rDNA. Plasmid 13
(1985) 31-40.
Falco, S.C., Dumas, K.S. and Livak, K.J.: Nucleotide sequence
of the yeast ILV2 gene which encodes acetolactate synthase.
Nucl. Acids Res. 13 (1985) 4011-4027.
Freundlich, M., Burns, R.O. and Umbarger, H.E.: Control of
isoleucine, valine, and leucine biosynthesis, I. Multi-valent
repression. Proc. Natl. Acad. Sci. USA 48 (1962) 1804-1808.
Gouy, M. and Gautier, C.: Codon usage in bacteria: correlation
with gene expressivity. Nucl. Acids Res. 10 (1982)
7055-7074.
Grantham, R., Gautier, C., Gouy, M., Jacobzone, M. and
Mercier, R.: Codon catalogue usage is a genome strategy
modulated for gene expressivity. Nucl. Acids Res. 9 (1981)
r43-r73.
Gray, J.E., Bennett, D.C., Umbarger, H.E. and Calhoun, D.H.:
Physical and genetic localization of ilv regulatory sites in λdilv
bacteriophages. J. Bacteriol. 149 (1982) 1071-1081.
Gray, J.E. and Calhoun, D.H.: Absence of significant membrane
localization of the proteins coded by the ilvGEDAC genes of
Escherichia coli K-12. J. Bacteriol. 151 (1982) 119-126.
Gray, J.E., Patin, D.W. and Calhoun, D.H.: Identification of the
protein products of the rrnC, ilv, rho region of the Escherichia
coli K-12 chromosome. Mol. Gen. Genet. 183 (1981)
-‘78-430.
Gray, J.E., Wallen, J.W. and Calhoun, D.H.: Identification of a
protein of 15000 daltons related to isoleucine-valine biosynthesis
in Escherichia coli K-12. J. Bacteriol. 151 (1982)
127-134.
Grosjean, H. and Fiers, W.: Preferential codon usage in
prokaryotic genes: the optimal codon-anticodon interaction
energy and the selective codon usage in efficiently expressed
genes. Gene 18 (1982) 199-209.
Horowitz, N.H.: On the evolution of biochemical synthesis. Proc.
Natl. Acad. Sci. USA 31 (1985) 153-157.
Jensen, R.A.: Enzyme recruitment in evolution of new function.
Annu. Rev. Microbiol. 30 (1976) 409-425.
Kielland-Brant, M., Holmberg, S., Petersen, J.G.L. and Nilsson-
Tillgren, T.: Nucleotide sequence of the gene for threonine
deaminase (ILV1) of Saccharomyces cerevisiae. Carlsberg
Res. Commun. 49 (1984) 567-575.
Kline, E.L., Brown, C.S., Coleman Jr., W.G. and Umbarger,
H.E.: Regulation of isoleucine-valine biosynthesis in an
ilvDAC deletion strain of Escherichia coli K-12. Biochem.
Biophys. Res. Commun. 57 (1974) 1144-1151.
Kröger, M. and Hobom, G.: A chain of interlinked genes in the
ninR region of bacteriophage lambda. Gene 20 (1982) 25-38.
Kuramitsu, S., Ogawa, T., Ogawa, H. and Kagamiyama, H.:
Branched chain amino acid aminotransferase of Escherichia
coli: nucleotide sequence of the ilvE gene and the deduced
amino acid sequence. J. Biochem. 97 (1985) 993-999.
Lawther, R.P., Calhoun, D.H., Adams, C.W., Hauser, C.A.,
Gray, J. and Hatfield, G.W.: Molecular basis of valine
resistance in Escherichia coli K-12. Proc. Natl. Acad. Sci.
USA 78 (1981) 922-925.
Lawther, R.P., Wek, R.C., Lopes, J.M., Pereira, R., Tallion, B.E.
and Hatfield, G.W.: The complete nucleotide sequence of the
ilvGMEDA operon of Escherichia coli K-12. Nucl. Acids Res.
15 (1987) 2137-2155.
Lipman, D.J. and Pearson, W.R.: Rapid and sensitive protein
similarity searches. Science 227 (1985) 1435-1441.
Maruyama, T., Gojobori, T., Aota, S.-I. and Ikemura, T.: Codon
usage tabulated from the GenBank sequence data. Nucl.
Acids Res. 14 (1986) r151-r197.
Messing, J.: New M13 vectors for cloning. Methods Enzymol.
101 (1983) 20-89.
Sanger, F., Nicklen, S. and Coulson, A.R.: DNA sequencing with
chain terminating inhibitors. Proc. Natl. Acad. Sci. USA 74
(1977) 5463-5467.
Smith, J.M., Smolin, D.E. and Umbarger, H.E.: Polarity and
regulation of the ilv gene cluster in Escherichia coli K-12. Mol.
Gen. Genet. 148 (1976) 111-124.
Squires, C.H., DeFelice, M., Devereux, J. and Calvo, J.M.:
Molecular structure of ilvIH and its evolutionary relationship
to ilvG in Escherichia coli K-12. Nucl. Acids Res. 11 (1983)
5299-5313.
Umbarger, H.E.: The biosynthesis of isoleucine and valine and
its regulation. In Herrmann, K.M. and Somerville, R.L.
(Eds.), Amino Acid Biosynthesis and Genetic Regulation.
Addison-Wesley, London, 1983, pp. 245-266.
von Hippel, P.H., Bear, D.G., Morgan, W.D. and McSwiggen,
J.A.: Protein-nucleic acid interactions in transcription: a
molecular analysis. Annu. Rev. Biochem. 53 (1984) 389-446.
Wek, R.C., Hauser, C.A. and Hatfield, G.W.: The nucleotide
sequence of the ilvBN operon of Escherichia coli: sequence
homologies of the acetohydroxy acid synthase isozymes.
Nucl. Acids Res. 13 (1985) 3995-4010.
Wek, R.C. and Hatfield, G.W.: Nucleotide sequence and in vivo
expression of the ilvY and ilvC genes in Escherichia coli K-12:
transcription from divergent overlapping promoters. J. Biol.
Chem. 261 (1986) 2441-2450.
Yanofsky, C., Platt, T., Crawford, I.P., Nichols, B.P., Christie,
G.E., Horowitz, H., VanCleemput, M. and Wu, A.M.: The
complete nucleotide sequence of the tryptophan operon of
Escherichia coli. Nucl. Acids Res. 9 (1981) 6647-6668.
Yeh, W.K., Shih, C. and Ornston, L.N.: Overlapping evolutionary
affinities revealed by comparison of amino acid compositions.
Proc. Natl. Acad. Sci. USA 79 (1982) 3794-3797.
Communicated by S.R. Kushner.