Codon preference in Corynebacteria


Gene, 134 (1993) 15-24 © 1993 Elsevier Science Publishers B.V. All rights reserved. 0378-1119/93/$06.00

GENE 07350

Codon preference in Corynebacteria

(Brevibacterium lactofermentum; Corynebacterium glutamicum; Corynebacterium diphtheriae; Gram+; synonymous codon usage; gene expression; G+C content; mutation pressure; molecular evolution)

Marcos Malumbres, José A. Gil and Juan F. Martín

Area of Microbiology, Department of Ecology, Genetics and Microbiology, Faculty of Biology, University of León, 24071 León, Spain

Received by K.F. Chater: 7 January 1993; Revised/Accepted: 19 April/22 April 1993; Received at publishers: 17 June 1993


SUMMARY

The codon usage (CU) of 34 genes from the closely related species Brevibacterium lactofermentum and Corynebacterium glutamicum (BLCG) was analysed and compared with that of 23 genes from other Brevibacterium and Corynebacterium species. The G+C content of the BLCG genes ranged from 50 to 62%. A wider range was found in other corynebacterial genes (25-71%). The G+C contents of non-coding regions in glutamic acid bacteria are lower than those of the coding regions, and both values are lower than the G+C content of ribosomal RNA (rRNA) sequences, suggesting an unusual biased mutation pressure. The CU and synonymous codon usage (SCU) analysis showed several common characteristics among the sequenced corynebacterial genes, consistent with the close relatedness of B. lactofermentum and C. glutamicum. A subset of 25 preferred codons was deduced from the presumably highly expressed genes; these codons encode most of the amino acid (aa) residues of the BLCG group. An analysis of the effective number of codons (Nc) was carried out in order to check the dependence of the CU in corynebacteria on GC3s (G+C content at the silent third position of sense codons). Nc values showed differences between the BLCG group and other corynebacterial sequences. A comparison of the most used codons for each aa showed a stronger similarity to Streptomyces than to Escherichia coli. The CU/SCU tables of corynebacteria are useful for identification of protein-coding regions, including start codons when they are uncertain, and for designing oligodeoxyribonucleotide probes from an aa sequence.

Correspondence to: Dr. J.F. Martín, Area of Microbiology, Department of Ecology, Genetics and Microbiology, Faculty of Biology, University of León, 24071 León, Spain. Tel. (34-87) 291505; Fax (34-87) 291506; e-mail: JAGIL@ELEULE11.

Abbreviations: aa, amino acid(s); B., Brevibacterium; BLCG, B. lactofermentum and C. glutamicum group of sequences; bp, base pair(s); BSP, Brevibacterium sp. (excluding B. lactofermentum) group of sequences; C., Corynebacterium; CGI, codon preference index of Gribskov et al. (1984); CGIHI, CGI calculated using the CU table of the presumably highly expressed genes of BLCG; CGIT, CGI calculated using the CU table of all the genes of BLCG; CU, codon usage; GC3s, G+C content at silent third positions of sense codons; kb, kilobase(s) or 1000 bp; Nc, effective number of codons used in a gene; nt, nucleotide(s); ORF, open reading frame; P/T, percentage of preferred codons used (preferred codons over total codons of a sequence, excluding Glu, Met and Trp codons); r, ribosomal; RF, releasing factor; SCU, synonymous codon usage.

INTRODUCTION

Corynebacteria are irregular, non-sporing, Gram+ rods widely distributed in nature (Jones and Collins, 1986). Some are plant or animal pathogens. Other, non-pathogenic species are used in the production of amino acids (aa). They include Corynebacterium and Brevibacterium sp., which are well defined on the basis of structural and biochemical criteria and possess a peptidoglycan based on meso-diaminopimelic acid. However, corynebacteria are heterogeneous with respect to their mol% G+C content between species (46-78%; Yamada and Komagata, 1970). Nevertheless, the G+C content of the so-called glutamic acid-producing corynebacteria falls into a narrow range from 51.2 to 54.4% (Yamada and Komagata, 1970). Two of the aa-producing corynebacteria, B. lactofermentum and C. glutamicum (sometimes considered to be a single species), have been studied in some detail, and several genes involved in the glycolytic pathway and in the biosynthesis of aa have been cloned and sequenced (Martín, 1989).

The genera Corynebacterium and Brevibacterium are classified as Gram+ bacteria of high G+C content, close to Nocardia and mycobacteria and not distant from Streptomyces (Park et al., 1987; Woese, 1987). In general, the codon usage of microorganisms is related to the nucleotide (nt) composition of their genomes (Muto and Osawa, 1987). Special attention has been given to the variation of the G+C content at the silent site of codons (GC3s; Lloyd and Sharp, 1992) in relation to the average nt composition of the genome (Muto and Osawa, 1987).

In addition to species-specific variation in G+C content and CU, differences in codon preference exist among the genes within a species (Andersson and Kurland, 1990). The synonymous codon usage (SCU) bias of a gene is also related to its level of in vivo expression in several prokaryotic and eukaryotic systems (see reviews by Ernst, 1988; Andersson and Kurland, 1990). The codon preference patterns of several organisms have been compiled (Sharp et al., 1988), and studies of codon bias in the high-G+C Micrococcus luteus (Ohama et al., 1990) and in species of Streptomyces (Wright and Bibb, 1992) have been published recently.

In this paper we study codon usage in the group of corynebacteria with diverse G+C base composition, and its relationship with the G+C content, by analysing all the published nt sequences of Brevibacterium and Corynebacterium species. Actual CU and SCU tables are presented and correlated with the tables for the assumed highly expressed genes of B. lactofermentum and C. glutamicum.

RESULTS AND DISCUSSION

(a) G+C content of the analysed sequences

The corynebacterial sequences analysed were taken from the GenBank and EMBL databases (Table 1). B. lactofermentum and C. glutamicum are the most represented species because of their importance in industrial microbiology and biotechnology (Martín, 1989). Most of the published sequences correspond to genes involved in the glycolytic and aa biosynthetic pathways. Sequences from C. diphtheriae include the corynephage β tox gene, plasmid sequences (from pNG2 and pEP2) and a unique chromosomal DNA sequence (orf1-dtxR-orf3). The remaining corynebacterial species are represented by one or two sequences.

The G+C content of the ORFs present in the corynebacterial sequences spans from 25 to 72% (Table 1), a range similar to that of the G+C content of genomic DNA in eubacteria (25-74%; Muto and Osawa, 1987). Sequences related to C. diphtheriae contain 42.6 to 58.1% G+C. These sequences form a heterogeneous group because they include the corynephage β tox gene and plasmid sequences, but all are expressed in C. diphtheriae. The sequences from B. sterolicum, B. liquefaciens, B. ammoniagenes and Brevibacterium sp. R312 contain 60-67% G+C and are clustered in the BSP group. Two sequences from other Corynebacterium sp. of high G+C content (G+C = 67-71%) have not been considered because they are so few. The remaining sequences (the majority) come from B. lactofermentum and C. glutamicum strains and are clustered in the BLCG group (50-62% G+C), excluding the pBL1 sequence. The average G+C composition of the BLCG group is 54.8%, close to the values obtained from the thermal melting point behaviour of B. lactofermentum (53.7%) and C. glutamicum (53.4%) chromosomal DNAs (Yamada and Komagata, 1970; Jones and Collins, 1986).

Several sequences have atypical G+C composition and are not included in any of the groups. Thus, the G+C content of the bepIM gene (33%) does not correlate with the average content described for B. epidermidis (63.3-63.5%; Jones and Collins, 1986), and a similar observation can be made for TC-2 from C. nephridii and the pTZ12 plasmid of C. xerosis. Plasmid sequences have, in general, a lower G+C content than the chromosomal DNA, as occurs in pBL1 of B. lactofermentum and in the partition region of B. stationis plasmid pBY503 (Table 1). The average GC3s is 0.62 in BLCG, 0.72 in BSP, and 0.92 in the two Corynebacterium sp. genes. The GC3s of individual genes varied from 0.45 to 0.74 in the BLCG group.

According to the neutral theory of evolution and mutation pressure (biased A·T/G·C pressure), since most parts of intergenic spacers are relatively unimportant, most mutations in these regions are selectively neutral, and the evolutionary rate is higher than in the other parts of the genome. The mutation pressure bias is thus stronger for spacers, weaker for protein-encoding genes, and weaker still for stable RNA-encoding genes, although their contributions to the average G+C content are 70-80% of the genome for protein-encoding genes, 20-30% for spacers including various signals, and less than 1% for stable RNA-encoding genes (Muto and Osawa, 1987). These authors also predict that the G+C contents of protein-coding genes and non-coding DNA will be similar in G+C-rich genomes. The average G+C content of BLCG sequences is 54.8% (in 58 978 bp analysed). The G+C content of the BLCG ORFs is 56.7% (in 39 261 bp; 66.6% of BLCG sequenced DNA) and the G+C content of the spacers is

TABLE 1

Corynebacterial sequences analysed

Genetic element or phenotypic marker | Sequence size (bp) | Gene (a) or RNA | ORF (b) or RNA size (bp) | G+C (%) | Accession number

B. lactofermentum
pBL1 | 4448/4457 | | | | D00038, M00038/X03987
Homoserine dehydrogenase and homoserine kinase | 2618 | hom | 1338 | 55.60 | Y00476
 | | thrB | 930 | 54.41 | Y00140
trp operon | 7725/1461/621 | trpL | 54 | 50.00 | X04960, M16663 and M16664
 | | trpE | 1557 | 57.28 |
 | | trpG | 627 | 54.55 |
 | | trpD | 1047 | 59.12 |
 | | trpC | 1425 | 57.96 |
 | | trpB | 1251 | 62.11 |
 | | trpA | 846 | 59.22 |
Dihydrodipicolinate synthase, dihydrodipicolinate reductase and orf2 product | 3573 | dapB | 744 | 54.62 | Z21502
 | | orf2 | 558 | 50.80 |
 | | dapA | 903 | 52.64 |
Threonine synthase | 1781 | thrC | 1446 | 56.37 | Unpublished (c)
orfX (DAP decarboxylase-related gene) | 2503 | orfX | 1554 | 57.66 | Z21501

C. glutamicum
Prephenate dehydratase | 1088 | pheA | 948 | 61.60 | M13774
Meso-diaminopimelate-D-dehydrogenase | 1829 | ddh | 963 | 60.02 | Y00151
Homoserine dehydrogenase and homoserine kinase | 3685 | hom | 1338 | 55.60 | Y00546
 | | thrB | 930 | 54.30 |
Leader region and anthranilate synthase | 1890 | trpL | 54 | 50.00 | X55994
 | | trpE | 1557 | 57.42 |
Fructose-bisphosphate aldolase | 3371 | fda | 1031 | 56.33 | X17313
Phosphoenol-pyruvate carboxylase | 4885 | ppc | 2760 | 56.96 | M25819
 | 3292 | | | | X14234
DAP decarboxylase | 1455 | lysA | 1338 | 58.89 | X07563
 | 2344 | orfX | 1554 | 57.40 | X54740
Aspartokinase and aspartate β-semialdehyde dehydrogenase | 2803 | lysC | 1266 | 53.79 | X57226
 | | asd | 1035 | 56.33 |
Dihydrodipicolinate synthetase | 1341 | dapA | 903 | 52.83 | X53993
Threonine synthase | 3120 | thrC | 1446 | 56.15 | X56037
5S rRNA | 121 | 5S RNA | | 59.50 | M16175
Lysine uptake (lysI) gene | 1785 | lysI | 1506 | 53.03 | X60312
Glutamate dehydrogenase | 2082 | gdh | 1347 | 56.05 | X59404
PS1 protein | 2547 | csp1 | 1980 | 57.38 | X66078
Glyceraldehyde-3-phosphate dehydrogenase, 3-phosphoglycerate kinase and triosephosphate isomerase | 3690 | gap | 1011 | 56.57 | X59403
 | | pgk | 1212 | 57.68 |
 | | tpi | 780 | 57.57 |

C. diphtheriae related sequences
Corynephage β-toxin | 1942 | tox | 1720 | 42.60 | K01722
 | 1683 | | 1683 | | X00703
Tox iron-dependent regulatory element | 2529 | orf1 | 432 | 48.85 | M34239
 | | dtxR | 681 | 50.07 |
 | | orf3 | 987 | 49.24 |
pSV5 plasmid and gene for erythromycin methyltransferase | 4483 | ermA1 | 762 | 56.43 | X57320
pNG2 gene for erythromycin methyltransferase | 1517 | ermA2 | 762 | 56.52 | X51472
pEP2 replication-associated protein | 1848 | rep | 783 | 58.11 | M60619, M34342
pNG2 methyltransferase and leader peptide | 1810 | ermCL | 66 | 53.03 | M36726
 | | ermCd | 855 | 56.02 |

Brevibacterium sp. sequences
Brevibacterium sp. enantiomer-selective amidase and nitrile hydratase α subunit | 2447 | amdA | 1566 | 62.58 | M60264, M32282
 | | NIHα | 564 | 63.30 |
B. stationis plasmid pBY503, partition region | 673 | par | | 49.03 | D00661
B. liquefaciens adenylate cyclase | 2256 | cya | 1212 | 66.58 | X57541
Brevibacterium sp. R312 amidase | 2641 | amiE | 1038 | 60.79 | M76451
 | | orf1 | 1056 | 64.20 |
 | | orf2 | 357 | 61.34 |
B. ammoniagenes fatty acid synthetase | 10549 | fas | 9315 | 60.93 | S90715
B. sterolicum cholesterol oxidase | 2385 | choB | 1659 | 64.86 | D00712
B. epidermidis modification methylase gene | 1721 | bepIM | 1211 | 33.16 | X13555
B. helvolum 5S rRNA | 120 | 5S RNA | | 63.33 | X08003
B. linens 5S rRNA | 123 | 5S RNA | | 55.28 | M16205
B. bifidum small subunit rRNA gene | 1532 | rRNA | | | M38018

Corynebacterium sp. sequences
Corynebacterium sp. 2,5-diketo-D-gluconate reductase | 1639 | DKG2 | 837 | 67.74 | M12799
 | 873 | DKG1 | 834 | 70.62 | M21193
C. nephridii thioredoxin C-2 | 615 | TC-2 | 371 | 38.71 | J02801
C. variabilis 16S RNA | 1458 | 16S RNA | | 56.28 | X53185
C. renale 16S RNA | 1366 | 16S RNA | | 56.25 | M29553
C. aquaticum 5S rRNA | 120 | 5S RNA | | 61.67 | M16169
C. xerosis (CNF 010) 5S rRNA | 122 | 5S RNA | | 65.57 | M16177
C. xerosis (JCM 1971) 5S rRNA | 121 | 5S RNA | | 66.12 | M16204
C. xerosis pTZ12 plasmid, Rep protein and chloramphenicol-resistance protein | 2517 | rep | 945 | 31.22 | M16192
 | | CR | 647 | 25.16 |

(a) If a gene symbol has not been previously assigned, an abbreviation of the gene product is given in upper case italics.
(b) Stop codons are included.
(c) M.M., L.M. Mateos, C. Guerrero and J.F.M. (unpublished).
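Table 1 reports the G+C content of each ORF or RNA, and the text summarises each group by its average GC3s. As a minimal sketch (not the authors' software), both quantities can be computed as below. It assumes the standard genetic code and reads GC3s, following the paper's definition, as the G+C fraction at silent third positions of sense codons, so Met, Trp and stop codons are skipped. The helper names `gc_content` and `gc3s` are hypothetical.

```python
# Minimal sketch: G+C content and GC3s of a nucleotide sequence.
# Assumes the standard genetic code; codons are handled in the RNA alphabet (U).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def gc_content(seq: str) -> float:
    """G+C as a percentage of all A, C, G and T/U bases in the sequence."""
    bases = [b for b in seq.upper() if b in "ACGTU"]
    if not bases:
        raise ValueError("sequence contains no nucleotides")
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


def gc3s(orf: str) -> float:
    """G+C fraction at silent third positions of sense codons.

    Met (AUG), Trp (UGG) and stop codons are skipped because their third
    position is not synonymous; codons containing other symbols are ignored.
    """
    rna = orf.upper().replace("T", "U")
    thirds = []
    for i in range(0, len(rna) - 2, 3):
        codon = rna[i:i + 3]
        aa = CODE.get(codon)
        if aa is None or aa in "MW*":
            continue
        thirds.append(codon[2])
    if not thirds:
        raise ValueError("no codons with synonymous third positions")
    return sum(b in "GC" for b in thirds) / len(thirds)


print(round(gc_content("ATGGCCAAATAA"), 2))  # 33.33
print(gc3s("ATGGCCAAATAA"))  # 0.5
```

In the usage lines, AUG and the UAA stop are excluded from GC3s, leaving GCC and AAA, hence 0.5. Because stop codons are kept by `gc_content`, it matches the convention of footnote (b) when applied to a whole ORF.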

et al., 1990). Preference for UAA (over UAG and UGA) has been described in E. coli, Bacillus subtilis and Saccharomyces cerevisiae, probably because that codon is recognised by both release factors RF1 and RF2, whereas UAG and UGA are each recognised by only one release factor (Sharp and Bulmer, 1988).

(d) Codon preference indexes and gene expression

CU in corynebacteria was quantified by different methods. Gribskov et al. (1984) described an algorithm for analysing codon preference and predicting the level of gene expression in a sequence. The codon preference index of Gribskov (CGI) uses the nt sequence as input data and computes the value for each codon of the sequence over a context window, in comparison with a previously established CU table. This index is correlated with the level of gene expression when a CU table from presumably highly expressed genes is used (Fig. 1).

We have selected 25 'preferred' codons (Bennetzen and Hall, 1982), chosen so that, for each aa, they account for more than 75% of that aa's codons in the two most biased sequences of the presumably highly expressed genes (gap and fda; Table III) and are the most used codons in all the other presumably highly expressed sequences. The BLCG 'preferred' codons selected are: GCA, GCU, CGC, CGU, AAC, GAC, UGC, CAG, GGC, GGU, CAC, AUC, CUC, CUG, CUU, AAG, UUC, CCA, CCU, UCC, UCU, ACC, UAC, GUC, and GUU. Only the two codons encoding Glu (GAA/GAG = 6/15 in gap and 11/23 in fda) could not be distinguished according to this definition of 'preferred' codons.

Table III shows the use of these 'preferred' codons in the BLCG sequences as the ratio of 'preferred' codons over total codons (P/T), excluding Glu, Met and Trp codons (Met and Trp were not used as these aa have no degenerate codons). The chosen 'preferred' codons encode most of the aa residues, and only the orf2 sequence (unknown function; Pisabarro et al., 1993) has a P/T value lower than 0.5. Nine of the 25 'preferred' codons of BLCG sequences have A or T in the third position, compared with 11 in E. coli (CU data from Gribskov et al., 1984) and two in a subset of 25 'preferred' codons of the presumably highly expressed EF-Tu gene of Streptomyces coelicolor (data from Wright and Bibb, 1992). A strong correlation can be seen between the values of BLCG genes obtained using the CGI and P/T algorithms, whereas GC3s shows no correlation with the expression-related indexes (Table III). Low values of the expression-related indexes, such as those of trpL, have previously been predicted for the leader genes of aa biosynthesis operons of E. coli (Gouy and Gautier, 1982).

Wright (1990) published a new strategy to investigate patterns of SCU bias that quantifies how far the codon

TABLE II

CU and SCU of the BLCG and BSP corynebacterial groups

aa | Codon | BLCG (a) CU | BLCG (a) SCU | BSP (b) CU | BSP (b) SCU | aa | Codon | BLCG (a) CU | BLCG (a) SCU | BSP (b) CU | BSP (b) SCU
Phe | UUU | 94 | 0.214 | 44 | 0.308 | Tyr | UAU | 71 | 0.208 | 22 | 0.198
Phe | UUC | 346 | 0.786 | 99 | 0.692 | Tyr | UAC | 271 | 0.792 | 89 | 0.802
Leu | UUA | 39 | 0.032 | 10 | 0.023 | ter (c) | UAA | 26 | 0.765 | 1 | 0.200
Leu | UUG | 211 | 0.176 | 98 | 0.228 | ter | UAG | 5 | 0.147 | 1 | 0.200
Leu | CUU | 216 | 0.180 | 33 | 0.077 | His | CAU | 50 | 0.177 | 18 | 0.184
Leu | CUC | 265 | 0.221 | 98 | 0.228 | His | CAC | 232 | 0.823 | 80 | 0.816
Leu | CUA | 68 | 0.057 | 21 | 0.049 | Gln | CAA | 92 | 0.241 | 47 | 0.278
Leu | CUG | 401 | 0.334 | 170 | 0.395 | Gln | CAG | 289 | 0.759 | 122 | 0.722
Ile | AUU | 222 | 0.321 | 76 | 0.318 | Asn | AAU | 85 | 0.183 | 41 | 0.270
Ile | AUC | 467 | 0.675 | 161 | 0.674 | Asn | AAC | 379 | 0.817 | 111 | 0.730
Ile | AUA | 3 | 0.004 | 2 | 0.008 | Lys | AAA | 84 | 0.188 | 28 | 0.193
Met | AUG | 225 | 1.000 | 110 | 1.000 | Lys | AAG | 364 | 0.813 | 117 | 0.807
Val | GUU | 359 | 0.324 | 53 | 0.124 | Asp | GAU | 382 | 0.460 | 118 | 0.382
Val | GUC | 314 | 0.283 | 194 | 0.454 | Asp | GAC | 448 | 0.540 | 191 | 0.618
Val | GUA | 99 | 0.089 | 39 | 0.092 | Glu | GAA | 465 | 0.507 | 159 | 0.497
Val | GUG | 336 | 0.303 | 141 | 0.330 | Glu | GAG | 452 | 0.493 | 161 | 0.503
Ser | UCU | 178 | 0.214 | 35 | 0.117 | Cys | UGU | 19 | 0.224 | 8 | 0.320
Ser | UCC | 385 | 0.462 | 92 | 0.307 | Cys | UGC | 66 | 0.776 | 17 | 0.680
Ser | UCA | 68 | 0.082 | 21 | 0.070 | ter | UGA | 3 | 0.088 | 3 | 0.600
Ser | UCG | 57 | 0.068 | 91 | 0.303 | Trp | UGG | 111 | 1.000 | 791 | 1.000
Pro | CCU | 141 | 0.255 | 24 | 0.096 | Arg | CGU | 214 | 0.301 | 36 | 0.118
Pro | CCC | 72 | 0.130 | 54 | 0.216 | Arg | CGC | 399 | 0.560 | 181 | 0.593
Pro | CCA | 261 | 0.473 | 55 | 0.220 | Arg | CGA | 49 | 0.069 | 26 | 0.085
Pro | CCG | 78 | 0.141 | 117 | 0.468 | Arg | CGG | 24 | 0.034 | 36 | 0.118
Thr | ACU | 122 | 0.165 | 25 | 0.077 | Ser | AGU | 24 | 0.029 | 11 | 0.036
Thr | ACC | 483 | 0.654 | 202 | 0.623 | Ser | AGC | 121 | 0.145 | 50 | 0.167
Thr | ACA | 56 | 0.076 | 23 | 0.071 | Arg | AGA | 9 | 0.013 | 4 | 0.013
Thr | ACG | 77 | 0.104 | 74 | 0.229 | Arg | AGG | 17 | 0.024 | 6 | 0.020
Ala | GCU | 449 | 0.287 | 88 | 0.126 | Gly | GGU | 313 | 0.279 | 107 | 0.223
Ala | GCC | 343 | 0.219 | 274 | 0.393 | Gly | GGC | 602 | 0.536 | 271 | 0.565
Ala | GCA | 521 | 0.333 | 162 | 0.233 | Gly | GGA | 178 | 0.159 | 65 | 0.135
Ala | GCG | | 0.162 | 173 | 0.248 | Gly | GGG | | | |

(a) B. lactofermentum and C. glutamicum genes (excluding pBL1 sequence; 13 083 codons).
(b) Genes from other Brevibacterium species (excluding BepI sequence; 5118 codons).
(c) ter = stop codon.
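Two of the codon indexes reported in Table III, P/T and Nc, can be sketched in Python as follows. This is an illustration under stated assumptions, not the Nc program used by the authors. P/T counts the 25 BLCG 'preferred' codons listed in section (d) over all codons except Glu, Met and Trp; stop codons stay in the denominator because the definition excludes only those three aa (an assumption). Nc follows Wright's (1990) formula Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, where F is each amino acid's codon homozygosity averaged within its degeneracy class. The fallback for a missing Ile class and the cap at 61 follow common practice and are assumptions, as are all function names.

```python
from collections import Counter, defaultdict

# Standard genetic code in the RNA alphabet (assumption: no alternative code).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
FAMILY_SIZE = Counter(aa for aa in CODE.values() if aa != "*")  # codons per aa

# The 25 BLCG 'preferred' codons listed in section (d).
PREFERRED = set(
    "GCA GCU CGC CGU AAC GAC UGC CAG GGC GGU CAC AUC CUC CUG CUU AAG "
    "UUC CCA CCU UCC UCU ACC UAC GUC GUU".split()
)


def codons_of(orf: str) -> list[str]:
    """Split an ORF (DNA or RNA alphabet) into complete codons."""
    rna = orf.upper().replace("T", "U")
    return [rna[i:i + 3] for i in range(0, len(rna) - 2, 3)]


def preferred_over_total(orf: str) -> float:
    """P/T: preferred codons over all codons except Glu, Met and Trp codons."""
    counted = [c for c in codons_of(orf) if c in CODE and CODE[c] not in "EMW"]
    return sum(c in PREFERRED for c in counted) / len(counted)


def effective_number_of_codons(orf: str) -> float:
    """Nc after Wright (1990): 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, capped at 61."""
    counts = defaultdict(Counter)  # amino acid -> counts of its codons
    for codon in codons_of(orf):
        aa = CODE.get(codon)
        if aa is not None and aa != "*":
            counts[aa][codon] += 1

    f_by_class = defaultdict(list)  # degeneracy class -> F of each amino acid
    for aa, codon_counts in counts.items():
        n = sum(codon_counts.values())
        if FAMILY_SIZE[aa] == 1 or n < 2:
            continue  # Met/Trp have no synonyms; F is undefined for n < 2
        sum_p2 = sum((c / n) ** 2 for c in codon_counts.values())
        f_by_class[FAMILY_SIZE[aa]].append((n * sum_p2 - 1) / (n - 1))

    mean_f = {k: sum(v) / len(v) for k, v in f_by_class.items()}
    missing = [k for k in (2, 4, 6) if k not in mean_f]
    if missing:
        raise ValueError(f"no usable amino acids in classes {missing}")
    if 3 not in mean_f:  # Ile absent: use the mean of F2 and F4
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2

    nc = 2.0  # Met and Trp contribute one codon each
    for k, n_aa in ((2, 9), (3, 1), (4, 5), (6, 3)):
        if mean_f[k] == 0:
            return 61.0  # every codon used at most once in this class
        nc += n_aa / mean_f[k]
    return min(nc, 61.0)


def expected_nc(gc3s: float) -> float:
    """Nc expected if GC3s alone shaped codon usage (Wright, 1990)."""
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)


print(round(preferred_over_total("AUGGCAGAAGCCUAA"), 3))  # 0.333
print(expected_nc(0.5))  # 60.5
```

Nc ranges from 20 (one codon per aa) to 61 (all synonymous codons used equally). Plotting `effective_number_of_codons` against GC3s for each ORF, with `expected_nc` as the reference curve, gives a display in the style of Fig. 2: genes lying well below the curve carry codon bias that GC3s alone does not explain.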

usage of a gene departs from equal usage of synonymous codons. This measure, the 'effective number of codons used in a gene' (Nc), has been applied to corynebacterial sequences. The individual data for the BLCG sequences are presented in Table III, and the Nc values of corynebacterial genes are plotted against the corresponding GC3s values in Fig. 2. Most of the analysed sequences have a marked codon bias that is not explained by GC3s variation alone, whose expected effect is shown by the average GC curve. A clear difference exists between the BLCG and BSP data points, suggesting that the BLCG group tends to use fewer codons regardless of the GC content, whereas the BSP group (with higher G+C and GC3s contents) shows less biased values.

The algorithm of Gribskov et al. (1984), using the CUBLCG table (Table II) rather than the CU table of presumably highly expressed genes (Table IV), has been particularly successful for predicting coding regions in corynebacterial DNA sequences (Fig. 3). The result was quite different if a standard high-expression table of E. coli was used (data not shown).

(e) Codon selection in highly expressed genes of BLCG

Very highly expressed corynebacterial genes, such as r-protein-encoding genes, were not found in the databases. However, several glycolytic enzyme-encoding genes, gap, fda, tpi, pgk, and gdh, have been sequenced. These genes should be moderately expressed (von der

Osten et al., 1989; Eikmanns, 1992). The csp1 gene encodes one of the major secreted proteins of C. glutamicum and should therefore also be highly expressed (Joliff et al., 1992). All these sequences showed the highest CGI and P/T values (Table III).

TABLE III

Codon parameters in B. lactofermentum and C. glutamicum genes

Gene (a) | CGIHI (b) | CGIT (c) | P/T (d) | Nc | GC3s
gap | 1.60 | 1.41 | 0.919 | 29.27 | 0.67
fda | 1.45 | 1.32 | 0.819 | 30.30 | 0.69
tpi | 1.43 | 1.30 | 0.873 | 35.10 | 0.61
pgk | 1.41 | 1.30 | 0.872 | 30.90 | 0.65
csp1 | 1.33 | 1.29 | 0.808 | 37.86 | 0.63
asd | 1.32 | 1.29 | 0.806 | 35.94 | 0.60
gdh | 1.28 | 1.23 | 0.803 | 38.46 | 0.61
ddh | 1.25 | 1.27 | 0.745 | 31.91 | 0.74
orfX (B) | 1.21 | 1.27 | 0.741 | 35.25 | 0.63
orfX (C) | 1.21 | 1.27 | 0.740 | 34.43 | 0.63
hom (B) | 1.19 | 1.23 | 0.728 | 42.51 | 0.54
hom (C) | 1.19 | 1.23 | 0.728 | 42.51 | 0.54
lysA | 1.14 | 1.23 | 0.739 | 35.50 | 0.65
thrC (C) | 1.14 | 1.20 | 0.728 | 37.97 | 0.64
trpB | 1.13 | 1.23 | 0.765 | 35.39 | 0.74
thrC (B) | 1.13 | 1.21 | 0.726 | 37.47 | 0.64
dapB | 1.13 | 1.20 | 0.708 | 45.90 | 0.55
lysC | 1.09 | 1.18 | 0.701 | 44.43 | 0.53
trpA | 1.07 | 1.17 | 0.700 | 42.80 | 0.65
trpD | 1.05 | 1.18 | 0.619 | 38.68 | 0.66
thrB (C) | 1.04 | 1.15 | 0.655 | 43.17 | 0.48
thrB (B) | 1.03 | 1.14 | 0.651 | 43.90 | 0.48
ppc | 1.00 | 1.13 | 0.663 | 43.82 | 0.66
trpG | 0.98 | 1.15 | 0.637 | 42.48 | 0.56
trpC | 0.92 | 1.09 | 0.663 | 45.52 | 0.61
trpE (B) | 0.88 | 1.08 | 0.604 | 46.15 | 0.65
trpE (C) | 0.87 | 1.08 | 0.604 | 45.97 | 0.66
pheA | 0.87 | 1.05 | 0.631 | 36.03 | 0.68
trpL (B) | 0.84 | 1.03 | 0.571 | | 0.57
trpL (C) | 0.84 | 1.03 | 0.571 | | 0.57
lysI | 0.82 | 1.02 | 0.552 | 51.12 | 0.56
dapA (B) | 0.80 | 0.97 | 0.557 | 56.01 | 0.45
dapA (C) | 0.79 | 0.97 | 0.554 | 56.62 | 0.45
orf2 | 0.72 | | 0.415 | |

(a) The letters in parentheses indicate the source of the gene: C, from C. glutamicum; B, from B. lactofermentum. The origin of the other genes is given in Table 1.
(b) CGI calculated from CU values of presumably highly expressed genes.
(c) CGI calculated from CU of all BLCG sequences (Table II).
(d) 'Preferred' over total codons (excluding Glu, Met and Trp codons).

[Fig. 1 plot: CGIHI (y axis, 0.7-1.6) vs. G+C (%) (x axis, 25-75); B. lactofermentum and C. glutamicum genes plotted separately; vertical line at 56.66% G+C; ranges marked for pTZ12, TC-2, C. diphtheriae, BSP, BepI, Corynebacterium sp. and BLCG]

Fig. 1. Graphic plot of G+C content of the corynebacterial genes. The G+C values of the BLCG group were plotted vs. the CGIHI values (Gribskov et al., 1984), which have been computed using the corresponding CU of the presumably highly expressed genes. The individual CGI values of the BLCG group are given in Table III. The vertical line indicates the average G+C content of the BLCG ORFs. Vertical arrows and double-arrowed horizontal lines indicate the G+C-content ranges of different genes (bepIM and TC-2) or groups of sequences (genes of the plasmid pTZ12 of C. xerosis, C. diphtheriae-related sequences, Corynebacterium sp. genes and the BSP and BLCG groups of sequences).

In order to prepare CU and SCU tables of highly expressed genes from B. lactofermentum and C. glutamicum, we have followed three different approaches: (a) genes assumed to be highly expressed (HI group); (b) genes with the highest CGIT values; and (c) genes with the highest P/T values. The results obtained by the three strategies are not significantly different, and only data on the HI group (CUHI and SCUHI) are shown in Table IV. A slight difference can be observed between the GC3s of highly expressed genes (0.64) and the average GC3s of BLCG (0.62). In general, the presumably highly expressed sequences have the most biased Nc values with respect to the GC curve (Fig. 2).

BLCG genes with higher CGI and P/T values are assumed to be highly expressed, like those coding for glycolytic enzymes or secreted proteins. Other genes with high CGI and P/T values encode enzymes involved in the biosynthesis of the cell wall: aspartic semialdehyde dehydrogenase (asd) and meso-diaminopimelate dehydrogenase (ddh), which are involved in the biosynthesis of meso-diaminopimelate.

Two codons (AUA and CGG) are not used in genes of the HI group (Table IV). A comparison between SCUBLCG and the SCU of highly expressed genes shows a stronger bias for the 'preferred' codons in the latter. Codons with G or C in the third position are used more by highly expressed genes, except for codons with GC or CC in the first two positions (Ala and Pro codons, which prefer A or U in the third position). All the highly expressed genes use only UAA as a stop codon.

TABLE IV

The CU and SCU in highly expressed genes of B. lactofermentum and C. glutamicum (a)

aa | Codon | CU | SCU | aa | Codon | CU | SCU
Phe | UUU | 10 | 0.115 | Tyr | UAU | 5 | 0.071
Phe | UUC | 77 | 0.885 | Tyr | UAC | 65 | 0.929
Leu | UUA | 2 | 0.011 | ter (b) | UAA | 6 | 1.000
Leu | UUG | 14 | 0.077 | ter | UAG | 0 | 0.000
Leu | CUU | 38 | 0.208 | His | CAU | 7 | 0.140
Leu | CUC | 47 | 0.257 | His | CAC | 43 | 0.860
Leu | CUA | 6 | 0.033 | Gln | CAA | 5 | 0.068
Leu | CUG | 76 | 0.415 | Gln | CAG | 69 | 0.932
Ile | AUU | 26 | 0.198 | Asn | AAU | 6 | 0.060
Ile | AUC | 105 | 0.802 | Asn | AAC | 94 | 0.940
Ile | AUA | 0 | 0.000 | Lys | AAA | 3 | 0.025
Met | AUG | 43 | 1.000 | Lys | AAG | 119 | 0.975
Val | GUU | 92 | 0.453 | Asp | GAU | 38 | 0.271
Val | GUC | 69 | 0.340 | Asp | GAC | 102 | 0.729
Val | GUA | 16 | 0.079 | Glu | GAA | 71 | 0.376
Val | GUG | 26 | 0.128 | Glu | GAG | 118 | 0.624
Ser | UCU | 35 | 0.223 | Cys | UGU | 2 | 0.100
Ser | UCC | 90 | 0.573 | Cys | UGC | 18 | 0.900
Ser | UCA | 4 | 0.025 | ter | UGA | 0 | 0.000
Ser | UCG | 6 | 0.038 | Trp | UGG | 34 | 1.000
Pro | CCU | 28 | 0.311 | Arg | CGU | 34 | 0.324
Pro | CCC | 4 | 0.044 | Arg | CGC | 60 | 0.571
Pro | CCA | 54 | 0.600 | Arg | CGA | 8 | 0.076
Pro | CCG | 4 | 0.044 | Arg | CGG | 0 | 0.000
Thr | ACU | 21 | 0.164 | Ser | AGU | 2 | 0.013
Thr | ACC | 102 | 0.797 | Ser | AGC | 20 | 0.127
Thr | ACA | 4 | 0.031 | Arg | AGA | 2 | 0.019
Thr | ACG | 1 | 0.008 | Arg | AGG | 1 | 0.009
Ala | GCU | 97 | 0.362 | Gly | GGU | 77 | 0.304
Ala | GCC | 27 | 0.101 | Gly | GGC | 144 | 0.569
Ala | GCA | | 0.455 | Gly | GGA | 31 | 0.123
Ala | GCG | 22 | 0.084 | Gly | GGG | | 0.004

(a) gap, fda, pgk, tpi, csp1 and gdh sequences (2453 codons; G+C = 56.95%; GC3s = 0.64).
(b) ter = stop codon.

[Fig. 2 plot: Nc (y axis, 20-60) vs. GC3s (x axis, 0.00-1.00)]

Fig. 2. The Nc plot of Wright (1990) computed for the BLCG (filled symbols) and BSP (open circles) sequences. The presumably highly expressed genes of BLCG (gap, fda, pgk, tpi, csp1 and gdh) are indicated by filled triangles. Nc and GC3s values were calculated as described by Lloyd and Sharp (1992).

[Fig. 3 plot: codon preference (y axis) along the sequence (x axis, 0-3500 bp) for reading frames 1, 2 and 3; dapB, orf2 and dapA marked]

Fig. 3. Codon preference plots (Gribskov et al., 1984) for each codon in the three reading frames (1, 2, 3) of the dapB-orf2-dapA sequence of B. lactofermentum. The codon frequency table used was CUBLCG (Table II), with a window size of 25 codons and a codon print interval of 1. The CGIT values (using the pooled codon usages of all sequences in BLCG) of all three genes are shown in Table III: dapB (frame 3) represents a moderately high value; orf2 (frame 3) and dapA (frame 2) present the two lowest values of all the BLCG genes. The horizontal line represents a baseline value (codon preference parameter for random sequence) of 0.76.

(f) Comparison of the 'ideal' codons of several microorganisms

The evolutionary selection for an efficient translation system for genes that need to be highly expressed is assumed to be responsible for their bias toward codons which match the most abundant tRNA isoacceptors (Grosjean and Fiers, 1982; Ernst, 1988; Andersson and Kurland, 1990). Thus, informal terminologies like 'preferred' codons (Bennetzen and Hall, 1982) or 'the effective number of codons' (Wright, 1990) are important as representations of biological features, and can be used as a basis for developing a quantitative measure of SCU bias. In highly expressed genes, 'ideal' codons are the most used codons for each aa, with the exception of Met and Trp codons. AUC (Ile), AAC (Asn), UUC (Phe) and UAC (Tyr) are 'ideal' codons shared by most organisms (Bennetzen and Hall, 1982; Gribskov et al., 1985; Shields and Sharp, 1987), including the BLCG and BSP groups. All four of these codons have A or U in the first and second positions and C in the third.

The genes of BLCG tend to have C in the third position. Thus, 39.7% of 13 083 codons, 13 of the 25 'preferred' codons, and 11 of the 18 'ideal' codons have C in the third position in the BLCG group. This preference in the BLCG group is similar to that of the 'ideal' codons of E. coli, and lower than in the Micrococcus luteus, Streptomyces and BSP 'ideal' codons. In these three groups of microorganisms all the 'ideal' codons have G or C in the third

position. A comparison of the number of identical 'ideal' codons shows strong correlation in the identity values among the microorganisms with high G+C content. The average G+C content of the BLCG group and its GC3s are closer to those of E. coli than to those of Streptomyces; nevertheless, the 'ideal' codon usage of BLCG is closer to that of Streptomyces and the high-G+C group of Gram+ bacteria than to that of E. coli, yeast or B. subtilis.

This bias in codon usage of corynebacteria may limit expression of E. coli and B. subtilis genes in B. lactofermentum. Several genes coding for antibiotic resistance from E. coli or extracellular enzymes from Bacillus are not very well expressed in B. lactofermentum. On the other hand, it should be easier to express Streptomyces genes in corynebacterial species. However, the importance of the CU patterns in heterologous expression is not clear. The two genes encoding 2,5-diketo-D-gluconate reductase of Corynebacterium sp. have been expressed in Erwinia and E. coli (Anderson et al., 1985), and heterologous genes with P/T near 0 have been highly expressed in yeast (Ernst, 1988). Several heterologous genes have been expressed in aa-producing corynebacteria: the kanamycin resistance-encoding gene of B. subtilis and E. coli; cat, amp and tet of E. coli; and cat and amyE of Bacillus amyloliquefaciens. Corynebacterial aa biosynthetic genes are well expressed in E. coli cells (Martín, 1989).

(g) Conclusions

(1) The G+C composition and the CU of the different groups of corynebacterial DNA sequences studied are heterogeneous. One of them (BLCG) contains only the sequences of B. lactofermentum and C. glutamicum, since no significant differences exist between the published sequences of the two species.

(2) Different ways to quantify the codon usage of corynebacterial genes have been used (CGI, P/T and Nc). The results obtained by CGI and P/T show similar distributions of the codon bias in the BLCG sequences, and the Nc-plot values show a partial independence of the CU from the GC3s content of the corynebacterial sequences. The genes with higher CGI and CBI values have, in general, a low Nc value, and correspond to genes which are assumed to be highly expressed. The corynebacterial genes with high CGI and P/T values have a G+C content similar to the average G+C composition of the ORFs. The Nc plot of the BLCG sequences shows highly biased values when compared with those of the BSP sequences, probably indicating a more biased composition of the relative levels of iso-accepting tRNAs in the BLCG

Osawa, 1987). The comparison of CU characteristics in the BLCG sequences shows that the 'preferred' and 'ideal' codons of this group had the highest similarity to Streptomyces, whereas the G+C content is more similar to that of E. coli, probably as a consequence of an A·T mutation pressure. The codon usage tables may be a very useful tool for molecular biology to understand the heterologous expression of corynebacterial genes in E. coli, Bacillus or Streptomyces, and to discover the evolutionary implications of codon preference in different organisms.

ACKNOWLEDGEMENTS

We thank B. Miguélez for statistical and computer assistance, F. Wright for the Nc program and M. Bibb for critical reading of the manuscript and suggestions. This work was supported by the BRIDGE BIOT CT91-0264 RZJE project of the Commission of the European Communities (Brussels), and the CAICYT BT85-0024 (Madrid). M.M. received a fellowship of the University of León (León, Spain).

REFERENCES

Anderson, S., Marks, C.B., Lazarus, R., Miller, J., Stafford, K., Seymour, J., Light, D., Rastetter, W. and Estell, D.: Production of 2-keto-L-gulonate, an intermediate in L-ascorbate synthesis, by a genetically modified Erwinia herbicola. Science 230 (1985) 144-149.

Andersson, S.G.E. and Kurland, C.G.: Codon preferences in free-living microorganisms. Microbiol. Rev. 54 (1990) 198-210.

Bennetzen, J.L. and Hall, B.D.: Codon selection in yeast. J. Biol. Chem. 257 (1982) 3026-3031.

Eikmanns, B.J.: Identification, sequence analysis, and expression of a Corynebacterium glutamicum gene cluster encoding the three glycolytic enzymes glyceraldehyde-3-phosphate dehydrogenase, 3-phosphoglycerate kinase, and triosephosphate isomerase. J. Bacteriol. 174 (1992) 6076-6086.

Eikmanns, B.J., Kircher, M. and Reinscheid, D.J.: Discrimination of Corynebacterium glutamicum, Brevibacterium flavum and Brevibacterium lactofermentum by restriction pattern analysis of DNA adjacent to the hom gene. FEMS Microbiol. Lett. 82 (1991) 203-208.

Ernst, J.F.: Codon usage and gene expression. Trends Biotechnol. 6 (1988) 196-199.

Gouy, M. and Gautier, C.: Codon usage in bacteria: correlation with gene expressivity. Nucleic Acids Res. 10 (1982) 7055-7074.

Gribskov, M., Devereux, J. and Burgess, R.: The codon preference plot: graphic analysis of protein coding sequences and prediction of gene expression. Nucleic Acids Res. 12 (1984) 539-549.

Grosjean, H. and Fiers, W.: Preferential codon usage in prokaryotic genes: the optimal codon-anticodon interaction energy and the selec-

group. tive codon usage in effiGiently expressed genes. Gene 18 (1982)

(3) The mutation pressure (A.T/G'C) has played a 1.99-209. . .

. Hon, H.and Osawa, S.: Evolutlonary change m 5S rRNA secondary

majar role m dlverslficatlon of geno me DNA sequences, structure~nd a phylogenetictree of 352 rRNA species. BioSystemsand codon usage in bacterial evolution (Muto and 19 (1986) 163-172.


Ikemura, T.: Correlation between the abundance of E. coli transfer RNAs and the occurrence of the respective codons in its protein genes. J. Mol. Biol. 146 (1981) 1-21.

Joliff, G., Mathieu, L., Hahn, V., Bayan, N., Duchiron, F., Renaud, M., Shechter, E. and Leblon, G.: Cloning and nucleotide sequence of the csp1 gene encoding PS1, one of the two major secreted proteins of Corynebacterium glutamicum: the deduced N-terminal region of PS1 is similar to the Mycobacterium antigen 85 complex. Mol. Microbiol. 6 (1992) 2349-2362.

Jones, D. and Collins, M.D.: Irregular nonsporing Gram-positive rods. In: Sneath, P.H.A., Mair, N.S., Sharpe, M.E. and Holt, J.G. (Eds.), Bergey's Manual of Systematic Bacteriology, Vol. 2. Williams and Wilkins, Baltimore, 1986, pp. 1261-1434.

Kinoshita, S.: Glutamic acid bacteria. In: Demain, A.L. and Solomon, N.A. (Eds.), Biology of Industrial Microorganisms. Benjamin/Cummings Publishing Company, London, 1985, pp. 115-142.

Leskiw, B.K., Bibb, M.J. and Chater, K.F.: The use of a rare codon specifically during development? Mol. Microbiol. 5 (1991) 2861-2867.

Liebl, W., Ehrmann, M., Ludwig, W. and Schleifer, K.H.: Transfer of Brevibacterium divaricatum DSM 20297T, 'Brevibacterium flavum' DSM 20411, 'Brevibacterium lactofermentum' DSM 20412 and DSM 1412, and Corynebacterium lilium DSM 20137T to Corynebacterium glutamicum and their distinction by rRNA gene restriction patterns. Int. J. Syst. Bacteriol. 41 (1991) 255-260.

Lloyd, A.T. and Sharp, P.M.: CODONS: a microcomputer program for codon usage analysis. J. Hered. 83 (1992) 239-240.

Martín, J.F.: Molecular genetics of amino acid-producing corynebacteria. In: Baumberg, S., Hunter, I. and Rhodes, M. (Eds.), Microbial Products: New Approaches. Cambridge University Press, Cambridge, 1989, pp. 25-59.

Muto, A. and Osawa, S.: The guanine and cytosine content of genomic DNA and bacterial evolution. Proc. Natl. Acad. Sci. USA 84 (1987) 166-169.

Ohama, T., Muto, A. and Osawa, S.: Role of GC-biased mutation pressure on synonymous codon choice in Micrococcus luteus, a bacterium with high genomic GC-content. Nucleic Acids Res. 18 (1990) 1565-1569.

Park, Y.H., Hori, H., Suzuki, K.I., Osawa, S. and Komagata, K.: Phylogenetic analysis of the coryneform bacteria by 5S rRNA sequences. J. Bacteriol. 169 (1987) 1801-1806.

Pisabarro, A., Malumbres, M., Mateos, L.M., Oguiza, J.A. and Martín, J.F.: A cluster of three genes (dapA, orf2 and dapB) of Brevibacterium lactofermentum encodes dihydrodipicolinate synthase, dihydrodipicolinate reductase and a third polypeptide of unknown function. J. Bacteriol. 175 (1993) 2743-2749.

Sharp, P.M.: Processes of genome evolution reflected by base frequency differences among Serratia marcescens genes. Mol. Microbiol. 4 (1990) 119-122.

Sharp, P.M. and Bulmer, M.: Selective differences among translation termination codons. Gene 63 (1988) 141-145.

Sharp, P.M., Cowe, E., Higgins, D.G., Shields, D.C., Wolfe, K.H. and Wright, F.: Codon usage patterns in Escherichia coli, Bacillus subtilis, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Drosophila melanogaster and Homo sapiens; a review of the considerable within-species diversity. Nucleic Acids Res. 16 (1988) 8207-8211.

Shields, D.C. and Sharp, P.M.: Codon usage in Bacillus subtilis. Nucleic Acids Res. 15 (1987) 8023-8040.

van der Osten, C.H., Barbas, C.F., Wong, C.H. and Sinskey, A.J.: Molecular cloning, nucleotide sequence and fine-structural analysis of the Corynebacterium glutamicum fda gene: structural comparison of C. glutamicum fructose-1,6-bisphosphate aldolase to class I and class II aldolase. Mol. Microbiol. 3 (1989) 1625-1637.

Woese, C.R.: Bacterial evolution. Microbiol. Rev. 51 (1987) 221-271.

Wright, F.: The 'effective number of codons' used in a gene. Gene 87 (1990) 23-29.

Wright, F. and Bibb, M.J.: Codon usage in the G+C-rich Streptomyces genome. Gene 113 (1992) 55-65.

Yamada, K. and Komagata, K.: Taxonomic studies on coryneform bacteria, III. DNA base composition of coryneform bacteria. J. Gen. Appl. Microbiol. 16 (1970) 215-224.