The genome sequence of the extreme thermophile Thermus thermophilus


ARTICLES

Anke Henne¹, Holger Brüggemann¹,⁵, Carsten Raasch¹,⁵, Arnim Wiezer¹, Thomas Hartsch¹,⁵, Heiko Liesegang¹, Andre Johann¹,⁵, Tanja Lienard¹,³, Olivia Gohl¹, Rosa Martinez-Arias¹,⁵, Carsten Jacobi¹,⁵, Vytaute Starkuviene¹,⁵, Silke Schlenczeck², Silke Dencker¹, Robert Huber⁴, Hans-Peter Klenk¹,⁵, Wilfried Kramer², Rainer Merkl², Gerhard Gottschalk¹,³ & Hans-Joachim Fritz¹,²

¹Göttingen Genomics Laboratory, Institute of Microbiology and Genetics, University of Göttingen, Grisebachstr. 8, 37077 Göttingen, Germany. ²Department of Molecular Genetics and Preparative Molecular Biology, Institute of Microbiology and Genetics, University of Göttingen, Grisebachstr. 8, 37077 Göttingen, Germany. ³Department of General and Applied Microbiology, Institute of Microbiology and Genetics, University of Göttingen, Grisebachstr. 8, 37077 Göttingen, Germany. ⁴University of Regensburg, Universitätsstrasse 31, 93053 Regensburg, Germany. ⁵Present addresses: Institut Pasteur, 25–28, Rue du Docteur Roux, 75724 Paris Cedex 15, France (H.B.); Variom Biotechnology AG, Robert-Koch-Platz 11, 10115 Berlin, Germany (C.R.); Genedata, Maulbeerstrasse 46, 4016 Basel, Switzerland (T.H.); Ministerium für Umwelt, Keplerstrasse 18, 66117 Saarbrücken, Germany (A.J.); GBF Gesellschaft für Biotechnologische Forschung mbH, Mascheroder Weg 1, 38124 Braunschweig, Germany (R.M.-A.); BASF Aktiengesellschaft, ZHV-A 030, 67056 Ludwigshafen, Germany (C.J.); EMBL Heidelberg, Meyerhofstrasse 1, 69117 Heidelberg, Germany (V.S.); e.gene Biotechnologie GmbH, Pöckinger Fussweg 7a, 82340 Feldafing, Germany (H.-P.K.). Correspondence should be addressed to A.H. ([email protected]).

Published online 4 April 2004; doi:10.1038/nbt956

NATURE BIOTECHNOLOGY VOLUME 22 NUMBER 5 MAY 2004 547

© 2004 Nature Publishing Group http://www.nature.com/naturebiotechnology

Thermus thermophilus HB27 is an extremely thermophilic, halotolerant bacterium that was originally isolated from a natural thermal environment in Japan. This organism has considerable biotechnological potential; many thermostable proteins isolated from members of the genus Thermus are indispensable in research and in industrial applications. We present here the complete genome sequence of T. thermophilus HB27, the first for the genus Thermus. The genome consists of a 1,894,877 base pair chromosome and a 232,605 base pair megaplasmid, designated pTT27. The 2,218 identified putative genes were compared with those of the closest relative sequenced so far, the mesophilic bacterium Deinococcus radiodurans. Both organisms share a similar set of proteins, although their genomes lack extensive synteny. Many new genes of potential interest for biotechnological applications were found in T. thermophilus HB27. Candidates include various proteases and key enzymes of other fundamental biological processes such as DNA replication, DNA repair and RNA maturation.

Thermus aquaticus, the type species of the genus, was the first extremely thermophilic bacterium to be described¹. Only a few years later, strains HB8 and HB27 of T. thermophilus were isolated²,³. These aerobic, obligate heterotrophs have a higher maximum growth temperature than T. aquaticus (∼85 °C compared with 79 °C). They show high transformation competence and are therefore amenable to genetic manipulation, in contrast to other Thermus species⁴,⁵.

T. thermophilus has become a model organism in structural biology. Several of its enzymes have been crystallized and their structures analyzed by X-ray crystallography, as has the 30S ribosomal subunit⁶. Besides being interesting subjects for basic research on, for example, the structural basis of protein thermostability or the strategies for survival at high temperatures, extremely thermophilic bacteria can also supply biocatalysts for biotechnological applications⁷⁻¹⁰. The intrinsic stability of thermostable enzymes and their resistance to denaturing physical and chemical factors are considerable advantages in industrial processes. Some enzymes of Thermus species are already used in biotechnological applications, such as the DNA polymerase that is indispensable in PCR techniques⁹,¹⁰. Other fields of application for thermostable enzymes are starch processing (e.g., α-amylases, glucose isomerases), organic synthesis (e.g., esterases, lipases, proteases), diagnostics, waste treatment, pulp and paper manufacture (e.g., xylanases), and animal feed and human food (amino acid and vitamin synthesis)⁷,¹⁰. Consequently, Thermus species have been studied extensively. T. thermophilus has attracted special attention; more than 1,000 publications about this species have appeared.

Phylogenetic studies of 16S rRNA sequences and conserved genes indicate a close relationship between the Gram-negative genus Thermus and the Gram-positive genus Deinococcus. They also suggest that these two lineages form a distinctive grouping within the eubacteria that deserves the taxonomic status of a phylum¹¹,¹².

We sequenced the genome of T. thermophilus HB27 using the random shotgun approach. After annotating all putative genes, we focused on identifying genes of potential biotechnological value. We also compared the T. thermophilus genome with other published genome sequences, especially that of the radiation-resistant D. radiodurans R1¹³.

RESULTS

General features

The genome of T. thermophilus HB27 is composed of a 1,894,877-base pair (bp) chromosome (TTC) and a 232,605-bp megaplasmid (TTP), designated pTT27. General features of the genome are summarized in Table 1. The average G+C content is 69.4%. Regions with substantially lower G+C content represent ribosomal DNA clusters and at least three further gene clusters that are flanked by mobile elements. For instance, cluster TTC274–288 has a G+C content of 58%. It contains 15 genes with atypical codon usage, most of which show similarity to sugar transferases, epimerases and dehydrogenases involved in lipopolysaccharide O-antigen biosynthesis.
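The text does not say how the low-G+C regions were delimited. A minimal sliding-window scan illustrates one way to flag such regions. It assumes a plain nucleotide string, and the window size, step and threshold are hypothetical parameters. A threshold of 0.6, well below the 69.4% genome average, is used purely as an example.

```python
def gc_fraction(seq: str) -> float:
    """Return the G+C fraction among unambiguous A/C/G/T bases (case-insensitive)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def low_gc_windows(seq: str, window: int, step: int, threshold: float):
    """Yield (start, end, gc) for sliding windows whose G+C fraction is below threshold.

    window, step and threshold are free (hypothetical) parameters, not values from the paper.
    """
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        chunk = seq[start:start + window]
        gc = gc_fraction(chunk)
        if gc < threshold:
            yield start, start + len(chunk), gc


def merge_windows(windows):
    """Merge overlapping or touching (start, end, gc) windows, given in start order, into (start, end) regions."""
    regions = []
    for start, end, _gc in windows:
        if regions and start <= regions[-1][1]:
            regions[-1][1] = max(regions[-1][1], end)
        else:
            regions.append([start, end])
    return [tuple(region) for region in regions]


# Toy example: a G+C-rich backbone with an A+T-rich island at positions 100-149.
toy = "GC" * 50 + "AT" * 25 + "GC" * 50
print(merge_windows(low_gc_windows(toy, window=50, step=25, threshold=0.6)))
# [(75, 175)]
```

Each flagged window spans 50 bp, so the merged region (75–175) is wider than the true island (100–149). Smaller windows or steps tighten the boundaries but make individual estimates noisier.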

The likely origin of replication was located around bp position 1,523,000, based on GC skew analysis and on the location of characteristic replication proteins such as DnaA (TTC1608). We predicted 2,218 putative genes, which are depicted in Figure 1. Substantial similarity to annotated database entries allowed us to assign a putative function to 1,482 protein-coding genes. Of the remaining 736 open reading frames (ORFs), 488 had no substantial similarity to entries in public databases and were therefore designated hypothetical proteins. These are more frequent on the plasmid (39% of all ORFs on pTT27) than on the chromosome (20% of all chromosomal ORFs). We identified regions of mobile DNA and predicted multiple copies of various insertion (IS) elements (50 in total), several harboring complete or fragmentary transposase genes.
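A common way to read GC skew, (G − C)/(G + C), is as a cumulative sum along the chromosome. In many bacteria its global minimum tends to fall near the origin of replication and its maximum near the terminus. The sketch below applies that heuristic. It assumes a linearized chromosome string and a fixed, hypothetical window size. It is an illustration only, not the authors' pipeline; the paper also used the position of dnaA.

```python
def cumulative_gc_skew(seq: str, window: int) -> list[float]:
    """Cumulative sum of per-window GC skew, (G - C) / (G + C), along the sequence."""
    seq = seq.upper()
    running, cumulative = 0.0, []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        running += (g - c) / (g + c) if g + c else 0.0
        cumulative.append(running)
    return cumulative


def skew_extremes(seq: str, window: int) -> tuple[int, int]:
    """Return (candidate_origin, candidate_terminus) as bp positions.

    Heuristic: the global minimum of the cumulative skew marks the origin and the
    global maximum marks the terminus; positions are the end coordinates of the
    windows at which the extremes occur.
    """
    cum = cumulative_gc_skew(seq, window)
    i_min = min(range(len(cum)), key=cum.__getitem__)
    i_max = max(range(len(cum)), key=cum.__getitem__)
    return min((i_min + 1) * window, len(seq)), min((i_max + 1) * window, len(seq))


# Toy chromosome: C-rich, then G-rich, then C-rich again.
print(skew_extremes("C" * 100 + "G" * 200 + "C" * 100, window=50))
# (100, 300)
```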

Comparative genomics

A close phylogenetic relationship between Thermus and Deinococcus has been established by microbiological criteria as well as by bioinformatics¹¹⁻¹⁴. The genome of D. radiodurans R1 is composed of two chromosomes (2,648,638 bp and 412,348 bp), a megaplasmid (177,466 bp) and a small plasmid (45,704 bp), with an average G+C content of 66.6%¹³. An examination of all the gene products of T. thermophilus HB27 shows that they resemble proteins encoded preferentially on chromosome I of D. radiodurans R1, as has been reported for smaller subsets¹⁴. There are 1,443 T. thermophilus HB27 proteins that show similarity to proteins of D. radiodurans R1 with an e-value cutoff of <10⁻⁵ using the Basic Local Alignment Search Tool (Blastp) (Fig. 1). On average, 47.1% of residues of these orthologous proteins were identical in the Blastp alignments. Bidirectional Blast searches showed that the next most similar genomes are those of Thermosynechococcus elongatus (1,012 orthologs, 39.9% identity), Thermotoga maritima (951 orthologs, 39.8% identity) and Thermoanaerobacter tengcongensis (1,074 orthologs, 39.8% identity). Despite the similarity of many of their gene products, genome-wide synteny between T. thermophilus HB27 and D. radiodurans R1 could not be detected (Fig. 2).
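The paper reports ortholog counts from bidirectional Blast searches but gives no exact procedure. One widely used reading of "bidirectional" is the reciprocal-best-hit rule: two proteins are paired when each is the other's best hit. The sketch below applies that rule under stated assumptions. It assumes BLAST+ tabular output (-outfmt 6, default 12 columns) for searches in both directions, ranks hits by bit score, and applies the e-value cutoff of <10⁻⁵ mentioned above. Gene identifiers in the example are hypothetical. BLAST's pident is measured over the aligned region only, so averages computed this way need not match the paper's figures.

```python
from statistics import mean

# Column order of BLAST+ tabular output (-outfmt 6, default fields).
FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")


def best_hits(lines, max_evalue=1e-5):
    """Map each query to (subject, bitscore, pident) of its highest-scoring hit below max_evalue."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        evalue, bitscore = float(row["evalue"]), float(row["bitscore"])
        if evalue >= max_evalue:
            continue
        query = row["qseqid"]
        if query not in best or bitscore > best[query][1]:
            best[query] = (row["sseqid"], bitscore, float(row["pident"]))
    return best


def reciprocal_best_hits(a_vs_b, b_vs_a, max_evalue=1e-5):
    """Return (gene_a, gene_b, pident) for pairs that are each other's best hit."""
    ab = best_hits(a_vs_b, max_evalue)
    ba = best_hits(b_vs_a, max_evalue)
    return [(q, s, pident) for q, (s, _score, pident) in ab.items()
            if s in ba and ba[s][0] == q]


def mean_identity(pairs):
    """Average percent identity over ortholog pairs (0.0 if there are none)."""
    return mean(p for _a, _b, p in pairs) if pairs else 0.0
```

In the test data, A2's best hit is B2, but B2's best hit is a different gene. That pair is therefore rejected, and only A1/B1 counts as an ortholog pair.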

Table 1 General features of the genome of T. thermophilus HB27

Feature | Chromosome | Plasmid pTT27
Length (bp) | 1,894,877 | 232,605
G+C content (%) | 69.4 | 69.2
Coding sequence (%) | 95 | 89
ORFs with assigned function | 1,397 (70%) | 85 (37%)
ORFs, conserved hypothetical | 192 (10%) | 56 (24%)
ORFs with no database match | 399 (20%) | 89 (39%)
ORFs, total | 1,988 | 230
rRNA | 2 clusters | –
tRNA | 47 | –
IS elements | 30 | 23

Figure 1 Maps of the chromosome and plasmid of T. thermophilus HB27. (a) The protein-coding sequence of the chromosome is shown in red and green, depending on strand orientation. ORFs of T. thermophilus that have substantial similarity to ORFs of D. radiodurans (Blastp e-value <10⁻⁵) are shown in yellow and blue. (b) The plasmid map shows all ORFs in red and blue, depending on strand orientation. The outer circles in both maps represent the scale in bp; the inner circles show the G+C content variation (lower values inward).

Comparison of the genomes of T. thermophilus HB27 and D. radiodurans R1 also provides an opportunity to ask how thermophilic organisms respond genetically to thermal challenge. Entering the known kinetic parameters of hydrolytic deamination of cytosine residues in double-stranded DNA¹⁵ into the Arrhenius equation predicts a ∼1,000-fold increase in the deamination rate as the temperature rises from 37 to 85 °C. Hence, one would expect T. thermophilus to invest heavily in the repair of uracil residues in DNA to escape mutagenesis. One general line of defense against cytosine deamination is provided by enzymes of the Ung family of uracil-DNA glycosylases, which are nearly ubiquitous in the bacterial and eukaryotic domains of the phylogenetic tree. It is therefore surprising that the genome of T. thermophilus HB27 contains no ORF coding for an Ung-like protein. Searching the PROSITE database, we did not find any pattern typical of Ung enzymes in any T. thermophilus ORF. Searching the Ung TIGRfam HMM protein family database yielded frame TTC1721 as the best fit, with an e-value of 0.43 (for comparison, the Ung homolog of D. radiodurans has an e-value of 1.6 × 10⁻¹³²; see also Supplementary Table 1 online). Instead, two other uracil-DNA glycosylases with differing substrate spectra are present, TTUDGA (TTC366) and TTUDGB (TTC784), which are absent from D. radiodurans¹⁶. Both belong to other families of DNA uracil glycosylases. It remains to be seen whether this clear-cut difference reflects specific functional demands of life at high versus low temperature. Other pathways of base excision repair are represented by several glycosylases: MutM (TTC1454) and three members of the helix-hairpin-helix superfamily, MutY (TTC1535), AlkA (TTC1654, putative) and Nth (TTC1892). The annotation of TTC1535 as MutY was confirmed biochemically (V.S. and H.-J.F., unpublished data). Another remarkable difference between D. radiodurans and T. thermophilus HB27 concerns LexA, the central repressor of the bacterial SOS response. D. radiodurans clearly carries a LexA homolog, whereas there seems to be none in T. thermophilus HB27. This could mean that the role of LexA is taken over by a remotely related protein. Alternatively, T. thermophilus HB27 may have evolved an altogether different regulatory circuit as a substitute for the SOS response. A third possibility is that LexA-mediated repression is lacking and the corresponding repair genes are expressed constitutively (see Supplementary Table 1 online for additional DNA repair genes).
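The ∼1,000-fold figure comes from the Arrhenius relation k(T2)/k(T1) = exp[(Ea/R)(1/T1 − 1/T2)]. The text does not restate the activation energy Ea taken from ref. 15. Rather than assume a literature value, the sketch below back-calculates the Ea implied by a 1,000-fold increase between 37 °C (310.15 K) and 85 °C (358.15 K) and then checks the round trip. The result is roughly 133 kJ/mol.

```python
import math

R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def arrhenius_fold_change(ea_j_per_mol: float, t1_celsius: float, t2_celsius: float) -> float:
    """k(T2)/k(T1) = exp[(Ea/R) * (1/T1 - 1/T2)], with temperatures converted to kelvin."""
    t1, t2 = t1_celsius + 273.15, t2_celsius + 273.15
    return math.exp(ea_j_per_mol / R * (1.0 / t1 - 1.0 / t2))


def implied_activation_energy(fold: float, t1_celsius: float, t2_celsius: float) -> float:
    """Invert the relation above: the Ea (J/mol) that produces a given rate ratio."""
    t1, t2 = t1_celsius + 273.15, t2_celsius + 273.15
    return R * math.log(fold) / (1.0 / t1 - 1.0 / t2)


ea = implied_activation_energy(1000, 37, 85)
print(f"Implied Ea: {ea / 1000:.0f} kJ/mol")  # about 133 kJ/mol
print(f"Round trip: {arrhenius_fold_change(ea, 37, 85):.0f}-fold")  # 1000-fold
```

The exponential form explains why a 48 °C shift matters so much: for any Ea of this magnitude, each 10 °C step near physiological temperatures multiplies the rate severalfold.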

Physiological and metabolic features

What qualifies T. thermophilus HB27 to live in a hot spring environment? The genome sequence reveals features typical of a scavenger, that is, an organism that associates with solid surfaces and takes up and uses substrates as they pass by (see also Supplementary Figure 1 and Supplementary Table 2 online).

The motility of the organism is restricted: no flagellum could be detected by electron microscopy, and, accordingly, no flagellar biosynthesis genes are present. However, the organism has been shown to possess type IV pili, which play a functional role in attachment, surface colonization and twitching motility as well as in natural transformation⁵,¹⁷.

We identified various genes for type IV pilus biogenesis. Given the lack of flagella, the absence of chemotaxis cascades and methyl-accepting chemotaxis proteins is not surprising. Environmental signal transduction seems to be restricted to ten two-component sensor/regulator systems and a few proteins harboring proposed signaling domains (GGDEF, EAL or HD-GYP). Only a few of these proteins have homologs in D. radiodurans, whose genome apparently encodes a different and larger set of signaling systems as well as methyl-accepting chemotaxis receptor proteins¹⁴. This lends further support to the notion that hyperthermophiles generally have fewer signal transduction systems than mesophiles¹⁸.

T. thermophilus HB27 uses various proteinaceous substrates as well as carbohydrates for growth. These are made available by numerous (exo)proteases, lipases, pullulanases, α- and β-glucosidases and galactosidases, whose genes were identified in the genome sequence (see below). For substrate uptake, the organism preferentially uses energy-coupled systems: at least 42 primary transport complexes belong to the ATP-binding cassette (ABC) protein family. Among these, a conspicuous number of high-affinity ABC transporters for branched-chain amino acids were found. In contrast, no genes homologous to phosphotransferase systems were detected.

Biosynthetic pathways for all 20 amino acids are present in the T. thermophilus HB27 genome, as are pathways for the biosynthesis of vitamins and cofactors such as folate, biotin, riboflavin, molybdopterin, thiamine, pantothenate, porphyrins and carotenoids. Tetrapyrrole (cobalamin) and carotenoid biosynthesis are described in detail below.

Catabolic pathways for most amino acids are present, as well as the genes for a complete urea cycle. Several clusters involved in fatty acid metabolism can be found, comprising enzymes of the β-oxidation pathway in multiple copies. The Embden-Meyerhof pathway, a complete tricarboxylic acid cycle and genes for gluconeogenesis and the glyoxylate bypass are also encoded in the genome.

Energy conservation is accomplished by a membrane-bound oxidative respiratory chain. Reducing equivalents are fed into the chain via NADH dehydrogenase (TTC1907–1920) and a cytochrome-dependent succinate dehydrogenase (TTC1089–1092). Two sets of genes for terminal cytochrome c oxidases were found (TTC1670–1673 and TTC768–770). They encode a caa3-type and a ba3-type oxidase, which use cytochrome c552 (TTC872, TTC962, TTC1058) as substrate. The latter is expressed under limited O₂ supply¹⁹,²⁰. In addition, we identified a system similar to the menaquinol-cytochrome c oxidoreductases of several bacterial species, consisting of a cytochrome b, a Rieske iron-sulfur protein and a c-type cytochrome (TTC1567–1570). Another system, a nicotinamide nucleotide transhydrogenase (TTC1778–1780), might act as a respiratory chain-linked proton pump²¹. ATP synthesis is accomplished by an ATP synthase of the V/A type. The designation reflects the special phylogenetic position of the T. thermophilus complex between A₀A₁-type and V₀V₁-type ATPases. As observed previously, an F₀F₁-type ATPase is absent²².

Figure 2 Comparison of the chromosome of T. thermophilus HB27 with chromosome I of D. radiodurans R1. Based on nucleotide sequences, an ACT plot (http://www.sanger.ac.uk/Software/ACT/) was created to illustrate the lack of synteny between the two chromosomes (minimum cutoff score, 100). The longest stretches showing a conserved gene order represent the NADH-quinone oxidoreductase gene cluster (TTC1907–1920, blue arrows) and a cluster encoding ribosomal proteins (TTC1316–1329, green arrows, inverted). The program PROmer, which is part of the MUMmer software package (http://www.tigr.org/software/mummer/), was used to compare the translated six reading frames of both organisms. The output confirmed the lack of synteny at the protein level (data not shown).
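Figure 2 infers the lack of synteny from ACT and PROmer comparisons of nucleotide and translated sequences. A simpler, gene-level way to look for conserved order starts from ortholog pairs produced by any orthology assignment. The sketch below scans such pairs for runs that stay collinear in the same or inverted orientation, as in the inverted ribosomal protein cluster. It is a hypothetical, minimal illustration and not how ACT or PROmer work. Gene positions are assumed to be integer ranks along each chromosome, and the minimum block length is an arbitrary choice.

```python
def collinear_blocks(pairs, min_len=3):
    """Find runs of conserved gene order between two genomes.

    pairs: (index_in_A, index_in_B) for ortholog pairs, where an index is a gene's
    rank along its chromosome. A block is a run of genes that are adjacent in A and
    map to adjacent genes in B with a constant step of +1 (same orientation) or
    -1 (inverted). Runs shorter than min_len are discarded.
    """
    blocks, current = [], []
    for a, b in sorted(pairs):
        if current:
            prev_a, prev_b = current[-1]
            step = b - prev_b
            extends = a == prev_a + 1 and step in (1, -1)
            if extends and len(current) >= 2:
                # Orientation must stay the same within a block.
                extends = step == prev_b - current[-2][1]
            if not extends:
                if len(current) >= min_len:
                    blocks.append(current)
                current = []
        current.append((a, b))
    if len(current) >= min_len:
        blocks.append(current)
    return blocks


# Hypothetical ortholog pairs: one forward block, one inverted block, scattered rest.
print(collinear_blocks([(0, 10), (1, 11), (2, 12), (3, 50),
                        (4, 30), (5, 29), (6, 28), (7, 5)]))
# [[(0, 10), (1, 11), (2, 12)], [(4, 30), (5, 29), (6, 28)]]
```

The strict adjacency rule means that a single unmatched or inserted gene breaks a block. Practical synteny tools tolerate small gaps.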

Although Thermus species are considered aerobic chemoorganotrophs, the closely related strain HB8 has been shown to grow anaerobically owing to a nitrate reductase encoded within a conjugative plasmid integrated into its chromosome²³. Expression from this operon is induced under low oxygen concentrations in the presence of nitrate. Genes of this complex are missing in strain HB27, which, accordingly, did not grow under low-oxygen conditions. However, several ‘anaerobic’ molybdopterin oxidoreductases are present (TTC155, TTC1786, TTC1403, TTC1404), raising the question of their metabolic involvement. For instance, a system with similarity to the anaerobic polysulfide respiratory chain of Wolinella succinogenes was found. It comprises a molybdopterin-dependent formate dehydrogenase (TTP138) and a polysulfide reductase (TTC154, TTC155)²⁴. An additional cluster (TTC1046–1060) comprises several genes homologous to the sox genes of many sulfur-oxidizing organisms. This cluster is encoded downstream of ccm-like cytochrome c maturation genes (TTC1034–1045) and is homologous to the periplasmic thiosulfate:cytochrome c oxidoreductase complex of Rhodopseudomonas palustris²⁵.

Metabolic features with biotechnological potential

Two biosynthetic pathways, carotenoid and vitamin B12 biosynthesis, are described below, together with some single gene products of biotechnological interest.

Vitamin B12 (cobalamin) biosynthesis. Vitamin B12 is an important biotechnology product, as animals (including humans) require this cofactor in trace amounts. Large-scale production is carried out using a number of bacterial species, including strains of Pseudomonas denitrificans and Propionibacterium freudenreichii²⁶⁻²⁸. As all genes for cobalamin production are present in the genome of T. thermophilus HB27, the organism could potentially be exploited as a producer of cobalamin under thermophilic conditions. T. thermophilus HB27 harbors the corresponding genes on the chromosome as well as on the megaplasmid (Fig. 3). The chromosomal genes are dispersed. They encode proteins of the pathway that converts δ-aminolevulinate into uroporphyrinogen III, the precursor of all common tetrapyrroles. All genes for the subsequent conversion of uroporphyrinogen III to cobalamin are located on pTT27, organized in a single gene cluster of 23 ORFs (TTP01–23). Apparently, the capability to synthesize tetrapyrroles has evolved independently of the specialization towards cobalamin formation. A detailed look at the pathway in T. thermophilus HB27 reveals homologs of all proteins catalyzing the oxygen-independent pathway, like that of P. freudenreichii. On the other hand, P. denitrificans possesses enzymes of the aerobic route of corrin ring formation, such as CobG (oxygen-requiring C-20 hydroxylase), CobF (C-1 methylase) and CobNST (oxygen-dependent cobalt insertion complex). These are apparently absent from the genome of T. thermophilus HB27 (Fig. 3).

Table 2 Selected peptidases identified in the genome sequence of T. thermophilus HB27

ORF number | Annotation | Class/family | Specificity/comment | Homology
TTC1128, TTC35 | FtsH homolog | M41, ATP-dependent zinc protease | Unfolded proteins | TT, DR
TTC418, TTC746, TTC1975 | Lon protease (heat shock protein) | S16, ATP-dependent serine protease | Denatured proteins | TT, DR
TTC1662 | Peptide deformylase | – | Nascent proteins (iron as cofactor) | TT, DR
TTC788, TTC1306, TTC1529, TTC1449 | Aminopeptidase T | M29 | Co and Mg as cofactors | TT, BM
TTC1716 | Leader peptidase PilD | A24A | – | TT, NP
TTC494, TTC404, TTC458, TTC775, TTC917, etc. | Putative hydrolase | Metallo-β-lactamase superfamily | – | Diverse
TTP194, TTC372 | Subtilisin homolog, thermitase-like | S8 | – | DR, BS
TTC8, TTC888 | Putative glycoprotease | M22 | Similar to O-sialoglycoprotein peptidase | TM
TTC133, TTC986 | N-acyl-L-amino acid amidohydrolase | M20 | – | DR
TTC173/174, TTC250/251 | Clp-like protease | S14, with ATPase | Misfolded protein turnover? | DR
TTC233, TTC1637 | Putative peptidyl-prolyl cis-trans isomerase | Rotamase domain | Protein folding accelerator | –
TTC264/265 | Proteasome-like protease | Clp chaperone family | Heat shock protein | –
TTC351, TTC1411, TTC1965 | Putative cell-wall endopeptidase | M37 | – | –
TTC403 | γ-Glutamyltranspeptidase | T03 | Glutathione metabolism | BS, DR
TTC417, TTC956, TTC1905 | Serine protease | Trypsin domain | Periplasmic enzyme | DR
TTC531 | Pyroglutamyl peptidase | C15 | Thiol protease | –
TTC703, TTC1273 | Xaa-Pro aminopeptidase | M24 | – | –
TTC713, TTC819, TTC828 | Putative transpeptidase | Penicillin-binding protein | Cell wall synthesis | –
TTC900/901 | Zinc protease | M16 | – | BS
TTC929 | Carboxy-terminal processing protease | S41 | – | –
TTC1168, TTC1501 | Oligopeptidase F | M3 | Cleavage of medium-sized peptides | –
TTC1334 | Carboxypeptidase G2 | M20 | Applied in cancer therapy | Psp
TTC1457 | Proline iminopeptidase | S33 | Release of an N-terminal proline | –
TTC1715 | Carboxypeptidase Taq | M32 (zinc protease) | Broad specificity, thermostable | TA
TTC1729 | Cytosol aminopeptidase | M17 | Turnover of intracellular proteins | –

TT, T. thermophilus; DR, D. radiodurans; BM, Brucella melitensis; NP, Nostoc punctiforme; BS, Bacillus stearothermophilus; TM, T. maritima; Psp, Pseudomonas sp.; TA, T. aquaticus.

Carotenoid biosynthesis. Carotenoids are a diverse class of natural pigments that are of interest as food colorants and nutrient supplements as well as for pharmaceuticals²⁹. An ongoing debate concerns their possible cancer-preventive properties³⁰. T. thermophilus, like most Thermus species, synthesizes yellow carotenoid-like pigments. The main carotenoids of T. thermophilus are thermozeaxanthins and thermobiszeaxanthins³¹. As in cobalamin biosynthesis, the terminal steps of carotenoid biosynthesis are encoded on the plasmid. Precursor synthesis, by contrast, is accomplished by enzymes encoded on the chromosome (Fig. 3): geranylgeranyl pyrophosphate (GGPP) is formed via the mevalonate-independent 2-C-methyl-D-erythritol-4-phosphate pathway. The phytoene synthase (CrtB, TTP57), which catalyzes the first step of the carotenoid pathway after GGPP, has been characterized previously³²,³³. Other relevant plasmid-encoded proteins are phytoene dehydrogenase (TTP66) and lycopene cyclase, probably encoded by TTP60. The latter ORF shows homology to the N terminus of the fusion protein lycopene cyclase/phytoene synthase of various fungi. The plasmid-encoded carotenoid biosynthetic genes do not appear to be organized in a single gene cluster, but their proximity on the plasmid is apparent (Fig. 3).

Single gene products of biotechnological interest. As mentioned above, enzymes of thermophilic organisms are not only more thermostable but also more resistant to chemical agents than their mesophilic homologs⁷,⁸. In particular, polymer-degrading enzymes of thermophilic bacteria have attracted considerable attention because of their potential use in the food, chemical and pharmaceutical industries⁹,¹⁰. Using the complete genome sequence, we searched for additional targets of biotechnological interest (see Supplementary Table 3 online).

Proteases/peptidases. The amount of proteases produced worldwide on a commercial scale is larger than that of any other biotechnologically applied enzyme³⁴. Before genome sequencing, five proteases had been identified in both T. thermophilus and T. aquaticus (peptidase database MEROPS, Sanger Institute). Many more proteases and peptidases with diverse specificities have now been found: the genome encodes at least 42 additional enzymes, most of which belong to the classes of serine proteases and metallopeptidases (Table 2). One enzyme of possible biotechnological interest is a peptidase (TTC1334) similar to carboxypeptidase G2 of Pseudomonas sp. That enzyme is used as a powerful prodrug-converting enzyme in cancer therapy and as a rescue agent during high-dose methotrexate therapy³⁵.

Starch/oligosaccharide-hydrolyzing enzymes. T. thermophilus HB27 lacks an α-amylase. Instead, two enzymes with homology to pullulanases were found (TTC1198, TTC1828). TTC1198 is almost identical to a type I pullulanase of Bacillus flavocaldarius³⁶, whereas TTC1828 exhibits homology to the type II pullulanases. The oligosaccharide-degrading enzymes found comprise two α-glucosidases (TTP221, TTC107), a β-glucosidase (TTP42), a maltodextrin glucosidase (TTC1283), a putative trehalase (TTC614), an α-galactosidase (TTP72) and two putative β-galactosidases (TTP220, TTP222). Glycogen is synthesized by the products of a glycogen synthase cluster (TTC1976–1981) and degraded by a glycogen phosphorylase (TTC808).

Esterases. Esterases attract increasing attention because of their use in organic synthesis³⁷. HB27 possesses several putative esterases (TTC552, TTC824, TTC904, TTC1341, TTC1494, TTC1787) as well as a possible enterochelin esterase (TTC749).

DNA/RNA processing enzymes. Thermophilic DNA polymerases of various Thermus species are of indispensable value in PCR techniques. DNA polymerases of T. thermophilus strains have been studied in detail, for example the Tth polymerase of strain HB8 (almost identical to TTC690) and the Tte polymerase of strain B35³⁸. Besides a DNA polymerase III (TTC1806, TTC1609, TTC1588, TTC461)³⁹, strain HB27 possesses the gene for an additional DNA-dependent DNA polymerase IV (TTC785). The DNA-directed RNA polymerase (TTC1300, TTC1460, TTC1461) could be an interesting tool in molecular biology, for example for in vitro transcription assays. Further interesting enzymes are a DNA ligase (TTC732) and several DNA helicases. The genome contains genes for about 17 nucleases; at least seven of them are endonucleases, such as endonucleases III (TTC1892), IV (TTC482, TTC1571) and V (TTC982).

Figure 3 The carotenoid and cobalamin biosynthesis pathways in T. thermophilus HB27. The genes encoding the enzymes that catalyze the first steps of both pathways are scattered around the chromosome (ORF numbers in green). Subsequent steps are encoded by gene clusters on the plasmid pTT27 (in red). Additional genes related to carotenoid formation are TTP67 and TTP47. TTP67 is an isopentenyl diphosphate (IPP) isomerase, which ensures the interconversion of IPP and dimethylallyl diphosphate (DMAPP). TTP47 is a protein similar to carotenoid isomerases that exhibits cis-to-trans isomerization activity. Copies of the genes encoding IPP isomerase and CysG/CobA were found on the chromosome as well as on the plasmid.

Phosphatases. Several phosphatases were found, some of which, such as an alkaline phosphatase (TTP24) and an acid phosphatase (TTC1252), have been identified in other Thermus species 40. The genome also encodes a pyrophosphatase (TTC1600); the corresponding enzyme of T. thermophilus HB8 is commercially available 9. In addition, a system for the formation and degradation of polyphosphates is present, comprising polyphosphate kinase (TTC637) and exopolyphosphatase (TTC636), respectively.

Alcohol dehydrogenases. Alcohol dehydrogenases of thermophilic organisms are thought to be promising biocatalysts for industrial processes 41, and HB27 encodes two of them (TTC97, TTC1572). Altogether, more than 100 dehydrogenases or oxidoreductases are encoded in the genome of strain HB27, most of which have not yet been analyzed.

Other interesting and previously unknown enzymes of T. thermophilus HB27 include those of a ring-cleavage pathway for the degradation of phenylacetate, comprising homologs of 4-hydroxyphenylacetate 3-monooxygenase and homoprotocatechuate 2,3-dioxygenase (TTC591-609). Interestingly, phenylacetic acid has been described as a component of the peptidoglycan of T. thermophilus 42.

DISCUSSION
The complete genome sequence of the extremely thermophilic bacterium Thermus thermophilus provides a solid foundation for investigating many aspects of the thermophilic lifestyle, ranging from molecular stability determinants to key elements of organismal physiology. What contribution the megaplasmid pTT27 makes to this lifestyle is an intriguing but still open question. Carotenoids may be a case in point: their biosynthesis is largely encoded on pTT27, and such pigments are thought to reduce membrane fluidity 33,43.

T. thermophilus and D. radiodurans share a large number of orthologs with a high degree of sequence identity. This offers a unique opportunity for comparative studies of the conformational and chemical thermostability of proteins. In sharp contrast to this similarity at the level of individual molecules, the two genomes show little overall synteny. This attests to a surprising flexibility of genome structure in at least one of the two lineages since their last common ancestor.
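The contrast between many highly similar orthologs and little conserved gene order can be made concrete with one simple synteny statistic. This statistic is the fraction of adjacent gene pairs in one genome whose orthologs are also adjacent in the other. The following is a minimal sketch, not the method used in this study. It treats each genome as a single linear gene list, ignoring circularity, multiple replicons and paralogs. The function name and the gene identifiers in the usage example are hypothetical.

```python
from typing import Dict, List


def conserved_adjacency_fraction(order_a: List[str], order_b: List[str],
                                 orthologs: Dict[str, str]) -> float:
    """Fraction of adjacent gene pairs in genome A whose orthologs in genome B
    are also adjacent (in either orientation).

    Simplifications (assumptions): both gene orders are linear lists and each
    gene has at most one ortholog, given as an A -> B mapping.
    """
    position_in_b = {gene: i for i, gene in enumerate(order_b)}
    scored_pairs = 0
    conserved_pairs = 0
    for left, right in zip(order_a, order_a[1:]):
        b_left = orthologs.get(left)
        b_right = orthologs.get(right)
        # Only pairs in which both genes have an ortholog present in B are scored.
        if b_left in position_in_b and b_right in position_in_b:
            scored_pairs += 1
            if abs(position_in_b[b_left] - position_in_b[b_right]) == 1:
                conserved_pairs += 1
    return conserved_pairs / scored_pairs if scored_pairs else 0.0


if __name__ == "__main__":
    # Hypothetical genes a1..a5 whose orthologs b1..b5 lie in a rearranged order.
    genome_a = ["a1", "a2", "a3", "a4", "a5"]
    genome_b = ["b3", "b1", "b2", "b5", "b4"]
    ortholog_map = {f"a{i}": f"b{i}" for i in range(1, 6)}
    print(conserved_adjacency_fraction(genome_a, genome_b, ortholog_map))  # prints 0.5
```

A value near 1 would indicate largely conserved gene order. Values near the level expected for randomly shuffled genes would match the "little overall synteny" described above, even when the orthologous proteins themselves are highly similar.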

Finally, T. thermophilus has already proven useful as a source of numerous thermostable biological macromolecules, to the benefit of both basic research and biotechnology. These hitherto sporadic efforts can now be put on a broad and systematic basis.

METHODS
Sequencing strategy. Total genomic DNA was extracted from cells of T. thermophilus HB27 grown from a single cell isolated by the ‘optical tweezers’ method 44, and was then sheared. Several shotgun libraries were constructed using size fractions ranging from 1 to 3 kbp. A cosmid library was constructed from Sau3AI partially digested genomic DNA cloned in the cosmid vector SuperCos1 (Stratagene). Insert ends of the recombinant plasmids and cosmids were sequenced using ABI Prism 377 DNA sequencers (Applied Biosystems) with dye-terminator chemistry and Licor IR4200 devices (LI-COR). Sequences were processed with Phred and assembled into contigs using the Phrap assembler (http://www.phrap.org/). Sequence editing was done using GAP4, which is part of the Staden software package 45. Assembly of about 28,400 sequences yielded 9.1-fold coverage. The resulting contigs were ordered according to the previously determined physical map, which contains the locations of 61 genes from T. thermophilus and other Thermus strains 46. To resolve misassembled regions caused by repetitive sequences and to close the remaining sequence gaps, PCR-based techniques and primer walking on recombinant plasmids and cosmids were applied. The final mean error rate at the nucleotide level was <1 per 10,000 bases.
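The coverage and accuracy figures above can be checked with simple arithmetic. The following sketch makes three assumptions:

* Coverage follows the common definition of total sequenced bases divided by genome size (c = NL/G).
* G is the sum of the chromosome and megaplasmid sizes given in the Results.
* The idealized Lander-Waterman model applies, which ignores repeats and minimum overlaps.

The sketch also converts the stated error rate into a Phred-scaled quality. The implied read length and gap numbers are estimates derived under these assumptions. They are not values reported by the authors, and the function names are illustrative.

```python
import math

# Replicon sizes of T. thermophilus HB27 as given in the Results (bp).
CHROMOSOME_BP = 1_894_877
MEGAPLASMID_BP = 232_605
GENOME_BP = CHROMOSOME_BP + MEGAPLASMID_BP

READS = 28_400          # "about 28,400 sequences"
COVERAGE = 9.1          # reported fold coverage
MAX_ERROR_RATE = 1e-4   # "<1 per 10,000 bases"


def implied_mean_read_length(coverage: float, genome_bp: int, reads: int) -> float:
    """Solve c = N * L / G for L (assumes coverage counts all assembled bases)."""
    return coverage * genome_bp / reads


def lander_waterman(coverage: float, genome_bp: int, reads: int) -> tuple[float, float]:
    """Idealized Lander-Waterman expectations, ignoring repeats and minimum overlap:
    expected uncovered bases ~ G * exp(-c), expected gaps ~ N * exp(-c)."""
    p_uncovered = math.exp(-coverage)
    return genome_bp * p_uncovered, reads * p_uncovered


def phred_quality(error_rate: float) -> float:
    """Phred-scaled quality Q = -10 * log10(p)."""
    return -10 * math.log10(error_rate)


if __name__ == "__main__":
    mean_len = implied_mean_read_length(COVERAGE, GENOME_BP, READS)
    uncovered_bp, gaps = lander_waterman(COVERAGE, GENOME_BP, READS)
    print(f"genome size: {GENOME_BP:,} bp")
    print(f"implied mean read length: {mean_len:.0f} bp")
    print(f"expected uncovered bases: {uncovered_bp:.0f}; expected gaps: {gaps:.1f}")
    print(f"error rate {MAX_ERROR_RATE:g} -> Phred Q {phred_quality(MAX_ERROR_RATE):.0f}")
    print(f"upper bound on residual errors: {GENOME_BP * MAX_ERROR_RATE:.0f}")
```

Under these assumptions the reported numbers imply:

* A mean usable read length of roughly 680 bp.
* About 240 uncovered bases and about three gaps expected from random sampling alone. Repeats in a real genome add to these, which is why PCR and primer walking were needed.
* A Phred-scaled quality above 40 for an error rate below 1 in 10,000, or fewer than about 213 residual errors across the whole genome.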

Gene prediction and annotation. Initial gene prediction was performed with GeneMarkS 47. The output was verified and edited manually using criteria such as the presence of a ribosome binding site and codon usage analysis. Annotation was done using the ERGO tool (Integrated Genomics, http://www.integratedgenomics.com/) in a two-step approach. First, all proteins were searched against a nonredundant database with FASTA3, resulting in an automatic annotation. This database comprises all publicly available sequence data in addition to unfinished genomes sequenced by the Göttingen Genomics Laboratory. All predictions were then verified and modified manually by comparing the protein sequences with the corresponding entries in the public databases SwissProt, GenBank, ProDom, COG, PROSITE and Pfam.
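The two manual curation criteria named above are a ribosome binding site upstream of the start codon and codon usage. They can be illustrated with a small sketch. The Shine-Dalgarno-like motifs, the 20-base search window and the helper names has_rbs, codon_usage and gc3 are hypothetical choices for illustration. The text does not state which motifs, windows or codon-usage statistics were actually applied. GC3, the fraction of codons with G or C at the third position, is included as one simple codon-usage signal that is often informative in GC-rich genomes.

```python
from collections import Counter

# Hypothetical Shine-Dalgarno-like motifs: the 4-mers of the canonical core AGGAGG.
# These and the search window are assumptions, not the authors' criteria.
SD_MOTIFS = ("AGGA", "GGAG", "GAGG")
UPSTREAM_WINDOW = 20  # bases scanned upstream of the start codon


def has_rbs(sequence: str, start: int, window: int = UPSTREAM_WINDOW) -> bool:
    """True if an SD-like motif occurs within `window` bases upstream of `start`
    (0-based index of the first base of the start codon, forward strand only)."""
    upstream = sequence[max(0, start - window):start].upper()
    return any(motif in upstream for motif in SD_MOTIFS)


def codon_usage(cds: str) -> Counter:
    """Count in-frame codons of a coding sequence; a trailing partial codon is ignored."""
    cds = cds.upper()
    return Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))


def gc3(cds: str) -> float:
    """Fraction of codons with G or C at the third codon position."""
    counts = codon_usage(cds)
    total = sum(counts.values())
    gc_third = sum(n for codon, n in counts.items() if codon[2] in "GC")
    return gc_third / total if total else 0.0


if __name__ == "__main__":
    # Toy forward-strand sequence: SD-like motif, spacer, then a short ORF.
    upstream = "TTAAGGAGGTTTAACC"
    orf = "ATGGCCGCGCTGTAA"
    seq = upstream + orf
    start = len(upstream)
    print(has_rbs(seq, start))  # True
    print(gc3(orf))             # 0.8
```

In practice such checks would only flag candidates for a curator to inspect. A gene prediction lacking an obvious ribosome binding site, or with atypical codon usage, is not necessarily wrong.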

Nucleotide sequence accession number. The sequence reported in this paper has been deposited in GenBank with accession numbers AE017221 (chromosome) and AE017222 (plasmid).

Note: Supplementary information is available on the Nature Biotechnology website.

ACKNOWLEDGMENTS
We are grateful to Reinhard Sterner and Wolfgang Liebl for advice and discussions that helped to shape this project. We thank Takayuki Hoshino for strain confirmation. This work was supported by a grant from the Niedersächsisches Ministerium für Wissenschaft und Kultur to the Göttingen Genomics Laboratory and by funds of the Competence Network Göttingen “Genome Research on Bacteria” financed by the German Federal Ministry of Education and Research (BMBF).

COMPETING INTERESTS STATEMENT
The authors declare that they have no competing financial interests.

Received 24 September 2003; accepted 18 January 2004
Published online at http://www.nature.com/naturebiotechnology/

1. Brock, T.D. & Freeze, H. Thermus aquaticus gen. n. and sp. n., a nonsporulating extreme thermophile. J. Bacteriol. 98, 289–297 (1969).

2. Oshima, T. & Imahori, K. Description of Thermus thermophilus (Yoshida and Oshima) comb. nov., a nonsporulating thermophilic bacterium from a Japanese thermal spa. Int. J. Syst. Bacteriol. 24, 102–112 (1974).

3. Williams, R.A., Smith, K.E., Welch, S.G., Micallef, J. & Sharp, R.J. DNA relatedness of Thermus strains, description of Thermus brockianus sp. nov., and proposal to reestablish Thermus thermophilus (Oshima and Imahori). Int. J. Syst. Bacteriol. 45, 495–499 (1995).

4. Koyama, Y., Hoshino, T., Tomizuka, N. & Furukawa, K. Genetic transformation of the extreme thermophile Thermus thermophilus and of other Thermus spp. J. Bacteriol. 166, 338–340 (1986).

5. Friedrich, A., Prust, C., Hartsch, T., Henne, A. & Averhoff, B. Molecular analyses of the natural transformation machinery and identification of pilus structures in the extremely thermophilic bacterium Thermus thermophilus strain HB27. Appl. Environ. Microbiol. 68, 745–755 (2002).

6. Wimberly, B.T. et al. Structure of the 30S ribosomal subunit. Nature 407, 327–339 (2000).

7. Vieille, C. & Zeikus, G.J. Hyperthermophilic enzymes: sources, uses, and molecular mechanisms for thermostability. Microbiol. Mol. Biol. Rev. 65, 1–43 (2001).

8. Sterner, R. & Liebl, W. Thermophilic adaptation of proteins. Crit. Rev. Biochem. Mol. Biol. 36, 39–106 (2001).

9. Pantazaki, A.A., Pritsa, A.A. & Kyriakidis, D.A. Biotechnologically relevant enzymes from Thermus thermophilus. Appl. Microbiol. Biotechnol. 58, 1–12 (2002).

10. Niehaus, F., Bertoldo, C., Kahler, M. & Antranikian, G. Extremophiles as a source of novel enzymes for industrial application. Appl. Microbiol. Biotechnol. 51, 711–729 (1999).

11. Weisburg, W.G., Giovannoni, S.J. & Woese, C.R. The Deinococcus-Thermus phylum and the effect of rRNA composition on phylogenetic tree construction. Syst. Appl. Microbiol. 11, 128–134 (1989).

12. Hensel, R., Demharter, W., Kandler, O., Kroppenstedt, R.M. & Stackebrandt, E. Chemotaxonomic and molecular-genetic studies of the genus Thermus: evidence for a phylogenetic relationship of Thermus aquaticus and Thermus ruber to the genus Deinococcus. Int. J. Syst. Bacteriol. 36, 444–453 (1986).

13. White, O. et al. Genome sequence of the radioresistant bacterium Deinococcus radiodurans R1. Science 286, 1571–1577 (1999).

14. Makarova, K.S. et al. Genome of the extremely radiation-resistant bacterium Deinococcus radiodurans viewed from the perspective of comparative genomics. Microbiol. Mol. Biol. Rev. 65, 44–79 (2001).

15. Fryxell, K.J. & Zuckerkandl, E. Cytosine deamination plays a primary role in the evolution of mammalian isochores. Mol. Biol. Evol. 17, 1371–1383 (2000).

16. Starkuviene, V. & Fritz, H.J. A novel type of uracil-DNA glycosylase mediating repair of hydrolytic DNA damage in the extremely thermophilic eubacterium Thermus thermophilus. Nucleic Acids Res. 30, 2097–2102 (2002).

17. Friedrich, A., Rumszauer, J., Henne, A. & Averhoff, B. Pilin-like proteins in the extremely thermophilic bacterium Thermus thermophilus HB27: implication in competence for natural transformation and links to type IV pilus biogenesis. Appl. Environ. Microbiol. 69, 3695–3700 (2003).

18. Slesarev, A.I. et al. The complete genome of hyperthermophile Methanopyrus kandleri AV19 and monophyly of archaeal methanogens. Proc. Natl. Acad. Sci. USA 99, 4644–4649 (2002).

19. Mather, M.W., Springer, P., Hensel, S., Buse, G. & Fee, J.A. Cytochrome oxidase genes from Thermus thermophilus. Nucleotide sequence of the fused gene and analysis of the deduced primary structures for subunits I and III of cytochrome caa3. J. Biol. Chem. 268, 5395–5408 (1993).

20. Soulimane, T. et al. Structure and mechanism of the aberrant ba(3)-cytochrome c oxidase from Thermus thermophilus. EMBO J. 19, 1766–1776 (2000).

21. Yamaguchi, M., Stout, C.D. & Hatefi, Y. The proton channel of the energy-transducing nicotinamide nucleotide transhydrogenase of Escherichia coli. J. Biol. Chem. 277, 33670–33675 (2002).

22. Yokoyama, K. et al. V-type H+-ATPase/synthase from a thermophilic eubacterium, Thermus thermophilus. Subunit structure and operon. J. Biol. Chem. 275, 13955–13961 (2000).

23. Ramirez-Arcos, S., Fernandez-Herrero, L.A. & Berenguer, J. A thermophilic nitrate reductase is responsible for the strain specific anaerobic growth of Thermus thermophilus HB8. Biochim. Biophys. Acta 1396, 215–227 (1998).

24. Krafft, T., Gross, R. & Kroger, A. The function of Wolinella succinogenes psr genes in electron transport with polysulphide as the terminal electron acceptor. Eur. J. Biochem. 230, 601–606 (1995).

25. Larimer, F.W. et al. Complete genome sequence of the metabolically versatile photosynthetic bacterium Rhodopseudomonas palustris. Nat. Biotechnol. 22, 55–61 (2004).

26. Martens, J.H., Barg, H., Warren, M.J. & Jahn, D. Microbial production of vitamin B12. Appl. Microbiol. Biotechnol. 58, 275–285 (2002).

27. Debussche, L., Thibaut, D., Cameron, B., Crouzet, J. & Blanche, F. Biosynthesis of the corrin macrocycle of coenzyme B12 in Pseudomonas denitrificans. J. Bacteriol. 175, 7430–7440 (1993).

28. Roessner, C.A., Huang, K.X., Warren, M.J., Raux, E. & Scott, A.I. Isolation and characterization of 14 additional genes specifying the anaerobic biosynthesis of cobalamin (vitamin B12) in Propionibacterium freudenreichii (P. shermanii). Microbiology 148, 1845–1853 (2002).

29. Sandmann, G. Carotenoid biosynthesis and biotechnological application. Arch. Biochem. Biophys. 385, 4–12 (2001).

30. Nishino, H. et al. Carotenoids in cancer chemoprevention. Cancer Metastasis Rev. 21, 257–264 (2002).

31. Yokoyama, A., Shizuri, Y., Hoshino, T. & Sandmann, G. Thermocryptoxanthins: novel intermediates in the carotenoid biosynthetic pathway of Thermus thermophilus. Arch. Microbiol. 165, 342–345 (1996).

32. Tabata, K., Ishida, S., Nakahara, T. & Hoshino, T. A carotenogenic gene cluster exists on a large plasmid in Thermus thermophilus. FEBS Lett. 341, 251–255 (1994).

33. Hoshino, T., Fujii, R. & Nakahara, T. Molecular cloning and sequence analysis of the crtB gene of Thermus thermophilus HB27, an extreme thermophile producing carotenoid pigments. Appl. Environ. Microbiol. 59, 3150–3153 (1993).

34. Daniel, R.M., Toogood, H.S. & Bergquist, P.L. Thermostable proteases. Biotechnol. Genet. Eng. Rev. 13, 51–100 (1996).

35. Rowsell, S. et al. Crystal structure of carboxypeptidase G2, a bacterial enzyme with applications in cancer therapy. Structure 5, 337–347 (1997).

36. Suzuki, Y., Hatagaki, K. & Oda, H. A hyperthermostable pullulanase produced by an extreme thermophile, Bacillus flavocaldarius KP 1228, and evidence for the proline theory of increasing protein thermostability. Appl. Microbiol. Biotechnol. 34, 707–714 (1991).

37. Jaeger, K.E., Dijkstra, B.W. & Reetz, M.T. Bacterial biocatalysts: molecular biology, three-dimensional structures, and biotechnological applications of lipases. Annu. Rev. Microbiol. 53, 315–351 (1999).

38. Ruttimann, C., Cotoras, M., Zaldivar, J. & Vicuna, R. DNA polymerases from the extremely thermophilic bacterium Thermus thermophilus HB-8. Eur. J. Biochem. 149, 41–46 (1985).

39. Bullard, J.M. et al. DNA polymerase III holoenzyme from Thermus thermophilus: identification, expression, purification of components, and use to reconstitute a processive replicase. J. Biol. Chem. 277, 13401–13408 (2002).

40. Pantazaki, A.A., Karagiorgas, A.A., Liakopoulou-Kyriakides, M. & Kyriakidis, D.A. Hyperalkaline and thermostable phosphatase in Thermus thermophilus. Appl. Biochem. Biotechnol. 75, 249–259 (1998).

41. Adachi, O. et al. New developments in oxidative fermentation. Appl. Microbiol. Biotechnol. 60, 643–653 (2003).

42. Quintela, J.C., Pittenauer, E., Allmaier, G., Aran, V. & de Pedro, M.A. Structure of peptidoglycan from Thermus thermophilus HB8. J. Bacteriol. 177, 4947–4962 (1995).

43. Chamberlain, N.R. et al. Correlation of carotenoid production, decreased membrane fluidity, and resistance to oleic acid killing in Staphylococcus aureus 18Z. Infect. Immun. 59, 4332–4337 (1991).

44. Huber, R. et al. Isolation of a hyperthermophilic archaeum predicted by in situ RNA analysis. Nature 376, 57–58 (1995).

45. Staden, R., Beal, K.F. & Bonfield, J.K. The Staden package, 1998. Methods Mol. Biol. 132, 115–130 (2000).

46. Tabata, K. & Hoshino, T. Mapping of 61 genes on the refined physical map of the chromosome of Thermus thermophilus HB27 and comparison of genome organization with that of T. thermophilus HB8. Microbiology 142, 401–410 (1996).

47. Besemer, J., Lomsadze, A. & Borodovsky, M. GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. Nucleic Acids Res. 29, 2607–2618 (2001).
