
Photosynthesis Research 70: 43–52, 2001. © 2001 Kluwer Academic Publishers. Printed in the Netherlands.


Minireview

The Rhodobacter capsulatus genome

Robert Haselkorn1,*, Alla Lapidus1, Yakov Kogan1, Cestmir Vlcek2, Jan Paces2, Vaclav Paces2, Pavel Ulbrich2, Tamara Pecenkova2, Denis Rebrekov3, Arthur Milgram4, Mikhail Mazur1, Randal Cox1, Nikos Kyrpides1, Natalia Ivanova1, Vinayak Kapatral1, Tamara Los1, Athanasios Lykidis1, Natalia Mikhailova1, Gary Reznik1, Olga Vasieva1 & Michael Fonstein1

1Integrated Genomics, Inc., 2201 West Campbell Park Drive, Chicago, IL 60612, USA; 2Institute of Molecular Genetics, Academy of Sciences of the Czech Republic and Institute of Chemical Technology, Prague CZ-16637, Czech Republic; 3Russian Academy of Sciences, Institute of Bioorganic Chemistry, Moscow, Russia; 4University of Pennsylvania, Philadelphia, PA 19104, USA; *Author for correspondence (e-mail: [email protected]; fax +1-312-491-0856)

Key words: annotation, comparative genomics, DNA sequence, metabolic reconstruction

Abstract

The genome of Rhodobacter capsulatus has been completely sequenced. It consists of a single chromosome of 3.5 Mb and a circular plasmid of 134 kb. This effort, started in 1992, began with a fine-structure restriction map of an overlapping set of cosmids that covered the genome. Cosmid sequencing led to a gapped genome that was filled by primer walking on the chromosome and by using lambda clones. Methods had to be developed to handle strong stops in the high-GC (68%) inserts. Annotation was done with the ERGO system at Integrated Genomics, as was the reconstruction of the cell’s metabolism. It was possible to recognize 3709 ORFs, of which functional assignments could be made with high confidence to 2392 (65%). Unusual features include the presence of numerous cryptic phage genomes embedded in the chromosome.

Abbreviations: BS – B. subtilis; EC – E. coli. Numbers assigned to proteins follow the standard enzyme nomenclature of the International Union of Biochemistry and Molecular Biology.

Introduction

Rhodobacter capsulatus has been a favorite research tool in the areas of photosynthesis, energetics and nitrogen fixation for many years. In addition to being easy to grow, it offers a convenient system for genetic analysis based on the defective phage called Gene Transfer Agent (GTA). DNA can also be transferred from E. coli to Rhodobacter using broad-host-range plasmid vectors. Through the 1980s, most efforts to understand the molecular genetics of Rhodobacter were based on cloning and sequencing genes for photosynthesis, electron transfer and nitrogen fixation, and all of the regulatory proteins required for their expression. A more global approach to the genome was started in 1990.

Physical mapping

A rough circular genetic map of the Rhodobacter chromosome had been established using GTA-mediated crosses as well as conjugative crosses. Circularity was confirmed by the construction of a low-resolution physical map based on pulsed-field gel analysis of restriction fragments generated by rarely cutting endonucleases. For these analyses, several enzymes were found to produce a manageable number of fragments; each fragment was then eluted, labeled and hybridized to digests made with the other enzymes. The resulting overlaps permitted the map construction (Fonstein et al. 1992). Where the map had ambiguities, these could be resolved by isolating phage clones that bridged the ends of two large fragments.


The resolution of the physical map was subsequently improved by constructing a library of cosmid clones and arranging these clones into an ordered set covering the chromosome uniquely (Fonstein and Haselkorn 1993). This was accomplished by spotting the library of cosmid-containing cells on a grid and then probing the grid with individually labeled cosmids. Each colony that hybridized was then grown up, and its DNA was extracted and labeled for use as the probe in the next round of hybridization. Eventually, a set of 192 overlapping cosmids covered the chromosome and a 134-kb plasmid. The ordering of the cosmid set relied on three kinds of data: the initial cosmid-cosmid hybridization, subsequent hybridization of transcripts from the cosmid ends to other cosmids, and hybridization with bridging clones of lambda phage inserts. Often all three data sets were required, and in some cases repeated DNA elements confounded the mapping so badly that it could not be completed until the full DNA sequence was obtained.
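The ordering step can be pictured as a walk through an overlap graph. The following minimal Python sketch is not the procedure actually used: it assumes the hybridization results have already been reduced to a symmetric table of which clones overlap which, that each clone overlaps only its immediate neighbours, and that the contig is linear. The function name and clone names are hypothetical. A clone with more than one onward partner (for example, because it carries a repeated element) is reported as ambiguous rather than guessed at, which mirrors the difficulty described above.

```python
def order_clones(overlaps):
    """Order clones into one linear contig from a symmetric overlap table.

    overlaps maps each clone name to the set of clones it hybridized with.
    Simplifying assumption: every clone overlaps only its immediate neighbours.
    """
    # Clones at either end of a linear contig have exactly one partner.
    ends = sorted(c for c, partners in overlaps.items() if len(partners) == 1)
    if not ends:
        raise ValueError("no contig end found (circular or ambiguous overlaps)")
    order, previous = [ends[0]], None
    while True:
        current = order[-1]
        candidates = [c for c in overlaps[current] if c != previous]
        if not candidates:
            return order  # reached the other end of the contig
        if len(candidates) > 1:
            # More than one onward partner, e.g. a repeat shared by distant clones.
            raise ValueError(f"ambiguous overlaps at {current}: {sorted(candidates)}")
        nxt = candidates[0]
        if nxt in order:
            raise ValueError(f"overlaps loop back to {nxt}")
        previous = current
        order.append(nxt)


# Hypothetical hybridization results for four cosmids.
overlaps = {
    "c3": {"c7"},
    "c7": {"c3", "c1"},
    "c1": {"c7", "c5"},
    "c5": {"c1"},
}
print(order_clones(overlaps))  # ['c3', 'c7', 'c1', 'c5']
```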

A high-resolution physical map was generated from the cosmid set. One purpose of this map was to confirm the ordering of the set. To do this, two datasets were again generated. The first was a complete digest of each cosmid with 6-base-cutting endonucleases, creating a catalogue of fragment sizes for each cosmid. Often, this catalogue sufficed to establish the overlaps. For more precision, each cosmid was linearized at the cos site using lambda terminase, and each end was then labeled specifically with oligonucleotides complementary to either the left or the right lambda end. Finally, each end-labeled cosmid was partially digested with a 6-base-cutting endonuclease and the sizes of the partial products were determined. Together, the two data sets permitted precise determination of the restriction map for each enzyme used.
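The logic of the end-labeled partial digests can be made concrete with a minimal Python sketch. It assumes fragment sizes in kb have already been read off the gels; the function names, the 40-kb cosmid, its site positions and the 0.3-kb tolerance are all hypothetical. Because every end-labeled partial product runs from the labeled end to one cut site, its size is that site's distance from the end. Reading from the left and right ends therefore gives two independent estimates of each site, and the complete-digest fragments they imply can be checked against the catalogue from the first data set.

```python
def sites_from_partials(length, left_partials, right_partials, tol=0.3):
    """Restriction-site positions (kb from the left end) of a linearized cosmid."""
    # An end-labeled partial fragment spans from the labeled end to one cut site,
    # so its size equals that site's distance from the labeled end.
    # Full-length fragments are uncut molecules and carry no site information.
    from_left = sorted(s for s in left_partials if s < length)
    from_right = sorted(length - s for s in right_partials if s < length)
    if len(from_left) != len(from_right):
        raise ValueError("different numbers of sites seen from the two ends")
    sites = []
    for a, b in zip(from_left, from_right):
        if abs(a - b) > tol:
            raise ValueError(f"site estimates disagree: {a} vs {b} kb")
        sites.append((a + b) / 2)  # average the two independent estimates
    return sites


def complete_digest(length, sites):
    """Fragment sizes expected from a complete digest, for comparison with the catalogue."""
    bounds = [0.0, *sites, float(length)]
    return sorted(round(b - a, 2) for a, b in zip(bounds, bounds[1:]))


# Hypothetical 40-kb linearized cosmid with three sites for one enzyme.
sites = sites_from_partials(40.0, [8.0, 15.0, 31.0, 40.0], [9.0, 25.0, 32.0, 40.0])
print(sites)                         # [8.0, 15.0, 31.0]
print(complete_digest(40.0, sites))  # [7.0, 8.0, 9.0, 16.0]
```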

This high-resolution map allowed the generation of an early version of a gene microarray (Fonstein et al. 1995). The full set of 192 cosmids was digested with one or more endonucleases, and the digests were separated by electrophoresis and blotted. The resulting blots contained more than 500 DNA fragments arrayed in 192 columns according to their location on the physical map. This master blot was used in several applications: for expression profiling, for mapping individual genes and for strain comparisons.
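One way to read the master blot for gene mapping is to intersect the map intervals of all fragments a probe hybridizes to. The Python sketch below is illustrative only: the function name, fragment labels and coordinates are hypothetical, and it assumes the probe lies wholly within each fragment it hits. A probe spanning a restriction site would light up non-overlapping neighbouring fragments, and the sketch reports that case as an error instead of resolving it.

```python
def locate_probe(fragment_coords, hits):
    """Smallest map interval consistent with every fragment a probe hybridized to.

    fragment_coords maps a fragment label to its (start, end) position in kb on
    the physical map; hits lists the fragments that lit up on the master blot.
    """
    start = max(fragment_coords[f][0] for f in hits)
    end = min(fragment_coords[f][1] for f in hits)
    if start >= end:
        raise ValueError("hit fragments share no common map region")
    return start, end


# Hypothetical fragments from two adjacent cosmid columns.
coords = {
    "cos12-E3": (110.0, 118.5),
    "cos13-E1": (114.0, 121.0),
    "cos13-B2": (112.0, 116.0),
}
print(locate_probe(coords, ["cos12-E3", "cos13-E1", "cos13-B2"]))  # (114.0, 116.0)
```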

Genome sequencing

The sequencing project was started before it became clear that a whole-genome shotgun approach would be feasible for a 3.6-Mb genome. It was therefore decided to sequence one cosmid at a time, and that is how the project began. Plasmid libraries were constructed from each cosmid, and sequencing was done mostly on two ABI 377 machines in the University of Chicago Cancer Center DNA Sequencing Facility. Part of the sequencing was done in Prague at the Institute of Molecular Genetics of the Academy of Sciences of the Czech Republic (Vlcek et al. 1997). Several ‘interesting’ technical problems had to be overcome. For example, the high GC content of Rhodobacter DNA led to regions in which runs of G and runs of C form structures that stop the sequencing reaction. These required special priming and the incorporation of modified bases to read through the strong stops. Some regions did not appear in the cosmid or plasmid libraries; these had to be sequenced by long-range PCR on intact cellular DNA template or on circularized restriction digests of total cellular DNA. In some cases, the relevant sequences were found in lambda phage clones. Throughout, questionable areas were verified by PCR. As of this writing, two regions have still not been closed. One gap is flanked by highly repetitive DNA that could be telomeres, so conventional genetic experiments may be needed to determine whether the chromosome is actually circular.
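Problem regions of the kind described above can be flagged in silico. The minimal Python sketch below computes GC content and lists homopolymer runs of G or of C. The function names and the five-base minimum run length are hypothetical choices, and homopolymer runs are only a rough proxy for the secondary structures that actually cause strong stops.

```python
import re


def gc_content(seq):
    """Fraction of G + C in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def g_or_c_runs(seq, min_len=5):
    """(position, run) for every homopolymer run of G or of C at least min_len long."""
    pattern = re.compile("G{%d,}|C{%d,}" % (min_len, min_len))
    return [(m.start(), m.group()) for m in pattern.finditer(seq.upper())]


read = "ATGGGGGGCATCCCCCCTA"  # hypothetical read
print(round(gc_content(read), 3))  # 0.684
print(g_or_c_runs(read))           # [(2, 'GGGGGG'), (11, 'CCCCCC')]
```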

Features of the sequence

The following sections were provided by members of the annotation and reconstruction teams at Integrated Genomics. The complete sequence and its annotation are freely available on the web site www.integratedgenomics.com.

Species comparisons

The R. capsulatus genome was compared globally with those of R. sphaeroides and R. palustris. Putative ORFs were identified using IG’s proprietary software. All three genomes showed approximately equal ORF density, about 1 ORF/kb (Table 1). Over 60% of the genes in all three genomes have a putative function assigned to them, with R. sphaeroides having the highest percentage (78%). R. capsulatus showed the lowest proportion of ‘unique’ ORFs (9%), i.e. ORFs with no sequence similarity to any other genome, and R. palustris the highest (14%). There are 1094 protein families with at least one member from each of the three genomes. Most of these families are widely represented among the bacteria. If we ask for families whose members are found only in this group of three genomes, there is only one such family. It consists of hypothetical proteins, so its function is unknown.
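The densities and ‘unique’ percentages quoted above follow directly from Table 1, and the family counts rest on simple set operations. The Python sketch below recomputes the former from the Table 1 values and illustrates the latter with hypothetical family IDs. The function names are hypothetical, and the sketch does not reproduce how IG’s software assigned ORFs to families.

```python
# Genome size (kb), total ORFs, and ORFs without function or similarity (Table 1).
TABLE1 = {
    "R. capsulatus": (3721, 3709, 330),
    "R. sphaeroides": (4606, 4536, 460),
    "R. palustris": (5507, 5425, 764),
}


def summarize(table):
    """ORF density (ORFs per kb) and percentage of 'unique' ORFs for each genome."""
    return {
        name: (round(orfs / size_kb, 2), round(100 * unique / orfs, 1))
        for name, (size_kb, orfs, unique) in table.items()
    }


def shared_families(genomes):
    """Families with at least one member in every genome (dict of name -> family IDs)."""
    return set.intersection(*(set(f) for f in genomes.values()))


def group_specific(group, others):
    """Families present in every genome of the group and in none of the others."""
    return shared_families(group) - set().union(*others.values())


for name, (density, unique_pct) in summarize(TABLE1).items():
    print(f"{name}: {density} ORF/kb, {unique_pct}% unique")

# Hypothetical family IDs, only to show the set logic.
group = {
    "Rcap": {"F1", "F2", "F3", "H1"},
    "Rsph": {"F1", "F2", "H1"},
    "Rpal": {"F1", "F2", "F3", "H1"},
}
others = {"E. coli": {"F1", "F2"}, "B. subtilis": {"F2"}}
print(sorted(shared_families(group)))         # ['F1', 'F2', 'H1']
print(sorted(group_specific(group, others)))  # ['H1']
```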

Special features

The R. capsulatus genome has 55 identified IS elements and 27 transposases (up to 766 amino acids in length). We have also identified 41 phage-related genes in the chromosome, most of which lie in 6–7 different locations that we presume correspond to cryptic phage genomes. One of these is the well-known Gene Transfer Agent (Lang and Beatty 2001). The 15 genes encoding structural components of the GTA particle are clustered in a 15-kb region and are expressed as the cells enter stationary phase, under the control of a two-component system, CckA and CtrA, whose genes are both located nearby. R. palustris has apparent homologues of all 15 genes, but these genes are dispersed among several distant chromosomal regions, and R. palustris does not produce GTA. Two of the other putative cryptic phage genomes in R. capsulatus have several ORFs similar to those of the GTA. Codon usage among the ORFs in these phages suggests that they are relatively recent additions to the genome, that is, the result of lateral transfer.
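The codon-usage argument for lateral transfer can be illustrated with one simple proxy, the GC content at third codon positions (GC3), which in a GC-rich genome such as this one is expected to be high for native genes. This is an assumed stand-in: the analysis behind the statement above is not specified here, and the function names, ORFs, genome-wide GC3 and 0.1 threshold in this Python sketch are all hypothetical. A real analysis would use full codon-usage tables and much longer ORFs.

```python
def gc3(orf):
    """Fraction of G or C at the third position of each codon of an in-frame ORF."""
    thirds = orf.upper()[2::3]
    return sum(base in "GC" for base in thirds) / len(thirds)


def atypical_orfs(orfs, genome_gc3, threshold=0.1):
    """Names of ORFs whose GC3 departs from the genome-wide value by more than threshold."""
    return [name for name, seq in orfs.items() if abs(gc3(seq) - genome_gc3) > threshold]


# Hypothetical ORFs and a hypothetical genome-wide GC3 for a GC-rich genome.
orfs = {"orf1": "ATGGCCGGCCTGTAA", "orf2": "ATGGCAGGTCTTTAA"}
print({name: gc3(seq) for name, seq in orfs.items()})  # {'orf1': 0.8, 'orf2': 0.2}
print(atypical_orfs(orfs, genome_gc3=0.85))            # ['orf2']
```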

Photosynthesis

Under microaerobic and anaerobic conditions, Rhodobacter capsulatus develops an intracytoplasmic membrane system containing light-harvesting complexes (I and II) and photochemical reaction centers (Bauer and Bird 1996; Pemberton et al. 1998; Cogdell et al. 1999; Loach 2000). Together with a membrane-embedded quinone pool and the cytochrome bc1 complex, these make up a conventional purple bacterial photosynthetic system. Functional annotation of the R. capsulatus genome sequence revealed that all of the genes required for the structural and functional organization of this photosystem are present in the sequence:

1. There are operons that encode bacteriochlorophyll and carotenoid biosynthesis-related proteins (bacteriochlorophyll synthase, magnesium-protoporphyrin IX monomethyl ester oxidative cyclase, phytoene synthase and dehydrogenase, spheroidene monooxygenase, etc.).

2. There are two operons (puc and puf) that contain sequences encoding the light-harvesting complexes (LH II and LH I, respectively) and the reaction center structural proteins: B-800/850 (alpha, beta and gamma chains; puc operon; LH II), B-870 (alpha and beta chains; puf operon), as well as the M and L chains of the reaction center protein (puf operon). In addition, the puf operon contains the sequence for the PufX protein, believed to be part of LH I and responsible for the correct orientation of the reaction center with respect to the LH I complex. Another puf operon-encoded protein, PufQ, is involved in early events of bacteriochlorophyll biosynthesis.

3. There are also ORFs encoding precursors of the soluble cytochrome c2, proteins of the ubiquinol-cytochrome c2 oxidoreductase (cytochrome bc1 complex), and proteins for (ubi)quinone metabolism.

The formation of the photosynthetic apparatus in Rhodobacter capsulatus is regulated by oxygen tension. Several studies have shown that oxygen affects the transcription of photosynthetic genes, the stability of certain mRNA segments and posttranslational steps (Pemberton et al. 1999). The two-component system RegA/RegB (RegA, a transcriptional regulator of the puc, puf and puh operons under anaerobic conditions; RegB, a sensory histidine kinase) is involved in transmission of the oxygen signal (positive regulation). Both of these proteins are present in the available R. capsulatus sequence. Negative regulation of the puc operon, as well as of the crt and bch operons (which encode proteins of carotenoid and bacteriochlorophyll metabolism), is mediated by the CrtJ and CrtK proteins, which are also present in the sequenced genome, as is another protein, AppA, a regulator of CrtJ and CrtK. The role of CrtK (TspO) has been established for R. sphaeroides (Yeliseev and Kaplan 2000), and a similar ORF is present in the R. capsulatus sequence. With regard to further levels of regulation, the ERGO site can be visited to see the ‘pinned regions’ around the genes encoding PpsS and PpsR. These show that ten ORFs, including those for PpsS and PpsR, are arrayed identically in R. sphaeroides and R. capsulatus, and most of them are organized identically in R. palustris as well. Another pinned region shows conservation between the R. sphaeroides region encoding cytochrome c552, AppA precursor and AppB and a similar R. capsulatus region encoding cytochrome cy, AppA precursor and AppB.

Table 1. Genome comparisons of three Rhodobacter species

                                          R. capsulatus   R. sphaeroides   R. palustris
Size (kb)                                 3721            4606             5507
ORFs                                      3709            4536             5425
– ORFs with assigned functions            2531 (68%)      3063 (78%)       3273 (60%)
– ORFs without assigned functions         1178 (32%)      1473 (32%)       2152 (40%)
– ORFs without functions or similarity    330 (9%)        460 (10%)        764 (14%)
ORFs in ortholog clusters                 2203 (59%)      2527 (56%)       2705 (50%)
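A ‘pinned region’ comparison asks, at its simplest, how long a run of genes is conserved in the same order in two genomes. The Python sketch below finds the longest such run from lists of ortholog-group labels in chromosomal order. The function name and gene labels are hypothetical, the brute-force search is only suitable for short neighbourhoods, and inverted segments are ignored, so this is not a description of how ERGO computes pinned regions.

```python
def longest_collinear_run(order_a, order_b):
    """Longest stretch of consecutive genes found in the same order in both genomes.

    Genes are given as ortholog-group labels listed in chromosomal order. For
    simplicity, runs on the opposite strand (reversed order) are not detected.
    """
    best = []
    for i in range(len(order_a)):
        for j in range(len(order_b)):
            k = 0
            while (i + k < len(order_a) and j + k < len(order_b)
                   and order_a[i + k] == order_b[j + k]):
                k += 1
            if k > len(best):
                best = order_a[i:i + k]
    return best


# Hypothetical neighbourhoods around the ppsR gene in two genomes.
rcap = ["orfA", "ppsR", "orfB", "orfC", "orfD", "orfX"]
rsph = ["orfY", "ppsR", "orfB", "orfC", "orfD", "orfZ"]
print(longest_collinear_run(rcap, rsph))  # ['ppsR', 'orfB', 'orfC', 'orfD']
```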

Light-dependent activation of the puf operon and the puhA gene (which encodes the third subunit, H, of the reaction center) is accomplished by two proteins encoded by the hvr operon: the trans-acting regulatory protein HvrA and the transcriptional activator HvrB. Both members of this signal transduction system are present in the sequenced genome, as is an ORF encoding the redox-sensing protein FnrL, a positive regulator of the bch operon (bacteriochlorophyll metabolism-related proteins). It has been suggested that a redox signal originating from the cbb3-type cytochrome c oxidase (encoded by the ccoNOQP operon, which is organized identically in R. capsulatus and R. sphaeroides) could inhibit photosystem gene expression under aerobic conditions (Bauer and Bird 1996; Pemberton et al. 1998; Zeilstra-Ryalls et al. 1998). The proteins encoded by the rdxBHIS operon, located just downstream from the ccoNOQP operon, apparently participate in the signal transduction cascade based on cbb3 cytochrome oxidase activity. The same redox flow also helps control the relative accumulation of carotenoids, mainly spheroidene and spheroidenone, through regulation of the crt operon under photosynthetic conditions.

Overall, the available genome sequence of R. capsulatus contains the entire network of regulatory and structural proteins participating in photosynthetic activity. None of the expected genes is missing.

All the genes for saturated and monounsaturated fatty acid synthesis and degradation have been identified. Additionally, R. capsulatus appears to have the enzymes for polyunsaturated fatty acid utilization. In contrast to R. sphaeroides, it does not synthesize beta-hydroxy fatty acids (we have not identified a fatty acid hydroxylase). It has the enzymes for lipid A biosynthesis. It uses the mevalonate-independent pathway for isoprenoid biosynthesis, and we have identified the enzymes for spheroidene and spheroidenone biosynthesis. As in R. sphaeroides, it is able to synthesize phosphatidylcholine using both phosphatidylcholine synthase (RRC00355) and phosphatidylethanolamine methyltransferase (RRC03911). It does not appear to have a phosphatidylinositol synthase, as is the case in R. sphaeroides. Also, in contrast to R. sphaeroides, it does not appear to have the enzymes for sulfolipid or betaine-lipid biosynthesis.

Sigma factors

The genome contains six ORFs encoding sigma factors. One is RpoD, the principal sigma-70. This sequence was determined previously by S. Zheng (PhD thesis, University of Chicago) and found to agree with the amino acid sequence obtained for the purified sigma factor from holoenzyme. Unusually by comparison with other proteobacteria, the rpoD gene stands alone rather than in an operon together with a dna gene and a ribosomal protein gene. A second ORF encodes a protein similar to E. coli sigma-32 and R. sphaeroides sigma-37, the heat shock sigma factor. A third encodes a protein related to RpoE of E. coli or R. sphaeroides, a sigma-19 required for transcription of itself and of the cycA operon in R. capsulatus. The fourth is NifR3, also known as sigma-54, the factor that promotes transcription at sequences recognized by the response regulator of the two-component system NtrB/NtrC, active at nitrogen-regulated promoters. The nifR3 gene is downstream from the nifHDK operon, which it regulates. The fifth sigma factor is similar to sigma-K of Clostridium perfringens, a factor required for a late stage of sporulation; its function in R. capsulatus is unknown. Finally, the last sigma factor is very similar to HrpL of Pseudomonas syringae, which works in conjunction with the two-component signaling system HrpS/HrpR to transcribe genes encoding virulence factors active in the colonization of susceptible plants by this phytopathogenic bacterium. The hrpL gene is present in R. sphaeroides as well.

There is another similarity to proteobacteria that are either pathogens or symbionts. The rRNA operons of all three purple photosynthetic bacteria are organized as follows: 16S-tRNA(Ile)-tRNA(Ala)-23S-5S-tRNA(Met). This exact organization is found in Brucella melitensis and its close relatives, and nowhere else to date. In terms of rRNA sequences as well, Brucella clusters with the Rhodobacters.

Signalling

R. capsulatus has a single copy each of the flagellar and chemotaxis operons, unlike R. sphaeroides, where the flagellar operon is duplicated. The class I regulatory genes flhD and flhC appear to be absent. All of the class II flagellar genes are present except for fliA (encoding sigma-28, which is specific for the flagellar operon) and its negative regulator (the flgM gene product). The missing flhDC is probably dispensable, because the flagellar genes can be transcribed by sigma-54 RNA polymerase, as in Pseudomonas. The motA and motB genes are tandem in most Gram-negative bacteria, but in R. capsulatus only MotB is present; MotA of the motor complex is missing. The motB operon is organized similarly to that of R. sphaeroides. Most of the class III flagellar genes are present, including a single flagellin gene.

With regard to chemotaxis, there are two copies of methylesterases (CheB), two copies of methyltransferases (CheR), three copies of the response regulator CheY and three purine-binding chemotaxis proteins (CheW). There are 18 methyl-accepting chemotaxis proteins, not yet fully assigned to the various subtypes. There is no CheA, CheA-interacting protein (CheC) or CheV (a fusion protein between CheW and CheY). For signal transduction, there are 5 regulatory components of sensory transduction histidine kinases (i.e. response regulators) and 3 sensory components of sensory transduction histidine kinases. These three are paired with three of the response regulator genes. In addition, there are 17 individual sensory transduction protein kinase domains elsewhere in the genome. In total, there are only 25 ORFs (5 + 3 + 17) potentially involved in two-component signal transduction, significantly fewer than in E. coli, for example.

There is an apparent five-gene operon containing the structural gene for the photoactive yellow protein (PYP) and one for the coumaryl-CoA ligase, the enzyme that attaches the chromophore to the PYP. The other three ORFs encode hypothetical proteins with no recognizable domains. Wouter Hoff (personal communication) points out that it would be possible to determine whether the PYP participates in chemotaxis or phototaxis by physical characterization of the protein.

Carbohydrate metabolism

Uptake and assimilation of carbohydrates

R. capsulatus seems to be unable to degrade any extracellular carbohydrate polymer (cellulose, levan, chitin, amylose, maltodextrin, pectate, pectin). An ORF that might encode a glycohydrolase precursor (RRC01974) was tentatively assigned as a beta-hexosaminidase precursor; however, its exact substrate specificity is obscure. It is part of a conserved operon, also found in Rhodobacter sphaeroides, Mesorhizobium loti, Brucella melitensis and Magnetospirillum magnetotacticum, which also includes three other conserved hypothetical proteins, most likely cytosolic.

Two ORFs homologous to the P. aeruginosa PQQ-dependent ethanol dehydrogenase exaA were found (RRC03543 and RRC04444). Further conversion of acetaldehyde to acetate is performed by a cytosolic NAD-dependent acetaldehyde dehydrogenase encoded by RRC03580, which is highly homologous to Xanthobacter autotrophicus chloroacetaldehyde dehydrogenases as well as to acetaldehyde dehydrogenases from several other organisms, including the P. aeruginosa exaC enzyme. In P. aeruginosa, a cytochrome c550 was shown to be an essential component of the ethanol-degrading system (Schobert and Görisch 1999). A homolog of the exaB cytochrome c550 was found in the R. capsulatus genome (RRC03554). However, whereas in the P. aeruginosa genome all the genes for the ethanol degradation pathway are clustered, in the Rhodobacter genome they are scattered. Moreover, downstream from the exaBC operon in P. aeruginosa lies a whole set of genes for PQQ biosynthesis, none of which is present in Rhodobacter capsulatus. Interestingly, while all the PQQ biosynthesis genes are present in R. sphaeroides, none of the genes for the ethanol degradation pathway are.

Along with ethanol, the periplasmic ethanol dehydrogenase might use methanol as a substrate, which would yield formaldehyde. A glutathione-dependent formaldehyde-detoxifying system was found in R. capsulatus, including formaldehyde dehydrogenase (RRC01984) and formylglutathione hydrolase (RRC01985).

Ribitol 2-dehydrogenase (EC 1.1.1.56, RRC02336) and ribulokinase (EC 2.7.1.47, RRC02335) are found in R. capsulatus. They are clustered with genes encoding a putative ABC transporter (RRC01430, RRC01431 and RRC02337) homologous to the E. coli xylose ABC transporter. However, based on the specificity of the enzymes clustered with it, this transporter might be assigned as a ribitol-specific ABC transporter. This assignment is supported by the finding of an ORF (RRC01433) upstream of the ABC transporter that is homologous to the Klebsiella aerogenes ribitol operon repressor. Thus far, the only known specific transporter for ribitol is the ribitol-proton symporter RbtT found in Klebsiella aerogenes. A homologous operon encoding a ribitol operon repressor, ABC transporter, ribitol dehydrogenase and ribulokinase is also found in R. sphaeroides.

Another polyol transporter is the sorbitol/mannitol transporter (RRC00677, RRC00676, RRC00675, RRC00674), which is clustered with sorbitol dehydrogenase (EC 1.1.1.14, RRC00673) and mannitol 2-dehydrogenase (EC 1.1.1.67, RRC00672). Sorbitol and mannitol are converted to fructose, which is then phosphorylated by fructokinase (EC 2.7.1.4, RRC00974). Two other ORFs with homology to fructokinase were found (RRC00947 and RRC02134), but their homology to experimentally proven fructokinases is low.

Alpha-glucosides (sucrose, trehalose and maltose) are taken up by a probable alpha-glucoside-specific ABC transporter (RRC03427, RRC03426, RRC03422), which is homologous to the Agl transporter of S. meliloti (Willis and Walker 1999). Further degradation to glucose is performed by alpha-glucosidase (EC 3.2.1.20, RRC03423). The transporter and alpha-glucosidase genes are clustered with a homolog of the S. meliloti AglR transcriptional regulator (RRC03428), which is followed by beta-glucosidase (EC 3.2.1.21, RRC03429) and glucokinase (EC 2.7.1.2, RRC03430). The clustering of a beta-glucosidase gene with a probable alpha-glucoside ABC transporter might suggest that beta-glucosides (cellobiose, etc.) are transported by this complex as well.

A D-ribose ABC transporter was found (RRC01127, RRC01129, RRC01130). Ribokinase is most likely encoded by RRC04080, and a ribokinase homolog is encoded by RRC02404. D-Xylose is taken up by an ABC transporter (RRC00724, RRC00723, RRC00722) and converted to xylulose-5-phosphate by xylose isomerase (EC 5.3.1.5, RRC00720) and xylulose kinase (EC 2.7.1.17, RRC00721). Fructose is taken up and phosphorylated to fructose-1-phosphate by a fructose-specific PTS system (RRC01003). Fructose-1-phosphate is further phosphorylated by 1-phosphofructokinase (EC 2.7.1.56, RRC01004). No glucose-specific transporter was found in the R. capsulatus genome.

R. capsulatus is able to degrade several mono- and dicarboxylic acids, including D- and L-lactate, propionate and malonate. Dicarboxylic acids are taken up by the C4-dicarboxylic acid transport protein (RRC00167–RRC00169) and used via the TCA cycle. Both D-lactate dehydrogenase (EC 1.1.2.4, RRC01950) and L-lactate dehydrogenase (EC 1.1.2.3, RRC00194) are found, although no lactate permease was identified.

Propionate might be degraded via the methylcitrate pathway (all genes were identified) or via methylmalonyl-CoA. For the latter pathway, no ORF encoding methylmalonyl-CoA epimerase was identified.

A sodium-dependent malonate transporter (RRC02280, RRC02281) and malonate decarboxylase (RRC02285–RRC02291), homologous to those of Malonomonas rubra, were identified in R. capsulatus. Malonate decarboxylase from R. capsulatus belongs to a family of biotin-containing, sodium-ion-translocating enzymes. This energy-conserving enzyme might be used during anaerobic growth on malonate in the dark. Interestingly, attempts to express a functional biotin protein, MadF, in E. coli were unsuccessful: only 5% of it appeared to be biotinylated. It was suggested that the biotin ligase of M. rubra has a different substrate specificity from that of the E. coli enzyme. R. capsulatus has two copies of biotin ligase, one of which is clustered with the malonate degradation operon (RRC02290). In addition, the BioY protein (RRC02277), of unknown function yet somehow related to biotin biosynthesis, is clustered with the malonate degradation genes.

An operon homologous to the S. enterica propanediol utilization operon pdu was found in R. capsulatus. Genes encoding a putative propanol dehydrogenase (RRC01269) and a putative CoA-dependent propionaldehyde dehydrogenase (RRC01259), as well as genes homologous to components of the polyhedral bodies that accumulate in S. enterica during anaerobic growth on propanediol, were all found. However, a propanediol dehydratase homolog, together with the enzymes that reactivate cobalamin-dependent enzymes (pduGH in S. enterica), is absent from the R. capsulatus genome, whereas pyruvate-formate lyase (RRC01270) and pyruvate-formate lyase activating enzyme (RRC01257) are clustered with the propanediol utilization genes. The last enzyme necessary for conversion of propanediol to propionate, propionate kinase, seems to be absent from the Rhodobacter genome as well. Since propanediol dehydratase is the key enzyme of the propanediol degradation pathway, it is hard to say whether R. capsulatus can use propanediol in a way different from S. enterica, or whether this operon participates in the utilization of some other compound.

Carbohydrate central metabolism

All of the glycolytic genes (for both the Embden–Meyerhof and Entner–Doudoroff pathways) were found. R. capsulatus is capable of slow fermentative growth under dark anaerobic conditions (Gurgun et al. 1976), which can be attributed to the presence of pyruvate-formate lyase (RRC01270), formate dehydrogenase (RRC00179–RRC00182) and L-lactate dehydrogenase (EC 1.1.1.27, RRC04214). The latter protein is homologous to the Thermus aquaticus enzyme. Along with lactate, formate, acetate and CO2, 2,3-butanediol, acetoin and diacetyl were formed during fermentative growth of R. capsulatus; however, no homologs of acetolactate decarboxylase or acetoin dehydrogenase were identified. All genes of the non-oxidative branch of the pentose phosphate shunt were found. All tricarboxylic acid cycle enzymes were identified except for fumarate hydratase (EC 4.2.1.2). Acetyl-CoA for the TCA cycle can be supplied both by the pyruvate dehydrogenase complex (RRC03494–RRC03496) and by pyruvate-flavodoxin oxidoreductase (RRC03470). Pyruvate carboxylase (RRC02720), NADP-dependent malic enzyme (EC 1.1.1.40, RRC04526) and NAD(P)-dependent malic enzyme (EC 1.1.1.39, RRC03006–RRC03008) were found. Both the NAD(P)- and NADP-dependent malic enzymes are highly homologous to their S. meliloti counterparts, which are unusual members of the malic enzyme family because they have an extra domain and are unable either to carboxylate pyruvate or to decarboxylate oxaloacetate. Genes of the glyoxylate shunt were also found.
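At its simplest, checking a pathway such as the TCA cycle for completeness means comparing the EC numbers of its steps with the set of EC numbers annotated in the genome. In the Python sketch below, the step list is one common textbook assignment (assumed here, not taken from ERGO, whose pathway definitions may differ), multi-subunit complexes are represented by a single EC number for brevity, and the annotated set and function name are hypothetical, constructed to mirror the single gap described above.

```python
# One common listing of TCA cycle steps by EC number (an assumption, not ERGO's).
TCA_CYCLE = {
    "2.3.3.1": "citrate synthase",
    "4.2.1.3": "aconitate hydratase",
    "1.1.1.42": "isocitrate dehydrogenase (NADP+)",
    "1.2.4.2": "2-oxoglutarate dehydrogenase (E1 component)",
    "6.2.1.5": "succinyl-CoA synthetase",
    "1.3.5.1": "succinate dehydrogenase",
    "4.2.1.2": "fumarate hydratase",
    "1.1.1.37": "malate dehydrogenase",
}


def missing_steps(pathway, annotated_ec):
    """Steps of a pathway with no annotated enzyme, as {EC number: name}."""
    return {ec: name for ec, name in pathway.items() if ec not in annotated_ec}


# Hypothetical annotation: every TCA step except fumarate hydratase was found.
annotated = set(TCA_CYCLE) - {"4.2.1.2"}
print(missing_steps(TCA_CYCLE, annotated))  # {'4.2.1.2': 'fumarate hydratase'}
```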

Both ribulose bisphosphate carboxylase 1 (RRC03778, RRC03779) and the large subunit of ribulose bisphosphate carboxylase 2 (RRC02395) were found. However, unlike R. sphaeroides, R. capsulatus has only one copy each of phosphoribulokinase (RRC02391) and fructose bisphosphatase (RRC02390). The CO2 fixation genes are clustered on the chromosome with glycogen cycle genes (glycogen phosphorylase RRC00912, branching enzyme RRC02383, glucose-1-phosphate adenylyltransferase RRC02384, glycogen synthase RRC02385, glycogen debranching enzyme RRC02386, and phosphoglucomutase RRC02387). Gluconeogenesis might proceed from acetyl-CoA via the glyoxylate shunt, and from pyruvate via pyruvate, phosphate dikinase (EC 2.7.9.1, RRC02727).

Nucleotide sugar biosynthesis

R. capsulatus is able to synthesize the following nucleotide sugars: UDPglucose, UDPglucuronate, UDPgalactose, dTDP-L-rhamnose, CDPglucose, GDPmannose, UDP-N-acetylglucosamine, UDP-N-acetylmannosamine, UDP-N-acetylneuraminate and CMP-3-deoxy-D-manno-octulosonate. No genes for ADP-heptose biosynthesis were found. R. capsulatus might produce periplasmic glucans (homologs of the MdoG and MdoH proteins are encoded by RRC04027 and RRC03014, respectively). However, no genes responsible for succinylation or phosphoglycerol modification of these glucans were identified.

Nitrogen metabolism

Denitrification

The only denitrification enzyme found in Rhodobacter capsulatus is nitrous oxide reductase (EC 1.7.99.6). A complete operon is present, encoding the nitrous oxide reductase precursor (NosZ), the proteins involved in maturation of the copper center (nosDFY) and the NosR regulator. Nitrous oxide respiration is an independent process that need not be coupled to complete denitrification. However, we know of no report of an organism that has nitrous oxide reductase without at least one other denitrification enzyme. Wolinella succinogenes seems to be the best-studied example of a non-denitrifying bacterium capable of nitrous oxide respiration (Youshinari 1980), but it also has nitrate reductase, which seems to be absent from our Rhodobacter strain.

There are no data on the regulation of expression of this enzyme in Rhodobacter. The only expression data are for Pseudomonas stutzeri, which has very sophisticated regulation of nosZ transcription, with six promoters upstream of nosZ (Cuypers et al. 1995). In P. stutzeri, nosZ is expressed constitutively at a low level even under aerobic conditions (other denitrification enzymes are not). Full expression of nosZ is achieved under anaerobiosis or oxygen limitation (< 3% air saturation), with concomitant presence of nitrogen oxide as an inducer, and only when the NosR regulator is present. Nitrate also induces nosZ expression, so the NarL regulator might be involved, but Tn5 mutagenesis showed that all mutations leading to defects in nitrous oxide respiration were clustered within a single region of ca. 8 kb (Wiebrock and Zumft 1987). This makes participation of other transcription regulators of denitrification enzymes, such as NarL, NnrR or NirI, unlikely. None of these regulators was found in the R. capsulatus genome.

Membrane transport

The only nitrogen compound-transporting proteins found in R. capsulatus are the ammonium transporter and a urea transport system. No transporters for nitrate, nitrite or cyanate were found.

Nitrogen fixation

We found genes for the two known nitrogenases of R. capsulatus: the iron–molybdenum enzyme and the iron–iron enzyme. The gene cluster for Fe-Mo cofactor biosynthesis sits separately from the genes for the iron–molybdenum nitrogenase, as shown previously by the Klipp laboratory. All of the nitrogen fixation regulators expected in Rhodobacter (NtrB, NtrC, GlnB, the NtrY/NtrX two-component system, NifR3, NifA, DraTG and AnfA) were located. There is no NifL or rnf cluster.

Ammonia assimilation

Rhodobacter produces L-glutamate by transamination with L-glutamine, which is produced by glutamine synthetase from L-glutamate and ammonia. It has both the NADPH-dependent glutamate synthase (GOGAT) and the ferredoxin-dependent enzyme, but not the NADH-dependent enzyme. L-Glutamine synthetase (EC 6.3.1.2) was assigned to several genes in both R. capsulatus and R. sphaeroides. One of them lies next to genes for an ammonium transporter, the nitrogen regulatory protein P-II and nitric oxide reductase (EC 1.7.99.7), suggesting functional clustering, but it has rather weak homology to the B. subtilis glnA gene. Several other genes annotated as L-glutamine synthetase are situated close to each other on the chromosome.
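Glutamine synthetase and glutamate synthase together form a cycle, and summing the two reactions shows its net effect. The sketch below uses textbook stoichiometries for glutamine synthetase (EC 6.3.1.2) and for the NADPH-dependent glutamate synthase (EC 1.4.1.13, a number not given in the text). The dictionary names and the helper `net_reaction` are illustrative assumptions. Glutamine cancels out, which leaves the assimilation of one ammonia into 2-oxoglutarate at the cost of one ATP and one NADPH.

```python
from collections import Counter

# Stoichiometric coefficients: negative = consumed, positive = produced.
GLUTAMINE_SYNTHETASE = {  # GS, EC 6.3.1.2 (textbook reaction)
    "L-glutamate": -1, "NH3": -1, "ATP": -1,
    "L-glutamine": 1, "ADP": 1, "Pi": 1,
}
GLUTAMATE_SYNTHASE_NADPH = {  # GOGAT, EC 1.4.1.13 (textbook reaction)
    "L-glutamine": -1, "2-oxoglutarate": -1, "NADPH": -1, "H+": -1,
    "L-glutamate": 2, "NADP+": 1,
}


def net_reaction(*reactions):
    """Sum stoichiometries and drop metabolites whose coefficients cancel."""
    total = Counter()
    for reaction in reactions:
        for metabolite, coeff in reaction.items():
            total[metabolite] += coeff
    return {m: c for m, c in total.items() if c != 0}


net = net_reaction(GLUTAMINE_SYNTHETASE, GLUTAMATE_SYNTHASE_NADPH)
consumed = sorted(m for m, c in net.items() if c < 0)
produced = sorted(m for m, c in net.items() if c > 0)
print(" + ".join(consumed), "->", " + ".join(produced))
```

The ferredoxin-dependent enzyme could be modeled the same way by replacing NADPH with reduced ferredoxin in the second dictionary.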

Amino acid biosynthesis

1. For chorismate biosynthesis, all ORFs are present: aroA, aroB, aroC, aroE and aroK. Chorismate mutase is found as a single gene in both R. capsulatus and R. sphaeroides (ORFs RRC03020 and RRS08505, respectively).

2. Tryptophan biosynthesis via anthranilate is complete, represented by homologs of the E. coli trpA, trpB and trpC genes as well as anthranilate synthase (an E. coli pabA homolog).

3. Tyrosine and phenylalanine biosynthesis from chorismate via prephenate is also complete. Transamination can be provided by aromatic amino acid aminotransferase (tyrB), aspartate aminotransferase (aspC), histidinol phosphate aminotransferase (hisC) or prephenate aminotransferase (tyrA), with glutamate or aspartate as amino donors. Oxidation/reduction in this pathway can be accomplished by arogenate dehydrogenase activity. This contig also contains the gene annotated as hisC.

4. For histidine, all necessary ORFs are present and assigned as His-family proteins. The HisB protein is bifunctional, with both imidazoleglycerol-phosphate dehydratase and histidinol phosphatase domains.

5. Valine biosynthesis is also represented by all ORFs, assigned as Ilv proteins.

6. Leucine biosynthesis from 3-methyl-2-oxobutanoate shares part of the valine pathway and also involves ORFs assigned as Leu proteins. A branched-chain amino acid aminotransferase (EC 2.6.1.42) can provide transamination in the last step of both pathways. A specific leucine aminotransferase sequence was not found.

7. Alanine can be synthesized in three ways: from valine and pyruvate via EC 2.6.1.42; from beta-alanine via omega-amino acid–pyruvate aminotransferase (EC 2.6.1.18); or via alanine dehydrogenase, with NH4+ as the source of the amino group. Alanine aminotransferase and serine-pyruvate aminotransferase were not found in the genome. Alanine and glutamate racemases are present.

8, 9. Glutamate is synthesized from 2-oxoglutarate and glutamine, either via glutamate synthase (ferredoxin), EC 1.4.7.1, or via glutamate synthase (NADPH), for which the large and small chains (GOGAT) are present. Glutamine is made from glutamate and NH4+ by the product of the glnA gene (GS).

10. Ornithine could be synthesized from glutamate via the arg genes in two ways: with glutamate acetyltransferase (EC 2.3.1.35), which uses glutamate as the acceptor of the acetyl group, or with aminoacylase (EC 3.5.1.14), which releases acetate.

11. L-Proline can be made from glutamate by the proA, proB and proC gene products. Alternatively, L-proline can be made from ornithine in two different ways: via acetylornithine aminotransferase (ArgD), which also produces glutamate, followed by pyrroline-5-carboxylate reductase (the proC gene product), or via ornithine cyclodeaminase (EC 4.3.1.12), which releases NH4+ in one step.

12–14. Serine, glycine and cysteine are synthesized from 3-phosphoglycerate. Serine is made via the phosphorylated pathway, which involves the serA, serB and serC genes. Glycine is obtained from L-serine via the glyA gene product, serine hydroxymethyltransferase, or from threonine via threonine aldolase (EC 4.1.2.5). Cysteine is produced via serine acetyltransferase (cysE) and cysteine synthase (cysK); the sulfur source is sulfide.

15. L-Aspartate, the first member of the oxaloacetate family of amino acids, is made from oxaloacetate by aspartate aminotransferase, assigned to the patA or malY genes.

16. Lysine is made from aspartate by the products of the dapA, dapB, dapD, dapE and dapF genes and of lysA (diaminopimelate decarboxylase). Homoserine dehydrogenase (EC 1.1.1.3), encoded by the metL gene, is used for L-homoserine biosynthesis.

17. Threonine synthase was found. Serine could also serve as a source of threonine via glycine hydroxymethyltransferase and threonine aldolase.

18. The isoleucine biosynthetic pathway from threonine is fully present, via genes of the ilv family.

19. Homocysteine is synthesized from homoserine via the MetZ protein, O-succinylhomoserine sulfhydrylase (EC 4.2.99.-), as well as MetA, the homoserine sulfhydrylase. R. capsulatus also has O-acetylhomoserine thiol-lyase (EC 4.2.99.10), as part of MetC, which could also add hydrogen sulfide.
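Several entries above (alanine in item 7, proline in item 11) list alternative routes to the same product. In that case a pathway counts as present if any one route is complete. A minimal Python sketch of this "any complete route" check follows. The route definitions paraphrase the text. The gene-function labels and the `PRESENT` set are hypothetical illustrations, and the sketch ignores the upstream supply of ornithine for the proline routes.

```python
# Hypothetical sketch: each product maps to alternative routes, and each
# route is the set of gene functions it requires (labels are illustrative).
ROUTES = {
    "L-alanine": [
        {"branched-chain aminotransferase (EC 2.6.1.42)"},
        {"omega-amino acid-pyruvate aminotransferase (EC 2.6.1.18)"},
        {"alanine dehydrogenase"},
        {"alanine aminotransferase"},
    ],
    "L-proline": [
        {"proA", "proB", "proC"},
        {"argD", "proC"},
        {"ornithine cyclodeaminase (EC 4.3.1.12)"},
    ],
}


def usable_routes(routes, present):
    """Return the routes whose required functions are all present (subset test)."""
    return [route for route in routes if route <= present]


# Illustrative presence set following the text; alanine aminotransferase
# is deliberately left out because it was not found in the genome.
PRESENT = {
    "branched-chain aminotransferase (EC 2.6.1.42)",
    "omega-amino acid-pyruvate aminotransferase (EC 2.6.1.18)",
    "alanine dehydrogenase",
    "proA", "proB", "proC", "argD",
    "ornithine cyclodeaminase (EC 4.3.1.12)",
}

for product, routes in ROUTES.items():
    ok = usable_routes(routes, PRESENT)
    print(f"{product}: {len(ok)} of {len(routes)} routes usable")
```

Representing a route as a set makes completeness a plain subset test. Representing a product as a list of routes lets a missing enzyme (here alanine aminotransferase) disable one route without marking the whole pathway as absent.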

Conclusion

Clearly, some elements are still missing from the metabolic reconstruction, such as DNA replication and repair, transcription, translation, energy flux, and some aspects of transport and signaling. These will be added when the sequence is finally closed, or when genetic experiments indicate that the last gap is correct and that the chromosome is linear. The web site at www.integratedgenomics.com offers subscription to a suite of tools (ERGO) for examining the genome to date, with links to the literature on which the ORF assignments are based. The raw sequence data are freely available on this site. A fuller account of the sequence is expected to be published.

Acknowledgements

Early stages of this project were supported by the Department of Energy and by a grant from the National Science Foundation for international cooperation with the sequencing group in Prague. Work in Prague was additionally supported by grant number MSM223300006.

References

Bauer CE and Bird TH (1996) Regulatory circuits controllingphotosynthesis gene expression. Cell 85: 5–8

Cogdell RJ, Isaacs NW, Howard TD, McLuskey K, Fraser NJ and Prince SM (1999) How photosynthetic bacteria harvest solar energy. J Bacteriol 181: 3869–3879

Cuypers H, Berghofer J and Zumft WG (1995) Multiple nosZ promoters and anaerobic expression of nos genes necessary for Pseudomonas stutzeri nitrous oxide reductase and assembly of its copper centers. Biochim Biophys Acta 1264: 183–190

Fonstein M and Haselkorn R (1993) Chromosomal structure of Rhodobacter capsulatus strain SB1003: Cosmid encyclopedia and high resolution physical and genetic map. Proc Natl Acad Sci USA 90: 2522–2526

Fonstein M, Zheng S and Haselkorn R (1992) Physical map of the genome of R. capsulatus SB1003. J Bacteriol 174: 4070–4077

Fonstein M, Koshy EG, Nikolskaya T, Mourachov P and Haselkorn R (1995) Refinement of the high resolution physical and genetic map of R. capsulatus and genome surveys using blots of the cosmid encyclopedia. EMBO J 14: 1827–1841

Gurgun V, Kirchner G and Pfennig N (1976) Fermentation of pyruvate by seven species of phototrophic purple bacteria. Z Allg Mikrobiol 16: 573–586

Lang AS and Beatty JT (2001) The gene transfer agent of Rhodobacter capsulatus and ‘constitutive transduction’ in prokaryotes. Arch Microbiol 175: 241–249

Loach PA (2000) Supramolecular complexes in photosynthetic bacteria. Proc Natl Acad Sci USA 97: 5016–5018

Pemberton JM, Horne IM and McEwan AG (1998) Regulation of photosynthetic gene expression in purple bacteria. Microbiology 144: 267–278

Schobert M and Gorisch H (1999) Cytochrome c550 is an essential component of the quinoprotein ethanol oxidation system in Pseudomonas aeruginosa: Cloning and sequencing of the genes encoding cytochrome c550 and an adjacent acetaldehyde dehydrogenase. Microbiology 145: 471–481

Schwintner C, Sabaty M, Berna B, Cahors S and Richaud P (1998) Plasmid content and localization of the genes encoding the denitrification enzymes in two strains of Rhodobacter sphaeroides. FEMS Microbiol Lett 165: 313–321

Vlcek C, Paces V, Maltsev N, Paces J, Haselkorn R and Fonstein M (1997) Sequence of a 189-kb segment of the chromosome of Rhodobacter capsulatus SB1003. Proc Natl Acad Sci USA 94: 9384–9388

Wiebrock A and Zumft WG (1987) Physical mapping of transposon Tn5 insertions defines a gene cluster functional in nitrous oxide respiration by Pseudomonas stutzeri. J Bacteriol 169: 4577–4580

Willis LB and Walker GC (1999) A novel Sinorhizobium meliloti operon encodes an α-glucosidase and a periplasmic-binding-protein-dependent transport system for α-glucosides. J Bacteriol 181: 4176–4184

Yeliseev A and Kaplan S (2000) TspO of Rhodobacter sphaeroides. J Biol Chem 275: 5657–5667

Youshinari T (1980) N2O reduction by Vibrio succinogenes. Appl Environ Microbiol 39: 81–84

Zeilstra-Ryalls J, Gomelsky M, Eraso JM, Yeliseev A, O’Gara J and Kaplan S (1998) Control of photosystem formation in Rhodobacter sphaeroides. J Bacteriol 180: 2801–2809

Zumft WG (1997) Cell biology and molecular basis of denitrification. Microbiol Mol Biol Rev 61: 533–616