Subtraction hybridisation and shot-gun sequencing: a new approach to identify symbiotic loci

Nucleic Acids Research, 1994, Vol. 22, No. 8 1335-1341

X.Perret1,2, R.Fellay1, A.J.Bjourson3, J.E.Cooper3, S.Brenner2 and W.J.Broughton1,*

1Laboratoire de Biologie Moleculaire des Plantes Superieures, Universite de Geneve, 1 chemin de l'Imperatrice, 1292 Chambesy, Switzerland, 2Molecular Genetics, University of Cambridge School of Clinical Medicine, Addenbrooke's Hospital, Hills Road, Cambridge CB2 2QQ and 3Department of Mycology and Plant Pathology, The Queen's University of Belfast, Belfast BT9 5PX, UK

*To whom correspondence should be addressed

Received March 14, 1994; Accepted March 22, 1994

EMBL accession no. X74134

ABSTRACT

Traditionally, new loci involved in the Rhizobium-legume symbiosis have been identified by transposon mutagenesis and/or complementation. Wide dispersal of the symbiotic loci in Rhizobium species NGR234, as well as the large number of potential host-plants to be screened, greatly reduces the efficiency of these techniques. As an alternate strategy designed to identify new NGR234 genes involved in the early stages of the symbiosis, we combined data from competitive RNA hybridisation, subtractive DNA hybridisation and shot-gun sequencing. On the assumption that the expression of most nodulation genes is triggered by compounds released by the host-plant, we identified, in the ordered cosmid library of the large symbiotic plasmid pNGR234a, restriction fragments that carry transcripts induced by flavonoids. To target genes not present in the closely related strain R.fredii USDA257, we selected fragments that also carried sequences purified by subtractive DNA hybridisation. Shot-gun sequencing of this subset of fragments led to the identification of sequences with strong homology to diverse prokaryotic genes/proteins. Amongst these, a symbiotically active ORF from pNGR234a is highly homologous to the leucine responsive regulatory protein of Escherichia coli (Lrp), is induced by flavonoids, and is not present in USDA257.

INTRODUCTION

Symbiotic associations between leguminous plants and soil bacteria belonging to the genera Azorhizobium, Bradyrhizobium and Rhizobium lead to the formation of nitrogen-fixing root structures called nodules. In contrast to strains from temperate regions that tend to have a limited host-range, tropical rhizobia such as Rhizobium species NGR234 (1) and R.fredii USDA257 (2) nodulate a wide variety of host-plants. Tests on more than 400 different legumes have shown that NGR234 is able to nodulate at least 75 plant genera, including the non-legume Parasponia andersonii (1; 3; S.G.Pueppke and W.J.Broughton, unpublished). Comparative studies have shown that R.fredii USDA257 nodulates an exact subset of the hosts of NGR234 (S.G.Pueppke and W.J.Broughton, unpublished). At the nucleotide level, several symbiotic loci, including nodABC (3) and nodS (4), are almost perfectly conserved, suggesting a very close phylogenetic relationship between the two rhizobia. Interestingly, the nodSU genes that allow NGR234 to nodulate Leucaena species (5) are present in the USDA257 genome. However, a deletion in the promoter region renders nodSU inactive and is responsible for the Nod- phenotype of USDA257 on Leucaena (4).

Wide-spread dispersal of the symbiotic loci in NGR234 (5, 6), coupled with the large number of potential hosts to be screened, complicates traditional genetic approaches towards identifying symbiotic genes (random mutagenesis, interspecies complementation, etc). Accordingly, we designed an alternate strategy to identify genes involved in the early stages of nodulation (outlined in Fig. 1). The ordered cosmid library which covers the symbiotic plasmid pNGR234a (6), as well as 97% of the remaining 5.7 megabases of the NGR234 genome (7), was used to index the position of loci whose expression is triggered by plant signals (e.g. flavonoids). Many XhoI restriction fragments, dispersed over pNGR234a and carrying flavonoid-inducible genes, were identified by competitive RNA hybridisation (8). Some of these fragments carried known inducible loci, such as the nodABC and nodSU genes. Concomitantly, using DNA subtraction hybridisation, we purified NGR234 sequences that are absent from the genome of R.fredii USDA257. By probing the cosmid library with these 'unique' sequences, we were able to assign them first to certain cosmid clones and later to specific XhoI restriction fragments. To target flavonoid-inducible loci that are not present in USDA257, we combined the results of the competitive RNA and subtractive DNA hybridisations. This way, we identified a subset of restriction fragments that carry sequences not shared by USDA257, as well as inducible transcripts. Shot-gun sequencing of these DNA fragments, together with a fast search for homology among existing nucleic acid and protein databases, identified a number of putative genes with strong homology to diverse prokaryotic genes/proteins.

Figure 1. Flow diagram showing the methods used to analyze the symbiotic plasmid pNGR234a for loci induced by flavonoids and not shared by R.fredii USDA257. [Diagram not reproduced. Its labels read: DNA subtraction hybridisation (Sau3AI fragments from NGR234 not shared by USDA257), PCR amplification, cloning and random sequencing, leading to pX140 (ORF-1, LRP-like), pX177 and pX185; RNA competition hybridisation and probing of the ordered cosmid library (dot-blots and Southern blots of XhoI restriction digests), purification of XhoI fragments from pNGR234a positive in both types of hybridisation, Sau3AI library and random sequencing (150 sequences), leading to pRK41, pRK21, pR57 and pR64.]

MATERIAL AND METHODS

RNA competition hybridisation

Rhizobium species NGR234 was grown at 28°C in RMM minimal medium (9) with succinate as the carbon source. Flavonoid induction was performed by adding 200 nM of daidzein to cultures with a turbidity of 0.6 at 600 nm. Cells were harvested at different times after induction and resuspended in a pre-warmed solution (90°C) consisting of equal volumes of phenol saturated with sodium acetate (pH4.5) and 20 mM Tris-HCl containing 600 mM NaCl, 1 mM EDTA, and 1% (w/v) SDS. RNA was extracted with phenol-chloroform, precipitated with ethanol and purified by centrifugation through a CsCl cushion at 115,000 × g for 1 h. To prepare radioactive probes, 10 to 15 µg of RNA was partially hydrolyzed in 125 mM NaOH for 25 min on ice, and labelled using T4 polynucleotide kinase and [32P]ATP for 90 min at 37°C. The probes were purified by centrifugation through Sephadex G50 using Ultrafree-MC 0.45 µm filter units (Millipore, Bedford, MA, USA). Digested cosmid DNA was separated on 0.8% (w/v) agarose gels and transferred to 'GeneScreen Plus' nylon membranes which were pre-hybridised overnight at 65°C in 50 mM Tris-HCl (pH7.4), 0.2% (w/v) bovine serum albumin, 0.2% (w/v) Ficoll, 0.1% (w/v) sodium pyrophosphate, 1% (w/v) SDS, 1 M NaCl and 100-150 µg of non-labelled RNA prepared from non-induced rhizobia. Hybridisation was performed at 65°C for 20 h by adding the purified probes directly to the pre-hybridisation solution. Washing was performed (3 × 30 min at 65°C) in 1% (w/v) SDS, 1× SSC and 15 min at RT in 0.2× SSC.

DNA subtraction hybridisation (for a detailed protocol see Bjourson et al. 10)

Genomic DNAs from NGR234 (the 'probe' strain) and R.fredii USDA257 (the 'subtracter' strain) were prepared using standard procedures (10, 11). Approximately 1 µg of DNA from each strain was digested to completion with Sau3AI. 600 ng of specific linkers were ligated to ~200 ng of each of the restricted DNAs. Only the linker designed for the subtracter DNA was biotinylated and was synthesized using uracil in place of thymidine. One ng of the ligated probe DNA was amplified by 45 PCR cycles (80 sec denaturation at 94°C, 60 sec annealing at 55°C and 120 sec DNA polymerization at 72°C) in reaction mixtures containing 10 mM Tris-HCl (pH8.3), 50 mM KCl, 1.5 mM MgCl2, 0.01% (w/v) gelatin, 200 µM dNTPs, 1 µM primer (complementary to the linker) and 0.5 U of Taq DNA polymerase. With the exception of the biotinylated primer and the replacement of dTTP by dUTP, the same amplification conditions were applied to the subtracter DNA. To ensure that sufficient biotin groups were present for subsequent binding to streptavidin, the amplified subtracter DNA was additionally biotinylated. Subtraction hybridisation was performed in 0.5 ml centrifuge tubes, with ~1 to 5 ng of PCR-amplified probe DNA (NGR234) and 20 µg of subtracter DNA (USDA257) in a hybridisation solution containing 50 mM HEPES (pH7.5), 0.5 M NaCl, 1 mM EDTA, and 0.1% (w/v) SDS. The mixture was denatured at 99°C for 10 min, and incubated at 65°C for 48 hrs. To isolate the probe DNA from the subtraction mixture, 30 µg of streptavidin was added in two steps, and the mixture extracted several times with an equal volume of phenol-chloroform (50:50, v/v). Prior to cloning, the NGR234 DNA sequences left after two consecutive cycles of subtraction were PCR-amplified using the same amplification conditions as described above, except for the addition of uracil glycosylase to destroy any remaining traces of USDA257 subtracter DNA. The specific primer-linkers flanking the subtracted sequences were removed by digestion with Sau3AI. Fragments larger than 100 bp were purified from a 1.2% agarose gel using a DEAE cellulose membrane (Schleicher and Schuell GmbH, Dassel, Germany) and cloned into the BamHI restriction site of the Bluescript KS+ vector (Stratagene, La Jolla, CA, USA).
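The Sau3AI digestion and the 100 bp size cut-off applied before cloning can be mimicked in silico. The sketch below is purely illustrative and is not part of the published protocol: the function name, the `min_len` parameter and the toy sequence are assumptions. The only biological fact it encodes is that Sau3AI cuts immediately 5' of its GATC recognition site.

```python
def sau3ai_digest(seq, min_len=0):
    """Cut a linear DNA sequence 5' of every GATC site.

    Returns the fragments (5' to 3') that are longer than min_len bases.
    """
    seq = seq.upper()
    site = "GATC"
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos)                 # Sau3AI cuts just before the G
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    fragments = [seq[a:b] for a, b in zip(bounds, bounds[1:])]
    return [f for f in fragments if len(f) > min_len]


# Toy sequence, invented for illustration only.
toy = "AAGATCTTTGATCC"
print(sau3ai_digest(toy))             # all fragments
print(sau3ai_digest(toy, min_len=3))  # only fragments longer than 3 bases
```

With real data, `min_len=100` would correspond to the "larger than 100 bp" cut-off mentioned in the protocol; gel purification is of course less sharp than a length filter.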

DNA isolation and sequencing

Bacterial strains and plasmids used are listed in Table 1. E.coli was grown on TYE or Terrific Broth (12). Bluescript recombinants were raised in E.coli DH5α, while Lorist2 cosmid clones were grown in E.coli 1046. Cosmid and Bluescript recombinant DNAs were prepared by standard alkaline mini-preparations (12). A Sau3AI library of selected DNA fragments from the cosmid clones covering pNGR234a was prepared as follows: XhoI digested restriction fragments purified from agarose gels were pooled and cleaved with Sau3AI, extracted with phenol/chloroform and cloned in Bluescript KS+. DNA sequences of inserts larger than 100 bp were determined by the dideoxy method of Sanger et al. (13), using double stranded templates and the Sequenase II kit (United States Biochemical Corp., Cleveland, OH, USA).

DNA labelling and hybridisation procedures

32P-labelling of the Sau3AI fragments from NGR234 remaining after subtraction against USDA257 genomic DNA was performed by 3 cycles of PCR amplification as in Bjourson et al. (10). Inserts from selected Bluescript KS+ clones were radioactively labelled by PCR amplification using either T3-T7 primers that flank the entire insert, or synthesized primers designed to span that part of the sequence with the highest degree of homology to the database entries. Endonuclease digested DNAs were transferred to nylon membranes by standard Southern blotting procedures. Multiple samples of non-digested DNA were analysed by dot-blot hybridisation.

Table 1. Bacterial strains, plasmids and vectors used in this work

Strains, plasmids and vectors | Relevant characteristics | References

Bacteria:
Escherichia coli DH5α | recA1, φ80 lacZΔM15 | 35
E.coli 1046 | recA | 36
Rhizobium sp. NGR234 | broad host-range, RifR | 37
R. sp. ANU265 | sym-plasmid cured derivative of NGR234 | 38
R.fredii USDA257 | broad host-range Rhizobium isolated from soybean nodules, KmR | 2
R. sp. NGR234ΩORF-1 | Ω mutant of the ORF-1 locus (Lrp-like) | this work

Cosmid clones:
pXB315, pXB807 | from the Sym-plasmid pNGR234a | 6
pXB739, pXB424 | from the chromosome of NGR234 | 7

Bluescript KS+ clones:
pX140 | unique to NGR234, homologous to the leucine responsive regulatory protein from E.coli | this work
pX177 | unique to NGR234, strong homology with R.leguminosarum OMPIII locus | this work
pX185 | unique to NGR234, homologous to cation ATPases | this work
pR57 | homology with the C-terminal domain of E.coli GabD protein | this work
pR64 | homologous to the C-terminal domain of C.crescentus McpA protein | this work
pRK21 | homologous to the UGPC gene of E.coli and to the ATP-binding domain of that protein | this work
pXB315X1.4 | 1.4 kb XhoI fragment from pXB315 | this work
pXB315P3 | 3 kb PstI fragment from pXB315 | this work
pXB807X5.2 | 5.2 kb XhoI fragment from pXB807 | this work
pXB739P3 | 3 kb PstI fragment from pXB739 | this work
pRAF14 | Omega interposon inserted in the HindIII site of the 1.4 kb XhoI fragment from pXB315 | this work

Data acquisition and computer analysis

Sequence data was collected on Macintosh computers (Apple Computer Inc., Cupertino, CA, USA) using the DNA Parrot system (Clontech, Palo Alto, CA, USA). Once transferred to Sun workstations (Sun Microsystems Inc., Mountain View, CA, USA), DNA sequences were analysed for redundant and similar elements using the ICATOOL programme (14). Similar sequences were subsequently aligned by CLUSTAL5 (15). To identify homologies with published nucleotide or amino acid sequences, the non-redundant elements were individually compared to the latest version of the EMBL, GenBank, NBRF and SWISSPROT databases using BLAST software (16).
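To give a feel for what screening shot-gun reads for redundant or overlapping elements involves, the following minimal sketch looks for an exact end-to-end overlap between two reads, trying the second read on both strands. It is not the ICATOOL algorithm, whose internals are not described here; the function names, the `min_overlap` threshold and the toy reads are assumptions, and real tools tolerate mismatches, which this sketch does not.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def suffix_prefix_overlap(a, b, min_overlap=10):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 when no exact overlap of at least min_overlap bases exists.
    """
    a, b = a.upper(), b.upper()
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def best_overlap(read1, read2, min_overlap=10):
    """Best exact overlap of read2 with either end of read1, on either strand."""
    best = (0, "none")
    candidates = (("same strand", read2.upper()),
                  ("complementary strand", reverse_complement(read2)))
    for label, other in candidates:
        k = max(suffix_prefix_overlap(read1, other, min_overlap),
                suffix_prefix_overlap(other, read1, min_overlap))
        if k > best[0]:
            best = (k, label)
    return best


# Toy reads, invented for illustration: read2 is sequenced from the other strand.
read1 = "ATGCCGTACGTTAGC"
read2 = reverse_complement("GTTAGCAAATTT")
print(best_overlap(read1, read2, min_overlap=4))
```

Two reads that overlap only on the complementary strand, as in the toy example, are what one would expect from inserts of the same locus cloned in opposite orientations.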

Construction and phenotype of NGR234 ORF-1 mutant

The HindIII site in the polylinker of the clone pXB315X1.4 was removed by digestion with ClaI and BamHI, the protruding ends filled in, and the clone restored by re-ligation. A SpR Omega interposon (17) was inserted in the remaining HindIII site internal to ORF-1. pRAF14 was derived by cloning the XhoI fragment containing Omega in the suicide vector pJQ200SK (18). This vector carries the sacB gene from Bacillus subtilis, which is inducible by sucrose and lethal when expressed in Gram-negative bacteria. pRAF14 was then mobilized into NGR234 by tri-parental mating using the helper plasmid pRK2013 (19). Transconjugants were selected and purified on RMM plates containing 100 µg/ml Rif, 50 µg/ml Sp and 1% (w/v) mannitol. Single colonies were grown in liquid TY and spread on plates containing both antibiotics and 5% (w/v) sucrose (to select for inactivation of the sacB gene). In NGR234ΩORF-1, marker exchange by double crossover was confirmed by Southern blot analysis. Nodulation capacity of the ORF-1 Omega mutant was compared to wild type NGR234 on Calopogonium caeruleum (Benth.) Hemsl., Leucaena leucocephala (Lam.) De Wit, Pachyrhizus tuberosus (Lam.) Spreng., and Vigna unguiculata (L.) Walp. Except for V.unguiculata, all plants were grown in Magenta jars (5). Twenty to thirty-five plants were used per treatment. They were harvested 35 d after inoculation with 10^9 bacteria per plant. Kinetics of nodulation in the 5 weeks following inoculation were determined on Vigna plants held in growth pouches (5). Each experiment was repeated two to three times.

RESULTS

Competitive RNA hybridisation

More than 50 XhoI restriction fragments, representing 100 kb in total, and carrying genes regulated by flavonoids were identified. These fragments are dispersed over pNGR234a, and a detailed analysis of them is given in Fellay et al. (8).

Analysis of NGR234 DNA sequences not shared by R.fredii USDA257

To assess the efficiency of the two consecutive cycles of subtraction hybridisation, dot-blot filters of genomic DNAs from USDA257 (the subtracter strain), NGR234 (the probe strain) and ANU265 (NGR234 cured of its symbiotic plasmid) were probed with the subtracted fragments. No cross-hybridisation was detected with USDA257, but the subtracted sequences hybridised strongly to ANU265 and NGR234 genomic DNAs (data not shown). Next, the ordered cosmid library was used to index the position of these 'unique' sequences. Dot-blot filters of DNA prepared from the 309 cosmids that cover > 97% of NGR234 genome (7) (see Fig.2-A), when probed with the unique sequences, showed that less than a third of all the clones hybridised. By comparing their respective positions in the 'contigs' (sets of contiguous cosmids), we found that positive clones generally overlapped, and were grouped in about 30 distinct chromosomal regions. Since two thirds of the 24 cosmids necessary to cover pNGR234a hybridised to fragments not shared by R.fredii USDA257, the symbiotic plasmid, in proportion to its size, carries a greater number of unique sequences than the chromosome. Assignment to distinct restriction fragments was achieved by probing Southern blots of XhoI restricted cosmid DNAs representative of pNGR234a. Specificity of the DNA subtraction was confirmed by the absence of hybridisation signals to restriction fragments known to carry genes (such as nodABC, nodS, nolB and nifKDH) shared by both NGR234 and USDA257 (see Fig.2-B).

A sample of the unique fragments was analysed by shot-gun sequencing. Of 100 randomly picked clones, the sequences of 73 inserts could be grouped into 24 families of similar elements. Subsequently, a limited set of 59 non-redundant sequences was matched against the nucleotide and amino-acid databases. Three clones with significant homologies extending over the entire DNA sequence were studied further. Clones pX140, pX177 and pX185 matched (see Fig.3-A) a segment of a leucine regulatory protein from E.coli (20), a sequence from R.leguminosarum coding for an outer membrane protein (21), and a cadmium resistance protein from Staphylococcus aureus (22) respectively.

Figure 2. A. Hybridisation patterns of the labelled NGR234 DNA fragments not shared by R.fredii USDA257 on dot-blot filters prepared from cosmid DNAs covering > 97% of NGR234 genome. The 24 clones representing the symbiotic plasmid pNGR234a are boxed. B. Left. XhoI restriction digests of overlapping cosmid DNAs covering half of pNGR234a. Right. Southern filter of the same gel probed as in panel A. Positions of representative known genes are shown on the autoradiogram with numbered circles: 1) nifKDH, 2) nolB, and 3) ORF-1. [Images not reproduced.]

Figure 3. A. Protein alignments for clones pX177, pX185, pR57, pR64 and pRK21 assembled using the BLAST programme. Upper lines correspond to the putative protein product encoded by one of the 6 ORFs of the NGR234 query sequence. The most significant database matches are displayed on the lower lines, with the identical, conserved (double dot) and less conserved (single dot) amino acids listed in the middle lines. In the case of pX185 however, two alignments are provided, above and below the query sequence, with the Prosite signature for the E1-E2 class of ATPases (accession number PS00154) marked in bold. The aspartate residue believed to undergo phosphorylation is marked with an asterisk. Numbers next to the first and last amino acids of each line show their respective positions in the homologous protein. The R1 methylation domain of the C.crescentus McpA protein is underlined, while the 0 marks the potential methylation site based on the reported methyl accepting peptide R1 in E.coli McpI (34). In the alignment of pRK21, the bold amino acids correspond to the ATP-binding site signature reported in the Prosite database (accession number PS00211). B. One gap protein alignments of the ORF-1 putative product (centre line) with E.coli Lrp (upper line), and Bkdr from P.putida (lower line). Peptide ends are marked with *. [Alignments not reproduced.]

Figure 4. Genetic and SpeI restriction map of the 500 kb symbiotic plasmid pNGR234a. SpeI restriction sites are marked with S. Approximate positions of the known genes and the newly identified loci pRK21, pR57, pR64, syrM and ORF-1 (pX140 and pRK41) are shown on the outer circle. [Map not reproduced.]

Figure 5. A. Restriction map of the 1.4 kb XhoI fragment cloned in pXB315X1.4, with the position of the two open reading frames reported (shadowed boxes). B. Complete DNA sequence of the same restriction fragment. Probable ribosome binding sites (RBS), putative start codons (ATGs and one alternate GTG) and non-sense codons (marked Opa) are underlined. The deduced amino acid sequence of the two ORFs is displayed under the nucleotide sequence. [Panel A not reproduced; the map is labelled with XhoI, HindIII, ClaI, SalI, SacI and SmaI sites, a 200 bp scale bar, and boxes for ORF-1 and ORF-2. Panel B follows as extracted; the number after each row is the position of its last base, and underlining is lost.]

CTCGAGTGTGCGTCGCCGGGCGGGGCCGAGATCATTGGTGTATCCGGAAATTTCTGCCATGCCGCCGACGCGATGCGGTCGCCAAGGC  88
GCGTGATGGCGATCTTGTACGTCTCGTCCATGATGGTCGATTCCGGCGCCCGCGAGGCATCGGTGATCGGGATCGTCAGCGAATAGCCTT  178
TGACCGGGTAGACCGGCAGCTTGATGCCGTGACGCTTCAGCAGTAGCGGCGAATAGCTGCCGAGTGCGACGACAACAGCATCGGCCGCGA  268
GCCTTTCCCAGTTGGTTACGACACCCCTGACCTTGTCGCCCTCGACATCCAGTTTCCTGACTTCCGTGCCCCAGGAGAAGCGGACGCCCA  358
ATTGCTCCGCCTTCTTCGCAAGCGCGTTGGTGAACTTGAAGCAGTCGCCGGTCTCGTCCTTCGGCGTCAGCAGCCCACCGACGATCTTGT  448
CGCGCGCGGGTCAGCAACGC-A53CTCGCC8GAAGTTGGGCTTTCGCCGTCAGCCTGCCTGAGGCGTATCAAGCTT&TGAGCAGGCA  538
RBS M E V G L S P S A C L R R I K L M E Q A
GGTGTCATCAGGGGCTATACGGCGCTTGTCGATCCGACGCAGTCGGAATCGACAATAGCCGTAATCATCAACATTACGCTGGAGCGGCAG  628
G V I R G Y T A L V D P T Q S E S T I A V I I T I T L E R Q
ACGGAGGAGTACCTCGACAAGTTTGAAGCGGCCGTGCGCAAGCACCCCGAAATTAGGGAGTGCTATCTAATGACCGGCGGATCAGACTAC  718
T E E Y L D K F E A A V R K H P E I R E C Y L M T G G S D Y
ATGCTGAGGGTGGACGTCGAGAATGCGGGGGCATTCGAGCGCATACACAAAGAGGTCCTGTCGACGTTGCCTGGGGTGCGGCGTATCCAT  808
M L R V D V E N A G A F E R I H K E V L S T L P G V R R I H
TCGAGCTTCTCGATTAGAAATGTGTTAGCGGGCCGTCTGAAAGCAAAAAGAIAAACTTTCCCATT-GAGATTGCTCGGCAGTGAGGT  898
S S F S I R N V L A G R L K A K R Opa Opa Opa
AGGCTGTGTACCTCATATGACCGCTGCCCCTCAAGATCCGCAGGGGCTACAGGCCACAGAAGATGTGAGCTCAGCCAATCGAAGGCACTAG  988
GTCGTGGTGACGATTTAACGGATTGAGATTCCCAAGAAGGGGCAAATCAGATTCAACACTGACTTTTGGAGGTCAGTGAGATTGCGAT  1078
RBS M D C D
GGGCTTCGGGACGATCAATGGGAACGTATCAGAGGTTTTGTGCCCGGGGGCACGAAGGGCAAGCGTGGCCCGCGCACGAACAACCGGCTG  1168
G L R D D Q W E R I R G F V P G G T K G K R G P R T N N R L
TTTCTGGATGCGCTGCTGTGGATGGCCCGTTCGGGGGACCGCTGGCGAGACCTGCCAGAACGACTGGGTGACTACCGCGCCGTAAAACTA  1258
F L D A L L W M A R S G D R W R D L P E R L G D Y R A V K L
CGCTATTACCGCTGGATCGAGATGGGCGTGCTCGACGAGATGCTTGCCGTGCTTGCCCGCGAAGCTGATTTGGAATGGTTGATGATCGAT  1348
R Y Y R W I E M G V L D E M L A V L A R E A D L E W L M I D
TCGACTATCGTGCGCGCCCATCAGCATGCGGCCGGGGCGCGCAGGGCTAAAGGGGGGCGGATGCCCAGGGCTTGGGTCGGTCTCGAG  1435
S T I V R A H Q H A A G A R R A K G G R M P R A W V G L E
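The annotation conventions of Fig.5-B (putative start codons, including an alternate GTG, and non-sense codons) can be illustrated with a small open reading frame scan. This is a generic sketch and not the procedure used to annotate the 1.4 kb XhoI fragment: the function name, the `min_codons` threshold and the toy sequence are assumptions, only the given strand is scanned, and within each frame the most upstream ATG or GTG after a stop is taken as the start.

```python
STARTS = {"ATG", "GTG"}          # GTG is treated as an alternate start codon
STOPS = {"TAA", "TAG", "TGA"}    # ochre, amber and opal non-sense codons


def find_orfs(seq, min_codons=3):
    """Scan the three forward frames of seq for open reading frames.

    Returns a sorted list of (start, end, n_codons): start is the 0-based
    index of the start codon, end is one past the stop codon, and n_codons
    counts the sense codons (stop excluded).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in STARTS:
                start = i
            elif start is not None and codon in STOPS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, n_codons))
                start = None
    return sorted(orfs)


# Toy sequence, invented for illustration: a GTG start, an internal ATG, a TGA stop.
toy = "CC" + "GTG" + "GAA" * 4 + "ATG" + "CCC" + "TGA" + "AT"
print(find_orfs(toy))

# A 381 bp reading frame corresponds to 381 // 3 = 127 codons.
print(381 // 3)
```

Choosing the upstream GTG rather than the internal ATG, as this sketch does, yields the longest reading frame; whether that start is actually used is a biological question that sequence scanning alone cannot settle.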

Shot-gun sequence analysis of pNGR234a restriction fragments that carry induced transcripts not shared by USDA257

To identify flavonoid-inducible loci of pNGR234a that are not present in USDA257 genome, we combined data from the competitive RNA hybridisation with those shown in Fig.2. A Sau3AI library of the 18 XhoI restriction fragments that gave hybridisation signals in both experiments, that did not carry any known symbiotic loci and which are dispersed over > 57 kb of pNGR234a was prepared. Four (pR57, pR64, pRK21 and pRK41) out of 150 sequences of the library (representing ~28 kb) showed very strong homologies (Fig.3-A) to a succinate-semialdehyde dehydrogenase from E.coli (Swissprot accession number P25526), a methyl-accepting chemotaxis protein from Caulobacter crescentus (23), the UGPC protein from E.coli (24) and the leucine responsive regulatory protein respectively.
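The selection criterion described here (fragments positive in both hybridisations and free of known symbiotic loci) amounts to a set intersection followed by a set difference. The sketch below only illustrates that logic; the function name and the fragment labels are hypothetical and do not correspond to real XhoI fragments of pNGR234a.

```python
def select_candidate_fragments(rna_positive, dna_positive, known_symbiotic=()):
    """Return fragments positive in both screens that carry no known locus.

    rna_positive:    fragments hybridising to RNA from flavonoid-induced cells
    dna_positive:    fragments hybridising to the subtracted 'unique' DNA
    known_symbiotic: fragments already known to carry symbiotic loci
    """
    both = set(rna_positive) & set(dna_positive)
    return sorted(both - set(known_symbiotic))


# Hypothetical fragment labels, invented for illustration only.
rna_hits = {"X01", "X02", "X05", "X09"}
dna_hits = {"X02", "X03", "X05", "X09"}
known = {"X09"}
print(select_candidate_fragments(rna_hits, dna_hits, known))
```

Working at the level of restriction fragments rather than genes keeps the comparison simple, at the cost of resolution: a fragment can score in both screens because it carries an inducible gene next to an unrelated unique sequence.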

Detailed analysis of the selected clonesConfidence in gene identification by homology search clearlydepends upon the accuracy of the query sequence and increaseswith homologies extending over larger DNA segments. To verifythat the homologies obtained for the seven selected clones were

not fortuitous, we cloned the corresponding genomic loci fromthe ordered cosmid library of NGR234. For each of the sevenloci, we confirmed and extended the original sequence using as

template the appropriate genomic fragment, and two syntheticprimers designed to span the DNA segment showing the highestdegree of homology with the database entries.Two sets of overlapping cosmids were homologous to pX 177:

clones pXBS23 and pXBS4 from pNGR234a as well as cosmidspXB482 and pXB739 from the chromosome. Sequence dataconfirmed that the segment of pX 177 which is homologous tothe R.leguminosarum OmpII gene, mapped to a 3 kb PstIrestriction fragment from the chromosome. Clone pX185 was

assigned to pXB424 of the chromosome. The homologiesreported with S.aureus CadA (22) and R.meliloti FixI (25)proteins correspond to a highly conserved domain in cationtransporters with El E2 ATPase activity (Fig.3-A).ICATOOL analysis showed that pR64 and pR57 sequences

were complementary and overlapped by 178 bases. Combined, they form a single Sau3AI fragment of 286 bp that maps to a 3 kb PstI-XhoI restriction fragment shared by pXB43 and pXB315 (see Fig.4 for approximate position). Interestingly, both pR64 and pR57 gave different and statistically significant results in the BLAST analysis. First, pR64 showed a high degree of homology to the carboxy-terminal domain of several E. coli and Caulobacter crescentus methyl-accepting chemotaxis proteins, which extends over the second of the two proposed methylation domains (K1 and R1) adjacent to a well conserved cytoplasmic region (23). On the complementary strand, the putative peptide encoded by pR57 is highly homologous to the C-terminal domain of several semialdehyde dehydrogenases. Since the putative proteins from both sequences correspond to very conserved carboxy-terminal domains, with non-sense codons correctly placed to match the right protein length, it seems that this Sau3AI fragment extends over the ends of two genes which are transcribed in opposite directions and overlap by 34 bp.

The pRK21 insert was mapped to the 5.2 kb XhoI restriction fragment of pXB807 (see position in Fig.4). This DNA fragment was cloned (pRB807X5.2) and partially sequenced. About 800 bp of the NGR234 RS.1 repeat element (one copy on pNGR234a, three on the chromosome; 6) cover one extremity of this DNA fragment, while a syrM-homologous sequence was identified at the other extremity (data not shown). Alignments with database entries showed strong homologies, at both the DNA and protein level, to the UGPC locus from E. coli. The putative pRK21 protein product also displayed a high degree of homology to other related ATP-binding proteins, such as R. leguminosarum and R. loti NodI, that are involved in the active transport of small hydrophilic molecules across the cytoplasmic membrane. Despite these homologies, we believe that pRK21 does not code for the NGR234 NodI product, as recent sequence data show that nodI is part of the nodABCIJ operon in Rhizobium sp. NGR234 (B.Relic', unpublished). Finally, ICATOOL analysis demonstrated that the pX140 and pRK41 sequences are complementary and overlap by 125 bp. Both clones are linked to a 1.4 kb XhoI restriction fragment carried by the cosmid pXB315 (see ORF-1 map location in Fig.4), and contiguous to the 3 kb PstI-XhoI restriction fragment carrying the pR57 and pR64 sequences.

To test if any of the 7 sequences described above are part of open-reading frames whose expression is induced by flavonoids, we prepared PCR-amplified products from the selected inserts using primers designed to flank the DNA segment with the highest degree of homology in the BLAST analysis. Probing a Southern transfer of the resulting PCR products in a competitive RNA hybridisation experiment showed that only inserts from pRK41 and pX140 hybridised to the labelled RNA prepared from flavonoid-induced NGR234 bacteria. Later, induction of this locus was also confirmed by Northern analysis (data not shown).
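
The complementary overlaps reported above (178 bases for pR57/pR64, 125 bp for pX140/pRK41) were detected with ICATOOL. The following is only a minimal sketch of the underlying idea, not the ICATOOL algorithm: one read is reverse-complemented and the longest exact suffix/prefix overlap with the other read is used to merge them. The function names, the `min_len` parameter and the toy sequence are hypothetical, and exact matching is assumed (real reads would need mismatch tolerance).

```python
# Sketch: merge two reads taken from opposite strands of the same fragment.
# Assumption: overlaps are exact; names and toy data are illustrative only.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_overlap(left, right, min_len=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def merge_complementary(read_a, read_b, min_len=3):
    """Merge read_a with the reverse complement of read_b.

    Both orders are tried and the one with the longer overlap is kept.
    Returns (merged_sequence, overlap_length), or (None, 0) if no overlap.
    """
    rc_b = reverse_complement(read_b)
    candidates = [(read_a, rc_b), (rc_b, read_a)]
    left, right = max(candidates,
                      key=lambda pair: longest_overlap(pair[0], pair[1], min_len))
    k = longest_overlap(left, right, min_len)
    if k == 0:
        return None, 0
    return left + right[k:], k


if __name__ == "__main__":
    # Toy fragment bounded by Sau3AI sites (GATC); not a sequence from this work.
    fragment = "GATCAAGGTTCCAGTACGGATC"
    read_a = fragment[:14]
    read_b = reverse_complement(fragment[8:])
    merged, overlap = merge_complementary(read_a, read_b)
    print(merged, overlap, len(merged))
```

The merged length always equals the sum of the two read lengths minus the overlap, which is the relation that ties a 178-base overlap to a combined 286 bp fragment.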

The deduced peptide of ORF-1 is strongly homologous to E.coli Lrp and P.putida BkdR

To test the reliability of our screening strategy, we analysed the Lrp-like locus that is both inducible and unique to NGR234. First, to demonstrate that this sequence is truly unique, we probed restricted genomic DNAs from USDA257 and NGR234 with a 32P-labelled insert of pX140. As expected, only one strong band was observed in NGR234, and there was no cross-hybridisation with R.fredii DNA (data not shown). Second, to determine whether the pRK41 and pX140 inserts are part of a larger open-reading frame, we sequenced the entire 1.4 kb XhoI restriction fragment cloned in pXB315X1.4 (Fig.5). Both pX140 and pRK41 sequences matched the 381 bp ORF-1. BLAST analysis showed that the putative ORF-1 product is highly homologous to two regulatory proteins, one from E. coli, the other from Pseudomonas putida [Lrp and BkdR respectively (26)]. The one-gap alignment presented in Fig.3-B predicts that the deduced amino acid sequence of the protein encoded by ORF-1 has 37% and 40% identity, or 78% and 81% homology (when similar amino acids are included), to Lrp and BkdR respectively. All scores are higher than those proposed in a more flexible three-gap alignment of the E.coli Lrp and AsnC proteins (20). Extensive homology of the ORF-1 amino-terminal domain with the E.coli Lrp Helix-Turn-Helix domain suggests that protein synthesis should initiate at the GTG codon rather than at the downstream ATG (Fig.5-B). If translation starts at the alternate GTG codon, the NGR234 Lrp homologue is 127 amino acids long, 35 a.a. shorter at its amino-terminus than the E. coli Lrp. A second ORF (ORF-2, see Fig.4-A) was identified on the 1.4 kb XhoI fragment. The deduced peptide sequence of ORF-2 shares 28% identity and 72% homology (when similar amino acids are included in the analysis) with the protein A3 from Agrobacterium tumefaciens IS869 (27) in a no-gap protein alignment (data not shown).
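
The identity and homology percentages quoted above can be illustrated with a short sketch that scores a pre-aligned pair of peptide sequences. Several points are assumptions rather than the procedure used here: the grouping of "similar" amino acids (`SIMILAR_GROUPS`), the choice of aligned non-gap columns as the denominator, and the function name `alignment_scores`. The toy sequences are invented and are not the ORF-1, Lrp or BkdR sequences.

```python
# Sketch: percent identity and percent "homology" (identical + similar residues)
# for two sequences that have already been aligned with '-' as the gap symbol.
# Assumption: this similarity grouping is illustrative, not the one used in the paper.
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG"]


def alignment_scores(aligned_a, aligned_b):
    """Return (percent_identity, percent_similarity) over non-gap columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    group_of = {aa: i for i, group in enumerate(SIMILAR_GROUPS) for aa in group}
    compared = identical = similar = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # columns containing a gap are not scored
        compared += 1
        if a == b:
            identical += 1
            similar += 1  # identical residues also count towards similarity
        elif a in group_of and group_of.get(a) == group_of.get(b):
            similar += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * identical / compared, 100.0 * similar / compared


if __name__ == "__main__":
    identity, similarity = alignment_scores("MKLD-STA", "MRIDEATC")
    print(f"identity {identity:.1f}%, similarity {similarity:.1f}%")
```

Because both the similarity grouping and the denominator affect the result, percentages computed this way are only comparable between alignments scored under the same conventions.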

Symbiotic phenotype of the ORF-1::Ω mutant

To assay symbiotic activity of the ORF-1 locus, a mutant carrying the Omega interposon in the HindIII site internal to the gene was constructed (NGR234ΩORF-1). In comparison with wild-type NGR234, this mutant caused a 4.5 day delay in nodulation of V. unguiculata (measured 2 d after inoculation). On L. leucocephala and P. tuberosus, the number of nodules increased by more than 65% in comparison to the wild-type, while on C. caeruleum the nodule number was decreased by ± 25%.

DISCUSSION

Random sequencing has been used to study the genome structure of various organisms including the Laryngotracheitis virus (28), Mycoplasma genitalium (29), Saccharomyces cerevisiae (C.J.Davies, Ph.D. thesis, 1991) and the Pufferfish, Fugu rubripes rubripes (30). In association with competitive RNA hybridisation and/or subtraction DNA hybridisation, it becomes a potent method to compare related genomes as well as to target actively transcribed genes. The screening strategy outlined in Fig.1 is based on the NGR234 physical map, and expands the level of analysis from limited DNA segments to the whole replicon. It is flexible, since minor modifications to the RNA-DNA hybridisation procedures allow targeting of genes induced or repressed under many different conditions. Only genes with relatively strong homologies to database entries will be identified this way, however.

Data from DNA subtraction hybridisations confirmed that Rhizobium species NGR234 and R.fredii strain USDA257 are phylogenetically related, and share most of their genomic background. No essential gene was identified in the random sequence analysis of the Sau3AI fragments remaining after two cycles of DNA subtraction hybridisation. Homologies with IS elements indexed in the databases (data not shown), and the absence from the USDA257 genome of the RS.1 transposon-like repeat (X.Perret, unpublished), suggest that many of the sequences 'unique' to NGR234 are mobile elements which have accumulated since both bacteria diverged. The higher proportion of the 'unique' sequences in the symbiotic plasmid compared with the rest of the genome suggests that pNGR234a tolerates integration of non-endogenous sequences better than the chromosome. ICATOOL and ClustalV analysis of more than 100 NGR234 Sau3AI fragments not shared by R.fredii USDA257 revealed inherent limitations in the library prepared from the subtracted fragments. There was 40% redundancy amongst the clones analysed, with similar sequences grouped into 24 families of as many as 10 elements. Probing Southern blots of multiple restriction digests of NGR234 genomic DNA with sequences representative of some of these largest families showed that the redundancy does not result from repeated elements in the NGR234 genome (data not shown). Moreover, several sequence mismatches were found among nearly identical fragments cloned in both orientations. This indicates that the PCR amplification of subtracted sequences prior to cloning generates or increases an unbalanced distribution of fragments, and introduces small anomalies due to Taq polymerase misreadings. This biased fragment distribution prevents use of the level of redundancy in the pool of analysed clones to estimate the total length of sequences specific to NGR234. Nevertheless, valuable genetic information can be retrieved from the library of subtracted fragments, particularly using the BLAST software, which is capable of detecting distant protein homologies even when confronted with such common sequencing errors as frameshifts and replacements.
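
As a rough illustration of how redundancy among subtracted clones might be tallied, the sketch below groups sequences into families regardless of cloning orientation and tolerates a small number of mismatches, as expected from Taq misreadings. It is not the ICATOOL or ClustalV procedure. The greedy comparison against one representative per family, the restriction to equal-length sequences, the `max_mismatches` threshold and the definition of redundancy as the fraction of clones that duplicate an existing family are all assumptions, and the clone names and sequences are invented.

```python
# Sketch: group clones into orientation-independent families and estimate redundancy.
# Assumption: equal-length comparison with a mismatch threshold; illustrative only.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def same_family(seq, representative, max_mismatches):
    """True if seq, in either orientation, differs from the representative
    by at most max_mismatches substitutions (lengths must be equal)."""
    for candidate in (seq, reverse_complement(seq)):
        if len(candidate) != len(representative):
            continue
        if sum(x != y for x, y in zip(candidate, representative)) <= max_mismatches:
            return True
    return False


def group_into_families(clones, max_mismatches=1):
    """Greedily assign each clone to the first family whose representative it matches.

    `clones` maps clone names to sequences; returns a list of lists of names.
    """
    representatives, families = [], []
    for name, seq in clones.items():
        seq = seq.upper()
        for idx, rep in enumerate(representatives):
            if same_family(seq, rep, max_mismatches):
                families[idx].append(name)
                break
        else:
            representatives.append(seq)  # first member represents a new family
            families.append([name])
    return families


def redundancy(families):
    """Fraction of clones that duplicate a family already represented."""
    total = sum(len(family) for family in families)
    return (total - len(families)) / total


if __name__ == "__main__":
    toy_clones = {
        "c1": "GATCAAGGTT",
        "c2": reverse_complement("GATCAAGGTT"),  # same insert, opposite orientation
        "c3": "GATCAAGGTA",                      # one substitution relative to c1
        "c4": "GATCCCCCCC",
        "c5": "GATCTTTT",
    }
    families = group_into_families(toy_clones)
    print(families, redundancy(families))
```

A tally of this kind only describes the clones that were sequenced; as noted above, the PCR-induced bias in fragment distribution means it cannot be extrapolated to the total length of NGR234-specific sequence.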

This combination of techniques led to the identification of several new loci with putative symbiotic functions. Among these, the sequence homologous to the symbiotic regulator syrM is adjacent to pRK21; the clone pX177, with homology to the ompIII gene which is symbiotically repressed in R. leguminosarum, has been mapped to the chromosome of NGR234; and pX185 carries a highly conserved domain with E1E2 ATPase activity common in cation transporters such as the FixI protein. In addition, pR57 has been shown to be homologous to the C-terminal domain of succinate- and other semialdehyde dehydrogenases. In R. meliloti, a mutant with a low succinic semialdehyde dehydrogenase activity is defective in symbiotic nitrogen fixation. More interestingly, we identified ORF-1, a new symbiotic gene. The peptide encoded by ORF-1 is very similar to the regulatory proteins Lrp from E. coli and BkdR from P.putida. BkdR is a positive activator of the branched-chain keto acid dehydrogenase operon, while Lrp combines repressor and activator activities that coordinate various functions involved in global responses (31). The conservation in the ORF-1 product of all but one of the amino acids known to affect the Lrp DNA binding ability (32) suggests that ORF-1 may have retained regulatory functions. In the presence of a suitable carbon source, the Lrp mutant in E. coli grows normally. Similarly, the ORF-1 mutant of NGR234 does not display an extreme phenotype. However, the Ω::ORF-1 mutation modifies the efficiency of nodulation by NGR234. Depending upon the plant tested, we observed a significant delay in nodulation, a reduction or even a large increase in the number of nodules. This symbiotic phenotype, together with the observed flavonoid induction of ORF-1 and its location on the non-essential symbiotic plasmid pNGR234a, suggests that this gene is probably not involved in the regulation of operons similar to those controlled by Lrp and BkdR. Furthermore, the absence of homologous genes in other rhizobia (data not shown), as well as in the closely related R.fredii USDA257, suggests that NGR234 has developed additional systems to regulate nodulation. Another symbiotic regulatory system, nodVW, has been described in Bradyrhizobium japonicum (33).

ACKNOWLEDGEMENTS

We wish to thank M.Trower, G.Elgar and D.Gerber for their help in many aspects of this work. We are grateful to J.Parsons and S.Aparicio for their assistance with the computer analysis. Financial support was provided by the Fonds National Suisse de la Recherche Scientifique (Grants # 31-30950.91 and 31-36454.92) and the Fondation Sandoz pour l'Avancement des Sciences Medico-biologiques. R.Fellay gratefully acknowledges the receipt of an EMBO short-term fellowship.

REFERENCES

1. Trinick, M.J. (1980) J. Appl. Bacteriol., 49, 39-53.
2. Heron, D.S. and Pueppke, S.G. (1984) J. Bacteriol., 160, 1061-1066.
3. Relic', B., Perret, X., Golinowsky, W., Pueppke, S.G., Krishnan, H.B. and Broughton, W.J. (1993) Science, submitted.
4. Krishnan, H.B., Lewin, A., Fellay, R., Broughton, W.J. and Pueppke, S.G. (1992) Mol. Microbiol., 6, 3321-3330.
5. Lewin, A., Cervantes, E., Wong, C.-H. and Broughton, W.J. (1990) Mol. Plant-Microbe Interact., 3, 317-326.
6. Perret, X., Broughton, W.J. and Brenner, S. (1991) Proc. Natl. Acad. Sci. USA, 88, 1923-1927.
7. Perret, X. (1992) Ph.D. thesis # 2489, University of Geneva, Geneva, Switzerland.
8. Fellay, R., Perret, X., Broughton, W.J. and Brenner, S. (1993) Mol. Microbiol., submitted.
9. Broughton, W.J., Wong, C.-H., Lewin, A., Samrey, U., Myint, H., Meyer z.A., H., Dowling, D.N. and Simon, R. (1986) J. Cell Biol., 102, 1173-1182.
10. Bjourson, A.J., Stone, C.E. and Cooper, J.E. (1992) Appl. Environ. Microbiol., 58, 2296-2301.
11. Stanley, J., Dowling, D.N., Stucker, M. and Broughton, W.J. (1987) FEMS Microbiol. Lett., 48, 25-30.
12. Sambrook, J., Fritsch, E.F. and Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual, second edition, Cold Spring Harbor University Press, Cold Spring Harbor.
13. Sanger, F., Nicklen, S. and Coulson, A.R. (1977) Proc. Natl. Acad. Sci. USA, 74, 5463-5467.
14. Parsons, J.D., Brenner, S. and Bishop, M.J. (1992) Comp. Appl. Biosc., 8, 461-466.
15. Higgins, D.J. and Sharp, P.M. (1988) Gene, 73, 237-244.
16. Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D. (1990) J. Mol. Biol., 215, 403-410.
17. Prentki, P. and Krisch, H.M. (1984) Gene, 29, 303-313.
18. Quandt, J. and Hynes, M.F. (1993) Gene, 127, 15-21.
19. Figurski, D.H. and Helinski, D.R. (1979) Proc. Natl. Acad. Sci. USA, 76, 1648-1652.
20. Willins, D.A., Ryan, C.W., Platko, J.V. and Calvo, J.M. (1991) J. Biol. Chem., 266, 10768-10774.
21. de Maagd, R.A., Mulders, I.H.M., Canter Cremers, H.C.J. and Lugtenberg, B.J.J. (1992) J. Bacteriol., 174, 214-221.
22. Nucifora, G., Chu, L., Misra, T.K. and Silver, S. (1989) Proc. Natl. Acad. Sci. USA, 86, 3544-3548.
23. Alley, M.R.K., Maddock, J.R. and Shapiro, L. (1992) Genes and Dev., 6, 825-836.
24. Overduin, P., Boos, W. and Tommassen, J. (1988) Mol. Microbiol., 2, 767-775.
25. Kahn, D., David, M., Domergue, O., Daveran, M.L., Ghai, J., Hirsch, P.R. and Batut, J. (1989) J. Bact., 171, 929-939.
26. Madhusudhan, K.T., Lorenz, D. and Sokatch, J.R. (1993) J. Bact., 175, 3934-3940.
27. Paulus, F., Canaday, J., Vincent, F., Bonard, G., Kares, C. and Otten, L. (1991) Plant Mol. Biol., 16, 601-614.
28. Griffin, A.M. (1989) J. Gen. Virol., 70, 3085-3089.
29. Peterson, S.N., Schramm, N., Hu, P.-C., Bott, K.F. and Hutchison, C.A. (1991) Nucleic Acids Res., 19, 6027-6031.
30. Brenner, S., Elgar, G., Sandford, R., Macrae, A., Venkatesh, B. and Aparicio, S. (1993) Nature, 366, 265-268.
31. Newman, E.B., D'Ari, R. and Lin, R.T. (1992) Cell, 68, 617-619.
32. Platko, J.V. and Calvo, J.M. (1993) J. Bact., 175, 1110-1117.
33. Gottfert, M., Grob, P. and Hennecke, H. (1990) Proc. Natl. Acad. Sci. USA, 87, 2680-2684.
34. Kehry, M.R., Bond, M.W., Hunkapiller, M.W. and Dahlquist, F.W. (1983) Proc. Natl. Acad. Sci. USA, 80, 3599-3603.
35. Hanahan, D. (1983) J. Mol. Biol., 166, 557-580.
36. Cami, B. and Kourilsky, P. (1978) Nucleic Acids Res., 5, 2381-2390.
37. Stanley, J., Dowling, D.N. and Broughton, W.J. (1988) Mol. Gen. Genet., 215, 32-37.
38. Morrison, N.A., Hau, C.Y., Trinick, M.J., Shine, J. and Rolfe, B.G. (1983) J. Bact., 153, 527-531.