Molecular Evolution in the gnd Locus of Salmonella enterica


Gowrie Thampapillai, Ruiting Lan, and Peter R. Reeves
Department of Microbiology, University of Sydney

The gnd gene, the structural gene for 6-phosphogluconate dehydrogenase, was sequenced and analyzed in 34 isolates from different serovars of the seven subspecies of Salmonella enterica to provide comparative information on the evolution of this gene, which has been studied extensively in Escherichia coli. The gene tree obtained by the neighbor-joining method in general gave separate branches for each subspecies, with the few exceptions readily explained by recombination. There is evidence of recombination involving transfer of long (more than 400 bp) and short (30-150 bp) segments of DNA. Four of the six long-segment transfers detected are at the 5’ end of the gene, and in all four cases a variant of the chi sequence is located close to the recombination junction and appears to have mediated the recombination events. We suggest that in these four cases and in a fifth case with intersubspecies transfer of the whole gnd gene, the adjacent rfb (O antigen) locus may have been transferred in the same event. The estimates of the number of synonymous substitutions per synonymous site, KS, and the number of nonsynonymous substitutions per nonsynonymous site, KA, within the E. coli and S. enterica gnd genes, and also between the two species, show an interesting distribution, with KS being lower toward the ends of the gene and KA in particular being lower in the first than in the second domain. In S. enterica, synonymous sites also seem to be subjected to negative selection. The ratio of KA to KS was higher within S. enterica and E. coli than between them, which may indicate that intraspecies variation is essentially between clones and that mildly deleterious mutations can be fixed within clones, which would thus raise KA within species.

Introduction

We are undertaking several studies of variation within and between bacterial species with the aim of understanding the structure and evolution of bacterial populations. They include several studies of variation in the O antigen of Salmonella enterica, which is present in the outer membrane of gram-negative bacteria and is very variable, with about 60 and 160 forms detected in S. enterica and Escherichia coli, respectively (Ewing 1986). The structural variation in O antigen depends on genetic variation in the rfb gene cluster, which comprises a cluster of genes involved in O antigen synthesis (Mäkelä and Stocker 1984; Reeves 1992). The analysis of the rfb locus in S. enterica has shown that interspecies transfer has been a major force in generating the diversity (Reeves 1993).

The gnd gene, which codes for 6-phosphogluconate dehydrogenase (6-PGDH, EC 1.1.1.44), an enzyme of the pentose-phosphate pathway, is located adjacent to the rfb locus. These two loci map at 44 and 42 min on the E. coli and S. enterica chromosomes, respectively (Sanderson and Roth 1988; Bachmann 1990). The region downstream of the gnd gene has been sequenced recently in an E. coli strain, O111, and in an S. enterica strain, LT2 (Bastin et al. 1993). In the E. coli strain the interval between the gnd gene and the his operon consists of two uncharacterized open reading frames and another gene, cld, which codes for a protein that determines the chain length of the O antigen. The S. enterica strain LT2 has a single uncharacterized open reading frame and the cld gene in the same interval between the gnd gene and the his operon. The difference between the E. coli and S. enterica strains in this region indicates that there have been rearrangements in this region. The cld gene is highly variable (Batchelor et al. 1991; Bastin et al. 1993), and its product clearly interacts with the highly variable O antigen. Therefore, the polymorphism in the gnd gene, which is located between the rfb and cld loci, is likely to be affected by selection at either locus.

The three-dimensional structures of the 6-PGDHs of E. coli and S. enterica have not been specifically elucidated, but alignment of amino acid sequences of 6-PGDH from E. coli, S. enterica, Bacillus subtilis, Synechococcus sp., sheep, and pig has shown that these proteins are homologous and share conserved regions that are functionally important (Reizer et al. 1991). The structure of sheep 6-PGDH is known, and from these data the structure of the E. coli and S. enterica 6-PGDHs can be deduced (Adams et al. 1991; M. Adams, personal communication). The enzyme 6-PGDH consists of three domains: an N-terminal α/β NADP-binding domain (residues 1-176), a second all-α domain extending from residues 177-434, and a tail comprising residues 435-468. The amino acid sequence encoded by the gnd gene has two putative nucleotide binding sequences, one in residues 10-15 with a consensus sequence GXAXXG (Adams et al. 1991) and another, GXGXXGXXXG, for the βαβ fold in residues 124-133, and a strongly conserved putative substrate binding sequence ILDXAANKGTGK in residues 253-264 (Reizer et al. 1991).

Key words: S. enterica gnd gene, lateral gene transfer, chi sites, synonymous and nonsynonymous substitutions, selection and evolution.

Address for correspondence and reprints: Peter R. Reeves, Department of Microbiology, G08, University of Sydney, New South Wales 2006, Australia.

Mol. Biol. Evol. 11(6):813-828. 1994. © 1994 by The University of Chicago. All rights reserved. 0737-4038/94/1106-0001$02.00

Downloaded from http://mbe.oxfordjournals.org/ by guest on October 19, 2014

In this article we present an analysis of the DNA sequence variation in the gnd gene of S. enterica and show that the evolution of this gene involves recombination at both ends of the coding region and that, at least at the 5’ end, these recombinations seem to be chi mediated and perhaps involve the adjacent rfb locus. Comparison of the number of nucleotide substitutions in the synonymous and nonsynonymous sites in the E. coli and S. enterica gnd genes indicates that both nonsynonymous and synonymous sites are subjected to negative selection in S. enterica.

Material and Methods

Bacterial Strains and DNA Sequences

Details of the Salmonella enterica, Citrobacter freundii, and Yersinia pseudotuberculosis strains used are given in table 1. In the data presented, we have added to the S. enterica strain names a suffix s1, s2, and so forth, to indicate the subspecies classification of the strain. For example, M318s4 indicates that strain M318 is a subspecies IV strain. We have used 16 published Escherichia coli gnd sequences for comparison with the S. enterica sequences. Details of the E. coli sequences, with GenBank numbers in parentheses, are K-12 (K02072) (Nasoff et al. 1984); ECOR 4 (M64324), ECOR 16 (M64325), ECOR 65 (M64331), ECOR 68 (M64330), ECOR 69 (M64328), and ECOR 70 (M64329) (Dykhuizen and Green 1991); ECOR 10 (M63821), ECOR 11 (M63822), ECOR 18 (M63823), ECOR 20 (M63824), ECOR 21 (M63825), ECOR 23 (M63826), ECOR 25 (M63827), ECOR 47 (M63828), and ECOR 56 (M63829) (Bisercic et al. 1991).

The gnd DNA sequences of 34 natural isolates of S. enterica representing several O antigen forms and the seven documented subspecies (Cross et al. 1973; Le Minor et al. 1986) have been analyzed in this work. The 34 S. enterica gnd sequences include 33 new S. enterica gnd genes sequenced for this study and the already published S. enterica LT2 gnd sequence (Reeves and Stevenson 1989). The gnd gene of a C. freundii strain, 396, was also sequenced over the same region and used as an outgroup in the phylogenetic analysis. The analysis is based on 1,329 bp of DNA sequence, bases 16 to 1344 in the coding region of the S. enterica gnd gene (coding sequence 1,404 bp). Bases 16 to 400 of gnd DNA of a Y. pseudotuberculosis strain were also sequenced and used as an outgroup in the phylogenetic analysis of E. coli and C. freundii strains.

In some strains, the gnd sequence differed from that expected for the particular species or subspecies, and in such cases sufficient properties were tested to confirm the species or subspecies designation. We acknowledge the help of Robert Chiew, Westmead Hospital, Sydney, for confirmation of the subspecies status of S. enterica M130s2, M38s2, M298s1, and M318s4, and Dr. K. A. Bettelheim and Dr. D. E. Leslie, Fairfield Hospital, Victoria, for confirmation that our stock of C. freundii 396 is indeed C. freundii.

PCR and Sequencing

Chromosomal DNA was extracted by the method devised by Ardeshir et al. (1981). The sequences for oligonucleotide primers were chosen taking into account the segments conserved in the published Escherichia coli K-12 (Nasoff et al. 1984) and Salmonella enterica LT2 (Reeves and Stevenson 1989) gnd gene sequences. The following oligonucleotides were used for PCR amplification and sequencing:

No. 218, 5’ tgtaaaacgacggccagt-CCAAGCAACAGATCGG 3’,
No. 239, 5’ tgtaaaacgacggccagt-TCGATTCGCTGAAACC 3’,
No. 261, 5’ tgtaaaacgacggccagt-GAATATGGCGATATGCA 3’,
No. 279, 5’ tgtaaaacgacggccagt-TCTTGGCAAGATCGT 3’,
No. 219, 5’ caggaaacagctatgacc-TATAGGTGTGCGCACC 3’, and
No. 260, 5’ caggaaacagctatgacc-ATCGGCGTTTTCTGCGTA 3’.
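Each primer in this list is written as a lower-case tail joined by a hyphen to an upper-case segment. The following minimal Python sketch shows one way such notation could be split for bookkeeping. The function names are hypothetical, and labeling the `tgtaaaacgacggccagt` tail as "forward" and the `caggaaacagctatgacc` tail as "reverse" is an assumption based on the commonly used M13 universal primer sequences, not something stated in the list itself.

```python
# Hypothetical helpers for the "tail-SEGMENT" primer notation used in the list.
M13_FORWARD = "tgtaaaacgacggccagt"  # assumed: M13 universal forward tail
M13_REVERSE = "caggaaacagctatgacc"  # assumed: M13 universal reverse tail


def split_primer(primer: str) -> tuple[str, str]:
    """Split 'tail-SEGMENT' into (lower-case tail, upper-case gene segment)."""
    tail, sep, gene = primer.replace(" ", "").partition("-")
    if not sep or not tail.islower() or not gene.isupper():
        raise ValueError(f"unexpected primer notation: {primer!r}")
    return tail, gene


def primer_direction(tail: str) -> str:
    """Name the universal tail, or 'unknown' if it matches neither constant."""
    if tail == M13_FORWARD:
        return "forward"
    if tail == M13_REVERSE:
        return "reverse"
    return "unknown"


# Primer sequences copied from the list above.
PRIMERS = {
    218: "tgtaaaacgacggccagt-CCAAGCAACAGATCGG",
    239: "tgtaaaacgacggccagt-TCGATTCGCTGAAACC",
    261: "tgtaaaacgacggccagt-GAATATGGCGATATGCA",
    279: "tgtaaaacgacggccagt-TCTTGGCAAGATCGT",
    219: "caggaaacagctatgacc-TATAGGTGTGCGCACC",
    260: "caggaaacagctatgacc-ATCGGCGTTTTCTGCGTA",
}

for number, primer in PRIMERS.items():
    tail, gene = split_primer(primer)
    print(number, primer_direction(tail), gene, len(gene))
```

Keeping the tail and the gene-specific segment separate makes it easy to check that every amplification primer carries a tail that a universal sequencing primer can anneal to.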

The segments shown in upper case are gnd gene sequences, and those in lower case are either the universal forward or reverse M13 primer sequences. DNA from the coding region was amplified by the PCR method described by Saiki et al. (1988). The amplified PCR product was purified with Prep-A-Gene matrix (BioRad) according to the manufacturer’s instructions in order to remove excess PCR primers, precipitated with ethanol, and resuspended in 8 ml of TE (pH 8). Cycle sequencing reactions were carried out according to the method described by Applied Biosystems using a dye-labeled primer complementary to the universal primer sequence present in the primer used for PCR amplification. The extended products from the cycle sequencing reactions were run on a denaturing polyacrylamide gel and read by an ABI 373A sequencer.

Table 1
Information on Strains Used

Serovar (for S. enterica) or Species    Strain a        Subspecies   O Antigen
Typhimurium LT2                         SARA 2          I            B
Paratyphi B                             SARA 41         I            B
Choleraesuis                            M36             I            C1
Tennessee                               M55             I            C1
Glostrup                                M46             I            C2
Muenchen                                SARA 71         I            C2
Typhi                                   Ty21a (M229)    I            D1
Strasbourg                              M13             I            D2
Canoga                                  M35             I            E3
Senftenberg                             M73             I            E4
Berkeley                                M295            I            43
Dahlem                                  M298            I            48
Sofia                                   M494            II           B
9,12:mt:e,n,x                           M130            II           D1
Lindrick                                M495            II           D1
1,9,12:g,mst:1.5242                     M496            II           D1
Haarlem                                 M38             II           D2
Westpark                                M497            II           E1
Springs                                 M261            II           40
Freemantle                              M287            II           42
Phoenix                                 M311            II           47
Ar 10a10b:17,20                         M314            IIIa         40
Ar 1,3:1,3,11                           M313            IIIa         44
Ar 38:1,~:253,254                       M498            IIIb         38
Ar 28:27-21                             M316            IIIb         47
Ar 5,29:33-31                           M317            IIIb         48
38:z4,z23                               M318            IV           38
Houten                                  M319            IV           43
43:z4:z23                               M320            IV           43
Brookfield                              M322            V            66
Balboa                                  M321            V            48
Marseille                               M324            VI           F
41:b:1,7                                M325            VI           41
Vrindaban                               M326            VI           45
Citrobacter freundii                    396 (M132)
Yersinia pseudotuberculosis (Grp. V)    M89

Source b (column entries as extracted; row assignment not recoverable): j i i w w w w w w i p p p w p p p p p p p p p p y

a Laboratory name is given in parentheses when strain name is not available.
b The sources of the strains are as follows: a, SARA Collection, R. K. Selander (Beltran et al. 1991); p, L. Le Minor, Institut Pasteur, Paris, France; w, R. Chiew, Westmead Hospital, Sydney, Australia; i, C. Murray, Institute of Medical and Veterinary Science, Adelaide, Australia; j, K. Jann, Max-Planck-Institut für Immunbiologie, Freiburg; m, J. Taplin, Department of Microbiology, University of Melbourne, Australia; d, this laboratory; y, Yersinia Reference Centre, Public Health Laboratory, Leicester Royal Infirmary, Leicester, U.K.

Computer Analysis

The sequences were edited and analyzed by programs made available through the Australian National Genomic Information Service (ANGIS), at the University of Sydney (Reisner et al. 1993). The DNA sequences were analyzed using MULTICOMP (Reeves et al. 1994), which gives pairwise comparisons of DNA and derived amino acid sequences and facilitates the use of programs such as PAUP, MacClade, and those within the PHYLIP package (version 3.4, written by Joseph Felsenstein, Department of Genetics, University of Washington, Seattle). It also uses Sawyer’s algorithm to detect intragenic recombination (Sawyer 1989), calculates nucleotide diversity, π, using the method given by Nei and Miller (1990), and estimates KS, the number of synonymous substitutions per synonymous site, and KA, the number of nonsynonymous substitutions per nonsynonymous site, using the program kindly provided by Wen-Hsiung Li of the University of Texas, Houston (Li et al. 1985; Li 1993). The values for KS and KA in the interspecies comparison were derived by estimating the values for pairwise comparisons of each E. coli strain against each S. enterica strain and taking the average. Phylogenetic trees were constructed using the neighbor-joining (NJ) method described by Saitou and Nei (1987). The program METREE (Rzhetsky and Nei 1992), which identifies the minimum evolution tree and other trees similar to the minimum evolution tree, was kindly provided by A. Rzhetsky, Pennsylvania State University, and was used on our NJ trees. In each case these ME trees were only minor variants of the NJ tree, and the NJ trees are presented with the bootstrap values given by the METREE program.
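The two averaging steps described above, a within-group diversity and a between-species mean over every cross-species pair, can be illustrated with a minimal Python sketch. It is not the MULTICOMP or Li program: the uncorrected proportion of differing sites (p-distance) is used here as a stand-in for KS, KA, and the Nei and Miller estimator, and all function names and the toy sequences are assumptions made for illustration.

```python
from itertools import combinations, product


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites that differ (stand-in for KS or KA)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nucleotide_diversity(seqs, dist=p_distance) -> float:
    """Unweighted mean distance over all pairs within one group."""
    pairs = list(combinations(seqs, 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def mean_between(group_a, group_b, dist=p_distance) -> float:
    """Mean distance over every pair with one sequence from each group."""
    pairs = list(product(group_a, group_b))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


# Toy aligned sequences, invented for this example.
group_a = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
group_b = ["ACGTTCGTCC", "ACGTTCGTCT"]

print("within group A:", nucleotide_diversity(group_a))
print("between groups:", mean_between(group_a, group_b))
```

Passing the distance function as a parameter means the same averaging code would work unchanged if a proper synonymous or nonsynonymous estimator were substituted for `p_distance`.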

Results and Discussion

Phylogenetic Analysis

In the 34 Salmonella enterica gnd sequences there are 314 polymorphic sites, and of these, 201 are phylogenetically informative, such sites having at least two bases each present in two or more strains (fig. 1). These informative bases have been used to construct a phylogenetic (NJ) tree using the gnd sequence of Citrobacter freundii strain 396 as outgroup. The NJ tree (fig. 2a) showed that, in general, DNA polymorphism at the gnd locus of S. enterica corresponded to subspecies variation within the species. However, there are exceptions. The sequences of two strains, M298s1 and M326s6, were displaced from the branches that contain the other members of the same subspecies, which shows that in these strains the whole or most of the gnd gene has been replaced by lateral transfer.
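The distinction between polymorphic and phylogenetically informative sites can be made concrete with a short Python sketch. It applies the criterion given above (at least two different bases, each present in two or more strains) to a toy alignment; the function name and the toy sequences are assumptions for illustration and are not taken from the gnd data.

```python
from collections import Counter


def classify_sites(alignment):
    """Return (polymorphic, informative) 1-based site positions.

    A site is polymorphic if more than one base occurs in the column, and
    informative if at least two bases each occur in two or more sequences.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    polymorphic, informative = [], []
    for pos, column in enumerate(zip(*alignment), start=1):
        counts = Counter(column)
        if len(counts) > 1:
            polymorphic.append(pos)
            if sum(1 for n in counts.values() if n >= 2) >= 2:
                informative.append(pos)
    return polymorphic, informative


# Toy alignment: sites 2 and 4 are informative; site 5 has a single variant.
toy = ["ACGTA", "ACGTA", "ATGAA", "ATGAC"]
polymorphic, informative = classify_sites(toy)
print(polymorphic, informative)
```

A site where only one strain differs, such as site 5 in the toy alignment, is polymorphic but cannot group strains, which is why the informative count (201) is smaller than the polymorphic count (314).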

Three other recombination events, in addition to those in M298s1 and M326s6, appear to affect the topology of the gene tree. In the placement of subspecies IV, IIIa, and V, the gnd gene tree does not correspond to the trees based on the gapA or putP genes (Nelson et al. 1991; Nelson and Selander 1992). However, if the strains M298s1 and M318s4 and the segment from positions 916 to 1344 in the two subspecies V strains are excluded from the analysis (as they contain DNA atypical for the subspecies), the tree obtained is similar to those for gapA and putP, with the exception of M326s6 (fig. 2b). Indeed, to subspecies level, there is a consensus tree for the three genes, with the putP tree differing from the gapA tree only in the order of subspecies IIIa and IV and the gnd tree (fig. 2b) differing from the gapA tree only in the placement of subspecies II. We conclude that this consensus tree is the best estimate for the relationship between subspecies and that recombination on a scale that affects the tree for the gnd gene has occurred in only the few strains referred to above. The MLEE trees obtained by Reeves et al. (1989) and Selander et al. (1991) differ in the relationships of the subspecies in several ways from each other and from this consensus gene tree, which suggests that for determining the relationships between the very divergent subspecies of S. enterica, a tree derived from MLEE data may not be as good as one based on DNA sequence of a few genes.

Recombination in the Salmonella enterica gnd Gene

Intersubspecies transfers involving segments of about 400-750 bp or more in size were detected in a total of 6 of 34 strains analyzed. All six recombinant gnd sequences, those of strains M318s4, M298s1, M38s2, M130s2, M321s5, and M322s5, have the same characteristic feature, in that the deduced donor segments are located at either end of the gene, which indicates the possibility that these transfers may involve regions adjacent to the gnd gene on either side. In addition to these large segment transfers, lateral transfers of small segments of DNA (30-150 bp) are also evident in many strains.

Detection of Recombination

The conclusions on recombination events were derived from both phylogenetic and statistical analysis. In general all methods of phylogenetic analysis infer that DNA sequence similarity is due to common ancestry. Phylogenetic analysis based on protein polymorphisms at 24 loci (Selander et al. 1991) and DNA sequence variation at the gapA and putP loci have confirmed that the S. enterica genome is essentially clonal, and the genetic relatedness in S. enterica strains strongly follows the subspecies variation.

Given this background, in a phylogenetic reconstruction, one would expect strains belonging to the same subspecies of S. enterica to cluster in the same branch. When a gene tree places an S. enterica strain in a branch different from that in which the members of its assigned subspecies cluster, one could infer that the unusual placement of the strain is due to lateral gene transfer. Such an inference is strengthened when independent analysis such as biotyping, serotyping, or DNA sequence analysis at a different locus confirms the original subspecies status of the strain.

In the same manner, if part of the sequence places a strain in the branch that contains the other members of the same subspecies, while another part of the same gene places the strain elsewhere, it can be inferred that the latter segment has been acquired by partial gene transfer. In such chimeric DNA segments of an essentially clonal bacterium such as S. enterica, one could not only detect recombination but also deduce the donor


FIG. 1.-Alignment of informative bases of Salmonella enterica gnd gene sequences showing only those bases that differ from the consensus sequence. Asterisks indicate nonsynonymous substitutions, and the position of the base in its codon is indicated in the last line. Suffix s1, s2, and so forth, indicates subspecies.

by guest on October 19, 2014

http://mbe.oxfordjournals.org/

Downloaded from

8 18 Thampapillai et al.

and recipient parts of the DNA segments with some degree of accuracy. This rationale has been used previously for detecting recombination in other species (Gyllensten et al. 1991).

The statistical test for detecting gene conversion devised by Sawyer (1989) was also applied to confirm the observed recombination events. A subset of eight strains, S41s1, M322s5, M298s1, M130s2, M495s2, M311s2, M318s4, and M320s4, was selected, and the distribution of the silent informative sites in the 1329-bp gnd DNA was analyzed. The P values for the sum of the squares of the condensed fragment lengths (SSCF) and maximum condensed fragment length (MCF) were highly significant. The extremely low P values of <0.0001 for both parameters show that the type of clustering of silent sites present in our real data set cannot be reproduced by permuted data sets. One set of results of Sawyer’s test is presented in table 2. One point to note in the four condensed fragments shown in table 2 is that, unlike in the other three pairs of strains, the similarity between strains M130s2 and M495s2 from region 429 to 984 bp is not due to recombination but to common descent, and the dissimilarity between these two strains in the segment spanning 16-429 bp is due to a recombination event at this region in strain M130s2.

Several small segment transfers of DNA of about 30-150 base pairs were also observed in many of the 34 gnd sequences. Sawyer’s test was carried out on several sets of strains to provide statistical evidence for these gene conversion events, and the P values obtained were highly significant.

FIG. 2.-Trees derived from the Salmonella enterica gnd sequences using the NJ method, with bootstrap values. a, Tree for all 34 sequences analyzed. b, Tree after omitting sequences from strains M298s1 and M318s4 and also the 3’ end from position 916 of strains M321s5 and M322s5. c, Tree for bases 16-429 only. d, Tree for positions 931-1344. All trees included sequences of Escherichia coli K-12 and Citrobacter freundii 396 (outgroup), but these are shown only for tree a. The scale bar represents a genetic distance of 0.1.

Recombination in Strain M318s4

Strain M318s4 is a subspecies IV strain that has a chimeric gnd sequence with segments derived from subspecies I and IV. The first 771 bp of its sequence is of subspecies I type and identical to the sequences of strains S41s1 and M46s1 (figs. 1 and 3a). From position 819, it resembles other subspecies IV strains, M319s4 and M320s4, and not the subspecies I strains, S41s1 or M46s1. As allelic variation in this gene in general corresponds to subspecies variation, the subspecies I DNA present up to position 771 in strain M318s4 must have been acquired by recombination. The junction of the subspecies I and subspecies IV DNA could be anywhere between positions 771 and 819 (or even 881). A single-base variant, 5’ CCTGGTGG 3’, of the general recombination-stimulating sequence chi, 5’ GCTGGTGG 3’, is located at positions 744-751 bp. This chi-like sequence is near the 3’ end of the deduced donor sequence thought to be derived from a subspecies I strain. Sequences of M318s4 and related strains illustrating the recombination event and showing the chi variant are given in figure 3a.
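The search for chi and its single-base variants described above can be illustrated with a minimal sketch. It assumes a plain nucleotide string as input and reports 1-based inclusive coordinates, as used in the text; the function names and the toy sequence are hypothetical and are not taken from the gnd data.

```python
CHI = "GCTGGTGG"  # canonical chi octamer, 5' to 3'


def single_base_variants(motif):
    """Return every sequence that differs from motif at exactly one position."""
    variants = set()
    for i, base in enumerate(motif):
        for alt in "ACGT":
            if alt != base:
                variants.add(motif[:i] + alt + motif[i + 1:])
    return variants


def find_chi_like(seq, motif=CHI):
    """List (start, end, octamer, kind) hits using 1-based inclusive coordinates."""
    seq = seq.upper()
    variants = single_base_variants(motif)
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window == motif:
            hits.append((i + 1, i + k, window, "chi"))
        elif window in variants:
            hits.append((i + 1, i + k, window, "single-base variant"))
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequence: the variant CCTGGTGG is planted at bases 11-18.
    toy = "ATATATATAT" + "CCTGGTGG" + "ATATATATAT"
    for hit in find_chi_like(toy):
        print(hit)
```

Only one strand is scanned here, so orientation relative to replication and transcription would have to be checked separately.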

Recombination in Strains M38s2 and M130s2

Although the NJ tree constructed with informative bases present in the entire region sequenced places the sequences of strains M38s2 and M130s2 with other subspecies II strains, a phylogenetic tree constructed with informative bases in the first 429 bases places these two strains in a branch away from that of subspecies II (fig. 2c). We suggest that the segment up to base 429, unique to strains M38s2 and M130s2, was acquired by recombination. We set the junction in the vicinity of position 429-477 bp. Alignments of M38s2 and M130s2 with related strains M495s2 and M497s2 are shown in figure 3b.


Table 2
Results(a) of Sawyer’s Test for Gene Conversion (1989)

A. Estimation of the Significance of SSCF and MCF Values

Statistic   Observed Score   P Value(b)   Mean(c)    SD above Mean(d)   SD of Scores
SSCF        6,713            0.0000       3,337.1    15.751             214.3
MCF         35               0.0001       15.1       6.022              3.3

B. Four Largest Condensed Fragments in the gnd Genes of the Set of Eight Strains

Strains          Fragment    Condensed Fragment Length   P Value(e)
S41s1/M318s4     16-849      35                          0.0001
S41s1/M322s5     849-1344    27                          0.0069
M130s2/M495s2    429-984     23                          0.0335
M298s1/M311s2    39-603      22                          0.0516

(a) Results of permuting 63 silent informative sites in eight strains 10,000 times.
(b) Relative number of permuted data sets with SSCF and MCF scores greater than or equal to the observed score (real data).
(c) Mean for the 10,000 permutations.
(d) (Observed - Mean)/SD.
(e) Relative number of permuted data sets having MCF value greater than or equal to observed fragment length.
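The permutation logic summarized in the footnotes to table 2 can be sketched as follows. This is a simplified illustration, not Sawyer’s program: it assumes each strain is represented only by its silent polymorphic sites, takes a condensed fragment to be a run of consecutive sites at which two strains agree, and shuffles the order of the sites to build the null distribution. All function names and the toy data are hypothetical.

```python
import random
from itertools import combinations


def condensed_fragments(a, b):
    """Lengths of runs of consecutive sites at which sequences a and b agree."""
    lengths, run = [], 0
    for x, y in zip(a, b):
        if x == y:
            run += 1
        else:
            if run:
                lengths.append(run)
            run = 0
    if run:
        lengths.append(run)
    return lengths


def sscf_mcf(seqs):
    """Return (SSCF, MCF): sum of squared fragment lengths and the maximum length."""
    sscf, mcf = 0, 0
    for a, b in combinations(seqs, 2):
        for n in condensed_fragments(a, b):
            sscf += n * n
            mcf = max(mcf, n)
    return sscf, mcf


def permutation_test(seqs, n_perm=1000, seed=0):
    """P = fraction of site-order permutations scoring >= the observed value."""
    rng = random.Random(seed)
    obs_sscf, obs_mcf = sscf_mcf(seqs)
    order = list(range(len(seqs[0])))
    ge_sscf = ge_mcf = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        shuffled = ["".join(s[i] for i in order) for s in seqs]
        p_sscf, p_mcf = sscf_mcf(shuffled)
        ge_sscf += p_sscf >= obs_sscf
        ge_mcf += p_mcf >= obs_mcf
    return {"SSCF": obs_sscf, "MCF": obs_mcf,
            "P_SSCF": ge_sscf / n_perm, "P_MCF": ge_mcf / n_perm}


if __name__ == "__main__":
    # Hypothetical toy data: strain 1 matches strain 2 on the left half
    # and strain 3 on the right half, mimicking a mosaic sequence.
    toy = ["AAAAAACCCCCC", "AAAAAATTTTTT", "GGGGGGCCCCCC"]
    print(permutation_test(toy, n_perm=200, seed=1))
```

Shuffling whole columns keeps the pattern at each site intact while destroying any clustering along the gene, which is the property the test is meant to detect.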

The same single-base variant of chi reported above is also present in strains M38s2 and M130s2 at positions 424-431 bp, very close to the end of the distinctive segment (fig. 3b).

Recombination in Strain M298s1

Twelve subspecies I strains have been analyzed at the gnd locus. With the exception of the sequence of M298s1, they show very little variation, particularly up to position 960 bp. The sequence of M298s1 is atypical for this subspecies; irrespective of the region used in phylogenetic construction, it is placed in a branch different from the one in which other subspecies I strains cluster. Therefore, transfer of the whole gene is invoked. The gnd sequence of M298s1 has segments representative of subspecies II at the 5’ end and segments representative of subspecies IIIb at the 3’ end, which suggests multiple recombination events at this locus.

The subspecies II gnd gene is highly variable, and those analyzed fall into two types (fig. 2), which we refer to as types IIa and IIb simply for this discussion. A major difference between the two types is the presence of the chi-like sequence at positions 744-751 bp in type IIb strains M38s2, M130s2, M497s2, and M495s2. The type IIa strains M311s2, M494s2, M261s2, and M287s2 have clustered substitutions in and around this chi-like region, with the exception of type IIa strain M496s2. The gnd sequence of M298s1 is almost identical to those of M311s2 and M494s2 (type IIa) from the start of the gene to around position 705 but differs from them by possessing the chi-like sequence in position 744-751 bp as in type IIb strains. Beyond this region, the sequence of M298s1 resembles that of subspecies IIIb (figs. 2c, 2d, and 3b). We suggest that there could have been two consecutive transfers of gnd DNA, whole-gene transfer with a subspecies IIIb strain as donor initially and a subsequent chi-mediated transfer of a partial segment at the 5’ end from a subspecies II strain.

Recombination in Subspecies V Strains, M321s5 and M322s5

Our analysis involved 12 subspecies I and two subspecies V strains. There is little variation within either subspecies in the 5’ end, other than in strain M298s1 discussed above, but considerable differences exist between the two subspecies, as expected from the MLEE analysis (Reeves et al. 1989; Selander et al. 1991). However, in the 3’ end, from about position 930 bp, the subspecies V DNA is almost identical to that of subspecies I (fig. 3d). An NJ tree constructed with the informative bases present in the region 915-1344 bp (fig. 2d) places the two subspecies V strains with subspecies I strains. This is an unusual placement for subspecies V strains that could only be possible if they have acquired a subspecies I segment by lateral transfer. A third subspecies V strain sequenced in part also had subspecies I DNA in this region (data not shown). The subspecies I strains are very similar to each other and do not have any distinct groups within them up to position 1089 bp. From position 1089 to 1221 bp they form two distinct groups, and the sequence of one subspecies V strain, M322s5, is similar to that of LT2s1, which differs from the consensus sequence at positions 1148, 1152, 1155, and 1158 bp. The sequence of the other subspecies V strain, M321s5, is similar in this region to that of the other group, e.g., M55s1, which differs from the consensus sequence in positions 1089, 1119, 1146, 1215, and 1221 bp. This suggests that the differences within subspecies I strains in this region must have evolved before the transfer of subspecies I DNA into the subspecies V gene.

FIG. 3.-Alignments of sequences to show deduced major recombination events. The recombinant sequences are shown in boldface, and the sites shown are those that differ in the other strains included in each comparison. The chi-like sequences postulated to have been involved in recombination are included in a, b, and c and are shown in boldface.

Interspecies Recombination in the gnd Gene

Evidence for interspecies transfers involving large segments of DNA is absent in the Salmonella enterica gnd gene. A similar observation was made by Nelson and Selander (1992). However, the region between 960 and 1200 bp shows a high level of amino acid diversity in both S. enterica and Escherichia coli gnd genes, and transfers of small segments from divergent species cannot be discounted in this region. In the gapA gene, Nelson and Selander (1992) observed a transfer of a short segment of DNA from Klebsiella pneumoniae to subspecies V strains of S. enterica. This transfer resulted in six amino acid changes within 48 bases. Similar clustered amino acid changes have been observed in the 960- to 1200-bp region in the S. enterica gnd gene, a particularly striking case being the segment 989-1029 of M38s2, which includes four amino acid substitutions within 40 bases (fig. 4). However, as no donor species is known, it is difficult to confirm interspecies recombination.
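Counting amino acid changes in a short segment, as done above for M38s2, amounts to comparing two aligned coding sequences codon by codon. The sketch below assumes in-frame input of equal length and the standard genetic code; it classifies whole-codon differences and is not the per-site estimate of Ks and KA used elsewhere in this paper. The two sequences in the demonstration are hypothetical, not the M38s2 data.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_changes(ref, alt):
    """Return (codon number, ref codon, alt codon, ref aa, alt aa, kind) per difference."""
    if len(ref) != len(alt) or len(ref) % 3:
        raise ValueError("sequences must be aligned, in frame, and of equal length")
    changes = []
    for i in range(0, len(ref), 3):
        c1, c2 = ref[i:i + 3].upper(), alt[i:i + 3].upper()
        if c1 == c2:
            continue
        aa1, aa2 = CODON_TABLE[c1], CODON_TABLE[c2]
        kind = "synonymous" if aa1 == aa2 else "nonsynonymous"
        changes.append((i // 3 + 1, c1, c2, aa1, aa2, kind))
    return changes


if __name__ == "__main__":
    # Hypothetical segments: codon 2 changes Ile to Thr, codon 6 stays Ala.
    ref = "".join(["AAA", "ATC", "GTC", "TCC", "TAT", "GCG", "CAG"])
    alt = "".join(["AAA", "ACC", "GTC", "TCC", "TAT", "GCC", "CAG"])
    for change in codon_changes(ref, alt):
        print(change)
```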

We have used the 16 published E. coli gnd sequences to compare with S. enterica gnd sequences; among them, those of ECOR 4 and ECOR 16 are very divergent. At the DNA level they are more than 15% divergent from each other and from the other E. coli gnd sequences. This is near the average level of DNA divergence between E. coli and S. enterica seen for many genes and is unusually high for alleles within a species, as previously noted by Dykhuizen and Green (1991), who sequenced these two divergent E. coli gnd alleles.

FIG. 4.-Alignment of a segment of gnd gene sequences of strains M130s2, M38s2, and M497s2, with LT2s1 as the reference strain to show a short, highly divergent segment in M38s2.

An NJ tree of the E. coli, S. enterica, and Citrobacter freundii 396 gnd sequences using the partial gnd sequence of Yersinia pseudotuberculosis as an outgroup (fig. 5a) places the ECOR 4 and C. freundii 396 gnd genes in the same branch between the major E. coli and S. enterica branches, an unusual location for a C. freundii strain. The placement of ECOR 4 and C. freundii 396 was the same when the gnd genes of Synechococcus sp. or Trypanosoma brucei were used as the outgroup. The similarity of the DNA sequences of the ECOR 4 and C. freundii 396 gnd genes is shown in figure 6. Observation of interspecies recombination at the gnd locus of E. coli was mentioned by Nelson and Selander (1992), although details were not given.

Escherichia coli and S. enterica are more closely related to each other than either is to C. freundii, a view supported by trees based on the gapA and ompA genes (Lawrence et al. 1991). At the DNA level, the divergence of the gnd sequence of C. freundii 396 from those of both S. enterica LT2 and E. coli K-12 is similar (15%) to the divergence of the E. coli K-12 and S. enterica LT2 sequences, but at the amino acid level E. coli K-12 and S. enterica LT2 are closer to each other than either is to C. freundii 396 (fig. 5b). The unusual placement of the C. freundii 396 gnd DNA sequence perhaps reflects a recombination event involving the gnd gene. Citrobacter freundii 396 has an O antigen that resembles a hybrid of S. enterica groups C1 and B (Jann and Jann 1984), and perhaps the rfb and gnd genes were jointly involved in an interspecies recombination event. In this context it is interesting that the G+C content at the third base of the E. coli ECOR 4 gnd gene is 0.59, much higher than that of the other E. coli strains, in which it ranges from 0.52 to 0.56. In this regard the ECOR 4 gene also resembles the S. enterica gene, which has an average value of 0.59, while the range is 0.57-0.62. The corresponding value for the C. freundii 396 gene is 0.58.
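The G+C content at the third codon position quoted above is a simple proportion. A minimal sketch, assuming an in-frame coding sequence and ignoring ambiguous bases (the function name and the short test sequence are hypothetical):

```python
def gc3(cds):
    """Fraction of G or C among third codon positions of an in-frame coding sequence."""
    # Slicing from index 2 in steps of 3 picks the third base of each complete codon.
    third = [b for b in cds.upper()[2::3] if b in "ACGT"]
    if not third:
        raise ValueError("no complete codons with unambiguous third bases")
    return sum(b in "GC" for b in third) / len(third)


if __name__ == "__main__":
    # Hypothetical four-codon sequence: ATG GCC AAA TTC has third bases G, C, A, C.
    print(gc3("ATGGCCAAATTC"))
```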

The C. freundii 396 sequence is very different to those of S. enterica to which its O antigen is related, but examination of the sequences shows that at some sites an allele shared by C. freundii 396 and ECOR 4 is also present in most S. enterica strains but not in other E. coli strains. It appears that there may have been some complex interactions.

Relationship between the rfb Locus and Recombination in the gnd Gene

The deduced donor segments in recombinant strains M318s4, M298s1, M130s2, and M38s2 extend into the gnd gene from the beginning of the sequence. The rfb locus is immediately upstream of the gnd gene, and we suggest that these recombination events could have also involved the rfb gene cluster (fig. 7).

FIG. 5.-a, NJ tree of the gnd sequences of Citrobacter freundii 396 and representative strains of Escherichia coli and Salmonella enterica to show the relationship of C. freundii 396 and ECOR 4. Bootstrap values are shown at the nodes. b, Variation in KA for comparisons between S. enterica LT2s1 and E. coli K-12 (open squares) and between S. enterica LT2 and C. freundii 396 (solid squares). Each point represents the value for a 198-base segment, with an interval of 21 bases between points.

FIG. 6.-Alignment of the gnd genes (informative sites only) of Citrobacter freundii 396 and Escherichia coli strains ECOR 4 and K-12.

Strain M318s4 is a subspecies IV strain carrying O antigen 38, predominantly present in subspecies I and IIIb (table 3). We suggest that at least part of the rfb region and the 5’ end of gnd were transferred together to strain M318s4 from a subspecies I strain by a chi-stimulated recombination event. A similar conclusion can be drawn for strain M298s1, which carries O antigen 48, prevalent in subspecies IIIb and to a lesser extent in subspecies II, and has a gnd gene with segments from both of these subspecies.

In strain M326s6 the entire gnd gene appears to be derived from subspecies I: it carries O antigen 45, which is predominantly found in subspecies I, which again suggests the possibility of a recombination event involving rfb and in this case the whole of the gnd gene. Strains M130s2 and M38s2 carry O antigens D1 and D2, respectively. These are predominantly subspecies I epitopes but are also present to a significant degree in subspecies II. It is possible that these O antigens were transferred to subspecies II together with the 5’ end of the gnd gene. However, although the sequences of strains M130s2 and M38s2 are similar in the 5’ end, especially up to the junction site, and would appear to derive from a single recombination event, they have different O antigens, D1 and D2. The rfb regions of groups D1 and D2 differ only in the central region of the gene cluster (Xiang and Reeves 1994), and substitution of one for the other in the central region could have occurred in a separate event involving only the rfb locus.

Recombination involving the rfb cluster and the gnd gene occurs in relatively few strains, and in general gnd variation corresponds to the subspecies relationship. The overall variation in rfb is more extensive, and the distribution of O antigen types is widespread, with most forms present in more than one subspecies (Ewing 1986). Intersubspecies recombination involving the rfb genes must be relatively common, and it appears that in many cases recombination affecting rfb has no effect on gnd genes.

FIG. 7.-Diagrammatic representation of postulated recombination events involving both the rfb gene cluster and gnd gene. Shaded regions represent sequences postulated to have been derived by transfer from another subspecies, with the subspecies of origin indicated. Where two donors are implicated, two levels of shading are used.

Chi Activity in the Salmonella enterica gnd Gene

In the recombinant gnd sequences of strains M318s4, M298s1, M38s2, and M130s2, a chi-like sequence 5’ CCTGGTGG 3’ is located at the 3’ end of the deduced donor segments. Chi, an octamer DNA sequence element, 5’ GCTGGTGG 3’, has been shown to stimulate recombination in Escherichia coli (Stahl 1979) and Salmonella enterica (Smith et al. 1986). The chi sequence is recognized by the multifunctional enzyme RecBCD, which mediates recombination by the RecBC pathway (Smith 1987; West 1992). The substrate for RecBCD enzyme is a linear double-stranded DNA that the enzyme unwinds and degrades as it moves along (Ganesan and Smith 1993). Recognition of the chi sequence terminates the nuclease activity of the enzyme (Dixon and Kowalczykowski 1993). The enzyme continues to unwind the DNA, releasing a single-stranded DNA with a 3’ tail that is the ideal substrate for RecA protein, which, together with SSB protein, initiates strand invasion, the first step in the complex recombination process. The resulting recombinant would therefore have the donor DNA at the 5’ end, with the chi sequence almost defining the 3’ end of the donor DNA (Smith 1991). The location of the chi sequence and its polarity with respect to donor and recipient DNA segments in the S. enterica gnd gene recombinants M318s4, M298s1, M38s2, and M130s2 is in agreement with this model for chi-dependent recombination. The chi-like sequence in these strains is also oriented in the directions of replication and transcription, as found for most chi sites in E. coli (Burland et al. 1993).

Table 3
Distribution of Selected O Antigens among Subspecies(a)

O Antigen   I       II    IIIa   IIIb   IV   V   Total
O38         23      3     1      18     2    0   47
O48         5       10    5      19     3    2   44
D1          55      24    0      0      0    0   79
D2          51      11    0      0      0    0   62
O45         15      9     4      0      2    0   30
Others      1,103   339   83     231    35   5   1,796
Total       1,252   396   93     268    42   7   2,058

(a) Number of serovars with given O antigen. From Ewing (1986).

FIG. 8.-Alignment of the putP genes of subspecies I, II, V, and VII on both sides of a putative chi-mediated recombination event. Data from Nelson et al. (1992). Presentation as in fig. 3.

However, there are two discrepancies. The chi element associated with these junctions is a single-base variant of the E. coli chi sequence. Single-base mutations in chi are known to reduce the level of chi activity in E. coli to varying degrees depending on the base and position within chi (Smith et al. 1984), but we suggest that this variant 5’ CCTGGTGG 3’ is associated with recombination functions under natural conditions in S. enterica. In strains M318s4, M38s2, and M130s2 the junction appears to be at least 20 bp 3’ of the chi sequence rather than 3-10 bp as observed in in vitro experiments (Ponticelli et al. 1985). However, it should be noted that there are other aspects of the in vivo effects of chi that remain unexplained (Smith 1987). We estimated (Krowczynska et al. 1990) the probability that a chi site, or any of its 24 single-base variants, lies within 50 bp of the recombination junction by chance in three of five junctions to be 0.00069 (for this calculation we conservatively treat M38s2 and M130s2 as deriving from a single recombination event).
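The structure of such a chance calculation can be illustrated with a rough sketch. It is not the method of Krowczynska et al. (1990) and is not expected to reproduce the value of 0.00069: it assumes equally frequent, independent bases, treats 50 possible start positions per junction as independent trials, and then applies a binomial tail for at least three of five junctions. The function names and the default parameters are assumptions made for illustration only.

```python
from math import comb


def p_motif_near_junction(n_motifs=25, k=8, n_starts=50):
    """Chance that at least one of n_motifs k-mers starts at one of n_starts positions.

    Assumes uniform, independent bases and ignores the dependence between
    overlapping windows, so the result is only an approximation.
    """
    p_site = n_motifs / 4 ** k  # chi plus its 24 single-base variants = 25 octamers
    return 1 - (1 - p_site) ** n_starts


def p_at_least(m, n, p):
    """Binomial probability of at least m successes in n independent trials."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(m, n + 1))


if __name__ == "__main__":
    p_one = p_motif_near_junction()
    print(p_one, p_at_least(3, 5, p_one))
```

Changing the window definition, counting both strands, or using the observed base composition would all shift the result, which is why a documented procedure is needed to arrive at a specific figure.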

Chi Activity and Recombination between Subspecies I and V in putP

As chi-like sequences are associated with several recombination events in the gnd gene of Salmonella enterica, we looked at the gapA and putP data in both Escherichia coli and S. enterica for evidence of chi activity. An additional example of possible chi activity was observed in the putP sequences of S. enterica (Nelson and Selander 1992). The putP sequences of subspecies V are the most divergent among the S. enterica putP genes in this set, yet in the 1467-bp coding region of this gene, the sequence of subspecies V, from about positions 400 to 1145 (about 750 bp), is very similar to that of subspecies I (fig. 8). This similarity between the two subspecies ends at about 1145 bp, and the DNA beyond this region in subspecies V has the expected level of divergence characteristic for this subspecies. This unexpected similarity between subspecies I and V could have been due to the acquisition of subspecies I DNA by subspecies V; the possibility of recombination between the two subspecies was suggested by Nelson and Selander (1992). There is an authentic chi site between positions 1137 and 1144 immediately adjacent to the recombination junction in subspecies V putP genes. This chi site, as for those in the gnd gene, is in the same orientation as replication and transcription. It is remarkable that for both gnd and putP, the genes of subspecies V strains have substantial segments from subspecies I, which suggests that, despite the great divergence between the subspecies, this may be a widespread situation.

Nelson and Selander (1992) have also proposed a recombination event in the putP genes of subspecies VII. The same central region described in the recombination of subspecies V putP sequences is involved, and the same chi site is also present in subspecies VII putP sequences. We suggest that the recombination in subspecies VII of putP could also have been stimulated by the same chi element.

Comparison of Polymorphism in the gnd Genes of Salmonella enterica and Escherichia coli

The gnd gene of Escherichia coli has been reported to be extremely variable, and a major objective of this study was to investigate the level of polymorphism in Salmonella enterica. Among the 16 E. coli sequences available, there are 378 polymorphic sites compared to the 318 polymorphic sites among the 34 S. enterica sequences. Nucleotide diversity, π, the average number of nucleotide substitutions per site for a set of alleles sampled from a population (Nei 1987), was used to estimate the level of variation in the gnd gene and other genes of these two species (table 4). We are limited by the number of genes sequenced in multiple strains of S. enterica (data for only gnd, putP, and gapA are available). The data presented in table 4 show that the π values for E. coli genes range from 0.002 for gapA to 0.074 for the gnd gene, with the other four in the range 0.011 to 0.024. The π values for the three genes of S. enterica show much less variation, and the gnd gene, with a π value of 0.048, about two-thirds the value for the gnd gene of E. coli, seems typical for S. enterica, although the number of genes studied is very small.
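As an illustration of the quantity defined above, nucleotide diversity can be sketched as the mean proportion of differing sites over all pairs of aligned sequences. This minimal version assumes that every sequence in the sample is counted once, applies no correction for multiple substitutions, and skips sites with gaps or ambiguous bases; the function names and toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of compared sites that differ between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def nucleotide_diversity(seqs):
    """Mean pairwise proportion of differing sites across all sequence pairs."""
    if len(seqs) < 2:
        raise ValueError("at least two sequences are required")
    dists = [p_distance(a, b) for a, b in combinations(seqs, 2)]
    return sum(dists) / len(dists)


if __name__ == "__main__":
    # Hypothetical toy alignment of three 10-base sequences.
    toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
    print(nucleotide_diversity(toy))
```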

The E. coli gnd sequences used in this analysis include two very divergent sequences: those of ECOR 4 and ECOR 16, of which ECOR 4 shows evidence of interspecies transfer (see the section on interspecies recombination). If these two aberrant strains are omitted from the analysis, the nucleotide diversity seen among the other 14 E. coli gnd sequences is almost the same (0.046) as that observed for S. enterica (0.048).

The small number of genes studied and the variation between genes make it difficult to establish norms at this stage and hence to evaluate the situation for the gnd gene. The E. coli gapA and putP genes have less variation than their homologues in S. enterica. For putP this correlates reasonably well with the depth in their respective MLEE trees (Reeves et al. 1989; Herzer et al. 1990; Selander et al. 1991). The remarkable difference in the variation in gapA in the two species shows that species-specific effects can be very marked, and there is no explanation as yet for the gapA situation. The gnd gene of E. coli is still seen to have a relatively high level of variation for the species, even if two highly divergent strains are excluded. This is not the case in S. enterica, which suggests that for two of three genes for which we have comparative information, different circumstances may apply in the two species.

Synonymous and Nonsynonymous Substitutions in the gnd Gene of Salmonella enterica and Escherichia coli

We computed Ks, the number of synonymous substitutions per synonymous site, and KA, the number of nonsynonymous substitutions per nonsynonymous site (see Material and Methods), and compared the values for the Salmonella enterica and Escherichia coli gnd sequences. We have presented two sets of Ks and KA data for E. coli. One, the E. coli (16) set, includes the two aberrant strains, while the E. coli (14) set excludes them. We also examined Ks and KA in comparisons between E. coli and S. enterica and between subspecies I and II of S. enterica (table 5).

Values for Ks and KA vary along the gene (table 5 and fig. 9), and in all three groups the variation in KA is more marked than that in Ks. This is expected as struc- tural and functional constraints limit substitutions at nonsynonymous sites, and in the gnd gene we observe strong purifying selection to about position 900. In all three groups, purifying selection is strongest in the N terminal domain (fig. 9a). A similar level of purifying selection is observed in the N terminal region of the second domain in S. enterica and E. coli ( 14) sets, while

Table 4 Nucleotide Diversity, π, of Escherichia coli and Salmonella enterica Genes

Gene    Escherichia coli    Salmonella enterica    Reference for Sequences
gnd     0.074               0.048                  Bisercic et al. 1991; Dykhuizen and Green 1991; present data
gapA    0.002               0.038                  Nelson et al. 1991
putP    0.024               0.048                  Nelson and Selander 1992
phoA    0.018               Gene absent a          DuBose et al. 1988
celC    0.012               Not available          Hall and Sharp 1992
crr     0.011               Not available          Hall and Sharp 1992
gutB    0.014               Not available          Hall and Sharp 1992

a No homologue of phoA gene in S. enterica.


Table 5 Ks, KA, and KA/Ks for Salmonella enterica, Escherichia coli, and Subspecies of S. enterica and between Species and Subspecies

Species or Subspecies            Ks               KA               KA/Ks

16-1344 bp:
  S. enterica                    0.153 ± 0.054    0.011 ± 0.004    0.071
  E. coli (16)                   0.313 ± 0.271    0.011 ± 0.009    0.035
  E. coli (14)                   0.163 ± 0.038    0.006 ± 0.002    0.036
  E. coli (14)/S. enterica a     0.872            0.023            0.027
  S. enterica ssp. I             0.058 ± 0.016    0.005 ± 0.003    0.086
  S. enterica ssp. II            0.103 ± 0.010    0.011 ± 0.003    0.106
  S. enterica ssp. I/II b        0.159            0.011            0.065

16-900 bp:
  S. enterica                    0.140 ± 0.059    0.006 ± 0.003    0.042
  E. coli (16)                   0.304 ± 0.249    0.008 ± 0.007    0.026
  E. coli (14)                   0.168 ± 0.004    0.004 ± 0.002    0.023

901-1344 bp:
  S. enterica                    0.182 ± 0.077    0.021 ± 0.011    0.113
  E. coli (16)                   0.341 ± 0.333    0.018 ± 0.015    0.052
  E. coli (14)                   0.156 ± 0.054    0.009 ± 0.004    0.057

a Comparison between E. coli (14 strains) and S. enterica.
b Comparison between subspecies I and II of S. enterica.

in the E. coli (16) set recombination has introduced a slightly higher level of amino acid substitutions. The KA values in all three groups suggest that the region at about 735-880 bp is highly conserved, and this is the region where a putative substrate binding sequence is located (Reizer et al. 1991). The region from 900 to 1200 bp has the highest amino acid diversity and therefore the lowest level of purifying selection, presumably associated with reduced functional and structural constraints in this region.

The KA values beyond 900 bp in the S. enterica and E. coli (16) sets are similar and twice that in the E. coli (14) set. In the E. coli (16) set, the increase is due to the inclusion of the highly divergent gnd sequences of ECOR 4 and ECOR 16 that we have shown above to result from interspecies recombination. In S. enterica, although 34 sequences from seven different subspecies have been analyzed, direct evidence for interspecies recombination has not been forthcoming. Interspecies recombination in the adjacent rfb locus (Reeves 1993) and cotransfer of the gnd gene with the rfb cluster (Achtman and Pluschke 1986) have been previously shown, and the apparent lack of interspecies recombination in the S. enterica gnd gene was initially puzzling. Further analysis of the gnd gene has shown that the region up to 900 bp could be subjected to efficient dam methylase-directed mismatch repair, which would act like an antirecombinant during the processing of the recombinant intermediates and reduce the possibility of interspecies recombination in the S. enterica gnd gene (G. Thampapillai and P. R. Reeves, unpublished data).

However, in the region from 900 to 1200 bp there are clustered amino acid changes in S. enterica (fig. 4). These changes may imply that lateral transfer of small segments of DNA from other related species does occur in S. enterica but could be limited to a small region. If interspecies recombination has contributed to the increase in KA in this region in S. enterica, as in the E. coli (16) set, then the dissimilarity of Ks values in the two needs to be explained, as Ks for E. coli (16) is twice that for S. enterica. In the E. coli (16) set, while selection pressure has reduced KA in positions up to about 900 bp, Ks appears to be unaffected throughout and has increased the nucleotide diversity in the E. coli (16) set to twice that in the E. coli (14) set. The low Ks values in both regions of the gnd gene in S. enterica therefore imply that, unlike in the E. coli (16) set, synonymous sites in S. enterica are also subjected to negative selection in at least the first 900 bp. The constraint on synonymous sites in the S. enterica gnd gene could be related to the conservation of DNA sequences that are recognized by proteins involved in recombinational repair, including the chi site. The possible reasons for the increase in substitutions at synonymous sites in the E. coli gnd genes will be discussed elsewhere.

Although the synonymous sites in general are not constrained in the E. coli gnd gene, the synonymous sites in nearly the first hundred base pairs are conserved both in E. coli and S. enterica (fig. 9b), and a reduction in Ks at the beginning of the gene has also been observed in other enterobacterial genes (Eyre-Walker and Bulmer 1993). Gene expression studies in E. coli K-12 have


FIG. 9. Variation of (above, a) KA and (below, b) Ks along the gnd genes of Escherichia coli and Salmonella enterica. Series plotted: E. coli (16), E. coli (14), and S. enterica. Each point represents the values for a 198-base segment with an interval of 21 bases.

shown the presence of a 16-bp internal negative control element at codons 69-74 (Carter-Muenchau and Wolf 1989), which may also contribute toward additional selection pressure at the synonymous sites in the gnd gene.
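The windowing behind fig. 9 (198-base segments at 21-base intervals) can be sketched as follows. The start and end coordinates (16 and 1344) are taken from the region analyzed in table 5; that the first window begins exactly at position 16 is an assumption. The second function substitutes a simple proportion of differing sites for the KA and Ks values actually plotted, purely to show how a per-window statistic would be attached to each segment. All names are hypothetical.

```python
def sliding_windows(first, last, width=198, step=21):
    """Return inclusive 1-based (start, end) coordinates of windows in [first, last]."""
    windows = []
    start = first
    while start + width - 1 <= last:
        windows.append((start, start + width - 1))
        start += step
    return windows


def window_p_distance(seq_a, seq_b, first, last, width=198, step=21):
    """Proportion of differing sites between two aligned sequences per window.

    Returns (window midpoint, proportion) pairs. This stands in for the
    KA or Ks estimate that would be computed for each window.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) < last:
        raise ValueError("sequences must be aligned and cover the requested region")
    result = []
    for start, end in sliding_windows(first, last, width, step):
        segment_a = seq_a[start - 1:end]
        segment_b = seq_b[start - 1:end]
        diffs = sum(x != y for x, y in zip(segment_a, segment_b))
        result.append(((start + end) / 2, diffs / width))
    return result


# Window coordinates for the region analyzed in table 5 (assumed start at 16).
gnd_windows = sliding_windows(16, 1344)
```

Because consecutive windows overlap by 177 bases, neighboring points in such a plot are strongly correlated, so a local feature of the sequence spreads across several adjacent points.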

We next compare the ratio of KA to Ks for inter- and intraspecies comparisons. The ratio KA/Ks within S. enterica has about twice the value found within E. coli, and the value for the interspecies comparison between E. coli and S. enterica is even lower than that for E. coli. The high level of KA within each species could be due to the clonal nature of bacterial populations. It has been argued elsewhere (Reeves 1992) that these clones can be niche adapted and long-lived and that Ne, the effective population size, is low for such clones. Under these circumstances most sequence polymorphisms in bacteria arise by fixation of new alleles within clones. The low value of Ne in individual clones will increase the proportion of replacement substitutions that are fixed, as with low population size mildly deleterious mutations can behave like neutral alleles (Ohta 1973). There is indeed evidence from the distribution of replacement substitutions in the gnd gene of E. coli that some replacement substitutions are mildly deleterious (Sawyer et al. 1987). However, while low population

sizes of clones will lead to fixation of mildly deleterious alleles and increase the ratio of KA/Ks, this effect would not be expected for fixation of substitutions between species, as that requires that a mildly deleterious form be fixed in all clones, after transfer between clones. Low levels of selective disadvantage, which would not prevent fixation within a clone, would nonetheless prevent it being fixed in all clones, Ne for the species being much larger than for a clone.
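The dependence of this argument on population size can be made concrete with a small numerical sketch. It uses the standard diffusion approximation for the fixation probability of a single new mutant in a haploid population, u = (1 - exp(-2s)) / (1 - exp(-2Ns)), which is not taken from the present paper; the selection coefficient and the two population sizes are illustrative assumptions only.

```python
import math


def fixation_probability(s, n):
    """Diffusion approximation for a new mutant in a haploid population of size n.

    s is the selection coefficient (negative for deleterious mutations).
    For s == 0 the neutral value 1/n is returned.
    """
    if s == 0:
        return 1.0 / n
    # expm1 keeps precision when 2*s is very small.
    return math.expm1(-2.0 * s) / math.expm1(-2.0 * n * s)


def relative_to_neutral(s, n):
    """Fixation probability divided by the neutral expectation 1/n."""
    return fixation_probability(s, n) * n


# Illustrative values only: a mildly deleterious mutation (s = -0.0001)
# in a small clone versus a much larger species-wide population.
s = -1e-4
clone_ratio = relative_to_neutral(s, 1_000)
species_ratio = relative_to_neutral(s, 1_000_000)
```

With these assumed values the mutation fixes in the small population at close to the neutral rate, while in the large population its fixation probability is negligible relative to a neutral allele, which is the contrast between within-clone and species-wide fixation drawn above.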

In the comparison of the ratio of KA to Ks between subspecies, the same trend is observed as between E. coli and S. enterica. The ratios within subspecies I and II are 0.086 and 0.106, respectively, yet the ratio between subspecies I and II of S. enterica is 0.065 (table 5). This implies that the mechanism operating to give a lower interspecies than intraspecies KA/Ks also operates at the subspecies level.
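The within versus between comparison can be checked by recomputing the ratios from the rounded Ks and KA values for the 16-1344 bp region in table 5. Because the inputs are rounded to three decimals, the recomputed ratios may differ slightly in the third decimal from the printed ones; the dictionary and variable names are hypothetical.

```python
# (Ks, KA) for the 16-1344 bp region, as printed in table 5.
TABLE5_FULL = {
    "S. enterica": (0.153, 0.011),
    "E. coli (16)": (0.313, 0.011),
    "E. coli (14)": (0.163, 0.006),
    "E. coli (14)/S. enterica": (0.872, 0.023),
    "S. enterica ssp. I": (0.058, 0.005),
    "S. enterica ssp. II": (0.103, 0.011),
    "S. enterica ssp. I/II": (0.159, 0.011),
}

# Ratio of nonsynonymous to synonymous substitution estimates per comparison.
ratios = {name: ka / ks for name, (ks, ka) in TABLE5_FULL.items()}

# Within-group ratios exceed the corresponding between-group ratio
# at both the species and the subspecies level.
species_trend = (
    min(ratios["S. enterica"], ratios["E. coli (14)"])
    > ratios["E. coli (14)/S. enterica"]
)
subspecies_trend = (
    min(ratios["S. enterica ssp. I"], ratios["S. enterica ssp. II"])
    > ratios["S. enterica ssp. I/II"]
)
```

Both flags evaluate to true for the tabulated values, matching the trend described in the text.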

Conclusions

The phylogenetic tree constructed with the gnd gene sequences of 34 strains of Salmonella enterica agrees closely with the gene trees generated for the gapA and putP genes (Nelson et al. 1991; Nelson and Selander 1992). There has been considerable recombination in the gnd gene, and these events did have an effect on the gene tree. The DNA sequences of some of the recombinants show the involvement of a variant of the chi sequence in stimulating these recombinations. To our knowledge, this is the first time the chi site has been associated with natural recombination events in S. enterica in a manner that matches the chi-dependent recombination events observed under laboratory conditions in S. enterica or Escherichia coli (Smith 1991).

The polymorphism observed in the rfb cluster has major effects on antigenicity. This cluster is also highly mobile, presumably under selection to generate diversity. The presence of two chi sites in the gnd gene of S. enterica (424-451 and 744-751 bp), which is located immediately 3’ to the rfb locus, may reflect occasional cotransfer of parts of the gnd gene with segments downstream of the rfb gene cluster.
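A scan for chi and chi-like sequences of the kind discussed here can be sketched as below. The canonical E. coli chi octamer 5'-GCTGGTGG-3' is used as the reference; treating any octamer with at most one mismatch as a "variant" is an assumption made for illustration, since the specific variant sequences are not given in this section. Only the strand supplied is scanned, and chi activity is orientation dependent, so a real analysis would also examine the complementary strand. The function name and toy sequence are hypothetical.

```python
CHI = "GCTGGTGG"  # canonical E. coli chi octamer, 5' to 3'


def find_chi_like(seq, max_mismatches=1):
    """Return (start, end, octamer, mismatches) for chi-like sites on the given strand.

    Coordinates are 1-based and inclusive.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(CHI) + 1):
        window = seq[i:i + len(CHI)]
        mismatches = sum(x != y for x, y in zip(window, CHI))
        if mismatches <= max_mismatches:
            hits.append((i + 1, i + len(CHI), window, mismatches))
    return hits


# Toy sequence (not gnd data) with one exact chi and one single-mismatch variant.
toy_seq = "AAAGCTGGTGGTTTGCTAGTGGAAA"
toy_hits = find_chi_like(toy_seq)
```

Comparing the coordinates of such hits with inferred recombination junctions is one simple way to ask whether chi-like sequences lie close to the junctions.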

The pattern of recombination seems to be rather different in the gnd genes of E. coli and will be discussed elsewhere. Chi activity is not detectable in the E. coli gnd gene; a similar observation was made by Dykhuizen and Green (1991). A biotype-based subspecies structure as defined for S. enterica is not available for E. coli. This would make detection of whole and partial gene transfers in E. coli genes more difficult than in S. enterica. The transfer of segments observed in the S. enterica gnd gene seems to be preferentially from subspecies I to other subspecies and not vice versa. The presence of subspecies I DNA in subspecies V in gnd and perhaps putP genes is interesting and makes the pair good candidates for


population studies, especially in the areas of selection and origin of subspecies.

Sequence Availability

The sequences have been deposited in GenBank with accession numbers U14336 to U14369.

Acknowledgments

This work was supported by a grant from the Australian Research Council. G.T. was a recipient of an Australian postgraduate research scholarship. We thank those referred to in table 1 for their kind donation of strains listed in that table.

LITERATURE CITED

ACHTMAN, M., and G. PLUSCHKE. 1986. Clonal analysis of descent and virulence among selected Escherichia coli. Annu. Rev. Microbiol. 40:185-210.

ADAMS, M. J., S. GOVER, R. LEABACK, C. PHILLIPS, and D. SOMERS. 1991. The structure of 6-phosphogluconate dehydrogenase refined at 2.5 Å resolution. Acta Cryst. B47:817-820.

ARDESHIR, F., C. F. HIGGINS, and G. F.-L. AMES. 1981. Physical map of the histidine transport operon: correlation with the genetic map. J. Bacteriol. 147:401-409.

BACHMANN, B. J. 1990. Linkage map of Escherichia coli K-12. Microbiol. Rev. 54:130-197.

BASTIN, D. A., P. K. BROWN, A. HAASE, G. STEVENSON, and P. R. REEVES. 1993. Repeat unit polysaccharides of bacteria: a model for polymerisation resembling that of ribosomes and fatty acid synthetase, with a novel mechanism for determining chain length. Mol. Microbiol. 7:725-734.

BATCHELOR, R. A., G. E. HARAGUCHI, R. A. HULL, and S. I. HULL. 1991. Regulation by a novel protein of the bimodal distribution of lipopolysaccharide in the outer membrane of Escherichia coli. J. Bacteriol. 173:5699-5704.

BELTRAN, P., S. A. PLOCK, N. H. SMITH, T. S. WHITTAM, D. C. OLD, and R. K. SELANDER. 1991. Reference collection of strains of the Salmonella typhimurium complex from natural populations. J. Gen. Microbiol. 137:601-606.

BISERCIC, M., J. Y. FEUTRIER, and P. R. REEVES. 1991. Nucleotide sequence of the gnd gene from nine natural isolates of Escherichia coli: evidence of intragenic recombination as a contributing factor in the evolution of the polymorphic gnd locus. J. Bacteriol. 173:3894-3900.

BURLAND, V., G. PLUNKETT III, D. DANIELS, and F. BLATTNER. 1993. DNA sequence and analysis of 136 kilobases of the Escherichia coli genome: organisational symmetry around the origin of replication. Genomics 16:551-561.

CARTER-MUENCHAU, P., and J. R. E. WOLF. 1989. Growth rate dependent regulation of 6-phosphogluconate dehydrogenase level mediated by an anti-Shine-Dalgarno sequence located within the Escherichia coli gnd structural gene. Proc. Natl. Acad. Sci. USA 86:1138-1142.

CROSA, J. H., D. J. BRENNER, W. H. EWING, and S. FALKOW. 1973. Molecular relationships among the Salmonellae. J. Bacteriol. 115:307-315.

DIXON, D. A., and S. C. KOWALCZYKOWSKI. 1993. The recombination hotspot χ is a regulatory sequence that acts by attenuating the nuclease activity of the E. coli RecBCD enzyme. Cell 73:87-96.

DUBOSE, R. F., D. E. DYKHUIZEN, and D. L. HARTL. 1988. Genetic exchange among natural isolates of bacteria: recombination within the phoA gene of E. coli. Proc. Natl. Acad. Sci. USA 85:7036-7040.

DYKHUIZEN, D. E., and L. GREEN. 1991. Recombination in Escherichia coli and the definition of biological species. J. Bacteriol. 173:7257-7268.

EWING, W. H. 1986. Edwards and Ewing’s identification of the Enterobacteriaceae. Elsevier Science, Amsterdam.

EYRE-WALKER, A., and M. BULMER. 1993. Reduced synonymous substitution rate at the start of enterobacterial genes. Nucleic Acids Res. 19:4599-4603.

GANESAN, S., and G. R. SMITH. 1993. Strand-specific binding to duplex DNA ends by the subunits of the Escherichia coli RecBCD Enzyme. J. Mol. Biol. 229:67-78.

GYLLENSTEN, U. B., M. SUNDVALL, and H. A. ERLICH. 1991. Allelic diversity is generated by intraexon sequence exchange at the DRB1 locus of primates. Proc. Natl. Acad. Sci. USA 88:3686-3690.

HALL, B. G., and P. M. SHARP. 1992. Molecular population genetics of Escherichia coli: DNA sequence diversity at the celC, crr and gutB loci of natural isolates. Mol. Biol. Evol. 9:654-665.

HERZER, P. J., S. INOUYE, M. INOUYE, and T. S. WHITTAM. 1990. Phylogenetic distribution of branch RNA-linked multicopy single-stranded DNA among natural isolates of Escherichia coli. J. Bacteriol. 172:6175-6181.

JANN, K., and B. JANN. 1984. Structure and biosynthesis of O-antigen. Pp. 138-186 in E. T. RIETSCHEL, ed. Handbook of endotoxin. Elsevier Scientific, Amsterdam.

KROWCZYNSKA, A. M., R. A. RUDDERS, and T. G. KRONTRIS. 1990. The human minisatellite consensus at breakpoints of oncogene translocations. Nucleic Acids Res. 18:1121-1126.

LAWRENCE, J. G., H. OCHMAN, and D. L. HARTL. 1991. Molecular and evolutionary relationships among enteric bacteria. J. Gen. Microbiol. 137:1911-1921.

LE MINOR, L., M. Y. POPOFF, B. LAURENT, and D. HERMANT. 1986. Individualisation d’une septieme sous-espece de Salmonella: S. choleraesuis subsp. indica subsp. Nov. Ann. Inst. Pasteur/Microbiol. 137B:211-217.

LI, W.-H. 1993. Unbiased estimation of the rates of synonymous and nonsynonymous substitution. J. Mol. Evol. 36:96-99.

LI, W.-H., C. WU, and C.-C. LUO. 1985. A new method for estimating synonymous and nonsynonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon changes. Mol. Biol. Evol. 2:150-174.

MÄKELÄ, P. H., and B. A. D. STOCKER. 1984. Genetics of lipopolysaccharide. Pp. 59-137 in E. T. RIETSCHEL, ed. Handbook of endotoxin. Elsevier Science, Amsterdam.

NASOFF, M. S., H. V. BAKER II, and R. E. WOLF, Jr. 1984. DNA sequence of the Escherichia coli gene, gnd, for the 6-phosphogluconate dehydrogenase. Gene 27:253-264.


NEI, M. 1987. Molecular evolutionary genetics. Columbia University Press, New York.

NEI, M., and J. C. MILLER. 1990. A simple method for estimating average number of nucleotide substitutions within and between populations from restriction data. Genetics 125:873-879.

NELSON, K., and R. K. SELANDER. 1992. Evolutionary genetics of the proline permease gene (putP) and the control region of the proline utilization operon in populations of Salmonella and Escherichia coli. J. Bacteriol. 174:6886-6895.

NELSON, K., T. S. WHITTAM, and R. K. SELANDER. 1991. Nucleotide polymorphism and evolution in the glyceraldehyde-3-phosphate dehydrogenase gene (gapA) in natural populations of Salmonella and Escherichia coli. Proc. Natl. Acad. Sci. USA 88:6667-6671.

OHTA, T. 1973. Slightly deleterious mutant substitutions in evolution. Nature 246:96-98.

PONTICELLI, A. S., D. W. SCHULTZ, A. F. TAYLOR, and G. R. SMITH. 1985. Chi-dependent DNA strand cleavage by RecBC enzyme. Cell 41:145-151.

REEVES, P. R. 1992. Variation in O antigens, niche specific selection and bacterial populations. FEMS Microbiol. Lett. 100:509-516.

REEVES, P. R. 1993. Evolution of Salmonella O antigen variation by interspecific gene transfer on a large scale. Trends Genet. 9:17-22.

REEVES, M. W., G. M. EVINS, A. A. HEIBA, B. D. PLIKAYTIS, and J. J. FARMER III. 1989. Clonal nature of Salmonella typhi and its genetic relatedness to other Salmonellae as shown by multilocus enzyme electrophoresis, and proposal of Salmonella bongori. J. Clin. Microbiol. 27:313-320.

REEVES, P. R., L. FARNELL, and R. LAN. 1994. MULTICOMP: a program for multiple sequence comparison and phylogenetic analysis. CABIOS 10:281-284.

REEVES, P. R., and G. STEVENSON. 1989. Cloning and nucleotide sequence of the Salmonella typhimurium LT2 gnd gene and its homology with the corresponding sequence of Escherichia coli K12. Mol. Gen. Genet. 217:182-184.

REISNER, A. H., C. A. BUCHOLTZ, J. SMELT, and S. MCNEIL. 1993. Australia’s National Genomic Information System. Proc. Twenty-Sixth Ann. Hawaii Int. Conf. Systems Sci. 1:595-602.

REIZER, A., J. DEUTSCHER, M. H. SAIER, and J. REIZER. 1991. Analysis of the gluconate (gnt) operon of Bacillus subtilis. Mol. Microbiol. 5:1081-1089.

RZHETSKY, A., and M. NEI. 1992. A simple method for estimating and testing minimum evolution trees. Mol. Biol. Evol. 9:945-967.

SAIKI, R. K., D. H. GELFAND, S. STOFELL, S. J. SCHARF, R. HIGUCHI, G. T. HORN, K. B. MULLIS, and H. A. ERLICH. 1988. Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science 239:487-491.

SAITOU, N., and M. NEI. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425.

SANDERSON, K. E., and J. R. ROTH. 1988. Linkage map of Salmonella typhimurium, edition VII. Microbiol. Rev. 52:485-532.

SAWYER, S. 1989. Statistical tests for detecting gene conversion. Mol. Biol. Evol. 6:526-538.

SAWYER, S. A., D. E. DYKHUIZEN, and D. L. HARTL. 1987. Confidence interval for the number of selectively neutral aminoacid polymorphisms. Proc. Natl. Acad. Sci. USA 84:6225-6228.

SELANDER, R. K., P. BELTRAN, and N. H. SMITH. 1991. Evolutionary Genetics of Salmonella. Pp. 25-27 in R. K. SELANDER, A. G. CLARK, and T. S. WHITTAM, eds. Evolution at the molecular level. Sinauer, Sunderland, Mass.

SMITH, G. 1987. Mechanism and control of homologous recombination in Escherichia coli. Annu. Rev. Genet. 21:179-201.

SMITH, G. 1991. Conjugational recombination in E. coli: myths and mechanisms. Cell 64:19-27.

SMITH, G. R., S. K. AMUNDSEN, A. M. CHAUDHURY, K. C. CHENG, A. S. PONTICELLI, C. M. ROBERTS, D. W. SCHULTZ, and A. F. TAYLOR. 1984. Roles of RecBC enzyme and chi sites in homologous recombination. Cold Spring Harb. Symp. Quant. Biol. 49:485-495.

SMITH, G. R., C. M. ROBERTS, and D. W. SCHULTZ. 1986. Activity of chi recombinational hot spots in Salmonella typhimurium. Genetics 112:429-439.

STAHL, F. W. 1979. Special sites in generalised recombination. Annu. Rev. Genet. 13:7-24.

WEST, S. C. 1992. Enzymes and molecular mechanisms of genetic recombination. Annu. Rev. Biochem. 61:603-640.

XIANG, S.-H., and P. R. REEVES. 1993. Molecular cloning and expression of the rfb gene cluster of a Salmonella enterica D2 strain (ser. strasbourg) and comparison with group B, D1 and E1. J. Bacteriol. 176:4357-4365.

JULIAN P. ADAMS, reviewing editor

Received May 31, 1994

Accepted June 16, 1994
