Strain-Level Typing and Identification of Bacteria Using Mass Spectrometry-Based Proteomics


Roger Karlsson,*,†,‡ Max Davidson,† Liselott Svensson-Stadler,§ Anders Karlsson,† Kenneth Olesen,† Elisabet Carlsohn,⊥ and Edward R. B. Moore§

†Nanoxis AB, Lennart Torstenssonsgatan 5, SE-40016, Gothenburg, Sweden
‡Department of Chemistry, University of Gothenburg, Kemivägen 10, SE-41296 Gothenburg, Sweden
§Department of Infectious Disease, Culture Collection University of Gothenburg (CCUG), Sahlgrenska Academy of the University of Gothenburg, Guldhedsgatan 10A, SE-41346, Gothenburg, Sweden
⊥Proteomics Core Facility, Sahlgrenska Academy of the University of Gothenburg, Medicinaregatan 7A, SE-41390, Gothenburg, Sweden

Supporting Information

ABSTRACT: Because of the alarming expansion in the diversity and occurrence of bacteria displaying virulence and resistance to antimicrobial agents, it is increasingly important to be able to detect these microorganisms and to differentiate and identify closely related species, as well as different strains of a given species. In this study, a mass spectrometry proteomics approach is applied, exploiting lipid-based protein immobilization (LPI), wherein intact bacterial cells are bound, via membrane-gold interactions, within a FlowCell. The bound cells are subjected to enzymatic digestion for the generation of peptides, which are subsequently identified using LC−MS. Following database matching, strain-specific peptides are used for subspecies-level discrimination. The method is shown to enable reliable typing and identification of closely related strains of the same bacterial species, herein illustrated for Helicobacter pylori.

KEYWORDS: LPI-FlowCell, mass spectrometry, proteomics, bacterial typing, bacterial identification

■ INTRODUCTION

Methods for reliable, rapid, and cost-effective characterization, differentiation (i.e., typing), and identification of microorganisms are essential for many clinical, biotechnological, agricultural, and basic research applications. Phenotyping, based upon cell morphological and metabolic traits, has been the traditional approach employed to identify bacterial species. However, advances in molecular biology techniques have enabled bacterial typing and identification based upon genotypic characteristics,1,2 which are generally more reproducible, robust, and reliable than phenotypic characteristics.2 DNA sequencing-based approaches, including 16S rRNA gene sequencing analysis, multilocus sequence analysis (MLSA), and multilocus sequence typing (MLST), which use targeted gene “biomarkers”, offer successively higher levels of resolution,3−5 but still reflect limited views of the total extent of heterogeneity in microbial genomes. Most methods of analyzing bacteria have focused on identification at the species level. However, microbial virulence and antibiotic resistance, as well as attributes exploitable for biotechnological applications, are most often features of individual strains rather than pan-species properties. In settings wherein the choices available for antimicrobial treatment of infectious disease are decreasing, the detection, typing, and identification of potential pathogens at the subspecies level are imperative. The term “identification” entails the typing of bacteria, as well as the positive match to a reference; in this study, the focus was to perform subspecies-level identification. The term “typing” here refers to the characterization and differentiation of bacteria; in this study, the focus was to perform subspecies-level (i.e., strain) typing.

The growing amount of genomic information, as well as breakthroughs in mass spectrometry-based proteomic analyses, including the development of “soft” ionization techniques, such as matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF)6 and electrospray ionization (ESI),7 which enable ionization of proteins and peptides without destructive fragmentation, provide alternatives for analyzing bacteria using mass spectrometry (MS). In conjunction with separation techniques, such as HPLC, so-called “shot-gun” proteomics methods have been developed, in which significant parts of a proteome are analyzed in a single run.8,9 As the proteome may be viewed as the translation and expression of the genome, MS-based proteomic typing of bacteria enables the elucidation of subtle differences between closely related strains by looking at a “snapshot” of a significant part of the entire proteome in relation to the genome.

Received: October 25, 2011
Published: March 27, 2012

Article

pubs.acs.org/jpr

© 2012 American Chemical Society 2710 dx.doi.org/10.1021/pr2010633 | J. Proteome Res. 2012, 11, 2710−2720

Here, the LPI-FlowCell (Nanoxis AB, Gothenburg, www.nanoxis.com) is used for the immobilization of intact bacterial cells, with subsequent enzymatic generation of peptides and MS-based proteomic analyses of strain-specific peptides that can be used for subspecies-level typing and identification of bacteria.

■ MATERIALS AND METHODS

Bacterial Strains and Cultivation

Lyophilized biomass of Helicobacter pylori strains J99 = CCUG 47164 (ref 10) and ATCC 26695 = CCUG 41936 (ref 11), and of the type strain of the species, CCUG 17874T, was cultured on freshly prepared blood agar medium (5% horse blood; Substrate Dept., Bacteriology Laboratory, Sahlgrenska University Hospital) in a microaerophilic atmosphere (80% N2, 10% CO2, 10% H2) for four days, which are standard conditions for the cultivation of H. pylori.12 The strains were recultured for 3−4 days before the colony biomass was scraped from the agar medium, suspended in 500 μL of phosphate-buffered saline (PBS: 0.36% NaH2PO4·2H2O, 1.37% Na2HPO4·2H2O, 8.5% NaCl, w/v, pH 7.2−7.4), and weighed.

Trypsin Digestion of Bacteria in LPI HexaLane FlowCell Channels

The bacterial biomass was washed three times with PBS (centrifugation for 8 min at 4000g, followed by resuspension in PBS) before the bacteria were finally resuspended in 1 mL of PBS. The bacterial suspension was kept on ice until loading into the LPI Hexalane FlowCell (Nanoxis AB, www.nanoxis.com; Patent Application No. WO2006068619) (Figure 1). The bacterial cell concentration was 0.7−1.9 × 10⁹ CFU/mL; OD600 was approximately 0.5 (NanoDrop-1000 Spectrophotometer, Thermo Fisher Scientific). An excess of bacteria was applied to the flow cell by adding 100 μL of the washed bacterial suspension to fill the FlowCell channel (with a volume of ∼30 μL). Bacterial suspension was removed from the inlet and outlet ports. The immobilized bacteria were incubated for 2 h at room temperature to allow cell attachment, and the FlowCell channels were subsequently washed with 1.0 mL of Ambic (NH4HCO3, 20 mM, pH 8), using a syringe pump with a flow rate of 100 μL/min. Enzymatic digestions of the bacteria were performed by injecting 100 μL of trypsin (20 μg/mL) into the FlowCell channels and incubating for 30 min at room temperature. The generated peptides were eluted by injecting Ambic into the FlowCell channels at a flow rate of 100 μL/min and collected at the outlet ports, using a pipet (Figure 1). The peptide solutions were frozen at −20 °C until analysis by MS.
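As a rough check on the quantities involved, the short Python sketch below works through the arithmetic implied by the protocol values stated above (0.7−1.9 × 10⁹ CFU/mL, 100 μL applied, a ∼30 μL channel, a 1.0 mL wash at 100 μL/min, and 100 μL of 20 μg/mL trypsin). The helper names are hypothetical, and the in-channel figure is only an upper bound that assumes the channel is filled with unwashed suspension; the fraction of cells that actually attach is not stated and is not modeled.

```python
# Hypothetical helpers for back-of-the-envelope FlowCell arithmetic.
UL_PER_ML = 1000.0


def cfu_in_volume(volume_ul: float, conc_cfu_per_ml: float) -> float:
    """Colony-forming units contained in `volume_ul` of a suspension."""
    return conc_cfu_per_ml * volume_ul / UL_PER_ML


def pump_minutes(volume_ul: float, flow_ul_per_min: float) -> float:
    """Time needed to push `volume_ul` through the channel at a fixed flow rate."""
    return volume_ul / flow_ul_per_min


def enzyme_ug(conc_ug_per_ml: float, volume_ul: float) -> float:
    """Mass of enzyme delivered in one injection."""
    return conc_ug_per_ml * volume_ul / UL_PER_ML


if __name__ == "__main__":
    for conc in (0.7e9, 1.9e9):  # stated range of cell concentrations, CFU/mL
        print(f"{conc:.1e} CFU/mL: {cfu_in_volume(100, conc):.1e} CFU applied, "
              f"at most {cfu_in_volume(30, conc):.1e} CFU in a ~30 uL channel")
    print(f"Ambic wash: {pump_minutes(1000, 100):.0f} min")        # 1.0 mL at 100 uL/min
    print(f"Trypsin per injection: {enzyme_ug(20, 100):.1f} ug")  # 100 uL of 20 ug/mL
```

With these inputs, the sketch gives about 7 × 10⁷ to 1.9 × 10⁸ CFU applied per channel (at most 2.1 × 10⁷ to 5.7 × 10⁷ CFU within the channel volume), a 10 min wash, and 2 μg of trypsin per injection.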

LC−MS/MS Analysis on LTQ-Orbitrap

The peptide fractions collected from the LPI FlowCell were analyzed separately by liquid chromatography tandem mass spectrometry (LC−MS/MS) at the Proteomics Core Facility at the University of Gothenburg (http://www.proteomics.cf.gu.se). Prior to analysis, the samples were centrifuged, in vacuum, to dryness. As a standard procedure, the dried samples were reconstituted in 20 μL of formic acid (0.1%, w/v) and desalted using C18 ZipTip pipet tips (Millipore). The desalted samples were centrifuged at 13000g for 15 min, and 17 μL of the supernatant was transferred to an HTC-PAL autosampler (CTC Analytics AG, Zwingen, Switzerland) connected to an Agilent 1200 binary pump (Agilent Technologies). Briefly, samples (2.0 μL) were injected by the autosampler and analyzed on an LTQ-Orbitrap XL (Thermo Fisher Scientific) interfaced with an in-house-constructed nanoLC system, described previously.13 Peptides were trapped on a precolumn (45 × 0.075 mm i.d.) and separated on a reversed-phase column (200 × 0.050 mm). Both columns were packed in-house with 3 μm Reprosil-Pur C18-AQ particles. The flow through the analytical column was reduced, by a split, to approximately 100 nL/min, with the gradient: 0−5 min, 0.1% formic acid; 6−50 min, 5−50% ACN, 0.1% formic acid; 51−55 min, 80% ACN, 0.1% formic acid. The LTQ-Orbitrap settings were as follows: spray voltage 1.4 kV; 1 microscan for MS1 scans at 60 000 resolution (m/z 400); full MS mass range m/z 400−2000. The LTQ-Orbitrap was operated in a data-dependent mode, i.e., one MS1 FTMS scan of precursor ions, followed by CID (collision-induced dissociation) fragmentation of the six most intense doubly or triply protonated ions from each FTMS scan. The MS2 settings were as follows: 1 microscan for CID-MS2 with a collision energy of 30%.

Journal of Proteome Research Article

Figure 1. (Top) Schematic diagram of the assembly of the LPI Hexalane FlowCell. Two plastic substrates are bonded together with a thin tape spacer that sets the channel height (50 μm). The tape spacer also sets the boundary of the six channels. (Bottom) Schematic diagram of the generation of tryptic peptides in the FlowCell. (A) A suspension of bacteria is injected into the FlowCell. (B) The bacteria attach to the surfaces of the FlowCell channel. (C) A solution of trypsin is injected into the FlowCell and the proteins, including the exposed parts of membrane proteins, are digested to generate peptide fragments. (D) Buffer is injected into the channel, whereby the generated peptides are eluted and collected in the outlet port of the channel.
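The data-dependent acquisition described in the LC−MS/MS procedure can be pictured with a minimal Python sketch: from the precursor peaks of one MS1 (FTMS) scan, keep doubly and triply protonated ions within the full-scan range of m/z 400−2000 and pass the six most intense to CID. The `Precursor` record and the function names are hypothetical, and instrument behavior not described in the text (for example, dynamic exclusion or isolation-window settings) is deliberately left out. The helper `mz_for` uses the standard relation m/z = (M + z·1.007276)/z for a peptide of neutral monoisotopic mass M carrying z protons.

```python
from dataclasses import dataclass

PROTON_MASS = 1.007276  # Da


@dataclass(frozen=True)
class Precursor:
    """Hypothetical record for one precursor peak detected in an MS1 (FTMS) scan."""
    mz: float
    charge: int
    intensity: float


def mz_for(neutral_mass: float, charge: int) -> float:
    """m/z of a peptide with neutral monoisotopic mass M carrying `charge` protons."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def select_for_cid(scan, top_n=6, charges=(2, 3), mz_min=400.0, mz_max=2000.0):
    """Return up to `top_n` eligible precursors, most intense first (simplified top-N rule)."""
    eligible = [p for p in scan
                if p.charge in charges and mz_min <= p.mz <= mz_max]
    return sorted(eligible, key=lambda p: p.intensity, reverse=True)[:top_n]


if __name__ == "__main__":
    scan = [
        Precursor(mz_for(1500.0, 2), 2, 4.0e6),  # doubly protonated, m/z ~751.01
        Precursor(652.3, 1, 9.0e6),              # singly charged: not selected
        Precursor(2150.0, 2, 8.0e6),             # outside m/z 400-2000: not selected
    ] + [Precursor(450.0 + 50.0 * i, 3, 1.0e5 * (i + 1)) for i in range(8)]
    for p in select_for_cid(scan):
        print(f"m/z {p.mz:.4f}  z={p.charge}  intensity {p.intensity:.1e}")
```

Filtering before ranking means that an intense but singly charged or out-of-range ion never displaces an eligible precursor from the six CID slots.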

Peptide Matching

MS/MS data were analyzed using Mascot (Matrix Science, London, U.K.) and X! Tandem software, version 2007.01.01.1 (The GPM, www.thegpm.org). Mascot was configured to search against the NCBI nonredundant database, release 2011-04-01 (selected for Bacteria, 19 712 280 entries), with trypsin selected as the digestion enzyme. Mascot files were imported into the MS analysis program Scaffold (version Scaffold_3_00_07, Proteome Software Inc., Portland, OR) and run through X! Tandem, which was configured to search for peptide matches against the NCBI nonredundant database, release 2011-04-13 (selected for Bacteria, 20 019 713 entries), again with trypsin selected as the digestion enzyme. Mascot and X! Tandem searches used a fragment ion mass tolerance of 0.50 Da and a parent ion tolerance of 5.0 ppm. Oxidation of methionine was specified in Mascot and X! Tandem as a variable modification. A single missed cleavage was allowed. Mascot files were merged in Scaffold in the MudPIT mode.
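To make the search parameters concrete, the Python sketch below enumerates tryptic peptides allowing at most one missed cleavage and checks values against a 5.0 ppm parent-ion tolerance and a 0.50 Da fragment tolerance. It assumes the commonly used cleavage rule (after K or R, but not before P); it illustrates the idea only, is not the exact enzyme definition or scoring used by Mascot or X! Tandem, and ignores modifications such as methionine oxidation. All function names are hypothetical.

```python
def cleavage_sites(seq: str) -> list[int]:
    """Positions after which trypsin cuts: after K or R, unless the next residue is P."""
    return [i + 1 for i in range(len(seq) - 1)
            if seq[i] in "KR" and seq[i + 1] != "P"]


def tryptic_peptides(seq: str, missed_cleavages: int = 1) -> list[str]:
    """All peptides spanning 1 .. missed_cleavages + 1 consecutive cleavage fragments."""
    bounds = [0] + cleavage_sites(seq) + [len(seq)]
    peptides = []
    for start in range(len(bounds) - 1):
        for end in range(start + 1, min(start + missed_cleavages + 2, len(bounds))):
            peptides.append(seq[bounds[start]:bounds[end]])
    return peptides


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_parent_tolerance(observed_mz: float, theoretical_mz: float,
                            tol_ppm: float = 5.0) -> bool:
    """Parent (precursor) ion check: relative tolerance in ppm."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def within_fragment_tolerance(observed_mz: float, theoretical_mz: float,
                              tol_da: float = 0.50) -> bool:
    """Fragment ion check: absolute tolerance in Da."""
    return abs(observed_mz - theoretical_mz) <= tol_da
```

For example, `tryptic_peptides("AAAKGGGRPLLLKEEE")` yields AAAK, GGGRPLLLK, and EEE plus the two one-missed-cleavage peptides AAAKGGGRPLLLK and GGGRPLLLKEEE; the R−P bond is not treated as a cleavage site. The contrast between the relative (ppm) parent tolerance and the absolute (Da) fragment tolerance reflects the high-resolution FTMS precursor scans versus the lower-resolution CID fragment scans.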

Criteria for Protein Identification

The Scaffold software was used to validate MS/MS-based peptide matches and protein identifications. Peptide matches were accepted if they could be established at greater than 95.0% probability (false discovery rates were determined to range between 1.6 and 3.2% for all injections), as specified by the Peptide Prophet algorithm.14 Identifications of proteins were accepted if they had greater than 99.0% probability and were supported by at least two identified peptide matches. Protein probabilities were assigned by the Protein Prophet algorithm.15 Proteins that contained similar peptides and could not be differentiated on the basis of MS/MS analysis alone were grouped together, to satisfy the principles of parsimony.
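The acceptance rules above amount to two thresholds applied in sequence. A minimal Python sketch follows, assuming that peptide and protein probabilities have already been computed (e.g., by PeptideProphet and ProteinProphet) and that "two identified peptide matches" means two distinct peptide sequences. The data layout and names are hypothetical and do not reflect Scaffold's internal implementation; protein grouping by parsimony is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class ProteinHit:
    """Hypothetical container: one protein (group) with its peptide evidence."""
    accession: str
    probability: float  # protein probability, e.g. as reported by ProteinProphet
    peptides: dict[str, float] = field(default_factory=dict)  # sequence -> peptide probability


def confident_peptides(hit: ProteinHit, min_peptide_prob: float = 0.95) -> set[str]:
    """Distinct peptide sequences whose probability exceeds the peptide threshold."""
    return {seq for seq, prob in hit.peptides.items() if prob > min_peptide_prob}


def accept_protein(hit: ProteinHit, min_protein_prob: float = 0.99,
                   min_peptides: int = 2, min_peptide_prob: float = 0.95) -> bool:
    """Accept if protein probability > 99% and at least two confident peptides support it."""
    return (hit.probability > min_protein_prob
            and len(confident_peptides(hit, min_peptide_prob)) >= min_peptides)
```

Requiring two independent peptides guards against a protein being reported on the strength of a single, possibly spurious, spectrum match.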

Peptide Alignment to Helicobacter Protein Sequences

Helicobacter RefSeq FASTA files were obtained from the Pathosystems Resource Integration Center (PATRIC) FTP server (2011-05-03). Protein sequence redundancy within each FASTA file was removed using CD-HIT16 with the minimum identity cutoff set at 100%. The nonredundant FASTA files were collected into a master file containing 65 888 sequences (18 875 505 letters) from 27 strains of H. pylori and 11 other species of Helicobacter. Peptide sets were aligned to the master file using PepAligner Version 1.1.4034. Only matches with 100% peptide identity coverage were accepted. The number of peptides matched to each strain was counted and used to display the results graphically.
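The counting step can be sketched in a few lines of Python. The sketch assumes that "100% peptide identity coverage" means the full peptide occurs as an exact substring of a protein sequence, and that each peptide is counted once per strain; it is not the PepAligner algorithm, and its set-based deduplication only removes exact duplicate sequences rather than reproducing CD-HIT clustering. Function and variable names are hypothetical.

```python
from collections import Counter


def peptide_matches_per_strain(peptides, proteomes):
    """Rank strains by how many distinct peptides occur exactly in their proteins.

    `proteomes` maps a strain name to an iterable of protein sequences; this
    in-memory layout is a hypothetical stand-in for the FASTA master file.
    """
    counts = Counter({strain: 0 for strain in proteomes})
    unique_peptides = set(peptides)
    for strain, proteins in proteomes.items():
        # Drops exact duplicates only: a simplified stand-in for CD-HIT at 100% identity.
        nonredundant = set(proteins)
        for pep in unique_peptides:
            # Exact substring search (100% identity); a peptide counts at most once per strain.
            # I and L are isobaric in MS but are treated as distinct residues here.
            if any(pep in prot for prot in nonredundant):
                counts[strain] += 1
    return counts.most_common()  # [(strain, n_matched_peptides), ...], highest first
```

The ranked list it returns is the same kind of per-strain peptide count used to rank strains in Figure 2.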

Membrane Association Analysis

Gene attributes for H. pylori strains J99 and 26695 were obtained from the Comprehensive Microbial Resource (CMR) at the J. Craig Venter Institute, the Gene Ontology (GO) Consortium at the European Bioinformatics Institute (EMBL-EBI), and UniProt. A gene/gene product was identified in the results as being membrane-associated if at least one of the following attributes was observed: (1) the cellular role category was designated as “cell-envelope” (CMR); (2) the subcellular location was a membrane (UniProt); (3) the gene ontology identifiers for the gene product matched a list of identifiers covering trans-membrane, lipid-attached, and peripheral membrane proteins (GO, EMBL-EBI).

Multilocus Sequence Typing (MLST)

The nucleotide sequences of seven partial genes used in the H. pylori MLST protocol (http://pubmlst.org/helicobacter/)17 were retrieved from the respective genome sequences of the H. pylori J99 and 26695 strains, eight other H. pylori strains (selected to represent a wide range of “peptide matches per strain”), and two strains of other Helicobacter spp. The nucleotide sequences for the type strain of H. pylori (CCUG 17874T) were also retrieved from the H. pylori MLST database. The sequences of strains J99 and ATCC 26695 in the MLST database were verified, by comparison with the sequences of the respective published genomes, and found to be identical.

Whole Genome Comparisons

The average nucleotide identity (ANI) and tetranucleotide signature frequency correlation coefficient (TETRA) values were calculated between each of H. pylori strains J99 and 26695 and the available genome sequences of Helicobacter spp. strains, using the program JSpecies.18
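The TETRA statistic can be illustrated with a compact Python sketch of the z-score approach commonly described for it: observed tetranucleotide counts are compared with expectations derived from tri- and dinucleotide counts, and the Pearson correlation of the resulting 256 z-scores between two genomes is reported. This is a simplified sketch, not JSpecies: it counts one strand only (published implementations may also include the reverse complement), assigns a z-score of zero to words whose variance is zero, and does not compute ANI, which requires genome fragmentation and BLAST or MUMmer alignments. Names are hypothetical.

```python
import itertools
import math
from collections import Counter

BASES = "ACGT"


def kmer_counts(seq: str, k: int) -> Counter:
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def tetra_zscores(seq: str) -> list[float]:
    """Z-scores for all 256 tetranucleotides (one strand, simplified)."""
    n2, n3, n4 = kmer_counts(seq, 2), kmer_counts(seq, 3), kmer_counts(seq, 4)
    scores = []
    for word in map("".join, itertools.product(BASES, repeat=4)):
        n_mid = n2[word[1:3]]    # N(n2 n3)
        n_left = n3[word[:3]]    # N(n1 n2 n3)
        n_right = n3[word[1:]]   # N(n2 n3 n4)
        if n_mid == 0:
            scores.append(0.0)
            continue
        expected = n_left * n_right / n_mid
        variance = expected * (n_mid - n_left) * (n_mid - n_right) / n_mid ** 2
        scores.append((n4[word] - expected) / math.sqrt(variance) if variance > 0 else 0.0)
    return scores


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def tetra(genome_a: str, genome_b: str) -> float:
    """Correlation of tetranucleotide z-scores between two sequences."""
    return pearson(tetra_zscores(genome_a), tetra_zscores(genome_b))
```

Values close to 1 indicate very similar tetranucleotide usage; for the genomes compared in this study, TETRA values of approximately 0.99 among H. pylori strains and 0.98 for H. acinonychis are reported in the Results.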

Replicate Analyses

Samples from biological replicates of each strain, H. pylori J99 and 26695, were produced as described above (i.e., the same culture conditions, sample handling, and FlowCell processing). For each biological replicate sample, two Hexalane FlowCell channels were used for immobilizing the bacteria and generating peptides, by trypsin digestion, for the subsequent proteomic identifications. Each peptide sample was injected twice into the LC−MS/MS instrument, resulting in a total of four MS files for each biological replicate of the two strains. Following MS analyses, all strain-specific peptides identified from each of the double injections, originating from separate FlowCell channels, were compiled into a single list for each strain.

Bacterial Ratio Samples

Cell suspensions of H. pylori strains J99 and 26695 (OD600 approximately 0.5) were mixed in the following ratios: 9:1, 7:3, 1:1, 3:7, and 1:9 (J99:26695). The mixed cell suspensions were injected into the LPI Hexalane FlowCell channels. Separate channels were used for each cell ratio sample, and two LC−MS/MS runs were performed for each channel. Mascot data from the two MS runs were merged in Scaffold to create a single list of peptides for each cell ratio. Peptides with an identification probability ≥95% were passed to PepAligner for matching against the in-house Helicobacter FASTA database. Peptides matching a protein sequence unique to a single strain were classified as strain-specific.
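The strain-specific classification applied to the ratio samples can be sketched as follows, assuming that "unique to a single strain" means the peptide occurs exactly (as a substring) in the proteins of one strain and of no other strain in the database. The J99 fraction returned by `fraction_of` is a hypothetical summary statistic for comparison against the mixing ratios (e.g., 0.9 for 9:1); it is not a quantity defined in the text. Names and data layout are hypothetical.

```python
from collections import Counter


def strains_matched(peptide: str, proteomes: dict) -> set:
    """Strains whose protein sequences contain the peptide as an exact substring."""
    return {strain for strain, proteins in proteomes.items()
            if any(peptide in prot for prot in proteins)}


def strain_specific_counts(peptides, proteomes: dict) -> Counter:
    """Count distinct peptides that match exactly one strain in the database."""
    counts = Counter()
    for pep in set(peptides):
        hits = strains_matched(pep, proteomes)
        if len(hits) == 1:  # shared peptides carry no strain-level information
            counts[hits.pop()] += 1
    return counts


def fraction_of(counts: Counter, strain_a: str, strain_b: str) -> float:
    """Share of strain_a among the strain-specific peptides of two strains."""
    total = counts[strain_a] + counts[strain_b]
    return counts[strain_a] / total if total else float("nan")
```

Peptides shared by both strains are discarded rather than split between them, so only the strain-specific subset contributes to any ratio estimate.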

■ RESULTS AND DISCUSSION

In order to achieve the highest levels of confidence for typing and identification analyses of bacteria, complete genome sequences would be the ultimate standard for assessing the respective phylogenetic relationships.19 However, sequence determinations and analyses of bacterial genomes are expensive and require expertise that is not available in most laboratories.1 A multitude of alternative techniques have been developed, all having particular strengths and weaknesses in their discriminatory power and their time and cost requirements.1,2 These techniques involve so-called “indirect” or “direct” measures of the genome sequence. The most prominent example of an indirect analysis is pulsed-field gel electrophoresis (PFGE), whereby genomic DNA fragment profiles are generated by enzymatic digestion with subsequent electrophoretic separation.1,2 PFGE is considered to be the “gold standard” of strain typing methods, particularly for epidemiological analyses. Direct measures, in which the DNA sequences of targeted “biomarker” genes are determined and analyzed, are also used frequently for estimating the similarities and relationships between bacteria. Sequence analysis of 16S rRNA genes has become the most prevalent method for elucidating the phylogenetic relationships of prokaryotes, although it is recognized that 16S rRNA gene sequences are limited in their ability to differentiate closely related species.20,21 This recognition has helped spur initiatives that led to the development of MLST3 and MLSA5 protocols, exploiting the comparative sequence analyses of so-called “house-keeping” genes, which enable higher-resolution differentiation. However, it is increasingly imperative that reliable, rapid, and cost-effective methods be established for characterizing and differentiating bacteria (i.e., typing) and identifying bacteria at the subspecies (i.e., strain) level.

Recently, methods based on mass spectrometry and proteomics have been explored for strain-level typing and identification of bacteria. For example, MALDI-TOF MS has been used successfully for bacterial identifications with minimal sample preparation.22−24 Such analyses are typically limited to species-level differentiation, although some studies have attained resolution at subspecies levels.25,26

The closely related individual strains of a species, expressing subtle differences, may be differentiated only by analyzing significant proportions of the proteome. A comprehensive proteomic study typically requires complex or prolonged protein/peptide separation, such as two-dimensional gel electrophoresis or liquid chromatography, prior to MS analysis. Enzymatic digestion of proteins into their peptide fragments is frequently employed. Whole-cell bacterial digests will, however, give rise to potentially overly complex mixtures of peptides. Therefore, methods involving high-performance liquid chromatography electrospray ionization (HPLC-ESI) MS/MS have been developed and applied in order to achieve high-resolution separation and a corresponding increase in the number of peptide/protein identifications.8,9 LC−MS/MS shot-gun proteomics has been applied for bacterial identification.27−32 Using the same principles and a more extensive sample preparation protocol for purifying outer membrane proteins, pathogenic strains of Escherichia coli and Yersinia pestis were differentiated from nonpathogenic strains of the respective species, demonstrating the high discriminatory power of such in-depth analyses.32 In combination with LC−MS/MS-based proteomic typing, so-called “one-pot” methods have been explored to a limited degree, in which all necessary sample preparation and processing steps are performed in the same vessel, thus reducing sample handling steps, time, cost, and possible sample loss.30

In this study, a lipid-based protein immobilization (LPI) technology has been applied for the proteomic typing and identification of strains of Helicobacter pylori. The LPI technology has been used previously, with success, for the identification of outer membrane proteins of Salmonella spp., using outer membrane vesicles generated by osmotic treatment of the cells.33,34 In the work presented here, intact cells of H. pylori were allowed to attach to the surfaces of the LPI-FlowCell, followed by enzymatic digestion for the generation of peptides to be analyzed by MS (Figure 1). The bacterial cell membranes are attracted to the surfaces of the LPI-FlowCell through electrostatic, hydrophobic, and covalent interactions (Patent Application No. WO2006068619; www.nanoxis.com). The immobilization of the bacterial cells is performed by simple incubation of the bacteria within the LPI-FlowCell, followed by a wash step to remove nonattached material from the channels. In this work, an excess of bacteria was used; the minimum number of bacteria needed for MS-based proteomic typing has not been determined. For settings wherein the number of bacterial cells is limited, the experimental procedures and, possibly, the FlowCell design could be optimized. The peptides used for proteomic strain-level typing and identification are generated by injecting a digestive enzyme, in this study trypsin, whereby the surface-exposed proteins of the bacterial cells are digested into peptide fragments (Figure 1). Because of the design of the LPI-FlowCell and the relatively small channel volume (50 μL), the washing, binding, digestion, and elution steps can be performed rapidly. The elution step quickly separates the peptides from the intact cells, which remain attached to the FlowCell surface, so that only minimal sample cleanup is needed before MS analysis.

In the following paragraphs, the identification results are described for H. pylori strains J99 (CCUG 47164) and 26695 (CCUG 41936). Using a 50 min LC gradient, an average of 147 protein matches (599 unique peptides) per MS run (n = 6) were identified from H. pylori J99. On average, 155 protein matches (638 unique peptides) per MS run (n = 6) were identified from H. pylori 26695. The proportion of strain-specific protein matches was on average 18% for individual MS runs of the samples from H. pylori J99 and 21% for those from H. pylori 26695 (Tables 1 and 2). On average, 24 of the 26 strain-specific protein matches per run in the H. pylori J99 sample were assigned to H. pylori J99, and 31 of 32 in the H. pylori 26695 sample were assigned to H. pylori 26695 (Tables 1 and 2). Protein matches that could not be assigned to a specific strain were identified to the genus level, i.e., as Helicobacter, and the species was identified predominantly as H. pylori. In order to test the reproducibility of the method, additional analyses were carried out, using two additional biological replicates for each of the H. pylori J99 and 26695 strains. The level of identification for the additional biological replicates was the same as for the first samples, demonstrating good reproducibility (Tables S1 and S2, Supporting Information (SI)).

Table 1. Protein Data from the Six Individual Injections for the H. pylori J99 Sample

injection   total protein   strain-specific   H. pylori J99   strain-specific hits   correct
number      hits            protein hits      hits            of total (%)           assignments (%)
1           152             26                25              17                     96
2           154             30                28              20                     93
3           152             25                22              16                     88
4           150             33                30              22                     91
5           139             20                17              14                     85
6           134             22                19              16                     86
average     147             26                24              18                     90
merged      233             54                51              23                     94

In the analyses of these H. pylori strains, the amount of data generated from a single MS injection per sample was sufficient to obtain correct identification, although strains that are less differentiable at the proteomic level could prove more difficult to identify correctly if the data are insufficient. Therefore, data from six individual MS runs were merged using Scaffold. Data merging resulted in an increase in the total number of protein matches, and the number of strain-specific protein matches roughly doubled relative to the single-run averages (from 26 to 54 for H. pylori J99 and from 32 to 68 for H. pylori 26695; Tables 1 and 2), while their proportion increased from 18 to 23% for the H. pylori J99 sample (54 of 233 total protein matches) and from 21 to 27% for the H. pylori 26695 sample (68 of 252 total protein matches). Using the merged data, 51 of 54 strain-specific protein matches from the H. pylori J99 sample were assigned to H. pylori J99 (94% correct), and all 68 strain-specific protein matches for the H. pylori 26695 sample were assigned to H. pylori 26695 (100% correct).

The peptide set from each sample (six merged MS runs) was aligned to a master file of Helicobacter protein sequences, based on the genomic data of 27 strains of H. pylori and 11 other species of Helicobacter. The distribution of peptide matches for the J99 sample to the individual strains included in the master file is shown in Figure 2A, wherein strain J99 ranks first and is clearly differentiated from the closest matches, i.e., strains Gambia94/24, 2018, 908, and 2017, which are known to be closely related to strain J99.35−37 The result for the H. pylori 26695 sample is shown in Figure 2B, wherein 26695 ranks first and is clearly differentiated from strain Lithuania75, the second in rank. In both strain analyses, the marked differentiation between the first and second strain matches indicated the detection of strain-specific

Table 2. Protein Data from the Six Individual Injections for the H. pylori 26695 Sample

injection   total protein   strain-specific   H. pylori 26695   strain-specific hits   correct
number      hits            protein hits      hits              of total (%)           assignments (%)
1           142             24                24                17                     100
2           143             24                24                17                     100
3           149             35                35                24                     100
4           164             35                35                21                     100
5           170             39                38                23                     97
6           162             32                32                20                     100
average     155             32                31                20                     100
merged      252             68                68                27                     100

Figure 2. Strain ranking according to peptide matches per Helicobacter strain for the H. pylori J99 sample (A) and the H. pylori 26695 sample (B). Of the 1216 aligned peptides for the J99 sample, 51 were J99-specific, whereas of the 1299 aligned peptides in the 26695 sample, 62 were 26695-specific. Data are from six merged MS runs.


peptides in each of the samples. In the H. pylori J99 sample, 51 of 1216 aligned peptides were strain-specific; similarly, in the H. pylori 26695 sample, 62 of 1299 aligned peptides were strain-specific. The potential number of strain-specific peptides depends on the genomic variation among strains within a given species. The number of unique peptides that can be applied for typing and identification of strains also depends on what is available in the databases, i.e., on whether genomic information is available for enough strains to provide a comprehensive listing of potential proteins relevant for strain identification, and on how complete the genomic information is for each strain.

Notably, for strain J99, 31 of the 51 strain-specific peptides (approximately 60%) originated from membrane-associated proteins, such as Omp and cag island proteins (Table S3 (SI)), and for strain 26695, 19 of the 62 strain-specific peptides (approximately 30%) originated from membrane-associated proteins (Table S4 (SI)). Outer membrane proteins are heavily involved in host−pathogen interactions, acting as adhesins or receptors that facilitate colonization, mediating the acquisition of nutrients, etc. In studies of antibiotic resistance, particular mutations have been shown to trigger up- and down-regulation of important groups of outer membrane proteins.38 Outer membrane proteins are strongly linked to virulence in Gram-negative bacteria, and it has been suggested that they might be excellent biomarkers for strain differentiation.32 The large proportion of strain-specific peptides originating from membrane-associated proteins, including outer membrane proteins, may reflect the link between differences in the outer membrane proteome and strain differentiation,32,38 as well as the applied protocol, in this study the FlowCell, wherein intact bacteria were immobilized and digested.

Figure 3. Peptide matches per strain for the J99 (A) and 26695 (B) samples, compared with whole-genome analyses using TETRA and ANI. The peptide matches per strain are shown as bars, and the TETRA (multiplied by 100) and ANI values are depicted by lines connected by symbols, as indicated by the legend boxes. The strains are ranked by the peptide-to-strain count as in Figure 2.

The strain match rankings (based on peptide matches per
strain), which suggested coherence with the respective genotypic relationships, were subsequently compared with the results of MLST analyses. The nucleotide sequences of the partial genes included in the H. pylori MLST protocol (http://pubmlst.org/helicobacter/) were compared, individually and concatenated, between H. pylori strains J99 and 26695 and the H. pylori strains for which genome sequences exist. Similarities of the concatenated MLST sequences of H. pylori strains to the reference strains ranged from 94.6 to 97.0%. No direct correlation between the number of strain-specific peptides and MLST sequence similarities was observed when the different strains of H. pylori were compared. However, when H. pylori strains J99 and 26695 were compared with a closely related species (i.e., Helicobacter acinonychis), the decrease in the number of strain-specific peptides was accompanied by a marked decrease in MLST sequence similarities as well (Supplemental Figure S1 (SI)).

The average nucleotide identity (ANI)39 and tetranucleotide frequency correlation coefficient (TETRA)40 values derived from comparisons of the whole genome sequences of H. pylori strains were compared using the program JSpecies.18 ANI was calculated using both the BLAST (b) and MUMmer (m) algorithms; only the ANIb results are shown (Figure 3). ANI values between strains of the same species are typically 94% or greater, whereas strains of distinct species exhibit values below 94%. The ANI values observed between H. pylori strain J99 and the other H. pylori strains were at a similar level (approximately 94%), except for that of H. pylori strain Shi470, which was lower (93%). Strain Shi470 also exhibited the lowest number of specific peptide matches with strain J99 (Figure 3). In relation to H. pylori strain 26695, all H. pylori strains except J99 had higher ANI values, i.e., approximately 95% (Figure 3). When compared with other Helicobacter species, such as H. acinonychis, the ANI values for both J99 and 26695 dropped to approximately 89%, which was reflected in fewer peptide matches per strain (Figure 3). TETRA values between genomes have also been shown to be high (>0.99) when ANI values are high, although the opposite is not necessarily true. For comparisons of both H. pylori strains J99 and 26695 with other Helicobacter strains, the TETRA values followed the same pattern as observed for ANI: they were approximately 0.99 and decreased to 0.98 for H. acinonychis (Figure 3).

In summary, these genotypic comparisons were not able to rank the strains of H. pylori as the proteomic analysis did. With both MLST and whole genome sequence comparisons, all H. pylori strains exhibited the same level of similarity to J99 and to 26695, respectively. Generally, the ANI values between strains were higher in relation to H. pylori 26695 than to J99, indicating that the genome of strain 26695 is slightly more similar to the other H. pylori strains than is the genome of strain J99. This was not detected by MLST. For the different, albeit closely related, species H. acinonychis, the decrease in similarity to H. pylori strains is more pronounced and is evident with both the proteomic and the genotypic methods. Even though the genomes of many important pathogens have been sequenced to date,32 the greater part of bacterial diversity remains unsequenced and is not available in genomic or proteomic databases; for example, the type strain, and most important systematic reference strain, of H. pylori (CCUG 17874T) has not been sequenced.

In order to assess how the
proteomic typing of this strain would be able to find its nearest neighbors, intact bacteria of the type strain were digested in the FlowCells. Peptide matching and strain ranking placed H. pylori G27 as the best match (Figure 4), suggesting that this strain is the nearest neighbor of the H. pylori type strain among the strains available in the databases. In this experiment, however, no strain-specific peptides were identified. The peptide ranking was compared with an MLST analysis using the same strains as in the J99 and 26695 analyses, with the type strain (CCUG 17874T) included (Figure 5). The MLST sequence similarities were approximately the same for all H. pylori strains and did not rank them in the way that the peptide-to-strain counts did. As in the cases of H. pylori strains J99 and 26695, the MLST sequence similarities to the sequences of H. acinonychis strain Sheeba were markedly decreased and correlated well with the lower number of peptide matches per strain obtained in the proteomic analysis. Since the genome of the H. pylori type strain has not been sequenced, the whole-genome comparisons used above were not applicable.

Figure 4. Strain ranking, according to peptide matches per strain, for the H. pylori type strain sample. Data were from a single MS run.

Strain-level typing and identification of bacteria by mass spectrometry-based proteomics relies on the nucleic acid sequence information available for peptide matching and on the ability to identify strain-specific peptides. During the preparation of this manuscript, the genome sequences of several H. pylori strains were added to the nonredundant NCBI database, including those of four strains closely related to H. pylori J99, i.e., strains 2017, 2018, Gambia94/24, and SouthAfrica7. After the MS data were matched against the NCBInr database (2011-04-01), approximately half of the protein hits believed to be J99-specific after the first data analysis were discarded. This illustrates the importance of the database used for proteomic analyses of bacteria at the strain level.

The approach of this study exploits proteolytic cleavage at
sites bordering the regions containing strain-specific sequence. In this work, trypsin was used with satisfactory results, since trypsin digests proteins into peptides with mass and charge suitable for electrospray-ionization MS. However, for strains with less distinctive regions, it could be necessary to also use proteases with specificities different from that of trypsin. Such treatments would increase the likelihood of extracting the most relevant strain-specific peptides and would be of particular importance when trying to discriminate between the most closely related strains, since, for proteins common to those strains, usually only a few amino acids distinguish the strains at the peptide sequence level. The higher the discriminatory power required to identify the strains unambiguously, the greater the demand for generating and identifying unique peptide fragments.

Figure 5. MLST analysis of 11 H. pylori strains and two other Helicobacter spp., showing the percent sequence similarity to H. pylori type strain CCUG 17874T. (A) The H. pylori strains and other species are arranged according to the peptide matches per strain ranking, shown as bars. The sequence similarities of the seven partial genes, and of the concatenated sequences, of each strain in relation to the H. pylori type strain are shown as lines connected by symbols, as indicated by the legend boxes. (B) H. hepaticus was removed from the analysis, and the scales were adjusted accordingly.

The discovery of strain-specific peptides may also be applied to the analysis of samples that contain several different strains. To illustrate how the analytical setup could be optimized for analyzing complex peptide samples that entail small differences in peptide populations, the effect of increasing the gradient time from 50 min to 2 h was examined for the separation of peptides originating from a 1:1 mixture of the J99 and 26695 peptide samples. As expected, a more efficient LC separation increased the overall number of identified peptides, including strain-specific peptides. For the short run, J99 was the dominant strain match, followed by 26695, based on peptide alignment matches from a single MS run. Importantly, nearly all of the peptides determined to be strain-specific were from the H. pylori J99 and 26695 strains: of the 32 strain-specific peptide matches, 23 were J99-specific and 8 were 26695-specific, and the remaining one was specific to another strain (Table 3). Using a 2 h gradient, 71 strain-specific peptides were identified, of which 52 were J99-specific and 18 were 26695-specific. This is approximately the same number of strain-specific peptides as was obtained by merging the data from six 50 min gradient runs (Table 3). The relative order of the top-ranking H. pylori strains, and the relative separation between them in terms of counts, however, were not altered.

To simulate a biological sample containing more than one strain, H. pylori J99 and 26695 samples were mixed at the following ratios (J99:26695) prior to immobilization and digestion in the LPI-FlowCells: 9:1, 7:3, 1:1, 3:7, and 1:9. Strain-specific peptides originating from both H. pylori J99 and 26695 were found in all of the mixtures (Figure 6). A small number of peptides (at most three) unique to other strains were found in three of the mixtures; these were unique to H. pylori Lithuania75, Helicobacter mustelae 12198, H. pylori P12, or H. pylori Gambia94/24. These provided examples of false peptide identifications. The key to successful typing is therefore to generate, by enzymatic cleavage, a sufficient number of strain-specific peptides and to identify as many of these peptides as possible, using a good LC separation and MS identification setup. Further studies will focus on optimizing the number of bacteria needed for reliable strain-level identification in samples containing multiple strains in different amounts.
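The point above, that typing depends on enzymatic cleavage producing enough strain-specific peptides, can be made concrete with an in-silico digest. The Python sketch below assumes the simplified "Keil" trypsin rule (cleave C-terminal to K or R unless the next residue is P) and optionally allows missed cleavages. The function `tryptic_digest` and its parameters are illustrative only and are not part of the software used in this study. The example input is the HP0547 peptide listed for H. pylori 26695 in Supplemental Table 4, which spans one missed cleavage.

```python
from typing import List


def tryptic_digest(protein: str, max_missed: int = 0, min_len: int = 1) -> List[str]:
    """In-silico trypsin digest under the simplified Keil rule (illustrative sketch).

    Cleaves C-terminal to K or R unless the next residue is P. Peptides with up
    to `max_missed` missed cleavages are also returned, ordered by start position.
    """
    # Cleavage sites are the indices just after each K/R that is not followed by P.
    sites = [0]
    for i in range(len(protein) - 1):
        if protein[i] in "KR" and protein[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(protein))

    peptides = []
    for start in range(len(sites) - 1):
        # Joining adjacent fragments models missed cleavages.
        for missed in range(max_missed + 1):
            end = start + 1 + missed
            if end >= len(sites):
                break
            peptide = protein[sites[start]:sites[end]]
            if len(peptide) >= min_len:
                peptides.append(peptide)
    return peptides


if __name__ == "__main__":
    # Peptide attributed to H. pylori 26695 (HP0547) in Supplemental Table 4.
    hp0547 = "TPDQTQSQTAFDPQQFINNLQVAFIKVDNVVASFDPDQKPIVDK"
    print(tryptic_digest(hp0547))
    # ['TPDQTQSQTAFDPQQFINNLQVAFIK', 'VDNVVASFDPDQKPIVDK']
    print(tryptic_digest(hp0547, max_missed=1))
```

Run on that peptide, the sketch returns its two fully cleaved parts. The K-P bond inside VDNVVASFDPDQKPIVDK is left intact, as the rule requires. Search engines apply their own enzyme definitions, so this only models where cleavage is expected.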

■ CONCLUSIONS

The LPI-FlowCell technology is easy to use, which is important for potential application in routine protocols. The technology requires small amounts of sample and has the potential to be fully automated and coupled directly to LC−MS instruments, because fluids and reagents are easily exchanged in the LPI-FlowCell format. In its current configuration, however, proteomic MS analysis suffers from the same drawback as whole genome analysis: the equipment and expertise needed to perform the analysis and interpret the results are not widespread or routinely used in clinical settings. All methods for analyzing bacteria have distinct advantages and disadvantages. Current methods for subspecies typing and identification are often limited by the attainable level of resolution, by lack of reproducibility, by relatively slow sample processing, or by being too expensive for routine application. For example, PFGE, the so-called "gold standard" of strain typing methods used in clinical epidemiological studies, requires 24−48 h to complete analyses of strains. The MS-based proteomic analysis of strains described in this study offers a strategy based upon a comprehensive assessment of the proteomes of bacterial strains. Such an extensive basis for the comparison of bacterial strains does not exist with currently applied methods. The minimal manpower and sample preparation required for MS-based proteomics offer significant potential for increasing the speed with which detailed strain information on infectious bacteria can be provided. MS instrument development is a rapidly evolving area, and, together with more user-friendly bioinformatics tools, it is not difficult to imagine that MS instruments and MS-based methods such as those presented here will find their way into clinical settings for rapid typing and identification of bacteria.

In this study, the method of using intact bacteria in the LPI-FlowCell for the enzymatic generation and subsequent MS-based identification of strain-specific peptides has been shown to work well for discriminating different strains of Helicobacter pylori. Proteomic identification of bacterial samples not included in whole genome sequence databases has also been demonstrated, and the results indicate that such data analyses will point to a particular group or cluster of nearest-neighbor strains, a finding that may be adequate in itself or may help in determining the correct response to an infectious outbreak.

The importance of using updated databases when performing data evaluation is stressed; the final results can be quite different when out-of-date databases are used. Additionally, optimization of the chromatographic conditions for peptide separation, enabling the identification of as many strain-specific peptides as possible, may be critical when analyzing complex samples containing multiple closely related strains.
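The strain rankings used throughout (peptide matches per strain, Figures 2 and 4) and the nearest-neighbor reading described above reduce to simple counting once each identified peptide has been mapped to the database strains that contain it. A minimal Python sketch follows. It assumes that a peptide-to-strains mapping has already been computed. The mapping, the strain labels, and the toy peptide sequences in the example are hypothetical, and this is not the PepAligner implementation.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def rank_strains(
    observed: List[str], peptide_to_strains: Dict[str, Set[str]]
) -> Tuple[List[Tuple[str, int]], Dict[str, List[str]]]:
    """Rank strains by peptide matches and collect strain-specific peptides.

    `peptide_to_strains` maps a peptide sequence to the set of database strains
    whose predicted proteins contain it; it is assumed to be precomputed.
    """
    counts: Counter = Counter()
    specific: Dict[str, List[str]] = {}
    for peptide in sorted(set(observed)):  # each distinct peptide counts once
        strains = peptide_to_strains.get(peptide, set())
        counts.update(strains)  # one match for every strain containing the peptide
        if len(strains) == 1:  # present in exactly one strain: strain-specific
            (strain,) = strains
            specific.setdefault(strain, []).append(peptide)
    # Most matches first; ties broken alphabetically so the output is stable.
    ranking = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranking, specific


if __name__ == "__main__":
    # Toy database and peptides, for illustration only.
    db = {
        "AAAK": {"strain_A", "strain_B", "strain_C"},
        "CCCK": {"strain_A", "strain_B"},
        "DDDR": {"strain_A"},
    }
    ranking, specific = rank_strains(["AAAK", "CCCK", "DDDR", "EEEK", "AAAK"], db)
    print(ranking)   # [('strain_A', 3), ('strain_B', 2), ('strain_C', 1)]
    print(specific)  # {'strain_A': ['DDDR']}
```

In this picture, a sample that yields no strain-specific peptides, as in the type strain analysis, still produces a ranking, and its top entries point to the nearest neighbors.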

Table 3. Comparison between Short (50 min) and Long (2 h) Liquid Chromatography Separation Gradient(a)

                   total number of              number of strain-specific peptides per strain
sample ID          strain-specific peptides     H. pylori J99    H. pylori 26695    other strains
short gradient     32                           23               8                  1
long gradient      71                           52               18                 1

(a) The numbers of strain-specific peptides are shown. Data produced by PepAligner.

Figure 6. Number of strain-specific peptides identified in different mixtures of H. pylori J99 and 26695. The ratios of H. pylori J99 to 26695 were 9:1, 7:3, 1:1, 3:7, and 1:9. Peptides specific to other strains were identified in three of the samples.


Importantly, when mixtures of bacteria are analyzed, the method can still detect unique peptide markers that identify the presence of particular bacterial strains. Such discrimination offers opportunities to develop peptide biomarker databases for detecting the presence of particular strains of bacteria in a sample. Peptide biomarkers for antibiotic resistance or virulence could be employed to rapidly assess the level of contagion, which is particularly relevant given that bacterial pathogens have the potential to be exploited as biological weapons. With further optimization of the analytical protocol, automated sample handling and processing, and the creation of databases of strain-specific peptides, the FlowCell-based proteomic approach may rival current rapid typing methods and provide strain-level identifications.
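In its simplest form, a peptide biomarker database of the kind proposed above could be a lookup from each strain-specific peptide to the strain it identifies. The Python sketch below assumes that form. The `detect_strains` function, its `min_markers` threshold (an illustrative guard against isolated false matches, not a value from this study), and the TOYMARKERK entry are all hypothetical. The four H. pylori marker peptides are taken from Supplemental Tables 3 and 4.

```python
from typing import Dict, Iterable, Set


def detect_strains(
    observed: Iterable[str], markers: Dict[str, str], min_markers: int = 2
) -> Dict[str, int]:
    """Return {strain: number of distinct marker peptides} for supported strains.

    `markers` maps each strain-specific peptide to the one strain it identifies
    (a hypothetical biomarker database). A strain is reported only when at least
    `min_markers` distinct markers for it were observed.
    """
    hits: Dict[str, Set[str]] = {}
    for peptide in observed:
        strain = markers.get(peptide)
        if strain is not None:
            hits.setdefault(strain, set()).add(peptide)
    return {s: len(peps) for s, peps in hits.items() if len(peps) >= min_markers}


if __name__ == "__main__":
    markers = {
        # Strain-specific peptides listed in Supplemental Tables 3 and 4.
        "GVGATNGVSHLEAGFSK": "H. pylori J99",
        "LDQIASGLGDVGQAASFLLK": "H. pylori J99",
        "GVGATNGVSHLEAGFNK": "H. pylori 26695",
        "TPDQTQSQTAFDPQQFINNLQVAFIK": "H. pylori 26695",
        # Hypothetical marker for a third strain.
        "TOYMARKERK": "other strain (hypothetical)",
    }
    observed = [
        "GVGATNGVSHLEAGFSK", "LDQIASGLGDVGQAASFLLK",
        "GVGATNGVSHLEAGFNK", "TPDQTQSQTAFDPQQFINNLQVAFIK",
        "TOYMARKERK",
    ]
    # The single TOYMARKERK hit falls below min_markers and is not reported.
    print(detect_strains(observed, markers))
```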

■ ASSOCIATED CONTENT

Supporting Information

Table S1. Protein data from the biological replicate samples of H. pylori J99. Table S2. Protein data from the biological replicate samples of H. pylori 26695. Table S3. List of identified proteins from the strain-specific peptide matches from the analyses of the H. pylori J99 sample. Table S4. List of identified proteins from the strain-specific peptide matches from the analyses of the H. pylori 26695 sample. Table S5. List of identified proteins for the H. pylori J99 strain from the strain-specific peptide matches based on the analyses of the bacterial ratio samples. Table S6. List of identified proteins for the H. pylori 26695 strain from the strain-specific peptide matches based on the analyses of the bacterial ratio samples. Figure S1. MLST analyses of the H. pylori J99 and 26695 strains. This material is available free of charge via the Internet at http://pubs.acs.org.

■ AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected].

Notes

The authors declare no competing financial interest.

■ ACKNOWLEDGMENTS

This research was supported, in part, by the W.R. Wiley Environmental Molecular Science Laboratory, a national scientific user facility sponsored by the U.S. Department of Energy’s Office of Biological and Environmental Research and located at PNNL. PNNL is operated by Battelle Memorial Institute for the U.S. Department of Energy under contract DE-AC05-76RL01830. The authors received support from The Health & Medical Care Committee of the Regional Executive Board, Region Vastra Gotaland, Project No. 72241, from the ALF-Medel for Forskning Projects ALFGBG-11574 and ALFGBG-210591, and from FoU-Vastra Gotaland Region Projects VGFOUREG-30781, 83080, and 157801. This publication made use of the Helicobacter pylori Multi Locus Sequence Typing Website (http://pubmlst.org/helicobacter/) developed by Keith Jolley. The development of this site has been funded by the Wellcome Trust and European Union. The authors acknowledge the Proteomics Core Facility at the Sahlgrenska Academy of the University of Gothenburg. The purchase of the LTQ-Orbitrap XL was made possible through a grant from the Knut and Alice Wallenberg Foundation to Prof. Gunnar C. Hansson (KAW2007.0118).

■ REFERENCES

(1) Foxman, B.; Zhang, L.; Koopman, J. S.; Manning, S. D.; Marrs, C. F. Choosing an appropriate bacterial typing technique for epidemiologic studies. Epidemiol. Perspect. Innov. 2005, 2, 10.
(2) Li, W.; Raoult, D.; Fournier, P.-E. Bacterial strain typing in the genomic era. FEMS Microbiol. Rev. 2009, 33, 892−916.
(3) Maiden, M. C.; Bygraves, J. A.; Feil, E.; Morelli, G.; Russell, J. E.; Urwin, R.; Zhang, Q.; Zhou, J.; Zurth, K.; Caugant, D. A.; Feavers, I. M.; Achtman, M.; Spratt, B. G. Multilocus sequence typing: A portable approach to the identification of clones within populations of pathogenic microorganisms. Proc. Natl. Acad. Sci. U. S. A. 1998, 95, 3140−3145.
(4) Cilia, V.; Lafay, B.; Christen, R. Sequence heterogeneities among 16S ribosomal RNA sequences and their effect on phylogenetic analyses at the species level. Mol. Biol. Evol. 1996, 13, 451−461.
(5) Bishop, C. J.; Aanensen, D. M.; Jordan, G. E.; Kilian, M.; Hanage, W. P.; Spratt, B. G. Assigning strains to bacterial species via the internet. BMC Biol. 2009, 7, 3.
(6) Karas, M.; Hillenkamp, F. Laser desorption ionization of proteins with molecular masses exceeding 10,000 Da. Anal. Chem. 1988, 60, 2299−2301.
(7) Fenn, J. B.; Mann, M.; Meng, C. K.; Wong, S. F.; Whitehouse, C. M. Electrospray ionization for mass spectrometry of large biomolecules. Science 1989, 246, 64−71.
(8) Link, A. J.; Eng, J.; Schieltz, D. M.; Carmack, E.; Mize, G. J.; Morris, D. R.; Garvik, B. M.; Yates, J. R. Direct analysis of protein complexes using mass spectrometry. Nat. Biotechnol. 1999, 17 (7), 676−682.
(9) Washburn, M. P.; Wolters, D.; Yates, J. R. Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat. Biotechnol. 2001, 19 (3), 242−247.
(10) Alm, R. A.; Ling, L. S.; Moir, D. T.; King, B. L.; Brown, E. D.; Doig, P. C.; Smith, D. R.; Noonan, B.; Guild, B. C.; deJonge, B. L.; Carmel, G.; Tummino, P. J.; Caruso, A.; Uria-Nickelsen, M.; Mills, D. M.; Ives, C.; Gibson, R.; Merberg, D.; Mills, S. D.; Jiang, Q.; Taylor, D. E.; Vovis, G. F.; Trust, T. J. Genomic-sequence comparison of two unrelated isolates of the human gastric pathogen Helicobacter pylori. Nature 1999, 397 (6721), 719.
(11) Tomb, J. F.; White, O.; Kerlavage, A. R.; Clayton, R. A.; Sutton, G. G.; Fleischmann, R. D.; Ketchum, K. A.; Klenk, H. P.; Gill, S.; Dougherty, B. A.; Nelson, K.; Quackenbush, J.; Zhou, L.; Kirkness, E. F.; Peterson, S.; Loftus, B.; Richardson, D.; Dodson, R.; Khalak, H. G.; Glodek, A.; McKenney, K.; FitzGerald, L. M.; Lee, N.; Adams, M. D.; Hickey, E. K.; Berg, D. E.; Gocayne, J. D.; Utterback, T. R.; Peterson, J. D.; Kelley, J. M.; Cotton, M. D.; Weidman, J. M.; Fujii, C.; Bowman, C.; Watthey, L.; Wallin, E.; Hayes, W. S.; Borodovsky, M.; Karp, P. D.; Smith, H. O.; Fraser, C. M.; Venter, J. C. The complete genome sequence of the gastric pathogen Helicobacter pylori. Nature 1997, 388 (6642), 539−547.
(12) Goodwin, C. S.; Blincow, E. D.; Warren, J. R.; Waters, T. E.; Sanderson, C. R.; Easton, L. Evaluation of cultural techniques for isolating Campylobacter pyloridis from endoscopic biopsies of gastric mucosa. J. Clin. Pathol. 1985, 38 (10), 1127−1131.
(13) Carlsohn, E.; Nystrom, J.; Karlsson, H.; Svennerholm, A.-M.; Nilsson, C. L. Characterization of the outer membrane protein profile from disease-related Helicobacter pylori isolates by subcellular fractionation and nano-LC FT-ICR MS analysis. J. Proteome Res. 2006, 5, 3197−3204.
(14) Keller, A.; Nesvizhskii, A. I.; Kolker, E.; Aebersold, R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 2002, 74 (20), 5383−5392.
(15) Nesvizhskii, A. I.; Keller, A.; Kolker, E.; Aebersold, R. A statistical model for identifying proteins by tandem mass spectrometry. Anal. Chem. 2003, 75 (17), 4646−4658.
(16) Huang, Y.; Niu, B.; Gao, Y.; Fu, L.; Li, W. CD-HIT Suite: a web server for clustering and comparing biological sequences. Bioinformatics 2010, 26 (5), 680−682.
(17) Jolley, K. A.; Maiden, M. C. BIGSdb: Scalable analysis of bacterial genome variation at the population level. BMC Bioinf. 2010, 11, 595.
(18) Richter, M.; Rossello-Mora, R. Shifting the genomic gold standard for the prokaryotic species definition. Proc. Natl. Acad. Sci. U. S. A. 2009, 106 (45), 19126−19131.
(19) Wayne, L. G.; Brenner, D. J.; Colwell, R. R.; Grimont, P. A. D.; Kandler, O.; Krichevsky, M. I.; Moore, L. H.; Moore, W. E. C.; Murray, R. G. E.; Stackebrandt, E.; Starr, M. P.; Truper, H. G. Report of the Ad Hoc Committee on Reconciliation of Approaches to Bacterial Systematics. Int. J. Syst. Bacteriol. 1987, 37 (4), 463−464.
(20) Fox, G. E.; Wisotzkey, J. D.; Jurtshuk, P. How close is close: 16S rRNA sequence identity may not be sufficient to guarantee species identity. Int. J. Syst. Bacteriol. 1992, 42 (1), 166−170.
(21) Rossello-Mora, R.; Amann, R. The species concept for prokaryotes. FEMS Microbiol. Rev. 2001, 25, 39−67.
(22) Sauer, S.; Kliem, M. Mass spectrometry tools for the classification and identification of bacteria. Nat. Rev. Microbiol. 2010, 8 (1), 74−82.
(23) Welker, M. Proteomics for routine identification of microorganisms. Proteomics 2011, 11, 3143−3153.
(24) Welker, M.; Moore, E. R. B. Applications of whole-cell matrix-assisted laser-desorption/ionization time-of-flight mass spectrometry in systematic microbiology. Syst. Appl. Microbiol. 2011, 34 (1), 2−11.
(25) Sato, H.; Teramoto, K.; Ishii, Y.; Watanabe, K.; Benno, Y. Ribosomal protein profiling by matrix-assisted laser desorption/ionization-mass spectrometry for phylogeny-based subspecies resolution of Bifidobacterium longum. Syst. Appl. Microbiol. 2011, 34 (1), 76−80.
(26) Shah, H. N.; Rajakaruna, L.; Ball, G.; Misra, R.; Al-Shahib, A.; Fang, M.; Gharbia, S. E. Tracing the transition of methicillin resistance in sub-populations of Staphylococcus aureus, using SELDI-TOF mass spectrometry and artificial neural network analysis. Syst. Appl. Microbiol. 2011, 34 (1), 81−86.
(27) Zhou, X.; Gonnet, G.; Hallett, M.; Munchbach, M.; Folkers, G.; James, P. Cell fingerprinting: an approach to classifying cells according to mass profiles of digests of protein extracts. Proteomics 2001, 1 (5), 683−690.
(28) Dworzanski, J. P.; Snyder, A. P.; Chen, R.; Zhang, H.; Wishart, D.; Li, L. Identification of bacteria using tandem mass spectrometry combined with a proteome database and statistical scoring. Anal. Chem. 2004, 76, 2355−2366.
(29) Dworzanski, J. P.; Deshpande, S. V.; Chen, R.; Jabbour, R. E.; Snyder, A. P.; Wick, C. H.; Li, L. Mass spectrometry-based proteomics combined with bioinformatic tools for bacterial classification. J. Proteome Res. 2006, 5 (1), 76−87.
(30) Jabbour, R. E.; Deshpande, S. V.; Stanford, M. F.; Wick, C. H.; Zulich, A. W.; Snyder, A. P. A protein processing filter method for bacterial identification by mass spectrometry-based proteomics. J. Proteome Res. 2011, 10 (2), 907−912.
(31) Jabbour, R. E.; Deshpande, S. V.; Wade, M. M.; Stanford, M. F.; Wick, C. H.; Zulich, A. W.; Skowronski, E. W.; Snyder, A. P. Double-blind characterization of non-genome-sequenced bacteria by mass spectrometry-based proteomics. Appl. Environ. Microbiol. 2010a, 76 (11), 3637−3644.
(32) Jabbour, R. E.; Wade, M. M.; Deshpande, S. V.; Stanford, M. F.; Wick, C. H.; Zulich, A. W.; Snyder, A. P. Identification of Yersinia pestis and Escherichia coli strains by whole cells and outer membrane protein extracts with mass spectrometry-based proteomics. J. Proteome Res. 2010b, 9 (7), 3647−3655.
(33) Chooneea, D.; Karlsson, R.; Encheva, V.; Arnold, C.; Appleton, H.; Shah, H. Elucidation of the outer membrane proteome of Salmonella enterica serovar Typhimurium utilising a lipid-based protein immobilization technique. BMC Microbiol. 2010, 10, 44.
(34) Karlsson, R.; Chooneea, D.; Carlsohn, E.; Encheva, V.; Shah, H. Characterization of bacterial membrane proteins using a novel combination of a lipid based protein immobilization technique with mass spectrometry. In Mass Spectrometry for Microbial Proteomics; Shah, H. N., Gharbia, S. E., Encheva, V., Eds.; Wiley-Blackwell: Hoboken, NJ, 2010.
(35) Alvi, A.; Devi, S. M.; Ahmed, I.; Hussain, M. A.; Rizwan, M.; Lamouliatte, H.; Megraud, F.; Ahmed, N. Microevolution of Helicobacter pylori type IV secretion systems in an ulcer disease patient over a ten-year period. J. Clin. Microbiol. 2007, 45 (12), 4039−4043.
(36) Devi, S. H.; Taylor, T. D.; Avasthi, T. S.; Kondo, S.; Suzuki, Y.; Megraud, F.; Ahmed, N. Genome of Helicobacter pylori strain 908. J. Bacteriol. 2010, 192 (24), 6488−6489.
(37) Avasthi, T. S.; Devi, S. H.; Taylor, T. D.; Kumar, N.; Baddam, R.; Kondo, S.; Suzuki, Y.; Lamouliatte, H.; Megraud, F.; Ahmed, N. Genomes of two chronological isolates (Helicobacter pylori 2017 and 2018) of the west African Helicobacter pylori strain 908 obtained from a single patient. J. Bacteriol. 2011, 193 (13), 3385−3386.
(38) Cordwell, S. J. Technologies for bacterial surface proteomics. Curr. Opin. Microbiol. 2006, 9, 320−329.
(39) Konstantinidis, K.; Tiedje, J. M. Genomic insights that advance the species definition for prokaryotes. Proc. Natl. Acad. Sci. U. S. A. 2005, 102 (7), 2567−2572.
(40) Bohlin, J.; Skjerve, E.; Ussery, D. W. Reliability and applications of statistical methods based on oligonucleotide frequencies in bacterial and archaeal genomes. BMC Genomics 2008, 9, 104.


Supplemental Table 1. Protein data from the biological replicate samples of H. pylori J99.

Columns: (1) total number of protein hits; (2) total number of strain-specific protein hits; (3) number of H. pylori J99 hits; (4) percent strain-specific protein hits of total; (5) percent correct assignments.

sample                         (1)   (2)   (3)   (4)   (5)
J99, biological replicate 1    53    8     7     15    88
J99, biological replicate 2    58    8     8     15    100

Supplemental Table 2. Protein data from the biological replicate samples of H. pylori 26695.

Columns: (1) total number of protein hits; (2) total number of strain-specific protein hits; (3) number of H. pylori 26695 hits; (4) percent strain-specific protein hits of total; (5) percent correct assignments.

sample                           (1)   (2)   (3)   (4)   (5)
26695, biological replicate 1    56    11    11    20    100
26695, biological replicate 2    59    12    12    20    100

Supplemental Table 3. Proteins identified by strain-specific peptide matches from the analyses of the H. pylori J99 sample. Membrane association is indicated by the tag “M” in column 2.

# | Locus tag | Protein name | Number of spectra per peptide | Best Mascot ion score | Best X! Tandem −log(e) score

1. M jhp0095 | methyl-accepting chemotaxis protein (MCP)
   0 TVGQSGLSLQSVDGVYYVR 5 66.7 8.14
2. 1 jhp0135 | cytochrome oxidase (CBB3-type)
   0 GLQENQVFAADLTTYGTESFLR 9 128 13.3
   0 GMDYPAGEMPAIEMDEK 7 82.9 7.49
3. M jhp0212 | putative Outer membrane protein
   0 ISSVNDAENLLQQAATIINVLTTQNPHVNGGGGAWGFGGK 2 77.8 6.14
   0 TGNVMDIFGDSFNAINEMIK 1 67.7 5.82
4. M jhp0214 | Outer membrane protein/porin
   0 QTASNTDSSTAQAIDNLEK 1 104 9.52
5. 1 jhp0249 | heat shock protein
   0 MTDQLHETLDSAIALALHHK 3 45.9 2.15
6. 1 jhp0305 | poly E-rich protein
   0 LQENETPKDESMQESAQNLQDK 2 56 1.66
7. 1 jhp0422 | oligopeptidase
   0 TLQTEVQEFENAYQNNLK 9 87.2 8.28
8. M jhp0444 | putative paralog of HpaA
   0 SVIHENLDK 3 38.7 1.11
9. M jhp0495 | cag island protein, cytotoxicity associated immunodominant antigen
   0 GVGATNGVSHLEAGFSK 5 98.3 8.28
   0 HLALVAEFGNGELSYTLK 14 97 10.4
   0 IDRLDQIASGLGDVGQAASFLLK 3 33.8 4.72
   0 LDQIASGLGDVGQAASFLLK 14 132 12.3
   0 NVVNLYVESAK 6 39 4.28
   0 SNDLIDKDNLIDTGSSIK 10 78.7 6.02
   0 VDNVVASFDPNQKPIVDK 10 69.2 8.96
   0 VGLSANHEPIYATIDDLGGPFPLK 2 33.6 3.26
   0 VGLSANHEPIYATIDDLGGPFPLKR 6 69.6 8.96
   0 YQIFMNWVSHQNDPSK 8 95.7 6.64
10. 1 jhp0561 | Adenylate kinase
   0 VFLDPLVEIQNFYK 2 73.3 4.74
11. 1 jhp0573 | hypothetical protein
   0 NPQVEQYLNSLTVHLR 5 53.3 2.08
12. M jhp0797 | hypothetical protein
   0 DQNNAIQQGETK 5 81.1 4.85
   0 RPEPTKDQNNAIQQGETK 1 39 3.2
13. 1 jhp0809 | catalase
   0 DLFDAIAGGDFPK 3 56.7 2.89
14. M jhp0833 | outer membrane protein - adhesin
   0 ADGNTTGVSYTEITNK 3 82.7 7.39
   0 GIQDLSDRYESLNNLLNR 5 53.1 6.27
   0 KNNPYSPQGIDTNYYLNQNSYNQIQTINQELGR 2 58.2 4.39
   0 MITDAQELVNQTSVINEHEQTTPVGNNNGKPFNPFTDASFAQGMLANASAQAK 1 25.5 3.48
   0 NNPYSPQGIDTNYYLNQNSYNQIQTINQELGR 6 65.8 9.39
   0 YESLNNLLNR 6 63.6 3.59
15. 1 jhp0841 | phosphotransacetylase
   0 GALVEDIVNTVLISAIQAQDY 1 39.3 2.72
16. 1 jhp0892 | hypothetical protein
   0 ILYAESTHESNAQPPK 4 52.8 4.08
17. 1 jhp1004 | hypothetical protein
   0 LDENLLSSGTQSSK 2 53.6 4.2
18. M jhp1054 | putative Outer membrane protein
   0 AVQTAPVTTEPAPEK 12 47.8 4.77
   0 AVQTAPVTTEPAPEKEEPK 6 73.7 7.64
19. 1 jhp1079 | putative signal recognition particle protein
   0 IPDLDVFMPER 6 70.6 4.07
20. M jhp1103 | putative Outer membrane function
   0 ASTTDFNNQTTPQLDQAQTLANTLTQELGNNPFK 4 68.1 9.77
   0 DQQGTSSDQTTTTTSVIDTTNDAQNLLTQAQTIVNTLK 1 61.9 6.14
   0 LANQVASDFDK 3 66.9 4.3
   0 NITQPNNFNLNSPGSLTALAQSMLK 4 80.1 5.85
   0 SSSNGGTNGANTPSWQTAGGGK 6 88 12.9
21. 1 jhp1121 | DNA-directed RNA polymerase subunit beta/beta'
   0 VAGETIYLTAIQEDSHIIAPASTPIDEEGNILGDLIETR 2 42.2 3.72
22. M jhp1164 | outer membrane protein - adhesin
   0 ALTANGEGIPVLSNTTTK 4 58 8.38
   0 LSSDPSAVNDAR 3 95.3 3.8
23. 1 jhp1196 | phosphomannomutase
   0 FLFEVLSAGLQSSGLK 5 62.6 3.2
24. 1 jhp1206 | hypothetical protein
   0 TFSNAVVGDEVK 3 53.9 4.57
25. 1 jhp1227 | 50S ribosomal protein L5
   0 IMQNIAQTISLVAGQK 9 81 6.2
26. 1 jhp1264 | phosphoglycerate kinase
   0 IANTYAFSLIGGGDTIDAINR 11 98.3 9.48
27. M jhp1413 | hypothetical protein
   0 TTQNNQINQPNK 4 61.4 2.59
28. 1 jhp1423 | Type I restriction enzyme modification subunit
   0 MVSLEEISLNDYNLNIPR 1 88.4 5.85
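Tables S3 and S4 illustrate the earlier point that, for proteins common to two strains, only a few amino acids may separate strain-specific peptides. The cag island protein entries (jhp0495 for J99 in Table S3 and HP0547 for 26695 in Table S4) contain peptide pairs that appear to be homologous. The Python sketch below lists the positions at which two equal-length peptides differ. Treating these pairs as aligned without gaps is an assumption, and `differing_positions` is an illustrative helper, not a tool used in the study.

```python
from typing import List


def differing_positions(a: str, b: str) -> List[int]:
    """0-based positions at which two equal-length peptide sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length (no gaps are modeled)")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    pairs = [
        # (J99 peptide from Table S3, 26695 peptide from Table S4)
        ("GVGATNGVSHLEAGFSK", "GVGATNGVSHLEAGFNK"),
        # The 26695 sequence here occurs in Table S4 only inside a longer peptide.
        ("VDNVVASFDPNQKPIVDK", "VDNVVASFDPDQKPIVDK"),
        ("HLALVAEFGNGELSYTLK", "HSALITEFNNGDLSYTLK"),
    ]
    for j99, h26695 in pairs:
        diffs = differing_positions(j99, h26695)
        # Substitutions written as J99 residue, 1-based position, 26695 residue.
        print(len(diffs), [f"{j99[i]}{i + 1}{h26695[i]}" for i in diffs])
```

The first two pairs differ at a single position (S16N and N11D), and the third pair differs at five positions. Each peptide in a pair is nevertheless unique to its strain.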

Supplemental Table 4. Proteins identified by strain-specific peptide matches from the analyses of the H. pylori 26695 sample. Membrane-association is indicated by the tag “M” in column 2.

# Locus tag | Protein name
    Peptide sequence | Number of spectra per peptide | Best Mascot ion score | Best X! Tandem -log(e) score
1. HP0026 | type II citrate synthase
    SVTLVNNENNER | 5 | 62.7 | 4.43
2. HP0056 | delta-1-pyrroline-5-carboxylate dehydrogenase
    VLLMGFLSFGK | 7 | 61.6 | 3.74
3. HP0088 | RNA polymerase sigma factor RpoD
    KDEDNEEDEENEER | 4 | 49.1 | 5.7
4. M HP0099 | methyl-accepting chemotaxis protein (TlpA)
    SDLFLIGTK | 2 | 43.8 | 1.74
    VNEVQGVLENTYTSMGIVK | 5 | 134 | 12.2
5. HP0109 | molecular chaperone DnaK
    AKFESLTEDLMK | 3 | 69.2 | 4.08
    FESLTEDLMK | 4 | 27.7 | 3.4
6. HP0110 | co-chaperone and heat shock protein (grpE)
    SAAEEDKESALTK | 4 | 79.2 | 5.14
7. M HP0154 | phosphopyruvate hydratase
    NIANAVLIKPNQIGTISETLETIR | 7 | 45.8 | 5.12
8. M HP0227 | outer membrane protein (omp5)
    AEAQAEILNLAK | 7 | 77.8 | 4.89
9. M HP0229 | outer membrane protein (omp6)
    TAPESPNQPSAFNNADFNK | 5 | 89.7 | 10.9
    TTPNSANQAVSSALSSAVAMWQVIVSNLANNSLPTSEYNK | 3 | 44.3 | 7.07
10. HP0238 | prolyl-tRNA synthetase
    DLDNVGLIAGFIGPYGLK | 3 | 71.8 | 4.21
11. HP0322 | poly E-rich protein
    NENNTETPQEK | 3 | 44.1 | 2.33
    TQAQELEVPK | 1 | 41.7 | 1.52
12. HP0390 | adhesin-thiol peroxidase (tagD)
    HFNEQTGK | 1 | 36.1 | 2.62
13. HP0509 | glycolate oxidase subunit (glcD)
    GHEAMEEIFQAAISLEGTLSGEHGIGLSK | 8 | 74.3 | 14.5
14. M HP0520 | cag pathogenicity island protein (cag1)
    VPTTVNNETQK | 7 | 41.5 | 5.07
15. M HP0522 | cag pathogenicity island protein (cag3)
    VVEVPVSPQTSNSDETMR | 4 | 103 | 13
16. M HP0547 | cag pathogenicity island protein (cag26)
    DQQGNNVATLINVHMK | 1 | 26.7 | 2.54
    GVGATNGVSHLEAGFNK | 4 | 60.7 | 9.8
    HSALITEFNNGDLSYTLK | 3 | 102 | 11.5
    NKVDFMEFLAQNNTK | 3 | 35 | 1.82
    TPDQTQSQTAFDPQQFINNLQVAFIK | 8 | 46 | 6.11
    TPDQTQSQTAFDPQQFINNLQVAFIKVDNVVASFDPDQKPIVDK | 1 | 32.1 | 2.96
17. HP0554 | hypothetical protein
    ANPIEENKPEPTPK | 5 | 41.9 | 5.62
18. HP0617 | aspartyl-tRNA synthetase
    DSSNAIFSNTAK | 3 | 82.2 | 5.72
19. HP0649 | aspartate ammonia-lyase
    AATLANVQLGLIDEK | 5 | 81.8 | 5.16
20. HP0705 | excinuclease ABC subunit A
    VVVNNENASR | 3 | 51.8 | 2.06
21. HP0850 | type I restriction enzyme M protein (hsdM)
    ITLHGQESVNK | 3 | 46.5 | 2.92
22. HP0857 | phosphoheptose isomerase
    KGLSAISLNTDISALTAIANDYGYEEVFAR | 4 | 49.7 | 3.64
23. M HP0858 | ADP-heptose synthase (rfaE)
    GVLDFELTQAMIALANQHHK | 1 | 40.1 | 1.51
24. HP0900 | hydrogenase expression/formation protein (hypB)
    ADMVEVFNFR | 7 | 62.1 | 2.64
    QESLQNNPNLSK | 3 | 51.7 | 2.92
25. HP0958 | hypothetical protein
    LVQQLESLVENEVK | 6 | 72.9 | 4.06
    NEQTLQDTNTK | 5 | 36.5 | 5.47
26. HP0979 | cell division protein FtsZ
    AAEESANEIK | 4 | 45.5 | 1.82
27. HP1046 | hypothetical protein
    IGGVIESLGYLLYDVSLVK | 2 | 29.1 | 1.96
    IGGVIESLGYLLYDVSLVKENEQHVLR | 5 | 54 | 1.35
28. HP1048 | translation initiation factor IF-2
    LTQSNANNASNANNAK | 6 | 84.9 | 7.07
    TGDGIDNLLETILIQAGIMELK | 1 | 37.6 | 1.4
    VIQSTTAIPEEVR | 4 | 56 | 2.27
29. HP1110 | pyruvate flavodoxin oxidoreductase subunit alpha
    SSPAGTMGAMFNEVTSAVYQTQGTK | 1 | 28.5 | 2.34
30. HP1123 | peptidyl-prolyl cis-trans isomerase, FKBP-type rotamase (slyD)
    EQGSSIVLDSNISKEPLEFIIGTNQIIAGLEK | 3 | 30.9 | 2.26
31. HP1161 | flavodoxin FldA
    VILVAPTAGAGDLQTDWEDFLGTLEASDFATK | 3 | 46.4 | 7.85
32. M HP1177 | outer membrane protein (omp27)
    GEKLEAHVTTSKPENNSQTK | 3 | 34.7 | 6.96
    LEAHVTTSKPENNSQTK | 13 | 68.4 | 7.92
    SSSESSGAATTNAPSWQTAGGGK | 11 | 99.4 | 13.3
33. M HP1243 | outer membrane protein (omp28)
    VVGSIASGNTSHVITNK | 5 | 90.8 | 5.77
34. HP1286 | hypothetical protein
    MLSVFEGNIDVK | 11 | 62.4 | 3.29
35. HP1313 | 30S ribosomal protein S3
    HPQADAQLAAENVATQLEK | 9 | 98.9 | 10.9
36. HP1350 | protease
    NEEEKEITPK | 1 | 51.6 | 2.35
37. HP1374 | ATP-dependent protease ATP-binding subunit ClpX
    MESSAYEEEFLLSYIPAPK | 1 | 64.1 | 8.23
38. HP1422 | isoleucyl-tRNA synthetase
    ALDENLVEELLMVSFVGIAK | 5 | 62.2 | 5.6
    DSLVALGEHVGLEDGTGAVHTAPGHGEEDYYLGLR | 5 | 54.6 | 3.16
    LAALGVVDNEITHEFNSNDLEYLVATNPLNQR | 7 | 31.9 | 3.8
39. HP1430 | ATP-binding protein
    QALLESSQFSSLGLVGFKDEKPLIK | 6 | 50.1 | 2.96
40. HP1463 | hypothetical protein
    NGNQDFQESAIQSLQHVSVQAIQEAVSLIK | 4 | 61.6 | 2.32
41. HP1507 | hypothetical protein
    TGVFNIEELNTDPFMEELIK | 7 | 97.9 | 5.8
42. HP1542 | hypothetical protein
    FTGTVEAEVVEIMPLGHLDGK | 3 | 40 | 4.39
43. HP1554 | 30S ribosomal protein S2
    ELMQEEIVHANENSEEIEYVSHEEK | 5 | 57.6 | 4.42
    ELMQEEIVHANENSEEIEYVSHEEKEEMLAEIQK | 2 | 30.3 | 1.82
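Every peptide listed in Supplemental Table 4 ends in K or R, which is consistent with tryptic digestion, although this section does not name the enzyme. Several entries pair a peptide with a longer variant that carries one uncleaved internal site, for example AKFESLTEDLMK and FESLTEDLMK (HP0109), or GEKLEAHVTTSKPENNSQTK and LEAHVTTSKPENNSQTK (HP1177). The sketch below assumes the usual trypsin rule (cleave after K or R, but not before P) and shows how such overlapping peptides arise from an in-silico digest that allows one missed cleavage. The function names, the one-missed-cleavage default, and the minimum length of 6 residues are illustrative assumptions, not parameters reported in this study.

```python
import re

# Assumed trypsin specificity: a cut site lies after K or R unless the next
# residue is P. The zero-width pattern matches every such position.
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def missed_cleavages(peptide: str) -> int:
    """Count internal K/R residues that are not followed by P."""
    return sum(1 for a, b in zip(peptide, peptide[1:]) if a in "KR" and b != "P")


def digest(protein: str, max_missed: int = 1, min_len: int = 6) -> set[str]:
    """Return in-silico tryptic peptides with up to `max_missed` missed cleavages."""
    fragments = [f for f in TRYPSIN_SITE.split(protein) if f]  # drop empty tail
    peptides = set()
    for i in range(len(fragments)):
        # Join fragment i with up to `max_missed` following fragments.
        for j in range(i, min(i + max_missed + 1, len(fragments))):
            peptide = "".join(fragments[i:j + 1])
            if len(peptide) >= min_len:
                peptides.add(peptide)
    return peptides


if __name__ == "__main__":
    # Overlapping peptide pairs reported for HP0109 and HP1177 in Table 4.
    for pep in ["AKFESLTEDLMK", "FESLTEDLMK", "GEKLEAHVTTSKPENNSQTK", "LEAHVTTSKPENNSQTK"]:
        print(pep, missed_cleavages(pep))
    print(sorted(digest("GEKLEAHVTTSKPENNSQTK")))
```

The K-P bond inside LEAHVTTSKPENNSQTK is not counted as a missed cleavage under this rule, so only the GEK prefix adds one.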


Supplemental Table 5. H. pylori J99 proteins identified by strain-specific peptide matches in the analyses of the bacterial ratio samples.

Membrane-association is indicated by the tag “M” in column 2.

Ratio of J99 to 26695: 9:1, 7:3, 1:1, 3:7, 1:9

# Locus tag | Protein name | Number of strain-specific peptides
1. jhp0062 | urease accessory protein | 1 1
2. M jhp0075 | methyl-accepting chemotaxis protein (MCP) | 1 1
3. M jhp0095 | methyl-accepting chemotaxis protein (MCP) | 1 1 1
4. jhp0124 | bacterioferritin comigratory protein | 2 1 1
5. jhp0135 | cytochrome oxidase (CBB3-type) | 2 2
6. jhp0187 | putative glycerol-3-phosphate acyltransferase PlsX | 1 1
7. jhp0193 | hypothetical protein | 1
8. M jhp0212 | putative Outer membrane protein | 4 3 3
9. M jhp0214 | Outer membrane protein/porin | 3
10. jhp0216 | hypothetical protein | 1
11. jhp0305 | poly E-rich protein | 2 1
12. jhp0337 | transketolase | 1 1
13. jhp0387 | putative proline peptidase | 1
14. jhp0390 | hypothetical protein | 1 1
15. jhp0413 | polyphosphate kinase | 1
16. jhp0422 | oligopeptidase | 1 1
17. M jhp0476 | cag island protein | 1
18. M jhp0495 | cag island protein, cytotoxicity associated immunodominant antigen | 15 12 9 9 4
19. jhp0501 | hypothetical protein | 1 1 1 1
20. jhp0524 | methylenetetrahydrofolate dehydrogenase/methenyltetrahydrofolate cyclohydrolase | 1
21. jhp0561 | Adenylate kinase | 1 1 1 1
22. jhp0585 | putative 3-hydroxyacid dehydrogenase | 1 1
23. jhp0655 | DNA polymerase III subunits gamma and tau | 1
24. M jhp0677 | D-alanyl-D-alanine-adding enzyme | 2 2 1
25. M jhp0727 | putative heavy-metal cation-transporting P-type ATPase | 1
26. jhp0786 | Type I restriction enzyme modification subunit | 1
27. M jhp0797 | hypothetical protein | 1 1 1
28. jhp0809 | catalase | 1 1 1
29. M jhp0833 | outer membrane protein - adhesin | 6 6 4 3 3
30. jhp0892 | hypothetical protein | 1 1
31. jhp0967 | methionyl-tRNA synthetase | 1
32. jhp1011 | biotin carboxylase | 1
33. M jhp1054 | putative Outer membrane protein | 2 2 2 2
34. M jhp1063 | F0F1 ATP synthase subunit delta | 2
35. jhp1079 | putative signal recognition particle protein | 1 1 1
36. M jhp1103 | putative Outer membrane function | 6 5 4 4
37. jhp1121 | DNA-directed RNA polymerase subunit beta/beta' | 1 1
38. M jhp1164 | outer membrane protein - adhesin | 2 2 2 2 1
39. jhp1171 | hypothetical protein | 2
40. jhp1206 | hypothetical protein | 1
41. jhp1227 | 50S ribosomal protein L5 | 1
42. jhp1231 | 50S ribosomal protein L29 | 3
43. jhp1264 | phosphoglycerate kinase | 1 1
44. M jhp1287 | rod shape-determining protein MreB | 1 2
45. jhp1363 | DNA polymerase I | 1
46. jhp1365 | putative type II DNA modification enzyme (methyltransferase) | 2
47. jhp1400 | hypothetical protein | 2 1 1
48. M jhp1413 | hypothetical protein | 1
49. jhp1423 | Type I restriction enzyme modification subunit | 1 1 1
50. jhp1446 | putative recombination protein RecB | 1

Supplemental Table 6. H. pylori 26695 proteins identified by strain-specific peptide matches in the analyses of the bacterial ratio samples.

Membrane-association is indicated by the tag “M” in column 2.

Ratio of J99 to 26695: 9:1, 7:3, 1:1, 3:7, 1:9

# Locus tag | Protein name | Number of strain-specific peptides
1. HP0026 | type II citrate synthase | 2 2 2 2 2
2. M HP0099 | methyl-accepting chemotaxis protein (TlpA) | 1 1 1 1 1
3. HP0109 | molecular chaperone DnaK | 1
4. HP0110 | co-chaperone and heat shock protein (grpE) | 1 1
5. HP0130 | hypothetical protein | 1 2 2
6. HP0136 | bacterioferritin comigratory protein (bcp) | 1 1
7. M HP0154 | phosphopyruvate hydratase | 1
8. M HP0160 | hypothetical protein | 1
9. HP0185 | hypothetical protein | 1 1 2
10. HP0207 | ATP-binding protein (mpr) | 1 1 1
11. M HP0227 | outer membrane protein (omp5) | 2 1 1
12. M HP0229 | outer membrane protein (omp6) | 1 1 1 1
13. HP0238 | prolyl-tRNA synthetase | 1 1
14. HP0322 | poly E-rich protein | 1 2 2
15. HP0416 | cyclopropane fatty acid synthase (cfa) | 1 1 1
16. M HP0421 | type 1 capsular polysaccharide biosynthesis protein J (capJ) | 1
17. HP0509 | glycolate oxidase subunit (glcD) | 1
18. M HP0520 | cag pathogenicity island protein (cag1) | 1 1 1
19. M HP0522 | cag pathogenicity island protein (cag3) | 1 1 1 1
20. M HP0527 | cag pathogenicity island protein (cag7) | 2 2 1
21. M HP0547 | cag pathogenicity island protein (cag26) | 1 4 5 6 4
22. HP0550 | transcription termination factor Rho | 1 1
23. HP0554 | hypothetical protein | 1
24. HP0659 | hypothetical protein | 1
25. HP0841 | bifunctional phosphopantothenoylcysteine decarboxylase/phosphopantothenate synthase | 1 1
26. HP0850 | type I restriction enzyme M protein (hsdM) | 1 1 2 3
27. HP0900 | hydrogenase expression/formation protein (hypB) | 3 3 3 3 3
28. M HP0912 | outer membrane protein (omp20) | 1 2 2 2
29. HP0919 | carbamoyl phosphate synthase large subunit | 2 2 2 2
30. M HP0923 | outer membrane protein (omp22) | 1 1
31. HP0958 | hypothetical protein | 1 2 2
32. HP1019 | serine protease (htrA) | 2 2 2 2
33. HP1037 | hypothetical protein | 1 1
34. HP1048 | translation initiation factor IF-2 | 3 2 2
35. M HP1105 | LPS biosynthesis protein | 1
36. HP1110 | pyruvate flavodoxin oxidoreductase subunit alpha | 1 1 1
37. HP1123 | peptidyl-prolyl cis-trans isomerase, FKBP-type rotamase (slyD) | 1 1
38. HP1138 | plasmid replication-partition related protein | 1 1 1
39. HP1161 | flavodoxin FldA | 1 1
40. M HP1177 | outer membrane protein (omp27) | 3 3 3 3 3
41. HP1189 | aspartate-semialdehyde dehydrogenase | 1 1 1
42. M HP1243 | outer membrane protein (omp28) | 2 2 3 2
43. M HP1255 | preprotein translocase subunit SecG | 2 2
44. M HP1268 | NADH dehydrogenase subunit I | 1
45. HP1286 | hypothetical protein | 1 1 1
46. HP1313 | 30S ribosomal protein S3 | 1 2 2
47. HP1345 | phosphoglycerate kinase | 1 1 1 1
48. HP1350 | protease | 1
49. HP1362 | replicative DNA helicase | 1 1
50. M HP1373 | rod shape-determining protein MreB | 1 1
51. HP1374 | ATP-dependent protease ATP-binding subunit ClpX | 1 1 2
52. HP1397 | hypothetical protein | 1
53. HP1422 | isoleucyl-tRNA synthetase | 1 1
54. HP1454 | hypothetical protein | 1 1 1 2
55. HP1463 | hypothetical protein | 2 2
56. HP1470 | DNA polymerase I (polA) | 1 1 1
57. HP1480 | seryl-tRNA synthetase | 1 2
58. HP1507 | hypothetical protein | 1 1 1
59. HP1524 | hypothetical protein | 1 1 1 1
60. HP1542 | hypothetical protein | 1 1
61. HP1554 | 30S ribosomal protein S2 | 1 2 2 1
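Rows of Supplemental Tables 5 and 6 that have a count in all five ratio columns can be read as a small dilution series. As a rough, hypothetical check, not an analysis reported in this study, the sketch below computes Spearman's rank correlation between the number of strain-specific peptides and the fraction of the corresponding strain in the mixture. It uses two complete rows: jhp0495 (cag island protein, cytotoxicity associated immunodominant antigen; J99) and HP0547 (cag26; 26695). Ties receive average ranks. With only five points the coefficient is descriptive, and it is undefined for constant rows such as HP0026.

```python
# Column order in Supplemental Tables 5 and 6 (J99:26695): 9:1, 7:3, 1:1, 3:7, 1:9.
FRACTION_J99 = [0.9, 0.7, 0.5, 0.3, 0.1]
FRACTION_26695 = [0.1, 0.3, 0.5, 0.7, 0.9]

# Rows with a count in every ratio column, copied from the tables above.
ROWS = {
    "jhp0495 (J99)": (FRACTION_J99, [15, 12, 9, 9, 4]),
    "HP0547 (26695)": (FRACTION_26695, [1, 4, 5, 6, 4]),
}


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined for a constant series")
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    for name, (fraction, counts) in ROWS.items():
        print(f"{name}: rho = {spearman(fraction, counts):.2f}")
```

For jhp0495 the count falls from 15 to 4 as the J99 share drops, so rho is close to 1 (about 0.97). HP0547 peaks at the 3:7 mixture, which gives a weaker value of about 0.56.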