Genome-level analyses of Mycobacterium bovis lineages reveal the role of SNPs and antisense...

18
RESEARCH ARTICLE Open Access Genome-level analyses of Mycobacterium bovis lineages reveal the role of SNPs and antisense transcription in differential gene expression Paul Golby 1* , Javier Nunez 1 , Adam Witney 2 , Jason Hinds 2 , Michael A Quail 3 , Stephen Bentley 3 , Simon Harris 3 , Noel Smith 1 , R Glyn Hewinson 1 and Stephen V Gordon 4 Abstract Background: Bovine tuberculosis (bTB) is a disease with major implications for animal welfare and productivity, as well as having the potential for zoonotic transmission. In Great Britain (GB) alone, controlling bTB costs in the region of £100 million annually, with the current control scheme seemingly unable to stop the inexorable spread of infection. One aspect that may be driving the epidemic is evolution of the causative pathogen, Mycobacterium bovis. To understand the underlying genetic changes that may be responsible for this evolution, we performed a comprehensive genome-level analyses of 4 M. bovis strains that encompass the main molecular types of the pathogen circulating in GB. Results: We have used a combination of genome sequencing, transcriptome analyses, and recombinant DNA technology to define genetic differences across the major M. bovis lineages circulating in GB that may give rise to phenotypic differences of practical importance. The genomes of three M. bovis field isolates were sequenced using Illumina sequencing technology and strain specific differences in gene expression were measured during in vitro growth and in ex vivo bovine alveolar macrophages using a whole genome amplicon microarray and a whole genome tiled oligonucleotide microarray. SNP/small base pair insertion and deletions and gene expression data were overlaid onto the genomic sequence of the fully sequenced strain of M. bovis 2122/97 to link observed strain specific genomic differences with differences in RNA expression. Conclusions: We show that while these strains show extensive similarities in their genetic make-up and gene expression profiles, they exhibit distinct expression of a subset of genes. We provide genomic, transcriptomic and functional data to show that synonymous point mutations (sSNPs) on the coding strand can lead to the expression of antisense transcripts on the opposing strand, a finding with implications for how we define a silentnucleotide change. Furthermore, we show that transcriptomic data based solely on amplicon arrays can generate spurious results in terms of gene expression profiles due to hybridisation of antisense transcripts. Overall our data suggest that subtle genetic differences, such as sSNPS, may have important consequences for gene expression and subsequent phenotype. Keywords: Bovine tuberculosis, Mycobacterium bovis, Microarray, Transcript, SNP, Antisense, Macrophage * Correspondence: [email protected] 1 Animal Health and Veterinary Laboratories Agency, Woodham Lane, New Haw Addlestone, Surrey KT15 3NB, UK Full list of author information is available at the end of the article © 2013 Golby et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Golby et al. BMC Genomics 2013, 14:710 http://www.biomedcentral.com/1471-2164/14/710

Transcript of Genome-level analyses of Mycobacterium bovis lineages reveal the role of SNPs and antisense...

RESEARCH ARTICLE Open Access

Genome-level analyses of Mycobacterium bovislineages reveal the role of SNPs and antisensetranscription in differential gene expressionPaul Golby1 Javier Nunez1 Adam Witney2 Jason Hinds2 Michael A Quail3 Stephen Bentley3 Simon Harris3Noel Smith1 R Glyn Hewinson1 and Stephen V Gordon4

Abstract

Background Bovine tuberculosis (bTB) is a disease with major implications for animal welfare and productivity aswell as having the potential for zoonotic transmission In Great Britain (GB) alone controlling bTB costs in theregion of pound100 million annually with the current control scheme seemingly unable to stop the inexorable spread ofinfection One aspect that may be driving the epidemic is evolution of the causative pathogen Mycobacteriumbovis To understand the underlying genetic changes that may be responsible for this evolution we performed acomprehensive genome-level analyses of 4 M bovis strains that encompass the main molecular types of thepathogen circulating in GB

Results We have used a combination of genome sequencing transcriptome analyses and recombinant DNAtechnology to define genetic differences across the major M bovis lineages circulating in GB that may give rise tophenotypic differences of practical importance The genomes of three M bovis field isolates were sequenced usingIllumina sequencing technology and strain specific differences in gene expression were measured during in vitrogrowth and in ex vivo bovine alveolar macrophages using a whole genome amplicon microarray and a wholegenome tiled oligonucleotide microarray SNPsmall base pair insertion and deletions and gene expression datawere overlaid onto the genomic sequence of the fully sequenced strain of M bovis 212297 to link observed strainspecific genomic differences with differences in RNA expression

Conclusions We show that while these strains show extensive similarities in their genetic make-up and geneexpression profiles they exhibit distinct expression of a subset of genes We provide genomic transcriptomic andfunctional data to show that synonymous point mutations (sSNPs) on the coding strand can lead to the expressionof antisense transcripts on the opposing strand a finding with implications for how we define a lsquosilentrsquo nucleotidechange Furthermore we show that transcriptomic data based solely on amplicon arrays can generate spuriousresults in terms of gene expression profiles due to hybridisation of antisense transcripts Overall our data suggestthat subtle genetic differences such as sSNPS may have important consequences for gene expression andsubsequent phenotype

Keywords Bovine tuberculosis Mycobacterium bovis Microarray Transcript SNP Antisense Macrophage

Correspondence paulgolbyahvlagsigovuk1Animal Health and Veterinary Laboratories Agency Woodham Lane NewHaw Addlestone Surrey KT15 3NB UKFull list of author information is available at the end of the article

copy 2013 Golby et al licensee BioMed Central Ltd This is an open access article distributed under the terms of the CreativeCommons Attribution License (httpcreativecommonsorglicensesby20) which permits unrestricted use distribution andreproduction in any medium provided the original work is properly cited

Golby et al BMC Genomics 2013 14710httpwwwbiomedcentralcom1471-216414710

BackgroundMycobacterium bovis is the causative agent of bovinetuberculosis (bTB) an endemic disease of cattle inGreat Britain (GB) with the potential for zoonotic trans-mission to humans In GB the primary control of bTB isthrough lsquotest and slaughterrsquo surveillance whereby cattlethat are positive to the tuberculin skin test [1] areremoved from the herd and slaughtered In spite of thisapproach which has been in place since the 1950s thenumber of TB-positive cattle slaughtered is increasingyear on year - approximately 30000 cattle were tested andslaughtered between 2012ndash2013 compared to 300 between1995ndash1996 (httpwwwdefragovukanimal-diseasesa-zbovine-tb) The UK (GB and Northern Ireland) govern-ments currently spend approximately pound100 million peryear collectively on control measures and compensationto farmers for slaughtered cattle The failure of the test-and-slaughter policy to control the spread of infectionin large parts of GB suggests that we need a muchgreater understanding of the TB disease dynamic includ-ing the role of pathogen diversity as a potential driver ofthis processM bovis isolates that are cultured from skin test-reactor

animals are currently genetically typed using a combinationof spoligotyping [2] and VNTR [3] Spoligotyping exploitsa polymorphic region of the genome called the DR locuswhich consists of multiple identical 36bp repeats inter-spersed with unique sequences known as spacers Isolatesof M bovis differ in the presence or absence of spacersand adjacent DRs allowing a lsquobarcodersquo to be generated foreach molecular type Spoligotypes 9 and 17 are the domin-ant molecular types in the UK with more than one thirdof all isolates corresponding to Type 9 and a quarter toType 17 VNTR measures the variation at repeat se-quences in 6 regions of the genome There are 6 majorVNTR types for Type 9 while all others show only onedominant profile suggesting that M bovis Type 9strains are more genetically variable compared withother spoligotypes Integration of molecular typingwith geographical information systems allows temporaland spatial distribution of molecular types to bemapped across GB Type 9 isolates are widely distrib-uted across GB while type 17 is an emerging clonewhich has expanded out of foci around GloucesterHereford and Worcester Similarly Types 25 and 35have expanded out of StaffordshireShropshire andHerefordWorcester respectively Between them types25 35 9 and 17 encompass the diversity of the majorclonal lineages of M bovis circulating in the UKAn analysis of molecular typing data from ~11500

M bovis isolates revealed that the population structureof M bovis in GB could not be explained by randommutation and drift and instead it appeared that certainstrains were increasing at a faster rate relative to others

[4] One suggestion for the lsquoclonal expansionrsquo of GBM bovis genotypes was that certain genotypes had aselective advantage over others leading to an increasein their frequency in the population [4] Supportive ofthis hypothesis several lines of evidence have suggestedthat M bovis isolates show phenotypic differences toeach other Fourier-Transform Infrared Spectroscopy(FT-IR) has been used to generate metabolic profiles ofthe 10 major spoligotype groups of M bovis isolatescirculating in GB Clustering analysis of the resultingspectra showed that the spectra could be differentiatedaccording to spoligotype indicating that strains of dif-ferent spoligotypes possess phenotypically distinct traits[5] In addition it has also been shown that type 17 iso-lates have lower incorporation rates of propionate intomembrane lipid components compared to other fieldstrains suggesting a degree of metabolic remodelling inthe type 17 lineage [6] Hence it appears that geneticdifferences across M bovis lineages may impact onphenotypic traits This latter finding may have import-ant implications for vaccine and diagnostic test develop-ment in terms of which experimental challenge strainsto test vaccines against or on influencing diagnostic testperformanceIn an attempt to better define genetic differences

across the major M bovis lineages circulating in GBthat may give rise to phenotypic differences of practicalimportance we have used a combination of genomesequencing transcriptome analyses and recombinantDNA technology The genomes of three M bovis fieldisolates were sequenced using Illumina sequencingtechnology and strain specific differences in gene ex-pression were measured during in vitro growth and inex vivo bovine alveolar macrophages (Mϕ) using awhole genome amplicon microarray Recent discover-ies of small non coding RNA within mycobacteria [78]prompted us to assess differences in sRNA expressionacross the isolates using a whole genome tiled oligo-nucleotide microarray SNPsmall base pair insertionand deletions (INDELs) and gene expression data wereoverlaid onto the genomic sequence of the fully se-quenced strain of M bovis 212297 to link observedstrain specific genomic differences with differences inRNA expression

ResultsComparative genomics of M bovis field isolates usingwhole genome sequencing and microarraysThe strains for this study were chosen to reflect thegenomic diversity of the M bovis population circulatingin GB and are listed in Table 1 M bovis strains weretyped using a combination of spoligotyping and VNTRFor each spoligotype group an isolate which possessedthe most common VNTR profile was selected so that

Golby et al BMC Genomics 2013 14710 Page 2 of 18httpwwwbiomedcentralcom1471-216414710

each chosen strain was the most representative of eachspoligotype group (Table 1) Of the four studied strains245101 and 130701 diverged earliest during descentfrom the most recent common ancestor of M bovis inGB and are more distant to strains 112101 and 130701 (Smith N personal communication) All 4 strainswere isolated from diseased cows belonging to herdswhich were taken from farms in geographically diverseareas of the countryThe genomes of the three M bovis strains 112101

245101 and 130701 were paired-end sequenced usingIllumina sequencing technology Processed sequencereads were mapped to the genome of the fully sequenced

and annotated strain 212297 [10] to identify SNPsSNPs were identified across all four sequenced M bovisstrains and their positions together with their SNPclass are listed in Additional file 1 The evolutionaryrelationships between the three sequenced strains aredepicted in Figure 1 using a phylogenetic tree and adistance matrix plot Three genome sequenced membersof the Mycobacterium tuberculosis complex M bovisBCG Pasteur M tuberculosis H37Rv and M africanumGM041182 were included in the analysis to place theM bovis strains in a wider mycobacterial context Thenumbers of SNPs between M bovis 212297 and thethree newly sequenced M bovis strains were found

Table 1 M bovis field strains used in this study

AHVLA ID International spoligotype ID [9] UK spoligotype ID VNTR ID Year of isolation UK county of isolation

112101 SB0263 17 7555331 2001 Wiltshire

212297 SB0140 9 8555331 1997 Devon

245101 SB0129 25 6554231 2001 Clwyd

130701 SB0134 35 3354331 2001 Shropshireindicates partial number of repeats

Figure 1 Whole genome SNP-based evolutionary analysis of M bovis sequenced strains (a) Phylogenetic tree with numbers above thebranches indicating the number of SNPs identified between the organism and its common ancestor (b) Distance matrix plot showing thenumber of SNPs present between selected pairs of strains

Golby et al BMC Genomics 2013 14710 Page 3 of 18httpwwwbiomedcentralcom1471-216414710

to be consistent with their predicted evolutionary distancesfrom each other Strain 112101 (type 17) is most closelyrelated to the original genome sequenced strain 212297(type 9) with 114 SNPs whereas the more distantly relatedstrains 245101 (type 25) and 130701 (type 35) have 431and 552 SNP differences respectively All four sequencedM bovis strains were more distantly related to Mafricanum (~1800 SNPs) and M tuberculosis H37Rv(~2200 SNPs)In addition to SNPs both large and small INDELs can

be inferred from NGS data although there are severalchallenges involved [11] Large sized insertions (gt500bp) are particularly difficult to identify with accuracy asthey require the de novo assembly of the reads that failto map to the reference genome and the subsequentdatabase searching with the resulting contigs Accurateidentification of large sized deletions (gt500 bp) are how-ever more easily identifiable and Additional file 2 liststhose that have been identified by NGS in the genomesof the three sequenced strains As poor coverage canconfound the identification of deletions we sought touse microarray technology to confirm the deletions iden-tified by the NGS data Genomic DNA was isolated fromall four strains labelled with fluorescent dyes andhybridised to a whole genome M tuberculosisM bovisamplicon microarray (see Methods) Table 2 lists severalLSPs that were detected across the strains usingmicroarrays Both NGS and microarray data predict thepresence of the large 68kb deletion (RDbovis(d)_0173)which encompasses genes Mb1963-Mb1971 and appearsto be specific to UK strains belonging to Type 17 thathas been described in a previous study [12] Several ofthese gene products are predicted to encode proteins in-volved in lipid metabolism but the lipid composition ofseveral type 17 isolates was found to be no different toother M bovis strains although their ability to incorpor-ate propionate into mycolic acids was found to be lower[6] A smaller 16 kb deletion specific 130701 that com-prises the 3prime end of Mb2056c Mb2055c and the 5prime end

of pkfB (Mb2054c) was also detected by both NGS andmicroarray data Due to a single base deletion Mb2056cand Mb2055c are pseudogenes in 212297 but the twogenes exist as one intact functional gene in 2145 andH37Rv (Rv2030c) The pfkB gene encodes a phospho-fructokinase homologue and is strongly immunogenic inhuman TB patients while Rv2030c encodes an erythro-mycin esterase Both pfkB and Mb2056c are members ofthe DosR regulon which are highly upregulated underanaerobic conditions and have been implicated in bac-terial persistence in vivo [13] Other smaller deletionsdetected include a deletion of a probable lipid transferprotein encoding gene Mb1699c which is specific to130701 and an aldoketo reductase encoding geneMb2320 that is specific to 112101

Linking SNPs to genes that show differential expressionamongst M bovis strains grown under vitro conditionsand in ex vivo macrophagesThe four M bovis field strains were grown to mid-logarithmic phase in pyruvate-containing Middlebrook7H9 liquid media and then used to infect bovine alveo-lar Mϕ using a multiplicity of infection (MOI) of 101(bacilli Mϕ) Mycobacterial RNA was recovered frominfected host cells 4 and 24 hrs post infection using adifferential lysis procedure and amplified using a modi-fied procedure similar to that described by van Gelderet al ([14] see Methods) As a control RNAs were alsoextracted from strains that had been incubated staticallyin RPMI cell culture media for a period of 4 hrs Toeliminate potential skewing effects on the transcriptomeresulting from the amplification process comparisonswere made only between amplified RNA generated fromsamples collected at the same time point and biologicalreplicate For the in vitro growth condition total RNAswere extracted from the four strains grown in a pyruvate-containing Middlebrook 7H9 liquid media and rolledduring incubation

Table 2 Large sequence polymorphisms present in M bovis field isolates

Mb CDS Mtb CDS Genomic postn(wrt to 212297)

Size ofdeletion (bp)

Gene(s) 212297 112101 245101 130701 Product(s)

Mb1699c Rv1627c nd nd DEL Probable lipid transfer proteinaMb1963c-Mb1971

Rv1928c-Rv1936

2172231-2179038 6807 tpx fadE17fadE18 echA13

DEL Gene products have roles inlipid metabolism

Mb2054c Rv2029c 2259798-2261457 1659 pfkB DEL Possible phosphofructokinase

Mb2055cMb2056c

Rv2030c DEL Conserved hypothetical protein(frame-shifted in 221297)

Mb2320 Rv2298 nd nd DEL DEL Aldoketo reductase

Mb3923c Rv3894c nd nd DEL Conserved membrane protein(frame-shifted in 221297)

DEL indicates genes that are deleted from strainsaMostowy et al 2005 nd not determined

Golby et al BMC Genomics 2013 14710 Page 4 of 18httpwwwbiomedcentralcom1471-216414710

The RNAs extracted from each of the growth condi-tions were converted to cyanine labelled cDNA using re-verse transcriptase and hybridised to whole genomeamplicon microarrays Using only those genes that arecommon to all four strains we found a total set of 70genes that showed a 25-fold or more difference in ex-pression in one or more strains when pairwise compari-sons were made between the transcriptomes of 212297and 112101 245101 or 130701 (Additional file 3) Asubset of these 70 genes is shown in Table 3 where keyexamples of alterations in metabolic processes areshown The numbers of genes that were found to showdifferential expression across the four strains reflectedthe evolutionary distances between 212297 and theother 3 strains Thus 112101 which is closely relatedto 212297 shows only 5 differentially expressed geneswhile the most distantly related strain 130701 shows 56genes differently expressed compared to 212297 Ofthese 56 genes 5 were specific to the in vitro conditionwhile 19 were specific to the Mϕ Ten genes were com-mon to both conditions which serves to validate thetechnical reproducibility of the RNA amplificationprocessUsing the genome sequencing information determined

for each of the four strains we attempted to correlatethe observed strain-specific differences in gene expres-sion with the presence or absence of mutations withinthe coding regions or promoters of those genes thatshow differential gene expression or in genes that areknown to regulate the activity of those genes Mb1749cand Mb1750c are two genes that are specificallyupregulated in 130701 and encode a toxin and antitoxin(TA) pair respectively belonging to type II TA systemsof the VapBC family [15] Members of VapB type toxinscontain PIN domains that cleave RNA and thus functionto control translation of mRNA transcripts [16] Thehomologous genes from strain 130701 show up to 19-and 10-fold higher levels of expression respectively thanthose of the other three strains An analysis of the cod-ing sequences of Mb1749c across all four strains re-vealed that the 130701 homologue has a unique nSNPat position 1932704 (wrt 212297 genomic sequence) aC-T transition that results in the nonconservative substi-tution of Gly19 to Asp Research has shown that TAgene pairs negatively regulate their own expressionthrough binding of the TA protein complex to the pro-moter region of the TA gene pair thus preventing accessto RNA polymerase [17] The G19D mutation inMb1749c could therefore impair the ability of the com-plex to bind to the promoter resulting in the deregula-tion of the TA gene pairMb2007c which shows a 4-fold higher expression in

130701 only encodes a transcriptional regulator of theLysR class There are two SNPs present in the coding

sequence of Mb2007c in 130701 which are absent inthe homologues of the other three strains the first is anSNP which results in the conservative substitution ofArg137 to Gln while the second is a more debilitatingnonsense SNP which ultimately leads to a protein whoselength is only 60 that of the wild-type Many regulatorsbelonging to the LysR family regulate their own expres-sion through a negative autoregulatory mechanism simi-lar to that described above for VapBC TA systems [18]A loss in protein integrity could therefore result in theregulator being unable to bind the regulatory regionleading to the observed upregulation in the expressionof this gene in 130701 As the product of this gene ispredicted to be a transcriptional regulator it was specu-lated that the regulation of gene(s) controlled by regula-tor could be affected in 130701 due to the severelytruncated form of this protein As LysR regulators areoften found to regulate genes that are divergently tran-scribed from the lysR gene it was surprisingly to findthat expression of the Mb2008 homologue in 130701which is predicted to encode a lysine transporter doesnot show any difference in expression in 130701 to212297 To define the regulon of this regulator we firstcompared the transcriptomes of 212297 transformedwith a multicopy plasmid expressing the truncated copyof mb2007c against a vector only control No differencesin expression were found (data not shown) which couldindicate that the regulator does not control any othergenes apart from itself or that experimental conditionsdid not favour the active form of the regulator LysR reg-ulators regulate expression of their regulon throughbinding of a co-inducer to the C-terminal domain andthe failure to observe any changes in gene expressioncould therefore be due to the absence of the co-inducerduring the experiment A further experiment to comparethe profiles of 212297 expressing either the truncatedor wild type forms of the protein also showed no differ-ences in expression (data not shown)Nitrite reductase catalyses the reduction of nitrite to

ammonia and is strongly expressed during growth in thepresence of nitrate or nitrite but repressed in the pres-ence of ammonia [19] The gene encoding the large sub-unit of the nitrite reductase nirB (Mb0258) showsapproximately 9-fold higher expression in 130701 com-pared to the other 3 strains in our standard ammoniacontaining 7H9 growth media suggesting that the strainhas lost regulatory control of this gene Expression ofnirB in M tuberculosis has been shown to be controlledby the response regulator GlnR [20] but an analysis ofthe sequence of the glnR orthologue from all four strainsrevealed no differences in either the coding or promotersequences A comparison of the nirB sequence from all4 strains did however reveal the presence of a singlebase (C to T) transition leading to a sSNP that is specific

Golby et al BMC Genomics 2013 14710 Page 5 of 18httpwwwbiomedcentralcom1471-216414710

Table 3 Fold change differences in gene expression in M bovis field isolates 1121 2451 and 1307 compared to 2122

Mb CDS Mtb CDS Common 112101 245101 130701 Product Assoc SNPInDel

7h9 RPMI 4hr MOslash 24 hr MOslash 7h9 RPMI 4hr MOslash 24 hr MOslash 7h9 RPMI 4hr MOslash 24 hr MOslash

Mb0038c Rv0037c 4uarr 3uarr 3uarr probable conservedintegral membrane

protein

Mb0124c Rv0120c fusA2 3uarr 3uarr probable elongationfactor

Mb0258 Rv0252 nirB 9uarr 12uarr 21uarr 9uarr nitrite reductase(large subunit)

sSNP at postn 303227 in 130701only C-T

Mb0259 Rv0253 nirD 3uarr 4uarr 5uarr nitrite reductase(small subunit)

Mb0428c Rv0420c 2uarr possibletransmembrane

protein

Mb0734 Rv0713 2uarr probable conservedtransmembrane

protein

Mb0947c Rv0923c 2darr 2darr conservedhypotheitical protein

Mb0948c Rv0924c mntH cation uptake system

Mb1013Mb1014

Rv0987 5uarr 10uarr 5uarr 5uarr 7uarr 6uarr probable adhesioncomponent transport

ABC transporter

nSNP at postn 1104263 in 245101 and130701 A-G (STOP to W) sSNP at

postn 1103991 in 245101 and 130701

Mb1015 Rv0988 3uarr 9uarr 3uarr 5uarr 5uarr possible conservedexported protein

Mb1161 Rv1130 2darr conservedhypothetical protein

Mb1162 Rv1131 gltA1 3darr probable citratesynthase

Mb1554c Rv1527c pks5 2darr probable polyketidesynthase

Mb1562 Rv1535 2uarr 5uarr hypothetical protein

Mb1619c Rv1593c 2uarr conservedhypothetical protein

Mb1749c Rv1720c 10uarr 19uarr 18uarr 12uarr conservedhypothetical protein

nSNP at postn 1932704 in130701 only C-T (G to D)

Mb1750c Rv1721c 6uarr 3uarr 9uarr 5uarr conservedhypothetical protein

Mb1833c Rv1804 3darr conservedhypothetical protein

Mb1834c Rv1805c 4uarr hypothetical protein

Golby

etalBM

CGenom

ics201314710

Page6of

18httpw

wwbiom

edcentralcom1471-216414710

Table 3 Fold change differences in gene expression in M bovis field isolates 1121 2451 and 1307 compared to 2122 (Continued)

Mb1835 Rv1806 PE20 3uarr PE family protein

Mb1885c Rv1854c ndh 3darr probable nadhdehydrogenase

Mb1914c Rv1882c 6uarr 3uarr 6uarr probable short-chaintype dehydrogenase

reductase

sSNP at postn 2122970 in241501 only C-T

Mb2007c Rv1985c 4uarr 3uarr 3uarr probabletranscriptional

regulatory protein

nSNP at postn 2208296 in 130701only G-A (Q to Stop)

Mb2015c RV1992c ctpG 2darr 3darr probable cationtransporter

Mb2420c Rv2398c cysW 2darr 6darr sulphate transporter

Mb2421c Rv2399c cysT 2darr 4darr sulphate transporter

Mb2607 Rv2577 9darr 5darr 7darr 12darr 10darr 7darr 7darr 13darr conservedhypothetical protein

[first part]

Mb3194 Rv3169 3uarr 3uarr 2uarr 3uarr conservedhypothetical protein

Mb3477c Rv3447c 8darr 3darr 10darr 9darr 2darr 7darr probable conservedmembrane protein

nSNP at postn 3812465 in 245101and 130701 T-C (S to G)

Mb3563c Rv3533c PPE62 5uarr 5uarr ppe family protein

Mb3721c Rv3696c glpK 3darr glycerol kinase

Mb3803 Rv3774 echA21 8uarr 7uarr 22uarr possible enoyl-coAhydratase

sSNP at postn 4155803 in241501 only G-A

Up and down arrows indicate fold up- and downregulation respectively and empty cells indicate no change in expression

Golby

etalBM

CGenom

ics201314710

Page7of

18httpw

wwbiom

edcentralcom1471-216414710

to 130701 It was not readily apparent why a sSNP in thecoding sequence of a gene should lead to an increase inexpression of that gene but there are several reports thatshow sSNPs leading to changes in stability of mRNA tran-scripts [2122] Rv0987 and Rv0988 of M tuberculosisH37Rv encode part of an ABC transporter and a putativesecreted hydrolase respectively In 212297 a single basetransition (G-A) introduces a stop codon that splitsRv0987 into the two pseudogenes Mb1013 and Mb1014Previous microarray based gene expression studies by ourgroup have shown that Rv0987 and Rv0988 in M tubercu-losis show higher levels of expression than the orthologousMb1013Mb1014 and Mb1015 respectively in M bovis212297 [23] and in the present study the Mb1013Mb1014 and Mb1015 homologues in 245101 and 130701 also showed higher expression (up to 10-fold) than thehomologues in 212297 and 112101 Comparing thesequences of Mb1013Mb1014 and Mb1015 across all 4strains indicated that strains that show high expressionhave the lsquoGrsquo alleleMb3477c encodes an ATP binding membrane protein

part of the Esx4 secretion system [24] and gene shows upto 10-fold higher expression in 245101 and 130701 com-pared to 212297 The gene also contains an A to C transi-tion at position 3812465 a nSNP at position resulting inthe non-conservative substitution of a serine to a glycineresidue

Of the 19 genes that show specific differential expres-sion in the Mϕ the most notable are Mb1914c andechA21 which show upregulation in 245101 only (upto 6- and 23-fold respectively) Both genes encodeproteins that could be involved in lipid metabolism andboth genes contain single sSNPs that are present in245101 but absent in the other three strainsReal time RT-PCR was used to verify a selection of

genes that showed differential gene expression as pre-dicted by the microarray analysis Figure 2 comparesthe fold changes in the expression levels of 4 genes asmeasured by microarray and real time RT-PCR ThenirB and Mb1749c genes were selected because theyshowed strong upregulation in 130701 in both in vitroand ex vivo Mϕ while Mb1914c and echA21 were chosenbecause the array data predicted them to be specificallyupregulated in 245101 and only in ex vivo Mϕ Foreach of the 4 genes the strain dependent pattern ofexpression as measured by real time RT-PCR was consistentwith that measured by microarray although the foldchanges measured by real time RT-PCR were higherthan those measured by microarray

Functional analysis of SNP role in differential geneexpressionThe above data showed that many of the strain specificdifferences in gene expression were linked to the presence

Figure 2 Confirmation of amplicon microarray results with real time RT-PCR The fold changes in gene expression for (a) Mb1750c (b) nirB(c) echA21 and (d) Mb1914c measured by microarray (open bars) in each of the four strains were compared to that measured by real time RT-PCR(closed bars)

Golby et al BMC Genomics 2013 14710 Page 8 of 18httpwwwbiomedcentralcom1471-216414710

of synonymous or non-synonymous SNPs located withinthe coding regions of the genes that show variable expres-sion Non-synonymous SNPs lead to changes in aminoacid sequence which can lead to changes in protein func-tion The C to T transition at position 1932704 (wrt 212297) in the coding sequence of the 130701 Mb1749chomologue leads to the non-conservative substitution ofGly19 to Asp and this nSNP appears to be linked to theupregulation of both Mb1749c and Mb1750c in thatstrain In order to confirm this a 09 kb DNA fragmentcontaining the Mb1749c-Mb1750c-lsquoMB1751c region of130701 (containing the lsquoTrsquo allele) and the equivalentregion from 212297 (with the lsquoCrsquo allele) were PCRamplified and the fragments were cloned separatelyinto the mycobacterial shuttle vector pKINT (see Methods)to create the constructs pPG107 and pPG106 respectivelyThe two constructs were introduced into Mycobacteriumsmegmatis mc2155 separately and then the expression ofMb1749c and Mb1750c in M smegmatis pPG101 wascompared to that of M smegmatis pPG102 using real timeRT-PCR Table 4 shows that the expression levels ofMb1750c and Mb1749c in the strain expressing themutated forms of Mb1749cMb1750c are 13- and 9-foldhigher respectively compared to the strain expressingthe wild type forms confirming that this SNP is respon-sible for the observed up-regulation of the two genes in130701Synonymous substitutions do not lead to changes in

protein sequence and have generally been considered tobe lsquosilentrsquo or benign Recent studies however have sug-gested that sSNPs can have functional effects such asdecreased mRNA stability and translation [2122] Inour own studies we have found several genes whose ex-pression levels correlate with the presence of sSNPs inthe coding regions of those genes For example a C-Ttransversion at position 303227 (wrt 212297) withinthe coding sequence of nirB of 130701 is a sSNP thatappears to be linked with the upregulation in expressionof nirB within that strain To confirm that this is thecase we PCR amplified 35 kb DNA fragments containing

the hsp-nirB-nirD-cobU region of strain 130701 (containingthe lsquoCrsquo allele) and the equivalent region from strain 212297(with the lsquoTrsquo allele) and cloned them separately into theintegrating vector pKINT to create the constructspPG108 and pPG109 respectively These constructswere introduced into 212297 and the expression levelsof nirB and nirD were found to be 30- and 2-fold re-spectively higher in the strain expressing the mutatednirBD locus compared to the strain expressing the wild-type form (Table 4) This confirms that this mutation isresponsible for the upregulation of the two genes in thisstrain

Use of a high density tiled oligonucleotide microarray todetect differentially expressed small RNA transcripts inM bovis isolatesThe M tuberculosisM bovis amplicon arrays used inthe present study were specifically designed to measureexpression levels of genes annotated in the genomic se-quence of M bovis 212297 [10] They were not how-ever designed to monitor the expression of non-codingRNA such as small RNA within intergenic regions orantisense sRNA Hence a high density tiled oligonucleotidemicroarray consisting of approximately 180000 partially-overlapping (10-base overlap) short 60 mer oligonucleo-tides was designed that offered an unbiased approach tothe detection of strand specific transcripts encoded overthe entire M bovis 212297 chromosome Total RNAthat includes small sized (lt100 nt) RNA species wasextracted from the four M bovis strains that had beengrown in liquid media and hybridised to the oligo-nucleotide microarray To avoid potential secondarystrand synthesis during cDNA synthesis which couldbe interpreted as sRNA the RNA was directly labelledwith cyanine based dyes After pairwise comparisonswere performed between 212297 and 112101 245101or 130701 220 oligonucleotide probes were identifiedthat detected differentially expressed transcripts (25 foldcut off ) in one or more of the three strains (Additionalfile 4) Only transcripts detected by multiple (2 or more)overlapping probes were regarded as genuine transcriptsas those detected by single probes could be due to cross-hybridisation effects or represent spurious transcriptsUsing these criteria 26 transcripts designated T1-T26were found to show differential expression in one or moreof the strains (Table 5) and those transcripts can be di-vided into those that are encoded within intergenic re-gions and those encoded within the genomic co-ordinatesencompassing annotated coding sequences Comparisonof the differentially expressed gene lists identified usingamplicon vs oligonucleotide arrays (Tables 3 and 5) itis clear that many of the transcripts detected using theamplicon arrays are not necessarily encoded on thesense gene strand as had been previous interpreted For

Table 4 Confirmation of SNP linkage to upregulation ingene expression

Strain Gene Fold change

M smegmatis mc2155 (pPG101) Mb1750c 161 plusmn 32

M smegmatis mc2155 (pPG101) Mb1749c 125 plusmn 12

M smegmatis mc2155 (pPG102) Mb1750c 19 plusmn 08

M smegmatis mc2155 (pPG102) Mb1749c 14 plusmn 04

Mb 212297 (pPG108) nirB 20 plusmn 10

Mb 212297 (pPG108) nirD 13 plusmn 05

Mb 212297 (pPG109) nirB 529 plusmn 159

Mb 212297 (pPG109) nirD 31 plusmn 08

Fold changes are the mean ratios plusmn standard deviation

Golby et al BMC Genomics 2013 14710 Page 9 of 18httpwwwbiomedcentralcom1471-216414710

Table 5 Differential expression of RNA transcripts as detected by a tiled oligonucleotide microarray

Transcript No Probes Position Size cds Strand 112101 245101 130701 CDS Product SNP

T1 3 303138-303295 157 Mb0258nirB A 3uarr nitrite reductase (large sub-unit)

T2 5 303260-303515 255 Mb0258nirB S 5uarr nitrite reductase (large sub-unit) sSNP at 303227 C-T in 130701 only

T3 2 1105085-1105193 108 Mb1014 A 3uarr 3uarr probable adhesion componentof ABC transporter

T4 2 1105158-1105315 157 Mb1014 A 3uarr 3uarr probable adhesioncomponent of ABC transporter

T5 3 1105158-1105315 157 Mb1014 S 3uarr 3uarr probable adhesioncomponent of ABC transporter

T6 5 1778876-1779131 255 Mb1618c A 9uarr possible secreted lipase sSNP at 1778879 C-T in 112101 only

T7 2 1840174-1840282 108 Mb1672c S 4darr hypothetical protein

T8 3 1932158-1932315 157 tRNA-Pro S 3uarr proline tRNA

T9 5 1932452-1932707 255 Mb1749c S 4uarr toxin component oftoxin-antitoxin system

nSNP at 1932704 C-T (G to D) in 130701 only

T10 5 1932746-1933075 329 Mb1750c S 7uarr anti-toxin component oftoxin-antitoxin system

T11 6 2122959-2123263 304 Mb1914c A 12uarr probable short-chain typedehydrogenasereductase

sSNP at 2122970 C-T in 245101 only

T12 2 2324463-2324571 108 Mb2110 S 3uarr hypothetical protein(frame-shifted in 212297)

T13 2 2328944-2329052 108 Mb2116cpepE S 2darr 3darr probable dipeptidase

T14 2 2329483-2329591 108 Mb2117 A 10darr 10darr 5prime-3prime exonuclease nSNP at 2329583 A-G (comp strand I to T)in 245101 only

T15 3 2868310-2868467 157 Mb2607 A 6darr 8darr hp (frame-shifted in 212297)

T16 2 2868506-2868614 108 Mb2607 A 7darr 7darr hp (frame-shifted in 212297) nSNP at 2868616 A-G (stop to W) in245101 and 130701

T17 2 3075740-3075848 108 I 5uarr 6uarr

T18 2 3075887-3075995 108 I 5uarr 6uarr

T19 2 3076622-3076730 108 I 4uarr 4uarr

T20 2 3076769-3076877 108 I 5uarr 5uarr

T21 2 3078728-3078836 108 I 6uarr 7uarr

T22 2 3079169-3079277 108 I 4uarr 6uarr

T23 2 3079903-3080011 108 I 6uarr 7

T24 2 3844577-3844685 108 Mb3509c S 3uarr 3uarr possible acyltransferase(frame-shifted in 212297)

T25 6 4155501-4155805 304 Mb3803echA21 A 5uarr possible enoyl-coA hydratase sSNP at 4155803 C-T (comp strand) in 245101 only

T26 2 4316987-4317095 108 Mb3927c A 3uarr chp (frame-shifted in H37Rv)

Up and down arrows indicate fold up- and downregulation respectively and empty cells indicate no change in expressionStrand A indicates antisense S sense and I intergenic

Golby

etalBM

CGenom

ics201314710

Page10

of18

httpwwwbiom

edcentralcom1471-216414710

example the amplicon array data had appeared to suggestthat Mb1914c and echA21 were upregulated in 245101but the oligo array data indicates that transcripts 11 and25 which are encoded within the co-ordinates encoded bythose two genes are actually encoded on the antisensestrands This apparent discrepancy can be rationalisedonce we consider that double stranded amplicon micro-array probes cannot discriminate between transcriptsencoded on the sense or antisense strands Transcripts 11and 25 can therefore be considered as potential antisensesRNAs (asRNA) which could be involved in translationalor post-transcriptional control of the sense transcriptOther potential cis-encoded sRNAs detected using the ar-rays include T6 T14 and T15T16 which are encoded onthe antisense strands to Mb1618c Mb2117 and Mb2607respectively and for each of these transcripts their expres-sion appears to be linked to the presence of a single SNPwithin the co-ordinates of the genes The approximateboundaries of these transcripts can be derived from thegenomic co-ordinates of the oligonucleotide probes thatdetect the expression of the transcript Thus the tran-scripts appear to be between 100ndash300 nt in size and thepositions of the linked SNPs appear to be positioned eitherjust upstream or within the predicted 5primeend of the tran-scripts (Figure 3) Three of the transcripts (T11 T14 andT25) are antisense to the central part of the sense encodedgene while T6 is encoded antisense to the 5prime end ofMb1618c As well as antisense transcripts we also saw thedifferential expression of sense transcripts The ampliconmicroarray data (confirmed by real time RT-PCR) indi-cated that nirB is strongly upregulated specifically in130701 in both in vitro and ex-vivo Mϕ An analysis ofthe oligonucleotide array data however indicates thatthere are two short transcripts T1 and T2 (sense and anti-sense respectively) that are encoded within the genomicco-ordinates of the nirB gene T2 is the longer in size (255vs 155nt) and more highly expressed (5 vs 3-fold) thanT1 and both transcripts appear to be linked to the pres-ence of a SNP that is located within the middle of T1 andapproximately 50nt upstream of T2Some of the transcripts are bone fide gene sense

strand mRNA transcripts such as T9 and T10 whichare encoded by Mb1749c and Mb1750c respectivelyAlthough it would appear that the two genes are tran-scribed separately it is probable that the two transcriptsare co-transcribed as the stop codon of Mb1750c over-laps the start codon of Mb1749c Eight of the tran-scripts listed in Table 5 are encoded within intergenicregions 7 of which are encoded within the polymorphicdirect repeat (DR) locus The DR locus of strains belong-ing to the M tuberculosis complex has been suggested toconstitute a CRISPR locus which have been shown inmany species of bacteria to be involved in protectionagainst exogenous foreign DNA such as plasmids and

phage [25] All the DR encoded transcripts are short(approx 100 nt) straddle contiguous repeat and spacersequences and show approximately 5-fold higher levelsof expression in 245101 and 130701 compared with212297 and 112101

Characterisation of differentially expressed cis asRNAThe genomic co-ordinates of the oligonucleotide probesthat detected the antisense species described above canonly serve as approximate estimations as to their startand end points Thus we used 5prime RLM-RACE (RNALigase Mediated Rapid Amplification of cDNA Ends) inan attempt to accurately define the transcriptional startsites (TSS) for the short sense transcript T2 and the anti-sense transcripts T6 T11 and T25 described in the abovesection (see Methods) These transcripts were chosen astheir expression levels are high and their transcript lengthswere considered to be sufficiently long to enable theRLM-RACE methodology to work Table 6 details thesizes of the PCR products obtained after RLM-RACE wasperformed using oligonucleotide primers designed tosequences predicted for transcripts T6 T11 and T25 NoPCR product was obtained for transcript T2 For each ofthe three transcripts the TSS was determined to be a Gresidue which is the most commonly used residue typefor mycobacterial TSSrsquos (Figure 4) [26] For each of the T6T11 and T25 transcripts expression of the asRNAs waslinked to the presence of a SNP (C to T) proximal tothe 5prime end of the asRNA Strains exhibiting the lsquoCrsquo alleleshowed no expression of the asRNA whilst the strainthat showed expression had the lsquoTrsquo allele An analysis ofthe nucleotide sequence in the vicinity of the SNPsreveals that for each of the three transcripts the SNPconstitutes the 6th residue of a motif that has stronghomology to the consensus sequence for the minus10 elementof Group A mycobacterial promoters (Figure 4) [26] Thefinding that a lsquoTrsquo residue is associated with expression isconsistent with the consensus sequence which indicatesthat 86 of all minus10 elements have a lsquoTrsquo residue at the 6thresidue position Several residues that flank the minus10 motifalso show a degree of conservation Sequence motifswhich show strong homology to group A minus35 elementsare present 18ndash19 bp upstream of the putative minus10elements and the distances between the minus35 -10 andTSS elements are consistent with those elements ofthe consensus sequence No protein encoding openreading frames were detected within the T6 T11 andT25 transcriptsIn a parallel study high density oligonucleotide micro-

arrays were also used to interrogate the transcriptomesof M tuberculosis H37Rv M bovis BCG Pasteur Myco-bacterium caprae and M bovis AN5 that had beengrown in Middlebrook 7H9 media As a result of theseexperiments two asRNA species were found to be

Golby et al BMC Genomics 2013 14710 Page 11 of 18httpwwwbiomedcentralcom1471-216414710

expressed within the antisense strands of the ino1 andnarH genes of M tuberculosis H37Rv but not in any ofthe other 3 strains tested (data not shown) A compari-son of nucleotide sequences of the orthologous genesacross the species suggested that expressions of theas_sRNAs correlated with the presence of a sSNP (C to T

transition at positions 50555 and 1292100 wrt H37Rvgenomic sequence for as_ino1 and as_narH respectively)upstream of the asRNAs Approximate information re-garding the transcriptional start site was deduced from thebinding co-ordinates of the probes that detected the tran-scripts As with the M bovis antisense sRNAs described

Figure 3 Expressions and schematic representation of genomic locations of selected cis-encoded antisense sRNAs identified using atiled oligonucleotide microarray Three asRNAs (open arrows) are (a) T6 (b) T14 and (c) T25 For each asRNA a histogram plots the foldchanges for each of the oligonucleotide probes that detected the asRNA and for each probe the binding position relative to the 212297genome is indicated Closed and open arrows indicate lengths and direction of transcription of genes and asRNAs respectively

Golby et al BMC Genomics 2013 14710 Page 12 of 18httpwwwbiomedcentralcom1471-216414710

above the T residue associated with the expression ofthe M tuberculosis asRNAs is part of a putative minus10element A minus35 element with an appropriate spacing tothe minus10 element was identified for as_ino1 but not foras_narH suggesting that the as_narH promoter maybelong to group B mycobacterial promoters that havea conserved minus10 but no minus35 motif [26]

DiscussionsThe aim of this work was to define possible phenotypicvariation across M bovis field isolates through a combin-ation of genome sequencing comparative genomics andtranscriptome analyses from both in vitro and ex vivo con-ditions Using these approaches we uncovered a range ofnovel findings the most striking of which was the realisa-tion that genes that had been predicted to be differentiallyexpressed based on amplicon-microarray data were in fact

not upregulated and that instead it was an antisense tran-script that was showing differential expression Analysingboth transcriptome and genome sequence data allowed usto identify SNPs responsible for the transcription of anti-sense RNAs with generation of a consensus minus10 promotersequence the likely mechanism Our results suggest thatdata generated from amplicon arrays in the past may needto be revisited as it is possible that some coding-sequences identified as being differentially expressed wereinstead antisense transcriptsWith the growth of technologies such as high density

tiled oligonucleotide microarrays and next generationsequencing there has been a rapid increase in the num-ber of reports describing the existence of non-codingRNAs (ncRNAs) in bacteria Non-coding RNAs broadlyconsist of two types cis- and trans-encoded RNA TransRNA includes intergenic encoded RNA while cis-encodedRNA includes 5prime and 3prime untranslated regions of mRNAand antisense RNA To study the expressions of both cis-and trans encoded ncRNA we used a high density oligo-nucleotide tiled microarray since our amplicon microarraywas unable to detect intergenic transcripts or differentiatebetween sense and antisense transcripts Previous studiesusing M tuberculosis have identified substantial amountsof ncRNA encoded in both intergenic and intragenic re-gions [78] We detected substantial amounts of ncRNAs

Figure 4 Promoters of anti sense RNAs a Promoters of the asRNAs as_mb1618c as_1914c and as_echA21 -10 and minus35 elements areindicated in bold and italics Transcriptional start sites are indicated by large font G characters while SNP residue that leads to the expression ofthe asRNA is indicated by a large font red T residue The consensus sequence for group a mycobacterial promoters is indicated Numericalsubscripts indicate the percentage of the total number of promoters for which a transcriptional start site has been experimentally determinedthat show the indicated residue b Promoters of the differentially expressed asRNAs as_ino1 and as_narH in M tuberculosis -10 and minus35 elementsare indicated in bold italics The red residue indicates SNP responsible for differential expression

Table 6 Determination of transcriptional start sites ofantisense RNAs

Transcript asRNA PCRprod size

TSSresidue type

TSS residue postn(wrt 212297)

T6 mb1618c_as 150 G 1778886

T11 mb1914c_as 220 G 2122978

T25 echA21_as 175 G 4155796

Golby et al BMC Genomics 2013 14710 Page 13 of 18httpwwwbiomedcentralcom1471-216414710

in M bovis including many instances of cis-antisenseRNA species Due to their perfect complementarity cisasRNA form a duplex with the sense strand encodedtranscript resulting in either degradation [27] or transla-tion inhibition [28] of the sense mRNA AntisenseRNAs vary in length ranging from 10s to 1000primes ofnucleotides and can be classified according to theirencoded position with respect to the opposite senseencoded gene Thus they can be classified as 5prime or 3primeoverlapping while others are classified as internallylocated The 112101 specific as_Mb1618c is an exampleof a 5prime overlapping asRNA which is encoded antisense togene Mb1618c which is predicted to express a secretorylipase The location of the asRNA transcript suggests itmay function to prevent translation of Mb1618c mRNAby steric hindrance of the ribosome binding siteIn the work presented here strain 245101 expressed

two asRNAs as_Mb1914c and as_echA21 that are notexpressed by any of the other 3 strains They areencoded within the central part of the opposite genesand are therefore likely to modulate the stability of thetranscripts Mb1914c encodes a short chain dehydro-genase while echA21 encodes an enoyl-CoA hydrataseShort chain dehydrogenases catalyse a wide range offunctions so the precise function and identity of thesubstrate is difficult to deduce from sequence aloneEnoyl-CoA hydratases hydrate double carbon-carbonbonds of macromolecules and are vital in the metabol-ism of fatty acids Both gene products would thereforeappear to be involved in the metabolism of a macromol-ecule and their similar expression profiles in this straincould indicate involvement in the metabolism of the samemolecule or molecules that are of the same pathwayIn many instances upregulation of asRNA negatively

correlates with the transcription of the antisense gene[27] but in many cases expression of the antisense tran-script has no effect on the transcription of the oppositegene In our studies expressions of as_Mb1618cas_Mb1914c and as_echA21 did not appear to have anyeffect on the expressions of the opposite sense encodedgenes (data not shown) We have shown that theexpressions of the asRNAs are associated with the pres-ence of SNPs which are either synonymous or non-synonymous with respect to the sense transcript butupstream of the asRNA transcriptional start site Thishighlights the fact that mutations can potentially affectexpression of transcripts on both strands and that theclassification of a SNP is strand dependent For each ofthe three asRNAs the associated SNP was found to belocated within a putative minus10 promoter motif of groupA mycobacterial promoters The sixth residue of the minus10hexamer motif consensus sequence is a strongly conservedlsquoTrsquo residue which is present in 81 of all group A myco-bacterial promoter elements Its importance is underlined

by the finding that the strains that exhibit a lsquoCrsquo residue atthis position show no detectable expression of the asRNAwhile strains having a lsquoTrsquo residue at this position exhibitexpressionSingle nucleotide polymorphisms were found to be

the most frequent form of genetic variation that existsbetween the isolates with a total of 1013 SNPs detectedacross the three strains 112101 245101 and 130701compared to the reference strain M bovis 212297Non-synonymous SNPs which include both non-senseand missense SNPs are a class of SNPs most likely toimpact on protein function and contribute to phenotypicvariation Non-sense SNPs which results in the expressionof a truncated polypeptide due to the introduction of apremature stop codon were identified in five genes acrossthe strains Of these we focussed our attentions on thenon-sense SNP present in a gene encoding the LysR regu-lator Mb2007c as mutations affecting regulators are likelyto impact on the expression of one or more genes that arepart of the regulon of the regulator and are therefore morelikely to result in phenotypic variation Experiments tocompare the transcriptomes of a strain that exhibited themutation with a strain overexpressing a functional regula-tor did not however reveal any differences The reasonfor this unclear but could reflect a requirement of theregulator for a co-inducer that was absent under the con-ditions of the experiment The consequences of missenseSNPs are more difficult to predict as substitutions of oneamino acid for another in a protein sequence do not ne-cessarily lead to a change in protein function Howeverfor genes that are controlled by an autoregulatory mech-anism a mutation that affects the ability of the product ofthe gene to regulate itself will result in a change in expres-sion of the gene In our studies we have shown that thepresence of a missense mutation in a VapB type toxin en-coding gene Mb1749c in strain 130701 results in theupregulation in expression of the toxin-antitoxin encodingpair of genes Mb1750c-Mb1749c due to the inability ofthe encoded proteins to self regulate themselves Toxin-antitoxin systems have a variety of proposed cellular func-tions including general regulation of mRNA stability levelsin the cell [29] Further experiments are required to fullyunderstand the consequences of this mutationGenomic deletions have played an important role in

the evolution of strains belonging to the mycobacterialcomplex [30] and in the derivation of the tuberculosisvaccine strain M bovis BCG [31] In addition to the previ-ously described 68 kb gene deletion that is specific tostrains having a spoligotype 17 pattern [6] we have identi-fied a 16 kb multi-gene deletion that is specific to strain130701 and encompasses genes that are part of the DosRregulon However one of the deleted genes exists as apseudo gene in DosR in strain 212297 so its importanceto the biology of M bovis is unlikely to be significant

Golby et al BMC Genomics 2013 14710 Page 14 of 18httpwwwbiomedcentralcom1471-216414710

Several other genes with internal deletions were detectedbut none of the encoded proteins have any significantsimilarity to any protein with a defined function

ConclusionsIn conclusion we have performed a comprehensive ana-lysis of 4 M bovis strains of the most common moleculartypes circulating in GB We show that while these strainsshow extensive similarities in their genetic make-up andgene expression profiles they show distinct differences inthe expression of a subset of genes We provide functionaldata to show that SNPs can lead to the expression of anti-sense RNA a finding with implications for how we definea lsquosilentrsquo nucleotide change Furthermore we show thatthe interpretation of transcriptome data based solely onamplicon arrays could lead to artefacts due to expressionof antisense transcripts a caveat that needs to be kept inmind for previous studies of global expression analysis inbacteria

MethodsBacterial strains media and growth conditionsFor the Mϕ infection experiments bovine alveolar Mϕwere cultivated in tissue culture media R10 whichconsisted of RPMI (Invitrogen) media plus 2 mM glu-tamine 10 calf fetal serum and 1 amphotericinWhere used antibiotics gentamycin and ampicillinwere added at concentrations of 50 and 100 μg mlrespectively M bovis field strains were pre-grown inMiddlebrook 7H9 broth supplemented with 10 albumin-dextrose catalase (ADC Difco) 005 Tween and 10 mMpyruvate Cultures were harvested in mid-logarithmicphase (OD600 of 03-08) washed and then resuspended inRPMI containing 005 Tween 80

Isolation of bovine alveolar macrophages and infectionwith mycobacteriaThe lungs of a 6ndash8 week old male Holstein-Friesian calfwere removed and a whole lung lavage procedure wasperformed to washout the alveolar Mϕ Briefly 4ndash5 x500 ml aliquots of Hanksrsquo Balanced Sterile Salts solution(HBSS) were used to infuse the lungs via the tracheaand the washings were pooled in a sterile beaker The Mϕcells contained in the washes were pelleted by centrifuga-tion at 500 x g for 10 mins at 4degC washed and thenresuspended in R10 growth media supplemented withantibiotics (R10+) to a concentration of 1ndash2 x 107 mlApproximately 05-15 x 109 Mϕ were isolated per calflungVented 225 cm2 tissue culture flasks containing R10+

media were seeded with 3ndash4 x 107 alveolar Mϕ andplaced in a humidified 37degC incubator containing 5CO2 Typically 2ndash4 flasks were used per strain and timepoint After 24 hrs the growth media was decanted to

remove non-adherent cells and then replaced with freshR10+ media After a further 24 hrs the growth mediawas discarded and the Mϕ monolayer was washed withRPMI to remove traces of the antibiotic containinggrowth media The monolayer was then covered withR10 media without antibiotics (R10-) and then infectedwith mid-logarithmic phase grown mycobacteria usingan MOI of approximately 101 (bacilli Mϕ) The AlvMϕwere incubated with mycobacteria for 4 hrs after whichthe cell monolayers were washed with RPMI and theneither processed for RNA extraction (4 hr time point)or incubated in fresh R10+ media for a further 20 hrsbefore being processed for RNA extraction (24 hr timepoint)

Extraction and amplification of mycobacterial RNA frominfected macrophagesMϕ cell monolayers were lysed using a guanidiniumthiocyanate (GTC) containing solution The lysed Mϕrsquoswere vortexed and passed twice through a 21G bluntended needle to sheer host genomic DNA and therebyreduce the viscosity of the solution Mycobacterial cellswere then pelleted by centrifugation at 4600 rpm for 20mins at room temperature and washed with GTC solu-tion to remove host genomic DNA Cells were thenresuspended in Trizol and RNA was extracted using theprotocol outlined in Bacon et al [32] The amount ofpurified DNase-treated RNA recovered was of the order100ndash500 ng per time point RNA was amplified usingthe lsquoMessageAmp II-Bacteria RNA Amplification Kitrsquo(Ambion) according to the manufacturersrsquo instructionsUsing an input of 100 ng of unamplified RNA 20ndash100μg of amplified RNA was recovered

Amplicon microarray analysisFor the in vitro growth experiments three independentexperiments (biological replicates) were carried out andfor each strain in each experiment two microarrays (tech-nical replicates) were performed Thus for each strain 6microarrays were performed Three independent AlvMϕinfection experiments were carried out and for eachexperiment two microarrays were performed for each ofthe control RPMI samples and the 4- and 24 hrs post-infection samples Cy5 and Cy3 fluorescently-labelledprobes were synthesised from RNA and genomicDNA respectively and hybridised to whole genomeM bovis M tuberculosis microarrays The array designis available in BμGSbase (accession number A-BUGS-31httpbugssgulacukA-BUGS-31) and also ArrayExpress(accession number A-BUGS-31) Details of probe synthesishybridization conditions and manufacture of the micro-array can be found in Golby et al 2008 Microarrayswere scanned using an Affymetrix 428 Microarrayscanner and scanned images were quantified using

Golby et al BMC Genomics 2013 14710 Page 15 of 18httpwwwbiomedcentralcom1471-216414710

BlueFuse for Microarrays v32 software (BlueGnome)See Golby et al 2008 for further detailsNormalisation was performed by dividing the log ratio

of the Cy5 to Cy3 signal for every spot by the median ofthe log ratios for all spots except control spots A medianabsolute (MAD) scale transformation was applied to thenormalised data from the pas an additional normalisationstep For every microarray duplicate spots were averagedand then the average expression of every gene across alltechnical replicate microarrays was calculated Averagesof the three biological replicates were used to comparegene expression between strains For each gene a mod-erated t-test was applied and those genes with a P- valueless than 005 were selected From this gene list thosegenes whose average expression differed by more than25-fold between strains were selected Fully annotatedmicroarray data have been deposited in BμGSbase (acces-sion number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (accession number E-BUGS-150)

Oligonucleotide microarray analysisExperiments were performed in a similar way to thatdescribed for the in vitro amplicon array experimentsexcept that the RNA was purified using the mirVanamiRNA Isolation kit (Ambion) which is designed tocapture small (gt20 nt) RNA species RNA and genomicDNA were directly labelled with Cy5- and Cy3 respect-ively using the ULS microRNA labelling kit (Kreatech)according to the manufacturerrsquos instructions PurifiedCy5- and Cy3-labelled probes were co-purified andapplied to an Agilent 40K custom made tiled (10 ntoverlap) 60-mer oligonucleotide microarray designed tothe genomic sequence of M bovis 21229797 The arraydesign is available in BμGSbase (accession numberA-BUGS-52 httpbugssgulacukA-BUGS-52) and alsoArrayExpress (accession number A-BUGS-52) Microarrayswere hybridised at 65degC for 18 hrs and then washed inWash Buffer 1 (Agilent) at room temperature for 1 minuteThe slides were then washed in Wash Buffer 2 (Agilent) at37degC for 1 minute dried and then scanned at 2 μm usingan Agilent DNA microarray scannerTiling array data was analysed using the Limma package

of RBioconductor [33] The signal median was quan-tile normalised between arrays followed by a LOESSnormalisation within arrays Differential expressionanalysis was performed by pairwise comparison usinglinear models and empirical Bayes methods and Pvalues adjusted using the Benjamini and Hochbergsmethod to control for multiple testing Fully annotatedmicroarray data have been deposited in BμGSbase(accession number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (accession num-ber E-BUGS-150)

Whole genome sequencing of M bovis field isolatesWhole genome paired end (2 x 76bp) sequencing wasperformed using Illumina HiSeq machines at theWellcome Trust Sanger Institute (Hinxton Cambridge)Raw sequence data was uploaded to the European Nucleo-tide Archive (ENA) and can be downloaded at httpwwwebiacukena accession numbers ERX006616 ERX06617ERX012284 and ERX012286 The FastQC (httpwwwbioinformaticsbabrahamacukprojectsfastqc) programwas used to analyse the quality of the raw data reads Rawsequence data was trimmed to remove adapter sequencesand nucleotides where the sequence quality score wasbelow 20 Filtered reads were aligned to the M bovis refer-ence strain 212297 [10] with the SMALT alignmentprogram (httpwwwsangeracukresourcessoftwaresmalt)using default settings The average mean coverages for thesequenced strains 112101 245101 and 130701 were293 224 and 234 respectively Reads that did not maponto the reference genome were de novo assembled andblasted against the NCBI data base in order to find regionspresent in these strains but absent in 212201 Variantcalling and the generation of consensus sequenceswere carried out using the SAMtools program suite(httpsamtoolssourceforgenet) SNPs that had a mini-mum of quality score of 200 and had a minimum of fourgood quality forward (reverse) reads covering the SNP siteand having the variant base were retained The number offorward (reverse) reads mapping with good quality ontothe SNP site having the same base as in the reference hadto be less than 5 of the total number of forward (reverse)reads mapping with good quality onto the SNP site havingthe variant base Sequences of 12 bases in length centredaround SNP sites were blasted to the genome of the refer-ence strain 212297 and any SNP within an area with a90 hit to another area of the genome were filtered outThis eliminated most of the SNPs within the repetitivePE-PGRS regions For all sequenced strains more than999 of the genomes were covered by reads

Real time RT-PCRQuantitative real-time SYBR Green based PCR (qRT-PCR)experiments were performed using a RotorGene 3000(Corbett research) as described by Golby et al [23] Foldchanges were calculated using relative standard curvemethod and pcr controls included no template and noreverse transcriptase Primer pair sequences are given inAdditional file 5 available with the online version ofthis paper

Construction of the Rv1749c-Rv1750c and nirBDoverexpressing plasmidsThe Rv1749c-Rv1750c overexpressing plasmids pPG106and pPG107 were constructed by PCR amplification of a907bp fragment encompassing Rv1749c-Rv1750c-lsquoRv1751c

Golby et al BMC Genomics 2013 14710 Page 16 of 18httpwwwbiomedcentralcom1471-216414710

using primers tox_f and tox_r For pPG106 the fragmentwas amplified using 21229797 genomic DNA as a tem-plate while pPG107 was amplified using 130701 gDNABoth fragments were digested with SpeI and cloned intothe SpeI cut mycobacterial attP-integrating shuttle vectorpKINT (a gift from Douglas Young Imperial CollegeLondon) Plasmids pPG108 and pPG109 which contain a35 kb hsprsquo-nirB-nirD-lsquocobU fragment was constructed inseveral steps Firstly two 16 kb hsprsquo-nirBrsquo PCR fragmentswere PCR amplified separately using primers nirB1_f andnirB1_r and 212297 and 130701 as genomic templatesSimilarly two 18kb lsquonirB-nirD-lsquocobU fragments wereamplified using primers nirB2_f and nirB2_r and genomicDNAs 212297 and 130701 The two PCR products weredigested with SpeI and BamHI and then co-ligated intopKINT Details concerning the nucleotide sequences ofthe pcr primer pairs are given in the Additional file 5

5prime-RLM-RACE PCRTranscriptional start site mapping of was determinedusing the First Choice RNA ligase-mediated rapid amp-lification of cDNA ends (RLM-RACE) kit (Ambion) asper manufacturers instructions Briefly 10 ug of totalRNA was treated with calf intestinal phosphatase (CIP)and tobacco acid pyrophosphatase (TAP) before ligationof an RNA Adapter oligonucleotide to the 5prime ends ofthe mRNA transcripts A random-primed reverse tran-scription reaction was carried out to generate cDNAand then nested PCR reactions were performed on thecDNA using combinations of adapter and gene specificprimers Details concerning the nucleotide sequencesof the 5prime outer and inner adapter sequences as well as3prime outer and inner gene specific primers are given inthe Additional file 5 PCR products generated usingthe 5prime inner adapter and 3prime inner gene specific primerswere sequenced by Sanger sequencing using the 3prime innergene specific primer

Animal ethics statementAnimal work was carried out according to the UK Animal(Scientific Procedures) Act 1986 The study protocol wasapproved by the AHVLA Animal Use Ethics Committee(UK Home Office PCD number 706905)

Availability of supporting data sectionFully annotated microarray data have been deposited inBμGSbase (accession number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (ac-cession number E-BUGS-150) Raw sequence data wasuploaded to the European Nucleotide Archive (ENA) andcan be downloaded at httpwwwebiacukenadataviewERX006616-ERX06617ERX012284-ERX012286

Additional files

Additional file 1 SNPs identified across sequenced strains

Additional file 2 Large deletions identified by NGS in sequencedstrains

Additional file 3 Fold change differences in gene expression inM bovis field isolates 1121 2451 and 1307 compared to 2122Cells shaded in red indicate upregulation green indicate down-regulation and empty cells indicate no change in expression

Additional file 4 Differential expression of RNA transcripts asdetected by a tiled oligonucleotide microarray Up and down arrowsindicate fold up- and downregulation respectively and empty cellsindicate no change in expression Strand A indicates antisense S senseand I intergenic

Additional file 5 Details of oligonucleotides used in PCR RT-PCRand RLM-RACE experiments

Competing interestsThe authors declare that they have no competing interests

Authorsrsquo contributionsPG carried out the culturing of organisms macrophage infectionsmicroarrays qRT-PCR cloning analysis of sequence data and writing of themanuscript JN performed analysis of the Illumina sequence and PCRamplicon microarray data AW and JH performed the analysis of the Agilentoligonucleotide microarray data and assisted in the uploading of themicroarray data into the BμGSbase database MAQ and SB wereresponsible for the Illumina sequencing SH contributed to the analysis ofthe Illumina sequence data NS contributed to the writing and critical reviewof the manuscript RGH was a co-investigator who helped to guide theproject SG was the principle investigator on the project who guided theresearch and helped in the writing of the manuscript All authors read andapproved the final manuscript

AcknowledgementsWe would like to acknowledge Kate Gould for her help in carrying out theoligonucleotide microarrays and Denise Waldron for her assistance inuploading of the microarray data into BμGSbase We also thank MarkChambers for comments on the manuscript and BBSRC and Defra forfunding the work (Grant no BBE0184911)

Author details1Animal Health and Veterinary Laboratories Agency Woodham Lane NewHaw Addlestone Surrey KT15 3NB UK 2Division of Clinical Sciences BacterialMicroarray Group Centre for Infection amp Immunity St Georgersquos University ofLondon Cranmer Terrace London SW17 0RE UK 3The Wellcome TrustSanger Institute Hinxton Cambridgeshire UK 4UCD School of VeterinaryMedicine and UCD Conway Institute University College Dublin Dublin 4Ireland

Received 4 June 2013 Accepted 26 September 2013Published 17 October 2013

References1 Monaghan ML Doherty ML Collins JD Kazda JF Quinn PJ The tuberculin

test Vet Microbiol 1994 40(1ndash2)111ndash1242 Kamerbeek J Schouls L Kolk A van Agterveld M van Soolingen D Kuijper

S Bunschoten A Molhuizen H Shaw R Goyal M et al Simultaneousdetection and strain differentiation of Mycobacterium tuberculosis fordiagnosis and epidemiology J Clin Microbiol 1997 35(4)907ndash914

3 Frothingham R Meeker-OrsquoConnell WA Genetic diversity in theMycobacterium tuberculosis complex based on variable numbers oftandem DNA repeats Microbiology 1998 144(Pt 5)1189ndash1196

4 Smith NH Dale J Inwald J Palmer S Gordon SV Hewinson RG Smith JMThe population structure of Mycobacterium bovis in Great Britain clonalexpansion Proc Natl Acad Sci USA 2003 100(25)15271ndash15275

5 Winder CL Gordon SV Dale J Hewinson RG Goodacre R Metabolicfingerprints of Mycobacterium bovis cluster with molecular type

Golby et al BMC Genomics 2013 14710 Page 17 of 18httpwwwbiomedcentralcom1471-216414710

implications for genotype-phenotype links Microbiology 2006152(Pt 9)2757ndash2765

6 Wheeler PR Brosch R Coldham NG Inwald JK Hewinson RG Gordon SVFunctional analysis of a clonal deletion in an epidemic strain ofMycobacterium bovis reveals a role in lipid metabolism Microbiology2008 154(Pt 12)3731ndash3742

7 Arnvig KB Young DB Identification of small RNAs in Mycobacteriumtuberculosis Mol Microbiol 2009 73(3)397ndash408

8 Arnvig KB Comas I Thomson NR Houghton J Boshoff HI Croucher NJRose G Perkins TT Parkhill J Dougan G et al Sequence-based analysisuncovers an abundance of non-coding RNA in the total transcriptome ofMycobacterium tuberculosis PLoS Pathog 2011 7(11)e1002342

9 Smith NH Upton P Naming spoligotype patterns for the RD9-deletedlineage of the Mycobacterium tuberculosis complex www MbovisorgInfect Genet Evol 2012 12(4)873ndash876

10 Garnier T Eiglmeier K Camus JC Medina N Mansoor H Pryor M Duthoy SGrondin S Lacroix C Monsempe C et al The complete genome sequenceof Mycobacterium bovis Proc Natl Acad Sci USA 2003 100(13)7877ndash7882

11 Albers CA Lunter G MacArthur DG McVean G Ouwehand WH Durbin RDindel accurate indel calls from short-read data Genome Res 201121(6)961ndash973

12 Mostowy S Inwald J Gordon S Martin C Warren R Kremer K Cousins DBehr MA Revisiting the evolution of Mycobacterium bovis J Bacteriol2005 187(18)6386ndash6395

13 Boon C Dick T Mycobacterium bovis BCG response regulator essentialfor hypoxic dormancy J Bacteriol 2002 184(24)6760ndash6767

14 Van Gelder RN von Zastrow ME Yool A Dement WC Barchas JD EberwineJH Amplified RNA synthesized from limited quantities of heterogeneouscDNA Proc Natl Acad Sci USA 1990 87(5)1663ndash1667

15 Pandey DP Gerdes K Toxin-antitoxin loci are highly abundant in free-living but lost from host-associated prokaryotes Nucleic Acids Res 200533(3)966ndash976

16 Arcus VL McKenzie JL Robson J Cook GM The PIN-domain ribonucleasesand the prokaryotic VapBC toxin-antitoxin array Protein Eng Des Sel 201124(1ndash2)33ndash40

17 Bukowski M Rojowska A Wladyka B Prokaryotic toxin-antitoxin systemsndashthe role in bacterial physiology and application in molecular biologyActa Biochim Pol 2011 58(1)1ndash9

18 Maddocks SE Oyston PC Structure and function of the LysR-typetranscriptional regulator (LTTR) family proteins Microbiology 2008154(Pt 12)3609ndash3623

19 Moreno-Vivian C Cabello P Martinez-Luque M Blasco R Castillo FProkaryotic nitrate reduction molecular properties and functionaldistinction among bacterial nitrate reductases J Bacteriol 1999 181(21)6573ndash6584

20 Tiffert Y Supra P Wurm R Wohlleben W Wagner R Reuther J TheStreptomyces coelicolor GlnR regulon identification of new GlnR targetsand evidence for a central role of GlnR in nitrogen metabolism inactinomycetes Mol Microbiol 2008 67(4)861ndash880

21 Duan J Wainwright MS Comeron JM Saitou N Sanders AR Gelernter JGejman PV Synonymous mutations in the human dopamine receptor D2(DRD2) affect mRNA stability and synthesis of the receptor Hum MolGenet 2003 12(3)205ndash216

22 Capon F Allen MH Ameen M Burden AD Tillman D Barker JN TrembathRC A synonymous SNP of the corneodesmosin gene leads to increasedmRNA stability and demonstrates association with psoriasis acrossdiverse ethnic groups Hum Mol Genet 2004 13(20)2361ndash2368

23 Golby P Hatch KA Bacon J Cooney R Riley P Allnutt J Hinds J Nunez JMarsh PD Hewinson RG et al Comparative transcriptomics reveals keygene expression differences between the human and bovine pathogensof the Mycobacterium tuberculosis complex Microbiology 2007153(Pt 10)3323ndash3336

24 Bitter W Houben EN Bottai D Brodin P Brown EJ Cox JS Derbyshire KFortune SM Gao LY Liu J et al Systematic genetic nomenclature for typeVII secretion systems PLoS Pathog 2009 5(10)e1000507

25 Sorek R Kunin V Hugenholtz P CRISPRndasha widespread system thatprovides acquired resistance against phages in bacteria and archaeaNat Rev Microbiol 2008 6(3)181ndash186

26 Gomez M Smith I Determinants of Mycobacterial Gene Expression InMolecular Genetics of Mycobacteria Washington DC ASM Press2000111ndash129

27 Duhring U Axmann IM Hess WR Wilde A An internal antisense RNAregulates expression of the photosynthesis gene isiA Proc Natl Acad SciUSA 2006 103(18)7054ndash7058

28 Kawano M Aravind L Storz G An antisense RNA controls synthesis of anSOS-induced toxin evolved from an antitoxin Mol Microbiol 200764(3)738ndash754

29 Thomason MK Storz G Bacterial antisense RNAs how many are thereand what are they doing Annu Rev Genet 2010 44167ndash188

30 Brosch R Pym AS Gordon SV Cole ST The evolution of mycobacterialpathogenicity clues from comparative genomics Trends Microbiol 20019(9)452ndash458

31 Calmette A Guerin C Sur quelques proprietes du bacille tuberculeuxdrsquoorigine cultive sur la bile de boeuf glycerinee C R Acad Sci Paris 1909149716ndash718

32 Bacon J James BW Wernisch L Williams A Morley KA Hatch GJ ManganJA Hinds J Stoker NG Butcher PD et al The influence of reduced oxygenavailability on pathogenicity and gene expression in Mycobacteriumtuberculosis Tuberculosis (Edinb) 2004 84(3ndash4)205ndash217

33 Smyth GK Linear models and empirical bayes methods for assessingdifferential expression in microarray experiments Stat Appl Genet Mol Biol2004 33

doi1011861471-2164-14-710Cite this article as Golby et al Genome-level analyses ofMycobacterium bovis lineages reveal the role of SNPs and antisensetranscription in differential gene expression BMC Genomics 2013 14710

Submit your next manuscript to BioMed Centraland take full advantage of

bull Convenient online submission

bull Thorough peer review

bull No space constraints or color figure charges

bull Immediate publication on acceptance

bull Inclusion in PubMed CAS Scopus and Google Scholar

bull Research which is freely available for redistribution

Submit your manuscript at wwwbiomedcentralcomsubmit

Golby et al BMC Genomics 2013 14710 Page 18 of 18httpwwwbiomedcentralcom1471-216414710

BackgroundMycobacterium bovis is the causative agent of bovinetuberculosis (bTB) an endemic disease of cattle inGreat Britain (GB) with the potential for zoonotic trans-mission to humans In GB the primary control of bTB isthrough lsquotest and slaughterrsquo surveillance whereby cattlethat are positive to the tuberculin skin test [1] areremoved from the herd and slaughtered In spite of thisapproach which has been in place since the 1950s thenumber of TB-positive cattle slaughtered is increasingyear on year - approximately 30000 cattle were tested andslaughtered between 2012ndash2013 compared to 300 between1995ndash1996 (httpwwwdefragovukanimal-diseasesa-zbovine-tb) The UK (GB and Northern Ireland) govern-ments currently spend approximately pound100 million peryear collectively on control measures and compensationto farmers for slaughtered cattle The failure of the test-and-slaughter policy to control the spread of infectionin large parts of GB suggests that we need a muchgreater understanding of the TB disease dynamic includ-ing the role of pathogen diversity as a potential driver ofthis processM bovis isolates that are cultured from skin test-reactor

animals are currently genetically typed using a combinationof spoligotyping [2] and VNTR [3] Spoligotyping exploitsa polymorphic region of the genome called the DR locuswhich consists of multiple identical 36bp repeats inter-spersed with unique sequences known as spacers Isolatesof M bovis differ in the presence or absence of spacersand adjacent DRs allowing a lsquobarcodersquo to be generated foreach molecular type Spoligotypes 9 and 17 are the domin-ant molecular types in the UK with more than one thirdof all isolates corresponding to Type 9 and a quarter toType 17 VNTR measures the variation at repeat se-quences in 6 regions of the genome There are 6 majorVNTR types for Type 9 while all others show only onedominant profile suggesting that M bovis Type 9strains are more genetically variable compared withother spoligotypes Integration of molecular typingwith geographical information systems allows temporaland spatial distribution of molecular types to bemapped across GB Type 9 isolates are widely distrib-uted across GB while type 17 is an emerging clonewhich has expanded out of foci around GloucesterHereford and Worcester Similarly Types 25 and 35have expanded out of StaffordshireShropshire andHerefordWorcester respectively Between them types25 35 9 and 17 encompass the diversity of the majorclonal lineages of M bovis circulating in the UKAn analysis of molecular typing data from ~11500

M bovis isolates revealed that the population structureof M bovis in GB could not be explained by randommutation and drift and instead it appeared that certainstrains were increasing at a faster rate relative to others

[4] One suggestion for the lsquoclonal expansionrsquo of GBM bovis genotypes was that certain genotypes had aselective advantage over others leading to an increasein their frequency in the population [4] Supportive ofthis hypothesis several lines of evidence have suggestedthat M bovis isolates show phenotypic differences toeach other Fourier-Transform Infrared Spectroscopy(FT-IR) has been used to generate metabolic profiles ofthe 10 major spoligotype groups of M bovis isolatescirculating in GB Clustering analysis of the resultingspectra showed that the spectra could be differentiatedaccording to spoligotype indicating that strains of dif-ferent spoligotypes possess phenotypically distinct traits[5] In addition it has also been shown that type 17 iso-lates have lower incorporation rates of propionate intomembrane lipid components compared to other fieldstrains suggesting a degree of metabolic remodelling inthe type 17 lineage [6] Hence it appears that geneticdifferences across M bovis lineages may impact onphenotypic traits This latter finding may have import-ant implications for vaccine and diagnostic test develop-ment in terms of which experimental challenge strainsto test vaccines against or on influencing diagnostic testperformanceIn an attempt to better define genetic differences

across the major M bovis lineages circulating in GBthat may give rise to phenotypic differences of practicalimportance we have used a combination of genomesequencing transcriptome analyses and recombinantDNA technology The genomes of three M bovis fieldisolates were sequenced using Illumina sequencingtechnology and strain specific differences in gene ex-pression were measured during in vitro growth and inex vivo bovine alveolar macrophages (Mϕ) using awhole genome amplicon microarray Recent discover-ies of small non coding RNA within mycobacteria [78]prompted us to assess differences in sRNA expressionacross the isolates using a whole genome tiled oligo-nucleotide microarray SNPsmall base pair insertionand deletions (INDELs) and gene expression data wereoverlaid onto the genomic sequence of the fully se-quenced strain of M bovis 212297 to link observedstrain specific genomic differences with differences inRNA expression

ResultsComparative genomics of M bovis field isolates usingwhole genome sequencing and microarraysThe strains for this study were chosen to reflect thegenomic diversity of the M bovis population circulatingin GB and are listed in Table 1 M bovis strains weretyped using a combination of spoligotyping and VNTRFor each spoligotype group an isolate which possessedthe most common VNTR profile was selected so that

Golby et al BMC Genomics 2013 14710 Page 2 of 18httpwwwbiomedcentralcom1471-216414710

each chosen strain was the most representative of eachspoligotype group (Table 1) Of the four studied strains245101 and 130701 diverged earliest during descentfrom the most recent common ancestor of M bovis inGB and are more distant to strains 112101 and 130701 (Smith N personal communication) All 4 strainswere isolated from diseased cows belonging to herdswhich were taken from farms in geographically diverseareas of the countryThe genomes of the three M bovis strains 112101

245101 and 130701 were paired-end sequenced usingIllumina sequencing technology Processed sequencereads were mapped to the genome of the fully sequenced

and annotated strain 212297 [10] to identify SNPsSNPs were identified across all four sequenced M bovisstrains and their positions together with their SNPclass are listed in Additional file 1 The evolutionaryrelationships between the three sequenced strains aredepicted in Figure 1 using a phylogenetic tree and adistance matrix plot Three genome sequenced membersof the Mycobacterium tuberculosis complex M bovisBCG Pasteur M tuberculosis H37Rv and M africanumGM041182 were included in the analysis to place theM bovis strains in a wider mycobacterial context Thenumbers of SNPs between M bovis 212297 and thethree newly sequenced M bovis strains were found

Table 1 M bovis field strains used in this study

AHVLA ID International spoligotype ID [9] UK spoligotype ID VNTR ID Year of isolation UK county of isolation

112101 SB0263 17 7555331 2001 Wiltshire

212297 SB0140 9 8555331 1997 Devon

245101 SB0129 25 6554231 2001 Clwyd

130701 SB0134 35 3354331 2001 Shropshireindicates partial number of repeats

Figure 1 Whole genome SNP-based evolutionary analysis of M bovis sequenced strains (a) Phylogenetic tree with numbers above thebranches indicating the number of SNPs identified between the organism and its common ancestor (b) Distance matrix plot showing thenumber of SNPs present between selected pairs of strains

Golby et al BMC Genomics 2013 14710 Page 3 of 18httpwwwbiomedcentralcom1471-216414710

to be consistent with their predicted evolutionary distancesfrom each other Strain 112101 (type 17) is most closelyrelated to the original genome sequenced strain 212297(type 9) with 114 SNPs whereas the more distantly relatedstrains 245101 (type 25) and 130701 (type 35) have 431and 552 SNP differences respectively All four sequencedM bovis strains were more distantly related to Mafricanum (~1800 SNPs) and M tuberculosis H37Rv(~2200 SNPs)In addition to SNPs both large and small INDELs can

be inferred from NGS data although there are severalchallenges involved [11] Large sized insertions (gt500bp) are particularly difficult to identify with accuracy asthey require the de novo assembly of the reads that failto map to the reference genome and the subsequentdatabase searching with the resulting contigs Accurateidentification of large sized deletions (gt500 bp) are how-ever more easily identifiable and Additional file 2 liststhose that have been identified by NGS in the genomesof the three sequenced strains As poor coverage canconfound the identification of deletions we sought touse microarray technology to confirm the deletions iden-tified by the NGS data Genomic DNA was isolated fromall four strains labelled with fluorescent dyes andhybridised to a whole genome M tuberculosisM bovisamplicon microarray (see Methods) Table 2 lists severalLSPs that were detected across the strains usingmicroarrays Both NGS and microarray data predict thepresence of the large 68kb deletion (RDbovis(d)_0173)which encompasses genes Mb1963-Mb1971 and appearsto be specific to UK strains belonging to Type 17 thathas been described in a previous study [12] Several ofthese gene products are predicted to encode proteins in-volved in lipid metabolism but the lipid composition ofseveral type 17 isolates was found to be no different toother M bovis strains although their ability to incorpor-ate propionate into mycolic acids was found to be lower[6] A smaller 16 kb deletion specific 130701 that com-prises the 3prime end of Mb2056c Mb2055c and the 5prime end

of pkfB (Mb2054c) was also detected by both NGS andmicroarray data Due to a single base deletion Mb2056cand Mb2055c are pseudogenes in 212297 but the twogenes exist as one intact functional gene in 2145 andH37Rv (Rv2030c) The pfkB gene encodes a phospho-fructokinase homologue and is strongly immunogenic inhuman TB patients while Rv2030c encodes an erythro-mycin esterase Both pfkB and Mb2056c are members ofthe DosR regulon which are highly upregulated underanaerobic conditions and have been implicated in bac-terial persistence in vivo [13] Other smaller deletionsdetected include a deletion of a probable lipid transferprotein encoding gene Mb1699c which is specific to130701 and an aldoketo reductase encoding geneMb2320 that is specific to 112101

Linking SNPs to genes that show differential expressionamongst M bovis strains grown under vitro conditionsand in ex vivo macrophagesThe four M bovis field strains were grown to mid-logarithmic phase in pyruvate-containing Middlebrook7H9 liquid media and then used to infect bovine alveo-lar Mϕ using a multiplicity of infection (MOI) of 101(bacilli Mϕ) Mycobacterial RNA was recovered frominfected host cells 4 and 24 hrs post infection using adifferential lysis procedure and amplified using a modi-fied procedure similar to that described by van Gelderet al ([14] see Methods) As a control RNAs were alsoextracted from strains that had been incubated staticallyin RPMI cell culture media for a period of 4 hrs Toeliminate potential skewing effects on the transcriptomeresulting from the amplification process comparisonswere made only between amplified RNA generated fromsamples collected at the same time point and biologicalreplicate For the in vitro growth condition total RNAswere extracted from the four strains grown in a pyruvate-containing Middlebrook 7H9 liquid media and rolledduring incubation

Table 2 Large sequence polymorphisms present in M bovis field isolates

Mb CDS Mtb CDS Genomic postn(wrt to 212297)

Size ofdeletion (bp)

Gene(s) 212297 112101 245101 130701 Product(s)

Mb1699c Rv1627c nd nd DEL Probable lipid transfer proteinaMb1963c-Mb1971

Rv1928c-Rv1936

2172231-2179038 6807 tpx fadE17fadE18 echA13

DEL Gene products have roles inlipid metabolism

Mb2054c Rv2029c 2259798-2261457 1659 pfkB DEL Possible phosphofructokinase

Mb2055cMb2056c

Rv2030c DEL Conserved hypothetical protein(frame-shifted in 221297)

Mb2320 Rv2298 nd nd DEL DEL Aldoketo reductase

Mb3923c Rv3894c nd nd DEL Conserved membrane protein(frame-shifted in 221297)

DEL indicates genes that are deleted from strainsaMostowy et al 2005 nd not determined

Golby et al BMC Genomics 2013 14710 Page 4 of 18httpwwwbiomedcentralcom1471-216414710

The RNAs extracted from each of the growth condi-tions were converted to cyanine labelled cDNA using re-verse transcriptase and hybridised to whole genomeamplicon microarrays Using only those genes that arecommon to all four strains we found a total set of 70genes that showed a 25-fold or more difference in ex-pression in one or more strains when pairwise compari-sons were made between the transcriptomes of 212297and 112101 245101 or 130701 (Additional file 3) Asubset of these 70 genes is shown in Table 3 where keyexamples of alterations in metabolic processes areshown The numbers of genes that were found to showdifferential expression across the four strains reflectedthe evolutionary distances between 212297 and theother 3 strains Thus 112101 which is closely relatedto 212297 shows only 5 differentially expressed geneswhile the most distantly related strain 130701 shows 56genes differently expressed compared to 212297 Ofthese 56 genes 5 were specific to the in vitro conditionwhile 19 were specific to the Mϕ Ten genes were com-mon to both conditions which serves to validate thetechnical reproducibility of the RNA amplificationprocessUsing the genome sequencing information determined

for each of the four strains we attempted to correlatethe observed strain-specific differences in gene expres-sion with the presence or absence of mutations withinthe coding regions or promoters of those genes thatshow differential gene expression or in genes that areknown to regulate the activity of those genes Mb1749cand Mb1750c are two genes that are specificallyupregulated in 130701 and encode a toxin and antitoxin(TA) pair respectively belonging to type II TA systemsof the VapBC family [15] Members of VapB type toxinscontain PIN domains that cleave RNA and thus functionto control translation of mRNA transcripts [16] Thehomologous genes from strain 130701 show up to 19-and 10-fold higher levels of expression respectively thanthose of the other three strains An analysis of the cod-ing sequences of Mb1749c across all four strains re-vealed that the 130701 homologue has a unique nSNPat position 1932704 (wrt 212297 genomic sequence) aC-T transition that results in the nonconservative substi-tution of Gly19 to Asp Research has shown that TAgene pairs negatively regulate their own expressionthrough binding of the TA protein complex to the pro-moter region of the TA gene pair thus preventing accessto RNA polymerase [17] The G19D mutation inMb1749c could therefore impair the ability of the com-plex to bind to the promoter resulting in the deregula-tion of the TA gene pairMb2007c which shows a 4-fold higher expression in

130701 only encodes a transcriptional regulator of theLysR class There are two SNPs present in the coding

sequence of Mb2007c in 130701 which are absent inthe homologues of the other three strains the first is anSNP which results in the conservative substitution ofArg137 to Gln while the second is a more debilitatingnonsense SNP which ultimately leads to a protein whoselength is only 60 that of the wild-type Many regulatorsbelonging to the LysR family regulate their own expres-sion through a negative autoregulatory mechanism simi-lar to that described above for VapBC TA systems [18]A loss in protein integrity could therefore result in theregulator being unable to bind the regulatory regionleading to the observed upregulation in the expressionof this gene in 130701 As the product of this gene ispredicted to be a transcriptional regulator it was specu-lated that the regulation of gene(s) controlled by regula-tor could be affected in 130701 due to the severelytruncated form of this protein As LysR regulators areoften found to regulate genes that are divergently tran-scribed from the lysR gene it was surprisingly to findthat expression of the Mb2008 homologue in 130701which is predicted to encode a lysine transporter doesnot show any difference in expression in 130701 to212297 To define the regulon of this regulator we firstcompared the transcriptomes of 212297 transformedwith a multicopy plasmid expressing the truncated copyof mb2007c against a vector only control No differencesin expression were found (data not shown) which couldindicate that the regulator does not control any othergenes apart from itself or that experimental conditionsdid not favour the active form of the regulator LysR reg-ulators regulate expression of their regulon throughbinding of a co-inducer to the C-terminal domain andthe failure to observe any changes in gene expressioncould therefore be due to the absence of the co-inducerduring the experiment A further experiment to comparethe profiles of 212297 expressing either the truncatedor wild type forms of the protein also showed no differ-ences in expression (data not shown)Nitrite reductase catalyses the reduction of nitrite to

ammonia and is strongly expressed during growth in thepresence of nitrate or nitrite but repressed in the pres-ence of ammonia [19] The gene encoding the large sub-unit of the nitrite reductase nirB (Mb0258) showsapproximately 9-fold higher expression in 130701 com-pared to the other 3 strains in our standard ammoniacontaining 7H9 growth media suggesting that the strainhas lost regulatory control of this gene Expression ofnirB in M tuberculosis has been shown to be controlledby the response regulator GlnR [20] but an analysis ofthe sequence of the glnR orthologue from all four strainsrevealed no differences in either the coding or promotersequences A comparison of the nirB sequence from all4 strains did however reveal the presence of a singlebase (C to T) transition leading to a sSNP that is specific

Golby et al BMC Genomics 2013 14710 Page 5 of 18httpwwwbiomedcentralcom1471-216414710

Table 3 Fold change differences in gene expression in M bovis field isolates 1121 2451 and 1307 compared to 2122

Mb CDS Mtb CDS Common 112101 245101 130701 Product Assoc SNPInDel

7h9 RPMI 4hr MOslash 24 hr MOslash 7h9 RPMI 4hr MOslash 24 hr MOslash 7h9 RPMI 4hr MOslash 24 hr MOslash

Mb0038c Rv0037c 4uarr 3uarr 3uarr probable conservedintegral membrane

protein

Mb0124c Rv0120c fusA2 3uarr 3uarr probable elongationfactor

Mb0258 Rv0252 nirB 9uarr 12uarr 21uarr 9uarr nitrite reductase(large subunit)

sSNP at postn 303227 in 130701only C-T

Mb0259 Rv0253 nirD 3uarr 4uarr 5uarr nitrite reductase(small subunit)

Mb0428c Rv0420c 2uarr possibletransmembrane

protein

Mb0734 Rv0713 2uarr probable conservedtransmembrane

protein

Mb0947c Rv0923c 2darr 2darr conservedhypotheitical protein

Mb0948c Rv0924c mntH cation uptake system

Mb1013Mb1014

Rv0987 5uarr 10uarr 5uarr 5uarr 7uarr 6uarr probable adhesioncomponent transport

ABC transporter

nSNP at postn 1104263 in 245101 and130701 A-G (STOP to W) sSNP at

postn 1103991 in 245101 and 130701

Mb1015 Rv0988 3uarr 9uarr 3uarr 5uarr 5uarr possible conservedexported protein

Mb1161 Rv1130 2darr conservedhypothetical protein

Mb1162 Rv1131 gltA1 3darr probable citratesynthase

Mb1554c Rv1527c pks5 2darr probable polyketidesynthase

Mb1562 Rv1535 2uarr 5uarr hypothetical protein

Mb1619c Rv1593c 2uarr conservedhypothetical protein

Mb1749c Rv1720c 10uarr 19uarr 18uarr 12uarr conservedhypothetical protein

nSNP at postn 1932704 in130701 only C-T (G to D)

Mb1750c Rv1721c 6uarr 3uarr 9uarr 5uarr conservedhypothetical protein

Mb1833c Rv1804 3darr conservedhypothetical protein

Mb1834c Rv1805c 4uarr hypothetical protein

Golby

etalBM

CGenom

ics201314710

Page6of

18httpw

wwbiom

edcentralcom1471-216414710

Table 3 Fold change differences in gene expression in M bovis field isolates 1121 2451 and 1307 compared to 2122 (Continued)

Mb1835 Rv1806 PE20 3uarr PE family protein

Mb1885c Rv1854c ndh 3darr probable nadhdehydrogenase

Mb1914c Rv1882c 6uarr 3uarr 6uarr probable short-chaintype dehydrogenase

reductase

sSNP at postn 2122970 in241501 only C-T

Mb2007c Rv1985c 4uarr 3uarr 3uarr probabletranscriptional

regulatory protein

nSNP at postn 2208296 in 130701only G-A (Q to Stop)

Mb2015c RV1992c ctpG 2darr 3darr probable cationtransporter

Mb2420c Rv2398c cysW 2darr 6darr sulphate transporter

Mb2421c Rv2399c cysT 2darr 4darr sulphate transporter

Mb2607 Rv2577 9darr 5darr 7darr 12darr 10darr 7darr 7darr 13darr conservedhypothetical protein

[first part]

Mb3194 Rv3169 3uarr 3uarr 2uarr 3uarr conservedhypothetical protein

Mb3477c Rv3447c 8darr 3darr 10darr 9darr 2darr 7darr probable conservedmembrane protein

nSNP at postn 3812465 in 245101and 130701 T-C (S to G)

Mb3563c Rv3533c PPE62 5uarr 5uarr ppe family protein

Mb3721c Rv3696c glpK 3darr glycerol kinase

Mb3803 Rv3774 echA21 8uarr 7uarr 22uarr possible enoyl-coAhydratase

sSNP at postn 4155803 in241501 only G-A

Up and down arrows indicate fold up- and downregulation respectively and empty cells indicate no change in expression

Golby

etalBM

CGenom

ics201314710

Page7of

18httpw

wwbiom

edcentralcom1471-216414710

to 130701 It was not readily apparent why a sSNP in thecoding sequence of a gene should lead to an increase inexpression of that gene but there are several reports thatshow sSNPs leading to changes in stability of mRNA tran-scripts [2122] Rv0987 and Rv0988 of M tuberculosisH37Rv encode part of an ABC transporter and a putativesecreted hydrolase respectively In 212297 a single basetransition (G-A) introduces a stop codon that splitsRv0987 into the two pseudogenes Mb1013 and Mb1014Previous microarray based gene expression studies by ourgroup have shown that Rv0987 and Rv0988 in M tubercu-losis show higher levels of expression than the orthologousMb1013Mb1014 and Mb1015 respectively in M bovis212297 [23] and in the present study the Mb1013Mb1014 and Mb1015 homologues in 245101 and 130701 also showed higher expression (up to 10-fold) than thehomologues in 212297 and 112101 Comparing thesequences of Mb1013Mb1014 and Mb1015 across all 4strains indicated that strains that show high expressionhave the lsquoGrsquo alleleMb3477c encodes an ATP binding membrane protein

part of the Esx4 secretion system [24] and gene shows upto 10-fold higher expression in 245101 and 130701 com-pared to 212297 The gene also contains an A to C transi-tion at position 3812465 a nSNP at position resulting inthe non-conservative substitution of a serine to a glycineresidue

Of the 19 genes that show specific differential expres-sion in the Mϕ the most notable are Mb1914c andechA21 which show upregulation in 245101 only (upto 6- and 23-fold respectively) Both genes encodeproteins that could be involved in lipid metabolism andboth genes contain single sSNPs that are present in245101 but absent in the other three strainsReal time RT-PCR was used to verify a selection of

genes that showed differential gene expression as pre-dicted by the microarray analysis Figure 2 comparesthe fold changes in the expression levels of 4 genes asmeasured by microarray and real time RT-PCR ThenirB and Mb1749c genes were selected because theyshowed strong upregulation in 130701 in both in vitroand ex vivo Mϕ while Mb1914c and echA21 were chosenbecause the array data predicted them to be specificallyupregulated in 245101 and only in ex vivo Mϕ Foreach of the 4 genes the strain dependent pattern ofexpression as measured by real time RT-PCR was consistentwith that measured by microarray although the foldchanges measured by real time RT-PCR were higherthan those measured by microarray

Functional analysis of SNP role in differential geneexpressionThe above data showed that many of the strain specificdifferences in gene expression were linked to the presence

Figure 2 Confirmation of amplicon microarray results with real time RT-PCR The fold changes in gene expression for (a) Mb1750c (b) nirB(c) echA21 and (d) Mb1914c measured by microarray (open bars) in each of the four strains were compared to that measured by real time RT-PCR(closed bars)

Golby et al BMC Genomics 2013 14710 Page 8 of 18httpwwwbiomedcentralcom1471-216414710

of synonymous or non-synonymous SNPs located withinthe coding regions of the genes that show variable expres-sion Non-synonymous SNPs lead to changes in aminoacid sequence which can lead to changes in protein func-tion The C to T transition at position 1932704 (wrt 212297) in the coding sequence of the 130701 Mb1749chomologue leads to the non-conservative substitution ofGly19 to Asp and this nSNP appears to be linked to theupregulation of both Mb1749c and Mb1750c in thatstrain In order to confirm this a 09 kb DNA fragmentcontaining the Mb1749c-Mb1750c-lsquoMB1751c region of130701 (containing the lsquoTrsquo allele) and the equivalentregion from 212297 (with the lsquoCrsquo allele) were PCRamplified and the fragments were cloned separatelyinto the mycobacterial shuttle vector pKINT (see Methods)to create the constructs pPG107 and pPG106 respectivelyThe two constructs were introduced into Mycobacteriumsmegmatis mc2155 separately and then the expression ofMb1749c and Mb1750c in M smegmatis pPG101 wascompared to that of M smegmatis pPG102 using real timeRT-PCR Table 4 shows that the expression levels ofMb1750c and Mb1749c in the strain expressing themutated forms of Mb1749cMb1750c are 13- and 9-foldhigher respectively compared to the strain expressingthe wild type forms confirming that this SNP is respon-sible for the observed up-regulation of the two genes in130701Synonymous substitutions do not lead to changes in

protein sequence and have generally been considered tobe lsquosilentrsquo or benign Recent studies however have sug-gested that sSNPs can have functional effects such asdecreased mRNA stability and translation [2122] Inour own studies we have found several genes whose ex-pression levels correlate with the presence of sSNPs inthe coding regions of those genes For example a C-Ttransversion at position 303227 (wrt 212297) withinthe coding sequence of nirB of 130701 is a sSNP thatappears to be linked with the upregulation in expressionof nirB within that strain To confirm that this is thecase we PCR amplified 35 kb DNA fragments containing

the hsp-nirB-nirD-cobU region of strain 130701 (containingthe lsquoCrsquo allele) and the equivalent region from strain 212297(with the lsquoTrsquo allele) and cloned them separately into theintegrating vector pKINT to create the constructspPG108 and pPG109 respectively These constructswere introduced into 212297 and the expression levelsof nirB and nirD were found to be 30- and 2-fold re-spectively higher in the strain expressing the mutatednirBD locus compared to the strain expressing the wild-type form (Table 4) This confirms that this mutation isresponsible for the upregulation of the two genes in thisstrain

Use of a high density tiled oligonucleotide microarray todetect differentially expressed small RNA transcripts inM bovis isolatesThe M tuberculosisM bovis amplicon arrays used inthe present study were specifically designed to measureexpression levels of genes annotated in the genomic se-quence of M bovis 212297 [10] They were not how-ever designed to monitor the expression of non-codingRNA such as small RNA within intergenic regions orantisense sRNA Hence a high density tiled oligonucleotidemicroarray consisting of approximately 180000 partially-overlapping (10-base overlap) short 60 mer oligonucleo-tides was designed that offered an unbiased approach tothe detection of strand specific transcripts encoded overthe entire M bovis 212297 chromosome Total RNAthat includes small sized (lt100 nt) RNA species wasextracted from the four M bovis strains that had beengrown in liquid media and hybridised to the oligo-nucleotide microarray To avoid potential secondarystrand synthesis during cDNA synthesis which couldbe interpreted as sRNA the RNA was directly labelledwith cyanine based dyes After pairwise comparisonswere performed between 212297 and 112101 245101or 130701 220 oligonucleotide probes were identifiedthat detected differentially expressed transcripts (25 foldcut off ) in one or more of the three strains (Additionalfile 4) Only transcripts detected by multiple (2 or more)overlapping probes were regarded as genuine transcriptsas those detected by single probes could be due to cross-hybridisation effects or represent spurious transcriptsUsing these criteria 26 transcripts designated T1-T26were found to show differential expression in one or moreof the strains (Table 5) and those transcripts can be di-vided into those that are encoded within intergenic re-gions and those encoded within the genomic co-ordinatesencompassing annotated coding sequences Comparisonof the differentially expressed gene lists identified usingamplicon vs oligonucleotide arrays (Tables 3 and 5) itis clear that many of the transcripts detected using theamplicon arrays are not necessarily encoded on thesense gene strand as had been previous interpreted For

Table 4 Confirmation of SNP linkage to upregulation ingene expression

Strain Gene Fold change

M smegmatis mc2155 (pPG101) Mb1750c 161 plusmn 32

M smegmatis mc2155 (pPG101) Mb1749c 125 plusmn 12

M smegmatis mc2155 (pPG102) Mb1750c 19 plusmn 08

M smegmatis mc2155 (pPG102) Mb1749c 14 plusmn 04

Mb 212297 (pPG108) nirB 20 plusmn 10

Mb 212297 (pPG108) nirD 13 plusmn 05

Mb 212297 (pPG109) nirB 529 plusmn 159

Mb 212297 (pPG109) nirD 31 plusmn 08

Fold changes are the mean ratios plusmn standard deviation

Golby et al BMC Genomics 2013 14710 Page 9 of 18httpwwwbiomedcentralcom1471-216414710

Table 5 Differential expression of RNA transcripts as detected by a tiled oligonucleotide microarray

Transcript No Probes Position Size cds Strand 112101 245101 130701 CDS Product SNP

T1 3 303138-303295 157 Mb0258nirB A 3uarr nitrite reductase (large sub-unit)

T2 5 303260-303515 255 Mb0258nirB S 5uarr nitrite reductase (large sub-unit) sSNP at 303227 C-T in 130701 only

T3 2 1105085-1105193 108 Mb1014 A 3uarr 3uarr probable adhesion componentof ABC transporter

T4 2 1105158-1105315 157 Mb1014 A 3uarr 3uarr probable adhesioncomponent of ABC transporter

T5 3 1105158-1105315 157 Mb1014 S 3uarr 3uarr probable adhesioncomponent of ABC transporter

T6 5 1778876-1779131 255 Mb1618c A 9uarr possible secreted lipase sSNP at 1778879 C-T in 112101 only

T7 2 1840174-1840282 108 Mb1672c S 4darr hypothetical protein

T8 3 1932158-1932315 157 tRNA-Pro S 3uarr proline tRNA

T9 5 1932452-1932707 255 Mb1749c S 4uarr toxin component oftoxin-antitoxin system

nSNP at 1932704 C-T (G to D) in 130701 only

T10 5 1932746-1933075 329 Mb1750c S 7uarr anti-toxin component oftoxin-antitoxin system

T11 6 2122959-2123263 304 Mb1914c A 12uarr probable short-chain typedehydrogenasereductase

sSNP at 2122970 C-T in 245101 only

T12 2 2324463-2324571 108 Mb2110 S 3uarr hypothetical protein(frame-shifted in 212297)

T13 2 2328944-2329052 108 Mb2116cpepE S 2darr 3darr probable dipeptidase

T14 2 2329483-2329591 108 Mb2117 A 10darr 10darr 5prime-3prime exonuclease nSNP at 2329583 A-G (comp strand I to T)in 245101 only

T15 3 2868310-2868467 157 Mb2607 A 6darr 8darr hp (frame-shifted in 212297)

T16 2 2868506-2868614 108 Mb2607 A 7darr 7darr hp (frame-shifted in 212297) nSNP at 2868616 A-G (stop to W) in245101 and 130701

T17 2 3075740-3075848 108 I 5uarr 6uarr

T18 2 3075887-3075995 108 I 5uarr 6uarr

T19 2 3076622-3076730 108 I 4uarr 4uarr

T20 2 3076769-3076877 108 I 5uarr 5uarr

T21 2 3078728-3078836 108 I 6uarr 7uarr

T22 2 3079169-3079277 108 I 4uarr 6uarr

T23 2 3079903-3080011 108 I 6uarr 7

T24 2 3844577-3844685 108 Mb3509c S 3uarr 3uarr possible acyltransferase(frame-shifted in 212297)

T25 6 4155501-4155805 304 Mb3803echA21 A 5uarr possible enoyl-coA hydratase sSNP at 4155803 C-T (comp strand) in 245101 only

T26 2 4316987-4317095 108 Mb3927c A 3uarr chp (frame-shifted in H37Rv)

Up and down arrows indicate fold up- and downregulation respectively and empty cells indicate no change in expressionStrand A indicates antisense S sense and I intergenic

Golby

etalBM

CGenom

ics201314710

Page10

of18

httpwwwbiom

edcentralcom1471-216414710

example the amplicon array data had appeared to suggestthat Mb1914c and echA21 were upregulated in 245101but the oligo array data indicates that transcripts 11 and25 which are encoded within the co-ordinates encoded bythose two genes are actually encoded on the antisensestrands This apparent discrepancy can be rationalisedonce we consider that double stranded amplicon micro-array probes cannot discriminate between transcriptsencoded on the sense or antisense strands Transcripts 11and 25 can therefore be considered as potential antisensesRNAs (asRNA) which could be involved in translationalor post-transcriptional control of the sense transcriptOther potential cis-encoded sRNAs detected using the ar-rays include T6 T14 and T15T16 which are encoded onthe antisense strands to Mb1618c Mb2117 and Mb2607respectively and for each of these transcripts their expres-sion appears to be linked to the presence of a single SNPwithin the co-ordinates of the genes The approximateboundaries of these transcripts can be derived from thegenomic co-ordinates of the oligonucleotide probes thatdetect the expression of the transcript Thus the tran-scripts appear to be between 100ndash300 nt in size and thepositions of the linked SNPs appear to be positioned eitherjust upstream or within the predicted 5primeend of the tran-scripts (Figure 3) Three of the transcripts (T11 T14 andT25) are antisense to the central part of the sense encodedgene while T6 is encoded antisense to the 5prime end ofMb1618c As well as antisense transcripts we also saw thedifferential expression of sense transcripts The ampliconmicroarray data (confirmed by real time RT-PCR) indi-cated that nirB is strongly upregulated specifically in130701 in both in vitro and ex-vivo Mϕ An analysis ofthe oligonucleotide array data however indicates thatthere are two short transcripts T1 and T2 (sense and anti-sense respectively) that are encoded within the genomicco-ordinates of the nirB gene T2 is the longer in size (255vs 155nt) and more highly expressed (5 vs 3-fold) thanT1 and both transcripts appear to be linked to the pres-ence of a SNP that is located within the middle of T1 andapproximately 50nt upstream of T2Some of the transcripts are bone fide gene sense

strand mRNA transcripts such as T9 and T10 whichare encoded by Mb1749c and Mb1750c respectivelyAlthough it would appear that the two genes are tran-scribed separately it is probable that the two transcriptsare co-transcribed as the stop codon of Mb1750c over-laps the start codon of Mb1749c Eight of the tran-scripts listed in Table 5 are encoded within intergenicregions 7 of which are encoded within the polymorphicdirect repeat (DR) locus The DR locus of strains belong-ing to the M tuberculosis complex has been suggested toconstitute a CRISPR locus which have been shown inmany species of bacteria to be involved in protectionagainst exogenous foreign DNA such as plasmids and

phage [25] All the DR encoded transcripts are short(approx 100 nt) straddle contiguous repeat and spacersequences and show approximately 5-fold higher levelsof expression in 245101 and 130701 compared with212297 and 112101

Characterisation of differentially expressed cis asRNAThe genomic co-ordinates of the oligonucleotide probesthat detected the antisense species described above canonly serve as approximate estimations as to their startand end points Thus we used 5prime RLM-RACE (RNALigase Mediated Rapid Amplification of cDNA Ends) inan attempt to accurately define the transcriptional startsites (TSS) for the short sense transcript T2 and the anti-sense transcripts T6 T11 and T25 described in the abovesection (see Methods) These transcripts were chosen astheir expression levels are high and their transcript lengthswere considered to be sufficiently long to enable theRLM-RACE methodology to work Table 6 details thesizes of the PCR products obtained after RLM-RACE wasperformed using oligonucleotide primers designed tosequences predicted for transcripts T6 T11 and T25 NoPCR product was obtained for transcript T2 For each ofthe three transcripts the TSS was determined to be a Gresidue which is the most commonly used residue typefor mycobacterial TSSrsquos (Figure 4) [26] For each of the T6T11 and T25 transcripts expression of the asRNAs waslinked to the presence of a SNP (C to T) proximal tothe 5prime end of the asRNA Strains exhibiting the lsquoCrsquo alleleshowed no expression of the asRNA whilst the strainthat showed expression had the lsquoTrsquo allele An analysis ofthe nucleotide sequence in the vicinity of the SNPsreveals that for each of the three transcripts the SNPconstitutes the 6th residue of a motif that has stronghomology to the consensus sequence for the minus10 elementof Group A mycobacterial promoters (Figure 4) [26] Thefinding that a lsquoTrsquo residue is associated with expression isconsistent with the consensus sequence which indicatesthat 86 of all minus10 elements have a lsquoTrsquo residue at the 6thresidue position Several residues that flank the minus10 motifalso show a degree of conservation Sequence motifswhich show strong homology to group A minus35 elementsare present 18ndash19 bp upstream of the putative minus10elements and the distances between the minus35 -10 andTSS elements are consistent with those elements ofthe consensus sequence No protein encoding openreading frames were detected within the T6 T11 andT25 transcriptsIn a parallel study high density oligonucleotide micro-

arrays were also used to interrogate the transcriptomesof M tuberculosis H37Rv M bovis BCG Pasteur Myco-bacterium caprae and M bovis AN5 that had beengrown in Middlebrook 7H9 media As a result of theseexperiments two asRNA species were found to be

Golby et al BMC Genomics 2013 14710 Page 11 of 18httpwwwbiomedcentralcom1471-216414710

expressed within the antisense strands of the ino1 andnarH genes of M tuberculosis H37Rv but not in any ofthe other 3 strains tested (data not shown) A compari-son of nucleotide sequences of the orthologous genesacross the species suggested that expressions of theas_sRNAs correlated with the presence of a sSNP (C to T

transition at positions 50555 and 1292100 wrt H37Rvgenomic sequence for as_ino1 and as_narH respectively)upstream of the asRNAs Approximate information re-garding the transcriptional start site was deduced from thebinding co-ordinates of the probes that detected the tran-scripts As with the M bovis antisense sRNAs described

Figure 3 Expressions and schematic representation of genomic locations of selected cis-encoded antisense sRNAs identified using atiled oligonucleotide microarray Three asRNAs (open arrows) are (a) T6 (b) T14 and (c) T25 For each asRNA a histogram plots the foldchanges for each of the oligonucleotide probes that detected the asRNA and for each probe the binding position relative to the 212297genome is indicated Closed and open arrows indicate lengths and direction of transcription of genes and asRNAs respectively

Golby et al BMC Genomics 2013 14710 Page 12 of 18httpwwwbiomedcentralcom1471-216414710

above the T residue associated with the expression ofthe M tuberculosis asRNAs is part of a putative minus10element A minus35 element with an appropriate spacing tothe minus10 element was identified for as_ino1 but not foras_narH suggesting that the as_narH promoter maybelong to group B mycobacterial promoters that havea conserved minus10 but no minus35 motif [26]

DiscussionsThe aim of this work was to define possible phenotypicvariation across M bovis field isolates through a combin-ation of genome sequencing comparative genomics andtranscriptome analyses from both in vitro and ex vivo con-ditions Using these approaches we uncovered a range ofnovel findings the most striking of which was the realisa-tion that genes that had been predicted to be differentiallyexpressed based on amplicon-microarray data were in fact

not upregulated and that instead it was an antisense tran-script that was showing differential expression Analysingboth transcriptome and genome sequence data allowed usto identify SNPs responsible for the transcription of anti-sense RNAs with generation of a consensus minus10 promotersequence the likely mechanism Our results suggest thatdata generated from amplicon arrays in the past may needto be revisited as it is possible that some coding-sequences identified as being differentially expressed wereinstead antisense transcriptsWith the growth of technologies such as high density

tiled oligonucleotide microarrays and next generationsequencing there has been a rapid increase in the num-ber of reports describing the existence of non-codingRNAs (ncRNAs) in bacteria Non-coding RNAs broadlyconsist of two types cis- and trans-encoded RNA TransRNA includes intergenic encoded RNA while cis-encodedRNA includes 5prime and 3prime untranslated regions of mRNAand antisense RNA To study the expressions of both cis-and trans encoded ncRNA we used a high density oligo-nucleotide tiled microarray since our amplicon microarraywas unable to detect intergenic transcripts or differentiatebetween sense and antisense transcripts Previous studiesusing M tuberculosis have identified substantial amountsof ncRNA encoded in both intergenic and intragenic re-gions [78] We detected substantial amounts of ncRNAs

Figure 4 Promoters of anti sense RNAs a Promoters of the asRNAs as_mb1618c as_1914c and as_echA21 -10 and minus35 elements areindicated in bold and italics Transcriptional start sites are indicated by large font G characters while SNP residue that leads to the expression ofthe asRNA is indicated by a large font red T residue The consensus sequence for group a mycobacterial promoters is indicated Numericalsubscripts indicate the percentage of the total number of promoters for which a transcriptional start site has been experimentally determinedthat show the indicated residue b Promoters of the differentially expressed asRNAs as_ino1 and as_narH in M tuberculosis -10 and minus35 elementsare indicated in bold italics The red residue indicates SNP responsible for differential expression

Table 6 Determination of transcriptional start sites ofantisense RNAs

Transcript asRNA PCRprod size

TSSresidue type

TSS residue postn(wrt 212297)

T6 mb1618c_as 150 G 1778886

T11 mb1914c_as 220 G 2122978

T25 echA21_as 175 G 4155796

Golby et al BMC Genomics 2013 14710 Page 13 of 18httpwwwbiomedcentralcom1471-216414710

in M bovis including many instances of cis-antisenseRNA species Due to their perfect complementarity cisasRNA form a duplex with the sense strand encodedtranscript resulting in either degradation [27] or transla-tion inhibition [28] of the sense mRNA AntisenseRNAs vary in length ranging from 10s to 1000primes ofnucleotides and can be classified according to theirencoded position with respect to the opposite senseencoded gene Thus they can be classified as 5prime or 3primeoverlapping while others are classified as internallylocated The 112101 specific as_Mb1618c is an exampleof a 5prime overlapping asRNA which is encoded antisense togene Mb1618c which is predicted to express a secretorylipase The location of the asRNA transcript suggests itmay function to prevent translation of Mb1618c mRNAby steric hindrance of the ribosome binding siteIn the work presented here strain 245101 expressed

two asRNAs as_Mb1914c and as_echA21 that are notexpressed by any of the other 3 strains They areencoded within the central part of the opposite genesand are therefore likely to modulate the stability of thetranscripts Mb1914c encodes a short chain dehydro-genase while echA21 encodes an enoyl-CoA hydrataseShort chain dehydrogenases catalyse a wide range offunctions so the precise function and identity of thesubstrate is difficult to deduce from sequence aloneEnoyl-CoA hydratases hydrate double carbon-carbonbonds of macromolecules and are vital in the metabol-ism of fatty acids Both gene products would thereforeappear to be involved in the metabolism of a macromol-ecule and their similar expression profiles in this straincould indicate involvement in the metabolism of the samemolecule or molecules that are of the same pathwayIn many instances upregulation of asRNA negatively

correlates with the transcription of the antisense gene[27] but in many cases expression of the antisense tran-script has no effect on the transcription of the oppositegene In our studies expressions of as_Mb1618cas_Mb1914c and as_echA21 did not appear to have anyeffect on the expressions of the opposite sense encodedgenes (data not shown) We have shown that theexpressions of the asRNAs are associated with the pres-ence of SNPs which are either synonymous or non-synonymous with respect to the sense transcript butupstream of the asRNA transcriptional start site Thishighlights the fact that mutations can potentially affectexpression of transcripts on both strands and that theclassification of a SNP is strand dependent For each ofthe three asRNAs the associated SNP was found to belocated within a putative minus10 promoter motif of groupA mycobacterial promoters The sixth residue of the minus10hexamer motif consensus sequence is a strongly conservedlsquoTrsquo residue which is present in 81 of all group A myco-bacterial promoter elements Its importance is underlined

by the finding that the strains that exhibit a lsquoCrsquo residue atthis position show no detectable expression of the asRNAwhile strains having a lsquoTrsquo residue at this position exhibitexpressionSingle nucleotide polymorphisms were found to be

the most frequent form of genetic variation that existsbetween the isolates with a total of 1013 SNPs detectedacross the three strains 112101 245101 and 130701compared to the reference strain M bovis 212297Non-synonymous SNPs which include both non-senseand missense SNPs are a class of SNPs most likely toimpact on protein function and contribute to phenotypicvariation Non-sense SNPs which results in the expressionof a truncated polypeptide due to the introduction of apremature stop codon were identified in five genes acrossthe strains Of these we focussed our attentions on thenon-sense SNP present in a gene encoding the LysR regu-lator Mb2007c as mutations affecting regulators are likelyto impact on the expression of one or more genes that arepart of the regulon of the regulator and are therefore morelikely to result in phenotypic variation Experiments tocompare the transcriptomes of a strain that exhibited themutation with a strain overexpressing a functional regula-tor did not however reveal any differences The reasonfor this unclear but could reflect a requirement of theregulator for a co-inducer that was absent under the con-ditions of the experiment The consequences of missenseSNPs are more difficult to predict as substitutions of oneamino acid for another in a protein sequence do not ne-cessarily lead to a change in protein function Howeverfor genes that are controlled by an autoregulatory mech-anism a mutation that affects the ability of the product ofthe gene to regulate itself will result in a change in expres-sion of the gene In our studies we have shown that thepresence of a missense mutation in a VapB type toxin en-coding gene Mb1749c in strain 130701 results in theupregulation in expression of the toxin-antitoxin encodingpair of genes Mb1750c-Mb1749c due to the inability ofthe encoded proteins to self regulate themselves Toxin-antitoxin systems have a variety of proposed cellular func-tions including general regulation of mRNA stability levelsin the cell [29] Further experiments are required to fullyunderstand the consequences of this mutationGenomic deletions have played an important role in

the evolution of strains belonging to the mycobacterialcomplex [30] and in the derivation of the tuberculosisvaccine strain M bovis BCG [31] In addition to the previ-ously described 68 kb gene deletion that is specific tostrains having a spoligotype 17 pattern [6] we have identi-fied a 16 kb multi-gene deletion that is specific to strain130701 and encompasses genes that are part of the DosRregulon However one of the deleted genes exists as apseudo gene in DosR in strain 212297 so its importanceto the biology of M bovis is unlikely to be significant

Golby et al BMC Genomics 2013 14710 Page 14 of 18httpwwwbiomedcentralcom1471-216414710

Several other genes with internal deletions were detectedbut none of the encoded proteins have any significantsimilarity to any protein with a defined function

ConclusionsIn conclusion we have performed a comprehensive ana-lysis of 4 M bovis strains of the most common moleculartypes circulating in GB We show that while these strainsshow extensive similarities in their genetic make-up andgene expression profiles they show distinct differences inthe expression of a subset of genes We provide functionaldata to show that SNPs can lead to the expression of anti-sense RNA a finding with implications for how we definea lsquosilentrsquo nucleotide change Furthermore we show thatthe interpretation of transcriptome data based solely onamplicon arrays could lead to artefacts due to expressionof antisense transcripts a caveat that needs to be kept inmind for previous studies of global expression analysis inbacteria

MethodsBacterial strains media and growth conditionsFor the Mϕ infection experiments bovine alveolar Mϕwere cultivated in tissue culture media R10 whichconsisted of RPMI (Invitrogen) media plus 2 mM glu-tamine 10 calf fetal serum and 1 amphotericinWhere used antibiotics gentamycin and ampicillinwere added at concentrations of 50 and 100 μg mlrespectively M bovis field strains were pre-grown inMiddlebrook 7H9 broth supplemented with 10 albumin-dextrose catalase (ADC Difco) 005 Tween and 10 mMpyruvate Cultures were harvested in mid-logarithmicphase (OD600 of 03-08) washed and then resuspended inRPMI containing 005 Tween 80

Isolation of bovine alveolar macrophages and infectionwith mycobacteriaThe lungs of a 6ndash8 week old male Holstein-Friesian calfwere removed and a whole lung lavage procedure wasperformed to washout the alveolar Mϕ Briefly 4ndash5 x500 ml aliquots of Hanksrsquo Balanced Sterile Salts solution(HBSS) were used to infuse the lungs via the tracheaand the washings were pooled in a sterile beaker The Mϕcells contained in the washes were pelleted by centrifuga-tion at 500 x g for 10 mins at 4degC washed and thenresuspended in R10 growth media supplemented withantibiotics (R10+) to a concentration of 1ndash2 x 107 mlApproximately 05-15 x 109 Mϕ were isolated per calflungVented 225 cm2 tissue culture flasks containing R10+

media were seeded with 3ndash4 x 107 alveolar Mϕ andplaced in a humidified 37degC incubator containing 5CO2 Typically 2ndash4 flasks were used per strain and timepoint After 24 hrs the growth media was decanted to

remove non-adherent cells and then replaced with freshR10+ media After a further 24 hrs the growth mediawas discarded and the Mϕ monolayer was washed withRPMI to remove traces of the antibiotic containinggrowth media The monolayer was then covered withR10 media without antibiotics (R10-) and then infectedwith mid-logarithmic phase grown mycobacteria usingan MOI of approximately 101 (bacilli Mϕ) The AlvMϕwere incubated with mycobacteria for 4 hrs after whichthe cell monolayers were washed with RPMI and theneither processed for RNA extraction (4 hr time point)or incubated in fresh R10+ media for a further 20 hrsbefore being processed for RNA extraction (24 hr timepoint)

Extraction and amplification of mycobacterial RNA frominfected macrophagesMϕ cell monolayers were lysed using a guanidiniumthiocyanate (GTC) containing solution The lysed Mϕrsquoswere vortexed and passed twice through a 21G bluntended needle to sheer host genomic DNA and therebyreduce the viscosity of the solution Mycobacterial cellswere then pelleted by centrifugation at 4600 rpm for 20mins at room temperature and washed with GTC solu-tion to remove host genomic DNA Cells were thenresuspended in Trizol and RNA was extracted using theprotocol outlined in Bacon et al [32] The amount ofpurified DNase-treated RNA recovered was of the order100ndash500 ng per time point RNA was amplified usingthe lsquoMessageAmp II-Bacteria RNA Amplification Kitrsquo(Ambion) according to the manufacturersrsquo instructionsUsing an input of 100 ng of unamplified RNA 20ndash100μg of amplified RNA was recovered

Amplicon microarray analysisFor the in vitro growth experiments three independentexperiments (biological replicates) were carried out andfor each strain in each experiment two microarrays (tech-nical replicates) were performed Thus for each strain 6microarrays were performed Three independent AlvMϕinfection experiments were carried out and for eachexperiment two microarrays were performed for each ofthe control RPMI samples and the 4- and 24 hrs post-infection samples Cy5 and Cy3 fluorescently-labelledprobes were synthesised from RNA and genomicDNA respectively and hybridised to whole genomeM bovis M tuberculosis microarrays The array designis available in BμGSbase (accession number A-BUGS-31httpbugssgulacukA-BUGS-31) and also ArrayExpress(accession number A-BUGS-31) Details of probe synthesishybridization conditions and manufacture of the micro-array can be found in Golby et al 2008 Microarrayswere scanned using an Affymetrix 428 Microarrayscanner and scanned images were quantified using

Golby et al BMC Genomics 2013 14710 Page 15 of 18httpwwwbiomedcentralcom1471-216414710

BlueFuse for Microarrays v32 software (BlueGnome)See Golby et al 2008 for further detailsNormalisation was performed by dividing the log ratio

of the Cy5 to Cy3 signal for every spot by the median ofthe log ratios for all spots except control spots A medianabsolute (MAD) scale transformation was applied to thenormalised data from the pas an additional normalisationstep For every microarray duplicate spots were averagedand then the average expression of every gene across alltechnical replicate microarrays was calculated Averagesof the three biological replicates were used to comparegene expression between strains For each gene a mod-erated t-test was applied and those genes with a P- valueless than 005 were selected From this gene list thosegenes whose average expression differed by more than25-fold between strains were selected Fully annotatedmicroarray data have been deposited in BμGSbase (acces-sion number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (accession number E-BUGS-150)

Oligonucleotide microarray analysisExperiments were performed in a similar way to thatdescribed for the in vitro amplicon array experimentsexcept that the RNA was purified using the mirVanamiRNA Isolation kit (Ambion) which is designed tocapture small (gt20 nt) RNA species RNA and genomicDNA were directly labelled with Cy5- and Cy3 respect-ively using the ULS microRNA labelling kit (Kreatech)according to the manufacturerrsquos instructions PurifiedCy5- and Cy3-labelled probes were co-purified andapplied to an Agilent 40K custom made tiled (10 ntoverlap) 60-mer oligonucleotide microarray designed tothe genomic sequence of M bovis 21229797 The arraydesign is available in BμGSbase (accession numberA-BUGS-52 httpbugssgulacukA-BUGS-52) and alsoArrayExpress (accession number A-BUGS-52) Microarrayswere hybridised at 65degC for 18 hrs and then washed inWash Buffer 1 (Agilent) at room temperature for 1 minuteThe slides were then washed in Wash Buffer 2 (Agilent) at37degC for 1 minute dried and then scanned at 2 μm usingan Agilent DNA microarray scannerTiling array data was analysed using the Limma package

of RBioconductor [33] The signal median was quan-tile normalised between arrays followed by a LOESSnormalisation within arrays Differential expressionanalysis was performed by pairwise comparison usinglinear models and empirical Bayes methods and Pvalues adjusted using the Benjamini and Hochbergsmethod to control for multiple testing Fully annotatedmicroarray data have been deposited in BμGSbase(accession number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (accession num-ber E-BUGS-150)

Whole genome sequencing of M bovis field isolatesWhole genome paired end (2 x 76bp) sequencing wasperformed using Illumina HiSeq machines at theWellcome Trust Sanger Institute (Hinxton Cambridge)Raw sequence data was uploaded to the European Nucleo-tide Archive (ENA) and can be downloaded at httpwwwebiacukena accession numbers ERX006616 ERX06617ERX012284 and ERX012286 The FastQC (httpwwwbioinformaticsbabrahamacukprojectsfastqc) programwas used to analyse the quality of the raw data reads Rawsequence data was trimmed to remove adapter sequencesand nucleotides where the sequence quality score wasbelow 20 Filtered reads were aligned to the M bovis refer-ence strain 212297 [10] with the SMALT alignmentprogram (httpwwwsangeracukresourcessoftwaresmalt)using default settings The average mean coverages for thesequenced strains 112101 245101 and 130701 were293 224 and 234 respectively Reads that did not maponto the reference genome were de novo assembled andblasted against the NCBI data base in order to find regionspresent in these strains but absent in 212201 Variantcalling and the generation of consensus sequenceswere carried out using the SAMtools program suite(httpsamtoolssourceforgenet) SNPs that had a mini-mum of quality score of 200 and had a minimum of fourgood quality forward (reverse) reads covering the SNP siteand having the variant base were retained The number offorward (reverse) reads mapping with good quality ontothe SNP site having the same base as in the reference hadto be less than 5 of the total number of forward (reverse)reads mapping with good quality onto the SNP site havingthe variant base Sequences of 12 bases in length centredaround SNP sites were blasted to the genome of the refer-ence strain 212297 and any SNP within an area with a90 hit to another area of the genome were filtered outThis eliminated most of the SNPs within the repetitivePE-PGRS regions For all sequenced strains more than999 of the genomes were covered by reads

Real time RT-PCRQuantitative real-time SYBR Green based PCR (qRT-PCR)experiments were performed using a RotorGene 3000(Corbett research) as described by Golby et al [23] Foldchanges were calculated using relative standard curvemethod and pcr controls included no template and noreverse transcriptase Primer pair sequences are given inAdditional file 5 available with the online version ofthis paper

Construction of the Rv1749c-Rv1750c and nirBDoverexpressing plasmidsThe Rv1749c-Rv1750c overexpressing plasmids pPG106and pPG107 were constructed by PCR amplification of a907bp fragment encompassing Rv1749c-Rv1750c-lsquoRv1751c

Golby et al BMC Genomics 2013 14710 Page 16 of 18httpwwwbiomedcentralcom1471-216414710

using primers tox_f and tox_r For pPG106 the fragmentwas amplified using 21229797 genomic DNA as a tem-plate while pPG107 was amplified using 130701 gDNABoth fragments were digested with SpeI and cloned intothe SpeI cut mycobacterial attP-integrating shuttle vectorpKINT (a gift from Douglas Young Imperial CollegeLondon) Plasmids pPG108 and pPG109 which contain a35 kb hsprsquo-nirB-nirD-lsquocobU fragment was constructed inseveral steps Firstly two 16 kb hsprsquo-nirBrsquo PCR fragmentswere PCR amplified separately using primers nirB1_f andnirB1_r and 212297 and 130701 as genomic templatesSimilarly two 18kb lsquonirB-nirD-lsquocobU fragments wereamplified using primers nirB2_f and nirB2_r and genomicDNAs 212297 and 130701 The two PCR products weredigested with SpeI and BamHI and then co-ligated intopKINT Details concerning the nucleotide sequences ofthe pcr primer pairs are given in the Additional file 5

5prime-RLM-RACE PCRTranscriptional start site mapping of was determinedusing the First Choice RNA ligase-mediated rapid amp-lification of cDNA ends (RLM-RACE) kit (Ambion) asper manufacturers instructions Briefly 10 ug of totalRNA was treated with calf intestinal phosphatase (CIP)and tobacco acid pyrophosphatase (TAP) before ligationof an RNA Adapter oligonucleotide to the 5prime ends ofthe mRNA transcripts A random-primed reverse tran-scription reaction was carried out to generate cDNAand then nested PCR reactions were performed on thecDNA using combinations of adapter and gene specificprimers Details concerning the nucleotide sequencesof the 5prime outer and inner adapter sequences as well as3prime outer and inner gene specific primers are given inthe Additional file 5 PCR products generated usingthe 5prime inner adapter and 3prime inner gene specific primerswere sequenced by Sanger sequencing using the 3prime innergene specific primer

Animal ethics statementAnimal work was carried out according to the UK Animal(Scientific Procedures) Act 1986 The study protocol wasapproved by the AHVLA Animal Use Ethics Committee(UK Home Office PCD number 706905)

Availability of supporting data sectionFully annotated microarray data have been deposited inBμGSbase (accession number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (ac-cession number E-BUGS-150) Raw sequence data wasuploaded to the European Nucleotide Archive (ENA) andcan be downloaded at httpwwwebiacukenadataviewERX006616-ERX06617ERX012284-ERX012286

Additional files

Additional file 1 SNPs identified across sequenced strains

Additional file 2 Large deletions identified by NGS in sequencedstrains

Additional file 3 Fold change differences in gene expression inM bovis field isolates 1121 2451 and 1307 compared to 2122Cells shaded in red indicate upregulation green indicate down-regulation and empty cells indicate no change in expression

Additional file 4 Differential expression of RNA transcripts asdetected by a tiled oligonucleotide microarray Up and down arrowsindicate fold up- and downregulation respectively and empty cellsindicate no change in expression Strand A indicates antisense S senseand I intergenic

Additional file 5 Details of oligonucleotides used in PCR RT-PCRand RLM-RACE experiments

Competing interestsThe authors declare that they have no competing interests

Authorsrsquo contributionsPG carried out the culturing of organisms macrophage infectionsmicroarrays qRT-PCR cloning analysis of sequence data and writing of themanuscript JN performed analysis of the Illumina sequence and PCRamplicon microarray data AW and JH performed the analysis of the Agilentoligonucleotide microarray data and assisted in the uploading of themicroarray data into the BμGSbase database MAQ and SB wereresponsible for the Illumina sequencing SH contributed to the analysis ofthe Illumina sequence data NS contributed to the writing and critical reviewof the manuscript RGH was a co-investigator who helped to guide theproject SG was the principle investigator on the project who guided theresearch and helped in the writing of the manuscript All authors read andapproved the final manuscript

AcknowledgementsWe would like to acknowledge Kate Gould for her help in carrying out theoligonucleotide microarrays and Denise Waldron for her assistance inuploading of the microarray data into BμGSbase We also thank MarkChambers for comments on the manuscript and BBSRC and Defra forfunding the work (Grant no BBE0184911)

Author details1Animal Health and Veterinary Laboratories Agency Woodham Lane NewHaw Addlestone Surrey KT15 3NB UK 2Division of Clinical Sciences BacterialMicroarray Group Centre for Infection amp Immunity St Georgersquos University ofLondon Cranmer Terrace London SW17 0RE UK 3The Wellcome TrustSanger Institute Hinxton Cambridgeshire UK 4UCD School of VeterinaryMedicine and UCD Conway Institute University College Dublin Dublin 4Ireland

Received 4 June 2013 Accepted 26 September 2013Published 17 October 2013

References1 Monaghan ML Doherty ML Collins JD Kazda JF Quinn PJ The tuberculin

test Vet Microbiol 1994 40(1ndash2)111ndash1242 Kamerbeek J Schouls L Kolk A van Agterveld M van Soolingen D Kuijper

S Bunschoten A Molhuizen H Shaw R Goyal M et al Simultaneousdetection and strain differentiation of Mycobacterium tuberculosis fordiagnosis and epidemiology J Clin Microbiol 1997 35(4)907ndash914

3 Frothingham R Meeker-OrsquoConnell WA Genetic diversity in theMycobacterium tuberculosis complex based on variable numbers oftandem DNA repeats Microbiology 1998 144(Pt 5)1189ndash1196

4 Smith NH Dale J Inwald J Palmer S Gordon SV Hewinson RG Smith JMThe population structure of Mycobacterium bovis in Great Britain clonalexpansion Proc Natl Acad Sci USA 2003 100(25)15271ndash15275

5 Winder CL Gordon SV Dale J Hewinson RG Goodacre R Metabolicfingerprints of Mycobacterium bovis cluster with molecular type

Golby et al BMC Genomics 2013 14710 Page 17 of 18httpwwwbiomedcentralcom1471-216414710

implications for genotype-phenotype links Microbiology 2006152(Pt 9)2757ndash2765

6 Wheeler PR Brosch R Coldham NG Inwald JK Hewinson RG Gordon SVFunctional analysis of a clonal deletion in an epidemic strain ofMycobacterium bovis reveals a role in lipid metabolism Microbiology2008 154(Pt 12)3731ndash3742

7 Arnvig KB Young DB Identification of small RNAs in Mycobacteriumtuberculosis Mol Microbiol 2009 73(3)397ndash408

8 Arnvig KB Comas I Thomson NR Houghton J Boshoff HI Croucher NJRose G Perkins TT Parkhill J Dougan G et al Sequence-based analysisuncovers an abundance of non-coding RNA in the total transcriptome ofMycobacterium tuberculosis PLoS Pathog 2011 7(11)e1002342

9 Smith NH Upton P Naming spoligotype patterns for the RD9-deletedlineage of the Mycobacterium tuberculosis complex www MbovisorgInfect Genet Evol 2012 12(4)873ndash876

10 Garnier T Eiglmeier K Camus JC Medina N Mansoor H Pryor M Duthoy SGrondin S Lacroix C Monsempe C et al The complete genome sequenceof Mycobacterium bovis Proc Natl Acad Sci USA 2003 100(13)7877ndash7882

11 Albers CA Lunter G MacArthur DG McVean G Ouwehand WH Durbin RDindel accurate indel calls from short-read data Genome Res 201121(6)961ndash973

12 Mostowy S Inwald J Gordon S Martin C Warren R Kremer K Cousins DBehr MA Revisiting the evolution of Mycobacterium bovis J Bacteriol2005 187(18)6386ndash6395

13 Boon C Dick T Mycobacterium bovis BCG response regulator essentialfor hypoxic dormancy J Bacteriol 2002 184(24)6760ndash6767

14 Van Gelder RN von Zastrow ME Yool A Dement WC Barchas JD EberwineJH Amplified RNA synthesized from limited quantities of heterogeneouscDNA Proc Natl Acad Sci USA 1990 87(5)1663ndash1667

15 Pandey DP Gerdes K Toxin-antitoxin loci are highly abundant in free-living but lost from host-associated prokaryotes Nucleic Acids Res 200533(3)966ndash976

16 Arcus VL McKenzie JL Robson J Cook GM The PIN-domain ribonucleasesand the prokaryotic VapBC toxin-antitoxin array Protein Eng Des Sel 201124(1ndash2)33ndash40

17 Bukowski M Rojowska A Wladyka B Prokaryotic toxin-antitoxin systemsndashthe role in bacterial physiology and application in molecular biologyActa Biochim Pol 2011 58(1)1ndash9

18 Maddocks SE Oyston PC Structure and function of the LysR-typetranscriptional regulator (LTTR) family proteins Microbiology 2008154(Pt 12)3609ndash3623

19 Moreno-Vivian C Cabello P Martinez-Luque M Blasco R Castillo FProkaryotic nitrate reduction molecular properties and functionaldistinction among bacterial nitrate reductases J Bacteriol 1999 181(21)6573ndash6584

20 Tiffert Y Supra P Wurm R Wohlleben W Wagner R Reuther J TheStreptomyces coelicolor GlnR regulon identification of new GlnR targetsand evidence for a central role of GlnR in nitrogen metabolism inactinomycetes Mol Microbiol 2008 67(4)861ndash880

21 Duan J Wainwright MS Comeron JM Saitou N Sanders AR Gelernter JGejman PV Synonymous mutations in the human dopamine receptor D2(DRD2) affect mRNA stability and synthesis of the receptor Hum MolGenet 2003 12(3)205ndash216

22 Capon F Allen MH Ameen M Burden AD Tillman D Barker JN TrembathRC A synonymous SNP of the corneodesmosin gene leads to increasedmRNA stability and demonstrates association with psoriasis acrossdiverse ethnic groups Hum Mol Genet 2004 13(20)2361ndash2368

23 Golby P Hatch KA Bacon J Cooney R Riley P Allnutt J Hinds J Nunez JMarsh PD Hewinson RG et al Comparative transcriptomics reveals keygene expression differences between the human and bovine pathogensof the Mycobacterium tuberculosis complex Microbiology 2007153(Pt 10)3323ndash3336

24 Bitter W Houben EN Bottai D Brodin P Brown EJ Cox JS Derbyshire KFortune SM Gao LY Liu J et al Systematic genetic nomenclature for typeVII secretion systems PLoS Pathog 2009 5(10)e1000507

25 Sorek R Kunin V Hugenholtz P CRISPRndasha widespread system thatprovides acquired resistance against phages in bacteria and archaeaNat Rev Microbiol 2008 6(3)181ndash186

26 Gomez M Smith I Determinants of Mycobacterial Gene Expression InMolecular Genetics of Mycobacteria Washington DC ASM Press2000111ndash129

27 Duhring U Axmann IM Hess WR Wilde A An internal antisense RNAregulates expression of the photosynthesis gene isiA Proc Natl Acad SciUSA 2006 103(18)7054ndash7058

28 Kawano M Aravind L Storz G An antisense RNA controls synthesis of anSOS-induced toxin evolved from an antitoxin Mol Microbiol 200764(3)738ndash754

29 Thomason MK Storz G Bacterial antisense RNAs how many are thereand what are they doing Annu Rev Genet 2010 44167ndash188

30 Brosch R Pym AS Gordon SV Cole ST The evolution of mycobacterialpathogenicity clues from comparative genomics Trends Microbiol 20019(9)452ndash458

31 Calmette A Guerin C Sur quelques proprietes du bacille tuberculeuxdrsquoorigine cultive sur la bile de boeuf glycerinee C R Acad Sci Paris 1909149716ndash718

32 Bacon J James BW Wernisch L Williams A Morley KA Hatch GJ ManganJA Hinds J Stoker NG Butcher PD et al The influence of reduced oxygenavailability on pathogenicity and gene expression in Mycobacteriumtuberculosis Tuberculosis (Edinb) 2004 84(3ndash4)205ndash217

33 Smyth GK Linear models and empirical bayes methods for assessingdifferential expression in microarray experiments Stat Appl Genet Mol Biol2004 33

doi1011861471-2164-14-710Cite this article as Golby et al Genome-level analyses ofMycobacterium bovis lineages reveal the role of SNPs and antisensetranscription in differential gene expression BMC Genomics 2013 14710

Submit your next manuscript to BioMed Centraland take full advantage of

bull Convenient online submission

bull Thorough peer review

bull No space constraints or color figure charges

bull Immediate publication on acceptance

bull Inclusion in PubMed CAS Scopus and Google Scholar

bull Research which is freely available for redistribution

Submit your manuscript at wwwbiomedcentralcomsubmit

Golby et al BMC Genomics 2013 14710 Page 18 of 18httpwwwbiomedcentralcom1471-216414710

each chosen strain was the most representative of eachspoligotype group (Table 1) Of the four studied strains245101 and 130701 diverged earliest during descentfrom the most recent common ancestor of M bovis inGB and are more distant to strains 112101 and 130701 (Smith N personal communication) All 4 strainswere isolated from diseased cows belonging to herdswhich were taken from farms in geographically diverseareas of the countryThe genomes of the three M bovis strains 112101

245101 and 130701 were paired-end sequenced usingIllumina sequencing technology Processed sequencereads were mapped to the genome of the fully sequenced

and annotated strain 212297 [10] to identify SNPsSNPs were identified across all four sequenced M bovisstrains and their positions together with their SNPclass are listed in Additional file 1 The evolutionaryrelationships between the three sequenced strains aredepicted in Figure 1 using a phylogenetic tree and adistance matrix plot Three genome sequenced membersof the Mycobacterium tuberculosis complex M bovisBCG Pasteur M tuberculosis H37Rv and M africanumGM041182 were included in the analysis to place theM bovis strains in a wider mycobacterial context Thenumbers of SNPs between M bovis 212297 and thethree newly sequenced M bovis strains were found

Table 1 M bovis field strains used in this study

AHVLA ID International spoligotype ID [9] UK spoligotype ID VNTR ID Year of isolation UK county of isolation

112101 SB0263 17 7555331 2001 Wiltshire

212297 SB0140 9 8555331 1997 Devon

245101 SB0129 25 6554231 2001 Clwyd

130701 SB0134 35 3354331 2001 Shropshireindicates partial number of repeats

Figure 1 Whole genome SNP-based evolutionary analysis of M bovis sequenced strains (a) Phylogenetic tree with numbers above thebranches indicating the number of SNPs identified between the organism and its common ancestor (b) Distance matrix plot showing thenumber of SNPs present between selected pairs of strains

Golby et al BMC Genomics 2013 14710 Page 3 of 18httpwwwbiomedcentralcom1471-216414710

to be consistent with their predicted evolutionary distancesfrom each other Strain 112101 (type 17) is most closelyrelated to the original genome sequenced strain 212297(type 9) with 114 SNPs whereas the more distantly relatedstrains 245101 (type 25) and 130701 (type 35) have 431and 552 SNP differences respectively All four sequencedM bovis strains were more distantly related to Mafricanum (~1800 SNPs) and M tuberculosis H37Rv(~2200 SNPs)In addition to SNPs both large and small INDELs can

be inferred from NGS data although there are severalchallenges involved [11] Large sized insertions (gt500bp) are particularly difficult to identify with accuracy asthey require the de novo assembly of the reads that failto map to the reference genome and the subsequentdatabase searching with the resulting contigs Accurateidentification of large sized deletions (gt500 bp) are how-ever more easily identifiable and Additional file 2 liststhose that have been identified by NGS in the genomesof the three sequenced strains As poor coverage canconfound the identification of deletions we sought touse microarray technology to confirm the deletions iden-tified by the NGS data Genomic DNA was isolated fromall four strains labelled with fluorescent dyes andhybridised to a whole genome M tuberculosisM bovisamplicon microarray (see Methods) Table 2 lists severalLSPs that were detected across the strains usingmicroarrays Both NGS and microarray data predict thepresence of the large 68kb deletion (RDbovis(d)_0173)which encompasses genes Mb1963-Mb1971 and appearsto be specific to UK strains belonging to Type 17 thathas been described in a previous study [12] Several ofthese gene products are predicted to encode proteins in-volved in lipid metabolism but the lipid composition ofseveral type 17 isolates was found to be no different toother M bovis strains although their ability to incorpor-ate propionate into mycolic acids was found to be lower[6] A smaller 16 kb deletion specific 130701 that com-prises the 3prime end of Mb2056c Mb2055c and the 5prime end

of pkfB (Mb2054c) was also detected by both NGS andmicroarray data Due to a single base deletion Mb2056cand Mb2055c are pseudogenes in 212297 but the twogenes exist as one intact functional gene in 2145 andH37Rv (Rv2030c) The pfkB gene encodes a phospho-fructokinase homologue and is strongly immunogenic inhuman TB patients while Rv2030c encodes an erythro-mycin esterase Both pfkB and Mb2056c are members ofthe DosR regulon which are highly upregulated underanaerobic conditions and have been implicated in bac-terial persistence in vivo [13] Other smaller deletionsdetected include a deletion of a probable lipid transferprotein encoding gene Mb1699c which is specific to130701 and an aldoketo reductase encoding geneMb2320 that is specific to 112101

Linking SNPs to genes that show differential expressionamongst M bovis strains grown under vitro conditionsand in ex vivo macrophagesThe four M bovis field strains were grown to mid-logarithmic phase in pyruvate-containing Middlebrook7H9 liquid media and then used to infect bovine alveo-lar Mϕ using a multiplicity of infection (MOI) of 101(bacilli Mϕ) Mycobacterial RNA was recovered frominfected host cells 4 and 24 hrs post infection using adifferential lysis procedure and amplified using a modi-fied procedure similar to that described by van Gelderet al ([14] see Methods) As a control RNAs were alsoextracted from strains that had been incubated staticallyin RPMI cell culture media for a period of 4 hrs Toeliminate potential skewing effects on the transcriptomeresulting from the amplification process comparisonswere made only between amplified RNA generated fromsamples collected at the same time point and biologicalreplicate For the in vitro growth condition total RNAswere extracted from the four strains grown in a pyruvate-containing Middlebrook 7H9 liquid media and rolledduring incubation

Table 2 Large sequence polymorphisms present in M bovis field isolates

Mb CDS Mtb CDS Genomic postn(wrt to 212297)

Size ofdeletion (bp)

Gene(s) 212297 112101 245101 130701 Product(s)

Mb1699c Rv1627c nd nd DEL Probable lipid transfer proteinaMb1963c-Mb1971

Rv1928c-Rv1936

2172231-2179038 6807 tpx fadE17fadE18 echA13

DEL Gene products have roles inlipid metabolism

Mb2054c Rv2029c 2259798-2261457 1659 pfkB DEL Possible phosphofructokinase

Mb2055cMb2056c

Rv2030c DEL Conserved hypothetical protein(frame-shifted in 221297)

Mb2320 Rv2298 nd nd DEL DEL Aldoketo reductase

Mb3923c Rv3894c nd nd DEL Conserved membrane protein(frame-shifted in 221297)

DEL indicates genes that are deleted from strainsaMostowy et al 2005 nd not determined

Golby et al BMC Genomics 2013 14710 Page 4 of 18httpwwwbiomedcentralcom1471-216414710

The RNAs extracted from each of the growth condi-tions were converted to cyanine labelled cDNA using re-verse transcriptase and hybridised to whole genomeamplicon microarrays Using only those genes that arecommon to all four strains we found a total set of 70genes that showed a 25-fold or more difference in ex-pression in one or more strains when pairwise compari-sons were made between the transcriptomes of 212297and 112101 245101 or 130701 (Additional file 3) Asubset of these 70 genes is shown in Table 3 where keyexamples of alterations in metabolic processes areshown The numbers of genes that were found to showdifferential expression across the four strains reflectedthe evolutionary distances between 212297 and theother 3 strains Thus 112101 which is closely relatedto 212297 shows only 5 differentially expressed geneswhile the most distantly related strain 130701 shows 56genes differently expressed compared to 212297 Ofthese 56 genes 5 were specific to the in vitro conditionwhile 19 were specific to the Mϕ Ten genes were com-mon to both conditions which serves to validate thetechnical reproducibility of the RNA amplificationprocessUsing the genome sequencing information determined

for each of the four strains we attempted to correlatethe observed strain-specific differences in gene expres-sion with the presence or absence of mutations withinthe coding regions or promoters of those genes thatshow differential gene expression or in genes that areknown to regulate the activity of those genes Mb1749cand Mb1750c are two genes that are specificallyupregulated in 130701 and encode a toxin and antitoxin(TA) pair respectively belonging to type II TA systemsof the VapBC family [15] Members of VapB type toxinscontain PIN domains that cleave RNA and thus functionto control translation of mRNA transcripts [16] Thehomologous genes from strain 130701 show up to 19-and 10-fold higher levels of expression respectively thanthose of the other three strains An analysis of the cod-ing sequences of Mb1749c across all four strains re-vealed that the 130701 homologue has a unique nSNPat position 1932704 (wrt 212297 genomic sequence) aC-T transition that results in the nonconservative substi-tution of Gly19 to Asp Research has shown that TAgene pairs negatively regulate their own expressionthrough binding of the TA protein complex to the pro-moter region of the TA gene pair thus preventing accessto RNA polymerase [17] The G19D mutation inMb1749c could therefore impair the ability of the com-plex to bind to the promoter resulting in the deregula-tion of the TA gene pairMb2007c which shows a 4-fold higher expression in

130701 only encodes a transcriptional regulator of theLysR class There are two SNPs present in the coding

sequence of Mb2007c in 130701 which are absent inthe homologues of the other three strains the first is anSNP which results in the conservative substitution ofArg137 to Gln while the second is a more debilitatingnonsense SNP which ultimately leads to a protein whoselength is only 60 that of the wild-type Many regulatorsbelonging to the LysR family regulate their own expres-sion through a negative autoregulatory mechanism simi-lar to that described above for VapBC TA systems [18]A loss in protein integrity could therefore result in theregulator being unable to bind the regulatory regionleading to the observed upregulation in the expressionof this gene in 130701 As the product of this gene ispredicted to be a transcriptional regulator it was specu-lated that the regulation of gene(s) controlled by regula-tor could be affected in 130701 due to the severelytruncated form of this protein As LysR regulators areoften found to regulate genes that are divergently tran-scribed from the lysR gene it was surprisingly to findthat expression of the Mb2008 homologue in 130701which is predicted to encode a lysine transporter doesnot show any difference in expression in 130701 to212297 To define the regulon of this regulator we firstcompared the transcriptomes of 212297 transformedwith a multicopy plasmid expressing the truncated copyof mb2007c against a vector only control No differencesin expression were found (data not shown) which couldindicate that the regulator does not control any othergenes apart from itself or that experimental conditionsdid not favour the active form of the regulator LysR reg-ulators regulate expression of their regulon throughbinding of a co-inducer to the C-terminal domain andthe failure to observe any changes in gene expressioncould therefore be due to the absence of the co-inducerduring the experiment A further experiment to comparethe profiles of 212297 expressing either the truncatedor wild type forms of the protein also showed no differ-ences in expression (data not shown)Nitrite reductase catalyses the reduction of nitrite to

ammonia and is strongly expressed during growth in thepresence of nitrate or nitrite but repressed in the pres-ence of ammonia [19] The gene encoding the large sub-unit of the nitrite reductase nirB (Mb0258) showsapproximately 9-fold higher expression in 130701 com-pared to the other 3 strains in our standard ammoniacontaining 7H9 growth media suggesting that the strainhas lost regulatory control of this gene Expression ofnirB in M tuberculosis has been shown to be controlledby the response regulator GlnR [20] but an analysis ofthe sequence of the glnR orthologue from all four strainsrevealed no differences in either the coding or promotersequences A comparison of the nirB sequence from all4 strains did however reveal the presence of a singlebase (C to T) transition leading to a sSNP that is specific

Golby et al BMC Genomics 2013 14710 Page 5 of 18httpwwwbiomedcentralcom1471-216414710

Table 3 Fold change differences in gene expression in M bovis field isolates 1121 2451 and 1307 compared to 2122

Mb CDS Mtb CDS Common 112101 245101 130701 Product Assoc SNPInDel

7h9 RPMI 4hr MOslash 24 hr MOslash 7h9 RPMI 4hr MOslash 24 hr MOslash 7h9 RPMI 4hr MOslash 24 hr MOslash

Mb0038c Rv0037c 4uarr 3uarr 3uarr probable conservedintegral membrane

protein

Mb0124c Rv0120c fusA2 3uarr 3uarr probable elongationfactor

Mb0258 Rv0252 nirB 9uarr 12uarr 21uarr 9uarr nitrite reductase(large subunit)

sSNP at postn 303227 in 130701only C-T

Mb0259 Rv0253 nirD 3uarr 4uarr 5uarr nitrite reductase(small subunit)

Mb0428c Rv0420c 2uarr possibletransmembrane

protein

Mb0734 Rv0713 2uarr probable conservedtransmembrane

protein

Mb0947c Rv0923c 2darr 2darr conservedhypotheitical protein

Mb0948c Rv0924c mntH cation uptake system

Mb1013Mb1014

Rv0987 5uarr 10uarr 5uarr 5uarr 7uarr 6uarr probable adhesioncomponent transport

ABC transporter

nSNP at postn 1104263 in 245101 and130701 A-G (STOP to W) sSNP at

postn 1103991 in 245101 and 130701

Mb1015 Rv0988 3uarr 9uarr 3uarr 5uarr 5uarr possible conservedexported protein

Mb1161 Rv1130 2darr conservedhypothetical protein

Mb1162 Rv1131 gltA1 3darr probable citratesynthase

Mb1554c Rv1527c pks5 2darr probable polyketidesynthase

Mb1562 Rv1535 2uarr 5uarr hypothetical protein

Mb1619c Rv1593c 2uarr conservedhypothetical protein

Mb1749c Rv1720c 10uarr 19uarr 18uarr 12uarr conservedhypothetical protein

nSNP at postn 1932704 in130701 only C-T (G to D)

Mb1750c Rv1721c 6uarr 3uarr 9uarr 5uarr conservedhypothetical protein

Mb1833c Rv1804 3darr conservedhypothetical protein

Mb1834c Rv1805c 4uarr hypothetical protein

Golby

etalBM

CGenom

ics201314710

Page6of

18httpw

wwbiom

edcentralcom1471-216414710

Table 3 Fold change differences in gene expression in M bovis field isolates 1121 2451 and 1307 compared to 2122 (Continued)

Mb1835 Rv1806 PE20 3uarr PE family protein

Mb1885c Rv1854c ndh 3darr probable nadhdehydrogenase

Mb1914c Rv1882c 6uarr 3uarr 6uarr probable short-chaintype dehydrogenase

reductase

sSNP at postn 2122970 in241501 only C-T

Mb2007c Rv1985c 4uarr 3uarr 3uarr probabletranscriptional

regulatory protein

nSNP at postn 2208296 in 130701only G-A (Q to Stop)

Mb2015c RV1992c ctpG 2darr 3darr probable cationtransporter

Mb2420c Rv2398c cysW 2darr 6darr sulphate transporter

Mb2421c Rv2399c cysT 2darr 4darr sulphate transporter

Mb2607 Rv2577 9darr 5darr 7darr 12darr 10darr 7darr 7darr 13darr conservedhypothetical protein

[first part]

Mb3194 Rv3169 3uarr 3uarr 2uarr 3uarr conservedhypothetical protein

Mb3477c Rv3447c 8darr 3darr 10darr 9darr 2darr 7darr probable conservedmembrane protein

nSNP at postn 3812465 in 245101and 130701 T-C (S to G)

Mb3563c Rv3533c PPE62 5uarr 5uarr ppe family protein

Mb3721c Rv3696c glpK 3darr glycerol kinase

Mb3803 Rv3774 echA21 8uarr 7uarr 22uarr possible enoyl-coAhydratase

sSNP at postn 4155803 in241501 only G-A

Up and down arrows indicate fold up- and downregulation respectively and empty cells indicate no change in expression

Golby

etalBM

CGenom

ics201314710

Page7of

18httpw

wwbiom

edcentralcom1471-216414710

to 130701 It was not readily apparent why a sSNP in thecoding sequence of a gene should lead to an increase inexpression of that gene but there are several reports thatshow sSNPs leading to changes in stability of mRNA tran-scripts [2122] Rv0987 and Rv0988 of M tuberculosisH37Rv encode part of an ABC transporter and a putativesecreted hydrolase respectively In 212297 a single basetransition (G-A) introduces a stop codon that splitsRv0987 into the two pseudogenes Mb1013 and Mb1014Previous microarray based gene expression studies by ourgroup have shown that Rv0987 and Rv0988 in M tubercu-losis show higher levels of expression than the orthologousMb1013Mb1014 and Mb1015 respectively in M bovis212297 [23] and in the present study the Mb1013Mb1014 and Mb1015 homologues in 245101 and 130701 also showed higher expression (up to 10-fold) than thehomologues in 212297 and 112101 Comparing thesequences of Mb1013Mb1014 and Mb1015 across all 4strains indicated that strains that show high expressionhave the lsquoGrsquo alleleMb3477c encodes an ATP binding membrane protein

part of the Esx4 secretion system [24] and gene shows upto 10-fold higher expression in 245101 and 130701 com-pared to 212297 The gene also contains an A to C transi-tion at position 3812465 a nSNP at position resulting inthe non-conservative substitution of a serine to a glycineresidue

Of the 19 genes that show specific differential expres-sion in the Mϕ the most notable are Mb1914c andechA21 which show upregulation in 245101 only (upto 6- and 23-fold respectively) Both genes encodeproteins that could be involved in lipid metabolism andboth genes contain single sSNPs that are present in245101 but absent in the other three strainsReal time RT-PCR was used to verify a selection of

genes that showed differential gene expression as pre-dicted by the microarray analysis Figure 2 comparesthe fold changes in the expression levels of 4 genes asmeasured by microarray and real time RT-PCR ThenirB and Mb1749c genes were selected because theyshowed strong upregulation in 130701 in both in vitroand ex vivo Mϕ while Mb1914c and echA21 were chosenbecause the array data predicted them to be specificallyupregulated in 245101 and only in ex vivo Mϕ Foreach of the 4 genes the strain dependent pattern ofexpression as measured by real time RT-PCR was consistentwith that measured by microarray although the foldchanges measured by real time RT-PCR were higherthan those measured by microarray

Functional analysis of SNP role in differential geneexpressionThe above data showed that many of the strain specificdifferences in gene expression were linked to the presence

Figure 2 Confirmation of amplicon microarray results with real time RT-PCR The fold changes in gene expression for (a) Mb1750c (b) nirB(c) echA21 and (d) Mb1914c measured by microarray (open bars) in each of the four strains were compared to that measured by real time RT-PCR(closed bars)

Golby et al BMC Genomics 2013 14710 Page 8 of 18httpwwwbiomedcentralcom1471-216414710

of synonymous or non-synonymous SNPs located withinthe coding regions of the genes that show variable expres-sion Non-synonymous SNPs lead to changes in aminoacid sequence which can lead to changes in protein func-tion The C to T transition at position 1932704 (wrt 212297) in the coding sequence of the 130701 Mb1749chomologue leads to the non-conservative substitution ofGly19 to Asp and this nSNP appears to be linked to theupregulation of both Mb1749c and Mb1750c in thatstrain In order to confirm this a 09 kb DNA fragmentcontaining the Mb1749c-Mb1750c-lsquoMB1751c region of130701 (containing the lsquoTrsquo allele) and the equivalentregion from 212297 (with the lsquoCrsquo allele) were PCRamplified and the fragments were cloned separatelyinto the mycobacterial shuttle vector pKINT (see Methods)to create the constructs pPG107 and pPG106 respectivelyThe two constructs were introduced into Mycobacteriumsmegmatis mc2155 separately and then the expression ofMb1749c and Mb1750c in M smegmatis pPG101 wascompared to that of M smegmatis pPG102 using real timeRT-PCR Table 4 shows that the expression levels ofMb1750c and Mb1749c in the strain expressing themutated forms of Mb1749cMb1750c are 13- and 9-foldhigher respectively compared to the strain expressingthe wild type forms confirming that this SNP is respon-sible for the observed up-regulation of the two genes in130701Synonymous substitutions do not lead to changes in

protein sequence and have generally been considered tobe lsquosilentrsquo or benign Recent studies however have sug-gested that sSNPs can have functional effects such asdecreased mRNA stability and translation [2122] Inour own studies we have found several genes whose ex-pression levels correlate with the presence of sSNPs inthe coding regions of those genes For example a C-Ttransversion at position 303227 (wrt 212297) withinthe coding sequence of nirB of 130701 is a sSNP thatappears to be linked with the upregulation in expressionof nirB within that strain To confirm that this is thecase we PCR amplified 35 kb DNA fragments containing

the hsp-nirB-nirD-cobU region of strain 130701 (containingthe lsquoCrsquo allele) and the equivalent region from strain 212297(with the lsquoTrsquo allele) and cloned them separately into theintegrating vector pKINT to create the constructspPG108 and pPG109 respectively These constructswere introduced into 212297 and the expression levelsof nirB and nirD were found to be 30- and 2-fold re-spectively higher in the strain expressing the mutatednirBD locus compared to the strain expressing the wild-type form (Table 4) This confirms that this mutation isresponsible for the upregulation of the two genes in thisstrain

Use of a high density tiled oligonucleotide microarray todetect differentially expressed small RNA transcripts inM bovis isolatesThe M tuberculosisM bovis amplicon arrays used inthe present study were specifically designed to measureexpression levels of genes annotated in the genomic se-quence of M bovis 212297 [10] They were not how-ever designed to monitor the expression of non-codingRNA such as small RNA within intergenic regions orantisense sRNA Hence a high density tiled oligonucleotidemicroarray consisting of approximately 180000 partially-overlapping (10-base overlap) short 60 mer oligonucleo-tides was designed that offered an unbiased approach tothe detection of strand specific transcripts encoded overthe entire M bovis 212297 chromosome Total RNAthat includes small sized (lt100 nt) RNA species wasextracted from the four M bovis strains that had beengrown in liquid media and hybridised to the oligo-nucleotide microarray To avoid potential secondarystrand synthesis during cDNA synthesis which couldbe interpreted as sRNA the RNA was directly labelledwith cyanine based dyes After pairwise comparisonswere performed between 212297 and 112101 245101or 130701 220 oligonucleotide probes were identifiedthat detected differentially expressed transcripts (25 foldcut off ) in one or more of the three strains (Additionalfile 4) Only transcripts detected by multiple (2 or more)overlapping probes were regarded as genuine transcriptsas those detected by single probes could be due to cross-hybridisation effects or represent spurious transcriptsUsing these criteria 26 transcripts designated T1-T26were found to show differential expression in one or moreof the strains (Table 5) and those transcripts can be di-vided into those that are encoded within intergenic re-gions and those encoded within the genomic co-ordinatesencompassing annotated coding sequences Comparisonof the differentially expressed gene lists identified usingamplicon vs oligonucleotide arrays (Tables 3 and 5) itis clear that many of the transcripts detected using theamplicon arrays are not necessarily encoded on thesense gene strand as had been previous interpreted For

Table 4 Confirmation of SNP linkage to upregulation ingene expression

Strain Gene Fold change

M smegmatis mc2155 (pPG101) Mb1750c 161 plusmn 32

M smegmatis mc2155 (pPG101) Mb1749c 125 plusmn 12

M smegmatis mc2155 (pPG102) Mb1750c 19 plusmn 08

M smegmatis mc2155 (pPG102) Mb1749c 14 plusmn 04

Mb 212297 (pPG108) nirB 20 plusmn 10

Mb 212297 (pPG108) nirD 13 plusmn 05

Mb 212297 (pPG109) nirB 529 plusmn 159

Mb 212297 (pPG109) nirD 31 plusmn 08

Fold changes are the mean ratios plusmn standard deviation

Golby et al BMC Genomics 2013 14710 Page 9 of 18httpwwwbiomedcentralcom1471-216414710

Table 5 Differential expression of RNA transcripts as detected by a tiled oligonucleotide microarray

Transcript No Probes Position Size cds Strand 112101 245101 130701 CDS Product SNP

T1 3 303138-303295 157 Mb0258nirB A 3uarr nitrite reductase (large sub-unit)

T2 5 303260-303515 255 Mb0258nirB S 5uarr nitrite reductase (large sub-unit) sSNP at 303227 C-T in 130701 only

T3 2 1105085-1105193 108 Mb1014 A 3uarr 3uarr probable adhesion componentof ABC transporter

T4 2 1105158-1105315 157 Mb1014 A 3uarr 3uarr probable adhesioncomponent of ABC transporter

T5 3 1105158-1105315 157 Mb1014 S 3uarr 3uarr probable adhesioncomponent of ABC transporter

T6 5 1778876-1779131 255 Mb1618c A 9uarr possible secreted lipase sSNP at 1778879 C-T in 112101 only

T7 2 1840174-1840282 108 Mb1672c S 4darr hypothetical protein

T8 3 1932158-1932315 157 tRNA-Pro S 3uarr proline tRNA

T9 5 1932452-1932707 255 Mb1749c S 4uarr toxin component oftoxin-antitoxin system

nSNP at 1932704 C-T (G to D) in 130701 only

T10 5 1932746-1933075 329 Mb1750c S 7uarr anti-toxin component oftoxin-antitoxin system

T11 6 2122959-2123263 304 Mb1914c A 12uarr probable short-chain typedehydrogenasereductase

sSNP at 2122970 C-T in 245101 only

T12 2 2324463-2324571 108 Mb2110 S 3uarr hypothetical protein(frame-shifted in 212297)

T13 2 2328944-2329052 108 Mb2116cpepE S 2darr 3darr probable dipeptidase

T14 2 2329483-2329591 108 Mb2117 A 10darr 10darr 5prime-3prime exonuclease nSNP at 2329583 A-G (comp strand I to T)in 245101 only

T15 3 2868310-2868467 157 Mb2607 A 6darr 8darr hp (frame-shifted in 212297)

T16 2 2868506-2868614 108 Mb2607 A 7darr 7darr hp (frame-shifted in 212297) nSNP at 2868616 A-G (stop to W) in245101 and 130701

T17 2 3075740-3075848 108 I 5uarr 6uarr

T18 2 3075887-3075995 108 I 5uarr 6uarr

T19 2 3076622-3076730 108 I 4uarr 4uarr

T20 2 3076769-3076877 108 I 5uarr 5uarr

T21 2 3078728-3078836 108 I 6uarr 7uarr

T22 2 3079169-3079277 108 I 4uarr 6uarr

T23 2 3079903-3080011 108 I 6uarr 7

T24 2 3844577-3844685 108 Mb3509c S 3uarr 3uarr possible acyltransferase(frame-shifted in 212297)

T25 6 4155501-4155805 304 Mb3803echA21 A 5uarr possible enoyl-coA hydratase sSNP at 4155803 C-T (comp strand) in 245101 only

T26 2 4316987-4317095 108 Mb3927c A 3uarr chp (frame-shifted in H37Rv)

Up and down arrows indicate fold up- and downregulation respectively and empty cells indicate no change in expressionStrand A indicates antisense S sense and I intergenic

Golby

etalBM

CGenom

ics201314710

Page10

of18

httpwwwbiom

edcentralcom1471-216414710

example the amplicon array data had appeared to suggestthat Mb1914c and echA21 were upregulated in 245101but the oligo array data indicates that transcripts 11 and25 which are encoded within the co-ordinates encoded bythose two genes are actually encoded on the antisensestrands This apparent discrepancy can be rationalisedonce we consider that double stranded amplicon micro-array probes cannot discriminate between transcriptsencoded on the sense or antisense strands Transcripts 11and 25 can therefore be considered as potential antisensesRNAs (asRNA) which could be involved in translationalor post-transcriptional control of the sense transcriptOther potential cis-encoded sRNAs detected using the ar-rays include T6 T14 and T15T16 which are encoded onthe antisense strands to Mb1618c Mb2117 and Mb2607respectively and for each of these transcripts their expres-sion appears to be linked to the presence of a single SNPwithin the co-ordinates of the genes The approximateboundaries of these transcripts can be derived from thegenomic co-ordinates of the oligonucleotide probes thatdetect the expression of the transcript Thus the tran-scripts appear to be between 100ndash300 nt in size and thepositions of the linked SNPs appear to be positioned eitherjust upstream or within the predicted 5primeend of the tran-scripts (Figure 3) Three of the transcripts (T11 T14 andT25) are antisense to the central part of the sense encodedgene while T6 is encoded antisense to the 5prime end ofMb1618c As well as antisense transcripts we also saw thedifferential expression of sense transcripts The ampliconmicroarray data (confirmed by real time RT-PCR) indi-cated that nirB is strongly upregulated specifically in130701 in both in vitro and ex-vivo Mϕ An analysis ofthe oligonucleotide array data however indicates thatthere are two short transcripts T1 and T2 (sense and anti-sense respectively) that are encoded within the genomicco-ordinates of the nirB gene T2 is the longer in size (255vs 155nt) and more highly expressed (5 vs 3-fold) thanT1 and both transcripts appear to be linked to the pres-ence of a SNP that is located within the middle of T1 andapproximately 50nt upstream of T2Some of the transcripts are bone fide gene sense

strand mRNA transcripts such as T9 and T10 whichare encoded by Mb1749c and Mb1750c respectivelyAlthough it would appear that the two genes are tran-scribed separately it is probable that the two transcriptsare co-transcribed as the stop codon of Mb1750c over-laps the start codon of Mb1749c Eight of the tran-scripts listed in Table 5 are encoded within intergenicregions 7 of which are encoded within the polymorphicdirect repeat (DR) locus The DR locus of strains belong-ing to the M tuberculosis complex has been suggested toconstitute a CRISPR locus which have been shown inmany species of bacteria to be involved in protectionagainst exogenous foreign DNA such as plasmids and

phage [25] All the DR encoded transcripts are short(approx 100 nt) straddle contiguous repeat and spacersequences and show approximately 5-fold higher levelsof expression in 245101 and 130701 compared with212297 and 112101

Characterisation of differentially expressed cis asRNAThe genomic co-ordinates of the oligonucleotide probesthat detected the antisense species described above canonly serve as approximate estimations as to their startand end points Thus we used 5prime RLM-RACE (RNALigase Mediated Rapid Amplification of cDNA Ends) inan attempt to accurately define the transcriptional startsites (TSS) for the short sense transcript T2 and the anti-sense transcripts T6 T11 and T25 described in the abovesection (see Methods) These transcripts were chosen astheir expression levels are high and their transcript lengthswere considered to be sufficiently long to enable theRLM-RACE methodology to work Table 6 details thesizes of the PCR products obtained after RLM-RACE wasperformed using oligonucleotide primers designed tosequences predicted for transcripts T6 T11 and T25 NoPCR product was obtained for transcript T2 For each ofthe three transcripts the TSS was determined to be a Gresidue which is the most commonly used residue typefor mycobacterial TSSrsquos (Figure 4) [26] For each of the T6T11 and T25 transcripts expression of the asRNAs waslinked to the presence of a SNP (C to T) proximal tothe 5prime end of the asRNA Strains exhibiting the lsquoCrsquo alleleshowed no expression of the asRNA whilst the strainthat showed expression had the lsquoTrsquo allele An analysis ofthe nucleotide sequence in the vicinity of the SNPsreveals that for each of the three transcripts the SNPconstitutes the 6th residue of a motif that has stronghomology to the consensus sequence for the minus10 elementof Group A mycobacterial promoters (Figure 4) [26] Thefinding that a lsquoTrsquo residue is associated with expression isconsistent with the consensus sequence which indicatesthat 86 of all minus10 elements have a lsquoTrsquo residue at the 6thresidue position Several residues that flank the minus10 motifalso show a degree of conservation Sequence motifswhich show strong homology to group A minus35 elementsare present 18ndash19 bp upstream of the putative minus10elements and the distances between the minus35 -10 andTSS elements are consistent with those elements ofthe consensus sequence No protein encoding openreading frames were detected within the T6 T11 andT25 transcriptsIn a parallel study high density oligonucleotide micro-

arrays were also used to interrogate the transcriptomesof M tuberculosis H37Rv M bovis BCG Pasteur Myco-bacterium caprae and M bovis AN5 that had beengrown in Middlebrook 7H9 media As a result of theseexperiments two asRNA species were found to be

Golby et al BMC Genomics 2013 14710 Page 11 of 18httpwwwbiomedcentralcom1471-216414710

expressed within the antisense strands of the ino1 andnarH genes of M tuberculosis H37Rv but not in any ofthe other 3 strains tested (data not shown) A compari-son of nucleotide sequences of the orthologous genesacross the species suggested that expressions of theas_sRNAs correlated with the presence of a sSNP (C to T

transition at positions 50555 and 1292100 wrt H37Rvgenomic sequence for as_ino1 and as_narH respectively)upstream of the asRNAs Approximate information re-garding the transcriptional start site was deduced from thebinding co-ordinates of the probes that detected the tran-scripts As with the M bovis antisense sRNAs described

Figure 3 Expressions and schematic representation of genomic locations of selected cis-encoded antisense sRNAs identified using atiled oligonucleotide microarray Three asRNAs (open arrows) are (a) T6 (b) T14 and (c) T25 For each asRNA a histogram plots the foldchanges for each of the oligonucleotide probes that detected the asRNA and for each probe the binding position relative to the 212297genome is indicated Closed and open arrows indicate lengths and direction of transcription of genes and asRNAs respectively

Golby et al BMC Genomics 2013 14710 Page 12 of 18httpwwwbiomedcentralcom1471-216414710

above the T residue associated with the expression ofthe M tuberculosis asRNAs is part of a putative minus10element A minus35 element with an appropriate spacing tothe minus10 element was identified for as_ino1 but not foras_narH suggesting that the as_narH promoter maybelong to group B mycobacterial promoters that havea conserved minus10 but no minus35 motif [26]

DiscussionsThe aim of this work was to define possible phenotypicvariation across M bovis field isolates through a combin-ation of genome sequencing comparative genomics andtranscriptome analyses from both in vitro and ex vivo con-ditions Using these approaches we uncovered a range ofnovel findings the most striking of which was the realisa-tion that genes that had been predicted to be differentiallyexpressed based on amplicon-microarray data were in fact

not upregulated and that instead it was an antisense tran-script that was showing differential expression Analysingboth transcriptome and genome sequence data allowed usto identify SNPs responsible for the transcription of anti-sense RNAs with generation of a consensus minus10 promotersequence the likely mechanism Our results suggest thatdata generated from amplicon arrays in the past may needto be revisited as it is possible that some coding-sequences identified as being differentially expressed wereinstead antisense transcriptsWith the growth of technologies such as high density

tiled oligonucleotide microarrays and next generationsequencing there has been a rapid increase in the num-ber of reports describing the existence of non-codingRNAs (ncRNAs) in bacteria Non-coding RNAs broadlyconsist of two types cis- and trans-encoded RNA TransRNA includes intergenic encoded RNA while cis-encodedRNA includes 5prime and 3prime untranslated regions of mRNAand antisense RNA To study the expressions of both cis-and trans encoded ncRNA we used a high density oligo-nucleotide tiled microarray since our amplicon microarraywas unable to detect intergenic transcripts or differentiatebetween sense and antisense transcripts Previous studiesusing M tuberculosis have identified substantial amountsof ncRNA encoded in both intergenic and intragenic re-gions [78] We detected substantial amounts of ncRNAs

Figure 4 Promoters of anti sense RNAs a Promoters of the asRNAs as_mb1618c as_1914c and as_echA21 -10 and minus35 elements areindicated in bold and italics Transcriptional start sites are indicated by large font G characters while SNP residue that leads to the expression ofthe asRNA is indicated by a large font red T residue The consensus sequence for group a mycobacterial promoters is indicated Numericalsubscripts indicate the percentage of the total number of promoters for which a transcriptional start site has been experimentally determinedthat show the indicated residue b Promoters of the differentially expressed asRNAs as_ino1 and as_narH in M tuberculosis -10 and minus35 elementsare indicated in bold italics The red residue indicates SNP responsible for differential expression

Table 6 Determination of transcriptional start sites ofantisense RNAs

Transcript asRNA PCRprod size

TSSresidue type

TSS residue postn(wrt 212297)

T6 mb1618c_as 150 G 1778886

T11 mb1914c_as 220 G 2122978

T25 echA21_as 175 G 4155796

Golby et al BMC Genomics 2013 14710 Page 13 of 18httpwwwbiomedcentralcom1471-216414710

in M bovis including many instances of cis-antisenseRNA species Due to their perfect complementarity cisasRNA form a duplex with the sense strand encodedtranscript resulting in either degradation [27] or transla-tion inhibition [28] of the sense mRNA AntisenseRNAs vary in length ranging from 10s to 1000primes ofnucleotides and can be classified according to theirencoded position with respect to the opposite senseencoded gene Thus they can be classified as 5prime or 3primeoverlapping while others are classified as internallylocated The 112101 specific as_Mb1618c is an exampleof a 5prime overlapping asRNA which is encoded antisense togene Mb1618c which is predicted to express a secretorylipase The location of the asRNA transcript suggests itmay function to prevent translation of Mb1618c mRNAby steric hindrance of the ribosome binding siteIn the work presented here strain 245101 expressed

two asRNAs as_Mb1914c and as_echA21 that are notexpressed by any of the other 3 strains They areencoded within the central part of the opposite genesand are therefore likely to modulate the stability of thetranscripts Mb1914c encodes a short chain dehydro-genase while echA21 encodes an enoyl-CoA hydrataseShort chain dehydrogenases catalyse a wide range offunctions so the precise function and identity of thesubstrate is difficult to deduce from sequence aloneEnoyl-CoA hydratases hydrate double carbon-carbonbonds of macromolecules and are vital in the metabol-ism of fatty acids Both gene products would thereforeappear to be involved in the metabolism of a macromol-ecule and their similar expression profiles in this straincould indicate involvement in the metabolism of the samemolecule or molecules that are of the same pathwayIn many instances upregulation of asRNA negatively

correlates with the transcription of the antisense gene[27] but in many cases expression of the antisense tran-script has no effect on the transcription of the oppositegene In our studies expressions of as_Mb1618cas_Mb1914c and as_echA21 did not appear to have anyeffect on the expressions of the opposite sense encodedgenes (data not shown) We have shown that theexpressions of the asRNAs are associated with the pres-ence of SNPs which are either synonymous or non-synonymous with respect to the sense transcript butupstream of the asRNA transcriptional start site Thishighlights the fact that mutations can potentially affectexpression of transcripts on both strands and that theclassification of a SNP is strand dependent For each ofthe three asRNAs the associated SNP was found to belocated within a putative minus10 promoter motif of groupA mycobacterial promoters The sixth residue of the minus10hexamer motif consensus sequence is a strongly conservedlsquoTrsquo residue which is present in 81 of all group A myco-bacterial promoter elements Its importance is underlined

by the finding that the strains that exhibit a lsquoCrsquo residue atthis position show no detectable expression of the asRNAwhile strains having a lsquoTrsquo residue at this position exhibitexpressionSingle nucleotide polymorphisms were found to be

the most frequent form of genetic variation that existsbetween the isolates with a total of 1013 SNPs detectedacross the three strains 112101 245101 and 130701compared to the reference strain M bovis 212297Non-synonymous SNPs which include both non-senseand missense SNPs are a class of SNPs most likely toimpact on protein function and contribute to phenotypicvariation Non-sense SNPs which results in the expressionof a truncated polypeptide due to the introduction of apremature stop codon were identified in five genes acrossthe strains Of these we focussed our attentions on thenon-sense SNP present in a gene encoding the LysR regu-lator Mb2007c as mutations affecting regulators are likelyto impact on the expression of one or more genes that arepart of the regulon of the regulator and are therefore morelikely to result in phenotypic variation Experiments tocompare the transcriptomes of a strain that exhibited themutation with a strain overexpressing a functional regula-tor did not however reveal any differences The reasonfor this unclear but could reflect a requirement of theregulator for a co-inducer that was absent under the con-ditions of the experiment The consequences of missenseSNPs are more difficult to predict as substitutions of oneamino acid for another in a protein sequence do not ne-cessarily lead to a change in protein function Howeverfor genes that are controlled by an autoregulatory mech-anism a mutation that affects the ability of the product ofthe gene to regulate itself will result in a change in expres-sion of the gene In our studies we have shown that thepresence of a missense mutation in a VapB type toxin en-coding gene Mb1749c in strain 130701 results in theupregulation in expression of the toxin-antitoxin encodingpair of genes Mb1750c-Mb1749c due to the inability ofthe encoded proteins to self regulate themselves Toxin-antitoxin systems have a variety of proposed cellular func-tions including general regulation of mRNA stability levelsin the cell [29] Further experiments are required to fullyunderstand the consequences of this mutationGenomic deletions have played an important role in

the evolution of strains belonging to the mycobacterialcomplex [30] and in the derivation of the tuberculosisvaccine strain M bovis BCG [31] In addition to the previ-ously described 68 kb gene deletion that is specific tostrains having a spoligotype 17 pattern [6] we have identi-fied a 16 kb multi-gene deletion that is specific to strain130701 and encompasses genes that are part of the DosRregulon However one of the deleted genes exists as apseudo gene in DosR in strain 212297 so its importanceto the biology of M bovis is unlikely to be significant

Golby et al BMC Genomics 2013 14710 Page 14 of 18httpwwwbiomedcentralcom1471-216414710

Several other genes with internal deletions were detectedbut none of the encoded proteins have any significantsimilarity to any protein with a defined function

ConclusionsIn conclusion we have performed a comprehensive ana-lysis of 4 M bovis strains of the most common moleculartypes circulating in GB We show that while these strainsshow extensive similarities in their genetic make-up andgene expression profiles they show distinct differences inthe expression of a subset of genes We provide functionaldata to show that SNPs can lead to the expression of anti-sense RNA a finding with implications for how we definea lsquosilentrsquo nucleotide change Furthermore we show thatthe interpretation of transcriptome data based solely onamplicon arrays could lead to artefacts due to expressionof antisense transcripts a caveat that needs to be kept inmind for previous studies of global expression analysis inbacteria

MethodsBacterial strains media and growth conditionsFor the Mϕ infection experiments bovine alveolar Mϕwere cultivated in tissue culture media R10 whichconsisted of RPMI (Invitrogen) media plus 2 mM glu-tamine 10 calf fetal serum and 1 amphotericinWhere used antibiotics gentamycin and ampicillinwere added at concentrations of 50 and 100 μg mlrespectively M bovis field strains were pre-grown inMiddlebrook 7H9 broth supplemented with 10 albumin-dextrose catalase (ADC Difco) 005 Tween and 10 mMpyruvate Cultures were harvested in mid-logarithmicphase (OD600 of 03-08) washed and then resuspended inRPMI containing 005 Tween 80

Isolation of bovine alveolar macrophages and infectionwith mycobacteriaThe lungs of a 6ndash8 week old male Holstein-Friesian calfwere removed and a whole lung lavage procedure wasperformed to washout the alveolar Mϕ Briefly 4ndash5 x500 ml aliquots of Hanksrsquo Balanced Sterile Salts solution(HBSS) were used to infuse the lungs via the tracheaand the washings were pooled in a sterile beaker The Mϕcells contained in the washes were pelleted by centrifuga-tion at 500 x g for 10 mins at 4degC washed and thenresuspended in R10 growth media supplemented withantibiotics (R10+) to a concentration of 1ndash2 x 107 mlApproximately 05-15 x 109 Mϕ were isolated per calflungVented 225 cm2 tissue culture flasks containing R10+

media were seeded with 3ndash4 x 107 alveolar Mϕ andplaced in a humidified 37degC incubator containing 5CO2 Typically 2ndash4 flasks were used per strain and timepoint After 24 hrs the growth media was decanted to

remove non-adherent cells and then replaced with freshR10+ media After a further 24 hrs the growth mediawas discarded and the Mϕ monolayer was washed withRPMI to remove traces of the antibiotic containinggrowth media The monolayer was then covered withR10 media without antibiotics (R10-) and then infectedwith mid-logarithmic phase grown mycobacteria usingan MOI of approximately 101 (bacilli Mϕ) The AlvMϕwere incubated with mycobacteria for 4 hrs after whichthe cell monolayers were washed with RPMI and theneither processed for RNA extraction (4 hr time point)or incubated in fresh R10+ media for a further 20 hrsbefore being processed for RNA extraction (24 hr timepoint)

Extraction and amplification of mycobacterial RNA frominfected macrophagesMϕ cell monolayers were lysed using a guanidiniumthiocyanate (GTC) containing solution The lysed Mϕrsquoswere vortexed and passed twice through a 21G bluntended needle to sheer host genomic DNA and therebyreduce the viscosity of the solution Mycobacterial cellswere then pelleted by centrifugation at 4600 rpm for 20mins at room temperature and washed with GTC solu-tion to remove host genomic DNA Cells were thenresuspended in Trizol and RNA was extracted using theprotocol outlined in Bacon et al [32] The amount ofpurified DNase-treated RNA recovered was of the order100ndash500 ng per time point RNA was amplified usingthe lsquoMessageAmp II-Bacteria RNA Amplification Kitrsquo(Ambion) according to the manufacturersrsquo instructionsUsing an input of 100 ng of unamplified RNA 20ndash100μg of amplified RNA was recovered

Amplicon microarray analysisFor the in vitro growth experiments three independentexperiments (biological replicates) were carried out andfor each strain in each experiment two microarrays (tech-nical replicates) were performed Thus for each strain 6microarrays were performed Three independent AlvMϕinfection experiments were carried out and for eachexperiment two microarrays were performed for each ofthe control RPMI samples and the 4- and 24 hrs post-infection samples Cy5 and Cy3 fluorescently-labelledprobes were synthesised from RNA and genomicDNA respectively and hybridised to whole genomeM bovis M tuberculosis microarrays The array designis available in BμGSbase (accession number A-BUGS-31httpbugssgulacukA-BUGS-31) and also ArrayExpress(accession number A-BUGS-31) Details of probe synthesishybridization conditions and manufacture of the micro-array can be found in Golby et al 2008 Microarrayswere scanned using an Affymetrix 428 Microarrayscanner and scanned images were quantified using

Golby et al BMC Genomics 2013 14710 Page 15 of 18httpwwwbiomedcentralcom1471-216414710

BlueFuse for Microarrays v32 software (BlueGnome)See Golby et al 2008 for further detailsNormalisation was performed by dividing the log ratio

of the Cy5 to Cy3 signal for every spot by the median ofthe log ratios for all spots except control spots A medianabsolute (MAD) scale transformation was applied to thenormalised data from the pas an additional normalisationstep For every microarray duplicate spots were averagedand then the average expression of every gene across alltechnical replicate microarrays was calculated Averagesof the three biological replicates were used to comparegene expression between strains For each gene a mod-erated t-test was applied and those genes with a P- valueless than 005 were selected From this gene list thosegenes whose average expression differed by more than25-fold between strains were selected Fully annotatedmicroarray data have been deposited in BμGSbase (acces-sion number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (accession number E-BUGS-150)

Oligonucleotide microarray analysisExperiments were performed in a similar way to thatdescribed for the in vitro amplicon array experimentsexcept that the RNA was purified using the mirVanamiRNA Isolation kit (Ambion) which is designed tocapture small (gt20 nt) RNA species RNA and genomicDNA were directly labelled with Cy5- and Cy3 respect-ively using the ULS microRNA labelling kit (Kreatech)according to the manufacturerrsquos instructions PurifiedCy5- and Cy3-labelled probes were co-purified andapplied to an Agilent 40K custom made tiled (10 ntoverlap) 60-mer oligonucleotide microarray designed tothe genomic sequence of M bovis 21229797 The arraydesign is available in BμGSbase (accession numberA-BUGS-52 httpbugssgulacukA-BUGS-52) and alsoArrayExpress (accession number A-BUGS-52) Microarrayswere hybridised at 65degC for 18 hrs and then washed inWash Buffer 1 (Agilent) at room temperature for 1 minuteThe slides were then washed in Wash Buffer 2 (Agilent) at37degC for 1 minute dried and then scanned at 2 μm usingan Agilent DNA microarray scannerTiling array data was analysed using the Limma package

of RBioconductor [33] The signal median was quan-tile normalised between arrays followed by a LOESSnormalisation within arrays Differential expressionanalysis was performed by pairwise comparison usinglinear models and empirical Bayes methods and Pvalues adjusted using the Benjamini and Hochbergsmethod to control for multiple testing Fully annotatedmicroarray data have been deposited in BμGSbase(accession number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (accession num-ber E-BUGS-150)

Whole genome sequencing of M bovis field isolatesWhole genome paired end (2 x 76bp) sequencing wasperformed using Illumina HiSeq machines at theWellcome Trust Sanger Institute (Hinxton Cambridge)Raw sequence data was uploaded to the European Nucleo-tide Archive (ENA) and can be downloaded at httpwwwebiacukena accession numbers ERX006616 ERX06617ERX012284 and ERX012286 The FastQC (httpwwwbioinformaticsbabrahamacukprojectsfastqc) programwas used to analyse the quality of the raw data reads Rawsequence data was trimmed to remove adapter sequencesand nucleotides where the sequence quality score wasbelow 20 Filtered reads were aligned to the M bovis refer-ence strain 212297 [10] with the SMALT alignmentprogram (httpwwwsangeracukresourcessoftwaresmalt)using default settings The average mean coverages for thesequenced strains 112101 245101 and 130701 were293 224 and 234 respectively Reads that did not maponto the reference genome were de novo assembled andblasted against the NCBI data base in order to find regionspresent in these strains but absent in 212201 Variantcalling and the generation of consensus sequenceswere carried out using the SAMtools program suite(httpsamtoolssourceforgenet) SNPs that had a mini-mum of quality score of 200 and had a minimum of fourgood quality forward (reverse) reads covering the SNP siteand having the variant base were retained The number offorward (reverse) reads mapping with good quality ontothe SNP site having the same base as in the reference hadto be less than 5 of the total number of forward (reverse)reads mapping with good quality onto the SNP site havingthe variant base Sequences of 12 bases in length centredaround SNP sites were blasted to the genome of the refer-ence strain 212297 and any SNP within an area with a90 hit to another area of the genome were filtered outThis eliminated most of the SNPs within the repetitivePE-PGRS regions For all sequenced strains more than999 of the genomes were covered by reads

Real time RT-PCRQuantitative real-time SYBR Green based PCR (qRT-PCR)experiments were performed using a RotorGene 3000(Corbett research) as described by Golby et al [23] Foldchanges were calculated using relative standard curvemethod and pcr controls included no template and noreverse transcriptase Primer pair sequences are given inAdditional file 5 available with the online version ofthis paper

Construction of the Rv1749c-Rv1750c and nirBDoverexpressing plasmidsThe Rv1749c-Rv1750c overexpressing plasmids pPG106and pPG107 were constructed by PCR amplification of a907bp fragment encompassing Rv1749c-Rv1750c-lsquoRv1751c

Golby et al BMC Genomics 2013 14710 Page 16 of 18httpwwwbiomedcentralcom1471-216414710

using primers tox_f and tox_r For pPG106 the fragmentwas amplified using 21229797 genomic DNA as a tem-plate while pPG107 was amplified using 130701 gDNABoth fragments were digested with SpeI and cloned intothe SpeI cut mycobacterial attP-integrating shuttle vectorpKINT (a gift from Douglas Young Imperial CollegeLondon) Plasmids pPG108 and pPG109 which contain a35 kb hsprsquo-nirB-nirD-lsquocobU fragment was constructed inseveral steps Firstly two 16 kb hsprsquo-nirBrsquo PCR fragmentswere PCR amplified separately using primers nirB1_f andnirB1_r and 212297 and 130701 as genomic templatesSimilarly two 18kb lsquonirB-nirD-lsquocobU fragments wereamplified using primers nirB2_f and nirB2_r and genomicDNAs 212297 and 130701 The two PCR products weredigested with SpeI and BamHI and then co-ligated intopKINT Details concerning the nucleotide sequences ofthe pcr primer pairs are given in the Additional file 5

5prime-RLM-RACE PCRTranscriptional start site mapping of was determinedusing the First Choice RNA ligase-mediated rapid amp-lification of cDNA ends (RLM-RACE) kit (Ambion) asper manufacturers instructions Briefly 10 ug of totalRNA was treated with calf intestinal phosphatase (CIP)and tobacco acid pyrophosphatase (TAP) before ligationof an RNA Adapter oligonucleotide to the 5prime ends ofthe mRNA transcripts A random-primed reverse tran-scription reaction was carried out to generate cDNAand then nested PCR reactions were performed on thecDNA using combinations of adapter and gene specificprimers Details concerning the nucleotide sequencesof the 5prime outer and inner adapter sequences as well as3prime outer and inner gene specific primers are given inthe Additional file 5 PCR products generated usingthe 5prime inner adapter and 3prime inner gene specific primerswere sequenced by Sanger sequencing using the 3prime innergene specific primer

Animal ethics statementAnimal work was carried out according to the UK Animal(Scientific Procedures) Act 1986 The study protocol wasapproved by the AHVLA Animal Use Ethics Committee(UK Home Office PCD number 706905)

Availability of supporting data sectionFully annotated microarray data have been deposited inBμGSbase (accession number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (ac-cession number E-BUGS-150) Raw sequence data wasuploaded to the European Nucleotide Archive (ENA) andcan be downloaded at httpwwwebiacukenadataviewERX006616-ERX06617ERX012284-ERX012286

Additional files

Additional file 1 SNPs identified across sequenced strains

Additional file 2 Large deletions identified by NGS in sequencedstrains

Additional file 3 Fold change differences in gene expression inM bovis field isolates 1121 2451 and 1307 compared to 2122Cells shaded in red indicate upregulation green indicate down-regulation and empty cells indicate no change in expression

Additional file 4 Differential expression of RNA transcripts asdetected by a tiled oligonucleotide microarray Up and down arrowsindicate fold up- and downregulation respectively and empty cellsindicate no change in expression Strand A indicates antisense S senseand I intergenic

Additional file 5 Details of oligonucleotides used in PCR RT-PCRand RLM-RACE experiments

Competing interestsThe authors declare that they have no competing interests

Authorsrsquo contributionsPG carried out the culturing of organisms macrophage infectionsmicroarrays qRT-PCR cloning analysis of sequence data and writing of themanuscript JN performed analysis of the Illumina sequence and PCRamplicon microarray data AW and JH performed the analysis of the Agilentoligonucleotide microarray data and assisted in the uploading of themicroarray data into the BμGSbase database MAQ and SB wereresponsible for the Illumina sequencing SH contributed to the analysis ofthe Illumina sequence data NS contributed to the writing and critical reviewof the manuscript RGH was a co-investigator who helped to guide theproject SG was the principle investigator on the project who guided theresearch and helped in the writing of the manuscript All authors read andapproved the final manuscript

AcknowledgementsWe would like to acknowledge Kate Gould for her help in carrying out theoligonucleotide microarrays and Denise Waldron for her assistance inuploading of the microarray data into BμGSbase We also thank MarkChambers for comments on the manuscript and BBSRC and Defra forfunding the work (Grant no BBE0184911)

Author details1Animal Health and Veterinary Laboratories Agency Woodham Lane NewHaw Addlestone Surrey KT15 3NB UK 2Division of Clinical Sciences BacterialMicroarray Group Centre for Infection amp Immunity St Georgersquos University ofLondon Cranmer Terrace London SW17 0RE UK 3The Wellcome TrustSanger Institute Hinxton Cambridgeshire UK 4UCD School of VeterinaryMedicine and UCD Conway Institute University College Dublin Dublin 4Ireland

Received 4 June 2013 Accepted 26 September 2013Published 17 October 2013

References1 Monaghan ML Doherty ML Collins JD Kazda JF Quinn PJ The tuberculin

test Vet Microbiol 1994 40(1ndash2)111ndash1242 Kamerbeek J Schouls L Kolk A van Agterveld M van Soolingen D Kuijper

S Bunschoten A Molhuizen H Shaw R Goyal M et al Simultaneousdetection and strain differentiation of Mycobacterium tuberculosis fordiagnosis and epidemiology J Clin Microbiol 1997 35(4)907ndash914

3 Frothingham R Meeker-OrsquoConnell WA Genetic diversity in theMycobacterium tuberculosis complex based on variable numbers oftandem DNA repeats Microbiology 1998 144(Pt 5)1189ndash1196

4 Smith NH Dale J Inwald J Palmer S Gordon SV Hewinson RG Smith JMThe population structure of Mycobacterium bovis in Great Britain clonalexpansion Proc Natl Acad Sci USA 2003 100(25)15271ndash15275

5 Winder CL Gordon SV Dale J Hewinson RG Goodacre R Metabolicfingerprints of Mycobacterium bovis cluster with molecular type

Golby et al BMC Genomics 2013 14710 Page 17 of 18httpwwwbiomedcentralcom1471-216414710

implications for genotype-phenotype links Microbiology 2006152(Pt 9)2757ndash2765

6 Wheeler PR Brosch R Coldham NG Inwald JK Hewinson RG Gordon SVFunctional analysis of a clonal deletion in an epidemic strain ofMycobacterium bovis reveals a role in lipid metabolism Microbiology2008 154(Pt 12)3731ndash3742

7 Arnvig KB Young DB Identification of small RNAs in Mycobacteriumtuberculosis Mol Microbiol 2009 73(3)397ndash408

8 Arnvig KB Comas I Thomson NR Houghton J Boshoff HI Croucher NJRose G Perkins TT Parkhill J Dougan G et al Sequence-based analysisuncovers an abundance of non-coding RNA in the total transcriptome ofMycobacterium tuberculosis PLoS Pathog 2011 7(11)e1002342

9 Smith NH Upton P Naming spoligotype patterns for the RD9-deletedlineage of the Mycobacterium tuberculosis complex www MbovisorgInfect Genet Evol 2012 12(4)873ndash876

10 Garnier T Eiglmeier K Camus JC Medina N Mansoor H Pryor M Duthoy SGrondin S Lacroix C Monsempe C et al The complete genome sequenceof Mycobacterium bovis Proc Natl Acad Sci USA 2003 100(13)7877ndash7882

11 Albers CA Lunter G MacArthur DG McVean G Ouwehand WH Durbin RDindel accurate indel calls from short-read data Genome Res 201121(6)961ndash973

12 Mostowy S Inwald J Gordon S Martin C Warren R Kremer K Cousins DBehr MA Revisiting the evolution of Mycobacterium bovis J Bacteriol2005 187(18)6386ndash6395

13 Boon C Dick T Mycobacterium bovis BCG response regulator essentialfor hypoxic dormancy J Bacteriol 2002 184(24)6760ndash6767

14 Van Gelder RN von Zastrow ME Yool A Dement WC Barchas JD EberwineJH Amplified RNA synthesized from limited quantities of heterogeneouscDNA Proc Natl Acad Sci USA 1990 87(5)1663ndash1667

15 Pandey DP Gerdes K Toxin-antitoxin loci are highly abundant in free-living but lost from host-associated prokaryotes Nucleic Acids Res 200533(3)966ndash976

16 Arcus VL McKenzie JL Robson J Cook GM The PIN-domain ribonucleasesand the prokaryotic VapBC toxin-antitoxin array Protein Eng Des Sel 201124(1ndash2)33ndash40

17 Bukowski M Rojowska A Wladyka B Prokaryotic toxin-antitoxin systemsndashthe role in bacterial physiology and application in molecular biologyActa Biochim Pol 2011 58(1)1ndash9

18 Maddocks SE Oyston PC Structure and function of the LysR-typetranscriptional regulator (LTTR) family proteins Microbiology 2008154(Pt 12)3609ndash3623

19 Moreno-Vivian C Cabello P Martinez-Luque M Blasco R Castillo FProkaryotic nitrate reduction molecular properties and functionaldistinction among bacterial nitrate reductases J Bacteriol 1999 181(21)6573ndash6584

20 Tiffert Y Supra P Wurm R Wohlleben W Wagner R Reuther J TheStreptomyces coelicolor GlnR regulon identification of new GlnR targetsand evidence for a central role of GlnR in nitrogen metabolism inactinomycetes Mol Microbiol 2008 67(4)861ndash880

21 Duan J Wainwright MS Comeron JM Saitou N Sanders AR Gelernter JGejman PV Synonymous mutations in the human dopamine receptor D2(DRD2) affect mRNA stability and synthesis of the receptor Hum MolGenet 2003 12(3)205ndash216

22 Capon F Allen MH Ameen M Burden AD Tillman D Barker JN TrembathRC A synonymous SNP of the corneodesmosin gene leads to increasedmRNA stability and demonstrates association with psoriasis acrossdiverse ethnic groups Hum Mol Genet 2004 13(20)2361ndash2368

23 Golby P Hatch KA Bacon J Cooney R Riley P Allnutt J Hinds J Nunez JMarsh PD Hewinson RG et al Comparative transcriptomics reveals keygene expression differences between the human and bovine pathogensof the Mycobacterium tuberculosis complex Microbiology 2007153(Pt 10)3323ndash3336

24 Bitter W Houben EN Bottai D Brodin P Brown EJ Cox JS Derbyshire KFortune SM Gao LY Liu J et al Systematic genetic nomenclature for typeVII secretion systems PLoS Pathog 2009 5(10)e1000507

25 Sorek R Kunin V Hugenholtz P CRISPRndasha widespread system thatprovides acquired resistance against phages in bacteria and archaeaNat Rev Microbiol 2008 6(3)181ndash186

26 Gomez M Smith I Determinants of Mycobacterial Gene Expression InMolecular Genetics of Mycobacteria Washington DC ASM Press2000111ndash129

27 Duhring U Axmann IM Hess WR Wilde A An internal antisense RNAregulates expression of the photosynthesis gene isiA Proc Natl Acad SciUSA 2006 103(18)7054ndash7058

28 Kawano M Aravind L Storz G An antisense RNA controls synthesis of anSOS-induced toxin evolved from an antitoxin Mol Microbiol 200764(3)738ndash754

29 Thomason MK Storz G Bacterial antisense RNAs how many are thereand what are they doing Annu Rev Genet 2010 44167ndash188

30 Brosch R Pym AS Gordon SV Cole ST The evolution of mycobacterialpathogenicity clues from comparative genomics Trends Microbiol 20019(9)452ndash458

31 Calmette A Guerin C Sur quelques proprietes du bacille tuberculeuxdrsquoorigine cultive sur la bile de boeuf glycerinee C R Acad Sci Paris 1909149716ndash718

32 Bacon J James BW Wernisch L Williams A Morley KA Hatch GJ ManganJA Hinds J Stoker NG Butcher PD et al The influence of reduced oxygenavailability on pathogenicity and gene expression in Mycobacteriumtuberculosis Tuberculosis (Edinb) 2004 84(3ndash4)205ndash217

33 Smyth GK Linear models and empirical bayes methods for assessingdifferential expression in microarray experiments Stat Appl Genet Mol Biol2004 33

doi1011861471-2164-14-710Cite this article as Golby et al Genome-level analyses ofMycobacterium bovis lineages reveal the role of SNPs and antisensetranscription in differential gene expression BMC Genomics 2013 14710

Submit your next manuscript to BioMed Centraland take full advantage of

bull Convenient online submission

bull Thorough peer review

bull No space constraints or color figure charges

bull Immediate publication on acceptance

bull Inclusion in PubMed CAS Scopus and Google Scholar

bull Research which is freely available for redistribution

Submit your manuscript at wwwbiomedcentralcomsubmit

Golby et al BMC Genomics 2013 14710 Page 18 of 18httpwwwbiomedcentralcom1471-216414710

to be consistent with their predicted evolutionary distancesfrom each other Strain 112101 (type 17) is most closelyrelated to the original genome sequenced strain 212297(type 9) with 114 SNPs whereas the more distantly relatedstrains 245101 (type 25) and 130701 (type 35) have 431and 552 SNP differences respectively All four sequencedM bovis strains were more distantly related to Mafricanum (~1800 SNPs) and M tuberculosis H37Rv(~2200 SNPs)In addition to SNPs both large and small INDELs can

be inferred from NGS data although there are severalchallenges involved [11] Large sized insertions (gt500bp) are particularly difficult to identify with accuracy asthey require the de novo assembly of the reads that failto map to the reference genome and the subsequentdatabase searching with the resulting contigs Accurateidentification of large sized deletions (gt500 bp) are how-ever more easily identifiable and Additional file 2 liststhose that have been identified by NGS in the genomesof the three sequenced strains As poor coverage canconfound the identification of deletions we sought touse microarray technology to confirm the deletions iden-tified by the NGS data Genomic DNA was isolated fromall four strains labelled with fluorescent dyes andhybridised to a whole genome M tuberculosisM bovisamplicon microarray (see Methods) Table 2 lists severalLSPs that were detected across the strains usingmicroarrays Both NGS and microarray data predict thepresence of the large 68kb deletion (RDbovis(d)_0173)which encompasses genes Mb1963-Mb1971 and appearsto be specific to UK strains belonging to Type 17 thathas been described in a previous study [12] Several ofthese gene products are predicted to encode proteins in-volved in lipid metabolism but the lipid composition ofseveral type 17 isolates was found to be no different toother M bovis strains although their ability to incorpor-ate propionate into mycolic acids was found to be lower[6] A smaller 16 kb deletion specific 130701 that com-prises the 3prime end of Mb2056c Mb2055c and the 5prime end

of pkfB (Mb2054c) was also detected by both NGS andmicroarray data Due to a single base deletion Mb2056cand Mb2055c are pseudogenes in 212297 but the twogenes exist as one intact functional gene in 2145 andH37Rv (Rv2030c) The pfkB gene encodes a phospho-fructokinase homologue and is strongly immunogenic inhuman TB patients while Rv2030c encodes an erythro-mycin esterase Both pfkB and Mb2056c are members ofthe DosR regulon which are highly upregulated underanaerobic conditions and have been implicated in bac-terial persistence in vivo [13] Other smaller deletionsdetected include a deletion of a probable lipid transferprotein encoding gene Mb1699c which is specific to130701 and an aldoketo reductase encoding geneMb2320 that is specific to 112101

Linking SNPs to genes that show differential expressionamongst M bovis strains grown under vitro conditionsand in ex vivo macrophagesThe four M bovis field strains were grown to mid-logarithmic phase in pyruvate-containing Middlebrook7H9 liquid media and then used to infect bovine alveo-lar Mϕ using a multiplicity of infection (MOI) of 101(bacilli Mϕ) Mycobacterial RNA was recovered frominfected host cells 4 and 24 hrs post infection using adifferential lysis procedure and amplified using a modi-fied procedure similar to that described by van Gelderet al ([14] see Methods) As a control RNAs were alsoextracted from strains that had been incubated staticallyin RPMI cell culture media for a period of 4 hrs Toeliminate potential skewing effects on the transcriptomeresulting from the amplification process comparisonswere made only between amplified RNA generated fromsamples collected at the same time point and biologicalreplicate For the in vitro growth condition total RNAswere extracted from the four strains grown in a pyruvate-containing Middlebrook 7H9 liquid media and rolledduring incubation

Table 2 Large sequence polymorphisms present in M bovis field isolates

Mb CDS Mtb CDS Genomic postn(wrt to 212297)

Size ofdeletion (bp)

Gene(s) 212297 112101 245101 130701 Product(s)

Mb1699c Rv1627c nd nd DEL Probable lipid transfer proteinaMb1963c-Mb1971

Rv1928c-Rv1936

2172231-2179038 6807 tpx fadE17fadE18 echA13

DEL Gene products have roles inlipid metabolism

Mb2054c Rv2029c 2259798-2261457 1659 pfkB DEL Possible phosphofructokinase

Mb2055cMb2056c

Rv2030c DEL Conserved hypothetical protein(frame-shifted in 221297)

Mb2320 Rv2298 nd nd DEL DEL Aldoketo reductase

Mb3923c Rv3894c nd nd DEL Conserved membrane protein(frame-shifted in 221297)

DEL indicates genes that are deleted from strainsaMostowy et al 2005 nd not determined

Golby et al BMC Genomics 2013 14710 Page 4 of 18httpwwwbiomedcentralcom1471-216414710

The RNAs extracted from each of the growth condi-tions were converted to cyanine labelled cDNA using re-verse transcriptase and hybridised to whole genomeamplicon microarrays Using only those genes that arecommon to all four strains we found a total set of 70genes that showed a 25-fold or more difference in ex-pression in one or more strains when pairwise compari-sons were made between the transcriptomes of 212297and 112101 245101 or 130701 (Additional file 3) Asubset of these 70 genes is shown in Table 3 where keyexamples of alterations in metabolic processes areshown The numbers of genes that were found to showdifferential expression across the four strains reflectedthe evolutionary distances between 212297 and theother 3 strains Thus 112101 which is closely relatedto 212297 shows only 5 differentially expressed geneswhile the most distantly related strain 130701 shows 56genes differently expressed compared to 212297 Ofthese 56 genes 5 were specific to the in vitro conditionwhile 19 were specific to the Mϕ Ten genes were com-mon to both conditions which serves to validate thetechnical reproducibility of the RNA amplificationprocessUsing the genome sequencing information determined

for each of the four strains we attempted to correlatethe observed strain-specific differences in gene expres-sion with the presence or absence of mutations withinthe coding regions or promoters of those genes thatshow differential gene expression or in genes that areknown to regulate the activity of those genes Mb1749cand Mb1750c are two genes that are specificallyupregulated in 130701 and encode a toxin and antitoxin(TA) pair respectively belonging to type II TA systemsof the VapBC family [15] Members of VapB type toxinscontain PIN domains that cleave RNA and thus functionto control translation of mRNA transcripts [16] Thehomologous genes from strain 130701 show up to 19-and 10-fold higher levels of expression respectively thanthose of the other three strains An analysis of the cod-ing sequences of Mb1749c across all four strains re-vealed that the 130701 homologue has a unique nSNPat position 1932704 (wrt 212297 genomic sequence) aC-T transition that results in the nonconservative substi-tution of Gly19 to Asp Research has shown that TAgene pairs negatively regulate their own expressionthrough binding of the TA protein complex to the pro-moter region of the TA gene pair thus preventing accessto RNA polymerase [17] The G19D mutation inMb1749c could therefore impair the ability of the com-plex to bind to the promoter resulting in the deregula-tion of the TA gene pairMb2007c which shows a 4-fold higher expression in

130701 only encodes a transcriptional regulator of theLysR class There are two SNPs present in the coding

sequence of Mb2007c in 130701 which are absent inthe homologues of the other three strains the first is anSNP which results in the conservative substitution ofArg137 to Gln while the second is a more debilitatingnonsense SNP which ultimately leads to a protein whoselength is only 60 that of the wild-type Many regulatorsbelonging to the LysR family regulate their own expres-sion through a negative autoregulatory mechanism simi-lar to that described above for VapBC TA systems [18]A loss in protein integrity could therefore result in theregulator being unable to bind the regulatory regionleading to the observed upregulation in the expressionof this gene in 130701 As the product of this gene ispredicted to be a transcriptional regulator it was specu-lated that the regulation of gene(s) controlled by regula-tor could be affected in 130701 due to the severelytruncated form of this protein As LysR regulators areoften found to regulate genes that are divergently tran-scribed from the lysR gene it was surprisingly to findthat expression of the Mb2008 homologue in 130701which is predicted to encode a lysine transporter doesnot show any difference in expression in 130701 to212297 To define the regulon of this regulator we firstcompared the transcriptomes of 212297 transformedwith a multicopy plasmid expressing the truncated copyof mb2007c against a vector only control No differencesin expression were found (data not shown) which couldindicate that the regulator does not control any othergenes apart from itself or that experimental conditionsdid not favour the active form of the regulator LysR reg-ulators regulate expression of their regulon throughbinding of a co-inducer to the C-terminal domain andthe failure to observe any changes in gene expressioncould therefore be due to the absence of the co-inducerduring the experiment A further experiment to comparethe profiles of 212297 expressing either the truncatedor wild type forms of the protein also showed no differ-ences in expression (data not shown)Nitrite reductase catalyses the reduction of nitrite to

ammonia and is strongly expressed during growth in thepresence of nitrate or nitrite but repressed in the pres-ence of ammonia [19] The gene encoding the large sub-unit of the nitrite reductase nirB (Mb0258) showsapproximately 9-fold higher expression in 130701 com-pared to the other 3 strains in our standard ammoniacontaining 7H9 growth media suggesting that the strainhas lost regulatory control of this gene Expression ofnirB in M tuberculosis has been shown to be controlledby the response regulator GlnR [20] but an analysis ofthe sequence of the glnR orthologue from all four strainsrevealed no differences in either the coding or promotersequences A comparison of the nirB sequence from all4 strains did however reveal the presence of a singlebase (C to T) transition leading to a sSNP that is specific

Golby et al BMC Genomics 2013 14710 Page 5 of 18httpwwwbiomedcentralcom1471-216414710

Table 3 Fold change differences in gene expression in M bovis field isolates 1121 2451 and 1307 compared to 2122

Mb CDS Mtb CDS Common 112101 245101 130701 Product Assoc SNPInDel

7h9 RPMI 4hr MOslash 24 hr MOslash 7h9 RPMI 4hr MOslash 24 hr MOslash 7h9 RPMI 4hr MOslash 24 hr MOslash

Mb0038c Rv0037c 4uarr 3uarr 3uarr probable conservedintegral membrane

protein

Mb0124c Rv0120c fusA2 3uarr 3uarr probable elongationfactor

Mb0258 Rv0252 nirB 9uarr 12uarr 21uarr 9uarr nitrite reductase(large subunit)

sSNP at postn 303227 in 130701only C-T

Mb0259 Rv0253 nirD 3uarr 4uarr 5uarr nitrite reductase(small subunit)

Mb0428c Rv0420c 2uarr possibletransmembrane

protein

Mb0734 Rv0713 2uarr probable conservedtransmembrane

protein

Mb0947c Rv0923c 2darr 2darr conservedhypotheitical protein

Mb0948c Rv0924c mntH cation uptake system

Mb1013Mb1014

Rv0987 5uarr 10uarr 5uarr 5uarr 7uarr 6uarr probable adhesioncomponent transport

ABC transporter

nSNP at postn 1104263 in 245101 and130701 A-G (STOP to W) sSNP at

postn 1103991 in 245101 and 130701

Mb1015 Rv0988 3uarr 9uarr 3uarr 5uarr 5uarr possible conservedexported protein

Mb1161 Rv1130 2darr conservedhypothetical protein

Mb1162 Rv1131 gltA1 3darr probable citratesynthase

Mb1554c Rv1527c pks5 2darr probable polyketidesynthase

Mb1562 Rv1535 2uarr 5uarr hypothetical protein

Mb1619c Rv1593c 2uarr conservedhypothetical protein

Mb1749c Rv1720c 10uarr 19uarr 18uarr 12uarr conservedhypothetical protein

nSNP at postn 1932704 in130701 only C-T (G to D)

Mb1750c Rv1721c 6uarr 3uarr 9uarr 5uarr conservedhypothetical protein

Mb1833c Rv1804 3darr conservedhypothetical protein

Mb1834c Rv1805c 4uarr hypothetical protein

Golby

etalBM

CGenom

ics201314710

Page6of

18httpw

wwbiom

edcentralcom1471-216414710

Table 3 Fold change differences in gene expression in M bovis field isolates 1121 2451 and 1307 compared to 2122 (Continued)

Mb1835 Rv1806 PE20 3uarr PE family protein

Mb1885c Rv1854c ndh 3darr probable nadhdehydrogenase

Mb1914c Rv1882c 6uarr 3uarr 6uarr probable short-chaintype dehydrogenase

reductase

sSNP at postn 2122970 in241501 only C-T

Mb2007c Rv1985c 4uarr 3uarr 3uarr probabletranscriptional

regulatory protein

nSNP at postn 2208296 in 130701only G-A (Q to Stop)

Mb2015c RV1992c ctpG 2darr 3darr probable cationtransporter

Mb2420c Rv2398c cysW 2darr 6darr sulphate transporter

Mb2421c Rv2399c cysT 2darr 4darr sulphate transporter

Mb2607 Rv2577 9darr 5darr 7darr 12darr 10darr 7darr 7darr 13darr conservedhypothetical protein

[first part]

Mb3194 Rv3169 3uarr 3uarr 2uarr 3uarr conservedhypothetical protein

Mb3477c Rv3447c 8darr 3darr 10darr 9darr 2darr 7darr probable conservedmembrane protein

nSNP at postn 3812465 in 245101and 130701 T-C (S to G)

Mb3563c Rv3533c PPE62 5uarr 5uarr ppe family protein

Mb3721c Rv3696c glpK 3darr glycerol kinase

Mb3803 Rv3774 echA21 8uarr 7uarr 22uarr possible enoyl-coAhydratase

sSNP at postn 4155803 in241501 only G-A

Up and down arrows indicate fold up- and downregulation respectively and empty cells indicate no change in expression

Golby

etalBM

CGenom

ics201314710

Page7of

18httpw

wwbiom

edcentralcom1471-216414710

to 130701 It was not readily apparent why a sSNP in thecoding sequence of a gene should lead to an increase inexpression of that gene but there are several reports thatshow sSNPs leading to changes in stability of mRNA tran-scripts [2122] Rv0987 and Rv0988 of M tuberculosisH37Rv encode part of an ABC transporter and a putativesecreted hydrolase respectively In 212297 a single basetransition (G-A) introduces a stop codon that splitsRv0987 into the two pseudogenes Mb1013 and Mb1014Previous microarray based gene expression studies by ourgroup have shown that Rv0987 and Rv0988 in M tubercu-losis show higher levels of expression than the orthologousMb1013Mb1014 and Mb1015 respectively in M bovis212297 [23] and in the present study the Mb1013Mb1014 and Mb1015 homologues in 245101 and 130701 also showed higher expression (up to 10-fold) than thehomologues in 212297 and 112101 Comparing thesequences of Mb1013Mb1014 and Mb1015 across all 4strains indicated that strains that show high expressionhave the lsquoGrsquo alleleMb3477c encodes an ATP binding membrane protein

part of the Esx4 secretion system [24] and gene shows upto 10-fold higher expression in 245101 and 130701 com-pared to 212297 The gene also contains an A to C transi-tion at position 3812465 a nSNP at position resulting inthe non-conservative substitution of a serine to a glycineresidue

Of the 19 genes that show specific differential expres-sion in the Mϕ the most notable are Mb1914c andechA21 which show upregulation in 245101 only (upto 6- and 23-fold respectively) Both genes encodeproteins that could be involved in lipid metabolism andboth genes contain single sSNPs that are present in245101 but absent in the other three strainsReal time RT-PCR was used to verify a selection of

genes that showed differential gene expression as pre-dicted by the microarray analysis Figure 2 comparesthe fold changes in the expression levels of 4 genes asmeasured by microarray and real time RT-PCR ThenirB and Mb1749c genes were selected because theyshowed strong upregulation in 130701 in both in vitroand ex vivo Mϕ while Mb1914c and echA21 were chosenbecause the array data predicted them to be specificallyupregulated in 245101 and only in ex vivo Mϕ Foreach of the 4 genes the strain dependent pattern ofexpression as measured by real time RT-PCR was consistentwith that measured by microarray although the foldchanges measured by real time RT-PCR were higherthan those measured by microarray

Functional analysis of SNP role in differential geneexpressionThe above data showed that many of the strain specificdifferences in gene expression were linked to the presence

Figure 2 Confirmation of amplicon microarray results with real time RT-PCR The fold changes in gene expression for (a) Mb1750c (b) nirB(c) echA21 and (d) Mb1914c measured by microarray (open bars) in each of the four strains were compared to that measured by real time RT-PCR(closed bars)

Golby et al BMC Genomics 2013 14710 Page 8 of 18httpwwwbiomedcentralcom1471-216414710

of synonymous or non-synonymous SNPs located withinthe coding regions of the genes that show variable expres-sion Non-synonymous SNPs lead to changes in aminoacid sequence which can lead to changes in protein func-tion The C to T transition at position 1932704 (wrt 212297) in the coding sequence of the 130701 Mb1749chomologue leads to the non-conservative substitution ofGly19 to Asp and this nSNP appears to be linked to theupregulation of both Mb1749c and Mb1750c in thatstrain In order to confirm this a 09 kb DNA fragmentcontaining the Mb1749c-Mb1750c-lsquoMB1751c region of130701 (containing the lsquoTrsquo allele) and the equivalentregion from 212297 (with the lsquoCrsquo allele) were PCRamplified and the fragments were cloned separatelyinto the mycobacterial shuttle vector pKINT (see Methods)to create the constructs pPG107 and pPG106 respectivelyThe two constructs were introduced into Mycobacteriumsmegmatis mc2155 separately and then the expression ofMb1749c and Mb1750c in M smegmatis pPG101 wascompared to that of M smegmatis pPG102 using real timeRT-PCR Table 4 shows that the expression levels ofMb1750c and Mb1749c in the strain expressing themutated forms of Mb1749cMb1750c are 13- and 9-foldhigher respectively compared to the strain expressingthe wild type forms confirming that this SNP is respon-sible for the observed up-regulation of the two genes in130701Synonymous substitutions do not lead to changes in

protein sequence and have generally been considered tobe lsquosilentrsquo or benign Recent studies however have sug-gested that sSNPs can have functional effects such asdecreased mRNA stability and translation [2122] Inour own studies we have found several genes whose ex-pression levels correlate with the presence of sSNPs inthe coding regions of those genes For example a C-Ttransversion at position 303227 (wrt 212297) withinthe coding sequence of nirB of 130701 is a sSNP thatappears to be linked with the upregulation in expressionof nirB within that strain To confirm that this is thecase we PCR amplified 35 kb DNA fragments containing

the hsp-nirB-nirD-cobU region of strain 130701 (containingthe lsquoCrsquo allele) and the equivalent region from strain 212297(with the lsquoTrsquo allele) and cloned them separately into theintegrating vector pKINT to create the constructspPG108 and pPG109 respectively These constructswere introduced into 212297 and the expression levelsof nirB and nirD were found to be 30- and 2-fold re-spectively higher in the strain expressing the mutatednirBD locus compared to the strain expressing the wild-type form (Table 4) This confirms that this mutation isresponsible for the upregulation of the two genes in thisstrain

Use of a high density tiled oligonucleotide microarray todetect differentially expressed small RNA transcripts inM bovis isolatesThe M tuberculosisM bovis amplicon arrays used inthe present study were specifically designed to measureexpression levels of genes annotated in the genomic se-quence of M bovis 212297 [10] They were not how-ever designed to monitor the expression of non-codingRNA such as small RNA within intergenic regions orantisense sRNA Hence a high density tiled oligonucleotidemicroarray consisting of approximately 180000 partially-overlapping (10-base overlap) short 60 mer oligonucleo-tides was designed that offered an unbiased approach tothe detection of strand specific transcripts encoded overthe entire M bovis 212297 chromosome Total RNAthat includes small sized (lt100 nt) RNA species wasextracted from the four M bovis strains that had beengrown in liquid media and hybridised to the oligo-nucleotide microarray To avoid potential secondarystrand synthesis during cDNA synthesis which couldbe interpreted as sRNA the RNA was directly labelledwith cyanine based dyes After pairwise comparisonswere performed between 212297 and 112101 245101or 130701 220 oligonucleotide probes were identifiedthat detected differentially expressed transcripts (25 foldcut off ) in one or more of the three strains (Additionalfile 4) Only transcripts detected by multiple (2 or more)overlapping probes were regarded as genuine transcriptsas those detected by single probes could be due to cross-hybridisation effects or represent spurious transcriptsUsing these criteria 26 transcripts designated T1-T26were found to show differential expression in one or moreof the strains (Table 5) and those transcripts can be di-vided into those that are encoded within intergenic re-gions and those encoded within the genomic co-ordinatesencompassing annotated coding sequences Comparisonof the differentially expressed gene lists identified usingamplicon vs oligonucleotide arrays (Tables 3 and 5) itis clear that many of the transcripts detected using theamplicon arrays are not necessarily encoded on thesense gene strand as had been previous interpreted For

Table 4 Confirmation of SNP linkage to upregulation ingene expression

Strain Gene Fold change

M smegmatis mc2155 (pPG101) Mb1750c 161 plusmn 32

M smegmatis mc2155 (pPG101) Mb1749c 125 plusmn 12

M smegmatis mc2155 (pPG102) Mb1750c 19 plusmn 08

M smegmatis mc2155 (pPG102) Mb1749c 14 plusmn 04

Mb 212297 (pPG108) nirB 20 plusmn 10

Mb 212297 (pPG108) nirD 13 plusmn 05

Mb 212297 (pPG109) nirB 529 plusmn 159

Mb 212297 (pPG109) nirD 31 plusmn 08

Fold changes are the mean ratios plusmn standard deviation

Golby et al BMC Genomics 2013 14710 Page 9 of 18httpwwwbiomedcentralcom1471-216414710

Table 5 Differential expression of RNA transcripts as detected by a tiled oligonucleotide microarray

Transcript No Probes Position Size cds Strand 112101 245101 130701 CDS Product SNP

T1 3 303138-303295 157 Mb0258nirB A 3uarr nitrite reductase (large sub-unit)

T2 5 303260-303515 255 Mb0258nirB S 5uarr nitrite reductase (large sub-unit) sSNP at 303227 C-T in 130701 only

T3 2 1105085-1105193 108 Mb1014 A 3uarr 3uarr probable adhesion componentof ABC transporter

T4 2 1105158-1105315 157 Mb1014 A 3uarr 3uarr probable adhesioncomponent of ABC transporter

T5 3 1105158-1105315 157 Mb1014 S 3uarr 3uarr probable adhesioncomponent of ABC transporter

T6 5 1778876-1779131 255 Mb1618c A 9uarr possible secreted lipase sSNP at 1778879 C-T in 112101 only

T7 2 1840174-1840282 108 Mb1672c S 4darr hypothetical protein

T8 3 1932158-1932315 157 tRNA-Pro S 3uarr proline tRNA

T9 5 1932452-1932707 255 Mb1749c S 4uarr toxin component oftoxin-antitoxin system

nSNP at 1932704 C-T (G to D) in 130701 only

T10 5 1932746-1933075 329 Mb1750c S 7uarr anti-toxin component oftoxin-antitoxin system

T11 6 2122959-2123263 304 Mb1914c A 12uarr probable short-chain typedehydrogenasereductase

sSNP at 2122970 C-T in 245101 only

T12 2 2324463-2324571 108 Mb2110 S 3uarr hypothetical protein(frame-shifted in 212297)

T13 2 2328944-2329052 108 Mb2116cpepE S 2darr 3darr probable dipeptidase

T14 2 2329483-2329591 108 Mb2117 A 10darr 10darr 5prime-3prime exonuclease nSNP at 2329583 A-G (comp strand I to T)in 245101 only

T15 3 2868310-2868467 157 Mb2607 A 6darr 8darr hp (frame-shifted in 212297)

T16 2 2868506-2868614 108 Mb2607 A 7darr 7darr hp (frame-shifted in 212297) nSNP at 2868616 A-G (stop to W) in245101 and 130701

T17 2 3075740-3075848 108 I 5uarr 6uarr

T18 2 3075887-3075995 108 I 5uarr 6uarr

T19 2 3076622-3076730 108 I 4uarr 4uarr

T20 2 3076769-3076877 108 I 5uarr 5uarr

T21 2 3078728-3078836 108 I 6uarr 7uarr

T22 2 3079169-3079277 108 I 4uarr 6uarr

T23 2 3079903-3080011 108 I 6uarr 7

T24 2 3844577-3844685 108 Mb3509c S 3uarr 3uarr possible acyltransferase(frame-shifted in 212297)

T25 6 4155501-4155805 304 Mb3803echA21 A 5uarr possible enoyl-coA hydratase sSNP at 4155803 C-T (comp strand) in 245101 only

T26 2 4316987-4317095 108 Mb3927c A 3uarr chp (frame-shifted in H37Rv)

Up and down arrows indicate fold up- and downregulation respectively and empty cells indicate no change in expressionStrand A indicates antisense S sense and I intergenic

Golby

etalBM

CGenom

ics201314710

Page10

of18

httpwwwbiom

edcentralcom1471-216414710

example the amplicon array data had appeared to suggestthat Mb1914c and echA21 were upregulated in 245101but the oligo array data indicates that transcripts 11 and25 which are encoded within the co-ordinates encoded bythose two genes are actually encoded on the antisensestrands This apparent discrepancy can be rationalisedonce we consider that double stranded amplicon micro-array probes cannot discriminate between transcriptsencoded on the sense or antisense strands Transcripts 11and 25 can therefore be considered as potential antisensesRNAs (asRNA) which could be involved in translationalor post-transcriptional control of the sense transcriptOther potential cis-encoded sRNAs detected using the ar-rays include T6 T14 and T15T16 which are encoded onthe antisense strands to Mb1618c Mb2117 and Mb2607respectively and for each of these transcripts their expres-sion appears to be linked to the presence of a single SNPwithin the co-ordinates of the genes The approximateboundaries of these transcripts can be derived from thegenomic co-ordinates of the oligonucleotide probes thatdetect the expression of the transcript Thus the tran-scripts appear to be between 100ndash300 nt in size and thepositions of the linked SNPs appear to be positioned eitherjust upstream or within the predicted 5primeend of the tran-scripts (Figure 3) Three of the transcripts (T11 T14 andT25) are antisense to the central part of the sense encodedgene while T6 is encoded antisense to the 5prime end ofMb1618c As well as antisense transcripts we also saw thedifferential expression of sense transcripts The ampliconmicroarray data (confirmed by real time RT-PCR) indi-cated that nirB is strongly upregulated specifically in130701 in both in vitro and ex-vivo Mϕ An analysis ofthe oligonucleotide array data however indicates thatthere are two short transcripts T1 and T2 (sense and anti-sense respectively) that are encoded within the genomicco-ordinates of the nirB gene T2 is the longer in size (255vs 155nt) and more highly expressed (5 vs 3-fold) thanT1 and both transcripts appear to be linked to the pres-ence of a SNP that is located within the middle of T1 andapproximately 50nt upstream of T2Some of the transcripts are bone fide gene sense

strand mRNA transcripts such as T9 and T10 whichare encoded by Mb1749c and Mb1750c respectivelyAlthough it would appear that the two genes are tran-scribed separately it is probable that the two transcriptsare co-transcribed as the stop codon of Mb1750c over-laps the start codon of Mb1749c Eight of the tran-scripts listed in Table 5 are encoded within intergenicregions 7 of which are encoded within the polymorphicdirect repeat (DR) locus The DR locus of strains belong-ing to the M tuberculosis complex has been suggested toconstitute a CRISPR locus which have been shown inmany species of bacteria to be involved in protectionagainst exogenous foreign DNA such as plasmids and

phage [25] All the DR encoded transcripts are short(approx 100 nt) straddle contiguous repeat and spacersequences and show approximately 5-fold higher levelsof expression in 245101 and 130701 compared with212297 and 112101

Characterisation of differentially expressed cis asRNAThe genomic co-ordinates of the oligonucleotide probesthat detected the antisense species described above canonly serve as approximate estimations as to their startand end points Thus we used 5prime RLM-RACE (RNALigase Mediated Rapid Amplification of cDNA Ends) inan attempt to accurately define the transcriptional startsites (TSS) for the short sense transcript T2 and the anti-sense transcripts T6 T11 and T25 described in the abovesection (see Methods) These transcripts were chosen astheir expression levels are high and their transcript lengthswere considered to be sufficiently long to enable theRLM-RACE methodology to work Table 6 details thesizes of the PCR products obtained after RLM-RACE wasperformed using oligonucleotide primers designed tosequences predicted for transcripts T6 T11 and T25 NoPCR product was obtained for transcript T2 For each ofthe three transcripts the TSS was determined to be a Gresidue which is the most commonly used residue typefor mycobacterial TSSrsquos (Figure 4) [26] For each of the T6T11 and T25 transcripts expression of the asRNAs waslinked to the presence of a SNP (C to T) proximal tothe 5prime end of the asRNA Strains exhibiting the lsquoCrsquo alleleshowed no expression of the asRNA whilst the strainthat showed expression had the lsquoTrsquo allele An analysis ofthe nucleotide sequence in the vicinity of the SNPsreveals that for each of the three transcripts the SNPconstitutes the 6th residue of a motif that has stronghomology to the consensus sequence for the minus10 elementof Group A mycobacterial promoters (Figure 4) [26] Thefinding that a lsquoTrsquo residue is associated with expression isconsistent with the consensus sequence which indicatesthat 86 of all minus10 elements have a lsquoTrsquo residue at the 6thresidue position Several residues that flank the minus10 motifalso show a degree of conservation Sequence motifswhich show strong homology to group A minus35 elementsare present 18ndash19 bp upstream of the putative minus10elements and the distances between the minus35 -10 andTSS elements are consistent with those elements ofthe consensus sequence No protein encoding openreading frames were detected within the T6 T11 andT25 transcriptsIn a parallel study high density oligonucleotide micro-

arrays were also used to interrogate the transcriptomesof M tuberculosis H37Rv M bovis BCG Pasteur Myco-bacterium caprae and M bovis AN5 that had beengrown in Middlebrook 7H9 media As a result of theseexperiments two asRNA species were found to be

Golby et al BMC Genomics 2013 14710 Page 11 of 18httpwwwbiomedcentralcom1471-216414710

expressed within the antisense strands of the ino1 andnarH genes of M tuberculosis H37Rv but not in any ofthe other 3 strains tested (data not shown) A compari-son of nucleotide sequences of the orthologous genesacross the species suggested that expressions of theas_sRNAs correlated with the presence of a sSNP (C to T

transition at positions 50555 and 1292100 wrt H37Rvgenomic sequence for as_ino1 and as_narH respectively)upstream of the asRNAs Approximate information re-garding the transcriptional start site was deduced from thebinding co-ordinates of the probes that detected the tran-scripts As with the M bovis antisense sRNAs described

Figure 3 Expressions and schematic representation of genomic locations of selected cis-encoded antisense sRNAs identified using atiled oligonucleotide microarray Three asRNAs (open arrows) are (a) T6 (b) T14 and (c) T25 For each asRNA a histogram plots the foldchanges for each of the oligonucleotide probes that detected the asRNA and for each probe the binding position relative to the 212297genome is indicated Closed and open arrows indicate lengths and direction of transcription of genes and asRNAs respectively

Golby et al BMC Genomics 2013 14710 Page 12 of 18httpwwwbiomedcentralcom1471-216414710

above the T residue associated with the expression ofthe M tuberculosis asRNAs is part of a putative minus10element A minus35 element with an appropriate spacing tothe minus10 element was identified for as_ino1 but not foras_narH suggesting that the as_narH promoter maybelong to group B mycobacterial promoters that havea conserved minus10 but no minus35 motif [26]

DiscussionsThe aim of this work was to define possible phenotypicvariation across M bovis field isolates through a combin-ation of genome sequencing comparative genomics andtranscriptome analyses from both in vitro and ex vivo con-ditions Using these approaches we uncovered a range ofnovel findings the most striking of which was the realisa-tion that genes that had been predicted to be differentiallyexpressed based on amplicon-microarray data were in fact

not upregulated and that instead it was an antisense tran-script that was showing differential expression Analysingboth transcriptome and genome sequence data allowed usto identify SNPs responsible for the transcription of anti-sense RNAs with generation of a consensus minus10 promotersequence the likely mechanism Our results suggest thatdata generated from amplicon arrays in the past may needto be revisited as it is possible that some coding-sequences identified as being differentially expressed wereinstead antisense transcriptsWith the growth of technologies such as high density

tiled oligonucleotide microarrays and next generationsequencing there has been a rapid increase in the num-ber of reports describing the existence of non-codingRNAs (ncRNAs) in bacteria Non-coding RNAs broadlyconsist of two types cis- and trans-encoded RNA TransRNA includes intergenic encoded RNA while cis-encodedRNA includes 5prime and 3prime untranslated regions of mRNAand antisense RNA To study the expressions of both cis-and trans encoded ncRNA we used a high density oligo-nucleotide tiled microarray since our amplicon microarraywas unable to detect intergenic transcripts or differentiatebetween sense and antisense transcripts Previous studiesusing M tuberculosis have identified substantial amountsof ncRNA encoded in both intergenic and intragenic re-gions [78] We detected substantial amounts of ncRNAs

Figure 4 Promoters of anti sense RNAs a Promoters of the asRNAs as_mb1618c as_1914c and as_echA21 -10 and minus35 elements areindicated in bold and italics Transcriptional start sites are indicated by large font G characters while SNP residue that leads to the expression ofthe asRNA is indicated by a large font red T residue The consensus sequence for group a mycobacterial promoters is indicated Numericalsubscripts indicate the percentage of the total number of promoters for which a transcriptional start site has been experimentally determinedthat show the indicated residue b Promoters of the differentially expressed asRNAs as_ino1 and as_narH in M tuberculosis -10 and minus35 elementsare indicated in bold italics The red residue indicates SNP responsible for differential expression

Table 6 Determination of transcriptional start sites ofantisense RNAs

Transcript asRNA PCRprod size

TSSresidue type

TSS residue postn(wrt 212297)

T6 mb1618c_as 150 G 1778886

T11 mb1914c_as 220 G 2122978

T25 echA21_as 175 G 4155796

Golby et al BMC Genomics 2013 14710 Page 13 of 18httpwwwbiomedcentralcom1471-216414710

in M bovis including many instances of cis-antisenseRNA species Due to their perfect complementarity cisasRNA form a duplex with the sense strand encodedtranscript resulting in either degradation [27] or transla-tion inhibition [28] of the sense mRNA AntisenseRNAs vary in length ranging from 10s to 1000primes ofnucleotides and can be classified according to theirencoded position with respect to the opposite senseencoded gene Thus they can be classified as 5prime or 3primeoverlapping while others are classified as internallylocated The 112101 specific as_Mb1618c is an exampleof a 5prime overlapping asRNA which is encoded antisense togene Mb1618c which is predicted to express a secretorylipase The location of the asRNA transcript suggests itmay function to prevent translation of Mb1618c mRNAby steric hindrance of the ribosome binding siteIn the work presented here strain 245101 expressed

two asRNAs as_Mb1914c and as_echA21 that are notexpressed by any of the other 3 strains They areencoded within the central part of the opposite genesand are therefore likely to modulate the stability of thetranscripts Mb1914c encodes a short chain dehydro-genase while echA21 encodes an enoyl-CoA hydrataseShort chain dehydrogenases catalyse a wide range offunctions so the precise function and identity of thesubstrate is difficult to deduce from sequence aloneEnoyl-CoA hydratases hydrate double carbon-carbonbonds of macromolecules and are vital in the metabol-ism of fatty acids Both gene products would thereforeappear to be involved in the metabolism of a macromol-ecule and their similar expression profiles in this straincould indicate involvement in the metabolism of the samemolecule or molecules that are of the same pathwayIn many instances upregulation of asRNA negatively

correlates with the transcription of the antisense gene[27] but in many cases expression of the antisense tran-script has no effect on the transcription of the oppositegene In our studies expressions of as_Mb1618cas_Mb1914c and as_echA21 did not appear to have anyeffect on the expressions of the opposite sense encodedgenes (data not shown) We have shown that theexpressions of the asRNAs are associated with the pres-ence of SNPs which are either synonymous or non-synonymous with respect to the sense transcript butupstream of the asRNA transcriptional start site Thishighlights the fact that mutations can potentially affectexpression of transcripts on both strands and that theclassification of a SNP is strand dependent For each ofthe three asRNAs the associated SNP was found to belocated within a putative minus10 promoter motif of groupA mycobacterial promoters The sixth residue of the minus10hexamer motif consensus sequence is a strongly conservedlsquoTrsquo residue which is present in 81 of all group A myco-bacterial promoter elements Its importance is underlined

by the finding that the strains that exhibit a lsquoCrsquo residue atthis position show no detectable expression of the asRNAwhile strains having a lsquoTrsquo residue at this position exhibitexpressionSingle nucleotide polymorphisms were found to be

the most frequent form of genetic variation that existsbetween the isolates with a total of 1013 SNPs detectedacross the three strains 112101 245101 and 130701compared to the reference strain M bovis 212297Non-synonymous SNPs which include both non-senseand missense SNPs are a class of SNPs most likely toimpact on protein function and contribute to phenotypicvariation Non-sense SNPs which results in the expressionof a truncated polypeptide due to the introduction of apremature stop codon were identified in five genes acrossthe strains Of these we focussed our attentions on thenon-sense SNP present in a gene encoding the LysR regu-lator Mb2007c as mutations affecting regulators are likelyto impact on the expression of one or more genes that arepart of the regulon of the regulator and are therefore morelikely to result in phenotypic variation Experiments tocompare the transcriptomes of a strain that exhibited themutation with a strain overexpressing a functional regula-tor did not however reveal any differences The reasonfor this unclear but could reflect a requirement of theregulator for a co-inducer that was absent under the con-ditions of the experiment The consequences of missenseSNPs are more difficult to predict as substitutions of oneamino acid for another in a protein sequence do not ne-cessarily lead to a change in protein function Howeverfor genes that are controlled by an autoregulatory mech-anism a mutation that affects the ability of the product ofthe gene to regulate itself will result in a change in expres-sion of the gene In our studies we have shown that thepresence of a missense mutation in a VapB type toxin en-coding gene Mb1749c in strain 130701 results in theupregulation in expression of the toxin-antitoxin encodingpair of genes Mb1750c-Mb1749c due to the inability ofthe encoded proteins to self regulate themselves Toxin-antitoxin systems have a variety of proposed cellular func-tions including general regulation of mRNA stability levelsin the cell [29] Further experiments are required to fullyunderstand the consequences of this mutationGenomic deletions have played an important role in

the evolution of strains belonging to the mycobacterialcomplex [30] and in the derivation of the tuberculosisvaccine strain M bovis BCG [31] In addition to the previ-ously described 68 kb gene deletion that is specific tostrains having a spoligotype 17 pattern [6] we have identi-fied a 16 kb multi-gene deletion that is specific to strain130701 and encompasses genes that are part of the DosRregulon However one of the deleted genes exists as apseudo gene in DosR in strain 212297 so its importanceto the biology of M bovis is unlikely to be significant

Golby et al BMC Genomics 2013 14710 Page 14 of 18httpwwwbiomedcentralcom1471-216414710

Several other genes with internal deletions were detectedbut none of the encoded proteins have any significantsimilarity to any protein with a defined function

ConclusionsIn conclusion we have performed a comprehensive ana-lysis of 4 M bovis strains of the most common moleculartypes circulating in GB We show that while these strainsshow extensive similarities in their genetic make-up andgene expression profiles they show distinct differences inthe expression of a subset of genes We provide functionaldata to show that SNPs can lead to the expression of anti-sense RNA a finding with implications for how we definea lsquosilentrsquo nucleotide change Furthermore we show thatthe interpretation of transcriptome data based solely onamplicon arrays could lead to artefacts due to expressionof antisense transcripts a caveat that needs to be kept inmind for previous studies of global expression analysis inbacteria

MethodsBacterial strains media and growth conditionsFor the Mϕ infection experiments bovine alveolar Mϕwere cultivated in tissue culture media R10 whichconsisted of RPMI (Invitrogen) media plus 2 mM glu-tamine 10 calf fetal serum and 1 amphotericinWhere used antibiotics gentamycin and ampicillinwere added at concentrations of 50 and 100 μg mlrespectively M bovis field strains were pre-grown inMiddlebrook 7H9 broth supplemented with 10 albumin-dextrose catalase (ADC Difco) 005 Tween and 10 mMpyruvate Cultures were harvested in mid-logarithmicphase (OD600 of 03-08) washed and then resuspended inRPMI containing 005 Tween 80

Isolation of bovine alveolar macrophages and infectionwith mycobacteriaThe lungs of a 6ndash8 week old male Holstein-Friesian calfwere removed and a whole lung lavage procedure wasperformed to washout the alveolar Mϕ Briefly 4ndash5 x500 ml aliquots of Hanksrsquo Balanced Sterile Salts solution(HBSS) were used to infuse the lungs via the tracheaand the washings were pooled in a sterile beaker The Mϕcells contained in the washes were pelleted by centrifuga-tion at 500 x g for 10 mins at 4degC washed and thenresuspended in R10 growth media supplemented withantibiotics (R10+) to a concentration of 1ndash2 x 107 mlApproximately 05-15 x 109 Mϕ were isolated per calflungVented 225 cm2 tissue culture flasks containing R10+

media were seeded with 3ndash4 x 107 alveolar Mϕ andplaced in a humidified 37degC incubator containing 5CO2 Typically 2ndash4 flasks were used per strain and timepoint After 24 hrs the growth media was decanted to

remove non-adherent cells and then replaced with freshR10+ media After a further 24 hrs the growth mediawas discarded and the Mϕ monolayer was washed withRPMI to remove traces of the antibiotic containinggrowth media The monolayer was then covered withR10 media without antibiotics (R10-) and then infectedwith mid-logarithmic phase grown mycobacteria usingan MOI of approximately 101 (bacilli Mϕ) The AlvMϕwere incubated with mycobacteria for 4 hrs after whichthe cell monolayers were washed with RPMI and theneither processed for RNA extraction (4 hr time point)or incubated in fresh R10+ media for a further 20 hrsbefore being processed for RNA extraction (24 hr timepoint)

Extraction and amplification of mycobacterial RNA frominfected macrophagesMϕ cell monolayers were lysed using a guanidiniumthiocyanate (GTC) containing solution The lysed Mϕrsquoswere vortexed and passed twice through a 21G bluntended needle to sheer host genomic DNA and therebyreduce the viscosity of the solution Mycobacterial cellswere then pelleted by centrifugation at 4600 rpm for 20mins at room temperature and washed with GTC solu-tion to remove host genomic DNA Cells were thenresuspended in Trizol and RNA was extracted using theprotocol outlined in Bacon et al [32] The amount ofpurified DNase-treated RNA recovered was of the order100ndash500 ng per time point RNA was amplified usingthe lsquoMessageAmp II-Bacteria RNA Amplification Kitrsquo(Ambion) according to the manufacturersrsquo instructionsUsing an input of 100 ng of unamplified RNA 20ndash100μg of amplified RNA was recovered

Amplicon microarray analysisFor the in vitro growth experiments three independentexperiments (biological replicates) were carried out andfor each strain in each experiment two microarrays (tech-nical replicates) were performed Thus for each strain 6microarrays were performed Three independent AlvMϕinfection experiments were carried out and for eachexperiment two microarrays were performed for each ofthe control RPMI samples and the 4- and 24 hrs post-infection samples Cy5 and Cy3 fluorescently-labelledprobes were synthesised from RNA and genomicDNA respectively and hybridised to whole genomeM bovis M tuberculosis microarrays The array designis available in BμGSbase (accession number A-BUGS-31httpbugssgulacukA-BUGS-31) and also ArrayExpress(accession number A-BUGS-31) Details of probe synthesishybridization conditions and manufacture of the micro-array can be found in Golby et al 2008 Microarrayswere scanned using an Affymetrix 428 Microarrayscanner and scanned images were quantified using

Golby et al BMC Genomics 2013 14710 Page 15 of 18httpwwwbiomedcentralcom1471-216414710

BlueFuse for Microarrays v32 software (BlueGnome)See Golby et al 2008 for further detailsNormalisation was performed by dividing the log ratio

of the Cy5 to Cy3 signal for every spot by the median ofthe log ratios for all spots except control spots A medianabsolute (MAD) scale transformation was applied to thenormalised data from the pas an additional normalisationstep For every microarray duplicate spots were averagedand then the average expression of every gene across alltechnical replicate microarrays was calculated Averagesof the three biological replicates were used to comparegene expression between strains For each gene a mod-erated t-test was applied and those genes with a P- valueless than 005 were selected From this gene list thosegenes whose average expression differed by more than25-fold between strains were selected Fully annotatedmicroarray data have been deposited in BμGSbase (acces-sion number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (accession number E-BUGS-150)

Oligonucleotide microarray analysisExperiments were performed in a similar way to thatdescribed for the in vitro amplicon array experimentsexcept that the RNA was purified using the mirVanamiRNA Isolation kit (Ambion) which is designed tocapture small (gt20 nt) RNA species RNA and genomicDNA were directly labelled with Cy5- and Cy3 respect-ively using the ULS microRNA labelling kit (Kreatech)according to the manufacturerrsquos instructions PurifiedCy5- and Cy3-labelled probes were co-purified andapplied to an Agilent 40K custom made tiled (10 ntoverlap) 60-mer oligonucleotide microarray designed tothe genomic sequence of M bovis 21229797 The arraydesign is available in BμGSbase (accession numberA-BUGS-52 httpbugssgulacukA-BUGS-52) and alsoArrayExpress (accession number A-BUGS-52) Microarrayswere hybridised at 65degC for 18 hrs and then washed inWash Buffer 1 (Agilent) at room temperature for 1 minuteThe slides were then washed in Wash Buffer 2 (Agilent) at37degC for 1 minute dried and then scanned at 2 μm usingan Agilent DNA microarray scannerTiling array data was analysed using the Limma package

of RBioconductor [33] The signal median was quan-tile normalised between arrays followed by a LOESSnormalisation within arrays Differential expressionanalysis was performed by pairwise comparison usinglinear models and empirical Bayes methods and Pvalues adjusted using the Benjamini and Hochbergsmethod to control for multiple testing Fully annotatedmicroarray data have been deposited in BμGSbase(accession number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (accession num-ber E-BUGS-150)

Whole genome sequencing of M bovis field isolatesWhole genome paired end (2 x 76bp) sequencing wasperformed using Illumina HiSeq machines at theWellcome Trust Sanger Institute (Hinxton Cambridge)Raw sequence data was uploaded to the European Nucleo-tide Archive (ENA) and can be downloaded at httpwwwebiacukena accession numbers ERX006616 ERX06617ERX012284 and ERX012286 The FastQC (httpwwwbioinformaticsbabrahamacukprojectsfastqc) programwas used to analyse the quality of the raw data reads Rawsequence data was trimmed to remove adapter sequencesand nucleotides where the sequence quality score wasbelow 20 Filtered reads were aligned to the M bovis refer-ence strain 212297 [10] with the SMALT alignmentprogram (httpwwwsangeracukresourcessoftwaresmalt)using default settings The average mean coverages for thesequenced strains 112101 245101 and 130701 were293 224 and 234 respectively Reads that did not maponto the reference genome were de novo assembled andblasted against the NCBI data base in order to find regionspresent in these strains but absent in 212201 Variantcalling and the generation of consensus sequenceswere carried out using the SAMtools program suite(httpsamtoolssourceforgenet) SNPs that had a mini-mum of quality score of 200 and had a minimum of fourgood quality forward (reverse) reads covering the SNP siteand having the variant base were retained The number offorward (reverse) reads mapping with good quality ontothe SNP site having the same base as in the reference hadto be less than 5 of the total number of forward (reverse)reads mapping with good quality onto the SNP site havingthe variant base Sequences of 12 bases in length centredaround SNP sites were blasted to the genome of the refer-ence strain 212297 and any SNP within an area with a90 hit to another area of the genome were filtered outThis eliminated most of the SNPs within the repetitivePE-PGRS regions For all sequenced strains more than999 of the genomes were covered by reads

Real time RT-PCRQuantitative real-time SYBR Green based PCR (qRT-PCR)experiments were performed using a RotorGene 3000(Corbett research) as described by Golby et al [23] Foldchanges were calculated using relative standard curvemethod and pcr controls included no template and noreverse transcriptase Primer pair sequences are given inAdditional file 5 available with the online version ofthis paper

Construction of the Rv1749c-Rv1750c and nirBDoverexpressing plasmidsThe Rv1749c-Rv1750c overexpressing plasmids pPG106and pPG107 were constructed by PCR amplification of a907bp fragment encompassing Rv1749c-Rv1750c-lsquoRv1751c

Golby et al BMC Genomics 2013 14710 Page 16 of 18httpwwwbiomedcentralcom1471-216414710

using primers tox_f and tox_r For pPG106 the fragmentwas amplified using 21229797 genomic DNA as a tem-plate while pPG107 was amplified using 130701 gDNABoth fragments were digested with SpeI and cloned intothe SpeI cut mycobacterial attP-integrating shuttle vectorpKINT (a gift from Douglas Young Imperial CollegeLondon) Plasmids pPG108 and pPG109 which contain a35 kb hsprsquo-nirB-nirD-lsquocobU fragment was constructed inseveral steps Firstly two 16 kb hsprsquo-nirBrsquo PCR fragmentswere PCR amplified separately using primers nirB1_f andnirB1_r and 212297 and 130701 as genomic templatesSimilarly two 18kb lsquonirB-nirD-lsquocobU fragments wereamplified using primers nirB2_f and nirB2_r and genomicDNAs 212297 and 130701 The two PCR products weredigested with SpeI and BamHI and then co-ligated intopKINT Details concerning the nucleotide sequences ofthe pcr primer pairs are given in the Additional file 5

5prime-RLM-RACE PCRTranscriptional start site mapping of was determinedusing the First Choice RNA ligase-mediated rapid amp-lification of cDNA ends (RLM-RACE) kit (Ambion) asper manufacturers instructions Briefly 10 ug of totalRNA was treated with calf intestinal phosphatase (CIP)and tobacco acid pyrophosphatase (TAP) before ligationof an RNA Adapter oligonucleotide to the 5prime ends ofthe mRNA transcripts A random-primed reverse tran-scription reaction was carried out to generate cDNAand then nested PCR reactions were performed on thecDNA using combinations of adapter and gene specificprimers Details concerning the nucleotide sequencesof the 5prime outer and inner adapter sequences as well as3prime outer and inner gene specific primers are given inthe Additional file 5 PCR products generated usingthe 5prime inner adapter and 3prime inner gene specific primerswere sequenced by Sanger sequencing using the 3prime innergene specific primer

Animal ethics statementAnimal work was carried out according to the UK Animal(Scientific Procedures) Act 1986 The study protocol wasapproved by the AHVLA Animal Use Ethics Committee(UK Home Office PCD number 706905)

Availability of supporting data sectionFully annotated microarray data have been deposited inBμGSbase (accession number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (ac-cession number E-BUGS-150) Raw sequence data wasuploaded to the European Nucleotide Archive (ENA) andcan be downloaded at httpwwwebiacukenadataviewERX006616-ERX06617ERX012284-ERX012286

Additional files

Additional file 1 SNPs identified across sequenced strains

Additional file 2 Large deletions identified by NGS in sequencedstrains

Additional file 3 Fold change differences in gene expression inM bovis field isolates 1121 2451 and 1307 compared to 2122Cells shaded in red indicate upregulation green indicate down-regulation and empty cells indicate no change in expression

Additional file 4 Differential expression of RNA transcripts asdetected by a tiled oligonucleotide microarray Up and down arrowsindicate fold up- and downregulation respectively and empty cellsindicate no change in expression Strand A indicates antisense S senseand I intergenic

Additional file 5 Details of oligonucleotides used in PCR RT-PCRand RLM-RACE experiments

Competing interestsThe authors declare that they have no competing interests

Authorsrsquo contributionsPG carried out the culturing of organisms macrophage infectionsmicroarrays qRT-PCR cloning analysis of sequence data and writing of themanuscript JN performed analysis of the Illumina sequence and PCRamplicon microarray data AW and JH performed the analysis of the Agilentoligonucleotide microarray data and assisted in the uploading of themicroarray data into the BμGSbase database MAQ and SB wereresponsible for the Illumina sequencing SH contributed to the analysis ofthe Illumina sequence data NS contributed to the writing and critical reviewof the manuscript RGH was a co-investigator who helped to guide theproject SG was the principle investigator on the project who guided theresearch and helped in the writing of the manuscript All authors read andapproved the final manuscript

AcknowledgementsWe would like to acknowledge Kate Gould for her help in carrying out theoligonucleotide microarrays and Denise Waldron for her assistance inuploading of the microarray data into BμGSbase We also thank MarkChambers for comments on the manuscript and BBSRC and Defra forfunding the work (Grant no BBE0184911)

Author details1Animal Health and Veterinary Laboratories Agency Woodham Lane NewHaw Addlestone Surrey KT15 3NB UK 2Division of Clinical Sciences BacterialMicroarray Group Centre for Infection amp Immunity St Georgersquos University ofLondon Cranmer Terrace London SW17 0RE UK 3The Wellcome TrustSanger Institute Hinxton Cambridgeshire UK 4UCD School of VeterinaryMedicine and UCD Conway Institute University College Dublin Dublin 4Ireland

Received 4 June 2013 Accepted 26 September 2013Published 17 October 2013

References1 Monaghan ML Doherty ML Collins JD Kazda JF Quinn PJ The tuberculin

test Vet Microbiol 1994 40(1ndash2)111ndash1242 Kamerbeek J Schouls L Kolk A van Agterveld M van Soolingen D Kuijper

S Bunschoten A Molhuizen H Shaw R Goyal M et al Simultaneousdetection and strain differentiation of Mycobacterium tuberculosis fordiagnosis and epidemiology J Clin Microbiol 1997 35(4)907ndash914

3 Frothingham R Meeker-OrsquoConnell WA Genetic diversity in theMycobacterium tuberculosis complex based on variable numbers oftandem DNA repeats Microbiology 1998 144(Pt 5)1189ndash1196

4 Smith NH Dale J Inwald J Palmer S Gordon SV Hewinson RG Smith JMThe population structure of Mycobacterium bovis in Great Britain clonalexpansion Proc Natl Acad Sci USA 2003 100(25)15271ndash15275

5 Winder CL Gordon SV Dale J Hewinson RG Goodacre R Metabolicfingerprints of Mycobacterium bovis cluster with molecular type

Golby et al BMC Genomics 2013 14710 Page 17 of 18httpwwwbiomedcentralcom1471-216414710

implications for genotype-phenotype links Microbiology 2006152(Pt 9)2757ndash2765

6 Wheeler PR Brosch R Coldham NG Inwald JK Hewinson RG Gordon SVFunctional analysis of a clonal deletion in an epidemic strain ofMycobacterium bovis reveals a role in lipid metabolism Microbiology2008 154(Pt 12)3731ndash3742

7 Arnvig KB Young DB Identification of small RNAs in Mycobacteriumtuberculosis Mol Microbiol 2009 73(3)397ndash408

8 Arnvig KB Comas I Thomson NR Houghton J Boshoff HI Croucher NJRose G Perkins TT Parkhill J Dougan G et al Sequence-based analysisuncovers an abundance of non-coding RNA in the total transcriptome ofMycobacterium tuberculosis PLoS Pathog 2011 7(11)e1002342

9 Smith NH Upton P Naming spoligotype patterns for the RD9-deletedlineage of the Mycobacterium tuberculosis complex www MbovisorgInfect Genet Evol 2012 12(4)873ndash876

10 Garnier T Eiglmeier K Camus JC Medina N Mansoor H Pryor M Duthoy SGrondin S Lacroix C Monsempe C et al The complete genome sequenceof Mycobacterium bovis Proc Natl Acad Sci USA 2003 100(13)7877ndash7882

11 Albers CA Lunter G MacArthur DG McVean G Ouwehand WH Durbin RDindel accurate indel calls from short-read data Genome Res 201121(6)961ndash973

12 Mostowy S Inwald J Gordon S Martin C Warren R Kremer K Cousins DBehr MA Revisiting the evolution of Mycobacterium bovis J Bacteriol2005 187(18)6386ndash6395

13 Boon C Dick T Mycobacterium bovis BCG response regulator essentialfor hypoxic dormancy J Bacteriol 2002 184(24)6760ndash6767

14 Van Gelder RN von Zastrow ME Yool A Dement WC Barchas JD EberwineJH Amplified RNA synthesized from limited quantities of heterogeneouscDNA Proc Natl Acad Sci USA 1990 87(5)1663ndash1667

15 Pandey DP Gerdes K Toxin-antitoxin loci are highly abundant in free-living but lost from host-associated prokaryotes Nucleic Acids Res 200533(3)966ndash976

16 Arcus VL McKenzie JL Robson J Cook GM The PIN-domain ribonucleasesand the prokaryotic VapBC toxin-antitoxin array Protein Eng Des Sel 201124(1ndash2)33ndash40

17 Bukowski M Rojowska A Wladyka B Prokaryotic toxin-antitoxin systemsndashthe role in bacterial physiology and application in molecular biologyActa Biochim Pol 2011 58(1)1ndash9

18 Maddocks SE Oyston PC Structure and function of the LysR-typetranscriptional regulator (LTTR) family proteins Microbiology 2008154(Pt 12)3609ndash3623

19 Moreno-Vivian C Cabello P Martinez-Luque M Blasco R Castillo FProkaryotic nitrate reduction molecular properties and functionaldistinction among bacterial nitrate reductases J Bacteriol 1999 181(21)6573ndash6584

20 Tiffert Y Supra P Wurm R Wohlleben W Wagner R Reuther J TheStreptomyces coelicolor GlnR regulon identification of new GlnR targetsand evidence for a central role of GlnR in nitrogen metabolism inactinomycetes Mol Microbiol 2008 67(4)861ndash880

21 Duan J Wainwright MS Comeron JM Saitou N Sanders AR Gelernter JGejman PV Synonymous mutations in the human dopamine receptor D2(DRD2) affect mRNA stability and synthesis of the receptor Hum MolGenet 2003 12(3)205ndash216

22 Capon F Allen MH Ameen M Burden AD Tillman D Barker JN TrembathRC A synonymous SNP of the corneodesmosin gene leads to increasedmRNA stability and demonstrates association with psoriasis acrossdiverse ethnic groups Hum Mol Genet 2004 13(20)2361ndash2368

23 Golby P Hatch KA Bacon J Cooney R Riley P Allnutt J Hinds J Nunez JMarsh PD Hewinson RG et al Comparative transcriptomics reveals keygene expression differences between the human and bovine pathogensof the Mycobacterium tuberculosis complex Microbiology 2007153(Pt 10)3323ndash3336

24 Bitter W Houben EN Bottai D Brodin P Brown EJ Cox JS Derbyshire KFortune SM Gao LY Liu J et al Systematic genetic nomenclature for typeVII secretion systems PLoS Pathog 2009 5(10)e1000507

25 Sorek R Kunin V Hugenholtz P CRISPRndasha widespread system thatprovides acquired resistance against phages in bacteria and archaeaNat Rev Microbiol 2008 6(3)181ndash186

26 Gomez M Smith I Determinants of Mycobacterial Gene Expression InMolecular Genetics of Mycobacteria Washington DC ASM Press2000111ndash129

27 Duhring U Axmann IM Hess WR Wilde A An internal antisense RNAregulates expression of the photosynthesis gene isiA Proc Natl Acad SciUSA 2006 103(18)7054ndash7058

28 Kawano M Aravind L Storz G An antisense RNA controls synthesis of anSOS-induced toxin evolved from an antitoxin Mol Microbiol 200764(3)738ndash754

29 Thomason MK Storz G Bacterial antisense RNAs how many are thereand what are they doing Annu Rev Genet 2010 44167ndash188

30 Brosch R Pym AS Gordon SV Cole ST The evolution of mycobacterialpathogenicity clues from comparative genomics Trends Microbiol 20019(9)452ndash458

31 Calmette A Guerin C Sur quelques proprietes du bacille tuberculeuxdrsquoorigine cultive sur la bile de boeuf glycerinee C R Acad Sci Paris 1909149716ndash718

32 Bacon J James BW Wernisch L Williams A Morley KA Hatch GJ ManganJA Hinds J Stoker NG Butcher PD et al The influence of reduced oxygenavailability on pathogenicity and gene expression in Mycobacteriumtuberculosis Tuberculosis (Edinb) 2004 84(3ndash4)205ndash217

33 Smyth GK Linear models and empirical bayes methods for assessingdifferential expression in microarray experiments Stat Appl Genet Mol Biol2004 33

doi1011861471-2164-14-710Cite this article as Golby et al Genome-level analyses ofMycobacterium bovis lineages reveal the role of SNPs and antisensetranscription in differential gene expression BMC Genomics 2013 14710

Submit your next manuscript to BioMed Centraland take full advantage of

bull Convenient online submission

bull Thorough peer review

bull No space constraints or color figure charges

bull Immediate publication on acceptance

bull Inclusion in PubMed CAS Scopus and Google Scholar

bull Research which is freely available for redistribution

Submit your manuscript at wwwbiomedcentralcomsubmit

Golby et al BMC Genomics 2013 14710 Page 18 of 18httpwwwbiomedcentralcom1471-216414710

The RNAs extracted from each of the growth condi-tions were converted to cyanine labelled cDNA using re-verse transcriptase and hybridised to whole genomeamplicon microarrays Using only those genes that arecommon to all four strains we found a total set of 70genes that showed a 25-fold or more difference in ex-pression in one or more strains when pairwise compari-sons were made between the transcriptomes of 212297and 112101 245101 or 130701 (Additional file 3) Asubset of these 70 genes is shown in Table 3 where keyexamples of alterations in metabolic processes areshown The numbers of genes that were found to showdifferential expression across the four strains reflectedthe evolutionary distances between 212297 and theother 3 strains Thus 112101 which is closely relatedto 212297 shows only 5 differentially expressed geneswhile the most distantly related strain 130701 shows 56genes differently expressed compared to 212297 Ofthese 56 genes 5 were specific to the in vitro conditionwhile 19 were specific to the Mϕ Ten genes were com-mon to both conditions which serves to validate thetechnical reproducibility of the RNA amplificationprocessUsing the genome sequencing information determined

for each of the four strains we attempted to correlatethe observed strain-specific differences in gene expres-sion with the presence or absence of mutations withinthe coding regions or promoters of those genes thatshow differential gene expression or in genes that areknown to regulate the activity of those genes Mb1749cand Mb1750c are two genes that are specificallyupregulated in 130701 and encode a toxin and antitoxin(TA) pair respectively belonging to type II TA systemsof the VapBC family [15] Members of VapB type toxinscontain PIN domains that cleave RNA and thus functionto control translation of mRNA transcripts [16] Thehomologous genes from strain 130701 show up to 19-and 10-fold higher levels of expression respectively thanthose of the other three strains An analysis of the cod-ing sequences of Mb1749c across all four strains re-vealed that the 130701 homologue has a unique nSNPat position 1932704 (wrt 212297 genomic sequence) aC-T transition that results in the nonconservative substi-tution of Gly19 to Asp Research has shown that TAgene pairs negatively regulate their own expressionthrough binding of the TA protein complex to the pro-moter region of the TA gene pair thus preventing accessto RNA polymerase [17] The G19D mutation inMb1749c could therefore impair the ability of the com-plex to bind to the promoter resulting in the deregula-tion of the TA gene pairMb2007c which shows a 4-fold higher expression in

130701 only encodes a transcriptional regulator of theLysR class There are two SNPs present in the coding

sequence of Mb2007c in 130701 which are absent inthe homologues of the other three strains the first is anSNP which results in the conservative substitution ofArg137 to Gln while the second is a more debilitatingnonsense SNP which ultimately leads to a protein whoselength is only 60 that of the wild-type Many regulatorsbelonging to the LysR family regulate their own expres-sion through a negative autoregulatory mechanism simi-lar to that described above for VapBC TA systems [18]A loss in protein integrity could therefore result in theregulator being unable to bind the regulatory regionleading to the observed upregulation in the expressionof this gene in 130701 As the product of this gene ispredicted to be a transcriptional regulator it was specu-lated that the regulation of gene(s) controlled by regula-tor could be affected in 130701 due to the severelytruncated form of this protein As LysR regulators areoften found to regulate genes that are divergently tran-scribed from the lysR gene it was surprisingly to findthat expression of the Mb2008 homologue in 130701which is predicted to encode a lysine transporter doesnot show any difference in expression in 130701 to212297 To define the regulon of this regulator we firstcompared the transcriptomes of 212297 transformedwith a multicopy plasmid expressing the truncated copyof mb2007c against a vector only control No differencesin expression were found (data not shown) which couldindicate that the regulator does not control any othergenes apart from itself or that experimental conditionsdid not favour the active form of the regulator LysR reg-ulators regulate expression of their regulon throughbinding of a co-inducer to the C-terminal domain andthe failure to observe any changes in gene expressioncould therefore be due to the absence of the co-inducerduring the experiment A further experiment to comparethe profiles of 212297 expressing either the truncatedor wild type forms of the protein also showed no differ-ences in expression (data not shown)Nitrite reductase catalyses the reduction of nitrite to

ammonia and is strongly expressed during growth in thepresence of nitrate or nitrite but repressed in the pres-ence of ammonia [19] The gene encoding the large sub-unit of the nitrite reductase nirB (Mb0258) showsapproximately 9-fold higher expression in 130701 com-pared to the other 3 strains in our standard ammoniacontaining 7H9 growth media suggesting that the strainhas lost regulatory control of this gene Expression ofnirB in M tuberculosis has been shown to be controlledby the response regulator GlnR [20] but an analysis ofthe sequence of the glnR orthologue from all four strainsrevealed no differences in either the coding or promotersequences A comparison of the nirB sequence from all4 strains did however reveal the presence of a singlebase (C to T) transition leading to a sSNP that is specific

Golby et al BMC Genomics 2013 14710 Page 5 of 18httpwwwbiomedcentralcom1471-216414710

Table 3 Fold change differences in gene expression in M bovis field isolates 1121 2451 and 1307 compared to 2122

Mb CDS Mtb CDS Common 112101 245101 130701 Product Assoc SNPInDel

7h9 RPMI 4hr MOslash 24 hr MOslash 7h9 RPMI 4hr MOslash 24 hr MOslash 7h9 RPMI 4hr MOslash 24 hr MOslash

Mb0038c Rv0037c 4uarr 3uarr 3uarr probable conservedintegral membrane

protein

Mb0124c Rv0120c fusA2 3uarr 3uarr probable elongationfactor

Mb0258 Rv0252 nirB 9uarr 12uarr 21uarr 9uarr nitrite reductase(large subunit)

sSNP at postn 303227 in 130701only C-T

Mb0259 Rv0253 nirD 3uarr 4uarr 5uarr nitrite reductase(small subunit)

Mb0428c Rv0420c 2uarr possibletransmembrane

protein

Mb0734 Rv0713 2uarr probable conservedtransmembrane

protein

Mb0947c Rv0923c 2darr 2darr conservedhypotheitical protein

Mb0948c Rv0924c mntH cation uptake system

Mb1013Mb1014

Rv0987 5uarr 10uarr 5uarr 5uarr 7uarr 6uarr probable adhesioncomponent transport

ABC transporter

nSNP at postn 1104263 in 245101 and130701 A-G (STOP to W) sSNP at

postn 1103991 in 245101 and 130701

Mb1015 Rv0988 3uarr 9uarr 3uarr 5uarr 5uarr possible conservedexported protein

Mb1161 Rv1130 2darr conservedhypothetical protein

Mb1162 Rv1131 gltA1 3darr probable citratesynthase

Mb1554c Rv1527c pks5 2darr probable polyketidesynthase

Mb1562 Rv1535 2uarr 5uarr hypothetical protein

Mb1619c Rv1593c 2uarr conservedhypothetical protein

Mb1749c Rv1720c 10uarr 19uarr 18uarr 12uarr conservedhypothetical protein

nSNP at postn 1932704 in130701 only C-T (G to D)

Mb1750c Rv1721c 6uarr 3uarr 9uarr 5uarr conservedhypothetical protein

Mb1833c Rv1804 3darr conservedhypothetical protein

Mb1834c Rv1805c 4uarr hypothetical protein

Golby

etalBM

CGenom

ics201314710

Page6of

18httpw

wwbiom

edcentralcom1471-216414710

Table 3 Fold change differences in gene expression in M bovis field isolates 1121 2451 and 1307 compared to 2122 (Continued)

Mb1835 Rv1806 PE20 3uarr PE family protein

Mb1885c Rv1854c ndh 3darr probable nadhdehydrogenase

Mb1914c Rv1882c 6uarr 3uarr 6uarr probable short-chaintype dehydrogenase

reductase

sSNP at postn 2122970 in241501 only C-T

Mb2007c Rv1985c 4uarr 3uarr 3uarr probabletranscriptional

regulatory protein

nSNP at postn 2208296 in 130701only G-A (Q to Stop)

Mb2015c RV1992c ctpG 2darr 3darr probable cationtransporter

Mb2420c Rv2398c cysW 2darr 6darr sulphate transporter

Mb2421c Rv2399c cysT 2darr 4darr sulphate transporter

Mb2607 Rv2577 9darr 5darr 7darr 12darr 10darr 7darr 7darr 13darr conservedhypothetical protein

[first part]

Mb3194 Rv3169 3uarr 3uarr 2uarr 3uarr conservedhypothetical protein

Mb3477c Rv3447c 8darr 3darr 10darr 9darr 2darr 7darr probable conservedmembrane protein

nSNP at postn 3812465 in 245101and 130701 T-C (S to G)

Mb3563c Rv3533c PPE62 5uarr 5uarr ppe family protein

Mb3721c Rv3696c glpK 3darr glycerol kinase

Mb3803 Rv3774 echA21 8uarr 7uarr 22uarr possible enoyl-coAhydratase

sSNP at postn 4155803 in241501 only G-A

Up and down arrows indicate fold up- and downregulation respectively and empty cells indicate no change in expression

Golby

etalBM

CGenom

ics201314710

Page7of

18httpw

wwbiom

edcentralcom1471-216414710

to 130701 It was not readily apparent why a sSNP in thecoding sequence of a gene should lead to an increase inexpression of that gene but there are several reports thatshow sSNPs leading to changes in stability of mRNA tran-scripts [2122] Rv0987 and Rv0988 of M tuberculosisH37Rv encode part of an ABC transporter and a putativesecreted hydrolase respectively In 212297 a single basetransition (G-A) introduces a stop codon that splitsRv0987 into the two pseudogenes Mb1013 and Mb1014Previous microarray based gene expression studies by ourgroup have shown that Rv0987 and Rv0988 in M tubercu-losis show higher levels of expression than the orthologousMb1013Mb1014 and Mb1015 respectively in M bovis212297 [23] and in the present study the Mb1013Mb1014 and Mb1015 homologues in 245101 and 130701 also showed higher expression (up to 10-fold) than thehomologues in 212297 and 112101 Comparing thesequences of Mb1013Mb1014 and Mb1015 across all 4strains indicated that strains that show high expressionhave the lsquoGrsquo alleleMb3477c encodes an ATP binding membrane protein

part of the Esx4 secretion system [24] and gene shows upto 10-fold higher expression in 245101 and 130701 com-pared to 212297 The gene also contains an A to C transi-tion at position 3812465 a nSNP at position resulting inthe non-conservative substitution of a serine to a glycineresidue

Of the 19 genes that show specific differential expres-sion in the Mϕ the most notable are Mb1914c andechA21 which show upregulation in 245101 only (upto 6- and 23-fold respectively) Both genes encodeproteins that could be involved in lipid metabolism andboth genes contain single sSNPs that are present in245101 but absent in the other three strainsReal time RT-PCR was used to verify a selection of

genes that showed differential gene expression as pre-dicted by the microarray analysis Figure 2 comparesthe fold changes in the expression levels of 4 genes asmeasured by microarray and real time RT-PCR ThenirB and Mb1749c genes were selected because theyshowed strong upregulation in 130701 in both in vitroand ex vivo Mϕ while Mb1914c and echA21 were chosenbecause the array data predicted them to be specificallyupregulated in 245101 and only in ex vivo Mϕ Foreach of the 4 genes the strain dependent pattern ofexpression as measured by real time RT-PCR was consistentwith that measured by microarray although the foldchanges measured by real time RT-PCR were higherthan those measured by microarray

Functional analysis of SNP role in differential geneexpressionThe above data showed that many of the strain specificdifferences in gene expression were linked to the presence

Figure 2 Confirmation of amplicon microarray results with real time RT-PCR The fold changes in gene expression for (a) Mb1750c (b) nirB(c) echA21 and (d) Mb1914c measured by microarray (open bars) in each of the four strains were compared to that measured by real time RT-PCR(closed bars)

Golby et al BMC Genomics 2013 14710 Page 8 of 18httpwwwbiomedcentralcom1471-216414710

of synonymous or non-synonymous SNPs located withinthe coding regions of the genes that show variable expres-sion Non-synonymous SNPs lead to changes in aminoacid sequence which can lead to changes in protein func-tion The C to T transition at position 1932704 (wrt 212297) in the coding sequence of the 130701 Mb1749chomologue leads to the non-conservative substitution ofGly19 to Asp and this nSNP appears to be linked to theupregulation of both Mb1749c and Mb1750c in thatstrain In order to confirm this a 09 kb DNA fragmentcontaining the Mb1749c-Mb1750c-lsquoMB1751c region of130701 (containing the lsquoTrsquo allele) and the equivalentregion from 212297 (with the lsquoCrsquo allele) were PCRamplified and the fragments were cloned separatelyinto the mycobacterial shuttle vector pKINT (see Methods)to create the constructs pPG107 and pPG106 respectivelyThe two constructs were introduced into Mycobacteriumsmegmatis mc2155 separately and then the expression ofMb1749c and Mb1750c in M smegmatis pPG101 wascompared to that of M smegmatis pPG102 using real timeRT-PCR Table 4 shows that the expression levels ofMb1750c and Mb1749c in the strain expressing themutated forms of Mb1749cMb1750c are 13- and 9-foldhigher respectively compared to the strain expressingthe wild type forms confirming that this SNP is respon-sible for the observed up-regulation of the two genes in130701Synonymous substitutions do not lead to changes in

protein sequence and have generally been considered tobe lsquosilentrsquo or benign Recent studies however have sug-gested that sSNPs can have functional effects such asdecreased mRNA stability and translation [2122] Inour own studies we have found several genes whose ex-pression levels correlate with the presence of sSNPs inthe coding regions of those genes For example a C-Ttransversion at position 303227 (wrt 212297) withinthe coding sequence of nirB of 130701 is a sSNP thatappears to be linked with the upregulation in expressionof nirB within that strain To confirm that this is thecase we PCR amplified 35 kb DNA fragments containing

the hsp-nirB-nirD-cobU region of strain 130701 (containingthe lsquoCrsquo allele) and the equivalent region from strain 212297(with the lsquoTrsquo allele) and cloned them separately into theintegrating vector pKINT to create the constructspPG108 and pPG109 respectively These constructswere introduced into 212297 and the expression levelsof nirB and nirD were found to be 30- and 2-fold re-spectively higher in the strain expressing the mutatednirBD locus compared to the strain expressing the wild-type form (Table 4) This confirms that this mutation isresponsible for the upregulation of the two genes in thisstrain

Use of a high density tiled oligonucleotide microarray todetect differentially expressed small RNA transcripts inM bovis isolatesThe M tuberculosisM bovis amplicon arrays used inthe present study were specifically designed to measureexpression levels of genes annotated in the genomic se-quence of M bovis 212297 [10] They were not how-ever designed to monitor the expression of non-codingRNA such as small RNA within intergenic regions orantisense sRNA Hence a high density tiled oligonucleotidemicroarray consisting of approximately 180000 partially-overlapping (10-base overlap) short 60 mer oligonucleo-tides was designed that offered an unbiased approach tothe detection of strand specific transcripts encoded overthe entire M bovis 212297 chromosome Total RNAthat includes small sized (lt100 nt) RNA species wasextracted from the four M bovis strains that had beengrown in liquid media and hybridised to the oligo-nucleotide microarray To avoid potential secondarystrand synthesis during cDNA synthesis which couldbe interpreted as sRNA the RNA was directly labelledwith cyanine based dyes After pairwise comparisonswere performed between 212297 and 112101 245101or 130701 220 oligonucleotide probes were identifiedthat detected differentially expressed transcripts (25 foldcut off ) in one or more of the three strains (Additionalfile 4) Only transcripts detected by multiple (2 or more)overlapping probes were regarded as genuine transcriptsas those detected by single probes could be due to cross-hybridisation effects or represent spurious transcriptsUsing these criteria 26 transcripts designated T1-T26were found to show differential expression in one or moreof the strains (Table 5) and those transcripts can be di-vided into those that are encoded within intergenic re-gions and those encoded within the genomic co-ordinatesencompassing annotated coding sequences Comparisonof the differentially expressed gene lists identified usingamplicon vs oligonucleotide arrays (Tables 3 and 5) itis clear that many of the transcripts detected using theamplicon arrays are not necessarily encoded on thesense gene strand as had been previous interpreted For

Table 4 Confirmation of SNP linkage to upregulation ingene expression

Strain Gene Fold change

M smegmatis mc2155 (pPG101) Mb1750c 161 plusmn 32

M smegmatis mc2155 (pPG101) Mb1749c 125 plusmn 12

M smegmatis mc2155 (pPG102) Mb1750c 19 plusmn 08

M smegmatis mc2155 (pPG102) Mb1749c 14 plusmn 04

Mb 212297 (pPG108) nirB 20 plusmn 10

Mb 212297 (pPG108) nirD 13 plusmn 05

Mb 212297 (pPG109) nirB 529 plusmn 159

Mb 212297 (pPG109) nirD 31 plusmn 08

Fold changes are the mean ratios plusmn standard deviation

Golby et al BMC Genomics 2013 14710 Page 9 of 18httpwwwbiomedcentralcom1471-216414710

Table 5 Differential expression of RNA transcripts as detected by a tiled oligonucleotide microarray

Transcript No Probes Position Size cds Strand 112101 245101 130701 CDS Product SNP

T1 3 303138-303295 157 Mb0258nirB A 3uarr nitrite reductase (large sub-unit)

T2 5 303260-303515 255 Mb0258nirB S 5uarr nitrite reductase (large sub-unit) sSNP at 303227 C-T in 130701 only

T3 2 1105085-1105193 108 Mb1014 A 3uarr 3uarr probable adhesion componentof ABC transporter

T4 2 1105158-1105315 157 Mb1014 A 3uarr 3uarr probable adhesioncomponent of ABC transporter

T5 3 1105158-1105315 157 Mb1014 S 3uarr 3uarr probable adhesioncomponent of ABC transporter

T6 5 1778876-1779131 255 Mb1618c A 9uarr possible secreted lipase sSNP at 1778879 C-T in 112101 only

T7 2 1840174-1840282 108 Mb1672c S 4darr hypothetical protein

T8 3 1932158-1932315 157 tRNA-Pro S 3uarr proline tRNA

T9 5 1932452-1932707 255 Mb1749c S 4uarr toxin component oftoxin-antitoxin system

nSNP at 1932704 C-T (G to D) in 130701 only

T10 5 1932746-1933075 329 Mb1750c S 7uarr anti-toxin component oftoxin-antitoxin system

T11 6 2122959-2123263 304 Mb1914c A 12uarr probable short-chain typedehydrogenasereductase

sSNP at 2122970 C-T in 245101 only

T12 2 2324463-2324571 108 Mb2110 S 3uarr hypothetical protein(frame-shifted in 212297)

T13 2 2328944-2329052 108 Mb2116cpepE S 2darr 3darr probable dipeptidase

T14 2 2329483-2329591 108 Mb2117 A 10darr 10darr 5prime-3prime exonuclease nSNP at 2329583 A-G (comp strand I to T)in 245101 only

T15 3 2868310-2868467 157 Mb2607 A 6darr 8darr hp (frame-shifted in 212297)

T16 2 2868506-2868614 108 Mb2607 A 7darr 7darr hp (frame-shifted in 212297) nSNP at 2868616 A-G (stop to W) in245101 and 130701

T17 2 3075740-3075848 108 I 5uarr 6uarr

T18 2 3075887-3075995 108 I 5uarr 6uarr

T19 2 3076622-3076730 108 I 4uarr 4uarr

T20 2 3076769-3076877 108 I 5uarr 5uarr

T21 2 3078728-3078836 108 I 6uarr 7uarr

T22 2 3079169-3079277 108 I 4uarr 6uarr

T23 2 3079903-3080011 108 I 6uarr 7

T24 2 3844577-3844685 108 Mb3509c S 3uarr 3uarr possible acyltransferase(frame-shifted in 212297)

T25 6 4155501-4155805 304 Mb3803echA21 A 5uarr possible enoyl-coA hydratase sSNP at 4155803 C-T (comp strand) in 245101 only

T26 2 4316987-4317095 108 Mb3927c A 3uarr chp (frame-shifted in H37Rv)

Up and down arrows indicate fold up- and downregulation respectively and empty cells indicate no change in expressionStrand A indicates antisense S sense and I intergenic

Golby

etalBM

CGenom

ics201314710

Page10

of18

httpwwwbiom

edcentralcom1471-216414710

example the amplicon array data had appeared to suggestthat Mb1914c and echA21 were upregulated in 245101but the oligo array data indicates that transcripts 11 and25 which are encoded within the co-ordinates encoded bythose two genes are actually encoded on the antisensestrands This apparent discrepancy can be rationalisedonce we consider that double stranded amplicon micro-array probes cannot discriminate between transcriptsencoded on the sense or antisense strands Transcripts 11and 25 can therefore be considered as potential antisensesRNAs (asRNA) which could be involved in translationalor post-transcriptional control of the sense transcriptOther potential cis-encoded sRNAs detected using the ar-rays include T6 T14 and T15T16 which are encoded onthe antisense strands to Mb1618c Mb2117 and Mb2607respectively and for each of these transcripts their expres-sion appears to be linked to the presence of a single SNPwithin the co-ordinates of the genes The approximateboundaries of these transcripts can be derived from thegenomic co-ordinates of the oligonucleotide probes thatdetect the expression of the transcript Thus the tran-scripts appear to be between 100ndash300 nt in size and thepositions of the linked SNPs appear to be positioned eitherjust upstream or within the predicted 5primeend of the tran-scripts (Figure 3) Three of the transcripts (T11 T14 andT25) are antisense to the central part of the sense encodedgene while T6 is encoded antisense to the 5prime end ofMb1618c As well as antisense transcripts we also saw thedifferential expression of sense transcripts The ampliconmicroarray data (confirmed by real time RT-PCR) indi-cated that nirB is strongly upregulated specifically in130701 in both in vitro and ex-vivo Mϕ An analysis ofthe oligonucleotide array data however indicates thatthere are two short transcripts T1 and T2 (sense and anti-sense respectively) that are encoded within the genomicco-ordinates of the nirB gene T2 is the longer in size (255vs 155nt) and more highly expressed (5 vs 3-fold) thanT1 and both transcripts appear to be linked to the pres-ence of a SNP that is located within the middle of T1 andapproximately 50nt upstream of T2Some of the transcripts are bone fide gene sense

strand mRNA transcripts such as T9 and T10 whichare encoded by Mb1749c and Mb1750c respectivelyAlthough it would appear that the two genes are tran-scribed separately it is probable that the two transcriptsare co-transcribed as the stop codon of Mb1750c over-laps the start codon of Mb1749c Eight of the tran-scripts listed in Table 5 are encoded within intergenicregions 7 of which are encoded within the polymorphicdirect repeat (DR) locus The DR locus of strains belong-ing to the M tuberculosis complex has been suggested toconstitute a CRISPR locus which have been shown inmany species of bacteria to be involved in protectionagainst exogenous foreign DNA such as plasmids and

phage [25] All the DR encoded transcripts are short(approx 100 nt) straddle contiguous repeat and spacersequences and show approximately 5-fold higher levelsof expression in 245101 and 130701 compared with212297 and 112101

Characterisation of differentially expressed cis asRNAThe genomic co-ordinates of the oligonucleotide probesthat detected the antisense species described above canonly serve as approximate estimations as to their startand end points Thus we used 5prime RLM-RACE (RNALigase Mediated Rapid Amplification of cDNA Ends) inan attempt to accurately define the transcriptional startsites (TSS) for the short sense transcript T2 and the anti-sense transcripts T6 T11 and T25 described in the abovesection (see Methods) These transcripts were chosen astheir expression levels are high and their transcript lengthswere considered to be sufficiently long to enable theRLM-RACE methodology to work Table 6 details thesizes of the PCR products obtained after RLM-RACE wasperformed using oligonucleotide primers designed tosequences predicted for transcripts T6 T11 and T25 NoPCR product was obtained for transcript T2 For each ofthe three transcripts the TSS was determined to be a Gresidue which is the most commonly used residue typefor mycobacterial TSSrsquos (Figure 4) [26] For each of the T6T11 and T25 transcripts expression of the asRNAs waslinked to the presence of a SNP (C to T) proximal tothe 5prime end of the asRNA Strains exhibiting the lsquoCrsquo alleleshowed no expression of the asRNA whilst the strainthat showed expression had the lsquoTrsquo allele An analysis ofthe nucleotide sequence in the vicinity of the SNPsreveals that for each of the three transcripts the SNPconstitutes the 6th residue of a motif that has stronghomology to the consensus sequence for the minus10 elementof Group A mycobacterial promoters (Figure 4) [26] Thefinding that a lsquoTrsquo residue is associated with expression isconsistent with the consensus sequence which indicatesthat 86 of all minus10 elements have a lsquoTrsquo residue at the 6thresidue position Several residues that flank the minus10 motifalso show a degree of conservation Sequence motifswhich show strong homology to group A minus35 elementsare present 18ndash19 bp upstream of the putative minus10elements and the distances between the minus35 -10 andTSS elements are consistent with those elements ofthe consensus sequence No protein encoding openreading frames were detected within the T6 T11 andT25 transcriptsIn a parallel study high density oligonucleotide micro-

arrays were also used to interrogate the transcriptomesof M tuberculosis H37Rv M bovis BCG Pasteur Myco-bacterium caprae and M bovis AN5 that had beengrown in Middlebrook 7H9 media As a result of theseexperiments two asRNA species were found to be

Golby et al BMC Genomics 2013 14710 Page 11 of 18httpwwwbiomedcentralcom1471-216414710

expressed within the antisense strands of the ino1 andnarH genes of M tuberculosis H37Rv but not in any ofthe other 3 strains tested (data not shown) A compari-son of nucleotide sequences of the orthologous genesacross the species suggested that expressions of theas_sRNAs correlated with the presence of a sSNP (C to T

transition at positions 50555 and 1292100 wrt H37Rvgenomic sequence for as_ino1 and as_narH respectively)upstream of the asRNAs Approximate information re-garding the transcriptional start site was deduced from thebinding co-ordinates of the probes that detected the tran-scripts As with the M bovis antisense sRNAs described

Figure 3 Expressions and schematic representation of genomic locations of selected cis-encoded antisense sRNAs identified using atiled oligonucleotide microarray Three asRNAs (open arrows) are (a) T6 (b) T14 and (c) T25 For each asRNA a histogram plots the foldchanges for each of the oligonucleotide probes that detected the asRNA and for each probe the binding position relative to the 212297genome is indicated Closed and open arrows indicate lengths and direction of transcription of genes and asRNAs respectively

Golby et al BMC Genomics 2013 14710 Page 12 of 18httpwwwbiomedcentralcom1471-216414710

above the T residue associated with the expression ofthe M tuberculosis asRNAs is part of a putative minus10element A minus35 element with an appropriate spacing tothe minus10 element was identified for as_ino1 but not foras_narH suggesting that the as_narH promoter maybelong to group B mycobacterial promoters that havea conserved minus10 but no minus35 motif [26]

DiscussionsThe aim of this work was to define possible phenotypicvariation across M bovis field isolates through a combin-ation of genome sequencing comparative genomics andtranscriptome analyses from both in vitro and ex vivo con-ditions Using these approaches we uncovered a range ofnovel findings the most striking of which was the realisa-tion that genes that had been predicted to be differentiallyexpressed based on amplicon-microarray data were in fact

not upregulated and that instead it was an antisense tran-script that was showing differential expression Analysingboth transcriptome and genome sequence data allowed usto identify SNPs responsible for the transcription of anti-sense RNAs with generation of a consensus minus10 promotersequence the likely mechanism Our results suggest thatdata generated from amplicon arrays in the past may needto be revisited as it is possible that some coding-sequences identified as being differentially expressed wereinstead antisense transcriptsWith the growth of technologies such as high density

tiled oligonucleotide microarrays and next generationsequencing there has been a rapid increase in the num-ber of reports describing the existence of non-codingRNAs (ncRNAs) in bacteria Non-coding RNAs broadlyconsist of two types cis- and trans-encoded RNA TransRNA includes intergenic encoded RNA while cis-encodedRNA includes 5prime and 3prime untranslated regions of mRNAand antisense RNA To study the expressions of both cis-and trans encoded ncRNA we used a high density oligo-nucleotide tiled microarray since our amplicon microarraywas unable to detect intergenic transcripts or differentiatebetween sense and antisense transcripts Previous studiesusing M tuberculosis have identified substantial amountsof ncRNA encoded in both intergenic and intragenic re-gions [78] We detected substantial amounts of ncRNAs

Figure 4 Promoters of anti sense RNAs a Promoters of the asRNAs as_mb1618c as_1914c and as_echA21 -10 and minus35 elements areindicated in bold and italics Transcriptional start sites are indicated by large font G characters while SNP residue that leads to the expression ofthe asRNA is indicated by a large font red T residue The consensus sequence for group a mycobacterial promoters is indicated Numericalsubscripts indicate the percentage of the total number of promoters for which a transcriptional start site has been experimentally determinedthat show the indicated residue b Promoters of the differentially expressed asRNAs as_ino1 and as_narH in M tuberculosis -10 and minus35 elementsare indicated in bold italics The red residue indicates SNP responsible for differential expression

Table 6 Determination of transcriptional start sites ofantisense RNAs

Transcript asRNA PCRprod size

TSSresidue type

TSS residue postn(wrt 212297)

T6 mb1618c_as 150 G 1778886

T11 mb1914c_as 220 G 2122978

T25 echA21_as 175 G 4155796

Golby et al BMC Genomics 2013 14710 Page 13 of 18httpwwwbiomedcentralcom1471-216414710

in M bovis including many instances of cis-antisenseRNA species Due to their perfect complementarity cisasRNA form a duplex with the sense strand encodedtranscript resulting in either degradation [27] or transla-tion inhibition [28] of the sense mRNA AntisenseRNAs vary in length ranging from 10s to 1000primes ofnucleotides and can be classified according to theirencoded position with respect to the opposite senseencoded gene Thus they can be classified as 5prime or 3primeoverlapping while others are classified as internallylocated The 112101 specific as_Mb1618c is an exampleof a 5prime overlapping asRNA which is encoded antisense togene Mb1618c which is predicted to express a secretorylipase The location of the asRNA transcript suggests itmay function to prevent translation of Mb1618c mRNAby steric hindrance of the ribosome binding siteIn the work presented here strain 245101 expressed

two asRNAs as_Mb1914c and as_echA21 that are notexpressed by any of the other 3 strains They areencoded within the central part of the opposite genesand are therefore likely to modulate the stability of thetranscripts Mb1914c encodes a short chain dehydro-genase while echA21 encodes an enoyl-CoA hydrataseShort chain dehydrogenases catalyse a wide range offunctions so the precise function and identity of thesubstrate is difficult to deduce from sequence aloneEnoyl-CoA hydratases hydrate double carbon-carbonbonds of macromolecules and are vital in the metabol-ism of fatty acids Both gene products would thereforeappear to be involved in the metabolism of a macromol-ecule and their similar expression profiles in this straincould indicate involvement in the metabolism of the samemolecule or molecules that are of the same pathwayIn many instances upregulation of asRNA negatively

correlates with the transcription of the antisense gene[27] but in many cases expression of the antisense tran-script has no effect on the transcription of the oppositegene In our studies expressions of as_Mb1618cas_Mb1914c and as_echA21 did not appear to have anyeffect on the expressions of the opposite sense encodedgenes (data not shown) We have shown that theexpressions of the asRNAs are associated with the pres-ence of SNPs which are either synonymous or non-synonymous with respect to the sense transcript butupstream of the asRNA transcriptional start site Thishighlights the fact that mutations can potentially affectexpression of transcripts on both strands and that theclassification of a SNP is strand dependent For each ofthe three asRNAs the associated SNP was found to belocated within a putative minus10 promoter motif of groupA mycobacterial promoters The sixth residue of the minus10hexamer motif consensus sequence is a strongly conservedlsquoTrsquo residue which is present in 81 of all group A myco-bacterial promoter elements Its importance is underlined

by the finding that the strains that exhibit a lsquoCrsquo residue atthis position show no detectable expression of the asRNAwhile strains having a lsquoTrsquo residue at this position exhibitexpressionSingle nucleotide polymorphisms were found to be

the most frequent form of genetic variation that existsbetween the isolates with a total of 1013 SNPs detectedacross the three strains 112101 245101 and 130701compared to the reference strain M bovis 212297Non-synonymous SNPs which include both non-senseand missense SNPs are a class of SNPs most likely toimpact on protein function and contribute to phenotypicvariation Non-sense SNPs which results in the expressionof a truncated polypeptide due to the introduction of apremature stop codon were identified in five genes acrossthe strains Of these we focussed our attentions on thenon-sense SNP present in a gene encoding the LysR regu-lator Mb2007c as mutations affecting regulators are likelyto impact on the expression of one or more genes that arepart of the regulon of the regulator and are therefore morelikely to result in phenotypic variation Experiments tocompare the transcriptomes of a strain that exhibited themutation with a strain overexpressing a functional regula-tor did not however reveal any differences The reasonfor this unclear but could reflect a requirement of theregulator for a co-inducer that was absent under the con-ditions of the experiment The consequences of missenseSNPs are more difficult to predict as substitutions of oneamino acid for another in a protein sequence do not ne-cessarily lead to a change in protein function Howeverfor genes that are controlled by an autoregulatory mech-anism a mutation that affects the ability of the product ofthe gene to regulate itself will result in a change in expres-sion of the gene In our studies we have shown that thepresence of a missense mutation in a VapB type toxin en-coding gene Mb1749c in strain 130701 results in theupregulation in expression of the toxin-antitoxin encodingpair of genes Mb1750c-Mb1749c due to the inability ofthe encoded proteins to self regulate themselves Toxin-antitoxin systems have a variety of proposed cellular func-tions including general regulation of mRNA stability levelsin the cell [29] Further experiments are required to fullyunderstand the consequences of this mutationGenomic deletions have played an important role in

the evolution of strains belonging to the mycobacterialcomplex [30] and in the derivation of the tuberculosisvaccine strain M bovis BCG [31] In addition to the previ-ously described 68 kb gene deletion that is specific tostrains having a spoligotype 17 pattern [6] we have identi-fied a 16 kb multi-gene deletion that is specific to strain130701 and encompasses genes that are part of the DosRregulon However one of the deleted genes exists as apseudo gene in DosR in strain 212297 so its importanceto the biology of M bovis is unlikely to be significant

Golby et al BMC Genomics 2013 14710 Page 14 of 18httpwwwbiomedcentralcom1471-216414710

Several other genes with internal deletions were detectedbut none of the encoded proteins have any significantsimilarity to any protein with a defined function

ConclusionsIn conclusion we have performed a comprehensive ana-lysis of 4 M bovis strains of the most common moleculartypes circulating in GB We show that while these strainsshow extensive similarities in their genetic make-up andgene expression profiles they show distinct differences inthe expression of a subset of genes We provide functionaldata to show that SNPs can lead to the expression of anti-sense RNA a finding with implications for how we definea lsquosilentrsquo nucleotide change Furthermore we show thatthe interpretation of transcriptome data based solely onamplicon arrays could lead to artefacts due to expressionof antisense transcripts a caveat that needs to be kept inmind for previous studies of global expression analysis inbacteria

MethodsBacterial strains media and growth conditionsFor the Mϕ infection experiments bovine alveolar Mϕwere cultivated in tissue culture media R10 whichconsisted of RPMI (Invitrogen) media plus 2 mM glu-tamine 10 calf fetal serum and 1 amphotericinWhere used antibiotics gentamycin and ampicillinwere added at concentrations of 50 and 100 μg mlrespectively M bovis field strains were pre-grown inMiddlebrook 7H9 broth supplemented with 10 albumin-dextrose catalase (ADC Difco) 005 Tween and 10 mMpyruvate Cultures were harvested in mid-logarithmicphase (OD600 of 03-08) washed and then resuspended inRPMI containing 005 Tween 80

Isolation of bovine alveolar macrophages and infectionwith mycobacteriaThe lungs of a 6ndash8 week old male Holstein-Friesian calfwere removed and a whole lung lavage procedure wasperformed to washout the alveolar Mϕ Briefly 4ndash5 x500 ml aliquots of Hanksrsquo Balanced Sterile Salts solution(HBSS) were used to infuse the lungs via the tracheaand the washings were pooled in a sterile beaker The Mϕcells contained in the washes were pelleted by centrifuga-tion at 500 x g for 10 mins at 4degC washed and thenresuspended in R10 growth media supplemented withantibiotics (R10+) to a concentration of 1ndash2 x 107 mlApproximately 05-15 x 109 Mϕ were isolated per calflungVented 225 cm2 tissue culture flasks containing R10+

media were seeded with 3ndash4 x 107 alveolar Mϕ andplaced in a humidified 37degC incubator containing 5CO2 Typically 2ndash4 flasks were used per strain and timepoint After 24 hrs the growth media was decanted to

remove non-adherent cells and then replaced with freshR10+ media After a further 24 hrs the growth mediawas discarded and the Mϕ monolayer was washed withRPMI to remove traces of the antibiotic containinggrowth media The monolayer was then covered withR10 media without antibiotics (R10-) and then infectedwith mid-logarithmic phase grown mycobacteria usingan MOI of approximately 101 (bacilli Mϕ) The AlvMϕwere incubated with mycobacteria for 4 hrs after whichthe cell monolayers were washed with RPMI and theneither processed for RNA extraction (4 hr time point)or incubated in fresh R10+ media for a further 20 hrsbefore being processed for RNA extraction (24 hr timepoint)

Extraction and amplification of mycobacterial RNA frominfected macrophagesMϕ cell monolayers were lysed using a guanidiniumthiocyanate (GTC) containing solution The lysed Mϕrsquoswere vortexed and passed twice through a 21G bluntended needle to sheer host genomic DNA and therebyreduce the viscosity of the solution Mycobacterial cellswere then pelleted by centrifugation at 4600 rpm for 20mins at room temperature and washed with GTC solu-tion to remove host genomic DNA Cells were thenresuspended in Trizol and RNA was extracted using theprotocol outlined in Bacon et al [32] The amount ofpurified DNase-treated RNA recovered was of the order100ndash500 ng per time point RNA was amplified usingthe lsquoMessageAmp II-Bacteria RNA Amplification Kitrsquo(Ambion) according to the manufacturersrsquo instructionsUsing an input of 100 ng of unamplified RNA 20ndash100μg of amplified RNA was recovered

Amplicon microarray analysisFor the in vitro growth experiments three independentexperiments (biological replicates) were carried out andfor each strain in each experiment two microarrays (tech-nical replicates) were performed Thus for each strain 6microarrays were performed Three independent AlvMϕinfection experiments were carried out and for eachexperiment two microarrays were performed for each ofthe control RPMI samples and the 4- and 24 hrs post-infection samples Cy5 and Cy3 fluorescently-labelledprobes were synthesised from RNA and genomicDNA respectively and hybridised to whole genomeM bovis M tuberculosis microarrays The array designis available in BμGSbase (accession number A-BUGS-31httpbugssgulacukA-BUGS-31) and also ArrayExpress(accession number A-BUGS-31) Details of probe synthesishybridization conditions and manufacture of the micro-array can be found in Golby et al 2008 Microarrayswere scanned using an Affymetrix 428 Microarrayscanner and scanned images were quantified using

Golby et al BMC Genomics 2013 14710 Page 15 of 18httpwwwbiomedcentralcom1471-216414710

BlueFuse for Microarrays v32 software (BlueGnome)See Golby et al 2008 for further detailsNormalisation was performed by dividing the log ratio

of the Cy5 to Cy3 signal for every spot by the median ofthe log ratios for all spots except control spots A medianabsolute (MAD) scale transformation was applied to thenormalised data from the pas an additional normalisationstep For every microarray duplicate spots were averagedand then the average expression of every gene across alltechnical replicate microarrays was calculated Averagesof the three biological replicates were used to comparegene expression between strains For each gene a mod-erated t-test was applied and those genes with a P- valueless than 005 were selected From this gene list thosegenes whose average expression differed by more than25-fold between strains were selected Fully annotatedmicroarray data have been deposited in BμGSbase (acces-sion number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (accession number E-BUGS-150)

Oligonucleotide microarray analysisExperiments were performed in a similar way to thatdescribed for the in vitro amplicon array experimentsexcept that the RNA was purified using the mirVanamiRNA Isolation kit (Ambion) which is designed tocapture small (gt20 nt) RNA species RNA and genomicDNA were directly labelled with Cy5- and Cy3 respect-ively using the ULS microRNA labelling kit (Kreatech)according to the manufacturerrsquos instructions PurifiedCy5- and Cy3-labelled probes were co-purified andapplied to an Agilent 40K custom made tiled (10 ntoverlap) 60-mer oligonucleotide microarray designed tothe genomic sequence of M bovis 21229797 The arraydesign is available in BμGSbase (accession numberA-BUGS-52 httpbugssgulacukA-BUGS-52) and alsoArrayExpress (accession number A-BUGS-52) Microarrayswere hybridised at 65degC for 18 hrs and then washed inWash Buffer 1 (Agilent) at room temperature for 1 minuteThe slides were then washed in Wash Buffer 2 (Agilent) at37degC for 1 minute dried and then scanned at 2 μm usingan Agilent DNA microarray scannerTiling array data was analysed using the Limma package

of RBioconductor [33] The signal median was quan-tile normalised between arrays followed by a LOESSnormalisation within arrays Differential expressionanalysis was performed by pairwise comparison usinglinear models and empirical Bayes methods and Pvalues adjusted using the Benjamini and Hochbergsmethod to control for multiple testing Fully annotatedmicroarray data have been deposited in BμGSbase(accession number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (accession num-ber E-BUGS-150)

Whole genome sequencing of M bovis field isolatesWhole genome paired end (2 x 76bp) sequencing wasperformed using Illumina HiSeq machines at theWellcome Trust Sanger Institute (Hinxton Cambridge)Raw sequence data was uploaded to the European Nucleo-tide Archive (ENA) and can be downloaded at httpwwwebiacukena accession numbers ERX006616 ERX06617ERX012284 and ERX012286 The FastQC (httpwwwbioinformaticsbabrahamacukprojectsfastqc) programwas used to analyse the quality of the raw data reads Rawsequence data was trimmed to remove adapter sequencesand nucleotides where the sequence quality score wasbelow 20 Filtered reads were aligned to the M bovis refer-ence strain 212297 [10] with the SMALT alignmentprogram (httpwwwsangeracukresourcessoftwaresmalt)using default settings The average mean coverages for thesequenced strains 112101 245101 and 130701 were293 224 and 234 respectively Reads that did not maponto the reference genome were de novo assembled andblasted against the NCBI data base in order to find regionspresent in these strains but absent in 212201 Variantcalling and the generation of consensus sequenceswere carried out using the SAMtools program suite(httpsamtoolssourceforgenet) SNPs that had a mini-mum of quality score of 200 and had a minimum of fourgood quality forward (reverse) reads covering the SNP siteand having the variant base were retained The number offorward (reverse) reads mapping with good quality ontothe SNP site having the same base as in the reference hadto be less than 5 of the total number of forward (reverse)reads mapping with good quality onto the SNP site havingthe variant base Sequences of 12 bases in length centredaround SNP sites were blasted to the genome of the refer-ence strain 212297 and any SNP within an area with a90 hit to another area of the genome were filtered outThis eliminated most of the SNPs within the repetitivePE-PGRS regions For all sequenced strains more than999 of the genomes were covered by reads

Real time RT-PCRQuantitative real-time SYBR Green based PCR (qRT-PCR)experiments were performed using a RotorGene 3000(Corbett research) as described by Golby et al [23] Foldchanges were calculated using relative standard curvemethod and pcr controls included no template and noreverse transcriptase Primer pair sequences are given inAdditional file 5 available with the online version ofthis paper

Construction of the Rv1749c-Rv1750c and nirBDoverexpressing plasmidsThe Rv1749c-Rv1750c overexpressing plasmids pPG106and pPG107 were constructed by PCR amplification of a907bp fragment encompassing Rv1749c-Rv1750c-lsquoRv1751c

Golby et al BMC Genomics 2013 14710 Page 16 of 18httpwwwbiomedcentralcom1471-216414710

using primers tox_f and tox_r For pPG106 the fragmentwas amplified using 21229797 genomic DNA as a tem-plate while pPG107 was amplified using 130701 gDNABoth fragments were digested with SpeI and cloned intothe SpeI cut mycobacterial attP-integrating shuttle vectorpKINT (a gift from Douglas Young Imperial CollegeLondon) Plasmids pPG108 and pPG109 which contain a35 kb hsprsquo-nirB-nirD-lsquocobU fragment was constructed inseveral steps Firstly two 16 kb hsprsquo-nirBrsquo PCR fragmentswere PCR amplified separately using primers nirB1_f andnirB1_r and 212297 and 130701 as genomic templatesSimilarly two 18kb lsquonirB-nirD-lsquocobU fragments wereamplified using primers nirB2_f and nirB2_r and genomicDNAs 212297 and 130701 The two PCR products weredigested with SpeI and BamHI and then co-ligated intopKINT Details concerning the nucleotide sequences ofthe pcr primer pairs are given in the Additional file 5

5prime-RLM-RACE PCRTranscriptional start site mapping of was determinedusing the First Choice RNA ligase-mediated rapid amp-lification of cDNA ends (RLM-RACE) kit (Ambion) asper manufacturers instructions Briefly 10 ug of totalRNA was treated with calf intestinal phosphatase (CIP)and tobacco acid pyrophosphatase (TAP) before ligationof an RNA Adapter oligonucleotide to the 5prime ends ofthe mRNA transcripts A random-primed reverse tran-scription reaction was carried out to generate cDNAand then nested PCR reactions were performed on thecDNA using combinations of adapter and gene specificprimers Details concerning the nucleotide sequencesof the 5prime outer and inner adapter sequences as well as3prime outer and inner gene specific primers are given inthe Additional file 5 PCR products generated usingthe 5prime inner adapter and 3prime inner gene specific primerswere sequenced by Sanger sequencing using the 3prime innergene specific primer

Animal ethics statementAnimal work was carried out according to the UK Animal(Scientific Procedures) Act 1986 The study protocol wasapproved by the AHVLA Animal Use Ethics Committee(UK Home Office PCD number 706905)

Availability of supporting data sectionFully annotated microarray data have been deposited inBμGSbase (accession number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (ac-cession number E-BUGS-150) Raw sequence data wasuploaded to the European Nucleotide Archive (ENA) andcan be downloaded at httpwwwebiacukenadataviewERX006616-ERX06617ERX012284-ERX012286

Additional files

Additional file 1 SNPs identified across sequenced strains

Additional file 2 Large deletions identified by NGS in sequencedstrains

Additional file 3 Fold change differences in gene expression inM bovis field isolates 1121 2451 and 1307 compared to 2122Cells shaded in red indicate upregulation green indicate down-regulation and empty cells indicate no change in expression

Additional file 4 Differential expression of RNA transcripts asdetected by a tiled oligonucleotide microarray Up and down arrowsindicate fold up- and downregulation respectively and empty cellsindicate no change in expression Strand A indicates antisense S senseand I intergenic

Additional file 5 Details of oligonucleotides used in PCR RT-PCRand RLM-RACE experiments

Competing interestsThe authors declare that they have no competing interests

Authorsrsquo contributionsPG carried out the culturing of organisms macrophage infectionsmicroarrays qRT-PCR cloning analysis of sequence data and writing of themanuscript JN performed analysis of the Illumina sequence and PCRamplicon microarray data AW and JH performed the analysis of the Agilentoligonucleotide microarray data and assisted in the uploading of themicroarray data into the BμGSbase database MAQ and SB wereresponsible for the Illumina sequencing SH contributed to the analysis ofthe Illumina sequence data NS contributed to the writing and critical reviewof the manuscript RGH was a co-investigator who helped to guide theproject SG was the principle investigator on the project who guided theresearch and helped in the writing of the manuscript All authors read andapproved the final manuscript

AcknowledgementsWe would like to acknowledge Kate Gould for her help in carrying out theoligonucleotide microarrays and Denise Waldron for her assistance inuploading of the microarray data into BμGSbase We also thank MarkChambers for comments on the manuscript and BBSRC and Defra forfunding the work (Grant no BBE0184911)

Author details1Animal Health and Veterinary Laboratories Agency Woodham Lane NewHaw Addlestone Surrey KT15 3NB UK 2Division of Clinical Sciences BacterialMicroarray Group Centre for Infection amp Immunity St Georgersquos University ofLondon Cranmer Terrace London SW17 0RE UK 3The Wellcome TrustSanger Institute Hinxton Cambridgeshire UK 4UCD School of VeterinaryMedicine and UCD Conway Institute University College Dublin Dublin 4Ireland

Received 4 June 2013 Accepted 26 September 2013Published 17 October 2013

References1 Monaghan ML Doherty ML Collins JD Kazda JF Quinn PJ The tuberculin

test Vet Microbiol 1994 40(1ndash2)111ndash1242 Kamerbeek J Schouls L Kolk A van Agterveld M van Soolingen D Kuijper

S Bunschoten A Molhuizen H Shaw R Goyal M et al Simultaneousdetection and strain differentiation of Mycobacterium tuberculosis fordiagnosis and epidemiology J Clin Microbiol 1997 35(4)907ndash914

3 Frothingham R Meeker-OrsquoConnell WA Genetic diversity in theMycobacterium tuberculosis complex based on variable numbers oftandem DNA repeats Microbiology 1998 144(Pt 5)1189ndash1196

4 Smith NH Dale J Inwald J Palmer S Gordon SV Hewinson RG Smith JMThe population structure of Mycobacterium bovis in Great Britain clonalexpansion Proc Natl Acad Sci USA 2003 100(25)15271ndash15275

5 Winder CL Gordon SV Dale J Hewinson RG Goodacre R Metabolicfingerprints of Mycobacterium bovis cluster with molecular type

Golby et al BMC Genomics 2013 14710 Page 17 of 18httpwwwbiomedcentralcom1471-216414710

implications for genotype-phenotype links Microbiology 2006152(Pt 9)2757ndash2765

6 Wheeler PR Brosch R Coldham NG Inwald JK Hewinson RG Gordon SVFunctional analysis of a clonal deletion in an epidemic strain ofMycobacterium bovis reveals a role in lipid metabolism Microbiology2008 154(Pt 12)3731ndash3742

7 Arnvig KB Young DB Identification of small RNAs in Mycobacteriumtuberculosis Mol Microbiol 2009 73(3)397ndash408

8 Arnvig KB Comas I Thomson NR Houghton J Boshoff HI Croucher NJRose G Perkins TT Parkhill J Dougan G et al Sequence-based analysisuncovers an abundance of non-coding RNA in the total transcriptome ofMycobacterium tuberculosis PLoS Pathog 2011 7(11)e1002342

9 Smith NH Upton P Naming spoligotype patterns for the RD9-deletedlineage of the Mycobacterium tuberculosis complex www MbovisorgInfect Genet Evol 2012 12(4)873ndash876

10 Garnier T Eiglmeier K Camus JC Medina N Mansoor H Pryor M Duthoy SGrondin S Lacroix C Monsempe C et al The complete genome sequenceof Mycobacterium bovis Proc Natl Acad Sci USA 2003 100(13)7877ndash7882

11 Albers CA Lunter G MacArthur DG McVean G Ouwehand WH Durbin RDindel accurate indel calls from short-read data Genome Res 201121(6)961ndash973

12 Mostowy S Inwald J Gordon S Martin C Warren R Kremer K Cousins DBehr MA Revisiting the evolution of Mycobacterium bovis J Bacteriol2005 187(18)6386ndash6395

13 Boon C Dick T Mycobacterium bovis BCG response regulator essentialfor hypoxic dormancy J Bacteriol 2002 184(24)6760ndash6767

14 Van Gelder RN von Zastrow ME Yool A Dement WC Barchas JD EberwineJH Amplified RNA synthesized from limited quantities of heterogeneouscDNA Proc Natl Acad Sci USA 1990 87(5)1663ndash1667

15 Pandey DP Gerdes K Toxin-antitoxin loci are highly abundant in free-living but lost from host-associated prokaryotes Nucleic Acids Res 200533(3)966ndash976

16 Arcus VL McKenzie JL Robson J Cook GM The PIN-domain ribonucleasesand the prokaryotic VapBC toxin-antitoxin array Protein Eng Des Sel 201124(1ndash2)33ndash40

17 Bukowski M Rojowska A Wladyka B Prokaryotic toxin-antitoxin systemsndashthe role in bacterial physiology and application in molecular biologyActa Biochim Pol 2011 58(1)1ndash9

18 Maddocks SE Oyston PC Structure and function of the LysR-typetranscriptional regulator (LTTR) family proteins Microbiology 2008154(Pt 12)3609ndash3623

19 Moreno-Vivian C Cabello P Martinez-Luque M Blasco R Castillo FProkaryotic nitrate reduction molecular properties and functionaldistinction among bacterial nitrate reductases J Bacteriol 1999 181(21)6573ndash6584

20 Tiffert Y Supra P Wurm R Wohlleben W Wagner R Reuther J TheStreptomyces coelicolor GlnR regulon identification of new GlnR targetsand evidence for a central role of GlnR in nitrogen metabolism inactinomycetes Mol Microbiol 2008 67(4)861ndash880

21 Duan J Wainwright MS Comeron JM Saitou N Sanders AR Gelernter JGejman PV Synonymous mutations in the human dopamine receptor D2(DRD2) affect mRNA stability and synthesis of the receptor Hum MolGenet 2003 12(3)205ndash216

22 Capon F Allen MH Ameen M Burden AD Tillman D Barker JN TrembathRC A synonymous SNP of the corneodesmosin gene leads to increasedmRNA stability and demonstrates association with psoriasis acrossdiverse ethnic groups Hum Mol Genet 2004 13(20)2361ndash2368

23 Golby P Hatch KA Bacon J Cooney R Riley P Allnutt J Hinds J Nunez JMarsh PD Hewinson RG et al Comparative transcriptomics reveals keygene expression differences between the human and bovine pathogensof the Mycobacterium tuberculosis complex Microbiology 2007153(Pt 10)3323ndash3336

24 Bitter W Houben EN Bottai D Brodin P Brown EJ Cox JS Derbyshire KFortune SM Gao LY Liu J et al Systematic genetic nomenclature for typeVII secretion systems PLoS Pathog 2009 5(10)e1000507

25 Sorek R Kunin V Hugenholtz P CRISPRndasha widespread system thatprovides acquired resistance against phages in bacteria and archaeaNat Rev Microbiol 2008 6(3)181ndash186

26 Gomez M Smith I Determinants of Mycobacterial Gene Expression InMolecular Genetics of Mycobacteria Washington DC ASM Press2000111ndash129

27 Duhring U Axmann IM Hess WR Wilde A An internal antisense RNAregulates expression of the photosynthesis gene isiA Proc Natl Acad SciUSA 2006 103(18)7054ndash7058

28 Kawano M Aravind L Storz G An antisense RNA controls synthesis of anSOS-induced toxin evolved from an antitoxin Mol Microbiol 200764(3)738ndash754

29 Thomason MK Storz G Bacterial antisense RNAs how many are thereand what are they doing Annu Rev Genet 2010 44167ndash188

30 Brosch R Pym AS Gordon SV Cole ST The evolution of mycobacterialpathogenicity clues from comparative genomics Trends Microbiol 20019(9)452ndash458

31 Calmette A Guerin C Sur quelques proprietes du bacille tuberculeuxdrsquoorigine cultive sur la bile de boeuf glycerinee C R Acad Sci Paris 1909149716ndash718

32 Bacon J James BW Wernisch L Williams A Morley KA Hatch GJ ManganJA Hinds J Stoker NG Butcher PD et al The influence of reduced oxygenavailability on pathogenicity and gene expression in Mycobacteriumtuberculosis Tuberculosis (Edinb) 2004 84(3ndash4)205ndash217

33 Smyth GK Linear models and empirical bayes methods for assessingdifferential expression in microarray experiments Stat Appl Genet Mol Biol2004 33

doi1011861471-2164-14-710Cite this article as Golby et al Genome-level analyses ofMycobacterium bovis lineages reveal the role of SNPs and antisensetranscription in differential gene expression BMC Genomics 2013 14710

Submit your next manuscript to BioMed Centraland take full advantage of

bull Convenient online submission

bull Thorough peer review

bull No space constraints or color figure charges

bull Immediate publication on acceptance

bull Inclusion in PubMed CAS Scopus and Google Scholar

bull Research which is freely available for redistribution

Submit your manuscript at wwwbiomedcentralcomsubmit

Golby et al BMC Genomics 2013 14710 Page 18 of 18httpwwwbiomedcentralcom1471-216414710

Table 3 Fold change differences in gene expression in M bovis field isolates 1121 2451 and 1307 compared to 2122

Mb CDS Mtb CDS Common 112101 245101 130701 Product Assoc SNPInDel

7h9 RPMI 4hr MOslash 24 hr MOslash 7h9 RPMI 4hr MOslash 24 hr MOslash 7h9 RPMI 4hr MOslash 24 hr MOslash

Mb0038c Rv0037c 4uarr 3uarr 3uarr probable conservedintegral membrane

protein

Mb0124c Rv0120c fusA2 3uarr 3uarr probable elongationfactor

Mb0258 Rv0252 nirB 9uarr 12uarr 21uarr 9uarr nitrite reductase(large subunit)

sSNP at postn 303227 in 130701only C-T

Mb0259 Rv0253 nirD 3uarr 4uarr 5uarr nitrite reductase(small subunit)

Mb0428c Rv0420c 2uarr possibletransmembrane

protein

Mb0734 Rv0713 2uarr probable conservedtransmembrane

protein

Mb0947c Rv0923c 2darr 2darr conservedhypotheitical protein

Mb0948c Rv0924c mntH cation uptake system

Mb1013Mb1014

Rv0987 5uarr 10uarr 5uarr 5uarr 7uarr 6uarr probable adhesioncomponent transport

ABC transporter

nSNP at postn 1104263 in 245101 and130701 A-G (STOP to W) sSNP at

postn 1103991 in 245101 and 130701

Mb1015 Rv0988 3uarr 9uarr 3uarr 5uarr 5uarr possible conservedexported protein

Mb1161 Rv1130 2darr conservedhypothetical protein

Mb1162 Rv1131 gltA1 3darr probable citratesynthase

Mb1554c Rv1527c pks5 2darr probable polyketidesynthase

Mb1562 Rv1535 2uarr 5uarr hypothetical protein

Mb1619c Rv1593c 2uarr conservedhypothetical protein

Mb1749c Rv1720c 10uarr 19uarr 18uarr 12uarr conservedhypothetical protein

nSNP at postn 1932704 in130701 only C-T (G to D)

Mb1750c Rv1721c 6uarr 3uarr 9uarr 5uarr conservedhypothetical protein

Mb1833c Rv1804 3darr conservedhypothetical protein

Mb1834c Rv1805c 4uarr hypothetical protein

Golby

etalBM

CGenom

ics201314710

Page6of

18httpw

wwbiom

edcentralcom1471-216414710

Table 3 Fold change differences in gene expression in M bovis field isolates 1121 2451 and 1307 compared to 2122 (Continued)

Mb1835 Rv1806 PE20 3uarr PE family protein

Mb1885c Rv1854c ndh 3darr probable nadhdehydrogenase

Mb1914c Rv1882c 6uarr 3uarr 6uarr probable short-chaintype dehydrogenase

reductase

sSNP at postn 2122970 in241501 only C-T

Mb2007c Rv1985c 4uarr 3uarr 3uarr probabletranscriptional

regulatory protein

nSNP at postn 2208296 in 130701only G-A (Q to Stop)

Mb2015c RV1992c ctpG 2darr 3darr probable cationtransporter

Mb2420c Rv2398c cysW 2darr 6darr sulphate transporter

Mb2421c Rv2399c cysT 2darr 4darr sulphate transporter

Mb2607 Rv2577 9darr 5darr 7darr 12darr 10darr 7darr 7darr 13darr conservedhypothetical protein

[first part]

Mb3194 Rv3169 3uarr 3uarr 2uarr 3uarr conservedhypothetical protein

Mb3477c Rv3447c 8darr 3darr 10darr 9darr 2darr 7darr probable conservedmembrane protein

nSNP at postn 3812465 in 245101and 130701 T-C (S to G)

Mb3563c Rv3533c PPE62 5uarr 5uarr ppe family protein

Mb3721c Rv3696c glpK 3darr glycerol kinase

Mb3803 Rv3774 echA21 8uarr 7uarr 22uarr possible enoyl-coAhydratase

sSNP at postn 4155803 in241501 only G-A

Up and down arrows indicate fold up- and downregulation respectively and empty cells indicate no change in expression

Golby

etalBM

CGenom

ics201314710

Page7of

18httpw

wwbiom

edcentralcom1471-216414710

to 130701 It was not readily apparent why a sSNP in thecoding sequence of a gene should lead to an increase inexpression of that gene but there are several reports thatshow sSNPs leading to changes in stability of mRNA tran-scripts [2122] Rv0987 and Rv0988 of M tuberculosisH37Rv encode part of an ABC transporter and a putativesecreted hydrolase respectively In 212297 a single basetransition (G-A) introduces a stop codon that splitsRv0987 into the two pseudogenes Mb1013 and Mb1014Previous microarray based gene expression studies by ourgroup have shown that Rv0987 and Rv0988 in M tubercu-losis show higher levels of expression than the orthologousMb1013Mb1014 and Mb1015 respectively in M bovis212297 [23] and in the present study the Mb1013Mb1014 and Mb1015 homologues in 245101 and 130701 also showed higher expression (up to 10-fold) than thehomologues in 212297 and 112101 Comparing thesequences of Mb1013Mb1014 and Mb1015 across all 4strains indicated that strains that show high expressionhave the lsquoGrsquo alleleMb3477c encodes an ATP binding membrane protein

part of the Esx4 secretion system [24] and gene shows upto 10-fold higher expression in 245101 and 130701 com-pared to 212297 The gene also contains an A to C transi-tion at position 3812465 a nSNP at position resulting inthe non-conservative substitution of a serine to a glycineresidue

Of the 19 genes that show specific differential expres-sion in the Mϕ the most notable are Mb1914c andechA21 which show upregulation in 245101 only (upto 6- and 23-fold respectively) Both genes encodeproteins that could be involved in lipid metabolism andboth genes contain single sSNPs that are present in245101 but absent in the other three strainsReal time RT-PCR was used to verify a selection of

genes that showed differential gene expression as pre-dicted by the microarray analysis Figure 2 comparesthe fold changes in the expression levels of 4 genes asmeasured by microarray and real time RT-PCR ThenirB and Mb1749c genes were selected because theyshowed strong upregulation in 130701 in both in vitroand ex vivo Mϕ while Mb1914c and echA21 were chosenbecause the array data predicted them to be specificallyupregulated in 245101 and only in ex vivo Mϕ Foreach of the 4 genes the strain dependent pattern ofexpression as measured by real time RT-PCR was consistentwith that measured by microarray although the foldchanges measured by real time RT-PCR were higherthan those measured by microarray

Functional analysis of SNP role in differential geneexpressionThe above data showed that many of the strain specificdifferences in gene expression were linked to the presence

Figure 2 Confirmation of amplicon microarray results with real time RT-PCR The fold changes in gene expression for (a) Mb1750c (b) nirB(c) echA21 and (d) Mb1914c measured by microarray (open bars) in each of the four strains were compared to that measured by real time RT-PCR(closed bars)

Golby et al BMC Genomics 2013 14710 Page 8 of 18httpwwwbiomedcentralcom1471-216414710

of synonymous or non-synonymous SNPs located withinthe coding regions of the genes that show variable expres-sion Non-synonymous SNPs lead to changes in aminoacid sequence which can lead to changes in protein func-tion The C to T transition at position 1932704 (wrt 212297) in the coding sequence of the 130701 Mb1749chomologue leads to the non-conservative substitution ofGly19 to Asp and this nSNP appears to be linked to theupregulation of both Mb1749c and Mb1750c in thatstrain In order to confirm this a 09 kb DNA fragmentcontaining the Mb1749c-Mb1750c-lsquoMB1751c region of130701 (containing the lsquoTrsquo allele) and the equivalentregion from 212297 (with the lsquoCrsquo allele) were PCRamplified and the fragments were cloned separatelyinto the mycobacterial shuttle vector pKINT (see Methods)to create the constructs pPG107 and pPG106 respectivelyThe two constructs were introduced into Mycobacteriumsmegmatis mc2155 separately and then the expression ofMb1749c and Mb1750c in M smegmatis pPG101 wascompared to that of M smegmatis pPG102 using real timeRT-PCR Table 4 shows that the expression levels ofMb1750c and Mb1749c in the strain expressing themutated forms of Mb1749cMb1750c are 13- and 9-foldhigher respectively compared to the strain expressingthe wild type forms confirming that this SNP is respon-sible for the observed up-regulation of the two genes in130701Synonymous substitutions do not lead to changes in

protein sequence and have generally been considered tobe lsquosilentrsquo or benign Recent studies however have sug-gested that sSNPs can have functional effects such asdecreased mRNA stability and translation [2122] Inour own studies we have found several genes whose ex-pression levels correlate with the presence of sSNPs inthe coding regions of those genes For example a C-Ttransversion at position 303227 (wrt 212297) withinthe coding sequence of nirB of 130701 is a sSNP thatappears to be linked with the upregulation in expressionof nirB within that strain To confirm that this is thecase we PCR amplified 35 kb DNA fragments containing

the hsp-nirB-nirD-cobU region of strain 130701 (containingthe lsquoCrsquo allele) and the equivalent region from strain 212297(with the lsquoTrsquo allele) and cloned them separately into theintegrating vector pKINT to create the constructspPG108 and pPG109 respectively These constructswere introduced into 212297 and the expression levelsof nirB and nirD were found to be 30- and 2-fold re-spectively higher in the strain expressing the mutatednirBD locus compared to the strain expressing the wild-type form (Table 4) This confirms that this mutation isresponsible for the upregulation of the two genes in thisstrain

Use of a high density tiled oligonucleotide microarray todetect differentially expressed small RNA transcripts inM bovis isolatesThe M tuberculosisM bovis amplicon arrays used inthe present study were specifically designed to measureexpression levels of genes annotated in the genomic se-quence of M bovis 212297 [10] They were not how-ever designed to monitor the expression of non-codingRNA such as small RNA within intergenic regions orantisense sRNA Hence a high density tiled oligonucleotidemicroarray consisting of approximately 180000 partially-overlapping (10-base overlap) short 60 mer oligonucleo-tides was designed that offered an unbiased approach tothe detection of strand specific transcripts encoded overthe entire M bovis 212297 chromosome Total RNAthat includes small sized (lt100 nt) RNA species wasextracted from the four M bovis strains that had beengrown in liquid media and hybridised to the oligo-nucleotide microarray To avoid potential secondarystrand synthesis during cDNA synthesis which couldbe interpreted as sRNA the RNA was directly labelledwith cyanine based dyes After pairwise comparisonswere performed between 212297 and 112101 245101or 130701 220 oligonucleotide probes were identifiedthat detected differentially expressed transcripts (25 foldcut off ) in one or more of the three strains (Additionalfile 4) Only transcripts detected by multiple (2 or more)overlapping probes were regarded as genuine transcriptsas those detected by single probes could be due to cross-hybridisation effects or represent spurious transcriptsUsing these criteria 26 transcripts designated T1-T26were found to show differential expression in one or moreof the strains (Table 5) and those transcripts can be di-vided into those that are encoded within intergenic re-gions and those encoded within the genomic co-ordinatesencompassing annotated coding sequences Comparisonof the differentially expressed gene lists identified usingamplicon vs oligonucleotide arrays (Tables 3 and 5) itis clear that many of the transcripts detected using theamplicon arrays are not necessarily encoded on thesense gene strand as had been previous interpreted For

Table 4 Confirmation of SNP linkage to upregulation ingene expression

Strain Gene Fold change

M smegmatis mc2155 (pPG101) Mb1750c 161 plusmn 32

M smegmatis mc2155 (pPG101) Mb1749c 125 plusmn 12

M smegmatis mc2155 (pPG102) Mb1750c 19 plusmn 08

M smegmatis mc2155 (pPG102) Mb1749c 14 plusmn 04

Mb 212297 (pPG108) nirB 20 plusmn 10

Mb 212297 (pPG108) nirD 13 plusmn 05

Mb 212297 (pPG109) nirB 529 plusmn 159

Mb 212297 (pPG109) nirD 31 plusmn 08

Fold changes are the mean ratios plusmn standard deviation

Golby et al BMC Genomics 2013 14710 Page 9 of 18httpwwwbiomedcentralcom1471-216414710

Table 5 Differential expression of RNA transcripts as detected by a tiled oligonucleotide microarray

Transcript No Probes Position Size cds Strand 112101 245101 130701 CDS Product SNP

T1 3 303138-303295 157 Mb0258nirB A 3uarr nitrite reductase (large sub-unit)

T2 5 303260-303515 255 Mb0258nirB S 5uarr nitrite reductase (large sub-unit) sSNP at 303227 C-T in 130701 only

T3 2 1105085-1105193 108 Mb1014 A 3uarr 3uarr probable adhesion componentof ABC transporter

T4 2 1105158-1105315 157 Mb1014 A 3uarr 3uarr probable adhesioncomponent of ABC transporter

T5 3 1105158-1105315 157 Mb1014 S 3uarr 3uarr probable adhesioncomponent of ABC transporter

T6 5 1778876-1779131 255 Mb1618c A 9uarr possible secreted lipase sSNP at 1778879 C-T in 112101 only

T7 2 1840174-1840282 108 Mb1672c S 4darr hypothetical protein

T8 3 1932158-1932315 157 tRNA-Pro S 3uarr proline tRNA

T9 5 1932452-1932707 255 Mb1749c S 4uarr toxin component oftoxin-antitoxin system

nSNP at 1932704 C-T (G to D) in 130701 only

T10 5 1932746-1933075 329 Mb1750c S 7uarr anti-toxin component oftoxin-antitoxin system

T11 6 2122959-2123263 304 Mb1914c A 12uarr probable short-chain typedehydrogenasereductase

sSNP at 2122970 C-T in 245101 only

T12 2 2324463-2324571 108 Mb2110 S 3uarr hypothetical protein(frame-shifted in 212297)

T13 2 2328944-2329052 108 Mb2116cpepE S 2darr 3darr probable dipeptidase

T14 2 2329483-2329591 108 Mb2117 A 10darr 10darr 5prime-3prime exonuclease nSNP at 2329583 A-G (comp strand I to T)in 245101 only

T15 3 2868310-2868467 157 Mb2607 A 6darr 8darr hp (frame-shifted in 212297)

T16 2 2868506-2868614 108 Mb2607 A 7darr 7darr hp (frame-shifted in 212297) nSNP at 2868616 A-G (stop to W) in245101 and 130701

T17 2 3075740-3075848 108 I 5uarr 6uarr

T18 2 3075887-3075995 108 I 5uarr 6uarr

T19 2 3076622-3076730 108 I 4uarr 4uarr

T20 2 3076769-3076877 108 I 5uarr 5uarr

T21 2 3078728-3078836 108 I 6uarr 7uarr

T22 2 3079169-3079277 108 I 4uarr 6uarr

T23 2 3079903-3080011 108 I 6uarr 7

T24 2 3844577-3844685 108 Mb3509c S 3uarr 3uarr possible acyltransferase(frame-shifted in 212297)

T25 6 4155501-4155805 304 Mb3803echA21 A 5uarr possible enoyl-coA hydratase sSNP at 4155803 C-T (comp strand) in 245101 only

T26 2 4316987-4317095 108 Mb3927c A 3uarr chp (frame-shifted in H37Rv)

Up and down arrows indicate fold up- and downregulation respectively and empty cells indicate no change in expressionStrand A indicates antisense S sense and I intergenic

Golby

etalBM

CGenom

ics201314710

Page10

of18

httpwwwbiom

edcentralcom1471-216414710

example the amplicon array data had appeared to suggestthat Mb1914c and echA21 were upregulated in 245101but the oligo array data indicates that transcripts 11 and25 which are encoded within the co-ordinates encoded bythose two genes are actually encoded on the antisensestrands This apparent discrepancy can be rationalisedonce we consider that double stranded amplicon micro-array probes cannot discriminate between transcriptsencoded on the sense or antisense strands Transcripts 11and 25 can therefore be considered as potential antisensesRNAs (asRNA) which could be involved in translationalor post-transcriptional control of the sense transcriptOther potential cis-encoded sRNAs detected using the ar-rays include T6 T14 and T15T16 which are encoded onthe antisense strands to Mb1618c Mb2117 and Mb2607respectively and for each of these transcripts their expres-sion appears to be linked to the presence of a single SNPwithin the co-ordinates of the genes The approximateboundaries of these transcripts can be derived from thegenomic co-ordinates of the oligonucleotide probes thatdetect the expression of the transcript Thus the tran-scripts appear to be between 100ndash300 nt in size and thepositions of the linked SNPs appear to be positioned eitherjust upstream or within the predicted 5primeend of the tran-scripts (Figure 3) Three of the transcripts (T11 T14 andT25) are antisense to the central part of the sense encodedgene while T6 is encoded antisense to the 5prime end ofMb1618c As well as antisense transcripts we also saw thedifferential expression of sense transcripts The ampliconmicroarray data (confirmed by real time RT-PCR) indi-cated that nirB is strongly upregulated specifically in130701 in both in vitro and ex-vivo Mϕ An analysis ofthe oligonucleotide array data however indicates thatthere are two short transcripts T1 and T2 (sense and anti-sense respectively) that are encoded within the genomicco-ordinates of the nirB gene T2 is the longer in size (255vs 155nt) and more highly expressed (5 vs 3-fold) thanT1 and both transcripts appear to be linked to the pres-ence of a SNP that is located within the middle of T1 andapproximately 50nt upstream of T2Some of the transcripts are bone fide gene sense

strand mRNA transcripts such as T9 and T10 whichare encoded by Mb1749c and Mb1750c respectivelyAlthough it would appear that the two genes are tran-scribed separately it is probable that the two transcriptsare co-transcribed as the stop codon of Mb1750c over-laps the start codon of Mb1749c Eight of the tran-scripts listed in Table 5 are encoded within intergenicregions 7 of which are encoded within the polymorphicdirect repeat (DR) locus The DR locus of strains belong-ing to the M tuberculosis complex has been suggested toconstitute a CRISPR locus which have been shown inmany species of bacteria to be involved in protectionagainst exogenous foreign DNA such as plasmids and

phage [25] All the DR encoded transcripts are short(approx 100 nt) straddle contiguous repeat and spacersequences and show approximately 5-fold higher levelsof expression in 245101 and 130701 compared with212297 and 112101

Characterisation of differentially expressed cis asRNAThe genomic co-ordinates of the oligonucleotide probesthat detected the antisense species described above canonly serve as approximate estimations as to their startand end points Thus we used 5prime RLM-RACE (RNALigase Mediated Rapid Amplification of cDNA Ends) inan attempt to accurately define the transcriptional startsites (TSS) for the short sense transcript T2 and the anti-sense transcripts T6 T11 and T25 described in the abovesection (see Methods) These transcripts were chosen astheir expression levels are high and their transcript lengthswere considered to be sufficiently long to enable theRLM-RACE methodology to work Table 6 details thesizes of the PCR products obtained after RLM-RACE wasperformed using oligonucleotide primers designed tosequences predicted for transcripts T6 T11 and T25 NoPCR product was obtained for transcript T2 For each ofthe three transcripts the TSS was determined to be a Gresidue which is the most commonly used residue typefor mycobacterial TSSrsquos (Figure 4) [26] For each of the T6T11 and T25 transcripts expression of the asRNAs waslinked to the presence of a SNP (C to T) proximal tothe 5prime end of the asRNA Strains exhibiting the lsquoCrsquo alleleshowed no expression of the asRNA whilst the strainthat showed expression had the lsquoTrsquo allele An analysis ofthe nucleotide sequence in the vicinity of the SNPsreveals that for each of the three transcripts the SNPconstitutes the 6th residue of a motif that has stronghomology to the consensus sequence for the minus10 elementof Group A mycobacterial promoters (Figure 4) [26] Thefinding that a lsquoTrsquo residue is associated with expression isconsistent with the consensus sequence which indicatesthat 86 of all minus10 elements have a lsquoTrsquo residue at the 6thresidue position Several residues that flank the minus10 motifalso show a degree of conservation Sequence motifswhich show strong homology to group A minus35 elementsare present 18ndash19 bp upstream of the putative minus10elements and the distances between the minus35 -10 andTSS elements are consistent with those elements ofthe consensus sequence No protein encoding openreading frames were detected within the T6 T11 andT25 transcriptsIn a parallel study high density oligonucleotide micro-

arrays were also used to interrogate the transcriptomesof M tuberculosis H37Rv M bovis BCG Pasteur Myco-bacterium caprae and M bovis AN5 that had beengrown in Middlebrook 7H9 media As a result of theseexperiments two asRNA species were found to be

Golby et al BMC Genomics 2013 14710 Page 11 of 18httpwwwbiomedcentralcom1471-216414710

expressed within the antisense strands of the ino1 andnarH genes of M tuberculosis H37Rv but not in any ofthe other 3 strains tested (data not shown) A compari-son of nucleotide sequences of the orthologous genesacross the species suggested that expressions of theas_sRNAs correlated with the presence of a sSNP (C to T

transition at positions 50555 and 1292100 wrt H37Rvgenomic sequence for as_ino1 and as_narH respectively)upstream of the asRNAs Approximate information re-garding the transcriptional start site was deduced from thebinding co-ordinates of the probes that detected the tran-scripts As with the M bovis antisense sRNAs described

Figure 3 Expressions and schematic representation of genomic locations of selected cis-encoded antisense sRNAs identified using atiled oligonucleotide microarray Three asRNAs (open arrows) are (a) T6 (b) T14 and (c) T25 For each asRNA a histogram plots the foldchanges for each of the oligonucleotide probes that detected the asRNA and for each probe the binding position relative to the 212297genome is indicated Closed and open arrows indicate lengths and direction of transcription of genes and asRNAs respectively

Golby et al BMC Genomics 2013 14710 Page 12 of 18httpwwwbiomedcentralcom1471-216414710

above the T residue associated with the expression ofthe M tuberculosis asRNAs is part of a putative minus10element A minus35 element with an appropriate spacing tothe minus10 element was identified for as_ino1 but not foras_narH suggesting that the as_narH promoter maybelong to group B mycobacterial promoters that havea conserved minus10 but no minus35 motif [26]

DiscussionsThe aim of this work was to define possible phenotypicvariation across M bovis field isolates through a combin-ation of genome sequencing comparative genomics andtranscriptome analyses from both in vitro and ex vivo con-ditions Using these approaches we uncovered a range ofnovel findings the most striking of which was the realisa-tion that genes that had been predicted to be differentiallyexpressed based on amplicon-microarray data were in fact

not upregulated and that instead it was an antisense tran-script that was showing differential expression Analysingboth transcriptome and genome sequence data allowed usto identify SNPs responsible for the transcription of anti-sense RNAs with generation of a consensus minus10 promotersequence the likely mechanism Our results suggest thatdata generated from amplicon arrays in the past may needto be revisited as it is possible that some coding-sequences identified as being differentially expressed wereinstead antisense transcriptsWith the growth of technologies such as high density

tiled oligonucleotide microarrays and next generationsequencing there has been a rapid increase in the num-ber of reports describing the existence of non-codingRNAs (ncRNAs) in bacteria Non-coding RNAs broadlyconsist of two types cis- and trans-encoded RNA TransRNA includes intergenic encoded RNA while cis-encodedRNA includes 5prime and 3prime untranslated regions of mRNAand antisense RNA To study the expressions of both cis-and trans encoded ncRNA we used a high density oligo-nucleotide tiled microarray since our amplicon microarraywas unable to detect intergenic transcripts or differentiatebetween sense and antisense transcripts Previous studiesusing M tuberculosis have identified substantial amountsof ncRNA encoded in both intergenic and intragenic re-gions [78] We detected substantial amounts of ncRNAs

Figure 4 Promoters of anti sense RNAs a Promoters of the asRNAs as_mb1618c as_1914c and as_echA21 -10 and minus35 elements areindicated in bold and italics Transcriptional start sites are indicated by large font G characters while SNP residue that leads to the expression ofthe asRNA is indicated by a large font red T residue The consensus sequence for group a mycobacterial promoters is indicated Numericalsubscripts indicate the percentage of the total number of promoters for which a transcriptional start site has been experimentally determinedthat show the indicated residue b Promoters of the differentially expressed asRNAs as_ino1 and as_narH in M tuberculosis -10 and minus35 elementsare indicated in bold italics The red residue indicates SNP responsible for differential expression

Table 6 Determination of transcriptional start sites ofantisense RNAs

Transcript asRNA PCRprod size

TSSresidue type

TSS residue postn(wrt 212297)

T6 mb1618c_as 150 G 1778886

T11 mb1914c_as 220 G 2122978

T25 echA21_as 175 G 4155796

Golby et al BMC Genomics 2013 14710 Page 13 of 18httpwwwbiomedcentralcom1471-216414710

in M bovis including many instances of cis-antisenseRNA species Due to their perfect complementarity cisasRNA form a duplex with the sense strand encodedtranscript resulting in either degradation [27] or transla-tion inhibition [28] of the sense mRNA AntisenseRNAs vary in length ranging from 10s to 1000primes ofnucleotides and can be classified according to theirencoded position with respect to the opposite senseencoded gene Thus they can be classified as 5prime or 3primeoverlapping while others are classified as internallylocated The 112101 specific as_Mb1618c is an exampleof a 5prime overlapping asRNA which is encoded antisense togene Mb1618c which is predicted to express a secretorylipase The location of the asRNA transcript suggests itmay function to prevent translation of Mb1618c mRNAby steric hindrance of the ribosome binding siteIn the work presented here strain 245101 expressed

two asRNAs as_Mb1914c and as_echA21 that are notexpressed by any of the other 3 strains They areencoded within the central part of the opposite genesand are therefore likely to modulate the stability of thetranscripts Mb1914c encodes a short chain dehydro-genase while echA21 encodes an enoyl-CoA hydrataseShort chain dehydrogenases catalyse a wide range offunctions so the precise function and identity of thesubstrate is difficult to deduce from sequence aloneEnoyl-CoA hydratases hydrate double carbon-carbonbonds of macromolecules and are vital in the metabol-ism of fatty acids Both gene products would thereforeappear to be involved in the metabolism of a macromol-ecule and their similar expression profiles in this straincould indicate involvement in the metabolism of the samemolecule or molecules that are of the same pathwayIn many instances upregulation of asRNA negatively

correlates with the transcription of the antisense gene[27] but in many cases expression of the antisense tran-script has no effect on the transcription of the oppositegene In our studies expressions of as_Mb1618cas_Mb1914c and as_echA21 did not appear to have anyeffect on the expressions of the opposite sense encodedgenes (data not shown) We have shown that theexpressions of the asRNAs are associated with the pres-ence of SNPs which are either synonymous or non-synonymous with respect to the sense transcript butupstream of the asRNA transcriptional start site Thishighlights the fact that mutations can potentially affectexpression of transcripts on both strands and that theclassification of a SNP is strand dependent For each ofthe three asRNAs the associated SNP was found to belocated within a putative minus10 promoter motif of groupA mycobacterial promoters The sixth residue of the minus10hexamer motif consensus sequence is a strongly conservedlsquoTrsquo residue which is present in 81 of all group A myco-bacterial promoter elements Its importance is underlined

by the finding that the strains that exhibit a lsquoCrsquo residue atthis position show no detectable expression of the asRNAwhile strains having a lsquoTrsquo residue at this position exhibitexpressionSingle nucleotide polymorphisms were found to be

the most frequent form of genetic variation that existsbetween the isolates with a total of 1013 SNPs detectedacross the three strains 112101 245101 and 130701compared to the reference strain M bovis 212297Non-synonymous SNPs which include both non-senseand missense SNPs are a class of SNPs most likely toimpact on protein function and contribute to phenotypicvariation Non-sense SNPs which results in the expressionof a truncated polypeptide due to the introduction of apremature stop codon were identified in five genes acrossthe strains Of these we focussed our attentions on thenon-sense SNP present in a gene encoding the LysR regu-lator Mb2007c as mutations affecting regulators are likelyto impact on the expression of one or more genes that arepart of the regulon of the regulator and are therefore morelikely to result in phenotypic variation Experiments tocompare the transcriptomes of a strain that exhibited themutation with a strain overexpressing a functional regula-tor did not however reveal any differences The reasonfor this unclear but could reflect a requirement of theregulator for a co-inducer that was absent under the con-ditions of the experiment The consequences of missenseSNPs are more difficult to predict as substitutions of oneamino acid for another in a protein sequence do not ne-cessarily lead to a change in protein function Howeverfor genes that are controlled by an autoregulatory mech-anism a mutation that affects the ability of the product ofthe gene to regulate itself will result in a change in expres-sion of the gene In our studies we have shown that thepresence of a missense mutation in a VapB type toxin en-coding gene Mb1749c in strain 130701 results in theupregulation in expression of the toxin-antitoxin encodingpair of genes Mb1750c-Mb1749c due to the inability ofthe encoded proteins to self regulate themselves Toxin-antitoxin systems have a variety of proposed cellular func-tions including general regulation of mRNA stability levelsin the cell [29] Further experiments are required to fullyunderstand the consequences of this mutationGenomic deletions have played an important role in

the evolution of strains belonging to the mycobacterialcomplex [30] and in the derivation of the tuberculosisvaccine strain M bovis BCG [31] In addition to the previ-ously described 68 kb gene deletion that is specific tostrains having a spoligotype 17 pattern [6] we have identi-fied a 16 kb multi-gene deletion that is specific to strain130701 and encompasses genes that are part of the DosRregulon However one of the deleted genes exists as apseudo gene in DosR in strain 212297 so its importanceto the biology of M bovis is unlikely to be significant

Golby et al BMC Genomics 2013 14710 Page 14 of 18httpwwwbiomedcentralcom1471-216414710

Several other genes with internal deletions were detectedbut none of the encoded proteins have any significantsimilarity to any protein with a defined function

ConclusionsIn conclusion we have performed a comprehensive ana-lysis of 4 M bovis strains of the most common moleculartypes circulating in GB We show that while these strainsshow extensive similarities in their genetic make-up andgene expression profiles they show distinct differences inthe expression of a subset of genes We provide functionaldata to show that SNPs can lead to the expression of anti-sense RNA a finding with implications for how we definea lsquosilentrsquo nucleotide change Furthermore we show thatthe interpretation of transcriptome data based solely onamplicon arrays could lead to artefacts due to expressionof antisense transcripts a caveat that needs to be kept inmind for previous studies of global expression analysis inbacteria

MethodsBacterial strains media and growth conditionsFor the Mϕ infection experiments bovine alveolar Mϕwere cultivated in tissue culture media R10 whichconsisted of RPMI (Invitrogen) media plus 2 mM glu-tamine 10 calf fetal serum and 1 amphotericinWhere used antibiotics gentamycin and ampicillinwere added at concentrations of 50 and 100 μg mlrespectively M bovis field strains were pre-grown inMiddlebrook 7H9 broth supplemented with 10 albumin-dextrose catalase (ADC Difco) 005 Tween and 10 mMpyruvate Cultures were harvested in mid-logarithmicphase (OD600 of 03-08) washed and then resuspended inRPMI containing 005 Tween 80

Isolation of bovine alveolar macrophages and infectionwith mycobacteriaThe lungs of a 6ndash8 week old male Holstein-Friesian calfwere removed and a whole lung lavage procedure wasperformed to washout the alveolar Mϕ Briefly 4ndash5 x500 ml aliquots of Hanksrsquo Balanced Sterile Salts solution(HBSS) were used to infuse the lungs via the tracheaand the washings were pooled in a sterile beaker The Mϕcells contained in the washes were pelleted by centrifuga-tion at 500 x g for 10 mins at 4degC washed and thenresuspended in R10 growth media supplemented withantibiotics (R10+) to a concentration of 1ndash2 x 107 mlApproximately 05-15 x 109 Mϕ were isolated per calflungVented 225 cm2 tissue culture flasks containing R10+

media were seeded with 3ndash4 x 107 alveolar Mϕ andplaced in a humidified 37degC incubator containing 5CO2 Typically 2ndash4 flasks were used per strain and timepoint After 24 hrs the growth media was decanted to

remove non-adherent cells and then replaced with freshR10+ media After a further 24 hrs the growth mediawas discarded and the Mϕ monolayer was washed withRPMI to remove traces of the antibiotic containinggrowth media The monolayer was then covered withR10 media without antibiotics (R10-) and then infectedwith mid-logarithmic phase grown mycobacteria usingan MOI of approximately 101 (bacilli Mϕ) The AlvMϕwere incubated with mycobacteria for 4 hrs after whichthe cell monolayers were washed with RPMI and theneither processed for RNA extraction (4 hr time point)or incubated in fresh R10+ media for a further 20 hrsbefore being processed for RNA extraction (24 hr timepoint)

Extraction and amplification of mycobacterial RNA frominfected macrophagesMϕ cell monolayers were lysed using a guanidiniumthiocyanate (GTC) containing solution The lysed Mϕrsquoswere vortexed and passed twice through a 21G bluntended needle to sheer host genomic DNA and therebyreduce the viscosity of the solution Mycobacterial cellswere then pelleted by centrifugation at 4600 rpm for 20mins at room temperature and washed with GTC solu-tion to remove host genomic DNA Cells were thenresuspended in Trizol and RNA was extracted using theprotocol outlined in Bacon et al [32] The amount ofpurified DNase-treated RNA recovered was of the order100ndash500 ng per time point RNA was amplified usingthe lsquoMessageAmp II-Bacteria RNA Amplification Kitrsquo(Ambion) according to the manufacturersrsquo instructionsUsing an input of 100 ng of unamplified RNA 20ndash100μg of amplified RNA was recovered

Amplicon microarray analysisFor the in vitro growth experiments three independentexperiments (biological replicates) were carried out andfor each strain in each experiment two microarrays (tech-nical replicates) were performed Thus for each strain 6microarrays were performed Three independent AlvMϕinfection experiments were carried out and for eachexperiment two microarrays were performed for each ofthe control RPMI samples and the 4- and 24 hrs post-infection samples Cy5 and Cy3 fluorescently-labelledprobes were synthesised from RNA and genomicDNA respectively and hybridised to whole genomeM bovis M tuberculosis microarrays The array designis available in BμGSbase (accession number A-BUGS-31httpbugssgulacukA-BUGS-31) and also ArrayExpress(accession number A-BUGS-31) Details of probe synthesishybridization conditions and manufacture of the micro-array can be found in Golby et al 2008 Microarrayswere scanned using an Affymetrix 428 Microarrayscanner and scanned images were quantified using

Golby et al BMC Genomics 2013 14710 Page 15 of 18httpwwwbiomedcentralcom1471-216414710

BlueFuse for Microarrays v32 software (BlueGnome)See Golby et al 2008 for further detailsNormalisation was performed by dividing the log ratio

of the Cy5 to Cy3 signal for every spot by the median ofthe log ratios for all spots except control spots A medianabsolute (MAD) scale transformation was applied to thenormalised data from the pas an additional normalisationstep For every microarray duplicate spots were averagedand then the average expression of every gene across alltechnical replicate microarrays was calculated Averagesof the three biological replicates were used to comparegene expression between strains For each gene a mod-erated t-test was applied and those genes with a P- valueless than 005 were selected From this gene list thosegenes whose average expression differed by more than25-fold between strains were selected Fully annotatedmicroarray data have been deposited in BμGSbase (acces-sion number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (accession number E-BUGS-150)

Oligonucleotide microarray analysisExperiments were performed in a similar way to thatdescribed for the in vitro amplicon array experimentsexcept that the RNA was purified using the mirVanamiRNA Isolation kit (Ambion) which is designed tocapture small (gt20 nt) RNA species RNA and genomicDNA were directly labelled with Cy5- and Cy3 respect-ively using the ULS microRNA labelling kit (Kreatech)according to the manufacturerrsquos instructions PurifiedCy5- and Cy3-labelled probes were co-purified andapplied to an Agilent 40K custom made tiled (10 ntoverlap) 60-mer oligonucleotide microarray designed tothe genomic sequence of M bovis 21229797 The arraydesign is available in BμGSbase (accession numberA-BUGS-52 httpbugssgulacukA-BUGS-52) and alsoArrayExpress (accession number A-BUGS-52) Microarrayswere hybridised at 65degC for 18 hrs and then washed inWash Buffer 1 (Agilent) at room temperature for 1 minuteThe slides were then washed in Wash Buffer 2 (Agilent) at37degC for 1 minute dried and then scanned at 2 μm usingan Agilent DNA microarray scannerTiling array data was analysed using the Limma package

of RBioconductor [33] The signal median was quan-tile normalised between arrays followed by a LOESSnormalisation within arrays Differential expressionanalysis was performed by pairwise comparison usinglinear models and empirical Bayes methods and Pvalues adjusted using the Benjamini and Hochbergsmethod to control for multiple testing Fully annotatedmicroarray data have been deposited in BμGSbase(accession number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (accession num-ber E-BUGS-150)

Whole genome sequencing of M bovis field isolatesWhole genome paired end (2 x 76bp) sequencing wasperformed using Illumina HiSeq machines at theWellcome Trust Sanger Institute (Hinxton Cambridge)Raw sequence data was uploaded to the European Nucleo-tide Archive (ENA) and can be downloaded at httpwwwebiacukena accession numbers ERX006616 ERX06617ERX012284 and ERX012286 The FastQC (httpwwwbioinformaticsbabrahamacukprojectsfastqc) programwas used to analyse the quality of the raw data reads Rawsequence data was trimmed to remove adapter sequencesand nucleotides where the sequence quality score wasbelow 20 Filtered reads were aligned to the M bovis refer-ence strain 212297 [10] with the SMALT alignmentprogram (httpwwwsangeracukresourcessoftwaresmalt)using default settings The average mean coverages for thesequenced strains 112101 245101 and 130701 were293 224 and 234 respectively Reads that did not maponto the reference genome were de novo assembled andblasted against the NCBI data base in order to find regionspresent in these strains but absent in 212201 Variantcalling and the generation of consensus sequenceswere carried out using the SAMtools program suite(httpsamtoolssourceforgenet) SNPs that had a mini-mum of quality score of 200 and had a minimum of fourgood quality forward (reverse) reads covering the SNP siteand having the variant base were retained The number offorward (reverse) reads mapping with good quality ontothe SNP site having the same base as in the reference hadto be less than 5 of the total number of forward (reverse)reads mapping with good quality onto the SNP site havingthe variant base Sequences of 12 bases in length centredaround SNP sites were blasted to the genome of the refer-ence strain 212297 and any SNP within an area with a90 hit to another area of the genome were filtered outThis eliminated most of the SNPs within the repetitivePE-PGRS regions For all sequenced strains more than999 of the genomes were covered by reads

Real time RT-PCRQuantitative real-time SYBR Green based PCR (qRT-PCR)experiments were performed using a RotorGene 3000(Corbett research) as described by Golby et al [23] Foldchanges were calculated using relative standard curvemethod and pcr controls included no template and noreverse transcriptase Primer pair sequences are given inAdditional file 5 available with the online version ofthis paper

Construction of the Rv1749c-Rv1750c and nirBDoverexpressing plasmidsThe Rv1749c-Rv1750c overexpressing plasmids pPG106and pPG107 were constructed by PCR amplification of a907bp fragment encompassing Rv1749c-Rv1750c-lsquoRv1751c

Golby et al BMC Genomics 2013 14710 Page 16 of 18httpwwwbiomedcentralcom1471-216414710

using primers tox_f and tox_r For pPG106 the fragmentwas amplified using 21229797 genomic DNA as a tem-plate while pPG107 was amplified using 130701 gDNABoth fragments were digested with SpeI and cloned intothe SpeI cut mycobacterial attP-integrating shuttle vectorpKINT (a gift from Douglas Young Imperial CollegeLondon) Plasmids pPG108 and pPG109 which contain a35 kb hsprsquo-nirB-nirD-lsquocobU fragment was constructed inseveral steps Firstly two 16 kb hsprsquo-nirBrsquo PCR fragmentswere PCR amplified separately using primers nirB1_f andnirB1_r and 212297 and 130701 as genomic templatesSimilarly two 18kb lsquonirB-nirD-lsquocobU fragments wereamplified using primers nirB2_f and nirB2_r and genomicDNAs 212297 and 130701 The two PCR products weredigested with SpeI and BamHI and then co-ligated intopKINT Details concerning the nucleotide sequences ofthe pcr primer pairs are given in the Additional file 5

5prime-RLM-RACE PCRTranscriptional start site mapping of was determinedusing the First Choice RNA ligase-mediated rapid amp-lification of cDNA ends (RLM-RACE) kit (Ambion) asper manufacturers instructions Briefly 10 ug of totalRNA was treated with calf intestinal phosphatase (CIP)and tobacco acid pyrophosphatase (TAP) before ligationof an RNA Adapter oligonucleotide to the 5prime ends ofthe mRNA transcripts A random-primed reverse tran-scription reaction was carried out to generate cDNAand then nested PCR reactions were performed on thecDNA using combinations of adapter and gene specificprimers Details concerning the nucleotide sequencesof the 5prime outer and inner adapter sequences as well as3prime outer and inner gene specific primers are given inthe Additional file 5 PCR products generated usingthe 5prime inner adapter and 3prime inner gene specific primerswere sequenced by Sanger sequencing using the 3prime innergene specific primer

Animal ethics statementAnimal work was carried out according to the UK Animal(Scientific Procedures) Act 1986 The study protocol wasapproved by the AHVLA Animal Use Ethics Committee(UK Home Office PCD number 706905)

Availability of supporting data sectionFully annotated microarray data have been deposited inBμGSbase (accession number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (ac-cession number E-BUGS-150) Raw sequence data wasuploaded to the European Nucleotide Archive (ENA) andcan be downloaded at httpwwwebiacukenadataviewERX006616-ERX06617ERX012284-ERX012286

Additional files

Additional file 1 SNPs identified across sequenced strains

Additional file 2 Large deletions identified by NGS in sequencedstrains

Additional file 3 Fold change differences in gene expression inM bovis field isolates 1121 2451 and 1307 compared to 2122Cells shaded in red indicate upregulation green indicate down-regulation and empty cells indicate no change in expression

Additional file 4 Differential expression of RNA transcripts asdetected by a tiled oligonucleotide microarray Up and down arrowsindicate fold up- and downregulation respectively and empty cellsindicate no change in expression Strand A indicates antisense S senseand I intergenic

Additional file 5 Details of oligonucleotides used in PCR RT-PCRand RLM-RACE experiments

Competing interestsThe authors declare that they have no competing interests

Authorsrsquo contributionsPG carried out the culturing of organisms macrophage infectionsmicroarrays qRT-PCR cloning analysis of sequence data and writing of themanuscript JN performed analysis of the Illumina sequence and PCRamplicon microarray data AW and JH performed the analysis of the Agilentoligonucleotide microarray data and assisted in the uploading of themicroarray data into the BμGSbase database MAQ and SB wereresponsible for the Illumina sequencing SH contributed to the analysis ofthe Illumina sequence data NS contributed to the writing and critical reviewof the manuscript RGH was a co-investigator who helped to guide theproject SG was the principle investigator on the project who guided theresearch and helped in the writing of the manuscript All authors read andapproved the final manuscript

AcknowledgementsWe would like to acknowledge Kate Gould for her help in carrying out theoligonucleotide microarrays and Denise Waldron for her assistance inuploading of the microarray data into BμGSbase We also thank MarkChambers for comments on the manuscript and BBSRC and Defra forfunding the work (Grant no BBE0184911)

Author details1Animal Health and Veterinary Laboratories Agency Woodham Lane NewHaw Addlestone Surrey KT15 3NB UK 2Division of Clinical Sciences BacterialMicroarray Group Centre for Infection amp Immunity St Georgersquos University ofLondon Cranmer Terrace London SW17 0RE UK 3The Wellcome TrustSanger Institute Hinxton Cambridgeshire UK 4UCD School of VeterinaryMedicine and UCD Conway Institute University College Dublin Dublin 4Ireland

Received 4 June 2013 Accepted 26 September 2013Published 17 October 2013

References1 Monaghan ML Doherty ML Collins JD Kazda JF Quinn PJ The tuberculin

test Vet Microbiol 1994 40(1ndash2)111ndash1242 Kamerbeek J Schouls L Kolk A van Agterveld M van Soolingen D Kuijper

S Bunschoten A Molhuizen H Shaw R Goyal M et al Simultaneousdetection and strain differentiation of Mycobacterium tuberculosis fordiagnosis and epidemiology J Clin Microbiol 1997 35(4)907ndash914

3 Frothingham R Meeker-OrsquoConnell WA Genetic diversity in theMycobacterium tuberculosis complex based on variable numbers oftandem DNA repeats Microbiology 1998 144(Pt 5)1189ndash1196

4 Smith NH Dale J Inwald J Palmer S Gordon SV Hewinson RG Smith JMThe population structure of Mycobacterium bovis in Great Britain clonalexpansion Proc Natl Acad Sci USA 2003 100(25)15271ndash15275

5 Winder CL Gordon SV Dale J Hewinson RG Goodacre R Metabolicfingerprints of Mycobacterium bovis cluster with molecular type

Golby et al BMC Genomics 2013 14710 Page 17 of 18httpwwwbiomedcentralcom1471-216414710

implications for genotype-phenotype links Microbiology 2006152(Pt 9)2757ndash2765

6 Wheeler PR Brosch R Coldham NG Inwald JK Hewinson RG Gordon SVFunctional analysis of a clonal deletion in an epidemic strain ofMycobacterium bovis reveals a role in lipid metabolism Microbiology2008 154(Pt 12)3731ndash3742

7 Arnvig KB Young DB Identification of small RNAs in Mycobacteriumtuberculosis Mol Microbiol 2009 73(3)397ndash408

8 Arnvig KB Comas I Thomson NR Houghton J Boshoff HI Croucher NJRose G Perkins TT Parkhill J Dougan G et al Sequence-based analysisuncovers an abundance of non-coding RNA in the total transcriptome ofMycobacterium tuberculosis PLoS Pathog 2011 7(11)e1002342

9 Smith NH Upton P Naming spoligotype patterns for the RD9-deletedlineage of the Mycobacterium tuberculosis complex www MbovisorgInfect Genet Evol 2012 12(4)873ndash876

10 Garnier T Eiglmeier K Camus JC Medina N Mansoor H Pryor M Duthoy SGrondin S Lacroix C Monsempe C et al The complete genome sequenceof Mycobacterium bovis Proc Natl Acad Sci USA 2003 100(13)7877ndash7882

11 Albers CA Lunter G MacArthur DG McVean G Ouwehand WH Durbin RDindel accurate indel calls from short-read data Genome Res 201121(6)961ndash973

12 Mostowy S Inwald J Gordon S Martin C Warren R Kremer K Cousins DBehr MA Revisiting the evolution of Mycobacterium bovis J Bacteriol2005 187(18)6386ndash6395

13 Boon C Dick T Mycobacterium bovis BCG response regulator essentialfor hypoxic dormancy J Bacteriol 2002 184(24)6760ndash6767

14 Van Gelder RN von Zastrow ME Yool A Dement WC Barchas JD EberwineJH Amplified RNA synthesized from limited quantities of heterogeneouscDNA Proc Natl Acad Sci USA 1990 87(5)1663ndash1667

15 Pandey DP Gerdes K Toxin-antitoxin loci are highly abundant in free-living but lost from host-associated prokaryotes Nucleic Acids Res 200533(3)966ndash976

16 Arcus VL McKenzie JL Robson J Cook GM The PIN-domain ribonucleasesand the prokaryotic VapBC toxin-antitoxin array Protein Eng Des Sel 201124(1ndash2)33ndash40

17 Bukowski M Rojowska A Wladyka B Prokaryotic toxin-antitoxin systemsndashthe role in bacterial physiology and application in molecular biologyActa Biochim Pol 2011 58(1)1ndash9

18 Maddocks SE Oyston PC Structure and function of the LysR-typetranscriptional regulator (LTTR) family proteins Microbiology 2008154(Pt 12)3609ndash3623

19 Moreno-Vivian C Cabello P Martinez-Luque M Blasco R Castillo FProkaryotic nitrate reduction molecular properties and functionaldistinction among bacterial nitrate reductases J Bacteriol 1999 181(21)6573ndash6584

20 Tiffert Y Supra P Wurm R Wohlleben W Wagner R Reuther J TheStreptomyces coelicolor GlnR regulon identification of new GlnR targetsand evidence for a central role of GlnR in nitrogen metabolism inactinomycetes Mol Microbiol 2008 67(4)861ndash880

21 Duan J Wainwright MS Comeron JM Saitou N Sanders AR Gelernter JGejman PV Synonymous mutations in the human dopamine receptor D2(DRD2) affect mRNA stability and synthesis of the receptor Hum MolGenet 2003 12(3)205ndash216

22 Capon F Allen MH Ameen M Burden AD Tillman D Barker JN TrembathRC A synonymous SNP of the corneodesmosin gene leads to increasedmRNA stability and demonstrates association with psoriasis acrossdiverse ethnic groups Hum Mol Genet 2004 13(20)2361ndash2368

23 Golby P Hatch KA Bacon J Cooney R Riley P Allnutt J Hinds J Nunez JMarsh PD Hewinson RG et al Comparative transcriptomics reveals keygene expression differences between the human and bovine pathogensof the Mycobacterium tuberculosis complex Microbiology 2007153(Pt 10)3323ndash3336

24 Bitter W Houben EN Bottai D Brodin P Brown EJ Cox JS Derbyshire KFortune SM Gao LY Liu J et al Systematic genetic nomenclature for typeVII secretion systems PLoS Pathog 2009 5(10)e1000507

25 Sorek R Kunin V Hugenholtz P CRISPRndasha widespread system thatprovides acquired resistance against phages in bacteria and archaeaNat Rev Microbiol 2008 6(3)181ndash186

26 Gomez M Smith I Determinants of Mycobacterial Gene Expression InMolecular Genetics of Mycobacteria Washington DC ASM Press2000111ndash129

27 Duhring U Axmann IM Hess WR Wilde A An internal antisense RNAregulates expression of the photosynthesis gene isiA Proc Natl Acad SciUSA 2006 103(18)7054ndash7058

28 Kawano M Aravind L Storz G An antisense RNA controls synthesis of anSOS-induced toxin evolved from an antitoxin Mol Microbiol 200764(3)738ndash754

29 Thomason MK Storz G Bacterial antisense RNAs how many are thereand what are they doing Annu Rev Genet 2010 44167ndash188

30 Brosch R Pym AS Gordon SV Cole ST The evolution of mycobacterialpathogenicity clues from comparative genomics Trends Microbiol 20019(9)452ndash458

31 Calmette A Guerin C Sur quelques proprietes du bacille tuberculeuxdrsquoorigine cultive sur la bile de boeuf glycerinee C R Acad Sci Paris 1909149716ndash718

32 Bacon J James BW Wernisch L Williams A Morley KA Hatch GJ ManganJA Hinds J Stoker NG Butcher PD et al The influence of reduced oxygenavailability on pathogenicity and gene expression in Mycobacteriumtuberculosis Tuberculosis (Edinb) 2004 84(3ndash4)205ndash217

33 Smyth GK Linear models and empirical bayes methods for assessingdifferential expression in microarray experiments Stat Appl Genet Mol Biol2004 33

doi1011861471-2164-14-710Cite this article as Golby et al Genome-level analyses ofMycobacterium bovis lineages reveal the role of SNPs and antisensetranscription in differential gene expression BMC Genomics 2013 14710

Submit your next manuscript to BioMed Centraland take full advantage of

bull Convenient online submission

bull Thorough peer review

bull No space constraints or color figure charges

bull Immediate publication on acceptance

bull Inclusion in PubMed CAS Scopus and Google Scholar

bull Research which is freely available for redistribution

Submit your manuscript at wwwbiomedcentralcomsubmit

Golby et al BMC Genomics 2013 14710 Page 18 of 18httpwwwbiomedcentralcom1471-216414710

Table 3 Fold change differences in gene expression in M bovis field isolates 1121 2451 and 1307 compared to 2122 (Continued)

Mb1835 Rv1806 PE20 3uarr PE family protein

Mb1885c Rv1854c ndh 3darr probable nadhdehydrogenase

Mb1914c Rv1882c 6uarr 3uarr 6uarr probable short-chaintype dehydrogenase

reductase

sSNP at postn 2122970 in241501 only C-T

Mb2007c Rv1985c 4uarr 3uarr 3uarr probabletranscriptional

regulatory protein

nSNP at postn 2208296 in 130701only G-A (Q to Stop)

Mb2015c RV1992c ctpG 2darr 3darr probable cationtransporter

Mb2420c Rv2398c cysW 2darr 6darr sulphate transporter

Mb2421c Rv2399c cysT 2darr 4darr sulphate transporter

Mb2607 Rv2577 9darr 5darr 7darr 12darr 10darr 7darr 7darr 13darr conservedhypothetical protein

[first part]

Mb3194 Rv3169 3uarr 3uarr 2uarr 3uarr conservedhypothetical protein

Mb3477c Rv3447c 8darr 3darr 10darr 9darr 2darr 7darr probable conservedmembrane protein

nSNP at postn 3812465 in 245101and 130701 T-C (S to G)

Mb3563c Rv3533c PPE62 5uarr 5uarr ppe family protein

Mb3721c Rv3696c glpK 3darr glycerol kinase

Mb3803 Rv3774 echA21 8uarr 7uarr 22uarr possible enoyl-coAhydratase

sSNP at postn 4155803 in241501 only G-A

Up and down arrows indicate fold up- and downregulation respectively and empty cells indicate no change in expression

Golby

etalBM

CGenom

ics201314710

Page7of

18httpw

wwbiom

edcentralcom1471-216414710

to 130701 It was not readily apparent why a sSNP in thecoding sequence of a gene should lead to an increase inexpression of that gene but there are several reports thatshow sSNPs leading to changes in stability of mRNA tran-scripts [2122] Rv0987 and Rv0988 of M tuberculosisH37Rv encode part of an ABC transporter and a putativesecreted hydrolase respectively In 212297 a single basetransition (G-A) introduces a stop codon that splitsRv0987 into the two pseudogenes Mb1013 and Mb1014Previous microarray based gene expression studies by ourgroup have shown that Rv0987 and Rv0988 in M tubercu-losis show higher levels of expression than the orthologousMb1013Mb1014 and Mb1015 respectively in M bovis212297 [23] and in the present study the Mb1013Mb1014 and Mb1015 homologues in 245101 and 130701 also showed higher expression (up to 10-fold) than thehomologues in 212297 and 112101 Comparing thesequences of Mb1013Mb1014 and Mb1015 across all 4strains indicated that strains that show high expressionhave the lsquoGrsquo alleleMb3477c encodes an ATP binding membrane protein

part of the Esx4 secretion system [24] and gene shows upto 10-fold higher expression in 245101 and 130701 com-pared to 212297 The gene also contains an A to C transi-tion at position 3812465 a nSNP at position resulting inthe non-conservative substitution of a serine to a glycineresidue

Of the 19 genes that show specific differential expres-sion in the Mϕ the most notable are Mb1914c andechA21 which show upregulation in 245101 only (upto 6- and 23-fold respectively) Both genes encodeproteins that could be involved in lipid metabolism andboth genes contain single sSNPs that are present in245101 but absent in the other three strainsReal time RT-PCR was used to verify a selection of

genes that showed differential gene expression as pre-dicted by the microarray analysis Figure 2 comparesthe fold changes in the expression levels of 4 genes asmeasured by microarray and real time RT-PCR ThenirB and Mb1749c genes were selected because theyshowed strong upregulation in 130701 in both in vitroand ex vivo Mϕ while Mb1914c and echA21 were chosenbecause the array data predicted them to be specificallyupregulated in 245101 and only in ex vivo Mϕ Foreach of the 4 genes the strain dependent pattern ofexpression as measured by real time RT-PCR was consistentwith that measured by microarray although the foldchanges measured by real time RT-PCR were higherthan those measured by microarray

Functional analysis of SNP role in differential geneexpressionThe above data showed that many of the strain specificdifferences in gene expression were linked to the presence

Figure 2 Confirmation of amplicon microarray results with real time RT-PCR The fold changes in gene expression for (a) Mb1750c (b) nirB(c) echA21 and (d) Mb1914c measured by microarray (open bars) in each of the four strains were compared to that measured by real time RT-PCR(closed bars)

Golby et al BMC Genomics 2013 14710 Page 8 of 18httpwwwbiomedcentralcom1471-216414710

of synonymous or non-synonymous SNPs located withinthe coding regions of the genes that show variable expres-sion Non-synonymous SNPs lead to changes in aminoacid sequence which can lead to changes in protein func-tion The C to T transition at position 1932704 (wrt 212297) in the coding sequence of the 130701 Mb1749chomologue leads to the non-conservative substitution ofGly19 to Asp and this nSNP appears to be linked to theupregulation of both Mb1749c and Mb1750c in thatstrain In order to confirm this a 09 kb DNA fragmentcontaining the Mb1749c-Mb1750c-lsquoMB1751c region of130701 (containing the lsquoTrsquo allele) and the equivalentregion from 212297 (with the lsquoCrsquo allele) were PCRamplified and the fragments were cloned separatelyinto the mycobacterial shuttle vector pKINT (see Methods)to create the constructs pPG107 and pPG106 respectivelyThe two constructs were introduced into Mycobacteriumsmegmatis mc2155 separately and then the expression ofMb1749c and Mb1750c in M smegmatis pPG101 wascompared to that of M smegmatis pPG102 using real timeRT-PCR Table 4 shows that the expression levels ofMb1750c and Mb1749c in the strain expressing themutated forms of Mb1749cMb1750c are 13- and 9-foldhigher respectively compared to the strain expressingthe wild type forms confirming that this SNP is respon-sible for the observed up-regulation of the two genes in130701Synonymous substitutions do not lead to changes in

protein sequence and have generally been considered tobe lsquosilentrsquo or benign Recent studies however have sug-gested that sSNPs can have functional effects such asdecreased mRNA stability and translation [2122] Inour own studies we have found several genes whose ex-pression levels correlate with the presence of sSNPs inthe coding regions of those genes For example a C-Ttransversion at position 303227 (wrt 212297) withinthe coding sequence of nirB of 130701 is a sSNP thatappears to be linked with the upregulation in expressionof nirB within that strain To confirm that this is thecase we PCR amplified 35 kb DNA fragments containing

the hsp-nirB-nirD-cobU region of strain 130701 (containingthe lsquoCrsquo allele) and the equivalent region from strain 212297(with the lsquoTrsquo allele) and cloned them separately into theintegrating vector pKINT to create the constructspPG108 and pPG109 respectively These constructswere introduced into 212297 and the expression levelsof nirB and nirD were found to be 30- and 2-fold re-spectively higher in the strain expressing the mutatednirBD locus compared to the strain expressing the wild-type form (Table 4) This confirms that this mutation isresponsible for the upregulation of the two genes in thisstrain

Use of a high density tiled oligonucleotide microarray todetect differentially expressed small RNA transcripts inM bovis isolatesThe M tuberculosisM bovis amplicon arrays used inthe present study were specifically designed to measureexpression levels of genes annotated in the genomic se-quence of M bovis 212297 [10] They were not how-ever designed to monitor the expression of non-codingRNA such as small RNA within intergenic regions orantisense sRNA Hence a high density tiled oligonucleotidemicroarray consisting of approximately 180000 partially-overlapping (10-base overlap) short 60 mer oligonucleo-tides was designed that offered an unbiased approach tothe detection of strand specific transcripts encoded overthe entire M bovis 212297 chromosome Total RNAthat includes small sized (lt100 nt) RNA species wasextracted from the four M bovis strains that had beengrown in liquid media and hybridised to the oligo-nucleotide microarray To avoid potential secondarystrand synthesis during cDNA synthesis which couldbe interpreted as sRNA the RNA was directly labelledwith cyanine based dyes After pairwise comparisonswere performed between 212297 and 112101 245101or 130701 220 oligonucleotide probes were identifiedthat detected differentially expressed transcripts (25 foldcut off ) in one or more of the three strains (Additionalfile 4) Only transcripts detected by multiple (2 or more)overlapping probes were regarded as genuine transcriptsas those detected by single probes could be due to cross-hybridisation effects or represent spurious transcriptsUsing these criteria 26 transcripts designated T1-T26were found to show differential expression in one or moreof the strains (Table 5) and those transcripts can be di-vided into those that are encoded within intergenic re-gions and those encoded within the genomic co-ordinatesencompassing annotated coding sequences Comparisonof the differentially expressed gene lists identified usingamplicon vs oligonucleotide arrays (Tables 3 and 5) itis clear that many of the transcripts detected using theamplicon arrays are not necessarily encoded on thesense gene strand as had been previous interpreted For

Table 4 Confirmation of SNP linkage to upregulation ingene expression

Strain Gene Fold change

M smegmatis mc2155 (pPG101) Mb1750c 161 plusmn 32

M smegmatis mc2155 (pPG101) Mb1749c 125 plusmn 12

M smegmatis mc2155 (pPG102) Mb1750c 19 plusmn 08

M smegmatis mc2155 (pPG102) Mb1749c 14 plusmn 04

Mb 212297 (pPG108) nirB 20 plusmn 10

Mb 212297 (pPG108) nirD 13 plusmn 05

Mb 212297 (pPG109) nirB 529 plusmn 159

Mb 212297 (pPG109) nirD 31 plusmn 08

Fold changes are the mean ratios plusmn standard deviation

Golby et al BMC Genomics 2013 14710 Page 9 of 18httpwwwbiomedcentralcom1471-216414710

Table 5 Differential expression of RNA transcripts as detected by a tiled oligonucleotide microarray

Transcript No Probes Position Size cds Strand 112101 245101 130701 CDS Product SNP

T1 3 303138-303295 157 Mb0258nirB A 3uarr nitrite reductase (large sub-unit)

T2 5 303260-303515 255 Mb0258nirB S 5uarr nitrite reductase (large sub-unit) sSNP at 303227 C-T in 130701 only

T3 2 1105085-1105193 108 Mb1014 A 3uarr 3uarr probable adhesion componentof ABC transporter

T4 2 1105158-1105315 157 Mb1014 A 3uarr 3uarr probable adhesioncomponent of ABC transporter

T5 3 1105158-1105315 157 Mb1014 S 3uarr 3uarr probable adhesioncomponent of ABC transporter

T6 5 1778876-1779131 255 Mb1618c A 9uarr possible secreted lipase sSNP at 1778879 C-T in 112101 only

T7 2 1840174-1840282 108 Mb1672c S 4darr hypothetical protein

T8 3 1932158-1932315 157 tRNA-Pro S 3uarr proline tRNA

T9 5 1932452-1932707 255 Mb1749c S 4uarr toxin component oftoxin-antitoxin system

nSNP at 1932704 C-T (G to D) in 130701 only

T10 5 1932746-1933075 329 Mb1750c S 7uarr anti-toxin component oftoxin-antitoxin system

T11 6 2122959-2123263 304 Mb1914c A 12uarr probable short-chain typedehydrogenasereductase

sSNP at 2122970 C-T in 245101 only

T12 2 2324463-2324571 108 Mb2110 S 3uarr hypothetical protein(frame-shifted in 212297)

T13 2 2328944-2329052 108 Mb2116cpepE S 2darr 3darr probable dipeptidase

T14 2 2329483-2329591 108 Mb2117 A 10darr 10darr 5prime-3prime exonuclease nSNP at 2329583 A-G (comp strand I to T)in 245101 only

T15 3 2868310-2868467 157 Mb2607 A 6darr 8darr hp (frame-shifted in 212297)

T16 2 2868506-2868614 108 Mb2607 A 7darr 7darr hp (frame-shifted in 212297) nSNP at 2868616 A-G (stop to W) in245101 and 130701

T17 2 3075740-3075848 108 I 5uarr 6uarr

T18 2 3075887-3075995 108 I 5uarr 6uarr

T19 2 3076622-3076730 108 I 4uarr 4uarr

T20 2 3076769-3076877 108 I 5uarr 5uarr

T21 2 3078728-3078836 108 I 6uarr 7uarr

T22 2 3079169-3079277 108 I 4uarr 6uarr

T23 2 3079903-3080011 108 I 6uarr 7

T24 2 3844577-3844685 108 Mb3509c S 3uarr 3uarr possible acyltransferase(frame-shifted in 212297)

T25 6 4155501-4155805 304 Mb3803echA21 A 5uarr possible enoyl-coA hydratase sSNP at 4155803 C-T (comp strand) in 245101 only

T26 2 4316987-4317095 108 Mb3927c A 3uarr chp (frame-shifted in H37Rv)

Up and down arrows indicate fold up- and downregulation respectively and empty cells indicate no change in expressionStrand A indicates antisense S sense and I intergenic

Golby

etalBM

CGenom

ics201314710

Page10

of18

httpwwwbiom

edcentralcom1471-216414710

example the amplicon array data had appeared to suggestthat Mb1914c and echA21 were upregulated in 245101but the oligo array data indicates that transcripts 11 and25 which are encoded within the co-ordinates encoded bythose two genes are actually encoded on the antisensestrands This apparent discrepancy can be rationalisedonce we consider that double stranded amplicon micro-array probes cannot discriminate between transcriptsencoded on the sense or antisense strands Transcripts 11and 25 can therefore be considered as potential antisensesRNAs (asRNA) which could be involved in translationalor post-transcriptional control of the sense transcriptOther potential cis-encoded sRNAs detected using the ar-rays include T6 T14 and T15T16 which are encoded onthe antisense strands to Mb1618c Mb2117 and Mb2607respectively and for each of these transcripts their expres-sion appears to be linked to the presence of a single SNPwithin the co-ordinates of the genes The approximateboundaries of these transcripts can be derived from thegenomic co-ordinates of the oligonucleotide probes thatdetect the expression of the transcript Thus the tran-scripts appear to be between 100ndash300 nt in size and thepositions of the linked SNPs appear to be positioned eitherjust upstream or within the predicted 5primeend of the tran-scripts (Figure 3) Three of the transcripts (T11 T14 andT25) are antisense to the central part of the sense encodedgene while T6 is encoded antisense to the 5prime end ofMb1618c As well as antisense transcripts we also saw thedifferential expression of sense transcripts The ampliconmicroarray data (confirmed by real time RT-PCR) indi-cated that nirB is strongly upregulated specifically in130701 in both in vitro and ex-vivo Mϕ An analysis ofthe oligonucleotide array data however indicates thatthere are two short transcripts T1 and T2 (sense and anti-sense respectively) that are encoded within the genomicco-ordinates of the nirB gene T2 is the longer in size (255vs 155nt) and more highly expressed (5 vs 3-fold) thanT1 and both transcripts appear to be linked to the pres-ence of a SNP that is located within the middle of T1 andapproximately 50nt upstream of T2Some of the transcripts are bone fide gene sense

strand mRNA transcripts such as T9 and T10 whichare encoded by Mb1749c and Mb1750c respectivelyAlthough it would appear that the two genes are tran-scribed separately it is probable that the two transcriptsare co-transcribed as the stop codon of Mb1750c over-laps the start codon of Mb1749c Eight of the tran-scripts listed in Table 5 are encoded within intergenicregions 7 of which are encoded within the polymorphicdirect repeat (DR) locus The DR locus of strains belong-ing to the M tuberculosis complex has been suggested toconstitute a CRISPR locus which have been shown inmany species of bacteria to be involved in protectionagainst exogenous foreign DNA such as plasmids and

phage [25] All the DR encoded transcripts are short(approx 100 nt) straddle contiguous repeat and spacersequences and show approximately 5-fold higher levelsof expression in 245101 and 130701 compared with212297 and 112101

Characterisation of differentially expressed cis asRNAThe genomic co-ordinates of the oligonucleotide probesthat detected the antisense species described above canonly serve as approximate estimations as to their startand end points Thus we used 5prime RLM-RACE (RNALigase Mediated Rapid Amplification of cDNA Ends) inan attempt to accurately define the transcriptional startsites (TSS) for the short sense transcript T2 and the anti-sense transcripts T6 T11 and T25 described in the abovesection (see Methods) These transcripts were chosen astheir expression levels are high and their transcript lengthswere considered to be sufficiently long to enable theRLM-RACE methodology to work Table 6 details thesizes of the PCR products obtained after RLM-RACE wasperformed using oligonucleotide primers designed tosequences predicted for transcripts T6 T11 and T25 NoPCR product was obtained for transcript T2 For each ofthe three transcripts the TSS was determined to be a Gresidue which is the most commonly used residue typefor mycobacterial TSSrsquos (Figure 4) [26] For each of the T6T11 and T25 transcripts expression of the asRNAs waslinked to the presence of a SNP (C to T) proximal tothe 5prime end of the asRNA Strains exhibiting the lsquoCrsquo alleleshowed no expression of the asRNA whilst the strainthat showed expression had the lsquoTrsquo allele An analysis ofthe nucleotide sequence in the vicinity of the SNPsreveals that for each of the three transcripts the SNPconstitutes the 6th residue of a motif that has stronghomology to the consensus sequence for the minus10 elementof Group A mycobacterial promoters (Figure 4) [26] Thefinding that a lsquoTrsquo residue is associated with expression isconsistent with the consensus sequence which indicatesthat 86 of all minus10 elements have a lsquoTrsquo residue at the 6thresidue position Several residues that flank the minus10 motifalso show a degree of conservation Sequence motifswhich show strong homology to group A minus35 elementsare present 18ndash19 bp upstream of the putative minus10elements and the distances between the minus35 -10 andTSS elements are consistent with those elements ofthe consensus sequence No protein encoding openreading frames were detected within the T6 T11 andT25 transcriptsIn a parallel study high density oligonucleotide micro-

arrays were also used to interrogate the transcriptomesof M tuberculosis H37Rv M bovis BCG Pasteur Myco-bacterium caprae and M bovis AN5 that had beengrown in Middlebrook 7H9 media As a result of theseexperiments two asRNA species were found to be

Golby et al BMC Genomics 2013 14710 Page 11 of 18httpwwwbiomedcentralcom1471-216414710

expressed within the antisense strands of the ino1 andnarH genes of M tuberculosis H37Rv but not in any ofthe other 3 strains tested (data not shown) A compari-son of nucleotide sequences of the orthologous genesacross the species suggested that expressions of theas_sRNAs correlated with the presence of a sSNP (C to T

transition at positions 50555 and 1292100 wrt H37Rvgenomic sequence for as_ino1 and as_narH respectively)upstream of the asRNAs Approximate information re-garding the transcriptional start site was deduced from thebinding co-ordinates of the probes that detected the tran-scripts As with the M bovis antisense sRNAs described

Figure 3 Expressions and schematic representation of genomic locations of selected cis-encoded antisense sRNAs identified using atiled oligonucleotide microarray Three asRNAs (open arrows) are (a) T6 (b) T14 and (c) T25 For each asRNA a histogram plots the foldchanges for each of the oligonucleotide probes that detected the asRNA and for each probe the binding position relative to the 212297genome is indicated Closed and open arrows indicate lengths and direction of transcription of genes and asRNAs respectively

Golby et al BMC Genomics 2013 14710 Page 12 of 18httpwwwbiomedcentralcom1471-216414710

above the T residue associated with the expression ofthe M tuberculosis asRNAs is part of a putative minus10element A minus35 element with an appropriate spacing tothe minus10 element was identified for as_ino1 but not foras_narH suggesting that the as_narH promoter maybelong to group B mycobacterial promoters that havea conserved minus10 but no minus35 motif [26]

DiscussionsThe aim of this work was to define possible phenotypicvariation across M bovis field isolates through a combin-ation of genome sequencing comparative genomics andtranscriptome analyses from both in vitro and ex vivo con-ditions Using these approaches we uncovered a range ofnovel findings the most striking of which was the realisa-tion that genes that had been predicted to be differentiallyexpressed based on amplicon-microarray data were in fact

not upregulated and that instead it was an antisense tran-script that was showing differential expression Analysingboth transcriptome and genome sequence data allowed usto identify SNPs responsible for the transcription of anti-sense RNAs with generation of a consensus minus10 promotersequence the likely mechanism Our results suggest thatdata generated from amplicon arrays in the past may needto be revisited as it is possible that some coding-sequences identified as being differentially expressed wereinstead antisense transcriptsWith the growth of technologies such as high density

tiled oligonucleotide microarrays and next generationsequencing there has been a rapid increase in the num-ber of reports describing the existence of non-codingRNAs (ncRNAs) in bacteria Non-coding RNAs broadlyconsist of two types cis- and trans-encoded RNA TransRNA includes intergenic encoded RNA while cis-encodedRNA includes 5prime and 3prime untranslated regions of mRNAand antisense RNA To study the expressions of both cis-and trans encoded ncRNA we used a high density oligo-nucleotide tiled microarray since our amplicon microarraywas unable to detect intergenic transcripts or differentiatebetween sense and antisense transcripts Previous studiesusing M tuberculosis have identified substantial amountsof ncRNA encoded in both intergenic and intragenic re-gions [78] We detected substantial amounts of ncRNAs

Figure 4 Promoters of anti sense RNAs a Promoters of the asRNAs as_mb1618c as_1914c and as_echA21 -10 and minus35 elements areindicated in bold and italics Transcriptional start sites are indicated by large font G characters while SNP residue that leads to the expression ofthe asRNA is indicated by a large font red T residue The consensus sequence for group a mycobacterial promoters is indicated Numericalsubscripts indicate the percentage of the total number of promoters for which a transcriptional start site has been experimentally determinedthat show the indicated residue b Promoters of the differentially expressed asRNAs as_ino1 and as_narH in M tuberculosis -10 and minus35 elementsare indicated in bold italics The red residue indicates SNP responsible for differential expression

Table 6 Determination of transcriptional start sites ofantisense RNAs

Transcript asRNA PCRprod size

TSSresidue type

TSS residue postn(wrt 212297)

T6 mb1618c_as 150 G 1778886

T11 mb1914c_as 220 G 2122978

T25 echA21_as 175 G 4155796

Golby et al BMC Genomics 2013 14710 Page 13 of 18httpwwwbiomedcentralcom1471-216414710

in M bovis including many instances of cis-antisenseRNA species Due to their perfect complementarity cisasRNA form a duplex with the sense strand encodedtranscript resulting in either degradation [27] or transla-tion inhibition [28] of the sense mRNA AntisenseRNAs vary in length ranging from 10s to 1000primes ofnucleotides and can be classified according to theirencoded position with respect to the opposite senseencoded gene Thus they can be classified as 5prime or 3primeoverlapping while others are classified as internallylocated The 112101 specific as_Mb1618c is an exampleof a 5prime overlapping asRNA which is encoded antisense togene Mb1618c which is predicted to express a secretorylipase The location of the asRNA transcript suggests itmay function to prevent translation of Mb1618c mRNAby steric hindrance of the ribosome binding siteIn the work presented here strain 245101 expressed

two asRNAs as_Mb1914c and as_echA21 that are notexpressed by any of the other 3 strains They areencoded within the central part of the opposite genesand are therefore likely to modulate the stability of thetranscripts Mb1914c encodes a short chain dehydro-genase while echA21 encodes an enoyl-CoA hydrataseShort chain dehydrogenases catalyse a wide range offunctions so the precise function and identity of thesubstrate is difficult to deduce from sequence aloneEnoyl-CoA hydratases hydrate double carbon-carbonbonds of macromolecules and are vital in the metabol-ism of fatty acids Both gene products would thereforeappear to be involved in the metabolism of a macromol-ecule and their similar expression profiles in this straincould indicate involvement in the metabolism of the samemolecule or molecules that are of the same pathwayIn many instances upregulation of asRNA negatively

correlates with the transcription of the antisense gene[27] but in many cases expression of the antisense tran-script has no effect on the transcription of the oppositegene In our studies expressions of as_Mb1618cas_Mb1914c and as_echA21 did not appear to have anyeffect on the expressions of the opposite sense encodedgenes (data not shown) We have shown that theexpressions of the asRNAs are associated with the pres-ence of SNPs which are either synonymous or non-synonymous with respect to the sense transcript butupstream of the asRNA transcriptional start site Thishighlights the fact that mutations can potentially affectexpression of transcripts on both strands and that theclassification of a SNP is strand dependent For each ofthe three asRNAs the associated SNP was found to belocated within a putative minus10 promoter motif of groupA mycobacterial promoters The sixth residue of the minus10hexamer motif consensus sequence is a strongly conservedlsquoTrsquo residue which is present in 81 of all group A myco-bacterial promoter elements Its importance is underlined

by the finding that the strains that exhibit a lsquoCrsquo residue atthis position show no detectable expression of the asRNAwhile strains having a lsquoTrsquo residue at this position exhibitexpressionSingle nucleotide polymorphisms were found to be

the most frequent form of genetic variation that existsbetween the isolates with a total of 1013 SNPs detectedacross the three strains 112101 245101 and 130701compared to the reference strain M bovis 212297Non-synonymous SNPs which include both non-senseand missense SNPs are a class of SNPs most likely toimpact on protein function and contribute to phenotypicvariation Non-sense SNPs which results in the expressionof a truncated polypeptide due to the introduction of apremature stop codon were identified in five genes acrossthe strains Of these we focussed our attentions on thenon-sense SNP present in a gene encoding the LysR regu-lator Mb2007c as mutations affecting regulators are likelyto impact on the expression of one or more genes that arepart of the regulon of the regulator and are therefore morelikely to result in phenotypic variation Experiments tocompare the transcriptomes of a strain that exhibited themutation with a strain overexpressing a functional regula-tor did not however reveal any differences The reasonfor this unclear but could reflect a requirement of theregulator for a co-inducer that was absent under the con-ditions of the experiment The consequences of missenseSNPs are more difficult to predict as substitutions of oneamino acid for another in a protein sequence do not ne-cessarily lead to a change in protein function Howeverfor genes that are controlled by an autoregulatory mech-anism a mutation that affects the ability of the product ofthe gene to regulate itself will result in a change in expres-sion of the gene In our studies we have shown that thepresence of a missense mutation in a VapB type toxin en-coding gene Mb1749c in strain 130701 results in theupregulation in expression of the toxin-antitoxin encodingpair of genes Mb1750c-Mb1749c due to the inability ofthe encoded proteins to self regulate themselves Toxin-antitoxin systems have a variety of proposed cellular func-tions including general regulation of mRNA stability levelsin the cell [29] Further experiments are required to fullyunderstand the consequences of this mutationGenomic deletions have played an important role in

the evolution of strains belonging to the mycobacterialcomplex [30] and in the derivation of the tuberculosisvaccine strain M bovis BCG [31] In addition to the previ-ously described 68 kb gene deletion that is specific tostrains having a spoligotype 17 pattern [6] we have identi-fied a 16 kb multi-gene deletion that is specific to strain130701 and encompasses genes that are part of the DosRregulon However one of the deleted genes exists as apseudo gene in DosR in strain 212297 so its importanceto the biology of M bovis is unlikely to be significant

Golby et al BMC Genomics 2013 14710 Page 14 of 18httpwwwbiomedcentralcom1471-216414710

Several other genes with internal deletions were detectedbut none of the encoded proteins have any significantsimilarity to any protein with a defined function

ConclusionsIn conclusion we have performed a comprehensive ana-lysis of 4 M bovis strains of the most common moleculartypes circulating in GB We show that while these strainsshow extensive similarities in their genetic make-up andgene expression profiles they show distinct differences inthe expression of a subset of genes We provide functionaldata to show that SNPs can lead to the expression of anti-sense RNA a finding with implications for how we definea lsquosilentrsquo nucleotide change Furthermore we show thatthe interpretation of transcriptome data based solely onamplicon arrays could lead to artefacts due to expressionof antisense transcripts a caveat that needs to be kept inmind for previous studies of global expression analysis inbacteria

MethodsBacterial strains media and growth conditionsFor the Mϕ infection experiments bovine alveolar Mϕwere cultivated in tissue culture media R10 whichconsisted of RPMI (Invitrogen) media plus 2 mM glu-tamine 10 calf fetal serum and 1 amphotericinWhere used antibiotics gentamycin and ampicillinwere added at concentrations of 50 and 100 μg mlrespectively M bovis field strains were pre-grown inMiddlebrook 7H9 broth supplemented with 10 albumin-dextrose catalase (ADC Difco) 005 Tween and 10 mMpyruvate Cultures were harvested in mid-logarithmicphase (OD600 of 03-08) washed and then resuspended inRPMI containing 005 Tween 80

Isolation of bovine alveolar macrophages and infectionwith mycobacteriaThe lungs of a 6ndash8 week old male Holstein-Friesian calfwere removed and a whole lung lavage procedure wasperformed to washout the alveolar Mϕ Briefly 4ndash5 x500 ml aliquots of Hanksrsquo Balanced Sterile Salts solution(HBSS) were used to infuse the lungs via the tracheaand the washings were pooled in a sterile beaker The Mϕcells contained in the washes were pelleted by centrifuga-tion at 500 x g for 10 mins at 4degC washed and thenresuspended in R10 growth media supplemented withantibiotics (R10+) to a concentration of 1ndash2 x 107 mlApproximately 05-15 x 109 Mϕ were isolated per calflungVented 225 cm2 tissue culture flasks containing R10+

media were seeded with 3ndash4 x 107 alveolar Mϕ andplaced in a humidified 37degC incubator containing 5CO2 Typically 2ndash4 flasks were used per strain and timepoint After 24 hrs the growth media was decanted to

remove non-adherent cells and then replaced with freshR10+ media After a further 24 hrs the growth mediawas discarded and the Mϕ monolayer was washed withRPMI to remove traces of the antibiotic containinggrowth media The monolayer was then covered withR10 media without antibiotics (R10-) and then infectedwith mid-logarithmic phase grown mycobacteria usingan MOI of approximately 101 (bacilli Mϕ) The AlvMϕwere incubated with mycobacteria for 4 hrs after whichthe cell monolayers were washed with RPMI and theneither processed for RNA extraction (4 hr time point)or incubated in fresh R10+ media for a further 20 hrsbefore being processed for RNA extraction (24 hr timepoint)

Extraction and amplification of mycobacterial RNA frominfected macrophagesMϕ cell monolayers were lysed using a guanidiniumthiocyanate (GTC) containing solution The lysed Mϕrsquoswere vortexed and passed twice through a 21G bluntended needle to sheer host genomic DNA and therebyreduce the viscosity of the solution Mycobacterial cellswere then pelleted by centrifugation at 4600 rpm for 20mins at room temperature and washed with GTC solu-tion to remove host genomic DNA Cells were thenresuspended in Trizol and RNA was extracted using theprotocol outlined in Bacon et al [32] The amount ofpurified DNase-treated RNA recovered was of the order100ndash500 ng per time point RNA was amplified usingthe lsquoMessageAmp II-Bacteria RNA Amplification Kitrsquo(Ambion) according to the manufacturersrsquo instructionsUsing an input of 100 ng of unamplified RNA 20ndash100μg of amplified RNA was recovered

Amplicon microarray analysisFor the in vitro growth experiments three independentexperiments (biological replicates) were carried out andfor each strain in each experiment two microarrays (tech-nical replicates) were performed Thus for each strain 6microarrays were performed Three independent AlvMϕinfection experiments were carried out and for eachexperiment two microarrays were performed for each ofthe control RPMI samples and the 4- and 24 hrs post-infection samples Cy5 and Cy3 fluorescently-labelledprobes were synthesised from RNA and genomicDNA respectively and hybridised to whole genomeM bovis M tuberculosis microarrays The array designis available in BμGSbase (accession number A-BUGS-31httpbugssgulacukA-BUGS-31) and also ArrayExpress(accession number A-BUGS-31) Details of probe synthesishybridization conditions and manufacture of the micro-array can be found in Golby et al 2008 Microarrayswere scanned using an Affymetrix 428 Microarrayscanner and scanned images were quantified using

Golby et al BMC Genomics 2013 14710 Page 15 of 18httpwwwbiomedcentralcom1471-216414710

BlueFuse for Microarrays v32 software (BlueGnome)See Golby et al 2008 for further detailsNormalisation was performed by dividing the log ratio

of the Cy5 to Cy3 signal for every spot by the median ofthe log ratios for all spots except control spots A medianabsolute (MAD) scale transformation was applied to thenormalised data from the pas an additional normalisationstep For every microarray duplicate spots were averagedand then the average expression of every gene across alltechnical replicate microarrays was calculated Averagesof the three biological replicates were used to comparegene expression between strains For each gene a mod-erated t-test was applied and those genes with a P- valueless than 005 were selected From this gene list thosegenes whose average expression differed by more than25-fold between strains were selected Fully annotatedmicroarray data have been deposited in BμGSbase (acces-sion number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (accession number E-BUGS-150)

Oligonucleotide microarray analysisExperiments were performed in a similar way to thatdescribed for the in vitro amplicon array experimentsexcept that the RNA was purified using the mirVanamiRNA Isolation kit (Ambion) which is designed tocapture small (gt20 nt) RNA species RNA and genomicDNA were directly labelled with Cy5- and Cy3 respect-ively using the ULS microRNA labelling kit (Kreatech)according to the manufacturerrsquos instructions PurifiedCy5- and Cy3-labelled probes were co-purified andapplied to an Agilent 40K custom made tiled (10 ntoverlap) 60-mer oligonucleotide microarray designed tothe genomic sequence of M bovis 21229797 The arraydesign is available in BμGSbase (accession numberA-BUGS-52 httpbugssgulacukA-BUGS-52) and alsoArrayExpress (accession number A-BUGS-52) Microarrayswere hybridised at 65degC for 18 hrs and then washed inWash Buffer 1 (Agilent) at room temperature for 1 minuteThe slides were then washed in Wash Buffer 2 (Agilent) at37degC for 1 minute dried and then scanned at 2 μm usingan Agilent DNA microarray scannerTiling array data was analysed using the Limma package

of RBioconductor [33] The signal median was quan-tile normalised between arrays followed by a LOESSnormalisation within arrays Differential expressionanalysis was performed by pairwise comparison usinglinear models and empirical Bayes methods and Pvalues adjusted using the Benjamini and Hochbergsmethod to control for multiple testing Fully annotatedmicroarray data have been deposited in BμGSbase(accession number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (accession num-ber E-BUGS-150)

Whole genome sequencing of M bovis field isolatesWhole genome paired end (2 x 76bp) sequencing wasperformed using Illumina HiSeq machines at theWellcome Trust Sanger Institute (Hinxton Cambridge)Raw sequence data was uploaded to the European Nucleo-tide Archive (ENA) and can be downloaded at httpwwwebiacukena accession numbers ERX006616 ERX06617ERX012284 and ERX012286 The FastQC (httpwwwbioinformaticsbabrahamacukprojectsfastqc) programwas used to analyse the quality of the raw data reads Rawsequence data was trimmed to remove adapter sequencesand nucleotides where the sequence quality score wasbelow 20 Filtered reads were aligned to the M bovis refer-ence strain 212297 [10] with the SMALT alignmentprogram (httpwwwsangeracukresourcessoftwaresmalt)using default settings The average mean coverages for thesequenced strains 112101 245101 and 130701 were293 224 and 234 respectively Reads that did not maponto the reference genome were de novo assembled andblasted against the NCBI data base in order to find regionspresent in these strains but absent in 212201 Variantcalling and the generation of consensus sequenceswere carried out using the SAMtools program suite(httpsamtoolssourceforgenet) SNPs that had a mini-mum of quality score of 200 and had a minimum of fourgood quality forward (reverse) reads covering the SNP siteand having the variant base were retained The number offorward (reverse) reads mapping with good quality ontothe SNP site having the same base as in the reference hadto be less than 5 of the total number of forward (reverse)reads mapping with good quality onto the SNP site havingthe variant base Sequences of 12 bases in length centredaround SNP sites were blasted to the genome of the refer-ence strain 212297 and any SNP within an area with a90 hit to another area of the genome were filtered outThis eliminated most of the SNPs within the repetitivePE-PGRS regions For all sequenced strains more than999 of the genomes were covered by reads

Real time RT-PCRQuantitative real-time SYBR Green based PCR (qRT-PCR)experiments were performed using a RotorGene 3000(Corbett research) as described by Golby et al [23] Foldchanges were calculated using relative standard curvemethod and pcr controls included no template and noreverse transcriptase Primer pair sequences are given inAdditional file 5 available with the online version ofthis paper

Construction of the Rv1749c-Rv1750c and nirBDoverexpressing plasmidsThe Rv1749c-Rv1750c overexpressing plasmids pPG106and pPG107 were constructed by PCR amplification of a907bp fragment encompassing Rv1749c-Rv1750c-lsquoRv1751c

Golby et al BMC Genomics 2013 14710 Page 16 of 18httpwwwbiomedcentralcom1471-216414710

using primers tox_f and tox_r For pPG106 the fragmentwas amplified using 21229797 genomic DNA as a tem-plate while pPG107 was amplified using 130701 gDNABoth fragments were digested with SpeI and cloned intothe SpeI cut mycobacterial attP-integrating shuttle vectorpKINT (a gift from Douglas Young Imperial CollegeLondon) Plasmids pPG108 and pPG109 which contain a35 kb hsprsquo-nirB-nirD-lsquocobU fragment was constructed inseveral steps Firstly two 16 kb hsprsquo-nirBrsquo PCR fragmentswere PCR amplified separately using primers nirB1_f andnirB1_r and 212297 and 130701 as genomic templatesSimilarly two 18kb lsquonirB-nirD-lsquocobU fragments wereamplified using primers nirB2_f and nirB2_r and genomicDNAs 212297 and 130701 The two PCR products weredigested with SpeI and BamHI and then co-ligated intopKINT Details concerning the nucleotide sequences ofthe pcr primer pairs are given in the Additional file 5

5prime-RLM-RACE PCRTranscriptional start site mapping of was determinedusing the First Choice RNA ligase-mediated rapid amp-lification of cDNA ends (RLM-RACE) kit (Ambion) asper manufacturers instructions Briefly 10 ug of totalRNA was treated with calf intestinal phosphatase (CIP)and tobacco acid pyrophosphatase (TAP) before ligationof an RNA Adapter oligonucleotide to the 5prime ends ofthe mRNA transcripts A random-primed reverse tran-scription reaction was carried out to generate cDNAand then nested PCR reactions were performed on thecDNA using combinations of adapter and gene specificprimers Details concerning the nucleotide sequencesof the 5prime outer and inner adapter sequences as well as3prime outer and inner gene specific primers are given inthe Additional file 5 PCR products generated usingthe 5prime inner adapter and 3prime inner gene specific primerswere sequenced by Sanger sequencing using the 3prime innergene specific primer

Animal ethics statementAnimal work was carried out according to the UK Animal(Scientific Procedures) Act 1986 The study protocol wasapproved by the AHVLA Animal Use Ethics Committee(UK Home Office PCD number 706905)

Availability of supporting data sectionFully annotated microarray data have been deposited inBμGSbase (accession number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (ac-cession number E-BUGS-150) Raw sequence data wasuploaded to the European Nucleotide Archive (ENA) andcan be downloaded at httpwwwebiacukenadataviewERX006616-ERX06617ERX012284-ERX012286

Additional files

Additional file 1 SNPs identified across sequenced strains

Additional file 2 Large deletions identified by NGS in sequencedstrains

Additional file 3 Fold change differences in gene expression inM bovis field isolates 1121 2451 and 1307 compared to 2122Cells shaded in red indicate upregulation green indicate down-regulation and empty cells indicate no change in expression

Additional file 4 Differential expression of RNA transcripts asdetected by a tiled oligonucleotide microarray Up and down arrowsindicate fold up- and downregulation respectively and empty cellsindicate no change in expression Strand A indicates antisense S senseand I intergenic

Additional file 5 Details of oligonucleotides used in PCR RT-PCRand RLM-RACE experiments

Competing interestsThe authors declare that they have no competing interests

Authorsrsquo contributionsPG carried out the culturing of organisms macrophage infectionsmicroarrays qRT-PCR cloning analysis of sequence data and writing of themanuscript JN performed analysis of the Illumina sequence and PCRamplicon microarray data AW and JH performed the analysis of the Agilentoligonucleotide microarray data and assisted in the uploading of themicroarray data into the BμGSbase database MAQ and SB wereresponsible for the Illumina sequencing SH contributed to the analysis ofthe Illumina sequence data NS contributed to the writing and critical reviewof the manuscript RGH was a co-investigator who helped to guide theproject SG was the principle investigator on the project who guided theresearch and helped in the writing of the manuscript All authors read andapproved the final manuscript

AcknowledgementsWe would like to acknowledge Kate Gould for her help in carrying out theoligonucleotide microarrays and Denise Waldron for her assistance inuploading of the microarray data into BμGSbase We also thank MarkChambers for comments on the manuscript and BBSRC and Defra forfunding the work (Grant no BBE0184911)

Author details1Animal Health and Veterinary Laboratories Agency Woodham Lane NewHaw Addlestone Surrey KT15 3NB UK 2Division of Clinical Sciences BacterialMicroarray Group Centre for Infection amp Immunity St Georgersquos University ofLondon Cranmer Terrace London SW17 0RE UK 3The Wellcome TrustSanger Institute Hinxton Cambridgeshire UK 4UCD School of VeterinaryMedicine and UCD Conway Institute University College Dublin Dublin 4Ireland

Received 4 June 2013 Accepted 26 September 2013Published 17 October 2013

References1 Monaghan ML Doherty ML Collins JD Kazda JF Quinn PJ The tuberculin

test Vet Microbiol 1994 40(1ndash2)111ndash1242 Kamerbeek J Schouls L Kolk A van Agterveld M van Soolingen D Kuijper

S Bunschoten A Molhuizen H Shaw R Goyal M et al Simultaneousdetection and strain differentiation of Mycobacterium tuberculosis fordiagnosis and epidemiology J Clin Microbiol 1997 35(4)907ndash914

3 Frothingham R Meeker-OrsquoConnell WA Genetic diversity in theMycobacterium tuberculosis complex based on variable numbers oftandem DNA repeats Microbiology 1998 144(Pt 5)1189ndash1196

4 Smith NH Dale J Inwald J Palmer S Gordon SV Hewinson RG Smith JMThe population structure of Mycobacterium bovis in Great Britain clonalexpansion Proc Natl Acad Sci USA 2003 100(25)15271ndash15275

5 Winder CL Gordon SV Dale J Hewinson RG Goodacre R Metabolicfingerprints of Mycobacterium bovis cluster with molecular type

Golby et al BMC Genomics 2013 14710 Page 17 of 18httpwwwbiomedcentralcom1471-216414710

implications for genotype-phenotype links Microbiology 2006152(Pt 9)2757ndash2765

6 Wheeler PR Brosch R Coldham NG Inwald JK Hewinson RG Gordon SVFunctional analysis of a clonal deletion in an epidemic strain ofMycobacterium bovis reveals a role in lipid metabolism Microbiology2008 154(Pt 12)3731ndash3742

7 Arnvig KB Young DB Identification of small RNAs in Mycobacteriumtuberculosis Mol Microbiol 2009 73(3)397ndash408

8 Arnvig KB Comas I Thomson NR Houghton J Boshoff HI Croucher NJRose G Perkins TT Parkhill J Dougan G et al Sequence-based analysisuncovers an abundance of non-coding RNA in the total transcriptome ofMycobacterium tuberculosis PLoS Pathog 2011 7(11)e1002342

9 Smith NH Upton P Naming spoligotype patterns for the RD9-deletedlineage of the Mycobacterium tuberculosis complex www MbovisorgInfect Genet Evol 2012 12(4)873ndash876

10 Garnier T Eiglmeier K Camus JC Medina N Mansoor H Pryor M Duthoy SGrondin S Lacroix C Monsempe C et al The complete genome sequenceof Mycobacterium bovis Proc Natl Acad Sci USA 2003 100(13)7877ndash7882

11 Albers CA Lunter G MacArthur DG McVean G Ouwehand WH Durbin RDindel accurate indel calls from short-read data Genome Res 201121(6)961ndash973

12 Mostowy S Inwald J Gordon S Martin C Warren R Kremer K Cousins DBehr MA Revisiting the evolution of Mycobacterium bovis J Bacteriol2005 187(18)6386ndash6395

13 Boon C Dick T Mycobacterium bovis BCG response regulator essentialfor hypoxic dormancy J Bacteriol 2002 184(24)6760ndash6767

14 Van Gelder RN von Zastrow ME Yool A Dement WC Barchas JD EberwineJH Amplified RNA synthesized from limited quantities of heterogeneouscDNA Proc Natl Acad Sci USA 1990 87(5)1663ndash1667

15 Pandey DP Gerdes K Toxin-antitoxin loci are highly abundant in free-living but lost from host-associated prokaryotes Nucleic Acids Res 200533(3)966ndash976

16 Arcus VL McKenzie JL Robson J Cook GM The PIN-domain ribonucleasesand the prokaryotic VapBC toxin-antitoxin array Protein Eng Des Sel 201124(1ndash2)33ndash40

17 Bukowski M Rojowska A Wladyka B Prokaryotic toxin-antitoxin systemsndashthe role in bacterial physiology and application in molecular biologyActa Biochim Pol 2011 58(1)1ndash9

18 Maddocks SE Oyston PC Structure and function of the LysR-typetranscriptional regulator (LTTR) family proteins Microbiology 2008154(Pt 12)3609ndash3623

19 Moreno-Vivian C Cabello P Martinez-Luque M Blasco R Castillo FProkaryotic nitrate reduction molecular properties and functionaldistinction among bacterial nitrate reductases J Bacteriol 1999 181(21)6573ndash6584

20 Tiffert Y Supra P Wurm R Wohlleben W Wagner R Reuther J TheStreptomyces coelicolor GlnR regulon identification of new GlnR targetsand evidence for a central role of GlnR in nitrogen metabolism inactinomycetes Mol Microbiol 2008 67(4)861ndash880

21 Duan J Wainwright MS Comeron JM Saitou N Sanders AR Gelernter JGejman PV Synonymous mutations in the human dopamine receptor D2(DRD2) affect mRNA stability and synthesis of the receptor Hum MolGenet 2003 12(3)205ndash216

22 Capon F Allen MH Ameen M Burden AD Tillman D Barker JN TrembathRC A synonymous SNP of the corneodesmosin gene leads to increasedmRNA stability and demonstrates association with psoriasis acrossdiverse ethnic groups Hum Mol Genet 2004 13(20)2361ndash2368

23 Golby P Hatch KA Bacon J Cooney R Riley P Allnutt J Hinds J Nunez JMarsh PD Hewinson RG et al Comparative transcriptomics reveals keygene expression differences between the human and bovine pathogensof the Mycobacterium tuberculosis complex Microbiology 2007153(Pt 10)3323ndash3336

24 Bitter W Houben EN Bottai D Brodin P Brown EJ Cox JS Derbyshire KFortune SM Gao LY Liu J et al Systematic genetic nomenclature for typeVII secretion systems PLoS Pathog 2009 5(10)e1000507

25 Sorek R Kunin V Hugenholtz P CRISPRndasha widespread system thatprovides acquired resistance against phages in bacteria and archaeaNat Rev Microbiol 2008 6(3)181ndash186

26 Gomez M Smith I Determinants of Mycobacterial Gene Expression InMolecular Genetics of Mycobacteria Washington DC ASM Press2000111ndash129

27 Duhring U Axmann IM Hess WR Wilde A An internal antisense RNAregulates expression of the photosynthesis gene isiA Proc Natl Acad SciUSA 2006 103(18)7054ndash7058

28 Kawano M Aravind L Storz G An antisense RNA controls synthesis of anSOS-induced toxin evolved from an antitoxin Mol Microbiol 200764(3)738ndash754

29 Thomason MK Storz G Bacterial antisense RNAs how many are thereand what are they doing Annu Rev Genet 2010 44167ndash188

30 Brosch R Pym AS Gordon SV Cole ST The evolution of mycobacterialpathogenicity clues from comparative genomics Trends Microbiol 20019(9)452ndash458

31 Calmette A Guerin C Sur quelques proprietes du bacille tuberculeuxdrsquoorigine cultive sur la bile de boeuf glycerinee C R Acad Sci Paris 1909149716ndash718

32 Bacon J James BW Wernisch L Williams A Morley KA Hatch GJ ManganJA Hinds J Stoker NG Butcher PD et al The influence of reduced oxygenavailability on pathogenicity and gene expression in Mycobacteriumtuberculosis Tuberculosis (Edinb) 2004 84(3ndash4)205ndash217

33 Smyth GK Linear models and empirical bayes methods for assessingdifferential expression in microarray experiments Stat Appl Genet Mol Biol2004 33

doi1011861471-2164-14-710Cite this article as Golby et al Genome-level analyses ofMycobacterium bovis lineages reveal the role of SNPs and antisensetranscription in differential gene expression BMC Genomics 2013 14710

Submit your next manuscript to BioMed Centraland take full advantage of

bull Convenient online submission

bull Thorough peer review

bull No space constraints or color figure charges

bull Immediate publication on acceptance

bull Inclusion in PubMed CAS Scopus and Google Scholar

bull Research which is freely available for redistribution

Submit your manuscript at wwwbiomedcentralcomsubmit

Golby et al BMC Genomics 2013 14710 Page 18 of 18httpwwwbiomedcentralcom1471-216414710

to 130701 It was not readily apparent why a sSNP in thecoding sequence of a gene should lead to an increase inexpression of that gene but there are several reports thatshow sSNPs leading to changes in stability of mRNA tran-scripts [2122] Rv0987 and Rv0988 of M tuberculosisH37Rv encode part of an ABC transporter and a putativesecreted hydrolase respectively In 212297 a single basetransition (G-A) introduces a stop codon that splitsRv0987 into the two pseudogenes Mb1013 and Mb1014Previous microarray based gene expression studies by ourgroup have shown that Rv0987 and Rv0988 in M tubercu-losis show higher levels of expression than the orthologousMb1013Mb1014 and Mb1015 respectively in M bovis212297 [23] and in the present study the Mb1013Mb1014 and Mb1015 homologues in 245101 and 130701 also showed higher expression (up to 10-fold) than thehomologues in 212297 and 112101 Comparing thesequences of Mb1013Mb1014 and Mb1015 across all 4strains indicated that strains that show high expressionhave the lsquoGrsquo alleleMb3477c encodes an ATP binding membrane protein

part of the Esx4 secretion system [24] and gene shows upto 10-fold higher expression in 245101 and 130701 com-pared to 212297 The gene also contains an A to C transi-tion at position 3812465 a nSNP at position resulting inthe non-conservative substitution of a serine to a glycineresidue

Of the 19 genes that show specific differential expres-sion in the Mϕ the most notable are Mb1914c andechA21 which show upregulation in 245101 only (upto 6- and 23-fold respectively) Both genes encodeproteins that could be involved in lipid metabolism andboth genes contain single sSNPs that are present in245101 but absent in the other three strainsReal time RT-PCR was used to verify a selection of

genes that showed differential gene expression as pre-dicted by the microarray analysis Figure 2 comparesthe fold changes in the expression levels of 4 genes asmeasured by microarray and real time RT-PCR ThenirB and Mb1749c genes were selected because theyshowed strong upregulation in 130701 in both in vitroand ex vivo Mϕ while Mb1914c and echA21 were chosenbecause the array data predicted them to be specificallyupregulated in 245101 and only in ex vivo Mϕ Foreach of the 4 genes the strain dependent pattern ofexpression as measured by real time RT-PCR was consistentwith that measured by microarray although the foldchanges measured by real time RT-PCR were higherthan those measured by microarray

Functional analysis of SNP role in differential geneexpressionThe above data showed that many of the strain specificdifferences in gene expression were linked to the presence

Figure 2 Confirmation of amplicon microarray results with real time RT-PCR The fold changes in gene expression for (a) Mb1750c (b) nirB(c) echA21 and (d) Mb1914c measured by microarray (open bars) in each of the four strains were compared to that measured by real time RT-PCR(closed bars)

Golby et al BMC Genomics 2013 14710 Page 8 of 18httpwwwbiomedcentralcom1471-216414710

of synonymous or non-synonymous SNPs located withinthe coding regions of the genes that show variable expres-sion Non-synonymous SNPs lead to changes in aminoacid sequence which can lead to changes in protein func-tion The C to T transition at position 1932704 (wrt 212297) in the coding sequence of the 130701 Mb1749chomologue leads to the non-conservative substitution ofGly19 to Asp and this nSNP appears to be linked to theupregulation of both Mb1749c and Mb1750c in thatstrain In order to confirm this a 09 kb DNA fragmentcontaining the Mb1749c-Mb1750c-lsquoMB1751c region of130701 (containing the lsquoTrsquo allele) and the equivalentregion from 212297 (with the lsquoCrsquo allele) were PCRamplified and the fragments were cloned separatelyinto the mycobacterial shuttle vector pKINT (see Methods)to create the constructs pPG107 and pPG106 respectivelyThe two constructs were introduced into Mycobacteriumsmegmatis mc2155 separately and then the expression ofMb1749c and Mb1750c in M smegmatis pPG101 wascompared to that of M smegmatis pPG102 using real timeRT-PCR Table 4 shows that the expression levels ofMb1750c and Mb1749c in the strain expressing themutated forms of Mb1749cMb1750c are 13- and 9-foldhigher respectively compared to the strain expressingthe wild type forms confirming that this SNP is respon-sible for the observed up-regulation of the two genes in130701Synonymous substitutions do not lead to changes in

protein sequence and have generally been considered tobe lsquosilentrsquo or benign Recent studies however have sug-gested that sSNPs can have functional effects such asdecreased mRNA stability and translation [2122] Inour own studies we have found several genes whose ex-pression levels correlate with the presence of sSNPs inthe coding regions of those genes For example a C-Ttransversion at position 303227 (wrt 212297) withinthe coding sequence of nirB of 130701 is a sSNP thatappears to be linked with the upregulation in expressionof nirB within that strain To confirm that this is thecase we PCR amplified 35 kb DNA fragments containing

the hsp-nirB-nirD-cobU region of strain 130701 (containingthe lsquoCrsquo allele) and the equivalent region from strain 212297(with the lsquoTrsquo allele) and cloned them separately into theintegrating vector pKINT to create the constructspPG108 and pPG109 respectively These constructswere introduced into 212297 and the expression levelsof nirB and nirD were found to be 30- and 2-fold re-spectively higher in the strain expressing the mutatednirBD locus compared to the strain expressing the wild-type form (Table 4) This confirms that this mutation isresponsible for the upregulation of the two genes in thisstrain

Use of a high density tiled oligonucleotide microarray todetect differentially expressed small RNA transcripts inM bovis isolatesThe M tuberculosisM bovis amplicon arrays used inthe present study were specifically designed to measureexpression levels of genes annotated in the genomic se-quence of M bovis 212297 [10] They were not how-ever designed to monitor the expression of non-codingRNA such as small RNA within intergenic regions orantisense sRNA Hence a high density tiled oligonucleotidemicroarray consisting of approximately 180000 partially-overlapping (10-base overlap) short 60 mer oligonucleo-tides was designed that offered an unbiased approach tothe detection of strand specific transcripts encoded overthe entire M bovis 212297 chromosome Total RNAthat includes small sized (lt100 nt) RNA species wasextracted from the four M bovis strains that had beengrown in liquid media and hybridised to the oligo-nucleotide microarray To avoid potential secondarystrand synthesis during cDNA synthesis which couldbe interpreted as sRNA the RNA was directly labelledwith cyanine based dyes After pairwise comparisonswere performed between 212297 and 112101 245101or 130701 220 oligonucleotide probes were identifiedthat detected differentially expressed transcripts (25 foldcut off ) in one or more of the three strains (Additionalfile 4) Only transcripts detected by multiple (2 or more)overlapping probes were regarded as genuine transcriptsas those detected by single probes could be due to cross-hybridisation effects or represent spurious transcriptsUsing these criteria 26 transcripts designated T1-T26were found to show differential expression in one or moreof the strains (Table 5) and those transcripts can be di-vided into those that are encoded within intergenic re-gions and those encoded within the genomic co-ordinatesencompassing annotated coding sequences Comparisonof the differentially expressed gene lists identified usingamplicon vs oligonucleotide arrays (Tables 3 and 5) itis clear that many of the transcripts detected using theamplicon arrays are not necessarily encoded on thesense gene strand as had been previous interpreted For

Table 4 Confirmation of SNP linkage to upregulation ingene expression

Strain Gene Fold change

M smegmatis mc2155 (pPG101) Mb1750c 161 plusmn 32

M smegmatis mc2155 (pPG101) Mb1749c 125 plusmn 12

M smegmatis mc2155 (pPG102) Mb1750c 19 plusmn 08

M smegmatis mc2155 (pPG102) Mb1749c 14 plusmn 04

Mb 212297 (pPG108) nirB 20 plusmn 10

Mb 212297 (pPG108) nirD 13 plusmn 05

Mb 212297 (pPG109) nirB 529 plusmn 159

Mb 212297 (pPG109) nirD 31 plusmn 08

Fold changes are the mean ratios plusmn standard deviation

Golby et al BMC Genomics 2013 14710 Page 9 of 18httpwwwbiomedcentralcom1471-216414710

Table 5 Differential expression of RNA transcripts as detected by a tiled oligonucleotide microarray

Transcript No Probes Position Size cds Strand 112101 245101 130701 CDS Product SNP

T1 3 303138-303295 157 Mb0258nirB A 3uarr nitrite reductase (large sub-unit)

T2 5 303260-303515 255 Mb0258nirB S 5uarr nitrite reductase (large sub-unit) sSNP at 303227 C-T in 130701 only

T3 2 1105085-1105193 108 Mb1014 A 3uarr 3uarr probable adhesion componentof ABC transporter

T4 2 1105158-1105315 157 Mb1014 A 3uarr 3uarr probable adhesioncomponent of ABC transporter

T5 3 1105158-1105315 157 Mb1014 S 3uarr 3uarr probable adhesioncomponent of ABC transporter

T6 5 1778876-1779131 255 Mb1618c A 9uarr possible secreted lipase sSNP at 1778879 C-T in 112101 only

T7 2 1840174-1840282 108 Mb1672c S 4darr hypothetical protein

T8 3 1932158-1932315 157 tRNA-Pro S 3uarr proline tRNA

T9 5 1932452-1932707 255 Mb1749c S 4uarr toxin component oftoxin-antitoxin system

nSNP at 1932704 C-T (G to D) in 130701 only

T10 5 1932746-1933075 329 Mb1750c S 7uarr anti-toxin component oftoxin-antitoxin system

T11 6 2122959-2123263 304 Mb1914c A 12uarr probable short-chain typedehydrogenasereductase

sSNP at 2122970 C-T in 245101 only

T12 2 2324463-2324571 108 Mb2110 S 3uarr hypothetical protein(frame-shifted in 212297)

T13 2 2328944-2329052 108 Mb2116cpepE S 2darr 3darr probable dipeptidase

T14 2 2329483-2329591 108 Mb2117 A 10darr 10darr 5prime-3prime exonuclease nSNP at 2329583 A-G (comp strand I to T)in 245101 only

T15 3 2868310-2868467 157 Mb2607 A 6darr 8darr hp (frame-shifted in 212297)

T16 2 2868506-2868614 108 Mb2607 A 7darr 7darr hp (frame-shifted in 212297) nSNP at 2868616 A-G (stop to W) in245101 and 130701

T17 2 3075740-3075848 108 I 5uarr 6uarr

T18 2 3075887-3075995 108 I 5uarr 6uarr

T19 2 3076622-3076730 108 I 4uarr 4uarr

T20 2 3076769-3076877 108 I 5uarr 5uarr

T21 2 3078728-3078836 108 I 6uarr 7uarr

T22 2 3079169-3079277 108 I 4uarr 6uarr

T23 2 3079903-3080011 108 I 6uarr 7

T24 2 3844577-3844685 108 Mb3509c S 3uarr 3uarr possible acyltransferase(frame-shifted in 212297)

T25 6 4155501-4155805 304 Mb3803echA21 A 5uarr possible enoyl-coA hydratase sSNP at 4155803 C-T (comp strand) in 245101 only

T26 2 4316987-4317095 108 Mb3927c A 3uarr chp (frame-shifted in H37Rv)

Up and down arrows indicate fold up- and downregulation respectively and empty cells indicate no change in expressionStrand A indicates antisense S sense and I intergenic

Golby

etalBM

CGenom

ics201314710

Page10

of18

httpwwwbiom

edcentralcom1471-216414710

example the amplicon array data had appeared to suggestthat Mb1914c and echA21 were upregulated in 245101but the oligo array data indicates that transcripts 11 and25 which are encoded within the co-ordinates encoded bythose two genes are actually encoded on the antisensestrands This apparent discrepancy can be rationalisedonce we consider that double stranded amplicon micro-array probes cannot discriminate between transcriptsencoded on the sense or antisense strands Transcripts 11and 25 can therefore be considered as potential antisensesRNAs (asRNA) which could be involved in translationalor post-transcriptional control of the sense transcriptOther potential cis-encoded sRNAs detected using the ar-rays include T6 T14 and T15T16 which are encoded onthe antisense strands to Mb1618c Mb2117 and Mb2607respectively and for each of these transcripts their expres-sion appears to be linked to the presence of a single SNPwithin the co-ordinates of the genes The approximateboundaries of these transcripts can be derived from thegenomic co-ordinates of the oligonucleotide probes thatdetect the expression of the transcript Thus the tran-scripts appear to be between 100ndash300 nt in size and thepositions of the linked SNPs appear to be positioned eitherjust upstream or within the predicted 5primeend of the tran-scripts (Figure 3) Three of the transcripts (T11 T14 andT25) are antisense to the central part of the sense encodedgene while T6 is encoded antisense to the 5prime end ofMb1618c As well as antisense transcripts we also saw thedifferential expression of sense transcripts The ampliconmicroarray data (confirmed by real time RT-PCR) indi-cated that nirB is strongly upregulated specifically in130701 in both in vitro and ex-vivo Mϕ An analysis ofthe oligonucleotide array data however indicates thatthere are two short transcripts T1 and T2 (sense and anti-sense respectively) that are encoded within the genomicco-ordinates of the nirB gene T2 is the longer in size (255vs 155nt) and more highly expressed (5 vs 3-fold) thanT1 and both transcripts appear to be linked to the pres-ence of a SNP that is located within the middle of T1 andapproximately 50nt upstream of T2Some of the transcripts are bone fide gene sense

strand mRNA transcripts such as T9 and T10 whichare encoded by Mb1749c and Mb1750c respectivelyAlthough it would appear that the two genes are tran-scribed separately it is probable that the two transcriptsare co-transcribed as the stop codon of Mb1750c over-laps the start codon of Mb1749c Eight of the tran-scripts listed in Table 5 are encoded within intergenicregions 7 of which are encoded within the polymorphicdirect repeat (DR) locus The DR locus of strains belong-ing to the M tuberculosis complex has been suggested toconstitute a CRISPR locus which have been shown inmany species of bacteria to be involved in protectionagainst exogenous foreign DNA such as plasmids and

phage [25] All the DR encoded transcripts are short(approx 100 nt) straddle contiguous repeat and spacersequences and show approximately 5-fold higher levelsof expression in 245101 and 130701 compared with212297 and 112101

Characterisation of differentially expressed cis asRNAThe genomic co-ordinates of the oligonucleotide probesthat detected the antisense species described above canonly serve as approximate estimations as to their startand end points Thus we used 5prime RLM-RACE (RNALigase Mediated Rapid Amplification of cDNA Ends) inan attempt to accurately define the transcriptional startsites (TSS) for the short sense transcript T2 and the anti-sense transcripts T6 T11 and T25 described in the abovesection (see Methods) These transcripts were chosen astheir expression levels are high and their transcript lengthswere considered to be sufficiently long to enable theRLM-RACE methodology to work Table 6 details thesizes of the PCR products obtained after RLM-RACE wasperformed using oligonucleotide primers designed tosequences predicted for transcripts T6 T11 and T25 NoPCR product was obtained for transcript T2 For each ofthe three transcripts the TSS was determined to be a Gresidue which is the most commonly used residue typefor mycobacterial TSSrsquos (Figure 4) [26] For each of the T6T11 and T25 transcripts expression of the asRNAs waslinked to the presence of a SNP (C to T) proximal tothe 5prime end of the asRNA Strains exhibiting the lsquoCrsquo alleleshowed no expression of the asRNA whilst the strainthat showed expression had the lsquoTrsquo allele An analysis ofthe nucleotide sequence in the vicinity of the SNPsreveals that for each of the three transcripts the SNPconstitutes the 6th residue of a motif that has stronghomology to the consensus sequence for the minus10 elementof Group A mycobacterial promoters (Figure 4) [26] Thefinding that a lsquoTrsquo residue is associated with expression isconsistent with the consensus sequence which indicatesthat 86 of all minus10 elements have a lsquoTrsquo residue at the 6thresidue position Several residues that flank the minus10 motifalso show a degree of conservation Sequence motifswhich show strong homology to group A minus35 elementsare present 18ndash19 bp upstream of the putative minus10elements and the distances between the minus35 -10 andTSS elements are consistent with those elements ofthe consensus sequence No protein encoding openreading frames were detected within the T6 T11 andT25 transcriptsIn a parallel study high density oligonucleotide micro-

arrays were also used to interrogate the transcriptomesof M tuberculosis H37Rv M bovis BCG Pasteur Myco-bacterium caprae and M bovis AN5 that had beengrown in Middlebrook 7H9 media As a result of theseexperiments two asRNA species were found to be

Golby et al BMC Genomics 2013 14710 Page 11 of 18httpwwwbiomedcentralcom1471-216414710

expressed within the antisense strands of the ino1 andnarH genes of M tuberculosis H37Rv but not in any ofthe other 3 strains tested (data not shown) A compari-son of nucleotide sequences of the orthologous genesacross the species suggested that expressions of theas_sRNAs correlated with the presence of a sSNP (C to T

transition at positions 50555 and 1292100 wrt H37Rvgenomic sequence for as_ino1 and as_narH respectively)upstream of the asRNAs Approximate information re-garding the transcriptional start site was deduced from thebinding co-ordinates of the probes that detected the tran-scripts As with the M bovis antisense sRNAs described

Figure 3 Expressions and schematic representation of genomic locations of selected cis-encoded antisense sRNAs identified using atiled oligonucleotide microarray Three asRNAs (open arrows) are (a) T6 (b) T14 and (c) T25 For each asRNA a histogram plots the foldchanges for each of the oligonucleotide probes that detected the asRNA and for each probe the binding position relative to the 212297genome is indicated Closed and open arrows indicate lengths and direction of transcription of genes and asRNAs respectively

Golby et al BMC Genomics 2013 14710 Page 12 of 18httpwwwbiomedcentralcom1471-216414710

above the T residue associated with the expression ofthe M tuberculosis asRNAs is part of a putative minus10element A minus35 element with an appropriate spacing tothe minus10 element was identified for as_ino1 but not foras_narH suggesting that the as_narH promoter maybelong to group B mycobacterial promoters that havea conserved minus10 but no minus35 motif [26]

DiscussionsThe aim of this work was to define possible phenotypicvariation across M bovis field isolates through a combin-ation of genome sequencing comparative genomics andtranscriptome analyses from both in vitro and ex vivo con-ditions Using these approaches we uncovered a range ofnovel findings the most striking of which was the realisa-tion that genes that had been predicted to be differentiallyexpressed based on amplicon-microarray data were in fact

not upregulated and that instead it was an antisense tran-script that was showing differential expression Analysingboth transcriptome and genome sequence data allowed usto identify SNPs responsible for the transcription of anti-sense RNAs with generation of a consensus minus10 promotersequence the likely mechanism Our results suggest thatdata generated from amplicon arrays in the past may needto be revisited as it is possible that some coding-sequences identified as being differentially expressed wereinstead antisense transcriptsWith the growth of technologies such as high density

tiled oligonucleotide microarrays and next generationsequencing there has been a rapid increase in the num-ber of reports describing the existence of non-codingRNAs (ncRNAs) in bacteria Non-coding RNAs broadlyconsist of two types cis- and trans-encoded RNA TransRNA includes intergenic encoded RNA while cis-encodedRNA includes 5prime and 3prime untranslated regions of mRNAand antisense RNA To study the expressions of both cis-and trans encoded ncRNA we used a high density oligo-nucleotide tiled microarray since our amplicon microarraywas unable to detect intergenic transcripts or differentiatebetween sense and antisense transcripts Previous studiesusing M tuberculosis have identified substantial amountsof ncRNA encoded in both intergenic and intragenic re-gions [78] We detected substantial amounts of ncRNAs

Figure 4 Promoters of anti sense RNAs a Promoters of the asRNAs as_mb1618c as_1914c and as_echA21 -10 and minus35 elements areindicated in bold and italics Transcriptional start sites are indicated by large font G characters while SNP residue that leads to the expression ofthe asRNA is indicated by a large font red T residue The consensus sequence for group a mycobacterial promoters is indicated Numericalsubscripts indicate the percentage of the total number of promoters for which a transcriptional start site has been experimentally determinedthat show the indicated residue b Promoters of the differentially expressed asRNAs as_ino1 and as_narH in M tuberculosis -10 and minus35 elementsare indicated in bold italics The red residue indicates SNP responsible for differential expression

Table 6 Determination of transcriptional start sites ofantisense RNAs

Transcript asRNA PCRprod size

TSSresidue type

TSS residue postn(wrt 212297)

T6 mb1618c_as 150 G 1778886

T11 mb1914c_as 220 G 2122978

T25 echA21_as 175 G 4155796

Golby et al BMC Genomics 2013 14710 Page 13 of 18httpwwwbiomedcentralcom1471-216414710

in M bovis including many instances of cis-antisenseRNA species Due to their perfect complementarity cisasRNA form a duplex with the sense strand encodedtranscript resulting in either degradation [27] or transla-tion inhibition [28] of the sense mRNA AntisenseRNAs vary in length ranging from 10s to 1000primes ofnucleotides and can be classified according to theirencoded position with respect to the opposite senseencoded gene Thus they can be classified as 5prime or 3primeoverlapping while others are classified as internallylocated The 112101 specific as_Mb1618c is an exampleof a 5prime overlapping asRNA which is encoded antisense togene Mb1618c which is predicted to express a secretorylipase The location of the asRNA transcript suggests itmay function to prevent translation of Mb1618c mRNAby steric hindrance of the ribosome binding siteIn the work presented here strain 245101 expressed

two asRNAs as_Mb1914c and as_echA21 that are notexpressed by any of the other 3 strains They areencoded within the central part of the opposite genesand are therefore likely to modulate the stability of thetranscripts Mb1914c encodes a short chain dehydro-genase while echA21 encodes an enoyl-CoA hydrataseShort chain dehydrogenases catalyse a wide range offunctions so the precise function and identity of thesubstrate is difficult to deduce from sequence aloneEnoyl-CoA hydratases hydrate double carbon-carbonbonds of macromolecules and are vital in the metabol-ism of fatty acids Both gene products would thereforeappear to be involved in the metabolism of a macromol-ecule and their similar expression profiles in this straincould indicate involvement in the metabolism of the samemolecule or molecules that are of the same pathwayIn many instances upregulation of asRNA negatively

correlates with the transcription of the antisense gene[27] but in many cases expression of the antisense tran-script has no effect on the transcription of the oppositegene In our studies expressions of as_Mb1618cas_Mb1914c and as_echA21 did not appear to have anyeffect on the expressions of the opposite sense encodedgenes (data not shown) We have shown that theexpressions of the asRNAs are associated with the pres-ence of SNPs which are either synonymous or non-synonymous with respect to the sense transcript butupstream of the asRNA transcriptional start site Thishighlights the fact that mutations can potentially affectexpression of transcripts on both strands and that theclassification of a SNP is strand dependent For each ofthe three asRNAs the associated SNP was found to belocated within a putative minus10 promoter motif of groupA mycobacterial promoters The sixth residue of the minus10hexamer motif consensus sequence is a strongly conservedlsquoTrsquo residue which is present in 81 of all group A myco-bacterial promoter elements Its importance is underlined

by the finding that the strains that exhibit a lsquoCrsquo residue atthis position show no detectable expression of the asRNAwhile strains having a lsquoTrsquo residue at this position exhibitexpressionSingle nucleotide polymorphisms were found to be

the most frequent form of genetic variation that existsbetween the isolates with a total of 1013 SNPs detectedacross the three strains 112101 245101 and 130701compared to the reference strain M bovis 212297Non-synonymous SNPs which include both non-senseand missense SNPs are a class of SNPs most likely toimpact on protein function and contribute to phenotypicvariation Non-sense SNPs which results in the expressionof a truncated polypeptide due to the introduction of apremature stop codon were identified in five genes acrossthe strains Of these we focussed our attentions on thenon-sense SNP present in a gene encoding the LysR regu-lator Mb2007c as mutations affecting regulators are likelyto impact on the expression of one or more genes that arepart of the regulon of the regulator and are therefore morelikely to result in phenotypic variation Experiments tocompare the transcriptomes of a strain that exhibited themutation with a strain overexpressing a functional regula-tor did not however reveal any differences The reasonfor this unclear but could reflect a requirement of theregulator for a co-inducer that was absent under the con-ditions of the experiment The consequences of missenseSNPs are more difficult to predict as substitutions of oneamino acid for another in a protein sequence do not ne-cessarily lead to a change in protein function Howeverfor genes that are controlled by an autoregulatory mech-anism a mutation that affects the ability of the product ofthe gene to regulate itself will result in a change in expres-sion of the gene In our studies we have shown that thepresence of a missense mutation in a VapB type toxin en-coding gene Mb1749c in strain 130701 results in theupregulation in expression of the toxin-antitoxin encodingpair of genes Mb1750c-Mb1749c due to the inability ofthe encoded proteins to self regulate themselves Toxin-antitoxin systems have a variety of proposed cellular func-tions including general regulation of mRNA stability levelsin the cell [29] Further experiments are required to fullyunderstand the consequences of this mutationGenomic deletions have played an important role in

the evolution of strains belonging to the mycobacterialcomplex [30] and in the derivation of the tuberculosisvaccine strain M bovis BCG [31] In addition to the previ-ously described 68 kb gene deletion that is specific tostrains having a spoligotype 17 pattern [6] we have identi-fied a 16 kb multi-gene deletion that is specific to strain130701 and encompasses genes that are part of the DosRregulon However one of the deleted genes exists as apseudo gene in DosR in strain 212297 so its importanceto the biology of M bovis is unlikely to be significant

Golby et al BMC Genomics 2013 14710 Page 14 of 18httpwwwbiomedcentralcom1471-216414710

Several other genes with internal deletions were detectedbut none of the encoded proteins have any significantsimilarity to any protein with a defined function

ConclusionsIn conclusion we have performed a comprehensive ana-lysis of 4 M bovis strains of the most common moleculartypes circulating in GB We show that while these strainsshow extensive similarities in their genetic make-up andgene expression profiles they show distinct differences inthe expression of a subset of genes We provide functionaldata to show that SNPs can lead to the expression of anti-sense RNA a finding with implications for how we definea lsquosilentrsquo nucleotide change Furthermore we show thatthe interpretation of transcriptome data based solely onamplicon arrays could lead to artefacts due to expressionof antisense transcripts a caveat that needs to be kept inmind for previous studies of global expression analysis inbacteria

MethodsBacterial strains media and growth conditionsFor the Mϕ infection experiments bovine alveolar Mϕwere cultivated in tissue culture media R10 whichconsisted of RPMI (Invitrogen) media plus 2 mM glu-tamine 10 calf fetal serum and 1 amphotericinWhere used antibiotics gentamycin and ampicillinwere added at concentrations of 50 and 100 μg mlrespectively M bovis field strains were pre-grown inMiddlebrook 7H9 broth supplemented with 10 albumin-dextrose catalase (ADC Difco) 005 Tween and 10 mMpyruvate Cultures were harvested in mid-logarithmicphase (OD600 of 03-08) washed and then resuspended inRPMI containing 005 Tween 80

Isolation of bovine alveolar macrophages and infectionwith mycobacteriaThe lungs of a 6ndash8 week old male Holstein-Friesian calfwere removed and a whole lung lavage procedure wasperformed to washout the alveolar Mϕ Briefly 4ndash5 x500 ml aliquots of Hanksrsquo Balanced Sterile Salts solution(HBSS) were used to infuse the lungs via the tracheaand the washings were pooled in a sterile beaker The Mϕcells contained in the washes were pelleted by centrifuga-tion at 500 x g for 10 mins at 4degC washed and thenresuspended in R10 growth media supplemented withantibiotics (R10+) to a concentration of 1ndash2 x 107 mlApproximately 05-15 x 109 Mϕ were isolated per calflungVented 225 cm2 tissue culture flasks containing R10+

media were seeded with 3ndash4 x 107 alveolar Mϕ andplaced in a humidified 37degC incubator containing 5CO2 Typically 2ndash4 flasks were used per strain and timepoint After 24 hrs the growth media was decanted to

remove non-adherent cells and then replaced with freshR10+ media After a further 24 hrs the growth mediawas discarded and the Mϕ monolayer was washed withRPMI to remove traces of the antibiotic containinggrowth media The monolayer was then covered withR10 media without antibiotics (R10-) and then infectedwith mid-logarithmic phase grown mycobacteria usingan MOI of approximately 101 (bacilli Mϕ) The AlvMϕwere incubated with mycobacteria for 4 hrs after whichthe cell monolayers were washed with RPMI and theneither processed for RNA extraction (4 hr time point)or incubated in fresh R10+ media for a further 20 hrsbefore being processed for RNA extraction (24 hr timepoint)

Extraction and amplification of mycobacterial RNA frominfected macrophagesMϕ cell monolayers were lysed using a guanidiniumthiocyanate (GTC) containing solution The lysed Mϕrsquoswere vortexed and passed twice through a 21G bluntended needle to sheer host genomic DNA and therebyreduce the viscosity of the solution Mycobacterial cellswere then pelleted by centrifugation at 4600 rpm for 20mins at room temperature and washed with GTC solu-tion to remove host genomic DNA Cells were thenresuspended in Trizol and RNA was extracted using theprotocol outlined in Bacon et al [32] The amount ofpurified DNase-treated RNA recovered was of the order100ndash500 ng per time point RNA was amplified usingthe lsquoMessageAmp II-Bacteria RNA Amplification Kitrsquo(Ambion) according to the manufacturersrsquo instructionsUsing an input of 100 ng of unamplified RNA 20ndash100μg of amplified RNA was recovered

Amplicon microarray analysisFor the in vitro growth experiments three independentexperiments (biological replicates) were carried out andfor each strain in each experiment two microarrays (tech-nical replicates) were performed Thus for each strain 6microarrays were performed Three independent AlvMϕinfection experiments were carried out and for eachexperiment two microarrays were performed for each ofthe control RPMI samples and the 4- and 24 hrs post-infection samples Cy5 and Cy3 fluorescently-labelledprobes were synthesised from RNA and genomicDNA respectively and hybridised to whole genomeM bovis M tuberculosis microarrays The array designis available in BμGSbase (accession number A-BUGS-31httpbugssgulacukA-BUGS-31) and also ArrayExpress(accession number A-BUGS-31) Details of probe synthesishybridization conditions and manufacture of the micro-array can be found in Golby et al 2008 Microarrayswere scanned using an Affymetrix 428 Microarrayscanner and scanned images were quantified using

Golby et al BMC Genomics 2013 14710 Page 15 of 18httpwwwbiomedcentralcom1471-216414710

BlueFuse for Microarrays v32 software (BlueGnome)See Golby et al 2008 for further detailsNormalisation was performed by dividing the log ratio

of the Cy5 to Cy3 signal for every spot by the median ofthe log ratios for all spots except control spots A medianabsolute (MAD) scale transformation was applied to thenormalised data from the pas an additional normalisationstep For every microarray duplicate spots were averagedand then the average expression of every gene across alltechnical replicate microarrays was calculated Averagesof the three biological replicates were used to comparegene expression between strains For each gene a mod-erated t-test was applied and those genes with a P- valueless than 005 were selected From this gene list thosegenes whose average expression differed by more than25-fold between strains were selected Fully annotatedmicroarray data have been deposited in BμGSbase (acces-sion number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (accession number E-BUGS-150)

Oligonucleotide microarray analysisExperiments were performed in a similar way to thatdescribed for the in vitro amplicon array experimentsexcept that the RNA was purified using the mirVanamiRNA Isolation kit (Ambion) which is designed tocapture small (gt20 nt) RNA species RNA and genomicDNA were directly labelled with Cy5- and Cy3 respect-ively using the ULS microRNA labelling kit (Kreatech)according to the manufacturerrsquos instructions PurifiedCy5- and Cy3-labelled probes were co-purified andapplied to an Agilent 40K custom made tiled (10 ntoverlap) 60-mer oligonucleotide microarray designed tothe genomic sequence of M bovis 21229797 The arraydesign is available in BμGSbase (accession numberA-BUGS-52 httpbugssgulacukA-BUGS-52) and alsoArrayExpress (accession number A-BUGS-52) Microarrayswere hybridised at 65degC for 18 hrs and then washed inWash Buffer 1 (Agilent) at room temperature for 1 minuteThe slides were then washed in Wash Buffer 2 (Agilent) at37degC for 1 minute dried and then scanned at 2 μm usingan Agilent DNA microarray scannerTiling array data was analysed using the Limma package

of RBioconductor [33] The signal median was quan-tile normalised between arrays followed by a LOESSnormalisation within arrays Differential expressionanalysis was performed by pairwise comparison usinglinear models and empirical Bayes methods and Pvalues adjusted using the Benjamini and Hochbergsmethod to control for multiple testing Fully annotatedmicroarray data have been deposited in BμGSbase(accession number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (accession num-ber E-BUGS-150)

Whole genome sequencing of M bovis field isolatesWhole genome paired end (2 x 76bp) sequencing wasperformed using Illumina HiSeq machines at theWellcome Trust Sanger Institute (Hinxton Cambridge)Raw sequence data was uploaded to the European Nucleo-tide Archive (ENA) and can be downloaded at httpwwwebiacukena accession numbers ERX006616 ERX06617ERX012284 and ERX012286 The FastQC (httpwwwbioinformaticsbabrahamacukprojectsfastqc) programwas used to analyse the quality of the raw data reads Rawsequence data was trimmed to remove adapter sequencesand nucleotides where the sequence quality score wasbelow 20 Filtered reads were aligned to the M bovis refer-ence strain 212297 [10] with the SMALT alignmentprogram (httpwwwsangeracukresourcessoftwaresmalt)using default settings The average mean coverages for thesequenced strains 112101 245101 and 130701 were293 224 and 234 respectively Reads that did not maponto the reference genome were de novo assembled andblasted against the NCBI data base in order to find regionspresent in these strains but absent in 212201 Variantcalling and the generation of consensus sequenceswere carried out using the SAMtools program suite(httpsamtoolssourceforgenet) SNPs that had a mini-mum of quality score of 200 and had a minimum of fourgood quality forward (reverse) reads covering the SNP siteand having the variant base were retained The number offorward (reverse) reads mapping with good quality ontothe SNP site having the same base as in the reference hadto be less than 5 of the total number of forward (reverse)reads mapping with good quality onto the SNP site havingthe variant base Sequences of 12 bases in length centredaround SNP sites were blasted to the genome of the refer-ence strain 212297 and any SNP within an area with a90 hit to another area of the genome were filtered outThis eliminated most of the SNPs within the repetitivePE-PGRS regions For all sequenced strains more than999 of the genomes were covered by reads

Real time RT-PCRQuantitative real-time SYBR Green based PCR (qRT-PCR)experiments were performed using a RotorGene 3000(Corbett research) as described by Golby et al [23] Foldchanges were calculated using relative standard curvemethod and pcr controls included no template and noreverse transcriptase Primer pair sequences are given inAdditional file 5 available with the online version ofthis paper

Construction of the Rv1749c-Rv1750c and nirBDoverexpressing plasmidsThe Rv1749c-Rv1750c overexpressing plasmids pPG106and pPG107 were constructed by PCR amplification of a907bp fragment encompassing Rv1749c-Rv1750c-lsquoRv1751c

Golby et al BMC Genomics 2013 14710 Page 16 of 18httpwwwbiomedcentralcom1471-216414710

using primers tox_f and tox_r For pPG106 the fragmentwas amplified using 21229797 genomic DNA as a tem-plate while pPG107 was amplified using 130701 gDNABoth fragments were digested with SpeI and cloned intothe SpeI cut mycobacterial attP-integrating shuttle vectorpKINT (a gift from Douglas Young Imperial CollegeLondon) Plasmids pPG108 and pPG109 which contain a35 kb hsprsquo-nirB-nirD-lsquocobU fragment was constructed inseveral steps Firstly two 16 kb hsprsquo-nirBrsquo PCR fragmentswere PCR amplified separately using primers nirB1_f andnirB1_r and 212297 and 130701 as genomic templatesSimilarly two 18kb lsquonirB-nirD-lsquocobU fragments wereamplified using primers nirB2_f and nirB2_r and genomicDNAs 212297 and 130701 The two PCR products weredigested with SpeI and BamHI and then co-ligated intopKINT Details concerning the nucleotide sequences ofthe pcr primer pairs are given in the Additional file 5

5prime-RLM-RACE PCRTranscriptional start site mapping of was determinedusing the First Choice RNA ligase-mediated rapid amp-lification of cDNA ends (RLM-RACE) kit (Ambion) asper manufacturers instructions Briefly 10 ug of totalRNA was treated with calf intestinal phosphatase (CIP)and tobacco acid pyrophosphatase (TAP) before ligationof an RNA Adapter oligonucleotide to the 5prime ends ofthe mRNA transcripts A random-primed reverse tran-scription reaction was carried out to generate cDNAand then nested PCR reactions were performed on thecDNA using combinations of adapter and gene specificprimers Details concerning the nucleotide sequencesof the 5prime outer and inner adapter sequences as well as3prime outer and inner gene specific primers are given inthe Additional file 5 PCR products generated usingthe 5prime inner adapter and 3prime inner gene specific primerswere sequenced by Sanger sequencing using the 3prime innergene specific primer

Animal ethics statementAnimal work was carried out according to the UK Animal(Scientific Procedures) Act 1986 The study protocol wasapproved by the AHVLA Animal Use Ethics Committee(UK Home Office PCD number 706905)

Availability of supporting data sectionFully annotated microarray data have been deposited inBμGSbase (accession number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (ac-cession number E-BUGS-150) Raw sequence data wasuploaded to the European Nucleotide Archive (ENA) andcan be downloaded at httpwwwebiacukenadataviewERX006616-ERX06617ERX012284-ERX012286

Additional files

Additional file 1 SNPs identified across sequenced strains

Additional file 2 Large deletions identified by NGS in sequencedstrains

Additional file 3 Fold change differences in gene expression inM bovis field isolates 1121 2451 and 1307 compared to 2122Cells shaded in red indicate upregulation green indicate down-regulation and empty cells indicate no change in expression

Additional file 4 Differential expression of RNA transcripts asdetected by a tiled oligonucleotide microarray Up and down arrowsindicate fold up- and downregulation respectively and empty cellsindicate no change in expression Strand A indicates antisense S senseand I intergenic

Additional file 5 Details of oligonucleotides used in PCR RT-PCRand RLM-RACE experiments

Competing interestsThe authors declare that they have no competing interests

Authorsrsquo contributionsPG carried out the culturing of organisms macrophage infectionsmicroarrays qRT-PCR cloning analysis of sequence data and writing of themanuscript JN performed analysis of the Illumina sequence and PCRamplicon microarray data AW and JH performed the analysis of the Agilentoligonucleotide microarray data and assisted in the uploading of themicroarray data into the BμGSbase database MAQ and SB wereresponsible for the Illumina sequencing SH contributed to the analysis ofthe Illumina sequence data NS contributed to the writing and critical reviewof the manuscript RGH was a co-investigator who helped to guide theproject SG was the principle investigator on the project who guided theresearch and helped in the writing of the manuscript All authors read andapproved the final manuscript

AcknowledgementsWe would like to acknowledge Kate Gould for her help in carrying out theoligonucleotide microarrays and Denise Waldron for her assistance inuploading of the microarray data into BμGSbase We also thank MarkChambers for comments on the manuscript and BBSRC and Defra forfunding the work (Grant no BBE0184911)

Author details1Animal Health and Veterinary Laboratories Agency Woodham Lane NewHaw Addlestone Surrey KT15 3NB UK 2Division of Clinical Sciences BacterialMicroarray Group Centre for Infection amp Immunity St Georgersquos University ofLondon Cranmer Terrace London SW17 0RE UK 3The Wellcome TrustSanger Institute Hinxton Cambridgeshire UK 4UCD School of VeterinaryMedicine and UCD Conway Institute University College Dublin Dublin 4Ireland

Received 4 June 2013 Accepted 26 September 2013Published 17 October 2013

References1 Monaghan ML Doherty ML Collins JD Kazda JF Quinn PJ The tuberculin

test Vet Microbiol 1994 40(1ndash2)111ndash1242 Kamerbeek J Schouls L Kolk A van Agterveld M van Soolingen D Kuijper

S Bunschoten A Molhuizen H Shaw R Goyal M et al Simultaneousdetection and strain differentiation of Mycobacterium tuberculosis fordiagnosis and epidemiology J Clin Microbiol 1997 35(4)907ndash914

3 Frothingham R Meeker-OrsquoConnell WA Genetic diversity in theMycobacterium tuberculosis complex based on variable numbers oftandem DNA repeats Microbiology 1998 144(Pt 5)1189ndash1196

4 Smith NH Dale J Inwald J Palmer S Gordon SV Hewinson RG Smith JMThe population structure of Mycobacterium bovis in Great Britain clonalexpansion Proc Natl Acad Sci USA 2003 100(25)15271ndash15275

5 Winder CL Gordon SV Dale J Hewinson RG Goodacre R Metabolicfingerprints of Mycobacterium bovis cluster with molecular type

Golby et al BMC Genomics 2013 14710 Page 17 of 18httpwwwbiomedcentralcom1471-216414710

implications for genotype-phenotype links Microbiology 2006152(Pt 9)2757ndash2765

6 Wheeler PR Brosch R Coldham NG Inwald JK Hewinson RG Gordon SVFunctional analysis of a clonal deletion in an epidemic strain ofMycobacterium bovis reveals a role in lipid metabolism Microbiology2008 154(Pt 12)3731ndash3742

7 Arnvig KB Young DB Identification of small RNAs in Mycobacteriumtuberculosis Mol Microbiol 2009 73(3)397ndash408

8 Arnvig KB Comas I Thomson NR Houghton J Boshoff HI Croucher NJRose G Perkins TT Parkhill J Dougan G et al Sequence-based analysisuncovers an abundance of non-coding RNA in the total transcriptome ofMycobacterium tuberculosis PLoS Pathog 2011 7(11)e1002342

9 Smith NH Upton P Naming spoligotype patterns for the RD9-deletedlineage of the Mycobacterium tuberculosis complex www MbovisorgInfect Genet Evol 2012 12(4)873ndash876

10 Garnier T Eiglmeier K Camus JC Medina N Mansoor H Pryor M Duthoy SGrondin S Lacroix C Monsempe C et al The complete genome sequenceof Mycobacterium bovis Proc Natl Acad Sci USA 2003 100(13)7877ndash7882

11 Albers CA Lunter G MacArthur DG McVean G Ouwehand WH Durbin RDindel accurate indel calls from short-read data Genome Res 201121(6)961ndash973

12 Mostowy S Inwald J Gordon S Martin C Warren R Kremer K Cousins DBehr MA Revisiting the evolution of Mycobacterium bovis J Bacteriol2005 187(18)6386ndash6395

13 Boon C Dick T Mycobacterium bovis BCG response regulator essentialfor hypoxic dormancy J Bacteriol 2002 184(24)6760ndash6767

14 Van Gelder RN von Zastrow ME Yool A Dement WC Barchas JD EberwineJH Amplified RNA synthesized from limited quantities of heterogeneouscDNA Proc Natl Acad Sci USA 1990 87(5)1663ndash1667

15 Pandey DP Gerdes K Toxin-antitoxin loci are highly abundant in free-living but lost from host-associated prokaryotes Nucleic Acids Res 200533(3)966ndash976

16 Arcus VL McKenzie JL Robson J Cook GM The PIN-domain ribonucleasesand the prokaryotic VapBC toxin-antitoxin array Protein Eng Des Sel 201124(1ndash2)33ndash40

17 Bukowski M Rojowska A Wladyka B Prokaryotic toxin-antitoxin systemsndashthe role in bacterial physiology and application in molecular biologyActa Biochim Pol 2011 58(1)1ndash9

18 Maddocks SE Oyston PC Structure and function of the LysR-typetranscriptional regulator (LTTR) family proteins Microbiology 2008154(Pt 12)3609ndash3623

19 Moreno-Vivian C Cabello P Martinez-Luque M Blasco R Castillo FProkaryotic nitrate reduction molecular properties and functionaldistinction among bacterial nitrate reductases J Bacteriol 1999 181(21)6573ndash6584

20 Tiffert Y Supra P Wurm R Wohlleben W Wagner R Reuther J TheStreptomyces coelicolor GlnR regulon identification of new GlnR targetsand evidence for a central role of GlnR in nitrogen metabolism inactinomycetes Mol Microbiol 2008 67(4)861ndash880

21 Duan J Wainwright MS Comeron JM Saitou N Sanders AR Gelernter JGejman PV Synonymous mutations in the human dopamine receptor D2(DRD2) affect mRNA stability and synthesis of the receptor Hum MolGenet 2003 12(3)205ndash216

22 Capon F Allen MH Ameen M Burden AD Tillman D Barker JN TrembathRC A synonymous SNP of the corneodesmosin gene leads to increasedmRNA stability and demonstrates association with psoriasis acrossdiverse ethnic groups Hum Mol Genet 2004 13(20)2361ndash2368

23 Golby P Hatch KA Bacon J Cooney R Riley P Allnutt J Hinds J Nunez JMarsh PD Hewinson RG et al Comparative transcriptomics reveals keygene expression differences between the human and bovine pathogensof the Mycobacterium tuberculosis complex Microbiology 2007153(Pt 10)3323ndash3336

24 Bitter W Houben EN Bottai D Brodin P Brown EJ Cox JS Derbyshire KFortune SM Gao LY Liu J et al Systematic genetic nomenclature for typeVII secretion systems PLoS Pathog 2009 5(10)e1000507

25 Sorek R Kunin V Hugenholtz P CRISPRndasha widespread system thatprovides acquired resistance against phages in bacteria and archaeaNat Rev Microbiol 2008 6(3)181ndash186

26 Gomez M Smith I Determinants of Mycobacterial Gene Expression InMolecular Genetics of Mycobacteria Washington DC ASM Press2000111ndash129

27 Duhring U Axmann IM Hess WR Wilde A An internal antisense RNAregulates expression of the photosynthesis gene isiA Proc Natl Acad SciUSA 2006 103(18)7054ndash7058

28 Kawano M Aravind L Storz G An antisense RNA controls synthesis of anSOS-induced toxin evolved from an antitoxin Mol Microbiol 200764(3)738ndash754

29 Thomason MK Storz G Bacterial antisense RNAs how many are thereand what are they doing Annu Rev Genet 2010 44167ndash188

30 Brosch R Pym AS Gordon SV Cole ST The evolution of mycobacterialpathogenicity clues from comparative genomics Trends Microbiol 20019(9)452ndash458

31 Calmette A Guerin C Sur quelques proprietes du bacille tuberculeuxdrsquoorigine cultive sur la bile de boeuf glycerinee C R Acad Sci Paris 1909149716ndash718

32 Bacon J James BW Wernisch L Williams A Morley KA Hatch GJ ManganJA Hinds J Stoker NG Butcher PD et al The influence of reduced oxygenavailability on pathogenicity and gene expression in Mycobacteriumtuberculosis Tuberculosis (Edinb) 2004 84(3ndash4)205ndash217

33 Smyth GK Linear models and empirical bayes methods for assessingdifferential expression in microarray experiments Stat Appl Genet Mol Biol2004 33

doi1011861471-2164-14-710Cite this article as Golby et al Genome-level analyses ofMycobacterium bovis lineages reveal the role of SNPs and antisensetranscription in differential gene expression BMC Genomics 2013 14710

Submit your next manuscript to BioMed Centraland take full advantage of

bull Convenient online submission

bull Thorough peer review

bull No space constraints or color figure charges

bull Immediate publication on acceptance

bull Inclusion in PubMed CAS Scopus and Google Scholar

bull Research which is freely available for redistribution

Submit your manuscript at wwwbiomedcentralcomsubmit

Golby et al BMC Genomics 2013 14710 Page 18 of 18httpwwwbiomedcentralcom1471-216414710

of synonymous or non-synonymous SNPs located withinthe coding regions of the genes that show variable expres-sion Non-synonymous SNPs lead to changes in aminoacid sequence which can lead to changes in protein func-tion The C to T transition at position 1932704 (wrt 212297) in the coding sequence of the 130701 Mb1749chomologue leads to the non-conservative substitution ofGly19 to Asp and this nSNP appears to be linked to theupregulation of both Mb1749c and Mb1750c in thatstrain In order to confirm this a 09 kb DNA fragmentcontaining the Mb1749c-Mb1750c-lsquoMB1751c region of130701 (containing the lsquoTrsquo allele) and the equivalentregion from 212297 (with the lsquoCrsquo allele) were PCRamplified and the fragments were cloned separatelyinto the mycobacterial shuttle vector pKINT (see Methods)to create the constructs pPG107 and pPG106 respectivelyThe two constructs were introduced into Mycobacteriumsmegmatis mc2155 separately and then the expression ofMb1749c and Mb1750c in M smegmatis pPG101 wascompared to that of M smegmatis pPG102 using real timeRT-PCR Table 4 shows that the expression levels ofMb1750c and Mb1749c in the strain expressing themutated forms of Mb1749cMb1750c are 13- and 9-foldhigher respectively compared to the strain expressingthe wild type forms confirming that this SNP is respon-sible for the observed up-regulation of the two genes in130701Synonymous substitutions do not lead to changes in

protein sequence and have generally been considered tobe lsquosilentrsquo or benign Recent studies however have sug-gested that sSNPs can have functional effects such asdecreased mRNA stability and translation [2122] Inour own studies we have found several genes whose ex-pression levels correlate with the presence of sSNPs inthe coding regions of those genes For example a C-Ttransversion at position 303227 (wrt 212297) withinthe coding sequence of nirB of 130701 is a sSNP thatappears to be linked with the upregulation in expressionof nirB within that strain To confirm that this is thecase we PCR amplified 35 kb DNA fragments containing

the hsp-nirB-nirD-cobU region of strain 130701 (containingthe lsquoCrsquo allele) and the equivalent region from strain 212297(with the lsquoTrsquo allele) and cloned them separately into theintegrating vector pKINT to create the constructspPG108 and pPG109 respectively These constructswere introduced into 212297 and the expression levelsof nirB and nirD were found to be 30- and 2-fold re-spectively higher in the strain expressing the mutatednirBD locus compared to the strain expressing the wild-type form (Table 4) This confirms that this mutation isresponsible for the upregulation of the two genes in thisstrain

Use of a high density tiled oligonucleotide microarray todetect differentially expressed small RNA transcripts inM bovis isolatesThe M tuberculosisM bovis amplicon arrays used inthe present study were specifically designed to measureexpression levels of genes annotated in the genomic se-quence of M bovis 212297 [10] They were not how-ever designed to monitor the expression of non-codingRNA such as small RNA within intergenic regions orantisense sRNA Hence a high density tiled oligonucleotidemicroarray consisting of approximately 180000 partially-overlapping (10-base overlap) short 60 mer oligonucleo-tides was designed that offered an unbiased approach tothe detection of strand specific transcripts encoded overthe entire M bovis 212297 chromosome Total RNAthat includes small sized (lt100 nt) RNA species wasextracted from the four M bovis strains that had beengrown in liquid media and hybridised to the oligo-nucleotide microarray To avoid potential secondarystrand synthesis during cDNA synthesis which couldbe interpreted as sRNA the RNA was directly labelledwith cyanine based dyes After pairwise comparisonswere performed between 212297 and 112101 245101or 130701 220 oligonucleotide probes were identifiedthat detected differentially expressed transcripts (25 foldcut off ) in one or more of the three strains (Additionalfile 4) Only transcripts detected by multiple (2 or more)overlapping probes were regarded as genuine transcriptsas those detected by single probes could be due to cross-hybridisation effects or represent spurious transcriptsUsing these criteria 26 transcripts designated T1-T26were found to show differential expression in one or moreof the strains (Table 5) and those transcripts can be di-vided into those that are encoded within intergenic re-gions and those encoded within the genomic co-ordinatesencompassing annotated coding sequences Comparisonof the differentially expressed gene lists identified usingamplicon vs oligonucleotide arrays (Tables 3 and 5) itis clear that many of the transcripts detected using theamplicon arrays are not necessarily encoded on thesense gene strand as had been previous interpreted For

Table 4 Confirmation of SNP linkage to upregulation ingene expression

Strain Gene Fold change

M smegmatis mc2155 (pPG101) Mb1750c 161 plusmn 32

M smegmatis mc2155 (pPG101) Mb1749c 125 plusmn 12

M smegmatis mc2155 (pPG102) Mb1750c 19 plusmn 08

M smegmatis mc2155 (pPG102) Mb1749c 14 plusmn 04

Mb 212297 (pPG108) nirB 20 plusmn 10

Mb 212297 (pPG108) nirD 13 plusmn 05

Mb 212297 (pPG109) nirB 529 plusmn 159

Mb 212297 (pPG109) nirD 31 plusmn 08

Fold changes are the mean ratios plusmn standard deviation

Golby et al BMC Genomics 2013 14710 Page 9 of 18httpwwwbiomedcentralcom1471-216414710

Table 5 Differential expression of RNA transcripts as detected by a tiled oligonucleotide microarray

Transcript No Probes Position Size cds Strand 112101 245101 130701 CDS Product SNP

T1 3 303138-303295 157 Mb0258nirB A 3uarr nitrite reductase (large sub-unit)

T2 5 303260-303515 255 Mb0258nirB S 5uarr nitrite reductase (large sub-unit) sSNP at 303227 C-T in 130701 only

T3 2 1105085-1105193 108 Mb1014 A 3uarr 3uarr probable adhesion componentof ABC transporter

T4 2 1105158-1105315 157 Mb1014 A 3uarr 3uarr probable adhesioncomponent of ABC transporter

T5 3 1105158-1105315 157 Mb1014 S 3uarr 3uarr probable adhesioncomponent of ABC transporter

T6 5 1778876-1779131 255 Mb1618c A 9uarr possible secreted lipase sSNP at 1778879 C-T in 112101 only

T7 2 1840174-1840282 108 Mb1672c S 4darr hypothetical protein

T8 3 1932158-1932315 157 tRNA-Pro S 3uarr proline tRNA

T9 5 1932452-1932707 255 Mb1749c S 4uarr toxin component oftoxin-antitoxin system

nSNP at 1932704 C-T (G to D) in 130701 only

T10 5 1932746-1933075 329 Mb1750c S 7uarr anti-toxin component oftoxin-antitoxin system

T11 6 2122959-2123263 304 Mb1914c A 12uarr probable short-chain typedehydrogenasereductase

sSNP at 2122970 C-T in 245101 only

T12 2 2324463-2324571 108 Mb2110 S 3uarr hypothetical protein(frame-shifted in 212297)

T13 2 2328944-2329052 108 Mb2116cpepE S 2darr 3darr probable dipeptidase

T14 2 2329483-2329591 108 Mb2117 A 10darr 10darr 5prime-3prime exonuclease nSNP at 2329583 A-G (comp strand I to T)in 245101 only

T15 3 2868310-2868467 157 Mb2607 A 6darr 8darr hp (frame-shifted in 212297)

T16 2 2868506-2868614 108 Mb2607 A 7darr 7darr hp (frame-shifted in 212297) nSNP at 2868616 A-G (stop to W) in245101 and 130701

T17 2 3075740-3075848 108 I 5uarr 6uarr

T18 2 3075887-3075995 108 I 5uarr 6uarr

T19 2 3076622-3076730 108 I 4uarr 4uarr

T20 2 3076769-3076877 108 I 5uarr 5uarr

T21 2 3078728-3078836 108 I 6uarr 7uarr

T22 2 3079169-3079277 108 I 4uarr 6uarr

T23 2 3079903-3080011 108 I 6uarr 7

T24 2 3844577-3844685 108 Mb3509c S 3uarr 3uarr possible acyltransferase(frame-shifted in 212297)

T25 6 4155501-4155805 304 Mb3803echA21 A 5uarr possible enoyl-coA hydratase sSNP at 4155803 C-T (comp strand) in 245101 only

T26 2 4316987-4317095 108 Mb3927c A 3uarr chp (frame-shifted in H37Rv)

Up and down arrows indicate fold up- and downregulation respectively and empty cells indicate no change in expressionStrand A indicates antisense S sense and I intergenic

Golby

etalBM

CGenom

ics201314710

Page10

of18

httpwwwbiom

edcentralcom1471-216414710

example the amplicon array data had appeared to suggestthat Mb1914c and echA21 were upregulated in 245101but the oligo array data indicates that transcripts 11 and25 which are encoded within the co-ordinates encoded bythose two genes are actually encoded on the antisensestrands This apparent discrepancy can be rationalisedonce we consider that double stranded amplicon micro-array probes cannot discriminate between transcriptsencoded on the sense or antisense strands Transcripts 11and 25 can therefore be considered as potential antisensesRNAs (asRNA) which could be involved in translationalor post-transcriptional control of the sense transcriptOther potential cis-encoded sRNAs detected using the ar-rays include T6 T14 and T15T16 which are encoded onthe antisense strands to Mb1618c Mb2117 and Mb2607respectively and for each of these transcripts their expres-sion appears to be linked to the presence of a single SNPwithin the co-ordinates of the genes The approximateboundaries of these transcripts can be derived from thegenomic co-ordinates of the oligonucleotide probes thatdetect the expression of the transcript Thus the tran-scripts appear to be between 100ndash300 nt in size and thepositions of the linked SNPs appear to be positioned eitherjust upstream or within the predicted 5primeend of the tran-scripts (Figure 3) Three of the transcripts (T11 T14 andT25) are antisense to the central part of the sense encodedgene while T6 is encoded antisense to the 5prime end ofMb1618c As well as antisense transcripts we also saw thedifferential expression of sense transcripts The ampliconmicroarray data (confirmed by real time RT-PCR) indi-cated that nirB is strongly upregulated specifically in130701 in both in vitro and ex-vivo Mϕ An analysis ofthe oligonucleotide array data however indicates thatthere are two short transcripts T1 and T2 (sense and anti-sense respectively) that are encoded within the genomicco-ordinates of the nirB gene T2 is the longer in size (255vs 155nt) and more highly expressed (5 vs 3-fold) thanT1 and both transcripts appear to be linked to the pres-ence of a SNP that is located within the middle of T1 andapproximately 50nt upstream of T2Some of the transcripts are bone fide gene sense

strand mRNA transcripts such as T9 and T10 whichare encoded by Mb1749c and Mb1750c respectivelyAlthough it would appear that the two genes are tran-scribed separately it is probable that the two transcriptsare co-transcribed as the stop codon of Mb1750c over-laps the start codon of Mb1749c Eight of the tran-scripts listed in Table 5 are encoded within intergenicregions 7 of which are encoded within the polymorphicdirect repeat (DR) locus The DR locus of strains belong-ing to the M tuberculosis complex has been suggested toconstitute a CRISPR locus which have been shown inmany species of bacteria to be involved in protectionagainst exogenous foreign DNA such as plasmids and

phage [25] All the DR encoded transcripts are short(approx 100 nt) straddle contiguous repeat and spacersequences and show approximately 5-fold higher levelsof expression in 245101 and 130701 compared with212297 and 112101

Characterisation of differentially expressed cis asRNAThe genomic co-ordinates of the oligonucleotide probesthat detected the antisense species described above canonly serve as approximate estimations as to their startand end points Thus we used 5prime RLM-RACE (RNALigase Mediated Rapid Amplification of cDNA Ends) inan attempt to accurately define the transcriptional startsites (TSS) for the short sense transcript T2 and the anti-sense transcripts T6 T11 and T25 described in the abovesection (see Methods) These transcripts were chosen astheir expression levels are high and their transcript lengthswere considered to be sufficiently long to enable theRLM-RACE methodology to work Table 6 details thesizes of the PCR products obtained after RLM-RACE wasperformed using oligonucleotide primers designed tosequences predicted for transcripts T6 T11 and T25 NoPCR product was obtained for transcript T2 For each ofthe three transcripts the TSS was determined to be a Gresidue which is the most commonly used residue typefor mycobacterial TSSrsquos (Figure 4) [26] For each of the T6T11 and T25 transcripts expression of the asRNAs waslinked to the presence of a SNP (C to T) proximal tothe 5prime end of the asRNA Strains exhibiting the lsquoCrsquo alleleshowed no expression of the asRNA whilst the strainthat showed expression had the lsquoTrsquo allele An analysis ofthe nucleotide sequence in the vicinity of the SNPsreveals that for each of the three transcripts the SNPconstitutes the 6th residue of a motif that has stronghomology to the consensus sequence for the minus10 elementof Group A mycobacterial promoters (Figure 4) [26] Thefinding that a lsquoTrsquo residue is associated with expression isconsistent with the consensus sequence which indicatesthat 86 of all minus10 elements have a lsquoTrsquo residue at the 6thresidue position Several residues that flank the minus10 motifalso show a degree of conservation Sequence motifswhich show strong homology to group A minus35 elementsare present 18ndash19 bp upstream of the putative minus10elements and the distances between the minus35 -10 andTSS elements are consistent with those elements ofthe consensus sequence No protein encoding openreading frames were detected within the T6 T11 andT25 transcriptsIn a parallel study high density oligonucleotide micro-

arrays were also used to interrogate the transcriptomesof M tuberculosis H37Rv M bovis BCG Pasteur Myco-bacterium caprae and M bovis AN5 that had beengrown in Middlebrook 7H9 media As a result of theseexperiments two asRNA species were found to be

Golby et al BMC Genomics 2013 14710 Page 11 of 18httpwwwbiomedcentralcom1471-216414710

expressed within the antisense strands of the ino1 andnarH genes of M tuberculosis H37Rv but not in any ofthe other 3 strains tested (data not shown) A compari-son of nucleotide sequences of the orthologous genesacross the species suggested that expressions of theas_sRNAs correlated with the presence of a sSNP (C to T

transition at positions 50555 and 1292100 wrt H37Rvgenomic sequence for as_ino1 and as_narH respectively)upstream of the asRNAs Approximate information re-garding the transcriptional start site was deduced from thebinding co-ordinates of the probes that detected the tran-scripts As with the M bovis antisense sRNAs described

Figure 3 Expressions and schematic representation of genomic locations of selected cis-encoded antisense sRNAs identified using atiled oligonucleotide microarray Three asRNAs (open arrows) are (a) T6 (b) T14 and (c) T25 For each asRNA a histogram plots the foldchanges for each of the oligonucleotide probes that detected the asRNA and for each probe the binding position relative to the 212297genome is indicated Closed and open arrows indicate lengths and direction of transcription of genes and asRNAs respectively

Golby et al BMC Genomics 2013 14710 Page 12 of 18httpwwwbiomedcentralcom1471-216414710

above the T residue associated with the expression ofthe M tuberculosis asRNAs is part of a putative minus10element A minus35 element with an appropriate spacing tothe minus10 element was identified for as_ino1 but not foras_narH suggesting that the as_narH promoter maybelong to group B mycobacterial promoters that havea conserved minus10 but no minus35 motif [26]

DiscussionsThe aim of this work was to define possible phenotypicvariation across M bovis field isolates through a combin-ation of genome sequencing comparative genomics andtranscriptome analyses from both in vitro and ex vivo con-ditions Using these approaches we uncovered a range ofnovel findings the most striking of which was the realisa-tion that genes that had been predicted to be differentiallyexpressed based on amplicon-microarray data were in fact

not upregulated and that instead it was an antisense tran-script that was showing differential expression Analysingboth transcriptome and genome sequence data allowed usto identify SNPs responsible for the transcription of anti-sense RNAs with generation of a consensus minus10 promotersequence the likely mechanism Our results suggest thatdata generated from amplicon arrays in the past may needto be revisited as it is possible that some coding-sequences identified as being differentially expressed wereinstead antisense transcriptsWith the growth of technologies such as high density

tiled oligonucleotide microarrays and next generationsequencing there has been a rapid increase in the num-ber of reports describing the existence of non-codingRNAs (ncRNAs) in bacteria Non-coding RNAs broadlyconsist of two types cis- and trans-encoded RNA TransRNA includes intergenic encoded RNA while cis-encodedRNA includes 5prime and 3prime untranslated regions of mRNAand antisense RNA To study the expressions of both cis-and trans encoded ncRNA we used a high density oligo-nucleotide tiled microarray since our amplicon microarraywas unable to detect intergenic transcripts or differentiatebetween sense and antisense transcripts Previous studiesusing M tuberculosis have identified substantial amountsof ncRNA encoded in both intergenic and intragenic re-gions [78] We detected substantial amounts of ncRNAs

Figure 4 Promoters of anti sense RNAs a Promoters of the asRNAs as_mb1618c as_1914c and as_echA21 -10 and minus35 elements areindicated in bold and italics Transcriptional start sites are indicated by large font G characters while SNP residue that leads to the expression ofthe asRNA is indicated by a large font red T residue The consensus sequence for group a mycobacterial promoters is indicated Numericalsubscripts indicate the percentage of the total number of promoters for which a transcriptional start site has been experimentally determinedthat show the indicated residue b Promoters of the differentially expressed asRNAs as_ino1 and as_narH in M tuberculosis -10 and minus35 elementsare indicated in bold italics The red residue indicates SNP responsible for differential expression

Table 6 Determination of transcriptional start sites ofantisense RNAs

Transcript asRNA PCRprod size

TSSresidue type

TSS residue postn(wrt 212297)

T6 mb1618c_as 150 G 1778886

T11 mb1914c_as 220 G 2122978

T25 echA21_as 175 G 4155796

Golby et al BMC Genomics 2013 14710 Page 13 of 18httpwwwbiomedcentralcom1471-216414710

in M bovis including many instances of cis-antisenseRNA species Due to their perfect complementarity cisasRNA form a duplex with the sense strand encodedtranscript resulting in either degradation [27] or transla-tion inhibition [28] of the sense mRNA AntisenseRNAs vary in length ranging from 10s to 1000primes ofnucleotides and can be classified according to theirencoded position with respect to the opposite senseencoded gene Thus they can be classified as 5prime or 3primeoverlapping while others are classified as internallylocated The 112101 specific as_Mb1618c is an exampleof a 5prime overlapping asRNA which is encoded antisense togene Mb1618c which is predicted to express a secretorylipase The location of the asRNA transcript suggests itmay function to prevent translation of Mb1618c mRNAby steric hindrance of the ribosome binding siteIn the work presented here strain 245101 expressed

two asRNAs as_Mb1914c and as_echA21 that are notexpressed by any of the other 3 strains They areencoded within the central part of the opposite genesand are therefore likely to modulate the stability of thetranscripts Mb1914c encodes a short chain dehydro-genase while echA21 encodes an enoyl-CoA hydrataseShort chain dehydrogenases catalyse a wide range offunctions so the precise function and identity of thesubstrate is difficult to deduce from sequence aloneEnoyl-CoA hydratases hydrate double carbon-carbonbonds of macromolecules and are vital in the metabol-ism of fatty acids Both gene products would thereforeappear to be involved in the metabolism of a macromol-ecule and their similar expression profiles in this straincould indicate involvement in the metabolism of the samemolecule or molecules that are of the same pathwayIn many instances upregulation of asRNA negatively

correlates with the transcription of the antisense gene[27] but in many cases expression of the antisense tran-script has no effect on the transcription of the oppositegene In our studies expressions of as_Mb1618cas_Mb1914c and as_echA21 did not appear to have anyeffect on the expressions of the opposite sense encodedgenes (data not shown) We have shown that theexpressions of the asRNAs are associated with the pres-ence of SNPs which are either synonymous or non-synonymous with respect to the sense transcript butupstream of the asRNA transcriptional start site Thishighlights the fact that mutations can potentially affectexpression of transcripts on both strands and that theclassification of a SNP is strand dependent For each ofthe three asRNAs the associated SNP was found to belocated within a putative minus10 promoter motif of groupA mycobacterial promoters The sixth residue of the minus10hexamer motif consensus sequence is a strongly conservedlsquoTrsquo residue which is present in 81 of all group A myco-bacterial promoter elements Its importance is underlined

by the finding that the strains that exhibit a lsquoCrsquo residue atthis position show no detectable expression of the asRNAwhile strains having a lsquoTrsquo residue at this position exhibitexpressionSingle nucleotide polymorphisms were found to be

the most frequent form of genetic variation that existsbetween the isolates with a total of 1013 SNPs detectedacross the three strains 112101 245101 and 130701compared to the reference strain M bovis 212297Non-synonymous SNPs which include both non-senseand missense SNPs are a class of SNPs most likely toimpact on protein function and contribute to phenotypicvariation Non-sense SNPs which results in the expressionof a truncated polypeptide due to the introduction of apremature stop codon were identified in five genes acrossthe strains Of these we focussed our attentions on thenon-sense SNP present in a gene encoding the LysR regu-lator Mb2007c as mutations affecting regulators are likelyto impact on the expression of one or more genes that arepart of the regulon of the regulator and are therefore morelikely to result in phenotypic variation Experiments tocompare the transcriptomes of a strain that exhibited themutation with a strain overexpressing a functional regula-tor did not however reveal any differences The reasonfor this unclear but could reflect a requirement of theregulator for a co-inducer that was absent under the con-ditions of the experiment The consequences of missenseSNPs are more difficult to predict as substitutions of oneamino acid for another in a protein sequence do not ne-cessarily lead to a change in protein function Howeverfor genes that are controlled by an autoregulatory mech-anism a mutation that affects the ability of the product ofthe gene to regulate itself will result in a change in expres-sion of the gene In our studies we have shown that thepresence of a missense mutation in a VapB type toxin en-coding gene Mb1749c in strain 130701 results in theupregulation in expression of the toxin-antitoxin encodingpair of genes Mb1750c-Mb1749c due to the inability ofthe encoded proteins to self regulate themselves Toxin-antitoxin systems have a variety of proposed cellular func-tions including general regulation of mRNA stability levelsin the cell [29] Further experiments are required to fullyunderstand the consequences of this mutationGenomic deletions have played an important role in

the evolution of strains belonging to the mycobacterialcomplex [30] and in the derivation of the tuberculosisvaccine strain M bovis BCG [31] In addition to the previ-ously described 68 kb gene deletion that is specific tostrains having a spoligotype 17 pattern [6] we have identi-fied a 16 kb multi-gene deletion that is specific to strain130701 and encompasses genes that are part of the DosRregulon However one of the deleted genes exists as apseudo gene in DosR in strain 212297 so its importanceto the biology of M bovis is unlikely to be significant

Golby et al BMC Genomics 2013 14710 Page 14 of 18httpwwwbiomedcentralcom1471-216414710

Several other genes with internal deletions were detectedbut none of the encoded proteins have any significantsimilarity to any protein with a defined function

ConclusionsIn conclusion we have performed a comprehensive ana-lysis of 4 M bovis strains of the most common moleculartypes circulating in GB We show that while these strainsshow extensive similarities in their genetic make-up andgene expression profiles they show distinct differences inthe expression of a subset of genes We provide functionaldata to show that SNPs can lead to the expression of anti-sense RNA a finding with implications for how we definea lsquosilentrsquo nucleotide change Furthermore we show thatthe interpretation of transcriptome data based solely onamplicon arrays could lead to artefacts due to expressionof antisense transcripts a caveat that needs to be kept inmind for previous studies of global expression analysis inbacteria

MethodsBacterial strains media and growth conditionsFor the Mϕ infection experiments bovine alveolar Mϕwere cultivated in tissue culture media R10 whichconsisted of RPMI (Invitrogen) media plus 2 mM glu-tamine 10 calf fetal serum and 1 amphotericinWhere used antibiotics gentamycin and ampicillinwere added at concentrations of 50 and 100 μg mlrespectively M bovis field strains were pre-grown inMiddlebrook 7H9 broth supplemented with 10 albumin-dextrose catalase (ADC Difco) 005 Tween and 10 mMpyruvate Cultures were harvested in mid-logarithmicphase (OD600 of 03-08) washed and then resuspended inRPMI containing 005 Tween 80

Isolation of bovine alveolar macrophages and infectionwith mycobacteriaThe lungs of a 6ndash8 week old male Holstein-Friesian calfwere removed and a whole lung lavage procedure wasperformed to washout the alveolar Mϕ Briefly 4ndash5 x500 ml aliquots of Hanksrsquo Balanced Sterile Salts solution(HBSS) were used to infuse the lungs via the tracheaand the washings were pooled in a sterile beaker The Mϕcells contained in the washes were pelleted by centrifuga-tion at 500 x g for 10 mins at 4degC washed and thenresuspended in R10 growth media supplemented withantibiotics (R10+) to a concentration of 1ndash2 x 107 mlApproximately 05-15 x 109 Mϕ were isolated per calflungVented 225 cm2 tissue culture flasks containing R10+

media were seeded with 3ndash4 x 107 alveolar Mϕ andplaced in a humidified 37degC incubator containing 5CO2 Typically 2ndash4 flasks were used per strain and timepoint After 24 hrs the growth media was decanted to

remove non-adherent cells and then replaced with freshR10+ media After a further 24 hrs the growth mediawas discarded and the Mϕ monolayer was washed withRPMI to remove traces of the antibiotic containinggrowth media The monolayer was then covered withR10 media without antibiotics (R10-) and then infectedwith mid-logarithmic phase grown mycobacteria usingan MOI of approximately 101 (bacilli Mϕ) The AlvMϕwere incubated with mycobacteria for 4 hrs after whichthe cell monolayers were washed with RPMI and theneither processed for RNA extraction (4 hr time point)or incubated in fresh R10+ media for a further 20 hrsbefore being processed for RNA extraction (24 hr timepoint)

Extraction and amplification of mycobacterial RNA frominfected macrophagesMϕ cell monolayers were lysed using a guanidiniumthiocyanate (GTC) containing solution The lysed Mϕrsquoswere vortexed and passed twice through a 21G bluntended needle to sheer host genomic DNA and therebyreduce the viscosity of the solution Mycobacterial cellswere then pelleted by centrifugation at 4600 rpm for 20mins at room temperature and washed with GTC solu-tion to remove host genomic DNA Cells were thenresuspended in Trizol and RNA was extracted using theprotocol outlined in Bacon et al [32] The amount ofpurified DNase-treated RNA recovered was of the order100ndash500 ng per time point RNA was amplified usingthe lsquoMessageAmp II-Bacteria RNA Amplification Kitrsquo(Ambion) according to the manufacturersrsquo instructionsUsing an input of 100 ng of unamplified RNA 20ndash100μg of amplified RNA was recovered

Amplicon microarray analysisFor the in vitro growth experiments three independentexperiments (biological replicates) were carried out andfor each strain in each experiment two microarrays (tech-nical replicates) were performed Thus for each strain 6microarrays were performed Three independent AlvMϕinfection experiments were carried out and for eachexperiment two microarrays were performed for each ofthe control RPMI samples and the 4- and 24 hrs post-infection samples Cy5 and Cy3 fluorescently-labelledprobes were synthesised from RNA and genomicDNA respectively and hybridised to whole genomeM bovis M tuberculosis microarrays The array designis available in BμGSbase (accession number A-BUGS-31httpbugssgulacukA-BUGS-31) and also ArrayExpress(accession number A-BUGS-31) Details of probe synthesishybridization conditions and manufacture of the micro-array can be found in Golby et al 2008 Microarrayswere scanned using an Affymetrix 428 Microarrayscanner and scanned images were quantified using

Golby et al BMC Genomics 2013 14710 Page 15 of 18httpwwwbiomedcentralcom1471-216414710

BlueFuse for Microarrays v32 software (BlueGnome)See Golby et al 2008 for further detailsNormalisation was performed by dividing the log ratio

of the Cy5 to Cy3 signal for every spot by the median ofthe log ratios for all spots except control spots A medianabsolute (MAD) scale transformation was applied to thenormalised data from the pas an additional normalisationstep For every microarray duplicate spots were averagedand then the average expression of every gene across alltechnical replicate microarrays was calculated Averagesof the three biological replicates were used to comparegene expression between strains For each gene a mod-erated t-test was applied and those genes with a P- valueless than 005 were selected From this gene list thosegenes whose average expression differed by more than25-fold between strains were selected Fully annotatedmicroarray data have been deposited in BμGSbase (acces-sion number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (accession number E-BUGS-150)

Oligonucleotide microarray analysisExperiments were performed in a similar way to thatdescribed for the in vitro amplicon array experimentsexcept that the RNA was purified using the mirVanamiRNA Isolation kit (Ambion) which is designed tocapture small (gt20 nt) RNA species RNA and genomicDNA were directly labelled with Cy5- and Cy3 respect-ively using the ULS microRNA labelling kit (Kreatech)according to the manufacturerrsquos instructions PurifiedCy5- and Cy3-labelled probes were co-purified andapplied to an Agilent 40K custom made tiled (10 ntoverlap) 60-mer oligonucleotide microarray designed tothe genomic sequence of M bovis 21229797 The arraydesign is available in BμGSbase (accession numberA-BUGS-52 httpbugssgulacukA-BUGS-52) and alsoArrayExpress (accession number A-BUGS-52) Microarrayswere hybridised at 65degC for 18 hrs and then washed inWash Buffer 1 (Agilent) at room temperature for 1 minuteThe slides were then washed in Wash Buffer 2 (Agilent) at37degC for 1 minute dried and then scanned at 2 μm usingan Agilent DNA microarray scannerTiling array data was analysed using the Limma package

of RBioconductor [33] The signal median was quan-tile normalised between arrays followed by a LOESSnormalisation within arrays Differential expressionanalysis was performed by pairwise comparison usinglinear models and empirical Bayes methods and Pvalues adjusted using the Benjamini and Hochbergsmethod to control for multiple testing Fully annotatedmicroarray data have been deposited in BμGSbase(accession number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (accession num-ber E-BUGS-150)

Whole genome sequencing of M bovis field isolatesWhole genome paired end (2 x 76bp) sequencing wasperformed using Illumina HiSeq machines at theWellcome Trust Sanger Institute (Hinxton Cambridge)Raw sequence data was uploaded to the European Nucleo-tide Archive (ENA) and can be downloaded at httpwwwebiacukena accession numbers ERX006616 ERX06617ERX012284 and ERX012286 The FastQC (httpwwwbioinformaticsbabrahamacukprojectsfastqc) programwas used to analyse the quality of the raw data reads Rawsequence data was trimmed to remove adapter sequencesand nucleotides where the sequence quality score wasbelow 20 Filtered reads were aligned to the M bovis refer-ence strain 212297 [10] with the SMALT alignmentprogram (httpwwwsangeracukresourcessoftwaresmalt)using default settings The average mean coverages for thesequenced strains 112101 245101 and 130701 were293 224 and 234 respectively Reads that did not maponto the reference genome were de novo assembled andblasted against the NCBI data base in order to find regionspresent in these strains but absent in 212201 Variantcalling and the generation of consensus sequenceswere carried out using the SAMtools program suite(httpsamtoolssourceforgenet) SNPs that had a mini-mum of quality score of 200 and had a minimum of fourgood quality forward (reverse) reads covering the SNP siteand having the variant base were retained The number offorward (reverse) reads mapping with good quality ontothe SNP site having the same base as in the reference hadto be less than 5 of the total number of forward (reverse)reads mapping with good quality onto the SNP site havingthe variant base Sequences of 12 bases in length centredaround SNP sites were blasted to the genome of the refer-ence strain 212297 and any SNP within an area with a90 hit to another area of the genome were filtered outThis eliminated most of the SNPs within the repetitivePE-PGRS regions For all sequenced strains more than999 of the genomes were covered by reads

Real time RT-PCRQuantitative real-time SYBR Green based PCR (qRT-PCR)experiments were performed using a RotorGene 3000(Corbett research) as described by Golby et al [23] Foldchanges were calculated using relative standard curvemethod and pcr controls included no template and noreverse transcriptase Primer pair sequences are given inAdditional file 5 available with the online version ofthis paper

Construction of the Rv1749c-Rv1750c and nirBDoverexpressing plasmidsThe Rv1749c-Rv1750c overexpressing plasmids pPG106and pPG107 were constructed by PCR amplification of a907bp fragment encompassing Rv1749c-Rv1750c-lsquoRv1751c

Golby et al BMC Genomics 2013 14710 Page 16 of 18httpwwwbiomedcentralcom1471-216414710

using primers tox_f and tox_r For pPG106 the fragmentwas amplified using 21229797 genomic DNA as a tem-plate while pPG107 was amplified using 130701 gDNABoth fragments were digested with SpeI and cloned intothe SpeI cut mycobacterial attP-integrating shuttle vectorpKINT (a gift from Douglas Young Imperial CollegeLondon) Plasmids pPG108 and pPG109 which contain a35 kb hsprsquo-nirB-nirD-lsquocobU fragment was constructed inseveral steps Firstly two 16 kb hsprsquo-nirBrsquo PCR fragmentswere PCR amplified separately using primers nirB1_f andnirB1_r and 212297 and 130701 as genomic templatesSimilarly two 18kb lsquonirB-nirD-lsquocobU fragments wereamplified using primers nirB2_f and nirB2_r and genomicDNAs 212297 and 130701 The two PCR products weredigested with SpeI and BamHI and then co-ligated intopKINT Details concerning the nucleotide sequences ofthe pcr primer pairs are given in the Additional file 5

5prime-RLM-RACE PCRTranscriptional start site mapping of was determinedusing the First Choice RNA ligase-mediated rapid amp-lification of cDNA ends (RLM-RACE) kit (Ambion) asper manufacturers instructions Briefly 10 ug of totalRNA was treated with calf intestinal phosphatase (CIP)and tobacco acid pyrophosphatase (TAP) before ligationof an RNA Adapter oligonucleotide to the 5prime ends ofthe mRNA transcripts A random-primed reverse tran-scription reaction was carried out to generate cDNAand then nested PCR reactions were performed on thecDNA using combinations of adapter and gene specificprimers Details concerning the nucleotide sequencesof the 5prime outer and inner adapter sequences as well as3prime outer and inner gene specific primers are given inthe Additional file 5 PCR products generated usingthe 5prime inner adapter and 3prime inner gene specific primerswere sequenced by Sanger sequencing using the 3prime innergene specific primer

Animal ethics statementAnimal work was carried out according to the UK Animal(Scientific Procedures) Act 1986 The study protocol wasapproved by the AHVLA Animal Use Ethics Committee(UK Home Office PCD number 706905)

Availability of supporting data sectionFully annotated microarray data have been deposited inBμGSbase (accession number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (ac-cession number E-BUGS-150) Raw sequence data wasuploaded to the European Nucleotide Archive (ENA) andcan be downloaded at httpwwwebiacukenadataviewERX006616-ERX06617ERX012284-ERX012286

Additional files

Additional file 1 SNPs identified across sequenced strains

Additional file 2 Large deletions identified by NGS in sequencedstrains

Additional file 3 Fold change differences in gene expression inM bovis field isolates 1121 2451 and 1307 compared to 2122Cells shaded in red indicate upregulation green indicate down-regulation and empty cells indicate no change in expression

Additional file 4 Differential expression of RNA transcripts asdetected by a tiled oligonucleotide microarray Up and down arrowsindicate fold up- and downregulation respectively and empty cellsindicate no change in expression Strand A indicates antisense S senseand I intergenic

Additional file 5 Details of oligonucleotides used in PCR RT-PCRand RLM-RACE experiments

Competing interestsThe authors declare that they have no competing interests

Authorsrsquo contributionsPG carried out the culturing of organisms macrophage infectionsmicroarrays qRT-PCR cloning analysis of sequence data and writing of themanuscript JN performed analysis of the Illumina sequence and PCRamplicon microarray data AW and JH performed the analysis of the Agilentoligonucleotide microarray data and assisted in the uploading of themicroarray data into the BμGSbase database MAQ and SB wereresponsible for the Illumina sequencing SH contributed to the analysis ofthe Illumina sequence data NS contributed to the writing and critical reviewof the manuscript RGH was a co-investigator who helped to guide theproject SG was the principle investigator on the project who guided theresearch and helped in the writing of the manuscript All authors read andapproved the final manuscript

AcknowledgementsWe would like to acknowledge Kate Gould for her help in carrying out theoligonucleotide microarrays and Denise Waldron for her assistance inuploading of the microarray data into BμGSbase We also thank MarkChambers for comments on the manuscript and BBSRC and Defra forfunding the work (Grant no BBE0184911)

Author details1Animal Health and Veterinary Laboratories Agency Woodham Lane NewHaw Addlestone Surrey KT15 3NB UK 2Division of Clinical Sciences BacterialMicroarray Group Centre for Infection amp Immunity St Georgersquos University ofLondon Cranmer Terrace London SW17 0RE UK 3The Wellcome TrustSanger Institute Hinxton Cambridgeshire UK 4UCD School of VeterinaryMedicine and UCD Conway Institute University College Dublin Dublin 4Ireland

Received 4 June 2013 Accepted 26 September 2013Published 17 October 2013

References1 Monaghan ML Doherty ML Collins JD Kazda JF Quinn PJ The tuberculin

test Vet Microbiol 1994 40(1ndash2)111ndash1242 Kamerbeek J Schouls L Kolk A van Agterveld M van Soolingen D Kuijper

S Bunschoten A Molhuizen H Shaw R Goyal M et al Simultaneousdetection and strain differentiation of Mycobacterium tuberculosis fordiagnosis and epidemiology J Clin Microbiol 1997 35(4)907ndash914

3 Frothingham R Meeker-OrsquoConnell WA Genetic diversity in theMycobacterium tuberculosis complex based on variable numbers oftandem DNA repeats Microbiology 1998 144(Pt 5)1189ndash1196

4 Smith NH Dale J Inwald J Palmer S Gordon SV Hewinson RG Smith JMThe population structure of Mycobacterium bovis in Great Britain clonalexpansion Proc Natl Acad Sci USA 2003 100(25)15271ndash15275

5 Winder CL Gordon SV Dale J Hewinson RG Goodacre R Metabolicfingerprints of Mycobacterium bovis cluster with molecular type

Golby et al BMC Genomics 2013 14710 Page 17 of 18httpwwwbiomedcentralcom1471-216414710

implications for genotype-phenotype links Microbiology 2006152(Pt 9)2757ndash2765

6 Wheeler PR Brosch R Coldham NG Inwald JK Hewinson RG Gordon SVFunctional analysis of a clonal deletion in an epidemic strain ofMycobacterium bovis reveals a role in lipid metabolism Microbiology2008 154(Pt 12)3731ndash3742

7 Arnvig KB Young DB Identification of small RNAs in Mycobacteriumtuberculosis Mol Microbiol 2009 73(3)397ndash408

8 Arnvig KB Comas I Thomson NR Houghton J Boshoff HI Croucher NJRose G Perkins TT Parkhill J Dougan G et al Sequence-based analysisuncovers an abundance of non-coding RNA in the total transcriptome ofMycobacterium tuberculosis PLoS Pathog 2011 7(11)e1002342

9 Smith NH Upton P Naming spoligotype patterns for the RD9-deletedlineage of the Mycobacterium tuberculosis complex www MbovisorgInfect Genet Evol 2012 12(4)873ndash876

10 Garnier T Eiglmeier K Camus JC Medina N Mansoor H Pryor M Duthoy SGrondin S Lacroix C Monsempe C et al The complete genome sequenceof Mycobacterium bovis Proc Natl Acad Sci USA 2003 100(13)7877ndash7882

11 Albers CA Lunter G MacArthur DG McVean G Ouwehand WH Durbin RDindel accurate indel calls from short-read data Genome Res 201121(6)961ndash973

12 Mostowy S Inwald J Gordon S Martin C Warren R Kremer K Cousins DBehr MA Revisiting the evolution of Mycobacterium bovis J Bacteriol2005 187(18)6386ndash6395

13 Boon C Dick T Mycobacterium bovis BCG response regulator essentialfor hypoxic dormancy J Bacteriol 2002 184(24)6760ndash6767

14 Van Gelder RN von Zastrow ME Yool A Dement WC Barchas JD EberwineJH Amplified RNA synthesized from limited quantities of heterogeneouscDNA Proc Natl Acad Sci USA 1990 87(5)1663ndash1667

15 Pandey DP Gerdes K Toxin-antitoxin loci are highly abundant in free-living but lost from host-associated prokaryotes Nucleic Acids Res 200533(3)966ndash976

16 Arcus VL McKenzie JL Robson J Cook GM The PIN-domain ribonucleasesand the prokaryotic VapBC toxin-antitoxin array Protein Eng Des Sel 201124(1ndash2)33ndash40

17 Bukowski M Rojowska A Wladyka B Prokaryotic toxin-antitoxin systemsndashthe role in bacterial physiology and application in molecular biologyActa Biochim Pol 2011 58(1)1ndash9

18 Maddocks SE Oyston PC Structure and function of the LysR-typetranscriptional regulator (LTTR) family proteins Microbiology 2008154(Pt 12)3609ndash3623

19 Moreno-Vivian C Cabello P Martinez-Luque M Blasco R Castillo FProkaryotic nitrate reduction molecular properties and functionaldistinction among bacterial nitrate reductases J Bacteriol 1999 181(21)6573ndash6584

20 Tiffert Y Supra P Wurm R Wohlleben W Wagner R Reuther J TheStreptomyces coelicolor GlnR regulon identification of new GlnR targetsand evidence for a central role of GlnR in nitrogen metabolism inactinomycetes Mol Microbiol 2008 67(4)861ndash880

21 Duan J Wainwright MS Comeron JM Saitou N Sanders AR Gelernter JGejman PV Synonymous mutations in the human dopamine receptor D2(DRD2) affect mRNA stability and synthesis of the receptor Hum MolGenet 2003 12(3)205ndash216

22 Capon F Allen MH Ameen M Burden AD Tillman D Barker JN TrembathRC A synonymous SNP of the corneodesmosin gene leads to increasedmRNA stability and demonstrates association with psoriasis acrossdiverse ethnic groups Hum Mol Genet 2004 13(20)2361ndash2368

23 Golby P Hatch KA Bacon J Cooney R Riley P Allnutt J Hinds J Nunez JMarsh PD Hewinson RG et al Comparative transcriptomics reveals keygene expression differences between the human and bovine pathogensof the Mycobacterium tuberculosis complex Microbiology 2007153(Pt 10)3323ndash3336

24 Bitter W Houben EN Bottai D Brodin P Brown EJ Cox JS Derbyshire KFortune SM Gao LY Liu J et al Systematic genetic nomenclature for typeVII secretion systems PLoS Pathog 2009 5(10)e1000507

25 Sorek R Kunin V Hugenholtz P CRISPRndasha widespread system thatprovides acquired resistance against phages in bacteria and archaeaNat Rev Microbiol 2008 6(3)181ndash186

26 Gomez M Smith I Determinants of Mycobacterial Gene Expression InMolecular Genetics of Mycobacteria Washington DC ASM Press2000111ndash129

27 Duhring U Axmann IM Hess WR Wilde A An internal antisense RNAregulates expression of the photosynthesis gene isiA Proc Natl Acad SciUSA 2006 103(18)7054ndash7058

28 Kawano M Aravind L Storz G An antisense RNA controls synthesis of anSOS-induced toxin evolved from an antitoxin Mol Microbiol 200764(3)738ndash754

29 Thomason MK Storz G Bacterial antisense RNAs how many are thereand what are they doing Annu Rev Genet 2010 44167ndash188

30 Brosch R Pym AS Gordon SV Cole ST The evolution of mycobacterialpathogenicity clues from comparative genomics Trends Microbiol 20019(9)452ndash458

31 Calmette A Guerin C Sur quelques proprietes du bacille tuberculeuxdrsquoorigine cultive sur la bile de boeuf glycerinee C R Acad Sci Paris 1909149716ndash718

32 Bacon J James BW Wernisch L Williams A Morley KA Hatch GJ ManganJA Hinds J Stoker NG Butcher PD et al The influence of reduced oxygenavailability on pathogenicity and gene expression in Mycobacteriumtuberculosis Tuberculosis (Edinb) 2004 84(3ndash4)205ndash217

33 Smyth GK Linear models and empirical bayes methods for assessingdifferential expression in microarray experiments Stat Appl Genet Mol Biol2004 33

doi1011861471-2164-14-710Cite this article as Golby et al Genome-level analyses ofMycobacterium bovis lineages reveal the role of SNPs and antisensetranscription in differential gene expression BMC Genomics 2013 14710

Submit your next manuscript to BioMed Centraland take full advantage of

bull Convenient online submission

bull Thorough peer review

bull No space constraints or color figure charges

bull Immediate publication on acceptance

bull Inclusion in PubMed CAS Scopus and Google Scholar

bull Research which is freely available for redistribution

Submit your manuscript at wwwbiomedcentralcomsubmit

Golby et al BMC Genomics 2013 14710 Page 18 of 18httpwwwbiomedcentralcom1471-216414710

Table 5 Differential expression of RNA transcripts as detected by a tiled oligonucleotide microarray

Transcript No Probes Position Size cds Strand 112101 245101 130701 CDS Product SNP

T1 3 303138-303295 157 Mb0258nirB A 3uarr nitrite reductase (large sub-unit)

T2 5 303260-303515 255 Mb0258nirB S 5uarr nitrite reductase (large sub-unit) sSNP at 303227 C-T in 130701 only

T3 2 1105085-1105193 108 Mb1014 A 3uarr 3uarr probable adhesion componentof ABC transporter

T4 2 1105158-1105315 157 Mb1014 A 3uarr 3uarr probable adhesioncomponent of ABC transporter

T5 3 1105158-1105315 157 Mb1014 S 3uarr 3uarr probable adhesioncomponent of ABC transporter

T6 5 1778876-1779131 255 Mb1618c A 9uarr possible secreted lipase sSNP at 1778879 C-T in 112101 only

T7 2 1840174-1840282 108 Mb1672c S 4darr hypothetical protein

T8 3 1932158-1932315 157 tRNA-Pro S 3uarr proline tRNA

T9 5 1932452-1932707 255 Mb1749c S 4uarr toxin component oftoxin-antitoxin system

nSNP at 1932704 C-T (G to D) in 130701 only

T10 5 1932746-1933075 329 Mb1750c S 7uarr anti-toxin component oftoxin-antitoxin system

T11 6 2122959-2123263 304 Mb1914c A 12uarr probable short-chain typedehydrogenasereductase

sSNP at 2122970 C-T in 245101 only

T12 2 2324463-2324571 108 Mb2110 S 3uarr hypothetical protein(frame-shifted in 212297)

T13 2 2328944-2329052 108 Mb2116cpepE S 2darr 3darr probable dipeptidase

T14 2 2329483-2329591 108 Mb2117 A 10darr 10darr 5prime-3prime exonuclease nSNP at 2329583 A-G (comp strand I to T)in 245101 only

T15 3 2868310-2868467 157 Mb2607 A 6darr 8darr hp (frame-shifted in 212297)

T16 2 2868506-2868614 108 Mb2607 A 7darr 7darr hp (frame-shifted in 212297) nSNP at 2868616 A-G (stop to W) in245101 and 130701

T17 2 3075740-3075848 108 I 5uarr 6uarr

T18 2 3075887-3075995 108 I 5uarr 6uarr

T19 2 3076622-3076730 108 I 4uarr 4uarr

T20 2 3076769-3076877 108 I 5uarr 5uarr

T21 2 3078728-3078836 108 I 6uarr 7uarr

T22 2 3079169-3079277 108 I 4uarr 6uarr

T23 2 3079903-3080011 108 I 6uarr 7

T24 2 3844577-3844685 108 Mb3509c S 3uarr 3uarr possible acyltransferase(frame-shifted in 212297)

T25 6 4155501-4155805 304 Mb3803echA21 A 5uarr possible enoyl-coA hydratase sSNP at 4155803 C-T (comp strand) in 245101 only

T26 2 4316987-4317095 108 Mb3927c A 3uarr chp (frame-shifted in H37Rv)

Up and down arrows indicate fold up- and downregulation respectively and empty cells indicate no change in expressionStrand A indicates antisense S sense and I intergenic

Golby

etalBM

CGenom

ics201314710

Page10

of18

httpwwwbiom

edcentralcom1471-216414710

example the amplicon array data had appeared to suggestthat Mb1914c and echA21 were upregulated in 245101but the oligo array data indicates that transcripts 11 and25 which are encoded within the co-ordinates encoded bythose two genes are actually encoded on the antisensestrands This apparent discrepancy can be rationalisedonce we consider that double stranded amplicon micro-array probes cannot discriminate between transcriptsencoded on the sense or antisense strands Transcripts 11and 25 can therefore be considered as potential antisensesRNAs (asRNA) which could be involved in translationalor post-transcriptional control of the sense transcriptOther potential cis-encoded sRNAs detected using the ar-rays include T6 T14 and T15T16 which are encoded onthe antisense strands to Mb1618c Mb2117 and Mb2607respectively and for each of these transcripts their expres-sion appears to be linked to the presence of a single SNPwithin the co-ordinates of the genes The approximateboundaries of these transcripts can be derived from thegenomic co-ordinates of the oligonucleotide probes thatdetect the expression of the transcript Thus the tran-scripts appear to be between 100ndash300 nt in size and thepositions of the linked SNPs appear to be positioned eitherjust upstream or within the predicted 5primeend of the tran-scripts (Figure 3) Three of the transcripts (T11 T14 andT25) are antisense to the central part of the sense encodedgene while T6 is encoded antisense to the 5prime end ofMb1618c As well as antisense transcripts we also saw thedifferential expression of sense transcripts The ampliconmicroarray data (confirmed by real time RT-PCR) indi-cated that nirB is strongly upregulated specifically in130701 in both in vitro and ex-vivo Mϕ An analysis ofthe oligonucleotide array data however indicates thatthere are two short transcripts T1 and T2 (sense and anti-sense respectively) that are encoded within the genomicco-ordinates of the nirB gene T2 is the longer in size (255vs 155nt) and more highly expressed (5 vs 3-fold) thanT1 and both transcripts appear to be linked to the pres-ence of a SNP that is located within the middle of T1 andapproximately 50nt upstream of T2Some of the transcripts are bone fide gene sense

strand mRNA transcripts such as T9 and T10 whichare encoded by Mb1749c and Mb1750c respectivelyAlthough it would appear that the two genes are tran-scribed separately it is probable that the two transcriptsare co-transcribed as the stop codon of Mb1750c over-laps the start codon of Mb1749c Eight of the tran-scripts listed in Table 5 are encoded within intergenicregions 7 of which are encoded within the polymorphicdirect repeat (DR) locus The DR locus of strains belong-ing to the M tuberculosis complex has been suggested toconstitute a CRISPR locus which have been shown inmany species of bacteria to be involved in protectionagainst exogenous foreign DNA such as plasmids and

phage [25] All the DR encoded transcripts are short(approx 100 nt) straddle contiguous repeat and spacersequences and show approximately 5-fold higher levelsof expression in 245101 and 130701 compared with212297 and 112101

Characterisation of differentially expressed cis asRNAThe genomic co-ordinates of the oligonucleotide probesthat detected the antisense species described above canonly serve as approximate estimations as to their startand end points Thus we used 5prime RLM-RACE (RNALigase Mediated Rapid Amplification of cDNA Ends) inan attempt to accurately define the transcriptional startsites (TSS) for the short sense transcript T2 and the anti-sense transcripts T6 T11 and T25 described in the abovesection (see Methods) These transcripts were chosen astheir expression levels are high and their transcript lengthswere considered to be sufficiently long to enable theRLM-RACE methodology to work Table 6 details thesizes of the PCR products obtained after RLM-RACE wasperformed using oligonucleotide primers designed tosequences predicted for transcripts T6 T11 and T25 NoPCR product was obtained for transcript T2 For each ofthe three transcripts the TSS was determined to be a Gresidue which is the most commonly used residue typefor mycobacterial TSSrsquos (Figure 4) [26] For each of the T6T11 and T25 transcripts expression of the asRNAs waslinked to the presence of a SNP (C to T) proximal tothe 5prime end of the asRNA Strains exhibiting the lsquoCrsquo alleleshowed no expression of the asRNA whilst the strainthat showed expression had the lsquoTrsquo allele An analysis ofthe nucleotide sequence in the vicinity of the SNPsreveals that for each of the three transcripts the SNPconstitutes the 6th residue of a motif that has stronghomology to the consensus sequence for the minus10 elementof Group A mycobacterial promoters (Figure 4) [26] Thefinding that a lsquoTrsquo residue is associated with expression isconsistent with the consensus sequence which indicatesthat 86 of all minus10 elements have a lsquoTrsquo residue at the 6thresidue position Several residues that flank the minus10 motifalso show a degree of conservation Sequence motifswhich show strong homology to group A minus35 elementsare present 18ndash19 bp upstream of the putative minus10elements and the distances between the minus35 -10 andTSS elements are consistent with those elements ofthe consensus sequence No protein encoding openreading frames were detected within the T6 T11 andT25 transcriptsIn a parallel study high density oligonucleotide micro-

arrays were also used to interrogate the transcriptomesof M tuberculosis H37Rv M bovis BCG Pasteur Myco-bacterium caprae and M bovis AN5 that had beengrown in Middlebrook 7H9 media As a result of theseexperiments two asRNA species were found to be

Golby et al BMC Genomics 2013 14710 Page 11 of 18httpwwwbiomedcentralcom1471-216414710

expressed within the antisense strands of the ino1 andnarH genes of M tuberculosis H37Rv but not in any ofthe other 3 strains tested (data not shown) A compari-son of nucleotide sequences of the orthologous genesacross the species suggested that expressions of theas_sRNAs correlated with the presence of a sSNP (C to T

transition at positions 50555 and 1292100 wrt H37Rvgenomic sequence for as_ino1 and as_narH respectively)upstream of the asRNAs Approximate information re-garding the transcriptional start site was deduced from thebinding co-ordinates of the probes that detected the tran-scripts As with the M bovis antisense sRNAs described

Figure 3 Expressions and schematic representation of genomic locations of selected cis-encoded antisense sRNAs identified using atiled oligonucleotide microarray Three asRNAs (open arrows) are (a) T6 (b) T14 and (c) T25 For each asRNA a histogram plots the foldchanges for each of the oligonucleotide probes that detected the asRNA and for each probe the binding position relative to the 212297genome is indicated Closed and open arrows indicate lengths and direction of transcription of genes and asRNAs respectively

Golby et al BMC Genomics 2013 14710 Page 12 of 18httpwwwbiomedcentralcom1471-216414710

above the T residue associated with the expression ofthe M tuberculosis asRNAs is part of a putative minus10element A minus35 element with an appropriate spacing tothe minus10 element was identified for as_ino1 but not foras_narH suggesting that the as_narH promoter maybelong to group B mycobacterial promoters that havea conserved minus10 but no minus35 motif [26]

DiscussionsThe aim of this work was to define possible phenotypicvariation across M bovis field isolates through a combin-ation of genome sequencing comparative genomics andtranscriptome analyses from both in vitro and ex vivo con-ditions Using these approaches we uncovered a range ofnovel findings the most striking of which was the realisa-tion that genes that had been predicted to be differentiallyexpressed based on amplicon-microarray data were in fact

not upregulated and that instead it was an antisense tran-script that was showing differential expression Analysingboth transcriptome and genome sequence data allowed usto identify SNPs responsible for the transcription of anti-sense RNAs with generation of a consensus minus10 promotersequence the likely mechanism Our results suggest thatdata generated from amplicon arrays in the past may needto be revisited as it is possible that some coding-sequences identified as being differentially expressed wereinstead antisense transcriptsWith the growth of technologies such as high density

tiled oligonucleotide microarrays and next generationsequencing there has been a rapid increase in the num-ber of reports describing the existence of non-codingRNAs (ncRNAs) in bacteria Non-coding RNAs broadlyconsist of two types cis- and trans-encoded RNA TransRNA includes intergenic encoded RNA while cis-encodedRNA includes 5prime and 3prime untranslated regions of mRNAand antisense RNA To study the expressions of both cis-and trans encoded ncRNA we used a high density oligo-nucleotide tiled microarray since our amplicon microarraywas unable to detect intergenic transcripts or differentiatebetween sense and antisense transcripts Previous studiesusing M tuberculosis have identified substantial amountsof ncRNA encoded in both intergenic and intragenic re-gions [78] We detected substantial amounts of ncRNAs

Figure 4 Promoters of anti sense RNAs a Promoters of the asRNAs as_mb1618c as_1914c and as_echA21 -10 and minus35 elements areindicated in bold and italics Transcriptional start sites are indicated by large font G characters while SNP residue that leads to the expression ofthe asRNA is indicated by a large font red T residue The consensus sequence for group a mycobacterial promoters is indicated Numericalsubscripts indicate the percentage of the total number of promoters for which a transcriptional start site has been experimentally determinedthat show the indicated residue b Promoters of the differentially expressed asRNAs as_ino1 and as_narH in M tuberculosis -10 and minus35 elementsare indicated in bold italics The red residue indicates SNP responsible for differential expression

Table 6 Determination of transcriptional start sites ofantisense RNAs

Transcript asRNA PCRprod size

TSSresidue type

TSS residue postn(wrt 212297)

T6 mb1618c_as 150 G 1778886

T11 mb1914c_as 220 G 2122978

T25 echA21_as 175 G 4155796

Golby et al BMC Genomics 2013 14710 Page 13 of 18httpwwwbiomedcentralcom1471-216414710

in M bovis including many instances of cis-antisenseRNA species Due to their perfect complementarity cisasRNA form a duplex with the sense strand encodedtranscript resulting in either degradation [27] or transla-tion inhibition [28] of the sense mRNA AntisenseRNAs vary in length ranging from 10s to 1000primes ofnucleotides and can be classified according to theirencoded position with respect to the opposite senseencoded gene Thus they can be classified as 5prime or 3primeoverlapping while others are classified as internallylocated The 112101 specific as_Mb1618c is an exampleof a 5prime overlapping asRNA which is encoded antisense togene Mb1618c which is predicted to express a secretorylipase The location of the asRNA transcript suggests itmay function to prevent translation of Mb1618c mRNAby steric hindrance of the ribosome binding siteIn the work presented here strain 245101 expressed

two asRNAs as_Mb1914c and as_echA21 that are notexpressed by any of the other 3 strains They areencoded within the central part of the opposite genesand are therefore likely to modulate the stability of thetranscripts Mb1914c encodes a short chain dehydro-genase while echA21 encodes an enoyl-CoA hydrataseShort chain dehydrogenases catalyse a wide range offunctions so the precise function and identity of thesubstrate is difficult to deduce from sequence aloneEnoyl-CoA hydratases hydrate double carbon-carbonbonds of macromolecules and are vital in the metabol-ism of fatty acids Both gene products would thereforeappear to be involved in the metabolism of a macromol-ecule and their similar expression profiles in this straincould indicate involvement in the metabolism of the samemolecule or molecules that are of the same pathwayIn many instances upregulation of asRNA negatively

correlates with the transcription of the antisense gene[27] but in many cases expression of the antisense tran-script has no effect on the transcription of the oppositegene In our studies expressions of as_Mb1618cas_Mb1914c and as_echA21 did not appear to have anyeffect on the expressions of the opposite sense encodedgenes (data not shown) We have shown that theexpressions of the asRNAs are associated with the pres-ence of SNPs which are either synonymous or non-synonymous with respect to the sense transcript butupstream of the asRNA transcriptional start site Thishighlights the fact that mutations can potentially affectexpression of transcripts on both strands and that theclassification of a SNP is strand dependent For each ofthe three asRNAs the associated SNP was found to belocated within a putative minus10 promoter motif of groupA mycobacterial promoters The sixth residue of the minus10hexamer motif consensus sequence is a strongly conservedlsquoTrsquo residue which is present in 81 of all group A myco-bacterial promoter elements Its importance is underlined

by the finding that the strains that exhibit a lsquoCrsquo residue atthis position show no detectable expression of the asRNAwhile strains having a lsquoTrsquo residue at this position exhibitexpressionSingle nucleotide polymorphisms were found to be

the most frequent form of genetic variation that existsbetween the isolates with a total of 1013 SNPs detectedacross the three strains 112101 245101 and 130701compared to the reference strain M bovis 212297Non-synonymous SNPs which include both non-senseand missense SNPs are a class of SNPs most likely toimpact on protein function and contribute to phenotypicvariation Non-sense SNPs which results in the expressionof a truncated polypeptide due to the introduction of apremature stop codon were identified in five genes acrossthe strains Of these we focussed our attentions on thenon-sense SNP present in a gene encoding the LysR regu-lator Mb2007c as mutations affecting regulators are likelyto impact on the expression of one or more genes that arepart of the regulon of the regulator and are therefore morelikely to result in phenotypic variation Experiments tocompare the transcriptomes of a strain that exhibited themutation with a strain overexpressing a functional regula-tor did not however reveal any differences The reasonfor this unclear but could reflect a requirement of theregulator for a co-inducer that was absent under the con-ditions of the experiment The consequences of missenseSNPs are more difficult to predict as substitutions of oneamino acid for another in a protein sequence do not ne-cessarily lead to a change in protein function Howeverfor genes that are controlled by an autoregulatory mech-anism a mutation that affects the ability of the product ofthe gene to regulate itself will result in a change in expres-sion of the gene In our studies we have shown that thepresence of a missense mutation in a VapB type toxin en-coding gene Mb1749c in strain 130701 results in theupregulation in expression of the toxin-antitoxin encodingpair of genes Mb1750c-Mb1749c due to the inability ofthe encoded proteins to self regulate themselves Toxin-antitoxin systems have a variety of proposed cellular func-tions including general regulation of mRNA stability levelsin the cell [29] Further experiments are required to fullyunderstand the consequences of this mutationGenomic deletions have played an important role in

the evolution of strains belonging to the mycobacterialcomplex [30] and in the derivation of the tuberculosisvaccine strain M bovis BCG [31] In addition to the previ-ously described 68 kb gene deletion that is specific tostrains having a spoligotype 17 pattern [6] we have identi-fied a 16 kb multi-gene deletion that is specific to strain130701 and encompasses genes that are part of the DosRregulon However one of the deleted genes exists as apseudo gene in DosR in strain 212297 so its importanceto the biology of M bovis is unlikely to be significant

Golby et al BMC Genomics 2013 14710 Page 14 of 18httpwwwbiomedcentralcom1471-216414710

Several other genes with internal deletions were detectedbut none of the encoded proteins have any significantsimilarity to any protein with a defined function

ConclusionsIn conclusion we have performed a comprehensive ana-lysis of 4 M bovis strains of the most common moleculartypes circulating in GB We show that while these strainsshow extensive similarities in their genetic make-up andgene expression profiles they show distinct differences inthe expression of a subset of genes We provide functionaldata to show that SNPs can lead to the expression of anti-sense RNA a finding with implications for how we definea lsquosilentrsquo nucleotide change Furthermore we show thatthe interpretation of transcriptome data based solely onamplicon arrays could lead to artefacts due to expressionof antisense transcripts a caveat that needs to be kept inmind for previous studies of global expression analysis inbacteria

MethodsBacterial strains media and growth conditionsFor the Mϕ infection experiments bovine alveolar Mϕwere cultivated in tissue culture media R10 whichconsisted of RPMI (Invitrogen) media plus 2 mM glu-tamine 10 calf fetal serum and 1 amphotericinWhere used antibiotics gentamycin and ampicillinwere added at concentrations of 50 and 100 μg mlrespectively M bovis field strains were pre-grown inMiddlebrook 7H9 broth supplemented with 10 albumin-dextrose catalase (ADC Difco) 005 Tween and 10 mMpyruvate Cultures were harvested in mid-logarithmicphase (OD600 of 03-08) washed and then resuspended inRPMI containing 005 Tween 80

Isolation of bovine alveolar macrophages and infectionwith mycobacteriaThe lungs of a 6ndash8 week old male Holstein-Friesian calfwere removed and a whole lung lavage procedure wasperformed to washout the alveolar Mϕ Briefly 4ndash5 x500 ml aliquots of Hanksrsquo Balanced Sterile Salts solution(HBSS) were used to infuse the lungs via the tracheaand the washings were pooled in a sterile beaker The Mϕcells contained in the washes were pelleted by centrifuga-tion at 500 x g for 10 mins at 4degC washed and thenresuspended in R10 growth media supplemented withantibiotics (R10+) to a concentration of 1ndash2 x 107 mlApproximately 05-15 x 109 Mϕ were isolated per calflungVented 225 cm2 tissue culture flasks containing R10+

media were seeded with 3ndash4 x 107 alveolar Mϕ andplaced in a humidified 37degC incubator containing 5CO2 Typically 2ndash4 flasks were used per strain and timepoint After 24 hrs the growth media was decanted to

remove non-adherent cells and then replaced with freshR10+ media After a further 24 hrs the growth mediawas discarded and the Mϕ monolayer was washed withRPMI to remove traces of the antibiotic containinggrowth media The monolayer was then covered withR10 media without antibiotics (R10-) and then infectedwith mid-logarithmic phase grown mycobacteria usingan MOI of approximately 101 (bacilli Mϕ) The AlvMϕwere incubated with mycobacteria for 4 hrs after whichthe cell monolayers were washed with RPMI and theneither processed for RNA extraction (4 hr time point)or incubated in fresh R10+ media for a further 20 hrsbefore being processed for RNA extraction (24 hr timepoint)

Extraction and amplification of mycobacterial RNA frominfected macrophagesMϕ cell monolayers were lysed using a guanidiniumthiocyanate (GTC) containing solution The lysed Mϕrsquoswere vortexed and passed twice through a 21G bluntended needle to sheer host genomic DNA and therebyreduce the viscosity of the solution Mycobacterial cellswere then pelleted by centrifugation at 4600 rpm for 20mins at room temperature and washed with GTC solu-tion to remove host genomic DNA Cells were thenresuspended in Trizol and RNA was extracted using theprotocol outlined in Bacon et al [32] The amount ofpurified DNase-treated RNA recovered was of the order100ndash500 ng per time point RNA was amplified usingthe lsquoMessageAmp II-Bacteria RNA Amplification Kitrsquo(Ambion) according to the manufacturersrsquo instructionsUsing an input of 100 ng of unamplified RNA 20ndash100μg of amplified RNA was recovered

Amplicon microarray analysisFor the in vitro growth experiments three independentexperiments (biological replicates) were carried out andfor each strain in each experiment two microarrays (tech-nical replicates) were performed Thus for each strain 6microarrays were performed Three independent AlvMϕinfection experiments were carried out and for eachexperiment two microarrays were performed for each ofthe control RPMI samples and the 4- and 24 hrs post-infection samples Cy5 and Cy3 fluorescently-labelledprobes were synthesised from RNA and genomicDNA respectively and hybridised to whole genomeM bovis M tuberculosis microarrays The array designis available in BμGSbase (accession number A-BUGS-31httpbugssgulacukA-BUGS-31) and also ArrayExpress(accession number A-BUGS-31) Details of probe synthesishybridization conditions and manufacture of the micro-array can be found in Golby et al 2008 Microarrayswere scanned using an Affymetrix 428 Microarrayscanner and scanned images were quantified using

Golby et al BMC Genomics 2013 14710 Page 15 of 18httpwwwbiomedcentralcom1471-216414710

BlueFuse for Microarrays v32 software (BlueGnome)See Golby et al 2008 for further detailsNormalisation was performed by dividing the log ratio

of the Cy5 to Cy3 signal for every spot by the median ofthe log ratios for all spots except control spots A medianabsolute (MAD) scale transformation was applied to thenormalised data from the pas an additional normalisationstep For every microarray duplicate spots were averagedand then the average expression of every gene across alltechnical replicate microarrays was calculated Averagesof the three biological replicates were used to comparegene expression between strains For each gene a mod-erated t-test was applied and those genes with a P- valueless than 005 were selected From this gene list thosegenes whose average expression differed by more than25-fold between strains were selected Fully annotatedmicroarray data have been deposited in BμGSbase (acces-sion number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (accession number E-BUGS-150)

Oligonucleotide microarray analysisExperiments were performed in a similar way to thatdescribed for the in vitro amplicon array experimentsexcept that the RNA was purified using the mirVanamiRNA Isolation kit (Ambion) which is designed tocapture small (gt20 nt) RNA species RNA and genomicDNA were directly labelled with Cy5- and Cy3 respect-ively using the ULS microRNA labelling kit (Kreatech)according to the manufacturerrsquos instructions PurifiedCy5- and Cy3-labelled probes were co-purified andapplied to an Agilent 40K custom made tiled (10 ntoverlap) 60-mer oligonucleotide microarray designed tothe genomic sequence of M bovis 21229797 The arraydesign is available in BμGSbase (accession numberA-BUGS-52 httpbugssgulacukA-BUGS-52) and alsoArrayExpress (accession number A-BUGS-52) Microarrayswere hybridised at 65degC for 18 hrs and then washed inWash Buffer 1 (Agilent) at room temperature for 1 minuteThe slides were then washed in Wash Buffer 2 (Agilent) at37degC for 1 minute dried and then scanned at 2 μm usingan Agilent DNA microarray scannerTiling array data was analysed using the Limma package

of RBioconductor [33] The signal median was quan-tile normalised between arrays followed by a LOESSnormalisation within arrays Differential expressionanalysis was performed by pairwise comparison usinglinear models and empirical Bayes methods and Pvalues adjusted using the Benjamini and Hochbergsmethod to control for multiple testing Fully annotatedmicroarray data have been deposited in BμGSbase(accession number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (accession num-ber E-BUGS-150)

Whole genome sequencing of M bovis field isolatesWhole genome paired end (2 x 76bp) sequencing wasperformed using Illumina HiSeq machines at theWellcome Trust Sanger Institute (Hinxton Cambridge)Raw sequence data was uploaded to the European Nucleo-tide Archive (ENA) and can be downloaded at httpwwwebiacukena accession numbers ERX006616 ERX06617ERX012284 and ERX012286 The FastQC (httpwwwbioinformaticsbabrahamacukprojectsfastqc) programwas used to analyse the quality of the raw data reads Rawsequence data was trimmed to remove adapter sequencesand nucleotides where the sequence quality score wasbelow 20 Filtered reads were aligned to the M bovis refer-ence strain 212297 [10] with the SMALT alignmentprogram (httpwwwsangeracukresourcessoftwaresmalt)using default settings The average mean coverages for thesequenced strains 112101 245101 and 130701 were293 224 and 234 respectively Reads that did not maponto the reference genome were de novo assembled andblasted against the NCBI data base in order to find regionspresent in these strains but absent in 212201 Variantcalling and the generation of consensus sequenceswere carried out using the SAMtools program suite(httpsamtoolssourceforgenet) SNPs that had a mini-mum of quality score of 200 and had a minimum of fourgood quality forward (reverse) reads covering the SNP siteand having the variant base were retained The number offorward (reverse) reads mapping with good quality ontothe SNP site having the same base as in the reference hadto be less than 5 of the total number of forward (reverse)reads mapping with good quality onto the SNP site havingthe variant base Sequences of 12 bases in length centredaround SNP sites were blasted to the genome of the refer-ence strain 212297 and any SNP within an area with a90 hit to another area of the genome were filtered outThis eliminated most of the SNPs within the repetitivePE-PGRS regions For all sequenced strains more than999 of the genomes were covered by reads

Real time RT-PCRQuantitative real-time SYBR Green based PCR (qRT-PCR)experiments were performed using a RotorGene 3000(Corbett research) as described by Golby et al [23] Foldchanges were calculated using relative standard curvemethod and pcr controls included no template and noreverse transcriptase Primer pair sequences are given inAdditional file 5 available with the online version ofthis paper

Construction of the Rv1749c-Rv1750c and nirBDoverexpressing plasmidsThe Rv1749c-Rv1750c overexpressing plasmids pPG106and pPG107 were constructed by PCR amplification of a907bp fragment encompassing Rv1749c-Rv1750c-lsquoRv1751c

Golby et al BMC Genomics 2013 14710 Page 16 of 18httpwwwbiomedcentralcom1471-216414710

using primers tox_f and tox_r For pPG106 the fragmentwas amplified using 21229797 genomic DNA as a tem-plate while pPG107 was amplified using 130701 gDNABoth fragments were digested with SpeI and cloned intothe SpeI cut mycobacterial attP-integrating shuttle vectorpKINT (a gift from Douglas Young Imperial CollegeLondon) Plasmids pPG108 and pPG109 which contain a35 kb hsprsquo-nirB-nirD-lsquocobU fragment was constructed inseveral steps Firstly two 16 kb hsprsquo-nirBrsquo PCR fragmentswere PCR amplified separately using primers nirB1_f andnirB1_r and 212297 and 130701 as genomic templatesSimilarly two 18kb lsquonirB-nirD-lsquocobU fragments wereamplified using primers nirB2_f and nirB2_r and genomicDNAs 212297 and 130701 The two PCR products weredigested with SpeI and BamHI and then co-ligated intopKINT Details concerning the nucleotide sequences ofthe pcr primer pairs are given in the Additional file 5

5prime-RLM-RACE PCRTranscriptional start site mapping of was determinedusing the First Choice RNA ligase-mediated rapid amp-lification of cDNA ends (RLM-RACE) kit (Ambion) asper manufacturers instructions Briefly 10 ug of totalRNA was treated with calf intestinal phosphatase (CIP)and tobacco acid pyrophosphatase (TAP) before ligationof an RNA Adapter oligonucleotide to the 5prime ends ofthe mRNA transcripts A random-primed reverse tran-scription reaction was carried out to generate cDNAand then nested PCR reactions were performed on thecDNA using combinations of adapter and gene specificprimers Details concerning the nucleotide sequencesof the 5prime outer and inner adapter sequences as well as3prime outer and inner gene specific primers are given inthe Additional file 5 PCR products generated usingthe 5prime inner adapter and 3prime inner gene specific primerswere sequenced by Sanger sequencing using the 3prime innergene specific primer

Animal ethics statementAnimal work was carried out according to the UK Animal(Scientific Procedures) Act 1986 The study protocol wasapproved by the AHVLA Animal Use Ethics Committee(UK Home Office PCD number 706905)

Availability of supporting data sectionFully annotated microarray data have been deposited inBμGSbase (accession number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (ac-cession number E-BUGS-150) Raw sequence data wasuploaded to the European Nucleotide Archive (ENA) andcan be downloaded at httpwwwebiacukenadataviewERX006616-ERX06617ERX012284-ERX012286

Additional files

Additional file 1 SNPs identified across sequenced strains

Additional file 2 Large deletions identified by NGS in sequencedstrains

Additional file 3 Fold change differences in gene expression inM bovis field isolates 1121 2451 and 1307 compared to 2122Cells shaded in red indicate upregulation green indicate down-regulation and empty cells indicate no change in expression

Additional file 4 Differential expression of RNA transcripts asdetected by a tiled oligonucleotide microarray Up and down arrowsindicate fold up- and downregulation respectively and empty cellsindicate no change in expression Strand A indicates antisense S senseand I intergenic

Additional file 5 Details of oligonucleotides used in PCR RT-PCRand RLM-RACE experiments

Competing interestsThe authors declare that they have no competing interests

Authorsrsquo contributionsPG carried out the culturing of organisms macrophage infectionsmicroarrays qRT-PCR cloning analysis of sequence data and writing of themanuscript JN performed analysis of the Illumina sequence and PCRamplicon microarray data AW and JH performed the analysis of the Agilentoligonucleotide microarray data and assisted in the uploading of themicroarray data into the BμGSbase database MAQ and SB wereresponsible for the Illumina sequencing SH contributed to the analysis ofthe Illumina sequence data NS contributed to the writing and critical reviewof the manuscript RGH was a co-investigator who helped to guide theproject SG was the principle investigator on the project who guided theresearch and helped in the writing of the manuscript All authors read andapproved the final manuscript

AcknowledgementsWe would like to acknowledge Kate Gould for her help in carrying out theoligonucleotide microarrays and Denise Waldron for her assistance inuploading of the microarray data into BμGSbase We also thank MarkChambers for comments on the manuscript and BBSRC and Defra forfunding the work (Grant no BBE0184911)

Author details1Animal Health and Veterinary Laboratories Agency Woodham Lane NewHaw Addlestone Surrey KT15 3NB UK 2Division of Clinical Sciences BacterialMicroarray Group Centre for Infection amp Immunity St Georgersquos University ofLondon Cranmer Terrace London SW17 0RE UK 3The Wellcome TrustSanger Institute Hinxton Cambridgeshire UK 4UCD School of VeterinaryMedicine and UCD Conway Institute University College Dublin Dublin 4Ireland

Received 4 June 2013 Accepted 26 September 2013Published 17 October 2013

References1 Monaghan ML Doherty ML Collins JD Kazda JF Quinn PJ The tuberculin

test Vet Microbiol 1994 40(1ndash2)111ndash1242 Kamerbeek J Schouls L Kolk A van Agterveld M van Soolingen D Kuijper

S Bunschoten A Molhuizen H Shaw R Goyal M et al Simultaneousdetection and strain differentiation of Mycobacterium tuberculosis fordiagnosis and epidemiology J Clin Microbiol 1997 35(4)907ndash914

3 Frothingham R Meeker-OrsquoConnell WA Genetic diversity in theMycobacterium tuberculosis complex based on variable numbers oftandem DNA repeats Microbiology 1998 144(Pt 5)1189ndash1196

4 Smith NH Dale J Inwald J Palmer S Gordon SV Hewinson RG Smith JMThe population structure of Mycobacterium bovis in Great Britain clonalexpansion Proc Natl Acad Sci USA 2003 100(25)15271ndash15275

5 Winder CL Gordon SV Dale J Hewinson RG Goodacre R Metabolicfingerprints of Mycobacterium bovis cluster with molecular type

Golby et al BMC Genomics 2013 14710 Page 17 of 18httpwwwbiomedcentralcom1471-216414710

implications for genotype-phenotype links Microbiology 2006152(Pt 9)2757ndash2765

6 Wheeler PR Brosch R Coldham NG Inwald JK Hewinson RG Gordon SVFunctional analysis of a clonal deletion in an epidemic strain ofMycobacterium bovis reveals a role in lipid metabolism Microbiology2008 154(Pt 12)3731ndash3742

7 Arnvig KB Young DB Identification of small RNAs in Mycobacteriumtuberculosis Mol Microbiol 2009 73(3)397ndash408

8 Arnvig KB Comas I Thomson NR Houghton J Boshoff HI Croucher NJRose G Perkins TT Parkhill J Dougan G et al Sequence-based analysisuncovers an abundance of non-coding RNA in the total transcriptome ofMycobacterium tuberculosis PLoS Pathog 2011 7(11)e1002342

9 Smith NH Upton P Naming spoligotype patterns for the RD9-deletedlineage of the Mycobacterium tuberculosis complex www MbovisorgInfect Genet Evol 2012 12(4)873ndash876

10 Garnier T Eiglmeier K Camus JC Medina N Mansoor H Pryor M Duthoy SGrondin S Lacroix C Monsempe C et al The complete genome sequenceof Mycobacterium bovis Proc Natl Acad Sci USA 2003 100(13)7877ndash7882

11 Albers CA Lunter G MacArthur DG McVean G Ouwehand WH Durbin RDindel accurate indel calls from short-read data Genome Res 201121(6)961ndash973

12 Mostowy S Inwald J Gordon S Martin C Warren R Kremer K Cousins DBehr MA Revisiting the evolution of Mycobacterium bovis J Bacteriol2005 187(18)6386ndash6395

13 Boon C Dick T Mycobacterium bovis BCG response regulator essentialfor hypoxic dormancy J Bacteriol 2002 184(24)6760ndash6767

14 Van Gelder RN von Zastrow ME Yool A Dement WC Barchas JD EberwineJH Amplified RNA synthesized from limited quantities of heterogeneouscDNA Proc Natl Acad Sci USA 1990 87(5)1663ndash1667

15 Pandey DP Gerdes K Toxin-antitoxin loci are highly abundant in free-living but lost from host-associated prokaryotes Nucleic Acids Res 200533(3)966ndash976

16 Arcus VL McKenzie JL Robson J Cook GM The PIN-domain ribonucleasesand the prokaryotic VapBC toxin-antitoxin array Protein Eng Des Sel 201124(1ndash2)33ndash40

17 Bukowski M Rojowska A Wladyka B Prokaryotic toxin-antitoxin systemsndashthe role in bacterial physiology and application in molecular biologyActa Biochim Pol 2011 58(1)1ndash9

18 Maddocks SE Oyston PC Structure and function of the LysR-typetranscriptional regulator (LTTR) family proteins Microbiology 2008154(Pt 12)3609ndash3623

19 Moreno-Vivian C Cabello P Martinez-Luque M Blasco R Castillo FProkaryotic nitrate reduction molecular properties and functionaldistinction among bacterial nitrate reductases J Bacteriol 1999 181(21)6573ndash6584

20 Tiffert Y Supra P Wurm R Wohlleben W Wagner R Reuther J TheStreptomyces coelicolor GlnR regulon identification of new GlnR targetsand evidence for a central role of GlnR in nitrogen metabolism inactinomycetes Mol Microbiol 2008 67(4)861ndash880

21 Duan J Wainwright MS Comeron JM Saitou N Sanders AR Gelernter JGejman PV Synonymous mutations in the human dopamine receptor D2(DRD2) affect mRNA stability and synthesis of the receptor Hum MolGenet 2003 12(3)205ndash216

22 Capon F Allen MH Ameen M Burden AD Tillman D Barker JN TrembathRC A synonymous SNP of the corneodesmosin gene leads to increasedmRNA stability and demonstrates association with psoriasis acrossdiverse ethnic groups Hum Mol Genet 2004 13(20)2361ndash2368

23 Golby P Hatch KA Bacon J Cooney R Riley P Allnutt J Hinds J Nunez JMarsh PD Hewinson RG et al Comparative transcriptomics reveals keygene expression differences between the human and bovine pathogensof the Mycobacterium tuberculosis complex Microbiology 2007153(Pt 10)3323ndash3336

24 Bitter W Houben EN Bottai D Brodin P Brown EJ Cox JS Derbyshire KFortune SM Gao LY Liu J et al Systematic genetic nomenclature for typeVII secretion systems PLoS Pathog 2009 5(10)e1000507

25 Sorek R Kunin V Hugenholtz P CRISPRndasha widespread system thatprovides acquired resistance against phages in bacteria and archaeaNat Rev Microbiol 2008 6(3)181ndash186

26 Gomez M Smith I Determinants of Mycobacterial Gene Expression InMolecular Genetics of Mycobacteria Washington DC ASM Press2000111ndash129

27 Duhring U Axmann IM Hess WR Wilde A An internal antisense RNAregulates expression of the photosynthesis gene isiA Proc Natl Acad SciUSA 2006 103(18)7054ndash7058

28 Kawano M Aravind L Storz G An antisense RNA controls synthesis of anSOS-induced toxin evolved from an antitoxin Mol Microbiol 200764(3)738ndash754

29 Thomason MK Storz G Bacterial antisense RNAs how many are thereand what are they doing Annu Rev Genet 2010 44167ndash188

30 Brosch R Pym AS Gordon SV Cole ST The evolution of mycobacterialpathogenicity clues from comparative genomics Trends Microbiol 20019(9)452ndash458

31 Calmette A Guerin C Sur quelques proprietes du bacille tuberculeuxdrsquoorigine cultive sur la bile de boeuf glycerinee C R Acad Sci Paris 1909149716ndash718

32 Bacon J James BW Wernisch L Williams A Morley KA Hatch GJ ManganJA Hinds J Stoker NG Butcher PD et al The influence of reduced oxygenavailability on pathogenicity and gene expression in Mycobacteriumtuberculosis Tuberculosis (Edinb) 2004 84(3ndash4)205ndash217

33 Smyth GK Linear models and empirical bayes methods for assessingdifferential expression in microarray experiments Stat Appl Genet Mol Biol2004 33

doi1011861471-2164-14-710Cite this article as Golby et al Genome-level analyses ofMycobacterium bovis lineages reveal the role of SNPs and antisensetranscription in differential gene expression BMC Genomics 2013 14710

Submit your next manuscript to BioMed Centraland take full advantage of

bull Convenient online submission

bull Thorough peer review

bull No space constraints or color figure charges

bull Immediate publication on acceptance

bull Inclusion in PubMed CAS Scopus and Google Scholar

bull Research which is freely available for redistribution

Submit your manuscript at wwwbiomedcentralcomsubmit

Golby et al BMC Genomics 2013 14710 Page 18 of 18httpwwwbiomedcentralcom1471-216414710

example the amplicon array data had appeared to suggestthat Mb1914c and echA21 were upregulated in 245101but the oligo array data indicates that transcripts 11 and25 which are encoded within the co-ordinates encoded bythose two genes are actually encoded on the antisensestrands This apparent discrepancy can be rationalisedonce we consider that double stranded amplicon micro-array probes cannot discriminate between transcriptsencoded on the sense or antisense strands Transcripts 11and 25 can therefore be considered as potential antisensesRNAs (asRNA) which could be involved in translationalor post-transcriptional control of the sense transcriptOther potential cis-encoded sRNAs detected using the ar-rays include T6 T14 and T15T16 which are encoded onthe antisense strands to Mb1618c Mb2117 and Mb2607respectively and for each of these transcripts their expres-sion appears to be linked to the presence of a single SNPwithin the co-ordinates of the genes The approximateboundaries of these transcripts can be derived from thegenomic co-ordinates of the oligonucleotide probes thatdetect the expression of the transcript Thus the tran-scripts appear to be between 100ndash300 nt in size and thepositions of the linked SNPs appear to be positioned eitherjust upstream or within the predicted 5primeend of the tran-scripts (Figure 3) Three of the transcripts (T11 T14 andT25) are antisense to the central part of the sense encodedgene while T6 is encoded antisense to the 5prime end ofMb1618c As well as antisense transcripts we also saw thedifferential expression of sense transcripts The ampliconmicroarray data (confirmed by real time RT-PCR) indi-cated that nirB is strongly upregulated specifically in130701 in both in vitro and ex-vivo Mϕ An analysis ofthe oligonucleotide array data however indicates thatthere are two short transcripts T1 and T2 (sense and anti-sense respectively) that are encoded within the genomicco-ordinates of the nirB gene T2 is the longer in size (255vs 155nt) and more highly expressed (5 vs 3-fold) thanT1 and both transcripts appear to be linked to the pres-ence of a SNP that is located within the middle of T1 andapproximately 50nt upstream of T2Some of the transcripts are bone fide gene sense

strand mRNA transcripts such as T9 and T10 whichare encoded by Mb1749c and Mb1750c respectivelyAlthough it would appear that the two genes are tran-scribed separately it is probable that the two transcriptsare co-transcribed as the stop codon of Mb1750c over-laps the start codon of Mb1749c Eight of the tran-scripts listed in Table 5 are encoded within intergenicregions 7 of which are encoded within the polymorphicdirect repeat (DR) locus The DR locus of strains belong-ing to the M tuberculosis complex has been suggested toconstitute a CRISPR locus which have been shown inmany species of bacteria to be involved in protectionagainst exogenous foreign DNA such as plasmids and

phage [25] All the DR encoded transcripts are short(approx 100 nt) straddle contiguous repeat and spacersequences and show approximately 5-fold higher levelsof expression in 245101 and 130701 compared with212297 and 112101

Characterisation of differentially expressed cis asRNAThe genomic co-ordinates of the oligonucleotide probesthat detected the antisense species described above canonly serve as approximate estimations as to their startand end points Thus we used 5prime RLM-RACE (RNALigase Mediated Rapid Amplification of cDNA Ends) inan attempt to accurately define the transcriptional startsites (TSS) for the short sense transcript T2 and the anti-sense transcripts T6 T11 and T25 described in the abovesection (see Methods) These transcripts were chosen astheir expression levels are high and their transcript lengthswere considered to be sufficiently long to enable theRLM-RACE methodology to work Table 6 details thesizes of the PCR products obtained after RLM-RACE wasperformed using oligonucleotide primers designed tosequences predicted for transcripts T6 T11 and T25 NoPCR product was obtained for transcript T2 For each ofthe three transcripts the TSS was determined to be a Gresidue which is the most commonly used residue typefor mycobacterial TSSrsquos (Figure 4) [26] For each of the T6T11 and T25 transcripts expression of the asRNAs waslinked to the presence of a SNP (C to T) proximal tothe 5prime end of the asRNA Strains exhibiting the lsquoCrsquo alleleshowed no expression of the asRNA whilst the strainthat showed expression had the lsquoTrsquo allele An analysis ofthe nucleotide sequence in the vicinity of the SNPsreveals that for each of the three transcripts the SNPconstitutes the 6th residue of a motif that has stronghomology to the consensus sequence for the minus10 elementof Group A mycobacterial promoters (Figure 4) [26] Thefinding that a lsquoTrsquo residue is associated with expression isconsistent with the consensus sequence which indicatesthat 86 of all minus10 elements have a lsquoTrsquo residue at the 6thresidue position Several residues that flank the minus10 motifalso show a degree of conservation Sequence motifswhich show strong homology to group A minus35 elementsare present 18ndash19 bp upstream of the putative minus10elements and the distances between the minus35 -10 andTSS elements are consistent with those elements ofthe consensus sequence No protein encoding openreading frames were detected within the T6 T11 andT25 transcriptsIn a parallel study high density oligonucleotide micro-

arrays were also used to interrogate the transcriptomesof M tuberculosis H37Rv M bovis BCG Pasteur Myco-bacterium caprae and M bovis AN5 that had beengrown in Middlebrook 7H9 media As a result of theseexperiments two asRNA species were found to be

Golby et al BMC Genomics 2013 14710 Page 11 of 18httpwwwbiomedcentralcom1471-216414710

expressed within the antisense strands of the ino1 andnarH genes of M tuberculosis H37Rv but not in any ofthe other 3 strains tested (data not shown) A compari-son of nucleotide sequences of the orthologous genesacross the species suggested that expressions of theas_sRNAs correlated with the presence of a sSNP (C to T

transition at positions 50555 and 1292100 wrt H37Rvgenomic sequence for as_ino1 and as_narH respectively)upstream of the asRNAs Approximate information re-garding the transcriptional start site was deduced from thebinding co-ordinates of the probes that detected the tran-scripts As with the M bovis antisense sRNAs described

Figure 3 Expressions and schematic representation of genomic locations of selected cis-encoded antisense sRNAs identified using atiled oligonucleotide microarray Three asRNAs (open arrows) are (a) T6 (b) T14 and (c) T25 For each asRNA a histogram plots the foldchanges for each of the oligonucleotide probes that detected the asRNA and for each probe the binding position relative to the 212297genome is indicated Closed and open arrows indicate lengths and direction of transcription of genes and asRNAs respectively

Golby et al BMC Genomics 2013 14710 Page 12 of 18httpwwwbiomedcentralcom1471-216414710

above the T residue associated with the expression ofthe M tuberculosis asRNAs is part of a putative minus10element A minus35 element with an appropriate spacing tothe minus10 element was identified for as_ino1 but not foras_narH suggesting that the as_narH promoter maybelong to group B mycobacterial promoters that havea conserved minus10 but no minus35 motif [26]

DiscussionsThe aim of this work was to define possible phenotypicvariation across M bovis field isolates through a combin-ation of genome sequencing comparative genomics andtranscriptome analyses from both in vitro and ex vivo con-ditions Using these approaches we uncovered a range ofnovel findings the most striking of which was the realisa-tion that genes that had been predicted to be differentiallyexpressed based on amplicon-microarray data were in fact

not upregulated and that instead it was an antisense tran-script that was showing differential expression Analysingboth transcriptome and genome sequence data allowed usto identify SNPs responsible for the transcription of anti-sense RNAs with generation of a consensus minus10 promotersequence the likely mechanism Our results suggest thatdata generated from amplicon arrays in the past may needto be revisited as it is possible that some coding-sequences identified as being differentially expressed wereinstead antisense transcriptsWith the growth of technologies such as high density

tiled oligonucleotide microarrays and next generationsequencing there has been a rapid increase in the num-ber of reports describing the existence of non-codingRNAs (ncRNAs) in bacteria Non-coding RNAs broadlyconsist of two types cis- and trans-encoded RNA TransRNA includes intergenic encoded RNA while cis-encodedRNA includes 5prime and 3prime untranslated regions of mRNAand antisense RNA To study the expressions of both cis-and trans encoded ncRNA we used a high density oligo-nucleotide tiled microarray since our amplicon microarraywas unable to detect intergenic transcripts or differentiatebetween sense and antisense transcripts Previous studiesusing M tuberculosis have identified substantial amountsof ncRNA encoded in both intergenic and intragenic re-gions [78] We detected substantial amounts of ncRNAs

Figure 4 Promoters of anti sense RNAs a Promoters of the asRNAs as_mb1618c as_1914c and as_echA21 -10 and minus35 elements areindicated in bold and italics Transcriptional start sites are indicated by large font G characters while SNP residue that leads to the expression ofthe asRNA is indicated by a large font red T residue The consensus sequence for group a mycobacterial promoters is indicated Numericalsubscripts indicate the percentage of the total number of promoters for which a transcriptional start site has been experimentally determinedthat show the indicated residue b Promoters of the differentially expressed asRNAs as_ino1 and as_narH in M tuberculosis -10 and minus35 elementsare indicated in bold italics The red residue indicates SNP responsible for differential expression

Table 6 Determination of transcriptional start sites ofantisense RNAs

Transcript asRNA PCRprod size

TSSresidue type

TSS residue postn(wrt 212297)

T6 mb1618c_as 150 G 1778886

T11 mb1914c_as 220 G 2122978

T25 echA21_as 175 G 4155796

Golby et al BMC Genomics 2013 14710 Page 13 of 18httpwwwbiomedcentralcom1471-216414710

in M bovis including many instances of cis-antisenseRNA species Due to their perfect complementarity cisasRNA form a duplex with the sense strand encodedtranscript resulting in either degradation [27] or transla-tion inhibition [28] of the sense mRNA AntisenseRNAs vary in length ranging from 10s to 1000primes ofnucleotides and can be classified according to theirencoded position with respect to the opposite senseencoded gene Thus they can be classified as 5prime or 3primeoverlapping while others are classified as internallylocated The 112101 specific as_Mb1618c is an exampleof a 5prime overlapping asRNA which is encoded antisense togene Mb1618c which is predicted to express a secretorylipase The location of the asRNA transcript suggests itmay function to prevent translation of Mb1618c mRNAby steric hindrance of the ribosome binding siteIn the work presented here strain 245101 expressed

two asRNAs as_Mb1914c and as_echA21 that are notexpressed by any of the other 3 strains They areencoded within the central part of the opposite genesand are therefore likely to modulate the stability of thetranscripts Mb1914c encodes a short chain dehydro-genase while echA21 encodes an enoyl-CoA hydrataseShort chain dehydrogenases catalyse a wide range offunctions so the precise function and identity of thesubstrate is difficult to deduce from sequence aloneEnoyl-CoA hydratases hydrate double carbon-carbonbonds of macromolecules and are vital in the metabol-ism of fatty acids Both gene products would thereforeappear to be involved in the metabolism of a macromol-ecule and their similar expression profiles in this straincould indicate involvement in the metabolism of the samemolecule or molecules that are of the same pathwayIn many instances upregulation of asRNA negatively

correlates with the transcription of the antisense gene[27] but in many cases expression of the antisense tran-script has no effect on the transcription of the oppositegene In our studies expressions of as_Mb1618cas_Mb1914c and as_echA21 did not appear to have anyeffect on the expressions of the opposite sense encodedgenes (data not shown) We have shown that theexpressions of the asRNAs are associated with the pres-ence of SNPs which are either synonymous or non-synonymous with respect to the sense transcript butupstream of the asRNA transcriptional start site Thishighlights the fact that mutations can potentially affectexpression of transcripts on both strands and that theclassification of a SNP is strand dependent For each ofthe three asRNAs the associated SNP was found to belocated within a putative minus10 promoter motif of groupA mycobacterial promoters The sixth residue of the minus10hexamer motif consensus sequence is a strongly conservedlsquoTrsquo residue which is present in 81 of all group A myco-bacterial promoter elements Its importance is underlined

by the finding that the strains that exhibit a lsquoCrsquo residue atthis position show no detectable expression of the asRNAwhile strains having a lsquoTrsquo residue at this position exhibitexpressionSingle nucleotide polymorphisms were found to be

the most frequent form of genetic variation that existsbetween the isolates with a total of 1013 SNPs detectedacross the three strains 112101 245101 and 130701compared to the reference strain M bovis 212297Non-synonymous SNPs which include both non-senseand missense SNPs are a class of SNPs most likely toimpact on protein function and contribute to phenotypicvariation Non-sense SNPs which results in the expressionof a truncated polypeptide due to the introduction of apremature stop codon were identified in five genes acrossthe strains Of these we focussed our attentions on thenon-sense SNP present in a gene encoding the LysR regu-lator Mb2007c as mutations affecting regulators are likelyto impact on the expression of one or more genes that arepart of the regulon of the regulator and are therefore morelikely to result in phenotypic variation Experiments tocompare the transcriptomes of a strain that exhibited themutation with a strain overexpressing a functional regula-tor did not however reveal any differences The reasonfor this unclear but could reflect a requirement of theregulator for a co-inducer that was absent under the con-ditions of the experiment The consequences of missenseSNPs are more difficult to predict as substitutions of oneamino acid for another in a protein sequence do not ne-cessarily lead to a change in protein function Howeverfor genes that are controlled by an autoregulatory mech-anism a mutation that affects the ability of the product ofthe gene to regulate itself will result in a change in expres-sion of the gene In our studies we have shown that thepresence of a missense mutation in a VapB type toxin en-coding gene Mb1749c in strain 130701 results in theupregulation in expression of the toxin-antitoxin encodingpair of genes Mb1750c-Mb1749c due to the inability ofthe encoded proteins to self regulate themselves Toxin-antitoxin systems have a variety of proposed cellular func-tions including general regulation of mRNA stability levelsin the cell [29] Further experiments are required to fullyunderstand the consequences of this mutationGenomic deletions have played an important role in

the evolution of strains belonging to the mycobacterialcomplex [30] and in the derivation of the tuberculosisvaccine strain M bovis BCG [31] In addition to the previ-ously described 68 kb gene deletion that is specific tostrains having a spoligotype 17 pattern [6] we have identi-fied a 16 kb multi-gene deletion that is specific to strain130701 and encompasses genes that are part of the DosRregulon However one of the deleted genes exists as apseudo gene in DosR in strain 212297 so its importanceto the biology of M bovis is unlikely to be significant

Golby et al BMC Genomics 2013 14710 Page 14 of 18httpwwwbiomedcentralcom1471-216414710

Several other genes with internal deletions were detectedbut none of the encoded proteins have any significantsimilarity to any protein with a defined function

ConclusionsIn conclusion we have performed a comprehensive ana-lysis of 4 M bovis strains of the most common moleculartypes circulating in GB We show that while these strainsshow extensive similarities in their genetic make-up andgene expression profiles they show distinct differences inthe expression of a subset of genes We provide functionaldata to show that SNPs can lead to the expression of anti-sense RNA a finding with implications for how we definea lsquosilentrsquo nucleotide change Furthermore we show thatthe interpretation of transcriptome data based solely onamplicon arrays could lead to artefacts due to expressionof antisense transcripts a caveat that needs to be kept inmind for previous studies of global expression analysis inbacteria

MethodsBacterial strains media and growth conditionsFor the Mϕ infection experiments bovine alveolar Mϕwere cultivated in tissue culture media R10 whichconsisted of RPMI (Invitrogen) media plus 2 mM glu-tamine 10 calf fetal serum and 1 amphotericinWhere used antibiotics gentamycin and ampicillinwere added at concentrations of 50 and 100 μg mlrespectively M bovis field strains were pre-grown inMiddlebrook 7H9 broth supplemented with 10 albumin-dextrose catalase (ADC Difco) 005 Tween and 10 mMpyruvate Cultures were harvested in mid-logarithmicphase (OD600 of 03-08) washed and then resuspended inRPMI containing 005 Tween 80

Isolation of bovine alveolar macrophages and infectionwith mycobacteriaThe lungs of a 6ndash8 week old male Holstein-Friesian calfwere removed and a whole lung lavage procedure wasperformed to washout the alveolar Mϕ Briefly 4ndash5 x500 ml aliquots of Hanksrsquo Balanced Sterile Salts solution(HBSS) were used to infuse the lungs via the tracheaand the washings were pooled in a sterile beaker The Mϕcells contained in the washes were pelleted by centrifuga-tion at 500 x g for 10 mins at 4degC washed and thenresuspended in R10 growth media supplemented withantibiotics (R10+) to a concentration of 1ndash2 x 107 mlApproximately 05-15 x 109 Mϕ were isolated per calflungVented 225 cm2 tissue culture flasks containing R10+

media were seeded with 3ndash4 x 107 alveolar Mϕ andplaced in a humidified 37degC incubator containing 5CO2 Typically 2ndash4 flasks were used per strain and timepoint After 24 hrs the growth media was decanted to

remove non-adherent cells and then replaced with freshR10+ media After a further 24 hrs the growth mediawas discarded and the Mϕ monolayer was washed withRPMI to remove traces of the antibiotic containinggrowth media The monolayer was then covered withR10 media without antibiotics (R10-) and then infectedwith mid-logarithmic phase grown mycobacteria usingan MOI of approximately 101 (bacilli Mϕ) The AlvMϕwere incubated with mycobacteria for 4 hrs after whichthe cell monolayers were washed with RPMI and theneither processed for RNA extraction (4 hr time point)or incubated in fresh R10+ media for a further 20 hrsbefore being processed for RNA extraction (24 hr timepoint)

Extraction and amplification of mycobacterial RNA frominfected macrophagesMϕ cell monolayers were lysed using a guanidiniumthiocyanate (GTC) containing solution The lysed Mϕrsquoswere vortexed and passed twice through a 21G bluntended needle to sheer host genomic DNA and therebyreduce the viscosity of the solution Mycobacterial cellswere then pelleted by centrifugation at 4600 rpm for 20mins at room temperature and washed with GTC solu-tion to remove host genomic DNA Cells were thenresuspended in Trizol and RNA was extracted using theprotocol outlined in Bacon et al [32] The amount ofpurified DNase-treated RNA recovered was of the order100ndash500 ng per time point RNA was amplified usingthe lsquoMessageAmp II-Bacteria RNA Amplification Kitrsquo(Ambion) according to the manufacturersrsquo instructionsUsing an input of 100 ng of unamplified RNA 20ndash100μg of amplified RNA was recovered

Amplicon microarray analysisFor the in vitro growth experiments three independentexperiments (biological replicates) were carried out andfor each strain in each experiment two microarrays (tech-nical replicates) were performed Thus for each strain 6microarrays were performed Three independent AlvMϕinfection experiments were carried out and for eachexperiment two microarrays were performed for each ofthe control RPMI samples and the 4- and 24 hrs post-infection samples Cy5 and Cy3 fluorescently-labelledprobes were synthesised from RNA and genomicDNA respectively and hybridised to whole genomeM bovis M tuberculosis microarrays The array designis available in BμGSbase (accession number A-BUGS-31httpbugssgulacukA-BUGS-31) and also ArrayExpress(accession number A-BUGS-31) Details of probe synthesishybridization conditions and manufacture of the micro-array can be found in Golby et al 2008 Microarrayswere scanned using an Affymetrix 428 Microarrayscanner and scanned images were quantified using

Golby et al BMC Genomics 2013 14710 Page 15 of 18httpwwwbiomedcentralcom1471-216414710

BlueFuse for Microarrays v32 software (BlueGnome)See Golby et al 2008 for further detailsNormalisation was performed by dividing the log ratio

of the Cy5 to Cy3 signal for every spot by the median ofthe log ratios for all spots except control spots A medianabsolute (MAD) scale transformation was applied to thenormalised data from the pas an additional normalisationstep For every microarray duplicate spots were averagedand then the average expression of every gene across alltechnical replicate microarrays was calculated Averagesof the three biological replicates were used to comparegene expression between strains For each gene a mod-erated t-test was applied and those genes with a P- valueless than 005 were selected From this gene list thosegenes whose average expression differed by more than25-fold between strains were selected Fully annotatedmicroarray data have been deposited in BμGSbase (acces-sion number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (accession number E-BUGS-150)

Oligonucleotide microarray analysisExperiments were performed in a similar way to thatdescribed for the in vitro amplicon array experimentsexcept that the RNA was purified using the mirVanamiRNA Isolation kit (Ambion) which is designed tocapture small (gt20 nt) RNA species RNA and genomicDNA were directly labelled with Cy5- and Cy3 respect-ively using the ULS microRNA labelling kit (Kreatech)according to the manufacturerrsquos instructions PurifiedCy5- and Cy3-labelled probes were co-purified andapplied to an Agilent 40K custom made tiled (10 ntoverlap) 60-mer oligonucleotide microarray designed tothe genomic sequence of M bovis 21229797 The arraydesign is available in BμGSbase (accession numberA-BUGS-52 httpbugssgulacukA-BUGS-52) and alsoArrayExpress (accession number A-BUGS-52) Microarrayswere hybridised at 65degC for 18 hrs and then washed inWash Buffer 1 (Agilent) at room temperature for 1 minuteThe slides were then washed in Wash Buffer 2 (Agilent) at37degC for 1 minute dried and then scanned at 2 μm usingan Agilent DNA microarray scannerTiling array data was analysed using the Limma package

of RBioconductor [33] The signal median was quan-tile normalised between arrays followed by a LOESSnormalisation within arrays Differential expressionanalysis was performed by pairwise comparison usinglinear models and empirical Bayes methods and Pvalues adjusted using the Benjamini and Hochbergsmethod to control for multiple testing Fully annotatedmicroarray data have been deposited in BμGSbase(accession number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (accession num-ber E-BUGS-150)

Whole genome sequencing of M bovis field isolatesWhole genome paired end (2 x 76bp) sequencing wasperformed using Illumina HiSeq machines at theWellcome Trust Sanger Institute (Hinxton Cambridge)Raw sequence data was uploaded to the European Nucleo-tide Archive (ENA) and can be downloaded at httpwwwebiacukena accession numbers ERX006616 ERX06617ERX012284 and ERX012286 The FastQC (httpwwwbioinformaticsbabrahamacukprojectsfastqc) programwas used to analyse the quality of the raw data reads Rawsequence data was trimmed to remove adapter sequencesand nucleotides where the sequence quality score wasbelow 20 Filtered reads were aligned to the M bovis refer-ence strain 212297 [10] with the SMALT alignmentprogram (httpwwwsangeracukresourcessoftwaresmalt)using default settings The average mean coverages for thesequenced strains 112101 245101 and 130701 were293 224 and 234 respectively Reads that did not maponto the reference genome were de novo assembled andblasted against the NCBI data base in order to find regionspresent in these strains but absent in 212201 Variantcalling and the generation of consensus sequenceswere carried out using the SAMtools program suite(httpsamtoolssourceforgenet) SNPs that had a mini-mum of quality score of 200 and had a minimum of fourgood quality forward (reverse) reads covering the SNP siteand having the variant base were retained The number offorward (reverse) reads mapping with good quality ontothe SNP site having the same base as in the reference hadto be less than 5 of the total number of forward (reverse)reads mapping with good quality onto the SNP site havingthe variant base Sequences of 12 bases in length centredaround SNP sites were blasted to the genome of the refer-ence strain 212297 and any SNP within an area with a90 hit to another area of the genome were filtered outThis eliminated most of the SNPs within the repetitivePE-PGRS regions For all sequenced strains more than999 of the genomes were covered by reads

Real time RT-PCRQuantitative real-time SYBR Green based PCR (qRT-PCR)experiments were performed using a RotorGene 3000(Corbett research) as described by Golby et al [23] Foldchanges were calculated using relative standard curvemethod and pcr controls included no template and noreverse transcriptase Primer pair sequences are given inAdditional file 5 available with the online version ofthis paper

Construction of the Rv1749c-Rv1750c and nirBDoverexpressing plasmidsThe Rv1749c-Rv1750c overexpressing plasmids pPG106and pPG107 were constructed by PCR amplification of a907bp fragment encompassing Rv1749c-Rv1750c-lsquoRv1751c

Golby et al BMC Genomics 2013 14710 Page 16 of 18httpwwwbiomedcentralcom1471-216414710

using primers tox_f and tox_r For pPG106 the fragmentwas amplified using 21229797 genomic DNA as a tem-plate while pPG107 was amplified using 130701 gDNABoth fragments were digested with SpeI and cloned intothe SpeI cut mycobacterial attP-integrating shuttle vectorpKINT (a gift from Douglas Young Imperial CollegeLondon) Plasmids pPG108 and pPG109 which contain a35 kb hsprsquo-nirB-nirD-lsquocobU fragment was constructed inseveral steps Firstly two 16 kb hsprsquo-nirBrsquo PCR fragmentswere PCR amplified separately using primers nirB1_f andnirB1_r and 212297 and 130701 as genomic templatesSimilarly two 18kb lsquonirB-nirD-lsquocobU fragments wereamplified using primers nirB2_f and nirB2_r and genomicDNAs 212297 and 130701 The two PCR products weredigested with SpeI and BamHI and then co-ligated intopKINT Details concerning the nucleotide sequences ofthe pcr primer pairs are given in the Additional file 5

5prime-RLM-RACE PCRTranscriptional start site mapping of was determinedusing the First Choice RNA ligase-mediated rapid amp-lification of cDNA ends (RLM-RACE) kit (Ambion) asper manufacturers instructions Briefly 10 ug of totalRNA was treated with calf intestinal phosphatase (CIP)and tobacco acid pyrophosphatase (TAP) before ligationof an RNA Adapter oligonucleotide to the 5prime ends ofthe mRNA transcripts A random-primed reverse tran-scription reaction was carried out to generate cDNAand then nested PCR reactions were performed on thecDNA using combinations of adapter and gene specificprimers Details concerning the nucleotide sequencesof the 5prime outer and inner adapter sequences as well as3prime outer and inner gene specific primers are given inthe Additional file 5 PCR products generated usingthe 5prime inner adapter and 3prime inner gene specific primerswere sequenced by Sanger sequencing using the 3prime innergene specific primer

Animal ethics statementAnimal work was carried out according to the UK Animal(Scientific Procedures) Act 1986 The study protocol wasapproved by the AHVLA Animal Use Ethics Committee(UK Home Office PCD number 706905)

Availability of supporting data sectionFully annotated microarray data have been deposited inBμGSbase (accession number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (ac-cession number E-BUGS-150) Raw sequence data wasuploaded to the European Nucleotide Archive (ENA) andcan be downloaded at httpwwwebiacukenadataviewERX006616-ERX06617ERX012284-ERX012286

Additional files

Additional file 1 SNPs identified across sequenced strains

Additional file 2 Large deletions identified by NGS in sequencedstrains

Additional file 3 Fold change differences in gene expression inM bovis field isolates 1121 2451 and 1307 compared to 2122Cells shaded in red indicate upregulation green indicate down-regulation and empty cells indicate no change in expression

Additional file 4 Differential expression of RNA transcripts asdetected by a tiled oligonucleotide microarray Up and down arrowsindicate fold up- and downregulation respectively and empty cellsindicate no change in expression Strand A indicates antisense S senseand I intergenic

Additional file 5 Details of oligonucleotides used in PCR RT-PCRand RLM-RACE experiments

Competing interestsThe authors declare that they have no competing interests

Authorsrsquo contributionsPG carried out the culturing of organisms macrophage infectionsmicroarrays qRT-PCR cloning analysis of sequence data and writing of themanuscript JN performed analysis of the Illumina sequence and PCRamplicon microarray data AW and JH performed the analysis of the Agilentoligonucleotide microarray data and assisted in the uploading of themicroarray data into the BμGSbase database MAQ and SB wereresponsible for the Illumina sequencing SH contributed to the analysis ofthe Illumina sequence data NS contributed to the writing and critical reviewof the manuscript RGH was a co-investigator who helped to guide theproject SG was the principle investigator on the project who guided theresearch and helped in the writing of the manuscript All authors read andapproved the final manuscript

AcknowledgementsWe would like to acknowledge Kate Gould for her help in carrying out theoligonucleotide microarrays and Denise Waldron for her assistance inuploading of the microarray data into BμGSbase We also thank MarkChambers for comments on the manuscript and BBSRC and Defra forfunding the work (Grant no BBE0184911)

Author details1Animal Health and Veterinary Laboratories Agency Woodham Lane NewHaw Addlestone Surrey KT15 3NB UK 2Division of Clinical Sciences BacterialMicroarray Group Centre for Infection amp Immunity St Georgersquos University ofLondon Cranmer Terrace London SW17 0RE UK 3The Wellcome TrustSanger Institute Hinxton Cambridgeshire UK 4UCD School of VeterinaryMedicine and UCD Conway Institute University College Dublin Dublin 4Ireland

Received 4 June 2013 Accepted 26 September 2013Published 17 October 2013

References1 Monaghan ML Doherty ML Collins JD Kazda JF Quinn PJ The tuberculin

test Vet Microbiol 1994 40(1ndash2)111ndash1242 Kamerbeek J Schouls L Kolk A van Agterveld M van Soolingen D Kuijper

S Bunschoten A Molhuizen H Shaw R Goyal M et al Simultaneousdetection and strain differentiation of Mycobacterium tuberculosis fordiagnosis and epidemiology J Clin Microbiol 1997 35(4)907ndash914

3 Frothingham R Meeker-OrsquoConnell WA Genetic diversity in theMycobacterium tuberculosis complex based on variable numbers oftandem DNA repeats Microbiology 1998 144(Pt 5)1189ndash1196

4 Smith NH Dale J Inwald J Palmer S Gordon SV Hewinson RG Smith JMThe population structure of Mycobacterium bovis in Great Britain clonalexpansion Proc Natl Acad Sci USA 2003 100(25)15271ndash15275

5 Winder CL Gordon SV Dale J Hewinson RG Goodacre R Metabolicfingerprints of Mycobacterium bovis cluster with molecular type

Golby et al BMC Genomics 2013 14710 Page 17 of 18httpwwwbiomedcentralcom1471-216414710

implications for genotype-phenotype links Microbiology 2006152(Pt 9)2757ndash2765

6 Wheeler PR Brosch R Coldham NG Inwald JK Hewinson RG Gordon SVFunctional analysis of a clonal deletion in an epidemic strain ofMycobacterium bovis reveals a role in lipid metabolism Microbiology2008 154(Pt 12)3731ndash3742

7 Arnvig KB Young DB Identification of small RNAs in Mycobacteriumtuberculosis Mol Microbiol 2009 73(3)397ndash408

8 Arnvig KB Comas I Thomson NR Houghton J Boshoff HI Croucher NJRose G Perkins TT Parkhill J Dougan G et al Sequence-based analysisuncovers an abundance of non-coding RNA in the total transcriptome ofMycobacterium tuberculosis PLoS Pathog 2011 7(11)e1002342

9 Smith NH Upton P Naming spoligotype patterns for the RD9-deletedlineage of the Mycobacterium tuberculosis complex www MbovisorgInfect Genet Evol 2012 12(4)873ndash876

10 Garnier T Eiglmeier K Camus JC Medina N Mansoor H Pryor M Duthoy SGrondin S Lacroix C Monsempe C et al The complete genome sequenceof Mycobacterium bovis Proc Natl Acad Sci USA 2003 100(13)7877ndash7882

11 Albers CA Lunter G MacArthur DG McVean G Ouwehand WH Durbin RDindel accurate indel calls from short-read data Genome Res 201121(6)961ndash973

12 Mostowy S Inwald J Gordon S Martin C Warren R Kremer K Cousins DBehr MA Revisiting the evolution of Mycobacterium bovis J Bacteriol2005 187(18)6386ndash6395

13 Boon C Dick T Mycobacterium bovis BCG response regulator essentialfor hypoxic dormancy J Bacteriol 2002 184(24)6760ndash6767

14 Van Gelder RN von Zastrow ME Yool A Dement WC Barchas JD EberwineJH Amplified RNA synthesized from limited quantities of heterogeneouscDNA Proc Natl Acad Sci USA 1990 87(5)1663ndash1667

15 Pandey DP Gerdes K Toxin-antitoxin loci are highly abundant in free-living but lost from host-associated prokaryotes Nucleic Acids Res 200533(3)966ndash976

16 Arcus VL McKenzie JL Robson J Cook GM The PIN-domain ribonucleasesand the prokaryotic VapBC toxin-antitoxin array Protein Eng Des Sel 201124(1ndash2)33ndash40

17 Bukowski M Rojowska A Wladyka B Prokaryotic toxin-antitoxin systemsndashthe role in bacterial physiology and application in molecular biologyActa Biochim Pol 2011 58(1)1ndash9

18 Maddocks SE Oyston PC Structure and function of the LysR-typetranscriptional regulator (LTTR) family proteins Microbiology 2008154(Pt 12)3609ndash3623

19 Moreno-Vivian C Cabello P Martinez-Luque M Blasco R Castillo FProkaryotic nitrate reduction molecular properties and functionaldistinction among bacterial nitrate reductases J Bacteriol 1999 181(21)6573ndash6584

20 Tiffert Y Supra P Wurm R Wohlleben W Wagner R Reuther J TheStreptomyces coelicolor GlnR regulon identification of new GlnR targetsand evidence for a central role of GlnR in nitrogen metabolism inactinomycetes Mol Microbiol 2008 67(4)861ndash880

21 Duan J Wainwright MS Comeron JM Saitou N Sanders AR Gelernter JGejman PV Synonymous mutations in the human dopamine receptor D2(DRD2) affect mRNA stability and synthesis of the receptor Hum MolGenet 2003 12(3)205ndash216

22 Capon F Allen MH Ameen M Burden AD Tillman D Barker JN TrembathRC A synonymous SNP of the corneodesmosin gene leads to increasedmRNA stability and demonstrates association with psoriasis acrossdiverse ethnic groups Hum Mol Genet 2004 13(20)2361ndash2368

23 Golby P Hatch KA Bacon J Cooney R Riley P Allnutt J Hinds J Nunez JMarsh PD Hewinson RG et al Comparative transcriptomics reveals keygene expression differences between the human and bovine pathogensof the Mycobacterium tuberculosis complex Microbiology 2007153(Pt 10)3323ndash3336

24 Bitter W Houben EN Bottai D Brodin P Brown EJ Cox JS Derbyshire KFortune SM Gao LY Liu J et al Systematic genetic nomenclature for typeVII secretion systems PLoS Pathog 2009 5(10)e1000507

25 Sorek R Kunin V Hugenholtz P CRISPRndasha widespread system thatprovides acquired resistance against phages in bacteria and archaeaNat Rev Microbiol 2008 6(3)181ndash186

26 Gomez M Smith I Determinants of Mycobacterial Gene Expression InMolecular Genetics of Mycobacteria Washington DC ASM Press2000111ndash129

27 Duhring U Axmann IM Hess WR Wilde A An internal antisense RNAregulates expression of the photosynthesis gene isiA Proc Natl Acad SciUSA 2006 103(18)7054ndash7058

28 Kawano M Aravind L Storz G An antisense RNA controls synthesis of anSOS-induced toxin evolved from an antitoxin Mol Microbiol 200764(3)738ndash754

29 Thomason MK Storz G Bacterial antisense RNAs how many are thereand what are they doing Annu Rev Genet 2010 44167ndash188

30 Brosch R Pym AS Gordon SV Cole ST The evolution of mycobacterialpathogenicity clues from comparative genomics Trends Microbiol 20019(9)452ndash458

31 Calmette A Guerin C Sur quelques proprietes du bacille tuberculeuxdrsquoorigine cultive sur la bile de boeuf glycerinee C R Acad Sci Paris 1909149716ndash718

32 Bacon J James BW Wernisch L Williams A Morley KA Hatch GJ ManganJA Hinds J Stoker NG Butcher PD et al The influence of reduced oxygenavailability on pathogenicity and gene expression in Mycobacteriumtuberculosis Tuberculosis (Edinb) 2004 84(3ndash4)205ndash217

33 Smyth GK Linear models and empirical bayes methods for assessingdifferential expression in microarray experiments Stat Appl Genet Mol Biol2004 33

doi1011861471-2164-14-710Cite this article as Golby et al Genome-level analyses ofMycobacterium bovis lineages reveal the role of SNPs and antisensetranscription in differential gene expression BMC Genomics 2013 14710

Submit your next manuscript to BioMed Centraland take full advantage of

bull Convenient online submission

bull Thorough peer review

bull No space constraints or color figure charges

bull Immediate publication on acceptance

bull Inclusion in PubMed CAS Scopus and Google Scholar

bull Research which is freely available for redistribution

Submit your manuscript at wwwbiomedcentralcomsubmit

Golby et al BMC Genomics 2013 14710 Page 18 of 18httpwwwbiomedcentralcom1471-216414710

expressed within the antisense strands of the ino1 andnarH genes of M tuberculosis H37Rv but not in any ofthe other 3 strains tested (data not shown) A compari-son of nucleotide sequences of the orthologous genesacross the species suggested that expressions of theas_sRNAs correlated with the presence of a sSNP (C to T

transition at positions 50555 and 1292100 wrt H37Rvgenomic sequence for as_ino1 and as_narH respectively)upstream of the asRNAs Approximate information re-garding the transcriptional start site was deduced from thebinding co-ordinates of the probes that detected the tran-scripts As with the M bovis antisense sRNAs described

Figure 3 Expressions and schematic representation of genomic locations of selected cis-encoded antisense sRNAs identified using atiled oligonucleotide microarray Three asRNAs (open arrows) are (a) T6 (b) T14 and (c) T25 For each asRNA a histogram plots the foldchanges for each of the oligonucleotide probes that detected the asRNA and for each probe the binding position relative to the 212297genome is indicated Closed and open arrows indicate lengths and direction of transcription of genes and asRNAs respectively

Golby et al BMC Genomics 2013 14710 Page 12 of 18httpwwwbiomedcentralcom1471-216414710

above the T residue associated with the expression ofthe M tuberculosis asRNAs is part of a putative minus10element A minus35 element with an appropriate spacing tothe minus10 element was identified for as_ino1 but not foras_narH suggesting that the as_narH promoter maybelong to group B mycobacterial promoters that havea conserved minus10 but no minus35 motif [26]

DiscussionsThe aim of this work was to define possible phenotypicvariation across M bovis field isolates through a combin-ation of genome sequencing comparative genomics andtranscriptome analyses from both in vitro and ex vivo con-ditions Using these approaches we uncovered a range ofnovel findings the most striking of which was the realisa-tion that genes that had been predicted to be differentiallyexpressed based on amplicon-microarray data were in fact

not upregulated and that instead it was an antisense tran-script that was showing differential expression Analysingboth transcriptome and genome sequence data allowed usto identify SNPs responsible for the transcription of anti-sense RNAs with generation of a consensus minus10 promotersequence the likely mechanism Our results suggest thatdata generated from amplicon arrays in the past may needto be revisited as it is possible that some coding-sequences identified as being differentially expressed wereinstead antisense transcriptsWith the growth of technologies such as high density

tiled oligonucleotide microarrays and next generationsequencing there has been a rapid increase in the num-ber of reports describing the existence of non-codingRNAs (ncRNAs) in bacteria Non-coding RNAs broadlyconsist of two types cis- and trans-encoded RNA TransRNA includes intergenic encoded RNA while cis-encodedRNA includes 5prime and 3prime untranslated regions of mRNAand antisense RNA To study the expressions of both cis-and trans encoded ncRNA we used a high density oligo-nucleotide tiled microarray since our amplicon microarraywas unable to detect intergenic transcripts or differentiatebetween sense and antisense transcripts Previous studiesusing M tuberculosis have identified substantial amountsof ncRNA encoded in both intergenic and intragenic re-gions [78] We detected substantial amounts of ncRNAs

Figure 4 Promoters of anti sense RNAs a Promoters of the asRNAs as_mb1618c as_1914c and as_echA21 -10 and minus35 elements areindicated in bold and italics Transcriptional start sites are indicated by large font G characters while SNP residue that leads to the expression ofthe asRNA is indicated by a large font red T residue The consensus sequence for group a mycobacterial promoters is indicated Numericalsubscripts indicate the percentage of the total number of promoters for which a transcriptional start site has been experimentally determinedthat show the indicated residue b Promoters of the differentially expressed asRNAs as_ino1 and as_narH in M tuberculosis -10 and minus35 elementsare indicated in bold italics The red residue indicates SNP responsible for differential expression

Table 6 Determination of transcriptional start sites ofantisense RNAs

Transcript asRNA PCRprod size

TSSresidue type

TSS residue postn(wrt 212297)

T6 mb1618c_as 150 G 1778886

T11 mb1914c_as 220 G 2122978

T25 echA21_as 175 G 4155796

Golby et al BMC Genomics 2013 14710 Page 13 of 18httpwwwbiomedcentralcom1471-216414710

in M bovis including many instances of cis-antisenseRNA species Due to their perfect complementarity cisasRNA form a duplex with the sense strand encodedtranscript resulting in either degradation [27] or transla-tion inhibition [28] of the sense mRNA AntisenseRNAs vary in length ranging from 10s to 1000primes ofnucleotides and can be classified according to theirencoded position with respect to the opposite senseencoded gene Thus they can be classified as 5prime or 3primeoverlapping while others are classified as internallylocated The 112101 specific as_Mb1618c is an exampleof a 5prime overlapping asRNA which is encoded antisense togene Mb1618c which is predicted to express a secretorylipase The location of the asRNA transcript suggests itmay function to prevent translation of Mb1618c mRNAby steric hindrance of the ribosome binding siteIn the work presented here strain 245101 expressed

two asRNAs as_Mb1914c and as_echA21 that are notexpressed by any of the other 3 strains They areencoded within the central part of the opposite genesand are therefore likely to modulate the stability of thetranscripts Mb1914c encodes a short chain dehydro-genase while echA21 encodes an enoyl-CoA hydrataseShort chain dehydrogenases catalyse a wide range offunctions so the precise function and identity of thesubstrate is difficult to deduce from sequence aloneEnoyl-CoA hydratases hydrate double carbon-carbonbonds of macromolecules and are vital in the metabol-ism of fatty acids Both gene products would thereforeappear to be involved in the metabolism of a macromol-ecule and their similar expression profiles in this straincould indicate involvement in the metabolism of the samemolecule or molecules that are of the same pathwayIn many instances upregulation of asRNA negatively

correlates with the transcription of the antisense gene[27] but in many cases expression of the antisense tran-script has no effect on the transcription of the oppositegene In our studies expressions of as_Mb1618cas_Mb1914c and as_echA21 did not appear to have anyeffect on the expressions of the opposite sense encodedgenes (data not shown) We have shown that theexpressions of the asRNAs are associated with the pres-ence of SNPs which are either synonymous or non-synonymous with respect to the sense transcript butupstream of the asRNA transcriptional start site Thishighlights the fact that mutations can potentially affectexpression of transcripts on both strands and that theclassification of a SNP is strand dependent For each ofthe three asRNAs the associated SNP was found to belocated within a putative minus10 promoter motif of groupA mycobacterial promoters The sixth residue of the minus10hexamer motif consensus sequence is a strongly conservedlsquoTrsquo residue which is present in 81 of all group A myco-bacterial promoter elements Its importance is underlined

by the finding that the strains that exhibit a lsquoCrsquo residue atthis position show no detectable expression of the asRNAwhile strains having a lsquoTrsquo residue at this position exhibitexpressionSingle nucleotide polymorphisms were found to be

the most frequent form of genetic variation that existsbetween the isolates with a total of 1013 SNPs detectedacross the three strains 112101 245101 and 130701compared to the reference strain M bovis 212297Non-synonymous SNPs which include both non-senseand missense SNPs are a class of SNPs most likely toimpact on protein function and contribute to phenotypicvariation Non-sense SNPs which results in the expressionof a truncated polypeptide due to the introduction of apremature stop codon were identified in five genes acrossthe strains Of these we focussed our attentions on thenon-sense SNP present in a gene encoding the LysR regu-lator Mb2007c as mutations affecting regulators are likelyto impact on the expression of one or more genes that arepart of the regulon of the regulator and are therefore morelikely to result in phenotypic variation Experiments tocompare the transcriptomes of a strain that exhibited themutation with a strain overexpressing a functional regula-tor did not however reveal any differences The reasonfor this unclear but could reflect a requirement of theregulator for a co-inducer that was absent under the con-ditions of the experiment The consequences of missenseSNPs are more difficult to predict as substitutions of oneamino acid for another in a protein sequence do not ne-cessarily lead to a change in protein function Howeverfor genes that are controlled by an autoregulatory mech-anism a mutation that affects the ability of the product ofthe gene to regulate itself will result in a change in expres-sion of the gene In our studies we have shown that thepresence of a missense mutation in a VapB type toxin en-coding gene Mb1749c in strain 130701 results in theupregulation in expression of the toxin-antitoxin encodingpair of genes Mb1750c-Mb1749c due to the inability ofthe encoded proteins to self regulate themselves Toxin-antitoxin systems have a variety of proposed cellular func-tions including general regulation of mRNA stability levelsin the cell [29] Further experiments are required to fullyunderstand the consequences of this mutationGenomic deletions have played an important role in

the evolution of strains belonging to the mycobacterialcomplex [30] and in the derivation of the tuberculosisvaccine strain M bovis BCG [31] In addition to the previ-ously described 68 kb gene deletion that is specific tostrains having a spoligotype 17 pattern [6] we have identi-fied a 16 kb multi-gene deletion that is specific to strain130701 and encompasses genes that are part of the DosRregulon However one of the deleted genes exists as apseudo gene in DosR in strain 212297 so its importanceto the biology of M bovis is unlikely to be significant

Golby et al BMC Genomics 2013 14710 Page 14 of 18httpwwwbiomedcentralcom1471-216414710

Several other genes with internal deletions were detectedbut none of the encoded proteins have any significantsimilarity to any protein with a defined function

ConclusionsIn conclusion we have performed a comprehensive ana-lysis of 4 M bovis strains of the most common moleculartypes circulating in GB We show that while these strainsshow extensive similarities in their genetic make-up andgene expression profiles they show distinct differences inthe expression of a subset of genes We provide functionaldata to show that SNPs can lead to the expression of anti-sense RNA a finding with implications for how we definea lsquosilentrsquo nucleotide change Furthermore we show thatthe interpretation of transcriptome data based solely onamplicon arrays could lead to artefacts due to expressionof antisense transcripts a caveat that needs to be kept inmind for previous studies of global expression analysis inbacteria

MethodsBacterial strains media and growth conditionsFor the Mϕ infection experiments bovine alveolar Mϕwere cultivated in tissue culture media R10 whichconsisted of RPMI (Invitrogen) media plus 2 mM glu-tamine 10 calf fetal serum and 1 amphotericinWhere used antibiotics gentamycin and ampicillinwere added at concentrations of 50 and 100 μg mlrespectively M bovis field strains were pre-grown inMiddlebrook 7H9 broth supplemented with 10 albumin-dextrose catalase (ADC Difco) 005 Tween and 10 mMpyruvate Cultures were harvested in mid-logarithmicphase (OD600 of 03-08) washed and then resuspended inRPMI containing 005 Tween 80

Isolation of bovine alveolar macrophages and infectionwith mycobacteriaThe lungs of a 6ndash8 week old male Holstein-Friesian calfwere removed and a whole lung lavage procedure wasperformed to washout the alveolar Mϕ Briefly 4ndash5 x500 ml aliquots of Hanksrsquo Balanced Sterile Salts solution(HBSS) were used to infuse the lungs via the tracheaand the washings were pooled in a sterile beaker The Mϕcells contained in the washes were pelleted by centrifuga-tion at 500 x g for 10 mins at 4degC washed and thenresuspended in R10 growth media supplemented withantibiotics (R10+) to a concentration of 1ndash2 x 107 mlApproximately 05-15 x 109 Mϕ were isolated per calflungVented 225 cm2 tissue culture flasks containing R10+

media were seeded with 3ndash4 x 107 alveolar Mϕ andplaced in a humidified 37degC incubator containing 5CO2 Typically 2ndash4 flasks were used per strain and timepoint After 24 hrs the growth media was decanted to

remove non-adherent cells and then replaced with freshR10+ media After a further 24 hrs the growth mediawas discarded and the Mϕ monolayer was washed withRPMI to remove traces of the antibiotic containinggrowth media The monolayer was then covered withR10 media without antibiotics (R10-) and then infectedwith mid-logarithmic phase grown mycobacteria usingan MOI of approximately 101 (bacilli Mϕ) The AlvMϕwere incubated with mycobacteria for 4 hrs after whichthe cell monolayers were washed with RPMI and theneither processed for RNA extraction (4 hr time point)or incubated in fresh R10+ media for a further 20 hrsbefore being processed for RNA extraction (24 hr timepoint)

Extraction and amplification of mycobacterial RNA frominfected macrophagesMϕ cell monolayers were lysed using a guanidiniumthiocyanate (GTC) containing solution The lysed Mϕrsquoswere vortexed and passed twice through a 21G bluntended needle to sheer host genomic DNA and therebyreduce the viscosity of the solution Mycobacterial cellswere then pelleted by centrifugation at 4600 rpm for 20mins at room temperature and washed with GTC solu-tion to remove host genomic DNA Cells were thenresuspended in Trizol and RNA was extracted using theprotocol outlined in Bacon et al [32] The amount ofpurified DNase-treated RNA recovered was of the order100ndash500 ng per time point RNA was amplified usingthe lsquoMessageAmp II-Bacteria RNA Amplification Kitrsquo(Ambion) according to the manufacturersrsquo instructionsUsing an input of 100 ng of unamplified RNA 20ndash100μg of amplified RNA was recovered

Amplicon microarray analysisFor the in vitro growth experiments three independentexperiments (biological replicates) were carried out andfor each strain in each experiment two microarrays (tech-nical replicates) were performed Thus for each strain 6microarrays were performed Three independent AlvMϕinfection experiments were carried out and for eachexperiment two microarrays were performed for each ofthe control RPMI samples and the 4- and 24 hrs post-infection samples Cy5 and Cy3 fluorescently-labelledprobes were synthesised from RNA and genomicDNA respectively and hybridised to whole genomeM bovis M tuberculosis microarrays The array designis available in BμGSbase (accession number A-BUGS-31httpbugssgulacukA-BUGS-31) and also ArrayExpress(accession number A-BUGS-31) Details of probe synthesishybridization conditions and manufacture of the micro-array can be found in Golby et al 2008 Microarrayswere scanned using an Affymetrix 428 Microarrayscanner and scanned images were quantified using

Golby et al BMC Genomics 2013 14710 Page 15 of 18httpwwwbiomedcentralcom1471-216414710

BlueFuse for Microarrays v32 software (BlueGnome)See Golby et al 2008 for further detailsNormalisation was performed by dividing the log ratio

of the Cy5 to Cy3 signal for every spot by the median ofthe log ratios for all spots except control spots A medianabsolute (MAD) scale transformation was applied to thenormalised data from the pas an additional normalisationstep For every microarray duplicate spots were averagedand then the average expression of every gene across alltechnical replicate microarrays was calculated Averagesof the three biological replicates were used to comparegene expression between strains For each gene a mod-erated t-test was applied and those genes with a P- valueless than 005 were selected From this gene list thosegenes whose average expression differed by more than25-fold between strains were selected Fully annotatedmicroarray data have been deposited in BμGSbase (acces-sion number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (accession number E-BUGS-150)

Oligonucleotide microarray analysisExperiments were performed in a similar way to thatdescribed for the in vitro amplicon array experimentsexcept that the RNA was purified using the mirVanamiRNA Isolation kit (Ambion) which is designed tocapture small (gt20 nt) RNA species RNA and genomicDNA were directly labelled with Cy5- and Cy3 respect-ively using the ULS microRNA labelling kit (Kreatech)according to the manufacturerrsquos instructions PurifiedCy5- and Cy3-labelled probes were co-purified andapplied to an Agilent 40K custom made tiled (10 ntoverlap) 60-mer oligonucleotide microarray designed tothe genomic sequence of M bovis 21229797 The arraydesign is available in BμGSbase (accession numberA-BUGS-52 httpbugssgulacukA-BUGS-52) and alsoArrayExpress (accession number A-BUGS-52) Microarrayswere hybridised at 65degC for 18 hrs and then washed inWash Buffer 1 (Agilent) at room temperature for 1 minuteThe slides were then washed in Wash Buffer 2 (Agilent) at37degC for 1 minute dried and then scanned at 2 μm usingan Agilent DNA microarray scannerTiling array data was analysed using the Limma package

of RBioconductor [33] The signal median was quan-tile normalised between arrays followed by a LOESSnormalisation within arrays Differential expressionanalysis was performed by pairwise comparison usinglinear models and empirical Bayes methods and Pvalues adjusted using the Benjamini and Hochbergsmethod to control for multiple testing Fully annotatedmicroarray data have been deposited in BμGSbase(accession number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (accession num-ber E-BUGS-150)

Whole genome sequencing of M bovis field isolatesWhole genome paired end (2 x 76bp) sequencing wasperformed using Illumina HiSeq machines at theWellcome Trust Sanger Institute (Hinxton Cambridge)Raw sequence data was uploaded to the European Nucleo-tide Archive (ENA) and can be downloaded at httpwwwebiacukena accession numbers ERX006616 ERX06617ERX012284 and ERX012286 The FastQC (httpwwwbioinformaticsbabrahamacukprojectsfastqc) programwas used to analyse the quality of the raw data reads Rawsequence data was trimmed to remove adapter sequencesand nucleotides where the sequence quality score wasbelow 20 Filtered reads were aligned to the M bovis refer-ence strain 212297 [10] with the SMALT alignmentprogram (httpwwwsangeracukresourcessoftwaresmalt)using default settings The average mean coverages for thesequenced strains 112101 245101 and 130701 were293 224 and 234 respectively Reads that did not maponto the reference genome were de novo assembled andblasted against the NCBI data base in order to find regionspresent in these strains but absent in 212201 Variantcalling and the generation of consensus sequenceswere carried out using the SAMtools program suite(httpsamtoolssourceforgenet) SNPs that had a mini-mum of quality score of 200 and had a minimum of fourgood quality forward (reverse) reads covering the SNP siteand having the variant base were retained The number offorward (reverse) reads mapping with good quality ontothe SNP site having the same base as in the reference hadto be less than 5 of the total number of forward (reverse)reads mapping with good quality onto the SNP site havingthe variant base Sequences of 12 bases in length centredaround SNP sites were blasted to the genome of the refer-ence strain 212297 and any SNP within an area with a90 hit to another area of the genome were filtered outThis eliminated most of the SNPs within the repetitivePE-PGRS regions For all sequenced strains more than999 of the genomes were covered by reads

Real time RT-PCRQuantitative real-time SYBR Green based PCR (qRT-PCR)experiments were performed using a RotorGene 3000(Corbett research) as described by Golby et al [23] Foldchanges were calculated using relative standard curvemethod and pcr controls included no template and noreverse transcriptase Primer pair sequences are given inAdditional file 5 available with the online version ofthis paper

Construction of the Rv1749c-Rv1750c and nirBDoverexpressing plasmidsThe Rv1749c-Rv1750c overexpressing plasmids pPG106and pPG107 were constructed by PCR amplification of a907bp fragment encompassing Rv1749c-Rv1750c-lsquoRv1751c

Golby et al BMC Genomics 2013 14710 Page 16 of 18httpwwwbiomedcentralcom1471-216414710

using primers tox_f and tox_r For pPG106 the fragmentwas amplified using 21229797 genomic DNA as a tem-plate while pPG107 was amplified using 130701 gDNABoth fragments were digested with SpeI and cloned intothe SpeI cut mycobacterial attP-integrating shuttle vectorpKINT (a gift from Douglas Young Imperial CollegeLondon) Plasmids pPG108 and pPG109 which contain a35 kb hsprsquo-nirB-nirD-lsquocobU fragment was constructed inseveral steps Firstly two 16 kb hsprsquo-nirBrsquo PCR fragmentswere PCR amplified separately using primers nirB1_f andnirB1_r and 212297 and 130701 as genomic templatesSimilarly two 18kb lsquonirB-nirD-lsquocobU fragments wereamplified using primers nirB2_f and nirB2_r and genomicDNAs 212297 and 130701 The two PCR products weredigested with SpeI and BamHI and then co-ligated intopKINT Details concerning the nucleotide sequences ofthe pcr primer pairs are given in the Additional file 5

5prime-RLM-RACE PCRTranscriptional start site mapping of was determinedusing the First Choice RNA ligase-mediated rapid amp-lification of cDNA ends (RLM-RACE) kit (Ambion) asper manufacturers instructions Briefly 10 ug of totalRNA was treated with calf intestinal phosphatase (CIP)and tobacco acid pyrophosphatase (TAP) before ligationof an RNA Adapter oligonucleotide to the 5prime ends ofthe mRNA transcripts A random-primed reverse tran-scription reaction was carried out to generate cDNAand then nested PCR reactions were performed on thecDNA using combinations of adapter and gene specificprimers Details concerning the nucleotide sequencesof the 5prime outer and inner adapter sequences as well as3prime outer and inner gene specific primers are given inthe Additional file 5 PCR products generated usingthe 5prime inner adapter and 3prime inner gene specific primerswere sequenced by Sanger sequencing using the 3prime innergene specific primer

Animal ethics statementAnimal work was carried out according to the UK Animal(Scientific Procedures) Act 1986 The study protocol wasapproved by the AHVLA Animal Use Ethics Committee(UK Home Office PCD number 706905)

Availability of supporting data sectionFully annotated microarray data have been deposited inBμGSbase (accession number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (ac-cession number E-BUGS-150) Raw sequence data wasuploaded to the European Nucleotide Archive (ENA) andcan be downloaded at httpwwwebiacukenadataviewERX006616-ERX06617ERX012284-ERX012286

Additional files

Additional file 1 SNPs identified across sequenced strains

Additional file 2 Large deletions identified by NGS in sequencedstrains

Additional file 3 Fold change differences in gene expression inM bovis field isolates 1121 2451 and 1307 compared to 2122Cells shaded in red indicate upregulation green indicate down-regulation and empty cells indicate no change in expression

Additional file 4 Differential expression of RNA transcripts asdetected by a tiled oligonucleotide microarray Up and down arrowsindicate fold up- and downregulation respectively and empty cellsindicate no change in expression Strand A indicates antisense S senseand I intergenic

Additional file 5 Details of oligonucleotides used in PCR RT-PCRand RLM-RACE experiments

Competing interestsThe authors declare that they have no competing interests

Authorsrsquo contributionsPG carried out the culturing of organisms macrophage infectionsmicroarrays qRT-PCR cloning analysis of sequence data and writing of themanuscript JN performed analysis of the Illumina sequence and PCRamplicon microarray data AW and JH performed the analysis of the Agilentoligonucleotide microarray data and assisted in the uploading of themicroarray data into the BμGSbase database MAQ and SB wereresponsible for the Illumina sequencing SH contributed to the analysis ofthe Illumina sequence data NS contributed to the writing and critical reviewof the manuscript RGH was a co-investigator who helped to guide theproject SG was the principle investigator on the project who guided theresearch and helped in the writing of the manuscript All authors read andapproved the final manuscript

AcknowledgementsWe would like to acknowledge Kate Gould for her help in carrying out theoligonucleotide microarrays and Denise Waldron for her assistance inuploading of the microarray data into BμGSbase We also thank MarkChambers for comments on the manuscript and BBSRC and Defra forfunding the work (Grant no BBE0184911)

Author details1Animal Health and Veterinary Laboratories Agency Woodham Lane NewHaw Addlestone Surrey KT15 3NB UK 2Division of Clinical Sciences BacterialMicroarray Group Centre for Infection amp Immunity St Georgersquos University ofLondon Cranmer Terrace London SW17 0RE UK 3The Wellcome TrustSanger Institute Hinxton Cambridgeshire UK 4UCD School of VeterinaryMedicine and UCD Conway Institute University College Dublin Dublin 4Ireland

Received 4 June 2013 Accepted 26 September 2013Published 17 October 2013

References1 Monaghan ML Doherty ML Collins JD Kazda JF Quinn PJ The tuberculin

test Vet Microbiol 1994 40(1ndash2)111ndash1242 Kamerbeek J Schouls L Kolk A van Agterveld M van Soolingen D Kuijper

S Bunschoten A Molhuizen H Shaw R Goyal M et al Simultaneousdetection and strain differentiation of Mycobacterium tuberculosis fordiagnosis and epidemiology J Clin Microbiol 1997 35(4)907ndash914

3 Frothingham R Meeker-OrsquoConnell WA Genetic diversity in theMycobacterium tuberculosis complex based on variable numbers oftandem DNA repeats Microbiology 1998 144(Pt 5)1189ndash1196

4 Smith NH Dale J Inwald J Palmer S Gordon SV Hewinson RG Smith JMThe population structure of Mycobacterium bovis in Great Britain clonalexpansion Proc Natl Acad Sci USA 2003 100(25)15271ndash15275

5 Winder CL Gordon SV Dale J Hewinson RG Goodacre R Metabolicfingerprints of Mycobacterium bovis cluster with molecular type

Golby et al BMC Genomics 2013 14710 Page 17 of 18httpwwwbiomedcentralcom1471-216414710

implications for genotype-phenotype links Microbiology 2006152(Pt 9)2757ndash2765

6 Wheeler PR Brosch R Coldham NG Inwald JK Hewinson RG Gordon SVFunctional analysis of a clonal deletion in an epidemic strain ofMycobacterium bovis reveals a role in lipid metabolism Microbiology2008 154(Pt 12)3731ndash3742

7 Arnvig KB Young DB Identification of small RNAs in Mycobacteriumtuberculosis Mol Microbiol 2009 73(3)397ndash408

8 Arnvig KB Comas I Thomson NR Houghton J Boshoff HI Croucher NJRose G Perkins TT Parkhill J Dougan G et al Sequence-based analysisuncovers an abundance of non-coding RNA in the total transcriptome ofMycobacterium tuberculosis PLoS Pathog 2011 7(11)e1002342

9 Smith NH Upton P Naming spoligotype patterns for the RD9-deletedlineage of the Mycobacterium tuberculosis complex www MbovisorgInfect Genet Evol 2012 12(4)873ndash876

10 Garnier T Eiglmeier K Camus JC Medina N Mansoor H Pryor M Duthoy SGrondin S Lacroix C Monsempe C et al The complete genome sequenceof Mycobacterium bovis Proc Natl Acad Sci USA 2003 100(13)7877ndash7882

11 Albers CA Lunter G MacArthur DG McVean G Ouwehand WH Durbin RDindel accurate indel calls from short-read data Genome Res 201121(6)961ndash973

12 Mostowy S Inwald J Gordon S Martin C Warren R Kremer K Cousins DBehr MA Revisiting the evolution of Mycobacterium bovis J Bacteriol2005 187(18)6386ndash6395

13 Boon C Dick T Mycobacterium bovis BCG response regulator essentialfor hypoxic dormancy J Bacteriol 2002 184(24)6760ndash6767

14 Van Gelder RN von Zastrow ME Yool A Dement WC Barchas JD EberwineJH Amplified RNA synthesized from limited quantities of heterogeneouscDNA Proc Natl Acad Sci USA 1990 87(5)1663ndash1667

15 Pandey DP Gerdes K Toxin-antitoxin loci are highly abundant in free-living but lost from host-associated prokaryotes Nucleic Acids Res 200533(3)966ndash976

16 Arcus VL McKenzie JL Robson J Cook GM The PIN-domain ribonucleasesand the prokaryotic VapBC toxin-antitoxin array Protein Eng Des Sel 201124(1ndash2)33ndash40

17 Bukowski M Rojowska A Wladyka B Prokaryotic toxin-antitoxin systemsndashthe role in bacterial physiology and application in molecular biologyActa Biochim Pol 2011 58(1)1ndash9

18 Maddocks SE Oyston PC Structure and function of the LysR-typetranscriptional regulator (LTTR) family proteins Microbiology 2008154(Pt 12)3609ndash3623

19 Moreno-Vivian C Cabello P Martinez-Luque M Blasco R Castillo FProkaryotic nitrate reduction molecular properties and functionaldistinction among bacterial nitrate reductases J Bacteriol 1999 181(21)6573ndash6584

20 Tiffert Y Supra P Wurm R Wohlleben W Wagner R Reuther J TheStreptomyces coelicolor GlnR regulon identification of new GlnR targetsand evidence for a central role of GlnR in nitrogen metabolism inactinomycetes Mol Microbiol 2008 67(4)861ndash880

21 Duan J Wainwright MS Comeron JM Saitou N Sanders AR Gelernter JGejman PV Synonymous mutations in the human dopamine receptor D2(DRD2) affect mRNA stability and synthesis of the receptor Hum MolGenet 2003 12(3)205ndash216

22 Capon F Allen MH Ameen M Burden AD Tillman D Barker JN TrembathRC A synonymous SNP of the corneodesmosin gene leads to increasedmRNA stability and demonstrates association with psoriasis acrossdiverse ethnic groups Hum Mol Genet 2004 13(20)2361ndash2368

23 Golby P Hatch KA Bacon J Cooney R Riley P Allnutt J Hinds J Nunez JMarsh PD Hewinson RG et al Comparative transcriptomics reveals keygene expression differences between the human and bovine pathogensof the Mycobacterium tuberculosis complex Microbiology 2007153(Pt 10)3323ndash3336

24 Bitter W Houben EN Bottai D Brodin P Brown EJ Cox JS Derbyshire KFortune SM Gao LY Liu J et al Systematic genetic nomenclature for typeVII secretion systems PLoS Pathog 2009 5(10)e1000507

25 Sorek R Kunin V Hugenholtz P CRISPRndasha widespread system thatprovides acquired resistance against phages in bacteria and archaeaNat Rev Microbiol 2008 6(3)181ndash186

26 Gomez M Smith I Determinants of Mycobacterial Gene Expression InMolecular Genetics of Mycobacteria Washington DC ASM Press2000111ndash129

27 Duhring U Axmann IM Hess WR Wilde A An internal antisense RNAregulates expression of the photosynthesis gene isiA Proc Natl Acad SciUSA 2006 103(18)7054ndash7058

28 Kawano M Aravind L Storz G An antisense RNA controls synthesis of anSOS-induced toxin evolved from an antitoxin Mol Microbiol 200764(3)738ndash754

29 Thomason MK Storz G Bacterial antisense RNAs how many are thereand what are they doing Annu Rev Genet 2010 44167ndash188

30 Brosch R Pym AS Gordon SV Cole ST The evolution of mycobacterialpathogenicity clues from comparative genomics Trends Microbiol 20019(9)452ndash458

31 Calmette A Guerin C Sur quelques proprietes du bacille tuberculeuxdrsquoorigine cultive sur la bile de boeuf glycerinee C R Acad Sci Paris 1909149716ndash718

32 Bacon J James BW Wernisch L Williams A Morley KA Hatch GJ ManganJA Hinds J Stoker NG Butcher PD et al The influence of reduced oxygenavailability on pathogenicity and gene expression in Mycobacteriumtuberculosis Tuberculosis (Edinb) 2004 84(3ndash4)205ndash217

33 Smyth GK Linear models and empirical bayes methods for assessingdifferential expression in microarray experiments Stat Appl Genet Mol Biol2004 33

doi1011861471-2164-14-710Cite this article as Golby et al Genome-level analyses ofMycobacterium bovis lineages reveal the role of SNPs and antisensetranscription in differential gene expression BMC Genomics 2013 14710

Submit your next manuscript to BioMed Centraland take full advantage of

bull Convenient online submission

bull Thorough peer review

bull No space constraints or color figure charges

bull Immediate publication on acceptance

bull Inclusion in PubMed CAS Scopus and Google Scholar

bull Research which is freely available for redistribution

Submit your manuscript at wwwbiomedcentralcomsubmit

Golby et al BMC Genomics 2013 14710 Page 18 of 18httpwwwbiomedcentralcom1471-216414710

above the T residue associated with the expression ofthe M tuberculosis asRNAs is part of a putative minus10element A minus35 element with an appropriate spacing tothe minus10 element was identified for as_ino1 but not foras_narH suggesting that the as_narH promoter maybelong to group B mycobacterial promoters that havea conserved minus10 but no minus35 motif [26]

DiscussionsThe aim of this work was to define possible phenotypicvariation across M bovis field isolates through a combin-ation of genome sequencing comparative genomics andtranscriptome analyses from both in vitro and ex vivo con-ditions Using these approaches we uncovered a range ofnovel findings the most striking of which was the realisa-tion that genes that had been predicted to be differentiallyexpressed based on amplicon-microarray data were in fact

not upregulated and that instead it was an antisense tran-script that was showing differential expression Analysingboth transcriptome and genome sequence data allowed usto identify SNPs responsible for the transcription of anti-sense RNAs with generation of a consensus minus10 promotersequence the likely mechanism Our results suggest thatdata generated from amplicon arrays in the past may needto be revisited as it is possible that some coding-sequences identified as being differentially expressed wereinstead antisense transcriptsWith the growth of technologies such as high density

tiled oligonucleotide microarrays and next generationsequencing there has been a rapid increase in the num-ber of reports describing the existence of non-codingRNAs (ncRNAs) in bacteria Non-coding RNAs broadlyconsist of two types cis- and trans-encoded RNA TransRNA includes intergenic encoded RNA while cis-encodedRNA includes 5prime and 3prime untranslated regions of mRNAand antisense RNA To study the expressions of both cis-and trans encoded ncRNA we used a high density oligo-nucleotide tiled microarray since our amplicon microarraywas unable to detect intergenic transcripts or differentiatebetween sense and antisense transcripts Previous studiesusing M tuberculosis have identified substantial amountsof ncRNA encoded in both intergenic and intragenic re-gions [78] We detected substantial amounts of ncRNAs

Figure 4 Promoters of anti sense RNAs a Promoters of the asRNAs as_mb1618c as_1914c and as_echA21 -10 and minus35 elements areindicated in bold and italics Transcriptional start sites are indicated by large font G characters while SNP residue that leads to the expression ofthe asRNA is indicated by a large font red T residue The consensus sequence for group a mycobacterial promoters is indicated Numericalsubscripts indicate the percentage of the total number of promoters for which a transcriptional start site has been experimentally determinedthat show the indicated residue b Promoters of the differentially expressed asRNAs as_ino1 and as_narH in M tuberculosis -10 and minus35 elementsare indicated in bold italics The red residue indicates SNP responsible for differential expression

Table 6 Determination of transcriptional start sites ofantisense RNAs

Transcript asRNA PCRprod size

TSSresidue type

TSS residue postn(wrt 212297)

T6 mb1618c_as 150 G 1778886

T11 mb1914c_as 220 G 2122978

T25 echA21_as 175 G 4155796

Golby et al BMC Genomics 2013 14710 Page 13 of 18httpwwwbiomedcentralcom1471-216414710

in M bovis including many instances of cis-antisenseRNA species Due to their perfect complementarity cisasRNA form a duplex with the sense strand encodedtranscript resulting in either degradation [27] or transla-tion inhibition [28] of the sense mRNA AntisenseRNAs vary in length ranging from 10s to 1000primes ofnucleotides and can be classified according to theirencoded position with respect to the opposite senseencoded gene Thus they can be classified as 5prime or 3primeoverlapping while others are classified as internallylocated The 112101 specific as_Mb1618c is an exampleof a 5prime overlapping asRNA which is encoded antisense togene Mb1618c which is predicted to express a secretorylipase The location of the asRNA transcript suggests itmay function to prevent translation of Mb1618c mRNAby steric hindrance of the ribosome binding siteIn the work presented here strain 245101 expressed

two asRNAs as_Mb1914c and as_echA21 that are notexpressed by any of the other 3 strains They areencoded within the central part of the opposite genesand are therefore likely to modulate the stability of thetranscripts Mb1914c encodes a short chain dehydro-genase while echA21 encodes an enoyl-CoA hydrataseShort chain dehydrogenases catalyse a wide range offunctions so the precise function and identity of thesubstrate is difficult to deduce from sequence aloneEnoyl-CoA hydratases hydrate double carbon-carbonbonds of macromolecules and are vital in the metabol-ism of fatty acids Both gene products would thereforeappear to be involved in the metabolism of a macromol-ecule and their similar expression profiles in this straincould indicate involvement in the metabolism of the samemolecule or molecules that are of the same pathwayIn many instances upregulation of asRNA negatively

correlates with the transcription of the antisense gene[27] but in many cases expression of the antisense tran-script has no effect on the transcription of the oppositegene In our studies expressions of as_Mb1618cas_Mb1914c and as_echA21 did not appear to have anyeffect on the expressions of the opposite sense encodedgenes (data not shown) We have shown that theexpressions of the asRNAs are associated with the pres-ence of SNPs which are either synonymous or non-synonymous with respect to the sense transcript butupstream of the asRNA transcriptional start site Thishighlights the fact that mutations can potentially affectexpression of transcripts on both strands and that theclassification of a SNP is strand dependent For each ofthe three asRNAs the associated SNP was found to belocated within a putative minus10 promoter motif of groupA mycobacterial promoters The sixth residue of the minus10hexamer motif consensus sequence is a strongly conservedlsquoTrsquo residue which is present in 81 of all group A myco-bacterial promoter elements Its importance is underlined

by the finding that the strains that exhibit a lsquoCrsquo residue atthis position show no detectable expression of the asRNAwhile strains having a lsquoTrsquo residue at this position exhibitexpressionSingle nucleotide polymorphisms were found to be

the most frequent form of genetic variation that existsbetween the isolates with a total of 1013 SNPs detectedacross the three strains 112101 245101 and 130701compared to the reference strain M bovis 212297Non-synonymous SNPs which include both non-senseand missense SNPs are a class of SNPs most likely toimpact on protein function and contribute to phenotypicvariation Non-sense SNPs which results in the expressionof a truncated polypeptide due to the introduction of apremature stop codon were identified in five genes acrossthe strains Of these we focussed our attentions on thenon-sense SNP present in a gene encoding the LysR regu-lator Mb2007c as mutations affecting regulators are likelyto impact on the expression of one or more genes that arepart of the regulon of the regulator and are therefore morelikely to result in phenotypic variation Experiments tocompare the transcriptomes of a strain that exhibited themutation with a strain overexpressing a functional regula-tor did not however reveal any differences The reasonfor this unclear but could reflect a requirement of theregulator for a co-inducer that was absent under the con-ditions of the experiment The consequences of missenseSNPs are more difficult to predict as substitutions of oneamino acid for another in a protein sequence do not ne-cessarily lead to a change in protein function Howeverfor genes that are controlled by an autoregulatory mech-anism a mutation that affects the ability of the product ofthe gene to regulate itself will result in a change in expres-sion of the gene In our studies we have shown that thepresence of a missense mutation in a VapB type toxin en-coding gene Mb1749c in strain 130701 results in theupregulation in expression of the toxin-antitoxin encodingpair of genes Mb1750c-Mb1749c due to the inability ofthe encoded proteins to self regulate themselves Toxin-antitoxin systems have a variety of proposed cellular func-tions including general regulation of mRNA stability levelsin the cell [29] Further experiments are required to fullyunderstand the consequences of this mutationGenomic deletions have played an important role in

the evolution of strains belonging to the mycobacterialcomplex [30] and in the derivation of the tuberculosisvaccine strain M bovis BCG [31] In addition to the previ-ously described 68 kb gene deletion that is specific tostrains having a spoligotype 17 pattern [6] we have identi-fied a 16 kb multi-gene deletion that is specific to strain130701 and encompasses genes that are part of the DosRregulon However one of the deleted genes exists as apseudo gene in DosR in strain 212297 so its importanceto the biology of M bovis is unlikely to be significant

Golby et al BMC Genomics 2013 14710 Page 14 of 18httpwwwbiomedcentralcom1471-216414710

Several other genes with internal deletions were detectedbut none of the encoded proteins have any significantsimilarity to any protein with a defined function

ConclusionsIn conclusion we have performed a comprehensive ana-lysis of 4 M bovis strains of the most common moleculartypes circulating in GB We show that while these strainsshow extensive similarities in their genetic make-up andgene expression profiles they show distinct differences inthe expression of a subset of genes We provide functionaldata to show that SNPs can lead to the expression of anti-sense RNA a finding with implications for how we definea lsquosilentrsquo nucleotide change Furthermore we show thatthe interpretation of transcriptome data based solely onamplicon arrays could lead to artefacts due to expressionof antisense transcripts a caveat that needs to be kept inmind for previous studies of global expression analysis inbacteria

MethodsBacterial strains media and growth conditionsFor the Mϕ infection experiments bovine alveolar Mϕwere cultivated in tissue culture media R10 whichconsisted of RPMI (Invitrogen) media plus 2 mM glu-tamine 10 calf fetal serum and 1 amphotericinWhere used antibiotics gentamycin and ampicillinwere added at concentrations of 50 and 100 μg mlrespectively M bovis field strains were pre-grown inMiddlebrook 7H9 broth supplemented with 10 albumin-dextrose catalase (ADC Difco) 005 Tween and 10 mMpyruvate Cultures were harvested in mid-logarithmicphase (OD600 of 03-08) washed and then resuspended inRPMI containing 005 Tween 80

Isolation of bovine alveolar macrophages and infectionwith mycobacteriaThe lungs of a 6ndash8 week old male Holstein-Friesian calfwere removed and a whole lung lavage procedure wasperformed to washout the alveolar Mϕ Briefly 4ndash5 x500 ml aliquots of Hanksrsquo Balanced Sterile Salts solution(HBSS) were used to infuse the lungs via the tracheaand the washings were pooled in a sterile beaker The Mϕcells contained in the washes were pelleted by centrifuga-tion at 500 x g for 10 mins at 4degC washed and thenresuspended in R10 growth media supplemented withantibiotics (R10+) to a concentration of 1ndash2 x 107 mlApproximately 05-15 x 109 Mϕ were isolated per calflungVented 225 cm2 tissue culture flasks containing R10+

media were seeded with 3ndash4 x 107 alveolar Mϕ andplaced in a humidified 37degC incubator containing 5CO2 Typically 2ndash4 flasks were used per strain and timepoint After 24 hrs the growth media was decanted to

remove non-adherent cells and then replaced with freshR10+ media After a further 24 hrs the growth mediawas discarded and the Mϕ monolayer was washed withRPMI to remove traces of the antibiotic containinggrowth media The monolayer was then covered withR10 media without antibiotics (R10-) and then infectedwith mid-logarithmic phase grown mycobacteria usingan MOI of approximately 101 (bacilli Mϕ) The AlvMϕwere incubated with mycobacteria for 4 hrs after whichthe cell monolayers were washed with RPMI and theneither processed for RNA extraction (4 hr time point)or incubated in fresh R10+ media for a further 20 hrsbefore being processed for RNA extraction (24 hr timepoint)

Extraction and amplification of mycobacterial RNA frominfected macrophagesMϕ cell monolayers were lysed using a guanidiniumthiocyanate (GTC) containing solution The lysed Mϕrsquoswere vortexed and passed twice through a 21G bluntended needle to sheer host genomic DNA and therebyreduce the viscosity of the solution Mycobacterial cellswere then pelleted by centrifugation at 4600 rpm for 20mins at room temperature and washed with GTC solu-tion to remove host genomic DNA Cells were thenresuspended in Trizol and RNA was extracted using theprotocol outlined in Bacon et al [32] The amount ofpurified DNase-treated RNA recovered was of the order100ndash500 ng per time point RNA was amplified usingthe lsquoMessageAmp II-Bacteria RNA Amplification Kitrsquo(Ambion) according to the manufacturersrsquo instructionsUsing an input of 100 ng of unamplified RNA 20ndash100μg of amplified RNA was recovered

Amplicon microarray analysisFor the in vitro growth experiments three independentexperiments (biological replicates) were carried out andfor each strain in each experiment two microarrays (tech-nical replicates) were performed Thus for each strain 6microarrays were performed Three independent AlvMϕinfection experiments were carried out and for eachexperiment two microarrays were performed for each ofthe control RPMI samples and the 4- and 24 hrs post-infection samples Cy5 and Cy3 fluorescently-labelledprobes were synthesised from RNA and genomicDNA respectively and hybridised to whole genomeM bovis M tuberculosis microarrays The array designis available in BμGSbase (accession number A-BUGS-31httpbugssgulacukA-BUGS-31) and also ArrayExpress(accession number A-BUGS-31) Details of probe synthesishybridization conditions and manufacture of the micro-array can be found in Golby et al 2008 Microarrayswere scanned using an Affymetrix 428 Microarrayscanner and scanned images were quantified using

Golby et al BMC Genomics 2013 14710 Page 15 of 18httpwwwbiomedcentralcom1471-216414710

BlueFuse for Microarrays v32 software (BlueGnome)See Golby et al 2008 for further detailsNormalisation was performed by dividing the log ratio

of the Cy5 to Cy3 signal for every spot by the median ofthe log ratios for all spots except control spots A medianabsolute (MAD) scale transformation was applied to thenormalised data from the pas an additional normalisationstep For every microarray duplicate spots were averagedand then the average expression of every gene across alltechnical replicate microarrays was calculated Averagesof the three biological replicates were used to comparegene expression between strains For each gene a mod-erated t-test was applied and those genes with a P- valueless than 005 were selected From this gene list thosegenes whose average expression differed by more than25-fold between strains were selected Fully annotatedmicroarray data have been deposited in BμGSbase (acces-sion number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (accession number E-BUGS-150)

Oligonucleotide microarray analysisExperiments were performed in a similar way to thatdescribed for the in vitro amplicon array experimentsexcept that the RNA was purified using the mirVanamiRNA Isolation kit (Ambion) which is designed tocapture small (gt20 nt) RNA species RNA and genomicDNA were directly labelled with Cy5- and Cy3 respect-ively using the ULS microRNA labelling kit (Kreatech)according to the manufacturerrsquos instructions PurifiedCy5- and Cy3-labelled probes were co-purified andapplied to an Agilent 40K custom made tiled (10 ntoverlap) 60-mer oligonucleotide microarray designed tothe genomic sequence of M bovis 21229797 The arraydesign is available in BμGSbase (accession numberA-BUGS-52 httpbugssgulacukA-BUGS-52) and alsoArrayExpress (accession number A-BUGS-52) Microarrayswere hybridised at 65degC for 18 hrs and then washed inWash Buffer 1 (Agilent) at room temperature for 1 minuteThe slides were then washed in Wash Buffer 2 (Agilent) at37degC for 1 minute dried and then scanned at 2 μm usingan Agilent DNA microarray scannerTiling array data was analysed using the Limma package

of RBioconductor [33] The signal median was quan-tile normalised between arrays followed by a LOESSnormalisation within arrays Differential expressionanalysis was performed by pairwise comparison usinglinear models and empirical Bayes methods and Pvalues adjusted using the Benjamini and Hochbergsmethod to control for multiple testing Fully annotatedmicroarray data have been deposited in BμGSbase(accession number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (accession num-ber E-BUGS-150)

Whole genome sequencing of M bovis field isolatesWhole genome paired end (2 x 76bp) sequencing wasperformed using Illumina HiSeq machines at theWellcome Trust Sanger Institute (Hinxton Cambridge)Raw sequence data was uploaded to the European Nucleo-tide Archive (ENA) and can be downloaded at httpwwwebiacukena accession numbers ERX006616 ERX06617ERX012284 and ERX012286 The FastQC (httpwwwbioinformaticsbabrahamacukprojectsfastqc) programwas used to analyse the quality of the raw data reads Rawsequence data was trimmed to remove adapter sequencesand nucleotides where the sequence quality score wasbelow 20 Filtered reads were aligned to the M bovis refer-ence strain 212297 [10] with the SMALT alignmentprogram (httpwwwsangeracukresourcessoftwaresmalt)using default settings The average mean coverages for thesequenced strains 112101 245101 and 130701 were293 224 and 234 respectively Reads that did not maponto the reference genome were de novo assembled andblasted against the NCBI data base in order to find regionspresent in these strains but absent in 212201 Variantcalling and the generation of consensus sequenceswere carried out using the SAMtools program suite(httpsamtoolssourceforgenet) SNPs that had a mini-mum of quality score of 200 and had a minimum of fourgood quality forward (reverse) reads covering the SNP siteand having the variant base were retained The number offorward (reverse) reads mapping with good quality ontothe SNP site having the same base as in the reference hadto be less than 5 of the total number of forward (reverse)reads mapping with good quality onto the SNP site havingthe variant base Sequences of 12 bases in length centredaround SNP sites were blasted to the genome of the refer-ence strain 212297 and any SNP within an area with a90 hit to another area of the genome were filtered outThis eliminated most of the SNPs within the repetitivePE-PGRS regions For all sequenced strains more than999 of the genomes were covered by reads

Real time RT-PCRQuantitative real-time SYBR Green based PCR (qRT-PCR)experiments were performed using a RotorGene 3000(Corbett research) as described by Golby et al [23] Foldchanges were calculated using relative standard curvemethod and pcr controls included no template and noreverse transcriptase Primer pair sequences are given inAdditional file 5 available with the online version ofthis paper

Construction of the Rv1749c-Rv1750c and nirBDoverexpressing plasmidsThe Rv1749c-Rv1750c overexpressing plasmids pPG106and pPG107 were constructed by PCR amplification of a907bp fragment encompassing Rv1749c-Rv1750c-lsquoRv1751c

Golby et al BMC Genomics 2013 14710 Page 16 of 18httpwwwbiomedcentralcom1471-216414710

using primers tox_f and tox_r For pPG106 the fragmentwas amplified using 21229797 genomic DNA as a tem-plate while pPG107 was amplified using 130701 gDNABoth fragments were digested with SpeI and cloned intothe SpeI cut mycobacterial attP-integrating shuttle vectorpKINT (a gift from Douglas Young Imperial CollegeLondon) Plasmids pPG108 and pPG109 which contain a35 kb hsprsquo-nirB-nirD-lsquocobU fragment was constructed inseveral steps Firstly two 16 kb hsprsquo-nirBrsquo PCR fragmentswere PCR amplified separately using primers nirB1_f andnirB1_r and 212297 and 130701 as genomic templatesSimilarly two 18kb lsquonirB-nirD-lsquocobU fragments wereamplified using primers nirB2_f and nirB2_r and genomicDNAs 212297 and 130701 The two PCR products weredigested with SpeI and BamHI and then co-ligated intopKINT Details concerning the nucleotide sequences ofthe pcr primer pairs are given in the Additional file 5

5prime-RLM-RACE PCRTranscriptional start site mapping of was determinedusing the First Choice RNA ligase-mediated rapid amp-lification of cDNA ends (RLM-RACE) kit (Ambion) asper manufacturers instructions Briefly 10 ug of totalRNA was treated with calf intestinal phosphatase (CIP)and tobacco acid pyrophosphatase (TAP) before ligationof an RNA Adapter oligonucleotide to the 5prime ends ofthe mRNA transcripts A random-primed reverse tran-scription reaction was carried out to generate cDNAand then nested PCR reactions were performed on thecDNA using combinations of adapter and gene specificprimers Details concerning the nucleotide sequencesof the 5prime outer and inner adapter sequences as well as3prime outer and inner gene specific primers are given inthe Additional file 5 PCR products generated usingthe 5prime inner adapter and 3prime inner gene specific primerswere sequenced by Sanger sequencing using the 3prime innergene specific primer

Animal ethics statementAnimal work was carried out according to the UK Animal(Scientific Procedures) Act 1986 The study protocol wasapproved by the AHVLA Animal Use Ethics Committee(UK Home Office PCD number 706905)

Availability of supporting data sectionFully annotated microarray data have been deposited inBμGSbase (accession number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (ac-cession number E-BUGS-150) Raw sequence data wasuploaded to the European Nucleotide Archive (ENA) andcan be downloaded at httpwwwebiacukenadataviewERX006616-ERX06617ERX012284-ERX012286

Additional files

Additional file 1 SNPs identified across sequenced strains

Additional file 2 Large deletions identified by NGS in sequencedstrains

Additional file 3 Fold change differences in gene expression inM bovis field isolates 1121 2451 and 1307 compared to 2122Cells shaded in red indicate upregulation green indicate down-regulation and empty cells indicate no change in expression

Additional file 4 Differential expression of RNA transcripts asdetected by a tiled oligonucleotide microarray Up and down arrowsindicate fold up- and downregulation respectively and empty cellsindicate no change in expression Strand A indicates antisense S senseand I intergenic

Additional file 5 Details of oligonucleotides used in PCR RT-PCRand RLM-RACE experiments

Competing interestsThe authors declare that they have no competing interests

Authorsrsquo contributionsPG carried out the culturing of organisms macrophage infectionsmicroarrays qRT-PCR cloning analysis of sequence data and writing of themanuscript JN performed analysis of the Illumina sequence and PCRamplicon microarray data AW and JH performed the analysis of the Agilentoligonucleotide microarray data and assisted in the uploading of themicroarray data into the BμGSbase database MAQ and SB wereresponsible for the Illumina sequencing SH contributed to the analysis ofthe Illumina sequence data NS contributed to the writing and critical reviewof the manuscript RGH was a co-investigator who helped to guide theproject SG was the principle investigator on the project who guided theresearch and helped in the writing of the manuscript All authors read andapproved the final manuscript

AcknowledgementsWe would like to acknowledge Kate Gould for her help in carrying out theoligonucleotide microarrays and Denise Waldron for her assistance inuploading of the microarray data into BμGSbase We also thank MarkChambers for comments on the manuscript and BBSRC and Defra forfunding the work (Grant no BBE0184911)

Author details1Animal Health and Veterinary Laboratories Agency Woodham Lane NewHaw Addlestone Surrey KT15 3NB UK 2Division of Clinical Sciences BacterialMicroarray Group Centre for Infection amp Immunity St Georgersquos University ofLondon Cranmer Terrace London SW17 0RE UK 3The Wellcome TrustSanger Institute Hinxton Cambridgeshire UK 4UCD School of VeterinaryMedicine and UCD Conway Institute University College Dublin Dublin 4Ireland

Received 4 June 2013 Accepted 26 September 2013Published 17 October 2013

References1 Monaghan ML Doherty ML Collins JD Kazda JF Quinn PJ The tuberculin

test Vet Microbiol 1994 40(1ndash2)111ndash1242 Kamerbeek J Schouls L Kolk A van Agterveld M van Soolingen D Kuijper

S Bunschoten A Molhuizen H Shaw R Goyal M et al Simultaneousdetection and strain differentiation of Mycobacterium tuberculosis fordiagnosis and epidemiology J Clin Microbiol 1997 35(4)907ndash914

3 Frothingham R Meeker-OrsquoConnell WA Genetic diversity in theMycobacterium tuberculosis complex based on variable numbers oftandem DNA repeats Microbiology 1998 144(Pt 5)1189ndash1196

4 Smith NH Dale J Inwald J Palmer S Gordon SV Hewinson RG Smith JMThe population structure of Mycobacterium bovis in Great Britain clonalexpansion Proc Natl Acad Sci USA 2003 100(25)15271ndash15275

5 Winder CL Gordon SV Dale J Hewinson RG Goodacre R Metabolicfingerprints of Mycobacterium bovis cluster with molecular type

Golby et al BMC Genomics 2013 14710 Page 17 of 18httpwwwbiomedcentralcom1471-216414710

implications for genotype-phenotype links Microbiology 2006152(Pt 9)2757ndash2765

6 Wheeler PR Brosch R Coldham NG Inwald JK Hewinson RG Gordon SVFunctional analysis of a clonal deletion in an epidemic strain ofMycobacterium bovis reveals a role in lipid metabolism Microbiology2008 154(Pt 12)3731ndash3742

7 Arnvig KB Young DB Identification of small RNAs in Mycobacteriumtuberculosis Mol Microbiol 2009 73(3)397ndash408

8 Arnvig KB Comas I Thomson NR Houghton J Boshoff HI Croucher NJRose G Perkins TT Parkhill J Dougan G et al Sequence-based analysisuncovers an abundance of non-coding RNA in the total transcriptome ofMycobacterium tuberculosis PLoS Pathog 2011 7(11)e1002342

9 Smith NH Upton P Naming spoligotype patterns for the RD9-deletedlineage of the Mycobacterium tuberculosis complex www MbovisorgInfect Genet Evol 2012 12(4)873ndash876

10 Garnier T Eiglmeier K Camus JC Medina N Mansoor H Pryor M Duthoy SGrondin S Lacroix C Monsempe C et al The complete genome sequenceof Mycobacterium bovis Proc Natl Acad Sci USA 2003 100(13)7877ndash7882

11 Albers CA Lunter G MacArthur DG McVean G Ouwehand WH Durbin RDindel accurate indel calls from short-read data Genome Res 201121(6)961ndash973

12 Mostowy S Inwald J Gordon S Martin C Warren R Kremer K Cousins DBehr MA Revisiting the evolution of Mycobacterium bovis J Bacteriol2005 187(18)6386ndash6395

13 Boon C Dick T Mycobacterium bovis BCG response regulator essentialfor hypoxic dormancy J Bacteriol 2002 184(24)6760ndash6767

14 Van Gelder RN von Zastrow ME Yool A Dement WC Barchas JD EberwineJH Amplified RNA synthesized from limited quantities of heterogeneouscDNA Proc Natl Acad Sci USA 1990 87(5)1663ndash1667

15 Pandey DP Gerdes K Toxin-antitoxin loci are highly abundant in free-living but lost from host-associated prokaryotes Nucleic Acids Res 200533(3)966ndash976

16 Arcus VL McKenzie JL Robson J Cook GM The PIN-domain ribonucleasesand the prokaryotic VapBC toxin-antitoxin array Protein Eng Des Sel 201124(1ndash2)33ndash40

17 Bukowski M Rojowska A Wladyka B Prokaryotic toxin-antitoxin systemsndashthe role in bacterial physiology and application in molecular biologyActa Biochim Pol 2011 58(1)1ndash9

18 Maddocks SE Oyston PC Structure and function of the LysR-typetranscriptional regulator (LTTR) family proteins Microbiology 2008154(Pt 12)3609ndash3623

19 Moreno-Vivian C Cabello P Martinez-Luque M Blasco R Castillo FProkaryotic nitrate reduction molecular properties and functionaldistinction among bacterial nitrate reductases J Bacteriol 1999 181(21)6573ndash6584

20 Tiffert Y Supra P Wurm R Wohlleben W Wagner R Reuther J TheStreptomyces coelicolor GlnR regulon identification of new GlnR targetsand evidence for a central role of GlnR in nitrogen metabolism inactinomycetes Mol Microbiol 2008 67(4)861ndash880

21 Duan J Wainwright MS Comeron JM Saitou N Sanders AR Gelernter JGejman PV Synonymous mutations in the human dopamine receptor D2(DRD2) affect mRNA stability and synthesis of the receptor Hum MolGenet 2003 12(3)205ndash216

22 Capon F Allen MH Ameen M Burden AD Tillman D Barker JN TrembathRC A synonymous SNP of the corneodesmosin gene leads to increasedmRNA stability and demonstrates association with psoriasis acrossdiverse ethnic groups Hum Mol Genet 2004 13(20)2361ndash2368

23 Golby P Hatch KA Bacon J Cooney R Riley P Allnutt J Hinds J Nunez JMarsh PD Hewinson RG et al Comparative transcriptomics reveals keygene expression differences between the human and bovine pathogensof the Mycobacterium tuberculosis complex Microbiology 2007153(Pt 10)3323ndash3336

24 Bitter W Houben EN Bottai D Brodin P Brown EJ Cox JS Derbyshire KFortune SM Gao LY Liu J et al Systematic genetic nomenclature for typeVII secretion systems PLoS Pathog 2009 5(10)e1000507

25 Sorek R Kunin V Hugenholtz P CRISPRndasha widespread system thatprovides acquired resistance against phages in bacteria and archaeaNat Rev Microbiol 2008 6(3)181ndash186

26 Gomez M Smith I Determinants of Mycobacterial Gene Expression InMolecular Genetics of Mycobacteria Washington DC ASM Press2000111ndash129

27 Duhring U Axmann IM Hess WR Wilde A An internal antisense RNAregulates expression of the photosynthesis gene isiA Proc Natl Acad SciUSA 2006 103(18)7054ndash7058

28 Kawano M Aravind L Storz G An antisense RNA controls synthesis of anSOS-induced toxin evolved from an antitoxin Mol Microbiol 200764(3)738ndash754

29 Thomason MK Storz G Bacterial antisense RNAs how many are thereand what are they doing Annu Rev Genet 2010 44167ndash188

30 Brosch R Pym AS Gordon SV Cole ST The evolution of mycobacterialpathogenicity clues from comparative genomics Trends Microbiol 20019(9)452ndash458

31 Calmette A Guerin C Sur quelques proprietes du bacille tuberculeuxdrsquoorigine cultive sur la bile de boeuf glycerinee C R Acad Sci Paris 1909149716ndash718

32 Bacon J James BW Wernisch L Williams A Morley KA Hatch GJ ManganJA Hinds J Stoker NG Butcher PD et al The influence of reduced oxygenavailability on pathogenicity and gene expression in Mycobacteriumtuberculosis Tuberculosis (Edinb) 2004 84(3ndash4)205ndash217

33 Smyth GK Linear models and empirical bayes methods for assessingdifferential expression in microarray experiments Stat Appl Genet Mol Biol2004 33

doi1011861471-2164-14-710Cite this article as Golby et al Genome-level analyses ofMycobacterium bovis lineages reveal the role of SNPs and antisensetranscription in differential gene expression BMC Genomics 2013 14710

Submit your next manuscript to BioMed Centraland take full advantage of

bull Convenient online submission

bull Thorough peer review

bull No space constraints or color figure charges

bull Immediate publication on acceptance

bull Inclusion in PubMed CAS Scopus and Google Scholar

bull Research which is freely available for redistribution

Submit your manuscript at wwwbiomedcentralcomsubmit

Golby et al BMC Genomics 2013 14710 Page 18 of 18httpwwwbiomedcentralcom1471-216414710

in M bovis including many instances of cis-antisenseRNA species Due to their perfect complementarity cisasRNA form a duplex with the sense strand encodedtranscript resulting in either degradation [27] or transla-tion inhibition [28] of the sense mRNA AntisenseRNAs vary in length ranging from 10s to 1000primes ofnucleotides and can be classified according to theirencoded position with respect to the opposite senseencoded gene Thus they can be classified as 5prime or 3primeoverlapping while others are classified as internallylocated The 112101 specific as_Mb1618c is an exampleof a 5prime overlapping asRNA which is encoded antisense togene Mb1618c which is predicted to express a secretorylipase The location of the asRNA transcript suggests itmay function to prevent translation of Mb1618c mRNAby steric hindrance of the ribosome binding siteIn the work presented here strain 245101 expressed

two asRNAs as_Mb1914c and as_echA21 that are notexpressed by any of the other 3 strains They areencoded within the central part of the opposite genesand are therefore likely to modulate the stability of thetranscripts Mb1914c encodes a short chain dehydro-genase while echA21 encodes an enoyl-CoA hydrataseShort chain dehydrogenases catalyse a wide range offunctions so the precise function and identity of thesubstrate is difficult to deduce from sequence aloneEnoyl-CoA hydratases hydrate double carbon-carbonbonds of macromolecules and are vital in the metabol-ism of fatty acids Both gene products would thereforeappear to be involved in the metabolism of a macromol-ecule and their similar expression profiles in this straincould indicate involvement in the metabolism of the samemolecule or molecules that are of the same pathwayIn many instances upregulation of asRNA negatively

correlates with the transcription of the antisense gene[27] but in many cases expression of the antisense tran-script has no effect on the transcription of the oppositegene In our studies expressions of as_Mb1618cas_Mb1914c and as_echA21 did not appear to have anyeffect on the expressions of the opposite sense encodedgenes (data not shown) We have shown that theexpressions of the asRNAs are associated with the pres-ence of SNPs which are either synonymous or non-synonymous with respect to the sense transcript butupstream of the asRNA transcriptional start site Thishighlights the fact that mutations can potentially affectexpression of transcripts on both strands and that theclassification of a SNP is strand dependent For each ofthe three asRNAs the associated SNP was found to belocated within a putative minus10 promoter motif of groupA mycobacterial promoters The sixth residue of the minus10hexamer motif consensus sequence is a strongly conservedlsquoTrsquo residue which is present in 81 of all group A myco-bacterial promoter elements Its importance is underlined

by the finding that the strains that exhibit a lsquoCrsquo residue atthis position show no detectable expression of the asRNAwhile strains having a lsquoTrsquo residue at this position exhibitexpressionSingle nucleotide polymorphisms were found to be

the most frequent form of genetic variation that existsbetween the isolates with a total of 1013 SNPs detectedacross the three strains 112101 245101 and 130701compared to the reference strain M bovis 212297Non-synonymous SNPs which include both non-senseand missense SNPs are a class of SNPs most likely toimpact on protein function and contribute to phenotypicvariation Non-sense SNPs which results in the expressionof a truncated polypeptide due to the introduction of apremature stop codon were identified in five genes acrossthe strains Of these we focussed our attentions on thenon-sense SNP present in a gene encoding the LysR regu-lator Mb2007c as mutations affecting regulators are likelyto impact on the expression of one or more genes that arepart of the regulon of the regulator and are therefore morelikely to result in phenotypic variation Experiments tocompare the transcriptomes of a strain that exhibited themutation with a strain overexpressing a functional regula-tor did not however reveal any differences The reasonfor this unclear but could reflect a requirement of theregulator for a co-inducer that was absent under the con-ditions of the experiment The consequences of missenseSNPs are more difficult to predict as substitutions of oneamino acid for another in a protein sequence do not ne-cessarily lead to a change in protein function Howeverfor genes that are controlled by an autoregulatory mech-anism a mutation that affects the ability of the product ofthe gene to regulate itself will result in a change in expres-sion of the gene In our studies we have shown that thepresence of a missense mutation in a VapB type toxin en-coding gene Mb1749c in strain 130701 results in theupregulation in expression of the toxin-antitoxin encodingpair of genes Mb1750c-Mb1749c due to the inability ofthe encoded proteins to self regulate themselves Toxin-antitoxin systems have a variety of proposed cellular func-tions including general regulation of mRNA stability levelsin the cell [29] Further experiments are required to fullyunderstand the consequences of this mutationGenomic deletions have played an important role in

the evolution of strains belonging to the mycobacterialcomplex [30] and in the derivation of the tuberculosisvaccine strain M bovis BCG [31] In addition to the previ-ously described 68 kb gene deletion that is specific tostrains having a spoligotype 17 pattern [6] we have identi-fied a 16 kb multi-gene deletion that is specific to strain130701 and encompasses genes that are part of the DosRregulon However one of the deleted genes exists as apseudo gene in DosR in strain 212297 so its importanceto the biology of M bovis is unlikely to be significant

Golby et al BMC Genomics 2013 14710 Page 14 of 18httpwwwbiomedcentralcom1471-216414710

Several other genes with internal deletions were detectedbut none of the encoded proteins have any significantsimilarity to any protein with a defined function

ConclusionsIn conclusion we have performed a comprehensive ana-lysis of 4 M bovis strains of the most common moleculartypes circulating in GB We show that while these strainsshow extensive similarities in their genetic make-up andgene expression profiles they show distinct differences inthe expression of a subset of genes We provide functionaldata to show that SNPs can lead to the expression of anti-sense RNA a finding with implications for how we definea lsquosilentrsquo nucleotide change Furthermore we show thatthe interpretation of transcriptome data based solely onamplicon arrays could lead to artefacts due to expressionof antisense transcripts a caveat that needs to be kept inmind for previous studies of global expression analysis inbacteria

MethodsBacterial strains media and growth conditionsFor the Mϕ infection experiments bovine alveolar Mϕwere cultivated in tissue culture media R10 whichconsisted of RPMI (Invitrogen) media plus 2 mM glu-tamine 10 calf fetal serum and 1 amphotericinWhere used antibiotics gentamycin and ampicillinwere added at concentrations of 50 and 100 μg mlrespectively M bovis field strains were pre-grown inMiddlebrook 7H9 broth supplemented with 10 albumin-dextrose catalase (ADC Difco) 005 Tween and 10 mMpyruvate Cultures were harvested in mid-logarithmicphase (OD600 of 03-08) washed and then resuspended inRPMI containing 005 Tween 80

Isolation of bovine alveolar macrophages and infectionwith mycobacteriaThe lungs of a 6ndash8 week old male Holstein-Friesian calfwere removed and a whole lung lavage procedure wasperformed to washout the alveolar Mϕ Briefly 4ndash5 x500 ml aliquots of Hanksrsquo Balanced Sterile Salts solution(HBSS) were used to infuse the lungs via the tracheaand the washings were pooled in a sterile beaker The Mϕcells contained in the washes were pelleted by centrifuga-tion at 500 x g for 10 mins at 4degC washed and thenresuspended in R10 growth media supplemented withantibiotics (R10+) to a concentration of 1ndash2 x 107 mlApproximately 05-15 x 109 Mϕ were isolated per calflungVented 225 cm2 tissue culture flasks containing R10+

media were seeded with 3ndash4 x 107 alveolar Mϕ andplaced in a humidified 37degC incubator containing 5CO2 Typically 2ndash4 flasks were used per strain and timepoint After 24 hrs the growth media was decanted to

remove non-adherent cells and then replaced with freshR10+ media After a further 24 hrs the growth mediawas discarded and the Mϕ monolayer was washed withRPMI to remove traces of the antibiotic containinggrowth media The monolayer was then covered withR10 media without antibiotics (R10-) and then infectedwith mid-logarithmic phase grown mycobacteria usingan MOI of approximately 101 (bacilli Mϕ) The AlvMϕwere incubated with mycobacteria for 4 hrs after whichthe cell monolayers were washed with RPMI and theneither processed for RNA extraction (4 hr time point)or incubated in fresh R10+ media for a further 20 hrsbefore being processed for RNA extraction (24 hr timepoint)

Extraction and amplification of mycobacterial RNA frominfected macrophagesMϕ cell monolayers were lysed using a guanidiniumthiocyanate (GTC) containing solution The lysed Mϕrsquoswere vortexed and passed twice through a 21G bluntended needle to sheer host genomic DNA and therebyreduce the viscosity of the solution Mycobacterial cellswere then pelleted by centrifugation at 4600 rpm for 20mins at room temperature and washed with GTC solu-tion to remove host genomic DNA Cells were thenresuspended in Trizol and RNA was extracted using theprotocol outlined in Bacon et al [32] The amount ofpurified DNase-treated RNA recovered was of the order100ndash500 ng per time point RNA was amplified usingthe lsquoMessageAmp II-Bacteria RNA Amplification Kitrsquo(Ambion) according to the manufacturersrsquo instructionsUsing an input of 100 ng of unamplified RNA 20ndash100μg of amplified RNA was recovered

Amplicon microarray analysisFor the in vitro growth experiments three independentexperiments (biological replicates) were carried out andfor each strain in each experiment two microarrays (tech-nical replicates) were performed Thus for each strain 6microarrays were performed Three independent AlvMϕinfection experiments were carried out and for eachexperiment two microarrays were performed for each ofthe control RPMI samples and the 4- and 24 hrs post-infection samples Cy5 and Cy3 fluorescently-labelledprobes were synthesised from RNA and genomicDNA respectively and hybridised to whole genomeM bovis M tuberculosis microarrays The array designis available in BμGSbase (accession number A-BUGS-31httpbugssgulacukA-BUGS-31) and also ArrayExpress(accession number A-BUGS-31) Details of probe synthesishybridization conditions and manufacture of the micro-array can be found in Golby et al 2008 Microarrayswere scanned using an Affymetrix 428 Microarrayscanner and scanned images were quantified using

Golby et al BMC Genomics 2013 14710 Page 15 of 18httpwwwbiomedcentralcom1471-216414710

BlueFuse for Microarrays v32 software (BlueGnome)See Golby et al 2008 for further detailsNormalisation was performed by dividing the log ratio

of the Cy5 to Cy3 signal for every spot by the median ofthe log ratios for all spots except control spots A medianabsolute (MAD) scale transformation was applied to thenormalised data from the pas an additional normalisationstep For every microarray duplicate spots were averagedand then the average expression of every gene across alltechnical replicate microarrays was calculated Averagesof the three biological replicates were used to comparegene expression between strains For each gene a mod-erated t-test was applied and those genes with a P- valueless than 005 were selected From this gene list thosegenes whose average expression differed by more than25-fold between strains were selected Fully annotatedmicroarray data have been deposited in BμGSbase (acces-sion number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (accession number E-BUGS-150)

Oligonucleotide microarray analysisExperiments were performed in a similar way to thatdescribed for the in vitro amplicon array experimentsexcept that the RNA was purified using the mirVanamiRNA Isolation kit (Ambion) which is designed tocapture small (gt20 nt) RNA species RNA and genomicDNA were directly labelled with Cy5- and Cy3 respect-ively using the ULS microRNA labelling kit (Kreatech)according to the manufacturerrsquos instructions PurifiedCy5- and Cy3-labelled probes were co-purified andapplied to an Agilent 40K custom made tiled (10 ntoverlap) 60-mer oligonucleotide microarray designed tothe genomic sequence of M bovis 21229797 The arraydesign is available in BμGSbase (accession numberA-BUGS-52 httpbugssgulacukA-BUGS-52) and alsoArrayExpress (accession number A-BUGS-52) Microarrayswere hybridised at 65degC for 18 hrs and then washed inWash Buffer 1 (Agilent) at room temperature for 1 minuteThe slides were then washed in Wash Buffer 2 (Agilent) at37degC for 1 minute dried and then scanned at 2 μm usingan Agilent DNA microarray scannerTiling array data was analysed using the Limma package

of RBioconductor [33] The signal median was quan-tile normalised between arrays followed by a LOESSnormalisation within arrays Differential expressionanalysis was performed by pairwise comparison usinglinear models and empirical Bayes methods and Pvalues adjusted using the Benjamini and Hochbergsmethod to control for multiple testing Fully annotatedmicroarray data have been deposited in BμGSbase(accession number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (accession num-ber E-BUGS-150)

Whole genome sequencing of M bovis field isolatesWhole genome paired end (2 x 76bp) sequencing wasperformed using Illumina HiSeq machines at theWellcome Trust Sanger Institute (Hinxton Cambridge)Raw sequence data was uploaded to the European Nucleo-tide Archive (ENA) and can be downloaded at httpwwwebiacukena accession numbers ERX006616 ERX06617ERX012284 and ERX012286 The FastQC (httpwwwbioinformaticsbabrahamacukprojectsfastqc) programwas used to analyse the quality of the raw data reads Rawsequence data was trimmed to remove adapter sequencesand nucleotides where the sequence quality score wasbelow 20 Filtered reads were aligned to the M bovis refer-ence strain 212297 [10] with the SMALT alignmentprogram (httpwwwsangeracukresourcessoftwaresmalt)using default settings The average mean coverages for thesequenced strains 112101 245101 and 130701 were293 224 and 234 respectively Reads that did not maponto the reference genome were de novo assembled andblasted against the NCBI data base in order to find regionspresent in these strains but absent in 212201 Variantcalling and the generation of consensus sequenceswere carried out using the SAMtools program suite(httpsamtoolssourceforgenet) SNPs that had a mini-mum of quality score of 200 and had a minimum of fourgood quality forward (reverse) reads covering the SNP siteand having the variant base were retained The number offorward (reverse) reads mapping with good quality ontothe SNP site having the same base as in the reference hadto be less than 5 of the total number of forward (reverse)reads mapping with good quality onto the SNP site havingthe variant base Sequences of 12 bases in length centredaround SNP sites were blasted to the genome of the refer-ence strain 212297 and any SNP within an area with a90 hit to another area of the genome were filtered outThis eliminated most of the SNPs within the repetitivePE-PGRS regions For all sequenced strains more than999 of the genomes were covered by reads

Real time RT-PCRQuantitative real-time SYBR Green based PCR (qRT-PCR)experiments were performed using a RotorGene 3000(Corbett research) as described by Golby et al [23] Foldchanges were calculated using relative standard curvemethod and pcr controls included no template and noreverse transcriptase Primer pair sequences are given inAdditional file 5 available with the online version ofthis paper

Construction of the Rv1749c-Rv1750c and nirBDoverexpressing plasmidsThe Rv1749c-Rv1750c overexpressing plasmids pPG106and pPG107 were constructed by PCR amplification of a907bp fragment encompassing Rv1749c-Rv1750c-lsquoRv1751c

Golby et al BMC Genomics 2013 14710 Page 16 of 18httpwwwbiomedcentralcom1471-216414710

using primers tox_f and tox_r For pPG106 the fragmentwas amplified using 21229797 genomic DNA as a tem-plate while pPG107 was amplified using 130701 gDNABoth fragments were digested with SpeI and cloned intothe SpeI cut mycobacterial attP-integrating shuttle vectorpKINT (a gift from Douglas Young Imperial CollegeLondon) Plasmids pPG108 and pPG109 which contain a35 kb hsprsquo-nirB-nirD-lsquocobU fragment was constructed inseveral steps Firstly two 16 kb hsprsquo-nirBrsquo PCR fragmentswere PCR amplified separately using primers nirB1_f andnirB1_r and 212297 and 130701 as genomic templatesSimilarly two 18kb lsquonirB-nirD-lsquocobU fragments wereamplified using primers nirB2_f and nirB2_r and genomicDNAs 212297 and 130701 The two PCR products weredigested with SpeI and BamHI and then co-ligated intopKINT Details concerning the nucleotide sequences ofthe pcr primer pairs are given in the Additional file 5

5prime-RLM-RACE PCRTranscriptional start site mapping of was determinedusing the First Choice RNA ligase-mediated rapid amp-lification of cDNA ends (RLM-RACE) kit (Ambion) asper manufacturers instructions Briefly 10 ug of totalRNA was treated with calf intestinal phosphatase (CIP)and tobacco acid pyrophosphatase (TAP) before ligationof an RNA Adapter oligonucleotide to the 5prime ends ofthe mRNA transcripts A random-primed reverse tran-scription reaction was carried out to generate cDNAand then nested PCR reactions were performed on thecDNA using combinations of adapter and gene specificprimers Details concerning the nucleotide sequencesof the 5prime outer and inner adapter sequences as well as3prime outer and inner gene specific primers are given inthe Additional file 5 PCR products generated usingthe 5prime inner adapter and 3prime inner gene specific primerswere sequenced by Sanger sequencing using the 3prime innergene specific primer

Animal ethics statementAnimal work was carried out according to the UK Animal(Scientific Procedures) Act 1986 The study protocol wasapproved by the AHVLA Animal Use Ethics Committee(UK Home Office PCD number 706905)

Availability of supporting data sectionFully annotated microarray data have been deposited inBμGSbase (accession number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (ac-cession number E-BUGS-150) Raw sequence data wasuploaded to the European Nucleotide Archive (ENA) andcan be downloaded at httpwwwebiacukenadataviewERX006616-ERX06617ERX012284-ERX012286

Additional files

Additional file 1 SNPs identified across sequenced strains

Additional file 2 Large deletions identified by NGS in sequencedstrains

Additional file 3 Fold change differences in gene expression inM bovis field isolates 1121 2451 and 1307 compared to 2122Cells shaded in red indicate upregulation green indicate down-regulation and empty cells indicate no change in expression

Additional file 4 Differential expression of RNA transcripts asdetected by a tiled oligonucleotide microarray Up and down arrowsindicate fold up- and downregulation respectively and empty cellsindicate no change in expression Strand A indicates antisense S senseand I intergenic

Additional file 5 Details of oligonucleotides used in PCR RT-PCRand RLM-RACE experiments

Competing interestsThe authors declare that they have no competing interests

Authorsrsquo contributionsPG carried out the culturing of organisms macrophage infectionsmicroarrays qRT-PCR cloning analysis of sequence data and writing of themanuscript JN performed analysis of the Illumina sequence and PCRamplicon microarray data AW and JH performed the analysis of the Agilentoligonucleotide microarray data and assisted in the uploading of themicroarray data into the BμGSbase database MAQ and SB wereresponsible for the Illumina sequencing SH contributed to the analysis ofthe Illumina sequence data NS contributed to the writing and critical reviewof the manuscript RGH was a co-investigator who helped to guide theproject SG was the principle investigator on the project who guided theresearch and helped in the writing of the manuscript All authors read andapproved the final manuscript

AcknowledgementsWe would like to acknowledge Kate Gould for her help in carrying out theoligonucleotide microarrays and Denise Waldron for her assistance inuploading of the microarray data into BμGSbase We also thank MarkChambers for comments on the manuscript and BBSRC and Defra forfunding the work (Grant no BBE0184911)

Author details1Animal Health and Veterinary Laboratories Agency Woodham Lane NewHaw Addlestone Surrey KT15 3NB UK 2Division of Clinical Sciences BacterialMicroarray Group Centre for Infection amp Immunity St Georgersquos University ofLondon Cranmer Terrace London SW17 0RE UK 3The Wellcome TrustSanger Institute Hinxton Cambridgeshire UK 4UCD School of VeterinaryMedicine and UCD Conway Institute University College Dublin Dublin 4Ireland

Received 4 June 2013 Accepted 26 September 2013Published 17 October 2013

References1 Monaghan ML Doherty ML Collins JD Kazda JF Quinn PJ The tuberculin

test Vet Microbiol 1994 40(1ndash2)111ndash1242 Kamerbeek J Schouls L Kolk A van Agterveld M van Soolingen D Kuijper

S Bunschoten A Molhuizen H Shaw R Goyal M et al Simultaneousdetection and strain differentiation of Mycobacterium tuberculosis fordiagnosis and epidemiology J Clin Microbiol 1997 35(4)907ndash914

3 Frothingham R Meeker-OrsquoConnell WA Genetic diversity in theMycobacterium tuberculosis complex based on variable numbers oftandem DNA repeats Microbiology 1998 144(Pt 5)1189ndash1196

4 Smith NH Dale J Inwald J Palmer S Gordon SV Hewinson RG Smith JMThe population structure of Mycobacterium bovis in Great Britain clonalexpansion Proc Natl Acad Sci USA 2003 100(25)15271ndash15275

5 Winder CL Gordon SV Dale J Hewinson RG Goodacre R Metabolicfingerprints of Mycobacterium bovis cluster with molecular type

Golby et al BMC Genomics 2013 14710 Page 17 of 18httpwwwbiomedcentralcom1471-216414710

implications for genotype-phenotype links Microbiology 2006152(Pt 9)2757ndash2765

6 Wheeler PR Brosch R Coldham NG Inwald JK Hewinson RG Gordon SVFunctional analysis of a clonal deletion in an epidemic strain ofMycobacterium bovis reveals a role in lipid metabolism Microbiology2008 154(Pt 12)3731ndash3742

7 Arnvig KB Young DB Identification of small RNAs in Mycobacteriumtuberculosis Mol Microbiol 2009 73(3)397ndash408

8 Arnvig KB Comas I Thomson NR Houghton J Boshoff HI Croucher NJRose G Perkins TT Parkhill J Dougan G et al Sequence-based analysisuncovers an abundance of non-coding RNA in the total transcriptome ofMycobacterium tuberculosis PLoS Pathog 2011 7(11)e1002342

9 Smith NH Upton P Naming spoligotype patterns for the RD9-deletedlineage of the Mycobacterium tuberculosis complex www MbovisorgInfect Genet Evol 2012 12(4)873ndash876

10 Garnier T Eiglmeier K Camus JC Medina N Mansoor H Pryor M Duthoy SGrondin S Lacroix C Monsempe C et al The complete genome sequenceof Mycobacterium bovis Proc Natl Acad Sci USA 2003 100(13)7877ndash7882

11 Albers CA Lunter G MacArthur DG McVean G Ouwehand WH Durbin RDindel accurate indel calls from short-read data Genome Res 201121(6)961ndash973

12 Mostowy S Inwald J Gordon S Martin C Warren R Kremer K Cousins DBehr MA Revisiting the evolution of Mycobacterium bovis J Bacteriol2005 187(18)6386ndash6395

13 Boon C Dick T Mycobacterium bovis BCG response regulator essentialfor hypoxic dormancy J Bacteriol 2002 184(24)6760ndash6767

14 Van Gelder RN von Zastrow ME Yool A Dement WC Barchas JD EberwineJH Amplified RNA synthesized from limited quantities of heterogeneouscDNA Proc Natl Acad Sci USA 1990 87(5)1663ndash1667

15 Pandey DP Gerdes K Toxin-antitoxin loci are highly abundant in free-living but lost from host-associated prokaryotes Nucleic Acids Res 200533(3)966ndash976

16 Arcus VL McKenzie JL Robson J Cook GM The PIN-domain ribonucleasesand the prokaryotic VapBC toxin-antitoxin array Protein Eng Des Sel 201124(1ndash2)33ndash40

17 Bukowski M Rojowska A Wladyka B Prokaryotic toxin-antitoxin systemsndashthe role in bacterial physiology and application in molecular biologyActa Biochim Pol 2011 58(1)1ndash9

18 Maddocks SE Oyston PC Structure and function of the LysR-typetranscriptional regulator (LTTR) family proteins Microbiology 2008154(Pt 12)3609ndash3623

19 Moreno-Vivian C Cabello P Martinez-Luque M Blasco R Castillo FProkaryotic nitrate reduction molecular properties and functionaldistinction among bacterial nitrate reductases J Bacteriol 1999 181(21)6573ndash6584

20 Tiffert Y Supra P Wurm R Wohlleben W Wagner R Reuther J TheStreptomyces coelicolor GlnR regulon identification of new GlnR targetsand evidence for a central role of GlnR in nitrogen metabolism inactinomycetes Mol Microbiol 2008 67(4)861ndash880

21 Duan J Wainwright MS Comeron JM Saitou N Sanders AR Gelernter JGejman PV Synonymous mutations in the human dopamine receptor D2(DRD2) affect mRNA stability and synthesis of the receptor Hum MolGenet 2003 12(3)205ndash216

22 Capon F Allen MH Ameen M Burden AD Tillman D Barker JN TrembathRC A synonymous SNP of the corneodesmosin gene leads to increasedmRNA stability and demonstrates association with psoriasis acrossdiverse ethnic groups Hum Mol Genet 2004 13(20)2361ndash2368

23 Golby P Hatch KA Bacon J Cooney R Riley P Allnutt J Hinds J Nunez JMarsh PD Hewinson RG et al Comparative transcriptomics reveals keygene expression differences between the human and bovine pathogensof the Mycobacterium tuberculosis complex Microbiology 2007153(Pt 10)3323ndash3336

24 Bitter W Houben EN Bottai D Brodin P Brown EJ Cox JS Derbyshire KFortune SM Gao LY Liu J et al Systematic genetic nomenclature for typeVII secretion systems PLoS Pathog 2009 5(10)e1000507

25 Sorek R Kunin V Hugenholtz P CRISPRndasha widespread system thatprovides acquired resistance against phages in bacteria and archaeaNat Rev Microbiol 2008 6(3)181ndash186

26 Gomez M Smith I Determinants of Mycobacterial Gene Expression InMolecular Genetics of Mycobacteria Washington DC ASM Press2000111ndash129

27 Duhring U Axmann IM Hess WR Wilde A An internal antisense RNAregulates expression of the photosynthesis gene isiA Proc Natl Acad SciUSA 2006 103(18)7054ndash7058

28 Kawano M Aravind L Storz G An antisense RNA controls synthesis of anSOS-induced toxin evolved from an antitoxin Mol Microbiol 200764(3)738ndash754

29 Thomason MK Storz G Bacterial antisense RNAs how many are thereand what are they doing Annu Rev Genet 2010 44167ndash188

30 Brosch R Pym AS Gordon SV Cole ST The evolution of mycobacterialpathogenicity clues from comparative genomics Trends Microbiol 20019(9)452ndash458

31 Calmette A Guerin C Sur quelques proprietes du bacille tuberculeuxdrsquoorigine cultive sur la bile de boeuf glycerinee C R Acad Sci Paris 1909149716ndash718

32 Bacon J James BW Wernisch L Williams A Morley KA Hatch GJ ManganJA Hinds J Stoker NG Butcher PD et al The influence of reduced oxygenavailability on pathogenicity and gene expression in Mycobacteriumtuberculosis Tuberculosis (Edinb) 2004 84(3ndash4)205ndash217

33 Smyth GK Linear models and empirical bayes methods for assessingdifferential expression in microarray experiments Stat Appl Genet Mol Biol2004 33

doi1011861471-2164-14-710Cite this article as Golby et al Genome-level analyses ofMycobacterium bovis lineages reveal the role of SNPs and antisensetranscription in differential gene expression BMC Genomics 2013 14710

Submit your next manuscript to BioMed Centraland take full advantage of

bull Convenient online submission

bull Thorough peer review

bull No space constraints or color figure charges

bull Immediate publication on acceptance

bull Inclusion in PubMed CAS Scopus and Google Scholar

bull Research which is freely available for redistribution

Submit your manuscript at wwwbiomedcentralcomsubmit

Golby et al BMC Genomics 2013 14710 Page 18 of 18httpwwwbiomedcentralcom1471-216414710

Several other genes with internal deletions were detectedbut none of the encoded proteins have any significantsimilarity to any protein with a defined function

ConclusionsIn conclusion we have performed a comprehensive ana-lysis of 4 M bovis strains of the most common moleculartypes circulating in GB We show that while these strainsshow extensive similarities in their genetic make-up andgene expression profiles they show distinct differences inthe expression of a subset of genes We provide functionaldata to show that SNPs can lead to the expression of anti-sense RNA a finding with implications for how we definea lsquosilentrsquo nucleotide change Furthermore we show thatthe interpretation of transcriptome data based solely onamplicon arrays could lead to artefacts due to expressionof antisense transcripts a caveat that needs to be kept inmind for previous studies of global expression analysis inbacteria

MethodsBacterial strains media and growth conditionsFor the Mϕ infection experiments bovine alveolar Mϕwere cultivated in tissue culture media R10 whichconsisted of RPMI (Invitrogen) media plus 2 mM glu-tamine 10 calf fetal serum and 1 amphotericinWhere used antibiotics gentamycin and ampicillinwere added at concentrations of 50 and 100 μg mlrespectively M bovis field strains were pre-grown inMiddlebrook 7H9 broth supplemented with 10 albumin-dextrose catalase (ADC Difco) 005 Tween and 10 mMpyruvate Cultures were harvested in mid-logarithmicphase (OD600 of 03-08) washed and then resuspended inRPMI containing 005 Tween 80

Isolation of bovine alveolar macrophages and infectionwith mycobacteriaThe lungs of a 6ndash8 week old male Holstein-Friesian calfwere removed and a whole lung lavage procedure wasperformed to washout the alveolar Mϕ Briefly 4ndash5 x500 ml aliquots of Hanksrsquo Balanced Sterile Salts solution(HBSS) were used to infuse the lungs via the tracheaand the washings were pooled in a sterile beaker The Mϕcells contained in the washes were pelleted by centrifuga-tion at 500 x g for 10 mins at 4degC washed and thenresuspended in R10 growth media supplemented withantibiotics (R10+) to a concentration of 1ndash2 x 107 mlApproximately 05-15 x 109 Mϕ were isolated per calflungVented 225 cm2 tissue culture flasks containing R10+

media were seeded with 3ndash4 x 107 alveolar Mϕ andplaced in a humidified 37degC incubator containing 5CO2 Typically 2ndash4 flasks were used per strain and timepoint After 24 hrs the growth media was decanted to

remove non-adherent cells and then replaced with freshR10+ media After a further 24 hrs the growth mediawas discarded and the Mϕ monolayer was washed withRPMI to remove traces of the antibiotic containinggrowth media The monolayer was then covered withR10 media without antibiotics (R10-) and then infectedwith mid-logarithmic phase grown mycobacteria usingan MOI of approximately 101 (bacilli Mϕ) The AlvMϕwere incubated with mycobacteria for 4 hrs after whichthe cell monolayers were washed with RPMI and theneither processed for RNA extraction (4 hr time point)or incubated in fresh R10+ media for a further 20 hrsbefore being processed for RNA extraction (24 hr timepoint)

Extraction and amplification of mycobacterial RNA frominfected macrophagesMϕ cell monolayers were lysed using a guanidiniumthiocyanate (GTC) containing solution The lysed Mϕrsquoswere vortexed and passed twice through a 21G bluntended needle to sheer host genomic DNA and therebyreduce the viscosity of the solution Mycobacterial cellswere then pelleted by centrifugation at 4600 rpm for 20mins at room temperature and washed with GTC solu-tion to remove host genomic DNA Cells were thenresuspended in Trizol and RNA was extracted using theprotocol outlined in Bacon et al [32] The amount ofpurified DNase-treated RNA recovered was of the order100ndash500 ng per time point RNA was amplified usingthe lsquoMessageAmp II-Bacteria RNA Amplification Kitrsquo(Ambion) according to the manufacturersrsquo instructionsUsing an input of 100 ng of unamplified RNA 20ndash100μg of amplified RNA was recovered

Amplicon microarray analysisFor the in vitro growth experiments three independentexperiments (biological replicates) were carried out andfor each strain in each experiment two microarrays (tech-nical replicates) were performed Thus for each strain 6microarrays were performed Three independent AlvMϕinfection experiments were carried out and for eachexperiment two microarrays were performed for each ofthe control RPMI samples and the 4- and 24 hrs post-infection samples Cy5 and Cy3 fluorescently-labelledprobes were synthesised from RNA and genomicDNA respectively and hybridised to whole genomeM bovis M tuberculosis microarrays The array designis available in BμGSbase (accession number A-BUGS-31httpbugssgulacukA-BUGS-31) and also ArrayExpress(accession number A-BUGS-31) Details of probe synthesishybridization conditions and manufacture of the micro-array can be found in Golby et al 2008 Microarrayswere scanned using an Affymetrix 428 Microarrayscanner and scanned images were quantified using

Golby et al BMC Genomics 2013 14710 Page 15 of 18httpwwwbiomedcentralcom1471-216414710

BlueFuse for Microarrays v32 software (BlueGnome)See Golby et al 2008 for further detailsNormalisation was performed by dividing the log ratio

of the Cy5 to Cy3 signal for every spot by the median ofthe log ratios for all spots except control spots A medianabsolute (MAD) scale transformation was applied to thenormalised data from the pas an additional normalisationstep For every microarray duplicate spots were averagedand then the average expression of every gene across alltechnical replicate microarrays was calculated Averagesof the three biological replicates were used to comparegene expression between strains For each gene a mod-erated t-test was applied and those genes with a P- valueless than 005 were selected From this gene list thosegenes whose average expression differed by more than25-fold between strains were selected Fully annotatedmicroarray data have been deposited in BμGSbase (acces-sion number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (accession number E-BUGS-150)

Oligonucleotide microarray analysisExperiments were performed in a similar way to thatdescribed for the in vitro amplicon array experimentsexcept that the RNA was purified using the mirVanamiRNA Isolation kit (Ambion) which is designed tocapture small (gt20 nt) RNA species RNA and genomicDNA were directly labelled with Cy5- and Cy3 respect-ively using the ULS microRNA labelling kit (Kreatech)according to the manufacturerrsquos instructions PurifiedCy5- and Cy3-labelled probes were co-purified andapplied to an Agilent 40K custom made tiled (10 ntoverlap) 60-mer oligonucleotide microarray designed tothe genomic sequence of M bovis 21229797 The arraydesign is available in BμGSbase (accession numberA-BUGS-52 httpbugssgulacukA-BUGS-52) and alsoArrayExpress (accession number A-BUGS-52) Microarrayswere hybridised at 65degC for 18 hrs and then washed inWash Buffer 1 (Agilent) at room temperature for 1 minuteThe slides were then washed in Wash Buffer 2 (Agilent) at37degC for 1 minute dried and then scanned at 2 μm usingan Agilent DNA microarray scannerTiling array data was analysed using the Limma package

of RBioconductor [33] The signal median was quan-tile normalised between arrays followed by a LOESSnormalisation within arrays Differential expressionanalysis was performed by pairwise comparison usinglinear models and empirical Bayes methods and Pvalues adjusted using the Benjamini and Hochbergsmethod to control for multiple testing Fully annotatedmicroarray data have been deposited in BμGSbase(accession number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (accession num-ber E-BUGS-150)

Whole genome sequencing of M bovis field isolatesWhole genome paired end (2 x 76bp) sequencing wasperformed using Illumina HiSeq machines at theWellcome Trust Sanger Institute (Hinxton Cambridge)Raw sequence data was uploaded to the European Nucleo-tide Archive (ENA) and can be downloaded at httpwwwebiacukena accession numbers ERX006616 ERX06617ERX012284 and ERX012286 The FastQC (httpwwwbioinformaticsbabrahamacukprojectsfastqc) programwas used to analyse the quality of the raw data reads Rawsequence data was trimmed to remove adapter sequencesand nucleotides where the sequence quality score wasbelow 20 Filtered reads were aligned to the M bovis refer-ence strain 212297 [10] with the SMALT alignmentprogram (httpwwwsangeracukresourcessoftwaresmalt)using default settings The average mean coverages for thesequenced strains 112101 245101 and 130701 were293 224 and 234 respectively Reads that did not maponto the reference genome were de novo assembled andblasted against the NCBI data base in order to find regionspresent in these strains but absent in 212201 Variantcalling and the generation of consensus sequenceswere carried out using the SAMtools program suite(httpsamtoolssourceforgenet) SNPs that had a mini-mum of quality score of 200 and had a minimum of fourgood quality forward (reverse) reads covering the SNP siteand having the variant base were retained The number offorward (reverse) reads mapping with good quality ontothe SNP site having the same base as in the reference hadto be less than 5 of the total number of forward (reverse)reads mapping with good quality onto the SNP site havingthe variant base Sequences of 12 bases in length centredaround SNP sites were blasted to the genome of the refer-ence strain 212297 and any SNP within an area with a90 hit to another area of the genome were filtered outThis eliminated most of the SNPs within the repetitivePE-PGRS regions For all sequenced strains more than999 of the genomes were covered by reads

Real time RT-PCRQuantitative real-time SYBR Green based PCR (qRT-PCR)experiments were performed using a RotorGene 3000(Corbett research) as described by Golby et al [23] Foldchanges were calculated using relative standard curvemethod and pcr controls included no template and noreverse transcriptase Primer pair sequences are given inAdditional file 5 available with the online version ofthis paper

Construction of the Rv1749c-Rv1750c and nirBDoverexpressing plasmidsThe Rv1749c-Rv1750c overexpressing plasmids pPG106and pPG107 were constructed by PCR amplification of a907bp fragment encompassing Rv1749c-Rv1750c-lsquoRv1751c

Golby et al BMC Genomics 2013 14710 Page 16 of 18httpwwwbiomedcentralcom1471-216414710

using primers tox_f and tox_r For pPG106 the fragmentwas amplified using 21229797 genomic DNA as a tem-plate while pPG107 was amplified using 130701 gDNABoth fragments were digested with SpeI and cloned intothe SpeI cut mycobacterial attP-integrating shuttle vectorpKINT (a gift from Douglas Young Imperial CollegeLondon) Plasmids pPG108 and pPG109 which contain a35 kb hsprsquo-nirB-nirD-lsquocobU fragment was constructed inseveral steps Firstly two 16 kb hsprsquo-nirBrsquo PCR fragmentswere PCR amplified separately using primers nirB1_f andnirB1_r and 212297 and 130701 as genomic templatesSimilarly two 18kb lsquonirB-nirD-lsquocobU fragments wereamplified using primers nirB2_f and nirB2_r and genomicDNAs 212297 and 130701 The two PCR products weredigested with SpeI and BamHI and then co-ligated intopKINT Details concerning the nucleotide sequences ofthe pcr primer pairs are given in the Additional file 5

5prime-RLM-RACE PCRTranscriptional start site mapping of was determinedusing the First Choice RNA ligase-mediated rapid amp-lification of cDNA ends (RLM-RACE) kit (Ambion) asper manufacturers instructions Briefly 10 ug of totalRNA was treated with calf intestinal phosphatase (CIP)and tobacco acid pyrophosphatase (TAP) before ligationof an RNA Adapter oligonucleotide to the 5prime ends ofthe mRNA transcripts A random-primed reverse tran-scription reaction was carried out to generate cDNAand then nested PCR reactions were performed on thecDNA using combinations of adapter and gene specificprimers Details concerning the nucleotide sequencesof the 5prime outer and inner adapter sequences as well as3prime outer and inner gene specific primers are given inthe Additional file 5 PCR products generated usingthe 5prime inner adapter and 3prime inner gene specific primerswere sequenced by Sanger sequencing using the 3prime innergene specific primer

Animal ethics statementAnimal work was carried out according to the UK Animal(Scientific Procedures) Act 1986 The study protocol wasapproved by the AHVLA Animal Use Ethics Committee(UK Home Office PCD number 706905)

Availability of supporting data sectionFully annotated microarray data have been deposited inBμGSbase (accession number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (ac-cession number E-BUGS-150) Raw sequence data wasuploaded to the European Nucleotide Archive (ENA) andcan be downloaded at httpwwwebiacukenadataviewERX006616-ERX06617ERX012284-ERX012286

Additional files

Additional file 1 SNPs identified across sequenced strains

Additional file 2 Large deletions identified by NGS in sequencedstrains

Additional file 3 Fold change differences in gene expression inM bovis field isolates 1121 2451 and 1307 compared to 2122Cells shaded in red indicate upregulation green indicate down-regulation and empty cells indicate no change in expression

Additional file 4 Differential expression of RNA transcripts asdetected by a tiled oligonucleotide microarray Up and down arrowsindicate fold up- and downregulation respectively and empty cellsindicate no change in expression Strand A indicates antisense S senseand I intergenic

Additional file 5 Details of oligonucleotides used in PCR RT-PCRand RLM-RACE experiments

Competing interestsThe authors declare that they have no competing interests

Authorsrsquo contributionsPG carried out the culturing of organisms macrophage infectionsmicroarrays qRT-PCR cloning analysis of sequence data and writing of themanuscript JN performed analysis of the Illumina sequence and PCRamplicon microarray data AW and JH performed the analysis of the Agilentoligonucleotide microarray data and assisted in the uploading of themicroarray data into the BμGSbase database MAQ and SB wereresponsible for the Illumina sequencing SH contributed to the analysis ofthe Illumina sequence data NS contributed to the writing and critical reviewof the manuscript RGH was a co-investigator who helped to guide theproject SG was the principle investigator on the project who guided theresearch and helped in the writing of the manuscript All authors read andapproved the final manuscript

AcknowledgementsWe would like to acknowledge Kate Gould for her help in carrying out theoligonucleotide microarrays and Denise Waldron for her assistance inuploading of the microarray data into BμGSbase We also thank MarkChambers for comments on the manuscript and BBSRC and Defra forfunding the work (Grant no BBE0184911)

Author details1Animal Health and Veterinary Laboratories Agency Woodham Lane NewHaw Addlestone Surrey KT15 3NB UK 2Division of Clinical Sciences BacterialMicroarray Group Centre for Infection amp Immunity St Georgersquos University ofLondon Cranmer Terrace London SW17 0RE UK 3The Wellcome TrustSanger Institute Hinxton Cambridgeshire UK 4UCD School of VeterinaryMedicine and UCD Conway Institute University College Dublin Dublin 4Ireland

Received 4 June 2013 Accepted 26 September 2013Published 17 October 2013

References1 Monaghan ML Doherty ML Collins JD Kazda JF Quinn PJ The tuberculin

test Vet Microbiol 1994 40(1ndash2)111ndash1242 Kamerbeek J Schouls L Kolk A van Agterveld M van Soolingen D Kuijper

S Bunschoten A Molhuizen H Shaw R Goyal M et al Simultaneousdetection and strain differentiation of Mycobacterium tuberculosis fordiagnosis and epidemiology J Clin Microbiol 1997 35(4)907ndash914

3 Frothingham R Meeker-OrsquoConnell WA Genetic diversity in theMycobacterium tuberculosis complex based on variable numbers oftandem DNA repeats Microbiology 1998 144(Pt 5)1189ndash1196

4 Smith NH Dale J Inwald J Palmer S Gordon SV Hewinson RG Smith JMThe population structure of Mycobacterium bovis in Great Britain clonalexpansion Proc Natl Acad Sci USA 2003 100(25)15271ndash15275

5 Winder CL Gordon SV Dale J Hewinson RG Goodacre R Metabolicfingerprints of Mycobacterium bovis cluster with molecular type

Golby et al BMC Genomics 2013 14710 Page 17 of 18httpwwwbiomedcentralcom1471-216414710

implications for genotype-phenotype links Microbiology 2006152(Pt 9)2757ndash2765

6 Wheeler PR Brosch R Coldham NG Inwald JK Hewinson RG Gordon SVFunctional analysis of a clonal deletion in an epidemic strain ofMycobacterium bovis reveals a role in lipid metabolism Microbiology2008 154(Pt 12)3731ndash3742

7 Arnvig KB Young DB Identification of small RNAs in Mycobacteriumtuberculosis Mol Microbiol 2009 73(3)397ndash408

8 Arnvig KB Comas I Thomson NR Houghton J Boshoff HI Croucher NJRose G Perkins TT Parkhill J Dougan G et al Sequence-based analysisuncovers an abundance of non-coding RNA in the total transcriptome ofMycobacterium tuberculosis PLoS Pathog 2011 7(11)e1002342

9 Smith NH Upton P Naming spoligotype patterns for the RD9-deletedlineage of the Mycobacterium tuberculosis complex www MbovisorgInfect Genet Evol 2012 12(4)873ndash876

10 Garnier T Eiglmeier K Camus JC Medina N Mansoor H Pryor M Duthoy SGrondin S Lacroix C Monsempe C et al The complete genome sequenceof Mycobacterium bovis Proc Natl Acad Sci USA 2003 100(13)7877ndash7882

11 Albers CA Lunter G MacArthur DG McVean G Ouwehand WH Durbin RDindel accurate indel calls from short-read data Genome Res 201121(6)961ndash973

12 Mostowy S Inwald J Gordon S Martin C Warren R Kremer K Cousins DBehr MA Revisiting the evolution of Mycobacterium bovis J Bacteriol2005 187(18)6386ndash6395

13 Boon C Dick T Mycobacterium bovis BCG response regulator essentialfor hypoxic dormancy J Bacteriol 2002 184(24)6760ndash6767

14 Van Gelder RN von Zastrow ME Yool A Dement WC Barchas JD EberwineJH Amplified RNA synthesized from limited quantities of heterogeneouscDNA Proc Natl Acad Sci USA 1990 87(5)1663ndash1667

15 Pandey DP Gerdes K Toxin-antitoxin loci are highly abundant in free-living but lost from host-associated prokaryotes Nucleic Acids Res 200533(3)966ndash976

16 Arcus VL McKenzie JL Robson J Cook GM The PIN-domain ribonucleasesand the prokaryotic VapBC toxin-antitoxin array Protein Eng Des Sel 201124(1ndash2)33ndash40

17 Bukowski M Rojowska A Wladyka B Prokaryotic toxin-antitoxin systemsndashthe role in bacterial physiology and application in molecular biologyActa Biochim Pol 2011 58(1)1ndash9

18 Maddocks SE Oyston PC Structure and function of the LysR-typetranscriptional regulator (LTTR) family proteins Microbiology 2008154(Pt 12)3609ndash3623

19 Moreno-Vivian C Cabello P Martinez-Luque M Blasco R Castillo FProkaryotic nitrate reduction molecular properties and functionaldistinction among bacterial nitrate reductases J Bacteriol 1999 181(21)6573ndash6584

20 Tiffert Y Supra P Wurm R Wohlleben W Wagner R Reuther J TheStreptomyces coelicolor GlnR regulon identification of new GlnR targetsand evidence for a central role of GlnR in nitrogen metabolism inactinomycetes Mol Microbiol 2008 67(4)861ndash880

21 Duan J Wainwright MS Comeron JM Saitou N Sanders AR Gelernter JGejman PV Synonymous mutations in the human dopamine receptor D2(DRD2) affect mRNA stability and synthesis of the receptor Hum MolGenet 2003 12(3)205ndash216

22 Capon F Allen MH Ameen M Burden AD Tillman D Barker JN TrembathRC A synonymous SNP of the corneodesmosin gene leads to increasedmRNA stability and demonstrates association with psoriasis acrossdiverse ethnic groups Hum Mol Genet 2004 13(20)2361ndash2368

23 Golby P Hatch KA Bacon J Cooney R Riley P Allnutt J Hinds J Nunez JMarsh PD Hewinson RG et al Comparative transcriptomics reveals keygene expression differences between the human and bovine pathogensof the Mycobacterium tuberculosis complex Microbiology 2007153(Pt 10)3323ndash3336

24 Bitter W Houben EN Bottai D Brodin P Brown EJ Cox JS Derbyshire KFortune SM Gao LY Liu J et al Systematic genetic nomenclature for typeVII secretion systems PLoS Pathog 2009 5(10)e1000507

25 Sorek R Kunin V Hugenholtz P CRISPRndasha widespread system thatprovides acquired resistance against phages in bacteria and archaeaNat Rev Microbiol 2008 6(3)181ndash186

26 Gomez M Smith I Determinants of Mycobacterial Gene Expression InMolecular Genetics of Mycobacteria Washington DC ASM Press2000111ndash129

27 Duhring U Axmann IM Hess WR Wilde A An internal antisense RNAregulates expression of the photosynthesis gene isiA Proc Natl Acad SciUSA 2006 103(18)7054ndash7058

28 Kawano M Aravind L Storz G An antisense RNA controls synthesis of anSOS-induced toxin evolved from an antitoxin Mol Microbiol 200764(3)738ndash754

29 Thomason MK Storz G Bacterial antisense RNAs how many are thereand what are they doing Annu Rev Genet 2010 44167ndash188

30 Brosch R Pym AS Gordon SV Cole ST The evolution of mycobacterialpathogenicity clues from comparative genomics Trends Microbiol 20019(9)452ndash458

31 Calmette A Guerin C Sur quelques proprietes du bacille tuberculeuxdrsquoorigine cultive sur la bile de boeuf glycerinee C R Acad Sci Paris 1909149716ndash718

32 Bacon J James BW Wernisch L Williams A Morley KA Hatch GJ ManganJA Hinds J Stoker NG Butcher PD et al The influence of reduced oxygenavailability on pathogenicity and gene expression in Mycobacteriumtuberculosis Tuberculosis (Edinb) 2004 84(3ndash4)205ndash217

33 Smyth GK Linear models and empirical bayes methods for assessingdifferential expression in microarray experiments Stat Appl Genet Mol Biol2004 33

doi1011861471-2164-14-710Cite this article as Golby et al Genome-level analyses ofMycobacterium bovis lineages reveal the role of SNPs and antisensetranscription in differential gene expression BMC Genomics 2013 14710

Submit your next manuscript to BioMed Centraland take full advantage of

bull Convenient online submission

bull Thorough peer review

bull No space constraints or color figure charges

bull Immediate publication on acceptance

bull Inclusion in PubMed CAS Scopus and Google Scholar

bull Research which is freely available for redistribution

Submit your manuscript at wwwbiomedcentralcomsubmit

Golby et al BMC Genomics 2013 14710 Page 18 of 18httpwwwbiomedcentralcom1471-216414710

BlueFuse for Microarrays v32 software (BlueGnome)See Golby et al 2008 for further detailsNormalisation was performed by dividing the log ratio

of the Cy5 to Cy3 signal for every spot by the median ofthe log ratios for all spots except control spots A medianabsolute (MAD) scale transformation was applied to thenormalised data from the pas an additional normalisationstep For every microarray duplicate spots were averagedand then the average expression of every gene across alltechnical replicate microarrays was calculated Averagesof the three biological replicates were used to comparegene expression between strains For each gene a mod-erated t-test was applied and those genes with a P- valueless than 005 were selected From this gene list thosegenes whose average expression differed by more than25-fold between strains were selected Fully annotatedmicroarray data have been deposited in BμGSbase (acces-sion number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (accession number E-BUGS-150)

Oligonucleotide microarray analysisExperiments were performed in a similar way to thatdescribed for the in vitro amplicon array experimentsexcept that the RNA was purified using the mirVanamiRNA Isolation kit (Ambion) which is designed tocapture small (gt20 nt) RNA species RNA and genomicDNA were directly labelled with Cy5- and Cy3 respect-ively using the ULS microRNA labelling kit (Kreatech)according to the manufacturerrsquos instructions PurifiedCy5- and Cy3-labelled probes were co-purified andapplied to an Agilent 40K custom made tiled (10 ntoverlap) 60-mer oligonucleotide microarray designed tothe genomic sequence of M bovis 21229797 The arraydesign is available in BμGSbase (accession numberA-BUGS-52 httpbugssgulacukA-BUGS-52) and alsoArrayExpress (accession number A-BUGS-52) Microarrayswere hybridised at 65degC for 18 hrs and then washed inWash Buffer 1 (Agilent) at room temperature for 1 minuteThe slides were then washed in Wash Buffer 2 (Agilent) at37degC for 1 minute dried and then scanned at 2 μm usingan Agilent DNA microarray scannerTiling array data was analysed using the Limma package

of RBioconductor [33] The signal median was quan-tile normalised between arrays followed by a LOESSnormalisation within arrays Differential expressionanalysis was performed by pairwise comparison usinglinear models and empirical Bayes methods and Pvalues adjusted using the Benjamini and Hochbergsmethod to control for multiple testing Fully annotatedmicroarray data have been deposited in BμGSbase(accession number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (accession num-ber E-BUGS-150)

Whole genome sequencing of M bovis field isolatesWhole genome paired end (2 x 76bp) sequencing wasperformed using Illumina HiSeq machines at theWellcome Trust Sanger Institute (Hinxton Cambridge)Raw sequence data was uploaded to the European Nucleo-tide Archive (ENA) and can be downloaded at httpwwwebiacukena accession numbers ERX006616 ERX06617ERX012284 and ERX012286 The FastQC (httpwwwbioinformaticsbabrahamacukprojectsfastqc) programwas used to analyse the quality of the raw data reads Rawsequence data was trimmed to remove adapter sequencesand nucleotides where the sequence quality score wasbelow 20 Filtered reads were aligned to the M bovis refer-ence strain 212297 [10] with the SMALT alignmentprogram (httpwwwsangeracukresourcessoftwaresmalt)using default settings The average mean coverages for thesequenced strains 112101 245101 and 130701 were293 224 and 234 respectively Reads that did not maponto the reference genome were de novo assembled andblasted against the NCBI data base in order to find regionspresent in these strains but absent in 212201 Variantcalling and the generation of consensus sequenceswere carried out using the SAMtools program suite(httpsamtoolssourceforgenet) SNPs that had a mini-mum of quality score of 200 and had a minimum of fourgood quality forward (reverse) reads covering the SNP siteand having the variant base were retained The number offorward (reverse) reads mapping with good quality ontothe SNP site having the same base as in the reference hadto be less than 5 of the total number of forward (reverse)reads mapping with good quality onto the SNP site havingthe variant base Sequences of 12 bases in length centredaround SNP sites were blasted to the genome of the refer-ence strain 212297 and any SNP within an area with a90 hit to another area of the genome were filtered outThis eliminated most of the SNPs within the repetitivePE-PGRS regions For all sequenced strains more than999 of the genomes were covered by reads

Real time RT-PCRQuantitative real-time SYBR Green based PCR (qRT-PCR)experiments were performed using a RotorGene 3000(Corbett research) as described by Golby et al [23] Foldchanges were calculated using relative standard curvemethod and pcr controls included no template and noreverse transcriptase Primer pair sequences are given inAdditional file 5 available with the online version ofthis paper

Construction of the Rv1749c-Rv1750c and nirBDoverexpressing plasmidsThe Rv1749c-Rv1750c overexpressing plasmids pPG106and pPG107 were constructed by PCR amplification of a907bp fragment encompassing Rv1749c-Rv1750c-lsquoRv1751c

Golby et al BMC Genomics 2013 14710 Page 16 of 18httpwwwbiomedcentralcom1471-216414710

using primers tox_f and tox_r For pPG106 the fragmentwas amplified using 21229797 genomic DNA as a tem-plate while pPG107 was amplified using 130701 gDNABoth fragments were digested with SpeI and cloned intothe SpeI cut mycobacterial attP-integrating shuttle vectorpKINT (a gift from Douglas Young Imperial CollegeLondon) Plasmids pPG108 and pPG109 which contain a35 kb hsprsquo-nirB-nirD-lsquocobU fragment was constructed inseveral steps Firstly two 16 kb hsprsquo-nirBrsquo PCR fragmentswere PCR amplified separately using primers nirB1_f andnirB1_r and 212297 and 130701 as genomic templatesSimilarly two 18kb lsquonirB-nirD-lsquocobU fragments wereamplified using primers nirB2_f and nirB2_r and genomicDNAs 212297 and 130701 The two PCR products weredigested with SpeI and BamHI and then co-ligated intopKINT Details concerning the nucleotide sequences ofthe pcr primer pairs are given in the Additional file 5

5prime-RLM-RACE PCRTranscriptional start site mapping of was determinedusing the First Choice RNA ligase-mediated rapid amp-lification of cDNA ends (RLM-RACE) kit (Ambion) asper manufacturers instructions Briefly 10 ug of totalRNA was treated with calf intestinal phosphatase (CIP)and tobacco acid pyrophosphatase (TAP) before ligationof an RNA Adapter oligonucleotide to the 5prime ends ofthe mRNA transcripts A random-primed reverse tran-scription reaction was carried out to generate cDNAand then nested PCR reactions were performed on thecDNA using combinations of adapter and gene specificprimers Details concerning the nucleotide sequencesof the 5prime outer and inner adapter sequences as well as3prime outer and inner gene specific primers are given inthe Additional file 5 PCR products generated usingthe 5prime inner adapter and 3prime inner gene specific primerswere sequenced by Sanger sequencing using the 3prime innergene specific primer

Animal ethics statementAnimal work was carried out according to the UK Animal(Scientific Procedures) Act 1986 The study protocol wasapproved by the AHVLA Animal Use Ethics Committee(UK Home Office PCD number 706905)

Availability of supporting data sectionFully annotated microarray data have been deposited inBμGSbase (accession number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (ac-cession number E-BUGS-150) Raw sequence data wasuploaded to the European Nucleotide Archive (ENA) andcan be downloaded at httpwwwebiacukenadataviewERX006616-ERX06617ERX012284-ERX012286

Additional files

Additional file 1 SNPs identified across sequenced strains

Additional file 2 Large deletions identified by NGS in sequencedstrains

Additional file 3 Fold change differences in gene expression inM bovis field isolates 1121 2451 and 1307 compared to 2122Cells shaded in red indicate upregulation green indicate down-regulation and empty cells indicate no change in expression

Additional file 4 Differential expression of RNA transcripts asdetected by a tiled oligonucleotide microarray Up and down arrowsindicate fold up- and downregulation respectively and empty cellsindicate no change in expression Strand A indicates antisense S senseand I intergenic

Additional file 5 Details of oligonucleotides used in PCR RT-PCRand RLM-RACE experiments

Competing interestsThe authors declare that they have no competing interests

Authorsrsquo contributionsPG carried out the culturing of organisms macrophage infectionsmicroarrays qRT-PCR cloning analysis of sequence data and writing of themanuscript JN performed analysis of the Illumina sequence and PCRamplicon microarray data AW and JH performed the analysis of the Agilentoligonucleotide microarray data and assisted in the uploading of themicroarray data into the BμGSbase database MAQ and SB wereresponsible for the Illumina sequencing SH contributed to the analysis ofthe Illumina sequence data NS contributed to the writing and critical reviewof the manuscript RGH was a co-investigator who helped to guide theproject SG was the principle investigator on the project who guided theresearch and helped in the writing of the manuscript All authors read andapproved the final manuscript

AcknowledgementsWe would like to acknowledge Kate Gould for her help in carrying out theoligonucleotide microarrays and Denise Waldron for her assistance inuploading of the microarray data into BμGSbase We also thank MarkChambers for comments on the manuscript and BBSRC and Defra forfunding the work (Grant no BBE0184911)

Author details1Animal Health and Veterinary Laboratories Agency Woodham Lane NewHaw Addlestone Surrey KT15 3NB UK 2Division of Clinical Sciences BacterialMicroarray Group Centre for Infection amp Immunity St Georgersquos University ofLondon Cranmer Terrace London SW17 0RE UK 3The Wellcome TrustSanger Institute Hinxton Cambridgeshire UK 4UCD School of VeterinaryMedicine and UCD Conway Institute University College Dublin Dublin 4Ireland

Received 4 June 2013 Accepted 26 September 2013Published 17 October 2013

References1 Monaghan ML Doherty ML Collins JD Kazda JF Quinn PJ The tuberculin

test Vet Microbiol 1994 40(1ndash2)111ndash1242 Kamerbeek J Schouls L Kolk A van Agterveld M van Soolingen D Kuijper

S Bunschoten A Molhuizen H Shaw R Goyal M et al Simultaneousdetection and strain differentiation of Mycobacterium tuberculosis fordiagnosis and epidemiology J Clin Microbiol 1997 35(4)907ndash914

3 Frothingham R Meeker-OrsquoConnell WA Genetic diversity in theMycobacterium tuberculosis complex based on variable numbers oftandem DNA repeats Microbiology 1998 144(Pt 5)1189ndash1196

4 Smith NH Dale J Inwald J Palmer S Gordon SV Hewinson RG Smith JMThe population structure of Mycobacterium bovis in Great Britain clonalexpansion Proc Natl Acad Sci USA 2003 100(25)15271ndash15275

5 Winder CL Gordon SV Dale J Hewinson RG Goodacre R Metabolicfingerprints of Mycobacterium bovis cluster with molecular type

Golby et al BMC Genomics 2013 14710 Page 17 of 18httpwwwbiomedcentralcom1471-216414710

implications for genotype-phenotype links Microbiology 2006152(Pt 9)2757ndash2765

6 Wheeler PR Brosch R Coldham NG Inwald JK Hewinson RG Gordon SVFunctional analysis of a clonal deletion in an epidemic strain ofMycobacterium bovis reveals a role in lipid metabolism Microbiology2008 154(Pt 12)3731ndash3742

7 Arnvig KB Young DB Identification of small RNAs in Mycobacteriumtuberculosis Mol Microbiol 2009 73(3)397ndash408

8 Arnvig KB Comas I Thomson NR Houghton J Boshoff HI Croucher NJRose G Perkins TT Parkhill J Dougan G et al Sequence-based analysisuncovers an abundance of non-coding RNA in the total transcriptome ofMycobacterium tuberculosis PLoS Pathog 2011 7(11)e1002342

9 Smith NH Upton P Naming spoligotype patterns for the RD9-deletedlineage of the Mycobacterium tuberculosis complex www MbovisorgInfect Genet Evol 2012 12(4)873ndash876

10 Garnier T Eiglmeier K Camus JC Medina N Mansoor H Pryor M Duthoy SGrondin S Lacroix C Monsempe C et al The complete genome sequenceof Mycobacterium bovis Proc Natl Acad Sci USA 2003 100(13)7877ndash7882

11 Albers CA Lunter G MacArthur DG McVean G Ouwehand WH Durbin RDindel accurate indel calls from short-read data Genome Res 201121(6)961ndash973

12 Mostowy S Inwald J Gordon S Martin C Warren R Kremer K Cousins DBehr MA Revisiting the evolution of Mycobacterium bovis J Bacteriol2005 187(18)6386ndash6395

13 Boon C Dick T Mycobacterium bovis BCG response regulator essentialfor hypoxic dormancy J Bacteriol 2002 184(24)6760ndash6767

14 Van Gelder RN von Zastrow ME Yool A Dement WC Barchas JD EberwineJH Amplified RNA synthesized from limited quantities of heterogeneouscDNA Proc Natl Acad Sci USA 1990 87(5)1663ndash1667

15 Pandey DP Gerdes K Toxin-antitoxin loci are highly abundant in free-living but lost from host-associated prokaryotes Nucleic Acids Res 200533(3)966ndash976

16 Arcus VL McKenzie JL Robson J Cook GM The PIN-domain ribonucleasesand the prokaryotic VapBC toxin-antitoxin array Protein Eng Des Sel 201124(1ndash2)33ndash40

17 Bukowski M Rojowska A Wladyka B Prokaryotic toxin-antitoxin systemsndashthe role in bacterial physiology and application in molecular biologyActa Biochim Pol 2011 58(1)1ndash9

18 Maddocks SE Oyston PC Structure and function of the LysR-typetranscriptional regulator (LTTR) family proteins Microbiology 2008154(Pt 12)3609ndash3623

19 Moreno-Vivian C Cabello P Martinez-Luque M Blasco R Castillo FProkaryotic nitrate reduction molecular properties and functionaldistinction among bacterial nitrate reductases J Bacteriol 1999 181(21)6573ndash6584

20 Tiffert Y Supra P Wurm R Wohlleben W Wagner R Reuther J TheStreptomyces coelicolor GlnR regulon identification of new GlnR targetsand evidence for a central role of GlnR in nitrogen metabolism inactinomycetes Mol Microbiol 2008 67(4)861ndash880

21 Duan J Wainwright MS Comeron JM Saitou N Sanders AR Gelernter JGejman PV Synonymous mutations in the human dopamine receptor D2(DRD2) affect mRNA stability and synthesis of the receptor Hum MolGenet 2003 12(3)205ndash216

22 Capon F Allen MH Ameen M Burden AD Tillman D Barker JN TrembathRC A synonymous SNP of the corneodesmosin gene leads to increasedmRNA stability and demonstrates association with psoriasis acrossdiverse ethnic groups Hum Mol Genet 2004 13(20)2361ndash2368

23 Golby P Hatch KA Bacon J Cooney R Riley P Allnutt J Hinds J Nunez JMarsh PD Hewinson RG et al Comparative transcriptomics reveals keygene expression differences between the human and bovine pathogensof the Mycobacterium tuberculosis complex Microbiology 2007153(Pt 10)3323ndash3336

24 Bitter W Houben EN Bottai D Brodin P Brown EJ Cox JS Derbyshire KFortune SM Gao LY Liu J et al Systematic genetic nomenclature for typeVII secretion systems PLoS Pathog 2009 5(10)e1000507

25 Sorek R Kunin V Hugenholtz P CRISPRndasha widespread system thatprovides acquired resistance against phages in bacteria and archaeaNat Rev Microbiol 2008 6(3)181ndash186

26 Gomez M Smith I Determinants of Mycobacterial Gene Expression InMolecular Genetics of Mycobacteria Washington DC ASM Press2000111ndash129

27 Duhring U Axmann IM Hess WR Wilde A An internal antisense RNAregulates expression of the photosynthesis gene isiA Proc Natl Acad SciUSA 2006 103(18)7054ndash7058

28 Kawano M Aravind L Storz G An antisense RNA controls synthesis of anSOS-induced toxin evolved from an antitoxin Mol Microbiol 200764(3)738ndash754

29 Thomason MK Storz G Bacterial antisense RNAs how many are thereand what are they doing Annu Rev Genet 2010 44167ndash188

30 Brosch R Pym AS Gordon SV Cole ST The evolution of mycobacterialpathogenicity clues from comparative genomics Trends Microbiol 20019(9)452ndash458

31 Calmette A Guerin C Sur quelques proprietes du bacille tuberculeuxdrsquoorigine cultive sur la bile de boeuf glycerinee C R Acad Sci Paris 1909149716ndash718

32 Bacon J James BW Wernisch L Williams A Morley KA Hatch GJ ManganJA Hinds J Stoker NG Butcher PD et al The influence of reduced oxygenavailability on pathogenicity and gene expression in Mycobacteriumtuberculosis Tuberculosis (Edinb) 2004 84(3ndash4)205ndash217

33 Smyth GK Linear models and empirical bayes methods for assessingdifferential expression in microarray experiments Stat Appl Genet Mol Biol2004 33

doi1011861471-2164-14-710Cite this article as Golby et al Genome-level analyses ofMycobacterium bovis lineages reveal the role of SNPs and antisensetranscription in differential gene expression BMC Genomics 2013 14710

Submit your next manuscript to BioMed Centraland take full advantage of

bull Convenient online submission

bull Thorough peer review

bull No space constraints or color figure charges

bull Immediate publication on acceptance

bull Inclusion in PubMed CAS Scopus and Google Scholar

bull Research which is freely available for redistribution

Submit your manuscript at wwwbiomedcentralcomsubmit

Golby et al BMC Genomics 2013 14710 Page 18 of 18httpwwwbiomedcentralcom1471-216414710

using primers tox_f and tox_r For pPG106 the fragmentwas amplified using 21229797 genomic DNA as a tem-plate while pPG107 was amplified using 130701 gDNABoth fragments were digested with SpeI and cloned intothe SpeI cut mycobacterial attP-integrating shuttle vectorpKINT (a gift from Douglas Young Imperial CollegeLondon) Plasmids pPG108 and pPG109 which contain a35 kb hsprsquo-nirB-nirD-lsquocobU fragment was constructed inseveral steps Firstly two 16 kb hsprsquo-nirBrsquo PCR fragmentswere PCR amplified separately using primers nirB1_f andnirB1_r and 212297 and 130701 as genomic templatesSimilarly two 18kb lsquonirB-nirD-lsquocobU fragments wereamplified using primers nirB2_f and nirB2_r and genomicDNAs 212297 and 130701 The two PCR products weredigested with SpeI and BamHI and then co-ligated intopKINT Details concerning the nucleotide sequences ofthe pcr primer pairs are given in the Additional file 5

5prime-RLM-RACE PCRTranscriptional start site mapping of was determinedusing the First Choice RNA ligase-mediated rapid amp-lification of cDNA ends (RLM-RACE) kit (Ambion) asper manufacturers instructions Briefly 10 ug of totalRNA was treated with calf intestinal phosphatase (CIP)and tobacco acid pyrophosphatase (TAP) before ligationof an RNA Adapter oligonucleotide to the 5prime ends ofthe mRNA transcripts A random-primed reverse tran-scription reaction was carried out to generate cDNAand then nested PCR reactions were performed on thecDNA using combinations of adapter and gene specificprimers Details concerning the nucleotide sequencesof the 5prime outer and inner adapter sequences as well as3prime outer and inner gene specific primers are given inthe Additional file 5 PCR products generated usingthe 5prime inner adapter and 3prime inner gene specific primerswere sequenced by Sanger sequencing using the 3prime innergene specific primer

Animal ethics statementAnimal work was carried out according to the UK Animal(Scientific Procedures) Act 1986 The study protocol wasapproved by the AHVLA Animal Use Ethics Committee(UK Home Office PCD number 706905)

Availability of supporting data sectionFully annotated microarray data have been deposited inBμGSbase (accession number E-BUGS-150 httpbugssgulacukE-BUGS-150) and also ArrayExpress (ac-cession number E-BUGS-150) Raw sequence data wasuploaded to the European Nucleotide Archive (ENA) andcan be downloaded at httpwwwebiacukenadataviewERX006616-ERX06617ERX012284-ERX012286

Additional files

Additional file 1 SNPs identified across sequenced strains

Additional file 2 Large deletions identified by NGS in sequencedstrains

Additional file 3 Fold change differences in gene expression inM bovis field isolates 1121 2451 and 1307 compared to 2122Cells shaded in red indicate upregulation green indicate down-regulation and empty cells indicate no change in expression

Additional file 4 Differential expression of RNA transcripts asdetected by a tiled oligonucleotide microarray Up and down arrowsindicate fold up- and downregulation respectively and empty cellsindicate no change in expression Strand A indicates antisense S senseand I intergenic

Additional file 5 Details of oligonucleotides used in PCR RT-PCRand RLM-RACE experiments

Competing interestsThe authors declare that they have no competing interests

Authorsrsquo contributionsPG carried out the culturing of organisms macrophage infectionsmicroarrays qRT-PCR cloning analysis of sequence data and writing of themanuscript JN performed analysis of the Illumina sequence and PCRamplicon microarray data AW and JH performed the analysis of the Agilentoligonucleotide microarray data and assisted in the uploading of themicroarray data into the BμGSbase database MAQ and SB wereresponsible for the Illumina sequencing SH contributed to the analysis ofthe Illumina sequence data NS contributed to the writing and critical reviewof the manuscript RGH was a co-investigator who helped to guide theproject SG was the principle investigator on the project who guided theresearch and helped in the writing of the manuscript All authors read andapproved the final manuscript

AcknowledgementsWe would like to acknowledge Kate Gould for her help in carrying out theoligonucleotide microarrays and Denise Waldron for her assistance inuploading of the microarray data into BμGSbase We also thank MarkChambers for comments on the manuscript and BBSRC and Defra forfunding the work (Grant no BBE0184911)

Author details1Animal Health and Veterinary Laboratories Agency Woodham Lane NewHaw Addlestone Surrey KT15 3NB UK 2Division of Clinical Sciences BacterialMicroarray Group Centre for Infection amp Immunity St Georgersquos University ofLondon Cranmer Terrace London SW17 0RE UK 3The Wellcome TrustSanger Institute Hinxton Cambridgeshire UK 4UCD School of VeterinaryMedicine and UCD Conway Institute University College Dublin Dublin 4Ireland

Received 4 June 2013 Accepted 26 September 2013Published 17 October 2013

References1 Monaghan ML Doherty ML Collins JD Kazda JF Quinn PJ The tuberculin

test Vet Microbiol 1994 40(1ndash2)111ndash1242 Kamerbeek J Schouls L Kolk A van Agterveld M van Soolingen D Kuijper

S Bunschoten A Molhuizen H Shaw R Goyal M et al Simultaneousdetection and strain differentiation of Mycobacterium tuberculosis fordiagnosis and epidemiology J Clin Microbiol 1997 35(4)907ndash914

3 Frothingham R Meeker-OrsquoConnell WA Genetic diversity in theMycobacterium tuberculosis complex based on variable numbers oftandem DNA repeats Microbiology 1998 144(Pt 5)1189ndash1196

4 Smith NH Dale J Inwald J Palmer S Gordon SV Hewinson RG Smith JMThe population structure of Mycobacterium bovis in Great Britain clonalexpansion Proc Natl Acad Sci USA 2003 100(25)15271ndash15275

5 Winder CL Gordon SV Dale J Hewinson RG Goodacre R Metabolicfingerprints of Mycobacterium bovis cluster with molecular type

Golby et al BMC Genomics 2013 14710 Page 17 of 18httpwwwbiomedcentralcom1471-216414710

implications for genotype-phenotype links Microbiology 2006152(Pt 9)2757ndash2765

6 Wheeler PR Brosch R Coldham NG Inwald JK Hewinson RG Gordon SVFunctional analysis of a clonal deletion in an epidemic strain ofMycobacterium bovis reveals a role in lipid metabolism Microbiology2008 154(Pt 12)3731ndash3742

7 Arnvig KB Young DB Identification of small RNAs in Mycobacteriumtuberculosis Mol Microbiol 2009 73(3)397ndash408

8 Arnvig KB Comas I Thomson NR Houghton J Boshoff HI Croucher NJRose G Perkins TT Parkhill J Dougan G et al Sequence-based analysisuncovers an abundance of non-coding RNA in the total transcriptome ofMycobacterium tuberculosis PLoS Pathog 2011 7(11)e1002342

9 Smith NH Upton P Naming spoligotype patterns for the RD9-deletedlineage of the Mycobacterium tuberculosis complex www MbovisorgInfect Genet Evol 2012 12(4)873ndash876

10 Garnier T Eiglmeier K Camus JC Medina N Mansoor H Pryor M Duthoy SGrondin S Lacroix C Monsempe C et al The complete genome sequenceof Mycobacterium bovis Proc Natl Acad Sci USA 2003 100(13)7877ndash7882

11 Albers CA Lunter G MacArthur DG McVean G Ouwehand WH Durbin RDindel accurate indel calls from short-read data Genome Res 201121(6)961ndash973

12 Mostowy S Inwald J Gordon S Martin C Warren R Kremer K Cousins DBehr MA Revisiting the evolution of Mycobacterium bovis J Bacteriol2005 187(18)6386ndash6395

13 Boon C Dick T Mycobacterium bovis BCG response regulator essentialfor hypoxic dormancy J Bacteriol 2002 184(24)6760ndash6767

14 Van Gelder RN von Zastrow ME Yool A Dement WC Barchas JD EberwineJH Amplified RNA synthesized from limited quantities of heterogeneouscDNA Proc Natl Acad Sci USA 1990 87(5)1663ndash1667

15 Pandey DP Gerdes K Toxin-antitoxin loci are highly abundant in free-living but lost from host-associated prokaryotes Nucleic Acids Res 200533(3)966ndash976

16 Arcus VL McKenzie JL Robson J Cook GM The PIN-domain ribonucleasesand the prokaryotic VapBC toxin-antitoxin array Protein Eng Des Sel 201124(1ndash2)33ndash40

17 Bukowski M Rojowska A Wladyka B Prokaryotic toxin-antitoxin systemsndashthe role in bacterial physiology and application in molecular biologyActa Biochim Pol 2011 58(1)1ndash9

18 Maddocks SE Oyston PC Structure and function of the LysR-typetranscriptional regulator (LTTR) family proteins Microbiology 2008154(Pt 12)3609ndash3623

19 Moreno-Vivian C Cabello P Martinez-Luque M Blasco R Castillo FProkaryotic nitrate reduction molecular properties and functionaldistinction among bacterial nitrate reductases J Bacteriol 1999 181(21)6573ndash6584

20 Tiffert Y Supra P Wurm R Wohlleben W Wagner R Reuther J TheStreptomyces coelicolor GlnR regulon identification of new GlnR targetsand evidence for a central role of GlnR in nitrogen metabolism inactinomycetes Mol Microbiol 2008 67(4)861ndash880

21 Duan J Wainwright MS Comeron JM Saitou N Sanders AR Gelernter JGejman PV Synonymous mutations in the human dopamine receptor D2(DRD2) affect mRNA stability and synthesis of the receptor Hum MolGenet 2003 12(3)205ndash216

22 Capon F Allen MH Ameen M Burden AD Tillman D Barker JN TrembathRC A synonymous SNP of the corneodesmosin gene leads to increasedmRNA stability and demonstrates association with psoriasis acrossdiverse ethnic groups Hum Mol Genet 2004 13(20)2361ndash2368

23 Golby P Hatch KA Bacon J Cooney R Riley P Allnutt J Hinds J Nunez JMarsh PD Hewinson RG et al Comparative transcriptomics reveals keygene expression differences between the human and bovine pathogensof the Mycobacterium tuberculosis complex Microbiology 2007153(Pt 10)3323ndash3336

24 Bitter W Houben EN Bottai D Brodin P Brown EJ Cox JS Derbyshire KFortune SM Gao LY Liu J et al Systematic genetic nomenclature for typeVII secretion systems PLoS Pathog 2009 5(10)e1000507

25 Sorek R Kunin V Hugenholtz P CRISPRndasha widespread system thatprovides acquired resistance against phages in bacteria and archaeaNat Rev Microbiol 2008 6(3)181ndash186

26 Gomez M Smith I Determinants of Mycobacterial Gene Expression InMolecular Genetics of Mycobacteria Washington DC ASM Press2000111ndash129

27 Duhring U Axmann IM Hess WR Wilde A An internal antisense RNAregulates expression of the photosynthesis gene isiA Proc Natl Acad SciUSA 2006 103(18)7054ndash7058

28 Kawano M Aravind L Storz G An antisense RNA controls synthesis of anSOS-induced toxin evolved from an antitoxin Mol Microbiol 200764(3)738ndash754

29 Thomason MK Storz G Bacterial antisense RNAs how many are thereand what are they doing Annu Rev Genet 2010 44167ndash188

30 Brosch R Pym AS Gordon SV Cole ST The evolution of mycobacterialpathogenicity clues from comparative genomics Trends Microbiol 20019(9)452ndash458

31 Calmette A Guerin C Sur quelques proprietes du bacille tuberculeuxdrsquoorigine cultive sur la bile de boeuf glycerinee C R Acad Sci Paris 1909149716ndash718

32 Bacon J James BW Wernisch L Williams A Morley KA Hatch GJ ManganJA Hinds J Stoker NG Butcher PD et al The influence of reduced oxygenavailability on pathogenicity and gene expression in Mycobacteriumtuberculosis Tuberculosis (Edinb) 2004 84(3ndash4)205ndash217

33 Smyth GK Linear models and empirical bayes methods for assessingdifferential expression in microarray experiments Stat Appl Genet Mol Biol2004 33

doi1011861471-2164-14-710Cite this article as Golby et al Genome-level analyses ofMycobacterium bovis lineages reveal the role of SNPs and antisensetranscription in differential gene expression BMC Genomics 2013 14710

Submit your next manuscript to BioMed Centraland take full advantage of

bull Convenient online submission

bull Thorough peer review

bull No space constraints or color figure charges

bull Immediate publication on acceptance

bull Inclusion in PubMed CAS Scopus and Google Scholar

bull Research which is freely available for redistribution

Submit your manuscript at wwwbiomedcentralcomsubmit

Golby et al BMC Genomics 2013 14710 Page 18 of 18httpwwwbiomedcentralcom1471-216414710

implications for genotype-phenotype links Microbiology 2006152(Pt 9)2757ndash2765

6 Wheeler PR Brosch R Coldham NG Inwald JK Hewinson RG Gordon SVFunctional analysis of a clonal deletion in an epidemic strain ofMycobacterium bovis reveals a role in lipid metabolism Microbiology2008 154(Pt 12)3731ndash3742

7 Arnvig KB Young DB Identification of small RNAs in Mycobacteriumtuberculosis Mol Microbiol 2009 73(3)397ndash408

8 Arnvig KB Comas I Thomson NR Houghton J Boshoff HI Croucher NJRose G Perkins TT Parkhill J Dougan G et al Sequence-based analysisuncovers an abundance of non-coding RNA in the total transcriptome ofMycobacterium tuberculosis PLoS Pathog 2011 7(11)e1002342

9 Smith NH Upton P Naming spoligotype patterns for the RD9-deletedlineage of the Mycobacterium tuberculosis complex www MbovisorgInfect Genet Evol 2012 12(4)873ndash876

10 Garnier T Eiglmeier K Camus JC Medina N Mansoor H Pryor M Duthoy SGrondin S Lacroix C Monsempe C et al The complete genome sequenceof Mycobacterium bovis Proc Natl Acad Sci USA 2003 100(13)7877ndash7882

11 Albers CA Lunter G MacArthur DG McVean G Ouwehand WH Durbin RDindel accurate indel calls from short-read data Genome Res 201121(6)961ndash973

12 Mostowy S Inwald J Gordon S Martin C Warren R Kremer K Cousins DBehr MA Revisiting the evolution of Mycobacterium bovis J Bacteriol2005 187(18)6386ndash6395

13 Boon C Dick T Mycobacterium bovis BCG response regulator essentialfor hypoxic dormancy J Bacteriol 2002 184(24)6760ndash6767

14 Van Gelder RN von Zastrow ME Yool A Dement WC Barchas JD EberwineJH Amplified RNA synthesized from limited quantities of heterogeneouscDNA Proc Natl Acad Sci USA 1990 87(5)1663ndash1667

15 Pandey DP Gerdes K Toxin-antitoxin loci are highly abundant in free-living but lost from host-associated prokaryotes Nucleic Acids Res 200533(3)966ndash976

16 Arcus VL McKenzie JL Robson J Cook GM The PIN-domain ribonucleasesand the prokaryotic VapBC toxin-antitoxin array Protein Eng Des Sel 201124(1ndash2)33ndash40

17 Bukowski M Rojowska A Wladyka B Prokaryotic toxin-antitoxin systemsndashthe role in bacterial physiology and application in molecular biologyActa Biochim Pol 2011 58(1)1ndash9

18 Maddocks SE Oyston PC Structure and function of the LysR-typetranscriptional regulator (LTTR) family proteins Microbiology 2008154(Pt 12)3609ndash3623

19 Moreno-Vivian C Cabello P Martinez-Luque M Blasco R Castillo FProkaryotic nitrate reduction molecular properties and functionaldistinction among bacterial nitrate reductases J Bacteriol 1999 181(21)6573ndash6584

20 Tiffert Y Supra P Wurm R Wohlleben W Wagner R Reuther J TheStreptomyces coelicolor GlnR regulon identification of new GlnR targetsand evidence for a central role of GlnR in nitrogen metabolism inactinomycetes Mol Microbiol 2008 67(4)861ndash880

21 Duan J Wainwright MS Comeron JM Saitou N Sanders AR Gelernter JGejman PV Synonymous mutations in the human dopamine receptor D2(DRD2) affect mRNA stability and synthesis of the receptor Hum MolGenet 2003 12(3)205ndash216

22 Capon F Allen MH Ameen M Burden AD Tillman D Barker JN TrembathRC A synonymous SNP of the corneodesmosin gene leads to increasedmRNA stability and demonstrates association with psoriasis acrossdiverse ethnic groups Hum Mol Genet 2004 13(20)2361ndash2368

23 Golby P Hatch KA Bacon J Cooney R Riley P Allnutt J Hinds J Nunez JMarsh PD Hewinson RG et al Comparative transcriptomics reveals keygene expression differences between the human and bovine pathogensof the Mycobacterium tuberculosis complex Microbiology 2007153(Pt 10)3323ndash3336

24 Bitter W Houben EN Bottai D Brodin P Brown EJ Cox JS Derbyshire KFortune SM Gao LY Liu J et al Systematic genetic nomenclature for typeVII secretion systems PLoS Pathog 2009 5(10)e1000507

25 Sorek R Kunin V Hugenholtz P CRISPRndasha widespread system thatprovides acquired resistance against phages in bacteria and archaeaNat Rev Microbiol 2008 6(3)181ndash186

26 Gomez M Smith I Determinants of Mycobacterial Gene Expression InMolecular Genetics of Mycobacteria Washington DC ASM Press2000111ndash129

27 Duhring U Axmann IM Hess WR Wilde A An internal antisense RNAregulates expression of the photosynthesis gene isiA Proc Natl Acad SciUSA 2006 103(18)7054ndash7058

28 Kawano M Aravind L Storz G An antisense RNA controls synthesis of anSOS-induced toxin evolved from an antitoxin Mol Microbiol 200764(3)738ndash754

29 Thomason MK Storz G Bacterial antisense RNAs how many are thereand what are they doing Annu Rev Genet 2010 44167ndash188

30 Brosch R Pym AS Gordon SV Cole ST The evolution of mycobacterialpathogenicity clues from comparative genomics Trends Microbiol 20019(9)452ndash458

31 Calmette A Guerin C Sur quelques proprietes du bacille tuberculeuxdrsquoorigine cultive sur la bile de boeuf glycerinee C R Acad Sci Paris 1909149716ndash718

32 Bacon J James BW Wernisch L Williams A Morley KA Hatch GJ ManganJA Hinds J Stoker NG Butcher PD et al The influence of reduced oxygenavailability on pathogenicity and gene expression in Mycobacteriumtuberculosis Tuberculosis (Edinb) 2004 84(3ndash4)205ndash217

33 Smyth GK Linear models and empirical bayes methods for assessingdifferential expression in microarray experiments Stat Appl Genet Mol Biol2004 33

doi1011861471-2164-14-710Cite this article as Golby et al Genome-level analyses ofMycobacterium bovis lineages reveal the role of SNPs and antisensetranscription in differential gene expression BMC Genomics 2013 14710

Submit your next manuscript to BioMed Centraland take full advantage of

bull Convenient online submission

bull Thorough peer review

bull No space constraints or color figure charges

bull Immediate publication on acceptance

bull Inclusion in PubMed CAS Scopus and Google Scholar

bull Research which is freely available for redistribution

Submit your manuscript at wwwbiomedcentralcomsubmit

Golby et al BMC Genomics 2013 14710 Page 18 of 18httpwwwbiomedcentralcom1471-216414710