
The Botulinum J., Vol. 2, Nos. 3/4, 2013 189

Copyright © 2013 Inderscience Enterprises Ltd.

Comparison of Clostridium botulinum genomes shows the absence of cold shock protein coding genes in type E neurotoxin producing strains

Henna Söderholm* and Kaisa Jaakkola Department of Food Hygiene and Environmental Health, Faculty of Veterinary Medicine, University of Helsinki, P.O. Box 66, 00014 University of Helsinki, Finland Fax: +358-9-191-57101 *Corresponding author

Panu Somervuo, Pia Laine, Petri Auvinen and Lars Paulin Institute of Biotechnology, University of Helsinki, P.O. Box 56, 00014 University of Helsinki, Finland Fax: +358-9-191-59366

Miia Lindström and Hannu Korkeala Department of Food Hygiene and Environmental Health, Faculty of Veterinary Medicine, University of Helsinki, P.O. Box 66, 00014 University of Helsinki, Finland Fax: +358-9-191-57101

Abstract: To collect specific information about the genetic mechanisms that Clostridium botulinum strains utilise when adapting to changing environments, 16 C. botulinum genomes were analysed with comparative genome sequence analysis. Particular attention was paid to low temperature adaptation and the presence of cold shock protein coding genes in these genomes was evaluated. Surprisingly, unlike any other studied strains, the type E neurotoxin-producing strains lacked these extremely conserved genes. This finding suggests unique mechanisms for the cold tolerance of these strains and offers a new perspective into the investigations concerning this subject. The sizes of the pangenome and core genome of a certain bacterial species are considered to reflect the


versatility of the species. While the pangenome of C. botulinum was very large, the core genome appeared strikingly small, both findings highlighting the great diversity of C. botulinum strains.

Keywords: Clostridium botulinum; neurotoxin; pangenome; core genome; genomic comparison; comparative genomics; functional studies; orthologous genes; cold shock protein; CSP; cold tolerance mechanism; food safety.

Reference to this paper should be made as follows: Söderholm, H., Jaakkola, K., Somervuo, P., Laine, P., Auvinen, P., Paulin, L., Lindström, M. and Korkeala, H. (2013) ‘Comparison of Clostridium botulinum genomes shows the absence of cold shock protein coding genes in type E neurotoxin producing strains’, The Botulinum J., Vol. 2, Nos. 3/4, pp.189–207.

Biographical notes: Henna Söderholm graduated from the University of Helsinki in 2005 and continued her studies as a post-graduate at the Department of Food Hygiene and Environmental Health in the Centre of Excellence in Microbial Food Safety Research. Her studies focus on the stress response of Clostridium botulinum, with a special interest in cold tolerance mechanisms.

Kaisa Jaakkola is studying veterinary medicine and microbial genetics in the University of Helsinki.

Panu Somervuo received his DSc (Tech) degree from Helsinki University of Technology in 2000. His background is in signal processing, pattern recognition, and machine learning. He has been working with automatic speech recognition and neural networks at Neural Networks Research Centre, Finland, 1996–2004. He was a Visiting Researcher at International Computer Science Institute, Berkeley, California, 2002–2003. In 2005, he moved to Viikki campus of University of Helsinki where he has been working with bioinformatics of microarrays and next generation sequencing. Microarray work has included custom array probe design, detection, gene expression, and array comparative genomic hybridisation. NGS work has included analysis and assembly of genomic and RNASeq data. He has been a teacher in several microarray and NGS courses at University of Helsinki and IT centre CSC, one Erasmus course in Italy, and one NGS course in University of Mauritius.

Pia Laine graduated from Espoo Vantaa Institute of Technology as a Bachelor of Engineering in Biotechnology in 2001 and received her MSc from the University of Helsinki in 2004, with a major in computer science and minors in mathematics and genetic bioinformatics. She is currently working at the Institute of Biotechnology DNA Sequencing and Genomics Laboratory. Her work focuses on bacterial genome assembly projects using sequencing data from two or more sequencing platforms, such as 454 and SOLiD or Illumina, as well as Sanger sequencing. She also works on several other sequencing projects, including eukaryotic genome and various metagenome projects.

Petri Auvinen defended his PhD thesis, which concentrated on the genetics of human enteroviruses, in 1990. Thereafter, he spent his first three-year post-doctoral period studying the molecular biology of the Syndecan 1 gene at the University of Turku (Finland). He spent his second post-doctoral period at EMBL Heidelberg (Germany) studying the cell biology of polarity in mammalian cells. After the EMBL period, he returned to Finland in 1996 and for three years studied cell biological and molecular biological aspects of RNA viruses as a Staff Scientist at the Institute of Biotechnology. Since 2000, he has worked as a group leader and later as a Laboratory Director at the Institute of Biotechnology, using genome-wide methods to study genome structure and function relationships with DNA microarrays, NGS technologies and related bioinformatics.


Lars Paulin graduated in Biochemistry from University of Helsinki in 1984. He worked as a Researcher in a small biotechnology company Genesit Ltd in 1987–1990. He started and headed the DNA Synthesis and Sequencing Laboratory at the Institute of Biotechnology University of Helsinki in 1990. In 2008, the laboratory was fused with the microarray laboratory and formed the current DNA Sequencing and Genomics Laboratory. He is an expert in molecular biology especially DNA sequencing.

Miia Lindström graduated from University of Helsinki and started her academic career in 1998. Her group works in the Centre of Excellence in Microbial Food Safety Research and her main research interests are food pathogenic spore-forming bacteria, with special focus on the epidemiology, diagnostics and genetic mechanisms of Clostridium botulinum. The C. botulinum laboratory at the University of Helsinki runs human and animal botulism diagnostics in Finland.

Hannu Korkeala is a Professor of Food Hygiene at the Centre of Excellence in Microbial Food Safety Research of the Academy of Finland at the Department of Food Hygiene and Environmental Health, University of Helsinki, Finland. He leads a research group focusing on the diagnostics, epidemiology, and genomics of psychrotrophic and spore-forming food borne bacteria. He has published 230 scientific papers in peer-reviewed journals including 49 papers on Clostridium botulinum. At the moment his interest in Clostridium botulinum research is stress response.

1 Introduction

Development of whole-genome sequencing and comparative genomics has elicited considerable new information about bacterial ecology. Significant genomic plasticity among bacterial populations or species has become evident (Ehrlich et al., 2005), creating the need to define the genetic contents of bacterial populations rather than of individual strains. Each individual strain of a population carries a unique set of genes drawn from a large gene pool called a pangenome (Medini et al., 2005; Tettelin et al., 2005), which contributes to high diversity between the strains (Ehrlich et al., 2005). The term core genome, on the other hand, is used to describe the set of conserved genes present in all strains of a population (Tettelin et al., 2005).

Genetic diversity between individual bacterial strains ensures competition and survival of a bacterial population in the changing environment (Ehrlich et al., 2005). One important group of genes related to bacterial growth under stress encodes the family of cold shock proteins (CSPs) (Horn et al., 2007; Ehrlich et al., 2005). CSPs are small (~7.4 kDa) proteins that contain a nucleic acid binding cold shock domain (CSD) (Wistow, 1990; Graumann and Marahiel, 1998). CSD-containing proteins are highly conserved in most prokaryotes (Mihailovich et al., 2010). CSPs help bacteria to survive under unfavourable conditions, especially at low temperature (Ermolenko and Makhatadze, 2002). In addition to their role at low temperatures, CSPs have been reported to function under other stress conditions, such as nutrient starvation, but also under optimal growth conditions (Graumann et al., 1997; Graumann and Marahiel, 1996; Phadtare and Inouye, 2004; Yamanaka et al., 1998). CSPs are also related to the regulation of numerous other proteins in the complex stress response network (Graumann


and Marahiel, 1997; Phadtare and Inouye, 2001). The genome of Clostridium botulinum Group I type A1 (hereafter referred to as C. botulinum I A1) neurotoxin-producing strain ATCC 3502 contains three csp genes (Sebaihia et al., 2007). A gene inactivation study showed cspB to encode the major CSP in ATCC 3502, whereas cspA and cspC were suggested to be involved in other types of stress or physiological conditions (Söderholm et al., 2011). So far no reports exist about the occurrence of CSP genes in other C. botulinum strains.

The species name C. botulinum is used for a diverse group of clostridia that produce botulinum neurotoxins. The seven serologically distinct neurotoxin types are designated A–G; these are further divided into subtypes within a toxin type (Carter et al., 2009; Chen et al., 2007; Hill et al., 2007; Smith et al., 2005). Ingestion of the neurotoxin leads to flaccid paralysis, i.e., botulism, in humans and animals. Strains of C. botulinum are classified into groups (I–IV) by their genetic background and physiological characteristics, such as nutrient utilisation and temperature requirements. Studies with DNA fingerprinting methods have revealed very low similarity between strains of different groups (Keto-Timonen et al., 2005). In line with their distinct phenotypic and genetic properties, the different physiological groups in fact form four separate species (Collins and East, 1998; Hill et al., 2007; Hutson et al., 1993; Lindström and Korkeala, 2006).

In this study, CSP-encoding genes were searched for in 16 C. botulinum genomes. Fifteen genomes were publicly available and one was sequenced by the authors. All studied Group I and Group III C. botulinum strains contained two or three csp genes, whereas among the Group II strains, only one type B toxin-producing strain contained one csp gene. Taking into account the conserved nature of csp genes across kingdoms, this finding was unexpected and intriguing. The absence of csp genes in the psychrotrophic type E toxin-producing C. botulinum strains suggests that some as yet unknown mechanism replaces the CSPs and ensures cold shock tolerance of these strains. In addition, we analysed these 16 C. botulinum genome sequences to identify the core genomes for Group I C. botulinum and Group II C. botulinum and a composite core genome for Groups I, II and III strains together. Furthermore, the pangenome of Groups I, II and III together was defined based on all 16 genomes.

2 Materials and methods

2.1 C. botulinum genomes

We compared the genomes of 16 C. botulinum strains (Table 1). Fifteen of these genomes were publicly available at http://pathema.jcvi.org (Brinkac et al., 2010). Soon after the initial analyses of the available genomes, all data from Pathema was transferred to the VBI PathoSystems Resource Integration Center (PATRIC) database, which was used in the subsequent review of the material (http://patricbcr.org) (Gillespie et al., 2011). In addition, we included in the comparison an incomplete genome of a type E neurotoxin-producing strain, C. botulinum II E CB11/1-1, which was isolated in a Finnish foodborne botulism outbreak (Lindström et al., 2004). Ten of the strains studied represented Group I, four Group II, and two Group III. The reference genomes for all comparisons were I A1 ATCC 3502 for Group I, and II E Alaska for Group II strains.


Table 1 Clostridium botulinum strains used in genome analysis (a)

Group  Serotype  Strain            Plasmid           Chromosome size (bp)  Plasmid size (bp)  GC content (%)
I      A1        Hall              NK (b)            3,760,560             NK                 28.18
I      A1        ATCC 19397        NK                3,863,450             NK                 28.21
I      A1(B)     NCTC 2916         NK                4,031,357             NK                 28.47
I      A1        ATCC 3502         pBOT 3502         3,886,916             16,344             28.24
I      A2        Kyoto             NK                4,155,278             NK                 28.21
I      A3        Loch Maree        pCLK              3,992,906             266,785            28.14
I      B1        Okra              pCLD              3,958,233             148,780            28.23
I      Ba4       657               pCLJ I/pCLJ II    3,977,794             9,953; 270,022     28.04
I      Bf                          NK                4,217,754             NK                 28.23
I      F         Langeland         pCLI              3,995,387             17,531             28.30
II     B         Eklund 17B        pCLL              3,800,327             47,642             27.48
II     E3        Alaska E43        NK                3,659,644             NK                 27.36
II     E1        'BoNT E Beluga'   NK                3,999,201             NK                 27.44
II     E         CB 11/1-1         NA (c)            NA                    NA                 NA
III    C         Eklund            NK                2,961,186             NK                 28.98
III    D         1873              pCLG I/pCLG II    2,237,359             107,690; 54,152    27.72

Notes: (a) Information collected from http://pathema.jcvi.org and http://patricbrc.org; (b) NK, not known; (c) NA, not analysed.

The CB11/1-1 genome was sequenced using 454 Genome Sequencers GS20 and GS FLX (Margulies et al., 2005), generating 134.3 Mbp in 1,080,733 reads. De novo assembly of the obtained reads was done using gs Assembler (version 1.1.03), resulting in 3,884,990 bp in 639 contigs (≥ 100 bp) at an approximate coverage of 35×. The Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession AORM00000000.

Table 2 Reference genes for searching for csp homologues in Clostridium botulinum

Species          Strain               Gene (a)            Size (amino acids)
C. botulinum     I A1 ATCC 3502       NT02CB031 (cspA)    65
                                      NT02CB138 (cspB)    67
                                      NT02CB176 (cspC)    69
                 II B Eklund 17B      CLL A1515           69
                 III C Eklund         CBC 0846            66
                                      CBC A0872           65
C. beijerinckii  NCIMB8052            NT08CB3149          63
                                      NT08CB2991          69
C. butyricum     CBy5521              CBY 1397            69
                 BL5262               CLP 2174            69

Note: (a) http://pathema.org.
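The approximate coverage quoted for the CB11/1-1 assembly can be checked from the read and contig totals given in the text. The following is a minimal sketch of that arithmetic, not the authors' pipeline; the function name `mean_coverage` is an illustrative assumption, and coverage is taken here simply as sequenced bases divided by assembled bases.

```python
# Figures quoted in the text for strain CB11/1-1.
total_read_bases = 134.3e6    # bases generated by the 454 runs (134.3 Mbp)
read_count = 1_080_733        # number of reads
assembly_length = 3_884_990   # total length of the 639 contigs (bp)


def mean_coverage(read_bases: float, assembly_bp: int) -> float:
    """Average depth, assumed here to be sequenced bases / assembled bases."""
    return read_bases / assembly_bp


coverage = mean_coverage(total_read_bases, assembly_length)
mean_read_length = total_read_bases / read_count

print(f"approximate coverage: {coverage:.1f}x")
print(f"mean read length: {mean_read_length:.0f} bp")
```

Under this definition the depth comes out just under 35, in line with the rounded value reported above.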


Table 3 Amino acid sequence alignment of the cold shock proteins of Clostridium botulinum, Clostridium perfringens, Clostridium beijerinckii and Clostridium butyricum

Strain          Gene (a)      Amino acid sequence (5' → 3')

C. botulinum cspA (b), Group I
NCTC 2916       CBN 0311      ---MNGTVKWFNGEKGFGFITGEDGNDVFAHFSQINSEG-YKSLEEGQKVSYDVVKGPKGPQAENITII
ATCC 19397      CLB 0326      ---MNGTVKWFNGEKGFGFITGEDGNDVFAHFSQINSEG-YKSLEEGQKVSYDVVKGPKGPQAENITII
ATCC 3502       NT02CB0318    ---MNGTVKWFNGEKGFGFITGEDGNDVFAHFSQINSEG-YKSLEEGQKVSYDVVKGPKGPQAENITII
Bf              CBB 0302      ---MNGTVKWFNGEKGFGFITGEDGNDVFAHFSQINSEG-YKSLEEGQKVSYDVVKGPKGPQAENITII
Hall            CLC 0341      ---MNGTVKWFNGEKGFGFITGEDGNDVFAHFSQINSEG-YKSLEEGQKVSYDVVKGPKGPQAENITII
657             CLJ B0336     ---MNGTVKWFNGEKGFGFITGEDGNDVFAHFSQINSEG-YKSLEEGQKVSYDVVKGPKGPQAENITII
Okra            CLD 0469      ---MNGTVKWFNGEKGFGFITGEDGNDVFAHFSQINSEG-YKSLEEGQKVSYDVVKGPKGPQAENITII
Langeland       CLI 0355      ---MNGTVKWFNGEKGFGFITGEDGNDVFAHFSQINSEG-YKSLEEGQKVSYDVVKGPKGPQAENITII
Kyoto           CLM 0340      ---MNGTVKWFNGEKGFGFITGEDGNDVFAHFSQINSEG-YKSLEEGQKVSYDVVKGPKGPQAENITII
Loch Maree      CLK 3469      ---MNGTVKWFNGDKGFGFITGEDGNDVFAHFSQINSEG-YKSLEEGQKVSYDVVKGPKGPQAENITII
                                 ----------*------------------------- -----------------------------

C. botulinum cspA (b), Group III
Eklund          CBC A0872     ---MTGTVKWFNAEKGFGFITTEEGNDVFAHFSQINKDG-FKTLEEGQNVSFDVVEGAKGPQAENISVL
1873            CLG B0776     ---MTGTVKWFNAEKGFGFITTEEGNDVFAHFSQINREG-FKTLDEGQNVSFDVVEGAKGPQAENITIL
                                 ---------------------------------**- ----*---------------------**-

Notes: (a) http://pathema.org; (b) Gene names used according to C. botulinum I A1 ATCC 3502 (Sebaihia et al., 2007; Söderholm et al., 2011).


Table 3 Amino acid sequence alignment of the cold shock proteins of Clostridium botulinum, Clostridium perfringens, Clostridium beijerinckii and Clostridium butyricum (continued)

Strain          Gene (a)      Amino acid sequence (5' → 3')

C. botulinum cspB (b), Group I
Okra            CLD 3152      --MKTGTVKWFNSEKGFGFIEVEGEKDVFVHFSAIQGDEPRKNLEEGQKVQFEVEEGQKGPQAANVIKL
Langeland       CLI 1483      --MKTGTVKWFNSEKGFGFIEVEGEKDVFVHFSAIQGDEPRKNLEEGQKVQFEVEEGQKGPQAANVIKL
ATCC 19397      CLB 1411      --MKTGTVKWFNSEKGFGFIEVEGEKDVFVHFSAIQGDEPRKNLEEGQKVQFEVEEGQKGPQAANVIKL
NCTC 2916       CBN 1546      --MKTGTVKWFNSEKGFGFIEVEGEKDVFVHFSAIQGDEPRKNLEEGQKVQFEVEEGQKGPQAANVIKL
Kyoto           CLM 1561      --MKTGTVKWFNSEKGFGFIEVEGEKDVFVHFSAIQGDEPRKNLEEGQKVQFEVEEGQKGPQAANVIKL
Loch Maree      CLK 0828      --MKTGTVKWFNSEKGFGFIEVEGEKDVFVHFSAIQGDEPRKNLEEGQKVQFEVEEGQKGPQAANVIKL
ATCC 3502       NT02CB1388    --MKTGTVKWFNSEKGFGFIEVEGEKDVFVHFSAIQGDEPRKNLEEGQKVQFEVEEGQKGPQAANVIKL
Hall            CLC 1422      --MKTGTVKWFNSEKGFGFIEVEGEKDVFVHFSAIQGDEPRKNLEEGQKVQFEVEEGQKGPQAANVIKL
657             CLJ B1501     --MKTGTVKWFNSEKGFGFIEVEGEKDVFVHFSAIQGDEPRKNLEEGQKVEFEVEEGQKGPQAANVIKL
Bf              CBB 1658      --MKTGTVKWFNSEKGFGFIEVEGEKDVFVHFSAIQGDEPRKNLEEGQKVEFEVEEGQKGPQAANVIKL
                                ------------------------------------------------*------------------

C. botulinum cspB (b), Group III
1873            CLG B0229     --MKTGIVKWFNAEKGFGFISVEGEDDVFVHFSAIQGDG-FKTLEEGQKVEFEVTEGARGPQAANVVKL
Eklund          CBC 0846      --MKTGIVKWFNAEKGFGFISVEGEDDVFVHFSAIQGDG-FKTLEEGQKVEFEVTEGARGPQAANVVKL
                                ------------------------------------- -----------------------------

C. beijerinckii
NCIMB8052       NT08CB3149    --MKTGTVKFFNSEKGFGFIEVEGEKDVFVHSSSLSGFS----IQEGDKVQFDVEKGTKGPQATNIQRV

Notes: (a) http://pathema.org; (b) Gene names used according to C. botulinum I A1 ATCC 3502 (Sebaihia et al., 2007; Söderholm et al., 2011).


Table 3 Amino acid sequence alignment of the cold shock proteins of Clostridium botulinum, Clostridium perfringens, Clostridium beijerinckii and Clostridium butyricum (continued)

Strain          Gene (a)      Amino acid sequence (5' → 3')

C. botulinum cspC (b), Group I
ATCC 3502       NT02CB1769    MSMHTGTVKWFDNERGYGFIAGNNGKDVYVHSMQIKEKTLNKDLHEGEEVLFDIVEKEKGPIAINVQKL
ATCC 19397      CLB 1707      MSMHTGTVKWFDNERGYGFIAGNNGKDVYVHSMQIKEKTLNKDLHEGEEVLFDIVEKEKGPIAINVQKL
Hall            CLC 1715      MSMHTGTVKWFDNERGYGFIAGNNGKDVYVHSMQIKEKTLNKDLHEGEEVLFDIVEKEKGPIAINVQKL
Okra            CLD 2867      MSMHTGTVKWFDNERGYGFIAGNNGKDVYVHYMQIKEKTHNKDLHEGEEVLFDIVEKEKGPIAINVQKL
Langeland       CLI 1767      MSMHTGTVKWFDNERGYGFIAGNNGKDVYVHYMQIKEKTHNKDLHEGEEVLFDIVEKEKGPIAINVQKL
Loch Maree      CLK 1154      MSMHTGTVKWFDNERGYGFISGNNGKDVYVHSMQIKEKTHNKDLHEGEEVLFDIVEKEKGPIAINVQKL
                              --------------------*----------*-------*-----------------------------

C. perfringens
13              NT03CP1318    MSSRTGIVKWFNQEKGYGFISCDEGDDVFVHISQVKEKGPEKDLHEGESVSFDISEGEKGPMATNVQKL
D JGS1721       CJD 1677      MSSRTGIVKWFNQEKCYGFISCDEGDDVFVHISQVKEKGPEKDLHEGESVSFDISEGEKGPMATNVQKL
C JGS1495       CPC 1464      MSSRTGIVKWFNQEKGYGFISCDEGDDVFVHISQVKEKGPEKDLHEGESVSFDISEGEKGPMATNVQKL
CPE F4969       AC5 1550      MSSRTGIVKWFNQEKGYGFISCDEGDDVFVHISQVKEKGPEKDLHEGESVSFDISEGEKGPMATNVQKL
E JGS1987       AC3 1643      MSSRTGIVKWFNQEKGYGFISCDEGDDVFVHISQVKEKGPEKDLHEGESVSFDISEGEKGPMATNVQKL
B ATCC 3626     AC1 1645      MSSRTGIVKWFNQEKGYGFISCDEGDDVFVHISQVKEKGPEKDLHEGESVSFDISEGEKGPMATNVQKL
NCTC8239        AC7 1499      MSSRTGIVKWFNQEKGYGFISCDEGDDVFVHISQVKEKGPEKDLHEGESVSFDISEGEKGPMATNVQKL
ATCC 13124      CPF 1452      MSSRTGIVKWFNQEKGYGFISCDEGDDVFVHISQVKEKGPEKDLHEGESVSFDISEGEKGPMATNVQKL
                              ---------------*-----------------------------------------------------

C. botulinum, Group II
Eklund 17B      CLL A1515     MASRTGIVKWFNAEKGYGFISCDEGDDVFAHHSQIKENGPEKDLHEGESVTFDIQDGEKGPMATNIQKL

C. beijerinckii
NCIMB8052       NT08CB2991    MAKVTGVVKWFDTERGYGFISCDKGDDVFVHHSQIKDKGPDKDLHEDESVTFDIESGEKGPMATNVQKL

C. butyricum
CBy5521         CBY 1397      MAQNTGTVKWYDREKGYGFISCDEGNDVFAHHSQIKDNGPEKDLKEGESVTFSIEESDKGPMAINIQKF
BL5262          CLP 2174      MAQNTGTVKWYDREKGYGFISCDEGNDVFAHHSQIKDNGPEKDLKEGESVTFSIEESDKGPMAINIQKF
                              ---------------------------------------------------------------------

Notes: (a) http://pathema.org; (b) Gene names used according to C. botulinum I A1 ATCC 3502 (Sebaihia et al., 2007; Söderholm et al., 2011).


2.2 CSP coding genes

CSP genes (csp genes) of all studied strains were searched for in the genome databases [http://pathema.jcvi.org (Brinkac et al., 2010) and http://patricbcr.org (Gillespie et al., 2011)] by simple text searches. In addition, csp genes in all C. botulinum strains were searched for by reciprocal BLAST using the three predicted csp genes of I A1 ATCC 3502 and their flanking genes as a reference (Table 2). The predicted csp homologues of II B Eklund 17B and III C Eklund, as well as those of C. butyricum and C. beijerinckii, the close relatives of Group II C. botulinum (Collins and East, 1998; Keto-Timonen et al., 2006), were also used as reference sequences in the search for putative csp genes in type E strains (Table 2). Finally, the search was complemented with csp sequences of several non-clostridial species (not shown). The DNA and amino acid sequences of the putative csp genes were compared using the Align and Clustal W software (http://ebi.ac.uk/embl/) (Table 3). The CSP sequences of C. perfringens were included in the comparison (Table 3). In addition to BLAST, csp genes for type E strains were also searched for in protein family (PFAM) databases using hidden Markov models (HMMs) (Wu and Xie, 2010).

2.3 Assessment of orthologous genes and construction of the core genome and pangenome

The different genetic background and phenotypic properties of the four groups of C. botulinum suggest that these groups represent four distinct bacterial species (Collins and East, 1998; Hill et al., 2007). Because of that, a prediction of a total core genome across all the strains alone did not seem appropriate. Instead, the core genome was predicted separately for Group I and Group II genomes. However, in addition to that, the core genome of the whole species was defined to compare total and group-related core genomes and to emphasise the small degree of similarity between the groups. The pangenome was predicted to reinforce the understanding of C. botulinum as four distinct, albeit clinically important neurotoxigenic clostridial species rather than a uniform bacterial species.

The orthologous genes were predicted using a reciprocal basic local alignment search tool (BLAST) search that finds the gene pairs with the highest homology scores when comparing two genomes. The same pair of sequences had to result in a best match with either of them used as a reference; this is called a bi-directional best hit (BBH). The BLAST score ratio (BSR) approach (Rasko et al., 2005) was utilised for further comparison. Briefly, for each predicted protein of the reference strain, a raw BLAST score for the alignment against itself was stored as a reference score. Each reference peptide was then compared with all of the predicted proteins of all C. botulinum genomes and the obtained BLAST raw scores were normalised to the reference scores. Peptides with a normalised ratio of ≥ 0.4 that also met the BBH constraint were considered to be homologous.

Both group-specific core genomes and the total core genome for the species were calculated based on orthologous gene pairs. For the core genome prediction, the orthologous genes between strains X and Y were first determined, and then all genes that were not present in genome Z were omitted from the comparison. This step was repeated until only genes with a predicted orthologous counterpart in every relevant strain remained. The pangenome for the 16 studied strains was calculated using the BLASTClust algorithm. With this method, large clusters of homologous genes are obtained instead of homologous gene pairs and the clusters can contain several genes from the
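The combination of the BSR threshold and the BBH constraint described above can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the function names, the dictionary layout of the best hits and all gene identifiers and scores in the toy data are hypothetical, and real input would come from parsed BLAST output.

```python
from typing import Dict, Tuple

# (identifier of the best-matching gene in the other genome, raw BLAST score)
Hit = Tuple[str, float]


def blast_score_ratio(raw_score: float, self_score: float) -> float:
    """Normalise a raw BLAST score to the reference protein's self-hit score."""
    return raw_score / self_score


def bidirectional_best_hits(
    best_a_to_b: Dict[str, Hit],
    best_b_to_a: Dict[str, Hit],
    self_scores_a: Dict[str, float],
    min_ratio: float = 0.4,
) -> Dict[str, str]:
    """Keep pairs that are each other's best hit and reach the BSR cut-off."""
    pairs: Dict[str, str] = {}
    for gene_a, (gene_b, raw_score) in best_a_to_b.items():
        reverse_hit = best_b_to_a.get(gene_b)
        if reverse_hit is None or reverse_hit[0] != gene_a:
            continue  # not a bi-directional best hit
        if blast_score_ratio(raw_score, self_scores_a[gene_a]) >= min_ratio:
            pairs[gene_a] = gene_b
    return pairs


# Toy data with made-up identifiers and scores.
best_a_to_b = {"a1": ("b1", 120.0), "a2": ("b2", 30.0), "a3": ("b9", 90.0)}
best_b_to_a = {"b1": ("a1", 118.0), "b2": ("a2", 31.0), "b9": ("a7", 95.0)}
self_scores_a = {"a1": 130.0, "a2": 140.0, "a3": 100.0}

orthologues = bidirectional_best_hits(best_a_to_b, best_b_to_a, self_scores_a)
print(orthologues)
```

In the toy data, a2 is a reciprocal best hit but falls below the 0.4 ratio, and a3 fails the reciprocity check, so only the a1/b1 pair is retained.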

same genome. The C. botulinum pangenome was determined from the results of this clustering, the goal of which was to group orthologous sequences together. The total set of genes found in the studied genomes can be considered their pangenome.
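The two calculations described in this section can be illustrated with a small sketch. It assumes that the core genome is the set of reference genes with an orthologue in every compared strain, and it uses plain single-linkage clustering over homologous pairs as a simple stand-in for the BLASTClust step; the function names and all gene identifiers are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def core_genome(
    reference_genes: Set[str],
    orthologues_per_strain: Dict[str, Set[str]],
) -> Set[str]:
    """Drop every reference gene that lacks an orthologue in some strain."""
    core = set(reference_genes)
    for genes_with_orthologue in orthologues_per_strain.values():
        core &= genes_with_orthologue
    return core


def single_linkage_clusters(
    genes: Iterable[str],
    homologous_pairs: Iterable[Tuple[str, str]],
) -> List[Set[str]]:
    """Group genes so that any chain of homologous pairs ends up in one cluster."""
    parent = {gene: gene for gene in genes}

    def find(gene: str) -> str:
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path compression
            gene = parent[gene]
        return gene

    for gene_a, gene_b in homologous_pairs:
        parent[find(gene_a)] = find(gene_b)

    clusters: Dict[str, Set[str]] = {}
    for gene in parent:
        clusters.setdefault(find(gene), set()).add(gene)
    return list(clusters.values())


# Toy input: reference genes with an orthologue in strains X, Y and Z.
reference = {"r1", "r2", "r3"}
found = {"X": {"r1", "r2", "r3"}, "Y": {"r1", "r3"}, "Z": {"r1", "r2"}}
core = core_genome(reference, found)

# Toy input: genes from three genomes; X_1 and X_2 are paralogues.
all_genes = ["X_1", "X_2", "Y_1", "Z_1", "Z_2"]
pairs = [("X_1", "Y_1"), ("Y_1", "Z_1"), ("X_1", "X_2")]
clusters = single_linkage_clusters(all_genes, pairs)

print(sorted(core), len(clusters))
```

The clustering example shows why a cluster can hold several genes from the same genome: X_1 and X_2 are linked to each other and therefore fall into one cluster together with their homologues from Y and Z.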

3 Results

3.1 Characteristics of the studied genomes

The average size of the studied genomes, including the chromosome and one or two putative plasmids, was 3.84 Mbp (median 4.00 Mbp) (Table 1). The largest genomes were those of strains I A3 Loch Maree and I Ba4 strain 657 (4.25 Mbp), and the smallest one was that of III D 1873 (2.54 Mbp) (Table 1). Six strains contained one plasmid and two contained two plasmids. The sizes of the plasmids varied greatly from 16.3 kbp (I A1 ATCC 3502) to 270.4 kbp (I Ba4 657) (Table 1).

3.2 CSP coding genes

The nomenclature of csp genes is diverse. Some predicted csp homologues of clostridia have not been named, and for others the names vary between databases. The nomenclature (cspA, cspB and cspC) established for the csp homologues of the Group I reference strain of this study, C. botulinum I A1 ATCC 3502 (Sebaihia et al., 2007; Söderholm et al., 2011), was used, and any CDSs showing greater than 97% identity to cspA, cspB or cspC were named accordingly. For homologues sharing less than 97% identity with the csp genes of ATCC 3502, we used the symbol given in http://pathema.jcvi.org.
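The naming rule can be illustrated with a short sketch. It assumes that identity is counted over aligned columns, ignoring columns that are gaps in both sequences, and it uses the cspA amino acid rows of ATCC 3502 and Loch Maree from Table 3; whether the 97% cut-off was applied to DNA or to amino acid sequences is not stated in the text, so the amino acid comparison here is an assumption, as are the function names.

```python
from typing import Dict


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical columns, skipping columns gapped in both rows."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap and res_b == gap:
            continue
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def assign_name(identities: Dict[str, float], database_symbol: str,
                threshold: float = 97.0) -> str:
    """Use the reference gene name above the cut-off, else the database symbol."""
    best_reference = max(identities, key=identities.get)
    if identities[best_reference] > threshold:
        return best_reference
    return database_symbol


# cspA rows from Table 3 (ATCC 3502 NT02CB0318 and Loch Maree CLK 3469).
atcc3502_cspA = "---MNGTVKWFNGEKGFGFITGEDGNDVFAHFSQINSEG-YKSLEEGQKVSYDVVKGPKGPQAENITII"
loch_maree = "---MNGTVKWFNGDKGFGFITGEDGNDVFAHFSQINSEG-YKSLEEGQKVSYDVVKGPKGPQAENITII"

identity = percent_identity(atcc3502_cspA, loch_maree)
name = assign_name({"cspA": identity}, database_symbol="CLK 3469")
print(f"{identity:.1f}% identity, named {name}")
```

The two rows differ at a single position out of 65 compared columns, so under these assumptions the Loch Maree CDS clears the cut-off and takes the name cspA.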

All Group I strains contained homologues for cspA and cspB, and their flanking genes were conserved. A homologue for cspC was present in all Group I strains other than C. botulinum I A2 Kyoto, C. botulinum I A1 NCTC 2916, C. botulinum I Bf and C. botulinum I Ba4 657. Nevertheless, the cspC flanking genes found in A1 ATCC 3502 had orthologues in all Group I strains, including the ones missing cspC. In multiple sequence alignment, the csp genes of all Group I strains clustered consistently in three distinct groups (Figure 2, Table 3) and shared 97–100% identity with each other.

Of the Group II strains, only II B Eklund 17B appeared to contain a csp homologue (CLL A1515). This CDS shared moderate homology with the Group I and Group III csp alleles (identity 49 to 51% with cspA, 41 to 49% with cspB and 59 to 61% with cspC), but was more closely related to the eight csp CDSs predicted in C. perfringens, the two csp genes of two different C. butyricum strains and one csp gene of C. beijerinckii (NT08CB2991) (Figure 2, Table 3). No homologues for csp genes were found in any of the three type E C. botulinum genomes.

The two Group III strains, C. botulinum III C Eklund and C. botulinum III D 1873, both contained two csp homologues (CBC A0872 and CBC 0846, and CLG B0776 and CLG B0229, respectively). All of these sequences were 92 to 100% identical and clustered with Group I cspA and cspB.

In addition to BLAST, csp genes for type E strains were also searched for in PFAM databases based on HMMs, but no matches were found.


3.3 Orthologous genes and the core genome and pangenome

Orthologous gene pairs were drawn between all strains based on the amino acid sequences of their predicted protein coding sequences (CDSs). For Group I strains, the proportion of orthologous pairs was an average of 85% of the genome between any two strains. Based on the number of orthologous pairs, on average 78% of the genes between any two Group II, and 72% of any two of the Group III strains were common. The ratio of orthologous pairs between any two strains from any Group was approximately 65%. The number of pairwise orthologues between any two strains is presented in Figure 1.

The core genomes, which represent all the genes common to a group of genomes, were derived from the orthologous gene pairs identified by reciprocal BLAST analysis.
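As an illustration of this step, the sketch below derives a core genome from reciprocal best hits. It is a simplified, reference-based version under stated assumptions: the function names, the choice of a single reference strain and the toy hit tables are hypothetical and do not reproduce the clustering procedure actually used in the study.

```python
def reciprocal_best_hits(best_ab, best_ba):
    """Return {gene_a: gene_b} for pairs that are each other's best hit.

    best_ab maps each gene of genome A to its best-scoring hit in genome B;
    best_ba holds the best hits in the opposite direction.
    """
    return {a: b for a, b in best_ab.items() if best_ba.get(b) == a}


def core_genome(reference_genes, rbh_per_strain):
    """Reference genes that have an orthologue in every compared strain."""
    core = set(reference_genes)
    for rbh in rbh_per_strain:
        core &= set(rbh)  # keep only reference genes paired in this strain
    return core


# Hypothetical toy data: reference strain R compared with strains X and Y.
ref_genes = ["r1", "r2", "r3", "r4"]
rbh_x = reciprocal_best_hits(
    {"r1": "x1", "r2": "x2", "r3": "x9"},
    {"x1": "r1", "x2": "r2", "x9": "r4"},  # x9 points back to r4, so r3 is not paired
)
rbh_y = reciprocal_best_hits(
    {"r1": "y1", "r2": "y5", "r4": "y4"},
    {"y1": "r1", "y5": "r2", "y4": "r4"},
)
core = core_genome(ref_genes, [rbh_x, rbh_y])
print(sorted(core))
```

Requiring reciprocity discards one-directional hits, which reduces the risk of pairing a gene with a paralogue rather than its true orthologue.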

Figure 1 Number of predicted orthologous gene pairs between Clostridium botulinum strains

Note: NA: not analysed.


Figure 2 Clustering of amino acid sequences of the cold shock proteins of Clostridium botulinum, Clostridium perfringens, Clostridium beijerinckii and Clostridium butyricum

The core genome of Group I strains consisted of 2,758 genes. This corresponds to 70% of the genes of an average Group I genome. The Group II core genome consisted of 2,456 genes (67%). A total core genome of C. botulinum, evaluated based on the 16 studied genomes, contained 1,076 genes for which a predicted orthologue was found from every strain. This corresponds to 29% of the genes of an average C. botulinum genome (Table 4).

Table 4 Sizes of the calculated core genomes for Clostridium botulinum and for C. botulinum Groups I and II separately

| Group | Number of genomes | Size of core genome (clusters) | Proportion of average genome of the group (%) |
|---|---|---|---|
| Groups I–III | 16 | 1,076 | 29.0 |
| Group I | 10 | 2,758 | 70.4 |
| Group II | 4 | 2,456 | 66.9 |
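The relationship between the two numeric columns of Table 4 can be checked with a short calculation. The sketch below back-calculates the average number of genes per genome implied by each core genome size and its percentage share; the variable and function names are illustrative, and the resulting averages are derived here rather than reported in the study.

```python
# (core genome size in clusters, proportion of the average genome in %), from Table 4
table4 = {
    "Groups I-III": (1076, 29.0),
    "Group I": (2758, 70.4),
    "Group II": (2456, 66.9),
}


def implied_average_genome(core_size, proportion_pct):
    """Average gene count implied by a core size and its percentage share."""
    return core_size / (proportion_pct / 100.0)


implied = {group: round(implied_average_genome(size, pct))
           for group, (size, pct) in table4.items()}
for group, genes in implied.items():
    print(f"{group}: about {genes} genes per average genome")
```

The implied averages fall between roughly 3,700 and 3,900 genes, which shows that the three rows of the table are mutually consistent.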


Table 5 Number of genes based on gene role category in Clostridium botulinum Groups I and II

| Gene role category^a | Reference strain Group I A1 ATCC 3502 (number of genes) | Reference strain Group II E3 Alaska (number of genes) | Core genome, Group I (ten strains): number of genes (% of the reference strain) | Core genome, Group II (four strains): number of genes (% of the reference strain) |
|---|---|---|---|---|
| Amino acid biosynthesis | 63 | 97 | 54 (86) | 82 (85) |
| Biosynthesis of cofactors, prosthetic groups and carriers | 102 | 103 | 97 (95) | 90 (87) |
| Cell envelope | 301 | 290 | 235 (78) | 219 (76) |
| Cellular processes | 254 | 295 | 208 (82) | 236 (80) |
| Sporulation and germination | 40 | 82 | 39 (98) | 78 (95) |
| Pathogenesis | 13 | 18 | 10 (77) | 8 (44) |
| Adaptation to atypical conditions | 19 | 16 | 17 (89) | 13 (81) |
| Toxin production and resistance | 61 | 31 | 37 (61) | 19 (61) |
| Chemotaxis and motility | 73 | 94 | 61 (84) | 72 (77) |
| Cell division | 25 | 25 | 25 (100) | 24 (96) |
| DNA metabolism | 128 | 121 | 97 (76) | 100 (83) |
| Energy metabolism | 258 | 288 | 236 (91) | 245 (85) |
| Fatty acid and phospholipid metabolism | 37 | 51 | 36 (97) | 42 (82) |
| Protein fate | 164 | 122 | 142 (87) | 113 (93) |
| Protein synthesis | 171 | 146 | 163 (95) | 132 (90) |
| Central intermediary metabolism | 69 | 91 | 58 (84) | 73 (80) |
| Purines, pyrimidines, nucleosides and nucleotides | 86 | 79 | 80 (93) | 74 (94) |
| Regulatory functions | 292 | 253 | 224 (77) | 198 (78) |
| Signal transduction | 65 | 46 | 53 (82) | 41 (89) |
| Transcription | 76 | 59 | 64 (84) | 52 (88) |
| Transport and binding proteins | 521 | 377 | 436 (84) | 300 (80) |
| Mobile and extrachromosomal element functions | 67 | 46 | 16 (23) | 14 (30) |
| Unknown function | | | | |
| Enzymes of unknown function | 165 | 157 | 144 (87) | 121 (77) |
| Unknown function | 275 | 227 | 229 (83) | 196 (86) |
| Disrupted reading frame | 8 | 1 | 5 (63) | 0 (0) |
| Conserved hypothetical proteins | 510 | 426 | 383 (75) | 287 (67) |
| Hypothetical proteins with conserved domain | 31 | 25 | 15 (48) | 17 (68) |
| Hypothetical proteins | 297 | 312 | 128 (43) | 124 (40) |

Note: ^a http://pathema.org.


The size of the pangenome of all C. botulinum genomes analysed was evaluated according to the results of orthologous clustering. The pangenome consisted of 18,385 genes.

3.4 Comparison of Group I and II C. botulinum core genomes

The core genomes of Group I and Group II C. botulinum were further characterised based on the functional classification of the genes of strain I A1 ATCC 3502 and strain II E3 Alaska (Table 5). Approximately 80% of the genes of I A1 ATCC 3502 classified in functional categories were conserved in the Group I core genome, and 76% of the classified genes of II E3 Alaska were conserved in the Group II core genome. Some variation in conservation between the categories was observed. In both core genomes, the highest degree of conservation was, as expected, found for CDSs with predicted functions in cell division (100% in the Group I core genome and 96% in Group II) and the lowest for those encoding mobile and extrachromosomal elements (23% in Group I, 30% in Group II). Large variation was also noted in genes with predicted functions in pathogenesis and in toxin production and resistance. The proportion of conserved genes in these categories was 77% and 61% for Group I genomes and only 44% and 61% for Group II genomes, respectively.
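The percentages quoted for individual categories follow directly from the gene counts in Table 5. The sketch below recomputes them for the four categories discussed above; the dictionary layout and function name are illustrative assumptions, and the recomputed values may differ from the printed ones by up to one percentage point because of rounding.

```python
# (genes in the reference strain, genes in the group core genome), from Table 5
group1 = {
    "Cell division": (25, 25),
    "Pathogenesis": (13, 10),
    "Toxin production and resistance": (61, 37),
    "Mobile and extrachromosomal element functions": (67, 16),
}
group2 = {
    "Cell division": (25, 24),
    "Pathogenesis": (18, 8),
    "Toxin production and resistance": (31, 19),
    "Mobile and extrachromosomal element functions": (46, 14),
}


def conservation(counts):
    """Percentage of reference-strain genes retained in the core genome, per category."""
    return {cat: 100.0 * core / ref for cat, (ref, core) in counts.items()}


cons1 = conservation(group1)
cons2 = conservation(group2)
for category in group1:
    print(f"{category}: Group I {cons1[category]:.1f}%, Group II {cons2[category]:.1f}%")

# Most and least conserved of the four categories listed here.
most_conserved = max(cons1, key=cons1.get)
least_conserved = min(cons1, key=cons1.get)
```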

4 Discussion

Group I and Group III genomes contained two or three csp genes. Some of the Group I strains and both Group III strains appeared to lack a single csp homologue in otherwise conserved genomic loci. CSPs are known to compensate for each other’s functions (Palonen et al., 2010), suggesting that not all csp genes present in a genome are essential for cold tolerance. We have previously shown that inactivation of cspB resulted in impaired growth at 15°C (Söderholm et al., 2011). The fact that all Group I and III strains have a homologue of cspB further strengthens the proposed role of cspB as the major CSP in C. botulinum Group I (Söderholm et al., 2011). The role of cspA and cspC is unclear, since inactivation of these genes did not cause major defects in growth at temperatures below 37°C, the expected optimal growth temperature of Group I C. botulinum (Söderholm et al., 2011). Moreover, no cspC homologues were found in Group I strains Kyoto, Bf, 657 and NCTC 2916 or in either of the two Group III strains. In other bacteria, some CSPs have been proposed to be related to chromosomal condensation (Yamanaka et al., 1998) or to be associated with stationary phase events (Graumann and Marahiel, 1998; Yamanaka and Inouye, 1997). Further research is warranted to reveal whether cspA and cspC in C. botulinum are involved in cellular processes besides cold tolerance, and whether different CSPs compensate for each other.

Considering that CSPs are conserved across kingdoms (Cavicchioli, 2006; Palonen et al., 2010), their scarcity in the psychrotrophic Group II strains was very surprising. Of the Group II strains, only B Eklund was found to contain one probable CSP coding gene, while the type E strains, which show an equal or even lower minimum growth temperature than B Eklund (Derman et al., 2011), lacked these genes. According to Graumann et al. (1997), csp homologues are essential for the growth of Bacillus subtilis, a model organism of spore-forming bacteria, at 15°C. As indicated above, this also applied to cspB, and to a lesser extent to cspC, in the mesophilic C. botulinum ATCC 3502 (Söderholm et al., 2011). The scarcity of csp genes in the psychrotrophic Group II C. botulinum suggests that some other, as yet unknown mechanism lies behind the cold shock response of type E strains. Although csp genes have mostly been studied in the mesophilic E. coli, homologues have also been found in other psychrotrophs, such as representatives of the genera Yersinia (Palonen et al., 2010), Listeria (Bayles et al., 1996; Wemekamp-Kamphuis et al., 2002) and Pseudomonas (Gumley and Inniss, 1996; Michel et al., 1997). The synthesis of cold induced proteins in psychrotrophic bacteria seems to be actively regulated and the relative level of their synthesis is lower than in mesophiles. However, at least some of them are considered to be essential for the psychrotrophs to survive cold shock (Hébraud and Potier, 1999). In general, despite their important role in the cold shock response, CSPs are probably a small part of a large, complex regulatory network ensuring cold tolerance in all species. How their role differs between psychrotrophic and mesophilic bacteria is an interesting topic for future research.

Analysis of the number of orthologous gene pairs gives a preliminary view of the similarity between two strains and creates a base for genomic analysis. In the case of C. botulinum, although variation between the numbers of orthologous gene pairs within each physiological group was observed, no strain differed greatly from the others within each group, and orthologues were identified evenly between strains within each group. Most of the predicted orthologous pairs had high BSRs, and the annotation was similar among the paired genes. Within the groups, all orthologous CDSs were more than 70% identical, most of them even more than 90% identical. The principle of the method is thus simple but proven effective (Altenhoff and Dessimoz, 2009).
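For readers unfamiliar with the BSR (BLAST score ratio) mentioned above, the following sketch shows the general idea: the bit score of a gene against its best hit in another genome is divided by the bit score of the gene against itself, giving a value between 0 and 1. The function names, the toy scores and in particular the threshold of 0.4 are assumptions chosen for illustration; the cut-off applied in this study is not restated here.

```python
def blast_score_ratio(hit_score, self_score):
    """BLAST score ratio: score against the other genome divided by the self-hit score."""
    if self_score <= 0:
        raise ValueError("self-hit score must be positive")
    return hit_score / self_score


def high_bsr_pairs(best_hits, self_scores, threshold=0.4):
    """Keep (gene, hit, BSR) triples whose BSR reaches the threshold.

    The default threshold is a placeholder for illustration only.
    """
    kept = []
    for gene, (hit, score) in best_hits.items():
        bsr = blast_score_ratio(score, self_scores[gene])
        if bsr >= threshold:
            kept.append((gene, hit, bsr))
    return kept


# Hypothetical bit scores for two genes of one strain against another strain.
best_hits = {"geneA": ("geneA_prime", 480.0), "geneB": ("geneB_prime", 95.0)}
self_scores = {"geneA": 500.0, "geneB": 400.0}
pairs = high_bsr_pairs(best_hits, self_scores)
print(pairs)
```

Normalising by the self-hit score makes values comparable between short and long proteins, which raw bit scores are not.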

The core genomes of Group I (2,758 clusters) and Group II (2,456 clusters), constructed based on the orthologous gene pairs, contained more than twice the number of genes of the whole species core genome (1,076 genes). The core genome of Group I corresponds to the one predicted by Carter et al. (2009). According to their microarray studies, 63% of the genes of I A1 ATCC 3502 were conserved in 61 studied Group I C. botulinum and C. sporogenes strains. In ten similar type A1 neurotoxin-producing strains, 89% of the genes of I A1 ATCC 3502 were conserved (Carter et al., 2009).

For our analysis of the total core genome, 16 strains were used, of which only two represented Group III and none represented Group IV. Thus, the true core genome of C. botulinum is probably considerably smaller than now predicted. The core genome of the C. botulinum groups studied likely resembles the core genome of clostridia in general: three distinct species of clostridia were shown to share an average of 850 to 950 genes (Paredes et al., 2005). The core genome established for C. botulinum in this study was only slightly larger, consisting of 1,076 genes. Scaria et al. (2010) predicted the core genome of Clostridium difficile to contain 947 to 1,033 genes, which is considered unusually low for an individual bacterial species. However, based on the existing knowledge of the diversity among C. botulinum genomes representing the different physiological groups (Keto-Timonen et al., 2005, 2006), the finding of a remarkably small core genome was expected.

The pangenome of all the 16 C. botulinum strains analysed consisted of 18,385 orthologous clusters; in other words, an individual genome would contain approximately 20% of the possible genes of the species. In reality, the percentage is probably even lower and the real pangenome of C. botulinum is likely to be larger than that estimated here. Including more genomes in the analysis would presumably have increased the predicted size of the pangenome. The estimated size of the C. difficile pangenome increased notably upon addition of any single genome in the analysis, until a plateau of 9,640 genes was reached. Approximately 26 genomes were required to determine the size of the pangenome for C. difficile (Scaria et al., 2010). Compared with the pangenomes reported for most other bacteria, that of C. botulinum appears to be very large. Only the pangenome of Escherichia coli has been estimated to be of a similar order of magnitude, at 13,000 genes (Rasko et al., 2008). A large pangenome reflects the heterogeneity and wide adaptation of a species (Tettelin et al., 2005), and is not a surprising finding for C. botulinum, the strains of which form four genetically and phenotypically distinct groups that have challenged the species concept for decades (Collins and East, 1998; Hill et al., 2007; Hutson et al., 1993; Lindström and Korkeala, 2006). A large pangenome also suggests that, as opposed to bacteria restricted to the homeostatic shelter of mammalian hosts, the four groups of C. botulinum require a large gene pool for adaptation to various soil and aquatic environments.
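The pangenome and core genome are, respectively, the union and the intersection of the orthologous clusters carried by the individual strains. The sketch below illustrates this on hypothetical toy data and then checks the 20% figure quoted above against the core genome size and proportion reported in Table 4. The function name and the toy cluster identifiers are assumptions, and the average genome size is back-calculated here rather than taken from the study.

```python
def pan_and_core(strain_clusters):
    """Return (pangenome, core genome) as sets of cluster identifiers."""
    cluster_sets = [set(clusters) for clusters in strain_clusters.values()]
    pangenome = set().union(*cluster_sets)   # clusters found in at least one strain
    core = set.intersection(*cluster_sets)   # clusters found in every strain
    return pangenome, core


# Hypothetical toy input: the orthologous clusters carried by each strain.
toy = {
    "strain_1": {"c1", "c2", "c3"},
    "strain_2": {"c1", "c2", "c4"},
    "strain_3": {"c1", "c5"},
}
pan, core = pan_and_core(toy)
print(len(pan), len(core))

# Consistency check with the reported figures: 1,076 core genes are 29.0% of an
# average genome, and the pangenome holds 18,385 clusters.
core_size, core_share_pct, pangenome_size = 1076, 29.0, 18385
average_genome = core_size / (core_share_pct / 100.0)
share_of_pangenome = 100.0 * average_genome / pangenome_size
print(f"average genome about {average_genome:.0f} genes, "
      f"{share_of_pangenome:.1f}% of the pangenome")
```

Because the union can only grow and the intersection can only shrink as strains are added, including more genomes would raise the pangenome estimate and lower the core genome estimate, as argued in the text.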

The relatively large variation observed in CDSs related to mobile and extrachromosomal functions was not surprising. The variety of plasmids is complemented by the prophages and transposon-related CDSs present in most genomes. Nevertheless, in I A1 ATCC 3502 only approximately 1% of all CDSs represented mobile and extrachromosomal elements, whereas the proportion of CDSs representing this functional category in C. difficile strain 630 was tenfold higher (Sebaihia et al., 2007). These figures suggest that, among clostridia, individual C. botulinum genomes are relatively stable. In addition to mobile elements, CDSs with predicted functions in pathogenesis and in toxin production and resistance showed variation. This was expected and has been extensively discussed by other authors (Carter et al., 2009; Hill et al., 2009; Macdonald et al., 2011).

In conclusion, the scarcity of the highly conserved CSD-containing csp genes in the psychrotrophic C. botulinum type E strains suggests unique cold tolerance machinery in these strains. Further studies are warranted to reveal the genetic background of the stress and adaptive mechanisms of these strains. As expected, the pangenome of all strains included in the analysis was large, reflecting the heterogeneity of the four groups of C. botulinum and the distinct adaptation of individual strains and of the different physiological groups to different environments. The core genome among the 16 C. botulinum strains studied was, as expected, very small relative to that of many other bacterial species, and the group-related core genomes were more than twice as large as the total core genome. These findings emphasise the distinct qualities of the physiological C. botulinum groups.

Acknowledgements

The work was performed in the Centre of Excellence in Microbial Food Safety Research and funded by the Academy of Finland (118602, 141140) and the ABS Graduate School.

References

Altenhoff, A.M. and Dessimoz, C. (2009) ‘Phylogenetic and functional assessment of orthologs inference projects and methods’, PLoS Computational Biology, Vol. 5, No. 1, p.e1000262.

Bayles, D.O., Annous, B.A. and Wilkinson, B.J. (1996) ‘Cold stress proteins induced in Listeria monocytogenes in response to temperature downshock and growth at low temperatures’, Applied and Environmental Microbiology, Vol. 62, No. 3, pp.1116–1119.


Brinkac, L.M., Davidsen, T., Beck, E., Ganapathy, A., Caler, E., Dodson, R.J., Durkin, A.S., Harkins, D.M., Lorenzi, H., Madupu, R., Sebastian, Y., Shrivastava, S., Thiagarajan, M., Orvis, J., Sundaram, J.P., Crabtree, J., Galens, K., Zhao, Y., Inman, J.M., Montgomery, R., Schobel, S., Galinsky, K., Tanenbaum, D.M., Resnick, A., Zafar, N., White, O. and Sutton, G. (2010) ‘Pathema: a clade-specific bioinformatics resource center for pathogen research’, Nucleic Acids Research, Vol. 38, Database issue, pp.D408–414.

Carter, A.T., Paul, C.J., Mason, D.R., Twine, S.M., Alston, M.J., Logan, S.M., Austin, J.W. and Peck, M.W. (2009) ‘Independent evolution of neurotoxin and flagellar genetic loci in proteolytic Clostridium botulinum’, BMC Genomics, Vol. 10, p.115, doi:10.1186/1471-2164-10-115.

Cavicchioli, R. (2006) ‘Cold-adapted archaea’, Nature Reviews Microbiology, Vol. 4, No. 5, pp.331–343.

Chen, Y., Korkeala, H., Aarnikunnas, J. and Lindström, M. (2007) ‘Sequencing the botulinum neurotoxin gene and related genes in Clostridium botulinum type E strains reveals orfx3 and a novel type E neurotoxin subtype’, Journal of Bacteriology, Vol. 189, No. 23, pp.8643–8650.

Collins, M.D. and East, A.K. (1998) ‘Phylogeny and taxonomy of the food-borne pathogen Clostridium botulinum and its neurotoxins’, Journal of Applied Microbiology, Vol. 84, No. 1, pp.5–17.

Derman, Y., Lindström, M., Selby, K. and Korkeala, H. (2011) ‘Growth of group II Clostridium botulinum strains at extreme temperature’, Journal of Food Protection, Vol. 74, No. 11, pp.1797–1804.

Ehrlich, G.D., Hu, F.Z., Shen, K., Stoodley, P. and Post, J.C. (2005) ‘Bacterial plurality as a general mechanism driving persistence in chronic infections’, Clinical Orthopaedics and Related Research, August, Vol. 437, pp.20–24.

Ermolenko, D.N. and Makhatadze, G.I. (2002) ‘Bacterial cold-shock proteins’, Cellular and Molecular Life Sciences, Vol. 59, No. 11, pp.1902–1913.

Gillespie, J.J., Wattam, A.R., Cammer, S.A., Gabbard, J.L., Shukla, M.P., Dalay, O., Driscoll, T., Hix, D., Mane, S.P., Mao, C., Nordberg, E.K., Scott, M., Schulman, J.R., Snyder, E.E., Sullivan, D.E., Wang, C., Warren, A., Williams, K.P., Xue, T., Yoo, H.S., Zhang, C., Zhang, Y., Will, R., Kenyon, R.W. and Sobral, B.W. (2011) ‘PATRIC: the comprehensive bacterial bioinformatics resource with a focus on human pathogenic species’, Infection and Immunity, Vol. 79, No. 11, pp.4286–4298.

Graumann, P. and Marahiel, M.A. (1996) ‘Some like it cold: response of microorganisms to cold shock’, Archives of Microbiology, Vol. 166, No. 5, pp.293–300.

Graumann, P. and Marahiel, M.A. (1997) ‘Effects of heterologous expression of CspB, the major cold shock protein of Bacillus subtilis, on protein synthesis in Escherichia coli’, Molecular and General Genetics, Vol. 253, No. 6, pp.745–752.

Graumann, P., Wendrich, T.M., Weber, M.H., Schroder, K. and Marahiel, M.A. (1997) ‘A family of cold shock proteins in Bacillus subtilis is essential for cellular growth and for efficient protein synthesis at optimal and low temperatures’, Molecular Microbiology, Vol. 25, No. 4, pp.741–756.

Graumann, P.L. and Marahiel, M.A. (1998) ‘A superfamily of proteins that contain the cold-shock domain’, Trends in Biochemical Sciences, Vol. 23, No. 8, pp.286–290.

Gumley, A.W. and Inniss, W.E. (1996) ‘Cold shock proteins and cold acclimation proteins in the psychrotrophic bacterium Pseudomonas putida Q5 and its transconjugant’, Canadian Journal of Microbiology, Vol. 42, No. 8, pp.798–803.

Hébraud, M. and Potier, P. (1999) ‘Cold shock response and low temperature adaptation in psychrotrophic bacteria’, Journal of Molecular Microbiology and Biotechnology, Vol. 1, No. 2, pp.211–219.

Hill, K.K., Smith, T.J., Helma, C.H., Ticknor, L.O., Foley, B.T., Svensson, R.T., Brown, J.L., Johnson, E.A., Smith, L.A., Okinaka, R.T., Jackson, P.J. and Marks, J.D. (2007) ‘Genetic diversity among botulinum neurotoxin-producing clostridial strains’, Journal of Bacteriology, Vol. 189, No. 3, pp.818–832.


Hill, K.K., Xie, G., Foley, B.T., Smith, T.J., Munk, A.C., Bruce, D., Smith, L.A., Brettin, T.S. and Detter, J.C. (2009) ‘Recombination and insertion events involving the botulinum neurotoxin complex genes in Clostridium botulinum types A, B, E and F and Clostridium butyricum type E strains’, BMC Biology, Vol. 7, p.66, doi:10.1186/1741-7007-7-66.

Horn, G., Hofweber, R., Kremer, W. and Kalbitzer, H.R. (2007) ‘Structure and function of bacterial cold shock proteins’, Cellular and Molecular Life Sciences, Vol. 64, No. 12, pp.1457–1470.

Hutson, R.A., Thompson, D.E., Lawson, P.A., Schocken-Itturino, R.P., Böttger, E.C. and Collins, M.D. (1993) ‘Genetic interrelationships of proteolytic Clostridium botulinum types A, B and F and other members of the Clostridium botulinum complex as revealed by small-subunit rRNA gene sequences’, Antonie van Leeuwenhoek, Vol. 64, Nos. 3–4, pp.273–283.

Keto-Timonen, R., Heikinheimo, A., Eerola, E. and Korkeala, H. (2006) ‘Identification of Clostridium species and DNA fingerprinting of Clostridium perfringens by amplified fragment length polymorphism analysis’, Journal of Clinical Microbiology, Vol. 44, No. 11, pp.4057–4065.

Keto-Timonen, R., Nevas, M. and Korkeala, H. (2005) ‘Efficient DNA fingerprinting of Clostridium botulinum types A, B, E, and F by amplified fragment length polymorphism analysis’, Applied and Environmental Microbiology, Vol. 71, No. 3, pp.1148–1154.

Lindström, M. and Korkeala, H. (2006) ‘Laboratory diagnostics of botulism’, Clinical Microbiology Reviews, Vol. 19, No. 2, pp.298–314.

Lindström, M., Hielm, S., Nevas, M., Tuisku, S. and Korkeala, H. (2004) ‘Proteolytic Clostridium botulinum type B in the gastric content of a patient with type E botulism due to whitefish eggs’, Foodborne Pathogens and Disease, Vol. 1, No. 1, pp.53–57.

Macdonald, T.E., Helma, C.H., Shou, Y., Valdez, Y.E., Ticknor, L.O., Foley, B.T., Davis, S.W., Hannett, G.E., Kelly-Cirino, C.D., Barash, J.R., Arnon, S.S., Lindström, M., Korkeala, H., Smith, L.A., Smith, T.J. and Hill, K.K. (2011) ‘Analysis of Clostridium botulinum serotype E strains by using multilocus sequence typing, amplified fragment length polymorphism, variable-number tandem-repeat analysis, and botulinum neurotoxin gene sequencing’, Applied and Environmental Microbiology, Vol. 77, No. 24, pp.8625–8634.

Margulies, M., Egholm, M., Altman, W.E., Attiya, S., Bader, J.S., Bemben, L.A., Berka, J., Braverman, M.S., Chen, Y.J., Chen, Z., Dewell, S.B., Du, L., Fierro, J.M., Gomes, X.V., Godwin, B.C., He, W., Helgesen, S., Ho, C.H., Irzyk, G.P., Jando, S.C., Alenquer, M.L., Jarvie, T.P., Jirage, K.B., Kim, J.B., Knight, J.R., Lanza, J.R., Leamon, J.H., Lefkowitz, S.M., Lei, M., Li, J., Lohman, K.L., Lu, H., Makhijani, V.B., McDade, K.E., McKenna, M.P., Myers, E.W., Nickerson, E., Nobile, J.R., Plant, R., Puc, B.P., Ronan, M.T., Roth, G.T., Sarkis, G.J., Simons, J.F., Simpson, J.W., Srinivasan, M., Tartaro, K.R., Tomasz, A., Vogt, K.A., Volkmer, G.A., Wang, S.H., Wang, Y., Weiner, M.P., Yu, P., Begley, R.F., Rothberg, J.M. (2005) ‘Genome sequencing in microfabricated high-density picolitre reactors’, Nature, Vol. 437, No. 7057, pp.376–380.

Medini, D., Donati, C., Tettelin, H., Masignani, V. and Rappuoli, R. (2005) ‘The microbial pan-genome’, Current Opinion in Genetics & Development, Vol. 15, No. 6, pp.589–594.

Michel, V., Lehoux, I., Depret, G., Anglade, P., Labadie, J. and Hebraud, M. (1997) ‘The cold shock response of the psychrotrophic bacterium Pseudomonas fragi involves four low-molecular-mass nucleic acid-binding proteins’, Journal of Bacteriology, Vol. 179, No. 23, pp.7331–7342.

Mihailovich, M., Militti, C., Gabaldon, T. and Gebauer, F. (2010) ‘Eukaryotic cold shock domain proteins: highly versatile regulators of gene expression’, Bioessays, Vol. 32, No. 2, pp.109–118.

Palonen, E., Lindström, M. and Korkeala, H. (2010) ‘Adaptation of enteropathogenic Yersinia to low growth temperature’, Critical Reviews in Microbiology, Vol. 36, No. 1, pp.54–67.

Paredes, C.J., Alsaker, K.V. and Papoutsakis, E.T. (2005) ‘A comparative genomic view of clostridial sporulation and physiology’, Nature Reviews Microbiology, Vol. 3, No. 12, pp.969–978.


Phadtare, S. and Inouye, M. (2001) ‘Role of CspC and CspE in regulation of expression of RpoS and UspA, the stress response proteins in Escherichia coli’, Journal of Bacteriology, Vol. 183, No. 4, pp.1205–1214.

Phadtare, S. and Inouye, M. (2004) ‘Genome-wide transcriptional analysis of the cold shock response in wild-type and cold-sensitive, quadruple-csp-deletion strains of Escherichia coli’, Journal of Bacteriology, Vol. 186, No. 20, pp.7007–7014.

Rasko, D.A., Myers, G.S. and Ravel, J. (2005) ‘Visualization of comparative genomic analyses by BLAST score ratio’, BMC Bioinformatics, Vol. 6, p.2, doi:10.1186/1471-2105-6-2.

Rasko, D.A., Rosovitz, M.J., Myers, G.S., Mongodin, E.F., Fricke, W.F., Gajer, P., Crabtree, J., Sebaihia, M., Thomson, N.R., Chaudhuri, R., Henderson, I.R., Sperandio, V. and Ravel, J. (2008) ‘The pangenome structure of Escherichia coli: comparative genomic analysis of E. coli commensal and pathogenic isolates’, Journal of Bacteriology, Vol. 190, No. 20, pp.6881–6893.

Scaria, J., Ponnala, L., Janvilisri, T., Yan, W., Mueller, L.A. and Chang, Y-F. (2010) ‘Analysis of ultra low genome conservation in Clostridium difficile’, PLoS ONE, Vol. 5, No. 12, p.e15147.

Sebaihia, M., Peck, M.W., Minton, N.P., Thomson, N.R., Holden, M.T., Mitchell, W.J., Carter, A.T., Bentley, S.D., Mason, D.R., Crossman, L., Paul, C.J., Ivens, A., Wells-Bennik, M.H., Davis, I.J., Cerdeno-Tarraga, A.M., Churcher, C., Quail, M.A., Chillingworth, T., Feltwell, T., Fraser, A., Goodhead, I., Hance, Z., Jagels, K., Larke, N., Maddison, M., Moule, S., Mungall, K., Norbertczak, H., Rabbinowitsch, E., Sanders, M., Simmonds, M., White, B., Whithead, S. and Parkhill, J. (2007) ‘Genome sequence of a proteolytic (Group I) Clostridium botulinum strain Hall A and comparative analysis of the clostridial genomes’, Genome Research, Vol. 17, No. 7, pp.1082–1092.

Smith, T.J., Lou, J., Geren, I.N., Forsyth, C.M., Tsai, R., Laporte, S.L., Tepp, W.H., Bradshaw, M., Johnson, E.A., Smith, L.A. and Marks, J.D. (2005) ‘Sequence variation within botulinum neurotoxin serotypes impacts antibody binding and neutralization’, Infection and Immunity, Vol. 73, No. 9, pp.5450–5457.

Söderholm, H., Lindström, M., Somervuo, P., Heap, J., Minton, N., Lindén, J. and Korkeala, H. (2011) ‘cspB encodes a major cold shock protein in Clostridium botulinum ATCC 3502’, International Journal of Food Microbiology, Vol. 146, No. 1, pp.23–30.

Tettelin, H., Masignani, V., Cieslewicz, M.J., Donati, C., Medini, D., Ward, N.L., Angiuoli, S.V., Crabtree, J., Jones, A.L., Durkin, A.S., Deboy, R.T., Davidsen, T.M., Mora, M., Scarselli, M., Margarit y Ros, I., Peterson, J.D., Hauser, C.R., Sundaram, J.P., Nelson, W.C., Madupu, R., Brinkac, L.M., Dodson, R.J., Rosovitz, M.J., Sullivan, S.A., Daugherty, S.C., Haft, D.H., Selengut, J., Gwinn, M.L., Zhou, L., Zafar, N., Khouri, H., Radune, D., Dimitrov, G., Watkins, K., O’Connor, K.J., Smith, S., Utterback, T.R., White, O., Rubens, C.E., Grandi, G., Madoff, L.C., Kasper, D.L., Telford, J.L., Wessels, M.R., Rappuoli, R. and Fraser, C.M. (2005) ‘Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial ‘pan-genome’’, Proceedings of the National Academy of Sciences of the United States of America, Vol. 102, No. 39, pp.13950–13955.

Wemekamp-Kamphuis, H.H., Karatzas, A.K., Wouters, J.A. and Abee, T. (2002) ‘Enhanced levels of cold shock proteins in Listeria monocytogenes LO28 upon exposure to low temperature and high hydrostatic pressure’, Applied and Environmental Microbiology, Vol. 68, No. 2, pp.456–463.

Wistow, G. (1990) ‘Cold shock and DNA binding’, Nature, Vol. 344, No. 6269, pp.823–824.

Wu, J. and Xie, J. (2010) ‘Hidden Markov model and its applications in motif findings’, in Bang, H. et al. (Eds.): Statistical Methods in Molecular Biology, series, Vol. 620, pp.405–416.

Yamanaka, K. and Inouye, M. (1997) ‘Growth-phase-dependent expression of cspD, encoding a member of the CspA family in Escherichia coli’, Journal of Bacteriology, Vol. 179, No. 16, pp.5126–5130.

Yamanaka, K., Fang, L. and Inouye, M. (1998) ‘The CspA family in Escherichia coli: multiple gene duplication for stress adaptation’, Molecular Microbiology, Vol. 27, No. 2, pp.247–255.