Comparative Sequence Analysis of the DNA Packaging, Head, and Tail Morphogenesis Modules in the Temperate cos-Site Streptococcus thermophilus Bacteriophage Sfi21

Frank Desiere, Sacha Lucchini, and Harald Brüssow¹

Nestlé Research Centre, Nestec Ltd., Vers-chez-les-Blanc, CH-1000 Lausanne 26, Switzerland

Received December 22, 1998; returned to author for revision May 13, 1999; accepted June 1, 1999

Virology 260, 244–253 (1999). Article ID viro.1999.9830, available online at http://www.idealibrary.com. 0042-6822/99 $30.00. Copyright © 1999 by Academic Press. All rights of reproduction in any form reserved.

¹ To whom reprint requests should be addressed. Fax: 0041-21-785-8925. E-mail: [email protected].

The temperate Streptococcus thermophilus bacteriophage Sfi21 possesses 15-nucleotide-long cohesive ends with a 3′ overhang that reconstitutes a cos-site with twofold hyphenated rotational symmetry. Over the DNA packaging, head and tail morphogenesis modules, the Sfi21 sequence predicts a gene map that is strikingly similar to that of lambdoid coliphages in the absence of any sequence similarity. A nearly one to one gene correlation was found with the phage lambda genes Nu1 to H, except for gene B-to-E complex, where the Sfi21 map resembled that of coliphage HK97. The similarity between Sfi21 and HK97 was striking: both major head proteins showed an N-terminal coiled-coil structure, the mature major head proteins started at amino acid positions 105 and 104, respectively, and both major head genes were preceded by genes encoding a possible protease and portal protein. The purported Sfi21 protease is the first viral member of the ClpP protease family. The prediction of Sfi21 gene functions by reference to the gene map of intensively investigated coliphages was experimentally confirmed for the major head and tail gene. Phage Sfi21 shows nucleotide sequence similarity with Lactococcus phage BK5-T and a lactococcal prophage and amino acid sequence similarity with the Lactobacillus phage A2 and the Staphylococcus phage PVL. PVL is a missing link that connects the portal proteins from Sfi21 and HK97 with respect to sequence similarity. These observations and database searches, which demonstrate sequence similarity between proteins of phage from gram-positive bacteria, proteobacteria, and Archaea, constrain models of phage evolution. © 1999 Academic Press

INTRODUCTION

Streptococcus thermophilus is a gram-positive thermophilic lactic acid bacterium used in industrial milk fermentation (Mercenier, 1990). Phage attack has always been a major problem in the dairy industry and is associated with substantial economical loss. This is also the case for S. thermophilus (Brüssow et al., 1998). The genetic construction of phage-resistant bacterial starters is thus an important biotechnological goal. In contrast to the related genus Lactococcus, lactic streptococci possess few plasmids encoding natural defence systems of the bacterial host against phage infection (Larbi et al., 1992). We therefore turned our attention to the bacteriophage genome to identify and exploit phage-encoded control systems that interfere with phage replication (Foley et al., 1998). Because relatively few data were available on the biology of these phages (reviewed in Brüssow et al., 1998; Brüssow, 1999) and because tools for genetic analysis in lactic streptococci are not very developed, several laboratories decided to sequence, as a first step, a complete phage genome to obtain an overview of their genome organization (Stanley et al., 1997). However, in many bacterial genome sequencing projects, only about half of all identified open reading frames (ORFs) find a match in the database. This situation is even worse in bacteriophage genome sequencing projects: rarely more than a fourth of the phage ORFs match entries in the database. A possible remedy to this situation could be the sequencing of genetically closely related phages that differ in phenotypes. We applied this approach to S. thermophilus bacteriophages that differed in DNA packaging (cos- or pac-type phages), lifestyle (lytic or temperate phages), host range, or sensitivity to prophage-mediated superinfection exclusion (Desiere et al., 1998; Lucchini et al., 1998; Lucchini et al., 1999a,b). This analysis allowed the establishment of associations between phenotype and genotype and thus the attribution of possible functions to a few ORFs.

When comparing the genome of a pac-type S. thermophilus bacteriophage Sfi11 with Siphoviridae from gram-positive bacteria, we observed not only a similar gene order but also sequence relationships with a Lactococcus and a Bacillus phage (Lucchini et al., 1998). In addition, similarity with the structural gene map of phage λ was noted for the Lactococcus phage sk1 (Chandry et al., 1997), the Streptococcus phage Sfi11 (Lucchini et al., 1998), and the Bacillus phage SPP1 (Becker et al., 1997). Because phage λ is the best investigated biological system, the similarity might be exploited to predict gene functions in gram-positive bacteria to lead future biological experimentation. However, for the prediction of gene functions, the observed correlations between the gene


maps of the phages investigated were not sufficiently close. We suspected that the smaller genome size of the lactococcal cos-site phage sk1 (28 kb) (Chandry et al., 1997) or the different DNA packaging mechanism of the pac-site streptococcal phage Sfi11 and B. subtilis phage SPP1 prevented a closer match with the gene map of the cos-site coliphage. Therefore, we investigated the genome organization of the 40-kb cos-site S. thermophilus phage Sfi21. Over a 15-kb structural gene cluster, we observed a nearly 1:1 gene/ORF correlation between the two phages.

The theoretical basis for this correlation is not clear because the comparable genome organization was found in the absence of any sequence relationship. Taxonomically oriented virologists favour the comparison of genomic maps over protein sequence alignments for the understanding of phage evolution because sequence alignments have, until now, essentially confirmed relationships between phages that were known to be related (Ackermann, 1999). However, the total number of phage sequences in the database is small compared with other organisms and the lack of protein sequence alignments between more distantly related phages might reflect an observation bias. Currently, the situation is about to change with the introduction of many new phage sequences into the database. We demonstrate here similarities not only in the gene organization but partially also in protein sequences of phages whose hosts include, besides streptococci and its close relatives, Escherichia coli, Rhodobacter (a proteobacterium of the gamma subdivision), and an archaeon.

RESULTS

The cos-site of Sfi21

During direct sequencing of the heat-treated unligated phage DNA by primer walking, we observed an abrupt end of the sequencing reaction upstream of ORF 152 (Fig. 1). When re-annealed, ligated Sfi21 DNA was used as the template, direct DNA sequencing with a primer located in ORF 152 led into ORF 175 sequences, and vice versa. When the heat-treated Sfi21 DNA was first digested with T4 DNA polymerase, which carries a 3′-to-5′ exonuclease activity, then ligated and subsequently sequenced, we obtained an identical nucleotide sequence except for the absence of a 15-bp stretch between ORFs 175 and 152. This analysis suggested the presence of a 15-nucleotide-long, 3′-extended single-stranded cDNA at the two ends of the phage DNA. This 15-bp DNA stretch showed a twofold hyphenated rotational symmetry (Fig. 2). We propose that this sequence constitutes the cosN site of Sfi21. The cosN site was located in a 186-bp noncoding AT-rich DNA region. This region showed a weak match with pST1 plasmid from S. thermophilus (P = 0.015; Janzen et al., 1992).
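Twofold rotational (dyad) symmetry of a DNA site means that the sequence agrees with its own reverse complement; in a hyphenated symmetry the symmetric arms are interrupted by positions that do not match. A minimal sketch of such a check is given here. The 15-nucleotide example is a made-up placeholder, not the Sfi21 cosN sequence (which appears only in Fig. 2), and the function names are illustrative.

```python
# Sketch: score twofold (dyad) symmetry of a DNA sequence by comparing it,
# position by position, with its own reverse complement.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def dyad_symmetry_mask(seq: str) -> list[bool]:
    """True at each position where seq agrees with its reverse complement."""
    seq = seq.upper()
    return [a == b for a, b in zip(seq, reverse_complement(seq))]


def symmetric_fraction(seq: str) -> float:
    """Fraction of positions that obey the twofold symmetry."""
    mask = dyad_symmetry_mask(seq)
    return sum(mask) / len(mask)


if __name__ == "__main__":
    # Hypothetical 15-mer: two symmetric 5-nt arms around a 5-nt spacer.
    example = "ACGTAGGGGGTACGT"
    mask = dyad_symmetry_mask(example)
    print(example)
    print("".join("|" if m else "-" for m in mask))
    print(f"{sum(mask)} of {len(mask)} positions symmetric")
```

A perfect palindrome such as an EcoRI site (GAATTC) gives a fraction of 1.0, whereas an odd-length site can never reach 1.0 because the central base cannot pair with itself.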

Bioinformatic analysis of the cos-site associated genes of Sfi21

Over the genome region depicted in Fig. 1, the predicted phage Sfi21 proteins showed close sequence relationships with the corresponding proteins from the cos-site S. thermophilus phage DT1 (Tremblay and Moineau, 1999). The only major difference was the separation of Sfi21 ORF 623 into two ORFs in DT1 (Table 1). Except for phage DT1, no database links could be established for ORF 132 gp from phage Sfi21. ORF 175 gp showed in addition strong similarity to a protein from Staphylococcus aureus phage PVL (Table 1) and weak similarity to cos-site-associated genes from Lactobacillus casei phage A2 and Bacillus subtilis phage phi-105. A multiple alignment of the proteins revealed 32 amino acid (aa) positions shared in at least three of four phage proteins (data not shown). ORF 152 showed significant sequence similarity to a probably defective Lactococcus lactis prophage gene that abuts on a large genome inversion in the host bacterium (Daveran-Mingot et al., 1998) and probably prophage-encoded Mycobacterium tuberculosis proteins. In addition, weak similarities with proteins encoded by cos-site-associated genes from phage A2 and actinophage RP3 were detected (Table 1). Multiple alignments of the above-mentioned five proteins identified 39 aa positions that were shared in at least three proteins. ORF 623 gp showed very high similarity with proteins from the same L. lactis prophage, phages A2 and PVL (Table 1). All three phage proteins were encoded by the second gene after the respective cos-sites (Fig. 3). In addition, ORF 623 gp shared significant sequence similarity with putative terminases from the L. lactis phages c2 and bIL67 and a prophage-encoded protein from Rhodobacter capsulatus (Table 1, Fig. 3). A multiple alignment of the Streptococcus, Lactococcus, Lactobacillus, Staphylococcus, and Rhodobacter phage and prophage proteins showed 41 invariable aa positions. The presence of a typical Walker box for an ATP-binding protein further supports the suggestion that ORF 623 gp encodes a large subunit terminase.
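Counts such as "positions shared in at least three of four proteins" or "invariable positions" come from scanning the columns of a multiple alignment. The sketch below shows one way such a count could be made. The toy alignment and the function name are illustrative assumptions; the actual alignments were not shown in the paper.

```python
from collections import Counter


def shared_positions(alignment: list[str], min_shared: int, gap: str = "-") -> list[int]:
    """Return 0-based column indices where at least `min_shared` sequences
    carry the same non-gap residue."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    hits = []
    for col in range(len(alignment[0])):
        # Collect the residues in this column, ignoring gap characters.
        residues = [row[col] for row in alignment if row[col] != gap]
        if residues and Counter(residues).most_common(1)[0][1] >= min_shared:
            hits.append(col)
    return hits


if __name__ == "__main__":
    # Hypothetical four-sequence alignment, not real phage proteins.
    toy_alignment = [
        "MKTLA-GK",
        "MKSLAEGK",
        "MRTLA-GR",
        "LKTIAEGK",
    ]
    print("shared by >= 3 of 4:", shared_positions(toy_alignment, 3))
    print("invariable:", shared_positions(toy_alignment, 4))
```

Setting `min_shared` to the number of sequences gives the invariable positions; lower thresholds give the partially conserved positions.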

FIG. 1. Comparison of the DNA packaging and head and tail assembly genes in the cos-site-containing streptococcal phage Sfi21 (green box) and the cos-site E. coli phage λ and the Methanobacterium phage psiM2. In addition, a partial gene map of the lambdoid coliphage HK97 is provided. The numbers below the maps give the lengths of the deduced proteins in aa. The phage λ genes are identified by their letter code above the arrows and by a short description of the gene functions below the arrows. Corresponding genes in the four phages are indicated with the same color code (λ C gene gives a bioinformatic protease link, but protease activity has not been identified).

FIG. 2. The nucleotide sequence of the cos-region from S. thermophilus phage Sfi21. The 15-nucleotide 3′ overhang from the cohesive ends is marked by a large box; the region of twofold hyphenated rotational symmetry is shaded.

TABLE 1

Open Reading Frames of the DNA Packaging, Head, and Tail Morphogenesis Module of Sfi21 and Similarities with Databases

ORF | Put. function | Similarity to | Gene | P | id/sim/length (%) | Reference
132 |  | S. thermophilus phage DT1 |  | 10^-61 | 86/95/132 | Tremblay and Moineau (1999)
175 |  | S. thermophilus phage DT1 |  | 10^-85 | 84/93/168 | Tremblay and Moineau (1999)
 |  | S. aureus phage PVL | ORF | 10^-8 | 28/44/125 | Kaneko et al. (1998)
 |  | Lb. casei phage A2 | ORF | 0.038 | 29/41/70 | Garcia et al. (1997)
 |  | B. subtilis phage φ-105 | ORF | 0.19 | 33/53/142 | Ellis and Dean (1985)
152 |  | S. thermophilus phage DT1 | ORF | 10^-67 | 76/87/152 | Tremblay and Moineau (1999)
 |  | L. lactis spp. cremoris prophage | ORF | 10^-14 | 30/50/137 | Daveran-Mingot (1999)
 |  | M. tuberculosis | Rv1578c | 10^-7 | 25/41/127 | Cole et al. (1998)
 |  | M. tuberculosis | Rv2652c | 10^-4 | 24/45/129 | Cole et al. (1998)
 |  | Lb. casei phage A2 | ORF | 0.008 | 21/40/138 | Garcia et al. (1998)
 |  | Actinophage RP3 | ORF | 0.11 | 20/35/76 | Kinner et al. (1994)
623 | Terminase large subunit | L. lactis spp. cremoris prophage | Gp5- | 0 | 64/83/519 | AJ223961
 |  | S. thermophilus phage DT1 | ORF | 0 | 92/97/369 | Tremblay and Moineau (1999)
 |  | S. thermophilus phage DT1 | ORF | 10^-126 | 95/95/229 | Tremblay and Moineau (1999)
 |  | Lb. casei phage A2 | ORF | 10^-72 | 30/53/542 | Garcia et al. (1998)
 |  | S. aureus phage PVL | ORF | 10^-47 | 24/64/575 | Kaneko et al. (1998)
 |  | B. subtilis phage φ-105 | ORF | 10^-25 | 23/45/411 | Ellis and Dean (1985)
 |  | L. lactis prophage | ORF | 10^-23 | 46/72/92 | AJ223962
 |  | Rhodobacter capsulatus | ORF | 10^-19 | 22/41/548 | Vlcek et al. (1997)
 |  | Streptomyces phage φ-C31 | gp 3 | 10^-7 | 23/39/311 | Smith et al. (1999)
 |  | B. subtilis phage φ-105 | ORF | 10^-4 | 35/54/77 | Ellis and Dean (1985)
59 |  | S. thermophilus phage DT1 | ORF | 10^-24 | 86/96/59 | Tremblay and Moineau (1999)
384 | Portal protein | S. thermophilus phage DT1 | ORF | 0 | 87/94/381 | Tremblay and Moineau (1999)
 |  | S. aureus phage PVL | Portal | 10^-14 | 27/46/339 | Kaneko et al. (1998)
 |  | B. subtilis phage φ-105 | ORF | 10^-10 | 22/43/329 | AB016282
 |  | Streptomyces phage φ-C31 | gp 3 | 10^-6 | 24/45/275 | Smith et al. (1999)
221 | ClpP-like | S. thermophilus phage DT1 | ORF | 10^-109 | 90/95/221 | Tremblay and Moineau (1999)
 |  | various ClpP proteases |  |  |  |
397 | Major head protein | S. thermophilus phage DT1 | ORF | 10^-143 | 86/90/290 | Tremblay and Moineau (1999)
 |  | L. lactis cremoris phage BK5-T | ORF | 10^-128 | 58/73/196 | Lakshmidevi et al. (1990)
 |  | B. subtilis phage φ-105 | ORF | 10^-48 | 38/58/344 | AB016282
 |  | S. aureus phage PVL | Capsid protein | 10^-18 | 26/46/324 | Kaneko et al. (1998)

Note. For the database matches of the ORFs downstream of the major head gene, see Desiere et al. (1998). ORF; identification of the phage Sfi21 ORF by codon length; Put. Function; putative function deduced from bioinformatic analysis or N-terminal sequencing (ORF 397 gp); Gene; gene identification as annotated in the entry; id/sim/length; percent aa identity/similarity over the indicated length.

The putative head morphogenesis genome region of Sfi21

Except for phage DT1, no bioinformatic links could be established for ORF 59 gp. ORF 384 gp from Sfi21 showed significant sequence similarity with a protein from phage PVL encoded at a corresponding position (Table 1, Fig. 3). Notably, the PVL protein showed significant similarity to the portal protein from coliphage HK97 (Kaneko et al., 1998), whereas the HK97 protein showed sequence similarity (P = 10^-10) with ORF 426 gp from the Rhodobacter prophage (Fig. 3). ORF 221 gp from Sfi21 showed strong sequence similarity to different members of the large ClpP protease family (10 members showed P = 10^-9 to 10^-5). The attribution of ORF 221 gp to this large family of serine proteases is further supported by a multiple alignment. When compared with eight members of the ClpP protease family of bacterial, plant, and animal origin, the Sfi21 protein shared 28 positions with at least six of them (data not shown). Especially well conserved were the diagnostic catalytic serine and histidine residues (Maurizi et al., 1990), which are flanked by conserved glycine residues (Fig. 5). The Sfi21 protein is, to our knowledge, the first case of a virally encoded ClpP-like protein. The Sfi21 ClpP-like protein is unlikely to result from a horizontal gene transfer from the genome of its bacterial host: two different phylogenetic tree analyses demonstrated that the Sfi21 protein is not more closely related to bacterial (including a close sister species of S. thermophilus, namely S. salivarius) than to algal ClpP proteins (Fig. 4). Because a closely related protein was detected in the lactococcal phage BK5-T and was encoded at a corresponding map position (A. Hillier, personal communication), we might deal here with a new class of bacteriophage-encoded ClpP-like proteins. Interestingly, corresponding map positions were occupied by genes encoding likely proteases in coliphages HK97 (Hendrix and Duda, 1998) and N15 (R. W. Hendrix, unpublished observation). The ORF 303 gp from the Rhodobacter prophage showed sequence relationship with the putative protease from N15 (P = 10^-3). However, none of these possible phage proteases belonged to the ClpP protein family.

ORF 397 gp shares very high sequence similarity with the major head protein of L. lactis phage BK5-T (A. Hillier, personal communication) and significant similarity with the major head protein of phage PVL (Table 1). N-terminal sequencing of the larger of the two major structural proteins from Sfi21 (Desiere et al., 1998) confirmed that the ORF 397 gp is the likely major head protein. Tail-less phage particles contain only the larger 32-kDa protein (data not shown). The N-terminal peptide sequence (LLDSKT) from the larger phage structural protein resolved a discrepancy between the predicted (44 kDa) and the observed (32 kDa) molecular mass of the putative major head protein. The N-terminal sequence corresponded to aa positions 105–110 of the predicted ORF 397 gp, suggesting possible proteolytic processing of this protein during maturation. Processing of the ORF 397 gp at aa position 105 predicts a 32-kDa protein, in agreement with the experimentally determined value.

The smaller phage structural protein, which is lost in tail-less phage particles, yielded the peptide sequence AIVGLK, which matched aa positions 2–7 from ORF 202 gp (Fig. 1). The first methionine was lacking in accordance with the rule that the N-terminal methionine is generally processed when the second aa residue is an alanine (Ben-Bassat et al., 1987).
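The agreement between the proposed cleavage site of ORF 397 gp and the observed mass can be checked with back-of-the-envelope arithmetic. The sketch below assumes an average residue mass of about 110 Da, a common rule of thumb that is not taken from the paper; an exact value would require the actual amino acid sequence.

```python
# Rough check of the 44 kDa -> 32 kDa shift reported for ORF 397 gp.
# Assumption: an average amino acid residue contributes about 110 Da.

AVERAGE_RESIDUE_MASS_DA = 110.0


def approx_mass_kda(n_residues: int) -> float:
    """Approximate protein mass in kDa from its length in residues."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


def mature_length(precursor_length: int, first_mature_position: int) -> int:
    """Residues left when the mature protein starts at a 1-based position."""
    return precursor_length - (first_mature_position - 1)


if __name__ == "__main__":
    precursor = 397                         # codon length of ORF 397
    mature = mature_length(precursor, 105)  # mature N-terminus at aa 105
    print(f"precursor: {precursor} aa, about {approx_mass_kda(precursor):.1f} kDa")
    print(f"mature:    {mature} aa, about {approx_mass_kda(mature):.1f} kDa")
```

Under this assumption the 397-residue precursor comes out near 44 kDa and the 293-residue processed form near 32 kDa, consistent with the predicted and observed values quoted in the text.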

FIG. 3. The map of the cos-site-associated genes in the temperate S. thermophilus phage Sfi21 was compared with the corresponding gene maps from Staphylococcus aureus phage PVL, Lactobacillus casei phage A2, incompletely sequenced Lactococcus lactis phage BK5-T and a L. lactis prophage, a Rhodobacter capsulatus prophage, and E. coli phage HK97. The predicted ORFs are indicated as in their published reports or with their codon length. Corresponding genes showed the same shading. The position of the cos-site is indicated where known. Genes whose predicted proteins show significant aa sequence similarity are connected.

FIG. 4. Phylogenetic tree analysis of the ORF 221 gp from S. thermophilus phage Sfi21 in comparison with putative ClpP proteases from bacteria and plants. The tree was derived by the DARWIN function phylotree, with the distances given in PAM units. The branches are labeled with the SwissProt mnemonic of each protein or the species name. Streptococcus salivarius, the closest species to S. thermophilus, is in bold.

FIG. 5. Multiple alignment of the ORF 221 gp from S. thermophilus phage Sfi21 with putative ClpP proteins from Borrelia burgdorferi, Paracoccus denitrificans, Synechocystis spp., Bacillus subtilis, Streptococcus salivarius, Caenorhabditis elegans, Mycobacterium tuberculosis, and Chlorella vulgaris over the characteristic catalytic ClpP motif. The catalytic aa residues serine and histidine are marked by arrows. Amino acid positions identical in at least five proteins are shaded, aa positions identical in at least eight of the nine aligned proteins are underlined in black.

Graded conservation of the morphogenesis module in tailed phages

When analyzing the morphogenesis genes from a pac-site S. thermophilus phage, we previously observed a hierarchy of relationships between phages that correlated with the evolutionary distance of the host bacteria to lactic streptococci (Lucchini et al., 1998). This was also seen for the cos-site S. thermophilus phage. The closest relationships of phage Sfi21 were to the lactococcal phages that could still be aligned at the nucleotide level (P < 10^-85): the major head gene from BK5-T (A. Hillier, personal communication) showed 66% bp identity, and the terminase gene from the lactococcal prophage showed 64% bp identity with the corresponding Sfi21 gene. At the next level were similarities between Sfi21 and the Staphylococcus phage PVL and the Lactobacillus phage A2. The genetic organization was similar, and several phage genes showed sequence similarity at the deduced aa level but no longer at the nucleotide level. In addition, sequence-related genes were separated by sequence-unrelated genes: PVL ORF 63, 2, 4, and 7 gps were related to Sfi21 proteins, whereas PVL ORF 1, 3, 5, and 6 gps were unrelated (Fig. 3).

At the next level, Sfi21 showed similarity in the genetic organization with phage λ (Fig. 1), whereas no sequence similarity at the protein level could be detected. In fact, only two differences were seen in the genetic organization of the left 15 kb from Sfi21 and λ: Sfi21 ORF 221 occupied the position of the phage λ gene C–Nu3–D complex and Sfi21 ORF 117 occupied the position of the gene G–T complex in λ. Interestingly, over the first variant region, Sfi21 showed exactly the gene constellation described for the lambdoid coliphage HK97 (Hendrix and Duda, 1998) (Fig. 1). The similarity between Sfi21 and HK97 was even more striking: first, the major structural proteins from both phages are likely to be proteolytically cleaved at a comparable position (HK97 aa position 104; Duda et al., 1995; Sfi21 aa position 105). Second, a strong coiled-coil structure could be predicted for the processed N-terminal region of both proteins (Conway et al., 1995) (Fig. 6). Interestingly, the major head protein from S. aureus PVL showed similar properties to the HK97 and Sfi21 proteins: the mature protein starts at aa position 117 (Kaneko et al., 1998), and a coiled-coil is predicted over the N-terminal 70 aa (data not shown). Third, the HK97 and Sfi21 major head genes were preceded by a putative protease gene. Phage HK97 has been noted for its absence of a scaffolding protein. Hendrix and Duda (1998) proposed that the proteolytically processed N-terminal part of the major head protein fulfills a scaffolding function. From the similar genomic organization in phage Sfi21, we suspect that this constellation might be more widely distributed than initially thought.

It is fascinating to note that the gene map of the Methanobacterium thermoautotrophicum phage psiM2, an archaeavirus (Pfister et al., 1998), also resembles those of phages Sfi21 and λ (Fig. 1). Further support for an ancestral genetic relationship between these three phage systems comes from significant sequence relationships. For example, the portal and major head protein from psiM2 showed similarity with the corresponding proteins from the gram-negative Haemophilus influenzae phage HP-1 (P = 10^-6; Esposito et al., 1996) and the gram-positive B. subtilis prophage PBSX (P = 10^-4; Takemaru et al., 1995; Pfister et al., 1998; and our own database searches). In fact, alignments of the three portal and head proteins demonstrated 21 and 32, respectively, identical aa positions (data not shown).
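The identity percentages used to rank these relationships (for example, 66% bp identity for the major head gene) are column counts over a pairwise alignment. The following sketch illustrates the calculation on a made-up pair of aligned fragments. The sequences and the function name are hypothetical, and conventions for treating gap columns differ between alignment tools; here gap columns are simply skipped.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical positions over the gap-free columns of a pairwise
    alignment (both strings must have the same aligned length)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns in which either sequence has a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Hypothetical aligned nucleotide fragments, not real phage genes.
    frag_a = "ATGGCTAAAGGT"
    frag_b = "ATGGCAAAGGGA"
    print(f"{percent_identity(frag_a, frag_b):.0f}% identity")
```

The same function applies to aligned amino acid sequences, which is the level at which the more distant phages in this hierarchy can still be compared.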

DISCUSSION

Over the DNA packaging, head and tail morphogene-is modules, a nearly one to one correlation was ob-erved between genes from a cos-site S. thermophilushage and lambdoid coliphages. The probability that thisorrelation occurred by chance is very low. The basicbservation of the conservation of the morphogenesisenes in some tailed phages is not new; it was notedore than 25 years ago (Dove, 1971) that the prophageaps of P2, P22, and l were partially congruent. The

bservation was confirmed by sequence comparisonetween P22 and l (Eppler et al., 1991). Evolutionaryechanisms that could favour the conservation of the

tructural gene order in lambdoid phages have beenroposed (Casjens and Hendrix, 1974). This congruence

FIG. 6. Probability of a coiled-coil region in the major head proteinsrom E. coli phage HK97 (gp5) and S. thermophilus phage Sfi21 (ORF53 gp) predicted according to the Multicoil program. The location of

he N-terminus of the mature protein from the extracellular phagearticle is indicated by an arrow and the aa position.

n the gene maps was recently extended to bacterio- O

hages infecting gram-positive bacteria (Becker et al.,997; Chandry et al., 1997; Lucchini et al., 1998). Theorrelation between the Sfi21 and phage l/HK97 geneaps reported in the present communication is so close

hat one is tempted to exploit the similarity to predictene functions for Sfi21 ORFs. The alignment with thehage map predicts that Sfi21 ORFs 152 and 623 encode

he small and large subunits of terminase, respectively,hereas ORF 59 would be implicated in head-to-tail

oining. The alignment with the HK97 map predicts thathe three ORFs 384, 221, and 397 encode the portalrotein, a prohead protease, and the major head protein,

espectively. For ORFs 106, 116, 141, 123, and 202, oneight predict analogous functions to the Fi, Fii, Z, U, andgenes from l, which are implicated in head-to-tail

oining processes. According to that prediction, ORF 202hould encode the major tail protein. The attribution ofRF 1560 to the H gene from l, implicated in DNA

njection and tail length determination, is suggested byhe map position and the sheer length of the gene. Inontrast, the assignment of ORF 117 to a gene is prob-

ematic because two genes, G and T, are found at theorresponding position in l.

Two of the above predictions were experimentally con-irmed. ORFs 397 and 202 gps are the major head andail proteins, respectively, as demonstrated by a combi-ation of N-terminal sequencing and electron micros-opy. The validity of the other assignments is supportedy indirect arguments: a minor structural protein of 155Da (Desiere et al., 1998) is likely to be encoded by ORF560; ORF 623 gp shows bioinformatic links to termi-ases and an ATP-binding motif, whereas ORF 384 gphowed bioinformatic links to portal proteins. ORF 221 gpontained a motif found at the catalytic site of a specificlass of serine protease. A further argument in favor of

he predictive power of the correlation hypothesis is thebservation that many bioinformatic links to phageenes were located at corresponding positions of thehage genome maps. We believe, therefore, that com-arative sequence “gazing” has, in phage genomics, auch better chance of producing new biological under-

tanding than simple BLAST searches for individualhage genes.

What might be the theoretical basis for this correlation of the gene maps of tailed phages infecting bacterial hosts as distinct as gram-positive and -negative organisms? Taxonomically, both phages belong to the same order Caudovirales, family Siphoviridae, genus “λ-like viruses” (Maniloff and Ackermann, 1998). Over the structural genes, which are likely to be indigenous to phages and where relationships must be sought in the first place, phages Sfi21 and λ might thus share a common origin. Why, then, do we not detect sequence relationships between these two phages? Ackermann (1999) compiled a list of arguments to deal with this dilemma. One is that tailed phages are so much older than plant or animal viruses that relationships were erased and no longer detectable by aa alignments (Doolittle, 1981). Indirect evidence in favor of this argument is the observation that we actually get aa sequence alignments when we compare “λ-like viruses” over smaller evolutionary distances, such as from phages infecting low GC-content gram-positive bacteria (Lucchini et al., 1998; and current report). Another argument of Ackermann (1999) is that relationships (e.g., between capsid proteins) may be so weak that they are detectable only through examination of missing links; for example, proteins A, B, and C may be related but may appear unrelated if only A and C are compared. If we equate the putative portal proteins from phages Sfi21, PVL, and HK97 with A, B, and C, we have exactly what is predicted (and we could still propose a D for the putative portal protein of the Rhodobacter prophage). A further argument is that relationships may be preserved in the three-dimensional structure of tailed phage capsids and tail proteins but not any longer in sequence (Ackermann, 1999). This hypothesis cannot be tested yet because not a single one of these phage proteins has been studied for its spatial structure. Again, our comparisons provide an indirect illustration for the validity of this hypothesis. The major head proteins from Sfi21 and HK97 do not share any detectable sequence similarity, yet they yield an extremely similar coiled-coil prediction and they are cleaved at a nearly identical aa position. Both arguments speak in favor of a rather similar spatial structure. Ackermann (1999) is skeptical about the use of aa sequence alignments for addressing the question of phage evolution simply because the evidence of relatedness of phage proteins has not yet been found. This is, however, not an argument against aa alignments but an encouragement to sequence more phage genomes from carefully chosen bacterial genera. An increasing number of missing links are obtained as the number of sequenced phage genomes increases (Hendrix et al., 1999). This will soon allow an integrated, sequence-based picture of phage evolution.
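The missing-link argument can be pictured as a search for a chain of pairwise similarities. The following minimal sketch encodes only the relationships described for the putative portal proteins (Sfi21 with PVL, PVL with HK97, and no direct link between Sfi21 and HK97); the function names and the graph representation are illustrative assumptions, not a method used in this study.

```python
from collections import deque

# Pairwise links as described in the text: the portal proteins of Sfi21 and
# PVL are similar, those of PVL and HK97 are similar, but Sfi21 and HK97
# show no direct similarity.
detected_similarity = {
    ("Sfi21", "PVL"),
    ("PVL", "HK97"),
}


def build_graph(pairs):
    """Turn a set of unordered protein pairs into an adjacency map."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def linking_path(graph, start, goal):
    """Breadth-first search; returns the shortest chain of proteins or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in sorted(graph.get(path[-1], ())):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


graph = build_graph(detected_similarity)
directly_similar = "HK97" in graph["Sfi21"]
path = linking_path(graph, "Sfi21", "HK97")
print(directly_similar, path)  # False ['Sfi21', 'PVL', 'HK97']
```

Each newly sequenced phage adds nodes and edges to such a graph, which is why more genomes make more chains of this kind detectable.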

MATERIALS AND METHODS

Bacterial strains, phages, and media

Streptococcus thermophilus Sfi1 was grown at 40°C in an M17 broth (Difco Laboratories, Detroit, MI) supplemented with 0.5% lactose. Phage Sfi21 was isolated from a batch of Sfi21 cells during starter preparation and independently from a lysogenic cell Sfi19 (Brussow and Bruttin, 1995). For phage preparation, calcium chloride was added to the medium to a final concentration of 10 mM. Phages were purified and concentrated as described previously (Brussow and Bruttin, 1995).

Cloning and sequencing

Sfi21 DNA was first purified using the Wizard DNA purification kit (Promega; Madison, WI) and then cut with restriction enzymes EcoRI, HindIII, NsiI, XbaI, or Sau3A (Boehringer Mannheim, Mannheim, Germany) and cloned into vector pUC19 or the E. coli–lactococcal–streptococcal shuttle vector pNZ124 (Platteeuw et al., 1994).

DNA sequencing was started with the universal forward and reverse primers of pUC19 or pNZ124 and continued with synthetic oligonucleotide (17-mer) primers (Microsynth, Switzerland). Both strands of the cloned DNA were sequenced by the Sanger method of dideoxy-mediated chain termination using either the fmol DNA Sequencing System of Promega (Madison, WI) as described previously (Bruttin et al., 1997) or the Amersham Lab-station sequencing kit based on the Thermo Sequenase-labeled primer cycle-sequencing with 7-deaza-dGTP (RPN2437). The thermal cycler was programmed at 25 cycles at 95°C for 30 s, 50°C for 30 s, and 72°C for 1 min. Sequencing was done on a Licor 6000L automated sequencer with fluorescence-labeled universal reverse and forward pUC19 primers.
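As a quick check of the cycling program, the total programmed hold time can be computed from the step durations. This is a minimal sketch that takes the cycle count as 25 and ignores the time spent ramping between temperatures; the variable and function names are illustrative.

```python
# Cycling program as stated in the text: 25 cycles of
# 95 °C for 30 s, 50 °C for 30 s, and 72 °C for 1 min.
CYCLES = 25
STEPS = [  # (temperature in °C, hold time in seconds)
    (95, 30),
    (50, 30),
    (72, 60),
]


def total_hold_seconds(cycles, steps):
    """Sum of programmed hold times; ramping between temperatures is ignored."""
    return cycles * sum(seconds for _, seconds in steps)


total = total_hold_seconds(CYCLES, STEPS)
print(f"{total} s = {total / 60:.0f} min")  # 3000 s = 50 min
```

The actual run time on an instrument would be longer, since ramp rates are not given in the text.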

PCR was used to span regions not obtained through random cloning. PCR products were generated using the synthetic oligonucleotide pair designed according to the established phage Sfi21 DNA sequence, purified phage DNA, and Super Taq polymerase (Stehelin, Basel, Switzerland). PCR products were purified using the QIAquick-Spin PCR purification kit.

The cos-sites were ligated using purified phage DNA and T4 ligase according to the manufacturer’s protocol (Boehringer Mannheim). The cos-overhangs were digested by the 3′-to-5′-end exonuclease activity of the T4 polymerase in the appropriate buffer (Boehringer Mannheim) and incubated for 5 min at room temperature.

Protein techniques

Extracellular phage particles were purified by PEG precipitation and CsCl density gradient centrifugation as described previously (Brussow and Bruttin, 1995). The purified phage particles were denatured at 95°C for 10 min in Laemmli’s sample buffer (Bio-Rad, La Jolla, CA) and then loaded onto a 12% SDS–polyacrylamide minigel running at 200 V for 60 min. The gel was stained with Coomassie Brilliant Blue (Bio-Rad). Broad-range prestained SDS–PAGE standards (Bio-Rad) were used to estimate the size of the stained proteins.

For N-terminal sequencing, the polyacrylamide gel was transferred to a PVDF membrane (Immobilon; Millipore, Bedford, MA) using a Bio-Rad Trans-Blot cell in transfer buffer (50 mM, pH 9). The proteins were then visualized with Coomassie Brilliant Blue (0.1% Coomassie blue, 40% methanol, 1% acetic acid), and the protein of interest was cut from the membrane for sequencing by Edman degradation with an Applied Biosystems 473A pulsed liquid protein sequencer (PE Applied Biosystems, Foster City, CA).

Sequence analysis

The Genetics Computer Group sequence analysis package (University of Wisconsin, Madison, WI) was used to assemble and analyze the sequences. Nucleotide and predicted aa sequences were compared with those in the databases [GenBank, Release 110; EMBL (Abridged), Release 58; PIR-Protein, Release 59; SWISS-PROT, Release 37; PROSITE, Release 15.0] using FastA (Pearson and Lipman, 1988) and BLAST (Altschul et al., 1990) programs. Sequence alignments were performed using the CLUSTAL W 1.74 method (Thompson et al., 1994) and the BLOCKMAKER program (Henikoff et al., 1995). The phylogenetic tree was constructed using the Computational Biochemistry Research Group Server (Zurich, Switzerland).

Accession numbers

The complete genome sequence of Sfi21 was reported recently (Lucchini et al., 1999) and was deposited under the accession number AF115103.

ACKNOWLEDGMENTS

We thank the Swiss National Science Foundation for the financial support of F. Desiere and S. Lucchini in the framework of its Biotechnology Module (Grant 50022044545/1) and S. Foley for reading the manuscript.

Note added in proof. A very similar module for capsid assembly was recently described for the temperate phage φC31, which infects the high GC content gram-positive bacterium Streptomyces (Smith et al., 1999).

REFERENCES

Ackermann, H. W. (1999). Tailed bacteriophages: The order Caudovirales. Adv. Virus Res. 51, 135–201.

Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990). Basic local alignment search tool. J. Mol. Biol. 215, 403–410.

Becker, B., de la Fuente, N., Gassel, M., Gunther, D., Tavares, P., Lurz, R., Trautner, T. A., and Alonso, J. C. (1997). Head morphogenesis genes of the Bacillus subtilis bacteriophage SPP1. J. Mol. Biol. 268, 822–839.

Ben-Bassat, A., Bauer, K., Chang, S. Y., Myambo, K., Boosman, A., and Chang, S. (1987). Processing of the initiation methionine from proteins: Properties of the Escherichia coli methionine aminopeptidase and its gene structure. J. Bacteriol. 169, 751–757.

Bruttin, A., Desiere, F., Lucchini, S., Foley, S., and Brussow, H. (1997). Characterization of the lysogeny DNA module from the temperate Streptococcus thermophilus bacteriophage phi Sfi21. Virology 233, 136–148.

Brussow, H. (1999). “Encyclopedia of Virology,” 2nd ed. (R. Webster and A. Granoff, Eds.). Academic Press, London.

Brussow, H., and Bruttin, A. (1995). Characterization of a temperate Streptococcus thermophilus bacteriophage and its genetic relationship with lytic phages. Virology 212, 632–640.

Brussow, H., Bruttin, A., Desiere, F., Lucchini, S., and Foley, S. (1998). Molecular ecology and evolution of Streptococcus thermophilus bacteriophages: A review. Virus Genes 16, 95–109.

Casjens, S., and Hendrix, R. (1974). Comments on the arrangement of the morphogenetic genes of bacteriophage lambda. J. Mol. Biol. 90, 20–25.

Chandry, P. S., Moore, S. C., Boyce, J. D., Davidson, B. E., and Hillier, A. J. (1997). Analysis of the DNA sequence, gene expression, origin of replication and modular structure of the Lactococcus lactis lytic bacteriophage sk1. Mol. Microbiol. 26, 49–64.

Cole, S. T., Brosch, R., Parkhill, J., Garnier, T., Churcher, C., Harris, D., Gordon, S. V., Eiglmeier, K., Gas, S., Barry, C. E., Tekaia, F., Badcock, K., Basham, D., Brown, D., Chillingworth, T., Connor, R., Davies, R., Devlin, K., Feltwell, T., Gentles, S., Hamlin, N., Holroyd, S., Hornsby, T., Jagels, K., and Barrell, B. G. (1998). Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature 393, 537–544 [see comments] [published erratum appears in Nature (1998) 396, 190].

Conway, J. F., Duda, R. L., Cheng, N., Hendrix, R. W., and Steven, A. C. (1995). Proteolytic and conformational control of virus capsid maturation: The bacteriophage HK97 system. J. Mol. Biol. 253, 86–99.

Daveran-Mingot, M. L., Campo, N., Ritzenthaler, P., and Le Bourgeois, P. (1998). A natural large chromosomal inversion in Lactococcus lactis is mediated by homologous recombination between two insertion sequences. J. Bacteriol. 180, 4834–4842.

Desiere, F., Lucchini, S., and Brussow, H. (1998). Evolution of Streptococcus thermophilus bacteriophage genomes by modular exchanges followed by point mutations and small deletions and insertions. Virology 241, 345–356.

Doolittle, R. F. (1981). Similar amino acid sequences: Chance or common ancestry? Science 214, 149–159.

Dove, W. F. (1971). The Bacteriophage Lambda (A. D. Hershey, Ed.) pp. 297–312. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY.

Duda, R. L., Hempel, J., Michel, H., Shabanowitz, J., Hunt, D., and Hendrix, R. W. (1995). Structural transitions during bacteriophage HK97 head assembly. J. Mol. Biol. 247, 618–635.

Ellis, D. M., and Dean, D. H. (1985). Nucleotide sequence of the cohesive single-stranded ends of Bacillus subtilis temperate bacteriophage phi 105. J. Virol. 55, 513–515.

Eppler, K., Wyckoff, E., Goates, J., Parr, R., and Casjens, S. (1991). Nucleotide sequence of the bacteriophage P22 genes required for DNA packaging. Virology 183, 519–538.

Esposito, D., Fitzmaurice, W. P., Benjamin, R. C., Goodman, S. D., Waldman, A. S., and Scocca, J. J. (1996). The complete nucleotide sequence of bacteriophage HP1 DNA. Nucleic Acids Res. 24, 2360–2368.

Foley, S., Lucchini, S., Zwahlen, M. C., and Brussow, H. (1998). A short noncoding viral DNA element showing characteristics of a replication origin confers bacteriophage resistance to Streptococcus thermophilus. Virology 250, 377–387.

Garcia, P., Alonso, J. C., and Suarez, J. E. (1997). Molecular analysis of the cos region of the Lactobacillus casei bacteriophage A2: Gene product 3, gp3, specifically binds to its downstream cos region. Mol. Microbiol. 23, 505–514.

Hendrix, R. W., and Duda, R. L. (1998). Bacteriophage HK97 head assembly: A protein ballet. Adv. Virus Res. 50, 235–288.

Hendrix, R. W., Smith, M. C., Burns, R. N., Ford, M. E., and Hatfull, G. F. (1999). Evolutionary relationships among diverse bacteriophages and prophages: All the world’s a phage. Proc. Natl. Acad. Sci. USA 96, 2192–2197.

Henikoff, S., Henikoff, J. G., Alford, W. J., and Pietrokovski, S. (1995). Automated construction and graphical presentation of protein blocks from unaligned sequences. Gene 163, GC17–GC26.

Janzen, T., Kleinschmidt, J., Neve, H., and Geis, A. (1992). Sequencing and characterization of pST1, a cryptic plasmid from Streptococcus thermophilus. FEMS Microbiol. Lett. 74, 175–180.

Kaneko, J., Kimura, T., Narita, S., and Tomita, T. (1998). Complete nucleotide sequence and molecular characterization of the temperate staphylococcal bacteriophage phiPVL carrying Panton-Valentine leukocidin genes. Gene 215, 57–67.

Kinner, E., Pocta, D., Stroer, S., and Schmieger, H. (1994). Sequence analysis of cohesive ends of the actinophage RP3 genome and construction of a transducible shuttle vector. FEMS Microbiol. Lett. 118, 283–289 [published erratum appears in FEMS Microbiol. Lett. (1994) 121, 257].

Lakshmidevi, G., Davidson, B. E., and Hillier, A. J. (1990). Molecular characterization of promoters of the Lactococcus lactis subsp. cremoris temperate bacteriophage BK5-T and identification of a phage gene implicated in the regulation of promoter activity. Appl. Environ. Microbiol. 56, 934–942.

Larbi, D., Decaris, B., and Simonet, J. M. (1992). Different bacteriophage resistance mechanisms in Streptococcus salivarius subsp. thermophilus. J. Dairy Res. 59, 349–357.

Lucchini, S., Desiere, F., and Brussow, H. (1998). The structural gene module in Streptococcus thermophilus bacteriophage phi Sfi11 shows a hierarchy of relatedness to Siphoviridae from a wide range of bacterial hosts. Virology 246, 63–73.

Lucchini, S., Desiere, F., and Brussow, H. (1999a). The genetic relationship between virulent and temperate Streptococcus thermophilus bacteriophages: Whole genome comparison of cos-site phages Sfi19 and Sfi21. Virology 260, 232–243.

Lucchini, S., Desiere, F., and Brussow, H. (1999b). Comparative genomics of Streptococcus thermophilus phage species supports a modular evolution theory. J. Virol., in press.

Maniloff, J., and Ackermann, H. W. (1998). Taxonomy of bacterial viruses: Establishment of tailed virus genera and the order Caudovirales. Arch. Virol. 143, 2051–2063.

Maurizi, M. R., Clark, W. P., Kim, S. H., and Gottesman, S. (1990). Clp P represents a unique family of serine proteases. J. Biol. Chem. 265, 12546–12552.

Mercenier, A. (1990). Molecular genetics of Streptococcus thermophilus. FEMS Microbiol. Rev. 7, 61–77.

Pearson, W. R., and Lipman, D. J. (1988). Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA 85, 2444–2448.

Pfister, P., Wasserfallen, A., Stettler, R., and Leisinger, T. (1998). Molecular analysis of Methanobacterium phage psiM2. Mol. Microbiol. 30, 233–244.

Platteeuw, C., Simons, G., and de Vos, W. M. (1994). Use of the Escherichia coli beta-glucuronidase (gusA) gene as a reporter gene for analyzing promoters in lactic acid bacteria. Appl. Environ. Microbiol. 60, 587–593 [published erratum appears in Appl. Environ. Microbiol. (1994) 60, 2207].

Smith, M. C., Burns, R. N., Wilson, S. E., and Gregory, M. A. (1999). The complete genome sequence of the Streptomyces temperate phage phiC31: Evolutionary relationships to other viruses. Nucleic Acids Res. 27, 2145–2155.

Stanley, E., Fitzgerald, G. F., Le Marrec, C., Fayard, B., and van Sinderen, D. (1997). Sequence analysis and characterization of phi O1205, a temperate bacteriophage infecting Streptococcus thermophilus CNRZ1205. Microbiology 143, 3417–3429.

Takemaru, K., Mizuno, M., Sato, T., Takeuchi, M., and Kobayashi, Y. (1995). Complete nucleotide sequence of a skin element excised by DNA rearrangement during sporulation in Bacillus subtilis. Microbiology 141, 323–327.

Thompson, J. D., Higgins, D. G., and Gibson, T. J. (1994). CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680.

Tremblay, D. M., and Moineau, S. (1999). Complete genomic sequence of the lytic bacteriophage DT1 of Streptococcus thermophilus. Virology 255, 63–76.

Vlcek, C., Paces, V., Maltsev, N., Paces, J., Haselkorn, R., and Fonstein, M. (1997). Sequence of a 189-kb segment of the chromosome of Rhodobacter capsulatus SB1003. Proc. Natl. Acad. Sci. USA 94, 9384–9388.