The Evolution of SMC Proteins: Phylogenetic Analysis and Structural Implications

16
The Evolution of SMC Proteins: Phylogenetic Analysis and Structural Implications Neville Cobbe and Margarete M. S. Heck Wellcome Trust Centre for Cell Biology, Institute of Cell and Molecular Biology, University of Edinburgh, United Kingdom The SMC proteins are found in nearly all living organisms examined, where they play crucial roles in mitotic chromosome dynamics, regulation of gene expression, and DNA repair. We have explored the phylogenetic relationships of SMC proteins from prokaryotes and eukaryotes, as well as their relationship to similar ABC ATPases, using maximum-likelihood analyses. We have also investigated the coevolution of different domains of eukaryotic SMC proteins and attempted to account for the evolutionary patterns we have observed in terms of available structural data. Based on our analyses, we propose that each of the six eukaryotic SMC subfamilies originated through a series of ancient gene duplication events, with the condensins evolving more rapidly than the cohesins. In addition, we show that the SMC5 and SMC6 subfamily members have evolved comparatively rapidly and suggest that these proteins may perform redundant functions in higher eukaryotes. Finally, we propose a possible structure for the SMC5/SMC6 heterodimer based on patterns of coevolution. Introduction DNA is compacted in length by a factor of up to 10,000 to form a metaphase chromosome in a typical metazoan cell. How this remarkable feat of packaging is achieved and how the chromosomes are then faithfully segregated to the daughter cells have been fundamental questions for over 100 years, ever since Walther Fleming first described the dynamic behavior of chromosomes during cell division. Within the past decade, a convergence of data from genetic and biochemical approaches has identified the SMC (structural maintenance of chromo- somes) proteins as key players in both chromosome con- densation and segregation (Cobbe and Heck 2000). These proteins are a highly conserved and ubiquitous family, found in all eukaryotes for which sufficient sequence data are available and in most of the currently sequenced pro- karyotes. The function of these proteins in different aspects of chromosomal behavior is also conserved in these organisms, where they play vital roles in the packaging, repair, and segregation of the genetic material. No more than one smc gene is found in each of the prokaryotes examined, in which the protein apparently forms homodimers, whereas six paralogs are found in eukaryotes, in which they form at least three types of heterodimers. These heterodimers include the SMC1 and SMC3 proteins (part of the ‘‘cohesin’’ complex), the SMC2 and SMC4 proteins (part of the ‘‘condensin’’ complex), and the SMC5/SMC6 heterodimer, which forms part of a complex involved in DNA repair (Hirano 2002). Although no canonical SMC family members have been found in the enterobacteriaceae, similar phenotypes to smc mutants are displayed by mutations affecting mukB in Escherichia coli (Niki et al. 1992). This gene encodes a protein with a structure that is remarkably similar to that of SMC proteins (Melby et al. 1998), despite limited sequence homology. In addition, although the crenarch- aeota do not appear to have any SMC or MukB orthologs, they do contain the related Rad50 proteins. Furthermore, although no SMC, MukB, or Rad50 orthologs are apparent in the genome of Rickettsia, it does nonetheless have a related RecN protein. In fact, only four genera for which a complete genome sequence is currently available (Helicobacter, Buchnera, Chlamydia, and Chlamydophila) do not appear to have either an SMC or a related protein. Thus, it appears that SMC proteins have an extremely ancient origin, reflecting their fundamental role in chromosome dynamics. As shown in figure 1A, each SMC protein is char- acterized by a large globular domain at each terminus and a third globular domain forming a flexible hinge near the middle of the molecule, separated by extended coiled-coil structures (Saitoh et al. 1994; Melby et al. 1998) with short gaps in the coiled-coils at various positions (Beasley et al. 2002). Although MukB and Rad50 have also been shown to possess hinge regions (Melby et al. 1998; Anderson et al. 2001; Hopfner et al. 2001), the large globular hinge domain of SMC proteins is so far unique in both its sequence and structure (Haering et al. 2002). Recent work has shown that the hinge domain is the principal site at which these molecules dimerize (Haering et al. 2002; Hopfner et al. 2002). The terminal head domains are required for ATP hydrolysis and have been shown to be the region where non-SMC subunits of the condensin and cohesin complexes bind (Anderson et al. 2002; Yoshimura et al. 2002), similar to the binding of MukE and MukF to the head of MukB (Yamazoe et al. 1999) or the binding of Mre11 to Rad50 (Anderson et al. 2001; Hopfner et al. 2001). Although ATP is not required for DNA binding (Kimura and Hirano 1997; Hirano and Hirano 1998; Kimura et al. 1999), it appears necessary for preferential binding to positively supercoiled substrates (Kimura et al. 1999). Likewise, ATP binding (but not hydrolysis) is also required for the enhanced aggregation of the B. subtilis Smc with ssDNA (Hirano and Hirano 1998). Among the most conserved motifs in all SMC proteins are the N-terminal Walker A (P-loop) and C-terminal Walker B (DA box) ATPase motifs (Saitoh, Key words: SMC, phylogeny, condensin, cohesin, ABC ATPases, maximum likelihood. E-mail: [email protected]. 332 Mol. Biol. Evol. 21(2):332–347. 2004 DOI: 10.1093/molbev/msh023 Advance Access publication December 5, 2003 Molecular Biology and Evolution vol. 21 no. 2 Ó Society for Molecular Biology and Evolution 2004; all rights reserved. by guest on January 22, 2015 http://mbe.oxfordjournals.org/ Downloaded from

Transcript of The Evolution of SMC Proteins: Phylogenetic Analysis and Structural Implications

The Evolution of SMC Proteins: Phylogenetic Analysis andStructural Implications

Neville Cobbe and Margarete M. S. HeckWellcome Trust Centre for Cell Biology, Institute of Cell and Molecular Biology, University of Edinburgh, United Kingdom

The SMC proteins are found in nearly all living organisms examined, where they play crucial roles in mitoticchromosome dynamics, regulation of gene expression, and DNA repair. We have explored the phylogenetic relationshipsof SMC proteins from prokaryotes and eukaryotes, as well as their relationship to similar ABC ATPases, usingmaximum-likelihood analyses. We have also investigated the coevolution of different domains of eukaryotic SMCproteins and attempted to account for the evolutionary patterns we have observed in terms of available structural data.Based on our analyses, we propose that each of the six eukaryotic SMC subfamilies originated through a series of ancientgene duplication events, with the condensins evolving more rapidly than the cohesins. In addition, we show that theSMC5 and SMC6 subfamily members have evolved comparatively rapidly and suggest that these proteins may performredundant functions in higher eukaryotes. Finally, we propose a possible structure for the SMC5/SMC6 heterodimerbased on patterns of coevolution.

Introduction

DNA is compacted in length by a factor of up to10,000 to form a metaphase chromosome in a typicalmetazoan cell. How this remarkable feat of packaging isachieved and how the chromosomes are then faithfullysegregated to the daughter cells have been fundamentalquestions for over 100 years, ever since Walther Flemingfirst described the dynamic behavior of chromosomesduring cell division. Within the past decade, a convergenceof data from genetic and biochemical approaches hasidentified the SMC (structural maintenance of chromo-somes) proteins as key players in both chromosome con-densation and segregation (Cobbe and Heck 2000). Theseproteins are a highly conserved and ubiquitous family,found in all eukaryotes for which sufficient sequence dataare available and in most of the currently sequenced pro-karyotes. The function of these proteins in different aspectsof chromosomal behavior is also conserved in theseorganisms, where they play vital roles in the packaging,repair, and segregation of the genetic material.

No more than one smc gene is found in each of theprokaryotes examined, in which the protein apparentlyforms homodimers, whereas six paralogs are found ineukaryotes, in which they form at least three types ofheterodimers. These heterodimers include the SMC1 andSMC3 proteins (part of the ‘‘cohesin’’ complex), theSMC2 and SMC4 proteins (part of the ‘‘condensin’’complex), and the SMC5/SMC6 heterodimer, which formspart of a complex involved in DNA repair (Hirano 2002).Although no canonical SMC family members have beenfound in the enterobacteriaceae, similar phenotypes to smcmutants are displayed by mutations affecting mukB inEscherichia coli (Niki et al. 1992). This gene encodesa protein with a structure that is remarkably similar to thatof SMC proteins (Melby et al. 1998), despite limited

sequence homology. In addition, although the crenarch-aeota do not appear to have any SMC or MukB orthologs,they do contain the related Rad50 proteins. Furthermore,although no SMC, MukB, or Rad50 orthologs are apparentin the genome of Rickettsia, it does nonetheless havea related RecN protein. In fact, only four genera for whicha complete genome sequence is currently available(Helicobacter, Buchnera, Chlamydia, and Chlamydophila)do not appear to have either an SMC or a related protein.Thus, it appears that SMC proteins have an extremelyancient origin, reflecting their fundamental role inchromosome dynamics.

As shown in figure 1A, each SMC protein is char-acterized by a large globular domain at each terminus anda third globular domain forming a flexible hinge near themiddle of the molecule, separated by extended coiled-coilstructures (Saitoh et al. 1994; Melby et al. 1998) with shortgaps in the coiled-coils at various positions (Beasley et al.2002). Although MukB and Rad50 have also been shownto possess hinge regions (Melby et al. 1998; Anderson etal. 2001; Hopfner et al. 2001), the large globular hingedomain of SMC proteins is so far unique in both itssequence and structure (Haering et al. 2002). Recent workhas shown that the hinge domain is the principal site atwhich these molecules dimerize (Haering et al. 2002;Hopfner et al. 2002). The terminal head domains arerequired for ATP hydrolysis and have been shown to bethe region where non-SMC subunits of the condensin andcohesin complexes bind (Anderson et al. 2002; Yoshimuraet al. 2002), similar to the binding of MukE and MukF tothe head of MukB (Yamazoe et al. 1999) or the binding ofMre11 to Rad50 (Anderson et al. 2001; Hopfner et al.2001). Although ATP is not required for DNA binding(Kimura and Hirano 1997; Hirano and Hirano 1998;Kimura et al. 1999), it appears necessary for preferentialbinding to positively supercoiled substrates (Kimura et al.1999). Likewise, ATP binding (but not hydrolysis) is alsorequired for the enhanced aggregation of the B. subtilisSmc with ssDNA (Hirano and Hirano 1998).

Among the most conserved motifs in all SMCproteins are the N-terminal Walker A (P-loop) andC-terminal Walker B (DA box) ATPase motifs (Saitoh,

Key words: SMC, phylogeny, condensin, cohesin, ABC ATPases,maximum likelihood.

E-mail: [email protected].

332

Mol. Biol. Evol. 21(2):332–347. 2004DOI: 10.1093/molbev/msh023Advance Access publication December 5, 2003Molecular Biology and Evolution vol. 21 no. 2� Society for Molecular Biology and Evolution 2004; all rights reserved.

by guest on January 22, 2015http://m

be.oxfordjournals.org/D

ownloaded from

Goldberg, and Earnshaw 1995), which are also conservedin MukB, Rad50, and RecN. Although both motifs havebeen shown to be essential for ATP binding and hydrolysis(Hirano et al. 2001), only the N-terminal globular domaindirectly binds ATP (Akhmedov et al. 1998). This ATPaseactivity appears to be required for the full function ofSMC-containing complexes, as shown by mutagenesis ofthe ATP-binding domain (Chuang, Albertson, and Meyer1994; Verkade et al. 1999; Fousteri and Lehmann 2000;Hirano et al. 2001) or the use of nonhydrolysable ATPanalogs (Kimura and Hirano 1997). Based on the identifi-cation of these motifs at the extreme ends of SMCmolecules, it was suggested that the functional ATPasedomain would form by uniting these regions (Saitoh et al.1994). This could occur in either of two ways, asillustrated schematically in figure 1B. Either the moleculecould bend at the hinge to bring the terminal domainstogether (the intramolecular model) or it could do so bydimerizing as an antiparallel coiled-coil, bringing theN-terminal domain of one subunit next to the C-terminaldomain of the other (the intermolecular model). Indeed,the formation of such an ATPase has been confirmedby a covalent fusion of the N-terminal and C-terminaldomains alone of the single SMC in Thermotogamaritima, which is capable of ATP-dependent DNA ag-gregation (Lowe, Cordell, and van den Ent 2001). Acombination of in vitro data from electron microscopy,crystal structures, and several elegant biochemical experi-ments now indicates that both Rad50 and SMC moleculesbring these terminal domains together by bending at thehinge and forming intramolecular antiparallel coiled-coils(Hirano et al. 2001; Haering et al. 2002; Hirano andHirano 2002; Hopfner et al. 2002). In addition, visualiza-tion of the whole condensin complex and its SMC subunitsalone by atomic force microscopy has suggested that theSMC heterodimer may fold back along its coiled-coils,allowing the hinge to interact with the terminal ATPasedomains (Yoshimura et al. 2002). However, the functionalsignificance of this conformation remains to be explored.

SMC proteins share another conserved motif (LSGG)upstream of the Walker B site, referred to as the ‘‘signaturemotif’’ of the ABC (ATP-binding cassette) ATPase family,which also includes the structurally similar RecN, Rad50,and MukB proteins (Lowe, Cordell, and van den Ent 2001).This motif has been shown to play a pivotal role in thedimerization of Rad50 head domains (Hopfner et al. 2000)and probably serves a similar function in other ABCATPases such as the SMC, MukB, and RecN proteins.Given the striking level of conservation within SMCproteins and other ABC ATPases with extended coiled-coils, the origin of these proteins and their relationship to oneanother have been a subject of considerable interest. Anumber of different phylogenies for SMC proteins havebeen suggested to date that differ in their method ofconstruction and resultant topology (Melby et al. 1998;Cobbe and Heck 2000; Jones and Sgouros 2001; Soppa2001; Beasley et al. 2002), so the precise relationshipsbetween these and related proteins have been unclear. Toresolve this issue, we constructed phylogenetic treescontaining all six SMC subfamilies, as well as the relatedMukB, Rad50, and RecN proteins, from a large data set ofprotein sequences using the combined criteria of minimumevolution andmaximum likelihood. In addition, we used thedata generated in constructing these trees to test hypothesesabout the structure, function, and evolution of these proteins.

Materials and MethodsPhylogenetic Analysis of SMC Proteins

Sequences were assembled into subfamilies initiallybased on their BLAST (Altschul et al. 1997) scores, and thewhole protein sequences were aligned with ClustalW(Thompson, Higgins, and Gibson 1994) and DIALIGN(Morgenstern et al. 1998). Alignments were then inspectedmanually with the aid of Mview (Brown, Leroy, andSander 1998) and adjusted where necessary. Treescontaining large numbers of OTUs were constructed usingPAL (Drummond and Strimmer 2001) and Weighbor(Bruno, Socci, and Halpern 2000), and smaller maximum-likelihood trees of various subgroups were then calculatedby quartet puzzling using the Tree-Puzzle program(Strimmer and von Haeseler 1996). These subgroupsincluded members of the eukaryotic and prokaryotic SMCsubfamilies shown in figure 2, subfamilies that branchtogether or closely related sequences within a subfamily(e.g., the archaeal, proteobacterial, or firmicute SMCs in theeubacterial subfamily). In the case of prokaryotic sequencessuch as archaeal or firmicute SMCs, the branches in thesesmaller trees were most easily resolved by removing coiled-coil sequences and calculating distances based on theconserved globular domains.

Maximum parsimony was used to confirm the topol-ogy of smaller trees containing closely related sequenceswithin SMC subfamilies, using the tree-searching optionsin PAUP* (Swofford 1999). However, this method wasnot used to reconstruct the topology of the whole tree, as itcan become inconsistent with unequal rates of evolution(Felsenstein 1978; Kuhner and Felsenstein 1994; Tateno,Takezaki, and Nei 1994), which are observed in thedifferent SMC subfamilies. The topology of the eukaryotic

FIG. 1.—(A) General structure of SMC proteins, showing the fivedistinct domains. (B) Alternative models of SMC heterodimer structure.

Phylogenetic Analysis of SMC Proteins 333

by guest on January 22, 2015http://m

be.oxfordjournals.org/D

ownloaded from

FIG. 2.—Phylogenetic tree of SMC proteins. The scale bar denotes the number of accepted substitutions in Whelan-Goldman distance units.

334 Cobbe and Heck

by guest on January 22, 2015http://m

be.oxfordjournals.org/D

ownloaded from

SMCs differed both within subfamilies, when differentconserved domains of the protein were used to calculatetrees, and between subfamilies, even when the wholeprotein sequence was used (see Supplementary Materialonline). Horizontal gene transfer is a comparatively rareevent in multicellular eukaryotes, and so it seems morelikely that the conflict between subfamilies is the result ofadditional evolutionary constraints imposed by functionalinteractions between members of the subfamilies. Con-sequently, the phylogenetic signal leading to the mostprobable topology was recovered by calculating themaximum-likelihood tree from concatenated alignmentsof all the SMC subfamily members for which sequencewas available. This widely used approach (Gouy and Li1989; Baldauf et al. 2000; Hausdorf 2000; Brown et al.2001; Nei, Xu, and Glazko 2001; Springer et al. 2001;Wolf et al. 2001; Brochier et al. 2002; Lang et al. 2002;Suzuki, Glazko, and Nei 2002) is employed here on theassumption that each of the individual eukaryotic SMCsequences shares the same vertical pattern of inheritanceand therefore should follow the same branching order. Asthese trees using concatenated alignments had bootstrapvalues of 100% at all nodes (online supplementary figure)and in turn led to the best alignments when used as guidetrees, it seems likely that the most probable topology isrecovered this way. For those taxa in which geneduplications have occurred within a subfamily, it wasfound that the separate use of either paralog gave rise tothe same topology, after testing all possible combinationsof paralogs within subfamilies.

In assembling the complete tree, the phylogeneticinformation of as many taxa as possible was used to breakup long branches (Graybeal 1998; Zwickl and Hillis 2002).Additionally, tests were performed using the less divergentmembers of subfamilies to reconstruct trees in any caseswhere long-branch attraction (Lyons-Weiler and Hoelzer1997) was suspected, by confirming that the sametopology existed between other subfamily members. Anyambiguously assigned branches (typically, with a bootstrapprobability less that 60%) were compared with all re-maining competing tree topologies in which the suspectbranch was placed elsewhere, to establish which positioncontributed least to the total length. The relationshipbetween the subfamilies was established both by calculat-ing maximum-likelihood trees with the least divergentrepresentatives of the different subgroups and by de-termining the topology yielding the minimum-evolutiontree when all sequences were included. Finally, the branchlengths of the overall tree were calculated using theWhelan-Goldman model of amino acid substitution(Whelan and Goldman 2001), with rate heterogeneityamong sites modeled according to a gamma distributionwith 16 rate categories. The gamma distribution shapeparameter a was estimated from the data set using theTree-Puzzle program (Strimmer and von Haeseler 1996).

The phylogenetic tree of coiled-coil containing ABCATPases was constructed using essentially the samemethods as the SMC tree, in which smaller maximum-likelihood trees of each subfamily were calculated usingTree-Puzzle (Strimmer and von Haeseler 1996), and therelationship between the subfamilies was established bothby calculating maximum-likelihood trees with the least

divergent sequences and by determining the topologyyielding the minimum-evolution tree when all sequenceswere included. Similarly, the final tree of coiled-coil–containing ABC ATPases was constructed by Tree-Puzzle,using the Whelan-Goldman model of amino acid sub-stitution with a gamma distribution. However, branchlengths for this tree were calculated using distances esti-mated from the conserved globular domains of theseproteins, excluding coiled-coils.

Examples of maximum-likelihood trees used toconstruct the final trees presented in figure 2 and figure 5are provided as Supplementary Material online, togetherwith percentage bootstrap support values calculated usingPAL (Drummond and Strimmer 2001) and Tree-Puzzle(Strimmer and von Haeseler 1996).

Analysis of SMC Codon Usage

The general pattern of codon usage in the genome ofdifferent bacteria was compiled using data from the CodonUsage Database (http://www.kazusa.or.jp/codon/ ). Gran-tham’s D statistic was calculated using the WisconsinPackage programs CodonFrequency and Correspondversion 10.3 (Accelrys Inc) in the following species:Pseudomonas aeruginosa, Mycobacterium tuberculosis,Caulobacter crescentus, Deinococcus radiodurans, Trep-onema pallidum, Mesorhizobium loti, Thermotoga mar-itima, Borrelia burgdorferi, Listeria monocytogenes,Neisseria meningitidis, Clostridium acetobutylicum, Ba-cillus subtilis, Agrobacterium tumefaciens, Staphylococcusaureus, Xylella fastidiosa, Streptococcus pneumoniae,Aquifex aeolicus, Prochlorococcus marinus, Synechocys-tis, Nostoc, Thermosynechococcus elongatus, Synechococ-cus, Archaeoglobus fulgidus, Halobacterium, Pyrococcusfuriosus, Methanosarcina acetivorans, and Methanococ-cus jannaschii. The codon bias index and the effectivenumber of codons were calculated using the CodonWprogram (http://www.molbiol.ox.ac.uk/cu/ ).

The bias in GC content at silent third codon positionswas calculated by developing the bias(GC3) statistic. Tocorrect for biases in GC content caused by a biased aminoacid composition in the encoded protein, the GC content atthird nucleotide positions was calculated using relativefrequencies of codons. The statistic used is defined asfollows:

GC3 ¼1

21

X21j¼1

Xn

i¼1

ci�j

where n is the number of alternative codons ending ineither a G or a C for the j th group of synonymous codons,c is the frequency with which such a codon occurs in thecoding sequence, and �j is the sum of the frequencies foreach alternative codon in the j th synonymous group,regardless of GC content. To correct for changes in com-position resulting from mutational bias at the same locus,the GC content of the remaining first and second codonpositions was calculated (in the same way as for GC3), aswell as that of any introns. The arithmetic mean of thesethree values was taken to be the expected GC content forthe locus, and the bias(GC3) statistic for a gene was thencalculated as

Phylogenetic Analysis of SMC Proteins 335

by guest on January 22, 2015http://m

be.oxfordjournals.org/D

ownloaded from

biasðGC3Þ ¼ 1observedðGC3Þ � expectedðGC3Þ

1� expectedðGC3Þ

� �

where the species scaling factor 1 is the average ATcontent of the respective genome (57.3% in the case ofD. melanogaster [Adams et al. 2000]), divided by theaverage GC content.

The pairwise number of synonymous and nonsynon-ymous substitutions per site between aligned SMC andRad50 sequences from D. melanogaster and D. pseudo-obscura were estimated using the Diverge program in theWisconsin package. The D. pseudoobscura nucleic acidsequences were obtained from the Drosophila GenomeProject at the Baylor College of Medicine (http://hgsc.bcm.tmc.edu/projects/Drosophila/ )

SMC Evolutionary Rate Correlation

The divergence of each eukaryotic SMC proteinregion was calculated by summing the maximum-likeli-hood branch lengths as far as the presumed root where theeukaryotic and prokaryotic branches meet. The extent ofcorrelation between the estimated evolutionary rate in dif-ferent parts of the proteins was then calculated using thePearson product-moment correlation coefficient for eachpair of SMC protein regions. Similar trends were alsofound using the Spearman rank correlation coefficient.

ResultsPhylogenetic Analysis of SMC Proteins

The final maximum-likelihood tree of 148 SMC se-quences is presented in figure 2, with only one repre-sentative of each prokaryotic genus displayed. Among theeukaryotic SMC proteins, it was consistently observed thatSMC1 and SMC4 branch together (these are the largerSMC subunits of the cohesin and condensin SMCheterodimers, respectively). Similarly, the smaller sub-units, SMC3 and SMC2, appear to originate from the samegene duplication event, and SMC5 and SMC6 (whichheterodimerize as part of a DNA repair complex [Tayloret al. 2001; Fujioka et al. 2002]) also branch together. Alleukaryotes whose genomes have been substantiallysequenced, from microsporidia to humans, contain thesesix subfamilies of SMC proteins, suggesting that theduplication events giving rise to each subfamily must haveoccurred either before or very soon after the origin ofeukaryotes.

It is also noteworthy that the rate of accepted aminoacid substitution varies among different eukaryotic taxawithin each subfamily. In particular, a number of SMCproteins in Caenorhabditis elegans seem to be exception-ally divergent. However, this can be explained by theobservation that the more rapidly evolving SMCs areeither those that have arisen from a gene duplication (suchas SMC4a and DPY-27) or SMC proteins that formheterodimers with both of the duplicated SMCs (such asMIX-1) (Hagstrom et al. 2002). This is consistent witha correlation between the large number of duplicated genesin the C. elegans genome (Coghlan and Wolfe 2002; Guet al. 2002) and its rapid evolutionary rate (Aguinaldo et al.

1997; Mushegian et al. 1998). Conversely, the rate ofevolution appears relatively slow for the plant and verte-brate SMCs, with the exception of the SMC1b proteins,which are probably less evolutionarily constrained, as theyappear to be required only during meiosis in males(Revenkova et al. 2001).

Examining the tree, it appears that the branch lengthsbetween more closely related organisms (such as thevertebrates or diptera) are longer for condensin SMCproteins than for the corresponding cohesins, suggestingeither that condensins may be more ancient or may evolvemore rapidly. However, the SMC proteins of bothcomplexes appear to be similarly essential for viability ina range of organisms (Strunnikov, Larionov, and Koshland1993; Saka et al. 1994; Strunnikov, Hogan, and Koshland1995; Michaelis, Ciosk, and Nasmyth 1997; Lieb et al.1998; Steffensen et al. 2001; Hagstrom et al. 2002; Liuet al. 2002; Siddiqui et al. 2003), so one might expect theirrates of evolution to be broadly similar. Indeed, a signif-icant rate difference is only evident between the moreclosely-related organisms for which all SMC protein se-quences are available, whereas the mean pairwise distan-ces for condensin and cohesin SMCs in more divergentgroups seem to approach similar values (fig. 3A). Further-more, the overall number of substitutions between con-densin or cohesin SMC proteins and their presumedancestor appear to be essentially the same (fig. 3B). It istherefore possible that the trend shown in figure 3A mayreflect slightly more rapid evolution in condensin SMCproteins (detectable only by examining the smallestpairwise distances, in which the number of acceptedsubstitutions and speciation events are minimized).

Although condensin SMCs appear to show a highersubstitution rate among closely related species thancohesin SMCs, the mean distances within subfamilies ofthese proteins (averaged across all condensin and cohesinSMCs for each pairwise comparisons between differentorganisms) are about half (0.54 6 0.134) the correspond-ing distances between SMC5 and SMC6 proteins. Simi-larly, it can be seen in figure 3B that the mean distance ofSMC5 and SMC6 to the presumed root with prokaryotes isapproximately twice as great as that of other eukaryoticSMC proteins. As discussed later, this elevated sequencedivergence may reflect a reduced selection pressure onthese proteins compared with the other SMCs, which arerequired during every cell division, and is thus consistentwith the suggested general role for these proteins,specifically in response to DNA damage (Lehmann et al.1995; Mengiste et al. 1999; Verkade et al. 1999; Fousteriand Lehmann 2000).

Analysis of Prokaryotic SMC Codon Usage

In general, the phylogeny of prokaryotic SMCproteins agrees well with their accepted taxonomicdistribution, although the unusually short SMC protein inBorrelia burgdorferi results in this protein clustering withother outgroup species, away from other spirochaetes suchas Treponema. However, as previously suggested (Soppa2001), it would appear from the tree shown in figure 2 thatat least two cases of horizontal gene transfer (HGT) have

336 Cobbe and Heck

by guest on January 22, 2015http://m

be.oxfordjournals.org/D

ownloaded from

occurred, resulting in the SMC proteins of the cyanobac-teria (Prochlorococcus marinus, Thermosynechococcuselongatus, Nostoc, Synechocystis, and Synechococcus) aswell as the eubacterial species Aquifex aeolicus, mostclosely resembling those of the archaea. On the other hand,phylogenetic trees based on concatenated alignments ofprotein or rRNA sequences (Wolf et al. 2001; Brochieret al. 2002) firmly place these species among the remainingeubacteria, as do our own trees based on sequences for 16SrRNA or proteins such as EF-Tu and RecA (data notshown).

To investigate this HGT phenomenon further, codonusage analysis was conducted to examine whether thepattern of codon usage in the cyanobacterial SMC genesmore closely resembled that of the archaeal SMCs or ofother cyanobacterial genes. The differences between codonfrequency tables were computed using Grantham’s Dstatistic (Grantham et al. 1981). This statistic wascalculated to measure the difference in codon usage

between the SMC of a particular prokaryote and thegeneral pattern of codon usage in its genome. In addition,comparisons were made between genomic codon usageand the codon usage of recA orthologs or EF-Tu genes,which are often employed in phylogenetic tree reconstruc-tion because of the apparent rarity with which theyexperience HGT (Eisen 1995; Baldauf, Palmer, andDoolittle 1996; Brendel et al. 1997; Lopez, Forterre, andPhilippe 1999). However, as shown in figure 4A, thepattern of codon usage for SMC genes in A. aeolicus, thecyanobacteria, the remaining eubacteria and the archaeamore closely match the general pattern for other loci in thegenome than do EF-Tu and RecA, which have highervalues of D. Therefore, we have found little evidence infavor of HGT for smc genes from patterns of codon usage.

As the relative level of sequence similarity betweenthe archaeal and cyanobacterial SMC proteins seems hardto account for in terms of convergent evolution, it wouldappear that their similarity results from their commonancestry (synapomorphy). This synapomorphy is pre-

FIG. 3.—Divergence of SMC protein subfamilies. (A) Relativedistances (mean pairwise distance within each subfamily divided by themean distance of all subfamilies) of condensin and cohesin SMC proteinswithin different taxonomic groups. The difference in relative distancesbetween condensin and cohesin SMC proteins becomes most obviouswith progressively more closely related organisms, as shown from left toright. (B) Evolutionary distance between members of different eukaryoticSMC subfamilies and the presumed root with prokaryotic SMC proteins.The mean number of accepted substitutions (in Whelan-Goldman distanceunits) and standard deviation are displayed for organisms with sub-stantially sequenced genomes in which all SMC protein sequences areavailable.

FIG. 4.—Codon usage in SMC genes. (A) Measures of Grantham’s Dstatistic in comparisons between codon usage for SMC, EF-Tu, and RecAin different prokaryotic species and general codon usage patterns in thesame organism. (B) Codon usage in Drosophila SMC genes. Values ofbias(GC3) displayed above have all been halved to facilitate comparisonwith trends in the other statistics. (C) The ratio of nonsynonymous (Ka) tosynonymous substitutions (Ks) in SMC genes, based on pairwisecomparisions between codons in D. melanogaster and D. pseudoobscurasequences.

Phylogenetic Analysis of SMC Proteins 337

by guest on January 22, 2015http://m

be.oxfordjournals.org/D

ownloaded from

sumably associated with an ancient HGT event, allowingsufficient time for a transferred smc gene to adapt to thetranslational apparatus of its recipient genome by meansof appropriate synonymous substitutions. This codonadaptation could occur without significantly affectinga gene’s identity at the amino acid level, as the rate ofamino acid replacement is generally much slower than therate of synonymous substitution (Li 1997). Indeed, a recentstudy (Koski, Morton, and Golding 2001) has shown thatmany E. coli genes with normal nucleotide compositionhave no apparent orthologs in a related enterobacterium,suggesting that only relatively recent HGT events maynecessarily reveal themselves by differences in codonusage. Consequently, it is not clear from the codon usagedata in which direction this transfer event occurred,whether from archaea to cyanobacteria or vice versa, andthere is still debate as to which is actually the older groupof prokaryotes (Hedges et al. 2001; Cavalier-Smith 2002).As the A. aeolicus SMC clusters with different archaealspecies in the tree and its pattern of codon usage moreclosely resembles that of the archaea than the cyanobac-teria, it would appear that a separate transfer event wasinvolved in this lineage. This may reflect a general trend inthis species to acquire and retain genes from similarlythermophilic archaea (Aravind et al. 1998).

However, apart from the cases described above, thephylogeny of prokaryotic SMC proteins agrees well withtrees constructed from 16S rRNA, EF-Tu, and RecAsequences. If the complexity hypothesis (Jain, Rivera, andLake 1999) holds true that informational genes are lessprone to successful HGT by virtue of the number of geneproducts they interact with, then the relative lack of HGTinvolving SMCs may reflect a potentially large number ofinteractors. So far, only two proteins have been found toform a complex biochemically with either a bacterial SMC(Soppa et al. 2002) or MukB (Yamazoe et al. 1999), but itseems likely that many more may interact either transientlyor indirectly, as suggested by the large number of proteinsthat interact genetically with MukB (Hiraga 2000).

Analysis of SMC Codon Usage in Drosophila

The dipteran SMC5 and SMC6 coding sequencesseem to have an exceptionally large number of substitu-tions, even allowing for the observation that a largenumber of genes in Drosophila are known to evolve at anunusually high rate (Sharp and Li 1989; Schmid and Tautz1997; Fay, Wyckoff, and Wu 2002). The possibility thatthese genes might be nonessential (Jordan et al. 2002;Yang, Gu, and Li 2003) or required at much lower levels(Pal, Papp, and Hurst 2001) was therefore investigated byexamining the pattern of codon usage in the Drosophilagenes encoding SMC5 and SMC6, in comparison with thecodon usage bias of the other four SMCs. In Drosophilaand other organisms, there is a strong positive correlationbetween gene expression levels and codon usage bias(Duret and Mouchiroud 1999; Kanaya et al. 2001).However, it has also been suggested that synonymouscodon usage in Drosophila may be biased to enhance theaccuracy of translation, as the frequency of preferredcodons is significantly higher at conserved amino acid

positions (Akashi 1994). As the number of acceptednucleotide substitutions may be constrained by strongcodon usage bias, this could explain the apparentcorrelation between the rates of synonymous and non-synonymous substitutions in Drosophila (Comeron andKreitman 1998). Therefore, if the higher rates of aminoacid replacement in Drosophila SMC5 and SMC6 resultfrom reduced selective pressures, this may be reflected bya lack of bias for optimal codons (consistent with reducedtranslational accuracy or efficiency.)

The extent to which the genes encoding SMCproteins in Drosophila demonstrate directional bias incodon usage was first investigated by calculating thecodon bias index (CBI) (Bennetzen and Hall 1982). Asshown in figure 4B, the CBI for SMC5 or SMC6 is lowerthan the values for cohesin or condensin SMCs. Moreover,the especially low CBI value for SMC5 is closer to that ofthe negatively biased yEst-6 pseudogene (accessionnumber AF526559) of Drosophila. The remaining SMCgenes all have similar values, with the exception ofDrosophila SMC3 (CAP), which has a comparatively highlevel of codon bias. As this gene is found on the Xchromosome, whereas the other SMC genes are allautosomal, the increased bias and decreased number ofsubstitutions in Drosophila SMC3 is presumably a re-flection of the increased selection pressure on X-linkedgenes in the heterogametic sex (Montgomery, Charles-worth, and Langley 1987). To evaluate whether thereduced bias observed in SMC5 and SMC6 might besimply the result of lower expression levels of DNA repairproteins, the CBI for Drosophila Rad50 was alsocalculated. As this is another gene specifically involvedin DNA repair (Gorski and Eeken 2002; Engels et al.2003; Johnson-Schlitz et al. 2003) that also has strongsequence and structural similarities to SMC proteins, onemight expect it to be expressed similarly to the SMC5 andSMC6 repair proteins. Interestingly, the CBI for Rad50 isless than most SMCs, which is consistent with a threefold-lower abundance of rad50 mRNA compared with SMC2and SMC4 during early Drosophila embryogenesis (http://genome.med.yale.edu/Lifecycle/ ), when most mitoticactivity takes place. Nevertheless, the CBI for Rad50 issignificantly greater than that of SMC5 and SMC6,suggesting that these proteins may not be required to thesame extent as Rad50 to repair DNA damage.

Secondly, the effective number of codons (NC)(Wright 1990) was also calculated. As NC can take valuesfrom 20 (in the cases of extreme bias) to 61 (when allsynonymous codons are used equally), the bias in NC wasevaluated as:

biasðNCÞ ¼61� NC

41

This statistic ranges from 0, for completely random codonusage, to 1, if one codon is exclusively used for eachamino acid. These values are also plotted in figure 4B andshow that bias(NC) for SMC5 and particularly for SMC6 ismost similar to that of the yEst-6 pseudogene.

Thirdly, in addition to a correlation between tRNAcopy number and favored codons, it has also beendescribed that more highly expressed genes in Drosophila

338 Cobbe and Heck

by guest on January 22, 2015http://m

be.oxfordjournals.org/D

ownloaded from

tend to have a high GC content at silent sites such as thethird nucleotide position of their codons (Shields et al.1988; Kanaya et al. 2001). On the other hand, this GCcontent is often reduced in sequences with weakerselective constraints, such as pseudogenes (Shields et al.1988) (although this depends on their age [Echols et al.2002]). The bias in GC content at silent third codonpositions was evaluated by calculating the bias(GC3)statistic. As shown in figure 4B, the bias(GC3) for SMC5and SMC6 is at least 20% smaller than that of most otherSMC genes (but only about 11% less than that of Rad50),whereas SMC3 shows the highest bias.

Finally, the selection pressures on Drosophila SMCproteins were examined by estimating the average ratio ofnonsynonymous to synonymous substitutions (Ka/Ks) ateach codon position. This ratio provides a measure of theintensity of purifying, or negative, selection exerted ongenes, which results in decreased rates of amino acidreplacement and thus Ka/Ks ratios less than one. Thedistribution of Ka/Ks varies greatly among different classesof genes, with values ranging from 0.02 to 0.306 and ameanratio of 0.122 for various Drosophila genes (Li 1997) Onthe other hand, the rates of synonymous and nonsynon-ymous substitutions are approximately equal if most aminoacid variation is neutral, as observed in many pseudogenes(Li, Gojobori, and Nei 1981; Bustamante, Nielsen, andHartl 2002), whereas Ka/Ks ratios greater than 1 usuallyindicate adaptive or positive selection (Messier and Stewart1997; Hughes and Yeager 1998). As shown in figure 4C,the Ka/Ks ratios for SMC2, SMC4, and Rad50 appear to beclose to the previously described mean value in Drosophila(Li 1997), whereas SMC1 and SMC3 have considerablylower Ka/Ks ratios. This increased negative selection onSMC1 and SMC3 is consistent with the differences inamino acid substitution rates between cohesin and con-densin SMCs in closely related organisms, as describedearlier. By contrast, SMC5 and SMC6 display Ka/Ks ratiosapproximately twice as great as other SMC proteins. Inparticular, the ratio for SMC6 is similar to the highest Ka/Ks

values previously described in Drosophila for chorionproteins (Li 1997). Based on these combined results, itwould appear that the SMC5 and SMC6 proteins inDrosophila are under severely reduced selection forpreferred codons in comparison with other SMCs andexperience dramatically decreased negative selection,which could explain their higher amino acid substitutionrate in terms of fewer selective constraints on theirevolution in general.

Phylogenetic Analysis of SMC Related Proteins

To reveal how closely related SMC proteins are tosimilar ABC ATPases, a maximum-likelihood tree wasconstructed using sequences from these protein families, asshown in figure 5. It is clear from this tree that the closestrelatives to the SMC proteins are the archaeal Rad50proteins, followed by eukaryotic Rad50 and eubacterialSbcC proteins. Many of these Rad50 superfamily proteinshave the conserved N-terminal FKS (or FRS) motif(located before the Walker A site), which is present inmost of the SMC proteins. However, this motif is absent in

the SMC5 and SMC6 subfamily members (in which onlythe phenylalanine is conserved). The FKS motif is alsoabsent in the MukB proteins, which appear to possessuniversally conserved NWN and FART motifs instead. Inaddition, although it was previously suggested that allSMC and MukB proteins uniquely share a conserved N-terminal QG motif (Melby et al. 1998), this dipeptide wasfound to occur at different locations in SMC and MukBsequences in the alignments generated in this study.Instead, this motif was found at a conserved position inSMC and Rad50 proteins, in agreement with previouslypublished structure-based alignments (van den Ent et al.1999). Therefore, although it was previously proposed thatMukB should be included among the SMC proteins asthey adopt a similar structure (Melby et al. 1998), theRad50 proteins seem to be more closely related at thesequence level. The high level of sequence conservationbetween MukB proteins argues against the notion that theyare a highly divergent group of SMC proteins. Instead, it ispossible that their structural similarity may be partlycaused by convergent evolution, which is consistent withthe similar functions of MukB and SMC proteins inprokaryotes (Cobbe and Heck 2000).

Correlated Rates of Evolution in Different Regions ofSMC Proteins

Contrary to previous reports (Melby et al. 1998), weobserved that eukaryotic SMC trees reconstructed fromindividual conserved hinge or N-terminal and C-terminalglobular domains often differed both from each other andfrom phylogenies based on the whole protein sequence.For this reason, we attempted to dissect the differentselective constraints on the sequence divergence of theseregions by examining how they might coevolve. As inter-acting proteins tend to evolve at similar rates (Fraser et al.2002), it was expected that regions of SMC proteins thatinteract with each other would also coevolve and showcorrelated rates of substitution.

Correlation coefficients for different regions of SMCproteins were calculated after sorting the maximum-likelihood distances according to the functional complexeswithin which different subfamily members are found. ThePearson correlation coefficient for each pair of SMC pro-tein regions is plotted in figure 6, together with Spear-man’s coefficient of rank correlation. As shown in figure6B, a clear positive correlation was observed between theevolutionary rates of the b-strands that constitute theinterface of the N-termini and C-termini in the samemolecule (Lowe, Cordell, and, van den Ent 2001). How-ever, no significant positive correlation was found betweenthe corresponding N-terminal and C-terminal regions ofany SMC and its partner (fig. 6A), as would have beenexpected if the heterodimer existed naturally in an anti-parallel intermolecular conformation. This strongly sug-gests that the intramolecular structure (fig. 1B) indicatedby available in vitro data is also the structure found invivo. By contrast, a strong positive correlation was obser-ved between the evolutionary rates of the exposed outerportions of the N-terminal and C-terminal domains inSMC2 and SMC4 but not in the corresponding regions

Phylogenetic Analysis of SMC Proteins 339

by guest on January 22, 2015http://m

be.oxfordjournals.org/D

ownloaded from

FIG. 5.—Phylogenetic tree of SMC proteins and other coiled-coil ABC ATPases. The scale bar denotes the number of accepted substitutions inWhelan-Goldman distance units.

340 Cobbe and Heck

by guest on January 22, 2015http://m

be.oxfordjournals.org/D

ownloaded from

of the same molecule (fig. 6C and D). These strongerintermolecular correlations are nonetheless consistent withobserved associations between the termini of SMC pro-teins (Melby et al. 1998; Anderson et al. 2002). Moreover,a significant correlation at the 1% level was observed forintermolecular associations between the coiled-coils ofthe condensin SMC proteins (fig. 6E ). This elevated in-termolecular correlation appears to reflect differencesbetween the conformation of vertebrate condensin andcohesin SMC proteins previously observed by rotaryshadowing, whereby the coiled-coils of condensin SMCsare intimately associated along most of their length but

those of the cohesin complex are well separated (Andersonet al. 2002).

Interestingly, the SMC5 and SMC6 proteins appearto show similarly weak associations between their arms(fig. 6C and E) as SMC1 and SMC3. This suggests that theSMC proteins involved specifically in DNA repair mayadopt a similarly open conformation to that of the cohesinSMC proteins (Haering et al. 2002). By contrast, althougha significant intermolecular correlation was not foundconsistently for the rate at which the hinge domains ofcohesin SMC proteins evolve, the SMC5 and SMC6proteins nonetheless showed intermolecular rate correla-

FIG. 6.—Correlated rates of evolution in different regions of SMC proteins, calculated using maximum-likelihood branch lengths. Correlationcoefficients associated with postulated intermolecular associations are denoted ‘‘INTER,’’ whereas ‘‘INTRA’’ indicates intramolecular correlations.The Pearson correlation coefficients are shown above and nonparametric Spearman rank correlations are shown below.

Phylogenetic Analysis of SMC Proteins 341

by guest on January 22, 2015http://m

be.oxfordjournals.org/D

ownloaded from

tions greater than 60% at the dimer interface of their hingedomains. This may possibly indicate that SMC5 andSMC6 have a stronger hinge association than other SMCproteins such as SMC1 and SMC3. Although an evenstronger positive correlation can be seen for the dimerinterface of condensin hinge domains using the Pearsoncorrelation coefficient, this particular result is not sup-ported by nonparametric analyses.

DiscussionThe Phylogeny of SMC Proteins

We have investigated the evolutionary radiation ofSMC proteins and determined that the closest relatives ofthese proteins are members of the Rad50 superfamily, asrevealed by phylogenetic analysis and the conservation ofparticular residues. Apart from the suspected cases of HGTdescribed in this study, the tree topology described heregenerally agrees with the currently accepted taxonomicdistribution of the organisms whose SMC proteins havebeen examined. Interestingly, our consensus tree showsthat plant and animal SMC proteins cluster together withfungi as the outgroup, whereas other recent studies basedon 18S rRNA and a panoply of proteins have reported thatfungi and animals are most closely related to eachother (Baldauf and Palmer 1993; Baldauf et al. 2000; Vande Peer et al. 2000). However, the resolution of thistrichotomy has been a longstanding controversy, withdifferent studies suggesting that animals are closerrelatives to either plants or fungi (Gupta 1995; Wang,Kumar, and Hedges 1999). The tree presented here is moreconsistent with previous analyses using ribosomal proteins(Veuthey and Bittar 1998) and other concatenated se-quences (Gouy and Li 1989), which place plants closerto animals. Alternatively, the phylogeny revealed in theSMC tree may reflect higher rates of substitution in thefungal proteins because of reduced constraints in theirinteractions with other proteins, as has been suggested inthe case of histones H3 and H4 (Thatcher and Gorovsky1994). For example, at least one protein (AKAP95) withno clear orthologs in other animals, fungi, or plants isthought to interact with SMC proteins in vertebrates(Collas, Le Guellec, and Tasken 1999; Steen et al. 2000),which may influence the relatively slow rates of SMCevolution in these animals. In the future, it will beinteresting to see how the use of SMC sequences, incombination with larger data sets of similarly conservedprotein sequences (Brown et al. 2001), might contributeto resolving the relationships among these eukaryotickingdoms.

Evolution of Eukaryotic SMC Subfamilies

According to the tree in figure 2, it appears thata symmetric duplication of genes encoding the larger andsmaller eukaryotic SMCs gave rise to both cohesin andcondensin SMC proteins in all eukaryotes. The relativelyshort branch lengths connecting either the roots of theSMC1/SMC4 or the SMC2/SMC3 lineages to the pro-karyotic SMC root also suggest that the first duplicationevent, giving rise to the primordial eukaryotic SMC

heterodimer, occurred very early in eukaryotic evolution.However, it seems that the first SMC heterodimer to arisein eukaryotes may have had more functions in commonwith the condensin heterodimer, based on the assertion thatthe primary phenotype in a prokaryotic smc null mutant isprobably a condensation defect (Britton and Grossman1999) and the observation that DNA reannealing (Sutaniand Yanagida 1997; Hirano and Hirano 1998) andsupercoiling (Kimura and Hirano 1997; Lindow, Britton,and Grossman 2002) activities are common features ofthese proteins but apparently not of cohesins in general(Jessberger et al. 1996; Losada and Hirano 2001; Sakaiet al. 2003). Given these functional similarities betweencondensin and prokaryotic SMCs, it might seem odd thatthe primary sequences of SMC2 and SMC4 do not appearto be more closely related to prokaryotic SMC proteinsthan the functionally distinct SMC1 and SMC3 proteins(fig. 3B). However, if condensin SMCs had evolvedslightly more rapidly since the divergence of eukaryotes,then this could account for their lack of greater sequenceidentity with the functionally related prokaryotic SMCproteins. Indeed, a higher substitution rate amongcondensin SMC proteins than cohesins can be seen inpairwise comparisons between sequences from moreclosely related organisms, such as vertebrate or dipteranSMCs. This suggests that SMC2 and SMC4 may evolvemore rapidly than the cohesin SMCs but that the differencein rate is small enough to be masked by mutationalsaturation in comparisons between more distantly relatedspecies. If so, it is anticipated that additional SMC proteinsequences from closely related organisms in other phylamight also reveal more accepted substitutions in pairwisecomparisons between condensin SMC proteins than incohesins. One might speculate that the later evolution ofSMC2 and SMC4 involved some level of positiveselection for proteins with enhanced condensing activities,given the larger size of eukaryotic chromosomes in general(Bendich and Drlica 2000), whereas SMC1 and SMC3were recruited to specific roles in maintaining sisterchromatid cohesion. Although speculative, a predictionof this hypothesis could be tested by directly comparingthe supercoiling activities of condensin and prokaryoticSMCs to see if the condensing activities of the former areindeed more efficient. Moreover, as additional SMCsequences from closely related species become available,this may facilitate the detection of positive selection atparticular amino acid sites of condensin SMC proteins bymaximum-likelihood approaches (Anisimova, Bielawski,and Yang 2001).

Whereas, differences in the substitution rate ofcohesin and condensin SMC proteins become morepronounced in comparisons between closely relatedorganisms, SMC5 and SMC6 consistently show pairwisedistances about twice that of the average for the otherSMCs. Based on the pattern of codon usage and Ka/Ks

ratios in Drosophila, it seems that the large numbers ofsubstitutions in the SMC5 and SMC6 proteins are bestexplained by an elevated rate of evolution, rather thana more ancient origin. Similar codon usage analyses inother organisms failed to clearly show the same pattern,often because of less pronounced codon bias patterns or

342 Cobbe and Heck

by guest on January 22, 2015http://m

be.oxfordjournals.org/D

ownloaded from

the complicating influence of SMC gene duplications onevolutionary rate. However, as SMC5 and SMC6 are onlyknown to be essential for proliferation in yeasts (Lehmannet al. 1995; Verkade et al. 1999) but not in Arabidopsis(Mengiste et al. 1999), it seems likely that their reducedsequence conservation may reflect reduced selectivepressures on these proteins in other species as well.Moreover, as some of the properties of SMC5 and SMC6(Taylor et al. 2001) are mirrored by the association ofSMC1 (or SMC1b in vertebrates) and SMC3 with malemeiotic chromosomes (Klein et al. 1999; Eijpe et al. 2000;Revenkova et al. 2001) and the role of SMC1/SMC3 inrecombination repair (Jessberger et al. 1996; Sjogren andNasmyth 2001; Kim, Xu, and Kastan 2002; Yazdi et al.2002), it is possible that these proteins have evolved morerapidly since their function can be partially complementedby the cohesin SMCs. Therefore, the long branch lengthswithin this subfamily seem to better reflect their less-constrained substitution rate, rather than their greaterantiquity as previously suggested (Jones and Sgouros2001). Although the other coiled-coil ABC ATPasesappear to place the root of SMC proteins in the SMC5/SMC6 branch, this does not necessarily imply that theseare the ancestral SMC proteins. Instead, it seems morelikely that the placement of the root here in the maximum-likelihood tree reflects the average similarity of othercoiled-coil ABC ATPases to SMC proteins, regardless ofhow the SMCs have diverged from each other.

The Conformations of SMC Heterodimers

In agreement with the results obtained previously byin vitro analyses (Hirano et al. 2001; Haering et al. 2002;Hirano and Hirano 2002), our statistical analysis ofcoevolution in eukaryotic SMC heterodimers stronglyfavors the intramolecular model of dimer formation(fig. 6A and B). We believe our findings are complemen-tary to the available in vitro data as the observedcorrelations between substitution rates would be consistentwith formation of such intramolecular heterodimers invivo.

The low correlations between the evolutionary ratesat the coiled-coils and termini of SMC5/SMC6 hetero-dimers suggest that their conformation may more closelyresemble cohesin SMC proteins, consistent with theiroverlapping functions in DNA repair and meiosis. Thesimilarly weak intermolecular correlations for the arms and

termini of cohesin SMCs may represent a weakerassociation between these regions (fig. 7), facilitating thedissolution of sister chromatid cohesion mediated bya proposed annular cohesin complex (Gruber, Haering,and Nasmyth 2003). The weaker association between thearms suggested by our analysis is consistent with theappearance of both the SMC1/SMC3 heterodimer and thecohesin complex when observed by electron microscopy(Anderson et al. 2002; Haering et al. 2002). By contrast,the strong intermolecular correlation for the coiled-coilsand the termini of condensin SMC proteins are consistentboth with electron microscope images of these proteins(Anderson et al. 2002; Yoshimura et al. 2002) and the highspecificity of their coiled-coil interactions (Hirano andHirano 2002). In the future it will be interesting to see ifSMC5 and SMC6 also adopt a similarly open conforma-tion to that of the cohesin SMC proteins when viewed bymicroscopy. On the other hand, if SMC5 and SMC6 proveto have a stronger hinge association than other SMCproteins, this may reflect a role for this domain in tetheringsuch SMC proteins bound to separate DNA strands duringthe repair process, as has been suggested similarly forRad50 (de Jager et al. 2001; Hopfner et al. 2002).

Supplementary Material

Maximum-likelihood trees of selected SMC proteinsfrom eukaryotes in which all SMC protein sequences areavailable, are presented in an online supplementary figure.Separate trees of the six different subfamilies are showntogether with a consensus tree constructed from concate-nated alignments of the other sequences (SMC1 to SMC6).Scale bars denote the number of accepted substitutions inWhelan-Goldman distance units and bootstrap percentagesupport for the nodes are indicated.

Web sites where additional data are available includehttp://www.kazusa.or.jp/codon/ Codon Usage Database,http://www.molbiol.ox.ac.uk/cu/ CodonW Program, http://genome.med.yale.edu/Lifecycle/ Drosophila Gene Expres-sion Levels, and http://www.hgsc.bcm.tmc.edu/blast/?organism¼Dpseudoobscura D. pseudoobscura Genome

Acknowledgments

We would like to thank Andrew Lloyd of INCBI(Irish National Centre for Bioinformatics) and the UKHuman Genome Mapping Project Resource Centre for use

FIG. 7.—Models of SMC heterodimers, based on available structural data and the patterns of coevolution described in figure 5. The coiled-coilarms of SMC4 and SMC2 have positively correlated substitution rates and appear to be associated with each other in electron microscope images ofcondensin. By contrast, cohesin is believed to form an annular structure (Gruber, Haering, and Nasmyth 2003) in which the arms of SMC1 and SMC3are separated but their termini are bridged by the SCC1/RAD21 cohesin subunit, which might account for the lack of any significant correlationbetween substitution rates in these regions. As there is also little evidence of coevolution between the arms and exposed termini of SMC5 and SMC6,it is possible that these regions do not form significant contacts between these proteins either.

Phylogenetic Analysis of SMC Proteins 343

by guest on January 22, 2015http://m

be.oxfordjournals.org/D

ownloaded from

of computing facilities. In addition, we are grateful toDietlind Gerloff, Paul McLaughlin, Ken Wolfe, and ananonymous referee for helpful discussions and criticalreading of the manuscript. M.H. is a Senior ResearchFellow in the Basic Biomedical Sciences, funded by theWellcome Trust. N.C. has been supported by a DarwinTrust Prize Studentship.

Literature Cited

Adams, M. D., S. E. Celniker, R. A. Holt et al. 2000. Thegenome sequence of Drosophila melanogaster. Science287:2185–2195.

Aguinaldo, A. M., J. M. Turbeville, L. S. Linford, M. C. Rivera,J. R. Garey, R. A. Raff, and J. A. Lake. 1997. Evidence fora clade of nematodes, arthropods and other moulting animals.Nature 387:489–493.

Akashi, H. 1994. Synonymous codon usage in Drosophilamelanogaster: natural selection and translational accuracy.Genetics 136:927–935.

Akhmedov, A. T., C. Frei, M. Tsai-Pflugfelder, B. Kemper, S. M.Gasser, and R. Jessberger. 1998. Structural maintenance ofchromosomes protein C-terminal domains bind preferentiallyto DNA with secondary structure. J. Biol. Chem. 273:24088–24094.

Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z.Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLASTand PSI-BLAST: a new generation of protein database searchprograms. Nucleic Acids Res. 25:3389–3402.

Anderson, D. E., A. Losada, H. P. Erickson, and T. Hirano. 2002.Condensin and cohesin display different arm conformationswith characteristic hinge angles. J. Cell Biol. 156:419–424.

Anderson, D. E., K. M. Trujillo, P. Sung, and H. P. Erickson.2001. Structure of the Rad50�Mre11 DNA repair complexfrom Saccharomyces cerevisiae by electron microscopy.J. Biol. Chem. 276:37027–37033.

Anisimova, M., J. P. Bielawski, and Z. Yang. 2001. Accuracyand power of the likelihood ratio test in detecting adaptivemolecular evolution. Mol. Biol. Evol. 18:1585–1592.

Aravind, L., R. L. Tatusov, Y. I. Wolf, D. R. Walker, and E. V.Koonin. 1998. Evidence for massive gene exchange betweenarchaeal and bacterial hyperthermophiles. Trends. Genet.14:442–444.

Baldauf, S. L., and J. D. Palmer. 1993. Animals and fungi areeach other’s closest relatives: congruent evidence frommultiple proteins. Proc. Natl. Acad. Sci. USA 90:11558–11562.

Baldauf, S. L., J. D. Palmer, and W. F. Doolittle. 1996. The rootof the universal tree and the origin of eukaryotes based onelongation factor phylogeny. Proc. Natl. Acad. Sci. USA93:7749–7754.

Baldauf, S. L., A. J. Roger, I. Wenk-Siefert, and W. F. Doolittle.2000. A kingdom-level phylogeny of eukaryotes based oncombined protein data. Science 290:972–977.

Beasley, M., H. Xu, W. Warren, and M. McKay. 2002.Conserved disruptions in the predicted coiled-coil domainsof eukaryotic SMC complexes: implications for structure andfunction. Genome Res. 12:1201–1209.

Bendich, A. J., and K. Drlica. 2000. Prokaryotic and eukaryoticchromosomes: what’s the difference? Bioessays 22:481–486.

Bennetzen, J. L. and B. D. Hall. 1982. Codon selection in yeast.J. Biol. Chem. 257:3026–3031.

Brendel, V., L. Brocchieri, S. J. Sandler, A. J. Clark, and S.Karlin. 1997. Evolutionary comparisons of RecA-like proteinsacross all major kingdoms of living organisms. J. Mol. Evol.44:528–541.

Britton, R. A., and A. D. Grossman. 1999. Synthetic lethalphenotypes caused by mutations affecting chromosomepartitioning in Bacillus subtilis. J. Bacteriol. 181:5860–5864.

Brochier, C., E. Bapteste, D. Moreira, and H. Philippe. 2002.Eubacterial phylogeny based on translational apparatusproteins. Trends Genet. 18:1–5.

Brown, J. R., C. J. Douady, M. J. Italia, W. E. Marshall, and M. J.Stanhope. 2001. Universal trees based on large combinedprotein sequence data sets. Nat. Genet. 28:281–285.

Brown, N. P., C. Leroy, and C. Sander. 1998. MView: A Webcompatible database search or multiple alignment viewer.Bioinformatics 14:380–381.

Bruno, W. J., N. D. Socci, and A. L. Halpern. 2000. Weightedneighbor joining: a likelihood-based approach to distance-based phylogeny reconstruction. Mol. Biol. Evol. 17:189–197.

Bustamante, C. D., R. Nielsen, and D. L. Hartl. 2002. Amaximum likelihood method for analyzing pseudogeneevolution: implications for silent site evolution in humansand rodents. Mol. Biol. Evol. 19:110–117.

Cavalier-Smith, T. 2002. The neomuran origin of archaebacteria,the negibacterial root of the universal tree and bacterialmegaclassification. Int. J. Syst. Evol. Microbiol. 52:7–76.

Chuang, P. T., D. G. Albertson, and B. J. Meyer. 1994. DPY-27:a chromosome condensation protein homolog that regulatesC. elegans dosage compensation through association with theX chromosome. Cell 79:459–474.

Cobbe, N. and M. M. Heck. 2000. SMCs in the world ofchromosome biology- from prokaryotes to higher eukaryotes.J. Struct. Biol. 129:123–143.

Coghlan, A., and K. H. Wolfe. 2002. Fourfold faster rate ofgenome rearrangement in nematodes than in Drosophila.Genome Res. 12:857–867.

Collas, P., K. Le Guellec, and K. Tasken. 1999. The A-kinase-anchoring protein AKAP95 is a multivalent protein with a keyrole in chromatin condensation at mitosis. J. Cell Biol.147:1167–1180.

Comeron, J. M. and M. Kreitman. 1998. The correlation betweensynonymous and nonsynonymous substitutions in Droso-phila: mutation, selection or relaxed constraints? Genetics150:767–775.

de Jager, M., J. van Noort, D. C. van Gent, C. Dekker, R. Kanaar,and C. Wyman. 2001. Human Rad50/Mre11 is a flexiblecomplex that can tether DNA ends. Mol. Cell. 8:1129–1135.

Drummond, A., and K. Strimmer. 2001. PAL: An object-orientedprogramming library for molecular evolution and phyloge-netics. Bioinformatics 17:662–663.

Duret, L., and D. Mouchiroud. 1999. Expression pattern and,surprisingly, gene length shape codon usage in Caenorhabdi-tis, Drosophila, and Arabidopsis. Proc. Natl. Acad. Sci. USA96:4482–4487.

Echols, N., P. Harrison, S. Balasubramanian, N. M. Luscombe, P.Bertone, Z. Zhang, and M. Gerstein. 2002. Comprehensiveanalysis of amino acid and nucleotide composition ineukaryotic genomes, comparing genes and pseudogenes.Nucleic Acids Res. 30:2515–2523.

Eijpe, M., C. Heyting, B. Gross, and R. Jessberger. 2000.Association of mammalian SMC1 and SMC3 proteins withmeiotic chromosomes and synaptonemal complexes. J. CellSci. 113:673–682.

Eisen, J. A. 1995. The RecA protein as a model molecule formolecular systematic studies of bacteria: comparison of treesof RecAs and 16S rRNAs from the same species. J. Mol.Evol. 41:1105–1123.

Engels, W. R., J. Ducau, C. Flores, D. Johnson-Schlitz, and C. R.Preston. 2003. Double strand break repair: four pathways,many genes. A. Dros. Res. Conf. 44:1.

344 Cobbe and Heck

by guest on January 22, 2015http://m

be.oxfordjournals.org/D

ownloaded from

Fay, J. C., G. J. Wyckoff, and C. I. Wu. 2002. Testing the neutraltheory of molecular evolution with genomic data fromDrosophila. Nature 415:1024–1026.

Felsenstein, J. 1978. Cases in which parsimony or compatibi-lity methods will be positively misleading. Syst. Zool. 27:401–410.

Fousteri, M. I., and A. R. Lehmann. 2000. A novel SMC proteincomplex in Schizosaccharomyces pombe contains the Rad18DNA repair protein. EMBO J. 19:1691–1702.

Fraser, H. B., A. E. Hirsh, L. M. Steinmetz, C. Scharfe, andM. W. Feldman. 2002. Evolutionary rate in the proteininteraction network. Science 296:750–752.

Fujioka, Y., Y. Kimata, K. Nomaguchi, K. Watanabe, and K.Kohno. 2002. Identification of a novel non-SMC componentof the SMC5/SMC6 complex involved in DNA repair. J. Biol.Chem. 277:21585–21591.

Gorski, M. M., and J. C. J. Eeken. 2002. The Drosophila rad50mutants are pupal lethal, however the third instar larvae showelevated levels of anaphase bridges in dividing cells. A. Dros.Res. Conf. 43:201C.

Gouy, M., and W. H. Li. 1989. Molecular phylogeny of thekingdoms Animalia, Plantae, and Fungi. Mol. Biol. Evol.6:109–122.

Grantham, R., C. Gautier,M.Gouy,M. Jacobzone, andR.Mercier.1981. Codon catalog usage is a genome strategy modulated forgene expressivity. Nucleic Acids Res. 9:r43–r74.

Graybeal, A. 1998. Is it better to add taxa or characters to adifficult phylogenetic problem? Syst. Biol. 47:9–17.

Gruber, S., C. H. Haering, and K. Nasmyth. 2003. Chromosomalcohesin forms a ring. Cell 112:765–777.

Gu, Z., A. Cavalcanti, F. C. Chen, P. Bouman, and W. H. Li.2002. Extent of gene duplication in the genomes ofDrosophila, nematode, and yeast. Mol. Biol. Evol. 19:256–262.

Gupta, R. S. 1995. Phylogenetic analysis of the 90 kD heat shockfamily of protein sequences and an examination of therelationship among animals, plants, and fungi species. Mol.Biol. Evol. 12:1063–1073.

Haering, C. H., J. Lowe, A. Hochwagen, and K. Nasmyth. 2002.Molecular architecture of SMC proteins and the yeast cohesincomplex. Mol. Cell. 9:773–788.

Hagstrom, K. A., V. F. Holmes, N. R. Cozzarelli, and B. J.Meyer. 2002. C. elegans condensin promotes mitoticchromosome architecture, centromere organization, and sisterchromatid segregation during mitosis and meiosis. GenesDev. 16:729–742.

Hausdorf, B. 2000. Early evolution of the bilateria. Syst. Biol.49:130–142.

Hedges, S. B., H. Chen, S. Kumar, D. Y. Wang, A. S. Thompson,and H. Watanabe. 2001. A genomic timescale for the origin ofeukaryotes. BMC Evol. Biol. 1:4.

Hiraga, S. 2000. Dynamic localization of bacterial and plasmidchromosomes. Annu. Rev. Genet. 34:21–59.

Hirano, T. 2002. The ABCs of SMC proteins: two-armedATPases for chromosome condensation, cohesion, and repair.Genes Dev. 16:399–414.

Hirano, M., D. E. Anderson, H. P. Erickson, and T. Hirano. 2001.Bimodal activation of SMC ATPase by intra- and inter-molecular interactions. EMBO J. 20:3238–3250.

Hirano, M., and T. Hirano. 1998. ATP-dependent aggregation ofsingle-stranded DNA by a bacterial SMC homodimer. EMBOJ. 17:7139–7148.

———. 2002. Hinge-mediated dimerization of SMC protein isessential for its dynamic interaction with DNA. EMBO J.21:5733–5744.

Hopfner, K. P., L. Craig, G. Moncalian et al (12 co-authors).2002. The Rad50 zinc-hook is a structure joining Mre11

complexes in DNA recombination and repair. Nature418:562–566.

Hopfner, K. P., A. Karcher, L. Craig, T. T. Woo, J. P. Carney,and J. A. Tainer. 2001. Structural biochemistry and interactionarchitecture of the DNA double-strand break repair Mre11nuclease and Rad50-ATPase. Cell 105:473–485.

Hopfner, K. P., A. Karcher, D. S. Shin, L. Craig, L. M. Arthur,J. P. Carney, and J. A. Tainer. 2000. Structural biology ofRad50 ATPase: ATP-driven conformational control in DNAdouble-strand break repair and the ABC-ATPase superfamily.Cell 101:789–800.

Hughes, A. L., and M. Yeager. 1998. Natural selection at majorhistocompatibility complex loci of vertebrates. Annu. Rev.Genet. 32:415–435.

Jain, R., M. C. Rivera, and J. A. Lake. 1999. Horizontal genetransfer among genomes: the complexity hypothesis. Proc.Natl. Acad. Sci. USA 96:3801–3806.

Jessberger, R., B. Riwar, H. Baechtold, and A. T. Akhmedov.1996. SMC proteins constitute two subunits of the mamma-lian recombination complex RC-1. EMBO J. 15:4061–4068.

Johnson-Schlitz, D. M., J. Ducau, C. Flores, and W. R. Engels.2003. Evidence of DNA repair defects in mre11 and rad50mutants. A. Dros. Res. Conf. 44:323B.

Jones, S., and J. Sgouros. 2001. The cohesin complex: sequencehomologies, interaction networks and shared motifs. GenomeBiol. 2:RESEARCH0009.

Jordan, I. K., I. B. Rogozin, Y. I. Wolf, and E. V. Koonin. 2002.Essential genes are more evolutionarily conserved than arenonessential genes in bacteria. Genome Res. 12:962–968.

Kanaya, S., Y. Yamada, M. Kinouchi, Y. Kudo, and T. Ikemura.2001. Codon usage and tRNA genes in eukaryotes: correlationof codon usage diversity with translation efficiency and withCG-dinucleotide usage as assessed by multivariate analysis.J. Mol. Evol. 53:290–298.

Kim, S. T., B. Xu, and M. B. Kastan. 2002. Involvement of thecohesin protein, Smc1, in Atm-dependent and independentresponses to DNA damage. Genes Dev. 16:560–570.

Kimura, K., and T. Hirano. 1997. ATP-dependent positivesupercoiling of DNA by 13S condensin: a biochemicalimplication for chromosome condensation. Cell 90:625–634.

Kimura, K., V. V. Rybenkov, N. J. Crisona, T. Hirano, and N. R.Cozzarelli. 1999. 13S condensin actively reconfigures DNAby introducing global positive writhe: implications forchromosome condensation. Cell 98:239–248.

Klein, F., P. Mahr, M. Galova, S. B. Buonomo, C. Michaelis, K.Nairz, and K. Nasmyth. 1999. A central role for cohesins insister chromatid cohesion, formation of axial elements, andrecombination during yeast meiosis. Cell 98:91–103.

Koski, L. B., R. A. Morton, and G. B. Golding. 2001. Codon biasand base composition are poor indicators of horizontallytransferred genes. Mol. Biol. Evol. 18:404–412.

Kuhner, M. K., and J. Felsenstein. 1994. A simulationcomparison of phylogeny algorithms under equal and unequalevolutionary rates. Mol. Biol. Evol. 11:459–468.

Lang, B. F., C. O’Kelly, T. Nerad, M. W. Gray, and G. Burger.2002. The closest unicellular relatives of animals. Curr. Biol.12:1773–1778.

Lehmann, A. R., M. Walicka, D. J. Griffiths, J. M. Murray, F. Z.Watts, S. McCready, and A. M. Carr. 1995. The rad18 gene ofSchizosaccharomyces pombe defines a new subgroup of theSMC superfamily involved in DNA repair. Mol. Cell. Biol.15:7067–7080.

Li, W.-H. 1997. Molecular Evolution. Sinauer Associates,Sunderland, Mass.

Li, W. H., T. Gojobori, and M. Nei. 1981. Pseudogenes asa paradigm of neutral evolution. Nature 292:237–239.

Phylogenetic Analysis of SMC Proteins 345

by guest on January 22, 2015http://m

be.oxfordjournals.org/D

ownloaded from

Lieb, J. D., M. R. Albrecht, P. T. Chuang, and B. J. Meyer. 1998.MIX-1: an essential component of the C. elegans mitoticmachinery executes X chromosome dosage compensation.Cell 92:265–277.

Lindow, J. C., R. A. Britton, and A. D. Grossman. 2002. Structuralmaintenance of chromosomes protein of Bacillus subtilisaffects supercoiling in vivo. J. Bacteriol. 184:5317–5322.

Liu, C. C., J. McElver, I. Tzafrir, R. Joosen, P. Wittich, D.Patton, A. A. Van Lammeren, and D. Meinke. 2002.Condensin and cohesin knockouts in Arabidopsis exhibita titan seed phenotype. Plant J. 29:405–415.

Lopez, P., P. Forterre, and H. Philippe. 1999. The root of the treeof life in the light of the covarion model. J. Mol. Evol.49:496–508.

Losada, A., and T. Hirano. 2001. Intermolecular DNA inter-actions stimulated by the cohesin complex in vitro. Implica-tions for sister chromatid cohesion. Curr. Biol. 11:268–272.

Lowe, J., S. C. Cordell, and F. van den Ent. 2001. Crystal structureof the SMC head domain: an ABC ATPase with 900 residuesantiparallel coiled-coil inserted. J. Mol. Biol. 306:25–35.

Lyons-Weiler, J., and G. A. Hoelzer. 1997. Escaping from theFelsenstein zone by detecting long branches in phylogeneticdata. Mol. Phylogenet. Evol. 8:375–384.

Melby, T. E., C. N. Ciampaglio, G. Briscoe, and H. P. Erickson.1998. The symmetrical structure of structural maintenance ofchromosomes (SMC) and MukB proteins: long, antiparallelcoiled coils, folded at a flexible hinge. J. Cell Biol. 142:1595–1604.

Mengiste, T., E. Revenkova, N. Bechtold, and J. Paszkowski.1999. An SMC-like protein is required for efficient homol-ogous recombination in Arabidopsis. EMBO J. 18:4505–4512.

Messier, W., and C. Stewart. 1997. Episodic adaptive evolutionof primate lysozymes. Nature 385:151–154.

Michaelis, C., R. Ciosk, and K. Nasmyth. 1997. Cohesins:chromosomal proteins that prevent premature separation ofsister chromatids. Cell 91:35–45.

Montgomery, E. A., B. Charlesworth, and C. H. Langley. 1987.A test for the role of natural selection in the stabilization oftransposable element copy number in a population ofDrosophila melanogaster. Genet. Res. 49:31–41.

Morgenstern, B., K. Frech, A. Dress, and T. Werner. 1998.DIALIGN: Finding local similarities by multiple sequencealignment. Bioinformatics 14:290–294.

Mushegian, A. R., J. R. Garey, J. Martin, and L. X. Liu. 1998.Large-scale taxonomic profiling of eukaryotic model organ-isms: a comparison of orthologous proteins encoded by thehuman, fly, nematode, and yeast genomes. Genome Res.8:590–598.

Nei, M., P. Xu, and G. Glazko. 2001. Estimation of divergencetimes from multiprotein sequences for a few mammalianspecies and several distantly related organisms. Proc. Natl.Acad. Sci. USA 98:2497–2502.

Niki, H., R. Imamura, M. Kitaoka, K. Yamanaka, T. Ogura, andS. Hiraga. 1992. E. coli MukB protein involved inchromosome partition forms a homodimer with a rod-and-hinge structure having DNA binding and ATP/GTP bindingactivities. EMBO J. 11:5101–5109.

Pal, C., B. Papp, and L. D. Hurst. 2001. Highly expressed genesin yeast evolve slowly. Genetics 158:927–931.

Revenkova, E., M. Eijpe, C. Heyting, B. Gross, and R.Jessberger. 2001. Novel meiosis-specific isoform of mamma-lian SMC1. Mol. Cell Biol. 21:6984–6998.

Saitoh, N., I. Goldberg, and W. C. Earnshaw. 1995. The SMCproteins and the coming of age of the chromosome scaffoldhypothesis. BioEssays 17:759–766.

Saitoh, N., I. G. Goldberg, E. R. Wood, and W. C. Earnshaw.1994. ScII: an abundant chromosome scaffold protein isa member of a family of putative ATPases with an unusualpredicted tertiary structure. J. Cell Biol. 127:303–318.

Saka, Y., T. Sutani, Y. Yamashita, S. Saitoh, M. Takeuchi, Y.Nakaseko, and M. Yanagida. 1994. Fission yeast cut3 andcut14, members of a ubiquitous protein family, are requiredfor chromosome condensation and segregation in mitosis.EMBO J. 13:4938–4952.

Sakai, A., K. Hizume, T. Sutani, K. Takeyasu, and M. Yanagida.2003. Condensin but not cohesin SMC heterodimer inducesDNA reannealing through protein-protein assembly. EMBO J.22:2764–2775.

Schmid, K. J., and D. Tautz. 1997. A screen for fast evolving genesfrom Drosophila. Proc. Natl. Acad. Sci. USA 94:9746–9750.

Sharp, P. M., and W. H. Li. 1989. On the rate of DNA sequenceevolution in Drosophila. J. Mol. Evol. 28:398–402.

Shields, D. C., P. M. Sharp, D. G. Higgins, and F. Wright. 1988.‘‘Silent’’ sites in Drosophila genes are not neutral: evidenceof selection among synonymous codons. Mol. Biol. Evol.5:704–716.

Siddiqui, N. U., P. E. Stronghill, R. E. Dengler, C. A.Hasenkampf, and C. D. Riggs. 2003. Mutations in Arabi-dopsis condensin genes disrupt embryogenesis, meristemorganization and segregation of homologous chromosomesduring meiosis. Development 130:3283–3295.

Sjogren, C., and K. Nasmyth. 2001. Sister chromatid cohesion isrequired for postreplicative double-strand break repair inSaccharomyces cerevisiae. Curr. Biol. 11:991–995.

Soppa, J. 2001. Prokaryotic structural maintenance of chromo-somes (SMC) proteins: distribution, phylogeny, and compar-ison with MukBs and additional prokaryotic and eukaryoticcoiled-coil proteins. Gene 278:253–264.

Soppa, J., K. Kobayashi, M. F. Noirot-Gros, D. Oesterhelt, S. D.Ehrlich, E. Dervyn, N. Ogasawara, and S. Moriya. 2002.Discovery of two novel families of proteins that are proposedto interact with prokaryotic SMC proteins, and characteriza-tion of the Bacillus subtilis family members ScpA and ScpB.Mol. Microbiol. 45:59–71.

Springer, M. S., R. W. DeBry, C. Douady, H. M. Amrine, O.Madsen, W. W. de Jong, and M. J. Stanhope. 2001.Mitochondrial versus nuclear gene sequences in deep-levelmammalian phylogeny reconstruction. Mol. Biol. Evol.18:132–143.

Steen, R. L., F. Cubizolles, K. Le Guellec, and P. Collas. 2000. Akinase-anchoring protein (AKAP)95 recruits human chromo-some-associated protein (hCAP)-D2/Eg7 for chromosomecondensation in mitotic extract. J. Cell Biol. 149:531–536.

Steffensen, S., P. A. Coelho, N. Cobbe, S. Vass, M. Costa, B.Hassan, S. N. Prokopenko, H. Bellen, M. M. Heck, and C. E.Sunkel. 2001. A role for Drosophila SMC4 in the resolutionof sister chromatids in mitosis. Curr. Biol. 11:295–307.

Strimmer, K., and A. von Haeseler. 1996. Quartet puzzling:a quartet maximum likelihood method for reconstructing treetopologies. Mol. Biol. Evol. 13:964–969.

Strunnikov, A. V., E. Hogan, and D. Koshland. 1995. SMC2,a Saccharomyces cerevisiae gene essential for chromosomesegregation and condensation, defines a subgroup within theSMC family. Genes Dev. 9:587–599.

Strunnikov, A. V., V. L. Larionov, and D. Koshland. 1993.SMC1: an essential yeast gene encoding a putative head-rod-tail protein is required for nuclear division and defines a newubiquitous protein family. J. Cell Biol. 123:1635–1648.

Sutani, T., and M. Yanagida. 1997. DNA renaturation activity ofthe SMC complex implicated in chromosome condensation.Nature 388:798–801.

346 Cobbe and Heck

by guest on January 22, 2015http://m

be.oxfordjournals.org/D

ownloaded from

Suzuki, Y., G. V. Glazko, and M. Nei. 2002. Overcredibility ofmolecular phylogenies obtained by Bayesian phylogenetics.Proc. Natl. Acad. Sci. USA 99:16138–16143.

Swofford, D. L. 1999. PAUP*: phylogenetic analysis usingparsimony (*and other methods). Version 4.0. SinauerAssociates, Sunderland, Mass.

Tateno, Y., N. Takezaki, and M. Nei. 1994. Relative efficienciesof the maximum-likelihood, neighbor-joining, and maximum-parsimony methods when substitution rate varies with site.Mol. Biol. Evol. 11:261–277.

Taylor, E. M., J. S. Moghraby, J. H. Lees, B. Smit, P. B. Moens,and A. R. Lehmann. 2001. Characterization of a novel humanSMC heterodimer homologous to the Schizosaccharomycespombe Rad18/Spr18 complex. Mol. Biol. Cell. 12:1583–1594.

Thatcher, T. H., and M. A. Gorovsky. 1994. Phylogeneticanalysis of the core histones H2A, H2B, H3, and H4. NucleicAcids Res. 22:174–179.

Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994.CLUSTAL W: improving the sensitivity of progressivemultiple sequence alignment through sequence weighting,position-specific gap penalties and weight matrix choice.Nucleic Acids Res. 22:4673–4680.

Van de Peer, Y., S. L. Baldauf, W. F. Doolittle, and A. Meyer.2000. An updated and comprehensive rRNA phylogeny of(crown) eukaryotes based on rate-calibrated evolutionarydistances. J. Mol. Evol. 51:565–576.

van den Ent, F., A. Lockhart, J. Kendrick-Jones, and J. Lowe.1999. Crystal structure of the N-terminal domain of MukB:a protein involved in chromosome partitioning. Struct. Fold.Des. 7:1181–1187.

Verkade, H. M., S. J. Bugg, H. D. Lindsay, A. M. Carr, and M. J.O’Connell. 1999. Rad18 is required for DNA repair andcheckpoint responses in fission yeast. Mol. Biol. Cell.10:2905–2918.

Veuthey, A. L., and G. Bittar. 1998. Phylogenetic relationships offungi, plantae, and animalia inferred from homologouscomparison of ribosomal proteins. J. Mol. Evol. 47:81–92.

Wang, D. Y., S. Kumar, and S. B. Hedges. 1999. Divergence

time estimates for the early history of animal phyla and the

origin of plants, animals and fungi. Proc. R. Soc. Lond. B

Biol. Sci. 266:163–171.Whelan, S., and N. Goldman. 2001. A general empirical model

of protein evolution derived from multiple protein families

using a maximum-likelihood approach. Mol. Biol. Evol. 18:691–699.

Wolf, Y. I., I. B. Rogozin, N. V. Grishin, R. L. Tatusov, and E.

V. Koonin. 2001. Genome trees constructed using five

different approaches suggest new major bacterial clades.

BMC Evol. Biol. 1:8.Wright, F. 1990. The effective number of codons used in a gene.

Gene 87:23–29.Yamazoe, M., T. Onogi, Y. Sunako, H. Niki, K. Yamanaka, T.

Ichimura, and S. Hiraga. 1999. Complex formation of MukB,

MukE and MukF proteins involved in chromosome partition-

ing in Escherichia coli. EMBO J. 18:5873–5884.Yang, J., Z. Gu, and W. H. Li. 2003. Rate of protein evolution

versus fitness effect of gene deletion. Mol. Biol. Evol.

20:772–724.Yazdi, P. T., Y. Wang, S. Zhao, N. Patel, E. Y. Lee, and J. Qin.

2002. SMC1 is a downstream effector in the ATM/NBS1

branch of the human S-phase checkpoint. Genes Dev. 16:571–582.

Yoshimura, S. H., K. Hizume, A. Murakami, T. Sutani, K.

Takeyasu, and M. Yanagida. 2002. Condensin architecture

and interaction with DNA: regulatory non-SMC subunits bind

to the head of SMC heterodimer. Curr Biol. 12:508–513.Zwickl, D. J., and D. M. Hillis. 2002. Increased taxon sampling

greatly reduces phylogenetic error. Syst. Biol. 51:588–598.

Michele Vendruscolo, Associate Editor

Accepted September 29, 2003

Phylogenetic Analysis of SMC Proteins 347

by guest on January 22, 2015http://m

be.oxfordjournals.org/D

ownloaded from