The Author 2006. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected]
Research Article
Reorganization of adjacent gene relationships in yeast genomes by
whole genome duplication and gene deletion
Jake K. Byrnes*, Geoffrey P. Morris*, and Wen-Hsiung Li
* These authors contributed equally.
Department of Ecology and Evolution, University of Chicago
Address for correspondence and reprints: Wen-Hsiung Li, Department of Ecology &
Evolution, University of Chicago, 1101 East 57th St., Chicago, Illinois 60637. Phone:
773-702-3104. Fax: 773-702-9740. Email: [email protected]
Running head: Reorganization of adjacent genes in yeast
Keywords: Whole genome duplication, gene deletion, Saccharomyces, adjacent gene
orientation, intergenic spacing, gene expression.
Abbreviations: whole genome duplication followed by deletion (WGD-D), percentage of
divergently and convergently transcribed adjacent gene pairs (%DC)
MBE Advance Access published March 9, 2006
Downloaded from http://mbe.oxfordjournals.org/ by guest on May 30, 2013
Abstract
In Saccharomyces, an ancient whole genome duplication (WGD) and widespread
duplicate gene deletion resulted in extensive reorganization of adjacent gene
relationships. We have studied the evolution of adjacent gene pairs’ identity, orientation,
and spacing following WGD and deletion (WGD-D) using comparative genomic analyses
and simulations. Surveying adjacent gene organization across the Saccharomyces species
complex, we find a genome-wide bias towards divergently and convergently transcribed
gene pairs in all species, but a reduced bias in the species that underwent WGD-D.
Among neutral models of WGD-D, only single-gene deletion can produce the observed
reduction in orientation bias and recapitulate the pattern of short, highly dispersed
deletions we observe in S. cerevisiae. To characterize the dynamics of WGD-D, we
trace the conservation and creation of adjacent gene pairs along the S. cerevisiae
lineage. We find that newly created adjacencies have a tandem orientation bias, while
adjacencies conserved from prior to WGD-D have the same divergent-convergent bias as
found in the species that diverged before WGD. We also find that adjacent gene pairs
produced by WGD-D gained greater intergenic spacing, but that this is reduced in the
older adjacencies. Given this, and the preponderance of short deleted blocks, we argue
that the deletion phase of WGD-D occurred primarily by small inactivating mutations
followed by numerous small deletions. Newly created adjacent gene pairs also have an
initial increase in mean log2 expression ratios and maximal expression levels, suggesting
that increased intergenic spacing caused a genome-wide reduction in transcriptional
interference.
Introduction
Based on analyses of syntenic blocks of duplicate genes in the Saccharomyces
cerevisiae genome, Wolfe and Shields (1997) proposed an ancient whole genome
duplication (WGD) in the budding yeast. This hypothesis has recently been confirmed by
comparative genomic analyses of two yeast species that diverged from S. cerevisiae prior
to WGD, Ashbya gossypii (Dietrich et al. 2004) and Kluyveromyces waltii (Kellis, Birren,
and Lander 2004). There is now an excellent opportunity to study genome evolution
following WGD using the many species in the Saccharomyces species complex
(Kurtzman 2003) for which whole-genome sequences are available (Kellis et al. 2003;
Cliften et al. 2003; Dujon et al. 2004). Species that diverged after WGD (“post-WGD”)
include the Saccharomyces sensu stricto species S. mikatae, S. kudriavzevii, S.
bayanus, and the more distantly related S. castellii and Candida glabrata. Those that
diverged from the S. cerevisiae lineage prior to WGD (“non-WGD”) include S. kluyveri
and K. lactis, in addition to A. gossypii and K. waltii.
Despite the initial doubling of genome content due to WGD, there are now only
modest differences in genome size and gene number between non-WGD and post-WGD
species. For example, the K. waltii genome has 10.7 million base pairs (Mbp) and ~5200
genes, while the S. cerevisiae genome contains 12.5 Mbp and ~5700 genes. Was the
reduction of genome size concurrent with the reduction in gene number, with large
deletions directly responsible for gene loss, or did much of the deletion follow prior
pseudogenization? We may gain insight into the dynamics of this process by studying
the organization of the remaining genes.
In fact, the process of WGD followed by deletion (WGD-D) left a complex
pattern of large, interleaved syntenic blocks in S. cerevisiae. Since few duplicates remain
(~10%; Kellis, Birren, and Lander 2004), these syntenic blocks only become obvious in a
2:1 alignment of S. cerevisiae syntenic blocks to the corresponding block in a non-WGD
species (Kellis, Birren, and Lander 2004; Dietrich et al. 2004). Visual inspection of the
interleaving pattern in these 2:1 alignments seems to suggest a preponderance of short,
highly dispersed deletions, but the underlying deletion process has never been modeled.
It is not known whether a simple model of short random deletions could produce this
pattern, or if deletions are more or less interleaved than would be expected by chance.
This interleaving is important because it may represent a rare opportunity for
genome reorganization in yeast, given the apparent paucity of inversions and
translocations (Fischer et al. 2000). Following WGD-D, genome location was mostly
retained (albeit across two duplicate chromosomes) while adjacent gene relationships
were largely altered. Therefore, we focused on the dynamics of genomic reorganization
from the perspective of adjacent gene organization. A deletion may affect one or more of
three aspects of adjacent gene organization: identity, orientation, and spacing. First, the
identities of the genes in an adjacent pair change when one or more genes are lost and the
flanking genes form a new adjacent pair. Second, after a deletion, the newly adjacent
pair may have a different transcriptional orientation (tandem, convergent, or divergent;
Cohen et al. 2000) than the old adjacent pairs. Finally, the spacing of adjacent genes may
be reduced or increased by a deletion, depending on its boundaries.
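To make the three aspects concrete, the following minimal Python sketch (not the authors' code) assumes a hypothetical `Gene` record with inclusive leftmost/rightmost coordinates and a strand. It classifies an adjacent pair as tandem, convergent, or divergent, reading the left gene first along the chromosome, and reports the identity, orientation, and spacing of the new pair created when one gene is removed. The gene names and coordinates are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; coordinates are 1-based and inclusive."""
    name: str
    start: int   # leftmost base of the gene
    end: int     # rightmost base of the gene
    strand: str  # "+" (transcribed left to right) or "-"


def orientation(left: Gene, right: Gene) -> str:
    """Classify an adjacent pair; `left` is the gene nearer the chromosome start."""
    if left.strand == right.strand:
        return "tandem"       # -> ->  or  <- <-
    if left.strand == "+":
        return "convergent"   # -> <-
    return "divergent"        # <- ->


def spacing(left: Gene, right: Gene) -> int:
    """Number of intergenic bases between two adjacent genes."""
    return right.start - left.end - 1


def delete_and_rejoin(genes: list[Gene], index: int):
    """Delete the interior gene genes[index]; describe the new adjacency."""
    left, right = genes[index - 1], genes[index + 1]
    return (left.name, right.name), orientation(left, right), spacing(left, right)


genes = [Gene("A", 100, 900, "+"), Gene("B", 1200, 2000, "-"), Gene("C", 2300, 3100, "+")]
print(orientation(genes[0], genes[1]), orientation(genes[1], genes[2]))  # convergent divergent
print(delete_and_rejoin(genes, 1))  # (('A', 'C'), 'tandem', 1399)
```

In this toy case, deleting B replaces a convergent pair and a divergent pair with a single tandem pair whose spacing grows from 299 bp to 1399 bp, so one deletion changes identity, orientation, and spacing at once.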
There are a number of reasons to consider genome reorganization in terms of
adjacent genes as opposed to chromosomal location. First, a preliminary examination of
the S. cerevisiae synteny map suggests that most deletions are small, altering the local
gene organization. In this case, adjacency may be the aspect of genome structure most
affected by WGD-D. Second, there is mounting evidence for functional interactions
between adjacent genes in eukaryotes. There are many well-documented instances of
transcriptional interference for adjacent genes (Shearwin, Callen, and Egan 2005) and
divergent transcription from bidirectional promoters, including GAL1-GAL10 in S.
cerevisiae and prnD-prnB in Aspergillus nidulans (Lohr, Venkov, and Zlatanova 1995;
Garcia et al. 2004). Genome-wide analyses have shown differing mean expression
correlations for divergent, convergent, and tandem adjacent gene pairs in yeast (Cohen et
al. 2000) and greater intergenic spacing for genes with higher expression in humans
(Chiaromonte, Miller, and Bouhassira 2003). Adjacent gene orientation has also been
linked to the localization of cohesin domains (Filipski and Mucha 2002) and hotspots for
recombination (Gerton et al. 2000) in yeast. Finally, we study adjacent genes because
several of the sequenced yeast genomes are currently available only as fragments such as
contigs or supercontigs. By analyzing adjacent genes we can investigate genome
reorganization without knowing the entire genome structure.
In this study, we analyze genome sequences of post-WGD and non-WGD species
and implement simulations of the WGD-D process to address the following questions.
What is the typical scale of deletion and how does this inform our model of gene loss?
Does the genome organization of post-WGD species (i.e., interleaving and adjacent gene
relationships) provide evidence for selection shaping WGD-D or for functional
consequences of WGD-D on expression patterns? We find that gene loss occurred by
inactivating mutation, followed by numerous small deletions, and that the resulting
increase in intergenic spacing led to a widespread, but largely neutral, reduction in
transcriptional interference across the yeast genome.
Materials and Methods
Genomic analyses
The Saccharomyces Genome Database (SGD) annotation for S. cerevisiae and
the Washington University annotations for S. mikatae, S. kudriavzevii, S. bayanus, S.
castellii, and S. kluyveri were downloaded from SGD (http://www.yeastgenome.org).
The K. waltii genome annotation was downloaded from the supplemental website for
Kellis, Birren, and Lander (2004; http://www.broad.mit.edu/seq/YeastDuplication/). To
avoid spurious ORFs, we used only K. waltii ORFs with homology to S. cerevisiae ORFs.
The A. gossypii genome annotation was downloaded from the Ashbya Genome Database
(http://agd.unibas.ch/). The genome annotations for C. glabrata and K. lactis were
downloaded from Génolevures (http://cbi.labri.fr/Genolevures/). Orthology to S.
cerevisiae was provided in the respective genome annotations. The lengths of post-WGD
deletions were collected from the S. cerevisiae-A. gossypii alignment, provided in the
supplemental materials from Dietrich et al. (2004). Orthologous genes with no
connection to a syntenic block were not included in counts of deleted block lengths.
We define %DC as the percentage of adjacent gene pairs that are in divergent or
convergent orientation. We combine the two orientations into one summary statistic
because they are structurally dependent. For any contiguous block of genes, the
divergent and convergent counts can differ by at most one, since divergent and
convergent adjacencies are the switching points between tracts of genes on opposite
strands. Genome-wide %DC and intergenic length data were collected using Perl scripts
and MySQL queries.
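The definition of %DC and the structural dependency between divergent and convergent counts can be checked with a short Python sketch. It assumes strands are listed in chromosomal order as "+"/"-"; pooling pairs over several blocks (chromosomes or contigs) for a genome-wide value is our assumption about how blocks are combined, not a detail stated in the text.

```python
import random


def percent_dc(strands):
    """%DC for one contiguous block: share of adjacent pairs on opposite strands."""
    pairs = list(zip(strands, strands[1:]))
    if not pairs:
        raise ValueError("a block needs at least two genes")
    return 100.0 * sum(a != b for a, b in pairs) / len(pairs)


def genome_percent_dc(blocks):
    """Pool adjacent pairs over several blocks (e.g. chromosomes or contigs)."""
    opposite = total = 0
    for strands in blocks:
        pairs = list(zip(strands, strands[1:]))
        opposite += sum(a != b for a, b in pairs)
        total += len(pairs)
    return 100.0 * opposite / total


def dc_counts(strands):
    """Return (divergent, convergent) counts, reading the block left to right."""
    pairs = list(zip(strands, strands[1:]))
    divergent = sum(a == "-" and b == "+" for a, b in pairs)   # <- ->
    convergent = sum(a == "+" and b == "-" for a, b in pairs)  # -> <-
    return divergent, convergent


block = ["+", "+", "-", "+", "-", "-"]
print(percent_dc(block))  # 60.0
print(dc_counts(block))   # (1, 2)

# Divergent and convergent pairs alternate along a block, so |D - C| <= 1.
rng = random.Random(1)
for _ in range(1000):
    strands = [rng.choice("+-") for _ in range(rng.randint(2, 50))]
    d, c = dc_counts(strands)
    assert abs(d - c) <= 1
```

Because each strand switch is either a divergent or a convergent pair and the two kinds must alternate, only their sum carries independent information, which is why a single %DC value is reported.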
Evolutionary analyses
We traced the origin of each pair of adjacent genes in S. cerevisiae, looking for
conservation in successive outgroups along the S. cerevisiae lineage (including S.
mikatae, S. kudriavzevii, S. bayanus, S. castellii, S. kluyveri, and A. gossypii). Since
homoplasy is unlikely, we assume that any S. cerevisiae adjacency also found in an
outgroup was present in their common ancestor regardless of its absence in intervening
nodes. Tandem gene duplication also creates new adjacencies, skewing the distribution
of orientations in younger adjacency classes. Indeed, we found that 14 out of 52 tandem
duplicates in the full dataset are in the youngest class, so we removed all adjacent
duplicate genes for subsequent analyses. We present results from analysis of all S.
cerevisiae ORFs included in Harbison et al. (2004; 5546 ORFs), though the patterns hold
if we use only adjacencies where both ORFs are classified by SGD as “verified”. We
investigated whether other possible origins for S. cerevisiae-specific adjacencies
(inversion, mis-annotation) could have biased our results, but found no evidence for this.
If we use only the S. cerevisiae-specific adjacencies that have unambiguous evidence of
WGD-D origin (i.e., orthologs in non-WGD species are fewer than 10 genes apart), the
results are the same.
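The age assignment described above can be sketched in Python as follows. This is a minimal illustration, assuming each outgroup's adjacencies have already been translated into pairs of S. cerevisiae ortholog names (a hypothetical input format); the gene names in the example are placeholders, not real data.

```python
# Outgroups ordered from most recent to most ancient divergence from
# S. cerevisiae, following the list in the text.
OUTGROUPS = ["S. mikatae", "S. kudriavzevii", "S. bayanus",
             "S. castellii", "S. kluyveri", "A. gossypii"]


def adjacency_age(pair, adjacencies_by_species):
    """Return the most distant outgroup sharing this S. cerevisiae adjacency.

    `adjacencies_by_species` maps a species name to a set of adjacencies, each
    a frozenset of two S. cerevisiae ortholog names (hypothetical format).
    Assuming no homoplasy, a shared adjacency is dated to the oldest common
    ancestor even if intervening species lack it.
    """
    key = frozenset(pair)
    age = "S. cerevisiae-specific"
    for species in OUTGROUPS:
        if key in adjacencies_by_species.get(species, set()):
            age = species  # later (more distant) hits overwrite earlier ones
    return age


# Placeholder names; the adjacency is absent from the intervening species.
observed = {
    "S. mikatae": {frozenset({"geneA", "geneB"})},
    "A. gossypii": {frozenset({"geneA", "geneB"})},
}
print(adjacency_age(("geneB", "geneA"), observed))  # A. gossypii
print(adjacency_age(("geneC", "geneD"), observed))  # S. cerevisiae-specific
```

Using an unordered `frozenset` makes the lookup independent of which gene is listed first, so inverted or differently annotated pairs still match.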
Simulations
To examine the potential for reorganization of adjacent gene relationships by
WGD-D, we used a simulation coded in Perl. Each chromosome was represented as an
array of genes that recorded presence/absence and orientation. The input genomes had
the chromosome number and gene number of A. gossypii, a non-WGD species. Gene
orientations were either modeled after A. gossypii or constructed randomly within a set
range of %DC. The genomes were then duplicated, and the deletion process occurred in
three stages. First, we drew “attempted” deletions, which represent the underlying
mutational process. For each attempted deletion event, the locus was randomly selected
and the length (in number of genes) was drawn from a uniform distribution (maximum
block size of one or two genes) or a Poisson distribution (mean block size of one or two
genes). For the neutral simulation, the duplicate copy was chosen for deletion randomly.
For the selective simulation, we weighted the probability that a duplicate copy was
chosen based on the net gain of tandem adjacencies for a deletion of one copy versus the
other. When defining the deletion boundaries, previously deleted genes were assumed to
have zero length. Next, attempted deletions that remove only redundant gene copies
become “accepted” deletions. This assumes that any deletion that removes a single-copy
gene would be effectively lethal. Finally, overlapping and adjacent deletions were
combined to form “apparent” deletions, equivalent to the blocks of deleted genes that can
be observed in yeast. Therefore, it is the apparent deleted block length distributions from
our simulations that we compare to the observed deleted block length distribution from S.
cerevisiae. When we discuss “deleted blocks” from either simulation or data analysis, we
will always be referring to apparent deleted blocks. In S. cerevisiae, Kellis, Birren, and
Lander (2004) found that ~10% of duplicates remained from the WGD, so in our
simulations we allowed deletions to continue until 10% of duplicate genes remained. All
simulation results were generated using 10,000 iterations.
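The neutral single-gene case of this procedure can be sketched in Python. This is a simplified re-implementation under stated assumptions, not the original Perl simulation: it models one chromosome rather than the full A. gossypii karyotype, omits multi-gene (uniform or Poisson) deletion lengths and selective copy choice, and builds an ancestral genome with a target %DC as a Markov chain of strands, which is our own construction.

```python
import random


def random_genome(n_genes, switch_prob, rng):
    """Ancestral strands as a Markov chain: each neighbour switches strand with
    probability `switch_prob`, so the expected %DC is 100 * switch_prob."""
    strands = [rng.choice("+-")]
    for _ in range(n_genes - 1):
        prev = strands[-1]
        flip = rng.random() < switch_prob
        strands.append(("-" if prev == "+" else "+") if flip else prev)
    return strands


def percent_dc(blocks):
    """Genome-wide %DC, pooling adjacent pairs over all blocks."""
    pairs = [(a, b) for s in blocks for a, b in zip(s, s[1:])]
    return 100.0 * sum(a != b for a, b in pairs) / len(pairs)


def simulate_wgd_d(ancestral, retained_fraction=0.10, rng=None):
    """Neutral single-gene WGD-D on one chromosome.

    Returns presence flags for both duplicated copies. An attempted deletion
    picks a random locus and a random copy; it is accepted only if both copies
    are still present (removing a single-copy gene is treated as lethal).
    """
    rng = rng or random.Random()
    n = len(ancestral)
    present = [[True] * n, [True] * n]   # two copies right after WGD
    duplicated = n                       # loci that still have both copies
    target = round(retained_fraction * n)
    while duplicated > target:
        locus = rng.randrange(n)          # attempted deletion
        copy = rng.randrange(2)           # neutral: either copy equally likely
        if present[0][locus] and present[1][locus]:
            present[copy][locus] = False  # accepted deletion
            duplicated -= 1
    return present


def surviving_strands(ancestral, present):
    """Strand order of the genes that remain on each duplicated copy."""
    return [[s for s, keep in zip(ancestral, flags) if keep] for flags in present]


rng = random.Random(42)
ancestor = random_genome(5000, 0.56, rng)
present = simulate_wgd_d(ancestor, rng=rng)
copies = surviving_strands(ancestor, present)
print(round(percent_dc([ancestor]), 1), round(percent_dc(copies), 1))
```

Repeating the last four lines with different seeds gives a null distribution of post-deletion %DC values; apparent deleted blocks can be read off the `present` flags as runs of `False`.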
Gene expression analyses
S. cerevisiae expression data from Affymetrix GeneChip (101 microarray
experiments) were obtained from NCBI’s Gene Expression Omnibus
(http://www.ncbi.nlm.nih.gov/geo/). Only the hybridization intensities for the perfect
match probes were used for further analyses. These data were background-corrected and
quantile normalized in R (http://www.r-project.org) using the Affy package from
Bioconductor (http://www.bioconductor.org; Gautier et al. 2004). The probe-to-gene
annotation was created from a MEGABLAST (Zhang et al. 2000) similarity search of the
probe sequences against the most recent version of S. cerevisiae coding sequences. To
avoid spurious signal due to cross-hybridization, probes that matched more than one gene
with an E-value < 10^-2 (at least 12 consecutive base pairs and 16/25 base pairs matching
overall) were dropped from the analysis. The relative expression values used in
subsequent analyses are mean intensities from the set of unique probes. To quantify
expression coupling of the genes in an adjacent pair, we calculated the mean of the
absolute value of log2 expression ratios for adjacent genes (abs[log2[adjacent gene #1
expression/adjacent gene #2 expression]]) across conditions and Pearson and Spearman
correlations across conditions. To quantify maximal expression for an adjacent gene pair,
we summed expression for pairs of adjacent genes within each condition, then took the
maximum of this value across conditions (max[adjacent gene #1 expression + adjacent
gene #2 expression]).
Results
Adjacent gene relationships in extant yeast genomes
To investigate the evolution of adjacent gene relationships in yeast, we first
surveyed the variation in adjacent gene organization in the genome sequences of ten
members of the Saccharomyces species complex (Table 1). All species examined have a
bias towards divergent and convergent adjacencies; however, the %DC values for
genomes of post-WGD species are consistently lower (51.0 to 53.0%) than the %DC
found in the genomes of non-WGD species (54.0 to 56.0%). This suggests that one effect
of WGD-D may be to reduce genome-wide bias in adjacent gene orientation.
Intergenic lengths in the non-WGD genomes are generally shorter than those of the
post-WGD species (Table 1). Since intergenic length estimates are inflated by missing
genes, it is most informative to compare genomes of similar annotation quality.
Comparing the well-annotated non-WGD species A. gossypii (4711 genes) with the post-
WGD species S. cerevisiae (5714 genes), we see that the intergenic spacing has increased
regardless of orientation. The same is true in the comparison of the non-WGD species K.
waltii (5230 genes) and K. lactis (5331 genes) with the post-WGD species C. glabrata
(5272 genes) and S. bayanus (4716 genes).
Modeling adjacent gene reorganization due to WGD-D
We developed a simulation to test the ability of WGD-D to reorganize adjacent
gene relationships (see Materials and Methods). Under random single-gene deletion, our
model predicts that WGD-D will lead to extensive reorganization of adjacent gene
orientations for a wide range of starting genome structures (fig. 1). In particular, the
mean genome-wide orientation bias was always reduced in our simulations of neutral
WGD-D. We also find that selection has the potential to shape the extent of
reorganization, either in terms of %DC (fig. 1) or deleted block lengths (unpublished
data). We present results for strong selection on adjacency (i.e., duplicate copies are
deterministically selected for deletion based on net differences in the adjacencies
created) and moderate selection on adjacency (i.e., a 40% adjustment in the probability of
choosing the copy for deletion), either favoring or disfavoring tandem adjacencies. We
can clearly distinguish the change in %DC expected under these selective models from
that expected under the neutral model for a wide range of starting %DC (starting %DC >
20%).
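The selective copy choice can be illustrated with a Python sketch. The net-gain calculation follows the description in Materials and Methods (previously deleted genes are skipped, as if of zero length), but the mapping from "strength" to a probability, where 0.4 gives the favoured copy a probability of 0.5 × (1 + 0.4) = 0.7 and 1.0 is deterministic, is our own interpretation of the "40% adjustment", not a detail confirmed by the text.

```python
import random


def _nearest_present(present, i, step):
    """Index of the nearest still-present gene from i in direction step, or None."""
    j = i + step
    while 0 <= j < len(present) and not present[j]:
        j += step   # previously deleted genes are skipped (zero length)
    return j if 0 <= j < len(present) else None


def net_tandem_gain(strands, present, i):
    """Change in the number of tandem adjacencies if gene i is deleted from
    the copy described by `present` (strands are shared by both copies)."""
    left = _nearest_present(present, i, -1)
    right = _nearest_present(present, i, +1)
    before = 0
    if left is not None:
        before += int(strands[left] == strands[i])
    if right is not None:
        before += int(strands[i] == strands[right])
    after = 0
    if left is not None and right is not None:
        after = int(strands[left] == strands[right])
    return after - before


def choose_copy(gain0, gain1, strength, favour_tandem=True, rng=random):
    """Pick which duplicate copy (0 or 1) to delete.

    strength = 0 is neutral and strength = 1 is deterministic ("strong");
    otherwise the favoured copy is chosen with probability 0.5 * (1 + strength).
    """
    if strength == 0 or gain0 == gain1:
        return rng.randrange(2)
    favoured = 0 if gain0 > gain1 else 1
    if not favour_tandem:
        favoured = 1 - favoured
    return favoured if rng.random() < 0.5 * (1 + strength) else 1 - favoured


strands = ["+", "-", "+"]
copy0 = [True, True, True]
copy1 = [True, True, False]
g0 = net_tandem_gain(strands, copy0, 1)  # +1: "+ - +" becomes tandem "+ +"
g1 = net_tandem_gain(strands, copy1, 1)  # 0: no right neighbour in copy 1
print(g0, g1, choose_copy(g0, g1, strength=1.0))  # 1 0 0
```

Setting `favour_tandem=False` reverses the preference, which corresponds to the models that disfavour tandem adjacencies.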
From our simulations we can make quantitative predictions about the effect of
WGD-D to compare with the pattern in extant yeast. We ran simulations using the A.
gossypii genome organization as a proxy for the ancestral organization, and used a range
of attempted deletion distributions to account for the possibility of longer deletion tracts
or clustering of deletions. Of the neutral deletion scenarios we examined, only single-
gene deletion could reduce the %DC from 56.0% to 52.0%, the inferred reduction in the
S. cerevisiae lineage (fig. 2a; p = 0.21 for single-gene deletion and p < 10^-4 for models
with longer deletions).
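The text does not state how these p-values were computed from the 10,000 simulation iterations. One common approach, shown here purely as an assumed illustration, is a two-sided empirical p-value that compares the observed statistic (for example, the inferred post-WGD-D %DC) with the simulated replicates.

```python
def empirical_p_value(observed, simulated):
    """Two-sided empirical p-value of `observed` against simulated replicates.

    Counts replicates at least as far from the simulated mean as the observed
    value; the +1 terms keep p > 0, so with 10,000 replicates the smallest
    attainable value is 1/10001 (just under 10^-4).
    """
    centre = sum(simulated) / len(simulated)
    distance = abs(observed - centre)
    extreme = sum(abs(s - centre) >= distance for s in simulated)
    return (extreme + 1) / (len(simulated) + 1)


simulated = list(range(100))               # toy replicate values
print(empirical_p_value(49.5, simulated))  # 1.0: observed equals the mean
print(empirical_p_value(1000, simulated))  # 1/101: more extreme than every replicate
```

With a finite number of replicates, a result more extreme than every simulation can only be reported as a bound, which is the form of the "p < 10^-4" statements.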
We can also use our simulations to derive the expected distribution of deleted
block lengths under a variety of deletion models. We compared the distribution of
deleted block lengths (in number of genes) from S. cerevisiae to the distribution from our
simulations of uniform and Poisson-distributed attempted deletions. The distribution of
deletions for S. cerevisiae is skewed towards small deletions relative to most neutral
deletion models (fig. 2b; p < 0.0001), but a single-gene deletion model can approximate it
well (p = 0.11).
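Apparent deleted blocks, i.e., overlapping and adjacent deletions merged together, are simply runs of consecutive missing genes in one duplicated copy, read in ancestral gene order. The following minimal Python sketch assumes presence/absence flags of the kind a simulation or a 2:1 alignment would provide; the toy copies are invented.

```python
from collections import Counter
from itertools import groupby


def apparent_block_lengths(present):
    """Lengths of runs of consecutive deleted genes in one duplicated copy.

    `present` lists, in ancestral gene order, whether each gene survives in
    this copy; adjacent and overlapping deletions merge into one block.
    """
    return [len(list(run)) for kept, run in groupby(present) if not kept]


def block_length_distribution(copies):
    """Pool apparent deleted-block lengths over both copies (or many chromosomes)."""
    return Counter(length for present in copies
                   for length in apparent_block_lengths(present))


copy_a = [True, False, False, True, False, True]
copy_b = [False, True, True, False, True, False]
print(apparent_block_lengths(copy_a))               # [2, 1]
print(block_length_distribution([copy_a, copy_b]))  # Counter({1: 4, 2: 1})
```

Even under single-gene deletion, two independent losses that happen to sit next to each other appear as one two-gene block (as in `copy_a`), which is why apparent block lengths must be compared with simulation rather than read directly as event sizes.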
Evolutionary analysis of orientation and intergenic spacing
To gain insight into the dynamics of the adjacent gene reorganization, we traced
the origin of each pair of adjacent genes in S. cerevisiae, looking for conservation in
successive outgroups along the S. cerevisiae lineage. The oldest adjacencies, which
predate WGD, have a %DC of 56.7% (fig. 3a), statistically indistinguishable from the
ancestral %DC (p = 0.59). Therefore, the orientation bias in conserved adjacencies is a
reflection of the ancestral bias, not a bias in the retention rate. Given that the ancestral
genome has a divergent-convergent bias, we expect that random deletion will initially
create more new tandem adjacencies and reduce the divergent-convergent bias (see fig.
1). As deletion proceeds, and the genome-wide divergent-convergent bias is reduced, the
bias in newly created adjacencies should approach this genome-wide value (see fig. 1).
Indeed, the adjacencies created immediately following WGD have a tandem bias (%DC =
46.5%) and the tandem bias is reduced for those created more recently (%DC = 49.3% to
49.9%), though these age classes are not significantly different from one another.
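The expectation that random deletion initially creates tandem-biased adjacencies can be made explicit with an idealized model. Assume, purely for illustration, that strands along the ancestral chromosome follow a two-state Markov chain in which neighbours switch strand with probability p (the ancestral %DC as a fraction), and ignore earlier deletions. Deleting k consecutive genes joins two genes that are k + 1 steps apart, and these lie on opposite strands with probability (1 - (1 - 2p)^(k+1)) / 2. For a single-gene deletion this is 2p(1 - p), which never exceeds 0.5.

```python
def expected_new_dc(switch_prob, genes_deleted=1):
    """Expected %DC (as a fraction) of the adjacency created by deleting
    `genes_deleted` consecutive genes, assuming strands follow a two-state
    Markov chain that switches between neighbours with probability
    `switch_prob`. The flanking genes are genes_deleted + 1 steps apart."""
    steps = genes_deleted + 1
    return (1 - (1 - 2 * switch_prob) ** steps) / 2


print(round(expected_new_dc(0.56), 4))     # 0.4928, i.e. 2 * 0.56 * 0.44
print(round(expected_new_dc(0.56, 0), 2))  # 0.56: no deletion, the ancestral value
```

Under this model, any divergent-convergent bias (p > 0.5) in the source genome yields new adjacencies with %DC below 50%, and as the genome-wide value moves toward 50% the newly created pairs approach it as well.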
The evolutionary analysis also shows a pattern of greater intergenic spacing in
more recently created adjacencies (fig. 3b). Furthermore, the mean and variance of
intergenic length are monotonically decreasing with age, as would be expected if genes
are lost by small, inactivating mutations followed by successive small deletions. This
effect is not due to differences in %DC across the adjacency age classes because the
pattern holds when the data are partitioned into convergent, divergent, and tandem
adjacencies (unpublished data). Even the adjacencies created immediately following
WGD have greater mean intergenic length than the adjacencies that remained from before
WGD, suggesting that WGD-D has had a lasting effect on intergenic spacing.
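The per-class summary behind this pattern can be sketched in Python: group intergenic lengths by adjacency age class, compute the mean and sample variance for each class, and check that both decline from the youngest to the oldest class. The class labels and lengths below are toy values for illustration only, not data from the study.

```python
from statistics import mean, variance


def spacing_by_age(records):
    """Mean and sample variance of intergenic length for each age class.

    `records` is an iterable of (age_class, intergenic_length_bp) pairs;
    each class needs at least two records for a sample variance.
    """
    groups = {}
    for age, length in records:
        groups.setdefault(age, []).append(length)
    return {age: (mean(lengths), variance(lengths)) for age, lengths in groups.items()}


def decreases_with_age(values_youngest_first):
    """True if the values never increase from the youngest to the oldest class."""
    return all(a >= b for a, b in zip(values_youngest_first, values_youngest_first[1:]))


# Toy numbers for illustration only; they are not data from the study.
records = [("young", 600), ("young", 800), ("young", 1000),
           ("middle", 400), ("middle", 500), ("middle", 600),
           ("old", 300), ("old", 350), ("old", 400)]
summary = spacing_by_age(records)
order = ["young", "middle", "old"]
print(decreases_with_age([summary[a][0] for a in order]),   # True (means)
      decreases_with_age([summary[a][1] for a in order]))   # True (variances)
```

Checking the variance alongside the mean matters for the argument in the Discussion: a few rare large deletions could lower the mean while leaving the spread wide, whereas many small deletions shrink both.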
Expression evolution in adjacencies
Given that Cohen et al. (2000) found that adjacent gene coexpression was less
likely for pairs with greater spacing, we asked whether the evolution of longer intergenic
regions was associated with the decoupling of expression in adjacent genes. If new
adjacencies are as strongly coupled in expression as conserved adjacencies, then the
increase in intergenic spacing provides no overall expression decoupling. To characterize
the extent of expression coupling, we determined the log2 expression ratio for pairs of
adjacent genes in each age class, averaged across 101 published microarray experiments
(fig. 3c). There is a significant increase in the mean expression ratio for younger
adjacencies, suggesting an overall expression decoupling due to WGD-D. Pearson and
Spearman correlation coefficients for expression of adjacent genes, which would detect
finer scale coexpression patterns, show no relationship with age of adjacency
(unpublished data). The higher mean log2 expression ratio for new adjacencies may not
be due to decoupling at all levels of expression, but could be driven by decoupling at high
expression levels. Since the mean log2 expression ratio for adjacent genes is more
sensitive to extreme values than correlation coefficients, which are bounded, the pattern
in log2 expression ratio is likely driven by the increase in maximal expression. Therefore,
we asked whether the evolution of longer intergenic regions was associated with greater
maximal expression (i.e., the maximum of the summed expression for an adjacent gene pair
across conditions). If so, the average maximal expression should be greater for the
younger adjacency age classes. As predicted, greater maximal expression is observed in
the adjacency age classes with greater intergenic spacing, that is, the S. cerevisiae and the
Saccharomyces sensu stricto specific adjacencies (fig. 3d).
Expression values derived from microarray hybridization signal may potentially
be influenced by differences in probe affinities due to base composition, but this is
unlikely to affect our results given the large number (~10^4 to 10^5) of probes per adjacency
age class. Indeed, when we performed a multiple regression accounting for any effect of
GC-content on the expression measures, the relationship between maximal expression
and adjacency age class remained highly significant (p < 10^-6). Furthermore, this
relationship is not due to a specific functional category of genes. While many of the
highly expressed genes are ribosomal, the trend remains when we remove the 438 genes
in our dataset annotated by Gene Ontology to the “protein biosynthesis” biological
process category.
Discussion
Deletion and the mechanism of gene loss
In the 2:1 alignments of the S. cerevisiae genome to the K. waltii or A. gossypii
genome, there appear to be many multiple-gene deletion blocks. Kellis, Birren, and
Lander (2004) note that deleted blocks are small, with an average length of two genes,
but they do not speculate whether these blocks represent individual multiple-gene
deletion events or several single-gene losses beside one another. Our simulation
demonstrates that long tracts of adjacent deletions are common under a random single-
gene deletion model, and S. cerevisiae has no more long deleted blocks than expected
under a model of random single-gene deletion (fig. 2b). Therefore, we find no evidence
that multiple-gene deletion events played a significant role in WGD-D in yeast.
Similarly, there is no evidence for clustering of deletions, as this would also lead to the
appearance of an excess of multiple-gene deleted blocks. Since multiple-gene deleted
blocks imply tracts of conserved adjacency on the paralogous chromosome, there is also
no genome-wide tendency for conservation of gene clusters at the level of adjacencies.
Since we only consider conservation of adjacency, this does not contradict window-based
identification of conserved clusters of metabolic (Wong and Wolfe 2005) or essential
genes (Pal and Hurst 2003).
While a single-gene deletion model was best able to approximate the effect of
WGD-D in yeast, our gene-based simulation does not distinguish whether the loss of
single genes occurs by gene-length deletion, smaller deletion, or inactivating mutation.
However, the evolutionary analysis of intergenic lengths does provide information about
the interplay between deletion and pseudogenization during the process of gene loss. For
instance, if a functional duplicate gene is lost by a gene-length deletion event without an
intermediate step of pseudogene formation, the intergenic length of the new adjacency
may not increase significantly. This mode of gene loss is unlikely because our analysis
shows that adjacencies newly created by WGD-D have greater intergenic spacing (fig.
3b). This suggests that gene loss is initiated by inactivating mutations, such as small
indels that cause frameshifts or substitutions that cause premature stop codons.
The monotonic decrease of intergenic spacing with increasing age of adjacency
suggests that after pseudogenization the intergenic spacing for the newly formed
adjacency is gradually reduced by many small deletions. This mean decrease of
intergenic spacing cannot be due to rare large deletions since the variance is reduced
along with the mean (fig. 3b). These data support a model for gene loss during the yeast
WGD-D where small inactivating mutations (deletion or otherwise) are followed by a
whittling down of the pseudogenic and intergenic sequence. Previous research has
shown that small indels are biased towards deletion in a wide range of eukaryotes, from
mammals and fish to insects and plants (Gregory 2004), but this bias has not been
investigated in yeast. While it is possible that selection for genome compactness led to
the reduction of intergenic space following WGD-D, it is likely that the yeast deletion
pattern is due to the same mutational bias found across the eukaryotes.
In keeping with the idea of gradual deletion, it seems that the deletion phase was
not complete in the common ancestor of the post-WGD species. If the deletion phase of
WGD-D was very short, most new adjacencies would have been created by the time of
the S. castellii-S. cerevisiae split, but we find that the majority (73%; fig. 3) of new
adjacencies were created afterwards. An incomplete annotation for the S. castellii
genome could potentially lead us to underestimate the age of some adjacencies,
spuriously dating them to the S. cerevisiae-Saccharomyces sensu stricto split. However,
there are also many (721) S. cerevisiae-specific adjacencies, which are unlikely to be
missed in the genome sequencing of all Saccharomyces sensu stricto species and S.
castellii. Therefore, we conclude that the process of gene loss and deletion has continued
during the radiation of the Saccharomyces species complex. Since there are only a
handful of pseudogenes in yeast (Harrison and Gerstein 2002), and none of these appear
to be of WGD-D origin (unpublished data), little, if any, trace remains of the
pseudogenes created by WGD-D.
Functional consequences of WGD-D
A gradual process of gene loss may suggest that the deletion phase of WGD-D is
neutral or even subject to negative selection. Unfortunately, the current annotation
quality does not allow a reliable estimate of the gene loss rate, so we cannot determine
whether the rate of gene loss is reduced due to negative selection conserving adjacencies
or even increased by positive selection favoring the creation of new adjacencies.
However, the signature of selection may appear as a bias in the conservation or creation
of adjacent gene relationships, or in the spatial patterning of deletions.
We find that random single-gene deletion can recapitulate the reorganization of
adjacencies due to WGD-D in S. cerevisiae. Since random single-gene deletion is
sufficient to produce the same genome-wide reduction in %DC (from ~56% to ~52%) as
observed in yeast, WGD-D was largely neutral with respect to orientation (fig. 2a). The
conservation and creation of adjacencies along the S. cerevisiae lineage is also consistent
with a neutral reorganization of gene orientations (fig. 3a). Finally, the random single-
gene deletion model was also able to produce the distribution of deleted block lengths
observed in S. cerevisiae, with deletions neither excessively interleaved nor clustered (fig.
2b). Since the random single-gene deletion model of WGD-D reproduced the patterns of
retention and turnover of adjacent gene relationships, there is no evidence that negative
selection or positive selection shaped the WGD-D genome reorganization in yeast.
Even though there is no evidence for selection shaping WGD-D in yeast, there is
evidence that this reorganization had functional consequences for the genome. A major
effect of the interleaved gene loss was to provide increased intergenic spacing for many
genes (fig. 3b). We suggest that this increased intergenic spacing was responsible for the
uncoupling of expression and increased maximal expression for adjacent genes by
relieving transcriptional interference. Given that the ancestral genome was highly
compact, the concurrent expression of adjacent genes may have been limited by
transcriptional interference (Shearwin, Callen, and Egan 2005). Even in the relatively
spacious genome of humans, there is evidence that transcriptional interference can
explain an association between higher expression and greater intergenic spacing
(Chiaromonte, Miller, and Bouhassira 2003).
How can we reconcile the gradual, neutral reorganization of adjacent gene
relationships with the evidence for widespread functional changes? For instance, while
WGD maintains the relative copy numbers of interacting partners, the dosage balance
hypothesis (Veitia 2004) predicts that an asynchronous deletion phase would be
deleterious. It may be that downstream regulatory mechanisms (e.g., feedback or
translational regulation) compensate for most expression changes, or that expression
changes of the magnitude we observe do not affect the function of most genes. Future
improvements to the genome sequences of the species in the Saccharomyces complex
will clarify the dynamics of WGD-D, but a full understanding of the functional
consequences will require genome-wide expression data from more species, particularly
the non-WGD outgroups.
Acknowledgements
This research was supported by the Natural Sciences and Engineering Research
Council of Canada (G.P.M.), the National Science Foundation (G.P.M.), the Department of
Education’s Graduate Assistance in Areas of National Need Program (J.K.B. and
G.P.M.), and NIH grants (W.H.L.). We thank K. Wolfe and the reviewers for helpful
suggestions.
Literature Cited
Chiaromonte, F., W. Miller, and E.E. Bouhassira. 2003. Gene length and proximity to
neighbors affect genome-wide expression levels. Genome Res. 13:2602-2608.
Cliften, P., P. Sudarsanam, A. Desikan, L. Fulton, B. Fulton, J. Majors, R. Waterston,
B.A. Cohen, and M. Johnston. 2003. Finding functional features in
Saccharomyces genomes by phylogenetic footprinting. Science. 301:71-76.
Cohen, B.A., R.D. Mitra, J.D. Hughes, and G.M. Church. 2000. A computational analysis
of whole-genome expression data reveals chromosomal domains of gene
expression. Nat. Genet. 26:183-186.
Dietrich, F.S., S. Voegeli, S. Brachat et al. (14 co-authors). 2004. The Ashbya gossypii
genome as a tool for mapping the ancient Saccharomyces cerevisiae genome.
Science. 304:304-307.
Dujon, B., D. Sherman, G. Fischer et al. (67 co-authors). 2004. Genome evolution in
yeasts. Nature. 430:35-44.
Filipski, J., and M. Mucha. 2002. Structure, function and DNA composition of
Saccharomyces cerevisiae chromatin loops. Gene. 300:63-68.
Fischer, G., S.A. James, I.N. Roberts, S.G. Oliver, and E.J. Louis. 2000. Chromosomal
evolution in Saccharomyces. Nature. 405:451-454.
Garcia, I., R. Gonzalez, D. Gomez, and C. Scazzocchio. 2004. Chromatin rearrangements
in the prnD-prnB bidirectional promoter: dependence on transcription factors.
Eukaryot. Cell. 3:144-156.
Gautier, L., L. Cope, B.M. Bolstad, and R.A. Irizarry. 2004. affy--analysis of Affymetrix
GeneChip data at the probe level. Bioinformatics. 20:307-315.
Gerton, J.L., J. DeRisi, R. Shroff, M. Lichten, P.O. Brown, and T.D. Petes. 2000. Global
mapping of meiotic recombination hotspots and coldspots in the yeast
Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. USA. 97:11383-11390.
Gregory, T.R. 2004. Insertion-deletion biases and the evolution of genome size. Gene.
324:15-34.
Harbison, C.T., D.B. Gordon, T.I. Lee et al. (20 co-authors). 2004. Transcriptional
regulatory code of a eukaryotic genome. Nature. 431:99-104.
Harrison, P.M., and M. Gerstein. 2002. Studying genomes through the aeons: protein
families, pseudogenes and proteome evolution. J. Mol. Biol. 318:1155-1174.
Kellis, M., N. Patterson, M. Endrizzi, B. Birren, and E.S. Lander. 2003. Sequencing and
comparison of yeast species to identify genes and regulatory elements. Nature.
423:241-254.
Kellis, M., B.W. Birren, and E.S. Lander. 2004. Proof and evolutionary analysis of
ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature.
428:617-624.
Kurtzman, C.P. 2003. Phylogenetic circumscription of Saccharomyces, Kluyveromyces
and other members of the Saccharomycetaceae, and the proposal of the new
genera Lachancea, Nakaseomyces, Naumovia, Vanderwaltozyma and
Zygotorulaspora. FEMS Yeast Res. 4:233-245.
Lohr, D., P. Venkov, and J. Zlatanova. 1995. Transcriptional regulation in the yeast GAL
gene family: a complex genetic network. FASEB J. 9:777-787.
Pal, C., and L.D. Hurst. 2003. Evidence for co-evolution of gene order and recombination
rate. Nat. Genet. 33:392-395.
Shearwin, K.E., B.P. Callen, and J.B. Egan. 2005. Transcriptional interference - a crash
course. Trends Genet. 21:339-345.
Veitia, R.A. 2004. Gene dosage balance in cellular pathways: implications for dominance
and gene duplicability. Genetics. 168:569-574.
Wolfe, K.H., and D.C. Shields. 1997. Molecular evidence for an ancient duplication of
the entire yeast genome. Nature. 387:708-713.
Wong, S., and K.H. Wolfe. 2005. Birth of a metabolic gene cluster in yeast by adaptive
gene relocation. Nat. Genet. 37:777-782.
Zhang, Z., S. Schwartz, L. Wagner, and W. Miller. 2000. A greedy algorithm for aligning
DNA sequences. J. Comput. Biol. 7:203-214.
Figure 1: The extent of reorganization for adjacent gene orientation depends on starting
bias in orientation and selection on orientation. We created 51 starting genomes with
%DC ranging from 0% to 100%, separated by increments of ~2%, based on the gene
counts of A. gossypii. We simulated WGD-D under a single-gene deletion model 10,000
times for each genome and plot the mean change in %DC (± one standard deviation)
for neutral deletion (solid) and for strong (black) or moderate (grey) selection
favoring (dashed) or disfavoring (dotted) deletions that result in a net gain of
tandem adjacencies.
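The selection regimes in Figure 1 act on whether a deletion produces a net gain of tandem adjacencies. Deleting an interior gene b that lies between genes a and c has three effects:

- It removes the adjacency (a, b).
- It removes the adjacency (b, c).
- It creates a new adjacency (a, c).

Strand is binary, so the net change is +1 only when b lies on the opposite strand from both neighbours, and -1 otherwise. Deleting a terminal gene loses at most one adjacency.

A minimal sketch of this bookkeeping follows. The function name is hypothetical. The caption does not specify how selection weights such deletions, so the sketch only computes the quantity that selection acts on.

```python
from typing import List


def net_tandem_change(strands: List[str], i: int) -> int:
    """Net change in the number of tandem adjacencies when the gene at index i
    is deleted from a chromosome given as a left-to-right list of strands."""
    def tandem(a: str, b: str) -> int:
        return 1 if a == b else 0

    change = 0
    if i > 0:
        change -= tandem(strands[i - 1], strands[i])       # lost left adjacency
    if i < len(strands) - 1:
        change -= tandem(strands[i], strands[i + 1])       # lost right adjacency
    if 0 < i < len(strands) - 1:
        change += tandem(strands[i - 1], strands[i + 1])   # new adjacency across the gap
    return change


print(net_tandem_change(["+", "-", "+"], 1))   # 1: gene differs from both neighbours
print(net_tandem_change(["+", "+", "-"], 1))   # -1
```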
Figure 2: Shorter deletions in WGD-D result in more reorganization of gene orientation
and in shorter deleted blocks. (a) We plot the frequency distribution of %DC for
10,000 simulations of WGD-D using the A. gossypii genome structure as the starting
point, with deletion lengths drawn from a uniform or Poisson distribution. Vertical lines
indicate the genome-wide %DC values observed for S. cerevisiae (solid) and A. gossypii
(dotted). Smaller deletions lead to significantly greater mean change and variance in
final %DC (p < 10^-15). The single-gene deletion model (i.e., uniform with max = 1; short
dashed line) reduces %DC from 56% to 52.5% ± 0.57% (mean ± standard
deviation), which is statistically indistinguishable from the S. cerevisiae value (p = 0.21).
(b) We plot the distribution of deleted block lengths from the same set of simulations,
with the means (± standard deviation) for each deletion model, against the distribution
from S. cerevisiae. Again, only the single-gene deletion model (short dashed line)
approximates the pattern observed in S. cerevisiae (solid line).
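The deleted block lengths in panel (b) can be read as runs of consecutive deleted genes along a sister chromosome, taken in ancestral gene order. This is a minimal sketch, and it makes one assumption: each sister copy is represented by a hypothetical boolean retention mask in ancestral order, with True meaning retained. The study's exact block definition for S. cerevisiae is not given here, and `deleted_block_lengths` is an illustrative name.

```python
from itertools import groupby
from typing import List


def deleted_block_lengths(retained: List[bool]) -> List[int]:
    """Lengths of maximal runs of consecutive deleted genes along one sister copy."""
    return [len(list(run)) for kept, run in groupby(retained) if not kept]


copy_a = [True, False, False, True, False, True, True, False, False, False]
print(deleted_block_lengths(copy_a))   # [2, 1, 3]
```

Collecting these lengths over all sister copies and simulation replicates gives a distribution of the kind plotted in panel (b).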
Figure 3: Origin of adjacencies in the S. cerevisiae lineage. (a) %DC, (b) mean
intergenic length, (c) mean log2 expression ratio of adjacent genes, and (d) maximal
expression for adjacent gene pairs in each age class (Scer = S. cerevisiae specific, stricto
= Saccharomyces sensu stricto specific, Scas = present in S. cerevisiae-S. castellii
common ancestor, preWGD = conserved from before WGD). Bars with the same letter
are not significantly different (p > 0.05), and error bars represent 95% confidence intervals.
The number of genes in each adjacency age class is as follows (with the subset of genes
with expression data [c,d] given in parentheses): Scer, 1392 (312); stricto, 1530 (1454);
Scas, 810 (774); and preWGD, 2156 (1984). The pattern for %DC and mean intergenic
length is the same whether we use the full set of genes (a,b) or the subset of genes with
expression data (unpublished data). The %DC is lowest for adjacencies created
immediately following WGD-D and increases slightly in more recently created
adjacencies. Intergenic lengths, log2 expression ratios, and maximal expression are
greater for younger adjacencies.
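Panels (c) and (d) summarize each adjacent pair by an expression ratio and a maximal expression, then average within age classes. This is a minimal sketch under three assumptions that the caption does not state:

- The log2 ratio is taken as an absolute value, so pair order does not matter.
- Expression values are positive and on a linear scale.
- Means are simple per-class averages.

`pair_stats` and the toy values are hypothetical and are not data from the study.

```python
import math
from statistics import mean
from typing import Dict, List, Tuple

Pair = Tuple[str, float, float]   # (age class, expression of gene 1, expression of gene 2)


def pair_stats(pairs: List[Pair]) -> Dict[str, Tuple[float, float]]:
    """Per age class: (mean |log2 expression ratio|, mean maximal expression)."""
    by_class: Dict[str, List[Tuple[float, float]]] = {}
    for age, e1, e2 in pairs:
        ratio = abs(math.log2(e1 / e2))     # 0 when both genes are expressed equally
        by_class.setdefault(age, []).append((ratio, max(e1, e2)))
    return {age: (mean(r for r, _ in vals), mean(m for _, m in vals))
            for age, vals in by_class.items()}


toy = [("preWGD", 100.0, 100.0), ("preWGD", 200.0, 100.0), ("Scer", 400.0, 100.0)]
print(pair_stats(toy))   # {'preWGD': (0.5, 150.0), 'Scer': (2.0, 400.0)}
```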
Table 1: Summary of adjacent gene relationships for 10 species of the Saccharomyces complex

                                      Annotated        Orientation counts   Mean intergenic length, bp (SD)
Species          Genome size (Mb)     ORFs       %DC   Div    Con    Tan    Divergent    Convergent   Tandem
S. cerevisiae    12.5 (a)             5714       52.0  1495   1493   2763   799 (1039)   379 (670)    650 (995)
S. mikatae       10 (c), 12.12 (b)    1902       53.0  514    494    894    1684 (1611)  1372 (1649)  1528 (1639)
S. kudriavzevii  10.6 (c)             2793       53.0  749    732    1312   1410 (1495)  1132 (1557)  1381 (1607)
S. bayanus       10.2 (c), 11.54 (b)  4716       51.0  1215   1217   2284   1045 (1372)  622 (1023)   793 (1154)
S. castellii     11.1 (c)             4378       52.4  1143   1149   2086   996 (1289)   656 (1221)   819 (1232)
C. glabrata      12.3 (d)             5272       52.9  1360   1363   2427   1031 (1214)  393 (420)    861 (837)
S. kluyveri      10.2 (c)             1985       57.3  570    563    852    1830 (2108)  1235 (1710)  1792 (2050)
K. waltii        10.7 (e)             5230       54.0  1234   1264   2131   946 (984)    310 (554)    736 (899)
K. lactis        10.6 (d)             5331       55.5  1476   1480   2361   650 (708)    318 (453)    668 (653)
A. gossypii      9.2 (f)              4711       56.0  1317   1319   2075   466 (493)    203 (313)    415 (502)

Div, Con, Tan: divergent, convergent, and tandem orientation.
Sources: (a) Saccharomyces Genome Database; (b) Kellis et al. 2003; (c) Cliften et al. 2003; (d) Dujon et al. 2004; (e) Kellis, Birren, and Lander 2004; (f) Dietrich et al. 2004.
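As defined in the abbreviations, %DC is the percentage of adjacent gene pairs that are divergently or convergently transcribed. It can therefore be recomputed from orientation counts such as those in Table 1, for example for S. cerevisiae and A. gossypii. This is a minimal sketch, and the function name is hypothetical.

```python
def percent_dc_from_counts(div: int, con: int, tan: int) -> float:
    """%DC from counts of divergent, convergent and tandem adjacent gene pairs."""
    return 100.0 * (div + con) / (div + con + tan)


for species, counts in {"S. cerevisiae": (1495, 1493, 2763),
                        "A. gossypii": (1317, 1319, 2075)}.items():
    print(f"{species}: {percent_dc_from_counts(*counts):.1f}")
# S. cerevisiae: 52.0
# A. gossypii: 56.0
```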
[Figure 1]
[Figure 2, panels A and B]
[Figure 3, panels A to D]