Global patterns of apparent copy number variation in birds revealed by cross species
comparative genomic hybridization
Benjamin M Skinner1,2, Deborah Smith2, Martin Völker2, Nilofour Hojjat2, Sannaa Raja2,
Steven Trim2, Peter Houde3, William J. Boecklen3, Abdullah Al Mutery2 , Darren K Griffin2
1 Department of Pathology, University of Cambridge, Cambridge, UK, CB2 1QP
2 School of Biosciences, University of Kent, Canterbury, UK, CT2 7NJ
3 New Mexico State University, Department of Biology, Las Cruces NM 88003 USA
Running title: Copy number variation in birds
Key words: evolution, avian, chromosome, microarray, CNVs, array CGH
Corresponding author
Darren K. Griffin
School of Biosciences, University of Kent, Canterbury, Kent, UK, CT2 7NJ
Tel: +44 1227 823022
Fax: +44 1227 763912
This is the post-review version of this article. The original publication is available at
www.springerlink.com: http://link.springer.com/article/10.1007/s10577-014-9405-0
DOI: 10.1007/s10577-014-9405-0
Abstract
There is a growing interest in copy number variation (CNV) and the recognition of its
importance in phenotype, disease, adaptation and speciation. CNV data is usually ascertained
by array-CGH within-species, but similar inter-species comparisons have also been made in
primates, mice and domestic mammals. Here, we conduct a broad appraisal of putative cross
species CNVs in birds, 16 species in all, using the standard array-CGH approach. Using a
chicken oligonucleotide microarray we detected 790 apparent CNVs within 135 unique
regions and developed a bioinformatic tool “CNV Analyser” for analysing and visualising
cross species data sets. We successfully addressed four hypotheses: 1) That cross species
CNVs (compared to chicken) are, as suggested from preliminary evidence, smaller and fewer
in number than in mammals; this “dogma” was rejected in the light of the new evidence. 2)
That CNVs in birds are likely to have a functional effect through an association with genes; a
large proportion of detected regions (70%) were indeed associated with genes (suggesting
functional significance), although not necessarily more so than in mammals. 3) That there are
more CNVs in birds with more rearranged karyotypes; this hypothesis was rejected, indeed
Falco species contained fewer than most with relatively standard (chicken-like) karyotypes.
4) That there are more CNVs per megabase on micro-chromosomes than
macrochromosomes; this hypothesis was accepted; indeed, in species with rearranged
karyotypes characterised by chromosomal fusions, the fused former microchromosomes still
“behaved” as though they were their microchromosomal ancestors. Gene ontology analysis of
CNVRs revealed enrichment in immune response and antigen presentation genes and five
CNVRs were perfectly correlated with the unique loss of sexual dichromatism in one
Galliform species.
Introduction
Variations in the copy number of DNA fragments ≥1kb (excepting insertions or deletions of
transposable elements), CNVs (Feuk et al. 2006; Freeman et al. 2006), are increasingly
recognised for their role in phenotype, disease, adaptation and speciation (Freeman et al.
2006; Emanuel and Saitta 2007). In medicine, CNVs have been linked with schizophrenia,
autism and other psychiatric disorders (Stefansson et al. 2008; St Clair 2009), and there have
been associations of HIV-AIDS susceptibility with CNVs at the CCL3L locus in humans,
chimpanzees and macaques (reviewed by Colobran et al. 2010).
CNVs can facilitate adaptation by conferring phenotypes with a selective advantage; for example, an
increase in copy number of the human amylase (AMY1) gene has been linked to the adoption
of a starch-rich diet in human populations (Perry et al. 2007). This is one of many examples
of the proposed effects of CNVs on human evolution, which were summarised recently
(Iskow et al. 2012). CNVs have also been linked to genome evolution on a chromosomal
scale in that they may facilitate non-homologous recombination (Lupski 1998; Stankiewicz
and Lupski 2002). That is, a correlation has been observed in primates between the locations
of segmental duplications (low copy-number CNVs with >90% sequence identity; Lander et
al. 2001) and the breakpoints of chromosomal rearrangements. It has also been suggested that
chromosomal fusions may be facilitated by the presence of segmental duplications (Emanuel
and Saitta 2007). CNV generation is therefore perhaps driven, at least in part, through non-
allelic homologous recombination (NAHR) following ancestral segmental duplications (Perry
et al. 2006; Repping et al. 2006; Völker et al. 2010).
Amongst mammals, CNV research has focussed on primates (Perry et al. 2006), mice
(Graubert et al. 2007), cattle (Fadista et al. 2010), pigs (Fadista et al. 2008; Wang et al. 2012)
and sheep (Fontanesi et al. 2011). For the vast majority of CNV studies, the platform used for
detection is array CGH (microarray based comparative genomic hybridization). While most
have been performed between individuals of the same species, inter-specific array CGH is
becoming increasingly commonplace with primate and mouse cross species array CGH all
reporting numerous lineage-specific gene gains and losses (Samonte and Eichler 2002; Locke
et al. 2003; Fortna et al. 2004; Newman et al. 2005; Bailey and Eichler 2006; Dumas et al.
2007). The 40+ year-old hypothesis that gene duplication is a major factor in evolutionary
change (Ohno 1970) is thus supported. Despite inherent drawbacks associated with the fact
that all inter-specific array CGH is made with respect to a reference species (usually human),
cross species array CGH is a useful tool for determining global patterns of DNA copy number
variation between species as it can identify areas of the genome suitable for further study. In
other words, although apparent losses of DNA copy number compared to the reference
species may, in some cases, be attributable to reduction in microarray hybridization dynamics
(e.g. brought about by sequence divergence), inter-specific array CGH data is nonetheless
extremely valuable for cross species genomic comparisons. For the purposes of this study,
therefore, we will henceforth use CNVs and CNVRs to mean the outputs of array CGH studies,
being mindful of the potential limitations of the platform.
In birds, three studies by our own group (Griffin et al. 2008; Skinner et al. 2009; Völker et al.
2010) in turkey, duck and zebra finch respectively, examined a small number of animals per
species to derive the first results of cross species array CGH in birds (using chicken as the
reference species). That is, we reported the identification of 16 putative CNVs between
chicken and turkey (Griffin et al. 2008), 32 between chicken and duck (Skinner et al. 2009),
and 20 between chicken and zebra finch (Völker et al. 2010). This compares with 58 CNVs
detected on a comparable platform between chimp and human (Perry et al. 2006). With a
reported divergence of humans and chimpanzees of 6 million years, and of chickens and
turkey, duck and zebra finch, 40, 80 and 100 million years respectively, the initial inference is
that CNVs are far less common in birds than in mammals. Within-species avian studies are
limited to chickens, and have revealed patterns of breed specific CNVs (Wang et al. 2010;
Wang et al. 2012; Crooijmans In Press), many of which contain genes of interest for
cytoskeletal components (mainly the keratin superfamily), assorted transcription factors and
metabolic processes. Nonetheless, general patterns of avian CNVs between species remain
very under-explored, despite being essential to provide the basis for studies of the genome
evolution of this extremely important phylogenetic class.
Bird genomes are characterised by a small size (reviewed in Gregory 2005) and by distinctive
and highly conserved karyotypes, consisting of about 10 pairs of macrochromosomes and
~30 pairs of microchromosomes (reviewed in Griffin et al. 2007). The genomes in most bird
lineages have undergone far fewer interchromosomal rearrangements than mammalian ones
(Hillier et al. 2004; Griffin et al. 2007); though recent work suggests intrachromosomal
rearrangements may be common (Völker et al. 2010; Skinner and Griffin 2011). The chicken
genome has a low number of segmental duplications relative to a typical mammalian genome
(Hillier et al. 2004) and this also appears to be the case for more recently published zebra
finch and turkey genomes (Dalloul et al. 2010; Warren et al. 2010). These findings are thus
consistent with the notion that segmental duplications facilitate chromosomal rearrangements
(Kehrer-Sawatzki and Cooper 2008).
Given these smaller genome sizes, smaller intergenic distances, low repeat content and low
number of pseudogenes in the avian genome (Burt et al. 1999; Hillier et al. 2004) it is
reasonable to hypothesise that avian CNVs (as identified by array CGH) will be
correspondingly smaller and fewer in number than those reported in mammals. As mentioned
above, our preliminary evidence (Griffin et al. 2008; Skinner et al. 2009; Völker et al. 2010)
supports this. One could also therefore propose a second hypothesis that those CNVs present
are more likely to have a functional effect through association with genes. This leads to a third
hypothesis, namely that birds that are characterised by a high degree of chromosome
rearrangement (such as Falco and Gyps species) have a higher proportion of CNVs than the
avian norm. Given the inverse relationship between chromosome size and gene density (bird
microchromosomes contain a higher density of genes and a lower amount of repeats than
macrochromosomes), a fourth hypothesis may be proposed, i.e. that microchromosomes
contain more CNVs per megabase than macrochromosomes.
The purpose of this study was therefore to extend CNV investigations to several key avian
clades including Galliformes, Anseriformes, Passeriformes, Gruiformes, and Falconiformes
to address the above hypotheses. We used the previously well-established array-CGH
approach, hybridising genomic DNA from 16 bird species across six orders to a chicken
whole genome tiling path microarray. Our results provide a global overview of the patterns of
CNVs (or at least “apparent” CNVs as revealed by array CGH) in birds. We have also
performed gene ontology analysis on the putative CNV regions to ask whether any
conclusions can be drawn about the role of CNVs in phenotypic variation. Finally, we present
a program named "CNV Analyser" that we have developed for the collation and presentation
of cross species array CGH (putative CNV) data from a Nimblegen microarray platform.
Materials and Methods
Sample preparation
Blood samples and/or feather pulp were acquired from individuals from 16 species (Table 1).
Genomic DNA was prepared using a Qiagen DNeasy Blood and Tissue Kit (#69504)
according to the manufacturer’s instructions. The reference (Red Jungle Fowl) DNA, from
the same animal used in the chicken genome sequencing project, was kindly provided by Dr
Hans Cheng (Michigan State University).
Array-CGH
The NimbleGen chicken whole-genome tiling array (Catalogue Number/Design Name
B3791001-00-01, galGal3 WG CGH - Roche NimbleGen, Milton Keynes, UK) was used for
the array CGH experiments. It contains 385,000 50-mer oligonucleotides with an average
spacing of 2,586 base pairs (source - UCSC, build - galGal3). Labeling of genomic DNA and
hybridization to the NimbleGen array were performed by the company and used random
priming to incorporate modified nucleotides by either amino-allyl or direct linkage to either
of the two dyes used (Cy3 and Cy5). All of the hybridizations in this experiment used two
dyes per slide (Cy3 and Cy5). Red Jungle Fowl reference DNA was co-hybridized with test
DNA. Array data has been uploaded to ArrayExpress under accessions E-TABM-920 and E-
MTAB-1554.
Data analysis
CGH analysis proceeded in three stages: normalization, window averaging and segmentation.
After combining the signal intensity and genomic coordinate information, the Cy3 and Cy5
signal intensities were normalized to one another using Qspline normalization (Workman et
al. 2002). Qspline is a robust non-linear method for normalization using array signal
distribution analysis and cubic splines. Once normalized, the data was prepared for DNA
segmentation analysis.
Previously (Griffin et al. 2008; Skinner et al. 2009; Völker et al. 2010), we have used the
default segmentation analysis provided by Nimblegen. In this study, we performed
segmentation using DNACopy (Olshen et al. 2004). Each probe on the array was annotated as
to whether it was located within an Ensembl exon, using an in-house script. The data were
then divided into two groups: Group 1 contained all probes on the array. Group 2 contained
only probes NOT found in exons. The DNACopy algorithm was run on both data sets. CNVs
were called for segments spanning 4 or more probes in which the absolute log2 ratio was
greater than 0.5. The resulting CNVs from groups 1 and 2 were compared, and only those
present in both groups (the vast majority) were taken forward for further analysis. This extra
step was used to check that our CNVs were not simply genomic regions highly conserved
across species appearing as gains following normalisation. Following normalisation, regions
of higher conservation appear as gains, while regions of lower conservation appear as losses.
Since exonic regions are more likely to be highly conserved than intronic or intergenic regions,
we tested for this possibility by removing all exonic probes from the dataset before running
the segmentation algorithm. The vast majority of the CNVs were still detected in this
analysis, indicating that the data is not simply due to differential conservation of sequence
identity. Furthermore, there was no change in the number of CNVs excluded by this method
per species with increasing phylogenetic distance from chicken, as would be expected if
sequence divergence were dominating the results.
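The calling rule described above can be illustrated with a minimal Python sketch. It is not the in-house script used in the study: the `Segment` fields, the function names and the rule for deciding that a call is "present in both groups" (same chromosome, overlapping coordinates, same direction of change) are assumptions made for illustration, and the segments are invented toy values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One segment from a segmentation run (field names are hypothetical)."""
    chrom: str
    start: int
    end: int
    n_probes: int
    mean_log2: float  # mean log2(test / reference) ratio of the segment


def call_cnvs(segments, min_probes=4, min_abs_log2=0.5):
    """Keep segments with >= min_probes probes and |log2 ratio| > min_abs_log2."""
    return [s for s in segments
            if s.n_probes >= min_probes and abs(s.mean_log2) > min_abs_log2]


def same_call(a, b):
    """Assumed matching rule: same chromosome, overlapping, same direction."""
    overlap = a.chrom == b.chrom and a.start <= b.end and b.start <= a.end
    return overlap and (a.mean_log2 > 0) == (b.mean_log2 > 0)


def present_in_both(calls_all_probes, calls_no_exons):
    """Retain all-probe calls that are matched by a call in the exon-free run."""
    return [a for a in calls_all_probes
            if any(same_call(a, b) for b in calls_no_exons)]


# Invented toy segments, not data from the study.
group1 = [Segment("1", 1_000, 20_000, 6, 0.8),    # passes both thresholds
          Segment("1", 50_000, 60_000, 3, 0.9),   # too few probes
          Segment("2", 100, 9_000, 5, -0.3),      # ratio too close to zero
          Segment("2", 20_000, 45_000, 8, -0.7)]  # passes both thresholds
group2 = [Segment("1", 1_500, 19_000, 5, 0.75)]   # exon-free run

calls1 = call_cnvs(group1)
calls2 = call_cnvs(group2)
kept = present_in_both(calls1, calls2)
print([(s.chrom, s.start, s.end) for s in kept])
```

In this toy case the loss on chromosome 2 is called in the all-probe run but has no counterpart in the exon-free run, so under the assumed matching rule it would not be taken forward.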
Overlapping CNVs were combined into CNV regions (CNVRs) irrespective of the degree of
overlap or whether the change was a gain or loss (as in Redon et al. 2006 and subsequent
CNV studies). We incorporated our previously published data from turkey (Griffin et al.
2008), duck (Skinner et al. 2009) and zebra finch (Völker et al. 2010) into the data analysis
for completeness, including the DNACopy segmentation. Chromosomes Z, W and
unassigned linkage groups were discarded from the CNVR calling. In order to visualise
CNVRs across individuals within the same species, or multiple species compared to chicken,
a bespoke tool “CNV Analyser” was developed specifically for this study. The purpose of this
was to display the results in a graphical format compared to a standard chicken ideogram (see
figure 1), and to provide a simple tool for discovering genes within the CNVRs. CNV
Analyser is freely available by contacting the authors.
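The merging of overlapping CNVs into CNVRs can be sketched as a standard interval merge. The following Python is an illustration under stated assumptions, not code from CNV Analyser: the tuple layout `(chrom, start, end)`, the function name and the `excluded` parameter are hypothetical, and the coordinates are invented.

```python
from collections import defaultdict


def merge_into_cnvrs(cnvs, excluded=("Z", "W")):
    """Merge overlapping CNVs into regions, ignoring gain/loss status.

    cnvs: iterable of (chrom, start, end) tuples pooled across animals.
    excluded: chromosomes dropped before merging (hypothetical parameter;
    unassigned linkage groups could be listed here as well).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in cnvs:
        if chrom not in excluded:
            by_chrom[chrom].append((start, end))

    regions = []
    for chrom in sorted(by_chrom):
        merged = []
        for start, end in sorted(by_chrom[chrom]):
            if merged and start <= merged[-1][1]:
                # Any degree of overlap extends the current region.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        regions.extend((chrom, s, e) for s, e in merged)
    return regions


# Invented toy coordinates, not data from the study.
toy_cnvs = [("1", 100, 500), ("1", 400, 900), ("1", 2000, 3000),
            ("Z", 10, 20), ("2", 5, 50)]
toy_cnvrs = merge_into_cnvrs(toy_cnvs)
print(toy_cnvrs)
```

Because any overlap triggers a merge, a chain of partially overlapping CNVs from different animals collapses into one region, which is why a CNVR can be considerably longer than any single CNV within it.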
Gene content of CNVRs and gene ontology was identified from Ensembl using BioMart
(www.ensembl.org/biomart/martview). Gene enrichment analysis was performed using
DAVID (Da Wei Huang and Lempicki 2008; http://david.abcc.ncifcrf.gov/). Gene clusters
with an enrichment score >2 were taken to be significant. Statistical phenotypic associative
comparisons focused on the fowl (Phasianidae), the nearest relatives to chicken, upon which
all CNVs were scored. As members of the same family, these birds are genetically similar to
one another and differ in a relatively narrowly defined set of phenotypes; genes are less likely
to have diverged functions than they might between more distantly related species. Their
phylogeny is well-established so autocorrelation of traits due to shared ancestry can be
accounted for (Kimball and Braun 2008; Bonilla et al. 2010). A data set of phenotypes related
to sexual traits is available that covers all of these species (Kimball et al. 2011); the traits
encode body size, shape, plumage and sex, and are therefore likely to be related to numerous
developmental and hormonal regulators. Correlations between CNV and sexual phenotype
data sets were assessed visually and with a pairwise correlation matrix (Minitab Statistical
Software, Minitab, Inc.). Statistical testing was carried out using R.
Results
Cross-species array-CGH identification of CNVs
We successfully hybridised genomic DNA from 16 avian species to the 385,000
oligonucleotide chicken microarray (Roche Nimblegen). We also successfully reanalysed and
incorporated previous CNV data from turkey (Griffin et al. 2008), duck (Skinner et al. 2009)
and zebra finch (Völker et al. 2010). Our full data set thus includes species from six orders
and three major clades, the Galliformes, Anseriformes and Neoaves. The details of the
species and animals used are in Table 1. An example of the output generated by CNV
Analyser is given in Figure 1: CNVRs in the common quail are shown over a chicken
karyotype. Gains are shown in green to the right of the chromosome and apparent losses in
red to the left.
Figure 1. Output from CNV Analyser showing relative gains and losses in four common quails compared to chicken. Chicken chromosomes are numbered in order at the top of the figure and ideograms are represented by vertical lines. Apparent CNVR gains are represented by green bars to the right of each chromosome, apparent losses by red bars to the left.
All reported ‘gains’ and ‘losses’ are presented with respect to chicken. Excluding the sex
chromosomes and unassigned linkage groups, 790 CNVs were found (supplementary table
S1). The overlapping CNVs were combined into 135 CNV regions (supplementary table S2).
These CNVRs are similar in size and distribution to those seen in our previous experiments:
CNVRs ranged from 7.15kb to 1.56Mb, with a median size of 32.9kb (figure 2). Overall,
11.8Mb of the chicken genome was encompassed by CNVRs.
Figure 2. Distribution of CNV sizes across the species in this study. The grey bounded region shows the median and quartiles for the dataset as a whole. Numbers at the top show the number of CNVs per species.
Gene content of CNVRs
We examined the CNVRs for association with known or novel genes on the Ensembl
database (release 69); 95 CNVRs (70%) contained, or overlapped with, genes (supplementary
table S2). Both gains and losses were predominantly associated with genes, either known or
novel; 47 of the gains (79%) and 48 of the losses (63%) had associated genes. Furthermore,
most of these genes are annotated as protein-coding (supplementary figure S2).
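The proportion of CNVRs that contain or overlap genes can be illustrated with a simple interval test. This is a sketch only: in the study gene content was retrieved from Ensembl via BioMart, whereas here both regions and genes are invented `(chrom, start, end)` tuples and the function names are hypothetical.

```python
def overlaps_any(region, genes):
    """True if the region shares at least one base with any gene."""
    chrom, start, end = region
    return any(g_chrom == chrom and g_start <= end and start <= g_end
               for g_chrom, g_start, g_end in genes)


def gene_association(regions, genes):
    """Return (number of regions overlapping a gene, fraction of regions)."""
    hits = sum(1 for region in regions if overlaps_any(region, genes))
    return hits, hits / len(regions)


# Invented toy coordinates, not data from the study.
toy_regions = [("1", 100, 500), ("1", 2000, 3000), ("2", 5, 50), ("3", 10, 20)]
toy_genes = [("1", 450, 800), ("2", 60, 90), ("3", 1, 15)]
toy_hits, toy_fraction = gene_association(toy_regions, toy_genes)
print(toy_hits, toy_fraction)

# The overall proportion reported in the text: 95 of 135 CNVRs.
reported_percent = round(100 * 95 / 135)
print(reported_percent)
```

The same overlap test would apply to any pair of interval sets on shared coordinates, for example CNVRs from two different studies mapped to the same genome build.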
CNVR frequency in birds with highly rearranged karyotypes
Both Falco (falcon) and Gyps (vulture) species have previously been reported as having
significantly rearranged karyotypes and high degrees of interchromosomal rearrangements
(Bed'Hom et al. 2003; Nanda et al. 2006; Nishida et al. 2008). The median number of CNVs
per genome for Falco species was 10 and for Gyps species was 12, compared to an overall
median of 11.75. Differences were not statistically significant.
CNVR density in macro- versus microchromosomes
We compared the sizes and densities of CNVRs against chromosome sizes. As predicted, we
found a significant inverse association between CNVR frequency and chromosome size
(Figure 3A) – CNVRs are more common on smaller chromosomes (rho=-0.88, p<0.0001,
n=25, Spearman's rank). The size of CNVRs however does not show any correlation with
chromosome size (Figure 3B) – small CNVRs are as common as large CNVRs on small
chromosomes (rho=0.08, p=0.67, n=25, Spearman's rank). Consequently, smaller
chromosomes tend to have a higher percentage of sequence within CNVs than larger
chromosomes. The same patterns were seen when the CNVRs were separated into gains and
losses, with similar statistical significances.
Figure 3. Correlations between chromosome size and A) CNVR density (the number of CNVRs per megabase of chromosome); B) the median CNVR length. There is a strong significant inverse correlation with CNVR density (Spearman's rank, rho=-0.88, p<0.0001, n=25) but no correlation with median CNVR length (Spearman's rank, rho=0.08, p=0.67, n=25).
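For readers unfamiliar with the statistic, Spearman's rank correlation is the Pearson correlation of the ranks of two variables. The pure-Python sketch below shows the calculation on invented chromosome sizes and CNVR densities; it does not reproduce the study's data or its p-values, and all names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented toy values, not data from the study.
toy_chrom_size_mb = [200, 150, 100, 50, 20]
toy_cnvr_per_mb = [0.01, 0.02, 0.03, 0.05, 0.20]
toy_rho = spearman_rho(toy_chrom_size_mb, toy_cnvr_per_mb)
print(toy_rho)
```

In the toy data density rises strictly as size falls, so rho is at its negative extreme. In practice a library routine such as `scipy.stats.spearmanr` would also return a p-value.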
Overlap with CNVs amongst chicken breeds
A total of 47 of the 135 CNVRs from our species (~36%) overlap chicken CNVRs found by
Crooijmans et al. (2013) looking between chicken breeds (supplementary table S6). The
shared CNVRs are distributed across most chromosomes and contain both gains and losses.
Many of the CNVRs seen as gain in our species relative to Red Jungle Fowl appear as losses
in chicken breeds relative to Red Jungle Fowl, consistent with the high percentage of losses
(75%) found by Crooijmans et al. (2013).
Gene ontology analysis
A gene ontology analysis using DAVID revealed enrichment among losses for genes involved
in immune responses and antigen presentation (supplementary Table S3). No specific
enrichment categories were seen amongst gains. A total of 51 CNVRs (38%) are shared
between two or more species (see also figure S3). While the total number of gains and losses
are similar (59 versus 76 respectively), the 18 CNVRs shared amongst more than four species
are all losses (see figure S3). These CNVRs predominantly contain genes related to immune
function, including CNVR109, which covers almost all of the available GGA16 sequence
(the MHC region).
Some of the CNVRs appear lineage-specific: for example, the gain CNVR64 is found only in
the four quails. It contains the gene LRFN5, which while uncharacterised in chicken, in
humans may play a role in regulating synapse formation (Mah et al. 2010). Similarly, the gain
CNVR56 is found only in four animals from Gyps fulvus, and contains ADAMTS3. This gene
has roles in cell signalling, cell fusion, and cell-cell interactions, and in humans is implicated
in multiple diseases including myocardial infarction (Lee et al. 2012). An annotated list of
genes in CNVRs seen concordant in most animals of a species is given in supplementary
table S5.
Statistical comparison of our CNV dataset with phenotypic datasets detected some
CNVRs correlated with sexual phenotypic data (despite the deliberate exclusion of Z and W
probes from the analysis). That is, five CNVRs (four losses and one gain) were found to be
perfectly correlated with the unique loss of sexual dichromatism in the chukar (Alectoris
chukar) (Kimball et al. 2011). A sixth CNVR (weak loss vs strong losses in chukar) was
shared by a pheasant with sexual dichromatism. This CNVR overlaps an intron of DACH2,
associated with the development of female primary sexual characteristics, and an as-yet un-
annotated antisense miRNA within the intron.
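A "perfect correlation" between a CNVR and a binary trait means that the presence or absence of the CNVR across species matches the trait exactly, or exactly mirrors it. The Python sketch below is a simplification of that idea and is not the pairwise correlation matrix analysis used in the study. It takes no account of phylogeny, and the species labels, CNVR identifiers and function name are invented.

```python
def perfectly_associated(cnvr_calls, trait):
    """Return CNVR ids whose presence pattern equals the trait or its mirror.

    cnvr_calls: dict of cnvr_id -> dict of species -> bool (CNVR called).
    trait: dict of species -> bool (e.g. sexually dichromatic or not).
    Species missing from a CNVR's dict are treated as not carrying it.
    """
    species = sorted(trait)
    target = [trait[s] for s in species]
    if len(set(target)) < 2:
        return []  # the trait does not vary, so nothing can be assessed
    mirror = [not value for value in target]

    hits = []
    for cnvr_id in sorted(cnvr_calls):
        calls = cnvr_calls[cnvr_id]
        pattern = [bool(calls.get(s, False)) for s in species]
        if pattern == target or pattern == mirror:
            hits.append(cnvr_id)
    return hits


# Invented toy data, not data from the study.
toy_trait = {"sp_A": True, "sp_B": True, "sp_C": False, "sp_D": True}
toy_calls = {
    "cnvr_x": {"sp_C": True},                              # mirrors the trait
    "cnvr_y": {"sp_A": True, "sp_C": True},                # no clean match
    "cnvr_z": {"sp_A": True, "sp_B": True, "sp_D": True},  # equals the trait
}
toy_matches = perfectly_associated(toy_calls, toy_trait)
print(toy_matches)
```

When only one species differs in the trait, as in the toy data, any CNVR unique to that species matches perfectly, so associations of this kind are best read as candidates for follow-up rather than as evidence of a causal link.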
In supplementary table S6, a list of autapomorphic species-specific and lineage-specific
putative CNVRs containing genes is given. Since many of the genes remain unannotated in
chicken, information from the human orthologue of the gene has been included where
possible to provide an approximation of function.
Discussion
In this study we successfully addressed four hypotheses, as outlined in the introduction,
pertaining to cross species CNVs (as revealed by array CGH using chicken as a reference).
We developed a novel in house tool for analysing and visualising cross species CNVRs and
identified putative relationships between inter-specific CNVRs and phenotypic variation in
birds. As such, this study represents the most comprehensive appraisal of copy number
variation (or at least cross species array CGH data) in birds to date and its possible role in
avian species variation and genome evolution.
Hypothesis 1) Size and number of copy number variants in birds
The median CNV size (32.9kb) presented here is smaller than reported in our individual
studies of turkey (127.1 kb), duck (50kb) and zebra finch (90kb) (Griffin et al. 2008; Skinner
et al. 2009; Völker et al. 2010), albeit employing slightly different segmentation methods.
The range of CNV sizes is near identical in all the species studied, with only a slight
variability. These results compare favourably with those of Wang et al. (2010, 2012) who
examined CNVs between chicken breeds, finding median CNV sizes of 42.59kb and 14.43kb,
with similar ranges.
Our first hypothesis was that CNVs apparent in birds would appear smaller than those seen in
mammals; this was supported by initial evidence. Recent within-species and cross-species
studies in mammals however indicate this not to be the case. That is, experiments with
oligonucleotide platforms within species in mice, cattle and pigs have found median CNV
sizes of 122kb, 16.7kb and 142.9kb respectively (Graubert et al. 2007; Fadista et al. 2010;
Wang et al. 2012). Cross species experiments comparing sheep and goats to cattle have found
median sizes of 55.9kb and 49.5kb (Fontanesi et al. 2011). It is becoming clearer therefore
that as the resolution of CNV detection platforms improves, smaller CNVs are being found.
We thus do not feel confident in accepting our first hypothesis that bird CNVs are smaller
than those of mammals, and await comparable results using similar platforms in order to
confirm or refute this.
The number of CNVs per individual genome also seems comparable to the situation in
mammals. We found a median of 11.75 CNVs per animal, varying by species from 5.5 in
Lady Amherst's pheasant to 39.75 in the red-legged partridge. CNVs within chickens were
found with an average of 17 and 40 CNVs per bird (Wang et al. 2010; Wang et al. 2012). The
equivalent numbers for mammals range from 11.6 to 47 in cattle (Fadista et al. 2010; Liu et
al. 2010), about 17 per individual genome in dogs, goats and sheep (Chen et al. 2009;
Fontanesi et al. 2010; Fontanesi et al. 2011) and 22 per individual genome in mice (Graubert
et al. 2007). Consequently, it appears that CNVs are as frequent in birds as they are
in mammals, allowing for slight variations in array platforms and analysis methods,
and thus are likely to play as important a functional role. However, given that bird genomes
are, on average, one third smaller than mammalian ones, the number of CNVs per megabase
in birds is correspondingly larger.
Hypothesis 2) Gene content of CNVRs
Our second hypothesis was that CNVRs would frequently be associated with genes, and more
so than in mammals. In the current data set, we found Ensembl genes within the majority of
CNVRs (70%). Redon et al. (2006), found 58% of CNVRs in the human genome overlapping
known RefSeq genes. Amongst other mammals, CNVRs overlapping with genes range from
30% in cattle (Fadista et al. 2008; Fadista et al. 2010) through 61% in sheep and goats
(Fontanesi et al. 2010; Fontanesi et al. 2011) to 74% in pigs and dogs (Chen et al. 2009;
Wang et al. 2012). Amongst chicken breeds, the overlaps are 58% (Wang et al. 2010), 66%
(Crooijmans In Press) and 87.7% (Wang et al. 2012). Thus it is possible that there is a slightly
higher association between genes and CNVRs amongst birds, but this is only tentative. For
the moment, we thus feel it prudent to conclude that, while we accept the hypothesis that
CNVRs are frequently associated with genes, there appear to be not dissimilar proportions of
genes involved in copy number variation in birds and mammals. We thus await more robust
statistical analysis of comparable platforms and larger data sets. In any event, the high
association of the avian CNVRs identified in this study with protein coding genes argues for
the functional significance of these CNVRs.
Hypothesis 3) CNVs in species with highly rearranged karyotypes
The falcon and vulture species have substantial karyotypic rearrangements compared with the
‘typical’ avian karyotype. The reduction in chromosome number, typical of the Falconidae,
was mediated by fusions of microchromosomes with macrochromosomes,
microchromosomes with other microchromosomes, and some fissions of the largest ancestral
chromosomes (Bed'Hom et al. 2003; Nanda et al. 2006; Nishida et al. 2008). Our third
hypothesis, that we would find a higher number of CNVRs in these birds as compared with
species with more stable karyotypes was not supported. That is, the number of CNVs per
animal was not greater in Falco or Gyps species compared to the others, nor were CNV
sizes noticeably different from the global median. In fact, with an average of about six CNVs
per animal in Falco species, they were among the lowest counts in the dataset. This contrasts
with our initial expectation that species with highly rearranged karyotypes would be more
prone to copy number variation. As CNVs have been observed to be associated with intra-
chromosomal rearrangements (Völker et al. 2010), we hypothesise that CNVs correlate more
strongly with intra-chromosomal rearrangements than with inter-chromosomal
rearrangements. The in-progress sequence assemblies of falcon and parrot species will aid in
testing this.
Hypothesis 4) Association between chromosome size and CNV content
Models of copy-number variation suggest it is driven by non-allelic homologous
recombination (NAHR), and so is expected to be more frequent in regions of the genome
with high recombination rates. In birds, the smaller chromosomes have high recombination
rates, due in part to the obligatory crossover during meiosis (Burt 2005; Schmid et al. 2005).
Consequently, our final hypothesis was that CNV density (number of CNVs per megabase)
would inversely correlate with chromosome size. As hypothesised, the smaller chromosomes
appeared to be more CNV-dense (Figure 3A). We also found that the CNVs have an equal
range of sizes across chromosomes, meaning that the proportion of a microchromosome
affected by copy number variation is greater than the comparable proportion of a
macrochromosome. This impacts on the functional relevance of avian CNVRs, given about
two thirds of most avian species’ genes are found on the microchromosomes (Hillier et al.
2004; Dalloul et al. 2010; Warren et al. 2010). Moreover, in Falco species CNVR density still
follows the association with chicken chromosome size, and CNVR length still shows no
association with chromosome size, as in species with “standard” (i.e. chicken-like) karyotypes. Thus,
while these birds have rearranged karyotypes, the individual chromosome synteny blocks are
still 'behaving' like their ancestral chromosomes. Similarly, chicken GGA4p, the ancestral
chromosome ten (Griffin et al. 2007), still has a GC content, CpG density and recombination
rate resembling a microchromosome (Hillier et al. 2004; Griffin et al. 2008).
Do our reported “gains” and “losses” represent genuine CNVs?
Inference of true DNA copy number variation from relative hybridization efficiencies and
fluorescent intensities on a microarray platform is a point for discussion. It is possible for
instance, given the means by which the data was ascertained (array CGH), that some of our
apparent losses are, in point of fact, due to sequence divergence rather than copy number
change. We would argue however, as they mostly overlap with genes, these apparent CNVRs
still represent genomic regions of interest, whichever the cause. On the other hand, it is hard
to imagine how the gains (relative to chicken) could be anything other than genuine CNVs.
Since generating the primary array CGH data we have made multiple attempts to verify the
results independently, chiefly by quantitative PCR. While some results (not presented)
appeared to confirm the array data, others did not. Moreover, qPCR of “known” inter-specific
CNVs does not appear consistently to confirm the data either. After around two years of
attempts we were thus forced to conclude that the qPCR data was no more or less reliable
than the array CGH. That is, while qPCR is excellent for confirming array CGH data between
individuals of the same species, possible confounding effects of differing annealing
efficiencies between the species-specific primers leads us to believe that it is not a suitable
strategy for confirming cross species array CGH data. Other possible alternatives could be
FISH-based approaches e.g. fibre FISH for small CNVs or 3D FISH for larger ones. Another
possibility is massive parallel deep sequencing of the genomes in question. Both these data,
particularly the latter, will need to wait for better-characterised genome assemblies.
Comparison with the pre-existing chicken data (in which genuine CNVs are presumably
represented) reveals significant overlaps, thus leading us to conclude that a large proportion
(perhaps the majority) of the “gains” and “losses” seen in this data set are genuine CNVs.
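A comparison of this kind amounts to counting how many cross species CNVRs overlap a previously reported chicken CNV. The following is a minimal sketch under stated assumptions: the region coordinates are invented for illustration, the helper names are hypothetical, and no test of statistical significance (for example by permutation) is attempted.

```python
def overlaps(a, b):
    """True if two (chromosome, start, end) regions share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]

def count_overlapping(query, reference):
    """Number of query regions that overlap any reference region."""
    return sum(any(overlaps(q, r) for r in reference) for q in query)

# Hypothetical coordinates, for illustration only.
cross_species = [("GGA1", 1000, 5000), ("GGA2", 200, 900), ("GGA16", 50, 400)]
chicken_cnvs = [("GGA1", 4500, 8000), ("GGA16", 100, 150), ("GGA3", 200, 900)]

n_overlap = count_overlapping(cross_species, chicken_cnvs)
fraction = n_overlap / len(cross_species)
print(n_overlap, fraction)
```

In this toy example two of the three cross species regions overlap a chicken CNV; the region on GGA2 has no counterpart on the same chromosome and is not counted.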
Gene ontology analysis
Many CNVRs found consistently in every species (or in most individuals of every species) appear
to be losses relative to chicken, e.g. CNVR109 on GGA16. The only sequence available for
GGA16 (~400 kb) contains the MHC locus (as well as rRNA genes), a rapidly
changing and copy number variable region of the genome (e.g. Ewald and Livant 2004).
Furthermore, the analysis of gene enrichment categories using DAVID revealed other
immune system function and cytoskeletal components to be particularly highly enriched
(supplementary table S4) amongst the losses. The fact that no gains were seen in these
regions supports the notion that a significant proportion of our 'losses' here are caused by
sequence divergence in these rapidly evolving genomic regions.
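The idea behind a gene enrichment analysis can be sketched with a plain hypergeometric upper-tail test: given N annotated genes of which K carry a category, how likely is it that at least k of the n genes in the CNVRs carry it by chance? This is only an illustration with made-up counts; DAVID reports its own enrichment statistics, which differ in detail from this calculation, and the function name is an assumption.

```python
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which carry the annotation of interest."""
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# Made-up counts: 10 genes in total, 4 annotated, 3 genes in CNVRs,
# of which at least 2 are annotated.
p_value = hypergeom_upper_tail(k=2, n=3, K=4, N=10)
print(p_value)
```

A small p-value would indicate that the category is over-represented among the genes in the CNVRs relative to the background gene set.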
The majority of CNVRs (62%) that were found appeared to be unique to individual species.
Those that included genes were almost uniformly either all gains or all losses, e.g. 24 gene
losses and 1 gain in Meleagris, 14 gene gains and no losses in Coturnix. The fact that many
of these genes have multifaceted transcriptional and/or regulatory roles (e.g., containing
LIM-domains; Jurata and Gill 1997) is consistent with the suggestion that the correlation of gains
or losses may be due to their participation in functional networks. Further, it is a reasonable
assumption that these genomic regions may contain genes involved in speciation and/or
adaptation of that lineage because most closely related species typically exhibit opposite
trends of gains versus losses. The most interesting examples of these are those CNVRs that
are present in all or most of the animals within that species, e.g. the quail-specific CNVR64.
Other interesting examples include:
• Genes that may be involved in diet in turkey (e.g., losses of F1N9P3_CHICK and
F1NP19_CHICK, implicated in proteolytic digestion and degradation of trypsin
inhibitors; Szmola et al. 2003);
• The uniquely migratory behaviour of common quail among fowl (e.g., gains of
OBSCN, implicated in hypertrophy of myofibrils, and of MAPK8IP3, implicated in
respiratory gaseous exchange; Borisov et al. 2003);
• Muscle activity in the speedy falcons (e.g., gain of MYOZ3, preferentially expressed
in fast-twitch myofibers and skeletal muscle; Frey and Olson 2002);
• Immune function in the common quail and silver pheasant (e.g., LEAP2 and ITCH,
respectively);
• Homeotic genes in the common pheasant and California quail (e.g., SCML2 and
DLX5, respectively);
• Brain development and neuronal function in turkey (e.g., loss of CTXN1), common
quail (gain of LRFN5) and duck (e.g., DLGAP2), based on comparison of
orthologous gene functions in mammals (Watson et al. 1994; Mah et al. 2010).
Conclusions
Copy number variation is increasingly being recognised as an important and still unexplored
area of genetic variation in most species, contributing to disease states, adaptations and
speciation. This work represents a broad survey of copy number variation in avian species,
and establishes a base from which future studies can build. Undoubtedly, cross-species
array-CGH in birds will prove valuable to investigations into copy number variation
in general and avian genetic variability in particular.
Acknowledgements
We thank Wingham Wildlife Park (Kent), Cherry Valley Farms, British United Turkeys,
the Central Veterinary Research Laboratory in Dubai and the Falcon Hospital Dubai for
providing the feathers and blood samples from which the DNA samples were extracted.
Supplementary Figures
Figure S1. Numbers of CNVRs containing genes in each species, and the proportion of genes that are protein-coding.
Figure S2. Heatmap in which gains are shown in green, losses in red, with the brightness proportional to the log2 ratio of the CNV. The animals and CNVs have been clustered, revealing patterns of lineage-specific CNVRs, and demonstrating that losses are more frequently shared across multiple species than gains.
Figure S3. The number of CNVRs which are unique to one species, or which are shared between two or more species. Gains are shared between at most four species; losses are shared with up to 18 species. Shared losses are likely attributable to sequence divergence rather than true copy number change.
Table 1. Species analysed in this study, comprising 66 animals from 18 species across 6 orders.
Species                  Common name              Order            n
Alectoris chukar         Red legged partridge     Galliformes      4
Anas platyrhynchos       Duck                     Anseriformes     2
Callipepla californica   California quail         Galliformes      2
Chlamydotis undulata     Houbara bustard          Gruiformes       9
Chrysolophus amherstiae  Lady Amherst's pheasant  Galliformes      2
Chrysolophus pictus      Golden pheasant          Galliformes      2
Colinus virginianus      Bobwhite quail           Galliformes      2
Coturnix coturnix        Common quail             Galliformes      4
Falco chiquera           Red-necked falcon        Falconiformes    5
Falco rusticolus         Gyrfalcon                Falconiformes    11
Gyps africanus           White-backed vulture     Accipitriformes  2
Gyps fulvus              Griffon vulture          Accipitriformes  2
Gyps rueppellii          Rüppell's vulture        Accipitriformes  2
Lophura nycthemera       Silver pheasant          Galliformes      4
Meleagris gallopavo      Turkey                   Galliformes      4
Pavo cristatus           Peafowl                  Galliformes      4
Phasianus colchicus      Common pheasant          Galliformes      3
Taeniopygia guttata      Zebra finch              Passeriformes    2
References
Bailey JA, Eichler EE. 2006. Primate segmental duplications: crucibles of evolution, diversity and disease. Nature Reviews Genetics 7(7): 552-564.
Bed'Hom B, Coullin P, Guillier-Gencik Z, Moulin S, Bernheim A, Volobouev V. 2003. Characterization of the atypical karyotype of the black-winged kite Elanus caeruleus (Falconiformes: Accipitridae) by means of classical and molecular cytogenetic techniques. Chromosome Research 11(4): 335-343.
Bonilla AJ, Braun EL, Kimball RT. 2010. Comparative molecular evolution and phylogenetic utility of 3′-UTRs and introns in Galliformes. Molecular phylogenetics and evolution 56(2): 536-542.
Borisov AB, Raeker MO, Kontrogianni-Konstantopoulos A, Yang K, Kurnit DM, Bloch RJ, Russell MW. 2003. Rapid response of cardiac obscurin gene cluster to aortic stenosis: differential activation of Rho-GEF and MLCK and involvement in hypertrophic growth. Biochemical and biophysical research communications 310(3): 910-918.
Burt DW. 2005. Chicken genome: current status and future opportunities. Genome Research 15(12): 1692-1698.
Burt DW, Bruley C, Dunn IC, Jones CT, Ramage A, Law AS, Morrice DR, Paton IR, Smith J, Windsor D. 1999. The dynamics of chromosome evolution in birds and mammals. Nature 402(6760): 411-413.
Chen W-K, Swartz JD, Rush LJ, Alvarez CE. 2009. Mapping DNA structural variation in dogs. Genome research 19(3): 500-509.
Colobran R, Pedrosa E, Carretero‐Iglesia L, Juan M. 2010. Copy number variation in chemokine superfamily: the complex scene of CCL3L–CCL4L genes in health and disease. Clinical & Experimental Immunology 162(1): 41-52.
Crooijmans RPMA, Fife MS, Fitzgerald T, Strickland S, Cheng HH, Kaiser P, Redon R, Groenen MAM. 2013. Large scale variation in DNA copy number in chicken breeds. BMC Genomics 14: 398.
Huang DW, Sherman BT, Lempicki RA. 2008. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature protocols 4(1): 44-57.
Dalloul RA, Long JA, Zimin AV, Aslam L, Beal K, Bouffard P, Burt DW, Crasta O, Crooijmans RP, Cooper K. 2010. Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo): genome assembly and analysis. PLoS biology 8(9): e1000475.
Dumas L, Kim YH, Karimpour-Fard A, Cox M, Hopkins J, Pollack JR, Sikela JM. 2007. Gene copy number variation spanning 60 million years of human and primate evolution. Genome Research 17(9): 1266-1277.
Emanuel BS, Saitta SC. 2007. From microscopes to microarrays: dissecting recurrent chromosomal rearrangements. Nature Reviews Genetics 8(11): 869-883.
Ewald S, Livant E. 2004. Distinctive polymorphism of chicken B-FI (major histocompatibility complex class I) molecules. Poultry science 83(4): 600-605.
Fadista J, Nygaard M, Holm L-E, Thomsen B, Bendixen C. 2008. A snapshot of CNVs in the pig genome. PLoS One 3(12): e3916.
Fadista J, Thomsen B, Holm L-E, Bendixen C. 2010. Copy number variation in the bovine genome. BMC genomics 11(1): 284.
Feuk L, Carson AR, Scherer SW. 2006. Structural variation in the human genome. Nature Reviews Genetics 7(2): 85-97.
Fontanesi L, Beretti F, Martelli P, Colombo M, Dall'Olio S, Occidente M, Portolano B, Casadio R, Matassino D, Russo V. 2011. A first comparative map of copy number variations in the sheep genome. Genomics 97(3): 158-165.
Fontanesi L, Martelli P, Beretti F, Riggio V, Dall'Olio S, Colombo M, Casadio R, Russo V, Portolano B. 2010. An initial comparative map of copy number variations in the goat (Capra hircus) genome. BMC genomics 11(1): 639.
Fortna A, Kim Y, MacLaren E, Marshall K, Hahn G, Meltesen L, Brenton M, Hink R, Burgers S, Hernandez-Boussard T. 2004. Lineage-specific gene duplication and loss in human and great ape evolution. PLoS biology 2(7): e207.
Freeman JL, Perry GH, Feuk L, Redon R, McCarroll SA, Altshuler DM, Aburatani H, Jones KW, Tyler-Smith C, Hurles ME. 2006. Copy number variation: new insights in genome diversity. Genome Research 16(8): 949-961.
Frey N, Olson EN. 2002. Calsarcin-3, a novel skeletal muscle-specific member of the calsarcin family, interacts with multiple Z-disc proteins. Journal of Biological Chemistry 277(16): 13998-14004.
Graubert TA, Cahan P, Edwin D, Selzer RR, Richmond TA, Eis PS, Shannon WD, Li X, McLeod HL, Cheverud JM. 2007. A high-resolution map of segmental DNA copy number variation in the mouse genome. PLoS Genetics 3(1): e3.
Gregory TR. 2005. The evolution of the genome. Academic Press.
Griffin D, Robertson L, Tempest H, Skinner B. 2007. The evolution of the avian genome as revealed by comparative molecular cytogenetics. Cytogenetic and genome research 117(1-4): 64-77.
Griffin DK, Robertson LB, Tempest HG, Vignal A, Fillon V, Crooijmans RP, Groenen MA, Deryusheva S, Gaginskaya E, Carré W. 2008. Whole genome comparative studies between chicken and turkey and their implications for avian genome evolution. BMC genomics 9(1): 168.
Hillier LW, Miller W, Birney E, Warren W, Hardison RC, Ponting CP, Bork P, Burt DW, Groenen MA, Delany ME. 2004. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 432(7018): 695-716.
Iskow RC, Gokcumen O, Lee C. 2012. Exploring the role of copy number variants in human adaptation. Trends in Genetics.
Jurata LW, Gill GN. 1997. Functional analysis of the nuclear LIM domain interactor NLI. Molecular and cellular biology 17(10): 5688-5698.
Kehrer-Sawatzki H, Cooper DN. 2008. Molecular mechanisms of chromosomal rearrangement during primate evolution. Chromosome Research 16(1): 41-56.
Kimball RT, Braun EL. 2008. A multigene phylogeny of Galliformes supports a single origin of erectile ability in non‐feathered facial traits. Journal of Avian Biology 39(4): 438-445.
Kimball RT, Mary CMS, Braun EL. 2011. A macroevolutionary perspective on multiple sexual traits in the Phasianidae (Galliformes). International journal of evolutionary biology 2011.
Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W. 2001. Initial sequencing and analysis of the human genome. Nature 409(6822): 860-921.
Lee CW, Hwang I, Park C-S, Lee H, Park D-W, Kang S-J, Lee S-W, Kim Y-H, Park S-W, Park S-J. 2012. Expression of ADAMTS-2,-3,-13, and-14 in culprit coronary lesions in patients with acute myocardial infarction or stable angina. Journal of thrombosis and thrombolysis 33(4): 362-370.
Liu GE, Hou Y, Zhu B, Cardone MF, Jiang L, Cellamare A, Mitra A, Alexander LJ, Coutinho LL, Dell'Aquila ME. 2010. Analysis of copy number variations among diverse cattle breeds. Genome research 20(5): 693-703.
Locke DP, Segraves R, Carbone L, Archidiacono N, Albertson DG, Pinkel D, Eichler EE. 2003. Large-scale variation among human and great ape genomes determined by array comparative genomic hybridization. Genome Research 13(3): 347-357.
Lupski JR. 1998. Genomic disorders: structural features of the genome can lead to DNA rearrangements and human disease traits. Trends in genetics: TIG 14(10): 417.
Mah W, Ko J, Nam J, Han K, Chung WS, Kim E. 2010. Selected SALM (synaptic adhesion-like molecule) family proteins regulate synapse formation. The Journal of Neuroscience 30(16): 5559-5568.
Nanda I, Karl E, Volobouev V, Griffin D, Schartl M, Schmid M. 2006. Extensive gross genomic rearrangements between chicken and Old World vultures (Falconiformes: Accipitridae). Cytogenetic and genome research 112(3-4): 286-295.
Newman TL, Tuzun E, Morrison VA, Hayden KE, Ventura M, McGrath SD, Rocchi M, Eichler EE. 2005. A genome-wide survey of structural variation between human and chimpanzee. Genome Research 15(10): 1344-1356.
Nishida C, Ishijima J, Kosaka A, Tanabe H, Habermann FA, Griffin DK, Matsuda Y. 2008. Characterization of chromosome structures of Falconinae (Falconidae, Falconiformes, Aves) by chromosome painting and delineation of chromosome rearrangements during their differentiation. Chromosome Research 16(1): 171-181.
Ohno S. 1970. Evolution by gene duplication. London: George Allen & Unwin Ltd. Berlin, Heidelberg and New York: Springer-Verlag.
Olshen AB, Venkatraman E, Lucito R, Wigler M. 2004. Circular binary segmentation for the analysis of array‐based DNA copy number data. Biostatistics 5(4): 557-572.
Perry GH, Dominy NJ, Claw KG, Lee AS, Fiegler H, Redon R, Werner J, Villanea FA, Mountain JL, Misra R. 2007. Diet and the evolution of human amylase gene copy number variation. Nature genetics 39(10): 1256-1260.
Perry GH, Tchinda J, McGrath SD, Zhang J, Picker SR, Cáceres AM, Iafrate AJ, Tyler-Smith C, Scherer SW, Eichler EE. 2006. Hotspots for copy number variation in chimpanzees and humans. Proceedings of the National Academy of Sciences 103(21): 8006-8011.
Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero MH, Carson AR, Chen W. 2006. Global variation in copy number in the human genome. Nature 444(7118): 444-454.
Repping S, van Daalen SK, Brown LG, Korver CM, Lange J, Marszalek JD, Pyntikova T, van der Veen F, Skaletsky H, Page DC. 2006. High mutation rates have driven extensive structural polymorphism among human Y chromosomes. Nature genetics 38(4): 463-467.
Samonte RV, Eichler EE. 2002. Segmental duplications and the evolution of the primate genome. Nature Reviews Genetics 3(1): 65-72.
Schmid M, Nanda I, Burt DW. 2005. Second report on chicken genes and chromosomes 2005. Cytogenetic and genome research 109(4): 415-479.
Skinner B, Griffin D. 2011. Intrachromosomal rearrangements in avian genome evolution: evidence for regions prone to breakpoints. Heredity 108(1): 37-41.
Skinner B, Robertson L, Tempest H, Langley E, Ioannou D, Fowler K, Crooijmans R, Hall A, Griffin D, Völker M. 2009. Comparative genomics in chicken and Pekin duck using FISH mapping and microarray analysis. BMC genomics 10(1): 357.
St Clair D. 2009. Copy number variation and schizophrenia. Schizophrenia bulletin 35(1): 9-12.
Stankiewicz P, Lupski JR. 2002. Genome architecture, rearrangements and genomic disorders. Trends in genetics: TIG 18(2): 74.
Stefansson H, Rujescu D, Cichon S, Pietiläinen OP, Ingason A, Steinberg S, Fossdal R, Sigurdsson E, Sigmundsson T, Buizer-Voskamp JE. 2008. Large recurrent microdeletions associated with schizophrenia. Nature 455(7210): 232-236.
Szmola R, Kukor Z, Sahin-Tóth M. 2003. Human mesotrypsin is a unique digestive protease specialized for the degradation of trypsin inhibitors. Journal of Biological Chemistry 278(49): 48580-48589.
Völker M, Backström N, Skinner BM, Langley EJ, Bunzey SK, Ellegren H, Griffin DK. 2010. Copy number variation, chromosome rearrangement, and their association with recombination during avian evolution. Genome Research 20(4): 503-511.
Wang J, Jiang J, Fu W, Jiang L, Ding X, Liu J-F, Zhang Q. 2012. A genome-wide detection of copy number variations using SNP genotyping arrays in swine. BMC genomics 13(1): 273.
Wang X, Nahashon S, Feaster TK, Bohannon-Stewart A, Adefope N. 2010. An initial map of chromosomal segmental copy number variations in the chicken. BMC genomics 11(1): 351.
Warren WC, Clayton DF, Ellegren H, Arnold AP, Hillier LW, Künstner A, Searle S, White S, Vilella AJ, Fairley S. 2010. The genome of a songbird. Nature 464(7289): 757-762.
Watson KL, Justice RW, Bryant PJ. 1994. Drosophila in cancer research: the first fifty tumor suppressor genes. Journal of cell science Supplement 18: 19.
Workman C, Jensen LJ, Jarmer H, Berka R, Gautier L, Nielser HB, Saxild H-H, Nielsen C, Brunak S, Knudsen S. 2002. A new non-linear normalization method for reducing variability in DNA microarray experiments. Genome biol 3(9): 1-16.