FEATURED ARTICLE
Next-generation sequencing and syntenic integration of flow-sorted arms of wheat chromosome 4A exposes the chromosome structure and gene content
Pilar Hernandez1, Mihaela Martis2, Gabriel Dorado3, Matthias Pfeifer2, Sergio Galvez4, Sebastian Schaaf2, Nicolas Jouve5,
Hana Simkova6, Miroslav Valarik6, Jaroslav Dolezel6 and Klaus F.X. Mayer2,*
1Instituto de Agricultura Sostenible (IAS-CSIC), Alameda del Obispo s/n, 14080 Córdoba, Spain,
2Institute for Bioinformatics and Systems Biology, Helmholtz Center Munich, Ingolstädter Landstraße 1, 85764 Neuherberg, Germany,
3Departamento de Bioquímica y Biología Molecular, Campus Rabanales C6-1-E17, Campus de Excelencia Internacional Agroalimentario, Universidad de Córdoba, 14071 Córdoba, Spain,
4Departamento de Lenguajes y Ciencias de la Computación, ETSI Informática, Campus de Teatinos, Universidad de Málaga, Bulevar Louis Pasteur 35, 29071 Málaga, Spain,
5Departamento de Biología Celular y Genética, Universidad de Alcalá, Campus Universitario, 28871 Alcalá de Henares, Madrid, Spain, and
6Centre of the Haná Region for Biotechnological and Agricultural Research, Institute of Experimental Botany, Sokolovská 6, 77200 Olomouc, Czech Republic
Received 3 August 2011; revised 29 September 2011; accepted 30 September 2011; published online 25 November 2011.
*For correspondence (fax +49 89 3187 2225; e-mail [email protected]).
SUMMARY
Wheat is the third most important crop for human nutrition in the world. The availability of high-resolution
genetic and physical maps and ultimately a complete genome sequence holds great promise for breeding
improved varieties to cope with increasing food demand under the conditions of changing global climate.
However, the large size of the bread wheat (Triticum aestivum) genome (approximately 17 Gb/1C) and the
triplication of genic sequence resulting from its hexaploid status have impeded genome sequencing of this
important crop species. Here we describe the use of mitotic chromosome flow sorting to separately purify and
then shotgun-sequence a pair of telocentric chromosomes that together form chromosome 4A (856 Mb/1C) of
wheat. The isolation of this much reduced template and the consequent avoidance of the problem of sequence
duplication, in conjunction with synteny-based comparisons with other grass genomes, have facilitated
construction of an ordered gene map of chromosome 4A, embracing ≥85% of its total gene content, and have
enabled precise localization of the various translocation and inversion breakpoints on chromosome 4A that
differentiate it from its progenitor chromosome in the A genome diploid donor. The gene map of
chromosome 4A and the emerging sequences of homoeologous wheat chromosome groups 4, 5
and 7 represent unique resources that will provide new insights into the evolutionary dynamics
between homoeologous chromosomes and syntenic chromosomal regions.
Keywords: wheat genome, chromosome sorting, genome zipper, grass comparative genomics, wheat
shotgun chromosome, Triticeae genome
INTRODUCTION
Bread wheat (Triticum aestivum) provides approximately
20% of mankind’s dietary energy supply (http://www.fao.org),
but, despite its importance as a crop, acquisition of its
genome sequence remains a major challenge. The biological
features responsible for the slowness of progress
towards this goal are its large genome size (1C is
© 2011 The Authors. The Plant Journal © 2011 Blackwell Publishing Ltd
The Plant Journal (2012) 69, 377–386 doi: 10.1111/j.1365-313X.2011.04808.x
approximately 17 Gb), its hexaploid status, and its high
content of repetitive DNA (approximately 80%) (Flavell,
1986). Each wheat chromosome is larger than the entire
genome of rice (Oryza sativa), and the whole genome is
more than one hundred times larger than that of Arabidopsis
thaliana. The species arose from two separate hybridization
and allopolyploidization events, the first involving a hybrid
between the A genome donor Triticum urartu (closely
related to the cultivated species Triticum monococcum)
(Dvorak et al., 1993) and the B genome donor, thought to be
an ancestor of Aegilops speltoides (Sarkar and Stebbins,
1956; Dvorak and Zhang, 1990; Wang et al., 1997; Kilian
et al., 2007). This formed the wild tetraploid Triticum dicoccoides,
which was the ancestor of the cultivated tetraploid
parent of bread wheat Triticum turgidum. The second, much
more recent, event involved T. turgidum and the D genome
diploid Aegilops tauschii (McFadden and Sears, 1946).
A frequently adopted strategy to
circumvent many of the difficulties created by polyploidy
has been to rely on diploid, and in some cases tetraploid,
progenitors as surrogates (Feuillet et al., 2003). The availability
of the genome sequences of rice and Brachypodium
distachyon (Brachypodium) has been of particular value in
providing saturation of the genetic map in specific regions of
the wheat genome (International Rice Genome Sequencing
Project 2005, International Brachypodium Initiative 2010). As
an alternative, Dolezel et al. (2007) proposed that genome
sequencing be based on flow-sorted individual chromosomes
or chromosome arms. This approach simplifies
genome analysis by reducing the template
to a manageable size and, crucially, by avoiding all of the
complications introduced by the triplication of genic
sequence arising from wheat’s hexaploid status (Kubalakova
et al., 2002). Next-generation sequencing of chromosomal
DNA provides a powerful approach to identify most
of the genes and low-copy regions on a chromosome and to
produce annotated syntenic builds in which the majority of
genes are placed in an approximate order and orientation
(Berkman et al., 2011; Mayer et al., 2009, 2011; Wicker et al.,
2011). The so-called GenomeZipper approach (Mayer et al.,
2011) relies on comparisons of chromosomal shotgun
sequences with reference grass genomes (typically rice,
sorghum (Sorghum bicolor) and Brachypodium) to detect
syntenic regions in these reference genomes. Genes in the
detected regions are selected to generate a genomic build
along a marker scaffold. This build takes into account both the sequential
order of sequence-tagged genes in the reference
genomes and the ordering deduced from the marker
scaffold.
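The ordering idea behind the GenomeZipper can be illustrated with a deliberately simplified sketch. This is not the published GenomeZipper implementation. It assumes a hypothetical input in which each genetically mapped marker carries a cM position and the rank of its best-matching gene on one reference chromosome, while each shotgun-tagged gene carries only a reference rank. Each gene is then interpolated between its flanking anchors, so gene order within each marker interval follows the reference gene order.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class Anchor:
    """A genetically mapped marker linked to one reference-genome gene (hypothetical input)."""
    name: str
    cm: float        # position on the marker scaffold (cM)
    ref_index: int   # rank of the matching gene along the reference chromosome


def zip_genes(anchors, tagged_genes):
    """Order shotgun-tagged genes along a marker scaffold using reference gene order.

    tagged_genes maps a gene ID to its rank along the same reference chromosome.
    Each gene receives a cM estimate by linear interpolation between the two
    anchors that flank it in reference order. Genes outside the anchored
    interval are skipped rather than extrapolated.
    Returns (gene, estimated_cM) pairs sorted by estimated position.
    """
    anchors = sorted(anchors, key=lambda a: a.ref_index)
    keys = [a.ref_index for a in anchors]
    placed = []
    for gene, idx in tagged_genes.items():
        i = bisect_left(keys, idx)
        if i < len(keys) and keys[i] == idx:
            placed.append((gene, anchors[i].cm))  # gene coincides with an anchor
        elif 0 < i < len(keys):
            left, right = anchors[i - 1], anchors[i]
            frac = (idx - left.ref_index) / (right.ref_index - left.ref_index)
            placed.append((gene, left.cm + frac * (right.cm - left.cm)))
        # otherwise there is no anchor on one side, so the gene stays unplaced
    return sorted(placed, key=lambda p: p[1])
```

The actual pipeline combines several reference genomes with a barley marker scaffold. Linear interpolation is only one simple way to turn reference ranks into scaffold positions.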
Although most bread wheat chromosomes have
maintained the structure of their ancestral counterparts, chromosome
4A underwent a series of rearrangements. Previous
analyses revealed that the chromosome harbors two
translocations, from chromosome arms 5AL and 7BS, and
that it has undergone a pericentric inversion
(Devos et al., 1995; Miftahudin et al., 2004; Naranjo et al.,
1987) (Figure 1). The 5A translocation occurred at the diploid
level in a common ancestor as it is present in wheats of all
ploidy levels, including diploid wheat progenitors and
related species such as T. monococcum. On the other hand,
the 7BS translocation is detected in tetraploid and hexaploid
wheat only, indicating its occurrence after or at the time of
origin of T. dicoccoides (Devos et al., 1995). Interestingly,
most studies on bread wheat also report the presence
of a small region at the distal end of 4AS that was not
affected by the large pericentric inversion. That inversion moved most
of the ancestral short arm onto the modern long arm (4AL);
as a result, large portions of the ancestral 4AL now
constitute the modern 4AS.
As many as 40 genes of interest have been mapped to this
chromosome to date, including some encoding resistance/
tolerance to biotic and abiotic stress (Chen et al., 1995, 2005;
Effertz et al., 2001; Nga et al., 2009; Paull et al., 1998; Talbert
et al., 1996), and others controlling various agronomic traits (Araki et al., 1999;
Bai et al., 2008; Borner et al., 2002; Keller et al., 1999;
McCartney et al., 2005; Sourdille et al., 2002). Detailed
information on the gene order along the chromosome would greatly
facilitate the effective use of these genes in breeding programs
and, ultimately, their cloning and functional analysis.
Here, we report a high-resolution gene map of this
chromosome, based on DNA sequence obtained from
flow-sorted chromosome arms. Use has been made of the
genetic marker content present in homoeologous portions
of the barley genome (Hordeum vulgare) and the reference
grass genomes to provide a detailed insight into gene
composition and order along the length of the chromosome.
Figure 1. The structure of wheat chromosome 4A.
The structure of bread wheat chromosome 4A, as inferred by Devos et al. (1995) and Miftahudin et al. (2004). During its evolution, the chromosome first underwent a
pericentric inversion, which resulted in much of the ancient long arm (excluding segment C) becoming the modern short arm. Subsequent translocations from 5AL
(segment D) and 7BS (segment E) completed the rearrangement of the chromosome. The five individual segments A–E are color-coded. The additional small
structural rearrangements proposed by Miftahudin et al. (2004) are not shown as they could not be confirmed in the present study.
378 Pilar Hernandez et al.
This approach is a powerful means of producing a high-resolution
draft of the gene space for the complex genome of
bread wheat, including its highly rearranged chromosome
4A. The approach has important implications for the
whole-genome analysis of both bread wheat and other
large genomes of agriculturally important grasses such as
rye (Secale cereale), fescue (Festuca ssp.) and ryegrass
(Lolium ssp.).
RESULTS
Preparation of chromosomal DNA and shotgun sequencing
Two separate DNA bulks were prepared from the mitotically
dividing cells of the double ditelosomic 4A stock. The 4AS
preparation contained approximately 78 000 flow-sorted
telosomes, and the 4AL one contained approximately
50 000. The level of purity of these preparations, as estimated
by fluorescent in situ hybridization, was 86.9% and
89.0%, respectively. Chromosome 1D comprised approximately
50% of the contaminants in the 4AL preparation, but
no single chromosome predominated in the 4AS preparation.
The 4AS bulk yielded 29.5 ng DNA, which was
amplified in three independent multiple displacement
amplification reactions to generate 16 µg DNA; similarly, the
4AL bulk produced 44.6 ng DNA, which was amplified in
four reactions to yield 22.7 µg DNA. The individual multiple
displacement amplification reactions for each template were
combined to reduce the probability of bias introduced by
multiple displacement amplification itself. The amplified
DNA was used for 454 shotgun sequencing (Table 1), which
produced 2 181 649 4AS reads of mean length 324 bp,
representing a total of 707 Mb of sequence (NCBI sequence
read archive, http://www.ncbi.nlm.nih.gov/sra, reference
SRA038898.1). Given the estimated length of this arm
(317 Mbp; Safar et al., 2010), this is equivalent to a
sequencing depth of approximately 2.2-fold. For chromosome
arm 4AL, the 2 987 571 reads (mean length 302 bp) yielded
901 Mb of sequence (NCBI sequence read archive, reference
SRA034928.1), equivalent to a sequencing depth of approximately
1.7-fold. Sequencing details are summarized in
Table 1.
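The depth values follow from dividing the sequence yield by the template length. The minimal Python check below takes the base-pair totals from Table 1. It assumes, by inference rather than from an explicit statement in the text, that 4AL is the 856 Mb chromosome minus the 317 Mb estimated for 4AS.

```python
def sequencing_depth(total_bp: float, target_bp: float) -> float:
    """Mean fold coverage = sequenced bases / length of the sequenced template."""
    return total_bp / target_bp


CHR_4A_MB = 856                       # chromosome 4A, Mb/1C (from the text)
ARM_4AS_MB = 317                      # 4AS estimate (Safar et al., 2010)
ARM_4AL_MB = CHR_4A_MB - ARM_4AS_MB   # assumption: 4AL length inferred by subtraction

depth_4as = sequencing_depth(707_234_947, ARM_4AS_MB * 1e6)  # raw 4AS bases, Table 1
depth_4al = sequencing_depth(901_236_013, ARM_4AL_MB * 1e6)  # raw 4AL bases, Table 1
print(f"4AS: {depth_4as:.1f}-fold, 4AL: {depth_4al:.1f}-fold")
```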
Identification of syntenic regions in related grass genomes
The full genome sequences of Brachypodium, rice and
sorghum (International Rice Genome Sequencing Project
2005, Paterson et al., 2009; International Brachypodium
Initiative 2010) were used to identify regions of synteny in
order to take advantage of the GenomeZipper approach
(Mayer et al., 2011). The 4AS and 4AL sequences were
compared by BLAST analysis against the genomic
sequences of Brachypodium, rice and sorghum, as well as
against the virtual barley genome (Mayer et al., 2011), to
identify syntenic regions (Figures 2 and 3). The 4AL comparison
highlighted regions on Brachypodium chromosomes 1 and 4, rice chromosomes 3, 6, 11 and 12,
and sorghum chromosomes 1 and 10 (Figure 3). Likewise, regions syntenic to 4AS
were identified on Brachypodium chromosome 1, rice chromosome 3 and sorghum chromosome 1.
Comparison with the barley chromosome reference produced hits on chromosomes 4H, 5H and 7H (Figure 2).
The 4AS sequences identified part of chromosome arm 4HL, while the 4AL sequences matched the entire 4HS arm
and also identified regions on 5HL and 7HS and a small region of 4HL (Figure 2). The gene content in these regions
was collated, and the syntenic boundaries were located with high precision (Tables 4 and 5). The resulting data were then
used to generate a GenomeZipper-based alignment and a high-resolution genetic map of chromosome 4A.
Table 1 Shotgun sequences of wheat chromosome arms 4AS and 4AL
Parameter | 4AS raw data | 4AS repeat-masked and filteredᵃ | 4AL raw data | 4AL repeat-masked and filteredᵃ
Number of sequences | 2 181 649 | 420 739 | 2 987 571 | 752 981
Number of base pairs | 707 234 947 | 146 653 961 | 901 236 013 | 239 649 872
Minimal length (bp) | 18 | 100 | 29 | 100
Maximal length (bp) | 826 | 826 | 982 | 982
Mean length (bp) | 324 | 349 | 302 | 318
Repeat content (%) | 79.5 | – | 72.8 | –
GC content (%) | 44.7 | 46.4 | 41.4 | 41
ᵃThe filter applied ensured the retention of sequences longer than 100 bp that contained at least 100 bp of non-repetitive sequence.
Figure 2. Comparison of the 4A shotgun sequence with that of barley.
Repeat-masked 4AS and 4AL shotgun sequence reads were compared against the sequence of virtual barley chromosomes (Mayer et al., 2011). Syntenic regions on
chromosomes 4H, 7H and 5H are colored red; non-syntenic regions are colored blue. Centromeres are indicated by black triangles and the arms of the chromosomes
are labeled S and L. Connectors/joins indicate corresponding segments and orientation of the individual segments.
GenomeZipper analysis of wheat chromosome 4A 379
Gene content of chromosome 4A
In order to estimate the number of genes present on each 4A
chromosome arm, TBLASTX comparisons were made with
the Brachypodium, rice and sorghum genome sequences,
based on a stringency level of at least 75% over at least 30
amino acids (Table 2). This exercise produced between 3278
and 3805 hits for 4AS, and between 3956 and 4523 for 4AL.
The numbers of non-redundant matches were 4383 and
5188, respectively, giving a total of 9571 non-redundant
gene matches on 4A. Given the estimated size of 4A of
856 Mb, and assuming a gene density representative of the complete
wheat genome (Qi et al., 2004), this scales up to at least
61 500 genes for the A genome and >180 000 genes for
bread wheat. This result contrasts with recent estimates for
barley (≥32 000 genes; Mayer et al., 2011) and the B genome
of wheat (38 000 genes; Choulet et al., 2010), and with our
own estimate of ≥3000 genes on 4A based on a conservative
synteny-driven integration approach (Table 3).
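The scaling above is a simple proportion. The Python sketch below reproduces it approximately under two assumptions: that the 15.6% share of the A genome quoted in the Discussion applies here, and that the three sub-genomes carry similar numbers of genes. With those assumptions it gives roughly 61 400 genes for the A genome, close to the 61 500 quoted, and roughly 184 000 for bread wheat.

```python
NR_GENES_4AS = 4383           # non-redundant tagged genes, Table 2
NR_GENES_4AL = 5188           # non-redundant tagged genes, Table 2
FRACTION_OF_A_GENOME = 0.156  # 4A share of the A genome (value from the Discussion)

genes_4a = NR_GENES_4AS + NR_GENES_4AL
genes_a_genome = genes_4a / FRACTION_OF_A_GENOME
# Assumption: A, B and D sub-genomes carry similar gene numbers.
genes_hexaploid = 3 * genes_a_genome
print(genes_4a, round(genes_a_genome), round(genes_hexaploid))
```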
Figure 3. Comparison of the 4A shotgun sequence with that of Brachypodium, rice and sorghum.
Repeat-masked 4AS and 4AL shotgun sequence reads were compared with the genome sequences of Brachypodium (Bd), rice (Os) and sorghum (Sb). Syntenic
regions are colored red; non-syntenic regions are colored blue. Centromeres are indicated by black triangles and the arms of the chromosomes are labeled S and L.
Connectors/joins indicate corresponding regions and the orientation of the individual segments.
Table 2 Tagged genes in the reference genomes
Chromosome arm | Brachypodiumᵃ | Riceᵃ | Sorghumᵃ | Non-redundant genes (total)
4AS | 3805 | 3278 | 3365 | 4383
4AL | 4523 | 3956 | 4069 | 5188
ᵃThe number of non-redundant sequence-tagged genes located on chromosome 4A as deduced from similarity comparisons (sequence identity ≥75% over ≥30 amino acids) with each reference genome.
The structure of chromosome 4A
On the basis of synteny with barley, Brachypodium, rice and
sorghum, it was possible to recognize five distinct regions
(A–E) on chromosome 4A. There are 120 (5!) ways in which
five independent segments can be ordered, but as each
segment can be present in one of two possible orientations,
the true number of possible arrangements is 3840 (120 × 2^5).
To resolve the actual ordering, advantage was taken of
published genetic mapping data (Devos et al., 1995; Mayer
et al., 2011; Miftahudin et al., 2004). The 4AS sequences
identified a syntenic region on 4HL, while the 4AL sequences
identified 4HS and a small segment of 4HL (Figures 2 and 3).
Chromosome 4A is known to carry a pericentromeric inversion
(Devos et al., 1995; Miftahudin et al., 2004) involving a
portion of the ancient long arm (4ALanc; segment A) and the
complete ancient short arm (4ASanc; segment B); this converted
4ALanc into the modern 4AS, and 4ASanc into the
distal part of the modern 4AL. In addition, a small region of
4AL (segment C) appears not to have been involved in the
pericentromeric inversion (Figure 2). Consequently, the
gene order in segments A and B was reversed with respect
to barley, but that in segment C was conserved. The segment
D sequences on 4AL show homology with a distal portion of
5HL (Figure 2), consistent with genetic mapping data (Devos
et al., 1995; Mayer et al., 2011; Miftahudin et al., 2004). Finally,
genetic data indicated that a further translocation must
have occurred between a distal segment of chromosome
arm 7BS and 4A (Devos et al., 1995; Mayer et al., 2011;
Mickelson-Young et al., 1995; Miftahudin et al., 2004). The
evolutionary scenario proposed by Devos et al. (1995) and
Miftahudin et al. (2004) allowed the orientation of segments
D and E to be determined. On the basis of meiotic pairing
between the distal segments of 4AS, 4BS and 4DS, Naranjo
et al. (1987) suggested retention of a small segment of
4ASanc in the distal part of modern 4AS; although no
genetic evidence for this was obtained by Devos et al. (1995),
two relevant EST sequences were located within this region
by bin mapping (Miftahudin et al., 2004). A BLASTN comparison
of these two ESTs against the present set of 4AS and
4AL sequences produced either no hits or hits with restricted
sequence similarity and sequence alignment length (data
not shown), so it was not possible to confirm the presence of
this distal 4AS segment.
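The count of 3840 candidate layouts for segments A–E can be confirmed by brute-force enumeration. This Python sketch treats each segment as a token carrying a '+' or '-' orientation. The constraint at the end is purely hypothetical. It only illustrates how external information, such as the genetic mapping data and the inversion scenario used here, prunes the search space.

```python
from itertools import permutations, product
from math import factorial

SEGMENTS = "ABCDE"


def arrangements(segments):
    """Yield every order of the segments combined with every orientation pattern."""
    for order in permutations(segments):
        for strands in product("+-", repeat=len(segments)):
            yield tuple(seg + strand for seg, strand in zip(order, strands))


n_layouts = sum(1 for _ in arrangements(SEGMENTS))
assert n_layouts == factorial(len(SEGMENTS)) * 2 ** len(SEGMENTS)  # 120 * 32
print(n_layouts)  # 3840

# Hypothetical constraint, for illustration only: fixing segment A as the first,
# forward-oriented block leaves 4! * 2**4 = 384 candidates.
constrained = [layout for layout in arrangements(SEGMENTS) if layout[0] == "A+"]
print(len(constrained))  # 384
```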
A virtual map of chromosome 4A
A map of chromosome 4A was assembled using the
GenomeZipper protocol (Mayer et al., 2011) from the
sequence data and synteny-based deductions (Figure 3,
Table 4 and Table S1). The ordering of segments A–E was
determined based on the marker map of barley (Close et al.,
2009). The number of markers involved in this process ranged
from 16 (segment C) to 127 (segment A) (Table 3). The
4AS arm is associated with 127 markers, and the 4AL arm
with 233. Between 79 genes (segment C) and 1182 genes
(segment A) were thus assigned to each of the segments,
resulting in the placement of approximately 3000 genes over
the whole chromosome (Table 3). The five segments varied
considerably in the extent of synteny with the other grass
genomes, and overall just 29% of the genes were conserved
across wheat and all three sequenced genomes (Figure 4).
Almost half of the genes in segment A were present in the
expected location in all three reference genomes, compared
to only approximately one-eighth in segment B. When the
criterion for support was reduced to just one of the three
heterologous genomes, the frequency of conservation
across the whole chromosome with Brachypodium was
22.3%, that with rice was 22.8%, but that with sorghum was
only 7.7%. At the level of the individual segments, the frequency
of conservation varied by as much as threefold, with
no evidence that the segments resulting from the two known
translocation events (segments D and E) showed a lower
level of conservation. The clearly unequal level of conservation
across the various grass lineages underlines the
value of using more than one reference genome when
attempting synteny-based deduction of gene order.
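Figure 4 summarizes this support as overlaps between gene sets. The exact counting rules behind the percentages are not spelled out here. The Python below is therefore only one plausible way to compute such fractions, and it uses a hypothetical input format of gene-ID sets per reference genome.

```python
def conservation_summary(wheat_genes, references):
    """Fractions of a segment's wheat genes supported by syntenic reference hits.

    wheat_genes: set of wheat gene IDs anchored in one segment.
    references:  dict mapping a reference name to the set of those wheat gene
                 IDs that have a syntenic match in that reference (hypothetical format).
    Returns the share supported by all references, by each reference, and by
    each reference alone.
    """
    total = len(wheat_genes)
    summary = {"all": len(wheat_genes.intersection(*references.values())) / total}
    for name, hits in references.items():
        others = set().union(*(h for other, h in references.items() if other != name))
        supported = wheat_genes & hits
        summary[name] = len(supported) / total
        summary[f"only_{name}"] = len(supported - others) / total
    return summary
```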
Table 3 Wheat chromosome 4A GenomeZipper statistics
Parameter | A | B | C | D | E | 4AS | 4AL | 4A
Number of markers | 127 | 107 | 16 | 46 | 64 | 127 | 233 | 360
Number of markers with associated gene from reference genome(s) | 92 | 52 | 8 | 27 | 27 | 92 | 114 | 206
Number of matched barley fl-cDNAs | 600 | 393 | 29 | 123 | 114 | 600 | 659 | 1256
Number of non-redundant sequence reads | 10 628 | 18 415 | 969 | 5556 | 5126 | 10 628 | 26 584 | 37 212
Number of non-redundant ESTs | 935 | 700 | 60 | 217 | 204 | 935 | 1068 | 1996
Number of Brachypodium genes | 905 | 606 | 49 | 157 | 151 | 905 | 963 | 1865
Number of rice genes | 763 | 649 | 31 | 190 | 124 | 763 | 994 | 1754
Number of sorghum genes | 785 | 248 | 32 | 110 | 103 | 785 | 493 | 1278
Number of anchored gene loci | 1182 | 1110 | 79 | 300 | 262 | 1182 | 1751 | 2933
Overview of non-redundant data points anchored along chromosome 4A. Columns A–E refer to the chromosome 4A segments, which together form the chromosomal zippers for 4AS and 4AL, as well as for the whole chromosome 4A. Values in each column are non-redundant counts for each category. Because of small overlaps in segment and arm assignment, a few cases cannot be assigned unambiguously, so the non-redundant sums do not always equal the sum of the individual values.
Translocation and inversion breakpoints
Alignment of the 4A sequence against that of the barley,
Brachypodium, rice and sorghum genomes allowed precise
localization of the breakpoints associated with the various
rearrangements that have determined the structure of
chromosome 4A (Figures 2 and 3, and Tables 4 and 5).
Sequence comparison against the virtual barley genome
as well as against the reference genomes of Brachypodium,
rice and sorghum allows precise delineation of the regions
where the rearrangements occurred, with almost single-gene
resolution (Figures 2
and 3, and Tables 4 and 5). We analyzed the corresponding
regions in the reference genomes for syntenic intervals and
syntenic borders. Based on gene detection by sequence
comparisons of 4AS and 4AL, reads bordering syntenic
regions could be accurately identified. The syntenic regions range
in size from 370 kb to 28 Mb and contain between 31 and 905 genes
(Table 4). The exact positioning of syntenic borders also
allows definition of the bordering genes for the individual
segments and their orthologous counterparts in the
reference genomes (Table 5). The approach based on
flow sorting-facilitated isolation of chromosome arms,
high-throughput sequencing and comparative genome
analysis is thus capable of reconstructing genome structure and
identifying evolutionary translocation breakpoints.
Table 4 Regions in Brachypodium, rice and sorghum sharing synteny with 4AS and 4AL, as deduced from in silico mapping
Wheat arm and segment | Reference genome | Chromosome | Start (Mb) | Stop (Mb) | Number of genes
4AS, segment A | Brachypodium | 1 | 60.2 | 71.8 | 905
4AS, segment A | Rice | 3 | 2.6 | 14.2 | 763
4AS, segment A | Sorghum | 1 | 58.8 | 69.7 | 785
4AL, segment B | Brachypodium | 1 | 6.5 | 11.5 | 240
4AL, segment B | Brachypodium | 4 | 7.9 | 10.3 | 75
4AL, segment B | Brachypodium | 4 | 22.5 | 31.8 | 291
4AL, segment B | Rice | 3 | 23.3 | 30.8 | 233
4AL, segment B | Rice | 11 | 2.3 | 11.9 | 201
4AL, segment B | Rice | 11 | 14.1 | 30.7 | 215
4AL, segment B | Sorghum | 1 | 7.5 | 13.9 | 248
4AL, segment C | Brachypodium | 1 | 71.5 | 72.6 | 49
4AL, segment C | Rice | 3 | 1.9 | 2.7 | 31
4AL, segment C | Sorghum | 1 | 70.0 | 71.1 | 32
4AL, segment D | Brachypodium | 1 | 0.3 | 1.8 | 157
4AL, segment D | Rice | 3 | 35.3 | 37.3 | 123
4AL, segment D | Rice | 12 | 0.1 | 2.1 | 67
4AL, segment D | Sorghum | 1 | 0.1 | 1.9 | 110
4AL, segment E | Brachypodium | 1 | 48.1 | 50.5 | 151
4AL, segment E | Rice | 6 | 0.1 | 2.6 | 124
4AL, segment E | Sorghum | 10 | 0.6 | 2.3 | 103
Based on detected syntenic segments, the table gives the chromosome, start and stop coordinates on the respective reference genome and the number of genes located in these regions.
Figure 4. Conservation of synteny between chromosome 4A and rice, Brachypodium and sorghum.
The gene content of segments A–E of chromosome 4A was compared with that of the homologous regions in Brachypodium (Bd), rice (Os) and sorghum (Sb). The
Venn diagrams show the numbers of genes shared between wheat and the reference genomes.
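Breakpoint localization rests on finding where contiguous runs of reference hits end. The simplified Python sketch below assumes a hypothetical input of reference hits already listed in wheat gene order, and it uses an arbitrary 5 Mb gap cut-off. The actual analysis also considered gene order and orientation across several reference genomes, which this sketch ignores.

```python
from typing import List, Tuple

Hit = Tuple[str, float]  # (reference chromosome, position in Mb) for one wheat gene


def syntenic_blocks(hits: List[Hit], max_gap_mb: float = 5.0):
    """Split reference hits, listed in wheat gene order, into contiguous blocks.

    A new block starts whenever the reference chromosome changes or the reference
    position jumps by more than max_gap_mb (an arbitrary, hypothetical cut-off).
    Returns (chromosome, start_Mb, stop_Mb, n_genes) per block. The junctions
    between consecutive blocks are candidate rearrangement breakpoints.
    """
    blocks = []
    for chrom, pos in hits:
        current = blocks[-1] if blocks else None
        if current and current["chrom"] == chrom and abs(pos - current["last"]) <= max_gap_mb:
            current["positions"].append(pos)
        else:
            current = {"chrom": chrom, "positions": [pos]}
            blocks.append(current)
        current["last"] = pos
    return [(b["chrom"], min(b["positions"]), max(b["positions"]), len(b["positions"]))
            for b in blocks]
```

Consider hypothetical hits falling inside the segment B intervals listed for Brachypodium: two on chromosome 1, then two at 7.9–9.0 Mb and two at 22.5–23.1 Mb on chromosome 4. For this input the function returns three blocks, mirroring the three Brachypodium regions given for segment B.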
DISCUSSION
The structure of chromosome 4A
The large size and polyploidy of the bread wheat genome
pose a considerable challenge for its sequencing. Current
sequencing technology has the ability to acquire large
amounts of sequence economically, but sequence assembly,
and, most importantly, establishment of the gene order
within each of the 21 chromosomes are particularly difficult,
mostly because of the presence of homoeologous copies of
most genes and the extent of repetitive DNA present. Fortunately,
much synteny has been retained among Poaceae
species in general, and among the Triticeae species in particular
(Bolot et al., 2009; Devos and Gale, 2000; Moore,
1995; Salse and Feuillet, 2007). A small number of major
chromosomal rearrangements are known among the Triticeae
(Devos et al., 1995), but collinearity has largely been
retained across the wheat and barley genomes, despite
their divergence approximately 12 Myr ago (Gaut, 2002). Within
wheat itself, chromosome 4A has undergone the most extensive
overall rearrangement (Figures 1 and 2) (Devos et al., 1995;
Miftahudin et al., 2004), as confirmed by an extensive comparative
study of the gene content of wheat and barley
(Mayer et al., 2011; Qi et al., 2004). Integration of sequence
and mapping data revealed that chromosome 4A
comprises five separate segments. Segments A, B and C
are delimited by the pericentromeric inversion, while segments
D and E arose from later interchanges with chromosomes
5A and 7B. All five segments were successfully
ordered and oriented to allow subsequent GenomeZipper-based
gene integration and positioning.
Identification of translocations and inversions
While assignment of 4AS-derived sequences to corresponding
syntenic segments in the reference genomes and
barley was relatively straightforward, the assignment was
much more complex for 4AL (Figure 3). Integration of the
resulting patterns and comparison with barley chromosomes
4H, 5H and 7H led to identification of five syntenic
segments, A–E. The orientation of segments A, B and C was
evident from the most parsimonious scenario, a
single pericentromeric inversion event. On the other hand,
the positioning and orientation of segments D and E, which
resulted from translocations from chromosomes 5A and 7B,
respectively, could not be deduced from synteny patterns
alone. By combining the synteny patterns with genetic mapping data and the
previously derived order and orientation of segments (Devos
et al., 1995; Miftahudin et al., 2004), all five segments were
ordered and oriented for GenomeZipper-based gene integration
and positioning, and a conclusive linear order
of genes could therefore be established. This demonstrates the power
of integrating genetic data with shotgun sequence data derived from
next-generation sequencing of chromosomes, together with
comparative and bioinformatic analysis.
Gene content of wheat chromosome 4A
A rather stringent comparison between the 4A sequences
and the various annotated reference genomes produced an
estimated gene content on chromosome 4A of >9500 genes, a
number considerably higher than has been suggested for
either barley chromosome 4H (4000) or wheat chromosome
3B (6360) (Mayer et al., 2011; Paux et al., 2008). Other
estimates of gene number based on the analysis of individual
chromosomes have also diverged from those based on
whole-genome analyses (Mayer et al., 2009; Wicker et al.,
2011). One explanation is that individual chromosomes or
chromosome arms are compared with complete reference
genomes, which may result in a higher rate of false-positive
gene identifications due to cross-matching
paralogous sequences. In an analysis of shotgun sequences
from wheat homoeologous group 1 chromosomes, Wicker
et al. (2011) identified a significant number of potential
pseudogenes (similar to, or even exceeding, the number of
functional genes) that shared homology with various known
genes but were not present in the syntenic regions of either
Brachypodium or rice. This underlines the value of the comparative
approach when estimating the number
of genes present on a particular wheat chromosome.
Table 5 Overview of the breakpoints between the five chromosome 4A segments
Segment | Brachypodium | Rice | Sorghum
A | Bradi1g72080.1 | Os03g0187500 | Sb01g044730.1
A | Bradi1g72086.1 | Os03g0187400 | Sb01g044740.1
A | Bradi1g72092.1 | Os03g0187300 | Sb01g044750.1
A | … | … | …
A | Bradi1g65190.1 | Os03g0296700 | Sb01g038210.1
A | Bradi1g65197.1 | Os03g0296600 | Sb01g038220.1
A | Bradi1g65210.1 | Os03g0296400 | Sb01g038230.1
B | Bradi4g26690.1 | Os11g0150450 | Sb01g013770.1
B | Bradi4g26670.3 | Os11g0151600 | Sb01g013780.1
B | Bradi4g26640.1 | Os11g0152700 | Sb01g013830.1
B | … | … | …
B | Bradi1g13777.1 | Os03g0652100 | Sb01g013490.1
B | Bradi1g13850.1 | Os03g0648200 | Sb01g013540.1
B | Bradi1g13870.1 | Os03g0645100 | Sb01g013650.1
C | Bradi1g75740.1 | Os03g0138200 | Sb01g047640.1
C | Bradi1g75720.1 | Os03g0140100 | Sb01g047630.1
C | Bradi1g75707.1 | Os03g0141100 | Sb01g047610.1
C | … | … | …
C | Bradi1g75960.1 | Os03g0147900 | Sb01g047070.1
C | Bradi1g75970.1 | Os03g0147700 | Sb01g047850.1
C | Bradi1g76227.1 | Os03g0136900 | Sb01g047860.1
D | Bradi1g00227.1 | Os03g0861800 | Sb01g000210.1
D | Bradi1g00237.1 | Os03g0860900 | Sb01g000220.1
D | Bradi1g00247.1 | Os03g0860700 | Sb01g000300.1
D | … | … | …
D | Bradi1g02940.1 | Os03g0823800 | Sb01g002280.1
D | Bradi1g02950.1 | Os03g0822100 | Sb01g002300.1
D | Bradi1g02980.1 | Os03g0821633 | Sb01g002410.1
E | Bradi1g49450.1 | Os06g0122200 | Sb10g001470.1
E | Bradi1g49460.1 | Os06g0125000 | Sb10g001520.1
E | Bradi1g49470.1 | Os06g0125300 | Sb10g001530.1
E | … | … | …
E | Bradi1g52060.1 | Os06g0103300 | Sb10g000300.1
E | Bradi1g52090.1 | Os06g0102900 | Sb10g000270.1
E | Bradi1g52110.1 | Os06g0102700 | Sb10g000260.1
Only the first and last three syntenic genes anchored in each segment are shown.
The GenomeZipper method identified 1182 genes on
4AS and 1751 on 4AL, so a total of approximately 3000 genes
supported by synteny was placed on the entire chromosome.
Important in understanding the context of this estimate are
(i) the sequencing depth achieved (2.2-fold for 4AS; 1.7-fold
for 4AL), (ii) the expected gene detection rate (85%, based
on the method described by Lander and Waterman, 1988),
and (iii) that 20–25% of wheat genic sequences have no detectable
close homolog in the reference genomes (Mayer et al., 2011).
Based on these considerations, we estimate the gene content
of chromosome 4A to be approximately 4300 (2933 / 0.85 × 100/80).
Assuming that the gene density on chromosome 4A is representative
of the A genome as a whole, and given that its physical length
is 15.6% of the entire A genome, it follows that
the A genome contains approximately 28 000 genes, a
number largely in line with estimates for both the B genome
(38 000; Choulet et al., 2010) and for barley (32 000; Mayer
et al., 2011). However, due to the series of translocations of
presumably gene-rich telomeric regions that shaped the
modern chromosome 4A, the gene content of chromosome
4A may deviate from that of other less rearranged wheat
chromosomes. Thus chromosomal shotgun sequences for
other chromosomes will be helpful to refine gene estimates
for the individual wheat sub-genomes.
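The text cites Lander and Waterman (1988) for the 85% detection rate without showing the calculation. One plausible reading, assumed here, is the Poisson coverage probability 1 − e^(−c) for depth c, weighted by arm length, again with the 4AL length inferred as 856 − 317 Mb. This gives about 84%, close to the 85% used. The Python sketch then applies the two corrections and the 15.6% scaling as written in the text.

```python
from math import exp


def detection_probability(depth: float) -> float:
    """Lander-Waterman/Poisson: probability that a given base is covered by >= 1 read."""
    return 1.0 - exp(-depth)


# Arm sizes (Mb) and depths from the Results; 4AL size inferred as 856 - 317 (assumption).
arms = {"4AS": (317, 2.2), "4AL": (539, 1.7)}
weighted = sum(size * detection_probability(d) for size, d in arms.values()) / sum(
    size for size, _ in arms.values())

anchored = 2933          # genes placed by the GenomeZipper (Table 3)
detection_rate = 0.85    # value used in the text
homolog_rate = 0.80      # share of wheat genes with a close reference homolog (text: 75-80%)
genes_4a = anchored / detection_rate / homolog_rate
genes_a_genome = genes_4a / 0.156   # 4A taken as 15.6% of the A genome

print(f"length-weighted detection: {weighted:.2f}")
print(f"4A: ~{genes_4a:.0f} genes; A genome: ~{genes_a_genome:.0f} genes")
```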
Limitations in resolution
A high-resolution EST map has been constructed for both
Ae. tauschii, the D genome donor species (Luo et al., 2009),
and chromosome 3B (Paux et al., 2008). At present, only
binned EST markers (Qi et al., 2004) are available for the
other wheat chromosomes. Bin maps lack sufficient resolution
to be used for syntenic integration and genome zipping,
which is why it was necessary here to rely on the barley
genetic map. The validity of this approach depends on the
retention of a high degree of synteny between barley and
wheat; any small-scale rearrangements will not be detected
until a dense marker map of the wheat genome has been
generated. Nevertheless, it was still possible to identify with
high precision the boundaries between the five segments
that arose as a result of the evolutionary inversion and
translocations. Earlier research based on mapping of cDNA
RFLP loci (Devos et al., 1995) and bin mapping (Miftahudin
et al., 2004) suggested the presence of at least two other
segments on chromosome 4A, but we have not been able to
confirm the presence of either of these. The availability of
higher-resolution genetic maps (which are certainly attainable
given the volume of relevant sequence data now
available) will allow these proposed additional structural
rearrangements to be tested. A full comparative
analysis awaits acquisition of genomic sequence
from chromosomes 4B and 4D, and from the translocated
portions of 5A and 7B. These data will enable determination
of the degree of similarity between homoeologs with respect
to gene content and potential loss of genes. Identification of
the translocation breakpoints on chromosomes 7B and 5A
may also allow recognition of molecular signatures and the
molecular environment that marks these translocations.
CONCLUSION
We have demonstrated here that fractionation of the complex wheat genome into single chromosome arms, coupled with the analysis of shotgun sequences using GenomeZipper, provides a successful strategy for constructing a high-resolution gene-based chromosome map. The acquisition of a complete ordered gene map, and ultimately of the genome sequence itself, requires the development of a reliable physical map, the construction of which is presently being coordinated by the International Wheat Genome Sequencing Consortium. A physical map, together with chromosome survey sequences, offers an ideal means of performing a detailed analysis of chromosomal rearrangements. Further developments in sequencing efficiency should also provide opportunities to improve both chromosome coverage and gene detection rate. This will eventually make it possible to recover complete genomic gene territories and thus to study gene structure and associated non-transcribed elements such as cis elements.
EXPERIMENTAL PROCEDURES
Plant material
The 4A double di-telosomic stock of bread wheat cv. Chinese Spring is a stable line in which chromosome 4A is represented by a pair of telosomes, one of which is the short arm (4AS) and the other the long arm (4AL) (Sears and Sears, 1978). Grain of this stock was kindly provided by Dr Bikram Gill (Department of Plant Pathology, Kansas State University, Manhattan, KS).
Chromosome sorting and DNA amplification
Liquid suspensions of mitotic chromosomes were prepared from seedling root tips as described by Vrana et al. (2000). Telosomes were isolated and sorted using a FACSVantage SE flow cytometer (Becton Dickinson) into 40 µl sterile deionized water. The level of purity of the sorted material was determined using fluorescence in situ hybridization based on a probe containing either the telomeric repeat Afa or [GAA]n, as described by Kubalakova et al. (2003). The flow-sorted chromosomes were treated with proteinase, and DNA was then extracted using a Microcon YM-100 column (Millipore, http://www.millipore.com/), as described by Simkova et al. (2008). Chromosomal DNA was amplified by multiple displacement amplification using an Illustra GenomiPhi V2 DNA amplification kit (GE Healthcare, http://www.gehealthcare.com), and a Roche shotgun library (http://www.roche.com) was then created for each chromosome arm based on 5 µg of multiple displacement-amplified DNA.
© 2011 The Authors. The Plant Journal © 2011 Blackwell Publishing Ltd, The Plant Journal (2012), 69, 377–386
DNA sequencing and analysis
Sequencing of the 4AS and 4AL libraries was performed at the Lifesequencing S.L. facilities in Valencia (Spain) (http://www.lifesequencing.com/) on a Genome Sequencer FLX instrument (Roche), using 454 Life Sciences Titanium chemistry (Roche). Three full sequencing runs were performed for the 4AL library and two for the 4AS library. Repetitive DNA was masked using VMATCH software (http://www.vmatch.de/), with the MIPS-REDAT POACEAE version 8.6.2 repeat library as a reference (http://mips.helmholtz-muenchen.de/plant/genomes.jsp). The following parameters were applied: 70% identity cut-off, 100 bp minimal length, seed length 14, exdrop 5, e-value 0.001. To estimate the number of genes present, the repeat-filtered sequence reads were compared by TBLASTX against the coding sequences of Brachypodium (ftp://ftpmips.helmholtz-muenchen.de/plants/brachypodium/v1.2), rice (RAP-DB genome build 4, http://rapdb.dna.affrc.go.jp) and sorghum (version 1.4, http://genome.jgi-psf.org/Sorbi1/Sorbi1.download.ftp.html).
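The masking thresholds listed above can be illustrated with a small filter. This is a hypothetical sketch, not Vmatch itself: it assumes repeat alignments have already been computed and are available as simple records (the `RepeatHit` class and its fields are invented for illustration), keeps only hits that meet the 70% identity, 100 bp and e-value 0.001 cut-offs, and replaces the covered bases with `N` before the reads would be passed on to TBLASTX. Seed length and exdrop govern the alignment search itself and have no counterpart here.

```python
from dataclasses import dataclass

# Cut-offs quoted in the text; seed length (14) and exdrop (5) act inside
# the Vmatch search and are not reproduced in this sketch.
MIN_IDENTITY = 70.0   # percent
MIN_LENGTH = 100      # bp
MAX_EVALUE = 1e-3


@dataclass
class RepeatHit:
    """Hypothetical record: one alignment of a read to a repeat-library entry."""
    start: int        # 0-based, inclusive
    end: int          # 0-based, exclusive
    identity: float   # percent identity
    evalue: float


def passes(hit: RepeatHit) -> bool:
    """True if the hit meets all three masking cut-offs."""
    return (hit.identity >= MIN_IDENTITY
            and hit.end - hit.start >= MIN_LENGTH
            and hit.evalue <= MAX_EVALUE)


def mask_read(seq: str, hits: list[RepeatHit], mask_char: str = "N") -> str:
    """Replace every base covered by a qualifying repeat hit with mask_char."""
    bases = list(seq)
    for hit in hits:
        if passes(hit):
            for i in range(max(hit.start, 0), min(hit.end, len(bases))):
                bases[i] = mask_char
    return "".join(bases)


read = "ACGT" * 60                          # 240-bp toy read
hits = [RepeatHit(0, 120, 92.5, 1e-40),     # long and similar: masked
        RepeatHit(150, 200, 88.0, 1e-10)]   # only 50 bp long: left unmasked
masked = mask_read(read, hits)
print(masked.count("N"), "of", len(masked), "bases masked")
```

Hard masking with `N` is one common choice; soft masking (lower-casing) would let a downstream aligner decide how to treat repeat-derived bases instead.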
GenomeZipper analysis
The GenomeZipper workflow described by Mayer et al. (2011) was used, with some adjustments. Comparison and integration of the shotgun sequence into a linear gene order reference were achieved by exploiting synteny with barley, Brachypodium, rice and sorghum. The 4A segments were delineated by a BLASTN comparison of the shotgun sequence data with that of barley artificial chromosomes (Mayer et al., 2011). Only hits showing at least 85% identity and a minimum alignment of 100 bp were considered. BLASTX was used to identify homologs in the reference genomes, applying a criterion of >70% similarity and a minimum length of 30 amino acids. To position and orient genes, a selection of genes present in both the five 4A segments and the relevant syntenic regions of the other grass genome(s) was aligned using a marker-based map of barley chromosomes 4H, 5H and 7H.
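The two BLAST filters and the ordering step can be sketched as follows. The sketch assumes BLAST+ tabular output requested with `-outfmt "6 std ppos"`, so that the BLASTX similarity criterion can be read from the `ppos` (percentage of positive-scoring positions) column; whether the original pipeline measured similarity this way is an assumption, and plain `-outfmt 6` output lacks that column. `order_segment` is a deliberately simplified, hypothetical stand-in for the zipping step: it orders the genes of one 4A segment by the cM position of their anchor on the syntenic barley chromosome and reverses the order for an inverted segment, leaving out the gene-order information that GenomeZipper also draws from Brachypodium, rice and sorghum.

```python
import csv
import io

# BLAST+ tabular columns for -outfmt "6 std ppos": the 12 standard fields
# followed by the percentage of positive-scoring positions.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore", "ppos"]
FLOAT_FIELDS = ("pident", "evalue", "bitscore", "ppos")


def read_hits(handle):
    """Yield one dict per tab-separated BLAST hit, with numeric fields converted."""
    for row in csv.reader(handle, delimiter="\t"):
        hit = dict(zip(COLUMNS, row))
        for key in FLOAT_FIELDS:
            if key in hit:
                hit[key] = float(hit[key])
        hit["length"] = int(hit["length"])
        yield hit


def blastn_ok(hit):
    """Segment delineation: at least 85% identity over at least 100 bp."""
    return hit["pident"] >= 85.0 and hit["length"] >= 100


def blastx_ok(hit):
    """Homolog detection: more than 70% similarity over at least 30 amino acids."""
    return hit["ppos"] > 70.0 and hit["length"] >= 30


def order_segment(genes, barley_cm, inverted=False):
    """Order one segment's genes by barley map position (cM); unanchored genes are dropped."""
    anchored = [g for g in genes if g in barley_cm]
    return sorted(anchored, key=lambda g: barley_cm[g], reverse=inverted)


# Toy BLASTX output: read1 passes, read2 fails on similarity
sample = (
    "read1\tBradi1g00100\t80.0\t45\t9\t0\t1\t135\t10\t54\t1e-12\t90.1\t86.7\n"
    "read2\tBradi1g00200\t60.0\t40\t16\t0\t1\t120\t5\t44\t1e-03\t40.2\t65.0\n"
)
homologs = [h["qseqid"] for h in read_hits(io.StringIO(sample)) if blastx_ok(h)]
print(homologs)

barley_cm = {"g1": 45.2, "g2": 3.0, "g3": 17.8}   # hypothetical anchor positions
print(order_segment(["g1", "g2", "g3", "g4"], barley_cm))
print(order_segment(["g1", "g2", "g3", "g4"], barley_cm, inverted=True))
```

Keeping each criterion in its own small predicate makes it easy to check that the thresholds match those stated in the text, and to adjust them independently.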
ACKNOWLEDGEMENTS
We warmly acknowledge the help provided by Jarmila Cihalikova, Romana Sperkova and Zdenka Dubska in chromosome sorting, as well as the helpful comments of two anonymous reviewers. This research was financially supported by the Spanish Ministry of Science and Innovation (grant numbers BIO2009–07443, BIO2011–15237 and AGL2010–17316), the German Ministry of Education and Research GABI Barlex project, the European Commission FP7-212019 Triticeae Genome grant, the Czech Science Foundation (awards 521/08/1629 and P501/10/1740), and the Czech Republic Ministry of Education, Youth and Sports/European Regional Development Fund (Operational Programme Research and Development for Innovations grant number CZ.1.05/2.1.00/01.0007).
SUPPORTING INFORMATION
Additional Supporting Information may be found in the online version of this article:
Table S1. GenomeZipper analysis of wheat chromosome 4A.
REFERENCES
Araki, E., Miura, H. and Sawada, S. (1999) Identification of genetic loci affecting amylose content and agronomic traits on chromosome 4A of wheat. Theor. Appl. Genet. 98, 977–984.
Bai, G.H., Chen, C.X. and Cai, S.B. (2008) A major QTL controlling seed dormancy and pre-harvest sprouting resistance on chromosome 4A in a Chinese wheat landrace. Mol. Breeding, 21, 351–358.
Berkman, P.J., Skarshewski, A., Lorenc, M.T. et al. (2011) Sequencing and assembly of low copy and genic regions of isolated Triticum aestivum chromosome arm 7DS. Plant Biotechnol. J. 9, 768–775.
Bolot, S., Abrouk, M., Masood-Quraishi, U., Stein, N., Messing, J., Feuillet, C. and Salse, J. (2009) The ‘inner circle’ of the cereal genomes. Curr. Opin. Plant Biol. 12, 119–125.
Borner, A., Schumann, E., Furste, A., Coster, H., Leithold, B., Roder, M.S. and Weber, W.E. (2002) Mapping of quantitative trait loci determining agronomic important characters in hexaploid wheat (Triticum aestivum L.). Theor. Appl. Genet. 105, 921–936.
Chen, X.M., Line, R.F. and Jones, S.S. (1995) Chromosomal location of genes for resistance to Puccinia striiformis in winter-wheat cultivars Heines-VII, Clement, Moro, Tyee, Ikes, and Daws. Phytopathology, 85, 1362–1367.
Chen, X.M., Luo, Y.H., Xia, X.C., Xia, L.Q., Chen, X., Ren, Z.L., He, Z.H. and Jia, J.Z. (2005) Chromosomal location of powdery mildew resistance gene Pm16 in wheat using SSR marker analysis. Plant Breeding, 124, 225–228.
Choulet, F., Wicker, T., Rustenholz, C. et al. (2010) Megabase level sequencing reveals contrasted organization and evolution patterns of the wheat gene and transposable element spaces. Plant Cell, 22, 1686–1701.
Close, T.J., Bhat, P.R., Lonardi, S. et al. (2009) Development and implementation of high-throughput SNP genotyping in barley. BMC Genomics, 10, 582.
Devos, K. and Gale, M. (2000) Genome relationships: the grass model in current research. Plant Cell, 12, 637–646.
Devos, K.M., Dubcovsky, J., Dvorak, J., Chinoy, C.N. and Gale, M.D. (1995) Structural evolution of wheat chromosomes 4A, 5A, and 7B and its impact on recombination. Theor. Appl. Genet. 91, 282–288.
Dolezel, J., Kubalakova, M., Paux, E., Bartos, J. and Feuillet, C. (2007) Chromosome-based genomics in the cereals. Chromosome Res. 15, 51–66.
Dvorak, J. and Zhang, H.B. (1990) Variation in repeated nucleotide sequences sheds light on the phylogeny of the wheat B and G genomes. Proc. Natl Acad. Sci. USA, 87, 9640–9644.
Dvorak, J., Terlizzi, P., Zhang, H.B. and Resta, P. (1993) The evolution of polyploid wheats: identification of the A genome donor species. Genome, 36, 21–31.
Effertz, R.J., Anderson, J.A. and Francl, L.J. (2001) Restriction fragment length polymorphism mapping of resistance to two races of Pyrenophora tritici-repentis in adult and seedling wheat. Phytopathology, 91, 572–578.
Feuillet, C., Travella, S., Stein, N., Albar, L., Nublat, A. and Keller, B. (2003) Map-based isolation of the leaf rust disease resistance gene Lr10 from the hexaploid wheat (Triticum aestivum L.) genome. Proc. Natl Acad. Sci. USA, 100, 15253–15258.
Flavell, R.B. (1986) Repetitive DNA and chromosome evolution in plants. Philos. Trans. R. Soc. Lond. B Biol. Sci. 312, 227–242.
Gaut, B.S. (2002) Evolutionary dynamics of grass genomes. New Phytol. 154, 15–28.
International Brachypodium Initiative (2010) Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature, 463, 763–768.
International Rice Genome Sequencing Project (2005) The map-based sequence of the rice genome. Nature, 436, 793–800.
Keller, M., Karutz, C., Schmid, J.E., Stamp, P., Winzeler, M., Keller, B. and Messmer, M.M. (1999) Quantitative trait loci for lodging resistance in a segregating wheat × spelt population. Theor. Appl. Genet. 98, 1171–1182.
Kilian, B., Ozkan, H., Deusch, O., Effgen, S., Brandolini, A., Kohl, J., Martin, W. and Salamini, F. (2007) Independent wheat B and G genome origins in outcrossing Aegilops progenitor haplotypes. Mol. Biol. Evol. 24, 217–227.
Kubalakova, M., Vrana, J., Cihalikova, J., Simkova, H. and Dolezel, J. (2002) Flow karyotyping and chromosome sorting in bread wheat (Triticum aestivum L.). Theor. Appl. Genet. 104, 1362–1372.
Kubalakova, M., Valarik, M., Bartos, J., Vrana, J., Cihalikova, J., Molnar-Lang, M. and Dolezel, J. (2003) Analysis and sorting of rye (Secale cereale L.) chromosomes using flow cytometry. Genome, 46, 893–905.
Lander, E.S. and Waterman, M.S. (1988) Genomic mapping by fingerprinting random clones: a mathematical analysis. Genomics, 2, 231–239.
Luo, M.C., Deal, K.R., Akhunov, E.D. et al. (2009) Genome comparisons reveal a dominant mechanism of chromosome number reduction in grasses and accelerated genome evolution in Triticeae. Proc. Natl Acad. Sci. USA, 106, 15780–15785.
Mayer, K.F., Taudien, S., Martis, M. et al. (2009) Gene content and virtual gene order of barley chromosome 1H. Plant Physiol. 151, 496–505.
Mayer, K.F., Martis, M., Hedley, P.E. et al. (2011) Unlocking the barley genome by chromosomal and comparative genomics. Plant Cell, 23, 1249–1263.
McCartney, C.A., Somers, D.J., Humphreys, D.G., Lukow, O., Ames, N., Noll, J., Cloutier, S. and McCallum, B.D. (2005) Mapping quantitative trait loci controlling agronomic traits in the spring wheat cross RL4452 × ‘AC Domain’. Genome, 48, 870–883.
McFadden, E. and Sears, E. (1946) The origin of Triticum spelta and its free-threshing hexaploid relatives. J. Hered. 37, 107–116.
Mickelson-Young, L., Endo, T.R. and Gill, B.S. (1995) A cytogenetic ladder-map of the wheat homoeologous group-4 chromosomes. Theor. Appl. Genet. 90, 1007–1011.
Miftahudin, R.K., Ma, X.F., Mahmood, A.A. et al. (2004) Analysis of expressed sequence tag loci on wheat chromosome group 4. Genetics, 168, 651–663.
Moore, G. (1995) Cereal genome evolution – pastoral pursuits with lego genomes. Curr. Opin. Genet. Dev. 5, 717–724.
Naranjo, T., Roca, A., Goicoechea, P.G. and Giraldez, R. (1987) Arm homoeology of wheat and rye chromosomes. Genome, 29, 873–882.
Nga, N.T.T., Hau, V.T.B. and Tosa, Y. (2009) Identification of genes for resistance to a Digitaria isolate of Magnaporthe grisea in common wheat cultivars. Genome, 52, 801–809.
Paterson, A.H., Bowers, J.E., Bruggmann, R. et al. (2009) The Sorghum bicolor genome and the diversification of grasses. Nature, 457, 551–556.
Paull, J.G., Chalmers, K.J., Karakousis, A., Kretschmer, J.M., Manning, S. and Langridge, P. (1998) Genetic diversity in Australian wheat varieties and breeding material based on RFLP data. Theor. Appl. Genet. 96, 435–446.
Paux, E., Sourdille, P., Salse, J. et al. (2008) A physical map of the 1-gigabase bread wheat chromosome 3B. Science, 322, 101–104.
Qi, L.L., Echalier, B., Chao, S. et al. (2004) A chromosome bin map of 16,000 expressed sequence tag loci and distribution of genes among the three genomes of polyploid wheat. Genetics, 168, 701–712.
Safar, J., Simkova, H., Kubalakova, M., Cihalikova, J., Suchankova, P., Bartos, J. and Dolezel, J. (2010) Development of chromosome-specific BAC resources for genomics of bread wheat. Cytogenet. Genome Res. 129, 211–223.
Salse, J. and Feuillet, C. (2007) Comparative genomics of cereals. In Genomics-Assisted Crop Improvement (Varshney, R.K. and Tuberosa, R., eds). New York: Springer, pp. 177–205.
Sarkar, P. and Stebbins, G.L. (1956) Morphological evidence concerning the origin of the B genome in wheat. Am. J. Bot. 43, 297–304.
Sears, E.R. and Sears, L.M.S. (1978) The telocentric chromosomes of common wheat. In Proceedings of the 5th International Wheat Genetics Symposium (Ramanujam, S., ed.). New Delhi: Indian Soc. Genet. Plant Breed., pp. 389–407.
Simkova, H., Svensson, J.T., Condamine, P., Hribova, E., Suchankova, P., Bhat, P.R., Bartos, J., Safar, J., Close, T.J. and Dolezel, J. (2008) Coupling amplified DNA from flow-sorted chromosomes to high-density SNP mapping in barley. BMC Genomics, 9, 294.
Sourdille, P., Cadalen, T., Gay, G., Gill, B. and Bernard, M. (2002) Molecular and physical mapping of genes affecting awning in wheat. Plant Breeding, 121, 320–324.
Talbert, L.E., Bruckner, P.L., Smith, L.Y., Sears, R. and Martin, T.J. (1996) Development of PCR markers linked to resistance to wheat streak mosaic virus in wheat. Theor. Appl. Genet. 93, 463–467.
Vrana, J., Kubalakova, M., Simkova, H., Cihalikova, J., Lysak, M.A. and Dolezel, J. (2000) Flow sorting of mitotic chromosomes in common wheat (Triticum aestivum L.). Genetics, 156, 2033–2041.
Wang, G.Z., Miyashita, N.T. and Tsunewaki, K. (1997) Plasmon analyses of Triticum (wheat) and Aegilops: PCR-single-strand conformational polymorphism (PCR-SSCP) analyses of organellar DNAs. Proc. Natl Acad. Sci. USA, 94, 14570–14577.
Wicker, T., Mayer, K.F., Gundlach, H. et al. (2011) Frequent gene movement and pseudogene evolution is common to the large and complex genomes of wheat, barley, and their relatives. Plant Cell, 23, 1706–1718.