Using next-generation sequencing for molecular reconstruction of past Arctic vegetation and climate
TECHNICAL ADVANCES
J. H. SØNSTEBØ,* L. GIELLY,† A. K. BRYSTING,‡ R. ELVEN,* M. EDWARDS,§ J. HAILE,¶**
E. WILLERSLEV,¶ E. COISSAC,† D. RIOUX,† J. SANNIER,* P. TABERLET† and C. BROCHMANN*
*National Centre for Biosystematics, Natural History Museum, University of Oslo, P.O.Box 1172 Blindern, NO-0318 Oslo,
Norway, †Laboratoire d’Ecologie Alpine, UMR UJF-CNRS 5553, Universite Joseph Fourier, 38041 Grenoble, France, ‡Centre for
Ecological and Evolutionary Synthesis, Department of Biology, University of Oslo, P.O.Box 1066 Blindern, NO-0316 Oslo,
Norway, §School of Geography, University of Southampton, Room 2019 Shackleton Building, Highfield, Southampton, SO17 1BJ,
UK, ¶Centre for Ancient Genetics, University of Copenhagen, Universitetsparken 15, DK-2100, Denmark, **Department of
Zoology, University of Oxford, South Parks Road, Oxford OX1 3PS, UK
Abstract
Palaeoenvironments and former climates are typically inferred from pollen and macrofossil
records. This approach is time-consuming and suffers from low taxonomic resolution and
biased taxon sampling. Here, we test an alternative DNA-based approach utilizing the P6
loop in the chloroplast trnL (UAA) intron, a short (13–158 bp) and variable region with highly
conserved flanking sequences. For taxonomic reference, a whole trnL intron sequence database
was constructed from recently collected material of 842 species, representing all widespread
and/or ecologically important taxa of the species-poor arctic flora. The P6 loop alone
allowed identification of all families, most genera (>75%) and one-third of the species, thus
providing much higher taxonomic resolution than pollen records. The suitability of the P6
loop for analysis of samples containing degraded ancient DNA from a mixture of species is
demonstrated by high-throughput parallel pyrosequencing of permafrost-preserved DNA
and reconstruction of two plant communities from the last glacial period. Our approach
opens new possibilities for DNA-based assessment of ancient as well as modern biodiversity
of many groups of organisms using environmental samples.
Keywords: ancient DNA, Arctic Flora, chloroplast, climate change, DNA barcoding sensu lato,
environmental sample
Received 16 September 2009; revision received 14 December 2009, 27 January 2010; accepted 4 February 2010
Introduction
Many species may face extinction because of the current
climate warming (Thomas et al. 2004). Bioclimatic models
can be used to estimate relationships between species
distribution and climate and to predict future range
shifts and extinction risks (Araujo & Rahbek 2006). How-
ever, improved knowledge of historical species distribu-
tions and climates is needed for backward testing of
models before applying them to future climate scenarios
(Guisan & Thuiller 2005).
Past plant communities are traditionally inferred from identification of fossil pollen or macrofossils, but the taxonomic sampling and resolution are limited, particularly in the Arctic (Lamb & Edwards 1988). An alternative approach is molecular community analysis, which utilizes information from ancient DNA accumulated in soil (Willerslev et al. 2003). In this approach, DNA traces in environmental samples can be identified to species, genus or family level using diagnostic variation in a suitable DNA region, which may be termed ‘DNA barcoding sensu lato’ (Valentini et al. 2009). This approach has, for example, been
used to identify boreal tree species from silty ice at the
bottom of the Greenland Ice Sheet, showing that southern
Greenland was covered with forest between 450 000 and
800 000 years ago (Willerslev et al. 2007).
Traditionally, identification of organisms in environmental (e.g. soil) samples containing DNA from a mixture of organisms is time-consuming and expensive because the different amplified DNA fragments have to be cloned into bacteria. Often, thousands of clones have to be Sanger sequenced to detect the diversity of organisms of interest. Recently, however, new sequencing technologies have become available that bypass the cloning step, which makes them ideal for analysis of mixed DNA in environmental samples.
Correspondence: Christian Brochmann, Fax: +47 22 85 18 35;
E-mail: [email protected]
© 2010 Blackwell Publishing Ltd
Molecular Ecology Resources (2010) doi: 10.1111/j.1755-0998.2010.02855.x
Molecular community analysis, especially of ancient
DNA, is not only dependent on a robust system with
primers amplifying over a wide range of taxa but also
on targeting a very short DNA region (<200 bp) that is
sufficiently variable and can be amplified from highly
degraded DNA (Willerslev & Cooper 2005; Taberlet
et al. 2007). Previously, Hajibabaei et al. (2006) found that a short DNA region within the mitochondrial gene cytochrome c oxidase I was effective for identifying species of wasps and moths. In plants, earlier studies of ancient DNA have used parts of the chloroplast rbcL gene (Poinar et al. 1998; Hofreiter et al. 2000; Willerslev et al. 2003), but this marker usually enables identification only to the family level (Willerslev et al. 2003). As
an alternative, Taberlet et al. (2007) suggested using a
part of the trnL (UAA) intron, the P6 loop, to identify
plant species from highly degraded DNA. In their anal-
ysis of modern DNA from 123 arctic species, it was
possible to distinguish close to 50% of the species with
the P6 loop only (Taberlet et al. 2007). This suggests
that the P6 loop may serve as a suitable marker for
reconstructing historical plant communities in species-
poor floras such as in the Arctic, which also provides
unique conditions for ancient DNA preservation in per-
mafrost soils.
Here, we assess the usefulness of the P6 loop for
reconstructing the composition of past arctic plant com-
munities from a mixture of ancient DNA sequences pre-
served in permafrost soils. We expand the taxonomic
reference database of Taberlet et al. (2007) to include all
widespread and/or ecologically important species
occurring in the Arctic (842 species). As our main pur-
pose is to develop a system for identification of species
from permafrost soil samples, the taxonomic resolution
of the P6 loop was compared to the maximum resolu-
tion that can be obtained from pollen analysis. To com-
pare the level of resolution with other DNA regions, the
chloroplast DNA barcoding regions proposed by Chase
et al. (2007) were also sequenced for several genera
which showed low or medium variation in the P6 loop.
We also used a bioinformatic approach to identify alter-
native regions based on all available complete cpDNA
genomes and to evaluate their potential usefulness. The P6 loop approach was then tested by high-throughput pyrosequencing (454 Genome Sequencer FLX, Roche; Margulies et al. 2005) of dated permafrost soil samples.
Material and methods
Construction of the taxonomic reference database
Leaves from herbarium specimens or silica-dried leaf
samples of 856 species deposited at the Natural History
Museum in Oslo were analysed (Appendix S1). These
species represent 84 families and 289 genera following
the taxonomy of the Panarctic Flora checklist (Elven
2008). For most species, two specimens (in some cases
representing two subspecies) were selected from differ-
ent parts of the species distribution area.
Sequencing of the whole trnL intron was conducted in parallel in our laboratories in Grenoble and Oslo, and the results were merged into a single database. DNA extraction and polymerase chain reaction (PCR) of the trnL intron were performed as reported for the arctic plant dataset in Taberlet et al. (2007) with the following modifications: PCR was performed in volumes of 10 µL in Oslo and 25 µL in Grenoble; the concentration of the primers was 0.5 µM instead of 1 µM; and the number of PCR cycles was 30 in Oslo and 35 in Grenoble.
PCR for the other cpDNA regions was as follows:
trnH-psbA spacer – PCR was performed in volumes of 10 µL containing 2.5 mM MgCl2, 160 µM of each dNTP, 0.4 µM of each primer, 0.1 µg/mL bovine serum albumin (BSA; Roche Diagnostic, Basel, Switzerland) and 0.4 U of AmpliTaq DNA Polymerase (Applied Biosystems). The trnH-psbA spacer was amplified with primers trnHGUG and psbA (Sang et al. 1997).
matK, rpoB and rpoC1 – PCR was performed in volumes of 10 µL according to the protocols of the Royal Botanic Gardens, Kew (http://www.kew.org/barcoding/protocols.html). The primers used for amplifying these three markers are also available on this website. The matK region was amplified with the updated forward primer X and the reverse primer 5, and the rpoB and the rpoC1 regions were amplified by the forward primer 1 and the reverse primer 3.
Following PCR amplification, the PCR products were diluted fourfold (in Oslo only) and purified with ExoSAP-IT (USB Corporation) according to the manufacturer’s instructions to remove excess primers and deoxynucleotide triphosphates. Sequencing was performed on both strands as described by Taberlet et al. (2007), but in volumes of 10 µL containing 0.5 µM of the amplification primer and 1 µL of the PCR products (in
Oslo only). For trnH-psbA, matK, rpoB and rpoC1, the sequencing reactions underwent 30 s at 95 °C and 35 cycles of 30 s at 94 °C, 15 s at 50 °C and 4 min at 60 °C. Excess dye terminators were removed by spin-column purification. Sequencing reactions were electrophoresed on an ABI PRISM 3100 Genetic Analyzer (Applied Biosystems) using either 80-cm capillaries and POP-4 polymer for 3 h (Oslo) or 36-cm capillaries and POP-7 polymer for 1 h (Grenoble).
Data analysis
PCR of the P6 loop (primers g and h; Taberlet et al. 2007) and of the whole trnL intron (after adding the c and d primers (Taberlet et al. 2007) to the sequences) was simulated on the whole taxonomic reference database using the electronic PCR software ecoPCR (Taberlet et al. 2007). This was done to identify mismatches in the g and h primer sequences and to determine the taxonomic level to which each sequence could be identified, e.g. whether a sequence represented a single species or a group of species. For each analysed entry, ecoPCR records the number of primer mismatches (a maximum of three mismatches was allowed) and the minimum and maximum lengths of the amplified sequence, and retrieves the associated taxonomic data. We also performed local BLAST searches on the database for each sequence to record the different P6 loop and whole trnL intron haplotypes and the intraspecific variation based on the two (or in a few cases three or four) sequences from each species (Appendix S1).
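The following is a minimal sketch of what an electronic PCR of this kind involves: scanning a reference sequence for forward and reverse primer sites that each carry at most a set number of mismatches, and returning the region between them. It is not ecoPCR's implementation or interface. The function names, the single-strand scan and the 10-bp primers in the example are hypothetical choices made for illustration. Mismatch positions are reported from each primer's 5′ end, so the same output also supports the kind of check on mismatches near the 3′ end that is reported in the Results.

```python
# Toy in-silico PCR: a simplified stand-in for electronic PCR, NOT ecoPCR's code.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatch_positions(primer: str, site: str) -> list:
    """0-based positions (from the primer's 5' end) where primer and site differ."""
    return [i for i, (p, s) in enumerate(zip(primer, site)) if p != s]


def in_silico_pcr(template, fwd, rev, max_mismatch=3, min_len=10, max_len=200):
    """Return (amplicon, fwd_mismatches, rev_mismatches) for the first hit, or None.

    - fwd is matched directly on the template; rev (written 5'->3') is matched
      through its reverse complement, so only the given strand is scanned.
    - The amplicon excludes both primer sites.
    - Mismatch positions are counted from each primer's own 5' end, so
      position len(primer) - 1 is the 3'-terminal base.
    """
    rev_site = revcomp(rev)
    n_f, n_r = len(fwd), len(rev_site)
    for i in range(len(template) - n_f + 1):
        fwd_mm = mismatch_positions(fwd, template[i:i + n_f])
        if len(fwd_mm) > max_mismatch:
            continue
        start = i + n_f
        last_j = min(start + max_len, len(template) - n_r)
        for j in range(start + min_len, last_j + 1):
            site_mm = mismatch_positions(rev_site, template[j:j + n_r])
            if len(site_mm) <= max_mismatch:
                # Index k on the reverse-complemented site is index n_r-1-k on rev.
                rev_mm = sorted(n_r - 1 - k for k in site_mm)
                return template[start:j], fwd_mm, rev_mm
    return None


def min_distance_from_3prime(positions, primer_len):
    """Distance (bp) from the 3' end to the closest mismatch; None if there is none."""
    return min(primer_len - 1 - p for p in positions) if positions else None


# Hypothetical 10-bp primers and a synthetic template (not real trnL data);
# the forward site carries one mismatch at primer position 2.
FWD, REV = "ACGGTCATGC", "TTGCAGGACT"
template = "CCCC" + "ACTGTCATGC" + "TTTTTAAAAAGGGGG" + revcomp(REV) + "CCCC"
print(in_silico_pcr(template, FWD, REV, max_mismatch=1))
# ('TTTTTAAAAAGGGGG', [2], [])
```

Returning only the shortest amplicon for the first forward hit keeps the sketch short. A complete tool would report every hit on both strands, together with the taxonomy of each reference entry.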
Variation in the DNA barcoding regions matK, rpoB, rpoC1 and trnH-psbA was compared with the variation in the P6 loop and the whole trnL intron by counting the number of indels (indels covering several characters were scored as one) and substitutions, and calculating the percentage variability. The distribution of the variation
along the matK region and the trnL intron was visualized
using the FINGERPRINT software (Lou & Golding 2007; see
Fig. 2).
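A minimal sketch of the variation counts described above follows. It assumes that the sequences of one region have already been aligned, with '-' marking gaps. The authors' exact scoring is not specified beyond the rule that a multi-character indel counts once, so the definition of percentage variability used here, (substitution sites + indel events) / alignment length, is an assumption. The per-column state count corresponds to the quantity shaded in a FINGERPRINT-style plot such as Fig. 2, where one state means invariable and four states means maximally variable.

```python
def column_states(alignment):
    """Number of distinct nucleotide states per column, ignoring gaps ('-')."""
    return [len({seq[i] for seq in alignment} - {"-"}) for i in range(len(alignment[0]))]


def gap_runs(seq):
    """Set of (start, end) half-open intervals of consecutive gaps in one sequence."""
    runs, start = set(), None
    for i, c in enumerate(seq + "X"):  # non-gap sentinel closes a trailing run
        if c == "-" and start is None:
            start = i
        elif c != "-" and start is not None:
            runs.add((start, i))
            start = None
    return runs


def count_variation(alignment):
    """Return (substitution_sites, indel_events, percent_variability).

    Assumptions (not the authors' exact scoring): a substitution site is a
    column with more than one base; each maximal gap run counts as ONE indel,
    and identical runs in several sequences are counted once; percent
    variability = (substitution sites + indel events) / alignment length * 100.
    """
    length = len(alignment[0])
    subs = sum(1 for n in column_states(alignment) if n > 1)
    indels = set()
    for seq in alignment:
        indels |= gap_runs(seq)
    return subs, len(indels), 100.0 * (subs + len(indels)) / length


aln = ["ACGT--ACGT", "ACCTAAACGT", "ACGTAAACGA"]  # invented toy alignment
print(column_states(aln))    # [1, 1, 2, 1, 1, 1, 1, 1, 1, 2]
print(count_variation(aln))  # (2, 1, 30.0)
```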
Tracking other potentially suitable regions based on all available complete cpDNA genomes
We used a bioinformatic approach to identify other potentially suitable regions for analysis of degraded DNA, taking advantage of the 110 complete cpDNA genomes of Streptophyta available at the time. A potentially suitable region was defined as a short and variable mid-region flanked by two conserved regions that serve as the target sequences of the amplification primers. To find all potentially suitable regions, we first identified all conserved regions of at least 18 bp among all the cpDNA genomes. We then paired these regions to generate the set of all possible mid-regions shorter than 200 bp. Each of these mid-regions (potential barcodes sensu lato) was evaluated for its coverage (proportion of species potentially amplified by the primer pair) and its specificity (proportion of species unambiguously identified among the species potentially amplified).
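The search described above can be sketched in two steps. The first pairs conserved blocks into candidate primer-site pairs that enclose a short mid-region; the second scores each candidate by coverage and specificity as defined in the text. This is a simplified illustration, not the software actually used. It assumes that the conserved blocks have already been located on a common coordinate system, and it counts a taxon as identified only if its predicted mid-region is shared with no other taxon. The block coordinates and taxa in the example are invented.

```python
from collections import Counter


def candidate_regions(conserved_blocks, max_mid=200):
    """Pair conserved blocks into (left, right) primer-site pairs.

    conserved_blocks: sorted, non-overlapping (start, end) half-open intervals
    on a common coordinate system (an assumption of this sketch).
    Keeps pairs whose intervening mid-region is 1..max_mid-1 bp long.
    """
    pairs = []
    for a, (s1, e1) in enumerate(conserved_blocks):
        for s2, e2 in conserved_blocks[a + 1:]:
            mid = s2 - e1
            if mid >= max_mid:
                break  # blocks are sorted, so later blocks are even farther away
            if mid > 0:
                pairs.append(((s1, e1), (s2, e2)))
    return pairs


def coverage_specificity(amplicons, n_taxa):
    """amplicons maps taxon -> predicted mid-region sequence, or None if the
    primer pair is not expected to amplify that taxon.

    coverage    = amplified taxa / all taxa
    specificity = amplified taxa whose mid-region no other taxon shares / amplified taxa
    """
    amplified = {t: s for t, s in amplicons.items() if s is not None}
    if not amplified:
        return 0.0, 0.0
    seq_counts = Counter(amplified.values())
    unique = sum(1 for s in amplified.values() if seq_counts[s] == 1)
    return len(amplified) / n_taxa, unique / len(amplified)


print(candidate_regions([(0, 18), (50, 68), (300, 318)]))
# [((0, 18), (50, 68))]
toy = {"sp1": "AAT", "sp2": "AAT", "sp3": "GGC", "sp4": None, "sp5": "TTA"}
print(coverage_specificity(toy, n_taxa=5))  # (0.8, 0.5)
```

For scale, primer pair 1 in Table 1 amplifies 100 of the 110 taxa (coverage 100/110 ≈ 0.91) and identifies 72 of them (specificity 72/100 = 0.72).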
Pollen resolution
The maximum potential taxonomic resolution that can
be obtained from pollen data was assessed using two
palynological keys (Fægri & Iversen 1964; Moore et al.
1991).
Permafrost analysis
Two permafrost samples were collected from an exposure of late-Quaternary, ice-rich, silty sediments on the Mayn River, Southern Chukotka, Russia (64°17′N; 171°15′E; A. Sher & E. Willerslev, unpublished). The samples were radiocarbon dated at the Oxford University Accelerator Unit. Total DNA was extracted from approximately 6 g of sediment from each permafrost sample using a combination of the Sergey Bulat protocol (Bulat et al. 2000) and the Cambio PowerMax Soil kit protocol, which employs a silica clean-up method. The P6 loop was amplified from the DNA extracts in a 50-µL volume containing 2 U AmpliTaq Gold DNA Polymerase (Applied Biosystems, Foster City, CA), 10 mM Tris-HCl, 50 mM KCl, 2 mM MgCl2, 0.2 mM of each dNTP, 0.2 µM of each of primers g and h (Taberlet et al. 2007), 0.008 mg of BSA and 1 µL of DNA extract as template. PCR cycling conditions were as follows: 10 min at 95 °C (enzyme activation, manufacturer’s conditions), 15 cycles of 30 s at 95 °C and 30 s at 50 °C, followed by 40 cycles of 30 s at 95 °C and 30 s at 60 °C. Elongation steps were omitted because of the short length of the target sequences. DNA was amplified with ‘fusion’ primers: primers g and h were modified by the addition of a specific 6-bp tag and the sequencing primer at the 5′ end (Jarvie 2007) to allow parallel sequencing of multiple homologous samples (Valentini et al. 2009). The sequences obtained by pyrosequencing were subsequently compared with the P6 loop database using the software ecoPCR (Taberlet et al. 2007), and each sequence was assigned to the arctic taxa that matched it exactly in length and sequence.
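The sample-assignment and taxon-matching steps can be illustrated with a short sketch. It assumes a simplified read layout: tag, then forward primer, then the P6 loop, then the reverse-primer site, with sequencing adaptors already removed. It accepts only exact matches and uses hypothetical tag and primer sequences. Real pyrosequencing reads may also require handling of sequencing errors and of reads in the reverse orientation, and this sketch omits both. Percentages are computed over the sequences that could be assigned, which is one possible reading of the "% of sequences" reported in Table 2.

```python
from collections import Counter


def demultiplex(reads, tags, fwd, rev_site):
    """Group P6-loop inserts by sample.

    Assumed read layout (hypothetical; adaptors already trimmed):
        <tag><forward primer><insert><reverse-primer site>...
    tags maps tag sequence -> sample name. Reads with an unknown tag, a
    missing forward primer or no reverse-primer site are discarded.
    """
    tag_len = len(next(iter(tags)))
    by_sample = {name: [] for name in tags.values()}
    for read in reads:
        sample = tags.get(read[:tag_len])
        body = read[tag_len:]
        if sample is None or not body.startswith(fwd):
            continue
        body = body[len(fwd):]
        end = body.find(rev_site)
        if end >= 0:
            by_sample[sample].append(body[:end])
    return by_sample


def taxon_frequencies(inserts, reference):
    """Percent of assigned sequences per molecular taxonomic unit.

    reference maps an exact P6-loop sequence -> unit label; species sharing a
    sequence share one label. Inserts without an identical (same length,
    same sequence) reference entry are ignored.
    """
    counts = Counter(reference[s] for s in inserts if s in reference)
    total = sum(counts.values())
    return {unit: round(100 * n / total, 2) for unit, n in counts.items()}


tags = {"ACGTAC": "sample_A", "TGCATG": "sample_B"}  # hypothetical 6-bp tags
fwd, rev_site = "GGGCAA", "TTGCCC"                   # hypothetical primer sites
reads = [
    "ACGTAC" + fwd + "ATATAT" + rev_site,
    "ACGTAC" + fwd + "ATATAT" + rev_site,
    "ACGTAC" + fwd + "CGCG" + rev_site,
    "ACGTAC" + fwd + "GGGG" + rev_site,  # no reference match
    "TGCATG" + fwd + "CGCG" + rev_site,
    "NNNNNN" + fwd + "CGCG" + rev_site,  # unknown tag
]
reference = {"ATATAT": "Taxon X", "CGCG": "Taxon Y"}
samples = demultiplex(reads, tags, fwd, rev_site)
print(taxon_frequencies(samples["sample_A"], reference))
# {'Taxon X': 66.67, 'Taxon Y': 33.33}
```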
Results
Resolution of the P6 loop and the whole trnL intron
A total of 842 species was successfully sequenced for the
whole trnL intron. Two or more samples were sequenced
for each of 709 of these species (Appendix S1). Fourteen
species that were attempted, but not successfully
sequenced, are also listed in Appendix S1. The ecoPCR
analysis showed that the P6 loop varied in length from
13 bp to 158 bp (excluding primer target sites) and that
the primers g and h were generally well conserved, with
a maximum of two and three mismatches, respectively.
Ninety-eight per cent of the primer target sites exhibited a maximum of one mismatch. The mismatches were not located close to the 3′ end of the primers (Appendix S2).
The ecoPCR analysis further showed that a total of
33.5% of the species and 77.1% of the genera could be
identified by the P6 loop. Among the identified species,
10.3% were represented by a single species in each family
(Fig. 1a). Based on the whole trnL intron, 59.5% of the
species and 98.5% of the genera could be identified
(Fig. 1b). All families were unambiguously identified
with both the P6 loop and the whole trnL intron. The
species- and genus-level resolution of the P6 loop varied among families (Fig. 1a).
Only a few species could be identified in the Salicaceae and Papaveraceae. The P6 loop resolution was also low in the Poaceae and Asteraceae, with only about 10% of the species and 20–40% of the genera identified. In other important arctic families, such as the Cyperaceae, Brassicaceae, Caryophyllaceae, Ranunculaceae and Saxifragaceae, the species resolution was 25–50%, but in these families all genera could be unambiguously identified (Fig. 1a).
Fig. 1 Power of the P6 loop (a), the trnL intron (b) and pollen (c) for discrimination between genera and species in plant families with more than two genera or five species included in the taxonomic reference database. Black and grey histograms show the percentages of discriminated species and genera, respectively. Numbers in parentheses refer to the number of species and the number of genera analysed.
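The identification rates above rest on a simple criterion: a taxon counts as identified at a given rank when none of its sequences is shared with another taxon of that rank. The following is a minimal sketch of that calculation. The exact criterion applied by ecoPCR is assumed here rather than documented, and the sequences and taxon names are invented.

```python
from collections import defaultdict


def resolution(records, rank):
    """Fraction of taxa at `rank` that are unambiguously identified.

    records: iterable of (sequence, lineage) pairs, where lineage is a dict
    such as {'species': ..., 'genus': ...}. A taxon counts as identified
    only if none of its sequences is shared with another taxon at that rank
    (a simplifying assumption of this sketch).
    """
    taxa_by_seq = defaultdict(set)
    for seq, lineage in records:
        taxa_by_seq[seq].add(lineage[rank])
    all_taxa = set().union(*taxa_by_seq.values())
    ambiguous = set()
    for taxa in taxa_by_seq.values():
        if len(taxa) > 1:
            ambiguous |= taxa
    return (len(all_taxa) - len(ambiguous)) / len(all_taxa)


records = [  # invented toy reference data
    ("AAAT", {"species": "A_1", "genus": "A"}),
    ("AAAT", {"species": "A_2", "genus": "A"}),
    ("CCGT", {"species": "B_1", "genus": "B"}),
    ("GGCA", {"species": "B_2", "genus": "B"}),
    ("GGCA", {"species": "C_1", "genus": "C"}),
]
print(resolution(records, "species"))  # 0.2
print(resolution(records, "genus"))    # 0.3333333333333333
```

In the toy data, both species of genus A share one sequence, so neither species is resolved although the genus is. This mirrors the pattern, reported above, of genera that are identified without their species.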
In several genera, for which the P6 loop provided
overall low species resolution, we could discriminate
among intrageneric groupings of species. In Ranunculus,
for example, the 13 species represented six identifiable
taxonomic sections, and in Pedicularis, five species
belonged to the P. sudetica aggregate that could be sepa-
rated from all other species (Appendix S1).
The intraspecific variation identified by the P6 loop
was generally low. Variation was found in 80 of the 709
species that were represented by two or more accessions.
This was usually length variation in poly(A)/poly(T) regions (32 species), variation at single-nucleotide sites (28 species) or indels of variable length (20 species) (Appendix S1). The intraspecific variation did not usually affect species resolution, except that length variation in poly(A)/poly(T) regions was sometimes not species-specific (particularly in Vaccinium).
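Once two accessions are aligned, the three categories of intraspecific variation can be separated automatically. The heuristic below is a sketch, not the procedure used to compile Appendix S1. It treats a gap run made only of A or only of T that extends an adjacent run of the same base as poly(A)/poly(T) length variation, counts any other gap run as one indel, and counts each mismatched column as a substitution. The example sequences are invented.

```python
from collections import Counter


def classify_differences(a, b):
    """Classify differences between two aligned accessions ('-' = gap).

    Heuristic categories (assumptions, not the authors' scoring):
      'homopolymer'  - gap run of only A or only T bases that continues a run
                       of the same base in the ungapped accession
      'indel'        - any other gap run (counted once per run)
      'substitution' - two different bases in one column
    """
    counts = Counter()
    i, n = 0, len(a)
    while i < n:
        x, y = a[i], b[i]
        if x == y:
            i += 1
        elif x != "-" and y != "-":
            counts["substitution"] += 1
            i += 1
        else:
            # Exactly one accession has a gap here; walk to the end of the run.
            gapped, other = (a, b) if x == "-" else (b, a)
            j = i
            while j < n and gapped[j] == "-" and other[j] != "-":
                j += 1
            run = other[i:j]
            neighbours = (other[i - 1] if i > 0 else "") + (other[j] if j < n else "")
            if len(set(run)) == 1 and run[0] in "AT" and run[0] in neighbours:
                counts["homopolymer"] += 1
            else:
                counts["indel"] += 1
            i = j
    return counts


a = "CGAAAA--TGCATCGGACT"
b = "CGAAAAAATGCGTCGG---"
print(dict(sorted(classify_differences(a, b).items())))
# {'homopolymer': 1, 'indel': 1, 'substitution': 1}
```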
Resolution of other molecular markers sequenced
The other molecular markers matK, trnH-psbA, rpoC1 and rpoB (Chase et al. 2007) were sequenced for 24 (FN668453–FN668476), 22 (FN668431–FN668452), 33 (FN668504–FN668536) and 29 (FN668477–FN668503) species, respectively, belonging to six different genera (Appendix S3). Amplification was unsuccessful in one genus each for matK, rpoB and trnH-psbA (Appendix S3). As observed for the whole trnL intron and the P6 loop, the level of resolution of these markers varied considerably among genera. matK and trnH-psbA were the most variable, with variation similar to or higher than that of the whole trnL intron (Appendix S3). When sequences from all genera were combined, the variation appeared to be evenly distributed throughout each region, except in the trnL intron, where clusters of conserved characters were found around the variable P6 loop (illustrated for matK vs trnL in Fig. 2).
Other potentially suitable cpDNA regions identified using a bioinformatics approach
Using a bioinformatics approach, the quality of a barcode region can be estimated by measuring its coverage, i.e. the ability of the primer pair to amplify a large number of taxa, and its specificity, i.e. its capacity to discriminate between taxa (Ficetola GF, Coissac E, Zundel S, Riaz T, Bessiere J, Taberlet P, Pompanon F unpublished). These indices allow the relative utility of different regions as barcodes for degraded DNA to be evaluated. Following these criteria, Table 1 lists the 37 potentially most suitable regions identified from 110 complete cpDNA genomes of Streptophyta. Three of them (nos 1, 3 and 16) were the most promising based on their combination of high coverage and high specificity. The first was located in the 16S gene and was the most specific (72% unambiguously identified species), but this region was so long (average 166.2 bp) that many species will probably remain undetected in a matrix of highly degraded DNA. The second region, which represented the P6 loop flanked by the g and h primers used in this study, was less specific (61% unambiguously identified species), but it was shorter (average 43.2 bp) and had
better coverage. The third region (no 16) corresponded to the first part of the trnL intron, between primer c and the reverse of primer g. This region had high coverage and was short; however, its specificity was lower than that of the P6 loop. In our arctic trnL intron database, this region identified only 14% of the species and 43% of the genera, compared with 34% and 77%, respectively, with the P6 loop.
Fig. 2 Variation of the two chloroplast regions trnL intron (above) and matK (below) in 25 species. Location of primers g and h flanking the P6 loop is shown for the trnL intron. Each line represents one character (bp), and the colour indicates the level of variation: black lines for invariable characters and white lines for characters with four different states. Different shades of grey indicate intermediate variability. Arrows indicate gaps in the alignment.
Table 1 List of all potentially suitable chloroplast regions (i.e. a short and variable mid-region flanked by two conserved regions that will be the target sequences of the amplification primers) shorter than 200 bp identified among 110 complete chloroplast genomes of Streptophyta using a bioinformatic approach. Primer pairs are not optimized for their thermodynamic properties and are all 18 bp long. Primer lengths are not included in the amplified lengths indicated. Primer pair 1 amplifies a 16S region; primer pair 2 amplifies a part of the intron located in trnI. Primer pair 3 amplifies the P6 loop region and corresponds to the g/h primer pair used in this study. Primer pair 16 amplifies the first part of the trnL intron. Primer pairs 1, 3 and 16 were the most promising based on their combination of high coverage, high specificity and short length.
No  Forward             Reverse             Location*  Ampl. taxa  Coverage  Ident. taxa  Specificity  Min  Max  Average
1   AGCATTAGCTCTCCCTGA  AGCCATACGGCGGTGAAT  104069     100         0.91      72           0.72         165  167  166.2
2   AAAGGATCTTAGAGTGTC  GAACGCAGCATACAACTC  105053     96          0.87      63           0.66         94   148  127.1
3   GAGTCTCTGCACCTATCC  GCAATCCTGAGCCAAATC  49417      102         0.93      62           0.61         16   88   43.2
6   AAAGGATCTTAGAGTGTC  GAACACAGCTTACAACTT  105053     91          0.83      54           0.59         92   148  127.3
7   CGATAACCATCTTTCGGC  TCCAATGCCCCGAAAACC  134817     103         0.93      61           0.59         113  145  138.8
8   ATAACCATCTTTCGGCTA  TCCAATGCCCCGAAAACC  134819     105         0.95      62           0.59         70   143  136.1
11  ACCATCTTTCGGCTAACC  TCCAATGCCCCGAAAACC  134822     104         0.94      61           0.59         67   140  133.1
12  AACCATCTTTCGGCTAAC  TCCAATGCCCCGAAAACC  134821     104         0.94      61           0.59         68   141  134.1
13  GATAACCATCTTTCGGCT  TCCAATGCCCCGAAAACC  134818     103         0.93      60           0.58         112  144  137.8
14  CCATCTTTCGGCTAACCT  TCCAATGCCCCGAAAACC  134823     104         0.94      60           0.58         66   139  132.1
15  CATCTTTCGGCTAACCTA  TCCAATGCCCCGAAAACC  134824     104         0.94      58           0.56         65   138  131.1
16  GGTAGACGCTACGGACTT  ATTTGGCTCAGGATTGCC  49319      108         0.98      60           0.56         75   88   77.2
17  ATCTTTCGGCTAACCTAG  TCCAATGCCCCGAAAACC  134825     105         0.95      58           0.55         64   137  130.1
18  TTATCAACGTCCATGAAT  CGAGTCCGCTTATCTCCA  136357     99          0.90      53           0.53         22   121  77.0
20  TCGGCTAACCTAGCCTCC  TCCAATGCCCCGAAAACC  134830     106         0.96      56           0.53         59   132  125.1
21  TTCGGCTAACCTAGCCTC  TCCAATGCCCCGAAAACC  134829     106         0.96      56           0.53         60   133  126.1
22  TTTCGGCTAACCTAGCCT  TCCAATGCCCCGAAAACC  134828     106         0.96      56           0.53         61   134  127.1
23  AGTTCCTACGGGGTGGAG  TTGGAGTCGGCGGCTCTC  136801     99          0.90      52           0.53         123  144  138.1
24  CTTTCGGCTAACCTAGCC  TCCAATGCCCCGAAAACC  134827     105         0.95      55           0.52         62   135  128.1
26  TCTTTCGGCTAACCTAGC  TCCAATGCCCCGAAAACC  134826     105         0.95      55           0.52         63   136  129.1
27  GCTAACCTAGCCTCCTCC  TCCAATGCCCCGAAAACC  134833     107         0.97      56           0.52         56   129  122.1
28  GGCTAACCTAGCCTCCTC  TCCAATGCCCCGAAAACC  134832     107         0.97      56           0.52         57   130  123.1
29  CGGCTAACCTAGCCTCCT  TCCAATGCCCCGAAAACC  134831     107         0.97      56           0.52         58   131  124.1
30  AGATGAGTGCTCTCCTAT  AGGATGCCTCAGCTGCAT  109125     104         0.94      54           0.52         111  178  158.0
31  AGATGAGTGCTCTCCTAT  TAGGATGCCTCAGCTGCA  109125     104         0.94      54           0.52         112  179  159.0
32  AGATGAGTGCTCTCCTAT  TTAGGATGCCTCAGCTGC  109125     104         0.94      54           0.52         113  180  160.0
33  AGATGAGTGCTCTCCTAT  GATGCCTCAGCTGCATAC  109125     104         0.94      53           0.51         109  176  156.0
34  AGATGAGTGCTCTCCTAT  GGATGCCTCAGCTGCATA  109125     104         0.94      53           0.51         110  177  157.0
35  AGATGAGTGCTCTCCTAT  CCTCAGCTGCATACATCA  109125     103         0.94      52           0.50         110  172  152.5
36  AGATGAGTGCTCTCCTAT  GCCTCAGCTGCATACATC  109125     103         0.94      52           0.50         111  173  153.5
37  TTCTTGGTCTGTGAAGAT  GGATAGATGGGGCGATTC  137669     99          0.90      50           0.50         176  195  186.2
Ampl. taxa, number of amplified taxa; Ident. taxa, number of identified taxa; Min, Max and Average, amplified length (bp).
*Refers to the position in the published tobacco cpDNA sequence (Shinozaki et al. 1986).
Resolution from fossil pollen morphology
Pollen morphology can theoretically distinguish all fami-
lies included in our database, although some of them
(e.g. in the Ericales) are difficult to distinguish based on
fossil pollen because of poor preservation of morpho-
logical characters. Furthermore, the resolution at lower
taxonomic level varies considerably among families
(Fig. 1c). We estimated that the maximum resolution
based on well-preserved fossil pollen was identification
of 46.5% of the genera and only 10.2% of the species. In
the major arctic families Cyperaceae, Poaceae, Salicaceae,
Brassicaceae and Caryophyllaceae, little resolution can be
achieved from fossil pollen, and species-level resolution
is often difficult in the Saxifragaceae as well. However,
some important genera (e.g. Artemisia and Populus) are
identifiable based on pollen, but not on the P6 loop
(Appendix S1).
The permafrost soil samples
The two permafrost samples were dated to 22 960 ± 120 (OxA-15348) and 15 810 ± 75 (OxA-14930) uncalibrated radiocarbon years BP. Pyrosequencing of the two samples yielded haplotypes representing 47 and 18 different molecular taxonomic units, respectively, the former identified from 1983 P6 loop sequences obtained in the 454 run (Table 2). Among the 47 taxonomic units from the older sample, 20 could be identified to species, 19 to species group or genus and eight only to family. Among the 18 units from the younger sample, seven could be identified to species, six to species group or genus and five only to family. The taxonomic composition differed substantially between the two samples. For the older sample, almost half of the sequences obtained from the 454 run were identified as Bistorta vivipara and a quarter of them as Equisetum sp. In the younger sample, sequences of Trollius sp., Festuca sp. and Valeriana sp. occurred with the highest frequencies.
Table 2 Lists of molecular taxonomic units based on DNA from the two Chukotkan sediment samples, showing the molecular taxonomic units identified and their frequency of occurrence (% of sequences obtained in the 454 run)
%
22 960 ± 120 years BP
Bistorta vivipara 47.25
Equisetum arvense / E. fluviatile / E. sylvaticum 24.31
Salix sp. / Chosenia arbutifolia / Populus balsamifera 4.74
Armeria scabra 3.03
Thymus oxyodontus 2.77
Lagotis glauca 2.17
Asteraceae 1* 1.87
Avenella flexuosa 1.77
Aconogonon alaskanum / A. ocreatum / A. tripterospermum 1.36
Rumex sp. 1.31
Packera sp. / Senecio sp. 0.96
Poaceae 1† 0.96
Ranunculus acris / R. subborealis / R. turneri 0.81
Festuca sp. 0.76
Hulteniella integrifolia 0.66
Saxifraga hirculus 0.55
Trientalis europaea 0.45
Asteraceae 2‡ 0.40
Valeriana capitata / V. officinalis agg. 0.35
Myosotis alpestris 0.30
Asteraceae 3§ 0.30
Empetrum sibiricum / E. subholarcticum 0.30
Anthoxanthum nipponicum 0.25
Crepis chrysantha 0.25
Saxifraga bracteata / S. cernua / S. hyperborea / S. radiata / S. rivularis 0.20
Papaver sp. 0.15
Elymus sp. / Leymus sp. 0.15
Trollius sp.¶ 0.15
Koeleria asiatica / Trisetum spicatum 0.15
Pedicularis oederi 0.10
Viola biflora 0.10
Claytonia arctica / C. scammaniana 0.10
Sanguisorba officinalis 0.10
Vaccinium uliginosum 0.10
Calamagrostis sp. 0.10
Potentilla sp. 0.10
Pulsatilla patens 0.05
Beckmannia syzigachne 0.05
Cardamine pratensis 0.05
Trisetum sibiricum 0.05
Vaccinium alaskense / V. myrtillus 0.05
Castilleja elegans / C. hyperborea 0.05
Deschampsia sp. 0.05
Parrya arctica / P. nudicaulis 0.05
Astragalus alpinus / A. umbellatus 0.05
Thalictrum alpinum / T. minus / T. sparsiflorum 0.05
Caltha arctica / C. palustris 0.05
15 810 ± 75 years BP
Trollius sp.¶ 27.87
Festuca sp. 16.34
Valeriana capitata / V. officinalis agg. 14.91
Asteraceae 1* 13.69
Discussion
This study represents the first implementation of P6 loop analysis for palaeoecological reconstruction based on ancient DNA. We have clearly demonstrated the power of this approach. The system is very robust: the highly conserved primer sites permit potential amplification of all gymnosperms and angiosperms in a DNA extract from environmental samples containing a mixture of different species. PCR can be carried out efficiently even with highly degraded DNA, as demonstrated here with ancient DNA preserved in frozen sediments, and by Valentini et al. (2009) for diet analysis based on DNA in animal faeces. The system does not depend on time-consuming cloning and is well suited to next-generation sequencing technology. It thus permits large-scale
and fast analysis. If the goal is to obtain about 2000 reads per sample, about 200 samples can be sequenced in a single run with the technology currently available on the 454 GS FLX (Roche, Basel, Switzerland). Full analysis can be carried out within a few weeks once a taxonomic reference database is available; here, such a database was constructed for the bulk of the arctic flora. For other floras and purposes, the large number of trnL sequences already available in GenBank (about 28 000) should facilitate the development of similar reference databases.
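As a rough check on the throughput statement above, the read budget it implies per run follows directly from the two numbers given in the text. The second relation gives the number of distinct 6-bp tags, which is only an upper bound on the samples such a tag could label. This is simple arithmetic, not an instrument specification:

```latex
\[
200~\text{samples} \times 2000~\frac{\text{reads}}{\text{sample}} \approx 4 \times 10^{5}~\text{reads per run},
\qquad
4^{6} = 4096~\text{distinct 6-bp tag sequences} \gg 200 .
\]
```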
We have demonstrated the potential of the P6 loop for reconstructing past plant communities from DNA preserved in permafrost. Our focus on a single,
relatively species-poor flora enabled resolution of most
genera (77%) and many species (33%) with this very
short DNA region. This would not be possible if all plant
species occurring worldwide were considered. Taberlet
et al. (2007) found that 19% of the species and 41% of the
genera could be identified in an electronic PCR analysis
of more than 11 000 P6 loop sequences from GenBank.
The approach might result in even higher resolution if restricted to smaller areas within the Arctic. For example, 42% of the species and 89% of the genera currently occurring in the isolated arctic archipelago of Svalbard (for information about the Svalbard flora, see http://svalbardflora.net/index.php?id=1) can be identified using the P6 loop alone, although it cannot be excluded that other species occurred in this archipelago in earlier periods.
We have shown that the proportion of genera and spe-
cies identifiable with the P6 loop is considerably higher
than with pollen analyses, based on the maximum poten-
tial resolution inferred from two frequently used palyno-
logical keys. However, in some cases, genera or species
can be identified palynologically but not with the P6
loop. Furthermore, the low pollen production of insect-pollinated plants tends to result in poor representation of such species in the fossil record (although very high pollen counts may increase the diversity recorded in pollen records; see Cwynar (1982) and Anderson et al. (1994)). Prolific
wind-pollinated genera (particularly Betula and Alnus)
are over-represented and further skew pollen assemblage
composition. In addition, wind-dispersed pollen can be
transported over long distances and is difficult to distin-
guish from locally produced pollen. DNA-based flora
reconstruction from soil may thus give a more accurate
picture of the local species assemblage, as do detailed
plant macrofossil studies (Goetcheus & Birks 2001; Kie-
nast et al. 2008).
Among the alternative chloroplast regions we sequenced here (matK, rpoB, rpoC1, trnH-psbA), only matK and trnH-psbA showed variation similar to or greater than that of the whole trnL intron. Difficulties with amplification of the trnH-psbA spacer have previously been reported in some genera (e.g. Lahaye et al. 2008), suggesting that this region is not suitable for sequencing of environmental samples. In coding genes such as matK, variable sites are distributed over the whole region, so conserved sites for internal primers are lacking (see Fig. 2 for a graphical presentation of the variation in matK). Thus, the alternative regions sequenced here were less suitable than the P6 loop for our specific purpose of reconstructing past vegetation from degraded DNA in samples containing a mixture of different species.
Table 2 (continued)
                                                          %
Salix sp. / Chosenia arbutifolia / Populus balsamifera    8.98
Bistorta vivipara                                         4.95
Alchemilla glomerulans                                    4.91
Larix cajanderi / L. dahurica / L. laricina               2.99
Pulsatilla patens                                         1.72
Koeleria asiatica / Trisetum spicatum                     1.48
Hulteniella integrifolia                                  0.88
Asteraceae 3§                                             0.66
Viola biflora                                             0.29
Asteraceae 2‡                                             0.11
Vaccinium alaskense                                       0.11
Gentianopsis barbata / G. detonsa                         0.07
Omalotheca norvegica                                      0.04
*Eurybia sibirica, Solidago multiradiata, S. virgaurea, Symphyotrichum pygmaea.
†Alopecurus magellanicus, Anthoxanthum arcticum, A. monticola, Arctagrostis latifolia, Beckmannia syzigachne, Hyalopoa lanatiflora, Milium effusum, Poa abbreviata, P. arctica, P. eminens, P. glauca, P. hartzii, P. laxa, P. palustris, P. paucispicula, P. pratensis, P. pseudoabbreviata, P. sibirica, Vahlodea atropurpurea.
‡Hieracium alpinum, H. sect. Sylvatica, Lactuca sibirica, Saussurea alpina, S. tilesii, Taraxacum croceum, T. lacerum.
§Achillea alpina, A. millefolium, Arctanthemum arcticum, Artemisia arctica, A. arctisibirica, A. borealis, A. frigida, A. furcata, A. glomerata, A. kruhsiana, A. tilesii, Chrysanthemum mongolicum, Tanacetum bipinnatum.
¶We refer to this haplotype as Trollius sp. because only the European T. europaeus, and none of the Siberian Trollius species, is included in our current database. However, all trnL sequences deposited in GenBank for 12 Trollius species, including T. europaeus and two of the Siberian arctic species, have identical P6 loops.
The suitability of the P6 loop for this purpose was supported by our bioinformatic analysis of all available complete Streptophyta cpDNA genomes. This analysis showed that the P6 loop of the trnL intron is the best candidate region for analysing highly degraded template DNA (no. 3 in Table 1). It is the shortest of the 37 potential cpDNA regions identified, yet it provides high taxonomic resolution compared with the other regions.
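Both the GenBank survey of Taberlet et al. (2007) and the screening of complete cpDNA genomes rest on electronic PCR. This means locating the two primer sites in a template and keeping the stretch between them. The sketch below illustrates the idea for a single template and a single strand. The primer sequences are assumed to be the trnL g and h primers of Taberlet et al. (2007) and should be checked against that paper before any use. The mismatch tolerance, the function names and the toy template are hypothetical. Dedicated ePCR tools also handle degenerate bases, both strands and amplicon-length limits, which this sketch does not.

```python
from typing import Optional

# Assumed trnL g (forward) and h (reverse) primers; verify against Taberlet et al. (2007).
G_PRIMER = "GGGCAATCCTGAGCCAA"
H_PRIMER = "CCATTGAGTCTCTGCACCTATC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_site(template: str, primer: str, max_mm: int) -> int:
    """Start of the first window matching `primer` with <= max_mm mismatches, else -1."""
    k = len(primer)
    for i in range(len(template) - k + 1):
        if sum(a != b for a, b in zip(template[i:i + k], primer)) <= max_mm:
            return i
    return -1


def e_pcr(template: str, fwd: str = G_PRIMER, rev: str = H_PRIMER,
          max_mm: int = 2) -> Optional[str]:
    """Return the insert between the primer sites (primers excluded), or None.

    Only the given strand is searched: the forward primer must match as written
    and the reverse primer as its reverse complement further downstream.
    """
    template = template.upper()
    start = find_site(template, fwd, max_mm)
    if start < 0:
        return None
    insert_start = start + len(fwd)
    offset = find_site(template[insert_start:], revcomp(rev), max_mm)
    if offset < 0:
        return None
    return template[insert_start:insert_start + offset]


# Toy template: flanks + g site + invented 10-bp insert + reverse-complemented h site.
toy = "TTT" + G_PRIMER + "ACGTACGTAA" + revcomp(H_PRIMER) + "GGG"
print(e_pcr(toy))  # ACGTACGTAA
```

Allowing a few mismatches per primer approximates the tolerance of a real PCR. Requiring exact matches would instead test only for perfectly conserved primer sites.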
The first analyses of permafrost sediments presented here illustrate the potential of the P6 loop for palaeoecological reconstruction based on degraded DNA. A high proportion of the sequences retrieved could be identified to species, species groups or genera. We could also assess the relative abundances of the different haplotypes from the 454 sequencing, although it is still not clear to what degree these reflect the actual relative abundances of the different taxa in the past vegetation. The older sample we analysed appears to be dominated by Bistorta vivipara and Equisetum sp. (most likely E. arvense in this case), and its species composition indicates a species-rich, mesic meadow/moist heath community. The younger sample appears to be dominated by Trollius sp., Valeriana sp. and grasses, and its species composition indicates a nutrient-rich tall-forb meadow. Such a community would also be expected to be species-rich, and analysis of more samples from the same locality would probably reveal additional species. Thus, these first test results suggest that full analysis of sites in permafrost regions may allow detailed reconstruction of past plant communities and give new insights into how they changed over time.
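The relative abundances reported in Table 2 are essentially the share of sequence reads assigned to each reference haplotype. The sketch below shows that bookkeeping under stated assumptions. Reads are assumed to be already trimmed to the P6 loop, and they are assigned only on an exact match to a reference sequence. The function name, sequences and labels are invented for illustration. A real pipeline would also need to tolerate sequencing errors, notably 454 homopolymer-length errors, and to filter rare artefactual sequences.

```python
from collections import Counter


def haplotype_abundances(reads, ref_seqs):
    """Percentage of assigned reads per taxon label, plus the unassigned count.

    `ref_seqs` maps a trimmed P6 loop sequence to a taxon label; taxa that
    share one sequence are given a single combined label. Reads are assigned
    only on an exact (case-insensitive) match.
    """
    counts = Counter()
    unassigned = 0
    for read in reads:
        label = ref_seqs.get(read.upper())
        if label is None:
            unassigned += 1
        else:
            counts[label] += 1
    total = sum(counts.values())
    if total == 0:
        return {}, unassigned
    # most_common() orders taxa from most to least abundant.
    percents = {label: 100 * n / total for label, n in counts.most_common()}
    return percents, unassigned


# Invented sequences and labels, for illustration only.
ref_seqs = {"AAAC": "Taxon A", "AAGC": "Taxon B / Taxon C", "ATGC": "Taxon D"}
reads = ["AAAC"] * 6 + ["aagc"] * 3 + ["ATGC"] + ["TTTT"] * 2
percents, unassigned = haplotype_abundances(reads, ref_seqs)
print(percents, unassigned)
```

Here percentages are computed over assigned reads only, and unassigned reads are reported separately. Dividing by all reads would instead dilute every taxon by the share of unidentifiable sequences.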
The main constraint of our approach is clearly the limited resolution at the species level. This appears to be unavoidable when working with degraded DNA, particularly in the young arctic ecosystem, which formed only 2–3 million years ago (Brochmann & Brysting 2008). The low sequence variation observed in many arctic plant groups can be explained by recent (Pleistocene) speciation, for example in Papaver, Salix, Draba and Cardamine (reviewed in Brochmann & Brysting 2008). A further complication in analysing conservative molecular markers such as chloroplast DNA in recently originated arctic species is possible incomplete lineage sorting of old polymorphisms, which can be shared among closely related species. The arctic flora is also known for high levels of interspecific hybridization, which may lead to transfer of chloroplasts between species (Brochmann & Brysting 2008). To minimize this problem, we sequenced two geographically widely separated specimens of each species for the taxonomic reference database constructed here. In some cases, however, such processes may still lead to erroneous discrimination among species based on our current database. Despite these inherent limitations associated with the evolutionary history of arctic plants, the Arctic offers unique possibilities for long-term DNA preservation in permafrost and thus ideal conditions for reconstructing historical species assemblages.
The taxonomic resolution of our approach might be improved by a tiered approach (Newmaster et al. 2006). The P6 loop would serve as the first tier. Family- or genus-specific primers would then be designed around short regions in the chloroplast or nuclear genome that have higher power for species separation. The current resolution should nevertheless be sufficient for fairly detailed large-scale reconstruction of past plant communities. Inferences of past climate based on the ecological requirements of the identified species, as estimated from niche modelling of current species distributions, do not require identification of all plant species in past communities.
Two important possibilities need to be explored in further studies. First, it should be possible to apply the same approach to other groups of organisms by designing appropriate short barcode systems and constructing taxonomic reference databases. Second, the possibility of quantitative reconstruction of past communities should be explored by analysing DNA from modern soils relative to the present composition of the vegetation. The accuracy of quantitative reconstruction could possibly be improved by using emulsion PCR, which better preserves the initial template concentrations (Williams et al. 2006), and by increasing the number of sequences obtained from each sample in the 454 analysis.
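The modern-soil calibration proposed above could be carried out in several ways. One simple option is to rank taxa by their share of reads in a soil sample and by their cover in the surrounding vegetation, then compute a rank correlation. Spearman's rho is an assumption made here for illustration, not a method used or prescribed in this study. The values below are invented, and tied values receive average ranks. A rank statistic asks only whether taxa that are more abundant in the vegetation also contribute more reads. It does not require read percentages to equal cover percentages.

```python
from statistics import mean


def average_ranks(values):
    """Ranks 1..n, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 0-based positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks.

    Raises ZeroDivisionError if either input has no variation.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Invented values for five taxa: % of soil DNA reads vs % vegetation cover.
read_share = [40.0, 25.0, 20.0, 10.0, 5.0]
cover = [30.0, 35.0, 15.0, 15.0, 5.0]
print(round(spearman(read_share, cover), 3))  # 0.872
```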
To summarize, we have shown that the P6 loop is promising for improved palaeoecological reconstruction. Our approach may also be useful in other kinds of ecological studies that depend on a short barcoding region, for example studies of species composition in faeces or other animal remains (Valentini et al. 2009). Besides enabling amplification from degraded DNA, short DNA regions can easily be retrieved via ultra-high-throughput sequencing, as demonstrated for the permafrost soil samples analysed here. With this technology, a high number of DNA fragments can be sequenced in parallel without the need for traditional cloning. We conclude that such short DNA barcode systems, together with massively parallel pyrosequencing, open new avenues for DNA-based biodiversity assessment using environmental samples.
Acknowledgements
This study was funded by the European Commission via the ECOCHANGE project (GOCE-2006-036866). We thank Professor Mark Ungerer and three anonymous reviewers for valuable comments that improved the manuscript. We also thank Corinne Cruaud and Patrick Wincker for the 454 sequencing at Genoscope, France, and Virginia Mirre for technical assistance in the Oslo laboratory. The late A.V. Sher identified the Quaternary sequence at Mayn River and established the stratigraphic context for sample collection.
References
Anderson PM, Bartlein PJ, Brubaker LB (1994) Late Quaternary history of tundra vegetation in northwestern Alaska. Quaternary Research, 41, 306–315.
Araujo MB, Rahbek C (2006) How does climate change affect biodiversity? Science, 313, 1396–1397.
Brochmann C, Brysting AK (2008) The Arctic – an evolutionary freezer? Plant Ecology & Diversity, 1, 181–195.
Bulat SA, Lubeck M, Alekhina IA et al. (2000) Identification of a universally primed-PCR-derived sequence-characterized amplified region marker for an antagonistic strain of Clonostachys rosea and development of a strain-specific PCR detection assay. Applied and Environmental Microbiology, 66, 4758–4763.
Chase MW, Cowan RS, Hollingsworth PM et al. (2007) A proposal for a standardised protocol to barcode all land plants. Taxon, 56, 295–299.
Cwynar LC (1982) A late-Quaternary vegetation history from Hanging Lake, northern Yukon. Ecological Monographs, 52, 1–24.
Elven R (2008) Checklist of the Panarctic Flora (PAF) Vascular Plants. http://www.binran.ru/infsys/paflist/index.htm (last accessed 13th April 2010).
Fægri K, Iversen J (1964) Textbook of Pollen Analysis, 2nd edn. Scandinavian University Books, Copenhagen.
Goetcheus VG, Birks HH (2001) Full-glacial upland tundra vegetation preserved under tephra in the Beringia National Park, Seward Peninsula, Alaska. Quaternary Science Reviews, 20, 135–147.
Guisan A, Thuiller W (2005) Predicting species distribution: offering more than simple habitat models. Ecology Letters, 8, 993–1009.
Hajibabaei M, Smith MA, Janzen DH et al. (2006) A minimalist barcode can identify a specimen whose DNA is degraded. Molecular Ecology Notes, 6, 959–964.
Hofreiter M, Poinar HN, Spaulding WG et al. (2000) A molecular analysis of ground sloth diet through the last glaciation. Molecular Ecology, 9, 1975–1984.
Jarvie T (2007) Amplicon sequencing. Genome Sequencer System, Roche Diagnostic, Application note, N5, 1–16.
Kienast F, Tarasov P, Schirrmeister L, Grosse G, Andreev AA (2008) Continental climate in the East Siberian Arctic during the last interglacial: implications from palaeobotanical records. Global and Planetary Change, 60, 535–562.
Lahaye R, Van der Bank M, Bogarin D et al. (2008) DNA barcoding the floras of biodiversity hotspots. Proceedings of the National Academy of Sciences of the United States of America, 105, 2923–2928.
Lamb HF, Edwards ME (1988) The Arctic. In: Vegetation History (eds Huntley B & Webb T III), pp. 519–555. Kluwer Academic Publishers, Dordrecht.
Lou M, Golding GB (2007) FINGERPRINT: visual depiction of variation in multiple sequence alignments. Molecular Ecology Notes, 7, 908–914.
Margulies M, Egholm M, Altman WE et al. (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature, 437, 376–380.
Moore PD, Webb JA, Collinson ME (1991) Pollen Analysis, 2nd edn. Blackwell Scientific Publications, Oxford.
Newmaster SG, Fazekas AJ, Ragupathy S (2006) DNA barcoding in land plants: evaluation of rbcL in a multigene tiered approach. Canadian Journal of Botany-Revue Canadienne De Botanique, 84, 335–341.
Poinar HN, Hofreiter M, Spaulding WG et al. (1998) Molecular coproscopy: dung and diet of the extinct ground sloth Nothrotheriops shastensis. Science, 281, 402–406.
Sang T, Crawford DJ, Stuessy TF (1997) Chloroplast DNA phylogeny, reticulate evolution, and biogeography of Paeonia (Paeoniaceae). American Journal of Botany, 84, 1120–1136.
Shinozaki K, Ohme M, Tanaka M et al. (1986) The complete nucleotide sequence of tobacco chloroplast genome: its gene organization and expression. The EMBO Journal, 5, 2043–2049.
Taberlet P, Coissac E, Pompanon F et al. (2007) Power and limitations of the chloroplast trnL (UAA) intron for plant DNA barcoding. Nucleic Acids Research, 35, e14.
Thomas CD, Cameron A, Green RE et al. (2004) Extinction risk from climate change. Nature, 427, 145–148.
Valentini A, Pompanon F, Taberlet P (2009) DNA barcoding for ecologists. Trends in Ecology & Evolution, 24, 110–117.
Willerslev E, Cooper A (2005) Ancient DNA. Proceedings of the Royal Society B-Biological Sciences, 272, 3–16.
Willerslev E, Hansen AJ, Binladen J et al. (2003) Diverse plant and animal genetic records from Holocene and Pleistocene sediments. Science, 300, 791–795.
Willerslev E, Cappellini E, Boomsma W et al. (2007) Ancient biomolecules from deep ice cores reveal a forested Southern Greenland. Science, 317, 111–114.
Williams R, Peisajovich SG, Miller OJ et al. (2006) Amplification of complex gene libraries by emulsion PCR. Nature Methods, 3, 545–550.
Supporting Information
Additional Supporting Information may be found in the
online version of this article.
Appendix S1. List of species included in the Arctic plant database, with P6 haplotypes (each number refers to a unique haplotype), trnL haplotypes and notes on intraspecific variation in parentheses (number of substitutions, number of indels).
Appendix S2. Variability in g and h primer sites (Taberlet
et al. 2007), with variable characters given.
Appendix S3. Variability in DNA barcoding markers
from Chase et al. (2007) compared with the trnL intron
and the P6 loop.
Please note: Wiley-Blackwell are not responsible for the
content or functionality of any supporting information
supplied by the authors. Any queries (other than missing
material) should be directed to the corresponding author
for the article.