Source verification of mis-identified Arabidopsis thaliana accessions



TECHNICAL ADVANCE

Source verification of mis-identified Arabidopsis thaliana accessions

Alison E. Anastasio 1,†, Alexander Platt 2,†, Matthew Horton 1, Erich Grotewold 3, Randy Scholl 3, Justin O. Borevitz 1, Magnus Nordborg 2,4 and Joy Bergelson 1,*

1 Ecology and Evolution, University of Chicago, Chicago, IL 60637, USA
2 Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
3 Arabidopsis Biological Resource Center, Ohio State University, Columbus, OH 43210, USA
4 Gregor Mendel Institute, 1030 Vienna, Austria

Received 9 March 2011; revised 3 April 2011; accepted 6 April 2011; published online 16 June 2011.
* For correspondence (fax 773-702-9740; e-mail [email protected]).
† These authors contributed equally to this work.

SUMMARY

A major strength of Arabidopsis thaliana as a model lies in the availability of a large number of naturally occurring inbred lines. Recent studies of A. thaliana population structure, using thousands of accessions from stock center and natural collections, have revealed a robust pattern of isolation by distance at several spatial scales, such that genetically identical individuals are generally found close to each other. However, some individual accessions deviate from this pattern. While some of these may be the products of rare long-distance dispersal events, many deviations may be the result of mis-identification, in the sense that the data regarding location of origin are incorrect. Here, we aim to identify such discrepancies. Of the 5965 accessions examined, we conclude that 286 deserve special attention as being potentially mis-identified. We describe these suspicious accessions and their possible origins, and advise caution with regard to their use in experiments in which accurate information on geographic origin is important. Finally, we discuss possibilities for maintaining the integrity of stock lines.

Keywords: Arabidopsis thaliana, contamination, long-distance dispersal, stock center, natural variation, population structure.

The Plant Journal (2011) 67, 554–566, doi: 10.1111/j.1365-313X.2011.04606.x
© 2011 The Authors. The Plant Journal © 2011 Blackwell Publishing Ltd

INTRODUCTION

The Arabidopsis community has been interested in natural variation for decades (reviewed by Alonso-Blanco and Koornneef, 2000; Meyerowitz, 2001). In 1937, Friedrich Laibach began collecting local ecotypes (Laibach, 1943; Meyerowitz, 2001; Koornneef and Meinke, 2010). His personal collection and compilation of natural accessions from other collectors constituted the first standardized set of seed stock used by researchers (Röbbelen, 1965). As the popularity of Arabidopsis increased, researchers focused on a few standard lines (e.g. Ler-0, Col-0), from which mutants, and later recombinant inbred lines, were derived. Researchers also continued to collect new ecotypes from natural populations, notably Laibach in Europe, Ivo Cetl in Moravia, and Albert Kranz in Germany. Several narrowly distributed papers described phenotypic variation in these accessions, but, with the advent of the Arabidopsis Information Service (AIS) in 1964, descriptions of natural accessions and their ecology were shared among a growing community (Cetl, 1965; Cetl et al., 1965; Röbbelen, 1965; Effmertova and Cetl, 1966; Effmertova, 1967). Soon after, the AIS Seed Stock Center was established to house ecotypes and mutant lines (Somerville and Koornneef, 2002). Currently, three stock centers supply Arabidopsis researchers around the world with seeds and DNA for their work: the Arabidopsis Biological Resource Center at Ohio State University, Columbus, OH, USA (http://abrc.osu.edu), the Nottingham Arabidopsis Stock Centre (NASC) at the University of Nottingham, UK (http://arabidopsis.info), and the Biological Resource Center, part of the RIKEN Organization (formerly Sendai Arabidopsis Seed Stock Center), in Japan (http://www.brc.riken.go.jp/lab/epd/Eng/catalog/seed.shtml).

The availability of so many natural accessions means that scientists can exploit natural genetic variability within the species to identify gene functions that mutagenesis fails to uncover (Tonsor et al., 2005), illuminate traits that are important in native habitats and are targets of selection (Mauricio and Rausher, 1997; Stahl et al., 1999; Alonso-Blanco and Koornneef, 2000; Johanson et al., 2000; Hauser et al., 2001; Tian et al., 2002; Mauricio et al., 2003; Goss and Bergelson, 2006; Mitchell-Olds and Schmitt, 2006; McKay et al., 2008), test evolutionary theory (Bergelson et al., 2001; Nordborg et al., 2005; Bakker et al., 2006; Ehrenreich and Purugganan, 2006; Shindo et al., 2007; Novembre and Slatkin, 2009), and understand the extent to which population structure is important in natural populations (Bergelson et al., 1998; Nordborg et al., 2005; Beck et al., 2008; Bomblies et al., 2010; Platt et al., 2010). A. thaliana has a global distribution and is found in a variety of habitats. Considerable variation in ecologically relevant traits has been uncovered using accessions from disparate geographical areas (Aranzana et al., 2005; Banta et al., 2007; Shindo et al., 2007; Bouchabke et al., 2008). This natural variation is especially useful in the study of adaptation. Indeed, conclusions about historical demographic events have been drawn using these accessions and information regarding their specific geographical locations (Nordborg et al., 2005; Beck et al., 2008; Pico et al., 2008). These conclusions rely on the association between genotypic and geographic information being correct.

One way to validate the identity of accessions used for research is to genotype a large collection of individuals, characterize the population structure across the species range, and identify individuals that appear to be out of place. This is especially easy when a species has strong population structure, such as in species that show clear isolation by distance. In this case, misplaced individuals may be the result of either recent long-distance migration or human error (i.e. mislabeling or contamination of stocks). In general, we do not expect outliers to be the result of true long-distance migration. The mere fact that they are identifiable as outliers means that long-distance migration is exceedingly rare; furthermore, we would not expect to find such individuals in studies with limited sample size. Exceptions include situations where migration rates have recently increased (as has clearly happened in humans), or where genotypes with a predisposition for migration are strongly favored by selection. However, outliers can usually be explained by mislabeling or other human error.
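As an illustration of what "clear isolation by distance" means in practice, the following minimal Python sketch correlates pairwise genetic dissimilarity with the logarithm of pairwise geographic distance. The pair values and the names `geo_km`, `dissimilarity` and `pearson` are hypothetical and are not taken from the study. A plain Pearson correlation is used only as a simple stand-in: pairwise comparisons are not independent, so a permutation approach such as a Mantel test would normally be needed to assess significance.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical pairwise comparisons (illustrative values only):
# geographic distance between two accessions and the fraction of SNPs
# at which their genotypes differ.
geo_km = [1.0, 5.0, 50.0, 400.0, 1200.0, 3000.0]
dissimilarity = [0.00, 0.02, 0.10, 0.22, 0.30, 0.35]

# Distances are log-transformed because they span several orders of magnitude.
r = pearson([math.log10(d) for d in geo_km], dissimilarity)
print(f"correlation between log10(distance) and dissimilarity: {r:.2f}")
```

A strongly positive correlation is the pattern against which an individual accession can later be judged to be out of place.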

Using a data set of 139 SNPs in almost 6000 common laboratory strains and natural accessions, Platt et al. (2010) revealed that A. thaliana conforms to a pattern of isolation by distance at several scales, and that the scales differ regionally. Although genetically identical individuals can be found across North America, identical individuals are rarely separated by more than 1 km in continental Eurasia. Given the clear pattern of isolation by distance across large geographical areas, there is an expectation of the degree to which plants within a region are related. In preparing data for our previous paper (Platt et al., 2010), we came across a number of accessions from both our collections and the stock centers that are outliers. Plants that are almost identical to geographically distant plants stand out as potentially mis-identified. In cases where local collections are not especially diverse, a single distantly related outlier (perhaps also found in another location) is also suspicious. Importantly, the inverse is true as well. Plants that are closely related to close neighbors and only distantly related to distant neighbors are almost certainly correctly identified.
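The reasoning in this paragraph can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the procedure used in the study: the toy accessions, the 8-SNP genotype strings, the 50 km neighborhood radius and the 0.8 identity threshold are all hypothetical, as are the function names. An accession is treated as corroborated when a nearby accession is genetically similar, as suspicious when its only similar accessions are far away, and as unresolved otherwise.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def identity(g1, g2):
    """Fraction of SNP positions at which two genotype strings agree."""
    return sum(a == b for a, b in zip(g1, g2)) / len(g1)

def classify(name, accessions, near_km=50.0, min_identity=0.8):
    """Label one accession using hypothetical distance and identity cut-offs."""
    lat, lon, geno = accessions[name]
    corroborated = False
    distant_match = False
    for other, (olat, olon, ogeno) in accessions.items():
        if other == name or identity(geno, ogeno) < min_identity:
            continue
        if haversine_km(lat, lon, olat, olon) <= near_km:
            corroborated = True   # a similar plant grows nearby
        else:
            distant_match = True  # a similar plant grows far away
    if corroborated:
        return "corroborated"
    return "suspicious" if distant_match else "unresolved"

# Hypothetical toy data: name -> (latitude, longitude, genotype at 8 SNPs)
accessions = {
    "fr_a": (45.27, 1.48, "AACCGGTT"),
    "fr_b": (45.30, 1.50, "AACCGGTA"),
    "se_a": (56.10, 13.90, "TTGGCCAA"),
    "se_b": (56.12, 13.95, "TTGGCCAT"),
    "se_x": (56.11, 13.92, "AACCGGTT"),  # recorded in Sweden, genotype matches fr_a
    "it_a": (42.65, 12.77, "ACGTACGT"),  # no similar accession anywhere
}

for name in accessions:
    print(name, classify(name, accessions))
```

In this toy data set, `se_x` has no similar neighbor but is identical to a distant accession, so it is the only one labeled suspicious. `fr_a` also matches `se_x`, yet it is corroborated by its neighbor `fr_b`, which mirrors the point that agreement with close neighbors supports a recorded location.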

RESULTS

Based on pairwise comparisons between individuals, we identified three categories of accessions, ranging from well-corroborated to highly suspicious. Geographic information for individuals on the ‘green list’ was corroborated by neighbors. These accessions are genetically similar to their neighbors, and genetically dissimilar from accessions found farther away. The green list included 70% of accessions (Table S1). Accessions on the ‘red list’ are genetically differentiated from their neighbors, genetically similar to geographically distant individuals, and often reveal other characteristics strongly suggestive of contamination, as discussed below. Just under 5% of the dataset (286 accessions) were included on the red list: all of these are genetically similar to geographically distant accessions and dissimilar to their geographic neighbors (Table 1). The ‘yellow list’ contained the remaining 1472 accessions (almost 25%). These include 131 accessions for which there are insufficient nearby collections for corroboration, and 1341 accessions whose only long-distance, genetically close relatives were on the red list.
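The list sizes quoted above can be cross-checked with a few lines of Python. The totals for the red and yellow lists and the overall count of 5965 accessions come from the text. The green-list count is not stated explicitly and is derived here by subtraction, so it should be read as an inferred value.

```python
total = 5965                 # accessions examined (from the Summary)
red = 286                    # red list
yellow_no_neighbors = 131    # yellow: insufficient nearby collections
yellow_red_relatives = 1341  # yellow: distant relatives only on the red list

yellow = yellow_no_neighbors + yellow_red_relatives
green = total - red - yellow  # derived by subtraction, not stated in the text

def pct(n):
    """Share of the full data set, in percent, rounded to one decimal."""
    return round(100 * n / total, 1)

print("yellow:", yellow, pct(yellow))
print("red:", red, pct(red))
print("green:", green, pct(green))
```

The two yellow-list subgroups sum to the stated 1472 accessions, and the resulting shares are consistent with the rounded figures in the text (about 70% green, just under 5% red, almost 25% yellow).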

The 286 accessions on the red list were further divided into four categories. The first category comprised 71 accessions that were the sole member of their haplogroup. The second category comprised 78 accessions that belonged to haplogroups that included commonly used laboratory strains and were extensively over-dispersed (Col-0, Kin-0, Kondara, Ler-0 and Ws-2) (all members of such haplogroups were included on the red list). The third category comprised 18 haplogroups in which one or more far-flung individuals came from a different geographical area than the majority of members in the group; 30 accessions fitted this description and were the only members of their haplogroup placed on the red list (although, accordingly, the remaining members were on the yellow list). For instance, the notable Midwest US haplogroup hg8115, known as the ‘Heartland’ haplogroup, contains 1041 accessions, one of which is not


Table 1 Identity and categorization of all accessions on the red list, including most closely related accession, percentage identity, geographic distance to the most distant relative (>70% identity), the most related accession by name and number, percent similarity and the distance to the most related accession

Category | Haplogroup | Accession name | Native name | Collector | Accession number | Country | Latitude | Longitude | Maximum distance to relative (km) | Most related accession number | Most related accession name | Similarity | Geographic distance (km)

Gar

db

y-8-

171

Gar

db

y-8-

171

Ag

ren

1123

SW

E56

.616

716

.65

1526

.62

1426

Ro

d-5

-284

153

6.61

311

23G

ard

by-

17-1

98G

ard

by-

17-1

98A

gre

n11

32S

WE

56.6

167

16.6

579

41.6

1133

Gar

db

y-18

-201

10

311

23G

ard

by-

21-2

11G

ard

by-

21-2

11A

gre

n11

36S

WE

56.6

167

16.6

515

26.6

214

26R

od

-5-2

841

536.

613

1369

Ale

dal

-8-5

5A

led

al-8

-55

Ag

ren

1160

SW

E56

.716

.516

772

06.0

311

55A

led

al-3

-40

10

313

69A

ng

so-2

8-40

7A

ng

so-2

8-40

7A

gre

n13

06S

WE

59.5

667

16.8

667

7069

.61

1155

Ale

dal

-3-4

01

227.

453

1369

Ham

-8-2

35H

am-8

-235

Ag

ren

1364

SW

E59

.783

317

.583

319

12.3

1354

Ham

-16

10

313

69H

am-1

8-24

5H

am-1

8-24

5A

gre

n13

69S

WE

59.7

833

17.5

833

6976

.21

1354

Ham

-16

10

358

01U

KID

97U

KID

97H

olu

b58

01U

K51

)2.2

1335

.34

6904

CS

2809

51

1335

.34

367

09C

S28

143

CIB

C-5

Cra

wle

y69

08U

K51

.408

3)0

.638

379

65.4

167

09C

S28

069

176

56.5

73

6709

CIB

C-5

CIB

C-5

Cra

wle

y82

77U

K51

.408

3)0

.638

379

65.4

167

09C

S28

069

176

56.5

73

6797

NO

GN

OG

LeC

orr

e25

6FR

A47

.85

2.73

333

1469

8.83

6797

CS

2835

61

336.

363

6797

WH

A2

WH

A2

LeC

orr

e40

0U

K53

)451

08.9

767

97C

S28

356

125

5.37

Mis-identified A. thaliana accessions 559

ª 2011 The AuthorsThe Plant Journal ª 2011 Blackwell Publishing Ltd, The Plant Journal, (2011), 67, 554–566

Tab

le1

(Co

nti

nu

ed)

Cat

ego

ryH

aplo

gro

up

Acc

essi

on

nam

eN

ativ

en

ame

Co

llect

or

Acc

essi

on

nu

mb

erC

ou

ntr

yLa

titu

de

Lon

git

ud

e

Max

imu

md

ista

nce

tore

lati

ve(k

m)

Mo

stre

late

dac

cess

ion

nu

mb

er

Mo

stre

late

dac

cess

ion

nam

eS

imila

rity

Geo

gra

ph

icd

ista

nce

(km

)

373

76C

S28

784

Tu

-1K

ran

z73

76IT

A45

7.5

7116

.97

7198

CS

2839

31

351.

43

8115

CS

2828

1G

ifu

-2T

suka

ya71

48JP

N35

.45

137.

4211

308.

7384

34LP

3413

.101

110

316.

473

8115

328P

NA

061

328P

NA

061

Bo

revi

tz86

98U

SA

42.0

945

)86.

3253

1032

9.55

2408

Yn

g-4

70.

9643

28.8

93

8344

UK

ID96

UK

ID96

Ho

lub

5800

UK

57.4

)5.5

1174

.71

8344

Nd

-11

1174

.71

383

52U

KID

52U

KID

52R

atcl

iffe

5757

UK

54.6

)2.3

2198

.59

8352

Oy-

01

675.

763

8370

UK

ID59

UK

ID59

Ho

lub

5764

UK

54.7

)2.8

9198

.64

8370

Rm

x-A

021

5979

.27

383

70U

KID

63U

KID

63H

olu

b57

68U

K54

.1)1

.592

18.0

783

70R

mx-

A02

160

79.3

383

70U

KID

73U

KID

73H

olu

b57

78U

K52

.21.

593

05.3

183

70R

mx-

A02

163

43.0

53

8634

TO

U-A

1-13

8T

OU

-A1-

138

Ro

ux

299

FRA

46.6

667

4.11

667

6796

8634

11M

E2.

171

6794

.83

9059

Ho

g-2

Ho

g-2

No

rdb

org

9059

SW

E62

.79

17.9

6501

.99

7200

CS

2841

71

3980

.45

391

76C

S75

599

Tru

st-1

Bec

k92

58D

EN

56.2

945

9.67

4419

88.5

9176

CS

7551

71

1988

.53

9277

CS

7560

0T

rust

-2B

eck

9259

DE

N56

.294

59.

6744

6545

.32

9277

CS

7561

81

526.

824

1A

LL1-

2A

LL1-

2Le

Co

rre

1FR

A45

.266

71.

4833

346

7.77

141

LDV

-49

146

7.77

41

LDV

-49

LDV

-49

LeC

orr

e14

1FR

A48

.516

7)4

.066

6746

7.77

1A

LL1-

21

467.

774

9A

LL2-

1A

LL2-

1Le

Co

rre

9FR

A45

.266

71.

4833

316

69.1

426

5P

YL-

61

210.

844

9P

YL-

6P

YL-

6Le

Co

rre

265

FRA

44.6

5)1

.166

6719

16.9

79

ALL

2-1

121

0.84

482

CU

R-4

CU

R-4

LeC

orr

e82

FRA

451.

7520

7.38

315

TO

U-A

1-34

120

7.38

482

TO

U-A

1-34

TO

U-A

1-34

Ro

ux

315

FRA

46.6

667

4.11

667

207.

3882

CU

R-4

120

7.38

491

JEA

JEA

LeC

orr

e91

FRA

43.6

833

7.33

333

941.

1825

3M

OG

-57

194

1.18

491

MO

G-5

7M

OG

-57

LeC

orr

e25

3FR

A48

.666

7)4

.066

6761

34.3

991

JEA

194

1.18

415

6LD

V-7

0LD

V-7

0Le

Co

rre

156

FRA

48.5

167

)4.0

6667

619.

627

0T

OU

-A1-

101

619.

64

156

TO

U-A

1-10

TO

U-A

1-10

Ro

ux

270

FRA

46.6

667

4.11

667

5950

.88

156

LDV

-70

161

9.6

433

4LD

V-4

8LD

V-4

8Le

Co

rre

140

FRA

48.5

167

)4.0

6667

705.

533

4T

OU

-A1-

681

628.

584

334

TO

U-A

1-68

TO

U-A

1-68

Ro

ux

334

FRA

46.6

667

4.11

667

628.

5814

0LD

V-4

81

628.

584

973

Bo

lsen

a-11

-127

Bo

lsen

a-11

-127

Ag

ren

973

ITA

42.6

512

1128

.52

1367

Ham

-13-

241

0.98

811

28.5

24

973

Ham

-13-

241

Ham

-13-

241

Ag

ren

1367

SW

E59

.783

317

.583

368

25.9

973

Bo

lsen

a-8-

125

0.98

811

28.5

24

1062

Bro

sarp

-15-

138

Bro

sarp

-15-

138

Ag

ren

1062

SW

E55

.716

714

.133

311

52.4

511

40G

ard

by-

25-2

220.

9868

170.

864

1062

Gar

db

y-25

-222

Gar

db

y-25

-222

Ag

ren

1140

SW

E56

.616

716

.65

7121

.09

964

Bel

mo

nte

-15-

109

190

3.9

411

34B

rosa

rp-6

3-16

3B

rosa

rp-6

3-16

3A

gre

n10

75S

WE

55.7

167

14.1

333

778.

2311

34G

ard

by-

19-2

050.

9873

170.

124

1134

Gar

db

y-19

-205

Gar

db

y-19

-205

Ag

ren

1134

SW

E56

.616

716

.65

6071

.32

1118

Gar

db

y-28

10

411

37B

elm

on

te-1

5-10

9B

elm

on

te-1

5-10

9A

gre

n96

4IT

A42

.116

712

.483

376

29.6

111

37G

ard

by-

22-2

131

903.

94

1137

Gar

db

y-22

-213

Gar

db

y-22

-213

Ag

ren

1137

SW

E56

.616

716

.65

7941

.611

18G

ard

by-

281

04

1137

Ale

dal

-9-5

8A

led

al-9

-58

Ag

ren

1161

SW

E56

.716

.516

769

77.4

911

45A

led

al-2

60.

9881

04

1137

An

gso

-42-

411

An

gso

-42-

411

Ag

ren

1308

SW

E59

.566

716

.866

768

10.5

813

06A

ng

so-2

8-40

71

04

1305

An

gso

-26-

405

An

gso

-26-

405

Ag

ren

1305

SW

E59

.566

716

.866

722

69.4

313

08A

ng

so-4

2-41

11

04

1305

Ham

-26-

254

Ham

-26-

254

Ag

ren

1373

SW

E59

.783

317

.583

328

26.8

1305

An

gso

-26-

405

0.98

8544

.09

456

83C

S28

019

An

g-1

Kra

nz

6993

BE

L50

.35.

366

68.3

5683

UK

NW

99-0

401

640.

44

5683

An

g-0

An

g-0

Kra

nz

8254

BE

L50

.35.

374

5.38

5683

UK

NW

99-0

401

640.

44

5683

UK

NW

99-0

40U

KN

W99

-040

Ho

lub

5683

UK

54.6

)3.1

5951

.57

6993

CS

2801

91

640.

44

5748

CS

2838

9K

l-0

Kra

nz

7194

GE

R50

.95

6.96

6691

7.22

5748

UK

ID43

184

6.66

457

48U

KID

26U

KID

26H

olu

b57

33U

K54

.7)2

.613

70.4

257

48U

KID

430.

9886

150.

184

5748

UK

ID43

UK

ID43

Rat

clif

fe57

48U

K56

)4.4

6043

.471

94C

S28

389

184

6.66

560 Alison E. Anastasio et al.

ª 2011 The AuthorsThe Plant Journal ª 2011 Blackwell Publishing Ltd, The Plant Journal, (2011), 67, 554–566

Tab

le1

(Co

nti

nu

ed)

Cat

ego

ryH

aplo

gro

up

Acc

essi

on

nam

eN

ativ

en

ame

Co

llect

or

Acc

essi

on

nu

mb

erC

ou

ntr

yLa

titu

de

Lon

git

ud

e

Max

imu

md

ista

nce

tore

lati

ve(k

m)

Mo

stre

late

dac

cess

ion

nu

mb

er

Mo

stre

late

dac

cess

ion

nam

eS

imila

rity

Geo

gra

ph

icd

ista

nce

(km

)

457

48C

S28

386

Kil-

0K

ran

z71

92U

K55

.639

5)5

.663

6415

85.4

971

94C

S28

389

191

7.22

458

19C

S28

211

Dr-

0K

ran

z71

06G

ER

51.0

5113

.733

611

58.1

458

19U

KID

291

1158

.14

458

19U

KID

29U

KID

29H

olu

b58

19U

K55

.9)3

.214

26.3

371

06C

S28

211

111

58.1

44

5842

Bo

r-10

Bo

r-10

Rel

ich

ova

5842

CZ

E49

.401

316

.232

694

74.5

461

43T

910

147

9.48

458

42D

raIII

-14

Dra

III-1

4R

elic

ho

va58

82C

ZE

49.4

112

16.2

815

9475

.53

5837

Bo

r-1

13.

64

5842

T91

0T

910

Jako

bss

on

6143

SW

E55

.940

613

.538

386

78.3

358

42B

or-

101

479.

484

5964

Dra

IV5-

28D

raIV

5-28

Rel

ich

ova

5964

CZ

E49

.411

216

.281

591

95.7

180

77P

T2.

211

7472

.29

459

64P

T2.

21P

T2.

21D

un

nin

g80

77U

SA

41.3

423

)86.

7368

1037

9.46

5964

Dra

IV5-

281

7472

.29

461

08V

OU

-5V

OU

-5Le

Co

rre

394

FRA

46.6

50.

1666

6710

99.7

861

08T

480

110

99.7

84

6108

T48

0T

480

Jako

bss

on

6108

SW

E55

.798

913

.120

613

03.7

739

4V

OU

-51

1099

.78

469

90C

S28

132

Cer

v-1

Ko

orn

nee

f70

68IT

A42

12.1

7684

.74

6990

CS

2801

40.

9861

895.

944

6990

CS

2801

4A

mel

-1K

oo

rnn

eef

6990

NE

D53

.448

5.73

895.

9470

68C

S28

132

0.98

6189

5.94

469

96C

S28

017

An

-2K

ran

z69

96B

EL

51.2

167

4.4

906.

0770

29C

S28

059

0.99

1836

4.53

469

96C

S28

059

Bch

-3K

ran

z70

29G

ER

49.5

166

9.31

6610

21.4

769

96C

S28

017

0.99

1836

4.53

470

22C

S28

082

Bla

-4K

ran

z70

22E

SP

41.6

833

2.8

2591

.39

8238

Ken

t1

548.

214

7022

CS

2845

1Li

-1K

ran

z72

20G

ER

50.3

833

8.06

6618

05.5

470

22C

S28

082

163

3.21

470

22C

S28

716

Rsc

h-4

Kra

nz

7322

RU

S56

.334

2591

.39

7022

CS

2808

21

2591

.39

470

22R

sch

-4R

sch

-4K

ran

z83

74R

US

56.3

3425

91.3

970

22C

S28

082

125

91.3

94

7022

Ken

tK

ent

No

rdb

org

8238

UK

51.1

50.

422

64.7

170

22C

S28

082

154

8.21

471

27C

S28

242

Est

Dam

m71

27E

ST

58.6

656

24.9

871

317.

1871

28C

S28

243

134

.74

471

27C

S28

243

Est

-0K

ran

z71

28E

ST

58.3

25.3

274.

0271

27C

S28

242

134

.74

471

54C

S28

328

Gr3

Red

ei71

54A

UT

47.0

705

15.4

381

1143

.01

7458

CS

2806

61

590.

054

7154

CS

2832

2G

r-1

Kra

nz

7155

AU

T47

15.5

1116

.48

7154

CS

2832

81

6.3

471

54G

r-1

Gr-

1K

ran

z83

00A

UT

4715

.511

49.5

471

54C

S28

328

16.

34

7154

CS

2806

6B

erR

edei

7458

DE

N55

.675

12.5

687

859.

3671

54C

S28

328

159

0.05

471

54U

KS

E06

-476

UK

SE

06-4

76H

olu

b52

41U

K51

.20.

411

49.5

471

54C

S28

328

111

43.0

14

7156

CS

2832

3G

r-2

Kra

nz

7156

AU

T47

15.5

836.

6570

25C

S28

078

135

1.25

471

56C

S28

078

Bl-

1K

ran

z70

25IT

A44

.504

111

.339

665

7.08

7156

CS

2832

31

351.

254

7202

CS

2839

4K

l-5

Kra

nz

7199

GE

R50

.95

6.96

6667

93.1

572

02C

S28

380

112

0.55

472

02C

S28

380

Kb

-0K

ran

z72

02G

ER

50.1

797

8.50

861

6952

.73

7199

CS

2839

41

120.

554

7246

CS

2848

9M

a-2

Kra

nz

7246

GE

R50

.816

78.

7667

669.

8174

15C

S28

838

110

7.25

472

46C

S28

838

Wu

-0K

ran

z74

15G

ER

49.7

878

9.93

6167

0.24

7246

CS

2848

91

107.

254

7250

CS

2849

1M

e-0

Kra

nz

7250

GE

R51

.918

310

.113

870

77.1

284

03W

a-1

174

4.6

472

50C

S28

804

Wa-

1K

ran

z73

94P

OL

52.3

2172

52.1

372

50C

S28

491

174

4.6

472

50W

a-1

Wa-

1K

ran

z84

03P

OL

52.3

2172

52.1

372

50C

S28

491

174

4.6

474

29C

S28

455

Li-3

:3K

ran

z72

25G

ER

50.3

833

8.06

6668

28.9

274

29C

S28

496

141

0.94

474

29C

S28

496

Mr-

0K

ran

z74

29IT

A44

.15

9.65

9739

.57

7225

CS

2845

51

410.

944

7429

CS

2849

7M

r-0

Kra

nz

7522

ITA

44.1

59.

6513

25.3

183

38M

r-0

10

482

65B

lh-1

Blh

-1K

ran

z82

65C

ZE

4819

1667

.59

7117

CS

2822

80.

993

715.

664

8265

CS

2822

8E

l-0

Kra

nz

7117

GE

R51

.510

59.

6825

397

0.25

8265

Blh

-10.

993

715.

664

8280

CS

2820

6D

i-M

Viz

ir70

95FR

A47

524

05.6

482

80C

t-1

197

7.91

482

80C

S28

233

En

-1K

ran

z71

18G

ER

508.

520

29.4

482

80C

t-1

182

7.18

Mis-identified A. thaliana accessions 561

ª 2011 The AuthorsThe Plant Journal ª 2011 Blackwell Publishing Ltd, The Plant Journal, (2011), 67, 554–566

Tab

le1

(Co

nti

nu

ed)

Cat

ego

ryH

aplo

gro

up

Acc

essi

on

nam

eN

ativ

en

ame

Co

llect

or

Acc

essi

on

nu

mb

erC

ou

ntr

yLa

titu

de

Lon

git

ud

e

Max

imu

md

ista

nce

tore

lati

ve(k

m)

Mo

stre

late

dac

cess

ion

nu

mb

er

Mo

stre

late

dac

cess

ion

nam

eS

imila

rity

Geo

gra

ph

icd

ista

nce

(km

)

482

80C

S28

230

En

-DK

ran

z71

20G

ER

508.

520

29.4

482

80C

t-1

182

7.18

482

80E

n-1

En

-1K

ran

z82

90G

ER

508.

520

29.4

482

80C

t-1

182

7.18

482

80C

S28

196

Ct-

1K

ran

z69

10IT

A37

.315

2166

.15

8280

Ct-

11

04

8280

CS

2819

5C

t-1

Kra

nz

7067

ITA

37.3

1521

12.2

482

80C

t-1

10

482

80C

t-1

Ct-

1K

ran

z82

80IT

A37

.315

2166

.15

7118

CS

2823

31

827.

184

8341

CS

2848

8M

a-0

Kra

nz

7245

GE

R50

.816

78.

7667

1561

.25

8341

Mt-

01

1501

.97

483

41C

S28

502

Mt-

0K

ran

z72

49LI

B32

.334

22.4

626

51.9

8341

Mt-

01

04

8341

Mt-

0M

t-0

Kra

nz

8341

LIB

32.3

422

.46

1501

.97

7245

CS

2848

81

1501

.97

483

59U

KID

1U

KID

1H

olu

b57

08U

K56

.8)3

.959

00.2

983

59P

na-

171

5816

.74

483

59U

KID

23U

KID

23H

olu

b57

30U

K55

.3)1

.860

76.3

583

59P

na-

171

5993

.87

483

62C

S28

654

Pu

2-7

Cet

l69

56C

ZE

49.4

216

.36

1080

.56

8362

Pu

2-7

10

483

62P

u2-

7P

u2-

7C

etl

8362

CZ

E49

.42

16.3

611

46.1

557

27U

KID

201

1096

.69

483

62U

KID

20U

KID

20H

olu

b57

27U

K51

.31.

111

19.2

983

62P

u2-

71

1096

.69

483

82C

S28

744

Sp

r1-2

No

rdb

org

6964

SW

E56

.316

6283

.52

8382

Sp

r1-2

10

483

82S

pr1

-2S

pr1

-2N

ord

bo

rg83

82S

WE

56.3

1662

83.5

269

64C

S28

744

10

483

82U

KID

85U

KID

85H

olu

b57

90U

K51

.30.

454

78.7

883

82S

pr1

-21

1073

.59

483

86S

r:5

Sr:

5C

etl

8386

SW

E58

.911

.282

7.66

8423

Ho

v2-1

127

0.62

483

86H

ov2

-1H

ov2

-1N

ord

bo

rg84

23S

WE

56.1

13.7

467

86.6

483

86S

r:5

127

0.62

483

89C

S28

753

Ta-

0K

ran

z73

49C

ZE

49.5

14.5

8508

.283

89T

a-0

10

483

89T

a-0

Ta-

0K

ran

z83

89C

ZE

49.5

14.5

8508

.273

49C

S28

753

10

483

89C

S28

754

Tac

-0M

itch

ell-

Old

s73

50U

SA

47.2

413

)122

.459

8773

.53

8389

Ta-

01

8508

.24

8394

CS

2878

3T

u-0

Kra

nz

7375

ITA

457.

598

02.9

669

72C

S28

782

198

02.9

64

8394

Tu

-0T

u-0

Kra

nz

8395

ITA

457.

597

79.2

973

75C

S28

783

10

483

94C

S28

782

Tsu

-1H

olu

b69

72JP

N34

.43

136.

3198

02.9

673

75C

S28

783

198

02.9

64

8394

CS

2878

1T

su-1

Ho

lub

7374

JPN

34.4

313

6.31

9802

.96

7375

CS

2878

31

9802

.96

483

94T

su-1

Tsu

-1H

olu

b83

94JP

N34

.43

136.

3197

79.2

973

75C

S28

783

0.99

397

79.2

94

8399

Uo

d-7

Uo

d-7

Ko

ch83

99A

UT

48.3

14.4

563

8.44

6425

Zd

rI1-

241

149.

074

8399

Zd

rI1-

24Z

drI

1-24

Rel

ich

ova

6425

CZ

E49

.385

316

.254

454

6.39

8399

Uo

d-7

114

9.07

492

25C

S75

565

Pir

in-1

7B

eck

9224

BU

L41

.595

623

.548

314

46.5

692

25C

S75

566

126

1.5

492

25C

S75

566

Del

-1B

eck

9225

SR

B44

.944

421

.182

811

76.4

992

24C

S75

565

126

1.5

492

60C

S75

620

An

d-2

Bec

k92

79B

EL

50.8

54.

2833

380

7.32

9278

CS

7561

91

04

9260

CS

7562

1A

nd

-3B

eck

9280

BE

L50

.85

4.28

333

807.

3292

79C

S75

620

10

492

60C

S75

622

An

d-4

Bec

k92

81B

EL

50.8

54.

2833

323

27.5

392

60C

S75

601

155

5.85

492

60C

S75

619

An

d-1

Bec

k92

78B

EL

50.8

54.

2833

380

7.32

9279

CS

7562

01

04

9260

CS

7560

1A

bil-

1B

eck

9260

DE

N56

.675

29.

5891

8584

.97

9280

CS

7562

11

555.

85

Country abbreviations: ARM = Armenia, AUT = Austria, AZE = Azerbaijan, BEL = Belgium, BUL = Bulgaria, CAN = Canada, CPV = Cape Verde Islands, CRO = Croatia, CZE = Czech Republic, DEN = Denmark, ESP = Spain, EST = Estonia, FIN = Finland, FRA = France, GEO = Georgia, GER = Germany, IND = India, IRL = Ireland, ITA = Italy, JPN = Japan, KAZ = Kazakhstan, KGZ = Kyrgyzstan, LIB = Libya, LTU = Lithuania, MAR = Morocco, NED = Netherlands, NOR = Norway, NZL = New Zealand, POL = Poland, POR = Portugal, ROU = Romania, RUS = Russia, SRB = Serbia, SUI = Switzerland, SWE = Sweden, TJK = Tajikistan, TUR = Turkey, UK = United Kingdom, UKR = Ukraine, USA = United States of America, UZB = Uzbekistan

corroborated by its neighbors and was thus placed on the red list. The fourth category comprised 107 members of 39 over-dispersed haplogroups for which it was impossible to determine which individuals were out of place. All members in these haplogroups were placed on the red list. Finally, outside verification was used in several cases where knowledge of the literature exonerated accessions from the red list.

DISCUSSION

Patterns of isolation by distance are slow to emerge, and are generally the result of many generations of low but steady migration, genetic drift, and local recombination (Lewandowska-Sabat et al., 2010; Platt et al., 2010). Such patterns are unlikely to occur accidentally as a result of incorrectly associating a natural accession with an arbitrary sampling location. The vast majority of the accessions we examined fit this pattern of isolation by distance, and thus are almost certainly properly identified.

Of the remainder, our analysis focused primarily on haplogroups that were geographically over-dispersed, indicating potential contamination. In regions such as continental Europe where A. thaliana has been established for thousands of years, local haplotype diversity is high, long-distance migration is rare, much of the available habitat is already colonized, and outcrossing happens relatively frequently (Bomblies et al., 2010; Platt et al., 2010). Sampling two naturally occurring, genetically identical plants more than 10 km apart is therefore extremely unlikely. First, a long-distance migration event would have to have taken place. The source population would have to be included in the sample, and the individual genotype that migrated would have to be sampled within it. The recipient population would also have to be included in the sample, and even though the newly entered genotype would be at very low frequency in the new population, it too would have to be represented in the collected sample. If sufficient time had passed for the migrant haplotype to have increased to appreciable frequency in its new region, it would also have had the opportunity to backcross with the established population, reducing the probability that it would be identified as an outlier. A widespread haplotype that is unrelated to the other individuals sampled in one or more of its purported sites (particularly when those individuals tend to be closely related to each other) is thus almost certainly not a naturally occurring haplotype at those sites.

A contaminant, on the other hand, need only have been sampled once from the wild, and may spread easily from pot to pot, vial to vial, or spreadsheet row to spreadsheet row. It is surely no accident that many of the widely spread haplogroups involve the most commonly used laboratory strains. While it could be imagined that this is because scientists have spread them worldwide, this over-estimates our importance as dispersal agents. For example, according to our results, Col-0 (an accession originally from Central Europe) is found in ‘natural’ samples from the very north of Sweden to the south of Spain. This seems unlikely compared to the alternative explanation, which involves contamination in Arabidopsis research facilities, practically all of which are known to contain vast quantities of Col-0. Similarly, accessions identical to Ws-2 (Wassilewskija, in Russia) are found in Germany, Sweden and the UK; accessions identical to Ler-0 are found in the Czech Republic, France, Sweden and the UK. In all of these cases, many different collectors were involved (including some of the authors), suggesting that several laboratory collections are probably contaminated. Errors may also have occurred at the level of a single collector: we found several instances where distant accessions sampled by the same collector apparently shared a haplotype. The case for better growth practices and record-keeping is clear.

Putative mislabeling events can be found when examining genetic information for similarly named but geographically distant accessions. For example, the nominally ‘Japanese’ accession Tsu-0 appears to be identical to the nominally ‘Italian’ accession Tu-0, and the ‘American’ accession Tac-0 is identical to the ‘Czech’ accession Ta-0. Such mix-ups have been noted previously: C24 was historically referred to as Columbia, but Torjek et al. (2003) suggested that it was clearly a different accession both phenotypically and genetically. The case for barcode-based labeling when collecting, sowing seed and transplanting is also clear.

Figure 1. Proportion of suspicious accessions (a) from each collection and (b) from large and small local populations.

As expected if discrepancies occurred due to handling, older collections are often affected. For example, in the Kranz collection, the ‘German’ accession Li-1 appears to be identical to the ‘Spanish’ accession Bla-4 and the ‘Russian’ accession Rsch-4. Which origin is the original one (if any) is very difficult to ascertain. Older collections also suffer from the well-known problem that ‘ecotypes’ were originally propagated in bulk, and then gradually developed into individual lines. Thus, for example, our collection contains ten different samples from ‘Ws’ (Wassilewskija, from Russia), with ten different stock center IDs, but only six abbreviated names (Ws, Ws-0, Ws-1, Ws-2, Ws-3 and Ws-4). Based on our marker data, these appear to be two distinct genotypes: the three accessions called Ws-0 were different from all the rest, a finding that has also been described by Aukerman et al. (1997). This result is not surprising given that the latter four accessions (Ws-1–Ws-4) were all derived very recently from a single line in the B. Griffing collection. This line has itself been separated from the Kranz set for decades, traversing from Laibach to Langridge and Griffing in Australia, and then to the Griffing laboratory at Ohio State University, while Ws-0 came directly from the Kranz collection.

We recommend that researchers do not use accessions from our red list when geographic data are relevant, because it appears unlikely that they are the same accessions that were originally collected at these locations. Although we have no specific reason to question the integrity of the accessions on the yellow list, their location and genetic information are not corroborated as well as those of accessions on the green list. The relative sizes of the green and red lists suggest that over 93% of accessions on the yellow list are likely to be accurate. However, whether or not these accessions should be included in analyses will need to be decided on a case-by-case basis. The quality of the data accompanying the accessions on the green list is generally excellent. However, it is possible that the green list contains accessions that have inadvertently been outcrossed during propagation. It is not possible to distinguish such an accession from a properly placed one given the level of genetic resolution in our dataset. We must hope that such events are rare. In general, it is much easier to definitively corroborate or question the identity of accessions sampled in the general vicinity of other accessions (Figure 1b). It is difficult to comment with any degree of certainty on accessions sampled singly from remote regions. Individual accessions that lack neighbors with which to be compared are therefore most likely to end up on the yellow list. Those on the red list should be treated with caution until corroborating samples are collected. The practice of collecting larger population samples provides researchers with greater power to verify the integrity of a collection.

In our own collections, we found a 3% error rate in the assigned origin of accessions (Figure 1a), demonstrating that it is quite easy to introduce error over even a small number of generations in the laboratory. The error rate in the stock center collection was several-fold higher (14%), although these accessions have been propagated for several decades. Given the increasing number of submissions to the stock center and the increasing number of laboratories utilizing these accessions, it is no surprise that mix-ups occur. The early collections were even maintained through extraordinarily difficult conditions during the Second World War (Maarten Koornneef, Max Planck Institute for Plant Breeding Research (Cologne), Department of Plant Breeding and Genetics, personal communication; Reinholz, 1965; Redei, 1992). We strongly urge researchers to be diligent in their greenhouse and labeling practices for both laboratory-specific and community-wide collections. Fast and cheap fingerprinting of accessions, if widely used, could ensure that mis-identification errors of this kind are unlikely to occur in the future. For instance, primer information for assays of the 149 SNPs on which these samples were genotyped is publicly available through the Bergelson laboratory website at the University of Chicago (http://bergelson.uchicago.edu/a.thaliana-resources). In the near future, sequencing costs will be low enough that even small laboratories will be able to cheaply confirm the accuracy of geographic information.

There are several ways that Arabidopsis stock centers could ensure the integrity of propagated lineages: genetic fingerprinting of current stocks and submitted accessions, provision of accurate GPS coordinates, and allocation of a unique barcode to accessions at the time of collection or initial propagation, to avoid use of the same name abbreviations for distant populations. Finally, in the interest of maintaining a robust and informative collection of ecotypes for use in future geographical and ecological studies, it would be helpful if location attributes were collected along with samples in the field. Clear photographs of an individual plant and its surrounding habitat could be included, together with altitude, land use, soil type and plant community data as appropriate. These additional data and safeguards will ensure that the hard work that goes into collecting and maintaining stocks for use in the community results in a trustworthy, informative and increasingly valuable resource.

EXPERIMENTAL PROCEDURES

Dataset

We used the dataset from Platt et al. (2010), described in detail at http://arabidopsis.usc.edu/. It contains 4942 accessions collected from natural populations (not all of which are available from the stock centers) and 1023 accessions from the Arabidopsis Biological Resource Center. The accessions from the Arabidopsis Biological Resource Center were each obtained as a leaf from a single reference plant, such that the distributed seed matched the genotype in this study. Notably, many of the accessions were donated decades ago and have been perpetuated via single seed descent in growth chambers (or greenhouses, in the case of common laboratory strains) at stock center facilities. The collection spans 42 countries and four continents. These accessions were genotyped at 149 SNP markers using the Sequenom MassARRAY technology (Sequenom Inc., http://www.sequenom.com). As in Platt et al. (2010), we excluded individuals that lacked a large number of genotype calls (more than 50 of 149), as this indicates poor quality of the genomic DNA. We also excluded information from 10 SNPs due to poorly performing genotype assays. However, we did retain accessions that were deemed geographically or genetically suspicious by Platt et al. (2010), leaving a dataset of 5965 accessions.
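The two quality filters described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the use of `None` for a missing call, and the order in which the filters are applied are hypothetical and are not taken from Platt et al. (2010).

```python
from typing import Dict, Iterable, List, Optional

# Assumption: a missing genotype call is represented by None.
Genotype = List[Optional[str]]

MAX_MISSING_CALLS = 50  # accessions lacking more than 50 of 149 calls are dropped


def filter_accessions(genotypes: Dict[str, Genotype],
                      max_missing: int = MAX_MISSING_CALLS) -> Dict[str, Genotype]:
    """Keep accessions that lack at most `max_missing` genotype calls."""
    kept = {}
    for accession, calls in genotypes.items():
        missing = sum(1 for call in calls if call is None)
        if missing <= max_missing:
            kept[accession] = calls
    return kept


def drop_markers(genotypes: Dict[str, Genotype],
                 bad_marker_indices: Iterable[int]) -> Dict[str, Genotype]:
    """Remove poorly performing markers (by column index) from every accession."""
    bad = set(bad_marker_indices)
    return {
        accession: [call for i, call in enumerate(calls) if i not in bad]
        for accession, calls in genotypes.items()
    }
```

Applied to 149-marker genotypes with ten markers flagged as poorly performing, this would leave 139 markers per retained accession.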

How haplogroups were created

Accessions were grouped by haplotype by Platt et al. (2010), and we used the same groups. Briefly, haplogroups were created using a modified quality threshold (QT) clustering algorithm (Heyer et al., 1999). Haplogroups are maximal collections of accessions for which the observed full SNP genotype of each accession in the haplogroup is expected to be consistent beyond the variation attributable to genotyping error. The actual number of discrepancies allowed between an observed genotype and the presumed common haplotype is a function of the size of the haplogroup. For all values of N, the accession with the Nth most discrepant genotype is expected to have fewer discrepancies than the Nth most discrepant genotype in 95% of hypothetical samples of identical individuals (where genotyping errors occur at 0.5% of markers and are independently distributed). In rare cases involving highly heterozygous accessions that are likely to be recent crosses between other sampled accessions, haplogroups may be more diverse than desired. No accessions were included on the red list solely because of this effect, although some accessions on the red list were affected by high heterozygosity. Heterozygotes are indicated using standard nomenclature in the ‘genetic fingerprint’ column of Table S1.
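One way to read the size-dependent discrepancy threshold is as a 95% quantile of an order statistic of binomial error counts. The sketch below follows that reading and is not the authors' implementation: the function names are hypothetical, and the default of 139 markers assumes that the 10 excluded SNPs are removed from the 149 before clustering.

```python
from math import comb


def binom_tail(n_markers: int, error_rate: float, t: int) -> float:
    """P(X > t) for X ~ Binomial(n_markers, error_rate)."""
    cdf = sum(
        comb(n_markers, k) * error_rate**k * (1 - error_rate) ** (n_markers - k)
        for k in range(t + 1)
    )
    return max(0.0, 1.0 - cdf)


def max_discrepancies(group_size: int, rank: int, n_markers: int = 139,
                      error_rate: float = 0.005, level: float = 0.95) -> int:
    """Smallest t such that, among `group_size` truly identical individuals,
    the rank-th most discrepant genotype shows at most t genotyping errors
    with probability >= `level`."""
    for t in range(n_markers + 1):
        q = binom_tail(n_markers, error_rate, t)
        # The rank-th largest error count is <= t exactly when at most
        # rank - 1 individuals have more than t errors.
        p = sum(
            comb(group_size, j) * q**j * (1 - q) ** (group_size - j)
            for j in range(rank)
        )
        if p >= level:
            return t
    return n_markers
```

Under this reading, the allowance for the most discrepant member grows with haplogroup size, while lower-ranked members are held to the same or a tighter allowance.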

Algorithm

Accessions were divided into two major groups based on their similarity to geographic and genetic neighbors. Broadly, those accessions whose identity was corroborated by nearby individuals and without geographically distant genetic relatives were included on the green list. The vast majority of accessions fell into this category. Only individuals that were genetically similar to geographically distant accessions and whose identity was not corroborated by geographically close neighbors were included on the red or yellow lists. A number of further classifications were used to categorize the types of accessions on the red list.

Accessions were initially considered for inclusion on the green list when at least ten accessions were sampled within 10 km and either a corroborating accession existed or the accession shared alleles at more than 70% of its markers with all other accessions found within 10 km. A corroborating accession is either another accession in the same haplogroup found within 10 km or an accession, found within 100 km, that shares alleles at more than 70% of markers. Accessions from North America or the UK were included in the green list unless an accession sharing alleles at more than 70% of markers was found on another continent or in another country, respectively. Accessions from other regions were included when no other members of their haplogroup were found more than 10 km away and no accessions sharing alleles at more than 70% of markers were found more than 100 km away.
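The green-list rule for accessions outside North America and the UK can be sketched as follows. This is an illustration under stated assumptions, not the code used in the study: distances are great-circle (haversine) distances, allele sharing is computed only over markers called in both accessions, the 10 km and 100 km limits are treated as inclusive, and the accession itself is not counted among its neighbors. The Accession class and all function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

SHARE_CUTOFF = 0.70      # 'more than 70% of markers'
NEAR_KM = 10.0
FAR_KM = 100.0
MIN_LOCAL_SAMPLES = 10   # 'at least ten accessions sampled within 10 km'


@dataclass(frozen=True)
class Accession:
    name: str
    lat: float
    lon: float
    haplogroup: int
    calls: Tuple  # one call per SNP; None marks a missing call (assumed encoding)


def distance_km(a: Accession, b: Accession) -> float:
    """Great-circle distance on a sphere of radius 6371 km."""
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dlon = math.radians(b.lon - a.lon)
    h = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def shared_fraction(a: Accession, b: Accession) -> float:
    """Fraction of jointly called markers with identical calls."""
    pairs = [(x, y) for x, y in zip(a.calls, b.calls) if x is not None and y is not None]
    return sum(x == y for x, y in pairs) / len(pairs) if pairs else 0.0


def similar(a: Accession, b: Accession) -> bool:
    return shared_fraction(a, b) > SHARE_CUTOFF


def is_green_candidate(acc: Accession, others: Sequence[Accession]) -> bool:
    near = [o for o in others if distance_km(acc, o) <= NEAR_KM]
    if len(near) < MIN_LOCAL_SAMPLES:
        return False
    # Corroboration: same haplogroup within 10 km, or a similar accession within 100 km.
    corroborated = any(o.haplogroup == acc.haplogroup for o in near) or any(
        similar(acc, o) for o in others if distance_km(acc, o) <= FAR_KM)
    return corroborated or all(similar(acc, o) for o in near)


def has_distant_relative(acc: Accession, others: Sequence[Accession]) -> bool:
    for o in others:
        d = distance_km(acc, o)
        if (o.haplogroup == acc.haplogroup and d > NEAR_KM) or (d > FAR_KM and similar(acc, o)):
            return True
    return False


def is_green(acc: Accession, others: Sequence[Accession]) -> bool:
    """Green-list rule for regions other than North America and the UK."""
    return is_green_candidate(acc, others) and not has_distant_relative(acc, others)
```

For North American and UK accessions, the distance tests in has_distant_relative would be replaced by a comparison of continent or country, which this sketch does not model.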

Accessions that were not from North America or the UK were initially considered for the red list if another accession in the same haplogroup was found more than 10 km away or an accession sharing alleles at more than 70% of markers was found more than 100 km away. For accessions from North America or the UK to be considered for the red group, we required an accession sharing alleles at more than 70% of markers to be found on another continent or in another country, respectively. An accession was included in the red list unless it was part of a haplogroup of at least four accessions where another accession within 10 km shared alleles at more than 70% of markers, and more than half of the members of the haplogroup also had accessions within 10 km sharing alleles at more than 70% of markers. This avoided inclusion on the red list of an entire large, well-supported haplogroup on the basis of a small number of dubious members.
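The exemption that protects well-supported haplogroups can be written compactly. The sketch below assumes that, for every member of the candidate's haplogroup, it is already known whether another accession within 10 km shares alleles at more than 70% of markers; the function name spared_from_red_list and the dictionary input are hypothetical, and reading 'another accession within 10 km' as referring to the candidate itself is an interpretation of the text.

```python
from typing import Dict

MIN_GROUP_SIZE = 4  # 'a haplogroup of at least four accessions'


def spared_from_red_list(candidate: str, has_similar_neighbor: Dict[str, bool]) -> bool:
    """Return True if a red-list candidate is exempted.

    `has_similar_neighbor` maps each member of the candidate's haplogroup
    (including the candidate) to whether another accession within 10 km
    shares alleles at more than 70% of markers.
    """
    group_size = len(has_similar_neighbor)
    if group_size < MIN_GROUP_SIZE:
        return False
    if not has_similar_neighbor[candidate]:
        return False  # the candidate itself lacks local support
    supported = sum(has_similar_neighbor.values())
    return supported > group_size / 2  # strictly more than half of the members
```

Under this reading, a member of a four-accession haplogroup is spared only if it has local support and at least three members in total have similar neighbors.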

Once accessions had been relegated to the red list (because they were not similar to neighbors and were similar to one or more accessions from a great distance), we further sub-divided the list into four categories. In category 1, the flagged accession was the sole member of the haplogroup (long-distance migrants are expected to be found here). In category 2, all accessions in a haplogroup were flagged because the group contained a commonly used laboratory strain that was a likely contaminant. In category 3, one or more accessions in a large haplogroup were flagged for not looking like the others. Finally, category 4 comprised all accessions in a haplogroup when it was not clear which (if any) had accurate geographic information.
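One possible reading of the four categories as a decision rule is sketched below. The order in which the conditions are tested, the use of 'only some members flagged' as the criterion for category 3, and the function name red_list_category are assumptions; the text does not state how ambiguous cases were resolved, and the original assignment may have involved manual judgment.

```python
def red_list_category(group_size: int, n_flagged: int, has_lab_strain: bool) -> int:
    """Assign a red-list category (1-4) to a flagged haplogroup.

    group_size     -- number of accessions in the haplogroup
    n_flagged      -- number of those accessions on the red list
    has_lab_strain -- whether the group contains a common laboratory strain
    """
    if not 1 <= n_flagged <= group_size:
        raise ValueError("n_flagged must be between 1 and group_size")
    if group_size == 1:
        return 1  # sole member of its haplogroup (possible long-distance migrant)
    if has_lab_strain:
        return 2  # likely contamination by a laboratory strain
    if n_flagged < group_size:
        return 3  # a subset of the group does not look like the others
    return 4      # whole group flagged; unclear which origin is accurate
```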

Accessions that did not qualify for the green or red lists were placed on the yellow list.

ACKNOWLEDGEMENTS

We would like to thank Luz Rivero (Ohio State University, ABRC) for comments and discussion, and two anonymous reviewers for improvements to the manuscript. This work was primarily supported by US National Science Foundation grant DEB-0519961 to J.B. and M.N.

SUPPORTING INFORMATION

Additional Supporting Information may be found in the online version of this article:

Table S1. Complete dataset of 5965 accessions used in our analysis.

Please note: As a service to our authors and readers, this journal provides supporting information supplied by the authors. Such materials are peer-reviewed and may be re-organized for online delivery, but are not copy-edited or typeset. Technical support issues arising from supporting information (other than missing files) should be addressed to the authors.

REFERENCES

Alonso-Blanco, C. and Koornneef, M. (2000) Naturally occurring variation in Arabidopsis: an underexploited resource for plant genetics. Trends Plant Sci. 5, 1360–1385.

Aranzana, M.J., Kim, S., Zhao, K.Y. et al. (2005) Genome-wide association mapping in Arabidopsis identifies previously known flowering time and pathogen resistance genes. PLoS Genet. 1, 531–539.

Aukerman, M.J., Hirschfeld, M., Wester, L., Weaver, M., Clack, T., Amasino, R.M. and Sharrock, R.A. (1997) A deletion in the PHYD gene of the Arabidopsis Wassilewskija ecotype defines a role for phytochrome D in red/far-red light sensing. Plant Cell, 9, 1317–1326.

© 2011 The Authors. The Plant Journal © 2011 Blackwell Publishing Ltd, The Plant Journal, (2011), 67, 554–566

Bakker, E.G., Toomajian, C., Kreitman, M. and Bergelson, J. (2006) A genome-wide survey of R gene polymorphisms in Arabidopsis. Plant Cell, 18, 1803–1818.

Banta, J.A., Dole, J., Cruzan, M.B. and Pigliucci, M. (2007) Evidence of local adaptation to coarse-grained environmental variation in Arabidopsis thaliana. Evolution, 61, 2419–2432.

Beck, J.B., Schmuths, H. and Schaal, B.A. (2008) Native range genetic variation in Arabidopsis thaliana is strongly geographically structured and reflects Pleistocene glacial dynamics. Mol. Ecol. 17, 902–915.

Bergelson, J., Stahl, E., Dudek, S. and Kreitman, M. (1998) Genetic variation within and among populations of Arabidopsis thaliana. Genetics, 148, 1311–1323.

Bergelson, J., Kreitman, M., Stahl, E. and Tian, D. (2001) Evolutionary dynamics of plant R-genes. Science, 292, 2281–2284.

Bomblies, K., Yant, L., Laitinen, R.A., Kim, S.-T., Hollister, J.D., Warthmann, N., Fitz, J. and Weigel, D. (2010) Local-scale patterns of genetic variability, outcrossing, and spatial structure in natural stands of Arabidopsis thaliana. PLoS Genet. 6, e1000890.

Bouchabke, O., Chang, F., Simon, M., Voisin, R., Pelletier, G. and Durand-Tardif, M. (2008) Natural variation in Arabidopsis thaliana as a tool for highlighting differential drought responses. PLoS ONE, 3, e1705.

Cetl, I. (1965) Racial differences in the number of days to appearance of the flower primordia, in the number of rosette leaves, and in the number of rosette leaves per day in Arabidopsis thaliana Heynh. Arabidopsis Information Service, http://www.arabidopsis.org/ais/1965/cetl—1965-aagmi.html.

Cetl, I., Dobrovolna, J. and Effmertova, E. (1965) Distribution of spring and winter types in the local populations of Arabidopsis thaliana (L.) Heynh. from various localities in Western Moravia. Arabidopsis Information Service, http://www.arabidopsis.org/ais/1965/cetl—1965-aagmh.html.

Effmertova, E. (1967) The behaviour of ‘summer annual’, ‘mixed’, and ‘winter annual’ natural populations as compared with early and late races in field conditions. Arabidopsis Information Service, http://www.arabidopsis.org/ais/1967/effme-1967-aagph.html.

Effmertova, E. and Cetl, I. (1966) The vernalization requirement of ‘winter-annual’ populations from Western Moravia. Arabidopsis Information Service, http://arabidopsis.org/ais/1966/effme-1966-aagnw.html.

Ehrenreich, I.M. and Purugganan, M.D. (2006) The molecular genetic basis of plant adaptation. Am. J. Bot. 93, 953–962.

Goss, E.M. and Bergelson, J. (2006) Variation in resistance and virulence in the interaction between Arabidopsis thaliana and a bacterial pathogen. Evolution, 60, 1562–1573.

Hauser, M.-T., Harr, B. and Schlötterer, C. (2001) Trichome distribution in Arabidopsis thaliana and its close relative Arabidopsis lyrata: molecular analysis of the candidate gene GLABROUS1. Mol. Biol. Evol. 18, 1754–1763.

Heyer, L.J., Kruglyak, S. and Yooseph, S. (1999) Exploring expression data: identification and analysis of coexpressed genes. Genome Res. 9, 1106–1115.

Johanson, U., West, J., Lister, C., Michaels, S., Amasino, R. and Dean, C. (2000) Molecular analysis of FRIGIDA, a major determinant of natural variation in Arabidopsis flowering time. Science, 290, 344–347.

Koornneef, M. and Meinke, D. (2010) The development of Arabidopsis as a model plant. Plant J. 61, 909–921.

Laibach, F. (1943) Arabidopsis thaliana (L.) Heynh. als Objekt für genetische und entwicklungsphysiologische Untersuchungen. Bot. Arch. 44, 439–455.

Lewandowska-Sabat, A.M., Fjellheim, S. and Rognli, O.A. (2010) Extremely low genetic variability and highly structured local populations of Arabidopsis thaliana at higher latitudes. Mol. Ecol. 19, 4753–4764.

Mauricio, R. and Rausher, M.D. (1997) Experimental manipulation of putative selective agents provides evidence for the role of natural enemies in the evolution of plant defense. Evolution, 51, 1435–1444.

Mauricio, R., Stahl, E.A., Korves, T., Tian, D., Kreitman, M. and Bergelson, J. (2003) Natural selection for polymorphism in the disease resistance gene Rps2 of Arabidopsis thaliana. Genetics, 163, 735–746.

McKay, J.K., Richards, J.H., Nemali, K.S., Sen, S., Mitchell-Olds, T., Boles, S., Stahl, E.A., Wayne, T. and Juenger, T.E. (2008) Genetics of drought adaptation in Arabidopsis thaliana. II. QTL analysis of a new mapping population, KAS-1 × TSU-1. Evolution, 62, 3014–3026.

Meyerowitz, E.M. (2001) Prehistory and history of Arabidopsis research. Plant Physiol. 125, 15–19.

Mitchell-Olds, T. and Schmitt, J. (2006) Genetic mechanisms and evolutionary significance of natural variation in Arabidopsis. Nature, 441, 947–952.

Nordborg, M., Hu, T.T., Ishino, Y. et al. (2005) The pattern of polymorphism in Arabidopsis thaliana. PLoS Biol. 3, 1289.

Novembre, J. and Slatkin, M. (2009) Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles. Evolution, 63, 2914–2925.

Picó, F.X., Méndez-Vigo, B., Martínez-Zapater, J.M. and Alonso-Blanco, C. (2008) Natural genetic variation of Arabidopsis thaliana is geographically structured in the Iberian peninsula. Genetics, 180, 1009–1021.

Platt, A., Horton, M., Huang, Y.S. et al. (2010) The scale of population structure in Arabidopsis thaliana. PLoS Genet. 6, e1000843.

Rédei, G.P. (1992) A heuristic glance at the past of Arabidopsis genetics. In Methods in Arabidopsis Research (Koncz, C., Chua, N.-H. and Schell, J., eds). River Edge, New Jersey: World Scientific Press Inc, pp. 1–15.

Reinholz, E. (1965) Arabidopsis thaliana (L.) Heynh. als Objekt für genetische und entwicklungsphysiologische Untersuchungen. Arabidopsis Information Service, http://www.arabidopsis.org/ais/1965/reinh-1965-aagld.html.

Röbbelen, G. (1965) The LAIBACH standard collection of natural races. Arabidopsis Information Service, http://www.arabidopsis.org/ais/1965/roebb-1965-xxxxx.html.

Shindo, C., Bernasconi, G. and Hardtke, C.S. (2007) Natural genetic variation in Arabidopsis: tools, traits and prospects for evolutionary ecology. Ann. Bot. 99, 1043–1054.

Somerville, C. and Koornneef, M. (2002) A fortunate choice: the history of Arabidopsis as a model plant. Nat. Rev. Genet. 3, 883–889.

Stahl, E.A., Dwyer, G., Mauricio, R., Kreitman, M. and Bergelson, J. (1999) Dynamics of disease resistance polymorphism at the Rpm1 locus of Arabidopsis. Nature, 400, 667–671.

Tian, D., Araki, H., Stahl, E., Bergelson, J. and Kreitman, M. (2002) Signature of balancing selection in Arabidopsis. Proc. Natl Acad. Sci. USA, 99, 11525–11530.

Tonsor, S.J., Alonso-Blanco, C. and Koornneef, M. (2005) Gene function beyond the single trait: natural variation, gene effects, and evolutionary ecology in Arabidopsis thaliana. Plant Cell Environ. 28, 2–20.

Törjék, O., Berger, D., Meyer, R.C., Müssig, C., Schmid, K.J., Sorensen, T.R., Weisshaar, B., Mitchell-Olds, T. and Altmann, T. (2003) Establishment of a high-efficiency SNP-based framework marker set for Arabidopsis. Plant J. 36, 122–140.
