STATISTICAL APPLICATIONS IN PLANT BREEDING

AND GENETICS

By

CARL ALAN WALKER

A dissertation submitted in partial fulfillment of

the requirements for the degree of

DOCTOR OF PHILOSOPHY IN CROP SCIENCE

WASHINGTON STATE UNIVERSITY

Department of Crop and Soil Sciences

MAY 2012

To the Faculty of Washington State University:

The members of the Committee appointed to examine the dissertation of CARL ALAN

WALKER find it satisfactory and recommend that it be accepted.

Kimberly Garland-Campbell, Ph.D., Chair

Fabiano Pita, Ph.D.

J. Richard Alldredge, Ph.D.

Richard Gomulkiewicz, Ph.D.

Daniel Skinner, Ph.D.

ACKNOWLEDGEMENT

I would like to thank my committee members for their advice and assistance with this research

and with writing this dissertation. I would like to thank all the members of both the Campbell

and Steber labs for their advice when I presented my work in lab meetings. I began the project

presented in Chapter 3 as part of a paid internship with Dow AgroSciences. I would like to

thank the members of the Dow AgroSciences Quantitative Genetics group for their assistance

during that internship, especially Kelly Robins who provided some initial programs and data. I

would also like to acknowledge Bruce Walsh, Rebecca Doerge, and Radu Totir for the valuable

advice they gave me at conferences where I presented my work. I would not have been able to conduct

this research without the funding for these projects by the Washington Grain Commission and

USDA project 5348-21000-023-00. Finally, I'd like to thank my parents for all their help getting

me this far and my wife Elizabeth for her help editing and moral support.

STATISTICAL APPLICATIONS IN PLANT BREEDING AND

GENETICS

ABSTRACT

by Carl Alan Walker, Ph.D.

Washington State University

May 2012

Chair: Kimberly Garland-Campbell

Statistical analysis has many applications in ensuring the validity and reproducibility of plant

breeding and genetics research. Crop plant germplasm collections are often too large to be of use

regularly. A core subset with fewer accessions can increase utility while maintaining most of the genetic

diversity of the complete collection. This study evaluated methods for selecting core subsets using sparse

data. Cores were selected by forming clusters of accessions based on distances estimated with phenotypic

data. Accessions were randomly selected relative to the number of accessions in each cluster. The

method using all the available data to calculate distances, average linkage clustering, and

sampling in proportion to the natural logarithm of cluster size produced the most diverse cores.

Evaluations of genotypes in varied environmental conditions are referred to as multiple

environment trials (MET) and often necessitate estimation of effects of genotypes within environments.

Empirical best linear unbiased predictions can provide more accurate estimates of these effects,

depending upon the mixed model used. An objective of this work was to simulate and analyze MET data

sets to determine which models provide the most accurate estimates in varied MET conditions. Simulated

MET were fit with mixed models with or without genetic relationship matrices (GRM) and with

structures of varying complexity used to model relationships among environments. The model that

included a GRM and a constant variance-constant correlation structure was the most accurate for the

largest number of scenarios. More complex models were the most effective for a smaller subset of

scenarios, most involving many genotypes and low experimental error.

Statistical analyses were applied in consultation with other researchers for two projects

studying Fusarium crown rot of wheat and one on cold tolerance of wheat. Heritability and

genetic correlations were calculated for Fusarium resistance assays in field, growth chamber, and

terrace bed settings. Factor analysis was used to estimate latent factors from field characteristic

variables, which were used as predictor variables in linear mixed models and generalized linear

mixed models. Cold tolerance among genotypes was assessed with logistic regression.

TABLE OF CONTENTS

ACKNOWLEDGEMENT
ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LITERATURE REVIEW
  CORE SUBSETS OF GERMPLASM COLLECTIONS
  MIXED MODELS FOR MULTIPLE ENVIRONMENT TRIALS
  HERITABILITY AND GENETIC CORRELATION
  DIMENSION REDUCTION FOR LINEAR MODELING
  EXTREME COLD TOLERANCE IN WHEAT
  REFERENCES
METHODS FOR SELECTING GERMPLASM CORE SUBSETS USING SPARSE PHENOTYPIC DATA
  ABSTRACT
  INTRODUCTION
  MATERIALS AND METHODS
  RESULTS
  DISCUSSION
    Conclusion
  APPENDIX
  REFERENCES
COMPARISON OF LINEAR MIXED MODELS FOR MULTIPLE ENVIRONMENT PLANT BREEDING TRIALS
  ABSTRACT
  INTRODUCTION
  METHODS
    Simulations
    Analyses
  RESULTS AND DISCUSSION
    Justification of Approach
    Choice of a Default Model
    Models for Specific Scenarios
  DISCUSSION
    Conclusions
  APPENDIX: REAL DATA AS A BASIS FOR SIMULATIONS
  REFERENCES
CONSULTING PROJECTS
  HERITABILITY AND GENETIC CORRELATION ANALYSES FOR FUSARIUM CROWN ROT RESISTANCE ASSAYS OF WHEAT MAPPING POPULATION
    Abstract
    Discussion of Statistical Methods
  LINEAR MODELING OF THE RELATIONSHIPS BETWEEN WHEAT FIELD CHARACTERISTICS AND FUSARIUM CROWN ROT OBSERVATIONS
    Abstract
    Discussion of Statistical Methods
  LOGISTIC REGRESSION ANALYSIS OF WHEAT COLD TOLERANCE TESTING
    Summary
    Discussion of Methods
  REFERENCES

LIST OF TABLES

Table 1. Measurement levels and missing value percentages of variables evaluated on the Triticum aestivum L. subsp. aestivum complete collection.

Table 2. Removal percentages by variable for simulating data sets with missing values by removing values from the "complete collection".

Table 3. Comparisons of core subset selection methods in terms of diversity of 1000 potential core subsets selected from 200 complete collections simulated with values removed at the rates given by set 1 (see Table 2) from accessions selected randomly from a uniform distribution.

Table 4. Comparisons of core subset selection methods in terms of diversity of 1000 potential core subsets selected from 200 complete collections simulated with values removed at the rates given by set 1 (see Table 2) from accessions selected as a contiguous group.

Table 5. Comparisons of core subset selection methods in terms of diversity of 1000 potential core subsets selected from 200 complete collections simulated with values removed at the rates given by set 2 (see Table 2) from accessions selected randomly from a uniform distribution.

Table 6. Comparisons of core subset selection methods in terms of diversity of 1000 potential core subsets selected from 200 complete collections simulated with values removed at the rates given by set 2 (see Table 2) from accessions selected as a contiguous group.

LIST OF FIGURES

Figure 1. Plot of cumulative means, over simulations, of median recovery of interquartile range, over 1000 potential core subsets per simulation. Simulations were generated by removing values from randomly chosen individual accessions with missingness rates given by set 1. The values of the means of all 200 simulations are shown in Table 3.

Figure 2. Plot of cumulative means, over simulations, of median recovery of interquartile range, over 1000 potential core subsets, ranked across methods within each simulation. Simulations were generated by removing values from randomly chosen individual accessions with missingness rates given by set 1. The mean ranks, over all 200 simulations, are shown in Table 3.

Figure 1. Means, over simulations, of model ranks, where models were ranked in terms of RMSEP within each simulation. All scenarios evaluated are included, and index denotes each scenario's position in the order. Scenarios are ordered CSA, CSAH, CSAVH, CSB, CSBH, CSBVH, Toep, ToepH, and then ToepVH, with the indices of the final scenarios of each group equal to 76, 154, 230, 304, 380, 456, 532, 608, and 682, respectively. Within each of these patterns, numbers of environments are ordered 5, 10, 20, and then 40 environments. Within each number of environments, the numbers of genotypes are ordered 25, 50, 100, and then 150 genotypes. Within each number of genotypes, the experimental designs are ordered RCBD, MAD, and then unreplicated designs. Within each design, error variances are ordered 0.5 then 2.0.

Figure 2. A standardized version of Figure 1, where models have been ranked within each scenario in terms of their mean ranks. The order of scenarios is the same as Figure 1.

Figure 3. The same as Figure 2, but only the models GRM_CorV and GRM_CorH. The order of scenarios is the same.

Figure 4. Equivalent to Figure 3, with only scenarios with high (2.0) error variance included. Scenarios are ordered CSA, CSAH, CSAVH, CSB, CSBH, CSBVH, Toep, ToepH, and then ToepVH, with the indices of the final scenarios of each group equal to 39, 78, 116, 154, 192, 230, 268, 306, and 343, respectively. Within each of these patterns, numbers of environments are ordered 5, 10, 20, and then 40 environments. Within each number of environments, the numbers of genotypes are ordered 25, 50, 100, and then 150 genotypes. Within each number of genotypes, the experimental designs are ordered RCBD, MAD, and then unreplicated designs.

Figure 5. Equivalent to Figure 3, with only scenarios with low (0.5) error variance included. Scenarios are ordered CSA, CSAH, CSAVH, CSB, CSBH, CSBVH, Toep, ToepH, and then ToepVH, with the indices of the final scenarios of each group equal to 37, 76, 114, 150, 188, 226, 264, 302, and 339, respectively. Within each of these patterns, numbers of environments are ordered 5, 10, 20, and then 40 environments. Within each number of environments, the numbers of genotypes are ordered 25, 50, 100, and then 150 genotypes. Within each number of genotypes, the experimental designs are ordered RCBD, MAD, and then unreplicated designs.

Figure 6. Equivalent to Figure 3, only including scenarios simulated with a compound symmetric pattern of relationships among environments. Scenarios are ordered CSA, then CSB, with the indices of the final scenarios of each group equal to 76 and 150, respectively. Within each of these patterns, numbers of environments are ordered 5, 10, 20, and then 40 environments. Within each number of environments, the numbers of genotypes are ordered 25, 50, 100, and then 150 genotypes. Within each number of genotypes, the experimental designs are ordered RCBD, MAD, and then unreplicated designs. Within each design, error variances are ordered 0.5 then 2.0.

Figure 7. Equivalent to Figure 3, only including scenarios simulated with a compound symmetric pattern of correlations among environments and heterogeneous variances of genotype effects within environments. Scenarios are ordered CSAH, then CSBH, with the indices of the final scenarios of each group equal to 78 and 154, respectively. Within each of these patterns, numbers of environments are ordered 5, 10, 20, and then 40 environments. Within each number of environments, the numbers of genotypes are ordered 25, 50, 100, and then 150 genotypes. Within each number of genotypes, the experimental designs are ordered RCBD, MAD, and then unreplicated designs. Within each design, error variances are ordered 0.5 then 2.0.

Figure 8. Equivalent to Figure 3, only including scenarios simulated with a compound symmetric pattern of correlations among environments and extremely heterogeneous variances of genotype effects within environments. Scenarios are ordered CSAVH, then CSBVH, with the indices of the final scenarios of each group equal to 76 and 152, respectively. Within each of these patterns, numbers of environments are ordered 5, 10, 20, and then 40 environments. Within each number of environments, the numbers of genotypes are ordered 25, 50, 100, and then 150 genotypes. Within each number of genotypes, the experimental designs are ordered RCBD, MAD, and then unreplicated designs. Within each design, error variances are ordered 0.5 then 2.0.

Figure 9. Equivalent to Figure 3, only including scenarios simulated with a Toeplitz pattern of correlations among environments. Scenarios are ordered Toep, ToepH, and then ToepVH, with the indices of the final scenarios of each group equal to 76, 152, and 226, respectively. Within each of these patterns, numbers of environments are ordered 5, 10, 20, and then 40 environments. Within each number of environments, the numbers of genotypes are ordered 25, 50, 100, and then 150 genotypes. Within each number of genotypes, the experimental designs are ordered RCBD, MAD, and then unreplicated designs. Within each design, error variances are ordered 0.5 then 2.0.

Figure 10. Equivalent to Figure 3, only including scenarios simulated with a Toeplitz pattern of correlations among environments, 100 or 150 genotypes, 5 to 20 environments, and low (0.5) error variance. Scenarios are ordered Toep, ToepH, and then ToepVH, with the indices of the final scenarios of each group equal to 14, 29, and 43, respectively. Within each of these patterns, numbers of environments are ordered 5, 10, and then 20 environments. Within each number of environments, the numbers of genotypes are ordered 100 and then 150 genotypes. Within each number of genotypes, the experimental designs are ordered RCBD, MAD, and then unreplicated designs.

Figure 11. Equivalent to Figure 3, only including scenarios simulated with 25 genotypes. Scenarios are ordered CSA, CSAH, CSAVH, CSB, CSBH, CSBVH, Toep, ToepH, and then ToepVH, with the indices of the final scenarios of each group equal to 24, 48, 72, 96, 120, 144, 168, 192, and 216, respectively. Within each of these patterns, numbers of environments are ordered 5, 10, 20, and then 40 environments. Within each number of environments, the experimental designs are ordered RCBD, MAD, and then unreplicated designs. Within each design, error variances are ordered 0.5 then 2.0.

Figure 12. Equivalent to Figure 3, only including scenarios simulated with MAD or unreplicated designs. Scenarios are ordered CSA, CSAH, CSAVH, CSB, CSBH, CSBVH, Toep, ToepH, and then ToepVH, with the indices of the final scenarios of each group equal to 50, 102, 154, 203, 255, 307, 357, 409, and 461, respectively. Within each of these patterns, numbers of environments are ordered 5, 10, 20, and then 40 environments. Within each number of environments, the numbers of genotypes are ordered 25, 50, 100, and then 150 genotypes. Within each number of genotypes, the experimental designs are ordered MAD, and then unreplicated designs. Within each design, error variances are ordered 0.5 then 2.0.

Figure 13. A standardized version of Figure 1, where only models not including GRM have been ranked within each scenario in terms of their mean ranks. The order of scenarios is the same.

CHAPTER 1

LITERATURE REVIEW

Crop science research relies on statistical methods to assist in making objective decisions based on

complex and subtle patterns in nature that are not always obvious from raw observations or observed

results from experiments. Among other tasks in crop science research, statistical methods may be used to

group genotypes based on phenotypic data, make accurate predictions about the future performance of

breeding material, estimate relationships from observational data, and test hypotheses from designed

experiments.

Core Subsets of Germplasm Collections

Crop plant germplasm collections are maintained to conserve genetic variation and to provide useful plant

material for researchers and plant breeders. An example is the collection of wheat (Triticum aestivum L.

subsp. aestivum) accessions maintained as part of the National Small Grains Collection of the USDA-

ARS National Plant Germplasm System (http://www.ars-grin.gov/npgs/index.html).

For many researchers and plant breeders, germplasm collections are often too large and too

lacking in descriptive data to be of practical use. A well-characterized core collection, or core subset

(these terms will be used interchangeably), that consists of a reduced number of accessions (usually about

10% of the total) can provide increased accessibility and utility while still maintaining most of the genetic

diversity of the complete collection (Brown, 1989). Users of core subsets generally seek a diverse sample

that varies for one or more characteristics. For example, Wang et al. (2010) evaluated a rice core subset

for resistance to the blast fungal disease and identified known and novel genetic sources for resistance.

These researchers utilized the core subset to access the genetic diversity of the complete collection

without needing to evaluate the large numbers of very similar accessions held in the complete collection.

For desired alleles that are evenly distributed throughout the whole collection, at any level of abundance,

a simple random sample of the complete collection is the most appropriate. This is because every

accession selected for the core subset would have an equal chance of having the specific allele. If desired

alleles are instead localized to certain parts of the collection, preferentially selecting a portion from each

heterogeneous group present in the complete collection increases the likelihood of selecting these

unevenly distributed alleles (Brown, 1989). For this reason, most researchers have constructed core

collections by grouping accessions and then selecting accessions within groups.
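
The reasoning above can be checked with a small calculation. The sketch below is illustrative only: the collection size, the number of accessions carrying the allele, the group size, and the number of accessions drawn from the group are all made-up values, not figures from any study cited here. It compares the chance of capturing at least one carrier under simple random sampling of the whole collection with the chance when a fixed number of accessions is drawn from the one group that holds all the carriers.

```python
from math import comb

def p_at_least_one(population, carriers, sample):
    """Chance that a random sample drawn without replacement holds at least one carrier."""
    p_none = comb(population - carriers, sample) / comb(population, sample)
    return 1.0 - p_none

# Hypothetical collection (assumed values): 1000 accessions, 5 of which carry
# the desired allele, and all 5 carriers sit in one small group of 50 accessions.
# A core of 100 accessions is wanted.

# Simple random sample of 100 accessions from the whole collection.
p_random = p_at_least_one(population=1000, carriers=5, sample=100)

# Stratified plan (assumed): 20 of the 100 core accessions come from the small group.
p_stratified = p_at_least_one(population=50, carriers=5, sample=20)

print(f"simple random sample: {p_random:.3f}")
print(f"stratified sample:    {p_stratified:.3f}")
```

With these assumed numbers the stratified plan gives the higher chance, because the small group is sampled far more heavily than its share of the collection. If the carriers were instead spread evenly across all groups, stratifying on those groups would offer no such advantage.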

A number of different methods and types of data have been used to group accessions and select

core collections. Passport data, i.e. the location of cultivation or collection, has been used to stratify the

complete collection, followed by selection from within each stratum. This technique was used to select a

core subset for the complete wheat collection described above (USDA ARS, National Genetic Resources

Program, 2009), and this method has also been used to develop other core collections (Skinner et al.,

1999; Huamán et al., 2000; Dahlberg et al., 2004; Yan et al., 2007). Other methods for selecting core

collections have included stratification based on geographic origin, followed by further grouping based on

cluster analysis of phenotypic traits (Basigalup et al., 1995; Rao and Rao, 1995; Igartua et al., 1998; Tai

and Miller, 2001; Upadhyaya et al., 2001, 2006; Mahalakshmi et al., 2006; Bhattacharjee et al., 2007;

Dwivedi et al., 2008). Stratification of collections has also been conducted using cluster analysis without

prior geographic grouping (Diwan et al., 1995; Franco et al., 1997, 1998, 1999, 2005; Grenier et al.,

2001a; Li et al., 2004; Anderson, 2005; Holbrook and Dong, 2005; Weihai et al., 2008; Upadhyaya et al.,

2008). A study which compared methods for selecting core subsets using relatively complete phenotypic

data demonstrated that selection based on clustering using those data was superior to selection based on

geographic origin alone (Diwan et al., 1995).

Researchers have also conducted cluster analysis based on genotypic data, either based on actual

genotyping (Franco et al., 2006; Wang et al., 2006; Balfourier et al., 2007; Escribano et al., 2008; Hao et

al., 2008) or predicted genotypic effects based on modeling of phenotypic data (Hu et al., 2000; Li et al.,

2004). Combinations of genotypic and phenotypic data have also been used to group accessions (Franco

et al., 2010). Grouping based on genotype data would be expected to better reflect the genetic

relationships among accessions. However, researchers are limited in the number of accessions that can be

evaluated and the depth of genotyping possible. Such limitations may prevent the selection of core

subsets from large collections based on genetic data or may result in cores that are not as diverse as those

selected based on non-genetic data.

The clustering method used and the data used in the clustering process have also varied. Choice

of clustering method determines the way in which variable data or distance calculations are used to group

accessions, and different method choices can result in dramatic differences in final grouping. Ward's

minimum variance method is one clustering method used by many researchers to construct cores (Franco

et al., 1997; Hu et al., 2000; Upadhyaya et al., 2001, 2006, 2008; Anderson, 2005; Holbrook and Dong,

2005; Reddy et al., 2005; Kang et al., 2006; Mahalakshmi et al., 2006; Bhattacharjee et al., 2007;

Dwivedi et al., 2008). Other clustering methods that have been used include unweighted pair-group

method using arithmetic average (UPGMA), also known as the average linkage method (Hu et al., 2000;

Huamán et al., 2000; Li et al., 2004; Franco et al., 2006; Weihai et al., 2008); complete linkage (Hu et al.,

2000); and the Ward-Modified Location Method (Franco et al., 1998, 1999, 2005). Authors have

constructed clusters based on a variety of phenotypic variables. In many cases these variables have been

uniformly quantitative, and authors often either used Euclidean distances or principal components to

determine relationships among accessions and construct clusters (Diwan et al., 1995; Igartua et al., 1998;

Holbrook and Dong, 2005; Kang et al., 2006; Bhattacharjee et al., 2007; Upadhyaya et al., 2008).

However, a smaller number of researchers have used both categorical and quantitative variables in cluster

analysis (Franco et al., 1997, 1998, 1999, 2005; Kroonenberg et al., 1997).
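
To make the average linkage (UPGMA) idea concrete, the following is a minimal sketch in plain Python. It is not the procedure of any cited study: the function name, the toy distance matrix, and the stopping rule (a fixed number of clusters) are assumptions chosen for illustration. At each step it merges the two clusters whose members have the smallest mean pairwise distance.

```python
def average_linkage(dist, n_clusters):
    """Agglomerative clustering with average linkage (UPGMA).

    dist is a full symmetric matrix (list of lists) of pairwise distances.
    Returns a list of clusters, each a list of row indices into dist.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean of all between-cluster pairwise distances.
                pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return clusters

# Toy example (assumed data): five accessions scored on one quantitative trait.
trait = [0, 1, 10, 11, 50]
dist = [[abs(p - q) for q in trait] for p in trait]
groups = average_linkage(dist, n_clusters=3)
print(groups)
```

Complete linkage would replace the mean of the pairwise distances with their maximum, which is one way different method choices lead to different final groupings. For real collections a library routine would normally be preferred over this quadratic-per-merge loop.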

Grouping via geographic information and/or cluster analysis serves two purposes. The first is to

aid in selecting a core with reduced redundancy as described above. The second benefit of grouping is

that it provides structure to the accessions and connections to the reserve collection, which is the set of

accessions from the complete collection that are not included in the core. If breeders find lines in the core

collection that are of interest, they can trace connections from these lines to sets of additional accessions

in the reserve with similar characteristics. Ideally these accessions will be genetically similar to the

accessions in the core, although this will depend on the effectiveness of the grouping. Miklas et al.

(1999) reported using core and reserve collections of common bean in such a way to discover sources of

white mold resistance beyond a set found in a core subset.

Following the stratification and clustering of the complete collection, a set of accessions is chosen

from each group and compiled into a core. Generally accessions are chosen at random from each stratum;

however, some researchers have suggested that direct, or only partially random, selection of all or a

portion of the accessions in a core can increase diversity (Basigalup et al., 1995; Skinner et al., 1999;

Huamán et al., 2000; Rodiño et al., 2003; Yan et al., 2007; Weihai et al., 2008). Several core subsets

have been selected using proportional sampling, a random selection method that determines quantities of

accessions from each group in proportion to the number of accessions in each group (Basigalup et al.,

1995; Upadhyaya et al., 2001, 2006, 2008; Grenier et al., 2001b; Dahlberg et al., 2004; Holbrook and

Dong, 2005; Reddy et al., 2005; Bhattacharjee et al., 2007; Dwivedi et al., 2008). This proportional

sampling method is the most effective choice if the numbers of accessions in each group in the complete

collection perfectly reflect the true genetic diversity of all the genotypes in the world that could fit in that

group. In reality, the selection of accessions for germplasm collections may differ markedly from such

perfection, largely due to constraints on collection activities. Sampling methods that take relatively fewer

samples from larger clusters reduce redundancy and increase variability, as larger clusters tend to have

greater redundancy among accessions (Brown, 1989). Common implementations of such sampling

strategies include selection in proportion to the square root (Huamán et al., 2000; Wang et al., 2006) and

natural logarithm of group size (Grenier et al., 2001b; Yan et al., 2007). Selecting equal numbers of

accessions from each group, regardless of the size of the group, is the most extreme method for

attempting to reduce redundancy. Rather than basing sampling strategy on the relationships between

group sizes and diversity, some sampling methods attempt to increase diversity by selecting more

accessions from groups with greater relative diversity. An example of this is selecting sample numbers

relative to the mean distance among accessions in each cluster (Franco et al., 2005).
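
The allocation rules described above (proportional, square root, natural logarithm, and equal numbers per group) differ only in the weight given to each group's size. The sketch below shows one way to turn those weights into whole numbers of accessions per cluster. The function name, the largest-remainder rounding, and the example cluster sizes are assumptions made for illustration and are not taken from the cited studies.

```python
import math

def allocate(cluster_sizes, core_size, rule="proportional"):
    """Number of accessions to draw from each cluster under a given allocation rule."""
    weight = {
        "proportional": lambda n: n,
        "sqrt": lambda n: math.sqrt(n),
        "log": lambda n: math.log(n),  # note: a cluster of size 1 gets weight 0
        "equal": lambda n: 1,
    }[rule]
    weights = [weight(n) for n in cluster_sizes]
    total = sum(weights)
    quotas = [core_size * w / total for w in weights]
    counts = [math.floor(q) for q in quotas]
    # Largest-remainder rounding (an assumed choice) so counts sum to core_size.
    by_remainder = sorted(range(len(quotas)),
                          key=lambda i: quotas[i] - counts[i], reverse=True)
    for i in by_remainder[: core_size - sum(counts)]:
        counts[i] += 1
    return counts

# Hypothetical clusters of 800, 150, and 50 accessions; core of 100 accessions.
sizes = [800, 150, 50]
for rule in ("proportional", "sqrt", "log", "equal"):
    print(rule, allocate(sizes, 100, rule))
```

Moving from the proportional rule to the square root, logarithm, and equal rules shifts progressively more of the core away from the largest cluster. This sketch does not cap a cluster's allocation at its size, which a practical implementation would need to do for small clusters.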

One aspect of the core collections developed in the studies referenced above is that they were

constructed using complete or nearly complete data sets of geographical, phenotypic, or genotypic data.

Unfortunately, many germplasm collections only have complete, or even mostly complete, data for a few

variables. Grouping based only on a few variables is unlikely to maintain the allelic diversity of genes

that affect other traits. Therefore, it is desirable to utilize all the variables for which we have even limited

information. One method for doing so is to use Gower's distance (Gower, 1971), as this metric allows the

calculation of distances between accessions based on variables, of any measurement level (nominal, ordinal,

interval, or ratio), for which both accessions have values, and is not affected by variables for which either

accession has a missing value.
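
A minimal sketch of this kind of distance is given below, assuming the common form in which the distance is the mean per-variable dissimilarity over the variables observed in both accessions. The function name, the variable labels, and the example values are hypothetical. Quantitative variables are scaled by a supplied range, nominal variables use simple matching, and ordinal variables would have to be coded as numeric ranks, which is a simplification.

```python
def gower_distance(a, b, kinds, ranges):
    """Mean per-variable dissimilarity over variables observed in BOTH accessions.

    a, b   : lists of values, with None marking a missing value
    kinds  : "numeric" or "nominal" for each variable
    ranges : range (max - min over the collection) for numeric variables
    Returns None when the two accessions share no observed variables.
    """
    total, used = 0.0, 0
    for x, y, kind, rng in zip(a, b, kinds, ranges):
        if x is None or y is None:
            continue  # a missing value drops this variable for this pair only
        if kind == "numeric":
            total += abs(x - y) / rng if rng else 0.0
        else:  # nominal: 0 if the categories match, 1 otherwise
            total += 0.0 if x == y else 1.0
        used += 1
    return total / used if used else None

# Hypothetical accessions scored on four variables (assumed names and values):
# plant height, kernel colour, days to heading, and an awn score coded as ranks.
kinds = ["numeric", "nominal", "numeric", "numeric"]
ranges = [40.0, None, 5.0, 4]
acc_1 = [10.0, "red", None, 2]
acc_2 = [20.0, "red", 3.0, 4]
d = gower_distance(acc_1, acc_2, kinds, ranges)
print(d)
```

Because the divisor is the number of jointly observed variables, pairs of accessions with many missing values are compared on fewer variables, so their distances rest on less information than those of well-characterized pairs.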

The goal of a core subset is to represent the diversity of a complete collection with a

reduced number of accessions. Therefore, the best method for selecting core subsets is the one

that results in the most diversity for a given number of accessions. A wide variety of tests and

calculations have been used to evaluate diversity of core subsets and to compare them to

complete collections under the assumption that core subsets and complete collections are

independent samples of some larger population. These methods have included chi-square tests

of independence of collection type and country of origin, marker alleles, and nominal phenotypic

variables (Tai and Miller, 2001; Upadhyaya et al., 2001, 2006, 2008; Grenier et al., 2001b;

Reddy et al., 2005; Mahalakshmi et al., 2006; Bhattacharjee et al., 2007; Agrama et al., 2009).

Differences between the distribution of quantitative variables for proposed core subsets and

complete collections have been tested using the Levene test and the Newman-Keuls test

(Upadhyaya et al., 2001, 2006, 2008; Grenier et al., 2001b; Reddy et al., 2005; Kang et al., 2006;

Bhattacharjee et al., 2007; Agrama et al., 2009). However, the validity of statistical tests of

differences between complete collections and core subsets is questionable, since these are not

independent samples in two respects. First, the complete collection is not a random sample of all

the germplasm for the species in the collection (due to limitations in collection activities), and so

statistics calculated on the complete collection should not be considered estimates of the


population of all germplasm for that species. Instead, the complete collection should be

considered a population of interest for which we can calculate exact parameter values. Second,

even if the complete collection is incorrectly considered a random sample, the core subset is not

independently sampled; it is a subset of the observations in the complete collection, violating an

assumption of these inference tests.

Aside from proper statistical testing, the other consideration when evaluating core subsets

is how distributions of variables in the core subset should reflect the complete collection. Many

researchers have sought core subsets that match the mean values observed in the complete

collection (Hu et al., 2000; Upadhyaya et al., 2006, 2008; Weihai et al., 2008; Parra-Quijano et

al., 2011). However, achieving the same mean values as the original collection does nothing to
further the goal of increased diversity; in fact, it can result in selection against more diverse core

subsets. Unless the distributions of quantitative variables in the complete collection are

symmetrical, a core subset with reduced redundancy will have a mean that is shifted toward the

skew. Selecting against such a change will favor methods that either omit extreme values on the

skewed end or reduce redundancy less.
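A toy calculation illustrates the shift. The trait values below are hypothetical, and keeping one accession per distinct value is only a crude stand-in for redundancy reduction, not a method used in the studies cited.

```python
from statistics import mean

# Hypothetical right-skewed trait: many redundant low values, a few high ones
collection = [1.0] * 8 + [2.0] * 4 + [5.0, 9.0]

# Crude redundancy reduction: keep one accession per distinct value
core = sorted(set(collection))

collection_mean = mean(collection)
core_mean = mean(core)
print(round(collection_mean, 2), core_mean)
```

The core retains the full range of the collection, yet its mean moves toward the long tail. A criterion that penalizes this difference in means would therefore favor a core that keeps more of the redundant low values.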

Mixed Models for Multiple Environment Trials

Evaluations of genotypes in varied environmental conditions are referred to as multi-environment trials

(MET), and are used in advanced stages of plant breeding programs to identify genotypes with superior

performance across environments and within specific environments or sets of environments. Yield data

from MET often show genotype by environment interactions (G×E). That is, genotypes respond
differently to different environments. Genotype by environment interactions can occur for all response
variables measured in MET, such as biomass or test weight, and can be analyzed in the same way as yield.

We will use yield as our example response variable.


When G×E occurs, the average yield of a genotype across all environments is no longer sufficient

information upon which to base selections. Genotype by environment interactions occur in two forms.

The less problematic form is changes of scale or interaction without rank changes. As the name implies,

this occurs when the absolute yield differences among genotypes are not consistent from one environment

to another, but the rankings of the genotypes remain constant. With this type of G×E, there is still a

genotype that is superior to all the others, but this difference may not be significant in all environments.

The second form of G×E is cross-over interaction, occurring when genotypes have different rankings in

different environments. This necessitates evaluating genotypes in each environment separately. Breeders

will then often select those genotypes with consistent relatively high performance across environments.

Observed genotype yields in particular environments can be thought of as a sum of pattern and noise,

where pattern is the yield expected whenever that genotype is grown in that environment and noise is

defined as the deviation of the particular observation from the true pattern. The goal of statistical

modeling is to find a model that explains the true pattern of genotype responses in each environment, and

there are many methods that have been devised to do so.

A traditional approach to the analysis of G×E is a two-way analysis of variance (ANOVA) model
where genotype, environment, and their interaction are treated as fixed effects with the model:

$$y_{ijk} = \mu + g_i + e_j + (ge)_{ij} + \varepsilon_{ijk}, \tag{1}$$

where $y_{ijk}$ is the yield (or other response variable) of the kth replicate of the ith genotype in the jth
environment, μ is the overall mean, $g_i$ is the fixed effect of the ith genotype, $e_j$ is the fixed effect of the jth
environment, $(ge)_{ij}$ is the interaction between the ith genotype and the jth environment, and $\varepsilon_{ijk}$ is the
experimental error associated with the ijkth observation; $i = 1, \ldots, N_g$, $j = 1, \ldots, N_e$, $k = 1, \ldots, N_r$. In this approach, a
significant interaction necessitates the estimation of G×E effects using the simple mean across replicates
of each genotype within each environment. These are referred to as the cell means. The major
disadvantage of this fixed effects approach is that these estimates are usually based on very little data
(usually two to four data points, depending on the number of replicates) and so are less predictively
accurate than alternative estimators. This approach cannot be used to estimate G×E effects when
genotypes are not replicated within environments, since the effects of G×E and experimental error are
confounded. Confounding also occurs with replication if all replicates, or all but one, of any combination
of genotype and environment are missing.

Various alternatives have been shown to be superior to this traditional approach, including

approaches with a fixed effects framework. One of the earliest approaches was joint regression analysis

or the Finlay-Wilkinson model (Yates and Cochran, 1938; Finlay and Wilkinson, 1963) where a regressor

is estimated for each genotype on the mean of all genotypes in each environment. More recently, the

additive main effects and multiplicative interaction (AMMI; Gauch and Zobel, 1988; Gauch, 1988) and

sites regression (SREG; Cornelius and Crossa, 1999) model families demonstrated improved predictive

accuracy over the cell means. These two model families use sums of multiplicative terms, derived from
singular value decomposition, to replace $(ge)_{ij}$ in the case of AMMI, or $g_i + (ge)_{ij}$ in the case of SREG. The cell
means model can be considered a case of the AMMI model where all possible multiplicative terms are
included in the model. The AMMI and SREG models have been shown to be roughly equivalent in

terms of predictive accuracy (Cornelius and Crossa, 1999). Like the analysis of G×E in a fixed effects

ANOVA, these models cannot be used when data from any genotype and environment combination is

missing.
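The relationship between AMMI and the cell means can be sketched with NumPy. This is a minimal illustration that assumes a complete, balanced table of genotype by environment means; the function name `ammi_fit` and the data are hypothetical, and no attempt is made to choose the number of terms by cross-validation or testing.

```python
import numpy as np

def ammi_fit(cell_means, n_terms):
    """Fitted values from an AMMI model with n_terms multiplicative terms."""
    Y = np.asarray(cell_means, dtype=float)        # rows: genotypes, columns: environments
    grand = Y.mean()
    g = Y.mean(axis=1, keepdims=True) - grand      # genotype main effects
    e = Y.mean(axis=0, keepdims=True) - grand      # environment main effects
    resid = Y - grand - g - e                      # interaction residuals (ge)_ij
    U, s, Vt = np.linalg.svd(resid, full_matrices=False)
    interaction = (U[:, :n_terms] * s[:n_terms]) @ Vt[:n_terms]
    return grand + g + e + interaction

Y = np.array([[5.0, 6.0, 7.0],
              [4.0, 7.0, 5.0],
              [6.0, 5.0, 9.0],
              [3.0, 6.0, 4.0]])   # hypothetical cell means

additive = ammi_fit(Y, 0)   # main effects only
ammi1 = ammi_fit(Y, 1)      # one multiplicative term
full = ammi_fit(Y, 2)       # all terms: reproduces the cell means
print(np.round(ammi1, 2))
```

With zero terms the fit is the additive model, and with all terms it returns the cell means exactly; intermediate models discard the smallest singular values on the premise that they are mostly noise. The call to `np.linalg.svd` requires a complete table, which is why these models cannot be fit directly when any genotype and environment combination is missing.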

Another approach is to use best linear unbiased prediction (BLUP) of random effects from a
two-way mixed ANOVA model specified as in (1), but with genotypes or environments and G×E treated as

random effects. This model can be specified in matrix notation as:

y = Xβ + Zγ + e, (2)

where y is the vector of observations, β and γ are the vectors of fixed and random effects, respectively, X

and Z are design matrices, and e is the vector of experimental error. The random effects vector, γ,

consists of a subvector for genotype (and/or environment) main effects and a subvector for G×E effects.

Alternatively, γ can be limited to only G×E effects. The random effects are assumed to follow a


multivariate normal distribution with mean of 0 and a variance-covariance matrix G. Hill and Rosenburg
(1985) set G = σ²I, that is, constant variance and no covariance. Hill and Rosenburg determined that the
use of BLUP improved predictive accuracy over cell means, which they attributed to its shrinkage
property. That is, the predictions from the BLUP method are shrunk towards the mean, but the bias this
introduces is offset by a reduction in variance (Piepho et al., 2008). Assuming that G = σ²I does not

allow G to reflect any relationships among environments. Additionally, it does not take into account

relationships among genotypes known from pedigree or marker data. This limits the accuracy with which

estimates of G can reflect reality, and thus limits the accuracy of predicted breeding values, because

information from correlated environments is not included in the BLUP calculations.
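The shrinkage property is easiest to see in the simplest case. The sketch below assumes a balanced one-way random effects model with independent effects (G = σ²I), known variance components, and an equal number of replicates per cell; it is not the model of any particular study cited here, and the function name and numbers are hypothetical.

```python
def blup_one_way(cell_means, var_u, var_e, n_reps):
    """BLUPs of independent random effects from balanced cell means.

    Assumes y_ik = mu + u_i + e_ik with var(u_i) = var_u and var(e_ik) = var_e.
    """
    mu = sum(cell_means) / len(cell_means)
    # Shrinkage factor: the share of the variance of a cell mean that is due to u_i
    shrink = var_u / (var_u + var_e / n_reps)
    return [shrink * (m - mu) for m in cell_means]

means = [4.0, 5.0, 6.0]
print(blup_one_way(means, var_u=1.0, var_e=2.0, n_reps=2))
```

The fixed effects estimates of the same deviations would be -1, 0, and 1; the BLUPs are half as large because, with these values, only half of the variance of a cell mean is attributable to the effect being predicted. More replicates or a larger effect variance move the shrinkage factor toward one.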

Further, the model used by Hill and Rosenburg (1985) assumes that genotypes are independent,

but in most MET at least a portion of the genotypes are related and therefore would be expected to show

some correlation in their effects. Breeders keep detailed pedigrees of the lines in their breeding programs

and so are able to predict the degree of additive genetic relationship among genotypes by calculating a

genetic (also numerator, kinship or additive) relationship matrix (given the symbol A) using the

coefficient of coancestry (Mrode and Thompson, 2005). Henderson (1973) proposed a method for using

pedigree information, through the inverse of A, to calculate BLUP from mixed models of dairy cattle

sires. Following Henderson's (1976) description of a method for quickly calculating A⁻¹ without first
generating A, animal breeders began using pedigree information with BLUP to make selections.
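For small pedigrees, A itself can be built directly with the recursive tabular method, in which each element equals twice the coefficient of coancestry. The sketch below is an illustration under stated assumptions: founders are unrelated and non-inbred, parents are listed before their offspring, and selfing is coded by giving the same individual as both parents. It builds A explicitly and is not Henderson's rapid method for obtaining the inverse; the function name and pedigree are hypothetical.

```python
def relationship_matrix(pedigree):
    """Additive relationship matrix from a list of (parent1, parent2) index pairs.

    Parents must appear before offspring; None marks an unknown parent.
    """
    n = len(pedigree)
    A = [[0.0] * n for _ in range(n)]
    for i, (p1, p2) in enumerate(pedigree):
        # Diagonal: 1 + inbreeding coefficient = 1 + half the relationship between parents
        both_known = p1 is not None and p2 is not None
        A[i][i] = 1.0 + (0.5 * A[p1][p2] if both_known else 0.0)
        # Off-diagonal: average of the relationships of j with the parents of i
        for j in range(i):
            a = 0.0
            if p1 is not None:
                a += 0.5 * A[j][p1]
            if p2 is not None:
                a += 0.5 * A[j][p2]
            A[i][j] = A[j][i] = a
    return A

# Individuals 0 and 1 are founders, 2 and 3 are full sibs, 4 is a cross of 2 and 3
pedigree = [(None, None), (None, None), (0, 1), (0, 1), (2, 3)]
for row in relationship_matrix(pedigree):
    print(row)
```

The full sibs have a relationship of 0.5, and the progeny of the full-sib mating has a diagonal element of 1.25, reflecting an inbreeding coefficient of 0.25.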

Animal breeders now routinely use BLUP with pedigree data, but adoption by plant breeders has

been slower. Examples of use by plant breeders include selection of soybean parents and crosses (Panter

and Allen, 1995a; b), and selection of parents in peanuts (Pattee et al., 2001). Molecular marker data can

also be used to generate a genetic relationship matrix (Bernardo, 1994, 1995; Villanueva et al., 2005;

Hayes et al., 2009). Such genetic relationship matrices are estimates of realized relationship matrices

which reflect the way the proportion of the genome that is identical by descent between two individuals

can differ from the value predicted by the pedigree due to Mendelian sampling, especially if multiple

rounds of selfing have occurred after a cross. Bernardo (1996) used coefficients of coancestry calculated


from pedigrees to calculate BLUP and observed high predictive accuracy as measured by
cross-validation. Piepho et al. (2008) review and provide examples of BLUP based on pedigree data without

using the coefficient of coancestry.

The predictive accuracy of mixed models may also be improved by increasing the complexity of

the variance-covariance matrix of the random G×E effect (Gge) beyond an identity matrix. Note that Gge

is a submatrix of G and is equal to G when random main effects are not included in the model. Smith et

al. (2001) suggested that Gge can often be assumed separable such that Gge = Ge ⨂ Ig, where Ig is an

identity matrix. The specific year and location combinations that are used as environments in MET can

easily be thought of as random samples from a population of possible environments, but these

environments do not behave independently in most MET. Instead, groups of environments have similar

conditions and genotype responses. For example, locations in close proximity would be expected to have

similar weather, resulting in more favorable yields for similar genotypes. In that case, Ge with non-zero

covariances between environments may be beneficial. Additionally, it may be more accurate to model

responses in each environment with a different variance (heterogeneous variances). The most general

way of doing so is to allow separate parameters for each variance and covariance. This is referred to as

an unstructured matrix and it has a total of j × (j + 1)/2 parameters, where j is the number of

environments. Unfortunately, this means that the number of parameters to be estimated increases at a
greater than linear rate with the number of environments, so the use of an unstructured matrix is often

impossible for large numbers of environments and may be unstable for fewer environments. In order to

reduce the number of parameters that must be estimated, various simpler structures for Ge can be fit. For

instance, we may assume no covariance among environments, but allow for heterogeneous variances

among environments; a diagonal structure. Alternatively, one can fit the same variance to all

environments and a single covariance to all pairs of environments, referred to as a compound symmetric

structure. When used to evaluate faba bean MET datasets, Piepho (1994) determined that the BLUP

predictions, using a compound symmetric structure for Ge, were more predictively accurate than those of


any AMMI family model, including the cell means model. Many other more complex structures can be

used to model Ge.
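The trade-off between flexibility and the number of parameters can be tabulated directly. The helper names below are hypothetical and the matrices are plain nested lists; the sketch only builds the structures and counts their parameters, and does not fit them to data.

```python
def n_parameters(structure, n_env):
    """Number of variance and covariance parameters in a structure for n_env environments."""
    counts = {
        "identity": 1,                              # one common variance, no covariance
        "diagonal": n_env,                          # one variance per environment
        "compound_symmetric": 2,                    # one variance, one covariance
        "unstructured": n_env * (n_env + 1) // 2,   # every variance and covariance
    }
    return counts[structure]

def compound_symmetric(n_env, variance, covariance):
    """Same variance for all environments, same covariance for all pairs."""
    return [[variance if i == j else covariance for j in range(n_env)]
            for i in range(n_env)]

def diagonal(variances):
    """Heterogeneous variances with no covariance among environments."""
    n = len(variances)
    return [[variances[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

for n_env in (5, 10, 20):
    print(n_env, {s: n_parameters(s, n_env)
                  for s in ("diagonal", "compound_symmetric", "unstructured")})
```

At 20 environments the unstructured matrix already requires 210 parameters, against 20 for the diagonal structure and 2 for compound symmetry.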

One such structure is the factor analytic model (FA) which is a mixed model version of the

multiplicative model family proposed by Gollob (1968) and Mandel (1971). The fixed effects version of

this model is usually referred to as the AMMI model family, which was mentioned earlier. The FA

structure provides a compromise between the diagonal and unstructured matrices by finding a few

common factors that best explain correlations between environments and then fitting the residual

variation for each environment after the common factors are fit. Piepho (1997) used this model to analyze

MET using the form:

$$y_{ij} = g_i + b_i e_j + \varepsilon_{ij},$$

where $y_{ij}$ is the mean observed yield (or response) for the ijth genotype and environment combination, $g_i$ is
the fixed main effect of the ith genotype, $b_i$ is a score for genotype i, $e_j$ is a main effect for environment j,
and $\varepsilon_{ij}$ is the error for the ijth genotype and environment combination, which includes both experimental
error and unexplained interaction. He considered environmental effects and genotype scores random, so
$b_i$, $e_j$, and $\varepsilon_{ij}$ are independently normally distributed with mean zero and variances of $\sigma^2_b$, $\sigma^2_e$, and $\sigma^2$,
respectively. The variance-covariance matrix of genotype means in environment j ($y_j$) is equal to
$\sigma^2_e J + \lambda\lambda' + D$, where J is a square matrix of ones, λ is a vector with elements equal to $\alpha_i\sigma_\beta$ ($\alpha_i$ must be estimated
along with the variance components), and D is equal to $\sigma^2 I$. This model can be expanded to include

multiple factors in the interaction term. When Piepho fit the model to a MET of 10 wheat varieties in 17

environments, it had a similar -2 log-likelihood and fewer parameters compared to a generalized version

of the Finlay-Wilkinson model, and so was considered superior.

Smith et al. (2001) also fit a model that included a factor analytic structure for a
variance-covariance matrix using the basic matrix formulation of the mixed model given in (2), and modeling the

variance-covariance matrix of the G×E interaction as separable such that Gge = Ge ⨂ Ig as described


above. A factor analytic model for Ge was modeled including the random effect of genotypes within

environments (γ) as:

$$u_g = (\Lambda \otimes I)f + \delta,$$

where Λ is a matrix whose columns are known as loadings, f is a vector that can be partitioned into

factors corresponding to the columns of Λ, and δ is a vector of residuals (or specific variances). The

vectors f and δ have independent multivariate normal distributions with mean 0, and variance-covariances

of I and Ψ ⨂ I, respectively. The variance-covariance matrix for γ is:

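$$\begin{aligned}
\operatorname{var}(\gamma) &= (\Lambda \otimes I)\operatorname{var}(f)(\Lambda' \otimes I) + \operatorname{var}(\delta) \\
&= (\Lambda\Lambda' + \Psi) \otimes I.
\end{aligned}$$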

Smith et al. (2005) showed that the model used by Piepho (1997) can be specified in a matrix algebraic

form similar to that used by Smith et al. (2001). However, Smith et al. (2001) considered genotypes to be

random effects with fixed effects for environments, did not include a main effect for genotype in some

models, used heterogeneous specific variances (Ψ as opposed to Piepho's σ²I), and included a spatial

model for within-field variation. Both of these models assumed that genetic effects were independent.

Fitting this model to a MET with 172 genotypes in 7 locations, Smith et al. found that a two factor model

fit the data nearly as well as an unstructured Ge as judged by a likelihood ratio test.
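The factor analytic structure and its Kronecker expansion can be sketched numerically. The loadings and specific variances below are hypothetical rather than estimates from any of the studies discussed, and the parameter count assumes the usual k(k − 1)/2 constraints needed to fix the rotation of the loadings when k > 1.

```python
import numpy as np

def fa_covariance(loadings, specific_variances):
    """Environment covariance matrix Lambda Lambda' + Psi for a factor analytic structure."""
    L = np.asarray(loadings, dtype=float)
    if L.ndim == 1:
        L = L[:, None]                      # a single factor given as a flat list
    return L @ L.T + np.diag(specific_variances)

def fa_n_parameters(n_env, n_factors):
    """Loadings plus specific variances, minus the rotational constraints."""
    return n_env * n_factors + n_env - n_factors * (n_factors - 1) // 2

# Hypothetical one-factor structure for three environments
G_e = fa_covariance([0.9, 0.7, 0.4], [0.2, 0.3, 0.5])

# Separable G x E covariance for four unrelated genotypes,
# with effects ordered as genotypes within environments
G_ge = np.kron(G_e, np.eye(4))

print(np.round(G_e, 2))
print(G_ge.shape)
print(fa_n_parameters(7, 2), 7 * 8 // 2)    # two-factor FA versus unstructured, 7 environments
```

For seven environments, a two-factor structure has 20 parameters under this count, compared with 28 for the unstructured matrix. Replacing `np.eye(4)` with an additive relationship matrix would give the form used when pedigree information is included.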

As described above, researchers have attempted to improve the predictive accuracy of analyses of

MET by incorporating either pedigree data or FA structures into models of G×E variance. Crossa et al.

(2006) and Oakey et al. (2007) went one step further and combined pedigree data with a FA structure for

environmental covariances. Crossa et al. modeled the variance covariance matrix of effects of genotypes

within environments as:

$$\operatorname{var}(\gamma) = \Sigma_{g1} \otimes A,$$

where A is the additive relationship matrix, and Σg1 is a structure that models genetic variance and

covariance across environments. Crossa et al. used multiple structures ranging from independent and

identical variances to FA structures. Oakey et al. used a model similar to Smith et al. (2001), except for a


different model for spatial effects and a different model for the variance-covariance matrix of effects of

genotypes within environments:

$$\operatorname{var}(\gamma) = G_a \otimes A + G_d \otimes D + G_i \otimes I,$$

where A and D are the additive and dominance relationship matrices, and Ga, Gd, and Gi are structures

that model genetic variance and covariance across environments specific to additive, dominance, and

residual non-additive effects. Oakey et al. fit models with diagonal, compound symmetric, or FA

structures for Ga, Gd, and Gi. Crossa et al. (2006) used a similar model. Kelly et al. (2009) utilized a

similar model, including the use of an additive relationship matrix and a FA structure for Ga, but did not

fit dominance effects. These authors all found that models with FA structures resulted in better AIC

scores than simpler or more complex structures when fit to real data sets.

Heritability and Genetic Correlation

Heritability is a useful concept in both breeding and genetics, but the use of the word heritability can

cause confusion due to varying definitions and methods of calculation. Heritability is the proportion of

phenotypic variance due to genetic effects. When calculating broad-sense heritability (symbolized H² or
H), these are total genetic effects; whereas for narrow-sense heritability (h²), we only consider additive
genetic effects (Falconer and Mackay, 1996):

$$H^2 = \frac{\sigma^2_G}{\sigma^2_P},$$

where $\sigma^2_G$ is the total genetic variance and $\sigma^2_P$ is the phenotypic variance, and

$$h^2 = \frac{\sigma^2_A}{\sigma^2_P},$$

where $\sigma^2_A$ is the additive genetic variance and $\sigma^2_P$ is the phenotypic variance.

Additive genetic variance is a component of total genetic variance for multi-loci traits, which can

be partitioned into additive, dominance, and epistatic variance. Additive genetic variance measures the

variance attributable to the individual effects of single alleles. Dominance effects are the deviations from


the additive effect of each allele that are observed to occur in heterozygous individuals, due to the

interactions between alleles at a locus. Epistasis refers to the interactions between genes or loci that

deviate from simple additive effects. Epistatic interactions can be among additive and/or dominance

effects and can be among two or more loci. Although the occurrence of epistatic effects is widely

acknowledged, estimation of epistatic effects is often limited by statistical power or experimental design.

Partly for this reason, epistasis is often assumed to be zero or negligible. When epistasis is estimated, the

number of interacting loci is often limited to two and dominance interactions may not be evaluated (Reif

et al., 2009; Duthie et al., 2010). If parents and offspring can be evaluated in the same environment,

covariance between parents and offspring and covariance between full-sib offspring can be used to

estimate additive, dominance, and additive × additive variance (Hallauer and Filho, 1981 pp. 49–52).
Diallel mating designs, with crosses between all pairs of n parents, can be used to estimate the additive
and dominance variances ($\sigma^2_A$ and $\sigma^2_D$),
assuming no epistasis (Hallauer and Filho, 1981 pp. 52–60). Theoretically, crosses between three and

four parents can be used to isolate additive, dominance, and epistatic variance components, but their use is limited due to the complexity in

obtaining parents and crosses (Hallauer and Filho, 1981 pp. 83–88). Epistatic effects and the other

variance components can be easily estimated for populations in Hardy-Weinberg equilibrium with other

strict assumptions, but in reality, these assumptions are often violated. This makes estimation of variance

components more difficult and may necessitate other assumptions such as no epistasis (Lynch and Walsh,

1998 pp. 141–170).

Beyond broad and narrow-sense definitions, the definition of heritability must be further

specialized for specific uses. In animal breeding and evolutionary genetics, the individual is the unit of

interest; therefore, phenotypic and genetic variance among individuals is used in calculations (Visscher et

al., 2008). In plant breeding, many individuals with the same genotype can be produced, allowing for

replicated testing. Selection can then occur based on means of individuals with the same genotype. This

situation changes the definition of heritability so that the phenotypic variance is adjusted based on the unit

of selection and response (Holland et al., 2003). For example, if genotypes are selected based on means

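over e environments and r replicates within environments, broad-sense heritability is:

$$H^2 = \frac{\sigma^2_G}{\sigma^2_G + \dfrac{\sigma^2_{GE}}{e} + \dfrac{\sigma^2_\varepsilon}{er}},$$

where $\sigma^2_G$ is the total genetic variance, $\sigma^2_{GE}$ is the variance of genotype by environment interactions, and
$\sigma^2_\varepsilon$ is the residual error variance
(Piepho and Möhring, 2007).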
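The effect of the number of environments and replicates on this quantity can be explored with a short function. The sketch assumes the usual entry-mean form, in which the G×E variance is divided by e and the residual variance by er; the function name and variance components are hypothetical.

```python
def entry_mean_heritability(var_g, var_ge, var_err, n_env, n_rep):
    """Broad-sense heritability of genotype means over n_env environments and n_rep replicates."""
    var_p = var_g + var_ge / n_env + var_err / (n_env * n_rep)  # variance of a genotype mean
    return var_g / var_p

# Hypothetical variance components
single_plot = entry_mean_heritability(4.0, 2.0, 8.0, n_env=1, n_rep=1)
four_by_two = entry_mean_heritability(4.0, 2.0, 8.0, n_env=4, n_rep=2)
print(round(single_plot, 3), round(four_by_two, 3))
```

With the same underlying variance components, heritability rises from about 0.29 on a single-plot basis to about 0.73 for means over four environments with two replicates each, which is why heritability estimates must always be reported together with the unit of selection.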

An alternate definition of heritability in the breeding context is in terms of the univariate

breeder's equation: R = h²S, where R is the expected response to selection and S is the selection
differential. In this context, narrow-sense heritability is the coefficient of the regression of the response to
selection on the selection differential. So by the general definition of regression coefficients, the
narrow-sense heritability when selection occurs on both parents is:

$$h^2 = \frac{\sigma_{xy}}{\sigma^2_x},$$

where $\sigma_{xy}$ is the covariance between the selection unit phenotype (x) and the response unit phenotype (y),
and $\sigma^2_x$ is the variance among selection unit phenotypes, i.e. the phenotypic variance (Holland et al.,
2003). In most cases, individuals are assumed to be evaluated in independent environments and genotype
effects are specified to have mean 0; therefore, $\sigma_{xy} = E(G_x G_y)$, where $E(G_x G_y)$ is
the expectation of the cross-product of the selection and response genotypic values. The assumption that

environments are independent may not always be appropriate, and in such situations genetic correlations

should be estimated.

Phenotypic correlation reflects the relationship between two phenotypes or traits for a set of

individuals, and is partitioned into environmental and genetic correlations. Usually, genetic and

phenotypic correlations are evaluated for traits measured on the same individual, for example, plant

biomass and yield. Such genetic correlations are due either to pleiotropy or to gametic phase
disequilibrium between multiple genes affecting multiple traits (Lynch and Walsh, 1998 p. 629). When
two traits are measured on individuals or plots of genetically identical plants, the correlation between the
traits, across genotypes, is the phenotypic correlation ($r_P$) (Holland, 2006). If the same genotype is
grown in multiple environments, the correlation between the traits, across genotypes and averaged over
all environments, is the genetic correlation ($r_G$). The difference between the two is the correlation
between the two traits when measured in the same environment ($r_E$), in this case the same

location, year and place in the field. Genotypes can be grown in multiple plots in the same field to parse

out the correlation due to position in the field (microenvironment) and year and location combination

(macroenvironment).

An alternative is to consider the responses of a genotype in different environments to be different

traits; the genetic correlation is then defined as the correlation between the responses of a set of genotypes

evaluated in two macroenvironments (Falconer, 1952). However, in this situation, the only

environmental and phenotypic correlations are on the scale of the microenvironments within the

macroenvironments, and these correlations cannot be evaluated, since microenvironments cannot be

replicated. When treating responses to environments as traits, the genetic correlation is related to G×E,

where genetic correlations of less than one result from genotype by environment interactions (Lynch and

Walsh, 1998 pp. 660–665):

$$r_A = \frac{\sigma^2_G}{\sigma^2_G + \sigma^2_{GE}},$$

ignoring nonadditive genetic effects, where $r_A$ is the additive genetic correlation, $\sigma^2_G$ is the variance of
genetic effects, and $\sigma^2_{GE}$ is the variance of interaction effects (differences in responses to the two

environments varying among genotypes). This relationship means that instead of the traditional fixed

effects ANOVA approach of modeling genotype by environment interactions as effects specific to each

genotype and environment combination, effects of genotypes within environments can be modeled as

correlated random effects that covary between environments and genotypes. This random effects

approach is described in detail in the Mixed Models for Multiple Environment Trials section.
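The link between the genetic correlation and the variance components can be seen in a short derivation. It is a sketch under simplifying assumptions that are not stated in the text: the genotype main effects and interaction effects are independent, the interaction effects are independent across the two environments, and both variances are the same in each environment. The symbol $u_{ij}$ for the effect of genotype i within environment j is introduced here for illustration only.

```latex
\begin{aligned}
u_{ij} &= g_i + (ge)_{ij}, \qquad
  \operatorname{var}(g_i) = \sigma^2_G, \qquad
  \operatorname{var}\big((ge)_{ij}\big) = \sigma^2_{GE}, \\
\operatorname{cov}(u_{i1}, u_{i2}) &= \operatorname{var}(g_i) = \sigma^2_G, \\
\operatorname{var}(u_{ij}) &= \sigma^2_G + \sigma^2_{GE}, \\
\operatorname{corr}(u_{i1}, u_{i2}) &=
  \frac{\sigma^2_G}{\sqrt{(\sigma^2_G + \sigma^2_{GE})(\sigma^2_G + \sigma^2_{GE})}}
  = \frac{\sigma^2_G}{\sigma^2_G + \sigma^2_{GE}} .
\end{aligned}
```

For example, with $\sigma^2_G = 3$ and $\sigma^2_{GE} = 1$ the correlation is 0.75, and it reaches one only when there is no interaction variance.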

Both heritability and genetic correlation are defined in terms of variance and covariance

parameters that are unknown in reality, and this necessitates estimation of these parameters, generally

through the use of mixed linear models with restricted maximum likelihood estimation. For heritability

estimations, the specific mixed model used depends on the relatedness among selection and response


individuals and on the structure of evaluation trials (Holland et al., 2003). This complexity and the

relationship above between the response to selection and realized heritability, led Piepho and Möhring

(2007) to propose a method to simulate values such as response to selection rather than heritability per se.

Estimation of genetic correlation is somewhat more straightforward, as it only depends upon relationship

and trial structures for the population evaluated (Holland, 2006).

Variance, heritability, and genetic correlation are often estimated in a single study and then

considered representative of a population, and this is valid, to the extent that the population represented

remains the one of interest. If any of these parameters are estimated in a study of one set of genetic

material, their application to another may be questionable, depending upon the differences between them.

Heritability and genetic variance will change over time due to selection, inbreeding, or mutation, which

change allelic frequencies and may change additive effects of alleles on a population basis (Visscher et

al., 2008). Heritability estimated for a set of breeding material may change in later years as new material

is introgressed and/or selection removes or reduces the frequency of inferior alleles. For example, if,

through selection, an allele becomes fixed, that locus will no longer contribute to additive effects for that

trait in that population. However, the effect may again become evident if a different allele is reintroduced

into the population. Genetic correlation between two environments may vary as weather conditions may

be more or less similar year-to-year.

Dimension Reduction for Linear Modeling

When fitting multiple linear models, one major assumption is that all explanatory variables are

independent of each other. A major consequence of violating this assumption is that the parameter

estimates for the multicollinear variables will have very large sampling variability and so do not provide

reliable information about the true parameter values (Kutner et al., 2004). The goal of many

observational studies is to evaluate multiple variables to determine which of them have the greatest effect

on a particular response. Statistical analysis of such studies will only be successful if the issue of

multicollinearity is resolved.


Multiple remedial measures are available for addressing multicollinearity, one of which is

dimension reduction. Dimension reduction techniques can be used to convert multicollinear variables

into a smaller number of independent variables, creating new variables that are functions of the original

multicollinear variables. Two dimension reduction techniques commonly used to eliminate

multicollinearity are principal components analysis and factor analysis.

Principal components analysis (PCA) is used for dimension reduction by replacing the original

variables with the first few principal components (Lattin et al., 2002). These principal components are

linear combinations of the original variables that are selected one at a time for maximum variance:

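$$u_i = \underset{u_i}{\arg\max}\left[\operatorname{var}(z_i)\right] = \underset{u_i}{\arg\max}\left[u_i' R\, u_i\right], \quad z_i = X u_i, \quad u_i' u_i = 1, \quad i = 1, \ldots, q,$$

where $\arg\max_{u_i}[\cdot]$ indicates the value of $u_i$ for which the bracketed quantity is maximized (with each
successive $u_i$ orthogonal to those before it), $z_i$ is the vector of
scores for the ith scaled principal component, $u_i$ is the ith eigenvector, X is the standardized original data,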

R is the sample correlation matrix, and q is the number of original variables. This maximization problem

is solved using an eigenvalue or spectral decomposition of R. Additionally, PCA is equivalent to the

singular value decomposition of X. After PCA, the scores of the first few principal components can be

used to replace the observations of the original variables. Principal components are mutually
uncorrelated, and when calculated on a dataset with high multicollinearity, the first few can capture most

of the variation in the original variables. This makes PCA a useful option for dimension reduction prior

to linear regression.
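The equivalence of the two decompositions can be checked numerically. The following is a minimal sketch on simulated data; the function names and the data-generating choices are hypothetical, and standardization uses the sample standard deviation (n − 1 in the denominator) so that the cross-product matrix equals the sample correlation matrix.

```python
import numpy as np

def standardize(data):
    """Center each column and scale it to unit sample variance."""
    X = np.asarray(data, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pca_correlation(data):
    """Eigenvalues, eigenvectors, and component scores from the correlation matrix."""
    Z = standardize(data)
    R = (Z.T @ Z) / (len(Z) - 1)              # sample correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # reorder from largest to smallest
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals, eigvecs, Z @ eigvecs

rng = np.random.default_rng(0)
x1 = rng.standard_normal(50)
x2 = x1 + 0.1 * rng.standard_normal(50)       # nearly collinear with x1
x3 = rng.standard_normal(50)
data = np.column_stack([x1, x2, x3])

eigvals, eigvecs, scores = pca_correlation(data)
singular_values = np.linalg.svd(standardize(data), compute_uv=False)

print(np.round(eigvals, 3))
print(np.round(singular_values ** 2 / (len(data) - 1), 3))   # same values via the SVD
```

The eigenvalues sum to the number of variables, and the component scores have a diagonal covariance matrix, so they can enter a regression without the multicollinearity present in `x1` and `x2`.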

Factor analysis and PCA are very similar but differ in terms of the specific model used for

dimension reduction. Like PCA, factor analysis can be used to generate a smaller set of new variables,

which capture most of the variation in the original variables by decomposing the correlation matrix of the

original variables (Lattin et al., 2002). Unlike PCA, in factor analysis variation not accounted for by the

factors is attributed to specific variance terms for each of the original variables. The new variables

identified in factor analysis are often referred to as latent factors, and are considered to be the true

unmeasurable factors, measured with error by the original variables, that affect the response (Suhr, 2005).


The number of new variables generated from either PCA or factor analysis can range from 1 up

to the number of original variables and multiple techniques can be used to decide how many to keep

(Lattin et al., 2002). A scree plot of eigenvalues versus their associated component/factor can identify the

point at which the eigenvalues decrease in a linear fashion. Since the eigenvalues relate to the variance

explained by each component/factor, only retaining those that are above this linear trend results in a

parsimonious set that accounts for a large share of the original variation. Alternatively, Kaiser's Rule
suggests that only those new variables with eigenvalues greater than 1 should be retained, because each of
the standardized original variables has variance of 1 and thus the new variables should account for more
variation. Horn's procedure uses cutoffs like Kaiser's Rule, but with cutoffs of the eigenvalues from a

PCA of random data, generated with numbers of variables and observations equal to the original data.

Alternatively, a number of new variables can be retained such that they explain a user-defined proportion

of the variation in the original data or each specific original variable.
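The retention rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the eigenvalues and the random-data reference eigenvalues below are hypothetical values chosen for the example, not results from this work, and the function names are illustrative. In practice the reference eigenvalues for Horn's procedure would come from PCA of simulated random data with the same numbers of variables and observations.

```python
def kaiser_rule(eigenvalues):
    """Count components whose eigenvalue exceeds 1 (standardized variables)."""
    return sum(1 for ev in eigenvalues if ev > 1.0)


def horn_rule(eigenvalues, reference):
    """Count leading components whose eigenvalue exceeds the matching
    eigenvalue obtained from random data of the same dimensions."""
    kept = 0
    for observed, cutoff in zip(eigenvalues, reference):
        if observed <= cutoff:
            break
        kept += 1
    return kept


def n_for_proportion(eigenvalues, target):
    """Smallest number of leading components explaining at least `target`
    (a fraction between 0 and 1) of the total variance."""
    total = sum(eigenvalues)
    running = 0.0
    for k, ev in enumerate(eigenvalues, start=1):
        running += ev
        if running / total >= target:
            return k
    return len(eigenvalues)


# Hypothetical eigenvalues for six standardized variables, in decreasing order.
eigenvalues = [3.1, 1.4, 0.8, 0.4, 0.2, 0.1]
# Hypothetical eigenvalues from PCA of random data of the same size.
random_reference = [1.35, 1.15, 1.00, 0.90, 0.85, 0.75]

print(kaiser_rule(eigenvalues))                  # components kept by Kaiser's Rule
print(horn_rule(eigenvalues, random_reference))  # components kept by Horn's procedure
print(n_for_proportion(eigenvalues, 0.80))       # components needed for 80% of variance
```

With these illustrative values, Kaiser's Rule and Horn's procedure agree, while a user-defined threshold of 80% of the variation retains one additional component.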

Following both PCA and factor analysis, rotation of the solution (a transformation of the matrix

of principal components/factors) may be used to aid in interpretation (Lattin et al., 2002). In the initial

results from PCA and principal factor analysis the first factor is chosen to maximize the variance

accounted for and is often partially correlated to many of the original variables. With rotation, new

factors can be generated that are highly related to some of the original variables and mostly unrelated to

the others. Rotation should be conducted after the final number of components or factors is chosen. The

methods of rotation can be divided into two major groups, those that result in orthogonal factors and those

that allow non-orthogonal factors. When the goal is to generate new variables to use in linear modeling,

orthogonality/independence is usually preferred. The advantage of non-orthogonal rotation is that it

allows for new variables that are related to more distinct subsets of the original variables. The easier

interpretation of non-orthogonal rotation can be advantageous in an observational study, but any resulting

lack of independence will make linear model coefficient estimates inaccurate.
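The effect of an orthogonal rotation can be illustrated with a minimal two-factor sketch. The loadings below are hypothetical, and the approach is a simplification: it searches a grid of angles for the one that maximizes the raw varimax criterion (the variance of the squared loadings within each factor, without Kaiser normalization). Established statistical software uses iterative algorithms that handle any number of factors.

```python
import math


def rotate(loadings, theta):
    """Orthogonally rotate a two-factor loading matrix by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(a * c + b * s, -a * s + b * c) for a, b in loadings]


def varimax_criterion(loadings):
    """Sum, over the two factors, of the variance of the squared loadings."""
    p = len(loadings)
    total = 0.0
    for j in range(2):
        squared = [row[j] ** 2 for row in loadings]
        mean = sum(squared) / p
        total += sum((v - mean) ** 2 for v in squared) / p
    return total


def best_rotation(loadings, steps=900):
    """Grid search over [0, pi/2) for the angle maximizing the criterion."""
    angles = [k * (math.pi / 2) / steps for k in range(steps)]
    return max(angles, key=lambda t: varimax_criterion(rotate(loadings, t)))


# Hypothetical unrotated loadings: every variable loads on factor 1.
unrotated = [(0.70, 0.50), (0.75, 0.45), (0.70, -0.50), (0.65, -0.55)]
theta = best_rotation(unrotated)
rotated = rotate(unrotated, theta)
for row in rotated:
    print(tuple(round(v, 2) for v in row))
```

After rotation, each hypothetical variable loads mainly on one factor, while the sum of squared loadings for each variable (its communality) is unchanged, because an orthogonal rotation only redistributes the explained variance among factors.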

Extreme Cold Tolerance in Wheat

Winter wheat is planted in autumn and requires a period of cold vernalization before flowering can occur.

However, not all winter wheat cultivars are able to survive the extreme cold that occurs in some

environments, resulting in winterkill that often causes economically important yield losses (Patterson et

al., 1990). Extreme cold can occur in winter wheat growing areas of Washington State and can result in

observed losses of 70% in an extreme year (Allen et al., 1992). Breeding winter wheat to tolerate extreme

cold is therefore an important goal for wheat breeders in Washington. Winter wheat genotypes vary in

their ability to survive extreme cold and this cold tolerance is controlled by genes on multiple

chromosomes (Sutka, 1994). Expression levels of a large number of genes change when wheat is exposed

to extreme cold and these changes vary across genotypes (Skinner, 2009). Additional work will be

necessary before breeders will be able to select for superior cold tolerance based on genetic information

alone.

Due to the complex genetic nature of wheat cold tolerance, differential phenotypic assessments

are necessary for the selection of breeding lines with greater cold tolerance. Assessment of cold tolerance

in the field is impaired by year-to-year variation in temperature conditions and variation in conditions

across a field, especially due to variable snow cover (Fowler, 1978). For this reason, evaluations under

controlled environments are more appropriate. Testing is generally conducted by subjecting vernalized

plants at an early growth stage to temperatures that decrease below freezing to a point where survival is

differential among genotypes, followed by slow warming to regular greenhouse temperatures. After a

number of weeks of regrowth plants are scored for survival. Beyond these generalities, numerous

variations have been implemented, including variations in temperature and time at each temperature

(Sutka, 1994; Fowler and Limin, 2004; Reddy et al., 2006; Skinner and Mackey, 2009; Skinner and

Bellinger, 2010).

When saturated soil is exposed to temperatures that decrease rapidly to a point well below

freezing, substantial differences in soil temperatures can occur (Skinner and Mackey, 2009). These

differences can be explained by variation in the amount of soil and water in each container that can occur

due to accidental differences in soil packing and heterogeneity in the soil or other planting media.

Containers with more soil and water will take longer to cool or warm, especially during the phase change

as water freezes. Holding the temperature just below freezing for an extended period of time allows the

water in all containers of soil to freeze, reducing variation in temperatures beyond that point. When

exposed to temperatures slightly below freezing (-3°C) for extended periods of time, wheat acquires

increased tolerance to extreme temperatures as compared to shorter periods just below freezing in a

process referred to as sub-zero acclimation (Herman et al., 2006). In the field, such a period of moderate

cold does not always precede an extreme cold event. Therefore, testing for cold tolerance including such

a sub-zero acclimation period may not precisely reflect tolerance in the field. However, the improved

consistency resulting from a sub-zero acclimation period may outweigh this concern. Even with a sub-

zero acclimation period, small differences in temperature may be present among samples, so it may be

beneficial to include temperature measurements in analyses.

The method for analyzing cold tolerance data depends upon the tolerance rating method and any

explanatory variables to be included. Cold tolerance is generally evaluated in terms of numbers of plants

surviving a cold event, but it can also be judged on an ordinal scale in terms of quality of regrowth (Sutka,

1994; Vagujfalvi et al., 2003). While both of these methods involve some subjective judgment, binary

survival is easier to judge and thus likely to be more consistent among researchers. Binary survival data

can be analyzed by treating each plant as an experimental unit or by using the proportion of plants that

survived as the response for each group of plants. If each plant is considered an experimental unit, the

data may be analyzed using logistic regression. Using the proportion approach, researchers have analyzed

survival data using analysis of variance on transformed proportions (Skinner and Mackey, 2009) and have

compared genotypes using the temperature at which 50% of the plants are killed (Limin and Fowler,

1993, 2006) or area under the death progress curve over a range of temperatures (Reddy et al., 2006).
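As an illustration of how logistic regression relates to the temperature at which 50% of plants are killed, the sketch below fits a logistic curve to grouped survival counts by Newton-Raphson and solves for the temperature at which the fitted survival probability is 0.5. The counts, temperatures, and function name are hypothetical, and the cited studies did not necessarily estimate this quantity in this way.

```python
import math


def fit_logistic(x, n, y, iterations=50, tol=1e-10):
    """Newton-Raphson fit of logit(p) = b0 + b1 * x to grouped binomial
    counts: y survivors out of n plants at each value of x."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = i00 = i01 = i11 = 0.0
        for xi, ni, yi in zip(x, n, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = ni * p * (1.0 - p)
            g0 += yi - ni * p            # score for the intercept
            g1 += xi * (yi - ni * p)     # score for the slope
            i00 += w                     # information matrix entries
            i01 += xi * w
            i11 += xi * xi * w
        det = i00 * i11 - i01 * i01
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (i00 * g1 - i01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Hypothetical data: 20 plants of one genotype frozen to each temperature (deg C).
temps = [-10.0, -12.0, -14.0, -16.0, -18.0]
plants = [20, 20, 20, 20, 20]
survived = [19, 16, 10, 4, 1]

# Centering the temperatures keeps the Newton-Raphson steps well behaved.
mean_temp = sum(temps) / len(temps)
centered = [t - mean_temp for t in temps]
b0, b1 = fit_logistic(centered, plants, survived)

# Fitted survival is 0.5 where b0 + b1 * (t - mean_temp) = 0.
lt50 = mean_temp - b0 / b1
print(round(lt50, 2), round(b1, 3))
```

A lower (more negative) estimate for one genotype than for another would indicate greater cold tolerance under the same freezing protocol.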

If phenotypic evaluation of extreme cold tolerance is to be included in a breeding program,

evaluations must rapidly segregate large numbers of genotypes into groups with sufficient and insufficient

cold tolerance. Exact estimation of absolute tolerance levels is not necessary, but it is necessary to ensure

that placement of each genotype into each group is due to true genetic differences and not random chance.

Therefore, statistical testing that determines if each genotype has significantly different odds of survival

as compared to a control is an effective analysis method. Rapid evaluation of large numbers of genotypes

necessitates minimizing the number of times each genotype must be grown and placed in a freeze

chamber. Using a single cooling profile allows for more efficient testing as compared to testing each

genotype at multiple temperatures. However, it is important to identify beforehand a temperature profile

that is both differential among genotypes and provides cold tolerance assessments that accurately predict

performance in the field.
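A comparison of the odds of survival between one genotype and a control can be sketched as follows. This is a simplified illustration with hypothetical counts: it uses a Wald test on the log odds ratio from a single two-by-two table, whereas an analysis of a full experiment would more likely use a logistic regression model that can include blocking factors and covariates such as temperature.

```python
import math


def odds_ratio_test(surv_g, dead_g, surv_c, dead_c):
    """Wald test of the log odds ratio of survival, genotype versus control.
    All four counts must be greater than zero."""
    odds_ratio = (surv_g / dead_g) / (surv_c / dead_c)
    se = math.sqrt(1 / surv_g + 1 / dead_g + 1 / surv_c + 1 / dead_c)
    z = math.log(odds_ratio) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return odds_ratio, z, p_value


# Hypothetical counts: 24 of 30 test plants and 12 of 30 control plants survived.
odds_ratio, z, p_value = odds_ratio_test(24, 6, 12, 18)
print(round(odds_ratio, 2), round(z, 2), round(p_value, 4))
```

When many genotypes are compared with the same control, an adjustment for multiple comparisons would normally be applied to the resulting p-values.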

References

Agrama, H.A., W. Yan, F. Lee, R. Fjellstrom, M.-H. Chen, M. Jia, and A. McClung. 2009.

Genetic assessment of a mini-core subset developed from the USDA rice genebank. Crop

Science. 49(4): 1336–1346.

Allen, R.E., J.A. Pritchett, and L.M. Little. 1992. Cold injury observations. Annual Wheat

Newsletter. 38.

Anderson, W.F. 2005. Development of a forage bermudagrass (Cynodon sp.) core collection.

Grassland Science. 51: 305–308.

Balfourier, F., V. Roussel, P. Strelchenko, F. Exbrayat-Vinson, P. Sourdille, G. Boutet, J.

Koenig, C. Ravel, O. Mitrofanova, M. Beckert, and G. Charmet. 2007. A worldwide

bread wheat core collection arrayed in a 384-well plate. Theoretical and Applied

Genetics. 114: 1265–1275.

Basigalup, D.H., D.K. Barnes, and R.E. Stucker. 1995. Development of a core collection for

perennial Medicago plant introductions. Crop Science. 35: 1163–1168.

Bernardo, R. 1994. Prediction of maize single-cross performance using RFLPs and information

from related hybrids. Crop Science. 34(1): 20–25.

Bernardo, R. 1995. Genetic models for predicting maize single-cross performance in unbalanced

yield trial data. Crop Science. 35(1): 141–147.

Bernardo, R. 1996. Best linear unbiased prediction of maize single-cross performance. Crop

Science. 36(1): 50–56.

Bhattacharjee, R., I.S. Khairwal, P.J. Bramel, and K.N. Reddy. 2007. Establishment of a pearl

millet [Pennisetum glaucum (L.) R. Br.] core collection based on geographical

distribution and quantitative traits. Euphytica. 155: 35–45.

Brown, A.H.D. 1989. Core collections: a practical approach to genetic resources management.

Genome. 31: 818–824.

Cornelius, P.L., and J. Crossa. 1999. Prediction assessment of shrinkage estimators of

multiplicative models for multi-environment cultivar trials. Crop Science. 39(4): 998–

1009.

Crossa, J., J. Burgueño, P.L. Cornelius, G. McLaren, R. Trethowan, and A. Krishnamachari.

2006. Modeling genotype × environment interaction using additive genetic covariances

of relatives for predicting breeding values of wheat genotypes. Crop Science. 46(4):

1722–1733.

Dahlberg, J.A., J.J. Burke, and D.T. Rosenow. 2004. Development of a sorghum core collection:

refinement and evaluation of a subset from Sudan. Economic Botany. 58(4): 556–567.

Diwan, N., G.R. Bauchan, and M.S. McIntosh. 1995. Methods of developing a core collection of

annual Medicago species. Theoretical and Applied Genetics. 90: 755–761.

Duthie, C., G. Simm, A. Doeschl-Wilson, E. Kalm, P.W. Knap, and R. Roehe. 2010. Epistatic

analysis of carcass characteristics in pigs reveals genomic interactions between

quantitative trait loci attributable to additive and dominance genetic effects. Journal of

Animal Science. 88(7): 2219–2234. Available at (verified 20 January 2012).

Dwivedi, S.L., N. Puppala, H.D. Upadhyaya, N. Manivannan, and S. Singh. 2008. Developing a

core collection of peanut specific to Valencia market type. Crop Science. 48: 625–632.

Escribano, P., M.A. Viruel, and J.I. Hormaza. 2008. Comparison of different methods to

construct a core germplasm collection in woody perennial species with simple sequence

repeat markers. A case study in cherimoya (Annona cherimola, Annonaceae), an

underutilised subtropical fruit tree species. Annals of Applied Biology. 153: 25–32.

Falconer, D.S. 1952. The Problem of Environment and Selection. The American Naturalist.

86(830): 293–298.

Falconer, D.S., and T.F.C. Mackay. 1996. Introduction to Quantitative Genetics. 4th ed.

Benjamin Cummings.

Finlay, K., and G. Wilkinson. 1963. The analysis of adaptation in a plant-breeding programme.

Australian Journal of Agricultural Research. 14(6): 742–754.

Fowler, D.B. 1978. Selection for winterhardiness in wheat. II. variation within field trials. Crop

Science. 19(6): 773–775.

Fowler, D.B., and A.E. Limin. 2004. Interactions among factors regulating phenological

development and acclimation rate determine low-temperature tolerance in wheat. Annals

of Botany. 94(5): 717–724.

Franco, J., J. Crossa, and S. Desphande. 2010. Hierarchical multiple-factor analysis for

classifying genotypes based on phenotypic and genetic data. Crop Science. 50(1): 105–117.

Franco, J., J. Crossa, S. Taba, and H. Shands. 2005. A sampling strategy for conserving genetic

diversity when forming core subsets. Crop Science. 45: 1035–1044.

Franco, J., J. Crossa, J. Villasenor, A. Castillo, S. Taba, and S.A. Eberhart. 1999. A two-stage,

three-way method for classifying genetic resources in multiple environments. Crop

Science. 39: 259–267.

Franco, J., J. Crossa, J. Villasenor, S. Taba, and S.A. Eberhart. 1997. Classifying Mexican maize

accessions using hierarchical and density search methods. Crop Science. 37: 972–980.

Franco, J., J. Crossa, J. Villasenor, S. Taba, and S.A. Eberhart. 1998. Classifying genetic

resources by categorical and continuous variables. Crop Science. 38: 1688–1696.

Franco, J., J. Crossa, M.L. Warburton, and S. Taba. 2006. Sampling strategies for conserving

maize diversity when forming core subsets using genetic markers. Crop Science. 46:

854–864.

Gauch, H.G. 1988. Model selection and validation for yield trials with interaction. Biometrics.

44(3): 705–715.

Gauch, H.G., and R.W. Zobel. 1988. Predictive and postdictive success of statistical analyses of

yield trials. Theoretical and Applied Genetics. 76(1): 1–10.

Gollob, H.F. 1968. A statistical model which combines features of factor analytic and analysis of

variance techniques. Psychometrika. 33(1): 73–115.

Gower, J.C. 1971. A general coefficient of similarity and some of its properties. Biometrics. 27:

857–74.

Grenier, C., P.J. Bramel-Cox, and P. Hamon. 2001a. Core collection of sorghum: I. stratification

based on eco-geographical data. Crop Science. 41: 234–240.

Grenier, C., P. Hamon, and P.J. Bramel-Cox. 2001b. Core collection of sorghum: II. comparison

of three random sampling strategies. Crop Science. 41: 241–246.

Hallauer, A., and J.B.M. Filho. 1981. Quantitative Genetics in Maize Breeding. Iowa State

University Press, Ames.

Hao, C., Y. Dong, L. Wang, G. You, H. Zhang, H. Ge, J. Jia, and X. Zhang. 2008. Genetic

diversity and construction of core collection in Chinese wheat genetic resources. Chinese

Science Bulletin. 53(10): 1518–1526.

Hayes, B.J., P.M. Visscher, and M.E. Goddard. 2009. Increased accuracy of artificial selection

by using the realized relationship matrix. Genetics Research. 91(01): 47.

Henderson, C.R. 1973. Sire evaluation and genetic trends. Journal of Animal Science. 1973(Symposium): 10–

41.

Henderson, C.R. 1976. A simple method for computing the inverse of a numerator relationship

matrix used in prediction of breeding values. Biometrics. 32(1): 69–83.

Herman, E.M., K. Rotter, R. Premakumar, G. Elwinger, R. Bae, L. Ehler-King, S. Chen, and

D.P. Livingston III. 2006. Additional freeze hardiness in wheat acquired by exposure to

-3°C is associated with extensive physiological, morphological, and molecular changes.

Journal of Experimental Botany. 57(14): 3601–3618.

Hill, R.R., and J.L. Rosenberger. 1985. Methods for combining data from germplasm evaluation

trials. Crop Science. 25(3): 467–470.

Holbrook, C.C., and W. Dong. 2005. Development and evaluation of a mini core collection for

the U.S. peanut germplasm collection. Crop Science. 45: 1540–1544.

Holland, J.B. 2006. Estimating Genotypic Correlations and Their Standard Errors Using

Multivariate Restricted Maximum Likelihood Estimation with SAS Proc MIXED. Crop

Science. 46(2): 642–654.

Holland, J.B., W.E. Nyquist, and C.T. Cervantes-Martínez. 2003. Estimating and Interpreting

Heritability for Plant Breeding: An Update. : 9–112.

Hu, J., J. Zhu, and H.M. Xu. 2000. Methods of constructing core collections by stepwise

clustering with three sampling strategies based on the genotypic values of crops.

Theoretical and Applied Genetics. 101: 264–268.

Huamán, Z., R. Ortiz, and R. Gómez. 2000. Selecting a Solanum tuberosum subsp. andigena

core collection using morphological, geographical, disease and pest descriptors.

American Journal of Potato Research. 77: 183–190.

Igartua, E., M.P. Gracia, J.M. Lasa, B. Medina, J.L. Molina-Cano, J.L. Montoya, and I.

Romagosa. 1998. The Spanish barley core collection. Genetic Resources and Crop

Evolution. 45: 475–481.

Kang, C.W., S.Y. Kim, S.W. Lee, P.N. Mathur, T. Hodgkin, M.D. Zhou, and J.R. Lee. 2006.

Selection of a core collection of Korean sesame germplasm by a stepwise clustering

method. Breeding Science. 56: 85–91.

Kelly, A., B.R. Cullis, A.R. Gilmour, J.A. Eccleston, and R. Thompson. 2009. Estimation in a

multiplicative mixed model involving a genetic relationship matrix. Genetics Selection

Evolution. 41: 33–42.

Kroonenberg, P.M., B.D. Harch, K.E. Basford, and A. Cruickshan. 1997. Combined analysis of

categorical and numerical descriptors of Australian groundnut accessions using nonlinear

principal component analysis. Journal of Agricultural, Biological, and Environmental

Statistics. 2(3): 294–312.

Kutner, M., C. Nachtsheim, J. Neter, and W. Li. 2004. Applied Linear Statistical Models. 5th ed.

McGraw-Hill/Irwin, San Francisco.

Lattin, J., D. Carroll, and P. Green. 2002. Analyzing Multivariate Data. 1st ed. Duxbury Press.

Li, C.T., C.H. Shi, J.G. Wu, H.M. Xu, H.Z. Zhang, and Y.L. Ren. 2004. Methods of developing

core collections based on the predicted genotypic value of rice (Oryza sativa L.).

Theoretical and Applied Genetics. 108: 1172–1176.

Limin, A.E., and D.B. Fowler. 1993. Inheritance of cold hardiness in Triticum aestivum ×

synthetic hexaploid wheat crosses. Plant Breeding. 110(2): 103–108.

Limin, A.E., and D.B. Fowler. 2006. Low-temperature tolerance and genetic potential in wheat

(Triticum aestivum L.): response to photoperiod, vernalization, and plant development.

Planta. 224(2): 360–366.

Lynch, M., and B. Walsh. 1998. Genetics and Analysis of Quantitative Traits. 1st ed. Sinauer

Associates.

Mahalakshmi, V., Q. Ng, M. Lawson, and R. Ortiz. 2006. Cowpea [Vigna unguiculata (L.)

Walp.] core collection defined by geographical, agronomical and botanical descriptors.

Plant Genetic Resources: Characterization and Utilization. 5(3): 113–119.

Mandel, J. 1971. A new analysis of variance model for non-additive data. Technometrics. 13(1):

1–18.

Miklas, P.N., R. Delorme, R. Hannan, and M.H. Dickson. 1999. Using a subsample of the core

collection to identify new sources of resistance to white mold in common bean. Crop

Science. 39: 569–573.

Mrode, R.A., and R. Thompson. 2005. Linear models for the prediction of animal breeding

values. 2nd ed. CABI, Cambridge, MA.

Oakey, H., A.P. Verbyla, B.R. Cullis, X. Wei, and W.S. Pitchford. 2007. Joint modeling of

additive and non-additive (genetic line) effects in multi-environment trials. Theoretical

and Applied Genetics. 114: 1319–1332.

Panter, D.M., and F.L. Allen. 1995a. Using best linear unbiased predictions to enhance breeding

for yield in soybean: I. choosing parents. Crop Science. 35(2): 397–405.

Panter, D.M., and F.L. Allen. 1995b. Using best linear unbiased predictions to enhance breeding

for yield in soybean: II. selection of superior crosses from a limited number of yield

trials. Crop Science. 35(2): 405–410.

Parra-Quijano, M., J.M. Iriondo, E. Torres, and L.D. la Rosa. 2011. Evaluation and Validation of

Ecogeographical Core Collections using Phenotypic Data. Crop Science. 51(2): 694.

Pattee, H.E., T.G. Isleib, D.W. Gorbet, F.G. Giesbrecht, and Z. Cui. 2001. Parent selection in

breeding for roasted peanut flavor quality. Peanut Science. 28(2): 51–58.

Patterson, F.L., G.E. Shaner, H.W. Ohm, and J.E. Foster. 1990. A historical perspective for the

establishment of research goals for wheat improvement. Journal of Production

Agriculture. 3(1): 30–38.

Piepho, H.-P. 1994. Best Linear Unbiased Prediction (BLUP) for regional yield trials: a

comparison to additive main effects and multiplicative interaction (AMMI) analysis.

Theoretical and Applied Genetics. 89(5).

Piepho, H.-P. 1997. Analyzing genotype-environment data by mixed models with multiplicative

terms. Biometrics. 53(2): 761–766.

Piepho, H.-P., and J. Möhring. 2007. Computing Heritability and Selection Response From

Unbalanced Plant Breeding Trials. Genetics. 177(3): 1881–1888.

Piepho, H.-P., J. Mohring, A.E. Melchinger, and A. Buchse. 2008. BLUP for phenotypic

selection in plant breeding and variety testing. Euphytica. 161: 209–228.

Rao, K.E.P., and V.R. Rao. 1995. The use of characterisation data in developing a core collection

of sorghum. p. 109–115. In Core Collections of Plant Genetic Resources. John Wiley &

Sons, Chichester.

Reddy, L., R.E. Allan, and K.A. Garland Campbell. 2006. Evaluation of cold hardiness in two

sets of near-isogenic lines of wheat (Triticum aestivum) with polymorphic vernalization

alleles. Plant Breeding. 125(5): 448–456.

Reddy, L.J., H.D. Upadhyaya, C.L.L. Gowda, and S. Singh. 2005. Development of core

collection in pigeonpea [Cajanus cajan (L.) Millspaugh] using geographic and qualitative

morphological descriptors. Genetic resources and crop evolution. 52: 1049–1056.

Reif, J.C., B. Kusterer, H.-P. Piepho, R.C. Meyer, T. Altmann, C.C. Schön, and A.E.

Melchinger. 2009. Unraveling Epistasis With Triple Testcross Progenies of Near-

Isogenic Lines. Genetics. 181(1): 247–257. Available at (verified 20 January 2012).

Rodiño, A.P., M. Santalla, A.M. De Ron, and S.P. Singh. 2003. A core collection of common

bean from the Iberian peninsula. Euphytica. 131: 165–175.

Skinner, D.Z. 2009. Post-acclimation transcriptome adjustment is a major factor in freezing

tolerance of winter wheat. Functional & Integrative Genomics. 9(4): 513–523.

Skinner, D.Z., G.R. Bauchan, G. Auricht, and S. Hughes. 1999. A method for the efficient

management and utilization of large germplasm collections. Crop Science. 39: 1237–

1242.

Skinner, D.Z., and B.S. Bellinger. 2010. Exposure to subfreezing temperature and a freeze-thaw

cycle affect freezing tolerance of winter wheat in saturated soil. Plant and Soil. 332: 289–

297.

Skinner, D.Z., and B. Mackey. 2009. Freezing tolerance of winter wheat plants frozen in

saturated soil. Field Crops Research. 113(3): 335–341.

Smith, A., B. Cullis, and R. Thompson. 2001. Analyzing variety by environment data using

multiplicative mixed models and adjustments for spatial field trend. Biometrics. 57(4):

1138–1147.

Smith, A.B., B.R. Cullis, and R. Thompson. 2005. The analysis of crop cultivar breeding and

evaluation trials: an overview of current mixed model approaches. The Journal of

Agricultural Science. 143(06): 449–462.

Suhr, D. 2005. Principal Component Analysis vs. Exploratory Factor Analysis. In Proceedings of

the Thirtieth Annual SAS Users Group International Conference. SAS Institute Inc.,

Cary, NC.

Sutka, J. 1994. Genetic control of frost tolerance in wheat (Triticum aestivum L.). Euphytica. 77:

277–282.

Tai, P.Y.P., and J.D. Miller. 2001. A core collection for Saccharum spontaneum L. from the

world collection of sugarcane. Crop Science. 41: 879–885.

Upadhyaya, H.D., P.J. Bramel, and S. Singh. 2001. Development of a chickpea core subset using

geographic distribution and quantitative traits. Crop Science. 41: 206–210.

Upadhyaya, H.D., C.L.L. Gowda, R.P.S. Pundir, V.G. Reddy, and S. Singh. 2006. Development

of core subset of finger millet germplasm using geographical origin and data on 14

quantitative traits. Genetic resources and crop evolution. 53: 679–685.

Upadhyaya, H.D., R.P.S. Pundir, C.L.L. Gowda, V.G. Reddy, and S. Singh. 2008. Establishing a

core collection of foxtail millet to enhance the utilization of germplasm of an

underutilized crop. Plant Genetic Resources: Characterization and Utilization. 6: 1–8.

USDA ARS, National Genetic Resources Program. 2009. Germplasm Resources Information

Network - (GRIN). [Online Database] National Germplasm Resources Laboratory,

Beltsville, Maryland. Available at http://www.ars-grin.gov/cgi-bin/npgs/html/desc.pl?65059

(verified 17 December 2009).

Vagujfalvi, A., G. Galiba, L. Cattivelli, and J. Dubcovsky. 2003. The cold-regulated

transcriptional activator Cbf3 is linked to the frost-tolerance locus Fr-A2 on wheat

chromosome 5A. Molecular Genetics and Genomics. 269(1): 60–67.

Villanueva, B., R. Pong-Wong, J. Fernández, and M.A. Toro. 2005. Benefits from Marker-Assisted

Selection Under an Additive Polygenic Genetic Model. Journal of Animal Science. 83(8):

1747–1752.

Visscher, P.M., W.G. Hill, and N.R. Wray. 2008. Heritability in the genomics era - concepts and

misconceptions. Nature Reviews Genetics. 9(4): 255–266.

Wang, X., R. Fjellstrom, Y. Jia, W.G. Yan, M.H. Jia, B.E. Scheffler, D. Wu, Q. Shu, and A.

McClung. 2010. Characterization of Pi-ta blast resistance gene in an international rice

core collection. Plant Breeding. 129(5): 491–501.

Wang, L., Y. Guan, R. Guan, Y. Li, Y. Ma, Z. Dong, X. Liu, H. Zhang, Y. Zhang, Z. Liu, R.

Chang, H. Xu, L. Li, F. Lin, W. Luan, Z. Yan, X. Ning, L. Zhu, Y. Cui, R. Piao, Y. Liu,

P. Chen, and L. Qiu. 2006. Establishment of Chinese soybean (Glycine max) core

collections with agronomic traits and SSR markers. Euphytica. 151: 215–223.

Weihai, M., Y. Jinxin, and D. Sihachakr. 2008. Development of core subset for the collection of

Chinese cultivated eggplants using morphological-based passport data. Plant Genetic

Resources: Characterization and Utilization. 6(1): 33–40.

Yan, W., N. Rutger, R.J. Bryant, H.E. Bockelman, R.G. Fjellstrom, M.-H. Chen, T.H. Tai, and

A.M. McClung. 2007. Development and evaluation of a core subset of the USDA rice

germplasm collection. Crop Science. 47: 869–878.

Yates, F., and W.G. Cochran. 1938. The analysis of groups of experiments. The Journal of

Agricultural Science. 28(04): 556–580.

CHAPTER 2

METHODS FOR SELECTING GERMPLASM CORE SUBSETS

USING SPARSE PHENOTYPIC DATA

Carl A. Walker, Harold E. Bockelman, J. Richard Alldredge, Kimberly Garland Campbell*

C.A. Walker, Dep. of Crop and Soil Sciences, Washington State Univ., Pullman, WA, 99164-6420;

K.G. Campbell, USDA-ARS, Wheat Genetics, Quality, Physiology, and

Disease Research Unit, 209 Johnson Hall, Pullman, WA 99164-6420; H.E. Bockelman, USDA-

ARS, National Small Grains Collection, 1691 S 2700 W, Aberdeen, ID 83210; J.R. Alldredge,

Department of Statistics, Washington State University, Pullman, WA, 99164-6420

*Corresponding author ([email protected]).

Abbreviations: GRIN, Germplasm Resources Information Network; HTAP, High-temperature

Adult Plant; RI, recovery of interquartile range; RM, recovery of median; RR, recovery of range;

RS, recovery of Shannon index; UPGMA, unweighted pair-group method using arithmetic

averages.

Abstract

Crop plant germplasm collections are often too large and too lacking in descriptive data to be of

use regularly. A well-characterized core subset that consists of a reduced number of accessions

(usually about 10% of the total) can provide increased utility while still maintaining most of the

genetic diversity of the complete collection. Most core subsets have been constructed using

complete or nearly complete data sets of geographical, phenotypic, or genotypic data, but most

large germplasm collections only have complete, or even mostly complete, data for a few

variables. The main objective of this study was to evaluate methods for selecting core subsets of

germplasm collections using sparse geographic and phenotypic data. A subset of variables and

accessions with complete data was isolated from the USDA Triticum aestivum collection and

was used to simulate multiple collections with sparse data. Core subsets were selected from the

simulated data sets using 12 methods, defined by the choice of variables to use in Gower's

distance estimations, clustering algorithm, and sampling intensity. Diversity metrics were

calculated for each method and simulation. The methods were ranked within each simulation

and then compared in terms of these average rankings. We conclude that core subsets can be

selected based on sparse phenotypic data, and we recommend that a) Gower's distances should

be estimated using all variables available, including those with more than 5% missing data; b)

clustering should be conducted using the UPGMA algorithm; and c) clusters should be sampled

in proportion to the logarithm of the cluster sizes.

Introduction

Crop plant germplasm collections are maintained to conserve genetic variation and to provide

useful plant material for researchers and plant breeders. An example of such a collection, which

will be used in this study, is the collection of wheat (Triticum aestivum L. subsp. aestivum)

accessions maintained as part of the National Small Grains Collection of the USDA-ARS

National Plant Germplasm System (http://www.ars-grin.gov/npgs/index.html).

For many researchers and plant breeders, germplasm collections are often too large and

too lacking in descriptive data to be of regular use. A well-characterized core collection, or core

subset, that consists of a reduced number of accessions (usually about 10% of the total) can

provide increased utility while still maintaining most of the genetic diversity of the complete

collection (Brown, 1989). Therefore, the best method for selecting core subsets is the one that

results in the most diversity for a given number of accessions. Since desired alleles are unevenly

distributed throughout germplasm collections, preferentially selecting a portion from each

heterogeneous group present in a complete collection increases the likelihood of selecting these

unevenly distributed alleles (Brown, 1989). For this reason, most researchers have constructed

core collections by grouping accessions and then selecting accessions within groups.

A number of different methods and types of data have been used to group accessions and

select core collections. Passport data, i.e. the location of cultivation or collection, has been used

to stratify complete collections followed by selection from within each stratum. This technique

was used to select a core subset for the complete wheat collection described above (USDA ARS,

National Genetic Resources Program, 2009), and to develop other core collections (Skinner et

al., 1999; Huamán et al., 2000; Dahlberg et al., 2004; Yan et al., 2007). Other methods for

selecting core subsets have included stratification based on geographic origin, followed by

further grouping based on cluster analysis of phenotypic traits (Basigalup et al., 1995; Rao and

Rao, 1995; Igartua et al., 1998; Tai and Miller, 2001; Upadhyaya et al., 2001, 2006;

Mahalakshmi et al., 2006; Bhattacharjee et al., 2007; Dwivedi et al., 2008). Stratification of

collections has also been conducted using cluster analysis without prior geographic grouping

(Diwan et al., 1995; Franco et al., 1997, 1998, 1999, 2005; Grenier et al., 2001a; Li et al., 2004;

Anderson, 2005; Holbrook and Dong, 2005; Weihai et al., 2008; Upadhyaya et al., 2008). A

comparison of core subset selection methods using relatively complete phenotypic data

demonstrated that selection based on clustering using those data was superior to selection based

on geographic origin alone (Diwan et al., 1995).

The clustering method and the data used in the clustering process have also varied,

resulting in dramatic differences in final grouping. Clustering methods used include Ward's

minimum variance (Franco et al., 1997; Hu et al., 2000; Upadhyaya et al., 2001, 2006, 2008;

Anderson, 2005; Holbrook and Dong, 2005; Reddy et al., 2005; Kang et al., 2006; Mahalakshmi

et al., 2006; Bhattacharjee et al., 2007; Dwivedi et al., 2008), unweighted pair-group method

using arithmetic average (UPGMA), also known as the average linkage method (Hu et al., 2000;

Huamán et al., 2000; Li et al., 2004; Franco et al., 2006; Weihai et al., 2008), complete linkage

(Hu et al., 2000), and the Ward-Modified Location method (Franco et al., 1998, 1999, 2005).
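To make the UPGMA (average linkage) method concrete, the following minimal sketch repeatedly merges the two clusters with the smallest average pairwise distance between their members until a requested number of clusters remains. The trait values and function name are hypothetical, and the naive search shown here is only practical for small examples. Large collections would call for an optimized routine from established statistical software.

```python
from itertools import combinations


def upgma(dist, n_clusters):
    """Average-linkage (UPGMA) clustering on a symmetric distance matrix.
    Returns a sorted list of clusters, each a sorted list of item indices."""
    clusters = [[i] for i in range(len(dist))]

    def average_distance(a, b):
        # Unweighted mean of all pairwise distances between the two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical values of one quantitative trait for five accessions.
trait = [1.0, 1.2, 5.0, 5.1, 8.0]
dist = [[abs(u - v) for v in trait] for u in trait]

print(upgma(dist, 3))
print(upgma(dist, 2))
```

Ward's minimum variance and complete linkage differ from this sketch only in the rule used to choose which pair of clusters to merge at each step.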

Core collections have been constructed using cluster analysis based on phenotypic

variables followed by random sampling. In many cases these variables have been uniformly

quantitative, and either Euclidean distances or principal components were used to determine

relationships among accessions and construct clusters (Diwan et al., 1995; Igartua et al., 1998;

Holbrook and Dong, 2005; Kang et al., 2006; Bhattacharjee et al., 2007; Upadhyaya et al., 2008).

Both categorical and quantitative variables have been used to determine clusters in a few cases

(Franco et al., 1997, 1998, 1999, 2005; Kroonenberg et al., 1997).

More recently, relationships have been determined based on genotypic data (Franco et al.,

2006; Wang et al., 2006; Balfourier et al., 2007; Escribano et al., 2008; Hao et al., 2008) or

combinations of genotypic and phenotypic data (Franco et al., 2010). Clusters constructed using

genotypic data likely result in core subsets that better capture the genetic diversity in the

complete collection; however, the germplasm collections for most major crop plants are too large

to genotype all accessions.

Following stratification of the complete collection by cluster analysis, a set of

accessions was chosen from each group with the number sampled based on the size or the

diversity of each group. Direct, or only partially random, selection of all or a portion of the

accessions in a core has been used to increase diversity (Basigalup et al., 1995; Skinner et al.,

1999; Huamán et al., 2000; Rodiño et al., 2003; Yan et al., 2007; Weihai et al., 2008). However,

most researchers have selected accessions from each group randomly and with equal chance of

selection among accessions in a group. Proportional sampling, where the number of accessions

was chosen according to the group size, has often been used to select core subsets (Basigalup et

al., 1995; Upadhyaya et al., 2001, 2006, 2008; Grenier et al., 2001b; Dahlberg et al., 2004;

Holbrook and Dong, 2005; Reddy et al., 2005; Bhattacharjee et al., 2007; Dwivedi et al., 2008).

Other sampling methods, such as selection in proportion to the square root of group size (square

root sampling) (Huamán et al., 2000; Wang et al., 2006) and selection based on the natural

logarithm of group size (logarithmic sampling) (Grenier et al., 2001b; Yan et al., 2007), took

relatively fewer samples from larger groups. These two sampling methods reduced

redundancy and increased variability, as larger clusters tend to have greater redundancy among


accessions (Brown, 1989). Diversity was also increased by directly selecting more accessions

from groups with greater relative diversity. For example, sampling numbers were determined

relative to the mean distance among accessions in each cluster (Franco et al., 2005).

One aspect of most core subsets, including those referenced above, is that they were

constructed using complete or nearly complete data sets of geographical, phenotypic, or

genotypic data. One exception was an approach by Basigalup et al. (1995), who used partial data

to manually select accessions with extreme values. Such a manual approach is time-consuming

and necessarily requires subjective judgments or core subsets that increase in size with the

number of variables assessed. Most large germplasm collections have complete, or even mostly

complete, data for only a few geographic and phenotypic variables. The USDA soybean

collection comprises more than 17,000 accessions and 106 descriptors, but 75 of

those descriptors have data for fewer than 10,000 accessions (Randall Nelson, personal

communication). The USDA wheat collection is even larger and more sparse (details below).

Therefore, methods that have previously been shown to produce diverse core subsets may not be

applicable or the most effective use of sparse data describing the largest germplasm collections

of major crop plants. While one option is to construct the core subset using only the complete, or

nearly complete, geographic and phenotypic data, the sparse data may well delineate differences

among heterogeneous accessions that would otherwise be grouped together.

How then should sparse data, such as the wheat GRIN database, be used to select core

subsets? Gower's distance (Gower, 1971) has been used to calculate distances between

accessions based on variables, of any measurement level (nominal, ordinal, interval, or ratio), for

which both accessions have values. When either accession has a missing value for a variable,

that variable would not be included in the calculation. Thus, sparse data can be used to calculate


distances, but it may not be appropriate to do so. If data are not missing at random, then

parameter estimates calculated from such data will be biased (Graham, 2009). That is, estimates

of distances between accessions may be influenced by which variables have missing values.
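
Written out, the pairwise rule described above takes the following form. This is the standard unweighted statement of Gower's (1971) coefficient; the symbols $\delta_{ijk}$, $s_{ijk}$, $R_k$, and $p$ are notation introduced here for illustration and do not appear elsewhere in this chapter.

```latex
S_{ij} = \frac{\sum_{k=1}^{p} \delta_{ijk}\, s_{ijk}}{\sum_{k=1}^{p} \delta_{ijk}},
\qquad
s_{ijk} =
\begin{cases}
1 - \dfrac{|x_{ik} - x_{jk}|}{R_k} & \text{if variable } k \text{ is quantitative},\\[1ex]
1 & \text{if variable } k \text{ is nominal and } x_{ik} = x_{jk},\\
0 & \text{if variable } k \text{ is nominal and } x_{ik} \neq x_{jk}.
\end{cases}
```

Here $\delta_{ijk} = 1$ when accessions $i$ and $j$ both have a value for variable $k$ and $\delta_{ijk} = 0$ otherwise, $R_k$ is the range of variable $k$, and $p$ is the number of variables. Because the denominator counts only the variables shared by a pair, different pairs of accessions can be compared on different sets of variables, which is how the pattern of missing values can influence the distances.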

In some cases, however, biased estimates using all data available may be preferable to

unbiased estimates using a very small proportion of the total data available, because they result

in reduced variance among distance calculations. Intuitively, if two accessions are known to be

different for a trait, that information should be included, rather than ignored, when making

decisions about relatedness. Decisions about whether or not to use sparse data in distance

calculations should be based on the diversity of the resulting core subsets.

The main objective of this study was to evaluate methods for selecting core subsets of

germplasm collections using sparse geographic and phenotypic data. We compared methods by

repeatedly selecting core subsets from a complete collection with data sets simulated to have

differing patterns of missing values. We then evaluated the core subsets, selected using each

method, in terms of their capture of the diversity present in the complete collection. A second

objective was to recommend a strategy for using all data available to construct core germplasm

collections.

Materials and Methods

The Germplasm Research Information Network (GRIN) maintains a database of all species held

by the National Genetic Resources Program, including wheat accessions held by the USDA-ARS

National Small Grains Collection. The wheat information in the database includes species,

identification information (accession number, name and improvement status), passport data

(country and state where the accession was collected), growth habit, and 77 other variables

reflecting agronomic, descriptive, disease susceptibility, and insect susceptibility data. However,


as is the case for many species curated in the National Genetic Resources Program, this data set

is far from complete, as many agronomic and susceptibility tests have only been conducted

on a portion of the accessions. For accessions of Triticum aestivum subsp. aestivum, the

variables range from greater than 99.9% complete for country of origin, to more than 99%

missing values (Table 1). Ongoing evaluations are being conducted for several traits. At the

time this project was initiated, 41,312 accessions of Triticum aestivum subsp. aestivum were

included in the database.

A total of 3160 accessions from the USDA wheat collection have complete data for 23

variables (Table 2, first column). This subset of variables and accessions will be referred to as

the “complete” collection and was used to simulate multiple collections with sparse data. To

reduce the possibility that our conclusions would rely on a particular set or pattern of missing

data, four patterns of missing data were evaluated. Independently for each variable, values were

removed by randomly choosing accessions from a uniform distribution, i.e. missing completely

at random (MCAR), or, for each variable, values were removed from a randomly chosen

contiguous set of accessions (MC). For each of these two methods of removal, two sets of

percentages of missing values were assigned to the 23 variables (Table 2). The first set of values

was chosen to resemble the general pattern of missing value amounts observed among the

variables in the complete collection (Table 1), with the same variables best represented. The

second pattern was the reverse of the first and of the data structure of the complete collection.

No values were removed from country of origin, since this information is available for almost all

accessions. For each of these four patterns of missing data (MCAR-1, MC-1, MCAR-2, and

MC-2), 200 independent data sets were simulated by removing values from the “complete”

collection (referred to as the simulations).
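
The two removal schemes can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names, the use of `None` to mark a removed value, and the choice that a contiguous block does not wrap around the end of the collection are all assumptions made here.

```python
import random

def mcar_indices(n, fraction, rng):
    """Accessions to blank out, drawn uniformly at random (MCAR)."""
    return set(rng.sample(range(n), round(n * fraction)))

def contiguous_indices(n, fraction, rng):
    """Accessions to blank out as one randomly placed contiguous block (MC)."""
    k = round(n * fraction)
    start = rng.randint(0, n - k)  # assumption: the block does not wrap around
    return set(range(start, start + k))

def simulate_sparse(rows, fractions, pick, seed=0):
    """Return a copy of rows with values removed independently per variable.

    rows      -- one record per accession, in collection order
    fractions -- fraction of values to remove from each variable (0 keeps all)
    pick      -- mcar_indices or contiguous_indices
    """
    rng = random.Random(seed)
    sparse = [list(r) for r in rows]
    for col, fraction in enumerate(fractions):
        for i in pick(len(rows), fraction, rng):
            sparse[i][col] = None  # None marks a removed (missing) value
    return sparse

# Toy "complete" collection: 10 accessions, 2 variables.
# Column 0 plays the role of country of origin and is never thinned.
rows = [("country%d" % i, float(i)) for i in range(10)]
mcar = simulate_sparse(rows, (0.0, 0.5), mcar_indices, seed=1)
mc = simulate_sparse(rows, (0.0, 0.5), contiguous_indices, seed=1)
```

Calling `simulate_sparse` with 200 different seeds for one set of fractions would give the 200 independent data sets for one pattern.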


There are important functional differences between MCAR and MC patterns, because

accessions in the USDA complete collection and the “complete” collection are ordered based on

when they were added to the collection and frequently characterized in that order. Since

accessions were added to the collection in groups that often had a degree of genetic similarity

(e.g. cultivars from a single breeding program), missing values from a contiguous set could

influence the best choice of method for selecting a core subset using sparse phenotypic data.

In order to evaluate the utility of selecting core subsets using the sparse passport and

phenotypic data from the GRIN database, multiple core selection methods were applied to the

simulated data sets and compared. All methods began with the calculation of a matrix of

distances based on Gower's similarity coefficient (Gower, 1971), which was calculated using the

DISTANCE procedure in SAS/STAT software (SAS Institute Inc., 2003). The distance between

two accessions, i and j, was calculated as $1 - S_{ij}$, where $S_{ij}$ is the similarity coefficient as defined

by Gower (1971). This distance metric was calculated using the variables, both quantitative and

qualitative, for which both accessions had an observation. In order to determine if very sparse

data can be used when selecting core subsets, distance matrices were calculated either using all

23 variables, including those with many missing values, or only those that were densely

populated with 5% or less missing values.
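
As a rough illustration of the distance step (the study itself used the SAS DISTANCE procedure), the sketch below computes one minus an unweighted Gower similarity over the variables observed for both accessions. The function name, the `kinds` and `ranges` arguments, and the use of `None` for a missing value are hypothetical; ordinal variables are not treated separately here, and how SAS handles them is not reproduced.

```python
def gower_distance(a, b, kinds, ranges):
    """Distance 1 - S_ij between two accessions, skipping missing values.

    a, b   -- records for the two accessions; None marks a missing value
    kinds  -- "nominal" or "quantitative" for each variable
    ranges -- range of each quantitative variable (ignored for nominal ones)
    Returns None when the two accessions share no observed variable.
    """
    similarity_sum = 0.0
    shared = 0
    for x, y, kind, value_range in zip(a, b, kinds, ranges):
        if x is None or y is None:
            continue  # a variable missing for either accession is left out
        if kind == "nominal":
            similarity_sum += 1.0 if x == y else 0.0
        elif value_range:
            similarity_sum += 1.0 - abs(x - y) / value_range
        else:
            similarity_sum += 1.0  # a constant variable cannot separate accessions
        shared += 1
    if shared == 0:
        return None
    return 1.0 - similarity_sum / shared

kinds = ("quantitative", "nominal", "quantitative")
ranges = (40.0, None, 5.0)
# The third variable is missing for the first accession, so only two are used.
d = gower_distance((10.0, "red", None), (20.0, "red", 3.0), kinds, ranges)
```

Restricting the calculation to densely populated variables amounts to dropping the sparse columns from `a`, `b`, `kinds`, and `ranges` before the call.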

Following distance matrix calculation, the CLUSTER procedure in SAS/STAT software

(SAS Institute Inc., 2003) was used to construct a hierarchical tree of all the accessions within

the “complete” collection. For comparison purposes, the clustering was conducted using two

methods: Ward's minimum variance method (Ward, 1963) and the unweighted pair-group

method using arithmetic averages (UPGMA) (Sokal and Michener, 1958), which is also known

as the average linkage method. These clustering methods were selected as part of a core subset


selection approach, since they were the most commonly recommended in other studies that

selected core subsets.

In both methods, clustering was based on the Gower's distance matrix of each stratum.

For each stratum, the smallest number of clusters that gave an $R^2$ value of at least 0.6

was selected. As the focus of this study was the selection of core subsets for use

in breeding programs, this $R^2$ value was chosen so that the number of clusters in each stratum

produced practical groups rather than attempting to determine genetic relations of accessions.

This $R^2$ cutoff resulted in clusters that retained about 60% of the total variation in Gower's

distances among accessions in the simulation. We viewed this percentage as a reasonable

compromise between excessively small and homogeneous clusters or large and heterogeneous

clusters.
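
The clustering and cutoff step can be sketched in plain Python as below. This is a naive illustration rather than the SAS CLUSTER procedure: it merges the two clusters with the smallest average pairwise distance (UPGMA) and stops before a merge would push $R^2$ below the cutoff. The $R^2$ used here, one minus the within-cluster sum of squares over the total sum of squares with both computed from squared pairwise distances, is an assumption about how the statistic might be obtained from a distance matrix and may differ from the value SAS reports.

```python
from itertools import combinations

def sum_of_squares(members, dist):
    """Sum of squared pairwise distances within a group, divided by its size."""
    if len(members) < 2:
        return 0.0
    return sum(dist[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)

def r_squared(clusters, dist):
    """Assumed R^2: share of total variation lying between clusters."""
    total = sum_of_squares([i for c in clusters for i in c], dist)
    if total == 0:
        return 1.0
    return 1.0 - sum(sum_of_squares(c, dist) for c in clusters) / total

def average_linkage(a, b, dist):
    """UPGMA distance: mean of all pairwise distances between two clusters."""
    return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

def upgma_with_cutoff(dist, r2_min=0.6):
    """Merge clusters by average linkage while R^2 stays at or above r2_min."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], dist),
        )
        merged = [c for k, c in enumerate(clusters) if k not in (a, b)]
        merged.append(clusters[a] + clusters[b])
        if r_squared(merged, dist) < r2_min:
            break  # the next merge would lose too much between-cluster variation
        clusters = merged
    return clusters

# Toy example: five accessions placed on a line, forming two well separated groups.
positions = [0, 1, 2, 50, 51]
dist = [[abs(p - q) for q in positions] for p in positions]
clusters = upgma_with_cutoff(dist)
```

The search over all cluster pairs at every step is far too slow for thousands of accessions; the sketch is meant only to make the stopping rule concrete.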

Random samples of accessions were selected from each cluster according to three

methods for assigning sampling intensities as follows: 1) sampling numbers of accessions in

proportion to the number of accessions in each cluster (proportional sampling); 2) sampling in

proportion to the square root of the cluster total (square root sampling); 3) sampling in

proportion to the natural logarithm of the cluster total (natural log sampling). If the number of

selections calculated for a cluster was less than one, then one accession was selected from that

cluster, otherwise the number of selections was rounded to the nearest accession. As a result of

this rounding, the number of accessions selected for potential core subsets varied among

simulations and subsetting methods.
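
The three sampling intensities and the rounding rule can be illustrated with the short sketch below. The `target` argument (the intended total core size) and the function name are assumptions introduced here; the text does not state how the proportionality constant was set. Capping each count at the cluster size is also an added safeguard.

```python
import math

def allocate(cluster_sizes, target, weight):
    """Number of accessions to draw from each cluster.

    cluster_sizes -- number of accessions in each cluster
    target        -- intended total core size (hypothetical parameter)
    weight        -- function of cluster size: identity, math.sqrt, or math.log
    """
    weights = [weight(n) for n in cluster_sizes]
    total = sum(weights)
    counts = []
    for n, w in zip(cluster_sizes, weights):
        raw = target * w / total if total else 0.0
        # Fewer than one selection becomes one; otherwise round to the nearest
        # whole accession (halves rounded up).
        k = 1 if raw < 1 else math.floor(raw + 0.5)
        counts.append(min(k, n))  # cannot draw more accessions than the cluster holds
    return counts

sizes = [200, 30, 5, 1]
proportional = allocate(sizes, 12, lambda n: n)
square_root = allocate(sizes, 12, math.sqrt)
natural_log = allocate(sizes, 12, math.log)
```

For these toy cluster sizes the largest cluster receives 10, 7, and 6 selections under proportional, square root, and natural log sampling, respectively, and the totals (14, 12, and 13) differ from the target because of the rounding and the minimum of one accession per cluster.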

The combinations of choice of variables in the distance calculation, clustering method

and sampling intensity define 12 different methods for selection of core subsets. Since

accessions were randomly selected from clusters according to the selection intensities, an


extremely large number of potential core subsets could be selected for a given data set and

method. Since the specific accessions selected could influence the diversity of the resulting

core subset, 1000 potential core subsets were selected using each of the twelve methods for all

200 simulations.
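
The factorial structure of the comparison can be enumerated directly. The labels below are shorthand chosen for this sketch; the total is simply the product of the counts reported above and in the description of the four missing-data patterns.

```python
from itertools import product

variables = ("all 23 variables", "variables with 5% or less missing")
clustering = ("Ward", "UPGMA")
sampling = ("proportional", "square root", "natural log")

# Every combination of the three choices is one core subset selection method.
methods = list(product(variables, clustering, sampling))

patterns = ("MCAR-1", "MC-1", "MCAR-2", "MC-2")
simulations_per_pattern = 200
subsets_per_method = 1000

# Potential core subsets implied by the design across all patterns.
total_subsets = (
    len(methods) * len(patterns) * simulations_per_pattern * subsets_per_method
)
```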

In order to compare methods for selecting core subsets, five metrics were calculated for

each potential core subset based on statistics calculated on each variable. These metrics allowed

concurrent comparisons of multiple variables of the same type by averaging measures of the

percent by which a core subset differed from the “complete” collection. The metrics were

calculated after replacing the values removed in the simulation process for the accessions

selected in a given core subset. The first metric was the averaged standardized percent deviation
in medians, hereafter referred to as the recovery of median (RecMed), calculated as:

$$\mathrm{RecMed} = \frac{100}{v} \sum_{k=1}^{v} \frac{\left| \tilde{x}_k - \tilde{x}'_k \right|}{\tilde{x}'_k},$$

where $\tilde{x}_k$ is the median of the values of the $k$th ratio variable of the subset,
$\tilde{x}'_k$ is the median of the values of the $k$th ratio variable for the accessions in the
“complete” collection, and $v$ is the total number of ratio variables. The second metric, the
averaged percent recovery of interquartile range, hereafter referred to as recovery of
interquartile range (RecIQR), was calculated as:

$$\mathrm{RecIQR} = \frac{100}{v} \sum_{k=1}^{v} \frac{\mathrm{IQR}_k}{\mathrm{IQR}'_k},$$

where $\mathrm{IQR}_k$ is the interquartile range of the values of the $k$th ratio variable of the
subset, and $\mathrm{IQR}'_k$ is the interquartile range of the values of the $k$th ratio variable
for the complete collection. The third metric was the recovery of range (RecRange), also known as
the coincidence rate (Hu et al., 2000), and was calculated using the equation of Franco et al.
(2005):

$$\mathrm{RecRange} = \frac{100}{v} \sum_{k=1}^{v} \frac{R_k}{R'_k},$$

where $R_k$ is the range of the $k$th ratio or ordinal variable of the accessions in the subset,
$R'_k$ is the range of the $k$th variable for the complete collection, and $v$ is the total number
of ratio and ordinal variables. The fourth metric was the recovery of number of categories
(RecNCat), calculated as:

$$\mathrm{RecNCat} = \frac{100}{v} \sum_{k=1}^{v} \frac{C_k}{C'_k},$$

where $C_k$ and $C'_k$ are the number of unique values the $k$th nominal variable takes in the
subset and complete collection, respectively, and $v$ is the total number of nominal variables.
The fifth metric was the recovery of Shannon index (RecS), calculated as:

$$\mathrm{RecS} = \frac{100}{v} \sum_{k=1}^{v} \frac{H_k}{H'_k},$$

where $H_k$ and $H'_k$ are the Shannon diversity (or entropy) indices for the $k$th ordinal or
nominal variable of the subset and complete collection, respectively, and $v$ is the total number
of ordinal and nominal variables. In both cases listed above, the Shannon indices were calculated
using the equation:

$$H = -\sum_{i=1}^{S} p_i \ln p_i,$$

where $S$ is the total number of unique values that occur for a nominal or ordinal variable, and
$p_i$ is the relative frequency of the $i$th value of the variable. Hereafter these five metrics
will be collectively referred to as recovery metrics.

The recovery metrics described above were calculated for each potential core subset
within each of the core subset selection methods and simulations. The median value of each
recovery metric was calculated for each set of 1000 potential core subsets, and these medians
will be referred to as the Median Recovery Metrics Over Potential core Subsets (MRMOPS).
Medians were calculated because the distributions of recovery metrics over potential core subsets
were often highly skewed. The methods were ranked, in terms of the MRMOPS, within each
simulation. To summarize and provide concise criteria for choosing the method that would be
expected to provide the most diverse core subset, the MRMOPS and ranks were averaged, over
the simulations, for each method.

Results

Our goal was to select a core subset with most of the diversity of the complete collection without

redundancy, so we evaluated how well various methods for selecting cores achieved this goal.

There are two major aspects of diversity: a wide range of possible values and an even

representation of all values. How these aspects are evaluated depends on the variable of interest,

with range calculated on ratio and ordinal variables, interquartile range calculated on ratio

variables, number of categories evaluated for nominal variables, and Shannon's index calculated

on ordinal and nominal variables. We used the recovery metric calculations to compare the

diversity of the complete collection with the diversity recovered by the core subsets. Values in

excess of 100% were desired for RecIQR and RecH, since the evenness of core subsets should

exceed that of the complete collections, whereas 100% was the maximum possible recovery of

range or numbers of categories.
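
A small sketch may help show why an evenness-based metric can exceed 100% while range and category counts cannot. It computes the ratio-type recovery metrics as the average, over variables, of a statistic on the subset divided by the same statistic on the complete collection. The function names and toy data are hypothetical; the interquartile range is omitted because quantile conventions differ among software packages.

```python
import math
from collections import Counter

def value_range(values):
    return max(values) - min(values)

def n_categories(values):
    return len(set(values))

def shannon(values):
    """Shannon index H = -sum(p_i * ln p_i) over the observed categories."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())

def recovery(stat, subset_columns, complete_columns):
    """Average percent of a statistic recovered by the subset, over variables.

    Assumes the statistic is nonzero for every variable in the complete collection.
    """
    ratios = [stat(s) / stat(c) for s, c in zip(subset_columns, complete_columns)]
    return 100.0 * sum(ratios) / len(ratios)

# One nominal variable dominated by category "A" in the complete collection.
complete = ["A"] * 8 + ["B", "C"]
core = ["A", "A", "B", "C"]  # keeps every category but drops redundant "A"s

rec_ncat = recovery(n_categories, [core], [complete])
rec_shannon = recovery(shannon, [core], [complete])

# One ratio variable: a subset that misses the extremes recovers little range.
rec_range = recovery(value_range, [[2.0, 3.0]], [[1.0, 2.0, 3.0, 10.0]])
```

In this toy case the core keeps all three categories, so the category recovery is exactly 100%, while the Shannon recovery is well above 100% because the categories are more evenly represented in the core than in the complete collection.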

The best methods were expected to produce cores with the greatest diversity, as estimated

by the MRMOPS values. Multiple patterns of missing data were simulated to ensure our

conclusions were not specific to a single pattern of missing data. For each of these simulations,

the MRMOPS were calculated and the methods were ranked in terms of the MRMOPS. The

method with the consistently highest ranks (lowest numbers) would be expected to produce the

most diverse core subsets for other germplasm collections with sparse data. The averages over

the simulations of the MRMOPS and the ranks of MRMOPS allow us to choose a best method.

Results from simulations with each pattern of missing data indicated similar average rankings of

the methods.
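
The summary procedure (median over potential core subsets, rank within simulation, average over simulations) can be sketched as follows. The data layout and function names are assumptions for illustration. The sketch treats larger metric values as better, which would not apply to RecMed, and breaks ties arbitrarily.

```python
from statistics import mean, median

def mrmops(metric_values):
    """Median of one recovery metric over the potential core subsets, per method."""
    return {method: median(values) for method, values in metric_values.items()}

def rank_methods(medians):
    """Rank 1 goes to the method with the highest median."""
    ordered = sorted(medians, key=medians.get, reverse=True)
    return {method: rank for rank, method in enumerate(ordered, start=1)}

def average_ranks(simulations):
    """Average each method's within-simulation rank over all simulations."""
    ranks = [rank_methods(mrmops(sim)) for sim in simulations]
    return {method: mean(r[method] for r in ranks) for method in ranks[0]}

# Three toy simulations, two methods, three potential core subsets each.
simulations = [
    {"A": [90, 95, 100], "B": [80, 85, 99]},
    {"A": [70, 75, 80], "B": [78, 79, 81]},
    {"A": [60, 61, 62], "B": [50, 51, 52]},
]
avg = average_ranks(simulations)
```

Method A has the higher median in two of the three toy simulations, so its average rank is closer to 1.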

For the MCAR-1 simulations, the method that used the entire sparse data set, UPGMA

clustering, and logarithmic sampling had the best average rank for RecRange, RecNCat, and


RecH, and the second best rank for RecIQR, for which square root sampling resulted in greater

diversity (Table 3). Average ranks near 1.0 indicate that method was consistently the most

diverse, as measured by each recovery metric, over all simulations. When the best average rank

for a metric is nearer to 2.0, e.g. RecIQR for MCAR-1, a single method was not consistently the

best, but the best method placed consistently in the top few ranks. Results from the MC-1

simulations also show that the method that uses all variables, UPGMA clustering, and

logarithmic sampling results in the greatest diversity as measured by RecIQR, RecRange, and

RecH (Table 4). The near equal average rankings, in terms of RecNCat, for logarithmic and

square root sampling indicate that the top ranking mostly switched back and forth between these

two methods over the simulations.

Results from analyses of the MCAR-2 and MC-2 simulations differ only slightly from the

results of MCAR-1 and MC-1 (Tables 5 and 6). For these simulations, core subsets selected

using all variables from sparse data sets, UPGMA clustering, and logarithmic sampling were the

most diverse in terms of RecIQR, RecNCat, and RecH, but this method did not produce cores

with ranges as wide as those from some other methods. However, the results for RecRange are probably

not meaningful, since the mean MRMOPS for each method did not vary by more than one

percent. Due to time constraints, we were unable to evaluate additional patterns of missing

values or “complete” collections from other databases, but the range of conditions evaluated

suggests that the best method will be generally consistent in other scenarios. Independent

simulations and analyses are necessary to confirm these conclusions.

When applying this methodology to real world germplasm collections, multiple potential

core subsets could be selected. Here we compared medians over the potential subsets, but the

potential cores for each method and simulation varied greatly in terms of diversity. We


recommend the method with the highest median, since it would be more likely to produce more

diverse cores. For real world collections, it may be beneficial to select many potential core

subsets and then choose to use one with relatively high values for all the recovery metrics.

Since our methodology prevents the selection of core subsets all of the same size, it is

possible that the MRMOPS and their rankings may have been influenced by the fluctuations in

sizes. In general, including more accessions in a core might be expected to result in greater

retention of diversity. When Pearson correlations were calculated over all simulations and

methods, MRMOPS calculated on each recovery metric, except RecMed, were positively but

weakly associated with core subset sizes (r = 0.355, 0.095, 0.159, 0.232; for RecIQR, RecRange,

RecNCat, and RecH, respectively). However, these associations do not appear to be sufficient to

explain the rankings of the methods. The methods with the best average rankings were not the

methods with the largest mean sizes of core subset (Tables 3-6).

We were concerned that 200 simulations per missing value pattern might not have been

enough to accurately compare the methods. As illustrated in Figures 1 and 2, 200 simulations

were sufficient to produce stable means of the MRMOPS and ranks. If additional simulations

were analyzed the means would not be expected to change to a meaningful degree. That is, the

only rank changes would be between methods that produce very similar results.

Discussion

Grouping germplasm accessions via cluster analysis serves two purposes. The first is to aid in

selecting a core with reduced redundancy as described above. The second benefit of grouping is

that it provides structure and connections to the reserve collection, the set of accessions from the

complete collection that are not included in the core. The connection between each accession in

the core and a specific group in the reserve collection can be of use to breeders. If breeders find


lines in the core collection that are of interest, they can trace connections from these lines to sets

of additional accessions in the reserve with similar characteristics. Ideally these accessions will

be genetically similar to the accessions in the core, although this will depend on the effectiveness

of the grouping. Miklas et al. (1999) reported using core and reserve collections of common

bean in such a way to discover sources of white mold resistance beyond a set found in a core

subset. It is this second benefit that is the greatest argument for using a clustering approach over

other approaches that yield diverse core subsets.

The goal of a core subset is to provide easier access to the resources of a complete

collection by representing the complete collection with a reduced number of accessions. Some

researchers have selected core subsets that match the distributions of variables measured on their

complete collections. A wide variety of statistical inference tests have been used to evaluate

diversity of core subsets and to compare them to complete collections under the assumption that

core subsets and complete collections are independent samples of some larger population. These

methods have included chi-square tests of independence of collection type and country of origin,

marker alleles, and nominal phenotypic variables (Tai and Miller, 2001; Upadhyaya et al., 2001,

2006, 2008; Grenier et al., 2001b; Reddy et al., 2005; Mahalakshmi et al., 2006; Bhattacharjee et

al., 2007; Agrama et al., 2009). Differences between the distribution of quantitative variables for

proposed core subsets and complete collections have been tested using the Levene test and the

Newman-Keuls test (Upadhyaya et al., 2001, 2006, 2008; Grenier et al., 2001b; Reddy et al.,

2005; Kang et al., 2006; Bhattacharjee et al., 2007; Agrama et al., 2009).

The validity of these statistical tests of differences between complete collections and core

subsets is questionable, since these are not independent samples in two respects. First, complete

collections are not random samples of all wheat germplasm (due to limitations in collection


activities), and so statistics calculated on the complete collection should not be considered

estimates of the population of all wheat germplasm. Instead, the complete collection is the

population of interest for which we can calculate exact parameter values. Second, accessions in

the core subset are not independent of the complete collection; the core is a stratified sample of

the complete collection. Therefore, comparisons that avoid statistical tests and acknowledge that

the core subset is, in fact, a subset of the complete collection are preferable.

Aside from considerations of proper statistical testing, are researchers correct to select

cores that match the means of complete collections, a common goal (Hu et al., 2000; Upadhyaya

et al., 2006, 2008; Weihai et al., 2008; Parra-Quijano et al., 2011)? One reason to match the

mean would be if the core subset was to be evaluated as a sample to make predictions about the

complete collection and, by extension, the whole population represented by the complete

collection. Although core subsets can do a very good job of matching the distributions of

complete collections, the risk associated with this approach is that the complete collection does

not effectively represent the actual population of germplasm it was sampled from. Germplasm

collections are limited by the manner in which they were collected. For example, commercial

breeding programs have rarely contributed material. Additionally, as a result of specific

collectors or collection activities, certain countries may be under- or over-represented.

Repetition of genotypes may also be a problem, but is often difficult to discern (van de Wouw et

al., 2011).

Rather than attempting to perfectly match the distributions of germplasm populations, a

more achievable and beneficial goal is to select core subsets that capture the diversity maintained

in complete collections while excluding redundant accessions. The core subsets that result

would be useful for breeders and researchers who wish to evaluate a small set of accessions for a


new or unevaluated trait, maximizing their likelihood of finding the trait without evaluating

excessive numbers of accessions, e.g. Wang et al. (2010). This approach necessarily results in

deviation from the distributions of variables in the complete collection, since eliminating

redundancy in anything other than a symmetric distribution will shift the center of the distribution.

We have included the RecMed metric in our evaluations for readers who feel that the

distributional centers of the complete collection should be maintained in the core subsets (Tables

3-6). This metric results in higher values when the medians of the core subset deviate

substantially from the medians of the complete collection. However, we believe that the other

four recovery metrics are preferable and effectively identify diverse core subsets with reduced

redundancy.

Conclusion

We conclude that core subsets can be selected based on incomplete phenotypic data sets,

and when doing so, we recommend that a) Gower's distances should be estimated using all

variables available, including those with more than 5% missing data; b) clustering should be

conducted using the UPGMA algorithm; and c) clusters should be sampled in proportion to the

logarithm of the cluster sizes. This method, which uses all available data, is expected to produce

core subsets that retain much of the diversity of the complete collection while excluding

redundant accessions.

Appendix

The Germplasm Research Information Network (GRIN) maintains a database of wheat

accessions held by the National Small Grains Collection. This database includes information on

species, identification information (accession number, name and improvement status), passport


data (country and state where the accession was collected), growth habit, and 77 other variables

reflecting agronomic, descriptive, disease susceptibility, and insect susceptibility data. However,

this data set is far from complete, as many agronomic and susceptibility tests have not been

conducted for the majority of the accessions. For accessions of Triticum aestivum subsp.

aestivum, the variables range from greater than 99.9% complete for country of origin, to more

than 99% missing values (Table 1). Ongoing evaluations are being conducted for several traits.

At the time this project was initiated, 41,312 accessions of Triticum aestivum subsp. aestivum

were included in the database.

A core subset of this complete collection was previously chosen by curator H. Bockelman

in 1995, and additional accessions were added in 2006. In 1995, accessions were selected

randomly from groups with the same value for the variable country (referring to country of

origin). The number selected from each country-group was in proportion to the natural

logarithm of the size of each country-group, resulting in the selection of about 10% of the

complete collection. In 2006, to reflect the additions to the complete collection, 10% (858) of

the accessions added between 1995 and 2006 were selected randomly, without grouping, and

added to the core subset. This existing core subset consists of a total of 3992 accessions.

In order to select a new core subset, accessions were stratified based on their growth habit

(spring, winter, or facultative), and were additionally stratified by components within world

macro regions, as defined in the United Nations demographic yearbook publications (United

Nations, 2008). The region to which each value of the country variable is assigned is shown in

Appendix Table 1. This initial stratification ensures that two accessions from different regions

or with differing growth habits cannot be put together in the same group later in the core


selection process. This is desirable, since it is unlikely that two such accessions would be

related.

Based on our comparisons of core selection methods, we concluded that the most diverse

core subset would be selected using all variables in Gower's distance calculations, UPGMA

clustering, and logarithmic sampling. Using this method, 2000 potential core subsets were

selected from the complete collection. Recovery metrics were calculated on all potential cores

and the potential core subsets were ranked for each metric. The potential cores were then

compared using the sums of the ranks multiplied by the number of variables used in the

calculation of each metric, that is: 11*RI + 44*RR + 12*RC + 45*RS. The core with the lowest

value of this comparison metric was selected as the “best” potential core subset. Instead of

directly using this “best” core subset, it was decided that any new core should use the maximum

number possible of accessions from the original core. All accessions selected for both the

original and “best” core were included in the new core. Additional accessions were then

preferentially selected from the original core and then the “best” core to equal the number of

accessions from each cluster determined by the logarithmic sampling strategy. This resulted in a

new core subset with over half of its accessions selected from the existing core, but with superior

diversity as measured by the recovery metrics RecRange, RecNCat, and RecH (Appendix Table

2), but not RecIQR. This indicates that the original core has greater evenness in its distribution

of ratio variables, but lesser diversity for ordinal and nominal variables as compared to the

reselected core subset.
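
The weighted rank comparison used to pick the “best” potential core can be sketched as below. The sketch assumes that RI, RR, RC, and RS in the expression above denote a potential core's ranks for the interquartile range, range, number of categories, and Shannon index recovery metrics, respectively, with rank 1 given to the highest value; this reading, the dictionary keys, and the toy values are assumptions and are not stated in the text. Ties are broken arbitrarily.

```python
# Weights are the numbers of variables entering each metric, as given in the text.
WEIGHTS = {"RecIQR": 11, "RecRange": 44, "RecNCat": 12, "RecS": 45}

def rank_descending(values):
    """Rank 1 for the largest value."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks

def weighted_rank_scores(cores):
    """Weighted sum of per-metric ranks for each potential core (lower is better)."""
    scores = [0] * len(cores)
    for metric, weight in WEIGHTS.items():
        ranks = rank_descending([core[metric] for core in cores])
        for index, rank in enumerate(ranks):
            scores[index] += weight * rank
    return scores

# Three toy potential cores with made-up recovery metric values.
cores = [
    {"RecIQR": 110, "RecRange": 90, "RecNCat": 80, "RecS": 120},
    {"RecIQR": 105, "RecRange": 95, "RecNCat": 85, "RecS": 125},
    {"RecIQR": 100, "RecRange": 85, "RecNCat": 90, "RecS": 115},
]
scores = weighted_rank_scores(cores)
best = min(range(len(cores)), key=scores.__getitem__)
```

Weighting each rank by the number of variables behind the metric keeps a metric computed on few variables from counting as much as one computed on many.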


References

Agrama, H.A., W. Yan, F. Lee, R. Fjellstrom, M.-H. Chen, M. Jia, and A. McClung. 2009.

Genetic assessment of a mini-core subset developed from the USDA rice genebank. Crop

Science. 49(4): 1336–1346.

Anderson, W.F. 2005. Development of a forage bermudagrass (Cynodon sp.) core collection.

Grassland Science. 51: 305–308.

Balfourier, F., V. Roussel, P. Strelchenko, F. Exbrayat-Vinson, P. Sourdille, G. Boutet, J.

Koenig, C. Ravel, O. Mitrofanova, M. Beckert, and G. Charmet. 2007. A worldwide

bread wheat core collection arrayed in a 384-well plate. Theoretical and Applied

Genetics. 114: 1265–1275.

Basigalup, D.H., D.K. Barnes, and R.E. Stucker. 1995. Development of a core collection for

perennial Medicago plant introductions. Crop Science. 35: 1163–1168.

Bhattacharjee, R., I.S. Khairwal, P.J. Bramel, and K.N. Reddy. 2007. Establishment of a pearl

millet [Pennisetum glaucum (L.) R. Br.] core collection based on geographical

distribution and quantitative traits. Euphytica. 155: 35–45.

Brown, A.H.D. 1989. Core collections: a practical approach to genetic resources management.

Genome. 31: 818–824.

Dahlberg, J.A., J.J. Burke, and D.T. Rosenow. 2004. Development of a sorghum core collection:

refinement and evaluation of a subset from Sudan. Economic Botany. 58(4): 556–567.

Diwan, N., G.R. Bauchan, and M.S. McIntosh. 1995. Methods of developing a core collection of

annual Medicago species. Theoretical and Applied Genetics. 90: 755–761.

Dwivedi, S.L., N. Puppala, H.D. Upadhyaya, N. Manivannan, and S. Singh. 2008. Developing a

core collection of peanut specific to Valencia market type. Crop Science. 48: 625–632.

Escribano, P., M.A. Viruel, and J.I. Hormaza. 2008. Comparison of different methods to

construct a core germplasm collection in woody perennial species with simple sequence

repeat markers. A case study in cherimoya (Annona cherimola, Annonaceae), an

underutilised subtropical fruit tree species. Annals of Applied Biology. 153: 25–32.

Franco, J., J. Crossa, and S. Desphande. 2010. Hierarchical multiple-factor analysis for

classifying genotypes based on phenotypic and genetic data. Crop Science. 50(1): 105–117.

Franco, J., J. Crossa, S. Taba, and H. Shands. 2005. A sampling strategy for conserving genetic

diversity when forming core subsets. Crop Science. 45: 1035–1044.

Franco, J., J. Crossa, J. Villasenor, A. Castillo, S. Taba, and S.A. Eberhart. 1999. A two-stage,

three-way method for classifying genetic resources in multiple environments. Crop

Science. 39: 259–267.

Franco, J., J. Crossa, J. Villasenor, S. Taba, and S.A. Eberhart. 1997. Classifying Mexican maize

accessions using hierarchical and density search methods. Crop Science. 37: 972–980.

Franco, J., J. Crossa, J. Villasenor, S. Taba, and S.A. Eberhart. 1998. Classifying genetic

resources by categorical and continuous variables. Crop Science. 38: 1688–1696.

Franco, J., J. Crossa, M.L. Warburton, and S. Taba. 2006. Sampling strategies for conserving

maize diversity when forming core subsets using genetic markers. Crop Science. 46:

854–864.

Gower, J.C. 1971. A general coefficient of similarity and some of its properties. Biometrics. 27:

857–874.

Graham, J.W. 2009. Missing Data Analysis: Making It Work in the Real World. Annual Review

of Psychology. 60(1): 549–576.

Grenier, C., P.J. Bramel-Cox, and P. Hamon. 2001a. Core collection of sorghum: I. stratification

based on eco-geographical data. Crop Science. 41: 234–240.

Grenier, C., P. Hamon, and P.J. Bramel-Cox. 2001b. Core collection of sorghum: II. comparison

of three random sampling strategies. Crop Science. 41: 241–246.

Hao, C., Y. Dong, L. Wang, G. You, H. Zhang, H. Ge, J. Jia, and X. Zhang. 2008. Genetic

diversity and construction of core collection in Chinese wheat genetic resources. Chinese

Science Bulletin. 53(10): 1518–1526.

Holbrook, C.C., and W. Dong. 2005. Development and evaluation of a mini core collection for

the U.S. peanut germplasm collection. Crop Science. 45: 1540–1544.

Hu, J., J. Zhu, and H.M. Xu. 2000. Methods of constructing core collections by stepwise

clustering with three sampling strategies based on the genotypic values of crops.

Theoretical and Applied Genetics. 101: 264–268.

Huamán, Z., R. Ortiz, and R. Gómez. 2000. Selecting a Solanum tuberosum subsp. andigena

core collection using morphological, geographical, disease and pest descriptors.

American Journal of Potato Research. 77: 183–190.

Igartua, E., M.P. Gracia, J.M. Lasa, B. Medina, J.L. Molina-Cano, J.L. Montoya, and I.

Romagosa. 1998. The Spanish barley core collection. Genetic Resources and Crop

Evolution. 45: 475–481.

Kang, C.W., S.Y. Kim, S.W. Lee, P.N. Mathur, T. Hodgkin, M.D. Zhou, and J.R. Lee. 2006.

Selection of a core collection of Korean sesame germplasm by a stepwise clustering

method. Breeding Science. 56: 85–91.

Kroonenberg, P.M., B.D. Harch, K.E. Basford, and A. Cruickshank. 1997. Combined analysis of

categorical and numerical descriptors of Australian groundnut accessions using nonlinear

principal component analysis. Journal of Agricultural, Biological, and Environmental

Statistics. 2(3): 294–312.

Li, C.T., C.H. Shi, J.G. Wu, H.M. Xu, H.Z. Zhang, and Y.L. Ren. 2004. Methods of developing

core collections based on the predicted genotypic value of rice (Oryza sativa L.).

Theoretical and Applied Genetics. 108: 1172–1176.

Mahalakshmi, V., Q. Ng, M. Lawson, and R. Ortiz. 2006. Cowpea [Vigna unguiculata (L.)

Walp.] core collection defined by geographical, agronomical and botanical descriptors.

Plant Genetic Resources: Characterization and Utilization. 5(3): 113–119.

Miklas, P.N., R. Delorme, R. Hannan, and M.H. Dickson. 1999. Using a subsample of the core

collection to identify new sources of resistance to white mold in common bean. Crop

Science. 39: 569–573.

Parra-Quijano, M., J.M. Iriondo, E. Torres, and L.D. la Rosa. 2011. Evaluation and Validation of

Ecogeographical Core Collections using Phenotypic Data. Crop Science. 51(2): 694.

Rao, K.E.P., and V.R. Rao. 1995. The use of characterisation data in developing a core collection

of sorghum. p. 109–115. In Core Collections of Plant Genetic Resources. John Wiley &

Sons, Chichester.

Reddy, L.J., H.D. Upadhyaya, C.L.L. Gowda, and S. Singh. 2005. Development of core

collection in pigeonpea [Cajanus cajan (L.) Millspaugh] using geographic and qualitative

morphological descriptors. Genetic resources and crop evolution. 52: 1049–1056.

Rodiño, A.P., M. Santalla, A.M. De Ron, and S.P. Singh. 2003. A core collection of common

bean from the Iberian peninsula. Euphytica. 131: 165–175.

SAS Institute Inc. 2003. SAS/STAT® User's Guide, Version 9. SAS Institute Inc., Cary, NC.

Skinner, D.Z., G.R. Bauchan, G. Auricht, and S. Hughes. 1999. A method for the efficient

management and utilization of large germplasm collections. Crop Science. 39: 1237–

1242.

Sokal, R.R., and C.D. Michener. 1958. A statistical method for evaluating systematic

relationships. Kansas University Science Bulletin. 38: 1409–1438.

Tai, P.Y.P., and J.D. Miller. 2001. A core collection for Saccharum spontaneum L. from the

world collection of sugarcane. Crop Science. 41: 879–885.

United Nations. 2008. 2006 Demographic Yearbook. New York.

Upadhyaya, H.D., P.J. Bramel, and S. Singh. 2001. Development of a chickpea core subset using

geographic distribution and quantitative traits. Crop Science. 41: 206–210.

Upadhyaya, H.D., C.L.L. Gowda, R.P.S. Pundir, V.G. Reddy, and S. Singh. 2006. Development

of core subset of finger millet germplasm using geographical origin and data on 14

quantitative traits. Genetic resources and crop evolution. 53: 679–685.

Upadhyaya, H.D., R.P.S. Pundir, C.L.L. Gowda, V.G. Reddy, and S. Singh. 2008. Establishing a

core collection of foxtail millet to enhance the utilization of germplasm of an

underutilized crop. Plant Genetic Resources: Characterization and Utilization. 6: 1–8.

USDA ARS, National Genetic Resources Program. 2009. Germplasm Resources Information

Network - (GRIN). [Online Database] National Germplasm Resources Laboratory,

Beltsville, Maryland. Available at http://www.ars-grin.gov/cgi-bin/npgs/html/desc.pl?65059 (verified 17 December 2009).

Wang, X., R. Fjellstrom, Y. Jia, W.G. Yan, M.H. Jia, B.E. Scheffler, D. Wu, Q. Shu, and A.

McClung. 2010. Characterization of Pi-ta blast resistance gene in an international rice

core collection. Plant Breeding. 129(5): 491–501.

Wang, L., Y. Guan, R. Guan, Y. Li, Y. Ma, Z. Dong, X. Liu, H. Zhang, Y. Zhang, Z. Liu, R.

Chang, H. Xu, L. Li, F. Lin, W. Luan, Z. Yan, X. Ning, L. Zhu, Y. Cui, R. Piao, Y. Liu,

P. Chen, and L. Qiu. 2006. Establishment of Chinese soybean (Glycine max) core

collections with agronomic traits and SSR markers. Euphytica. 151: 215–223.

Ward, J.H. 1963. Hierarchical grouping to optimize an objective function. Journal of the

American Statistical Association. 58: 236–244.

Weihai, M., Y. Jinxin, and D. Sihachakr. 2008. Development of core subset for the collection of

Chinese cultivated eggplants using morphological-based passport data. Plant Genetic

Resources: Characterization and Utilization. 6(1): 33–40.

van de Wouw, M., R. van Treuren, and T. van Hintum. 2011. Authenticity of Old Cultivars in

Genebank Collections: A Case Study on Lettuce. Crop Science. 51(2): 736.

Yan, W., N. Rutger, R.J. Bryant, H.E. Bockelman, R.G. Fjellstrom, M.-H. Chen, T.H. Tai, and

A.M. McClung. 2007. Development and evaluation of a core subset of the USDA rice

germplasm collection. Crop Science. 47: 869–878.

Table 1. Measurement levels and missing value percentages of variables evaluated on the

Triticum aestivum L. subsp. aestivum complete collection.

Variable†                        Level of measurement  % missing
awn color                        nominal               63.3
awn type                         ordinal               55.4
BYDV Davis reaction              ordinal               95.3
BYDV Urb reaction                ordinal               69.3
cereal leafbeetle reaction       ordinal               69.6
commonbunt M1 reaction           ratio                 61.3
commonbunt M2 reaction           ratio                 93.3
commonbunt M3 reaction           ratio                 86.7
commonbunt R36 reaction          ratio                 99.9
commonbunt R39 reaction          ratio                 97.0
commonbunt R43 reaction          ratio                 99.4
commonbunt T1 reaction           ratio                 85.2
country                          nominal               0.02
days to flowering                ratio                 4.7
dwarf bunt reaction              ratio                 56.2
glume color                      nominal               62.8
glume pubescence                 ordinal               59.6
growth habit                     nominal               0.8
height                           ratio                 4.8
Hessian B reaction               ordinal               98.9
Hessian C reaction               ordinal               58.3
Hessian E reaction               ordinal               58.3
Hessian GP reaction              ordinal               68.5
Hessian L reaction               ordinal               89.6
kernel color                     nominal               44.6
kernel weight                    ratio                 65.6
kernels per spike                ratio                 92.9
leaf pubescence                  ordinal               61.8
leaf rust adult reaction         ordinal               90.0
leaf rust reaction               ordinal               26.1
lodging                          ordinal               57.6
lysine content                   ratio                 75.1
powdery mildew reaction          ordinal               66.4
rachis length                    ordinal               95.6
RWA 1 chlorosis                  ordinal               27.7
RWA 2 chlorosis                  ordinal               74.0
RWA leaf roll 1                  nominal               27.7
RWA leaf roll 2                  nominal               74.0
SBMV reaction                    ordinal               79.9
scab reaction                    ratio                 90.0
shattering                       ordinal               78.0
spike density                    ordinal               74.5
spike type                       nominal               74.8
spikelets per spike              ratio                 95.7
stagnospora reaction             ordinal               84.7
state                            nominal               20.6
stem rust adult Rosemount        ordinal               87.7
stem rust adult St. Paul         ordinal               65.2
stem rust HJCS reaction          nominal               90.4
stem rust HNLQ reaction          nominal               91.1
stem rust QFBS reaction          nominal               82.5
stem rust QSHS reaction          nominal               90.2
stem rust RHRS reaction          nominal               90.5
stem rust RKQS reaction          nominal               91.1
stem rust RTQQ reaction          nominal               81.7
stem rust TNMH reaction          nominal               90.4
stem rust TNMK reaction          nominal               81.8
straw breakage                   ordinal               70.5
straw color                      nominal               55.6
stripe rust adult Mt. Vernon     ordinal               7.3
stripe rust adult Pullman        ordinal               26.5
stripe rust PST 100 reaction     ordinal               88.3
stripe rust PST 17 reaction      ordinal               50.2
stripe rust PST 20 reaction      ordinal               73.0
stripe rust PST 25 reaction      ordinal               97.9
stripe rust PST 27 reaction      ordinal               70.5
stripe rust PST 29 reaction      ordinal               71.2
stripe rust PST 37 reaction      ordinal               62.8
stripe rust PST 43 reaction      ordinal               65.1
stripe rust PST 45 reaction      ordinal               62.8
stripe rust PST 78 reaction      ordinal               89.8
stripe rust PST 80 reaction      ordinal               92.8
stripe rust severity Mt. Vernon  ratio                 7.3
stripe rust severity Pullman     ratio                 26.3

† Detailed information on variables available at http://www.ars-grin.gov/cgi-bin/npgs/html/desclist.pl?65

Table 2. Removal percentages by variable for simulating data sets with missing values by

removing values from the "complete collection".

Columns, in order: Variable; Level of measurement; % removed, Set 1; % removed, Set 2.

country nominal 0 0

days to flowering ratio 5 95

height ratio 5 95

stripe rust adult Mt. Vernon ordinal 5 90

stripe rust severity Mt. Vernon ratio 5 90

state nominal 25 90

leaf rust reaction ordinal 25 90

RWA 1 chlorosis ordinal 25 75

RWA leaf roll 1 nominal 25 75

stripe rust adult Pullman ordinal 50 75

stripe rust severity Pullman ratio 50 75

kernel color nominal 50 50

Hessian C reaction ordinal 50 50

Hessian E reaction ordinal 75 50

kernel weight ratio 75 50

straw color nominal 75 25

awn type ordinal 75 25

lodging ordinal 90 25

leaf pubescence ordinal 90 25

glume color nominal 90 5

awn color nominal 90 5

straw breakage ordinal 95 5

commonbunt M1 reaction ratio 95 5

Table 3. Comparisons of core subset selection methods in terms of diversity of 1000 potential core subsets selected from 200

complete collections simulated with values removed at the rates given by set 1 (see Table 2) from accessions selected randomly from a

uniform distribution.

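Columns, in order: Distance variables‡; Clustering§; Sampling¶; Mean # of accessions; RecMed† Medians#; RecMed† Ranks††; RecIQR† Medians; RecIQR† Ranks; RecRange† Medians; RecRange† Ranks; RecNCat† Medians; RecNCat† Ranks; RecH† Medians; RecH† Ranks. The first three columns define the core subset selection method.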

Sparse UPGMA Logarithm 322.7 30.1 1.2 116.1 2.6 95.9 1.0 77.6 1.0 103.0 1.0

Sparse UPGMA Proportional 320.5 1.1 10.9 104.5 9.0 91.3 4.5 71.8 4.5 92.0 7.4

Sparse UPGMA Square Root 325.5 18.3 3.5 115.1 1.8 95.0 2.1 76.3 2.0 100.9 2.0

Sparse Ward Logarithm 340.2 6.0 6.5 112.0 3.7 90.4 5.9 71.0 5.7 94.9 4.9

Sparse Ward Proportional 310.0 4.2 6.8 102.8 11.1 89.2 11.0 68.2 11.4 90.3 11.2

Sparse Ward Square Root 330.0 1.5 8.1 110.6 5.3 90.1 7.2 70.2 7.0 93.6 6.0

Dense UPGMA Logarithm 314.3 25.2 1.8 110.7 5.3 92.5 3.3 72.6 3.8 98.7 3.0

Dense UPGMA Proportional 311.2 3.3 7.9 103.3 10.3 89.3 10.4 69.3 9.2 90.6 10.3

Dense UPGMA Square Root 327.4 15.7 4.4 112.8 3.2 91.8 4.5 72.2 4.3 96.4 4.1

Dense Ward Logarithm 339.7 0.9 9.4 109.3 6.4 89.8 8.1 69.5 8.3 91.8 7.7

Dense Ward Proportional 310.2 5.1 6.0 102.7 11.4 89.2 11.0 68.2 11.5 90.3 11.5

Dense Ward Square Root 329.9 0.7 11.5 106.8 8.0 89.6 9.0 69.1 9.4 91.3 9.0

† Recovery metrics: RecMed, recovery of median; RecIQR, recovery of interquartile range; RecRange, recovery of range; RecNCat, recovery of number of unique categories; RecH, recovery of Shannon's Index.

‡ Distance calculations were conducted using either all of the variables (Sparse) or only those few with 5% or less missing values (Dense).

§ Clustering of accessions methods: UPGMA, unweighted pair-group method using arithmetic averages; Ward, Ward’s minimum variance.

¶ Sampling accessions from each cluster in proportion to cluster size (proportional), natural logarithm of the size (logarithm), or square root of the size (square root).

# Means, over 200 simulations, of medians of recovery metrics of 1000 potential core subsets.

†† Means, over 200 simulations, of ranks of selection methods. Methods were ranked, with highest values receiving the lowest ranks, within each simulation based on medians, over 1000 potential core subsets, of recovery metrics.
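The three sampling strategies in footnote ¶ differ only in how cluster size is transformed before accessions are allotted. The sketch below is a minimal illustration under assumptions that are not taken from this chapter: shares are rounded to whole accessions, every cluster contributes at least one accession, and no cluster contributes more than it contains. The function name `allocate` and the cluster sizes are hypothetical.

```python
import math


def allocate(cluster_sizes, core_size, strategy):
    """Split core_size across clusters in proportion to a transform of cluster size."""
    transforms = {
        "proportional": lambda n: float(n),
        "logarithm": lambda n: math.log(n),   # natural logarithm of cluster size
        "square root": lambda n: math.sqrt(n),
    }
    weights = [transforms[strategy](n) for n in cluster_sizes]
    total = sum(weights)
    # Assumed rule: round each share, take at least one accession per cluster,
    # and never take more accessions than the cluster holds.
    return [
        min(n, max(1, round(core_size * w / total)))
        for n, w in zip(cluster_sizes, weights)
    ]


# Hypothetical clusters of very unequal size and a core of about 30 accessions.
sizes = [1000, 100, 10]
for strategy in ("proportional", "logarithm", "square root"):
    print(strategy, allocate(sizes, 30, strategy))
```

In this example the logarithm strategy draws the most accessions from the smallest cluster and the proportional strategy the fewest, which is the sense in which logarithm sampling favors small, distinct groups.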

Table 4. Comparisons of core subset selection methods in terms of diversity of 1000 potential core subsets selected from 200

complete collections simulated with values removed at the rates given by set 1 (see Table 2) from accessions selected as a contiguous

group.

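Columns, in order: Distance variables‡; Clustering§; Sampling¶; Mean # of accessions; RecMed† Medians#; RecMed† Ranks††; RecIQR† Medians; RecIQR† Ranks; RecRange† Medians; RecRange† Ranks; RecNCat† Medians; RecNCat† Ranks; RecH† Medians; RecH† Ranks. The first three columns define the core subset selection method.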

Sparse UPGMA Logarithm 314.4 27.0 2.1 122.3 2.2 95.2 1.4 75.4 2.0 101.7 1.2

Sparse UPGMA Proportional 317.4 0.9 11.1 104.0 9.2 91.0 5.1 70.8 5.9 91.6 8.0

Sparse UPGMA Square Root 325.6 13.0 4.5 115.9 2.7 94.5 2.1 74.7 2.0 99.4 2.2

Sparse Ward Logarithm 340.1 8.4 5.5 113.2 4.2 90.6 5.9 71.2 5.3 94.8 4.9

Sparse Ward Proportional 309.8 4.4 6.3 102.8 11.0 89.2 11.0 68.2 11.4 90.3 11.1

Sparse Ward Square Root 330.0 1.9 8.2 110.7 5.5 90.2 6.9 70.4 6.4 93.5 5.8

Dense UPGMA Logarithm 316.7 25.2 1.9 114.0 4.0 92.6 3.4 73.0 3.3 98.2 2.8

Dense UPGMA Proportional 310.9 3.8 7.0 102.9 10.6 89.3 10.6 68.9 9.5 90.4 10.6

Dense UPGMA Square Root 327.8 11.6 5.0 113.7 3.7 91.9 4.1 72.2 4.0 95.9 4.2

Dense Ward Logarithm 339.8 1.2 9.2 109.4 6.2 89.8 7.9 69.5 8.0 91.9 7.3

Dense Ward Proportional 310.2 4.8 5.9 102.7 11.1 89.2 11.1 68.2 11.3 90.3 11.2

Dense Ward Square Root 329.9 0.7 11.4 106.7 7.7 89.6 8.7 69.1 9.1 91.3 8.6

† Recovery metrics: RecMed, recovery of median; RecIQR, recovery of interquartile range; RecRange, recovery of range; RecNCat, recovery of number of unique categories; RecH, recovery of Shannon's Index.

‡ Distance calculations were conducted using either all of the variables (Sparse) or only those few with 5% or less missing values (Dense).

§ Clustering of accessions methods: UPGMA, unweighted pair-group method using arithmetic averages; Ward, Ward’s minimum variance.

¶ Sampling accessions from each cluster in proportion to cluster size (proportional), natural logarithm of the size (logarithm), or square root of the size (square root).

# Means, over 200 simulations, of medians of recovery metrics of 1000 potential core subsets.

†† Means, over 200 simulations, of ranks of selection methods. Methods were ranked, with highest values receiving the lowest ranks, within each simulation based on medians, over 1000 potential core subsets, of recovery metrics.

Table 5. Comparisons of core subset selection methods in terms of diversity of 1000 potential core subsets selected from 200

complete collections simulated with values removed at the rates given by set 2 (see Table 2) from accessions selected randomly from a

uniform distribution.

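Columns, in order: Distance variables‡; Clustering§; Sampling¶; Mean # of accessions; RecMed† Medians#; RecMed† Ranks††; RecIQR† Medians; RecIQR† Ranks; RecRange† Medians; RecRange† Ranks; RecNCat† Medians; RecNCat† Ranks; RecH† Medians; RecH† Ranks. The first three columns define the core subset selection method.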

Sparse UPGMA Logarithm 315.4 4.3 4.7 114.1 1.2 89.8 7.0 74.1 1.5 99.7 1.1

Sparse UPGMA Proportional 312.2 6.0 3.6 103.0 9.3 89.4 7.4 70.4 7.2 90.8 9.2

Sparse UPGMA Square Root 325.8 1.4 7.3 111.3 3.3 90.1 4.1 73.7 2.1 97.1 2.5

Sparse Ward Logarithm 339.9 3.7 6.4 106.2 7.5 89.6 4.8 70.6 6.6 93.1 6.5

Sparse Ward Proportional 309.7 4.5 5.1 102.7 10.1 89.2 10.0 68.2 11.4 90.3 11.3

Sparse Ward Square Root 330.0 3.7 8.0 104.6 8.9 89.5 5.9 69.8 8.7 92.0 8.0

Dense UPGMA Logarithm 335.4 2.4 6.0 113.0 2.0 89.5 5.4 72.0 3.7 97.0 2.5

Dense UPGMA Proportional 310.5 6.2 3.4 102.9 9.6 89.2 10.0 70.5 7.0 90.5 9.9

Dense UPGMA Square Root 328.4 1.1 9.1 108.5 5.3 89.5 5.6 71.9 4.1 95.0 4.6

Dense Ward Logarithm 340.1 1.1 8.8 110.4 3.8 89.7 2.9 70.8 6.1 95.0 4.6

Dense Ward Proportional 309.7 4.7 4.5 102.7 10.3 89.2 9.8 68.2 11.5 90.3 11.5

Dense Ward Square Root 330.0 0.9 11.2 106.9 6.7 89.5 5.1 70.0 8.2 93.3 6.5

† Recovery metrics: RecMed, recovery of median; RecIQR, recovery of interquartile range; RecRange, recovery of range; RecNCat, recovery of number of unique categories; RecH, recovery of Shannon's Index.

‡ Distance calculations were conducted using either all of the variables (Sparse) or only those few with 5% or less missing values (Dense).

§ Clustering of accessions methods: UPGMA, unweighted pair-group method using arithmetic averages; Ward, Ward’s minimum variance.

¶ Sampling accessions from each cluster in proportion to cluster size (proportional), natural logarithm of the size (logarithm), or square root of the size (square root).

# Means, over 200 simulations, of medians of recovery metrics of 1000 potential core subsets.

†† Means, over 200 simulations, of ranks of selection methods. Methods were ranked, with highest values receiving the lowest ranks, within each simulation based on medians, over 1000 potential core subsets, of recovery metrics.

Table 6. Comparisons of core subset selection methods in terms of diversity of 1000 potential core subsets selected from 200

complete collections simulated with values removed at the rates given by set 2 (see Table 2) from accessions selected as a contiguous

group.

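Columns, in order: Distance variables‡; Clustering§; Sampling¶; Mean # of accessions; RecMed† Medians#; RecMed† Ranks††; RecIQR† Medians; RecIQR† Ranks; RecRange† Medians; RecRange† Ranks; RecNCat† Medians; RecNCat† Ranks; RecH† Medians; RecH† Ranks. The first three columns define the core subset selection method.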

Sparse UPGMA Logarithm 320.3 4.1 5.0 113.9 1.5 89.8 5.5 72.8 2.5 98.7 1.4

Sparse UPGMA Proportional 311.3 4.7 5.1 103.2 9.2 89.3 8.0 69.5 8.0 90.7 9.1

Sparse UPGMA Square Root 327.3 1.5 7.6 111.0 3.3 89.9 4.2 72.3 3.3 96.2 2.9

Sparse Ward Logarithm 340.0 4.6 5.6 104.8 7.6 89.7 4.6 69.8 7.7 92.1 7.6

Sparse Ward Proportional 309.6 4.9 4.9 102.7 10.1 89.1 9.7 68.2 11.0 90.3 10.8

Sparse Ward Square Root 330.0 4.4 6.8 104.0 8.7 89.5 6.0 69.4 8.5 91.5 8.4

Dense UPGMA Logarithm 333.1 2.4 6.1 112.5 2.2 89.4 6.4 72.2 3.0 97.0 2.3

Dense UPGMA Proportional 310.6 6.4 3.2 103.0 9.5 89.1 9.6 70.6 5.9 90.6 9.2

Dense UPGMA Square Root 328.1 1.1 9.2 108.2 5.3 89.4 6.2 72.0 3.1 94.9 4.3

Dense Ward Logarithm 340.1 1.2 8.8 110.2 3.8 89.8 3.0 70.3 6.7 94.8 4.5

Dense Ward Proportional 309.7 4.6 4.7 102.7 10.3 89.1 9.7 68.2 10.9 90.3 11.0

Dense Ward Square Root 330.0 1.0 11.0 106.8 6.5 89.5 5.1 69.8 7.5 93.1 6.4

† Recovery metrics: RecMed, recovery of median; RecIQR, recovery of interquartile range; RecRange, recovery of range; RecNCat, recovery of number of unique categories; RecH, recovery of Shannon's Index.

‡ Distance calculations were conducted using either all of the variables (Sparse) or only those few with 5% or less missing values (Dense).

§ Clustering of accessions methods: UPGMA, unweighted pair-group method using arithmetic averages; Ward, Ward’s minimum variance.

¶ Sampling accessions from each cluster in proportion to cluster size (proportional), natural logarithm of the size (logarithm), or square root of the size (square root).

# Means, over 200 simulations, of medians of recovery metrics of 1000 potential core subsets.

†† Means, over 200 simulations, of ranks of selection methods. Methods were ranked, with highest values receiving the lowest ranks, within each simulation based on medians, over 1000 potential core subsets, of recovery metrics.

Figure 3. Plot of cumulative means, over simulations, of median recovery of interquartile range,

over 1000 potential core subsets per simulation. Simulations were generated by removing values

from randomly chosen individual accessions with missingness rates given by set 1. The values

of the means of all 200 simulations are shown in Table 3.

[Figure 3 plot not reproduced. Panel title: MedRIQR; x-axis: Index (0 to 200); y-axis: Mean over simulations of medians over cores.]

Figure 4. Plot of cumulative means, over simulations, of median recovery of interquartile range,

over 1000 potential core subsets, ranked across methods within each simulation. Simulations

were generated by removing values from randomly chosen individual accessions with

missingness rates given by set 1. The mean ranks, over all 200 simulations, are shown in Table

3.

[Figure 4 plot not reproduced. Panel title: RRIQR; x-axis: Index (0 to 200); y-axis: Mean of Ranks.]

Appendix Table 1. Assignments of countries to world macro region components.

Caribbean: Cuba

Central America: Guatemala, Honduras, Mexico, Nicaragua, Panama

South America: Argentina, Bolivia, Brazil, Chile, Colombia, Ecuador, Paraguay, Peru, Uruguay, Venezuela

Northern America: Canada, United States

Eastern Asia: China, Japan, Korea, North; Korea, South; Mongolia, Taiwan

South-Central Asia: Afghanistan, Bangladesh, Bhutan, India, Iran, Kazakhstan, Kyrgyzstan, Nepal, Pakistan, Tajikistan, Turkistan, Turkmenistan, Uzbekistan

South-eastern Asia: Indonesia, Philippines, Thailand

Western Asia: Ancient Palestine, Armenia, Asia Minor, Azerbaijan, Cyprus, Georgia, Iraq, Israel, Jordan, Lebanon, Oman, Saudi Arabia, Syria, Turkey, Yemen

Eastern Europe: Belarus, Bulgaria, Czech Republic, Czechoslovakia, Former Soviet Union, Hungary, Moldova, Poland, Romania, Russian Federation, Slovakia, Ukraine

Northern Europe: Denmark, Estonia, Finland, Iceland, Ireland, Latvia, Lithuania, Norway, Sweden, United Kingdom

Southern Europe: Albania, Andorra, Bosnia and Herzegovina, Croatia, Former Yugoslavia, Greece, Italy, Macedonia, Malta, Portugal, Slovenia, Spain, Yugoslavia

Western Europe: Austria, Belgium, France, Germany, Netherlands, Switzerland

Oceania: Australia, New Zealand

Unknown: Asia, Europe, Uncertain, Unknown

Appendix Table 2. Comparison of original core subset against the reselected core subset in

terms of recovery metrics.

Core subset  RecMed†  RecIQR†  RecRange†  RecNCat†  RecH†
Original     4.0      101.5    97.7       92.5      101.0
New core     9.2      95.8     98.2       94.9      109.3

† Recovery metrics: RecMed, recovery of median; RecIQR, recovery of interquartile range; RecRange, recovery of range; RecNCat, recovery of number of unique categories; RecH, recovery of Shannon's Index.
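A recovery value above 100, such as RecH for both cores in this table, can arise when a core is more even than the collection it was drawn from. The sketch below illustrates this for Shannon's Index on a single nominal variable. It assumes, purely for illustration, that recovery is the core statistic expressed as a percentage of the collection statistic; the metric definitions actually used are given earlier in this chapter. The function names and data are hypothetical.

```python
import math
from collections import Counter


def shannon_index(values):
    """Shannon's Index H = -sum(p * ln p) over observed categories; None marks missing."""
    counts = Counter(v for v in values if v is not None)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def recovery_percent(core_values, collection_values, statistic):
    """Assumed definition: core statistic as a percentage of the collection statistic."""
    return 100.0 * statistic(core_values) / statistic(collection_values)


# Hypothetical kernel color scores: the collection is dominated by one class,
# while the core samples both classes equally.
collection = ["red"] * 8 + ["white"] * 2
core = ["red", "red", "white", "white"]
rec_h = recovery_percent(core, collection, shannon_index)
print(round(rec_h, 1))
```

Here the core contains no category absent from the collection, yet its index is higher because the two classes are equally represented.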

CHAPTER 3

COMPARISON OF LINEAR MIXED MODELS FOR MULTIPLE

ENVIRONMENT PLANT BREEDING TRIALS

Carl A. Walker¹, Fabiano Pita², Kimberly Garland Campbell¹,³

¹ Dept. of Crop and Soil Sciences, Washington State University; ² Quantitative Genetics Group, Dow AgroSciences; ³ USDA-ARS, Wheat Genetics, Quality, Physiology, and Disease Research Unit

Abstract

Evaluations of genotypes in varied environmental conditions are referred to as multiple environment trials

(MET) and often show genotype by environment interactions, necessitating estimation of effects of

genotypes within environments. Empirical best linear unbiased predictions can provide more accurate

estimates of these effects, depending upon the mixed model used. The objective of this work was to

simulate and analyze MET data sets to determine which linear models provide the most accurate

estimates and determine how the choice of ideal model changes as a result of different MET conditions.

Simulations varied in terms of numbers of genotypes and environments, variances and covariances of

genotype responses within and between environments, experimental design, and experimental error

variance. Simulated MET were fit with mixed models with or without genetic relationship matrices

(GRM) and with structures of varying complexity used to model relationships among environments.

Estimates from these analyses for effects of genotypes within environments were compared to the

simulated values. The model that included a GRM and constant variance-constant correlation structure

was the most accurate for the largest number of scenarios. Models including GRM that allowed

heterogeneous environmental variances with constant correlations often resulted in greater accuracy when

MET were simulated with heterogeneous variances among environments. Factor analytic models with GRM were the most accurate only in a subset of scenarios simulated with complex relationships among environments, 100 or more genotypes, fewer than 40 environments, and low error variance.

Introduction

Evaluations of genotypes in varied environmental conditions are referred to as multiple environment trials

(MET), and are used in advanced stages of plant breeding programs to identify genotypes with superior

performance across environments and within specific environments or sets of environments. Yield data

from MET often show genotype by environment interactions (G×E) and, in practice, are most often analyzed using a two-way analysis of variance (ANOVA) model in which genotype, environment, and their interaction are treated as fixed effects:

$$y_{ijk} = \mu + g_i + e_j + (ge)_{ij} + \varepsilon_{ijk}$$

where $y_{ijk}$ is the yield (or other response variable) of the $k$th replicate of the $i$th genotype in the $j$th environment, $\mu$ is the overall mean, $g_i$ is the fixed effect of the $i$th genotype, $e_j$ is the fixed effect of the $j$th environment, $(ge)_{ij}$ is the interaction between the $i$th genotype and the $j$th environment, and $\varepsilon_{ijk}$ is the experimental error associated with the $ijk$th observation; $i = 1 \ldots N_g$, $j = 1 \ldots N_e$, $k = 1 \ldots N_r$. The estimates of genotype within environment effects are the means across replicates of each genotype in each environment (i.e., cell means). The major disadvantage of this approach is that these cell-mean estimates are usually based on very little data (dependent on the number of replicates) and so are less predictively accurate than some alternative estimators. Additionally, this approach cannot be used to estimate G×E effects when genotypes are not replicated within environments, since the effects of G×E and experimental error would be confounded. That is, experimental error cannot be separated from the specific effect of each genotype and environment combination.

Various estimators have been shown to be more accurate for MET than cell means. These include the Additive Main effects Multiplicative Interaction (AMMI) models (Gauch and Zobel, 1988;

Gauch, 1988) and sites regression (SREG; Cornelius and Crossa, 1999) model families, which are sometimes referred to as linear-bilinear models. These two fixed-effect model families include sums of multiplicative terms, resulting from singular value decomposition, that replace $(ge)_{ij}$ in the case of AMMI, or $g_i + (ge)_{ij}$ in the case of SREG. The AMMI and SREG models have been shown to be roughly equivalent in terms of predictive accuracy (Cornelius and Crossa, 1999). Like the analysis of G×E in a fixed-effects ANOVA, the standard implementation of these models cannot be used when data from any genotype and environment combination are missing. However, the expectation-maximization algorithm has been used to

impute missing data with the AMMI model (Gauch and Zobel, 1990).
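The fixed-effects decomposition that these models build on can be sketched as follows. This is a minimal illustration for balanced data with sum-to-zero constraints on the effects, not an implementation of AMMI or SREG. The function name `two_way_effects` and the small data set are hypothetical.

```python
from statistics import mean


def two_way_effects(yields):
    """yields[i][j] holds the replicate observations of genotype i in environment j.

    Returns the cell means, the overall mean, the genotype and environment main
    effects, and the interaction residuals (the term AMMI replaces with
    multiplicative terms).
    """
    cell = [[mean(reps) for reps in row] for row in yields]
    n_g, n_e = len(cell), len(cell[0])
    mu = mean(mean(row) for row in cell)
    g = [mean(cell[i]) - mu for i in range(n_g)]
    e = [mean(cell[i][j] for i in range(n_g)) - mu for j in range(n_e)]
    ge = [[cell[i][j] - mu - g[i] - e[j] for j in range(n_e)] for i in range(n_g)]
    return cell, mu, g, e, ge


# Hypothetical balanced trial: 2 genotypes x 2 environments x 2 replicates.
yields = [
    [[4.0, 6.0], [7.0, 9.0]],
    [[6.0, 8.0], [5.0, 7.0]],
]
cell, mu, g, e, ge = two_way_effects(yields)
print(cell, mu, g, e, ge)
```

In this example the genotype main effects are zero while the interaction terms change sign between environments, so the two genotypes swap rank. Each cell mean rests on only two observations, which is the weakness of cell means noted above.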

As an alternative to the above models that consider genotype effects within environments as

fixed, these effects can be considered random values and modeled using linear mixed models, which have

important inherent benefits over fixed-effects models. Mixed models can easily incorporate non-constant

error variance structures, including within-field spatial correlation, in the same model as genotype and

environment effects. Additionally, mixed models easily handle missing data and, with some specific

models, even unreplicated data. Some models can also predict the effects of genotypes in environments in which they were not tested.

Mixed model analyses have a long history in animal breeding (Henderson, 1973), and recent

research has demonstrated new approaches to make them very effective in plant breeding. If a mixed

linear model is used, genotype effects are estimated as empirical best linear unbiased predictors (BLUPs)

calculated using the estimated variance parameters. A very basic mixed model would assume a random

effect of genotypes within environments that has a variance-covariance matrix of σ²I, where σ² is a constant variance parameter and I is an identity matrix. In most breeding programs, plant or animal, at

least a portion of the genotypes assessed in a trial are related and therefore would be expected to show

some correlation in their effects. Pedigree information can be incorporated into the model through a

Genetic Relationship Matrix (GRM) to take advantage of these relationships and improve predictive

accuracy (Henderson, 1973). The GRM is also known as the additive relationship matrix or numerator relationship matrix. It is usually symbolized as A, with $A = 2[f_{ii'}]$, where $f_{ii'}$ is the coefficient of parentage or coancestry between genotypes $i$ and $i'$ (Mrode and Thompson, 2005). When a GRM is used in a linear mixed model, the performance of genotypes can be predicted for environments in which they were not grown. The GRM allows the model to use information from related genotypes to predict the untested

genotype, because known covariances are modeled between pairs of related genotypes.
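One common way to build such a matrix from a pedigree is the tabular method used in animal breeding, sketched below. This is an illustration only: it assumes unrelated, non-inbred founders and ordinary biparental crosses, so it does not account for the selfing generations typical of inbred line development in plant breeding. The function name and the pedigree are hypothetical.

```python
def relationship_matrix(pedigree):
    """Tabular method for the numerator relationship matrix A.

    pedigree is a list of (parent1, parent2) index pairs, with None for an
    unknown parent, ordered so that parents appear before their offspring.
    """
    n = len(pedigree)
    A = [[0.0] * n for _ in range(n)]
    for i, (s, d) in enumerate(pedigree):
        for j in range(i):
            # Relationship with an earlier individual: average of its
            # relationships with the two parents of i.
            a_js = A[j][s] if s is not None else 0.0
            a_jd = A[j][d] if d is not None else 0.0
            A[i][j] = A[j][i] = 0.5 * (a_js + a_jd)
        # Diagonal: 1 plus the inbreeding coefficient, half the parents' relationship.
        inbreeding = 0.5 * A[s][d] if s is not None and d is not None else 0.0
        A[i][i] = 1.0 + inbreeding
    return A


# Hypothetical pedigree: 0 and 1 are unrelated founders, 2 and 3 are full sibs
# from the cross 0 x 1, and 4 is the progeny of 2 x 3.
pedigree = [(None, None), (None, None), (0, 1), (0, 1), (2, 3)]
A = relationship_matrix(pedigree)
for row in A:
    print(row)
```

The off-diagonal value of 0.5 between the full sibs corresponds to a coancestry of 0.25, consistent with A being twice the coancestry.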

Another modification that may improve the predictive accuracy of mixed models is to increase

the complexity of the variance-covariance matrix of the random G×E effect beyond σ2Ig (Piepho, 1994).

The matrix can be described as the product of two other matrices, such that Gge = Ge ⨂ Ig, where Ig is an

identity matrix with dimensions equal to the number of genotypes, and structures of varying complexity

can be used to model Ge (Smith et al., 2001). The structures for Ge reflect patterns of relationships among

environments in terms of similar genotype performance. One option for Ge is the factor analytic (FA)

structure, which increases in complexity with the number of factors used. When using a FA structure

researchers must choose how many factors to include. More factors allow for greater flexibility, but may

reduce model parsimony. These structures are more parsimonious than an unstructured Ge when the

number of environments is sufficiently high compared to the number of factors. Factor analytic

structures can be combined with pedigree information to improve model fit, as measured by information

criteria (Crossa et al., 2006; Oakey et al., 2007; Kelly et al., 2009; Beeck et al., 2010). Previous studies

have analyzed only a small number of real MET data sets, which cover a limited range of scenarios (number

of genotypes, number of environments, and relationships among genotypes or environments).

A simulation study could determine if the FA model with a GRM is the most effective model

for a much wider range of MET. Additionally, relationships between MET conditions and the best choice

of model can be thoroughly investigated, since simulations allow conditions to be changed individually.

In this work we simulated MET across the ranges of conditions expected in typical MET, and

analyzed these simulations using multiple mixed linear models with different variance-covariance

structures for the random effects of genotypes within environments. The objective of this work was to

determine which of these models would be most effective in breeding programs by consistently providing

the most accurate estimates and how the ideal model changes as a result of different MET conditions.

Methods

Simulations

Simulations were conducted to generate data sets that resemble MET across a range of conditions. The

simulations included randomly generated effects of genotypes within environments and phenotypes of

each observation, resulting from the addition of a random experimental error to the genotype within

environment effect. These simulated data sets covered a range of scenarios with varying numbers of

environments and genotypes, environmental relationship patterns, field trial designs, and magnitudes of

experimental error.

Genotype effects within each environment were simulated as random samples from multivariate

normal distributions with means of 0 and covariance matrices (ΣGE) that differed among scenarios. The

ΣGE are equivalent to correlation matrices multiplied by a constant scalar variance component of unity.

The ΣGE were the Kronecker (or direct) product of a matrix of correlations between environments (ΣE) and

a matrix of correlations between genotypes (ΣG). The ΣE were generated in four sizes: 5 × 5, 10 × 10, 20

× 20, and 40 × 40, corresponding to scenarios with 5, 10, 20, or 40 environments, respectively. The ΣE

were themselves generated as Kronecker products of two matrices: EY ⨂ EL. The ΣE matrices for five

and ten environments were simply products of EL matrices of size 5 × 5 or 10 × 10 with a 1 × 1 EY matrix

of unity, whereas the ΣE of 20 and 40 environments were Kronecker products of 4 × 4 and 8 × 8 EY

matrices with a 5 × 5 EL matrix. This allows the ΣE to better reflect possible complex patterns of

relationships among large numbers of environments. For example, the five and ten environment scenarios

reflect MET with five or ten locations in a single year, whereas the 20 and 40 environment scenarios

reflect MET with five locations evaluated over four or eight years.

The ΣE are grouped into three classes of patterns of correlations: compound symmetry A (CSA),

compound symmetry B (CSB), and Toeplitz (Toep). For the compound symmetric classes both the EY

and EL matrices had constant off-diagonal elements of 0.3 and 0.7 for CSA and 0.4 and 0.4 for CSB. Both

the EY and EL matrices for the Toeplitz class of patterns had bands of constant correlation parallel to the

diagonal with the greatest correlations next to the diagonal. The exact correlations differed by the sizes of

ΣE, but in all cases included negative values for the element farthest from the diagonal. These specific

correlation values are by no means the only correlation values that could occur in a MET, but were

chosen to be similar to values observed in the Washington State University soft white wheat variety trials

(details provided in the appendix).

The EL were generated with three levels of variance heterogeneity: homogeneous variances (CSA,

etc.), heterogeneous variances (CSAH, etc.), and very heterogeneous variances (CSAVH, etc.). To do so,

the EL were pre- and post-multiplied by a diagonal matrix of standard deviations. In the heterogeneous

case, the variances ranged from 0.5 (least) to 1.5 (greatest). In the very heterogeneous case, they

ranged from 0.2 to 2. A three-to-one ratio of greatest to least variance is often used as a rule-of-thumb

cutoff for considering variances to be heterogeneous, but greater levels of heterogeneity can occur among

highly variable environments.
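
The construction of ΣE described above can be sketched as follows for a CSA-type pattern with 20 environments (four years by five locations). This is an illustrative sketch under stated assumptions, not the code used in the study: the function names are hypothetical, the assignment of 0.3 to EY and 0.7 to EL is assumed from the order in the text, and only the end points of the heterogeneous variances (0.5 and 1.5) come from the text, with the even spacing between them assumed.

```python
import numpy as np

def compound_symmetric(n, rho):
    """n x n correlation matrix with constant off-diagonal correlation rho."""
    return np.full((n, n), rho) + (1.0 - rho) * np.eye(n)

def scale_by_variances(corr, variances):
    """Pre- and post-multiply a correlation matrix by diag(standard deviations)."""
    d = np.diag(np.sqrt(variances))
    return d @ corr @ d

# CSA-type pattern: years (E_Y) and locations (E_L).
E_Y = compound_symmetric(4, 0.3)
E_L = compound_symmetric(5, 0.7)

# Heterogeneous location variances; the intermediate values are an assumption.
loc_var = np.linspace(0.5, 1.5, 5)
E_L_het = scale_by_variances(E_L, loc_var)

# Kronecker product E_Y (x) E_L: 20 x 20, with location varying fastest within year.
Sigma_E = np.kron(E_Y, E_L_het)
```

With this ordering, the first five rows and columns of Sigma_E are the five locations in the first year, and blocks off the diagonal hold the between-year covariances.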

Four options were considered for ΣG, corresponding to scenarios of 25, 50, 100, and 150

genotypes. In order to choose ΣG, first a GRM was estimated from the pedigree in a Dow AgroSciences

early generation study of North American Stiff Stalk maize inbred lines. The four options were

overlapping submatrices of this GRM: the first 25, 50, 100, and 150 rows and columns in the GRM.

With four options for sizes of ΣE, three classes of patterns for ΣE, three levels of variance

heterogeneity, and four options for ΣG, taking all combinations results in a total of 144 different simulated

scenarios for ΣGE. For each ΣGE, genotype effects were generated by randomly sampling from a

multivariate normal distribution with variance-covariance matrix equal to ΣGE 1000 times.
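
A draw from a multivariate normal distribution with covariance ΣGE = ΣE ⨂ ΣG does not require the full Kronecker product to be formed. The sketch below uses Cholesky factors of the two smaller matrices instead, then enumerates the scenario combinations. The matrices, seed, and function name are hypothetical placeholders, and both matrices are assumed to be positive definite.

```python
import itertools
import numpy as np

def sample_ge_effects(Sigma_E, Sigma_G, rng):
    """Draw one genotypes x environments matrix whose column-stacked vector
    has covariance kron(Sigma_E, Sigma_G)."""
    L_E = np.linalg.cholesky(Sigma_E)
    L_G = np.linalg.cholesky(Sigma_G)
    Z = rng.standard_normal((Sigma_G.shape[0], Sigma_E.shape[0]))
    # vec(L_G Z L_E') = (L_E kron L_G) vec(Z), so the covariance is Sigma_E kron Sigma_G.
    return L_G @ Z @ L_E.T

# Small hypothetical inputs (not the matrices used in the study).
Sigma_E = np.full((5, 5), 0.7) + 0.3 * np.eye(5)
Sigma_G = np.full((4, 4), 0.25) + 0.75 * np.eye(4)

rng = np.random.default_rng(1)
effects = sample_ge_effects(Sigma_E, Sigma_G, rng)  # shape (4, 5)

# Sizes of Sigma_E x pattern classes x heterogeneity levels x sizes of Sigma_G.
scenarios = list(itertools.product(
    [5, 10, 20, 40], ["CSA", "CSB", "Toep"], ["homogeneous", "H", "VH"], [25, 50, 100, 150]))
n_scenarios = len(scenarios)
```

Here n_scenarios equals 4 × 3 × 3 × 4 = 144, matching the count given in the text.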

For each set of genotype within environment effects, simulations were generated for three trial

designs (RCBD – randomized complete block designs, MAD – modified augmented designs, and

unreplicated designs) and two experimental error variances. Since spatial field effects were not

considered, the only effect of the experimental design was to determine the number of replicates of each

genotype in each environment. Other designs commonly used in MET likewise have either equal or

unequal replication, regardless of blocking structure, and so would add little beyond the

designs tested here. For the RCBD scenarios, every genotype was replicated three times. In the MAD

scenarios, genotypes were not replicated except for primary and secondary “checks” that were replicated

five and two times, respectively, for every 23 non-check genotypes. In the unreplicated design, each

genotype appeared once in each environment. A fixed effect for each environment was simulated by

sampling one value from a gamma (3, 2) distribution and multiplying it by 3. This distribution and

constant multiplier were chosen to provide environment effects that are of similar magnitude to the

simulated genotype effects and error. Every observation had a unique phenotype equal to the effect of the

genotype in an environment plus a simulated environment effect and a random experimental error

selected from a normal distribution with mean of 0 and one of two possible error variances (σe² = 0.5 or 2.0).

These error variances corresponded to ratios of about 1.7 and 0.4, respectively, of variance of the

genotype within environment effects divided by the variance of the experimental error for a given

simulation. The ratio values varied slightly around these averages among scenarios, with greater values

for scenarios with more environments.
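
The assembly of phenotypes from genotype within environment effects, environment effects, and experimental error can be sketched as below for designs with equal replication (RCBD with three replicates, or unreplicated). The MAD, with its unequally replicated checks, is omitted for brevity. The function name, seed, and placeholder effects are hypothetical, and reading gamma (3, 2) as shape 3 and scale 2 (rather than rate 2) is an assumption.

```python
import numpy as np

def simulate_phenotypes(ge_effects, n_reps, error_var, rng):
    """ge_effects: genotypes x environments matrix of simulated effects.
    Returns phenotypes with shape (n_reps, genotypes, environments)."""
    n_g, n_e = ge_effects.shape
    # One fixed effect per environment: 3 x Gamma(shape=3, scale=2).
    # Treating the second parameter as a scale is an assumption.
    env_effects = 3.0 * rng.gamma(shape=3.0, scale=2.0, size=n_e)
    errors = rng.normal(0.0, np.sqrt(error_var), size=(n_reps, n_g, n_e))
    return ge_effects[None, :, :] + env_effects[None, None, :] + errors

rng = np.random.default_rng(7)
ge_effects = rng.standard_normal((25, 5))  # placeholder for the simulated effects
y_rcbd = simulate_phenotypes(ge_effects, n_reps=3, error_var=0.5, rng=rng)
y_unrep = simulate_phenotypes(ge_effects, n_reps=1, error_var=2.0, rng=rng)
```

Because spatial effects were not simulated, the design enters only through the number of replicates per genotype, which is why a single n_reps argument suffices in this sketch.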

Analyses

A total of 20 related linear mixed models were compared for their ability to predict the simulated

genotype effects within environments based on the simulated phenotypic data. Models were fit using the

program ASReml-R, release 3.0 (Butler et al., 2009), which is a package for the R programming language

(R Development Core Team, 2010). The models were all of the form:

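\[ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{\gamma} + \boldsymbol{\varepsilon}, \]

where y is the vector of observed phenotypes, β is a vector of fixed environment effects, X is the

associated design matrix, γ is the vector of genotype within environment effects, Z is the associated

design matrix, and ε is the vector of experimental error terms. The joint distribution of γ and ε is given

by:

\[
\begin{bmatrix} \boldsymbol{\gamma} \\ \boldsymbol{\varepsilon} \end{bmatrix}
\sim \mathrm{MVN}\left(
\begin{bmatrix} \mathbf{0} \\ \mathbf{0} \end{bmatrix},
\begin{bmatrix} \mathbf{G} & \mathbf{0} \\ \mathbf{0} & \sigma_R^2 \mathbf{I} \end{bmatrix}
\right),
\]

where \(\sigma_R^2\) is a constant error variance and G is a covariance structure that varies for each of the 20

models and is separable:

G = GE ⨂ B,

where both GE and B were varied, resulting in 20 models:

B = I or A,

where I is an identity matrix and A is a GRM. This study evaluated the ideal situation, when the GRM

used perfectly reflects the actual relationships among the genotypes; therefore, A was set equal to the ΣG

used to simulate the data set being analyzed.

Ten structures were used to model GE and these are shown below for a five environment

example. The simplest structure was independence (no covariance) and identical variances (ID):

\[
\mathbf{G}_E =
\begin{bmatrix}
\sigma^2 & 0 & 0 & 0 & 0 \\
0 & \sigma^2 & 0 & 0 & 0 \\
0 & 0 & \sigma^2 & 0 & 0 \\
0 & 0 & 0 & \sigma^2 & 0 \\
0 & 0 & 0 & 0 & \sigma^2
\end{bmatrix}.
\]

A generalization of this is the diagonal structure, where environments are still independent, but each can

have a different variance (Diag):

\[
\mathbf{G}_E =
\begin{bmatrix}
\sigma_1^2 & 0 & 0 & 0 & 0 \\
0 & \sigma_2^2 & 0 & 0 & 0 \\
0 & 0 & \sigma_3^2 & 0 & 0 \\
0 & 0 & 0 & \sigma_4^2 & 0 \\
0 & 0 & 0 & 0 & \sigma_5^2
\end{bmatrix}.
\]

A constant covariance can be added, yielding compound symmetric structures with uniform (CorV) or

heterogeneous variances (CorH):

\[
\mathbf{G}_E =
\begin{bmatrix}
\sigma^2 + \sigma_1 & \sigma_1 & \sigma_1 & \sigma_1 & \sigma_1 \\
\sigma_1 & \sigma^2 + \sigma_1 & \sigma_1 & \sigma_1 & \sigma_1 \\
\sigma_1 & \sigma_1 & \sigma^2 + \sigma_1 & \sigma_1 & \sigma_1 \\
\sigma_1 & \sigma_1 & \sigma_1 & \sigma^2 + \sigma_1 & \sigma_1 \\
\sigma_1 & \sigma_1 & \sigma_1 & \sigma_1 & \sigma^2 + \sigma_1
\end{bmatrix}
\quad \text{(CorV)},
\]

\[
\mathbf{G}_E =
\begin{bmatrix}
\sigma_1^2 + \sigma_1 & \sigma_1 & \sigma_1 & \sigma_1 & \sigma_1 \\
\sigma_1 & \sigma_2^2 + \sigma_1 & \sigma_1 & \sigma_1 & \sigma_1 \\
\sigma_1 & \sigma_1 & \sigma_3^2 + \sigma_1 & \sigma_1 & \sigma_1 \\
\sigma_1 & \sigma_1 & \sigma_1 & \sigma_4^2 + \sigma_1 & \sigma_1 \\
\sigma_1 & \sigma_1 & \sigma_1 & \sigma_1 & \sigma_5^2 + \sigma_1
\end{bmatrix}
\quad \text{(CorH)}.
\]

Here \(\sigma_1\) (without a superscript) denotes the constant covariance.

Six FA structures were compared. Structures were fit with one to three factors and uniform or

heterogeneous specific variances (FA1U, FA2U, FA3U, FA1H, FA2H, FA3H):

GE = ΛΛ′ + Ψ,

where Λ is a matrix whose columns are the factors and Ψ is a diagonal matrix of specific variances:

\[
\boldsymbol{\Lambda} =
\begin{bmatrix} \lambda_{11} \\ \lambda_{21} \\ \lambda_{31} \\ \lambda_{41} \\ \lambda_{51} \end{bmatrix},
\quad
\begin{bmatrix}
\lambda_{11} & \lambda_{12} \\
\lambda_{21} & \lambda_{22} \\
\lambda_{31} & \lambda_{32} \\
\lambda_{41} & \lambda_{42} \\
\lambda_{51} & \lambda_{52}
\end{bmatrix},
\quad \text{or} \quad
\begin{bmatrix}
\lambda_{11} & \lambda_{12} & \lambda_{13} \\
\lambda_{21} & \lambda_{22} & \lambda_{23} \\
\lambda_{31} & \lambda_{32} & \lambda_{33} \\
\lambda_{41} & \lambda_{42} & \lambda_{43} \\
\lambda_{51} & \lambda_{52} & \lambda_{53}
\end{bmatrix},
\]

\[
\boldsymbol{\Psi} = \psi \mathbf{I}_5
\quad \text{or} \quad
\boldsymbol{\Psi} =
\begin{bmatrix}
\psi_1 & 0 & 0 & 0 & 0 \\
0 & \psi_2 & 0 & 0 & 0 \\
0 & 0 & \psi_3 & 0 & 0 \\
0 & 0 & 0 & \psi_4 & 0 \\
0 & 0 & 0 & 0 & \psi_5
\end{bmatrix}.
\]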

In an effort to improve convergence rates, models were fit sequentially in the order of the GE

structures described above. Parameter estimates from simpler models were used as the starting values of

the next more complex structure for which the simple structure was a specific case. If a model did not

converge, the next more complex structure was not attempted. The percentage of simulations of a

scenario for which a model converged was defined as the convergence rate.
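
The families of GE structures compared here (ID, Diag, CorV, CorH, and FA) can be written as small matrix constructors, which also makes the trade-off between flexibility and parsimony concrete. This is a minimal numerical sketch, not ASReml-R code: the function names and all parameter values are hypothetical, and the compound symmetric form (a constant covariance added to every element, so that each diagonal element is a variance plus that covariance) is an assumed parameterization.

```python
import numpy as np

def ge_id(n, var):
    """ID: independent environments with one common variance."""
    return var * np.eye(n)

def ge_diag(variances):
    """Diag: independent environments, each with its own variance."""
    return np.diag(variances)

def ge_cor(variances, cov):
    """CorV/CorH: a constant covariance added to a diagonal structure."""
    n = len(variances)
    return np.diag(variances) + cov * np.ones((n, n))

def ge_fa(loadings, specific):
    """FA: Lambda Lambda' + Psi, with Psi diagonal."""
    return loadings @ loadings.T + np.diag(specific)

def n_params_fa(n_env, k, heterogeneous):
    """Free parameters of a k-factor FA structure after removing the
    k(k-1)/2 rotational constraints on the loadings."""
    return n_env * k - k * (k - 1) // 2 + (n_env if heterogeneous else 1)

n_env = 5
G_corv = ge_cor(np.full(n_env, 1.0), 0.4)                  # CorV
G_corh = ge_cor(np.array([0.5, 0.8, 1.0, 1.2, 1.5]), 0.4)  # CorH
Lam = np.array([[0.9], [0.7], [0.8], [0.5], [0.6]])        # hypothetical loadings
G_fa1h = ge_fa(Lam, np.array([0.2, 0.4, 0.3, 0.6, 0.5]))   # FA1H
n_unstructured = n_env * (n_env + 1) // 2
```

For five environments, the FA1H, FA2H, and FA3H structures have 10, 14, and 17 free parameters, respectively, compared with 15 for an unstructured matrix. This illustrates why FA structures are only more parsimonious than the unstructured form when the number of environments is large relative to the number of factors.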

In addition to the mixed linear models, estimates of genotype effects within environments were

derived from cell means (the mean of the replicates of a genotype in each environment) and Additive

Main effects Multiplicative Interaction (AMMI) models. The AMMI models were fixed effects linear

models with main effects for genotype and environment. The effects of genotype by environment

interaction were replaced with an approximation of the matrix using a reduced set of the principal

components (Gauch, 1988). Only RCBD scenarios were analyzed using the AMMI models, since main

effects of genotype and environment cannot be separated from the interaction term if genotypes are not

replicated in an environment. The AMMI models were fit with all possible numbers of principal

components. The most accurate of the AMMI models for each simulation, as judged by the root mean

square error of prediction (described below), were compared to the mixed linear models. However, this

was often not the most accurate AMMI model as measured with the correlations between estimates from

the model and the simulated data.
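
The AMMI approximation described above can be sketched with a singular value decomposition of the doubly centered table of cell means. This is a minimal illustration for a balanced table, not the implementation used in the study; the function name and the random input are hypothetical, and the function returns fitted cell values (including the grand mean and environment effects) rather than genotype within environment effects alone.

```python
import numpy as np

def ammi_estimates(cell_means, n_components):
    """AMMI fit of a genotypes x environments table of cell means."""
    grand = cell_means.mean()
    g_eff = cell_means.mean(axis=1, keepdims=True) - grand
    e_eff = cell_means.mean(axis=0, keepdims=True) - grand
    additive = grand + g_eff + e_eff
    # The doubly centered residuals are the least-squares interaction estimates.
    interaction = cell_means - additive
    U, s, Vt = np.linalg.svd(interaction, full_matrices=False)
    k = n_components
    # Keep only the first k principal components of the interaction.
    return additive + (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(3)
cell_means = rng.normal(size=(6, 4))  # hypothetical 6 genotypes x 4 environments
fit0 = ammi_estimates(cell_means, 0)  # additive main effects only
fit1 = ammi_estimates(cell_means, 1)  # one interaction component
fit3 = ammi_estimates(cell_means, 3)  # all min(6, 4) - 1 = 3 components
```

With all components retained, the fit reproduces the cell means exactly, so the number of components controls how much of the interaction is treated as signal rather than noise.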

To evaluate each model's predictive accuracy, Pearson and Spearman correlations between the

estimated effects of genotypes within environments and the simulated effects were calculated for each

simulation. Additionally, root mean squared errors of prediction (RMSEP) were calculated as:

\[ \mathrm{RMSEP} = \sqrt{\frac{1}{p} \sum_{i=1}^{p} \left( \tilde{\gamma}_i - \gamma_i \right)^2}, \]

where p is the number of genotype by environment combinations, γi is the ith effect of a genotype

within an environment, and \(\tilde{\gamma}_i\) is the ith predictor. Both the Akaike and Bayesian

information criteria (AIC and BIC, respectively) were also calculated for each of the mixed models. The

models were ranked based on each of these accuracy measures and the information criteria within each

simulation. Models that were not attempted or did not converge were all given the same rank, just

inferior to that of the least accurate model. The accuracy measurements for each model varied greatly

among simulations of the same scenario, whereas the rankings varied to a lesser degree. In order to

summarize the results for each scenario, means were taken, over simulations, of the accuracy

measurements and the rankings of the models.

For each scenario, a different number of simulations were analyzed with every model to ensure

that sufficient simulations were analyzed to produce reliable means. Simulations were analyzed

sequentially, and following the analysis of each simulation, rankings of RMSEP, Pearson correlations,

and Spearman correlations were averaged over the analyzed simulations. When these three cumulative

mean rankings did not change by more than 0.05 from one simulation to the next or over 10 simulations, no

additional simulations were analyzed for that scenario. The number of simulations necessary to achieve

this stability varied greatly among scenarios. Due to time constraints, the scenarios with the largest data

sets were not analyzed and do not appear in the results.
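
The RMSEP calculation, the within-simulation ranking, and the cumulative mean ranks described above can be sketched as follows. This is a minimal illustration with hypothetical RMSEP values and function names. Models that were not attempted or did not converge are represented by None and share a rank one below the worst fitted model. The stopping rule itself is not implemented, since its exact form is not fully specified in the text.

```python
import numpy as np

def rmsep(predicted, simulated):
    """Root mean squared error of prediction over all genotype-environment cells."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(simulated, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def rank_models(scores):
    """Rank models by RMSEP (1 = best). Models scored None share one rank
    just below that of the worst fitted model."""
    fitted = sorted((s, m) for m, s in scores.items() if s is not None)
    ranks = {m: r for r, (s, m) in enumerate(fitted, start=1)}
    for m, s in scores.items():
        if s is None:
            ranks[m] = len(fitted) + 1
    return ranks

def cumulative_mean_ranks(rank_list):
    """Mean rank of each model after each successive simulation."""
    models = list(rank_list[0])
    table = np.array([[r[m] for m in models] for r in rank_list], dtype=float)
    counts = np.arange(1, len(rank_list) + 1)[:, None]
    return models, np.cumsum(table, axis=0) / counts

# Hypothetical RMSEP values for three simulations of one scenario.
sims = [
    {"GRM_CorV": 0.61, "GRM_CorH": 0.64, "GRM_FA1H": None},
    {"GRM_CorV": 0.58, "GRM_CorH": 0.55, "GRM_FA1H": 0.70},
    {"GRM_CorV": 0.60, "GRM_CorH": 0.66, "GRM_FA1H": 0.63},
]
models, running = cumulative_mean_ranks([rank_models(s) for s in sims])
```

Each row of running holds the mean ranks after one more simulation, so successive rows could be compared against a tolerance such as 0.05 to decide when the means have stabilized.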

Results and Discussion

Justification of Approach

Since the simulations within each scenario were all random realizations of MET described by the

scenario, no particular simulation was more valid than the others and a mean over all the simulations is a

valid summary of model performance for a given scenario. A model can be considered to be the best for

a given scenario if it has the greatest accuracy averaged over all simulations. However, using mean

accuracy excessively weights performance in simulations that result in extremely low or high accuracy for

a given model. Alternatively, the best model might be defined as one that has the greatest accuracy in the

most simulations. To evaluate this, model accuracy can be ranked within each simulation, and the mean

of the ranks for each model calculated. High accuracy values for a given simulation have no additional

influence if the model is already top ranked, but an occasional low ranking can still pull the mean rank

down. This approach rewards models that perform well consistently rather than those with inconsistent

extraordinary performance. This was the approach that we took to summarize our results. Models were

ranked with '1' being the best rank and a greater number indicating worse performance. The mean values

and ranks of all accuracy measurements for each model and scenario are given in Supplemental Table 1.

Results from RMSEP, Pearson correlations, and Spearman correlations differ, but conclusions as

to the best model were generally consistent after averaging over simulations. Results from the

information criteria were highly variable among simulations. Additionally, with our simulation approach,

the “true” simulated effects of genotypes within environments were available, allowing for direct

comparison of the simulated and estimated values. Therefore, the information criteria were of limited

value. The Spearman correlations may be more applicable than the Pearson from the perspective of

breeding, since genotypes are often selected based on their rankings, rather than observed values.

However, ranking of genotypes is not the only use of estimates of genotype responses in different

environments. Since these estimates are also used to evaluate traits like stability, the deviations of

estimates from the true values may be more important than how well the ranks agree. The accuracy

measured by RMSEP reflects how much estimates deviated from the simulated values, penalizing more

extreme values to a greater extent. This was desirable, as extreme errors in estimation are more likely to

cause rank changes among genotypes, leading to changes in selection decisions. In the interest of brevity,

we limit our discussion to RMSEP accuracy. The important conclusions from these data are summarized

below along with illustrative graphs of the data.

Choice of a Default Model

The plot of the mean ranks, in terms of RMSEP, for each model in each scenario showed that the

model with the genetic relationship matrix and a constant variance-constant correlation structure

(GRM_CorV) was the best model in many, but not all, situations (Figure 1). When the mean rankings of

the models were graphed, there was a pattern of nine troughs that correspond to the scenarios with the

fewest environments (Figure 1). In these scenarios the mean ranking of the best model was a greater

number. This indicates that there was less consistency in the top model rankings among simulations.

That is, the best model was not ranked first in all simulations of a scenario, or even necessarily always in

the top three. However, no other model did as well overall. Because of this pattern, in order to better

visualize the best model for each scenario, we graphed the results by ranking the means of the ranks for

each scenario. This standardized the graph in Figure 1, so that the best model was in the top row in each

scenario (Figure 2). The top row was predominantly GRM_CorV or GRM_CorH. To simplify even

more, we graphed results from just the GRM_CorV and GRM_CorH models (Figure 3). The points in the

top row indicate which of GRM_CorV or GRM_CorH was the best model for a scenario, whereas blanks

in the top row indicate that a model other than these two was the best for a scenario. Generally, the

GRM_CorV model appears to be a good default choice of model, i.e. if one decided which model to use

without additional specific information about a MET, since it was often the best ranked model and was

always in the top 10 models.

Models for Specific Scenarios

When all scenarios were examined together, it was difficult to determine if there were any

patterns as to when GRM_CorV, GRM_CorH, or another model was the best, but subsets of scenarios

allowed such evaluations. The GRM_CorV model was the most accurate model in most scenarios with

high error variance, whereas the GRM_CorH model was superior in few scenarios and the other models

were best only rarely (Figure 4). The GRM_CorV was preferred in fewer of the scenarios with low error

variance (Figure 5). In these scenarios the GRM_CorH and other models were the best in more cases.

These results indicate that as plot-to-plot error in a MET increases, the effectiveness of complex models

for GE decreases. The increased noise resulted in inaccurate estimation of the many parameters in the

complex models. In contrast, the CorV and CorH structures more accurately estimated fewer parameters,

compensating for their oversimplification of the pattern of relationships among environments.

One would expect that the GE structure in the most accurate model would be of similar

complexity to the pattern used to simulate relationships among environments. While such relationships

occurred, they were not entirely predictive. The GRM_CorV model was the best choice in almost all

scenarios simulated with a compound symmetric pattern for GE (Figure 6). This was to be expected,

since the CorV structure exactly matched the simulated pattern for five or 10 environments, and with 20

or 40 environments the simulated pattern only had two possible values that did not dramatically differ

from each other. In the scenarios simulated with compound symmetric correlation patterns and

heteroskedasticity, one might expect the CorH structure to be the best choice. However, it was only

superior to the CorV structure in some scenarios (Figure 7). An explanation for this is that the CorH

structure traded parsimony for flexibility, and in doing so, increased the risk that it modeled noise rather

than capturing actual differences in variances. Since the variance heterogeneity was not especially large

in these scenarios, there were many occasions where the ability to model this heterogeneity was not

valuable. With compound symmetric scenarios that include very heterogeneous environmental variances,

the GRM_CorH was generally the best, losing out to other models only in a small number of cases

(Figure 8). With these scenarios it was more important to model the greater degree of heterogeneity.

When scenarios were simulated with Toeplitz patterns the GRM_CorV and GRM_CorH models

were still generally the best, with GRM_CorH more often superior as heteroskedasticity increased (Figure

9). Models other than GRM_CorV or GRM_CorH were superior in scenarios with Toeplitz patterns, 100

or more genotypes, fewer than 40 environments, and low error variance (Figure 10). In these scenarios, one

of the models with a GRM and an FA structure was almost always superior to the GRM_CorV or

GRM_CorH. This was also true with either 50 genotypes or a high error variance (not both) if an RCBD

was used (not shown). Unfortunately, prediction of error variance and patterns of relationships among

environments is difficult, especially due to dramatic year-to-year variability.

The number of genotypes, the number of environments in which they were tested, and the

replication as determined by the experimental design had limited effect on the choice of the best model.

When only 25 genotypes were included, the GRM_CorV model was preferred over the GRM_CorH and

other models in most scenarios (Figure 11). Complex relationships in genotype performance among

environments were not influential with so few genotypes. Therefore, it was better to just assume constant

correlations of genotype performance among environments. The GRM_CorH and more complex models

were more frequently ranked better than GRM_CorV as the number of genotypes increased. The reverse

pattern was true for numbers of environments. As the number of environments was increased, the

effectiveness of the more complex models decreased. In the MET simulations, the overall range of

correlations between pairs of environments was about the same (0.7 to -0.1 for 5 and 10 environments and

0.75 to 0.2 for 20 and 40 environments) for all numbers of environments. As the number of environments

was increased, the differences among the correlation parameters decreased resulting in many correlation

parameters with similar values, and reducing the benefits of having large numbers of parameters in the

model. Such a constraint on the correlations is also likely in reality, unless the large number of

environments cover a wider geographic or temporal range that would extend the range of possible

correlations beyond those tested here.

Experimental design, and by extension the level of replication, had a limited effect on the choice of

model. FA structures were more accurate with the increased replication of the RCBD than in

scenarios with less replication (Figure 12). This is again a case where more data were available to

improve the accuracy with which greater numbers of parameters were estimated.

It is also informative to look at only the models that did not incorporate a GRM, since in some

cases researchers may not have the ability to estimate a GRM. In this study we assumed the ideal case,

where the GRMs used in the analyses exactly matched those used in the simulations, but in reality, GRMs

are estimated with error. The CorV and CorH models were often the best options for non-GRM models,

with the CorH model often preferable in scenarios with variance heterogeneity (Figure 13). The FA

models were often more accurate for the Toeplitz pattern scenarios, especially for scenarios with low

error variance. Therefore, it appeared that the preferred GE structures were similar for the non-GRM and

GRM models for the same scenarios.

Discussion

Although we endeavored to simulate a range of scenarios that cover most MET, the scenarios may not all

be equally likely. Most MET do not test large numbers of genotypes in many environments, except over

multiple years. Even when data from multiple years and locations are analyzed, rarely are the same

genotypes tested over all environments. Therefore, our simulations with both 150 genotypes and 40

environments are relatively unimportant. On the other hand, small numbers of genotypes are often tested

over many locations and years, and large numbers of genotypes are often tested in few environments.

The actual incidence of the various patterns for ΣE is difficult to determine, since a thorough analysis,

preferably with cross validation, is necessary to even estimate the manner in which test environments are

related in a given MET.

The simulated patterns for ΣE are similar to what we would expect for most MET. Some MET

include environments that are all fairly similar in terms of genotype performance. For example, elite

cultivars are usually ranked similarly in multiple locations in the same region in a single year. Such MET

would have an underlying pattern of environmental relationships similar to a compound symmetric

pattern. MET that include environments very dissimilar to the others tested may be more

common. This might occur when a MET includes a year in which the weather was atypical.

The Toeplitz patterns for ΣE provide a wide range of correlations, including negative values.

One might argue that our simulations should have included scenarios with independence among

environments; however, when modeling genotype effects within environments (without fixed genotype

effects), independence would only occur if genotype performance in one environment had no relationship

to performance in another. In reality, this independence would occur only if the test environments

had completely different climates and/or the genotypes tested differed only in their specific responses

to those environments.

Other researchers have previously compared the same mixed models as in this study, usually fit to

real data sets. Their results generally match ours for the most closely matching simulations. Piepho

(1998) observed that, for five MET, empirical BLUPs based on factor analytic structures were more

accurate than least squares estimates of cell means and usually better than predictions from unstructured

covariance matrices. We also saw greater accuracy from factor analytic models than the cell means.

Crossa et al. (2006) evaluated a wide range of mixed linear models on a single wheat MET with

29 genotypes, 16 locations throughout the world, and a RCBD with three replicates. These researchers

concluded that a nine factor, heterogeneous specific variance, factor analytic model with a GRM derived

from a pedigree, best fit the data as measured by various information criteria. The factor analytic model

was superior to a range of models including those with constant environmental correlation. The trial they

analyzed is most comparable to our scenarios with 25 genotypes, 10 or 20 environments, and RCBD

designs. For these scenarios, we did not find that FA structure models with GRM were superior to

GRM_CorV or GRM_CorH models as measured by average RMSEP ranking. However, rankings of the

models based on information criteria scores were highly variable among simulations, suggesting that our

differences in conclusions may be due to the chance conditions specific to the single trial investigated by

Crossa et al.

Other researchers have also found that FA models resulted in better AIC scores than simpler or

more complex structures when fit to real data sets (Kelly et al., 2007; Oakey et al., 2007; Beeck et al.,

2010). Kelly et al. (2007) also used RMSEP values from the analysis of simulated MET to show that FA

or unstructured environmental covariance models were superior to constant correlation models, but did

not evaluate GRM. These authors all evaluated real or simulated data sets which included relatively large

numbers of genotypes, and their results agreed with ours for simulations with 100 or more genotypes with

Toeplitz patterns for ΣE. It is likely that the sets of environments these researchers evaluated had complex

relationship patterns similar to our Toeplitz patterns and the simulations of Kelly et al. had heterogeneous

environmental correlations, making them similar to our Toeplitz patterns.

So and Edwards (2011) evaluated 51 maize MET, each with 6 environments (two years) and 187

to 386 genotypes only partially overlapping between years. They fit these 51 with mixed models that did

not include GRM and included various environmental covariance matrices for five of the six

environments, and performance was predicted in the sixth environment for purposes of cross validation.

These authors found that models that allowed for heterogeneous genetic covariances were often inferior to

compound symmetric models, in agreement with our results from simulations.

Our study and those of other researchers have generally focused on model fit or predictive

accuracy of empirical BLUPs, but MET are also analyzed to estimate relationships among environments.

Identification of highly correlated environments allows researchers to save resources by avoiding

redundant locations. Alternatively, information about highly unrelated environments can aid

interpretation of results from these esoteric locations or years. Although our analyses indicated that

simpler models often provided more accurate estimates, a simple structure for GE does not allow the

estimation of different parameters (variances or covariances) for different environments. In situations

where it is important to estimate these differences, factor analytic structures are preferable even if some

accuracy is sacrificed.

Conclusions

The most accurate models for analyzing MET always included a GRM, but differed among simulated

scenarios in terms of the ideal GE structure for estimating relationships in genotype performance among

environments. Complex structures that allow for heterogeneity of environmental variances or correlations

were only successful when the pattern used to simulate the data also had heterogeneous variances or

correlations. Even when this was true, if error variance was high or the MET had few genotypes, simpler

models were often more accurate. These results suggest that, while a GRM should be used when available,

complex structures for environmental relationships, such as FA, should be used only when 100 or more

genotypes are evaluated and a complex underlying structure is expected along with low error variance.

Appendix: Real Data as a Basis for Simulations

A simulation study is only effective if the simulations reflect the actual conditions that are being

simulated. Obviously, the actual biology of MET is more complex than our simulations, but a properly

tuned simulation should be able to approximate actual MET data. In order to create simulations that

effectively reflect actual MET in breeding programs, the parameters used in our simulations were chosen

based on the analysis of real MET.

Yield data from Washington State University soft white wheat variety improvement trials were used

as an example MET. The variety trials were grown in RCBD with 4 blocks at each location between

2002 and 2008. We limited our analyses to these years to allow the use of a single error variance

structure for every year. Although 116 genotypes were evaluated over this time period, the genotypes

tested varied among years as only 48 to 54 genotypes were tested each year. Trials were conducted at 21

different locations throughout Washington, but trials were only grown in 18 to 20 locations each year.

Rainfall patterns vary greatly across the state, and the testing locations cover the range of precipitation

zones where wheat is grown in Washington (Appendix Figure 1).

Such a large data set prevented the use of a complex model to fit the entire data set. Instead, various

overlapping subsets of environments were evaluated, for example: all locations in a single year; four years


of data from five locations covering the range of precipitation zones. This allowed us to fit complex

models and additionally allowed this MET to approximate a range of smaller MET.

We fit a range of linear mixed models to yield data from these trials, and the models were all of the

same basic form as used to analyze the simulated data:

$y = X\beta + Z\gamma + \varepsilon$,

where y is the vector of observed phenotypes, β is a vector of fixed environment effects, X is the

associated design matrix, γ is the vector of genotype within environment effects, Z is the associated

design matrix, and ε is the vector of experimental error terms. The joint distribution of γ and ε is given

by:

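$\begin{bmatrix} \gamma \\ \varepsilon \end{bmatrix} \sim \mathrm{MVN}\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} G & 0 \\ 0 & R \end{bmatrix} \right)$,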

where R is either a matrix with no covariance and constant variance or variances that are heterogeneous

across environments and G is a covariance structure that varies and is separable:

G = GE ⨂ I,

where I is of size equal to the number of genotypes, which are assumed independent, and GE is one of

seven structures (examples of the first six are provided in the Methods section): independence and

identical variance, independence and heteroskedasticity, constant correlation and constant or

heterogeneous variances, a one factor FA model with constant or heterogeneous specific variances, and an

unstructured model with heterogeneous variances and covariances.
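As a rough illustration of the structures listed above, the following Python sketch builds several candidate forms of GE for a small number of environments and expands one of them to the full G through the Kronecker product. The function names, the use of NumPy, and all numeric values are assumptions made for illustration only; the models in this study were not fit with this code.

```python
import numpy as np

def compound_symmetric(variances, rho):
    """Constant correlation rho with constant or heterogeneous variances."""
    sd = np.sqrt(np.asarray(variances, dtype=float))
    corr = np.full((sd.size, sd.size), rho)
    np.fill_diagonal(corr, 1.0)
    return corr * np.outer(sd, sd)

def factor_analytic_1(loadings, specific_variances):
    """One-factor FA structure: lambda lambda' + diag(psi)."""
    lam = np.asarray(loadings, dtype=float).reshape(-1, 1)
    psi = np.asarray(specific_variances, dtype=float)
    return lam @ lam.T + np.diag(psi)

def full_G(GE, n_genotypes):
    """Separable covariance G = GE (x) I, with effects ordered by
    environment and then by genotype within environment."""
    return np.kron(GE, np.eye(n_genotypes))

# Hypothetical values for three environments
GE_id = 1.0 * np.eye(3)                                   # independence, identical variance
GE_het = np.diag([0.5, 1.0, 2.0])                         # independence, heteroskedastic
GE_cs = compound_symmetric([1.0, 1.0, 1.0], rho=0.5)      # constant correlation and variance
GE_csh = compound_symmetric([0.5, 1.0, 2.0], rho=0.5)     # constant correlation, heterogeneous variances
GE_fa = factor_analytic_1([0.9, 0.7, 0.4], [0.2, 0.2, 0.2])  # one-factor FA

G = full_G(GE_csh, n_genotypes=4)
print(G.shape)
```

Because the genotypes are assumed independent, every off-diagonal entry of G that links two different genotypes is zero, and only covariances of the same genotype across environments are nonzero.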

Parameters of GE and R were estimated for each structure and data set, and parameter estimates from

all data sets were used as a baseline for determining ranges of parameters used in simulations. Values of

$\sigma^2_g$ and $\sigma^2_e$ were estimated for constant genetic variance structures, and the ratios of these variances ranged from

0.4 to 0.98 depending on which subset of the data was analyzed. These estimated values were used to

choose error variances of 0.5 and 2.0 to use in simulations to achieve similar ratios. Multiple

genetic variances were estimated when structures allowed heteroskedasticity. Ratios between the least


and greatest genetic variance ranged from 3:1 to 56:1. Genetic correlations were estimated from models

with constant correlations and ranged from 0.21 to 0.71. When estimated from models with unstructured

covariances, the maximum correlation estimated between a pair of environments ranged from 0.71 to

0.89 depending on data subset, whereas the minimums ranged from -0.56 to 0.29. It is important to note

that the most extreme values for these parameters were estimated for the data subsets that included

multiple years and a small number of locations that covered a range of precipitation levels. Generally,

cultivars are rarely bred for such a range of climates and so most MET do not cover such wide ranges of

conditions. Therefore, our simulations were conducted with less extreme values for correlations and

variance heterogeneity ratios.



Figure 1. Means, over simulations, of model ranks, where models were ranked in terms of RMSEP

within each simulation. All scenarios evaluated are included, and index denotes each scenario's position

in the order. Scenarios are ordered CSA, CSAH, CSAVH, CSB, CSBH, CSBVH, Toep, ToepH, and then

ToepVH, with the indices of the final scenarios of each group equal to 76, 154, 230, 304, 380, 456, 532,

608, and 682, respectively. Within each of these patterns, numbers of environments are ordered 5, 10, 20,

and then 40 environments. Within each number of environments, the numbers of genotypes are ordered

25, 50, 100, and then 150 genotypes. Within each number of genotypes, the experimental designs are

ordered RCBD, MAD, and then unreplicated designs. Within each design, error variances are ordered 0.5

then 2.0.


Figure 2. A standardized version of Figure 1, where models have been ranked within each scenario in

terms of their mean ranks. The order of scenarios is the same as Figure 1.


Figure 3. The same as Figure 2, but only the models GRM_CorV and GRM_CorH. The order of

scenarios is the same.


Figure 4. Equivalent to Figure 3, with only scenarios with high (2.0) error variance included. Scenarios

are ordered CSA, CSAH, CSAVH, CSB, CSBH, CSBVH, Toep, ToepH, and then ToepVH, with the indices

of the final scenarios of each group equal to 39, 78, 116, 154, 192, 230, 268, 306, and 343, respectively.

Within each of these patterns, numbers of environments are ordered 5, 10, 20, and then 40 environments.

Within each number of environments, the numbers of genotypes are ordered 25, 50, 100, and then 150

genotypes. Within each number of genotypes, the experimental designs are ordered RCBD, MAD, and

then unreplicated designs.


Figure 5. Equivalent to Figure 3, with only scenarios with low (0.5) error variance included. Scenarios

are ordered CSA, CSAH, CSAVH, CSB, CSBH, CSBVH, Toep, ToepH, and then ToepVH, with the indices

of the final scenarios of each group equal to 37, 76, 114, 150, 188, 226, 264, 302, and 339, respectively.

Within each of these patterns, numbers of environments are ordered 5, 10, 20, and then 40 environments.

Within each number of environments, the numbers of genotypes are ordered 25, 50, 100, and then 150

genotypes. Within each number of genotypes, the experimental designs are ordered RCBD, MAD, and

then unreplicated designs.


Figure 6. Equivalent to Figure 3, only including scenarios simulated with a compound symmetric pattern

of relationships among environments. Scenarios are ordered CSA, then CSB, with the indices of the final

scenarios of each group equal to 76 and 150, respectively. Within each of these patterns, numbers of

environments are ordered 5, 10, 20, and then 40 environments. Within each number of environments, the

numbers of genotypes are ordered 25, 50, 100, and then 150 genotypes. Within each number of

genotypes, the experimental designs are ordered RCBD, MAD, and then unreplicated designs. Within

each design, error variances are ordered 0.5 then 2.0.


Figure 7. Equivalent to Figure 3, only including scenarios simulated with a compound symmetric pattern

of correlations among environments and heterogeneous variances of genotype effects within

environments. Scenarios are ordered CSAH, then CSBH, with the indices of the final scenarios of each

group equal to 78 and 154, respectively. Within each of these patterns, numbers of environments are

ordered 5, 10, 20, and then 40 environments. Within each number of environments, the numbers of

genotypes are ordered 25, 50, 100, and then 150 genotypes. Within each number of genotypes, the

experimental designs are ordered RCBD, MAD, and then unreplicated designs. Within each design, error

variances are ordered 0.5 then 2.0.


Figure 8. Equivalent to Figure 3, only including scenarios simulated with a compound symmetric pattern

of correlations among environments and extremely heterogeneous variances of genotype effects within

environments. Scenarios are ordered CSAVH, then CSBVH, with the indices of the final scenarios of each

group equal to 76 and 152, respectively. Within each of these patterns, numbers of environments are

ordered 5, 10, 20, and then 40 environments. Within each number of environments, the numbers of

genotypes are ordered 25, 50, 100, and then 150 genotypes. Within each number of genotypes, the

experimental designs are ordered RCBD, MAD, and then unreplicated designs. Within each design, error

variances are ordered 0.5 then 2.0.


Figure 9. Equivalent to Figure 3, only including scenarios simulated with a Toeplitz pattern of

correlations among environments. Scenarios are ordered Toep, ToepH, and then ToepVH, with the

indices of the final scenarios of each group equal to 76, 152, and 226, respectively. Within each of these

patterns, numbers of environments are ordered 5, 10, 20, and then 40 environments. Within each number

of environments, the numbers of genotypes are ordered 25, 50, 100, and then 150 genotypes. Within each

number of genotypes, the experimental designs are ordered RCBD, MAD, and then unreplicated designs.

Within each design, error variances are ordered 0.5 then 2.0.


Figure 10. Equivalent to Figure 3, only including scenarios simulated with a Toeplitz pattern of

correlations among environments, 100 or 150 genotypes, 5 to 20 environments, and low (0.5) error

variance. Scenarios are ordered Toep, ToepH, and then ToepVH, with the indices of the final scenarios

of each group equal to 14, 29, and 43, respectively. Within each of these patterns, numbers of

environments are ordered 5, 10, and then 20 environments. Within each number of environments, the

numbers of genotypes are ordered 100 and then 150 genotypes. Within each number of genotypes, the

experimental designs are ordered RCBD, MAD, and then unreplicated designs.


Figure 11. Equivalent to Figure 3, only including scenarios simulated with 25 genotypes. Scenarios are

ordered CSA, CSAH, CSAVH, CSB, CSBH, CSBVH, Toep, ToepH, and then ToepVH, with the indices of

the final scenarios of each group equal to 24, 48, 72, 96, 120, 144, 168, 192, and 216, respectively.

Within each of these patterns, numbers of environments are ordered 5, 10, 20, and then 40 environments.

Within each number of environments, the experimental designs are ordered RCBD, MAD, and then

unreplicated designs. Within each design, error variances are ordered 0.5 then 2.0.


Figure 12. Equivalent to Figure 3, only including scenarios simulated with MAD or unreplicated designs.

Scenarios are ordered CSA, CSAH, CSAVH, CSB, CSBH, CSBVH, Toep, ToepH, and then ToepVH, with the

indices of the final scenarios of each group equal to 50, 102, 154, 203, 255, 307, 357, 409, and 461,

respectively. Within each of these patterns, numbers of environments are ordered 5, 10, 20, and then 40

environments. Within each number of environments, the numbers of genotypes are ordered 25, 50, 100,

and then 150 genotypes. Within each number of genotypes, the experimental designs are ordered MAD,

and then unreplicated designs. Within each design, error variances are ordered 0.5 then 2.0.


Figure 13. A standardized version of Figure 1, where only models not including GRM have been ranked

within each scenario in terms of their mean ranks. The order of scenarios is the same.


CHAPTER 4

CONSULTING PROJECTS

As part of my Ph.D. I consulted on three projects where I provided statistical analysis of data

generated by other researchers. Below I introduce each project with an abstract or summary and

then describe the methods used along with reasons for and consequences of the methods. For the

Fusarium crown rot projects I began consulting after all the data had been collected and so had to

adapt my methods to suit the realities of the project. In the cold tolerance project I was able to

help devise methodologies which were then further modified based on the results of my analyses.

Heritability and Genetic Correlation Analyses for Fusarium

Crown Rot Resistance Assays of Wheat Mapping Population

Abstract

The following abstract is from: G.J. Poole, R.W. Smiley, T. C. Paulitz, C.A. Walker, A.H.

Carter, D.R. See and K. Garland-Campbell. 2012. Identification of quantitative trait loci (QTL)

for resistance to Fusarium crown rot (Fusarium pseudograminearum) in multiple assay

environments in the Pacific Northwestern US. Theoretical and Applied Genetics (In Press). It is

used here to summarize the project I collaborated on. My participation began after all data had

been collected and a preliminary analysis had been conducted.

Fusarium crown rot (FCR), caused by F. pseudograminearum and F. culmorum, reduces wheat

(Triticum aestivum L.) yields in the Pacific Northwest (PNW) of the U.S. by as much as 35%.

Resistance to FCR has not yet been discovered in currently grown PNW wheat cultivars. Several

significant quantitative trait loci (QTL) for FCR resistance have been documented on

chromosomes 1A, 1D, 2B, 3B, and 4B in resistant Australian cultivars. Our objective was to

identify QTL and tightly linked SSR markers for FCR resistance in the partially resistant


Australian spring wheat cultivar Sunco using PNW isolates of F. pseudograminearum in

greenhouse and field based screening nurseries. A second objective was to compare heritabilities

of FCR resistance in multiple types of disease assaying environments (seedling, terrace, and

field) using multiple disease rating methods. Two recombinant inbred line (RIL) mapping

populations were derived from crosses between Sunco and PNW spring wheat cultivars Macon

and Otis. The Sunco/Macon population comprised 219 F6:F7 lines and the Sunco/Otis population

comprised 151 F5:F6 lines. Plants were inoculated with a single PNW F. pseudograminearum

isolate (006-13) in growth room (seedling), outdoor terrace (adult) and field (adult) assays

conducted from 2008 through 2010. Crown and lower stem tissue of seedling and adult plants

were rated for disease severity on several different scales, but mainly on a numeric scale from 0

to 10 where 0=no discoloration and 10=severe disease. Significant QTL were identified on

chromosomes 2B, 3B, 4B, 4D, and 7A with LOD scores ranging from 3 to 22. The most

significant and consistent QTL across screening experiments was located on chromosome 3BL,

inherited from the PNW cultivars Macon and Otis, with maximum LOD scores of 22 and 9

explaining 36% and 23% of the variation, respectively for the Sunco/Macon and Sunco/Otis

populations. The SSR markers Xgwm247 and Xgwm299 flank this QTL and are being validated

for use in marker assisted selection for FCR resistance. This is the first report of QTL associated

with FCR resistance in the U.S.

Discussion of Statistical Methods

Separate analyses were conducted for each mapping population, but methods and models were

consistent for the two populations. Experimental units for the growth room and outdoor terrace

bed screening experiments were individual plants within a cone-tainer. Experimental units for

the field screening experiments were individual plots, from which 5 individual stems were

subsampled and averaged. Averaging over subsamples has the benefit of reducing plot-to-plot

variation due to errors in FCR severity assessment, thereby aiding in identifying QTL and

increasing heritability.
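The benefit of averaging can be seen with a small calculation. The sketch below assumes a simple two-level model in which the variance of a plot mean is the true plot-to-plot variance plus the stem-level (assessment) variance divided by the number of stems averaged. The function name and the variance values are hypothetical and are not estimates from this study.

```python
def variance_of_plot_mean(plot_var, stem_var, n_stems):
    """Variance of a plot mean when n_stems stems are rated and averaged."""
    if n_stems < 1:
        raise ValueError("n_stems must be at least 1")
    return plot_var + stem_var / n_stems

# Hypothetical variance components on the 0 to 10 severity scale
single_stem = variance_of_plot_mean(plot_var=1.0, stem_var=2.0, n_stems=1)
five_stems = variance_of_plot_mean(plot_var=1.0, stem_var=2.0, n_stems=5)
print(single_stem, five_stems)
```

Under these assumed values, averaging five stems removes most of the stem-level contribution, while the plot-level component is unaffected.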

The mapping populations were planted and tested in similar experimental designs for the

growth chamber and terrace screening experiments and a different design in the field. These

experiments were conducted prior to my involvement in the project. In all three screening

experiments, multiple assays were conducted at different time points and, for field screens only,

at different locations. In the greenhouse and terrace beds, the recombinant inbred lines (RIL)

from the mapping populations were divided into sets, each set including the same check

genotypes. These sets were planted in separate growth chambers or sections of the outdoor

terrace beds. A randomized complete block design was implemented within each growth

chamber or terrace bed section. These designs were used, because the desired number of

genotypes and replicates could not be fit in one growth chamber or terrace bed section. The field

experimental design did not divide the genotypes into sets and used a randomized complete

block design within each field. For the growth chamber and terrace experiments, the sets acted

as incomplete blocks, and so necessitated the assumption that there were no interactions between

the conditions of the set (growth chamber or terrace bed) and the effects of genotypes (Dean and

Voss, 1998 p. 348). This assumption was also made for randomized complete block designs.

Additionally, effects of sets were assumed to be estimable using only the data from the checks.

These assumptions may not have been valid, especially for the terrace bed sections, since

growing conditions differed noticeably among the terrace bed sections, likely causing differences

in Fusarium infection pressure, to which the genotypes may have shown varying responses. In

some situations, incomplete blocks are unavoidable, due to restrictions on the number of


experimental units per block. However, in this study, either a randomized complete block design

or a general complete block design could have been used. Use of either design would have

provided more data, beyond the checks, to estimate growth room or terrace bed section effects.

Use of the general complete block design would have additionally allowed for the estimation of

growth room/terrace bed by genotype interactions (Dean and Voss, 1998 p. 313).

Variance components analyses were conducted separately for each screening experiment

using the SAS System software v9.2 (SAS Institute Inc., 2003) Mixed procedure. Analyzing the

data from all three screening experiments together would have necessitated a very complex

model, since the screening experiments would be expected to have heterogeneous variances and

correlated genotypic effects. The models used all shared the general form: Y = μ + Zγ + ϵ,

where Y is the vector of FCR severity scores, μ is the overall mean, γ is the vector of random

effects, Z is the associated incidence matrix, and ϵ is the vector of experimental errors. The

terms γ and ϵ are considered independent with variance-covariance matrices of G and R,

respectively. The variance-covariance matrices G differed between the screening experiments,

but in all cases all covariances were assumed equal to zero. This describes a standard random

effects model that allows estimation of heritability by estimating variance components for all

conditions known to vary among experimental units. For growth room and terrace data, variance

parameters were included for random effects of assay, sets within assays, replicates within sets

and assays, and genotypes within sets and assays. For field data, variance parameters were

included for assay, replicates within assays, and genotypes. Experimental errors were

assumed normally distributed with constant variance for all observations. The assumption

of constant error variance, especially across assays within screening experiments, may not be

ideal. However, allowing for heterogeneity of error variances would have made heritability


estimates much less straightforward. These assumptions were checked, and in the case of the

field data from both populations, modest departures from the assumptions of both constant

variance and normality were detected. Therefore, some inaccuracy in the estimates of residual

variance, standard errors for the variance component estimates, and z-tests are expected (Kutner

et al., 2004 pp. 793–794). Assumptions were not substantially violated for the other data sets.

Broad-sense heritability ($H^2$) on a genotype mean basis was estimated over all assays of

each screening experiment using the formula: $H^2 = \hat{\sigma}^2_g / \hat{\sigma}^2_p$, where $\hat{\sigma}^2_g$ is the estimated variance of

the genotypic effect and $\hat{\sigma}^2_p$ is the estimated phenotypic variance. Here, $H^2$ differs

from narrow-sense heritability ($h^2$) only in terms of epistatic variance. Since the genotypes

evaluated here are RIL, they are assumed to be fully homozygous. If the response unit is also an

inbred, then dominance has no influence on gain from selection. Therefore, $H^2$ exceeds $h^2$ only

by the epistatic variance; the degree to which epistasis influences specific quantitative traits is

unknown, but may be substantial (Reif et al., 2009; Miedaner et al., 2011). The SAS code

provided by Holland et al. (2003), modified according to the experimental design of each

screening experiment, was used for these analyses. The heritabilities within each assay and

experiment were also calculated, prior to my participation in the project, but their utility may be

limited. Although such estimates can provide some suggestions of the consistency or variability

of genetic and phenotypic variances, the statistical accuracy of separate estimates of H2 for each

assay is likely lower than that of estimates over all assays, because each uses a reduced data set.

Additionally, estimates within assays are less meaningful from a breeding perspective, since both

selection and response generally occur over a range of assays (e.g. selecting based on results

from multiple years of field testing for genotypes that will perform consistently well in multiple

years and locations).
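To make the genotype mean basis concrete, the following Python sketch computes broad-sense heritability from variance components for a balanced design. It assumes the phenotypic variance of a genotype mean is the genotypic variance plus the genotype-by-assay variance divided by the number of assays plus the error variance divided by the number of plots per genotype. The function name, the balanced-data assumption, and the numeric values are illustrative only; the SAS code of Holland et al. (2003) handles the actual designs.

```python
def broad_sense_heritability(var_g, var_e, n_assays, n_reps, var_ga=0.0):
    """H^2 on a genotype-mean basis for a balanced design (an assumed simplification)."""
    if n_assays < 1 or n_reps < 1:
        raise ValueError("n_assays and n_reps must be at least 1")
    # Phenotypic variance of a genotype mean over assays and replicates
    var_p = var_g + var_ga / n_assays + var_e / (n_assays * n_reps)
    return var_g / var_p

# Hypothetical variance components
h2_all = broad_sense_heritability(var_g=1.0, var_e=4.0, n_assays=4, n_reps=2, var_ga=1.0)
h2_one = broad_sense_heritability(var_g=1.0, var_e=4.0, n_assays=1, n_reps=2, var_ga=1.0)
print(round(h2_all, 3), round(h2_one, 3))
```

With the same assumed variance components, the estimate over four assays is larger than the single-assay estimate because the interaction and error terms are divided by more observations.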


Least squares means for genotypes were calculated within each of the assays and

screening experiments. These least squares means were calculated using fixed effects versions

of the above models in the SAS GLM procedure, prior to my participation. The statistical

significance of differences in LSmeans between the parents was determined using t-tests. Calculating

LSmeans removed the main effects for blocks and sets, as estimated by ordinary least squares.

The LSmeans values for each assay of each genotype differ from the best linear unbiased

predictions that could have been extracted from a random effects model. These least squares

means were used in the QTL analyses and the analyses described below, since these analyses

were limited in terms of model complexity. The QTL analysis was limited by the composite

interval mapping models used by the QTL analysis software. Individual marker effects and

genetic correlations were analyzed on the same data so that the estimates could be compared to

the results of the QTL analysis.

The influence of assay-to-assay variation on marker effects was investigated using

separate analyses of variance for each combination of population, marker, and screening

experiment using the SAS Mixed procedure. Our analyses used the following model: Y = Xβ +

ϵ, where Y is the vector of FCR severity scores (lsmeans of each assay from the raw data

analysis), β is the vector of fixed effects of assay and marker allele, X is the associated

incidence/design matrix, and ϵ is the vector of experimental errors. Since assays were

considered repeated measures of each genotype (within marker allele), the covariance structure

for ϵ was modeled as compound symmetric for each genotype subject. The assumptions of this

method, that residuals are normally, independently, and identically distributed, were met for each

model within ranges that would not influence results. If significant interactions were detected

between marker and assay effects, each assay was tested for a significant marker effect


individually. This analysis acted as partial confirmation of the QTL analysis and suggested

which of the significant QTL had independent effects that could be discerned from background

noise. Although these evaluations are completely valid for these markers, these analyses don't

provide confirmation of the QTLs that is truly independent of the QTL analysis. These analyses

of variance used the same data as the QTL analysis and therefore suffered from the same

sampling biases. That is, if our observations of a marker in multiple assays happened to be

greater than the actual effect, the QTL analysis would flag the marker and we would have

concluded that there was a significant marker effect, consistent across assays. This is even more

likely than one might think, since markers that have a consistent effect across assays are more

likely to be significant QTLs and these analyses of variance were only conducted on highly

significant QTLs.

The genetic correlations between screening experiments and the genetic and phenotypic

variances specific to each were estimated with the SAS Mixed procedure using methods similar to

Holland (2006). Our analyses used the following model: Y = Xβ + Zγ + ϵ, where Y is the vector

of FCR severity scores (lsmeans of each assay from the raw data analysis), β is the vector of

fixed effects of assays within each screening experiment, X is the associated incidence/design

matrix, γ is the vector of random effects of genotypes in each screening experiment, Z is the

associated incidence matrix, and ϵ is the vector of experimental errors. The terms γ and ϵ are

considered independent with covariance matrices of G and R, respectively. We considered G

unstructured and identical for all genotypes, with separate variance parameters for each

screening experiment and heterogeneous covariances between each pair of screening experiments.

Experimental error variance (R) was modeled with heterogeneous variances across screening

experiments and no covariance between screening experiments. The assumptions of this method,


that residuals are normally, independently, and identically distributed, were tested and confirmed

for each model. Wald-type inference tests were used to test if genetic correlations differed from

zero.
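The calculation behind a genetic correlation and its Wald-type test can be sketched as follows. The sketch assumes that the two genotypic variances and their covariance have already been estimated from the unstructured G, and that a standard error for the correlation is available (for example from a delta-method approximation). The function names and numeric values are hypothetical, and the normal reference distribution is an assumption of the Wald approach.

```python
import math

def genetic_correlation(var_1, var_2, cov_12):
    """Correlation implied by two genotypic variances and their covariance."""
    if var_1 <= 0 or var_2 <= 0:
        raise ValueError("variances must be positive")
    return cov_12 / math.sqrt(var_1 * var_2)

def wald_z_test(estimate, std_error):
    """Two-sided Wald test of estimate = 0 against a standard normal."""
    z = estimate / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Hypothetical estimates for two screening experiments
r_g = genetic_correlation(var_1=1.0, var_2=4.0, cov_12=1.0)
z, p = wald_z_test(r_g, std_error=0.25)
print(r_g, z, round(p, 4))
```

Wald tests of correlations can be unreliable when the estimate is near the boundary of its range or the sample is small, so a result of this kind is best read as approximate.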

By estimating genetic correlation with the above model using lsmeans of each assay from

the raw data analysis as the response, any effects of sets within assays, replicates within sets and

assays, or interactions between genotypes and replicates were accounted for, at least to the extent

that these were accurately estimated within the restrictions of the experimental design.

Therefore, the response variable should only be influenced by genotype, screening experiment,

assay within experiment, and their interactions, assuming that no latent variables have been

missed in the raw data analysis. Since we modeled the raw data with a model that included

terms for genotypes within experiments and fixed effects of assays within experiments, the

residual error associated with the genetic correlation should be equivalent to the genotype by

assay within experiment interaction (assuming no latent variables). Covariance between

screening experiments, and by extension phenotypic correlation between experiments, could not

be evaluated in this study because the same environmental conditions cannot be replicated in

different screening experiments, i.e. assays are nested within experiments.


Linear Modeling of the Relationships Between Wheat Field

Characteristics and Fusarium Crown Rot Observations

Abstract

The following Abstract is from: Grant J. Poole, Richard W. Smiley, Carl Walker, Kimberly

Garland-Campbell, Timothy C. Paulitz. A survey of Fusarium crown rot in dryland wheat in the

Pacific Northwest of the US. (Unpublished) It is used here to summarize the project I

collaborated on. My participation began after all data had been collected and a preliminary

analysis had been conducted.

Fusarium crown rot (FCR) is one of the most widespread root and crown diseases of wheat in the

Pacific Northwest of the U.S. Accurate surveys of pathogen and disease presence are needed to

determine the extent and damage due to FCR. Our objectives were to conduct a survey covering

the diverse dryland wheat-producing areas of Washington and Oregon, to determine the

geographic species distribution of causal agents of Fusarium crown rot, and to determine if

various environmental and geographical features of the collection location were associated with

species distribution. In this study 105 fields were surveyed during 2008 and 2009. Isolates of

Fusarium spp. were obtained from 99% of fields in 2008 and 97% of fields in 2009. Fusarium

culmorum was isolated from 31% of the symptomatic stems surveyed, closely followed by F.

pseudograminearum isolated at a frequency of 30% (symptomatic stems) averaged over both

survey years. Overall isolation frequency means for other minor species included F.

crookwellense, F. acuminatum, F. equiseti, and Bipolaris sorokiniana at 13%, 1%, 1%, and 2%,

respectively. Species composition and disease severity varied significantly depending on

geography and cropping system. F. pseudograminearum occurred in a greater frequency in

areas of the PNW with warmer and drier weather patterns, whereas F. culmorum occurred in

greater frequency from zones with moderate to high moisture and cooler temperatures.


Discussion of Statistical Methods

Statistical analyses were carried out to determine how field characteristics relate to Fusarium

culmorum and F. pseudograminearum infection. Factor analysis was used to estimate latent

factors from the highly correlated, continuous, field characteristic variables. The estimated latent

factors were used, with additional variables not included in the factor analysis, as predictor

variables in linear mixed models and generalized linear mixed models of the Fusarium infection

responses.

Data were collected from surveys, during 2008 and 2009, of 200 fields located in major

wheat-growing regions of Washington and Oregon. In 2008, 100 fields were surveyed, and in

2009, a matched field no more than 3.2 km away from each was surveyed, since wheat was not grown on

each field each year. These matched pairs were considered to have the same values for every

explanatory variable.

Climate data used in this analysis were 30-year averages and were retrieved from the US

Forest Service Rocky Mountain Research Station website

(http://forest.moscowfsl.wsu.edu/climate/current/) for each specific GPS point survey location.

These data are potentially useful for predicting FCR infection, since a given year's weather

conditions are partially predicted by average conditions. However, year-to-year climatic

variation is always large. Some of this variation was captured as overall year effects in linear

models, but some of the variation was not attributable to year main effects or 30-year averages.

Therefore, estimates using these data would be expected to be less accurate than if actual weather

data were recorded. Unfortunately, resources were limited, necessitating the use of data that are

freely available.

Soil textures were classified into one of three categories: sandy loam, loam,

or silty loam. Although the categories cover a range of values, we chose the center point of each

category and assigned all observations of that category sand, silt, and clay contents equal to the

center values. Doing so resulted in some loss of information, since the categories encompass

more possibilities than captured by the center points. For the centers of these three categories,

the amounts of sand and silt are almost perfectly negatively correlated and the amount of clay

does not vary to a meaningful degree. Therefore, we chose the percent sand content as a

replacement variable for the original categorical soil texture variable. Using all three continuous

variables would result in extreme multicollinearity, which should be avoided in linear modeling.

The benefits of using sand content instead of texture classes were that it allowed a single variable

to replace two indicator variables, it correctly placed loam soils intermediate to sandy loam and

silty loam soils, and since it was a continuous variable, it could be easily incorporated into

dimension reduction techniques.

The field characteristic 'percent cropped' indicates the estimated percentage of time and

area that a crop was grown on a field. These values were estimated based on data from National

Agricultural Statistics Service analysis of satellite images. The estimated values are from the 12

km area centered on each field, during the growing season, over multiple years. These data

provide information as to how much time the field spent in fallow prior to sampling of the wheat

crop. Unfortunately, the limited resolution of 12 km around the field means that other fields and

non-cropped lands may have been captured in the estimate. Despite such limitations on

accuracy, these data were considered superior to other options for estimating cropping system

with limited resources.

Prior to factor analysis, correlations were calculated to estimate relatedness among

continuous field characteristic variables. Pearson correlations were calculated between

elevation, mean annual temperature, mean annual precipitation, mean temperature in the coldest

month, mean temperature in the warmest month, percent cropped, and soil sand content. Only

variables that had a correlation coefficient greater than 0.5 with at least one other variable

were included in the factor analysis, so sand content was not included. This correlation analysis

provided some indication as to whether multicollinearity was present in the explanatory

variables. High correlation values indicate definite multicollinearity, but multicollinearity could

also occur between sets of more than two variables even if all pairwise correlations were low.

However, in this situation, with only seven explanatory variables, there was limited potential for

high levels of multiple variable dependency without pairwise correlation.
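The screening step described above can be sketched in a few lines. This is a minimal illustration, not the analysis actually run for the survey: the field values and variable names are hypothetical, and the use of the absolute value of the correlation (so that strong negative correlations also qualify) is an assumption.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def screen_variables(data, threshold=0.5):
    """Keep variables correlated above the threshold with at least one other.

    Assumption: the absolute correlation is compared with the threshold.
    """
    names = list(data)
    return [a for a in names
            if any(abs(pearson(data[a], data[b])) > threshold
                   for b in names if b != a)]

# Hypothetical values for six fields (illustration only, not survey data).
fields = {
    "elevation_m": [300, 500, 700, 900, 1100, 1300],
    "mean_annual_temp_c": [11.0, 10.2, 9.1, 8.3, 7.0, 6.2],
    "mean_annual_precip_mm": [250, 300, 380, 420, 500, 560],
    "sand_pct": [40, 20, 65, 20, 65, 40],
}

selected = screen_variables(fields)
print(selected)
```

With these illustrative numbers, the three climate and elevation variables pass the screen and sand content does not, which mirrors the outcome reported in the text.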

Factor analysis was conducted using the FACTOR and SCORE procedures with the

common factor method in SAS (SAS Institute Inc., 2003) to estimate latent factor values for each

field for two rotated factors. Factor analysis was chosen for dimension reduction over principal

components analysis, because the field characteristic variables used in this analysis were

considered imprecise measurements of the true variation across these fields. Principal

component analysis does not have this same philosophical underpinning (Suhr, 2005).

Underlying latent variables were assumed, since weather patterns, which varied over the wide

geographic region sampled, influenced all the climate traits and growers' cropping systems

decisions. Analysis was limited to two factors based on the scree test and the Kaiser-Guttman

rule. The initial factor solution was rotated to improve interpretation using a varimax rotation.

This rotation was chosen because it yields orthogonal latent factors, and the primary goal of our

factor analysis was to avoid multicollinearity in the linear modeling that followed.
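For two factors, a varimax rotation reduces to choosing a single rotation angle, which has a closed form. The sketch below illustrates that idea under stated assumptions: it uses the raw varimax criterion without Kaiser normalization, and the loadings are hypothetical. It is not a reproduction of the SAS FACTOR procedure output.

```python
from math import atan2, cos, sin

def varimax_criterion(loadings):
    """Raw varimax criterion: summed variance of squared loadings per factor."""
    p = len(loadings)
    total = 0.0
    for k in range(len(loadings[0])):
        squared = [row[k] ** 2 for row in loadings]
        mean_sq = sum(squared) / p
        total += sum((s - mean_sq) ** 2 for s in squared) / p
    return total

def rotate(loadings, angle):
    """Orthogonal rotation of two-factor loadings by the given angle (radians)."""
    c, s = cos(angle), sin(angle)
    return [(x * c + y * s, -x * s + y * c) for x, y in loadings]

def varimax_angle(loadings):
    """Closed-form angle that maximizes the raw criterion for two factors."""
    p = len(loadings)
    u = [x * x - y * y for x, y in loadings]
    v = [2 * x * y for x, y in loadings]
    a, b = sum(u), sum(v)
    c = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
    d = 2 * sum(ui * vi for ui, vi in zip(u, v))
    return 0.25 * atan2(d - 2 * a * b / p, c - (a * a - b * b) / p)

# Hypothetical unrotated loadings for six variables on two factors.
example_loadings = [(0.70, 0.60), (-0.65, -0.55), (0.60, -0.50),
                    (0.75, 0.50), (-0.60, 0.55), (0.55, -0.60)]

angle = varimax_angle(example_loadings)
rotated = rotate(example_loadings, angle)
print(round(varimax_criterion(example_loadings), 4),
      round(varimax_criterion(rotated), 4))
```

Because the rotation is orthogonal, each variable's communality (the sum of its squared loadings) is unchanged, and the rotated factors remain uncorrelated, which is the property the modeling step relies on.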

Repeated measures analyses were conducted to relate the field characteristics to the

response variables (FCR severity scores and node infection scores). The two response variables

were analyzed similarly using the MIXED procedure in SAS (SAS Institute Inc., 2003) and a

model which included the two estimated latent factors, sand content, and year as fixed effects

including all two-way interactions. Each pair of matched fields was considered the subject of

repeated measures. That is, the assessment of one field in 2008 and its matched field in 2009 are

considered repeated measures on the same subject. Linear mixed model assumptions were

checked and met for both response variables. When significant interactions were observed,

simple effects of each level of one variable were tested and estimated at each level of the other.

Using repeated measures analysis and treating matched fields as observations

on the same subject was the most appropriate analysis for a non-ideal design. Ideally, the same

field would have been observed both years, but crop rotation made this impossible for most of

the fields. Alternatively, a different set of fields could have been evaluated each year, although

this would necessitate sufficient numbers of fields to ensure that the same sample space of the

predictor variables was covered in both years. Due to limitations in funds and time, evaluation

of more than 100 fields per year was not possible, so this matched-fields design was an effective

compromise.

Presence or absence of F. culmorum and F. pseudograminearum on five stems from each

field was related to field characteristics and year effects using the GLIMMIX procedure in SAS

(SAS Institute Inc., 2003). If the underlying field conditions result in a certain probability that

each stem will get infected with a given species, then testing five independent stems is a

binomial response. Therefore, a generalized linear model with a logit link was appropriate.

With these separate analyses for each species, we made the assumption that presence of one

species does not affect the presence of the other. This assumption was partially justified by our

observations of the presence of both species on some stems. Allowing for influence between

species would have required a more complex model that would likely be beyond the information

content of the data. A generalized linear mixed model of repeated measures was fit using a logit

link. Each pair of matched fields was considered the subject of repeated measures. As described

above, other designs might have been preferable, given greater resources. Reported effects and

95% confidence intervals were estimated using models which only included those terms

identified as significant in the full model.
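The binomial reasoning above can be made concrete with a small sketch. It shows only the fixed-effects part of the idea: a linear predictor is mapped through the inverse logit to a per-stem infection probability, which then gives the distribution of the number of infected stems out of five. The random effect for matched field pairs is omitted, and the probability of 0.3 is a hypothetical value.

```python
from math import comb, exp, log

def logit(p):
    """Logit link: log odds of a probability."""
    return log(p / (1 - p))

def inv_logit(eta):
    """Inverse logit: probability implied by a linear predictor."""
    return 1 / (1 + exp(-eta))

def binomial_pmf(k, n, p):
    """Probability of k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def stem_infection_distribution(eta, n_stems=5):
    """Distribution of the number of infected stems for a given linear predictor."""
    p = inv_logit(eta)
    return [binomial_pmf(k, n_stems, p) for k in range(n_stems + 1)]

# Hypothetical field whose conditions imply a 30% infection chance per stem.
eta = logit(0.3)
distribution = stem_infection_distribution(eta)
for k, prob in enumerate(distribution):
    print(k, round(prob, 4))
```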

Logistic Regression Analysis of Wheat Cold Tolerance

Testing

My participation in this project began prior to the testing and analysis described below, and as

such I contributed to the choices of methods and experimental design.

Summary

The ability of both winter and spring genotypes to survive extreme cold has economic

repercussions in Washington and thus is an important target for wheat breeders. The genetics

that underpin tolerance to extreme cold have not been fully elucidated, so breeders must rely on

phenotypic evaluations. Assessment of cold tolerance in the field is impaired by variations

across years and within a field, necessitating testing under controlled conditions. The objective

of this study was to develop and implement a set of procedures and analyses to evaluate

genotypes for tolerance to extreme cold within a breeding program. Wheat was grown within

cells of soil and was subjected to extreme cold that had a differential effect on survival.

Temperature recording probes in each cell were used to measure the temperature variation

among cells of soil every 2 minutes during a freeze test procedure. A calibration based on

temperature readings of freezing water was used in an attempt to adjust for inaccuracies in the

probe readings. Logistic regression was performed to compare survival of test genotypes to the

check cultivars, resulting in estimates of odds ratios and corresponding confidence intervals.

These confidence intervals let us identify which genotypes were significantly worse, better, or

not significantly different from the controls, and this allows for ranking of test genotypes while

statistically accounting for error in estimation. Accuracy using calibrated and un-calibrated

probe data and using survival data alone was evaluated by taking the Spearman correlation

between odds ratios calculated on the 2010 freeze trials of winter genotypes and the estimates

derived from the 2011 freeze trials. The correlations were: 0.493, 0.460, and 0.443, for no probe

data, probe data, and calibrated probe data, respectively. These results suggest that, at least when

only one minimum temperature is used, use of probe data, with or without calibration does not

improve the accuracy of tolerance estimation.

Discussion of Methods

Multiple sets of winter and spring wheat germplasm were evaluated for tolerance to extreme cold

temperatures in 2009, 2010, and 2011. These sets included released cultivars, breeding material,

and mapping populations. Sets of test genotypes that could be considered from the same

population were analyzed together. Division into populations was partially subjective, but was

based on expectations of cold tolerance for a population. Separation based on growth habit is

obviously necessary, since no single temperature will allow both winter and spring genotypes to

show differential survival, i.e. too cold a temperature will kill all the spring cultivars and vice

versa. Released cultivars and breeding material were separated from mapping populations, since

the mapping populations derive from a single cross. Progeny from a single cross should have

reduced variability, as compared to less closely related cultivars and breeding material, and so

may have a different ideal test temperature. Two check genotypes were included with each set

of test genotypes. The cultivars Eltan, which has desirable cold tolerance, and Stephens, which

has insufficient cold tolerance, were used as checks with the winter genotypes. The cultivars

Alpowa, with good cold tolerance (for a spring growth habit genotype), and Zak, with poor cold

tolerance, were used as controls for the spring genotypes. Different checks for winter and spring

populations were necessary not only to match the vernalization requirements of the test

genotypes, but also because checks are only useful if they fit within the distribution of the test

genotypes.

Prior to cold tolerance testing, the test and check genotypes were grown in conditions

mimicking fall growing conditions. The wheat seeds were densely planted in small cells of soil,

twenty seeds per cell. The plants were grown in warm conditions (22°C day/15°C night) for one

week, followed by five weeks at 4°C. Immediately prior to testing, the number of seedlings that

had grown in each cell was recorded and cells were watered to saturation with a 10 mg/L

solution of SNOWMAX® (Johnson Controls, Centennial, CO, USA). Snowmax was added to

help reduce supercooling, thereby reducing chance variability among soil cell conditions.

However, Snowmax was probably not necessary after the sub-zero acclimation period (described

below) was adopted in 2010, since this allowed cells to all freeze at a non-lethal temperature

prior to imposition of extreme cold, and the freezing temperature of water in soil is 0°C (Lackner

et al., 2005).

Testing for extreme cold tolerance was conducted in a programmable freezer (model LU-113,

Espec Corp., Hudsonville, MI, USA) using three temperature profiles, one for winter

genotypes and two profiles for spring genotypes. Runs of 48 soil cells were placed in the freezer

at the same time. It is important to note that the freezer settings did not necessarily match the

actual temperatures of the air inside the chamber or the soil in each cell. The temperature of

each cell of soil was recorded at 2 minute intervals using food piercing temperature probes

connected to a monitor system that connected to multiple personal computers. The temperature

profiles used were similar to those of Skinner and Bellinger (2011). All temperature profiles

began by cooling the freezer and holding at a setting of -3°C for 16 hours for tests in 2010 and

2011. This will be referred to as the sub-zero acclimation period (SZAP). Tests in 2009 did not

include a SZAP and began the same temperature profile from room temperature. Skinner and

Bellinger (2010) observed that a 16-hour SZAP increased cold tolerance in winter wheat. This

was not our goal for the SZAP, since growing conditions in the field may not necessarily be as

gradual. The SZAP was instead included because it ensures that all soil cells are frozen by the

time colder temperatures are imposed. Omitting the SZAP results in dramatic variability in

the temperatures that occur in each soil cell. The soil cells vary in terms of water holding

capacity, due to differences in actual mass of soil contained and packing density, and latent heat

from the freezing process prevents the soil cells from cooling further while the phase change

occurs. This means that without a SZAP some soil cells will still be freezing while others are

decreasing in temperature post-freezing. Following the SZAP, the winter genotypes were cooled

for 2.5 hours to a temperature setting of -13°C, which was held for 1 hour. Temperatures were

then increased to 4°C over a period of 4.25 hours. The warmer profile for spring genotypes

consisted of cooling after the SZAP for 0.75 hours to -6°C, holding at -6°C for 1 hour, then

warming for 2.5 hours. The cooler spring profile was cooling for 1.25 hours to -8°C, holding at

-8°C for 1 hour, then warming for 3 hours. The minimum temperatures used were selected to

result in the greatest range of probabilities of survival and to result in differential survival

between the check genotypes. Using too cold or too warm a minimum will result in many

genotypes with no or complete survival, preventing differential survival. Winter genotypes were

tested at a single temperature as a matter of efficiency. In breeding program evaluations for

selection, high-throughput procedures with modest accuracy are preferred to precisely estimating

small differences among genotypes. Multiple temperatures were used for spring genotypes in the

process of identifying the ideal temperature for each set of genotypes, and data were analyzed

separately for each minimum. Following the extreme cold treatment, seedling leaves were cut

off at the soil line. After one day at 4°C, the seedlings were grown in a greenhouse for five

weeks, and the plants that regrew were recorded as having survived.
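The winter temperature profile described above can be written down as a list of breakpoints and interpolated to give the freezer setting at any elapsed time. This is a sketch only: it assumes the ramps between settings are linear and that the clock starts at the beginning of the 16-hour SZAP. The names WINTER_PROFILE and setpoint are hypothetical, and the values are freezer settings, not soil temperatures.

```python
# (elapsed hours, freezer setting in °C) for the winter profile with a SZAP.
WINTER_PROFILE = [
    (0.0, -3.0),     # start of the sub-zero acclimation period (SZAP)
    (16.0, -3.0),    # end of the 16-hour SZAP
    (18.5, -13.0),   # after 2.5 hours of cooling
    (19.5, -13.0),   # after the 1-hour hold at the minimum
    (23.75, 4.0),    # after 4.25 hours of warming
]

def setpoint(profile, hours):
    """Linearly interpolate the freezer setting (°C) at a given elapsed time."""
    if hours <= profile[0][0]:
        return profile[0][1]
    for (t0, temp0), (t1, temp1) in zip(profile, profile[1:]):
        if hours <= t1:
            return temp0 + (temp1 - temp0) * (hours - t0) / (t1 - t0)
    return profile[-1][1]

# Setting halfway through the cooling ramp.
print(setpoint(WINTER_PROFILE, 17.25))
```

Under the linear-ramp assumption, the cooling step from -3°C to -13°C over 2.5 hours corresponds to a rate of 4°C per hour.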

In response to observed variability in probe readings for known temperatures, a

calibration of output temperature values was attempted. Obviously malfunctioning probes were

replaced beforehand, but minor inaccuracies could not be easily identified prior to calibration.

The probes were all placed in the same container of distilled water within the freezer, which was

lowered and held below freezing. The volume of water was sufficient that the period of phase

change from liquid to ice lasted long enough to appear as a horizontal line of constant

temperature in the probe data readings. Since the phase change of distilled water occurs at 0°C,

the deviation of each probe from 0°C during the phase change was considered error. This

estimated error value for each probe was subtracted from all of that probe's reported values to

calibrate the probe output. This method of calibration is obviously not ideal. Although it could

be expected to effectively determine the degree of error for each probe at 0°C, the error at the

more important and damaging temperatures may differ, possibly resulting in over- or under-

adjustment. It is also possible that probe errors are not constant over time. If the deviation

changed signs, the calibration adjustment would in fact be opposite the correct direction at the

time of the test. In light of these shortcomings, the effectiveness of calibration was compared to

not adjusting the raw probe data.
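The calibration amounts to estimating one offset per probe from the phase-change plateau and subtracting it from every reading of that probe. A minimal sketch follows, assuming the plateau has already been located by inspection (the start and end indices are supplied by hand) and using hypothetical readings.

```python
def plateau_offset(readings, start, end):
    """Mean reading over the plateau readings[start:end].

    Since distilled water freezes at 0°C, this mean is taken as the probe error.
    """
    window = readings[start:end]
    return sum(window) / len(window)

def calibrate(readings, offset):
    """Subtract a probe's estimated error from all of its readings."""
    return [r - offset for r in readings]

# Hypothetical ice-bath readings for one probe; the plateau is at indices 2 to 5.
ice_bath = [2.1, 0.9, 0.4, 0.4, 0.4, 0.4, -0.8, -2.0]
offset = plateau_offset(ice_bath, 2, 6)

# Hypothetical freeze-test readings from the same probe.
calibrated = calibrate([-12.6, -5.0], offset)
print(round(offset, 2), [round(c, 2) for c in calibrated])
```

A single constant offset is exactly the limitation discussed in the text: it corrects the error observed at 0°C and simply assumes the same error applies near the test minimum.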

Probe temperature output data, with and without calibration, were used to calculate

metrics that described components of the temperature profiles of each extreme cold treatment.

These calculations were similar to those of Skinner and Mackey (2009) and Skinner and

Bellinger (2011). These profile components were used, since Skinner and Mackey observed that

survival in some populations was related to metrics other than just minimum temperature.

Temperature differences above -5°C were considered unimportant, since the temperature of the

SZAP (-3°C setting on the freeze chamber) is uniformly tolerable for all the genotypes tested and

temperatures recorded by the probes during the SZAP are sometimes lower than -3°C. The

post-SZAP was defined as the period between the first time each probe recorded a temperature of

-5°C and the next time the probe recorded -4.6°C. The higher ending temperature ensures that

small fluctuations in temperature do not result in the incorrect identification of a very short

post-SZAP. For each post-SZAP, multiple components of the temperature profile were calculated.

The post-SZAP duration and minimum temperature are self-explanatory. The post-SZAP was

parsed into three periods: the cooling period, the at-minimum period, and the warming period.

The at-minimum period was defined as the time during which temperatures were within 0.5°C of

the minimum temperature. This ensures that minor temperature fluctuations around the

minimum are ignored. The cooling and warming periods preceded and followed the at-minimum

period, respectively. The duration of each of these periods was calculated. The cooling and

warming rates were defined as the change in temperature during each period divided by the

duration of each respective period. Degree minutes were calculated as:

\[
\text{Degree minutes} = \sum_{i=1}^{F} \frac{(t_i - t_{i-1})(T_i + T_{i-1})}{2},
\]

where $t_i$ is the $i$th time point, $T_i$ is the temperature at the $i$th time point, and $F$ is the number of

time points recorded during the post-SZAP.
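The profile components defined above can be computed from a single probe's time series roughly as follows. This is a sketch under several assumptions: the post-SZAP starts at the first reading at or below -5°C and ends at the next reading at or above -4.6°C, the at-minimum period runs from the first to the last reading within 0.5°C of the minimum, and degree minutes are taken as the trapezoidal area under the temperature curve in °C·min. The readings are hypothetical and coarser than the 2-minute interval used in the study.

```python
def post_szap_window(temps, start_at=-5.0, end_at=-4.6):
    """Indices of the first reading <= start_at and the next reading >= end_at."""
    start = next(i for i, temp in enumerate(temps) if temp <= start_at)
    end = next((i for i in range(start + 1, len(temps)) if temps[i] >= end_at),
               len(temps) - 1)
    return start, end

def rate(delta, duration):
    """Temperature change per minute, or 0.0 for a zero-length period."""
    return delta / duration if duration > 0 else 0.0

def profile_metrics(times, temps, tolerance=0.5):
    """Post-SZAP temperature profile components for one probe."""
    start, end = post_szap_window(temps)
    t, T = times[start:end + 1], temps[start:end + 1]
    minimum = min(T)
    near = [i for i, temp in enumerate(T) if temp <= minimum + tolerance]
    first, last = near[0], near[-1]
    cooling, warming = t[first] - t[0], t[-1] - t[last]
    return {
        "duration": t[-1] - t[0],
        "minimum": minimum,
        "cooling_duration": cooling,
        "at_minimum_duration": t[last] - t[first],
        "warming_duration": warming,
        "cooling_rate": rate(T[first] - T[0], cooling),
        "warming_rate": rate(T[-1] - T[last], warming),
        # Trapezoidal area under the curve (assumed form of degree minutes).
        "degree_minutes": sum((t[i] - t[i - 1]) * (T[i] + T[i - 1]) / 2
                              for i in range(1, len(t))),
    }

# Hypothetical probe readings: minutes elapsed and temperature in °C.
times = [0, 10, 20, 30, 40, 50, 60, 70, 80]
temps = [-3.0, -5.0, -9.0, -13.0, -12.8, -13.0, -9.0, -5.0, -4.0]
metrics = profile_metrics(times, temps)
print(metrics)
```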

Logistic regression was performed using the LOGISTIC procedure of the SAS/STAT

software (SAS Institute Inc., 2003) to compare survival of test genotypes to the check cultivars

using survival data with both calibrated and un-calibrated probe data and using survival data

alone. Logistic regression is valid when each plant is considered an experimental unit and

survival or death of each plant is the binary response. It is appropriate to think of individual

plants as experimental units, because survival was assessed for each plant individually and each

plant responds independently to the soil and temperature conditions specific to each cell of soil.

The conditions of each soil cell were included in the analysis via the temperature profile

components used as covariates in the logistic regression. When analyses were conducted without

probe data, cell-to-cell variation was no longer captured in the model. Therefore, without probe

data, the assumption of independent error for each plant may have been violated. The expected

cell-to-cell variation could have been included in the model with the use of a generalized linear

mixed model. Unfortunately, such models often failed to converge when applied to the large

data sets in this study. Smaller subsets of genotypes could have been separately analyzed, but

that would have produced different statistical conclusions (inference tests, confidence intervals,

etc.) depending upon which genotypes were analyzed together.

The genotype factor and the seven temperature profile components define a set of 2^8 - 1 = 255

possible first-order models without interactions. Specific models varied among data sets and

were chosen using stepwise regression in the LOGISTIC procedure with entry and exit

significance levels of 0.05. Test genotypes were evaluated by generating confidence intervals of

the ratios of odds of survival of each test genotype against each control genotype. These

confidence intervals let us identify which genotypes were significantly worse, better, or not

significantly different from the controls. Using two checks for comparisons allows test

genotypes to be separated into five groups: worse than the inferior check, equivalent to the

inferior check, in between the two checks, equivalent to the superior check, or better than the

superior check. This grouping allows for ranking of test genotypes while statistically accounting

for error in estimation. Rankings based on estimated odds ratios or percent survival do not

reflect the sampling errors that may cause inaccurate rankings among similarly performing

genotypes.
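The grouping logic can be illustrated with a simplified calculation. The sketch below is not the fitted logistic regression: it computes an unadjusted odds ratio and a Wald 95% confidence interval from raw survival counts, ignoring temperature covariates, and the counts are hypothetical. The extra outcome for a genotype that cannot be separated from either check is an assumption added to cover a case the five groups do not name.

```python
from math import exp, log, sqrt

def odds_ratio_ci(surv_test, dead_test, surv_check, dead_check, z=1.96):
    """Odds ratio (test vs. check) with a Wald confidence interval.

    All four counts must be non-zero for this simple formula to work.
    """
    ratio = (surv_test * dead_check) / (dead_test * surv_check)
    se = sqrt(1 / surv_test + 1 / dead_test + 1 / surv_check + 1 / dead_check)
    return ratio, exp(log(ratio) - z * se), exp(log(ratio) + z * se)

def compare_to_check(ci):
    """Classify a test genotype against one check from its interval."""
    _, lower, upper = ci
    if upper < 1:
        return "worse"
    if lower > 1:
        return "better"
    return "equivalent"

def classify(vs_inferior, vs_superior):
    """Place a test genotype relative to the inferior and superior checks."""
    a, b = compare_to_check(vs_inferior), compare_to_check(vs_superior)
    if a == "worse":
        return "worse than the inferior check"
    if b == "better":
        return "better than the superior check"
    if a == "equivalent" and b == "equivalent":
        return "not separable from either check"  # assumption, see lead-in
    if a == "equivalent":
        return "equivalent to the inferior check"
    if b == "equivalent":
        return "equivalent to the superior check"
    return "in between the two checks"

# Hypothetical counts of surviving and dead plants.
vs_inferior = odds_ratio_ci(50, 50, 20, 80)   # test genotype vs. inferior check
vs_superior = odds_ratio_ci(50, 50, 80, 20)   # test genotype vs. superior check
group = classify(vs_inferior, vs_superior)
print(group)
```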

Accuracy using calibrated and un-calibrated probe data and using survival data alone was

evaluated by taking the Spearman correlation between the odds ratios (with Eltan or Stephens as

reference) calculated on the 2010 freeze trials of winter genotypes and the estimates derived

from the 2011 freeze trials. Spearman correlations were used to minimize the effect of outliers,

since odds ratio estimates vary more drastically at the extremes. This analysis included only

genotypes that were tested in both years and, in 2011, were tested after malfunctioning probes

were replaced on 23 March 2011. This resulted in 45 genotypes used in the comparisons. The

Spearman correlations, with Eltan as the reference for the odds ratios, were: 0.493, 0.460, and

0.443, for no probe data, probe data, and calibrated probe data, respectively. The correlations

using Stephens as the reference genotype were very similar and showed the same pattern. This

comparison indicated that the use of probe data, with or without calibration, does not improve

predictive ability. The futility of the calibration was not unexpected, due to the limitations

described above. However, the use of probe data was expected to improve predictions based on

the results of Skinner and Mackey (2009), who observed that temperature probes could record

changes in conditions that influenced survival. The difference in conclusions may be explained

by the differences in the temperature profiles used in each study. Skinner and Mackey lowered

temperatures to two different minimums, as opposed to the single minimum we used for the

winter genotypes, and did not include a SZAP. Their methods resulted in a much larger range of

conditions. When a SZAP was included with only a single minimum in our study, temperature

variation among cells was much less. This variation was likely within the margin of error for the

temperature probes, making their output useless for predicting survival.
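The Spearman correlation used for the between-year comparison is the Pearson correlation of the ranks, which is why extreme odds ratio estimates have limited influence. A minimal sketch, with ties given their average rank and hypothetical odds ratio values:

```python
from math import sqrt

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)

# Hypothetical odds ratios for the same genotypes in two years.
# The extreme value of 40.0 affects only its rank, not its magnitude.
year_one = [0.2, 1.0, 3.5, 40.0]
year_two = [0.5, 0.9, 2.0, 2.5]
print(spearman(year_one, year_two))
```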

The correlation of 0.493, between 2010 and 2011 results based on survival data alone,

indicates that substantial variation in survival is unaccounted for by genotype. This additional

variation may be attributable to cell-to-cell temperature variability that was not successfully

captured by the probe data or variation in growing and testing conditions between 2010 and

2011. Additional replication of cells and/or testing over time will be necessary to provide

accurate and precise estimates of extreme cold tolerance, but evaluation within a single year

should be sufficient for coarse selection decisions.

References

Dean, A.M., and D. Voss. 1998. Design and Analysis of Experiments. Springer.

Holland, J.B. 2006. Estimating Genotypic Correlations and Their Standard Errors Using

Multivariate Restricted Maximum Likelihood Estimation with SAS Proc MIXED. Crop

Science. 46(2): 642–654.

Holland, J.B., W.E. Nyquist, and C.T. Cervantes-Martínez. 2003. Estimating and Interpreting

Heritability for Plant Breeding: An Update. : 9–112.

Kutner, M., C. Nachtsheim, J. Neter, and W. Li. 2004. Applied Linear Statistical Models. 5th ed.

McGraw-Hill/Irwin, San Francisco.

Lackner, R., A. Amon, and H. Lagger. 2005. Artificial Ground Freezing of Fully Saturated Soil:

Thermal Problem. Journal of Engineering Mechanics. 131(2): 211–220.

Miedaner, T., T. Würschum, H.P. Maurer, V. Korzun, E. Ebmeyer, and J.C. Reif. 2011.

Association mapping for Fusarium head blight resistance in European soft winter wheat.

Molecular Breeding. 28(4): 647–655.

Reif, J.C., B. Kusterer, H.-P. Piepho, R.C. Meyer, T. Altmann, C.C. Schön, and A.E.

Melchinger. 2009. Unraveling Epistasis With Triple Testcross Progenies of Near-Isogenic

Lines. Genetics. 181(1): 247–257.

SAS Institute Inc. 2003. SAS/STAT® User's Guide, Version 9. SAS Institute Inc., Cary, NC.

Skinner, D.Z., and B.S. Bellinger. 2010. Exposure to subfreezing temperature and a freeze-thaw

cycle affect freezing tolerance of winter wheat in saturated soil. Plant and Soil. 332: 289–297.

Skinner, D.Z., and B.S. Bellinger. 2011. Differential Response of Wheat Cultivars to

Components of the Freezing Process in Saturated Soil. Crop Science. 51(1): 69.

Skinner, D.Z., and B. Mackey. 2009. Freezing tolerance of winter wheat plants frozen in

saturated soil. Field Crops Research. 113(3): 335–341.

Suhr, D. 2005. Principal Component Analysis vs. Exploratory Factor Analysis. In Proceedings of

the Thirtieth Annual SAS Users Group International Conference. SAS Institute Inc.,

Cary, NC.