
ORIGINAL PAPER

Association mapping for growth, straightness and wood chemistry traits in the Pinus pinaster Aquitaine breeding population

Camille Lepoittevin & Luc Harvengt & Christophe Plomion & Pauline Garnier-Géré

Received: 11 May 2011 / Revised: 12 July 2011 / Accepted: 4 August 2011 / Published online: 26 August 2011
© Springer-Verlag 2011

Abstract Association mapping is a recommended method to dissect the genetic basis of naturally occurring trait variation in non-model tree species with outcrossing mating systems and large population sizes. We report here the results of the first association-mapping study in maritime pine (Pinus pinaster Ait.), a conifer species of economic importance for timber and pulp production in south-western Europe. Two association samples were examined: 160 plus trees belonging to the first generation breeding population (G0, resulting from mass selection for overall good growth and form in the forest of south-western France) and 162 trees from the second generation breeding population (G1, resulting from biparental crosses between G0 trees). These samples were (1) genotyped for 184 in vitro SNPs discovered in 40 candidate genes for plant cell wall formation or drought stress resistance and 200 in silico SNPs detected in 146 contigs from the maritime pine EST database and (2) phenotyped for growth, stem straightness and wood chemistry traits in progeny or clonal experimental designs (from 768 to 5,080 phenotypes depending on the trait). First, SNP data were used to test for putative stratification in the breeding population. Then, two different approaches using pedigree records to account for inbreeding were used to test for associations. Despite the a priori low power of the designs, we identified two mutations that were significantly associated, one with variation in growth (in a fasciclin-like arabinogalactan protein) and the other with variation in wood cellulose content (in a HD-Zip III transcription factor).

Keywords Association mapping · Pinus pinaster · Wood quality · Growth

Introduction

Research for dissecting the molecular basis of quantitative variation has historically focused on quantitative trait locus (QTL) mapping, which allows detecting associations between phenotypic variability and genomic regions identified by molecular markers. QTL studies depend on the development of suitable segregating populations (e.g. F2, backcrosses or near-isogenic lines), which can be a serious limitation for species showing high levels of inbreeding depression and long generation times such as forest trees. Moreover, QTL effects often depend on the narrow genetic background in which they have been detected, which can also severely limit their application when breeding material from different genetic backgrounds is used. Association mapping (or linkage disequilibrium (LD) mapping) is a more recent approach which can be applied at the population level. Taking advantage of the large number of historical recombination events in natural populations, LD mapping is expected to show a much higher resolution than in QTL studies since only markers in strong LD with a causative allele will show significant associations with targeted traits (Cardon and Bell 2001). This allows for a fine mapping scale (<1 cM; see Xiong and Guo 1997) and, in some cases, for the detection of causative polymorphisms (Fournier-Level et al. 2009; Ingvarsson et al. 2008; Thumma et al. 2009).

Communicated by S. Aitken

Electronic supplementary material The online version of this article (doi:10.1007/s11295-011-0426-y) contains supplementary material, which is available to authorized users.

C. Lepoittevin · C. Plomion · P. Garnier-Géré
INRA, UMR1202 BIOGECO, 33610 Cestas, France

C. Lepoittevin · C. Plomion · P. Garnier-Géré
Université de Bordeaux, UMR1202 BIOGECO, 33400 Talence, France

C. Lepoittevin (*) · L. Harvengt
FCBA, Laboratoire de Biotechnologies, 77370 Nangis, France
e-mail: [email protected]

Tree Genetics & Genomes (2012) 8:113–126
DOI 10.1007/s11295-011-0426-y

The efficiency of association studies thus largely depends not only on LD patterns and extent in the population considered, but also on the ability to distinguish between LD due to physical linkage and LD due to other evolutionary forces (Abecasis et al. 2005; Flint-Garcia et al. 2003; Gupta et al. 2005). For example, population structure is seen as the most serious systematic bias producing false-positive associations, as it creates LD between any joint divergent molecular and phenotypic traits among subpopulations (Hirschhorn and Daly 2005; Marchini et al. 2004). A large number of methods have thus been developed for detecting hidden population stratification (Falush et al. 2003; Hubisz et al. 2009; Pritchard et al. 2000a; Wu et al. 2006) and/or for dealing with any structure in association models (Kang et al. 2008; Price et al. 2006, 2010; Pritchard et al. 2000b; Zhao et al. 2007). Applying those methods is intended to ensure that only associations caused by physical linkage remain. Similarly, covariance between individuals because of their relatedness can increase the false-positive rate (Voight and Pritchard 2005), and models accounting for familial or cryptic relatedness have been developed (Malosetti et al. 2007; Sillanpää 2011; Stich and Melchinger 2009; Yu et al. 2006), which allow association mapping to be carried out on breeding populations showing variable levels of inbreeding.

Most coniferous species are considered good models for association mapping due to their generally high levels of genetic diversity and pollen flow and large population sizes (Neale and Savolainen 2004). These life history and mating characteristics a priori lead to rapid LD decay between genes or polymorphisms, low inbreeding and low population structure, which is a favourable situation to avoid false-positive associations. Associations explaining small but significant proportions of trait variation (often <5%) have already been reported for wood quality in Pinus taeda (González-Martínez et al. 2007), Pinus radiata (Dillon et al. 2010) and Picea glauca (Beaulieu et al. 2011). Other quantitative traits are commonly targeted and showed similarly low proportions of variation explained at the SNP level: for carbon isotope discrimination or for disease resistance in P. taeda (Cumbie et al. 2011; González-Martínez et al. 2008; Quesada et al. 2010) and for cold hardiness or bud set timing in Picea sitchensis (Holliday et al. 2010) and Pseudotsuga menziesii (Eckert et al. 2009). We report here the results of the first association-mapping study in maritime pine (Pinus pinaster Ait.). This conifer species is of economic importance for timber and pulp production in south-western Europe (Alazard et al. 2005; Pot et al. 2005) where it occurs naturally and shows a fragmented distribution (Bucci et al. 2007; Burban and Petit 2003; González-Martínez et al. 2002). First, SNP data were used to test for putative stratification in the Aquitaine breeding population. Then, two different association-mapping approaches using pedigree records to account for inbreeding were used to test for potential associations between SNPs and growth, stem straightness and wood chemistry traits.

Materials and methods

Plant material

Two different populations were examined in this study: (1) 160 plus trees belonging to the first generation breeding population (G0) that were chosen for overall good growth and form in Aquitaine region natural forests (South West France) and (2) 162 trees from the second generation breeding population (G1) resulting from biparental crosses between 77 G0 trees, which were individually selected based on their genetic value for growth and stem straightness. G0 trees were sampled across a large range of different locations well covering the Aquitaine region, especially along the ocean coast, and were distant from one another by at least 50 m in a single location (Online resource 1). G0 and G1 trees were evaluated in the Hermitage progeny trial and the Vaquey clonal trial, respectively, as described in Lepoittevin et al. (2011). Briefly, the 160 G0 trees were crossed with a pollen mix collected from 28 unrelated G0 trees, resulting in 5,080 progenies distributed in families of 12 to 36 half-sibs. They were installed in a randomised complete block design with three, six or nine tree plots per family. The 162 G1 trees were cloned, and 3 to 5 replicates per clone (768 trees in total) were installed in randomised single-tree plots with no defined field blocks.

Phenotypic data

Phenotyping methods and data for both trials are described in Lepoittevin et al. (2011). Briefly, all the G0 progenies in the Hermitage trial were measured for total height (Height), circumference at breast height (Girth, as in Lepoittevin et al. 2011) and deviation from verticality (Str) at 8 years. All the G1 replicates in the Vaquey trial were measured for total height (Height) at 8 years and diameter at breast height (Diameter) at 13 years. Wood samples were collected from both trials and indirect chemical characterisation was obtained by near-infrared spectroscopy. In the Hermitage trial, only 958 G0 progenies (7 to 12 half-sibs per family in 101 families) were measured for extractives (Extract) and lignin (Lignin) content at 31 years old. In the Vaquey trial, all the G1 replicates were measured for lignin, cellulose, mannose and galactose contents (Lignin, Cellulose, Mannose and Galactose, respectively) at 13 years old.

Genotypic data

DNA was extracted from needles of all G0 and G1 trees using the Invisorb® Spin Plant Mini Kit (Invitek, Berlin, Germany). Genotyping was conducted with the Illumina GoldenGate Technology (Illumina Inc., San Diego, CA, USA) using 184 in vitro SNPs discovered in 40 candidate genes involved in plant cell wall formation or drought stress resistance and 200 in silico SNPs detected in 146 contigs from the maritime pine EST database, as described in Lepoittevin et al. (2010). For the in vitro SNPs, an average of 50 megagametophytes from different populations well covering the geographic distribution of the species was sequenced for each gene. For the in silico SNPs, ESTs were derived from 6 different libraries constructed using different tissues and a number of segregating haploid genomes from 3 up to 300 from different populations. In silico SNPs were chosen according to criteria such as minor allele frequency (>20% and no singletons), number of ESTs for the detection (≥4), PolyBayes and functionality scores, polymorphism of the flanking sequences and chromatogram quality. Our goal was to include a low number of markers per contig in a large number of contigs, not focusing on particular annotations. Among the 384 SNPs of our assay, 192 (111 in vitro SNPs in 32 candidate genes and 81 in silico SNPs in 69 contigs) and 186 (106 in vitro SNPs in 31 candidate genes and 80 in silico SNPs in 69 contigs) were polymorphic in the G0 and G1 trees, respectively.

Population structure

Genetic structure was assessed using the Structure software v2.2 (Falush et al. 2003, 2007; Pritchard et al. 2000a) on SNP genotypic data, first for the 160 G0 trees and second for 28 unrelated G1 trees. This method assumes that populations are at Hardy–Weinberg equilibrium and that markers come from unlinked or weakly linked loci (Falush et al. 2003). Prior to the structure analysis, we thus discarded the SNPs significantly departing from Hardy–Weinberg equilibrium, based on Fisher exact tests implemented in the GenePop software (Raymond and Rousset 1995). In our dataset, several SNPs within the same fragments were at distances shorter than 1,000 bp and, therefore, could be in strong LD. We determined a subset of unlinked or weakly linked SNPs using the H-clust method described in Rinaldo et al. (2005) and implemented in the R (R Development Core Team 2009) function available at http://www.wpic.pitt.edu/WPICCompGen/hclust/hclust.htm. This method allows selecting either unlinked or weakly linked SNPs on unphased genotypic data, using pairwise squared correlation LD estimates (r²) for computing distances between SNPs (1−r²) and building a tree by hierarchical clustering. Below a cut-off value of 0.5, non-redundant SNPs were selected, leaving only SNPs with intermediate levels of association which were accounted for in the Structure software method. For both panels (160 G0 and 28 unrelated G1 trees), we performed 10 independent runs of Structure for numbers of groups (K parameter) varying from 1 to 10, with the correlated allele frequencies model and with burn-in and run-length periods of 10⁶ iterations. The best number of groups K was then determined using both the mean likelihood L(K) over 10 runs for each K and the ΔK criterion of Evanno et al. (2005).
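As an illustration of the ΔK criterion mentioned above, the following minimal Python sketch computes ΔK from per-run log-likelihoods. It assumes one common reading of Evanno et al. (2005), in which ΔK is the absolute second-order difference of the mean L(K) divided by the standard deviation of L(K) across runs. The function name `delta_k` and the input layout are hypothetical, and the log-likelihood values are invented for the example; none of them comes from the study.

```python
from statistics import mean, stdev


def delta_k(loglik_by_k):
    """Return {K: deltaK} from {K: [log-likelihood of each run]}.

    Assumed form: |mean L(K+1) - 2 * mean L(K) + mean L(K-1)| / sd(L(K)).
    deltaK is undefined for the smallest and largest K tested, so those
    values are simply left out of the result.
    """
    means = {k: mean(runs) for k, runs in loglik_by_k.items()}
    result = {}
    for k in sorted(loglik_by_k):
        if (k - 1) in means and (k + 1) in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            sd = stdev(loglik_by_k[k])
            if sd > 0:  # skip K values whose runs all gave the same L(K)
                result[k] = second_diff / sd
    return result


# Invented log-likelihoods for K = 1..4 (two runs per K), for illustration only.
example_runs = {
    1: [-100.0, -102.0],
    2: [-110.0, -114.0],
    3: [-120.0, -121.0],
    4: [-135.0, -131.0],
}
example_delta_k = delta_k(example_runs)
```

Because the second-order difference needs both neighbours, K=1 never receives a ΔK value, which is why the criterion cannot compare the no-stratification case with K≥2.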

Statistical models

To decrease computation time while ensuring different SNPs to be tested within genes, association tests were also performed on a subset of SNPs defined as above with a less stringent cut-off value of 0.2 (meaning that the maximum level of r² allowed between SNPs was 0.8) successively applied to the 160 G0 trees and the 28 unrelated G1 trees. Two different methods were used to test for genotype–phenotype associations: a two-stage association analysis for the G0 samples and a one-stage association analysis for the G1 samples.
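The relation between the clustering cut-off and the maximum r² can be made concrete with a short Python sketch. It computes r² between two SNPs as the squared Pearson correlation of their allele doses (0, 1 or 2) on unphased genotypes, then converts it to the distance 1−r². This is only an illustration of the distance measure, not the H-clust implementation; the function names and the handling of missing calls (`None`) are assumptions.

```python
def genotype_r2(doses_a, doses_b):
    """Squared Pearson correlation between two vectors of allele doses.

    Individuals with a missing call (None) at either SNP are ignored.
    Returns 0.0 when one of the SNPs is monomorphic in the retained sample.
    """
    pairs = [(a, b) for a, b in zip(doses_a, doses_b)
             if a is not None and b is not None]
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    s_aa = sum((a - mean_a) ** 2 for a, _ in pairs)
    s_bb = sum((b - mean_b) ** 2 for _, b in pairs)
    s_ab = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    if s_aa == 0 or s_bb == 0:
        return 0.0
    return s_ab * s_ab / (s_aa * s_bb)


def is_redundant(r2, cutoff):
    """True when the LD distance 1 - r2 falls below the clustering cut-off."""
    return (1.0 - r2) < cutoff


# With a cut-off of 0.2, two SNPs are treated as redundant only if r2 > 0.8.
pair_at_085 = is_redundant(0.85, cutoff=0.2)   # distance 0.15, below 0.2
pair_at_050 = is_redundant(0.50, cutoff=0.2)   # distance 0.50, kept apart
```

Under this reading, lowering the cut-off from 0.5 to 0.2 keeps more SNPs per fragment, since only pairs in very strong LD are collapsed.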

Two-stage analysis on the G0 samples

Best linear unbiased predictors (BLUPs) were obtained for the 160 G0 trees using their half-sib progenies installed in the Hermitage trial by analysing each trait separately with model 1 described in Lepoittevin et al. (2011). These BLUPs were then used in a second step to test their possible association with each SNP marker using the following analysis of variance model:

$$B_i = \mu + s_i + e_i \tag{1}$$

where $B_i$ is the BLUP of the ith entry, $s_i$ is the effect of the genotypic class at the SNP locus considered and $e_i$ is the residual. Two different models were tested, with $s_i$ representing either three genotypic classes (two homozygous and one heterozygous) in the codominant model or an allelic dose effect in the additive model (using a numerical variable taking the values 0, 1 and 2 for the absence, the presence in one copy and the presence in two copies for one of the two alleles, respectively). We assumed that the 160 G0 trees were unrelated and came from an unstructured population, as shown by the Structure analysis (see the ‘Results’ section). This model was applied using the SNPassoc package (Gonzalez et al. 2007) implemented in R. In this package, the statistical significance of a given SNP is obtained by a likelihood ratio test that compares model 1 with a null model which does not include the $s_i$ effect. When significant associations were observed, the genetic variance associated with the SNP effect was estimated in the codominant model as:

$$\sigma^2_{snp} = \frac{MS_s - MS_e}{k} \tag{2}$$

where $MS_s$ is the mean square associated with the SNP effect, $MS_e$ is the residual mean square and k=(N−1)/2 where N is the number of genotypes, taking into account the random sampling of genotypes in each genotypic class that follows a binomial distribution depending on allelic frequencies (Charcosset and Gallais 1996; Garnier-Gere 1992). Then, the broad-sense heritability of the SNP was estimated as:

$$h^2_{bs} = \frac{\sigma^2_{snp}}{\sigma^2_{snp} + \sigma^2_e} \tag{3}$$

In the additive model, the SNP effect is only additive (Falconer and Mackay 1996); thus, the coefficient of determination (r²) is also the fraction of additive variance associated with the SNP in the model, i.e. the narrow-sense heritability of the SNP ($h^2_{ns}$). In the codominant and additive models, the derived heritabilities that were associated with the SNPs both correspond to the percentage of variance explained (PVE) commonly reported in the literature (Holliday et al. 2010; Ma et al. 2010).
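The two heritability estimates can be traced through a small worked example. The Python sketch below follows Eqs. 2 and 3 for the codominant model and computes the regression r² for the additive model. Three points are assumptions made for illustration and are not stated in the text: N is taken as the number of genotyped trees with a BLUP, the residual variance is estimated by the residual mean square, and the function names and toy BLUP values are hypothetical.

```python
def codominant_heritability(blups_by_genotype):
    """Return (sigma2_snp, h2_bs) from {genotype class: [BLUPs]}.

    One-way ANOVA mean squares feed Eq. 2, with k = (N - 1) / 2 and N taken
    as the number of genotyped trees (assumption). The residual variance in
    Eq. 3 is estimated by the residual mean square (assumption).
    """
    groups = [values for values in blups_by_genotype.values() if values]
    n = sum(len(values) for values in groups)
    grand_mean = sum(sum(values) for values in groups) / n
    ss_snp = sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in groups)
    ss_res = sum((x - sum(v) / len(v)) ** 2 for v in groups for x in v)
    ms_snp = ss_snp / (len(groups) - 1)
    ms_res = ss_res / (n - len(groups))
    k = (n - 1) / 2
    sigma2_snp = (ms_snp - ms_res) / k
    return sigma2_snp, sigma2_snp / (sigma2_snp + ms_res)


def additive_r2(doses, blups):
    """Coefficient of determination of the regression of BLUPs on allele dose."""
    n = len(doses)
    mean_x = sum(doses) / n
    mean_y = sum(blups) / n
    s_xx = sum((x - mean_x) ** 2 for x in doses)
    s_yy = sum((y - mean_y) ** 2 for y in blups)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(doses, blups))
    return s_xy * s_xy / (s_xx * s_yy)


# Toy data (invented): two trees per genotypic class.
toy = {"C/C": [1.0, 3.0], "T/C": [4.0, 6.0], "T/T": [7.0, 9.0]}
sigma2_snp, h2_bs = codominant_heritability(toy)
h2_ns = additive_r2([0, 0, 1, 1, 2, 2], [1.0, 3.0, 4.0, 6.0, 7.0, 9.0])
```

In this toy case the class means are evenly spaced, so the effect is purely additive; the remaining gap between the two estimates comes from the different variance estimators rather than from dominance.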

One-stage analysis on the G1 samples

For the G1 samples, phenotypic variation and association analyses were performed in one step using the following model:

$$y = Xs + Zg + e \tag{4}$$

where y is a vector of observations on a trait, s is a vector of fixed SNP effects, g is a vector of random genetic effects of individual genotypes, e is the vector of residuals and X and Z are the incidence matrices linking observations to the effects. We assumed that the G1 individuals came from an unstructured population (see the ‘Results’ section). The random effects in model 4 were assumed to follow normal distributions with means and variances defined by:

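$$\begin{pmatrix} g \\ e \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} A\sigma^2_g & 0 \\ 0 & I\sigma^2_e \end{pmatrix} \right) \tag{5}$$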

where 0 is a null matrix, A is the genetic relationship matrix (computed from a pedigree that takes into account all the relationships among individual genotypes), I is the identity matrix, $\sigma^2_g$ is the genetic variance and $\sigma^2_e$ is the residual variance. We considered both codominant and additive allelic models, as previously described in the two-stage approach. The estimates of the fixed and random effects were obtained by solving Henderson’s mixed-model equations (Henderson 1975) using the average information REML algorithm (Gilmour et al. 1995) implemented in the ASReml v2.0 software (Gilmour et al. 2006). The Wald test as implemented in ASReml was used to assess the statistical significance of SNP effects. To obtain an indicative PVE by the SNPs in this analysis, we also declared the SNPs as random effects in model 4 and computed the ratios of the SNP-associated variance to the full variance as in Cumbie et al. (2011).
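To show how a relationship matrix A can be derived from pedigree records, here is a minimal Python sketch of the classical tabular method for the additive (numerator) relationship matrix. It is an illustration under stated assumptions, not the ASReml implementation: parents must be listed before their offspring, unknown parents are coded as `None` and treated as unrelated founders, and the function name and the toy pedigree are hypothetical.

```python
def relationship_matrix(pedigree):
    """Additive relationship matrix A by the tabular method.

    pedigree: list of (individual, parent1, parent2), parents listed before
    their offspring (assumption); None marks an unknown parent, treated as an
    unrelated, non-inbred founder.
    Returns (A as a list of lists, {individual: row index}).
    """
    n = len(pedigree)
    a = [[0.0] * n for _ in range(n)]
    index = {}
    for i, (ind, p1, p2) in enumerate(pedigree):
        s = index.get(p1) if p1 is not None else None
        d = index.get(p2) if p2 is not None else None
        # Relationship with each earlier individual: mean of its
        # relationships with the two parents (0 for an unknown parent).
        for j in range(i):
            value = 0.0
            if s is not None:
                value += 0.5 * a[j][s]
            if d is not None:
                value += 0.5 * a[j][d]
            a[i][j] = a[j][i] = value
        # Diagonal: 1 + inbreeding coefficient (half the parents' relationship).
        inbreeding = 0.5 * a[s][d] if s is not None and d is not None else 0.0
        a[i][i] = 1.0 + inbreeding
        index[ind] = i
    return a, index


# Toy pedigree (invented): two founders, two full-sibs, one half-sib,
# and one offspring of the full-sib pair.
toy_pedigree = [
    ("P1", None, None),
    ("P2", None, None),
    ("C1", "P1", "P2"),
    ("C2", "P1", "P2"),
    ("C3", "P1", None),
    ("I1", "C1", "C2"),
]
A, idx = relationship_matrix(toy_pedigree)
```

The off-diagonal elements are twice the coancestry coefficients, so setting the coancestry of individuals with unknown relationship to 0 corresponds to the `None` entries above.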

Multiple-testing corrections

To account for multiple testing, we first computed q values (Dabney et al. 2009; Storey 2002; Storey and Tibshirani 2003), which measure the significance of an association in terms of the false discovery rate (FDR). This method, however, assumes that statistical tests are independent, which is not the case if SNPs are in LD. In our study, the arbitrary choice of SNPs with r²<0.8 (defined as informative) still meant that some markers could be in moderate to high LD. Therefore, we also implemented a permutation test using the R software with the boot (Canty and Ripley 2009), SNPassoc (Gonzalez et al. 2007) and asreml (Gilmour et al. 2006) packages: BLUPs or phenotypic values were permuted among individuals while keeping their genotypes fixed and models 1 and 4 were used to test for false-positive associations. The minimum P value (P value_min) obtained in each of 1,000 permutations was then used to estimate the P value_min empirical distribution. Finally, a P′ value was computed for each observed P value as P(P value_min ≤ P value), i.e. the proportion of permutations whose minimum P value was at most the observed P value. This permutation procedure retains the LD structure and thus allows an estimation of the false-positive rate when non-independent observations exist in the data (Hirschhorn and Daly 2005). However, it is computationally demanding, particularly for model 4, since solving mixed-model equations has to be repeated 1,000 times, which took ∼4 h per trait on a 2.53-GHz Intel Core 2 Duo Processor.
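The permutation logic can be sketched in a few lines of Python. To stay self-contained, the sketch replaces the minimum P value by the maximum squared correlation between allele dose and phenotype across SNPs; for single-degree-of-freedom additive tests on the same individuals the two rank permutations identically, but this substitution is an assumption of the example and not the procedure run with SNPassoc or asreml. The function names, the seed handling and the toy data are hypothetical.

```python
import random


def squared_correlation(x, y):
    """Squared Pearson correlation; 0.0 if either vector has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if s_xx == 0 or s_yy == 0:
        return 0.0
    return s_xy * s_xy / (s_xx * s_yy)


def permutation_p_prime(genotypes, phenotypes, n_perm=1000, seed=0):
    """Family-wise P' per SNP from a max-statistic permutation test.

    genotypes: {snp name: [allele doses]}, same individual order as phenotypes.
    Phenotypes are shuffled while genotypes stay fixed, so LD among SNPs is
    preserved. P' is the share of permutations whose best statistic over all
    SNPs is at least as extreme as the observed statistic of the SNP.
    """
    rng = random.Random(seed)
    observed = {snp: squared_correlation(doses, phenotypes)
                for snp, doses in genotypes.items()}
    shuffled = list(phenotypes)
    best_per_permutation = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        best_per_permutation.append(
            max(squared_correlation(doses, shuffled)
                for doses in genotypes.values()))
    return {snp: sum(best >= obs for best in best_per_permutation) / n_perm
            for snp, obs in observed.items()}


# Toy data (invented): 24 individuals, the phenotype tracks snp1 exactly.
snp1 = [0, 1, 2] * 8
snp2 = [0, 0, 1, 1, 2, 2] * 4
toy_p_prime = permutation_p_prime({"snp1": snp1, "snp2": snp2},
                                  [2.0 * dose for dose in snp1], n_perm=200)
```

Because each permutation contributes only its best statistic over all SNPs, correlated markers do not inflate the correction the way an independence assumption would.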

Results

Population structure

For the Structure analysis, 98 polymorphic SNPs (out of 192) in the G0 individuals and 109 polymorphic SNPs (out of 169) in the unrelated G1 individuals were retained that did not depart significantly from Hardy–Weinberg equilibrium and that best represented each fragment based on the cut-off value of 0.5 in the H-clust method. We observed the typical pattern of an unstructured population (Pritchard et al. 2007): plateaus in the estimate of log-likelihood of the cluster number (L(K)) were not observed since the highest likelihood was for K=1, L(K) either consistently decreased or showed an erratic pattern with greater variance (see Online resource 2), all individuals were admixed and the proportion of assignment for any individual to each subpopulation was roughly similar (data not shown). The Evanno criterion ΔK (Evanno et al. 2005) was not pertinent as it can only be computed for K≥2 and, thus, does not allow comparing the results of K=1 (no stratification) with the other cases. For K≥2, ΔK remained at values very close to 0 (Online resource 2B, D).

Selection of markers for association tests

We selected two subsets of 141 and 121 informative SNPs (pairwise r²<0.8) for the G0 and G1 samples, respectively (Online resource 3). In these subsets, ∼60% of the pairwise correlations were below 0.5, indicating a low to moderate level of LD. The allele frequency spectrum for both samples is shown in Fig. 1. Minor allele frequencies (MAFs) were strongly correlated in the G0 and G1 samples (r=0.93), and the 20 SNPs that were polymorphic in the G0 samples and not in the G1 samples corresponded to rare variants (frequency <10%, as shown in Fig. 1).

Statistical tests

In the G0 samples only, a single SNP CT_3782.445 was significantly associated with Height and Girth with low P values (<2×10⁻⁴), q values (<0.01) and P′ values (<0.04) in both the codominant and additive models (Fig. 2 and Online resources 4 and 5). The ‘T’ allele was associated with a decrease in Girth (Fig. 3) and Height (Fig. 4). The association with Girth was slightly stronger than with Height (for example, using the additive model, the P′ value was 0.001 for Girth and 0.01 for Height), but consistent with the strong correlation between these two traits (genetic and phenotypic correlation coefficients >0.8; see Lepoittevin et al. 2011). Broad-sense and narrow-sense heritabilities (corresponding to PVE) of CT_3782.445 for Girth were similar (∼10%), indicating an effect with an additive mode of action (Online resource 5). This in silico SNP showed a MAF of 28.3% and no significant departure from Hardy–Weinberg equilibrium in the sample. It was randomly chosen in the maritime pine EST database among SNPs showing high chances of genotyping success, but the selection was not based on its contig annotation (Lepoittevin et al. 2010). The consensus sequence of the CT_3782 group of ESTs has been annotated as a putative fasciclin-like arabinogalactan protein (FLA) by a tBLASTx search (Altschul et al. 1997) against the non-redundant GenBank database, with a high score (300) and very low E value (4e−126) with Arabidopsis thaliana FLA17 protein (accession number NM_120722). CT_3782.445 is the only SNP of the contig CT_3782 included in our study and is likely synonymous according to open reading frame predictions. The other 140 SNPs were not significantly associated with any trait using either the codominant or the additive model: the lowest q value and P′ value observed were above 0.5, indicating that associations based on P values at a 5% threshold were likely due to type I errors.
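The reported MAF can be checked against the genotype counts printed under the box plots of Figs. 3 and 4 (9 C/C, 72 T/C and 78 T/T). The short Python sketch below does the arithmetic, assuming those counts are the genotype counts from which the MAF was derived; the function name is hypothetical, and the Hardy–Weinberg expectations are given only for orientation and do not replace the exact test used in the study.

```python
def allele_frequencies(n_cc, n_tc, n_tt):
    """Return (frequency of C, frequency of T) from diploid genotype counts."""
    n_trees = n_cc + n_tc + n_tt
    freq_c = (2 * n_cc + n_tc) / (2 * n_trees)
    return freq_c, 1.0 - freq_c


# Genotype counts read from Figs. 3 and 4 (159 trees with a genotype call).
freq_c, freq_t = allele_frequencies(n_cc=9, n_tc=72, n_tt=78)
maf = min(freq_c, freq_t)          # 90 / 318, about 0.283

# Genotype counts expected under Hardy-Weinberg proportions, for orientation.
n_trees = 9 + 72 + 78
expected_counts = {
    "C/C": n_trees * freq_c ** 2,
    "T/C": n_trees * 2 * freq_c * freq_t,
    "T/T": n_trees * freq_t ** 2,
}
```

The minor allele is C, with 90 copies among 318 chromosomes, which matches the 28.3% given in the text.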

In the G1 sample, clear differences can be seen between the codominant model (Fig. 5) and the additive model (Fig. 6) for the ranges of the three test statistics values (for the table of test results across all SNPs, see also Online resource 5). We first observed that q values for Diameter, Cellulose, Galactose and Lignin in the codominant model did not give good estimates of the FDR, since many SNPs showing high and non-significant P values also showed very low and highly significant q values (Fig. 5). This was probably due to the P value distribution among SNPs for these traits, which did not reach the expected plateau (Storey and Tibshirani 2003) and thus did not allow for a good estimation of the proportion of null P values (as illustrated for Diameter in Online resource 6). Indeed, the q value approach has been developed originally for microarray studies where several thousands of tests are commonly performed to obtain a reasonable control of FDR (Yang et al. 2005); its application to our small dataset (only 121 tests in the G1 population) was not appropriate and, in this particular case, the P′ value was a much better correction for multiple testing. The SNP HDZ31.2268 was significantly associated with Cellulose in the additive model (P value=2.3×10⁻⁴ and P′ value=0.02, see also Online resource 5), but not in the codominant model where the P′ value was much higher (0.11). This marker is a synonymous SNP located in the HDZ31 transcription factor, a candidate gene for wood formation. Its MAF in the G1 samples was 36.8%, and its genotypic frequencies did not significantly depart from Hardy–Weinberg equilibrium. HDZ31.2268 was in complete LD with six other synonymous SNPs of HDZ31 that were not included in the association analysis (Online resource 3B). The other 120 informative SNPs were not significantly associated with any trait using either the codominant or the additive model at a 5% threshold for P′ values.

Fig. 1 Allele frequency spectrum for the 141 and 121 informative SNPs in the G0 and G1 samples, respectively (x-axis: minor allele frequency in classes ]0–0.1] to ]0.4–0.5]; y-axis: number of SNPs)

Fig. 2 P values, q values and P′ values for the SNP effect in the codominant model, for 141 informative SNPs in the G0 samples (list of SNPs and values of test statistics available in Online resource 5). Panels: −log(P value), −log(Q value) and −log(P′ value), with CT_3782.445 labelled in each panel

Fig. 3 Genotypic effect of SNP CT_3782.445 on Girth in the G0 sample (codominant model). The width of each box plot is proportional to the number of observations, which are indicated below the x-axis (C/C: 9, T/C: 72, T/T: 78). y-axis: BLUP for girth. R² = 0.114, P value = 4.5e−05, Q value = 0.0064, P′ value = 0.0160

Fig. 4 Genotypic effect of SNP CT_3782.445 on Height in the G0 sample (codominant model). The width of each box plot is proportional to the number of observations, which are indicated below the x-axis (C/C: 9, T/C: 72, T/T: 78). y-axis: BLUP for height. R² = 0.0995, P value = 0.00018, Q value = 0.025, P′ value = 0.035

Discussion

Population structure and familial relatedness

In the present study, no population structure was found in the maritime pine Aquitaine G0 and G1 samples using ∼100 SNP markers scattered along the genome. This result suggests that the mass selection performed in the Aquitaine forest, which is generally considered as an unstructured population both at the phenotypic (Danjon 1994) and molecular (Derory et al. 2002; Eveno et al. 2008; Mariette et al. 2001; Ribeiro et al. 2002) levels, did not induce any artificial stratification in the breeding population for this group of SNPs. Further studies are, however, necessary to confirm this result, as our markers only covered a very small part of the pine genome and finer substructure could exist at the phenotypic level.

Familial relatedness was accounted for in our association models, either in the first step of the two-stage analysis when estimating BLUPs for the G0 trees (Lepoittevin et al. 2011) or directly in the one-stage analysis of the G1 samples. The coancestry coefficient between individuals with unknown relationship was set to 0, thus assuming that the G0 trees were unrelated. Indeed, many arguments are in favour of low inbreeding in the Aquitaine population: maritime pine shows a high outcrossing rate and an extensive pollen flow (de-Lucas et al. 2008; Gaspar et al. 2009; González-Martínez et al. 2003), and probably a strong selection against inbreds, as reported for various pine species (reviewed in Ledig 1998). Recent studies proposed the use of marker-based kinship estimates to account for familial relatedness in association-mapping experiments, which proved useful when pedigree records were incomplete or inaccurate (Stich and Melchinger 2009; Stich et al. 2008; Yu et al. 2006). These methods will be tested in forthcoming association studies involving more SNPs or SSRs to ensure that the G0 trees can be considered unrelated at a wider genomic level.

Fig. 5 P values, q values and P′ values for the SNP effect in the codominant model, for 121 informative SNPs in the G1 samples (list of SNPs and values of test statistics available in Online resource 5). Panels: −log(P value), −log(Q value) and −log(P′ value), with HDZ31.2268 labelled in each panel

One-stage versus two-stage association-mapping approaches

To date, the two-stage association-mapping approach is the most commonly used in plants (Stich et al. 2008). It first involves the analysis of phenotypic data and the calculation of BLUPs or entry means for each individual of the population, followed by a second step in which these estimates are used for the association analysis. However, this two-stage procedure does not account for heterogeneity in experimental errors among the adjusted entry means. This problem can be overcome by applying a one-stage association approach in which the phenotypic and association analyses are performed together (Stich et al. 2008). This approach requires both phenotypes and genotypes for the same individuals, which we had for the 162 G1 clones. However, it could not be applied to the G0 samples, for which only mother trees were genotyped and half-sib progenies phenotyped. In Yu et al. (2006), the heterogeneity of experimental errors among entry means in the two-stage approach was partially accounted for by specifying a structure for the residual variance such as

$$\sigma^2_e = R \times V_R$$

where $R$ is a square matrix in which the off-diagonal elements are 0 and the diagonal elements are the reciprocals of the number of phenotypic observations underlying each entry mean, and $V_R$ is the residual variance. Stich et al. (2008) improved this method by replacing the diagonal elements of the $R$ matrix by the square of the standard errors of the adjusted entry means, which led to lower type I error rates. These ‘experiment-wise error’ corrections are rarely used in published association studies, probably because they increase the computation time and are not implemented in the software packages commonly used such as Tassel (Bradbury et al. 2007) or SNPassoc (Gonzalez et al. 2007). In our case, the number of observations for each G0 sample was either large (12 to 36 half-sibs for growth-related traits and stem straightness) or balanced (7 to 12 half-sibs for chemistry traits, with 93% of the samples represented by 8 to 11 half-sibs), resulting in low coefficients of variation for BLUP standard errors (<2.7% for all the traits except for Str, for which it was higher, ∼7.5%). Thus, we expect that applying such a correction here would probably not have changed the results much, considering the high homogeneity of BLUP standard errors. It would nevertheless be interesting to implement the correction method based on standard errors for upcoming association-mapping studies, since unbalanced designs are frequent in forest tree breeding. For the G1 samples, the number of replicates per clone was low (three to five) and BLUP standard errors were more variable, with coefficients of variation ranging from 6% to 12.5% for the different traits considered. The one-stage approach was thus more appropriate since it accounts directly for heterogeneity in experimental errors.

Fig. 6 P values, q values and P′ values for the SNP effect in the additive model, for 121 informative SNPs in the G1 samples (list of SNPs and values of test statistics available in Online resource 5). [Three panels with vertical axes −log(P-value), −log(Q-value) and −log(P′-value); SNP HDZ31.2268 is labelled in each panel.]
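The two residual-variance structures described in this section can be sketched in a few lines of Python. The example below is only a hedged illustration on invented numbers (the half-sib counts and BLUP standard errors are hypothetical, and the function names are ours). It builds the diagonal of R either from the number of observations, following the description of Yu et al. (2006), or from squared standard errors, following Stich et al. (2008), and computes the coefficient of variation used above to judge the homogeneity of standard errors.

```python
from statistics import mean, stdev


def r_diag_from_counts(n_obs):
    """Diagonal of R as reciprocals of the number of observations per entry mean."""
    return [1.0 / n for n in n_obs]


def r_diag_from_standard_errors(std_errors):
    """Diagonal of R as squared standard errors of the adjusted entry means."""
    return [se ** 2 for se in std_errors]


def precision_weights(r_diag):
    """Weights proportional to 1 / R_ii, rescaled so that they average 1."""
    raw = [1.0 / r for r in r_diag]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


# Hypothetical data for three mother trees.
n_half_sibs = [12, 24, 36]
blup_se = [0.50, 0.52, 0.51]

w_counts = precision_weights(r_diag_from_counts(n_half_sibs))
w_se = precision_weights(r_diag_from_standard_errors(blup_se))
cv = coefficient_of_variation(blup_se)

print("weights from counts:", w_counts)
print("weights from standard errors:", w_se)
print("CV of standard errors (%):", 100 * cv)
```

When the standard errors are nearly homogeneous, as in this invented example, the weights derived from them stay close to 1, which is the situation in which the correction is expected to change little.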

Power, allele frequency and sample size

The power of an association test is the probability of detecting a ‘true’ effect. As power increases, the chances of making a type II error (i.e. failing to detect ‘true’ associations) decrease. In LD-mapping studies, power largely depends on the sample size, type I error threshold, allele frequency, marker heritability and levels of LD between markers and putative causative mutations (Ball 2005; Wang et al. 2005). Previous studies of statistical power for LD mapping using coalescence simulations showed that ∼500 individuals typed for ∼20 SNPs spaced throughout a candidate gene region are necessary for detecting a causative polymorphism of small effect (∼5% of phenotypic variation explained) and that greater power is achieved by increasing the sample size more than by increasing the number of polymorphisms (Long and Langley 1999). In the present study, we used two different samples of 160 unrelated G0 and 162 related G1 individuals, respectively, which is far less than the sample size recommended by Long and Langley (1999). We, however, included 5,080 and 958 phenotypic measures for growth and wood chemistry traits on the G0 sample, respectively, and 768 phenotypic measures for the G1 sample. This accuracy in phenotype prediction might have given us enough power to detect two associations that remained significant after correction for multiple testing. Significant associations have similarly been reported for small samples, in Eucalyptus for wood microfibril angle (N=290 in Thumma et al. 2005) and in Populus for timing of bud set (N=120 in Ingvarsson et al. 2008) or growth cessation (N=120 in Ma et al. 2010). In our case, the MAFs for the two SNPs significantly associated with either growth or cellulose content were rather high (28.3% and 36.8%, respectively), which is not surprising. Indeed, variants that contribute to complex traits are likely to have only modest phenotypic effects, and rare variants with modest effects are difficult to detect by any method because they explain only a very small fraction of the phenotypic variance in the population (Hirschhorn and Daly 2005). To illustrate this point, we simulated the power to detect a significant association at a 10⁻⁵ alpha threshold for an SNP with a 5% heritability and with a sample size of 1,000: it is near 0 for MAFs below 25%, but rapidly increases for larger MAFs (assuming Hardy–Weinberg equilibrium for genotypic frequencies, see Online resource 7A). With a sample size 6 times smaller (160 individuals, as in the present study), the predicted power is null whatever the MAF for such small SNP effects (Online resource 7B). Even at a low 10⁻⁵ alpha threshold, the power in our study was very low (around 10⁻³ for the SNP CT_3782.445 for example; see Fig. 7), which probably explains the lack of repeatability (i.e. the significant association detected in the G0 sample was not significant in the G1 sample, and vice versa; Ball 2005; Long and Langley 1999).

The most obvious limitations of association-mapping studies are the high genotyping and phenotyping costs, which hamper the use of large sample sizes crucial to detect more robust and repeatable associations. To minimise the amount of genotyping required in association studies without sacrificing power, Hirschhorn and Daly (2005) proposed a multistage strategy: First, a large number of SNPs is genotyped in a small sample, allowing for the detection of a subset of SNPs with putative associations at a high nominal P value threshold. In the following stages, these SNPs are re-tested in similar or larger samples to distinguish the more likely true-positive associations from the many false-positive results. This strategy was successfully used to detect variants associated with autism in humans in a design involving ∼500,000 SNPs and over 1,000 families in the first step (Weiss and Arking 2009). In our study, only ∼10 SNPs showed significant associations at a very liberal 1% alpha threshold, and re-testing them in samples of a similar size would be pointless as our design lacks repeatability. Since the cost of field trials and phenotyping is certainly higher than that of genotyping for forest trees, we rather propose to extend the sample size by including more G0 and G1 trees, ideally taking advantage of the many field trials available through the breeding program. This approach, however, requires that control families, clones or related individuals are installed in each trial to allow environmental effects to be estimated in a combined analysis of different experiments (Williams et al. 2011). Finally, if genotyping capacity is limited, SNPs can be selected on the basis of their MAFs, as this criterion is directly linked to the power of detection.

Fig. 7 Power of association tests for the G0 sample as a function of SNP heritability at a 10⁻⁵ alpha threshold. Each curve represents 1 of the 141 SNPs of the study, with its particular genotypic frequencies. Colours correspond to SNP minor allele frequencies in the G0 sample. Power was calculated using the ldDesign package (Ball 2004), assuming that the SNPs considered were the causative mutations. The power of the association test for the SNP CT_3782.445, which is significantly associated with growth traits with 11% heritability, is represented by a cross and is very low (∼10⁻³). [Axes: SNP heritability (0.0 to 0.5) against power (0.0 to 1.0); legend: MAF classes 0–10%, 10–20%, 20–30%, 30–40% and 40–50%.]
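The link between allele frequency and the fraction of variance a SNP can explain follows from a standard quantitative-genetic result (Falconer and Mackay 1996): under Hardy–Weinberg equilibrium and a purely additive model, a biallelic locus with minor allele frequency p and allelic effect a contributes an additive variance of 2p(1 − p)a². The Python sketch below evaluates this relationship for illustrative values. The effect size of 0.3 phenotypic standard deviations is an arbitrary assumption, and the code is not the ldDesign power calculation used for Fig. 7.

```python
from math import sqrt


def snp_heritability(maf, allelic_effect, phenotypic_variance=1.0):
    """Fraction of phenotypic variance explained by an additive biallelic SNP.

    Assumes Hardy-Weinberg genotype frequencies and no dominance, so the
    additive variance is 2 * p * (1 - p) * a**2.
    """
    additive_variance = 2.0 * maf * (1.0 - maf) * allelic_effect ** 2
    return additive_variance / phenotypic_variance


def effect_needed(maf, heritability, phenotypic_variance=1.0):
    """Allelic effect required for a SNP to reach a given heritability."""
    return sqrt(heritability * phenotypic_variance / (2.0 * maf * (1.0 - maf)))


# Same hypothetical allelic effect (0.3 phenotypic SD) at increasing MAFs,
# including the two MAFs reported in the text (28.3% and 36.8%).
for maf in (0.01, 0.05, 0.10, 0.283, 0.368, 0.50):
    h2 = snp_heritability(maf, allelic_effect=0.3)
    print(f"MAF {maf:.3f}: SNP heritability {h2:.4f}")

# Effect required to explain 5% of the phenotypic variance at two MAFs.
for maf in (0.05, 0.50):
    print(f"MAF {maf:.2f}: effect needed {effect_needed(maf, 0.05):.3f} SD")
```

For a fixed allelic effect, the variance explained rises steadily with MAF up to 0.5, which is why selecting SNPs on MAF is a reasonable way to spend a limited genotyping budget.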

Significant associations: which genes, which traits?

The significant genotype–phenotype association with growth involved one SNP located in contig CT_3782, which was annotated as a putative FLA. FLAs are a subclass of arabinogalactan proteins (AGPs) that have, in addition to predicted AGP-like glycosylated regions, putative cell adhesion domains known as fasciclin domains (Johnson et al. 2003). AGPs have been found in all organs and tissues of higher plants, since they are components of the plasma membrane, the extracellular matrix and the cell wall (Liu et al. 2008). They are involved in many cellular processes such as cell proliferation, cell expansion and differentiation, cell–cell recognition or programmed cell death (Zhang et al. 2003). Although the specific functions of AGPs remain uncertain, their involvement in wood formation has been hypothesised through numerous expression studies, for example in poplar (Lafarguette et al. 2004), loblolly pine (No and Loopstra 2000; Whetten et al. 2001; Zhang et al. 2000), radiata pine (Li et al. 2009) or maritime pine (Gion et al. 2005; Paiva et al. 2008). Immunolocalisation studies also showed that AGPs are expressed during pattern formation in vascular tissues, giving stronger evidence for their role in xylem development (Casero et al. 1998; Zhang et al. 2003). Given the arguments above, the association of CT_3782 with growth is plausible. In upcoming studies, new markers could be developed around CT_3782.445 to assess the extent of LD in this region. The same association was not significant for Height in the G1 sample (P value=0.36 for the codominant model and P value=0.83 for the additive model), which calls for further testing of the robustness of this potential association in other material.

Another significant association was detected between SNP HDZ31.2268 and cellulose content in the G1 population. HDZ31 encodes a class III homeodomain-leucine zipper (HD-ZIPIII) transcription factor, a class of genes that is unique to plants (Prigge and Clark 2006; Sessa et al. 1993) and involved in tissue patterning and polarity (reviewed in Demura and Fukuda 2007). Indeed, gain-of-function mutations in A. thaliana HD-ZIPIII genes resulted in adaxialised lateral organs and amphivasal (xylem surrounding phloem) vascular bundles (Emery et al. 2003; Juarez et al. 2004; McConnell et al. 2001; Zhong and Ye 2004). In addition, the functions of HD-ZIPIII genes in wood formation have been demonstrated by reverse genetic approaches in other plant species: in Zinnia elegans, four HD-ZIPIII genes have been found differentially expressed in vascular tissues (procambium, immature xylem and/or xylem parenchyma cells; Ohashi-Ito and Fukuda 2003), and mutant approaches suggested that these genes regulate xylem cell differentiation (Ohashi-Ito et al. 2005). In Populus, expression studies showed that the PtaHB1 gene is closely associated with secondary growth and inversely correlated with the level of microRNA miR166 (Ko et al. 2006). The function of HDZ31 in P. pinaster remains unknown, but its role in wood formation and thus its association with wood cellulose content is credible. In the Aquitaine natural population, it showed a low level of nucleotide diversity and a high level of LD across more than 3.2 kbp, which was also detected in the G0 and G1 samples (Online resource 3). Such a high level of LD was not commonly found in previous candidate-gene re-sequencing studies in conifer species (Brown et al. 2004; Eveno et al. 2008; González-Martínez et al. 2006; Heuertz et al. 2006; Pyhäjärvi et al. 2007; Savolainen and Pyhäjärvi 2007). However, more recent literature investigating LD patterns among different full-length genes revealed cases where LD levels were much higher at regulatory genes in Picea species (Namroud et al. 2010) and at allozyme-coding genes in Pinus sylvestris (Pyhäjärvi et al. 2011). Interestingly, an HD-ZIPIII homolog in Namroud et al. (2010) maintained the highest level of LD in the three species across the five genes studied, and this was interpreted as a possible signature of selection. In the natural population where the G0 were sampled, Tajima’s D (Tajima 1989) on HDZ31 was positive (D=2.27) and departed significantly from the standard neutral model (data not shown). However, further simulations of past demographic scenarios would be needed to explore whether this pattern could be explained by demographic and/or selective events.
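For readers unfamiliar with the statistic, Tajima’s D compares two estimators of the population mutation parameter, one based on the mean number of pairwise differences and one based on the number of segregating sites, and scales their difference by its expected standard deviation (Tajima 1989). The Python sketch below implements the published formula. The input values are purely hypothetical and are not the HDZ31 data, which are not given in this paper.

```python
from math import sqrt


def tajima_d(n_sequences, n_segregating, pi):
    """Tajima's D from the sample size, the number of segregating sites and
    the mean number of pairwise differences (pi, per locus, not per site)."""
    n, s = n_sequences, n_segregating
    if n < 4 or s < 1:
        # With fewer than four sequences the variance term is zero.
        raise ValueError("need at least 4 sequences and 1 segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two estimators, divided by its standard deviation.
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical sample: 10 sequences, 10 segregating sites.
d_excess_intermediate = tajima_d(10, 10, pi=5.0)  # pi above S/a1 gives D > 0
d_excess_rare = tajima_d(10, 10, pi=2.0)          # pi below S/a1 gives D < 0
print(d_excess_intermediate, d_excess_rare)
```

A positive D, as reported for HDZ31, reflects an excess of intermediate-frequency variants relative to neutral expectations, a pattern that both balancing selection and demographic events such as population contraction can produce.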

The high level of LD in our candidate gene prevents the detection of the precise location of mutations putatively associated with wood chemistry traits in HDZ31. It would be interesting first to obtain more sequence data in the vicinity of HDZ31 to explore any significant LD decay and second to check if similar LD patterns or associations are found in other maritime pine populations. We could not perform this comparison here since Cellulose was not recorded in the G0 population.

Conclusion and perspectives

We have shown that association mapping is a valuable tool to detect genotype–phenotype associations in the maritime pine breeding population and highlighted two genes significantly associated with growth and wood cellulose content, respectively. However, our power was low and increasing the sample size is required to detect repeatable associations. Correcting for heterogeneity of experimental errors, discarding low-MAF SNPs to reduce the genotyping cost and enriching the CT_3782 and HDZ31 regions with new markers have also been suggested for upcoming association studies. We showed that correcting for multiple testing through P′ values was, in some cases, more appropriate than using the positive FDR method, at the cost of longer computation time. Instead of correcting for multiple testing, Bayesian methods could provide an interesting alternative, as they do not depend on the number of tests performed (Stephens and Balding 2009). Such methods should, however, be applied with caution, since they require additional modelling assumptions such as the choice of prior distributions for the effect sizes at SNPs showing true associations. These methods nevertheless may give a well-defined measure of strength of evidence that could be more easily compared across studies, independent of the experimental design or sample size used (Ball 2005).

Acknowledgements This research was supported by grants from Agence Nationale de la Recherche: Genoplante (GenoQB, GNP05013C), Agence Nationale de la Recherche ‘Plates-Formes Technologiques du Vivant’ (BOOST-SNP, 07PFTV002), the Aquitaine Region (20061201004PFM), the European Union (NOVELTREE project, FP7-211868) and the EVOLTREE Network of Excellence (contract number 016322). C. Lepoittevin was supported by a CIFRE contract between FCBA and INRA. The authors declare that they have no conflict of interest.

References

Abecasis GR, Ghosh D, Nichols TE (2005) Linkage disequilibrium: ancient history drives the new genetics. Hum Hered 59(2):118–124. doi:10.1159/000085226

Alazard P, Canteloup D, Crémière L, Daubet A, Lesgourgues T, Merzeau D, Pastuszka P, Raffin A (2005) Genetic breeding of the maritime pine in Aquitaine: an exemplary success story. Groupe Pin Maritime du Futur, Cestas

Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25(17):3389–3402. doi:10.1093/nar/25.17.3389

Ball RD (2004) ldDesign: design of experiments for detection of linkage disequilibrium. Available at http://cran.r-project.org/web/packages/ldDesign/index.html

Ball RD (2005) Experimental designs for reliable detection of linkage disequilibrium in unstructured random population association studies. Genetics 170(2):859–873. doi:10.1534/genetics.103.024752

Beaulieu J, Doerksen T, Boyle B, Clément S, Deslauriers M, Beauseigle S, Blais S, Poulin PL, Lenz P, Caron S (2011) Association genetics of wood physical traits in the conifer white spruce and relationships with gene expression. Genetics 188(1):197. doi:10.1534/genetics.110.125781

Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES (2007) TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics 23(19):2633–2635. doi:10.1093/bioinformatics/btm308

Brown GR, Gill GP, Kuntz RJ, Langley CH, Neale DB (2004) Nucleotide diversity and linkage disequilibrium in loblolly pine. Proc Natl Acad Sci USA 101(42):15255–15260. doi:10.1073/pnas.0404231101

Bucci G, González-Martínez SC, Le Provost G, Plomion C, Ribeiro MM, Sebastiani F, Alia R, Vendramin GG (2007) Range-wide phylogeography and gene zones in Pinus pinaster Ait. revealed by chloroplast microsatellite markers. Mol Ecol 16(10):2137–2153. doi:10.1111/j.1365-294X.2007.03275.x

Burban C, Petit RJ (2003) Phylogeography of maritime pine inferred with organelle markers having contrasted inheritance. Mol Ecol 12(6):1487–1495. doi:10.1046/j.1365-294X.2003.01817.x

Canty A, Ripley B (2009) boot: Bootstrap R (S-Plus) functions. Available at http://cran.r-project.org/web/packages/boot/

Cardon LR, Bell JI (2001) Association study designs for complex diseases. Nat Rev Genet 2(2):91–99. doi:10.1038/35052543

Casero PJ, Casimiro I, Knox JP (1998) Occurrence of cell surface arabinogalactan-protein and extensin epitopes in relation to pericycle and vascular tissue development in the root apex of four species. Planta 204(2):252–259. doi:10.1007/s004250050254

Charcosset A, Gallais A (1996) Estimation of the contribution of quantitative trait loci (QTL) to the variance of a quantitative trait by means of genetic markers. Theor Appl Genet 93(8):1193–1201. doi:10.1007/BF00223450

Cumbie WP, Eckert A, Wegrzyn J, Whetten R, Neale D, Goldfarb B (2011) Association genetics of carbon isotope discrimination, height and foliar nitrogen in a natural population of Pinus taeda L. Heredity. doi:10.1038/hdy.2010.168

Dabney A, Storey JD, Warnes GR (2009) qvalue: Q-value estimation for false discovery rate control. Available at http://CRAN.R-project.org/package=qvalue

Danjon F (1994) Stand features and height growth in a 36-year-old maritime pine (Pinus pinaster Ait) provenance test. Silvae Genet 43(1):52–62

de-Lucas AI, Robledo-Arnuncio JJ, Hidalgo E, González-Martínez SC (2008) Mating system and pollen gene flow in Mediterranean maritime pine. Heredity 100(4):390–399. doi:10.1038/sj.hdy.6801090

Demura T, Fukuda H (2007) Transcriptional regulation in wood formation. Trends Plant Sci 12(2):64–70. doi:10.1016/j.tplants.2006.12.006

Derory J, Mariette S, Gonzalez-Martinez SC, Chagne D, Madur D, Gerber S, Brach J, Persyn F, Ribeiro MM, Plomion C (2002) What can nuclear microsatellites tell us about maritime pine genetic resources conservation and provenance certification strategies? Ann For Sci 59(5–6):699–708. doi:10.1051/forest:2002058


Dillon SK, Nolan M, Li W, Bell C, Wu HX, Southerton SG (2010) Allelic variation in cell wall candidate genes affecting solid wood properties in natural populations and land races of Pinus radiata. Genetics 185(4):1477. doi:10.1534/genetics.110.116582

Eckert AJ, Bower AD, Wegrzyn JL, Pande B, Jermstad KD, Krutovsky KV, St Clair JB, Neale DB (2009) Association genetics of coastal Douglas fir (Pseudotsuga menziesii var. menziesii, Pinaceae). I. Cold-hardiness related traits. Genetics 182(4):1289. doi:10.1534/genetics.109.102350

Emery JF, Floyd SK, Alvarez J, Eshed Y, Hawker NP, Izhaki A, Baum SF, Bowman JL (2003) Radial patterning of Arabidopsis shoots by class III HD-ZIP and KANADI genes. Curr Biol 13(20):1768–1774. doi:10.1016/j.cub.2003.09.035

Evanno G, Regnaut S, Goudet J (2005) Detecting the number of clusters of individuals using the software structure: a simulation study. Mol Ecol 14(8):2611–2620. doi:10.1111/j.1365-294X.2005.02553.x

Eveno E, Collada C, Guevara MA, Leger V, Soto A, Diaz L, Leger P, Gonzalez-Martinez SC, Cervera MT, Plomion C, Garnier-Gere PH (2008) Contrasting patterns of selection at Pinus pinaster Ait. drought stress candidate genes as revealed by genetic differentiation analyses. Mol Biol Evol 25(2):417–437. doi:10.1093/molbev/msm272

Falconer DS, Mackay TFC (1996) Introduction to quantitative genetics. Longman, New York

Falush D, Stephens M, Pritchard JK (2003) Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics 164(4):1567–1587

Falush D, Stephens M, Pritchard JK (2007) Inference of population structure using multilocus genotype data: dominant markers and null alleles. Mol Ecol Notes 7(4):574. doi:10.1111/j.1471-8286.2007.01758.x

Flint-Garcia SA, Thornsberry JM, Buckler ES (2003) Structure of linkage disequilibrium in plants. Annu Rev Plant Biol 54:357–374. doi:10.1146/annurev.arplant.54.031902.134907

Fournier-Level A, Le Cunff L, Gomez C, Doligez A, Ageorges A, Roux C, Bertrand Y, Souquet JM, Cheynier V, This P (2009) Quantitative genetic bases of anthocyanin variation in grape (Vitis vinifera L. ssp sativa) berry: a QTL to QTN integrated study. Genetics 183:1127–1139. doi:10.1534/genetics.109.103929

Garnier-Gere P (1992) Contribution à l’étude de la variabilité génétique inter et intra-population chez le maïs (Zea mays L.): valorisation d’informations agromorphologiques et enzymatiques. Institut National Agronomique Paris-Grignon, Paris-Grignon

Gaspar MJ, de-Lucas AI, Alía R, Almiro Pinto Paiva J, Hidalgo E, Louzada J, Almeida H, González-Martínez SC (2009) Use of molecular markers for estimating breeding parameters: a case study in a Pinus pinaster Ait. progeny trial. Tree Genet Genom 5:609–616. doi:10.1007/s11295-009-0213-1

Gilmour AR, Gogel BJ, Cullis BR, Thompson R (2006) ASReml user guide release 2.0. VSN International Ltd., Hemel Hempstead

Gilmour AR, Thompson R, Cullis BR (1995) Average information REML: an efficient algorithm for variance parameter estimation in linear mixed models. Biometrics 51(4):1440–1450

Gion JM, Lalanne C, Le Provost G, Ferry-Dumazet H, Paiva J, Chaumeil P, Frigerio JM, Brach J, Barre A, de Daruvar A, Claverol S, Bonneu M, Sommerer N, Negroni L, Plomion C (2005) The proteome of maritime pine wood forming tissue. Proteomics 5(14):3731–3751. doi:10.1002/pmic.200401197

González-Martínez SC, Alia R, Gil L (2002) Population genetic structure in a Mediterranean pine (Pinus pinaster Ait.): a comparison of allozyme markers and quantitative traits. Heredity 89:199–206. doi:10.1038/sj.hdy.6800114

González-Martínez SC, Ersoz E, Brown GR, Wheeler NC, Neale DB (2006) DNA sequence variation and selection of tag single-nucleotide polymorphisms at candidate genes for drought-stress response in Pinus taeda L. Genetics 172(3):1915–1926. doi:10.1534/genetics.105.047126

González-Martínez SC, Gerber S, Cervera MT, Martínez-Zapater JM, Alía R, Gil L (2003) Selfing and sibship structure in a two-cohort stand of maritime pine (Pinus pinaster Ait.) using nuclear SSR markers. Ann For Sci 60(2):115–121. doi:10.1051/forest:2003003

González-Martínez SC, Huber D, Ersoz E, Davis JM, Neale DB (2008) Association genetics in Pinus taeda L. II. Carbon isotope discrimination. Heredity 101(1):19–26. doi:10.1038/hdy.2008.21

González-Martínez SC, Wheeler NC, Ersoz E, Nelson CD, Neale DB (2007) Association genetics in Pinus taeda L. I. Wood property traits. Genetics 175(1):399–409

Gonzalez JR, Armengol L, Sole X, Guino E, Mercader JM, Estivill X, Moreno V (2007) SNPassoc: an R package to perform whole genome association studies. Bioinformatics 23(5):654–655. doi:10.1093/bioinformatics/btm025

Gupta PK, Rustgi S, Kulwal PL (2005) Linkage disequilibrium and association studies in higher plants: present status and future prospects. Plant Mol Biol 57(4):461–485. doi:10.1007/s11103-005-0257-z

Henderson CR (1975) Best linear unbiased estimation and prediction under a selection model. Biometrics 31(2):423–447

Heuertz M, De Paoli E, Kallman T, Larsson H, Jurman I, Morgante M, Lascoux M, Gyllenstrand N (2006) Multilocus patterns of nucleotide diversity, linkage disequilibrium and demographic history of Norway spruce [Picea abies (L.) Karst]. Genetics 174(4):2095–2105. doi:10.1534/genetics.106.065102

Hirschhorn JN, Daly MJ (2005) Genome-wide association studies for common diseases and complex traits. Nat Rev Genet 6(2):95–108. doi:10.1038/nrg1521

Holliday JA, Ritland K, Aitken SN (2010) Widespread, ecologically relevant genetic markers developed from association mapping of climate related traits in Sitka spruce (Picea sitchensis). New Phytol. doi:10.1111/j.1469-8137.2010.03380.x

Hubisz MJ, Falush D, Stephens M, Pritchard JK (2009) Inferring weak population structure with the assistance of sample group information. Mol Ecol Resour 9(5):1322–1332. doi:10.1111/j.1755-0998.2009.02591.x

Ingvarsson PK, Garcia MV, Luquez V, Hall D, Jansson S (2008) Nucleotide polymorphism and phenotypic associations within and around the phytochrome B2 locus in European aspen (Populus tremula, Salicaceae). Genetics 178(4):2217–2226. doi:10.1534/genetics.107.082354

Johnson KL, Jones BJ, Bacic A, Schultz CJ (2003) The fasciclin-like arabinogalactan proteins of Arabidopsis. A multigene family of putative cell adhesion molecules. Plant Physiol 133(4):1911–1925. doi:10.1104/pp.103.031237

Juarez MT, Kui JS, Thomas J, Heller BA, Timmermans MCP (2004) microRNA-mediated repression of rolled leaf1 specifies maize leaf polarity. Nature 428(6978):84–88. doi:10.1038/nature02363

Kang HM, Zaitlen NA, Wade CM, Kirby A, Heckerman D, Daly MJ, Eskin E (2008) Efficient control of population structure in model organism association mapping. Genetics 178(3):1709–1723. doi:10.1534/genetics.107.080101

Ko JH, Prassinos C, Han KH (2006) Developmental and seasonal expression of PtaHB1, a Populus gene encoding a class III HD-Zip protein, is closely associated with secondary growth and inversely correlated with the level of microRNA (miR166). New Phytol 169(3):469–478. doi:10.1111/j.1469-8137.2005.01623.x

Lafarguette F, Leple J-C, Dejardin A, Laurans F, Costa G, Lesage-Descauses M-C, Pilate G (2004) Poplar genes encoding fasciclin-like arabinogalactan proteins are highly expressed in tension wood. New Phytol 164(1):107–121. doi:10.1111/j.1469-8137.2004.01175.x


Ledig FT (1998) Genetic variation in Pinus. In: Richardson DM (ed)Ecology and biogeography of Pinus. Cambridge UniversityPress, Cambridge, pp 251–280

Lepoittevin C, Frigerio J-M, Garnier-Géré P, Salin F, Cervera MT,Vornam B, Harvengt L, Plomion C (2010) In vitro vs in silicodetected SNPs for the development of a genotyping array: whatcan we learn from a non-model species? PLoS ONE 5(6):e11034.doi:10.1371/journal.pone.0011034

Lepoittevin C, Rousseau J-P, Guillemin A, Gauvrit C, Besson F,Hubert F, Da Silva Perez D, Harvengt L, Plomion C (2011)Genetic parameters of growth, straightness and wood-chemistrytraits in Pinus pinaster. Ann For Sci 68:873–884. doi:10.1007/s13595-011-0084-0

Li X, Wu H, Dillon S, Southerton S (2009) Generation and analysis ofexpressed sequence tags from six developing xylem libraries inPinus radiata D. Don. BMC Genomics 10(1):41. doi:10.1186/1471-2164-10-41

Liu D, Tu L, Li Y, Wang L, Zhu L, Zhang X (2008) Genes encodingfasciclin-like arabinogalactan proteins are specifically expressedduring cotton fiber development. Plant Mol Biol Rep 26(2):98–113. doi:10.1007/s11105-008-0026-7

Long AD, Langley CH (1999) The power of association studies todetect the contribution of candidate genetic loci to variation incomplex traits. Genome Res 9(8):720–731. doi:10.1101/gr.9.8.720

Ma XF, Hall D, Onge KR, Jansson S, Ingvarsson PK (2010) Geneticdifferentiation, clinal variation and phenotypic associations withgrowth cessation across the Populus tremula photoperiodic path-way. Genetics 186:1033–1044. doi:10.1534/genetics.110.120873

Malosetti M, van der Linden CG, Vosman B, van Eeuwijk FA (2007)A mixed-model approach to association mapping using pedigreeinformation with an illustration of resistance to Phytophthorainfestans in potato. Genetics 175(2):879–889. doi:10.1534/genetics.105.054932

Marchini J, Cardon LR, Phillips MS, Donnelly P (2004) The effects ofhuman population structure on large genetic association studies.Nat Genet 36(5):512–517. doi:10.1038/ng1337

Mariette S, Chagne D, Lezier C, Pastuszka P, Baffin A, Plomion C,Kremer A (2001) Genetic diversity within and among Pinuspinaster populations: comparison between AFLP and micro-satellite markers. Heredity 86:469–479. doi:10.1046/j.1365-2540.2001.00852.x

McConnell JR, Emery J, Eshed Y, Bao N, Bowman J, Barton MK(2001) Role of PHABULOSA and PHAVOLUTA in determiningradial patterning in shoots. Nature 411(6838):709–713.doi:10.1038/35079635

Namroud M-C, Guillet-Claude C, Mackay J, Isabel N, Bousquet J(2010) Molecular evolution of regulatory genes in spruces fromdifferent species and continents: heterogeneous patterns oflinkage disequilibrium and selection but correlated recentdemographic changes. J Mol Evol 70(4):371–386. doi:10.1007/s00239-010-9335-1

Neale DB, Savolainen O (2004) Association genetics of complex traitsin conifers. Trends Plant Sci 9(7):325–330. doi:10.1016/j.tplants.2004.05.006

No EG, Loopstra CA (2000) Hormonal and developmental regulationof two arabinogalactan-proteins in xylem of loblolly pine (Pinustaeda). Physiol Plant 110(4):524–529. doi:10.1111/j.1399-3054.2000.1100415.x

Ohashi-Ito K, Fukuda H (2003) HD-Zip III homeobox genes thatinclude a novel member, ZeHB-13 (Zinnia)/ATHB-15 (Arabi-dopsis), are involved in procambium and xylem cell differenti-ation. Plant Cell Physiol 44(12):1350–1358. doi:10.1093/pcp/pcg164

Ohashi-Ito K, Kubo M, Demura T, Fukuda H (2005) Class IIIhomeodomain leucine-zipper proteins regulate xylem cell differ-

entiation. Plant Cell Physiol 46(10):1646–1656. doi:10.1093/pcp/pci180

Paiva JAP, Garnier-Gere PH, Rodrigues JC, Alves A, Santos S, GracaJ, Le Provost G, Chaumeil P, Da Silva-Perez D, Bosc A (2008)Plasticity of maritime pine (Pinus pinaster) wood-forming tissuesduring a growing season. New Phytol 179(4):1180–1194.doi:10.1111/j.1469-8137.2008.02536.x

Pot D, McMillan L, Echt C, Le Provost G, Garnier-Gere P, Cato S,Plomion C (2005) Nucleotide variation in genes involved inwood formation in two pine species. New Phytol 167(1):101–112. doi:10.1111/j.1469-8137.2005.01417.x

Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA,Reich D (2006) Principal components analysis corrects forstratification in genome-wide association studies. Nat Genet 38(8):904–909. doi:10.1038/ng1847

Price AL, Zaitlen NA, Reich D, Patterson N (2010) New approachesto population stratification in genome-wide association studies.Nat Rev Genet 11(7):459–463. doi:10.1038/nrg2813

Prigge MJ, Clark SE (2006) Evolution of the class III HD-Zip genefamily in land plants. Evol Dev 8(4):350–361. doi:10.1111/j.1525-142X.2006.00107.x

Pritchard JK, Stephens M, Donnelly P (2000a) Inference of populationstructure using multilocus genotype data. Genetics 155(2):945–959

Pritchard JK, Stephens M, Rosenberg NA, Donnelly P (2000b)Association mapping in structured populations. Am J HumGenet 67(1):170–181. doi:10.1086/302959

Pritchard JK, Wen X, Falush D (2007) Documentation for structuresoftware: version 2.2. Available at http://pritch.bsd.uchicago.edu/software

Pyhäjärvi T, Garcia-Gil MR, Knurr T, Mikkonen M, Wachowiak W,Savolainen O (2007) Demographic history has influencednucleotide diversity in European Pinus sylvestris populations.Genetics 177(3):1713–1724. doi:10.1534/genetics.107.077099

Pyhäjärvi T, Kujala S, Savolainen O (2011) Revisiting proteinheterozygosity in plants—nucleotide diversity in allozymecoding genes of conifer Pinus sylvestris. Tree Genet Genom7:385–397. doi:10.1007/s11295-010-0340-8

Quesada T, Gopal V, Cumbie WP, Eckert AJ, Wegrzyn JL, Neale DB,Goldfarb B, Huber DA, Casella G, Davis JM (2010) Associationmapping of quantitative disease resistance in a natural populationof loblolly pine (Pinus taeda L.). Genetics 186(2):677.doi:10.1534/genetics.110.117549

R_Development_Core_Team (2009) R: a language and environmentfor statistical computing. Available at http://www.R-project.org

Raymond M, Rousset F (1995) Genepop (version-1.2)—population-genetics software for exact tests and ecumenicism. J Hered 86(3):248–249

Ribeiro MM, Mariette S, Vendramin GG, Szmidt AE, Plomion C, Kremer A (2002) Comparison of genetic diversity estimates within and among populations of maritime pine using chloroplast simple-sequence repeat and amplified fragment length polymorphism data. Mol Ecol 11(5):869–877. doi:10.1046/j.1365-294X.2002.01490.x

Rinaldo A, Bacanu SA, Devlin B, Sonpar V, Wasserman L, Roeder K (2005) Characterization of multilocus linkage disequilibrium. Genet Epidemiol 28(3):193–206. doi:10.1002/gepi.20056

Savolainen O, Pyhäjärvi T (2007) Genomic diversity in forest trees. Curr Opin Plant Biol 10(2):162–167. doi:10.1016/j.pbi.2007.01.011

Sessa G, Morelli G, Ruberti I (1993) The Athb-1 and Athb-2 Hd-Zip domains homodimerize forming complexes of different DNA-binding specificities. EMBO J 12(9):3507–3517

Sillanpää MJ (2011) Overview of techniques to account for confounding due to population stratification and cryptic relatedness in genomic data association analyses. Heredity 106:511–519. doi:10.1038/hdy.2010.91


Stephens M, Balding DJ (2009) Bayesian statistical methods for genetic association studies. Nat Rev Genet 10(10):681–690. doi:10.1038/nrg2615

Stich B, Melchinger AE (2009) Comparison of mixed-model approaches for association mapping in rapeseed, potato, sugar beet, maize, and Arabidopsis. BMC Genomics 10(1):94. doi:10.1186/1471-2164-10-94

Stich B, Mohring J, Piepho HP, Heckenberger M, Buckler ES, Melchinger AE (2008) Comparison of mixed-model approaches for association mapping. Genetics 178(3):1745. doi:10.1534/genetics.107.079707

Storey JD (2002) A direct approach to false discovery rates. J Roy Stat Soc Ser B (Stat Method) 64:479–498

Storey JD, Tibshirani R (2003) Statistical significance for genomewide studies. Proc Natl Acad Sci USA 100(16):9440–9445. doi:10.1073/pnas.1530509100

Tajima F (1989) Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585–595

Thumma BR, Matheson BA, Zhang D, Meeske C, Meder R, Downes GM, Southerton SG (2009) Identification of a cis-acting regulatory polymorphism in a eucalypt Cobra-like gene affecting cellulose content. Genetics 183:1153–1164. doi:10.1534/genetics.109.106591

Thumma BR, Nolan MR, Evans R, Moran GF (2005) Polymorphisms in cinnamoyl CoA reductase (CCR) are associated with variation in microfibril angle in Eucalyptus spp. Genetics 171(3):1257–1265. doi:10.1534/genetics.105.042028

Voight BF, Pritchard JK (2005) Confounding from cryptic relatedness in case–control association studies. PLoS Genet 1(3):302–311. doi:10.1371/journal.pgen.0010032

Wang WYS, Barratt BJ, Clayton DG, Todd JA (2005) Genome-wide association studies: theoretical and practical concerns. Nat Rev Genet 6(2):109–118. doi:10.1038/nrg1522

Weiss LA, Arking DE (2009) A genome-wide linkage and association scan reveals novel loci for autism. Nature 461(7265):802–808. doi:10.1038/nature08490

Whetten R, Sun Y-H, Zhang Y, Sederoff R (2001) Functional genomics and cell wall biosynthesis in loblolly pine. Plant Mol Biol 47(1):275–291. doi:10.1023/A:1010652003395

Williams E, Piepho HP, Whitaker D (2011) Augmented p-rep designs. Biom J 53(1):19–27. doi:10.1002/bimj.201000102

Wu B, Liu N, Zhao H (2006) PSMIX: an R package for population structure inference via maximum likelihood method. BMC Bioinforma 7(1):317. doi:10.1186/1471-2105-7-317

Xiong M, Guo S-W (1997) Fine-scale genetic mapping based on linkage disequilibrium: theory and applications. Am J Hum Genet 60(6):1513–1531. doi:10.1086/515475

Yang Q, Cui J, Chazaro I, Cupples LA, Demissie S (2005) Power and type I error rate of false discovery rate approaches in genome-wide association studies. BMC Genet 6(Suppl 1):S134

Yu JM, Pressoir G, Briggs WH, Bi IV, Yamasaki M, Doebley JF, McMullen MD, Gaut BS, Nielsen DM, Holland JB, Kresovich S, Buckler ES (2006) A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet 38(2):203–208. doi:10.1038/ng1702

Zhang Y, Brown G, Whetten R, Loopstra CA, Neale D, Kieliszewski MJ, Sederoff RR (2003) An arabinogalactan protein associated with secondary cell wall formation in differentiating xylem of loblolly pine. Plant Mol Biol 52(1):91–102. doi:10.1023/A:1023978210001

Zhang Y, Sederoff RR, Allona I (2000) Differential expression of genes encoding cell wall proteins in vascular tissues from vertical and bent loblolly pine trees. Tree Physiol 20(7):457–466. doi:10.1093/treephys/20.7.457

Zhao K, Aranzana MJ, Kim S, Lister C, Shindo C, Tang C, Toomajian C, Zheng H, Dean C, Marjoram P, Nordborg M (2007) An Arabidopsis example of association mapping in structured samples. PLoS Genet 3(1):e4. doi:10.1371/journal.pgen.0030004

Zhong R, Ye Z-H (2004) Amphivasal vascular bundle 1, a gain-of-function mutation of the IFL1/REV gene, is associated with alterations in the polarity of leaves, stems and carpels. Plant Cell Physiol 45(4):369–385. doi:10.1093/pcp/pch051
