Linkage disequilibrium and population-structure analysis among Capsicum annuum L. cultivars for use...


Mol Genet Genomics
DOI 10.1007/s00438-014-0827-3

ORIGINAL PAPER

Linkage disequilibrium and population‑structure analysis among Capsicum annuum L. cultivars for use in association mapping

Padma Nimmakayala · Venkata L. Abburi · Lavanya Abburi · Suresh Babu Alaparthi · Robert Cantrell · Minkyu Park · Doil Choi · Gerald Hankins · Sridhar Malkaram · Umesh K. Reddy

Received: 13 November 2013 / Accepted: 11 February 2014 © Springer-Verlag Berlin Heidelberg 2014

Abstract Knowledge of population structure and linkage disequilibrium among the worldwide collections of peppers currently classified as hot, mild, sweet and ornamental types is indispensable for applying association mapping and genomic selection to improve pepper. The current study aimed to resolve the genetic diversity and relatedness of Capsicum annuum germplasm by use of simple sequence repeat (SSR) loci across all chromosomes in samples collected in 2011 and 2012. The physical distance covered by the entire set of SSRs used was 2,265.9 Mb from the 3.48-Gb hot-pepper genome size. The model-based program STRUCTURE was used to infer five clusters, which was further confirmed by classical molecular-genetic diversity analysis. Mean heterozygosity of various loci was estimated to be 0.15. Linkage disequilibrium (LD) was used to identify 17 LD blocks across various chromosomes with sizes from 0.154 Kb to 126.28 Mb. CAMS-142 of chromosome 1 was significantly associated with both capsaicin (CA) and dihydrocapsaicin (DCA) levels. Further, CAMS-142 was located in an LD block of 98.18 Mb. CAMS-142 amplified bands of 244, 268, 283 and 326 bp. Alleles 268 and 283 bp had positive effects on both CA and DCA levels, with an average R² of 12.15 % (CA) and 12.3 % (DCA). Eight markers from seven different chromosomes were significantly associated with fruit weight, contributing an average effect of 15 %. CAMS-199, HpmsE082 and CAMS-190 are the three major quantitative trait loci located on chromosomes 8, 9, and 10, respectively, and were associated with fruit weight in samples from both years of the study. This research demonstrates the effectiveness of using genome-wide SSR-based markers to assess features of LD and genetic diversity within C. annuum.

Keywords SSRs · Population structure · Association mapping · Capsaicin · Fruit weight

P. Nimmakayala, V. L. Abburi and U. K. Reddy contributed equally to this work.

Communicated by S. Hohmann.

Electronic supplementary material The online version of this article (doi:10.1007/s00438-014-0827-3) contains supplementary material, which is available to authorized users.

P. Nimmakayala · V. L. Abburi · L. Abburi · S. B. Alaparthi · R. Cantrell · G. Hankins · S. Malkaram · U. K. Reddy (*) Department of Biology, Gus R. Douglass Institute, West Virginia State University, Institute, WV 25112-1000, USA e-mail: [email protected]

M. Park · D. Choi Department of Plant Science, Plant Genomics and Breeding Institute, College of Agriculture and Life Sciences, Seoul National University, Seoul 151-321, Republic of Korea

Introduction

Capsicum annuum L. is a highland Mexican pepper type that has spread widely across the world and is grown as sweet and hot pepper (Hernández-Verdugo et al. 2001). The wild progenitor is an erect plant with small red-colored fruits that are soft-fleshed and pungent (Hill et al. 2013). Christopher Columbus encountered Capsicum fruit during his travels to the Americas and named them “peppers” because of their spicy hot taste (Powis et al. 2013). Pickersgill (1997) noted that selection for different goals within domesticated C. annuum types might have fragmented the genetic diversity into narrow pools. Despite the great economic and cultural importance of C. annuum, the population structure of worldwide collections is little known (Aguilar-Meléndez et al. 2009). Genome-wide association studies and genomic selection have been used in several crops with high-throughput single-nucleotide polymorphisms (SNPs); however, simple sequence repeat markers (SSRs) are comparatively superior in resolving population structure. In comparing analyses based on 89 SSRs (primarily dinucleotide repeats) to those based on 847 SNPs in individuals from the same 259 inbred maize lines, Hamblin et al. (2007) concluded that the SSRs performed better when clustering germplasm into populations than did the 847 SNPs or 554 SNP haplotypes. The authors compared heterozygosity levels and allele frequencies of SNPs and SSRs that had implications for their use in assessing relatedness and evaluation of genetic diversity. With recent advances in sequencing technologies, microsatellites can be selected from regular intervals covering the entire length of a genome. Genome-wide SSR markers have been useful in genetic-diversity studies of diverse plant genotypes (Sonah et al. 2011; Arumugasundaram et al. 2011; Reyes-Valdés et al. 2013; Liu et al. 2013; Hao et al. 2011). The current study aimed to resolve the molecular diversity and population structure of C. annuum collections using genome-wide SSR markers in pepper and to evaluate their use in an association-mapping study.

Examining linkage disequilibrium (LD) distribution across the C. annuum genome can help in understanding the consequences of selection throughout its breeding history. Cultivars of C. annuum diverged with breeding methods meant to suit various cooking methods, food traditions and wide-ranging ecologies across the world. Understanding genome divergence and polymorphism related to fruit traits is a way to study genomic footprints of adaptation. An examination of population structure, diversity at the molecular level, LD distribution across various chromosomes and quantitative trait loci (QTL) estimated by using model-based association mapping can provide insights into the evolution of various traits among C. annuum cultivars. Despite several QTL studies of pepper (Ben Chaim et al. 2001, 2003; Paran 2003; Yarnes et al. 2012), the current research is unique in that it involved association mapping to identify markers for various fruit-related traits. Our association-mapping study aimed to identify genome-wide effects on fruit weight (FW) and capsaicinoids in C. annuum populations by using a set of microsatellites that were previously mapped to various chromosomes (Minamiyama et al. 2006; Yi et al. 2006; Sugita et al. 2013) and whose positions were validated on the whole-genome sequence draft.

Association mapping has been found effective for mining new markers and has been used with all major crops, including maize, rice, barley, tomato, wheat, sorghum, sugarcane, soybean, grape, and melon (Tomason et al. 2013; Zhang et al. 2011). For estimating population structure (Q) and kinship (K), the multiallelic and codominant microsatellites can be useful because they are selectively neutral (Zhu et al. 2008). As compared with other marker systems such as SNPs, SSRs are relatively new alleles and show higher rates of mutation (Matsuoka et al. 2002). The current study aimed to resolve the population structure and relatedness of C. annuum germplasm with genome-wide SSRs and to estimate LD across the linkage groups for association mapping for FW and capsaicinoids.

Materials and methods

For molecular-diversity analysis, we included 96 accessions of C. annuum belonging to various countries representing a wide geographical area (Table S1). These selfed accessions were grown in three replications during two seasons (2011 and 2012), adopting a row-to-plant spacing of 100 × 30 cm. We grew ten plants per accession for each replication. We collected tissue from five plants for FW (mg) and harvested ten fruits per replication at the breaker stage; these were oven-dried at 75 °C and ground into fine powder for estimating capsaicin (CA) and dihydrocapsaicin (DCA). Ground samples (100 mg) were mixed thoroughly with 2 ml pure acetonitrile on a vortex. The mixture was incubated at 50 °C for 1 h followed by 1-h sonication before centrifugation at 10,000 rpm for 15 min. The supernatant was filtered through a Phenomenex 0.2-μm PTFE membrane filter (Torrance, CA, USA) before analysis. The sample was then transferred to high-performance liquid chromatography (HPLC) vials. The HPLC system was equipped with a 1525 binary HPLC pump, 2707 autosampler and 2998 photodiode array detector (Waters Corp., Milford, MA, USA). A 10-μl sample was injected. The mobile phases were acetonitrile and 2 % acetic acid at a flow rate of 0.6 ml/min. An X-Bridge C18 column (4.6 × 150 mm; 5 μm) coupled with a guard column (Waters Corp.) was used. CA and DCA levels were detected at 280 nm. Retention times were approximately 9.3 and 9.7 min for CA and DCA, respectively. Stock solutions of CA and DCA (Sigma-Aldrich) were prepared in acetonitrile for a linear standard curve ranging from 12.5 to 500 ppm. The integrated peak areas of CA and DCA in unknown samples were converted to parts per million by using the equation of the line fitted to the CA and DCA standards (R² = 0.99).
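The conversion of peak areas to parts per million through a linear standard curve can be sketched as follows. This is a minimal illustration, not the authors' procedure: the peak areas, the function names `fit_line` and `area_to_ppm`, and the six standard concentrations are hypothetical; only the 12.5 to 500 ppm range comes from the text.

```python
# Hypothetical standards within the 12.5-500 ppm range stated in the text.
std_ppm = [12.5, 25.0, 50.0, 100.0, 250.0, 500.0]
# Hypothetical integrated peak areas for those standards (illustrative only).
std_area = [13000.0, 25500.0, 50500.0, 101000.0, 249000.0, 502000.0]


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept, plus R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


slope, intercept, r_squared = fit_line(std_ppm, std_area)


def area_to_ppm(area):
    """Invert the calibration line to turn a sample peak area into ppm."""
    return (area - intercept) / slope


# Hypothetical unknown sample with an integrated peak area of 150,000.
sample_ppm = area_to_ppm(150000.0)
```

Samples whose peak areas fall outside the calibrated range would normally be diluted and re-run rather than extrapolated.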

Marker resources

We used 176 single-locus SSRs mapped to various linkage groups of published maps (Minamiyama et al. 2006; Yi et al. 2006; Sugita et al. 2013). Chromosomal assignment and position on the physical map was deduced from the whole-genome sequence draft for hot pepper (CM334) that spanned 3.48 Gb (kindly provided by Drs. Park and Choi of Seoul National University). In all, 17 SSR markers were located in a physical distance of 172.08 Mb on chromosome 1 (Chr-1). In addition, we used 11 SSRs spanning a cumulative length of 117.08 Mb on Chr-2, 24 SSRs (235.56 Mb) on Chr-3, 15 SSRs (187.54 Mb) on Chr-4, 12 SSRs (185.11 Mb) on Chr-5, 4 SSRs (138.71 Mb) on Chr-6, 11 SSRs (208.62 Mb) on Chr-7, 27 SSRs (171.7 Mb) on Chr-8, 12 SSRs (231.15 Mb) on Chr-9, 9 SSRs (216.12 Mb) on Chr-10, 10 SSRs (211.70 Mb) on Chr-11 and 7 SSRs (190.48 Mb) on Chr-12.
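The per-chromosome spans can be summed to check the total physical coverage quoted for the marker set. The sketch below uses only the spans listed in the text; the variable names are illustrative, and 1 Gb is assumed to equal 1,000 Mb.

```python
# Physical span (Mb) covered by the SSRs on each chromosome, as listed in the text.
span_mb = {
    1: 172.08, 2: 117.08, 3: 235.56, 4: 187.54,
    5: 185.11, 6: 138.71, 7: 208.62, 8: 171.70,
    9: 231.15, 10: 216.12, 11: 211.70, 12: 190.48,
}

# Total span across the 12 chromosomes.
total_mb = sum(span_mb.values())

# Genome size of the CM334 draft (3.48 Gb), assuming 1 Gb = 1,000 Mb.
genome_mb = 3.48 * 1000.0

# Fraction of the draft genome lying between the outermost markers.
coverage = total_mb / genome_mb
```

The sum is 2,265.85 Mb, which rounds to the 2,265.9 Mb reported for the entire SSR set, or roughly 65 % of the 3.48-Gb draft.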

DNA isolation and PCR

Genomic DNA isolation involved the plant DNA isolation kit (QIAGEN cat# 69104). Each PCR consisted of 50 ng genomic DNA, 0.20 μM mixed forward and reverse primers, 1× buffer (10 mM Tris–HCl, pH 8.2, 50 mM KCl, Triton 0.1 %, BSA 1 mg/ml), 1.5 mM MgCl2, 0.2 mM dNTPs and 1 U Taq polymerase in 10-μl reaction volumes. Amplification involved a GeneAmp PCR 9700 System thermal cycler (Applied Biosystems Inc.) programmed at 94 °C for 2 min followed by 35 cycles of 94 °C for 30 s, 50–65 °C for 30 s, 72 °C for 1 min, and a final extension step at 72 °C for 10 min. Amplified products were separated on a high-throughput DNA fragment analyzer (AdvanCE FS; Advanced Analytical Technologies, Inc., Ames, IA). Amplified PCR products were diluted 1:11, with the dilution and injection voltage adjusted according to the concentration of products to prevent overloading of the PCR product on the fragment analyzer. PCR product of 2 μl was pipetted into 22 μl of 1× TE dilution buffer in plate wells. Samples were size-separated by using a 96-capillary automated system with capillaries of 80 cm. Polymer and other required reagents were from the double-stranded DNA kit DNF-900 (Advanced Analytical Technologies, Inc.), which can effectively separate amplicons ranging between 35 and 500 bp and resolve 1-bp differences between various alleles. Following capillary electrophoresis, data were processed by use of PRO Size 2.0 (Advanced Analytical Technologies, Inc.). The data were normalized to the 35-bp lower marker and 500-bp upper marker and calibrated to the 75- to 400-bp range.

Statistical analysis

Data from all three replications for CA and DCA were log10-transformed before ANOVA. Data for FW, CA and DCA were analyzed by ANOVA. STRUCTURE 2.2 (Pritchard et al. 2000), which computes the posterior probability of the data for a given K, Pr(X|K), was used to cluster individuals into K population groups. K was determined following the admixture model with correlated allele frequencies, with K ranging from 2 to 10. Five independent runs of 100,000 Markov Chain Monte Carlo generations and 100,000 generations of burn-in were used for estimating each value of K. The optimal K value was determined by an ad hoc statistic, ΔK (Evanno et al. 2005). The number of Ks in each dataset was evaluated by using ΔK values estimated with the software STRUCTURE Harvester. CLUMPP 1.1 (Jakobsson and Rosenberg 2007) was used to find optimal alignments of independent runs and the output was used directly as input into DISTRUCT 1.1, a program for cluster visualization (Rosenberg 2004). Genetic distance estimates were calculated as described (Crossa and Franco 2004). The neighbor-joining (NJ) algorithm was used to build a dendrogram based on genetic distances with MEGA ver. 5 (Tamura et al. 2011). The genetic variance was partitioned between and among the pepper groups identified by molecular-diversity and population-structure analyses by analysis of molecular variance (AMOVA) with Arlequin 2.0 (Schneider et al. 2000). Molecular-genetic diversity was estimated by genetic diversity and heterozygosity (h). FIS and FST were estimated by Wright’s F-statistics (Weir and Cockerham 1984) with PopGene 1.31 (Yeh and Boyle 1999). Analysis of gene flow (Nm) based on the private allele method (Slatkin 1985) was estimated with Genepop 4.1 (Rousset 2008).
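The ΔK statistic used to choose K can be sketched as below. This is an illustrative implementation of the commonly used form, ΔK = |L(K+1) − 2L(K) + L(K−1)| / s[L(K)], computed from the mean and standard deviation of Ln P(X|K) across replicate runs; the log-likelihood values and the function name `delta_k` are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev

# Hypothetical Ln P(X|K) values from five replicate runs per K (illustrative only).
ln_prob = {
    3: [-9002.0, -9000.0, -8998.0, -9001.0, -8999.0],
    4: [-8505.0, -8500.0, -8495.0, -8502.0, -8498.0],
    5: [-7903.0, -7900.0, -7897.0, -7901.0, -7899.0],
    6: [-7890.0, -7880.0, -7870.0, -7885.0, -7875.0],
    7: [-7872.0, -7870.0, -7868.0, -7871.0, -7869.0],
}


def delta_k(runs):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1."""
    means = {k: mean(v) for k, v in runs.items()}
    result = {}
    for k in sorted(runs):
        if k - 1 in means and k + 1 in means:
            # Absolute second-order rate of change of the mean log-likelihood.
            second_diff = abs(means[k + 1] - 2.0 * means[k] + means[k - 1])
            # Scale by the run-to-run standard deviation at this K.
            result[k] = second_diff / stdev(runs[k])
    return result


dk = delta_k(ln_prob)
best_k = max(dk, key=dk.get)
```

Note that ΔK is undefined for the smallest and largest K tested, because each value needs both neighbours.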

LD was estimated as the r² between all pairs of SSRs with 1,000 permutations with TASSEL 2.2 (http://www.maizegenetics.net) (Bradbury et al. 2007). To visualize LD throughout the genome, we produced heatmaps based on P values for pair-wise r² estimates for all marker pairs. These heatmaps were used to identify LD blocks in various pepper chromosomes at specific map locations. A marker set was used to estimate the kinship (K) matrix with TASSEL 2.1, which involves the proportion of alleles shared between each pair of accessions in the study. The Q matrix was adapted from K-5 cluster results obtained in structure analysis. GLM and MLM models of TASSEL 3.0 were used for association mapping. A false discovery rate (FDR) of 0.05 was used as a threshold for significant association with Bonferroni correction (Sidak 1967) or at a moderate threshold as described (Benjamini and Hochberg 1995).
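The difference between the stringent Bonferroni threshold and the more moderate Benjamini–Hochberg threshold can be illustrated with a small sketch. The P values below are hypothetical, and the functions `bonferroni` and `benjamini_hochberg` are illustrative helpers, not TASSEL functions.

```python
def bonferroni(pvals, alpha=0.05):
    """Flag tests whose P value is at or below alpha divided by the number of tests."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, q=0.05):
    """Flag tests rejected by the Benjamini-Hochberg step-up procedure at FDR q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Largest rank whose P value falls at or below (rank / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * q:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical marker P values (illustrative only).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

bonf_hits = bonferroni(p_values)
bh_hits = benjamini_hochberg(p_values)
```

With these ten hypothetical tests, Bonferroni keeps only the smallest P value, whereas Benjamini–Hochberg also keeps the second one, which mirrors how a marker can pass the FDR threshold but not the Bonferroni threshold.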

Results

We recorded FW for two seasons and CA and DCA levels for 96 C. annuum collections. FW and CA and DCA levels significantly differed among the collections. The correlation of FW across the two seasons was estimated at 0.92. The correlation of CA and DCA levels across the seasons was 0.755 and 0.746, respectively. The correlation between CA and DCA for 2011 and 2012 was 0.979 and 0.88, respectively. We found a significant negative correlation between FW and total capsaicinoids (r = −0.394).


Mean FW ranged from 111.26 ± 3.78 g in Peperone Quarrato Red to 0.33 g in Bird’s Eye Baby during both 2011 and 2012; mean FW was 104.7 ± 4.47 g for Peperone Quarrato Red and 0.18 ± 0.02 g for Tepin, another wild accession. Mean CA level was highest for Prikkinu (933.19 ± 95.59) and lowest for Anthoni Romanian (0.04 ± 0.037). Mean DCA level was highest for Tepin (568.81 ± 11.60) and lowest for Yankee Bell (0.027 ± 0.02).

In total, 118 polymorphic SSR loci amplified 366 scorable alleles among 96 diverse C. annuum types. Mean heterozygosity of polymorphic loci was 0.15. Five alleles amplified by CAMS-826 represented the highest Shannon index (SI, an indicator of the polymorphic information content in diverse accessions) of 1.73. The mean number of alleles was 2.1 with an average SI of 0.54. Observed and expected heterozygosity and SI for all the loci in the study are in Table S3. The molecular-genetic diversity study revealed five main clusters in the C. annuum collections. Molecular-diversity indices were used to construct an NJ tree (Fig. 1). Genetic divergence analysis revealed that both clusters I and II had pungent types but the highest divergence, 30 %, because wild- to semi-wild collections were in cluster II. A divergence of 24 % was noted between clusters II and V, predominantly sweet peppers. Medium pungent types grouped into two clusters (III and IV), with a narrow diversity of 3 %.

STRUCTURE was used to infer clusters for K = 2 to K = 10. The K = 5 clustering was the most appropriate because it produced the highest ΔK (9.45) and the highest ln′(K) (616.5) (Fig. 2). Hot pungent C. annuum types were grouped into clusters I and II as resolved by STRUCTURE and the NJ tree. Clusters III and V obtained from STRUCTURE analysis contained moderately pungent accessions. Accessions of cluster IV with the NJ tree were largely sweet and bell peppers but were in cluster IV with STRUCTURE analysis.

Because our research resolved the diverse C. annuum collections into five main groups, we further sought the support of population genetics. AMOVA revealed highly significantly different variance within and between the five groups (Table 1). Gene flow estimates (Nm) and distributions of FST and FIS are in Table S2. In total, for 17 loci, FST was from 0.4 to 0.69, so these markers contributed to most of the divergence among the five groups. FIS was >0.9 for 73 loci, so these loci are fixed. Higher Nm estimates indicate increased gene flow, and the loci with low Nm estimates were the private alleles for various groups in the study. For HpmsE068, the Nm was 55.63, which indicates that this locus is highly migratory across the groups, followed by CAMS-089 (18.59), CAMS-888 (18.41), HpmsE067 (16.88), CAMS-153 (15.88), HpmsE091 (14.09), HpmsE020 (13.66), Hpms 2-23 (12.23) and CAMS-301 (12.06). In total, for 67 loci, Nm estimates were <1, which suggests that these are private alleles. The mean frequency of private alleles was 0.17 per group.
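The fixation index reported with Table 1 can be recovered from the AMOVA variance components as the among-population share of the total variance. The short check below uses only the values given in Table 1; the variable names are illustrative.

```python
# Variance components from Table 1 (AMOVA).
var_among = 3.39707    # among populations
var_within = 18.86097  # within populations

# Total variance is the sum of the two components.
var_total = var_among + var_within

# FST is the proportion of total variance found among populations.
fst = var_among / var_total

# Percentage of variation attributed to each level.
pct_among = 100.0 * var_among / var_total
pct_within = 100.0 * var_within / var_total
```

This reproduces the tabulated FST of 0.15262 and the 15.26 % versus 84.74 % split, that is, most of the molecular variance lies within rather than among the five groups.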

LD distribution across various chromosomes is presented in Fig. 3. The total significant marker pairs in LD were 196 (based on r², P < 0.05) and are in Table S4. A total of 17 LD blocks were identified across various chromosomes from 0.154 Kb to 126.28 Mb. The mean LD block was 32.17 Mb in the physical map or 32.3 cM in the genetic map, which indicates the size at which LD decays in the pepper genome. We identified 3 LD blocks of 0.15 Kb (1 cM), 98.18 Mb (199.8 cM) and 25.04 Mb (919 cM) on Chr-1. We identified a single LD block, of 5.84 Mb (64.6 cM), on Chr-2. Three LD blocks of 12.53 Mb (13.7 cM), 6.17 Mb (6.5 cM) and 24.55 Mb (59.5 cM) were located on Chr-3. One block of 7.56 Mb (16 cM) was detected on Chr-4. Two blocks of 126.28 Mb (64.6 cM) and 6.96 Mb (30.9 cM) were located on Chr-7. Both of these LD blocks were inverted. HpmsE123, the first marker of the LD block of Chr-7 (126.28 Mb) with the physical location of 4.75 Mb, was located at the genetic map position of 159.8 cM, and the last marker of the block, HpmsE141, with physical location 131.03 Mb, was present at the genetic map location of 95.2 cM, which indicates an inversion. The second block of Chr-7 formed with two markers, HpmsE057 and HpmsE068, with physical distances 172.91 and 179.89 Mb but genetic map locations 76.3 and 45.4 cM, respectively, which indicates an inversion.

Fig. 1 Neighbor-joining (NJ) tree of the 96 Capsicum annuum accessions constructed with SSR genotypes

Fig. 2 Distribution of five shared ancestries (K = 5) of the collection as estimated using the model-based program STRUCTURE

Table 1 Analysis of molecular variance (AMOVA)

Source of variation  df   Sum of squares  Variance components  Percentage of variation
Among populations    4    474.175         3.39707              15.26
Within populations   187  3,527.002       18.86097             84.74
Total                191  4,001.177       22.25804

Fixation index FST: 0.15262

Four blocks (20.79, 0.94, 4.24 and 2.22 Mb) were located on Chr-8, with genetic map locations 8.4, 1, 4 and 18.7 cM, respectively. The first block of Chr-8 had undergone an inversion with its first marker, HpmsE100 (physical map position 20.01 Mb and genetic map position 140.9 cM), and second marker, HpmsE034 (physical map position 40.80 Mb and genetic map position 132.5 cM). Chr-9 and -10 had one block each of 78.59 Mb (64 cM) and 55.52 Mb, respectively. Interestingly, the LD block located on Chr-9 had undergone several rearrangements because the markers HpmsE-150 (136.43 Mb), HpmsE-143 (169.90 Mb), HpmsE025 (189.91 Mb) and HpmsE084 (215.03 Mb) were located in the genetic map at the positions 76.8, 140.8, 76.8 and 81.4 cM, respectively. Chr-12 had one block of 71.5 Mb (9.4 cM). No LD blocks were found on Chr-5, -6 and -11.
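The block sizes reported for the individual chromosomes can be tallied to check the block count and the mean physical block size. The sketch below uses only sizes stated in the text, with the smallest block (0.154 Kb) converted to Mb; the dictionary layout is illustrative.

```python
# LD block sizes (Mb) per chromosome, as reported in the text.
# The smallest Chr-1 block is given as 0.154 Kb, i.e. 0.000154 Mb.
ld_blocks_mb = {
    1: [0.000154, 98.18, 25.04],
    2: [5.84],
    3: [12.53, 6.17, 24.55],
    4: [7.56],
    7: [126.28, 6.96],
    8: [20.79, 0.94, 4.24, 2.22],
    9: [78.59],
    10: [55.52],
    12: [71.5],
}

# Flatten to a single list of block sizes.
sizes = [size for blocks in ld_blocks_mb.values() for size in blocks]

n_blocks = len(sizes)
mean_block_mb = sum(sizes) / n_blocks
largest_mb = max(sizes)
```

The tally gives 17 blocks with a mean of about 32.17 Mb, in agreement with the figures reported for the physical map.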

GLM and MLM were used to locate QTL for samples for the 2 years. Marker significance was adjusted by Bonferroni’s correction. For 176 tests at alpha 0.05, the adjusted P was 0.0002841. Eight markers located on Chr-1, 2, 4, 5, 8, 9 and 10 were significantly associated with FW, with average effect (R²) of 15 %. The linked markers with their location, position on the physical map, corresponding R² (GLM), P values (MLM and GLM), allele effects and marker heritabilities are in Table 2. CAMS-199, HpmsE082 and CAMS-190 were the three major loci located on Chr-8, -9 and -10, respectively, and were associated with FW during both years. CAMS-199 could explain 15 and 12.7 % of the variation in the 2 years. This marker was highly significant after Bonferroni correction, when tested by GLM and MLM models. CAMS-199 amplified alleles of 214 and 233 bp and is located on Chr-8 at 87.56 Mb in the physical map. CAMS-199 was associated with FW only when both alleles were in heterozygous condition (with the marker heritability of 15 %), which indicates dominant gene action. The CAMS-190 locus (Chr-10 at position 168.07 Mb) showed significance only with Benjamini–Hochberg correction (FDR) and not with the more stringent Bonferroni correction. CAMS-190 was found in an LD block of 55.52 Mb on Chr-10. CAMS-190 had amplified allele sizes of 289, 304 and 322 bp. The positive allele (289 bp) was associated when in a homozygous condition, which indicates additive gene action, and explained 12.4 % of the FW variation. HpmsE082 (position 13.51 Mb of Chr-9) amplified alleles of 197, 214, and 243 bp. This marker was associated with FW when the allele 243 bp was in the homozygous condition (additive gene action), with R² of 18 % and marker heritability of 39.6 %.
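The Bonferroni-adjusted threshold quoted for 176 tests is simply the nominal alpha divided by the number of tests, as this short check shows (variable names are illustrative).

```python
# Nominal significance level and number of marker tests stated in the text.
alpha = 0.05
n_tests = 176

# Bonferroni-adjusted per-test threshold.
bonferroni_threshold = alpha / n_tests

# Rounded to seven decimal places for comparison with the reported value.
reported_form = round(bonferroni_threshold, 7)
```

Rounded to seven decimal places this gives 0.0002841, the adjusted P reported in the text.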

CAMS-142 (position 61,153,984 of Chr-1 in the whole-genome sequence) was significantly associated with both CA and DCA levels, as revealed by both GLM and MLM models across the seasons 2011 and 2012 (Table 2). This marker is located in an LD block of 98.18 Mb. CAMS-142 amplified bands of 244, 268, 283 and 326 bp. Among these alleles, the 268- and 283-bp alleles had positive effects on both CA and DCA levels in both homozygous and heterozygous states, with a mean R² of 12.15 % (CA) and 12.3 % (DCA). CAMS-855 of Chr-1 showed a tight association (adjusted by Bonferroni’s correction) with CA level during 2012 but not 2011.

Discussion

This study resolved the genetic diversity and relatedness of C. annuum germplasm with SSR loci spread across all chromosomes. The physical distance covered by the entire set of SSRs was 2,265.9 Mb of the total 3.48-Gb hot-pepper genome. Population-structure and molecular-diversity studies resolved C. annuum types into five major clusters largely based on pungency, which indicates that pungency is a strong selection force in breeding that contributed to the population substructure of C. annuum cultivars.

Hill et al. (2013) performed population-structure analysis and separated sweet peppers and hot peppers from the rest into separate clusters, as we observed. Portis et al. (2005) and Paran et al. (1998) distinguished large-fruit types from the other types. Similar to our study, Aguilar-Meléndez et al. (2009), performing population-structure analysis, concluded that to a large extent, the clustering pattern did not follow the geographical diversity. The authors included semi-domesticated and wild types along with domesticated peppers spread across the four clusters and further showed by haplotype comparisons of three nuclear genes that domesticated peppers could retain 91 % of the diversity of semi-wild samples. They further concluded that cultivated C. annuum retained a larger amount of diversity than any other known crop plant as compared with their wild progenitors. For example, maize, one of the most extensively cultivated crops, shows reduced diversity of approximately 20 % from its wild ancestor (Tenaillon et al. 2004). According to Harlan’s non-centers concept (1971), domestication might have occurred multiple times across a wide geographic area. Similar to Harlan’s concept, Aguilar-Meléndez et al. (2009) hypothesized multiple domestications of C. annuum that led to wider crop diversity. Our findings support the view that domesticated C. annuum harbors unique gene pools and thus can be an important reservoir of genetic variation.

Fig. 3 Linkage disequilibrium distribution pattern based on r² values between all possible SSR loci pairs across various chromosomes

In addition to the wide genetic diversity in C. annuum collections, we noted a wide range of LD distributed across the genome, which implies the possible use of association mapping to identify linked markers. The polymorphic SSRs we used were valuable for evaluating population structure among the five groups of peppers. SSRs have been used to accurately resolve population structure in many other crops (D’hoop et al. 2010; Jin et al. 2010; Robbins et al. 2011; Zhang et al. 2011; Tomason et al. 2013). Microsatellite markers generally reveal higher power than SNPs for resolving population structure (Emanuelli et al. 2013; Ohashi and Tokunaga 2003). With SSRs used to resolve population structure, SNPs can be used for genome-wide association studies that would aid fine-scale mapping.

Table 2 QTLs identified using GLM and MLM models

Trait  Marker  Marker R²  GLM P value  MLM P value  Chromosome number  Position on genetic map  Position on WGS (Mb)  Allele sizes (bp)  Allelic effect
Capsaicin 2012  CAMS-855  0.1815  7.55e-04  7.55e-04  1  53.0  14.54  [160, 160] 1.0935; [150, 150] 0.6902
Capsaicin 2012  CAMS-142  0.1339  0.0041  0.0061  1  93.3  61.15  [268, 268] 0.8268; [283, 283] 0.7898
Capsaicin 2011  CAMS-142  0.1093  0.0018  0.0018  1  93.3  61.15  [268, 268] 2.1797; [283, 283] 3.5657
Dihydrocapsaicin 2012  CAMS-142  0.1265  0.0087  1  93.3  61.15  [268, 268] 0.4334; [283, 283] 0.5163
Dihydrocapsaicin 2011  CAMS-142  0.1194  0.0012  0.0012  1  93.3  61.15  [268, 268] 2.0084; [283, 283] 3.5497
Fruit weight 2011  CAMS-199  0.1285  9.25e-04  0.0019  8  87.56  [214, 233] 27.6608
Fruit weight 2011  CAMS-190  0.1215  0.0019  0.0045  10  168.07  [289, 289] 31.1912
Fruit weight 2011  CAMS-842  0.1239  3.54e-04  0.0014  5  10.92  [166, 166] 20.1641
Fruit weight 2011  HpmsE082  0.1011  0.0075  0.0199  9  58.6  13.51  [197, 197] 1.13e+01
Fruit weight 2011  Hpms 2-23  0.1718  3.80e-04  0.0029  1  184.73  [155, 155] 3.2225
Fruit weight 2011  CAMS-236  0.1118  0.0032  0.0138  4  87.1  179.54  [151, 151] 40.9491
Fruit weight 2012  CAMS-199  0.1684  7.42e-04  0.0019  8  87.56  [214, 233] 32.7117
Fruit weight 2012  CAMS-190  0.1434  0.0033  0.0169  10  168.07  [289, 289] 31.7461
Fruit weight 2012  HpmsE082  0.1798  5.12e-04  0.0052  9  58.6  13.51  [243, 243] 27.3486
Fruit weight 2012  HpmsE084  0.1489  7.42e-04  0.0097  9  215.03  [212, 212] 24.9887
Fruit weight 2012  HpmsE045  0.2091  7.76e-04  0.0044  2  126.74  [155, 178] 20.9483

Our study is the first association-mapping study performed in pepper to identify markers for fruit weight. Despite high variation in FW across the 2 years, we identified five significant markers after Bonferroni correction. FW is a polygenic trait controlled by major and minor genes with complicated effects involving additive, dominant and epistatic interactions (Grandillo et al. 1999). Among the known tomato FW QTL cloned, fw1.1, fw2.2, fw2.3, fw3.1, fw3.2, fw4.1 and fw9.1 collectively contribute 20 % of the phenotypic variation (Munos et al. 2011; Cong et al. 2008). Ben Chaim et al. (2003) noted that several pepper FW QTL are putative orthologues of tomato FW QTL; however, this finding requires detailed comparative sequence analysis of both crops.

Our study identified common markers for both CA and DCA levels, which suggests that these traits are controlled by the same set of genes. Pungency QTL on Chr-1 contributed 12 % of the phenotypic variation and had additive gene effects. In contrast, a recent whole-genome study located Pun1, a major diagnostic locus that controls pungency, on Chr-2, in acyl transferase 3; putative amino transferase (p-AMP) has two homologues on Chr-3 and is also linked to pungent and non-pungent types (Mazourek et al. 2009). Whole-genome sequence analysis located p-AMP on Chr-10 and -12. Association mapping based on candidate gene polymorphisms of Pun1 and KAS, two different genes in the capsaicin metabolic pathway, revealed that some candidate SNPs regulate the major precursors of the acyl moieties of capsaicinoids and thereby play a key role in capsaicinoid production (Reddy et al. 2014). Han et al. (2013) concluded that genes other than Pun1 and p-AMP may control capsaicinoid synthesis, which indicates the possibility of identifying additional loci controlling pungency through fine mapping and/or genome annotation. In the future, when the genome sequence of pepper is publicly available, these regions can be annotated and used for further analysis of CA pathways.

A unified, mixed-model approach for association mapping that combines population structure (Q) and a relative kinship matrix (K) provides a dependable and robust system for identifying reliable QTL (Yu and Buckler 2006; Zhao et al. 2007). Toward this end, we resolved the population structure of C. annuum using genome-wide SSRs and applied it in mixed models to identify linked markers for various fruit traits. This analysis will be of use for future association-mapping studies to identify markers for various complex traits in C. annuum.
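For reference, the Q + K mixed model referred to above is commonly written in the following general form. This is a textbook sketch, not a formula taken from this paper; the symbols are assumptions introduced here for illustration.

```latex
% General form of the unified (Q + K) mixed model for association mapping.
% y: vector of phenotypes; X\beta: fixed effects other than marker or structure;
% S\alpha: fixed effect of the marker being tested;
% Q v: fixed effects of population-structure membership (Q matrix);
% Z u: random polygenic effects with covariance proportional to kinship K;
% e: residual error.
\[
  y = X\beta + S\alpha + Qv + Zu + e,
  \qquad
  u \sim N\!\left(0,\; 2K\sigma_g^{2}\right),
  \qquad
  e \sim N\!\left(0,\; I\sigma_e^{2}\right)
\]
```

Including Q as a fixed covariate accounts for broad subpopulation differences, while the kinship-structured random effect absorbs finer-scale relatedness; both reduce spurious marker-trait associations.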

Acknowledgments The study received funding from USDA-NIFA (2010-38821-21574), NSF-EPSCOR #09-570 - EPS-1003907, NIH Grant P20RR016477 to the West Virginia IDeA Network for Biomedical Research Funding and the Gus R. Douglass Institute (graduate research assistantship to V. Abburi).

Conflict of interest We declare no conflicts of interest.

References

Aguilar-Melendez A, Morrell PL, Roose ML, Kim S-C (2009) Genetic diversity and structure in semiwild and domesticated chiles (Capsicum annuum; Solanaceae) from Mexico. Am J Bot 96(6):1190–1202

Arumugasundaram S, Ghosh M, Veerasamy S, Ramasamy Y (2011) Species discrimination, population structure and linkage disequilibrium in Eucalyptus camaldulensis and Eucalyptus tereticornis using SSR markers. PLoS One 6(12):e28252. doi:10.1371/journal.pone.0028252

Ben Chaim A, Paran I, Grube RC, Jahn M, van Wijk R, Peleman J (2001) QTL mapping of fruit-related traits in pepper (Capsicum annuum). Theor Appl Genet 102(6–7):1016–1028

Ben Chaim A, Borovsky Y, Rao GU, Tanyolac B, Paran I (2003) fs3.1: a major fruit shape QTL conserved in Capsicum. Genome 46(1):1–9

Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B Methodol 57(1):289–300

Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES (2007) TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics 23(19):2633–2635

Cong B, Barrero LS, Tanksley SD (2008) Regulatory change in YABBY-like transcription factor led to evolution of extreme fruit size during tomato domestication. Nat Genet 40(6):800–804

Crossa J, Franco J (2004) Statistical methods for classifying genotypes. Euphytica 137(1):19–37

D’hoop B, Paulo MJ, Kowitwanich K, Sengers M, Visser RF, Eck H, Eeuwijk F (2010) Population structure and linkage disequilibrium unravelled in tetraploid potato. Theor Appl Genet 121(6):1151–1170

Emanuelli F, Lorenzi S, Grzeskowiak L, Catalano V, Stefanini M, Troggio M, Myles S, Martinez-Zapater J, Zyprian E, Moreira F, Grando M (2013) Genetic diversity and population structure assessed by SSR and SNP markers in a large germplasm collection of grape. BMC Plant Biol 13(1):1–17

Evanno G, Regnaut S, Goudet J (2005) Detecting the number of clusters of individuals using the software structure: a simulation study. Mol Ecol 14(8):2611–2620

Grandillo S, Ku HM, Tanksley SD (1999) Identifying the loci responsible for natural variation in fruit size and shape in tomato. Theor Appl Genet 99(6):978–987

Hamblin MT, Warburton ML, Buckler ES (2007) Empirical comparison of simple sequence repeats and single nucleotide polymorphisms in assessment of maize diversity and relatedness. PLoS One 2(12):e1367. doi:10.1371/journal.pone.0001367

Han K, Jeong H-J, Sung J, Keum Y, Cho M-C, Kim J-H, Kwon J-K, Kim B-D, Kang B-C (2013) Biosynthesis of capsinoid is controlled by the Pun1 locus in pepper. Mol Breed 31(3):537–548. doi:10.1007/s11032-012-9811-y

Hao C, Wang L, Ge H, Dong Y, Zhang X (2011) Genetic diversity and linkage disequilibrium in Chinese bread wheat (Triticum aestivum L.) revealed by SSR markers. PLoS One 6(2):e17279. doi:10.1371/journal.pone.0017279

Hernández-Verdugo S, Luna-Reyes R, Oyama K (2001) Genetic structure and differentiation of wild and domesticated populations of Capsicum annuum (Solanaceae) from Mexico. Plant Syst Evol 226(3–4):129–142. doi:10.1007/s006060170061

Hill TA, Ashrafi H, Reyes-Chin-Wo S, Yao J, Stoffel K, Truco M-J, Kozik A, Michelmore RW, Van Deynze A (2013) Characterization of Capsicum annuum genetic diversity and population structure based on parallel polymorphism discovery with a 30 K unigene pepper genechip. PLoS One 8(2):e56200. doi:10.1371/journal.pone.0056200

Jakobsson M, Rosenberg NA (2007) CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure. Bioinformatics 23(14):1801–1806

Jin L, Lu Y, Xiao P, Sun M, Corke H, Bao J (2010) Genetic diversity and population structure of a diverse set of rice germplasm for association mapping. Theor Appl Genet 121(3):475–487

Liu S-R, Li W-Y, Long D, Hu C-G, Zhang J-Z (2013) Development and characterization of genomic and expressed SSRs in citrus by genome-wide analysis. PLoS One 8(10):e75149. doi:10.1371/journal.pone.0075149

Matsuoka Y, Mitchell SE, Kresovich S, Goodman M, Doebley J (2002) Microsatellites in Zea – variability, patterns of mutations, and use for evolutionary studies. Theor Appl Genet 104(2–3):436–450

Mazourek M, Pujar A, Borovsky Y, Paran I, Mueller L, Jahn MM (2009) A dynamic interface for capsaicinoid systems biology. Plant Physiol 150(4):1806–1821

Minamiyama Y, Tsuro M, Hirai M (2006) An SSR-based linkage map of Capsicum annuum. Mol Breed 18(2):157–169

Munos S, Ranc N, Botton E, Berard A, Rolland S, Duffe P, Carretero Y, Le Paslier MC, Delalande C, Bouzayen M, Brunel D, Causse M (2011) Increase in tomato locule number is controlled by two single-nucleotide polymorphisms located near WUSCHEL. Plant Physiol 156(4):2244–2254

Ohashi J, Tokunaga K (2003) Power of genome-wide linkage disequilibrium testing by using microsatellite markers. J Hum Genet 48(9):487–491

Paran I (2003) Marker-assisted utilization of exotic germplasm. In: Nguyen HT, Blum A (eds) Physiology and biotechnology integration for plant breeding. Marcel Dekker, New York

Paran I, Aftergoot E, Shifriss C (1998) Variation in Capsicum annuum revealed by RAPD and AFLP markers. Euphytica 99(3):167–173

Pickersgill B (1997) Genetic resources and breeding of Capsicum spp. Euphytica 96:129–133

Portis E, Barchi L, Acquadro A, Macua JI, Lanteri S (2005) Genetic diversity assessment in cultivated cardoon by AFLP (amplified fragment length polymorphism) and microsatellite markers. Plant Breed 124(3):299–304

Powis TG, Gallaga Murrieta E, Lesure R, Lopez Bravo R, Grivetti L, Kucera H, Gaikwad NW (2013) Prehispanic use of chili peppers in Chiapas, Mexico. PLoS One 8(11):e79013. doi:10.1371/journal.pone.0079013

Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics 155(2):945–959

Reddy UK, Almeida A, Abburi VL, Alaparthi SB, Unselt D, Hankins G, Park M, Choi D, Nimmakayala P (2014) Identification of gene-specific polymorphisms and association with capsaicin pathway metabolites in Capsicum annuum L. collections. PLoS One 9(1):e86393. doi:10.1371/journal.pone.0086393

Reyes-Valdés MH, Santacruz-Varela A, Martínez O, Simpson J, Hayano-Kanashiro C, Cortés-Romero C (2013) Analysis and optimization of bulk DNA sampling with binary scoring for germplasm characterization. PLoS One 8(11):e79936. doi:10.1371/journal.pone.0079936

Robbins MD, Sim S-C, Yang W, Van Deynze A, van der Knaap E, Joobeur T, Francis DM (2011) Mapping and linkage disequilibrium analysis with a genome-wide collection of SNPs that detect polymorphism in cultivated tomato. J Exp Bot 62(6):1831–1845

Rosenberg NA (2004) Distruct: a program for the graphical display of population structure. Mol Ecol Notes 4(1):137–138

Rousset F (2008) Genepop’007: a complete re-implementation of the genepop software for Windows and Linux. Mol Ecol Resour 8(1):103–106

Schneider S, Roessli D, Excoffier L (2000) Arlequin: a software for population genetics data analysis. User manual

Sidak Z (1967) Rectangular confidence regions for the means of multivariate normal distributions. J Am Stat Assoc 62(318):626–633

Slatkin M (1985) Gene flow in natural populations. Annu Rev Ecol Syst 16:393–430

Sonah H, Deshmukh RK, Sharma A, Singh VP, Gupta DK, Gacche RN, Rana JC, Singh NK, Sharma TR (2011) Genome-wide distribution and organization of microsatellites in plants: an insight into marker development in Brachypodium. PLoS One 6(6):e21298. doi:10.1371/journal.pone.0021298

Sugita T, Semi Y, Sawada H, Utoyama Y, Hosomi Y, Yoshimoto E, Maehata Y, Fukuoka H, Nagata R, Ohyama A (2013) Development of simple sequence repeat markers and construction of a high-density linkage map of Capsicum annuum. Mol Breed 31(4):909–920

Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S (2011) MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol 28(10):2731–2739

Tenaillon MI, U’Ren J, Tenaillon O, Gaut BS (2004) Selection versus demography: a multilocus investigation of the domestication process in maize. Mol Biol Evol 21(7):1214–1225

Tomason Y, Nimmakayala P, Levi A, Reddy U (2013) Map-based molecular diversity, linkage disequilibrium and association mapping of fruit traits in melon. Mol Breed 31(4):829–841

Weir BS, Cockerham CC (1984) Estimating F-statistics for the analysis of population structure. Evolution 38(6):1358–1370

Yarnes SC, Ashrafi H, Reyes-Chin-Wo S, Hill TA, Stoffel KM, Van Deynze A, Gulick P (2012) Identification of QTLs for capsaicinoids, fruit quality, and plant architecture-related traits in an interspecific Capsicum RIL population. Genome 56(1):61–74

Yeh YR, Boyle T (1999) POPGENE Version 1.31. Microsoft Window-based Freeware for Population Genetic Analysis. Quick User Guide

Yi G, Lee JM, Lee S, Choi D, Kim BD (2006) Exploitation of pepper EST-SSRs and an SSR-based linkage map. Theor Appl Genet 114(1):113–130

Yu J, Buckler ES (2006) Genetic association mapping and genome organization of maize. Curr Opin Biotechnol 17(2):155–160

Zhang J, Hao C, Ren Q, Chang X, Liu G, Jing R (2011) Association mapping of dynamic developmental plant height in common wheat. Planta 234(5):891–902

Zhao K, Aranzana MJ, Kim S, Lister C, Shindo C, Tang C, Toomajian C, Zheng H, Dean C, Marjoram P, Nordborg M (2007) An Arabidopsis example of association mapping in structured samples. PLoS Genet 3(1):e4

Zhu C, Gore M, Buckler ES, Yu J (2008) Status and prospects of association mapping in plants. Plant Gen 1(1):5–20