Estrogen Receptor Genotypes and Haplotypes Associated with Breast Cancer Risk


[CANCER RESEARCH 64, 8891–8900, December 15, 2004]


Bert Gold,1 Francis Kalush,2 Julie Bergeron,3 Kevin Scott,3 Nandita Mitra,4 Kelly Wilson,2 Nathan Ellis,4

Helen Huang,4 Michael Chen,3 Ross Lippert,5,6 Bjarni V. Halldorsson,5 Beth Woodworth,1 Thomas White,2

Andrew G. Clark,2 Fritz F. Parl,7 Samuel Broder,2 Michael Dean,1 and Kenneth Offit4

1Human Genetics Section, Laboratory of Genomic Diversity, National Cancer Institute at Frederick, Frederick, Maryland; 2Celera Diagnostics or Celera, Inc, Rockville, Maryland or Alameda, California; 3SAIC-Frederick, Inc, Frederick, Maryland; 4Memorial Sloan-Kettering Cancer Center, New York, New York; 5Applied Biosystems, Rockville, Maryland; 6Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts; and 7Department of Pathology, Vanderbilt University School of Medicine, Nashville, Tennessee

ABSTRACT

Nearly one in eight US women will develop breast cancer in their lifetime. Most breast cancer is not associated with a hereditary syndrome, occurs in postmenopausal women, and is estrogen and progesterone receptor-positive. Estrogen exposure is an epidemiologic risk factor for breast cancer, and estrogen is a potent mammary mitogen. We studied single nucleotide polymorphisms (SNPs) in estrogen and progesterone receptor genes in 615 healthy subjects and 1011 individuals with histologically confirmed breast cancer, all from New York City. We analyzed 13 SNPs in the progesterone receptor gene (PGR), 17 SNPs in the estrogen receptor 1 gene (ESR1), and 8 SNPs in the estrogen receptor 2 gene (ESR2). We observed three common haplotypes in ESR1 that were associated with a decreased risk of breast cancer [odds ratio (OR), ≤0.4; 95% confidence interval (CI), 0.2–0.8; P < 0.01]. Another haplotype was associated with an increased risk of breast cancer (OR, 2.1; 95% CI, 1.2–3.8; P < 0.05). A unique risk haplotype was present in ~7% of older Ashkenazi Jewish study subjects (OR, 1.7; 95% CI, 1.2–2.4; P < 0.003). We narrowed the ESR1 risk haplotypes to the promoter region and first exon. We define several other haplotypes in Ashkenazi Jews, in both ESR1 and ESR2, that may elevate susceptibility to breast cancer. In contrast, we found no association between any PGR variant or haplotype and breast cancer. Replication in further genetic epidemiology studies, together with functional assays of the haplotypes, should permit a better understanding of the role of steroid receptor genetic variants in breast cancer risk.

INTRODUCTION

Only a small fraction (~5%) of women diagnosed with breast cancer have a clear hereditary predisposition (1–3), and of these, about one half have predisposing mutations in BRCA1, BRCA2, PTEN, TP53, or other known cancer predisposing genes. However, twin studies indicate that the heritability of breast cancer is ~30% (4), suggesting that genes outside the well-mapped regions act as modifiers of breast cancer risk. Although both low-penetrance and high-penetrance genes are likely to be involved in the etiology, it remains unclear which genomic regions, and which biochemical functions or signal transduction pathways, account for the additional heritable breast cancer incidence or progression.

Abundant epidemiologic evidence suggests that estrogen plays a crucial role in most breast cancers. Nulliparous women are at significantly elevated risk, as are women who have children late in their lives, women with early menarche, and women with late menopause. Obesity is also associated with breast cancer risk; estrogen synthesis in adipose tissue is proposed to account for this increase in risk. Although estrogen receptor (ER)-positive and progesterone receptor (PgR)-positive breast cancers have a better short-term prognosis than those that have become hormone independent (5), receptor status varies as a function of age and menopausal status. Younger patients are more likely to be receptor negative and hormonally unresponsive; older patients are more often receptor positive and hormonally responsive. Recent, comprehensive reviews of the genetics of breast cancer and its relation to the estrogen and progesterone receptors are available (6–12).

In this report, we seek to identify candidate steroid hormone receptor gene variants in ESR1, ESR2, and PGR that might be associated with risk of breast cancer, perhaps by accelerating or slowing the rate of neoplastic transformation.8

MATERIALS AND METHODS

5′-Nuclease Assay Designs. Single nucleotide polymorphisms (SNPs) discovered through data mining of the Celera Discovery System, a Celera proprietary database, or deposits into dbSNP were chosen for assay design. Limited resequencing permitted discovery of the rare variant in ESR2 exon 4 reported here. Assays for the sequences chosen for scoring on the entire cohort were either purchased from Applied Biosystems as Assays-on-Demand (AOD) or submitted to the Applied Biosystems Assays-by-Design pipeline. One difficult assay (G393G in PGR) was designed by Raymond Stephens of Celadon Labs, College Park, MD. Propynyl T oligonucleotide probes for G393G were manufactured under a special license agreement by TriLink Biotech, San Diego, CA.

Received 4/9/04; revised 10/1/04; accepted 10/6/04.

Grant support: Funded in whole or in part with federal funds from the Center for Cancer Research of the National Cancer Institute and the NIH under contract NO1-CO-12400, the Barbara Goldsmith Foundation, The Lymphoma Foundation, The Frankel Foundation, and the Academic Medicine Development Company. The New York Cancer Project is administered and funded by AMDeC Foundation, Inc.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Requests for reprints: Bert Gold, Human Genetics Section, Laboratory of Genomic Diversity, Center for Cancer Research, National Cancer Institute at Frederick, Building 560, Room 21-21, Frederick, MD 21702. Phone: (301) 846-5098; Fax: (301) 846-1909; E-mail: [email protected].

©2004 American Association for Cancer Research.

8 Genetic loci are identified in this publication as ESR1, ESR2, and PGR, consistent with Human Genome Organization (HUGO) guidelines. ERα, ERβ, and PgR refer to the respective peptide products.

5′-Nuclease Assay Method. Five nanograms of genomic DNA from each patient and control were aliquoted with a Hydra liquid handler (HYDRA, Robbins Molecular BioProducts, San Diego, CA) into 384-well bar-coded optical thermocycler plates compatible with the ABI PRISM 7900HT sequence detection system (Applied Biosystems, Inc., Foster City, CA). Before assay, the DNAs were rehydrated with 2.4 μL of deionized water with a Qfill2 automated pipetter (QFILL2, Genetix Ltd., Queensway, New Milton, Hampshire, United Kingdom). For Assays-on-Demand products (denoted in Table 1 by a catalog number and a paucity of sequence information), 2.5 μL of TaqMan Universal PCR master mix was added with 125 nL of Assay-on-Demand mix per well. For Assays-by-Design products or assays devised in house, 2.5 μL of PCR master mix was combined with 45 nL of each 100 μM primer and 10 nL of each 100 μM probe. Plates were sealed and cycled in an ABI GeneAmp PCR System 9700 thermocycler set for 9600 emulation: 95°C for 10 minutes, followed by 50 cycles of 95°C for 15 seconds and 58°C for 1 minute. At the end of cycling, plates were held at 25°C until reading in a 7900HT sequence detection system. Each plate contained controls of each genotype and no-template controls. Data from plates failing any control were discarded. Manual genotype calls were made conservatively, consistent with the standards discussed in Clark et al. (13), and missing values were excluded from the analysis as detailed in Results. Aggregate indeterminate genotypes averaged 1% of total ESR1 SNPs sampled (range, 0.4–2%). Genotype tabulations and missing-value details are provided as supplementary data.9
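The reagent volumes above imply final working concentrations that can be checked with C1·V1 = C2·V2. The sketch below is only a back-of-the-envelope check and is not part of the published protocol: it assumes the reaction volume is simply the sum of the listed additions (2.4 μL water, 2.5 μL master mix, two primers and two allele-specific probes) and that the primer and probe stocks are 100 μM. The constant and function names are illustrative.

```python
# Back-of-the-envelope final concentrations for one Assays-by-Design well.
# Assumptions (not stated in the paper): the reaction volume is the sum of
# the listed additions, and primer/probe stocks are 100 uM.

WATER_UL = 2.4        # rehydration water per well (uL)
MASTER_MIX_UL = 2.5   # TaqMan Universal PCR master mix per well (uL)
STOCK_UM = 100.0      # assumed primer/probe stock concentration (uM)
PRIMER_NL = 45.0      # volume of each primer stock added (nL)
PROBE_NL = 10.0       # volume of each allele-specific probe stock added (nL)


def total_volume_ul() -> float:
    """Sum of all additions: water, master mix, 2 primers and 2 probes."""
    return WATER_UL + MASTER_MIX_UL + (2 * PRIMER_NL + 2 * PROBE_NL) / 1000.0


def final_conc_nm(stock_um: float, added_nl: float, total_ul: float) -> float:
    """Final concentration in nM from C1*V1 = C2*V2."""
    added_ul = added_nl / 1000.0
    return stock_um * added_ul / total_ul * 1000.0  # convert uM to nM


if __name__ == "__main__":
    total = total_volume_ul()
    print(f"total volume: {total:.2f} uL")
    print(f"each primer:  {final_conc_nm(STOCK_UM, PRIMER_NL, total):.0f} nM")
    print(f"each probe:   {final_conc_nm(STOCK_UM, PROBE_NL, total):.0f} nM")
```

Under these assumptions the script reports a total of about 5.01 μL, roughly 898 nM of each primer, and roughly 200 nM of each probe.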

The following SNPs were assayed in the ESR1 gene: rs851984, rs2881766, ESR1002, rs2077647, rs827421, rs9322331, rs712221, hCV1576295, rs1801132, rs1884051, rs6905370, rs926778, rs3020366, rs750686, rs2228480, rs3798577, rs910416; in the ESR2 gene: rs1255998, rs928554, rs1152579, rs4986938, rs1256049, E2EX4CorT, rs1256030, rs1271572; and in the PGR gene: rs511298, rs1042839, rs492457, rs1042838, rs2020876, rs578938, rs613120, rs566351, rs506487, rs1379130, rs3740753, PR331GORA, rs518162. These SNPs were chosen for assay based on the following criteria: (a) all nonsynonymous, splicing, and transcription factor binding site variants were examined; (b) allele frequency exceeded 10% in Caucasian samples; (c) they were available as Assays-on-Demand. Some novel SNPs that did not meet the frequency criterion were selected from ESR2 and PGR after resequencing. Probe and primer sequences, when available, are provided in Table 1, as are order numbers for Assays-on-Demand products, when applicable.

A map with SNP location details is provided as Fig. 1.

Statistical Methods. SNP typing in control samples was checked for compliance with Hardy–Weinberg equilibrium with Tools For Population Genetic Analyses (TFPGA).10 Contingency table analysis for individual SNPs was carried out with SAS (SAS Institute, Inc., Cary, NC), SAS/Genetics (SAS Institute, Inc.), and SPSS (SPSS Inc., Chicago, IL). For haplotype estimation, we used SNPhap,11 PHASE12 (14), SNPEM13 (15), MLOCUS14 (16), an extension of the Clark (17) algorithm with expectation maximization (EM; ref. 18), and haplo.score15 (19). Self-assigned demography was checked with STRUCTURE.16 Each analysis was conducted for the case–control population as a whole, then stratified by ethnicity (one of six groups: Asian American, African American, Hispanic, Ashkenazi Jewish, Unknown, or European American) and by age (age ≤50 and age >50). To address the effect of menopause, age (≤50 or >50) was used as a surrogate marker. Five males with breast cancer and two male controls were included in the study but excluded from the statistical analysis presented here; no result presented here was significantly affected by their inclusion or exclusion. For haplotype estimation, the most common allele in the study as a whole was represented by "1" for ESR1 loci and by "A" for ESR2 loci; in addition, we describe each haplotype, where possible, by the base letters of the variants that compose it. Map coordinates provided are those from NCBI Build 33 (April 2003).17

SNP name designations are consistent with those defined in dbSNP when possible.
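For readers who want to reproduce the Hardy–Weinberg check on the genotype tabulations in the supplementary data, a minimal sketch of the standard one-degree-of-freedom χ² goodness-of-fit test follows. It is not necessarily the procedure TFPGA implements, and when genotype cells are small an exact test is preferable, as the Results note for the Asian subgroup. The function name and the example counts are hypothetical.

```python
from __future__ import annotations

import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Chi-square (1 df) goodness-of-fit test for Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb are observed genotype counts for one biallelic SNP.
    Returns (chi-square statistic, P-value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

For example, `hwe_chi_square(25, 50, 25)` gives χ² = 0 and P = 1, whereas `hwe_chi_square(40, 20, 40)` gives χ² = 36, reflecting a strong heterozygote deficit.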

The strategy used to search for statistically significant haplotypes consisted of two steps: first, the EM algorithm or a Bayesian model was used to estimate haplotypes; second, these estimated haplotypes were used to identify htSNPs (haplotype-tagging SNPs; ref. 20), a minimal informative subset of SNPs in each gene of interest (21). Haplotype-tagging SNPs for ESR1 were

9 ftp://ftp.ncifcrf.gov/pub/users/goldb/ in a folder labeled CANCERRESEARCH.

10 Authored by Dr. Mark P. Miller ([email protected]; Utah State University, Logan, UT) and available from the web page http://bioweb.usu.edu/mpmbio/ or by writing to the author.

11 By David Clayton ([email protected]; Cambridge Institute for Medical Research, Cambridge, United Kingdom), downloaded from http://www-gene.cimr.cam.ac.uk/clayton/software/.

12 Written by Matthew Stephens ([email protected]; University of Washington, Seattle, WA) and available from his web site, http://www.stat.washington.edu/stephens/software.html.

13 Written by Dr. Nicholas Schork and M. Daniele Fallin, and obtained from [email protected].

14 Obtained from Dr. Jeffrey C. Long ([email protected]; University of Michigan Medical School, Ann Arbor, MI).

15 The R-version of haplo.score was authored by Dr. Daniel J. Schaid ([email protected]; Mayo Clinic, Rochester, MN).

16 Authored by Dr. Jonathan Pritchard ([email protected]; The University of Chicago, Chicago, IL) and available at his web site, http://pritch.bsd.uchicago.edu/software.html.

17 http://genome.ucsc.edu.

Table 1 Primers, probes, and genotyping assays

Gene | SNP name | Interrogated sequence | Forward primer or AOD number*
ESR1 | rs851984 | ATCTAGAATAGTTAA[G/A]TGCCTGTTTCAGTCC | TGAACTTTGAACCATCACTGAGG
ESR1 | rs2881766 | TATAAACTGCAGACT[T/G]AAATTAAGACCTTGA | C_11414988
ESR1 | ESR1002 | AACACATCCACACAC[T/G]CTCTCTGCCTAGTTC | CTGCCATTCCACGCACAA
ESR1 | rs2077647 | GTCATCCCGGTAGGG[C/T]CTACGAAACCACACC | GCGGCCACGGACCAT
ESR1 | rs827421 | TGTCATAAAGTACAA[C/T]GTTCTCCTTTGAATA | C_11920506
ESR1 | rs9322331 | TCCCTTTCTCCTGGC[C/T]CATGCCCTTCAGTCT | C_1987609_10
ESR1 | rs712221 | AGGACTTCATGTTCA[T/A]TAACTTTTCCTTTTT | C_3163596
ESR1 | hCV1576295 | GAAGAATACACTTTT[T/C]GCTTGCAGTTAGCAT | C_1576295_10
ESR1 | rs1801132 | GGATGCTGAGCCCCC[G/C]ATACTCTATTCCGAG | TGACGGCCGACCAGATG
ESR1 | rs1884051 | TCAAGAGCTTCTGCC[A/G]TCTTCTAGGCATTCT | C_11918415
ESR1 | rs6905370 | TAGTCACTACAAGGC[G/A]AGTTTTGTTCTGTCT | C_328969
ESR1 | rs926778 | GCAACTAACTCTTTC[C/A]AAGCATTGACCAGAT | C_8790211
ESR1 | rs3020366 | CTTAAGGAATTGCCC[T/C]GTGTGAGTTCCTTGA | C_338027
ESR1 | rs750686 | CTTTATTCAACTCAC[G/A]TAATGAGAAGTCAGT | C_2823728
ESR1 | rs2228480 | GGGTTTCCCTGCCAC[G/A]GTCTGAGAGCTCCCT | CATCGCATTCCTTGCAAAAGT
ESR1 | rs3798577 | GGAGCTGAACAGTAC[T/C]TGTGCAGGATTGTTG | C_2823742
ESR1 | rs910416 | GGTAGCTGCTTTACA[T/C]GTGGTCTCAGTGCCT | C_2823749
ESR2 | rs1271572 | TGTGACACTGGGGGG[T/G]TCTCACAATGGCCTG | C_7573237_10
ESR2 | rs1256030 | ACTTAGAGATGTAGC[T/C]CCCACCCCATGGCTA | GATCTGGCCACTCCTTTCATTACA
ESR2 | E2EX4CorT | ACTTCGGAAGTGTTA[C/T]GAAGTGGGAATGGTG | CCAGGCCTGCCGACTTC
ESR2 | rs1256049 | CCTGTTCGACCAAGT[G/A]CGGCTCTTGGAGAGC | GGAGCTCAGCCTGTTCGA
ESR2 | rs4986938 | CCCACAGAGGTCACA[G/A]GCTGAAGCGTGAACT | C_11462726_10 or GGTGAACTGGCCCACAGA
ESR2 | rs1152579 | GTACAATTTGAGAGA[T/C]GCTGTCACGGTATCT | C_7573336_1_
ESR2 | rs928554 | GTGTGGTCAGCTGTG[A/G]CTGCCAACAGATGCA | ATCAACTCGGTGGCCTAAAGAAAA
ESR2 | rs1255998 | ACGTAGACAACCGTC[C/G]CGTGTCGACTGGTGTT | GGTTTGTGCTTTGGCAGAGAAG
PGR | rs518162 | ATGCCACCCACACGC[A/G]CAAATACAACAAGGC | CAGTCCACAGCTGTCACTAATCG
PGR | PR331GORA | AAGTCGGGAGATAAA[G/A]GAGCCGCGTGTCACT | CACGAGTTTGATGCCAGAGAAAA
PGR | rs3740753 | CTCCGTGTCCCACTT[G/C]AGGCGCCGCCCCGTT | CTGCCAGCGCCTTTGC
PGR | rs1379130 | GCGCCTCCGGAGGCG[T/C]GGAAGGAGGAGGAGG | CGGCCACAAGGTAGGA
PGR | rs506487 | TTCACTTTAAAGGAT[G/A]ATGGACGAAAAGACG | C_997633_10
PGR | rs566351 | GAGGATTCAATACTT[G/A]AACCGAGTAGGTAAA | C_3182870_10
PGR | rs613120 | GAATATGCCCAGTCT[T/C]TACCGAAGTACTTGT | C_997599_10
PGR | rs578938 | TTCATCATCTTTAAC[A/G]TTAAGTGATGAGCCA | C_3182860_10
PGR | rs2020876 | CTGCCCAGCATGTCG[C/A]CTTAGAAAGTGCTGT | GCATCGTTGATAAAATCCGCAGAAA
PGR | rs1042838 | GCTCTCCCACAGCCA[G/T]TGGGCGTTCCAAATG | GTCAGAGTTGTGAGAGCACTGGAT
PGR | rs492457 | CTCTAATTTAAGGGT[T/C]ACTACTATTACTAGT | C_1142764_10
PGR | rs1042839 | GAGATCCTACAAACA[C/T]GTCAGTGGGCAGATG | GGTGTTTGGTCTAGGATGGAGATC
PGR | rs511298 | GTACTACTTGACTTT[C/T]AACATTATACACATG | C_659831_10

* AOD, Assays-on-Demand (number beginning with “C_” is a catalog number).


ESTROGEN RECEPTOR GENES AND BREAST CANCER

chosen through use of the software program PHASEpybest.py.18 Rare haplotypes representing less than 1% of the total were deleted for the purpose of determining htSNPs. HT SNP Tester (at the same web site) was used to determine the final set of htSNPs used in the analysis, which included rare haplotypes. We used a permutation program [SNPEM (22) or PHASE 2.02 (14)] to evaluate the statistical significance of any association observed between haplotype and disease state. Schaid's program haplo.score [Schaid et al. (19)], which uses an EM algorithm to estimate haplotypes and then tests for disease association through a generalized linear model, was used to verify significant associations discovered with SNPEM. For sparse data, haplo.score computes simulation P-values for all score tests of association. In addition to our htSNP work, we performed two separate haplotype association analyses on blocks of strong linkage disequilibrium in ESR1.
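The permutation logic used to judge haplotype–disease association can be illustrated with a simplified sketch. Unlike SNPEM or PHASE, which re-estimate haplotype frequencies from unphased genotypes on every permutation, this version assumes each subject's two haplotypes are already known, using a hypothetical input format of string pairs such as ("11", "12") that follows the document's "1" = common allele coding. It shuffles case/control labels over whole subjects, not chromosomes, so each subject's two haplotypes stay together.

```python
from __future__ import annotations

import random
from typing import Sequence, Tuple

Subject = Tuple[str, str]  # (haplotype on chromosome 1, haplotype on chromosome 2)


def _freq(subjects: Sequence[Subject], target: str) -> float:
    """Frequency of `target` among all 2N haplotypes of a group."""
    hits = sum((h1 == target) + (h2 == target) for h1, h2 in subjects)
    return hits / (2 * len(subjects))


def permutation_p_value(cases: Sequence[Subject], controls: Sequence[Subject],
                        target: str, n_perm: int = 10_000, seed: int = 1) -> float:
    """P-value for the case-control frequency difference of one haplotype."""
    rng = random.Random(seed)
    observed = abs(_freq(cases, target) - _freq(controls, target))
    pooled = list(cases) + list(controls)
    n_case = len(cases)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign case/control labels to whole subjects
        stat = abs(_freq(pooled[:n_case], target) - _freq(pooled[n_case:], target))
        if stat >= observed - 1e-12:  # small tolerance so float ties count
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one estimate, never exactly zero
```

Shuffling subjects rather than individual chromosomes keeps the null distribution honest about the pairing of haplotypes within each person; a full implementation would also propagate phase uncertainty, which this sketch ignores.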

Case–Control Sample Description. Breast cancer cases consisted of 1,006 female patients with histologically confirmed breast cancer who presented for treatment or consultation at Memorial Sloan-Kettering Cancer Center from January 2000 through December 2001. DNA was obtained from peripheral blood samples. Information was obtained on age at diagnosis of breast cancer, age at donation of the blood sample, histologic confirmation of the breast cancer diagnosis, sex, and ethnicity. All DNA samples were permanently anonymized according to an Institutional Review Board (IRB)-approved protocol. Samples were unselected for family history or any other demographic characteristic. Control subjects were drawn from the New York Cancer Study, a cohort study modeled after the Framingham study, in which 18,187 individuals were enrolled from January 2000 through December 2002 (23). Individuals were recruited at 14 sites in the New York metropolitan area by advertisement targeted to an ethnically diverse community. Each volunteer was individually interviewed and filled out a questionnaire that took 1 hour. Participants were 30 to 65 years of age. Individuals provided informed consent for use of their DNA, which was obtained from 50 cc of whole blood. Information was gathered on age, sex, personal medical history, ethnicity (by 2000 Census nomenclature), family history, substance use, reproductive history, and medication use. From the cohort, a subset of 613 subjects was chosen who were female, free of breast cancer, and annotated with age and ethnicity. All other information, including all personal identifiers, was permanently removed according to the instructions of the participating IRBs. IRB approval for the study stipulated that no clinical information beyond case or control designation, sex, ethnicity, and age could be linked to samples after anonymization for DNA analysis. Controls were age, sex, and ethnically matched to cases. In both cases and controls, DNA was extracted from peripheral blood lymphocytes with Qiagen QIAamp Blood kits (Qiagen GmbH, Hilden, Germany) and then spectrophotometrically quantitated. Human subjects research approvals were obtained at Memorial Sloan-Kettering Cancer Center, and an exemption was obtained at the National Cancer Institute. Among the 1,626 research subjects, 7 males were excluded from the final analysis on account of their sex; their inclusion or exclusion had an insignificant impact on our statistical conclusions. The remaining 1,619 female subjects (cases and controls combined) comprised 927 European Americans, 388 Ashkenazi Jews, 149 African Americans, 81 research subjects of Hispanic ethnicity, 39 Asians, and 35 subjects for whom no ethnicity or race was provided.

Verification Data Set. On completion of genotyping and analysis of the New York Academic Medical Development Corporation (NY AMDeC) study, we were provided an incomplete genotyping data set from 298 breast cancer cases and 94 controls from a study conducted at Vanderbilt University School of Medicine. Investigators there (24) had typed five SNPs in ESR1 during the early 1990s and have recently added to their data set. Although these investigators have now typed seven SNPs [rs2077647, rs746432, intron 1 PvuII (rs2234693, or c.454–397T>C), intron 1 XbaI (rs9340799, or c.454–351A>G), exon 2 codon 160, rs1801132, and rs2228480], only four (rs746432, rs2234693, rs9340799, and rs1801132) provide sufficient information to assemble 564 useful haplotypes from 282 individuals (190 cases and 94 controls).

18 Written by Ross Lazarus at http://www.innateimmunity.net.

Table 1 Continued

Gene | Reverse primer | Normal probe | Variant probe
ESR1 | CGTTCTCCAAACTGATGACCAA | VIC-CTTTGTCCGTAAATT-MGBNFQ | 6FAM-ACTTTGTCCGTGAATT-MGBNFQ
ESR1 | GCATGTGCGATGGCTCAGT | VIC-CACACACTCTCTCTG-MGBNFQ | 6FAM-CACACACGCTCTG-MGBNFQ
ESR1 | TTCCCTTGGATCTGATGCAGTA | VIC-CATCCCAGATGCT-MGBNFQ | 6FAM-CCATCCCGGATGC-MGBNFQ
ESR1 | CACTGAAGGGTCTGGTAGGATCA | VIC-CCCCCCATACTCT-MGBNFQ | 6FAM-CCCCCGATACTCT-MGBNFQ
ESR1 | GGGTAAAATGCAGCAGGGATT | VIC-CTGCCACAGTCTG-MGBNFQ | 6FAM-CTGCCACGGTCTG-MGBNFQ
ESR2 | TGCCGAAGACCAGTCATAGC | VIC-CTTAGAGATGTAGCCCCCACC-MGBNFQ | FAM-TTAGAGATGTAGCTCCCACC-MGBNFQ
ESR2 | GCACTCACCACACTTCACCAT | VIC-CCCACTTCGTAACACT-MGBNFQ | FAM-TCCCACTTCATAACACT-MGBNFQ
ESR2 | CCATCATTAACACCTCCATCCAACA | VIC-CCAAGTACGGCTCTT-MGBNFQ | FAM-AAGTGCGGCTCTT-MGBNFQ
ESR2 | CCAGGCTCCTGACACACT | VIC-CACGCTTCAGCTTGTGA-MGBNFQ | FAM-ACGCTTCAGCCTGTGA-MGBNFQ
ESR2 | GGTTTTTAACCACATAACTAACTTCAAAGTATTTTAACT | VIC-CACTTCAATTTCCC-MGBNFQ | FAM-CACTTCAGTTTCCC-MGBNFQ
ESR2 | ACACAGTTCCTAACCTGCATCTG | VIC-CAGCTGTGCCTGCCAA-MGBNFQ | FAM-AGCTGTGGCTGCCAA-MGBNFQ
PGR | ACTCAAATGACAAGTGAAGCTAGTTCTC | VIC-CCACACGCGCAAA-MGBNFQ | 6FAM-CACACGCACAAAT-MGBNFQ
PGR | TGCGACGGCAATTTAGTGACA | CGGCTCCTTTATCTC | CGGCTCTTTTATCTC
PGR | CGGGTACGCGCAGTCG | VIC-AGGGTGAACTCCG-MGBNFQ | 6FAM-AGGGTGAAGTCCG-MGBNFQ
PGR | CGCCCGCTCTAAAGATAAA | FAM-AGGCCTCCGCACCTTCC-TAMRA [Propynyl-T] | JOE-TCCGCGCCTTCCTCCT-TAMRA [Propynyl-T]
PGR | GACCATGCCAGCCTGACA | VIC-CACTTTCTAAGGCGACATG-MGBNFQNFQ | FAM-CACTTTCTAAGTCGACATG-MGBNFQNFQ
PGR | GGGCTTGGCTTTCATTTGG | VIC-ACAGCCAGTGGGC-MGBNFQ | 6FAM-AGCCATTGGGCGTT-MGBNFQ
PGR | TCAGGTGCAAAATACAGCATCTG | VIC-CACTGACGTGTTTGTA-MGBNFQ | 6FAM-CCCACTGACATGTT-MGBNFQ


RESULTS

Linkage Disequilibrium. Both PGR and ESR2 SNPs show a large amount of linkage disequilibrium (Fig. 2B and C); thus, only a small number of haplotypes are observed when frequencies are estimated by EM. ESR1 haplotypes indicated only moderate linkage disequilibrium in the region as represented by the D′ statistic, and more modest linkage disequilibrium as measured by the squared correlation coefficient (R²; Fig. 2A). Total haplotype frequencies were estimated by EM and PHASE as described in Materials and Methods; these estimates showed 585 haplotypes for 1,626 individuals typed at 17 loci. Codominant segregation of the alleles at five ESR1 SNPs (rs851984, ESR1002, rs2077647, rs1801132, and rs2228480) was confirmed by screening several Centre d'Etudes du Polymorphisme Humain (CEPH) families. European-American allele frequencies in the NY AMDeC study were compared with those of CEPH founders, and each set was used separately for linkage disequilibrium estimation (data not shown). Our linkage disequilibrium measurements in ESR1 were consistent with those provided by Zuppan et al. (25).
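To make the EM estimation and the two LD measures concrete, here is a minimal two-SNP sketch. It follows the document's convention of coding the more common allele as "1", takes genotypes as counts of allele "2" (0, 1, or 2; an assumed input format), uses EM only to resolve the phase of double heterozygotes, and then computes D′ and r² from the estimated haplotype frequencies. Programs such as PHASE or SNPEM handle many loci and missing data; this sketch does not. D′ is returned signed here, whereas LD plots usually display |D′|.

```python
from __future__ import annotations

HAPS = ("11", "12", "21", "22")  # allele at SNP A, allele at SNP B ("1" = common)


def _alleles(g: int) -> tuple[str, str]:
    """Map a genotype coded as the count of allele '2' to an allele pair."""
    return {0: ("1", "1"), 1: ("1", "2"), 2: ("2", "2")}[g]


def em_two_snp(genotypes: list[tuple[int, int]], tol: float = 1e-10,
               max_iter: int = 1000) -> dict[str, float]:
    """EM estimate of two-SNP haplotype frequencies from unphased genotypes."""
    freqs = {h: 0.25 for h in HAPS}
    n_chrom = 2 * len(genotypes)
    for _ in range(max_iter):
        counts = {h: 0.0 for h in HAPS}
        for ga, gb in genotypes:
            if ga == 1 and gb == 1:
                # Double heterozygote: phase is either 11/22 or 12/21.
                cis = freqs["11"] * freqs["22"]
                trans = freqs["12"] * freqs["21"]
                w = cis / (cis + trans) if cis + trans > 0 else 0.5
                counts["11"] += w
                counts["22"] += w
                counts["12"] += 1 - w
                counts["21"] += 1 - w
            else:
                # At most one heterozygous locus: phase is unambiguous.
                for a, b in zip(_alleles(ga), _alleles(gb)):
                    counts[a + b] += 1
        new = {h: c / n_chrom for h, c in counts.items()}
        delta = max(abs(new[h] - freqs[h]) for h in HAPS)
        freqs = new
        if delta < tol:
            break
    return freqs


def ld_stats(freqs: dict[str, float]) -> tuple[float, float]:
    """Return (D', r^2) for two biallelic SNPs from haplotype frequencies."""
    p_a = freqs["11"] + freqs["12"]  # frequency of allele 1 at SNP A
    p_b = freqs["11"] + freqs["21"]  # frequency of allele 1 at SNP B
    d = freqs["11"] - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d_prime, r2
```

D′ and r² can disagree sharply: D′ reaches 1 whenever one of the four haplotypes is absent, while r² reaches 1 only when the two SNPs also have matching allele frequencies, which is one way to read the moderate-D′ but more modest-R² pattern described for ESR1.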

Analysis of Individual Single Nucleotide Polymorphisms in ESR1. Seventeen SNPs were typed in ESR1, with an average spacing of 25,606 bp. Fifteen of these 17 variants are in the public dbSNP database, whereas two are unique to Celera (hCV1576295 and ESR1002). The latter, ESR1002, is a T/G SNP that Celera places in the promoter region upstream of untranslated exon 1C. It lies 1831 bp upstream of the transcription start site specified by NM_000125 in the April 2003 (Build 33, hg15) version of the human genome map (footnote 10). After checking Hardy–Weinberg equilibrium in each of the six sample groups (Asians, African Americans, Hispanics, Ashkenazi Jews, unknown ethnicity, and European Americans), we carried out a two-by-two contingency table analysis of each SNP in every gene against affection status. Every control group was in Hardy–Weinberg equilibrium as long as cell counts did not fall to single digits. Where cell counts were very small, as in the Asian population, Fisher's exact test showed no departures from Hardy–Weinberg equilibrium in the controls. Each population was also stratified into two age categories: age ≤50 and age >50. A comparison of the age match between cases and controls is available as supplementary data at ftp://ftp.ncifcrf.gov/pub/users/goldb/ in a folder labeled CANCERRESEARCH (footnote 9). Differences in genotype distributions between cases and controls were tested with the χ² test, Fisher's exact test, and a Monte Carlo test. No SNP was distributed differently between aggregate cases and controls to a statistically significant degree. However, when the population was stratified into an age 50-and-under group and an over-50 group, three ESR1 SNPs, ESR1002, rs2077647, and rs827421, were significantly associated with disease in the Jewish population (Fisher's exact test P-values of 0.01, 0.001, and 0.003, respectively; see Fig. 3). ESR1002 is located at a putative promoter site, rs2077647 is a synonymous SNP in exon 1 often described as S10S, and rs827421 is located in intron 1. Among Ashkenazi Jewish control subjects over age 50, 82%, 43%, and 57% carried the more common allele at the three SNPs, respectively, whereas 89%, 56%, and 45% of cases carried that allele. For these two SNPs, both the 3 × 2 genotype distributions and the 2 × 2 allele distributions departed significantly from expectation. These SNPs implicate the A/B region of the steroid hormone receptor, which encodes the ligand-dependent transactivation domain. As expected, the haplotype 11 (T-C) formed by SNPs ESR1002 and rs2077647, the most common haplotype among older Ashkenazi Jews (50% of all their haplotypes), is associated with an increased risk of breast cancer among Ashkenazi Jews over 50 years of age [odds ratio (OR), 1.706; 95% confidence interval (CI), 1.213–2.399; P = 0.05 from SNPEM for the dominant model; OR, 2.916; 95% CI, 1.598–5.320 for the recessive model]. The details of the contingency tables are available as supplementary data at ftp://ftp.ncifcrf.gov/pub/users/goldb/ in a folder labeled CANCERRESEARCH (footnote 9).

Fig. 1. Maps of typed steroid receptor gene SNPs. A, typed ESR1 polymorphisms. Top bar, all 17 typed polymorphisms with their dbSNP names, including the three synonymous polymorphisms typed in the coding region (rs2077647 is ESR1390, also known as S10S, in exon 1; rs1801132 is ESR1071, P325P, in exon 4; and rs2228480 is ESR1031, T594T, in exon 8). SYNE1, Nuclear Envelope Spectrin Repeat Protein 1, also known as Synaptic Nuclear Envelope Protein 1. Bottom bar, only those SNPs mapping to the transcribed (RefSeq) region of the gene. B, typed ESR2 polymorphisms. SYNE2, Nuclear Envelope Spectrin Repeat Protein 2, also known as Synaptic Nuclear Envelope Protein 2. C, typed PGR polymorphisms.

ESTROGEN RECEPTOR GENES AND BREAST CANCER
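The Hardy–Weinberg check described above can be illustrated with the usual large-sample goodness-of-fit test. The sketch below is an illustration only, not the authors' pipeline: it assumes genotype counts for one biallelic SNP in one control group (the counts in the usage lines are hypothetical), derives expected counts from the observed allele frequency, and converts the 1-degree-of-freedom χ² statistic to a P-value with `math.erfc`. For groups with very small cells, such as the Asian controls, the text relied on Fisher's exact test instead, which this sketch does not implement; the function name `hwe_chi_square` is hypothetical.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Hardy-Weinberg goodness-of-fit test for one biallelic SNP.

    n_aa, n_ab, n_bb are observed genotype counts in a single group.
    Returns (chi_square, p_value) for a test with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip empty expected cells (a monomorphic SNP contributes nothing).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical control-group counts, for illustration only.
    print(hwe_chi_square(25, 50, 25))  # counts exactly in equilibrium
    print(hwe_chi_square(50, 0, 50))   # no heterozygotes: strong departure
```

Counts of 25, 50, and 25 sit exactly at equilibrium, so the statistic is zero and the P-value is 1; removing all heterozygotes produces a very large statistic.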

Typing eight haplotype-tagging SNP (htSNP) sites in the ESR1 locus (rs851984, rs2881766, rs2077647, hCV1576295, rs1801132, rs6905370, rs2228480, and rs3798577) allowed us to identify individual haplotypes associated with either an increased or a decreased risk of breast cancer (see Table 2).
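The numeric haplotype codes in Table 2 write each htSNP's allele as 1 or 2. The sketch below translates a code into alleles; it is an illustration, not the authors' software. The allele assigned to each digit was read off the code/allele pairs in Table 2 (for example, H1 11111111 is G-T-C-T-G-G-G-T and H14 22111211 is A-G-C-T-G-A-G-T), and the assumption that the seventh htSNP's "2" allele is A comes from the "G or A" wildcard listed for H6b. The table name `HTSNP_ALLELES` and the function `decode` are hypothetical.

```python
# Hypothetical allele table for the eight ESR1 htSNPs, in the order given in
# the text. Each entry is (SNP, allele coded "1", allele coded "2"), read off
# the code/allele pairs in Table 2; the "2" allele of rs2228480 is inferred
# from the "G or A" wildcard listed for H6b.
HTSNP_ALLELES = [
    ("rs851984", "G", "A"),
    ("rs2881766", "T", "G"),
    ("rs2077647", "C", "T"),
    ("hCV1576295", "T", "C"),
    ("rs1801132", "G", "C"),
    ("rs6905370", "G", "A"),
    ("rs2228480", "G", "A"),
    ("rs3798577", "T", "C"),
]


def decode(code: str, start: int = 0) -> str:
    """Translate a numeric haplotype code (e.g. '11222111') into alleles.

    `start` is the 0-based index of the first htSNP the code covers, so a
    sub-haplotype spanning htSNPs 3-7 is decoded with start=2.
    """
    if start < 0 or start + len(code) > len(HTSNP_ALLELES):
        raise ValueError("code runs past the eight htSNPs")
    alleles = []
    for offset, digit in enumerate(code):
        _snp, allele_1, allele_2 = HTSNP_ALLELES[start + offset]
        if digit == "1":
            alleles.append(allele_1)
        elif digit == "2":
            alleles.append(allele_2)
        else:
            raise ValueError(f"unexpected digit {digit!r} in {code!r}")
    return "-".join(alleles)


if __name__ == "__main__":
    print(decode("11222111"))        # H6
    print(decode("11221", start=2))  # H2a over rs2077647..rs2228480
```

For example, `decode("11221", start=2)` yields the five-SNP H2a alleles C-T-C-A-G described for rs2077647 through rs2228480.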

Protective Haplotypes of ESR1. Three haplotypes, H4, H6, and H13, showed statistically significant protection among all female research subjects. When only female European-American study subjects were evaluated, only H6 conferred statistically significant protection. We sought to localize the protective signal further by two methods: (a) redefining the haplotype with the rapid redefinition feature of SNPEM and (b) stratifying the population by age. In research subjects over 50 bearing H6, significant protection could be localized to the first six SNPs of the eight-SNP haplotype, designated H6b (see Table 2). H6b was only marginally significant with the Dirichlet method in Northern Europeans. Peak significance of the protective haplotype was obtained when only the first six htSNPs were used to estimate haplotypes in the older population sample (P = 0.004). With these parameters, the six-SNP haplotype 112221 (G-T-T-C-C-G), present at a frequency of 1.8% among the older research subjects, gave an OR of 0.330 (95% CI, 0.136–0.799). PHASE 2.02, a Bayesian haplotype estimation method, run in case–control mode, confirmed the statistical significance of the association (data not shown).

Fig. 3. Histograms and contingency table analyses of ESR1 SNPs significantly associated with susceptibility in presumptively postmenopausal Ashkenazi-Jewish patients and controls. A, genotype counts of the putative promoter SNP ESR1002 in the over-age-50 Ashkenazi-Jewish cases and controls under study; the case–control genotype comparison in the 3 × 2 contingency table gave a P-value of 0.02 by Fisher's exact test. B, allele counts of the ESR1002 SNP in the over-age-50 Ashkenazi-Jewish cases and controls; the Fisher's exact test P-value for this comparison is 0.012. C, genotype counts of the S10S SNP in the over-age-50 Ashkenazi-Jewish cases and controls; this genotype comparison gave a P-value of 0.0025 by Fisher's exact test. D, allele counts of the S10S SNP in the over-age-50 Ashkenazi-Jewish cases and controls, giving a P-value of 0.0025 and an OR of 1.706 (95% CI, 1.213–2.399) for rs2077647 alone. E, distribution of ESR1 intron 1 genotypes (rs827421) in Ashkenazi-Jewish cases and controls over age 50; the χ² P-value for this comparison is 0.004. F, allele counts for the same comparison of cases and controls in the over-age-50 Ashkenazi-Jewish population; this SNP gives a P-value of 0.003. Contingency table details for these SNPs and the resulting haplotype can be found at ftp://ftp.ncifcrf.gov/pub/users/goldb/ in a folder labeled CANCERRESEARCH (footnote 9).

Fig. 2. Linkage disequilibrium measures for the ESR1, ESR2, and PGR SNPs in European-American controls. In each panel, the upper right triangle plots D′ and the lower left triangle plots the squared correlation coefficient (Pearson's R²). A, linkage disequilibrium statistics for ESR1, with displacements relative to coordinates in NCBI genome build 33 (April 2003; footnote 17). A legend for each disequilibrium measure appears in the lower right-hand corner of the figure. B, linkage disequilibrium statistics for ESR2. C, linkage disequilibrium statistics for PGR.
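Restricting H6 to its first six htSNPs (H6b) pools every eight-SNP haplotype that shares those six alleles. The following sketch shows only that bookkeeping step: it sums already-estimated haplotype frequencies by prefix. It is a deliberate simplification, because re-estimating haplotypes from genotypes over fewer SNPs (as SNPEM's redefinition feature and PHASE do) can give somewhat different frequencies. The function name `collapse_to_prefix` and all input frequencies are hypothetical.

```python
from collections import defaultdict


def collapse_to_prefix(freqs: dict[str, float], n_snps: int) -> dict[str, float]:
    """Pool haplotype frequencies over all haplotypes sharing the first n_snps codes.

    This is plain marginalization of frequencies that were already estimated;
    it does not re-run haplotype estimation on the reduced SNP set.
    """
    pooled: dict[str, float] = defaultdict(float)
    for haplotype, freq in freqs.items():
        pooled[haplotype[:n_snps]] += freq
    return dict(pooled)


if __name__ == "__main__":
    # Hypothetical eight-SNP haplotype frequencies (codes as in Table 2).
    eight_snp = {
        "11111111": 0.060,  # H1
        "11222111": 0.010,  # H6
        "11222112": 0.004,  # hypothetical relative of H6
        "11222211": 0.012,  # H7
    }
    print(collapse_to_prefix(eight_snp, 6))
```

With this input, H6 and its hypothetical relative both fall under 112221, the H6b code, while H7 (11222211) falls under 112222, the H6a code.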

Table 2 Selected ESR1 haplotypes

Group | n | Cases and controls | No. of 8-SNP haplotypes
  Haplotype name | Significant haplotype

All | 1619 | 1006 cases; 613 controls | 131
  H1 | 11111111 (G-T-C-T-G-G-G-T)
  H2 | 11112211 (G-T-C-T-C-A-G-T)
  H2a | 11221 ((G or A)-(T or G)-C-T-C-A-G-(T or C))
  H3 | 11211111 (G-T-T-T-G-G-G-T)
  H4 | 11212111 (G-T-T-T-C-G-G-T)
  H6 | 11222111 (G-T-T-C-C-G-G-T)
  H7 | 11222211 (G-T-T-C-C-A-G-T)
  H13 | 12211112 (G-G-T-T-G-G-G-C)

European Americans | 927 | 582 cases; 345 controls | 112
  H1 | 11111111 (G-T-C-T-G-G-G-T)
  H3 | 11211111 (G-T-T-T-G-G-G-T)
  H4 | 11212111 (G-T-T-T-C-G-G-T)
  H6 | 11222111 (G-T-T-C-C-G-G-T)
  H6b | 112221 (G-T-T-C-C-G-(G or A)-(T or C))
  H7 | 11222211 (G-T-T-C-C-A-G-T)
  H13 | 12211112 (G-G-T-T-G-G-G-C)

African Americans | 149 | 92 cases; 57 controls | 64
  H1 | 11111111 (G-T-C-T-G-G-G-T)
  H3 | 11211111 (G-T-T-T-G-G-G-T)
  H4 | 11212111 (G-T-T-T-C-G-G-T)
  H6 | 11222111 (G-T-T-C-C-G-G-T)
  H13 | 12211112 (G-G-T-T-G-G-G-C)
  H10 | 12111212 (G-G-C-T-G-A-G-C)
  H14 | 22111211 (A-G-C-T-G-A-G-T)

Ashkenazi Jews | 388 | 238 cases; 150 controls | 76
  H1 | 11111111 (G-T-C-T-G-G-G-T)
  H4 | 11212111 (G-T-T-T-C-G-G-T)
  H5 | 11212211 (G-T-T-T-C-A-G-T)
  H6 | 11222111 (G-T-T-C-C-G-G-T)
  H8 | 12111111 (G-G-C-T-G-G-G-T)
  H11 | 12121111 (G-G-C-C-G-G-G-T)
  H12 | 12211111 (G-G-T-T-G-G-G-T)
  H13 | 12211112 (G-G-T-T-G-G-G-C)

Asians | 39 | 26 cases; 13 controls | 28
  H1 | 11111111 (G-T-C-T-G-G-G-T)
  H3 | 11211111 (G-T-T-T-G-G-G-T)
  H4 | 11212111 (G-T-T-T-C-G-G-T)
  H6 | 11222111 (G-T-T-C-C-G-G-T)
  H13 | 12211112 (G-G-T-T-G-G-G-C)

Hispanics | 81 | 49 cases; 32 controls | 47
  H1 | 11111111 (G-T-C-T-G-G-G-T)
  H3 | 11211111 (G-T-T-T-G-G-G-T)
  H4 | 11212111 (G-T-T-T-C-G-G-T)
  H6 | 11222111 (G-T-T-C-C-G-G-T)
  H13 | 12211112 (G-G-T-T-G-G-G-C)

Other ethnicities | 35 | 19 cases; 16 controls | 36
  H1 | 11111111 (G-T-C-T-G-G-G-T)
  H3 | 11211111 (G-T-T-T-G-G-G-T)
  H4 | 11212111 (G-T-T-T-C-G-G-T)
  H6 | 11222111 (G-T-T-C-C-G-G-T)
  H13 | 12211112 (G-G-T-T-G-G-G-C)

Age 50 and under | 606† | 447 cases; 159 controls | 103
  H1 | 11111111 (G-T-C-T-G-G-G-T)
  H3 | 11211111 (G-T-T-T-G-G-G-T)
  H4 | 11212111 (G-T-T-T-C-G-G-T)
  H6 | 11222111 (G-T-T-C-C-G-G-T)
  H9 | 12111211 (G-G-C-T-G-A-G-T)
  H13 | 12211112 (G-G-T-T-G-G-G-C)

Age over 50 | 1008† | 559 cases; 454 controls | 115
  H1 | 11111111 (G-T-C-T-G-G-G-T)
  H3 | 11211111 (G-T-T-T-G-G-G-T)
  H4 | 11212111 (G-T-T-T-C-G-G-T)
  H6 | 11222111 (G-T-T-C-C-G-G-T)
  H6b | 112221 (G-T-T-C-C-G-(G or A)-(T or C))
  H6a | 112222 (G-T-T-C-C-A-(G or A)-(T or C))
  H8 | 12111111 (G-G-C-T-G-G-G-T)
  H13 | 12211112 (G-G-T-T-G-G-G-C)

Abbreviations: H, haplotype; NS, not significant.
† Five individuals did not provide age.
‡ No controls of this ethnicity with this haplotype.


Susceptible Haplotypes of ESR1. Among all female study participants, H3 and H4 showed statistically significant susceptibility to breast cancer. H3 showed susceptibility when only older study subjects were considered. This haplotype (H3) was very rare among African Americans and did not show significant susceptibility in study participants aged 50 or under. Haplotype H8, 12111111 (G-G-C-T-G-G-G-T), was not associated with an increased risk of breast cancer in the data set as a whole but was significant by two measures (P = 0.014 from SNPEM and P = 0.02 from haplo.score) when only self-identified Ashkenazi-Jewish research subjects were analyzed, although the 95% CI encompasses unity when males are excluded from the analysis [OR, 3.292; 95% CI, 0.945–11.471 (OR, 3.706; 95% CI, 1.076–12.757; P = 0.01 with males included in the analysis)]. A separate susceptible haplotype localized to five of the eight htSNPs, distal to the protective haplotype. In this analysis, rs2077647, hCV1576295, rs1801132, rs6905370, and rs2228480 formed haplotype H2a, 11221 (C-T-C-A-G), which accounted for 2.6% of haplotypes in the study subjects (P = 0.001 for risk; OR, 1.776; 95% CI, 1.001–3.152). This is consistent with a susceptible haplotype mapping to exon 4 and intron 4, which might implicate the ligand-binding domain of ERα.

Table 2 Continued

Haplotype name | P value using Dirichlet distribution from SNPEM | P value from haplo.score | Proportion of haplotypes in group (using SNPhap) | OR (95% CI)
ORs are risk ratios computed in SPSS from SNPhap haplotype estimates, with all other haplotypes as the referent.

All
  H1 | NS | NS | 6.0% | 1.260 (0.949–1.672)
  H2 | NS | 0.03 | 1.3% | 1.562 (0.819–2.979)
  H2a | 0.0040 | 0.01 | 2.6% | 1.776 (1.001–3.152)
  H3 | 0.0260 | NS | 2.6% | 2.108 (1.180–3.767)
  H4 | NS | NS | 1.0% | 0.390 (0.168–0.904)
  H6 | NS | 0.002 | 1.0% | 0.379 (0.171–0.838)
  H7 | NS | NS | 1.2% | 0.775 (0.441–1.361)
  H13 | 0.0080 | 0.003 | 1.1% | 0.364 (0.159–0.834)

European Americans
  H1 | NS | NS | 5.9% | 1.297 (0.883–1.903)
  H3 | NS | NS | 2.9% | 1.314 (0.708–2.439)
  H4 | NS | NS | 1.4% | 0.524 (0.201–1.364)
  H6 | NS | 0.004 | 1.0% | 0.261 (0.080–0.851)
  H6b | NS | 0.004 | 1.6% | 0.293 (0.109–0.784)
  H7 | NS | 0.008 | 2.2% | 0.709 (0.409–1.229)
  H13 | NS | NS | 1.3% | 0.738 (0.343–1.586)

African Americans
  H1 | NS | NS | 4.8% | 2.188 (0.588–8.136)
  H3 | NS | NS | 0.5% (one case) | –
  H4 | NS | NS | 1.5% | –
  H6 | NS | NS | None | –
  H13 | NS | NS | 4.1% | 0.840 (0.283–2.490)
  H10 | NS | NS | 1.7% (no cases) | –
  H14 | 0.011 | NS | 1.3% (no cases) | –

Ashkenazi Jews
  H1 | NS | NS | 7.3% | 1.251 (0.730–2.143)
  H4 | NS | NS | None | –
  H5 | NS | 0.05 | 0.6% | 0.426 (0.071–2.567)
  H6 | NS | NS | 0.8% | 0.641 (0.128–3.197)
  H8 | 0.014 | 0.02 | 2.5% | 3.292 (0.945–11.471)
  H11 | NS | NS | 0.3% | 0.634 (0.127–3.162)
  H12 | NS | NS | 1.4% | 0.642 (0.090–4.580)
  H13 | NS | NS | 0.6% | 0.317 (0.029–3.546)

Asians
  H1 | NS | NS | 8.0% | –
  H3 | NS | NS | 4.3% | –
  H4 | NS | NS | None | –
  H6 | NS | NS | None | –
  H13 | NS | NS | None | –

Hispanics
  H1 | NS | NS | None | –
  H3 | NS | NS | 4.2% | 3.434 (0.391–30.174)
  H4 | NS | NS | None | –
  H6 | NS | NS | None | –
  H13 | NS | NS | None | –

Other ethnicities
  H1 | NS | NS | 9.1% | –
  H3 | NS | NS | None | –
  H4 | NS | NS | None | –
  H6 | NS | NS | None | –
  H13 | NS | NS | None | –

Age 50 and under
  H1 | NS | NS | 4.1% | 0.775 (0.386–1.556)
  H3 | NS | NS | 5.6% | 1.050 (0.620–1.776)
  H4 | NS | NS | 0.6% | ‡
  H6 | NS | NS | None | –
  H9 | NS | 0.05 | 1.5% | 0.561 (0.216–1.461)
  H13 | NS | NS | 0.6% | 0.239 (0.040–1.436)

Age over 50
  H1 | NS | NS | 5.9% | 1.252 (0.860–1.824)
  H3 | NS | NS | 1.8% | 4.103 (1.184–14.220)
  H4 | NS | NS | 0.7% | 0.202 (0.023–1.812)
  H6 | 0.003 | 0.001 | 1.2% | 0.344 (0.132–0.899)
  H6b | 0.004 | 0.0003 | 1.8% | 0.330 (0.136–0.799)
  H6a | NS | 0.04 | 2.1% | 0.776 (0.449–1.340)
  H8 | NS | NS | 1.2% | 0.901 (0.364–2.228)
  H13 | NS | 0.01 | 1.4% | 0.621 (0.271–1.423)


We reanalyzed the ESR1 data for association with breast cancer using two blocks of SNPs in clear linkage disequilibrium, i.e., SNPs 2, 3, 4, 5, 6, and 7 as one block and SNPs 10, 11, 12, and 13 as another. We found some evidence of susceptibility in older European-American study participants (P = 0.05) for the haplotype 221, or G-A-C, of rs1884051-rs6905370-rs926778. The ORs varied only between 1.4 and 1.8, with each 95% CI overlapping unity, yet significance with the SNPEM permutation algorithm (22) was preserved among older European Americans (P = 0.05). This haplotype was very rare in Jews, observed only twice, both times among cases.
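The text does not describe the internals of the SNPEM permutation algorithm. As a hedged illustration of the general idea only, the sketch below runs a plain label-permutation test: it assumes each subject's haplotype count (0, 1, or 2 copies) is already known, shuffles case and control labels, and reports how often a shuffled difference in mean count is at least as large as the observed one. The function name `permutation_p_value`, its arguments, and the example counts are hypothetical.

```python
import random


def permutation_p_value(case_values, control_values, n_perm=10_000, seed=0):
    """Two-sided label-permutation test on a difference in group means.

    Each value is, for example, the number of copies (0, 1 or 2) of one
    haplotype that a subject carries. Case/control labels are shuffled
    n_perm times; the P-value counts shuffles whose absolute difference in
    means is at least the observed one, with the usual +1 correction.
    """
    n_cases = len(case_values)
    pooled = list(case_values) + list(control_values)
    if n_cases == 0 or n_cases == len(pooled):
        raise ValueError("both groups must be non-empty")

    def mean_difference(values):
        cases, controls = values[:n_cases], values[n_cases:]
        return sum(cases) / len(cases) - sum(controls) / len(controls)

    observed = abs(mean_difference(pooled))
    rng = random.Random(seed)  # seeded so results are reproducible
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Small tolerance so floating-point ties count as "as extreme".
        if abs(mean_difference(pooled)) >= observed - 1e-12:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical per-subject haplotype counts.
    cases = [1, 0, 2, 1, 0, 1, 1, 0]
    controls = [0, 0, 1, 0, 0, 0, 1, 0]
    print(permutation_p_value(cases, controls, n_perm=2000))
```

Unlike the haplotype-based methods cited in the text, this version treats haplotype counts as observed rather than re-estimating them on every permutation.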

Results from the Vanderbilt Validation Data Set. One four-SNP haplotype, from a genotype-complete subset of the Vanderbilt case–control study (see Materials and Methods), was significantly associated with breast cancer when analyzed with SNPEM and when ORs were computed with SNPhap and SPSS. This haplotype, 1211 (C-C-A-G), gave a P-value of 0.05 by comparison with a permuted distribution in SNPEM and P = 0.004 by Fisher's exact test (OR, 4.619; 95% CI, 1.378–15.481). The haplotype was present in 29 (about 5%) of the estimated 568 haplotypes. Among cases, 26 of 345 weighted haplotypes were 1211 (C-C-A-G); among 173 weighted control haplotypes, only 3 were 1211 (C-C-A-G). This data set and haplotype also showed a greater risk of breast cancer for carriers over age 50 (P = 0.01 by Fisher's exact test); limiting the analysis to that group gave an OR of 7.9 (95% CI, 1.025–60.868). When weighted probabilities and SNPhap were used to determine haplotypes in subjects over age 50, there were 14 cases and only 1 control with this haplotype. Whether 1211 (C-C-A-G) in the Vanderbilt data set overlaps with, or is identical to, one of the haplotypes in the NY AMDeC data set cannot be determined at this time, because only a single SNP was typed in common in the two case–control data sets (rs2077647, or S10S).
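The Vanderbilt counts above form a 2 × 2 table of haplotypes: 26 of 345 case haplotypes and 3 of 173 control haplotypes are 1211. The sketch below computes an odds ratio with a Woolf (log-based) 95% CI and Fisher's exact test from the hypergeometric distribution. The function names are hypothetical, the weighted haplotype counts are treated as whole numbers, and the text does not say which CI method or which Fisher's exact test variant was used. With these counts the Woolf interval gives an OR of about 4.619 (95% CI, about 1.378–15.481), in line with the reported values; the one-sided Fisher P comes out near 0.004 and the two-sided P near 0.007.

```python
import math


def odds_ratio_woolf(a, b, c, d, z=1.959964):
    """Odds ratio and Woolf (log-scale) 95% CI for the 2 x 2 table
    [[a, b], [c, d]]: rows are cases and controls, columns are
    "carries the haplotype" and "does not". All cells must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se_log), math.exp(log_or + z * se_log)


def fisher_exact(a, b, c, d):
    """Fisher's exact test for the same table layout.

    Returns (one-sided P for an excess in cell a, two-sided P), where the
    two-sided P sums every table no more probable than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def pmf(k):
        # Hypergeometric probability that cell a equals k, margins fixed.
        return math.comb(col1, k) * math.comb(n - col1, row1 - k) / math.comb(n, row1)

    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    probs = {k: pmf(k) for k in range(lo, hi + 1)}
    p_obs = probs[a]
    one_sided = sum(p for k, p in probs.items() if k >= a)
    # Relative tolerance guards against rounding when comparing probabilities.
    two_sided = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))
    return min(one_sided, 1.0), min(two_sided, 1.0)


if __name__ == "__main__":
    # Vanderbilt haplotype counts from the text, treated as whole numbers:
    # cases 26 of 345 carry 1211, controls 3 of 173.
    a, b, c, d = 26, 345 - 26, 3, 173 - 3
    print(odds_ratio_woolf(a, b, c, d))
    print(fisher_exact(a, b, c, d))
```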

Susceptible Haplotype in ESR2 among Ashkenazi Jews. The 388 self-identified Ashkenazi Jewish females in the study were typed for the eight ESR2 markers at 14q23.2 described in Fig. 1. One haplotype, E2H1, formed from the last seven of these SNPs beginning with rs1256030, AAAAAAA (T-C-G-G-T-A-C), showed statistically significant susceptibility to breast cancer (P = 0.037; OR, 2.317; 95% CI, 1.042–5.155; see Table 3). This was the third most common ESR2 haplotype among the Ashkenazi Jews tested, making up 11.7% of the predicted haplotypes. The 382 Ashkenazim in the study for whom TaqMan provided complete genotypes were further investigated with SNPEM. When the haplotype under investigation was reduced to the final five 3′ SNPs in ESR2, a new but related haplotype, E2H5, AAAAA (C-G-G-T-A), was statistically significant among the Ashkenazim (P = 0.001; OR, 1.82; 95% CI, 1.213–2.737). The SNPs responsible for the haplotype association were localized with the SNPEM permutation algorithm (Table 3). This implicated the four distal (3′) SNPs of ESR2 as giving rise to the significant haplotype, because the exon 4 C or T polymorphism identified in this study is invariant among the Ashkenazi Jewish subjects.

DISCUSSION

These results suggest that part of the hereditary predisposition to breast cancer can be accounted for by allelic polymorphisms in genes of the steroid hormone pathway. Individual differences in hormonal regulation may result from haplotypes that confer either an increased risk of breast cancer or protection from it in a subset of the population.

Although possible linkage between ESR1 and late-onset breast cancer was reported in 1991 (25) and several published studies have found statistically significant associations between ESR1 polymorphisms and breast cancer (24, 26–30), other studies have shown neither linkage nor association (31–33). Each of these studies either had relatively few samples or examined only a few SNPs, and none attempted to estimate haplotypes for association analysis. Recently, a protective association with breast cancer was reported for a GT dinucleotide repeat polymorphism 6627 bp upstream of the transcription start site of ESR1 exon 1 in a large, ethnically homogeneous Han Chinese cohort (GT18 allele; OR, 0.58; 95% CI, 0.36–0.94; ref. 29). The same group had previously reported a PvuII polymorphism in intron 1 that was associated with increased breast cancer risk (genotype pp; OR, 1.4; 95% CI, 1.1–1.8; ref. 30). Data suggest that the most frequent allele of this common PvuII polymorphism eliminates a functional binding site for the transcription factor B-myb and may therefore down-regulate ESR1 transcription (34, 35). Although we have not typed the variants described by Cai et al. (29, 30), the magnitude and direction of the protective effects they observed are consistent with the haplotype data reported here.

Since the discovery of ESR2 in 1996 (36, 37), several groups have characterized its unique expression profile (38–44), but few have searched for polymorphisms associated with breast cancer risk (45–47). Five ESR2 polymorphisms have been identified in the German population (48), one of which, rs1256049, provided evidence for an association with anorexia nervosa. The same ESR2 polymorphism had a highly significant association with ovulatory dysfunction in a Chinese population (49). In addition, an intragenic CA repeat polymorphism in ESR2 has been associated with bone mineral density in a Japanese research subject population (50). More recently, the Shanghai Breast Cancer Study group reported that an ESR2 exon 7 synonymous SNP (rs1256054, L392L) conferred an increased risk of breast cancer (OR, 2.37; 95% CI, 1.18–4.77) in a robust study (47). They hypothesized that this SNP may act as an exonic splicing enhancer. We are currently typing this SNP; preliminary results indicate that it is quite rare and will likely be uninformative in our European-American and Ashkenazi-Jewish populations (data not shown).

Table 3 Significant ESR2 haplotypes in Ashkenazi-Jewish NY AMDeC research subjects

Haplotype name | Haplotype (SNPEM code; alleles) | Haplotype % among Ashkenazim | P-value using Dirichlet distribution from SNPEM | P-value from haplo.score | OR (95% CI)
E2H1 | XAAAAAAA; (T or G)-T-C-G-G-T-A-C | 11.8% | 0.037 | 0.03766 | 2.317 (1.042–5.155)
E2H2 | XAAAAAA; (T or G)-T-C-G-G-T-A | 11.8% | 0.002 | 0.00257 | 2.043 (1.243–3.358)
E2H3 | XAAAAA; (T or G)-T-C-G-G-T | 11.8% | 0.002 | 0.00344 | 1.951 (1.195–3.184)
E2H4 | XXAAAAAA; (T or G)-(T or C)-C-G-G-T-A-C | 11.8% | 0.009 | 0.01391 | 2.339 (1.140–4.797)
E2H5 | XXAAAAA; (T or G)-(T or C)-C-G-G-T-A | 17.6% | 0.001 | 0.00267 | 1.822 (1.213–2.737)

Note. Positions in the haplotypes above represent variants in the ESR2 gene typed in Ashkenazi-Jewish research subjects in this study. The most common alleles were coded as "A" for SNPEM input, and X marks a position at which either allele is allowed. The most common alleles were "T" in rs1271572, "T" in rs1256030, "C" in EX4CorT, "G" in rs1256049, "T" in rs4986938, "A" in rs928554, and "G" in rs1255998. "C" in EX4CorT was invariant among the Ashkenazi Jews tested.
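Table 3 codes the most common allele at each ESR2 SNP as A and marks positions allowed to take either allele with X, for example (T or G). The sketch below shows, under stated assumptions, how such a pattern could be matched against estimated haplotypes and their frequencies pooled, which is how a wider pattern such as E2H5 can cover more haplotypes than E2H1. Coding the less common allele as B, the helper names, and the four-SNP toy alleles and frequencies are all hypothetical; the sketch is not SNPEM's input format.

```python
def snpem_code(alleles: str, major: str) -> str:
    """Recode a haplotype written one allele per character.

    'A' marks the most common allele at that SNP, as in Table 3; coding
    every other allele as 'B' is an assumption made for this sketch.
    """
    if len(alleles) != len(major):
        raise ValueError("haplotype and major-allele strings differ in length")
    return "".join("A" if a == m else "B" for a, m in zip(alleles, major))


def matches(pattern: str, code: str) -> bool:
    """True if code fits pattern; 'X' in the pattern accepts either allele."""
    return len(pattern) == len(code) and all(
        p == "X" or p == c for p, c in zip(pattern, code)
    )


def pattern_frequency(freqs: dict[str, float], pattern: str, major: str) -> float:
    """Total estimated frequency of haplotypes whose recoded form fits pattern."""
    return sum(f for hap, f in freqs.items() if matches(pattern, snpem_code(hap, major)))


if __name__ == "__main__":
    # Toy four-SNP example; alleles and frequencies are hypothetical.
    major = "TCGG"
    freqs = {"TCGG": 0.50, "GCGG": 0.10, "TTGG": 0.20, "GTGA": 0.20}
    print(pattern_frequency(freqs, "XAAA", major))  # TCGG + GCGG
    print(pattern_frequency(freqs, "XXAA", major))  # also includes TTGG
```

Each added X widens the set of matching haplotypes, so the pooled frequency can only stay the same or grow.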

We typed six intronic and 3′ SNPs in addition to six of those characterized by De Vivo et al. (51): +44 C/T, +331 G/A, S344T, G393G, V660L, and H770H. As Fig. 2C shows, we found significant linkage disequilibrium throughout the PGR gene as gauged by either the D′ or the R² statistic computed for the 12 SNPs that we typed. However, no single SNP or haplotype was significantly associated with breast cancer when we stratified by age, ethnicity, or both. We sought to determine why we could not replicate the association of the functional +331 G/A polymorphism with breast cancer risk reported by De Vivo et al. (51). Whereas they found 87% GG at this site among their cases and 90% GG among their controls, we found 93.4% and 93% among our cases and controls, respectively; and whereas they found 87% of this genotype among their postmenopausal cases and 90% among their postmenopausal controls, we found 93% GG and 93.2% GG among our older cases and controls, respectively. To determine whether this disparity could be explained by demographic differences between the Nurses' Health Study participants studied by De Vivo et al. and our research subjects, we stratified our results by both ethnicity and age. This analysis of +331 G/A genotypes in cases and controls also yielded no association.
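Fig. 2 and the PGR analysis summarize linkage disequilibrium with D′ and R². For two biallelic SNPs both follow from one haplotype frequency and the two allele frequencies: D = p_AB − p_A·p_B, D′ is D divided by the largest magnitude D can take given those allele frequencies, and r² = D² / [p_A(1 − p_A)p_B(1 − p_B)]. The sketch below implements these standard definitions; it assumes haplotype frequencies have already been estimated (for example, by EM), and the function name `ld_stats` is hypothetical.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float) -> tuple[float, float, float]:
    """Return (D, D', r^2) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B
          at SNP 2; p_a, p_b: frequencies of A and B.
    D' is returned with the sign of D; LD plots usually show |D'|.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r_squared


if __name__ == "__main__":
    # Hypothetical frequencies: A always occurs with B (complete LD).
    print(ld_stats(0.30, 0.30, 0.30))
    # Hypothetical frequencies with no association between the SNPs.
    print(ld_stats(0.06, 0.20, 0.30))
```

D′ reaches 1 whenever one of the four haplotypes is absent, whereas r² reaches 1 only when the two SNPs also have matching allele frequencies, which is why the two triangles in an LD plot can look quite different.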

This study has several limitations inherent in its design. First, to gain IRB approval, we permanently anonymized DNA samples after collecting minimal clinical annotation. It is therefore not possible to examine clinical or demographic records retrospectively for additional potential confounding variables, such as endogenous or exogenous estrogen exposure and other environmental variables. To address the effect of age at menopause, we used a surrogate marker (age ≤50 or >50).

Second, our study, although among the largest to date, still lacks the statistical power to reach firm conclusions about the relationship of ESR1, ESR2, or PGR SNPs or haplotypes to breast cancer in the specific populations tested. Our genotyping adhered to reproducibility and control standards published elsewhere (13), and we meet recently published genotyping standards (52). In addition, we demonstrated haplotype segregation in the CEPH families for the five initial SNPs provided by Celera, and we tested for and obtained Hardy–Weinberg equilibrium in each of our control populations, even though some were quite small.

Although we made an effort to verify our findings in an independent data set, this additional analysis was also underpowered. Nonetheless, the Vanderbilt verification data set confirmed the existence of an ESR1 risk haplotype, which includes the aforementioned PvuII SNP, although it may not be precisely the same haplotype identified in the NY AMDeC study or the Shanghai breast cancer study.

With regard to the statistical limitations inherent in our study design, we used a variety of methods of imputing haplotypes to provide evidence for statistically significant disease associations. Whereas Nyholt (53) has emphasized a critical need for multiple-test corrections in disease association studies to avoid Type I error, Krawczak et al. (54) and others (55, 56) have questioned the overzealous application of the Bonferroni (57) correction. To avoid errors in inference about associations that stem from the method used to determine haplotypes, we applied several different haplotype inference procedures. The methods found the common haplotypes accurately, but there was some disagreement about the rare haplotypes. With a variety of methods, some assuming underlying normal distributions, others permutation based, and still others Bayesian, we found similar haplotype frequencies in the genes under investigation. Such methods include those of Fallin et al. (15) and Fallin and Schork (22, 58), whose haplotype assignment and affection status permutation algorithm has been adopted by Schaid et al. (19) and modified by Stephens and Donnelly (59). These methods (22) use EM haplotype estimation, which assumes Hardy–Weinberg equilibrium, whereas Stephens' Gibbs sampling algorithm bases haplotype estimation on existing allele frequencies. In all cases of significant association between haplotypes and breast cancer risk, these methods provided consistent haplotype inferences.
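The EM-based methods discussed above estimate haplotype frequencies from unphased genotypes under Hardy–Weinberg equilibrium. The sketch below is a minimal two-SNP version of that idea, not SNPEM, haplo.score, or PHASE: only double heterozygotes have ambiguous phase, and each EM iteration splits them between the AB/ab and Ab/aB resolutions in proportion to the current frequency products. The genotype coding (copies of allele A at SNP 1 and of allele B at SNP 2), the function name `em_two_snp`, and the example genotypes are assumptions made for illustration.

```python
def em_two_snp(genotypes, max_iter=200, tol=1e-10):
    """EM estimate of AB, Ab, aB, ab haplotype frequencies for two SNPs.

    genotypes: iterable of (g1, g2), where g1 counts copies of allele A at
    SNP 1 and g2 counts copies of allele B at SNP 2 (each 0, 1 or 2).
    Like other EM haplotype estimators, this assumes Hardy-Weinberg
    equilibrium so that a genotype's two haplotypes pair at random.
    """
    fixed = {"AB": 0.0, "Ab": 0.0, "aB": 0.0, "ab": 0.0}
    n_double_het = 0
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1  # phase ambiguous: AB/ab or Ab/aB
            continue
        # At most one heterozygous site, so both haplotypes are determined.
        first = ("A" if g1 >= 1 else "a") + ("B" if g2 >= 1 else "b")
        second = ("A" if g1 == 2 else "a") + ("B" if g2 == 2 else "b")
        fixed[first] += 1
        fixed[second] += 1
    total = sum(fixed.values()) + 2 * n_double_het
    if total == 0:
        raise ValueError("no genotypes supplied")

    freqs = {h: 0.25 for h in fixed}
    for _ in range(max_iter):
        # E-step: expected share of double heterozygotes that are AB/ab.
        cis = freqs["AB"] * freqs["ab"]
        trans = freqs["Ab"] * freqs["aB"]
        share_cis = cis / (cis + trans) if cis + trans > 0 else 0.5
        counts = dict(fixed)
        counts["AB"] += n_double_het * share_cis
        counts["ab"] += n_double_het * share_cis
        counts["Ab"] += n_double_het * (1 - share_cis)
        counts["aB"] += n_double_het * (1 - share_cis)
        # M-step: frequencies are expected counts over all haplotypes.
        updated = {h: c / total for h, c in counts.items()}
        converged = max(abs(updated[h] - freqs[h]) for h in freqs) < tol
        freqs = updated
        if converged:
            break
    return freqs


if __name__ == "__main__":
    # Hypothetical sample: 10 AB/AB, 10 ab/ab, and 10 double heterozygotes.
    sample = [(2, 2)] * 10 + [(0, 0)] * 10 + [(1, 1)] * 10
    print(em_two_snp(sample))
```

In this hypothetical sample the only unambiguous haplotypes are AB and ab, so the estimates converge toward AB = ab = 0.5 as the double heterozygotes are assigned almost entirely to the AB/ab resolution.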

Using a variety of statistical and laboratory methods, we have discovered and validated the presence of common polymorphisms in three sex steroid hormone receptor genes. These candidate genes have functional significance for the etiology under study and therefore cannot be thought of as impartially selected for statistical association testing. With cases and controls from the same geographical area and matched for age, we analyzed patterns of linkage disequilibrium and the association of these genetic variants with affection status. We were able to define ESR1 haplotypes that were significantly associated with breast cancer risk in a North American population. Although the "protective" alleles identified are quite rare in the populations studied, and their overall contribution to disease may be quite small, this analysis of SNP genotypes provides a means of associating variants in steroid hormone receptor genes with the breast cancer phenotype. Continued study of haplotypes of candidate genes in the steroid hormone receptor signal transduction pathway will provide additional insight into the biology of breast neoplasia.

ACKNOWLEDGMENTS

The authors thank Drs. Vanessa Clark and Daniele Fallin for helpful comments on the statistical analysis. Drs. Robert Stephens and Matthew Stephens graciously provided recompiled software. The authors would like to acknowledge the New York Cancer Project, which, in connection with the publication of this study, made available biological samples from and information on control individuals. The New York Cancer Project is administered and funded by AMDeC Foundation, Inc. The content of this publication does not necessarily reflect the views of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the United States Government.

REFERENCES

1. Newman B, Mu H, Butler LM, Millikan RC, Moorman PG, King MC. Frequency of breast cancer attributable to BRCA1 in a population-based series of American women. JAMA 1998;279:915–21.

2. Vehmanen P, Friedman LS, Eerola H, et al. A low proportion of BRCA2 mutations in Finnish breast cancer families. Am J Hum Genet 1997;60:1050–8.

3. Whittemore AS, Gong G, Itnyre J. Prevalence and contribution of BRCA1 mutations in breast cancer and ovarian cancer: results from three U.S. population-based case-control studies of ovarian cancer. Am J Hum Genet 1997;60:496–504.

4. Lichtenstein P, Holm NV, Verkasalo PK, et al. Environmental and heritable factors in the causation of cancer–analyses of cohorts of twins from Sweden, Denmark, and Finland. N Engl J Med 2000;343:78–85.

5. Clark GM, McGuire WL, Hubay CA, Pearson OH, Marshall JS. Progesterone receptors as a prognostic factor in Stage II breast cancer. N Engl J Med 1983;309:1343–7.

6. Parl FF. Estrogens, estrogen receptor and breast cancer. Amsterdam: IOS Press/Ohmsha; 2000.

7. de Jong MM, Nolte IM, te Meerman GJ, et al. Genes other than BRCA1 and BRCA2 involved in breast cancer susceptibility. J Med Genet 2002;39:225–42.

8. Jordan VC. Selective estrogen receptor modulation: concept and consequences in cancer. Cancer Cell 2004;5:207–13.

9. Hanstein B, Djahansouzi S, Dall P, Beckmann MW, Bender HG. Insights into the molecular biology of the estrogen receptor define novel therapeutic targets for breast cancer. Eur J Endocrinol 2004;150:243–55.

10. Tempfer CB, Schneeberger C, Huber JC. Applications of polymorphisms and pharmacogenomics in obstetrics and gynecology. Pharmacogenomics 2004;5:57–65.

11. Lymberis SC, Parhar PK, Katsoulakis E, Formenti SC. Pharmacogenomics and breast cancer. Pharmacogenomics 2004;5:31–55.

12. Bland KI, Copeland EM. The breast: comprehensive management of benign and malignant disorders. 3rd ed. St. Louis: Saunders; 2004.


13. Clark VJ, Metheny N, Dean M, Peterson RJ. Statistical estimation and pedigreeanalysis of CCR2-CCR5 haplotypes. Hum Genet 2001;108:484–93.

14. Stephens M, Smith NJ, Donnelly P. A new statistical method for haplotype recon-struction from population data. Am J Hum Genet 2001;68:978–89.

15. Fallin D, Cohen A, Essioux L, et al. Genetic analysis of case/control data usingestimated haplotype frequencies: application to APOE locus variation and Alzhei-mer’s disease. Genome Res 2001;11:143–51.

16. Long JC. Multiple locus haplotype analysis (MLOCUS, OBSHAP, PAIRWISE),Software and documentation distributed by the author. Section on population geneticsand linkage, 2.0 ed. Bethesda, MD: Laboratory of Neurogenetics, NIAAA, NationalInstitutes of Health; 1999.

17. Clark AG. Inference of haplotypes from PCR-amplified samples of diploid popula-tions. Mol Biol Evol 1990;7:111–22.

18. Fullerton SM, Clark AG, Weiss KM, et al. Sequence polymorphism at the humanapolipoprotein AII gene (APOA2): unexpected deficit of variation in an African-American sample. Hum Genet 2002;111:75–87.

19. Schaid DJ, Rowland CM, Tines DE, Jacobson RM, Poland GA. Score tests forassociation between traits and haplotypes when linkage phase is ambiguous. Am JHum Genet 2002;70:425–34.

20. Johnson GC, Esposito L, Barratt BJ, et al. Haplotype tagging for the identification ofcommon disease genes. Nat Genet 2001;29:233–7.

21. Bafna VH, BV, Schwartz RS, Clark AG, Istrail S. Haplotypes and informative SNPselection algorithms: don’t block out information. Proceedings of the Seventh AnnualInternational Conference on Computational Molecular Biology (RECOMB) 2003:19–27.

22. Fallin D. Haplotype-based approaches to genetic case-control studies [Ph.D. Disser-tation]. Cleveland, Ohio: Case Western Reserve University; 2001.

23. Mitchell MK, Gregersen PK, Johnson S, Parsons R, Vlahov D. The New York CancerProject: rationale, organization, design, and baseline characteristics. J Urban Health2004;81:301–10.

24. Roodi N, Bailey LR, Kao WY, et al. Estrogen receptor gene analysis in estrogenreceptor-positive and receptor-negative primary breast cancer. J Natl Cancer Inst(Bethesda) 1995;87:446–51.

25. Zuppan P, Hall JM, Lee MK, Ponglikitmongkol M, King MC. Possible linkage of theestrogen receptor gene to breast cancer in a family with late-onset disease. Am J HumGenet 1991;48:1065–8.

26. Andersen TI, Heimdal KR, Skrede M, Tveit K, Berg K, Borresen AL. Oestrogenreceptor (ESR) polymorphisms and breast cancer susceptibility. Hum Genet 1994;94:665–70.

27. Iwase H, Greenman JM, Barnes DM, Hodgson S, Bobrow L, Mathew CG. Sequencevariants of the estrogen receptor (ER) gene found in breast cancer patients with ERnegative and progesterone receptor positive tumors. Cancer Lett 1996;108:179–84.

28. Wang M, Dotzlaw H, Fuqua SA, Murphy LC. A point mutation in the human estrogenreceptor gene is associated with the expression of an abnormal estrogen receptormRNA containing a 69 novel nucleotide insertion. Breast Cancer Res. Treat 1997;44:145–51.

29. Cai Q, Gao YT, Wen W, et al. Association of breast cancer risk with a GTdinucleotide repeat polymorphism upstream of the estrogen receptor-alpha gene.Cancer Res 2003;63:5727–30.

30. Cai Q, Shu XO, Jin F, et al. Genetic polymorphisms in the estrogen receptor alphagene and risk of breast cancer: results from the Shanghai Breast Cancer Study. CancerEpidemiol Biomark Prev 2003;12:853–9.

31. Southey MC, Batten LE, McCredie MR, et al. Estrogen receptor polymorphism atcodon 325 and risk of breast cancer in women before age forty. J Natl Cancer Inst(Bethesda) 1998;90:532–6.

32. Schubert EL, Lee MK, Newman B, King MC. Single nucleotide polymorphisms(SNPs) in the estrogen receptor gene and breast cancer susceptibility. J SteroidBiochem Mol Biol 1999;71:21–7.

33. Kang HJ, Kim SW, Kim HJ, et al. Polymorphisms in the estrogen receptor-alpha geneand breast cancer risk. Cancer Lett 2002;178:175–80.

34. Herrington DM, Howard TD, Brosnihan KB, et al. Common estrogen receptorpolymorphism augments effects of hormone replacement therapy on E-selectin butnot C-reactive protein. Circulation 2002;105:1879–82.

35. Schuit SC, Oei HH, Witteman JC, et al. Estrogen receptor alpha gene polymorphismsand risk of myocardial infarction. JAMA 2004;291:2969–77.

36. Kuiper GG, Enmark E, Pelto-Huikko M, Nilsson S, Gustafsson JA. Cloning of anovel receptor expressed in rat prostate and ovary. Proc Natl Acad Sci USA 1996;93:5925–30.

37. Mosselman S, Polman J, Dijkema R. ER beta: identification and characterization ofa novel human estrogen receptor. FEBS Lett 1996;392:49–53.

38. Zou A, Marschke KB, Arnold KE, et al. Estrogen receptor beta activates the humanretinoic acid receptor alpha-1 promoter in response to tamoxifen and other estrogenreceptor antagonists, but not in response to estrogen. Mol Endocrinol 1999;13:418–30.

39. Bieche I, Parfait B, Laurendeau I, Girault I, Vidaud M, Lidereau R. Quantification ofestrogen receptor alpha and beta expression in sporadic breast cancer. Oncogene2001;20:8109–15.

40. Poola I, Abraham J, Liu A. Estrogen receptor beta splice variant mRNAs aredifferentially altered during breast carcinogenesis. J Steroid Biochem Mol Biol 2002;82:169–79.

41. Poola I, Clarke R, DeWitty R, Leffall LD. Functionally active estrogen receptorisoform profiles in the breast tumors of African American women are different fromthe profiles in breast tumors of Caucasian women. Cancer (Phila) 2002;94:615–23.

42. Saji S, Omoto Y, Shimizu C, et al. Clinical impact of assay of estrogen receptor betacx in breast cancer. Breast Cancer 2002;9:303–7.

43. Omoto Y, Eguchi H, Yamamoto-Yamaguchi Y, Hayashi S. Estrogen receptor (ER)beta1 and ERbetacx/beta2 inhibit ERalpha function differently in breast cancer cellline MCF7. Oncogene 2003;22:5011–20.

44. Weihua Z, Andersson S, Cheng G, Simpson ER, Warner M, Gustafsson JA. Updateon estrogen signaling. FEBS Lett 2003;546:17–24.

45. Forsti A, Zhao C, Israelsson E, Dahlman-Wright K, Gustafsson JA, Hemminki K.Polymorphisms in the estrogen receptor beta gene and risk of breast cancer: noassociation. Breast Cancer Res Treat 2003;79:409–13.

46. Hasegawa S, Miyoshi Y, Ikeda N, et al. Mutational analysis of estrogen receptor-beta gene in human breast cancers. Breast Cancer Res Treat 2003;78:133–4.

47. Zheng SL, Zheng W, Chang BL, et al. Joint effect of estrogen receptor beta sequence variants and endogenous estrogen exposure on breast cancer risk in Chinese women. Cancer Res 2003;63:7624–9.

48. Rosenkranz K, Hinney A, Ziegler A, et al. Systematic mutation screening of the estrogen receptor beta gene in probands of different weight extremes: identification of several genetic variants. J Clin Endocrinol Metab 1998;83:4524–7.

49. Sundarrajan C, Liao WX, Roy AC, Ng SC. Association between estrogen receptor-beta gene polymorphisms and ovulatory dysfunctions in patients with menstrual disorders. J Clin Endocrinol Metab 2001;86:135–9.

50. Ogawa S, Hosoi T, Shiraki M, et al. Association of estrogen receptor beta gene polymorphism with bone mineral density. Biochem Biophys Res Commun 2000;269:537–41.

51. De Vivo I, Huggins GS, Hankinson SE, et al. A functional polymorphism in the promoter of the progesterone receptor gene associated with endometrial cancer risk. Proc Natl Acad Sci USA 2002;99:12263–8.

52. Rebbeck TR, Ambrosone CB, Bell DA, et al. SNPs, haplotypes, and cancer: applications in molecular epidemiology. Cancer Epidemiol Biomark Prev 2004;13:681–7.

53. Nyholt DR. Genetic case-control association studies—correcting for multiple testing. Hum Genet 2001;109:564–7.

54. Krawczak M, Boehringer S, Epplen JT. Correcting for multiple testing in genetic association studies: the legend lives on. Hum Genet 2001;109:566–7.

55. Boehringer S, Epplen JT, Krawczak M. Genetic association studies of bronchial asthma—a need for Bonferroni correction? Hum Genet 2000;107:197.

56. Perneger TV. What’s wrong with Bonferroni adjustments. BMJ 1998;316:1236–8.

57. Bonferroni CE. Teoria statistica delle classi e calcolo delle probabilità. Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commerciali di Firenze 1936;8:3–62.

58. Fallin D, Schork NJ. Accuracy of haplotype frequency estimation for biallelic loci, via the expectation-maximization algorithm for unphased diploid genotype data. Am J Hum Genet 2000;67:947–59.

59. Stephens M, Donnelly P. A comparison of Bayesian methods for haplotype reconstruction from population genotype data. Am J Hum Genet 2003;73:1162–9.
