Whole genome genotyping technologies on the BeadArray™ platform



© 2007 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim 41

Biotechnol. J. 2007, 2, 41–49 DOI 10.1002/biot.200600213 www.biotechnology-journal.com

Technical Report

Frank J. Steemers and Kevin L. Gunderson

Illumina, Inc., San Diego, CA, USA

The ability to simultaneously genotype hundreds of thousands of single-nucleotide polymorphisms (SNPs) in a single assay has recently become feasible due to innovative combinations of assay and array platform multiplexing. In this review, we describe the development of the Infinium® whole genome genotyping technology and the BeadArray™ platform. We discuss the automated use and performance of a series of genotyping BeadChips, including data quality, technology scalability, and flexibility in designing array content. We describe high-density tag SNP-based BeadChips and various multi-sample BeadChip configurations with their respective applications. These technologies are enabling large-scale whole genome association studies that have the potential to revolutionize our ability to detect common genetic variants with a significant role in identifying disease-associated loci, proteins, biomarkers, and pharmacogenomic responses.

Keywords: Whole genome genotyping · Bead Array · Association studies · Infinium assay · Illumina

Correspondence: Dr. Frank J. Steemers, Illumina, Inc., 9885 Towne Centre Drive, San Diego, CA 92121, USA
E-mail: [email protected]
Fax: +1-858-202-4680

Abbreviations: ASPE, allele-specific primer extension; CGH, comparative genomic hybridization; LD, linkage disequilibrium; LOH, loss of heterozygosity; MAF, minor allele frequency; SBE, single-base extension; SNP, single-nucleotide polymorphism; WGA, whole genome-amplified; WGG, whole genome genotyping

Received 13 October 2006
Revised 22 November 2006
Accepted 23 November 2006

1 Introduction

Advances in the field of genetics are highly dependent on enabling technologies to perform accurate high-resolution genomic analysis. These technologies have the potential to revolutionize our ability to identify disease-associated loci and loci involved in mediating clinical response and potential toxicity to drug therapy [1].

Whole genome genotyping (WGG) technologies have recently emerged as attractive tools to genotype hundreds of thousands of SNP markers on a genome-wide scale [2]. These SNP markers can be used in linkage disequilibrium-based (LD) association studies to find genomic regions harboring variants associated with increased disease incidence. LD is the nonrandom association between two or more alleles such that certain combinations of alleles are more likely to occur together on a chromosome than other combinations of alleles.

The WGG approach is sensitive in detecting multiple small gene effects often found in complex diseases and does not rely on prior identification of candidate genes or regions. Currently, a fixed set of genome-wide SNP markers appears attractive. SNP content for genome-wide association studies can be categorized based on random selection [3], tag SNPs, and functional (focused sets of genes or non-synonymous SNPs) marker sets. The International HapMap project has provided the haplotype block structure of the human genome, and enabled selection of tag SNPs for several populations [4]. Tag SNPs serve as proxies for a much larger set of genetically redundant SNPs, and essentially capture a major fraction of the “variation” present within a population. The haplotype map and corresponding tag SNPs provide a framework for discovering associations between genes and disease, and may enable SNP characterization to play a role in personalized medicine. Other findings from the HapMap project are: variation in local recombination rates is a major determinant of LD, breakdown of LD is variable and can appear as “block-like” structure, and a typical SNP is highly correlated with many neighboring SNPs in any given population.


Our recent data indicate that approximately 550 000 tag SNPs capture about 90% of the variation (r² > 0.7) present in the European and Asian populations, and 650 000 SNPs capture 75% of the variation present in the Yoruban population (unpublished data). r² is a measure of LD between two SNP markers within a population. An r² of zero indicates that the phase of one SNP marker is completely independent of the other; in contrast, an r² of 1 indicates complete linkage between two markers (only two haplotypes present in the population). Barrett and Cardon [5] showed that the content of the HumanHap300 has power comparable to that of the Affymetrix 500K, providing substantial LD-based coverage of common variation in non-African populations; the precise extent is strongly dependent on the frequencies of the alleles of interest and on specific considerations of study design.
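As an illustration of the r² statistic described above, the following minimal sketch computes r² for two bi-allelic SNPs from phased haplotype counts, using the standard definition r² = D² / [p_A(1 − p_A) p_B(1 − p_B)] with D = p_AB − p_A·p_B. The function name, the allele coding ("A"/"a" at the first SNP, "B"/"b" at the second), and the example counts are assumptions made for this example and do not come from the HapMap data or from any Illumina software.

```python
def ld_r_squared(haplotype_counts):
    """Return r^2 between two bi-allelic SNPs from phased haplotype counts.

    haplotype_counts maps two-character haplotypes ("AB", "Ab", "aB", "ab")
    to the number of chromosomes carrying them (hypothetical coding).
    """
    total = sum(haplotype_counts.values())
    if total == 0:
        raise ValueError("no haplotypes supplied")
    n_ab_upper = haplotype_counts.get("AB", 0)
    p_ab = n_ab_upper / total
    p_a = (n_ab_upper + haplotype_counts.get("Ab", 0)) / total
    p_b = (n_ab_upper + haplotype_counts.get("aB", 0)) / total
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


# Only two haplotypes present: complete linkage.
complete = ld_r_squared({"AB": 50, "ab": 50})
# All four haplotypes at equal frequency: alleles are independent.
independent = ld_r_squared({"AB": 25, "Ab": 25, "aB": 25, "ab": 25})
print(complete, independent)
```

Under this definition a tag SNP "captures" another SNP when their pairwise r² exceeds the chosen threshold (0.7 in the text).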

From a WGG technology perspective, tag SNPs can be used to reduce the number of genotyped loci while producing the same amount of information as a larger number of randomly chosen loci. Hinds et al. [6] conclude that tag SNPs economize on genotyping demands by providing power equivalent to that of threefold more randomly chosen SNPs (up to one million SNPs). The use of fewer redundant SNPs also minimizes data handling and computation time and reduces the false-positive errors from multiple hypothesis testing.
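The multiple-testing argument can be made concrete with a small calculation. The sketch below assumes a Bonferroni correction and a family-wise error rate of 0.05; neither is specified in the text, and both are used here only for illustration. The tag SNP count of 300 000 is likewise an illustrative value, chosen so that the threefold larger random set stays below one million SNPs.

```python
def bonferroni_threshold(n_tests, family_alpha=0.05):
    """Per-test p-value threshold keeping the family-wise error rate at family_alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return family_alpha / n_tests


def expected_false_positives(n_tests, per_test_alpha):
    """Expected number of false positives if every tested SNP is truly null."""
    return n_tests * per_test_alpha


N_TAG_SNPS = 300_000             # illustrative tag SNP set size (assumption)
N_RANDOM_SNPS = 3 * N_TAG_SNPS   # "threefold more randomly chosen SNPs"

tag_threshold = bonferroni_threshold(N_TAG_SNPS)
random_threshold = bonferroni_threshold(N_RANDOM_SNPS)
print(f"tag SNP threshold:    {tag_threshold:.2e}")
print(f"random SNP threshold: {random_threshold:.2e}")
print(f"false positives at p < 1e-6: "
      f"{expected_false_positives(N_TAG_SNPS, 1e-6):.2f} vs "
      f"{expected_false_positives(N_RANDOM_SNPS, 1e-6):.2f}")
```

With one third as many tests, the per-test threshold is three times less stringent, or, at a fixed per-test threshold, one third as many false positives are expected.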

Based on recent HapMap data, the key technology needs for genome-wide association studies are: (i) the ability to accurately and economically genotype hundreds of thousands of loci across thousands of well-phenotyped case and matched-control population samples [6, 7], (ii) a robust means of processing these samples easily and efficiently, (iii) a technician-friendly, automatable process that reduces sample tracking errors, and (iv) a genotyping platform enabling unconstrained SNP selection, allowing access to tag SNPs throughout the genome. This last requirement is essential since around 48% of the HapMap blocks are singletons (an SNP allele that is observed only once in the sample being analyzed), and these are obviously not well covered if the SNPs are randomly chosen (CEU, r² threshold of 0.7, HapMap release 20). In this review, Illumina’s WGG approach and associated technologies are described, along with several product examples to illustrate the versatility and reach of the technology.

2 Infinium I and II WGG assays

Array technology has been successfully applied to the analysis of the entire transcriptome [8]. However, the successful implementation of an array-based WGG assay has several fundamental challenges: (i) the much greater complexity of the human genome, (ii) the low partial concentration of any given locus, and (iii) the requirement for an accurate, single-base readout of the SNP locus.

The concept of the Infinium WGG assay is based on direct hybridization of whole genome-amplified (WGA) genomic DNA to a bead array of 50-mer locus-specific primers [9–13]. After locus-specific hybridization capture of each individual target to its cognate bead, each SNP locus is “scored” by an enzymatic extension assay using labeled nucleotides. After extension, these labels are visualized by staining with a sandwich-based immunohistochemistry assay that increases the overall sensitivity of the assay (Fig. 1). Inherent to the WGG design is virtually unlimited scalability, limited only by the physical constraint of the number of array elements and not by locus-multiplexing constraints. Additionally, the uniformity of locus representation in the WGA samples enables access to almost any SNP in the genome, and the high DNA yield increases the partial concentration of any given locus. We observed high correlation (r² > 0.98) of SNP signal intensities between amplification replicates, and between the same loci of different samples, exemplifying the robustness of the amplification reaction (Steemers et al., unpublished results). The relatively high target concentrations generated drive the hybridization capture of the target loci to the probes on the array.

Specificity of the WGG assay was achieved by combining three different design elements: (i) highly specific hybridization capture using long full-length 50-mer oligonucleotide probes, (ii) an enzymatic array-based allelic scoring step, and (iii) removal of nonspecific target-target primer extension products before array readout. Enzymatic SNP scoring assays have been extensively reviewed [14]. We employ two different primer extension assays: allele-specific primer extension (ASPE) for our Infinium I assay, and single-base extension (SBE) for our Infinium II assay (Figs. 1b, 1c).

The ASPE assay, a one-color format, is specifically designed to allow the detection of all SNP classes by employing two probes that are identical except at their 3′ base. One probe is the perfect match hybrid for allele A, and the other is the perfect match hybrid for allele B. The ability to detect all SNP classes is useful for genotyping custom SNPs. The SBE assay format uses a single probe per SNP with a two-color readout. This halves the required number of oligonucleotides compared with the ASPE assay, allowing WGG probe sets to be made more economically. The limitation of the two-color SBE assay design is that only 83% of common bi-allelic SNPs can be scored with the single-probe design. The A and T nucleotides are stained in one color and C and G in another; therefore A/T and G/C polymorphisms cannot be detected. However, the remaining 17% of SNPs can be simultaneously genotyped on the same slide using a two-probe ASPE design with SBE biochemical scoring [10].
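The probe-design rule in the preceding paragraph can be sketched as a small classifier. It assumes only what the text states, namely that A and T share one stain color and C and G share the other, so a SNP is scorable with a single SBE probe only when its two alleles fall in different color groups. The function name and the "I"/"II" return labels are hypothetical and are not part of any Illumina design tool.

```python
from itertools import combinations

# Color groups as described in the text: A and T in one color, C and G in the other.
COLOR_GROUP = {"A": 1, "T": 1, "C": 2, "G": 2}


def infinium_design(allele_1, allele_2):
    """Return "II" for a single-probe SBE design, "I" for a two-probe ASPE design."""
    a, b = allele_1.upper(), allele_2.upper()
    if a not in COLOR_GROUP or b not in COLOR_GROUP:
        raise ValueError("alleles must be one of A, C, G, T")
    if a == b:
        raise ValueError("a SNP needs two different alleles")
    # Alleles in different color groups can be told apart by a two-color readout.
    return "II" if COLOR_GROUP[a] != COLOR_GROUP[b] else "I"


for pair in combinations("ACGT", 2):
    print("/".join(pair), "-> Infinium", infinium_design(*pair))
```

Four of the six possible allele pairs are single-probe scorable. The 83% figure quoted in the text refers to the share of common SNPs rather than of allele-pair classes, which this sketch does not model.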


3 High-density array platforms; BeadChips

WGG technology is currently deployed on the Sentrix® BeadChip platform. BeadChips are micro-electro-mechanical systems in which wells are created through a combination of photolithography and plasma etching on silicon wafers. The main concept of array manufacturing is that DNA-immobilized beads are randomly dispersed and assembled into wells of a slide [15, 16]. A decoding process maps the location and identity of each bead on the array [17]. Currently, 3-μm-diameter beads (with ~5-μm center-to-center spacing) are utilized, allowing packing of over ~13 million beads on a single slide (Fig. 2). With current bead density, and a redundancy of ~20 beads per bead type, this yields over 700 000 possible unique bead types per slide.

In addition to array technology, Illumina has implemented Oligator® technology to complement its array and assay technologies [18]. This high-throughput oligo synthesis facility is instrumental to the production of high-quality, low-cost, genome-wide probe sets. Oligos are immobilized to chemically activated beads in a high-throughput fashion using 384-well microtiter plates [19]. After oligo attachment, individual bead types are pooled and arrayed. Beads as array elements have several advantages: (i) monodisperse beads yield uniform “spot sizes”, and (ii) oligos are immobilized in bulk surface reactions on millions of beads, as compared to functionalization as an independent event, e.g., spotting or synthesizing each oligo on arrays. This feature contributes to array-to-array feature consistency. Additionally, the ability to immobilize oligos through their 5′-end attachment moiety generates 3′ full-length probes.

The BeadChip substrate provides a versatile array platform that can be formatted into various sample layouts. In Fig. 3 several exemplar layouts are shown: a single-sample and two multi-sample iSelect™ configurations. The modularity of the layout is enabled by the use of individual bead pools and separate loading stripes on the BeadChip. Given 12 stripes, one can load 12 different bead pools for up to 720 000 assays across a single sample, or alternatively one can load a single bead pool 12 times for up to 60 000 assays across 12 different samples. These multi-sample offerings are typically used for fine mapping studies by researchers who have already performed WGG and/or have particular genomic regions with SNPs of interest. Simple gasket layouts allow different samples to be applied to the different stripes on the BeadChip. Additionally, these removable chambered plastic gaskets, with inlet and outlet ports, create a low-volume gap (~10 μL per stripe), minimizing the amount of sample required (see Fig. 3).

An additional advantage of this customizable approach is that standard products can be configured with several stripes reserved for custom SNPs. For example, the iSelect Infinium concept allows a focused set of markers relevant to a particular disease, candidate gene set, or population to be added to the BeadChip. Products that employ this concept are the HumanHap550+, the HumanHap650Y, and the HumanHap300-Duo. The HumanHap550+ product design consists of 10 stripes loaded with standard product bead pools, with two stripes reserved for two bead pools containing up to 120 000 custom SNP markers. The HumanHap650Y genotyping BeadChip uses the base HumanHap550 BeadChip and includes two additional bead pools encompassing over 100 000 Yoruban SNPs to extend the coverage for this population. The recent introduction of the HumanHap300-Duo BeadChip enables researchers to analyze two samples across 317 000 tag SNP markers on a single BeadChip. This configuration allows the paired analysis of two matched DNA samples, for example in DNA copy number/loss of heterozygosity (LOH) analysis of tumor genomic DNA and matched normal genomic DNA in cancer studies [20].

Figure 1. Infinium assays. (a) A schematic of the Infinium assay, (b) Infinium I probe design (ASPE), and (c) Infinium II probe design (SBE). Two-color SBE assay biochemistry can be used to simultaneously score both assay designs.
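The stripe arithmetic behind these layouts can be written down directly. The sketch below takes the two figures given in the text (12 stripes per BeadChip and up to 60 000 assays per bead pool) and derives the per-sample assay capacity for a given number of samples. The function names are hypothetical, and the assumption that samples must divide the stripes evenly is a simplification made for this example.

```python
STRIPES_PER_BEADCHIP = 12
ASSAYS_PER_BEAD_POOL = 60_000  # upper bound per stripe, from the text


def assays_per_sample(n_samples):
    """Maximum assays per sample when stripes are shared evenly among samples."""
    if n_samples < 1 or STRIPES_PER_BEADCHIP % n_samples != 0:
        raise ValueError("n_samples must divide the 12 stripes evenly")
    stripes_per_sample = STRIPES_PER_BEADCHIP // n_samples
    return stripes_per_sample * ASSAYS_PER_BEAD_POOL


def custom_capacity(reserved_stripes):
    """Maximum custom assays that fit on stripes reserved for custom bead pools."""
    if not 0 <= reserved_stripes <= STRIPES_PER_BEADCHIP:
        raise ValueError("reserved_stripes must be between 0 and 12")
    return reserved_stripes * ASSAYS_PER_BEAD_POOL


print(assays_per_sample(1))   # single-sample layout
print(assays_per_sample(2))   # two-sample layout, e.g. a Duo-style chip
print(assays_per_sample(12))  # twelve-sample layout
print(custom_capacity(2))     # two stripes reserved for custom content
```

Under these assumptions a two-sample layout has room for up to 360 000 assays per sample, which is consistent with the 317 000 tag SNPs quoted for the HumanHap300-Duo, and two reserved stripes give the 120 000 custom markers quoted for the HumanHap550+.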

4 Infinium workflow and assay automation

The Infinium workflow can be divided into three main segments: (i) sample preparation, (ii) sample fragmentation and hybridization, and (iii) extension, staining, and scanning (Scheme 1). On day 1, the WGA process generates hundreds of micrograms of amplified genomic DNA starting from an initial input of several hundred nanograms (250–750 ng) of relatively intact, quantified DNA (>1 kb fragment size). For a BeadChip with over 650 000 SNPs, this translates into approximately 1 pg genomic DNA per SNP. The quality of the DNA sample (average length, degree of cross-linking, and lesion frequency) can affect the amplification efficiency. For example, we have seen poor amplification from low-quality DNA harvested from formalin-fixed, paraffin-embedded samples. On day 2, the amplified genomic DNA is fragmented to an average size of around 300 bp using an endpoint enzymatic fragmentation protocol. DNA fragmentation improves the hybridization of the DNA sample. To facilitate plate-based amplification and isopropanol precipitation, the genomic DNA sample is amplified in four separate wells. After fragmentation, precipitation, and resuspension in hybridization buffer, the four wells are recombined. This material is hybridized overnight to a BeadChip placed in a humidified chamber. On day 3, the BeadChips undergo primer extension and immunohistochemical staining, and are imaged using a two-color confocal laser system with 0.8-μm resolution. Bead intensities are extracted, and genotypes are called using an Illumina-supplied cluster file, which is based on a set of reference samples. The normalization algorithm adjusts for nominal offset, cross-talk, and intensity variations observed in the two color channels. The behavior of each locus is modeled using a custom clustering algorithm that incorporates several biological heuristics for calling SNP genotypes. In cases where fewer than three clusters are observed, the locations and shapes of the missing clusters are estimated using artificial neural networks.
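The "approximately 1 pg per SNP" figure follows from dividing the input mass by the number of SNPs on the BeadChip. The short sketch below reproduces that arithmetic for the input range given in the text. The function name is hypothetical, and the calculation is only the simple ratio implied by the text, not a model of the amplification itself.

```python
N_SNPS = 650_000  # SNP count used in the text


def input_dna_per_snp_pg(input_ng, n_snps=N_SNPS):
    """Input genomic DNA per SNP in picograms (1 ng = 1000 pg)."""
    if n_snps <= 0:
        raise ValueError("n_snps must be positive")
    return input_ng * 1000.0 / n_snps


for input_ng in (250, 650, 750):
    print(f"{input_ng} ng input -> {input_dna_per_snp_pg(input_ng):.2f} pg per SNP")
```

Across the stated 250–750 ng input range this gives roughly 0.4 to 1.2 pg per SNP, in line with the approximate value quoted.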

The Infinium WGG assay was automated using the Tecan GenePaint™ system, which combines robotics with a simple capillary-gap-based fluidics system (Te-Flow chambers) (Fig. 4). In this system, BeadChips are assembled in Te-Flow chambers, creating a ~70-μm capillary gap that enables easy fluidics processing. Gravity flow allows reagent exchange (wash solutions, blocking mixes, extension reagents), and capillary action retains reagents within the gap after the reservoir is emptied. Reagents are delivered robotically by dispensing them into the flow cell reservoir. For single-sample BeadChips, amplified DNA samples are hybridized in the Te-Flow chambers. For the multi-sample BeadChip configuration, the individual samples are robotically dispensed to the inlet ports of the gasket seals (Fig. 3) and hybridized. After hybridization, the seals are removed and the BeadChips are assembled in the Te-Flow chambers. Subsequently, the workflow proceeds as for a single-sample BeadChip. We have formulated the reagents in aliquoted, single-use tubes, ensuring ease of use and consistency and minimizing reagent preparation errors. Ninety-six BeadChips can be processed in parallel on a temperature-controlled Tecan chamber rack. The automation contributes to the ease of use, the robustness of the system, and the laboratory information management system tracking capability of the DNA samples in the Infinium assay.

Figure 2. BeadArray technology: beads in wells concept. Beads are randomly dispersed in wells on silicon wafers.

Figure 3. Examples of (a) HumanHap300-Duo+ with gasket technology, (b) iSelect BeadChips with gasket technology (the sample access ports are the alternating left and right areas adjacent to each stripe), and (c) HumanHap550+.

5 High-density Infinium BeadChips for genotyping and SNP-comparative genomic hybridization applications

The ability to select almost any SNP of interest, especially tag SNPs, allows for the design of information-rich WGG products, as exemplified by the HumanHap550 BeadChip. The >550 000 tag SNPs were selected from over 2 million common SNPs discovered in the recently completed Phase I and II International HapMap Project, using an algorithm based on the LD statistic r² to select tag SNPs in all HapMap populations [1, 12, 21]. The tag SNP content was supplemented with additional SNPs to achieve even spacing across the genome. On average, there is one common SNP [minor allele frequency (MAF) >0.05] every 5.5, 6.5, and 6.3 kb on the autosomes in the CEU, CHB+JPT, and YRI populations (CEU = Caucasian, CHB+JPT = Han Chinese and Japanese, YRI = Yoruba), respectively (Fig. 5a). In addition, there is SNP content from recently reported copy number polymorphism regions of the genome, non-synonymous (ns)SNPs, tag SNPs in the major histocompatibility complex, mitochondrial SNPs, and Y-chromosome SNPs (Fig. 5a). The HumanHap550 provides excellent LD-based genome coverage, as shown in Fig. 5b. An additional ~100 000 Yoruban SNPs have been included on the HumanHap650Y to improve coverage of populations of recent African ancestry. The MAF distribution for the HumanHap550 is shown in Fig. 5c, demonstrating high median and relatively balanced MAFs across the different populations. Finally, more than 90% of the RefSeq transcripts and exons are covered by at least one SNP (Fig. 5d).

Scheme 1. Infinium assay workflow. Details of the assay protocols over 3 days are displayed.

Figure 4. (a) BeadChip in slide holder. (b) Capillary gap allows the flow of reagents over the BeadChip surface. (c) A 96-slide processing station with temperature control.

The SNPs represented on the HumanHap550 BeadChip were subjected to functional testing to ensure strong performance of the final product. Figure 6a shows representative genotyping data for a single sample across >550 000 SNPs from the HumanHap550 BeadChip, and Fig. 6b shows an individual locus across 120 samples used for clustering. The average call rate, Mendelian inconsistency, and reproducibility error rate for these 120 samples are shown in Table 1. These high-quality, accurate genotype calls are imperative for whole genome association studies. Because complex disease traits have relatively small gene effects, potential associations may be missed if the assayed SNP, in LD with the disease SNP, has a low call rate or is called incorrectly. The HumanHap550 BeadChip’s performance shows strong concordance with genotyping data from the International HapMap Project (Table 1), attesting to its high accuracy.
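The quality metrics in Table 1 are simple ratios of genotype counts, and they can be recomputed from the counts reported there. The sketch below does this; the dictionary layout and function name are assumptions made for illustration, while the counts themselves are copied from Table 1.

```python
# (numerator, denominator) pairs as reported in Table 1
TABLE_1_COUNTS = {
    "call rate": (66_498_377, 66_642_240),
    "reproducibility": (4_430_704, 4_431_059),
    "Mendelian inconsistencies": (5_534, 15_432_032),
    "HapMap concordance": (59_189_425, 59_341_039),
}


def percent(numerator, denominator):
    """Express a count ratio as a percentage."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * numerator / denominator


rates = {name: percent(*counts) for name, counts in TABLE_1_COUNTS.items()}
for name, value in rates.items():
    print(f"{name}: {value:.3f}%")
```

The recomputed call rate and HapMap concordance round to the tabulated 99.78% and 99.74%, and the reproducibility and Mendelian inconsistency rates fall within the stated product specifications (>99.9% and <0.1%, respectively).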

Figure 5. (a) HumanHap550 content description. (b) Genomic coverage of the HumanHap550 and the HumanHap650Y BeadChips using the HapMap Release 21 data (www.hapmap.com) was estimated in three populations (CEU = Caucasian, CHB+JPT = Han Chinese and Japanese, YRI = Yoruba). For each population, the maximum pairwise r² (cutoff r²) value of each HapMap locus (MAF = 0.05) to a locus on the BeadChips was determined. Maximum r² values were computed using an algorithm based on methods described in Carlson et al. [1]. Histograms were generated by calculating the cumulative proportion of all HapMap loci that were captured by HumanHap650Y BeadChip loci at various r² thresholds. SNP assays represented on the BeadChip capture 93%, 91%, and 76% of the HapMap Release 21 loci (MAF >0.05) in CEU, CHB/JPT, and YRI populations, respectively (r² > 0.7, pairwise tests). (c) HumanHap550 MAF distribution. (d) Percentage of total RefSeq transcripts or exons covered by a given number of SNPs.

Table 1. HumanHap550 BeadChip genotyping data

Parameter                   Counts                  Percent    Product specification
Call rate                   66,498,377/66,642,240   99.78%     > 99% a)
Reproducibility             4,430,704/4,431,059     > 99.99%   > 99.9%
Mendelian inconsistencies   5,534/15,432,032        0.035%     < 0.1%
HapMap concordance          59,189,425/59,341,039   99.74%     –

a) Average, standard scan

Figure 6. (a) Genoplot of over 550 000 SNPs from a single DNA sample. (b) A typical individual Genoplot for a single locus across 120 different DNA samples. NormR is the normalized signal intensity value. Theta is calculated using the arctan(B/A) conversion. A and B are the signal intensities for the A allele and B allele, respectively.

Traditionally, array-based comparative genomic hybridization (CGH) has been utilized to study genomic alterations underlying cancer initiation and progression [22]. This classical technique involves the analysis of normalized intensities between a reference and subject sample to scan the genome for DNA copy number changes, LOH, copy neutral LOH events, and other chromosomal aberrations found in cancers and congenital disorders. The ability of the HumanHap550 Genotyping BeadChip to detect allelic imbalances has been demonstrated in the cancer cell lines HL60 (promyelocytic leukemia) and MCF7 (breast adenocarcinoma) [20]. Figure 7 shows examples of homozygous deletions, hemizygous deletions, copy neutral LOH, duplications, and amplifications detected by SNP-CGH at a resolution finer than 100 kb. All of the aberrations have been previously identified using spectral karyotyping (SKY). Most importantly, in SNP-CGH the simultaneous measurement of both signal intensity variation and changes in the allelic ratio on a per-locus basis allows the detection of both copy number changes and copy neutral LOH events, which is not feasible with standard array-CGH platforms that monitor only copy number events. Copy neutral LOH is an extended tract of homozygosity that appears as an LOH event but with no change in the physical copy number of the locus. These homozygosity tracts may be due to inbreeding, large-scale gene conversion, or chromosomal deletions. Additionally, other factors such as population history (bottlenecks) or low recombination rates may also contribute to creating extended regions of copy neutral LOH. It is becoming increasingly important in cancer biology to detect these copy neutral events. For instance, uniparental disomy and mitotic recombination have been shown to be involved in both congenital abnormalities and somatic cell tumors [23, 24].

Figure 7. The ability of the HumanHap550 genotyping BeadChip to detect chromosomal aberrations in the cancer cell lines HL60 (promyelocytic leukemia) and MCF7 (breast adenocarcinoma) is demonstrated. (a) Deletions of varying size on chromosome 9 in HL60: an ~20-Mb deletion on the p arm and an ~1.8-Mb deletion on the q arm. These hemizygous deletions (from two copies to one copy) are manifest as a downward deflection in the log R ratio and a loss of heterozygotes in the allele frequencies (AF) plot. (b) Other examples of aberrations in HL60 include monoallelic duplication of chromosome 18, as indicated by an increase in the log R ratio and a split of the heterozygotes into two states, one at 0.67 (2:1 ratio) AF and another at 0.33 (1:2 ratio) AF. (c) Several small amplifications in HL60 on chromosome 8. (d) Two adjacent homozygous and hemizygous deletions in MCF7. The log R ratio compares the observed normalized intensity of the subject sample to the expected intensity based on the observed allelic ratio, through the linear interpolation of the canonical clusters AA, AB, and BB. The canonical clusters are also used to convert theta values to AF values. This is accomplished by a linear interpolation of the known AFs assigned to each cluster (0, 0.5 and 1).
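The theta, AF, and log R ratio quantities described in the captions of Figs. 6 and 7 can be sketched as follows. This is an illustration under stated assumptions and not the Illumina implementation: theta is scaled here by 2/π so that it runs from 0 to 1, the logarithm is taken to base 2, values outside the outer clusters are clamped, and the cluster centers used in the example are hypothetical numbers rather than values from a real cluster file.

```python
import math


def theta_from_intensities(a, b):
    """Angle of the (A, B) intensity point, scaled to 0..1 (scaling is an assumption)."""
    return (2.0 / math.pi) * math.atan2(b, a)


def _interpolate(theta, thetas, values):
    """Piecewise-linear interpolation over the AA, AB, BB cluster centers."""
    if theta <= thetas[0]:
        return values[0]
    if theta >= thetas[-1]:
        return values[-1]
    for i in range(len(thetas) - 1):
        t0, t1 = thetas[i], thetas[i + 1]
        if theta <= t1:
            return values[i] + (values[i + 1] - values[i]) * (theta - t0) / (t1 - t0)
    return values[-1]


def allele_frequency(theta, cluster_thetas):
    """Convert theta to AF using the AFs assigned to the clusters (0, 0.5, 1)."""
    return _interpolate(theta, cluster_thetas, (0.0, 0.5, 1.0))


def log_r_ratio(r_observed, theta, cluster_thetas, cluster_rs):
    """log2 of observed intensity over the intensity expected at this theta."""
    r_expected = _interpolate(theta, cluster_thetas, cluster_rs)
    return math.log2(r_observed / r_expected)


# Hypothetical cluster centers for one locus: theta and R for AA, AB, BB.
CLUSTER_THETAS = (0.05, 0.50, 0.95)
CLUSTER_RS = (1.0, 1.2, 1.0)

theta = theta_from_intensities(1.0, 1.0)   # equal A and B signal
print(theta, allele_frequency(theta, CLUSTER_THETAS))
print(log_r_ratio(0.6, theta, CLUSTER_THETAS, CLUSTER_RS))
```

In this toy setting a sample whose intensity is half the expected value yields a log R ratio of −1, and a heterozygote sitting on the AB cluster center yields an AF of 0.5. Real data deviate from these idealized values.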

6 Conclusions

We have demonstrated that WGG technology can obtain high-quality genotyping results for >650 000 loci accurately, efficiently, and cost-effectively ($0.001 per SNP), enabling whole genome association and chromosomal aberration studies. These studies will result in a new generation of diagnostic markers and tools, enabling early detection of disease or disease susceptibility. A recent finding of a gene linked to inflammatory bowel disease using the HumanHap BeadChip product line is hopefully illustrative of future discoveries [25]. Current economics are mainly driven by array miniaturization and assay multiplexing. The relatively unlimited multiplex capabilities, the flexibility to choose SNPs based on scientific interest, and assay automation are key characteristics of the WGG technology. These characteristics enable products ranging from genome-wide tag SNP-based BeadChips to custom-designed and multi-sample BeadChips, with a common workflow from whole genome to fine mapping. The HumanHap650Y currently offers the most comprehensive genomic coverage across populations.

Besides applications like genotyping, high-density SNP-CGH BeadChips scan the genome for both physical and genetic abnormalities, allowing the detection of copy neutral genetic variations at unprecedented 10-kb resolution. These assays will, in the near term, help to characterize the basis and classification of cancers. We foresee great opportunities for whole-genome approaches, or targeted strategies thereof, which can measure not only the physical nature of the nucleotide but also its modification state, as in epigenetic or chromatin modification analysis. Challenges remain regarding the genome-wide multiplex levels of epigenetic assays, their accuracy, and, foremost, their quantitative nature. In the future, integrated whole genome analysis will likely include a suite of methods including whole genome SNP genotyping, epigenetic analysis, genomic loss and amplification studies, focused large-scale genome sequencing, and RNA analysis.

The technology development summarized here would not have been possible without the efforts of many dedicated individuals. We would like to thank the people at Illumina for their valuable contributions in molecular biology, automation, oligo synthesis, chemistry, engineering, informatics, manufacturing, and process development. Illumina, Sentrix, BeadArray, Infinium, Oligator, and iSelect are registered trademarks or trademarks of Illumina, Inc. All other brands and names contained herein are the property of their respective owners.

7 References

[1] Carlson, C. S., Eberle, M. A., Rieder, M. J., Yi, Q. et al., Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium. Am. J. Hum. Genet. 2004, 74, 106–120.

[2] Syvanen, A. C., Toward genome-wide SNP genotyping. Nat. Genet. 2005, 37, 5–10.

[3] Matsuzaki, H., Shoulian, D., Loi, H., Di, X. et al., Genotyping over 100,000 SNPs on a pair of oligonucleotide arrays. Nat. Methods 2004, 1, 109–111.

[4] Couzin, J., Human genome. HapMap launched with pledges of $100 million. Science 2002, 298, 941–942.

[5] Barrett, J. C., Cardon, L. R., Evaluating coverage of genome-wide association studies. Nat. Genet. 2006, 38, 659–662.

[6] Hinds, D. A., Stuve, L. L., Nilsen, G. B., Halperin, E. et al., Whole-genome patterns of common DNA variation in three human populations. Science 2005, 307, 1072–1079.

[7] Long, A. D., Langley, C. H., The power of association studies to detect the contribution of candidate genetic loci to variation in complex traits. Genome Res. 1999, 9, 720–731.

[8] Lipshutz, R. J., Fodor, S. P. A., Gingeras, T. R., Lockhart, D. J., High density synthetic oligonucleotide arrays. Nat. Genet. 1999, 21, 20–24.

[9] Gunderson, K. L., Steemers, F. J., Lee, G., Mendoza, L. G. et al., A genome-wide scalable SNP genotyping assay using microarray technology. Nat. Genet. 2005, 37, 549–554.

[10] Steemers, F. J., Chang, W., Lee, G., Barker, D. L. et al., Whole-genome genotyping with the single-base extension assay. Nat. Methods 2006, 3, 31–33.

[11] Steemers, F. J., Gunderson, K. L., Illumina, Inc. Pharmacogenomics 2005, 6, 777–782.

[12] Gunderson, K. L., Kuhn, K. M., Steemers, F. J., Ng, P. et al., Whole-genome genotyping of haplotype tag single nucleotide polymorphisms. Pharmacogenomics 2006, 7, 641–648.

[13] Gunderson, K. L., Steemers, F. J., Ren, H., Ng, P. et al., Whole-genome genotyping. Methods Enzymol. 2006, 410, 359–376.

[14] Kwok, P. Y., Methods for genotyping single nucleotide polymorphisms. Annu. Rev. Genomics Hum. Genet. 2001, 2, 235–258.

[15] Michael, K. L., Taylor, L. C., Schultz, S. L., Walt, D. R., Randomly ordered addressable high-density optical sensor arrays. Anal. Chem. 1998, 70, 1242–1248.

[16] Steemers, F. J., Ferguson, J. A., Walt, D. R., Screening unlabeled DNA targets with randomly ordered fiber-optic gene arrays. Nat. Biotechnol. 2000, 18, 91–94.

[17] Gunderson, K. L., Kruglyak, S., Graige, M. S., Garcia, F. et al., Decoding randomly ordered DNA arrays. Genome Res. 2004, 14, 870–877.

[18] Lebl, M., Burger, C., Ellman, B., Heiner, D. et al., Fully automated parallel oligonucleotide synthesizer. Collect. Czech. Chem. Commun. 2001, 66, 1299–1314.

[19] Steinberg, G., Stromsburg, K., Thomas, L., Barker, D. et al., Strategies for covalent attachment of DNA to beads. Biopolymers 2004, 73, 597–605.

[20] Peiffer, D. A., Le, J. M., Steemers, F. J., Chang, W. et al., High-resolution genomic profiling of chromosomal aberrations using Infinium whole-genome genotyping. Genome Res. 2006, 16, 1136–1148.

[21] The International HapMap Project. Nature 2003, 426, 789–796.

[22] Pinkel, D., Segraves, R., Sudar, D., Clark, S. et al., High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nat. Genet. 1998, 20, 207–211.

[23] Bruce, S., Leinonen, R., Lindgren, C. M., Kivinen, K. et al., Global analysis of uniparental disomy using high density genotyping arrays. J. Med. Genet. 2005, 42, 847–851.

[24] Teh, M. T., Blaydon, D., Chaplin, T., Foot, N. J. et al., Genomewide single nucleotide polymorphism microarray mapping in basal cell carcinomas unveils uniparental disomy as a key somatic event. Cancer Res. 2005, 65, 8597–8603.

[25] Duerr, R. H., Taylor, K. D., Brant, S. R., Rioux, J. D. et al., A genome-wide association study identifies IL23R as an inflammatory bowel disease gene. Science 2006, 314, 1461–1463.
