
Epidemiologic Reviews
Copyright © 1997 by The Johns Hopkins University School of Hygiene and Public Health
All rights reserved

Vol. 19, No. 1
Printed in USA.

Impact of the Human Genome Project on Epidemiologic Research

Darrell L. Ellsworth,1 D. Michael Hallman, and Eric Boerwinkle

INTRODUCTION

Traditionally, the fields of human genetics and epidemiology were independent disciplines with minimal interaction between them. Over the years, however, genetic concepts were slowly integrated with epidemiologic methods to capitalize on the advantages of their diverse perspectives and expertise (1). By the late 1970s, genetic epidemiology had emerged as a formal discipline, and it blossomed throughout the ensuing decades as advances in molecular biology deepened our understanding of human genetic disease (2-5). Genetic epidemiology relates genetic characteristics that may be influenced by environmental exposures to the distribution of disease among relatives and within diverse human populations. The primary objective of this field is to better understand the genetic etiology of disease in order to facilitate early prediction and to design more effective intervention strategies.

The discipline of genetic epidemiology has greatly expanded the applicability and public utility of genetic advances, including the mapping of genes responsible for Mendelian diseases (such as cystic fibrosis) (6-8) and the development of models to predict disease (such as the multiple-step mechanism of carcinogenesis) (9, 10). The greatest challenges confronting genetic epidemiology, however, are the common chronic diseases with late age of onset, which exert a tremendous burden on public health as measured by morbidity, mortality, and cost. The greatest impact and benefit to public health from genetic epidemiologic research will likely come from uncovering and better understanding the genetic etiology of the common chronic diseases (such as coronary artery disease and diabetes) and the common forms of cancer (such as breast and colon cancer).

Received for publication September 6, 1996, and accepted for publication April 28, 1997.

Abbreviations [NOTE: for a more detailed explanation and/or definition of the terms and abbreviations included in this presentation, see the Glossary at the end of this volume]: apo, apolipoprotein; cM, centimorgan; dbEST, expressed sequence tag division of GenBank; ELSI, Ethical, Legal, and Social Issues Program; ES, embryonic stem; EST, expressed sequence tag; GDB, Genome Database; GenBank, National Institutes of Health genetic sequence database; HLA, human leukocyte antigen; kb, kilobase pairs; LOD score, decimal log likelihood ratio; Mb, megabase pairs; mRNA, messenger RNA; NHGRI, National Human Genome Research Institute; NIDDM, non-insulin-dependent diabetes mellitus; OMIM, Online Mendelian Inheritance in Man; PCR, polymerase chain reaction; RFLP, restriction fragment length polymorphism; STS, sequence-tagged site; TDT, transmission disequilibrium test; UniGene, Unique Human Gene Sequence Collection.

From the Human Genetics Center, The University of Texas–Houston Health Science Center, Houston, TX.

1 Current address: Epidemiology and Biometry Program, Division of Epidemiology and Clinical Applications, National Heart, Lung and Blood Institute, Bethesda, MD.

Reprint requests to Dr. Eric Boerwinkle, Human Genetics Center, The University of Texas–Houston Health Science Center, P.O. Box 20334, Houston, TX 77225-0334.

Until recently, genetic epidemiology made inferences primarily from statistical analyses of the distribution of disease or other traits among family members. Direct measures of genetic information were rare and, with the exception of the human leukocyte antigen (HLA) complex, were limited to red cell antigens and polymorphic red cell and plasma enzymes (11). For the field of genetic epidemiology to achieve its full potential and better characterize the genetic etiology of the common chronic diseases, high-quality genetic markers were needed for gene mapping, and improved methods had to be developed to detect and quantify functional alleles. The Human Genome Project and other developments in molecular biology are providing the tools that epidemiology and genetic epidemiology need to uncover the molecular mechanisms underlying variation in the distribution of disease among families and populations. Other conceptual and technical advances (particularly in computational methods) have also advanced the field of genetic epidemiology, but they are not the subject of this review. In this presentation we briefly review the objectives of the genome project, the type of information it provides, and its utility to epidemiologic research. Our thesis is that the Human Genome Project is providing valuable tools to further the objectives of genetic epidemiology while simultaneously broadening its scope. However, it is incumbent upon both genetic and other epidemiologists to actively acquire this information and use it to its full advantage.

Downloaded from https://academic.oup.com/epirev/article/19/1/3/616874 by guest on 04 June 2022

4 Ellsworth et al.

THE GENOME PROJECT

The Human Genome Project is a cooperative multinational initiative with the ultimate goal of determining the complete DNA sequence of the human genome as well as the genomes of several model organisms (12-15). DNA is a macromolecule that represents the molecular basis of heredity and consists of a linear array of deoxyribonucleotides. Each deoxyribonucleotide is composed of a sugar (deoxyribose), a phosphate group, and a nitrogenous base, which can be either a purine (adenine (A) or guanine (G)) or a pyrimidine (cytosine (C) or thymine (T)). The precise order of the nitrogenous bases along the DNA encodes the genetic information in a code that is nearly universal among organisms. The human genome consists of a linear arrangement of approximately three billion deoxyribonucleotides partitioned into 22 autosomes and two sex chromosomes (X and Y), as well as a small amount of DNA present in the mitochondria (16). The protein-coding portions of the estimated 100,000 genes in the human genome represent only a fraction (5-10 percent) of our genetic material (17, 18). Sequences that do not encode protein, such as introns, intergenic regions, pseudogenes, and repetitive elements, whose functions (and importance) are not completely understood at present, make up the remaining 90 to 95 percent of the genome.
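To make the sequence representation concrete, the following minimal Python sketch (an illustration, not part of the original review) treats DNA as a string over A, C, G, and T, classifies each base as a purine or a pyrimidine, and converts the 5-10 percent coding estimate into base pairs, assuming a genome of exactly three billion base pairs. The names `base_class` and `coding_bp` are hypothetical.

```python
# Sketch: DNA as a string over {A, C, G, T}, with bases classified
# as purines or pyrimidines as described in the text.
PURINES = frozenset("AG")      # adenine, guanine
PYRIMIDINES = frozenset("CT")  # cytosine, thymine

GENOME_BP = 3_000_000_000  # assumption: "approximately three billion" taken as exact


def base_class(base: str) -> str:
    """Return 'purine' or 'pyrimidine' for a single DNA base."""
    b = base.upper()
    if b in PURINES:
        return "purine"
    if b in PYRIMIDINES:
        return "pyrimidine"
    raise ValueError(f"not a DNA base: {base!r}")


def coding_bp(fraction: float, genome_bp: int = GENOME_BP) -> int:
    """Approximate number of protein-coding base pairs for a given fraction."""
    return round(genome_bp * fraction)


if __name__ == "__main__":
    print([base_class(b) for b in "GATC"])
    # The 5-10 percent coding estimate corresponds to roughly 150-300 million bp.
    print(coding_bp(0.05), coding_bp(0.10))
```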

The genome initiative in the United States is coordinated by the National Human Genome Research Institute (NHGRI) at the National Institutes of Health and by the US Department of Energy (19, 20). Goals for the first 5 years (1990-1995) of the Human Genome Project included: 1) completing genetic maps with markers spaced 2-5 centimorgans (cM) apart (a centimorgan expresses relative distance on a genetic map and is equal to 1 percent recombination, or, on average in humans, approximately one million base pairs) and developing technology for rapid genotyping; 2) developing physical maps with a resolution of 100 kilobase pairs (1 kilobase pair (kb) is equal to 1,000 base pairs); 3) completely sequencing the bacterial, yeast, nematode, and fruit fly genomes, and developing the capability to sequence 10 megabase pairs (1 megabase pair (Mb) is equal to 1 million base pairs) of human DNA per year at a cost of $0.50 per base pair; 4) improving methods of identifying and mapping genes; 5) creating and refining computer databases to handle the enormous amounts of data generated by the genome initiative; and 6) exploring ethical issues related to molecular diagnostics (21).
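The unit definitions and goals above reduce to simple arithmetic. The Python sketch below is illustrative only: it uses the text's approximation of 1 cM to roughly 1 Mb (an average that varies along the genome), and the helper names `cm_to_bp` and `sequencing_cost` are hypothetical.

```python
# Back-of-the-envelope conversions for the first 5-year goals.
BP_PER_KB = 1_000
BP_PER_MB = 1_000_000
BP_PER_CM_APPROX = 1_000_000  # text's approximation: 1 cM ~ 1 Mb (an average)


def cm_to_bp(cm: float) -> float:
    """Approximate physical length (bp) spanned by a genetic distance in cM."""
    return cm * BP_PER_CM_APPROX


def sequencing_cost(megabases: float, dollars_per_bp: float) -> float:
    """Total cost (dollars) of sequencing a given number of megabase pairs."""
    return megabases * BP_PER_MB * dollars_per_bp


if __name__ == "__main__":
    # Goal 1: markers 2-5 cM apart are roughly 2-5 Mb apart.
    print(cm_to_bp(2), cm_to_bp(5))
    # Goal 2: 100-kb physical-map resolution expressed in base pairs.
    print(100 * BP_PER_KB)
    # Goal 3: 10 Mb of human DNA per year at $0.50 per base pair.
    print(sequencing_cost(10, 0.50))
```

At the stated target cost, sequencing 10 Mb per year would cost about $5 million per year.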

The genome project has adopted a hierarchical approach, proceeding through increasing levels of resolution and detail. An immediate focus is the completion of several types of genomic maps that reflect the organization and coordinate positioning of specific genome landmarks (sequence-based markers). Low-resolution cytogenetic maps are produced by examining photomicrographs (karyotypes) depicting the number, size, and morphology of an individual's chromosomes. Karyotypes are prepared during the metaphase of mitosis (cell division), when chromosomes are sufficiently condensed to become visible. Differentially stained regions (bands) of the chromosomes provide an efficient system for assigning genes or other features of the DNA to specific chromosomal regions. Although chromosomal rearrangements have limited utility in localizing genes influencing complex diseases, some cytogenetic alterations have been useful in identifying genes responsible for simply inherited genetic disorders such as fragile X syndrome (22) and chronic granulomatous disease (23). In addition to localizing disease genes, the readily distinguishable banding patterns that make up cytogenetic maps provide a framework for constructing more detailed maps.

Genetic (or linkage) maps orient markers relative to one another such that distances between markers are expressed as recombination frequencies rather than as true physical distances measured in base pairs. Genetic distances are inferred from an analysis of the inheritance patterns of marker genotypes in a large number of families. Restriction fragment length polymorphisms (RFLPs), DNA sequence variations detectable by restriction enzymes (enzymes that cut the DNA molecule at specific sequences), were used to construct the early low-resolution genetic maps. However, because of their relatively low information content for genetic linkage analysis (the cleavage site is either present or absent), RFLPs have been largely supplanted by microsatellites, which consist of tandemly repeated sequences of two to four base pairs (such as CA or GATA). Microsatellites typically exhibit high levels of variability due to differences in the number of repeats (24). Continuous refinement of comprehensive genetic maps increases the density of high-quality markers. For example, the 1994 Généthon human linkage map contained 2,066 short tandem repeat polymorphisms (25), while the latest version (1996) contains 5,264 microsatellites with an average interval of 1.6 cM (26).
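Why microsatellites are more informative than RFLPs can be sketched in a few lines of Python. This is an illustration under stated assumptions: alleles are read as repeat counts from a known sequence, and informativeness is measured by expected heterozygosity, 1 - Σp², a standard population-genetics measure that the review itself does not name. The function names are hypothetical.

```python
import re


def repeat_count(sequence: str, unit: str) -> int:
    """Number of copies in the longest uninterrupted tandem run of `unit`."""
    u = unit.upper()
    runs = re.findall(f"(?:{re.escape(u)})+", sequence.upper())
    return max((len(run) // len(u) for run in runs), default=0)


def expected_heterozygosity(allele_freqs: list[float]) -> float:
    """Probability that a random individual carries two different alleles:
    1 - sum(p_i ** 2), assuming random mating."""
    return 1.0 - sum(p * p for p in allele_freqs)


if __name__ == "__main__":
    # Two hypothetical alleles of a (CA)n marker differing only in repeat number.
    allele_12 = "GGT" + "CA" * 12 + "TTG"
    allele_15 = "GGT" + "CA" * 15 + "TTG"
    print(repeat_count(allele_12, "CA"), repeat_count(allele_15, "CA"))  # 12 15
    # A two-state RFLP can reach at most 0.5 ...
    print(expected_heterozygosity([0.5, 0.5]))  # 0.5
    # ... while 8 equally common microsatellite alleles give 0.875.
    print(expected_heterozygosity([0.125] * 8))  # 0.875
```

The higher the heterozygosity, the more often a parent is heterozygous at the marker and the more meioses are informative for linkage.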

Physical maps, which contain ordered DNA markers at known (or closely approximated) distances (in base pairs rather than centimorgans), are constructed by assembling sets of overlapping DNA fragments (contigs) using sequence-tagged sites (STS) (27). A sequence-tagged site is a unique DNA sequence (typically 200-500 base pairs in length) that is readily detectable by a polymerase chain reaction (PCR) assay and is used to identify long DNA fragments that are assumed to overlap because they share a common sequence-tagged site. Physical maps are extremely useful for determining the positions of genes associated with disease and can provide a primary scaffold for initiating large-scale DNA sequencing. A recent physical map of the human genome with extensive long-range continuity contains more than 15,000 sequence-tagged sites with an average spacing of approximately 200 kilobase pairs (28).
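The rule that fragments sharing a sequence-tagged site are assumed to overlap amounts to finding connected components. The Python sketch below groups clones into contigs with a union-find structure; the clone and STS names are hypothetical, and real map construction must also order the STSs within each contig, which this sketch does not attempt.

```python
from collections import defaultdict


def build_contigs(clone_stss: dict[str, set[str]]) -> list[set[str]]:
    """Group clones into contigs, treating clones that share an STS as overlapping."""
    parent = {clone: clone for clone in clone_stss}

    def find(x: str) -> str:
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Index clones by the STSs they contain; all clones sharing an STS are linked.
    clones_by_sts: dict[str, list[str]] = defaultdict(list)
    for clone, stss in clone_stss.items():
        for sts in stss:
            clones_by_sts[sts].append(clone)
    for clones in clones_by_sts.values():
        for other in clones[1:]:
            union(clones[0], other)

    # Collect clones by root and sort for a deterministic result.
    contigs: dict[str, set[str]] = defaultdict(set)
    for clone in clone_stss:
        contigs[find(clone)].add(clone)
    return sorted(contigs.values(), key=sorted)


if __name__ == "__main__":
    clones = {
        "clone1": {"sts1", "sts2"},
        "clone2": {"sts2", "sts3"},
        "clone3": {"sts3"},
        "clone4": {"sts9"},
    }
    print(build_contigs(clones))  # clone1-clone3 form one contig; clone4 stands alone
```

Note that clone1 and clone3 share no STS directly, yet end up in the same contig through clone2; this transitive chaining is what lets short STSs bridge long stretches of DNA.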

To rapidly identify expressed human genes and provide key resources for gene mapping, recent efforts have focused on isolating and partially characterizing genes that are transcribed into messenger RNA (mRNA). Messenger RNA molecules are transcribed from DNA (gene) sequences in the nucleus and then function in the cytoplasm by specifying the sequence of amino acids in nascent polypeptides (translation). Partial sequences of messenger RNA molecules, which represent expressed genes, are known as expressed sequence tags (ESTs); they can be localized to specific chromosomal regions and integrated into transcription maps. Transcription mapping is an expedient method for localizing potential disease susceptibility genes because it simultaneously provides location and sequence information. Catalogs of unmapped expressed sequence tags are also powerful tools for assessing human gene diversity and for determining gene function through patterns of expression. Data are currently available on more than 292,878 expressed sequence tags derived from at least 37 distinct organs and tissues (29).

The ultimate goal of the Human Genome Project is to establish the entire 3-billion-base-pair sequence of the human genome. Only a fraction of the total genomic sequence has been determined to date; however, current methods have enabled complete DNA sequences to be obtained for several genomic regions (30, 31). The largest contiguous segment in humans for which the complete nucleotide sequence has been determined is the β T-cell receptor region (685 kb), which contains a complex family of immune recognition genes (32). Refinements in high-throughput DNA sequencing technologies, such as automated fluorescence methods (33, 34) and energy transfer primers (35), are anticipated to permit megabases of DNA to be sequenced accurately and reliably within an acceptable length of time and at reasonable cost. Technologic advances in sequencing methodology (36) and the emergence of novel strategies (37) will significantly increase sequencing speed and efficiency. DNA sequence data will eventually reveal a wealth of information on the organization and biologic complexity of the human genome.

INFORMATION TRANSFER

To take full advantage of the wealth of information generated by the Human Genome Project, epidemiologists and genetic epidemiologists must be able to access the data easily and must have a working knowledge of the retrieval process to properly query, analyze, and interpret the desired information. A primary component of the genome initiative is the development of comprehensive computer databases to assimilate the tremendous amount of mapping and DNA sequence data and to provide links to the scientific and medical literature. Numerous databases have been established to provide organized storage and efficient dissemination of genome mapping and sequencing data (38, 39). The informatics movement has generated global computer networks with on-line access over the internet that permit remote access to and retrieval of raw or computed data. Through simplified access programs and database helplines, this technology is readily available to epidemiologic researchers who may be unfamiliar with genetics or genome informatics. These databases are invaluable to epidemiologic research and should be familiar to all genetic epidemiologists, because they contain vast amounts of information regarding the genes and molecular defects that contribute to human disease, methods for rapid detection of mutations and polymorphisms (if available), comprehensive descriptions of disease phenotypes, and the status of treatment and intervention strategies.

Annotated DNA sequence information for humans is currently available from more than 677,205 entries in the National Institutes of Health genetic sequence database known as GenBank (www.ncbi.nlm.nih.gov) (release 99.0, February 1997). The Unique Human Gene Sequence Collection (UniGene) is assimilating DNA sequences to identify and map new human genes (40). Information is currently available on over 55,000 sequence clusters representing the transcription products of distinct genes. The most recent Online Mendelian Inheritance in Man (OMIM) catalog of human genes and genetic disorders (41) contains 8,408 entries that include 5,439 established gene loci and descriptions of 398 inherited disease phenotypes. As the official repository for genomic mapping data resulting from the human genome initiative, the Genome Database (GDB) (www.gdb.org) organizes and stores data, including map locations of DNA markers and genetic disease locus and probe information submitted by genome researchers worldwide, and provides this information electronically to the scientific community. The Genome Database currently contains information on more than 1.5 million clones (physically isolated DNA fragments) and nearly 18,000 polymorphisms, which are accessible with state-of-the-art genomic map viewing software.

LOCALIZING DISEASE GENES

Genetic linkage

Genetic linkage analysis is a common method for localizing genes contributing to human disease to an approximate chromosomal region (42). By definition, linkage exists when two genes cosegregate from parents to offspring more often than would be expected by chance, owing to the close proximity of the two genes along a chromosome. In the search for genes associated with disease, linkage is inferred when cosegregation between a marker (usually a highly polymorphic microsatellite) and a gene affecting disease susceptibility is detected more often than expected by chance. Traditional linkage methods for single-gene disorders use a likelihood approach to evaluate the strength of evidence for linkage relative to that for no linkage by calculating LOD scores (decimal log likelihood ratios), which require the mode of inheritance of the disease to be specified a priori. However, modes of inheritance for the common chronic diseases are complex and often heterogeneous among families. Because LOD scores applied to multifactorial diseases may have inflated error rates, robust "nonparametric" methods of linkage analysis are preferred for complex diseases (43-45). Methods using affected relative pairs are available for qualitative traits (e.g., carotid artery atherosclerosis) but are often limited by low power. Analyses of quantitative traits (e.g., cholesterol levels) in entire pedigrees may prove to be more appropriate and informative.
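For the simplest case, the LOD score can be written out directly. The Python sketch below assumes phase-known, fully informative meioses, in which each meiosis is scored as recombinant or nonrecombinant between the marker and the disease locus. Real pedigree analyses must sum over unknown phases and genotypes under the specified inheritance model, which this sketch does not do. The threshold of 3 mentioned below is a common convention for Mendelian traits and is not stated in the review.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """LOD at recombination fraction theta for phase-known, fully informative
    meioses: log10[L(theta) / L(0.5)]."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("need 0 <= recombinants <= meioses")
    if theta == 0.0 and recombinants > 0:
        return -math.inf  # a recombinant is impossible under complete linkage
    nonrecombinants = meioses - recombinants
    # The binomial coefficient appears in both likelihoods and cancels.
    log_l_theta = nonrecombinants * math.log10(1.0 - theta)
    if recombinants:
        log_l_theta += recombinants * math.log10(theta)
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null


def max_lod(recombinants: int, meioses: int, steps: int = 50) -> tuple[float, float]:
    """Return (max LOD, theta) over the grid theta = 0, 1/(2*steps), ..., 0.5."""
    grid = [i / (2 * steps) for i in range(steps + 1)]
    return max((lod_score(recombinants, meioses, t), t) for t in grid)


if __name__ == "__main__":
    # 10 meioses with no recombinants: LOD at theta = 0 is 10 * log10(2), about 3.01.
    print(lod_score(0, 10, 0.0))
    # 2 recombinants in 20 meioses: the maximum is at theta = 0.1 (LOD about 3.2).
    print(max_lod(2, 20))
```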

The Human Genome Project has created an invaluable tool for genetic linkage studies: an integrated map of microsatellite markers that are highly informative for detecting linkage and abundant throughout the genome. Constant refinement of genetic maps and an increasing density of reliable markers enhance our ability to accurately pinpoint the locations of disease genes and facilitate gene identification by defining regions that can be further characterized by physical mapping techniques.

Genome-wide linkage analysis using the human genetic map recently identified a chromosomal region believed to contain a gene contributing to a late-onset common chronic disease. Non-insulin-dependent (type 2) diabetes mellitus (NIDDM) is characterized by hyperglycemia due to defects in insulin secretion and/or action (46). Affecting 10-20 percent of the population over 45 years of age, NIDDM is a leading cause of morbidity and mortality in developed countries. Although considerable progress has been made in identifying genes responsible for Mendelian (early-onset) forms of diabetes, little is known about the genes contributing to the common late-onset form(s) of NIDDM, which are believed to be influenced by numerous genes as well as environmental factors. Nearly 500 highly polymorphic markers, with an average distance between adjacent markers of less than 10 cM, were used to search for genes contributing to NIDDM in 330 Mexican-American affected sibling pairs (47). A number of candidate genes throughout the genome showed no evidence of linkage in this sample. Conversely, a single microsatellite marker showed highly significant linkage to NIDDM and may indicate the presence of a gene that is a major contributor to disease susceptibility in this population. Linkage analyses that consider multiple markers simultaneously (multipoint analyses) indicated that 71 percent of the genome could be excluded as containing a locus with an effect large enough to increase the relative risk of disease (λs) by a factor of 1.6 in individuals possessing the susceptible genotype. However, only 5 percent of the genome could be excluded as containing a locus with an effect large enough to increase the relative risk of disease by a factor of 1.2.

Association and transmission disequilibrium tests

Association studies compare the frequency of alleles (alternative forms of a given gene that differ in DNA sequence) between unrelated affected (case) and unaffected (control) individuals. A given allele is considered to be associated with the disease if it occurs at a significantly higher frequency among cases than among controls. Khoury et al. (4) provide a more complete description of methods for genetic association studies. Association analysis may be more sensitive than linkage methods when the genes being sought contribute to disease susceptibility but are neither necessary nor sufficient to cause disease. When the relative risk of disease given the susceptible genotype is small, detecting genetic linkage becomes increasingly difficult (48).
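A basic allelic case-control comparison can be sketched as a 2×2 table of allele counts with an odds ratio and a Pearson chi-square statistic. This Python example is illustrative: counting two alleles per person treats chromosomes as independent (which assumes Hardy-Weinberg proportions), and the counts and the name `allele_association` are hypothetical.

```python
def allele_association(case_a: int, case_b: int,
                       ctrl_a: int, ctrl_b: int) -> tuple[float, float]:
    """Allelic odds ratio and Pearson chi-square (1 df) for a 2x2 table of
    allele counts: allele A vs. allele B in cases and in controls."""
    if min(case_a, case_b, ctrl_a, ctrl_b) <= 0:
        raise ValueError("this sketch requires every cell count to be positive")
    odds_ratio = (case_a * ctrl_b) / (case_b * ctrl_a)
    table = [[case_a, case_b], [ctrl_a, ctrl_b]]
    n = case_a + case_b + ctrl_a + ctrl_b
    row_totals = [case_a + case_b, ctrl_a + ctrl_b]
    col_totals = [case_a + ctrl_a, case_b + ctrl_b]
    chi_square = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (table[i][j] - expected) ** 2 / expected
    return odds_ratio, chi_square


if __name__ == "__main__":
    # Hypothetical: 50 cases and 50 controls (100 chromosomes each).
    # Allele A is carried on 60 case and 40 control chromosomes.
    print(allele_association(60, 40, 40, 60))  # (2.25, 8.0)
```

A chi-square of 8.0 exceeds 3.84, the conventional 5 percent critical value for 1 degree of freedom.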

Transmission disequilibrium tests (TDT) have recently been introduced to avoid some of the limitations and pitfalls inherent in most linkage and simple association studies (49, 50). Most transmission disequilibrium statistics consider parents who are heterozygous for an allele hypothesized to be associated with disease and evaluate the frequency with which that allele (or its alternative) is transmitted to affected offspring. If the marker locus is located near, and in linkage disequilibrium with, the disease gene, the disease-associated allele will be transmitted to affected individuals more often than expected under Mendelian (random) segregation. Transmission disequilibrium tests are linkage-based association tests that may be appropriate for complex diseases because they greatly reduce the likelihood that any allele frequency differences between affected and unaffected individuals are due to poorly chosen controls or to unsuspected genetic differences among subgroups within the population.
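The classic TDT statistic compares, among heterozygous parents, how often the candidate allele is transmitted with how often it is not transmitted to affected offspring. Under Mendelian segregation the two counts are expected to be equal. The Python sketch below uses the McNemar-type form (b - c)² / (b + c), referred to a chi-square distribution with 1 degree of freedom. The counts and function names are hypothetical.

```python
import math


def tdt_statistic(transmitted: int, not_transmitted: int) -> float:
    """McNemar-type TDT chi-square: (b - c)^2 / (b + c), where b and c count
    transmissions and non-transmissions of the allele from heterozygous parents."""
    informative = transmitted + not_transmitted
    if informative == 0:
        raise ValueError("no informative (heterozygous) parents")
    return (transmitted - not_transmitted) ** 2 / informative


def chi2_1df_pvalue(statistic: float) -> float:
    """Upper-tail p-value of a chi-square with 1 df, via P(|Z| > sqrt(x))."""
    return math.erfc(math.sqrt(statistic / 2.0))


if __name__ == "__main__":
    # Hypothetical: 100 heterozygous parents transmit the allele 60 times.
    x = tdt_statistic(60, 40)
    print(x, round(chi2_1df_pvalue(x), 4))  # 4.0 0.0455
```

Because each parent's transmitted allele is compared with that same parent's untransmitted allele, the untransmitted alleles act as internal controls, which is why the test is robust to population stratification.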

Association studies and transmission disequilibrium tests are likely to play critical roles in mapping complex disease genes. However, such methods will require large population samples and detailed, accurate maps to localize genes that may contribute no more than 5 percent of the total genetic variance of a complex disease. Current maps may contain markers at sufficient densities (average spacing of 10 cM) for locating monogenic disease genes and initiating DNA sequencing, but localizing genes that contribute to multifactorial diseases will likely require denser maps (on the order of 1 to 3 cM between adjacent markers), because linkage disequilibrium decays quickly with distance. Constant refinement of genetic maps, by verifying marker order and increasing marker density, will be critical to the successful identification of multifactorial disease genes with small individual effects.
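The rapid decay of linkage disequilibrium with distance can be illustrated with the standard population-genetics result, which is not derived in the review: under random mating, the disequilibrium D between two loci separated by recombination fraction θ shrinks each generation by a factor (1 - θ), so D_t = D_0(1 - θ)^t. The Python sketch below compares marker spacings over 100 generations. It assumes θ ≈ 0.01 per cM, an approximation that holds only for short distances.

```python
def ld_remaining(d0: float, theta: float, generations: int) -> float:
    """Linkage disequilibrium remaining after `generations` of random mating:
    D_t = D_0 * (1 - theta) ** t."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    return d0 * (1.0 - theta) ** generations


if __name__ == "__main__":
    # Fraction of the initial disequilibrium left after 100 generations,
    # taking theta of about 0.01 per cM (short-distance approximation).
    for theta in (0.01, 0.03, 0.10):
        print(theta, ld_remaining(1.0, theta, 100))
    # About 0.37 at 1 cM, 0.048 at 3 cM, and 0.00003 at 10 cM.
```

Under this model, a marker 10 cM from a susceptibility allele retains essentially no association with it after 100 generations, whereas a marker 1 cM away keeps over a third. This is consistent with the text's call for 1- to 3-cM maps.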

Gene identification

Phenomenal progress in identifying and isolating genes for monogenic diseases has been achieved through "positional cloning" strategies, which locate genes using genetic and physical mapping techniques with only minimal information about the function of the gene or the basic biochemical defect (51, 52). Linkage and association analyses in families affected with the disease are typically used to define an initial candidate region in which the responsible gene is believed to be located. The candidate region may be narrowed using information from patients carrying large cytogenetic rearrangements and/or deletions. Fine-structure genetic mapping (a high-resolution analysis of exchange events between chromosomes that occur during meiosis) may further delimit the interval, which can then be characterized with physical maps. After a thorough inventory of genes and expressed sequences within the region, mutation screening must be conducted to identify the causative gene. Each candidate is surveyed for mutations, and the responsible gene is identified by alterations in individuals affected with the disease. To date, positional cloning efforts have led to the successful localization and characterization of more than 40 human disease genes (53). Although many of the genes identified thus far (for monogenic diseases) may be of limited interest to epidemiologists, these initial successes using genome technology provide a foundation for developing techniques and approaches for localizing genes contributing to complex diseases (e.g., 54).

Most single-gene disorders are characterized by a low frequency of the disease allele in the general population and high penetrance (a large proportion of individuals with the disease allele show symptoms of the disease). Complex disorders are characterized by high levels of genetic complexity, difficulties in early-stage diagnosis, late onset of clinical symptoms, and probable gene-by-environment interactions. Alleles associated with increased susceptibility to multifactorial diseases are often common in the general population, and a given gene may contribute only a small proportion of the total genetic variance underlying the disorder. Therefore, traditional approaches for localizing Mendelian disease genes may not be feasible for genetically complex disorders. Locating a single gene within a chromosomal region that is typically 2-5 megabase pairs in length is severely hampered by the absence of chromosomal rearrangements or deletions that define the candidate region, and by the often subtle nature of functional sequence variation, which may be located outside the coding region (55, 56).

Once a complex disease gene has been localized to a defined genetic interval, genes previously mapped within the critical region become strong candidates. This positional candidate approach (53), in which linkage analysis of multiple affected family members localizes susceptibility genes to chromosomal regions and is followed by an intensive search for logical candidates within the interval, may prove to be an efficient strategy for locating genes contributing to the common chronic diseases. Explosive growth in the construction and refinement of transcription maps (57, 58) through the efforts of an international consortium is expediting the discovery and characterization of genes mutated in human disease. A recent analysis indicates that 71 percent (32 of 45) of human disease genes isolated by positional cloning are represented by at least one expressed sequence tag in a publicly accessible database (59). The expressed sequence tag division (dbEST) (60) of GenBank, which is part of the International Nucleotide Sequence Database Collaboration, now contains information on hundreds of thousands of expressed sequence tags derived from numerous human tissues or cell types that may be retrieved electronically over the internet (40).
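The first step of the positional candidate approach, listing genes already mapped within the critical interval, is essentially an interval-overlap query. The Python sketch below assumes a simple gene map with base-pair coordinates. The gene names, chromosomes, and coordinates are all hypothetical, and a real analysis would draw them from a mapping database.

```python
from typing import NamedTuple


class MappedGene(NamedTuple):
    """A previously mapped gene; all coordinates here are hypothetical, in bp."""
    name: str
    chromosome: str
    start: int
    end: int


def positional_candidates(genes: list[MappedGene], chromosome: str,
                          start: int, end: int) -> list[str]:
    """Names of mapped genes on `chromosome` that overlap [start, end]."""
    return [
        g.name
        for g in genes
        if g.chromosome == chromosome and g.start <= end and g.end >= start
    ]


if __name__ == "__main__":
    gene_map = [
        MappedGene("GENE_A", "2", 10_000_000, 10_050_000),
        MappedGene("GENE_B", "2", 11_500_000, 11_620_000),
        MappedGene("GENE_C", "2", 14_900_000, 15_100_000),
        MappedGene("GENE_D", "7", 11_000_000, 11_200_000),
    ]
    # A hypothetical 4-Mb critical interval on chromosome 2 (11-15 Mb).
    print(positional_candidates(gene_map, "2", 11_000_000, 15_000_000))
    # ['GENE_B', 'GENE_C']
```

GENE_C is kept even though it extends past the interval boundary, because a gene that only partly overlaps the region may still harbor the susceptibility variant.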

Epidemiol Rev Vol. 19, No. 1, 1997

DNA SEQUENCE VARIATION AND FUNCTIONAL MUTATIONS

Once potential candidate genes have been identified, an exhaustive search must be conducted for DNA variation within the candidate region. A variety of scanning techniques have been used for the initial detection of unknown mutations within relatively large genomic regions (61). Most methods detect either the differential electrophoretic migration of single-stranded DNA molecules that differ in sequence (62, 63) or alterations in the melting points of double-stranded DNA molecules (64). These techniques detect, with varying efficiencies, the presence of mutations, but they do not identify the precise location or nature of the structural change (65).

Modern genome sequencing technologies now permit the search for DNA variation in linked regions to proceed by direct DNA sequencing (66), without the need to conduct the single-strand conformation (or similar) analyses mentioned above. Direct sequencing both locates and characterizes all DNA variation within a region. Characterizing the structure of newly discovered genes and identifying new sequence polymorphisms in previously uncharacterized regions will greatly expedite the search for DNA sequence variation implicated in genetic disease.
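The difference between scanning and direct sequencing can be illustrated with a toy model: a scanning assay effectively answers only "does this fragment differ from the reference?", whereas a sequence comparison reports where each difference lies and what it is. The sketch below is an idealization, not a model of any particular assay; it assumes two already aligned, equal-length sequences without gaps, sequencing errors, or heterozygous positions, and the sequences and function names are hypothetical.

```python
def scan_detects_variant(reference: str, sample: str) -> bool:
    """Idealized scanning assay: reports only whether the sequences differ."""
    return reference != sample


def sequence_variants(reference: str, sample: str):
    """Compare an aligned sample sequence to the reference base by base.

    Returns (position, reference_base, sample_base) for every mismatch,
    using 1-based positions. Only substitutions are handled, because the
    sketch assumes an ungapped, equal-length alignment.
    """
    if len(reference) != len(sample):
        raise ValueError("sketch handles only ungapped, equal-length sequences")
    return [
        (i + 1, ref_base, alt_base)
        for i, (ref_base, alt_base) in enumerate(zip(reference, sample))
        if ref_base != alt_base
    ]


reference = "ATGCGTACGT"  # hypothetical reference fragment
sample = "ATGCGTTCGA"     # hypothetical sample fragment
print(scan_detects_variant(reference, sample))  # True: a difference exists
print(sequence_variants(reference, sample))     # [(7, 'A', 'T'), (10, 'T', 'A')]
```

The scanning result would still have to be followed by sequencing to learn that two substitutions are present and where they are; direct sequencing yields that information in one step.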

Methods of mutation detection identify DNA variants but provide no information regarding their biologic significance. Sequence alterations, such as single nucleotide substitutions, may or may not be functionally relevant. The task of distinguishing DNA variants that contribute to disease from neutral polymorphisms is one of the most intellectually challenging problems confronting human geneticists. Several approaches (67, 68) have been developed to help pinpoint causative variants or reduce the number of potential candidates that require further investigation.
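As a much simpler illustration than the cladistic (67) or multiple-variant (68) methods cited above, one crude way to reduce a list of sequenced variants is to test each for allelic association with disease and set aside those showing no case-control difference. The sketch below uses a Pearson chi-square on a 2x2 table of allele counts, which assumes the two alleles within an individual are independent (Hardy-Weinberg proportions) and ignores multiple testing; the variant labels and counts are hypothetical.

```python
def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 allele table.

    Rows are cases and controls; columns are counts of the variant and the
    reference allele. Returns 0.0 when a margin is empty.
    """
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


# 3.84 is the 5% critical value of a chi-square with 1 degree of freedom.
CRITICAL_1DF_05 = 3.84

# Hypothetical allele counts (case variant, case reference,
# control variant, control reference) for three sequenced variants.
variants = {
    "var1": (30, 70, 10, 90),
    "var2": (20, 80, 20, 80),
    "var3": (15, 85, 12, 88),
}

for name, counts in variants.items():
    stat = allelic_chi_square(*counts)
    flag = "follow up" if stat > CRITICAL_1DF_05 else "deprioritize"
    print(f"{name}: chi2 = {stat:.2f} -> {flag}")
```

A statistical association does not establish function: a neutral variant in linkage disequilibrium with the causal one will also be flagged, which is one reason the haplotype-based approaches cited above were developed.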

An indirect result of the Human Genome Project has been the rapid expansion of protein sequence databases that aid in characterizing functional mutations. Resources such as the Molecular Modelling Database and computer software programs such as RASMOL (69) are particularly useful to geneticists and molecular biologists in locating amino acid substitutions that alter protein structure (and possibly function) and in determining the spatial position of variants relative to known functional sites within the protein. The accumulation of protein structure and sequence data will continue to provide a wealth of information on the biologic functions of these macromolecules. Using the tools of modern molecular biology and human genetics, a primary objective of genetic epidemiology should be to define the underlying functional mutations and to explore possible disease mechanisms that culminate in clinically apparent disease.
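Before a substitution can be placed on a protein structure, one must know whether it changes the encoded amino acid at all. The sketch below classifies a single-base change in a coding sequence as synonymous, missense, nonsense, or stop-loss. It assumes the standard nuclear genetic code and a coding sequence given in frame from the first base of the start codon; the function name and the example sequence are hypothetical, and the sketch says nothing about whether a missense change actually alters structure or function.

```python
from itertools import product

# Standard genetic code: codons enumerated with bases in T, C, A, G order
# (first position varying slowest); "*" marks a stop codon.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def classify_substitution(cds: str, position: int, alt_base: str):
    """Classify a single-base change in an in-frame coding sequence.

    `position` is 0-based. Returns (codon_number, ref_aa, alt_aa, effect),
    with codon numbers starting at 1 for the start codon.
    """
    cds = cds.upper()
    start = position - position % 3  # first base of the affected codon
    ref_codon = cds[start:start + 3]
    if len(ref_codon) < 3:
        raise ValueError("position falls in an incomplete trailing codon")
    offset = position - start
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        effect = "synonymous"
    elif alt_aa == "*":
        effect = "nonsense"
    elif ref_aa == "*":
        effect = "stop-loss"
    else:
        effect = "missense"
    return start // 3 + 1, ref_aa, alt_aa, effect


cds = "ATGGAATGC"  # hypothetical sequence encoding Met-Glu-Cys
print(classify_substitution(cds, 5, "G"))  # (2, 'E', 'E', 'synonymous')
print(classify_substitution(cds, 4, "T"))  # (2, 'E', 'V', 'missense')
print(classify_substitution(cds, 3, "T"))  # (2, 'E', '*', 'nonsense')
```

Only the missense results would go on to structural inspection; note that synonymous and noncoding changes can still be functional (for example, through regulatory effects), so this filter narrows but does not settle the question.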

COMPARATIVE GENOMICS AND ANIMAL MODELS

The "Human" Genome Project is actually a diverse initiative that includes the parallel mapping and sequencing of selected model organisms, including bacteria (Escherichia coli), yeast (Saccharomyces cerevisiae), nematode (Caenorhabditis elegans), fruit fly (Drosophila melanogaster), and mouse (Mus musculus), whose genomes increase progressively in size and structural complexity. Critical nucleic acid structures and protein functions are frequently conserved throughout evolution across a diverse array of organisms. Detailed comparisons among species are therefore useful in deciphering structural information encoded in the DNA and provide insight into the functional significance of genomic sequences. A large number of human genes have counterparts in other species, so sequence homology between species can be used to detect genes and regulatory elements in newly characterized segments of human DNA sequence (70). Important similarities in chromosomal structure and gene function between study organisms and humans will prove invaluable in the difficult process of determining gene functions and mechanisms of genetic disease etiology (e.g., 71).
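Detecting genes by cross-species homology rests on sequence alignment: conserved segments align with far higher scores than unrelated ones. The sketch below computes a Needleman-Wunsch global alignment score with unit match, mismatch, and gap scores. This is a deliberate simplification: practical homology searches of the kind cited above typically compare translated sequences against protein databases using local alignment and substitution matrices. The scoring values, function name, and sequences are assumptions for illustration.

```python
def global_alignment_score(x: str, y: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    score[i][j] holds the best score for aligning x[:i] with y[:j].
    """
    rows, cols = len(x) + 1, len(y) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # x[:i] aligned entirely against gaps
    for j in range(1, cols):
        score[0][j] = j * gap  # y[:j] aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if x[i - 1] == y[j - 1] else mismatch)
            score[i][j] = max(diag,                   # pair x[i-1] with y[j-1]
                              score[i - 1][j] + gap,  # gap in y
                              score[i][j - 1] + gap)  # gap in x
    return score[-1][-1]


human = "ATGGCCAAGTTC"      # hypothetical human segment
mouse = "ATGGCAAAGTTC"      # hypothetical orthologous mouse segment
unrelated = "TTACGGATCACA"  # hypothetical unrelated segment
print(global_alignment_score(human, mouse))
print(global_alignment_score(human, unrelated))
```

The human-mouse pair differs at a single base and scores 10 of a possible 12, while the unrelated segment scores far lower; deciding whether a score is surprising would require comparing it against scores for shuffled or random sequences.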

Gene targeting technology permits specific alterations to be made in selected genes within the genomes of model organisms (particularly the mouse). Targeting has been used to disrupt native genes, thereby generating "knockout" animals that completely lack the product of a particular gene. Knockouts are created with targeting plasmids (extrachromosomal genetic elements) containing an altered version of the gene of interest, which can be introduced into embryonic stem (ES) cells. Through homologous recombination, a portion of the native gene is replaced by the introduced variant, thereby disrupting its structure and normal function (72). Transgenic animals containing a functional copy of a foreign gene (such as a human gene) may be produced by assembling a DNA construct containing the gene of interest along with the regulatory elements necessary for its expression, microinjecting the construct into fertilized mouse oocytes (one-cell-stage embryos), and implanting these into pseudopregnant females (73, 74). Properly designed gene targeting studies can evaluate the physiologic effects of precise genetic changes while simultaneously eliminating or minimizing environmental effects as well as the effects of other genes.

The ability to manipulate the genomes of model organisms by disrupting native genes and/or introducing human genes has significantly advanced our understanding of numerous monogenic disorders in humans. However, dissecting the genetic factors contributing to disease and distinguishing between causation and correlation are more difficult for complex diseases with multiple genetic and environmental components. Although we do not fully understand the degree to which animal knockout models are biologically relevant to human diseases (particularly those with complex etiologies), or whether information from single-gene disruptions can be extrapolated to multifactorial conditions, recent advances in transgenic technology (74, 75) have proven useful for examining the often modest effects of complex disease genes independent of other genes and environmental factors that may influence susceptibility (76-78). Transgenic animals provide information on the metabolic functions of genes and on the relation between genetically determined alterations in gene dosage and predisposition to disease. We anticipate that it will become possible to test human functional variants in animal systems as refinements in gene replacement technology allow intact disease-associated or nonassociated alleles to be introduced in defined copy number, and as carefully structured breeding programs eliminate phenotypic effects caused by polymorphic differences among animals.

CLINICAL AND DIAGNOSTIC APPLICATIONS

Molecular diagnostics

A major focus of research on the epidemiology and prevention of disease is the ability to identify individuals at increased risk and to predict disease before the onset of clinical manifestations. An increasingly effective strategy for defining individuals at increased risk involves the identification of specific DNA polymorphisms that are associated with disease. Molecular (DNA) diagnostics is a rapidly expanding (in both scope and importance) discipline of medical genetics that encompasses a diverse array of clinical applications, from the diagnosis of genetic disorders (79) and neoplastic conditions (80) to the identification of infectious disease agents (81). Technical improvements and sophisticated variations of recombinant DNA technologies are increasingly being applied to detect disease-associated mutations in human genes. For example, diagnostic tests are currently available for numerous single-gene disorders, such as Duchenne and Becker muscular dystrophies (82) and cystic fibrosis (83), and for various forms of cancer (84, 85). Genetic lesions known to be responsible for human inherited diseases are already being collated in a comprehensive online reference source, the Human Gene Mutation Database (available through OMIM), which provides information of practical diagnostic importance to geneticists, physicians, and genetic counselors.

The majority of human diseases involve multiple genes that may interact with each other and whose effects are often mediated by the environment. Because complex diseases such as cardiovascular disease, diabetes, and certain cancers are highly prevalent in the general population, the capacity to identify those at increased risk could lead to preventive measures (lifestyle changes) and targeted intervention strategies designed to modify risk and/or prevent premature onset of disease. Many of the genes contributing to multifactorial diseases have not yet been characterized, and detecting DNA sequence variation that predisposes to such diseases is often beyond our diagnostic capabilities. Fortunately, several genes implicated in complex diseases or common forms of cancer have been characterized in which specific mutations or variants that are common in the population contribute to disease. One example of such a gene is apolipoprotein (apo) E. Apolipoprotein E is a structural constituent of several lipoprotein species and plays a major role in lipid metabolism by mediating cellular uptake of lipoprotein particles (86). The human apolipoprotein E gene is polymorphic, with three common alleles (ε2, ε3, and ε4) (87). Various studies have shown that the effects of this gene are relatively consistent across ethnically and geographically diverse populations: the average effect of the ε2 allele is to lower total serum cholesterol levels, while the average effect of the ε4 allele is to raise them (88, 89). The ε2 allele is hypothesized to have a protective effect on the development of atherosclerosis (90) because it is associated with lower cholesterol levels and is more frequent in patients with no or minimal atherosclerotic involvement (91). Conversely, a number of epidemiologic studies have reported an association of the ε4 allele with cardiovascular disease (92-94). The multifactorial etiology of, and environmental influences on, cardiovascular disease make it difficult to accurately predict disease risk for specific individuals. However, efforts to reduce the prevalence of known risk factors in the general population, and in particular in at-risk subgroups, are effective intervention strategies.
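The "average effect" of an allele at a multiallelic locus such as apolipoprotein E can be computed from genotype means and allele frequencies. The sketch below computes each allele's average excess: the mean trait value of genotypes carrying that allele, with the partner allele drawn at its population frequency, minus the population mean. Under Hardy-Weinberg proportions this equals the allele's average effect. The allele frequencies and genotype means are invented to illustrate the arithmetic and are not estimates from the studies cited in the text; the function names are hypothetical.

```python
# Hypothetical allele frequencies and genotype means (total cholesterol,
# mg/dL). These numbers are invented for illustration only and are NOT
# estimates from the studies cited in the text.
freqs = {"ε2": 0.1, "ε3": 0.8, "ε4": 0.1}
means = {
    ("ε2", "ε2"): 170.0, ("ε2", "ε3"): 185.0, ("ε2", "ε4"): 195.0,
    ("ε3", "ε3"): 200.0, ("ε3", "ε4"): 210.0, ("ε4", "ε4"): 220.0,
}


def genotype_key(a1, a2):
    """Unordered genotype key, so ('ε3', 'ε2') and ('ε2', 'ε3') match."""
    return tuple(sorted((a1, a2)))


def population_mean(freqs, means):
    """Trait mean under Hardy-Weinberg proportions.

    Summing p_i * p_j over ordered allele pairs counts each heterozygote
    twice, giving it the Hardy-Weinberg frequency 2 * p_i * p_j.
    """
    return sum(freqs[i] * freqs[j] * means[genotype_key(i, j)]
               for i in freqs for j in freqs)


def average_excess(freqs, means):
    """Mean of genotypes carrying allele i (partner allele drawn at its
    population frequency) minus the population mean."""
    mu = population_mean(freqs, means)
    return {
        i: sum(freqs[j] * means[genotype_key(i, j)] for j in freqs) - mu
        for i in freqs
    }


for allele, excess in average_excess(freqs, means).items():
    print(f"{allele}: {excess:+.1f} mg/dL")
# Prints ε2: -14.5, ε3: +0.5, ε4: +10.5 (mg/dL)
```

With these invented values the population mean is 199 mg/dL, and the frequency-weighted average excesses sum to zero, as they must; the signs mirror the qualitative pattern described in the text (ε2 lowering, ε4 raising cholesterol).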

The recent identification of genes influencing hereditary breast and ovarian cancers heightened enthusiasm that such discoveries would improve the ability to identify individuals most at risk of developing breast and ovarian cancer and would be key to better understanding all forms of cancer. Breast cancer is the most common form of cancer among women in westernized countries; cumulative lifetime risk for non-Jewish women in the general population is approximately 12 percent, but risk may approach 85 percent in families of Ashkenazi Jews that carry disease-associated mutations (95). Breast cancer susceptibility genes that may account for 3-7 percent of all familial breast cancer cases were recently isolated and characterized (BRCA1 in 1994 (96) and BRCA2 in 1995 (97)). Despite these highly publicized discoveries, reliable population screening programs and effective treatment and prevention options are not yet available. Difficulties in relating DNA variation to breast cancer risk are attributable to our current lack of knowledge regarding the function and regulation of the BRCA genes and to a plethora of nonrecurrent mutations (seen in only one or a few families) (98). Researchers have identified more than 235 different sequence variations within BRCA1 and approximately 100 mutations in BRCA2 (99). Prevalence estimates for the various mutations in the general population are not yet available, and the risk of disease imparted by specific mutations remains unknown. The heterogeneity of variation within the breast cancer genes may reduce the effectiveness of potential diagnostic tests, leading to high frequencies of both false-positive and false-negative results. Commercial tests are now available to detect specific mutations, and numerous other diagnostic tests are undoubtedly in development. However, the ability to detect mutations influencing disease risk may quickly outdistance our ability to develop effective measures for prevention and treatment.
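How mutational heterogeneity undermines a test that screens for specific mutations can be shown with two standard calculations. If a test examines only a fixed panel of mutations, carriers of any other (for example, nonrecurrent) mutation are missed, so sensitivity cannot exceed the share of carriers whose mutation is on the panel; the predictive values then follow from Bayes' theorem and depend strongly on carrier prevalence. The sketch below assumes that each carrier has exactly one of the listed mutations, and the mutation labels, shares, prevalence, and specificity are hypothetical values chosen only to illustrate the arithmetic.

```python
import math


def panel_sensitivity(mutation_shares, panel):
    """Fraction of carriers detected by a test that screens only `panel`.

    mutation_shares maps each disease-associated mutation to the share of
    carriers who carry it (shares sum to 1). Carriers of mutations not on
    the panel are missed and become false negatives.
    """
    return math.fsum(share for mut, share in mutation_shares.items() if mut in panel)


def predictive_values(prevalence, sensitivity, specificity):
    """Positive and negative predictive values via Bayes' theorem."""
    tp = sensitivity * prevalence                # carriers testing positive
    fn = (1.0 - sensitivity) * prevalence        # carriers testing negative
    tn = specificity * (1.0 - prevalence)        # noncarriers testing negative
    fp = (1.0 - specificity) * (1.0 - prevalence)  # noncarriers testing positive
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical values, not estimates for BRCA1 or BRCA2.
shares = {"mut_A": 0.30, "mut_B": 0.20, "mut_C": 0.10, "private_mutations": 0.40}
sens = panel_sensitivity(shares, panel={"mut_A", "mut_B"})
ppv, npv = predictive_values(prevalence=0.002, sensitivity=sens, specificity=0.999)
print(f"sensitivity={sens:.2f}  PPV={ppv:.3f}  NPV={npv:.5f}")
```

With half of the carriers harboring mutations outside the panel, half of all carriers receive a negative result, and at a low population prevalence even a 0.1 percent false-positive rate means that roughly half of the positive results come from noncarriers.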

The progressive characterization of disease genes through the Human Genome Project, coupled with an improved ability to identify the molecular defects contributing to disease, is expected to revolutionize the molecular diagnosis of genetic diseases. Improvements in the ability to diagnose genetic disease at the DNA level may advance our knowledge of genetic disease etiology, but such progress requires a concomitant acceleration in therapeutic, intervention, and prevention options. Dissemination of technologic developments associated with genome research (such as the polymerase chain reaction and DNA sequencing) to the fields of medical genetics and genetic epidemiology must not only enhance our ability to diagnose and predict genetic disease but also provide future directions for prevention and treatment.

Ethical and social issues in diagnostic molecular genetics

Prior to the genome initiative, ethical issues in molecular genetics focused primarily on monitoring and regulating experimentation in recombinant DNA and genetic engineering. With the inception of the Human Genome Project, a joint National Institutes of Health/Department of Energy working group, the Ethical, Legal, and Social Issues (ELSI) Program, has been established to examine various issues associated with the generation and dissemination of a vast array of genomic information. High-priority issues initially targeted by the ELSI Program for the development of policies and guidelines included: 1) the integration and impact of new genetic tests and the debate over population screening; 2) privacy and confidentiality of genetic information; 3) genetic counseling and reproductive decisions influenced by diagnostic results; and 4) public education (100). In recent years, the ELSI Program has also emphasized issues such as the potential for genetic discrimination, educating physicians in the advantages and limitations of genetic data, quality control in DNA testing laboratories, and defining guidelines for obtaining informed consent for genetic research (101).

As new genetic assays are introduced into clinical practice, rigorous adherence to established protocols and quality assurance are of paramount importance. Attention must be directed toward the debate over implementing population-wide screening programs as routine practice in clinical medicine to detect those at increased genetic risk. The increasing ability to diagnose individuals at risk for genetic diseases for which there are no therapeutic options will require enactment of measures to 1) prevent insurance and/or employment discrimination against asymptomatic carriers and 2) accommodate the psychologic needs of those who are likely to develop a late-onset condition. Improvement of noninvasive prenatal and preimplantation diagnostic procedures is creating an immediate need to explore the ethical dilemmas and difficult reproductive choices faced by prospective parents known to carry disease-associated genes (102, 103). Increasing public awareness of the availability, benefits, and limitations of molecular diagnostic tests is anticipated to improve health care delivery while minimizing the potential for psychologic and social stigmatization. The impending explosion in the number of well-characterized human disease genes and new abilities to diagnose genetic disorders will likely necessitate the development of novel avenues for education and genetic testing. Careful integration of genetic information with a practical system for characterizing and resolving ethical and social issues will provide future directions for the fields of molecular genetics and clinical medicine.

CONCLUSIONS

Rapid discoveries of novel genes for a variety of human diseases are anticipated as genomic maps become more detailed and methods for mapping and characterizing disease genes become more refined. Recent and continuing developments in genome technology and analytic methods provide the tools and raw materials for unraveling the complexities of the common chronic diseases and common forms of cancer. Perhaps the greatest utility of the vast genetic information being generated by the human genome initiative lies in primary prevention programs. Methods for identifying asymptomatic individuals at risk for genetic disease and the development of more efficacious intervention strategies are becoming paramount as health care costs escalate and medical genetics shifts to early detection and prevention of disease. With these abilities will come the need to fully integrate genetic information into large prospective studies and intervention trials to accurately predict disease risk and synthesize new approaches to risk reduction. Understanding the role of genes in human disease will improve our understanding of genetic disease etiology as well as our ability to predict disease. Insight into the genetic basis of chronic disease etiology will have immediate impact by suggesting novel therapeutic approaches and aiding new drug discovery.

ACKNOWLEDGMENTS

This work was supported by grants HL51021 and HL54481 from the National Heart, Lung, and Blood Institute.

The authors thank their collaborators and friends in the Rochester Family Heart Study, the Atherosclerosis Risk in Communities project, and the Starr County Health Studies for years of support and encouragement.

REFERENCES

1. Neel JV, Schull WJ. Human heredity. Chicago, IL: The University of Chicago Press, 1954.

2. Morton NE, Chung CS, eds. Genetic epidemiology. New York, NY: Academic Press, 1978.

3. Sing CF, Skolnick M, eds. Genetic analysis of common diseases: applications to predictive factors in coronary disease. New York, NY: Alan R Liss, 1979.

4. Khoury MJ, Beaty TH, Cohen BH. Fundamentals of genetic epidemiology. New York, NY: Oxford University Press, 1993.

5. Weiss KM. Genetic variation and human disease: principles and evolutionary approaches. Cambridge, United Kingdom: Cambridge University Press, 1993.

6. Kerem B, Rommens JM, Buchanan JA, et al. Identification of the cystic fibrosis gene: genetic analysis. Science 1989;245:1073-80.

7. Riordan JR, Rommens JM, Kerem B, et al. Identification of the cystic fibrosis gene: cloning and characterization of complementary DNA. Science 1989;245:1066-73.

8. Rommens JM, Iannuzzi MC, Kerem B, et al. Identification of the cystic fibrosis gene: chromosome walking and jumping. Science 1989;245:1059-65.

9. Knudson AG Jr. Mutation and cancer: statistical study of retinoblastoma. Proc Natl Acad Sci U S A 1971;68:820-3.

10. Fearon ER, Vogelstein B. A genetic model for colorectal tumorigenesis. Cell 1990;61:759-67.

11. Harris H. The principles of human biochemical genetics. 3rd rev ed. Amsterdam, The Netherlands: Elsevier/North-Holland Biomedical Press, 1980.

12. Green ED, Waterston RH. The Human Genome Project: prospects and implications for clinical medicine. JAMA 1991;266:1966-75.

13. Engel LW. The Human Genome Project: history, goals, and progress to date. Arch Pathol Lab Med 1993;117:459-65.

14. Olson MV. The Human Genome Project. Proc Natl Acad Sci U S A 1993;90:4338-44.

15. Green ED, Cox DR, Myers RM. The Human Genome Project and its impact on the study of human disease. In: Scriver CR, Beaudet AL, Sly WS, et al., eds. The metabolic and molecular bases of inherited disease. Vol 1. 7th ed. New York, NY: McGraw-Hill, 1995:401-36.

16. Morton NE. Parameters of the human genome. Proc Natl Acad Sci U S A 1991;88:7474-6.

17. Bishop JO. The gene numbers game. Cell 1974;2:81-6.

18. Nowak R. Mining treasures from "junk DNA." (News). Science 1994;263:608-10.

19. Cantor CR. Orchestrating the Human Genome Project. Science 1990;248:49-51.

20. Watson JD. The Human Genome Project: past, present, and future. Science 1990;248:44-9.

21. Understanding our genetic inheritance. The US Human Genome Project: the first five years, FY 1991-1995. Bethesda, MD: Department of Health and Human Services, Public Health Service, National Institutes of Health, National Center for Human Genome Research; and US Department of Energy, Office of Energy Research, Office of Health and Environmental Research, Human Genome Program, 1990.

22. Verkerk AJMH, Pieretti M, Sutcliffe JS, et al. Identification of a gene (FMR-1) containing a CGG repeat coincident with a breakpoint cluster region exhibiting length variation in fragile X syndrome. Cell 1991;65:905-14.

23. Royer-Pokora B, Kunkel LM, Monaco AP, et al. Cloning the gene for an inherited human disorder (chronic granulomatous disease) on the basis of its chromosomal location. Nature 1986;322:32-8.

24. Weber JL, May PE. Abundant class of human DNA polymorphisms which can be typed using the polymerase chain reaction. Am J Hum Genet 1989;44:388-96.

25. Gyapay G, Morissette J, Vignal A, et al. The 1993-94 Généthon human genetic linkage map. Nat Genet 1994;7(spec no):246-339.

26. Dib C, Fauré S, Fizames C, et al. A comprehensive genetic map of the human genome based on 5,264 microsatellites. Nature 1996;380:152-4.

27. Olson M, Hood L, Cantor C, et al. A common language for physical mapping of the human genome. Science 1989;245:1434-5.

28. Hudson TJ, Stein LD, Gerety SS, et al. An STS-based map of the human genome. Science 1995;270:1945-54.

29. Adams MD, Kerlavage AR, Fleischmann RD, et al. Initial assessment of human gene diversity and expression patterns based upon 83 million nucleotides of cDNA sequence. Nature 1995;377(6547 Suppl):3-174.

30. Chen EY, Liao YC, Smith DH, et al. The human growth hormone locus: nucleotide sequence, biology, and evolution. Genomics 1989;4:479-97.

31. Martin-Gallardo A, McCombie WR, Gocayne JD, et al. Automated DNA sequencing and analysis of 106 kilobases from human chromosome 19q13.3. Nat Genet 1992;1:34-9.

32. Rowen L, Koop BF, Hood L. The complete 685-kilobase DNA sequence of the human β T cell receptor locus. Science 1996;272:1755-62.

33. Wilson RK, Chen C, Avdalovic N, et al. Development of an automated procedure for fluorescent DNA sequencing. Genomics 1990;6:626-34.

34. Adams MD, Fields C, Venter JC, eds. Automated DNA sequencing and analysis. London, United Kingdom: Academic Press, 1994.

35. Ju J, Glazer AN, Mathies RA. Energy transfer primers: a new fluorescence labeling paradigm for DNA sequencing and analysis. Nat Med 1996;2:246-9.

36. Barrell B. DNA sequencing: present limitations and prospects for the future. FASEB J 1991;5:40-5.

37. Hunkapiller T, Kaiser RJ, Koop BF, et al. Large-scale and automated DNA sequence determination. Science 1991;254:59-67.

38. Pearson ML, Söll D. The Human Genome Project: a paradigm for information management in the life sciences. FASEB J 1991;5:35-9.

39. Smith RF. Perspectives: sequence data base searching in the era of large-scale genomic sequencing. Genome Res 1996;6:653-60.

40. Boguski MS, Schuler GD. ESTablishing a human transcript map. Nat Genet 1995;10:369-71.

41. Online Mendelian Inheritance in Man (OMIM). Bethesda, MD: National Center for Biotechnology Information, National Library of Medicine (www.ncbi.nlm.nih.gov/Omim). Accessed 1996.

42. Ott J. Analysis of human genetic linkage. Baltimore, MD: Johns Hopkins University Press, 1991.

43. Weeks DE, Lange K. The affected-pedigree-member method of linkage analysis. Am J Hum Genet 1988;42:315-26.

44. Amos CI. Robust variance-components approach for assessing genetic linkage in pedigrees. Am J Hum Genet 1994;54:535-43.

45. Risch N, Zhang H. Extreme discordant sib pairs for mapping quantitative trait loci in humans. Science 1995;268:1584-9.

46. DeFronzo RA, Bonadonna RC, Ferrannini E. Pathogenesis of NIDDM: a balanced overview. Diabetes Care 1992;15:318-68.

47. Hanis CL, Boerwinkle E, Chakraborty R, et al. A genome-wide search for human non-insulin-dependent (type 2) diabetes genes reveals a major susceptibility locus on chromosome 2. Nat Genet 1996;13:161-6.

48. Greenberg DA. Linkage analysis of "necessary" disease loci versus "susceptibility" loci. Am J Hum Genet 1993;52:135-43.

49. Spielman RS, McGinnis RE, Ewens WJ. Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet 1993;52:506-16.

50. Thomson G. Mapping disease genes: family-based association studies. Am J Hum Genet 1995;57:487-98.

51. Collins FS. Positional cloning: let's not call it reverse anymore. (News). Nat Genet 1992;1:3-6.

52. Collins FS. Identifying human disease genes by positional cloning. Harvey Lect 1990-1991;86:149-64.

53. Collins FS. Positional cloning moves from perditional to traditional. Nat Genet 1995;9:347-50.

54. Boerwinkle E, Ellsworth DL, Hallman DM, et al. Genetic analysis of atherosclerosis: a research paradigm for the common chronic diseases. Hum Mol Genet 1996;5(spec no):1405-10.

55. Bennett ST, Lucassen AM, Gough SCL, et al. Susceptibility to human type 1 diabetes at IDDM2 is determined by tandem repeat variation at the insulin gene minisatellite locus. Nat Genet 1995;9:284-92.

56. Kennedy GC, German MS, Rutter WJ. The minisatellite in the diabetes susceptibility locus IDDM2 regulates insulin transcription. Nat Genet 1995;9:293-8.

57. Berry R, Stevens TJ, Walter NAR, et al. Gene-based sequence-tagged-sites (STSs) as the basis for a human gene map. Nat Genet 1995;10:415-23.

58. Schuler GD, Boguski MS, Stewart EA, et al. A gene map of the human genome. Science 1996;274:540-6.

59. Bassett DE Jr, Boguski MS, Spencer F, et al. Comparative genomics, genome cross-referencing and XREFdb. Trends Genet 1995;11:372-3.

60. Boguski MS, Lowe TMJ, Tolstoshev CM. dbEST: database for "expressed sequence tags". Nat Genet 1993;4:332-3.

61. Landegren U, ed. Laboratory protocols for mutation detection. Oxford, United Kingdom: Oxford University Press, 1996.

62. Orita M, Iwahana H, Kanazawa H, et al. Detection of polymorphisms of human DNA by gel electrophoresis as single-strand conformation polymorphisms. Proc Natl Acad Sci U S A 1989;86:2766-70.

63. Brow MD, Oldenburg M, Lyamichev V, et al. Mutation detection by cleavase fragment length polymorphism analysis. Focus 1996;18:2-5.

64. Jin L, Underhill PA, Oefner PJ, et al. Systematic search for polymorphisms in the human genome using denaturing high-performance liquid chromatography (DHPLC). (Abstract). Am J Hum Genet 1995;57(suppl):A26.

65. Grompe M. The rapid detection of unknown mutations in nucleic acids. Nat Genet 1993;5:111-17.

66. Olson MV. A time to sequence. Science 1995;270:394-6.

67. Templeton AR, Boerwinkle E, Sing CF. A cladistic analysis of phenotypic associations with haplotypes inferred from restriction endonuclease mapping. I. Basic theory and an analysis of alcohol dehydrogenase activity in Drosophila. Genetics 1987;117:343-51.

68. Julier C, Lucassen A, Villedieu P, et al. Multiple DNA variant association analysis: application to the insulin gene region in type I diabetes. Am J Hum Genet 1994;55:1247-54.

69. Sayle RA, Milner-White EJ. RASMOL: biomolecular graphics for all. Trends Biochem Sci 1995;20:374-6.

70. Green P, Lipman D, Hillier L, et al. Ancient conserved regions in new gene sequences and the protein databases. Science 1993;259:1711-16.

71. Fishel R, Lescoe M, Rao MRS, et al. The human mutator gene homolog MSH2 and its association with hereditary nonpolyposis colon cancer. Cell 1993;75:1027-38.

72. Piedrahita JA, Zhang SH, Hagaman JR, et al. Generation of mice carrying a mutant apolipoprotein E gene inactivated by gene targeting in embryonic stem cells. Proc Natl Acad Sci U S A 1992;89:4471-5.

73. Rubin EM, Ishida BY, Clift SM, et al. Expression of human apolipoprotein A-I in transgenic mice results in reduced plasma levels of murine apolipoprotein A-I and the appearance of two new high density lipoprotein size subclasses. Proc Natl Acad Sci U S A 1991;88:434-8.

74. Schultz JR, Rubin EM. The properties of HDL in genetically engineered mice. Curr Opin Lipidol 1994;5:126-37.

75. Smithies O, Maeda N. Gene targeting approaches to complex genetic diseases: atherosclerosis and essential hypertension. Proc Natl Acad Sci U S A 1995;92:5266-72.

76. Zhang SH, Reddick RL, Piedrahita JA, et al. Spontaneous hypercholesterolemia and arterial lesions in mice lacking apolipoprotein E. Science 1992;258:468-71.

77. Schultz JR, Verstuyft JG, Gong EL, et al. Protein composition determines the anti-atherogenic properties of HDL in transgenic mice. Nature 1993;365:762-4.

78. Warden CH, Hedrick CC, Qiao JH, et al. Atherosclerosis in transgenic mice overexpressing apolipoprotein A-II. Science 1993;261:469-72.

79. Antonarakis SE. Diagnosis of genetic disorders at the DNA level. N Engl J Med 1989;320:153-63.

80. Kawasaki ES. The polymerase chain reaction: its use in the molecular characterization and diagnosis of cancers. Cancer Invest 1992;10:417-29.

81. Peter JB. The polymerase chain reaction: amplifying our options. Rev Infect Dis 1991;13:166-71.

82. Diagnosis of Duchenne and Becker muscular dystrophies by polymerase chain reaction: a multicenter study. JAMA 1992;267:2609-15.

83. Fujimura FK. Cystic fibrosis gene analysis: recent diagnostic applications. Clin Biochem 1991;24:353-61.

84. Jass JR, Cottier DS, Jeevaratnam P, et al. Diagnostic use of microsatellite instability in hereditary non-polyposis colorectal cancer. Lancet 1995;346:1200-1.

85. Plummer SJ, Anton-Culver H, Webster L, et al. Detection of BRCA1 mutations by the protein truncation test. Hum Mol Genet 1995;4:1989-91.

86. Mahley RW, Innerarity TL, Rall SC Jr, et al. Apolipoprotein E: genetic variants provide insights into its structure and function. Curr Opin Lipidol 1990;1:87-95.

87. Zannis VI, Breslow JL. Human very low density lipoprotein apolipoprotein E isoprotein polymorphism is explained by genetic variation and posttranslational modification. Biochemistry 1981;20:1033-41.

88. Boerwinkle E, Utermann G. Simultaneous effects of the apolipoprotein E polymorphism on apolipoprotein E, apolipoprotein B, and cholesterol metabolism. Am J Hum Genet 1988;42:104-12.

89. Hallman DM, Boerwinkle E, Saha N, et al. The apolipoprotein E polymorphism: a comparison of allele frequencies and effects in nine populations. Am J Hum Genet 1991;49:338-49.

90. Davignon J, Gregg RE, Sing CF. Apolipoprotein E polymorphism and atherosclerosis. Arteriosclerosis 1988;8:1-21.

91. Menzel HJ, Kladetzky RG, Assmann G. Apolipoprotein E polymorphism and coronary artery disease. Arteriosclerosis 1983;3:310-15.

92. Lenzen HJ, Assmann G, Buchwalsky R, et al. Association of apolipoprotein E polymorphism, low-density lipoprotein cholesterol, and coronary artery disease. Clin Chem 1986;32:778-81.

93. Nieminen MS, Mattila KJ, Aalto-Setälä K, et al. Lipoproteins and their genetic variation in subjects with and without angiographically verified coronary artery disease. Arterioscler Thromb 1992;12:58-69.

94. Stengard JH, Zerba KE, Pekkanen J, et al. Apolipoprotein E polymorphism predicts death from coronary heart disease in a longitudinal study of elderly Finnish men. Circulation 1995;91:265-9.

95. Szabo CI, King MC. Inherited breast and ovarian cancer. Hum Mol Genet 1995;4(spec no):1811-17.

96. Miki Y, Swensen J, Shattuck-Eidens D, et al. A strong candidate for the breast and ovarian cancer susceptibility gene BRCA1. Science 1994;266:66-71.

97. Wooster R, Bignell G, Lancaster J, et al. Identification of the breast cancer susceptibility gene BRCA2. Nature 1995;378:789-92.

98. Collins FS. BRCA1: lots of mutations, lots of dilemmas. (Editorial). N Engl J Med 1996;334:186-8.

99. Kahn P. Coming to grips with genes and risk. (News). Science 1996;274:496-8.

100. Durfy SJ. Ethics and the Human Genome Project. Arch Pathol Lab Med 1993;117:466-9.

101. Marshall E. The genome project's conscience. (News). Science 1996;274:488-90.

102. Verlinsky Y, Pergament E, Strom C. The preimplantation genetic diagnosis of genetic diseases. J In Vitro Fert Embryo Transf 1990;7:1-5.

103. Grody WW. Molecular genetics: introduction. Arch Pathol Lab Med 1993;117:470-2.
