Genomics 86 (2005) 505–517
www.elsevier.com/locate/ygeno
Integrating QTL and high-density SNP analyses in mice to identify
Insig2 as a susceptibility gene for plasma cholesterol levels☆
Alessandra C. Cervino^a, Guoya Li^a, Steve Edwards^a, Jun Zhu^a, Cathy Laurie^a, George Tokiwa^a,
Pek Yee Lum^a, Susanna Wang^b, Lawrence W. Castellini^b, Aldons J. Lusis^b,
Sonia Carlson^a, Alan B. Sachs^a, Eric E. Schadt^a,*
^a Rosetta Inpharmatics, LLC, a Wholly Owned Subsidiary of Merck & Co., Inc., Seattle, WA 98109, USA
^b Division of Cardiology, Department of Medicine, and Department of Microbiology, Immunology, and Molecular Genetics,
David Geffen School of Medicine at the University of California at Los Angeles, Los Angeles, CA 90095-1679, USA
Received 19 May 2005; accepted 25 July 2005
Available online 29 August 2005
Abstract
The use of inbred strains of mice to dissect the genetic complexity of common diseases offers a viable alternative to human studies, given the
control over experimental parameters that can be exercised. Central to efforts to map susceptibility loci for common diseases in mice is a
comprehensive map of DNA variation among the common inbred strains of mice. Here we present one of the most comprehensive high-density,
single nucleotide polymorphism (SNP) maps of mice constructed to date. This map consists of 10,350 SNPs genotyped in 62 strains of inbred
mice. We demonstrate the utility of these data via a novel integrative genomics approach to mapping susceptibility loci for complex traits. By
integrating in silico quantitative trait locus (QTL) mapping with progressive QTL mapping strategies in segregating mouse populations that
leverage large-scale mapping of the genetic determinants of gene expression traits, we not only facilitate identification of candidate quantitative
trait genes, but also protect against spurious associations that can arise in genetic association studies due to allelic association among unlinked
markers. Application of this approach to our high-density SNP map and two previously described F2 crosses between strains C57BL/6J (B6) and
DBA/2J and between B6 ApoE−/− and C3H/HeJ ApoE−/− results in the identification of Insig2 as a strong candidate susceptibility gene for total
plasma cholesterol levels.
© 2005 Elsevier Inc. All rights reserved.
Keywords: SNP; QTL; Complex traits; Insig2; Cholesterol; Gene expression; Genetics
☆ The Mouse Phenome Database (http://www.jax.org/phenome) accession
numbers for the phenomes discussed herein are MPD:8 (chol_fat from the
Albers1 set), MPD:29 (total_chol from the Paigen1 set), and MPD:99 (chol_17
from the Paigen2 set).
* Corresponding author.
E-mail address: eric_schadt@merck.com (E.E. Schadt).
doi:10.1016/j.ygeno.2005.07.010
One class of genetic approaches to dissecting complex
disease traits treats variations in DNA as markers and tests
whether such markers are associated with a given disease trait
of interest. Markers found to cosegregate or otherwise associate
with a given disease highlight regions in the genome that are
considered linked to the disease and that at least partially
explain susceptibility to the disease. The use of inbred strains
of mice in this setting has proven to be a viable alternative to
human genetic studies given the degree of control that can be
exercised over experimental parameters such as environment,
breeding scheme, and detailed phenotyping. More recently,
quantitative trait locus (QTL) mapping has led to the
identification of numerous genetic loci for a variety of traits
relevant to human diseases, including behavioral differences,
lipid levels, obesity, and atherosclerosis [1]. Although the use
of QTL analysis to identify complex disease genes can still be a
long and laborious undertaking, important progress in the
development of genomics and bioinformatics tools, such as the
creation of whole-genome congenic strains, microarray technologies,
and the availability of the genomic sequence from
several different strains, is serving to accelerate this process.
Previously, we have reported on exploiting genomic variation
in mice to infer causal relationships among gene
expression traits and between gene expression and disease traits
[2–4]. One intuitive strategy in particular involved examining
genes located in regions of the genome genetically linked to the
disease trait and then testing for colocalization of cis-acting
expression QTL (eQTL) for those genes residing in the region
linked to the disease and correlated with disease-related
phenotypes. These genes serve as natural candidates for disease
phenotypes of interest. One problem with this approach arises if
the gene contributing to a particular disease is not causally
associated with respect to transcript abundances, but instead
involves some other mechanism of action unrelated to changes in
transcript levels (e.g., alternative splicing or amino acid changes
leading to changes in protein function). In such cases, or in cases
in which the relevant tissues cannot be profiled or the particular
state of a system in which changes in expression lead to disease
cannot be determined, there will be no cis eQTL colocalizing
with the disease trait QTL of interest. Further, the number of cis
eQTL with significant lod scores in a given linkage region of
interest that are independent but closely linked to the QTL for the
trait of interest also complicates this process, as we have
previously described [5]. These shortcomings, combined with
the difficulty in general in positionally cloning quantitative trait
genes underlying QTL, speak to the need for alternative methods
to identify more efficiently causal genes for disease.
One alternative approach that has been recently detailed
exploits naturally occurring DNA variation over any number of
common strains of mice to recover haplotype structures in
these mice, which in turn can be used to carry out genome-wide
association scans [6,7]. Such “in silico” genetics procedures
have been applied to collections of inbred lines to narrow
significantly genomic regions previously linked to disease in
segregating mouse populations, even allowing for the direct
identification of the causal gene and ultimately the causal
polymorphism giving rise to the disease trait [6,7]. A necessary
first step in carrying out this type of approach is a dense map of
markers informative over the strains of mice commonly used in
biological research. Here we present one of the most
comprehensive high-density single nucleotide polymorphism
(SNP) maps constructed to date for common strains of
laboratory mice. Our map consists of 10,350 SNPs genotyped
in 62 strains of inbred mice, including 56 common strains
ordered from The Jackson Laboratory. Using hierarchical trees,
phylogenetic analysis, and the reconstruction of haplotype
blocks, we investigated the ancestral relationships among these
62 strains of mice. The SNP data are essential to establish
ancestral relationships among the strains and to leverage these
relationships to reconstruct haplotypes for any region of
interest in the genome, thereby facilitating more rapid
identification of genes underlying complex disease traits.
However, we also show that there is strong, widespread
nonrandom association operating among unlinked loci (linkage
disequilibrium) that has the potential to lead to spurious
associations in genetic association studies. To protect against
such spurious associations, we developed an integrative
genomics approach to map susceptibility loci for complex
traits. This approach involves integrating in silico (association-
based) QTL mapping with QTL mapping in segregating mouse
populations. By leveraging large-scale mapping of the genetic
determinants of gene expression traits, we not only facilitate
identification of candidate quantitative trait genes, but also
protect against spurious associations due to allelic association
among unlinked markers. Application of this novel process to
our high-density SNP map and two previously described F2
crosses between strains C57BL/6J (B6) and DBA/2J (DBA)
and between B6 ApoE−/− and C3H/HeJ (C3H) ApoE−/−
results in the identification of Insig2 as a strong candidate gene
for total plasma cholesterol levels.
Results
SNP distribution
The exact number of SNPs between all common inbred
strains of mice is still unknown, although the completion of the
sequencing of the mouse genome and several large-scale SNP
discovery studies have begun to shed light on the extent of DNA
differences among common laboratory strains of mice. The
distribution of SNPs across the genome does not appear to be
random, but rather bimodal, with areas of high SNP frequency
(~40 SNPs per 10 kb) and areas of low SNP frequency (~0.5 SNPs per 10 kb),
the latter corresponding to areas of shared ancestry [8]. From some of
the more comprehensive studies aimed at assessing DNA
variation in common strains of mice, SNP frequency estimates
range from 1 SNP every 700+ bp for related inbred strains of
mice to 1 SNP every 150 bp for more distantly related mice [9]. These
estimates are consistent with our own estimates obtained from
the Celera Mouse Genome Database, in which more than
2,400,000 SNPs are represented for five inbred strains of mice.
In August 2004, Celera and NCBI had approximately
60,000 high-confidence SNPs represented in their publicly
available databases. Others have significantly leveraged the
complete set of SNPs represented in the Celera database to
construct a map of 10,990 SNPs across 48 inbred lines of mice
and demonstrated the utility of such high-density maps in
carrying out in silico mapping studies [10]. While the
proprietary Celera database represented in the commercially
available Celera Discovery System offers the most comprehensive
source of SNP data available in mouse, the focus on
only 5 inbred strains of mice may limit the ability to derive
more general ancestral relationships among more varied strains
of mice due to strain-specific biases in the SNPs represented.
We sought to construct a high-density SNP map that was
complementary to those provided by Celera and Pletcher et al.
[10]. In an attempt to reduce strain-specific biases, we chose to
limit the selection of SNPs coming from the Celera database,
so that SNPs were chosen from this database only when
necessary to cover a specific region of the genome that could
not otherwise be covered.
Table 1
Summary description of SNP distribution per chromosome for the high-density SNP map described in the text
Chromosome   Chromosome length in Mb^a   No. of SNPs   Average distance in kb (SD)
1            195                         866           230 (261)
2            182                         795           228 (282)
3            161                         677           242 (270)
4            154                         632           244 (299)
5            149                         613           245 (291)
6            150                         588           256 (321)
7            133                         610           225 (292)
8            129                         565           226 (234)
9            124                         522           242 (277)
10           131                         447           291 (337)
11           122                         586           207 (269)
12           115                         472           238 (250)
13           116                         513           224 (234)
14           117                         486           239 (265)
15           104                         425           249 (286)
16           99                          447           216 (279)
17           94                          409           234 (258)
18           91                          386           234 (286)
19           61                          311           197 (270)
^a Based on Ensembl.
The 60,000 SNPs available in the public domain (dbSNP) at
the time we designed our SNP panel give rise to an average SNP
density of 1 SNP every ~40 kb. We selected an initial set of
15,360 SNPs for genotyping by Illumina from this set (see
Methods). Of these 15,360 SNPs we were able to map 12,572 to
the mouse genome (NCBI Build 32) with high confidence.
From this reduced set of SNPs Illumina was able to convert
10,350 into valid genotyping assays. This set of SNPs provided
for good coverage on all chromosomes, with an average SNP
spacing of 200 kb (Table 1). On average, roughly 33% of the
validated SNPs were informative between any pair of strains,
allowing for the selection of high-density SNP panels for use in
genotyping experimental crosses constructed from any pair of
strains represented in our set. We have seen that increasing the
number of markers used in a QTL analysis from the
recommended 100 to 200 for a moderate-sized cross [11] to
1500 or more can improve linkage results by increasing the
accuracy of the positioning of QTL (manuscript in preparation).
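As a rough check on the coverage figures, the per-chromosome values in Table 1 can be summed to give the total marker count and a genome-wide mean spacing. The sketch below is illustrative only: the names `TABLE1` and `coverage_summary` are ours, and the spacing it reports is a simple length-over-count ratio, not the per-chromosome averages listed in the table.

```python
# (chromosome length in Mb, number of SNPs) per autosome, copied from Table 1.
TABLE1 = {
    1: (195, 866), 2: (182, 795), 3: (161, 677), 4: (154, 632),
    5: (149, 613), 6: (150, 588), 7: (133, 610), 8: (129, 565),
    9: (124, 522), 10: (131, 447), 11: (122, 586), 12: (115, 472),
    13: (116, 513), 14: (117, 486), 15: (104, 425), 16: (99, 447),
    17: (94, 409), 18: (91, 386), 19: (61, 311),
}


def coverage_summary(table):
    """Return total length (Mb), total SNP count, and mean spacing (kb)."""
    total_mb = sum(length for length, _ in table.values())
    total_snps = sum(count for _, count in table.values())
    mean_spacing_kb = total_mb * 1000 / total_snps
    return total_mb, total_snps, mean_spacing_kb


total_mb, total_snps, mean_spacing_kb = coverage_summary(TABLE1)
print(f"{total_snps} SNPs over {total_mb} Mb, "
      f"about one SNP every {mean_spacing_kb:.0f} kb")
```

The summed SNP count matches the 10,350 validated assays reported in the text, which suggests that Table 1 accounts for the full panel across the 19 autosomes.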
Haplotype blocks
Due to the close genetic relationship among the various
strains, one would expect large areas of the mouse genome to
be identical by descent between any two common strains. One
recent study by Petkov et al. [12] reported an average block
size of 6 Mb (personal communication). If this were in fact the
case, one would expect 1 SNP per 200 kb to capture a large
part of the haplotype structure in mice. However, others have
provided data indicating a much finer haplotype structure. Park
et al. [13] used haplotype block analysis in an attempt to map a
quantitative trait gene for a complex disease trait mapping to
the proximal end of chromosome 19. Their data appear to be
consistent with haplotype blocks that are greater than 80 kb,
but with the largest observed haplotype block spanning only
~150 kb in their chromosome 19 region of interest. Because
the experimental approach employed by Park et al. did not
sequence the entire region of interest, but instead investigated
only short amplicons across the region, they could not rule out
haplotype blocks of substantially shorter length. In yet another
study, Wade et al. [8] found only two different haplotypes
among 22 of 27 sequence segments of length 500–1000 bp,
highlighting again the potential to use haplotype-based analysis
to identify candidate genes, given that regions of the mouse
genome that are identical between any two given strains
exhibiting phenotypic differences can be excluded as candidate
regions underlying such differences. Others have reported
increased complexity in the mouse genome, including microheterogeneity
in otherwise identical stretches of genomic
sequence among different strains of mice, highlighting the
potential risks in creating haplotype blocks that may contain
small varying genomic segments [14,15].
We investigated the relationship between strains by looking
at regions of the genome between two strains that were not
informative (potentially identical by descent). We calculated
the block distance as the distance between the two polymorphic
markers most closely flanking the stretch of homozygous
markers defining the block. Of course, the average block length
defined in this way is an upper bound on the true block length,
given that informative SNPs that were not genotyped could
have broken up the homozygous regions into smaller blocks.
To illustrate this point we estimated block lengths using
varying densities of SNPs for a given region. In addition, we
constructed similar blocks based on all of the SNPs available
from NCBI and Celera for the DBA/2J and C57BL/6J strains
of mice in which regions identical by descent (IBD) can be
precisely identified given the high density of SNPs [5]. In Fig.
1, described below in more detail, we demonstrate that our
high-density SNP panel can be used to accurately identify
regions that are IBD between two strains of mice. Using the
comprehensive set of SNPs available in the Celera RefSNP
database for the B6 and DBA strains of mice, wherein the
genomes for these two strains of mice have been completely
sequenced, all IBD regions between these two strains were
identified. Comparison to the regions of sharing constructed
from our high-density SNP map highlights that our dense SNP
map was able to identify the major IBD blocks in this region
(highlighted in Fig. 1).
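The block definition used above (the distance between the two polymorphic markers most closely flanking a run of non-informative markers) can be sketched as follows. This is a minimal illustration under our own assumptions: the function name, the input layout (sorted positions plus one allele per marker per strain), and the toy data are hypothetical, and runs that touch either end of the marker list are skipped because they lack a flanking polymorphic marker.

```python
def shared_blocks(positions, alleles_a, alleles_b):
    """Return (start, end) intervals bounding runs of shared alleles.

    Each interval runs from the polymorphic marker just before the run to
    the polymorphic marker just after it, so its length is an upper bound
    on the true length of the shared segment.
    """
    blocks = []
    left_flank = None   # position of the most recent polymorphic marker
    run_length = 0      # shared markers seen since left_flank
    for pos, a, b in zip(positions, alleles_a, alleles_b):
        if a == b:
            if left_flank is not None:
                run_length += 1
        else:
            if left_flank is not None and run_length > 0:
                blocks.append((left_flank, pos))
            left_flank = pos
            run_length = 0
    return blocks


# Toy example (hypothetical positions in kb and alleles).
sparse = shared_blocks([100, 200, 300, 400, 500, 600],
                       list("ACGTAC"), list("TCGTGC"))
# Adding one informative marker at 250 kb splits the block in two.
dense = shared_blocks([100, 200, 250, 300, 400, 500],
                      list("ACAGTA"), list("TCGGTG"))
print(sparse, dense)
```

The second call illustrates the point made in the text: genotyping an additional informative SNP inside a block can only shorten it, so block lengths estimated from a sparser panel are upper bounds.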
Relationship between strains based on hierarchical trees
To investigate the relationship between the 62 strains we
computed a cluster dendrogram using all of the SNPs across
the entire genome between any two strains as a measure of
similarity between them. Only one haplotype per strain was
used in the calculations since the mice were inbred, and the
number of observed heterozygotes (proportional to the
genotyping error rate in this case) was judged to be small
enough to have no significant impact on our conclusions.
Because SNPs were selected based on the information content
in the strains of interest, care must be exercised in interpreting
the tree as representative of genetic history. However, given
that our SNP selection process was not intentionally biased
toward any particular group of strains, it is likely to have
resulted in a less biased set of markers that captures more of the
true genetic history among the strains.
Fig. 1. Defining regions identical by descent (IBD) that highlight the haplotype structure through the chromosome 1 region identified in the text for the B6, DBA,
and C3H strains of mice. The top is a graph of the SNP counts between B6 and DBA in the 120- to 130-Mb chromosome 1 region, in which the counts are specific to
50-kb windows moved through the region in 10-kb increments. The Celera genome sequence for the B6 and DBA strains was used to define the SNPs, providing a
nearly complete map of DNA variation between these two strains [44]. Regions with consecutive counts that fall below 5 are considered IBD between the B6 and the
DBA strains of mice [5]. The second graph is the same as the first, but smoothed by computing a moving average through the interval. The third graph indicates
regions that are called IBD using the frequency plot given in the second graph. Regions in which the count dropped below 5 were considered IBD (curve set to 0),
while all other regions were considered heterozygous (curve set to 1). Panel 4 indicates IBD regions for the DBA and B6 strains of mice defined by the high-density
SNP map described in the text. Yellow indicates IBD regions as defined in the text (homozygous blocks), while green indicates regions that are polymorphic between
the two strains. The IBD regions identified by the high-density SNP map in this fairly narrow interval can be seen to approximate well the IBD regions identified
using the complete data from the Celera database. The bottom panel is similar to the fourth but for the B6 and C3H strains of mice. The vertical black line near 124
Mb indicates the location of the Insig2 gene.
The first similarity measure used to assess relationships
among the mouse strains was computed as the average length
over all blocks of SNPs in which a given strain pair shares the
same alleles. We expect the length of these “blocks” to be
proportional to the extent of shared ancestry (i.e., the longer the
shared blocks, the more closely related two strains are).
Hierarchical trees were generated for each chromosome using
R and its standard hclust function. Trees for the 19
autosomal chromosomes are available in the supplementary
material, but a representative tree is shown in Fig. 2A for
chromosome 1. In looking at the similarity between strains
using the longest block algorithm, it is interesting to note that
the trees vary by chromosome, suggesting that stochastic,
perhaps genetic, and even epigenetic processes were at play in
fixing DNA variation among the strains. This interchromosome
tree variation warrants further investigation to assess whether
the trees are significantly different or statistically indistinguishable.
For ease of interpretation we have colored the strains
by group, as categorized by Beck et al. [16]. It is apparent that
the clusters generally reflect the known relationships among
the different strains of mice. This fact at least partially validates
the choice of distance metric.
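To make the tree-building step concrete, the following sketch performs agglomerative clustering on a toy genotype set. It is not the analysis used here, which relied on R's hclust with a block-length similarity: as a simplifying assumption we substitute the fraction of mismatching SNPs as the distance and use average linkage, and the strain labels and genotypes are hypothetical.

```python
from itertools import combinations


def mismatch_fraction(g1, g2):
    """Fraction of markers at which two inbred strains carry different alleles."""
    return sum(a != b for a, b in zip(g1, g2)) / len(g1)


def average_linkage(genotypes):
    """Cluster strains bottom-up; return the merges in the order performed.

    genotypes maps a strain name to a string with one allele per SNP.
    Each merge is (cluster_1, cluster_2, distance), with clusters given as
    tuples of strain names.
    """
    clusters = [(name,) for name in sorted(genotypes)]
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            # Average linkage: mean distance over all cross-cluster pairs.
            d = sum(mismatch_fraction(genotypes[x], genotypes[y])
                    for x in c1 for y in c2) / (len(c1) * len(c2))
            if best is None or d < best[0]:
                best = (d, c1, c2)
        d, c1, c2 = best
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append(tuple(sorted(c1 + c2)))
        merges.append((c1, c2, d))
    return merges


# Hypothetical toy data: two closely related strains and one divergent strain.
toy = {"lab1": "AAAAAAAA", "lab2": "AAAAAAAT", "wild1": "TTTTAATT"}
merges = average_linkage(toy)
for c1, c2, d in merges:
    print(c1, c2, round(d, 4))
```

In this toy case the two similar strains join first and the divergent strain joins last, mirroring the way wild-derived strains separate from the laboratory strains at the top of the trees.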
Fig. 2. Relationships among inbred lines of mice inferred from the high-density SNP map. (A) Hierarchical tree highlighting relationships among 62 strains of mice
based on chromosome 1 SNPs. (B) Phylogenetic tree for 62 common inbred strains of mice based on SNPs in chromosome 1.
The main characteristic shared across all 19 chromosomes
is that of separating “wild-derived” mice (represented in
brown) from the remaining inbred mice at the first
bifurcation. At most it took two bifurcations in the tree to
separate all wild strains from the standard inbred laboratory
strains. As seen in Fig. 2A, the first bifurcation separates the
wild strains Mus musculus musculus, M. m. molossinus, M.
spicilegus (formerly hortulanus), M. spretus, and M. m.
castaneus from all other strains. The second bifurcation
separates the M. m. domesticus strains from the common
inbred laboratory strains, indicating a closer genetic distance
from the common inbred laboratory strains to the M. m.
domesticus subspecies. The standard laboratory mice are
reported to be descendants of both M. m. domesticus and M.
m. musculus [16]. The M. m. domesticus group, based on the
chromosome 1 tree in Fig. 2A, includes strains PERA, PERC,
WSB, WMP, TIRANO, and ZALENDE as well as IS. IS is
only partly M. m. domesticus and it is interesting to see it
cluster with the other domesticus strains. More surprising is
the clustering pattern of M. m. poschiavinus. RBF clusters
with other wild-derived strains only for chromosomes 6, 8, 9,
12, 13, 14, and 18. RBF is adjacent to PL for three of the
chromosomes (1, 3, and 7), RF for two chromosomes (2 and
17), and M16 for two other chromosomes (5 and 16).
Otherwise the separation between the “M subspecies” is
exactly what is expected: M. spicilegus and M. spretus are
the most distant strains in our tree, both strains having
diverged over 2 million years ago, followed by M. m.
domesticus, castaneus, and musculus, which diverged
approximately 1 million years ago [16]. Our data further
support the view that M. m. molossinus is derived
from M. m. musculus and M. m. castaneus [12].
Similar conclusions were reached when looking at the
genealogy tree (Fig. 2B). We reconstructed a mouse
phylogenetic tree using Phylip 3.6. Kimura’s two-parameter
model was used for calculation of genetic distance using the
SNPs on chromosome 1 [17]. The neighbor method was
selected for building the phylogenetic trees [18]. Compared to
Petkov et al. [12], our tree does not separate the group 1 and
group 2 mice they reported (the Bagg albino derives from the
Swiss mice). The minor differences between the two trees are
likely due to biases in SNP selection, so that the Petkov tree
is able to separate these closely related groups because the
SNP selection they employed was more biased toward these
strains.
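For reference, Kimura's two-parameter distance is d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of compared sites that differ by a transition and by a transversion, respectively. The sketch below is a minimal stand-alone implementation for two allele strings and is not the Phylip code; the function name and the toy sequences are ours, and sites with anything other than A, C, G, or T are skipped as an assumption about how missing calls would be handled.

```python
import math

PURINES = {"A", "G"}
BASES = {"A", "C", "G", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two equal-length allele strings."""
    n = transitions = transversions = 0
    for a, b in zip(seq1, seq2):
        if a not in BASES or b not in BASES:
            continue  # skip missing or ambiguous calls
        n += 1
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine<->pyrimidine
    if n == 0:
        raise ValueError("no comparable sites")
    p = transitions / n
    q = transversions / n
    # math.log raises ValueError if the sequences are too divergent
    # for the correction to be defined.
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


# Toy example: 10 sites with one transition (A->G) and one transversion (A->C).
d = k2p_distance("AAAAAAAAAA", "GCAAAAAAAA")
print(round(d, 4))
```

Because the correction accounts for multiple substitutions at a site, the distance exceeds the raw proportion of differing sites (0.2 in the toy example).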
Fig. 3. Map of allelic association for markers on chromosomes 8 and 14. The black points depict marker pairs for which R² is greater than 0.4 (corresponding to a p
value less than 2.3 × 10⁻⁸), where the upper triangle of the matrix corresponds to the observed marker data and the lower triangle corresponds to the permuted
marker data. There are six regions indexed by roman numerals representing correlations between marker pairs: (I) on chromosome 8 for observed genotype data, (II)
on chromosome 8 for permuted data, (III) on chromosome 14 for observed genotype data, (IV) on chromosome 14 for permuted genotype data, (V) on chromosomes
8 and 14 for observed genotype data, and (VI) on chromosomes 8 and 14 for permuted genotype data. Highlighted by the red boxes are LD blocks on chromosomes
8 and 14 that are strongly associated with each other.
Strong nonrandom associations observed among unlinked markers
We next computed all pair-wise correlations as a measure
of linkage disequilibrium (LD) between SNP markers over
the 62 inbred strains of mice to assess the degree of
nonrandom association between alleles at different loci (both
linked and unlinked). After a filter was set to eliminate SNPs
with minor allele frequencies less than 15%, a total of 6641
of the 10,350 genotyped SNPs were selected for this
association analysis. Of the pairs considered, 930,037
(4.2%) were significantly correlated at the 0.0001 significance
level (2204 would be expected by chance). To
estimate the false discovery rate (FDR) [19] at the 0.0001
level, we permuted the genotypes for each marker 1000
times and computed all pair-wise correlations for each
permuted marker set. The mean number of pairs with
correlations significant at the 0.0001 level over all permuted
sets was 8901, leading to an FDR of just under 1%.
Interestingly, of the 930,037 pairs that were significantly
correlated, 836,315 (nearly 90%) comprised SNPs located on
different chromosomes, demonstrating a degree of allelic
association that has not been previously reported for the
common strains of inbred mice.
Fig. 3 highlights the extent of “long-range” allelic
association among markers defining LD blocks on chromosomes
8 and 14. Highlighted in Fig. 3 are two LD blocks on
chromosomes 8 and 14 that are strongly associated with
each other. The LD block highlighted on chromosome 8 is
associated with many regions over both chromosomes, a trend
that extends to all other chromosomes examined. This type of
behavior highlights that there are some LD blocks or specific
markers that are highly connected with many different regions
of the genome (“hub” nodes of the marker interaction
network), suggesting a scale-free topology [20] for the network
of associated markers (Supplementary Fig. 1). Nonrandom
association of alleles at unlinked markers is a well-known
phenomenon in human genetic studies. Clustered sampling,
epistatic interactions among two or more loci, and population
admixture are all well-recognized mechanisms that can lead to
the type of behavior exhibited in Fig. 3 [21].
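The counts quoted in the pair-wise correlation analysis above can be reproduced with a few lines of arithmetic. The sketch below also includes a simple squared-correlation function for two biallelic markers coded 0/1 across strains; that coding, the function name, and the constant names are our assumptions, and the permutation step itself is not reimplemented here.

```python
from math import comb


def r_squared(x, y):
    """Squared Pearson correlation between two markers coded 0/1 over strains.

    Both markers must be polymorphic (a minor allele frequency filter such
    as the 15% cutoff in the text guarantees this).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Figures taken from the text.
N_SNPS = 6641             # SNPs passing the minor allele frequency filter
ALPHA = 0.0001            # per-pair significance level
OBSERVED = 930_037        # significantly correlated pairs in the real data
PERMUTED_MEAN = 8901      # mean significant pairs over permuted data sets
INTERCHROMOSOMAL = 836_315

n_pairs = comb(N_SNPS, 2)                   # all unordered marker pairs
expected_by_chance = n_pairs * ALPHA        # nominal expectation under the null
fraction_significant = OBSERVED / n_pairs
fdr = PERMUTED_MEAN / OBSERVED              # permutation-based FDR estimate
fraction_unlinked = INTERCHROMOSOMAL / OBSERVED

print(n_pairs, int(expected_by_chance))
print(f"{fraction_significant:.1%} significant, FDR {fdr:.2%}, "
      f"{fraction_unlinked:.1%} on different chromosomes")
```

The FDR here is taken as the ratio of the mean permuted count to the observed count, which is the reading of the text that yields the stated value of just under 1%.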
Two pieces of data suggest that at least part of the
observed allelic association may be due to epistatic selection
of unlinked alleles during the construction of the inbred
strains of mice. First, the SNP interaction network appears to
exhibit scale-free properties (Supplementary Fig. 1), which
may indicate specific regions that were strongly selected for
during the inbreeding process and interacting epistatically
with many other regions across the genome. Second, when
we examined genes in the associated chromosome 8 and 14
LD blocks (Fig. 3) for enrichment of GO Biological Process
categories, we found that fertilization pathways were
significantly enriched in these two regions. Specifically, there
were 813 genes represented in the chromosome 8 and 14 LD
regions and 128 genes represented in the GO Biological
Process fertilization category, of 28,069 genes annotated in
the genome [22]. Of the 783 genes in the LD regions, 16
were in the fertilization category (4 would be expected by
chance). The probability of observing this enrichment by
chance is 1.6 × 10⁻⁶, as determined by the Fisher exact test.
Whatever the explanation for the long-range allelic association,
the extent of correlation among these markers has the
potential to lead to spurious associations in association-based,
genome-wide QTL mapping studies.
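The enrichment test above can be sketched as a one-sided Fisher exact test, which is equivalent to the upper tail of a hypergeometric distribution. This is an illustration under stated assumptions, not the original calculation: the function name is ours, and because the text quotes both 813 and 783 for the number of genes in the LD regions, the sketch uses 813 and leaves that choice as a parameter.

```python
from math import comb


def hypergeom_upper_tail(n_total, n_category, n_region, n_overlap):
    """P(X >= n_overlap) when n_region genes are drawn without replacement
    from n_total genes, n_category of which belong to the category."""
    denominator = comb(n_total, n_region)
    numerator = sum(
        comb(n_category, i) * comb(n_total - n_category, n_region - i)
        for i in range(n_overlap, min(n_category, n_region) + 1)
    )
    # Python divides arbitrarily large integers exactly before rounding.
    return numerator / denominator


N_TOTAL = 28_069     # annotated genes in the genome (from the text)
N_CATEGORY = 128     # genes in the GO fertilization category (from the text)
N_REGION = 813       # genes in the LD regions (assumption: 813, not 783)
N_OVERLAP = 16       # fertilization genes observed in the LD regions

expected = N_REGION * N_CATEGORY / N_TOTAL
p_value = hypergeom_upper_tail(N_TOTAL, N_CATEGORY, N_REGION, N_OVERLAP)
print(f"expected {expected:.1f} genes by chance, one-sided p = {p_value:.2e}")
```

With either gene count the expected overlap rounds to 4, in line with the figure given in the text.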
From linkage to in silico QTL mapping, an integrated
approach to mapping genes for complex traits
One use of the high-density SNP panel described herein is in
carrying out association-based (in silico) QTL mapping studies
among the inbred strains represented in the set. However, the
limited number of strains represented in the set of strains
genotyped over this high-density panel, combined with testing
many markers in the context of a genome-wide scan (multiple
testing issues) and the problem of allelic association among
unlinked markers, not only lead to reduced power to identify
regions of the genome associated with phenotypes of interest,
but can also lead to spurious associations between markers and
disease phenotypes. To address these limitations we propose a
multistep strategy to carry out association-based QTL mapping
studies.
The first step involves analysis of one or more experimen-
tal cross populations to identify QTL for disease phenotypes
of interest. The identification of QTL for disease phenotypes
serves to focus attention on a significantly reduced portion of
the genome, in which association studies can then be carried
out to fine-map the QTL regions of interest. A significant
number of QTL studies have been carried out in experimental
cross populations, so that these populations make for a useful
resource in focusing attention on regions known to be
associated with disease phenotypes. Further, QTL analyses
in experimental crosses such as an F2 intercross are not
subject to allelic association between genes on different
chromosomes, such as that depicted in Fig. 3. However, in
these types of experimental crosses, the precise positioning of
QTL is well known to be problematic [11], so that following
up detection of QTL in this setting with an association-based
approach could provide a more rapid method to fine-map the
QTL regions. This complementary linkage/association approach
is a common strategy employed in human genetic
studies [23].
The second step involves carrying out genetic association
studies in the QTL support intervals of interest. The haplotype
structure among the inbred strains of mice is used in this
type of analysis to narrow more rapidly the precise region
controlling the disease phenotypes of interest, as previously
described by others [6,7,10]. However, unlike these and related
studies, the power to detect association will be significantly
increased given the focus on a single QTL region (which nearly
eliminates multiple testing issues), and there will be less danger
of spurious associations with unlinked regions that are not
associated with the disease phenotypes of interest.
The final step leverages expression data in the experimen-
tal cross populations as a way to prioritize positional
candidates supported by haplotypes identified as associated
with the disease phenotypes of interest. Several groups have
proposed a straightforward strategy for identifying candidate
susceptibility genes for QTL of interest by integrating gene
expression and genotypic data in the context of experimental
cross populations [2,4,24–29]. Briefly, genes physically
located in the regions supported by the QTL of interest are
identified. The corresponding expression traits from a given
tissue for these genes are then treated as quantitative traits in
the experimental cross setting, and eQTL are mapped for
these expression traits. Genes can be considered experimentally
supported positional candidates identified in a completely
objective fashion if they have the following properties: (1)
detected as expressed in the tissue of interest, (2) physically
residing in the QTL support interval for the phenotypes of
interest, (3) correlated with the phenotype of interest, and (4)
exhibiting cis eQTL behavior.
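The four properties listed above amount to a conjunctive filter over the genes in a region, which can be sketched as follows. The record layout, field names, gene labels, and the correlation p-value cutoff are all hypothetical and chosen only for illustration; the text does not specify how each criterion is thresholded.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    chrom: int
    position_mb: float
    expressed: bool             # (1) detected as expressed in the tissue
    trait_correlation_p: float  # (3) p value for correlation with the trait
    has_cis_eqtl: bool          # (4) expression trait maps to the gene's own locus


def positional_candidates(genes, chrom, start_mb, end_mb, p_cutoff=0.05):
    """Return genes meeting all four criteria for a QTL support interval."""
    return [
        g for g in genes
        if g.expressed
        and g.chrom == chrom and start_mb <= g.position_mb <= end_mb  # (2)
        and g.trait_correlation_p < p_cutoff
        and g.has_cis_eqtl
    ]


# Hypothetical toy records; only the first satisfies every criterion.
toy_genes = [
    Gene("geneA", 1, 124.0, True, 0.001, True),
    Gene("geneB", 1, 125.5, True, 0.300, True),    # not correlated with trait
    Gene("geneC", 1, 126.0, False, 0.001, True),   # not expressed
    Gene("geneD", 1, 140.0, True, 0.001, True),    # outside the interval
    Gene("geneE", 1, 121.0, True, 0.001, False),   # no cis eQTL
]
candidates = positional_candidates(toy_genes, chrom=1, start_mb=120, end_mb=128)
print([g.name for g in candidates])
```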
Application of the integrated linkage/association-based QTL
mapping strategy to identify Insig2 as a strong candidate gene
for total plasma cholesterol levels
In a complementary study we have mapped QTL for
metabolic traits and liver gene expression traits in F2 mice
constructed from the C57BL/6J (B6) and C3H/HeJ (C3H) strains
of mice on an ApoE null background [30]. Linkage analysis
was performed on various metabolic traits relevant to obesity.
Table 2
QTL analysis results for the BXH mice for the chromosome 1 locus defined in the text

| Trait | Chr. | Start | End | Peak | Max LOD | R² | Add | Dom |
|---|---|---|---|---|---|---|---|---|
| Free fatty acids | 1 | 0.5331 | 1.0281 | 0.799 | 6.747234 | 0.104713 | 16.22848 | 1.895533 |
| Glucose | 1 | 4.28 × 10⁻² | 1.0281 | 0.7733 | 7.786703 | 0.119519 | 54.53892 | 11.004 |
| Log(HDL cholesterol) | 1 | 0.2144 | 1.0281 | 0.7787 | 9.282095 | 0.140734 | 0.212238 | 6.24 × 10⁻² |
| LDL + VLDL | 1 | 0.2751 | 1.0281 | 0.6351 | 5.476793 | 8.57 × 10⁻² | 127.9582 | 93.38116 |
| Total cholesterol | 1 | 0.2551 | 1.0281 | 0.6351 | 5.934453 | 9.25 × 10⁻² | 135.1756 | 95.01039 |
| Log(triglycerides) | 1 | 0.5209 | 1.0281 | 0.7787 | 8.928773 | 0.136227 | 0.380037 | −0.16505 |

The third and fourth columns define the region in Morgans supporting the linkage, the fifth column gives the position in Morgans for the peak lod score in the
linkage region, the sixth column gives the maximum lod score in the linkage region, the seventh column gives the fraction of variation explained by the QTL, and the
final two columns give estimates of the additive and dominance effects for the QTL.
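The maximum LOD and R² columns of Table 2 are not independent. Under a normal-model single-QTL regression, a commonly used relation is R² = 1 − 10^(−2·LOD/n), where n is the number of animals. The sketch below assumes that relation (the text does not state how R² was computed) and rearranges it to recover the sample size implied by each row, which serves as a consistency check on the table values.

```python
from math import log10

# (trait, max LOD, R^2) as listed in Table 2
table2 = [
    ("Free fatty acids", 6.747234, 0.104713),
    ("Glucose", 7.786703, 0.119519),
    ("Log(HDL cholesterol)", 9.282095, 0.140734),
    ("LDL + VLDL", 5.476793, 0.0857),
    ("Total cholesterol", 5.934453, 0.0925),
    ("Log(triglycerides)", 8.928773, 0.136227),
]

def implied_sample_size(lod, r2):
    # Rearranged from the assumed relation R^2 = 1 - 10**(-2 * LOD / n)
    return 2.0 * lod / -log10(1.0 - r2)

implied_n = {trait: implied_sample_size(lod, r2) for trait, lod, r2 in table2}
for trait, n in implied_n.items():
    print(f"{trait}: implied n ~ {n:.0f}")
```

If the assumed relation holds, all rows should imply a similar n; a row that deviated strongly would point to a transcription error in the table.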
The strongest lod scores were observed on chromosome 1 for
free fatty acids, plasma glucose levels, plasma HDL cholesterol
levels, plasma LDL + VLDL cholesterol levels, total plasma
cholesterol levels, and triglyceride levels (Table 2). Fig. 4 gives
the chromosome 1 lod score curve for total cholesterol and
insulin levels. Although the lod peak covers a large interval of
about 40 cM (typical for an F2 cross), the region defined by a
one lod drop (99% confidence interval for true QTL location) is
5.5 cM (between 58.9 and 64.4 cM, corresponding to the
physical interval 120 to 128 Mb).

Fig. 4. Cis eQTL for the Insig2 gene (RIKEN cDNA 2900053I11) from the
BXH liver expression data overlapping cQTL for insulin and total cholesterol
level traits scored in the BXH animals. The curves represented at the top were
constructed using the full BXH data set, while the curves at the bottom were
constructed using only the female animals in the BXH set.
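A one-lod-drop support interval of the kind described above can be read off a lod curve by walking outward from the peak until the score falls more than one lod unit below the maximum. The following is a minimal sketch; the curve values are invented for illustration and are not the BXH chromosome 1 data.

```python
def lod_support_interval(curve, drop=1.0):
    """curve: list of (position_cM, lod) tuples sorted by position.

    Returns (left_cM, peak_cM, right_cM) for the contiguous region around
    the peak in which the lod score stays within `drop` of the maximum.
    """
    peak_idx = max(range(len(curve)), key=lambda i: curve[i][1])
    threshold = curve[peak_idx][1] - drop
    left = peak_idx
    while left > 0 and curve[left - 1][1] >= threshold:
        left -= 1
    right = peak_idx
    while right < len(curve) - 1 and curve[right + 1][1] >= threshold:
        right += 1
    return curve[left][0], curve[peak_idx][0], curve[right][0]

# Hypothetical lod curve sampled at a few marker positions.
curve = [(50.0, 2.1), (55.0, 3.9), (58.0, 5.0), (60.0, 5.6),
         (62.0, 5.9), (64.0, 5.2), (66.0, 4.5), (70.0, 3.0)]
interval = lod_support_interval(curve)
print(interval)
```

The resolution of the interval is limited by the spacing of the positions at which the curve is evaluated, so a denser marker map gives sharper interval boundaries.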
We performed genetic association tests between SNPs and
lipid traits represented in the JAX Mouse Phenome database
to attempt to support and narrow further the chromosome 1
linkage region identified in the BXH cross. We queried the
JAX Phenome database for experiments that were compara-
ble to the BXH cross and for which lipid traits had been
scored. The mice comprising the BXH data set were 24
weeks of age and had been on a high-fat Western diet for 14
weeks at the time the cholesterol and insulin phenotypes
were scored. Given this, we selected the following three sets
of data from the phenome database [31]: "chol_17" from the
Paigen2 set of experiments (MPD:99) consisting of total
cholesterol measurements on mice that were 25 weeks of age
and that had been on a high-fat diet for 17 weeks, "chol_fat"
from the Albers1 set of experiments (MPD:8) consisting of
cholesterol measurements on 12-week-old mice that had been
on a high-fat diet for 6 weeks, and "total_chol" from the
Paigen1 set of experiments (MPD:29) consisting of cholesterol
measurements on 16-week-old mice that had been on an
atherogenic diet for 8 weeks. Association analyses were
performed separately for these three data sets; where
appropriate, gender was considered as a covariate. Of the
different strains we had genotyped, 33 were represented in
the Paigen2 set, 12 in the Albers1 set, and 31 in the Paigen1
set. The results from single SNP association tests with the
total cholesterol traits are given in Table 3. Only a single
SNP at 121 Mb on chromosome 1 was identified as
significantly associated with cholesterol levels in all three
data sets. This site lies within the 99% confidence interval
for the QTL location (120 to 128 Mb).

Table 3
Tests for association between SNP markers with minor allele frequencies
greater than 1% in the chromosome 1 linkage region defined in the text and
total cholesterol levels

| Data set | SNP | p value |
|---|---|---|
| Albers1 | 119144634 | 0.005294 |
| Albers1 | 120992934 | 0.005294 |
| Albers1 | 131027701 | 0.007862 |
| Paigen1 | 119144634 | 0.004477 |
| Paigen1 | 120992934 | 0.001748 |
| Paigen2 | 70842743 | 0.009736 |
| Paigen2 | 120992934 | 0.003204 |

SNP 120992934 is the only marker significantly associated (at the 0.01
significance level) with total cholesterol in all three data sets.
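The replication argument made with Table 3 is a set intersection: a SNP is retained only if it passes the significance threshold in every data set. A short sketch using the rows of Table 3 follows; the function name `replicated_snps` is illustrative.

```python
# (data set, SNP position, p value) rows from Table 3
table3 = [
    ("Albers1", 119144634, 0.005294),
    ("Albers1", 120992934, 0.005294),
    ("Albers1", 131027701, 0.007862),
    ("Paigen1", 119144634, 0.004477),
    ("Paigen1", 120992934, 0.001748),
    ("Paigen2", 70842743, 0.009736),
    ("Paigen2", 120992934, 0.003204),
]

def replicated_snps(rows, alpha=0.01):
    """Return SNPs with p < alpha in every data set present in `rows`."""
    hits = {dataset: set() for dataset, _, _ in rows}
    for dataset, snp, p in rows:
        if p < alpha:
            hits[dataset].add(snp)
    if not hits:
        return []
    return sorted(set.intersection(*hits.values()))

shared = replicated_snps(table3)
print(shared)
```

Requiring replication across independently collected phenotype sets is a simple guard against associations that arise from the strain composition of any single experiment.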
Further support for a cholesterol trait QTL in this region
is provided by a previously described cross between the B6
and DBA strains of mice (BXD) [3], which shows linkage to
this region for cholesterol levels, free fatty acids, aortic
lesions, and fat pad mass, with lod scores of 2.0, 1.5, 1.5,
and 3.0, respectively. Although lod scores of 2–3 may be
considered only suggestive in a genome-wide scan, they can
be considered significant in the context of a region-specific
test of a hypothesis generated by the BXH data. If we
assume that the same QTL is segregating in both the B6 × C3H and the B6 × DBA crosses, then further positional
information is provided by the high-density SNP map. This
QTL is unlikely to fall in a region that is IBD between B6
and C3H or between B6 and DBA. Fig. 1 shows three
relatively large blocks of IBD that exclude a considerable
portion of the QTL 1-LOD region, leaving the interval 121
to 125 Mb and small intervals near 120 and 128 Mb as the
most likely locations.
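The IBD-based narrowing described above is an interval subtraction: regions that are IBD between the parental strains of either cross are removed from the QTL support interval. In the sketch below, the IBD block coordinates are invented placeholders chosen only to make the example concrete; they were not read from Fig. 1.

```python
def subtract_blocks(interval, blocks):
    """Remove `blocks` (list of (start, end)) from `interval` (start, end)."""
    start, end = interval
    remaining = []
    cursor = start
    for b_start, b_end in sorted(blocks):
        if b_end <= cursor or b_start >= end:
            continue                      # block lies outside what is left
        if b_start > cursor:
            remaining.append((cursor, b_start))
        cursor = max(cursor, b_end)
    if cursor < end:
        remaining.append((cursor, end))
    return remaining

qtl_interval_mb = (120.0, 128.0)           # 1-LOD interval from the text
# Hypothetical IBD blocks (Mb); placeholders, not the blocks shown in Fig. 1.
ibd_b6_c3h = [(120.4, 121.0)]
ibd_b6_dba = [(119.0, 120.1), (125.0, 127.6)]

# A QTL shared by both crosses cannot sit in a block that is IBD in either.
likely = subtract_blocks(qtl_interval_mb, ibd_b6_c3h + ibd_b6_dba)
print(likely)
```

Taking the union of the blocks from both crosses reflects the assumption, stated in the text, that the same QTL segregates in the B6 × C3H and B6 × DBA crosses.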
We identified all genes physically residing in the 1-LOD
(120 to 128 Mb) interval that (1) had gene expression values in
liver from the BXH cross that were significantly correlated
with cholesterol levels and (2) gave rise to a cis-acting eQTL in
the BXH liver data (Fig. 4). Only one gene in this interval was
found to meet these criteria, Insig2, which is located at about
124 Mb (Fig. 1). In addition, there was a significant cis-acting
eQTL for Insig2 liver expression in the BXD data giving rise to
a lod score of 4.7. This eQTL roughly explains the same
amount of variation as the cis-acting eQTL for this gene in the
BXH cross (the smaller BXD lod score reflects the fact that the
BXD cross is roughly 40% the size of the BXH cross), where
the C3H and DBA versions of this gene are showing increased
expression relative to the B6 version of the gene. Therefore, the
BXD data are completely consistent with the BXH and inbred
strain data and support Insig2 as a strong candidate gene
underlying the chromosome 1 cholesterol linkage. Insig2 is a
strong candidate susceptibility gene for cholesterol regulation
given the key roles of the Insig proteins in lipid metabolism
[32,33]. Furthermore, Insig2 is supported as a causal candidate
for these traits using the likelihood test for causality previously
described [2].
Genes interacting with Insig2 further support its role in the
regulation of cholesterol
As described in the literature, the regulation of the Srebp–Scap–Insig system is complex [33–36]. Insig1, Insig2, Srebp-1c, and Srebp-2 affect the transcription of different lipid and cholesterol metabolism genes. At physiological levels, Srebp-1c plays a larger role in controlling the transcription of lipogenic genes such as fatty acid synthase. Srebp-2 is a potent activator of many cholesterol biosynthetic genes, including HMG-CoA reductase, HMG-CoA synthase, and squalene epoxidase, among many others. Insig2a was only recently discovered as a liver-specific isoform [33]. Yabe et al. showed that the expression of Insig2a is repressed by insulin, whereas the expression of Srebp-1c is up-regulated in the presence of insulin. The probe for Insig2 represented on the gene expression array used in the BXH cross is situated in the 3′ UTR of the gene and is specific to Insig2a. Insig1 and Insig2 are also regulated differently by sterols. Interestingly, our results show that Insig2a and not Insig1 is associated with plasma cholesterol levels. To investigate further the biological significance of Insig2a in the BXH cross, we computed all pair-wise Pearson correlations between the Srebp-1c, Srebp-2, Scap, Ins1, Ins2, Insig1, and Insig2a gene expression traits as well as plasma cholesterol and insulin levels (Table 4). Insig2a expression in the liver is significantly correlated with plasma insulin, HDLC, and total cholesterol levels. Insig2a expression levels are also negatively correlated with Srebp-1, Srebp-2, and Scap expression levels. The negative correlation between Srebp-1 and Insig2a in our data is consistent with the results of Yabe et al. [33], in which Srebp-1 and Insig2a are oppositely regulated by insulin.

Table 4
Correlations between gene expression traits (Ins1, Ins2, Scap, Srebpc1, Srebpc2, Scap, Insig1, Insig2) and plasma protein levels (insulin, total cholesterol)

| | INS2 | Insulin | INS1 | Total Chol | SREBP1 | HDLC | SREBP2 | Insig1 | SCAP | Insig2 |
|---|---|---|---|---|---|---|---|---|---|---|
| INS2 | | 0.822964 | 0 | 0.796904 | 0.801376 | 0.435403 | 0.362521 | 0.301656 | 0.345414 | 0.759756 |
| Insulin | 0.015572 | | 0.917524 | 5.75 × 10⁻¹⁴ | 0.161077 | 0.031237 | 0.210432 | 0.010866 | 2.20 × 10⁻⁷ | 6.66 × 10⁻¹⁶ |
| INS1 | 0.978438 | 0.024302 | | 0.701406 | 0.93651 | 0.413371 | 0.410997 | 0.467686 | 0.75256 | 0.798459 |
| Total Chol | 0.014874 | 0.419804 | 0.045175 | | 0.208941 | 7.36 × 10⁻⁵ | 0.001509 | 0.124138 | 1.24 × 10⁻⁶ | 1.03 × 10⁻⁹ |
| SREBP1 | 0.019599 | −0.08149 | 0.008504 | 0.0682 | | 7.38 × 10⁻⁵ | 5.35 × 10⁻⁶ | 1.12 × 10⁻¹⁰ | 3.71 × 10⁻¹⁴ | 6.59 × 10⁻¹² |
| HDLC | 0.049301 | 0.169336 | 0.073052 | 0.248179 | −0.24365 | | 0.083233 | 0.953825 | 2.81 × 10⁻⁶ | 1.61 × 10⁻¹¹ |
| SREBP2 | −0.02595 | −0.02545 | −0.03464 | −0.14182 | 0.249391 | −0.10635 | | 0 | 0 | 0.000172 |
| Insig1 | −0.05377 | 0.142939 | −0.05692 | 0.068291 | 0.371524 | −0.05974 | 0.499925 | | 8.13 × 10⁻⁷ | 0.044492 |
| SCAP | 0.029268 | −0.29465 | 0.007832 | −0.26951 | 0.423068 | −0.27707 | 0.453157 | 0.290725 | | 1.78 × 10⁻¹⁰ |
| Insig2 | −0.00983 | 0.424657 | 0.019525 | 0.343963 | −0.40619 | 0.378067 | −0.21607 | 0.022155 | −0.34113 | |

The upper triangle of the matrix gives the p values for the correlations, and the lower triangle of the matrix gives the Pearson correlation coefficients.

Fig. 5. Insig1/Insig2 subnetwork from the BXH liver Bayesian network. The Bayesian network was reconstructed as described in the text. The subnetwork was
obtained from the full liver expression network by identifying nodes within a pathlength of 4 of Insig1 or Insig2. Highlighted in the network are nodes corresponding
to genes involved in lipid and cholesterol metabolism. As described in the text, Insig2 is significantly associated with a set of genes that are enriched for those
involved in cholesterol metabolism (light blue nodes), whereas Insig1 is significantly involved in lipid synthesis (green nodes). The Insig1-specific network appears
to be involved in biological processes that are distinct from biological processes associated with Insig2.

To investigate further the different roles Insig1 and Insig2a
may play in vivo, we used our previously described Bayesian
gene network reconstruction method [4] to explore the gene
neighborhoods surrounding the Insig1 and Insig2a gene nodes
in the female liver transcriptional network for the BXH cross
(Fig. 5). Although Insig1 and Insig2a were in the same gene
network neighborhood, genes appearing downstream of them
were distinct. Genes most strongly connected to Insig1
included fatty acid synthase (Fasn), stearoyl-coenzyme A
desaturase 1 (Scd1), acetyl-CoA synthetase 1, ATP citrate lyase
(Acly), and other lipogenic genes (Fig. 5). These genes are
known to be activated by Srebp-1c. In contrast, genes most
closely related to Insig2a were generally cholesterol biosyn-
thetic genes, such as squalene epoxidase (Sqle), HMG-CoA
synthase, and Cyp51, among several others (Fig. 5). Srebp-2 is
known to activate cholesterol biosynthetic genes such as these.
To assess further the significance of the cholesterol synthesis
association observed in the Insig2a subnetwork, we examined
those genes in the BXH liver data that were significantly
correlated with Insig2 expression levels. Of the 14,424 most
transcriptionally active genes in the liver, 878 were signifi-
cantly correlated with Insig2 at the 0.01 significance level.
These genes were tested for enrichment of GO Biological
Process categories. Cholesterol metabolism was among the
most significantly enriched categories in this gene set
(uncorrected Fisher exact p value 9.9 × 10⁻¹¹; p value after
correction for multiple testing was 1.4 × 10⁻⁷). In addition, of
the 14,424 most transcriptionally active genes, 954 had eQTL
with LODs greater than 2 falling in a 15-cM window
containing Insig2, and of these, 123 were also significantly
correlated with Insig2. The probability we would observe this
overlap by chance is 4.7 × 10⁻¹⁶, as determined by the Fisher
exact test.
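The overlap test reported above can be framed as a hypergeometric tail probability, which is equivalent to a one-sided Fisher exact test on the corresponding 2 × 2 table. The sketch below uses the counts given in the text (14,424 genes, 878 correlated with Insig2, 954 with eQTL in the window, 123 in both). Whether the reported value came from a one-sided test on exactly these counts is an assumption, so the result is best compared with the reported figure rather than taken as a reproduction of it.

```python
from math import exp, lgamma

def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def overlap_p_value(total, set_a, set_b, overlap):
    """P(X >= overlap) when set_b genes are drawn from `total`, set_a of which are marked."""
    log_denom = log_comb(total, set_b)
    upper = min(set_a, set_b)
    return sum(
        exp(log_comb(set_a, k) + log_comb(total - set_a, set_b - k) - log_denom)
        for k in range(overlap, upper + 1)
    )

total_genes, correlated, with_eqtl, both = 14424, 878, 954, 123
expected_overlap = correlated * with_eqtl / total_genes
p_overlap = overlap_p_value(total_genes, correlated, with_eqtl, both)
print(f"expected overlap by chance: {expected_overlap:.1f}")
print(f"P(overlap >= {both}): {p_overlap:.2e}")
```

Working in log space with `lgamma` avoids forming the very large binomial coefficients directly, which keeps the computation within ordinary floating-point range.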
Discussion
We have employed a novel procedure to identify genes
underlying quantitative traits using a comprehensive SNP map
constructed over 62 common strains of mice. By intersecting
F2 gene expression and clinical trait QTL data and publicly
available phenotype data on common inbred strains of mice,
we were able to narrow significantly a region of the genome
controlling cholesterol levels in mice. Combining the
haplotype data with the genetics of gene expression data
generated in the BXH and BXD crosses, we were able to
identify Insig2 as a strong candidate susceptibility gene for the
cholesterol trait in the refined region. The process leading to
the identification of Insig2 was completely objective and data
driven, drawing on a number of novel data sources and
methods to identify the gene.
Among the interesting observations from our high-density
SNP map is the extensive long-range allelic association
among unlinked markers throughout the genome. The extent
of the allelic association has not been previously described in
the inbred strains of mice. While population admixture is a
plausible explanation for the extensive long-range allelic
association observed in this collection of inbred strains of
mice, it also seems likely that epistatic selection may have
played an important role. The interaction network of
associated markers appears to exhibit scale-free properties
(Supplementary Fig. 1), which may indicate specific regions
that were strongly selected in interaction with many other
regions across the genome during domestication and inbreed-
ing. Furthermore, epistatic selection related to hybrid
incompatibilities may have occurred as a consequence of
the origin of laboratory mice as a hybrid mixture of M. m.
domesticus from Western Europe and other subspecies (M.
m. musculus, molossinus, and castaneus) from Asia [8,37]. It
is well known that hybrids between these subspecies show
marked genetic incompatibilities (i.e., negative epistatic
interactions) that cause infertility, particularly in males
[38,39]. The genetic divergence between subspecies could
result in incompatibilities affecting a number of traits in the
hybrid progenitors of laboratory mice.
The identification of Insig2 as a susceptibility gene for
circulating levels of cholesterol using the fine-mapping strategy
described herein is supported by several additional lines of
evidence. First, the literature supports Insig2 as one of the key
genes involved in the regulation of lipid and cholesterol
metabolism. Second, our gene network showed that Insig2a is
associated with cholesterol biosynthesis genes, and liver gene
expression traits significantly correlated with Insig2 are
significantly enriched for genes involved in cholesterol
metabolism. Third, genes that are significantly correlated to
Insig2 are enriched for eQTL that link to the Insig2 gene, and
this set of genes is significantly enriched for genes involved in
cholesterol metabolism. Finally, the DBA strain carries the
same haplotype as the C3H strain in the region containing the
Insig2 gene, and the BXD data are completely consistent with
the BXH data and support Insig2 as at least one of the genes
controlling cholesterol levels in the BXH chromosome 1
linkage region. In addition to cholesterol levels, Insig2 liver
expression was found to be significantly associated with
obesity and diabetes traits. From the network in Fig. 5 we
note that Insig2 is upstream of several genes (BC014805,
Socs2, and Mod1) testing as causal for obesity traits in the
BXD cross [1,2]. In a growth hormone-deficient state, Socs2 is
known to increase fat mass in rodents and humans. Therefore,
our data also suggest that Insig2 may play a key role in
metabolic traits related to obesity and diabetes.
These data taken together provide strong evidence that
Insig2 is at least one of the genes underlying the chromosome 1
linkage in the BXH cross to cholesterol traits. It is interesting to
note that although Insig1 is a known gene involved in lipid and
cholesterol metabolism, we found Insig2a and not Insig1 to be
supported as a causal candidate for plasma cholesterol levels in
the BXH animals. The gene network we constructed for the
BXH data also showed Insig2a to be more closely associated
with cholesterol biosynthesis genes than Insig1, whereas Insig1
was more closely associated with lipogenic genes. This could
indicate that despite the apparent similarity in terms of binding
Scap and regulating the movement of Srebps into the Golgi for
processing, there may be some level of compartmentalization
regarding the roles of these genes, perhaps via more specific
interactions with the different Srebps. Alternatively, it may be
that Insig1 activity was not as polymorphic in the BXH and
BXD crosses, so that we had less power to detect
associations between Insig1 activity and the cholesterol traits in
these crosses.
Despite the strong evidence presented here supporting Insig2
as a causal candidate gene for cholesterol synthesis, the putative
functional role of this gene must be further validated using more
traditional methods such as gene knockouts or transgenics.
Isolating the C3H version of the Insig2 gene on the B6
background and demonstrating that these mice have increased
cholesterol levels over B6 control mice would provide the
necessary experimental support to conclude that Insig2 is at
least one of the quantitative trait genes underlying the
cholesterol linkages in the BXH cross. However, what was
accomplished via the application of our process was an
objective, completely data-driven approach to reducing the
number of genes to consider as candidates for the chromosome
1 cholesterol linkage. Other methods based on in silico QTL
mapping, QTL mapping in an F2 mouse population, or gene
expression experiments among the inbred strains of mice on
their own would not have supported Insig2 as strongly as a
susceptibility gene. It was the combination of multiple filters
over multiple orthogonal data sets that ultimately led to the
identification of this gene. The degree of pathway coherence
that exists among the genes linked to the Insig2 locus,
correlated with the expression of the Insig2 gene and correlated
with the cholesterol traits, provides additional confidence that
Insig2 is likely involved in the regulation of circulating
cholesterol levels. In the future, more integrative approaches
such as this may facilitate a more objective, data-driven
approach to the identification and prioritization of candidate
genes underlying QTL for complex traits of therapeutic interest.
Methods
Strain selection
A total of 63 inbred strains were selected based on their demonstrated
utility for biological and biomedical research, including their relevance as
models for common human diseases. The set of 63 strains consisted of A/J,
AKR/J, BALB/cJ, BKSdb, BTBR + (T) tf/tf, BUB/BnJ, C3H/HeJ, C3H/
HeSnJ, C57BL/10J, C57BL/6J, C57BLKS/J, C57BR/cdJ, C58/J, CAST/Ei,
CBA/J, CE/J, CZECHII/Ei, DBA/2J, DDY/Jcl, EL/SuzSeyFrkJ, FVB/NJ,
HTG/GosfSn, I/LnJ, ICR, IS/CamRk, 129P3/J, 129S1/SvImJ, 129SBE-M,
129X1/SvJ, KK/HlJ, LG/J, LLCA, LP/J, M16, MOLF/EiJ, MRL/MpJ, NOD/LtJ,
NON/LtJ, NZB/BlNJ, NZO/HlLtJ, NZW/LaC, P/J, Mus pahari/Ei,
PANCEVO/Ei, PERA/EiJ, PERC/EiJ, PL/J, PWK/Ph, RBF/DnJ, RF/J,
RIIIS/J, SAMR1, SAMP10, SAMP8, SJL/J, SKIVE/Ei, SM/J, SPRET/Ei,
SWR/J, TIRANO/Ei, WMP/Pas, WSB/Ei, and ZALENDE/Ei. DNAs from
SAMR1, SAMP10, and SAMP8 mice were sent to us by Jennifer Greenhall
of UCSD; DNAs from the ICR and M16 strains were sent to us by Daniel
Pomp of the University of Nebraska; DNA from BKSdb was sent to us by
Richard Davis of UCLA; and DNAs from the remaining 57 strains were
ordered from The Jackson Laboratory. Only 1 of the strains, Mus pahari,
failed the genotyping set and was therefore excluded from all of the analyses.
SNP selection
Reference SNPs from dbSNP (Build 118) were mapped to the public
(October 2003, NCBI Build 32) and Celera (release 13) assemblies as
described below. SNPs were selected from 800-kb windows across both
genomes where possible. When using the Celera assembly, Celera SNPs
(release 3.4) from their mouse genome assembly were used in addition to the
publicly available SNPs in dbSNP. Multiple SNPs were picked from
windows for which strain information was available to increase the
likelihood that the window would contain at least one informative SNP for
a wide range of strain pairings. Four reference strains were used (C57BL/6,
A, CAST, CZECHII) to represent a diverse population of common strains,
and SNPs were selected to provide informative markers for all strains
annotated vs the four reference strains in each window. SNPs within each
window were scored for a variety of parameters as described below, and the
top scoring SNPs were selected from each window. Celera SNPs selected for
genotyping were subsequently mapped to the public (October 2003, NCBI
Build 32) assembly as described below.
The SNP flanking sequences (Celera SNP and dbSNP) were mapped onto
the Celera mouse assembly and the public mouse assembly (mm4, October
2003 release http://genome.ucsc.edu) by BLASTN (1) with the word size
parameter set to 28 and an e value threshold of 1 × 10⁻³. SNP positions
matching the following criteria were stored: (1) the alignments were longer than
100 bp with 97% identity, (2) the genomic sequence at the SNP position
matched one of the annotated alleles for the SNP, and (3) the aligned sequence
included at least 50 bp on either side of the SNP.
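The three mapping criteria above can be expressed as a single predicate over each BLASTN hit. The sketch below is illustrative only: the `SnpAlignment` record and its field names are hypothetical, and "97% identity" is read here as a minimum of 97%, which is an interpretation rather than something the text states.

```python
from dataclasses import dataclass

@dataclass
class SnpAlignment:
    snp_id: str
    aln_start: int      # first aligned base of the flanking sequence (query coordinates)
    aln_end: int        # last aligned base, inclusive
    snp_offset: int     # SNP position in the same query coordinates
    identity: float     # fraction of identical bases in the alignment
    genome_base: str    # base found in the genome at the SNP position
    alleles: tuple      # annotated alleles for the SNP

def passes_mapping_criteria(a, min_len=100, min_identity=0.97, min_flank=50):
    length = a.aln_end - a.aln_start + 1
    left_flank = a.snp_offset - a.aln_start
    right_flank = a.aln_end - a.snp_offset
    return (
        length > min_len                      # (1) alignment longer than 100 bp
        and a.identity >= min_identity        # (1) at least 97% identity (assumed reading)
        and a.genome_base in a.alleles        # (2) genome matches an annotated allele
        and left_flank >= min_flank           # (3) at least 50 bp on the left
        and right_flank >= min_flank          # (3) at least 50 bp on the right
    )

good = SnpAlignment("snp1", 1, 201, 101, 0.99, "A", ("A", "G"))
short_flank = SnpAlignment("snp2", 1, 201, 30, 0.99, "A", ("A", "G"))
wrong_allele = SnpAlignment("snp3", 1, 201, 101, 0.99, "T", ("A", "G"))
low_identity = SnpAlignment("snp4", 1, 201, 101, 0.95, "A", ("A", "G"))
kept = [a.snp_id for a in (good, short_flank, wrong_allele, low_identity)
        if passes_mapping_criteria(a)]
print(kept)
```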
SNPs in a window were scored based on proximity to other SNPs or
repetitive, low-complexity sequence and by the quality of the alignment to the
genomic sequence. SNPs within 5 bases of a repeat sequence or a neighboring
SNP and SNPs that mapped ambiguously to multiple locations in the genome
were excluded from consideration. In addition, the validation status and
heterozygosity reported by dbSNP were used to prioritize SNPs within any
given window. Finally, SNPs predicted to yield poor genotyping results using
the Illumina genotyping platform were excluded from consideration, and the
highest scoring SNPs remaining were selected from each window.
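The window-based selection can be sketched as grouping SNPs into 800-kb bins, dropping excluded SNPs, and keeping the highest scoring SNPs in each bin. The scoring function itself is not specified in enough detail to reproduce, so the `score` values and the `excluded` flags below are hypothetical inputs.

```python
WINDOW_BP = 800_000  # window size given in the text

def pick_top_snps(snps, per_window=1):
    """snps: iterable of (chrom, position_bp, score, excluded) tuples.

    Returns {(chrom, window_index): [positions of the top-scoring SNPs]}.
    """
    windows = {}
    for chrom, pos, score, excluded in snps:
        if excluded:          # e.g. near a repeat, ambiguous mapping, poor assay design
            continue
        windows.setdefault((chrom, pos // WINDOW_BP), []).append((score, pos))
    selected = {}
    for key, members in windows.items():
        members.sort(reverse=True)            # highest score first
        selected[key] = [pos for _, pos in members[:per_window]]
    return selected

# Hypothetical SNPs: (chromosome, position, score, excluded)
snps = [
    ("1", 100_000, 0.90, False),
    ("1", 500_000, 0.70, False),
    ("1", 700_000, 0.95, True),    # best score, but excluded
    ("1", 900_000, 0.40, False),
]
selection = pick_top_snps(snps)
print(selection)
```

Raising `per_window` corresponds to picking multiple SNPs from a window, which the text describes as a way to increase the chance of an informative marker for a wide range of strain pairings.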
DNA preparation and genotyping assay
DNA for each strain was isolated by commercial vendors and research
collaborators and sent to the Gene Expression Laboratory at Rosetta to assess
quantity. Each DNA sample was serially diluted (1:40, 1:400) in duplicate and
PicoGreen dsDNA assays (Molecular Probes) were performed on the samples.
Average DNA concentrations were determined using standard curves for the
assay. A 96-well microtiter plate containing 12 μg (>100 ng/μl) of each
individual DNA sample was prepared and genotyping for all SNPs was carried
out by Illumina as described previously [3].
BXD and BXH F2 crosses
The BXD F2 mouse population and associated liver gene expression data
used in this study have been previously described [3,40]. Briefly, an F2
population consisting of 111 mice was constructed from two inbred strains of
mice, C57BL/6J and DBA/2J. Only female mice were maintained in this
population. Mice were on a rodent chow diet up to 12 months of age and then
switched to an atherogenic high-fat, high-cholesterol diet for another 4 months.
At 16 months of age the mice were euthanized and their livers extracted for
gene expression profiling. The mice were genotyped at 139 microsatellite
markers uniformly distributed over the mouse genome to allow for the genetic
mapping of the gene expression and disease traits.
The BXH F2 mouse population and associated liver gene expression data
used in this study will be more fully described in a future publication [30]. This
population was constructed from C57BL/6J ApoE null (B6.ApoE−/−) mice
purchased from The Jackson Laboratory and from C3H/HeJ ApoE null
(C3H.ApoE−/−) mice generated by backcrossing B6.ApoE−/− to C3H for 10
generations. F1 mice were generated from reciprocal intercrossing between
B6.ApoE−/− and C3H.ApoE−/−, and F2 mice were subsequently bred by
intercrossing F1 mice. A total of 334 (169 female, 165 male) were bred. All
mice were fed Purina Chow containing 4% fat until 8 weeks of age and then
transferred to a "Western" diet containing 42% fat and 0.15% cholesterol for 16
weeks. Mice were sacrificed at 24 weeks. At death, livers were immediately
collected and flash-frozen in liquid N2. Genomic DNA was isolated from
kidney by phenol–chloroform extraction. An examination of the Celera
database identified over 1800 SNPs that showed variation between the B6 and
the C3H strains, and a complete linkage map for all 19 autosomes was
constructed using 1032 of these SNPs, giving rise to a genetic map with an
average density of 1.5 cM. Genotyping was conducted by ParAllele using the
molecular-inversion probe multiplex technique [41].
For the BXH plasma lipid measures, mice were fasted overnight before
blood was collected through retro-orbital vein puncture under isoflurane
anesthesia. Plasma total cholesterol, HDL cholesterol, and triglyceride were
measured with enzymatic assays as previously described [42].
All procedures were in accordance with the current National Research
Council Guide for the Care and Use of Laboratory Animals and were approved
by the UCLA Animal Research Committee.
RNA sample preparation, microarray hybridization, and expression
analysis
Total RNA from the BXD and BXH liver samples was purified from 25-mg
portions using an RNeasy Mini Kit according to the manufacturer’s instructions
(Qiagen, Valencia, CA, USA), as previously described for the BXD set [3].
Fluorescently labeled cRNA (5 μg) from each F2 animal in each cross was
hybridized against a pool of RNAs specific to each cross. The RNA pools for
each cross were constructed from equal aliquots of RNA from all animals in the
BXD cross and 150 randomly selected animals in the BXH cross. Array images
were processed as previously described to obtain background noise, single-
channel intensity, and associated measurement error estimates [43]. Expression
changes between two samples were quantified as log10 (expression ratio),
where the "expression ratio" was taken to be the ratio between normalized,
background-corrected intensity values for the two channels (red and green) for
each spot on the array. An error model for the log ratio was applied as
previously described to quantify the significance of differential expression
between two samples [43].
QTL mapping
QTL analyses on the gene expression traits and phenotypes in the BXD and
BXH crosses were carried out as previously described [3,30,40]. Statistical
procedures for the association-based (in silico) QTL mapping were implemented
in S-PLUS 6.1. Cholesterol and insulin measures were treated as continuous
traits and were fitted to genotypic values for a given SNP using a generalized
linear model. When appropriate, sex was treated as a covariate and sex-by-
genotype interactions were modeled. An SNP and phenotype were considered
significantly associated if the likelihood ratio statistic between the full model
and the null model was significant. The null model was similar to the full
model, except that all parameters from the full model containing the genotype
random variable were removed.
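The full-versus-null comparison described above can be illustrated in its simplest form. The sketch below is not the S-PLUS implementation: it assumes Gaussian errors with an identity link (an ordinary linear model), a biallelic SNP coded 0/1 across inbred strains, and no sex covariate, so the likelihood ratio statistic has 1 degree of freedom and is referred to its asymptotic chi-square distribution. The trait values are invented.

```python
from math import erfc, log, sqrt

def rss(values):
    """Residual sum of squares around the mean of `values`."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def snp_association(trait, genotype):
    """Likelihood ratio test of trait ~ genotype against trait ~ 1.

    trait: list of floats; genotype: list of 0/1 strain genotypes.
    Assumes exactly two genotype classes and nonzero within-class variance.
    """
    groups = {}
    for y, g in zip(trait, genotype):
        groups.setdefault(g, []).append(y)
    rss_null = rss(trait)                               # null model: one overall mean
    rss_full = sum(rss(ys) for ys in groups.values())   # full model: one mean per genotype
    lr = max(len(trait) * log(rss_null / rss_full), 0.0)
    p = erfc(sqrt(lr / 2.0))    # chi-square survival function with 1 df
    return lr, p

# Hypothetical strain means for a cholesterol-like trait.
trait = [1.0, 1.2, 0.9, 1.1, 2.0, 2.1, 1.9, 2.2]
genotype = [0, 0, 0, 0, 1, 1, 1, 1]
lr, p = snp_association(trait, genotype)
print(f"LR = {lr:.2f}, p = {p:.2e}")
```

Adding sex and sex-by-genotype terms, as described in the text, would increase the number of parameters dropped in the null model and therefore the degrees of freedom of the test.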
Additional statistical analyses
The Pearson correlation coefficient was computed between all phenotypes
and gene expression traits to identify expression traits that were significantly
correlated with relevant phenotypes or other gene expression traits. When
appropriate, gene expression and phenotypic values were adjusted for gender
effects before computing the Pearson correlations. Five hundred thirty-three
genes were identified as significantly differentially expressed in at least 20% of
the BXH liver samples from the greater than 23,000 genes represented on the
microarray used in this experiment. A gene was determined to be significantly
differentially expressed in a given sample if it had an absolute value of the fold
change greater than 1.5, a mean log average intensity greater than −1.5, and a
mean log ratio that was significant at the 0.01 level. The 533 gene expression
traits in addition to 21 metabolic phenotypes were then provided as input into
our previously described Bayesian network reconstruction program, using the
parameter settings originally described by Zhu et al. [4]. That portion of the
network containing Insig1 and Insig2 is shown in Fig. 5.
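The gender adjustment that precedes the Pearson correlations can be sketched as centering each variable on its sex-specific mean before correlating. This is one simple way to adjust for sex and is an assumption; the text does not specify the exact adjustment used. The data below are invented to show how a sex difference can mask a within-sex relationship.

```python
from math import sqrt

def adjust_for_sex(values, sex):
    """Subtract the sex-specific mean from each value."""
    means = {}
    for s in set(sex):
        group = [v for v, t in zip(values, sex) if t == s]
        means[s] = sum(group) / len(group)
    return [v - means[s] for v, s in zip(values, sex)]

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical expression and trait values for three females and three males.
sex = ["F", "F", "F", "M", "M", "M"]
expression = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
trait = [10.0, 20.0, 30.0, 0.0, 10.0, 20.0]

r_raw = pearson(expression, trait)
r_adjusted = pearson(adjust_for_sex(expression, sex), adjust_for_sex(trait, sex))
print(f"raw r = {r_raw:.3f}, sex-adjusted r = {r_adjusted:.3f}")
```

In this toy example the unadjusted correlation is negative even though expression and trait rise together within each sex, which is the kind of distortion the adjustment is meant to remove.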
Acknowledgments
We thank Iris Hovatta (MRL, San Diego) and Jennifer
Greenhall (UCSD) for providing the SAM DNAs, Daniel
Pomp (UN) for providing the M16 and ICR DNAs, and
Richard Davis and Jake Lusis’ lab (UCLA) for providing
DNAs for a number of strains. We also thank the Rosetta Gene
Expression Laboratory for the microarray work.
Appendix A. Supplementary data
Supplementary data associated with this article can be found,
in the online version, at doi:10.1016/j.ygeno.2005.07.010.
References
[1] H. Allayee, A. Ghazalpour, A.J. Lusis, Using mice to dissect genetic
factors in atherosclerosis, Arterioscler. Thromb. Vasc. Biol. 23 (2003)
1501–1509.
[2] E.E. Schadt, et al., An integrative genomics approach to infer causal
associations between gene expression and disease, Nat. Genet. 37 (2005)
710–717.
[3] E.E. Schadt, et al., Genetics of gene expression surveyed in maize, mouse
and man, Nature 422 (2003) 297–302.
[4] J. Zhu, et al., An integrative genomics approach to the reconstruction of
gene networks in segregating populations, Cytogenet. Genome Res. 105
(2004) 363–374.
[5] S. Doss, E.E. Schadt, T.A. Drake, A.J. Lusis, Cis-acting expression
quantitative trait loci in mice, Genome Res. 15 (2005) 681–691.
[6] A. Grupe, et al., In silico mapping of complex disease-related traits in
mice, Science 292 (2001) 1915–1918.
[7] G. Liao, et al., In silico genetics: identification of a functional
element regulating H2-Ealpha gene expression, Science 306 (2004)
690–695.
[8] C.M. Wade, et al., The mosaic structure of variation in the laboratory
mouse genome, Nature 420 (2002) 574–578.
[9] K.A. Frazer, et al., Segmental phylogenetic relationships of inbred mouse
strains revealed by fine-scale analysis of sequence variation across 4.6 Mb
of mouse genome, Genome Res. 14 (2004) 1493–1500.
[10] M.T. Pletcher, et al., Use of a dense single nucleotide polymorphism map
for in silico mapping in the mouse, PLoS Biol. 2 (2004) e393.
[11] A. Darvasi, A. Weinreb, V. Minke, J.I. Weller, M. Soller, Detecting
marker-QTL linkage and estimating QTL gene effect and map location
using a saturated genetic map, Genetics 134 (1993) 943–951.
[12] P.M. Petkov, et al., An efficient SNP system for mouse genome
scanning and elucidating strain relationships, Genome Res. 14 (2004)
1806–1811.
[13] Y.G. Park, R. Clifford, K.H. Buetow, K.W. Hunter, Multiple cross and
inbred strain haplotype mapping of complex-trait candidate genes,
Genome Res. 13 (2003) 118–121.
[14] F.Y. Ideraabdullah, et al., Genetic and haplotype diversity among wild-
derived mouse inbred strains, Genome Res. 14 (2004) 1880–1887.
[15] B. Yalcin, et al., Unexpected complexity in the haplotypes of commonly
used inbred strains of laboratory mice, Proc. Natl. Acad. Sci. USA 101
(2004) 9734–9739.
[16] J.A. Beck, et al., Genealogies of mouse inbred strains, Nat. Genet. 24
(2000) 23–25.
[17] M. Kimura, A simple method for estimating evolutionary rates of base
substitutions through comparative studies of nucleotide sequences, J. Mol.
Evol. 16 (1980) 111–120.
[18] J. Felsenstein, Phylogenies from molecular sequences: inference and
reliability, Annu. Rev. Genet. 22 (1988) 521–565.
[19] G.R. Grant, J. Liu, C.J. Stoeckert Jr., A practical false discovery rate
approach to identifying patterns of differential expression in microarray
data, Bioinformatics 21 (11) (2005) 2684–2690.
[20] A.L. Barabasi, Z.N. Oltvai, Network biology: understanding the cell's
functional organization, Nat. Rev. Genet. 5 (2004) 101–113.
[21] J. Ott, Analysis of Human Genetic Linkage, Johns Hopkins Univ. Press,
Baltimore, 1999.
[22] T. Hubbard, et al., The Ensembl genome database project, Nucleic Acids
Res. 30 (2002) 38–41.
[23] A. Helgadottir, et al., The gene encoding 5-lipoxygenase activating protein
confers risk of myocardial infarction and stroke, Nat. Genet. 36 (2004)
233–239.
[24] R.B. Brem, G. Yvert, R. Clinton, L. Kruglyak, Genetic dissection
of transcriptional regulation in budding yeast, Science 296 (2002)
752–755.
[25] E.J. Chesler, et al., Complex trait analysis of gene expression uncovers
polygenic and pleiotropic networks that modulate nervous system
function, Nat. Genet. 37 (2005) 233–242.
[26] R.C. Jansen, J.P. Nap, Genetical genomics: the added value from
segregation, Trends Genet. 17 (2001) 388–391.
[27] C.L. Karp, et al., Identification of complement factor 5 as a susceptibility
locus for experimental allergic asthma, Nat. Immunol. 1 (2000) 221–226.
[28] S.A. Monks, et al., Genetic inheritance of gene expression in human cell
lines, Am. J. Hum. Genet. 75 (2004) 1094–1105.
[29] M. Morley, et al., Genetic analysis of genome-wide variation in human
gene expression, Nature 430 (2004) 743–747.
[30] S. Wang, et al., Genetic and genomic analysis of a fat mass trait with
complex inheritance reveals marked sex-specificity, submitted for
publication (2005).
[31] K. Paigen, J.T. Eppig, A mouse phenome project, Mamm. Genome 11
(2000) 715–717.
[32] A.D. Attie, Insig: a significant integrator of nutrient and hormonal signals,
J. Clin. Invest. 113 (2004) 1112–1114.
[33] D. Yabe, R. Komuro, G. Liang, J.L. Goldstein, M.S. Brown, Liver-
specific mRNA for Insig-2 down-regulated by insulin: implications for
fatty acid synthesis, Proc. Natl. Acad. Sci. USA 100 (2003) 3155–3160.
[34] R.G. Anderson, J. Goldstein, M. Brown, From cholesterol homeostasis
to new paradigms in membrane biology, Trends Cell Biol. 13 (2003)
534–539.
[35] R. McPherson, A. Gauthier, Molecular regulation of SREBP function: the
Insig–SCAP connection and isoform-specific modulation of lipid
synthesis, Biochem. Cell Biol. 82 (2004) 201–211.
[36] T. Yang, et al., Crucial step in cholesterol homeostasis: sterols promote
binding of SCAP to INSIG-1, a membrane protein that facilitates retention
of SREBPs in ER, Cell 110 (2002) 489–500.
[37] L.M. Silver, Mouse Genetics: Concepts and Applications, Oxford Univ.
Press, New York, 1995.
[38] J. Britton-Davidian, F. Fel-Clair, J. Lopez, P. Alibert, P. Boursot,
Postzygotic isolation between the two European subspecies of the house
mouse: estimates from fertility patterns in wild and laboratory-bred
hybrids, Biol. J. Linnean Soc. 84 (2005) 379–393.
[39] J. Forejt, P. Ivanyi, Genetic studies on male sterility of hybrids between
laboratory and wild mice (Mus musculus L.), Genet. Res. 24 (1974)
189–206.
[40] T.A. Drake, et al., Genetic loci determining bone density in mice with diet-
induced atherosclerosis, Physiol. Genom. 5 (2001) 205–215.
[41] P. Hardenbol, et al., Highly multiplexed molecular inversion probe
genotyping: over 10,000 targeted SNPs genotyped in a single tube assay,
Genome Res. 15 (2005) 269–275.
[42] C.C. Hedrick, L.W. Castellani, C.H. Warden, D.L. Puppione, A.J. Lusis,
Influence of mouse apolipoprotein A-II on plasma lipoproteins in
transgenic mice, J. Biol. Chem. 268 (1993) 20676–20682.
[43] Y.D. He, et al., Microarray standard data set and figures of merit for
comparing data processing methods and experiment designs,
Bioinformatics 19 (2003) 956–965.
[44] R.J. Mural, et al., A comparison of whole-genome shotgun-derived mouse
chromosome 16 and the human genome, Science 296 (2002) 1661–1671.