Tel Aviv University
Raymond and Beverly Sackler
Faculty of Exact Sciences
Case control study of interactions in Parkinson's disease using logistic regression exploiting gene-environment independence
This work is submitted in partial fulfillment of the requirements for the degree of Master of Science (M.Sc.) in Biostatistics at Tel Aviv University
The Department of Statistics and Operations Research
Written By: Lital Bridavsky
Supervised by: Dr. Chava Peretz
Prof. Saharon Rosset
September 2013
Tel Aviv University
Raymond and Beverly Sackler Faculty of Exact Sciences
A case-control study of interactions in Parkinson's disease using logistic regression assuming independence between environment and genes
This thesis was submitted as part of the requirements for the degree "Master of the University" (M.Sc.) in Biostatistics at Tel Aviv University
Department of Statistics and Operations Research
By: Lital Bridavsky
The work was prepared under the supervision of:
Dr. Chava Peretz, Prof. Saharon Rosset
September 2013
Table of Contents
1. Introduction
Gene-Environment Etiology in Neurological Diseases
Statistical Challenges in Exploring Gene-Environment Interactions
Parkinson's Disease
2. Aim
Specific Aims
3. Methods
Study Design
Study Population
Study Variables
Statistical Analysis
Ethics
4. Results
Summary Statistics
Independence Tests
Classical Logistic Regression Using the Empirical Data
C&C Logistic Regression Using the Empirical Data
Simulations
5. Discussion
Bibliography
Table of Figures
Table 1: Distribution of study variables among cases and controls
Table 2: Contingency table of APOE-Ɛ2 and disease status
Table 3: Contingency tables of APOE-Ɛ2 and Smoking status cases
Table 4: APOE-Ɛ2, Coffee and Smoking Profile Frequencies
Table 5a: Coffee and Smoking Missingness Among Cases and Controls
Table 5b: Agreement to Genetic Testing Among Cases and Controls
Figure 1a: Smokers, APOE-Ɛ2 carriers, and country of origin
Figure 1b: Coffee drinkers, APOE-Ɛ2 carriers, and country of origin
Table 6a: Ever Smoker and APOE-Ɛ2 carrier contingency table for controls
Table 6b: Coffee Consumption and APOE-Ɛ2 carrier contingency table for controls
Table 7a: Classical Logistic Regression using the empirical study data - smoking & APOE-Ɛ2
Table 7b: Classical Logistic Regression using the empirical study data - Coffee Drinking and APOE-Ɛ2
Table 8a: C&C Logistic Regression using the empirical study data - smoking & APOE-Ɛ2
Table 8b: C&C Logistic Regression using the empirical study data - coffee drinking
Table 9a: Example Classical Logistic Regression of simulated data set i
Table 9b: Example C&C Regression of simulated data set i
Table 10: Summary statistics of simulated effects - simulation i
Table 11a: Example Classical Logistic Regression of simulated data set ii
Table 11b: Example C&C Regression of simulated data set ii
Table 12: Summary statistics of simulated effects - simulation ii
Abstract:
It has become evident in recent years that gene-environment interactions play a
significant role in many chronic diseases. Incorporating such interactions into
analyses requires greater statistical power. This study investigated the
performance of logistic regression exploiting gene-environment independence in
an Israeli case-control study of Parkinson's disease. This regression was used to
examine a cigarette smoking-APOE-Ɛ2 interaction and a coffee drinking-APOE-Ɛ2
interaction in Parkinson's disease, and was compared with a classical logistic
regression. For the smoking-APOE-Ɛ2 interaction the odds ratio was 0.129 (95% CI
0.03-0.59). The coffee-APOE-Ɛ2 interaction was not significant (OR = 0.648; 95% CI
0.21-2.02). Because of a possible dependence between smoking and APOE-Ɛ2, the
comparison of the two logistic regressions favored the classical logistic regression
in this case, although it still lacked power to estimate the main effects. In light of
current challenges in the study of chronic diseases, further development of
statistical tools for estimating interactions in case-control studies is needed.
1. Introduction
Gene-Environment Etiology in Neurological Diseases
The etiology of many chronic diseases is unknown. In the past, studies of disease risk were polarized between "nature" and "nurture" (1). Although some diseases are predominantly genetic or environmental, it has become clear that many diseases, especially neurological diseases, have complex mechanisms that involve interactions between genetics and the environment. Recent advances in human genomics and biotechnology make it possible to study large numbers of genetic markers and to explore their interactions with the environment as potential risk factors. It is well known that environmental exposures may vary over time, but it is less often considered that gene expression also varies over time. Epigenetic mechanisms that change gene function include alterations in DNA methylation, histone modification and microRNA expression. The effects of toxic exposures have been found to be mediated by epigenetic mechanisms (2). Studies have already shown gene-environment interactions in neurological conditions such as Alzheimer's disease (3), Parkinson's disease (4) (5) (6) (7), depression (3) (8) and other conditions (9) (10).
In the case of Parkinson's disease (PD), only a small proportion of cases can be attributed to a single genetic mutation or environmental factor. A complex interaction of genes and environment likely underlies the majority of cases. Twin studies have shown low concordance for PD, except for young-onset PD, indicating the importance of non-genetic factors (11).
Statistical Challenges in Exploring Gene-Environment Interactions
Cohort, case-only and case-control study designs have been used to study gene-environment interactions. Because genotypes do not vary over time and precede phenotypes, case-control studies have been considered more efficient than cohort studies for estimating genetic main effects (2).
Cohort studies can also incorporate the effects of time. In such studies the gene-environment interaction is embedded in a three-way interaction in which time or life stage is the third factor. If the gene-environment interaction varies over time, or is prominent at a specific time and dormant at others, plotting the coefficients as functions of time can reveal such patterns. However, because of the high cost of cohort studies, case-control studies are more common and practical.
The most popular choice for case control analysis is logistic regression. Prentice and
Pyke showed that one can estimate all the regression coefficients except for the
intercept from retrospective studies using ordinary logistic regression as if the data
were obtained in a prospective study in rare diseases (12).
When the aim of a study is also to detect interaction effects, the sample size needs to be about four times larger than that of a study of main effects of the same magnitude (13). One study explored how incorporating a gene-environment interaction into a classical logistic regression model affects the power to estimate the genetic main effect, and found that in some scenarios including the interaction increased the power while in others it decreased the power (14). The problem becomes even more challenging when investigating higher-order interactions or interactions among multiple genes.
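The four-fold figure can be motivated with a simple variance calculation. The sketch below is an illustration under assumed conditions (a balanced 2x2 gene-by-exposure design with equal cell sizes and a common per-subject variance), not the derivation used in reference (13). The interaction contrast is a difference of differences, and its variance is four times that of a main-effect contrast, so roughly four times as many subjects are needed for the same precision.

```python
def contrast_variance(coefs, cell_n, sigma2=1.0):
    """Variance of sum(c * cell_mean) when each cell mean averages
    cell_n independent observations with variance sigma2."""
    return sum(c * c for c in coefs) * sigma2 / cell_n


n_total = 400            # hypothetical total sample size
cell_n = n_total / 4     # balanced 2x2 design: four equal cells

# Cells ordered as (G=0,E=0), (G=0,E=1), (G=1,E=0), (G=1,E=1).
main_g = [-0.5, -0.5, 0.5, 0.5]   # mean of G=1 cells minus mean of G=0 cells
interaction = [1, -1, -1, 1]      # (E effect when G=1) minus (E effect when G=0)

v_main = contrast_variance(main_g, cell_n)
v_int = contrast_variance(interaction, cell_n)
ratio = v_int / v_main
print(f"variance ratio: {ratio:.1f}")   # prints "variance ratio: 4.0"
```

Since each variance shrinks in proportion to 1/n, matching the main-effect precision for an interaction of the same size requires multiplying the sample size by this ratio.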
Genome-wide association studies have examined thousands of genetic markers to identify genetic variants associated with diseases.
Piegorsch et al. suggested a method of estimating an interaction parameter among cases only. Assuming independence between gene and exposure in the general population, and that the disease is rare, they estimate the interaction effect among cases alone by treating the exposure as the outcome and estimating the effects of the genetic categories in a logistic regression (15). Umbach and Weinberg demonstrated estimation of logistic regression odds ratios through log-linear models. They also used the log-linear approach to incorporate gene-environment independence, which yields a slight increase in precision for the gene and environment main effects and a greater increase in precision for the interaction effect. Additionally, the independence assumption makes it possible to estimate the interaction effect using genetic information collected from cases only (16).
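Under the two case-only assumptions above (gene-exposure independence in the population and a rare disease), the gene-exposure odds ratio among cases approximates the multiplicative interaction odds ratio. The sketch below checks this with exact probabilities rather than data; the parameter values and the function names `case_only_or` and `expected_case_table` are hypothetical choices for illustration.

```python
import math


def case_only_or(cases):
    """Gene-exposure odds ratio among cases only.
    cases[g][e] is the count (or probability) of cases with G=g, E=e."""
    return (cases[1][1] * cases[0][0]) / (cases[1][0] * cases[0][1])


def expected_case_table(p_g, p_e, alpha, b_g, b_e, b_ge):
    """P(G=g, E=e | D=1) for a logistic disease model, with G and E
    independent in the population (prior = P(G) * P(E))."""
    table = [[0.0, 0.0], [0.0, 0.0]]
    for g in (0, 1):
        for e in (0, 1):
            eta = alpha + b_g * g + b_e * e + b_ge * g * e
            risk = 1.0 / (1.0 + math.exp(-eta))
            prior = (p_g if g else 1 - p_g) * (p_e if e else 1 - p_e)
            table[g][e] = risk * prior
    total = sum(sum(row) for row in table)
    return [[cell / total for cell in row] for row in table]


# Hypothetical rare disease (alpha = -7) with a protective interaction.
b_ge = -1.0
cases = expected_case_table(p_g=0.15, p_e=0.4, alpha=-7.0,
                            b_g=0.5, b_e=-0.5, b_ge=b_ge)
print(round(math.log(case_only_or(cases)), 3), b_ge)
```

The population frequencies P(G) and P(E) cancel in the odds ratio because of the independence assumption; the small remaining gap from `b_ge` comes from the rare-disease approximation.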
Nilanjan Chatterjee and Raymond J. Carroll recently developed a logistic regression with semiparametric maximum likelihood estimation that exploits gene-environment independence to increase the statistical power of the model without increasing the sample size (17). This model has several advantages over previous models that exploit gene-environment independence. It does not require a rare disease assumption. It also allows the exposure to be continuous with a nonparametric distribution, and additional background variables are easy to incorporate. This logistic regression model is referred to below as C&C.
Parkinson's disease:
Epidemiology
Parkinson's disease is a progressive neurological disease. Although it is a rare disease, it is the second most common neurodegenerative disease (18) (7). There is no known cure. Tremor at rest is the most common motor sign at onset. Because motor symptoms are among the earliest symptoms of PD, Parkinson's disease is most often diagnosed in movement disorder clinics. Other motor symptoms of PD include bradykinesia, muscle rigidity and gait disturbances. In the later stages of the disease patients suffer from cognitive impairment, dementia, sleep disturbances and autonomic disturbances (18). Many of the earlier symptoms are caused by reduced production of dopamine in the brain (19) (20) (7).
The average age of onset of PD is around 60 years, although the disease is difficult to diagnose and diagnosis often comes many months after onset. The estimated prevalence is about 100-200 per 100,000 people, with an age-standardized incidence of 8.6-19 per 100,000 (18). The standardized mortality ratio for Parkinson's disease is 1.58 (95% CI 1.21-4.4) (21). About 5-10% of PD patients have early-onset PD (onset before 50 years of age) (7). The median survival time of PD patients from diagnosis is 10.3 years (18); with currently available treatments it can be increased to 12 years (22).
Life expectancy of PD patients is reduced by the disease despite currently available treatments. Most patients survive long enough to require significant medical and day-to-day assistance as the disease progresses. Because of its length and severity, the disease incurs a high socioeconomic cost (23) (24).
Little is known about its etiology. It is currently known that intracellular Lewy bodies in brain cells are part of the neuropathological character of the disease (25) (26).
Epidemiology in the Jewish Population: The latest study of Parkinson's disease in Israel indicates that the prevalence of Parkinson's disease in Israel is 256 per 100,000, higher than in most other populations (27). Ashkenazi Jews have a higher prevalence of Parkinson's disease than the general population, due to genetic mutations (28) and a strong association with GBA mutations, which are also common among Ashkenazi Jews. In 1989 a cluster of Parkinson's disease patients emerged in three adjacent kibbutzim in the Negev in southern Israel. The reported incidence of parkinsonism in this cluster area is currently 471 per 100,000 people, more than twice the incidence worldwide (29). The most likely suspected cause of this cluster is the water well that the three kibbutzim share.
Genetic, Lifestyle and Environmental risk factors
Genetics: In general, family members of PD patients are 3-4 times more likely to develop PD. Recent studies have found a number of genes possibly associated with the incidence of PD, among them PARK4 (dominant LOD 2.64) (30), SNCA (recessive OR 1.36; 95% CI 1.02-1.82), MAPT (recessive OR 1.41; 95% CI 1.03-1.93) (31) and the LRRK2 Gly2385Arg variant (recessive OR 2.67; 95% CI 1.43-4.99) (32). NAT2 was suspected as a risk factor for PD, but several studies have since found that it has no significant effect on the risk of Parkinson's disease (33) (34). A genome-wide association study of a Japanese population sample also found PARK16 to be associated with reduced risk for PD (dominant OR 0.66) (35). Gao et al. used logistic regression to find a gene-gene interaction between FGF20 and MAOB polymorphisms (OR 0.52; 95% CI 0.3-0.9) (36). Mamah et al. found that the MAPT and SNCA polymorphisms had the same quantitative effect on PD risk together as they did alone (31).
A case-control study in America found that Ashkenazi Jews had a much higher frequency of the LRRK2 G2019S mutation, with an OR of 17.6 (dominant; 95% CI 5.9-52.2). The study also found that the LRRK2 G2019S mutation is very common in Ashkenazi Jews in North America, 15-20 times more common than in European studies of the mutation (28). A later cohort study in Israel found that the LRRK2 G2019S mutation and many GBA mutations associated with PD risk are very common among Israeli Jews. GBA mutations are common among Ashkenazi Jews. One third of the cohort study population carried either an LRRK2 or a GBA mutation. Moderate to severe GBA mutations were associated with increased risk for PD (OR 5.6; 95% CI 3.4-18.9). LRRK2 mutation carriers had a lower age of disease onset, by 3-9.7 years, compared with non-carriers (37). An international, multicenter case-control study found a Mantel-Haenszel OR of 5.43 (dominant) for all GBA mutations (38).
Genetics-APOE: APOE is a gene studied in association with Alzheimer's disease and later with Parkinson's disease as well. APOE has three allele types, Ɛ2, Ɛ3 and Ɛ4, of which Ɛ3 is the most prevalent in all populations. APOE-Ɛ4 has been associated with increased risk for Alzheimer's disease. Because Alzheimer's disease and Parkinson's disease are both progressive neurodegenerative diseases, this led to the study of APOE in Parkinson's disease as well. Several studies found a modest association between APOE-Ɛ2 and increased risk for Parkinson's disease, whereas the evidence of an association between APOE-Ɛ4 and Parkinson's disease is weak. A meta-analysis showed that APOE-Ɛ2 is associated with a modest increase in risk for Parkinson's disease (OR 1.2; 95% CI 1.02-1.42) (39) and also found no association between Parkinson's disease and APOE-Ɛ4, as did some other studies (40). Another case-control study also found APOE-Ɛ2 to be associated with a modest increase in risk for Parkinson's disease (dominant OR 1.16; 95% CI 1.03-1.31) (41). Williams-Gray et al. performed a meta-analysis of case-control studies of APOE-Ɛ2 and Parkinson's disease, which showed that the majority of studies did not detect a significant effect. They also stated that a single study would need upwards of 5000 cases and controls to have sufficient power to detect such a modest effect. In a population-based cohort study they found that APOE-Ɛ4 and especially APOE-Ɛ2 were associated with increased risk of dementia in Parkinson's patients (dominant OR 6.27; 95% CI 3.07-12.82 and dominant OR 13.46; 95% CI 4.46-40.64, respectively) (42). One family-matched case-control study found that APOE-Ɛ4 carriers have an increased risk for PD (recessive OR 1.8; 95% CI 1.01-3.11) and earlier ages of PD onset (43).
Worldwide frequencies of APOE-Ɛ2 vary between 10% and 20% (44) (45). A previous case-control study of an Israeli population that performed genetic testing for APOE found that 5% of controls and 53% (7 out of 13) of cases carried APOE-Ɛ2 (46).
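To give a sense of the sample size quoted by Williams-Gray et al., the sketch below applies a standard normal-approximation formula for comparing two proportions. The inputs are assumptions, not values taken from (42): a carrier frequency of 15% among controls (within the 10-20% range above), an OR of 1.2, a two-sided α of 5% and 80% power.

```python
import math
from statistics import NormalDist


def n_per_group(p0, odds_ratio, alpha=0.05, power=0.80):
    """Cases (= controls) needed to detect odds_ratio for a binary exposure
    with prevalence p0 among controls (two-proportion normal approximation)."""
    p1 = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)  # prevalence among cases
    p_bar = (p0 + p1) / 2
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    num = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_b * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
    return math.ceil(num / (p1 - p0) ** 2)


n = n_per_group(p0=0.15, odds_ratio=1.2)
print(n, "cases and", n, "controls")
```

With these inputs the result is roughly 3,500 cases and 3,500 controls, in line with the statement that more than 5000 subjects would be needed.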
Environmental Exposure: Several case-control studies (47) and a meta-analysis (48) have found strong evidence that pesticides increase the risk of PD (OR 1.94; 95% CI 1.49-2.53). A case-control study in Israel also found increased risk associated with pesticide exposure (OR 6.81; 95% CI 0.75-64.89) and construction work (OR 2.32; 95% CI 0.84-6.44) (49). Many studies have explored the relationship between PD and occupational exposure to metals such as copper, iron, lead, mercury, manganese, aluminum, zinc, cadmium, nickel and arsenic, with varying and contradictory results (50). One study found an increased association with PD for chronic exposure of more than 20 years to copper (OR 2.49; 95% CI 1.06-5.89), manganese (OR 10.61; 95% CI 1.06-105.83), lead-copper (OR 5.24; 95% CI 1.59-17.24), lead-iron (OR 2.83; 95% CI 1.07-7.50) and iron-copper (OR 3.69; 95% CI 1.40-9.71). However, the evidence for these exposures is still inconclusive, as many studies did not find an association (51) (50).
Lifestyle: Many studies, both case-control (47) (52) and prospective (53) (54) (55) (56) (57), have shown a strong association between caffeine consumption (studied through coffee and tea consumption) and a lower risk for PD. Ascherio et al. (2004) found that the effect of caffeine on PD may be prevented by the use of estrogen therapy in women. One prospective study (58) concluded that the association found between coffee and PD was the result of the correlation between coffee drinking and smoking. Coffee drinking and smoking can be cultural habits and are therefore often dependent. Most studies succeed in showing a dose-response relationship between caffeine and PD, so the evidence that caffeine is protective is strong. One case-control study looked for an interaction between caffeine consumption and A2A receptor genotypes and found no significant interaction (59). Caffeine is a central nervous system stimulant thought to work by inhibiting adenosine receptors. Several studies in mice and primates have shown that adenosine receptor antagonists can improve motor deficits. If so, caffeine may reduce the symptoms of PD rather than have a direct effect on its pathogenesis (53). Another study in mice found that caffeine may reduce MPTP toxicity by blocking the A2A adenosine receptor (60).
A number of case-control studies (61) (62) (52) (63) (64), a meta-analysis (65) and prospective studies (58) have shown strong evidence that smoking lowers the risk for PD. Several studies included alcohol in the analysis to evaluate whether it has an effect and to rule out confounding (61) (52). A suggested biological mechanism for protection by smoking is that some element of cigarette smoke protects nigrostriatal dopaminergic neurons from degeneration. A study in rodents found that exposure to smoking or nicotine protects these neurons (66).
In a meta-analysis of smoking and coffee effects on developing PD (including many of the studies cited above), the pooled RR was 0.59 (95% CI 0.54-0.63) for "ever" smokers and 0.69 (95% CI 0.59-0.80) for "ever" coffee drinkers (67).
Other, anecdotal evidence suggests there may be additional risk factors, including dairy consumption, estrogen and diabetes. Other suspected protective factors include uric acid, hypertension and NSAIDs (54) (68).
Gene-Environment-Lifestyle Interactions
One case-control study used logistic regression to explore gene-lifestyle interactions between the genes MAPT, SNCA, APOE and UCHL1 and the lifestyle factors coffee drinking and cigarette smoking. The study showed a modest, borderline significant main effect of APOE-Ɛ2 on increased risk for Parkinson's disease (OR 1.253; 95% CI 0.91-1.73). Coffee and smoking main effects were significantly associated with reduced risk for Parkinson's disease (OR 0.658; 95% CI 0.49-0.87 and OR 0.689; 95% CI 0.50-0.93, respectively). The study found an interaction between APOE-Ɛ2 and coffee drinking associated with reduced risk (dominant OR 0.339; 95% CI 0.20-0.58). It also found an interaction between SNCA 261 and smoking associated with reduced risk (OR 0.268; 95% CI 0.12-0.57), but did not find an interaction between smoking and APOE-Ɛ2 (69). A case-control study using multiple logistic regression found a borderline significant gene-environment interaction between GSTM1 polymorphism and exposure to solvents, associated with increased risk for Parkinson's disease (OR 1.76; 95% CI 0.91-3.41) (5). A case-only study of interactions between GSTM1, GSTP1, GSTZ1 and smoking in Parkinson's disease used logistic regression and found a significant interaction between GSTP1-C and smoking associated with increased risk for Parkinson's disease (OR 2.00; 95% CI 1.11-3.60) (70). Another case-control study using logistic regression found gene-environment interactions between NAT2 and smoking (OR 0.61; 95% CI 0.43-0.87), GSTM1 and smoking (OR 0.67; 95% CI 0.47-0.95) and GSTP1 and smoking (OR 0.27; 95% CI 0.08-0.93). All the interactions in this study were associated with reduced risk for Parkinson's disease in the presence of the respective gene and smoking (6).
2. Aim
The aim of this study was to investigate possible first-order gene-lifestyle interactions of APOE-Ɛ2 with smoking and with coffee drinking in the risk of Parkinson's disease in a case-control study.
Specific Aims
1. To test for independence between APOE-Ɛ2 and smoking or coffee drinking using empirical data.
2. To compare the effects of the multiplicative interaction between the classical and the C&C logistic regressions using empirical data.
3. To explore the implications of violated independence assumptions using simulated data.
4. To determine whether there are interactions between APOE-Ɛ2 and smoking or coffee drinking and quantify their effects based on the results of this study.
3. Methods
Study Design
This is a case-control study with three groups: a case group of participants diagnosed with Parkinson's disease, and two control groups consisting of the spouses of the Parkinson's disease patients and of patients with arthritis and rheumatism undergoing treatment at the Sourasky Medical Center in Tel Aviv. Two trained interviewers collected the data in face-to-face interviews.
For the purposes of this study, the two control groups were treated as one.
Study Population
The cases were recruited sequentially from the PD patients of the Sourasky Medical Center. Controls were recruited from the spouses of the PD patients who had been married for at least 15 years and from arthritis and rheumatism patients at the Sourasky Medical Center. Spouses and arthritis and rheumatism patients were chosen as controls because they lived in the same suburbs, were of similar age, and had socio-demographic characteristics expected to match those of the PD patients.
During the interviewer-administered questionnaire, participants were asked the following questions about current and past consumption of instant and regular coffee:
How often did you drink?
1-7 days a week, once a month, twice a month
What size serving did you drink?
A little, A moderate amount, A lot
During the interviewer-administered questionnaire, participants were also asked the following questions about their smoking habits:
1. Do you smoke? If yes go to 3.
2. Have you smoked in the past? If no go to 4.
3. How many cigarettes per day?
4. Have there been times when you've smoked more, less, or quit? If no go to 6.
5. Smoking history:
a. from age ____ to age ____ smoke ____ cigarettes per day
b. from age ____ to age ____ smoke ____ cigarettes per day
c. from age ____ to age ____ smoke ____ cigarettes per day
6. At what age did you start smoking?
7. What age did you stop smoking?
8. How many years have you smoked in total?
Some participants refused to provide a sample for testing immediately after the interview and later changed their minds during the study after receiving further explanation. About 20% of participants refused; among them, 1.5% were spouses of cases who were interviewed and then died during the study. Two blood samples were disqualified: one from a spouse who had essential tremor, and one from a spouse who had undergone a bone marrow transplant before the study. Of all the participants who completed the questionnaire, 372 agreed to genetic testing; of those, 370 were used.
Blood or saliva samples were collected at the Medical Center's MDS Clinic for assessment of APOE-Ɛ polymorphisms. The APOE-Ɛ samples were analyzed at the Molecular Biology Research Center of the Tel-Aviv Sourasky Medical Center by Professor Avi Or.
Questionnaire data were missing for some participants. Some participants did not answer the questions about smoking, which were on page 2 of the questionnaire. All of these participants, and others, did not answer the questions about coffee consumption, which were on page 15. Completion of the questionnaires was limited by the participants' availability to complete them with the interviewer.
Study Variables:
Main effects
APOE-Ɛ2
APOE has three known isoforms (Ɛ2, Ɛ3, Ɛ4). APOE-Ɛ2 was coded under a dominant model: its value was 1 for the genotypes Ɛ2/Ɛ2, Ɛ2/Ɛ3 and Ɛ2/Ɛ4, and 0 otherwise. This coding was also used in a meta-analysis (71) and in a case-control study exploring gene-lifestyle interactions (69). (Because of limitations of the study sample, it was not practical to create categories for all the APOE-Ɛ2 genotypes, as there was only one participant with Ɛ2/Ɛ2.)
Coffee
For the purposes of this paper, this lifestyle variable was converted into a binary variable with the categories coffee non-drinker and ever coffee drinker. Because of complications with the collected data, and for simplicity, this analysis did not use a multi-category variable for different levels of coffee drinking.
Smoking
For the purposes of this paper, the smoking data were converted into an ever-smoker variable, coded 1 for participants who had ever smoked tobacco and 0 for those who had never smoked (6).
Gender
Gender is a binary variable.
Multiplicative Interactions
Smoking X APOE-Ɛ2
There are two categories in the smoking-APOE-Ɛ2 interaction variable. The interaction between APOE-Ɛ2 and smoking equals 1 for participants who ever smoked and are APOE-Ɛ2 carriers, and 0 for all other participants.
Coffee Drinking X APOE-Ɛ2
There are two categories in the coffee-APOE-Ɛ2 interaction variable. The interaction between coffee and APOE-Ɛ2 equals 1 for participants who ever drank coffee and are APOE-Ɛ2 carriers, and 0 for all other participants.
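The coding rules above can be sketched as follows. The genotype string format (e.g. "e2/e3"), the yes/no inputs for smoking questions 1 and 2, and all function names are hypothetical; treating a participant as an ever smoker when either question is answered yes is an assumption about how the questionnaire was mapped, and missing answers are kept as missing.

```python
def apoe_e2_carrier(genotype):
    """Dominant coding: 1 for e2/e2, e2/e3 and e2/e4, else 0.
    The 'e2/e3' string format is a hypothetical convention."""
    return 1 if "e2" in genotype.split("/") else 0


def ever_smoker(smokes_now, smoked_in_past):
    """1 if question 1 or 2 was answered yes, 0 if not; None if both missing."""
    if smokes_now is None and smoked_in_past is None:
        return None
    return 1 if (smokes_now or smoked_in_past) else 0


def code_participant(genotype, smokes_now, smoked_in_past, ever_coffee):
    e2 = apoe_e2_carrier(genotype)
    smoke = ever_smoker(smokes_now, smoked_in_past)
    coffee = None if ever_coffee is None else int(bool(ever_coffee))
    return {
        "apoe_e2": e2,
        "smoking": smoke,
        "coffee": coffee,
        # Product terms: 1 only when both factors are 1; missing stays missing.
        "smoking_x_apoe_e2": None if smoke is None else smoke * e2,
        "coffee_x_apoe_e2": None if coffee is None else coffee * e2,
    }


print(code_participant("e2/e4", smokes_now=False, smoked_in_past=True, ever_coffee=None))
```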
Statistical Analysis
The statistical analysis was done using R 2.13.0 software.
Aim 1: To test for independence between APOE-Ɛ2 and smoking or coffee
drinking using empirical data.
Independence tests
Testing for independence between two disease risk variables in a case-control study can be problematic. In this study, for example, if APOE-Ɛ2 and smoking are both associated with Parkinson's disease, then oversampling cases will over-represent certain gene-exposure profiles in the sample. This can distort the test of independence, so that its result does not reflect the true dependence (or independence) in the general population. For a rare disease, the general population is very similar to the control sample. For this reason, the independence tests were performed on the controls only.
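The concern about testing on the pooled sample can be checked numerically. The sketch below assumes a hypothetical rare disease in which the gene (G) raises risk and the exposure (E) lowers it, with no interaction and with G and E independent in the population, and computes exact gene-exposure odds ratios among controls and in a 1:1 pooled case-control sample.

```python
import math


def joint_given_disease(p_g, p_e, alpha, b_g, b_e, d):
    """P(G=g, E=e | D=d) for a logistic model without interaction,
    with G and E independent in the population."""
    probs = {}
    for g in (0, 1):
        for e in (0, 1):
            risk = 1.0 / (1.0 + math.exp(-(alpha + b_g * g + b_e * e)))
            prior = (p_g if g else 1 - p_g) * (p_e if e else 1 - p_e)
            probs[(g, e)] = prior * (risk if d == 1 else 1 - risk)
    total = sum(probs.values())
    return {k: v / total for k, v in probs.items()}


def odds_ratio(t):
    return t[(1, 1)] * t[(0, 0)] / (t[(1, 0)] * t[(0, 1)])


# Hypothetical rare disease: G raises risk, E lowers it, no interaction.
params = dict(p_g=0.15, p_e=0.4, alpha=-7.0, b_g=1.0, b_e=-1.0)
controls = joint_given_disease(d=0, **params)
cases = joint_given_disease(d=1, **params)
# Pool equal numbers of cases and controls, as in a 1:1 case-control sample.
pooled = {k: 0.5 * controls[k] + 0.5 * cases[k] for k in controls}
print(f"G-E odds ratio, controls: {odds_ratio(controls):.3f}")
print(f"G-E odds ratio, pooled sample: {odds_ratio(pooled):.3f}")
```

In this setting the controls give an odds ratio very close to 1, while the pooled sample shows a spurious association (an odds ratio of roughly 0.8) even though G and E are independent in the population.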
Independence was tested using two-sided Fisher exact tests with α = 5%.
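A two-sided Fisher exact test can be sketched with exact hypergeometric probabilities. The thesis does not say which R function was used; this Python version, including the rule of summing the probabilities of all tables no more likely than the observed one, is an illustrative assumption. In the analysis described above, the input would be the controls-only cross-tabulation of APOE-Ɛ2 carriage against ever smoking (or ever coffee drinking); the counts in the example are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].
    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):  # P(top-left cell = x) given the fixed margins
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))


print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # prints 0.4857
```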
Aim 2: To compare the effects of the multiplicative interaction between the
classical and the C&C logistic regressions using empirical data.
Classical Logistic Regression
In a case-control study let n0 be the number of controls and n1 be the number of cases, for a total of n subjects. Let D denote disease status, where D = 1 indicates disease. Let Xg be a binary covariate indicating presence of the gene variant (Xg = 1), and let Xe denote the exposure variable(s).
The logistic regression model for disease risk has parameters β = (α, βg, βe, βge), and its log odds are:
$$\log\frac{p}{1-p}=\alpha+\beta_g X_g+\beta_e X_e+\beta_{ge}X_gX_e$$
where p = P(D = 1 | Xg, Xe) is the conditional probability of disease. The log likelihood maximized to obtain the maximum likelihood estimates is:
$$\ell(\beta)=\sum_{i=1}^{n}\left[D_i\log p_i+(1-D_i)\log(1-p_i)\right]$$
where
$$p_i=\frac{\exp(\eta_i)}{1+\exp(\eta_i)}$$
and
$$\eta_i=\alpha+\beta_g X_{gi}+\beta_e X_{ei}+\beta_{ge}X_{gi}X_{ei}.$$
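The log likelihood above can be maximized by Newton-Raphson. The sketch below does this with NumPy for grouped data (one row per gene-exposure pattern); the counts are hypothetical, not study data, and the function name `fit_logistic_grouped` is illustrative. With an interaction term the model has four parameters for four patterns, so it is saturated and exp(βge) equals the ratio of the pattern-specific odds ratios.

```python
import numpy as np


def fit_logistic_grouped(X, cases, totals, n_iter=25):
    """Newton-Raphson maximization of the binomial log likelihood
    sum[y * log(p) + (n - y) * log(1 - p)] for grouped data,
    with one row of the design matrix X per covariate pattern."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (cases - totals * p)        # gradient of the log likelihood
        weights = totals * p * (1.0 - p)
        info = X.T @ (X * weights[:, None])       # information matrix
        beta = beta + np.linalg.solve(info, score)
    return beta


# Design columns: intercept, Xg, Xe, Xg*Xe for the four gene-exposure patterns.
X = np.array([[1, 0, 0, 0],
              [1, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 1, 1, 1]], dtype=float)
cases = np.array([40.0, 12.0, 30.0, 4.0])     # hypothetical case counts
totals = np.array([90.0, 20.0, 80.0, 15.0])   # hypothetical cases + controls
beta = fit_logistic_grouped(X, cases, totals)
alpha_hat, beta_g, beta_e, beta_ge = beta
print(f"interaction OR = {np.exp(beta_ge):.3f}")
```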
In a case-control study subjects are sampled by their disease status, so the disease status is in fact fixed and X is random, which differs from the prospective model just described. As mentioned above, Prentice and Pyke showed that ignoring this and fitting the prospective model to retrospective data still yields the same maximum likelihood estimates of the odds ratios as the prospective model; only the intercept is affected.
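The Prentice and Pyke result can be illustrated with a population-level calculation (no sampling noise). The sketch below assumes hypothetical values for α, β and the sampling fractions s1 and s0 of cases and controls; it shows that case-control sampling leaves the slope (the log odds ratio) unchanged and shifts the intercept by log(s1/s0).

```python
import math


def logit(p):
    return math.log(p / (1 - p))


alpha, beta = -5.0, 0.8   # hypothetical population log-odds model
s1, s0 = 0.9, 0.002       # hypothetical sampling fractions for cases and controls


def risk(x):
    """Population P(D=1 | X=x)."""
    return 1 / (1 + math.exp(-(alpha + beta * x)))


def sampled_risk(x):
    """P(D=1 | X=x, sampled): cases kept with probability s1, controls with s0."""
    r = risk(x)
    return s1 * r / (s1 * r + s0 * (1 - r))


slope = logit(sampled_risk(1)) - logit(sampled_risk(0))
intercept = logit(sampled_risk(0))
print(slope, intercept, alpha + math.log(s1 / s0))
```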
C&C Logistic Regression
When investigating a regression model with an interaction term, a large sample size is needed. Incorporating a valid assumption into a model can increase statistical efficiency and power. Previously, David M. Umbach and Clarice R. Weinberg demonstrated how to use a log-linear model to incorporate an independence assumption and estimate logistic regression odds ratios. However, their model was limited to binary gene and exposure variables. Raymond Carroll and Nilanjan Chatterjee created a logistic regression model that incorporates an independence assumption between the gene and the exposure into a retrospective likelihood, while allowing the exposure distribution to be nonparametric and the gene to have multiple categories (further referred to as C&C).
As mentioned before, with retrospective data one models P(X | D) rather than P(D | X), because disease status has already been fixed by the case-control sampling. Let j index gene status and k index exposure. In population genetics theory, the probabilities of the different genetic statuses {q_1, …, q_J} can be modeled as functions of a parameter vector θ (for example, under Hardy-Weinberg equilibrium), so that q_j = q_j(θ). For exposure values {e_1, …, e_K}, the distribution is left unspecified, with corresponding probability masses {δ_1, …, δ_K}. The retrospective log likelihood is then:
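$$\log L(\alpha, \beta_g, \beta_e, \beta_{ge}, \theta, \delta) = \sum_{i=1}^{n} \log P(X_{g,i}, X_{e,i} \mid D_i)$$
Which, according to Bayes' theorem, is:
$$\log L = \sum_{i=1}^{n} \log \frac{P(D_i \mid X_{g,i}, X_{e,i})\, P(X_{g,i}, X_{e,i})}{\sum_{j}\sum_{k} P(D_i \mid X_g = j, X_e = e_k)\, P(X_g = j, X_e = e_k)}$$
To incorporate the independence assumption, the joint probability P(X_g, X_e) is replaced by P(X_g)P(X_e). The probability P(X_e = e_k) is then replaced by δ_k and P(X_g = j) by q_j(θ), giving:
$$\log L = \sum_{i=1}^{n} \log \frac{P(D_i \mid X_{g,i}, X_{e,i})\, q_{j(i)}(\theta)\, \delta_{k(i)}}{\sum_{j}\sum_{k} P(D_i \mid X_g = j, X_e = e_k)\, q_j(\theta)\, \delta_k}$$
where j(i) and k(i) are the gene and exposure categories of subject i, and P(D | X_g, X_e) follows the logistic model.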
This likelihood is maximized to obtain the maximum likelihood estimates of the coefficients. In this study, the nonparametric aspect of the model is not exploited, because the gene and exposure variables are both binary.
This analysis was done in R with the CGEN package. (72)
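For the binary case used in this study, the retrospective likelihood can be evaluated directly. A minimal sketch, assuming binary X_g and X_e, so that q_j(θ) reduces to a single carrier probability q = P(X_g = 1) and the exposure masses to δ = P(X_e = 1). This is not the CGEN implementation, and the function names are hypothetical; it only evaluates the log likelihood, which an optimizer would then maximize over all parameters.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def cc_retrospective_loglik(data, alpha, b_g, b_e, b_ge, q, delta):
    """Retrospective log likelihood under gene-environment independence.

    data: iterable of (d, x_g, x_e) with each value in {0, 1}
    q: P(X_g = 1), the binary case of q_j(theta)
    delta: P(X_e = 1), the binary case of the masses delta_k
    """
    def p_disease(d, g, e):
        # Logistic model P(D = d | X_g = g, X_e = e)
        p1 = expit(alpha + b_g * g + b_e * e + b_ge * g * e)
        return p1 if d == 1 else 1.0 - p1

    def p_gene(g):
        return q if g == 1 else 1.0 - q

    def p_exposure(e):
        return delta if e == 1 else 1.0 - delta

    total = 0.0
    for d, g, e in data:
        # Independence: P(X_g, X_e) = q_j(theta) * delta_k
        numerator = p_disease(d, g, e) * p_gene(g) * p_exposure(e)
        # Bayes denominator P(D = d), summed over all gene/exposure categories
        denominator = sum(p_disease(d, j, k) * p_gene(j) * p_exposure(k)
                          for j in (0, 1) for k in (0, 1))
        total += math.log(numerator / denominator)
    return total


example = [(1, 1, 0), (0, 0, 1), (1, 1, 1)]
print(cc_retrospective_loglik(example, alpha=-0.5, b_g=0.3, b_e=0.2, b_ge=-1.0, q=0.15, delta=0.4))
```

A useful sanity check: with all β coefficients set to zero, disease status carries no information about X_g or X_e, and each subject contributes just log(q_j δ_k).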
Standardized Bootstrap Confidence Intervals
Because of the small sample size, a standardized bootstrap was performed to verify the reliability of the statistically significant interaction effect.
The empirical data were bootstrapped 800 times to calculate standardized bootstrap estimates of the OR, and each bootstrap sample was bootstrapped an additional 800 times to estimate the standard deviation of its estimate. The lower and upper limits were calculated with the following equations, where $\hat\theta$ is the OR estimate from the classical logistic regression, $i$ is the bootstrap index, $\hat\theta^*_i$ is the estimate from bootstrap sample $i$, $\hat\sigma^*_i$ is the standard deviation estimate of the OR from the nested bootstrap of bootstrap $i$, $t^*_i = (\hat\theta^*_i - \hat\theta)/\hat\sigma^*_i$ is the standardized estimate of bootstrap $i$, $\hat s$ is the standard deviation of the OR estimates of the main bootstrap, and $t^*_{(q)}$ is the $q$-th percentile of the $t^*_i$:
Upper limit = $\hat\theta - t^*_{(0.025)}\,\hat s$
Lower limit = $\hat\theta - t^*_{(0.975)}\,\hat s$
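The interval described above is the bootstrap-t (studentized bootstrap) construction. A minimal sketch with hypothetical function names, written for any estimator that returns a single number (for example, the interaction coefficient from a refitted model). Percentiles use linear interpolation, which is an assumption, and the example uses a sample mean with far fewer than 800 replicates so that it runs quickly.

```python
import math
import random
import statistics


def percentile(values, q):
    """Linear interpolation between order statistics, 0 <= q <= 1."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def nested_bootstrap(data, estimator, b_outer=800, b_inner=800, seed=1):
    """Outer resamples give theta*_i; resampling each of them gives sigma*_i."""
    rng = random.Random(seed)
    n = len(data)
    theta_star, sigma_star = [], []
    for _ in range(b_outer):
        outer = [rng.choice(data) for _ in range(n)]
        theta_star.append(estimator(outer))
        inner = [estimator([rng.choice(outer) for _ in range(n)]) for _ in range(b_inner)]
        sigma_star.append(statistics.stdev(inner))
    return estimator(data), theta_star, sigma_star


def bootstrap_t_interval(theta_hat, theta_star, sigma_star, level=0.95):
    """Standardized (bootstrap-t) interval built from the nested bootstrap output."""
    t_star = [(th - theta_hat) / sd for th, sd in zip(theta_star, sigma_star)]
    s_hat = statistics.stdev(theta_star)  # SD of the main (outer) bootstrap estimates
    tail = (1 - level) / 2
    lower = theta_hat - percentile(t_star, 1 - tail) * s_hat
    upper = theta_hat - percentile(t_star, tail) * s_hat
    return lower, upper


# Example with a sample mean and few replicates so it runs quickly
data = [float(x) for x in range(20)]
theta_hat, theta_star, sigma_star = nested_bootstrap(data, statistics.mean, b_outer=100, b_inner=30)
print(bootstrap_t_interval(theta_hat, theta_star, sigma_star))
```

With the study data, `estimator` would refit the logistic regression on each resample; at 800 × 800 replicates that means 640,000 inner fits, which is why the nested step dominates the run time.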
Aim 3: To explore the implications of violated independence assumptions using
simulated data.
Simulations
The following are the simulations performed in the study:
i. First simulation: dependence between APOE-Ɛ2 and smoking
The gender, smoking, and allele statuses are sampled with replacement from the empirical data. Case status is then sampled with a probability based on these variables, according to the logistic regression model fitted to the empirical data:
eta <- predict(fit, newdata = S)  # α + βx; fit and S are hypothetical names for the empirical glm and the resampled data
SCases <- rbinom(297, 1, exp(eta) / (1 + exp(eta)))
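The resampling step can be sketched in Python as follows. This is a minimal sketch, assuming each subject's (gender, smoking, allele) record is drawn as a whole, so the APOE-Ɛ2-smoking dependence observed in the empirical data is preserved, and using the Table 7a coefficients as the fitted model. Gender coding follows the study data; the records in the example are hypothetical stand-ins, not study records, and the function names are hypothetical.

```python
import math
import random

# Fitted coefficients from Table 7a (classical logistic regression)
TABLE_7A = {"intercept": 2.3715, "gender": -1.8778, "apoe2": 0.2759,
            "smoker": 0.4819, "smoker_apoe2": -2.0469}


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate_i(records, coef=TABLE_7A, n=297, seed=1):
    """Resample (gender, smoker, apoe2) records with replacement, then draw case status."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        gender, smoker, apoe2 = rng.choice(records)  # whole record keeps the G-E dependence
        eta = (coef["intercept"] + coef["gender"] * gender + coef["apoe2"] * apoe2
               + coef["smoker"] * smoker + coef["smoker_apoe2"] * smoker * apoe2)
        case = int(rng.random() < expit(eta))  # Bernoulli(expit(eta)), like rbinom(1, 1, p)
        sample.append((case, gender, smoker, apoe2))
    return sample


# Hypothetical stand-in records (gender, smoker, apoe2); not study data
records = [(0, 0, 0), (1, 1, 0), (0, 1, 1), (1, 0, 1), (0, 0, 0)]
simulated = simulate_i(records)
print(sum(case for case, *_ in simulated), "cases among", len(simulated))
```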
ii. Second simulation: no association between APOE-Ɛ2 and smoking
This simulation is not sampled from the data but generated. Cases are generated first, to simulate data like a case-control sample:
S3Cases <- rbinom(5000, 1, 0.5)
Then the allele variable is generated based on case status, with a 0.084 probability of carrying APOE-Ɛ2 among controls and 0.1008 (about 0.10) among cases (loosely based on the results of the empirical data):
S3allele <- rbinom(5000, 1, 0.084 + 0.0168 * S3Cases)
Then smoking is generated so that 15% of cases who carry the allele smoke, and 35% of the remaining subjects smoke:
S3Smoke <- rbinom(5000, 1, 0.35 - 0.2 * S3Cases * S3allele)
Finally, gender is generated, also based on the empirical data:
S3Gender <- rbinom(5000, 1, 0.7011542 - 0.4328611 * S3Cases)
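The same data-generating process, translated into a minimal Python sketch (random streams differ from R's, so individual data sets will not match). Because smoking depends on the allele only among cases, and gender depends only on case status, the true interaction log-OR implied by these probabilities is log[(0.15/0.85)/(0.35/0.65)] ≈ -1.12, which gives a reference value for the simulation ii estimates. Function names are hypothetical.

```python
import math
import random


def simulate_ii(n=5000, seed=2013):
    """One simulation-ii data set: rows of (case, allele, smoke, gender)."""
    rng = random.Random(seed)

    def bernoulli(p):
        return int(rng.random() < p)

    rows = []
    for _ in range(n):
        case = bernoulli(0.5)
        allele = bernoulli(0.084 + 0.0168 * case)
        smoke = bernoulli(0.35 - 0.2 * case * allele)  # 15% for case carriers, else 35%
        gender = bernoulli(0.7011542 - 0.4328611 * case)
        rows.append((case, allele, smoke, gender))
    return rows


def implied_interaction_log_or(p_smoke_case_carrier=0.15, p_smoke_other=0.35):
    """Interaction log-OR implied by the generator.

    Smoking depends on the allele only among cases, so the allele-smoking OR is 1
    among controls and the interaction OR equals the allele-smoking OR among cases.
    """
    def odds(p):
        return p / (1 - p)

    return math.log(odds(p_smoke_case_carrier) / odds(p_smoke_other))


rows = simulate_ii()
print(round(implied_interaction_log_or(), 2))  # -1.12
```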
Each simulation was run once and analyzed by both the classical logistic regression and C&C to display an example data set. Each simulation was then repeated 1000 times to calculate, for both the classical and C&C logistic regressions, the median estimate and OR, the 95% confidence interval, and the median P-value of the coefficients (including the multiplicative interaction). The 95% confidence interval was calculated from the 2.5th and 97.5th percentiles of the simulated ORs.
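The summary statistics above can be computed from the per-replicate results as follows. A minimal sketch with hypothetical names, assuming each of the 1000 fits yields a coefficient estimate, standard error, and p-value; the percentile interpolation rule (the same as R's default quantile type 7) is an assumption, since the text does not specify one.

```python
import math
import statistics


def percentile(values, q):
    """Linear interpolation between order statistics (R's default type 7)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def summarize_replicates(estimates, std_errors, p_values, alpha=0.05):
    """Summaries reported per coefficient across simulated data sets."""
    odds_ratios = [math.exp(b) for b in estimates]
    return {
        "median_estimate": statistics.median(estimates),
        "median_or": statistics.median(odds_ratios),
        "or_95ci": (percentile(odds_ratios, 0.025), percentile(odds_ratios, 0.975)),
        "median_se": statistics.median(std_errors),
        "median_p": statistics.median(p_values),
        "n_p_below_alpha": sum(p < alpha for p in p_values),
    }


example = summarize_replicates(
    estimates=[-1.0, -2.0, 0.0, -3.0, -1.5],
    std_errors=[0.5, 0.6, 0.4, 0.7, 0.5],
    p_values=[0.01, 0.2, 0.04, 0.5, 0.049],
)
print(example)
```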
Aim 4: To determine if there are interactions between APOE-Ɛ2 and smoking or
coffee drinking and quantify their effect based on the results of this study.
After considering Aims 2 and 3, we draw a conclusion about the existence of interactions and quantify their effects.
Ethics
This study was approved by the Sourasky Medical Center institutional ethics
committee. Each participant signed an informed consent form. Each participant was
informed that they would not be receiving the results of their genetic testing, as per
the suggestion of the American Neurological Association. The APOE sampling was
approved by the Israeli Ministry of Health's ethics committee for human genetic
studies.
4. Results
Summary Statistics
The study includes 409 participants: 163 cases and 246 controls. Of the participants who filled out the questionnaire, 306 agreed to genetic testing and had valid results; two were discarded because their genetic test results were invalid. Of all study participants, 402 supplied sufficient self-reported smoking information to determine ever-smoking status, and 268 provided enough self-reported information about their coffee drinking habits.
Table 1: Distribution of study variables among cases and controls
Variable          Category       Cases (N)   Controls (N)
Smoking           Ever           96          79
                  Never          63          164
Coffee drinking   Drinker        73          163
                  Non-drinker    18          14
Table 1 presents summary statistics of lifestyle variables among cases and controls. Smokers are more common among cases (96/159, 60%) than among controls (79/243, 33%), while coffee drinkers are more common among controls (163/177, 92%) than among cases (73/91, 80%).
Table 2: Contingency table of APOE-Ɛ2 and disease status
            Non-APOE-Ɛ2 carrier   APOE-Ɛ2 carrier   Total
Controls    148                   31                179
Cases       112                   15                127
Total       260                   46                306
Table 2 presents APOE-Ɛ2 carrier status by case-control status. In this sample, 15% of the participants were APOE-Ɛ2 carriers: 17% of the controls and 12% of the cases.
Table 3: Contingency tables of APOE-Ɛ2 carrier status and lifestyle variables among cases and controls*
Cases (N=124)        Ever smoker      Never smoker
  Carrier            5                10
  Non-carrier        48               61
Cases (N=76)         Coffee drinker   Non-drinker
  Carrier            9                2
  Non-carrier        55               10
Controls (N=176)     Ever smoker      Never smoker
  Carrier            20               11
  Non-carrier        37               108
Controls (N=156)     Coffee drinker   Non-drinker
  Carrier            27               1
  Non-carrier        116              12
*For relevant subjects with full information for both variables
Table 3 shows the frequencies of APOE-Ɛ2 carrier status and each lifestyle variable, separately for cases and controls. It illustrates how difficult it can be to explore gene-lifestyle (or gene-environment) interactions in rare diseases, particularly when, as here, the genetic variant is fairly rare as well: several cells contain only one or two subjects.
Table 4: APOE-Ɛ2, Coffee and Smoking Profile Frequencies
APOE-Ɛ2 carrier*   Coffee**   Smoking***   Cases   Controls
1                  1          1            3       16
1                  0          1            1       1
1                  0          0            1       0
1                  1          0            6       11
0                  1          1            24      31
0                  0          1            3       1
0                  0          0            7       11
0                  1          0            30      83
*APOE-Ɛ2 carrier is 1 for genotypes E2/E3, E2/E2, and E2/E4.
**Coffee is 1 for coffee consumption and 0 for no coffee consumption.
***Smoking is 1 for ever smoker and 0 for never smoker.
Table 4 shows the frequencies of cases and controls among each of the profiles.
Table 5a: Coffee and Smoking Missingness Among Cases and Controls
Smoking response     Cases   Controls
Not missing          124     176
Missing smoking      5       3
Coffee response      Cases   Controls
Not missing          76      156
Missing coffee       51      23
Table 5a shows the frequencies of missing coffee- and smoking-related responses among cases and controls who agreed to genetic testing. Among cases, 3.8% did not respond to the smoking questions, compared with 1.7% of controls; 40% of cases did not respond to the coffee-related questions, compared with 12.8% of controls.
Table 5b: Agreement to Genetic Testing Among Cases and Controls
                 Cases   Controls
Agreed           127     179
Did not agree    36      67
Table 5b shows the missingness of genetic information. A higher percentage of controls than cases did not agree to genetic testing: 27% of controls versus 22% of cases. Genetic information is assumed to be missing at random, since it is reasonable to assume that gene status did not affect the participants' decision to be tested.
Figure 1a: Smokers, APOE-Ɛ2 carriers, and country of origin:
There were 189 participants with reported smoking statuses and origin, and 230 participants with
APOE-Ɛ2 test results and reported origin.
Figure 1b: Coffee drinkers, APOE-Ɛ2 carriers, and country of origin:
There were 272 participants with reported coffee drinking statuses and origin, and 230 participants with
APOE-Ɛ2 test results and reported origin.
Coffee drinking and smoking are both social and cultural lifestyle habits, so it is of interest to examine their distribution across ethnic groups. Figures 1a and 1b show the percentage of smokers and APOE-Ɛ2 carriers, and of coffee drinkers and APOE-Ɛ2 carriers, respectively, by self-reported country of origin.
[Figure 1a: bar chart of Percent smokers and Percent APOE-Ɛ2 carriers by country of origin. Figure 1b: bar chart of Percent coffee drinkers and Percent APOE-Ɛ2 carriers by country of origin.]
Aim 1: To test for independence between APOE-Ɛ2 and smoking or coffee
drinking using empirical data.
Independence Tests
Table 6a: Ever Smoker and APOE-Ɛ2 carrier contingency table for controls
Smoker \ APOE-Ɛ2   Non-carrier   Carrier
Never              108 (75%)     11 (35%)
Ever               37 (25%)      20 (65%)
OR = 5.25, 95% CI (2.16, 13.37), P-value: 0.0007
Table 6a is a contingency table of the frequencies of smokers and APOE-Ɛ2 carriers among controls. Fisher's exact test rejects independence (p-value 0.0007), and the 95% confidence interval of the odds ratio is (2.16, 13.37). The C&C model is based on the assumption that the variables forming the multiplicative interaction are independent of one another in the population; independence is examined among controls because, for a rare disease, controls approximate the source population. Therefore, if independence does not hold, the results of a C&C model on these empirical data may be untrustworthy. This issue is discussed further below.
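Fisher's exact test conditions on the table margins and sums hypergeometric probabilities. A minimal sketch of the two-sided version, which sums the probabilities of all tables no more likely than the observed one (the rule described in the documentation of R's fisher.test); the function names are hypothetical. R's fisher.test also reports the conditional maximum likelihood estimate of the odds ratio rather than the sample odds ratio, so the two can differ slightly: for Table 6a the sample odds ratio is (108 × 20)/(11 × 37) ≈ 5.31.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher exact test for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    support = range(max(0, row1 + col1 - n), min(row1, col1) + 1)
    # Small relative tolerance so tables tied with the observed one are kept
    return min(1.0, sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7)))


def sample_odds_ratio(table):
    (a, b), (c, d) = table
    return (a * d) / (b * c)


# Classic check: Fisher's tea-tasting table has two-sided p = 34/70
print(fisher_exact_two_sided([[3, 1], [1, 3]]))
print(sample_odds_ratio([[108, 11], [37, 20]]))  # Table 6a counts
```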
Table 6b: Coffee Consumption and APOE-Ɛ2 carrier contingency table for controls
Coffee consumption \ APOE-Ɛ2   Non-carrier   Carrier
Never                          12 (9%)       1 (4%)
Ever                           116 (91%)     27 (96%)
OR = 2.77, 95% CI (0.38, 127.7), P-value: 0.468
Table 6b is a contingency table of the frequencies of coffee drinkers and APOE-Ɛ2 carriers among controls. Fisher's exact test does not reject independence in this case (p-value 0.466; 95% confidence interval for the odds ratio (0.38, 127.7)). Therefore, one may assume that the results of a C&C model for coffee drinking would be valid.
Classical Logistic Regression Using the Empirical Data
Aim 2: To compare the effects of the multiplicative interaction between the
classical and the C&C logistic regressions using empirical data.
Table 7a: Classical Logistic Regression using the empirical study data - smoking & APOE-Ɛ2 (N=297)
Term             Estimate   Std. Error   Z value   Pr(>|z|)     OR       95% CI
(Intercept)      2.3715     0.4498       5.273     1.34e-7***   10.713   4.43-25.87
Gender           -1.8778    0.2721       -6.901    5.18e-12***  0.152    0.09-0.26
APOE-Ɛ2          0.2759     0.5171       0.534     0.5936       1.317    0.48-3.63
Smoker           0.4819     0.3031       1.59      0.1118       1.619    0.89-2.93
Smoker:APOE-Ɛ2   -2.0469    0.7779       -2.632    0.0085**     0.129    0.03-0.59
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Table 7a shows the effect estimates and test statistics of the classical logistic regression using our empirical case-control data for smoking and APOE-Ɛ2.
Goodness of fit: The Wald test of all the coefficients gave χ² = 56.3 (p = 1.8e-11). The residual deviance was 336.6 (p = 0.039). A likelihood-ratio test of this model against a null model gave χ² = 66.89 (p = 1.02e-11), and a likelihood-ratio test of this model against a model without the interaction gave χ² = 7.25 (p = 0.007). The AIC of the model with the interaction (346) is smaller than that of the model without it (351), also showing that the model with the interaction is preferable despite the lost degree of freedom. All of the goodness-of-fit tests indicate that the model with an interaction is well fitted.
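The reported p-values and AIC values follow from standard formulas. A minimal sketch with hypothetical function names: the upper tail of a χ² distribution with 1 degree of freedom is erfc(√(x/2)), an even number of degrees of freedom has a closed form, and for ungrouped binary data the residual deviance equals -2 times the log likelihood, so AIC is the deviance plus twice the number of parameters. Treating the Wald test of all coefficients as having 4 degrees of freedom (the four non-intercept coefficients), and counting 5 parameters in the interaction model, are assumptions.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


def chi2_sf_even_df(x, df):
    """Upper-tail probability for an even number of degrees of freedom (closed form)."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


def aic(deviance, n_params):
    """Akaike information criterion from the residual deviance."""
    return deviance + 2 * n_params


print(chi2_sf_df1(7.25))         # likelihood-ratio test for the interaction (1 df)
print(chi2_sf_even_df(56.3, 4))  # Wald test of the four non-intercept coefficients
print(aic(336.6, 5))             # intercept, gender, APOE-e2, smoker, interaction
```

The first two calls give about 0.0071 and 1.7e-11, close to the reported 0.007 and 1.8e-11, and the AIC is 346.6.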
Because there are very few APOE-Ɛ2 carriers in this study, it is notable that the classical logistic regression produced an interaction effect that was statistically significant at the 1% level (p = 0.0085). A nonparametric bootstrap-t of the data helps to further investigate the reliability of this result. The data sample was bootstrapped 1000 times. The 95% CI of the t value of the interaction effect was (-1.534, 1.400), which gives a 95% CI of (-3.3939, -0.7258) for the interaction coefficient.
To verify the reliability of the interaction effect, a standardized bootstrap was performed. The standardized bootstrap 95% confidence interval of the interaction coefficient was (-4.195, -0.988), corresponding to an OR CI of (0.015, 0.372); since this interval excludes an OR of 1, it confirms that the effect is significant.
Explanatory variables: The APOE-Ɛ2 carrier status and smoking main effects were not significant while the interaction between them was, possibly indicating that their effects on disease risk act only through the multiplicative interaction, i.e. that smoking and APOE-Ɛ2 have no main effects. The odds ratio of APOE-Ɛ2 is 1.317, 95% CI (0.48, 3.63), and it was not significant. The P-value of the smoking main effect (0.1118) is close to significance at 10%, but a correction for multiple tests is required: with a Bonferroni correction, a main-effect P-value would have to be below 0.05 to be significant at an α of 10%. The multiplicative coefficient remains significant even after Bonferroni correction. The OR of the interaction indicates decreased risk of Parkinson's disease for people who smoke or smoked and are APOE-Ɛ2 carriers.
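Because the model is multiplicative on the odds scale, odds ratios for combined profiles are exponentials of sums of coefficients. A minimal sketch using the Table 7a estimates, comparing each profile with never-smoking non-carriers of the same gender: an ever-smoking carrier has OR exp(β_g + β_e + β_ge) ≈ 0.28, and within ever smokers the APOE-Ɛ2 OR is exp(β_g + β_ge) ≈ 0.17. Confidence intervals for these combinations would also need the coefficient covariances, which are not reported here.

```python
import math

# Table 7a estimates (log-odds scale)
b_apoe2, b_smoker, b_interaction = 0.2759, 0.4819, -2.0469

# Odds ratios relative to never-smoking non-carriers of the same gender
profiles = {
    "carrier, never smoker": math.exp(b_apoe2),
    "non-carrier, ever smoker": math.exp(b_smoker),
    "carrier, ever smoker": math.exp(b_apoe2 + b_smoker + b_interaction),
}
# APOE-e2 odds ratio among ever smokers (carrier vs non-carrier smokers)
or_apoe2_in_smokers = math.exp(b_apoe2 + b_interaction)

for label, value in profiles.items():
    print(f"{label}: OR = {value:.3f}")
print(f"APOE-e2 among ever smokers: OR = {or_apoe2_in_smokers:.3f}")
```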
Table 7b: Classical Logistic Regression using the empirical study data - Coffee Drinking and APOE-Ɛ2 (N=232)
Term                    Estimate   Std. Error   Z value   Pr(>|z|)      OR      95% CI
(Intercept)             2.5657     0.6883       3.727     0.000193***   13.01   3.37-50.14
Gender                  -1.8446    0.3177       -5.806    6.4e-9***     0.158   0.08-0.29
APOE-Ɛ2                 0.6879     1.4295       0.481     0.6303        1.99    0.12-32.77
CoffeeDrinker           -0.523     0.5207       -1.005    0.3151        0.592   0.21-1.64
CoffeeDrinker:APOE-Ɛ2   -1.2606    1.4988       -0.841    0.4           0.283   0.02-5.34
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Table 7b shows the effect estimates and test statistics of the classical logistic regression using the empirical case-control data for coffee drinking and APOE-Ɛ2.
Goodness of fit: The Wald test of the coefficients excluding gender gave χ² = 3.8 (p = 0.28). The residual deviance for this model is 248.62 (p = 0.1242). A likelihood-ratio test of this model against a null model gave χ² = 41.02 (p = 2.65e-8). A likelihood-ratio test of this model against a model without the interaction, however, was not significant (χ² = 0.747, p = 0.387). A Wald test of all the coefficients gave χ² = 35.4 (p = 3.9e-7). The significant overall tests likely reflect only the variance explained by gender, as none of the other effects were significant. Overall, these tests indicate that this model is not well fitted or, more likely, that the sample is too small to estimate these regression parameters.
Explanatory Variables: The interaction is not significant at 5%; only the gender effect was significant.
For smoking, the classical logistic regression produced meaningful results regarding the interaction effect, and this small sample may have been sufficient for that analysis despite the degree of freedom lost to the interaction term. We now explore the results of the C&C model.
C&C Logistic Regression using the Empirical Data
Table 8a: C&C Logistic Regression using the empirical study data - smoking & APOE-Ɛ2 (N=292)
Term             Estimate   Std. Error   Z value   Pr(>|z|)     OR       95% CI
(Intercept)      2.3832     0.4400       5.417     6.06e-8***   10.839   4.57-25.67
Gender           -1.8395    0.2669       -6.892    5.46e-12***  0.158    0.09-0.27
APOE-Ɛ2          0.1424     0.2808       0.507     0.611        1.153    0.67-2.00
Smoker           -0.2392    0.3958       -0.604    0.545        0.787    0.36-1.71
Smoker:APOE-Ɛ2   -0.4332    0.5811       -0.745    0.456        0.648    0.21-2.02
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Table 8a shows the results of the C&C regression using the empirical case-control data for smoking and APOE-Ɛ2.
Goodness of fit: The Wald test of all the coefficients in this model gave χ² = 52.2 (p = 1.3e-10).
Explanatory Variables: The interaction between smoking and APOE-Ɛ2 is not significant in this model, in contrast to the classical logistic regression. The odds ratio of the APOE-Ɛ2 main effect is 1.153, 95% CI (0.665, 1.999).
One would expect the C&C results to be more statistically significant, yet the opposite is observed. The standard errors here are smaller than those of the classical logistic model for all effects except the smoker main effect, but the estimates also differ. Violation of the independence assumption is known to bias the C&C estimates, so the lack of significance of the interaction appears to be due to bias resulting from the violated assumption.
Table 8b: C&C Logistic Regression using the empirical study data - coffee drinking & APOE-Ɛ2 (N=232)
Term             Estimate   Std. Error   Z value   Pr(>|z|)     OR       95% CI
(Intercept)      2.5911     0.673        3.85      1.18e-4***   13.344   3.56-49.9
Gender           -1.8109    0.314        -5.766    8.09e-9***   0.164    0.088-0.302
APOE-Ɛ2          0.0443     0.8099       0.0547    0.956        1.045    0.21-5.11
Coffee           -0.6518    0.5054       -1.29     0.197        0.521    0.19-1.40
Coffee:APOE-Ɛ2   -0.306     0.8604       -0.356    0.722        0.736    0.14-3.98
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Table 8b shows the results of the C&C regression using the empirical case-control data for coffee drinking and APOE-Ɛ2.
Goodness of fit: The Wald test of all the coefficients of this model gave χ² = 34.7 (p = 5.3e-7); as in the classical model for coffee drinking, this is probably due solely to the variance explained by gender. Again, this model is poorly fitted.
Explanatory Variables: The interaction effect is not significant before or after Bonferroni correction. The only significant effects were gender and the intercept.
In summary, the models of smoking and APOE-Ɛ2 produced meaningful results, and they differed between the classical logistic model and the C&C model: the interaction was significant in the classical logistic regression but not in the C&C model. When the independence assumption between the gene and exposure variables is valid, the two regressions should give almost identical effect estimates, and the C&C model should have more power, not less. In the two regressions on the smoking data, 4 of 5 parameter estimates had smaller standard errors in the C&C model, yet 4 of 5 Z values were further from 0 in the classical logistic model. These results are likely due to the violation of the independence assumption.
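The comparison of standard errors and Z values above can be reproduced directly from Tables 7a and 8a. A minimal sketch: each Wald statistic is the estimate divided by its standard error, with the two-sided normal p-value erfc(|z|/√2); the dictionary keys are hypothetical labels.

```python
import math

# (estimate, standard error) from Table 7a (classical) and Table 8a (C&C)
classical = {
    "intercept": (2.3715, 0.4498), "gender": (-1.8778, 0.2721),
    "apoe2": (0.2759, 0.5171), "smoker": (0.4819, 0.3031),
    "smoker:apoe2": (-2.0469, 0.7779),
}
cc = {
    "intercept": (2.3832, 0.4400), "gender": (-1.8395, 0.2669),
    "apoe2": (0.1424, 0.2808), "smoker": (-0.2392, 0.3958),
    "smoker:apoe2": (-0.4332, 0.5811),
}


def wald(estimate, se):
    """Wald z statistic and two-sided normal p-value."""
    z = estimate / se
    return z, math.erfc(abs(z) / math.sqrt(2))


smaller_se_in_cc = sum(cc[t][1] < classical[t][1] for t in classical)
larger_z_in_classical = sum(
    abs(wald(*classical[t])[0]) > abs(wald(*cc[t])[0]) for t in classical
)
print(smaller_se_in_cc, larger_z_in_classical)  # 4 4
print(wald(*classical["smoker:apoe2"]), wald(*cc["smoker:apoe2"]))
```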
In order to explore further the effects of violating the independence assumption on the
results of the C&C model, in the next section we will experiment with simulated data
samples analyzed by this constrained model.
Aim 3: To explore the implications of violated independence assumptions using
simulated data.
Simulations
The simulations generated case-control sample data with gender, APOE-Ɛ2, smoking, and disease status. This section explores the results of both the classical logistic regression and the C&C regression on simulated data, with and without dependence between APOE-Ɛ2 and smoking. Results for one example data set are displayed first, followed by results summarizing all the data sets.
i. The first simulation has dependence between APOE-Ɛ2 and smoking, based on the empirical data. 1000 sample sets of size 297 were generated from the empirical data.
ii. The second simulation has independence between APOE-Ɛ2 carrier status and smoking. 1000 sample sets of size 5000 were generated.
Simulation i:
Table 9.a: Example Classical Logistic Regression of simulated data set i
Term             Estimate   Std. Error   Z value   Pr(>|z|)     OR       95% CI
(Intercept)      3.0832     0.5171       5.963     2.4e-9***    21.828   7.92-60.14
Gender           -2.2870    0.2926       -7.814    5.5e-15***   0.102    0.06-0.18
APOE-Ɛ2          1.0160     0.5987       1.697     0.090        2.762    0.85-8.93
Smoker           0.3200     0.3158       1.013     0.311        1.377    0.74-2.55
Smoker:APOE-Ɛ2   -2.2584    0.9779       -2.309    0.020*       0.105    0.02-0.71
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Table 9.a shows the results of a classical logistic regression using one data set from simulation i.
In this sample there was dependence between APOE-Ɛ2 status and smoking, with a correlation of 0.0307. The interaction effect was statistically significant at 0.05, with a P-value of 0.02. The APOE-Ɛ2 and smoking main effects were not significant.
Table 9.b: Example C&C Regression of simulated data set i
Term             Estimate   Std. Error   Z value   Pr(>|z|)     OR       95% CI
(Intercept)      3.2090     0.5078       6.318     2.9e-10***   24.754   9.15-66.97
Gender           -2.3693    0.2891       -8.195    3.5e-16***   0.094    0.05-0.16
APOE-Ɛ2          0.8348     0.4050       2.061     0.047*       2.304    1.04-5.09
Smoker           0.2158     0.3049       0.708     0.457        1.241    0.68-2.25
Smoker:APOE-Ɛ2   -1.4480    0.6685       -1.905    0.056        0.235    0.06-0.87
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Table 9.b shows the results of a C&C logistic regression using one data set from simulation i.
These results differ somewhat from the classical logistic regression. First, the APOE-Ɛ2 main effect is significant. Second, the interaction estimate is different, although still negative, with an OR confidence interval (0.06-0.87) that excludes 1 (P-value 0.056).
Table 10: Summary statistics of simulated effects - simulation i
Coef      Model       Median estimate   Median OR   OR 95% CI     Median SE   Median Pr(>|z|)   # simulations with PV<0.05
APOE-Ɛ2   Classical   0.2850            1.330       0.395-4.356   0.5836      0.451             57
          C&C         -0.9673           0.380       0.156-0.790   0.4130      0.020             671
Smoking   Classical   -1.9197           0.147       0.076-0.260   0.2902      4.23e-11          1000
          C&C         -1.8290           0.161       0.086-0.278   0.2793      6.99e-11          1000
G*E       Classical   -2.0366           0.130       0.027-0.492   0.7351      0.0005            795
          C&C         0.6837            1.981       0.718-5.759   0.5107      0.174             263
Table 10 summarizes the classical and C&C logistic regressions on the simulated data. The C&C model consistently had lower standard errors. However, because of the bias resulting from the violated independence assumption, the interaction effect was significant in only 263 of the 1000 C&C model simulations.
Simulation ii:
Table 11a: Example Classical Logistic Regression of simulated data set ii
Term             Estimate   Std. Error   Z value   Pr(>|z|)      OR      95% CI
(Intercept)      0.8262     0.0504       16.393    2.15e-60***   2.290   2.07-2.53
Gender           -1.8367    0.0630       -29.137   1.21e-186***  0.159   0.14-0.18
APOE-Ɛ2          0.6170     0.1282       4.811     1.50e-6***    1.853   1.44-2.38
Smoker           -0.0010    0.0684       -0.014    0.98          0.999   0.87-1.14
Smoker:APOE-Ɛ2   -1.3943    0.2748       -5.074    3.891e-07***  0.248   0.14-0.42
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Table 11a shows the results of a classical logistic regression using one data set from
simulation ii. All the effects were significant except smoker status.
Table 11b: Example C&C Regression of simulated data set ii
Term             Estimate   Std. Error   Z value   Pr(>|z|)      OR      95% CI
(Intercept)      0.8250     0.0501       16.457    7.42e-61***   2.282   2.06-2.52
Gender           -1.8300    0.0627       -29.177   3.90e-187***  0.160   0.14-0.18
APOE-Ɛ2          0.6058     0.1035       5.854     4.80e-9***    1.833   1.49-2.24
Smoker           -0.0011    0.0668       -0.017    0.98          0.999   0.87-1.14
Smoker:APOE-Ɛ2   -1.3944    0.1970       -7.079    1.45e-12***   0.248   0.17-0.36
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Table 11b shows the results of a C&C logistic regression using one set of the
simulated data described in simulation ii. All the effects were significant except
smoker status.
Table 12: Summary statistics of simulated effects - simulation ii
Coef      Model       Median estimate   Median OR   OR 95% CI     Median SE   Median Pr(>|z|)   # simulations with PV<0.05
APOE-Ɛ2   Classical   0.4880            1.629       0.927-2.877   0.2890      0.094             386
          C&C         0.4648            1.591       0.997-2.477   0.2320      0.044             519
Smoking   Classical   -1.8653           0.155       0.116-0.205   0.1417      1.22e-39          1000
          C&C         -1.8607           0.155       0.117-0.204   0.1409      7.33e-40          1000
G*E       Classical   -1.0938           0.335       0.085-0.993   0.5904      0.060             476
          C&C         -1.1123           0.329       0.113-0.660   0.4120      0.007             850
Table 12 summarizes the classical and C&C logistic regressions on the simulated data, which were generated with no violation of the independence assumption. The C&C regression consistently produced lower SE estimates and effect estimates similar to those of the classical logistic regression, and consequently lower p-values. The C&C regression also produced significant effects more often than the classical logistic regression. In summary, when the assumption is not violated, the C&C model does what it is proposed to do.
5. Discussion
Aim 1: To test for independence between APOE-Ɛ2 and smoking or coffee
drinking using empirical data.
Case-control studies require larger sample sizes to test possible interactions (73). In recent years it has become evident that gene-environment interactions play important and complicated roles in many chronic diseases (74). This paper sought to test independence between the interacting variables so that the C&C logistic model could be used to increase the power of the analysis.
The independence test rejected independence between APOE-Ɛ2 and smoking (p-value = 0.0007), meaning that a significant association between APOE-Ɛ2 and smoking was found in the sample. This may itself be a very interesting relationship, and there are genome-wide studies searching for such associations (75). The C&C model assumes independence in the general population, and the association found in this study may be due to population stratification. Nonetheless, for the purposes of this paper, it posed a challenge in analyzing the possible interaction between genetic and lifestyle factors in the empirical data. Most case-control studies of rare diseases fail to reach a sample size sufficient to study interactions. One study showed that a case-control study would require a sample size of around 5000 just to estimate the modest effects of APOE-Ɛ2 (41). Therefore, the C&C regression model would have been an attractive option to increase power for estimating the interaction effect in such a small sample.
The rejection of independence between APOE-Ɛ2 and smoking may be due to ethnicity or other cultural factors, as smoking habits tend to follow cultural trends. The C&C model can include stratification variables; for the analysis of a habit such as smoking, it would have been beneficial to test the significance of strata such as origin or ethnicity.
The independence test for coffee drinking and APOE-Ɛ2 did not reject independence,
and therefore we may rely on the results of the C&C model on the coffee data.
Aim 2: To compare the effects of the multiplicative interaction between the
classical and the C&C logistic regressions using empirical data.
The OR of the interaction effect between APOE-Ɛ2 and smoking in the classical logistic regression was 0.129 (95% CI 0.03-0.59), indicating that the interaction lowers the risk of Parkinson's disease. This result is significant at α = 5% after Bonferroni correction for multiple testing. The OR from the C&C model was 0.648 (95% CI 0.21-2.02). This illustrates the consequences of violating the independence assumption in the C&C model: the interaction effect in the C&C model is not statistically significant, and it differs from the ML estimate of the classical logistic regression, which was statistically significant.
When the independence assumption is violated, the C&C model's estimates are biased. In the empirical data, the bias of the C&C estimates moved all of the effects except the intercept closer to zero. Therefore, despite the lower standard error estimates, the APOE-Ɛ2, smoking, and interaction effects were not statistically significant.
It is interesting to note that when the gene and environment variables are both binary, the goals and benefits of the C&C model and the log-linear model of Umbach and Weinberg are the same: to estimate logistic regression odds ratios while incorporating the independence assumption. When the log-linear approach is used to estimate the ORs of the APOE-Ɛ2 and smoking main effects and their interaction, the results are, in this case, equivalent to those of the C&C model. This was tested for models not including gender. The log-linear estimates for a model assuming independence are (16):
The corresponding C&C model ORs were 1.6232, 0.7870, and 0.6485, respectively. The log-linear method is also biased when the independence assumption is violated. When the variables are binary and the independence assumption is not violated, the two methods would presumably provide very similar estimates as well.
No significant findings emerged from the analysis of the coffee drinking data. A previous study that evaluated the interaction between smoking and APOE-Ɛ2 did not find statistically significant results (69); the same study did find a statistically significant interaction between coffee drinking and APOE-Ɛ2. The lack of findings in the coffee drinking analysis is likely due to missing data: responses were missing because questionnaires were incomplete, and the coffee drinking questions came much later in the questionnaire than the smoking questions, so they were missing more often.
Aim 3: To explore the implications of violated independence assumptions using
simulated data.
In simulation i, in which the data were resampled with replacement from the empirical data, the C&C model estimated the interaction effect with a p-value < 0.05 in 263 simulations, whereas the classical logistic model estimated a statistically significant interaction effect in 795. The estimates were also as drastically different as in the analysis of the empirical data: the median smoking-APOE-Ɛ2 interaction OR was 0.130 in the classical logistic regression and 1.981 in the C&C model. However, the C&C model still estimated a lower standard error for the interaction effect. Mukherjee and Chatterjee addressed this bias in a paper proposing an Empirical Bayes-type shrinkage estimator for the interaction effect that compromises between bias and efficiency (76), showing that the bias can in fact be mild, like the result from the simulations. In their paper, they demonstrate the variability of the bias under different conditions. They show that the case-control estimate usually has less bias than the case-only estimate, especially under departure from the independence assumption. The Empirical Bayes-type shrinkage estimator proposed by Mukherjee and Chatterjee relaxes the independence assumption, making it more robust and a better estimator for data sets with some deviation from the independence assumption, while also increasing the model's power. In this paper, since the classical logistic regression had smaller p-values for most of the parameters, the Empirical Bayes-type shrinkage estimator was less relevant.
The classical logistic regression also demonstrated greater power to estimate the smoking main effect: the median p-value was 4.23×10^-11 for the classical logistic regression and higher, 6.99×10^-11, for the C&C regression. The opposite is true for the APOE-Ɛ2 main effect. For both effects, the estimated standard error was lower in the C&C regression, but the direction of the bias affected the Z statistics and p-values.
Aim 4: To determine if there are interactions between APOE-Ɛ2 and smoking or
coffee drinking and quantify their effect based on the results of this study.
The classical logistic regression showed a significant interaction between APOE-Ɛ2
and smoking, in contrast to previous studies (69). In this study, because the
independence assumption was violated, the interaction effect estimated by the
classical logistic regression is more reliable than the effect estimated by the C&C
model. Because the sample size was small, a standardized bootstrap was performed to
verify the reliability of this result; the 95% confidence interval of the interaction OR
was 0.015-0.372, which excludes 1. This supports an interaction OR of 0.129. It is
important to note that this interaction may be the result of population stratification,
for example by ethnic or cultural factors; this study did not include sufficient
background explanatory variables to rule this out.
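A bootstrap confidence interval for an interaction OR can be sketched as follows. This is a minimal illustration under stated assumptions, not a reproduction of the "standardized bootstrap" used in this study: it assumes binary G and E with no adjustment covariates, computes the interaction OR in closed form from the saturated model, adds 0.5 to every cell (a Haldane correction, assumed here to avoid division by zero in sparse bootstrap samples), resamples cases and controls separately, and uses percentile limits. The data are synthetic, generated from hypothetical coefficients.

```python
import numpy as np

def interaction_or(g, e, d):
    """Interaction OR for binary G, E from the saturated logistic model.

    Computed from the 2x2x2 table of counts n[d, g, e] after adding 0.5 to every
    cell (Haldane correction, an assumption made here to avoid division by zero).
    """
    n = np.zeros((2, 2, 2))
    np.add.at(n, (d, g, e), 1.0)
    n += 0.5
    odds = n[1] / n[0]  # odds of being a case in each (g, e) cell
    return odds[1, 1] * odds[0, 0] / (odds[1, 0] * odds[0, 1])

def stratified_bootstrap_ci(g, e, d, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling cases and controls separately."""
    rng = np.random.default_rng(seed)
    cases = np.flatnonzero(d == 1)
    controls = np.flatnonzero(d == 0)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        # Keep the numbers of cases and controls fixed, as in the case-control design
        idx = np.concatenate([
            rng.choice(cases, size=cases.size, replace=True),
            rng.choice(controls, size=controls.size, replace=True),
        ])
        stats[b] = interaction_or(g[idx], e[idx], d[idx])
    lower, upper = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lower, upper

# Synthetic data from hypothetical coefficients (not the study's estimates)
rng = np.random.default_rng(42)
size = 600
g = rng.binomial(1, 0.2, size)
e = rng.binomial(1, 0.4, size)
logit = -0.5 + 0.1 * g - 0.6 * e - 1.5 * g * e
d = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

psi_hat = interaction_or(g, e, d)
lower, upper = stratified_bootstrap_ci(g, e, d)
```

Resampling within case status keeps the case-control sampling fraction fixed in every replicate; with adjustment covariates, the closed-form OR would have to be replaced by a fitted logistic regression in each replicate.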
Genome-wide association studies are now enabling the discovery of more and more
genetic factors in Parkinson's disease (77). This study considered known genetic and
lifestyle factors and their interactions. The presence of a smoking-APOE-Ɛ2
interaction could greatly benefit the study of Parkinson's disease, its etiology, and its
treatment. Current research suggests that smoking may have neuroprotective effects
and may stimulate the dopamine neurons that are damaged in persons with
Parkinson's disease (78). In this study, it is unclear whether smoking and APOE-Ɛ2
act only through their interaction. It is possible that main effects exist but that the
study lacked the power to detect them, because including the interaction term in the
models reduced the power to estimate the main effects. According to a previous study,
including interactions in models usually reduces the power to estimate main effects
(14). The results of studies of APOE-Ɛ2 and Parkinson's disease have been
inconsistent (41). It is still unclear whether previous studies were inconsistent because
the APOE-Ɛ2 OR is very close to 1, or because APOE-Ɛ2 has no main effect and acts
only in interaction with other exposures or lifestyles. There may also be publication
bias in studies of APOE-Ɛ2 as a main effect in Parkinson's disease. With a larger
sample size, the study might have had sufficient power to estimate main effects as
well, if they exist.
As mentioned earlier, the results of the analysis of coffee drinking were not
statistically significant, likely because of the small sample size. Previous studies have
found an interaction between coffee drinking and APOE-Ɛ2 (69), which warrants
further study.
Sufficient sample size and power continue to be a challenge in case-control studies of
chronic and rare diseases. Many studies neglect gene-environment interaction effects,
which are relevant in many chronic diseases (2). However, introducing a
gene-environment interaction into the analysis requires an even larger sample size
(79). This study demonstrates a possible solution to these problems, as well as the
possible roadblocks to using it.
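Why an interaction needs a larger sample than a main effect can be seen from the delta-method (Woolf-type) variance of a log odds ratio, which is the sum of the reciprocal cell counts: the main-effect OR for G uses the four cells of the table collapsed over E, while the interaction OR uses all eight case/control × G × E cells. The sketch below is a rough illustration assuming idealized expected counts under a null model (G and E independent, equal numbers of cases and controls); the exposure frequencies are hypothetical, not taken from this study.

```python
import numpy as np

def var_log_main_or(n):
    """Woolf variance of the log OR for G, collapsing n[d, g, e] over E."""
    collapsed = n.sum(axis=2)  # 2x2 table indexed [d, g]
    return float(np.sum(1.0 / collapsed))

def var_log_interaction_or(n):
    """Delta-method variance of the log interaction OR: sum of 1/n over all 8 cells."""
    return float(np.sum(1.0 / n))

def expected_counts(n_per_group, p_g, p_e):
    """Expected counts under a null model: G and E independent, same in cases and controls."""
    per_group = n_per_group * np.outer([p_g, 1 - p_g], [p_e, 1 - p_e])
    return np.stack([per_group, per_group])  # indexed [d, g, e]

balanced = expected_counts(400, 0.5, 0.5)
ratio_balanced = var_log_interaction_or(balanced) / var_log_main_or(balanced)  # 4.0

skewed = expected_counts(400, 0.15, 0.3)  # hypothetical G and E frequencies
ratio_skewed = var_log_interaction_or(skewed) / var_log_main_or(skewed)
```

Under these assumptions the variance ratio works out to 1/(p_E(1 - p_E)), which is at least 4 (about 4.76 for p_E = 0.3). Roughly four or more times as many subjects are therefore needed to estimate an interaction OR with the same precision as a main-effect OR of the same magnitude.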
Limitations:
This study was limited in size. To enlarge the total control group, the controls
included two distinct groups: spouses and other patients. The control group was
therefore not completely random and may be biased. There is also some evidence that
smoking increases the risk of rheumatoid arthritis (80), and some portion of the
controls, sampled from the Rheumatism and Arthritis patients at Sourasky Medical
Center, may have had rheumatoid arthritis.
Because of the length of the questionnaires, there were missing responses to the coffee
questions, which were located toward the end. Previous studies have shown that there
may be an interaction between coffee drinking and APOE-Ɛ2, and further study is
needed to verify these results. Because of its limited size, this study investigated only
first-order interactions. In future studies of smoking and APOE-Ɛ2, as well as of other
exposures, it will be necessary to investigate higher-order interactions, such as
gene-gene-environment or gene-environment-life-stage interactions, to fully
understand the etiology.
This study lacked the power to include additional background variables such as
ethnicity and education. These variables were missing from the regression models as
well as from the independence tests, in which they could have been used for
stratification.
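One way a background variable such as ethnicity could enter an independence test is through a stratified G-E odds ratio among controls. The sketch below uses the Mantel-Haenszel pooled odds ratio, a standard estimator; that this particular method would have been used here is an assumption. The two strata and their counts are hypothetical and were chosen so that G and E are independent within each stratum but differ in frequency between strata, which creates a spurious crude association.

```python
import numpy as np

def odds_ratio(t):
    """OR of a 2x2 table [[a, b], [c, d]] with rows G = 1/0 and columns E = 1/0."""
    (a, b), (c, d) = np.asarray(t, dtype=float)
    return a * d / (b * c)

def mantel_haenszel_or(tables):
    """Mantel-Haenszel OR pooled over strata: sum(a*d/n) / sum(b*c/n)."""
    num = den = 0.0
    for t in tables:
        (a, b), (c, d) = np.asarray(t, dtype=float)
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den

# Hypothetical control counts in two strata (e.g. two ethnic groups); within each
# stratum G and E are independent (OR = 1), but G and E frequencies differ by stratum.
strata = [
    [[25, 25], [25, 25]],
    [[1, 9], [9, 81]],
]
crude = odds_ratio(np.sum(strata, axis=0))  # about 2.38: spurious G-E association
pooled = mantel_haenszel_or(strata)         # 1.0
```

Ignoring the strata, the crude table would suggest a G-E dependence and hence an apparent violation of the independence assumption; after stratification the dependence disappears.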
Further studies are needed to verify that these results are reproducible and that they
indicate a true interaction between smoking and APOE-Ɛ2 rather than the result of
stratification. The same is true for the results of the independence tests. Future
studies are needed to better understand the interaction between smoking and
APOE-Ɛ2: to determine whether main effects of smoking and APOE-Ɛ2 are present
alongside the interaction effect, and whether the interaction is first-order or
higher-order, involving other genes or exposures. As demonstrated in this paper, there
is a need for further development of statistical tools for small case-control studies,
especially for studies that include interaction effects. It would be interesting to use the
bootstrap method used in this paper to estimate the standard error of effects from the
C&C regression when the independence assumption is violated, rather than relying on
the model-based estimates. There is also a need to further explore the performance of
the C&C model under different conditions, including higher-order interactions.
Bibliography
1. "Nature Versus Nurture" and incompletely penetrant mutations. Simon, DK, Lin, MT and
Pascual-Leone, A. 2002, Journal of Neurology, Neurosurgery and Psychiatry, pp. 686-689.
2. Design and analysis issues in gene and environment studies. Liu, Chen-Yu, et al. 2012,
Environmental Health, Vol. 11, p. 93.
3. Gene–environment correlations: a review of the evidence and implications for prevention
of mental illness. Jaffee, SR and Price, TS. 2007, Molecular Psychiatry, Vol. 12, pp. 432-442.
4. Case-control study of interactions between genetic and environmental factors in
Parkinson's disease. Palma, Giuseppe De, et al. 9145, 1998, The Lancet, Vol. 352, pp. 1986-
1987.
5. Gene-environment interactions in parkinsonism and Parkinson’s disease: the Geoparkinson
study. Dick, Finlay D, et al. 10, 2007, Occupational and Environmental Medicine, Vol. 64, pp.
673-680.
6. A case-control study of Parkinson's disease and tobacco use: Gene-tobacco interactions.
Palma, Giuseppe De, et al. 7, 2010, Movement Disorders, Vol. 25, pp. 912-919.
7. Gene–environment interactions: Key to unraveling the mystery of Parkinson's disease.
Gao, Hui-Ming and Hong, Jau-Shyong. 1, 2011, Progress in Neurobiology, Vol. 94, pp. 1-19.
8. Gene-Environment Interactions in Depression Research. Monroe, Scott M. and Reid, Mark
W. 10, 2008, Psychological Science, Vol. 19, pp. 947-956.
9. Colorectal Adenomas and the C677T MTHFR Polymorphism: Evidence for Gene-
Environment Interaction? Ulrich, Cornelia M, et al. 8, 1999, Cancer Epidemiology,
Biomarkers and Prevention, Vol. 8, p. 659.
10. A gene–environment interaction between smoking and shared epitope genes in HLA–DR
provides a high risk of seropositive rheumatoid arthritis. Padyukov, Leonid, et al. 10, 2004,
Arthritis and Rheumatism, Vol. 50, pp. 3085-3092.
11. Advances in Environmental Epidemiology. Tanner, Caroline M. 2010, Movement
Disorders, pp. S58-S62.
12. Logistic disease incidence models and case-control studies. Prentice, R L and Pyke, R. 3,
1979, Biometrika, Vol. 66, pp. 403-411.
13. The Design of Case-Control Studies: The influence of Confounding and Interaction Effects.
Smith, P G and Day, N E. 1984, International Journal of Epidemiology, pp. 356-365.
14. Does accounting for gene-environment interaction increase the power to detect the
effect of a gene in a multifactorial disease? Selinger-Leneman, Hana, et al. 2003, Genetic
Epidemiology, pp. 200-207.
15. Non-hierarchical Logistic Models and Case-Only Designs for Assessing susceptibility in
population-based case-control studies. Piegorsch, Walter W, Weinberg, Clarice R and
Taylor, Jack A. 1994, Statistics in Medicine, Vol. 13, pp. 153-162.
16. Designing and analysing case-control studies to exploit independence of genotype and
exposure. Umbach, David M and Weinberg, Clarice R. 15, 1997, Statistics in Medicine, Vol.
16, pp. 1731-1743.
17. Semiparametric maximum likelihood estimation exploiting gene-environment
independence in case-control studies. Chatterjee, Nilanjan and Carroll, Raymond J. 2, 2005,
Biometrika, Vol. 92, pp. 399-418.
18. Epidemiology of Parkinson’s disease. Alves, Guido, et al. 5, 2008, Journal of Neurology,
Vol. 255, pp. 18-32.
19. Brain dopamine and the syndromes of Parkinson and Huntington. Clinical, morphological
and neurochemical correlations. Bernheimer, H, et al. 4, 1973, Journal of the Neurological
Sciences, Vol. 20, pp. 415-455.
20. Dopamine transporters and neuronal injury. Miller, Garry W, et al. 10, 1999, Trends in
Pharmacological Sciences, Vol. 20, pp. 424-429.
21. The Sydney multicentre study of Parkinson's disease: progression and mortality at 10
years. Hely, M A, et al. 1999, Journal of Neurology, Neurosurgery, and Psychiatry, pp. 300-
307.
22. Increased life expectancy resulting from addition of l-deprenyl to Madopar® treatment in
Parkinson's disease: A longterm study. Birkmayer, W, et al. 1985, Journal of Neural
Transmission, pp. 113-127.
23. Economic burden associated with Parkinson's disease on elderly Medicare beneficiaries.
Noyes, Kate, et al. 3, 2006, Movement Disorders, Vol. 21, pp. 362-372.
24. Longitudinal study of the socioeconomic burden of Parkinson’s disease in Germany.
Winter, Y, et al. 9, 2010, European Journal of Neurology, Vol. 17, pp. 1156-1163.
25. Staging of brain pathology related to sporadic Parkinson’s disease. Braak, Heiko, et al. 2,
2003, Neurobiology of Aging, Vol. 24, pp. 197-211.
26. Parkinson's disease: genetics and pathogenesis. Shulman, JM, Jager, PL de and Feany,
MB. 2011, Annual Review of Pathology: Mechanisms of Disease, Vol. 6, pp. 193-222.
27. Use of a Refined Drug Tracer Algorithm to Estimate Prevalence and Incidence of
Parkinson's Disease in a Large Israeli Population. Chillag-Talmor, Orly, et al. 2011, Journal of
Parkinson's Disease, pp. 35-47.
28. LRRK2 G2019S as a Cause of Parkinson's Disease in Ashkenazi Jews. Ozelius, Laurie J, et
al. 2006, The New England Journal of Medicine, pp. 424-425.
29. Dynamics of Parkinsonism–Parkinson’s Disease in Residents of Adjacent Kibbutzim in
Israel’s Negev. Goldsmith, J R, et al. 1997, Environmental Research, pp. 156-161.
30. α-Synuclein Locus Triplication Causes Parkinson's Disease. Singleton, A B, et al. 2003,
Science, Vol. 302, p. 841.
31. Interaction of α-synuclein and tau genotypes in Parkinson's disease. Mamah, Catherine
E, et al. 3, 2005, Annals of Neurology, Vol. 57, pp. 439-443.
32. The LRRK2 Gly2385Arg variant is associated with Parkinson’s. Tan, E K, et al. 2007,
Human Genetics, pp. 857-863.
33. N-acetyltransferase-2 polymorphism in Parkinson’s disease: the Rotterdam study.
Harhangi, Sanjay B, et al. 1999, Journal of Neurology, Neurosurgery, and Psychiatry, pp.
518-520.
34. A study of five candidate genes in Parkinson’s disease and related neurodegenerative
disorders. Nicholl, D J, et al. 1999, Neurology, p. 1415.
35. Genome-wide association study reveals genetic risk underlying Parkinson's disease.
Simon-Sanchez, Javier, et al. 12, 2009, Nature Genetics, Vol. 41, pp. 1308-1314.
36. Gene-Gene Interaction Between FGF20 and MAOB in Parkinson Disease. Gao, X, et al. 2,
2008, Annals of Human Genetics, Vol. 72, pp. 157-162.
37. Genotype-Phenotype correlations between GBA mutations and Parkinson's disease risk
and onset. Gan-Or, Z, et al. 2008, Neurology, pp. 2277-2283.
38. Multicenter Analysis of Glucocerebrosidase Mutations in Parkinson's Disease. Sidransky,
E, et al. 2009, The New England Journal of Medicine, pp. 1651-1661.
39. APOE-ε2 allele associated with higher prevalence of sporadic Parkinson disease. Huang,
Xuemei, Chen, Peter C and Poole, Charles. 12, 2004, Neurology, Vol. 62, pp. 2198-2202.
40. The Apolipoprotein E ϵ4 Allele in Parkinson's Disease with Alzheimer Lesions.
Egensperger, R, et al. 1996, Biochemical and Biophysical Research Communications, pp. 484-
486.
41. Apolipoprotein E genotype as a risk factor for susceptibility to and dementia in
Parkinson’s Disease. Williams-Gray, Caroline, et al. 2009, Journal of Neurology, pp. 493-
498.
42. Prognosis of Parkinson's Disease - Risk of Dementia and Mortality: The Rotterdam Study.
de Lau, Lonneke M L, et al. 2005, Archives of Neurology, pp. 1265-1269.
43. Apolipoprotein E controls the risk and age at onset of Parkinson disease. Li, Y J, et al.
2004, Neurology, pp. 2005-2009.
44. Dietary Fat Clearance in Normal Subjects is Regulated by Genetic Variation in
Apolipoprotein E. Weintraub, Moshe S, Eisenberg, Shlomo and Breslow, Jan L. 1987, Journal
of Clinical Investigation, pp. 1571-1577.
45. The Apolipoprotein E Polymorphism: A Comparison of Frequencies and Effects in Nine
Populations. Hallman, Michael D, et al. 1991, American Journal of Human Genetics, pp. 338-
349.
46. Apolipoprotein-E Genotype and the Risk of Developing Cholelithiasis following Bariatric
Surgery: a Clue to Prevention of Routine Prophylactic Cholecystectomy. Abeid, Subhi Abu, et
al. 2002, Obesity Surgery, pp. 354-357.
47. Dose-dependent protective effect of coffee, tea, and smoking in Parkinson's disease: a
study in ethnic Chinese. Tan, E K, et al. 1, 2003, Journal of the Neurological Sciences, Vol.
216, pp. 163-167.
48. Environmental Risk Factors and Parkinson's Disease: A Metaanalysis. Priyadarshi,
Anumeet, et al. 2, 2001, Environmental Research, Vol. 86, pp. 122-127.
49. A Case-Control Study of Parkinson’s Disease in Urban Population of Southern Israel.
Herishanu, Yuval O, et al. 2001, The Canadian Journal of Neurological Sciences, pp. 144-147.
50. Occupational and Environmental risk factors for Parkinson's Disease. Lai, B C L, et al.
2002, Parkinsonism and Related Disorders, pp. 297-309.
51. Searching for a relationship between manganese and welding and Parkinson's disease.
Jankovic, J. 2005, Neurology, pp. 2021-2028.
52. Smoking, alcohol, and coffee consumption preceding Parkinson's disease. Benedetti, M
D, et al. 2000, Neurology, Vol. 55, pp. 1350-1358.
53. Association of Coffee and Caffeine Intake With the Risk of Parkinson Disease. Ross,
Webster G, et al. 20, 2000, The Journal of the American Medical Association, Vol. 283, pp.
2674-2679.
54. Coffee Consumption, Gender, and Parkinson’s Disease Mortality in the Cancer Prevention
Study II Cohort: The Modifying Effects of Estrogen. Ascherio, Alberto, et al. 10, 2004,
American Journal of Epidemiology, Vol. 160, pp. 977-984.
55. Coffee and tea consumption and the risk of Parkinson's disease. Hu, Gang, et al. 15,
2007, Movement Disorders, Vol. 22, pp. 2242-2248.
56. Prospective study of coffee consumption and risk of Parkinson's disease. Saaksjarvi, K,
et al. 7, 2008, European Journal of Clinical Nutrition, Vol. 62, pp. 908-915.
57. Differential Effects of Black versus Green Tea on Risk of Parkinson's Disease in the
Singapore Chinese Health Study. Tan, Louis C, et al. 5, 2008, American Journal of
Epidemiology, Vol. 167, pp. 553-560.
58. Prospective Study of Cigarette Smoking and the Risk of Developing Idiopathic Parkinson's
Disease. Grandinetti, Andrew, et al. 12, 1994, American Journal of Epidemiology, Vol. 139,
pp. 1129-1138.
59. Exploring an interaction of adenosine A2A receptor variability with coffee and tea intake
in Parkinson's disease. Tan, E K, et al. 2006, American Journal of Medical Genetics Part B,
pp. 634-636.
60. Neuroprotection by Caffeine and A2A. Chen, Jiang-Fan, et al. 2001, Journal of
Neuroscience, Vol. 21, pp. 1-6.
61. Smoking and Parkinson's disease. Godwin-Austen, R B, et al. 1982, Journal of Neurology,
Neurosurgery and Psychiatry, Vol. 45, pp. 577-581.
62. Smoking and Parkinson's disease. Gorell, Jay M, et al. 1, 1999, Neurology, Vol. 52, p. 115.
63. Parkinson's Disease Risks Associated with Cigarette Smoking, Alcohol Consumption, and
Caffeine Intake. Checkoway, Harvey, et al. 8, 2002, American Journal of Epidemiology, Vol.
155, pp. 732-738.
64. Risk and protective factors for Parkinson's disease: A study in Swedish twins. Wirdefeldt,
Karin, et al. 1, 2005, Annals of Neurology, Vol. 57, pp. 27-33.
65. Pooled Analysis of Tobacco Use and Risk of Parkinson Disease. Ritz, Beate, et al. 7, 2007,
Archives of Neurology, Vol. 64, pp. 990-997.
66. Smoking, nicotine and Parkinson's disease. Quik, Maryka. 9, 2004, Trends in
Neurosciences, Vol. 27, pp. 561-568.
67. A Meta-analysis of Coffee Drinking,. Hernan, Miguel A, et al. 3, 2002, Annals of
Neurology, Vol. 52, pp. 276-284.
68. Update in the epidemiology of Parkinson's disease. Elbaz, Alexis and Moisan, Frederic.
2008, Current Opinion in Neurology, Vol. 21, pp. 454-460.
69. Exploring gene-environment interactions in Parkinson’s disease. McCulloch, Collin C, et
al. 2008, Human Genetics, pp. 257-265.
70. Case-only study of interactions between genetic polymorphisms of GSTM1, P1, T1 and Z1
and smoking in Parkinson's disease. Deng, Y, et al. 2004, Neuroscience Letters, pp. 326-331.
71. APOE-ε2 allele associated with higher prevalence of sporadic Parkinson disease. Huang,
Xuemei, Chen, Peter C. and Poole, Charles. 2004, Neurology, pp. 2198-2202.
72. An R package for analysis of case-control studies in genetic epidemiology.
Bhattacharjee, Samsiddhi, Chatterjee, Nilanjan and Wheeler, William. [Online] 2010.
http://127.0.0.1:16855/library/CGEN/html/CGEN.html.
73. The Design of Case-Control Studies: The Influence of Confounding and Interaction Effects.
Smith, P G and Day, N E. 3, 1984, International Journal of Epidemiology, Vol. 13, pp. 356-
365.
74. An Epidemiologic Approach to Gene-Environment Interaction. Ottman, Ruth. 1990,
Genetic Epidemiology, Vol. 7, pp. 177-185.
75. Genome-wide meta-analyses identify multiple loci associated with smoking behavior.
The Tobacco and Genetics Consortium. 2010, Nature Genetics, pp. 441-447.
76. Exploiting Gene-Environment Independence for Analysis of Case–Control Studies: An
Empirical Bayes-Type Shrinkage Estimator to Trade-Off between Bias and Efficiency.
Mukherjee, Bhramar and Chatterjee, Nilanjan. 2008, Biometrics, pp. 685-694.
77. Genome-wide association study reveals genetic risk underlying Parkinson's disease.
Simon-Sanchez, Javier, et al. 2009, Nature Genetics, Vol. 41, pp. 1308-1312.
78. Smoking, nicotine and Parkinson's disease. Quik, Maryka. 9, 2004, Trends in
Neurosciences, Vol. 27, pp. 561-568.
79. Minimum Sample Size Estimation to Detect Gene-Environment Interaction in Case-Control
Designs. Hwang, Shih-Jen, et al. 11, 1994, American Journal of Epidemiology, Vol. 140, pp.
1029-1037.
80. Cigarette smoking increases the risk of rheumatoid arthritis: Results from a nationwide
study of disease-discordant twins. Silman, Alan J, Newman, Jason and Macgregor,
Alexander J. 1996, Arthritis and Rheumatism, pp. 732-735.
Abstract
Studies in recent years clearly show the effect of gene-environment interactions on
many chronic diseases. Including interactions in study design and analysis requires
considerable statistical power. The aim of the present work is to investigate the
performance of logistic regression under the assumption of gene-environment
independence in a case-control study of Parkinson's disease in Israel. We used this
regression to test for interactions between cigarette smoking and APOE-Ɛ2, and
between coffee drinking and APOE-Ɛ2, in Parkinson's disease. The regression was
compared with classical logistic regression. The odds ratio for the smoking-APOE-Ɛ2
interaction was 1.129 (95% CI 1.13-1.59). The result of the analysis for coffee drinking
and APOE-Ɛ2 was not statistically significant (OR=1.648, 95% CI 1.21-2.12). Owing to
a possible dependence between smoking and APOE-Ɛ2, the comparison of the two
logistic regressions showed the advantage of classical logistic regression in this case,
despite the inability to detect main effects. In light of current challenges in research on
chronic diseases, further development of statistical tools for detecting interactions in
case-control studies is needed.