
Tel Aviv University

Raymond and Beverly Sackler

Faculty of Exact Sciences

Case control study of interactions in Parkinson's disease using logistic regression exploiting gene-environment independence

This work is submitted in partial fulfillment of the requirements for the degree of Master of Science (M.Sc.) in Biostatistics at Tel Aviv University

The Department of Statistics and Operations Research

Written By: Lital Bridavsky

Supervised by: Dr. Chava Peretz

Prof. Saharon Rosset

September 2013


Table of Contents

1. Introduction

Gene-Environment Etiology in Neurological Diseases

Statistical Challenges in Exploring Gene-Environment Interactions

Parkinson's disease

2. Aim

Specific Aims

3. Methods

Study Design

Study Population

Study Variables

Statistical Analysis

Ethics

4. Results

Summary Statistics

Independence Tests

Classical Logistic Regression Using the Empirical Data

C&C Logistic Regression Using the Empirical Data

Simulations

5. Discussion

Bibliography

Table of Figures

Table 1: Distribution of study variables among cases and controls

Table 2: Contingency table of APOE-Ɛ2 and disease status

Table 3: Contingency tables of APOE-Ɛ2 and Smoking status cases

Table 4: APOE-Ɛ2, Coffee and Smoking Profile Frequencies

Table 5a: Coffee and Smoking Missing-ness Among Cases and Controls

Table 5b: Agreement to Genetic testing Among Cases and Controls

Figure 1a: Smokers, APOE-Ɛ2 carriers, and country of origin

Figure 1b: Coffee drinkers, APOE-Ɛ2 carriers, and country of origin

Table 6a: Ever Smoker and APOE-Ɛ2 carrier contingency table for controls

Table 6b: Coffee Consumption and APOE-Ɛ2 carrier contingency table for controls

Table 7a: Classical Logistic Regression using the empirical study data - smoking & APOE-Ɛ2

Table 7b: Classical Logistic Regression using the empirical study data - Coffee Drinking and APOE-Ɛ2

Table 8a: C&C Logistic Regression using the empirical study data - smoking & APOE-Ɛ2

Table 8b: C&C Logistic Regression using the empirical study data - coffee drinking

Table 9a: Example Classical Logistic Regression of simulated data set i

Table 9b: Example C&C Regression of simulated data set i

Table 10: Summary statistics of simulated effects - simulation i

Table 11a: Example Classical Logistic Regression of simulated data set ii

Table 11b: Example C&C Regression of simulated data set ii

Table 12: Summary statistics of simulated effects - simulation ii

Abstract:

It has become evident in recent years that gene-environment interactions play a

significant role in many chronic diseases. Incorporating such analysis into

studies requires greater statistical power. This paper sought to investigate the

performance of logistic regression exploiting gene-environment independence in

an Israeli case-control study of Parkinson's disease. This regression was used to

consider a cigarette smoking-APOE-Ɛ2 interaction and a coffee drinking-APOE-Ɛ2

interaction in Parkinson's disease. The above regression was compared to a

classical logistic regression. For the smoking-APOE-Ɛ2 interaction the odds ratio

was 0.129 (95% CI 0.03-0.59). The coffee-APOE-Ɛ2 interaction was not

significant (OR = 0.648; 95% CI 0.21-2.02). Due to a possible dependence

between smoking and APOE-Ɛ2, the comparison of the two logistic regressions

showed that in this case the classical logistic regression was favorable, although

it still lacked power to estimate the main effects. In light of current challenges in

the study of chronic diseases, further development of statistical tools for

estimating interactions in case-control studies is needed.


1. Introduction

Gene-Environment Etiology in Neurological Diseases

The etiology of many chronic diseases is unknown. Previously, the roles of "nature

versus nurture" in studies of disease risk were polarized (1). Although some diseases

are predominantly genetic or environmental, we have become aware that many

diseases, especially neurological diseases, have complex mechanisms which involve

interactions between genetics and the environment. Recent advances in human

genomics and biotechnology make it possible to study large numbers of genetic

markers and explore their interactions with environment as potential risk factors. It is

well known that environmental exposures may vary over time, but it's not frequently

considered that gene expression also varies over time. Epigenetic mechanisms of

changes in gene function include alterations in DNA methylation, histone

modification, and microRNA. The effects of toxic exposures have been found to be

mediated by epigenetic mechanisms (2). Studies have already shown gene-environment

interactions in neurological conditions such as Alzheimer's disease (3),

Parkinson's disease (4) (5) (6) (7), depression (3) (8) and other conditions (9) (10).

In the case of Parkinson's disease, only a small proportion of cases can be attributed to

a single genetic mutation or environmental factor. A complex interaction of genes and

environment likely underlies the majority of cases. Twin studies have shown low

concordance of Parkinson's disease, with the exception of young onset PD, indicating

the importance of non-genetic factors (11).

Statistical Challenges in Exploring Gene-Environment Interactions

Cohort, case-only and case-control study designs have been used to study gene-environment

interactions. Because genotypes do not vary over time, and genotypes

precede phenotypes, case-control studies have been found to be more efficient

than cohort studies in determining genetic main effects (2).

Cohort studies also incorporate the effects of time. In such studies the gene-

environment interaction is incorporated into a 3 way interaction in which time or life

stage is the third factor. If the gene-environment interaction varies over time, or is

prominent at a specific time and dormant in others, then plotting the coefficients as

functions of time can reveal such patterns. However, due to the high cost of cohort

studies, case-control studies are more common and practical.

The most popular choice for case control analysis is logistic regression. Prentice and

Pyke showed that one can estimate all the regression coefficients except for the

intercept from retrospective studies using ordinary logistic regression as if the data

were obtained in a prospective study in rare diseases (12).
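As a rough illustration of this result, the sketch below computes the expected case-control cell counts for a single binary exposure and then applies the closed-form logistic fit that exists in this saturated case. The parameter values (`alpha`, `beta`, `p_exposed`) and the sample sizes are hypothetical and are not taken from the study. Under these assumptions the fitted slope reproduces the population log odds ratio, while the fitted intercept does not reproduce the population intercept.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical population parameters (assumptions for illustration only).
alpha, beta = -4.0, 0.7      # true intercept and log odds ratio
p_exposed = 0.3              # P(X = 1) in the population

# Joint probabilities P(D = d, X = x) in the population.
joint = {}
for x, px in ((0, 1 - p_exposed), (1, p_exposed)):
    risk = expit(alpha + beta * x)
    joint[(1, x)] = px * risk
    joint[(0, x)] = px * (1 - risk)

def expected_counts(n_cases, n_controls):
    """Expected 2x2 cell counts when sampling conditional on disease status."""
    counts = {}
    for d, n in ((1, n_cases), (0, n_controls)):
        total = joint[(d, 0)] + joint[(d, 1)]
        for x in (0, 1):
            counts[(d, x)] = n * joint[(d, x)] / total
    return counts

def fitted_coefficients(counts):
    """Closed-form logistic MLE for one binary covariate (saturated model)."""
    intercept = math.log(counts[(1, 0)] / counts[(0, 0)])
    slope = math.log(counts[(1, 1)] / counts[(0, 1)]) - intercept
    return intercept, slope

a_hat, b_hat = fitted_coefficients(expected_counts(500, 500))
print("slope:", b_hat, "true log OR:", beta)
print("intercept:", a_hat, "true intercept:", alpha)
```

The intercept absorbs the sampling fractions of cases and controls, which is why only the slope carries over from the retrospective design.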

When the aim of a study is to detect interaction effects as well, the sample size needs

to be four times larger than a study of only main effects of the same magnitude (13).

One study explored the implications of incorporating a gene-environment interaction

into a classical logistic regression model on the power to estimate the genetic main

effect, and found that in some scenarios incorporations of the interaction increased the

power and in others it decreased the power (14). This becomes even more challenging

when investigating interactions of increasing degrees, or multiple gene interactions.


Genome wide association studies have studied thousands of genetic markers to

identify the genetic variables associated with diseases.

Piegorsch et al. suggest a method of estimating an interaction parameter among cases

only. Assuming independence between gene and exposure in the general population,

and that the disease is rare, they estimate the interaction effect among cases alone by

treating the exposure as the outcome and estimating the main effects for the genetic

categories in a logistic regression (15). Umbach and Weinberg demonstrate estimation

of logistic regression odds ratios through log-linear methods. They also use the log

linear method to incorporate gene-environment independence. This enables a slight

increase in precision for estimation of gene and environment main effects and a

greater increase in precision to estimate the interaction effect. Additionally, the

independence assumption allows the ability to collect genetic information from cases

only to estimate the interaction effect (16).
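The case-only idea can be checked numerically. The following sketch uses exact population probabilities for a binary gene and a binary exposure; all parameter values are hypothetical. It suggests that the gene-exposure odds ratio among cases is close to the interaction odds ratio when the disease is rare and gene and exposure are independent, and that it moves away from it when the independence assumption fails.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def case_only_or(alpha, b_g, b_e, b_ge, p_g, p_e_given_g):
    """Gene-exposure odds ratio among cases, from exact population probabilities.

    p_e_given_g[g] is P(E = 1 | G = g); equal entries mean G-E independence.
    """
    cases = {}
    for g in (0, 1):
        pg = p_g if g == 1 else 1 - p_g
        for e in (0, 1):
            pe = p_e_given_g[g] if e == 1 else 1 - p_e_given_g[g]
            risk = expit(alpha + b_g * g + b_e * e + b_ge * g * e)
            cases[(g, e)] = pg * pe * risk   # proportional to P(G, E | D = 1)
    return cases[(1, 1)] * cases[(0, 0)] / (cases[(1, 0)] * cases[(0, 1)])

# Hypothetical rare-disease scenario with a true interaction OR of exp(-1.0).
params = dict(alpha=-6.0, b_g=0.2, b_e=-0.4, b_ge=-1.0, p_g=0.15)

or_independent = case_only_or(p_e_given_g=(0.5, 0.5), **params)
or_dependent = case_only_or(p_e_given_g=(0.5, 0.35), **params)

print("true interaction OR:", math.exp(-1.0))
print("case-only OR, G and E independent:", or_independent)
print("case-only OR, G and E dependent:", or_dependent)
```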

Nilanjan Chatterjee and Raymond J. Carroll recently developed a logistic regression

with semiparametric maximum likelihood estimation exploiting gene-environment

independence to increase the statistical power of the model without increasing the

sample size (17). This model has several advantages over previous models that

exploit gene-environment independence. It does not require a rare

disease assumption. It also allows a continuous, nonparametrically distributed

exposure and makes it easy to incorporate additional background variables. This logistic

regression model is referred to below as C&C.

Parkinson's disease:

Epidemiology

Parkinson's disease is a progressive neurological disease. It's a rare disease, but the

second most common neurodegenerative disease (18) (7). There is no known cure.

Tremor at rest is the most common sign of the motor onset of the disease. The motor

symptoms are part of the earlier symptoms of PD. For this reason Parkinson's disease

is most often diagnosed in movement disorder clinics. Other motor symptoms of PD

include bradykinesia, muscle rigidity and gait disturbances. In the later stages of the

disease patients suffer from cognitive impairment, dementia, sleep disturbances, and

autonomic disturbances (18). Many of the earlier symptoms are caused by a lowered

production of dopamine in the brain (19) (20) (7).

The average age of onset of PD is around 60 years, although diagnosis of the disease

is difficult and often made many months after onset. The estimated prevalence

is about 100-200 per 100,000 people, and the age-standardized incidence is 8.6-19 per

100,000 (18). The standardized mortality ratio for Parkinson's disease is 1.58 (95% CI

1.21-4.4) (21). Between 5% and 10% of PD patients have early-onset PD (occurring

before 50 years of age) (7). The median survival time of PD patients from diagnosis is

10.3 years (18). With the current treatments available, that can be increased to 12

years (22).

Life expectancy of PD patients is decreased due to the disease despite current

treatments available. Most patients survive long enough to require significant medical

and day-to-day assistance due to the progression of the disease. Due to its length and

severity, the disease incurs a high socioeconomic cost (23) (24).

Little is known about its etiology. It is currently known that

intracellular Lewy bodies in brain cells are part of the neuropathological

character of the disease (25) (26).

Epidemiology in the Jewish Population: The latest study of Parkinson's disease in

Israel indicates that the prevalence of Parkinson's disease in Israel is 256 per 100,000,

higher than in most other populations (27). People of Ashkenazi Jewish descent have a

higher prevalence of Parkinson's disease than the general population, attributed to

genetic mutations (28) and to an association with GBA mutation carrier status, which is common

among Ashkenazi Jews. In 1989 a cluster of Parkinson's disease patients

emerged in three adjacent kibbutzim in the Negev in southern Israel. The

reported incidence of parkinsonism in this cluster area is 471 per 100,000 people,

more than twice the incidence worldwide (29). The most likely suspected

cause of this cluster is the well that supplies water to all three kibbutzim.

Genetic, Lifestyle and Environmental risk factors

Genetics: In general, family members of PD patients are 3-4 times more likely to

develop PD. Recent studies have found a number of genes possibly correlated with the

incidence of PD, among them PARK4 (dominant LOD 2.64) (30), SNCA (recessive

OR 1.36; 95% CI 1.02-1.82), MAPT (recessive OR, 1.41; 95% CI, 1.03–1.93) (31)

and LRRK2 Gly2385Arg variant (recessive OR 2.67; 95% CI 1.43-4.99) (32). NAT2

was suspected as a risk factor for PD, but several studies have since found that it has no significant

effect on the risk of Parkinson's disease (33) (34). A genome-wide

association study of a Japanese population sample also found PARK16 to be

associated with reduced risk for PD (dominant OR 0.66) (35). Gao et al. found, using

logistic regression, a gene-gene interaction between FGF20 and MAOB

polymorphisms (OR 0.52; 95% CI 0.3-0.9) (36). Mamah et al. found that the MAPT

and SNCA polymorphisms had the same quantitative effect on risk for PD together as

they did alone (31).

A case-control study in America found that Ashkenazi Jews had a much higher

frequency of the LRRK2 G2019S mutation, with an OR of 17.6 (95% CI 5.9-52.2,

dominant). The study also found that the LRRK2 G2019S mutation is very common in

Ashkenazi Jews in North America, 15-20 times more common than in European

studies of the mutation (28). A later cohort study in Israel found that the LRRK2

G2019S mutation and many GBA mutations associated with PD risk are very

common among Israeli Jews. GBA mutation carriers are common among Ashkenazi

Jews. One third of the cohort study population carried either a LRRK2 or a GBA mutation. Moderate to

severe GBA mutations were associated with increased risk for PD (OR 5.6; 95% CI

3.4-18.9). LRRK2 mutation carriers had a lower age of disease onset,

by 3-9.7 years, than non-carrier groups (37). An international, multicenter

case-control study found a Mantel-Haenszel OR of 5.43 (dominant) for all GBA

mutations (38).

Genetics-APOE: APOE is a gene studied in association with Alzheimer's disease and

later Parkinson's disease as well. APOE has three allele types - Ɛ2, Ɛ3 and Ɛ4. Ɛ3 is


the most prevalent among all populations. APOE-Ɛ4 has been associated with

increased risk for Alzheimer's disease. Since Alzheimer's disease and Parkinson's

disease are similar neurological progressive diseases this led to the study of APOE in

Parkinson's disease as well. Several studies found a modest association between

APOE-Ɛ2 and increased risk for Parkinson's disease. The evidence of an association

between APOE-Ɛ4 and Parkinson's disease is weak. A meta-analysis has shown that

APOE-Ɛ2 is associated with a modestly increased risk for Parkinson's disease (OR 1.2;

95% CI 1.02-1.42) (39) and found no association between Parkinson's disease and

APOE-Ɛ4, as did some other studies (40). Another case-control study also found

APOE-Ɛ2 to be associated with a modest increase in risk for Parkinson's disease

(dominant OR = 1.16; 95% CI 1.03-1.31) (41). Williams-Gray et al. also conducted a meta-analysis

of other case-control studies of APOE-Ɛ2 and Parkinson's disease, which showed that

the majority of studies did not show a significant effect. They also

stated that sufficient power to detect such a modest effect would require

a single study with upwards of 5000 cases and controls. In a population-based cohort

study they found that APOE-Ɛ4 and especially APOE-Ɛ2 were associated with

increased risk of dementia among Parkinson's patients (dominant, OR = 6.27; 95% CI

3.07-12.82, and dominant, OR = 13.46; 95% CI 4.46-40.64, respectively) (42). One family-matched

case-control study found that APOE-Ɛ4 carriers had an increased risk for PD

(recessive, OR 1.8; 95% CI 1.01-3.11) and earlier ages of PD onset (43).

Worldwide frequencies of APOE-Ɛ2 vary between 10% and 20% (44) (45). A previous case-control

study of an Israeli population that performed genetic testing for APOE

found that 5% of controls had APOE-Ɛ2 and 53% (7 out of 13) of cases had APOE-Ɛ2

(46).

Environmental Exposure: Several case-control (47) and meta-analysis (48) studies

have found strong evidence that pesticides increase the risk of PD (OR 1.94; 95% CI

1.49-2.53). A case control study in Israel also found increased risk associated with

pesticide exposure (OR 6.81; 95% CI 0.75-64.89) and construction work (OR 2.32;

95% CI 0.84-6.44) (49). Many studies have explored the relationship between

occupational exposures to metals such as copper, iron, lead, mercury, manganese,

aluminum, zinc, cadmium, nickel and arsenic with varying and contradicting results

(50). One study found increased association with PD for chronic exposure over 20

years to copper (OR 2.49; 95% CI 1.06-5.89), manganese (OR 10.61; 95% CI 1.06-

105.83), lead-copper (OR 5.24; 95% CI 1.59-17.24), lead-iron (OR 2.83; 95% CI

1.07-7.50), and iron-copper (OR 3.69; 95% CI 1.40-9.71). However, the evidence for

these exposures is still inconclusive as many studies did not find a correlation (51)

(50).

Lifestyle: Many studies, both case-control (47) (52) and prospective (53) (54) (55)

(56) (57), have shown a strong connection between caffeine consumption and a lower

risk for PD (by studying coffee and tea consumption). Ascherio et al. (2004) found

that the effect of caffeine on PD may be prevented by the use of estrogen therapy in

women. One prospective study (58) concluded that the association found between

coffee and PD was the result of the correlation between coffee and smoking. Coffee

and smoking can be cultural habits and are therefore often correlated. Most studies

show a dose-response relationship between caffeine and PD;

therefore, the evidence that caffeine is protective is strong. One case-control study

attempted to find an interaction between caffeine consumption and A(2)A genotypes

and found no significant interaction (59). Caffeine is a central nervous system


stimulant thought to work by inhibiting the adenosine receptor. Several studies in

mice and primates have shown that adenosine receptor antagonists can improve motor

deficits. If so, then caffeine may reduce the symptoms of PD rather than have a direct

effect on the pathogenesis of PD (53). Another study in mice found that caffeine may

reduce MPTP toxicity by blocking the A(2)A (adenosine) receptor (60).

A few case-control studies (61) (62) (52) (63) (64), a meta-analysis (65) and

prospective studies (58) have shown strong evidence that smoking lowers the risk for

PD. Several studies included alcohol in the analysis to evaluate whether it has an effect and

to rule out confounding (61) (52). A suggested biological mechanism of protection by

smoking is that nigrostriatal dopaminergic neurons are protected from degeneration

by some element in cigarette smoke. A study using rodents found that exposure to

smoking or nicotine protects these neurons (66).

In a meta-analysis of smoking and coffee effects on developing PD (including many

of the studies cited above) the pooled RR was calculated to be 0.59 (95% CI 0.54-

0.63) for "ever" smokers. For "ever" coffee drinkers the pooled RR was 0.69 (95% CI

0.59-0.80) (67).

Anecdotal evidence suggests there may be other risk factors, including dairy

consumption, estrogen and diabetes. Other suspected protective factors include

uric acid, hypertension and NSAIDs (54) (68).

Gene-Environment-Lifestyle Interactions

One case-control study explored gene-lifestyle interactions between the genes

MAPT, SNCA, APOE and UCHL1 and the lifestyle factors of coffee drinking and cigarette

smoking using logistic regression. The study showed a modest,

borderline significant main effect for APOE-Ɛ2, indicating increased risk for

Parkinson's disease (OR 1.253; 95% CI 0.91-1.73). Coffee and smoking main effects

were significantly associated with a reduced risk for Parkinson's disease (OR 0.658;

95% CI 0.49-0.87 and OR 0.689; 95% CI 0.50-0.93, respectively). The study found

an interaction between APOE-Ɛ2 and coffee drinking associated with

reduced risk (dominant, OR 0.339; 95% CI 0.20-0.58). The same study also found an

interaction between SNCA 261 and smoking also associated with reduced risk (OR

0.268; 95% CI 0.12-0.57), but did not find an interaction between smoking and

APOE-Ɛ2 (69). A case-control study using multiple logistic regression found a gene-environment

interaction between a GSTM1 polymorphism and exposure to solvents that was

borderline significantly associated with increased risk for Parkinson's disease (OR

1.76; 95% CI 0.91-3.41) (5). A case-only study examined interactions

between GSTM1, GSTP1, GSTZ1 and smoking in Parkinson's disease. The study

used logistic regression and found a significant interaction between GSTP1-C and

smoking associated with increased risk for Parkinson's disease (OR 2.00; 95% CI

1.11-3.60) (70). Another case-control study using logistic regression found

gene-environment interactions between NAT2 and smoking (OR 0.61; 95% CI

0.43-0.87), GSTM1 and smoking (OR 0.67; 95% CI 0.47-0.95) and GSTP1 and

smoking (OR 0.27; 95% CI 0.08-0.93). All the interactions in this study showed

reduced risk for Parkinson's disease in the presence of the respective gene and

smoking status (6).

2. Aim

The aim of this study was to investigate possible gene-lifestyle, first degree

interactions of APOE-Ɛ2 with smoking and coffee drinking in the risk of Parkinson's

disease in a case-control study.

Specific Aims

1. To test for independence between APOE-Ɛ2 and smoking or coffee drinking using empirical data.

2. To compare the effects of the multiplicative interaction between the classical and the C&C logistic regressions using empirical data.

3. To explore the implications of violated independence assumptions using simulated data.

4. To determine if there are interactions between APOE-Ɛ2 and smoking or coffee drinking and quantify their effect based on the results of this study.

3. Methods

Study Design

This is a case-control study with three groups: a case group of participants

diagnosed with Parkinson's disease, and two control groups, consisting of the spouses of the

Parkinson's disease patients and of patients with arthritis and rheumatism undergoing

treatment at Sourasky Medical Center in Tel Aviv. Two trained interviewers collected

the data in face-to-face interviews.

For the purposes of this study, the two control groups are treated as one.

Study Population

The cases were sequentially recruited from the PD patients of Sourasky Medical

Center. Controls were recruited from the spouses of the PD patients who had been

married for at least 15 years and from arthritis and rheumatism patients at Sourasky

Medical Center. Spouses and arthritis and rheumatism patients were chosen as the

controls because they were residents of the same suburbs, they were of similar age,

and their socio-demographic characteristics were expected to match those of the PD

patients.

During the interviewer-administered questionnaire the participants were asked the

following questions about current and past consumption of instant and regular coffee:

How often did you drink?

1-7 days a week, once a month, twice a month

What size serving did you drink?

A little, A moderate amount, A lot

During the interviewer-administered questionnaire the participants were asked the

following questions about their smoking habits:

1. Do you smoke? If yes go to 3.

2. Have you smoked in the past? If no go to 4.

3. How many cigarettes per day?

4. Have there been times when you've smoked more, less, or quit? If no go to 6.

5. Smoking history:

a. from age ____ to age ____ smoke ____ cigarettes per day

b. from age ____ to age ____ smoke ____ cigarettes per day

c. from age ____ to age ____ smoke ____ cigarettes per day

6. At what age did you start smoking?

7. What age did you stop smoking?

8. How many years have you smoked in total?

Some participants refused to provide a sample for testing immediately after the

interview but subsequently changed their minds during the study after

receiving further explanation. The refusal rate was about 20%; among those,

1.5% were spouses of cases who were interviewed and then died during the study.

Two blood samples were disqualified. One was that of a spouse who had essential

tremor, and the second was that of a spouse that underwent a bone marrow transplant

prior to the study. Of all the participants who filled out the questionnaire 372 agreed

to genetic testing. Out of those, 370 were used.

Blood or saliva samples were collected for the APOE-Ɛ polymorphisms assessment in

the Medical Center's MDS Clinic. Analysis of the APOE-Ɛ samples was done in the

Molecular Biology Research Center of the Tel-Aviv Sourasky Medical Center by

Professor Avi Or.

Some questionnaire data were missing for some of the participants. Some participants

did not answer the questions about smoking, which were on page 2 of the

questionnaire. All those participants, and others, did not answer the questions about

coffee consumption, which were on page 15. Completion of the questionnaires

was limited by the availability of the participants to complete the

questionnaires with the interviewer.


Study Variables:

Main effects

APOE-Ɛ2

APOE has three known isoforms (Ɛ2, Ɛ3, Ɛ4). APOE-Ɛ2 was assumed to be

dominant. Its value was 1 for genotypes Ɛ2/Ɛ2, Ɛ2/Ɛ3 and Ɛ2/Ɛ4, and 0 otherwise.

This coding was also used in the meta-analysis (71) as well as in a case-control

study exploring gene-lifestyle interactions (69). Due to limitations in the sample

population of this study it was not practical to create categories for all the APOE-Ɛ2

genotypes, as there was only one participant with Ɛ2/Ɛ2.

Coffee

For the purposes of this paper, this lifestyle variable was converted into a binary

variable, where the categories were either coffee non-drinker or ever coffee drinker.

Due to complications with the collected data, and for simplicity, this analysis did not

use a multi categorical variable for different levels of coffee drinking.

Smoking

For the purposes of this paper, these data were coded as an ever-smoker variable,

equal to 1 for someone who ever smoked tobacco and 0 for someone who never smoked

(6).

Gender

Gender is a binary variable.

Multiplicative Interactions

Smoking X APOE-Ɛ2

There are two categories in the smoking-APOE-Ɛ2 interaction variable. The

interaction between APOE-Ɛ2 and smoking equals 1 for participants who ever

smoked and are APOE-Ɛ2 carriers, and 0 for all other participants.

Coffee Drinking X APOE-Ɛ2

There are two categories in the coffee-APOE-Ɛ2 interaction variable. The interaction

between coffee and APOE-Ɛ2 will be 1 for participants who ever drank coffee and are

APOE-Ɛ2 carriers, and 0 for all other participants.
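The coding of the main effects and the two multiplicative interaction variables described above can be sketched as follows. The function name, the dictionary keys and the genotype string format (for example "e2/e3") are hypothetical choices made for this illustration; the study's own analysis was done in R and its data layout is not shown here.

```python
def code_participant(genotype, ever_smoked, ever_drank_coffee):
    """Return the binary study variables for one participant.

    genotype is a string such as "e2/e3" (a hypothetical encoding).
    """
    alleles = genotype.lower().split("/")
    e2_carrier = int("e2" in alleles)      # dominant coding: any e2 allele counts
    smoker = int(bool(ever_smoked))
    coffee = int(bool(ever_drank_coffee))
    return {
        "apoe_e2": e2_carrier,
        "smoking": smoker,
        "coffee": coffee,
        # Each interaction is the product of the two binary main effects.
        "smoking_x_e2": smoker * e2_carrier,
        "coffee_x_e2": coffee * e2_carrier,
    }

example = code_participant("e2/e3", ever_smoked=True, ever_drank_coffee=False)
print(example)
```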

Statistical Analysis

The statistical analysis was done using R 2.13.0 software.

Aim 1: To test for independence between APOE-Ɛ2 and smoking or coffee

drinking using empirical data.


Independence tests

Testing for independence between two disease risk variables in a case-control study

can be problematic. In this study, for example, if there is an association between

APOE-Ɛ2 and Parkinson's disease and between smoking and Parkinson's disease, then

oversampling cases produces an unnatural overrepresentation of certain profiles. This can

make the result of an independence test misleading, regardless of the dependence that actually exists in

the general population. In the case of rare diseases, the general population is very similar

to the control sample. For this reason, the independence test is performed on the

controls only.

Tests of independence are done using Fisher's exact test with a two-sided alternative

at α = 5%.
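A two-sided Fisher's exact test on a 2x2 table of controls can be sketched in a few lines. The implementation below sums the hypergeometric probabilities of all tables that are no more likely than the observed one, which is one common definition of the two-sided p-value. The counts in the usage example are hypothetical and are not the study's data.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance so ties with the observed table are included.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-7))

# Hypothetical control-only table: rows = ever smoker (yes, no),
# columns = APOE-e2 carrier (yes, no).
p_value = fisher_exact_two_sided(12, 88, 9, 91)
print("p-value:", p_value, "reject independence at 5%:", p_value < 0.05)
```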

Aim 2: To compare the effects of the multiplicative interaction between the

classical and the C&C logistic regressions using empirical data.

Classical Logistic Regression

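In a case-control study let n0 be the number of controls and n1 be the number of cases,

for a total of n subjects. Let D denote the disease status, where D = 1 indicates disease.

Let Xg be a binary covariate that denotes presence of the gene when Xg = 1. Let Xe

be a vector of all other exposures.

The logistic regression model for disease risk has parameters β = (α, βg,

βe, βge). The log odds under this model are:

$$\log\frac{p}{1-p} = \alpha + \beta_g X_g + \beta_e X_e + \beta_{ge} X_g X_e$$

where p = P(D = 1 | Xg, Xe) is the conditional probability of disease. The log likelihood used to obtain the

maximum likelihood estimates is:

$$\ell(\beta) = \sum_{i=1}^{n} \left[ D_i \log p_i + (1 - D_i) \log(1 - p_i) \right]$$

where:

$$p_i = \frac{\exp(\alpha + \beta_g X_{gi} + \beta_e X_{ei} + \beta_{ge} X_{gi} X_{ei})}{1 + \exp(\alpha + \beta_g X_{gi} + \beta_e X_{ei} + \beta_{ge} X_{gi} X_{ei})}$$

and

$$1 - p_i = \frac{1}{1 + \exp(\alpha + \beta_g X_{gi} + \beta_e X_{ei} + \beta_{ge} X_{gi} X_{ei})}$$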

In a case control-study subjects are sampled by their disease status, so that in fact the

disease is fixed and the X is random which is different from the prospective model

just described. Prentice and Pyke showed, as mentioned above, that ignoring this

problem and using the prospective model on retrospective data still results in ML

parametric estimates similar to the odds ratios of a prospective model.
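The classical model can be sketched in R with `glm`. The example below uses the smoking by APOE-Ɛ2 counts from Table 3 and, as a simplifying assumption, omits gender, so its estimates are not those of Table 7a. The data frame and variable names are hypothetical.

```r
# 2 x 2 x 2 counts for smoking and APOE-E2 taken from Table 3
# (gender omitted, so the estimates will not match Table 7a).
cells <- data.frame(
  case    = c(1, 1, 1, 1, 0, 0, 0, 0),
  carrier = c(1, 1, 0, 0, 1, 1, 0, 0),
  smoker  = c(1, 0, 1, 0, 1, 0, 1, 0),
  n       = c(5, 10, 48, 61, 20, 11, 37, 108)
)

# Prospective logistic model with a multiplicative interaction term,
# fitted to grouped data through frequency weights.
fit <- glm(case ~ carrier * smoker, family = binomial,
           data = cells, weights = n)

# Interaction odds ratio exp(beta_ge).
or_int <- unname(exp(coef(fit)["carrier:smoker"]))

# For this saturated model the interaction odds ratio equals the
# smoking odds ratio among carriers divided by the smoking odds
# ratio among non-carriers.
or_check <- ((5 * 11) / (10 * 20)) / ((48 * 108) / (61 * 37))

print(summary(fit)$coefficients)
print(c(or_int = or_int, or_check = or_check))
```

Because the model without gender is saturated for the 2 x 2 x 2 table, the fitted interaction odds ratio can be verified by hand from the cell counts, which is what `or_check` does.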

C&C Logistic Regression

When investigating a regression model with an interaction term, a large sample size is needed. Incorporating a valid assumption into a model can increase statistical efficiency and power. Previously, David M. Umbach and Clarice R. Weinberg demonstrated a way to use a log-linear model to incorporate an independence assumption and estimate logistic regression odds ratios. However, their model was limited to binary gene and exposure variables. Raymond Carroll and Nilanjan Chatterjee created a logistic regression model which incorporates an independence assumption between the interaction variables into a retrospective likelihood, while allowing the exposure distribution to be nonparametric and the gene to have multiple categories (further referred to as C&C).

As mentioned before, with retrospective data one models P(X|D) rather than P(D|X), because disease status has already been fixed by the case-control sampling. Let j be the index for gene status and k be the index for exposure. In population genetics theory, the probabilities of the different genetic statuses {q1, …, qJ} can be modeled as a function of some parameter vector θ (for example as in the Hardy-Weinberg equilibrium), such that qj = qj(θ). For exposure values {e1, …, eK} we define the distribution as having corresponding probability masses {δ1, …, δK}. The retrospective log likelihood is then:

$$\log L(\alpha, \beta_g, \beta_e, \beta_{ge}, \theta, \delta) = \sum_{i=1}^{n} \log P(X_{g,i}, X_{e,i} \mid D_i)$$

which according to Bayes' theorem is:

$$\sum_{i=1}^{n} \log \frac{P(D_i \mid X_{g,i}, X_{e,i})\, P(X_{g,i}, X_{e,i})}{\sum_{j}\sum_{k} P(D_i \mid X_{gj}, X_{ek})\, P(X_{gj}, X_{ek})}$$

The joint probability P(Xg, Xe) is replaced by P(Xg) P(Xe) to incorporate the independence assumption. The probability P(Xek) is replaced by δk and P(Xgj) is replaced by qj(θ), which gives:

$$\sum_{i=1}^{n} \log \frac{P(D_i \mid X_{g,i}, X_{e,i})\, q_{j(i)}(\theta)\, \delta_{k(i)}}{\sum_{j}\sum_{k} P(D_i \mid X_{gj}, X_{ek})\, q_j(\theta)\, \delta_k}$$

where j(i) and k(i) denote the gene status and exposure value of subject i. This likelihood is maximized to obtain the maximum likelihood estimates of the coefficients. In this study, the nonparametric aspect of the model is not exploited because the gene and exposure variables are binary.

This analysis was done in R with the CGEN package. (72)

Standardized Bootstrap Confidence Intervals

Due to the small sample size, a standardized bootstrap was performed to verify the reliability of the statistically significant interaction effect.

The empirical data was bootstrapped 800 times to calculate standardized OR bootstrap estimates, and each bootstrap sample was bootstrapped an additional 800 times to estimate the standard deviation of each original bootstrap estimate. The lower and upper limits were calculated with the following equations, where $\hat{\theta}$ is the OR estimate from the classical logistic regression, i is the bootstrap index, $\hat{\sigma}^*_i$ is the standard deviation estimate of the OR from the nested bootstrap of bootstrap i, $t^*_i = (\hat{\theta}^*_i - \hat{\theta}) / \hat{\sigma}^*_i$ is the standardized estimate of bootstrap i, and $\hat{\sigma}$ is the standard deviation of the OR estimates of the main bootstrap. Here $t^*_{(a)}$ denotes the a-th quantile of the standardized estimates.

$$\text{Upper limit} = \hat{\theta} - t^*_{(0.025)}\, \hat{\sigma}$$

$$\text{Lower limit} = \hat{\theta} - t^*_{(0.975)}\, \hat{\sigma}$$
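The nested procedure can be sketched in R as follows. This is a minimal illustration under stated assumptions, not the study's script: the data are hypothetical toy data, the function names are invented for the example, the interval is computed on the log odds ratio scale and then exponentiated, and far fewer resamples than 800 by 800 are used so that it runs quickly.

```r
set.seed(1)

# Hypothetical toy data (not the study data): binary gene g, exposure e,
# and disease status d generated from a logistic model with interaction.
n <- 400
toy <- data.frame(g = rbinom(n, 1, 0.4), e = rbinom(n, 1, 0.5))
toy$d <- rbinom(n, 1, plogis(-0.2 + 0.3 * toy$g + 0.4 * toy$e - 1.0 * toy$g * toy$e))

# Interaction coefficient (log odds ratio) from a classical logistic fit.
int_coef <- function(dat) {
  unname(coef(glm(d ~ g * e, family = binomial, data = dat))["g:e"])
}

# Draw one bootstrap resample of the rows of dat.
resample <- function(dat) dat[sample(nrow(dat), replace = TRUE), , drop = FALSE]

boot_t_ci <- function(dat, B_outer = 60, B_inner = 20, level = 0.95) {
  theta_hat <- int_coef(dat)
  theta_star <- numeric(B_outer)
  t_star <- numeric(B_outer)
  for (i in seq_len(B_outer)) {
    boot_i <- resample(dat)
    theta_star[i] <- int_coef(boot_i)
    # Nested bootstrap: standard deviation of the estimate for resample i.
    sd_i <- sd(replicate(B_inner, int_coef(resample(boot_i))))
    t_star[i] <- (theta_star[i] - theta_hat) / sd_i
  }
  sd_hat <- sd(theta_star)   # standard deviation over the main bootstrap
  a <- (1 - level) / 2
  q <- unname(quantile(t_star, c(a, 1 - a)))
  c(lower = theta_hat - q[2] * sd_hat,
    estimate = theta_hat,
    upper = theta_hat - q[1] * sd_hat)
}

ci <- boot_t_ci(toy)
print(ci)        # log odds ratio scale
print(exp(ci))   # odds ratio scale
```

The upper quantile of the standardized estimates sets the lower limit and vice versa, which is why the limits are not symmetric around the estimate.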

Aim 3: To explore the implications of violated independence assumptions using

simulated data.

Simulations

The following are the simulations performed in the study:

i. First simulation: dependence between APOE-Ɛ2 and smoking

Gender, smoking, and allele status are sampled with replacement from the empirical data. Case status is then sampled with a probability determined by these variables, according to the logistic regression model fitted to the empirical data, where α + βx denotes the fitted linear predictor of each sampled subject:

SCases <- rbinom(297, 1, exp(α + βx) / (exp(α + βx) + 1))

ii. Second simulation: No association between APOE-Ɛ2 and smoking

This simulation is not sampled from the data but generated. Case status is generated first, to mimic a case-control sample.

S3Cases <- rbinom(5000,1,0.5)

Then the allele variable is generated conditional on case status, with a probability of about 0.08 of carrying APOE-Ɛ2 among controls and about 0.10 among cases (loosely based on the results of the empirical data):

S3allele <- rbinom(5000,1,0.084+0.0168*S3Cases)

Then smoking is generated assuming that ~15% of cases who have the allele smoke, and 35% of the rest of the subjects smoke:

S3Smoke <- rbinom(5000,1,0.35-0.2*S3Cases*S3allele)

Then gender, also based on empirical data:

S3Gender <- rbinom(5000,1, 0.7011542-0.4328611*S3Cases)

Each simulation was run once and analyzed by both the classical logistic regression and the C&C model to display an example of one such simulation. Each simulation was then repeated 1000 times to calculate the median estimate and OR, the 95% confidence interval, and the median P-value of the multiplicative interaction coefficient for both the classical and C&C logistic regressions. The 95% confidence interval was calculated by taking the 2.5th and 97.5th percentiles of the OR across the repetitions.
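The repetition and summary steps for the second simulation can be sketched in R as below. The generating probabilities are those given above; the function names, the seed, and the reduced number of repetitions are assumptions made for the example, and only the classical logistic regression is fitted.

```r
# One replicate of the second simulation, following the generating
# steps described above (variable names shortened).
simulate_ii <- function(n = 5000) {
  cases  <- rbinom(n, 1, 0.5)
  allele <- rbinom(n, 1, 0.084 + 0.0168 * cases)
  smoke  <- rbinom(n, 1, 0.35 - 0.2 * cases * allele)
  gender <- rbinom(n, 1, 0.7011542 - 0.4328611 * cases)
  data.frame(cases, allele, smoke, gender)
}

# Interaction log odds ratio and its p-value from the classical model.
fit_interaction <- function(dat) {
  fit <- glm(cases ~ gender + allele * smoke, family = binomial, data = dat)
  summary(fit)$coefficients["allele:smoke", c("Estimate", "Pr(>|z|)")]
}

set.seed(2013)
n_rep <- 100   # the study used 1000 repetitions; fewer here for speed
res <- t(replicate(n_rep, fit_interaction(simulate_ii())))

summary_ii <- c(
  median_estimate = median(res[, "Estimate"]),
  median_or       = exp(median(res[, "Estimate"])),
  or_lower        = unname(quantile(exp(res[, "Estimate"]), 0.025)),
  or_upper        = unname(quantile(exp(res[, "Estimate"]), 0.975)),
  median_p        = median(res[, "Pr(>|z|)"]),
  n_significant   = sum(res[, "Pr(>|z|)"] < 0.05)
)
print(summary_ii)

# Interaction odds ratio implied by the generating probabilities:
# smoking odds among case carriers relative to all other subjects.
true_or <- (0.15 / 0.85) / (0.35 / 0.65)
print(true_or)
```

The implied interaction odds ratio of about 0.33 gives a reference value against which the median estimates of the two models can be compared.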

Aim 4: To determine if there are interactions between APOE-Ɛ2 and smoking or

coffee drinking and quantify their effect based on the results of this study.

After considering Aims 2 and 3, we draw a conclusion about the existence of the interactions and quantify their effects.

Ethics

This study was approved by the Sourasky Medical Center institutional ethics

committee. Each participant signed an informed consent form. Each participant was

informed that they would not be receiving the results of their genetic testing, as per

the suggestion of the American Neurological Association. The APOE sampling was

approved by the Israeli Ministry of Health's ethics committee for human genetic

studies.

4. Results

Summary Statistics

The study includes 409 participants, of whom 163 are cases and 246 are controls. Of all the participants who filled out the questionnaire, 306 agreed to genetic testing and had valid results. Two were discarded because the results of the genetic testing were invalid. Of all the study participants, 402 supplied sufficient self-reported smoking information to determine ever-smoking status, and 268 provided enough self-reported information about their coffee drinking habits.

Table 1: Distribution of study variables among cases and controls

                                  Cases N    Controls N
Smoking            Ever              96          79
                   Never             63         164
Coffee Drinking    Drinker           73         163
                   Non-Drinker       18          14

Table 1 presents summary statistics of lifestyle variables for cases and controls. There is a

higher rate of smokers among the cases, and a higher rate of coffee drinkers among

controls.

Table 2: Contingency table of APOE-Ɛ2 and disease status

            Non-APOE-Ɛ2 carrier    APOE-Ɛ2 carrier    Total
Controls            148                   31            179
Cases               112                   15            127
Total               260                   46            306

Table 2 presents a summary of APOE-Ɛ2 carrier status and case-control status. In this

sample 15% of the participants had APOE-Ɛ2, 17% of the controls had APOE-Ɛ2, and

12% of the cases had APOE-Ɛ2.

Table 3: Contingency tables of APOE-Ɛ2 carrier status and smoking or coffee drinking among cases and controls*

Cases (N=124)       Ever Smoker       Never Smoker
Carrier                   5                10
Non-Carrier              48                61

Cases (N=76)        Coffee Drinker    Non-Drinker
Carrier                   9                 2
Non-Carrier              55                10

Controls (N=176)    Ever Smoker       Never Smoker
Carrier                  20                11
Non-Carrier              37               108

Controls (N=156)    Coffee Drinker    Non-Drinker
Carrier                  27                 1
Non-Carrier             116                12

*For relevant subjects with full information for both variables

Table 3 shows the frequencies of APOE-Ɛ2 carrier status and the lifestyle variables (smoking and coffee drinking) for cases and controls in contingency tables. It illustrates how difficult it can be to explore gene-lifestyle (or gene-environment) interactions in rare diseases. In our case the gene is fairly rare as well, so several cells contain very few subjects.

Table 4: APOE-Ɛ2, Coffee and Smoking Profile Frequencies

APOE-Ɛ2 Carrier*    Coffee**    Smoking***    Cases    Controls
       1               1            1            3        16
       1               0            1            1         1
       1               0            0            1         0
       1               1            0            6        11
       0               1            1           24        31
       0               0            1            3         1
       0               0            0            7        11
       0               1            0           30        83

*APOE-Ɛ2 carrier is 1 for genotypes (E2/E3, E2/E2, E2/E4)
**Coffee is 1 for coffee consumption and 0 for no coffee consumption
***Smoking is 1 for ever smoker and 0 for never smoker

Table 4 shows the frequencies of cases and controls among each of the profiles.

Table 5a: Coffee and Smoking Missing-ness Among Cases and Controls

Smoking Response Cases Controls

Not Missing 124 176

Missing Smoking 5 3

Coffee Response Cases Controls

Not missing 76 156

Missing Coffee 51 23

Table 5a shows the frequencies of missing coffee and smoking related responses

among cases and controls who agreed to genetic testing. Among cases 3.8% of the

participants did not respond to the smoking questions, and 1.7% among controls.

Among cases 40% of the participants did not respond to coffee related questions, and

12.8% among controls.

Table 5b: Agreement to Genetic testing Among Cases and Controls

Cases Controls

Agreed 127 179

Did not Agree 36 67

Table 5b shows the missing-ness of genetic information. A higher percentage of

controls did not agree to genetic testing. Of the controls 27% did not agree to genetic

testing. Among the cases 22% did not agree to genetic testing. It is assumed that

genetic information is missing at random. It is reasonable to assume that the gene

status did not affect the participants' decision to be tested.

Figure 1a: Smokers, APOE-Ɛ2 carriers, and country of origin:

There were 189 participants with reported smoking statuses and origin, and 230 participants with

APOE-Ɛ2 test results and reported origin.

Figure 1b: Coffee drinkers, APOE-Ɛ2 carriers, and country of origin:

There were 272 participants with reported coffee drinking statuses and origin, and 230 participants with

APOE-Ɛ2 test results and reported origin.

Coffee drinking and smoking are both social and cultural lifestyle habits, and

therefore it's interesting to see their distribution among ethnicities. The following

graphs show the distribution of smokers and APOE-Ɛ2 carriers per self-reported

country of origin, and the same for coffee drinkers and APOE-Ɛ2 carriers

respectively.

[Figure 1a: bar chart of percent smokers and percent APOE-Ɛ2 carriers by country of origin]

[Figure 1b: bar chart of percent coffee drinkers and percent APOE-Ɛ2 carriers by country of origin]

Aim 1: To test for independence between APOE-Ɛ2 and smoking or coffee

drinking using empirical data.

Independence Tests

Table 6a: Ever Smoker and APOE-Ɛ2 carrier contingency table for controls

Smoker\APOE-Ɛ2 Non-Carrier Carrier

Never 108(75%) 11(35%)

Ever 37(25%) 20(65%)

OR=5.25, 95% CI (2.16,13.37), P-value: 0.0007

Table 6a is a contingency table of the frequencies of smokers and APOE-Ɛ2 carriers.

A Fisher Exact test shows that independence is rejected with a p-value of 0.0007, and

the confidence interval of the odds ratio at 95% is (2.16,13.37). The C&C model is

based on the assumption that the variables which make up the multiplicative

interaction are independent from one another. Therefore, if there is no independence,

the results of a C&C model on this empirical data may be untrustworthy. This issue

will be discussed further on.

Table 6b: Coffee Consumption and APOE-Ɛ2 carrier contingency table for

controls

Coffee Consumption\APOE-Ɛ2    Non-Carrier    Carrier
Never                          12(9%)         1(4%)
Ever                           116(91%)       27(96%)

OR=2.77, 95% CI (0.38, 127.7), P-value: 0.468

Table 6b is a contingency table of the frequencies of coffee drinkers and APOE-Ɛ2 carriers. A Fisher exact test shows that independence is NOT rejected in this case, with a p-value of 0.466 and a confidence interval for the odds ratio of (0.38, 127.7). Therefore, if a C&C model is used, one may assume that the results of such a regression model are valid.

Classical Logistic Regression Using the Empirical Data

Aim 2: To compare the effects of the multiplicative interaction between the

classical and the C&C logistic regressions using empirical data.

Table 7a: Classical Logistic Regression using the empirical study data - smoking

& APOE-Ɛ2 (N=297)

                  Estimate    Std. Error    Z Value    Pr(>|z|)       OR        95% CI
(intercept)        2.3715      0.4498        5.273     1.34e-7***     10.713    4.43-25.87
Gender            -1.8778      0.2721       -6.901     5.18e-12***    0.152     0.09-0.26
APOE-Ɛ2            0.2759      0.5171        0.534     0.5936         1.317     0.48-3.63
Smoker             0.4819      0.3031        1.59      0.1118         1.619     0.89-2.93
Smoker:APOE-Ɛ2    -2.0469      0.7779       -2.632     0.0085**       0.129     0.03-0.59

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Table 7a shows the effect estimates and test statistics of the classical logistic

regression using our empirical case-control data for smoking and APOE-Ɛ2.

Goodness of fit: The Wald test of all the coefficients has χ² = 56.3 (p = 1.8e-11). The residual deviance was 336.6 (p = 0.039). The likelihood ratio test of this model against a null model had χ² = 66.89 (p = 1.02e-11). A likelihood ratio test of this model against a model without the interaction had χ² = 7.25 (p = 0.007). The AIC of the model with the interaction is 346, smaller than that of the model without the interaction (AIC = 351), which also indicates that the model with the interaction is preferable despite the loss of a degree of freedom. All of the goodness of fit tests indicate that the model with an interaction is well fitted.
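The likelihood ratio p-value and the AIC comparison can be checked from the reported statistics alone. The short R sketch below does this arithmetic; the variable names are hypothetical and the only inputs are the values quoted above.

```r
# Values reported above for the smoking and APOE-E2 models.
lr_stat  <- 7.25   # likelihood ratio statistic, with vs without interaction
df_extra <- 1      # one additional parameter (the interaction term)

# Upper-tail chi-square probability for the likelihood ratio test.
p_lrt <- pchisq(lr_stat, df = df_extra, lower.tail = FALSE)

# AIC = -2 * log-likelihood + 2 * (number of parameters), so adding one
# parameter changes AIC by 2 minus the likelihood ratio statistic.
aic_change <- 2 * df_extra - lr_stat

print(round(p_lrt, 3))   # compare with the reported p = 0.007
print(aic_change)        # compare with the reported 346 - 351 = -5
```

An AIC difference of about -5 therefore follows directly from the likelihood ratio statistic of 7.25.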

Because there are very few APOE-Ɛ2 carriers in this study, it is notable that the classical logistic regression produced an interaction effect that was statistically significant at the commonly used α levels of 5% and 1%. A non-parametric bootstrap-t of this data helps to further investigate the reliability of this result. The data sample was bootstrapped 1000 times. The 95% CI of the T value of the interaction effect was (-1.534, 1.400). This gives a 95% CI of (-3.3939, -0.7258) for the interaction coefficient.

To verify the reliability of the interaction effect results, a standardized bootstrap was performed. The standardized bootstrap 95% confidence interval of the interaction effect was (-4.195, -0.988), which corresponds to an OR CI of (0.015, 0.372), confirming that the effect is significant.

Explanatory variables: The APOE-Ɛ2 carrier status and smoking main effects were not significant while the interaction between them was, possibly showing that their effects on disease risk are only multiplicative and not additive, i.e. that smoking and APOE-Ɛ2 have no main effects. The odds ratio of APOE-Ɛ2 is 1.317, 95% CI (0.48, 3.63), and it was not significant. It is important to note that the P-value of the smoking main effect is almost significant at 10%, but a correction for multiple tests is required. With a Bonferroni correction for multiple tests, a P-value would have to be less than 0.05 to be significant at an α of 10%. The multiplicative coefficient is significant even after Bonferroni correction. The OR of the interaction indicates decreased risk for Parkinson's disease for people who smoke or have smoked and are APOE-Ɛ2 carriers.
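The Bonferroni adjustment can be illustrated in R with `p.adjust`. The sketch assumes that the family of tests consists of the two interaction terms (smoking and coffee drinking), which is an assumption made for this example; the text does not list the tests included in the correction. The p-values are those of Tables 7a and 7b.

```r
# P-values of the two interaction terms reported in Tables 7a and 7b.
# Treating these two tests as the family is an assumption of this sketch.
p_raw <- c(smoking_apoe2 = 0.0085, coffee_apoe2 = 0.4)

# Bonferroni adjustment: each p-value is multiplied by the number of
# tests (capped at 1) and then compared with the nominal alpha.
p_adj <- p.adjust(p_raw, method = "bonferroni")

print(p_adj)
print(p_adj < 0.05)
```

Under this assumption the smoking interaction remains below 5% after adjustment, while the coffee interaction does not.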

Table 7b: Classical Logistic Regression using the empirical study data - Coffee

Drinking and APOE-Ɛ2 (N=232)

                         Estimate    Std. Error    Z Value    Pr(>|z|)       OR       95% CI
(intercept)               2.5657      0.6883        3.727     0.000193***    13.01    3.37-50.14
Gender                   -1.8446      0.3177       -5.207     6.4e-9***      0.158    0.08-0.29
APOE-Ɛ2                   0.6879      1.4295        0.481     0.6303         1.99     0.12-32.77
CoffeeDrinker            -0.523       0.5207       -1.005     0.3151         0.592    0.21-1.64
CoffeeDrinker:APOE-Ɛ2    -1.2606      1.4988       -0.841     0.4            0.283    0.02-5.34

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

N=232

Table 7b shows the effects estimates and test statistics of the classical logistic

regression using the empirical case-control data for coffee drinking and APOE-Ɛ2.

Goodness of fit: The Wald test for the coefficients excluding gender has χ² = 3.8 (p = 0.28). The residual deviance for this model is 248.62 (p = 0.1242). A likelihood ratio test of this model against a null model had χ² = 41.02 (p = 2.65e-8). A likelihood ratio test of this model against a model without the interaction, however, was not significant, with χ² = 0.747 (p = 0.387). A Wald test of all the coefficients has χ² = 35.4 (p = 3.9e-7). The significant overall tests are likely just the result of the variance explained by gender alone, as none of the other effects were significant. Overall these tests indicate that this model is not well fitted or, more likely, that the sample size may be too small to estimate these regression parameters.

Explanatory Variables: The interaction is not significant at 5%. Only the gender effect

was significant.

The classical logistic regression had meaningful results regarding the interaction

effect, and this small sample might have been sufficient for that analysis despite the

loss of degrees of freedom due to the interaction factor. We will now explore the

results of the C&C model.

C&C Logistic Regression using the Empirical Data

Table 8a: C&C Logistic Regression using the empirical study data - smoking & APOE-Ɛ2 (N=292)

                  Estimate    Std. Error    Z Value    Pr(>|z|)       OR        95% CI
(intercept)        2.3832      0.4400        5.417     6.06e-8***     10.839    4.57-25.67
Gender            -1.8395      0.2669       -6.892     5.46e-12***    0.158     0.09-0.27
APOE-Ɛ2            0.1424      0.2808        0.507     0.611          1.153     0.67-2.00
Smoker            -0.2392      0.3958       -0.604     0.545          0.787     0.36-1.71
Smoker:APOE-Ɛ2    -0.4332      0.5811       -0.745     0.456          0.648     0.21-2.02

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Table 8a shows the results of the C&C model regression using the empirical case-

control data for smoking and APOE-Ɛ2.

Goodness of fit: The Wald test of all the coefficients in this model has χ² = 52.2 (p = 1.3e-10).

Explanatory Variables: The interaction between smoking and APOE-Ɛ2 is not significant in this model, differing from the results of the classical logistic regression. The odds ratio of the APOE-Ɛ2 main effect is 1.153, 95% CI (0.665, 1.999).

If the independence assumption held, we would expect the C&C results to be more statistically significant, and yet we see the opposite. The standard errors here are smaller than those of the classical logistic model for all the effects except the smoker main effect. The estimates are different as well. It is known that violation of the independence assumption causes bias in the estimates. It would seem that the interaction effect is not significant because of bias resulting from the violated assumption.

Table 8b: C&C Logistic Regression using the empirical study data - coffee

drinking (N=232)

                  Estimate    Std. Error    Z Value    Pr(>|z|)      OR        95% CI
(intercept)        2.5911      0.673         3.85      1.18e-4***    13.344    3.56-49.9
Gender            -1.8109      0.314        -5.766     8.09e-9***    0.164     0.088-0.302
APOE-Ɛ2            0.0443      0.8099        0.0547    0.956         1.045     0.21-5.11
Coffee            -0.6518      0.5054       -1.29      0.197         0.521     0.19-1.40
Coffee:APOE-Ɛ2    -0.306       0.8604       -0.356     0.722         0.736     0.14-3.98

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Table 8b shows the results of the C&C regression using the empirical case-control

data for coffee drinking and APOE-Ɛ2.

Goodness of fit: The Wald test of all the coefficients of this model had χ² = 34.7 (p = 5.3e-7), and as we saw in the previous model for coffee drinking, this is probably due solely to the variance explained by gender. Again, this model is poorly fitted.

Explanatory Variables: The interaction effect is not significant before or after Bonferroni correction. The only significant effects were gender and the intercept.

In summary, there were meaningful results in the models of smoking & APOE-Ɛ2. There was a difference between the classical logistic model and the C&C model: the interaction was significant in the classical logistic regression but not in the C&C model. When the independence assumption between the gene and the exposure variables is valid, the two regressions should yield almost identical effect estimates, and the C&C model should have MORE power, rather than less. Although 4 of the 5 parameter estimates in the smoking regressions had smaller standard errors in the C&C model, 4 of the 5 Z values were further from 0 in the classical logistic model. These results are likely due to the violation of the independence assumption.
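As a rough arithmetic check of this explanation, the counts in Table 3 can be used (ignoring gender, so the figures differ slightly from Tables 7a and 8a). Without covariates, the case-control interaction odds ratio equals the carrier-by-smoking odds ratio among cases divided by the same odds ratio among controls. Reading an estimator that assumes independence as one that effectively treats the control odds ratio as 1 is a simplification of the C&C likelihood, offered here only as a sketch:

$$\widehat{OR}_{GE \mid \text{cases}} = \frac{5 \times 61}{10 \times 48} \approx 0.635, \qquad \widehat{OR}_{GE \mid \text{controls}} = \frac{20 \times 108}{11 \times 37} \approx 5.31$$

$$\widehat{OR}_{\text{interaction}} = \frac{\widehat{OR}_{GE \mid \text{cases}}}{\widehat{OR}_{GE \mid \text{controls}}} \approx \frac{0.635}{5.31} \approx 0.120$$

The case-only ratio of about 0.64 is close to the C&C interaction OR of 0.648, while dividing by the control ratio of about 5.3 gives a value close to the classical interaction OR of 0.129. The gap between the two models therefore tracks the strength of the gene-smoking association among controls.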

In order to explore further the effects of violating the independence assumption on the

results of the C&C model, in the next section we will experiment with simulated data

samples analyzed by this constrained model.

Aim 3: To explore the implications of violated independence assumptions using

simulated data.

Simulations

The simulations generated case-control sample data with gender, APOE-Ɛ2, smoking, and disease status. This section explores the results of both the classical logistic regression and the C&C regression on simulated data, with and without dependence between APOE-Ɛ2 and smoking. First, the results for an example data set are displayed, and then results summarizing all the data sets.

i. The first simulation has dependence between APOE-Ɛ2 and smoking

based on the empirical data. 1000 sets of samples, with sample size 297

were generated from the empirical data.

ii. The second simulation has independence between the APOE-Ɛ2 carrier

status and smoking. This simulation generated 1000 sample sets of 5000.

Simulation i:

Table 9.a: Example Classical Logistic Regression of simulated data set i

                  Estimate    Std. Error    Z Value    Pr(>|z|)      OR        95% CI
(intercept)        3.0832      0.5171        5.963     2.4e-9***     21.828    7.92-60.14
Gender            -2.2870      0.2926       -7.814     5.5e-15***    0.102     0.06-0.18
APOE-Ɛ2            1.0160      0.5987        1.697     0.090         2.762     0.85-8.93
Smoker             0.3200      0.3158        1.013     0.311         1.377     0.74-2.55
Smoker:APOE-Ɛ2    -2.2584      0.9779       -2.309     0.020*        0.105     0.02-0.71

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Table 9.a shows the results of a classical logistic regression using one set from

simulation i.

In this sample there was dependence between APOE-Ɛ2 status and smoking with a

correlation of 0.0307. The interaction effect was statistically significant at 0.05 with a

P-value of 0.02. The main effects were not significant.

Table 9.b: Example C&C Regression of simulated data set i

                  Estimate    Std. Error    Z Value    Pr(>|z|)       OR        95% CI
(intercept)        3.2090      0.5078        6.318     2.9e-10***     24.754    9.15-66.97
Gender            -2.3693      0.2891       -8.195     3.5e-16***     0.094     0.05-0.16
APOE-Ɛ2            0.8348      0.4050        2.061     0.047*         2.304     1.04-5.09
Smoker             0.2158      0.3049        0.708     0.457          1.241     0.68-2.25
Smoker:APOE-Ɛ2    -1.4480      0.6685       -1.905     0.056          0.235     0.06-0.87

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Table 9.b shows the results of a C&C logistic regression using one set of the

simulated data described in simulation i.

These results differ somewhat from the logistic regression. First of all, the APOE-Ɛ2

main effect is significant. Secondly, the interaction effect was different, although still

significant and still negative.

Table 10: Summary statistics of simulated effects - simulation i

Coef       Model        Median Estimate    Median OR    OR 95% CI      Median SE    Median Pr(>|z|)    # simulations PV<0.05
APOE-Ɛ2    Classical     0.2850            1.330        0.395-4.356    0.5836       0.451              57
           C&C          -0.9673            0.380        0.156-0.790    0.4130       0.020              671
Smoking    Classical    -1.9197            0.147        0.076-0.260    0.2902       4.23e-11           1000
           C&C          -1.8290            0.161        0.086-0.278    0.2793       6.99e-11           1000
G*E        Classical    -2.0366            0.130        0.027-0.492    0.7351       0.0005             795
           C&C           0.6837            1.981        0.718-5.759    0.5107       0.174              263

Table 10 shows the summary of the classical and C&C logistic regressions on the

simulated data. We see that the C&C model had lower standard errors consistently.

However due to the bias resulting from the violated independence assumption, the

interaction effect was not significant in most of the C&C model simulations.

Simulation ii:

Table 11a: Example Classical Logistic Regression of simulated data set ii

                  Estimate    Std. Error    Z Value     Pr(>|z|)         OR       95% CI
(intercept)        0.8262      0.0504       16.393      2.15e-60***      2.290    2.07-2.53
Gender            -1.8367      0.0630      -29.137      1.21e-186***     0.159    0.14-0.18
APOE-Ɛ2            0.6170      0.1282        4.811      1.50e-6***       1.853    1.44-2.38
Smoker            -0.0010      0.0684       -0.014      0.98             0.999    0.87-1.14
Smoker:APOE-Ɛ2    -1.3943      0.2748       -5.074      3.891e-07***     0.248    0.14-0.42

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Table 11a shows the results of a classical logistic regression using one data set from

simulation ii. All the effects were significant except smoker status.

Table 11b: Example C&C Regression of simulated data set ii

                  Estimate    Std. Error    Z Value     Pr(>|z|)         OR       95% CI
(intercept)        0.8250      0.0501       16.457      7.42e-61***      2.282    2.06-2.52
Gender            -1.8300      0.0627      -29.177      3.90e-187***     0.160    0.14-0.18
APOE-Ɛ2            0.6058      0.1035        5.854      4.80e-9***       1.833    1.49-2.24
Smoker            -0.0011      0.0668       -0.017      0.98             0.999    0.87-1.14
Smoker:APOE-Ɛ2    -1.3944      0.1970       -7.079      1.45e-12***      0.248    0.17-0.36

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Table 11b shows the results of a C&C logistic regression using one set of the

simulated data described in simulation ii. All the effects were significant except

smoker status.

Table 12: Summary statistics of simulated effects - simulation ii

Coef       Model        Median Estimate    Median OR    OR 95% CI      Median SE    Median Pr(>|z|)    # simulations PV<0.05
APOE-Ɛ2    Classical     0.4880            1.629        0.927-2.877    0.2890       0.094              386
           C&C           0.4648            1.591        0.997-2.477    0.2320       0.044              519
Smoking    Classical    -1.8653            0.155        0.116-0.205    0.1417       1.22e-39           1000
           C&C          -1.8607            0.155        0.117-0.204    0.1409       7.33e-40           1000
G*E        Classical    -1.0938            0.335        0.085-0.993    0.5904       0.060              476
           C&C          -1.1123            0.329        0.113-0.660    0.4120       0.007              850

Table 12 shows the summaries of the classical and C&C logistic regressions on the simulated data. This data was simulated with no violation of the independence assumption. The C&C regression consistently produces lower SE estimates, as well as effect estimates similar to those of the classical logistic regression. As a result, the C&C regression also had consistently lower p-values and produced significant effects more often than the classical logistic regression. In summary, when the assumption is not violated, the C&C model does what it is designed to do.

5. Discussion

Aim 1: To test for independence between APOE-Ɛ2 and smoking or coffee

drinking using empirical data.

Case-control studies require larger sample sizes when testing possible interactions (73). In recent years it has become evident that gene-environment interactions play important and complicated roles in many chronic diseases (74). This paper sought to test independence between the interaction variables so that the C&C logistic model could be implemented to increase the power of the analysis.

The results of the independence test rejected independence between APOE-Ɛ2 and

smoking (p-value=0.0007), meaning a significant correlation was found between

APOE-Ɛ2 and smoking in the sample. This in itself may be a very interesting

relationship, and there are genome-wide studies to find such correlations (75). The

C&C model assumes independence in the general population, and it is possible that

the correlation found in this study may be due to stratification. Nonetheless for the

purposes of this paper, it posed a challenge in analyzing the possible interaction

between genetic and lifestyle factors in the empirical data. Most case-control studies

of rare diseases fail to reach a sufficient sample size in order to study interactions.

One study showed that just to estimate the modest effects of APOE-Ɛ2, a case-control study would require a sample size of around 5000 (41). Therefore, the C&C regression model would have been an attractive option to increase power for estimating the interaction effect in such a small sample.

The rejection of independence between APOE-Ɛ2 and smoking may be due to ethnicity or other cultural factors, as smoking habits tend to vary by culture. The C&C model has the capability to include stratification variables. For the analysis of a habit such as smoking, it would have been beneficial to test the significance of a stratum such as origin, ethnicity, or others.

The independence test for coffee drinking and APOE-Ɛ2 did not reject independence,

and therefore we may rely on the results of the C&C model on the coffee data.

Aim 2: To compare the effects of the multiplicative interaction between the

classical and the C&C logistic regressions using empirical data.

The OR of the interaction effect between APOE-Ɛ2 and smoking in the classical logistic regression was 0.129 (95% CI 0.03-0.59), showing that the interaction lowers the risk for Parkinson's disease. This result is significant at α = 5% after Bonferroni correction for multiple testing. The OR from the C&C model was 0.648 (95% CI 0.21-2.02). This is one demonstration of the consequences of a violated independence assumption for the C&C model. The interaction effect in the C&C model is not statistically significant and differs from the ML estimate of the classical logistic regression, which was statistically significant.

When the independence assumption is violated, the C&C model's estimates are

biased. In this empirical data, the bias of the C&C estimates caused all of the effects

except for the intercept to be closer to zero. Therefore despite the lower standard error

estimates, the effects were not statistically significant.

It is interesting to note that when the gene and environment variables are both binary,

then the goals and benefits of the C&C model and the log linear type model from

Umbach and Weinberg's paper are the same - to estimate logistic regression odds

ratios while incorporating the independence assumption. When using the log linear

approach to estimate the OR of the main effects APOE-Ɛ2 and smoking, and their

interaction effect, they are, in this case, equivalent to the results of the C&C model.

This was tested for models not including gender. The log linear estimates for a model

assuming independence are (16):

And the results for the C&C model ORs were 1.6232, 0.7870, and 0.6485 respectively. The log-linear method is also biased under violation of the independence assumption. When the variables are binary and the independence assumption is not violated, the two methods would be expected to provide very similar estimates as well.

No significant findings emerged from the analysis of the coffee drinking data. A previous study which evaluated the interaction between smoking and APOE-Ɛ2 did not find statistically significant results (69). The same study did find a statistically significant interaction effect between coffee drinking and APOE-Ɛ2. The lack of results in the coffee drinking analysis here is likely due to missing data: the missing values resulted from incomplete questionnaires, and because the coffee drinking items appeared much later in the questionnaire than the smoking items, they were missing more often.

Aim 3: To explore the implications of violated independence assumptions using

simulated data.

In the simulations, it was demonstrated that when resampling from the empirical data,

with replacement, the C&C model was successful in estimating the interaction effect

28

with a p-value<0.05 263 times, and the classical logistic model was able to estimate a

statistically significant interaction effect 795 times. We also see that the estimates are

similarly drastically different as they were in the results of the analysis on the

empirical data. The median smoking APOE-Ɛ2 OR of classical logistic regression was

0.130, and in the C&C model the median was 1.981. However, the C&C model still

estimated a lower standard error for the interaction effect. Nilanjan Chatterjee

addressed this bias in a paper where he proposed an Empirical Bayes-type shrinkage

estimator for the interaction effect which compromised between bias and efficiency

(76), showing that the bias can in fact be mild, like the result from the simulations. In

their paper, they demonstrate the variability of the bias under different conditions.

They show that the case-control estimate usually has less bias than the case-only

estimate, especially under departure from the independence assumption. The

Empirical Bayes-type shrinkage estimator proposed by Mukherjee and Chatterjee

relaxes the independence assumption, making it more robust and a better estimator for

data sets where there is some deviation from the independence assumption, while also

increasing the model's power. In this paper, since the results of the classical logistic

regression had greater p-values for most of the parameters, using the empirical Bayes-

type shrinkage estimator was less relevant.
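
The relationship between these estimators can be illustrated in the simplest setting, a binary genotype and a binary exposure. The following is a minimal sketch under stated assumptions, not the analysis performed in this study: in a saturated logistic model the case-control interaction log OR equals the gene-exposure log OR among cases minus the same quantity among controls, the case-only estimator drops the control term by assuming independence, and an Empirical Bayes-type estimator in the spirit of (76) weights the two. The counts are hypothetical, the case-only estimator is used as a simpler stand-in for the C&C model, and the exact form of the shrinkage weight is an assumption that should be checked against (76).

```python
import math

def log_or(a, b, c, d):
    """Log odds ratio and Wald variance for the 2x2 table [[a, b], [c, d]]."""
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d

def interaction_estimates(cases, controls):
    """Each argument holds counts ordered (G+E+, G+E-, G-E+, G-E-)."""
    beta_co, var_co = log_or(*cases)       # case-only: G-E association in cases
    theta_ge, var_ge = log_or(*controls)   # G-E association in controls
    beta_cc = beta_co - theta_ge           # saturated case-control interaction
    var_cc = var_co + var_ge               # cases and controls are independent samples
    # Assumed shrinkage weight: lean on the case-only estimate only when the
    # controls show little evidence of gene-exposure dependence.
    w = var_cc / (theta_ge ** 2 + var_cc)
    beta_eb = w * beta_co + (1 - w) * beta_cc
    return {
        "beta_cc": beta_cc, "se_cc": math.sqrt(var_cc),
        "beta_co": beta_co, "se_co": math.sqrt(var_co),
        "theta_ge": theta_ge, "weight_co": w, "beta_eb": beta_eb,
    }

# Hypothetical counts, not the thesis data.
est = interaction_estimates(cases=(8, 30, 60, 150), controls=(20, 25, 70, 160))
for name, value in est.items():
    print(f"{name}: {value:.3f}")
```

In this sketch the case-only standard error is always the smaller of the two, because the case-control variance adds the variance of the control term. The price is bias equal to the gene-exposure log OR among controls whenever independence does not hold.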

The classical logistic regression also demonstrated greater power to estimate the smoking

main effect. The median p-value for the classical logistic regression was 4.23 × 10^-11,

and the median p-value for the C&C regression was higher at 6.99 × 10^-11.

However, the opposite is true for the APOE-Ɛ2 main effect. For both effects the

estimated standard error was lower in the C&C regression, but the direction of the

bias affected the Z test statistic and the p-values.
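
A lower standard error can still give a larger p-value, because the Wald statistic is the ratio of the estimate to its standard error. The sketch below uses hypothetical log ORs and standard errors (not values from this study) to show that an estimate biased towards zero can lose significance even when its standard error is smaller.

```python
import math

def wald_test(estimate, se):
    """Wald z statistic and two-sided normal p-value for a log odds ratio."""
    z = estimate / se
    p_value = math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Hypothetical log ORs and standard errors, chosen only for illustration.
z_wide, p_wide = wald_test(estimate=-0.90, se=0.140)      # larger SE, larger effect
z_narrow, p_narrow = wald_test(estimate=-0.80, se=0.135)  # smaller SE, attenuated effect
print(f"larger SE:  z = {z_wide:.2f}, p = {p_wide:.2e}")
print(f"smaller SE: z = {z_narrow:.2f}, p = {p_narrow:.2e}")
```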

Aim 4: To determine if there are interactions between APOE-Ɛ2 and smoking or

coffee drinking and quantify their effect based on the results of this study.

The classical logistic regression showed a significant interaction between APOE-Ɛ2

and smoking, in contrast to previous studies (69). In this study, because the

independence assumption was violated, the classical logistic regression estimate is

more reliable than the estimate from the C&C model. Because the sample size was

small, a standardized bootstrap was performed to verify the reliability of the result,

giving a 95% confidence interval for the OR of (0.015-0.372). We may therefore

conclude that the interaction effect OR is 0.129. It is important to note that this

interaction may be the result of population stratification, perhaps by ethnic or cultural

factors. This study did not include sufficient background explanatory variables to

examine this possibility.
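
A bootstrap interval for an interaction OR can be sketched as follows. This is a simple percentile bootstrap on hypothetical counts, used here as a stand-in: it is not the standardized bootstrap or the data of this study. The interaction OR is computed from the 2x2x2 table, which matches the saturated logistic model for a binary gene and a binary exposure, and the function names, the seed, and the 0.5 cell correction are assumptions made for illustration.

```python
import random
from collections import Counter

def interaction_or(records):
    """Interaction OR from (disease, gene, exposure) 0/1 triples."""
    n = Counter(records)

    def cell(d, g, e):
        return n[(d, g, e)] + 0.5   # 0.5 added so empty cells do not divide by zero

    or_cases = cell(1, 1, 1) * cell(1, 0, 0) / (cell(1, 1, 0) * cell(1, 0, 1))
    or_controls = cell(0, 1, 1) * cell(0, 0, 0) / (cell(0, 1, 0) * cell(0, 0, 1))
    return or_cases / or_controls

def bootstrap_ci(records, n_boot=1000, alpha=0.05, seed=2013):
    """Percentile bootstrap interval, resampling cases and controls separately."""
    rng = random.Random(seed)
    cases = [r for r in records if r[0] == 1]
    controls = [r for r in records if r[0] == 0]
    stats = []
    for _ in range(n_boot):
        sample = rng.choices(cases, k=len(cases)) + rng.choices(controls, k=len(controls))
        stats.append(interaction_or(sample))
    stats.sort()
    lower = stats[int(n_boot * alpha / 2)]
    upper = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper

# Hypothetical counts, not the thesis data: {(disease, gene, exposure): n}
counts = {(1, 1, 1): 8, (1, 1, 0): 30, (1, 0, 1): 60, (1, 0, 0): 150,
          (0, 1, 1): 20, (0, 1, 0): 25, (0, 0, 1): 70, (0, 0, 0): 160}
records = [key for key, n in counts.items() for _ in range(n)]

point = interaction_or(records)
low, high = bootstrap_ci(records)
print(f"interaction OR = {point:.3f}, 95% bootstrap CI = ({low:.3f}, {high:.3f})")
```

Resampling cases and controls separately keeps the case-control ratio fixed in every replicate, which mirrors the sampling design. An interval that excludes 1 supports a non-null interaction, but it cannot rule out confounding by unmeasured stratification.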

Genome-wide association is now enabling the discovery of more and more genetic

factors in Parkinson's disease (77). This study considered known genetic and lifestyle

factors and their interactions. Confirming a smoking-APOE-Ɛ2 interaction could

greatly benefit the study of Parkinson's disease, its etiology, and its treatment.

Current research suggests that smoking may have neuroprotective effects and may

stimulate the dopamine neurons that are damaged in persons with Parkinson's disease (78). In

this study, it is unclear whether the effects of smoking and APOE-Ɛ2 were present only in the

interaction. It is possible that main effects exist but that there was insufficient

power to detect them, since including an interaction in a model usually reduces the

power to estimate main effects (14). The

results of studies of APOE-Ɛ2 and Parkinson's disease have been inconsistent (41). It

is still unclear if previous studies were inconsistent because the APOE-Ɛ2 OR is very

close to 1, or if perhaps the effects of APOE-Ɛ2 are not main effects but rather only in

interaction with other exposures or lifestyles. It is possible that there is publication

bias in the studies of APOE-Ɛ2 as a main effect in Parkinson's disease. Had the sample

size of this study been larger, there might have been sufficient power to estimate main

effects as well, if they exist.

As mentioned earlier, the results of the analysis of coffee drinking were not

statistically significant, likely due to the small sample size. A previous study found

an interaction between coffee drinking and APOE-Ɛ2 (69), which warrants further study.

Sufficient sample size and power continue to be a challenge in case-control

studies of chronic and rare diseases. Many studies neglect to consider

gene-environment interaction effects, which are often relevant in chronic diseases (2).

However, introducing a gene-environment interaction into the analysis requires an

even larger sample size (79). This study demonstrates a possible solution to these

problems, as well as the possible roadblocks to using it.

Limitations:

This study was limited in size. The controls included two distinct groups, spouses and

other patients, to enlarge the total control group. The control group was therefore not

a completely random sample and may be biased. There is also some evidence that smoking

increases the risk of rheumatoid arthritis (80), and some of the controls sampled from

the Rheumatism and Arthritis patients in Sourasky Medical Center may have had

rheumatoid arthritis.

Due to the length of the questionnaires, responses were often missing for the coffee

questions, which were located towards the end. Previous studies have shown that there

may be an interaction between coffee drinking and APOE-Ɛ2, and further study is

needed to verify these results. This study investigated only first degree interactions

due to its limited size. In future studies of smoking and APOE-Ɛ2, as well as of other

factors, it is necessary to investigate higher degree interactions as well, such as

gene-gene-environment or gene-environment-life-stage, to fully understand the etiology.

This study lacked the power to include additional background variables such as

ethnicity and education. These variables were absent from the regression models as

well as from the independence tests, where they could have been used for stratification.

Further studies are needed to verify the reproducibility of these results and to confirm

that they indicate a true interaction between smoking and APOE-Ɛ2 rather

than the result of stratification. The same is true for the results of the independence

tests. Future studies are needed to better understand the interaction between smoking

and APOE-Ɛ2: to determine whether main effects of smoking and APOE-Ɛ2 are

present alongside the interaction effect, and whether the interaction is of a first

degree or of a higher degree, involving other genes or exposures. As

demonstrated in this paper, there is a need for further development of statistical tools

for small case-control studies, especially for studies including interaction effects. It

would be interesting to use the bootstrapping method used in this paper to estimate the

standard error of effects from the C&C regression when the independence assumption

is violated, rather than rely on the estimations of the model. There is also a need to

further explore the performance of the C&C model under different conditions using

higher degree interactions.

Bibliography

1. "Nature Versus Nurture" and incompletely penetrant mutations. Simon, DK, Lin, MT and

Pascual-Leone, A. 2002, Journal of Neurology, Neurosurgery and Psychiatry, pp. 686-689.

2. Design and analysis issues in gene and environment studies. Liu, Chen-Yu, et al. 2012,

Environmental Health, p. 11:93.

3. Gene–environment correlations: a review of the evidence and implications for prevention

of mental illness. Jaffee, SR and Price, TS. 2007, Molecular Psychiatry, pp. 12, 432-442.

4. Case-control study of interactions between genetic and environmental factors in

Parkinson's disease. Palma, Giuseppe De, et al. 9145, 1998, The Lancet, Vol. 352, pp. 1986-1987.

5. Gene-environment interactions in parkinsonism and Parkinson’s disease: the Geoparkinson

study. Dick, Finlay D, et al. 10, 2007, Occupational and Environmental Medicine, Vol. 64, pp.

673-680.

6. A case-control study of Parkinson's disease and tobacco use: Gene-tobacco interactions.

Palma, Giuseppe De, et al. 7, 2010, Movement Disorders, Vol. 25, pp. 912-919.

7. Gene–environment interactions: Key to unraveling the mystery of Parkinson's disease.

Gao, Hui-Ming and Hong, Jau-Shyong. 1, 2011, Progress in Neurobiology, Vol. 94, pp. 1-19.

8. Gene-Environment Interactions in Depression Research. Monroe, Scott M and Reid, Mark

W. 10, 2008, Psychological Science, Vol. 19, pp. 947-956.

9. Colorectal Adenomas and the C677T MTHFR Polymorphism: Evidence for Gene-

Environment Interaction? Ulrich, Cornelia M, et al. 8, 1999, Cancer Epidemiology,

Biomarkers and Prevention, Vol. 8, p. 659.

10. A gene–environment interaction between smoking and shared epitope genes in HLA–DR

provides a high risk of seropositive rheumatoid arthritis. Padyukov, Leonid, et al. 10, 2004,

Arthritis and Rheumatism, Vol. 50, pp. 3004-3092.

11. Advances in Environmental Epidemiology. Tanner, Caroline M. 2010, Movement

Disorders, pp. S58-S52.

12. Logistic disease incidence models and case-control studies. Prentice, R L and Pyke, R. 3,

1979, Biometrika, Vol. 66, pp. 403-411.

13. The Design of Case-Control Studies: The influence of Confounding and Interaction Effects.

Smith, P G and Day, N E. 1984, International Journal of Epidemiology, pp. 356-365.

14. Does accounting for gene-environment interaction increase the power to detect the

effect of a gene in a multifactorial disease. Selinger-Leneman, Hana, et al. 2003, Genetic

Epidemiology, pp. 200-207.

15. Non-hierarchical Logistic Models and Case-Only Designs for Assessing susceptibility in

population-based case-control studies. Piegorsch, Walter W, Weinberg, Clarice R and

Taylor, Jack A. 1994, STATISTICS IN MEDICINE, Vol. 13, pp. 153-162.

16. Designing and analysing case-control studies to exploit independence of genotype and

exposure. Umbach, David M and Weinberg, Clarice R. 15, 1997, Statistics in Medicine, Vol.

16, pp. 1731-1743.

17. Semiparametric maximum likelihood estimation exploiting gene-environment

independence in case-control studies. Chatterjee, Nilanjan and Carroll, Raymond J. 2, 2005,

Biometrika, Vol. 92, pp. 399-418.

18. Epidemiology of Parkinson’s disease. Alves, Guido, et al. 5, 2008, Journal of Neurology,

Vol. 255, pp. 18-32.

19. Brain dopamine and the syndromes of Parkinson and Huntington: Clinical, morphological

and neurochemical correlations. Bernheimer, H, et al. 4, 1973, Journal of the Neurological

Sciences, Vol. 20, pp. 415-455.

20. Dopamine transporters and neuronal injury. Miller, Gary W, et al. 10, 1999, Trends in

Pharmacological Sciences, Vol. 20, pp. 424-429.

21. The Sydney multicentre study of Parkinson's disease: progression and mortality at 10

years. Hely, M A, et al. 1999, Journal of Neurology, Neurosurgery, and Psychiatry, pp. 300-307.

22. Increased life expectancy resulting from addition of l-deprenyl to Madopar® treatment in

Parkinson's disease: A longterm study. Birkmayer, W, et al. 1985, Journal of Neural

Transmission, pp. 113-127.

23. Economic burden associated with Parkinson's disease on elderly Medicare beneficiaries.

Noyes, Kate, et al. 3, 2006, Movement Disorders, Vol. 21, pp. 362-372.

24. Longitudinal study of the socioeconomic burden of Parkinson’s disease in Germany.

Winter, Y, et al. 9, 2010, European Journal of Neurology, Vol. 17, pp. 1156-1163.

25. Staging of brain pathology related to sporadic Parkinson’s disease. Braak, Heiko, et al. 2,

2003, Neurobiology of Aging, Vol. 24, pp. 197-211.

26. Parkinson's disease: genetics and pathogenesis. Shulman, JM, Jager, PL de and Feany,

MB. 2011, Annual Review of Pathology: Mechanisms of Disease, Vol. 6, pp. 193-222.

27. Use of a Refined Drug Tracer Algorithm to Estimate Prevalence and Incidence of

Parkinson's Disease in a Large Israeli Population. Chillag-Talmor, Orly, et al. 2011, Journal of

Parkinson's Disease, pp. 35-47.

28. LRRK2 G2019S as a Cause of Parkinson's Disease in Ashkenazi Jews. Ozelius, Laurie J, et

al. 2006, The New England Journal of Medicine, pp. 424-425.

29. Dynamics of Parkinsonism–Parkinson’s Disease in Residents of Adjacent Kibbutzim in

Israel’s Negev. Goldsmith, J R, et al. 1997, Environmental Research, pp. 156-161.

30. α-Synuclein Locus Triplication Causes Parkinson's Disease. Singleton, A B, et al. 2003,

Science, Vol. 302, p. 841.

31. Interaction of α-synuclein and tau genotypes in Parkinson's disease. Mamah, Catherine

E, et al. 3, 2005, Annals of Neurology, Vol. 57, pp. 439-443.

32. The LRRK2 Gly2385Arg variant is associated with Parkinson’s. Tan, E K, et al. 2007,

Human Genetics, pp. 857-863.

33. N-acetyltransferase-2 polymorphism in Parkinson’s disease: the Rotterdam study.

Harhangi, Sanjay B, et al. 1999, Journal of Neurology, Neurosurgery, and Psychiatry, pp.

518-520.

34. A study of five candidate genes in Parkinson’s disease and related neurodegenerative

disorders. Nicholl, D J, et al. 1999, Neurology, p. 1415.

35. Genome-wide association study reveals genetic risk underlying Parkinson's disease.

Simon-Sanchez, Javier, et al. 12, 2009, Nature Genetics, Vol. 41, pp. 1308-1314.

36. Gene-Gene Interaction Between FGF20 and MAOB in Parkinson Disease. Gao, X, et al. 2,

2008, Annals of Human Genetics, Vol. 72, pp. 157-162.

37. Genotype-Phenotype correlations between GBA mutations and Parkinson's disease risk

and onset. Gan-Or, Z, et al. 2008, Neurology, pp. 2277-2283.

38. Multicenter Analysis of Glucocerebrosidase Mutations in Parkinson's Disease. Sidransky,

E, et al. 2009, The New England Journal of Medicine, pp. 1651-1661.

39. APOE-ε2 allele associated with higher prevalence of sporadic Parkinson disease. Huang,

Xuemei, Chen, Peter C and Poole, Charles. 12, 2004, Neurology, Vol. 62, pp. 2198-2202.

40. The Apolipoprotein E ϵ4 Allele in Parkinson's Disease with Alzheimer Lesions.

Egensperger, R, et al. 1996, Biochemical and Biophysical Research Communications, pp. 484-

486.

41. Apolipoprotein E genotype as a risk factor for susceptibility to and dementia in

Parkinson’s Disease. Williams-Gray, Caroline, et al. 2009, Journal of Neurology, pp. 493-498.

42. Prognosis of Parkinson's Disease - Risk of Dementia and Mortality: The Rotterdam Study.

de Lau, Lonneke M L, et al. 2005, Archives of Neurology, pp. 1265-1269.

43. Apolipoprotein E controls the risk and age at onset of Parkinson disease. Li, Y J, et al.

2004, Neurology, pp. 2005-2009.

44. Dietary Fat Clearance in Normal Subjects is Regulated by Genetic Variation in

Apolipoprotein E. Weintraub, Moshe S, Eisenberg, Shlomo and Breslow, Jan L. 1987, Journal

Of Clinical Investigation, pp. 1571-1577.

45. The Apolipoprotein E Polymorphism: A Comparison of Frequencies and Effects in Nine

Populations. Hallman, Michael D, et al. 1991, American Journal of Human Genetics, pp. 338-

349.

46. Apolipoprotein-E Genotype and the Risk of Developing Cholelithiasis following Bariatric

Surgery: a Clue to Prevention of Routine Prophylactic Cholecystectomy. Abeid, Subhi Abu, et

al. 2002, Obesity Surgery, pp. 354-357.

47. Dose-dependent protective effect of coffee, tea, and smoking in Parkinson's disease: a

study in ethnic Chinese. Tan, E K, et al. 1, 2003, Journal of the Neurological Sciences, Vol.

216, pp. 163-167.

48. Environmental Risk Factors and Parkinson's Disease: A Metaanalysis. Priyadarshi,

Anumeet, et al. 2, 2001, Environmental Research, Vol. 86, pp. 122-127.

49. A Case-Control Study of Parkinson’s Disease in Urban Population of Southern Israel.

Herishanu, Yuval O, et al. 2001, The Canadian Journal of Neurological Sciences, pp. 144-147.

50. Occupational and Environmental risk factors for Parkinson's Disease. Lai, B C L, et al.

2002, Parkinsonism and Related Disorders, pp. 297-309.

51. Searching for a relationship between manganese and welding and Parkinson's disease.

Jankovic, J. 2005, Neurology, pp. 2021-2028.

52. Smoking, alcohol, and coffee consumption preceding Parkinson's disease. Benedetti, M

D, et al. 2000, Neurology, Vol. 55, pp. 1350-1358.

53. Association of Coffee and Caffeine Intake With the Risk of Parkinson Disease. Ross,

Webster G, et al. 20, 2000, The Journal of the American Medical Association, Vol. 283, pp.

2674-2679.

54. Coffee Consumption, Gender, and Parkinson’s Disease Mortality in the Cancer Prevention

Study II Cohort: The Modifying Effects of Estrogen. Ascherio, Alberto, et al. 10, 2004,

American Journal of Epidemiology, Vol. 160, pp. 977-984.

55. Coffee and tea consumption and the risk of Parkinson's disease. Hu, Gang, et al. 15,

2007, Movement Disorders, Vol. 22, pp. 2242-2248.

56. Prospective study of coffee consumption and risk of Parkinson's disease.

Saaksjarvi, K, et al. 7, 2008, European Journal of

Clinical Nutrition, Vol. 26, pp. 908-915.

57. Differential Effects of Black versus Green Tea on Risk of Parkinson's Disease in the

Singapore Chinese Health Study. Tan, Louis C, et al. 5, 2008, American Journal of

Epidemiology, Vol. 167, pp. 553-560.

58. Prospective Study of Cigarette Smoking and the Risk of Developing Idiopathic Parkinson's

Disease. Grandinetti, Andrew, et al. 12, 1994, American Journal of Epidemiology, Vol. 139,

pp. 1129-1138.

59. Exploring an interaction of adenosine A2A receptor variability with coffee and tea intake

in Parkinson's disease. Tan, E K, et al. 2006, American Journal of Medical Genetics, Vol. B,

pp. 634-636.

60. Neuroprotection by Caffeine and A2A. Chen, Jiang-Fan, et al. 2001, Journal of

Neuroscience, Vol. 21, pp. 1-6.

61. Smoking and Parkinson's disease. Godwin-Austen, R B, et al. 1982, Journal of Neurology,

Neurosurgery and Psychiatry, Vol. 45, pp. 577-581.

62. Smoking and Parkinson's disease. Gorell, Jay M, et al. 1, 1999, Neurology, Vol. 52, p. 115.

63. Parkinson's Disease Risks Associated with Cigarette Smoking, Alcohol Consumption, and

Caffeine Intake. Checkoway, Harvey, et al. 8, 2002, American Journal of Epidemiology, Vol.

155, pp. 732-738.

64. Risk and protective factors for Parkinson's disease: A study in Swedish twins. Wirdefeldt,

Karin, et al. 1, 2005, Annals of Neurology, Vol. 57, pp. 27-23.

65. Pooled Analysis of Tobacco Use and Risk of Parkinson Disease. Ritz, Beate, et al. 7, 2007,

Neurology, Vol. 64, pp. 990-997.

66. Smoking, nicotine and Parkinson's disease. Quik, Maryka. 9, 2004, Trends in

Neurosciences, Vol. 27, pp. 561-568.

67. A Meta-analysis of Coffee Drinking,. Hernan, Miguel A, et al. 3, 2002, Annals of

Neurology, Vol. 52, pp. 276-284.

68. Update in the epidemiology of Parkinson's disease. Elbaz, Alexis and Moisan, Frederic.

2008, Current Opinion in Neurology, Vol. 21, pp. 454-460.

69. Exploring gene-environment interactions in Parkinson’s disease. McCulloch, Collin C, et

al. 2008, Human Genetics, pp. 257-265.

70. Case-only study of interactions between genetic polymorphisms of GSTM1, P1, T1 and Z1

and smoking in Parkinson's disease. Deng, Y, et al. 2004, Neuroscience letters, pp. 326-331.

71. APOE-ε2 allele associated with higher prevalence of sporadic Parkinson disease. Huang,

Xuemei, Chen, Peter C. and Poole, Charles. 2004, Neurology, pp. 2198-2202.

72. Bhattacharjee, Samsiddhi, Chatterjee, Nilanjan and Wheeler, William. An R package

for analysis of case-control studies in genetic epidemiology. [Online] 2010.

http://127.0.0.1:16855/library/CGEN/html/CGEN.html.

73. The Design of Case-Control Studies: The Influence of Confounding and Interaction Effects.

Smith, P G and Day, N E. 3, 1983, International Journal of Epidemiology, Vol. 13, pp. 356-365.

74. An Epidemiologic Approach to Gene-Environment Interaction. Ottman, Ruth. 1990,

Genetic Epidemiology, Vol. 7, pp. 177-185.

75. Genome-wide meta-analyses identify multiple loci associated with smoking behavior.

Consortium, The Tobacco and Genetics. 2010, Nature Genetics, pp. 441-447.

76. Exploiting Gene-Environment Independence for Analysis of Case–Control Studies: An

Empirical Bayes-Type Shrinkage Estimator to Trade-Off between Bias and Efficiency.

Mukherjee, Bhramar and Chatterjee, Nilanjan. 2008, Biometrics, pp. 685-694.

77. Genome-wide association study reveals genetic risk underlying Parkinson's disease.

Simon-Sanchez, Javier, et al. 2009, Nature Genetics, Vol. 41, pp. 1308-1312.

78. Smoking, nicotine and Parkinson's disease. Quik, Maryka. 9, 2004, Trends in

Neurosciences, Vol. 27, pp. 561-568.

79. Minimum Sample Size Estimation to Detect Gene-Environment Interaction in Case-Control

Designs. Hwang, Shih-Jen, et al. 11, 1994, American Journal of Epidemiology, Vol. 140,

pp. 1029-1037.

80. Cigarette smoking increases the risk of rheumatoid arthritis: Results from a nationwide

study of disease-discordant twins. Silman, Alan J, Newman, Jason and Macgregor,

Alexander J. 1996, Arthritis and Rheumatism, pp. 732-735.

Abstract

Studies in recent years clearly show the effect of gene-environment interactions

on many chronic diseases. Including interactions in a study and its analysis requires

substantial statistical power. The aim of the present work is to investigate the

performance of logistic regression that assumes gene-environment independence in a

case-control study of Parkinson's disease in Israel. We used this regression to test for

interactions of cigarette smoking with APOE-Ɛ2 and of coffee drinking with APOE-Ɛ2

in Parkinson's disease, and compared it with classical logistic regression. The odds

ratio for the cigarette smoking-APOE-Ɛ2 interaction was 1.129 (95% CI 1.13-1.59). The

result for the coffee drinking-APOE-Ɛ2 interaction was not statistically significant,

OR=1.648 (95% CI 1.21-2.12). Owing to a possible dependence between smoking and

APOE-Ɛ2, the comparison of the two logistic regressions showed the classical logistic

regression to be preferable in this case, despite its inability to detect main effects.

In light of current challenges in the study of chronic diseases, further development of

statistical tools for predicting interactions in case-control studies is required.