ORIGINAL PAPER
Extensions to the Modeling of Initiation and Progression: Applications to Substance Use and Abuse
Michael C. Neale · Eric Harvey · Hermine H.M. Maes · Patrick F. Sullivan · Kenneth S. Kendler
Received: 9 June 2005 / Accepted: 17 February 2006 / Published online: 8 June 2006
© Springer Science+Business Media, Inc. 2006
Abstract Twin data can provide valuable insight into the
relationship between the stages of phenomena such as
disease or substance abuse. Initiation of substance use may
be caused by factors that are the same as, partially shared
with, or completely independent of those that cause
progression from use to abuse. Comparison of rates of
progression among the cotwins of twins who do vs. do not
initiate provides indirect information about the relationship
between initiation and progression. Existing models for this
relationship have been difficult to extend because they are
usually expressed in terms of explicit integrals. In this
paper, the problem is overcome by regarding the analysis
of twin data on initiation and progression as a special case
of missing data, in which individuals who do not initiate
are regarded as having missing data on progression measures.
Using the general framework for the analysis of
ordinal data with missing values available in Mx makes
extensions that include other variables much easier.
Modeling the effects of continuous covariates such as age on
initiation and progression becomes simple. Also facilitated are the
examination of initiation and progression in two or more
substances, and transition models with two or more steps.
The methods are illustrated with data on the effects of
cohort on liability to cannabis use and abuse, bivariate
analysis of tobacco use and dependence and cannabis use
and abuse, and the relationships between initiation of
smoking, regular smoking and nicotine dependence. Other
suitable applications include the relationship between
symptoms and diagnosis, such as fears and the progression
to phobia.
Keywords Age · Cannabis · Comorbidity · Dependence · Genetics · Initiation · Method · Missing data · Missingness · Model · Nicotine · Substance abuse · Substance use · Tobacco · Twins
Edited by Michael Stallings

Michael C. Neale (✉) · Hermine H.M. Maes · Kenneth S. Kendler
Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, 980126, Richmond, VA 23298-0126, USA
e-mail: [email protected]
Tel.: +1-804-8283369
Fax: +1-804-8281471

Michael C. Neale · Hermine H.M. Maes · Kenneth S. Kendler
Department of Psychiatry, Virginia Commonwealth University, Richmond, VA, USA

Michael C. Neale · Kenneth S. Kendler
Department of Human Genetics, Virginia Commonwealth University, Richmond, VA, USA

Michael C. Neale
Department of Psychology, Virginia Commonwealth University, Richmond, VA, USA

E. Harvey
Department of Environmental Sciences and Engineering, University of North Carolina at Chapel Hill, Chapel Hill 27599 NC, USA

Patrick F. Sullivan
Department of Genetics Psychiatry and Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill 27599 NC, USA

Behav Genet (2006) 36:507–524
DOI 10.1007/s10519-006-9063-x

Introduction
An important class of measurements describes two-stage
phenomena, such as initiation of substance use and progression
to substance abuse or addiction. Many of these
phenomena involve conditional processes where an initial
“gateway” event necessarily precedes the development of
a subsequent outcome. This class is broad; it includes
seeking treatment given presence of a disorder, develop-
ment of a phobic disorder given an unreasonable fear, the
transition from mild symptom expression to clinical dis-
order, exposure to a risk factor such as combat stress, and
subsequent development of post-traumatic stress disorder
(PTSD) symptoms or diagnosis. Diagnostic criteria for
many psychiatric disorders have inherent hierarchies:
binging is a necessary but not sufficient condition for a
diagnosis of Bulimia Nervosa; and childhood conduct
disorder is a prerequisite for an adult diagnosis of antisocial
behavior. While the transitions from one point to the next
on a Likert scale (such as from “Never” to “Rarely” to
“Often”) are usually regarded as changes on the same
underlying continuum, it is possible that discontinuities
exist and that different factors are related to changes at
different points on the scale. The possible absence of a
single continuum may be more obvious in the case of
interview questions that have a stem and follow-up format,
such as “Have you ever...” with a set of subsequent items
that are asked only if the respondent replies in the
affirmative to the stem question. For simplicity, we will refer to
these two stages of development as initiation and pro-
gression, but they clearly apply to a wide variety of mea-
surement contexts.
A popular approach to the examination of initiation and
progression is to use data collected from pairs of relatives
(Heath et al., 1998; Kendler et al., 1999). Essentially, the
method compares the rate of progression in the relatives of
initiators with the rate of progression in the relatives of
non-initiators. As long as there is some correlation between
relatives for initiation, it is possible to obtain an estimate of
the strength of the relationship between initiation and
progression within an individual. This proxy form of
information indirectly indexes the within-person correla-
tion between liability to initiate and liability to develop
dependence once initiation has begun.
In the classical twin study of MZ and DZ twins reared
together, the sources of variation are decomposed into
additive genetic, random environment and either domi-
nance genetic or common environment components. With
multivariate data it is also possible to decompose the
covariance between traits into the same (three) compo-
nents. In principle, therefore, one might expect to be able to
partition covariation between liability to initiation and
liability to progression into the same components. How-
ever, the lack of direct information on the within-person
resemblance for initiation and progression prevents a full
three-way decomposition. Those who do not initiate do not
have an opportunity to express the progression phenotype;
data on progression is missing in all those who do not
initiate. Nevertheless, it is possible to get an estimate of the
relationship between liability to initiation and to progres-
sion via structural equation modeling of twin data.
Figure 1 shows a path diagram of a model that specifies
a direct causal path from initiation to progression within
each member of a twin pair. This model is identified with
data collected from MZ and DZ twins. Applications to date
include alcohol, tobacco, cannabis and stimulants (Heath
and Martin, 1993; Heath et al., 1997; Kendler and Prescott,
1998; Kendler et al., 1999; Koopmans et al., 1999). A
common finding is that there are additive genetic, shared
and specific environmental influences on initiation, whereas
factors specific to progression appear to be influenced by
additive genetic and specific environment factors, with little
role for the shared environment. The study of Kendler et al.
(1999) was something of an exception to
this pattern, finding little evidence for the shared environment
for initiation. However, initiation in that study was
defined as initiation of regular smoking rather than any
tobacco use whatsoever.

Fig. 1 Causal model of the relationship between initiation and
progression for an individual. The model requires cotwins and
specification for MZ and DZ twins for identification.
[Path diagram: latent factors AI, CI and EI (variances fixed at 1.00) influence initiation (I) through paths ai, ci and ei; latent factors AP, CP and EP (variances fixed at 1.00) influence progression (P) through paths ap, cp and ep; path b runs from I to P.]
Certain submodels of the causal model shown in Figure 1
have important substantive meaning. First, if the
pathway from initiation to progression is zero, risk factors
for initiation and progression are entirely independent of
each other. Second, if this path has a standardized value of
unity (no residual variation in progression) then initiation
and progression represent different thresholds on a single
continuum of liability. These two models correspond to the
independent liability dimensions model and the single
liability dimension model of Heath et al. (1993). The full
model described here is an alternative to the combined
liability dimension model of Heath et al. It may be termed
a “causal contingent model” as it is essentially a direction
of causation model (initiation liability causes progression
liability) (Neale and Cardon, 1992), but it is applied to data
where progression is contingent on initiation.
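
As a sketch of what the causal contingent model implies, the path diagram of Figure 1 can be written out as equations. The path labels follow the figure; the coefficient \alpha (1 for MZ and 1/2 for DZ pairs), the subscripts 1 and 2 for the two twins, and the usual assumptions that C is fully shared and E is unshared by cotwins are notation introduced here for illustration.

```latex
\begin{aligned}
I &= a_i A_I + c_i C_I + e_i E_I, \\
P &= b\,I + a_p A_P + c_p C_P + e_p E_P, \\
\operatorname{Var}(I) &= a_i^2 + c_i^2 + e_i^2, \\
\operatorname{Cov}(I_1, I_2) &= \alpha a_i^2 + c_i^2, \\
\operatorname{Cov}(I_1, P_1) &= b\,\operatorname{Var}(I), \\
\operatorname{Cov}(I_1, P_2) &= b\,(\alpha a_i^2 + c_i^2), \\
\operatorname{Cov}(P_1, P_2) &= b^2(\alpha a_i^2 + c_i^2) + \alpha a_p^2 + c_p^2 .
\end{aligned}
```

Under these assumptions, b = 0 removes all covariance between initiation and progression (independent liability dimensions), while a_p = c_p = e_p = 0 leaves P = bI (a single liability dimension). The ratio Cov(I_1, P_2) / Cov(I_1, I_2) equals b, which is why some correlation between relatives for initiation is needed to estimate the path.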
In this article we describe some extensions of this
important and widely applicable model using missing data
methods. The first of these new models is the addition of
external indices of risk for initiation or progression. The
second is the bivariate case, in which comorbidity for
initiation and progression between two substances may be
examined. The third is multiple stages, such as initiation,
addiction and recovery.
The Univariate Model
One way to consider data on initiation and progression
from pairs of relatives is as a 4 × 4 contingency table as
shown in Table 1a. As noted above, it is not possible to
discriminate progression from non-progression in those
who do not initiate. Therefore the 4 × 4 theoretical table of
outcomes (Table 1a) collapses to a 3 × 3 table of observable
outcomes (Table 1b). One approach to modeling the relationship
between the liabilities to initiation and progression
in pairs of relatives is to assume that there is a multivariate
normal distribution underlying the process. Individuals
whose liability is above a threshold, t_I, on the initiation
dimension initiate the process, e.g., they start to smoke
tobacco. Those who have initiated and are also above
threshold t_P on the progression dimension progress, e.g.,
they become nicotine dependent. This model is attractive
because it allows for a variety of models of covariation
between initiation and progression (Heath, 1990; Heath and
Martin, 1993; Kendler et al., 1999; Koopmans et al., 1997;
Meyer et al., 1992; True et al., 1997), of which the causal
model (Kendler et al., 1999; Neale and Cardon, 1992) has
proven to be popular. The model generates a predicted
covariance matrix Σ which is a function of a vector of
parameters of additive genetic, common environment,
specific environment and the causal path from initiation to
progression. This model can be specified in a variety of
ways, including through the graphical user interface of Mx
(Neale et al., 1999, http://www.views.vcu.edu/mx).
Appendix A contains an Mx script which implements the
model specification in matrix form.
Table 1 Crosstabulation of possible outcomes for a pair of relatives assessed on dichotomous initiation and progression variables

a) Theoretical cells

Twin 1                     Twin 2
                           Initiation: No              Initiation: Yes
Initiation   Progression   Prog.: No     Prog.: Yes    Prog.: No     Prog.: Yes
No           No            1 (0000)      2 (0001)      3 (0010)      4 (0011)
No           Yes           5 (0100)      6 (0101)      7 (0110)      8 (0111)
Yes          No            9 (1000)      10 (1001)     11 (1010)     12 (1011)
Yes          Yes           13 (1100)     14 (1101)     15 (1110)     16 (1111)

b) Observable cells

Twin 1                     Twin 2
                           Initiation: No    Initiation: Yes
Initiation   Progression   Prog.: ?          Prog.: No     Prog.: Yes
No           ?             1+2+5+6           3+7           4+8
Yes          No            9+10              11            12
Yes          Yes           13+14             15            16
Given the predicted covariance matrix Σ, the expected
cell frequencies for the 16 cells in Table 1 (panel a) may be
written as

$$
E(\text{Cell } abcd) = \int_{t_a}^{t_{a+1}} \int_{t_b}^{t_{b+1}} \int_{t_c}^{t_{c+1}} \int_{t_d}^{t_{d+1}} \Phi(x_a, x_b, x_c, x_d)\, dx_d\, dx_c\, dx_b\, dx_a \tag{1}
$$
where a and b (scored 0 or 1) index the initiation and progression
of relative 1, and c and d index initiation and progression
for relative 2. Mathematical description of the
proportions under the normal distribution with a single
threshold can be achieved using three quantities for the limits
of integration: −∞, t and +∞, which we label as Threshold 0,
Threshold 1 and Threshold 2. For the initiation dimension in
both twins, the threshold is denoted t_I, and for the progression
dimensions it is t_P. These definitions enable us to use the
binary identification code for a cell in Table 1 (panel a) to
define the integral that describes its predicted proportion. For
example, cell 8 of Table 1 (panel a) has a binary identification
of abcd = 0111, so the expected cell frequency is
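
$$
\begin{aligned}
E(\text{Cell } 0111)
&= \int_{t_0}^{t_1} \int_{t_1}^{t_2} \int_{t_1}^{t_2} \int_{t_1}^{t_2} \Phi(I_1, P_1, I_2, P_2)\, dP_2\, dI_2\, dP_1\, dI_1 \\
&= \int_{-\infty}^{t_I} \int_{t_P}^{\infty} \int_{t_I}^{\infty} \int_{t_P}^{\infty} \Phi(I_1, P_1, I_2, P_2)\, dP_2\, dI_2\, dP_1\, dI_1 .
\end{aligned}
\tag{2}
$$
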
This cell cannot be observed directly, because progression
cannot be measured in twin 1, who is below threshold
for initiation. Pairs of this type will be indistinguishable
from pairs where twin 1 is below threshold on both initiation
and progression. Thus the predicted frequency for
pairs where twin 1 is below threshold for initiation and
twin 2 is above threshold on both dimensions (cell 4+8 in Table 1,
panel b) is the sum of the expected frequencies for cells 4
and 8 in Table 1 (panel a). The sum of these two four-dimensional
integrals may be expressed as a three-dimensional
integral which may be evaluated more rapidly:

$$
E(\text{Cell } 0111) + E(\text{Cell } 0011) = \int_{-\infty}^{t_I} \int_{t_I}^{\infty} \int_{t_P}^{\infty} \Phi(I_1, I_2, P_2)\, dP_2\, dI_2\, dI_1 \tag{3}
$$

because

$$
\int_{-\infty}^{t} \phi(x)\, dx + \int_{t}^{\infty} \phi(x)\, dx = \int_{-\infty}^{\infty} \phi(x)\, dx = 1. \tag{4}
$$

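
The collapse of the 16 theoretical cells into the 9 observable cells can be sketched in a few lines of Python. This is an illustration only, not part of the Mx script in Appendix A; the function names `observable_code` and `collapse` are hypothetical, and progression is simply recorded as `None` whenever the same twin did not initiate.

```python
from itertools import product

def observable_code(a, b, c, d):
    """Map a theoretical cell abcd onto the outcome that can be observed.

    a, c: initiation (0/1) of twins 1 and 2; b, d: their progression.
    Progression is unobservable (None) for a twin who did not initiate.
    """
    prog1 = b if a == 1 else None
    prog2 = d if c == 1 else None
    return (a, prog1, c, prog2)

def collapse(theoretical):
    """Sum theoretical cell values (counts or probabilities) that are
    indistinguishable in the observed data."""
    observed = {}
    for code, value in theoretical.items():
        key = observable_code(*code)
        observed[key] = observed.get(key, 0.0) + value
    return observed

# Number the cells 1..16 in the order used in Table 1a (0000, 0001, ...)
# and record which theoretical cells share an observable outcome.
groups = {}
for number, code in enumerate(product((0, 1), repeat=4), start=1):
    groups.setdefault(observable_code(*code), []).append(number)

for key, members in groups.items():
    print(key, members)
```

Cells 4 (0011) and 8 (0111) fall under the same key, `(0, None, 1, 1)`, which is the merged cell whose probability Equation (3) gives as a single three-dimensional integral.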
The Mx script in Appendix A evaluates these integrals
explicitly, using the mnor function, which takes a covari-
ance matrix, a vector of means, a vector of lower thresh-
olds, a vector of upper thresholds, and a vector of flags
indicating the type of integration required. mnor returns the
value of the integral requested, which is the area under the
curve in the case of a univariate function, a volume in the
case of a bivariate function, and a hypervolume in the case
of trivariate or higher order multivariate functions.
Equivalence of Missing Data Approach
Although it is possible to use the function mnor to evaluate
integrals of the type given in Equation (3) in Mx, it becomes
increasingly inconvenient to do so when we wish to extend
the model to the multivariate case, or when the number of
subcategories of progression is large. Much simpler speci-
fication can be obtained by recognizing that the situation is
a special case of missing data. Individuals below threshold
for initiation have unknown status for progression, so they
have a missing value for progression. The maximum like-
lihood approach for dealing with missing data implemented
in Mx does precisely the same thing as is done by the
manual addition of integrals described above. In Mx the
user specifies a model (a formula that yields an m × m
matrix) for the covariance structure, and another (1 × m)
matrix formula for the thresholds. These formulae describe
a general case that has no missing data. The likelihood of a
binary data vector is computed by evaluating an m-dimensional
integral of the form of Equation (3). When there are
missing data, the covariance matrix is “filtered” to yield a
covariance matrix containing only those elements corresponding
to the variables that are present (rows and columns
corresponding to variables that are missing are deleted).
Likewise, the threshold matrix is filtered to delete those
thresholds corresponding to the variables that are missing.
The likelihood, a reduced dimension integral, is equivalent
to summing over all the possibilities, as in Equation (4).
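
The filtering step can be sketched as follows. This is a minimal illustration of the idea and not Mx code; the function name `filter_model`, the variable order (I1, P1, I2, P2) and all numerical values are assumptions made for the example.

```python
def filter_model(cov, thresholds, record):
    """Delete the rows/columns of cov and the entries of thresholds that
    correspond to missing observations (marked None) in one data record."""
    keep = [k for k, value in enumerate(record) if value is not None]
    cov_f = [[cov[r][c] for c in keep] for r in keep]
    thr_f = [thresholds[k] for k in keep]
    data_f = [record[k] for k in keep]
    return cov_f, thr_f, data_f

# Hypothetical model-implied covariance matrix, variable order I1, P1, I2, P2.
cov = [[1.0, 0.6, 0.5, 0.3],
       [0.6, 1.0, 0.3, 0.4],
       [0.5, 0.3, 1.0, 0.6],
       [0.3, 0.4, 0.6, 1.0]]
thresholds = [0.2, 1.0, 0.2, 1.0]   # hypothetical t_I, t_P for each twin

# Twin 1 did not initiate, so P1 is missing; twin 2 initiated and progressed.
record = [0, None, 1, 1]
cov_f, thr_f, data_f = filter_model(cov, thresholds, record)
print(cov_f)   # 3 x 3 matrix for I1, I2, P2
print(thr_f)
```

The likelihood of this record is then a three-dimensional integral over (I1, I2, P2), which matches the form of Equation (3).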
It is therefore possible to specify a model for the co-
variances, a model for the thresholds and to enter the data
in rectangular ordinal file format for raw ordinal maximum
likelihood analysis. This approach has tremendous advan-
tages in terms of simplicity and flexibility. First, we can
take advantage of being able to model the thresholds as a
function of covariates such as age. Second, we can exploit
the general treatment of ordinal variables in Mx, simply by
increasing the size of the threshold formula. For example,
if there are eight ordered forms of progression, changing
the model simply requires increasing the number of rows of
the threshold matrix formula to accommodate the new
thresholds. In contrast, the required modifications to the
mnor based script of Appendix A would be substantial,
involving 81 formulae for the relevant integrals. Third, it
becomes much easier to use existing Mx scripts for more
complicated forms of twin data analysis, such as G × E
interaction, sex limitation, rater bias, QTL effects and
multivariate models. Examples of these three classes of
extension are presented below.
Analysis of Covariates via Means
A very simple extension to the causal contingent model is
to include the effects of subject age. One approach to this
problem was presented by Neale et al. (1989) in which age
was added as an observed variable that caused the pheno-
types of twins. Although the method appears to work rea-
sonably well, there can be a problem if age is not normally
distributed in the sample. The general assumption of
multivariate normality would be violated, which would
cause goodness-of-fit statistics to be biased. To circumvent
this problem, it is possible to remove the effects of age by
modeling the threshold of subject i, t_i, as a simple linear
function:

$$
t_i = t + \mathrm{age}_i \, t_a
$$

where t is the population baseline threshold (for individuals
of age zero), t_a models the regression of the threshold on
age, and age_i is the age in years of individual i at assessment.
It is possible to implement this type of “multilevel”
analysis in Mx (Neale et al., 1999) via definition variables.
In this example the model for the effects of age at
assessment is linear, but more complex forms, such as
quadratic or logistic would be easy to specify.
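
The linear threshold model can be sketched numerically as below. The sketch assumes a standard normal liability (unit variance), so that the prevalence is the area above the threshold; the function names and the parameter values are hypothetical and chosen only to show the direction of the effect.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def threshold(age, baseline, slope):
    """Linear threshold model: baseline threshold plus age times slope."""
    return baseline + age * slope

def prevalence(age, baseline, slope):
    """Proportion of a standard normal liability above the threshold."""
    return 1.0 - normal_cdf(threshold(age, baseline, slope))

# Hypothetical baseline threshold and age regression (not estimates).
baseline, slope = -0.8, 0.02
for age in (20, 40, 60):
    print(age, round(threshold(age, baseline, slope), 2),
          round(prevalence(age, baseline, slope), 3))
```

A positive slope raises the threshold with age and therefore lowers the expected prevalence in older respondents. A quadratic model would add one more term to `threshold`.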
An analogous approach was used to model the effects of
age in the continuous variable case in previous reports
(Neale et al., 2000; Zhu et al., 1999), where there is a
change in mean as a function of age. A more complex
approach to modeling thresholds as a function of age using
multiple groups was presented by Pickles et al. (1994). In
the present case, initiation and progression are treated as a
bivariate twin problem, and therefore thresholds for both
initiation and progression are allowed to vary as separate
linear functions of age.
Application: Age Effects on Cannabis Consumption
Sample and Measures
Illustrative data for age effects come from a study of ge-
netic and environmental risk factors for common psychi-
atric and substance use disorders in Caucasian female–
female twin pairs from the population-based Virginia Twin
Registry (Kendler et al., 1999). Telephone interviews were
completed on 1942 of 2293 (86.1% of eligible) individual
twins. Lifetime cannabis use and abuse was assessed by
modules modified from the Structured Clinical Interview
for DSM-IIIR (SCID; Spitzer et al., 1987). Use and abuse/
dependence were coded as binary variables. Cannabis use
was defined as lifetime use of cannabis (hashish or mari-
juana). Abuse was defined when subjects reported at least
one of the following criteria: use in dangerous situations;
legal problems arising from use; social problems arising
from use; or ignoring work or other significant obligations.
Dependence was defined when at least three of the following
symptoms were reported: use despite it causing physical
problems; use in larger quantities than intended; unsuc-
cessful attempts to cut down or quit; spending large amounts
of time obtaining the drug; tolerance to the drug’s effects; or
withdrawal symptoms experienced on cessation of use.
Results of fitting the causal model without age effects
were presented by Kendler et al. (1999). This earlier
analysis was restricted to pairs in which both twins par-
ticipated and provided responses. Here we use data from
incomplete pairs in which one twin participated and their
cotwin did not. If these incomplete twin pairs do not differ
from the complete pairs in their means or variances, then
the data are missing completely at random (MCAR) (Little
and Rubin, 1987). Under MCAR, the addition of these
pairs will not substantially affect the parameter estimates
but they will increase the precision of the estimate of the
threshold. A second possibility is that the unmatched pairs
are not representative of the population, but the values of
the non-missing twin predict the missingness of the cotwin,
along with some completely random missingness. This is
known as missing at random (MAR) (Little and Rubin,
1987). Here, use of unmatched twins will lead to changes
in the estimates of the parameters as well as increasing
their precision. In total, data from 647 MZ pairs (499
complete) and 450 DZ pairs (327 complete) were used.
Method
The original analyses by Kendler et al. (1999) were conducted
using a minimum χ² loss function which has some
advantages when cell frequencies are small (Agresti, 1990).
In this article we will use the maximum likelihood loss
function, and will compare the estimates obtained with the
two approaches. A third possibility is to use the power
divergence statistic with λ = 2/3, which has been shown to
have advantageous properties when cell frequencies are low
(Read and Cressie, 1988). The
minimum χ² and maximum likelihood functions are special
cases of the power divergence statistic, with λ = 1 and
λ → 0, respectively. The Mx script shown in Appendix B was
used to fit the model with age/cohort effects.
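
The relationship between the three loss functions can be illustrated with a short sketch. It assumes the standard Cressie-Read form of the power divergence statistic, 2/(λ(λ+1)) Σ O[(O/E)^λ − 1], with observed and expected counts that have equal totals; the function name and the counts are hypothetical, and the code is not taken from Mx.

```python
import math

def power_divergence(observed, expected, lam):
    """Cressie-Read power divergence statistic for cell counts.

    Assumes sum(observed) == sum(expected). lam = 1 gives Pearson's
    chi-square and lam -> 0 the likelihood-ratio statistic. Cells with a
    zero observed count contribute nothing for lam >= 0 and are skipped.
    """
    total = 0.0
    for o, e in zip(observed, expected):
        if o == 0:
            continue
        log_ratio = math.log(o / e)
        if abs(lam) < 1e-12:
            total += o * log_ratio          # limiting case lam -> 0
        else:
            total += o * math.expm1(lam * log_ratio) / (lam * (lam + 1.0))
    return 2.0 * total

# Hypothetical observed and expected counts with equal totals.
obs = [10, 20, 30]
exp = [20, 20, 20]
pearson = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
g2 = 2.0 * sum(o * math.log(o / e) for o, e in zip(obs, exp))

print(power_divergence(obs, exp, 1.0), pearson)   # Pearson chi-square
print(power_divergence(obs, exp, 0.0), g2)        # likelihood ratio
print(power_divergence(obs, exp, 2.0 / 3.0))      # Read and Cressie's choice
```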
Results
The first two rows of Table 2 show results of analysing the
exact same data with the maximum likelihood and minimum
χ² loss functions. While the results are not identical,
they are very similar and suggest that the use of maximum
likelihood with these sample sizes and this model is not
likely to generate substantial bias in parameter estimates.
This argument is supported by the fact that estimates with
ML are well within the confidence intervals of those obtained
under minimum χ². The third row of Table 2 shows
that including pairs with missing data increases the estimate
of the causal path from use to abuse, with a concomitant
reduction in the proportion of variance accounted
for by residual components. Addition of age effects on the
two thresholds substantially improves the fit of the model
(the difference between the two models' −2 ln L, asymptotically
distributed as χ² with 2 degrees of freedom, is 26.30, p < .001). Parameter
estimates show that the thresholds increase (and prevalence
decreases) with age in years according to the formulae
t_I = −.797 + .023 × age (95% CI = .013; .029) and t_P = 0.999
+ .012 × age (95% CI = −.002; .024), indicating significant
effects of age for initiation only. The broader confidence
intervals on the age effect for persistence are consistent
with the lower frequency of persistence in the sample.
Mean bootstrap parameter estimates and their 95% CI's,
estimated directly from the 2.5% and 97.5% quantiles of 348
bootstrap estimates, are shown in the fifth row of the table. The
lower confidence limits reach zero for the common
environment effects on initiation and for
all components of variance for abuse. This is not surprising
given that the value of .96 for the upper CI of the regression
of risk for abuse on initiation is close to unity. Results
from using the power divergence statistic (row PD in
Table 2) are generally consistent with those of minimum
χ², and differ only slightly from the maximum likelihood
parameter estimates, indicating that the ML estimation
procedure is relatively robust to the observed cell frequencies
in this case.
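
As a worked check of the likelihood-ratio comparison for the age effects reported above, the p-value can be computed in closed form, because the upper tail of a χ² distribution with 2 degrees of freedom is exp(−x/2). The reading of the test as having 2 degrees of freedom (one age regression per threshold) is taken from the text.

```latex
p = \Pr\left(\chi^2_2 > 26.30\right) = \exp\left(-\tfrac{26.30}{2}\right) = e^{-13.15} \approx 1.9 \times 10^{-6}
```

This is consistent with the reported p < .001.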
The addition of age effects on the prevalence of initiation
and progression does not affect the estimate of
b², the squared causal path from initiation to progression. This
proportion of variance in progression varies in a narrow
range (.66 to .74) across all the analyses, indicating good
agreement on the point estimate of this component. The
proportion of variance associated with the common environment
(c²) is only slightly reduced by the addition of age,
which implies that most c² is not due to changes in availability
of cannabis across the cohort years assessed in this
sample. The confidence intervals on the transmission path
indicate that the hypothesis that the factors are independent
can be rejected, as can the hypothesis that they represent
points on a single liability dimension.
Multivariate Model
Often it is not sufficient to model external variables as
covariates because a deeper understanding of their rela-
tionship with substance use or abuse is desired. For
example, quite different psychobiological conclusions
would be drawn if the relationship between drug use and
the personality factor neuroticism was due to environ-
mental correlation than if it was due to shared genetic
factors. Furthermore we may wish to understand the rela-
tionship between the use of different substances, whether
initiation of one substance is a risk factor for initiation of a
second, or if initiation of one substance increases risk for
dependence on another. Development of a multivariate
model is necessary to test hypotheses of this type.
The simpler missing data approach extends directly to
the multivariate case, which makes possible the consider-
ation of alternative models for the covariance between
initiation and progression of two or more substances or
traits. Perhaps the most straightforward case is where we
estimate parameters that can be used to partition the
covariance between the traits into genetic and environmental
components.

Table 2 Results of fitting the conditional causal model to data on
initiation and subsequent abuse (progression) of cannabis. Parameter
estimates were obtained using minimum χ² (Min χ²), maximum
likelihood (ML), and power divergence (PD) fit statistics on data from
complete pairs. The effects of including pairs in which data on use are
missing for one of the twins (ML miss) are also shown, together with
estimates when the effects of age on the threshold are modeled (ML
miss age). Averaged bootstrap estimates for this model (which were
computed by sampling the dataset with replacement and re-analyzing,
using the Mx bootstrap options) are denoted ML miss age bs. The
parameters subscripted I and M refer to initiation and abuse,
respectively; a is additive genetic, c is shared environment, e is
random environment, and b is the causal path from initiation to
progression.

Method            Initiation                        b²         Progression
                  a²_I       c²_I       e²_I                   a²_M       c²_M       e²_M
Min χ²            .46        .29        .25         .66        .17        .00        .17
ML                .48        .28        .25         .71        .14        .00        .15
ML miss           .48        .28        .25         .73        .17        .00        .16
ML miss age       .48        .26        .26         .73        .16        .00        .16
ML miss age bs    .49        .26        .26         .74        .12        .01        .15
95% CI's          .18; .77   .00; .53   .18; .34    .43; .96   .00; .33   .00; .09   .00; .36
PD                .46        .29        .25         .68        .17        .00        .15

The Cholesky decomposition is useful
for this purpose as it is numerically robust and easy to
specify (Neale and Cardon, 1992). In common with the
univariate treatment, the genetic and environmental factors
for initiation and progression are assumed to be uncorrelated
(see footnote 1), and only within-trait causal paths are specified, as
shown in Figure 2a. It is the job of model fitting to find the
combination of these components that best matches the
observed pattern of familial resemblance in the data.
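
How a Cholesky parameterization partitions the covariance between two traits can be sketched as follows. This is a generic bivariate example and not the model of Figure 2a or the script in Appendix C; the path coefficients are hypothetical values chosen for illustration.

```python
import math

def lower_times_transpose(paths):
    """Return L * L' for a lower-triangular matrix of path coefficients."""
    n = len(paths)
    return [[sum(paths[i][k] * paths[j][k] for k in range(n))
             for j in range(n)] for i in range(n)]

def correlation(cov, i, j):
    """Correlation implied by a covariance matrix."""
    return cov[i][j] / math.sqrt(cov[i][i] * cov[j][j])

# Hypothetical lower-triangular paths for two traits.
a_paths = [[0.6, 0.0], [0.4, 0.3]]   # additive genetic
c_paths = [[0.5, 0.0], [0.2, 0.1]]   # common environment
e_paths = [[0.6, 0.0], [0.1, 0.8]]   # specific environment

A = lower_times_transpose(a_paths)
C = lower_times_transpose(c_paths)
E = lower_times_transpose(e_paths)
P = [[A[i][j] + C[i][j] + E[i][j] for j in range(2)] for i in range(2)]

r_g = correlation(A, 0, 1)            # genetic correlation
shares = {name: m[0][1] / P[0][1]     # share of the phenotypic covariance
          for name, m in (("A", A), ("C", C), ("E", E))}
print(r_g, shares)
```

Because each component is built as L times its transpose, the implied A, C and E matrices cannot have negative variances, which is one reason the parameterization is numerically convenient.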
There are several alternative genetic models of comor-
bidity between initiation and progression of two sub-
stances. In principle, all the general comorbidity models of
Neale and Kendler (1995) could be applied to this type of
data. Here we use the reciprocal causal model as it provides
a simple examination of whether liability to use of one
substance is a risk factor for use or abuse of another, or vice
versa. Figure 2b shows this reciprocal causal model for an
individual. It is important to note that this model is at the
level of liability, not at the level of expression of the
phenotype per se, and is therefore not the best represen-
tation of the gateway hypothesis of substance abuse. To
clarify, suppose that an environmental factor causes a
change of +1 to the z-score of an individual’s liability to
use of substance X. In the causal model used here, this
environmental factor would have the same effect on lia-
bility to use and abuse of other substances regardless of the
individual’s initial liability. By contrast, under a gateway
hypothesis, it is assumed that it is only use of substance X
itself that causes an increase in the use or abuse of other
substances. Therefore, the environmental factor would only
have an effect on the use of substance X if the prior lia-
bility to use X was at most one unit less than the threshold
for use. The environmental factor would cause a change in
the use status of the individual, with possible concomitant
changes in liability to other substances. This more explicit
gateway model is not used here.
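
For the liability-level reciprocal model, the implied covariance between two liabilities can be written compactly. The sketch below considers only two liabilities, y_1 and y_2, with reciprocal paths; the symbols \mathbf{B}, \boldsymbol{\Psi} and \boldsymbol{\zeta} are notation assumed here for illustration and are not the labels of Figure 2b.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{B}\mathbf{y} + \boldsymbol{\zeta},
\qquad
\mathbf{B} = \begin{pmatrix} 0 & b_{12} \\ b_{21} & 0 \end{pmatrix},
\qquad
\operatorname{Cov}(\boldsymbol{\zeta}) = \boldsymbol{\Psi}, \\
\mathbf{y} &= (\mathbf{1} - \mathbf{B})^{-1} \boldsymbol{\zeta},
\qquad
(\mathbf{1} - \mathbf{B})^{-1} = \frac{1}{1 - b_{12} b_{21}}
\begin{pmatrix} 1 & b_{12} \\ b_{21} & 1 \end{pmatrix}, \\
\operatorname{Cov}(\mathbf{y}) &= (\mathbf{1} - \mathbf{B})^{-1} \boldsymbol{\Psi} \left[(\mathbf{1} - \mathbf{B})^{-1}\right]^{\top}.
\end{aligned}
```

Here \mathbf{1} is the identity matrix and \boldsymbol{\zeta} collects the residual genetic and environmental contributions to each liability. The expression requires b_{12} b_{21} \neq 1, and every unit change in a liability is transmitted regardless of whether the use threshold is crossed, which is the sense in which this is a liability-level model and not a gateway model.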
Application: Tobacco and Cannabis Smoking
Sample and Measures
The data for these analyses are taken from the third inter-
view wave of a population-based longitudinal study of
female twins (1,898 individuals, 851 complete pairs).
Sample ascertainment and the smoking measures are
described in detail elsewhere (Kendler et al., 1993, 1999).
Briefly, we defined smoking initiation as having ever
smoked a single cigarette. Regular smoking was defined as
a pattern of use in which the respondent smoked an average
of seven cigarettes per week for at least 4 weeks. Nicotine
dependence (ND) was defined according to scores on the
Fagerstrom Tolerance Questionnaire (FTQ) (Fagerstrom,
1978; Fagerstrom and Schneider, 1989). The FTQ is an
eight item scale (range 0–11) that is widely used in the
smoking literature and which assesses the degree of
dependence on nicotine. Scores of seven or more are
consistent with ND (Fagerstrom and Schneider, 1989). The
time frame for the FTQ was the subject’s lifetime period of
maximum cigarette use.
The current analyses differ from those previously published
(Kendler et al., 1999) in two ways. First, smoking
initiation in the prior report (Kendler et al., 1999) is referred
to here as regular smoking. Second, ND in the prior
report was based on a continuous factor score based on 12
items (including the eight FTQ items) whereas for sim-
plicity ND in this report is a dichotomization of the FTQ
total score. Measurement of cannabis use and abuse was as
described above for the analysis of age effects.
Method
Data were prepared as rectangular files consisting of one
record per twin pair. Each record contained data on initi-
ation of and dependence on nicotine and initiation and
abuse of cannabis. The Cholesky and Reciprocal Interac-
tion models were fitted by maximum likelihood to the raw
data. The Mx script used for this purpose is shown in
Appendix C.
Results
Table 3 shows parameter estimates from fitting two multivariate
models to the twin data on nicotine initiation (NI)
and dependence (ND) and cannabis use (CU) and abuse
(CM). The Cholesky and the causal models provided
similar fit to the data. Using Akaike's Information Criterion,
the causal model is slightly more parsimonious, having
two more degrees of freedom and only fitting 2.8 χ² units
more poorly. Despite their different substantive interpretations,
the two models generate very similar predicted
within-person correlations (see Table 4). These correlations
show that liability to initiation and progression for
nicotine and cannabis are closely related, especially within
substance. The two lowest correlations of approximately .5
are between ND and cannabis initiation.
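
The comparison by Akaike's Information Criterion can be made explicit as a worked calculation. It assumes that AIC is computed as χ² minus twice the degrees of freedom, which is the form usually reported for model comparisons of this kind; that choice of formula is an assumption here.

```latex
\Delta\mathrm{AIC} = \Delta\chi^2 - 2\,\Delta\mathrm{df} = 2.8 - 2(2) = -1.2
```

Under that assumption the causal model has the lower AIC by 1.2 units, in line with the statement that it is slightly more parsimonious.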
The Cholesky model indicates quite substantial corre-
lations between the genetic and environmental components
of NI and CI, and between ND and CM, although these
correlations must be viewed with scepticism when the
variance components are small. For example, the common
environment correlation of 1.0 for ND with CM is practi-
cally meaningless because only a tiny fraction of the
1 This assumption is a consequence of partitioning the variation in
progression into components due to liability to initiate, and residual
components.
Behav Genet (2006) 36:507–524 513
[Fig. 2 path diagrams, not reproduced here. Both panels show latent factors AIN, CIN, EIN, APN, CPN, EPN, AIC, CIC, EIC, APC, CPC and EPC (variances fixed at 1.00) with paths ain, cin, ein, apn, cpn, epn, aic, cic, eic, apc, cpc and epc to the variables IN, PN, IC and PC, together with paths bn and bc. Panel (a) additionally shows paths aicn, cicn, eicn, apcn, cpcn and epcn; panel (b) shows paths bicn, binc, bpcn and bpnc.]
Fig. 2 Two bivariate models for initiation (I) and progression (P) of cannabis (C) and nicotine (N). Top: Cholesky decomposition of sources of covariance between IN and IC, and between PN and PC. Bottom: causal model of IN and IC as risk factors for each other and for PN and PC
variance of each of these traits is associated with the
common environment. By contrast, the substantial genetic
correlation of .82 between NI and CI is quite precise be-
cause the genetic variance for both is quite substantial. An
interesting feature of the Cholesky model is that the spe-
cific environment correlation for dependence is negative
while the genetic correlation is positive. This finding sug-
gests that the two traits have more risk factors in common
than their phenotypic correlation suggests, although some
specific environment risk factors act in opposite ways on
nicotine and cannabis use.
The causal model results indicate similar findings to
previous univariate analyses in that liability to initiate ac-
counts for a substantial proportion of variance in liability to
dependence or abuse. A novel finding is that for initiation
there appears to be a negative feedback loop such that
liability to initiate smoking increases liability to initiate
cannabis (path = .82), whereas liability to CI decreases
liability to initiate nicotine (path = –.63, shown as a squared
term binc² = (–).40 in Table 3). Another interesting result is
that the cross-paths, from initiation of one substance to
abuse/dependence on the other are estimated at zero. That
is, liability to initiation of nicotine (cannabis) does not
appear to influence the liability for progression to abuse of
cannabis (nicotine dependence). There is however an
indirect effect as the initiation liabilities appear to have
mutual influence.
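The feedback loop between the two initiation liabilities can be made concrete with a small calculation. The sketch below takes the signed square root of the squared path from Table 3, uses the nicotine-to-cannabis path value quoted in the text, and computes (I – B)^-1, the same quantity that the Mx scripts in the appendices obtain with the inverse operator. The function names and the restriction to the two initiation variables are assumptions made for illustration.

```python
import math


def signed_path(squared: float, sign: int) -> float:
    """Recover a path coefficient from its tabled square and its sign."""
    return sign * math.sqrt(squared)


def total_effects(b_inc: float, b_icn: float):
    """Return (I - B)^-1 for B = [[0, b_inc], [b_icn, 0]], variable order (IN, IC).

    b_inc: path from cannabis initiation to nicotine initiation
    b_icn: path from nicotine initiation to cannabis initiation
    """
    det = 1.0 - b_inc * b_icn
    return [[1.0 / det, b_inc / det],
            [b_icn / det, 1.0 / det]]


b_inc = signed_path(0.40, -1)  # squared value (-).40 from Table 3
b_icn = 0.82                   # value quoted in the text

effects = total_effects(b_inc, b_icn)
for row in effects:
    print([round(value, 3) for value in row])
```

Because the two paths have opposite signs, the product b_inc × b_icn is negative and the determinant exceeds one, so the loop dampens rather than amplifies the residual variation in each liability.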
Multiple Stage Model
Several traits of interest may have more than one pre-
requisite before they are observed. For example, cessation
of nicotine dependence cannot occur before both nicotine
initiation and nicotine dependence have occurred. In a
clinical setting, a surgery may not occur unless injury has
occurred and it has been deemed serious enough by the
primary care physician to warrant referral to the surgical
unit. Similarly, one might observe initiation of alcohol use,
progression to alcohol abuse, and clinical treatment for
alcohol abuse as three stages of interest.
The theoretically possible outcomes for a pair of relatives
for a three-stage (two transitions) model are shown in
Table 5. While there are in principle 2^6 = 64 possible outcomes,
many cannot be observed in practice, or at least
only as part of a heterogeneous outcome. The possible pairwise
combinations that may be observed are delineated by
solid lines in Table 5; for example, if Twin 1 does not
initiate, and Twin 2 initiates and progresses to stage 1 but
not to stage 2, the cells 7, 15, 23 and 31 describe possible pair types
with this outcome. In this case, there are only four possible
outcomes for an individual: no initiation; initiation but no
progression; initiation and progression to the next stage
only; initiation and progression to the next stage and to the
final stage.
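The collapse from 64 theoretical cells to 16 observable pair classes can be verified by enumeration. The sketch below numbers the cells row by row as in Table 5 and groups them by what would actually be observed for each twin; the class labels and function names are illustrative.

```python
from itertools import product


def observed_class(init: int, stage1: int, stage2: int) -> str:
    """Collapse a latent (initiation, stage 1, stage 2) pattern to its observable outcome."""
    if not init:
        return "no initiation"
    if not stage1:
        return "initiation only"
    return "stage 2" if stage2 else "stage 1 only"


# Patterns in the row and column order of Table 5: No-No-No, No-No-Yes, ..., Yes-Yes-Yes
patterns = list(product([0, 1], repeat=3))

cells_by_class = {}
for row, twin1 in enumerate(patterns):
    for col, twin2 in enumerate(patterns):
        cell_number = 8 * row + col + 1
        key = (observed_class(*twin1), observed_class(*twin2))
        cells_by_class.setdefault(key, []).append(cell_number)

print(len(cells_by_class))
print(cells_by_class[("no initiation", "stage 1 only")])
```

The lookup in the last line returns the four cells cited in the text for a pair in which Twin 1 does not initiate and Twin 2 reaches stage 1 but not stage 2.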
Technically, it is very easy to implement the multiple
stage model with the raw missing data approach. A path
diagram of the model is shown in Figure 3. This diagram
can be drawn in the Mx graphical interface, or script lan-
guage can be used to implement the model using matrix
algebra. The raw ordinal data input consists of twin pair
records with six values per pair: initiation, stage 1 and
stage 2 for both twins. When there is no initiation, both
stage 1 and stage 2 are coded as missing. When there is
Table 3 Parameter estimates from fitting two multivariate models to data on the initiation and progression of cannabis and nicotine. Proportions of additive genetic (a²), common environment (c²), and specific environment (e²) variance are proportions of variance excluding the causal influence (→) of other variables. Note: rA, rC and rE were computed from parameter estimates in Figure 2 using, e.g., $r_A = \mathrm{ain} \times \mathrm{aicn} / \sqrt{\mathrm{ain}^2 (\mathrm{aicn}^2 + \mathrm{aic}^2)}$

            Nicotine                    Cannabis
            Initiation   Progression    Initiation   Progression
Cholesky factor model (–2lnL=6523.05, df=6015, AIC=–5506.95)
a²          0.67         0.71           0.40         0.49
c²          0.11         0.02           0.33         0.02
e²          0.21         0.27           0.27         0.48
bn          –            .57            –            –
bc          –            –              –            .79
Correlations between nicotine and cannabis:
            Initiation   Progression
rA          .82          .38
rC          .85          1.00
rE          .45          –.39
Causal model (–2lnL=6525.87, df=6017, AIC=–5508.132)
a²          0.81         0.76           0.15         0.43
c²          0.00         0.00           0.37         0.00
e²          0.19         0.24           0.47         0.57
bn²         –            .68            –            –
bc²         –            –              –            .64
binc²       (–).40       –              –            –
bicn²       –            –              .85          –
bpnc²       –            .00            –            –
bpcn²       –            –              –            .01

Notes: progression of cannabis is defined as DSM-IIIR abuse or dependence; progression of nicotine is defined as nicotine dependence. Fit statistics are: –2lnL, minus twice the logarithm of the likelihood; df, degrees of freedom; and AIC, Akaike’s Information Criterion (Akaike, 1987).
Table 4 Predicted within-person correlations between nicotine and cannabis initiation, nicotine dependence and cannabis abuse from two multivariate models. Results from the Cholesky factor model are below the diagonal, and from the causal model are above the diagonal

                       Nicotine                   Cannabis
                       Initiation   Dependence    Initiation   Abuse
Nicotine Initiation    1.00         0.79          0.70         0.64
Nicotine Dependence    0.75         1.00          0.53         0.49
Cannabis Initiation    0.70         0.52          1.00         0.84
Cannabis Abuse         0.63         0.50          0.89         1.00
initiation but no progression to stage 1, stage 1 is coded as
zero and stage 2 is coded as missing. When there is initi-
ation and progression to stage 1 but not to stage 2, stage 2
is coded as zero.
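The coding rules above translate into a short function. In the sketch below, `None` stands for a missing value, non-initiation is coded as zero, and a positive response at any stage is coded as one; these last two codes and all function names are assumptions for illustration, since the text specifies only where zeros and missing values are placed for the progression stages.

```python
from typing import Optional, Tuple

MISSING = None  # placeholder for a missing ordinal value

Record = Tuple[Optional[int], Optional[int], Optional[int]]


def code_twin(initiated: bool,
              stage1: Optional[bool] = None,
              stage2: Optional[bool] = None) -> Record:
    """Code one twin as (initiation, stage 1, stage 2) under the missing-data scheme."""
    if not initiated:
        # No initiation: both later stages are unobservable
        return (0, MISSING, MISSING)
    if not stage1:
        # Initiation without progression: stage 2 is unobservable
        return (1, 0, MISSING)
    return (1, 1, 1 if stage2 else 0)


def code_pair(twin1: tuple, twin2: tuple) -> tuple:
    """Build the six-value record for a twin pair."""
    return code_twin(*twin1) + code_twin(*twin2)


print(code_pair((False,), (True, True, False)))
```

The pair printed here corresponds to the example used for Table 5: Twin 1 does not initiate, and Twin 2 initiates and reaches stage 1 but not stage 2.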
In practice, the effective sample sizes will generally
decline for more advanced stages. This will make estima-
tion of variance components specific to advanced stages
more difficult, as they will have larger standard errors.
Furthermore, if the path from previous stages is large there
will be relatively little specific variation to partition and
thus estimation will be even less precise.
Figure 3 shows a direct path b3 from Initiation to
Stage 2, which can be difficult to grasp conceptually. It
is important to remember that these paths reflect statis-
tical regression paths, and not pathways for transition.
Thus b3 represents some influence of liability to Initia-
tion that does not influence liability to Stage 1. For
example, suppose that the three stages are initiation of
drug use, drug use on more than five occasions, and drug
addiction. Imagine that there are regional differences in
availability such that the drug is always available in
some regions but not often available in others. This
variation in availability might cause individual differ-
ences in initiation, and also in addiction which cannot
occur without a regular supply. However, the effect of
erratic supply on trying the substance five or more times
might be relatively trivial and therefore transmission of
cause through this intermediate stage would underesti-
mate the strength of association between initiation and
addiction.
Application: Tobacco Initiation, Regular Smoking and
Nicotine Dependence
Sample and Measures
The sample used here is the same as described above for
the bivariate analysis of nicotine and cannabis initiation
and progression.
Table 5 Crosstabulation of theoretically possible outcomes for a pair of relatives assessed on initiation and two stages of progression. Of the 64 possible cells, only 16 may be observed. The solid lines delineate the 16 possible observed classes

Twin 1                 Twin 2
                       Initiation: No               Initiation: Yes
                       Stage 1: No   Stage 1: Yes   Stage 1: No   Stage 1: Yes
                       Stage 2       Stage 2        Stage 2       Stage 2
Init   Stg 1   Stg 2   No    Yes     No    Yes      No    Yes     No    Yes
No     No      No       1     2       3     4        5     6       7     8
No     No      Yes      9    10      11    12       13    14      15    16
No     Yes     No      17    18      19    20       21    22      23    24
No     Yes     Yes     25    26      27    28       29    30      31    32
Yes    No      No      33    34      35    36       37    38      39    40
Yes    No      Yes     41    42      43    44       45    46      47    48
Yes    Yes     No      49    50      51    52       53    54      55    56
Yes    Yes     Yes     57    58      59    60       61    62      63    64
Fig. 3 Multiple stage model for
initiation and progression to two
possible further stages.
Progression is contingent on
being positive for the previous
stage. Parameters may be
estimated with data collected
from relatives
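[Fig. 3 path diagram, not reproduced here. Latent factors AIN, CIN and EIN (paths ain, cin, ein) influence INITIATION; AS1, CS1 and ES1 (paths as1, cs1, es1) influence STAGE 1; AS2, CS2 and ES2 (paths as2, cs2, es2) influence STAGE 2; all latent variances are fixed at 1.00. The regression paths among the three stages are labeled b1, b2 and b3.]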
Method
The multiple stage model was fitted by maximum likeli-
hood to the raw data files prepared for MZ and DZ twins.
The Mx script used for this purpose is shown in Appendix
D. Data from incomplete pairs were included in the analysis
to reduce possible bias that may accrue if data are MAR but
not MCAR.
Results
Maximum likelihood parameter estimates from fitting the
two transition models are shown in Table 6. The top half of
the table shows that the pattern of strong familial resem-
blance for smoking initiation, comprising 56% additive
genetic and 22% common environment variance, is recovered
with these data. The vast majority (80%) of the variance of
liability to regular smoking is accounted for by liability to
initiation, with the remainder almost entirely due to addi-
tive genetic effects specific to regular smoking. The tran-
sition from regular smoking to ND is much less strong.
There appears to be residual familial resemblance for ND
that is not accounted for by variance in regular smoking. In
addition, the factors responsible for ND as opposed to
regular smoking appear to be less strongly related to ini-
tiation of any smoking than seemed to be the case when
initiation and dependence were analyzed without infor-
mation on regular smoking.
The lower half of Table 6 shows the results of fitting a
more elaborate model of the relationship between initia-
tion, regular smoking and ND. Here a direct path from
initiation to ND was included. However, very little
improvement in fit was observed and the value of the path
itself is small. These results indicate that there is little
relationship between factors that influence initiation and
factors that influence ND beyond those that are mediated
by regular smoking.
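The size of the improvement in fit can be gauged with a likelihood-ratio test on the two –2lnL values in Table 6, which differ by one degree of freedom. The sketch below is an illustration rather than output from Mx; it uses the identity that the upper tail of a chi-square variate with 1 df equals erfc(sqrt(x/2)), and the function name is hypothetical.

```python
import math


def lrt_pvalue_1df(chi2: float) -> float:
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# -2lnL without and with the direct path from initiation to dependence (Table 6)
chi2 = 4014.62 - 4014.49
p_value = lrt_pvalue_1df(chi2)

print(round(chi2, 2), round(p_value, 2))
```

A difference of about 0.13 on 1 df is far from conventional significance levels, in line with the statement that the direct path adds very little.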
Discussion
Methodological Development
Several extensions to the conditional causal model have
been described and applied. These extensions permit a
number of new hypotheses to be tested. At the simplest
level, the re-specification of the problem as one of missing
ordinal data permits a more general modeling of the
threshold. Covariates such as age, sex or genotype may be
specified to have direct effects on the mean and therefore
on the thresholds. Thus tests for the linear effect of age, or
the additive or non-additive association with measured
genotypes may be implemented without the need to
simultaneously model the distribution of age or the allele
frequencies.
Some covariates are inherently shared by members of
a twin pair reared together, such as socioeconomic status
or demography of the place of residence. Age is fre-
quently regarded as the same for both members of a
pair, but in practice there may be variation if the twins
are assessed at different times. Even when this is the
case, there is little interest in modeling the genetic and
environmental determinants of age, and age probably
will have a non-normal distribution. Therefore, modeling
age and shared environmental effects via their direct
effects on the mean (a ‘‘multilevel’’ model) is a practical
approach to removing their effects from the variable of
interest. No assumption about the distribution of these
variables is required; what remains is that the residuals
of the variables being analysed have a multivariate
normal distribution.
More subtle modeling of the effects of covariates is
possible when the covariate differs between twins in a
meaningful way. While such covariates might be treated in
the same way as age and other shared variables described
above, it should be understood that doing so simply
regresses out the effect of the covariate, which assumes
that the covariate causes the variables being analyzed. As
we have noted elsewhere (Heath et al., 1993; Neale and
Cardon, 1992; Neale et al., 1994a, b), this causal assump-
tion is empirically testable in twin data. It is also possible
to partition the covariance between the covariate and the
variables of interest into genetic, shared and specific
environmental components, via standard multivariate
genetic analysis.
There are some technical limitations to the analysis of
multiple ordinal variables. Integration of the multivariate
Table 6 Parameter estimates from fitting a multiple stage model to data collected from female MZ and DZ twins on initiation of any smoking, of regular smoking, and nicotine dependence

        Initiation   Regular Smoking   Nicotine Dependence
No direct effect from Initiation to Dependence (–2lnL=4014.62, df=3465)
a²      .56          .17               .23
c²      .22          .02               .21
e²      .21          .01               .38
b1²     –            .80               –
b2²     –            –                 .17
Direct effect from Initiation to Dependence (–2lnL=4014.49, df=3464)
a²      .56          .20               .08
c²      .22          .00               .32
e²      .21          .01               .39
b1²     –            .80               –
b2²     –            –                 .09
b3²     –            –                 .03
normal distribution is computationally demanding when
the number of variables is large. For the purposes of reli-
able optimization, numerical integration should be per-
formed at a high level of accuracy, but doing so makes it
run very slowly. There is thus a trade-off between opti-
mization performance and computational speed. This
problem is somewhat temporary; as computers increase in
speed and as parallel computer architecture is exploited
more effectively, the time taken for more precise integra-
tion will decrease and stable optimization will be more
frequently obtained. In the analyses to date, there can be a
tendency to find local rather than global minima and it is
therefore prudent to use a variety of starting values to en-
sure that the solution obtained is indeed the maximum
likelihood.
Clearly these methods can provide valuable insight into
the scaling of variables that is difficult or impossible to
obtain from other sources. Modeling of data from relatives
provides a unique perspective within a particular variable.
Perhaps the closest parallel is the analysis of longitudinal
data. However, repeated measures taken across a long
interval might not be measuring the same construct. Con-
versely, a short interval between occasions may give rise to
response bias if there is interference from recent testing.
Only the study of relatives seems to be free of such diffi-
culties.
Substantive Findings
The illustrative applications of the methods described in
this paper offer a number of new insights to the etiology of
tobacco and cannabis initiation and progression to abuse or
dependence. First, including age in the model does reveal a
significant change in prevalence of cannabis use across the
18–50 year-old age range of the twins in this study. The
direction of the effect of age on the threshold is positive for
both initiation and progression to abuse, according to the formulae
tI = –.797 + .023*age and tP = 0.999 + .012*age. Therefore the
prevalence of both cannabis use and abuse is greater in
younger rather than older samples. This finding is consis-
tent with other epidemiological studies in the USA
(Department of Health and Human Services, 1999).
Despite being statistically significant, the effect is not large
enough to account for the substantial proportion of com-
mon environment variance typically found in studies of
twins. Thus failure to correct for age in previous studies has
most likely increased the estimate of common environment
variance by only one or two percent. The remainder may be
due to parental attributes, religion, availability, rural vs.
urban living, or other demographic factors that are shared
by twins.
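The threshold formulae can be converted into model-implied prevalences to see the size of the age effect. The sketch below assumes a standard normal liability (unit variance, as imposed by the variance constraints in the Mx scripts) with individuals above the threshold counted as positive; the function names are illustrative. For progression, the value is the proportion of the liability distribution above the threshold, not the observed rate among initiators.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def threshold_initiation(age: float) -> float:
    return -0.797 + 0.023 * age


def threshold_progression(age: float) -> float:
    return 0.999 + 0.012 * age


def proportion_above(threshold: float) -> float:
    """Proportion of a standard normal liability lying above the threshold."""
    return 1.0 - normal_cdf(threshold)


for age in (18, 30, 50):
    init = proportion_above(threshold_initiation(age))
    prog = proportion_above(threshold_progression(age))
    print(age, round(init, 3), round(prog, 3))
```

Because both thresholds rise with age, the implied proportions fall from the youngest to the oldest twins in the sample.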
Second, the bivariate analysis of nicotine and cannabis
initiation and progression produced several new insights.
Within individuals, initiation for these two substances is
highly correlated (r=.70). Progression to nicotine depen-
dence is less highly correlated with progression to cannabis
abuse (r=.50) although this latter correlation has a broader
confidence interval because data on abuse are missing in
those subjects that have not initiated use. Consistent with
univariate analyses of these data, there are substantial
correlations between the liabilities to initiate and to progress
for both nicotine and cannabis. There are also substantial
correlations across the substances; liability to
initiate nicotine correlates .63 with liability to cannabis
abuse and liability to cannabis use correlates .53 with lia-
bility to nicotine dependence. These results indicate that
the factors that cause individual differences in liability to
both licit and illicit substance use have much in common.
Given that both involve the reward system in the brain, this
result may not be surprising, particularly for dependence or
abuse. At the components of variance level, the shared
environmental correlation between nicotine and cannabis
initiation is high (rc=.85), suggesting that social determi-
nants of substance use may be common to both nicotine
and cannabis. The same is true for additive genetic factors
that predispose to initiation of both substances. For factors
specific to progression to ND or cannabis abuse, the results
are less consistent. There is little shared environmental
variation for progression for either substance so the rc is
not relevant. While the additive genetic correlation is positive
(ra=.38), the specific environmental correlation is
opposite in sign (re=–.39), suggesting that although there is little
within person correlation for factors specific to ND and
substance abuse, their causes may be quite substantially
correlated. These data therefore support the hypothesis that
there exist general neurobiological factors that predispose
to progression to substance abuse. However, the genetic
correlation is modest, indicating a substantial role for
factors specific to each substance as well. Apparently,
environmental factors not shared by twins may lead to
preference of one substance over the other, reflected by the
negative environment correlation.
The alternative, causal, model for the relationships
among nicotine and cannabis initiation and progression has
a provocative finding of a negative feedback loop between
nicotine and cannabis initiation. Liability to initiate nico-
tine increases liability to initiate cannabis, whereas liability
to initiate cannabis decreases liability to initiate nicotine.
Onset of tobacco use usually precedes onset of cannabis
use, which is somewhat consistent with this finding. The
findings do not answer the question of whether later onset
of cannabis use leads to a decrease in the use of tobacco. It
may be that they reflect a limitation of this modeling at the
population level, and mask a mixture of substance use
patterns. For example, one type of person might initiate
cannabis use and shun legal drugs such as tobacco and
alcohol, perhaps as a manifestation of anti-establishment
feelings. A second type might participate in legal substance
use only, and a third may be willing to try anything.
Mixture distribution modeling, perhaps combined with
longitudinal analysis, might provide an empirical test of
such hypotheses, although it would likely require a good
indicator of group membership.
Third, the multiple stage model of nicotine initiation,
regular use and dependence elucidated some novel findings
about the development of ND in women. Liability to be-
come a regular smoker is very closely related to liability to
initiate. The development of dependence is less closely
related to either liability to regular smoking or initiation.
Factors specific to the development of regular smoking
appear to be largely additive genetic in origin, whereas
those specific to nicotine dependence (excluding those in-
volved in regular smoking) appear to be more environ-
mental. However, confidence intervals are broad and
therefore these findings are not likely to be very robust.
Finally, we note that in this study, abuse or dependence
of cannabis, and nicotine dependence were coded as a
binary variable based on a symptom count. This aggrega-
tion of symptoms into a sum score may lead to biased
estimates of variance components of the latent trait
underlying responses to the symptoms (Neale et al., 2005).
We hope that fully multivariate analyses of the symptoms
will become technically feasible in the future, so that this
potential source of bias may be examined.
Acknowledgements Michael Neale is grateful for support from
PHS grants RR08123, MH01458, DA-18673. Eric Harvey was sup-
ported by NIMH training grant MH-20030, Hermine Maes by HL-
60688, Patrick Sullivan by MH-59160, and Kenneth Kendler by AA-
09095, MH/AA-49492 and DA-11287.
Appendix A
Mx script for fitting univariate model to data collected
from twins using explicit calls to the multivariate
normal distribution integration routine mnor
!
! Mx script for causal conditional model
!
# ngroup 6
Group 1 Compute MZ Correlations
Calculation
Begin Matrices;
A Di 2 2 Free
C Di 2 2 Free
E Di 2 2 Free
B Fu 2 2
I Id 2 2
End Matrices;
Specify B ! causal parameter from initiation to progression
0 0
7 0
! starting values
Matrix A .7 .7
Matrix C .5 .5
Matrix E .5 .5
Matrix B 0 0 .4 0
! parameter bounds
Bound 0 1 A 1 1 A 2 2 C 1 1 C 2 2
Begin algebra;
X = A*A' ;
Y = C*C' ;
Z = E*E' ;
R = (I@((I-B)~)) & (X+Y+Z | X+Y _
X+Y | X+Y+Z);
End algebra;
End group
Group 2 DZ Correlation matrix
Calculation
Begin Matrices=Group 1;
H Fu 1 1 ! .5
End Matrices;
Matrix H .5
Begin algebra;
R = (I@((I-B)~)) & (X+Y+Z | h@X+Y _
h@X+Y | X+Y+Z);
End algebra;
End
Fit model to MZ data with user-defined fit function (ML)
Data Ni=1 No=1
Begin Matrices;
d full 1 1 ! two
i zero 1 1
n full 1 1 ! scalar 2.0
o full 9 1
r computed =R1 ! correlation matrix A1B1A2B2
t full 1 4 ! thresholds abab
w zero 1 4 ! means
z unit 1 1
End matrices;
matrix d 2
matrix n 2
matrix o ! non, ex, current cell frequencies
214 53 5
55 117 17
1 20 18
! mnor function takes matrices with 4 more rows than columns:
! first n (=4) rows are correlation matrix
! row n+1 is mean vector
! row n+2 is upper thresholds
! row n+3 is lower threshold
! row n+4 is indicator, 0 = integrate from -infinity to upper threshold
!                       1 = integrate from lower threshold to +infinity
!
Begin algebra ;
e = -o. \ln(
\mnor(r_w_t_t_(i|i|i|i))+\mnor(r_w_t_t_(i|i|i|z))+\mnor(r_w_t_t_(i|z|i|i))+\mnor(r_w_t_t_(i|z|i|z)) _
\mnor(r_w_t_t_(i|i|z|i))+\mnor(r_w_t_t_(i|z|z|i)) _
\mnor(r_w_t_t_(i|i|z|z))+\mnor(r_w_t_t_(i|z|z|z)) _
\mnor(r_w_t_t_(z|i|i|i))+\mnor(r_w_t_t_(z|i|i|z)) _
\mnor(r_w_t_t_(z|i|z|i)) _
\mnor(r_w_t_t_(z|i|z|z)) _
\mnor(r_w_t_t_(z|z|i|i))+\mnor(r_w_t_t_(z|z|i|z)) _
\mnor(r_w_t_t_(z|z|z|i)) _
\mnor(r_w_t_t_(z|z|z|z)) );
End algebra ;
Compute d.\sum(e) ;
Option user rs
End
Fit model to DZ data with user-defined fit function (ML)
Data Ni=1 No=1
Begin Matrices = Group 3;
! re-declare o and r as they are different for DZ's
o full 9 1
r comp =R2 ! correlation matrix A1B1A2B2
End matrices;
Specify t 10 11 10 11 ! equate thresholds for twin 1 and 2 (and MZ=DZ)
Matrix t .2 .0 .2 .0
Matrix o
100 52 3
45 82 16
6 18 4
Bound -2 3 10 11
Begin algebra ;
e = -(o). \ln(
\mnor(r_w_t_t_(i|i|i|i))+\mnor(r_w_t_t_(i|i|i|z))+\mnor(r_w_t_t_(i|z|i|i))+\mnor(r_w_t_t_(i|z|i|z)) _
\mnor(r_w_t_t_(i|i|z|i))+\mnor(r_w_t_t_(i|z|z|i)) _
\mnor(r_w_t_t_(i|i|z|z))+\mnor(r_w_t_t_(i|z|z|z)) _
\mnor(r_w_t_t_(z|i|i|i))+\mnor(r_w_t_t_(z|i|i|z)) _
\mnor(r_w_t_t_(z|i|z|i)) _
\mnor(r_w_t_t_(z|i|z|z)) _
\mnor(r_w_t_t_(z|z|i|i))+\mnor(r_w_t_t_(z|z|i|z)) _
\mnor(r_w_t_t_(z|z|z|i)) _
\mnor(r_w_t_t_(z|z|z|z)) );
End algebra ;
Compute d.\sum(e);
Option user
End
Group 5 constrain variances to 1
Constraint NI=1
Begin Matrices = Group 1;
U Unit 1 2
z izero 4 2
End matrices;
Constraint U = \d2v(r)*z;
End
Group 6 - standardize estimates
Calculation
Begin Matrices = Group 1;
I Id 2 2
J iden 4 4
End matrices;
Begin algebra;
K = I@((I-B)~);
L = \v2d(\sqrt(\d2v((I@((I-B)~))*(X+Y+Z | X+Y _
X+Y | X+Y+Z)*(I@((I-B)~)'))));
M = L~*K*L;
R = \d2v(X+Y+Z);
S = ((\d2v(X))%R) _ ((\d2v(Y))%R) _ ((\d2v(Z))%R);
End algebra;
Labels row S
A A A B C A C B E A E B
Labels row L
MZT1Abeta MZT1Bbeta
MZT1Abeta MZT1Bbeta
End
option func=1.e-10 ! function precision for optimization
option df=18 ! adjust df
option nd=4 ! 4 decimal places
option eps=.00000001 ! integration precision for mnor
option th=-2 ! retry optimization from final point twice
Appendix B
Mx script for fitting conditional causal model including
cohort/age effects
! CCC with age
#ngroup 6
Group 1 Compute MZ Correlations
Calculation
Begin Matrices;
A Di 2 2 Free
C Di 2 2 Free
E Di 2 2 Free
B Fu 2 2
I Id 2 2
End matrices;
Specify B ! causal parameter from initiation to progression
0 0
7 0
! starting values
Matrix A .7 .7
Matrix C .5 .5
Matrix E .5 .5
Matrix B 0 0 .4 0
! parameter bounds
Bound 0 1 A 1 1 A 2 2 C 1 1 C 2 2
Begin algebra;
X = A*A' ;
Y = C*C' ;
Z = E*E' ;
R = (I@((I-B)~)) & (X+Y+Z | X+Y _
X+Y | X+Y+Z);
End algebra;
End group
Group 2 Compute DZ Correlation matrix
Calculation
Begin Matrices=Group 1;
H Fu 1 1 ! .5
End matrices;
Matrix H .5
Begin algebra;
R = (I@((I-B)~)) & (X+Y+Z | h@X+Y _
h@X+Y | X+Y+Z);
End algebra;
End
Fit model to MZ data
Data Ninput=10
Labels zyg agea nicusea nicpc2a canusea
canabua
nicuseb nicpc2b canuseb canabub
Ordinal file=ffpair2.rec
Select if zyg = 1
Select agea canusea canabua canuseb canabub ;
Definition agea ;
Begin Matrices;
a full 1 1
i zero 1 1
n full 1 1 ! scalar 2.0
o full 9 1
r full 4 4 =R1 ! correlation matrix A1B1A2B2
t full 1 4 ! thresholds abab
u full 1 4 ! thresholds abab
w zero 1 4 ! means
z unit 1 1
End matrices;
Specify a agea ! A gets updated with age for each case
! during calculation of covariances and thresholds
matrix n 2
covariance r ;
thresholds t+u@a ;
option rs
End
Fit model to DZ data
Data Ninput=10
Labels zyg agea nicusea nicpc2a canusea
canabua
nicuseb nicpc2b canuseb canabub
Ordinal file=ffpair2.rec
Select if zyg = 2
Select agea canusea canabua canuseb canabub ;
Definition agea ;
Begin matrices =Group 3;
o full 9 1
r full 4 4 =R2 ! correlation matrix A1B1A2B2
End matrices;
specify t 10 11 10 11
Matrix t .2 .0 .2 .0
specify a agea
specify u 12 13 12 13
matrix u .01 .01 .01 .01
bound -.05 .05 u 1 1 U 1 2
Bound -2 3 10 11
covariance r ;
Thresholds t + u@a;
End
Group 5 - constrain variances = 1
Constraint NI=1
Begin Matrices = Group 1
U Unit 1 2
V iz 4 2
End matrices;
Constraint U = \d2v(R)*V ;
End
Group 6 - standardize estimates
Data calc
Begin Matrices;
A Di 2 2 = A1
C Di 2 2 = C1
E Di 2 2 = E1
B Fu 2 2 = B1
I Id 2 2
H Fu 1 1 ! .5
J iden 4 4
End matrices;
Begin algebra;
X = A*A' ;
Y = C*C' ;
Z = E*E' ;
K = I@((I-B)~);
L = \v2d(\sqrt(\d2v((I@((I-B)~))*(X+Y+Z | X+Y _
X+Y | X+Y+Z)*(I@((I-B)~)'))));
M = L~*K*L;
R = \d2v(X+Y+Z);
S = ((\d2v(X))%R) _ ((\d2v(Y))%R) _ ((\d2v(Z))%R);
End algebra;
Labels row S
A A A B C A C B E A E B
Labels row L
MZT1Abeta MZT1Bbeta
MZT1Abeta MZT1Bbeta
Interval B 1 2 1
option mu nd=4
option nag=10 db=1
option func=1.e-8
option th=-2
option multiple issat
End
!fit submodel without age effect
save cccage.mxs
drop 12 13
End
Appendix C
Mx script for fitting bivariate conditional causal model
for initiation and progression in pairs of twins
! Bivariate Genetic Cholesky Model CCC
! Simulated ordinal data
#ngroups 4
#define nthresh1 1
#define nthresh2 1
#define nvar 4
Group 1: set up model
Calculation
Begin Matrices;
B Full nvar nvar Free ! causal pathways
J Iden 2 2
K Iden 4 4
X diag nvar nvar Free ! genetic structure
Y diag nvar nvar Free ! common environmental structure
Z diag nvar nvar Free ! specific environmental structure
W Lower nvar nvar ! dominance structure (set to zero)
T Full nthresh2 8 Free
End matrices;
Bound .0 2 X 1 1 X 2 2 X 3 3 X 4 4
Bound .0 2 Y 1 1 Y 2 2 Y 3 3 Y 4 4
Bound .1 2 Z 1 1 Z 2 2 Z 3 3 Z 4 4
Matrix T
0 0 0 0 0 0 0 0
Bound -3 3 T 1 1 - T 1 8
Specify B
0 0 104 0
101 0 105 0
102 0 0 0
103 0 106 0
Bound -.99 .99 B 1 1 to B 4 4
Labels Col B IS DS IC DC
Labels Row B IS DS IC DC
Matrix X .8 .7071 .6 .5
Matrix Z .6 .5 .8 .7071
Matrix B
0 0 .3 0
.5 0 .3 0
.3 0 0 0
.3 0 .5 0
Begin algebra;
A= X*X' ;
C= Y*Y' ;
E= Z*Z' ;
D= W*W' ;
F= (J@(K-B))~ ;
End algebra;
End
Group 2: Fit model to MZ twin pairs
Data Ninput=10
Labels zyg agea nicusea nicpc2a canusea
canabua
nicuseb nicpc2b canuseb canabub
Ordinal file=ffinits.rec
Select if zyg = 1
select nicusea nicpc2a canusea canabua
nicuseb nicpc2b canuseb canabub ;
Begin Matrices= Group 1;
Covariances F&(A+C+E+D | A+C+D _
A+C+D | A+C+E+D) /
Thresholds T ;
Option RSidual
End
Group 3: Fit model to DZ twin pairs
Data Ninput=10
Labels zyg agea nicusea nicpc2a canusea canabua
nicuseb nicpc2b canuseb canabub
Ordinal file=ffinits.rec
Select if zyg = 2
select nicusea nicpc2a canusea canabua
nicuseb nicpc2b canuseb canabub ;
Begin Matrices= Group 1;
H Full 1 1
Q Full 1 1
End matrices;
Matrix H .5
Matrix Q .25
Covariances F&(A+C+E+D | H@A+C+Q@D _
H@A+C+Q@D | A+C+E+D) /
Thresholds T ;
Option RSidual
Options NDecimals=4
option func=1.e-8
End
G4: Constrain variances
Constraint NI=1
Begin Matrices;
U unit 1 4
E symm 8 8 = %e2
Z iz 8 4
End matrices;
Constraint U = \d2v(E) * Z ;
Option
End
Appendix D
Mx script for fitting three stage/two transition causal
model for initiation and two progressions in pairs of
twins
! Bivariate Genetic Cholesky Model CCC
! Simulated ordinal data
!
#ngroups 4
#define nthresh1 1
#define nthresh2 1
#define nvar 3
G1: set up model
Calculation
Begin Matrices;
B Full nvar nvar Free ! causal pathways
J Iden 2 2
K Iden nvar nvar
I Lower nthresh2 nthresh2
X diag nvar nvar Free ! genetic structure
Y diag nvar nvar Free ! common environmental structure
Z diag nvar nvar Free ! specific environmental structure
W Lower nvar nvar ! dominance structure
T Full nthresh2 6 Free
End matrices;
Bound .0 1 X 1 1 X 2 2 X 3 3
Bound .0 1 Y 1 1 Y 2 2 Y 3 3
Bound .1 1 Z 1 1 Z 2 2 Z 3 3
Bound -3 3 T 1 1 - T 1 6
Specify B
0 0 0
101 0 0
0 102 0
Bound -.99 .99 B 2 1 B 3 2
Labels Col B IS RS ND
Labels Row B IS RS ND
Matrix X .8 .7071 .6
Matrix Z .6 .5 .8
Matrix B 0 0 0 .5 0 0 0 .5 0
Begin algebra;
A= X*X' ;
C= Y*Y' ;
E= Z*Z' ;
D= W*W' ;
F= (J@(K-B))~ ;
End algebra;
End
G2: MZ twin pairs
#include patccc1.dat
Select if zyg = 1
select evera rega nda everb regb ndb ;
Begin Matrices= Group 1;
Covariances F&(A+C+E+D | A+C+D _
A+C+D | A+C+E+D) /
Thresholds T ;
Option RSidual
End
G3: DZ twin pairs
#include patccc1.dat
Select if zyg = 2
select evera rega nda everb regb ndb ;
Begin Matrices= Group 1;
H Full 1 1
Q Full 1 1
End matrices;
Matrix H .5
Matrix Q .25
Covariances F&(A+C+E+D | H@A+C+Q@D _
H@A+C+Q@D | A+C+E+D) /
Thresholds T ;
Option RSidual
Options NDecimals=4
option func=1.e-8
End
G4: Constrain variances
Constraint NI=1
Begin Matrices;
U unit 1 nvar
E symm 6 6 = %e2
Z iz 6 nvar
End matrices;
Constraint U = \d2v(E) * Z ;
Option Multiple th=-2
End
References
Agresti A (1990) Categorical data analysis. Wiley
Akaike H (1987) Factor analysis and AIC. Psychometrika 52:317–332
Department of Health and Human Services (1999) National house-
hold survey on drug abuse main findings 1997. 5600 Fishers
Lane, Room 16-015, Rockville MD 20857: Office of Applied
Studies, Substance Abuse and Mental Health Services Admin-
istration. (http://www.samhsa.gov)
Fagerstrom K (1978) Measuring degree of physical dependence to
tobacco smoking with reference to individualization of treat-
ment. Addict Behav 3:235–241
Fagerstrom K, Schneider N (1989) Measuring nicotine dependence: a
review of the fagerstrom tolerance questionnaire. J Behav Med
12:59–182
Heath AC (1990) Persist or quit? testing for a genetic contribution to
smoking persistence. Acta Genet Med Gemellol 39:447–458
Heath AC, Bucholz KK, Madden PAF, Dinwiddie SH, Slutske WS,
Bierut LJ, Statham DJ, Dunne MP, Whitfield JB, Martin NG
(1997) Genetic and environmental contributions to alcohol
dependence risk in a national twin sample: consistency of find-
ings in women and men. Psychol Med 27:381–1396
Heath AC, Kessler RC, Neale MC, Hewitt JK, Eaves LJ, Kendler KS
(1993) Testing hypotheses about direction-of-causation using
cross-sectional family data. Behav Genet, 23(1):29–50
Heath AC, Madden PAF, Martin NG (1998) Statistical methods in
genetic research on smoking. Stat Methods Med Res 7:65–86
Heath AC, Martin NG (1993). Genetic models for the natural history
of smoking: evidence for a genetic influence on smoking per-
sistence. Addict Behav 18:9–34
Kendler KS, Karkowski LM, Corey LA, Prescott CA, Neale MC
(1999) Genetic and environmental risk factors in the aetiology of
illicit drug initiation and subsequent misuse in women. Brit J
Psychiat 175:351–356
Kendler KS, Neale MC, Maclean CJ, Heath AC, Eaves LJ, Kessler
RC (1993) Smoking and major depression: a causal analysis.
Arch Gen Psychiat 50:36–43
Kendler KS, Neale MC, Sullivan PF, Gardner CO, Prescott CA
(1999) A population-based twin study in women of smoking
initiation and nicotine dependence. Psychol Med 29:299–308
Kendler KS, Prescott CA (1998) Cannabis use, abuse and dependence
in a population-based sample of female twins. Am J Psychiat
155:1016–1022
Koopmans J, Heath A, Neale M, Boomsma D (1997) The genetics of
initiation and quantity of alcohol and tobacco use. In: Koopmans
JR (eds) The genetics of health-related behaviors. Print Partners
Ipskamp, Amsterdam, pp 90–108
Koopmans J, Slutske W, Heath A, Neale M (1999) The genetics of
smoking initiation and quantity smoked in dutch adolescent and
young adult twins. Behav Genet 29:383–394
Little RJA, Rubin DB (1987) Statistical analysis with missing data.
New York, Wiley
Meyer JM, Heath AC, Eaves LJ (1992) Using multidimensional
scaling on data from pairs of relatives to ex plore the dimen-
sionality of categorical multifactorial traits. Genet Epidemiol
9:87–107
Neale MC, Cardon LR (1992) Methodology for genetic studies of
twins and families. Kluwer Academic Press
Neale MC, Eaves LJ, Hewitt JK, Kendler KS (1994) Multiple
regression with data collected from relatives. Multivar Behav
Res 29:33–61
Neale MC, Kendler KS (1995) Models of comorbidity for multifac-
torial disorders. Am J Human Genet 57:935–953
Neale MC, Lubke GH, Aggen SH, Dolan CV (2005) Problems with
using sum scores for estimating variance components: contam-
ination and measurement non-invariance. Twin Res Human
Genet 8(6). (In Press)
Neale MC, Martin NG (1989). The effects of age, sex and genotype
on self-report drunkenness following a challenge dose of alco-
hol. Behav Genet 19:63–78
Neale MC, Walters EW, Heath AC, Kessler RC, Perusse D, Eaves LJ,
Kendler KS (1994) Depression and parental bonding: cause,
consequence, or genetic covariance? Genet Epidemiol 11:503–
522
Neale M, Boker S, Xie G, Maes H (1999) Mx: statistical modeling
(5th Ed). Box 980126 Richmond VA, Department of Psychiatry
Virginia Commonwealth University
Neale M, De Knijff P, Havekes L, Boomsma D (2000) Influences of
the ApoE polymorphism on quantitative apolipoprotein E levels.
Genet Epidemiol 18:331–340
Pickles A, Neale MC, Simonoff E, Rutter M, Hewitt J, Meyer J,
Crouchley R, Silberg J, Eaves L (1994) A simple method for
censored age of onset data subject to recall bias: mothers reports
of age of puberty in male twins. Behav Genet 24:457–468
Read TRC, Cressie NAC (1988). Goodness-of-fit statistics for dis-
crete multivariate data. New York, Springer-Verlag
Spitzer RL, Williams JB, Gibbon M (1987) Structured Clinical
Interview for DSM-III-R. New York, Biometrics Research Dept.
and New York State Psychiatric Institute
True WR, Heath AC, Scherrer JF, Goldberg J, Lin N, Eisen SA,
Lyons MJ (1997) Genetic and environmental contributions to
cigarette smoking. Addiction 92:1277–1287
Zhu G, Duffy DL, Eldridge A, Grace M, Mayne C, O’Gorman L,
Aitken JF, Neale MC, Hayward NK, Green NG, Martin AC
(1999) A major quantitative-trait locus for mole density is linked
to the familial melanoma gene cdkn2a: a maximum-likelihood
combined linkage and association analysis in twins and their
sibs. Am J Human Genet
524 Behav Genet (2006) 36:507–524
123