A Bayesian Model for the Analysis of Transgenerational Epigenetic Variation.
Transcript of A Bayesian Model for the Analysis of Transgenerational Epigenetic Variation.
1
A Bayesian Model for the Analysis of Transgenerational Epigenetic Variation. 1
Luis Varona*§, Sebastián Munilla*†, Elena Flavia Mouresan*, Aldemar González-2
Rodríguez*, Carlos Moreno*§, Juan Altarriba*§. 3
4
* Unidad de Genética Cuantitativa y Mejora Animal, Universidad de Zaragoza, 5
50013, Zaragoza, SPAIN. 6
§ Instituto de Biocomputación y Física de los Sistemas Complejos (BIFI), 7
Universidad de Zaragoza, 50018, Zaragoza, SPAIN. 8
† Departamento de Producción Animal, Facultad de Agronomía, Universidad de 9
Buenos Aires, 1417, Ciudad Autónoma de Buenos Aires, ARGENTINA 10
11
12
G3: Genes|Genomes|Genetics Early Online, published on January 23, 2015 as doi:10.1534/g3.115.016725
© The Author(s) 2013. Published by the Genetics Society of America.
2
RUNNING TITLE: Epigenetic Variance Component Estimation 13
KEYWORDS: Epigenetics, Bayesian Analysis, Genetic Variance, Resemblance 14
Between Relatives. 15
16
CORRESPONDING AUTHOR: 17
Luis Varona 18
Unidad de Genética Cuantitativa y Mejora Animal. 19
Facultad de Veterinaria. Universidad de Zaragoza 20
c/ Miguel Servet 177. 50013. ZARAGOZA. SPAIN 21
Phone: (34)976-761623 22
Fax: (34)976-761612 23
e-mail: [email protected] 24
25
3
ABSTRACT 26
Epigenetics has become one of the major areas of biological research. However, the 27
degree of phenotypic variability that is explained by epigenetic processes still remains 28
unclear. From a quantitative genetics perspective, the estimation of variance 29
components is achieved by means of the information provided by the resemblance 30
between relatives. In a previous study, this resemblance was described as a function of 31
the epigenetic variance component and a reset coefficient that indicates the rate of 32
dissipation of epigenetic marks across generations. Given these assumptions, we 33
propose a Bayesian mixed model methodology that allows the estimation of epigenetic 34
variance from a genealogical and phenotypic database. The methodology is based on the 35
development of a T matrix of epigenetic relationships that depends on the reset 36
coefficient. In addition, we present a simple procedure for the calculation of the inverse 37
of this matrix (T-1) and a Gibbs sampler algorithm that obtains posterior estimates of all 38
the unknowns in the model. The new procedure was used with two simulated data sets 39
and with a beef cattle database. In the simulated populations, the results of the analysis 40
provided marginal posterior distributions that included the population parameters in the 41
regions of highest posterior density. In the case of the beef cattle dataset, the posterior 42
estimate of transgenerational epigenetic variability was very low and a model 43
comparison test indicated that a model that did not included it was the most plausible. 44
45
4
INTRODUCTION 46
Epigenetics studies variations in gene expression that are not caused by modifications of 47
the DNA sequence (Eccleston et al. 2007) and is now considered as one of the most 48
important fields of biological research. Several biochemical mechanisms that alter gene 49
activity (DNA methylation, histone modifications etc.) underpin epigenetic processes 50
(Jablonka and Raz 2009; Law and Jacobsen 2010). A number of studies have 51
emphasised the importance of epigenetics in cancer (Esteller 2008) and other human 52
illnesses (Feinberg 2007) as well as in other mammal (Reik 2007) and plant traits 53
(Henderson and Jacobsen 2007). Gene activity modifications may occasionally occur in 54
a sperm or an egg cells and can be transferred to the next generation, denoted as 55
transgenerational epigenetic inheritance (Youngson and Withelaw 2008); a phenomenon 56
that has been reported in a wide range of organisms (Jablonka and Raz 2009). Despite 57
this, the magnitude of the phenotypic variation that is explained by epigenetic processes 58
still remains unclear (Grossniklaus et al. 2013; Heard and Martienssen 2014). 59
From the perspective of quantitative genetics, the presence of transgenerational 60
epigenetic inheritance involves a redefinition of the covariance between relatives. Tal et 61
al. (2010) developed a model for the calculation of the covariance between relatives for 62
asexual and sexual reproduction as a function of epigenetic heritability ( ), the reset 63
coefficient (v), and its complement, the epigenetic transmission coefficient (1-v). 64
According to these authors, the covariance between relatives is reduced as the number 65
of opportunities to dissipate (or reset) the epigenetic marks increase. Therefore, for 66
sexual diploid organisms, the covariance between parent and offspring is higher than the 67
covariance between full sibs, and the covariance between half sibs is higher than the 68
covariance between an uncle and its nephew, despite the fact that the additive numerator 69
relationship between them is identical (0.5 and 0.25, respectively). 70
2
5
In animal and plant breeding, the standard procedure for estimating variance 71
components is by means of a linear mixed model (Henderson 1984) that includes 72
systematic or random environmental effects, one or more random genetic effects and a 73
residual. The variance components are estimated by using likelihood-based procedures 74
(Patterson and Thompson 1971) or Bayesian approaches (Gianola and Fernando 1986). 75
Under the Bayesian paradigm, the standard procedure for determining the posterior 76
distribution of the variance components uses the Gibbs Sampler algorithm (Gelfand and 77
Smith 1990) that involves an updated iterative sampling scheme from the full 78
conditional distributions of all the unknowns in the model. 79
The aim of this study is to present a Bayesian linear mixed model that allows to 80
estimate the heritable epigenetic variability, the reset coefficient and the epigenetic 81
transmission coefficient, based on genealogical and phenotypic information. The model 82
is illustrated with two simulated datasets and with one example that considers a beef 83
cattle database. 84
85
MATERIAL AND METHODS 86
Statistical Model 87
The standard mixed linear model is described by the following equation: 88
eZuXby 89
where y is the vector of phenotypic records, b is the vector of systematic effects, u is the 90
vector of random additive genetic effects, and e is the vector of residuals. Then, X and 91
Z, are the matrices that link the systematic and additive genetic effects with the data. 92
The usual assumption for the prior distribution of u and e are the following multivariate 93
Gaussian distributions (MVN): 94
2,~ uMVN A0u 2,~ eMVN I0e 95
6
where and are the additive genetic and residual variances, respectively, and A is 96
the numerator relationship matrix. Further, the prior distribution of b is commonly 97
assumed to be a uniform distribution. Conjugate priors for the variance components are 98
the following inverted chi-square distributions: 99
100
where and are the prior values of the variances and and are their 101
corresponding prior “degrees of belief”. 102
In order to estimate transgenerational epigenetic variability, this standard model can be 103
expanded to: 104
105
where w is the vector of individual epigenetic effects. Note that the incidence matrix (Z) 106
is the same for u and w and that both genetic and epigenetic effects are assumed to be 107
independent. The prior distribution of w is defined as: 108
109
where T is the matrix of epigenetic relationships between individuals and is the 110
transgenerational epigenetic variance. As before, the prior distribution of is defined 111
as: 112
113
with hyperparameters and . 114
The structure of the T matrix is defined by the recursive relationship between the 115
epigenetic effect of one individual ( iw ) with respect to the epigenetic effects of its 116
father ( fiw ) and mother ( miw ): 117
imifii www 118
2
u2
e
),(~ 212
uuu ns ),(~ 212
eee ns
2
us 2
es un en
eZwZuXby
),(~ 2
wMVN T0w
2
w
2
w
),(~ 212
www ns
2
ws wn
7
where 119
)1(2
1v 120
As defined by Tal et al. (2010), v is the reset coefficient and (1-v) is the epigenetic 121
transmission coefficient. The reset coefficient represents the proportion of epigenetic 122
marks across the parental genome that are expected to be erased, whereas its opposite, 123
the epigenetic transmission coefficient, indicates the proportion that are transmitted. 124
Further, is the residual epigenetic effect of the ith individual, independent from wi,, 125
and whose distribution is: 126
if both parents are known; 127
if only one ancestor in known; 128
assuming that the variance of transgenerational epigenetics effects (V()) is constant 129
across generations: 130
2
wmifii wVwVwV . 131
In matrix notation: 132
(1) 133
where the P matrix defines a recurrent relationship with the epigenetic effects of the 134
father and mother. For non-base individuals, the ith row of the P matrix contains a 135
parameter λ in the column pertaining to the father and mother of the ith individual. The 136
rest of elements are null. 137
Furthermore if: 138
139
then 140
112 ')(
PIVPITwV w 141
i
))21(,0(~ 22
wi N
ei ~ N(0,(1-l2 )sw
2 )
w= Pw+e
1
PIw
8
where V is a diagonal matrix with entries equal to for base individuals, 142
for individuals with one known ancestor, and for individuals 143
whose father and mother are known. The prior distribution for will be assumed to be 144
uniform, between 0 and 0.5. 145
146
Parameter estimation through a Gibbs Sampler 147
The Gibbs sampler algorithm is an iterative, updating sampling scheme that obtains 148
samples from the marginal posterior distributions of all the unknowns in a model. It 149
requires samples from the full conditional distributions of all the parameters. In the 150
proposed model, the set of parameters can be classified into three main groups: i) the 151
location parameters (b, u and w); ii) the parameter λ associated with matrix T; iii) the 152
variance components ( 2
u , 2
w and 2
e ). 153
The full conditional distributions for these groups are: 154
i) Sampling of the location parameters (b, u, w). 155
The conditional distributions of the location parameters are univariate Gaussian (Wang 156
et al. 1994), with parameters drawn from the following mixed model equations: 157
158
where 159
160
and and . 161
Specifically, the full conditional distribution for the ith location parameter (si) is: 162
2
w
22 )1( w 22)21( w
rCs
yZ
yZ
yX
r
w
u
b
s
TZZZZXZ
ZZAZZXZ
ZXZXXX
C
'
'
'
'''
'''
'''
1
1
2
2
u
e
2
2
w
e
9
163
given the multivariate Gaussian nature of the conditional distribution of the location 164
parameters. 165
A key limitation of the procedure is the calculation of the A-1 and T-1 matrices. Whereas 166
A-1 is calculated by the standard Henderson’s rules (Henderson 1976) only once, the T-1 167
matrix needs to be calculated afresh in each cycle as the parameter λ is updated. An 168
algorithm to set-up the T-1 matrix is presented in the appendix of this work (see 169
appendix). 170
ii) The full conditional distribution of . 171
The conditional distribution of is developed from the recursive definition of the 172
epigenetic effects (equation 1). It is a truncated univariate Gaussian distribution (TN) 173
between 0 and 0.5 due to the prior distribution of the parameter. 174
175
176
177
178
ii
e
ii
ji
jiji
icc
scr
Ns2
,~
,~ ]5.0,0[TNp w
ml =
(wfi + wmi )wi
s w
2 1- 2l 2( )( )i=1
n1
å +wfiwi
s w
2 1- l 2( )( )i=1
n2
å +wmiwi
s w
2 1- l 2( )( )i=1
n3
å
(wfi + wmi )2
s w
2 1- 2l 2( )( )i=1
n1
å +wfi
2
s w
2 1- l 2( )( )i=1
n2
å +wmi
2
s w
2 1- l 2( )( )i=1
n3
å
322
122
2
122
2
122
2
1121
)(
1n
i w
mi
n
i w
fin
i w
mifi wwww
10
where n1 is the number of individuals with known fathers and mothers, n2 is the
179
number of individuals with a known father only and n3 is the number of individuals with
180
a known mother.
181
iii) The full conditional distributions of the variance components ( ). 182
The conditional distributions for the variance components are the following inverted 183
chi-square distributions 184
eee
www
uuu
nnans
nnans
nnans
,'~
,'~
,'~
212
2112
2112
ee
wTw
uAu
185
where nan is the number of individuals in the population, ndat is the number of 186
phenotypic data and 2
xs and xn are the hyperparameters for 2
x . 187
188
Simulated Data Sets 189
In order to evaluate the procedure we simulated two different datasets. The first one had 190
a relative high percentage of variation caused by additive and transgenerational 191
epigenetic effects (35% and 20%, respectively) and an intermediate reset coefficient 192
(v=0.40). In the second dataset, a lower percentage of variation was explained by 193
additive genetic and transgenerational epigenetic effects (15% and 10%, respectively) 194
and the reset coefficient was higher (v=0.80). Each dataset was composed by a three 195
generation population. The base populations included 3,000 individuals (1,500 sires and 196
1,500 dams) and each of the two subsequent generations were composed by 3,000 full 197
sib families of 10 individuals each one. The sire and dam for each family were sampled 198
randomly from the individuals of the previous generation. The genetic and epigenetic 199
s u
2,sw
2,s e
2
11
effects for each individual (ui and wi) in the base population were generated from the 200
following Gaussian distributions: 201
),0(~
),0(~
2
2
wi
ui
Nw
Nu
202
Further, the genetic and epigenetic effects for the individuals (uj and wj) of the second 203
and third generation were obtained by sampling from: 204
))1(,(~
)2
,2
1
2
1(~
22
2
wmjfjj
u
mjfjj
wwNw
uuNu
205
where fiu and mju , and fiw and mjw were the additive genetic and transgenerational 206
epigenetic effects of the father and the mother of the jth individual, respectively. In 207
addition, one phenotypic record (yi) was generated for every individual by:
208
),(~ 2
eiii wuNy 209
where was a general mean set to 100 units. The values of the parameters for the two 210
datasets were: 211
Dataset 1.
212
40.0,20.0,35.0,30.0,270,120,210 22222 vhewu
213
being 214
)/( 22222
ewuuh and )/( 22222
ewuw
215
Dataset 2. 216
80.0,10.0,15.0,10.0,450,60,90 22222 vhewu 217
218
The Gibbs sampler was implemented using own software written in Fortran 95. 219
The analysis consisted of a single chain of 1,250,000 cycles and the first 250,000 were 220
discarded. Each analysis took 36 hours using a single thread of an Intel Xeon E5-2650 221
12
of 2.00 GHz. The source code is included as supplementary information. Convergence 222
was checked by the visual inspection of the chains and with the test of Raftery and 223
Lewis (1992). All samples were stored for calculating summary statistics. 224
225
Pirenaica Beef Cattle Data 226
As an illustrative example, we used a database of phenotypic records from the yield 227
recording system of the Pirenaica beef cattle breed. The Pirenaica breed is a meat-type 228
beef population from northern Spain with an approximate census of 20,000 individuals 229
that are typically reared under extensive conditions (Sánchez et al. 2002). The data set 230
was made up of 78,209 records for Birth Weight (BW) with an average value of 41.52 231
kg and a raw standard deviation of 4.65 kg. In addition, a pedigree file including 232
125,974 individual-sire-dam records was utilised. This information was provided by the 233
National Pirenaica Breeders Confederation (Confederación Nacional de Asociaciones 234
de Ganado Pirenaico; http://www.conaspi.net). Animal Care and Use Committee 235
approval was not required for this study as field data was obtained from the Yield 236
Recording System of the Pirenaica breed; furthermore, data was recorded by the 237
stockbreeders themselves, under standard farm management, with no additional 238
requirements. 239
The full model of analysis was: 240
ewZhZpZmZuZXby 13221 241
Where b is the vector of fixed systematic effects that included sex (2 levels) and age of 242
the mother (16 levels), u and m are the vectors of direct and maternal additive genetic 243
effects with 125,974 elements, p is the vector of permanent environmental maternal 244
effects (21,143 levels), h is the vector of the random herd-year-season effects (12,925 245
levels), w is the vector of transgenerational epigenetic effects (125,974 levels), and e is 246
13
the vector of the residuals. Further, X, Z1, Z2 and Z3 are the incidence matrices that link 247
the different effects with the phenotypic data. Appropriate uniform bounded 248
distributions were assumed for the systematic effects and for each variance component (249
},,,{ 22222
ewhpx ), defined by hyperparameters (nx=-2 and s2x=0). Further, the 250
prior distribution for the (co) variance matrix between direct and maternal genetic 251
effects (G) was the following inverted Wishart: 252
),(, 00 GGG GG nIWnp 253
where 254
2
2
mum
umu
G 255
being 2
u and 2
m the direct and maternal genetic variances and um the covariance 256
between them. In this case, Gn was set to -3 and G0 to a 2 × 2 matrix of zeroes to define 257
a uniform distribution. 258
Two statistical models (I and II) were fitted. Model I includes transgenerational 259
epigenetic variance ( 2
w ), whereas Model II does not. For each model, the Gibbs 260
sampler was implemented with a single chain of 3,250,000 cycles and the first 250,000 261
were discarded. Convergence was checked by the visual inspection of the chains and 262
with the test of Raftery and Lewis (1992). The models were compared based on the 263
pseudo-log-marginal probability of the data (Gelfand 1996, Varona and Sorensen 2014) 264
by computing the logarithm of the Conditional Predictive Ordinate (LogCPO) 265
calculated from the MCMC samples. It was calculated as: 266
i
kii MypLogCPO ,ln y
267
where y-i is the vector of data with the ith datum (yi) deleted, Mk is the kth model and 268
14
1
1 ,
1,
Ns
j k
j
ki
kiiMyp
NsMyp
y 269
being Ns is the number of MCMC draws, j
k is the jth draw from the posterior 270
distribution of the parameters of the kth model. 271
272
RESULTS AND DISCUSSION 273
From the perspective of quantitative genetics, the estimation of variance components is 274
obtained from the statistical information provided by the covariance between the 275
phenotypes of relatives (Falconer and Mackay 1996). Tal et al. (2010) proposed a 276
simple model for the resemblance between relatives under transgenerational epigenetic 277
inheritance for asexual and sexual reproduction. This model assumes that 278
transgenerational epigenetic effects are not correlated with the additive genetic ones, 279
and that they are distributed under a Gaussian law. After the central limit theorem, this 280
may be explained by a large number of epigenetic marks randomly distributed across 281
the genome. In addition, the distribution pattern of these marks is assumed independent 282
between individuals unless they are relatives. In that case, some of these marks should 283
have not been erased during the meiosis drawing them apart from their common 284
ancestor. The population rate of erasure of these marks is measured through the reset 285
coefficient (v). 286
In this study, we propose a procedure that makes use of the aforementioned authors' 287
definition for sexual diploid organisms to estimate the parameters under a Bayesian 288
mixed model framework. Our approach makes it feasible to estimate transgenerational 289
epigenetic variability from huge datasets of genealogical and phenotypic data. The key 290
to the procedure is the definition of a T matrix of (co) variance between 291
transgenerational epigenetic effects. This matrix exclusively depends on a single 292
15
parameter ( ) which is directly related to the reset coefficient (v) put forward by Tal et 293
al. (2010). As an example, for a simple pedigree of 7 individuals (Figure 1), the T 294
matrix is: 295
296
As Tal et al. (2010) stated, if we compare the covariance between full sibs (individuals 297
5 and 6) and the covariance between sire-offspring (individuals 1 and 4), the covariance 298
is higher for the sire offspring pair, vs. , as is defined as being within the 299
interval (0, 0.5). Note that if the reset coefficient is 0, then λ is ½ and the T matrix 300
becomes equivalent to the numerator relationship matrix (A), where the covariances 301
between full sibs and sire offspring are equivalent. Similarly, the uncle-nephew 302
covariance (individuals 5 and 7) is , lower than the covariance between half sibs 303
(individuals 4 and 5) which is only . 304
The mixed model implementation of the covariance between relatives requires the 305
inverse of the T matrix to construct the mixed model equations, in a similar manner as 306
the numerator relationship matrix in the standard model (Henderson 1984). In this 307
study, we describe a simple procedure for the calculation of this T-1 matrix (see 308
appendix). The procedure takes into account the recursive nature of the 309
transgenerational epigenetic effects, using an argument equivalent to Quass (1976), for 310
the inverse of numerator relationship matrix (A-1), and Quintanilla et al. (1999), for 311
dam-related permanent maternal environmental effects. The main consequence is that 312
the inverse of the T matrix can be sequentially constructed by reading the pedigree of 313
120
120
2210
10
0100
000010
001
3322
22
322
322
2
2
T
22
32
2
16
the population, in the light of very simple rules. These rules are close to Henderson's 314
(1976) rules for constructing the inverse of the A matrix. 315
For the example pedigree, the T-1 matrix is: 316
317
318
Given this algorithm to set-up the T-1 matrix, the implementation of a Gibbs sampling 319
approach becomes straightforward; it merely requires sequential iterative sampling from 320
Gaussian (b, u and w), truncated Gaussian ( ) and inverted chi-square distributions (321
). The only minor complication is that the T-1 matrix depends on parameter 322
and consequently must be calculated in each Gibbs sampler’s iteration. One possible 323
alternative to avoid this matrix inversion, although not to avoid its updating in each 324
cycle, is to implement an approach similar to the one proposed by Rodríguez-Ramilo et 325
al. (2014) for the genomic relationship matrix. However, setting up the T matrix is 326
computationally more demanding than the calculation of its inverse with the algorithm 327
proposed in this study. 328
The proposed procedure was first checked with simulated data. The summary of the 329
marginal posterior distributions for the variance components, additive genetic (h2) and 330
epigenetic ( 2 ) heritability, reset coefficient (v) and epigenetic transmission coefficient 331
(1-v) for both cases of simulation are presented in Table 1. The highest posterior density 332
)1(
1
)1(00000
)1()1()21(
100
)21(0
)21(
00)21(
10
)21(0
)21(
000)21(
10
)21()21(
0)21()21(
0)21(
2100
000)21(
0)21(
10
0)21()21()21(
00)21(
31
22
22
2
222
222
222
222
2
22
2
2222
2
1
T
s u
2,sw
2,s e
2
17
at 95 % for all parameters in the model included the simulated values. However, some 333
details of the results should be highlighted. In the second case of simulation, with lower 334
2
w and , the posterior standard deviation (and the HPD95) for the 2
w and 2
e were 335
remarkable wider (66.19 and 67.42, respectively). The cause of this phenomenon is that 336
there is a statistical confusion between the epigenetic and residual variance components 337
when the reset coefficient (v) is very high, because the T and I matrices becomes very 338
similar. To illustrate this fact, in Figures 2 and 3, we present the joint posterior densities 339
of 2 and v for both cases of simulation. The results of the first case of simulation 340
showed posterior independence between both parameters, whereas, in the second, the 341
marginal posterior density presented a half-moon shape. This indicates that for large 342
values of v, 2 (and 2
w ) may take any value on its support with a fairly equal 343
probability. In other words, in the first simulated case the model showed very good 344
ability to discriminate between both parameters, whereas in the second it did not. In 345
addition, mixing of the McMC procedure in the second case of simulation was clearly 346
worst, and adaptive McMC algorithms, such as the proposed by Mathew et al. (2012) 347
may represent an interesting alternative for its implementation in large data sets. 348
It must be noted that the sources of information for the estimation of 2
w and v (or ) 349
comes from the comparison between the covariance between different categories of 350
relatives (see Table 2). Following Tal et al. (2010), one estimate of v can be achieved 351
from the difference between the estimates of covariance between half-sib and uncle-352
nephew relationships, a difference that becomes almost null for low values of (or 353
high v), as seen in the second case of simulation (23.10 vs 22.62). On the contrary, with 354
higher values of (or lower v), as in the first case of simulation, this difference is 355
higher (60.60 vs. 57.36), and, thus, the amount of information available for estimation 356
purposes increases. This uncertainty in the estimation of (or v) is also reflected in the 357
18
estimation of 2
w ( 2 ) and 2
e and implies wider posterior marginal densities. 358
Nevertheless, it is important to emphasize that the joint posterior mode (v = 0.80, 2 = 359
0.13) in the second case of simulation was very close to the simulated parameters (0.80 360
and 0.10, respectively). 361
We have also checked our model with a real dataset. In Table 3, summary statistics of 362
the marginal posterior distributions for the parameters and LogCPO values are 363
presented for models I and II in the analysis of Birth Weight from the Pirenaica beef 364
cattle population. The posterior mean estimates for heritability ranged between 0.38 365
(Model I) to 0.41 (Model II), and the posterior mean estimates for transgenerational 366
epigenetic heritability was only 0.04 (Model I), with a highest posterior density at 95% 367
that ranged between 0.00 and 0.11. These results are coherent with the output of the 368
model comparison test (LogCPO), which pointed to Model II as the more plausible. If 369
we focus in this latter model, the estimates of direct and maternal heritability were 370
within the range of estimates in the literature for these traits (Meyer 1992; Varona et al. 371
1999; Jamrozik et al. 2014), and the absence of the epigenetic transgenerational 372
heritability is coherent with the fact that most of the epigenetic marks are erased during 373
the meiosis in mammals (Jablonka and Raz 2009, Schmitz 2014). However, it should be 374
highlighted that the posterior mean estimate of the reset coefficient under Model I was 375
surprisingly low (0.20), although its posterior standard deviation was very high (0.20), 376
and that the HPD95 region covered almost all the parametric space (0.01-0.89). This 377
wide range reflects the absence of information to properly estimate the parameter, given 378
the low magnitude of 2
w . 379
It is worth noting that the procedure can also provide epigenetic breeding values in the 380
fields of animal and plant breeding. These are calculated by weighting the phenotypic 381
information of relatives according to the magnitude of the elements of the T matrix. In 382
19
fact, for epigenetic effects, the weight of distantly related individuals is lowered as the 383
number of opportunities to reset the epigenetic marks increases. Both additive genetic 384
and epigenetic effects can be used for prediction of the future performance of the 385
individual: the expected genetic response (R) after one cycle of mass selection should 386
be , where i is the intensity of selection and is the phenotypic 387
variance. However, further research must be undertaken in order to develop adequate 388
indices of selection that consider genetic and epigenetic effects. Although both of them 389
affect the immediate future performance of the offspring, epigenetic effects are diluted 390
in future generations as the epigenetic marks are cleaned. It is important to note that if 391
this selection pressure is relaxed, the average epigenetic effect declines to zero with a 392
rate of , where n is the number of generations without selection. 393
It should be further noted that the proposed model uses a very basic definition of 394
transgenerational epigenetic inheritance, as it assumes equal epigenetic variance for all 395
individuals in the population. However, when phenotypic records are recorded through 396
the life of the individual, transgenerational epigenetic variance can be modelled as age 397
dependent, based on the consideration that the number of epigenetic marks was 398
accumulated in the genome of the individuals. Moreover, in the proposed model, it is 399
assumed that the parameter (or the reset coefficient) is equal for fathers and mothers, 400
and, in future research, it seems reasonable to assume a different transmission 401
coefficient for males and females, allowing for the presence of sex differential genomic 402
imprinting (Barlow 1997; Reik and Walter 1998). The model can be refined by the 403
inclusion of a hierarchical Bayesian paradigm that includes systematic effects for any 404
environmental factors that may influence the epigenetic transmission coefficient. In 405
addition, a more profound approach could even allow for the consideration of a genetic 406
222 1 pvhiR 2
p
nv1
20
determinism of the reset coefficient with a model similar to that proposed by Varona et 407
al. (2008) for the degree of asymmetry of a skewed Gaussian distribution. 408
Finally, and as mentioned by Jablonka and Law (2009) and Tal et al. (2010), 409
epigenetics may be viewed as a more wide ranging concept which could include several 410
types of cultural transmission (Avital and Jablonka 2000; Richerson and Boyd 2005). 411
The procedure suggested in this work could also be applied to the analysis of human or 412
animal datasets that involve any kind of transgenerational transmission, even those not 413
directly related to gene expression. 414
By way of a conclusion, we can say that this paper presents an original procedure for 415
the estimation of transgenerational epigenetic variability based on some generalised 416
assumptions. It is hoped that the proposal will lead to future research on variations of 417
epigenetic transmission abilities caused by environmental and/or genetic factors. 418
419
ACKNOWLEDGEMENTS 420
This research was funded by grant AGL2010-15903 (Ministerio de Ciencia e 421
Innovación, Spain). The authors would like to thank the Pirenaica Breeders Association 422
(Confederación Nacional de Asociaciones de Ganado Pirenaico) for the provision of 423
data used in this work. We also want to thank the useful comments of the editor and the 424
reviewers. 425
426
REFERENCES 427
Avital, E., and E. Jablonka, 2000 Animal Traditions: Behavioural Inheritance in 428
Evolution. Cambridge University press, Cambridge. 429
Barlow, D. P., 1997 Competition - A common motif for the imprinting mechanism?. 430
EMBO J. 16: 6899-6905. 431
21
Eccleston, A., N. DeWitt, C. Gunter, B. Marte, and D. Nath, 2007 Introduction 432
Epigenetics. Nature. 447:395-400. 433
Esteller, M., 2008. Epigenetics of Cancer. N. Engl. J. Med. 358:1148-1159. 434
Falconer, D. S., and T. F.C. Mckay, 1996 Introduction to Quantitative Genetics, Ed 4. 435
Longmans Green, Harlow, Essex, UK. 436
Feinberg, A. P., 2007 Phenotypic plasticity and the epigenetics of human disease. 437
Nature 447: 433-440. 438
Gelfand, A. E., 1996 Model determination using sampling-based methods, pp. 145–161 439
in Markov Chain Monte Carlo in Practice, edited by W. R. Gilks, S. Richardson, and D. 440
J. Spiegelhalter. Chapman & Hall, London. 441
Gelfand, A. E., and A. F. M. Smith, 1990 Sampling-based approaches to calculating 442
marginal densities. Journal of the American Statistical Association 85:398-409. 443
Gianola, D., and R. L. Fernando, 1986 Bayesian methods in animal breeding theory. J. 444
Anim. Sci. 63:217-244. 445
Grossniklaus, U., W. G. Kelly, A. C. Ferguson-Smith, M. Pembrey, and S. Lindquist, 446
2013 Transgenerational epigenetic inheritance: how important is it?. Nat. Rev. Genet. 447
14: 228-235. 448
Heard, E., and R. A. Martienssen, 2014 Transgenerational epigenetic inheritance: Myths 449
and mechanisms. Cell 157: 95-109. 450
Henderson, C. R., 1976 A simple method for computing the inverse of a numerator 451
relationship matrix used in the prediction of breeding values. Biometrics 32:69-83. 452
Henderson, C. R., 1984 Application of Linear Models in Animal Breeding. University of 453
Guelph, Guelph, ON. 454
Henderson, I.R., and S. E. Jacobsen, 2007 Epigenetic inheritance in plants. Nature 455
447:418-424. 456
22
Jablonka, E., and G. Raz, 2009 Transgenerational epigenetic inheritance: Prevalence, 457
mechanisms and implications for the study of heredity and evolution. Q Rev Biol 458
84:131-176. 459
Jamrozik, J., and S. P. Miller, 2014 Genetic evaluation of calving ease in Canadian 460
Simmentals using birth weight and gestation length as correlated traits. Lives. Sci. 461
162:42-49. 462
Law, J. A., and S. E. Jacobsen, 2010 Establishing, maintaining and modifying DNA 463
methylation patterns in plants and animals. Nat Rev Genet 11:204-220. 464
Mathew, B., A. M. Bauer, P. Koistinen, T. C. Reetz, J. Léon, and M. J. Sillanpää. 2012. 465
Bayesian adaptive Markov chain Monte carlo estimation of genetic parameters. 466
Heredity 109:235-234. 467
Meyer, K., 1992 Variance components due to direct and maternal effects for growth 468
traits of Australian beef cattle. Lives. Prod. Sci. 11:143-177. 469
Patterson, H. D., and R. Thompson, 1971 Recovery of inter-block information when 470
block sizes are unequal. Biometrika 58:545-554. 471
Quass, R.L., 1976 Computing the diagonal elements of the inverse of a large numerator 472
relationship matrix. Biometrics 32:949-953. 473
Quintanilla, R., L. Varona, M. R. Pujol, and J. Piedrafita, 1999 Maternal animal model 474
with correlation between maternal environmental effects of related dams. J. Anim. Sci. 475
77:2904-2917. 476
Raftery, A.E., and S. M. Lewis, 1992 How many iterations in the Gibbs Sampler, pp. 477
763-774 in Bayesian Statistics IV, edited by J. M. Bernardo, J. O. Berger, A. P. Dawid 478
and A. F. M. Smith. Oxford University Press. New York. 479
Reik, W., 2007 Stability and flexibility of epigenetic gene regulation in mammalian 480
development. Nature 447:425-432. 481
23
Reik, W., and J. Walker, 1998 Imprinting mechanisms in mammals. Curr. Opin. Genet. 482
Dev. 8:154-164. 483
Richerson, P. J., and R. Boyd, 2005 Not by Genes Alone: How Culture Transformed 484
Human Evolution. University of Chicago Press. Chicago. 485
Rodríguez-Ramilo, S. T., L. A. García-Cortés, O. González-Recio, 2014 Combining 486
genomic and genealogical information in a reproducing kernel Hilbert spaces regression 487
model for genome-enabled predictions in dairy cattle. PLoS ONE 9:e93424 488
Sánchez, A., J. Ambrona, and L. Sánchez, 2002 Razas ganaderas españolas bovinas. 489
MAPA-FEAGAS. Madrid. SPAIN. 490
Schmitz, R. J., 2014 The secret garden – epigenetic alleles underlies complex traits. 491
Science 343:1082-1083. 492
Tal, O., E. Kisdi, and E. Jablonka, 2010 Epigenetic contribution to covariance between 493
relatives. Genetics. 184:1037-1050. 494
Varona, L., N. Ibañez-Escriche, R. Quintanilla, J. L. Noguera, and J. Casellas, 2008 495
Bayesian analysis of quantitative traits using skewed distributions. Genetics Research 496
90:179-190. 497
Varona, L., I. Misztal, and J. K. Bertrand, 1999 Threshold-linear versus linear-linear 498
analysis of birth weight and calving ease using an animal model: I. Variance component 499
estimation. J. Anim. Sci. 77:1994-2002. 500
Varona, L., D. Sorensen, 2014 Joint Analysis of binomial and continuous traits with 501
recursive model: a case study using mortality and litter size in pigs. Genetics 196:643-502
651. 503
Wang, C. S., J. J. Rutledge, and D. Gianola, 1994 Bayesian analysis of mixed linear 504
models via Gibbs Sampling with an application to litter size in Iberian pigs. Genet. Sel. 505
Evol. 26:91-115. 506
24
Youngson, N. A., and E. Whitelaw, 2008 Transgenerational epigenetic effects. Annu. 507
Rev. Genomics Hum. Genet. 9:233-257. 508
509
25
APPENDIX 510
Derivation of the T-1 matrix. 511
The vector of epigenetic effects (w) can be represented as: 512
513
where the P matrix defines a recurrent relationship with the epigenetic effects of the 514
father and mother. For non-base individuals, the ith row of the P matrix contains a 515
parameter λ in the column pertaining to the father and mother of the ith individual. The 516
rest of the elements are null. 517
Furthermore, if: 518
519
then 520
521
given that 522
PIVPITwV 1
2
11'
1
w
(A1) 523
from , it is possible to define a QT matrix as: 524
'1
2PITPIVQT
w (A2) 525
If we consider the structure of and , the QT matrix has a diagonal 526
structure with 1’s for base individuals and and for those with one or two 527
known parents, respectively. 528
Finally, replacing (A2) into (A1): 529
530
w= Pw+e
1
PIw
11')(
PIPIwV V
2
wTwV
wPI
PI 'PI
21 221
PIQPIT T 11 '
26
Taking advantage of this structure, it is possible to define the following algorithm for 531
constructing T-1 by reading the pedigree information and calculating for each individual 532
(i) in the pedigree: 533
a) If the father and mother are unknown; 534
add 1 to the diagonal (i,i) 535
b) If only one parent (p) is known; 536
add to the diagonal (i,i) 537
add to the elements (i,p), (p,i) 538
add to the element (p,p) 539
c) If both ancestors (p and m) are known; 540
add to the diagonal (i,i) 541
add to the elements (i,p), (i,m), (m,i), (p,i) 542
add to the elements (p,p), (p,m), (m,p), (m,m) 543
544
21
1
21
2
2
1
221
1
221
2
2
21
27
Table 1. Posterior Mean Estimate (PM), Posterior Standard Deviation (PSD) and 545
Highest Posterior Density at 95 % (HPD95) for simulation cases I and II. 546
Case I
su
2 = 210sw
2 =120se
2 = 270 l = 0.30
Case II
10.04506090 222 ewu
Parameter PM PSD HPD95 PM PSD HPD95
2
u 217.87 18.21 174.66-247.49 90.91 4.63 81.42-99.25
2
w 132.92 14.42 106.30-164.53 111.99 66.19 37.45-288.61
2
e 251.00 19.71 205.07-282.31 396.30 67.42 217.79-475.16
2h 0.362 0.029 0.291-0.408 0.152 0.007 0.136-0.164
2 0.221 0.024 0.176-0.274 0.187 0.110 0.062-0.481
0.256 0.055 0.148-0.365 0.091 0.059 0.020-0.233
v 0.488 0.111 0.270-0.704 0.818 0.117 0.534-0.960
1-v 0.512 0.111 0.296-0.730 0.182 0.117 0.040-0.466
547
s u
2 is the additive genetic variance, s w
2 is the transgenerational epigenetic variance and 548
s e
2 is the residual variance. Moreover, h2 is the heritability, g 2is the transgenerational 549
epigenetic heritability, l is the autorecursive parameter, v is the reset coefficient and 1-550
v the epigenetic transmission coefficient. 551
552
553
554
555
28
Table 2. Expected covariance between relatives in cases of simulation I and II 556
Relatives Expected
Covariance
Case I
30.0120210 22 wu
Case II
10.06090 22 wu
Offspring-
Progeny
22
2
1wu
132 51
Full-Sibs 222 22
1wu
121.2 46.2
Half-Sibs 222
4
1wu
60.6 23.1
Uncle-Nephew 232 24
1wu
57.36 22.62
557
s u
2 is the additive genetic variance, s w
2 is the transgenerational epigenetic variance and 558
l is the autorecursive parameter. 559
560
29
561
Table 3. Posterior Mean Estimate (PM), Posterior Standard Deviation (PSD) and 562
Highest Posterior Density at 95 % (HPD95) for models I and II. 563
Model I Model II
PM PSD HPD95 PM PSD HPD95 2
u 9.033 0.914 7.059, 10.449 9.897 0.408 9.123, 10.727
2
m 3.574 0.212 3.168, 3.998 3.554 0.217 3.135, 3.981
um -4.473 0.253 -4.979, -3.990 -4.508 0.259 -5.032, -4.013
2
p 0.747 0.081 0.590, 0.906 0.765 0.079 0.611, 0.922
2
w 0.876 0.820 0.000, 2.701 - - -
2
h 2.634 0.069 2.501, 2.771 2.633 0.069 2.499, 2.770
2
e 7.055 0.223 6.606, 7.478 7.156 0.209 6.735, 7.556
0.398 0.103 0.056, 0.494 - - -
v 0.204 0.207 0.012, 0.888 - - -
1-v 0.796 0.207 0.112, 0.988 - - - 2h 0.377 0.035 0.299, 0.427 0.412 0.012 0.389, 0.436 2m 0.149 0.007 0.135, 0.164 0.148 0.007 0.133, 0.162
2 0.036 0.034 0.000, 0.113 - - -
LogCPO -287542.3 -287475.1
564
s u
2 is the additive genetic variance, s m
2 is the maternal environmental variance, um is 565
the covariance between them, 2
p is the permanent maternal environmental variance, 566
2
h is the herd-year-season variance, s w
2 is the transgenerational epigenetic variance 567
and s e
2 is the residual variance. Moreover, h2 is the heritability, 2m is the maternal 568
heritability, g 2 is the transgenerational epigenetic heritability, l is the autorecursive 569
parameter, v is the reset coefficient and 1-v the epigenetic transmission coefficient. 570
LogCPO is the logarithm of the conditional predictive ordinate. 571
572
31
Figure 2. Joint posterior distribution of the transgenerational epigenetic heritability ( 2575
) and the reset coefficient (v) in the first case of simulation. 576
577
578