A Bayesian Model for the Analysis of Transgenerational Epigenetic Variation.


Luis Varona*§, Sebastián Munilla*†, Elena Flavia Mouresan*, Aldemar González-Rodríguez*, Carlos Moreno*§, Juan Altarriba*§.

* Unidad de Genética Cuantitativa y Mejora Animal, Universidad de Zaragoza, 50013, Zaragoza, SPAIN.

§ Instituto de Biocomputación y Física de los Sistemas Complejos (BIFI), Universidad de Zaragoza, 50018, Zaragoza, SPAIN.

† Departamento de Producción Animal, Facultad de Agronomía, Universidad de Buenos Aires, 1417, Ciudad Autónoma de Buenos Aires, ARGENTINA

G3: Genes|Genomes|Genetics Early Online, published on January 23, 2015 as doi:10.1534/g3.115.016725

© The Author(s) 2013. Published by the Genetics Society of America.

RUNNING TITLE: Epigenetic Variance Component Estimation

KEYWORDS: Epigenetics, Bayesian Analysis, Genetic Variance, Resemblance Between Relatives.

CORRESPONDING AUTHOR:

Luis Varona

Unidad de Genética Cuantitativa y Mejora Animal.

Facultad de Veterinaria. Universidad de Zaragoza

c/ Miguel Servet 177. 50013. ZARAGOZA. SPAIN

Phone: (34)976-761623

Fax: (34)976-761612

e-mail: [email protected]

ABSTRACT

Epigenetics has become one of the major areas of biological research. However, the degree of phenotypic variability that is explained by epigenetic processes remains unclear. From a quantitative genetics perspective, variance components are estimated from the information provided by the resemblance between relatives. In a previous study, this resemblance was described as a function of the epigenetic variance component and a reset coefficient that indicates the rate of dissipation of epigenetic marks across generations. Given these assumptions, we propose a Bayesian mixed model methodology that allows the estimation of epigenetic variance from a genealogical and phenotypic database. The methodology is based on the development of a T matrix of epigenetic relationships that depends on the reset coefficient. In addition, we present a simple procedure for the calculation of the inverse of this matrix ($\mathbf{T}^{-1}$) and a Gibbs sampler algorithm that obtains posterior estimates of all the unknowns in the model. The new procedure was used with two simulated data sets and with a beef cattle database. In the simulated populations, the results of the analysis provided marginal posterior distributions that included the population parameters in the regions of highest posterior density. In the case of the beef cattle dataset, the posterior estimate of transgenerational epigenetic variability was very low and a model comparison test indicated that a model that did not include it was the most plausible.

INTRODUCTION

Epigenetics studies variations in gene expression that are not caused by modifications of the DNA sequence (Eccleston et al. 2007) and is now considered one of the most important fields of biological research. Several biochemical mechanisms that alter gene activity (DNA methylation, histone modifications, etc.) underpin epigenetic processes (Jablonka and Raz 2009; Law and Jacobsen 2010). A number of studies have emphasised the importance of epigenetics in cancer (Esteller 2008) and other human illnesses (Feinberg 2007), as well as in other mammal (Reik 2007) and plant traits (Henderson and Jacobsen 2007). Gene activity modifications may occasionally occur in sperm or egg cells and can be transferred to the next generation, which is denoted as transgenerational epigenetic inheritance (Youngson and Whitelaw 2008), a phenomenon that has been reported in a wide range of organisms (Jablonka and Raz 2009). Despite this, the magnitude of the phenotypic variation that is explained by epigenetic processes remains unclear (Grossniklaus et al. 2013; Heard and Martienssen 2014).

From the perspective of quantitative genetics, the presence of transgenerational epigenetic inheritance involves a redefinition of the covariance between relatives. Tal et al. (2010) developed a model for the calculation of the covariance between relatives for asexual and sexual reproduction as a function of epigenetic heritability ($\gamma^2$), the reset coefficient ($v$), and its complement, the epigenetic transmission coefficient ($1-v$). According to these authors, the covariance between relatives is reduced as the number of opportunities to dissipate (or reset) the epigenetic marks increases. Therefore, for sexual diploid organisms, the covariance between parent and offspring is higher than the covariance between full sibs, and the covariance between half sibs is higher than the covariance between an uncle and its nephew, despite the fact that the additive numerator relationship within each pair of categories is identical (0.5 and 0.25, respectively).

In animal and plant breeding, the standard procedure for estimating variance components is by means of a linear mixed model (Henderson 1984) that includes systematic or random environmental effects, one or more random genetic effects and a residual. The variance components are estimated by using likelihood-based procedures (Patterson and Thompson 1971) or Bayesian approaches (Gianola and Fernando 1986). Under the Bayesian paradigm, the standard procedure for determining the posterior distribution of the variance components uses the Gibbs sampler algorithm (Gelfand and Smith 1990), an iterative scheme that repeatedly samples from the full conditional distributions of all the unknowns in the model.

The aim of this study is to present a Bayesian linear mixed model that allows the estimation of the heritable epigenetic variability, the reset coefficient and the epigenetic transmission coefficient, based on genealogical and phenotypic information. The model is illustrated with two simulated datasets and with one example that considers a beef cattle database.

MATERIAL AND METHODS

Statistical Model

The standard mixed linear model is described by the following equation:

$$\mathbf{y} = \mathbf{Xb} + \mathbf{Zu} + \mathbf{e}$$

where $\mathbf{y}$ is the vector of phenotypic records, $\mathbf{b}$ is the vector of systematic effects, $\mathbf{u}$ is the vector of random additive genetic effects, and $\mathbf{e}$ is the vector of residuals. $\mathbf{X}$ and $\mathbf{Z}$ are the incidence matrices that link the systematic and additive genetic effects with the data. The usual assumptions for the prior distributions of $\mathbf{u}$ and $\mathbf{e}$ are the following multivariate Gaussian distributions (MVN):

$$\mathbf{u} \sim MVN(\mathbf{0}, \mathbf{A}\sigma_u^2), \qquad \mathbf{e} \sim MVN(\mathbf{0}, \mathbf{I}\sigma_e^2)$$

where $\sigma_u^2$ and $\sigma_e^2$ are the additive genetic and residual variances, respectively, and $\mathbf{A}$ is the numerator relationship matrix. Further, the prior distribution of $\mathbf{b}$ is commonly assumed to be a uniform distribution. Conjugate priors for the variance components are the following inverted chi-square distributions:

$$\sigma_u^2 \sim \chi^{-2}(s_u^2, n_u), \qquad \sigma_e^2 \sim \chi^{-2}(s_e^2, n_e)$$

where $s_u^2$ and $s_e^2$ are the prior values of the variances and $n_u$ and $n_e$ are their corresponding prior "degrees of belief".

In order to estimate transgenerational epigenetic variability, this standard model can be expanded to:

$$\mathbf{y} = \mathbf{Xb} + \mathbf{Zu} + \mathbf{Zw} + \mathbf{e}$$

where $\mathbf{w}$ is the vector of individual epigenetic effects. Note that the incidence matrix ($\mathbf{Z}$) is the same for $\mathbf{u}$ and $\mathbf{w}$ and that genetic and epigenetic effects are assumed to be independent. The prior distribution of $\mathbf{w}$ is defined as:

$$\mathbf{w} \sim MVN(\mathbf{0}, \mathbf{T}\sigma_w^2)$$

where $\mathbf{T}$ is the matrix of epigenetic relationships between individuals and $\sigma_w^2$ is the transgenerational epigenetic variance. As before, the prior distribution of $\sigma_w^2$ is defined as:

$$\sigma_w^2 \sim \chi^{-2}(s_w^2, n_w)$$

with hyperparameters $s_w^2$ and $n_w$.

The structure of the $\mathbf{T}$ matrix is defined by the recursive relationship between the epigenetic effect of one individual ($w_i$) and the epigenetic effects of its father ($w_{fi}$) and mother ($w_{mi}$):

$$w_i = \lambda (w_{fi} + w_{mi}) + \varepsilon_i$$

where

$$\lambda = \tfrac{1}{2}(1 - v)$$

As defined by Tal et al. (2010), $v$ is the reset coefficient and $(1-v)$ is the epigenetic transmission coefficient. The reset coefficient represents the proportion of epigenetic marks across the parental genome that are expected to be erased, whereas its complement, the epigenetic transmission coefficient, indicates the proportion that is transmitted. Further, $\varepsilon_i$ is the residual epigenetic effect of the ith individual, independent of the parental effects $w_{fi}$ and $w_{mi}$, whose distribution is:

$$\varepsilon_i \sim N(0, (1 - 2\lambda^2)\sigma_w^2)$$ if both parents are known;

$$\varepsilon_i \sim N(0, (1 - \lambda^2)\sigma_w^2)$$ if only one ancestor is known;

assuming that the variance of the transgenerational epigenetic effects ($V(\cdot)$) is constant across generations:

$$V(w_i) = V(w_{fi}) = V(w_{mi}) = \sigma_w^2.$$

In matrix notation:

$$\mathbf{w} = \mathbf{Pw} + \boldsymbol{\varepsilon} \qquad (1)$$

where the $\mathbf{P}$ matrix defines a recurrent relationship with the epigenetic effects of the father and mother. For non-base individuals, the ith row of the $\mathbf{P}$ matrix contains the parameter $\lambda$ in the columns pertaining to the father and mother of the ith individual. The remaining elements are zero.

Furthermore, if:

$$\mathbf{w} = (\mathbf{I} - \mathbf{P})^{-1}\boldsymbol{\varepsilon}$$

then

$$V(\mathbf{w}) = \mathbf{T}\sigma_w^2 = (\mathbf{I} - \mathbf{P})^{-1}\,\mathbf{V}\,(\mathbf{I} - \mathbf{P}')^{-1}$$

where $\mathbf{V}$ is a diagonal matrix with entries equal to $\sigma_w^2$ for base individuals, $(1 - \lambda^2)\sigma_w^2$ for individuals with one known ancestor, and $(1 - 2\lambda^2)\sigma_w^2$ for individuals whose father and mother are known. The prior distribution for $\lambda$ is assumed to be uniform between 0 and 0.5.
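The recursive definition of the epigenetic effects can be turned into a direct construction of T (expressed in units of the epigenetic variance). The following is only a minimal sketch, not the authors' Fortran code: the function name `epigenetic_T`, the pedigree encoding (a list of father/mother indices with `None` for unknown parents, parents listed before offspring) and the four-individual pedigree are all assumptions made for illustration.

```python
def epigenetic_T(pedigree, lam):
    """Build T (in units of sigma_w^2) for a pedigree listed parents-first.

    pedigree: list of (father, mother) index pairs; None marks an unknown parent.
    lam: lambda = (1 - v) / 2, with 0 <= lam <= 0.5.
    """
    n = len(pedigree)
    T = [[0.0] * n for _ in range(n)]
    for i, (f, m) in enumerate(pedigree):
        parents = [p for p in (f, m) if p is not None]
        # Covariance with every earlier individual j: lambda times the
        # sum of the covariances between j and the known parents of i.
        for j in range(i):
            T[i][j] = T[j][i] = lam * sum(T[p][j] for p in parents)
        # Variance: lambda^2 * Var(sum of parental effects) + residual variance
        # (1 for base, 1 - lambda^2 for one parent, 1 - 2*lambda^2 for two).
        parental = sum(T[p][q] for p in parents for q in parents)
        residual = 1.0 - len(parents) * lam ** 2
        T[i][i] = lam ** 2 * parental + residual
    return T


# Hypothetical pedigree: 0 and 1 are unrelated founders,
# 2 and 3 are their full-sib offspring.
lam = 0.3
T = epigenetic_T([(None, None), (None, None), (0, 1), (0, 1)], lam)
```

In this toy case the parent-offspring element equals λ and the full-sib element equals 2λ², so parent-offspring pairs are more alike than full sibs whenever λ is below 0.5.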

Parameter estimation through a Gibbs Sampler

The Gibbs sampler algorithm is an iterative, updating sampling scheme that obtains samples from the marginal posterior distributions of all the unknowns in a model. It requires samples from the full conditional distributions of all the parameters. In the proposed model, the set of parameters can be classified into three main groups: i) the location parameters ($\mathbf{b}$, $\mathbf{u}$ and $\mathbf{w}$); ii) the parameter $\lambda$ associated with matrix $\mathbf{T}$; iii) the variance components ($\sigma_u^2$, $\sigma_w^2$ and $\sigma_e^2$).

The full conditional distributions for these groups are:

i) Sampling of the location parameters ($\mathbf{b}$, $\mathbf{u}$, $\mathbf{w}$).

The conditional distributions of the location parameters are univariate Gaussian (Wang et al. 1994), with parameters drawn from the following mixed model equations:

$$\mathbf{C}\mathbf{s} = \mathbf{r}$$

where

$$\mathbf{C} = \begin{pmatrix} \mathbf{X}'\mathbf{X} & \mathbf{X}'\mathbf{Z} & \mathbf{X}'\mathbf{Z} \\ \mathbf{Z}'\mathbf{X} & \mathbf{Z}'\mathbf{Z} + \mathbf{A}^{-1}\dfrac{\sigma_e^2}{\sigma_u^2} & \mathbf{Z}'\mathbf{Z} \\ \mathbf{Z}'\mathbf{X} & \mathbf{Z}'\mathbf{Z} & \mathbf{Z}'\mathbf{Z} + \mathbf{T}^{-1}\dfrac{\sigma_e^2}{\sigma_w^2} \end{pmatrix}, \qquad \mathbf{s} = \begin{pmatrix} \mathbf{b} \\ \mathbf{u} \\ \mathbf{w} \end{pmatrix}, \qquad \mathbf{r} = \begin{pmatrix} \mathbf{X}'\mathbf{y} \\ \mathbf{Z}'\mathbf{y} \\ \mathbf{Z}'\mathbf{y} \end{pmatrix}$$

Specifically, the full conditional distribution for the ith location parameter ($s_i$) is:

$$s_i \mid \cdot \sim N\left( \frac{r_i - \sum_{j \neq i} c_{ij} s_j}{c_{ii}},\; \frac{\sigma_e^2}{c_{ii}} \right)$$

where $c_{ij}$ and $r_i$ are elements of $\mathbf{C}$ and $\mathbf{r}$, given the multivariate Gaussian nature of the conditional distribution of the location parameters.

A key computational bottleneck of the procedure is the calculation of the $\mathbf{A}^{-1}$ and $\mathbf{T}^{-1}$ matrices. Whereas $\mathbf{A}^{-1}$ is calculated only once by the standard Henderson's rules (Henderson 1976), the $\mathbf{T}^{-1}$ matrix needs to be calculated afresh in each cycle as the parameter $\lambda$ is updated. An algorithm to set up the $\mathbf{T}^{-1}$ matrix is presented in the appendix of this work.
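The single-site update of the location parameters can be sketched as follows. This is an illustrative sketch under stated assumptions: C and r are held as small dense Python lists (a real implementation would exploit sparsity), and the function names and the 2 × 2 toy system are hypothetical, not taken from the paper.

```python
import random


def conditional_moments(C, r, s, i, sigma_e2):
    """Mean and variance of s_i given all the other elements of s."""
    off_diagonal = sum(C[i][j] * s[j] for j in range(len(s)) if j != i)
    mean = (r[i] - off_diagonal) / C[i][i]
    variance = sigma_e2 / C[i][i]
    return mean, variance


def gibbs_sweep(C, r, s, sigma_e2, rng):
    """Update every location parameter in place, one at a time."""
    for i in range(len(s)):
        mean, variance = conditional_moments(C, r, s, i, sigma_e2)
        s[i] = rng.gauss(mean, variance ** 0.5)
    return s


# Toy 2 x 2 system (hypothetical numbers, not from the paper).
C = [[4.0, 1.0], [1.0, 3.0]]
r = [9.0, 7.0]
s = [0.0, 2.0]
mean0, var0 = conditional_moments(C, r, s, 0, sigma_e2=2.0)
s = gibbs_sweep(C, r, s, sigma_e2=2.0, rng=random.Random(1))
```

Because each update uses the most recent values of the other parameters, one sweep visits b, u and w in turn without ever inverting C.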

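ii) The full conditional distribution of $\lambda$.

The conditional distribution of $\lambda$ is developed from the recursive definition of the epigenetic effects (equation 1). It is a truncated univariate Gaussian distribution (TN) between 0 and 0.5 due to the prior distribution of the parameter:

$$p(\lambda \mid \mathbf{w}, \sigma_w^2) \sim TN_{[0,\,0.5]}(\mu_\lambda, \sigma_\lambda^2)$$

$$\mu_\lambda = \frac{\displaystyle\sum_{i=1}^{n_1} \frac{(w_{fi} + w_{mi})\,w_i}{\sigma_w^2(1 - 2\lambda^2)} + \sum_{i=1}^{n_2} \frac{w_{fi}\,w_i}{\sigma_w^2(1 - \lambda^2)} + \sum_{i=1}^{n_3} \frac{w_{mi}\,w_i}{\sigma_w^2(1 - \lambda^2)}}{\displaystyle\sum_{i=1}^{n_1} \frac{(w_{fi} + w_{mi})^2}{\sigma_w^2(1 - 2\lambda^2)} + \sum_{i=1}^{n_2} \frac{w_{fi}^2}{\sigma_w^2(1 - \lambda^2)} + \sum_{i=1}^{n_3} \frac{w_{mi}^2}{\sigma_w^2(1 - \lambda^2)}}$$

$$\sigma_\lambda^2 = \left[ \sum_{i=1}^{n_1} \frac{(w_{fi} + w_{mi})^2}{\sigma_w^2(1 - 2\lambda^2)} + \sum_{i=1}^{n_2} \frac{w_{fi}^2}{\sigma_w^2(1 - \lambda^2)} + \sum_{i=1}^{n_3} \frac{w_{mi}^2}{\sigma_w^2(1 - \lambda^2)} \right]^{-1}$$

where $n_1$ is the number of individuals with known fathers and mothers, $n_2$ is the number of individuals with a known father only and $n_3$ is the number of individuals with a known mother only.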
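A possible way to code this step is sketched below. Two points are assumptions of the sketch rather than statements from the paper: the residual variances inside the sums are evaluated at the current value of λ, and the truncated Gaussian is drawn by simple rejection (which can be slow when the untruncated mean lies far outside [0, 0.5]). The function names and toy values are hypothetical.

```python
import random


def lambda_moments(w, pedigree, lam, sigma_w2):
    """Mean and variance of the untruncated Gaussian for lambda.

    w: list of epigenetic effects.
    pedigree: list of (father, mother) indices, None if unknown.
    lam: current value of lambda, used in the residual variances.
    """
    num = den = 0.0
    for i, (f, m) in enumerate(pedigree):
        parents = [p for p in (f, m) if p is not None]
        if not parents:
            continue  # base individuals carry no information on lambda
        parental_sum = sum(w[p] for p in parents)
        # (1 - 2*lam^2) with two known parents, (1 - lam^2) with one.
        resid_var = sigma_w2 * (1.0 - len(parents) * lam ** 2)
        num += parental_sum * w[i] / resid_var
        den += parental_sum ** 2 / resid_var
    return num / den, 1.0 / den


def draw_truncated(mean, variance, rng, low=0.0, high=0.5):
    """Rejection sampling from N(mean, variance) restricted to [low, high]."""
    while True:
        x = rng.gauss(mean, variance ** 0.5)
        if low <= x <= high:
            return x


# Toy values (hypothetical): two founders, one offspring with both parents
# known and one offspring with a known father only.
w = [1.0, -0.5, 0.2, 0.4]
pedigree = [(None, None), (None, None), (0, 1), (0, None)]
mean, variance = lambda_moments(w, pedigree, lam=0.3, sigma_w2=1.0)
new_lam = draw_truncated(mean, variance, random.Random(7))
```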

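iii) The full conditional distributions of the variance components ($\sigma_u^2$, $\sigma_w^2$, $\sigma_e^2$).

The conditional distributions for the variance components are the following inverted chi-square distributions:

$$\sigma_u^2 \sim \chi^{-2}\left(\mathbf{u}'\mathbf{A}^{-1}\mathbf{u} + n_u s_u^2,\; n_{an} + n_u\right)$$

$$\sigma_w^2 \sim \chi^{-2}\left(\mathbf{w}'\mathbf{T}^{-1}\mathbf{w} + n_w s_w^2,\; n_{an} + n_w\right)$$

$$\sigma_e^2 \sim \chi^{-2}\left(\mathbf{e}'\mathbf{e} + n_e s_e^2,\; n_{dat} + n_e\right)$$

where $n_{an}$ is the number of individuals in the population, $n_{dat}$ is the number of phenotypic data, and $s_x^2$ and $n_x$ are the hyperparameters for $\sigma_x^2$.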
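Drawing from these conditionals reduces to dividing a sum of squares by a chi-square deviate. The sketch below assumes that parameterisation (first argument a sum of squares, second argument the degrees of freedom); the function name and the numbers are hypothetical and only the standard library is used.

```python
import random


def draw_scaled_inv_chi2(sum_of_squares, df, rng):
    """Draw sigma^2 = sum_of_squares / chi2(df)."""
    # A chi-square variate with df degrees of freedom is Gamma(shape=df/2, scale=2).
    chi2 = rng.gammavariate(df / 2.0, 2.0)
    return sum_of_squares / chi2


# Hypothetical residual update: e'e + n_e * s_e^2 = 270000, n_dat + n_e = 1000.
rng = random.Random(42)
draws = [draw_scaled_inv_chi2(270000.0, 1000, rng) for _ in range(2000)]
mean_draw = sum(draws) / len(draws)
```

With these toy values the draws centre near 270000 / 998, the mean of a scaled inverted chi-square with 1000 degrees of freedom.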

Simulated Data Sets

In order to evaluate the procedure, we simulated two different datasets. The first one had a relatively high percentage of variation caused by additive genetic and transgenerational epigenetic effects (35% and 20%, respectively) and an intermediate reset coefficient ($v = 0.40$). In the second dataset, a lower percentage of variation was explained by additive genetic and transgenerational epigenetic effects (15% and 10%, respectively) and the reset coefficient was higher ($v = 0.80$). Each dataset was composed of a three-generation population. The base populations included 3,000 individuals (1,500 sires and 1,500 dams) and each of the two subsequent generations was composed of 3,000 full-sib families of 10 individuals each. The sire and dam for each family were sampled randomly from the individuals of the previous generation. The genetic and epigenetic effects for each individual ($u_i$ and $w_i$) in the base population were generated from the following Gaussian distributions:

$$u_i \sim N(0, \sigma_u^2)$$

$$w_i \sim N(0, \sigma_w^2)$$

Further, the genetic and epigenetic effects for the individuals ($u_j$ and $w_j$) of the second and third generations were obtained by sampling from:

$$u_j \sim N\left(\tfrac{1}{2}u_{fj} + \tfrac{1}{2}u_{mj},\; \frac{\sigma_u^2}{2}\right)$$

$$w_j \sim N\left(\lambda(w_{fj} + w_{mj}),\; (1 - 2\lambda^2)\sigma_w^2\right)$$

where $u_{fj}$ and $u_{mj}$, and $w_{fj}$ and $w_{mj}$, were the additive genetic and transgenerational epigenetic effects of the father and the mother of the jth individual, respectively. In addition, one phenotypic record ($y_i$) was generated for every individual by:

$$y_i \sim N(\mu + u_i + w_i,\; \sigma_e^2)$$

where $\mu$ was a general mean set to 100 units. The values of the parameters for the two datasets were:

Dataset 1.

$$\sigma_u^2 = 210,\; \sigma_w^2 = 120,\; \sigma_e^2 = 270,\; \lambda = 0.30,\; h^2 = 0.35,\; \gamma^2 = 0.20,\; v = 0.40$$

with

$$h^2 = \sigma_u^2 / (\sigma_u^2 + \sigma_w^2 + \sigma_e^2) \quad \text{and} \quad \gamma^2 = \sigma_w^2 / (\sigma_u^2 + \sigma_w^2 + \sigma_e^2)$$

Dataset 2.

$$\sigma_u^2 = 90,\; \sigma_w^2 = 60,\; \sigma_e^2 = 450,\; \lambda = 0.10,\; h^2 = 0.15,\; \gamma^2 = 0.10,\; v = 0.80$$

The Gibbs sampler was implemented using our own software written in Fortran 95. The analysis consisted of a single chain of 1,250,000 cycles, of which the first 250,000 were discarded. Each analysis took 36 hours using a single thread of a 2.00 GHz Intel Xeon E5-2650. The source code is included as supplementary information. Convergence was checked by visual inspection of the chains and with the test of Raftery and Lewis (1992). All samples were stored for calculating summary statistics.
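The simulation scheme can be sketched in a few lines. This is a scaled-down illustration, not the authors' Fortran program: the function name and arguments are hypothetical, and for brevity the two parents of each family are simply drawn as two distinct individuals of the previous generation without tracking sex.

```python
import random


def simulate(n_base, n_families, family_size, var_u, var_w, var_e, v, mu, rng):
    """Return a list of (generation, u, w, y) records for three generations."""
    lam = 0.5 * (1.0 - v)
    sd_e = var_e ** 0.5
    records = []

    def add(generation, u, w):
        # One phenotypic record per individual: y ~ N(mu + u + w, var_e).
        records.append((generation, u, w, rng.gauss(mu + u + w, sd_e)))

    # Base generation: independent draws of genetic and epigenetic effects.
    current = [(rng.gauss(0.0, var_u ** 0.5), rng.gauss(0.0, var_w ** 0.5))
               for _ in range(n_base)]
    for u, w in current:
        add(0, u, w)

    for generation in (1, 2):
        offspring = []
        for _ in range(n_families):
            (uf, wf), (um, wm) = rng.sample(current, 2)  # sire and dam
            for _ in range(family_size):
                u = rng.gauss(0.5 * (uf + um), (var_u / 2.0) ** 0.5)
                w = rng.gauss(lam * (wf + wm),
                              ((1.0 - 2.0 * lam ** 2) * var_w) ** 0.5)
                offspring.append((u, w))
                add(generation, u, w)
        current = offspring
    return records


# Scaled-down run using the parameter values of Dataset 1.
records = simulate(n_base=200, n_families=200, family_size=5,
                   var_u=210.0, var_w=120.0, var_e=270.0,
                   v=0.40, mu=100.0, rng=random.Random(2015))
```

The residual variance $(1 - 2\lambda^2)\sigma_w^2$ is what keeps the variance of the epigenetic effects close to $\sigma_w^2$ in every generation when the parents are unrelated.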

Pirenaica Beef Cattle Data

As an illustrative example, we used a database of phenotypic records from the yield recording system of the Pirenaica beef cattle breed. The Pirenaica breed is a meat-type beef population from northern Spain with an approximate census of 20,000 individuals that are typically reared under extensive conditions (Sánchez et al. 2002). The data set was made up of 78,209 records for Birth Weight (BW) with an average value of 41.52 kg and a raw standard deviation of 4.65 kg. In addition, a pedigree file including 125,974 individual-sire-dam records was utilised. This information was provided by the National Pirenaica Breeders Confederation (Confederación Nacional de Asociaciones de Ganado Pirenaico; http://www.conaspi.net). Animal Care and Use Committee approval was not required for this study, as field data were obtained from the Yield Recording System of the Pirenaica breed; furthermore, data were recorded by the stockbreeders themselves, under standard farm management, with no additional requirements.

The full model of analysis was:

$$\mathbf{y} = \mathbf{Xb} + \mathbf{Z}_1\mathbf{u} + \mathbf{Z}_2\mathbf{m} + \mathbf{Z}_2\mathbf{p} + \mathbf{Z}_3\mathbf{h} + \mathbf{Z}_1\mathbf{w} + \mathbf{e}$$

where $\mathbf{b}$ is the vector of fixed systematic effects that included sex (2 levels) and age of the mother (16 levels), $\mathbf{u}$ and $\mathbf{m}$ are the vectors of direct and maternal additive genetic effects with 125,974 elements, $\mathbf{p}$ is the vector of permanent environmental maternal effects (21,143 levels), $\mathbf{h}$ is the vector of the random herd-year-season effects (12,925 levels), $\mathbf{w}$ is the vector of transgenerational epigenetic effects (125,974 levels), and $\mathbf{e}$ is the vector of the residuals. Further, $\mathbf{X}$, $\mathbf{Z}_1$, $\mathbf{Z}_2$ and $\mathbf{Z}_3$ are the incidence matrices that link the different effects with the phenotypic data. Appropriate uniform bounded distributions were assumed for the systematic effects and for each variance component ($\sigma_x^2 \in \{\sigma_p^2, \sigma_h^2, \sigma_w^2, \sigma_e^2\}$), defined by hyperparameters ($n_x = -2$ and $s_x^2 = 0$). Further, the prior distribution for the (co)variance matrix between direct and maternal genetic effects ($\mathbf{G}$) was the following inverted Wishart:

$$p(\mathbf{G} \mid \mathbf{G}_0, n_G) \sim IW(\mathbf{G}_0, n_G)$$

where

$$\mathbf{G} = \begin{pmatrix} \sigma_u^2 & \sigma_{um} \\ \sigma_{um} & \sigma_m^2 \end{pmatrix}$$

with $\sigma_u^2$ and $\sigma_m^2$ being the direct and maternal genetic variances and $\sigma_{um}$ the covariance between them. In this case, $n_G$ was set to -3 and $\mathbf{G}_0$ to a 2 × 2 matrix of zeroes to define a uniform distribution.

Two statistical models (I and II) were fitted. Model I includes the transgenerational epigenetic variance ($\sigma_w^2$), whereas Model II does not. For each model, the Gibbs sampler was implemented with a single chain of 3,250,000 cycles, of which the first 250,000 were discarded. Convergence was checked by visual inspection of the chains and with the test of Raftery and Lewis (1992). The models were compared based on the pseudo-log-marginal probability of the data (Gelfand 1996; Varona and Sorensen 2014) by computing the logarithm of the Conditional Predictive Ordinate (LogCPO) from the MCMC samples. It was calculated as:

$$LogCPO = \sum_i \ln p(y_i \mid \mathbf{y}_{-i}, M_k)$$

where $\mathbf{y}_{-i}$ is the vector of data with the ith datum ($y_i$) deleted, $M_k$ is the kth model and

$$p(y_i \mid \mathbf{y}_{-i}, M_k) \approx \left[ \frac{1}{N_s} \sum_{j=1}^{N_s} \frac{1}{p(y_i \mid \boldsymbol{\theta}_k^j, M_k)} \right]^{-1}$$

where $N_s$ is the number of MCMC draws and $\boldsymbol{\theta}_k^j$ is the jth draw from the posterior distribution of the parameters of the kth model.

From the perspective of quantitative genetics, the estimation of variance components is 274

obtained from the statistical information provided by the covariance between the 275

phenotypes of relatives (Falconer and Mackay 1996). Tal et al. (2010) proposed a 276

simple model for the resemblance between relatives under transgenerational epigenetic 277

inheritance for asexual and sexual reproduction. This model assumes that 278

transgenerational epigenetic effects are not correlated with the additive genetic ones, 279

and that they are distributed under a Gaussian law. After the central limit theorem, this 280

may be explained by a large number of epigenetic marks randomly distributed across 281

the genome. In addition, the distribution pattern of these marks is assumed independent 282

between individuals unless they are relatives. In that case, some of these marks should 283

have not been erased during the meiosis drawing them apart from their common 284

ancestor. The population rate of erasure of these marks is measured through the reset 285

coefficient (v). 286

In this study, we propose a procedure that makes use of the aforementioned authors' 287

definition for sexual diploid organisms to estimate the parameters under a Bayesian 288

mixed model framework. Our approach makes it feasible to estimate transgenerational 289

epigenetic variability from huge datasets of genealogical and phenotypic data. The key 290

to the procedure is the definition of a T matrix of (co) variance between 291

transgenerational epigenetic effects. This matrix exclusively depends on a single 292

15

parameter ( ) which is directly related to the reset coefficient (v) put forward by Tal et 293

al. (2010). As an example, for a simple pedigree of 7 individuals (Figure 1), the T 294

matrix is: 295

296

As Tal et al. (2010) stated, if we compare the covariance between full sibs (individuals 297

5 and 6) and the covariance between sire-offspring (individuals 1 and 4), the covariance 298

is higher for the sire offspring pair, vs. , as is defined as being within the 299

interval (0, 0.5). Note that if the reset coefficient is 0, then λ is ½ and the T matrix 300

becomes equivalent to the numerator relationship matrix (A), where the covariances 301

between full sibs and sire offspring are equivalent. Similarly, the uncle-nephew 302

covariance (individuals 5 and 7) is , lower than the covariance between half sibs 303

(individuals 4 and 5) which is only . 304

The mixed model implementation of the covariance between relatives requires the 305

inverse of the T matrix to construct the mixed model equations, in a similar manner as 306

the numerator relationship matrix in the standard model (Henderson 1984). In this 307

study, we describe a simple procedure for the calculation of this T-1 matrix (see 308

appendix). The procedure takes into account the recursive nature of the 309

transgenerational epigenetic effects, using an argument equivalent to Quass (1976), for 310

the inverse of numerator relationship matrix (A-1), and Quintanilla et al. (1999), for 311

dam-related permanent maternal environmental effects. The main consequence is that 312

the inverse of the T matrix can be sequentially constructed by reading the pedigree of 313

120

120

2210

10

0100

000010

001

3322

22

322

322

2

2

T

22

32

2

16

the population, in the light of very simple rules. These rules are close to Henderson's 314

(1976) rules for constructing the inverse of the A matrix. 315
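A sequential construction of this kind can be sketched directly from equation (1), since $\mathbf{T}^{-1} = (\mathbf{I} - \mathbf{P}')\mathbf{D}^{-1}(\mathbf{I} - \mathbf{P})$ with $\mathbf{D}$ the diagonal matrix of residual variances in units of $\sigma_w^2$. The sketch below is derived from that identity rather than copied from the appendix; the function name and the 0-based pedigree encoding are hypothetical. The pedigree used here (individuals 1 to 3 are founders, 4 is the offspring of 1 and 2, 5 and 6 are offspring of 1 and 3, and 7 has individual 6 as its only known parent) is an assumption read off the seven-individual relationships described above, not taken from Figure 1 itself.

```python
def epigenetic_T_inverse(pedigree, lam):
    """Accumulate T^{-1} (in units of 1/sigma_w^2) one individual at a time."""
    n = len(pedigree)
    Tinv = [[0.0] * n for _ in range(n)]
    for i, (f, m) in enumerate(pedigree):
        parents = [p for p in (f, m) if p is not None]
        # Inverse residual variance: 1, 1/(1 - lam^2) or 1/(1 - 2*lam^2).
        a = 1.0 / (1.0 - len(parents) * lam ** 2)
        Tinv[i][i] += a
        for p in parents:
            Tinv[i][p] -= lam * a
            Tinv[p][i] -= lam * a
            for q in parents:
                Tinv[p][q] += lam ** 2 * a
    return Tinv


lam = 0.3
l2, l3 = lam ** 2, lam ** 3
# Assumed 7-individual pedigree, 0-based (father, mother) indices.
pedigree = [(None, None), (None, None), (None, None),
            (0, 1), (0, 2), (0, 2), (5, None)]
# T written out element by element for the same pedigree.
T = [
    [1, 0, 0, lam, lam, lam, l2],
    [0, 1, 0, lam, 0, 0, 0],
    [0, 0, 1, 0, lam, lam, l2],
    [lam, lam, 0, 1, l2, l2, l3],
    [lam, 0, lam, l2, 1, 2 * l2, 2 * l3],
    [lam, 0, lam, l2, 2 * l2, 1, lam],
    [l2, 0, l2, l3, 2 * l3, lam, 1],
]
Tinv = epigenetic_T_inverse(pedigree, lam)
# The product of the two matrices should be the identity.
product = [[sum(Tinv[i][k] * T[k][j] for k in range(7)) for j in range(7)]
           for i in range(7)]
```

Each individual touches at most nine elements of the inverse, so the cost of rebuilding it in every Gibbs cycle grows only linearly with the size of the pedigree.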

For the example pedigree, the $\mathbf{T}^{-1}$ matrix is:

$$\mathbf{T}^{-1} = \begin{pmatrix}
1 + \frac{3\lambda^2}{1-2\lambda^2} & \frac{\lambda^2}{1-2\lambda^2} & \frac{2\lambda^2}{1-2\lambda^2} & \frac{-\lambda}{1-2\lambda^2} & \frac{-\lambda}{1-2\lambda^2} & \frac{-\lambda}{1-2\lambda^2} & 0 \\
\frac{\lambda^2}{1-2\lambda^2} & 1 + \frac{\lambda^2}{1-2\lambda^2} & 0 & \frac{-\lambda}{1-2\lambda^2} & 0 & 0 & 0 \\
\frac{2\lambda^2}{1-2\lambda^2} & 0 & 1 + \frac{2\lambda^2}{1-2\lambda^2} & 0 & \frac{-\lambda}{1-2\lambda^2} & \frac{-\lambda}{1-2\lambda^2} & 0 \\
\frac{-\lambda}{1-2\lambda^2} & \frac{-\lambda}{1-2\lambda^2} & 0 & \frac{1}{1-2\lambda^2} & 0 & 0 & 0 \\
\frac{-\lambda}{1-2\lambda^2} & 0 & \frac{-\lambda}{1-2\lambda^2} & 0 & \frac{1}{1-2\lambda^2} & 0 & 0 \\
\frac{-\lambda}{1-2\lambda^2} & 0 & \frac{-\lambda}{1-2\lambda^2} & 0 & 0 & \frac{1}{1-2\lambda^2} + \frac{\lambda^2}{1-\lambda^2} & \frac{-\lambda}{1-\lambda^2} \\
0 & 0 & 0 & 0 & 0 & \frac{-\lambda}{1-\lambda^2} & \frac{1}{1-\lambda^2}
\end{pmatrix}$$

Given this algorithm to set up the $\mathbf{T}^{-1}$ matrix, the implementation of a Gibbs sampling approach becomes straightforward; it merely requires sequential iterative sampling from Gaussian ($\mathbf{b}$, $\mathbf{u}$ and $\mathbf{w}$), truncated Gaussian ($\lambda$) and inverted chi-square distributions ($\sigma_u^2$, $\sigma_w^2$, $\sigma_e^2$). The only minor complication is that the $\mathbf{T}^{-1}$ matrix depends on the parameter $\lambda$ and consequently must be recalculated in each iteration of the Gibbs sampler. One possible alternative to avoid this matrix inversion, although not to avoid its updating in each cycle, is to implement an approach similar to the one proposed by Rodríguez-Ramilo et al. (2014) for the genomic relationship matrix. However, setting up the $\mathbf{T}$ matrix is computationally more demanding than the calculation of its inverse with the algorithm proposed in this study.

The proposed procedure was first checked with simulated data. The summary of the marginal posterior distributions for the variance components, additive genetic ($h^2$) and epigenetic ($\gamma^2$) heritability, reset coefficient ($v$) and epigenetic transmission coefficient ($1-v$) for both cases of simulation is presented in Table 1. The highest posterior density at 95% for all parameters in the model included the simulated values. However, some details of the results should be highlighted. In the second case of simulation, with lower $\sigma_w^2$ and $\lambda$, the posterior standard deviations (and the HPD95) for $\sigma_w^2$ and $\sigma_e^2$ were remarkably wider (66.19 and 67.42, respectively). The cause of this phenomenon is that the epigenetic and residual variance components are statistically confounded when the reset coefficient ($v$) is very high, because the $\mathbf{T}$ and $\mathbf{I}$ matrices become very similar. To illustrate this fact, in Figures 2 and 3, we present the joint posterior densities of $\gamma^2$ and $v$ for both cases of simulation. The results of the first case of simulation showed posterior independence between both parameters, whereas, in the second, the joint posterior density presented a half-moon shape. This indicates that for large values of $v$, $\gamma^2$ (and $\sigma_w^2$) may take any value on its support with a fairly equal probability. In other words, in the first simulated case the model showed a very good ability to discriminate between both parameters, whereas in the second it did not. In addition, mixing of the MCMC procedure in the second case of simulation was clearly worse, and adaptive MCMC algorithms, such as the one proposed by Mathew et al. (2012), may represent an interesting alternative for its implementation in large data sets.

It must be noted that the information for the estimation of $\sigma_w^2$ and $v$ (or $\lambda$) comes from the comparison of the covariances between different categories of relatives (see Table 2). Following Tal et al. (2010), one estimate of $v$ can be obtained from the difference between the estimates of covariance between half-sib and uncle-nephew relationships, a difference that becomes almost null for low values of $\lambda$ (or high $v$), as seen in the second case of simulation (23.10 vs. 22.62). On the contrary, with higher values of $\lambda$ (or lower $v$), as in the first case of simulation, this difference is higher (60.60 vs. 57.36), and, thus, the amount of information available for estimation purposes increases. This uncertainty in the estimation of $\lambda$ (or $v$) is also reflected in the estimation of $\sigma_w^2$ ($\gamma^2$) and $\sigma_e^2$ and implies wider posterior marginal densities. Nevertheless, it is important to emphasize that the joint posterior mode ($v = 0.80$, $\gamma^2 = 0.13$) in the second case of simulation was very close to the simulated parameters (0.80 and 0.10, respectively).
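As a worked check of why the two covariances nearly coincide when the reset coefficient is high, the following lines combine the additive relationship of 0.25 shared by half sibs and uncle-nephew pairs with the epigenetic relationships $\lambda^2$ and $2\lambda^3$, using the parameters simulated for the second dataset ($\sigma_u^2 = 90$, $\sigma_w^2 = 60$, $\lambda = 0.10$). The decomposition into an additive and an epigenetic term is an assumption of this sketch, based on the stated independence of the two effects; under it, the quoted values of 23.10 and 22.62 are recovered.

```latex
\begin{align*}
\mathrm{Cov}(\text{half sibs}) &= \tfrac{1}{4}\sigma_u^2 + \lambda^2\sigma_w^2
  = 22.5 + 0.01 \times 60 = 23.10 \\
\mathrm{Cov}(\text{uncle-nephew}) &= \tfrac{1}{4}\sigma_u^2 + 2\lambda^3\sigma_w^2
  = 22.5 + 0.002 \times 60 = 22.62
\end{align*}
```

The difference between the two, $\lambda^2(1 - 2\lambda)\sigma_w^2 = 0.48$, is tiny relative to a phenotypic variance of 600, which is why so little information on $v$ is available in this case.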

We have also checked our model with a real dataset. In Table 3, summary statistics of the marginal posterior distributions for the parameters and LogCPO values are presented for models I and II in the analysis of Birth Weight from the Pirenaica beef cattle population. The posterior mean estimates for heritability ranged from 0.38 (Model I) to 0.41 (Model II), and the posterior mean estimate for transgenerational epigenetic heritability was only 0.04 (Model I), with a highest posterior density at 95% that ranged between 0.00 and 0.11. These results are coherent with the output of the model comparison test (LogCPO), which pointed to Model II as the more plausible. If we focus on this latter model, the estimates of direct and maternal heritability were within the range of estimates in the literature for these traits (Meyer 1992; Varona et al. 1999; Jamrozik and Miller 2014), and the absence of epigenetic transgenerational heritability is coherent with the fact that most of the epigenetic marks are erased during meiosis in mammals (Jablonka and Raz 2009; Schmitz 2014). However, it should be highlighted that the posterior mean estimate of the reset coefficient under Model I was surprisingly low (0.20), although its posterior standard deviation was very high (0.20), and that the HPD95 region covered almost all the parametric space (0.01-0.89). This wide range reflects the absence of information to properly estimate the parameter, given the low magnitude of $\sigma_w^2$.

It is worth noting that the procedure can also provide epigenetic breeding values in the fields of animal and plant breeding. These are calculated by weighting the phenotypic information of relatives according to the magnitude of the elements of the $\mathbf{T}$ matrix. In fact, for epigenetic effects, the weight of distantly related individuals is lowered as the number of opportunities to reset the epigenetic marks increases. Both additive genetic and epigenetic effects can be used for prediction of the future performance of the individual: the expected response (R) after one cycle of mass selection should be $R = i\left[h^2 + (1-v)\gamma^2\right]\sqrt{\sigma_p^2}$, where $i$ is the intensity of selection and $\sigma_p^2$ is the phenotypic variance. However, further research must be undertaken in order to develop adequate selection indices that consider genetic and epigenetic effects. Although both of them affect the immediate future performance of the offspring, epigenetic effects are diluted in future generations as the epigenetic marks are erased. It is important to note that if this selection pressure is relaxed, the average epigenetic effect declines to zero at a rate of $(1-v)^n$, where $n$ is the number of generations without selection.

It should be further noted that the proposed model uses a very basic definition of transgenerational epigenetic inheritance, as it assumes equal epigenetic variance for all individuals in the population. However, when phenotypic records are collected throughout the life of the individual, transgenerational epigenetic variance can be modelled as age dependent, on the grounds that epigenetic marks accumulate in the genome of the individuals over time. Moreover, in the proposed model, it is assumed that the parameter $\lambda$ (or the reset coefficient) is equal for fathers and mothers, and, in future research, it seems reasonable to assume a different transmission coefficient for males and females, allowing for the presence of sex-differential genomic imprinting (Barlow 1997; Reik and Walter 1998). The model can be refined by the inclusion of a hierarchical Bayesian paradigm that includes systematic effects for any environmental factors that may influence the epigenetic transmission coefficient. In addition, a more profound approach could even allow for the consideration of a genetic determinism of the reset coefficient with a model similar to that proposed by Varona et al. (2008) for the degree of asymmetry of a skewed Gaussian distribution.

Finally, as mentioned by Jablonka and Raz (2009) and Tal et al. (2010), epigenetics may be viewed as a more wide-ranging concept which could include several types of cultural transmission (Avital and Jablonka 2000; Richerson and Boyd 2005). The procedure suggested in this work could also be applied to the analysis of human or animal datasets that involve any kind of transgenerational transmission, even those not directly related to gene expression.

In conclusion, this paper presents an original procedure for the estimation of transgenerational epigenetic variability based on some generalised assumptions. It is hoped that the proposal will lead to future research on variations of epigenetic transmission abilities caused by environmental and/or genetic factors.

ACKNOWLEDGEMENTS

This research was funded by grant AGL2010-15903 (Ministerio de Ciencia e Innovación, Spain). The authors would like to thank the Pirenaica Breeders Association (Confederación Nacional de Asociaciones de Ganado Pirenaico) for the provision of the data used in this work. We also thank the editor and the reviewers for their useful comments.

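REFERENCES

Avital, E., and E. Jablonka, 2000 Animal Traditions: Behavioural Inheritance in Evolution. Cambridge University Press, Cambridge.

Barlow, D. P., 1997 Competition - A common motif for the imprinting mechanism? EMBO J. 16: 6899-6905.

Eccleston, A., N. DeWitt, C. Gunter, B. Marte, and D. Nath, 2007 Introduction Epigenetics. Nature 447: 395-400.

Esteller, M., 2008 Epigenetics of Cancer. N. Engl. J. Med. 358: 1148-1159.

Falconer, D. S., and T. F. C. Mackay, 1996 Introduction to Quantitative Genetics, Ed 4. Longmans Green, Harlow, Essex, UK.

Feinberg, A. P., 2007 Phenotypic plasticity and the epigenetics of human disease. Nature 447: 433-440.

Gelfand, A. E., 1996 Model determination using sampling-based methods, pp. 145-161 in Markov Chain Monte Carlo in Practice, edited by W. R. Gilks, S. Richardson, and D. J. Spiegelhalter. Chapman & Hall, London.

Gelfand, A. E., and A. F. M. Smith, 1990 Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association 85: 398-409.

Gianola, D., and R. L. Fernando, 1986 Bayesian methods in animal breeding theory. J. Anim. Sci. 63: 217-244.

Grossniklaus, U., W. G. Kelly, A. C. Ferguson-Smith, M. Pembrey, and S. Lindquist, 2013 Transgenerational epigenetic inheritance: how important is it? Nat. Rev. Genet. 14: 228-235.

Heard, E., and R. A. Martienssen, 2014 Transgenerational epigenetic inheritance: Myths and mechanisms. Cell 157: 95-109.

Henderson, C. R., 1976 A simple method for computing the inverse of a numerator relationship matrix used in the prediction of breeding values. Biometrics 32: 69-83.

Henderson, C. R., 1984 Application of Linear Models in Animal Breeding. University of Guelph, Guelph, ON.

Henderson, I. R., and S. E. Jacobsen, 2007 Epigenetic inheritance in plants. Nature 447: 418-424.

Jablonka, E., and G. Raz, 2009 Transgenerational epigenetic inheritance: Prevalence, mechanisms and implications for the study of heredity and evolution. Q. Rev. Biol. 84: 131-176.

Jamrozik, J., and S. P. Miller, 2014 Genetic evaluation of calving ease in Canadian Simmentals using birth weight and gestation length as correlated traits. Livest. Sci. 162: 42-49.

Law, J. A., and S. E. Jacobsen, 2010 Establishing, maintaining and modifying DNA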

Gianola, D., and R. L. Fernando, 1986 Bayesian methods in animal breeding theory. J. 444

Anim. Sci. 63:217-244. 445

Grossniklaus, U., W. G. Kelly, A. C. Ferguson-Smith, M. Pembrey, and S. Lindquist, 446

2013 Transgenerational epigenetic inheritance: how important is it?. Nat. Rev. Genet. 447

14: 228-235. 448

Heard, E., and R. A. Martienssen, 2014 Transgenerational epigenetic inheritance: Myths 449

and mechanisms. Cell 157: 95-109. 450

Henderson, C. R., 1976 A simple method for computing the inverse of a numerator 451

relationship matrix used in the prediction of breeding values. Biometrics 32:69-83. 452

Henderson, C. R., 1984 Application of Linear Models in Animal Breeding. University of 453

Guelph, Guelph, ON. 454

Henderson, I.R., and S. E. Jacobsen, 2007 Epigenetic inheritance in plants. Nature 455

447:418-424. 456

Jablonka, E., and G. Raz, 2009 Transgenerational epigenetic inheritance: Prevalence, mechanisms and implications for the study of heredity and evolution. Q. Rev. Biol. 84: 131-176.

Jamrozik, J., and S. P. Miller, 2014 Genetic evaluation of calving ease in Canadian Simmentals using birth weight and gestation length as correlated traits. Livest. Sci. 162: 42-49.

Law, J. A., and S. E. Jacobsen, 2010 Establishing, maintaining and modifying DNA methylation patterns in plants and animals. Nat. Rev. Genet. 11: 204-220.

Mathew, B., A. M. Bauer, P. Koistinen, T. C. Reetz, J. Léon, and M. J. Sillanpää, 2012 Bayesian adaptive Markov chain Monte Carlo estimation of genetic parameters. Heredity 109: 235-234.

Meyer, K., 1992 Variance components due to direct and maternal effects for growth traits of Australian beef cattle. Livest. Prod. Sci. 11: 143-177.

Patterson, H. D., and R. Thompson, 1971 Recovery of inter-block information when block sizes are unequal. Biometrika 58: 545-554.

Quaas, R. L., 1976 Computing the diagonal elements of the inverse of a large numerator relationship matrix. Biometrics 32: 949-953.

Quintanilla, R., L. Varona, M. R. Pujol, and J. Piedrafita, 1999 Maternal animal model with correlation between maternal environmental effects of related dams. J. Anim. Sci. 77: 2904-2917.

Raftery, A. E., and S. M. Lewis, 1992 How many iterations in the Gibbs Sampler, pp. 763-774 in Bayesian Statistics IV, edited by J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith. Oxford University Press, New York.

Reik, W., 2007 Stability and flexibility of epigenetic gene regulation in mammalian development. Nature 447: 425-432.

Reik, W., and J. Walter, 1998 Imprinting mechanisms in mammals. Curr. Opin. Genet. Dev. 8: 154-164.

Richerson, P. J., and R. Boyd, 2005 Not by Genes Alone: How Culture Transformed Human Evolution. University of Chicago Press, Chicago.

Rodríguez-Ramilo, S. T., L. A. García-Cortés, and O. González-Recio, 2014 Combining genomic and genealogical information in a reproducing kernel Hilbert spaces regression model for genome-enabled predictions in dairy cattle. PLoS ONE 9: e93424.

Sánchez, A., J. Ambrona, and L. Sánchez, 2002 Razas ganaderas españolas bovinas. MAPA-FEAGAS, Madrid, Spain.

Schmitz, R. J., 2014 The secret garden – epigenetic alleles underlie complex traits. Science 343: 1082-1083.

Tal, O., E. Kisdi, and E. Jablonka, 2010 Epigenetic contribution to covariance between relatives. Genetics 184: 1037-1050.

Varona, L., N. Ibañez-Escriche, R. Quintanilla, J. L. Noguera, and J. Casellas, 2008 Bayesian analysis of quantitative traits using skewed distributions. Genetics Research 90: 179-190.

Varona, L., I. Misztal, and J. K. Bertrand, 1999 Threshold-linear versus linear-linear analysis of birth weight and calving ease using an animal model: I. Variance component estimation. J. Anim. Sci. 77: 1994-2002.

Varona, L., and D. Sorensen, 2014 Joint analysis of binomial and continuous traits with recursive model: a case study using mortality and litter size in pigs. Genetics 196: 643-651.

Wang, C. S., J. J. Rutledge, and D. Gianola, 1994 Bayesian analysis of mixed linear models via Gibbs Sampling with an application to litter size in Iberian pigs. Genet. Sel. Evol. 26: 91-115.

Youngson, N. A., and E. Whitelaw, 2008 Transgenerational epigenetic effects. Annu. Rev. Genomics Hum. Genet. 9: 233-257.

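APPENDIX

Derivation of the $\mathbf{T}^{-1}$ matrix.

The vector of epigenetic effects ($\mathbf{w}$) can be represented as:

$$\mathbf{w} = \mathbf{P}\mathbf{w} + \boldsymbol{\varepsilon}$$

where the $\mathbf{P}$ matrix defines a recurrent relationship with the epigenetic effects of the father and mother. For non-base individuals, the $i$th row of the $\mathbf{P}$ matrix contains a parameter $\lambda$ in the columns pertaining to the father and mother of the $i$th individual. The rest of the elements are null.

Furthermore, if:

$$\mathbf{w} = (\mathbf{I} - \mathbf{P})^{-1}\boldsymbol{\varepsilon}$$

then

$$V(\mathbf{w}) = (\mathbf{I} - \mathbf{P})^{-1}\, V(\boldsymbol{\varepsilon})\, (\mathbf{I} - \mathbf{P}')^{-1}$$

given that $V(\mathbf{w}) = \mathbf{T}\sigma_w^2$,

$$V(\mathbf{w})^{-1} = \mathbf{T}^{-1}\frac{1}{\sigma_w^2} = (\mathbf{I} - \mathbf{P}')\, V(\boldsymbol{\varepsilon})^{-1}\, (\mathbf{I} - \mathbf{P}) \qquad \text{(A1)}$$

from $(\mathbf{I} - \mathbf{P})\mathbf{w} = \boldsymbol{\varepsilon}$, it is possible to define a $\mathbf{Q}_T$ matrix as:

$$\mathbf{Q}_T = V(\boldsymbol{\varepsilon})\frac{1}{\sigma_w^2} = (\mathbf{I} - \mathbf{P})\, \mathbf{T}\, (\mathbf{I} - \mathbf{P}') \qquad \text{(A2)}$$

If we consider the structure of $(\mathbf{I} - \mathbf{P})$ and $(\mathbf{I} - \mathbf{P}')$, the $\mathbf{Q}_T$ matrix has a diagonal structure with 1's for base individuals and $(1 - \lambda^2)$ and $(1 - 2\lambda^2)$ for those with one or two known parents, respectively.

Finally, replacing (A2) into (A1):

$$\mathbf{T}^{-1} = (\mathbf{I} - \mathbf{P}')\, \mathbf{Q}_T^{-1}\, (\mathbf{I} - \mathbf{P})$$

Taking advantage of this structure, it is possible to define the following algorithm for constructing $\mathbf{T}^{-1}$ by reading the pedigree information and calculating for each individual ($i$) in the pedigree:

a) If the father and mother are unknown;

add 1 to the diagonal (i,i)

b) If only one parent (p) is known;

add $\frac{1}{1-\lambda^2}$ to the diagonal (i,i)

add $-\frac{\lambda}{1-\lambda^2}$ to the elements (i,p), (p,i)

add $\frac{\lambda^2}{1-\lambda^2}$ to the element (p,p)

c) If both ancestors (p and m) are known;

add $\frac{1}{1-2\lambda^2}$ to the diagonal (i,i)

add $-\frac{\lambda}{1-2\lambda^2}$ to the elements (i,p), (i,m), (m,i), (p,i)

add $\frac{\lambda^2}{1-2\lambda^2}$ to the elements (p,p), (p,m), (m,p), (m,m)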
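The construction rules of the Appendix can be sketched in a few lines of Python. This is only an illustration, not the authors' software: the function names, the pedigree encoding (a list of `(sire, dam)` index pairs, with `None` for an unknown parent and parents listed before their offspring) and the toy pedigree are assumptions. As a check, the sketch also builds $\mathbf{T}$ directly from the recursion $\mathbf{w} = \mathbf{P}\mathbf{w} + \boldsymbol{\varepsilon}$ and multiplies it by the accumulated $\mathbf{T}^{-1}$.

```python
def build_t_inverse(pedigree, lam):
    """Accumulate T^-1 one individual at a time, following rules a) to c).

    pedigree: list of (sire, dam) index pairs; None marks an unknown parent
              (hypothetical encoding chosen for this sketch).
    lam:      autorecursive parameter lambda.
    """
    n = len(pedigree)
    t_inv = [[0.0] * n for _ in range(n)]
    for i, (sire, dam) in enumerate(pedigree):
        parents = [p for p in (sire, dam) if p is not None]
        # Diagonal element of Q_T: 1, 1 - lambda^2 or 1 - 2*lambda^2.
        d = 1.0 - len(parents) * lam ** 2
        t_inv[i][i] += 1.0 / d
        for p in parents:
            t_inv[i][p] -= lam / d
            t_inv[p][i] -= lam / d
            for q in parents:
                t_inv[p][q] += lam ** 2 / d
    return t_inv


def build_t(pedigree, lam):
    """Build T directly from w = Pw + eps (parents must precede offspring)."""
    n = len(pedigree)
    t = [[0.0] * n for _ in range(n)]
    for i, (sire, dam) in enumerate(pedigree):
        parents = [p for p in (sire, dam) if p is not None]
        for j in range(i):
            # Cov(w_i, w_j) = lambda * sum over parents p of Cov(w_p, w_j)
            t[i][j] = t[j][i] = lam * sum(t[p][j] for p in parents)
        d = 1.0 - len(parents) * lam ** 2
        t[i][i] = d + lam ** 2 * sum(t[p][q] for p in parents for q in parents)
    return t


# Toy pedigree: 0, 1 and 4 are base individuals; 2 and 3 are full sibs
# (parents 0 and 1); 5 is the offspring of 2 and 4, so 3 is its uncle or aunt.
pedigree = [(None, None), (None, None), (0, 1), (0, 1), (None, None), (2, 4)]
lam = 0.10

t_inv = build_t_inverse(pedigree, lam)
t = build_t(pedigree, lam)

# T times T^-1 should be the identity matrix (up to rounding error).
n = len(pedigree)
product = [[sum(t[i][k] * t_inv[k][j] for k in range(n)) for j in range(n)]
           for i in range(n)]
```

In this toy pedigree the off-diagonal elements of `t` are $\lambda$ for parent and offspring, $2\lambda^2$ for full sibs and $2\lambda^3$ for uncle and nephew, which are the epigenetic coefficients of the expected covariances in Table 2. Only the non-zero contributions of each individual are touched, so a sparse-matrix version would follow the same loop.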

Table 1. Posterior Mean Estimate (PM), Posterior Standard Deviation (PSD) and Highest Posterior Density at 95 % (HPD95) for simulation cases I and II.

Case I: $\sigma_u^2 = 210$, $\sigma_w^2 = 120$, $\sigma_e^2 = 270$, $\lambda = 0.30$
Case II: $\sigma_u^2 = 90$, $\sigma_w^2 = 60$, $\sigma_e^2 = 450$, $\lambda = 0.10$

                Case I                            Case II
Parameter       PM      PSD     HPD95             PM      PSD     HPD95
$\sigma_u^2$    217.87  18.21   174.66-247.49     90.91   4.63    81.42-99.25
$\sigma_w^2$    132.92  14.42   106.30-164.53     111.99  66.19   37.45-288.61
$\sigma_e^2$    251.00  19.71   205.07-282.31     396.30  67.42   217.79-475.16
$h^2$           0.362   0.029   0.291-0.408       0.152   0.007   0.136-0.164
$\gamma^2$      0.221   0.024   0.176-0.274       0.187   0.110   0.062-0.481
$\lambda$       0.256   0.055   0.148-0.365       0.091   0.059   0.020-0.233
$v$             0.488   0.111   0.270-0.704       0.818   0.117   0.534-0.960
$1-v$           0.512   0.111   0.296-0.730       0.182   0.117   0.040-0.466

$\sigma_u^2$ is the additive genetic variance, $\sigma_w^2$ is the transgenerational epigenetic variance and $\sigma_e^2$ is the residual variance. Moreover, $h^2$ is the heritability, $\gamma^2$ is the transgenerational epigenetic heritability, $\lambda$ is the autorecursive parameter, $v$ is the reset coefficient and $1-v$ the epigenetic transmission coefficient.
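The ratios and coefficients reported in Table 1 can be related to the simulated variance components with a short calculation. The sketch below rests on two assumptions that should be checked against the model definition: that the phenotypic variance is the sum $\sigma_u^2 + \sigma_w^2 + \sigma_e^2$, and that the reset coefficient is $v = 1 - 2\lambda$ (the relation satisfied by the posterior means in the table, e.g. $1 - 2 \times 0.256 = 0.488$). The function name and dictionary keys are hypothetical.

```python
def derived_parameters(var_u, var_w, var_e, lam):
    """Ratios and coefficients implied by a set of variance components.

    Assumes (for this sketch only) that the phenotypic variance is
    var_u + var_w + var_e and that the reset coefficient is v = 1 - 2*lambda.
    """
    total = var_u + var_w + var_e
    return {
        "h2": var_u / total,        # heritability
        "gamma2": var_w / total,    # transgenerational epigenetic heritability
        "v": 1.0 - 2.0 * lam,       # reset coefficient
        "one_minus_v": 2.0 * lam,   # epigenetic transmission coefficient
    }


# Simulated ("true") values of the two simulation cases in Table 1.
case_1 = derived_parameters(var_u=210.0, var_w=120.0, var_e=270.0, lam=0.30)
case_2 = derived_parameters(var_u=90.0, var_w=60.0, var_e=450.0, lam=0.10)
```

Under these assumptions the simulated values are $h^2 = 0.35$, $\gamma^2 = 0.20$ and $v = 0.40$ for Case I, and $h^2 = 0.15$, $\gamma^2 = 0.10$ and $v = 0.80$ for Case II, which can be set against the posterior means and HPD95 intervals of the table.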


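Table 2. Expected covariance between relatives in simulation cases I and II.

Case I: $\sigma_u^2 = 210$, $\sigma_w^2 = 120$, $\lambda = 0.30$
Case II: $\sigma_u^2 = 90$, $\sigma_w^2 = 60$, $\lambda = 0.10$

Relatives          Expected covariance                                  Case I   Case II
Parent-Offspring   $\frac{1}{2}\sigma_u^2 + \lambda\sigma_w^2$          132      51
Full-Sibs          $\frac{1}{2}\sigma_u^2 + 2\lambda^2\sigma_w^2$       121.2    46.2
Half-Sibs          $\frac{1}{4}\sigma_u^2 + \lambda^2\sigma_w^2$        60.6     23.1
Uncle-Nephew       $\frac{1}{4}\sigma_u^2 + 2\lambda^3\sigma_w^2$       57.36    22.62

$\sigma_u^2$ is the additive genetic variance, $\sigma_w^2$ is the transgenerational epigenetic variance and $\lambda$ is the autorecursive parameter.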
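The expected covariances of Table 2 are simple functions of $\sigma_u^2$, $\sigma_w^2$ and $\lambda$. As a minimal sketch of the arithmetic, the Python function below evaluates the four expressions for the simulated values of Case II ($\sigma_u^2 = 90$, $\sigma_w^2 = 60$, $\lambda = 0.10$); the function name and dictionary keys are illustrative choices, not part of the original analysis.

```python
def expected_covariances(var_u, var_w, lam):
    """Expected covariance between relatives, following the Table 2 formulas."""
    return {
        # 1/2 sigma_u^2 + lambda sigma_w^2
        "parent_offspring": 0.5 * var_u + lam * var_w,
        # 1/2 sigma_u^2 + 2 lambda^2 sigma_w^2
        "full_sibs": 0.5 * var_u + 2.0 * lam ** 2 * var_w,
        # 1/4 sigma_u^2 + lambda^2 sigma_w^2
        "half_sibs": 0.25 * var_u + lam ** 2 * var_w,
        # 1/4 sigma_u^2 + 2 lambda^3 sigma_w^2
        "uncle_nephew": 0.25 * var_u + 2.0 * lam ** 3 * var_w,
    }


# Case II of the simulation study.
case_2 = expected_covariances(var_u=90.0, var_w=60.0, lam=0.10)
rounded = {name: round(value, 2) for name, value in case_2.items()}
```

Rounded to two decimals, the four values are 51.0, 46.2, 23.1 and 22.62, the Case II column of the table. The epigenetic term shrinks by a factor of $\lambda$ with each additional generation separating the relatives, whereas the additive term halves, which is what makes the two variance components separable.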


Table 3. Posterior Mean Estimate (PM), Posterior Standard Deviation (PSD) and Highest Posterior Density at 95 % (HPD95) for models I and II.

                Model I                              Model II
Parameter       PM          PSD     HPD95            PM          PSD     HPD95
$\sigma_u^2$    9.033       0.914   7.059, 10.449    9.897       0.408   9.123, 10.727
$\sigma_m^2$    3.574       0.212   3.168, 3.998     3.554       0.217   3.135, 3.981
$\sigma_{um}$   -4.473      0.253   -4.979, -3.990   -4.508      0.259   -5.032, -4.013
$\sigma_p^2$    0.747       0.081   0.590, 0.906     0.765       0.079   0.611, 0.922
$\sigma_w^2$    0.876       0.820   0.000, 2.701     -           -       -
$\sigma_h^2$    2.634       0.069   2.501, 2.771     2.633       0.069   2.499, 2.770
$\sigma_e^2$    7.055       0.223   6.606, 7.478     7.156       0.209   6.735, 7.556
$\lambda$       0.398       0.103   0.056, 0.494     -           -       -
$v$             0.204       0.207   0.012, 0.888     -           -       -
$1-v$           0.796       0.207   0.112, 0.988     -           -       -
$h^2$           0.377       0.035   0.299, 0.427     0.412       0.012   0.389, 0.436
$m^2$           0.149       0.007   0.135, 0.164     0.148       0.007   0.133, 0.162
$\gamma^2$      0.036       0.034   0.000, 0.113     -           -       -
LogCPO          -287542.3                            -287475.1

$\sigma_u^2$ is the additive genetic variance, $\sigma_m^2$ is the maternal environmental variance, $\sigma_{um}$ is the covariance between them, $\sigma_p^2$ is the permanent maternal environmental variance, $\sigma_h^2$ is the herd-year-season variance, $\sigma_w^2$ is the transgenerational epigenetic variance and $\sigma_e^2$ is the residual variance. Moreover, $h^2$ is the heritability, $m^2$ is the maternal heritability, $\gamma^2$ is the transgenerational epigenetic heritability, $\lambda$ is the autorecursive parameter, $v$ is the reset coefficient and $1-v$ the epigenetic transmission coefficient. LogCPO is the logarithm of the conditional predictive ordinate.

Figure 1. Example of a pedigree.

Figure 2. Joint posterior distribution of the transgenerational epigenetic heritability ($\gamma^2$) and the reset coefficient ($v$) in the first case of simulation.

Figure 3. Joint posterior distribution of the transgenerational epigenetic heritability ($\gamma^2$) and the reset coefficient ($v$) in the second case of simulation.
