ORIGINAL ARTICLE

Identification of Deleterious SNPs and Their Effects on Structural Level in CHRNA3 Gene

Vivek Chandramohan¹ • Navya Nagaraju¹ • Shrikant Rathod² • Anubhav Kaphle¹ • Uday Muddapur²

Received: 18 February 2015 / Accepted: 12 May 2015

© Springer Science+Business Media New York 2015

Abstract The aim of our study is to identify probable deleterious genetic variations that can alter the expression and the function of the CHRNA3 gene using in silico methods. Of the 2305 SNPs identified in the CHRNA3 gene, 115 were found to be non-synonymous, and 12 and 15 SNPs were found in the 5′ and 3′ UTRs, respectively. Further, of the 115 nsSNPs investigated, eight were predicted to be deleterious by both the SIFT and PredictSNP servers. The major mutations predicted to affect the structure of the protein are phenylalanine to valine (F43V) and lysine to asparagine (K216N), as shown by the trajectories from the molecular dynamics studies. The random transitions of the protein structures over the simulation period caused by these mutations hint at how the native state is distorted, which could lead to the loss of structural stability and functionality of the nicotinic acetylcholine receptor subunit α-3 protein. Based on this work, we propose that the nsSNPs with SNP IDs rs75495285 and rs76821682 will have comparatively more deleterious effects than the other predicted mutations in destabilizing the protein structure.

Keywords CHRNA3 · SIFT · PredictSNP · Molecular modeling · Molecular dynamics

Correspondence: Vivek Chandramohan, [email protected]

1 Department of Biotechnology, Siddaganga Institute of Technology, Tumkur 572013, Karnataka, India

2 Department of Biotechnology, KLE Dr. MSS CET, Belgaum, India

Biochem Genet, DOI 10.1007/s10528-015-9676-y

Introduction

Lung cancer is one of the most common forms of malignant tumor in humans and is the most common cause of cancer-related mortality. Chronic smoking, occupational exposure, air pollution, and other factors are considered to be the causes of lung cancer. Smoking causes over 80% of lung cancer cases (Salim et al. 2011).

However, less than 20% of smokers develop lung cancer. The reasons for varied

cancer susceptibility among smokers are still unknown. Cigarette smoke contains

numerous carcinogens, including tar and benzopyrene. These carcinogens activate

signaling pathways that affect cell growth, differentiation, and apoptosis. Out of all

the compounds present, nicotine is the primary component responsible for the

addiction toward tobacco consumption, and this addiction greatly exacerbates the

cumulative health dangers of tobacco (Shen et al. 2012). Nicotine binds to specific

nicotinic acetylcholine receptors (nAChRs), which are encoded by a set of CHRNA

genes. CHRNA expression in the core region of the brain is also found to be closely

correlated with nicotine addiction (Shen et al. 2013). nAChRs, however, are not confined to cells at synapses but are also found on non-neuronal cells such as lung epithelial cells, suggesting that they have other functions beyond neurotransmission (Egleton et al. 2008), and their over-expression has been shown to be involved in increased signal transduction that promotes cell proliferation and cancer metastasis (Singh et al. 2011). There are more than a dozen different nAChR subunits encoded by the CHRNA genes, subdivided into α and β subunits that form pentameric ion channel complexes (Roman and Koval 2009). The

CHRNA3 gene encodes an alpha-type subunit of the channel molecule, as it

contains characteristic adjacent cysteine residues. Primary lung cancer morbidity

and mortality have increased rapidly in the last three decades, which is attributed to complex interactions between environmental and genetic risk factors (McMenamin et al. 2010). Single nucleotide polymorphisms (SNPs) are the most common and stable markers of human genetic variation and may be associated with the risk of a variety of cancers, including that of the lung (Xu et al. 2013). Studies have shown that single nucleotide variants of the CHRNA genes have a strong association with lung cancer risk (Hung et al. 2008; Thorgeirsson et al. 2008). Further, polymorphisms at the CHRNA3 locus have been associated with increased lung cancer susceptibility (Amos et al. 2010). Thus, the present study aimed to perform a computational analysis of all SNPs in the CHRNA3 gene that are recorded in SNP databases and to identify possible deleterious mutations that might have functional effects. Further, to assess these effects, we built modeled protein structures for the wild-type and mutant forms and examined the dynamics and stability of these structures over a given simulation time.

Materials and Methods

Datasets

The SNPs and the related protein sequence for the CHRNA3 gene were retrieved from dbSNP (http://www.ncbi.nlm.nih.gov/SNP/) for computational study. Functional coding nsSNPs were then analyzed with sequence homology-based methods (SIFT and PredictSNP).

Establishing Deleterious nsSNPs by SIFT and PredictSNP

SIFT is a homology-based tool that presumes that important amino acids will be conserved in the protein family. Hence, changes at well-conserved positions tend to be predicted as deleterious. Queries can be submitted as SNP IDs or as protein sequences. SIFT takes a query sequence and uses multiple alignment information to predict tolerated and deleterious substitutions for every position of the query sequence (Jia et al. 2014). The cutoff value in the SIFT program is a tolerance index of ≥0.05: substitutions at or above this value are predicted to be tolerated, and those below it are predicted to be deleterious. The higher the tolerance index, the less functional impact a particular amino acid substitution is likely to have.

PredictSNP is a consensus classifier that combines six of the top-performing tools for predicting the effects of mutations on protein function. The results are provided together with annotations extracted from the Protein Mutant Database and the UniProt database (Bendl et al. 2014). The PredictSNP score lies in the continuous interval ⟨−1, +1⟩. Mutations are considered neutral for values in the interval ⟨−1, 0⟩ and deleterious for values in the interval ⟨0, +1⟩. The absolute distance of the PredictSNP score from zero expresses the confidence of the consensus classifier in its prediction. For easy comparison with the confidence scores of the individual integrated tools, the confidence of the PredictSNP consensus classifier is transformed to the observed accuracy in the same way as described for the confidence scores of the integrated tools.
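The two decision rules described above can be written as a minimal Python sketch. The function names are hypothetical, and treating a PredictSNP score of exactly zero as neutral is an assumption, since the text assigns zero to both intervals. The transformation of confidence to observed accuracy is not reproduced here.

```python
SIFT_CUTOFF = 0.05  # tolerance index cutoff stated in the text


def sift_call(tolerance_index: float) -> str:
    """Classify a substitution from its SIFT tolerance index (0 to 1)."""
    if not 0.0 <= tolerance_index <= 1.0:
        raise ValueError("tolerance index must lie in [0, 1]")
    # At or above the cutoff: tolerated; below it: deleterious.
    return "tolerated" if tolerance_index >= SIFT_CUTOFF else "deleterious"


def predictsnp_call(score: float) -> tuple[str, float]:
    """Return (label, confidence) for a PredictSNP score in [-1, +1].

    Assumption: a score of exactly 0 is treated as neutral.
    The confidence is the absolute distance of the score from zero.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must lie in [-1, +1]")
    label = "deleterious" if score > 0 else "neutral"
    return label, abs(score)
```

For example, `sift_call(0.00)` falls below the cutoff and is labeled deleterious, while `predictsnp_call(-0.4)` is labeled neutral with a confidence of 0.4.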

Homology Modeling and Mutational Analysis

Homology modeling and structural validation of human CHRNA3 were carried out on the basis of the template structure 4AQ5_A using the Discovery Studio 3.5 (DS 3.5) homology modeling protocol. The overall stereochemical quality of the final models of the CHRNA3 protein was assessed with the program PROCHECK (Shahzad et al. 2013). Mutation analysis was performed based on the results obtained from the in silico tools. Mutants were prepared by replacing the native residues with the corresponding mutant residues (R239G, A212D, F43V, R37H, I209M, H217Y, Y215H, and K216N). The resulting structures were optimized by classical energy minimization using the CHARMm force field implemented in DS 3.5, with the conjugate gradient algorithm and a convergence energy of 0.001 kcal/mol (Matsumoto et al. 2013). These structures were then used for molecular simulations.
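The mutation labels used above follow the pattern wild-type residue, position, mutant residue. As a minimal sketch of how such labels can be parsed and applied at the sequence level, the Python below uses hypothetical helper names and a toy sequence that is not the CHRNA3 sequence. It edits only a sequence string; the structural residue replacement performed in Discovery Studio is not reproduced.

```python
import re

# The eight mutations listed in the text.
MUTATIONS = ["R239G", "A212D", "F43V", "R37H", "I209M", "H217Y", "Y215H", "K216N"]

# One-letter wild-type residue, 1-based position, one-letter mutant residue.
_LABEL = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label: str) -> tuple[str, int, str]:
    """Split a label such as 'F43V' into ('F', 43, 'V')."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a valid mutation label: {label!r}")
    wild, position, mutant = match.groups()
    return wild, int(position), mutant


def apply_mutation(sequence: str, label: str) -> str:
    """Return a copy of `sequence` carrying the substitution in `label`.

    Raises ValueError if the position is out of range or the residue
    found there does not match the expected wild-type residue.
    """
    wild, position, mutant = parse_mutation(label)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[position - 1] != wild:
        raise ValueError(f"expected {wild} at position {position}")
    return sequence[: position - 1] + mutant + sequence[position:]


# Toy example (hypothetical three-residue sequence, not CHRNA3).
toy_mutant = apply_mutation("MKF", "F3V")
```

Checking the wild-type residue before substituting guards against numbering mismatches between a SNP annotation and the modeled sequence.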

Molecular Dynamics

Molecular dynamics (MD) simulations were performed for the wild-type and mutant protein structures obtained from the homology modeling studies in explicit solvent for 6 ns at a constant temperature of 310 K. All minimizations and MD simulations were performed using the Discovery Studio 3.5 Molecular Dynamics protocol. Based on the trajectory analysis protocol, 6000 snapshots were superimposed with respect to all backbone and side chain atoms to remove overall translation and rotation and then clustered at various RMSD cutoff values based on the atomic coordinates of all backbone and side chain atoms of the protein (Novikov et al. 2013).
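To make the RMSD-based clustering step concrete, the following Python sketch computes the RMSD between two snapshots and groups snapshots at a chosen cutoff. It assumes the snapshots have already been superimposed, so no rotation or translation fitting is done, and it uses a simple leader-style clustering as an illustrative stand-in. This is not the algorithm implemented in Discovery Studio, and all names are hypothetical.

```python
import math

# One snapshot: a list of (x, y, z) atom coordinates in a fixed atom order.
Coords = list[tuple[float, float, float]]


def rmsd(a: Coords, b: Coords) -> float:
    """Root mean square deviation between two pre-aligned snapshots."""
    if len(a) != len(b) or not a:
        raise ValueError("snapshots must be non-empty and of equal length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(a, b)
    )
    return math.sqrt(total / len(a))


def leader_cluster(snapshots: list[Coords], cutoff: float) -> list[list[int]]:
    """Group snapshot indices; the first member of each group is its leader.

    A snapshot joins the first cluster whose leader lies within `cutoff`
    RMSD of it; otherwise it starts a new cluster.
    """
    clusters: list[list[int]] = []
    for index, snapshot in enumerate(snapshots):
        for members in clusters:
            if rmsd(snapshots[members[0]], snapshot) <= cutoff:
                members.append(index)
                break
        else:
            clusters.append([index])
    return clusters
```

Running the clustering at several cutoff values, as the text describes, amounts to calling `leader_cluster` once per cutoff and comparing the number and size of the resulting groups.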

Results and Discussion

SNP Dataset from dbSNP

The CHRNA3 gene was investigated, and the details of its variants were obtained from the dbSNP database. It contained a total of 115 non-synonymous coding SNPs (nsSNPs), 26 intron region SNPs, and 12 and 15 SNPs in the 5′ and 3′ untranslated regions (UTRs), respectively (Fig. 1). We selected the non-synonymous coding SNPs for our investigation to analyze the structural and functional changes in the CHRNA3 protein. Furthermore, the number of nsSNPs in the coding region is higher than the number of SNPs in the 5′ and 3′ untranslated regions, suggesting that the sequence variations may act mainly on the composition and structure of the gene product rather than on the regulation of gene expression.

Establishment of Deleterious nsSNPs by the SIFT and PredictSNP

The conservation level of a particular position in the protein was determined using the sequence homology-based tool SIFT. The protein sequences of the 115 nsSNPs were submitted independently to the SIFT program to check their tolerance index. The higher the tolerance index, the less functional impact a particular amino acid substitution is likely to have, and vice versa. Among the 115 nsSNPs, eight were predicted to be deleterious, with a tolerance index of less than 0.05. The results are presented in Table 1. The structural levels of alteration were determined by applying the PredictSNP program. The protein sequence, together with the mutational positions and amino acid variants associated with the 115 nsSNPs investigated in this work, was submitted as input to the PredictSNP server, and the results are shown in Table 2.

Fig. 1 Distribution of nsSNPs, 5′ UTR SNPs, 3′ UTR SNPs, and intron SNPs

Homology Modeling

The CHRNA3 protein does not have an experimentally solved 3D structure. Hence, we constructed a 3D model of the protein using homology modeling (Fig. 2) to analyze any structural change resulting from the mutations in the sequence. A BLAST search identified a suitable template, PDB ID 4AQ5_A, for modeling the CHRNA3 protein. The Ramachandran plot for the model shows 98.6% of the residues in either the core region or the allowed region and the remaining 1.4% in the generously allowed region, with no residue in the disallowed region. Based on the results of the above analysis, given in Tables 1 and 2, eight experimentally validated nsSNPs [R239G, A212D, F43V, R37H, I209M, H217Y, Y215H, and K216N] were chosen for the structural analysis (Fig. 3). Based on the position of the amino acids in the corresponding chains of the modeled structure, the mutation analysis was performed using the Discovery Studio 3.5 viewer, and energy minimization was carried out using the MD simulation protocol.

Table 1 nsSNPs predicted to be functionally significant by the SIFT tool

SNP ID        Amino acid change   Tolerance index
rs8192475     R37H                0.00
rs55958820    I209M               0.01
rs76821682    K216N               0.00
rs71651684    V147A               0.00
rs61737495    R239G               0.00
rs72650603    H217Y               0.00
rs71581738    Y215H               0.50
rs75253420    A212D               0.00
rs75495285    F43V                0.00

Table 2 nsSNPs predicted to be functionally significant by the PredictSNP tool (values in percent)

Mutation   PredictSNP   MAPP   PhD-SNP   PolyPhen-1   PolyPhen-2   SIFT   SNAP
R239G      87           84     88        74           81           79     89
A212D      87           48     77        74           65           53     72
F43V       87           43     61        59           55           53     56
R37H       72           70     86        74           47           53     56
I209M      76           63     59        59           55           43     56
H217Y      61           71     77        59           40           79     58
Y215H      75           79     45        59           63           76     58
K216N      87           89     82        78           89           79     90
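The choice of the eight mutations can be read as the overlap between the entries of Tables 1 and 2. The Python sketch below illustrates that reading using only the SNP IDs and amino acid changes listed in the two tables; the variable names are illustrative, and no score thresholds are applied.

```python
# Table 1: SNP ID -> amino acid change reported for SIFT.
sift_hits = {
    "rs8192475": "R37H",
    "rs55958820": "I209M",
    "rs76821682": "K216N",
    "rs71651684": "V147A",
    "rs61737495": "R239G",
    "rs72650603": "H217Y",
    "rs71581738": "Y215H",
    "rs75253420": "A212D",
    "rs75495285": "F43V",
}

# Table 2: amino acid changes reported for PredictSNP.
predictsnp_hits = {"R239G", "A212D", "F43V", "R37H", "I209M", "H217Y", "Y215H", "K216N"}

# Keep only the SNPs whose amino acid change appears in both tables.
common = {
    snp_id: change
    for snp_id, change in sift_hits.items()
    if change in predictsnp_hits
}

for snp_id, change in sorted(common.items(), key=lambda item: item[1]):
    print(f"{change}\t{snp_id}")
```

Under this reading, V147A (rs71651684) appears only in Table 1 and drops out, leaving the eight mutations carried forward to the structural analysis.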

Molecular Dynamics Simulation

A comparative MD analysis of the predicted deleterious mutants R239G, A212D, F43V, R37H, I209M, H217Y, Y215H, and K216N against the native structure was carried out. Over the 6 ns simulation trajectory, the root mean square deviation (RMSD) and the potential energy were used to analyze the level of structural change. The backbone RMSD was calculated from the trajectories of the native and mutant models. The RMSD and the potential energy are plotted in Figs. 4 and 5, respectively. The graph of RMSD versus trajectory time shows the overall deviations in the protein structures during the course of the simulation. Our modeled structure was very stable after an initial jump of 8–13 nm at 750 ps. A similar stability trend was observed for the mutant A212D, which had a very low deviation in the backbone RMSD of just ~0.5 nm over time. Mutant I209M was also fairly stable during the first half of the run, but its structure distorted more rapidly after 4 ns, with an RMSD deviation of more than 1.5 nm. The other mutants were very unstable from the beginning of the simulation. The mutants H217Y, K216N, F43V, Y215H, R239G, and R37H showed the maximum deviation from their original structures. Among the simulated mutant structures, F43V and K216N had the maximum random distortion in their structures over the period of 6 ns. The instability might be explained in terms of the structural differences in the amino acid backbone in both mutations. It can be inferred from the MD studies that for these two mutations, F43V and K216N, the protein undergoes significant structural transitions when compared to the native structure.

Fig. 2 Model structure of the CHRNA3 protein in solid ribbon display style (red, blue, and green: alpha helix, beta sheets, and coil, respectively) (Color figure online)

Fig. 3 Eight experimentally validated nsSNPs [R239G, A212D, F43V, R37H, I209M, H217Y, Y215H, and K216N]; wild residues in violet and mutated residues in brown (Color figure online). Panels: A212D, alanine changes to aspartic acid; F43V, phenylalanine changes to valine; H217Y, histidine changes to tyrosine; K216N, lysine changes to asparagine; R239G, arginine changes to glycine; R37H, arginine changes to histidine; I209M, isoleucine changes to methionine; Y215H, tyrosine changes to histidine

Fig. 4 Backbone RMSDs are shown as a function of time for WT and mutant CHRNA3 protein structures over the time period of 6 ns (x-axis: time, 0–6000 ps; y-axis: RMSD, 0–14 nm; series: A212D, H217Y, I209M, R37H, R239G, WILD, Y215H, K216N, F43V)

Tallying Our Predicted Mutations with TCGA Cancer Data

After obtaining the complete MD results for our predicted mutant structures, we used The Cancer Genome Atlas (TCGA) data to check whether any real cancer patient data match our predictions. We used cBioPortal (http://www.cbioportal.org/index.do) to navigate through the datasets. We found one of the mutations, Y215H, in the lung adenocarcinoma cell line RERF-LC-Ad1 with an allele frequency of 0.51. We also found a second predicted mutation, H217Y, in the database, but in a different cell line, the brain cancer line GMS-10. Unfortunately, we did not find the rest of our mutations in any of the lung cancer data.

Our MD studies, however, have validated the data in terms of protein stability. These two mutations also destabilize the protein structure with random shape transitions, but the two mutations identified earlier, F43V and K216N, have a greater effect on structural stability than these two, as shown in Fig. 6.

Fig. 5 Potential energy is shown as a function of time for WT and mutant CHRNA3 protein structures over the time period of 6 ns (x-axis: time, 0–6000 ps; y-axis: potential energy, −4600 to −2600; series: A212D, F43V, H217Y, I209M, R37H, R239G, WILD, K216N, Y215H)

Fig. 6 Backbone RMSDs are shown as a function of time for WT and selected mutants from the TCGA database of CHRNA3 protein structures over the period of 6 ns for comparison (x-axis: time, 0–6000 ps; y-axis: RMSD, 0–14 nm; series: H217Y, WILD, Y215H, K216N, F43V)

Conclusion

The CHRNA3 gene was investigated by assessing the influence of functional SNPs by means of computational methods. Of the 2305 SNPs in the CHRNA3 gene, 115 were found to be non-synonymous, and 12 and 15 SNPs were found in the 5′ and 3′ UTRs, respectively. Eight nsSNPs were predicted to be deleterious by both the SIFT and the PredictSNP servers. The major mutations predicted in the native protein of the CHRNA3 gene were phenylalanine to valine (F43V) and lysine to asparagine (K216N). The structural effects of these mutations were validated using MD studies, which show large transitions in the protein structure over the period of 6 ns. Our predictions were tallied with real cancer data from the TCGA database, and one of the eight mutations, Y215H, was found in a lung cancer cell line. However, the MD studies showed that the mutations F43V and K216N would have comparatively greater effects in destabilizing the protein structure than the mutation Y215H, if they occurred in lung cancers.

Acknowledgments The authors wish to thank the Management, Principal, Director and Head of the

Department of the Siddaganga Institute of Technology, Tumkur, and KLE Dr. MSS CET, Belgaum. The

authors also thank KBITS for providing computational resources to carry out this study.

Conflict of interest No conflict of interest.

References

Amos CI et al (2010) Nicotinic acetylcholine receptor region on chromosome 15q25 and lung cancer risk

among African Americans: a case-control study. J Natl Cancer Inst 102:1199–1205

Bendl J et al (2014) PredictSNP: robust and accurate consensus classifier for prediction of disease-related

mutations. PLoS Comput Biol 10:e1003440

Egleton RD, Brown KC, Dasgupta P (2008) Nicotinic acetylcholine receptors in cancer: multiple roles in

proliferation and inhibition of apoptosis. Trends Pharmacol Sci 29:151–158

Hung RJ et al (2008) A susceptibility locus for lung cancer maps to nicotinic acetylcholine receptor

subunit genes on 15q25. Nature 452:633–637

Jia M et al (2014) Computational analysis of functional single nucleotide polymorphisms associated with

the CYP11B2 gene. PLoS One 9:e104311

Matsumoto Y et al (2013) Crystal structure of a complex of human chymase with its benzimidazole

derived inhibitor. J Synchrotron Radiat 20:914–918

McMenamin SB et al (2010) Adoption of policies to treat tobacco dependence in U.S. medical groups.

Am J Prev Med 39:449–456

Novikov GV, Sivozhelezov VS, Shaitan KV (2013) Investigation of the conformational dynamics of the

adenosine A2A receptor by means of molecular dynamics simulation. Biofizika 58:618–634

Roman J, Koval M (2009) Control of lung epithelial growth by a nicotinic acetylcholine receptor: the

other side of the coin. Am J Pathol 175:1799–1801

Salim EI, Jazieh AR, Moore MA (2011) Lung cancer incidence in the arab league countries: risk factors

and control. Asian Pac J Cancer Prev 12:17–34

Shahzad K et al (2013) A structured-based model for the decreased activity of Ala222Val and Glu429Ala

methylenetetrahydrofolate reductase (MTHFR) mutants. Bioinformation 9:929–936

Shen B et al (2012) Correlation between polymorphisms of nicotine acetylcholine acceptor subunit

CHRNA3 and lung cancer susceptibility. Mol Med Rep 6:1389–1392

Shen B et al (2013) CHRNA5 polymorphism and susceptibility to lung cancer in a Chinese population.

Braz J Med Biol Res 46:79–84

Singh S, Pillai S, Chellappan S (2011) Nicotinic acetylcholine receptor signaling in tumor growth and

metastasis. J Oncol 2011:456743

Thorgeirsson TE et al (2008) A variant associated with nicotine dependence, lung cancer and peripheral

arterial disease. Nature 452:638–642

Xu J et al (2013) Genetic variation in a microRNA-502 binding site in SET8 gene confers clinical outcome of non-small cell lung cancer in a Chinese population. PLoS One 8:e77024
