A Reexamination of Correlations of Amino Acids with Particular Secondary Structures

13
A Reexamination of Correlations of Amino Acids with Particular Secondary Structures Sas ˇa N. Malkov Miodrag V. Z ˇ ivkovic ´ Milos ˇ V. Beljanski Srd ¯an Ð. Stojanovic ´ Snez ˇana D. Zaric ´ Published online: 12 March 2009 Ó Springer Science+Business Media, LLC 2009 Abstract Using the data from Protein Data Bank the correlations of primary and secondary structures of proteins were analyzed. The correlation values of the amino acids and the eight secondary structure types were calculated, where the position of the amino acid and the position in sequence with the particular secondary structure differ at most 25. The diagrams describing these results indicate that correlations are significant at distances between -9 and 10. The results show that the substituents on Cb or Cc atoms of amino acid play major role in their preference for particular secondary structure at the same position in the sequence, while the polarity of amino acid has significant influence on a-helices and strands at some distance in the sequence. The diagrams corresponding to polar amino acids are noticeably asymmetric. The diagrams point out the exchangeability of residues in the proteins; the amino acids with similar diagrams have similar local folding requirements. Keywords Protein Amino acid Protein secondary structure Statistical correlation Abbreviations DSSP Define secondary structure of proteins the standard method for assigning secondary structure to the amino acids of a protein with determined atomic-resolution coordinates of the protein PDB Protein data bank a repository for 3-D structural data of proteins and nucleic acids PDBSELECT PDB selection of a representative set of PDB chains 1 Introduction Understanding relation of primary and secondary structure of proteins is very important. There are many methods that consider protein folding and many of them use some information about the protein secondary structure [6, 8, 1719, 30, 34, 35, 37, 38, 47, 56, 60, 61, 66]. Conformational preferences of amino acids, usually called propensities, are used to predict secondary and tertiary structures of pro- teins. Propensities of amino acids were introduced long time ago and it is known that amino acids have various propensities for the adoption of helical, strand, and random Electronic supplementary material The online version of this article (doi:10.1007/s10930-009-9166-3) contains supplementary material, which is available to authorized users. S. N. Malkov M. V. Z ˇ ivkovic ´ Department of Mathematics, University of Belgrade, Studentski Trg 16, 11000 Belgrade, Serbia S. N. Malkov e-mail: [email protected] M. V. Z ˇ ivkovic ´ e-mail: [email protected] M. V. Beljanski Institute of General and Physical Chemistry, Studentski Trg 16, 11000 Belgrade, Serbia e-mail: [email protected] S. Ð. Stojanovic ´ IHTM—Department of Chemistry, University of Belgrade, Njegos ˇeva 12, 11001 Belgrade, Serbia e-mail: [email protected] S. D. Zaric ´(&) Department of Chemistry, University of Belgrade, Studentski Trg 16, 11000 Belgrade, Serbia e-mail: [email protected] 123 Protein J (2009) 28:74–86 DOI 10.1007/s10930-009-9166-3

Transcript of A Reexamination of Correlations of Amino Acids with Particular Secondary Structures

A Reexamination of Correlations of Amino Acids with ParticularSecondary Structures

Sasa N. Malkov Æ Miodrag V. Zivkovic ÆMilos V. Beljanski Æ Srdan Ð. Stojanovic ÆSnezana D. Zaric

Published online: 12 March 2009

� Springer Science+Business Media, LLC 2009

Abstract Using the data from Protein Data Bank the

correlations of primary and secondary structures of proteins

were analyzed. The correlation values of the amino acids

and the eight secondary structure types were calculated,

where the position of the amino acid and the position in

sequence with the particular secondary structure differ at

most 25. The diagrams describing these results indicate that

correlations are significant at distances between -9 and 10.

The results show that the substituents on Cb or Cc atoms of

amino acid play major role in their preference for particular

secondary structure at the same position in the sequence,

while the polarity of amino acid has significant influence

on a-helices and strands at some distance in the sequence.

The diagrams corresponding to polar amino acids are

noticeably asymmetric. The diagrams point out the

exchangeability of residues in the proteins; the amino acids

with similar diagrams have similar local folding

requirements.

Keywords Protein � Amino acid �Protein secondary structure � Statistical correlation

Abbreviations

DSSP Define secondary structure of proteins the

standard method for assigning secondary

structure to the amino acids of a protein

with determined atomic-resolution

coordinates of the protein

PDB Protein data bank a repository for 3-D

structural data of proteins and nucleic

acids

PDBSELECT PDB selection of a representative

set of PDB chains

1 Introduction

Understanding relation of primary and secondary structure

of proteins is very important. There are many methods that

consider protein folding and many of them use some

information about the protein secondary structure [6, 8,

17–19, 30, 34, 35, 37, 38, 47, 56, 60, 61, 66]. Conformational

preferences of amino acids, usually called propensities, are

used to predict secondary and tertiary structures of pro-

teins. Propensities of amino acids were introduced long

time ago and it is known that amino acids have various

propensities for the adoption of helical, strand, and random

Electronic supplementary material The online version of thisarticle (doi:10.1007/s10930-009-9166-3) contains supplementarymaterial, which is available to authorized users.

S. N. Malkov � M. V. Zivkovic

Department of Mathematics, University of Belgrade,

Studentski Trg 16, 11000 Belgrade, Serbia

S. N. Malkov

e-mail: [email protected]

M. V. Zivkovic

e-mail: [email protected]

M. V. Beljanski

Institute of General and Physical Chemistry, Studentski Trg 16,

11000 Belgrade, Serbia

e-mail: [email protected]

S. Ð. Stojanovic

IHTM—Department of Chemistry, University of Belgrade,

Njegoseva 12, 11001 Belgrade, Serbia

e-mail: [email protected]

S. D. Zaric (&)

Department of Chemistry, University of Belgrade, Studentski

Trg 16, 11000 Belgrade, Serbia

e-mail: [email protected]

123

Protein J (2009) 28:74–86

DOI 10.1007/s10930-009-9166-3

coil conformation [9–11, 31, 36, 43, 46, 48]. Levitt

observed that 19 of 20 naturally occurring amino acids

have preferences for only one type of secondary structure,

and deduced a very clear classification of amino acids by

their preferences. It was shown that the preferences and

classification correlate with the chemical structure of

amino acids [36]. Recently, the position dependent amino

acids propensities have been studied [15, 40, 49, 50].

Although a lot is known about secondary and tertiary

protein structure and protein folding, the process of folding

is not understood completely. The molecular mechanism of

protein self-assembly is still an open question [20]. It is

believed that the energetics of side chain interactions

dominate protein folding process. However, it was shown

that secondary structure can determine protein native

conformation, devoid of side chains [21, 26]. Recently,

backbone-based theory of protein folding was proposed,

where the protein folding mechanism is based on backbone

hydrogen bonding [54], while a-helix and b-sheet pro-

pensities are closely connected with the energetics of

peptide hydrogen bonds [3].

Individual amino acids show intrinsic propensities

towards certain secondary structure types [9–11, 31, 36, 40,

43, 46, 48]; however, these preferences are modulated by

the sequence segments within which they reside [12, 44,

71]. Moreover, any amino acid in the protein could have

the influence on secondary structure type at certain posi-

tion. It is estimated that local information contains roughly

65% of the secondary structure information [58]. There are

also approaches that consider non-local interactions in the

sequence [22, 23, 27, 59]. Previous works show that non-

covalent interactions play important role in stabilizing

structures of proteins [33] and that there are many new

types of noncovalent interactions that should be considered

[41, 69, 72–74].

In agreement with the findings that there are many

factors that influence tertiary structure of proteins,

advanced methods for prediction of protein structure are

based on neural networks [42]. The neural networks usage

does not improve the understanding of the relation of

amino acids and their chemical structure to their preference

for secondary structure. This fact indicates that not all

factors influencing protein structure are well understood.

In our previous work, we analyzed the correlations of

primary and secondary structures of proteins at same

position in the sequence using a data from the Protein Data

Bank (PDB) [39]. Although many of our results with this

much larger data set are in accordance with results obtained

in seventies on a very small number of proteins [9–11, 36,

52], we have identified a number of important differences.

Our results show clearly how the chemical structure of

amino acids plays a major role in determining their pref-

erences for specific secondary structures and enabled us to

determine rules for predicting the preference of an amino

acid towards a particular secondary structure type based

only on the chemical structure of its substituents at the Cbor Cc atoms [39]. Based on clear preferences of amino

acids towards certain secondary structures, we classified

amino acids into four groups: a-helix preferrers, strand

preferrers, turn and bend preferrers, and others (the group

containing His and Cys, the amino acids showing no clear

preference for any secondary structure). Our results show

that amino acids in a same group have similar structural

characteristics at their Cb and Cc atoms. All a-helix pre-

ferrers have neither polar heteroatoms on Cb and Cc atoms,

nor branching nor aromatic group on the Cb atom. All

strand preferrers have aromatic groups or branching on the

Cb atom. All turn and bend preferrers have polar hetero-

atom on Cb or Cc atoms or do not have a Cb atom at all.

In present paper we report on the influence of amino

acids on secondary structure along the sequence in terms of

statistical correlation. Similar task was performed before

[53], using information measure instead of correlation, and

with substantially smaller number of proteins. Using the

mathematical method that is very similar to the one used in

our previous work [39], here we consider correlations of

amino acids and particular secondary structures at different

relative positions. That enables us to estimate how far the

dependence of secondary structure on amino acid is dis-

tributed along the chain.

2 Methods

Secondary structure types are denoted using letters: H for

a-helix, B for isolated b-bridge, E for extended strand, G

for 3-helix, I for 5-helix, T for hydrogen bonded turn and

S for bend. All other structural elements that do not

belong to these secondary structure types are considered

coil and denoted by C. Secondary structure types are

often reduced to only three; H, E, and C [32, 57]. Here

we consider all eight secondary structure types, including

coils.

3 Computational Model

Consider a set P of n protein chains. Primary structures of

these protein chains are described by sequences a1,…, an.

Let len(i) denote the length of the sequence ai, and let the

residues of the sequence ai are ai,1,…, ai,len(i), 1 B i B n.

The corresponding assigned secondary structures are

denoted by b1,…, bn, where bi is a sequence bi,1,…, bi,len(i),

1 B i B n.

If A is a logical expression, then the indicator variable

I(A) is defined by:

Correlations of Amino Acids with Particular Secondary Structures 75

123

I Að Þ ¼ 1; A ¼ true

0; A ¼ false

Suppose now that a1,…, an and b1,…,bn are the n

random sequences of corresponding lengths. Let

Xij(s) = I(bij = s) and Yij(p) = I(aij = p) denote binary

random variables corresponding to events that the

secondary structure type assigned to residue aij is s, and

that aij is the amino acid p, respectively. Consider the

window in the sequence i, centered at the position j. Let

Zijðs; p; sÞ ¼ XijðsÞYi;jþsðpÞ ð1Þ

denote the random variable corresponding to the joint event

that the secondary structure type assigned to aij (amino acid

at the center of the window) is s and that ai,j?s (amino acid

at the offset s from the window center) is p.

The offset s range is determined by the size of the

window. The definition of Zijðs; p; sÞ is valid only if both j

and j ? s are between 1 and len(i). Let S(s) denote the set

of valid pairs (i, j) for given offset s:

SðsÞ ¼ ði; jÞj1� j; jþ s� lenðiÞf g;

and let T(s) denote the size of S(s).Consider now the random variables. NPS(s,p,s)—the

number of times the amino acid p occurs at distance s from

the position j, where the secondary structure type is s:

NPSðs; p; sÞ ¼XSðsÞ

Zijðs; p; sÞ;

NP(p,s)—the number of occurrences of the amino acid p

at positions j ? s, such that the position j is inside the same

chain, for all secondary structure types s:

NPðp; sÞ ¼X

s

NPSðs; p; sÞ;

NS(s,s)—the number of times the secondary structure

type s is assigned at the position j, such that the position

j ? s is inside the same chain:

NSðs; sÞ ¼X

p

NPSðp; s; sÞ:

The total count of valid sample pairs at distance s is

TðsÞ ¼Xp;s

NPSðs; p; sÞ: ð2Þ

The correlation coefficient of random variables X and Y

is defined by

qðX; YÞ ¼ CovðX; YÞffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiVarðXÞVarðYÞ

p ;

(the reader is referred to book [62], for example). If both

variables are binary, then

qðX; YÞ ¼ XY � X YffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiXð1� XÞYð1� YÞ

p :

The correlation coefficient is always in the range [-1,

1]. It is 0 if X and Y are independent. The correlation

coefficient is 1 or -1 if and only if the random variables

are linearly dependent.

Consider the correlation of random variables Xij and Yik,

qðXijðsÞ; YikðpÞÞ ¼Zijðs; p; sÞ � XijðsÞ YikðpÞffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi

XijðsÞð1� XijðsÞÞYikðpÞð1� YikðpÞÞq ;

where k = j?s and Zijðs; p; sÞ is defined by Eq. 1.

Assuming that distributions of Xij and Yik depend only on

p, s and s (i.e., they are independent on the sequence choice

and the absolute position inside the sequence), the

correlation coefficients at offset s is estimated by

qðs; p; sÞ ¼ Zðs; p; sÞ � Xðs; sÞ Yðp; sÞffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiXðs; sÞð1� Xðs; sÞÞYðp; sÞð1� Yðp; sÞÞ

q :

The estimates of means of X, Y and Z are

Xðs; sÞ ¼ NSðs; sÞ=TðsÞ;

Yðp; sÞ ¼ NPðp; sÞ=TðsÞ;

Zðs; p; sÞ ¼ NPSðs; p; sÞ=TðsÞ:

Hence the correlation coefficient estimate is

qðs; p; sÞ ¼NPSðs; p; sÞTðsÞ � NPðp; sÞNSðs; sÞffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi

NPðp; sÞðTðsÞ � NPðp; sÞÞNSðs; sÞðTðsÞ � NSðs; sÞÞp :

ð3Þ

The value of q(s, p,s) is positive (respectively, negative

and zero) if the pair (p, s) occurs more (respectively, less

and equally) frequently at the distance s than it would

occur if p and s were independent at the distance s.

In order to estimate the significance of the correlation

coefficient, we evaluate the statistics ts

ts ¼ q

ffiffiffiffiffiffiffiffiffiffiffiffiffin� 2

1� q2

s;

where n = T(s). Under the assumption that the correlation

coefficient is 0, the distribution of the statistics ts is

t-distribution with n-2 degrees of freedom (see [62] for

example).

If the sample size n is large, then t-distribution is

approximated by the normal distribution N(0,1). Let the

null hypothesis be that X and Y are independent, i.e., that

there is no dependence of the secondary structure type s on

76 S. N. Malkov et al.

123

amino acid p at distance s. The null hypothesis is consid-

ered false if it implies that the probability to obtain

correlation coefficient estimate with absolute value greater

than calculated is less than 0.001. If the null hypothesis is

true, using the normal distribution approximation we obtain

that the probability of the event ‘‘|ts| is greater than tlim’’ is

0.05 for tlim = 1.96. Hence, the correlation coefficient is

significant, and we consider that X and Y are dependent if

|ts| C 1.96. If we denote the corresponding value of the

correlation coefficient by qlim, then the correlation coeffi-

cient is significant if

qj j � qlim ¼tlimffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi

t2lim þ n� 2

p ¼ 1:96ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi1:84þ np : ð4Þ

We now compare the correlation coefficient with the other

dependency measures—propensity [9–11] and information

measure [53]. Let a = P(bij = s), b = P(aij = p) denote the

probabilities of secondary structure s, and the amino acid p,

respectively. Furthermore, let x = P(bij = s | ai,j?s = p)

denote the conditional probability of secondary structure s at

position i after knowing that the amino acid p is at the

position j ? s.

The ratio x/a is a propensity (generalized, because the

primary and secondary structure are not considered at the

same position). The correlation coefficient

qðs; p; sÞ ¼ ðx� aÞffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi

b

að1� aÞð1� bÞ

s

is a linear function of x, and the information measure

Iðbij ¼ s : �s; ai;jþs ¼ pÞ ¼ ln1

ð1=xÞ � 1� ln

a

1� a

is a nonlinear, but increasing function of x. We conclude

that for fixed a, b the correlation coefficient is an increasing

function of the information measure (and vice versa). The

advantage of using correlation coefficient instead of

information measure is a consequence of the possibility to

determine the threshold value, below which the correlation

of primary and secondary structures is not significant.

4 Data Sets

As a source of protein data we used Protein Data Bank (PDB),

release #103 from January 2003, containing 18482 proteins

[5]. To assign secondary structures we used the program

DSSP [29]. Some families of proteins are overrepresented in

PDB. Therefore we used PDBSELECT selection of sequen-

ces with threshold of 25% to eliminate redundant data from

the full PDB protein sequences set. PDBSELECT is designed

both to reduce internal homologies and to preserve the

selection representativeness [28]. It contains sequences with

best overall quality, considering sequence source and tech-

nique, resolution, completeness and length. The resulting set

contains 1737 sequences with 282,329 amino acid residues.

The sample size ranges from 244,329 (for |s| = 25) to

282,329 (for s = 0). The corresponding values of qlim are

0.0040 (for |s| = 25), and 0.0037 (for s = 0).

Remark: The correlations are calculated for some other

thresholds, also, and similar results are obtained. While

specific correlation values differ, the trends and the con-

clusions are the same. The method shows a significant

stability. The threshold of 25% is subjectively estimated as

a good measure because smaller thresholds raise redun-

dancy and larger thresholds reduce the sample size.

5 Results and Discussion

The values of correlation of amino acids with secondary

structure types are computed for the PDBSELECT subset

of protein sequences using Eq. 3. The 8160 correlation

values are calculated (8 types of secondary structures, 20

amino acids and 51 different offsets).

Based on preferences of amino acids to participate in

particular secondary structures, determined by their corre-

lation values at offset s = 0, in our previous work we

classified amino acids into four groups: a-helix preferrers

(Ala, Leu, Glu, Gln, Arg, Met, Lys), strand preferrers (Val,

Ile, Tyr, Phe, Thr, Trp), turn and bend preferrers (Gly, Asn,

Pro, Asp, Ser), and the fourth group consisting of amino

acids that do not show clear preference for any particular

secondary structure type (Cys, His) [39]. Amino acids in

each group are ordered according to decreasing correlation

values.

The computed correlation values are represented in

Figs. 1–3. We organize the data according to the classifi-

cation of amino acids based on their preferences towards

certain secondary structures [39] in four groups: a-helix

preferrers, strand preferrers, turn and bend preferrers, and

His and Cys, the amino acids showing no clear preference

for any secondary structure. Since His and Cys do not show

significant correlation values for any of the secondary

structure types the data for these two amino acids are not

presented.

Each figure contains diagrams for one group of amino

acids, with separate diagrams for each amino acid. Each

diagram consists of eight graphs, representing the corre-

lation of particular amino acid and eight secondary

structure types. The diagrams are in full concordance with

previous analysis of correlations of amino acid residues

and secondary structure types at the same position [39].

Moreover, similar shapes of the diagrams of the amino

acids in the same group (Figs. 1–3) are additional support

for our classification of the amino acids.

Correlations of Amino Acids with Particular Secondary Structures 77

123

(a)

-0.06

-0.04

-0.02

0

0.02

0.04

0.06

0.08

0.1

-25

-24

-23

-22

-21

-20

-19

-18

-17

-16

-15

-14

-13

-12

-11

-10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Ala - H

Ala - E

Ala - T

Ala - B

Ala - G

Ala - I

Ala - S

Ala - C

(b)

-0.06

-0.04

-0.02

0

0.02

0.04

0.06

0.08

0.1

-25

-24

-23

-22

-21

-20

-19

-18

-17

-16

-15

-14

-13

-12

-11

-10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Leu - H

Leu - E

Leu - T

Leu - B

Leu - G

Leu - I

Leu - S

Leu - C

(c)

-0.06

-0.04

-0.02

0

0.02

0.04

0.06

0.08

-25

-24

-23

-22

-21

-20

-19

-18

-17

-16

-15

-14

-13

-12

-11

-10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Glu - H

Glu - E

Glu - T

Glu - B

Glu - G

Glu - I

Glu - S

Glu - C

(d)

-0.04

-0.03

-0.02

-0.01

0

0.01

0.02

0.03

0.04

0.05

-25

-24

-23

-22

-21

-20

-19

-18

-17

-16

-15

-14

-13

-12

-11

-10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Gln - H

Gln - E

Gln - T

Gln - B

Gln - G

Gln - I

Gln - S

Gln - C

Fig. 1 Correlations of a-helix preferrers with secondary structure

types. The correlations for each of a-helix preferrers are presented in

a separate diagram: a Alanine, b Leucine, c Glutamic acid, dGlutamine, e Arginine, f Methionine and g Lysine. Each diagram

contains eight graphs representing the correlation of an amino acid

and the eight secondary structure types (H—a-helix, B—isolated

b-bridge, E—extended strand, G—3-helix, I—5-helix, T—hydrogen

bonded turn, S—bend, C—coil). The x-axis is labeled by s, the offset

of the amino acid relative to the position of the considered secondary

structure type. The positive (negative) values of s correspond to

amino acid positions following (preceding) the secondary structure

type position (Eq. 1). The y-axis is labeled by the correlation value.

Correlation values above 0.004 are statistically significant The range

of s is same in all diagrams, but the range of correlations (y-axis)

varies significantly

78 S. N. Malkov et al.

123

The diagrams in Figs. 1–3 show correlation of amino

acid and the secondary structure at the offset s varying

from -25 to 25. It can be noticed that the range of the

offset s values, for which the correlation coefficients are

significant (Eq. 4), varies for different amino acids, as

well as for different secondary structures. If we take

0.004 (see Method/Data Sets) as the threshold correlation

value, then the average range with significant correlation

values is [-9, 10]. In some cases, significant correlation

values are obtained for s beyond that range. For example,

the helix preferrers often have significant correlations

with secondary structures at distances of 10–12 residues,

and most of helix preferrers have even significant cor-

relation values at the distances of about 20 residues

(Fig. 1).

In previous articles [24, 25, 53] a smaller range [-8, 8]

was considered to be significant. We believe that larger data

set and different computational model explain the wider

range of significant values. As we explained above (In

Computational model) information measure and correlation

coefficient are monotonously connected. Because of that the

shapes of diagrams in Figs. 1–3 should be the same as

diagrams of directional information transfer obtained before

[53]. However, there are differences caused by data sets of

significantly different sizes. The previously published dia-

grams of directional information transfer generally agree in

shape with our correlation diagrams for Ala, Leu, Met, Lys,

Val, Ile, Gly, Asn, Pro and Asp; they are substantially dif-

ferent for Gln, Arg, Tyr, Thr, Trp and Cys; and modesty

different for Glu, Phe and His.

(e)

-0.03

-0.02

-0.01

0

0.01

0.02

0.03

0.04

-25

-24

-23

-22

-21

-20

-19

-18

-17

-16

-15

-14

-13

-12

-11

-10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Arg - H

Arg - E

Arg - T

Arg - B

Arg - G

Arg - I

Arg - S

Arg - C

(f)

-0.03

-0.02

-0.01

0

0.01

0.02

0.03

0.04

-25

-24

-23

-22

-21

-20

-19

-18

-17

-16

-15

-14

-13

-12

-11

-10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Met - H

Met - E

Met - T

Met - B

Met - G

Met - I

Met - S

Met - C

(g)

-0.04

-0.03

-0.02

-0.01

0

0.01

0.02

0.03

0.04

0.05

-25

-24

-23

-22

-21

-20

-19

-18

-17

-16

-15

-14

-13

-12

-11

-10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Lys - H

Lys - E

Lys - T

Lys - B

Lys - G

Lys - I

Lys - S

Lys - C

Fig. 1 continued

Correlations of Amino Acids with Particular Secondary Structures 79

123

Amino acids belonging to a same group generally tend

to express similar behavior with respect to certain sec-

ondary structure types, not only to the one they prefer, but

also to the others.

Some of the diagrams are roughly symmetric, while

some of them are asymmetric. Roughly symmetric dia-

grams with the maximum at s = 0 indicate that these amino

acids tend to form specific secondary structure, while

roughly symmetric diagrams with the minimum at s = 0

indicate that amino acids tend to disrupt specific secondary

structure. Amino acids with diagrams with maximum for

positive values of s tend to occur at the C-end of the

specific secondary structure, while amino acids with dia-

grams with maximum for negative values of s tend to occur

at the N-end of the specific secondary structure.

All diagrams have largest correlation values for small

values of s (the positions close to particular amino acid)

indicating that there is the strong influence only on the

secondary structure in the vicinity and that local interac-

tions are important for secondary structure. It was shown

that secondary structure is a consequence of local side-

chain-backbone interactions [67]. Based on semiempirical

model [70] and calculated from the first principle [55] it

was quantitatively established that the local interaction

between the side chains and the backbone is dominant

cause of b-sheet propensities of amino acids. It is consid-

ered that the local interactions are responsible for

secondary structure propensities and that locally formed

secondary structures serve to preorganize the folding pro-

cess [1, 4, 13, 45, 68].

The diagrams in Figs. 1–3 reveal agreement with the

data obtained for the exchangeability of residues in evo-

lutionarily related proteins. By analysing the replacement

pattern between amino acids in structurally similar proteins

four strong clusters were observed: (1) Ile and Val, (2) Leu

and Met, (3) Lys, Arg and Gln, and (4) Tyr and Phe [51]. It

has been also shown that the exchangeability of many

residues cannot be described simply in terms of their

chemical properties, however, these residues are con-

formationally similar [7], enabling similar local folding

requirements. Our data also show large similarities in the

diagrams among amino acids in the same cluster. Cluster

(1) Ile and Val, and (4) Tyr and Phe have similar shapes

and similar values of correlation coefficients of the dia-

grams for all secondary structures. Cluster (2) Leu and Met

have very similar shapes of the diagrams, although, the

correlation coefficients are somewhat different, while

cluster (3) Lys, Arg and Gln, have small differences in the

shape of the diagrams and very similar correlation

coefficients.

This demonstrates that the data in Figs. 1–3 posses

important information about secondary structure and can

help in recognizing which substitutions of the amino acids

will not change the secondary structure.

5.1 a-Helix Preferrers

These amino acids support the appearance of a-helices in

their vicinity, but there are differences in their appearance

at different positions within helices (Fig. 1). That is in

agreement with previous results showing that certain amino

acids have preference to be at the beginning or at the end of

a-helix [7, 9–11, 15, 53].

While Ala induces a-helices in its vicinity almost

equally in both directions, the other amino acids from this

group have asymmetrical distribution of the a-helices

support. The asymmetry of Gln’s correlation is not large.

The correlation distribution for amino acids Leu, Arg, Met

and Lys with a-helices is shifted towards positive values of

s, with the largest shift for Lys. On the contrary, the cor-

relation for Glu is strongly shifted towards negative values

of s. That leads to the conclusion that Ala and Gln are

evenly distributed in helices, Leu, Arg, Met and Lys show

tendency to be closer to helix C-ends, while Glu tends to be

very close to a-helices N-terminal. Tendencies of these

amino acids to be at different positions in a-helices are in

agreement with previously calculated propensities of

amino acid in different regions of a-helices [7, 9–11, 15,

53]. It is also known that Lys and Arg stabilize C-terminal

end of a-helices, while Glu stabilizes the N-terminal end of

a-helices [2, 14].

Diagrams of amino acids that belong to the group of

a-helix preferrers have interesting feature that correlation

with a-helix secondary structure is decreasing up to posi-

tions at the distance of 12, however, at the longer distance

the correlation is again somewhat increased at the distances

of 20 positions. The large correlation at shorter distances is

correlation within the same helix. The correlation at longer

distances is consequence of tertiary contact for two helices

packing against each other.

All a-helix preferrers, except Lys, obstruct the formation

of strands in their neighborhood. Lys obstructs the forma-

tion of strands at its and preceding positions, but strongly

supports their formation in subsequent positions, which is

related to its higher a-helix propensity at the helix end. It is

interesting that Leu has strong negative correlation with

strands at distance 5–6 in both directions, although it tends

to build strands. Met is relatively indifferent to appearance

in strands, but it has negative correlation with strands in its

neighborhood.

Lys supports turns formation in its immediate neigh-

borhood. Met obstructs the formation of coils in the

preceding positions. Glu strongly supports the formation of

coils in the preceding positions.

80 S. N. Malkov et al.

123

5.2 Strand Preferrers

Most of amino acids from the strand preferrers group (Val,

Ile, Tyr, Phe, Thr and Trp) do not have positive correlation

with strands at a distance (Fig. 2). Amino acids Ile, Phe

have negative correlation with strands at distance of 4–8,

while Val and Trp have less pronounced but still negative

correlation at the same distance. Trp has stronger negative

correlation for negative values of s. It is interesting that Val

exhibits positive correlation with strands at distances

longer than 8. Tyr is relatively neutral to strands in its

vicinity. Having positive, but small correlation with strands

for all s values, Thr differs from other members of the

group. In previous classification of Thr as a strand preferrer

it was point out that Thr is unique among strand preferrers,

and indeed among all amino acids, because it has almost

the same correlation value with strands and with coils [39].

This is the consequence of the Thr structure. Namely, Thr

has structural characteristics of strand preferrers (branching

on Cb atom) and turn and bend preferrers (polar hetero-

atom at Cb). Hence, quite unique diagrams for Thr could

be anticipated.

Many of strand preferrers have negative correlation with

distant a-helix structures. Thr and Val have negative cor-

relation with a-helices for positive values of s. Ile shows

small positive correlation for almost all positive values of

s. There is also uniform small negative correlation between

Tyr and a-helices for both positive and negative values of s.

(a)

-0.08

-0.06

-0.04

-0.02

0

0.02

0.04

0.06

0.08

0.1

0.12

0.14

-25

-24

-23

-22

-21

-20

-19

-18

-17

-16

-15

-14

-13

-12

-11

-10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Val - H

Val - E

Val - T

Val - B

Val - G

Val - I

Val - S

Val - C

(b)

-0.08

-0.06

-0.04

-0.02

0

0.02

0.04

0.06

0.08

0.1

0.12

-25

-24

-23

-22

-21

-20

-19

-18

-17

-16

-15

-14

-13

-12

-11

-10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Ile - H

Ile - E

Ile - T

Ile - B

Ile - G

Ile - I

Ile - S

Ile - C

(c)

-0.03

-0.02

-0.01

0

0.01

0.02

0.03

0.04

0.05

0.06

-25

-24

-23

-22

-21

-20

-19

-18

-17

-16

-15

-14

-13

-12

-11

-10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Tyr - H

Tyr - E

Tyr - T

Tyr - B

Tyr - G

Tyr - I

Tyr - S

Tyr - C

Fig. 2 Correlations of strand preferrers with secondary structure types. The correlations for each of strand preferrers are presented in a separate

diagram: a Valine, b Isoleucine, c Tyrosine, d Phenylalanine, e Threonine and f Tryptophan. See Fig. 1 for detailed description

Correlations of Amino Acids with Particular Secondary Structures 81

123

Most of strand preferrers have positive correlation with

distant turns. Thr is an exception: it supports the formation

of turns in the preceding, but has negative correlation with

them in the subsequent positions. Thr weakly supports the

formation of bends, mostly in positions between -2 and 5.

Other strand preferrers obstruct the formation of bends.

Thr, Tyr, Phe and Ile are positively correlated with

b-bridges. Val, Ile and Thr obstruct the formation of

3-helices, while Phe and Tyr support them at both their and

immediately preceding positions. As was mentioned above,

Thr has positive correlation with coils, and it supports the

formation of coils at it’s and precedent positions. Ile, Val,

Tyr and Phe obstruct coils. Val tends to support coils in its

vicinity.

5.3 Turn and Bend Preferrers

Amino acids from the third group (Gly, Asn, Pro, Asp and

Ser), turn and bend preferrers, are frequent in the vicinity

of strands, but have negative correlation with a-helices in

their vicinity, except Asp (Fig. 3). Asp supports a-helices

in subsequent positions, but has negative correlation in

precedent positions. Similar to Asp, Ser also supports

a-helices in subsequent positions, and has negative corre-

lation in precedent positions. It is in agreement with

already known fact that Asp and Ser stabilize N-terminal

end of a-helices [2, 14]. Asp and Ser have quite similar

diagrams for strand secondary structures too. There are also

similarities of diagrams for Ser and Thr.

(d)

-0.03

-0.02

-0.01

0

0.01

0.02

0.03

0.04

0.05

-25

-24

-23

-22

-21

-20

-19

-18

-17

-16

-15

-14

-13

-12

-11

-10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Phe - H

Phe - E

Phe - T

Phe - B

Phe - G

Phe - I

Phe - S

Phe - C

(e)

-0.05

-0.04

-0.03

-0.02

-0.01

0

0.01

0.02

0.03

0.04

-25

-24

-23

-22

-21

-20

-19

-18

-17

-16

-15

-14

-13

-12

-11

-10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Thr - H

Thr - E

Thr - T

Thr - B

Thr - G

Thr - I

Thr - S

Thr - C

(f)

-0.02

-0.015

-0.01

-0.005

0

0.005

0.01

0.015

0.02

-25

-24

-23

-22

-21

-20

-19

-18

-17

-16

-15

-14

-13

-12

-11

-10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Trp - H

Trp - E

Trp - T

Trp - B

Trp - G

Trp - I

Trp - S

Trp - C

Fig. 2 continued

82 S. N. Malkov et al.

123

Most of turn and bend preferrers have positive correla-

tions with bends, turns and coils in their close vicinity, but

often asymmetrically.

Asp supports coils at it’s and preceding positions (s C 0).

Gly tends to build turns and appears very often at their end.

It supports the formation of coils at the subsequent position

(s = -1). Pro tends to initiate turns and to support turns in

its vicinity, but it strongly obstructs the formation of turns at

s = ?1. Pro tends to appear at bends’ and coils’ ends. Asp

and Ser support the formation of 3-helices.

5.4 Cys and His

Remaining two amino acids Cys and His are relatively

weakly correlated with secondary structures [39], and it is

also the case for their vicinity. For Cys and His only a small

number of correlation values exceed the threshold qlim. Cys

has negative correlation with strands for s values below -4

and above ?5, while His has small negative correlation

with a-helices for s between -3 and 1.

6 Influence of Amino Acids Structures on Shapes

of the Diagrams

In our previous work [39] we showed that preferences of

amino acid for particular secondary structure at the same

position in the sequence are related to amino acids struc-

tural properties at the Cb and Cc atoms. Hence, the

structural properties of an amino acid close to the backbone

(a)

-0.15

-0.1

-0.05

0

0.05

0.1

0.15

0.2

-25

-24

-23

-22

-21

-20

-19

-18

-17

-16

-15

-14

-13

-12

-11

-10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Gly - H

Gly - E

Gly - T

Gly - B

Gly - G

Gly - I

Gly - S

Gly - C

(b)

-0.06

-0.04

-0.02

0

0.02

0.04

0.06

0.08

-25

-24

-23

-22

-21

-20

-19

-18

-17

-16

-15

-14

-13

-12

-11

-10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Asn - H

Asn - E

Asn - T

Asn - B

Asn - G

Asn - I

Asn - S

Asn - C

(c)

-0.15

-0.1

-0.05

0

0.05

0.1

0.15

0.2

-25

-24

-23

-22

-21

-20

-19

-18

-17

-16

-15

-14

-13

-12

-11

-10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Pro - H

Pro - E

Pro - T

Pro - B

Pro - G

Pro - I

Pro - S

Pro - C

Fig. 3 Correlations of turn and bend preferrers with secondary structure types. The correlations for each of turn and bend preferrers are

presented in a separate diagram: a Glycine, b Asparagine, c Proline, d Aspartic acid and e Serine. See Fig. 1 for detailed description

Correlations of Amino Acids with Particular Secondary Structures 83

123

have strong influence on the secondary structure at the

same position in the sequence. This is in accordance with

the other observations that local interactions are responsi-

ble for secondary structure propensities [1, 55, 67, 68, 70].

As it was shown in Figs. 1–3 and stressed above, many

features of the graphs are similar for the amino acids

belonging to the same group. Since amino acids in a same

group have the same structural properties on Cb and Ccatoms, the similarities are consequence of these structural

properties. In this work, the computed correlation values

for distant residues provide us with an additional obser-

vation: as the distance between observed amino acids and

secondary structure residues becomes larger, the influence

of amino acid structural elements, which are important for

the secondary structure at the position of the amino acid,

decreases and, at the same time, the influence of polarity

and capability for hydrogen bonding of the residue

increases. In accordance with the presented influence

properties, the diagrams for the amino acids in a same

group are very similar to each other at the offset 0, but the

similarity decreases for larger offsets. For example, dia-

grams for a-helix preferrers with polar groups Glu, Arg,

and Lys show significant difference at longer distances

compared to the rest of the group. Similarly, Thr has very

different behavior at non-zero distances compared to the

rest of the strand preferrers, and Asp and Ser behave

differently than the rest of the turn preferrers.

An asymmetry of the diagrams is noticed for several

amino acids, and it is not connected with particular group

of amino acids. All the asymmetric diagrams were

observed for polar amino acids indicating that electro-

static interactions are important for prediction of the

protein structure [65]. It is also in agreement with the

information about long-range interactions in proteins

deduced from a comparison between the results obtained

from the host-guest technique and from X-ray data on

proteins [64].

The asymmetric diagrams were observed for polar

amino acids where the polar side chains can form salt

bridges and side-chain main-chain hydrogen bonds.

Recently, salt bridges in 2017 monomeric protein struc-

tures were analysed [63]. On average, there are about five

salt bridges for each protein. Hence, we consider that small

number of salt bridges per protein does not influence cor-

relation symmetry of amino acids depicted in Figs. 1–3.

Side-chain main-chain (SC-MC) hydrogen bonds occur

more frequently; the ratio of the number of SC-MC

hydrogen bonds to the total number of polar residues is

close to 1:2 [16], indicating that these hydrogen bonds can

have influence on the shape of the diagrams.

Most of asymmetric diagrams are for the correlation

with a-helices, what is in accordance with the fact that in

a-helices there are interactions of amino acids at some

distance.

(d)

-0.08

-0.06

-0.04

-0.02

0

0.02

0.04

0.06

-25

-24

-23

-22

-21

-20

-19

-18

-17

-16

-15

-14

-13

-12

-11

-10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Asp - H

Asp - E

Asp - T

Asp - B

Asp - G

Asp - I

Asp - S

Asp - C

(e)

-0.05

-0.04

-0.03

-0.02

-0.01

0

0.01

0.02

0.03

0.04

0.05

-25

-24

-23

-22

-21

-20

-19

-18

-17

-16

-15

-14

-13

-12

-11

-10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Ser - H

Ser - E

Ser - T

Ser - B

Ser - G

Ser - I

Ser - S

Ser - C

Fig. 3 continued

84 S. N. Malkov et al.

123

The large asymmetry for a-helices is observed for Glu

and Asp. It is interesting that both of them have carboxyl

groups. Although Glu has large positive correlation values

with helices, and Asp has large negative values, both have

decreased correlation for positive s values. It is observed

that both of them have significant propensities for N-end of

helix [7]. Glu forms SC-MC i/i-3 hydrogen bonds in

helices, while Asp forms i ? 2/i and i ? 3/i in N-end of

helices [16].

The asymmetry of the diagrams for a-helices is also

pronounced for amino acids with amino groups, Arg and

Lys. These two amino acids have asymmetric diagrams for

strands, too. It is connected with their large propensity to

be located in C-end of the helices [7].), and it is more

pronounced for Lys with larger propensity. Leu and Met

also have propensities, but smaller than Lys and Arg to be

in the C-end of the helices [7]. Leu and Met have less

asymmetric diagrams, but also have maximum correlation

values for positive s. Less asymmetry is connected with the

fact that they are not polar.

The diagrams for a-helix and strands for Asp and Ser are

very similar. They are asymmetric and quite different than

the diagrams for other turn and bend preferrers. This is

related to the similarities in the structures of Asp and Ser—

both of them have polar oxygen atoms, carboxyl and

hydroxyl group, respectively. Similar shape of the diagram

for a-helix has Thr. It could be anticipated since Thr and

Ser form similar SC-MC hydrogen bonds. Namely, both of

them have high propensities for i/i-4 and i/i-3 hydrogen

bonds, and somewhat lower propensities for i ? 2/i and

i ? 3/i bonds; where i ? 2/i propensity for Thr is the

lowest (1.07) [16]. Most of the i/i-4 bonds are inside of

a-helices, while some of them are in C-terminal end. Most

of the i/i-3 hydrogen bonds occur at the C-terminal end,

however, some of them are at the N-terminal end of

a-helices and in b-structures. Majority of i ? 2/i and i ? 3/

i interactions occur in the b-structure at i and the secondary

structure of the following residues are bridge or a-helices

[16].

Considering strand preferrers, the diagrams for Thr and

Trp are quite different from the rest of the diagrams in the

group. The difference is the consequence of heteroatoms in

the structures of Thr and Trp.

7 Conclusions

Calculating correlations of amino acids with particular

secondary structures, we determined amino acids tendency

to influence the secondary structure along the sequence.

The diagrams representing correlations of amino acids

with secondary structures at neighboring positions, for

offset s values from -25 to 25, reveal that correlations are

generally significant at distances in the range between -9

and 10, depending on both the amino acid and the sec-

ondary structure type. The shapes of the diagrams are

related to structural properties of amino acids. The largest

asymmetry is observed for polar amino acids and amino

acids that can form hydrogen bonds.

Our previous results show that the structural elements on

Cb or Cc atoms of amino acid, and not their polarity, play

major role in their preference for a-helices and strands at

the same position in the sequence. However, results pre-

sented here show that polarity of amino acids plays

important role in forming a-helices and strands at some

distance in the sequence.

The amino acids with similar diagrams have similar

local folding requirements, hence, they show which sub-

stitutions of the amino acids will not change the secondary

structure.

The diagrams posses important information about

influence of amino acid on local secondary structure that

could be useful in secondary structure predictions and in

prediction of possible amino acid replacement by others

having similar local folding requirements.

Acknowledgments This work was supported under projects No

144030 and No 142037 by the Ministry of Science of Republic of

Serbia.

References

1. Aurora R, Creamer TP, Srinivasan R, Rose GD (1997) J Mol Biol

272:1413–1416

2. Aurora R, Rose GD (1998) Protein Sci 7:21–38

3. Baldwin RL (2007) J Mol Biol 371:283–301

4. Baldwin RL (2008) Annual Rev Biophys 37:1–21

5. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN,

Weissig H, Shindyalov IN, Bourne PE (2000) Nucleic Acids Res

28(1):235–242

6. Bowie JU, Luthy R, Eisenberg D (1991) Science 253:164–170

7. Chakrabarti P, Pal D (2001) Prog Biophys Mol Biol 76:1–102

8. Chen CC, Singh JP, Altman RB (1999) Bioinformatics 15:53–65

9. Chou PY, Fasman GD (1974) Biochemistry 13:222–245

10. Chou PY, Fasman GD (1974) Biochemistry 13(2):211–222

11. Chou PY, Fasman GD (1978) Adv Enzymol Relat Areas Mol

Biol 47:45–148

12. Dalal S, Balasubramanian S, Regan L (1998) Nat Struct Biol

4:548–552

13. Dill KA, Ozkan SB, Shell MS, Weikl TR (2008) Annu Rev

Biophys 37:289–316

14. Doig AJ, Baldwin RL (1995) Protein Sci 4:1325–1336

15. Engel DE, DeGrado WF (2004) J Mol Biol 337(5):1195–1205

16. Eswar N, Ramakrishnan C (2000) Protein Eng 13:227–238

17. Eyrich VA, Standley DM, Felts AK, Friesner RA (1999) Proteins

35:41–57

18. Eyrich VA, Standley DM, Friesner RA (1999) J Mol Biol

288:725–742

19. Fischer D, Eisenberg D (1996) Protein Sci 5:947–955

20. Fitzkee NC, Fleming PJ, Gong H, Panasik N Jr, Street TO, Rose

GD (2005) Trends Biochem Sci 30:73–80

Correlations of Amino Acids with Particular Secondary Structures 85

123

21. Fleming PJ, Gong HP, Rose GD (2006) Protein Sci 15(8):1829–

1834

22. Frishman D, Argos P (1996) Protein Eng 9:133–142

23. Frishman D, Argos P (1997) Proteins 27:329–335

24. Garnier J, Osguthorpe DJ, Robson B (1978) J Mol Biol 120:97–

120

25. Gibrat JF, Garnier J, Robson B (1987) J Mol Biol 198:425–443

26. Gong H, Fleming PJ, Rose GD (2005) Proc Natl Acad Sci USA

102(45):16227–16232

27. Gromiha MM, Selvaraj S (2004) Prog Biophys Mol Biol

86(2):235–277

28. Hobohm U, Sander C (1994) Protein Sci 3:522–524

29. Kabsch W, Sander C (1983) Biopolymers. 22(12):2577–2637

30. Kelley LA, MacCallum RM, Sternberg MJE (2000) J Mol Biol

299:499–520

31. Kim CA, Berg JM (1990) Nature 362:267–270

32. Kloczkowski A, Ting KL, Jernigan RL, Garnier J (2002) Proteins

49:154–166

33. Koehl P, Levitt M (1999) Proc Natl Acad Sci 96:12524–12529

34. Koretke KK, Luthey-Schulten L, Wolynes PG (1998) Proc Natl

Acad Sci USA 95:2932–2937

35. Levitt M, Warshel A (1975) Nature 253:694–698

36. Levitt M (1978) Biochemistry 17:4277–4285

37. Lomize AL, Pogozheva ID, Mosberg HI (1999) Proteins Suppl

3:199–203

38. Maiorov VN, Crippen GM (1992) J Mol Biol 227:876–888

39. Malkov SN, Zivkovic MV, Beljanski MV, Hall MB, Zaric SD

(2008) J Mol Model 14:769–775

40. Mandel-Gutfreund Y, Gregoret LM (2002) J Mol Biol 323(3):

453–461

41. Medakovic VB, Milcic MK, Bogdanovic GA, Zaric SD (2004)

J Inorg Biochem 98:1867–1873

42. Midic U, Dunker AK, Obradovic Z (2005) Proc IEEE Sym

CIBCB 490–497

43. Minor DL, Kim PS (1994) Nature 367:660–663

44. Minor DL, Kim PS (1994) Nature 371:264–267

45. Nolting B, Salimi N, Guth U (2008) J Theor Biol 251:331–347

46. O’Neil KT, DeGrado WF (1990) Science 250:646–651

47. Ortiz AR, Kolinski A, Rotkiewicz P, Ilkowsky B, Skolnick J

(1999) Proteins suppl3:177–185

48. Padmanabhan S, Marqusee S, Ridgeway T, Laue TM, Baldwin

RL (1990) Nature 344:268–270

49. Penel S, Hughes E, Doig AJ (1999) J Mol Biol 287:127–143

50. Petukhov M, Munoz V, Yumoto N, Yoshikawa S, Serrano L

(1998) J Mol Biol 278:279–289

51. Risler JL, Delorme MO, Delacroix H, Henaut A (1988) J Mol

Biol 204:1019–1029

52. Robson B (1974) Biochem J 141(3):853–867

53. Robson B, Suzuki E (1976) J Mol Biol 107:327–356

54. Rose GD, Fleming PJ, Banavar JR, Maritan A (2006) Proc Natl

Acad Sci USA 103(45):16623–16633

55. Rossmeisl J, Kristensen I, Gregersen M, Jacobsen KW, Nørskov

JK (2003) J Am Chem Soc 125:16383–16386

56. Rost B (1998) In: von Rague-Schleyer P (ed) Encyclopedia of

computational chemistry. John Wiley, Sussex, pp 2242–2255

57. Rost B (2001) J Struct Biol 134:204–218

58. Rost B (2003) In: Bourne PE (ed) Structural bioinformatics.

Wiley-Liss, Hoboken, pp 559–587

59. Salamov AA, Solovyev VV (1997) J Mol Biol 268:31–36

60. Samudrala R, Huang E, Koehl P, Levitt M (2000) Protein Eng

13:453–457

61. Samudrala R, Xia Y, Huang E, Levitt M (1999) Proteins suppl

3:194–198

62. Samuels ML, Witmer JA (2003) Statistics for the life sciences,

3rd edn. Pearson Education, New Jersey

63. Sarakatsannis JN, Duan. Y (2005) Proteins 60:732–739

64. Scheraga HA (1978) Pure Appl Chem 50:315–324

65. Scheraga HA, Liwo A, Odsiej S, Czaplewski C, Pillardy J, Ripoll

DR, Vila JA, Kazmierkiewicz R, Saunders JA, Arnautova YA,

Jagielska A, Chinchio M, Nanias M (2004) Front Biosci 9:3296–

3323

66. Solis AD, Rackovsky S (2004) Polymer 45:525–546

67. Shortle D (2002) Protein Sci 11:18–26

68. Srinivasan R, Rose GD (1999) Proc Natl Acad Sci USA

96:14258–14263

69. Stojanovic SÐ, Medakovic VB, Predovic G, Beljanski M, Zaric

SD (2007) J Biol Inorg Chem 12:1063–1071

70. Street AG, Mayo SL (1999) Proc Natl Acad Sci USA 96:9074–

9076

71. Xiong H, Buckwalter BL, Shieh HM, Hecht MH (1995) Proc Natl

Acad Sci USA 92:6349–6353

72. Zaric SD, Popovic D, Knapp EW (2000) Chem Eur J 6:3935–

3942

73. Zaric SD, Popovic DM, Knapp EW (2001) Biochemistry

40:7914–7928

74. Zaric SD (2003) Eur J Inorg Chem 34:2197–2209

86 S. N. Malkov et al.

123