Size, shape, and flexibility of RNA structures
arXiv:q-bio/0609035v1 [q-bio.BM] 23 Sep 2006
Changbong Hyeon1 and Ruxandra I. Dima2 and D. Thirumalai1,3
1Biophysics Program,
Institute for Physical Science and Technology,
University of Maryland,
College Park, MD 20742
2Department of Chemistry,
University of Cincinnati,
Cincinnati OH, 45221
3Department of Chemistry and Biochemistry,
University of Maryland,
College Park, MD 20742
(Dated: February 9, 2008)
Determination of sizes and flexibilities of RNA molecules is important in under-
standing the nature of packing in folded structures and in elucidating interactions
between RNA and DNA or proteins. Using the coordinates of the structures of
RNA in the Protein Data Bank we find that the size of the folded RNA structures,
measured using the radius of gyration, RG, follows the Flory scaling law, namely,
$R_G = 5.5 N^{1/3}$ Å, where N is the number of nucleotides. The shape of RNA molecules
is characterized by the asphericity ∆ and the shape S parameters that are computed
using the eigenvalues of the moment of inertia tensor. From the distribution of ∆,
we find that a large fraction of folded RNA structures are aspherical and the distri-
bution of S values shows that RNA molecules are prolate (S > 0). The flexibility
of folded structures is characterized by the persistence length lp. By fitting the
distance distribution function, P (r) that is computed using the coordinates of the
folded RNA, to the worm-like chain model we extracted the persistence length lp.
We find that $l_p \approx 1.5 N^{0.33}$ Å, which might reflect the large separation between the
free energies that stabilize secondary and tertiary structures. The dependence of lp
on N implies that the average length of helices should increase as the size of RNA grows.
We also analyze packing in the structures of ribosomes (30S, 50S, and 70S) in terms
of RG, ∆, S, and lp. The 70S and the 50S subunits are more spherical compared to
most RNA molecules. The globularity in 50S is due to the presence of an unusually
large number (compared to 30S subunit) of small helices that are stitched together
by bulges and loops. Comparison of the shapes of the intact 70S ribosome and the
constituent particles suggests that folding of the individual molecules might occur
prior to assembly.
INTRODUCTION
Molecular recognition between RNAs or RNA and protein is involved in a number of cellular
functions. In all these processes RNA interacts with other biomolecules. In order to under-
stand the biophysical basis of interactions of RNA with other biological molecules it is necessary
to characterize the shapes of the interacting partners. Hence, it is important to elucidate the
shapes and flexibilities of RNA structures. The large increase in the number of three dimen-
sional structures allows us to quantify RNA shapes which is needed to describe the assembly of
complexes such as the ribosome.
In contrast to the situation in RNA much is known about packing and shape fluctuations
in proteins.1,2,3,4 In part this is because the number of solved protein structures is ∼30,000
while the RNA structure database contains only ∼600 structures. Despite considerable suc-
cess in the secondary structure predictions of nucleic acid sequences using energy minimization
dynamic programming algorithm5,6 or comparative sequence analysis7 the complicated nature
of counterion-mediated tertiary interactions in RNAs makes it difficult to obtain three dimen-
sional RNA structures using computational methods. The recent experimental determination
of medium to large size of RNA structures has prompted us to perform a statistical analysis of
RNA structures with the aim of characterizing their shapes and flexibility.
In this paper, we study the structural features of RNA using the currently available RNA
three dimensional structures.8 The size of RNA, as measured by the radius of gyration RG,
shows that typically RNA molecules are compact. The variation of RG with the number (N)
of nucleotides obeys the Flory law, i.e., $R_G = a N^{1/3}$. Although the overall scaling law for RG for
RNA is identical to that for proteins there are considerable differences in their shapes. We find
that the folded states of RNAs are largely prolate and are considerably more aspherical than
proteins. The flexibility of RNA, which is crucial in describing interactions with proteins and
RNA and DNA, is described in terms of the persistence length (lp) which can be measured using
X-ray scattering9 and other methods.10 The values of lp for RNA, which are considerably larger
than for proteins, vary between 5 and 30 Å depending on N. Using the shape parameters and lp we
also describe the unusual structural characteristics of the ribosome, a large ribonucleoprotein
complex.
METHODS
RNA structures : We computed several quantities to characterize the shapes of RNA
using the atomic coordinates of their structures determined by X-ray crystallography, NMR, or
cryo-EM. The coordinates for all RNA structures were obtained from the Protein Data Bank
(PDB).8 Our analysis is performed for over 1185 individual RNA chains with the number of
nucleotides N > 10 found in 642 RNA related PDB files as of June 2005. Among these, 195
RNA chain structures are monomers, and the rest of the chains are part of oligomers or appear
in complexes with other RNA molecules or proteins. Structural features in the monomeric form
can be different from those determined in an oligomer or complex because the intermolecular
interaction can affect the individual chain structure. Therefore, we analyzed the two groups
of structures separately. For comparison, we have also calculated shape characteristics for a
dataset of proteins. The results for proteins enable us to assess certain unusual features of
RNA-protein interactions especially in the ribosome.
Size : The radius of gyration (RG) is an indicator of the overall size of RNA. The value of
$R_G^2$, which can be measured using small angle X-ray or neutron scattering, is calculated using
$$R_G^2 = \frac{1}{2\sum_{i}^{M} m_i \sum_{j}^{M} m_j} \sum_{i}^{M}\sum_{j}^{M} m_i m_j (\vec{r}_i - \vec{r}_j)^2 \tag{1}$$
where M is the number of atoms in the molecule, and $m_i$ is the mass of the ith atom. In the calculation
of $R_G^2$ for RNA structures we used only the coordinates of the heavy atoms (C, N, O, P).
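As an illustration only, the mass-weighted radius of gyration can be sketched in a few lines of Python. The function name, the input layout (a list of element symbols with Cartesian coordinates in Å), and the approximate atomic masses are assumptions made for this sketch; they are not taken from the analysis pipeline used in this work. The sketch uses the single sum about the center of mass, which is algebraically equivalent to the double sum in Eq.1.

```python
import math

# Approximate atomic masses (amu) of the heavy atoms considered in the text.
# These values are an assumption of this sketch.
MASS = {"C": 12.011, "N": 14.007, "O": 15.999, "P": 30.974}

def radius_of_gyration(atoms):
    """Return the mass-weighted R_G for atoms given as (element, (x, y, z))."""
    total_mass = sum(MASS[element] for element, _ in atoms)
    # Center of mass R_C = sum_i m_i r_i / sum_i m_i
    center = [
        sum(MASS[element] * r[k] for element, r in atoms) / total_mass
        for k in range(3)
    ]
    # R_G^2 = sum_i m_i |r_i - R_C|^2 / sum_i m_i (equivalent to the double sum)
    rg_squared = sum(
        MASS[element] * sum((r[k] - center[k]) ** 2 for k in range(3))
        for element, r in atoms
    ) / total_mass
    return math.sqrt(rg_squared)
```

For two atoms of equal mass separated by 2 Å the function returns 1 Å, as expected for half the separation.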
Shape : The deviation from the spherical shape is characterized by the asphericity ∆ and
the shape parameter S, both of which are calculated from the inertia tensor,11,12
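$$T_{\alpha\beta} = \frac{1}{2\sum_{i}^{M} m_i \sum_{j}^{M} m_j} \sum_{i}^{M}\sum_{j}^{M} m_i m_j (r_{i\alpha} - r_{j\alpha})(r_{i\beta} - r_{j\beta}) = \frac{1}{\sum_{i}^{M} m_i} \sum_{i}^{M} m_i (r_{i\alpha} - R_{C\alpha})(r_{i\beta} - R_{C\beta}) \tag{2}$$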
where $r_{i\alpha}$ is the α-th Cartesian component of the position of atom i, and $\vec{R}_C = \sum_{i}^{M} m_i \vec{r}_i / \sum_{i}^{M} m_i$
is the center of mass. The square of the radius of gyration is $R_G^2 = \mathrm{tr}\,T$. The eigenvalues
$\lambda_1$, $\lambda_2$ and $\lambda_3$ of the matrix T are the squares of the three principal radii of gyration. The
extent of asphericity is characterized using ∆ (0 ≤ ∆ ≤ 1)
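$$\Delta = \frac{3}{2}\,\frac{\sum_{i=1}^{3}(\lambda_i - \bar{\lambda})^2}{(\mathrm{tr}\,T)^2} \tag{3}$$
where $\bar{\lambda} = (\lambda_1 + \lambda_2 + \lambda_3)/3$. For a perfect sphere ∆ = 0. Deviation from ∆ = 0 indicates the
extent of anisotropy. The overall shape of a molecule is assessed using
$$S = 27\,\frac{\prod_{i=1}^{3}(\lambda_i - \bar{\lambda})}{(\mathrm{tr}\,T)^3} \tag{4}$$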
which satisfies the bound −1/4 ≤ S ≤ 2. Negative values of S correspond to oblate ellipsoids
and positive values (S > 0) to prolate ellipsoids.
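The definitions of ∆ and S can be illustrated with a short Python sketch. It is a minimal example under stated assumptions: the function names are hypothetical, and instead of diagonalizing T explicitly it uses the identities that the sum of squared deviations of the eigenvalues from their mean equals the trace of the squared traceless part of T, and that the product of the deviations equals its determinant. This shortcut gives the same values as the eigenvalue expressions but is not necessarily how the quantities were evaluated in this work.

```python
def gyration_tensor(coords, masses):
    """Mass-weighted tensor T (3x3 nested list) about the center of mass."""
    total = sum(masses)
    center = [sum(m * r[k] for m, r in zip(masses, coords)) / total for k in range(3)]
    return [
        [
            sum(m * (r[a] - center[a]) * (r[b] - center[b])
                for m, r in zip(masses, coords)) / total
            for b in range(3)
        ]
        for a in range(3)
    ]

def shape_parameters(T):
    """Return (asphericity Delta, shape parameter S) from a symmetric 3x3 tensor."""
    trace = T[0][0] + T[1][1] + T[2][2]
    mean = trace / 3.0
    # Traceless part D = T - mean * I; its eigenvalues are (lambda_i - mean).
    D = [[T[a][b] - (mean if a == b else 0.0) for b in range(3)] for a in range(3)]
    # sum_i (lambda_i - mean)^2 = tr(D^2) = sum of squared entries of symmetric D
    sum_sq = sum(D[a][b] ** 2 for a in range(3) for b in range(3))
    # prod_i (lambda_i - mean) = det(D)
    det = (D[0][0] * (D[1][1] * D[2][2] - D[1][2] * D[2][1])
           - D[0][1] * (D[1][0] * D[2][2] - D[1][2] * D[2][0])
           + D[0][2] * (D[1][0] * D[2][1] - D[1][1] * D[2][0]))
    return 1.5 * sum_sq / trace ** 2, 27.0 * det / trace ** 3
```

The limiting cases quoted in the text are recovered: a rod gives ∆ = 1 and S = 2, a flat symmetric disc gives S = −1/4, and an isotropic arrangement gives ∆ = S = 0.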
Most studies of packing in proteins and RNAs involve tessellation of space, which always
introduces a certain amount of arbitrariness.4,13 In contrast, the shape parameters ∆ and S
are directly computed using only the atomic coordinates. Knowledge of ∆ and S is important
in determining the overall motion of RNA molecules and their interactions with other biomolecules.
Persistence Length : A parameter that describes the flexibility of biomolecules is the
persistence length, lp, which is most clearly defined by assuming that RNA structures can be
described by a polymer model. Based on previous experimental studies it is suspected that the
statistical properties of dsDNA,10,14 ssDNA,15 and RNA9,16 can be described using the worm-
like chain (WLC) model. For WLC models lp can be estimated provided the distribution of the
mean end-to-end distance RE or RG is known. Exact calculation of neither P (RE) nor P (RG)
is possible for WLC. A simple and accurate theoretical expression has been derived for P (RE)
of worm-like chain using the mean field approximation.17,18 The resulting distribution, which is
in good agreement with computer simulations,19 is
$$P_{WLC}(r_E) = \frac{4\pi C r_E^2}{(1 - r_E^2)^{9/2}} \exp\left[-\frac{3t}{4(1 - r_E^2)}\right] \tag{5}$$
where $r_E \equiv R_E/L$ and $t \equiv L/l_p$, and L is the contour length. For RNA molecules, which from the
perspective of polymers can be viewed as branched polyelectrolyte chains, the contour length
is also an unknown parameter. The normalization constant is $C = 1/[\pi^{3/2} e^{-\alpha} \alpha^{-3/2} (1 + 3\alpha^{-1} + \tfrac{15}{4}\alpha^{-2})]$
with $\alpha = 3t/4$. When $l_p$ is small $P_{WLC}(r_E)$ reduces to the distribution for a Gaussian chain, whereas
for large $l_p$ $P_{WLC}(r_E)$ approaches the rod limit as $r_E \to 1$.
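The following Python sketch evaluates Eq.5 and checks numerically that the stated constant C normalizes the distribution on 0 ≤ r_E < 1. The function names and the midpoint integration rule are choices made for this illustration only.

```python
import math

def wlc_end_to_end_pdf(r_e, t):
    """Mean-field WLC distribution of r_E = R_E / L for t = L / l_p (Eq. 5)."""
    alpha = 0.75 * t
    # C = 1 / [pi^(3/2) e^(-alpha) alpha^(-3/2) (1 + 3/alpha + 15/(4 alpha^2))]
    c = 1.0 / (math.pi ** 1.5 * math.exp(-alpha) * alpha ** -1.5
               * (1.0 + 3.0 / alpha + 3.75 / alpha ** 2))
    u = 1.0 - r_e ** 2
    return 4.0 * math.pi * c * r_e ** 2 / u ** 4.5 * math.exp(-alpha / u)

def integrate_midpoint(f, a, b, n=20000):
    """Simple midpoint rule; it never evaluates f at the endpoints a and b."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Normalization check for a moderately flexible chain (t = L / l_p = 4).
total = integrate_midpoint(lambda r: wlc_end_to_end_pdf(r, 4.0), 0.0, 1.0)
```

With these settings `total` should be close to 1, because the exponential factor suppresses the algebraic divergence at r_E → 1.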
Although direct measurements of P (RE) for biomolecules are not routinely performed it is
conceivable that P (RE) may be obtained using single molecule FRET experiments. However,
the distance distribution function P (r) can be measured using SAXS experiments.20,21,22 Based
on general arguments, we expect that the distribution functions P (r) and P (RE) should coincide
provided $r \gg R_G$. Because $\langle R_E^2 \rangle \sim \langle R_G^2 \rangle \sim L l_p$ for the WLC provided L is large, it follows from scaling
arguments that P(r) should decay for large r as
$$P(r) = \beta \exp\left(-\frac{1}{1 - x^2}\right) \tag{6}$$
where $x = l_p r / R_G^2$, and β is an arbitrary constant. In practice Eq.6 accurately describes P(r)
computed using the coordinates of RNA structures when $r/R_G > 1$. We determined $l_p$ by fitting
the P(r) function for RNA structures to Eq.6.
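One way such a fit could be set up is sketched below in Python. The least-squares criterion in log space, the grid search over l_p, the step size, and the function names are all assumptions of this sketch; the text does not specify the fitting procedure that was actually used. The fitting window R_G < r < 2.5 R_G is the one quoted in the Results.

```python
import math

def log_tail_model(r, lp, rg):
    """ln[P(r)/beta] from Eq. 6, with x = lp * r / rg^2 (requires x < 1)."""
    x = lp * r / rg ** 2
    return -1.0 / (1.0 - x * x)

def fit_persistence_length(r_vals, p_vals, rg, step=0.01):
    """Grid-search l_p (same length unit as r) minimizing squared error in ln P."""
    data = [(r, math.log(p)) for r, p in zip(r_vals, p_vals)
            if rg < r < 2.5 * rg and p > 0.0]
    if not data:
        raise ValueError("no data points in the window rg < r < 2.5 rg")
    # Keep x < 1 for every retained r by bounding l_p from above.
    lp_upper = rg ** 2 / max(r for r, _ in data)
    best = None
    for k in range(1, int(lp_upper / step)):
        lp = k * step
        resid = [y - log_tail_model(r, lp, rg) for r, y in data]
        ln_beta = sum(resid) / len(resid)  # optimal ln(beta) for this l_p
        sse = sum((e - ln_beta) ** 2 for e in resid)
        if best is None or sse < best[0]:
            best = (sse, lp, math.exp(ln_beta))
    return best[1], best[2]
```

Because β enters only as an additive constant in ln P, it can be eliminated analytically for each trial l_p, which leaves a one-dimensional search.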
Recently, we used Eq.6 to analyze small angle X-ray scattering data. We showed that lp
for the Azoarcus ribozyme changes by a factor of 2 as the molecule folds upon addition of
counterions (Mg2+ or Na+). Although the structural basis for the success of WLC in describing
certain properties of folded RNA is unclear, Eq.6 is useful in analyzing scattering data.
For purposes of comparisons we have also calculated P (r) for folded structures for 56,000 pro-
tein chains. To our knowledge the persistence length of proteins has not been directly measured.
We obtain lp by fitting P (r), obtained from the coordinates of the structures in the PDB, to Eq.6.
RESULTS
Distribution of RNA structures as a function of N: From the distribution P(N) of
the number of RNA structures in the PDB as a function of chain length (N) in Fig.1 we find
that ∼70% of the structures in the database have N in the range 10 < N < 30. The peak in P(N) between
70 < N < 80 is due to the large number of tRNA structures that have been determined in
various conditions. The peaks at N ≈ 1500 and N ≈ 3000 correspond to 16S and 23S ribosomal
RNAs, respectively. Compared to statistics of protein structures (see Fig.1 inset), RNA
structures are more clustered at small values of N but span a broader range of N . However,
this distribution is unrelated to the number of RNA molecules that are relevant to biological
functions. There is a broad range in N that represents noncoding RNAs. For example, the
length of human ncRNA functioning in gene silencing process is ∼100,000 nucleotides.23 From
Fig.1, which reflects the current status in RNA structure determination, it is clear that there
is a large gap between the total number of functional RNAs and those with known three
dimensional structures.
Size of RNA obeys the Flory law : If the overall shape of RNA is spherical then its
volume, an extensive variable, is $V \approx \frac{4\pi}{3} R_G^3$ with $R_G$ being the radius of gyration. For accurate
computation of volumes one should use the hydrodynamic radius instead of $R_G$. Because $V \sim a^3 N$,
where a is a characteristic length (approximately the distance separating two consecutive
nucleotides), it follows that $R_G \sim a N^{1/3}$. This general result was first derived by Flory who
showed that $R_G \sim a N^{\nu}$ where ν = 1/3 for maximally compact structures. Because RNA is
a polyelectrolyte its RG depends on the concentration of counterions (C). At low values of
C, RNA is expanded and the transition to a compact structure occurs only when C exceeds a
critical value.
We calculated RG, using Eq.1 (see Methods), for the 1155 “folded” RNA structures. A plot
of RG as a function of N confirms the Flory result. From the plot in Fig.2(a) we find that, for
the folded RNA structures, $R_G$ can be accurately calculated using
$$R_G = a N^{1/3} \tag{7}$$
where a = 5.5 Å. The prefactor, a = 5.5 Å, for the folded structures approximately corresponds
to the average distance (≈5.5 Å) between the phosphate groups along the backbone (Fig.2(b)).
Recent measurement of $R_G$ for the compact state of the 195-nucleotide Azoarcus ribozyme at
high concentration of Na+ or Mg2+ shows that $R_G \approx 35$ Å.9 From Eq.7 we find $R_G \approx 32$ Å.
This analysis further suggests that the prefactor in Eq.7 may indeed be interpreted as the mean
distance between consecutive phosphate groups in the folded structures. If the $R_G$ data in
Fig.2(a) for N < 20 are neglected we find that Eq.7 is obeyed with a ≈ 5 Å. Thus, the scaling
relation is robust.
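The estimate for the Azoarcus ribozyme quoted above can be reproduced with a short Python sketch of Eq.7. The function names are hypothetical, and the fixed-exponent least-squares estimate of the prefactor is only one plausible way of extracting a from a set of (N, R_G) pairs; it is not claimed to be the procedure used for Fig.2(a).

```python
def flory_radius_of_gyration(n_nucleotides, a=5.5):
    """R_G (in Angstrom) predicted by Eq. 7 with prefactor a (in Angstrom)."""
    return a * n_nucleotides ** (1.0 / 3.0)

def fit_flory_prefactor(n_values, rg_values):
    """Least-squares prefactor a for R_G = a N^(1/3), exponent held at 1/3."""
    basis = [n ** (1.0 / 3.0) for n in n_values]
    return sum(b * rg for b, rg in zip(basis, rg_values)) / sum(b * b for b in basis)

# Compact Azoarcus ribozyme, N = 195: Eq. 7 gives about 32 Angstrom.
rg_azoarcus = flory_radius_of_gyration(195)
```

The value of `rg_azoarcus` is about 31.9 Å, to be compared with the measured value of roughly 35 Å cited in the text.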
It is perhaps more reasonable to view RNA structures as formed from relatively rigid
duplexes that are linked by flexible motifs such as bulges, loops, etc. In such a picture
the fraction of base-paired nucleotides can be chosen as a variable to describe the overall
size. We have shown previously (see Fig. 10 in Ref. 24) that the number of base pairs in RNA is
∝ N. Thus, the Flory result would be valid even if one accounts for the rigidity of RNA duplexes.
Single-chain RNAs are aspherical and prolate : Even though folded RNA structures
are compact, as assessed by their size, there are substantial deviations from sphericity. Indeed,
the distribution P (∆) for single chain RNAs (Fig.3-(a)) has a broad peak around ∆ ≈ 0.3. This
shows that the native-state conformations of single chain RNA molecules deviate greatly from
a sphere. This finding is in stark contrast to P (∆) in single-chain protein structures where the
peak of the distribution is at ∆ < 0.1.25 In addition, only ∼15% of single-chain RNA structures
have ∆ < 0.2, while in proteins the corresponding number is ∼80%. This analysis shows that
even if native structures of RNAs are compact ($R_G = 5.5 N^{1/3}$ Å) they are highly aspherical.
Because many RNAs are organized as oligomers, we also obtained the values of ∆ for such
structures. The distribution of ∆ for oligomeric RNAs is also very broad (Fig.3-(a) middle
panel). Approximately 34% of the 518 oligomeric RNAs have ∆ < 0.2 which shows that
oligomerization in RNA increases the sphericity of the molecule. This conclusion is substantiated
by analyzing $R_\Delta$,25 which is the ratio between the degree of asphericity of the oligomer
and the average asphericity of the individual chains. If $R_\Delta = 1$ then the oligomers and the chains
have the same asphericity, while $R_\Delta < 1$ indicates that the oligomer is more spherical than its
components. Nearly 60% of oligomeric RNAs have $R_\Delta \le 1$.
The distribution of the shape parameter, S, in single-chain RNAs (Fig.3-(b) top panel)
shows that RNA is mostly prolate because most of the chains have S > 0. This tendency
towards prolate shapes is stronger than in proteins where ∼50% of single-chains are spherical
or nearly so.25 On the other hand, the complexes of RNA chains found in the PDB structures
exhibit a bias towards spherical structures as shown in the peak around S = 0 in Fig.3-(b)
bottom panel. It should be emphasized that there is no systematic dependence of ∆ or S on N.
A plot of ∆ and S against N shows no correlation whatsoever. The observed variation is directly
attributable to sequence and hence the topology of the folded structure.
Distribution function of radius of gyration can be described by WLC model: For
the database of RNA molecules, we calculated the distance distribution, P (r), using the coor-
dinates of the heavy atoms. The P (r) functions (Fig.4(a)) for a few RNA molecules, resemble
those obtained using SAXS experiments for compact RNA molecules. The value of the persis-
tence length is obtained by fitting P (r) to Eq.6 in the range RG < r < 2.5RG. As can be seen
from Fig.5 the value of lp varies between 5 and 25 Å.
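A minimal Python sketch of how a distance distribution can be computed from a set of coordinates is given below. The histogram bin width, the normalization to unit area, and the function name are assumptions of this illustration and are not taken from the text.

```python
import math
from itertools import combinations

def distance_distribution(coords, bin_width=1.0):
    """Histogram of all pairwise distances, normalized to unit area.

    coords: list of (x, y, z) tuples with at least two entries.
    Returns (bin_centers, p_values).
    """
    distances = [math.dist(p, q) for p, q in combinations(coords, 2)]
    n_bins = int(max(distances) / bin_width) + 1
    counts = [0] * n_bins
    for d in distances:
        counts[int(d / bin_width)] += 1
    norm = len(distances) * bin_width
    centers = [(i + 0.5) * bin_width for i in range(n_bins)]
    return centers, [c / norm for c in counts]
```

The double loop over atom pairs scales quadratically with the number of atoms, which is acceptable for single structures but may call for a coarser representation for ribosome-sized complexes.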
If the WLC model correctly describes the distance distribution function an important
prediction follows from Eq.6, namely, that by replacing r by the dimensionless variable
$x = r l_p / R_G^2$ all the P(r) curves must coincide for $r/R_G > 1$. In other words, irrespective of the
size, sequence or the nature of interactions that stabilize the native topology, the tail of P(r)
($r > R_G$) should superimpose. Thus, P(r) should be a function of only $l_p r / R_G^2$. This important
prediction is validated in Fig.4(b) in which a plot of P(x) with $x = r l_p / R_G^2$ shows that all the
structures follow the same functional form for x > 0.5 (see Ref. 26 for the same analysis performed on
the end-to-end distance distribution of DNA). From this result we conclude that the distance
distribution functions of RNA structures are well described by the WLC model. We do not have
any structural basis for this observation.
Persistence length increases with N: It is remarkable that P (r) for folded RNA is well
described by the WLC model which accounts only for the bending penalty of a thin elastic
material. The structural basis for this important finding is not clear. By fitting P (r) to Eq.6 for
$r/R_G > 1$ we find that $l_p$ for folded structures increases with N. The finding that $l_p$ grows as
$l_p = 1.5 N^{\alpha}$ with α ≈ 1/3 can be rationalized using the arguments given below. A consequence
of the sublinear growth of $l_p$ with N is that the effective contour length for folded RNA must
also grow sublinearly with N, i.e., $L_{eff} = 3 \times (5.5^2/1.5)\, N^{1/3} \approx 60 N^{1/3}$ Å. In the unfolded state we
expect the contour length L ∝ N. Interestingly, recent single-molecule measurements have also
shown that $l_p$ for microtubules depends on the contour length.27
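The effective contour length quoted above follows from the standard worm-like chain relation between the mean square radius of gyration, the contour length, and the persistence length in the long-chain limit, namely that the mean square radius of gyration is approximately L l_p / 3. The Python sketch below makes the arithmetic explicit; the function names are hypothetical and the prefactors are the fitted values quoted in the text.

```python
def persistence_length(n, a1=1.5, exponent=1.0 / 3.0):
    """l_p (Angstrom) from the empirical fit l_p = a1 * N^exponent."""
    return a1 * n ** exponent

def effective_contour_length(n, a=5.5, a1=1.5):
    """L_eff (Angstrom) from R_G^2 ~ L * l_p / 3, i.e. L_eff = 3 R_G^2 / l_p."""
    rg = a * n ** (1.0 / 3.0)
    return 3.0 * rg ** 2 / persistence_length(n, a1)

# Prefactor of N^(1/3): 3 * 5.5^2 / 1.5 = 60.5 Angstrom.
l_eff_1000 = effective_contour_length(1000)
```

For N = 1000 the sketch gives an effective contour length of about 605 Å and a persistence length of about 15 Å.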
The increase in lp with N is related to the restriction that the folded states of biomolecules
be conformationally less dynamic than unfolded states. It is known from polymer physics that
if lp is fixed and there are no interactions that stabilize a specific structure then on large scales
(≫ lp) the structure would be intrinsically flexible. This would mean that spontaneous global
fluctuations of folded RNA would be highly likely due to increase in conformational entropy.
The requirement that biomolecules should adopt a near unique native fold which minimizes
entropy in the native basin of attraction (NBA), implies that lp itself should grow with N . In
contrast, for unfolded RNA, whose conformational entropy is greater than the structures in the
NBA, we expect that lp should be independent of N (see Appendix).
The persistence length lp, which determines the flexibility of RNA, depends on the concen-
tration, shape, and size of counterions. The balance of the effective energetics of interactions
(stacking interactions, hydrogen bonding, hydrophobic interaction, and repulsion between phos-
phate groups and tertiary interactions) renormalizes lp. Let us assume that the interactions are
approximated as pairwise additive and short-ranged, $\Delta G \approx \sum_{|\vec{r}_i - \vec{r}_j| < R_G} \Delta G_{ij}$. In the presence
of these interactions the persistence length should scale as the range of the interactions, i.e.,
$l_p \approx R_G \approx N^{1/3}$. The non-local interactions, which stabilize the folded RNA structures, grow
with N and hence affect lp. In the absence of interactions that stabilize the three dimensional
fold lp is determined only by the intrinsic property of primary sequence and hence should not
depend on N (see Appendix).
We further rationalize the dependence of lp on N by noting that about 54% of all nucleotides
in folded RNA structures are involved in base pairing (see Fig. 10 in Ref. 24). One possible way,
independent of N, of achieving the 54% base pairings is to distribute them over several short
duplexes that are stabilized by tertiary interactions in the native state. Because the tertiary
interactions in RNA are weaker than the base-stacking (and other) interactions that stabilize
hairpin-like structures, creation of several short duplexes is not favorable. Alternatively, it is
free energetically more favorable to create a smaller number of longer stable rigid duplexes
that are stabilized by tertiary interactions to create a nearly spherical shape. This strategy
seems to operate as N increases as seen in ribosomes. As a consequence of the presence of large
number of rigid duplexes, which reflects the hierarchical nature of RNA assembly, lp increases
with N. In other words, in RNA there is clear separation in energy scales stabilizing secondary
and tertiary interactions. Such a hierarchy implies that stiffness itself must be dependent on N.
Because such clear separation in structural organization does not exist in proteins we expect
that lp in proteins must have a weaker dependence on N (Fig. 5). A similar reasoning has been given
to explain the growth of lp with N for microtubules.27
DISCUSSION
Differences in shapes and packing between proteins and RNA: It is difficult to compare,
in absolute terms, packing in proteins and RNA because the nature of the interactions that
stabilize their native structures is distinct.28 Nevertheless, the Flory scaling ($R_G \sim a N^{1/3}$) observed
in RNA and proteins shows that both are maximally compact. For a given N, the approximate
volume of RNA is larger than that of proteins. The ratio is $V_{RNA}/V_{PROT} \approx (a_{RNA}/a_{PROT})^3 \approx 5.6$
for a fixed N. This suggests that, in all likelihood, RNA is more loosely packed than proteins,
a conclusion that is in apparent contradiction with a recent structural analysis.13 Voss and
Gerstein based their conclusion on Voronoi construction to decipher volumes of RNA and spe-
cific volume calculations. They concluded that “based on well packed atoms” RNA is more
tightly packed than proteins.13 The inherent arbitrariness in assigning volumes to atoms based
on Voronoi tessellation of space and the use of mass in the definition of specific volume obscures
packing effects which should be based on sizes of nucleotides alone. The present computations
show that, based on volume fraction considerations, RNA is not as compact as proteins as long
as N (the number of nucleotides or the number of aminoacids) is fixed.
The observed differences between shapes of RNA and proteins are primarily due to the
nature of interactions that stabilize the folded structures of RNA and proteins. Tertiary
structure formation in RNA must be preceded by substantial neutralization of the negative
charges of phosphate groups. Condensation of counterions that are non-specifically bound
results in the residual charge on the phosphate group being less than ∼ −0.1e where e is
the charge of the electron. However, packing in the resulting tertiary fold is determined not
only by interactions involving nucleotides but also by correlations between counterions.29 The
condensation of a large number of counterions needed to neutralize the charges on the phosphate
groups results in spatial correlation between them. If the volume excluded by the counterions
is large (for example the volume of cobalt hexamine is greater than that of Mg2+) then binding
of one counterion prevents another one being spatially adjacent. These counterion-mediated
interactions and their correlation also inherently affect packing in RNA. In contrast, packing
in the core of proteins is predominantly determined by interactions between hydrophobic side
chains and their contacts with the protein backbone. Because of the absence of additional
ligands, except in certain cases like heme proteins, dense packing in proteins is easier to achieve.
Shape fluctuations of proteins and RNA in the ribosome: The analysis of shape
and flexibility of isolated proteins and RNA gives insight into packing in isolated biomolecules.
However, in a vast majority of cases, function requires interactions between two or more com-
ponents. A prime example is the ribosome, a ribonucleoprotein complex, that plays a central
role in protein synthesis.30,31,32,33 Complexes of both small and large subunits with various an-
tibiotics have revealed the mechanism of the ribosomal machinery for tRNA recognition and
protein synthesis.34,35,36,37 The remarkable three dimensional map of entire ribosome (70S) in-
cluding three tRNAs and mRNA that shows a snapshot of the translation process, has also
been resolved by cryo-EM techniques at 5.5A resolution.33 The binding interface between 30S
and 50S subunits, tRNA recognition site in 30S subunit, and peptidyl transferase site on 50S
subunit are all devoid of the ribosomal proteins. The cavity is formed at the interface between
two subunits where three tRNA and a string of mRNA can be accommodated. The structures of
∼ 50 ribosomal proteins have also been investigated, giving further insights into the interaction
and the assembly process of the ribosome.38,39
Comparison of the shapes of the structures in isolation and in the complex allows us to infer if
there are large scale shape changes upon complexation. To this end, we analyzed the individual
components of the ribosome as well as each structural domain by using the parameters that
quantify molecular sizes, shapes, and flexibilities of the individual components. We used the
atomic coordinates from 1GIX (30S subunit composed of 16S rRNA, 3 tRNA, 1 mRNA, and
20 r-proteins) and 1GIY (50S subunit composed of 23S rRNA, 5S rRNA and 22 r-proteins)
that form an entire ribosome complex upon combination.33 The parameters characterizing the
structural components of ribosome are summarized in Table.I.
r-RNAs : Each ribosomal RNA (16S, 23S rRNA) can be further decomposed into sev-
eral structural domains whose folding is autonomous even in the absence of ribosomal
proteins.40,41,42,43 The structural features of individual domains of rRNAs in Fig.6(a), 6(b) are
quantified in terms of RG, ∆, and S, with corresponding regions differently colored in the sec-
ondary structure map. Comparison of ∆ and S values of rRNA domains (Table.I) with P (∆)
and P (S) in Fig.3 shows that, except for the 3’m domain of 16S rRNA, the overall shapes of
rRNA domains are nearly spherical and slightly prolate (0 < S < 0.25). Thus, no significant
difference in overall shape is found between rRNA domains and typical RNA
molecules. However, the deviations of RG from the scaling law (Eq.7), especially for the do-
mains of 23S rRNA, II, IV, V, VI, show that they are more extended in size than normal RNA
(Fig.7(a)). We find that the size of the domains in the 16S rRNA, 5’, C, 3’M obeys the scaling
law (Eq.7).
Because the shape of the fold from each domain is identical to the one assembled in the intact
ribosome, the assembly from extended domains must occur by a jigsaw puzzle type matching.
The head part of the 16S rRNA, which is crucial for A, P, E, tRNA binding sites is entirely
composed of the 3’M domain. The 5’ and C domains comprise the body and the platform
part, respectively (see Ref. 33 for terminology). The 3’m domain lies at the interface and interacts with
domain IV of the 23S rRNA when the two subunits dock. After the rRNA domains and r-proteins
are assembled to form a functional subunit, the 50S subunit is highly spherical (∆ = 0.05,
S = −0.01). In contrast, the 30S subunit is aspherical and prolate (∆ = 0.21, S = 0.14). The
acquisition of the spherical shape of the entire ribosome (∆ = 0.03, S = 0.01) must occur after
the folding of two subunits. Comparison of the shape of 30S, 50S, and 70S particles suggests
that there is very little alteration in their respective ∆ and S values upon complexation. This
observation suggests that these domains probably fold prior to assembly.
Despite their large sizes, the 50S and the 70S particles are considerably more spherical than
the majority of RNA molecules. The globular nature of the 50S particle and the 70S complex is
surprising given that the typical RNA complexes are aspherical. This asphericity, especially for
medium-sized RNA, is the result of coaxial stacking of helices found in the secondary structures.
The stacking leads to formation of long helices which are expected to be rigid with large values
of lp. The 30S subunit, which is highly aspherical and prolate, fits well with this expectation.
Noller has pointed out that the ribosome is made up of mostly small helices linked by flexible
bulges and loops.44 This observation applies to the 50S subunit (Fig.6(b)). However, large-sized
coaxial stackings are dominant in the 16S rRNA, but not in the 23S rRNA. As a result, the 30S
subunit is highly aspherical. The 70S complex is highly spherical. The globularity of the 70S
arises because the 30S subunit fits precisely (despite its high ∆ and S values (see Table.I)) at
the interface with the 50S to create a nearly perfect sphere.
r-proteins : Similar quantitative analysis can be performed on the ribosomal proteins. The
values of RG in some r-proteins deviate from the scaling law and the shape is generally more
biased to the prolate shape than in the non-ribosomal proteins (Fig.7). Ribosomal proteins
are mostly distributed on the back of the interface and the periphery of rRNAs, with some of the
proteins being anchored deep into the crevices of rRNA. The anchoring is accomplished using
the long tail of peptide chain composed of positively charged amino acids (ARG, LYS, HIS).38,39
The unusual topology of r-proteins prompted us to investigate whether or not the r-proteins
maintain their shape in isolation. We compared the structure of 16 r-proteins complexed in the
ribosome with the isolated r-protein structures independently determined by X-ray or
NMR available in PDB. The structural deviation between the isolated and ribosome-complexed
r-proteins is quantified using root mean square deviation (RMSD). The structured domains,
like α-helix and β-sheet, are well matched in the isolated protein and in the complex, but the
structural deviation is large in the loop and the tail regions of the structure. The structure
comparison suggests that the ordered part of the r-protein is at least well conserved in both
situations. The disordered tail part is stabilized upon complex formation inside the crevices of
rRNA.38,39
CONCLUSIONS
In this paper we have shown, by analyzing the available RNA structures, that RG can be
accurately computed using the celebrated Flory law. In contrast to proteins, RNA molecules
are considerably more aspherical with the overall shape being prolate. The prolate nature of
RNA shapes suggests that their diffusion is intrinsically anisotropic. For a given value of N (the
number of nucleotides or amino acids) the persistence length of RNA is considerably larger than
proteins. These findings suggest that typically RNA is not nearly as densely packed as proteins
even though both are compact in the folded states.
The structural basis for the success of WLC model in quantitatively fitting the distance
distribution curves for proteins and RNA is not clear. It has been appreciated for a long
time that elasticity-based models are appropriate for ds-DNA in monovalent counterions. The
present finding that P(r) (for r/RG > 1) for compact RNA and proteins can be described using
polymer models that account only for bending energies is surprising. Our work shows that lp,
which is needed to describe interaction between biomolecules, can be accurately obtained using
the experimentally measurable P (r). The fit of P (r) to WLC also shows that lp increases with
N . Such an unusual behavior is, perhaps, related to the need to minimize entropic fluctuations
in the native state. Suppression of conformational fluctuations in long RNA can be achieved by
having a small number of long rigid helices that are stabilized by weak tertiary interactions.
Despite the success of the polymer-based analysis of RNA structures of varying complexity the
microscopic basis for characterizing for folded biomolecules using WLC model remains to be
established.
APPENDIX
The observation that the persistence length of RNA in the compact folded states increases as
lp ≈ a1 N^0.3 with a1 ≈ 1.5 Å was rationalized in terms of the restricted conformational fluctuations
in the native state. A corollary of this interpretation is that lp should become independent of N
(or the sequence) if RNA is in the unfolded state. In this appendix, we adopt an oversimplified
model for the unfolded state of RNA to explicitly show that at large N (N > 40) lp indeed does
not depend on N.
The absence of persistent tertiary structure allows us to describe the polynucleotide chain using a
worm-like chain (WLC) model. Such a coarse-grained description may be an approximate representation
of a single-stranded chain made up of one type of nucleotide (for example, polyA). To verify how lp
changes as N increases we have performed simulations using a WLC model in which the only
non-bonded interactions are the excluded volume interactions between the beads representing the
nucleotides. The energy function is
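\[
H = \sum_{i=1}^{N-1} \frac{k_b}{2}\,(r_{i,i+1} - a)^2
  + \sum_{i=1}^{N-2} k_a\,\bigl(1 - \hat{r}_{i,i+1} \cdot \hat{r}_{i+1,i+2}\bigr)
  + \sum_{i=1}^{N-2} \sum_{j=i+2}^{N} \frac{k_e}{2}\,(r_{i,j} - a)^2\,\Theta(a - r_{i,j})
\tag{8}
\]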
where $r_{i,j}$ and $\hat{r}_{i,j}$ are the distance and the unit vector between beads i and j, respectively. The first term
restricts the extension (or compression) of the bond length around a, with $k_b = 2000\epsilon/a^2$, where $\epsilon$ is the
unit of energy. The second term is the bond angle potential that penalizes significant deviation
from the equilibrium value; we assign $k_a = 10\epsilon$. The last term, with $k_e = 2000\epsilon/a^2$, accounts for
the excluded volume interactions. By construction, the homopolymer WLC cannot form
any preferred low energy compact structures.
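As a concrete reading of Eq. 8, the following minimal sketch evaluates the three terms (bond stretching, bending, and excluded volume) for a given set of bead coordinates. It is an illustration written for this text, not the code used for the simulations reported here; the function name `wlc_energy` and the choice of reduced units (a = 1, ε = 1) are assumptions.

```python
import numpy as np


def wlc_energy(coords, a=1.0, eps=1.0):
    """Evaluate the three terms of Eq. 8 for bead coordinates of shape (N, 3).

    Reduced units are assumed (a = 1, eps = 1 by default); the force constants
    follow the values given in the text.
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    kb = 2000.0 * eps / a**2   # bond-stretching constant
    ka = 10.0 * eps            # bending constant
    ke = 2000.0 * eps / a**2   # excluded-volume constant

    # Term 1: harmonic restraint on the N-1 bond lengths around a.
    bonds = coords[1:] - coords[:-1]
    lengths = np.linalg.norm(bonds, axis=1)
    e_bond = 0.5 * kb * np.sum((lengths - a) ** 2)

    # Term 2: bending penalty from consecutive unit bond vectors.
    unit = bonds / lengths[:, None]
    cosines = np.sum(unit[:-1] * unit[1:], axis=1)
    e_bend = ka * np.sum(1.0 - cosines)

    # Term 3: soft repulsion for pairs with j >= i + 2 that are closer than a.
    i_idx, j_idx = np.triu_indices(n, k=2)
    dist = np.linalg.norm(coords[i_idx] - coords[j_idx], axis=1)
    overlap = dist < a         # Heaviside factor Theta(a - r_ij)
    e_excl = 0.5 * ke * np.sum((dist[overlap] - a) ** 2)

    return e_bond + e_bend + e_excl
```

A straight chain with bond length a has zero energy in this model, and a single right-angle bend with no overlapping beads costs exactly k_a.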
For this model, whose energy function is given by Eq. 8, we obtained the end-to-end distance
(RE) distribution function using Monte Carlo simulations. We
generated a large number of equilibrium conformations of the WLC model by employing the pivot
algorithm.45 Unlike standard Monte Carlo methods, which generate polymer conformations by
moving individual monomers, the pivot algorithm produces a global change in the configuration by
pivoting the chain around a randomly selected monomer position at each iteration. The
algorithm enhances the sampling rate of the available conformational space. Each trial move is
accepted or rejected according to the Metropolis criterion.
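The pivot move with Metropolis acceptance described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the simulation code used for this work: the names `random_rotation`, `pivot_step`, `bending_energy`, and `run_pivot` are hypothetical, the rotation is drawn from a symmetric proposal (random axis, angle uniform in [-π, π]), and for brevity the demonstration uses only the bending term of Eq. 8 with k_a = 10 in reduced units. A rigid rotation about the pivot bead leaves all bond lengths unchanged, so the bond-stretching term is unaffected by such a move; any energy function, including one with excluded volume, can be passed as `energy_fn`.

```python
import numpy as np


def random_rotation(rng, max_angle=np.pi):
    """Rotation matrix about a random axis (Rodrigues formula), symmetric in the angle."""
    axis = rng.normal(size=3)
    axis /= np.linalg.norm(axis)
    theta = rng.uniform(-max_angle, max_angle)
    k = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(theta) * k + (1.0 - np.cos(theta)) * (k @ k)


def pivot_step(coords, energy_fn, beta, rng):
    """One pivot trial: rotate all beads beyond a random interior bead, then Metropolis test."""
    n = len(coords)
    pivot = rng.integers(1, n - 1)           # interior bead index, 1 .. n-2
    rot = random_rotation(rng)
    trial = coords.copy()
    trial[pivot + 1:] = coords[pivot] + (coords[pivot + 1:] - coords[pivot]) @ rot.T
    d_e = energy_fn(trial) - energy_fn(coords)
    if d_e <= 0.0 or rng.random() < np.exp(-beta * d_e):
        return trial, True                   # move accepted
    return coords, False                     # move rejected


def bending_energy(coords, ka=10.0):
    """Bending term only (assumed reduced units); stands in for the full energy here."""
    bonds = np.diff(coords, axis=0)
    unit = bonds / np.linalg.norm(bonds, axis=1, keepdims=True)
    return ka * np.sum(1.0 - np.sum(unit[:-1] * unit[1:], axis=1))


def run_pivot(n_beads=20, n_steps=1000, beta=1.0, seed=0):
    """Run n_steps pivot trials from a straight chain; return final coords and accepted count."""
    rng = np.random.default_rng(seed)
    coords = np.zeros((n_beads, 3))
    coords[:, 0] = np.arange(n_beads)        # straight chain with bond length a = 1
    accepted = 0
    for _ in range(n_steps):
        coords, ok = pivot_step(coords, bending_energy, beta, rng)
        accepted += int(ok)
    return coords, accepted
```

Collecting the end-to-end distance `np.linalg.norm(coords[-1] - coords[0])` after each step of such a run gives a histogram that plays the role of P(RE).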
From the ensemble of conformations generated using the pivot algorithm we obtained the
end-to-end distribution function P (RE). The simulated distribution function P (RE) can be fit
using Eq.5 from which we obtain lp. The dependence of lp on N for the WLC, without the
possibility of forming ordered structures, shows (Fig.8) that lp becomes independent of N when
N > 40. The rise in lp for N < 40 is due to the domination of the bending energy (second term
in Eq.8). For larger values of N the entropic contributions can compensate for the bending
energy and lp saturates to its intrinsic value. Thus, for WLC with excluded volume interactions
the bending penalty dominates at small N values and the chain is intrinsically flexible when N
is very large. This situation is in stark contrast with folded RNA (or proteins) where lp grows
with N . The increase of lp as N increases, which is due to interactions that stabilize RNA,
is required to suppress conformational fluctuations when biomolecules reach the functionally
competent state. Similar findings are well known for polypeptides such as polyPro, polyGly,
etc.46
ACKNOWLEDGMENT
This work was supported in part by a grant from the National Science Foundation through
NSF CHE-05-14056.
1 Richards, F. M. and Lim, W. A. An analysis of packing in the protein-folding problem. Q. Rev.
Biophys. 26, 423–498 (1994).
2 Richards, F. M. The interpretation of protein structures: Total volume, group volume distributions
and packing density. J. Mol. Biol. 82, 1–14 (1974).
3 Finney, J. L. Volume occupation, environment and accessibility in proteins. The problem of the
protein surface. J. Mol. Biol. 96, 721–732 (1975).
4 Liang, J. and Dill, K. A. Are proteins well-packed? Biophys. J. 81, 751–766 (2001).
5 Zuker, M., Mathews, D., and Turner, D. Algorithms and Thermodynamics for RNA Secondary
Structure Prediction: A Practical Guide in RNA Biochemistry and Biotechnology. Kluwer Academic
Publishers, (1999).
6 Hofacker, I. V. Vienna RNA secondary structure server. Nucl. Acids. Res. 31(13), 3429–3431 (2003).
7 Gutell, R. R., Lee, J. C., and Cannone, J. J. The accuracy of ribosomal RNA comparative structure
models. Curr. Opin. Struct. Biol. 12, 301–310 (2002).
8 Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N.,
and Bourne, P. E. The Protein Data Bank. Nucleic Acids Research 28, 235–242 (2000).
9 Caliskan, G., Hyeon, C., Perez-Salas, U., Briber, R. M., Woodson, S. A., and Thirumalai, D.
Persistence Length Changes Dramatically as RNA Folds. Phys. Rev. Lett. 95, 268303 (2005).
10 Bustamante, C., Marko, J. F., Siggia, E. D., and Smith, S. Entropic elasticity of λ-phage DNA.
Science 265(5178), 1599–1600 (1994).
11 Aronovitz, J. A. and Nelson, D. R. Universal features of polymer shapes. J. Phys. (Paris) 47,
1445–1456 (1986).
12 Honeycutt, J. D. and Thirumalai, D. Static properties of polymer chains in porous media. J. Chem.
Phys. 90, 4542–4559 (1989).
13 Voss, N. R. and Gerstein, M. Calculation of standard atomic volumes for RNA and comparison
with proteins: RNA is packed more tightly. J. Mol. Biol. 346, 477–492 (2005).
14 Smith, S. B., Finzi, L., and Bustamante, C. Direct Mechanical Measurements of the Elasticity of
Single DNA Molecules by Using Magnetic Beads. Science 258, 1122–1126 (1992).
15 Tinland, B., Pluen, A., Sturm, J., and Weill, G. Persistence Length of Single-Stranded DNA.
Macromolecules 30, 5763 (1997).
16 Liphardt, J., Onoa, B., Smith, S. B., Tinoco, Jr., I., and Bustamante, C. Reversible unfolding of
single RNA molecules by mechanical force. Science 292, 733–737 (2001).
17 Thirumalai, D. and Ha, B.-Y. Theoretical and Mathematical Models in Polymer Research, chapter I.
Statistical Mechanics of Semiflexible Chains: A Mean Field Variational Approach, 1–35. Academic
Press, San Diego (1998). edited by A. Grosberg.
18 Hyeon, C. and Thirumalai, D. Kinetics of interior loop formation in semiflexible chains. J. Chem.
Phys. 124, 104905 (2006).
19 Wilhelm, J. and Frey, E. Radial Distribution Function of Semiflexible Polymers. Phys. Rev. Lett.
77(12), 2581–2584 (1996).
20 Fang, X., Littrell, K., Yang, X., Henderson, S. J., Siefert, S., Thiyagarajan, P., Pan, T., and Sosnick,
T. R. Mg2+-Dependent Compaction and Folding of Yeast tRNAPhe and the Catalytic Domain of
the B. subtilis RNase P RNA Determined by Small-Angle X-ray Scattering. Biochemistry 39,
11107–11113 (2000).
21 Russell, R., Millett, I. S., Doniach, S., and Herschlag, D. Small angle X-ray scattering reveals a
compact intermediate in RNA folding. Nature Struct. Biol. 7(5), 367–370 (2000).
22 Russell, R., Millett, I. S., Tate, M. W., Kwok, L. W., Nakatani, B., Gruner, S. M., Mochrie, S. G. J.,
Pande, V., Doniach, S., Herschlag, D., and Pollack, L. Rapid compaction during RNA folding. Proc.
Natl. Acad. Sci. 99(7), 4266–4271 (2002).
23 Sleutels, F., Zwart, R., and Barlow, D. P. The non-coding Air RNA is required for silencing
autosomal imprinted genes. Nature 415, 810–813 (2002).
24 Dima, R. I., Hyeon, C., and Thirumalai, D. Extracting stacking interaction parameters for RNA
from the data set of native structures. J. Mol. Biol. 347, 53–69 (2005).
25 Dima, R. I. and Thirumalai, D. Asymmetry in the shapes of folded and denatured states of proteins.
J. Phys. Chem. B 108, 6564–6570 (2004).
26 Valle, F., Favre, M., de Los Rios, P., Rosa, A., and Dietler, G. Scaling exponents and probability
distribution of DNA end-to-end distance. Phys. Rev. Lett. 95, 158105 (2005).
27 Pampaloni, F., Lattanzi, G., Jons, A., Surrey, T., Frey, E., and Florin, E. L. Thermal fluctuations of
grafted microtubules provide evidence of a length-dependent persistence length. Proc. Natl. Acad.
Sci. 103, 10248–10253 (2006).
28 Thirumalai, D. and Hyeon, C. RNA and Protein folding: Common Themes and Variations. Bio-
chemistry 44(13), 4957–4970 (2005).
29 Koculi, E., Lee, N., Thirumalai, D., and Woodson, S. A. Folding of the Tetrahymena Ribozyme by
Polyamines: Importance of Counterion Valence and Size. J. Mol. Biol. 341(1), 27–36 (2004).
30 Wimberly, B. T., Brodersen, D. E., Clemons Jr., W. M., Morgan-Warren, R. J., Carter, A. P.,
Vonrhein, C., Hartsch, T., and Ramakrishnan, V. Structure of the 30S ribosomal subunit. Nature
407, 327–339 (2000).
31 Ban, N., Nissen, P., Hansen, J., Moore, P. B., and Steitz, T. A. The complete atomic structure of
the large ribosomal subunit at 2.4 A resolution. Science 289, 905–920 (2000).
32 Harms, J., Schluenzen, F., Zarivach, R., Bashan, A., Gat, S., Agmon, I., Bartels, H., Franceschi,
F., and Yonath, A. High resolution structure of the large ribosomal subunit from a mesophilic
eubacterium. Cell 107, 679–688 (2001).
33 Yusupov, M. M., Yusupova, G. Z., Baucom, A., Lieberman, K., Earnest, T. N., Cate, J. H. D., and
Noller, H. F. Crystal structure of the ribosome at 5.5 A resolution. Science 292, 883–896 (2001).
34 Brodersen, D. E., Clemons Jr., W. M., Carter, A. P., Morgan-Warren, R., Wimberly, B. T., and
Ramakrishnan, V. The structural basis for the action of the antibiotics tetracycline, pactamycin,
and hygromycin B on the 30S ribosomal subunit. Cell 103, 1143 (2000).
35 Carter, A. P., Clemons Jr., W. M., Brodersen, D. E., Morgan-Warren, R. J., Wimberly, B. T.,
and Ramakrishnan, V. Functional insights from the structure of the 30S ribosomal subunit and its
interactions with antibiotics. Nature 407, 340–348 (2000).
36 Schluenzen, F., Zarivach, R., Harms, J., Bashan, A., Tocilj, A., Albrecht, R., Yonath, A., and
Franceschi, F. Structural basis for the interaction of antibiotics with the peptidyl transferase center
in eubacteria. Nature 413, 814 (2001).
37 Hansen, J. L., Ippolito, J. A., Ban, N., Nissen, P., Moore, P. B., and Steitz, T. The structures of
four macrolide antibiotics bound to the large ribosomal subunit. Mol. Cell 10, 117 (2002).
38 Brodersen, D. E., Clemons Jr., W. M., Carter, A. P., Wimberly, B. T., and Ramakrishnan, V.
Crystal structure of the 30 S ribosomal subunit from Thermus thermophilus: Structure of the
proteins and their interactions with 16 S RNA. J. Mol. Biol. 316, 725–768 (2002).
39 Klein, D. J., Moore, P. B., and Steitz, T. A. The roles of ribosomal proteins in the structure
assembly, and evolution of the large ribosomal subunit. J. Mol. Biol. 340, 141–177 (2004).
40 Egebjerg, J., Leffers, H., Christensen, A., Andersen, H., and Garrett, R. A. Structure and accessibility
of domain I of Escherichia coli 23 S RNA in free RNA, in the L24-RNA complex and in 50 S
subunits. J. Mol. Biol. 196, 125–136 (1987).
41 Leffers, H., Egebjerg, J., Andersen, A., Christensen, T., and Garrett, R. A. Domain VI of Es-
cherichia coli 23 S ribosomal RNA Structure, assembly and function. J. Mol. Biol. 204, 507–522
(1988).
42 Moore, P. B. and Steitz, T. A. The structural basis of large ribosomal subunit function. Ann. Rev.
Biochem 72, 813–850 (2003).
43 Adilakshmi, T., Ramaswamy, P., and Woodson, S. A. Protein-independent Folding pathway of the
16 S rRNA 5’ Domain. J. Mol. Biol. 351, 508–519 (2005).
44 Noller, H. F. RNA Structure: Reading the Ribosome. Science 309, 1508–1514 (2005).
45 Bishop, M., Clarke, J. H. R., Rey, A., and Freire, J. J. Investigation of the end-to-end vector
distribution function for linear polymer in different regimes. J. Chem. Phys. 95, 4589–4592 (1991).
46 Cantor, C. R. and Schimmel, P. R. Biophysical Chemistry. W. H. Freeman and Company, San
Francisco, (1979).
47 Buscaglia, M., Lapidus, L. J., Eaton, W. A., and Hofrichter, J. Effects of Denaturants on the
Dynamics of Loop Formation in Polypeptides. Biophys. J. 91, 276–288 (2006).
48 Sherman, E. and Haran, G. Coil-globule transition in the denatured state of a small protein. Proc.
Natl. Acad. Sci. 103, 11539–11543 (2006).
49 Pettersen, E. F., Goddard, T. D., Huang, C. C., Couch, G. S., Greenblatt, D. M., Meng, E. C., and
Ferrin, T. E. UCSF Chimera - A Visualization System for Exploratory Research and Analysis. J.
Comput. Chem. 25, 1605–1612 (2004).
TABLE I: Structural features of the ribosome.

                 N^a  RG[Å]^b   ∆^c      S  lp[Å]^d RMSD[Å]^e |          N    RG[Å]     ∆      S    lp[Å]   RMSD[Å]
          70S   9662     86.2  0.03   0.01     27.1         - |
          30S   3915     66.3  0.21   0.14     23.1         - | 50S   5747     74.3  0.05  -0.01     23.8         -
r-RNA     16S   1519     65.1  0.28   0.21     22.3         - | 23S   2889     66.4  0.02  -0.01     23.6         -
          5'     542     42.8  0.21   0.14     14.9         - | I      557     46.1  0.23   0.20     16.5         -
          C      352     39.8  0.28   0.25     14.4         - | II     736     56.7  0.08   0.00     18.0         -
          3'M    484     39.3  0.07   0.01     13.4         - | III    378     35.4  0.18   0.12     12.3         -
          3'm    141     45.1  0.66   1.06      -^f         - | IV     343     44.7  0.25   0.25     15.6         -
                                                              | V      600     56.3  0.26   0.22     19.4         -
                                                              | VI     275     41.1  0.22  -0.08     13.2         -
                                                              | 5S     123     32.5  0.45   0.59     10.6         -
r-protein S2     234     19.3  0.24   0.23      6.4         - | L1     224     18.0  0.15   0.09      6.1      5.79
          S3     206     18.3  0.13   0.08      6.1         - | L2     173     19.1  0.21   0.12      6.0         -
          S4     208     17.6  0.15   0.04      5.9         - | L3     191     22.7  0.29   0.28    7.1^g         -
          S5     150     16.9  0.28   0.29      5.6      1.19 | L4     189     25.7  0.60   0.91      7.2      2.90
          S6     101     14.2  0.10  -0.03      4.5      1.17 | L5     122     16.9  0.16   0.10      5.7         -
          S7     155     17.5  0.24   0.19      5.6      0.93 | L6     164     19.2  0.39   0.45      5.8      2.06
          S8     138     15.5  0.11   0.00      5.0         - | L7     128     18.1  0.43   0.53      5.7      0.62
          S9     127     18.2  0.41   0.50     21.8         - | L9     148     25.9  0.71   1.18        -      2.19
          S10     98     18.6  0.67   1.08     23.1         - | L11    133     16.7  0.30   0.32      5.4      2.00
          S11    119     15.0  0.22   0.14      4.7         - | L12    128     18.1  0.42   0.52      5.6      0.75
          S12    124     19.9  0.33   0.38     23.7         - | L13    117     14.2  0.11   0.02      4.6         -
          S13    125     22.3  0.46   0.56      7.1         - | L14    122     13.4  0.09   0.02      4.4         -
          S14     60     13.8  0.44   0.56      3.9         - | L15     84     13.8  0.23   0.21      4.2         -
          S15     88     13.6  0.19   0.08      4.3      3.89 | L16    138     17.4  0.43   0.56      5.6      17.7
          S16     83     12.3  0.13  -0.06      3.8      1.88 | L18    113     13.5  0.10  -0.01      4.3      1.84
          S17    104     15.5  0.11   0.02      4.9      4.82 | L19     52     10.5  0.09  -0.04      3.0         -
          S18     73     12.0  0.08  -0.04      3.6         - | L22    110     16.6  0.51   0.71      5.3         -
          S19     80     12.7  0.08  -0.02      3.9      3.16 | L23     76     11.5  0.08   0.00      3.5      4.27
          S20     99     17.4  0.64   1.02      5.4         - | L24    110     15.3  0.08  -0.04      4.9         -
          THX     24      7.2  0.26   0.23        -         - | L25     89     12.3  0.05  -0.01      3.9      1.41
                                                              | L29     64     12.8  0.43   0.54        -         -
                                                              | L30     60     10.4  0.18   0.09      3.0         -

^a N is the number of nucleotides or amino acids.
^b The radius of gyration RG is calculated using Eq. 1.
^c The shape parameters ∆ and S are computed using Eqs. 3 and 4.
^d lp is the persistence length.
^e The root mean square deviation is the extent of structural deviation of the ribosomal proteins in the complex and in isolation.
^f Persistence length is not reported if the correlation coefficient of nonlinear fitting is less than 0.85.
^g Unusually large values of the parameters (∆, S > 0.6, and lp > 7.0 Å) are given in bold.
FIGURE CAPTIONS
Figure 1: Distribution of RNA structures in the Protein Data Bank (PDB) as a function of
chain length, N. The arrows show the N values for 16S and 23S ribosomal RNAs, respectively.
The inset shows the same plot for protein structures.
Figure 2: (a) Radius of gyration as a function of N. The straight line is a fit to the data that
shows the scaling law RG = 5.5 N^0.33 Å. The correlation coefficient is 0.94. If data for N > 300 are
neglected we find RG = 5.6 N^0.33 Å with a correlation coefficient of 0.92 (fit in green). Data points
inside the circle, which deviate significantly from the scaling law, correspond to structures
that are similar to ds-DNA (PDB code: 1H1K). We excluded these structures from the fitting
procedure. For comparison, the plot of RG as a function of N for 13704 monomeric proteins is
shown in the inset. The straight line corresponds to RG = 3.1 N^0.31 Å with a correlation coefficient
of 0.89. (b) Distance distribution of neighboring phosphorus atoms along the RNA backbone. The
distance RP−P corresponds to the separation of the backbone P atoms of the ith and (i + 1)th
nucleotides, where i = 1, 2, . . . , (N − 1).
Figure 3: (a) Distribution of the asphericity parameter ∆ for RNA. The top panel corre-
sponds to single chain, the middle represents single chain in a complex, and the bottom panel
is for the complex. Large deviation from sphericity is found in RNA. (b) Distribution of shape
parameters for RNA. The legend for the three panels is the same as in (a). RNA molecules in
general are aspherical and prolate like an American football.
Figure 4: (a) The distance distribution P(r) as a function of r for selected proteins and RNA.
We calculated P(r) using the coordinates of the folded structures. The legend at the bottom
gives the PDB codes for which P(r) are shown. (b) Dependence of P(r) on the dimensionless
variable x = r lp/RG^2. If RNA and proteins can be modeled as WLC then it follows that, for
x > 1, P(x) should fall on a single line (see Eq. 6) independent of the fold. The tails of P(x)
for the P(r) in (a) practically collapse onto a single curve. The log[P(r)/β] distributions between
the dashed lines are plotted as a function of −1/(1 − x^2); they overlap well, with the condition
log[P(x)/β] / [−1/(1 − x^2)] ∼ 1 being satisfied.
Figure 5: Dependence of lp on the chain length for RNA and proteins. The persistence
length lp was computed by fitting P(r) to Eq. 6. The lines correspond to lp = 1.47 N^0.33 Å (RNA)
and lp = 1.00 N^0.33 Å (proteins). There is greater dispersion in the data for proteins than for
RNA. Indeed, the correlation coefficient in the fit for RNA is 0.98 whereas for proteins it is only
0.79. Nevertheless, the lp values for proteins are in the range inferred from experiments for both
peptides and proteins.47,48
Figure 6: (a) Structural domains of the 16S rRNA. The corresponding secondary structure at
the center is in the same color. Views from the interface (left) and the back (right) of the 16S rRNA assembled
from these structural domains. (b) Structural domains of the 23S rRNA. The organization of the
figure is identical to that of 16S rRNA in (a). The coaxial stackings are specified as dark lines
on the secondary structures. Molecular graphics images were produced using XRNA and UCSF
Chimera package.49
Figure 7: (a) Radii of gyration (RG) of the structural domains in 16S (filled circles) and 23S
(empty diamonds) rRNAs are plotted as a function of N. The red line representing RG = 5.5 N^0.33 Å
is drawn to show the deviation of the rRNA domains from the statistics found in typical RNAs. (b)
Plot of RG against N for ribosomal proteins. The red line represents the RG = 3.1 N^(1/3) Å scaling law
found in “normal” globular proteins. Ribosomal proteins (L3, L4, L9, S10, S12, S13, S20) that
show a large deviation from the scaling law are explicitly indicated. When the tail parts of these
proteins are removed, RG for the r-proteins obeys the Flory scaling law (see open red circles).
Figure 8: Persistence length lp as a function of N for a WLC model described in the
Appendix. This model may represent a homopolymeric nucleotide at low salt concentrations.
The value of lp is obtained by fitting the end-to-end distribution functions P (RE) that were
generated by Monte Carlo simulations (see Appendix). An example of P (RE) as a function of
RE/L for N = 30 is shown in the inset. The dependence of lp on N shows that, for large N, lp
is a constant for a homopolymer chain at low ionic concentration.
FIG. 1: [Plot: number of RNA structures, # (RNA), versus N (axis ticks at 10, 100, 1000). Inset: number of protein structures, # (protein), versus N.]