Size, shape, and flexibility of RNA structures

28
arXiv:q-bio/0609035v1 [q-bio.BM] 23 Sep 2006 Size, shape, and flexibility of RNA structures Changbong Hyeon 1 and Ruxandra I. Dima 2 and D. Thirumalai 1,3 1 Biophysics Program, Institute for Physical Science and Technology, University of Maryland, College Park, MD 20742 2 Department of Chemistry, University of Cincinnati, Cincinnati OH, 45221 3 Department of Chemistry and Biochemistry, University of Maryland, College Park, MD 20742 (Dated: February 9, 2008) Determination of sizes and flexibilities of RNA molecules is important in under- standing the nature of packing in folded structures and in elucidating interactions between RNA and DNA or proteins. Using the coordinates of the structures of RNA in the Protein Data Bank we find that the size of the folded RNA structures, measured using the radius of gyration, R G , follows the Flory scaling law, namely, R G =5.5N 1/3 ˚ A where N is the number of nucleotides. The shape of RNA molecules is characterized by the asphericity Δ and the shape S parameters that are computed using the eigenvalues of the moment of inertia tensor. From the distribution of Δ, we find that a large fraction of folded RNA structures are aspherical and the distri- bution of S values shows that RNA molecules are prolate (S> 0). The flexibility of folded structures is characterized by the persistence length l p . By fitting the distance distribution function, P (r) that is computed using the coordinates of the folded RNA, to the worm-like chain model we extracted the persistence length l p . We find that l p 1.5N 0.33 ˚ A which might reflect the large separation between the free energies that stabilize secondary and tertiary structures. The dependence of l p

Transcript of Size, shape, and flexibility of RNA structures

arX

iv:q

-bio

/060

9035

v1 [

q-bi

o.B

M]

23

Sep

2006

Size, shape, and flexibility of RNA structures

Changbong Hyeon1 and Ruxandra I. Dima2 and D. Thirumalai1,3

1Biophysics Program,

Institute for Physical Science and Technology,

University of Maryland,

College Park, MD 20742

2Department of Chemistry,

University of Cincinnati,

Cincinnati OH, 45221

3Department of Chemistry and Biochemistry,

University of Maryland,

College Park, MD 20742

(Dated: February 9, 2008)

Determination of sizes and flexibilities of RNA molecules is important in under-

standing the nature of packing in folded structures and in elucidating interactions

between RNA and DNA or proteins. Using the coordinates of the structures of

RNA in the Protein Data Bank we find that the size of the folded RNA structures,

measured using the radius of gyration, RG, follows the Flory scaling law, namely,

RG = 5.5N1/3 A where N is the number of nucleotides. The shape of RNA molecules

is characterized by the asphericity ∆ and the shape S parameters that are computed

using the eigenvalues of the moment of inertia tensor. From the distribution of ∆,

we find that a large fraction of folded RNA structures are aspherical and the distri-

bution of S values shows that RNA molecules are prolate (S > 0). The flexibility

of folded structures is characterized by the persistence length lp. By fitting the

distance distribution function, P (r) that is computed using the coordinates of the

folded RNA, to the worm-like chain model we extracted the persistence length lp.

We find that lp ≈ 1.5N0.33 A which might reflect the large separation between the

free energies that stabilize secondary and tertiary structures. The dependence of lp

2

on N implies the average length of helices should increases as the size of RNA grows.

We also analyze packing in the structures of ribosomes (30S, 50S, and 70S) in terms

of RG, ∆, S, and lp. The 70S and the 50S subunits are more spherical compared to

most RNA molecules. The globularity in 50S is due to the presence of an unusually

large number (compared to 30S subunit) of small helices that are stitched together

by bulges and loops. Comparison of the shapes of the intact 70S ribosome and the

constituent particles suggests that folding of the individual molecules might occur

prior to assembly.

INTRODUCTION

Molecular recognition between RNAs or RNA and protein is involved in a number of cellular

functions. In all these processes RNA interacts with other biomolecules. In order to under-

stand the biophysical basis of interactions of RNA with other biological molecules it is necessary

to characterize the shapes of the interacting partners. Hence, it is important to elucidate the

shapes and flexibilities of RNA structures. The large increase in the number of three dimen-

sional structures allows us to quantify RNA shapes which is needed to describe the assembly of

complexes such as the ribosome.

In contrast to the situation in RNA much is known about packing and shape fluctuations

in proteins.1,2,3,4 In part this is because the number of solved protein structures is ∼30,000

while the RNA structure database contains only ∼600 structures. Despite considerable suc-

cess in the secondary structure predictions of nucleic acid sequences using energy minimization

dynamic programming algorithm5,6 or comparative sequence analysis7 the complicated nature

of counterion-mediated tertiary interactions in RNAs makes it difficult to obtain three dimen-

sional RNA structures using computational methods. The recent experimental determination

of medium to large size of RNA structures has prompted us to perform a statistical analysis of

RNA structures with the aim of characterizing their shapes and flexibility.

In this paper, we study the structural features of RNA using the currently available RNA

three dimensional structures.8 The size of RNA, as measured by the radius of gyration RG,

shows that typically RNA molecules are compact. The variation of RG with the number (N)

of nucleotides obeys Flory law i.e., RG = aN1/3 A. Although the overall scaling law for RG for

RNA is identical to that for proteins there are considerable differences in their shapes. We find

that the folded states of RNAs are largely prolate and are considerably more aspherical than

3

proteins. The flexibility of RNA, which is crucial in describing interactions with proteins and

RNA and DNA, is described in terms of the persistence length (lp) which can be measured using

X-ray scattering9 and other methods.10 The values of lp for RNA, which are considerably larger

than for proteins, vary between (5-30)A depending on N . Using the shape parameters and lp we

also describe the unusual structural characteristics of the ribosome, a large ribonucleoprotein

complex.

METHODS

RNA structures : We computed several quantities to characterize the shapes of RNA

using the atomic coordinates of their structures determined by X-ray crystallography, NMR, or

cryo-EM. The coordinates for all RNA structures were obtained from the Protein Data Bank

(PDB).8 Our analysis is performed for over 1185 individual RNA chains with the number of

nucleotides N > 10 found in 642 RNA related PDB files as of June 2005. Among these, 195

RNA chain structures are monomers, and the rest of the chains are part of oligomers or appear

in complexes with other RNA molecules or proteins. Structural features in the monomeric form

can be different from those determined in an oligomer or complex because the intermolecular

interaction can affect the individual chain structure. Therefore, we analyzed the two groups

of structures separately. For comparison, we have also calculated shape characteristics for a

dataset of proteins. The results for proteins enable us to assess certain unusual features of

RNA-protein interactions especially in the ribosome.

Size : The radius of gyration (RG) is an indicator of the overall size of RNA. The value of

R2G, which can be measured using small angle X-ray or neutron scattering, is calculated using

R2G =

1

2∑M

i mi

∑Mj mjN2

M∑

i

M∑

j

mimj(~ri − ~rj)2. (1)

where M is the number of atoms in the molecule, and mi is the mass of the ith atom. In the cal-

culation of R2G for RNA structure we used only the coordinates of the heavy atoms (C, N, O, P).

Shape : The deviation from the spherical shape is characterized by the asphericity ∆ and

4

the shape parameter S, both of which are calculated from the inertia tensor,11,12

Tαβ =1

2∑M

i mi

∑Mj mj

M∑

i

M∑

j

mimj(riα − rjα)(riβ − rjβ)

=1

∑Mi mi

M∑

i

mi(riα − RCα)(riβ − RCβ) (2)

where riα is the α-th Cartesian component of the position of atom i, and ~RC =∑M

i mi~ri/∑M

i mi

is the a center of mass. The square of the radius of gyration is R2G = trT . The eigenvalues

λ1, λ2 and λ3 of the matrix T are the the squares of the three principal radii of gyration. The

extent of asphericity is characterized using ∆ (0 ≤ ∆ ≤ 1)

∆ =3

2

∑3i=1(λi − λ)2

(trT )2(3)

where λ = (λ1 + λ2 + λ3)/3. For a perfect sphere ∆ = 0. Deviation from ∆ = 0 indicates the

extent of anisotropy. The overall shape of a molecule is assessed using

S = 27

∏3i=1(λi − λ)

(trT )3(4)

which satisfies the bound −1/4 ≤ S ≤ 2. Negative values of S correspond to oblate ellipsoids

and S > 0 are prolate ellipsoids.

Most studies of packing in proteins and RNAs involve tessellation of space which always

introduces certain amount of arbitrariness.4,13 In contrast, the shape parameters ∆ and S

are directly computed using only the atomic coordinates. Knowledges of ∆ and S are im-

portant in determining the overall motion of RNA and their interaction with other biomolecules.

Persistence Length : A parameter that describes the flexibility of biomolecules is the

persistence length, lp, which is most clearly defined by assuming that RNA structures can be

described by a polymer model. Based on previous experimental studies it is suspected that the

statistical properties of dsDNA,10,14 ssDNA,15 and RNA9,16 can be described using the worm-

like chain (WLC) model. For WLC models lp can be estimated provided the distribution of the

mean end-to-end distance RE or RG is known. Exact calculation of neither P (RE) nor P (RG)

is possible for WLC. A simple and accurate theoretical expression has been derived for P (RE)

of worm-like chain using the mean field approximation.17,18 The resulting distribution, which is

in good agreement with computer simulations,19 is

PWLC(rE) =4πCr2

E

(1 − r2E)9/2

exp[−3t

4(1 − r2E)

]. (5)

5

where rE ≡ RE/L and t ≡ L/lp. L is the contour length. For RNA molecules, which from the

perspective of polymers, can be viewed as a branched polyelectrolyte chains, the contour length

is also an unknown parameter. The normalization constant C = 1/[π3/2e−αα−3/2(1 + 3α−1 +

15/4α−2)] with an α = 3t/4. When lp is small PWLC(rE) reduces to a Gaussian chain whereas

for large lp PWLC(rE) approaches the rod-limit as rE → 1.

Although direct measurements of P (RE) for biomolecules are not routinely performed it is

conceivable that P (RE) may be obtained using single molecule FRET experiments. However,

the distance distribution function P (r) can be measured using SAXS experiments.20,21,22 Based

on general arguments, we expect that the distribution functions P (r) and P (RE) should coincide

provided r ≫ RG. Because 〈R2E〉 ∼ 〈R2

G〉 ∼ Llp for WLC provided L is large it follows scaling

arguments that P (r) should decay for large r as

P (r) = β exp

(

−1

1 − x2

)

(6)

where x = lpr/R2G, and β is an arbitrary constant. In practice Eq.6 accurately describes P (r)

computed using the coordinates of RNA structures when r/Rg > 1. We determined lp by fitting

the P (r) function for RNA structures to Eq.6.

Recently, we used Eq.6 to analyze small angle X-ray scattering data. We showed that lp

for the Azoarcus ribozyme changes by a factor of 2 as the molecule folds upon addition of

counterions (Mg2+ or Na+). Although the structural basis for the success of WLC in describing

certain properties of folded RNA is unclear, Eq.6 is useful in analyzing scattering data.

For purposes of comparisons we have also calculated P (r) for folded structures for 56,000 pro-

tein chains. To our knowledge the persistence length of proteins has not been directly measured.

We obtain lp by fitting P (r), obtained from the coordinates of the structures in the PDB, to Eq.6.

RESULTS

Distribution of RNA structures as a function of N: From the distribution of P (N)

the number of RNA structures in the PDB as a function of chain length (N) in Fig.1 we find

that ∼ 70% of the database contains N in the range 10 < N < 30. The peak in P (N) between

70 < N < 80 is due to the large number of tRNA structures that have been determined in

various conditions. The peaks at N ≈ 1500 and N ≈ 3000 correspond to 16S and 23S ribosomal

RNAs, respectively. Compared to statistics of protein structures (see Fig.1 inset), RNA

structures are more clustered at small values of N but span a broader range of N . However,

this distribution is unrelated to the number of RNA molecules that are relevant to biological

6

functions. There is a broad range in N that represents noncoding RNAs. For example, the

length of human ncRNA functioning in gene silencing process is ∼100,000 nucleotides.23 From

Fig.1, which reflects the current status in RNA structure determination, it is clear that there

is a large gap between the total number of functional RNAs and those with known three

dimensional structures.

Size of RNA obeys the Flory law : If the overall shape of RNA is spherical then its

volume, an extensive variable, is V ≈ 4π3

R3G with RG being the radius of gyration. For accurate

computation of volumes one should use the hydrodynamic radius instead of RG. Because V ∼

a3N where a is a characteristic length (approximately the distance separating two consecutive

nucleotides) it follows that RG ∼ aN1/3. This general result was first derived by Flory who

showed that RG ∼ aNν where ν = 1/3 for maximally compact structures. Because RNA is

a polyelectrolyte its RG depends on the concentration of counterions (C). At low values of

C, RNA is expanded and the transition to a compact structure occurs only when C exceeds a

critical value.

We calculated RG, using Eq.1 (see Methods), for the 1155 “folded” RNA structures. A plot

of RG as a function of N confirms the Flory result. From the plot in Fig.2(a) we find that, for

the folded RNA structures, RG can be accurately calculated using

RG = aN1/3 (7)

where a = 5.5A. The prefactor, a = 5.5A, for the folded structures approximately corresponds

to the average distance (≈5.5A) between the phosphate groups along the backbone (Fig.2(b)).

Recent measurement of RG for the compact state of the 195 nucleotides Azoarcus ribozyme at

high concentration of Na+ or Mg2+ shows that RG ≈ 35A.9 From Eq.7 we find RG ≈ 32A.

This analysis further suggests that the prefactor in Eq.7 may indeed be interpreted as the mean

distance between consecutive phosphate groups in the folded structures. If the RG data in

Fig.2(a) for N < 20 is neglected we find that Eq.7 is obeyed with a ≈ 5A. Thus, the scaling

relation is robust.

It is perhaps more reasonable to view RNA structures as formed from relatively rigid

duplexes that are linked by flexible motifs such as bulges, loops, etc. In such a picture

the fraction of base-paired nucleotides can be chosen as a variable to describe the overall

size. We have shown previously (see Fig. 10 in24) that the number of base pairs in RNA is

∝ N. Thus, the Flory result would be valid even if one accounts for the rigidity of RNA duplexes.

7

Single-chain RNAs are aspherical and prolate : Even though folded RNA structures

are compact, as assessed by their size, there are substantial deviations from sphericity. Indeed,

the distribution P (∆) for single chain RNAs (Fig.3-(a)) has a broad peak around ∆ ≈ 0.3. This

shows that the native-state conformations of single chain RNA molecules deviate greatly from

a sphere. This finding is in stark contrast to P (∆) in single-chain protein structures where the

peak of the distribution is at ∆ < 0.1.25 In addition, only ∼15% of single-chain RNA structures

have ∆ < 0.2, while in proteins the corresponding number is ∼80%. This analysis shows that

even if native structures of RNAs are compact (RG = 5.5N1/3A) they are highly aspherical.

Because many RNAs are organized as oligomers, we also obtained the values of ∆ for such

structures. The distribution of ∆ for oligomeric RNAs is also very broad (Fig.3-(a) middle

panel). Approximately 34% of the 518 oligomeric RNAs have ∆ < 0.2 which shows that

oligomerization in RNA increases the sphericity of the molecule. This conclusion is substanti-

ated by analyzing the R∆25 which is the ratio between the degree of asphericity of the oligomer

and the average ashpericity of the individual chains. If R∆ = 1 then the oligmers and the chains

have the same asphericity while R∆ < 1 indicates that the oligomer is more spherical than its

components. Nearly ∼60% of oligomeric RNAs have R∆ ≤ 1.

The distribution of the shape parameter, S, in single-chain RNAs (Fig.3-(b) top panel)

shows that RNA is mostly prolate because most of the chains have S > 0. This tendency

towards prolate shapes is stronger than in proteins where ∼50% of single-chains are spherical

or nearly so.25 On the other hand, the complexes of RNA chains found in the PDB structures

exhibit a bias towards spherical structures as shown in the peak around S = 0 in Fig.3-(b)

bottom panel. It should be emphasized that there is no systematic dependence of ∆ or S on N.

A plot of ∆ and S on N shows no correlation whatsoever. The observed variations is directly

attributable to sequence and hence the topology of the folded structure.

Distribution function of radius of gyration can be described by WLC model: For

the database of RNA molecules, we calculated the distance distribution, P (r), using the coor-

dinates of the heavy atoms. The P (r) functions (Fig.4(a)) for a few RNA molecules, resemble

those obtained using SAXS experiments for compact RNA molecules. The value of the persis-

tence length is obtained by fitting P (r) to Eq.6 in the range RG < r < 2.5RG. As can be seen

from Fig.5 the value of lp varies between (5-25)A.

If the WLC model correctly describes the distance distribution function an important

prediction follows from Eq.6, namely, that by replacing r by the dimensionless variable

x = rlp/R2G all the P (r) curves must coincide for r/RG > 1. In other words, irrespective of the

8

size, sequence or the nature of interactions that stabilize the native topology, the tail of P (r)

(r > RG) should superimpose. Thus, P (r) should be a function of only lpr/R2G. This important

prediction is validated in Fig.4(b) in which a plot of P (x) with x = rlp/R2G shows that all the

structures follow the same functional form for x > 0.5 (see26 for the same analysis performed on

the end-to-end distance distribution of DNA). From this result we conclude that the distance

distribution function of RNA structures are well described by the WLC model. We do not have

any structural basis for this observation.

Persistence length increases with N: It is remarkable that P (r) for folded RNA is well

described by the WLC model which accounts only for the bending penalty of a thin elastic

material. The structural basis for this important finding is not clear. By fitting P (r) to Eq.6 for

r/RG > 1 we find that lp for folded structures increases with N . The finding that lp grows as

lp = 1.5Nα with α ≈ 1/3 can be rationalized using the arguments given below. A consequence

of the sublinear growth of lp with N is that the effective contour length for folded RNA must

also grow sublinearly with N , i.e., Leff = 3 ×(

5.52

1.5

)

N1/3 ≈ 60N1/3A. In the unfolded state we

expect the contour length L ∝ N . Interestingly, recent single molecular measurements have also

shown that lp for microtubules depends on the contour length.27

The increase in lp with N is related to the restriction that the folded states of biomolecules

be conformationally less dynamic than unfolded states. It is known from polymer physics that

if lp is fixed and there are no interactions that stabilize a specific structure then on large scales

(≫ lp) the structure would be intrinsically flexible. This would mean that spontaneous global

fluctuations of folded RNA would be highly likely due to increase in conformational entropy.

The requirement that biomolecules should adopt a near unique native fold which minimizes

entropy in the native basin of attraction (NBA), implies that lp itself should grow with N . In

contrast, for unfolded RNA, whose conformational entropy is greater than the structures in the

NBA, we expect that lp should be independent of N (see Appendix).

The persistence length lp, which determines the flexibility of RNA, depends on the concen-

tration, shape, and size of counterions. The balance of the effective energetics of interactions

(stacking interactions, hydrogen bonding, hydrophobic interaction, and repulsion between phos-

phate groups and tertiary interactions) renormalizes lp. Let us assume that the interactions are

approximated as pairwise additive and short-ranged ∆G ≈∑

|~ri−~rj |<RG∆Gij . In the presence

of these interactions the persistence length should scale as the range of the interactions i.e.,

lp ≈ RG ≈ N1/3. The non-local interactions, which stabilize the folded RNA structures, grow

with N and hence affect lp. In the absence of interactions that stabilize the three dimensional

9

fold lp is determined only by the intrinsic property of primary sequence and hence should not

depend on N (see Appendix).

We further rationalize the dependence of lp on N by noting that about 54% of all nucleotides

in folded RNA structures are involved in base pairing (see Fig. 10 in24). One possible way,

independent of N, of achieving the 54% base pairings is to distribute them over several short

duplexes that are stabilized by tertiary interactions in the native state. Because the tertiary

interactions in RNA are weaker than the base stackings (and other) interactions that stabilize

hairpin-like structures, creation of several short duplexes is not favorable. Alternatively, it is

free energetically more favorable to create a smaller number of longer stable rigid duplexes

that are stabilized by tertiary interactions to create a nearly spherical shape. This strategy

seems to operate as N increases as seen in ribosomes. As a consequence of the presence of large

number of rigid duplexes, which reflects the hierarchical nature of RNA assembly, lp increases

with N. In other words, in RNA there is clear separation in energy scales stabilizing secondary

and tertiary interactions. Such a hierarchy implies that stiffness itself must be dependent on N.

Because such clear separation in structural organization does not exist in proteins we expect

that lp in proteins must weaker dependence on N (Fig. 5). A similar reasoning has been give

to explain the growth of lp with N for microtubules.27

DISCUSSION

Differences in shapes and packing between proteins and RNA: It is difficult to com-

pare, in absolute terms, packing in proteins and RNA because the nature of interactions that

stabilize their native structures are distinct.28 Nevertheless, the Flory scaling (RG ∼ aN1/3) ob-

served in RNA and proteins shows that both are maximally compact. For a given N , the approx-

imate volume of RNA is larger than proteins. The ratio, VRNA/VPROT ≈ (aRNA/aPROT )3 ≈ 5.6

for a fixed N . This suggests that, in all likelihood, RNA is more loosely packed than proteins

− a conclusion that is in apparent contradiction with a recent structural analysis.13 Voss and

Gerstein based their conclusion on Voronoi construction to decipher volumes of RNA and spe-

cific volume calculations. They concluded that “based on well packed atoms” RNA is more

tightly packed than proteins.13 The inherent arbitrariness in assigning volumes to atoms based

on Voronoi tessellation of space and the use of mass in the definition of specific volume obscures

packing effects which should be based on sizes of nucleotides alone. The present computations

show that, based on volume fraction considerations, RNA is not as compact as proteins as long

as N (the number of nucleotides or the number of aminoacids) is fixed.

10

The observed differences between shapes of RNA and proteins are primarily due to the

nature of interactions that stabilize the folded structures of RNA and proteins. Tertiary

structure formation in RNA must be preceded by substantial neutralization of the negative

charges of phosphate groups. Condensation of counterions that are non-specifically bound

results in the residual charge on the phosphate group being less than ∼ −0.1e where e is

the charge of the electron. However, packing in the resulting tertiary fold is determined not

only by interactions involving nucleotides but also by correlations between counterions.29 The

condensation of a large number of counterions needed to neutralize the charges on the phosphate

groups results in spatial correlation between them. If the volume excluded by the counterions

is large (for example the volume of cobalt hexamine is greater than that of Mg2+) then binding

of one counterion prevents another one being spatially adjacent. These counterion-mediated

interactions and their correlation also inherently affect packing in RNA. In contrast, packing

in the core of proteins is predominantly determined by interactions between hydrophobic side

chains and their contacts with the protein backbone. Because of the absence of additional

ligands, except in certain cases like heme proteins, dense packing in proteins is easier to achieve.

Shape fluctuations of proteins and RNA in the ribosome: The analysis of shape

and flexibility of isolated proteins and RNA gives insight into packing in isolated biomolecules.

However, in a vast majority of cases, function requires interactions between two or more com-

ponents. A prime example is the ribosome, a ribonucleoprotein complex, that plays a central

role in protein synthesis.30,31,32,33 Complexes of both small and large subunits with various an-

tibiotics have revealed the mechanism of the ribosomal machinery for tRNA recognition and

protein synthesis.34,35,36,37 The remarkable three dimensional map of entire ribosome (70S) in-

cluding three tRNAs and mRNA that shows a snapshot of the translation process, has also

been resolved by cryo-EM techniques at 5.5A resolution.33 The binding interface between 30S

and 50S subunits, tRNA recognition site in 30S subunit, and peptidyl transferase site on 50S

subunit are all devoid of the ribosomal proteins. The cavity is formed at the interface between

two subunits where three tRNA and a string of mRNA can be accommodated. The structures of

∼ 50 ribosomal proteins have also been investigated, giving further insights into the interaction

and the assembly process of the ribosome.38,39

Comparison of the shapes of the structures in isolation and in the complex allows us to infer if

there are large scale shape changes upon complexation. To this end, we analyzed the individual

components of the ribosome as well as each structural domain by using the parameters that

quantify molecular sizes, shapes, and flexibilities of the individual components. We used the

11

atomic coordinates from 1GIX (30S subunit composed of 16S rRNA, 3 tRNA, 1 mRNA, and

20 r-proteins) and 1GIY (50S subunit composed of 23S rRNA, 5S rRNA and 22 r-proteins)

that form an entire ribosome complex upon combination.33 The parameters characterizing the

structural components of ribosome are summarized in Table.I.

r-RNAs : Each ribosomal RNA (16S, 23S rRNA) can be further decomposed into sev-

eral structural domains whose folding is autonomous even in the absence of ribosomal

proteins.40,41,42,43 The structural features of individual domains of rRNAs in Fig.6(a), 6(b) are

quantified in terms of RG, ∆, and S, with corresponding regions differently colored in the sec-

ondary structure map. Comparison of ∆ and S values of rRNA domains (Table.I) with P (∆)

and P (S) in Fig.3 shows that, except for the 3’m domain of 16S rRNA, the overall shapes of

rRNA domains are nearly-spherical and slightly prolate (0 < S < 0.25). Thus, no significant

difference between the overall shape is found in rRNAs domain in comparison to typical RNA

molecules. However, the deviations of RG from the scaling law (Eq.7), especially for the do-

mains of 23S rRNA, II, IV, V, VI, show that they are more extended in size than normal RNA

(Fig.7(a)). We find that the size of the domains in the 16S rRNA, 5’, C, 3’M obeys the scaling

law (Eq.7).

Because the shape of the fold from each domain is identical to the one assembled in the intact

ribosome, the assembly from extended domains must occur by a jigsaw puzzle type matching.

The head part of the 16S rRNA, which is crucial for A, P, E, tRNA binding sites is entirely

composed of the 3’M domain. The 5’ and C domains comprise the body and the platform

part, respectively (see33 for terminology). 3’m domain lies at the interface and interacts with

IV-domain of the 23S rRNA when the two subunits dock. After the rRNA domains and r-

proteins are assembled to form a functional subunit, 50S subunit is highly spherical (∆ = 0.05,

S = −0.01). In contrast, the 30S subunit is aspherical and prolate (∆ = 0.21, S = 0.14). The

acquisition of the spherical shape of the entire ribosome (∆ = 0.03, S = 0.01) must occur after

the folding of two subunits. Comparison of the shape of 30S, 50S, and 70S particles suggests

that there is very little alteration in their respective ∆ and S values upon complexation. This

observation suggests that these domains probably fold prior to assembly.

Despite their large sizes, the 50S and the 70S particles are considerably more spherical than

the majority of RNA molecules. The globular nature of the 50S particle and the 70S complex is

surprising given that the typical RNA complexes are aspherical. This asphericity, especially for

medium-sized RNA, is the result of coaxial stacking of helices found in the secondary structures.

The stacking leads to formation of long helices which are expected to be rigid with large values

of lp. The 30S subunit, which is highly aspherical and prolate, fits well with this expectation.

12

Noller has pointed out that the ribosome is made up of mostly small helices linked by flexible

bulges and loops.44 This observation applies to the 50S subunit (Fig.6(b)). However, large-sized

coaxial stackings are dominant in the 16S rRNA, but not in the 23S rRNA. As a result, the 30S

subunit is highly aspherical. The 70S complex is highly spherical. The globularity of the 70S

arises because the 30S subunit fits precisely (despite its high ∆ and S values (see Table.I)) at

the interface with the 50S to create a nearly perfect sphere.

r-proteins : Similar quantitative analysis can be performed on the ribosomal proteins. The

values of RG in some r-proteins deviate from the scaling law and the shape is generally more

biased to the prolate shape than in the non-ribosomal proteins (Fig.7). Ribosomal proteins

are mostly distributed on the back of the interface and the periphery of rRNAs with some of

proteins being anchored deep into the crevices of rRNA. The anchoring is accomplished using

the long tail of peptide chain composed of positively charged amino acids (ARG, LYS, HIS).38,39

The unusual topology of r-proteins prompted us to investigate whether or not the r-proteins

maintain their shape in isolation. We compared the structure of 16 r-proteins complexed in the

ribosome ribosome with the isolated r-protein structures independently determined by X-ray or

NMR available in PDB. The structural deviation between the isolated and ribosome-complexed

r-proteins is quantified using root mean square deviation (RMSD). The structured domains,

like α-helix and β-sheet, are well matched in the isolated protein and in the complex, but the

structural deviation is large in the loop and the tail regions of the structure. The structure

comparison suggests that the ordered part of the r-protein is at least well conserved in both

situations. The disordered tail part is stabilized upon complex formation inside the crevices of

rRNA.38,39

CONCLUSIONS

In this paper we have shown, by analyzing the available RNA structures, that RG can be

accurately computed using the celebrated Flory law. In contrast to proteins, RNA molecules

are considerably more aspherical with the overall shape being prolate. The prolate nature of

RNA shapes suggests that their diffusion is intrinsically anisotropic. For a given value of N (the

number of nucleotides or amino acids) the persistence length of RNA is considerably larger than

proteins. These findings suggest that typically RNA is not nearly as densely packed as proteins

even though both are compact in the folded states.

The structural basis for the success of WLC model in quantitatively fitting the distance

distribution curves for proteins and RNA is not clear. It has been appreciated for a long

13

time that elasticity-based models are appropriate for ds-DNA in monovalent counterions. The

present findings that P (r) (for r/RG > 1) for compact RNA and proteins can be described using

polymer models that accounts only for bending energies is surprising. Our work shows that lp,

which is needed to describe interaction between biomolecules, can be accurately obtained using

the experimentally measurable P (r). The fit of P (r) to WLC also shows that lp increases with

N . Such an unusual behavior is, perhaps, related to the need to minimize entropic fluctuations

in the native state. Suppression of conformational fluctuations in long RNA can achieved by

having a small number of long rigid helices that are stabilized by weak tertiary interactions.

Despite the success of the polymer-based analysis of RNA structures of varying complexity the

microscopic basis for characterizing for folded biomolecules using WLC model remains to be

established.

APPENDIX

The observation that the persistence length of RNA in the compact folded states increases as

lp ≈ a1N0.3 with a1 ≈ 1.5A was rationalized in terms of the restricted conformational fluctuations

in the native state. A corollary of this interpretation is that lp should become independent of N

(or the sequence) if RNA is in the unfolded state. In this appendix, we adopt an oversimplified

model for the unfolded state of RNA to explicitly show that at large (N > 40) lp indeed does

not depend on N .

The absence of persistent tertiary structure allows us to describe the polynucleotide chain as a

worm-like chain model. Such a coarse-grained description may be an approximate representation

of a single stranded chain made up of one nucleotide (for example polyA). To verify how lp

changes as N increases we have performed simulations using WLC which takes into account

only the excluded volume interactions between the beads representing the nucleotides. The

energy function is

H =N−1∑

i=1

kb

2(ri,i+1 − a)2 +

N−2∑

i=1

ka(1 − ri,i+1 · ri+1,i+2) +N−2∑

i=1

N∑

j=i+2

ke

2(ri.j − a)2Θ(a − ri,j) (8)

where ri,j, ri,j are distance and unit vector between i and j beads, respectively. The first term

restricts the extension (or reduction) of bond length around a with kb = 2000ǫ/a2 where ǫ is the

unit of energy. The second term is the bond angle potential that prohibits significant deviation

from the equilibrium value. We assign ka = 10ǫ. The last term with ke = 2000ǫ/a2 takes into

account volume exclusion interaction. By construction, the homopolymer WLC cannot form

any preferred low energy compact structures.

14

For this model, whose energy function is given by Eq.8, we obtained the end-to-end distance

(RE) distribution function using Monte Carlo simulations. Using the energy function in Eq.8, we

generated a large number of equilibrium conformations of the WLC model by employing the pivot

algorithm.45 Unlike a standard Monte Carlo methods that generates polymer conformations by

moving each monomer the pivot algorithm produces a global change in the configuration by

pivoting the chain around the randomly selected monomer position at each iteration. The

algorithm enhances the sampling rate of the available conformational space. The acceptance is

judged by Metropolis criterion.

From the ensemble of conformations generated using the pivot algorithm we obtained the

end-to-end distribution function P (RE). The simulated distribution function P (RE) can be fit

using Eq.5 from which we obtain lp. The dependence of lp on N for the WLC, without the

possibility of forming ordered structures, shows (Fig.8) that lp becomes independent of N when

N > 40. The rise in lp for N < 40 is due to the domination of the bending energy (second term

in Eq.8). For larger values of N the entropic contributions can compensate for the bending

energy and lp saturates to its intrinsic value. Thus, for WLC with excluded volume interactions

the bending penalty dominates at small N values and the chain is intrinsically flexible when N

is very large. This situation is in stark contrast with folded RNA (or proteins) where lp grows

with N . The increase of lp as N increases, which is due to interactions that stabilize RNA,

is required to suppress conformational fluctuations when biomolecules reach the functionally

competent state. Similar findings are well known for polypeptides such as polyPro, polyGly,

etc.46

ACKNOWLEDGMENT

This work was supported in part by a grant from the National Science Foundation through

NSF CHE-05-14056.

1 Richards, F. M. and Lim, W. A. An anlysis of packing in the protein-folding problem. Q. Rev.

Biophys. 26, 423–498 (1994).

2 Richards, F. M. The interpretation of protein structures: Total volume, group volume distributions

and packing density. J. Mol. Biol. 82, 1–14 (1974).

15

3 Finney, J. L. Volume occupation, environment and accessibility in proteins. The problem of the

protein surface. J. Mol. Biol. 96, 721–732 (1975).

4 Liang, J. and Dill, K. A. Are proteins well-packed? Biophys. J. 81, 751–766 (2001).

5 Zuker, M., Mathews, D., and Turner, D. Algorithms and Thermodynamics for RNA Secondary

Structure Prediction: A Practical Guide in RNA Biochemistry and Biotechnology. Kluwer Academic

Publishers, (1999).

6 Hofacker, I. V. Vienna RNA secondary structure server. Nucl. Acids. Res. 31(13), 3429–3431 (2003).

7 Gutell, R. R., Lee, J. C., and Cannone, J. J. The accuracy of ribosomal RNA comparative structure

models. Curr. Opin. Struct. Biol. 12, 301–310 (2002).

8 Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N.,

and Bourne, P. E. The Protein Data Bank. Nucleic Acids Research 28, 235–242 (2000).

9 Caliskan, G., Hyeon, C., Perez-Salas, U., Briber, R. M., Woodson, S. A., and Thirumalai, D.

Persistence Length Changes Dramatically as RNA Folds. Phys. Rev. Lett. 95, 268303 (2005).

10 Bustamante, C., Marko, J. F., Siggia, E. D., and Smith, S. Entropic elasticity of λ-phase DNA.

Science 265(5178), 1599–1600 (1994).

11 Aronovitz, J. A. and Nelson, D. R. Universal features of polymer shapes. J. Phys. (Paris) 47,

1445–1456 (1986).

12 Honeycutt, J. D. and Thirumalai, D. Static properties of polymer chains in porous media. J. Chem.

Phys. 90, 4542–4559 (1989).

13 Voss, N. R. and Gerstein, M. Calculation of standard atomic volumes for RNA and comparison

with proteins: RNA is packed more tightly. J. Mol. Biol. 346, 477–492 (2005).

14 Smith, S. B., Finzi, L., and Bustamante, C. Direct Mechanical Measurements of the Elasticity of

Single DNA Molecules by Using Magnetic Beads. Science 258, 1122–1126 (1992).

15 Tinland, B., Pluen, A., Sturm, J., and Weill, G. Persistence Length of Single-Stranded DNA.

Macromolecules 30, 5763 (1997).

16 Liphardt, J., Onoa, B., Smith, S. B., Tinoco, Jr., I., and Bustamante, C. Reversible unfolding of

single RNA molecules by mechanical force. Science 292, 733–737 (2001).

17 Thirumalai, D. and Ha, B.-Y. Theoretical and Mathematical Models in Polymer Research, chapter I.

Statistical Mechanics of Semiflexible Chains: A Mean Field Variational Approach, 1–35. Academic

Press, San Diego (1998). edited by A. Grosberg.

16

18 Hyeon, C. and Thirumalai, D. Kinetics of interior loop formation in semiflexible chains. J. Chem.

Phys. 124, 104905 (2006).

19 Wilhelm, J. and Frey, E. Radial Distribution Function of Semiflexible Polymers. Phys. Rev. Lett.

77(12), 2581–2584 (1996).

20 Fang, X., Littrell, K., Yang, X., Henderson, S. J., Siefert, S., Thiyagarajan, P., Pan, T., and Sosnick,

T. R. Mg2+-Dependent Compaction and Folding of Yeast tRNAPhe and the Catalytic Domain of

the B. subtilis RNase P RNA Determined by Small-Angle X-ray Scattering. Biochemistry 39,

11107–11113 (2000).

21 Russell, R., Millett, I. S., Doniach, S., and Herschlag, D. Small angle X-ray scattering reveals a

compact intermediate in RNA folding. Nature Struct. Biol. 7(5), 367–370 (2000).

22 Russell, R., Millett, I. S., Tate, M. W., Kwok, L. W., Nakatani, B., Gruner, S. M., Mochrie, S. G. J.,

Pande, V., Doniach, S., Herschlag, D., and Pollack, L. Rapid compaction during RNA folding. Proc.

Natl. Acad. Sci. 99(7), 4266–4271 (2002).

23 Sleutels, F., Zwart, R., and Barlow, D. P. The non-codeing Air RNA is required for silencing

autosomal imprinted genes. Nature 415, 810–813 (2002).

24 Dima, R. I., Hyeon, C., and Thirumalai, D. Extracting stacking interaction parameters for RNA

from the data set of native structures. J. Mol. Biol. 347, 53–69 (2005).

25 Dima, R. I. and Thirumalai, D. Asymmetry in the shapes of folded and denatured states of proteins.

J. Phys. Chem. B 108, 6564–6570 (2004).

26 Valle, F., Favre, M., de Los Rios, P., Rosa, A., and Dietler, G. Scaling exponents and probability

distribution of dna end-to-end distance. Phys. Rev. Lett. 95, 158105 (2005).

27 Pampaloni, F., Lattanzi, G., Jons, A., Surrey, T., Frey, E., and Florin, E. L. Thermal fluctuations of

grafted microtubules provide evidence of a length-dependent persistence length. Proc. Natl. Acad.

Sci. 103, 10248–10253 (2006).

28 Thirumalai, D. and Hyeon, C. RNA and Protein folding: Common Themes and Variations. Bio-

chemistry 44(13), 4957–4970 (2005).

29 Koculi, E., Lee, N., Thirumalai, D., and Woodson, S. A. Folding of the Tetrahymena Ribozyme by

Polyamines: Importance of Counterion Valence and Size. J. Mol. Biol. 341(1), 27–36 (2004).

30 Wimberly, B. T., Brodersen, D. E., Clemons Jr., W. M., Morgan-Warren, R. J., Carter, A. P.,

Vonrhein, C., Hartsch, T., and Ramakrishnan, V. Structure of the 30S ribosomal subunit. Nature

17

407, 327–339 (2000).

31 Ban, N., Nissen, P., Hansen, J., Moore, P. B., and Steitz, T. A. The complete atomic structure of

the large ribosomal subunit at 2.4 A resolution. Science 289, 905–920 (2000).

32 Harms, J., Schluenzen, F., Zarivach, R., Bashan, A., Gat, S., Agmon, I., Bartels, H., Franceschi,

F., and Yonath, A. High resolution structure of the large ribosomal subunit from a mesophilic

eubacterium. Cell 107, 679–688 (2001).

33 Yusupov, M. M., Yusupova, G. Z., Baucom, A., Lieberman, K., Earnest, T. N., Cate, J. H. D., and

Noller, H. F. Crystal structure of the ribosome at 5.5 A resolution. Science 292, 883–896 (2001).

34 Brodersen, D. E., Clemons Jr., W. M., Carter, A. P., Morgan-Warren, R., Wimberly, B. T., and

Ramakrishnan, V. The structural basis for the action of the antibiotics tetracycline, pactamycin,

and hygromycin B on the 30S ribosomal subunit. Cell 103, 1143 (2000).

35 Carter, A. P., Clemons Jr., W. M., Brodersen, D. E., Morgan-Warren, R. J., Wimberly, B. T.,

and Ramakrishnan, V. Functional insights from the structure of the 30S ribosomal subunit and its

interactions with antibiotics. Nature 407, 340–348 (2000).

36 Schluenzen, F., Zarivach, R., Harms, J., Bashan, A., Tocilj, A., Albrecht, R., Yonath, A., and

Franceshi, F. Structural basis for the interaction of antibiotics with the peptidyl transferase center

in eubacteria. Nature 413, 814 (2001).

37 Hansen, J. L., Ippolito, J. A., Ban, N., Nissen, P., Moore, P. B., and Steitz, T. The structures of

four macrolide antibiotics bound to the large ribosomal subunit. Mol. Cell 10, 117 (2002).

38 Brodersen, D. E., Clemones Jr., W. M., Carter, A. P., Wimberly, B. T., and Ramakrishnan, V.

Crystal structure of the 30 S ribosomal subunit from Thermus thermophilus: Structure of the

proteins and their interactions with 16 S RNA. J. Mol. Biol. 316, 725–768 (2002).

39 Klein, D. J., Moore, P. B., and Steitz, T. A. The roles of ribosomal proteins in the structure

assembly, and evolution of the large ribosomal subunit. J. Mol. Biol. 340, 141–177 (2004).

40 Egeberg, J., Leffers, H., Christensen, A., Andersen, H., and Garrett, R. A. Structure and accessi-

bility of domain I of Escherichia coli 23 S RNA in free RNA, in the L24-RNA complex and in 50 S

subunits. J. Mol. Biol. 196, 125–136 (1987).

41 Leffers, H., Egebjerg, J., Andersen, A., Christensen, T., and Garrett, R. A. Domain VI of Es-

cherichia coli 23 S ribosomal RNA Structure, assembly and function. J. Mol. Biol. 204, 507–522

(1988).

18

42 Moore, P. B. and Steitz, T. A. The structural basis of large ribosomal subunit function. Ann. Rev.

Biochem 72, 813–850 (2003).

43 Adilakshmi, T., Ramaswamy, P., and Woodson, S. A. Protein-independent Folding pathway of the

16 S rRNA 5’ Domain. J. Mol. Biol. 351, 508–519 (2005).

44 Noller, H. F. RNA Structure: Reading the Ribosome. Science 309, 1508–1514 (2005).

45 Bishop, M., Clarke, J. H. R., Rey, A., and Freire, J. J. Investigation of the end-to-end vector

distribution function for linear polymer in different regimes. J. Chem. Phys. 95, 4589–4592 (1991).

46 Cantor, C. R. and Schimmel, P. R. Biophysical Chemistry. W. H. Freeman and Company, San

Francisco, (1979).

47 Buscaglia, M., Lapidus, L. J., Eaton, W. A., and Hofrichter, J. Effects of Denaturants on the

Dynamics of Loop Formation in Polypeptides. Biophys. J. 91, 276–288 (2006).

48 Sherman, E. and Haran, G. Coil-globule transition in the denatured state of a small protein. Proc.

Natl. Acad. Sci. 103, 11539–11543 (2006).

49 Pettersen, E. F., Goddard, T. D., Huang, C. C., Couch, G. S., Greenblatt, D. M., Meng, E. C., and

Ferrin, T. E. UCSF Chimera - A Visualization System for Exploratory Research and Analysis. J.

Comput. Chem. 25, 1605–1612 (2004).

19

TABLE I: Structural features of the ribosome.

Na

RG[A]b ∆cS lp[A]d RMSDe[A] N RG[A] ∆ S lp[A] RMSD[A]

70S 9662 86.2 0.03 0.01 27.1 -

30S 3915 66.3 0.21 0.14 23.1 - 50S 5747 74.3 0.05 -0.01 23.8 -

16S 1519 65.1 0.28 0.21 22.3 - 23S 2889 66.4 0.02 -0.01 23.6 -

5’ 542 42.8 0.21 0.14 14.9 - I 557 46.1 0.23 0.20 16.5 -

C 352 39.8 0.28 0.25 14.4 - II 736 56.7 0.08 0.00 18.0 -

r-RNA 3’M 484 39.3 0.07 0.01 13.4 - III 378 35.4 0.18 0.12 12.3 -

3’m 141 45.1 0.66 1.06 -f - IV 343 44.7 0.25 0.25 15.6 -

- V 600 56.3 0.26 0.22 19.4 -

- VI 275 41.1 0.22 -0.08 13.2 -

5S 123 32.5 0.45 0.59 10.6 -

S2 234 19.3 0.24 0.23 6.4 - L1 224 18.0 0.15 0.09 6.1 5.79

S3 206 18.3 0.13 0.08 6.1 - L2 173 19.1 0.21 0.12 6.0 -

S4 208 17.6 0.15 0.04 5.9 - L3 191 22.7 0.29 0.28 7.1g -

S5 150 16.9 0.28 0.29 5.6 1.19 L4 189 25.7 0.60 0.91 7.2 2.90

S6 101 14.2 0.10 -0.03 4.5 1.17 L5 122 16.9 0.16 0.10 5.7 -

S7 155 17.5 0.24 0.19 5.6 0.93 L6 164 19.2 0.39 0.45 5.8 2.06

S8 138 15.5 0.11 0.00 5.0 - L7 128 18.1 0.43 0.53 5.7 0.62

S9 127 18.2 0.41 0.50 21.8 - L9 148 25.9 0.71 1.18 - 2.19

r-protein S10 98 18.6 0.67 1.08 23.1 - L11 133 16.7 0.30 0.32 5.4 2.00

S11 119 15.0 0.22 0.14 4.7 - L12 128 18.1 0.42 0.52 5.6 0.75

S12 124 19.9 0.33 0.38 23.7 - L13 117 14.2 0.11 0.02 4.6 -

S13 125 22.3 0.46 0.56 7.1 - L14 122 13.4 0.09 0.02 4.4 -

S14 60 13.8 0.44 0.56 3.9 - L15 84 13.8 0.23 0.21 4.2 -

S15 88 13.6 0.19 0.08 4.3 3.89 L16 138 17.4 0.43 0.56 5.6 17.7

S16 83 12.3 0.13 -0.06 3.8 1.88 L18 113 13.5 0.10 -0.01 4.3 1.84

S17 104 15.5 0.11 0.02 4.9 4.82 L19 52 10.5 0.09 -0.04 3.0 -

S18 73 12.0 0.08 -0.04 3.6 - L22 110 16.6 0.51 0.71 5.3 -

S19 80 12.7 0.08 -0.02 3.9 3.16 L23 76 11.5 0.08 0.00 3.5 4.27

S20 99 17.4 0.64 1.02 5.4 - L24 110 15.3 0.08 -0.04 4.9 -

THX 24 7.2 0.26 0.23 - - L25 89 12.3 0.05 -0.01 3.9 1.41

L29 64 12.8 0.43 0.54 - -

L30 60 10.4 0.18 0.09 3.0 -

aN is the number of nucleotides or aminoacids.bThe radius of gyration RG is calculated using Eq.1.cThe shape parameters ∆ and S are computed using Eq.3, 4.dlp is the persistence length.eThe root mean square deviation is the extent of structural deviation of the ribosomal proteins in the complex

and in isolation.fPersistence length is not reported if the correlation coefficient of nonlinear fitting is less than 0.85.gUnusually large values of the parameters (∆, S > 0.6, and lp > 7.0A) are given in bold.

20

FIGURE CAPTIONS

Figure 1: Distribution of RNA structures in the Protein Data Bank (PDB) as a function of

chain length, N. The arrows show the N values for 16S and 23S ribosomal RNAs, respectively.

The inset shows the same plot for protein structures.

Figure 2: (a) Radius of gyration as a function of N . The straight line is a fit to the data that

shows the scaling law RG = 5.5N0.33A. The correlation coefficient if 0.94. If data for N > 300 are

neglected we found RG = 5.6N0.33 with a correlation coefficient of 0.92 (fit in green). Data points

inside the circle, which deviate significantly from the scaling law, correspond to the structures

that are similar to ds-DNA (PDB code: 1H1K). We excluded these structure from the fitting

procedure. For comparison the plot of RG as a function of N for 13704 monomeric proteins are

shown in the inset. The linear line corresponds to RG = 3.1N0.31A with a correlation coefficient

of 0.89. (b) Distance distribution of neighboring phosphor atoms along the RNA backbone. The

distance, RP−P corresponds to separation of the backbone P atoms between ith and (i + 1)th

nucleotide where i = 1, 2, . . . , (N − 1).

Figure 3: (a) Distribution of the asphericity parameter ∆ for RNA. The top panel corre-

sponds to single chain, the middle represents single chain in a complex, and the bottom panel

is for the complex. Large deviation from sphericity is found in RNA. (b) Distribution of shape

parameters for RNA. The legend for the three panel is the same as in (a). RNA molecules in

general are aspherical and prolate like an American football.

Figure 4: (a) The distance distribution P (r) as a function of r for selected proteins and RNA.

We calculated P (r) using the coordinates of the folded structures. The legend at the bottom

gives the PDB codes for which P (r)s are shown. (b) Dependence of P (r) on the dimensionless

variable x = rlp/R2G. If RNA and proteins can be modeled as WLC then it follows that, for

x > 1, P (x) should fall on a single line (See Eq.6) independent of the fold. The tails of P (x)

for P (r) in (a) practically collapse onto a single curve. The log P (r)/β distributions between

dash lines are plotted as a function of − 11−x2 , which show a nice overlap with the condition,

log [P (x)/β]−1/(1−x2)

∼ 1, being satisfied.

Figure 5: Dependence of lp on the chain length for RNA and proteins. The persistence

length lp was computed by fitting P (r) to Eq.6. The lines correspond to lp = 1.47N0.33A (RNA)

and lp = 1.00N0.33A (proteins). There is greater dispersion in the data for proteins than for

RNA. Indeed, the correlation coefficient in the fit for RNA is 0.98 whereas for proteins it is only

0.79. Nevertheless, the lp values for proteins are in the range inferred from experiments for both

peptides and proteinsI.47,48

Figure 6: (a) Structural domains of the 16S rRNA. The corresponding secondary structure at

21

the center is in the same color. View from interface (left) and back (right) of 16S rRNA assembled

by these structural domains (b) Structural domains of the 23S rRNA. The organization of the

figure is identical to that of 16S rRNA in (a). The coaxial stackings, are specified as dark lines

on the secondary structures. Molecular graphics images were produced using XRNA and UCSF

Chimera package.49

Figure 7: (a) Radii of gyration (RG) of the structural domains in 16S (filled circle) and 23S

(empty diamond) rRNAs are plotted as a function of N . Red line representing RG = 5.5N0.33

is drawn to show the deviation of rRNA domain from the statistics found in usual RNAs. (b)

Plot of RG against N for ribosomal proteins. Red line represents RG = 3.1N1/3 scaling law

found in “normal” globular proteins. Ribosomal proteins (L3, L4, L9, S10, S12, S13, S20) that

show a large deviation from the scaling law are explicitly indicated. When the tail part of these

proteins are removed, RG for the r-proteins obey the Flory scaling law (see open red circles).

Figure 8: Persistence length lp as a function of N for a WLC model described in the

Appendix. This model may represent a homopolymeric nucleotide at low salt concentrations.

The value of lp is obtained by fitting the end-to-end distribution functions P (RE) that were

generated by Monte Carlo simulations (see Appendix). An example of P (RE) as a function of

RE/L for N = 30 is shown in the inset. The dependence of lp in N shows that, for large N , lp

is a constant for a homopolymer chain at low ionic concentration.

22

10 100 1000N

0

100

200

300

400

500

600#

(RN

A)

1000 2000N

0

2000

4000

6000

8000

10000

# (p

rote

in)

FIG. 1:

23

(a)

2 3 4 5 6 7 8 9 10

RP-P [Å]

0

2

4

6

8

10

P(R

P-P

), %

(b)

FIG. 2:

24

FIG. 3:

25

FIG. 4:

26

FIG. 5:

27

(a)

(b)

28

100 1000

N

100

RG [Å

]

3’m

IIV

IVVI

III

I5’

3’MC

(a)

10 100

N

10

RG [Å

]

L9 L4S13

L3S12

S10

S20

(b)

FIG. 7:

0 0.2 0.4 0.6 0.8 1RE/L

0

1

2

3

4

5

P(R

E)

0 20 40 60 80 100N

5

10

15

l p [Å

]

N=30

FIG. 8: