The genetic code as a periodic table


J. Mol. Evol. 11, 211-224 (1978) Journal of Molecular Evolution © by Springer-Verlag 1978

The Genetic Code as a Periodic Table

John R. Jungck

Department of Biology, Clarkson College, Potsdam, NY 13676, USA

Summary. The contemporary genetic code is reflective of a significant correlation between the properties of amino acids and their anticodons in a periodic manner. Almost all properties of amino acids showed a greater correlation to anticodonic than to codonic dinucleoside monophosphate properties. The polarity and bulkiness of amino acid side chains can be used to predict the anticodon with considerable confidence. The results are most consistent with predictions of the "direct interaction" and "ambiguity reduction" hypotheses for the origin of the genetic code.

Key words: Genetic Code - Origin - Amino acid - Oligonucleotide - Correlation - Anticodons.

Introduction

The search for a physical basis of genetic coding dates back at least to the time when it was initially suggested that nucleotide sequences code for amino acid sequences. In one of the earliest models, Gamow (1954) suggested that amino acids literally fit into the rhombic cleft between three contiguous base pairs in a DNA double helix; this model had the elegant property of being able to code for a maximum of twenty amino acids. However, since the deciphering of the genetic code, mechanistic models for the evolutionary basis of coding have primarily fallen into two classes. The first model supposes that there was a direct interaction between the codon and the respective amino acid (Pelc and Welton, 1966; Woese et al., 1967; Lacey and Pruitt, 1969). The second model supposes that there was an interaction, either directly or in a common environment, between an amino acid and its anticodonic oligonucleotide (Dunnill, 1966; Ralph, 1968; Nagyvary and Fendler, 1974; Lacey and Weber, 1976). Others (Rendell et al., 1971) did not commit themselves either way.

The preceding paper by Weber and Lacey (1978) presents data on all twenty proteinaceous amino acids, all four ribonucleotides, and all sixteen dinucleoside monophosphates. Their report is the first to present a series of measurements for each of these compounds in a single system. Their association of properties between the homocodonic amino acids (lys, gly, phe, and pro each has one homogeneous codon) and their anticodon mononucleotides is quite straightforward and similar to that suggested by Nagyvary and Fendler (1974). However, the data for all twenty amino acids correlated with their dinucleotide anticodon equivalents are less straightforward.

The purpose of this paper is to:

1) evaluate statistically the relationship between all twenty amino acids and their anticodon dinucleoside monophosphates based on the Weber and Lacey (1978) data, and so, concomitantly, the anticodon-based model;

2) demonstrate that the association between amino acids and their anticodon dinucleoside monophosphates is not limited to "hydrophilicities" in one solvent system, but instead is related to other general properties of each series measured independently (in fact, fifteen characteristics of each of the twenty amino acids and three characteristics of each of the sixteen anticodon dinucleoside monophosphates will be examined for their respective 90 intercorrelations);

3) show that the relationships between amino acids and their anticodon dinucleoside monophosphates correlate well with the organization (periodicity) of the genetic code as commonly presented.

Also, the correlations presented here allow a difficulty in this subject, which was not directly attacked, to be addressed. Woese et al. (1967) have stated: "Regarding those constraints on codons assigned to 'related' amino acids we can make less definite statements. The reason is precisely because we have no adequate definition of what 'related amino acids' are - a definition which is relative, is context dependent, in any case." By analyzing fifteen properties of each amino acid, a related set of amino acids will be defined.

Rationale

The genetic code is well known to be systematically degenerate (most codons with identical 5'-dinucleotides code for the same or closely related amino acids), considerably resistant to mutations (especially transitions, less so transversions), symmetrically arranged (Karasev, 1976; Versteeg and Vliegenthart, 1965), and assumed to be universal. Many experimenters attempting to elucidate the origin of the genetic code have tried to show that the physical parameters of amino acids and nucleotides limit the stochastic possibilities for genetic codes. While this may be true, I have not seen an explicit evaluation of how many codes are possible. Some underestimates can be calculated if concrete assumptions are made. We (Bertman and Jungck, 1978) have used combinatorics to reach estimates that at least 10^71 to 10^84 different genetic codes like our contemporary code with 64 codons and 21 coded entities are possible. It seems implausible that natural selection could have operated on all of these conceivable codes. Therefore, is there any discernible reason for natural selection of this particular code? While many papers have been addressed to this problem, none of them has included and analyzed data on all sixteen dinucleotides and all twenty amino acids. It is the purpose of this paper to scrutinize the properties of these 36 compounds for regularities amongst coded pairs. If there are any regularities at all, then these regularities, even if only rough ones, will constitute clues to a physical foundation of the genetic code. On the other hand, if the origin of the genetic code were a purely arbitrary fixation, then there may be no systematic order to be found (Woese, 1974).
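To give a feel for the size of such numbers, the following sketch counts code tables under one simple assumption: a "code" is any assignment of the 64 codons to 21 coded entities in which every entity is used at least once. This is only an illustration and is not claimed to be the calculation of Bertman and Jungck (1978); the function name `surjections` is hypothetical. For scale, the unrestricted count 21^64 is about 4 x 10^84.

```python
from math import comb


def surjections(n_codons: int, n_entities: int) -> int:
    """Count assignments of n_codons to n_entities that use every entity.

    Uses inclusion-exclusion over the set of entities left unused.
    """
    return sum(
        (-1) ** k * comb(n_entities, k) * (n_entities - k) ** n_codons
        for k in range(n_entities + 1)
    )


# Illustrative assumption: 64 codons, 21 coded entities
# (20 amino acids plus termination).
all_assignments = 21 ** 64           # every codon assigned freely
onto_assignments = surjections(64, 21)  # every entity used at least once

print(f"{all_assignments:.3e}")
print(f"{onto_assignments:.3e}")
```

Different constraints (for example, requiring the observed degeneracy pattern) would give different counts, which is why the text quotes a range rather than a single figure.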


Correlations Between Amino Acids and Their Anticodon Dinucleotides

Besides the relative "hydrophilicity" data on the twenty proteinaceous amino acids reported by Weber and Lacey (1978), a variety of other properties are easily available from the literature. Thirteen other sets of data available on the physical and chemical characteristics of amino acids are indexed in Table 1 and individual values are listed in Table 2. In addition, the compositional frequencies of amino acids in 69 distinctly heterologous, evolutionarily diverse protein sequences (Jungck, 1971, unpublished) were also collected and are listed.

Besides the R_F's (relative hydrophilicities) reported by Weber and Lacey (1978) for each of the sixteen dinucleoside monophosphates, Barzilay et al. (1973), using a slightly different solvent, reported paper chromatography R_F's for all sixteen dinucleotides. Furthermore, if we make use of the contention that dinucleotide values can be approximately predicted (correlation > 0.95) from the mononucleotide data, then it would be intriguing to do so from Garel et al.'s (1973) data because they reported the partition coefficients of seventeen amino acids (see Tables 1 and 2) as well as many mononucleotides in a single solvent system. This inclusion is particularly important because Garel et al. (1973) used oil-water partitions while Weber and Lacey (1978) employed an intensely hydrophilic mobile phase in their chromatography experiments. Therefore, we would expect the correlations between amino acids and dinucleotides in these two experimental systems to be opposite in sign. These three solvent systems are listed in Table 1. Values of properties for the individual sixteen dinucleotides are listed in Table 2. For the three amino acids which have two doublet codes, properties of anticodon dinucleotides were averaged on a weighted basis.

Pearson product moment correlation coefficients (R_p's)

$$R_p = \frac{\sum XY - \left[\left(\sum X \sum Y\right)/N\right]}{\sqrt{\left[\sum X^2 - \left(\sum X\right)^2/N\right]\left[\sum Y^2 - \left(\sum Y\right)^2/N\right]}} \qquad (1)$$

were generated for the interrelationships between the fifteen properties of amino acids. These R_p's are tabulated in the matrix shown in Table 3. Zimmerman et al.'s (1968) polarities of the amino acids are not significantly correlated to eleven other amino acid properties (of the other fourteen properties, Woese et al.'s polarity, Grantham's polarity, and Levitt's hydrophobicity were the only exceptions); therefore, these polarities are distinct enough from other physical properties to be used as quite independent predictors. Similarly, bulkiness is only significantly correlated (p < 0.001) to specific volume, Jones' hydrophobicity, and Bull and Breese's hydrophobicity. It is not significantly correlated to, e.g., Zimmerman et al.'s polarities and, therefore, is also quite a good candidate as an independent predictor.
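The computational form of the Pearson coefficient in equation (1) can be sketched in a few lines. The function name `pearson_r` and the sample numbers are illustrative assumptions, not data from the tables.

```python
import math


def pearson_r(x, y):
    """Pearson product moment correlation, computational form of Eq. (1)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    numerator = sum_xy - (sum_x * sum_y) / n
    denominator = math.sqrt(
        (sum_xx - sum_x ** 2 / n) * (sum_yy - sum_y ** 2 / n)
    )
    return numerator / denominator


# Hypothetical paired measurements, for illustration only.
example_r = pearson_r([1, 2, 3, 4], [1, 3, 2, 4])
print(example_r)
```

In the analysis described here, X would be one amino acid property over the twenty amino acids and Y either another amino acid property or the corresponding dinucleoside monophosphate property.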

Table 4 presents the Pearson product moment correlation coefficients for the interrelationships between each of the fifteen amino acid properties and both the codonic and anticodonic series of dinucleoside monophosphate properties measured in three different systems.

In forty out of forty-five pairs, the absolute value of the amino acid-anticodonic dinucleoside monophosphate correlation was greater than that of the amino acid-codon correlation.

The Weber-Lacey (1978) values for the dinucleoside monophosphates paired with all fifteen amino acid properties consistently showed this anticodonic preference in

Table 1. Index of amino acid and dinucleoside properties

Property | Determination procedure | Reference

A. Amino acids

Molecular weight | Handbook value | Numerous sources
Molecular volume | Residue volume (Å³) minus the constant peptide volume (see Goldsack and Chalifoux, 1973, for unadjusted data) | Grantham, 1974
Refractivity | Refractivity scale based on glycine as the zero point (McMeekin et al., 1964, published data for 17 a.a.) | Jones, 1975
Alpha pK1 | Titration (combined Cohn and Edsall, 1943; Edsall and Wyman, 1958, values) | Zimmerman et al., 1968
Bulkiness | Ratio of the side chain volume (Waugh, 1959) to length (Sorm, 1962) | Zimmerman et al., 1968
Specific volume | Residue partial molal volumes (Cohn and Edsall, 1943, 17 a.a.) | McMeekin et al., 1964
Polarity Zimmerman | 48 + dipole moment for ionized side chains; dipole moment otherwise | Zimmerman et al., 1968
Polarity Woese | Slope of the straight line resulting when log R_F for that a.a. is plotted against mole fraction of H2O in the pyridine-H2O solvent employed in paper chromatography | Woese et al., 1967
Polarity Grantham | Woese et al.'s value averaged with R_F in another system | Grantham, 1974
Hydrophobicity Jones | Supplemented estimates to Nozaki and Tanford, 1971, data based on free energy of transfer of a.a. side chains from H2O to 100% ethanol or dioxane (both Jones and Levitt) | Jones, 1975
Hydrophobicity Levitt | As for Hydrophobicity Jones | Levitt, 1976
Hydrophobicity Bull and Breese | Free energy of transfer of a.a. residues from 0.10 M NaCl solution to the surface | Bull and Breese, 1974
Hydrophilicity Weber and Lacey | R_F of a.a. in a high salt solvent for chromatography | Weber and Lacey, 1978
Partition coefficient (Garel K) | The partition coefficient K = R_F/(1 - R_F) obtained from a K3PO4, methoxy-2 ethanol, and 20% 2-butoxyethanol chromatography system | Garel et al., 1973
Sequence frequency | Frequency of occurrence of the a.a. in 69 heterologous, evolutionarily diverse proteins | Jungck, 1971; unpublished data

B. Dinucleoside monophosphates

Hydrophilicity Weber and Lacey | R_F in (10/90: v/v) 1.0 M ammonium acetate/saturated ammonium sulfate, pH 7.0 at 25°C | Weber and Lacey, 1978
Hydrophilicity Barzilay et al. | R_F from paper chromatography using saturated ammonium sulfate/1 M sodium acetate/isopropanol (80/18/2: v/v/v) as a developing solvent | Barzilay et al., 1973
Hydrophobicity Garel et al. | K determined as above; R_X was calculated as the product of the R_X's of the constituents | Garel et al., 1973; this paper

[Table 2. Individual values of the amino acid and dinucleoside monophosphate properties indexed in Table 1. The tabulated values are not legible in the source.]

[Table 3. Matrix of Pearson product moment correlation coefficients (R_p) among the fifteen amino acid properties. The tabulated values are not legible in the source.]

Table 4. Intercorrelations between the properties of amino acids and associated dinucleoside monophosphates

Amino acid property | Weber-Lacey R_F, anticodon | Weber-Lacey R_F, codon | Barzilay et al. R_F, anticodon | Barzilay et al. R_F, codon | Garel et al. R_X, anticodon | Garel et al. R_X, codon
Grantham polarity | +0.90 | -0.59 | +0.78 | -0.40 | -0.77 | +0.59
Woese polarity | +0.89 | -0.58 | +0.75 | -0.63 | -0.70 | +0.72 a
Levitt hydrophobicity | -0.83 | +0.55 | -0.68 | +0.61 | +0.67 | -0.65
Bull & Breese hydrophobicity | +0.76 | -0.42 | +0.68 | -0.43 | -0.61 | +0.50
Bulkiness | -0.67 | +0.41 | -0.63 | +0.37 | +0.58 | -0.31
Jones hydrophobicity | -0.66 | +0.49 | -0.55 | +0.53 | +0.60 | -0.44
Zimmerman polarity | +0.64 | -0.42 | +0.56 | -0.44 | -0.48 | +0.51 a
Specific volume | -0.62 | +0.36 | -0.66 | +0.03 | +0.55 | -0.25
Weber & Lacey | +0.50 b | -0.37 | +0.30 | -0.49 a | -0.52 | +0.40
Garel K | -0.38 | +0.34 | -0.33 | +0.32 | +0.43 c | -0.26
Refractivity | -0.32 | +0.23 | -0.20 | +0.31 a | +0.37 | -0.17
Molecular volume | -0.29 | +0.12 | -0.21 | +0.19 | +0.37 | -0.05
Alpha pK1 | -0.22 | -0.05 | -0.21 | -0.20 | +0.06 | +0.02
Sequence frequency | +0.19 | -0.03 | +0.10 | -0.07 | -0.10 | +0.05
Molecular weight | -0.11 | +0.05 | +0.02 | +0.30 a | +0.19 | -0.02

Underlined values are significant at the 0.001 level. Values without "a" have anticodon correlations with amino acids greater than the respective codon correlations (40 of the 45 pairs). Only one of the exceptions is significant at the 0.001 level. All signs are in favor of the anticodon hypothesis.

b There was considerable packing and a slight hyperbolic appearance when this pair of properties was plotted; therefore, a linear transformation was attempted (a.a. R_F vs a.a. R_F/dinucleoside R_F); the R_p was 0.65.

c The rank order correlation between the 17 amino acids actually reported by Garel et al. (1973) and their anticodon dinucleoside monophosphates is 0.74.

every single instance; in fact, seven cases showed a very significant correlation with anticodons (p < 0.001) while none of the codonic associations were this significant.
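The tally of forty out of forty-five pairs can be checked directly against the coefficients printed in Table 4. The sketch below transcribes the fifteen rows in the order they appear and compares absolute values pair by pair; the names `TABLE_4` and `count_anticodon_preference` are illustrative, and the molecular weight entry for the Weber-Lacey anticodon column is read as -0.11, which is an assumption about a poorly printed value.

```python
# Each row: (Weber-Lacey anticodon, codon,
#            Barzilay anticodon, codon,
#            Garel anticodon, codon), in the row order of Table 4.
TABLE_4 = [
    (+0.90, -0.59, +0.78, -0.40, -0.77, +0.59),
    (+0.89, -0.58, +0.75, -0.63, -0.70, +0.72),
    (-0.83, +0.55, -0.68, +0.61, +0.67, -0.65),
    (+0.76, -0.42, +0.68, -0.43, -0.61, +0.50),
    (-0.67, +0.41, -0.63, +0.37, +0.58, -0.31),
    (-0.66, +0.49, -0.55, +0.53, +0.60, -0.44),
    (+0.64, -0.42, +0.56, -0.44, -0.48, +0.51),
    (-0.62, +0.36, -0.66, +0.03, +0.55, -0.25),
    (+0.50, -0.37, +0.30, -0.49, -0.52, +0.40),
    (-0.38, +0.34, -0.33, +0.32, +0.43, -0.26),
    (-0.32, +0.23, -0.20, +0.31, +0.37, -0.17),
    (-0.29, +0.12, -0.21, +0.19, +0.37, -0.05),
    (-0.22, -0.05, -0.21, -0.20, +0.06, +0.02),
    (+0.19, -0.03, +0.10, -0.07, -0.10, +0.05),
    (-0.11, +0.05, +0.02, +0.30, +0.19, -0.02),  # -0.11 is an assumed reading
]


def count_anticodon_preference(rows):
    """Return (pairs where |anticodon r| > |codon r|, total pairs)."""
    wins = total = 0
    for row in rows:
        for i in range(0, len(row), 2):
            total += 1
            if abs(row[i]) > abs(row[i + 1]):
                wins += 1
    return wins, total


wins, total = count_anticodon_preference(TABLE_4)
# Weber-Lacey system only (first two columns of each row).
weber_lacey_wins = sum(abs(r[0]) > abs(r[1]) for r in TABLE_4)
print(wins, total, weber_lacey_wins)
```

The comparison uses absolute values only; as the text goes on to note, the signs carry separate information about the solvent systems.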


While the Barzilay et al. (1973) and Garel et al. (1973) associations were not as exclusive, they are still consistent with the hypothesis of amino acid-anticodon association if one makes the ordinary chemical assumption that polar compounds associate with polar compounds and nonpolar compounds associate with nonpolar compounds, similis simili gaudet (Nagyvary and Fendler, 1974). Recall the fact that both Weber and Lacey (1978) and Barzilay et al. (1973) employed a hydrophilic solvent system while Garel et al. (1973) employed a hydrophobic solvent system. Thus, an examination of Table 4 quickly reveals that signs of the correlation coefficients are as important as the absolute values of the correlation coefficients in assessing the nature of the associations of amino acids and dinucleoside monophosphates.

While the ergo propter hoc fallacy of confusing correlation with causation must be assiduously avoided, it can be inferred from these results that certain associations are worth pursuing. For example, my colleagues R. Patel and J. Kratohvil have suggested investigating amino acid-oligonucleotide interactions by measuring binding constants via temperature jump relaxation kinetics of fluorescence and/or the change of volume upon mixing. With respect to this latter suggestion, note that in the correlations shown in Table 4 the association between the specific volume of amino acids and the anticodonic dinucleoside monophosphates (Barzilay system) is significant at the 0.001 level while the amino acid-codon pair shows a coefficient very close to zero. Also, both these suggested measurements should be feasible even with minute quantities of oligonucleotide.

If Table 4 is inspected further, it can easily be inferred which types of properties of amino acids are probably quite important in their interactions with oligonucleotides. All three measures of amino acid polarity (Grantham, 1974; Woese et al., 1967; and Zimmerman et al., 1968), all three measures of amino acid hydrophobicity (Levitt, 1976; Bull and Breese, 1974; and Jones, 1975), and the measures of amino acid bulkiness and specific volume have the highest absolute values of correlation. On the other hand, properties such as molecular weight, alpha pK1, and refractivity are quite uncorrelated to either anticodonic or codonic values of dinucleoside monophosphates. This observation reinforces the contention that the primary physical relationship between amino acids and their anticodons was on a "hydrophobicity-hydrophilicity" basis or a bulkiness-polarity basis and not on some other bases (at least of those studied; the electrostatic interactions suggested by Helene (1977) were not amenable to this kind of analysis).

Finally, the three sets of nucleotide data are all significantly correlated with each other well beyond the 0.0005 fiducial level. As predicted, of course, two out of the three correlation coefficients are opposite in sign to that of the third.

Periodic Genetic Code

We may now address ourselves to the reasonableness of the genetic code assignments. Woese et al. (1966) and Volkenstein (1966) made organizations of the genetic code based principally on a few properties of the amino acids, data on the nucleotides not being available. On the other hand, Karasev (1976) organized the genetic code simply on the basis of 5'->3' and 3'->5' symmetries of dinucleotide pairs. With data on the nucleotides and amino acids in the same test system, Lacey and Weber (1977) organized the genetic anticode on hydrophilic and charge considerations.

Table 5. The genetic anticode

Columns are ordered by the center nucleotide of the anticodon (A, G, C, U), in the direction of increasing mononucleotide relative hydrophilicity and increasing Woese's amino acid polar requirement. Within each box the four anticodons differ in their first nucleotide (A, G, C, U).

Box | Center nucleotide A | Center nucleotide G | Center nucleotide C | Center nucleotide U
1 | AAA Phe 5.0; GAA Phe; CAA Leu 4.9; UAA Leu | AGA Ser 7.5; GGA Ser; CGA Ser; UGA Ser | ACA Cys 4.8; GCA Cys; CCA Trp 5.2; UCA Term. | AUA Tyr; GUA Tyr; CUA Term.; UUA Term.
2 | AAG Leu 4.9; GAG Leu; CAG Leu; UAG Leu | AGG Pro 6.6; GGG Pro; CGG Pro; UGG Pro | [not legible in the source] | [not legible in the source]
3 | AAC Val 5.6; GAC Val; CAC Val; UAC Val | AGC Ala 7.0; GGC Ala; CGC Ala; UGC Ala | [not legible in the source] | [not legible in the source]
4 | AAU Ile 4.9; GAU Ile; CAU Met 5.3; UAU Ile | AGU Thr 6.6; GGU Thr; CGU Thr; UGU Thr | [not legible in the source] | [not legible in the source]
Column average | 5.1 | 6.9 | 7.7 | 9.7

The sixteen boxes of the genetic anticode are arranged according to the systematic arrangement of mononucleotides and "strong" dinucleotides of anticodons (underlined couplets). The nine amino acids in the diagonally shaded boxes are compositely the most polar, least bulky, and least hydrophobic amino acids. The unshaded boxes contain the eleven least polar, most bulky, and most hydrophobic amino acids. The value for serine was consistently most deviant from prediction in multiple regression calculations; hence, the serine-xGA pairs are given vertical shading. The significantly related groups of amino acids are quite different from earlier groupings (Karasev, 1976; Volkenstein, 1966). While Lacey and Weber (1977) arranged the nucleotides identically, their amino acid relationships were quite different.

The number next to each amino acid is its respective value of Woese et al.'s (1967) polar requirement. The numbers at the bottom of each column are the anticodonically weighted averages of the amino acid polar requirement values in that column.

Note that the initiation and termination anticodons are in opposite corners (diagonally) of the table constructed in this fashion.
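The column averages at the foot of Table 5 can be reproduced for the two columns whose entries are fully legible (center nucleotides A and G). The sketch below assumes that "anticodonically weighted" means each amino acid's polar requirement is weighted by the number of anticodons it occupies in that column; the names `COLUMN_A`, `COLUMN_G`, and `weighted_average` are illustrative.

```python
# (amino acid, Woese polar requirement, number of anticodons in the column)
COLUMN_A = [
    ("Phe", 5.0, 2),
    ("Leu", 4.9, 6),
    ("Val", 5.6, 4),
    ("Ile", 4.9, 3),
    ("Met", 5.3, 1),
]
COLUMN_G = [
    ("Ser", 7.5, 4),
    ("Pro", 6.6, 4),
    ("Ala", 7.0, 4),
    ("Thr", 6.6, 4),
]


def weighted_average(entries):
    """Average polar requirement, weighted by anticodon count."""
    total_weight = sum(count for _, _, count in entries)
    weighted_sum = sum(value * count for _, value, count in entries)
    return weighted_sum / total_weight


avg_a = weighted_average(COLUMN_A)
avg_g = weighted_average(COLUMN_G)
print(round(avg_a, 1), round(avg_g, 1))
```

Under this assumed weighting, both columns round to the averages printed in the table (5.1 and 6.9), which supports the reading of the weighting scheme but does not prove it.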

With the 174 explicit correlations presented here, instead of just an inferred single association, we can include many other factors in organizing the genetic code and challenge earlier inferences. First, the sixteen dinucleotides displayed in a four-by-four table (Table 5) are ordered in a significant manner (p < 0.0005 in ten instances and p > 0.05 in thirty instances) with respect to the chromatographic characteristics of their respective anticodon dinucleotides. These results indicate that the correlations between amino acids and anticodon dinucleotides are not just an arbitrary property of a single chromatographic system. Furthermore, the correlation between amino acids and oligonucleotides is with physical properties that have not evolved. By contrast, there are eons of evolutionary history in the relationship between amino acids and tRNA molecules (Nathenson et al., 1965; Woese et al., 1967).

If we view the genetic code as a two-way table of rows and columns, then we can ask what basic properties underlie this two-way relationship. Multiple linear regression yielded the following equations for the calculation of expected dinucleoside monophosphate R_F's (of the Weber and Lacey type, 1978) from Zimmerman et al.'s (1968) polarity and bulkiness and Jones' (1975) "hydrophobicity" indices for amino acids:

$$\text{Dinucleotide } R_F = 0.0029\,\text{Polarity (Z)} - 0.0149\,\text{Bulkiness} + 0.376 \qquad (2)$$

$[R_{p,\ \text{calc-obs}} = 0.75]$

$$\text{Dinucleotide } R_F = 0.0028\,\text{Polarity (Z)} - 0.0568\,\text{"Hydrophobicity" (J)} + 0.2271 \qquad (3)$$

$[R_{p,\ \text{calc-obs}} = 0.72]$

$$\text{Dinucleotide } R_F = 0.0028\,\text{Polarity (Z)} - 0.0109\,\text{Bulkiness} - 0.021\,\text{"Hydrophobicity" (J)} + 0.3449 \qquad (4)$$

$[R_{p,\ \text{calc-obs}} = 0.78]$
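How coefficients of this kind arise from a multiple linear regression can be sketched with ordinary least squares solved through the normal equations. This is a generic illustration, not the program used for the paper: the function `fit_linear` is hypothetical, and the polarity and bulkiness inputs below are invented values, with the responses generated exactly from equation (2) so that the fit can be checked.

```python
def fit_linear(rows, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk.

    rows: sequence of predictor tuples; y: sequence of responses.
    Returns [b0, b1, ..., bk], solving the normal equations by
    Gauss-Jordan elimination with partial pivoting.
    """
    design = [[1.0] + list(r) for r in rows]  # leading 1 for the intercept
    k = len(design[0])
    # Augmented normal-equation system [X'X | X'y].
    system = [
        [sum(d[i] * d[j] for d in design) for j in range(k)]
        + [sum(d[i] * yi for d, yi in zip(design, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(system[r][col]))
        system[col], system[pivot] = system[pivot], system[col]
        for r in range(k):
            if r != col:
                factor = system[r][col] / system[col][col]
                system[r] = [a - factor * b
                             for a, b in zip(system[r], system[col])]
    return [system[i][k] / system[i][i] for i in range(k)]


# Hypothetical (polarity, bulkiness) pairs, for illustration only.
predictors = [(0.0, 3.4), (52.0, 13.5), (1.5, 21.7),
              (49.7, 11.7), (3.4, 15.7), (0.1, 19.8)]
# Responses generated from equation (2), so the fit should recover it.
responses = [0.376 + 0.0029 * p - 0.0149 * b for p, b in predictors]

coefficients = fit_linear(predictors, responses)
print(coefficients)  # intercept, polarity coefficient, bulkiness coefficient
```

With real, noisy data the fitted values would not reproduce the responses exactly, and the correlation between calculated and observed values (the bracketed R_p above) would measure the quality of the fit.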

As anticipated from the fact that bulkiness and "hydrophobicity" are highly correlated, the equation with three independent variables does not give a significantly better prediction than that with only two variables: polarity and bulkiness. The data of Woese (1973), Woese et al. (1967), and Grantham (1974) were added later and gave the strongest correlations. If we use multiple linear regression with Woese's polar requirement and Zimmerman et al.'s bulkiness, then we obtain the very significant relationship:

$$\text{Dinucleotide } R_F = 0.037\,\text{Polarity requirement} - 0.0074\,\text{Bulkiness} + 0.0320 \qquad (5)$$

$[R_{p,\ \text{obs-calc}} = 0.92]$

This relationship is illustrated in Figure 1. Bull and Breese's (1974) "hydrophobicity" data were also employed in multiple linear regression analyses; however, no improvement in prediction was obtained because of their high correlation with other parameters. Note also that the direction of intercorrelations seems to indicate that Bull and Breese (1974) actually measured "hydrophilicity", not "hydrophobicity".
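Equation (5) is simple enough to apply by hand, but a short sketch makes the direction of each term explicit. The function name `expected_dinucleotide_rf` is illustrative; the polar requirement of 5.0 is the value shown for Phe in Table 5, while the bulkiness of 20.0 is a hypothetical input, since the bulkiness values of Table 2 are not legible here.

```python
def expected_dinucleotide_rf(polar_requirement: float, bulkiness: float) -> float:
    """Expected anticodon dinucleotide R_F from equation (5)."""
    return 0.037 * polar_requirement - 0.0074 * bulkiness + 0.0320


# Polar requirement 5.0 (Phe, Table 5); bulkiness 20.0 is an assumed value.
rf_example = expected_dinucleotide_rf(5.0, 20.0)

# A more polar, less bulky amino acid (both inputs hypothetical)
# is predicted to pair with a more hydrophilic anticodon dinucleotide.
rf_polar = expected_dinucleotide_rf(10.0, 12.0)
print(rf_example, rf_polar)
```

The positive coefficient on polar requirement and the negative coefficient on bulkiness express the trend described in the text: polar, compact amino acids go with high-R_F (hydrophilic) anticodon dinucleotides.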

[Figure 1: observed dinucleotide R_F (vertical axis) plotted against expected dinucleotide R_F (horizontal axis); points are labeled by anticodon dinucleotide.]

Fig. 1. The expected dinucleotide R_F of the anticodon corresponding to an amino acid was calculated from the bulkiness and polarity requirements of the amino acid by:

Dinucleotide R_F = 0.037 Polarity requirement - 0.0074 Bulkiness + 0.032.

The observed dinucleotide R_F's are those reported by Weber and Lacey (1978). The Pearson product moment correlation between observed and expected values is 0.92 and the Spearman rank order correlation coefficient is 0.95. The diagonal line represents a perfect correlation. Only 19 of the 20 points are countable on the graph because the two predictions for UG were indistinguishable.

In Table 5 the genetic anticode is presented, organized by the nucleotide hydrophilicities. Woese's polar requirement values are also included in the tabulation. Note that the anticodonically weighted averages of the polarities increase monotonically from the left to the right of Table 5, in the same direction as increasing nucleotide hydrophilicity. Secondly, if we reduce the genetic code to a two-by-two contingency table organized into the four kinds of dinucleotides (purine-purine, purine-pyrimidine, pyrimidine-purine, and pyrimidine-pyrimidine) and assort the anticoded amino acid bulkiness values, then we obtain a chi-square value of 4.74, which with one degree of freedom is significant at the 3% level (if the Yates correction is applied, χ² = 4.35, which is significant at the 4% level). Since purines are much bulkier than pyrimidines, these results are consistent with the hypothesis that bulky amino acids tend to have bulky anticodon dinucleotides. Therefore, it seems reasonable to think of the periodic nature of genetic coding as reflecting the polarities and bulkinesses of the respective pairs of amino acids and anticodons.
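The chi-square test on a two-by-two table, with and without the Yates correction, can be sketched as follows. The cell counts used in the example are hypothetical, because the contingency table itself is not given in the text; only the two reported chi-square values (4.74 and 4.35) are taken from the paper, and their tail probabilities for one degree of freedom are computed with the standard identity P = erfc(sqrt(chi-square / 2)). The function names are illustrative.

```python
import math


def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates continuity correction: shrink |ad - bc| by n/2 (not below 0).
        diff = max(0.0, diff - n / 2)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square value with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts, for illustration only.
plain = chi_square_2x2(10, 20, 20, 10)
corrected = chi_square_2x2(10, 20, 20, 10, yates=True)

# Significance of the chi-square values reported in the text.
p_uncorrected = p_value_1df(4.74)
p_yates = p_value_1df(4.35)
print(plain, corrected, p_uncorrected, p_yates)
```

Rounded to two decimals, the two tail probabilities agree with the 3% and 4% levels quoted in the text.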

While the statistical calculations reported here do support and extend the thesis (originally suggested by Dunnill, 1966 and Ralph, 1968; and previously experimentally supported by Nagyvary and Fendler, 1974 and Weber and Lacey, 1978) that the relationships between amino acids and their anticodon dinucleotides were the basis for the origin of the genetic code, the mechanism whereby those relationships manifested themselves in primitive coding systems is not clear. Obviously any such mechanism will have to have relied upon kinetic factors, associated with an actual translation mechanism, in addition to the oligonucleotide and amino acid characteristics discussed here.


Acknowledgements. I certainly appreciate the fact that Drs. Arthur Weber and James C. Lacey, Jr., shared their data and many references to relevant literature. Dr. Robert Feinberg, Mathematics Department, Iowa State, and Dr. Richard Dembo, Social Sciences Department, and Dr. B. Dennis Sustare, Biology Department, both of Clarkson College, provided expert mathematical and computer programming assistance. Dr. J. Myron Hood, Division of Biology, California Institute of Technology, and Dr. Jack E. Leonard and Daniel Armstrong, Department of Chemistry, Texas A and M, made many useful comments for revising the manuscript. I also want to thank my students for their patience in enduring my numerous attempts to present new models of the genetic code to them.

References

Barzilay, I., Sussman, J.L., Lapidot, V. (1973). J. Chromatogr. 79, 139-146
Bertman, M.O., Jungck, J.R. (1978). Notices Am. Math. Soc. 25, A-174
Bull, H.B., Breese, K. (1974). Arch. Biochem. Biophys. 161, 665-670
Cohn, E.J., Edsall, J.T. (1943). Proteins, amino acids and peptides, pp. 84-85, 372. New York: Reinhold
Dunnill, P. (1966). Nature 210, 1267-1268
Edsall, J.T., Wyman, J. (1958). Biophysical chemistry, p. 452. New York: Academic Press
Gamow, G. (1954). Nature 173, 318
Garel, J.P., Filliol, D., Mandel, P. (1973). J. Chromatogr. 78, 381-391
Goldsack, D.E., Chalifoux, R.C. (1973). J. Theor. Biol. 39, 645-651
Grantham, R. (1974). Science 185, 862-864
Helene, C. (1977). FEBS Lett. 74, 10-13
Jones, D.D. (1975). J. Theor. Biol. 50, 167-183
Jungck, J.R. (1971). Curr. Mod. Biol. 3, 307-318
Karasev, V.A. (1976). Vestn. Leningr. Univ., Biol. 1, 93-97
Lacey, J.C., Jr., Pruitt, K.M. (1969). Nature 223, 799-804
Lacey, J.C., Jr., Weber, A.L. (1976). In: Protein structure and evolution, L. Fox et al., eds., pp. 213-222. New York: Marcel-Dekker
Lacey, J.C., Jr., Weber, A.L. (1977). Precamb. Res. 5, 1-22
Levitt, M. (1976). J. Mol. Biol. 104, 59-107
McMeekin, T.L., Groves, M.L., Hipp, N.J. (1964). In: Amino acids and serum proteins, J.A. Stekol, ed., pp. 54-66. Washington: American Chemical Society
Nagyvary, J., Fendler, J.H. (1974). Orig. Life 5, 357-362
Nathenson, S.G., Dohan, F.C., Jr., Richards, H.H., Cantoni, G.L. (1965). Biochem. J. 4, 2412-2418
Nozaki, Y., Tanford, C. (1971). J. Biol. Chem. 246, 2211-2217
Pelc, S.R., Welton, M.G.E. (1966). Nature 209, 868-870
Ralph, R.K. (1968). Biochem. Biophys. Res. Commun. 33, 213-218
Rendell, M.S., Harlos, J.P., Rein, R. (1971). Biopolymers 10, 2083-2094
Sorm, F. (1962). Coll. Czech. Chem. Comm. 27, 203-316
Versteeg, D.H.G., Vliegenthart, J.F.G. (1965). Experientia 21, 615-616
Volkenstein, M.V. (1966). Biochim. Biophys. Acta 119, 421-424
Waugh, D.F. (1959). Rev. Mod. Phys. 31, 84-93
Weber, A.L., Lacey, J.C., Jr. (1978). J. Mol. Evol., this issue
Woese, C.R. (1973). Naturwissenschaften 60, 447-459
Woese, C.R., Dugre, D.H., Dugre, S.A., Kondo, M., Saxinger, W.C. (1967). Cold Spring Harbor Symp. Quant. Biol. 31, 723-736
Woese, C.R., Dugre, D.H., Saxinger, W.C., Dugre, S.A. (1966). Proc. Natl. Acad. Sci. USA 55, 966-974
Zimmerman, J.M., Eliezer, N., Simha, R. (1968). J. Theor. Biol. 21, 170-201

Received October 5, 1977; Revised February 27, 1978