
EVIDENCE INDICATING INDEPENDENT ASSORTMENT OF FRAMEWORK AND COMPLEMENTARITY-DETERMINING SEGMENTS OF THE VARIABLE REGIONS OF RABBIT LIGHT CHAINS

Delineation of a Possible J Minigene*

BY ELVIN A. KABAT,‡ TAI TE WU,§ AND HOWARD BILOFSKY

From the National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20205; the Departments of Microbiology, Human Genetics and Development, and Neurology, and the Cancer Center, College of Physicians & Surgeons, Columbia University, New York 10032; the Departments of Biochemistry and Molecular Biology, Engineering Sciences and Applied Mathematics, Northwestern University, Evanston, Illinois 60201; and Bolt Beranek and Newman Inc., Cambridge, Massachusetts 02238

* Supported by the National Cancer Institute, National Institute of Allergy and Infectious Diseases, National Institute of Arthritis, Metabolism, and Digestive Diseases, National Institute of General Medical Sciences, and the Division of Research Resources (Contracts N01-RR-2147 and N01-RR-8-2158), of the National Institutes of Health; and by a Cancer Center Support grant to Columbia University, from the National Cancer Institute, CA 13696.

‡ Recipient of grant BMS-76-81029 from the National Science Foundation.

§ Recipient of grant 5-RO1-GM21482-04 from the National Institutes of Health.

¹ Abbreviations used in this paper: CDR, complementarity-determining regions; FR, framework region(s); V, variable.

72 THE JOURNAL OF EXPERIMENTAL MEDICINE • VOLUME 152, 1980

The variable (V)¹ regions of immunoglobulin light chains, when aligned for maximum homology, can be divided into four framework regions (FR) separated by three complementarity-determining (CDR) (hypervariable [1]) regions or segments (2, 3). The latter, as predicted (1), together with the corresponding three CDR of the heavy chain (4), form the antibody-combining sites (3-11). Light chain FR1, FR2, FR3, and FR4 comprise residues 1-23, 35-49, 57-88, and 98-107, and CDR1, CDR2, and CDR3 comprise residues 24-34, 50-56, and 89-97. When the FR segments were grouped into sets of identical sequence and the members of each set were traced, it was shown (12) that members of a given FR1 set could be associated with different FR2, FR3, and FR4 sets. This independent assortment suggested that the FR sets, and by implication the CDR sets, were under different genetic control, and the hypothesis was put forward that the individual FR and CDR sets were controlled by minigenes assembled somatically by recombination at the DNA level (12). A minigene is defined as a segment of DNA coding for a portion of a domain and which shows evidence of segregation as a functional unit independent of the rest of the DNA coding for the V region (13). Because we only assorted FR segments, the findings would be independent of whether one or two residues of a given CDR assorted with any FR segment. Studies by Tonegawa et al. with cloned mouse Vλ (14, 15) and Vκ (16) genes and by Seidman et al. (17, 18) with mouse Vκ genes showed that in 12-d-old embryo DNA, genes coding for residues 1-95 of the V region were followed by an intervening sequence. Other genes for the mouse λ-region coded for residues 96-108, termed the J segment, and were followed by an intervening sequence of about 1.2 Kb and then by the Cλ coding segment (15). In mouse Vκ chains (19, 20), a clone coding for Cκ contained five J segments, each separated by an intervening sequence that varied in length from 246 to 310 base pairs, with an intervening sequence of ~2.5 Kb between the J closest to C and the C region. Thus, as defined, the J segments are clearly minigenes (12, 13).
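As a minimal sketch of the light chain segment boundaries quoted above, the following Python function maps a residue label to its FR or CDR segment. The name `segment_of` and the treatment of inserted residues (a label such as 27B is assigned by its numeric part) are assumptions made for illustration, not part of the original analysis.

```python
import re

# Light chain segment boundaries as quoted in the text (inclusive residue numbers).
SEGMENTS = [
    ("FR1", 1, 23),
    ("CDR1", 24, 34),
    ("FR2", 35, 49),
    ("CDR2", 50, 56),
    ("FR3", 57, 88),
    ("CDR3", 89, 97),
    ("FR4", 98, 107),
]


def segment_of(label):
    """Return the segment containing a residue label such as '23' or '27B'.

    Assumption for illustration: an inserted residue (trailing letter)
    belongs to the segment of its numeric part.
    """
    match = re.fullmatch(r"(\d+)[A-Z]?", label.strip())
    if match is None:
        raise ValueError(f"unrecognized residue label: {label!r}")
    number = int(match.group(1))
    for name, first, last in SEGMENTS:
        if first <= number <= last:
            return name
    raise ValueError(f"residue {number} lies outside positions 1-107")
```

For example, `segment_of("96")` places the highly variable residue 96 in CDR3, which is why assigning it to J moves part of CDR3 into the J segment.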

The amino acid (21) and nucleic acid (22, 23) sequence data on the hinge region also show that these nucleotides code only for a portion of a domain and therefore constitute a minigene. It has been suggested (22, 23) that it evolved from a complete domain by a shift of a splice site followed by mutational divergence of the left end of the domain to become a portion of an intervening sequence with preservation of considerable homology in nucleotide sequence with the 5' flanking end of the CH1 domain.

The assortment principle was also used by Weigert et al. (24) to define the J segment from amino acid sequences with 96 as the site of the recombination, i.e., including two residues of CDR3 in J. They proposed that some diversity of antibody-combining sites could be generated by V-J joining because residue 96 is highly hypervariable (1, 2, 25). Because adult myeloma VL DNA clones code for a contiguous stretch of amino acids 1-107 (or 108) (26), the joining of the J minigene to the DNA of the region coding for amino acids 1-95 has taken place during embryogenesis.

It was subsequently shown (27) that the five clones coding for Vκ and the three coding for Vλ all had segments matching at from 6/7 to 9/10 nucleotides plus the Trp codon at position 35, which could serve as recognition sites for recombination or insertion of CDR nucleotides between the nucleotides coding for the FR segments and for the CDR3-J junction. It was therefore considered of interest to examine the amino acid sequences of rabbit Vκ chains for evidence of independent assortment of FR and CDR segments and to see whether a J segment in the rabbit could be defined by assorting residues 96 and 97 of CDR3 with FR4, as has been found for the mouse.

Assorting rabbit Vκ sequences may have certain potential advantages in that the rabbit populations are not inbred and in that most of the sequencing has been done on antibodies with specificities largely limited to several pneumococcal type-specific and to streptococcal group-specific polysaccharides. It thus might be possible to recognize assortments associated with or not associated with certain antibody specificities in FR and CDR segments. There are also very many complete rabbit Vκ sequences, and yet the degree of sequence variation even among antibodies to the same antigen has not made it possible to define individual amino acid residues as contacting amino acids. Moreover, the sequence data included one pair of rabbits, K9-335 and K9-338, littermates (28, 29) producing monoclonal antibody to streptococcal group A variant carbohydrate whose light chains were identical in sequence from amino acids 1 to 108, and a second pair of completely unrelated rabbits (Ch. Brandt and J. C. Jaton, personal communication), 311 and 4422, one bred in the United States and the other in Switzerland; both had produced monoclonal antibodies to type II pneumococcal polysaccharide whose light chains also were identical in sequence from amino acids 1-108 (30).

The data provide considerable evidence for independent assortment of FR and CDR segments and, as also concluded by Braun et al. (31), indicate that the J segment in the rabbit as in the mouse may include two residues of CDR3 and may contribute to the generation of antibody diversity.

Materials and Methods

All rabbit Vκ sequences were contained in Sequences of Immunoglobulin Chains (25); they represent the work of numerous groups of investigators, and original references may be found in (25). Sequences that have not been determined for an entire FR segment were generally not used except when an invariant Cys 23 was lacking, and the sequences fell into an FR or CDR set with other members. All chains with the identical sequence for an FR or CDR segment were grouped together as a set. Members of a set are enclosed in a box. The prototype sequence was K4820 and was selected from the set with the most members of FR2, the most commonly occurring preserved segment found to date in 1 human VκIV, 20 mice (6 BALB/c and 14 NZB), and 13 rabbit sequences (12, 13). Above the prototype set are listed all amino acid positions and residues at which the other sets differed from the prototype (32). Above all other sets are listed only the positions and amino acid residues at which the set differed from the prototype. It is thus easy to see which residues in each set differed from the prototype and also, by comparing any two sets, to see at which positions they differ from one another (32). If the amino acid in any set differed from the prototype at a given position and if this involved a two base change, this is indicated by an asterisk (*). If the differences involved deletions or insertions, or when a three base change was involved, this is indicated by a dagger (†). To define a potential J segment and because of length variations in CDR3, the last two residues of CDR3 in each sequence were renumbered as 97E and 97F and assorted with FR4 as a J segment (97E, 97F-107) comparable with what had been observed in the mouse by assortment (24) and by nucleic acid sequencing of clones (26). In rabbit 2717 residue 106A had been misaligned and was made residue 107. This homologized all subsequent residues.
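The asterisk and dagger annotations can be illustrated with a short Python sketch that finds the minimum number of base changes needed to convert a codon for one amino acid into a codon for another under the standard genetic code. The function names and the one-letter amino acid interface are assumptions made for illustration; this is not necessarily the procedure the authors used to annotate the figure.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(aa):
    """All codons that encode the given one-letter amino acid."""
    codons = [codon for codon, residue in CODON_TABLE.items() if residue == aa]
    if not codons:
        raise ValueError(f"unknown amino acid: {aa!r}")
    return codons


def min_base_changes(aa1, aa2):
    """Smallest number of nucleotide differences between any codon pair."""
    return min(
        sum(a != b for a, b in zip(c1, c2))
        for c1 in codons_for(aa1)
        for c2 in codons_for(aa2)
    )


def annotation(prototype_aa, other_aa):
    """Mark used in the figure: '*' for two base changes, '†' for three."""
    return {0: "", 1: "", 2: "*", 3: "†"}[min_base_changes(prototype_aa, other_aa)]
```

For instance, the Gly to Thr replacement at position 100 in K16-167 requires at least two base changes (GGN to ACN), so `annotation("G", "T")` returns an asterisk.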
It should be noted that FR4 is identical for all rabbit light chains except for K16-167 in which Thr replaces Gly at position 100. Chains that have not been sequenced completely are given in parentheses with the number of unsequenced positions. Above the sets, unsequenced positions are indicated by a question mark. In assembling Fig. 1, the individual sets as typed were cut out and arranged to produce a reasonable minimum number of crossing lines in connecting the members of the sets. If this is not done (24, 33), patterns of greater complexity are generally seen that do not give a true representation of the extent of the assortment. This procedure not only makes for clarity of the figures, but is also essential conceptually when there is no reason to arrange the sets in any specific order. Indeed, when one attempts to rearrange the sets in some defined order to test an hypothesis, the finding of increased complexity of the assortment may provide data consistent with or conflicting with the hypothesis. Thus our original simplest assortment patterns of FR segments of human, mouse, and rabbit Vκ chains (12) became extremely complex when the sets were ordered in terms of the likelihood of the amino acid differences in the FR segments being ascribable to somatic mutation (32). This increase in complexity of the assortment patterns indicated that somatic mutation was not making a significant contribution in generating the different FR sets (32).
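The grouping and tracing procedure described above can be sketched in Python as follows. The helper names (`group_into_sets`, `assortment`, `j_segment`) are assumptions introduced for illustration, and the chain names and segment sequences in `TOY_CHAINS` are hypothetical placeholders, not data from Fig. 1.

```python
from collections import defaultdict


def group_into_sets(chains, segment):
    """Group chain names into sets sharing an identical sequence for one segment."""
    sets = defaultdict(list)
    for name, segments in chains.items():
        sequence = segments.get(segment)
        if sequence is not None:  # skip chains not sequenced in this segment
            sets[sequence].append(name)
    return dict(sets)


def assortment(chains, segment_a, segment_b):
    """For each multi-member set of segment_a, list the distinct segment_b
    sequences found among its members. More than one partner sequence means
    the set assorts with different segment_b sets."""
    result = {}
    for sequence, members in group_into_sets(chains, segment_a).items():
        if len(members) < 2:
            continue
        partners = {
            chains[m][segment_b]
            for m in members
            if chains[m].get(segment_b) is not None
        }
        result[sequence] = sorted(partners)
    return result


def j_segment(cdr3, fr4):
    """Proposed J segment: last two residues of CDR3 (97E, 97F) plus FR4."""
    if len(cdr3) < 2:
        raise ValueError("CDR3 must contain at least two residues")
    return cdr3[-2:] + fr4


# Hypothetical toy data: two chains share FR1 but differ in FR2.
TOY_CHAINS = {
    "chain1": {"FR1": "AAA", "FR2": "WYQ"},
    "chain2": {"FR1": "AAA", "FR2": "WFQ"},
    "chain3": {"FR1": "AVA", "FR2": "WYQ"},
}
```

With the toy data, `assortment(TOY_CHAINS, "FR1", "FR2")` reports that the shared FR1 sequence is paired with two different FR2 sequences, which is the pattern read as independent assortment.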

All rabbit chains with the prefix K were from breeding colonies at The Rockefeller University (New York) and the Basel Institute of Immunology (Basel) and are related. Detailed genealogies may be found in Braun et al. (29, 31); the prototype rabbit K4820 was a progenitor of the Basel colony. It and K4878 and K4872 came from the State Serum Institute, Copenhagen, whereas K19, K20, K23 and K31 came from the Rockefeller University colony. Thus a portion of the data comes from families produced by matings of brother-sister and of other close relatives, whereas the rest of the data derives from outbred and randomly selected animals. The inbred animals were immunized largely with streptococcal group C and group A variant streptococci, whereas the noninbred animals received pneumococcal polysaccharides, Micrococcus lysodeikticus, etc.

Fig. 1. Assortment of FR, CDR regions, and a proposed J segment of rabbit Vκ chains. [Figure not reproduced. Its columns are FR1 (1-23), CDR1 (24-34), FR2 (35-49), CDR2 (50-56), FR3 (57-88), CDR3 (89-97D), and J (97E-107).] Symbols in the figure mark the antibody specificity of each chain: antitype III pneumococcal polysaccharide; antitype VIII pneumococcal polysaccharide; antistreptococcal group A variant carbohydrate; antistreptococcal group C carbohydrate; anti-Micrococcus lysodeikticus; anti-p-azophenylarsonate; antitype II pneumococcal polysaccharide; antidigoxin; anti-p-azobenzoate. Other marks: *, two base changes; †, three base changes; ‡, Gln Gln Asp inserted between residues 58 and 59 and Val between 85 and 86; ?, residue not known; -, deletion; ¶, residues 43 and 44 missing; #, an unidentified residue is present at position 97F. K9-335 and K9-338 were littermates. 311 and 4422 are unrelated. One was purchased from a dealer in the U. S. and the other from a dealer in Switzerland.

Results and Discussion

Table I presents the findings indicating independent assortment of FR and CDR segments and defining a possible J segment that contains two residues of CDR3 as established earlier in the mouse (19, 20, 24) and as proposed for the rabbit (31).

Among the important findings are the following:

(a) The FR sets that contain more than a single member generally show multiple antibody specificities. This is true for all FR1 sets, and for the largest FR2 set. One FR2 set with four members and the two FR3 sets containing the pairs of identical sequences are limited to a single antibody specificity.

(b) The CDR sets tend to vary. Sets with multiple members in CDR1 have the same antibody specificity, whereas the sets in CDR2, the shortest CDR, excluding the pairs with identical V-region sequences, have members with different antibody specificities. This is perhaps not surprising because the antibody specificities are restricted to but a few antigens and because many members of these sets are related. The one CDR3 set of two related members, K16-167 and K27-306 (cousins), which differed in sequence in the other FR and CDR segments, had the same antibody specificity; CDR2 of K27-306 was not sequenced.

TABLE I
Frequency Distribution of Members of Multiple Sets in Relation to Amount of Sequence Data

Segment                                 FR1     CDR1    FR2        CDR2       FR3        CDR3     J
Amino acid residue numbers              1-23    24-34   35-49      50-56      57-88      89-97D   97E, 97F-107
Number of sequences                     51      43      28 (+1*)   26         27 (+1*)   25       26
Number of sets                          24      34      12 (+3*)   13 (+4*)   26         23       18
Sets/sequences, %                       47      79      43         50         96         92       62
Sets with multiple members              11      4       3          4          2          3        4
Sets with single members                13      30      9          9          24         20       14
Number of sequences in sets with
  multiple members                      38      14      21         17         4          4        12
Number of sequences in sets with
  multiple members/total number of
  sequences, %                          75      32      75         65         15         16       46
Sets with single members/total number
  of sets, %                            54      85      75         70         88         87       77
Number of sets/number of amino acid
  residues in segment                   1.05    3.1     0.8        1.8        0.8        1.8      1.5

The J segment is made up by taking the last two residues of CDR3, called 97E, 97F, plus FR4 by analogy with the mouse J sequence.
* No sequence data or sequence incomplete.

(c) The most striking evidence for independent assortment is the finding that the monoclonal antitype II pneumococcal light chains with identical V-region sequence from unrelated outbred rabbits 311 and 4422, one from Switzerland, the other from the United States (30), had an FR1 identical with two other antistreptococcal group A variant antibodies from two other rabbits K16-167 and K27-489. Unlike K16-167 and K27-489, 311 and 4422 had the preserved FR2 set as did the prototype K4820 originating in Denmark; this set contained segments of chains of antibodies of three specificities different from type II, one chain of no known specificity, as well as two other chains with type II specificity.

(d) The pair of littermates with an identical sequence, K9-335 and K9-338, assorted in the same set in CDR2 with three related rabbits of the same specificity and with three unrelated outbred rabbits, 3381, BS-1, and K25, these having another specificity. It is of interest that a second population of antibodies of the same specificity from one member of the pair, rabbit K9-335I, had different FR1, CDR1, shared the preserved FR2 set, and yet was in a CDR2 set that contained antibodies of four specificities.

(e) The J segment contained 18 sets. This is a large number, especially because the FR4 portion, residues 98-107, was identical in all but one sequence, K16-167, which had Thr at position 100 instead of Gly. All members of the prototype set had the same specificity as did the set with K16-167, and, indeed, except for this substitution, it would have fallen into the prototype J set because it also has Ile-Val at positions 97E and 97F. Of the three sets with two members, one set had a chain 3547 not known to have antibody activity (34) although the rabbit had been immunized with streptococcal vaccine. The other had antistreptococcal group A variant activity like the prototype set; another set had one chain with antipneumococcal type III activity and the other with antistreptococcal group A variant activity; the third set had the two identical chains from unrelated rabbits with type II antipneumococcal activity.

(f) Of the remaining 13 sets of J segments that differed from each other and from the prototype at one or both of the two positions 97E, 97F which could contribute to the generation of diversity, 8 were from light chains of antibodies to type III pneumococcal polysaccharide, 2 to p-azobenzoate, and 1 each to type VIII pneumococcal polysaccharide, to group C streptococcal polysaccharide and to M. lysodeikticus. The variations in sequence at positions 97E and 97F for the J segments from chains with type III antipneumococcal antibody give no indications of how they could influence site specificity, but there is no evidence that the different type III sites are necessarily the same.

(g) In the mouse, the first three nucleotides in the intervening sequence following the codon for amino acid 95 are CCC, and there appears to be considerable preservation of nucleotide sequences at the junction of J and CDR3 (26). Sakano et al. (19), Max et al. (20), and Weigert et al. (24) have proposed that some diversity in J could have been generated by intracodon recombination when the J nucleotides are joined to the rest of the V region. If one attempts such intracodon recombinations assuming CCC to be preserved at the beginning of the intervening sequence in the rabbit as in the mouse, the prototype set Ile Val would generate Leu or Pro at position 97E, neither of which has been found. Indeed, with CCC by the proposed intracodon recombination it would be possible to produce, in addition to Leu and Pro, only His, Gln, and Arg, and only Arg has been found at position 97E in one J segment. Thus the amount of diversity that may be introduced by VJ joining in the rabbit is not very great, yet 11 different amino acids have been found at position 97E in the 18 J segments. The outbred rabbit may prove, therefore, to have more Jκ minigenes than the inbred mouse² or a more complex mechanism to generate diversity (35) as also proposed for the variable portion of Ig heavy chain (33). The precise location of the N-terminal end of J segments cannot be identified from amino acid sequences alone (24), and nucleotide sequence data will be needed to determine whether the two last amino acid residues in CDR3 should indeed be included with FR4 to form the J segments of rabbit kappa light chains, as has been found for mouse light chains. Indeed much of the diversity generated by intracodon recombination in the mouse derives from an extra Pro between residues 95 and 96 in a single NZB light chain PC7132 (24) which was assigned to J, thus permitting intracodon recombination involving six nucleotides of the intervening sequence following residue 95. It is possible that such recombination between the other FR and CDR minigene segments could also contribute somatically to diversity.

(h) An important parameter in relation to any mechanism for the generation of diversity is the number of CDR and J sets in relation to FR sets. Table I is an attempt to estimate this. If there are many more CDR sets than FR sets, this would have substantial implications for the generation of diversity. Although the rabbit amino acid sequence data are the most extensive, they nevertheless represent a very highly selected group of sequences because of the limited number of antibody specificities that have been examined. The data are also skewed in that many more FR1 and CDR1 sequences have been determined as compared with the rest of the chain. The tendency to have multiple sets will also be a function of the length of each segment comprising the set. Table I attempts to obtain an estimate of the relative numbers of FR and CDR sets.

If the CDR are compared with the FR, it is clear, as seen from the last row in Table I, that the number of sets is a function of the number of amino acid residues in the FR or CDR segment, and this must be taken into account. Even without this, however, it is clear comparing FR1 and FR2 with CDR1 that there are almost twice as many sets in CDR1 as in FR1 and FR2 and that a much smaller proportion of sets in CDR1 has multiple members despite the selection for only a few antibody specificities. If these values are corrected for length of the segment, the difference in number of sets in CDR1 relative to FR1 and FR2 increases two to three times. The same results are seen if CDR3 and the proposed J segment are compared with FR1 and FR2; in each instance, the percentage of sequences in sets with multiple members is much lower for CDR3 and J. These differences in number of sets would be further increased if adjusted for length differences with CDR2, which differs strikingly in length from the other segments and which shows much less variability in rabbit Vκ chains than is seen in other species (36). The proportion of sequences in sets with multiple members is less than in FR1 and FR2. The number of sets would also almost be doubled if adjusted for length differences.

² Rudikoff, S., D. M. Rao, C. P. J. Glaudemans, and M. Potter (personal communication) have suggested for mouse antigalactan Vκ light chains that an additional gene containing an A as the first nucleotide of the intervening sequence following the codon for amino acid 95 could also account for the additional diversity.

(i) The data on FR3, the longest segment, do not appear to fit with the other segments. The 26 sets each have but a single sequence except the 2 sets each with the completely identical Vκ chains. However, there is some indication that FR3 can be divided into two segments which show some suggestion of assortment or recombination. Such assortment can be seen if FR3 is separated into two segments 57-68 and 69-88; the data are generally similar for 57-69 and 70-88 or 57-70 and 71-88. Fig. 2 shows the data for 57-68 and 69-88; it is clear that many of the individual chains in Fig. 1 can be grouped into sets and that these sets show evidence of assortment. It may be of significance that the three-dimensional structure of immunoglobulin light chains shows residues 68 and 69 to be at a bend connecting two β-strands in the mouse Vκ light chain Fv dimer REI (5, 9).

The two sets of 57-68 contain 13 and 5 chains and assort with six sets of 69-88 containing two or three members each. Thus the separation of FR3 into two segments has substantially reduced the 26 distinct FR3 sets.
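The intracodon recombination argument in finding (g) can be checked with a small Python sketch. It assumes, as the text does, that CCC follows the V coding sequence, and it models a crossover falling one, two, or three nucleotides into the first J codon, so that position 97E is encoded by a leading portion of CCC joined to the remainder of the J codon. The function names are illustrative, and the model is a simplified reading of the proposals in (19, 20, 24).

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def recombinant_codons(j_codon, flank="CCC"):
    """Hybrid codons with 1, 2, or 3 leading nucleotides taken from the flank.

    A crossover before the codon (0 flank nucleotides) leaves the J codon
    unchanged and is therefore not listed.
    """
    return [flank[:k] + j_codon[k:] for k in (1, 2, 3)]


def residues_from_amino_acid(aa, flank="CCC"):
    """Residues obtainable by intracodon recombination from any codon for aa."""
    residues = set()
    for codon, residue in CODON_TABLE.items():
        if residue == aa:
            residues.update(CODON_TABLE[c] for c in recombinant_codons(codon, flank))
    return residues


def all_possible_residues(flank="CCC"):
    """Residues obtainable from any J codon at all, given the flank."""
    residues = set()
    for codon in CODON_TABLE:
        residues.update(CODON_TABLE[c] for c in recombinant_codons(codon, flank))
    return residues
```

Under these assumptions an Ile codon can yield only Leu or Pro, and no J codon can yield anything other than Leu, Pro, His, Gln, or Arg, in agreement with the residues listed in (g).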

Although the clones from 12-d-old mouse embryo DNA indicate that nucleotides coding for amino acids 1-95 occur as a contiguous segment (14, 17, 18, 26), the evidence for assortment, by recombination with or insertion into nucleotides coding for FR segments, of nucleotides coding for CDR segments suggests that this minigene mechanism, in addition to diversity generated by the joining of residues 1-95 to the J minigene, is fundamental to the generation of diversity.
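The length correction used in finding (h) amounts to dividing the number of sets in each segment by the number of residues in that segment. A minimal Python sketch follows. The set counts are those printed in Table I; the segment lengths are inferred here from the residue ranges (counting 97A-97D in CDR3 and 97E, 97F in J), which is an assumption, and recomputed values may differ from the printed row in the last digit.

```python
# Number of sets per segment, as printed in Table I.
SET_COUNTS = {
    "FR1": 24, "CDR1": 34, "FR2": 12, "CDR2": 13,
    "FR3": 26, "CDR3": 23, "J": 18,
}

# Residues per segment, inferred from the residue ranges (an assumption):
# CDR3 counts 89-97 plus 97A-97D; J counts 97E, 97F plus 98-107.
SEGMENT_LENGTHS = {
    "FR1": 23, "CDR1": 11, "FR2": 15, "CDR2": 7,
    "FR3": 32, "CDR3": 13, "J": 12,
}


def sets_per_residue(set_counts, lengths):
    """Number of sets divided by segment length, for every segment."""
    return {segment: set_counts[segment] / lengths[segment] for segment in set_counts}


def fold_difference(values, segment, reference):
    """How many times larger the value for one segment is than for another."""
    return values[segment] / values[reference]


if __name__ == "__main__":
    corrected = sets_per_residue(SET_COUNTS, SEGMENT_LENGTHS)
    for segment, value in corrected.items():
        print(f"{segment:5s} {value:.2f}")
```

Comparing `fold_difference(SET_COUNTS, "CDR1", "FR1")` with the same call on the length-corrected values shows how the correction widens the gap between CDR1 and FR1.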

Fig. 2. Assortment of FR3 into two segments, residues 57-68 and residues 69-88. [Figure not reproduced.]


The recent report of Schilling et al. (33) of a segment generating diversity in CDR3 N-terminal to the J segment of anti-α1→3 dextran hybridomas and bearing individual idiotypic determinants provides additional evidence consistent with the minigene hypothesis.

It is important to distinguish between the generation of diversity and the generation of complementarity differences responsible for antibody specificity, e.g., for noncovalent binding of an antigenic determinant in the antibody-combining site. Although positions 96 and 97 of CDR3 of VL are coded for by the J minigenes (19, 20) and although position 96 shows the highest variability (1, 2, 25), position 96 has been identified only once as a contacting residue in the high resolution x-ray crystallographic structures thus far reported (5-11) and in initial model building studies of antibody-combining sites without (37, 38) or with (39) nuclear magnetic resonance data, in which the amino acid residues of the CDR sequences known to bind ligands were introduced on to a framework established by x-ray crystallographic studies. Padlan et al. (40) established residue 96 as a contacting residue for phosphorylcholine in the mouse myeloma protein McPC603. In mouse Vκ chains position 97 is essentially invariant (25). Rudikoff et al.² also consider that in the β1→6 galactans, the alternative amino acids at position 96 do not contribute to complementarity. Thus the intracodon recombination proposed (19, 20, 24) as a mechanism for V-J joining and for reducing the number of J minigenes might create sequence diversity without contributing significantly to those complementarity differences which would make for various antibody specificities. Indeed, even if antibody site complementarity were to be generated by position 96, as in McPC603, when an actual J sequence was assembled somatically by joining to the rest of the V region, it would not follow that the other amino acids, if created at position 96 by intracodon recombination, would necessarily function as CDR residues in CDR3, although conceivably they might influence other residues in CDR3 conformationally so that they become contacting.
Indeed, although the discovery of the J minigene and of somatic assembly at the DNA level between the 12th d of embryonic life and the adult (12, 15, 17, 18, 26) is a seminal development in molecular genetics, it has tended to focus attention on position 96 and away from the other CDR residues that are crucial to the generation of antibody diversity, of idiotypic specificity, and of their interrelations.

Summary

Amino acid sequences of rabbit light chains show considerable evidence of independent assortment of framework (FR) and complementarity-determining (CDR) segments. This suggests that they are coded for by independent genetic units (minigenes) and that individual light chains are assembled somatically by recombining these units. Identical FR sets with multiple members generally comprise chains with different specificities, whereas identical CDR sets tend to have chains of a single specificity. A J segment, which, by analogy with mouse light chains, is made up of the last two residues of CDR3 plus all of FR4, contained 18 different sets and could contribute to diversity generated by CDR3.

The longest segment, FR3, had a very large number of sets. Evidence is presented showing that the number of sets could be substantially reduced by permitting FR3 to be formed by two independently assorting segments comprising residues 57-68 and 69-88.

Received for publication 22 February 1980.

References

1. Wu, T. T., and E. A. Kabat. 1970. An analysis of the sequences of the variable regions of Bence Jones proteins and myeloma light chains and their implications for antibody complementarity. J. Exp. Med. 132:211.

2. Kabat, E. A., T. T. Wu, and H. Bilofsky. 1976. Variable Regions of Immunoglobulin Chains, Tabulations and Analyses of Amino Acid Sequences. Bolt Beranek and Newman Inc., Cambridge, Mass.

3. Kabat, E. A. 1978. The structural basis of antibody complementarity. Adv. Protein Chem. 32:1.

4. Kabat, E. A., and T. T. Wu. 1971. Attempts to locate complementarity-determining residues in the variable positions of light and heavy chains of immunoglobulins. Ann. N. Y. Acad. Sci. 190:382.

5. Davies, D. R., E. A. Padlan, and D. Segal. 1975. Immunoglobulin structures at high resolution. Contemp. Top. Mol. Immunol. 4:127.

6. Padlan, E. A. 1977. Structural basis for the specificity of antigen-antibody reactions and structural mechanisms for the diversification of antigen-binding specificities. Q. Rev. Biophys. 10:35.

7. Saul, F. A., L. M. Amzel, and R. J. Poljak. 1978. Preliminary refinement and structural analysis of the Fab' fragment from human immunoglobulin New at 2.0 Å resolution. J. Biol. Chem. 253:585.

8. Fehlhammer, H., M. Schiffer, O. Epp, P. M. Colman, E. E. Lattman, and W. Steigemann. 1975. The structural determination of the variable portion of the Bence-Jones protein Au. Biophys. Struct. Mech. 1:139.

9. Epp, O., E. E. Lattman, M. Schiffer, R. Huber, and W. Palm. 1975. The molecular structure of a dimer composed of the variable portions of the Bence-Jones protein REI refined at 2.0 Å resolution. Biochemistry. 14:4943.

10. Edmundson, A. B., K. R. Ely, R. L. Girling, E. E. Abola, M. Schiffer, F. A. Westholm, M. D. Fausch, and H. F. Deutsch. 1974. Binding of 2,4-dinitrophenyl compounds and other small molecules to a crystalline λ-type Bence Jones dimer. Biochemistry. 13:3816.

11. Wang, B. C., C. S. Yoo, and M. Sax. 1979. Crystal structure of Bence Jones protein Rhe (3 Å) and its unique domain-domain association. J. Mol. Biol. 129:657.

12. Kabat, E. A., T. T. Wu, and H. Bilofsky. 1978. Variable region genes for the immunoglobulin framework are assembled from small segments of DNA-a hypothesis. Proc. Natl. Acad. Sci. U. S. A. 75:2429.

13. Kabat, E. A., T. T. Wu, and H. Bilofsky. 1979. Evidence supporting somatic assembly of the DNA segments (minigenes), coding for the framework, and complementarity-determining segments of immunoglobulin variable regions. J. Exp. Med. 149:1299.

14. Tonegawa, S., A. M. Maxam, R. Tizard, O. Bernard, and W. Gilbert. 1978. Sequence of a mouse germ-line gene for a variable region of an immunoglobulin light chain. Proc. Natl. Acad. Sci. U. S. A. 75:1485.

15. Brack, C., M. Hirama, R. Lenhard-Schuller, and S. Tonegawa. 1978. A complete immunoglobulin gene is created by somatic recombination. Cell. 15:1.

16. Lenhard-Schuller, R., B. Hohn, C. Brack, M. Hirama, and S. Tonegawa. 1978. DNA clones containing immunoglobulin κ chain genes isolated by in vitro packaging into phage λ coats. Proc. Natl. Acad. Sci. U. S. A. 75:4709.


17. Seidman, J. G., and P. Leder. 1978. The arrangement and rearrangement of antibody genes. Nature (Lond.). 276:790.

18. Seidman, J. G., E. E. Max, and P. Leder. 1979. A κ-immunoglobulin gene is formed by site-specific recombination without further somatic mutation. Nature (Lond.). 280:370.

19. Sakano, H., K. Hüppi, G. Heinrich, and S. Tonegawa. 1979. Sequences at the somatic recombination sites of immunoglobulin light-chain gene. Nature (Lond.). 280:288.

20. Max, E. E., J. G. Seidman, and P. Leder. 1979. Sequences of five potential recombination sites encoded close to an immunoglobulin κ constant region gene. Proc. Natl. Acad. Sci. U. S. A. 76:3450.

21. Frangione, B., and E. C. Franklin. 1979. Split immunoglobulin genes and human heavy chain deletion mutants. J. Immunol. 122:1177.

22. Tucker, P. W., K. B. Marcu, N. Newell, J. Richards, and F. R. Blattner. 1979. Sequence of the cloned gene for the constant region of murine γ2b immunoglobulin heavy chain. Science (Wash. D. C.). 206:1303.

23. Honjo, T., M. Obata, Y. Yamawaki-Kataoka, T. Kataoka, T. Kawakami, N. Takahashi, and Y. Mano. 1979. Cloning and complete nucleotide sequence of mouse immunoglobulin γ1 chain gene. Cell. 18:559.

24. Weigert, M., L. Gatmaitan, E. Loh, J. Schilling, and L. Hood. 1978. Rearrangement of genetic information may produce immunoglobulin diversity. Nature (Lond.). 276:785.

25. Kabat, E. A., T. T. Wu, and H. Bilofsky. 1979. Sequences of Immunoglobulin Chains. National Institutes of Health Publication 80-2008.

26. Bernard, O., N. Hozumi, and S. Tonegawa. 1978. Sequences of mouse immunoglobulin light chain genes before and after somatic changes. Cell. 15:1133.

27. Wu, T. T., E. A. Kabat, and H. Bilofsky. 1979. Some sequence similarities among cloned mouse DNA segments that code for λ and κ light chains of immunoglobulins. Proc. Natl. Acad. Sci. U. S. A. 76:4617.

28. Braun, D. G., and H. Huser. 1977. Rabbit antipolysaccharide antibodies: structure and genetics. Progr. Immunol. 3:255.

29. Braun, D. G., H. Huser, and W. F. Riesen. 1976. Rabbit antibody light chains: selective breeding narrows variability in framework and complementarity determining residues. Eur. J. Immunol. 6:570.

30. Brandt, D. Ch., and J.-C. Jaton. 1978. Identical VL region sequences of two antibodies from two outbred rabbits exhibiting complete idiotypic cross-reactivity and probably the same antigen binding site fine structure. J. Immunol. 121:1194.

31. Braun, D. G., H. Huser, R. Knecht, and J. K. Wright. 1979. Amino acid sequences of antibody light chain variable regions of pedigreed rabbits: K light chain K49-501 (allotype b4) anti-streptococcal group A-variant polysaccharide antibody. Hoppe-Seyler's Z. Physiol. Chem. 230:663.

32. Kabat, E. A. 1979. Implications of the assortment of framework segments for the assembly of immunoglobulin VL and VH regions and the generation of diversity. In Cells of Immunoglobulin Synthesis. B. Pernis and H. J. Vogel, editors. Academic Press, Inc., New York. 33.

33. Schilling, J., B. Clevinger, J. M. Davie, and L. Hood. 1980. Amino acid sequence of homogeneous antibodies to dextran and DNA rearrangements in heavy chain V-region gene segments. Nature (Lond.). 283:35.

34. Thunberg, A. L., and T. J. Kindt. 1976. Amino acid sequence of rabbit light chains: variable region of a light chain from a homogeneous immunoglobulin raised by streptococcal immunization. Biochemistry. 15:1381.

35. Mage, R. G. The phenotypic expression of rabbit immunoglobulins: a model of complex regulated gene expression and cellular differentiation. Contemp. Top. Mol. Immunol. In press.


36. Haber, E., M. N. Margolies, L. E. Cannon, and M. S. Rosemblatt. 1975. Restricted clonal responses: a tool in understanding antibody specificity. Miami Winter Symp. 9:393.

37. Davies, D. R., and E. A. Padlan. 1976. Correlations between antigen binding specificity and the three-dimensional structure of the antibody combining site. In Antibodies in Human Diagnosis and Therapy. E. Haber and R. M. Krause, editors. Raven Press, New York. 119.

38. Padlan, E. A., D. R. Davies, I. Pecht, D. Givol, and C. Wright. 1976. Model building studies of antigen-binding sites: the hapten-binding site of MOPC 315. Cold Spring Harbor Symp. Quant. Biol. 41:627.

39. Dwek, R. A., S. Wain-Hobson, S. Dower, P. Gettins, B. Sutton, and S. J. Perkins. 1977. Structure of an antibody combining site by magnetic resonance. Nature (Lond.). 266:31.

40. Padlan, E. A., D. R. Davies, S. Rudikoff, and M. Potter. 1976. Structural basis for the specificity of phosphorylcholine-binding immunoglobulins. Immunochemistry. 13:945.