The Nature of Introns 4-7 Largely Reflects the Lineage Specificity of HLA-A Alleles


Abstract For most HLA-A alleles the phylogeny of the 3′ non-coding regions has not yet been studied systematically. In this study, we have determined the sequences of introns 4–7 in 50 HLA-A variants, and have computed nucleotide substitution rates and phylogenetic relationships. The A2/A28, A9, and A10 groups were characterized by clear lineage specificity. For the A19 group, lineage specificity was weaker. A*3001 clustered together with the alleles of the A1/A3/A11/A36 serological family, but not with the A19 group alleles. Reduced lineage specificity was also observed for the alleles of the A1/A3/A11/A36 groups. The 3′ intron sequences of A*8001 were clearly distinct from all other alleles studied. In several cases two allelic groups shared identical intron sequences, whereby the patterns varied with the introns. A similar situation has been previously described for the 5′ introns. Since recombination is the major mechanism of HLA diversification, the intronic lineage specificity corresponds to the comparatively lower recombination rate of the HLA-A 3′ exons. The low level of recombination within the 3′ region of HLA-A is supported by the low CpG content with a maximum of 3.0% in this region compared with up to 10.7% in the 5′ region. Apart from phylogenetic studies of HLA diversity and diversification, the sequence data obtained in our study may prove valuable for the development of a haplotype-specific sequencing strategy for the HLA-A 3′ exons and for the explanation of recombination events in newly described HLA class I alleles.

Keywords HLA-A · MHC · Gene rearrangement · Intron · Diversity

Introduction

The human class I major histocompatibility complex (MHC) gene HLA-A is characterized by an enormous number of polymorphisms (Marsh et al. 2001) that in January 2001 comprised 239 alleles (Robinson et al. 2000). This polymorphism is mainly concentrated in the antigen-presenting α1 and α2 domains, which are encoded by exons 2 and 3, respectively, and is regarded as an evolutionary advantage for the formation of immune responses against pathogens (Bjorkman and Parham 1990; Parham et al. 1995). According to their serological reactivity, HLA-A alleles have been grouped into five lineages: A2/A28, A1/A3/A11/A36, A9, A10, and A19 (Dausset 1971; Kato et al. 1989; Lawlor et al. 1990; Lopez de Castro et al. 1982; Madrigal et al. 1993). With the introduction of molecular methods, the grouping of the respective lineages was confirmed at the nucleotide sequence level. Corresponding to the relationship between exons, strong lineage specificity was found in introns 1, 2, and 3 (Blasczyk et al. 1996, 1997; Cereb et al. 1996a, b; Kotsch et al. 1997; Meyer and Blasczyk 2000), which provided the prerequisites for a haplotype-specific sequencing strategy of HLA-A (Kotsch et al. 1997). In contrast to the 5′ region of the HLA-A gene, defined here as the part which comprises exons 1–3 and introns 1–3, only limited sequence data (Crew 1997; Summers et al. 1993) exist from the 3′ end (exons 4–8, introns 4–7), encoding the α3, transmembrane and cytoplasmic domains (Strachan et al. 1984). To obtain further information about the nature of the 3′ intronic sequences, introns 4–7 of 81 HLA-A gene copies, each representing at least one variant of each HLA-A allelic group, were sequenced following haplotype-specific amplification. Using these sequences, phylogenetic analyses were performed, and evolutionary rates for introns 4–7 were computed. Furthermore, evolutionary parameters including recombination rates were calculated throughout the gene for 30 alleles, of which coding and non-coding sequences from exon 1 through intron 7 are known.

H.-A. Elsner · R. Blasczyk (✉)
Department of Transfusion Medicine, Hannover Medical School, Carl-Neuberg-Strasse 1, 30625 Hannover, Germany
e-mail: [email protected]
Tel.: +49-511-5326700, Fax: +49-511-5322079

J. Rozas
Departament de Genètica, Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain

Immunogenetics (2002) 54:447–462
DOI 10.1007/s00251-002-0491-3

ORIGINAL PAPER

Holger-Andreas Elsner · Julio Rozas · Rainer Blasczyk

The nature of introns 4–7 largely reflects the lineage specificity of HLA-A alleles

Received: 12 February 2002 / Accepted: 22 July 2002 / Published online: 12 September 2002
© Springer-Verlag 2002


Table 1 DNA samples used in this study. The four intron columns give GenBank accession numbers; a hyphen marks an empty cell

No. | Source (sample no.) | Allele | Sample no. | Ethnic origin | Intron 4 | Intron 5 | Intron 6 | Intron 7 | Cell line d
1 | - | A*01011 | MHH 996724 | Caucasian (Germany) | AJ458203 | AJ491721 | AJ490826 | AJ490941 | -
2 | DZA 1999 (#5) a | A*01011 | MHH 994938 | Caucasian (Germany) | - | - | - | - | -
3 | DZA 1999 (#18) a | A*01011 | MHH 994951 | Caucasian (Germany) | - | - | - | - | OLH (=E20399904)
4 | UCLA 1997 (#001) b | A*02011 | 970551B | Unknown | AJ458204 | AJ491722 | AJ490827 | AJ490942 | -
5 | - | A*02011 | MHH 997061 | Caucasian | - | - | - | - | -
6 | UCLA 1999 (#77) b | A*0202 | MHH 992881 | Asian/Black | AJ458205 | AJ491723 | AJ490828 | AJ490943 | -
7 | DZA 1999 (#15) a | A*0203 | MHH 994948 | Asian (Korea) | AJ458206 | AJ491724 | AJ490829 | AJ490944 | KON (=E29962804)
8 | DZA 1999 (#19) a | A*0205 | MHH 994952 | Caucasian (Germany) | AJ458207 | AJ491725 | AJ490830 | AJ490945 | STF (=E20399906)
9 | UCLA 1996 (#889) b | A*0206 | 961074B | Oriental | AJ458208 | AJ491726 | AJ490831 | AJ490946 | -
10 | UCLA 1999 (#100) b | A*0207 | MHH 997437 | Asian (China) | AJ458209 | AJ491727 | AJ490832 | AJ490947 | -
11 | UCLA 1999 (#96) b | A*0211 | MHH 996266 | Asian Indian | AJ458210 | AJ491728 | AJ490833 | AJ490948 | -
12 | - | A*0217 | 960673B | Caucasian | AJ458211 | AJ491729 | AJ490834 | AJ490949 | -
13 | UCLA 1997 (#001) b | A*0225 | 970551B | Unknown | AJ458212 | AJ491730 | AJ490835 | AJ490950 | -
14 | - | A*0227 | 970129B | Caucasian | AJ458213 | AJ491731 | AJ490836 | AJ490951 | -
15 | - | A*03011 | MHH 996849 | Caucasian (Germany) | AJ458214 | AJ491732 | AJ490837 | AJ490952 | -
16 | UCLA 1999 (#83) b | A*03011 | MHH 993053 | Caucasian | - | - | - | - | -
17 | DZA 1999 (#7) a | A*03011 | MHH 994940 | Caucasian | - | - | - | - | -
18 | - | A*1101 | MHH 996662 | Caucasian (Germany) | AJ458215 | AJ491733 | AJ490838 | AJ490953 | -
19 | DZA 1999 (#17) a | A*1101 | MHH 994950 | Unknown | - | - | - | - | ALJ (=E20399905)
20 | - | A*1102 | 960895B | - | AJ458216 | AJ491734 | AJ490839 | AJ490954 | -
21 | DZA 1999 (#11) a | A*2301 | MHH 994944 | Caucasian (Germany) | AJ458217 | AJ491735 | AJ490840 | AJ490955 | BNA (=E21706203)
22 | - | A*2301 | MHH 990704 | Caucasian | - | - | - | - | -
23 | UCLA 1999 (#85) b | A*2301 | MHH 994260 | Black | - | - | - | - | -
24 | - | A*2301 | MHH 999546 | Black (Nicaragua) | - | - | - | - | -
25 | UCLA 1999 (#94) b | A*2301 | MHH 996264 | Black | - | - | - | - | -
26 | - | A*2301 | MHH 997444 | Caucasian (Germany) | - | - | - | - | -
27 | UCLA 1999 (#73) b | A*2402101 | MHH 990557 | Caucasian | AJ458218 | AJ491736 | AJ490841 | AJ490956 | -
28 | - | A*2402101 | MHH 996833 | Caucasian (Germany) | - | - | - | - | -
29 | - | A*24031 | MHH 003974 | Caucasian | AJ458219 | AJ491737 | AJ490842 | AJ490957 | -
30 | DZA 1999 (#19) a | A*2405 | MHH 994952 | Caucasian (Germany) | AJ458220 | AJ491738 | AJ490843 | AJ490958 | STF (=E20399906)
31 | UCLA 1997 (#889) b | A*2407 | 961074B | Oriental | AJ458221 | AJ491739 | AJ490844 | AJ490959 | -
32 | - | A*2410 | 970130B | Unknown | AJ458222 | AJ491740 | AJ490845 | AJ490960 | -
33 | - | A*2416 | MHH 990512 | Caucasian (Germany) | AJ458223 | AJ491741 | AJ490846 | AJ490961 | THOC (=E29980601)
34 | UCLA 1999 (#122) b | A*2417 | MHH 003284 | Asian | AJ458224 | AJ491742 | AJ490847 | AJ490962 | -
35 | - | A*2418 | MHH 9910611 | Caucasian (Germany) | AJ458225 | AJ491743 | AJ490848 | AJ490963 | -
36 | - | A*2501 | MHH 996298 | Caucasian (Turkey) | AJ458226 | AJ491744 | AJ490849 | AJ490964 | -
37 | DZA 1999 (#11) a | A*2501 | MHH 994944 | Caucasian (Germany) | - | - | - | - | BNA (=E21706203)
38 | DZA 1999 (#5) a | A*2501 | MHH 994938 | Caucasian (Germany) | - | - | - | - | -
39 | - | A*2601 | MHH 996303 | Caucasian | AJ458227 | AJ491745 | AJ490850 | AJ490965 | -
40 | - | A*2601 | MHH 990704 | Caucasian | - | - | - | - | -
41 | - | A*2901 | MHH 006933 | Caucasian | AJ458228 | AJ491746 | AJ490851 | AJ490966 | -
42 | - | A*2902 | MHH 994912 | Caucasian (Germany) | AJ458229 | AJ491747 | AJ490852 | AJ490967 | -
43 | - | A*2902 | MHH 996064 | Caucasian (Germany) | - | - | - | - | -
44 | - | A*2902 | MHH 996662 | Caucasian (Germany) | - | - | - | - | -
45 | UCLA 1999 (#83) b | A*3001 | MHH 993054 | Black | AJ458230 | AJ491748 | AJ490853 | AJ490968 | -
46 | UCLA 1999 (#75) b | A*3001 | MHH 990559 | Black | - | - | - | - | -
47 | - | A*3001 | 981777 | Unknown | - | - | - | - | -
48 | - | A*3002 | 998645 | Unknown | AJ458231 | AJ491749 | AJ490854 | AJ490969 | -
49 | - | A*3002 | 981777 | Unknown | - | - | - | - | -
50 | - | A*3004 | 94/450B | Unknown | AJ458255 | AJ491772 | AJ490878 | AJ490993 | -
51 | UCLA 1999 (#104) b | A*31012 | MHH 997975 | American Indian | AJ458232 | AJ491750 | AJ490855 | AJ490970 | -
52 | - | A*31012 | MHH 997063 | Caucasian | - | - | - | - | -
53 | - | A*3201 | MHH 996764 | Caucasian (Germany) | AJ458233 | AJ491751 | AJ490856 | AJ490971 | -
54 | UCLA 1999 (#106) b | A*3201 | MHH 998644 | Caucasian | - | - | - | - | -
55 | UCLA 1999 (#91) b | A*3201 | MHH 995200 | Caucasian | - | - | - | - | -
56 | UCLA 1999 (#92) b | A*3301 | MHH 995201 | Hispanic | AJ458234 | AJ491752 | AJ490857 | AJ490972 | -
57 | - | A*3303 | 960155B | Unknown | AJ458235 | AJ491753 | AJ490858 | AJ490973 | -
58 | - | A*3303 | MHH 996183 | Caucasian (Germany) | - | - | - | - | -
59 | ET 1999 (#1504) c | A*3305 | MHH 998965 | Unknown | AJ458236 | AJ491754 | AJ490859 | AJ490974 | -
60 | UCLA 1999 (#95) b | A*3402 | MHH 996265 | Black | AJ458237 | AJ491755 | AJ490860 | AJ490975 | -
61 | - | A*3402 | MHH 996794 | Caucasian (Germany) | - | - | - | - | -
62 | ET 1999 (#99–08) c | A*3601 | MHH 993141 | Unknown | AJ458238 | AJ491756 | AJ490861 | AJ490976 | -
63 | - | A*4301 | 961375B | Black | AJ458239 | AJ491757 | AJ490862 | AJ490977 | -
64 | - | A*6601 | MHH 994050 | Unknown | AJ458240 | AJ491758 | AJ490863 | AJ490978 | -
65 | - | A*6601 | MHH 003617 | Caucasian | - | - | - | - | -
66 | - | A*6601 | MHH 999338 | Caucasian (Ethiopia) | - | - | - | - | -
67 | UCLA 1999 (#102) b | A*6602 | MHH 997973 | Black | AJ458241 | AJ491774 | AJ490864 | AJ490979 | -
68 | UCLA 1999 (#83) b | A*68011 | MHH 993054 | Black | AJ458242 | AJ491759 | AJ490865 | AJ490980 | -
69 | - | A*68011 | 961375B | Black | - | - | - | - | -
70 | - | A*68012 | MHH 997060 | Caucasian | AJ458243 | AJ491760 | AJ490866 | AJ490981 | -
71 | DZA 1999 (#18) a | A*68012 | MHH 994951 | Caucasian (Germany) | - | - | - | - | OLH (=E20399904)
72 | UCLA 1999 (#101) b | A*6802 | MHH 997972 | Unknown | AJ458244 | AJ491761 | AJ490867 | AJ490982 | -
73 | - | A*6813 | MHH 990773 | Caucasian (Syria) | AJ458245 | AJ491762 | AJ490868 | AJ490983 | HFAR (=E29964403)
74 | - | A*6901 | 970343B | Unknown | AJ458246 | AJ491763 | AJ490869 | AJ490984 | -
75 | UCLA 1999 (#83) b | A*6901 | MHH 993053 | Caucasian | - | - | - | - | -
76 | UCLA 1999 (#84) b | A*7401 | MHH 993055 | Black | AJ458247 | AJ491764 | AJ490870 | AJ490985 | -
77 | UCLA 1999 (#75) b | A*7401 | MHH 990559 | Black | - | - | - | - | -
78 | - | A*7402 | 951065B | Unknown | AJ458248 | AJ491765 | AJ490871 | AJ490986 | -
79 | - | A*7403 | MHH 002958 | Caucasian | AJ458249 | AJ491766 | AJ490872 | AJ490987 | -
80 | UCLA 1999 (#103) b | A*8001 | MHH 997974 | Black | AJ458250 | AJ491767 | AJ490873 | AJ490988 | -
81 | UCLA 1999 (#82) b | A*8001 | MHH 993052 | Unknown | - | - | - | - | -

If not otherwise indicated, samples were obtained from patients of the regional transplant program
a DZA = German DNA Exchange
b University of California Los Angeles DNA Extract-Class I Typing Exchange, and Cell Exchange
c DNA Quality Control Exercise, and Quality Control Exercise, Eurotransplant Leiden
d Cell lines are deposited at the European Collection for Biomedical Research, Essen, Germany

Table 2 Sequence, length, and melting temperature (Tm) of the amplification primers used in this study. The 3′UTR-common primer was derived from the primer HLA3UTA (Domena et al. 1993b). Numbering refers to a published consensus sequence (Summers et al. 1993). The consensus sequence contains three gaps, which explains the discrepancy between localization and primer length.

Primer | Orientation | Sequence | Length | Tm (°C) | Position | Localization | Amplified HLA-A allelic groups
I3-group1 | Sense | 5′ TgA ATT TTC TgA CTC TTC CCg T 3′ | 22 | 62 | 578–599 | Intron 3 | *01, *03, *11, *23, *24, *30, *36, *80
I3-group2 | Sense | 5′ ACA gAT gCA AAA TgC CTg AAT g 3′ | 22 | 62 | 562–583 | Intron 3 | *02, *2416, *25, *26, *29, *31, *32, *33, *34, *43, *66, *68, *69, *74
3′UTR-common | Antisense | 5′ CAC Agg TCA gCg Tgg gAA G 3′ | 19 | 62 | 219–240 a) | 3′ UTR | Generic primer

Materials and methods

DNA samples and identification of alleles

In total, 81 HLA-A gene copies representing 50 alleles (Table 1) were sequenced. DNA samples were from 69 individuals. From 56 individuals only one of the two HLA-A gene copies, and from 13 individuals both gene copies, were examined. In some cases, multiple representatives of the same alleles were sequenced, whereby samples of different ethnic origins were chosen, when possible.

Alleles were identified by sequencing of exons 2 and 3, as previously described (Kotsch et al. 1997). Sequencing of additional parts of the gene was necessary for unambiguous identification of A*01011 (exon 4), A*0207 (exon 4), A*0211 (exon 4), A*02172 (exon 2), and A*2402101 (intron 2).

PCR amplification and sequencing

Cycle-sequencing of introns 4–7 was performed using an Applied Biosystems 377 sequencer by the method previously described (Kotsch et al. 1997). All PCR products spanned from intron 3 through the 3′UTR. For HLA-A heterozygous samples haplotype-specificity was achieved by the selection of samples bearing alleles that belong to different amplification groups. Amplification primers and the respective amplification groups are given in Table 2.
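The sample-selection rule described above can be sketched in a few lines of Python. This is only an illustration, not the authors' procedure: the group memberships are transcribed from Table 2, while the function names `amplification_group` and `separable` and the name-parsing rule (first two digits for the allelic group, first four digits for the single listed exception A*2416) are assumptions made for this example.

```python
# Amplification groups transcribed from Table 2; helper names are hypothetical.
GROUP1 = {"01", "03", "11", "23", "24", "30", "36", "80"}
GROUP2 = {"02", "25", "26", "29", "31", "32", "33", "34",
          "43", "66", "68", "69", "74"}
# Table 2 lists A*2416 under the group-2 primer, unlike the other A*24 alleles.
ALLELE_EXCEPTIONS = {"2416": 2}


def amplification_group(allele: str) -> int:
    """Return 1 or 2 for an allele name such as 'A*2402101'."""
    digits = allele.split("*", 1)[-1]
    if digits[:4] in ALLELE_EXCEPTIONS:
        return ALLELE_EXCEPTIONS[digits[:4]]
    family = digits[:2]
    if family in GROUP1:
        return 1
    if family in GROUP2:
        return 2
    raise ValueError(f"no amplification group listed for {allele}")


def separable(allele_a: str, allele_b: str) -> bool:
    """True if the two alleles of a heterozygous sample fall into
    different amplification groups and can be amplified separately."""
    return amplification_group(allele_a) != amplification_group(allele_b)
```

For instance, sample MHH 994952 in Table 1 carries A*0205 and A*2405; under the assumptions above, `separable("A*0205", "A*2405")` evaluates to true, which is consistent with both gene copies of that sample having been sequenced.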

Phylogenetic analysis and determination of evolutionary rates

Phylogenetic trees were constructed based on the intronic sequences of the 50 alleles examined in this study. Evolutionary parameters were calculated for those 30 alleles (A*01011, *02011, *0203, *0205, *03011, *1101, *2301, *2402101, *24031, *2407, *2501, *2601, *2901, *2902, *3001, *3002, *31012, *3201, *3301, *3303, *3402, *3601, *4301, *6601, *6602, *68011, *6802, *6901, *7401, and *8001), from which sequence data of introns 1–3, and exons 1–7 were also available. Sequence data of introns 1–3 were taken from two previous studies (Cereb et al. 1996a, b; Kotsch et al. 1997), and data of exons 1–7 were obtained from the IMGT/HLA Sequence Database (Robinson et al. 2000).

Sequences were aligned using the software CLUSTAL W (Thompson et al. 1994) running under the graphical interface program BioEdit 4.8.8 (Hall 1999), and were, in part, re-edited manually. Phylogenetic trees were constructed by the neighbor-joining method (Saitou and Nei 1987), using the software Treecon version 1.3b (Van de Peer and De Wachter 1994). The number of nucleotide substitutions was calculated using the software K-Estimator 5.5 (Comeron 1999); for coding and non-coding sequences the number of nucleotide substitutions was evaluated based on the algorithm described by Kimura (1980). The average number and standard deviation of substitutions was calculated in an Excel 97 (Microsoft, Redmond, Wash.) calculation table from the results of the pairwise comparisons performed with K-Estimator 5.5. The population recombination parameter R (R=4Nr, where N is the effective population size and r is the recombination rate between the most distant sites and per generation), was estimated from Rm – the minimum number of recombination events observed in the sample (Hudson and Kaplan 1985) – by using computer simulations based on the coalescent (500 replicates). This analysis was conducted fixing the sample size (n=30) and the number of segregating sites by using the DnaSP 3.50 software (Rozas and Rozas 1999).

Fig. 1 a Alignment of HLA-A intron 4. b Alignment of HLA-A intron 5. c Alignment of HLA-A intron 6. d Alignment of HLA-A intron 7. Dashes denote identity with the first sequence. Gaps are represented by asterisks
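For readers unfamiliar with the Kimura (1980) correction, the following Python sketch shows the two-parameter distance for a single pair of aligned non-coding sequences, K = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. It is a minimal illustration only and does not reproduce K-Estimator 5.5, which also handles coding sequences and confidence intervals; the function name `kimura_2p` and the decision to skip any column containing a gap or ambiguous base are assumptions of this example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p(seq_a: str, seq_b: str) -> float:
    """Kimura two-parameter distance between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: skip gaps (e.g. '*' or '-') and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the correction to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

Averaging this value over all pairs of alleles for one intron would give a mean K comparable in kind to the values reported in Table 3, although exact agreement with the published figures is not claimed here.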


Results

Sequence alignments of introns 4–7 are shown in Figs. 1a–d. Phylogenetic trees of introns 4–7 (group 1) are given in Figs. 2a–d. Length, number of nucleotide substitutions per site, recombination rates, GC and CpG content are given in Table 3.

The A2/A28 family is characterized by strong group specificity in the 3′ introns, since all alleles share identical sequences in introns 4–6, and since there is only a single nucleotide difference between A*68012 and the other alleles of this group involving position 41 of intron 7 (Figs. 1d and 2d).

Likewise, the A10 family, although in its 5′ region a serologically diverse family of antigens (Madrigal et al. 1993), demonstrates clear group specificity in its 3′ introns. In introns 4, 5, and 7, all alleles have identical intron sequences; and in intron 6 there are two clusters (first cluster: A*2501, *2601, *2603, *4301, *6601; second cluster: A*3402, *6602), which yet differ only by a single nucleotide exchange at position 100.

With regard to the serological A19 family, the situation is more complex. The two A*29 and the three A*33 variants studied shared identical sequences within the respective allelic groups in all 3′ introns. However, apart from exons 6 and 7 (data not shown), the A19 family is split into several clusters, the compositions of which vary with exons and introns. For example, in intron 4 the A*29 sequence is identical with A*31012, A*33, and A*2416 (see below), whereas in intron 5 the A*29 alleles share their sequence with A*3201 and A*7401. Another important finding is that in all exons and introns examined in this study, A*30 variants cluster together with the A1/A3/A11/A36 group, clearly distinct from the other A19 variants. This result furnishes additional evidence that A*30 rather belongs to the A1/A3/A11/A36 family than to the A19 group (Kato et al. 1989). Interestingly, A*3001, A*3002, and A*3004 share identical sequences only in introns 4 and 6, but have different sequences each in intron 5 (positions 207 and 222), whereas in intron 7 the allele A*3002 differs from A*3001 and A*3004 at position 41. This finding is concordant with the idea that A*30 alleles appear to have been involved in more genetic division than any other alleles of the A1/A3/A11 or A19 groups, and that A*30 may represent an atypical group in which the rate of gene conversions or mutations is unusually high (Kato et al. 1989). Furthermore, the study of A19 sequences reveals that A*2416 clusters together with A*31012 and A*3301 in introns 4–7. This result confirms the finding that A*2416 originated by conversion between a common A9 allele (donor) and A*3101 (acceptor), as previously concluded from sequencing data of exons 1–4, and introns 1–3 (Binder et al. 2000). Apart from A*2416, all alleles of the serological A9 family, i.e. A*2301 and seven A*24 alleles, share identical sequences in all 3′ introns.

HLA-A1, A3, A11 and A36 have been described to form a further cross-reactive group associated with the public markers P01, P11, and P93 (Duquesnoy et al. 1990). In our study, alleles of this group were found clustered together in introns 4 and 6, whereas they were grouped into different branches at introns 5 and 7 (Figs. 2b and 2d). Sequencing of intron 5 confirmed a unique nucleotide exchange (G>A) at position 19 in both A*11 variants studied (A*1101 and A*1102), which has been recently described to cause a second donor splice site (Tijssen et al. 2000). Both A*11 alleles studied share identical sequences in all 3′ introns.

The phylogenetic analysis of HLA-A*8001, which was described for the first time in Afro-Americans expressing a unique HLA-A serologic specificity (Domena et al. 1993a), revealed that this allele represents a distinct family in the 3′ introns, which is most related to alleles of the A3 and A9 families. This finding thus reflects the exonic phylogeny of A*8001 (Domena et al. 1993a), and demonstrates the lineage specificity of this HLA-A variant.

In several cases the intronic homogeneity within groups extends to homogeneity between groups. In intron 4, inter-group homogeneity can be observed for the A2/A28 family sharing its sequence with the A10 family; and for the A9 family having an identical sequence with the A1/A3/A11/A36 group (Fig. 2a). In intron 6 the A2/A28 group shares its sequence with the A1/A3/A11/A36 group; and the sequences of the A10 group are identical with A*29, A*3201, and A*74 (Fig. 2c). In intron 7, the A10 sequence is identical with that of the A19 group, whereas in intron 5, families are characterized by individual sequences (Fig. 2d).

Discussion

An important feature of the HLA-A 3′ introns is their group-specific conservation, since apart from the A19 group few variations within allelic groups and families were observed. This conservation within groups is in contrast to the pattern of changing relations between groups. With the exception of the most diverse intron 5, there are at least two families that share identical sequences, and the way these groups are related varies with the introns. A similar pattern of group-specific conservation and variation of relationships among lineages was also observed in the HLA-A 5′ introns (Meyer and Blasczyk 2000). As discussed below, this finding indicates that recombination was not restricted to the 5′ part of HLA-A but took place, to a lesser extent, also in the 3′ part of the gene.

Previous studies on the phylogenetic relationship of exons of the HLA-A families revealed that they can be grouped into two major lineages. The HLA-A2 lineage comprises the A2, A10 and A19 families; and the HLA-A3 lineage consists of the A9, A80, and A1/A3/A11/A30 families, although the latter group might rather form four separate families than a single one (Adams et al. 2000; Domena et al. 1993a; Kato et al. 1989). The 3′ intron data obtained in the present study support this concept of bifurcation since, with the exception of intron 6, where the A2/A28 and A1/A3/A11/A36 families share identical sequences, only families of the respective lineage cluster at the same branch. Since the groups that share identical sequences vary with the introns, this pattern might indicate homogenization of intron sequences due to interallelic recombination of the adjacent exons and subsequent genetic drift of the introns, as previously described for class II MHC genes (Hughes 2000). This idea would be consistent with the finding that, apart from intron 7, the calculated number of nucleotide substitutions (K) in the 3′ introns is generally lower than the number of synonymous nucleotide substitutions (Ks) in the adjacent exons (Table 3), whereas in the absence of recombination equal values would be expected. However, evidence against this thesis of exonic recombination and intronic homogenization is given by the relationship between Ks and the calculated number of nucleotide substitutions leading to amino acid exchanges (Ka) in the 3′ exons. Gene conversion and recombination are the predominant mechanisms that contribute to the generation of new alleles (Hughes et al. 1993; Parham et al. 1995; Parham and Ohta 1996). Due to overdominant selection, which is the major driving force for recombination in MHC genes, Ka would be expected to clearly exceed Ks in exons with high recombination rates, as can be seen for exons 2 and 3 (Comeron 1999; Hughes and Yeager 1998). Yet, in the 3′ part of HLA-A only in the short exon 7 does Ka clearly exceed Ks, whereas in exon 5 Ka exceeds Ks only by very little, and in exons 4 and 6 Ka is lower than Ks (Table 3). Taken together, this partly conflicting evidence, nucleotide substitution rates and phylogenies of the 3′ introns suggest a lower recombination activity in the 3′ compared to the 5′ regions. This is reflected in overall lineage specificity and in the recombination rates, as determined by the computer simulations based on the coalescent, which are clearly lower in the 3′ than in the 5′ exons (Table 3).
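The comparison of Ka and Ks made in the preceding paragraph can be retraced with a short calculation. The sketch below simply divides the mean Ka by the mean Ks for each exon, using the values given in Table 3; the dictionary and function names are illustrative, and because the standard deviations reported in Table 3 are large relative to the means, the ratios should be read as indicative only.

```python
# Mean (Ks, Ka) per exon, transcribed from Table 3.
EXON_RATES = {
    1: (0.0200, 0.0211),
    2: (0.0333, 0.0474),
    3: (0.0255, 0.0489),
    4: (0.0542, 0.0164),
    5: (0.0342, 0.0355),
    6: (0.0489, 0.0151),
    7: (0.0052, 0.0172),
}


def ka_ks_ratios(rates):
    """Return {exon: Ka/Ks} for a mapping of exon -> (Ks, Ka)."""
    return {exon: ka / ks for exon, (ks, ka) in rates.items()}


for exon, ratio in sorted(ka_ks_ratios(EXON_RATES).items()):
    label = "Ka > Ks" if ratio > 1 else "Ka < Ks"
    print(f"exon {exon}: Ka/Ks = {ratio:.2f} ({label})")
```

Under this simple reading, the ratio is above 1 for exons 2 and 3 and for the short exon 7, only marginally above 1 for exons 1 and 5, and below 1 for exons 4 and 6, matching the pattern described in the text.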

In contrast to the corresponding introns, exons 6 and 7 are characterized by pronounced homogeneity (data not shown). This finding is most likely to be explained by purifying selection and/or by the comparatively low number of nucleotides (33 and 48 bp, respectively), which from a statistical point of view reduces the probability of detecting polymorphisms.

Conversion of mouse MHC class I and II, and of HLA class II genes has been recently described to be associated with CpG-rich regions (Högstrand and Böhme 1999). The low CpG content (≤3%) of the 3′ introns found in our study (Table 3) compared to more than 10% in the 5′ regions may furnish evidence that the association between CpG content and gene conversion also holds true for HLA-A. In contrast to the abrupt CpG decrease downstream of exon 3 (CpG content in these parts 0–3.0%), the GC content decreases in stages from exon 1 (71.5%) to intron 7 (46.4%), which demonstrates that GC content alone is not informative with regard to the probability of recombination (Table 3).
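The two composition measures contrasted here are easy to confuse, so a small Python sketch may help. It follows the definition given in the legend of Table 3 (number of CpG dinucleotides divided by the total number of nucleotides); the function name `gc_and_cpg_percent` and the choice to discard gap and ambiguity characters before counting are assumptions of this example rather than details stated in the paper.

```python
from typing import Tuple


def gc_and_cpg_percent(sequence: str) -> Tuple[float, float]:
    """Return (GC %, CpG %) for one nucleotide sequence.

    CpG % = 100 * (number of 'CG' dinucleotides) / (number of nucleotides),
    following the Table 3 legend.
    """
    # Assumption: drop alignment gaps and ambiguous bases before counting.
    seq = "".join(base for base in sequence.upper() if base in "ACGT")
    if not seq:
        raise ValueError("sequence contains no unambiguous nucleotides")
    gc_count = sum(base in "GC" for base in seq)
    cpg_count = seq.count("CG")  # 'CG' occurrences cannot overlap
    return 100.0 * gc_count / len(seq), 100.0 * cpg_count / len(seq)
```

A sequence can be GC-rich without containing many CpG dinucleotides, which is why the two columns of Table 3 can diverge as they do between the 5′ and 3′ parts of the gene.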

In summary, the data obtained in this study demonstrate clear lineage specificity of the 3′ introns of most HLA-A families and enable a systematic survey of evolutionary data of HLA-A alleles from exon 1 through 7. Since HLA typing by generic sequencing cannot define the cis/trans linkage of heterozygous nucleotide positions, unambiguous identification of alleles and detection of new variants – apart from cloning – is only possible by sequencing after haplotype-specific amplification. For HLA-A exons 2 and 3, this sequencing strategy is a well-established method (Blasczyk et al. 1996; Kotsch et al. 1997). The polymorphisms of the 3′ introns provide the prerequisites for the development of a haplotype-specific sequencing strategy for the 3′ exons of HLA-A as well. Furthermore, the sequence data compiled in this study may prove valuable for the explanation of recombination events in newly described HLA class I alleles and for further phylogenetic studies of HLA diversity and diversification.


Table 3 Features of HLA-A introns and exons 1–7. K Mean number of nucleotide substitutions per site, standard deviation given in brackets; Ks mean number of synonymous nucleotide substitutions per site, standard deviation given in brackets; Ka mean number of non-synonymous nucleotide substitutions per site, standard deviation given in brackets; R recombination per intron and exon, respectively, based on 500 replicates; Rm minimum number of recombination events; n.a. not applicable, since Rm is equal to 0. Percentage of CpG is calculated as the number of CpG dinucleotides divided by the total number of nucleotides

Intron | Length (bp) | K | R | Rm | GC (%) | CpG (%)
1 | 128 | 0.0459 (±0.0090) | 3.7 | 1 | 76.1 | 10.4
2 | 239 | 0.0255 (±0.0047) | 7.5 | 2 | 71.8 | 9.9
3 | 599 | 0.0182 (±0.0038) | 4.7 | 2 | 55.2 | 2.3
4 | 102 | 0.0121 (±0.0026) | n.a. | 0 | 57.4 | 0.5
5 | 444 | 0.0265 (±0.0056) | 25.2 | 5 | 53.1 | 1.0
6 | 148 | 0.0232 (±0.0058) | n.a. | 0 | 53.1 | 0.3
7 | 169 | 0.0397 (±0.0085) | 3.2 | 1 | 46.4 | 1.8

Exon | Length (bp) | Ks | Ka | R | Rm | GC (%) | CpG (%)
1 | 73 | 0.0200 (±0.0115) | 0.0211 (±0.0102) | 21.7 | 1 | 71.5 | 7.1
2 | 270 | 0.0333 (±0.0155) | 0.0474 (±0.0187) | 34.6 | 6 | 65.9 | 10.7
3 | 276 | 0.0255 (±0.0102) | 0.0489 (±0.0188) | 59.7 | 7 | 65.2 | 10.0
4 | 276 | 0.0542 (±0.0238) | 0.0164 (±0.0075) | 17.0 | 3 | 62.5 | 2.1
5 | 117 | 0.0342 (±0.0159) | 0.0355 (±0.0153) | 17.1 | 2 | 59.3 | 3.0
6 | 33 | 0.0489 (±0.0314) | 0.0151 (±0.0097) | n.a. | 0 | 52.7 | 0
7 | 48 | 0.0052 (±0.0076) | 0.0172 (±0.0088) | n.a. | 0 | 50.2 | 0
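The Rm column of Table 3 rests on the four-gamete test of Hudson and Kaplan (1985): if all four combinations of alleles occur at a pair of biallelic sites, at least one recombination event must have taken place between them (assuming no recurrent mutation), and Rm is the largest number of such site pairs whose intervals do not overlap. The Python sketch below illustrates that idea under simplifying assumptions (equal-length haplotypes, no gaps or missing data, only biallelic sites considered); it is not the DnaSP 3.50 implementation, and the function names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def four_gametes(column_i: Sequence[str], column_j: Sequence[str]) -> bool:
    """True if all four allele combinations occur at two biallelic sites."""
    return len(set(zip(column_i, column_j))) == 4


def minimum_recombination_events(haplotypes: List[str]) -> int:
    """Rm in the sense of Hudson and Kaplan (1985), sketched for
    aligned haplotypes of equal length without missing data."""
    n_sites = len(haplotypes[0])
    columns = [[h[s] for h in haplotypes] for s in range(n_sites)]
    # Only biallelic segregating sites enter the four-gamete test.
    biallelic = [s for s in range(n_sites) if len(set(columns[s])) == 2]
    intervals = [
        (i, j)
        for i, j in combinations(biallelic, 2)
        if four_gametes(columns[i], columns[j])
    ]
    # Greedy choice by right end point yields the largest set of
    # non-overlapping intervals; intervals may share an end point.
    intervals.sort(key=lambda interval: interval[1])
    rm = 0
    last_end = -1
    for start, end in intervals:
        if start >= last_end:
            rm += 1
            last_end = end
    return rm
```

As a usage example, the four haplotypes `["000", "011", "101", "110"]` show all four gametes for every pair of sites, and the function reports two events, one in each of the adjacent intervals.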

Acknowledgements We would like to thank Sandra Reuter for excellent technical assistance. This study was supported by a grant from the Wilhelm Sander Foundation.

References

Adams EJ, Cooper S, Thomson G, Parham P (2000) Common chimpanzees have greater diversity than humans at two of the three highly polymorphic MHC class I genes. Immunogenetics 51:410–424

Binder T, Heym J, Horn B, Blasczyk R (2000) HLA-A*2416: a new allele of the HLA-A19 lineage. Tissue Antigens 55:178–181

Bjorkman PJ, Parham P (1990) Structure, function, and diversity of class I major histocompatibility complex molecules. Annu Rev Biochem 59:253–288

Blasczyk R, Wehling J, Weber M, Salama A (1996) Sequence analysis of the 2nd intron revealed common sequence motifs providing the means for a unique sequencing based typing protocol of the HLA-A locus. Tissue Antigens 47:102–110

Blasczyk R, Kotsch K, Wehling J (1997) The nature of polymorphism of the HLA class I non-coding regions and their contribution to the diversification of HLA. Hereditas 127:7–9

Fig. 2a–d Unrooted phylogenetic trees of introns 4–7 sequenced in this study. Analyses were based on all sequences determined in this study. Bootstrap percentages ≥75% are given at the internodes. When all variants of an allelic group clustered at the same branch of the tree, only the first variant of the respective allelic group is given. a Phylogenetic analysis of intron 4. b Phylogenetic analysis of intron 5. c Phylogenetic analysis of intron 6. d Phylogenetic analysis of intron 7

Cereb N, Kong Y, Lee S, Maye P, Yang SY (1996a) Nucleotide sequences of MHC class I introns 1, 2 and 3 in humans and intron 2 in nonhuman primates. Tissue Antigens 47:498–511

Cereb N, Kong Y, Lee S, Maye P, Yang SY (1996b) Erratum. Tissue Antigens 48:235–236

Comeron JM (1999) K-Estimator: calculation of the number of nucleotide substitutions per site and the confidence intervals. Bioinformatics 15:763–764

Crew MD (1997) Compilation of distinct HLA-A, -B and -C transmembrane and cytoplasmic domain-encoding sequences. Eur J Immunogenet 24:443–449

Dausset J (1971) The genetics of transplantation antigens. Transplant Proc 3:8–14

Domena JD, Hildebrand WH, Bias WB, Parham P (1993a) A sixth family of HLA-A alleles defined by HLA-A*8001. Tissue Antigens 42:156–159

Domena JD, Little AM, Madrigal AJ, Hildebrand WH, Johnston-Dow L, du Toit E, Bias WB, Parham P (1993b) Structural heterogeneity in HLA-B70, a high-frequency antigen of black populations. Tissue Antigens 42:509–517

Duquesnoy RJ, White LT, Fierst JW, Vanek M, Banner BF, Iwaki Y, Starzl TE (1990) Multiscreen serum analysis of highly sensitized renal dialysis patients for antibodies toward public and private class I HLA determinants. Implications for computer-predicted acceptable and unacceptable donor mismatches in kidney transplantation. Transplantation 50:427–437

Hall TA (1999) BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser 41:95

Högstrand K, Böhme J (1999) Gene conversion of major histocompatibility complex genes is associated with CpG-rich regions. Immunogenetics 49:446–455

Hudson RR, Kaplan NL (1985) Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics 111:147–164

Hughes AL (2000) Evolution of introns and exons of class II major histocompatibility complex genes of vertebrates. Immunogenetics 51:473–486

Hughes AL, Yeager M (1998) Natural selection at major histocompatibility complex loci of vertebrates. Annu Rev Genet 32:415–435

Hughes AL, Hughes MK, Watkins DI (1993) Contrasting roles of interallelic recombination at the HLA-A and HLA-B loci. Genetics 133:669–680

Kato K, Trapani JA, Allopenna J, Dupont B, Yang SY (1989) Molecular analysis of the serologically defined HLA-Aw19 antigens. A genetically distinct family of HLA-A antigens comprising A29, A31, A32, and Aw33, but probably not A30. J Immunol 143:3371–3378

Kimura M (1980) A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J Mol Evol 16:111–120

Kotsch K, Wehling J, Köhler S, Blasczyk R (1997) Sequencing of HLA class I genes based on the conserved diversity of the noncoding regions: sequencing-based typing of the HLA-A gene. Tissue Antigens 50:178–191

Lawlor DA, Warren E, Ward FE, Parham P (1990) Comparison of class I MHC alleles in humans and apes. Immunol Rev 113:147–185

Lopez de Castro JA, Strominger JL, Strong DM, Orr HT (1982) Structure of crossreactive human histocompatibility antigens HLA-A28 and HLA-A2: possible implications for the generation of HLA polymorphism. Proc Natl Acad Sci USA 79:3813–3817

Madrigal JA, Hildebrand WH, Belich MP, Benjamin RJ, Little AM, Zemmour J, Ennis PD, Ward FE, Petzl-Erler ML, du Toit ED, Parham P (1993) Structural diversity in the HLA-A10 family of alleles: correlations with serology. Tissue Antigens 41:72–80

Marsh SG, Bodmer JG, Albert ED, Bodmer WF, Bontrop RE, Dupont B, Erlich HA, Hansen JA, Mac HB, Mayr WR, Parham P, Petersdorf EW, Sasazuki T, Schreuder GM, Strominger JL, Svejgaard A, Terasaki PI (2001) Nomenclature for factors of the HLA system, 2000. Eur J Immunogenet 28:377–424

Meyer D, Blasczyk R (2000) The effect of mutation, recombination and selection on HLA non-coding sequences. In: Kasahara M (ed) Major histocompatibility complex. Evolution, structure and function. Springer, Berlin Heidelberg Tokyo, pp 398–411

Parham P, Ohta T (1996) Population biology of antigen presentation by MHC class I molecules. Science 272:67–74

Parham P, Adams EJ, Arnett KL (1995) The origins of HLA-A,B,C polymorphism. Immunol Rev 143:141–180

Robinson J, Malik A, Parham P, Bodmer JG, Marsh SG (2000) IMGT/HLA database – a sequence database for the human major histocompatibility complex. Tissue Antigens 55:280–287

Rozas J, Rozas R (1999) DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15:174–175

Saitou N, Nei M (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406–425

Strachan T, Sodoyer R, Damotte M, Jordan BR (1984) Complete nucleotide sequence of a functional class I HLA gene, HLA-A3: implications for the evolution of HLA genes. EMBO J 3:887–894

Summers CW, Hampson VJ, Taylor GM (1993) HLA class I non-coding nucleotide sequences, 1992. Eur J Immunogenet 20:201–240

Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673–4680

Tijssen HJ, Sistermans EA, Joosten I (2000) A unique second donor splice site in the intron 5 sequence of the HLA-A*11 alleles results in a class I transcript encoding a molecule with an elongated cytoplasmic domain. Tissue Antigens 55:422–428

Van de Peer Y, De Wachter R (1994) TREECON for Windows: a software package for the construction and drawing of evolutionary trees for the Microsoft Windows environment. Comput Appl Biosci 10:569–570
