
ORIGINAL ARTICLE

Computational allergenicity prediction of transgenic proteins expressed in genetically modified crops

Alok Kumar Verma, Amita Misra, Swarna Subash, Mukul Das, and Premendra D. Dwivedi

Food Toxicology Division, Indian Institute of Toxicology Research (Formerly Industrial Toxicology Research Centre), Council of Scientific and Industrial Research (CSIR), Lucknow, Uttar Pradesh, India

Abstract

Development of genetically modified (GM) crops is increasing, with the aims of improving food quality, increasing harvest yields, and reducing dependency on chemical pesticides. Before release to the marketplace, such crops should be scrutinized for safety. Several regulatory guidelines for the allergenicity evaluation of transgenics are available (e.g., from ILSI, FAO/WHO, Codex, and OECD), and sequence homology analysis is the first test used to determine the allergenic potential of inserted proteins. To test and validate this approach, 312 allergenic, 100 non-allergenic, and 48 inserted proteins were assessed for sequence similarity using 8-mer, 80-mer, and full FASTA searches. In the sequence homology studies, ~94% of the allergenic proteins gave matches with known allergens in the 8-mer and 80-mer searches. However, 20 allergenic proteins showed non-allergenic behavior. Out of 100 non-allergenic proteins, seven qualified as allergens. None of the inserted proteins demonstrated allergenic behavior. To improve predictability, the proteins showing anomalous behavior were tested separately with Algpred and ADFS. Both programs reduced false predictions to a great extent, correctly classifying 74–78% of the anomalous proteins. In conclusion, routine sequence homology searches need to be coupled with another bioinformatic method, such as ADFS or Algpred, to reduce false allergenicity predictions for novel proteins.

Keywords: Allergen database, allergenicity prediction, genetically modified food, novel proteins, sequence homology

The first two authors contributed equally to this work.

Address for Correspondence: Dr. Premendra D. Dwivedi, Food Toxicology Division, Indian Institute of Toxicology Research (IITR), P.O. Box No. 80, Mahatma Gandhi Marg, Lucknow 226 001, Uttar Pradesh, India. E-mail: [email protected]

(Received 04 May 2010; revised 22 June 2010; accepted 24 July 2010)

Immunopharmacology and Immunotoxicology, 2011; 33(3): 410–422. © 2011 Informa Healthcare USA, Inc. ISSN 0892-3973 print/ISSN 1532-2513 online. DOI: 10.3109/08923973.2010.523704

Introduction

Crops used in the production of food, feed, and fiber are being improved through the introduction of DNA encoding one or more specific proteins. In most countries, genetically engineered foods must be assessed for safety before market approval is granted [1–3]. An important issue in the safety assessment is the potential allergenicity of a foreign protein intended to be introduced into a food crop by genetic engineering. The allergenicity of the newly introduced protein must be assessed because allergic reactions may be induced in individuals who are sensitive to the inserted protein(s). In addition, recombinant DNA technology allows gene(s) to be borrowed from any source (there is no species barrier), and the borrowed gene may or may not have any history of consumption, as with the cry gene(s) from Bacillus thuringiensis. Post-translational modifications made by different hosts may also affect the allergenic potential of proteins [4]. To assess the allergenic potential of genetically modified (GM) crops, several guidelines are available, including those of the International Life Sciences Institute/International Food Biotechnology Council (ILSI/IFBC) [5], the Food and Agriculture Organization/World Health Organization (FAO/WHO) [6], the Codex Alimentarius Commission [7], and the Organization for Economic Co-operation and Development (OECD) [8]. Because no single criterion can exactly predict potential allergenicity, the step-by-step procedures described in these guidelines are used for assessment. The principal aim of the safety assessment of GM crops is to prevent the introduction of known or clinically cross-reactive allergens.

One important step in this procedure is to determine, with the aid of computer programs, whether the primary structure (amino-acid sequence) of the transgenic protein is similar to the sequence of any known allergenic protein. FAO/WHO [6] and Codex [7] proposed two criteria for sequence homology. First, short identical stretches of contiguous amino acids of a specified minimum length (generally six or eight residues) are identified, since such stretches can constitute so-called linear or continuous IgE-binding epitopes. Second, partial identity over a larger segment of the amino-acid sequence, which reflects overall similarity of protein structure, is identified. The Codex Alimentarius guidelines recommend >35% identical amino acids in an 80-amino-acid segment as the threshold for considering a protein potentially allergenic. Similar stretches identified in this way may harbor binding sites (epitopes) for IgE antibodies. Binding of an allergen cross-links IgE bound to mast cells/basophils, which results in the release of vasoactive mediators such as histamine that cause the symptoms of allergy [9,10]. The ILSI/IFBC decision-tree approach [5] recommended a search for identical (100%) matches of eight contiguous amino acids for allergenicity prediction, but some findings showed that even six-amino-acid fragments are sufficient to bind IgE and induce an IgE-mediated reaction [11,12]. However, several studies have shown that short peptide matches give an extremely high number of false-positive hits [13–15]. Hileman et al. [16] reported that a search for eight-amino-acid matches may provide an added margin of safety when assessing the allergenic potential of an inserted protein, compared with a search using a six-amino-acid window, which produces many random, irrelevant matches.
Therefore, the aim of the present study was to test and validate the guidelines for in silico allergenicity assessment, taking known allergenic proteins as positive controls, non-allergenic proteins as negative controls, and inserted protein sequences as test proteins. Proteins in each class that behaved contrary to their known characters were further analyzed with two other public-domain programs, Algpred and the Allergen Database for Food Safety (ADFS). Because these tools are motif-based, the aim was to determine whether adding them could reduce false predictions and thereby reduce the chance of introducing allergenic proteins into GM crops.

Methods

To perform this study, three groups of proteins were selected: group 1 contained 312 known allergenic proteins, group 2 contained 100 non-allergenic proteins, and group 3 contained 48 inserted protein sequences. The non-allergenic proteins were selected randomly. The proteins were screened for possible allergenicity by searching for homology with known allergenic proteins, according to the FAO/WHO [6] and Codex [7] criteria, in the Food Allergy Research and Resource Program (FARRP) allergen database administered by the University of Nebraska, Lincoln (http://www.allergenonline.com). Specifically, each query protein sequence was compared with the allergenic sequences in the database using three criteria: 8-mer, 80-mer, and full FASTA. Algpred and ADFS were then used to further assess the allergenicity of the proteins that behaved contrary to their expected characters. A schematic diagram of the procedure followed in this study is shown in Figure 1.

Allergenic sequences for bioinformatics

Sequences of allergenic proteins were taken from the AllergenOnline database, version 9 (updated 2009), available at http://www.allergenonline.com. The database contains 1386 known allergens associated with food, airway, contact, or injected (biting or stinging insect) allergenic sources and is maintained by FARRP at the University of Nebraska. All database entries are linked to sequences in the National Center for Biotechnology Information (NCBI) of the National Institutes of Health (NIH).

Non-allergenic and inserted protein sequences

Amino-acid sequences of non-allergenic and inserted proteins were retrieved from the NCBI Entrez protein database, available at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=pubmed. Candidate non-allergenic proteins were searched in different allergen databases, such as AllergenOnline and the Structural Database of Allergenic Proteins (SDAP; available at http://fermi.utmb.edu/SDAP/), and only proteins absent from these databases were selected as non-allergenic proteins for analysis.

Abbreviations

8-/80-mer: peptide of 8/80 amino acids
ADFS: Allergen Database for Food Safety
FAO/WHO: Food and Agriculture Organization/World Health Organization
GM: genetically modified
ILSI/IFBC: International Life Sciences Institute/International Food Biotechnology Council

(i) Sequence homology study on AllergenOnline

Sequence homology studies were performed with the public-domain AllergenOnline tool, which uses the FASTA algorithm for these sequence alignments. FASTA achieves high sensitivity in similarity searching at high speed. With FASTA, "head-to-tail" alignments were made of each subsequence with each database sequence. The default threshold was 35% identical amino acids in an alignment over an 80-amino-acid window (80-mer), which is considered a significant level of identity between the input sequence and an allergenic protein sequence. The identity reported by the Web site in the alignment results was therefore the percentage of identical amino acids in the 80-amino-acid window. The entered sequences were compared with the allergenic protein sequences compiled in the database.

Exactmer search of AllergenOnline. Exact matches of any eight-amino-acid sequence between a query protein and a known allergen were identified with AllergenOnline. To this end, a word-match algorithm was used, which searches for identical matches of a specified number of contiguous amino acids between the input sequence and each database sequence. Self-matches (a sequence matching itself) were ignored. The search facility automatically carries out the process recommended by the Codex guideline [7].
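The word-match idea can be illustrated with a minimal sketch, assuming sequences are plain one-letter amino-acid strings. This is not AllergenOnline's code; the function names `kmer_set` and `exact_kmer_matches` and the toy sequences are hypothetical, chosen only to show how shared 8-residue words are found while self-matches are skipped.

```python
def kmer_set(seq: str, k: int = 8) -> set[str]:
    """Return every contiguous k-residue word in seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def exact_kmer_matches(query: str, allergens: dict[str, str], k: int = 8) -> dict[str, list[str]]:
    """Map allergen ID -> sorted list of k-mers shared exactly with the query.

    Database entries identical to the query are skipped, mirroring the
    'self-matches were ignored' rule (an assumed implementation of it).
    """
    query_words = kmer_set(query, k)
    hits: dict[str, list[str]] = {}
    for allergen_id, allergen_seq in allergens.items():
        if allergen_seq.upper() == query.upper():
            continue  # self-match: ignored
        shared = query_words & kmer_set(allergen_seq, k)
        if shared:
            hits[allergen_id] = sorted(shared)
    return hits


# Hypothetical toy data, not real allergen sequences.
toy_allergens = {"toyA": "MKTAYIAKQRQISFVKSHFSRQ", "toyB": "GGGGGGGGGGGG"}
print(exact_kmer_matches("PPPAYIAKQRQPPP", toy_allergens))  # {'toyA': ['AYIAKQRQ']}
```

Building a set of query words once and intersecting it with each database entry keeps the search linear in sequence length per comparison, which is the practical appeal of a word-match approach over full alignment.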

80-mer sequence similarity. Following the recommendations of the FAO/WHO [6] and Codex [7] guidelines, the FASTA 3 algorithm was used to compare every possible segment of 80 contiguous amino acids of each query protein with the database on AllergenOnline [17]. The 80-amino-acid subsequences of the input sequence were generated with a sliding window of 80 amino acids and a step size of one amino acid, so that a sequence of 100 amino acids, for example, yields 21 subsequences of 80 amino acids (1–80, 2–81, 3–82, …, 20–99, 21–100). Each of these subsequences was aligned to the database sequences.
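The window arithmetic and the >35% criterion can be sketched as follows, assuming 1-based inclusive coordinates as in the text. The alignment step itself (FASTA 3) is not reproduced: `percent_identity` is a hypothetical helper that scores an already-aligned pair of equal-length strings, with identity taken as identical non-gap positions divided by alignment length. That definition is an assumption and may differ from how FASTA reports identity.

```python
def sliding_windows(seq: str, size: int = 80) -> list[tuple[int, int, str]]:
    """All contiguous windows of `size` residues with step 1,
    as (start, end, subsequence) using 1-based inclusive coordinates."""
    return [(i + 1, i + size, seq[i:i + size]) for i in range(len(seq) - size + 1)]


def percent_identity(aligned_query: str, aligned_target: str) -> float:
    """Percent identical positions in a pre-computed alignment ('-' = gap).
    Assumed definition for illustration, not FASTA's own calculation."""
    if not aligned_query or len(aligned_query) != len(aligned_target):
        raise ValueError("aligned strings must be non-empty and of equal length")
    identical = sum(a == b and a != "-" for a, b in zip(aligned_query, aligned_target))
    return 100.0 * identical / len(aligned_query)


def exceeds_80mer_threshold(pct_identity: float, threshold: float = 35.0) -> bool:
    """Codex-style criterion: strictly more than 35% identity in an 80-mer window."""
    return pct_identity > threshold


windows = sliding_windows("A" * 100)
print(len(windows), windows[0][:2], windows[-1][:2])  # 21 (1, 80) (21, 100)
```

A sequence of length L yields L − 80 + 1 windows, so proteins shorter than 80 residues produce no 80-mer windows at all.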

Full FASTA search. A full-length FASTA search can be very specific when comparing long regions of low similarity between novel proteins and known allergens. If a protein of interest shows similarity to a known allergen, it is considered allergenic. We compared each query protein sequence with the known allergens in AllergenOnline and recorded whether there was any match between the query sequence and the known allergens in the FARRP database.

(ii) Allergenicity prediction using public domain software

(1) Algpred. Allergenic and non-allergenic proteins whose AllergenOnline predictions were contrary to their expected characters (that is, allergenic proteins showing non-allergenic behavior and non-allergenic proteins showing allergenic behavior) were further analyzed with the Algpred software [18], available at http://www.imtech.res.in/raghava/algpred/index.html. This software integrates several approaches to predict allergenic proteins with high accuracy. Algpred predicts the allergenicity of protein sequences using five approaches: IgE epitope mapping, MEME/MAST motifs, a support vector machine (SVM) based on amino-acid composition, an SVM based on dipeptide composition, and the allergen representative peptide (ARP) method.

The salient features of the Algpred server are:

1. Algpred allows prediction of allergens based on the similarity of known epitopes with any region of a protein.
2. The IgE epitope mapping feature of the server allows users to locate the position of epitope(s) in their protein.
3. The server searches for MEME/MAST allergen motifs using MAST and assigns a protein as an allergen if it contains any motif.
4. It allows prediction of allergens with SVM modules based on amino-acid or dipeptide composition.
5. It performs a BLAST search against 2890 ARPs obtained from Björklund et al. [19] and assigns a protein as an allergen if it has a BLAST hit.
6. The hybrid option of the server allows prediction of allergens using a combined approach (SVMc + IgE epitope + ARPs BLAST + MAST).

Figure 1. Schematic representation of procedure followed in computational allergenicity prediction.
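The SVM modules in item 4 take fixed-length composition vectors as input. The sketch below shows one common way to compute a 20-dimensional amino-acid composition and a 400-dimensional dipeptide composition from a sequence. Whether Algpred normalizes its features exactly this way is an assumption, the function names are hypothetical, and the trained SVM models themselves are not reproduced.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def aa_composition(seq: str) -> list[float]:
    """Fraction of each standard residue; non-standard letters are ignored."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> list[float]:
    """Fraction of each ordered pair among adjacent residue pairs.
    Pairs containing a non-standard letter are skipped, not bridged."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for a, b in zip(seq, seq[1:]):
        pair = a + b
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


aac = aa_composition("AACD")
print(aac[AMINO_ACIDS.index("A")], len(dipeptide_composition("AACD")))  # 0.5 400
```

Composition vectors discard residue order (fully for the 20-dimensional vector, beyond neighbouring pairs for the 400-dimensional one), which is why such models can flag proteins that share no contiguous 8-mer or 80-mer similarity with a known allergen.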

(2) Allergen Database for Food Safety. ADFS is a Web-server database system that provides allergenicity prediction tools. ADFS was launched as a project of the Division of Biochemistry and Immunochemistry of the National Institute of Health Sciences (Japan) and is available at http://allergen.nihs.go.jp/ADFS/. To support the study of the allergenic potential of food proteins, the database was constructed to include known allergens and B-cell epitope sequences.

The database includes 13 categories of allergens (aero animal, aero fungi, aero insect, aero mite, aero plant, contact, food animal, food fungi, food plant, gliadin, protozoan, venom/salivary, and worm), based on allergen type in AllergenOnline, together with their accession numbers, epitope information, 3D-structure information, and sugar-containing information. It also provides sequence search tools for obtaining the sequence homology of a protein or peptide to allergens (BLAST, epitope search) and tools for allergenicity prediction based on two methods: the FAO/WHO method and the motif-based method.

The sequence homology search provided in the database gives important information that helps identify potential cross-reactivity with known allergens. Motif-based predictions of the allergenicity of a query sequence, by comparison with allergens in ADFS, are also available. Using the MEME motif discovery tool, 65 allergen motifs and 170 allergen sequences were extracted from 1062 individual allergen sequences in ADFS, following the method described by Stadler et al. (2003). The potential allergenicity of a query protein is analyzed with two-step criteria. First, a profile analysis tool compares the structural similarity between the query sequence and the allergen motifs; if the query matches any of the allergen motifs, the protein is predicted to be allergenic. If not, the query sequence is subsequently analyzed with the BLAST algorithm for amino-acid sequence similarity to the allergen sequences that did not match any of the allergen motifs (170 sequences).
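The two-step decision described above can be sketched as a short function. This is only a structural illustration under stated assumptions: regular-expression search stands in for MEME/MAST profile scoring, the BLAST step is represented by a caller-supplied function, and the name `adfs_style_prediction` and the example motif are hypothetical, not part of ADFS.

```python
import re
from typing import Callable, Iterable


def adfs_style_prediction(
    query: str,
    motif_patterns: Iterable[str],
    has_blast_hit: Callable[[str], bool],
) -> str:
    """Two-step decision modeled on the ADFS description.

    Step 1: if the query matches any allergen motif, predict 'allergen'.
    Step 2: otherwise, fall back to a similarity search against the allergen
    sequences not covered by motifs (a stand-in callable for BLAST here).
    """
    for pattern in motif_patterns:
        # Regex search is a simplified stand-in for profile (MAST) scoring.
        if re.search(pattern, query):
            return "allergen"
    if has_blast_hit(query):
        return "allergen"
    return "non-allergen"


hypothetical_motifs = [r"C..C.{3}GG"]
print(adfs_style_prediction("AACKTCPQRGGA", hypothetical_motifs, lambda q: False))  # allergen
```

Ordering the checks this way means the BLAST step only has to cover the sequences that the motifs do not already represent, which matches the division into 65 motifs plus 170 residual sequences.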

Results

(i) Sequence similarity study

Of the 312 allergenic proteins analyzed, 292 (93.58%) showed allergenicity, as demonstrated by sequence similarity to known allergens in the database under the criteria of an identical 8-mer match, >35% identity in an 80-mer sliding window, and full FASTA search. Twenty proteins (6.41%) qualified as non-allergenic under one or more of the three criteria (Table 1). Of these 20 allergenic proteins, nine (allergen Ziz m 1, an unnamed protein from Actinidia chinensis, an unnamed protein from Corylus avellana, alpha s2-like casein precursor, allergen Api g 5 (segment 1 of 4), major Gly 50 kDa allergen, hull allergen Gly m 2, seed allergenic protein 2, and 5a2 protein) were recognized as non-allergenic by all three criteria (8-mer, 80-mer, and full FASTA); that is, they gave strictly false-negative results when self-homology was ignored (Table 1, Nos. 12–20).

Of the 100 non-allergenic proteins, 93 were found to be non-allergenic. Seven proteins, namely horseradish peroxidase (HRP), superoxide dismutase (SOD), alpha amylase inhibitor, carrot chalcone synthase 2, chain A pectin methylesterase, putative protein disulfide isomerase, and diacylglycerol acyltransferase, qualified as allergenic by at least one of the three selected criteria (Table 2). Of these seven, five (HRP, SOD, alpha amylase inhibitor, carrot chalcone synthase 2, and diacylglycerol acyltransferase) were non-allergenic under the 8-mer and 80-mer criteria.

In the case of the inserted proteins, none of the 48 selected proteins showed any sequence similarity with allergenic proteins in the AllergenOnline search (Table 3).

(ii) Algpred allergenicity prediction

Table 4 shows the Algpred analysis of the 20 allergenic proteins that behaved as non-allergenic in the AllergenOnline analysis. Of these 20 allergenic proteins, two (nonspecific lipid-transfer protein P2 and major allergen Tur c 1) were identified as allergens. Sixteen proteins (allergen Api g 5, Cit l 3, Cuc m 3, Vit v 1, Act c 2, 31 kDa major allergen/disease resistance protein homolog, 18 kDa major allergen/Bet v 1 homolog, beta-lactoglobulin, lysozyme protein, allergen Ziz m 1, an unnamed protein from A. chinensis, alpha s2-like casein precursor, allergen Api g 5 (segment 1 of 4), major Gly 50 kDa allergen, hull allergen Gly m 2, and 5a2 protein) were predicted as ±allergens. Two allergenic proteins (the unnamed protein from C. avellana and seed allergenic protein 2 of Prunus dulcis) were still recognized as non-allergenic by Algpred. Note that ±allergenic proteins, that is, those showing allergenic behavior with some parameters and non-allergenic behavior with others, were considered allergens in this study.


Table 1. List of allergenic proteins* showing non-allergenic properties in AllergenOnline analysis (8-mer, 80-mer, or full FASTA).

S. No. | Accession No. | Name of protein | Source of protein | Query sequence length (no. of AA) | Matches with allergens: 80-mer | 8-mer | Full FASTA
1 | 32363124 | Allergen Api g 5 | Apium graveolens | 30 | 0 | 1 | 7
2 | P84160 | Nonspecific lipid-transfer protein (LTP) (allergen Cit l 3) | Citrus limon | 20 | 0 | 2 | 21
3 | 46396596 | Allergen Cuc m 3 | Cucumis melo | 21 | 0 | 1 | 2
4 | P80274 | Allergen Vit v 1 (nonspecific lipid-transfer protein P4) | Vitis sp. | 37 | 0 | 0 | 27
5 | P83958 | Thaumatin-like protein (allergen Act c 2) | Actinidia chinensis | 20 | 0 | 2 | 14
6 | AAB35897 | 31 kDa major allergen/disease resistance protein homolog (N-terminal) | Malus × domestica | 26 | 0 | 3 | 4
7 | AAB35896 | 18 kDa major allergen/Bet v 1 homolog | Malus × domestica | 25 | 0 | 12 | 43
8 | P33556 | Nonspecific lipid-transfer protein P2 (LTP P2) | Vitis sp. | 38 | 0 | 8 | 24
9 | 47117351 | Major allergen Tur c 1 (segment 2 of 6) | Batillus cornutus | 27 | 0 | 29 | 42
10 | AAA30413 | Beta-lactoglobulin | Bos taurus (cattle) | 14 | 0 | 0 | 3
11 | AAA48944 | Lysozyme protein | Gallus gallus (chicken) | 24 | 0 | 1 | 1
12 | 85701136 | Unnamed protein | Actinidia chinensis | 189 | 0 | 0 | 0
13 | AAX40948 | Allergen Ziz m 1 | Ziziphus mauritiana | 330 | 0 | 0 | 0
14 | AAO65960 | Unnamed protein | Corylus avellana | 140 | 0 | 0 | 0
15 | AAA30479 | Alpha s2-like casein precursor | Bos taurus | 222 | 0 | 0 | 0
16 | 33300921 | Allergen Api g 5 (segment 1 of 4) | Apium graveolens | 22 | 0 | 0 | 0
17 | P82947 | Major Gly 50 kDa allergen | Glycine max (soybean) | 17 | 0 | 0 | 0
18 | A57106 | Hull allergen Gly m 2, soybean (fragment) | Glycine max (soybean) | 20 | 0 | 0 | 0
19 | P82952 | Seed allergenic protein 2 (Conglutin gamma) | Prunus dulcis (almond) | 25 | 0 | 0 | 0
20 | CAI64398 | 5a2 protein | Triticum aestivum (bread wheat) | 94 | 0 | 0 | 0

*Total number of allergenic proteins taken for analysis = 312. Nos. 12–20 are the allergenic proteins showing non-allergenic behavior by all three criteria.

Table 2. List of non-allergenic proteins* showing allergenic properties in AllergenOnline analysis (8-mer, 80-mer, or full FASTA).

S. No. | Accession No. | Name of protein | Source of protein | Query sequence length (no. of AA) | Matches with allergens: 80-mer | 8-mer | Full FASTA
1 | 1H57_A | HRP | Armoracia rusticana (horseradish) | 308 | 0 | 0 | 5
2 | AAG17389 | SOD | Helicoverpa zea SNPV | 159 | 0 | 0 | 20
3 | 1609234B | Alpha amylase inhibitor | Hordeum vulgare (barley) | 429 | 0 | 0 | 2
4 | CAA07245 | Carrot chalcone synthase 2 | Daucus carota (carrot) | 397 | 0 | 0 | 1
5 | 1GQ8_A | Chain A, pectin methylesterase | Daucus carota (carrot) | 319 | 3 | 2 | 3
6 | AAG13988 | Putative protein disulfide isomerase | Prunus avium (sweet cherry) | 196 | 0 | 1 | 4
7 | ABW34442 | Diacylglycerol acyltransferase | Arachis hypogaea (peanut) | 340 | 0 | 0 | 2

*Total number of non-allergenic proteins taken for analysis = 100.


Table 3. Summary results for inserted proteins from AllergenOnline analysis (8-mer, 80-mer, or full FASTA).

S. No. | Accession No. | Name of protein | Source of protein | Query sequence length (no. of AA) | Matches with allergens: 80-mer | 8-mer | Full FASTA
1 | CAE50344 | Dihydrodipicolinate synthase | Corynebacterium diphtheriae | 276 | 0 | 0 | 0
2 | BAE77904 | DNA adenine methylase | Escherichia coli W3110 | 278 | 0 | 0 | 0
3 | AAP03690 | nptII marker protein | Nicotiana tabacum (tobacco) | 242 | 0 | 0 | 0
4 | P16426 | Phosphinothricin N-acetyltransferase | Streptomyces hygroscopicus | 183 | 0 | 0 | 0
5 | Q57146 | Phosphinothricin N-acetyltransferase | Streptomyces viridochromogenes | 183 | 0 | 0 | 0
6 | AAT81429 | Coat protein | Potato virus Y | 268 | 0 | 0 | 0
7 | P07693 | S-Adenosylmethionine hydrolase | Enterobacteria phage T3 | 152 | 0 | 0 | 0
8 | Q41635 | Thioesterase | Umbellularia californica (California bay) | 382 | 0 | 0 | 0
9 | AAA46420 | Coat protein | Cucumber mosaic virus | 218 | 0 | 0 | 0
10 | 2C4B_B | Barnase | Bacillus amyloliquefaciens | 143 | 0 | 0 | 0
11 | CAA79714 | Dihydrodipicolinate synthase | Corynebacterium glutamicum | 301 | 0 | 0 | 0
12 | AAY66992.1 | Cry1Ac protein | Bacillus thuringiensis (Bt) | 1164 | 0 | 0 | 0
13 | P0A372 | Cry1Ab | Bacillus thuringiensis (Bt) | 1155 | 0 | 0 | 0
14 | ABC95996 | Cry2Ab | Bacillus thuringiensis (Bt) | 633 | 0 | 0 | 0
15 | Q9R4E4 | 3-Phosphoshikimate 1-carboxyvinyltransferase | Agrobacterium sp. CP4 | 455 | 0 | 0 | 0
16 | P48628 | Omega-6 fatty acid desaturase | Glycine max | 424 | 0 | 0 | 0
17 | CAI96522 | Vegetative insecticidal protein | Bacillus thuringiensis | 789 | 0 | 0 | 0
18 | AAM53602 | Coat protein | Zucchini yellow mosaic virus | 279 | 0 | 0 | 0
19 | AAB21856 | Coat protein | Papaya ringspot virus | 287 | 0 | 0 | 0
20 | YP_077277 | Coat protein | Watermelon mosaic virus | 281 | 0 | 0 | 0
21 | Q00740 | ACC deaminase | Pseudomonas sp. | 338 | 0 | 0 | 0
22 | P30012 | Quinolinate phosphoribosyltransferase (QAPRTase) | Salmonella typhimurium | 297 | 0 | 0 | 0
23 | P10045 | Nitrilase | Klebsiella pneumoniae subsp. ozaenae | 270 | 0 | 0 | 0
24 | P16426-1 | Phosphinothricin N-acetyltransferase | Streptomyces hygroscopicus | 183 | 0 | 0 | 0
25 | P00551 | Aminoglycoside 3'-phosphotransferase | Escherichia coli | 271 | 0 | 0 | 0
26 | Q8X671 | Partial beta-d-glucuronidase | Escherichia coli O157:H7 | 368 | 0 | 0 | 0
27 | P16596 | Coat protein (capsid protein) (CP) | Papaya mosaic potexvirus (PMV) | 215 | 0 | 0 | 0
28 | Q01901 | Genome polyprotein | Papaya ringspot virus (strain P/mutant HA) | 3344 | 0 | 0 | 0
29 | Q79IL2 | 3″-O-aminoglycoside adenylyltransferase | Klebsiella oxytoca | 263 | 0 | 0 | 0

(Table 3 continued below.)


Table 5 shows the Algpred analysis of the non-allergenic proteins that gave false-positive results, that is, proteins predicted as allergenic by AllergenOnline. Of these seven proteins, four (SOD, pectin methylesterase chain A, putative protein disulfide isomerase, and diacylglycerol acyltransferase) were found to be ±allergens by the Algpred hybrid approach. The remaining three (HRP, alpha amylase inhibitor, and carrot chalcone synthase 2) were correctly predicted as non-allergenic by Algpred.

(iii) ADFS allergenicity prediction

All proteins showing anomalous behavior in the AllergenOnline analysis were also analyzed with ADFS, which predicts allergenicity on the basis of motif-based and BLAST search analyses.

Of the 20 allergenic proteins, 15 (75%) were predicted as allergens by ADFS: allergen Api g 5, Cit l 3, Cuc m 3, Vit v 1, Act c 2, 31 kDa major allergen/disease resistance protein homolog, 18 kDa major allergen/Bet v 1 homolog, nonspecific lipid-transfer protein P2, allergen Ziz m 1, the unnamed protein from A. chinensis, alpha s2-like casein precursor, allergen Api g 5 (segment 1 of 4), major Gly 50 kDa allergen, the unnamed protein from C. avellana, and seed allergenic protein 2 of P. dulcis. The remaining five proteins (major allergen Tur c 1, beta-lactoglobulin, lysozyme protein, hull allergen Gly m 2, and 5a2 protein from Triticum aestivum) were still identified as non-allergenic (Table 6).

Table 3 (continued).

S. No. | Accession No. | Name of protein | Source of protein | Query sequence length (no. of AA) | Matches with allergens: 80-mer | 8-mer | Full FASTA
30 | P14509 | Aminoglycoside 3'-phosphotransferase | Escherichia coli | 271 | 0 | 0 | 0
31 | B1XF82 | Beta-d-glucuronidase | Escherichia coli (strain DH10B) | 603 | 0 | 0 | 0
32 | P09342 | Acetolactate synthase 1, chloroplastic | Nicotiana tabacum | 667 | 0 | 0 | 0
33 | P14720 | Dihydroflavonol-4-reductase | Petunia hybrida (Petunia) | 380 | 0 | 0 | 0
34 | P16426 | Phosphinothricin N-acetyltransferase | Streptomyces hygroscopicus | 183 | 0 | 0 | 0
35 | P00648 | Ribonuclease | Bacillus amyloliquefaciens | 157 | 0 | 0 | 0
36 | P17597 | Acetolactate synthase, chloroplastic | Arabidopsis thaliana (mouse-ear cress) | 670 | 0 | 0 | 0
37 | P00386 | d-Nopaline dehydrogenase | Agrobacterium tumefaciens (strain T37) | 413 | 0 | 0 | 0
38 | P00807 | Beta-lactamase (EC 3.5.2.6) (penicillinase) | Staphylococcus aureus | 281 | 0 | 0 | 0
39 | Q5X572 | Spectinomycin phosphotransferase | Legionella pneumophila (strain Paris) | 332 | 0 | 0 | 0
40 | Q04789 | Acetolactate synthase | Bacillus subtilis | 570 | 0 | 0 | 0
41 | P17767 | Genome polyprotein | Plum pox potyvirus (strain Rankovic) (PPV) | 3140 | 0 | 0 | 0
42 | P0A379 | Pesticidal crystal protein cry3Aa | Bacillus thuringiensis subsp. tenebrionis | 644 | 0 | 0 | 0
43 | Q15EW6 | Neomycin phosphotransferase | Escherichia coli | 242 | 0 | 0 | 0
44 | Q6K2E8 | Acetolactate synthase 1, chloroplastic | Oryza sativa subsp. | 644 | 0 | 0 | 0
45 | P18027 | Coat protein | Cucumber mosaic virus (strain Y) (CMV) | 218 | 0 | 0 | 0
46 | P18479 | Genome polyprotein | Zucchini yellow mosaic virus (strain California) (ZYMV) | 3080 | 0 | 0 | 0
47 | Q7D0R6 | Chorismate synthase | Agrobacterium tumefaciens (strain C58/ATCC 33970) | 365 | 0 | 0 | 0
48 | P30011 | Nicotinate-nucleotide pyrophosphorylase | Escherichia coli | 297 | 0 | 0 | 0


Of the seven non-allergenic proteins recognized as allergens by AllergenOnline, ADFS recognized five (~71%), namely HRP, carrot chalcone synthase 2, pectin methylesterase chain A, putative protein disulfide isomerase, and diacylglycerol acyltransferase, as non-allergenic, while two proteins (SOD and alpha amylase inhibitor) were still identified as allergens (Table 7).

As depicted in Figure 2, both Algpred and ADFS analyses greatly reduced the false predictions produced by AllergenOnline, irrespective of the type of protein analyzed. In total, AllergenOnline predicted 27 proteins contrary to their expected properties. Algpred reduced the number of false predictions to six and gave 78% correct predictions, whereas ADFS reduced the number of false predictions to seven and gave 74% correct predictions (Table 8). The difference in predictability between the two methods is small, but the use of either one improves predictability.
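The percentages quoted above follow directly from the counts in Table 8. A minimal Python sketch of that arithmetic; the counts are transcribed from Table 8, and the dictionary layout and function names are only illustrative:

```python
# Correct re-predictions transcribed from Table 8, for the 27 proteins that
# AllergenOnline misclassified (20 allergenic, 7 non-allergenic).
RESULTS = {
    "Algpred": {"allergenic": (18, 20), "non_allergenic": (3, 7)},
    "ADFS": {"allergenic": (15, 20), "non_allergenic": (5, 7)},
}


def percent(correct: int, total: int) -> float:
    """Percentage of correct predictions, rounded to one decimal place."""
    return round(100.0 * correct / total, 1)


def summarize(tool: str) -> dict:
    """Per-group and pooled (all 27 proteins) percentages for one tool."""
    groups = RESULTS[tool]
    summary = {name: percent(c, t) for name, (c, t) in groups.items()}
    # Pool both groups to obtain the overall figure.
    correct = sum(c for c, _ in groups.values())
    total = sum(t for _, t in groups.values())
    summary["total"] = percent(correct, total)
    return summary


for tool in RESULTS:
    print(tool, summarize(tool))
```

Pooling the two groups is what moves Algpred from 90% (allergens) and 42.9% (non-allergens) to 77.8% overall, and ADFS from 75% and 71.4% to 74.1%.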

(iv) Comparative analysis of Algpred and ADFS

Both Algpred and ADFS significantly reduced the false results obtained by AllergenOnline analysis. For allergenic proteins, Algpred gave 90% correct predictions, compared with 75% for ADFS. In the non-allergenic group, seven proteins showed anomalous behavior on AllergenOnline analysis. When these seven proteins were analyzed by Algpred, ~43% were predicted correctly, whereas ADFS gave more accurate results, with ~71% correct. When

Table 4. Analysis of known allergenic proteins for false non-allergenic prediction by Algpred.

S. No. | Name of allergenic protein | Number of IgE epitopes in allergen | SVM module based on amino-acid composition (threshold = −0.4): score | Result | SVM module based on dipeptide composition (threshold = −0.2): score | Result | BLAST search on allergen representative peptides | Hybrid approach
1 | Allergen Api g 5 | No IgE epitope | 0.73935022 | + | −0.026924609 | + | − | ±
2 | Allergen Cit l 3 | No IgE epitope | −0.33898027 | + | −0.17052159 | + | − | ±
3 | Allergen Cuc m 3 | No IgE epitope | −0.30198365 | + | −0.18612841 | + | − | ±
4 | Allergen Vit v 1 | No IgE epitope | 0.89295357 | + | 0.036251804 | + | − | ±
5 | Allergen Act c 2 | No IgE epitope | −0.14438628 | + | −0.16404465 | + | − | ±
6 | 31 kDa major allergen/disease resistance protein homolog | No IgE epitope | −0.072078933 | + | −0.10662074 | + | − | ±
7 | 18 kDa major allergen/Bet v 1 homolog | No IgE epitope | 0.57626318 | + | −0.1290235 | + | − | ±
8 | Nonspecific lipid-transfer protein P2 | No IgE epitope | 0.26718876 | + | −0.057986561 | + | + | +
9 | Major allergen Tur c 1 | IgE epitope | 0.14706708 | + | −0.15897165 | + | − | +
10 | Beta-lactoglobulin | No IgE epitope | −0.073090461 | + | −0.1831803 | + | − | ±
11 | Lysozyme protein | No IgE epitope | −0.36387386 | + | | + | − | ±
12 | Unnamed protein (Actinidia chinensis) | No IgE epitope | 0.69288182 | + | 0.42123663 | + | − | ±
13 | Allergen Ziz m 1 (Ziziphus mauritiana) | No IgE epitope | 0.56203082 | + | −0.16288844 | + | − | ±
14 | Unnamed protein (Corylus avellana) | No IgE epitope | −1.5304213 | − | −0.74863182 | − | − | −
15 | Alpha s2-like casein precursor | No IgE epitope | 0.24823425 | + | 0.8345408 | + | − | ±
16 | Allergen Api g 5 (segment 1 of 4) | No IgE epitope | 0.040964084 | + | −0.23914127 | − | − | ±
17 | Major Gly 50 kDa allergen | No IgE epitope | −0.14007564 | + | −0.18880587 | + | − | ±
18 | Hull allergen Gly m 2, soybean (fragment) | No IgE epitope | 0.011782403 | + | −0.1244494 | + | − | ±
19 | Seed allergenic protein 2 (Conglutin gamma) | No IgE epitope | −0.57985947 | − | −0.21003721 | − | − | −
20 | 5a2 protein | No IgE epitope | 1.2659453 | + | 0.7684753 | + | − | ±

418 A. K. Verma et al.

Immunopharmacology and Immunotoxicology

correct predictions of allergenic and non-allergenic proteins were pooled, the difference between Algpred and ADFS largely disappeared: Algpred gave ~78% correct predictions and ADFS ~74% (Figure 3 and Table 8).
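The "+"/"−" calls of the two Algpred SVM modules in Table 4 can be re-derived from the listed scores and thresholds (−0.4 for amino-acid composition, −0.2 for dipeptide composition). The Python sketch below does this; it assumes that a score at or above the threshold is called allergenic, which is consistent with every row of Table 4 but is our assumption rather than Algpred documentation. It does not reproduce Algpred's BLAST or hybrid steps.

```python
from typing import Optional

AA_THRESHOLD = -0.4   # SVM module based on amino-acid composition
DIP_THRESHOLD = -0.2  # SVM module based on dipeptide composition

# (amino-acid score, dipeptide score) keyed by Table 4 row number.
# The dipeptide score of row 11 (lysozyme) is missing from the table: None.
TABLE4_SCORES = {
    1: (0.73935022, -0.026924609), 2: (-0.33898027, -0.17052159),
    3: (-0.30198365, -0.18612841), 4: (0.89295357, 0.036251804),
    5: (-0.14438628, -0.16404465), 6: (-0.072078933, -0.10662074),
    7: (0.57626318, -0.1290235), 8: (0.26718876, -0.057986561),
    9: (0.14706708, -0.15897165), 10: (-0.073090461, -0.1831803),
    11: (-0.36387386, None), 12: (0.69288182, 0.42123663),
    13: (0.56203082, -0.16288844), 14: (-1.5304213, -0.74863182),
    15: (0.24823425, 0.8345408), 16: (0.040964084, -0.23914127),
    17: (-0.14007564, -0.18880587), 18: (0.011782403, -0.1244494),
    19: (-0.57985947, -0.21003721), 20: (1.2659453, 0.7684753),
}


def svm_call(score: Optional[float], threshold: float) -> Optional[str]:
    """'+' (allergen) if score >= threshold (assumed rule), '-' otherwise, None if no score."""
    if score is None:
        return None
    return "+" if score >= threshold else "-"


calls = {
    row: (svm_call(aa, AA_THRESHOLD), svm_call(dip, DIP_THRESHOLD))
    for row, (aa, dip) in TABLE4_SCORES.items()
}
aa_positive = sum(1 for aa_call, _ in calls.values() if aa_call == "+")
print(aa_positive, "of", len(calls), "rows called allergenic by the amino-acid module")
```

Only rows 14 and 19 fall below the amino-acid threshold, which matches the two allergens that Algpred still missed.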

Discussion

The main aim of designing plants with specific characteristics, by artificially inserting genes from other species or sometimes from entirely different kingdoms, is to improve tolerance to pests, herbicides, drought, and salt,

Table 6. Analysis by Allergen Database for Food Safety of allergenic proteins falsely predicted as non-allergenic.

S. No. | Accession No. | Name of protein | Source | Motif result | BLAST result | Total
1 | 32363124 | Allergen Api g 5 | Apium graveolens | Negative | Positive | Positive
2 | P84160 | Nonspecific lipid-transfer protein (LTP) (allergen Cit l 3) | Citrus limon | Negative | Positive | Positive
3 | 46396596 | Allergen Cuc m 3 | Cucumis melo | Negative | Positive | Positive
4 | P80274 | Allergen Vit v 1 (nonspecific lipid-transfer protein P4) | Vitis sp. | Negative | Positive | Positive
5 | P83958 | Thaumatin-like protein (allergen Act c 2) | Actinidia chinensis | Negative | Positive | Positive
6 | AAB35897 | 31 kDa major allergen/disease resistance protein homolog (N-terminal) | Malus × domestica | Negative | Positive | Positive
7 | AAB35896 | 18 kDa major allergen/Bet v 1 homolog | Malus × domestica | Negative | Positive | Positive
8 | P33556 | Nonspecific lipid-transfer protein P2 | Vitis sp. | Negative | Positive | Positive
9 | 47117351 | Major allergen Tur c 1 (segment 2 of 6) | Batillus cornutus | Negative | Negative | Negative
10 | AAA30413 | Beta-lactoglobulin | Bos taurus | Negative | Negative | Negative
11 | AAA48944 | Lysozyme protein | Gallus gallus | Negative | Negative | Negative
12 | 85701136 | Unnamed protein | Actinidia chinensis | Negative | Positive | Positive
13 | AAX40948 | Allergen Ziz m 1 | Ziziphus mauritiana | Negative | Positive | Positive
14 | AAO65960 | Unnamed protein | Corylus avellana | Negative | Positive | Positive
15 | AAA30479 | Alpha s2-like casein precursor | Bos taurus | Negative | Positive | Positive
16 | 33300921 | Allergen Api g 5 (segment 1 of 4) | Apium graveolens | Negative | Positive | Positive
17 | P82947 | Major Gly 50 kDa allergen | Glycine max | Negative | Positive | Positive
18 | A57106 | Hull allergen Gly m 2, soybean (fragment) | Glycine max | Negative | Negative | Negative
19 | P82952 | Seed allergenic protein 2 (Conglutin gamma) | Prunus dulcis | Negative | Positive | Positive
20 | CAI64398 | 5a2 protein | Triticum aestivum | Negative | Negative | Negative

Table 5. Analysis of non-allergenic proteins for their probable allergenic potential by Algpred.

S. No. | Name of protein | Number of IgE epitopes in protein | SVM module based on amino-acid composition (threshold = −0.4): score | Result | SVM module based on dipeptide composition (threshold = −0.2): score | Result | BLAST search on allergen representative peptides | Hybrid approach
1 | HRP | No IgE epitopes | −0.40101307 | − | −0.32596487 | − | + | −
2 | SOD | No IgE epitopes | −0.31857682 | + | −1.1407793 | − | − | ±
3 | Alpha amylase inhibitor | No IgE epitopes | −0.5020639 | − | −0.40679225 | − | − | −
4 | Carrot chalcone synthase 2 | No IgE epitopes | −0.96159906 | − | −0.56503737 | − | − | −
5 | Chain A, pectin methylesterase | No IgE epitopes | 0.69488487 | + | −0.021643314 | + | − | ±
6 | Putative protein disulfide isomerase | No IgE epitopes | 1.7529833 | + | 1.4655609 | + | − | ±
7 | Diacylglycerol acyltransferase | No IgE epitopes | −0.058797054 | + | −0.38511381 | − | − | ±


and to improve nutritional quality. Over the last few years, several approaches have been developed for the safety assessment of genetically engineered crops, and research is ongoing to provide new predictive tools that define the characteristics of allergenic proteins. A number of differing recommendations for assessing the allergenicity of transgenic proteins have been suggested [5–7]. Bioinformatic analysis of proteins introduced into genetically engineered crops is an important initial step in assessing their potential allergenicity, and sequence homology is one of the recommended methods [5]. There is agreement on including sequence similarity in evaluating the allergenicity of GM crops [20]; however, opinions differ on how it should be done, as no large study has validated the above-mentioned methods or suggested improvements where required. Several studies have shown that short peptide matches (6-mer, 8-mer) between query proteins and known allergens appear to be of little value in assessing the potential allergenicity of a query protein [13–15]. Silvanovich et al. [13] as well as Stadler and Stadler [15] reported that matches found by searching short contiguous amino acid stretches to identify allergenic proteins are a product of chance and add little value to allergy assessments of newly expressed proteins. In contrast, Hileman et al. [16] reported that a search for eight-amino-acid matches gives fewer false hits than a search for six-amino-acid matches. Hence, the 8-mer criterion is safer than the 6-mer criterion for assessing the allergenicity of a newly expressed protein. Therefore, the aim of the present study was to test and validate the ILSI/IFBC criteria for in silico allergenicity assessment of GM crops and to determine how the accuracy of prediction can be improved with additional available tools, if any. We tried to improve the correct predictions by applying the Algpred and ADFS software to the results of AllergenOnline sequence homology analysis.
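The 8-mer criterion discussed above flags a query protein when any eight contiguous amino acids are identical to a stretch in a known allergen. A minimal Python sketch, assuming one-letter sequences held in memory; the function names and the toy sequences are hypothetical, and AllergenOnline performs this search against its own curated database rather than with code like this:

```python
def kmer_set(seq: str, k: int) -> set:
    """All contiguous k-length peptides of a sequence (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(query: str, allergen: str, k: int = 8) -> set:
    """Peptides of length k that occur identically in both sequences."""
    return kmer_set(query, k) & kmer_set(allergen, k)


def flags_allergen(query: str, allergens: dict, k: int = 8) -> list:
    """Names of database entries sharing at least one identical k-mer with the query."""
    return [name for name, seq in allergens.items() if shared_kmers(query, seq, k)]


# Made-up sequences for illustration only; they are not real proteins.
database = {
    "toy_allergen_A": "MSTRWHEKLVNGQPAY",
    "toy_allergen_B": "PPPPGGGGSSSSAAAA",
}
query = "GGARWHEKLVNGDD"
print(flags_allergen(query, database))       # 8-mer search
print(flags_allergen(query, database, k=6))  # 6-mer search
```

Any pair that shares an 8-mer also shares a 6-mer, so a 6-mer search can only flag as many proteins as the 8-mer search or more; this is consistent with the higher false-hit rate of the 6-mer criterion reported by Hileman et al. [16].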

To fulfill this aim, 312 allergenic, 100 non-allergenic, and 48 inserted protein sequences were selected for allergenicity analysis using AllergenOnline. Only those protein sequences whose prediction was contrary to expectation (e.g., an allergenic protein predicted as non-allergenic, or vice versa) were further analyzed with the Algpred and ADFS software. To the best of our knowledge, AllergenOnline is one of the most commonly used and most up-to-date public-domain Web sites for sequence homology searching [13,14,21], and it was used for sequence similarity searching in the present study. Notably, although it is one of the most commonly used Web sites, a total of 27 proteins (20 allergenic and seven non-allergenic; 5.87% of the 460 proteins analyzed) were falsely predicted. Previous studies also reported that most identical stretches are likely to be false positives [16,22,23]. Therefore, some other

Table 7. Analysis of non-allergenic proteins for their allergenic potential (AllergenOnline prediction) by Allergen Database for Food Safety based on motif method.

S. No. | Accession No. | Name of protein | Source of protein | Motif result | BLAST result | Total
1 | 1H57_A | HRP | Armoracia rusticana | Negative | Negative | Negative
2 | AAG17389 | SOD | Helicoverpa zea SNPV | Negative | Positive | Positive
3 | 1609234B | Alpha amylase inhibitor | Hordeum vulgare (barley) | Negative | Positive | Positive
4 | CAA07245 | Carrot chalcone synthase 2 | Daucus carota (carrot) | Negative | Negative | Negative
5 | 1GQ8_A | Chain A, pectin methylesterase | Daucus carota (carrot) | Negative | Negative | Negative
6 | AAG13988 | Putative protein disulfide isomerase | Prunus avium (sweet cherry) | Negative | Negative | Negative
7 | ABW34442 | Diacylglycerol acyltransferase | Arachis hypogaea (peanut) | Negative | Negative | Negative

[Figure 2: number of false predictions (y-axis, 0–20) for allergenic and non-allergenic proteins (x-axis: category of analysed proteins), comparing AllergenOnline, Algpred, and Allergen Database for Food Safety.]

Figure 2. Algpred and ADFS analysis of proteins showing unexpected behavior* by AllergenOnline. *Allergenic proteins predicted as non-allergenic and non-allergenic proteins predicted as allergenic.

Table 8. Analysis of false positive and false negative results by Algpred and Allergen Database for Food Safety (ADFS).

Improvement in prediction (number/percent) | Algpred | ADFS
False predictions for allergenic proteins (total = 20) | |
Correct prediction (no.) | 18/20 | 15/20
Correct prediction (%) | 90% | 75%
False predictions for non-allergenic proteins (total = 7) | |
Correct prediction (no.) | 3/7 | 5/7
Correct prediction (%) | 42.9% | 71.4%
Total correct prediction (allergenic + non-allergenic) | 21/27 | 20/27
Total correct percentage prediction | 77.8% | 74.1%


useful approaches should be used to obtain correct predictions of allergenic potential.

Algpred markedly increased the number of correct predictions. Twenty allergenic proteins showed anomalous behavior, as they were categorized as non-allergenic by AllergenOnline. Algpred significantly reduced these false results: it predicted 18 of the 20 proteins as allergenic and gave only two false results (Figure 2). Therefore, for the allergenic proteins, Algpred gave 90% correct results. Similarly, in the non-allergenic group, seven of the 100 proteins showed allergenic behavior on AllergenOnline analysis. When these seven proteins were analyzed by Algpred, three of them (42.9%) were found to be non-allergenic, whereas four were still predicted as allergens (Figure 2). Overall, Algpred worked efficiently in reducing the false results obtained by AllergenOnline, correctly predicting ~78% (21/27) of these anomalous proteins (Figure 3). As shown by our study, Algpred discriminates well between allergenic and non-allergenic proteins. Hence, AllergenOnline analysis followed by Algpred reduces the risk of false results.
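The Algpred counts above can be tallied from the "Hybrid approach" columns of Tables 4 and 5. Figure 4 groups "+" and "±" hybrid calls as allergenic and "−" as non-allergenic; the Python sketch below encodes that reading, which is our interpretation of the figure rather than Algpred's documented output semantics. The column values are transcribed from the tables, with "-" standing for the tables' minus sign.

```python
from collections import Counter

# Hybrid-approach column of Table 4 (20 known allergens) and Table 5
# (7 non-allergens), in row order.
TABLE4_HYBRID = ["±"] * 7 + ["+", "+"] + ["±"] * 4 + ["-"] + ["±"] * 4 + ["-", "±"]
TABLE5_HYBRID = ["-", "±", "-", "-", "±", "±", "±"]


def is_allergenic(call: str) -> bool:
    """Count '+' and '±' as allergenic calls (our reading of Figure 4)."""
    return call in ("+", "±")


breakdown = Counter(TABLE4_HYBRID)
allergens_recovered = sum(is_allergenic(c) for c in TABLE4_HYBRID)
non_allergens_recovered = sum(not is_allergenic(c) for c in TABLE5_HYBRID)
print(dict(breakdown), allergens_recovered, non_allergens_recovered)
```

The breakdown (2 "+", 16 "±", 2 "−" among the allergens) corresponds to the 18/20 and 3/7 correct predictions in Table 8.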

The same 27 proteins that gave false results on AllergenOnline were also tested with ADFS, which provides allergenicity prediction by two different methods: the FAO/WHO method and a motif-based method. In the present study, we used the motif-based method, as the FAO/WHO method had already been applied in the AllergenOnline analysis. For the allergenic proteins, 15 out of

[Figure 3: correct percentage prediction (y-axis, 0–90%) by Algpred software and Allergen Database for Food Safety for allergenic proteins, non-allergenic proteins, and total correct prediction of proteins showing unexpected behaviour (x-axis: protein analyzed).]

Figure 3. Comparative analysis of proteins showing unexpected behavior using Algpred and ADFS software.

[Figure 4: flowchart. False predictions by AllergenOnline (n = 27): 20 allergenic proteins predicted as non-allergenic and 7 non-allergenic proteins predicted as allergenic; their allergenicity was further tested using Algpred and ADFS. Algpred: of the 20 allergenic proteins, 2 (+) and 16 (+/-) allergenic and 2 (-) non-allergenic; of the 7 non-allergenic proteins, 4 (+/-) allergenic and 3 (-) non-allergenic; ~78% correct prediction. ADFS: of the 20 allergenic proteins, 15 allergenic and 5 non-allergenic; of the 7 non-allergenic proteins, 2 allergenic and 5 non-allergenic; 74.07% correct prediction.]

Figure 4. Stepwise summary of Algpred and ADFS analysis of proteins demonstrating anomalous behavior.


20 proteins (75%) were predicted correctly as allergenic by ADFS, whereas for the non-allergenic proteins, five out of seven (71.4%) were identified as non-allergenic (Figures 2 and 3). Because ADFS performed better than Algpred on the non-allergenic group (71% versus 43%), ADFS software appears to be more useful for predicting the allergenicity of novel inserted proteins that have no record of allergenic potential in databases. By improving accuracy, ADFS can help in assessing the allergenic potential of newly introduced protein(s), so that the potential risk associated with the use of transgenic proteins in crops can be minimized. ADFS also reduces the false predictions of AllergenOnline, giving 74% correct results, slightly fewer than Algpred.
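In Tables 6 and 7, the ADFS "Total" column is positive whenever either the motif or the BLAST result is positive. The Python sketch below encodes that rule (inferred from the tables, not taken from ADFS documentation) and recounts the 15/20 allergens and 5/7 non-allergens that ADFS classified correctly.

```python
# (motif result, BLAST result) per row; True = Positive, False = Negative.
# Every motif result in both tables is Negative.
# Table 6: 20 known allergens that AllergenOnline called non-allergenic.
TABLE6 = [(False, blast) for blast in (
    True, True, True, True, True, True, True, True,  # rows 1-8
    False, False, False,                              # rows 9-11
    True, True, True, True, True, True,               # rows 12-17
    False, True, False,                               # rows 18-20
)]
# Table 7: 7 non-allergens that AllergenOnline called allergenic.
TABLE7 = [(False, blast) for blast in (False, True, True, False, False, False, False)]


def adfs_total(motif: bool, blast: bool) -> bool:
    """'Total' call as it appears in Tables 6 and 7: positive if either result is positive."""
    return motif or blast


allergens_recovered = sum(adfs_total(m, b) for m, b in TABLE6)
non_allergens_recovered = sum(not adfs_total(m, b) for m, b in TABLE7)
print(f"Allergens called allergenic: {allergens_recovered}/{len(TABLE6)}")
print(f"Non-allergens called non-allergenic: {non_allergens_recovered}/{len(TABLE7)}")
```

Because every motif result is negative, every ADFS "Total" call in these two tables is determined by the BLAST result.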

In conclusion, in silico evaluation of a protein by the FASTA algorithm is the most predictive of a clinically relevant cross-reactive allergen. Algpred and/or ADFS software may be very helpful, as they greatly reduce false allergenicity predictions, as depicted in Figure 4. Algpred and ADFS are also useful for analyzing allergenic proteins that were missed by the Codex/AllergenOnline screening, which predicted them as non-allergenic. Hence, the guideline of 100% identity over short amino acid stretches needs to be coupled with other bioinformatic tools for allergenicity prediction. Integrating AllergenOnline analysis of a query protein with motif-based software may reduce the number of false hits, as depicted in the summary of the path followed for predictive determination of protein allergenicity (Figure 5). Finally, allergenicity predictions should be confirmed by in vivo and in vitro laboratory testing.
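The path summarised in Figure 5 can be read as a two-stage screen: a sequence-homology call, followed by a motif-based check whose result decides the label. The Python sketch below expresses that reading with placeholder predictor functions. The class and function names are hypothetical, the predictors are stand-ins that do not call AllergenOnline, Algpred, or ADFS, and letting the second stage decide is our interpretation of the flowchart.

```python
from dataclasses import dataclass
from typing import Callable

Predictor = Callable[[str], bool]  # True = predicted allergenic


@dataclass
class ScreenResult:
    homology_positive: bool
    motif_positive: bool

    @property
    def verdict(self) -> str:
        # Per our reading of Figure 5, the motif-based call decides the label;
        # either outcome still needs in vitro/in vivo confirmation.
        return "may be allergenic" if self.motif_positive else "may be non-allergenic"

    @property
    def overrides_homology(self) -> bool:
        """True when the second stage disagrees with the sequence-homology call."""
        return self.homology_positive != self.motif_positive


def screen(sequence: str, homology: Predictor, motif: Predictor) -> ScreenResult:
    """Run both stages on one sequence and keep both calls for reporting."""
    return ScreenResult(homology(sequence), motif(sequence))


# Placeholder predictors for illustration only.
result = screen("MSTRWHEKLVNGQPAY", homology=lambda s: True, motif=lambda s: False)
print(result.verdict, result.overrides_homology)
```

Keeping both stage results, rather than only the final label, makes it easy to list the proteins whose homology call was overturned, which is the group this study examined in detail.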

Acknowledgements

We are grateful to the Director of the Institute for his keen interest in this study and financial support of the SIP-08 of CSIR is duly acknowledged. A. Misra is thankful to Indian Council of Medical Research (ICMR), New Delhi for the award of Senior Research Fellowship and A. K. Verma is thankful to CSIR for project assistantship from SIP-08. IITR communication No. 2685

Declaration of interest

The authors report no conflicts of interest. The authors alone are responsible for the content and writing of the paper.

[Figure 5: flowchart. New protein to be used in transgenic crop → sequence homology performed for evaluating allergenicity using ILSI/Codex guidelines → positive or negative for allergenicity → protein further analysed by motif-based methods for allergenicity → positive for allergenicity: protein may be allergenic; negative for allergenicity: protein may be non-allergenic and may be used in transgenic crops.]

Figure 5. Summary of path followed for predictive determination of allergenicity of a protein.

References

1. Kuiper, H.A., Kleter, G.A., Noteborn, H.P., Kok, E.J. Assessment of the food safety issues related to genetically modified foods. Plant J. 2001, 27, 503–528.

2. FAO/WHO. Codex principles and guidelines on foods derived from biotechnology. FAO, Rome, 2003.

3. Schilter, B., Constable, A. Regulatory control of genetically modified (GM) foods: likely developments. Toxicol. Lett. 2002, 127, 341–349.

4. Ladics, G.S., Bardina, L., Cressman, R.F., Mattsson, J.L., Sampson, H.A. Lack of cross-reactivity between the Bacillus thuringiensis derived protein Cry1F in maize grain and dust mite Der p7 protein with human sera positive for Der p7-IgE. Regul. Toxicol. Pharmacol. 2006, 44, 136–143.

5. Metcalfe, D.D., Astwood, J.D., Townsend, R., Sampson, H.A., Taylor, S.L., Fuchs, R.L. Assessment of the allergenic potential of foods derived from genetically engineered crop plants. Crit. Rev. Food Sci. Nutr. 1996, 36(Suppl.), S165–S186.

6. FAO/WHO. Evaluation of allergenicity of genetically modified foods. Report of a joint FAO/WHO expert consultation on allergenicity of foods derived from biotechnology. Food and Agriculture Organization of the United Nations (FAO), Rome, 2001.

7. Codex Alimentarius Commission. Alinorm 03/34: Joint FAO/WHO Food Standard Programme, Codex Alimentarius Commission, Twenty-Fifth Session, Rome, 30 June–5 July, 2003. Appendix III, Guideline for the conduct of food safety assessment of foods derived from recombinant-DNA plants and Appendix IV, Annex on the assessment of possible allergenicity, 2003.

8. OECD. Report of the Organisation for Economic Co-operation and Development Workshop on the Toxicological and Nutritional Testing of Novel Foods, Aussois, France, 5–8 March 1997. France: Organisation for Economic Co-operation and Development, 1998.

9. Nadler, M.J., Matthews, S.A., Turner, H., Kinet, J.P. Signal transduction by the high-affinity immunoglobulin E receptor Fc epsilon RI: coupling form to function. Adv. Immunol. 2000, 76, 325–355.

10. Metzger, H. The high affinity receptor for IgE on mast cells. Clin. Exp. Allergy 1991, 21, 269–279.

11. Burks, A.W., Shin, D., Cockrell, G., Stanley, J.S., Helm, R.M., Bannon, G.A. Mapping and mutational analysis of the IgE-binding epitopes on Ara h 1, a legume vicilin protein and a major allergen in peanut hypersensitivity. Eur. J. Biochem. 1997, 245, 334–339.

12. Banerjee, B., Greenberger, P.A., Fink, J.N., Kurup, V.P. Conformational and linear B-cell epitopes of Asp f 2, a major allergen of Aspergillus fumigatus, bind differently to immunoglobulin E antibody in the sera of allergic bronchopulmonary aspergillosis patients. Infect. Immun. 1999, 67, 2284–2291.

13. Silvanovich, A., Nemeth, M.A., Song, P., Herman, R., Tagliani, L., Bannon, G.A. The value of short amino acid sequence matches for prediction of protein allergenicity. Toxicol. Sci. 2006, 90, 252–258.

14. Herman, R.A., Song, P., Thirumalaiswamysekhar, A. Value of eight amino-acid matches in predicting the allergenicity status of proteins: an empirical bioinformatic investigation. Clin. Mol. Allergy 2009, 7, 9.

15. Stadler, M.B., Stadler, B.M. Allergenicity prediction by protein sequence. FASEB J. 2003, 17, 1141–1143.

16. Hileman, R.E., Silvanovich, A., Goodman, R.E., Rice, E.A., Holleschak, G., Astwood, J.D., Hefle, S.L. Bioinformatic methods for allergenicity assessment using a comprehensive allergen database. Int. Arch. Allergy Immunol. 2002, 128, 280–291.

17. Pearson, W.R. Effective protein sequence comparison. Meth. Enzymol. 1996, 266, 227–258.

18. Saha, S., Raghava, G.P. AlgPred: prediction of allergenic proteins and mapping of IgE epitopes. Nucleic Acids Res. 2006, 34, W202–W209.

19. Björklund, A.K., Soeria-Atmadja, D., Zorzet, A., Hammerling, U., Gustafsson, M.G. Supervised identification of allergen-representative peptides for in silico detection of potentially allergenic proteins. Bioinformatics 2005, 21, 39–50.

20. Goodman, R., Silvanovich, A., Hileman, R.E., Bannon, G.A., Rice, E.A., Astwood, J.D. Bioinformatic methods for identifying known or potential allergens in the safety assessment of genetically modified crops. Comments Toxicol. 2002, 8, 251–269.

21. Goodman, R.E., Taylor, S.L., Yamamura, J., Kobayashi, T., Kawakami, H., Kruger, C.L., Thompson, G.P. Assessment of the potential allergenicity of a milk basic protein fraction. Food Chem. Toxicol. 2007, 45, 1787–1794.

22. Gendel, S.M. Sequence analysis for assessing potential allergenicity. Ann. N.Y. Acad. Sci. 2002, 964, 87–98.

23. Kleter, G.A., Peijnenburg, A.A. Screening of transgenic proteins expressed in transgenic food crops for the presence of short amino acid sequences identical to potential, IgE-binding linear epitopes of allergens. BMC Struct. Biol. 2002, 2, 8.