SnS-Align: a graphic tool for alignment of distantly related proteins

Int. J. Bioinformatics Research and Applications, Vol. 5, No. 6, 2009 663

Copyright © 2009 Inderscience Enterprises Ltd.

Ganiraju Manyam and Ancha Baranova Department of Molecular and Microbiology, College of Science, George Mason University, PW2 University Blvd., Manassas, VA 20110, USA E-mail: [email protected] E-mail: [email protected]

Mikhail Skoblov Research Center for Medical Genetics, RAMS, Moskvorechie Str., 1, Moscow, Russian Federation E-mail: [email protected]

Rakesh K. Mishra* Centre for Cellular and Molecular Biology, Uppal Road, Hyderabad 500007, India E-mail: [email protected] *Corresponding author

Abstract: Genomic sequences for many animal species are now available in the public domain. Protein similarity search in evolutionarily distant organisms by sequence comparison often turns out to be difficult. Here, we present the Structure and Sequence Alignment (SnS-Align) tool that graphically presents pairwise local alignment of sandwiched protein sequences, a hybrid of the primary protein sequence and its secondary structure. The utility of the tool is demonstrated by sample analysis of the gap junction protein superfamily of innexins/pannexins and the classic myoglobin family. SnS-Align can also be used for demarcation of the structurally conserved domains within superfamilies of paralogous genes.

Keywords: sequence and structure alignment; remote homology detection; sandwiched protein sequence comparison; pairwise sequence alignment; structurally conserved domains; SnS-Align.

Reference to this paper should be made as follows: Manyam, G., Baranova, A., Skoblov, M. and Mishra, R.K. (2009) ‘SnS-Align: a graphic tool for alignment of distantly related proteins’, Int. J. Bioinformatics Research and Applications, Vol. 5, No. 6, pp.663–673.

Biographical notes: Ganiraju Manyam received his MS in Bioinformatics from the International Institute of Information Technology, Hyderabad, India, and is now working on his PhD in Biosciences at the Molecular and Microbiology Department, College of Science, George Mason University, Fairfax, VA, USA. His research interests are in the areas of data mining, machine learning and bioinformatics.

Ancha Baranova received her PhD in Molecular Biology from Moscow State University, Moscow, Russia. From 1998 to 2002, she worked as a senior scientist at the Vavilov Institute of General Genetics, Moscow, Russia. Since 2002, she has been an Assistant Professor in the Molecular and Microbiology Department, College of Science, George Mason University, Fairfax, VA, USA. Her research interests are functional genomics and bioinformatics.

Mikhail Skoblov received his PhD in Genetics from the Russian Academy of Science in 2004. Currently, he is a group leader at the Research Center for Medical Genetics, Russian Academy of Medical Sciences. His research interests are bioinformatics, genome expression, transcription factors, and promoters and transcriptional regulation.

Rakesh K. Mishra received his DPhil from the University of Allahabad, India, in 1986. For two years (1986–1988), he did postdoctoral research at the Indian Institute of Science, India. He then worked as a scientist at the Centre for Cellular and Molecular Biology (CCMB), India, from 1988 to 1992. He carried out further postdoctoral research at the University of Bordeaux, France (1992–1995), Saint Louis University School of Medicine, USA (1995–1996) and the University of Geneva, Switzerland (1996–2001). He rejoined CCMB as Senior Scientist in March 2001.

1 Introduction

Many genomic sequences are now available, as an increasingly large number of genome projects deposit their data in the public domain. Finding evolutionary relationships among the proteins encoded by these genomes is one of the major aims of these studies. Sequence alignment is a commonly used tool for detecting homology between proteins. However, this approach is unreliable when the sequence similarity is lower than 30%, as is likely to be the case when evolutionarily distant proteins are aligned. Whereas several sequence comparison tools have been developed for finding homologous proteins in closely related organisms, finding their relatives in evolutionarily distant organisms remains a demanding task.

In many instances, the primary sequence of a protein changes over the course of evolution, whereas its biochemical function is retained through preservation of a few critical residues that occupy structurally critical points of the molecule. Because of that, proteins with similar secondary structure and fold might vary in their primary sequences (Rost et al., 1997). Thus, a description of the protein structure is commonly taken into consideration during remote homology detection (Fariselli et al., 2007). For example, crystal structures often lead to the identification of functionally related domains in proteins with no significant sequence similarity. Unfortunately, compared with the size of individual proteomes, very few crystal structures have been solved. Therefore, the development of in silico prediction methods for protein structure should be instrumental in comparing and aligning proteins by their structural elements (Rost, 2001; Szustakowski and Weng, 2000).

Current alignment algorithms and remote homology detection methods are based on sequence–sequence, sequence–structure and structure–structure comparisons (Wan and Xu, 2005). Sequence–sequence based detection of distant homology can be done with PSI-BLAST (Altschul et al., 1997), which builds profiles from the multiple alignment of the hits obtained after each iteration. Information extracted from these profiles is used for subsequent iterations. Recognition of distantly related protein family members requires multiple iterations of PSI-BLAST. Another sequence-based finder of remote homologies, the Jumping Alignment algorithm (Spang et al., 2002), uses both vertical information (sequence profiles) and horizontal information (conserved domains) retrieved from multiple sequence alignments.

Probabilistic models of sequence profiles are built upon Hidden Markov Models (HMMs), which are then compared with the query sequence(s) to identify remote homologies (Karplus et al., 1998). Similarly, HMM-to-HMM comparisons can also be utilised for remote homology detection (Soding, 2005). FUGUE uses a sequence-to-structure paradigm (Shi et al., 2001), in which multiple sequences are aligned with multiple structures located in the pre-existing Homologous Structure Alignment Database (HOMSTRAD) (de Bakker et al., 2001). Homology-Derived Structures of Proteins (HSSP) is another pre-existing database, populated with the sequence homologues of each of the protein 3D structures submitted to the Protein Data Bank (PDB) (Dodge et al., 1998). A sequence-based search of HSSP results in alignments that are based on the known protein structures. HSSP and many other existing protein alignment algorithms are based on established structure classifications and are, therefore, limited to them.

Structure-to-structure comparisons usually lead to better results, as empirically demonstrated by tools such as DALI (Holm and Sander, 1999), which relies on a database containing descriptions of protein domain architecture, definitions of their structural neighbours and a comprehensive library of explicit multiple alignments of distantly related protein families. On the other hand, DALI-like tools are of less help for remote homology detection owing to the difficulty of reliably generating structural models (Russell et al., 1997). Incorporating secondary structural information into protein sequence alignment has been shown to improve the detection of evolutionarily distant proteins (Wallqvist et al., 2000). Combining evolutionary information and predicted structure with sequence profiles also seems to improve the quality of alignment of remotely homologous proteins (Ohlson et al., 2006). Multiple alignment of proteins using both sequence and structure information has likewise been shown to improve protein motif identification (Kim and Xie, 2006), underscoring the importance of secondary structural information in the context of sequence alignment.

Here, we present a novel method for pairwise alignment of proteins that employs information on both the sequence and the secondary structure of the protein. It is a simple and user-friendly tool that relies on the notion that divergent sequences may fold into similar structures that retain their function (Michnick and Shakhnovich, 1998). Alignments produced by the SnS software are visualised and scored to emphasise the similarities of evolutionarily distant proteins at both the sequence and the structure level. The standalone graphic tool, SnS-Align, is available for free download at http://mason.gmu.edu/~gmanyam/SnS-Align.html and http://www.ccmb.res.in/rakeshmishra/tools.html.


2 Algorithm and implementation

2.1 Sandwiched sequences

SnS-Align first formulates the sandwiched sequence, which represents the secondary structural elements (α-Helices, β-Sheets) along with the primary sequence. The predicted structural elements are embedded in the primary sequence, replacing the corresponding amino acid residues (see Figure 1). The most conserved structural elements, i.e., the α-Helices and β-Sheets, are considered at the structural level, whereas the remaining structural elements, such as loops and turns, are aligned at the sequence level (D’Alfonso et al., 2001).

Figure 1 Sandwiched sequence is the overlap of protein secondary structure and its primary amino acid sequence. The following symbols designate the corresponding secondary structural elements

#: α-Helix; $: β-Sheet and –: Turn or coil.
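The construction of a sandwiched sequence described above can be sketched in a few lines. This is only an illustration, not the authors' Perl code: the function name `sandwich` and the per-residue structure codes ('H' for helix, 'E' for strand, anything else for coil or turn) are assumptions made for the example, and residues in coil or turn regions are kept as they are so that they can be aligned at the sequence level.

```python
# Symbols follow Figure 1: '#' for an alpha-helix, '$' for a beta-sheet.
HELIX_SYMBOL = "#"
SHEET_SYMBOL = "$"


def sandwich(sequence: str, structure: str) -> str:
    """Embed predicted helices and sheets into a primary sequence.

    `structure` is a per-residue prediction string of the same length as
    `sequence`. The codes 'H' (helix) and 'E' (strand) are an assumed
    convention for this sketch; any other code is treated as coil/turn.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    symbols = []
    for residue, state in zip(sequence, structure):
        if state == "H":
            symbols.append(HELIX_SYMBOL)   # residue replaced by helix symbol
        elif state == "E":
            symbols.append(SHEET_SYMBOL)   # residue replaced by sheet symbol
        else:
            symbols.append(residue)        # loops and turns keep the residue
    return "".join(symbols)


print(sandwich("MGLSDGEWQL", "CCHHHHEECC"))  # prints MG####$$QL
```

In this toy input, residues 3 to 6 are predicted as helix and residues 7 and 8 as strand, so only the four flanking residues survive as amino acid letters.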

2.2 Process flow

SnS-Align can use either protein or nucleotide sequences as input to query a target database, which the user defines as a set of protein or genomic (DNA) sequences. Nucleotide queries are conceptually translated into protein(s) from the longest possible Open Reading Frame (ORF). If the target database consists of genomic sequences, these are automatically converted into the corresponding protein sequences. Secondary structures are predicted for both the query and the database protein sequences and, together with the primary amino acid sequences, are transformed into sandwiched sequences. These sandwiched sequences are subjected to local alignment using the Smith–Waterman algorithm (Smith and Waterman, 1981) and pre-calculated PAM and BLOSUM scoring matrices. Both secondary structural elements in the sandwiched sequence, α-Helices and β-Sheets, are given a score equal to the arithmetic mean of all individual amino acid scores in the scoring matrix selected by the user. Amino acids occupying conserved positions within a conserved structure (and hence likely to be functionally important) receive, in addition to the secondary structural score, a bonus equal to the score of that particular amino acid in the user-selected matrix. Therefore, a priority order for the scoring pattern is established from sequence to structure and then from structure to ‘sequence and structure’.
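To make the scoring idea concrete, the sketch below extends a substitution matrix with the two structural symbols and runs a score-only Smith–Waterman recursion on sandwiched sequences. Several choices here are assumptions rather than details taken from the tool: the matrix is a three-letter toy, not PAM or BLOSUM; the 'arithmetic mean' is read as the mean of the identity (diagonal) scores, which is only one possible interpretation; the mismatch score for pairs involving '#' or '$' and the linear gap penalty are hypothetical values (the tool itself relies on EMBOSS water with gap opening and extension penalties).

```python
from itertools import product


def structure_score(matrix, alphabet):
    """Mean of the identity scores (an assumed reading of the description)."""
    return sum(matrix[(a, a)] for a in alphabet) / len(alphabet)


def extend_matrix(matrix, alphabet, mismatch=-4.0):
    """Add '#' and '$' to a substitution matrix given as a dict of pairs.

    `mismatch` (hypothetical) scores any pair involving a structural symbol
    that is not an exact structural match.
    """
    extended = dict(matrix)
    match = structure_score(matrix, alphabet)
    for symbol in "#$":
        extended[(symbol, symbol)] = match
    for a, b in product(alphabet + "#$", repeat=2):
        extended.setdefault((a, b), mismatch)
    return extended


def smith_waterman_score(a, b, matrix, gap=-2.0):
    """Best local alignment score, using a simple linear gap penalty."""
    previous = [0.0] * (len(b) + 1)
    best = 0.0
    for i in range(1, len(a) + 1):
        current = [0.0]
        for j in range(1, len(b) + 1):
            score = max(
                0.0,                                           # restart alignment
                previous[j - 1] + matrix[(a[i - 1], b[j - 1])],  # (mis)match
                previous[j] + gap,                             # gap in b
                current[j - 1] + gap,                          # gap in a
            )
            current.append(score)
            best = max(best, score)
        previous = current
    return best


# Toy three-letter matrix: +4 for identity, -1 otherwise (not BLOSUM/PAM).
ALPHABET = "AGL"
TOY = {(a, b): (4.0 if a == b else -1.0)
       for a, b in product(ALPHABET, repeat=2)}
EXTENDED = extend_matrix(TOY, ALPHABET)

print(smith_waterman_score("A##G", "L##G", EXTENDED))  # prints 12.0
```

The two toy sequences differ in their first residue but share a two-position helix followed by a glycine, so the best local alignment covers the last three columns. The bonus for residues conserved inside a conserved structure is not modelled here, since it requires the original residues alongside the structural symbols.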


2.3 Implementation

The algorithm is implemented in Perl. The graphical user interface is built using the Tk module. SnS-Align accepts fasta-formatted files of protein or DNA sequences as input. Secondary structure prediction is performed by the GOR method (Garnier et al., 1978), using the corresponding programme garnier from the EMBOSS package (Rice et al., 2000). The garnier output is parsed, and the regions of the primary protein sequence corresponding to α-Helices and β-Sheets are transformed to formulate the sandwiched protein sequence. These sandwiched sequences are subjected to local alignment by the Smith–Waterman algorithm (Smith and Waterman, 1981), using the water module of the EMBOSS package (Rice et al., 2000). The local alignment output is parsed and the scoring pattern is applied to each level of conservation, i.e., to sequence, structure and then ‘sequence and structure’, respectively. The final output is displayed graphically to show the conservation at the various levels of sequence and structure of the aligned proteins. Alignments are sorted in descending order of their scores. Conservation of sequence and structure is represented by ‘+’ and ‘*’, respectively. If an amino acid is found to be conserved within a secondary structure, the amino acid symbol is displayed at the corresponding position. A snapshot of a sample SnS alignment is shown in Figure 2. This graphic can be saved in postscript format. The user can also review a summary of the alignments performed, which displays the percentage similarity and the score of each alignment. Both the alignment(s) and the summary can also be saved in text format.

Figure 2 A screenshot of the SnS-Align output comparing the human and porcine myoglobin proteins. The α-Helices and β-Sheets are represented graphically, along with the similarity score. In the alignment, ‘*’ indicates conserved secondary structures and ‘+’ indicates conservative amino acid positions
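The marker line of the alignment display can be illustrated with a short sketch. It is not the tool's own code: the function name `annotate`, the structure codes ('H', 'E', other for coil, '-' for gaps) and the decision to treat conservation as exact identity are assumptions made for the example; the tool may also count conservative substitutions as sequence conservation.

```python
def annotate(seq_a: str, seq_b: str, ss_a: str, ss_b: str) -> str:
    """Return one marker per alignment column.

    seq_a, seq_b: aligned primary sequences ('-' marks a gap).
    ss_a, ss_b:   aligned per-residue structure codes, assumed to be
                  'H' (helix), 'E' (strand) or anything else for coil/gap.
    Markers: residue letter = same residue inside the same structure,
             '*' = same structure only, '+' = same residue only,
             ' ' = neither.
    """
    if not (len(seq_a) == len(seq_b) == len(ss_a) == len(ss_b)):
        raise ValueError("all four aligned strings must have equal length")
    markers = []
    for res_a, res_b, st_a, st_b in zip(seq_a, seq_b, ss_a, ss_b):
        same_residue = res_a == res_b and res_a != "-"
        same_structure = st_a == st_b and st_a in "HE"
        if same_structure and same_residue:
            markers.append(res_a)   # conserved at sequence and structure level
        elif same_structure:
            markers.append("*")     # conserved at structure level only
        elif same_residue:
            markers.append("+")     # conserved at sequence level only
        else:
            markers.append(" ")
    return "".join(markers)


print(annotate("MKVLAG", "MRVLSG", "CHHHEC", "CHHHEC"))  # prints +*VL*+
```

In the toy alignment, the two central residues are identical and lie within a shared helix, so their letters are shown, while the flanking identical residues in coil regions are marked with '+'.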


3 Results and discussion

The SnS alignment tool can be used to align homologous protein sequences among evolutionarily distant species. It can be utilised to identify proteins belonging to the same family, distant homologues, conserved domains and critical residues within the structurally conserved regions. Both the sequence and the structure data are used to align a sandwiched sequence, which represents the secondary structural elements (α-Helices, β-Sheets) along with the primary sequence.

3.1 Myoglobin family

We tested SnS-Align with the human myoglobin protein as a query against its orthologues taken from the Swissprot/Trembl database. The classical sequence-based alignment produced by the Smith–Waterman algorithm (Smith and Waterman, 1981) showed good similarity for evolutionarily closer organisms but became less effective as the evolutionary distance increased (see Table 1). The comparative analysis showed the superior performance of the SnS-Align algorithm in aligning distant protein sequences and in highlighting similarities between evolutionarily distant homologues using both primary sequence and secondary structural information (Table 1). All tested myoglobins analysed by SnS alignment showed a similarity of at least 53% to their human counterpart (gap opening penalty: 10, gap extension penalty: 0.5, BLOSUM62 scoring matrix). A number of conserved amino acid positions were automatically highlighted within the secondary structural elements (Figure 2). Some of these residues might be critical to the physiological function of myoglobin (Devos and Valencia, 2000).

Table 1 Identities and similarities of evolutionarily distant proteins related to human myoglobin, as revealed by the Smith–Waterman (SW) and SnS pairwise alignment algorithms. All values are percentages. Approximate times of divergence for the listed species can be found in the Supplementary File (http://mason.gmu.edu/~gmanyam/SnS-Align.html)

Sequence ID   SW identity    SW similarity  SnS identity   SnS similarity Organism
GLB_APLLI     28.42          43.16          55.06          56.33          Aplysia limacina (Slug sea hare)
GLB_BIOGL     23.81          44.9           52.29          52.94          Biomphalaria glabrata (Bloodfluke planorb)
GLB_BUCUU     28.4           39.51          65.1           65.1           Buccinum undatum (Common whelk)
GLB_CERRH     23.31          39.85          53.25          53.25          Cerithidea rhizophorarum (Water snail)
GLB_DICDE     26.25          40             63.06          64.86          Dicrocoelium dendriticum (Small liver fluke)
GLB_ISOHY     25.23          38.32          62.59          62.59          Isoparorchis hypselobagri (Trematode)
GLB_NASMU     24.11          43.75          60.59          60.59          Nassa mutabilis (Sea snail)
MYG_BRARE     43.54          60.54          67.67          68.42          Brachydanio rerio (Zebrafish) (Danio rerio)
MYG_CHIRA     43.92          60.81          68.03          68.03          Chionodraco rastrospinosus (Ocellated icefish)
MYG_CRYAN     44.59          60.14          63.16          63.16          Cryodraco antarcticus (Crocodile icefish)
MYG_GOBGI     44.59          60.81          71.53          71.53          Gobionotothen gibberifrons (Humped rockcod)
MYG_THUAL     46             59.33          64.33          64.33          Thunnus albacares (Yellowfin tuna)
Q8T6J8_CLOSI  28.89          48.15          62.16          62.84          Clonorchis sinensis
Q9DGI8_KATPE  45.89          58.22          65.77          65.77          Katsuwonus pelamis (Skipjack tuna) (Bonito)
Q9DGJ0_SARCH  47.97          60.81          69.23          69.23          Sarda chiliensis (Sard)
Q9DGJ1_MAKNI  48.67          60             68.59          68.59          Makaira nigricans (Blue marlin)
Q9GQX0_PAREP  26.39          46.53          55.48          55.48          Paramphistomum epiclitum
Q9QZ76_RAT    83.01          91.5           80.39          83.01          Rattus norvegicus (Rat)
MYG_ZIPCA     84.97          91.5           81.7           82.35          Ziphius cavirostris (Goose-beaked whale)
MYG_PIG       93.46          96.08          84.97          84.97          Sus scrofa (Pig)

3.2 Innexin/pannexin superfamily

The SnS-Align algorithm was tested further using the human pannexin family. The human genome contains three pannexin genes that encode vertebrate homologues of the innexins, an invertebrate family of gap junction proteins (Baranova et al., 2004). Amino acid sequences of the connexins and innexins lack similarity detectable by the traditional BLAST algorithm. Nonetheless, these groups of proteins have a similar topology with four transmembrane domains (Yen and Saier, 2007). SnS alignment performed with any of the three human pannexins as a query revealed similarity to various innexins, scored as 38–48% (gap opening penalty: 10, gap extension penalty: 0.5, BLOSUM62 scoring matrix). The structural elements that showed up in the alignments reflect hypothesised similarities in their physiological function (Baranova et al., 2004). Figure 3 depicts the SnS alignment of human pannexin-1 and innexin-1 of Homarus gammarus (European lobster). Despite very poor conservation at the level of the primary amino acid sequence, the automatically highlighted structural similarities help the researcher to visualise conserved domains that probably retain functional similarities.


Figure 3 SnS alignment of human pannexin-1 and innexin-1 of Homarus gammarus (European lobster). Although the conservation of the primary amino acid sequence is poor, the conservation at the structure level is sufficient to depict functional similarity between pannexin and innexin

3.3 Discussion

The general objectives of SnS-Align are the identification of homologous sequences present in distantly related proteomes and the graphical visualisation of the alignments of these sequences. The graphical interface of SnS-Align is especially useful for analysing cases in which there is no notable sequence similarity between the query and target protein sequences. SnS-Align relies on the GOR method of secondary structure prediction, whose output is then used to form the sandwiched protein sequence (Garnier et al., 1978). The prediction of the secondary structure is performed for both the query and the target sequences. The graphic interface of SnS-Align reflects both the primary protein sequence alignment and the embedded secondary structural elements. Thus, SnS-Align facilitates the extraction of evolutionary relationships between individual proteins through visualisation of the conserved structural elements common to distantly related orthologues and paralogues. In particular, the SnS-Align tool has been employed for visualisation of the structural similarities between the mammalian pannexins present in vertebrate genomes and the invertebrate gap junction proteins called innexins (Litvin et al., 2006). Before the discovery of the pannexins (Baranova et al., 2004), it was believed that vertebrate gap junctions were made exclusively of chordate-specific connexins. Thus, the elimination of innexin-encoding genes in the chordate ancestry and their substitution with the unrelated gene family of connexins represented an evolutionary enigma. Pannexins were discovered serendipitously, through TBLASTX querying of a cDNA database with iterative PSI-BLAST search (Baranova et al., 2004; Panchin et al., 2000). Later, it was postulated that pannexins are ubiquitous in Metazoa (Panchin, 2005), whereas the evolutionary origin of connexins remains mysterious. Currently, it is unknown whether they can be found in echinoderms and, respectively, in earlier deuterostome ancestors and Cnidaria. Use of the graphical alignment visualisation tool SnS-Align may facilitate this and similar studies in the field of evolutionary biology, which can be performed as soon as the corresponding complete genomes are assembled.

The SnS alignment tool can also be used to curate the false positive hits that are expected within relatively long lists of potential homologues detected by whole-genome searches performed with PSI-BLAST (Altschul et al., 1997), Jumping Alignment (Spang et al., 2002) or, for that matter, any other remote homology detection method. For example, SnS alignment successfully identified an orthologue of human myoglobin within a PSI-BLAST generated list of targets from the Zebrafish proteome. SnS-Align ranked this orthologue as the best hit, whereas PSI-BLAST ranked it 33rd. In addition, one can use SnS-Align as a simple tool for identifying the structurally conserved amino acid residues that demarcate structural domains in distantly related proteins or in superfamilies of paralogues within the same proteome.

Our future plans include the development of a number of additional SnS-Align modules based on types of structural information other than α-Helices and β-Sheets, e.g., hydrophobicity plots and entropy profiles. The local alignment algorithm for sandwiched sequences can be further advanced to a multiple alignment method, which would make it possible to identify families of structurally related proteins that have poor similarity at the sequence level.

4 Conclusion

SnS-Align is a pairwise alignment tool that graphically represents the local alignment of sandwiched protein sequences, each a hybrid of the primary protein sequence and its secondary structure. The programme accepts protein or nucleotide sequences in fasta format and produces a clear visualisation of the conserved elements at both the structure and the sequence levels. The scoring hierarchy is built up from sequence to structure and then from structure to ‘sequence and structure’. This enables SnS-Align to extract not only the conserved protein regions, but also the conserved residues within the secondary structure. The utility of the tool is demonstrated by sample analysis of the gap junction protein superfamily of innexins/pannexins and the classic myoglobin family. In both cases, SnS-Align identified the distant homologues through scoring of their conserved structural elements. SnS-Align can be used both for remote homologue identification and for demarcation of the structurally conserved domains within superfamilies of paralogous genes.


References

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. (1997) ‘Gapped BLAST and PSI-BLAST: a new generation of protein database search programs’, Nucleic Acids Res., Vol. 25, No. 17, pp.3389–3402.

Baranova, A., Ivanov, D., Petrash, N., Pestova, A., Skoblov, M., Kelmanson, I., Shagin, D., Nazarenko, S., Geraymovych, E., Litvin, O., Tiunova, A., Born, T.L., Usman, N., Staroverov, D., Lukyanov, S. and Panchin, Y. (2004) ‘The mammalian pannexin family is homologous to the invertebrate innexin gap junction proteins’, Genomics, Vol. 83, No. 4, pp.706–716.

D’Alfonso, G., Tramontano, A. and Lahm, A. (2001) ‘Structural conservation in single-domain proteins: implications for homology modeling’, J. Struct. Biol., Vol. 134, Nos. 2–3, pp.246–256.

de Bakker, P.I., Bateman, A., Burke, D.F., Miguel, R.N., Mizuguchi, K., Shi, J., Shirai, H. and Blundell, T.L. (2001) ‘HOMSTRAD: adding sequence information to structure-based alignments of homologous protein families’, Bioinformatics, Vol. 17, No. 8, pp.748–749.

Devos, D. and Valencia, A. (2000) ‘Practical limits of function prediction’, Proteins, Vol. 41, No. 1, pp.98–107.

Dodge, C., Schneider, R. and Sander, C. (1998) ‘The HSSP database of protein structure-sequence alignments and family profiles’, Nucleic Acids Res., Vol. 26, No. 1, pp.313–315.

Fariselli, P., Rossi, I., Capriotti, E. and Casadio, R. (2007) ‘The WWWH of remote homolog detection: the state of the art’, Brief Bioinform., Vol. 8, No. 2, pp.78–87.

Garnier, J., Osguthorpe, D.J. and Robson, B. (1978) ‘Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins’, J. Mol. Biol., Vol. 120, No. 1, pp.97–120.

Holm, L. and Sander, C. (1999) ‘Protein folds and families: sequence and structure alignments’, Nucleic Acids Res., Vol. 27, No. 1, pp.244–247.

Karplus, K., Barrett, C. and Hughey, R. (1998) ‘Hidden Markov models for detecting remote protein homologies’, Bioinformatics, Vol. 14, No. 10, pp.846–856.

Kim, N.K. and Xie, J. (2006) ‘Protein multiple alignment incorporating primary and secondary structure information’, J. Comput. Biol., Vol. 13, No. 10, pp.1735–1748.

Litvin, O., Tiunova, A., Connell-Alberts, Y., Panchin, Y. and Baranova, A. (2006) ‘What is hidden in the pannexin treasure trove: the sneak peek and the guesswork’, J. Cell Mol. Med., Vol. 10, No. 3, pp.613–634.

Michnick, S.W. and Shakhnovich, E. (1998) ‘A strategy for detecting the conservation of folding-nucleus residues in protein superfamilies’, Fold Des., Vol. 3, No. 4, pp.239–251.

Ohlson, T., Aggarwal, V., Elofsson, A. and Maccallum, R.M. (2006) ‘Improved alignment quality by combining evolutionary information, predicted secondary structure and self-organizing maps’, BMC Bioinformatics, Vol. 7, p.357.

Panchin, Y., Kelmanson, I., Matz, M., Lukyanov, K., Usman, N. and Lukyanov, S. (2000) ‘A ubiquitous family of putative gap junction molecules’, Curr. Biol., Vol. 10, No. 13, pp.R473–R474.

Panchin, Y.V. (2005) ‘Evolution of gap junction proteins – the pannexin alternative’, J. Exp. Biol., Vol. 208, No. 8, pp.1415–1419.

Rice, P., Longden, I. and Bleasby, A. (2000) ‘EMBOSS: the European molecular biology open software suite’, Trends Genet., Vol. 16, No. 6, pp.276–277.

Rost, B. (2001) ‘Review: protein secondary structure prediction continues to rise’, J. Struct. Biol., Vol. 134, Nos. 2–3, pp.204–218.

Rost, B., Schneider, R. and Sander, C. (1997) ‘Protein fold recognition by prediction-based threading’, J. Mol. Biol., Vol. 270, No. 3, pp.471–480.


Russell, R.B., Saqi, M.A., Sayle, R.A., Bates, P.A. and Sternberg, M.J. (1997) ‘Recognition of analogous and homologous protein folds: analysis of sequence and structure conservation’, J. Mol. Biol., Vol. 269, No. 3, pp.423–439.

Shi, J., Blundell, T.L. and Mizuguchi, K. (2001) ‘FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties’, J. Mol. Biol., Vol. 310, No. 1, pp.243–257.

Smith, T.F. and Waterman, M.S. (1981) ‘Identification of common molecular subsequences’, J. Mol. Biol., Vol. 147, No. 1, pp.195–197.

Soding, J. (2005) ‘Protein homology detection by HMM-HMM comparison’, Bioinformatics, Vol. 21, No. 7, pp.951–960.

Spang, R., Rehmsmeier, M. and Stoye, J. (2002) ‘A novel approach to remote homology detection: jumping alignments’, J. Comput Biol., Vol. 9, No. 5, pp.747–760.

Szustakowski, J.D. and Weng, Z. (2000) ‘Protein structure alignment using a genetic algorithm’, Proteins, Vol. 38, No. 4, pp.428–440.

Wallqvist, A., Fukunishi, Y., Murphy, L.R., Fadel, A. and Levy, R.M. (2000) ‘Iterative sequence/secondary structure search for protein homologs: comparison with amino acid sequence alignments and application to fold recognition in genome databases’, Bioinformatics, Vol. 16, No. 11, pp.988–1002.

Wan, X.F. and Xu, D. (2005) ‘Computational methods for remote homolog identification’, Curr. Protein Pept. Sci., Vol. 6, No. 6, pp.527–546.

Yen, M.R. and Saier Jr., M.H. (2007) ‘Gap junctional proteins of animals: the innexin/pannexin superfamily’, Prog. Biophys. Mol. Biol., Vol. 94, Nos. 1–2, pp.5–14.
