Functional annotation of putative hypothetical proteins from Candida dubliniensis


Gene xxx (2014) xxx–xxx

journal homepage: www.elsevier.com/locate/gene

Kundan Kumar, Amresh Prakash, Munazzah Tasleem, Asimul Islam, Faizan Ahmad, Md. Imtaiyaz Hassan ⁎

Center for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, Jamia Nagar, New Delhi 110025, India

Abbreviations: HP, hypothetical protein; BLAST, basic local alignment search tool; PSI-BLAST, position-specific iterative basic local alignment search tool; HMMTOP, prediction of transmembrane helices and topology of proteins; TMHMM, membrane protein topology prediction method based on a hidden Markov model; CATH, class, architecture, topology and homology; GRAVY, grand average of hydropathicity; CDD, Conserved Domain Database; SMART, simple modular architecture research tool; PANTHER, Protein ANalysis THrough Evolutionary Relationships; SVM, Support Vector Machine; PP2C, protein phosphatase 2C; SAM, S-adenosyl methionine; DGK, diacylglycerol kinase; CMD, carboxymuconolactone decarboxylase; MFS, major facilitator superfamily.

⁎ Corresponding author.

E-mail address: [email protected] (M.I. Hassan).

http://dx.doi.org/10.1016/j.gene.2014.03.060
0378-1119/© 2014 Elsevier B.V. All rights reserved.

Article info

Article history: Received 23 November 2013; received in revised form 27 March 2014; accepted 28 March 2014; available online xxxx

Keywords: Candida dubliniensis; Hypothetical protein; Sequence analysis; Functional annotation; Functional genomics

Abstract

An extensive analysis of C. dubliniensis proteomics data showed that ~22% of the proteins are conserved hypothetical proteins (HPs) whose function has not yet been determined precisely. Analysis of the gene sequences of HPs provides a platform for establishing sequence–function relationships and for a more profound understanding of the molecular machinery of organisms at the systems level. Here we combined the latest versions of bioinformatics tools, covering protein families, motifs, intrinsic features of the amino acid sequence, sequence–function relationships, pathway analysis, etc., to assign a precise function to HPs for which no experimental information is available. Our results show that 27 HPs have well-defined functions, and we categorized them as enzymes, nucleic acid binding proteins, transport proteins, etc. Five HPs showed adhesin character, which is likely to be essential for the survival of the yeast and for pathogenesis. We also addressed sub-cellular localization and signal peptide identification, which give an idea of each protein's localization and function. The outcome of the present study may facilitate a better understanding of the mechanisms of virulence, drug resistance, pathogenesis, adaptability to the host and tolerance of the host immune response, and may aid drug discovery for the treatment of C. dubliniensis infections.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

Candida dubliniensis is a germ tube-positive yeast that acts as an opportunistic pathogen (Sebti et al., 2001). Normally, this species of Candida is harmless in different body parts, but it may become virulent under certain conditions (Sullivan et al., 2005). Candida is present in most body sites, including the oral cavity, urine, vagina, lung, feces and sputum, especially in immunocompromised individuals and HIV-infected patients (Sebti et al., 2001). Clinically, 2 to 7% of candidemia cases caused by C. dubliniensis showed its presence in the gastrointestinal tract. Candida causes a wide range of infections, from superficial infections of the vaginal and oral mucosa to serious systemic infections (Sullivan et al., 2005). These infections are usually countered with antifungal drugs; however, treatment becomes more difficult with the development of resistance to antifungal agents (Moran et al., 1997). Furthermore, the close phenotypic resemblance of C. dubliniensis to Candida albicans makes clinical diagnosis more difficult (O'Connor et al., 2010). Although C. dubliniensis is less pathogenic than C. albicans, its ability to produce hyphae and its longer survival time enhance its pathogenicity (Jackson et al., 2009). Hence, this species is a prime target for the investigation of fungal infection, especially in conditions of low immunity and frequent development of resistance to antifungal agents (Sullivan and Coleman, 1998).

Recently, the genome of C. dubliniensis was sequenced, opening a promising new channel for extensive research (Jackson et al., 2009). The genome of C. dubliniensis is composed of eight chromosomes, with 262288 reads and a total length of 14.6 Mb. An extensive analysis of the C. dubliniensis genome led to the identification of 1323 proteins as hypothetical out of 5860 open reading frames (Jackson et al., 2009). HPs are predicted from open reading frames but lack experimental evidence of translation and a functional annotation (Nimrod et al., 2008). Nearly half of the proteins in most genomes are HPs, and they are important for completing genomic and proteomic information (Loewenstein et al., 2009). Recent studies suggest many significant roles for HPs, because they constitute a considerable fraction of proteomes and there is a reasonable probability that these proteins are novel, with uncharacterized biological roles (Adams et al., 2007; Desler et al., 2009; Eisenstein et al., 2000). HPs generally show low sequence identity to known or annotated proteins (Galperin and Koonin, 2004). However, recent studies showed that a large fraction of genes encoding HPs have strong phylogenetic linkages with known proteins (Mazandu and Mulder, 2012; Shahbaaz et al., 2013). Furthermore, we have been working on structure-based drug design and searching for novel therapeutic targets (Hassan et al., 2007a, 2007b; Thakur and Hassan, 2011; Thakur et al., 2013). Therefore, HPs may also serve as markers and potential drug targets for drug design, discovery and screening. A precise annotation of the HPs of a particular genome leads to the discovery of new functions and helps in bringing out a list of additional protein pathways and cascades, thus completing our fragmentary knowledge of the biological significance of many novel proteins.

The use of advanced bioinformatics tools for sequence analysis is an initial step in identifying homology shared between proteins, which could lead to a robust function prediction. Here, we have successfully characterized 43 HPs of C. dubliniensis using various computational tools. Preliminary sequence analysis of all 43 HPs was carried out using BLAST-P, PSI-BLAST, Pfam and CDD search. Their functions were inferred on the basis of the presence of specific motifs, important region(s) and specific folds, using InterProScan, InterPro, ScanProsite and PFP-FunDSeqE. Other bioinformatics tools such as ProtParam, HMMTOP, TMHMM, SOSUI and CATH were used to define physicochemical properties, subcellular localization and protein family. Furthermore, adhesin-like proteins, or human pathogenic fungal adhesins, were identified with FungalRV. C. dubliniensis is one of the major causative agents of infection in immunocompromised individuals, especially HIV/AIDS patients. Therefore, functional annotation of HPs may lead to the identification of novel targets for better treatment and understanding of C. dubliniensis infections.

2. Materials and methods

2.1. Sequence retrieval and homology search

The search for HP sequences of C. dubliniensis was carried out in the UniProt database (http://www.uniprot.org/). The FASTA sequences of 43 HPs, along with their UniProt IDs and primary accession numbers, were retrieved separately for sequence analysis; the UniProt ID was used to identify each protein sequence. Table 1 lists all tools and software that were used for the functional annotation of HPs from C. dubliniensis. We used BLAST-P and PSI-BLAST to search for similar sequences with known function (Altschul and Koonin, 1998; Altschul et al., 1997). Top hits were selected and further analyzed using ClustalW to align the functional residues of proteins of known function with the sequences of the HPs (Thompson et al., 2002).

Table 1
List of bioinformatics tools and databases used for function prediction.

S. N.  Tool              URL                                                     Use

1. Sequence similarity search tool
i.     BLAST             http://blast.ncbi.nlm.nih.gov/Blast.cgi                 To find similar sequences in the gene database
ii.    ClustalW2         https://www.ebi.ac.uk/Tools/msa/clustalw2/              Sequence comparison to compare homologous regions

2. Biophysical & chemical characterization
i.     ProtParam         http://web.expasy.org/protparam/                        To calculate various physical and chemical parameters for a given protein sequence

3. Function prediction
i.     Conserved Domain  http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi     Used to search for conserved domains in the sequences
ii.    InterProScan      http://www.ebi.ac.uk/Tools/pfa/iprscan/                 For functional analysis of the amino acid sequences by finding the specific motifs in the sequences
iii.   InterPro          http://www.ebi.ac.uk/interpro/                          For functional analysis of proteins on the basis of protein family categorization by predicting domains and important sites
iv.    ScanProsite       http://prosite.expasy.org/scanprosite/                  Used to scan profiles based on domains, motifs and patterns
v.     PANTHER           http://www.pantherdb.org/                               Classifies proteins on the basis of evolutionary relation and biological process
vi.    Pfam              http://pfam.sanger.ac.uk/                               Classifies proteins into families on the basis of multiple sequence alignment
vii.   SMART             http://smart.embl-heidelberg.de/                        Allows analysis of the domains in the protein sequences

5. Sub-cellular localization of the protein
i.     SOSUI             http://bp.nuap.nagoya-u.ac.jp/sosui/sosui_submit.html   Used to identify whether the given protein sequence is that of a soluble protein or of a transmembrane protein
ii.    TMHMM             http://www.cbs.dtu.dk/services/TMHMM/                   Used to predict the transmembrane topology of the protein
iii.   PSORT II          http://psort.hgc.jp/form2.html                          Used to predict sub-cellular localization with a good reliability
iv.    SignalP           http://www.cbs.dtu.dk/services/SignalP/                 Predicts the cleavage site of the signal peptide
v.     HMMTOP            http://www.enzim.hu/hmmtop/index.php                    Predicts transmembrane helices and topology of the protein

6. Prediction of fold pattern
i.     PFP-FunDSeqE      http://www.csbio.sjtu.edu.cn/bioinf/PFP-FunDSeqE/       Used to find the type of protein fold in the protein sequence

7. Virulence prediction
i.     FungalRV          fungalrv.igib.res.in/query.php                          Used in adhesin prediction

2.2. Physicochemical characterization

Theoretical physicochemical parameters such as molecular weight, isoelectric point, aliphatic index, instability index and grand average of hydropathicity (GRAVY) of each protein were calculated on ExPASy's ProtParam server (http://web.expasy.org/protparam/). The results of this analysis are listed in Table S1.

2.3. Sub-cellular localization

Sub-cellular localization is essential for identifying a protein as a drug or vaccine target. Surface membrane proteins can be used as potential vaccine targets, while cytoplasmic proteins may act as promising drug targets (Vahisalu et al., 2008). We used the PSORT II tool (Nakai and Horton, 1999) to predict the sub-cellular localization of each protein. The online tools TMHMM, SOSUI and HMMTOP were used to predict the propensity of a protein to be a membrane protein, based on the Hidden Markov Model (Chen et al., 2003; Hirokawa et al., 1998). SignalP 4.1 (Petersen et al., 2011) was used to predict signal peptides and the location of the cleavage site in the peptide chain, based on a neural network method. The results of these predictions are summarized in Table 2.

Table 2
Sub-cellular localization of HPs.

S. no. UniProt ID  HMMTOP   SOSUI       TMHMM   SignalP  PSORT
1      B9W9J1      NIL      S           NIL     NSP      Nuclear
2      B9WBA5      3 TMH    M, 2 TMH    2 TMH   NSP      Nuclear
3      B9WFD2      NIL      S           NIL     NSP      Nuclear
4      B9WFD7      NIL      S           NIL     NSP      Nuclear
5      B9WFE4      1 TMH    S           NIL     NSP      Nuclear
6      B9WFE6      NIL      S           NIL     NSP      Nuclear
7      B9WFF1      NIL      S           NIL     NSP      Nuclear
8      B9WFF7      NIL      S           NIL     NSP      Nuclear
9      B9WFG4      NIL      S           NIL     NSP      Nuclear
10     B9WFG8      1 TMH    S           NIL     SP       Nuclear
11     B9WFG9      NIL      S           NIL     NSP      Nuclear
12     B9WFH2      10 TMH   M, 10 TMH   9 TMH   NSP      Cytoplasmic
13     B9WFH4      NIL      S           NIL     NSP      Nuclear
14     B9WFM3      1 TMH    S           NIL     NSP      Nuclear
15     B9WFP3      1 TMH    S           1 TMH   NSP      Nuclear
16     B9WFR1      NIL      S           NIL     NSP      Nuclear
17     B9WFR8      4 TMH    M, 1 TMH    NIL     NSP      Nuclear
18     B9WFR9      2 TMH    S           NIL     NSP      Cytoplasmic
19     B9WFS0      NIL      S           NIL     NSP      Cytoplasmic
20     B9WFS1      NIL      S           NIL     NSP      Cytoplasmic
21     B9WFS2      2 TMH    S           NIL     NSP      Nuclear
22     B9WFS4      2 TMH    M, 1 TMH    2 TMH   NSP      Cytoplasmic
23     B9WFS6      NIL      S           NIL     NSP      Nuclear
24     B9WFT3      1 TMH    M, 1 TMH    1 TMH   NSP      Nuclear
25     B9WFT7      1 TMH    M, 2 TMH    NIL     SP       Nuclear
26     B9WFT8      NIL      M, 1 TMH    1 TMH   NSP      Nuclear
27     B9WFU3      1 TMH    M, 1 TMH    NIL     SP       Nuclear
28     B9WFU7      NIL      S           NIL     NSP      Nuclear
29     B9WFU9      NIL      S           NIL     NSP      Cytoplasmic
30     B9WFV3      NIL      S           NIL     NSP      Nuclear
31     B9WFV7      2 TMH    M, 2 TMH    2 TMH   NSP      Cytoplasmic
32     B9WFW2      NIL      S           NIL     NSP      Cytoplasmic
33     B9WFW8      NIL      S           NIL     NSP      Nuclear
34     B9WFX1      NIL      S           NIL     NSP      Nuclear
35     B9WIA6      NIL      S           NIL     SP       Nuclear
36     B9WIB2      2 TMH    M, 1 TMH    2 TMH   NSP      Cytoplasmic
37     B9WIC3      NIL      S           NIL     NSP      Nuclear
38     B9WIF0      NIL      S           NIL     NSP      Nuclear
39     B9WIF4      1 TMH    M, 1 TMH    1 TMH   NSP      Nuclear
40     B9WIG1      NIL      S           NIL     NSP      Nuclear
41     B9WIG2      NIL      S           NIL     NSP      Nuclear
42     B9WIH4      1 TMH    S           1 TMH   NSP      Nuclear
43     B9WIH5      1 TMH    M, 2 TMH    1 TMH   SP       Nuclear

TMH: transmembrane helix.

Fig. 1. HPs classified into different groups based on their functions. [Chart legend: enzyme; DNA-binding protein; RNA-binding protein; protein binding; ATP binding; phosphoinositide binding; transport; structural.]

2.4. Function prediction

In order to assign a precise function to HPs from C. dubliniensis, we first analyzed all sequences with the Conserved Domain Database (CDD) (Marchler-Bauer et al., 2011), SMART (Letunic et al., 2012), ScanProsite, CATH and PANTHER. CDD includes manually curated domain models based on the tertiary structure of proteins, providing sequence/structure/function relationships in an organized hierarchy of families and superfamilies (Marchler-Bauer et al., 2011). SMART compares the query sequence with the database and searches for sequences with similar domains on the basis of domain architecture and profiles. SMART performs multiple sequence alignment and identifies compositionally biased regions in the sequence, such as transmembrane segments, coiled-coil regions and signal peptides (Letunic et al., 2012). ScanProsite is a publicly available web-based tool that scans a sequence against PROSITE profiles and patterns, which describe protein domains, families and functional sites that are structurally and functionally critical. CATH identifies structurally related proteins even at low sequence identity (Orengo et al., 1997). Likewise, PANTHER is a wide-ranging, curated database of protein families, subfamilies and trees, and was used to find evolutionary relationships from which the functionality of HPs was deduced (Thomas et al., 2003). In a protein, motifs are signatures of protein function that can be used to define protein families, particularly for enzymes, in which motifs are associated with catalytic function (Bork and Koonin, 1996). We used InterProScan (Quevillon et al., 2005), which combines different protein signature recognition methods from the InterPro consortium, for motif discovery. We used the web server PFP-FunDSeqE to determine the protein fold pattern, based on a combination of functional domain information and evolutionary information.
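Pattern-based motif scanning of the kind performed by ScanProsite can be illustrated with a small sketch. The code below is not the ScanProsite implementation; it assumes only the basic PROSITE pattern syntax (elements separated by "-", "x" for any residue, "[...]" for allowed residues, "{...}" for excluded residues, "(n)" or "(n,m)" for repeats, "<" and ">" for terminal anchors). The function names, the example pattern and the example sequence are hypothetical and chosen for illustration only.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")      # PROSITE patterns end with a period
    regex_parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):            # pattern anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):              # pattern anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Separate the residue specification from an optional repeat, e.g. x(2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            core = "."                         # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"     # any residue except those listed
        # "[ABC]" classes and single letters are already valid regex syntax.
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        regex_parts.append(prefix + core + repeat + suffix)
    return "".join(regex_parts)


def scan(sequence: str, pattern: str):
    """Return (1-based start, matched segment) for every hit, including overlaps."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


# Hypothetical cysteine-spacing pattern and toy sequence (not taken from the study).
hits = scan("MKACAACPLKRTHCEE", "C-x(2)-C-x(6)-C")
```

In this toy case the pattern translates to the regular expression C.{2}C.{6}C and the single hit starts at residue 4. Profile-based methods such as those combined by InterProScan are considerably more sensitive than plain pattern matching and are not covered by this sketch.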

2.5. Virulence factor analysis

Adhesins are considered potential targets for vaccine development because they are essential factors that allow the fungus to interact with the host cell and cause infection. FungalRV (Chaudhuri et al., 2011) is a tool based on the Support Vector Machine (SVM) method, trained on a large number of compositional properties, that classifies human pathogenic fungal adhesins and adhesin-like proteins.
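To make the notion of "compositional properties" concrete, the sketch below computes two common sequence-derived feature sets, amino acid composition and dipeptide composition. This is a generic illustration only: the exact feature set and training procedure of FungalRV are not described in this section, and the function names and the example sequence are hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, fixed order


def amino_acid_composition(sequence: str) -> list:
    """Fraction of each standard residue in the sequence (20 values)."""
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    counts = Counter(residues)
    total = len(residues)
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def dipeptide_composition(sequence: str) -> list:
    """Fraction of each ordered residue pair in the sequence (400 values)."""
    # Non-standard symbols are dropped first, so their neighbours become adjacent.
    residues = "".join(aa for aa in sequence.upper() if aa in AMINO_ACIDS)
    pairs = Counter(residues[i:i + 2] for i in range(len(residues) - 1))
    total = max(len(residues) - 1, 0)
    return [pairs[a + b] / total if total else 0.0
            for a, b in product(AMINO_ACIDS, repeat=2)]


def feature_vector(sequence: str) -> list:
    """Concatenate both compositions into one 420-dimensional vector."""
    return amino_acid_composition(sequence) + dipeptide_composition(sequence)


# Toy sequence for illustration; real input would be a full protein sequence.
vector = feature_vector("MKKA")
```

A fixed-length vector of this kind is the sort of input an SVM classifier is trained on; the classifier itself is omitted here.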

3. Results

3.1. Sequence analysis

In the present study, we systematically analyzed the sequences of 43 HPs from the C. dubliniensis genome using modern bioinformatics tools. BLAST-P, PSI-BLAST, Pfam, CDD search, InterProScan, InterPro and PFP-FunDSeqE were used for the functional annotation of these HPs. We assigned a precise function to 27 HPs (Fig. 1, Table 3). We found a well-defined domain in 22 HPs, indicating the corresponding functions (Table 4). All 43 HPs were characterized for their folding patterns, and the types of folds present in each protein are listed in Table 5 and Fig. 2. Interestingly, 19 HPs showed a close resemblance to immunoglobulin-type proteins. In contrast, HPs B9WFG8, B9WFU7, B9WFX1, B9WIG1 and B9WIH5 showed a close structural resemblance to viral coat and capsid proteins. A few HPs have TIM-barrel and thioredoxin-like folds. The adhesin-like character assessed using FungalRV showed that, out of 43 proteins, five may have an adhesin-like signature, indicating a role in pathogenesis. Furthermore, eight HPs have nucleic acid binding properties, of which three are RNA-binding proteins and five are DNA-binding proteins (Table 6). Many HPs possess enzymatic activities and are categorized as hydrolases, phosphatases, transferases, kinases, oxidoreductases and peroxiredoxins. B9WFE4 and B9WFD2 showed ATP-binding and phospholipid-binding activities, respectively. HPs B9WFH2 and B9WIH5 may act as transporter proteins. Here, we provide a detailed analysis of each group of proteins.
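The counts quoted above can be cross-checked against the per-protein annotations of Table 3. The following sketch transcribes the predicted functions from Table 3 and tallies them; the grouping into broad categories and the function name are assumptions made here for illustration and are not part of the original analysis.

```python
from collections import Counter

# Predicted functions transcribed from Table 3 (UniProt ID -> predicted function).
TABLE_3 = {
    "B9W9J1": "Peroxiredoxin activity",
    "B9WFD2": "Phosphoinositide binding",
    "B9WFD7": "Structural protein",
    "B9WFE4": "ATP binding",
    "B9WFE6": "RNA binding",
    "B9WFF1": "Protein binding",
    "B9WFG4": "Phosphatase",
    "B9WFG9": "RNA binding",
    "B9WFH2": "Transporter activity",
    "B9WFH4": "Kinase activity",
    "B9WFM3": "Protein binding",
    "B9WFR1": "Transferase activity",
    "B9WFR8": "DNA binding",
    "B9WFR9": "Transferase activity",
    "B9WFS1": "Hydrolase activity",
    "B9WFS2": "DNA binding",
    "B9WFS4": "Protein binding",
    "B9WFS6": "RNA binding",
    "B9WFU3": "Oxidoreductase activity",
    "B9WFU7": "DNA binding",
    "B9WFU9": "Hydrolase",
    "B9WFV3": "DNA binding",
    "B9WFW2": "Kinase activity",
    "B9WFW8": "DNA binding",
    "B9WIB2": "Protein binding",
    "B9WIF0": "Oxidoreductase activity",
    "B9WIH5": "Transport activity",
}


def broad_category(function: str) -> str:
    """Map a Table 3 label to a broad group (grouping is illustrative only)."""
    label = function.lower()
    if "dna" in label or "rna" in label:
        return "nucleic acid binding"
    if "binding" in label:
        return "other binding"
    if "transport" in label:
        return "transport"
    if "structural" in label:
        return "structural"
    return "enzyme"


by_function = Counter(TABLE_3.values())
by_category = Counter(broad_category(f) for f in TABLE_3.values())
```

With this transcription, the table holds 27 annotated HPs, of which eight fall into the nucleic acid binding group (five DNA-binding and three RNA-binding), in agreement with the numbers given in the text.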

3.2. Enzymes

Enzymes produced by the yeast play a key role in its survival in the host because they provide nutrients for growth and are responsible for pathogenesis. Enzymes modify the local environment to favor growth inside the host and are essential for various metabolic processes. These enzymes may also affect the physiology of the organism (Bjornson, 1984). We found 17 HPs showing catalytic activity, which have been categorized into six classes. A detailed knowledge of these enzymes is important for understanding the molecular basis of pathogenesis and host–pathogen interaction.

Table 3
Predicted function of HPs from Candida dubliniensis.

S. no. Gene ID   UniProt ID  Protein function
1.     8045310   B9W9J1      Peroxiredoxin activity
2.     8047341   B9WFD2      Phosphoinositide binding
3.     8047346   B9WFD7      Structural protein
4.     8047351   B9WFE4      ATP binding
5.     8047353   B9WFE6      RNA binding
6.     8047358   B9WFF1      Protein binding
7.     8047371   B9WFG4      Phosphatase
8.     8047376   B9WFG9      RNA binding
9.     8047379   B9WFH2      Transporter activity
10.    8047381   B9WFH4      Kinase activity
11.    8047605   B9WFM3      Protein binding
12.    8047460   B9WFR1      Transferase activity
13.    8047467   B9WFR8      DNA binding
14.    8047468   B9WFR9      Transferase activity
15.    8047470   B9WFS1      Hydrolase activity
16.    8047471   B9WFS2      DNA binding
17.    8047600   B9WFS4      Protein binding
18.    8047474   B9WFS6      RNA binding
19.    8047491   B9WFU3      Oxidoreductase activity
20.    8047495   B9WFU7      DNA binding
21.    8047497   B9WFU9      Hydrolase
22.    8047654   B9WFV3      DNA binding
23.    8047663   B9WFW2      Kinase activity
24.    8047669   B9WFW8      DNA binding
25.    8048708   B9WIB2      Protein binding
26.    8048742   B9WIF0      Oxidoreductase activity
27.    8048763   B9WIH5      Transport activity

Table 4
List of domains identified in the HPs from Candida dubliniensis.

S. no. UniProt ID  Conserved domain (superfamily)
1.     B9W9J1      Carboxymuconolactone decarboxylase (CMD)
2.     B9WBA5      Hypothetical protein FLILHELTA
3.     B9WFD2      ANTH domain family
4.     B9WFE4      Archaeal ATPase [a]
5.     B9WFG4      PP2Cc super family
6.     B9WFG8      Lipoprotein A (RlpA)-like double-psi beta-barrel
7.     B9WFG9      PIN domain
8.     B9WFH2      Major facilitator superfamily (MFS); Sugar (and other) transporter [a]
9.     B9WFH4      Diacylglycerol kinase catalytic domain (DAG); LCB5; Sphingosine kinase and enzymes [a]
10.    B9WFM3      CUE domain
11.    B9WFP3      Oxidoreductase-like protein, N-terminal
12.    B9WFR8      Oxidoreductase-like protein, N-terminal; GAL4-like Zn2Cys6 binuclear cluster DNA-binding domain; GAL4-like Zn(II)2Cys6 (C6 zinc) binuclear cluster DNA-binding domain [a]
13.    B9WFR9      CoA-transferase family III; Predicted acyl-CoA transferases/carnitine dehydratase [a]
14.    B9WFS1      Putative lysophospholipase; Alpha/beta hydrolase family [a]
15.    B9WFS2      Fungal transcription factor regulatory middle homology region; GAL4-like Zn2Cys6 binuclear cluster DNA-binding domain; GAL4-like Zn(II)2Cys6 (C6 zinc) binuclear cluster DNA-binding domain [a]
16.    B9WFS4      Chaperone for protein-folding within the ER, fungal [a]
17.    B9WFS6      Putative RNA methyltransferase
18.    B9WFU3      Protein disulfide isomerase (PDIa) family; Protein disulfide oxidoreductases and proteins with a thioredoxin fold
19.    B9WFU7      Fungal transcription factor regulatory middle homology region; GAL4-like Zn2Cys6 binuclear cluster DNA-binding domain; GAL4-like Zn(II)2Cys6 (C6 zinc) binuclear cluster DNA-binding domain [a]
20.    B9WFV3      GAL4-like Zn2Cys6 binuclear cluster DNA-binding domain; GAL4-like Zn(II)2Cys6 (C6 zinc) binuclear cluster DNA-binding domain [a]
21.    B9WFW2      Yersinia pseudotuberculosis carbohydrate kinase-like subgroup; Nucleotide-binding domain of the sugar kinase/HSP70/actin superfamily; FGGY-family pentulose kinase [a]
22.    B9WFW8      Rad17 cell cycle checkpoint protein [a]

[a] Multi-domain protein.

Table 5
Different types of folds identified in HPs from Candida dubliniensis.

S. no. Fold type                         UniProt ID
1.     Beta-trefoil                      B9W9J1, B9WFG9
2.     Small inhibitors, toxins, lectins B9WBA5, B9WFS4
3.     Immunoglobulin-like               B9WFD2, B9WFE4, B9WFF7, B9WFG4, B9WFH2, B9WFH4, B9WFP3, B9WFR1, B9WFS0, B9WFS6, B9WFT3, B9WFT7, B9WFT8, B9WFV7, B9WIB2, B9WIC3, B9WIF4, B9WIG2, B9WIH4
4.     DNA binding 3-helical             B9WFD7, B9WFR8
5.     4-helical cytokines               B9WFE6, B9WFS2
6.     OB-fold                           B9WFF1
7.     Viral coat and capsid proteins    B9WFG8, B9WFU7, B9WFX1, B9WIG1, B9WIH5
8.     4-helical up and down bundle      B9WFM3
9.     TIM-barrel                        B9WFR9, B9WFW8
10.    Hydrolases                        B9WFS1
11.    Thioredoxin-like                  B9WFU3, B9WIF0
12.    Beta-grasp                        B9WFU9
13.    Cupredoxins                       B9WFV3
14.    Ribonuclease H-like motif         B9WFW2
15.    ConA-like lectin/glucanases       B9WIA6

3.2.1. Hydrolase

Hydrolytic enzymes play key roles in the invasion of host tissue and in evading the host defense mechanism. They are an important virulence factor in vaginal infection caused by C. albicans (Schaller et al., 2005). In our study, we found that HP B9WFS1 comprises an α/β hydrolase fold and possesses hydrolytic activity (Marchler-Bauer et al., 2011). This fold is very common in hydrolytic enzymes of different phylogenetic origins and different catalytic functions, which nevertheless share a similar core architecture and topology (Ollis et al., 1992). These enzymes have a catalytic triad of three specific residues, namely serine, glutamate or aspartate, and histidine, in their catalytic domain (Marchler-Bauer et al., 2011), a signature of serine proteases (Chen and Bode, 1983). B9WFU9 shows the presence of an N-glycanase signature sequence (Quevillon et al., 2005) and possesses hydrolase activity, similar to P21163A, an amidase found in the bacterium Elizabethkingia miricola that cleaves the β-aspartylglucosylamine bond of asparagine-linked glycans (Kuhn et al., 1994), essential for pathogenesis. Our sequence-based function analysis clearly indicates the presence of various previously unknown hydrolases, which may be involved in pathogenesis and essential for the survival of Candida.

3.2.2. Phosphatase

A decrease in the phosphate concentration of the host may lead to an increase in the virulence of pathogens like C. albicans, Candida glabrata and Saccharomyces cerevisiae (Powell et al., 2012). Phosphatases secreted by these pathogens deplete phosphate in the local environment of the infection site, enhancing their pathogenicity. Protein B9WFG4 has a conserved domain of protein phosphatase 2C (PP2C), a major family of serine/threonine phosphatases (Marchler-Bauer et al., 2011; Quevillon et al., 2005). This is a Mn2+- or Mg2+-dependent Ser/Thr phosphatase essential for regulating cellular stress responses in eukaryotes (Das et al., 1996), similar to PTC1 of S. cerevisiae, which has shown similar activity and has a catalytic domain conserved with PP2C (Maeda et al., 1993). However, PTC1 requires a higher concentration of divalent ions than PP2C to function (Maeda et al., 1993).

3.2.3. Transferase

In yeast, some transferases have been found to play a significant role against oxidative stress (Garcera et al., 2010). In our study, protein B9WFR1 shows a signature domain of the methylase subunit of type I DNA methyltransferase and is presumably involved in adenine-specific DNA methyltransferase activity (Quevillon et al., 2005). In humans, N-6 adenine-specific DNA methyltransferase 1 is present; it is orthologous to the yeast MTQ2 gene and encodes an S-adenosyl methionine (SAM)-dependent methyltransferase that participates in arsenic metabolism, detoxifying the cytotoxic monomethylarsonous acid (Ren et al., 2011). Similarly, HP B9WFR9 also shows transferase activity: a CoA-transferase family III domain was found in this protein (Marchler-Bauer et al., 2011; Quevillon et al., 2005). Formyl-CoA transferase is found in Oxalobacter formigenes, a bacterium present in the intestine and involved in oxalate catabolism in mammals; it catalyzes the transfer of CoA from formate to oxalate in the first step of oxalate degradation by O. formigenes (Ricagno et al., 2003). This is the key enzyme for oxalate-dependent ATP synthesis.

Fig. 2. HPs classified on the basis of types of fold present. [Chart legend: beta-trefoil; small inhibitors, toxins, lectins; immunoglobulin-like; DNA binding 3-helical; 4-helical cytokines; OB-fold; viral coat and capsid proteins; 4-helical up and down bundle; TIM-barrel; hydrolase; thioredoxin-like; beta-grasp; cupredoxins; ribonuclease H-like motif; ConA-like lectin/glucanases.]

3.2.4. Kinase activity

Kinases play an essential role in cell cycle regulation, filamentous growth and signal transduction in Candida (Bruckmann et al., 2000; Monge et al., 2006). HP B9WFH4 has a catalytic domain of diacylglycerol kinase (DGK) (Marchler-Bauer et al., 2011; Quevillon et al., 2005). Proteins belonging to this family are known to prevent activation of protein kinase C by converting diacylglycerol, a protein kinase C activator, to phosphatidic acid (Bakali et al., 2007). More than ten isozymes of DGK have been reported. These isoforms possess a variety of regulatory domains and play key roles in signal transduction pathways, neural and immune responses, cytoskeleton reorganization and carcinogenesis (Sakane et al., 2007). HP B9WFH4 also has a domain of sphingosine kinase, a DGK-related enzyme (Marchler-Bauer et al., 2011).

Table 6
Functional categories of HPs.

Predicted functions           HPs

Enzymatic activity
  Hydrolase activity          B9WFS1, B9WFU9
  Phosphatase activity        B9WFG4
  Transferase activity        B9WFR1, B9WFR9
  Kinase activity             B9WFH4, B9WFW2
  Oxidoreductase activity     B9WIF0, B9WFU3
  Peroxiredoxin activity      B9W9J1

Binding protein
  DNA binding                 B9WFR8, B9WFS2, B9WFU7, B9WFV3, B9WFW8
  RNA binding                 B9WFE6, B9WFS6, B9WFG9
  Protein binding             B9WFF1, B9WFM3, B9WFS4, B9WIB2
  ATP binding                 B9WFE4
  Phosphoinositide-binding    B9WFD2

Other proteins
  Transport activity          B9WFH2, B9WIH5
  Structural protein          B9WFD7


Recently, sphingosine kinase has been reported as an oncogene and a potential antineoplastic drug target. This kinase plays a significant role in pro-inflammatory and anti-apoptotic pathways (Bakali et al., 2007). Another protein predicted to have kinase activity is HP B9WFW2. This protein has a domain of the Yersinia pseudotuberculosis carbohydrate kinase-like subgroup, which belongs to the FGGY family of carbohydrate kinases (Marchler-Bauer et al., 2011; Quevillon et al., 2005). Proteins of this family catalyze ATP-dependent phosphorylation in the presence of Mg2+ (Lim and Cohen, 1966).

3.2.5. Oxidoreductase

We found that HP B9WIF0 contains a domain signature of NADH–ubiquinone oxidoreductase, suggesting that it may have dehydrogenase activity (Quevillon et al., 2005). Another HP, B9WFU3, is expected to show oxidoreductase activity because it has a redox-active TRX domain. It contains a CXXC motif, a signature of the protein disulfide isomerase (PDIa) family (Marchler-Bauer et al., 2011; Quevillon et al., 2005). Members of this family act as oxidases by catalyzing disulfide bond formation in polypeptides in the endoplasmic reticulum and as isomerases that correct non-native disulfide bonds (Ellgaard and Ruddock, 2005). Such proteins also show chaperone activity (Ferrari and Soling, 1999). In S. cerevisiae, PDI plays an essential role in the isomerization of disulfide bonds, along with its redox activity, for substrates such as carboxypeptidase Y. The role of a periplasmic disulfide oxidoreductase in pathogenesis is already well established for Haemophilus influenzae (Rosadini et al., 2008), indicating the significance of oxidoreductase enzymes as potential therapeutic targets.

3.2.6. Peroxiredoxin activity

HP B9W9J1 is predicted to be an enzyme with peroxiredoxin activity because it has a carboxymuconolactone decarboxylase (CMD) domain (Marchler-Bauer et al., 2011; Quevillon et al., 2005). Proteins of this family play a vital role in the degradation of aromatic compounds under aerobic conditions in bacteria, as they are involved in the protocatechuate branch of the 3-oxoadipate pathway (Eulberg et al., 1998). Alkyl hydroperoxide reductase shows antioxidant hydroperoxidase activity and, together with proteins such as AhpC, DlaT and Lpd, constitutes an NADH-dependent peroxidase that protects the bacterium from reactive nitrogen species by acting as a peroxynitrite reductase (Bryk et al., 2002). Furthermore, mycobacterial peroxiredoxin AhpC, a member of the family of non-heme peroxidases, protects heterologous bacterial and human cells from oxidative and nitrosative injuries (Chen et al., 1998). These observations indicate a potential role in pathogenesis for enzymes that have a CMD domain.

3.3. Binding proteins

3.3.1. Nucleic acid binding protein

Five HPs are predicted to be DNA-binding proteins. HP B9WFR8 has a domain similar to a fungus-specific transcription factor domain, found for example in the transcriptional activator xlnR, and a Zn2+-Cys6 binuclear cluster DNA-binding domain. These domains are present in GAL4, a transcription regulator, and contain a Zn2+-Cys6 motif that binds to sequences containing two DNA half-sites constituted by 3–5 C/G combinations (Marmorstein et al., 1992). These domains bind DNA in the major groove in the presence of zinc (Marmorstein et al., 1992). Similarly, HP B9WFS2 contains a GAL4 domain along with the fungal transcription factor regulatory middle homology region (fungal_TF_MHR) (Marchler-Bauer et al., 2011). Fungal_TF_MHR is found in a large family of fungal zinc cluster transcription factors that have an N-terminal GAL4-like C6 zinc binuclear cluster DNA-binding domain. This protein showed 84% sequence identity with the known orthologous protein ZCF25 of C. albicans (Letunic et al., 2012; Powell et al., 2012). Interestingly, these domains are also conserved in HP B9WFU7; however, no significant hits were obtained in the orthologous protein search. The GAL4 domain is also conserved in the DNA-binding protein HP B9WFV3, which showed significant sequence similarity with the orthologous protein RGT1 of S. cerevisiae and C. albicans (Letunic et al., 2012). HP B9WFW8 showed high similarity with the RAD24 protein of S. cerevisiae and C. albicans and is presumably involved in the DNA damage checkpoint mechanism (Marchler-Bauer et al., 2011; Quevillon et al., 2005).

HP B9WFG9 shows close sequence similarity with the PIN (PilT N terminus) domain of RBP1, a well-characterized protein of S. cerevisiae. It is involved in nonsense-mediated mRNA decay and binds to either RNA or single-stranded DNA (Lee and Moss, 1993; Letunic et al., 2012; Marchler-Bauer et al., 2011). Similarly, HP B9WFE6 was predicted to be an RNA-binding protein and showed close sequence similarity to the tobacco mosaic virus RNA-binding protein, which presumably plays a significant role in cell-to-cell movement during the early stage of infection (Gafny et al., 1992; Quevillon et al., 2005). HP B9WFS6 contains a domain resembling RNA methyltransferase and may be involved in RNA methylation (Zarembinski et al., 2003).
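The Zn2+-Cys6 binuclear cluster described above for the DNA-binding HPs is defined largely by the spacing of six cysteines, so a first-pass screen for it can be sketched as a regular expression. The spacing used below (C-x(2)-C-x(6)-C-x(5,12)-C-x(2)-C-x(6,8)-C) is a simplified form of the fungal-type zinc cluster signature and is an assumption made for illustration, as are the function name and the toy sequence. The study itself relied on the domain databases cited above, not on this pattern.

```python
import re

# Simplified cysteine spacing for a fungal Zn2-Cys6 cluster (an assumption for
# illustration, not the pattern used by the authors):
# C-x(2)-C-x(6)-C-x(5,12)-C-x(2)-C-x(6,8)-C
ZN2CYS6 = re.compile(r"C.{2}C.{6}C.{5,12}C.{2}C.{6,8}C")


def find_zn2cys6(sequence):
    """Return (start, end, matched_text) for each non-overlapping match.

    Positions are 1-based and inclusive, as is usual for protein coordinates.
    """
    seq = sequence.upper()
    return [(m.start() + 1, m.end(), m.group()) for m in ZN2CYS6.finditer(seq)]


# Illustrative fragment with GAL4-like cysteine spacing; it is not one of the
# HPs analysed in this study.
toy = "MKLLSSIEQACDICRLKKLKCSKEKPKCAKCLKNNWECRYSPKTKR"
hits = find_zn2cys6(toy)
for start, end, text in hits:
    print(f"Zn2-Cys6-like cluster at {start}-{end}: {text}")
```

A hit from such a pattern only flags a candidate region; a profile-based domain search remains the more sensitive and specific test.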

3.3.2. Protein binding

HP B9WFF1 showed a motif like the ‘tetratrico peptide repeat’, which mediates protein–protein interaction and the assembly of multi-protein complexes. Proteins with this motif are generally involved in neurogenesis, cell cycle regulation, transcriptional control, mitochondrial and peroxisomal transport and protein folding (Lamb et al., 1995). HP B9WFM3 has a CUE-like domain, which is involved in ubiquitin interaction. It also shows similarity with the interleukin 1 protein and is involved in signal transduction pathways (Donaldson et al., 2003). The sequence of HP B9WFS4 is conserved with the fungal chaperone Rot1, an essential molecular chaperone found in the membrane of the endoplasmic reticulum of S. cerevisiae. Molecular chaperones are involved in the folding of denatured proteins in vivo and prevent self-aggregation of proteins in vitro (Marchler-Bauer et al., 2011; Quevillon et al., 2005). HP B9WIB2 shows a close relationship with the class S protein, a member of the phosphatidylinositol–glycan biosynthesis family. It forms a complex with glycosylphosphatidylinositol (GPI) transamidase, which anchors GPI in the endoplasmic reticulum (Ohishi et al., 2001). HP B9WFE4 shares a domain with proteins of the P-loop NTPase superfamily, which have a phosphate-binding motif known as the Walker A motif. Members of this superfamily participate in nucleotide/nucleoside binding (Ohishi et al., 2001). HP B9WFD2 showed a close phylogenetic relationship with YAP180, a protein of S. cerevisiae involved in phosphoinositide binding that acts as a universal adaptor for the nucleation of clathrin coats (Bruckmann et al., 2000; Monge et al., 2006).
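The Walker A (P-loop) motif mentioned above for HP B9WFE4 is commonly written in PROSITE-style notation as [AG]-x(4)-G-K-[ST]. The following minimal sketch converts such a pattern into a regular expression and applies it to a sequence. The function name, the supported subset of the notation and the toy sequence are assumptions for illustration; they do not reproduce the tools used in this study.

```python
import re


def prosite_to_regex(pattern):
    """Convert a simple PROSITE-style pattern into a Python regex string.

    Supported subset (hypothetical helper): single residues, x (any residue),
    [..] alternatives, {..} exclusions, and repeats such as x(4) or x(2,5).
    """
    element_re = r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
    parts = []
    for element in pattern.strip(".").split("-"):
        m = re.fullmatch(element_re, element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # excluded residues
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


WALKER_A = "[AG]-x(4)-G-K-[ST]"
walker_a_regex = re.compile(prosite_to_regex(WALKER_A))

# Toy sequence built to contain a P-loop; it is not a real HP sequence.
toy = "MSDKLRGPSGSGKSTLAR"
match = walker_a_regex.search(toy)
if match:
    print(f"Walker A-like motif at {match.start() + 1}-{match.end()}: {match.group()}")
```

Because the Walker A pattern is short, it matches many unrelated proteins by chance, so a hit is best read together with domain-level evidence such as the P-loop NTPase superfamily assignment.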

3.4. Other proteins

3.4.1. Structural

In our study we also found that HP B9WFD7 has a cuticular protein signature, indicating a structural protein (Quevillon et al., 2005). Cuticular proteins form composite structures with mechanical properties optimized for their biological function. Cuticular protein Lm-76, isolated from the pharate cuticle of Locusta migratoria, is homologous to B9WFD7 and is rich in the amino acids Gly, Leu and Tyr in the conserved N-terminal sequence (Andersen et al., 1993). Fungi are often covered by a proteinaceous surface layer that acts as a sieve for the influx of external molecules and protects the microbe from external aggression (Kwan et al., 2006). Hence, structural proteins are equally important for survival and pathogenesis.

3.4.2. Transport

HP B9WFH2 is predicted to be involved in transport because of its close resemblance to the major facilitator superfamily (MFS) proteins. MFS proteins act as secondary transporters that facilitate the transport of various substrates, including drugs, neurotransmitters, amino acids, sugar phosphates and ions, across cytoplasmic or internal membranes (Law et al., 2008; Marchler-Bauer et al., 2011; Quevillon et al., 2005). Multidrug transporters of the MFS have been reported to play a crucial role in the treatment of infectious disease. They can handle a wide range of cytotoxic compounds even when these are structurally and electrically dissimilar (Lewinson et al., 2006). Another protein predicted to have transport activity is HP B9WIH5, which shows homology with Mae1 of Schizosaccharomyces pombe and Ss1 of S. cerevisiae. Mae1 is a malate transporter, whereas Ss1 is reported to function as a sulfite efflux pump in yeast (Quevillon et al., 2005; Vahisalu et al., 2008). Generally, therapeutic drugs act on four main categories of molecular targets: enzymes, receptors, ion channels and transporters. Of these potential drug targets, 60–70% are membrane proteins, indicating the potential therapeutic relevance of these HPs (St Georgiev, 2000).

3.4.3. Adhesins

Adhesins in Candida have been reported to play a very important role in host cell recognition; they bind to carbohydrate-containing receptors during invasion (Sturtevant and Calderone, 1997). We used the FungalRV server to predict HPs with an adhesin-like signature, one of the important factors in pathogenesis (Krogfelt, 1991). Adherence of microorganisms to host tissue leads to tissue damage, invasion and dissemination. Among the 43 HPs, we found five (B9WFD7, B9WFE6, B9WFG8, B9WFT7 and B9WIH4) that showed adhesin character. These could be used as targets for vaccine development because adhesins of the fungal cell wall are primarily involved in adherence to host tissue, which is critical for colonization leading to invasion and damage of the host tissue. Within the group of adhesin proteins, HP B9WFD7 is a structural protein that may participate in hyphae formation and in host–pathogen interaction. Furthermore, such proteins may be potential targets for drug design and discovery because pathogenic fungi, including Candida spp., can adhere tightly to different surfaces such as human skin and endothelial and epithelial mucosal tissues (de Groot et al., 2013). We expect that future experimental studies focused on the functional characterization of novel putative adhesins will provide many new insights into their role in pathogenesis and into the host niche where the fungus lives and survives.


4. Discussion

HPs as a group still await experimental validation of their existence at the protein level. Hence, bioinformatic analysis of these protein sequences is needed to assign a tentative function (Lubec et al., 2005). Understanding HPs is of utmost importance for completing genomic and proteomic information. Furthermore, the detection of new HPs may reveal not only new structures but also new functions. Here we aligned the sequences of HPs against global and focal databases for homology searches. Searches for distinct motifs and domains provided clues to the possible functions of HPs, which are listed in Table 3. In addition, subcellular localization and signal peptide predictions were used to infer the actual functional site of each protein at the cellular level (Table 2). Complementary determination of physicochemical properties, such as amino acid composition and hydrophobicity scoring, is reported in Table S1.
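The physicochemical properties mentioned above can be illustrated with a minimal sketch that computes amino acid composition and a mean hydropathy score. The Kyte-Doolittle scale is used here as an assumption, since the text does not name the hydrophobicity scale applied; the function names and the toy sequence are likewise hypothetical.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values. The choice of scale is an assumption; the
# text does not say which hydrophobicity scale was used.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def aa_composition(sequence):
    """Percent composition of each residue present in the sequence."""
    seq = sequence.upper()
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def gravy(sequence):
    """Grand average of hydropathy: mean scale value over all residues."""
    seq = sequence.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


# Toy sequence (not a real HP), chosen only to make the arithmetic easy to follow.
toy = "GGLLYY"
composition = aa_composition(toy)
score = gravy(toy)
print(composition, round(score, 3))
```

A positive score suggests a more hydrophobic protein and a negative score a more hydrophilic one; residues outside the 20 standard amino acids would need explicit handling before this sketch is applied to database sequences.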

We first annotated the sequences of HPs on the basis of sequence similarity, followed by domain, motif and family searches. If all results suggested the same function, we assigned that function to the corresponding protein sequence (Table 4). Because the protein fold plays a significant role in function, fold prediction was also applied to further validate the predicted function (Table 5). Based on these results, we annotated the function of 27 HPs, which can be used as leads for designing experiments to evaluate the exact function of each gene.
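The assignment rule described above, in which a function is accepted only when all searches agree, can be sketched as follows. The tool labels, the function name and the example predictions are hypothetical and serve only to illustrate the logic; they are not taken from Table 4.

```python
def consensus_function(predictions):
    """Return the shared function when every search agrees, otherwise None.

    `predictions` maps a search type to its predicted function, with None
    standing for "no hit". A single missing or conflicting call prevents
    assignment, mirroring the all-results-agree rule in the text.
    """
    calls = list(predictions.values())
    if not calls or any(call is None for call in calls):
        return None
    first = calls[0]
    return first if all(call == first for call in calls) else None


# Hypothetical evidence for two proteins (labels and values are illustrative).
agreeing = {"similarity": "kinase", "domain": "kinase",
            "motif": "kinase", "family": "kinase"}
conflicting = {"similarity": "kinase", "domain": "transferase",
               "motif": "kinase", "family": None}

print(consensus_function(agreeing))     # all four searches agree
print(consensus_function(conflicting))  # disagreement, so no function assigned
```

Requiring full agreement trades coverage for confidence, which is consistent with a subset of the HPs being left without a predicted function.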

C. dubliniensis is the species most closely related to C. albicans, a yeast pathogenic to humans. We used an in silico approach to predict the function of 43 HPs. Five proteins are predicted to be DNA-binding proteins that may be involved in transcription regulation. Three proteins are predicted to function as RNA-binding proteins. Ten HPs showed catalytic activity, which may be important for pathogenesis. HP B9WFE4 showed ATP-binding activity, while B9WFD7 is predicted to be a structural protein. HPs B9WFH2 and B9WIH5 may act as transporter proteins. We identified five adhesin-like HPs that may be involved in host–pathogen interaction. We did not find sufficient evidence to predict the functions of 16 proteins.
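The counts summarized above can be checked against the functional categories of Table 6 with a short tally. The dictionary below transcribes the table; the variable and function names are illustrative.

```python
# Table 6 transcribed as a nested mapping: category -> function -> HP accessions.
TABLE_6 = {
    "Enzymatic activity": {
        "Hydrolase activity": ["B9WFS1", "B9WFU9"],
        "Phosphatase activity": ["B9WFG4"],
        "Transferase activity": ["B9WFR1", "B9WFR9"],
        "Kinase activity": ["B9WFH4", "B9WFW2"],
        "Oxidoreductase activity": ["B9WIF0", "B9WFU3"],
        "Peroxiredoxin activity": ["B9W9J1"],
    },
    "Binding protein": {
        "DNA binding": ["B9WFR8", "B9WFS2", "B9WFU7", "B9WFV3", "B9WFW8"],
        "RNA binding": ["B9WFE6", "B9WFS6", "B9WFG9"],
        "Protein binding": ["B9WFF1", "B9WFM3", "B9WFS4", "B9WIB2"],
        "ATP binding": ["B9WFE4"],
        "Phosphoinositide-binding": ["B9WFD2"],
    },
    "Other proteins": {
        "Transport activity": ["B9WFH2", "B9WIH5"],
        "Structural protein": ["B9WFD7"],
    },
}

TOTAL_HPS = 43  # number of HPs analysed, as stated in the text


def count_by_category(table):
    """Number of HPs listed under each top-level category."""
    return {cat: sum(len(ids) for ids in funcs.values())
            for cat, funcs in table.items()}


counts = count_by_category(TABLE_6)
annotated = sum(counts.values())
unannotated = TOTAL_HPS - annotated
print(counts, annotated, unannotated)
```

The tally gives 10 enzymes, 14 binding proteins and 3 other proteins, that is, 27 annotated HPs and 16 without a predicted function, in agreement with the figures given in the text.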

5. Conclusions

The in silico analysis described here provides a simple and accurate method for assigning function to various HPs of C. dubliniensis. Insufficient sequence similarity between some of the HPs and database entries hinders accurate functional prediction. Our study facilitates rapid identification of the hidden functions of HPs, which are potential therapeutic targets and may play a significant role in host–pathogen interactions. Once these HPs are established as novel drug/vaccine targets, further research on new inhibitors and vaccines can be extended to other clinically important pathogens.

Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.gene.2014.03.060.

Conflict of interest

We do not have any conflict of interest.

Acknowledgments

This work is supported by the Indian Council of Medical Research (BIC/12(04)/2012) to MIH and FA. AP is thankful to the UGC (BSR grant), Delhi, India, for providing the Dr. DS Kothari post-doctoral fellowship to carry out this work.


References

Adams, M.A., Suits, M.D., Zheng, J., Jia, Z., 2007. Piecing together the structure–function puzzle: experiences in structure-based functional annotation of hypothetical proteins. Proteomics 7, 2920–2932.

Altschul, S.F., Koonin, E.V., 1998. Iterated profile searches with PSI-BLAST—a tool for discovery in protein databases. Trends in Biochemical Sciences 23, 444–447.

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J., 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25, 3389–3402.

Andersen, J.S., Andersen, S.O., Hojrup, P., Roepstorff, P., 1993. Primary structure of a 14 kDa basic structural protein (Lm-76) from the cuticle of the migratory locust, Locusta migratoria. Insect Biochemistry and Molecular Biology 23, 391–402.

Bakali, H.M., Herman, M.D., Johnson, K.A., Kelly, A.A., Wieslander, A., Hallberg, B.M., Nordlund, P., 2007. Crystal structure of YegS, a homologue to the mammalian diacylglycerol kinases, reveals a novel regulatory metal binding site. Journal of Biological Chemistry 282, 19644–19652.

Bjornson, H.S., 1984. Enzymes associated with the survival and virulence of gram-negative anaerobes. Reviews of Infectious Diseases 6 (Suppl. 1), S21–S24.

Bork, P., Koonin, E.V., 1996. Protein sequence motifs. Current Opinion in Structural Biology 6, 366–376.

Bruckmann, A., Kunkel, W., Hartl, A., Wetzker, R., Eck, R., 2000. A phosphatidylinositol 3-kinase of Candida albicans influences adhesion, filamentous growth and virulence. Microbiology 146 (Pt 11), 2755–2764.

Bryk, R., Lima, C.D., Erdjument-Bromage, H., Tempst, P., Nathan, C., 2002. Metabolic enzymes of mycobacteria linked to antioxidant defense by a thioredoxin-like protein. Science 295, 1073–1077.

Chaudhuri, R., Ansari, F.A., Raghunandanan, M.V., Ramachandran, S., 2011. FungalRV: adhesin prediction and immunoinformatics portal for human fungal pathogens. BMC Genomics 12, 192.

Chen, Z., Bode, W., 1983. Refined 2.5 A X-ray crystal structure of the complex formed by porcine kallikrein A and the bovine pancreatic trypsin inhibitor. Crystallization, Patterson search, structure determination, refinement, structure and comparison with its components and with the bovine trypsin–pancreatic trypsin inhibitor complex. Journal of Molecular Biology 164, 283–311.

Chen, L., Xie, Q.W., Nathan, C., 1998. Alkyl hydroperoxide reductase subunit C (AhpC) protects bacterial and human cells against reactive nitrogen intermediates. Molecular Cell 1, 795–805.

Chen, Y., Yu, P., Luo, J., Jiang, Y., 2003. Secreted protein prediction system combining CJ-SPHMM, TMHMM, and PSORT. Mammalian Genome 14, 859–865.

Das, A.K., Helps, N.R., Cohen, P.T., Barford, D., 1996. Crystal structure of the protein serine/threonine phosphatase 2C at 2.0 A resolution. EMBO Journal 15, 6798–6809.

de Groot, P.W., Bader, O., de Boer, A.D., Weig, M., Chauhan, N., 2013. Adhesins in human fungal pathogens: glue with plenty of stick. Eukaryotic Cell 12, 470–481.

Desler, C., Suravajhala, P., Sanderhoff, M., Rasmussen, M., Rasmussen, L.J., 2009. In silico screening for functional candidates amongst hypothetical proteins. BMC Bioinformatics 10, 289.

Donaldson, K.M., Yin, H., Gekakis, N., Supek, F., Joazeiro, C.A., 2003. Ubiquitin signals protein trafficking via interaction with a novel ubiquitin binding domain in the membrane fusion regulator, Vps9p. Current Biology 13, 258–262.

Eisenstein, E., Gilliland, G.L., Herzberg, O., Moult, J., Orban, J., Poljak, R.J., Banerjei, L., Richardson, D., Howard, A.J., 2000. Biological function made crystal clear — annotation of hypothetical proteins via structural genomics. Current Opinion in Biotechnology 11, 25–30.

Ellgaard, L., Ruddock, L.W., 2005. The human protein disulphide isomerase family: substrate interactions and functional properties. EMBO Reports 6, 28–32.

Eulberg, D., Lakner, S., Golovleva, L.A., Schlomann, M., 1998. Characterization of a protocatechuate catabolic gene cluster from Rhodococcus opacus 1CP: evidence for a merged enzyme with 4-carboxymuconolactone-decarboxylating and 3-oxoadipate enol-lactone-hydrolyzing activity. Journal of Bacteriology 180, 1072–1081.

Ferrari, D.M., Soling, H.D., 1999. The protein disulphide-isomerase family: unravelling a string of folds. Biochemical Journal 339 (Pt 1), 1–10.

Gafny, R., Lapidot, M., Berna, A., Holt, C.A., Deom, C.M., Beachy, R.N., 1992. Effects of terminal deletion mutations on function of the movement protein of tobacco mosaic virus. Virology 187, 499–507.

Galperin, M.Y., Koonin, E.V., 2004. ‘Conserved hypothetical’ proteins: prioritization of targets for experimental study. Nucleic Acids Research 32, 5452–5463.

Garcera, A., Casas, C., Herrero, E., 2010. Expression of Candida albicans glutathione transferases is induced inside phagocytes and upon diverse environmental stresses. FEMS Yeast Research 10, 422–431.

Hassan, M.I., Kumar, V., Singh, T.P., Yadav, S., 2007a. Structural model of human PSA: a target for prostate cancer therapy. Chemical Biology & Drug Design 70, 261–267.

Hassan, M.I., Kumar, V., Somvanshi, R.K., Dey, S., Singh, T.P., Yadav, S., 2007b. Structure-guided design of peptidic ligand for human prostate specific antigen. Journal of Peptide Science 13, 849–855.

Hirokawa, T., Boon-Chieng, S., Mitaku, S., 1998. SOSUI: classification and secondary structure prediction system for membrane proteins. Bioinformatics 14, 378–379.

Jackson, A.P., Gamble, J.A., Yeomans, T., Moran, G.P., Saunders, D., Harris, D., Aslett, M., Barrell, J.F., Butler, G., Citiulo, F., et al., 2009. Comparative genomics of the fungal pathogens Candida dubliniensis and Candida albicans. Genome Research 19, 2231–2244.

Krogfelt, K.A., 1991. Bacterial adhesion: genetics, biogenesis, and role in pathogenesis of fimbrial adhesins of Escherichia coli. Reviews of Infectious Diseases 13, 721–735.

Kuhn, P., Tarentino, A.L., Plummer Jr., T.H., Van Roey, P., 1994. Crystal structure of peptide-N4-(N-acetyl-beta-D-glucosaminyl)asparagine amidase F at 2.2-A resolution. Biochemistry 33, 11699–11706.


Kwan, A.H., Winefield, R.D., Sunde, M., Matthews, J.M., Haverkamp, R.G., Templeton, M.D., Mackay, J.P., 2006. Structural basis for rodlet assembly in fungal hydrophobins. Proceedings of the National Academy of Sciences of the United States of America 103, 3621–3626.

Lamb, J.R., Tugendreich, S., Hieter, P., 1995. Tetratrico peptide repeat interactions: to TPR or not to TPR? Trends in Biochemical Sciences 20, 257–259.

Law, C.J., Maloney, P.C., Wang, D.N., 2008. Ins and outs of major facilitator superfamily antiporters. Annual Review of Microbiology 62, 289–305.

Lee, F.J., Moss, J., 1993. An RNA-binding protein gene (RBP1) of Saccharomyces cerevisiae encodes a putative glucose-repressible protein containing two RNA recognition motifs. Journal of Biological Chemistry 268, 15080–15087.

Letunic, I., Doerks, T., Bork, P., 2012. SMART 7: recent updates to the protein domain annotation resource. Nucleic Acids Research 40, D302–D305.

Lewinson, O., Adler, J., Sigal, N., Bibi, E., 2006. Promiscuity in multidrug recognition and transport: the bacterial MFS Mdr transporters. Molecular Microbiology 61, 277–284.

Lim, R., Cohen, S.S., 1966. D-phosphoarabinoisomerase and D-ribulokinase in Escherichia coli. Journal of Biological Chemistry 241, 4304–4315.

Loewenstein, Y., Raimondo, D., Redfern, O.C., Watson, J., Frishman, D., Linial, M., Orengo, C., Thornton, J., Tramontano, A., 2009. Protein function annotation by homology-based inference. Genome Biology 10, 207.

Lubec, G., Afjehi-Sadat, L., Yang, J.W., John, J.P., 2005. Searching for hypothetical proteins: theory and practice based upon original data and literature. Progress in Neurobiology 77, 90–127.

Maeda, T., Tsai, A.Y., Saito, H., 1993. Mutations in a protein tyrosine phosphatase gene (PTP2) and a protein serine/threonine phosphatase gene (PTC1) cause a synthetic growth defect in Saccharomyces cerevisiae. Molecular and Cellular Biology 13, 5408–5417.

Marchler-Bauer, A., Lu, S., Anderson, J.B., Chitsaz, F., Derbyshire, M.K., DeWeese-Scott, C., Fong, J.H., Geer, L.Y., Geer, R.C., Gonzales, N.R., et al., 2011. CDD: a Conserved Domain Database for the functional annotation of proteins. Nucleic Acids Research 39, D225–D229.

Marmorstein, R., Carey, M., Ptashne, M., Harrison, S.C., 1992. DNA recognition by GAL4: structure of a protein–DNA complex. Nature 356, 408–414.

Mazandu, G.K., Mulder, N.J., 2012. Function prediction and analysis of Mycobacterium tuberculosis hypothetical proteins. International Journal of Molecular Sciences 13, 7283–7302.

Monge, R.A., Roman, E., Nombela, C., Pla, J., 2006. The MAP kinase signal transduction network in Candida albicans. Microbiology 152, 905–912.

Moran, G.P., Sullivan, D.J., Henman, M.C., McCreary, C.E., Harrington, B.J., Shanley, D.B., Coleman, D.C., 1997. Antifungal drug susceptibilities of oral Candida dubliniensis isolates from human immunodeficiency virus (HIV)-infected and non-HIV-infected subjects and generation of stable fluconazole-resistant derivatives in vitro. Antimicrobial Agents and Chemotherapy 41, 617–623.

Nakai, K., Horton, P., 1999. PSORT: a program for detecting sorting signals in proteins and predicting their subcellular localization. Trends in Biochemical Sciences 24, 34–36.

Nimrod, G., Schushan, M., Steinberg, D.M., Ben-Tal, N., 2008. Detection of functionally important regions in “hypothetical proteins” of known structure. Structure 16, 1755–1763.

O'Connor, L., Caplice, N., Coleman, D.C., Sullivan, D.J., Moran, G.P., 2010. Differential filamentation of Candida albicans and Candida dubliniensis is governed by nutrient regulation of UME6 expression. Eukaryotic Cell 9, 1383–1397.

Ohishi, K., Inoue, N., Kinoshita, T., 2001. PIG-S and PIG-T, essential for GPI anchor attachment to proteins, form a complex with GAA1 and GPI8. EMBO Journal 20, 4088–4098.

Ollis, D.L., Cheah, E., Cygler, M., Dijkstra, B., Frolow, F., Franken, S.M., Harel, M., Remington, S.J., Silman, I., Schrag, J., et al., 1992. The alpha/beta hydrolase fold. Protein Engineering 5, 197–211.


Orengo, C.A., Michie, A.D., Jones, S., Jones, D.T., Swindells, M.B., Thornton, J.M., 1997. CATH—a hierarchic classification of protein domain structures. Structure 5, 1093–1108.

Petersen, T.N., Brunak, S., von Heijne, G., Nielsen, H., 2011. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nature Methods 8, 785–786.

Powell, S., Szklarczyk, D., Trachana, K., Roth, A., Kuhn, M., Muller, J., Arnold, R., Rattei, T., Letunic, I., Doerks, T., et al., 2012. eggNOG v3.0: orthologous groups covering 1133 organisms at 41 different taxonomic ranges. Nucleic Acids Research 40, D284–D289.

Quevillon, E., Silventoinen, V., Pillai, S., Harte, N., Mulder, N., Apweiler, R., Lopez, R., 2005. InterProScan: protein domains identifier. Nucleic Acids Research 33, W116–W120.

Ren, X., Aleshin, M., Jo, W.J., Dills, R., Kalman, D.A., Vulpe, C.D., Smith, M.T., Zhang, L., 2011. Involvement of N-6 adenine-specific DNA methyltransferase 1 (N6AMT1) in arsenic biomethylation and its role in arsenic-induced toxicity. Environmental Health Perspectives 119, 771–777.

Ricagno, S., Jonsson, S., Richards, N., Lindqvist, Y., 2003. Formyl-CoA transferase encloses the CoA binding site at the interface of an interlocked dimer. EMBO Journal 22, 3210–3219.

Rosadini, C.V., Wong, S.M., Akerley, B.J., 2008. The periplasmic disulfide oxidoreductase DsbA contributes to Haemophilus influenzae pathogenesis. Infection and Immunity 76, 1498–1508.

Sakane, F., Imai, S., Kai, M., Yasuda, S., Kanoh, H., 2007. Diacylglycerol kinases: why so many of them? Biochimica et Biophysica Acta 1771, 793–806.

Schaller, M., Borelli, C., Korting, H.C., Hube, B., 2005. Hydrolytic enzymes as virulence factors of Candida albicans. Mycoses 48, 365–377.

Sebti, A., Kiehn, T.E., Perlin, D., Chaturvedi, V., Wong, M., Doney, A., Park, S., Sepkowitz, K.A., 2001. Candida dubliniensis at a cancer center. Clinical Infectious Diseases 32, 1034–1038.

Shahbaaz, M., Hassan, M.I., Ahmad, F., 2013. Functional annotation of conserved hypothetical proteins from Haemophilus influenzae Rd KW20. PLoS One 8, e84263.

St Georgiev, V., 2000. Membrane transporters and antifungal drug resistance. Current Drug Targets 1, 261–284.

Sturtevant, J., Calderone, R., 1997. Candida albicans adhesins: biochemical aspects and virulence. Revista Iberoamericana de Micología 14, 90–97.

Sullivan, D., Coleman, D., 1998. Candida dubliniensis: characteristics and identification. Journal of Clinical Microbiology 36, 329–334.

Sullivan, D.J., Moran, G.P., Coleman, D.C., 2005. Candida dubliniensis: ten years on. FEMS Microbiology Letters 253, 9–17.

Thakur, P.K., Hassan, I., 2011. Discovering a potent small molecule inhibitor for gankyrin using de novo drug design approach. International Journal of Computational Biology and Drug Design 4, 373–386.

Thakur, P.K., Kumar, J., Ray, D., Anjum, F., Hassan, M.I., 2013. Search of potential inhibitor against New Delhi metallo-beta-lactamase 1 from a series of antibacterial natural compounds. Journal of Natural Science, Biology and Medicine 4, 51–56.

Thomas, P.D., Campbell, M.J., Kejariwal, A., Mi, H., Karlak, B., Daverman, R., Diemer, K., Muruganujan, A., Narechania, A., 2003. PANTHER: a library of protein families and subfamilies indexed by function. Genome Research 13, 2129–2141.

Thompson, J.D., Gibson, T.J., Higgins, D.G., 2002. Multiple sequence alignment using ClustalW and ClustalX. Current Protocols in Bioinformatics (Chapter 2, Unit 2.3).

Vahisalu, T., Kollist, H., Wang, Y.F., Nishimura, N., Chan, W.Y., Valerio, G., Lamminmaki, A., Brosche, M., Moldau, H., Desikan, R., et al., 2008. SLAC1 is required for plant guard cell S-type anion channel function in stomatal signalling. Nature 452, 487–491.

Zarembinski, T.I., Kim, Y., Peterson, K., Christendat, D., Dharamsi, A., Arrowsmith, C.H., Edwards, A.M., Joachimiak, A., 2003. Deep trefoil knot implicated in RNA binding found in an archaebacterial protein. Proteins 50, 177–183.
