ORIGINAL ARTICLE

Preparation and Analysis of an Expressed Sequence Tag Library from the Toxic Dinoflagellate Alexandrium catenella

Paulina Uribe & Daniela Fuentes & Jorge Valdés & Amir Shmaryahu & Alicia Zúñiga & David Holmes & Pablo D. T. Valenzuela

Received: 12 June 2007 / Accepted: 10 April 2008 © Springer Science + Business Media, LLC 2008

Abstract Dinoflagellates of the genus Alexandrium are photosynthetic microalgae of great importance because of the impact of some toxic species on the shellfish aquaculture industry. Alexandrium catenella is the species responsible for paralytic shellfish poisoning in Chile and other geographical areas. We have constructed a cDNA library from midexponential cells of A. catenella grown in culture free of associated bacteria and sequenced 10,850 expressed sequence tags (ESTs) that were assembled into 1,021 contigs and 5,475 singletons for a total of 6,496 unigenes. Approximately 41.6% of the unigenes showed similarity to genes with predicted function. A significant number of unigenes showed similarity with genes from other dinoflagellates, plants, and other protists. Among the identified genes, the most highly expressed are those coding for proteins of luminescence, carbohydrate metabolism, and photosynthesis. The sequences of 9,847 ESTs have been deposited in GenBank (accession numbers EX454357–EX464203).

Keywords Alexandrium catenella · cDNA sequencing · Red tide microalgae · Toxic dinoflagellate

Introduction

Dinoflagellates represent a unique and important group of organisms in the marine environment in terms of their numbers and diversity, as well as their ecological and physiological significance. They commonly occur as free-living, photosynthetic, marine unicellular algae, but also include endosymbiotic, parasitic, heterotrophic, and freshwater taxa. Some species produce potent toxins that can be accumulated by shellfish and affect humans and marine mammals. They form harmful blooms or “red tides,” in which cell numbers exceed one million cells per liter of seawater, with a significant economic impact and public health concern in different geographical areas throughout the world (Scholin et al. 1995; Hallegraeff 1993). Dinoflagellates are the only photosynthetic organisms capable of bioluminescence (Sweeney 1987). They are also unique among eukaryotes in many other biological and morphological characteristics. Their DNA content is higher than that of other eukaryotes [from 3 to 250 pg/cell, or approximately 3,000–215,000 megabases (Spector 1984; Triplett et al. 1993; Santos and Coffroth 2003)]. At the upper end, this is more than 80 times the size of the human genome (Lin 2006). Their chromosomes consist of permanently condensed, genetically inactive central regions with peripheral loops of B-DNA that protrude from this core and comprise the actively transcribed DNA (Sigee 1984; Anderson et al. 1992; Bhaud et al. 1999).

Dinoflagellates of the genus Alexandrium such as Alexandrium catenella cause paralytic shellfish poisoning through saxitoxin production in Chile and in many geographical areas of the world. In Chile, the first documented toxic bloom was reported in 1972 in Magallanes (Guzmán and Campodónico 1975). Since then, the dominant toxic dinoflagellate species in Chilean southern coastal areas and estuaries during toxic bloom events has been identified as A. catenella (Guzmán and Campodónico 1978).

Mar Biotechnol
DOI 10.1007/s10126-008-9107-8

P. Uribe · D. Fuentes · A. Zúñiga · P. D. T. Valenzuela (*)
Fundación Ciencia para la Vida, Av. Zañartu 1482, Ñuñoa, Santiago, Chile
e-mail: [email protected]

J. Valdés · A. Shmaryahu · D. Holmes · P. D. T. Valenzuela
Instituto MIFAB, Zañartu 1482, Ñuñoa, Santiago, Chile

J. Valdés · A. Shmaryahu · D. Holmes
Center for Bioinformatics and Genome Biology, Zañartu 1482, Ñuñoa, Santiago, Chile

The study of the molecular mechanisms that regulate growth, toxicity, photosynthesis, luminescence, and other circadian-controlled genes of A. catenella is of critical importance for understanding the physiological mechanisms and bloom formation capacity of Alexandrium species. However, there have been few studies of the molecular biology of A. catenella. Sequencing of complementary DNA libraries to generate expressed sequence tags (ESTs) is a reasonable approach for discovering expressed genes. ESTs can be used as markers for genes expressed under specific conditions, for predicting protein families, and for the development of expression systems for new proteins and their functions. Here, we have developed an EST library of A. catenella strain ACC07, isolated from Chilean waters, and have carried out large-scale sequencing to yield an EST database containing 10,850 ESTs and 6,496 unique genes. This database provides an important genomic resource for scientists working on the genus Alexandrium and related dinoflagellates.

Materials and Methods

Strains and Media Alexandrium catenella clone ACC07, isolated in Aysén, Chile, in 1994, was used. Cells were grown in f/2 medium (Guillard 1995) at 16°C in a 10:14 light/dark photoperiod. Axenic cultures were obtained according to Uribe and Espejo (2003). Briefly, cells were subjected to sequential washing and filtration through an 11-µm pore-size nylon mesh with 0.05 mg/ml gentamicin and 0.2 mg/ml penicillin G. Bacterial presence was determined by direct or epifluorescence microscopy after staining with acridine orange (Imai 1987; Kuwae and Hosokawa 1999). This procedure allowed the detection of less than one bacterium per dinoflagellate. The cell extracts of A. catenella strain ACC07 contained approximately 5–8 femtomoles of saxitoxin equivalents/cell.

cDNA Library Preparation Approximately 6 × 10⁶ cells from an exponential phase culture were collected by centrifugation at 1,000 × g for 5 min during the light phase and were broken by four successive cycles of freezing, grinding, and thawing. Approximately 400 µg of total RNA was extracted using Trizol (Gibco BRL, Life Technologies, Gaithersburg, MD, USA), according to the manufacturer’s directions, and quantified spectrophotometrically. Poly(A)+ mRNA was isolated with the Poly (A) Quick mRNA isolation kit (Stratagene, La Jolla, CA, USA). cDNA was prepared from approximately 5 µg of poly(A)+ mRNA and cloned using the vector pExpress 1, exploiting the NotI and EcoRV restriction endonuclease sites. Double-strand cDNA synthesis was performed according to the manufacturer’s directions, and the product was quantified spectrophotometrically. The cDNA libraries were not normalized. Sequencing reactions were carried out from the 5′ end of the cDNA insert using the universal primers M13FWD (5′-gtaaaacgacggccagt-3′) and M13REV (5′-caggaaacagctatgac-3′).

Computational Sequence Analysis and EST Assembly Vector-derived, ribosomal, and ambiguous sequences were removed from the collected EST sequences. EST sequences were assembled into clusters with a minimum of 95% identity over an overlap of at least 50 bp using the CAP3 program (Huang and Madan 1999). The resulting clusters and singletons were designated as unigenes and were then subjected to similarity searches against the National Center for Biotechnology Information nonredundant protein database, using the BLASTX algorithm (Altschul et al. 1990). Initially, sequence similarities were considered significant when the E value was below e−5 at the amino acid sequence level. However, a stricter criterion with a cut-off E value of e−20 or less was also used in the analysis. The InterProScan (Mulder et al. 2007), gene ontology (The Gene Ontology Consortium 2007), and clusters of orthologous groups (COGs) (Tatusov et al. 2003) databases were used to infer the functional classification of the predicted proteins.
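The effect of the two similarity thresholds can be illustrated with a minimal Python sketch. It assumes the BLASTX output has already been reduced to (query, subject, E value) tuples; the hit list, the identifiers, and the function name `best_hits` are hypothetical and are not taken from the study.

```python
# Hypothetical BLASTX hits: (unigene id, subject description, E value).
# All values are invented for illustration only.
HITS = [
    ("contig_0001", "luciferin-binding protein", 1e-150),
    ("contig_0001", "luciferase", 3e-12),
    ("contig_0002", "actin", 2e-8),
    ("singleton_0003", "hypothetical protein", 0.5),
]


def best_hits(hits, max_evalue):
    """Keep, for each query, the lowest-E-value hit at or below max_evalue."""
    best = {}
    for query, subject, evalue in hits:
        if evalue > max_evalue:
            continue  # hit is not significant at this cut-off
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


permissive = best_hits(HITS, 1e-5)   # initial criterion
strict = best_hits(HITS, 1e-20)      # stricter criterion

print(sorted(permissive))
print(sorted(strict))
```

With these toy data, the permissive cut-off retains two unigenes and the strict cut-off retains one, which mirrors how tightening the threshold reduces the number of annotated unigenes while raising confidence in each assignment.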

Results and Discussion

Characteristics of the A. catenella cDNA Library

The cDNA library obtained had a titer of 1.1 × 10⁶ colony forming units per milliliter for a total of 1 × 10⁷ primary recombinants. Blue/white plaque identification following plating of an aliquot of the library revealed 99% recombinant plaques. The quality of the library was assessed by examining the insert size of 768 (2 × 384-well plates) randomly selected recombinant plaques. The average insert size was 1.7 kb, a value similar to that of a recent cDNA library of Karenia brevis (Lidie et al. 2005). The average size of the sequenced clones was 763 base pairs, and about 83% of the sequenced cDNA clones contained inserts that were longer than the single sequence read. The global G+C content for these ESTs was 56.8%. This value is similar to that obtained for the coding regions of Alexandrium tamarense (60.8%) (Hackett et al. 2005) and in the range of the values obtained for other dinoflagellates such as K. brevis (51%), Lingulodinium polyedrum (59.0%), Amphidinium carterae (50.4%), and Crypthecodinium cohnii (50%) (Lidie et al. 2005 and references therein).

Analysis of the codon usage revealed a predominance of G (35.1%) and C (44.8%) at the third position, similar to that obtained for A. tamarense (37.2% and 40.7%, respectively). The most frequent stop codon was TGA (72.7%), compared to TAA and TAG (6.5% and 20.7%, respectively).
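As an illustration of how such composition figures are derived, the following Python sketch computes the overall G+C content of a sequence and the base usage at the third codon position. The function names and the toy coding sequence are hypothetical; the sketch assumes an in-frame coding sequence and is not the pipeline used in the study.

```python
from collections import Counter


def gc_content(seq):
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def third_position_usage(cds):
    """Percentage of each base at the third position of in-frame codons."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    thirds = Counter(cds[i + 2] for i in range(0, usable, 3))
    total = sum(thirds.values())
    return {base: 100.0 * thirds.get(base, 0) / total for base in "ACGT"}


# Toy in-frame sequence (invented): ATG GCC GAG CTG TGA
toy_cds = "ATGGCCGAGCTGTGA"
print(gc_content(toy_cds))
print(third_position_usage(toy_cds))
```

Applied to a full set of predicted coding regions, the same counts would be accumulated over all codons before converting to percentages.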

Generation and Annotation of Expressed Sequence Tags

EST sequences were produced from the cDNA library and scanned visually to confirm the overall quality of peak shape and its correspondence with base identification. After the cleaning process, the average length per EST of the remaining sequences (9,847) was 736 base pairs and the Phred quality value was larger than 20. The sequences were assembled into 1,021 contigs (clusters of assembled ESTs) and 5,475 singletons (sequences found only once) (Table 1). The sequences of 9,847 ESTs have been deposited in GenBank with accession numbers EX454357–EX464203.
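For readers unfamiliar with the scale, a Phred quality value Q is related to the estimated base-calling error probability P by Q = −10 log10(P), so a value of 20 corresponds to roughly one miscalled base in 100. The short Python sketch below shows the conversion in both directions; the function names are illustrative only.

```python
import math


def phred_to_error_probability(q):
    """Estimated probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Phred score corresponding to an error probability p."""
    return -10 * math.log10(p)


print(phred_to_error_probability(20))
print(error_probability_to_phred(0.001))
```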

Contigs comprised from 2 to 438 ESTs each. The percentage of unigene sequences with similarity to entries in the GenBank database was 41.6%. This EST collection constitutes one of the largest dinoflagellate libraries deposited (Lidie et al. 2005; Hackett et al. 2005; Tanikawa et al. 2004). The total number of unigenes was 6,496, corresponding to about 60% of the total sequences obtained (Table 1). The ratio of sequenced ESTs to the number of unigenes is similar to that reported for other dinoflagellate EST libraries.
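The assembly counts can be cross-checked with a few lines of arithmetic. The Python sketch below uses only figures stated in the text (10,850 sequenced ESTs, 9,847 valid ESTs, 1,021 contigs, 5,475 singletons); the function name and the assumption that every valid EST ended up either in a contig or as a singleton are ours.

```python
def library_summary(total_ests, valid_ests, contigs, singletons):
    """Derive simple redundancy statistics from EST assembly counts."""
    unigenes = contigs + singletons
    # Assumption: every valid EST is either a singleton or part of a contig.
    ests_in_contigs = valid_ests - singletons
    return {
        "unigenes": unigenes,
        "unigenes_pct_of_total": 100.0 * unigenes / total_ests,
        "unigenes_pct_of_valid": 100.0 * unigenes / valid_ests,
        "ests_in_contigs": ests_in_contigs,
        "mean_ests_per_contig": ests_in_contigs / contigs,
    }


summary = library_summary(
    total_ests=10850, valid_ests=9847, contigs=1021, singletons=5475
)
for key, value in summary.items():
    print(key, round(value, 1))
```

Under that assumption the contigs contain on average a little over four ESTs each, although the distribution is highly skewed given that the largest contig holds 438 ESTs.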

Using a cut-off E value of e−5 or less, a total of 5,443 ESTs corresponding to 2,700 unigenes were found to have similarity to previously identified genes from a wide variety of organisms. Alexandrium catenella sequences were classified according to the organism with the best protein sequence hit. A significant proportion of the ESTs show similarity to genes of dinoflagellates (32%), plants (15%), and other protists (protozoa, ciliates, and other microalgae, 13%). Different percentages were found in a recent EST library analysis of the dinoflagellate L. polyedrum, where the groups most frequently found were land plants and animals (21% and 16%, respectively), whereas similarities with prokaryotes, flagellates, and protozoa were 14%, 14%, and 13%, respectively (Tanikawa et al. 2004). A similar analysis was carried out using an E value of e−20 or less. In this case, a total of 3,460 ESTs corresponding to 1,546 unigenes were found to have similarity to previously identified genes. As shown in Fig. 1, a large proportion of the ESTs show a high level of similarity to genes of dinoflagellates (47%), plants (18.1%), and other protists (protozoa, ciliates, and other microalgae, 13.4%).

The unigenes could be assigned to known COGs. Among ESTs corresponding to cellular processes, the most represented groups of proteins are those related to luminescence (14.5%), carbohydrate metabolism (13.6%), amino acid metabolism (12.6%), protein modification (10.9%), and photosynthesis (8.3%). Using an E value of e−20 or less, the most represented groups are those related to luminescence (18.4%), carbohydrate metabolism (14.5%), amino acid metabolism (15%), protein modification (11.7%), and photosynthesis (8.4%) (Fig. 2).

Among the first two categories, the majority of the predicted proteins corresponded to those from dinoflagellates. Proteins from plants were the most represented in categories such as amino acid and nucleic acid metabolism. On the other hand, proteins from protozoa were the most represented in the translation and cell cycle categories. Some categories of proteins, such as transport and those with noncharacterized function, were similarly distributed among different taxonomic groups of eukaryotes and prokaryotes. This distribution of categories of proteins among different taxonomic groups was similar when E values of e−30 or less were considered. In summary, using an E value of e−20 as cut-off gave a higher degree of specificity, resulting in an increased percentage of proteins from dinoflagellates, protozoa, plants, and other microalgae.

Highly Represented Genes

The contigs containing the highest number of ESTs (analyzed with an E value of e−20) are listed in Table 2. The sequence coding for luciferin-binding protein (LBP) was the most abundant transcript in the library with 80 unigenes (3%) representing 539 ESTs (15.6% of the total ESTs).

This gene was also highly expressed in L. polyedrum (Machabée et al. 1994) and accounted for 4% of the sequences in a previous study of a normalized EST library of this dinoflagellate during the night phase (Tanikawa et al. 2004). Similar results were reported in A. tamarense (Hackett et al. 2005). Recently, in a global transcriptional profiling of the toxic dinoflagellate Alexandrium fundyense, four of the 15 signature sequences matched the LBP gene (Erdner and Anderson 2006). In the present study, the sequences of the two luminescence proteins of A. catenella were subjected to a more detailed analysis.

Table 1 Overview of the results from the A. catenella genomic library

                              Number of sequences
Total ESTs sequenced          10,859
Total valid ESTs              9,847
Average length per EST        736 bp
Number of contigs             1,021
Number of singletons          5,475
Total unigenes                6,496
Percentage known unigenes     41.6

Fig. 1 Taxonomic group distribution of targets with the best hit by A. catenella ESTs considering an E value of less than 10⁻²⁰ to the National Center for Biotechnology Information protein nonredundant database

Fig. 2 Distribution of A. catenella ESTs into the GO categories of cellular processes

Luciferin-binding Protein

The complete sequence of the LBP coding region of A. catenella ACC07 was obtained. It comprises 2,194 nucleotides, corresponding to 663 amino acids. Sequencing the genomic coding region indicated the lack of introns, and after expression in bacteria, an 80-kDa protein was obtained (data not shown). The LBP contains four domains with low identity (15%) between them. The highest similarity in the EST database was found with A. tamarense (Table 3). At the amino acid level, the highest similarity (76%) was found with L. polyedrum. As found previously in Lingulodinium, the amino terminal region of approximately 100 amino acids of the LBP of A. catenella is similar (50%) to the equivalent region of luciferase (LCF). This is the first complete sequence of LBP reported in a toxic strain of the genus Alexandrium (accession number EU236684).

Luciferase

Another highly expressed luminescence protein was the LCF. Complete sequence analysis of the 3,476 nucleotides coding for the A. catenella enzyme showed that the most closely related sequences were those from A. tamarense and A. affine (94% identity) (Liu et al. 2004). The sequence contains no introns and presents three domains with an identity of 76% between them, a significantly lower value than the identity obtained when these domains are compared to other dinoflagellate species (Liu et al. 2004). Internal regions of each domain are the most conserved, corresponding to the probable catalytic site of this enzyme. Four conserved histidines are present at the following positions within each domain: first domain (D1): H138, H148, H163, and H169; second domain (D2): H512, H525, H540, and H546; and third domain (D3): H891, H901, H916, and H922. These histidines were previously reported in L. polyedrum and are probably related to the pH regulation of the activity of this enzyme (Li et al. 1997). The first and the third domains of the LCF were expressed in bacteria and the products were 60 and 45 kDa, respectively. The three domains of this protein have been shown to be functional in L. polyedrum (Li et al. 1997).

The synthesis of the two luminescence proteins LCF and LBP of Lingulodinium is regulated translationally; their mRNA levels remain constant over the circadian cycle (Machabée et al. 1994). Remarkably, both LCF and LBP in L. polyedrum are destroyed at the end of the night phase and then resynthesized in the next cycle. Moreover, the scintillons themselves are broken down and reformed each day (Machabée et al. 1994).

Although the ecological function of luminescence in dinoflagellates has not been determined, it is probably related to predation avoidance and communication (Esaias and Curl 1972; Abrahams and Townsend 1993). Taking into account that the luminescence proteins are among the most highly expressed in A. catenella, a high proportion of the energy of this dinoflagellate is probably dedicated to this particular physiological response. Taken together, the specific characteristics of the luminescent proteins and their expression patterns in a dinoflagellate that causes paralytic shellfish poisoning, such as A. catenella, are of special relevance for unveiling the mechanisms of bloom formation in a toxin-producing species. These specific features could be useful for the development of new tools for the detection and localization of this toxic species using bio-optical instruments (Seliger et al. 1961; Widder et al. 1993).

Also highly expressed are transcripts that show a very high similarity to the enzymes S-adenosyl-L-homocysteine hydrolase and S-adenosylmethionine synthetase 2 (E values of e−129 and e−112, respectively). These enzymes are involved in methylation reactions that play a major role in the modification of a large variety of acceptor molecules, such as lipids, polysaccharides, nucleic acids, proteins, and secondary plant products (reviewed by Giovanelli 1987). In eukaryotes, DNA methylation has been implicated in the control of several cellular processes, including differentiation, gene regulation, and embryonic development (Cheng 1995). The high expression level of genes that matched the sequences of the two heat shock proteins HSP90 and HSP70 was also remarkable. These proteins participate in various cellular processes including signal transduction, protein folding, protein degradation, and morphological evolution (Lindquist and Craig 1988). HSP70 proteins can be found in different cellular compartments, have a role in the disassembly of clathrin cages, and also participate in the posttranslational transmembrane targeting of proteins to cellular organelles (Craig 1989). The sequences coding for these proteins have also been found at high frequency in the EST library of the dinoflagellate A. tamarense (Hackett et al. 2005).

Table 2 Most highly represented ESTs in A. catenella cDNA library

Protein                                                    Number of ESTs  %
LBP                                                        539             15.6
S-adenosyl-L-homocysteine hydrolase                        169             4.9
Glyceraldehyde-3-phosphate dehydrogenase isoform 2 (GPDH)  128             3.7
S-adenosylmethionine synthetase 2                          105             3.0
Actin                                                      87              2.5
EF-1 alpha-like protein                                    80              2.3
Fumarate reductase                                         71              2.1
Peridinin chl a binding protein                            62              1.8
Hsp90                                                      62              1.8
Phosphoglycerate kinase                                    59              1.7
Ribonucleoside-diphosphate reductase R2                    56              1.6
Hsp70                                                      54              1.6
Chloroplast phosphoribulokinase                            49              1.4
Polyubiquitin                                              42              1.2
Chloroplast light harvesting complex protein               40              1.2
LCF                                                        35              1.0
Light-harvesting polyprotein precursor                     30              0.9

Photosynthesis and Light Harvesting Genes

None of the 15 known plastid-encoded genes from peridinin-producing dinoflagellates were represented among the ESTs of the library (Zhang et al. 1999). Thus, the 30 photosynthesis unigenes represented in the A. catenella cDNA library are probably encoded in the nucleus (Table 3). All these plastid protein sequences contain tripartite N-terminal targeting signals that have been shown to direct the trafficking of these proteins through the different membranes of the dinoflagellate secondary plastids. The distribution of these signal elements in A. catenella plastid protein sequences was equivalent to that observed in the dinoflagellate Heterocapsa triquetra (Patron et al. 2005).

The origin of these nuclear-encoded plastid protein sequences is suggested by their relatively high similarity with those present in other peridinin-pigmented dinoflagellates (Table 3). The nuclear location of these genes can be verified by using the spliced leader sequence recently found in the nuclear-encoded mRNAs of dinoflagellates (Zhang et al. 2007). As expected, within this group, the majority (accession numbers: EX455192, EX455275, EX455467, EX455749, EX456053, EX456206, EX456301, EX456598, EX457854, EX458868, EX459236, EX462386, EX462908, and EX463406) belong to the related species A. tamarense (from the A. catenella–tamarense–fundyense species complex) (Scholin et al. 1995), followed by those from A. carterae, H. triquetra, L. polyedrum, and Symbiodinium sp. Only a few ESTs were similar to chloroplast sequences from the fucoxanthin-pigmented K. brevis, or from other organisms such as euglenoids, green algae, and stramenopiles, which have a different but parallel origin of the plastid proteins.

Table 3 Photosynthesis and light harvesting proteins of A. catenella EST library

GenBank accession  Function                                                     E value   % Identity  Organism
Photosynthesis
EX456598           Chloroplast photosystem II 12 kDa extrinsic protein (PsbU)   6.00E-64  6.00E-64    Alexandrium tamarense
EX455192           Photosystem II 23 kDa polypeptide (PsbP)                     E-121     86          Phakopsora pachyrhizi
EX458868           PSII cytochrome c550 oxygen-evolving (PsbV)                  E-108     91          Alexandrium tamarense
EX455275           Plastid oxygen evolving enhancer 1 precursor (PsbO)          4.00E-71  94          Alexandrium tamarense
EX456053           Chloroplast cytochrome f (PetA)                              E-116     90          Alexandrium tamarense
EX457854           Chloroplast ferredoxin (PetF)                                9.00E-90  83          Alexandrium tamarense
EX455467           Chloroplast ferredoxin-NADP(+) reductase (PetH)              E-127     90          Heterocapsa triquetra
EX455749           Rieske iron–sulfur protein precursor (PetC)                  E-125     94          Alexandrium tamarense
EX463406           Photosystem I iron–sulfur center (PsaC)                      3.00E-87  98          Alexandrium tamarense
EX456301           Chloroplast photosystem I subunit XI (PsaL)                  3.00E-40  74          Heterocapsa triquetra
EX456206           PSI, ferredoxin-binding protein II (PsaD)                    3.00E-51  90          Symbiodinium sp.
EX462386           Chloroplast photosystem I, subunit III (PsaF)                E-101     93          Alexandrium tamarense
EX459236           Chloroplast ATP synthase gamma subunit (AtpC)                1.00E-91  89          Alexandrium tamarense
EX462908           Chloroplast ATP synthase subunit C (AtpH)                    4.00E-76  76          Alexandrium tamarense
EX462123           Chloroplast light harvesting complex protein                 2.00E-84  85          Lingulodinium polyedrum
EX460350           Peridinin-chlorophyll a-binding protein (PCP)                4.00E-75  89          Lingulodinium polyedrum
EX463946           Chloroplast phosphoribulokinase                              E-133     89          Amphidinium carterae
EX463746           Chloroplast transketolase                                    E-114     80          Euglena gracilis
EX462931           Cytosolic class II fructose bisphosphate aldolase            E-130     94          Heterocapsa triquetra
EX462518           Glyceraldehyde-3-phosphate dehydrogenase isoform 2           E-159     91          Symbiodinium sp.
EX455341           Phosphoglycerate kinase                                      E-142     88          Karenia brevis
EX455303           Ribose-5-phosphate isomerase                                 5.00E-73  90          Phaeodactylum tricornutum
EX461668           RuBisCO form II                                              E-153     96          Amphidinium carterae
EX461810           Triose-phosphate isomerase                                   E-116     87          Isochrysis galbana
Chlorophyll synthesis
EX458222           NADPH protochlorophyllide reductase                          E-132     77          Phaeodactylum tricornutum
EX455862           Magnesium chelatase H-subunit                                E-114     88          Ostreococcus lucimarinus
EX455291           Mg-protoporphyrin IX (ChlI)                                  2.00E-68  88          Amphidinium carterae
EX455321           NADPH-protochlorophyllide oxidoreductase                     E-127     74          Phaeodactylum tricornutum
EX462775           Chloroplast geranylgeranyl reductase/hydrogenase             7.00E-50  81          Heterocapsa triquetra
EX456318           Glutamate 1-semialdehyde 2,1-aminomutase                     7.00E-83  89          Amphidinium carterae
Luminescence
EX458318           LCF                                                          E-115     81          Lingulodinium polyedrum
EU236684           LBP                                                          0         97          Alexandrium tamarense
Light receptors
EX456649           Cryptochrome dash                                            9.00E-47  65          Euglena gracilis
EX462564           Rhodopsin                                                    E-71      78          Branchiostoma floridae

The most highly expressed transcripts with a high similarity to photosynthesis genes were those predicted to encode the light harvesting complex, composed of a chlorophyll a/c- and peridinin-binding protein, and those corresponding to a number of proteins of the light phase of photosynthesis, such as photosystems I and II, cytochrome b6f, and ATP synthase (Patron et al. 2005) (Table 3). Also highly expressed are the unigenes that are highly similar to the carbon fixation enzyme glyceraldehyde-3-phosphate dehydrogenase isoform 2, which was 86% identical to the one from L. polyedrum. This enzyme participates in aldehyde formation during the Calvin cycle in the dark phase of photosynthesis. Sequences coding for this enzyme were also found among the most highly expressed in other EST libraries of different dinoflagellates such as L. polyedrum (Bachvaroff et al. 2004), A. tamarense (Hackett et al. 2005), K. brevis (Lidie et al. 2005), and A. fundyense (Taroncher-Oldenburg and Anderson 2000). Other highly expressed genes similar to sequences encoding carbon fixation proteins were those for phosphoglycerate kinase and chloroplast phosphoribulokinase. The library contains the coding sequences of six enzymes related to chlorophyll synthesis and two enzymes involved in the synthesis of photoprotective pigments (Table 3).

We have also found sequences with high similarity to light receptors. One has 77% identity to the green light receptor (450 and 500 nm) type 1 rhodopsin described in Pyrocystis lunula (Okamoto and Hastings 2003) and lower identity to those from the marine cryptophyte Guillardia theta (Sineshchekov et al. 2005) and Cryptomonas spp. (29% and 26%, respectively). Type 1 rhodopsins have recently been described in the green alga Chlamydomonas reinhardtii, where they function as receptors for phototaxis responses (Sineshchekov et al. 2002). This photosensitive protein is similar to proteobacterial rhodopsins and is more abundantly expressed during the early day hours (Okamoto and Hastings 2003). Sequences that correspond to a second photosensory receptor, the cryptochrome dash protein, a blue light (400–500 nm) and UV-A (320–400 nm) receptor, were also found. This protein, which is involved in the light regulation of growth and development in plants and in other cellular processes such as growth and the induction of sexual reproduction in algae (Liscum et al. 2003), shows 30% identity to those from K. brevis and Arabidopsis thaliana. We consider these light receptors an interesting subject of study in relation to the high level of expression of blue light luminescence proteins in A. catenella, considering the probable role of luminescence in the cellular communication of dinoflagellates.

Other Proteins

Two A. catenella unigenes show 100% identity with a toxic strain-specific sequence of A. tamarense (AT-T1), previously identified as a biomarker of toxicity by Chan et al. (2006). Both A. catenella sequences also show similarity to unknown proteins of the nontoxic dinoflagellate H. triquetra. These sequences contain signal peptide sequences, suggestive of a plastid-targeted protein (Patron et al. 2005). We have also found an A. catenella unigene coding for a protein with a high level of similarity to two interesting conjugation-induced proteins, SPS19 from Saccharomyces cerevisiae and eIF-4A, a eukaryotic initiation factor that was found recently to be induced during conjugation in the dinoflagellates A. catenella and A. tamarense (Hosoi-Tanabe et al. 2005).

Fig. 3 Venn diagram of the comparison between the A. catenella ESTs and the genomes of A. thaliana, T. pseudonana, C. merolae, Entamoeba histolytica, and Plasmodium falciparum. The number and percentage of A. catenella sequences homologous to each organism are given in the corresponding intersection

Two sequences code for a protein with a cysteine-rich region, which has similarity to the EhV_307 protein from the Emiliania huxleyi virus (Wilson et al. 2005). Other viral sequences from the Paramecium bursaria Chlorella virus were also found but with a lower similarity.

Genes predicted to encode a diversity of proteins involved in transport processes were detected; among them were Na, K, Ca, phosphate, and ammonium channels, as well as antiporters, ABC transporters, amino acid transporters, and the Sec61 and SecY translocases, which are involved in secretion pathways in eukaryotes. Thirteen sequences that correspond to transposable elements previously described by Armbrust et al. (2004) in the Thalassiosira pseudonana genome were found with a relatively low similarity.

Comparative Genomics

The A. catenella protein database was compared with the genome of the plant A. thaliana and with genomes of unicellular eukaryotes of the kingdom Protista, such as T. pseudonana, Entamoeba histolytica, Cryptosporidium hominis, and the red alga Cyanidioschyzon merolae (Fig. 3). The Venn diagram shows the highest similarity with A. thaliana (19.3%), the diatom T. pseudonana (19.1%), and C. merolae (18.3%). We observed a similar distribution of functional groups (COGs) among the sequences in common with those five organisms (not shown).
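As an illustration only, the per-genome tallies shown in a Venn diagram of this kind could be computed from sets of unigene identifiers that have at least one hit in each reference genome. The sketch below is not the pipeline used in this work: the function names and the toy identifiers are hypothetical, and the definition of a hit (for example, a similarity search cutoff) is assumed to have been applied beforehand.

```python
from typing import Dict, Set


def hit_summary(total_unigenes: int, hits: Dict[str, Set[str]]) -> Dict[str, float]:
    """Return, per reference genome, the percentage of query unigenes with a hit."""
    if total_unigenes <= 0:
        raise ValueError("total_unigenes must be positive")
    return {
        organism: 100.0 * len(unigene_ids) / total_unigenes
        for organism, unigene_ids in hits.items()
    }


def shared_by_all(hits: Dict[str, Set[str]]) -> Set[str]:
    """Return the unigenes that have a hit in every reference genome (central Venn region)."""
    sets = list(hits.values())
    return set.intersection(*sets) if sets else set()


# Toy input with hypothetical identifiers; real input would come from a similarity search.
toy_hits = {
    "A. thaliana": {"u1", "u2", "u3"},
    "T. pseudonana": {"u2", "u3"},
    "C. merolae": {"u3", "u4"},
}
percentages = hit_summary(total_unigenes=10, hits=toy_hits)
core = shared_by_all(toy_hits)
```

With the toy input, `core` contains only `u3`, the single identifier present in all three sets. Pairwise intersections for the remaining Venn regions follow from the same set operations.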

When the unigenes of this library were compared with the 10,886 ESTs of the closely related species A. tamarense (Hackett et al. 2005) present in the public database, we found 3,045 (46.9%) hits. Of these, the 1,236 common unigenes were classified into COGs, and the most represented categories corresponded to carbohydrate metabolism (11.8%), posttranslational modification and chaperones (9.1%), energy production (7.4%), and luminescence (6.6%).
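The reported proportion of hits can be checked with simple arithmetic. The minimal sketch below assumes that the denominator is the 6,496 unigenes given in the abstract, which this paragraph does not state explicitly; the function name is hypothetical.

```python
def percent_of(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Assumption: the hits are expressed relative to the 6,496 unigenes of the library.
TOTAL_UNIGENES = 6496
HITS_VS_A_TAMARENSE = 3045

hit_percentage = percent_of(HITS_VS_A_TAMARENSE, TOTAL_UNIGENES)
```

Under that assumption the ratio is 46.875%, which agrees with the 46.9% reported above when rounded to one decimal place.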

Acknowledgments This research has been partially funded by CONICYT-FONDEF Project MR02I1003 and by a Microsoft Research Joint R&D Program.

References

Abrahams MV, Townsend LD (1993) Bioluminescence in dinoflagellates: A test of the burglar alarm hypothesis. Ecology 74:258–260

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403–410

Anderson DM, Grabher A, Herzog M (1992) Separation of coding sequences from structural DNA in the dinoflagellate Crypthecodinium cohnii. Mol Mar Biol Biotechnol 1:89–96

Armbrust EV, Berges JA, Bowler C, Green BR, Martinez D, Putnam NH, Zhou S, Allen AF, Apt KE, Bechner M, Brzezinski MA, Chaal BK, Chiovitti A, Davis AK, Demarest MS, Detter JC, Glavina T, Goodstein D, Hadi MZ, Hellsten U, Hildebrand M, Jenkins BD, Jurka J, Kapitonov VV, Kroger N, Lau WW, Lane T, Larimer FW, Lippmeier JC, Lucas S, Medina M, Montsant A, Obornik M, Parker MS, Palenik B, Pazour GT, Richardson PM, Rynearson TA, Saito MA, Schwartz DC, Thamatrakoln K, Valentin K, Vardi A, Wilkerson FP, Rokhsar DS (2004) The genome of the diatom Thalassiosira pseudonana: ecology, evolution, and metabolism. Science 306:79–86

Bachvaroff TR, Concepcion GT, Rogers CR, Herman EM, Delwiche CH (2004) Dinoflagellate expressed sequence tag data indicate massive transfer of chloroplast genes to the nuclear genome. Protist 155:65–78

Bhaud Y, Geraud M, Ausseil J, Soyer-Gobillard MO, Moreau H (1999) Cyclic expression of a nuclear protein in a dinoflagellate. J Eukaryot Microbiol 46:259–267

Chan LL, Sit WH, Lam PK, Hsieh DP, Hodgkiss IJ, Wan JM, Ho AY, Choi NM, Wang DZ, Dudgeon D (2006) Identification and characterization of a “biomarker of toxicity” from the proteome of the paralytic shellfish toxin-producing dinoflagellate Alexandrium tamarense (Dinophyceae). Proteomics 6:654–666

Cheng X (1995) DNA modification by methyltransferases. Curr Opin Struct Biol 5:4–10

Craig EA (1989) Essential roles of 70 kDa heat inducible proteins. Bioessays 11:48–52

Esaias WE, Curl HC Jr (1972) Effect of dinoflagellate bioluminescence on copepod ingestion rates. Limnol Oceanogr 17:901–906

Erdner DL, Anderson DM (2006) Global transcriptional profiling of the toxic dinoflagellate Alexandrium fundyense using massively parallel signature sequencing. BMC Genomics 7:88

Giovanelli J (1987) Sulfur amino acids of plants: an overview. Methods Enzymol 143:419–426

Guillard R (1995) Culture methods. In: Hallegraeff GM, Anderson DM, Cembella AD (eds) IOC manuals and guides: manual on harmful marine microalgae. Intergovernmental Oceanographic Commission of UNESCO, Paris, pp 45–62

Guzmán L, Campodónico I (1975) Marea Roja en la región de Magallanes. Publ Inst Pat Ser Monogr Punta Arenas (Chile) 9:44

Guzmán L, Campodónico I (1978) Mareas Rojas en Chile. Interciencia 3:144–151

Hackett JD, Scheetz TE, Yoon HS, Soares MB, Bonaldo MF, Casavant TL, Bhattacharya D (2005) Insight into a dinoflagellate genome through expressed sequence tag analysis. BMC Genomics 6:80

Hallegraeff G (1993) A review of harmful algal blooms and their apparent global increase. Phycologia 32:79–99

Hosoi-Tanabe S, Tomishima S, Nagai S, Sako Y (2005) Identification of a gene induced in conjugation-promoted cells of toxic marine dinoflagellate Alexandrium tamarense and Alexandrium catenella using differential display analysis. FEMS Microbiol Lett 251:161–168

Huang X, Madan A (1999) CAP3: A DNA sequence assembly program. Genome Res 9:868–877

Imai I (1987) Size distribution, number and biomass of bacteria in intertidal sediments and seawater of Ohmi Bay, Japan. Bull Jpn Soc Microb Ecol 2:1–11

Kuwae T, Hosokawa Y (1999) Determination of abundance and biovolume of bacteria in sediments by dual staining with 4′,6-diamidino-2-phenylindole and acridine orange: relationship to dispersion treatment and sediment characteristics. Appl Environ Microbiol 65:3407–3412

Li L, Hong R, Hastings JW (1997) Three functional luciferase domains in a single polypeptide chain. Proc Natl Acad Sci U S A 94:8954–8958

Lidie KB, Ryan JC, Barbier M, Vandolah FM (2005) Gene expression in Florida Red Tide Dinoflagellate Karenia brevis: Analysis of an expressed sequence tag library and development of a DNA microarray. Mar Biotechnol 7:481–493

Lin S (2006) The smallest dinoflagellate genome is yet to be found: A comment on LaJeunesse et al. “Symbiodinium (Pyrrophyta) genome sizes (DNA content) are smallest among dinoflagellates”. J Phycol 42:746–748

Lindquist S, Craig EA (1988) The heat-shock proteins. Annu Rev Genet 22:631–677

Liscum E, Hodgson DW, Campbell TJ (2003) Blue light signaling through the cryptochromes and phototropins. So that’s what the blues is all about. Plant Physiol 133:1429–1436

Liu L, Wilson T, Hastings JW (2004) Molecular evolution of dinoflagellate luciferases, enzymes with three catalytic domains in a single polypeptide. Proc Natl Acad Sci U S A 101(47):16555–16560

Machabee S, Wall L, Morse D (1994) Expression and genomic organization of a dinoflagellate gene family. Plant Mol Biol 25:23–31

Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Buillard V, Cerutti L, Copley R, Courcelle E, Das U, Daugherty L, Dibley M, Finn R, Fleischmann W, Gough J, Haft D, Hulo N, Hunter S, Kahn D, Kanapin A, Kejariwal A, Labarga A, Langendijk-Genevaux PS, Lonsdale D, Lopez R, Letunic I, Madera M, Maslen J, McAnulla C, McDowall J, Mistry J, Mitchell A, Nikolskaya AN, Orchard S, Orengo C, Petryszak R, Selengut JD, Sigrist CJ, Thomas PD, Valentin F, Wilson D, Wu CH, Yeats C (2007) New developments in the InterPro database. Nucleic Acids Res 35:224–228

Okamoto OK, Hastings JW (2003) Novel dinoflagellate clock-related genes identified through microarrays analysis. J Phycol 39:519–526

Patron NJ, Waller RF, Archibald JH, Keeling PT (2005) Complex protein targeting to dinoflagellate plastids. J Mol Biol 348:1015–1024

Santos SR, Coffroth MA (2003) Molecular genetic evidence that dinoflagellates belonging to the genus Symbiodinium Freudenthal are haploid. Biol Bull 204:10–20

Scholin CA, Hallegraeff GM, Anderson DM (1995) Molecular evolution of the Alexandrium tamarense species complex (Dinophyceae) dispersal in the North American and west Pacific regions. Phycologia 34:472–485

Seliger HH, Fastie WG, McElroy WD (1961) Bioluminescence in Chesapeake Bay. Science 133:699–700

Sigee DC (1984) Structural DNA and genetically active DNA in dinoflagellate chromosomes. Biosystems 16:203–210

Sineshchekov OA, Jung KH, Spudich JL (2002) Two rhodopsins mediate phototaxis to low- and high-intensity light in Chlamydomonas reinhardtii. Proc Natl Acad Sci U S A 99:8689–8694

Sineshchekov OA, Govorunova EG, Jung KH, Zauner S, Maier US, Spudich JL (2005) Rhodopsin-mediated photoreception in cryptophyte flagellates. Biophys J 89:4310–4319

Spector D (1984) Dinoflagellate nuclei. In: Spector DL (ed) Dinoflagellates. Academic, Orlando, pp 107–147

Sweeney B (1987) Bioluminescence and circadian rhythms. In: Taylor FJR (ed) The biology of dinoflagellates, botanical monographs, vol 21. Blackwell Scientific, Oxford

Tanikawa N, Akimoto H, Ogoh K, Chun W, Ohmiya Y (2004) Expressed sequence tag analysis of the dinoflagellate Lingulodinium polyedrum during dark phase. Photochem Photobiol 80:31–35

Taroncher-Oldenburg G, Anderson DM (2000) Identification and characterization of three differentially expressed genes, encoding S-adenosylhomocysteine hydrolase, methionine aminopeptidase, and a histone-like protein, in the toxic dinoflagellate Alexandrium fundyense. Appl Environ Microbiol 66:2105–2112

Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN, Rao BS, Smirnov S, Sverdlov AV, Vasudevan S, Wolf YI, Yin JJ, Natale DA (2003) The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4:41

The Gene Ontology Consortium (2007) The Gene Ontology project in 2008. Nucleic Acids Res 34:D322–D326

Triplett EL, Govind NS, Roman SI, Jovine RVM, Prézelin BB (1993) Characterization of the sequence organization of DNA from the dinoflagellate Heterocapsa pygmaea (Glenodinium sp.). Mol Mar Biol Biotechnol 2:239–245

Uribe P, Espejo RT (2003) Effect of associated bacterial microflora in the growth and toxin production of Alexandrium catenella. Appl Environ Microbiol 69:659–662

Weinmaster G, Roberts VJ, Lemke G (1992) Notch2: a second mammalian Notch gene. Development 116:931–941

Widder EA, Case JF, Bernstein SA, MacIntyre S, Lowenstine MR, Bowlby MR, Cook DP (1993) A new large volume bioluminescence bathyphotometer with defined turbulence excitation. Deep-Sea Res 40:607–627

Wilson WH, Schroeder DC, Allen MJ, Holden MT, Parkhill J, Barrell BG, Churcher C, Hamlin N, Mungall K, Norbertczak H, Quail MA, Price C, Rabbinowitsch E, Walker D, Craigon M, Roy D, Ghazal P (2005) Complete genome sequence and lytic phase transcription profile of a Coccolithovirus. Science 309:1090–1092

Zhang Z, Green BR, Cavalier-Smith T (1999) Single gene circles in dinoflagellate chloroplast genomes. Nature 400:155–159

Zhang H, Hou Y, Miranda L, Campbell DA, Sturm NR, Gaasterland T, Lin S (2007) Spliced leader RNA trans-splicing in dinoflagellates. Proc Natl Acad Sci U S A 104:4618–4623
