curating and analysing the genome of a model eukaryote


CURATING AND ANALYSING THE GENOME OF A MODEL EUKARYOTE

A thesis submitted to The University of Manchester for the degree

of Doctor of Science in the Faculty of Biology Medicine and Health

Doctor of Science (DSc)

2018


ACKNOWLEDGEMENTS

I would like to thank my line managers and mentors over the past decade, Jürg Bähler, Jacqueline Hayles, Paul Nurse, and Steve Oliver, for their support, mentorship and encouragement and for, in many ways, making PomBase a possibility; and also my PomBase colleagues Midori Harris, Antonia Lock, and Kim Rutherford, whose wonderful work makes PomBase a reality.

I would also like to thank my friends and colleagues (Midori Harris, Antonia Lock, Kim Rutherford and Guy Slater), and my mother Bernice Wood, for their excellent proofreading, and much more.


TABLE OF CONTENTS

Acknowledgements

Table of contents

Abstract

List of abbreviations

Tables and figures

Higher doctorate candidate declaration

Copyright statement

Qualifications and eligibility

Submitted publications

Summary statement

Introduction

1. Genome and sequence feature annotation

Gene complement and revisions

Budding yeast reannotation

2. Comparative analyses

Genome comparisons

Proteome comparisons

Lineage-specific gene loss

3. Functional biocuration and data integration

Ontologies

Phenotype curation

Gene Ontology (GO) curation

Increasing annotation expressivity and connectivity

Curating pathways and networks

A system-wide biological summary (GO slim)

Community curation

4. Analysing large datasets using biocuration

The deletion collection, prevalence of gene dispensability

Duplication, taxonomic conservation and dispensability

Dispensability, genome location, and cellular location

Dispensability and biological process

A phenotype resource for cell cycle and cell shape

Cell shape mutants

Cell cycle mutants

5. Tools for curation, community curation and data hosting

Canto curation tool

Conclusions

References

Submitted papers

1. The genome sequence of Schizosaccharomyces pombe

2. A re-annotation of the Saccharomyces cerevisiae genome

3. GeneDB: a resource for prokaryotic and eukaryotic organisms.

4. The Gene Ontology (GO) database and informatics resource.

5. Gene Ontology annotation status of the fission yeast genome: preliminary coverage approaches 100%.

6. Schizosaccharomyces pombe comparative genomics; from sequence to systems.

7. Analysis of a genome-wide set of gene deletions in the fission yeast Schizosaccharomyces pombe.

8. A comprehensive online resource for fission yeast.

9. The fission yeast phenotype ontology.

10. Canto: An online tool for community literature curation.

11. A genome-wide resource of cell cycle and cell shape genes of fission yeast.

12. A method for increasing expressivity of Gene Ontology annotations using a compositional approach.

13. PomBase 2015: updates to the fission yeast database.


ABSTRACT

Valerie Wood. Doctor of Science DSc.

The University of Manchester. 2018.

CURATING AND ANALYSING THE GENOME OF A MODEL EUKARYOTE

This work describes the biocuration of the fission yeast Schizosaccharomyces pombe, and how this has contributed to the development of fission yeast as a model species. I recount the curation and refinement of the gene complement, and manually curated ortholog inventories and describe how these two resources have been pivotal for comparative analysis. I outline the mechanisms of functional curation using ontologies, the PomBase community curation initiative and ongoing curation work to construct a qualitative model of cellular pathways. I describe how functional curation has been used to analyse important genome-wide datasets. Finally, the tools and resources developed to curate, display and enable interrogation and reuse of data are outlined.


List of abbreviations

BP        Biological Process
BLAST     Basic Local Alignment Search Tool
CC        Cellular Component
CHEBI     Chemical Entities of Biological Interest
Co-PI     Co-Principal Investigator
CRUK      Cancer Research UK
DNA       deoxyribonucleic acid
FAIR      Findable, Accessible, Interoperable and Reusable
FYPO      Fission Yeast Phenotype Ontology
GO        Gene Ontology
GPI       glycosylphosphatidylinositol
MIPS      Munich Information Center for Protein Sequences
MOD       model organism database
MF        Molecular Function
Mya       Million years ago
ncRNA     non-coding RNA
OBO       Open Biological and Biomedical Ontology
ORF       open reading frame
Pfam      Protein family database
PATO      Phenotype And Trait Ontology
RT-PCR    Reverse transcription polymerase chain reaction
RNA       ribonucleic acid
RNA-Seq   RNA sequencing
RACE      rapid amplification of cDNA ends
SGD       Saccharomyces Genome Database
snRNA     small nuclear RNA
snoRNA    small nucleolar RNA
SO        Sequence Ontology
rRNA      ribosomal RNA
tRNA      transfer RNA
YPD       Yeast Proteome Database


Tables and figures

Table 1 Summary of the changes in protein orthologs identified between fission yeast and other species since 2006

Table 2 Changes in numbers of GO annotation to different GO aspects over time

Table 3 Fission yeast annotation extensions, growth over time: Total number of annotations and the subset with extensions

Figure 1 Using GO annotation extensions to reconstruct networks

Table 4 Current community contributions, increase over time

Figure 2 Increase in ontology-based curation in PomBase over time

Figure 3 Comparative analysis of gene dispensability profiles of fission yeast

Figure 4 Enriched essential and non-essential processes for fission yeast and budding yeast

Figure 5 Dispensability comparison of orthologous pairs from the two yeasts

Figure 6 Functional distribution of orthologs with different dispensability

Table 5 Enrichments to GO biological process terms for each phenotypic class

Table 6 Enrichments to GO cellular component terms for each phenotypic class


Higher Doctorate Candidate Declaration

Candidate Name: Valerie Wood

Faculty of Biology Medicine and Health

Higher Doctorate Title: Curating and Analysing the Genome

of A Model Eukaryote

Declaration

My research encompasses the systemization of all genome-

related, biochemical, and genetic data for an important model

eukaryotic organism - the fission yeast Schizosaccharomyces pombe. I

lead the development and maintenance of the model organism database

(MOD) that supports the changing patterns of research on this

organism. Through this work I am a major contributor to a number of

bioinformatics research projects related to functional genomics,

functional curation, and systems biology.

In 2002, I was first author of the publication that reported the S.

pombe genome sequence (the sixth eukaryotic genome to be

sequenced) [1]. My contribution to the Fission Yeast Sequencing Project

and publication included project management, genome assembly,

creating gene prediction pipelines, sequence feature annotation,

sequence analysis, preliminary functional curation, and publication content.

Collaborators at Cancer Research UK (CRUK) carried out the analysis of

intergenic regions, introns, and duplications. Colleagues at the Sanger

Centre (now the Wellcome Sanger Institute) performed cloning, physical

mapping, sequencing, sequence finishing, and sequence validation. To


enable more informative pairwise comparisons with the genome of the

budding yeast Saccharomyces cerevisiae, I constructed an updated

inventory of protein-coding genes for S. cerevisiae [2]. For this work I

performed the curation and analysis and wrote the manuscript.

Between 2002 and 2010, I curated fission yeast genome data

singlehandedly in the first incarnation of the fission yeast database,

GeneDB [3], for which I contributed to functional specifications for software

development. During this period, I became a member of, and active

contributor to, the Gene Ontology (GO) Consortium [4], which marked the

beginning of my extensive contributions to GO’s ontology development

and annotation quality control across taxa. In 2006, fission yeast was

the second organism (after S. cerevisiae) for which each protein-coding

gene had been manually evaluated for the availability of GO

experimental data or possible functional inferences from orthologs in

other species [5], a level of curation breadth still achieved only by these

two species. I performed all of the GO curation for this work and wrote

the manuscript. I collated a manually curated ortholog inventory

between budding yeast and fission yeast [6], a resource which is used by

the majority of fission yeast laboratories, and which has been used to

provide seed alignments for over 1000 protein families in the Pfam

protein family database. In 2010, I worked with the laboratory of Sir

Paul Nurse to provide bioinformatics analysis of the fission yeast

deletion collection where my contributions included comparative analysis

with S. cerevisiae, GO analysis, and taxonomic distribution analysis of

essential genes, and the co-writing of the manuscript, on which I was a

joint first author [7].

The sum total of the classification work described above is pivotal

to many of the past and current activities of the fission yeast research


community and has enabled fission yeast to become firmly established

as a mature model eukaryotic species in many areas of basic research,

especially cell cycle, cytokinesis, spindle organization and chromosome

segregation, RNAi, and chromatin biology, as well as centromere and

telomere function.

Since 2010, I have been Co-PI and project manager of PomBase [8],

the fission yeast MOD that fully replaced the legacy GeneDB system in

2013. I have been responsible for the overall strategy, implementation,

and development of database and curation workflows.

A pressing need for a cellular phenotype ontology was identified

and fulfilled by the Fission Yeast Phenotype Ontology (FYPO) [9].

Phenotype annotation using FYPO is supported by comprehensive

metadata including conditions and genotype descriptions. I conceived

the project and, along with other team members, submitted and defined

a large number of terms; Dr. Midori A. Harris continues to develop

FYPO.

A novel aspect of the PomBase project is the “Fission Yeast

Community Curation” project which I conceived and piloted in 2009. I

created the specification for “Canto” [10], a generic web-based curation

software tool (developed by Mr. Kim Rutherford). Canto enables authors

to easily contribute detailed structured annotation from their own

research publications for inclusion in PomBase, and onward

dissemination to other databases. Canto was developed with input from

myself and other PomBase team members (Dr. Midori A. Harris and Dr.

Antonia Lock). Community curation is proving a viable mechanism to

supplement the output of professional curators at PomBase and has

resulted in the curation of 12,000 annotations from over 600 publications


to date. Both the Community Curation concept, and the Canto curation

tool are now being adopted by other research communities.

During 2010-2011 I was partly seconded to the Nurse laboratory

where I provided the bioinformatics analysis of data from a genome-

wide visual phenotypic screen performed by Dr. Jacqueline Hayles to

identify a set of genes which, when deleted, resulted in cell cycle or cell

shape defects. I was joint first author on the resulting paper [11], currently

the second most accessed publication in PomBase. Subsequently, I

played a major role in the development of a system to extend the scope

of GO annotation using a compositional approach (known as annotation

extensions) [12]. Annotation extensions provide additional specificity to a

gene annotation by linking the annotated term to another ontology

term, or to a gene product, via a relationship. Early adoption of this

approach enabled PomBase to be the first MOD to incorporate

annotation extensions into its gene page display. PomBase is now

piloting the use of connections captured in annotation extensions to

generate high-confidence physical interaction networks and pathways.

I have published over 50 peer-reviewed papers (h-index of 26,

Scopus). The 13 publications that I present in this Thesis are listed

below; additional publications to which I made a substantial contribution

are also cited in the summary statement.

I confirm that this is a true statement and that, subject to any

comments above, the submission is my own original work.

Signed: ...................................................................

Date: ...........................................


Copyright statement

i. The author of this Thesis (including any appendices and/or

schedules to this thesis) owns any copyright in it (the "Copyright") and

she has given The University of Manchester the right to use such

Copyright for any administrative, promotional, educational and/or

teaching purposes.

ii. Copies of this thesis, either in full or in extracts, may be made

only in accordance with the regulations of the John Rylands University

Library of Manchester. Details of these regulations may be obtained

from the Librarian. This page must form part of any such copies made.

iii. The ownership of any patents, designs, trade marks and any and

all other intellectual property rights except for the Copyright (the

"Intellectual Property Rights") and any reproductions of copyright works,

for example graphs and tables ("Reproductions"), which may be

described in this thesis, may not be owned by the author and may be

owned by third parties. Such Intellectual Property Rights and

Reproductions cannot and must not be made available for use without

the prior written permission of the owner(s) of the relevant Intellectual

Property Rights and/or Reproductions.

iv. Further information on the conditions under which disclosure,

publication and exploitation of this thesis, the Copyright and any

Intellectual Property Rights and/or Reproductions described in it may

take place is available from the Head of School of Biological Sciences

and the Vice President and Dean of the Faculty of Biology, Medicine and

Health.


Qualifications and eligibility

Oct. 1995 – Jul. 1996   University of Manchester, Manchester, UK   M.Sc. Bioinformatics

Oct. 1992 – Jul. 1995   University of Manchester, Manchester, UK   B.Sc. (Hons) First Class, Biochemistry and Applied Molecular Biology

My research encompasses the systemization of all genome related,

biochemical and genetic data for an important model eukaryotic

organism - the fission yeast, Schizosaccharomyces pombe. I have led

the development and maintenance of the Model Organism Database

(MOD) to support the changing patterns of research on this organism.

As part of this work I am a major contributor to a number of

bioinformatics research projects related to functional genomics,

functional curation, and systems biology as described previously in the

prima facie case for support, and the complete statement below.


Submitted publications

1. Wood, V., et al. (2002). The genome sequence of

Schizosaccharomyces pombe. Nature. 415(6874):871-80.

Citations 1056 (Scopus), Altmetrics 21, Impact Factor 41.57.

2. Wood, V., Rutherford, K.M., Ivens, A., Rajandream, M.-A.,

Barrell, B. (2001). A re-annotation of the Saccharomyces cerevisiae

genome. Comparative and Functional Genomics. 2(3):143-154.

Citations 54 (Scopus).

3. Hertz-Fowler, C., Peacock, C.S., Wood, V., Aslett, M., Kerhornou,

A., Mooney, P., Tivey, A., Berriman, M., Hall, N., Rutherford, K., Parkhill,

J., Ivens, A.C., Rajandream, M.A., Barrell, B. (2004). GeneDB: a resource

for prokaryotic and eukaryotic organisms. Nucleic Acids Res.

32(Database issue):D339-43.

Joint first author, Citations 181 (Scopus), Impact Factor 11.56.

4. Harris, M.A., Clark, J., Ireland, A., Lomax, J., Ashburner, M.,

Foulger, R., Eilbeck, K., Lewis, S., Marshall, B., Mungall, C., Richter, J.,

Rubin, G.M., Blake, J.A., Bult, C., Dolan, M., Drabkin, H., Eppig, J.T.,

Hill, D.P., Ni, L., Ringwald, M., Balakrishnan, R., Cherry, J.M., Christie,

K.R., Costanzo, M.C., Dwight, S.S., Engel, S., Fisk, D.G., Hirschman, J.E.,

Hong, E.L., Nash, R.S., Sethuraman, A., Theesfeld, C.L., Botstein, D.,

Dolinski, K., Feierbach, B., Berardini, T., Mundodi, S., Rhee, S.Y.,

Apweiler, R., Barrell, D., Camon, E., Dimmer, E., Lee, V., Chisholm, R.,

Gaudet, P., Kibbe, W., Kishore, R., Schwarz, E.M., Sternberg, P., Gwinn, M.,

Hannick, L., Wortman, J., Berriman, M., Wood, V., de la Cruz, N.,

Tonellato, P., Jaiswal, P., Seigfried, T., White, R. (2004). The Gene

Ontology (GO) database and informatics resource. Nucleic Acids Res.

32(Database issue):D258-61.


Citations 2117 (Scopus), Altmetrics 3, Impact Factor 11.56.

5. Aslett, M., Wood, V. (2006). Gene Ontology annotation status of

the fission yeast genome: preliminary coverage approaches 100%.

Yeast. 23(13):913-919.

Citations 41 (Scopus), Altmetrics 9, Impact Factor 1.95, Special Issue

from the 2006 European Fission Yeast Meeting. Joint Guest Editor with

Jürg Bähler.

6. Wood, V. (2006). Schizosaccharomyces pombe comparative

genomics; from sequence to systems. pp. 233-285 in Topics in Current

Genetics, edited by P. Sunnerhagen, and J. Piskur.

Citations 27 (Scopus).

7. Kim, D.U., Hayles, J., Kim, D., Wood, V., et al. (2010). Analysis

of a genome-wide set of gene deletions in the fission yeast

Schizosaccharomyces pombe. Nature Biotechnology 28:617–623.

Joint first author, Citations 349 (Scopus), Altmetrics 11, Impact Factor

35.72.

8. Wood, V., Harris, M.A., McDowall, M.D., Rutherford, K., Vaughan,

B.W., Staines, D.M., Aslett, M., Lock, A., Bähler, J., Kersey, P.J., Oliver,

S.G. (2012). PomBase: A comprehensive online resource for fission

yeast. Nucleic Acids Res. 40(D1):D559-D564.

Citations 157 (Scopus), Altmetrics 4, Impact Factor 11.56.

9. Harris, M.A., Lock, A., Bähler, J., Oliver, S.G., Wood, V. (2013).

FYPO: The fission yeast phenotype ontology. Bioinformatics.

29(13):1671-1678.

Citations 26 (Scopus), Altmetrics 4, Impact factor 5.48.


10. Rutherford, K.M., Harris, M.A., Lock, A., Oliver, S.G., Wood, V.

(2014). Canto: An online tool for community literature curation.

Bioinformatics. 30(12):1791-1792.

Citations 18 (Scopus), Altmetrics score 13, Impact factor 5.48.

11. Hayles, J., Wood, V., Jeffery, L., Hoe, K-L., Kim, D-U., Park, H-O.,

Salas-Pino, S., Heichinger, C., Nurse, P. (2013). A genome-wide

resource of cell cycle and cell shape genes of fission yeast. Open Biol.

3(5):130053.

Joint first author, Citations 61 (Scopus), Altmetrics score 16, Impact

factor 3.28.

12. Huntley, R.P., Harris, M.A., Alam-Faruque, Y., Blake, J.A., Carbon,

S., Dietze, H., Dimmer, E.C., Foulger, R.E., Hill, D.P., Khodiyar, V.K.,

Lock, A., Lomax, J., Lovering, R.C., Mutowo-Meullenet, P., Sawford, T.,

Van Auken, K., Wood, V., Mungall, C.J. (2014). A method for increasing

expressivity of Gene Ontology annotations using a compositional

approach. BMC Bioinformatics. 15:155

Citations 29 (Scopus), Altmetrics score 11, Impact factor 2.213.

13. McDowall, M.D., Harris, M.A., Lock, A., Rutherford, K., Staines,

D.M., Bähler, J., Kersey, P.J., Oliver, S.G., Wood, V. (2014). PomBase

2015: updates to the fission yeast database. Nucleic Acids Res.

43(Database issue):D656-61

Citations 37 (Scopus), Altmetrics score 7, Impact Factor 11.56.


Summary statement

Introduction

Schizosaccharomyces pombe (fission yeast) is a free-living single-celled

taphrinomycete fungus that shares many features with more complex

organisms. It is estimated to have diverged from the budding yeast

Saccharomyces cerevisiae 330-420 million years ago (Mya), and from

the lineage giving rise to humans 1000-1200 Mya (Berbee and Taylor

1993). Fission yeast was first described by Paul Lindner in the 1890s and

its use in genetic studies was initiated by Urs Leupold in the 1940s

(Leupold 1993). S. pombe became firmly established as a genetic model

by groundbreaking cell cycle studies in the 1980s that led to the

discovery of the cyclin-dependent kinase Cdk1/Cdc2, for which Paul

Nurse was later awarded the Nobel Prize in Physiology or Medicine (reviewed in Nurse

2000). The high number of proteins conserved between fission yeast

and Metazoa, its ease of genetic manipulation, and substantial biological

differences from S. cerevisiae have made fission yeast a reliable model

for studying many aspects of human disease and cell biology (reviewed

in Hoffman et al. 2015). There is a large and active research community

dedicated to exploiting S. pombe as a model eukaryote, and areas of

intense study now embrace most conserved cellular processes.

At the beginning of the fission yeast genome sequencing project

(circa 1996), around 200 fission yeast protein-coding genes were known

from biochemistry and genetics (reviewed in Wood 2006). On

completion of the genome sequence in 2002 the number of predicted

protein-coding genes had increased to 4824 (Wood et al. 2002) of which

~70% could be assigned a biological role based on experimental


characterisation or homology to characterised genes in other species. In

2002 around 1200 proteins were experimentally characterised in fission

yeast, increasing to around 1560 by 2006 (Wood 2006). The pace of

novel protein characterisation has slowed substantially during the past

decade, and although 2342 fission yeast proteins have now been

reported in published work describing small-scale experiments, we can

still assign a preliminary biological role to only 4374 of the 5069

protein-coding genes (86%)

(https://www.pombase.org/status/gene-characterisation). At the time of

writing, 695 predicted S. pombe

proteins have no known biological role:

https://www.pombase.org/status/priority-unstudied-genes (Wood et al.

manuscript submitted).

In this summary statement, I describe my contribution to the

biocuration of the S. pombe genome, and the development of fission

yeast as a model species. In section 1 (Genome and sequence

feature annotation), I present a summary of the primary sequence

annotation necessary to obtain a basic parts list of a single-celled

eukaryote. The proteome comparisons required for functional inference

and preliminary functional annotation are presented in section 2

(Comparative analyses). In Section 3 (Functional biocuration and

data integration), I describe the development of resources and

protocols for high quality, broad, and deep functional curation using the

formal semantic representation provided by ontologies and the

integration of the curated knowledge into networks and pathways.

Section 4 (Analysing large datasets using biocuration) describes

my role in providing the curation and bioinformatics analyses for two

important genome-wide phenotype resources as part of a collaboration

with the Nurse and Hayles laboratory (CRUK/Crick Institute, London,

UK): the publication of the fission yeast genome-wide deletion


collection, and a visual screen for cell-cycle and morphology

phenotypes. Finally, in Section 5 (Tools for curation and data

hosting), I describe the tools and resources that I have co-developed in

order to curate and display data related to fission yeast and enable its

interrogation and reuse. In particular this relates to the fission yeast

resource PomBase, but also extends to external resources through data

dissemination. This narrative focuses on the evolution of each of these

key components of the biocuration for an important model species over

time, illustrated with progress metrics.

1. Genome and sequence feature annotation

A central goal of biological research is to fully decipher the information

encoded in a genome sequence, and to understand how this is

transformed into the orchestrated collection of processes and functions

that combine to enable life. The sequencing and annotation of the fission

yeast genome, and its preliminary analysis, were a step towards this goal

(Wood et al. 2002). The identification of sequence features provides a

“parts list” for further bioinformatic and functional genomics

endeavours. The major challenge of sequence feature annotation is to

identify features comprehensively and accurately (coordinates and

type), and to annotate unambiguously, providing provenance and

unique identifiers. Continual refinement of the gene structure

coordinates and gene complement is necessary. Gene complement

refinement involves the identification of new genes, and the re-

classification of predicted coding genes as dubious, or dubious genes as

coding. Genes may also shift between a protein-encoding and an RNA-

encoding classification.


At the time of its publication, the genome sequence of fission

yeast provided the sixth such sequence for a free-living eukaryote and

represented the one with the most compact proteome (4824 proteins).

Even today the complement of predicted protein-encoding genes of S.

pombe represents one of the smallest completed free-living eukaryotic

proteomes (https://www.ebi.ac.uk/reference_proteomes).

Gene complement and revisions

My approach to the prediction of the protein-coding genes of the fission

yeast integrated many orthogonal methods including gene prediction

algorithms, intron feature identification, homology searches, and manual

inspection and refinement of every intron boundary. At publication, we

reported 4824 likely protein-coding genes. Initially all open reading

frames (ORFs) that encoded a sequence of at least 100 amino acids,

starting with a methionine, and with no known overlapping gene in an

alternative reading frame (more than 15 amino acids), were classed as

“protein coding”. Of these, 116 were considered likely to be dubious and

were excluded from the final count. A number of small proteins under

100 amino acids (147) were either the products of known genes or could

be predicted by amino acid sequence homology to gene products in

other species, and so were included. We concluded that any protein-

coding genes remaining to be discovered were therefore smaller than

100 amino acids or were specified by transcripts that were subject to

multiple splicing events.
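The initial size-based ORF criteria described above can be illustrated with a minimal sketch. This is not the pipeline used for the genome annotation; the function names, the tuple layout of the results, and the handling of coordinates are assumptions made for illustration. It scans all six reading frames for ORFs that begin with a methionine codon and encode at least 100 amino acids. It does not model introns, and it does not apply the exclusion of ORFs overlapping a known gene in an alternative reading frame.

```python
# Illustrative sketch only: six-frame scan for ORFs that start with ATG
# (methionine) and encode at least `min_aa` amino acids before a stop codon.
# Names and return format are hypothetical, not taken from the thesis.

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_aa: int = 100):
    """Return (strand, frame, start, end, length_aa) for each qualifying ORF.

    start and end are 0-based offsets on the scanned strand; end is
    exclusive and includes the stop codon.
    """
    orfs = []
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # first ATG seen since the previous stop codon
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif codon in STOP_CODONS:
                    if start is not None:
                        length_aa = (i - start) // 3  # codons before the stop
                        if length_aa >= min_aa:
                            orfs.append((strand, frame, start, i + 3, length_aa))
                    start = None
    return orfs


if __name__ == "__main__":
    # A synthetic 121-codon ORF: ATG, 120 alanine codons, then a stop codon.
    example = "ATG" + "GCT" * 120 + "TAA"
    print(find_orfs(example))
```

Candidates shorter than the threshold are simply not reported by this sketch, which is why the small proteins mentioned above had to be recovered by other evidence such as homology.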

Since the initial publication of the fission yeast genome, the

majority of newly identified protein-coding genes and the majority of the

revisions to existing protein-coding gene structures are from two

sources. In one study, transcriptome analysis combined with manual

curation identified 22 new protein-coding genes and refined 75


annotated gene structures (Wilhelm et al. 2008). In another study, we

identified 39 novel loci by a systematic reappraisal ignoring size

thresholds and using a complete six-frame translation compared to a

proteomics data set, the Pfam domain database, and the genomes of six

other fungi (Bitton et al. 2011). Confirmatory analyses (including

Reverse transcription polymerase chain reaction (RT-PCR), RNA-

sequencing (RNA-Seq), 5' and 3' Rapid amplification of cDNA

ends (RACE), translation, evolutionary conservation and distinct

phenotypes upon deletion) suggest that all of these 39 predicted genes

encoded functional proteins (Bitton et al. 2011). Additional changes

have been made based on my ongoing analysis, and those reported in

numerous specific publications and direct community contributions listed

in full in our revisions inventory (see footnote 1). The relatively small number of new

(118) and removed (50) protein-coding features and gene structure

alterations (289) since publication is a testament to the efficacy of a

rigorous protocol combining multiple gene prediction methods with

manual assessment.

The publication of the genome reported 174 tRNA genes, the 6

spliceosomal RNAs (U1-U6), 16 small nuclear RNAs (snRNAs) and 33 small

nucleolar RNAs (snoRNAs), the 5.8S, 18S and 26S ribosomal RNA (rRNA) genes

(grouped together as tandem repeats on chromosome III) and the 30

5S rRNA genes distributed throughout the genome (Wood et al. 2002).

Advances in high-throughput RNA sequencing technologies allowed the

transcriptome to be surveyed at single nucleotide resolution and began

to reveal large numbers of non-coding RNAs (Wilhelm et al. 2008). At

the time of writing, PomBase contains over 1800 annotated ncRNA

features, mainly of unknown type and function.

1 See https://www.pombase.org/status/new-and-removed-genes for revisions to the gene


Budding yeast reannotation

In 1996, when the S. cerevisiae genome was published, 6275 ORFs

were reported (over 100 amino acids with a methionine). However, it

was estimated that only around 5800 of these were likely to be coding

based on the number of small ORFs likely to be included by chance due

to the arbitrary 100 amino acid cut-off (Goffeau et al. 1996). Over the

subsequent 5 years, various budding yeast resources reported very

different protein-coding gene numbers: Yeast Proteome Database (YPD, now

deprecated) reported 6142, the Saccharomyces Genome Database

(SGD) 6310, and the Munich Information Center for Protein Sequences

(MIPS) 6368 (reviewed in Wood et al. 2001). Moreover, the number of

proteins appeared to be generally overrepresented in all resources due

to the inclusion of a large number of non-coding small ORFs which were

entirely or largely overlapping with known genes in alternative strand

ORFs.

I undertook a budding yeast sequence feature annotation revision

to provide a more accurate protein-coding gene total, enabling a more

informative comparison with the fission yeast proteome (Wood et al.

2001). This resulted in the identification of three new genes and 46

proposed alterations to coding sequences (including extensions to

known genes and gene merges). Importantly, 370 of the existing ORFs

were re-classified as “spurious” due to being embedded in a longer ORF

on an alternative strand (a common artefact of the coding bias of highly

expressed proteins). Confoundingly, many of these protein-coding genes

had been named because they produced phenotypes that derived from

the known genes on the correct strand, providing provenance for their

persistence in earlier datasets. This annotation revision provided a

working protein-coding gene number of 5570 for S. cerevisiae (5804

including 234 dubious). The current protein-coding gene number reported by


SGD is 5915 (6604 total with 689 classified as dubious). The increase of

345 from our reannotation is largely due to the inclusion of 294

additional novel small protein-coding genes.
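The relationship between the gene totals quoted above can be checked with a short calculation. This is only a worked illustration using the figures given in the text; the variable names are assumptions.

```python
# Worked check of the S. cerevisiae gene totals quoted in the text.
# Variable names are illustrative only.

reannotation_coding = 5570   # working protein-coding total after reannotation
reannotation_dubious = 234   # dubious ORFs in the reannotation
sgd_coding = 5915            # protein-coding total reported by SGD
sgd_dubious = 689            # ORFs classified as dubious by SGD
new_small_genes = 294        # additional novel small protein-coding genes

reannotation_total = reannotation_coding + reannotation_dubious
sgd_total = sgd_coding + sgd_dubious
increase = sgd_coding - reannotation_coding
increase_from_other_changes = increase - new_small_genes

print(reannotation_total)           # 5804
print(sgd_total)                    # 6604
print(increase)                     # 345
print(increase_from_other_changes)  # 51
```

On these figures, the novel small genes account for 294 of the 345 additional protein-coding genes, leaving 51 attributable to other changes.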

The accurate identification of sequence features provides a solid

framework for the use of protein sets in comparative analyses, for

functional curation, and for the integration of functional data both within

and between species.

2. Comparative analyses

Genome comparisons

Comparisons of chromosome sequences to search for large tracts of

conserved gene order revealed no large scale duplications in S. pombe;

however, duplicated sequence blocks of approximately 50 kb are

present at the subtelomeric regions of chromosomes I and II (Wood et

al. 2002). S. cerevisiae, in contrast, is known to have undergone rounds

of whole genome duplication (Wolfe and Shields 1997). Of the 24 fission

yeast protein-coding genes that are 100% identical at the nucleotide

sequence level, 20 are located in the telomeric regions - suggesting that

a frequent exchange of genetic information occurs at these regions. As

in other species (S. cerevisiae, Plasmodium falciparum), there appears

to be a higher incidence of genes encoding cell surface molecules and

species-specific genes in these regions than elsewhere in the genome

(Wood et al. 2002, Wood 2006).

Proteome comparisons

Preliminary proteome comparisons between S. pombe, S. cerevisiae and

Caenorhabditis elegans, using BLAST (Altschul et al. 1990), indicated

that around 3281 (67%) of proteins in fission yeast were also found in

S. cerevisiae and C. elegans, and 4050 (83%) were common between

the two yeasts (Wood et al. 2002). A small number, 145 (3%), were

reported to be present in C. elegans but not in S. cerevisiae and 681

(14%) were unique to S. pombe. Reciprocal comparisons between

budding yeast and fission yeast revealed 4523/5777 (78%) were

conserved in S. pombe, of which 3605 (62%) were also present in C.

elegans. The number of proteins conserved exclusively between the two

yeast species was greater for S. cerevisiae than for S. pombe (918 vs.

769), a difference explained by our analysis demonstrating that a larger

proportion of the S. pombe proteome was unique (i.e. encoded by a

single genomic copy) than the S. cerevisiae proteome, which has a

larger number of protein duplicates (paralogous groups). The number of

species-specific genes was also greater for S. cerevisiae than for S.

pombe (1104 vs. 681); this difference is also due mainly to increased

duplication in the species-specific cohort, but a higher incidence of newly

evolved genes could be a minor contributing factor. We also

demonstrated, for the first time, that genes conserved between the

animal and plant kingdoms were almost always conserved in both yeasts

(Wood et al. 2002).

Proteome reciprocal analysis using BLAST provided initial

estimates of conservation. However, pairwise BLAST analysis has a high

incidence of false positives for ortholog (a direct evolutionary

counterpart by vertical descent) identification; frequently similarities are

detected between family members which are not orthologous, and low-

complexity regions can often provide significant, but spurious, matches

(Wood 2006). Furthermore, distant similarities often fail to achieve

significance, or are completely undetected using local alignments, and

false negatives are also therefore very common.
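
The paralog problem described above can be illustrated with a small Python sketch of reciprocal best-hit filtering, a widely used first-pass heuristic. This is an illustration only, not the protocol used for the curated ortholog inventories; the protein identifiers and scores are invented, and the hit tuples stand in for whatever tabular output a BLAST run produces.

```python
from typing import Dict, Iterable, Set, Tuple

# Each hit is (query_id, subject_id, bit_score); all values here are invented.
Hit = Tuple[str, str, float]

def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Return the highest-scoring subject for each query."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(a_vs_b: Iterable[Hit],
                         b_vs_a: Iterable[Hit]) -> Set[Tuple[str, str]]:
    """Keep only pairs in which each protein is the other's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if backward.get(b) == a}

# Toy data: spX is a family member whose best hit is sc1, but sc1 prefers sp1.
pombe_vs_cerevisiae = [("sp1", "sc1", 250.0), ("spX", "sc1", 120.0),
                       ("sp2", "sc2", 90.0)]
cerevisiae_vs_pombe = [("sc1", "sp1", 248.0), ("sc1", "spX", 118.0),
                       ("sc2", "sp2", 88.0)]

candidates = reciprocal_best_hits(pombe_vs_cerevisiae, cerevisiae_vs_pombe)
```

In this toy case the one-directional match from spX to sc1 is discarded. A filter of this kind does nothing for the false negatives noted above, since a distant ortholog that never produces a significant hit cannot be recovered by it.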

Detection of orthologs and manual curation of ortholog inventories

between fission yeast/budding yeast and fission yeast/human has been

ongoing since 2002. Many methods have been combined to identify

orthologs for fission yeast proteins, including directed searches for

missing complex members and de novo prediction by protein family

seed alignment building, integration of the results of multiple existing

orthology prediction methods, and from published functional data (for

example analogous function and protein complex membership).

Examples of the de novo detection of distant orthologs which expedited

gene characterisation include:

• the distant relationship to ciliate telomere binding proteins

for Pot1, resulting in the preliminary characterisation of the

conserved telomere end-binding protein (Baumann and Cech

2001),

• the detection of the S. cerevisiae ortholog of Swi5 (Sae3p),

a protein involved in meiotic joint molecule formation

(Ellermeier et al. 2004),

• the human ortholog of the fission yeast microtubule

anchoring protein Msd1 (SSX2IP) (Toya et al. 2007).

Ortholog detection protocols are described in Wood 2006.

In 2006, S. cerevisiae orthologs were identified for 3636 fission

yeast proteins (mapping to 3842 S. cerevisiae proteins) (Wood 2006).

The remaining 1235 S. pombe proteins and 1704 S. cerevisiae proteins

had no identifiable ortholog recorded in the other yeast, although many

of these had orthologs in other species. By 2018, based on PomBase

curated data, orthologs are now identified for 3954 (78%) S. pombe

proteins to 4141 S. cerevisiae proteins, and for 3527 (69.6%) S. pombe

proteins to 4417 human proteins, with 3175 proteins shared between all

three distantly related eukaryotes (changes between 2006 and 2018 are

summarized in Table 1).

                                                     2006    2018
S. pombe
  Proteins identified                                4871    5069
  Conserved in S. cerevisiae                         3636    3954
  Conserved in Metazoa                                N/A    3527
  Conserved in Metazoa, absent from S. cerevisiae     N/A     353
  Fungi only                                          N/A     546
  Clade specific (Schizosaccharomyces)                669     368
S. cerevisiae
  Proteins identified                                5546    5915
  Conserved in S. pombe                              3842    4141
  Conserved in Metazoa                                N/A     N/A
  Conserved in Metazoa, absent from S. pombe          N/A     N/A
  Fungi only                                          N/A     N/A
  Clade specific                                     1051     N/A

Table 1: Summary of the changes in protein orthologs identified between fission yeast and other species since 2006. Changes in homologs/orthologs detected and in taxonomic distribution between S. pombe, S. cerevisiae, and Metazoa. N/A, not available. 2006 data are from Wood 2006; 2018 data are from PomBase (www.pombase.org). Protein totals exclude dubious protein-coding genes.

Lineage-specific gene loss

Our initial comparative analysis using BLAST identified only 145 fission

yeast proteins conserved in C. elegans but absent from S. cerevisiae (Aravind et al. 2000). Today in PomBase,

607 proteins are reported to be absent from S. cerevisiae

but conserved outside the Schizosaccharomyces clade and a large

proportion of these (353) are present in Metazoa. Many of these

proteins are part of functionally connected groups, including some

components of the RNA splicing machinery (40 proteins), components of

the heterochromatin machinery and gene silencing pathway (34

proteins), making fission yeast an invaluable model for the equivalent

human processes (Fair and Pleiss 2017, Allshire and Ekwall 2015).

Total proteome comparisons provide insights into evolutionary

similarities and differences (lineage-specific protein losses and protein

family expansions) between species. Since the identification of orthologs

also underlies annotation transfer, proteome-wide ortholog identification

combined with information about duplication and gene loss provides a

framework for the evaluation of functional similarities and differences

due to these phenomena, and to compare breadth and depth of

annotation (both proxies for degree of knowledge) between species.

3. Functional biocuration and data integration

The major activity of any model organism database (MOD) is to

interpret peer-reviewed research articles and manually curate the gene-

specific functional information published within them. The biocuration

process adds value to published data by integrating information from

different publications at the level of the gene product or data type, both

within and between species, in a way that can be interpreted by humans

and computers. Biocuration is thus pivotal to improve the data

accessibility, interoperability, and reuse life cycle. The increased volume

of biological data produced by genome-scale biology, and the increasing

amount of data from directed hypothesis-driven studies, has required a

shift in the approaches used to describe and integrate this data (Oliver

et al. 2016). The implementation and development of these approaches

to the integration of functional data using fission yeast are described

below.

Ontologies

Molecular data in PomBase are procured primarily by manual curation of

the fission yeast literature using ontologies (Wood et al. 2012, Lock et

al. 2018a, Lock et al. 2018b). Ontologies provide standardized

semantics for “terms” (classes that represent entities) and their

relationships to each other, and are used for this purpose universally by

the major MODs, and increasingly across other biological databases. An

annotation to an ontology term is a statement of a connection between

a gene product and a term in an ontology. Annotation using ontologies

prevents ambiguity and ensures consistency between descriptions of

experimental observations across different curators, authors, decades,

species, and data types, allowing them to be found and interpreted by

both humans and algorithms.

Increasingly, the ontologies used for this purpose are formally

defined in relation to each other. For example, a phenotype ontology

term describing an abnormal process involving a metabolite could be

logically defined using a Gene Ontology (GO) Biological Process term

and a Chemical Entities of Biological Interest (CHEBI) molecule term

coupled to a phenotypic quality term from the Phenotype And Trait

Ontology (PATO) to describe the abnormality (de Matos et al. 2010,

Gkoutos et al. 2009). Explicit connections between ontologies preserve

logical consistency between resources aiding interoperability. Reasoning

software can also use logical definitions to partially automate ontology

maintenance by inferring missing links between terms and detecting

redundancy and other types of error. In addition, because ontology

terms are connected to each other in a graph, ontologies allow

annotation at different levels of granularity depending on what is known, or

can be inferred. Finally, ontologies provide many additional mechanisms

for quality control, consistency checking, and error correction of

annotated data; for example, restricting some terms for use in specified

taxa, examining co-annotations for annotation outliers, and blocking the

use of terms where a more specific annotation should always be possible

(The Gene Ontology Consortium 2018).
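
The point about annotating at different levels of granularity can be made concrete with a short Python sketch. It assumes a toy ontology stored as a mapping from each term to its direct parents; the term names are borrowed for readability, but the parent links and gene names are illustrative and are not copied from GO or FYPO.

```python
from typing import Dict, Set

# Illustrative mini-ontology: term -> direct parents (relationship types collapsed).
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "cell cycle process": {"root"},
    "mitotic cell cycle process": {"cell cycle process"},
    "mitotic spindle elongation": {"mitotic cell cycle process"},
    "cytokinesis": {"cell cycle process"},
}

def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return every term reachable by following parent links upwards."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def genes_for(term: str, annotations: Dict[str, Set[str]],
              parents: Dict[str, Set[str]]) -> Set[str]:
    """Genes annotated to `term` directly or to any more specific term."""
    return {gene for gene, terms in annotations.items()
            if any(t == term or term in ancestors(t, parents) for t in terms)}

# Hypothetical annotations made at three different levels of specificity.
annotations = {
    "geneA": {"mitotic spindle elongation"},
    "geneB": {"cytokinesis"},
    "geneC": {"cell cycle process"},
}

broad = genes_for("cell cycle process", annotations, PARENTS)
narrow = genes_for("mitotic cell cycle process", annotations, PARENTS)
```

A query at the broad term retrieves all three genes, whereas the narrower query retrieves only geneA, so a specific annotation never has to be duplicated at its parent terms.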

Within the PomBase project, we develop and apply ontologies to

describe phenotypes and conditions, and have been major contributors

to the Gene Ontology (GO) since its inception 20 years ago (Harris et

al. 2013, Ashburner et al. 2000, Harris et al. 2004, The Gene Ontology

Consortium 2018).

Phenotype Curation

Phenotypes (observable characteristics of an organism that result from

the interaction of its genotype with a given environment) are curated in

PomBase using the Fission Yeast Phenotype Ontology (FYPO, Harris et

al. 2013). FYPO is a formal, modular ontology that uses several existing

ontologies from the Open Biological and Biomedical Ontology (OBO)

Foundry (for example PATO, GO, CHEBI, and the Sequence Ontology

(SO)) as building blocks to create over 6000 precomposed phenotype

terms for consistent descriptions of fission yeast cell- and population-

level phenotype data (Smith et al. 2007, de Matos et al. 2010,

Ashburner et al. 2000, Gkoutos et al. 2009, Eilbeck et al. 2005). Using

FYPO, we curate detailed, accurate annotations for single and multi-

allele phenotypes with the aim of providing comprehensive coverage of

phenotypes reported in the literature. To date, we have provided more

than 80,000 individual phenotype annotations, connected to genotypes

and supported by evidence codes, citations, conditions and, where

applicable, annotation extensions (see below) capturing penetrance,

expressivity or affected gene products (Lock et al. 2018a, Lock et al.

2018b).

Gene Ontology (GO) Curation

The Gene Ontology is a collaborative, open project that provides

ontologies to describe three aspects of gene products: Biological Process

(BP), Molecular Function (MF), and Cellular Component (CC).

In addition to supporting functional annotation, the GO can be used to

identify unstudied, or “unknown” gene products - those that have been

assessed for the availability of annotation, but cannot be annotated to

any GO term for a specific aspect. These “unknown” gene products are

annotated to the root node of the aspect for which no information can

be found with the evidence code ND (No Data). This practice is

important in order to allow the unambiguous identification of unknown

gene products as opposed to unannotated gene products. In 2006, we

completed the first round of manual annotation of all fission yeast gene

products to all three aspects of GO, and annotated to the root node with

ND if no data could be found (Aslett and Wood 2006). PomBase and the

Saccharomyces Genome Database (SGD) are currently still the only

MODs that have annotated every protein-coding gene to each GO aspect

in order to distinguish “unannotated” from “un-annotatable”.
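
The distinction between "unknown" and "unannotated" can be expressed as a simple classification rule. The Python sketch below is a minimal illustration: the three root term identifiers and the evidence codes are standard GO values, but the tuple layout, the function name, and the example gene are assumptions made for this example rather than a PomBase data format.

```python
from typing import List, Tuple

# Root term of each GO aspect.
ROOT_TERMS = {
    "BP": "GO:0008150",  # biological_process
    "MF": "GO:0003674",  # molecular_function
    "CC": "GO:0005575",  # cellular_component
}

# An annotation is (aspect, term_id, evidence_code); this layout is assumed.
Annotation = Tuple[str, str, str]

def aspect_status(annotations: List[Annotation], aspect: str) -> str:
    """Classify one gene product for one aspect."""
    in_aspect = [(term, ev) for asp, term, ev in annotations if asp == aspect]
    if any(term != ROOT_TERMS[aspect] for term, _ in in_aspect):
        return "annotated"      # at least one informative term
    if any(ev == "ND" for _, ev in in_aspect):
        return "unknown"        # assessed, nothing found: root + ND
    return "unannotated"        # never assessed for this aspect

# Hypothetical gene product with a known activity but no known process.
example_gene = [
    ("MF", "GO:0004672", "IDA"),  # protein kinase activity
    ("BP", "GO:0008150", "ND"),
]

statuses = {aspect: aspect_status(example_gene, aspect) for aspect in ROOT_TERMS}
```

Without the explicit root annotation, the process and component aspects of this gene product would be indistinguishable, although only one of them has actually been assessed.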

The increase in GO annotation, evidence type, and aspect coverage

between 2006 and 2018 is shown in Table 2. Although the number of

gene products annotated with non-root node GO terms has not

increased substantially in the past decade, the actual number of GO

annotations has increased by 10,000. Importantly, the annotation depth

(average annotation distance from the ontology root node) of individual

annotations has increased substantially, as GO annotations have

become more specific. PomBase provides one of the largest sets of

experimentally supported GO terms for use in phylogenetically based

propagation to other species, and supports over 0.5 million annotations

for other species (Gaudet 2011).

                                                    2006     2018
Protein coding genes                                4969     5070
With at least one non-root GO term (any aspect)     4886     4979
With at least one non-root Biological Process       3976     4371
With at least one non-root Cellular Component       4801     4919
With at least one non-root Molecular Function       3471     4095
At least one annotation to each aspect              3225     3570
Total annotations                                  30343    41115
Total manual annotations                           23243    37886
Total annotation with experimental evidence        12056    22858
Automated annotation                                7100     3229

Table 2: Changes in numbers of GO annotations to different GO aspects over time. Annotations to each aspect of GO have increased slowly, but a larger proportion of annotations is now derived from published experimental data. Because we filter redundant automated (electronically inferred) annotations, the number of annotations of this type has decreased.

Increasing annotation expressivity and connectivity

To fully understand the biological role of a gene product within a

complex biological system it is necessary to connect its intrinsic

molecular or biochemical activity to the context in which it acts (for

example, the gene product it acts upon, the process it is a part of, the

cellular component it localizes to, the temporal or developmental phase

when its activity occurs). Historically, GO annotations were simple

declarative statements and therefore could not capture the requisite

connections (a GO annotation was essentially a pairing of a single gene

product with a single GO term). Although individual gene products could

be assigned to multiple GO terms, the annotations were independent,

and could not be easily combined to represent the connectivity of a

biological system.

We have extended the GO annotation model to accommodate

“annotation extensions”, where we create an annotation to an existing

GO term and describe a more specific subtype through the use of one or

more formal relationships to another entity (Huntley et al. 2014). For

example, if the Cdc2 protein kinase phosphorylates Klp9 in order to

negatively regulate spindle elongation this would be represented by:

Gene product: Cdc2

MF: cyclin-dependent protein serine/threonine kinase activity

Annotation extension: Relation: has_substrate(Gene Product: Klp9),

Relation: involved_in(BP: negative regulation of mitotic spindle elongation)

Each relational expression is written formally as a Relation(Entity)

where the Relation is a label denoting a relationship type and Entity is

an identifier for a database object (e.g. the identifier for the gene

product Cdc2) or another ontology term. PomBase was an early adopter

of the annotation extension model as we had already been capturing

this connectivity as qualifiers for over a decade before formalization by

GO. In 2014, when Huntley et al. (2014) was published, fission yeast had

the highest percentage of extended GO annotations, and PomBase was the only

MOD to display annotation extensions on its website.

PomBase has since adopted the annotation extension model for

phenotype, protein modification and gene expression annotations

(McDowall et al. 2015, Lock et al. 2018a, Lock et al. 2018b). The growth

of extended annotations is illustrated in Table 3.
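
The Relation(Entity) notation lends itself to simple machine parsing. As a minimal sketch, the Python below splits a comma-separated extension string into (relation, entity) pairs. The serialization shown is an assumption made for illustration, using labels in place of the database identifiers that real annotation files carry, and the pattern does not handle nested parentheses.

```python
import re
from typing import List, Tuple

# One relational expression: a relation label followed by a parenthesised entity.
EXTENSION_PATTERN = re.compile(r"(\w+)\(([^()]+)\)")

def parse_extensions(text: str) -> List[Tuple[str, str]]:
    """Split an annotation extension string into (relation, entity) pairs."""
    return [(relation, entity.strip())
            for relation, entity in EXTENSION_PATTERN.findall(text)]

# The Cdc2 example from the text, written as a single string.
extension = ("has_substrate(Klp9), "
             "involved_in(negative regulation of mitotic spindle elongation)")

parsed = parse_extensions(extension)
```

Each pair can then be attached to the annotated gene product and its Molecular Function term, which is what allows independent annotations to be connected to one another.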

Curating pathways and networks

The annotation extension model creates connections between

gene products and provides rich biological context to their interactions.

These connections can be exploited to reconstruct biological pathways.

By using PomBase extended annotations, we can generate pathway

models from curated GO data (The Gene Ontology Consortium 2018).

This procedure relies on the premise that if the Entity in an annotation

extension is a gene product, “Annotated object (Relationship(Entity))”

can be interpreted as node-edge-node in a physical network.
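
The node-edge-node interpretation can be sketched in a few lines of Python. The record layout and function names below are assumptions for illustration; the Cdc2 and Klp9 records follow the example given earlier, while "GeneX" is a purely hypothetical placeholder added so that the traversal has a second step.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (annotated gene product, relation, entity, entity_is_gene_product)
Record = Tuple[str, str, str, bool]

records: List[Record] = [
    ("Cdc2", "has_substrate", "Klp9", True),
    ("Cdc2", "involved_in",
     "negative regulation of mitotic spindle elongation", False),
    ("Klp9", "has_substrate", "GeneX", True),  # hypothetical placeholder
]

def build_graph(rows: List[Record]) -> Dict[str, List[Tuple[str, str]]]:
    """Keep only extensions whose entity is a gene product, as labelled edges."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for source, relation, entity, is_gene_product in rows:
        if is_gene_product:
            graph[source].append((relation, entity))
    return dict(graph)

def downstream(graph: Dict[str, List[Tuple[str, str]]], start: str) -> Set[str]:
    """All gene products reachable from `start` by following edges."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for _, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen

graph = build_graph(records)
targets = downstream(graph, "Cdc2")
```

Extensions that point to an ontology term rather than a gene product, such as the involved_in record, are not edges in this view; they supply the biological context for the edges instead.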

                                    2014     2015     2018
GO annotations                     29049    37224    37886
GO with extensions                  1902     2149     5786
Phenotype annotations              15713    36882    80673
Phenotypes with extensions          6005    11722    33005
Modification annotations            1614    11285    35296
Modifications with extensions       1594     4009    26221
Gene expression with extensions    26357    36455    37724

Table 3: Fission yeast annotation extensions, growth over time: Total number

of annotations and the subset with extensions.

More recently, we have extended annotation extension usage to link

Molecular Functions to Biological Processes. Because an annotation can

now connect a gene product to a substrate and a pathway, annotations

can be used to navigate a pathway from gene to gene on the PomBase

gene pages. Reciprocal annotations are also displayed, enabling

navigation in the opposite direction (Lock et al. 2018a, Lock et al. 2018b).

For example, the highly conserved cyclin-dependent

serine/threonine kinase Cdc2 (homolog of the mammalian CDK1) is

known to directly phosphorylate over 140 different proteins. A number

of these Cdc2–substrate connections are linked to the biological

processes that the interaction regulates (Figure 1A).

A representation of a pathway can be created from this series of

molecular function annotations using annotation extensions to identify

regulated kinase substrates and the biological process context in which

the functions take place, illustrated schematically in Figure 1B.

Figure 1. Using GO annotation extensions to reconstruct networks. (A) Cdc2 activities (molecular functions) connected to substrates and biological processes using annotation extension notation. (B) Direct downstream targets of Cdc2 can be accessed via the hyperlinked annotation extension substrates, enabling users to follow biological pathways. The capturing of targets makes it possible to reconstruct pathways for a systems level representation of gene networks.

PomBase generates network diagrams based on manual curation for all

GO slim (see below) terms using GO annotation extension data as

described above, supplemented by curated high confidence physical

interaction data and curated protein complex assignments using EsyN

network building software (Bean et al. 2014). We are developing this

approach further to create a dynamic, detailed, and reliable qualitative

curation-based network view of a model eukaryotic cell (Lock et al.

2018a, Lock et al. 2018b).

A system-wide biological summary (GO slim)

As part of the PomBase project and the GO project, I created and

continue to maintain the fission yeast “GO slim” (a tailored subset of

“high-level” GO terms), and the generic GO Consortium slim, to provide

biological overviews (Rhee et al. 2008, Lock et al. 2018b). The

fission yeast GO slim set comprises 53 biological process terms specific

enough to be informative about a gene product’s cellular role, whilst

minimizing overlap between terms (https://www.pombase.org/browse-

curation/fission-yeast-go-slim-terms). This slim set also aims to

demonstrate the distribution of processes within distinct “modules” of

biology (cytokinesis, tRNA metabolism, DNA replication, etc.) and

therefore excludes overly general biological process terms, such as

“metabolism” or “cellular component organization”, that would increase

coverage at the expense of specific context. Terms that recapitulate

activities in the molecular function ontology (e.g. “protein

phosphorylation”) or describe phenotypic observations but do not

correspond to a specific physiological role for a gene product (e.g.

“response to chemical”) are also excluded. The browsable fission yeast

slim resource represents 99.5% of all fission yeast proteins with a

biological process annotation.
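
How a slim summarises annotations, and how a coverage figure of this kind can be computed, is sketched below in Python. The two-term slim, the parent links, and the gene names are all invented for illustration; the real slim has 53 process terms and the real GO graph is far larger.

```python
from typing import Dict, Set

# Illustrative parent links only; not copied from the GO graph.
PARENTS: Dict[str, Set[str]] = {
    "biological process": set(),
    "cytokinesis": {"biological process"},
    "actomyosin ring assembly": {"cytokinesis"},
    "DNA replication": {"biological process"},
    "DNA replication initiation": {"DNA replication"},
    "response to chemical": {"biological process"},
}

# Hypothetical two-term slim.
SLIM: Set[str] = {"cytokinesis", "DNA replication"}

def slim_terms_for(term: str) -> Set[str]:
    """Collect slim terms found at `term` or at any of its ancestors."""
    found: Set[str] = set()
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in SLIM:
            found.add(current)
        stack.extend(PARENTS[current])
    return found

def slim_coverage(gene_terms: Dict[str, Set[str]]) -> float:
    """Fraction of annotated genes that map to at least one slim term."""
    covered = [gene for gene, terms in gene_terms.items()
               if any(slim_terms_for(t) for t in terms)]
    return len(covered) / len(gene_terms)

gene_terms = {
    "gene1": {"actomyosin ring assembly"},
    "gene2": {"DNA replication initiation"},
    "gene3": {"response to chemical"},  # excluded category, maps to no slim term
    "gene4": {"cytokinesis"},
}

coverage = slim_coverage(gene_terms)
```

In this toy set three of the four genes map to a slim term. The gene annotated only to the excluded term is left uncovered, which is the trade-off between coverage and specific context described above.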

Community curation

Curating gene-specific information from peer-reviewed articles is a time-

consuming, labour-intensive process that involves reading publications

and associating novel or confirmatory information with genes or other

biological features. Several factors have recently motivated databases to

seek alternative curation strategies in the face of decreasing funding

and an increasing volume and complexity of data presented in each

publication (Oliver et al. 2016). One promising approach implemented

by PomBase is a community curation initiative that engages researchers

in direct curation of their own papers (see footnote 2). By combining the topic-specific

expertise of biological experts with professional curators’ familiarity with

ontologies and annotation practices, community curation provides a high

standard of accuracy and specificity. The increase in the number of

community contributions over time is summarised in Table 4.

To date, 1445 publications have been assigned to community members

for curation. Of these, 627 are finished and have been, or are currently

being, checked by the PomBase curators for inclusion in the main

PomBase database, an overall response rate of 44.1%.

2 See https://www.pombase.org/community/fission-yeast-community-curation-project for

background and historical perspective

         Publications curated           Annotations
Year     All    Curator  Community      Curator  Community
2012     378      366       12            3912      104
2013     463      382       81            6091      633
2014     920      835       85           11240      917
2015     592      506       86           11641     2046
2016     436      295      141           10553     2410
2017     302      203       99            7539     1501
2018     267      144      123            5342     4274
Total   3358     2731      627           56318    11885

Table 4: Current community contributions, increase over time. Number of papers curated by professional and community curators, and total number of annotations created, per year. Any paper fully or partially curated by a community curator is classified as a community curated publication. Papers are counted towards the year in which they are curated. Note: The breakdown of annotations between professional and community curators is only accurate from 2013 onwards. Until the end of 2012, all annotations from a community-curated paper were attributed to the community curator, even those added by a professional curator during the approval procedure.

At the time of writing, PomBase has manually curated over

222,000 annotations to ontology or controlled vocabulary terms,

supported by evidence codes and citations (Lock et al. 2018a, Lock et al.

2018b) (increase summarised in Figure 2). Of these, over 100,000 are

connected to other gene products or further specified by annotation

extensions. These manual annotations are supplemented by ~3000 GO

annotations generated by automated methods based on sequence

homology to fill known annotation gaps. An increasing proportion of the

fission yeast annotation is provided by community curation. Thus, a

reliable, collaborative system has matured into one of PomBase’s

signature achievements.

Figure 2: Increase in ontology-based curation in PomBase over time.

Cumulative increases for curation of different types.

4. Analysing large datasets using biocuration

By providing parts lists, capturing gene-specific information, and making

connections between similar features, curated data underpins the

analysis and interpretation of many experimental results, especially for

large functional genomics datasets (Oliver et al. 2016). The value of

consistent curation for data analysis and results interpretation is

demonstrated by the analysis of the fission yeast genome-wide deletion

collection, and the genome-wide screen of deletion mutants for cell cycle

and morphology phenotypes (Kim et al. 2010, Hayles et al. 2013).

The deletion collection, prevalence of gene dispensability

The first collection of heterozygous deletions for fission yeast, covering

98.4% of the 4914 protein coding genes in the published reference

genome was constructed by Bioneer (an industrial partner in the

deletion project consortium). We found that 26.1% of the fission yeast

protein coding genes (1260/4836) were essential, and 73.9%

(3576/4836) were non-essential, for viability of haploid cells in the

growth conditions used. These results contrast with budding yeast

where 17.8% (1033/5776) were essential (Kim et al 2010). Despite a

smaller total number of proteins, fission yeast has 227 more essential

protein-coding genes than budding yeast.

Duplication, taxonomic conservation and dispensability

Since fission yeast has fewer duplicated genes than budding yeast, we

examined the possibility that duplication in budding yeast might mask

essentiality. To assess this, we identified all of the essential genes in one

yeast with a duplicated ortholog in the other yeast (one-to-many and

many-to-many relationships). We then identified cases where duplicated

orthologs were non-essential in the other species. This revealed only 67

essential genes in fission yeast and 32 in budding yeast where

essentiality of the ortholog might be masked by redundancy. Reduced

redundancy can therefore only account for 35 (67–32) of the 227 extra

essential genes in fission yeast. We concluded that redundancy is not

the major reason for an increased number of essential genes in fission

yeast (Kim et al 2010). Despite this, essential genes are more likely to

be unique and duplicated genes are less likely to be essential with

93.1% of essential genes (1173/1260) being present in single copy

compared to 73.9% of non-essential genes (2643/3576). Genes which

are more broadly conserved taxonomically (universally, or to Metazoa),

are also more likely to be essential than fungal-specific or species-

specific genes (Figure 3).
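
The two-step screen for essentiality that might be masked by redundancy can be sketched as follows in Python. The gene names are invented, and the rule that every duplicated ortholog must be non-essential is one reading of the procedure described above, stated here as an assumption. The final lines simply restate the arithmetic from the reported counts.

```python
from typing import Dict, Set

def masked_candidates(essential_a: Set[str],
                      orthologs_a_to_b: Dict[str, Set[str]],
                      essential_b: Set[str]) -> Set[str]:
    """Essential genes in species A whose orthologs in species B are
    duplicated (more than one) and all non-essential."""
    masked: Set[str] = set()
    for gene in essential_a:
        partners = orthologs_a_to_b.get(gene, set())
        if len(partners) > 1 and not (partners & essential_b):
            masked.add(gene)
    return masked

# Toy data with invented identifiers.
essential_pombe = {"spA", "spB", "spC"}
orthologs = {
    "spA": {"scA1", "scA2"},   # duplicated, both copies non-essential
    "spB": {"scB"},            # single copy
    "spC": {"scC1", "scC2"},   # duplicated, but one copy is essential
}
essential_cerevisiae = {"scB", "scC1"}

masked = masked_candidates(essential_pombe, orthologs, essential_cerevisiae)

# Arithmetic on the counts reported in the text.
net_masked = 67 - 32              # net excess attributable to redundancy
share_of_excess = net_masked / 227  # roughly 0.15 of the 227 extra essential genes
```

On the reported counts, redundancy accounts for roughly 15% of the excess, which is the basis for concluding that it is not the major explanation.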

Figure 3: Comparative analysis of gene dispensability profiles of fission yeast. Gene dispensability profiles of 4,836 deletion mutants by gene copy number of fission yeast orthologs compared to budding yeast (x axis) and species distribution (y axis). Compared to budding yeast, fission yeast genes consist of 2,841 single-copy genes (n = 1, m ≥ 1), 855 duplicated genes (n > 1, m ≥ 1) and 1,140 genes found in fission yeast but not in budding yeast (n ≥ 1, m = 0), where ‘n’ is the number of genes in fission yeast and ‘m’ is the number of genes in budding yeast. The term ‘eukaryotes’ includes humans and the term ‘variable phyla’ includes plants. The area of each circle represents the number of genes, where essential and nonessential genes are represented by yellow and blue, respectively.

Dispensability, genome location, and cellular location

Essential and non-essential genes were found to be distributed evenly

throughout the fission yeast genome except within 100 kb of the

telomeres where essential genes were almost absent (1.2% compared

to a genome average of 26.1%). We analysed gene essentiality and the

subcellular localization of protein products and found that the

percentage of essential genes was higher among those coding for

components of the spindle pole body, nucleus, and nuclear envelope

than for other cellular components (Kim et al 2010).

Dispensability and biological process

GO enrichment revealed that essential gene sets for both yeasts were

significantly overrepresented for core cellular processes (such as DNA,

RNA, protein, and lipid metabolism) and biosynthetic processes (general

transcription, translation, and ribosome assembly) (Figure 4). In

contrast, non-essential genes were enriched for processes related to

interactions with the environment (transcriptional regulation, cell

communication, and plasma membrane transport). Non-essential genes

were also enriched for condition- or lifestyle-specific processes (e.g.

stress, sexual reproduction) which are less likely to be essential in the

laboratory growth conditions than in the wild. Genes of unknown

cellular role were also highly enriched in the non-essential set (93% of

unknown genes were non-essential). We predict that most unknown

genes are not involved in core biosynthetic and information storage

processes.
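
For readers unfamiliar with enrichment testing, the Python sketch below computes an over-representation P value with the hypergeometric distribution, a common choice for GO term enrichment. The text does not state which test or correction was used in the published analysis, so this is an assumption, and the counts in the example are invented.

```python
from math import comb

def hypergeom_upper_tail(population: int, annotated: int,
                         sample: int, hits: int) -> float:
    """P(X >= hits) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the GO term."""
    upper = min(annotated, sample)
    favourable = sum(comb(annotated, k) * comb(population - annotated, sample - k)
                     for k in range(hits, upper + 1))
    return favourable / comb(population, sample)

# Invented toy numbers: 10 genes in total, 4 annotated to the term,
# 5 genes in the essential set, and all 4 annotated genes among them.
p_value = hypergeom_upper_tail(population=10, annotated=4, sample=5, hits=4)
```

In a genome-wide analysis many terms are tested at once, so raw P values of this kind would normally be adjusted for multiple testing before a threshold is applied.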

Access to deletion collections of the two model yeasts allowed us to

compare the conservation of gene-specific dispensability between two

evolutionarily distant organisms. To eliminate complications due to

duplication and redundancy, we compared the essentiality of the 2438

single-copy (one-to-one) orthologous pairs where deletion data was

available for both species. Overall, 83% of these genes had the same

dispensability in both species, suggesting that these may be universally

essential.

Figure 4: Enriched essential and non-essential processes for fission yeast and budding yeast. Comparison of GO analyses of fission yeast and budding yeast genes. Bar chart shows a selection of broad, biologically informative GO terms significantly (P ≤ 0.01) enriched for essential and nonessential genes in fission yeast and budding yeast.

The remaining 17% of orthologous pairs (411/2438) differ in essentiality

between the two yeasts. Of these, 268 are essential only in fission yeast

and 143 are essential only in budding yeast (Figure 5). The one-to-one

category therefore contains 125 more essential genes in fission yeast

(268–143), accounting for the majority of the 227-gene difference in

essentiality between the two yeasts. To analyse these

differences further we identified the set of broad biological processes

covered by the genes with differential essentiality (Figure 6).
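
The comparison of one-to-one orthologous pairs amounts to a two-by-two tally, sketched below in Python with invented gene names. The function name and data layout are assumptions for illustration; the final lines rework the arithmetic from the counts reported above.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

def compare_dispensability(pairs: Iterable[Tuple[str, str]],
                           essential_a: Set[str],
                           essential_b: Set[str]) -> Counter:
    """Tally orthologous pairs by essentiality (E) or non-essentiality (NE)
    in each species."""
    counts: Counter = Counter()
    for gene_a, gene_b in pairs:
        key = ("E" if gene_a in essential_a else "NE",
               "E" if gene_b in essential_b else "NE")
        counts[key] += 1
    return counts

# Toy data: one pair in each of the four categories.
pairs = [("sp1", "sc1"), ("sp2", "sc2"), ("sp3", "sc3"), ("sp4", "sc4")]
toy = compare_dispensability(pairs, {"sp1", "sp2"}, {"sc1", "sc3"})

# Arithmetic on the counts reported in the text.
total_pairs = 2438
only_fission, only_budding = 268, 143
differing = only_fission + only_budding          # 411 pairs
percent_differing = 100 * differing / total_pairs  # about 17%
excess_fission = only_fission - only_budding     # 125 genes
```

The two off-diagonal categories, (E, NE) and (NE, E), are the pairs with differential essentiality that are examined by biological process in Figure 6.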

Figure 5: Dispensability comparison of orthologous pairs from the two yeasts. Essentiality of the 2,438 single copy (one-to-one) orthologous pairs from S. pombe:S. cerevisiae were compared. Eighty-three percent of orthologs show conserved dispensability and the remaining 17% show different dispensability. E=essential; NE=nonessential.

Figure 6: Functional distribution of orthologs with different dispensability. The 17% of the orthologous pairs with different dispensability were allocated to one of 31 biological terms, 22 of which are shown here. Note that genes annotated to mitochondrial functions, certain amino acid metabolic pathways and protein degradation pathways such as neddylation and sumoylation are mostly essential in one yeast, and non-essential in the other yeast, whereas other categories show essential genes in both yeasts under the conditions used in this study although the specific genes are different. Because there are some differences in the constituents of the standard rich media used for each organism, it is likely that some differences in dispensability are due to these differences.

The most striking difference was in mitochondrial function (95

orthologous pairs). Of these, 89 were essential in fission yeast and only 6

in budding yeast. Most (69) are nuclear-encoded components of the

mitochondrial translation machinery, which are required for biogenesis

of the respiratory chain (budding yeast is a facultative anaerobe and

therefore cells lacking the oxidative phosphorylation pathway are viable).

Other differences included the requirement for DNA replication

checkpoint genes, processes related to RNA processing and export,

spindle/kinetochore-related processes, glycosylation, and other ER

processes. These differences may be due to dissimilarities in the number

of introns, centromere structure, organization of the Golgi network,

facultative anaerobiosis or other lifestyle differences between the two

yeasts.

Our comparisons of orthologous gene pairs between budding and fission

yeast showed that 83% had the same dispensability despite being

distantly related. This high level of conservation in dispensability will be

helpful for the interpretation of viability data from more complex

eukaryotes.
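
The comparison summarised above reduces to a tally over orthologous pairs. The following is a minimal sketch, assuming each one-to-one pair is represented as a tuple of essentiality calls ("E" or "NE") for S. pombe and S. cerevisiae. The function name and the toy data are illustrative assumptions and are not taken from the published analysis.

```python
from collections import Counter


def dispensability_summary(pairs):
    """Tally (S. pombe, S. cerevisiae) essentiality calls and return the
    counts together with the fraction of pairs with conserved dispensability."""
    counts = Counter(pairs)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no orthologous pairs supplied")
    # Conserved means essential in both yeasts or nonessential in both.
    conserved = counts[("E", "E")] + counts[("NE", "NE")]
    return counts, conserved / total


# Toy data only: six hypothetical orthologous pairs.
toy_pairs = [("E", "E"), ("NE", "NE"), ("NE", "NE"),
             ("E", "NE"), ("NE", "NE"), ("NE", "E")]
counts, conserved_fraction = dispensability_summary(toy_pairs)
print(counts[("E", "NE")], round(conserved_fraction, 3))
```

Applied to the 2,438 real pairs, the same tally would be expected to reproduce the 83% and 17% split reported in Figure 5.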

A phenotype resource for cell cycle and cell shape.

Understanding how cells reproduce and how they generate their shape

are two major goals of eukaryotic cell biology. These two processes are

intimately related because cells duplicate their components and

reproduce their cell structure in space to generate two daughter cells.

Defects in cell division and morphology frequently lead to cell death and

disease, via defects in genetic transmission or tissue architecture.

Fission yeast is a very amenable tool for investigating cell cycle and cell

shape because it grows by apical extension and divides by medial

fission.

To identify genes affecting cell cycle or morphology, 4843 single-gene

deletion mutants (95.7% of protein-coding genes) were visually

screened to provide the first system-wide description of cell shape

phenotypes. Mutants were classified as exhibiting one of 11 cell shape

phenotypes plus wild type (WT), arrested as ungerminated spores, or

arrested as germinated spores.

Cell shape mutants

Cell shape mutants other than those with a long cell phenotype have

been used to identify genes important for generating the normal rod

shape of the fission yeast cell. Previous studies identified “orb” mutants

that are spherical because cells fail to grow in a polarized manner;

curved or bent cells that no longer orient the growth zones correctly;

and T-shaped mutants, which form a new growth zone in the wrong

place and often grow at 90 degrees to the long axis of the cell (Snell and

Nurse 1994, Verde et al. 1995). Other cell shape mutants included

bottle or skittle-shaped cells (one end wider than the other) and

mutants with more generally misshapen cells (Wiley et al. 2008). The

deletion mutant library was screened for shape defects using the

following seven categories to describe the mutant phenotypes: rounded,

stubby, curved and T-shaped, skittle, and three less well-defined subgroups

of generally misshapen cells: viable misshapen mutants (miss V), viable

misshapen mutants with a weak phenotype (miss weak V), and

essential misshapen mutants (miss E). In total, 857/4843 genes

(17.7%) resulted in an altered cell shape when deleted (Tables 5 and 6)

and 668 of these were conserved between fission yeast and human

(77.9%). No additional cell shape phenotypes were identified compared

with earlier work, suggesting that there may only be a restricted

number of defined shapes that a fission yeast cell can adopt.
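
As a quick consistency check, the two percentages quoted above can be recomputed from the counts given in the text. This is only a worked arithmetic sketch; the helper name `percent` is an assumption introduced for illustration.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


screened_genes = 4843       # single-gene deletion mutants screened
altered_shape = 857         # deletions giving an altered cell shape
conserved_in_human = 668    # of those, conserved between fission yeast and human

print(percent(altered_shape, screened_genes))       # 17.7
print(percent(conserved_in_human, altered_shape))   # 77.9
```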

Tables 5 and 6: Enrichments to GO biological process and cellular component terms for each phenotypic class. The enrichment results were mapped to “GO slim” (high level) terms to give a broad view of the ontology content of the genome-wide gene deletion dataset.

The rounded, stubby, and curved sets are all enriched for genes

implicated in the determination of cell polarity and for protein

localization at the cell tip; rounded and stubby sets for cell wall

organization; and the stubby set for cytokinesis and actin

cytoskeleton organization (Tables 5 and 6). All 14 genes that are in

the set whose deletants have a curved morphology, and are annotated

to cytoskeleton organization, are involved in microtubule cytoskeleton

organization and their protein products are enriched at the cell tip.

Genes which generate a skittle phenotype when deleted are enriched

for the process of mitochondrial organization; nearly 50 percent

(118/241) of the total genes annotated to mitochondrial organization

were in the skittle category. We found that 19 genes were required

for mitochondrial tRNA metabolism and 61 genes for the

mitochondrial ribosome, suggesting that mitochondrial translation

underlies the skittle phenotype. The miss E category was enriched for

genes required for lipid metabolism (35 genes), 15 of which are

involved in glycosylphosphatidylinositol (GPI) anchor biosynthesis.

GPI anchor proteins affect cell wall integrity, and loss of these

proteins can result in a misshapen cell phenotype (Yada et al. 2001).
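
Enrichment of a GO term within a phenotype class is commonly assessed with a one-sided hypergeometric test. The text above does not state which statistic was used, so the following is a sketch of that general technique rather than the method of the original study. The counts 4,843 (genes screened), 241 (genes annotated to mitochondrial organization) and 118 (of those in the skittle class) come from the text; the size of the skittle class is not given in this section, so `skittle_class_size` is a placeholder assumption.

```python
from math import comb


def hypergeom_upper_tail(population, annotated, sample, overlap):
    """P(X >= overlap) when `sample` genes are drawn without replacement
    from `population` genes, `annotated` of which carry the term."""
    denominator = comb(population, sample)
    upper = min(annotated, sample)
    numerator = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return numerator / denominator


# Small hand-checkable case: (C(4,2)*C(6,1) + C(4,3)*C(6,0)) / C(10,3) = 40/120.
p_toy = hypergeom_upper_tail(population=10, annotated=4, sample=3, overlap=2)

# Thesis counts, with a PLACEHOLDER class size (assumption, not from the source).
skittle_class_size = 300
p_skittle = hypergeom_upper_tail(4843, 241, skittle_class_size, 118)
print(p_toy, p_skittle < 1e-3)
```

In practice a correction for multiple testing would normally be applied, because many GO terms are tested against each phenotype class.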

Cell cycle mutants

In fission yeast, an elongated phenotype indicates cells blocked in

interphase or cytokinesis. Mitotic defects often cause arrest with WT

or irregular shape; cell shape mutants lose their cylindrical

morphology. The elongated phenotype was divided into three sub-

categories based on penetrance and whether branching occurred:

long high penetrance, long low penetrance, and branched. The genes

with long branched phenotype were enriched for cytokinesis. Genes

with long low penetrance and long high penetrance were enriched for

DNA metabolism and mitotic cell cycle regulation (Table 5). However,

genes with the long high penetrance phenotype were also enriched for

mRNA metabolism, RNA biogenesis, transcription, and DNA replication

(genes required for progression through interphase of the mitotic cell

cycle). The long low penetrance set of deletants was enriched for the

process of chromosome segregation (genes required for progression

through mitosis). The misshapen essential category also contains

genes required for DNA replication, and kinetochore components

required in mitosis, although cell size at division was normal (these

are likely to be checkpoint defective). A total of 513 genes were

identified as having phenotypes reflecting cell cycle defects (276 of

which had not previously been known to affect the cell cycle). We also

identified 13 genes which might provide links between metabolism

and the mitotic cell cycle.

These analyses illustrate how functional curation provided by

the MODs is increasingly necessary for the analysis of large-scale

datasets.

5. Tools for curation, community curation and data hosting

Historically (2004 to 2010), fission yeast data were housed in the

GeneDB resource at the Wellcome Trust Sanger Institute (Hertz-Fowler

et al. 2004). GeneDB provided an intuitive interface for

display, querying and downloading information on S. pombe genes. In

essence, it provided sets of curated lists structured around gene

pages. We have continued to develop and extend this model in

PomBase, the dedicated model organism database for the fission

yeast (Wood et al. 2012, McDowall et al. 2014, Lock et al. 2018b).

PomBase integrates the S. pombe genome sequence and features

with genome-wide datasets and detailed, comprehensive gene-

oriented, ontology-driven manual curation of published literature, and

provides tools to interrogate these data.

Many biological databases are domain-specific, tasked with the

curation of a small subset of data types (e.g. nucleic acids, sequence

features, protein family or domain, modification, phenotype, function,

process, component, interactions, gene and protein expression,

orthology, pathways). A MOD resource such as PomBase curates all of

these data types, and more, to provide the most detailed and

comprehensive information possible for an individual species or clade.

Researchers can quickly assess the ways in which different biological

features are connected (or not) by simply browsing and querying the

integrated data.

PomBase organizes data into pages that summarise genes,

publications, ontology terms, and genotypes. Of these, the most

frequently accessed are gene pages. The PomBase website supports

daily data updates as well as fast, efficient querying and smooth

navigation within and between pages. Recently implemented pages

for publications and genotypes provide routes to all data curated from

a single source and to all phenotypes associated with a specific

genotype, respectively. For ontology-based annotations, data displays

balance comprehensive coverage with ease of use. The default

annotation view makes creative use of the ontology structure to

provide a concise, non-redundant summary that can be expanded to

reveal underlying details and metadata. The phenotype annotation

display also offers filtering options to allow users to focus on specific

classes of phenotype (Lock et al. 2018a, Lock et al. 2018b).

Importantly, data are packaged into standardized downloadable

formats by data type for dissemination and re-use.
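
One way to read the "concise, non-redundant summary" described above is that a term is hidden whenever a more specific descendant term is also annotated to the same gene. The sketch below illustrates that idea on a toy ontology. It is an assumption about the general approach, not the PomBase implementation, and the parent map, term names and function names are illustrative only.

```python
def ancestors(term, parents):
    """Return every ancestor of `term` in a DAG given as {child: [parents]}."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def most_specific(annotated_terms, parents):
    """Drop each annotated term that is an ancestor of another annotated term."""
    terms = set(annotated_terms)
    redundant = set()
    for term in terms:
        redundant |= ancestors(term, parents) & terms
    return sorted(terms - redundant)


# Toy ontology fragment (illustrative, not copied from GO).
toy_parents = {
    "mitochondrial translation": ["translation", "mitochondrial gene expression"],
    "translation": ["gene expression"],
    "mitochondrial gene expression": ["gene expression"],
}
gene_annotations = ["gene expression", "translation",
                    "mitochondrial translation", "cytokinesis"]
summary = most_specific(gene_annotations, toy_parents)
print(summary)  # ['cytokinesis', 'mitochondrial translation']
```

Expanding the summary to "reveal underlying details" would then amount to showing the hidden ancestor annotations again on request.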

Canto curation tool

To enable the fission yeast community to contribute directly to

PomBase, we have developed Canto, a web-based tool that enables

the capture of detailed biological information consistently and

accurately, using ontologies (Rutherford et al. 2014). Canto is also

used by professional biocurators and supports literature triage and

curation management tasks; it also records the curator for each

annotation, making attribution possible. The Canto tool can be

configured for use with any species and any of several ontologies, and

can therefore be adapted for diverse uses.
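
To illustrate why recording the curator with each annotation makes attribution possible, the following sketch groups annotation records by curator. The record fields, identifiers and function name are hypothetical and chosen for illustration; they are not Canto's actual data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A hypothetical annotation record (not Canto's schema)."""
    gene: str
    term_id: str
    publication: str
    curator: str


def annotations_by_curator(annotations):
    """Group annotation records by the curator who made them."""
    grouped = defaultdict(list)
    for annotation in annotations:
        grouped[annotation.curator].append(annotation)
    return dict(grouped)


# Placeholder records: all identifiers below are invented for the example.
records = [
    Annotation("geneA", "TERM:0000001", "PUB:1", "community curator 1"),
    Annotation("geneB", "TERM:0000002", "PUB:1", "community curator 1"),
    Annotation("geneA", "TERM:0000003", "PUB:2", "professional curator"),
]
credit = {name: len(items) for name, items in annotations_by_curator(records).items()}
print(credit)
```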

Canto organizes curation at the level of an individual

publication, and provides a simple, intuitive interface that requires no

specialist training. The user is guided step-by-step through the

curation procedure ensuring that all essential and optional data are

captured. Canto also supports the use of annotation extensions with

ontology terms, and prompts for their use in appropriate contexts.

Our observation that the amount of data produced in scientific

laboratories is increasing at a non-linear rate, even for small-scale

publications, has required a reassessment of the way that we capture

and display these data. The need for intuitive curation tools and

displays is of paramount importance to enable interpretation of the

accumulating information.

By developing curation tools that are easy for non-professional

biocurators to use with minimal training, we have opened up a route

that enables the solicitation of high-quality, consistently described

data from the authors of small-scale publications.

Conclusions

The sum total of the classification work described above is pivotal to

many of the past and current activities of the fission yeast research

community, and has enabled fission yeast to become firmly

established as a mature model eukaryotic species in many areas of

basic research. Biocuration adds value to published data and is an

increasingly necessary part of the biological research cycle, due in

part to the increased new knowledge associated with each publication

(both large- and small-scale).

Genome biocuration begins with the identification of accurate

parts lists to provide a solid framework for comparative analyses and

for functional curation. At publication, fission yeast gene prediction

efforts provided the most compact proteome parts list of a free-living

eukaryote. The subsequent manual identification of orthologs

provided a highly robust mechanism for the inference of functional

roles for unstudied proteins by annotation transfer (Wood 2006). At

present 78% of fission yeast proteins have identifiable orthologs in

budding yeast and 69% in human. Proteome-wide ortholog

identification has also provided a framework for the evaluation of

functional similarities and differences due to duplication and loss, and

to compare the breadth and depth of annotations (both proxies for

degree of knowledge) between species. Early analysis revealed that,

relative to budding yeast, a larger proportion of fission yeast

protein-coding genes are present in the genome as a single

copy (one-to-one), and that many proteins lost from S. cerevisiae

were conserved between fission yeast and human. Both of these

observations give fission yeast researchers an advantage for many

functional studies.

The process of functional biocuration adds value to published

data by integrating information from different publications at the level

of the gene product or the data type, both within and between

species. Data integration using the formal semantic representation

provided by ontologies ensures human and computer readability, and

is pivotal to improvements in the data accessibility, interoperability

and reuse lifecycle. Funders increasingly require providers to make

biological data FAIR (Findable, Accessible, Interoperable and

Reusable) (Wilkinson et al. 2016). Although, at present, FAIR

principles apply primarily to high-throughput datasets, functional

biocuration of small-scale data enables FAIR-sharing of published

information from thousands of publications by organizing the data

consistently into a semantically simplified and structured format. The

biocuration process therefore enables small-scale data to be reused in

the interpretation of large-scale datasets as demonstrated by two

analyses assessing different aspects of the phenotypes of single-gene

deletants in genome-wide mutant collections.

As the volume of curated data increases, creative data

interpretation and display are required to provide reliable and

valuable biological information to the end user (who is frequently also

the data provider), in intuitive human-readable formats that support

FAIR principles. My work has developed tools and web interfaces to

support user interrogation of curated data, and for streamlined data

curation, including tools for community curation.

Biocuration improves data accessibility, data reuse in planning

and analysis, and increases data sharing possibilities. Furthermore,

biocuration also adds value and synthesizes new knowledge through

curated connections and these are being leveraged in novel ways. By

integrating curated data fully we are able to identify subsets of genes

involved in specific processes, identify knowledge gaps, and

ultimately to describe emergent properties of biological systems by

the reconstruction of curated biological pathways and networks.

REFERENCES

Allshire, R.C., Ekwall, K. (2015). Epigenetic Regulation of Chromatin

States in Schizosaccharomyces pombe. Cold Spring Harb Perspect

Biol. 7(7):a018770

Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J. (1990)

Basic local alignment search tool. J. Mol. Biol. 215:403-410.

Aravind, L., Watanabe, H., Lipman, D.J., Koonin, E.V. (2000) Lineage-specific

loss and divergence of functionally linked genes in eukaryotes. Proc

Natl Acad Sci U S A. 97(21):11319-24.

Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry,

J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., Harris, M.A.,

Hill, D.P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J.C.,

Richardson, J.E., Ringwald, M., Rubin, G.M., Sherlock, G. (2000) Gene

ontology: tool for the unification of biology. The Gene Ontology

Consortium. Nat Genet. 25(1):25-9.

Aslett, M., Wood, V. (2006). Gene Ontology annotation status of the

fission yeast genome: preliminary coverage approaches 100%. Yeast.

23(13):913-919

Baumann, P., and Cech, T.R. (2001) Pot1, the putative telomere end-

binding protein in fission yeast and humans. Science.

292(5519):1171-5.

Bean, D.M., Heimbach, J., Ficorella, L., Micklem, G., Oliver, S.G.,

Favrin, G. (2014) esyN: network building, sharing and publishing.

PLoS One. 9(9):e106035. eCollection 2014.

Berbee, M.L., Taylor, J.W. (1993). Dating the evolutionary radiations

of the true fungi. Can J Bot 71:1114-1127

Bitton, D., Wood, V., Scutt, P.J., Grallert, A., Yates, T., Smith, D.L.,

Hagan, I.M., Miller, C.J. (2011). Augmented annotation of the

Schizosaccharomyces pombe genome reveals additional genes

required for growth and viability. Genetics. 187(4):1207-17

Bond, M., Holthaus, S-M., Tammen, I., Tear, G., Russell, C. (2013)

Use of model organisms for the study of neuronal ceroid

lipofuscinosis. Biochim Biophys Acta. 1832:1842–65.

Eilbeck, K., Lewis, S.E., Mungall, C.J., Yandell, M., Stein, L., Durbin,

R., Ashburner, M. (2005) The Sequence Ontology: a tool for the

unification of genome annotations. Genome Biol. 6(5):R44.

Ellermeier, C., Schmidt, H., Smith, G.R. (2004). Swi5 acts in meiotic

DNA joint molecule formation in Schizosaccharomyces pombe.

Genetics. 168(4):1891-8

Fair, B.J., Pleiss, J.A. (2017). The power of fission: yeast as a tool for

understanding complex splicing. Curr Genet. 63(3):375-380

Gaudet, P., Livstone, M.S., Lewis, S.E., Thomas, P.D. (2011). Phylogenetic-based

propagation of functional annotations within the Gene Ontology

consortium. Brief Bioinform. 12(5):449-62

Gkoutos, G.V., Mungall, C., Dolken, S., Ashburner, M., Lewis, S.,

Hancock, J., Schofield, P., Kohler, S., Robinson, P.N. (2009)

Entity/quality-based logical definitions for the human skeletal

phenome using PATO. Conf Proc IEEE Eng Med Biol Soc. 2:7069-72.

Goffeau, A., Barrell, B.G., Bussey, H. et al. (1996) Life with 6000

genes. Science. 274:546-567

Harris, M.A., Clark, J., Ireland, A., Lomax, J., Ashburner, M., Foulger,

R., Eilbeck, K., Lewis, S., Marshall, B., Mungall, C., Richter, J., Rubin,

G.M., Blake, J.A., Bult, C., Dolan, M., Drabkin, H., Eppig, J.T., Hill,

D.P., Ni, L., Ringwald, M., Balakrishnan, R., Cherry, J.M., Christie, K.R.,

Costanzo, M.C., Dwight, S.S., Engel, S., Fisk, D.G., Hirschman, J.E., Hong, E.L.,

Nash, R.S., Sethuraman, A., Theesfeld, C.L., Botstein, D., Dolinski, K.,

Feierbach, B., Berardini, T., Mundodi, S., Rhee, S.Y., Apweiler, R.,

Barrell, D., Camon, E., Dimmer, E., Lee, V., Chisholm, R., Gaudet, P.,

Kibbe, W., Kishore, R., Schwarz, E.M., Sternberg, P., Gwinn, M.,

Hannick, L., Wortman, J., Berriman, M., Wood, V., de la Cruz, N.,

Tonellato, P., Jaiswal, P., Seigfried, T., White, R. (2004) The Gene

Ontology (GO) database and informatics resource. Nucleic Acids Res.

32(Database issue):D258-61.

Harris, M.A., Lock, A., Bähler, J., Oliver, S.G., Wood, V. (2013).

FYPO: The fission yeast phenotype ontology. Bioinformatics.

29(13):1671-1678

Hayles, J., Wood, V., Jeffery, L., Hoe, K-L., Kim, D-U., Park, H-O.,

Salas-Pino, S., Heichinger, C., Nurse, P. (2013). A genome-wide

resource of cell cycle and cell shape genes of fission yeast. Open Biol.

3(5):130053

Hertz-Fowler, C., Peacock, C.S., Wood, V., Aslett, M., Kerhornou, A.,

Mooney, P., Tivey, A., Berriman, M., Hall, N., Rutherford, K., Parkhill,

J., Ivens, A.C., Rajandream, M.A., Barrell, B. (2004) GeneDB: a

resource for prokaryotic and eukaryotic organisms. Nucleic Acids Res.

32(Database issue):D339-43

Hoffman, C.S., Wood, V., and Fantes, P. (2015). An Ancient Yeast for

Young Geneticists: A Primer on the Schizosaccharomyces pombe

Model System. Genetics. 201(2):403-423

Huntley, R.P., Harris, M.A., Alam-Faruque, Y., Blake, J.A., Carbon, S.,

Dietze, H., Dimmer, E.C., Foulger, R.E., Hill, D.P., Khodiyar, V.K.,

Lock, A., Lomax, J., Lovering, R.C., Mutowo-Meullenet, P., Sawford,

T., Van Auken, K., Wood, V., Mungall, C.J. (2014). A method for

increasing expressivity of Gene Ontology annotations using a

compositional approach. BMC Bioinformatics. 15:155

Kim, D.U., Hayles, J., Kim, D., Wood, V., et al. (2010). Analysis of a

genome-wide set of gene deletions in the fission yeast

Schizosaccharomyces pombe. Nature Biotechnology 28: 617–623

Leupold, U. (1993). The origin of Schizosaccharomyces pombe

genetics, pp. 125–128 in The Early Days of Yeast Genetics, edited by

M. N. Hall, and P. Linder. Cold Spring Harbor Laboratory Press, Cold

Spring Harbor, NY.

Lock, A., Rutherford, K., Harris, M.A., Wood, V. (2018a). PomBase:

The Scientific Resource for Fission Yeast. In: Kollmar M. (eds)

Eukaryotic Genomic Databases. Methods in Molecular Biology, vol

1757. Humana Press, New York, NY

Lock, A., Rutherford, K., Harris, M.A., Hayles, J., Oliver, S.G., Bähler,

J., Wood, V. (2018b) PomBase 2018: user-driven reimplementation

of the fission yeast database provides rapid and intuitive access to

diverse, interconnected information. Nucleic Acids Res. [Epub ahead

of print]

de Matos, P., Alcántara, R., Dekker, A., Ennis, M., Hastings, J., Haug,

K., Spiteri, I., Turner, S., Steinbeck, C. (2010) Chemical Entities of

Biological Interest: an update. Nucleic Acids Res.

38(Database issue):D249-54.

McDowall, M.D., Harris, M.A., Lock, A., Rutherford, K., Staines, D.M.,

Bähler, J., Kersey, P.J., Oliver, S.G., Wood, V. (2014). PomBase

2015: updates to the fission yeast database. Nucleic Acids Res.

43(Database issue):D656-61

Nurse, P. (2000). A long twentieth century of the cell cycle and

beyond. Cell. 100:71-78

Oliver, S.G., Lock, A., Harris, M.A., Nurse, P., Wood, V. (2016). Model

organism databases: Essential resources that need the support of

both funders and users. BMC Biology. 14(1):49

Rhee, S.Y., Wood, V., Dolinski, K., Draghici, S. (2008) Use

and misuse of the gene ontology annotations. Nat Rev

Genet. 9(7):509-15

Rutherford, K.M., Harris, M.A., Lock, A., Oliver, S.G., Wood, V.

(2014). Canto: An online tool for community literature curation.

Bioinformatics. 30(12):1791-1792

Snell, V., Nurse, P. (1994) Genetic analysis of cell morphogenesis in

fission yeast--a role for casein kinase II in the establishment of

polarized growth. EMBO J. 13(9):2066-74

Smith, B., Ashburner, M., Rosse, C., Bard, J., Bug, W., Ceusters, W.,

Goldberg, L.J., Eilbeck, K., Ireland, A., Mungall, C.J.; OBI Consortium,

Leontis, N., Rocca-Serra, P., Ruttenberg, A., Sansone, S.A.,

Scheuermann, R.H., Shah, N., Whetzel, P.L., Lewis, S. (2007) The

OBO Foundry: coordinated evolution of ontologies to support

biomedical data integration. Nat Biotechnol. 25(11):1251-5.

The Gene Ontology Consortium. (2018) The Gene Ontology Resource:

20 years and still GOing strong. Nucleic Acids Res. 2018 Nov 5. doi:

10.1093/nar/gky1055. [Epub ahead of print]

Toya, M., Sato, M., Haselmann, U., Asakawa, K., Brunner, D., Antony,

C., Toda, T. (2007). Gamma-tubulin complex-mediated anchoring of

spindle microtubules to spindle-pole bodies requires Msd1 in fission

yeast. Nat Cell Biol. 9(6):646-53

Verde, F., Mata, J., Nurse, P. (1995) Fission yeast cell

morphogenesis: identification of new genes and analysis of their role

during the cell cycle. J Cell Biol. 131(6 Pt 1):1529-38

Wiley, D.J., Catanuto, P., Fontanesi, F., Rios, C., Sanchez,

N., Barrientos, A., Verde, F. (2008) Bot1p is required for

mitochondrial translation, respiratory function, and normal cell

morphology in the fission yeast Schizosaccharomyces pombe.

Eukaryot Cell. 7(4):619-29

Wilhelm, B.T., Marguerat, S., Watt, S., Schubert, F., Wood, V.,

Goodhead, I., Penkett, C.J., Rogers, J., Bähler, J. (2008). Dynamic

repertoire of a eukaryotic transcriptome surveyed at single-nucleotide

resolution. Nature. 453(7199):1239-43

Wilkinson, M.D., et al. (2016) The FAIR Guiding Principles for

scientific data management and stewardship. Sci Data. 3:160018

Wolfe, K.H., Shields, D.C. (1997) Molecular evidence for an ancient

duplication of the entire yeast genome. Nature.

387(6634):708-13.

Wood, V., Rutherford, K.M., Ivens, A., Rajandream, M.-A., Barrell, B.

(2001). A re-annotation of the Saccharomyces cerevisiae genome.

Comparative and Functional Genomics. 2(3):143-154

Wood, V., et al. (2002). The genome sequence of

Schizosaccharomyces pombe. Nature. 415(6874):871-80

Wood, V. (2006). Schizosaccharomyces pombe comparative

genomics; from sequence to systems. pp. 233-285 in Topics in

Current Genetics, edited by P. Sunnerhagen, and J. Piskur

Wood, V., Harris, M.A., McDowall, M.D., Rutherford, K., Vaughan,

B.W., Staines, D.M., Aslett, M., Lock, A., Bähler, J., Kersey, P.J.,

Oliver, S.G. (2012) PomBase: A comprehensive online resource for

fission yeast. Nucleic Acids Res. 40(D1):D559-D564.

Wood, V., Lock, A., Harris, M.A., Rutherford, K., Bähler, J., Oliver,

S.G. Hidden in plain sight: What remains to be discovered in the

eukaryotic proteome? doi: https://doi.org/10.1101/469569

Yada, T., Sugiura, R., Kita, A., Itoh, Y., Lu, Y., Hong, Y., Kinoshita, T.,

Shuntoh, H., Kuno, T. (2001) Its8, a fission yeast homolog of Mcd4

and Pig-n, is involved in GPI anchor synthesis and shares an essential

function with calcineurin in cytokinesis.

J Biol Chem. 276(17):13579-86.

Submitted Publication

1. The genome sequence of Schizosaccharomyces pombe.

!"#$%& ' ()* +,- ' ., /&0%$"%1 .22. ' 333456789:4;<= !"#

$%&'()*+

!"# $#%&'# (#)*#%+# &,!"#$%&'(""#()&*+",' -&*.,-. /&&012 3. 4567768'12 9.:;. 38<8%0=#8'12 9. >?%#12 3. >?%#12 ;. @A#58=AB2 C. @$&*=&(B2 D. E#8AF2 C. G8?7#(F2 @. H8I#=12 J. H8("8'12@. H&5'8%12 K. H=&&I(12 J. H=&5%12 @. H=&5%12 !. L"6776%$5&=A"12 L. L"*=+"#=12 9. L&776%(12 3. L&%%&=12 ;. L=&%6%12 E. J8M6(12 !. N#7A5#7712;. N=8(#=12 @. 4#%A7#(12 ;. 4&O7#12 D. G8'76%12 J. G8==6(12 C. G6087$&12 4. G&0$(&%12 @. G&7=&?012 !. G&=%(O?12 @. G&58=A"12 P. C. G*+I7#12@. G*%A12 K. C8$#7(12 K. C8'#(12 >. C&%#(12 9. C&%#(12 @. >#8A"#=12 @. 9+J&%87012 C. 9+>#8%12 E. 9&&%#?12 @. 9&*7#12 K. 9*%$87712>. 9*=Q"?12 J. D6O7#AA12 L. R0#7712 K. R76M#=12 @. RSD#6712 J. E#8=(&%12 9. ;. T*86712 P. 38OO6%&56A(+"12 K. 3*A"#=,&=012 @. 3*AA#=12J. @8*%0#=(12 K. @##$#=12 @. @"8=Q12 C. @I#7A&%12 9. @6''&%0(12 3. @)*8=#(12 @. @)*8=#(12 K. @A#M#%(12 K. !8?7&=12 3. 4. !8?7&=12;. !6M#?12 @. /87("12 !. /8==#%12 @. /"6A#"#8012 C. /&&058=012 4. -&7+I8#=AU2 3. ;#=AU2 C. 3&OO#%U2 H. 4=?'&%Q=#VU2 W. /#7A<#%(U2P. -8%(A=##7(U2 9. 36#$#=X2 9. @+"8Y ,#=X2 @. 9*Y 77#=:;*#=X2 L. 48O#7X2 9. N*+"(X2 L. N=6AV+Z2 P. G&7V#=Z2 J. 9&#(A7Z2 G. G67O#=AZ2 K. H&=V?'[2W. >8%$#=[2 ;. H#+I[2 G. >#"=8+"[2 3. 3#6%"8=0A[2 !. 9. E&"7\2 E. P$#=\2 /. ]6''#='8%%^2 G. /#07#=^2 3. /8'O*AA^2 H. E*=%#77#1_2;. 4&,,#8*1_2 P. L806#*112 @. J=#8%&112 @. 47&*a112 -. >#78*=#112 @. 9&AA6#=112 N. 4876O#=A112 @. C. ;M#(1B2 ]. b68%$1B2 L. G*%A1B2 K. 9&&=#1B2@. 9. G*=(A1B2 9. >*+8(1F2 9. 3&+"#A1F2 L. 486778=06%1F2 -. ;. !8778081U21X2 ;. 48=V&%1U21X2 4. !"&0#1U2 3. 3. J8$81U21X2 >. L=*V80&1U2C. C6'#%#V1U21X2 9. @8%+"#V1Z2 N. 0#7 3#?1Z2 C. H#%6A&1Z2 ;. J&'c%$*#V1Z2 C. >. 3#M*#7A81Z2 @. 9&=#%&1Z2 C. ;='(A=&%$1[2 @. >. N&=(O*=$1\2>. L#==*AA612 !. >&5#1^2 /. 3. 9+L&'O6#B_2 W. E8*7(#%B12 C. E&A8("I6%BB2 4. -. @"Q8I&M(I6BF2 J. d((#=?BU2 H. 4. H8==#771 e E. D*=(#F

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

!" #$%" &"'(")*"+ $)+ $)),-$-"+ -#" ."),/" ,0 1&&2,) 3"$&- 4!"#$%&'(""#()&*+",' -&*.,56 7#2*# *,)-$2)& -#" &/$88"&-)(/9": ,0 ;:,-"2)<*,+2). .")"& 3"- :"*,:+"+ 0,: $ "(=$:3,-"> ?6@A?B C#" *")-:,/":"& $:" 9"-7"") DE $)+ FFG =28,9$&"& 4=95 $)+*,)-$2) :"8$-"+ :";"$-& 2)*8(+2). $ #2.#83 *,)&":%"+ FB@<=9 "8"/")-B H".2,)& (;&-:"$/ ,0 .")"& $:" 8,).": -#$) 2) 9(++2). 3"$&-4!(""#()&*+",' ",),/$'$(,56 ;,&&2983 :"I"*-2). /,:"<"J-")+"+ *,)-:,8 :".2,)&B K,/" ?DL ,0 -#" .")"& *,)-$2) 2)-:,)&6 ,0 7#2*#-#":" $:" ?6MDGB N20-3 .")"& #$%" &2.)21*$)- &2/28$:2-3 72-# #(/$) +2&"$&" .")"&O #$80 ,0 -#"&" $:" *$)*": :"8$-"+B !" 2+")-203#2.#83 *,)&":%"+ .")"& 2/;,:-$)- 0,: "(=$:3,-2* *"88 ,:.$)2P$-2,) 2)*8(+2). -#,&" :"'(2:"+ 0,: -#" *3-,&="8"-,)6*,/;$:-/")-$-2,)6 *"88<*3*8" *,)-:,86 ;:,-",83&2&6 ;:,-"2) ;#,&;#,:38$-2,) $)+ HQR &;82*2).B C#"&" .")"& /$3 #$%" ,:2.2)$-"+72-# -#" $;;"$:$)*" ,0 "(=$:3,-2* 820"B N"7 &2/28$:83 *,)&":%"+ .")"& -#$- $:" 2/;,:-$)- 0,: /(8-2*"88(8$: ,:.$)2P$-2,) 7":"2+")-21"+6 &(.."&-2). -#$- -#" -:$)&2-2,) 0:,/ ;:,=$:3,-"& -, "(=$:3,-"& :"'(2:"+ /,:" )"7 .")"& -#$) +2+ -#" -:$)&2-2,) 0:,/()2*"88(8$: -, /(8-2*"88(8$: ,:.$)2P$-2,)B

>: 9:?<97 @:9: 7@: ;<=?A:7B<5 <C 7@: C8AAD 655<767:E F:5<=:G:H8:5;: <C 7@: GB=?A: :8I69D<7: !"#$%&'(""#()&*+",' -&*.,J 6KGGB<5 D:6G74 L7 M:;<=:G 7@: GBN7@ :8I69D<7B; F:5<=: 7< M:G:H8:5;:EJ C<AA<3B5F !(""#()&*+",' ",),/$'$(,,J 0(,1&)#(.2$3$',4,5(1'.J 6)&'&-#$4( *,4(1&5('3,)OJ 7)(.$2&-'$' 3#(4$(1(+ 65E8&*& '(-$,1'-JP4 #@: :57B9: G:H8:5;: <C 7@: 85BH8: 9:FB<5G <C 7@:7@9:: ;@9<=<G<=:G BG ;<=?A:7:J 3B7@ F6?G B5 7@: ;:579<=:9B;9:FB<5G <C 6M<87 +2 IMJ 65E 6M<87 .P2 IM B5 7@: 7:A<=:9B; 9:FB<5G4#@: ;<=?A:7B<5 <C 7@BG G:H8:5;:J 7@: 6Q6BA6MBAB7D <C G<?@BG7B;67:E9:G:69;@=:7@<E<A<FB:GJ 65E 7@: :N?65EB5F ;<==85B7D 3<9IB5F <5!9 -&*.,J 3BAA 6;;:A:967: 7@: 8G: <C !9 -&*., C<9 C85;7B<56A 65E;<=?6967BQ: G78EB:G <C :8I69D<7B; ;:AA ?9<;:GG:G4!"#$%&'(""#()&*+",' -&*., BG 6 GB5FA:R;:AA:E C9:: ABQB5F 69;@B6GR

;<=D;:7: C85F8G G@69B5F =65D C:6789:G 3B7@ ;:AAG <C =<9: ;<=?ABR;67:E :8I69D<7:G4 /9<= F:5: G:H8:5;: ;<=?69BG<5G 65E?@DA<F:5:7B; 656ADG:GJ B7 @6G M::5 G8FF:G7:E 7@67 KGGB<5 D:6G7EBQ:9F:E C9<= M8EEB5F D:6G7 69<85E OO2S+.2 =BAAB<5 D:69G TUD9V6F<J 65E C9<= U:76W<6 65E ?A657G 69<85E ,J222S,J.22UD9 6F<XJ6A7@<8F@ 6 =<9: 9:;:57 :G7B=67: @6G ?87 7@:G: 7B=:G 67 ,J,++ 65E,JP22UD9J 9:G?:;7BQ:ADY4 Z<=: F:5: G:H8:5;:G 69: 6G :H86AAD

EBQ:9F:E M:73::5 7@: 73< D:6G7G 6G 7@:D 69: C9<= 7@:B9 @8=65@<=<A<F8:GJ ?9<M6MAD 9:[:;7B5F 6 =<9: 96?BE :Q<A87B<5 3B7@B5C85F6A AB5:6F:G 7@65 B5 7@: U:76W<64 !9 -&*., 36G K9G7 E:G;9BM:E B57@: ,Y\2G 65E @6G M::5 :N7:5GBQ:AD G78EB:E GB5;: 7@: ,\-2G\J,2J9:G8A7B5F B5 7@: ;@696;7:9BW67B<5 <C 69<85E ,J.22 F:5:G T@77?]^^3334F:5:EM4<9F^?<=M:V4 #@: :6G: 3B7@ 3@B;@ B7 ;65 M: F:5:7B;6AAD=65B?8A67:E BG G:;<5E <5AD 7< !9 ",),/$'$(, 6=<5F :8I69D<7:G 65E B7@6G G:9Q:E 6G 65 :N;:AA:57 =<E:A <9F65BG= C<9 7@: G78ED <C ;:AAR;D;A:;<579<AJ =B7<GBG 65E =:B<GBG,,J _!" 9:?6B9 65E 9:;<=MB567B<5,.J65E 7@: ;@:;I?<B57 ;<579<AG B=?<97657 C<9 F:5<=: G76MBAB7D,O4#@: ,O4YRUM F:5<=: <C !9 -&*., BG EBG79BM87:E M:73::5 ;@9<R

=<G<=:G L T-4XUMVJ LL T+4PUMV 65E LLL TO4-UMV,+J 7<F:7@:9 3B7@ 6.2RIM =B7<;@<5E9B6A F:5<=:,-4 #65E:= 6996DG <C ,22S,.2 9:?:67G<C 6 ,24+RIM C96F=:57 ;<576B5B5F 7@: -4YZJ ,YZ 65E .-Z 9BM<G<=6A%!" F:5:G 6;;<857 C<9 69<85E ,4,UM,P4 #@: 7@9:: ;:579<=:9:G 69:O-J P- 65E ,,2 IM A<5F C<9 ;@9<=<G<=:G LJ LL 65E LLLJ 9:G?:;7BQ:ADJ7<76AAB5F 24.UM4 #@BG A:6Q:G 6M<87 ,.4-UM <C 85BH8: G:H8:5;:JGB=BA69 B5 GBW: 7< 7@67 <C !9 ",),/$'$(,J 65E G8MG7657B6AAD G=6AA:9 7@657@<G: <C 7@: 7@9:: <7@:9 G:H8:5;:E =<E:A :8I69D<7:GJ 09 ,4,5(1'T\XUMVJ 7)(.$2&-'$' T,.-UMV 65E 6)&'&-#$4( T,OXUMV4 "AA <C 7@:

,#@: >:AA;<=: #98G7 Z65F:9 L5G7B787:J #@: >:AA;<=: #98G7 `:5<=: a6=?8GJ bB5N7<5J a6=M9BEF:a0,2 ,Z"J $c4 .a65;:9 %:G:69;@ $c *<5E<5 %:G:69;@ L5G7B787:J a<=?8767B<56A `:5<=: "56ADGBG*6M<967<9DJ ++ *B5;<A5dG L55 /B:AEGJ *<5E<5 >a." OefJ $c4 Oa65;:9 %:G:69;@ $c *<5E<5 %:G:69;@L5G7B787:J a:AA aD;A: *6M<967<9DJ ++ *B5;<A5dG L55 /B:AEGJ *<5E<5J >a." OefJ $c4 +c67@<AB:I:$5BQ:9GB7:B7 *:8Q:5J /6;8A7D <C "F9B;8A7896A 65E "??AB:E 0B<A<FB;6A Z;B:5;:GJ *6M<967<9D <C `:5:#:;@5<A<FDJ c69EB566A U:9;B:9A665 \. 0A<I /J 0RO22, *:8Q:5J 0:AFB8=4 -`:5<7D?: `=MbJ U<A:;8A690B<A<FD 65E 0B<7:;@ %:G:69;@J "5F:A@<C3:F O\J _RP\.-\>BA@:A=GC:AEJ `:9=65D4 PgL"`&! `=MbJ U6N(<A=:9 Z794 +J _R+2X.+ bBAE:5J `:9=65D4 XU6NReA65;IRL5G7B787 C8h9 =<A:I8A69: `:5:7BIJ L@5:G796GG: XOJ_R,+,\- 0:9AB5J `:9=65D4 Y`"#a 0B<7:;@ "`J i6I<MRZ76EA:9ReA67W XJ _RXY+PX c<5G765WJ `:9=65D4\"`)>" `=MbJ `AB:5B;I:9 >:F ,Y-J _R,.+Y\ 0:9AB5J `:9=65D4 ,2$5BQ:9GB7:j E: *<8Q6B5J $5B7: E:0B<;@B=B: e@DGB<A<FBH8:J eA6;: a9<BN E8 Z8E .R.2J 0,O+Y *<8Q6B5RA6R!:8Q:J 0:AFB8=4 ,,$U% P2P,a!%Z `:5:7BH8: :7 E:Q:A<??:=:57J /6;8A7:j E: U:jE:;B5:J . 6Q:58: E8 e9<C:GG:89 *:j<5 0:9569EJ /RO-2+O%:55:G a:E:NJ /965;:4 ,.$5BQ:9GB7D <C &N:7:9J Z;@<<A <C 0B<A<FB;6A Z;B:5;:GJ >6G@B5F7<5 ZB5F:9 *6M<96R7<9B:GJ e:99D %<6EJ &N:7:9 &f+ +g`J $c4 ,O`:j5:j7BH8: U<A:j;8A6B9: :7 a:AA8A6B9:J a!%Z $%",\.- L!%"

$U%.,PJ L5G7B787 !67B<56A "F9<5<=BH8: e69BGR`9BF5<5J XYY-2 #@BQ:9Q6A `9BF5<5J /965;:4 ,+_:?6976R=:57< E: `:5:7B;6J /6;8A76E E: aB:5;B6GJ $5BQ:9GBE6E E: U6A6F6J Z?6B54 ,-*6M<967<9B< "5E6A8W E:0B<A<FB6J $5BQ:9GBE6E e6MA< E: )A6QBE:J Z:QBAA6J Z?6B54 ,PL5G7B787< E: UB;9<MB<A<Fkj6 D 0B<H8kj=B;6J_:?6976=:57< E: UB;9<MB<A<Fkj6 D `:5:j7B;6J aZLa^$5BQ:9GBE6E E: Z6A6=65;6J &EBK;B< _:?6976=:576AJa6=?8G UBF8:A E: $56=85<J OX22X Z6A6=65;6J Z?6B54 ,X$5BQ:9GB7D <C Z8GG:NJ /6A=:9J 09BF@7<5 0!,\g`J $c4 ,YU<A:;8A69 l a:AA 0B<A<FD *6M<967<9DJ Z6AI L5G7B787: C<9 0B<A<FB;6A Z78EB:GJ ,22,2 !<97@#<99:D eB5:G %<6EJ *6 i<AA6J a6ABC<95B6 \.2OXR,2\\J $Z"4 ,\Z765C<9E $5BQ:9GB7DJ Z765C<9E $5BQ:9GB7DZ;@<<A <C U:EB;B5:J _:?697=:57 <C `:5:7B;GJ aaZ% %<<= ..--MJ .P\ a6=?8G _9BQ:J Z765C<9EJ a6ABC<95B6\+O2-J $Z"4 .2a<AE Z?9B5F b69M<9 *6M<967<9DJ e) 0<N ,22J , 085F7<35 %<6EJ a<AE Z?9B5F b69M<9J !:31<9I ,,X.+J $Z"4 .,#L`%J \X,. U:EB;6A a:57:9 _9BQ:J %<;IQBAA:J U69DA65E .2Y-2J $Z"4 ..#@: a@B;6F<U:EB;6A Z;@<<AJ OOOO `9::5 06D %<6EJ !<97@ a@B;6F<J LAAB5<BG P22P+J $Z"4 .OZ@:=D6IB5R)Q;@B55BI<QL5G7B787: <C 0B<<9F65B; a@:=BG79DJ %8GGB65 ";6E:=D <C Z;B:5;:GJ $A4 UBIA8I@<RU6IA6D6 ,P^,2J ,,X\\XU<G;<3J %8GGB64 .+a:57:9 C<9 0B<A<FB;6A Z:H8:5;: "56ADGBGJ 0B<a:5798=R_#$J #@: #:;@5B;6A $5BQ:9GB7D<C _:5=69IJ 08BAEB5F .2YJ _cR.Y22 cFG4 *D5FMDJ _:5=69I4

© 2002 Macmillan Magazines Ltd

unique sequence and most of the three centromeres of the Urs Leupold 972 h⁻ strain⁹ have been sequenced by the Wellcome Trust Sanger Institute and the 13 other laboratories that make up the S. pombe European Sequencing Consortium (EUPOM), together with 100 kb of sequence generated by the Cold Spring Harbor Laboratory (GenBank accession numbers AL355920, AL355921, AL391034 and AL391016). Here, we present and discuss the genome sequence and composition, and carry out an initial overview of gene function, making comparisons with other eukaryotic organisms, particularly S. cerevisiae.

Mapping, sequencing and sequence analysis

A clone map was generated by the integration of the two pre-existing maps¹⁷,¹⁸. End sequencing and restriction digestion of cosmids were used to construct a minimal tile path for sequencing. Problems with the earlier maps included the existence of chimaeric clones, mismapped cosmids, bacterial insertion elements and unfilled gaps. Small gaps were covered using a long-range polymerase chain reaction (PCR) strategy, plasmid libraries, and a bacterial artificial chromosome (BAC) library provided clones for gap closure across regions not represented in the cosmid libraries. The final 12.5-Mb sequence of the S. pombe genome is a composite of 452 cosmids, 22 plasmids, 15 BAC clones and 13 PCR products.

Most sequencing was performed using random sequencing of sub-cloned DNA followed by directed sequencing¹⁹. DNA from clones was shattered (usually by sonication) and fragments of 1.4–2 kb were cloned, typically, into M13 or pUC18. Random sub-clones were sequenced with dye-terminator chemistry and analysed on automated sequencers. Most laboratories used Phred software for sequence base calling and Phrap or Gap4 for contig assembly²⁰. Gaps and low-quality regions of the sequence were resolved using primer walking, PCR and re-sequencing clones, under conditions that gave increased read lengths. Some laboratories also used direct blotting procedures, classical radioactive sequencing and nested deletions. All sequences were finished to a high degree of accuracy, with at least two high-quality reads on each strand, or, if this could not be accomplished, an additional read on the same strand using an alternative chemistry. The depth of coverage was on average eight-fold. Sequences were collected centrally at the Wellcome Trust Sanger Institute, where the quality was examined by comparison of overlapping regions and by checking for frameshifts in coding regions. The sequencing error rate was less than 1 in 180,000 base pairs (bp), calculated from the number of single-base differences observed in overlapping sequences from different sources. All identified sequencing errors have been resolved with the exception of four single-base differences found in homopolymeric tracts located outside coding regions, possibly generated by slippage during DNA replication.

Gene prediction was carried out with GENEFINDER (P. Green and L. Hillier, unpublished software) trained on experimentally confirmed S. pombe genes to recognize intronic and coding regions. Additional information was provided using a Hidden Markov Model trained on intron sequences using HMMER (http://hmmer.wustl.edu/hmmer-html/). Searches were performed against public databases (SWISS-PROT and TrEMBL²¹, EMBL²² and Pfam²³), using BLAST²⁴, MSPcrunch²⁵, FASTA²⁶ and Genewise²⁷. The predictions were refined manually within the Artemis analysis and annotation tool²⁸ using protein homology and expressed sequence tag (EST) data²⁹. Because most S. pombe genes have a prospective homologue in other organisms, putative functions were assigned on the basis of similarities to known genes, using the SWISS-PROT²¹, Pfam²³, Proteome³⁰, SGD³¹ and MIPS databases³². Identification of transfer RNA was carried out using the tRNAscan-SE software³³.

Prediction of genes in fission yeast is a problem of intermediate complexity. It is more difficult than the analysis of tightly packed genomes that have little or no splicing, as found in prokaryotes and budding yeast, but less difficult than gene prediction in multicellular eukaryotes, which have lower gene density, high levels of splicing, and long introns. There are 4,730 confirmed and predicted introns in S. pombe, many more than the 272 now predicted for S. cerevisiae. S. pombe introns average only 81 nucleotides in length and so are shorter and easier to predict than those found in Metazoa and plants. Of the 4,730 introns in S. pombe, 638 have been confirmed experimentally by messenger RNA and EST data²⁹, and many more by homology.

Genome content

We predicted a maximum of 4,940 protein-coding genes (including 11 mitochondrial genes) and 33 pseudogenes. The three gene maps showing these predictions can be viewed at ftp://ftp.sanger.ac.uk/pub/yeast/pombe/GeneMaps/. All open reading frames (ORFs) over 100 amino acids with an initiator methionine and not overlapping with other known genes are included in this set. Also included are 147 confirmed or predicted protein-coding sequences of 25–99 amino acids. Any remaining undiscovered genes are likely to have either a highly spliced structure with small exons, or to be smaller than 100 amino acids. There are a further 116 questionable proteins considered less likely to be coding because they are small, have no detectable homologies, and display low coding potential. Removal of these questionable genes reduces the predicted gene complement from 4,940 to 4,824.

Even our upper estimate of 4,940 genes for S. pombe is substantially less than the 5,570–5,651 genes predicted for S. cerevisiae³⁴,³⁵, the 6,752 genes predicted for Mesorhizobium loti, the largest published prokaryote genome sequence to date³⁶, and the 7,825 genes estimated in the 8.67-Mb genome of the prokaryote Streptomyces coelicolor (J. Parkhill and S. Bentley, personal communication). We conclude that a free-living eukaryotic cell can be constructed with fewer than 5,000 genes, and that the distinction between eukaryotic and prokaryotic cell organization is not determined simply by total number of genes but depends on the types of genes present and how they interact with each other and the environment. Comparing the genome content of species at different levels of organization, it seems that fewer than 500 genes are sufficient to generate a parasitic prokaryotic cell such as Mycoplasma genitalium³⁷, about 1,500 genes for a free-living prokaryotic cell such as Aquifex aeolicus³⁸, 5,000 genes for a free-living eukaryotic cell (S. cerevisiae and S. pombe; ref. 39 and this paper), and around 15,000 genes for multicellular eukaryotic organisms such as Drosophila and C. elegans²,³, whereas 30,000–40,000 genes gives rise to human consciousness⁵,⁶.

Gene density is similar for chromosomes I and II, with one gene every 2,483 and 2,457 bp respectively, but is less dense for chromosome III, at one gene every 2,790 bp. This is not due to differences in the average length of the genes, which are similar (1,407–1,446 bp) for all three chromosomes (Table 1). Protein-coding genes are absent from the centromeres, although tRNA genes are found in these regions. Gene density is also lower at the telomeres. The gene density for the complete genome is one gene every 2,528 bp, compared with one gene every 2,088 bp for S. cerevisiae. The protein-coding sequence is predicted to occupy 60.2% (57% excluding introns) of the sequenced portion of the S. pombe genome, compared with 71% in S. cerevisiae (70.5% excluding introns). The overall guanine and cytosine (GC) content is 36.0%, compared with 38.3% in S. cerevisiae, and for the protein-coding portion is identical in the two yeasts at 39.6%.

We have identified a total of 174 tRNAs, 45 of which have introns; all the tRNA families needed to decode all codons are present. The spliceosomal RNAs (U1–U6) are found together with 16 small nuclear RNA genes (snRNAs) and 33 small nucleolar RNAs (snoRNAs). These are dispersed mostly as singletons throughout the genome. The 5.8S, 18S and 26S ribosomal RNA genes are grouped

NATURE | VOL 415 | 21 FEBRUARY 2002 | www.nature.com

together as 100–120 tandem repeats in two arrays on chromosome III⁴⁰, but the thirty 5S ribosomal RNA genes are distributed throughout the genome⁴¹, providing opportunities for unequal crossing over when they are in tandem orientation and close proximity. This can lead to local duplications and deletions of genes located between the 5S RNA genes⁴². There are 11 intact transposable elements (Tf2 type) (Table 1), accounting for 0.35% of the genome. This is significantly less than the 2.4% (59 elements) found in S. cerevisiae⁴³ and the 10% found in Arabidopsis⁴, and is also likely to be much less than the numbers in Drosophila and humans⁴⁴,⁴⁵. There are 25 wtf elements ('with tf1- or tf2-type' long terminal repeats, LTRs), which appear to be spliced membrane proteins of S. pombe. These elements are often flanked by LTRs, and so may have been duplicated by retrotransposition. There are also 180 solo LTRs, marking former transposition events, compared with 268 found in S. cerevisiae. The density of transposable element remnants on chromosome III of S. pombe is twice that of chromosomes I and II (Table 1).

We examined 73 genetically and physically mapped genes from the three gene maps; comparison of these maps shows that they are essentially co-linear and that the level of recombination is similar throughout the three chromosomes. More detailed comparisons of the genetic and physical maps may reveal subtle variations in recombination around centromeres, telomeres, the mating-type locus, and sites of meiotic DNA double-strand breaks. Several inconsistencies in the genetic maps were identified, including the reversal of a chromosome II fragment near the telomere between

3%.4 '-. (.-5 F&$AD 6SG7 !%$ &$<"4'!1"- "A #634 '-. 7$$4 A&"/ !%$!$<"/$&$ &$#1"- !" !%$ 4$-!&"/$&$ &$#1"- "A 4%&"/"("/$ 5557 '-.4%'-#$( 1- 0"(1!1"- "A 18(4 '-. 3-.4D

Centromere structures

The outline structure of the centromeres has previously been deduced by Southern blotting and by sequencing about 14% of the centromere repeat regions⁴⁷⁻⁴⁹. Here, we sequenced most (81%) of the three centromeres; this has allowed schematic maps of the centromeres to be verified (Fig. 1). The nomenclature used follows that of the Yanagida group⁵⁰,⁵¹; however, other designations of the centromere elements have been used⁵². The most complete sequence is for centromere 1, which is the shortest at 35 kb and is missing only one 2.5-kb fragment. This centromere consists of a central core (cnt1) of 4.1 kb and 28% GC content, flanked by two 5.6-kb imperfect imr1 repeats (imr1L, imr1R) with 29% GC content, and two pairs of 4.4-kb dg and 4.8-kb dh repeats (dg1, dh1) of 33–34% GC content. A repeat of around 0.3 kb, known as cen253 (EMBL […]13757), is found adjacent to the dh repeats. The maps of the other two centromeres have the same basic structure with central cnt regions flanked by imr repeats and by variable numbers of dg and dh repeats separated by cen253. Cnt1, -2 and -3 share 48% identity over a 1,405-bp region, and dh1, -2 and -3 share 48% identity over a 1,811-bp region. However, the most striking conservation is observed in the dg regions, which share 97% identity over a 1,780-bp region. This highly conserved segment represents an element that is essential for centromere function; deletion of this


Table 1 Genome content for the three chromosomes

                Length (bp)  No. of genes  No. of Tf2s  No. of pseudo Tf2s  No. of wtfs  No. of lone LTRs  No. of pseudogenes  Mean gene length (bp)*  Gene density†  Coding (%)
Chromosome 1    5,598,923    2,255         8            0                   1            77                17                  1,446                   2,483          58.6
Chromosome 2    4,397,795    1,790         2            1                   1            53                9                   1,411                   2,457          57.5
Chromosome 3    2,465,919    884           1            2                   23           50                7                   1,407                   2,790          54.5
Whole genome    12,462,637   4,929         11           3                   25           180               33                  1,426                   2,528          57.5

*Mean gene length excluding introns.
†Gene density, given as average bp per gene.
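The gene-density figures quoted for the three chromosomes (one gene every 2,483, 2,457 and 2,790 bp, and 2,528 bp for the whole genome) can be reproduced from the chromosome lengths and gene counts in Table 1. The following is a minimal sketch of that arithmetic only; the function and variable names are illustrative assumptions, not part of the original analysis.

```python
# Chromosome lengths (bp) and gene counts as listed in Table 1.
CHROMOSOMES = {
    "Chromosome 1": (5_598_923, 2_255),
    "Chromosome 2": (4_397_795, 1_790),
    "Chromosome 3": (2_465_919, 884),
}


def gene_density(length_bp: int, n_genes: int) -> int:
    """Average number of base pairs per gene, rounded to the nearest bp."""
    return round(length_bp / n_genes)


# Per-chromosome densities.
densities = {name: gene_density(*values) for name, values in CHROMOSOMES.items()}

# Whole-genome density from the summed lengths and gene counts.
total_length = sum(length for length, _ in CHROMOSOMES.values())
total_genes = sum(genes for _, genes in CHROMOSOMES.values())
genome_density = gene_density(total_length, total_genes)

for name, density in densities.items():
    print(f"{name}: one gene every {density:,} bp")
print(f"Whole genome: one gene every {genome_density:,} bp")
```

The summed length (12,462,637 bp) and gene count (4,929) also match the whole-genome row of the table, which is a useful consistency check on the per-chromosome entries.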

Figure 1 Schematic maps of the three S. pombe centromeres showing the repeated elements. The key is given at the bottom of the figure and the relevant clones are indicated under each centromere map. The maps are not drawn to scale.

[Figure not reproduced. Centromere labels: 1 (35 kb), 2 (65 kb), 3 (110 kb). Annotations: 1 partial repeat unit missing; 1 repeat unit missing; 7 repeat units missing. Key: imrL, cnt, dg, dh, imrR, cen253, 3 or more tRNA genes, single tRNA gene. Clones: p7G5, c1856, p4C9, c633, c28F2, pB36C4, pYC111, pYC116, c21B10, pJ5566, c1259, pB5A12, c4B3, c1676, c233.]


region from the dg repeat, termed the K-type repeat by the Clarke group, results in a complete loss of centromere activity in both mitosis and meiosis⁵³. There must be a special mechanism to maintain such a high level of sequence conservation between the different centromeres.

The total calculated lengths of centromeres 1, 2 and 3 are respectively 35, 65 and 110 kb, inversely proportional to the lengths of the chromosomes at 5.7, 4.6 and 3.5 Mb. Possibly more extended centromeric regions are required for proper mitotic and meiotic behaviour when the chromosome arms are shorter. As noted above there are no protein-coding genes in the centromeric region but there are many tRNA genes (Fig. 1). tRNA clusters flank centromeres 2 and 3 and are also found within the imr regions of all three centromeres⁵⁰. These tRNA genes might contribute to centromere function by defining domain boundaries important for centromere activity⁵⁴.

The S. pombe centromeres are considerably longer than their S. cerevisiae equivalents, which contain a core region sufficient for centromere activity of only 125 bp⁵⁵,⁵⁶ and a nuclease-protected region of 150–160 bp including the 125-bp conserved core⁵⁷. It is not clear why S. pombe centromeres are 300–1,000 times larger than their S. cerevisiae equivalents, but one possibility is that their kinetochore structures are different.

Intergene regions

The total intergene length distributions for S. pombe and S. cerevisiae are shown in Fig. 2. The length is calculated from the stop codon to the next start codon for tandemly oriented genes, from the start codon to the start codon for divergently oriented genes, and from the stop codon to the stop codon for convergently oriented genes. Intergenic regions in S. pombe have a mode of 423 bp and a mean of 952 bp, both longer than the equivalent values for S. cerevisiae (200 and 515 bp respectively). Analysis of the divergent intergene regions reveals that pairs of upstream regions range in length from 200 to 2,100 bp, with a peak between 200 and 1,200 bp (Fig. 2). This is longer than the equivalent distributions in S. cerevisiae, which range from 200 to 900 bp, with a peak from 200 to 700 bp (Fig. 2). Analysis of convergent intergene regions shows a peak in length for pairs of downstream regions of 200–800 bp for S. pombe and 100–500 bp for S. cerevisiae (Fig. 2). Therefore there is a smaller difference between the two yeasts for the intergenic regions between convergent genes (downstream regions) than for those between the divergent genes (upstream regions).

Several explanations can account for these results. The 5′ mRNA regions may be systematically longer in S. pombe than in S. cerevisiae, although there is no evidence for this. For example, the spacing between the TATA-box region and the transcriptional start in S. pombe is shorter than that in S. cerevisiae⁵⁸,⁵⁹. Alternatively, the promoter regions may be of greater complexity in S. pombe and therefore longer. Again there is no direct evidence to support this view, but there are other examples of more-extended organization of chromatin elements in S. pombe, including larger centromeres and regions of DNA replication origin⁶⁰. The existence of truly intergenic spacer regions in S. pombe is supported by the identification of several 4–8-kb extended gene-free regions, which fall outside the broad distribution of lengths associated with average intergenic regions. These are low-complexity sequences with a (G − C)/(G + C) strand switch⁶¹. There are about ten gene-free regions per chromosome, which are usually flanked by tandemly oriented genes. One of these gene-free regions, between SPAC4G8.03c and SPAC4G8.04, corresponds to a prominent meiotic DNA break site or cluster of sites (J. A. Young, R. W. Schreckhise and G. R. Smith, manuscript in preparation).
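The intergene-length definition used in this section (stop codon to next start codon for tandem genes, start to start for divergent genes, stop to stop for convergent genes) can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline used for the paper: the `Gene` record, the coordinate convention (1-based, inclusive coding coordinates) and the example genes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # leftmost coding coordinate (1-based, inclusive)
    end: int     # rightmost coding coordinate (1-based, inclusive)
    strand: str  # "+" or "-"


def classify_pair(left: Gene, right: Gene) -> str:
    """Orientation of two neighbouring genes, `left` lying upstream on the + strand."""
    if left.strand == right.strand:
        return "tandem"      # gap runs from a stop codon to the next start codon
    if left.strand == "-":
        return "divergent"   # both start codons face the gap
    return "convergent"      # both stop codons face the gap


def intergene_regions(genes):
    """Return (orientation, length in bp) for each pair of adjacent genes.

    The length is the number of bases between the two coding regions; it is
    negative when neighbouring genes overlap.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    return [
        (classify_pair(a, b), b.start - a.end - 1)
        for a, b in zip(ordered, ordered[1:])
    ]


# Hypothetical example genes, not taken from the S. pombe annotation.
example = [
    Gene("a", 100, 400, "+"),
    Gene("b", 901, 1500, "+"),
    Gene("c", 1801, 2500, "-"),
    Gene("d", 3001, 3600, "+"),
]
regions = intergene_regions(example)
print(regions)
```

Because the gap is always bounded by whichever codons face it, a single coordinate difference covers all three cases; the orientation label is what separates the total, divergent and convergent distributions.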

Introns

A total of 4,730 introns is distributed among 43% of S. pombe genes, with 15 being the largest number of introns found within a single gene (Table 2). Introns varied from 29 to 819 nucleotides long, with a mean length of 81 and a mode of 48 nucleotides. In S. cerevisiae, introns are much rarer, with only 5% of genes having introns. Most introns in S. pombe follow the rule of GT donor and AG acceptor, but there are three examples that have GC donors⁶². The average positions of introns within genes were assessed by mapping them with respect to the start and stop codons. This analysis does not take into account any introns in 5′ and 3′ untranslated regions. For the genes with 1–6 introns there is a 5′ bias from the values expected if introns were evenly distributed throughout the genes (Table 2). A 5′ bias is also seen in S. cerevisiae, where it has been hypothesized to be due to in vivo reverse transcription generating complementary DNAs primed from the 3′ ends of the mRNAs, followed by replacement of the original chromosomal gene with the cDNA by homologous recombination⁶³. Because cDNAs are extended from their 3′ ends, there will be a tendency for introns at 5′ ends not to be


Figure 2 Intergene regions. Distribution of intergene regions given for all genes and for divergent and convergent pairs of genes, for both S. pombe and S. cerevisiae. A total of […] intergene regions from S. pombe were analysed from a database prepared just before completion of the whole genome, and […] intergene regions from S. cerevisiae were analysed. Histograms show the number of regions in 100-bp bins.

[Figure not reproduced. Panels: S. pombe and S. cerevisiae, each shown as Total, Divergent orientation and Convergent orientation. x axis: length of intergene region (bp), from <−200 to ≥3,200; y axis: no. of intergene regions.]


removed from the chromosomal genes. Of genes that have two or more introns, 614 have two introns, 324 have three, 148 have four, 70 have five and 40 have six (Table 2). Thus the number of genes having an extra intron decreases by about half as intron number increases from two to six per gene. These observations may be of relevance to speculations concerning the mechanisms by which introns are generated and removed⁶⁴. The relatively large number of introns in S. pombe provides opportunities for alternative splicing to generate protein variants, which could have regulatory roles as well as increasing the range of protein types present in the cell⁶⁵.

Genome duplications and comparisons

Comparisons of chromosomal sequences and searches for tracts of conserved gene order did not reveal evidence for large-scale genome duplications in S. pombe. This differs from reports for S. cerevisiae and Arabidopsis, which have suggested that both of these organisms have undergone some large-scale genome duplications⁴,⁶⁶. However, blocks of duplicated sequence totalling about 50 kb retaining a conserved gene order can be found at the sub-telomeric regions of chromosomes I and II. Twenty-four genes (in groups of two or four) are 100% identical at the DNA level, and twenty of these are localized in sub-telomeric regions, suggesting frequent exchange of genetic information at these positions. Most of these genes code for proteins belonging to families specific to fission yeast and are predicted to be cell-surface proteins. Interestingly, in S. cerevisiae 7 of the 16 genes (in groups of two, three or four) that are 100% identical at the DNA level are also located in sub-telomeric regions. These gene products include members of the budding-yeast-specific PAU and COS families, which are also predicted to be cell-surface proteins³⁹. In the highly plastic telomeric and sub-telomeric regions of malaria and several other protozoan parasites, genes coding for species-specific cell-surface proteins are also found, for example, the Var, Rifin and Stevor families of Plasmodium falciparum⁶⁷. These data suggest that recombination events between telomeric regions may be a major mechanism involved in the generation of organism-specific cell-surface molecules. These molecules may also be of importance for cell identity and for processes that generate hypervariable cell-surface molecules relevant for self and non-self recognition.

We next compared the proteins of S. pombe with those of the unicellular eukaryote S. cerevisiae and the metazoan C. elegans (Fig. 3), using BlastP²⁴ with a cutoff E-value of 0.001 and no low-complexity filtering. Excluding genes coded by the mitochondria and transposons, we used a data set of 4,876 proteins from S. pombe, 5,777 proteins from S. cerevisiae (Cerpep 14 May 2001; ftp://ftp.sanger.ac.uk/pub/yeast/SCreannotation/cerpep) and 19,622 proteins from C. elegans (ftp://ftp.sanger.ac.uk/pub/databases/wormpep). About two-thirds of the S. pombe proteins (3,281) have homologues in common with both S. cerevisiae and C. elegans (Fig. 3). A smaller number, 769 (16%), have homologues in S. cerevisiae but not in C. elegans and many fewer, 145 (3%), have homologues in C. elegans but not in S. cerevisiae. A total of 681 proteins (14%) seems to be unique to S. pombe. A comparison between S. cerevisiae and the other two organisms gave similar results, with 3,605 (62%) of the proteins in common, 918 (16%) found only in S. pombe and 150 (3%) only in C. elegans, leaving 1,104 proteins (19%) unique to S. cerevisiae. Thus, S. cerevisiae proteins with homologues only in S. pombe total 918 whereas the reverse comparison totals 769 (Fig. 3), indicating that there might be more gene duplications in S. cerevisiae, accounting for the extra proteins found in this organism.

To investigate gene duplication further, we carried out an 'all against all' comparison using the same protein data sets and NCBI BlastClust⁶⁸ (ftp://ncbi.nlm.nih.gov/blast/documents/README.bcl) to distinguish protein clusters from proteins represented uniquely. Of the 4,876 protein-coding genes of S. pombe, 4,515 have no other sequence relatives within the organism and can be considered unique. The remaining 361 are distributed among protein cluster groups with two or more members (Table 3). Using the same parameters in S. cerevisiae, 5,061 genes are unique and 716 fall into groups with two or more members (Table 3). This supports the idea that there is less gene redundancy than in S. cerevisiae, which may help functional analyses of those genes that are not duplicated in S. pombe.
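The separation of unique proteins from protein clusters described above amounts to grouping proteins that are linked by sufficiently similar 'all against all' hits. The sketch below illustrates single-linkage grouping with a union-find structure; it is an assumption-laden illustration and not the BlastClust implementation. The protein identifiers and the list of linked pairs are hypothetical, and deciding which pairs count as linked (score and coverage thresholds) is left outside the example.

```python
def single_linkage_clusters(ids, linked_pairs):
    """Group identifiers so that any two linked proteins share a cluster.

    `linked_pairs` holds (a, b) tuples for pairs judged similar enough by
    whatever thresholds were applied upstream (an assumption of this sketch).
    """
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent pointers to the cluster representative,
        # shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in linked_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    # Largest clusters first; ties broken by member names for stable output.
    return sorted(clusters.values(), key=lambda c: (-len(c), c))


# Hypothetical identifiers and similarity links.
proteins = ["p1", "p2", "p3", "p4", "p5", "p6"]
links = [("p1", "p2"), ("p2", "p3"), ("p5", "p6")]

clusters = single_linkage_clusters(proteins, links)
unique = [c[0] for c in clusters if len(c) == 1]
print(clusters)
print("unique proteins:", unique)
```

Under single linkage, p1 and p3 fall in the same cluster through p2 even without a direct link, which is why cluster sizes depend strongly on the similarity thresholds chosen.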


Table 2 Introns per gene and average positions of introns within genes

Introns per gene  No. of genes  Mean gene length (bp)  Position of introns*
                                                       1            2            3            4            5            6
0                 2,683         1,497                  –            –            –            –            –            –
1                 996           1,426                  0.26 (0.50)  –            –            –            –            –
2                 614           1,396                  0.17 (0.33)  0.48 (0.66)  –            –            –            –
3                 324           1,588                  0.13 (0.25)  0.37 (0.50)  0.63 (0.75)  –            –            –
4                 148           1,633                  0.10 (0.20)  0.27 (0.40)  0.50 (0.60)  0.73 (0.80)  –            –
5                 70            1,603                  0.08 (0.17)  0.22 (0.33)  0.37 (0.49)  0.56 (0.66)  0.77 (0.83)  –
6                 40            2,162                  0.06 (0.14)  0.22 (0.28)  0.34 (0.42)  0.49 (0.57)  0.66 (0.71)  0.82 (0.85)
7–15              34            2,766                  –            –            –            –            –            –

The data set of 4,677 introns was prepared just before completion of the whole genome sequence.
*The mean position of introns, with the values in brackets representing the position if the introns were distributed evenly throughout the gene.
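The bracketed values in Table 2 are the positions expected if introns were spread evenly along a gene. One way to obtain such values, assuming that "evenly" means the k-th of n introns sits at the fraction k/(n + 1) of the gene length, is sketched below; that formula is an interpretation, and it reproduces the bracketed values to within about 0.01. The observed means used in the example are the four-intron row of the table.

```python
def expected_positions(n_introns: int) -> list[float]:
    """Fractional positions of n introns spaced evenly along a gene (assumed k/(n+1))."""
    return [k / (n_introns + 1) for k in range(1, n_introns + 1)]


def five_prime_shift(observed: list[float]) -> list[float]:
    """Expected minus observed position; positive values indicate a 5' bias."""
    expected = expected_positions(len(observed))
    return [round(e - o, 2) for e, o in zip(expected, observed)]


# Observed mean positions for genes with four introns (Table 2).
observed_four = [0.10, 0.27, 0.50, 0.73]

print(expected_positions(4))
print(five_prime_shift(observed_four))
```

Every observed mean lies closer to the start codon than its evenly spaced counterpart, which is the 5′ bias described in the text.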

Figure 3 Comparison of proteins in S. pombe (S.p.), S. cerevisiae (S.c.) and C. elegans (C.e.). a, Pie chart comparing the homology of proteins of S. pombe with those of S. cerevisiae and C. elegans. b, Pie chart comparing the homology of proteins of S. cerevisiae with those of S. pombe and C. elegans. For example, S.p. proteins in S.c. and C.e. means S. pombe proteins with homologues found in S. cerevisiae and C. elegans. The absolute numbers of proteins are given for both yeasts.

[Figure not reproduced. a, S. pombe: S.p. proteins in S.c. and C.e., 3,281; S.p. proteins in S.c., 769; S.p. proteins in C.e., 145; S.p. only, 681. b, S. cerevisiae: S.c. proteins in S.p. and C.e., 3,605; S.c. proteins in S.p., 918; S.c. proteins in C.e., 150; S.c. only, 1,104.]
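The percentages quoted for the proteome comparison follow directly from the absolute numbers in Fig. 3 (data sets of 4,876 S. pombe and 5,777 S. cerevisiae proteins). The short sketch below recomputes them; the dictionary keys are shorthand labels chosen for this example, not terms from the figure.

```python
# Absolute protein counts from Fig. 3.
S_POMBE = {
    "both other species": 3281,
    "S. cerevisiae only": 769,
    "C. elegans only": 145,
    "unique": 681,
}
S_CEREVISIAE = {
    "both other species": 3605,
    "S. pombe only": 918,
    "C. elegans only": 150,
    "unique": 1104,
}


def percentages(counts: dict[str, int]) -> dict[str, int]:
    """Share of each category in the whole data set, rounded to whole percent."""
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


pombe_pct = percentages(S_POMBE)
cerevisiae_pct = percentages(S_CEREVISIAE)
print(sum(S_POMBE.values()), pombe_pct)
print(sum(S_CEREVISIAE.values()), cerevisiae_pct)
```

The category counts sum to the stated data-set sizes, so each protein is assigned to exactly one category in each comparison.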


Human disease genes

To assess the usefulness of S. pombe for investigating the functions of genes related to human disease, we used the same method and dataset of human disease genes as that employed for analysis of the Drosophila genome⁶⁹. Protein-coding genes of S. pombe were identified that generate products with similarities to proteins coded by 289 genes that are mutated, amplified or deleted in human disease. A total of 172 S. pombe proteins have similarity with members of this data set of human disease proteins, and 122 of these have E-values greater than 1 × 10⁻⁴⁰. These values indicate that either they are not significant or they have only limited similarities with the equivalent human proteins, reflecting, for example, shared domains such as related protein-interacting regions or catalytic sites. However, despite this limitation, they may still be useful for investigating the biochemical activities and interactions of human disease proteins in S. pombe. The other 50 S. pombe proteins (Tables 4 and 5) have E-values lower than 1 × 10⁻⁴⁰. The more significant similarities seen with this class mean that genes coding for these proteins are more likely to be useful for investigating not only the biochemical but also the biological functions of the human genes, and some could provide good models for studying the associated human disease pathways. The largest group of human disease-related genes are those implicated in cancer. There are 23 such genes (Table 4), and they are involved in DNA damage and repair, checkpoint controls, and the cell cycle, all processes involved in maintaining genomic stability. The cell cycle and checkpoint background of S. pombe make it a good model organism for studying these particular cancer disease pathways. Other categories that are also represented in S. pombe are those involved in metabolic (12 genes), neurological (13 genes), cardiac (1 gene) and renal (1 gene) disease (Table 5).

A similar analysis in S. cerevisiae identified 182 proteins with similarities to the human disease set, with most of the genes coding for these proteins being shared by the two yeasts. Only two of the genes (SPAC630.13c and SPBC530.12c), found in S. pombe but not S. cerevisiae, code for proteins with any significant similarity to human disease proteins. These are tuberous sclerosis 2 (TSC2), involved in cancer, and ceroid lipofuscinosis PPT1, involved in metabolism. Both yeasts seem to be similarly useful as model organisms for the study of human disease gene function, although their differing biologies may favour one organism for certain genes and the other organism for other genes.
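The division of the 172 S. pombe proteins into 122 weaker and 50 stronger matches rests on a single E-value threshold of 1 × 10⁻⁴⁰. A minimal sketch of that partition is shown below, assuming each protein has already been reduced to its best E-value against the human disease set; the identifiers and E-values in the example are invented for illustration.

```python
THRESHOLD = 1e-40  # E-value cutoff separating the two classes in the text


def split_by_evalue(best_hits: dict[str, float], threshold: float = THRESHOLD):
    """Return (strong, weak) sets of protein IDs.

    `strong` holds proteins whose best E-value is lower than the threshold,
    `weak` those at or above it.
    """
    strong = {protein for protein, e in best_hits.items() if e < threshold}
    weak = set(best_hits) - strong
    return strong, weak


# Hypothetical best E-values; not taken from the published tables.
example_hits = {
    "protA": 3e-120,
    "protB": 5e-41,
    "protC": 2e-12,
    "protD": 1e-40,
}
strong, weak = split_by_evalue(example_hits)
print(sorted(strong), sorted(weak))

# Category sizes quoted in the text for the 50 stronger matches.
categories = {"cancer": 23, "metabolic": 12, "neurological": 13, "cardiac": 1, "renal": 1}
print(sum(categories.values()))
```

The five disease categories listed in the text add up to the 50 proteins in the stronger class, and 122 + 50 gives the 172 proteins with any similarity.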

Protein domains

Listed in Table 6 are the ten most frequent protein domains found in S. pombe, with 11 more domains of interest in the top 40 most frequent, as determined by InterPro matches⁷⁰, together with the frequency of these domains for the other fully sequenced eukaryotic genomes. These domains are divided into three categories (1–3).

The first category (1) consists of five domains found in the top ten most frequent domains in S. pombe that are also found in the top ten of at least four of the other eukaryotes. They are the ATP/GTP binding site, the WD40 repeat, the eukaryotic protein kinase catalytic core, the RNA binding region RNP-1, and the zinc finger C2H2-type transcriptional activator. These universal and commonly exploited domains also feature highly in other eukaryotes. Because total gene number increases with the complexity of an organism, the proportion of these domains is approximately similar in each of the sequenced eukaryotic genomes. Energy utilization exploiting the ATP/GTP binding site, protein phosphorylation dependent on the catalytic protein kinase domain, and transcriptional activation using the zinc finger C2H2 domain must define biochemical mechanisms that are readily exploited to generate new biological pathways.

In the second category (2), the domains are present in a similar absolute number in the eukaryotic genomes analysed. Amongst those more frequently found in this category are the BRCT, replication factor C, minichromosome maintenance proteins (MCMs), Fizzy, DNA-directed DNA polymerase […] family and helicase C-terminal domains. Some of these are involved in core cell activities like DNA replication, DNA repair and cell-cycle progression, perhaps explaining why they are present in similar


Table 3 Gene duplication in S. pombe and S. cerevisiae

Protein members per cluster  No. of clusters in S. pombe  No. of clusters in S. cerevisiae
1                            4,515                        5,061
2                            124                          256
3                            17                           28
4                            8                            11
5                            2                            3
6                            1                            1
7                            2                            1
>7                           0                            3

Total no. of clusters        4,669                        5,364
Total no. of sequences       4,876                        5,777

Protein clusters were identified with NCBI BlastClust using parameters […] as recommended by Y. Wolf (personal communication). We used databases of 4,876 S. pombe proteins prepared just before completion of the genome sequence and of 5,777 S. cerevisiae proteins.
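The totals in Table 3 can be cross-checked against the individual rows: the cluster counts should sum to the total number of clusters, and cluster size multiplied by cluster count should account for the total number of sequences. The sketch below performs that check. Because the table gives no sizes for clusters with more than seven members, the number of sequences they hold is inferred as a remainder, which is an assumption of this example.

```python
# Rows of Table 3: cluster size -> (clusters in S. pombe, clusters in S. cerevisiae).
CLUSTERS = {
    1: (4515, 5061),
    2: (124, 256),
    3: (17, 28),
    4: (8, 11),
    5: (2, 3),
    6: (1, 1),
    7: (2, 1),
}
LARGE_CLUSTERS = (0, 3)          # clusters with more than seven members
TOTAL_CLUSTERS = (4669, 5364)
TOTAL_SEQUENCES = (4876, 5777)

summary = {}
for idx, species in enumerate(("S. pombe", "S. cerevisiae")):
    n_clusters = sum(row[idx] for row in CLUSTERS.values()) + LARGE_CLUSTERS[idx]
    seqs_sized = sum(size * row[idx] for size, row in CLUSTERS.items())
    summary[species] = {
        "clusters": n_clusters,
        # Sequences sitting in clusters of more than seven members (inferred).
        "in_large_clusters": TOTAL_SEQUENCES[idx] - seqs_sized,
        # Sequences with at least one relative in the same proteome.
        "non_unique": TOTAL_SEQUENCES[idx] - CLUSTERS[1][idx],
    }

print(summary)
```

The non-unique counts recovered this way (361 for S. pombe and 716 for S. cerevisiae) match the figures given in the text for genes that fall into groups with two or more members.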

Table 4 S. pombe genes related to human cancer genes

Human cancer gene                                           Score*  S. pombe gene/product       Systematic name
Xeroderma pigmentosum D; XPD                                ****    rad15, rhp3                 SPAC1D4.12
Xeroderma pigmentosum B; ERCC3                              ****    rad25                       SPAC17A5.06
Hereditary non-polyposis colorectal cancer (HNPCC); MSH2    ****    msh2                        SPBC24C6.12c
Xeroderma pigmentosum F; XPF                                ****    rad16, rad10, rad20, swi9   SPCC970.01
Immunodeficiency; DNA ligase 1                              ****    cdc17                       SPAC57A10.13c
HNPCC; PMS2                                                 ****    pms1                        SPAC19G12.02c
HNPCC; MSH6                                                 ****    msh6                        SPCC285.16c
HNPCC; MSH3                                                 ****    swi4                        SPAC8F11.03
HNPCC; MLH1                                                 ****    mlh1                        SPBC1703.04
Haematological Chediak–Higashi syndrome; CHS1               ****    –                           SPBC28E12.06c
Darier–White disease; SERCA                                 ****    pgak                        SPBC31E1.02c
Bloom syndrome; BLM                                         ****    hus2, rqh1, rad12           SPAC2G11.12
Ataxia telangiectasia; ATM                                  ****    tel1                        SPCC23B6.03c
Xeroderma pigmentosum G; XPG                                ***     rad13                       SPBC3E7.08c
Tuberous sclerosis 2; TSC2                                  ***     –                           SPAC630.13c
Immune bare lymphocyte; ABCB3                               ***     –                           SPBC9B6.09c
Downregulated in adenoma; DRA                               ***     –                           SPAC869.05c
Diamond–Blackfan anaemia; RPS19                             ***     rps19                       SPBC649.02
Cockayne syndrome I; CKN1                                   ***     –                           SPBC577.09
RAS                                                         ***     ste5, ras1                  SPAC17H9.09c
Cyclin-dependent kinase 4; CDK4                             ***     cdc2                        SPBC11B10.09
CHK2 protein kinase                                         ***     cds1                        SPCC18B5.11c
AKT2                                                        ***     pck2, sts6, pkc1            SPBC12D12.04c

*Scores are: ****, < 1 × 10⁻¹⁰⁰; ***, 1 × 10⁻⁴⁰ to 1 × 10⁻¹⁰⁰.

© 2002 Macmillan Magazines Ltd

absolute number regardless of genome size71. Systematic searches for other domains present in similar absolute numbers in genomes of all eukaryotes might identify other, at present unrecognized, functions involved in similar core cell activities.

The third category (3) includes domains whose occurrence rises dramatically with increasing genome size within the Metazoa. This category includes the SH3, PH and tyrosine/dual-specificity phosphatase domains. These are involved in intra- and intercellular signalling pathways, which might be expected to become increasingly elaborate as multicellular complexity increases69,71.

Two other domains in the top ten for both the yeasts are the sugar and ABC transporters (Table 6). S. cerevisiae has significantly more of these domains and the amino-acid permease domain than does S. pombe72, which may explain why it is a more versatile organism, growing on a greater range of media. The Zn(II)Cys(6) transcription-factor domain is found only in the two yeasts, supporting the idea that it is specific to fungi. The chromodomain is found more frequently in S. pombe (seven examples compared with two in S. cerevisiae), possibly reflecting differences in higher-order chromatin structure.

Defining the eukaryotic cell

The genome sequence of S. pombe increases the range of available complete eukaryotic genome sequences to two unicellular free-living organisms (S. cerevisiae and S. pombe), one plant (Arabidopsis), and three metazoans (C. elegans, Drosophila and humans). This range of organisms allows a comparison between eukaryotic and prokaryotic genomes (represented by 37 bacteria and 8 archaea), with the intention of identifying those genes important for eukaryotic cell organization. We have made an


Table 5 Schizosaccharomyces pombe genes related to human disease genes

Human disease gene | Disease | Score* | S. pombe gene/product | Systematic name
Wilson disease; ATP7B | Metabolic | •••• | P-type copper ATPase | SPBC29A3.01
Non-insulin-dependent diabetes; PCSK1 | Metabolic | •••• | krp1, kinesin related | SPAC22E12.09c
Hyperinsulinism; ABCC8 | Metabolic | •••• | ABC transporter | SPAC3F10.11c
G6PD deficiency; G6PD | Metabolic | •••• | zwf1 G6P dehydrogenase | SPAC3A12.18
Citrullinaemia type I; ASS | Metabolic | •••• | Argininosuccinate synthase | SPBC428.05c
Wernicke–Korsakoff syndrome; TKT | Metabolic | ••• | Transketolase | SPBC2G5.05
Variegate porphyria; PPOX | Metabolic | ••• | Protoporphyrinogen oxidase | SPAC1F5.07c
Maturity-onset diabetes of the young (MODY2); GCK | Metabolic | ••• | hxk1, hexokinase | SPAC24H6.04
Gitelman's syndrome; SLC12A3 | Metabolic | ••• | CCC Na-K-Cl transporter | SPBC18H10.16
Cystinuria type 1; SLC3A1 | Metabolic | ••• | α-glucosidase | SPBC1683.07
Cystic fibrosis; ABCC7 | Metabolic | ••• | ABC transporter | SPBC359.05
Bartter's syndrome; SLC12A1 | Metabolic | ••• | CCC Na-K-Cl transporter | SPBC18H10.16
Menkes syndrome; ATP7A | Neurological | •••• | P-type copper ATPase | SPBC29A3.01
Deafness, hereditary; MYO15 | Neurological | •••• | myo51 class V myosin | SPBC2D10.14c
Zellweger syndrome; PEX1 | Neurological | ••• | AAA-family ATPase | SPCC553.03
Thomsen disease; CLCN1 | Neurological | ••• | ClC chloride channel | SPBC19C7.11
Spinocerebellar ataxia type 6 (SCA6); CACNA1A | Neurological | ••• | VIC sodium channel | SP[…]
Myotonic dystrophy; DM1 | Neurological | ••• | orb6 Ser/Thr protein kinase | SPAC821.12
McCune–Albright syndrome; GNAS1 | Neurological | ••• | gpa1 guanine nucleotide binding | SPBC[…]
Lowe's oculocerebrorenal syndrome; OCRL | Neurological | ••• | PIP phosphatase | SPBC2G2.02
Dents; CLCN5 | Neurological | ••• | ClC chloride channel | SPBC19C7[…]
Coffin–Lowry; RPS6KA3 | Neurological | ••• | Ser/Thr protein kinase | SPCC24B10.07
Angelman; UBE3A | Neurological | ••• | Ubiquitin-protein ligase | SPBP8B7.27
Amyotrophic lateral sclerosis; SOD1 | Neurological | ••• | sod1, superoxide dismutase | SPAC821.10c
Oguchi type 2; RHKIN | Neurological | ••• | Ser/Thr protein kinase | SPCC24B10.07
Familial cardiac myopathy; MYH7 | Cardiac | •••• | myo2, myosin II | SPCC645.05c
Renal tubular acidosis; ATP6B1 | Renal | •••• | V-type ATPase | SPAC637.05c

*Scores are: ••••, <1 × 10^-100; •••, 1 × 10^-40 to 1 × 10^-100.

Table 6 Protein domain analysis and comparison with other eukaryotes

Each species column gives the number of proteins followed by the rank in parentheses.

InterPro accession no. | S. pombe | S. cerevisiae | H. sapiens | D. melanogaster | C. elegans | A. thaliana | InterPro name | Class
IPR001687 | 213 (1) | 267 (1) | 436 (5) | 231 (4) | 191 (7) | 331 (5) | ATP/GTP-binding site motif A (P-loop) | 1
IPR001680 | 114 (2) | 97 (3) | 277 (8) | 183 (5) | 102 (19) | 210 (10) | G protein β WD40 repeats | 1
IPR000719 | 111 (3) | 119 (2) | 579 (3) | 377 (2) | 450 (2) | 1,049 (1) | Eukaryotic protein kinase | 1
IPR000504 | 80 (4) | 61 (5) | 307 (7) | 182 (6) | 97 (21) | 255 (8) | RNA binding region RNP-1 | 1
IPR001650 | 67 (5) | 63 (4) | 155 (20) | 101 (17) | 80 (27) | 148 (13) | Helicase C-terminal domain | 2
IPR001841 | 44 (6) | 33 (12) | 215 (15) | 120 (11) | 126 (12) | 379 (4) | RING finger | –
IPR001440 | 38 (7) | 33 (12) | 150 (21) | 92 (18) | 46 (43) | 125 (17) | TPR repeat | –
IPR001066 | 36 (8) | 46 (8) | 44 (64) | 45 (34) | 55 (37) | 98 (26) | Sugar transporter | –
IPR001617 | 33 (9) | 42 (9) | 75 (40) | 67 (28) | 61 (36) | 103 (25) | ABC transporter family | –
IPR000822 | 32 (10) | 51 (7) | 712 (2) | 403 (1) | 154 (10) | 115 (20) | Zinc finger, C2H2 type | 1
IPR001357 | 14 (23) | 10 (30) | 24 (82) | 17 (61) | 25 (60) | 17 (83) | BRCT domain | 2
IPR000862 | 8 (29) | 9 (31) | 8 (98) | 9 (68) | 6 (79) | 13 (87) | Replication factor C conserved domain | 2
IPR002064 | 5 (32) | 5 (35) | 4 (102) | 6 (70) | 3 (82) | 5 (95) | DNA-directed DNA polymerase family B | 2
IPR001208 | 6 (31) | 6 (34) | 12 (94) | 13 (64) | 5 (80) | 8 (92) | MCM family | 2
IPR000002 | 5 (32) | 3 (37) | 3 (103) | 4 (72) | 2 (83) | 6 (94) | FIZZY/CDC20 domain | 2
IPR001452 | 21 (16) | 23 (18) | 220 (14) | 82 (23) | 62 (35) | 3 (97) | Src homology 3 (SH3) domain | 3
IPR001849 | 21 (16) | 26 (16) | 253 (11) | 89 (22) | 75 (31) | 27 (73) | PH domain | 3
IPR000387 | 9 (28) | 11 (29) | 112 (29) | 47 (40) | 110 (16) | 21 (79) | Tyrosine-specific protein phosphatase and dual-specificity protein phosphatase family | 3
IPR001138 | 27 (13) | 52 (6) | 0 (NA) | 0 (NA) | 0 (NA) | 0 (NA) | Fungal transcriptional regulatory protein | –
IPR002293 | 21 (16) | 32 (13) | 43 (65) | 36 (45) | 32 (54) | 65 (42) | Permease for amino acids and related compounds | –
IPR000953 | 7 (30) | 2 (38) | 26 (80) | 20 (58) | 15 (70) | 24 (76) | Chromodomain | –

Domain identifiers are from InterPro, which integrates PROSITE, PRINTS and PFAM. Only domains within the most frequent 40 found in S. pombe are given. The numbers of proteins with these domains and their ranking are given for S. pombe and the other eukaryotes listed. At the right end of the table is a classification of 1–3; see text for an explanation. NA, not applicable.


initial analysis to identify the more conserved genes falling in this category by comparing the predicted protein sequences coded by the above genomes. The percentage similarity was derived from the hit bit score divided by the self bit score for each protein (see Table 7 legend). We selected those proteins with a high percentage similarity score in all of the eukaryotes, and a low one in all of the prokaryotes. Three thresholds (50%, 45% and 40%) were used to identify proteins that are highly conserved in the fully sequenced eukaryotes and three corresponding thresholds (20%, 15% and 12% respectively) to identify proteins not found in the fully sequenced prokaryotes (Table 7a). For an initial discussion of these proteins, thresholds of 50% and 20% were selected. This analysis identifies genes coding for proteins that are highly conserved in yeasts, plants and metazoans (by using a threshold of 50% similarity) and yet are not well conserved in prokaryotes (by using a threshold of 20% similarity). The proteins identified using these criteria are likely to be important for maintaining eukaryotic cell organization, although the high threshold of 50% means that other proteins required for this may well be excluded.

Using these thresholds, 62 genes were identified and grouped according to function (Table 8). More information about these genes can be found on the GeneDB website (http://www.genedb.org/pombe) and the PombePD website (http://proteome.com/databases). Two of these groups code for proteins associated with characteristics considered to distinguish eukaryotic cells from prokaryotic cells: the organization of DNA in chromosomes within a nucleus, and the formation of 40S and 60S ribosomal subunits, which are larger than the prokaryotic 30S and 50S subunits. The first group includes the H3 and H4 core histone proteins required for packaging DNA into nucleosomes, the Hda1 histone deacetylase, which suggests histone acetylation is critical for eukaryotic chromatin, and the Ran GTPase Spi1, a key element for nuclear membrane transport. One putative protein in this category (SPAC890.07c) is possibly involved in export of mRNA binding proteins and another may be localized in the nucleus (SPCP1E11.08). The second group includes two Rps and six Rpl proteins, components of the 40S and 60S ribosomal subunits respectively; these eight proteins may contribute to differences in protein translation between prokaryotes and eukaryotes.

Two further groups in Table 8 are relevant for the more elaborate organization and compartmentation of eukaryotic cells. One consists of cytoskeletal proteins, the actins Act1 and Act2, the tubulins Nda2, Nda3 and Tub1, and the cytoskeleton-associated proteins Arp2 and Cdc42. The actin and tubulin polymers provide not only internal structure but also the means for transport of components and information from one region of the cell to another, important matters given the increased size of eukaryotic cells. The bacterial FtsA, Hsp70 and FtsZ proteins have structures with similarities respectively to actin and tubulin but only very limited primary sequence similarities73–75. Arp2 is an actin-related protein required for actin organization, and the Cdc42 GTPase is a signalling molecule important for cell shape and for communicating signals from the cytoskeleton. One protein (SPAC926.07c) is predicted to be a dynein light chain. The second group consists of GTP binding proteins and their regulators Ypt1, -2, -3 and -7, Arf1, Aps1, Gdi1 and Sar1, which are required for membrane transport. Membrane-bound organelles and structures are characteristic features of eukaryotic cells, and membrane fusion and fragmentation are important in organelle formation and function. Cam1 (calmodulin) is a protein that exploits compartmentalization of Ca2+ to regulate cellular processes. One protein (SPBC1539.08) is a putative ADP ribosylation factor and may be involved in transport.

A small group (Table 8) includes cell-cycle and checkpoint control proteins. The Cdc2 protein kinase (Cdc28 in S. cerevisiae) is a cyclin-dependent kinase (CDK) controlling the onset of S-phase and mitosis in the two yeasts, with closely related CDKs controlling these cell-cycle transitions in other eukaryotes. The CDK system for cell-cycle control evolved with the appearance of eukaryotic cells, whose cell cycle differs from prokaryotes in two ways: DNA synthesis, which uses multiple origins of replication, and mitosis, which brings about chromosome segregation. It has been argued that, in the primeval eukaryote, there was a single CDK that underwent a monotonic change during the cell cycle, initiating S phase early in the cycle at a low activity and mitosis late in the cycle at a high activity76. Two checkpoint proteins, Rad24 and Rad25, are 14-3-3 proteins thought to regulate the Cdc25 phosphatase controlling the Cdc2 CDK77. If DNA becomes damaged then these checkpoint proteins prevent the onset of mitosis until the damage is repaired. This pathway is essential for maintaining genomic stability and seems to be characteristic of eukaryotic cells.

Three further groups reflect biochemical processes that are important in eukaryotic cell regulation. The first group consists of Lsm2 and Smd2, which are required for RNA splicing. The second group consists of the Ubc, Ubi and Ubl proteins together with Uep1 and Pad1 (Table 8), all required to bring about controlled proteolysis of proteins. A further protein putatively involved in proteolysis is a prohibitin complex subunit (SPAC1782.06c). The


Table 7 Identifying conserved genes important for defining the eukaryotic cell and multicellularity

Similarity | No. of genes | <20% | <15% | <12%
(a) Genes defining eukaryotic organization
50% | 184 | 62 | 47 | 41
45% | 245 | 86 | 63 | 55
40% | 311 | 113 | 81 | 72
(b) Genes defining multicellularity
50% | 397 | 1 | 1 | 1
45% | 511 | 2 | 1 | 1
40% | 647 | 3 | 2 | 2

The same data set used for assessing gene duplication was used. Protein data sets were identified with 40%, 45% and 50% similarity for humans, Drosophila, C. elegans, S. pombe and S. cerevisiae in a, or for human, Drosophila, C. elegans and Arabidopsis in b. The Blast-calculated bit score describes the similarity between two sequences. For two identical sequences (a compared to a) the bit score is 100%. For different sequences (a compared with b) the measure of similarity is bit score (ab)/bit score (aa) × 100. The numbers within these data sets are not found in any of the fully sequenced prokaryotes (45 in total) in a, or any of the prokaryotes and the two yeasts in b, at similarity levels of 12%, 15% and 20%. The 45 prokaryotes include genomes from 37 Eubacteria and 8 Archaea.

Table 8 Classification of conserved genes important for defining the eukaryotic cell

Nucleus: h3.1, h3.2, h3.3, h4.1, h4.2, h4.3, hda1, spi1, SPAC890.07c, SPCP1E11.08
Ribosomal: rpl18, rpl27, rpl27A, rpl29, rpl7A, rpl7, rps3A, rps21
Cytoskeleton: act1, act2, arp2, cdc42, nda2, nda3, tub1, SPAC926.07c
Compartmentation: ypt1, ypt2, ypt3, ypt7, aps1, arf1, cam1, gdi1, sar1, SPBC1539.08
Cell cycle: cdc2, rad24, rad25
Splicing: lsm2, smd2
Proteolysis: ubc13, ubc4, ubi1, ubi4, ubl1, uep1, hus5, pad1, rhp6, SPAC1782.06c
Kinase/phosphatase: cka1, dis2, hhp1, ppa1, ppa2, ppe1, sds21, SPBC26H8.05c, SPAC22H10.04
Miscellaneous: SPBC24C6.11, SPBP8B7.24c

The 62 proteins from Table 7a (50% versus 20%) are classified according to their primary function as described in the text. For putative functions, only the gene location is given.


third group consists of protein kinases and phosphatases, and includes Cka1, Dis2, Hhp1, Ppa1, Ppa2, Ppe1 and Sds21 and putative serine/threonine protein phosphatases (SPAC22H10.04 and SPBC26H8.05c). The presence of these three regulatory processes unique to eukaryotic cells allows protein levels and activities to be specifically and rapidly changed without relying on changes in transcription rate. In prokaryotic cells, gene regulation often operates through changes in transcription rate, followed by dilution of remaining proteins as a consequence of rapid cellular growth. The slower growth rates of eukaryotic cells mean that mechanisms in addition to dilution by growth are required to modulate protein activity; these mechanisms may be provided by RNA splicing, proteolysis and phosphorylation.

Two genes code for a putative zinc-finger protein (SPBC24C6.11) with a possible role in cell polarity and a putative autophagy protein (SPBP8B7.24c) that may mediate attachment of autophagosomes to microtubules. Extension of this analysis at different thresholds of similarity should identify further proteins of unknown function that are important for eukaryotic cell organization.

We performed a similar analysis to identify highly conserved genes that may be important for maintaining multicellular eukaryotic organization (Table 7b). We compared the proteins in prokaryotes and in S. cerevisiae and S. pombe, which are all unicellular, with those of C. elegans, Drosophila, Arabidopsis and humans, which are all multicellular. The same thresholds were used to identify those proteins that are highly conserved in the four multicellular eukaryotes (50%, 45% and 40%) and to identify which of these proteins were not found to be highly conserved in the unicellular organisms (20%, 15% and 12%). The number of genes coding for proteins that fall into these categories was very small: one to three depending on the threshold used. These genes code a putative transcription factor, an RNA-binding protein and a selenium-binding protein.

As more sequences become available, the groups of genes we have identified as being important for eukaryotic and multicellular organization will inevitably be modified. However, our results allow us to speculate on the evolutionary transitions from prokaryotes to eukaryotes and to multicellularity. The transition to multicellularity may not have required the evolution of many new genes, absent from unicellular organisms. The pathways necessary for multicellular organization could already have been in existence in unicellular eukaryotes. For example, intercellular signalling may have been solved by the sexual needs of primeval, single-celled eukaryotes to seek out and identify an appropriate mating partner. Once signalling between cells had evolved, it could be readily exploited to generate the signalling pathways required for multicellular organization. The highly conserved genes specific to eukaryotes may be necessary for eukaryotic cell organization to be generated. In contrast, the transition from unicellularity to multicellularity may not have required many new genes. Instead it may have used genes already present in unicellular eukaryotes, perhaps by the shuffling of functional domains, to give rise to new combinations, which allowed the development of pathways required for the evolution of multicellularity2,69,71,78. If these speculations are correct, they imply that the evolutionary transition from unicellular prokaryotic to unicellular eukaryotic life may have been more complex than the transition to multicellular life. This might provide some explanation as to why it took around 2,300 million years (Myr) to evolve from the first prokaryote to the first eukaryote (thought to have arisen about 3,800 Myr and 1,500 Myr ago, respectively) but only 500 Myr for the evolution of the first multicellular organisms, which arose about 1,000 Myr ago. Further analyses and comparisons should continue to be illuminating about this interesting question of which genes define eukaryotic cells and which define multicellular organisms.
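The two intervals compared in the closing argument follow from the three approximate dates quoted in the text. A minimal sketch of that arithmetic, with illustrative constant and function names that are not from the original:

```python
# Approximate dates quoted in the text, in millions of years (Myr) ago.
FIRST_PROKARYOTE_MYR_AGO = 3800
FIRST_EUKARYOTE_MYR_AGO = 1500
FIRST_MULTICELLULAR_MYR_AGO = 1000


def interval_myr(earlier_myr_ago, later_myr_ago):
    """Elapsed time between two events given as 'Myr ago' values."""
    return earlier_myr_ago - later_myr_ago


if __name__ == "__main__":
    # Prokaryote to eukaryote transition.
    print(interval_myr(FIRST_PROKARYOTE_MYR_AGO, FIRST_EUKARYOTE_MYR_AGO))
    # Eukaryote to multicellular transition.
    print(interval_myr(FIRST_EUKARYOTE_MYR_AGO, FIRST_MULTICELLULAR_MYR_AGO))
```

On these figures the first transition took between four and five times as long as the second, which is the disparity the text takes as a hint that the step to eukaryotic organization was the more complex one.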

Received 16 October 2001; accepted 7 January 2002.

1. Goffeau, A. et al. The yeast genome directory. Nature 387 (suppl.), 1–105 (1997).

2. The C. elegans Sequencing Consortium. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 282, 2012–2018 (1998).

3. Adams, M. D. et al. The genome sequence of Drosophila melanogaster. Science 287, 2185–2195 (2000).

4. The Arabidopsis Genome Initiative. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796–815 (2000).

5. Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).

6. Venter, J. C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001).

7. Sipiczki, M. Where does fission yeast sit on the tree of life? Genome Biol. 1, 1011.1–1011.4 (2000).

8. Heckman, D. S. et al. Molecular evidence for the early colonization of land by fungi and plants. Science 293, 1129–1133 (2001).

9. Leupold, U. Die Vererbung von Homothallie und Heterothallie bei Schizosaccharomyces pombe. C. R. Lab. Carlsberg 24, 381–475 (1950).

10. Mitchison, J. M. The growth of single cells. I. Schizosaccharomyces pombe. Exp. Cell Res. 13, 244–262 (1957).

11. Fantes, P. & Beggs, J. The Yeast Nucleus (Oxford Univ. Press, Oxford, 2000).

12. Davis, L. & Smith, G. R. Meiotic recombination and chromosome segregation in Schizosaccharomyces pombe. Proc. Natl Acad. Sci. USA 98, 8395–8402 (2001).

13. Humphrey, T. DNA damage and cell cycle control in Schizosaccharomyces pombe. Mutat. Res. 451, 211–226 (2000).

14. Smith, C. L. et al. An electrophoretic karyotype for Schizosaccharomyces pombe by pulsed field gel electrophoresis. Nucleic Acids Res. 15, 4481–4491 (1987).

15. Lang, B. F., Cedergren, R. & Gray, M. W. The mitochondrial genome of the fission yeast, Schizosaccharomyces pombe. Sequence of the large-subunit ribosomal RNA gene, comparison of potential secondary structure in fungal mitochondrial large-subunit rRNAs and evolutionary considerations. Eur. J. Biochem. 169, 527–537 (1987).

16. Schaak, J., Mao, J. & Soll, D. The 5.8S RNA gene sequence and the ribosomal repeat of Schizosaccharomyces pombe. Nucleic Acids Res. 10, 2851–2864 (1982).

17. Hoheisel, J. D. et al. High resolution cosmid and P1 maps spanning the 14 Mb genome of the fission yeast S. pombe. Cell 73, 109–120 (1993).

18. Mizukami, T. et al. A 13 kb resolution cosmid map of the 14 Mb fission yeast genome by nonrandom sequence-tagged site mapping. Cell 73, 121–132 (1993).

19. Harris, D. & Murphy, L. in Genomics Protocols (eds Starkey, M. & Elaswarapu, R.) 217–234 (Humana, Totowa, New Jersey, 2001).

20. Bonfield, J. K., Smith, K. & Staden, R. A new DNA sequence assembly program. Nucleic Acids Res. 23, 4992–4999 (1995).

21. Bairoch, A. & Apweiler, R. The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1999. Nucleic Acids Res. 27, 49–54 (1999).

22. Stoesser, G., Tuli, M. A., Lopez, R. & Sterk, P. The EMBL nucleotide sequence database. Nucleic Acids Res. 27, 18–24 (1999).

23. Bateman, A. et al. Pfam 3.1: 1313 multiple alignments and profile HMMs match the majority of proteins. Nucleic Acids Res. 27, 260–262 (1999).

24. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).

25. Sonnhammer, E. L. & Durbin, R. A workbench for large-scale sequence homology analysis. Comput. Appl. Biosci. 10, 301–307 (1994).

26. Pearson, W. R. & Lipman, D. J. Improved tools for biological sequence comparison. Proc. Natl Acad. Sci. USA 85, 2444–2448 (1988).

27. Birney, E., Thompson, J. D. & Gibson, T. J. PairWise and SearchWise: finding the optimal alignment in a simultaneous comparison of a protein profile against all DNA translation frames. Nucleic Acids Res. 24, 2730–2739 (1996).

28. Rutherford, K. et al. Artemis: sequence visualization and annotation. Bioinformatics 16, 944–945 (2000).

29. Morimyo, M. et al. in Biodefence Mechanisms against Environmental Stress (eds Ozawa, T., Hori, T. & Tatsumi, K.) 115–123 (Kodansha, Tokyo & Springer, Heidelberg, 1998).

30. Costanzo, M. C. et al. The yeast proteome database (YPD) and Caenorhabditis elegans proteome database (WormPD): comprehensive resources for the organization and comparison of model organism protein information. Nucleic Acids Res. 28, 73–76 (2000).

31. Cherry, J. M. et al. SGD: Saccharomyces genome database. Nucleic Acids Res. 26, 73–79 (1998).

32. Mewes, H. W. et al. MIPS: a database for genomes and protein sequences. Nucleic Acids Res. 28, 37–40 (2000).

33. Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997).

34. Blandin, G. et al. Genomic exploration of the hemiascomycetous yeasts: 4. The genome of Saccharomyces cerevisiae revisited. FEBS Lett. 487, 31–36 (2000).

35. Wood, V., Rutherford, K. M., Ivens, A., Rajandream, M.-A. & Barrell, B. A re-annotation of the Saccharomyces cerevisiae genome. Comp. Funct. Genom. 2, 143–154 (2001).

36. Kaneko, T. et al. Complete genome structure of the nitrogen-fixing symbiotic bacterium Mesorhizobium loti (supplement). DNA Res. 7, 381–406 (2000).

37. Hutchison, C. A. et al. Global transposon mutagenesis and a minimal Mycoplasma genome. Science 286, 2165–2169 (1999).

38. Deckert, G. et al. The complete genome of the hyperthermophilic bacterium Aquifex aeolicus. Nature 392, 353–358 (1998).

39. Goffeau, A. et al. Life with 6000 genes. Science 274, 546, 563–567 (1996).

40. Barnitz, J. T., Cramer, J. H., Rownd, R. H., Cooley, L. & Soll, D. Arrangement of the ribosomal RNA genes in Schizosaccharomyces pombe. FEBS Lett. 143, 129–132 (1982).

41. Mao, J. et al. The 5S RNA genes of Schizosaccharomyces pombe. Nucleic Acids Res. 10, 487–500 (1982).

42. Carr, A. M., MacNeill, S. A., Hayles, J. & Nurse, P. Molecular cloning and sequence analysis of mutant alleles of the fission yeast cdc2 protein kinase gene: implications for cdc2+ protein structure and function. Mol. Gen. Genet. 218, 41–49 (1989).

43. Kim, J. M., Vanguri, S., Boeke, J. D., Gabriel, A. & Voytas, D. F. Transposable elements and genome organization: a comprehensive survey of retrotransposons revealed by the complete Saccharomyces cerevisiae genome sequence. Genome Res. 8, 464–478 (1998).

44. Ashburner, M. et al. An exploration of the sequence of a 2.9-Mb region of the genome of Drosophila melanogaster: the Adh region. Genetics 153, 179–219 (1999).

45. Gu, Z., Wang, H., Nekrutenko, A. & Li, W. H. Densities, length proportions, and other distributional features of repetitive sequences in the human genome estimated from 430 megabases of genomic sequence. Gene 259, 81–88 (2000).

46. Egel, R. Reorientation of the distal region in linkage group IIR of fission yeast. Curr. Genet. 24, 179–180 (1993).

47. Chikashige, Y. et al. Composite motifs and repeat symmetry in S. pombe centromeres: direct analysis by integration of NotI restriction sites. Cell 57, 739–751 (1989).

48. Murakami, S., Matsumoto, T., Niwa, O. & Yanagida, M. Structure of the fission yeast centromere cen3: direct analysis of the reiterated inverted region. Chromosoma 101, 214–221 (1991).

49. Clarke, L. & Baum, M. P. Functional analysis of a centromere from fission yeast: a role for centromere-specific repeated DNA sequences. Mol. Cell. Biol. 10, 1863–1872 (1990).

50. Takahashi, K., Murakami, S., Chikashige, Y., Niwa, O. & Yanagida, M. A large number of tRNA genes are symmetrically located in fission yeast centromeres. J. Mol. Biol. 218, 13–17 (1991).

51. Nakaseko, Y., Adachi, Y., Funahashi, S.-I., Niwa, O. & Yanagida, M. Chromosome walking shows a highly homologous repetitive sequence present on all the centromere regions of fission yeast. EMBO J. 5, 1011–1021 (1986).

52. Fishel, B., Amstutz, H., Baum, M., Carbon, J. & Clarke, L. Structural organization and functional analysis of centromeric DNA in the fission yeast Schizosaccharomyces pombe. Mol. Cell. Biol. 8, 754–763 (1988).

53. Baum, M., Ngan, V. K. & Clarke, L. The centromeric K-type repeat and the central core are together sufficient to establish a functional Schizosaccharomyces pombe centromere. Mol. Biol. Cell 5, 747–761 (1994).

54. Partridge, J. F., Borgstrom, B. & Allshire, R. C. Distinct protein interaction domains and protein spreading in a complex centromere. Genes Dev. 14, 783–791 (2000).

55. Hyman, A. A. & Sorger, P. K. Structure and function of kinetochores in budding yeast. Annu. Rev. Cell Dev. Biol. 11, 471–495 (1995).

56. Hieter, P. et al. Functional selection and analysis of yeast centromeric DNA. Cell 42, 913–921 (1985).

57. Funk, M., Hegemann, J. H. & Philippsen, P. Chromatin digestion with restriction endonucleases reveals 150–160 bp of protected DNA in the centromere of chromosome XIV in Saccharomyces cerevisiae. Mol. Gen. Genet. 219, 153–160 (1989).

58. Russell, P. R. Transcription of the triose-phosphate-isomerase gene of Schizosaccharomyces pombe initiates from a start point different from that in Saccharomyces cerevisiae. Gene 40, 125–130 (1985).

59. Nagawa, F. & Fink, G. R. The relationship between the "TATA" sequence and transcription initiation sites at the HIS4 gene of Saccharomyces cerevisiae. Proc. Natl Acad. Sci. USA 82, 8557–8561 (1985).

60. Gomez, M. & Antequera, F. Organization of DNA replication origins in the fission yeast genome. EMBO J. 18, 5683–5690 (1999).

61. Lobry, J. R. Asymmetric substitution patterns in the two DNA strands of bacteria. Mol. Biol. Evol. 13, 660–665 (1996).

62. Maniatis, T. & Reed, R. The role of small nuclear ribonucleoprotein particles in pre-mRNA splicing. Nature 325, 673–678 (1987).

63. Fink, G. R. Pseudogenes in yeast? Cell 49, 5–6 (1987).

64. Robertson, H. M. The large srh family of chemoreceptor genes in Caenorhabditis nematodes reveals processes of genome evolution involving large duplications and deletions and intron gains and losses. Genome Res. 10, 192–203 (2000).

65. Graveley, B. R. Alternative splicing: increasing diversity in the proteomic world. Trends Genet. 17, 100–107 (2001).

66. Wolfe, K. H. & Shields, D. C. Molecular evidence for an ancient duplication of the entire yeast genome. Nature 387, 708–713 (1997).

67. Bowman, S. et al. The complete nucleotide sequence of chromosome 3 of Plasmodium falciparum. Nature 400, 532–538 (1999).

68. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).

69. Rubin, G. M. et al. Comparative genomics of the eukaryotes. Science 287, 2204–2215 (2000).

70. Apweiler, R. et al. The InterPro database, an integrated documentation resource for protein families, domains and functional sites. Nucleic Acids Res. 29, 37–40 (2001).

71. Chervitz, S. A. et al. Comparison of the complete protein sets of worm and yeast: orthology and divergence. Science 282, 2022–2028 (1998).

72. Hunt, C. et al. Subtelomeric sequence from the right arm of Schizosaccharomyces pombe chromosome I contains seven permease genes. Yeast 18, 355–361 (2001).

73. Kabsch, W. & Holmes, K. C. The actin fold. FASEB J. 9, 167–174 (1995).

74. Itoh, T., Matsuda, H. & Mori, H. Phylogenetic analysis of the third hsp70 homolog in Escherichia coli; a novel member of the Hsc66 subfamily and its possible co-chaperone. DNA Res. 6, 299–305 (1999).

75. Erickson, H. P. Atomic structures of tubulin and FtsZ. Trends Cell Biol. 8, 133–137 (1998).

76. Fisher, D. L. & Nurse, P. A single fission yeast mitotic cyclin B p34cdc2 kinase promotes both S-phase and mitosis in the absence of G1 cyclins. EMBO J. 15, 850–860 (1996).

77. Lopez-Girona, A., Furnari, B., Mondesert, O. & Russell, P. Nuclear localization of Cdc25 is regulated by DNA damage and a 14-3-3 protein. Nature 397, 172–175 (1999).

78. Lundin, L. G. Gene duplications in early metazoan evolution. Semin. Cell Dev. Biol. 10, 523–530 (1999).

Acknowledgements

We thank the European Commission, the Wellcome Trust and Cancer Research UK for financial support. We also thank all the many people in the fission yeast community for their comments and suggestions at all stages of this project, particularly M. Mitchison and U. Leupold, the founders of fission yeast studies. Cancer Research UK, London Research Institute, comprises Lincoln's Inn Fields and Clare Hall Laboratories of the former Imperial Cancer Research Fund following the merger of the ICRF with the Cancer Research Campaign in February 2002.

Competing interests statement

The authors declare that they have no competing financial interests.

Correspondence and requests for materials should be addressed to M.A.R. (e-mail: mar@sanger.ac.uk).


corrigenda

The genome sequence of Schizosaccharomyces pombe

V. Wood, R. Gwilliam, M.-A. Rajandream, M. Lyne, R. Lyne, A. Stewart, J. Sgouros, N. Peat, J. Hayles, S. Baker, D. Basham, S. Bowman, K. Brooks, D. Brown, S. Brown, T. Chillingworth, C. Churcher, M. Collins, R. Connor, A. Cronin, P. Davis, T. Feltwell, A. Fraser, S. Gentles, A. Goble, N. Hamlin, D. Harris, J. Hidalgo, G. Hodgson, S. Holroyd, T. Hornsby, S. Howarth, E. J. Huckle, S. Hunt, K. Jagels, K. James, L. Jones, M. Jones, S. Leather, S. McDonald, J. McLean, P. Mooney, S. Moule, K. Mungall, L. Murphy, D. Niblett, C. Odell, K. Oliver, S. O’Neil, D. Pearson, M. A. Quail, E. Rabbinowitsch, K. Rutherford, S. Rutter, D. Saunders, K. Seeger, S. Sharp, J. Skelton, M. Simmonds, R. Squares, S. Squares, K. Stevens, K. Taylor, R. G. Taylor, A. Tivey, S. Walsh, T. Warren, S. Whitehead, J. Woodward, G. Volckaert, R. Aert, J. Robben, B. Grymonprez, I. Weltjens, E. Vanstreels, M. Rieger, M. Schafer, S. Muller-Auer, C. Gabel, M. Fuchs, A. Dusterhoft, C. Fritzc, E. Holzer, D. Moestl, H. Hilbert, K. Borzym, I. Langer, A. Beck, H. Lehrach, R. Reinhardt, T. M. Pohl, P. Eger, W. Zimmermann, H. Wedler, R. Wambutt, B. Purnelle, A. Goffeau, E. Cadieu, S. Dreano, S. Gloux, V. Lelaure, S. Mottier, F. Galibert, S. J. Aves, Z. Xiang, C. Hunt, K. Moore, S. M. Hurst, M. Lucas, M. Rochet, C. Gaillardin, V. A. Tallada, A. Garzon, G. Thode, R. R. Daga, L. Cruzado, J. Jimenez, M. Sanchez, F. del Rey, J. Benito, A. Domínguez, J. L. Revuelta, S. Moreno, J. Armstrong, S. L. Forsburg, L. Cerutti, T. Lowe, W. R. McCombie, I. Paulsen, J. Potashkin, G. V. Shpakovski, D. Ussery, B. G. Barrell & P. Nurse

Nature 415, 871–880 (2002).

In this Article, the author Andreas Dusterhoft was mistakenly omitted: his name and affiliation (footnote 6) should have been inserted between M. Fuchs and C. Fritzc in the author list. In addition, the name of L. Cerutti (in the last line of the author list) was misspelled. On p874 in the penultimate sentence of the ‘Intergene regions’ section, “tandemly oriented genes” should read “divergently oriented genes.”


NATURE | VOL 421 | 2 JANUARY 2003 | www.nature.com/nature | 94 | © 2002 Nature Publishing Group


Submitted Publication

2. A re-annotation of the Saccharomyces cerevisiae genome.

Research Article

A Re-annotation of the Saccharomyces cerevisiae Genome

V. Wood*, K. M. Rutherford, A. Ivens, M.-A. Rajandream and B. Barrell

The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK

*Correspondence to: V. Wood, The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. E-mail: [email protected]

Received: 14 March 2001

Accepted: 19 April 2001

Abstract

Discrepancies in gene and orphan number indicated by previous analyses suggest that S. cerevisiae would benefit from a consistent re-annotation. In this analysis three new genes are identified and 46 alterations to gene coordinates are described. 370 ORFs are defined as totally spurious ORFs which should be disregarded. At least a further 193 genes could be described as very hypothetical, based on a number of criteria. It was found that disparate genes with sequence overlaps over ten amino acids (especially at the N-terminus) are rare in both S. cerevisiae and Sz. pombe. A new S. cerevisiae gene number estimate with an upper limit of 5804 is proposed, but after the removal of very hypothetical genes and pseudogenes this is reduced to 5570. Although this is likely to be closer to the true upper limit, it is still predicted to be an overestimate of gene number. A complete list of revised gene coordinates is available from the Sanger Centre (S. cerevisiae reannotation: ftp://ftp/pub/yeast/SCreannotation). Copyright © 2001 John Wiley & Sons, Ltd.

Keywords: annotation; Schizosaccharomyces pombe; Saccharomyces cerevisiae; comparative genomics; sequence orphans; hypothetical proteins

Introduction

Background

The publication in 1996 of the first complete eukaryotic genome sequence, that of Saccharomyces cerevisiae, heralded a new era in biology (Goffeau et al., 1996). This resource not only benefited those investigating S. cerevisiae, but also enabled inferences from the functional data to be transferred to a diverse range of other organisms. Unexpectedly, a significant proportion (56%) of annotated genes had not been studied previously, despite more than 50 years of traditional biochemistry and genetics (Oliver et al., 1992, Oliver, 1996, Mewes et al., 1997). This observation stimulated the application of functional genomics technologies to characterise these genes and their products, either gene-by-gene in small laboratories, or on a larger scale in some research institutes (Hieter and Boguski, 1997).

In the five years since the S. cerevisiae genome was sequenced, the majority (70%) of the predicted genes have been assigned an initial functional characterisation in the Yeast Protein Database (YPD, Proteome Inc. http://www.proteome.com/databases/index.html). Establishing the functional inter-relationships between all the genes in a genome requires, in the first instance, the assignment of genes to preliminary functional classes. These initial assignments will authenticate predicted genes as coding entities and partition the data into categories for subsequent biological analyses. However, it will be difficult to assess when this milestone has been reached since the exact number of genes in S. cerevisiae is still unclear: the Munich Information Center for Protein Sequences (MIPS) database has a protein complement of 6368 (http://www.mips.biochem.mpg.de/proj/yeast), the Saccharomyces Genome Database (SGD) has 6310 (http://genome-www.stanford.edu/Saccharomyces/), and YPD has 6142 as of 26 January 2001.

Comparative and Functional Genomics. Comp Funct Genom 2001; 2: 143–154. DOI: 10.1002/cfg.86

It is likely that a significant cause of the discrepancies between these gene numbers is small, fortuitously occurring ORFs (open reading frames), which are notoriously difficult to distinguish from real genes (Dujon, 1996). In the original S. cerevisiae annotation, only ORFs greater than 100 amino acids in size were considered. This threshold was imposed in order to reduce the chance of missing small proteins without over-prediction due to the statistically expected frequency of small ORFs (Sharp and Cowe, 1991). Those without assigned function or homologues were designated sequence orphans (Dujon, 1996). As genome sequencing proceeded, the ratio of orphans to ORFs with homologues increased rapidly, so much so that this phenomenon was termed ‘The mystery of orphans’ (Dujon, 1996).
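The reasoning behind a length threshold can be illustrated with a rough calculation. The sketch below is not taken from the original annotation work: it assumes a deliberately simple null model in which bases are independent and the only parameter is the G+C fraction, and the function names are hypothetical. Under that assumption it estimates how likely a run of codons is to remain free of stop codons purely by chance.

```python
# Chance of an open (stop-free) run of codons in random DNA.
# Assumption (illustrative only, not from the paper): bases are independent,
# A and T each occur with frequency (1 - gc) / 2, G and C each with gc / 2.


def stop_codon_probability(gc: float) -> float:
    """Probability that one random codon is TAA, TAG or TGA."""
    at = (1.0 - gc) / 2.0  # frequency of A, and of T
    cg = gc / 2.0          # frequency of G, and of C
    taa = at * at * at
    tag = at * at * cg
    tga = at * cg * at
    return taa + tag + tga


def open_run_probability(n_codons: int, gc: float = 0.5) -> float:
    """Probability that n_codons consecutive random codons contain no stop."""
    return (1.0 - stop_codon_probability(gc)) ** n_codons


if __name__ == "__main__":
    # A G+C fraction of 0.40 is used here only as an illustrative value.
    for n in (35, 100, 250):
        print(n, open_run_probability(n, gc=0.40))
```

Under this model the probability falls geometrically with length, which is why short ORFs are expected to occur fortuitously far more often than long ones. Real genomes depart from the independence assumption, so the numbers should be read as orders of magnitude only.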

The existence of relatively high numbers of orphans can only be attributed to one or a combination of the following:

1. They may simply be spurious ORFs. In S. cerevisiae, a number of predicted genes are also completely or substantially overlapping with defined coding features and should therefore be disregarded.

2. They may arise due to the acquisition of novel species-specific functions.

3. They escape functional characterisation by homology because they are rapidly evolving.

4. Identifiable homologues in other organisms exist, but these have not yet been sequenced.

The question of S. cerevisiae gene number has been addressed many times, with differing outcomes. Mackiewicz et al. (1999) estimated the total number of protein coding ORFs to be 4800, based on their sequence properties. Zhang and Wang (2000) calculated the likely number to be ≤5645, based on the assumption that unknown genes have similar statistical properties to known genes. As part of the Genolevures project, Blandin et al. (2000) performed a consistent re-annotation of the S. cerevisiae genome using uniform criteria, revealing 50 possible novel genes and 26 gene extensions. They proposed a protein coding gene set of at least 5600 genes. As part of the same initiative, Malpertuy et al. (2000) estimate that the S. cerevisiae genome contains 5651 actual protein coding genes (including the 50 new predictions), and that the public databases contain 612 predicted ORFs that are not protein coding.

The availability of an additional yeast genome, that of Schizosaccharomyces pombe (fission yeast), which has 99.5% of its coding sequence annotated and deposited in the EMBL database (manuscript in preparation), will allow the comparison of the complete genomes and proteomes of two well-studied unicellular eukaryotes, which diverged around 330 million years ago (Berbee and Taylor, 1993).

Aims

The discrepancies in gene and orphan numbers proposed by previous analyses suggested that S. cerevisiae would benefit from a consistent reannotation, applying new analytical methods and incorporating the data which have become available over the last four years. In doing so, we wished to achieve:

1. The refinement of gene complement.

2. The classification of orphans into hypothetical, very hypothetical, and spurious ORFs which should be disregarded.

3. The identification of gene prediction errors.

4. The identification of new genes.

The Sz. pombe genome annotation effort has benefited immensely from the availability of the complete genome of S. cerevisiae. The analysis methods used for the Sz. pombe genome combine ab initio gene prediction algorithms and homology search results with rigorous manual inspection of biological context (Xiang et al., 2000). In addition, consistency checks using available cDNAs and ESTs have been routinely performed, and new experimental data from the fission yeast community immediately incorporated into the dataset. We believe that these methods provide an accurate, detailed gene set for this organism. The Sz. pombe analysis procedure has been applied to S. cerevisiae in order to define an up-to-date non-redundant gene set with consistent annotations and a new estimate of orphan numbers.

Methods

DNA sequences

The sequences of the 16 S. cerevisiae chromosomes, and associated ORF translations, were downloaded from SGD on the 16th November 2000. ORF coordinates were then converted into EMBL feature table format and imported into the Artemis sequence analysis and annotation tool (Rutherford et al., 2000).
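As an illustration of what such a conversion involves, the following minimal sketch writes a CDS entry in the EMBL feature table layout (feature key starting in column 6, location in column 22). It is not the script used in this study: the function names are hypothetical, the choice of a /gene qualifier is an assumption, and wrapping of long locations across lines is not handled.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # (start, end), 1-based inclusive coordinates


def embl_location(segments: List[Segment], strand: str) -> str:
    """Build an EMBL location string from one or more coordinate segments."""
    parts = [f"{start}..{end}" for start, end in sorted(segments)]
    location = parts[0] if len(parts) == 1 else "join(" + ",".join(parts) + ")"
    # Features on the reverse strand are wrapped in complement(...)
    return f"complement({location})" if strand == "-" else location


def embl_cds_feature(name: str, segments: List[Segment], strand: str) -> str:
    """Return the two feature table lines for one CDS (hypothetical helper)."""
    key_line = "FT   " + "CDS".ljust(16) + embl_location(segments, strand)
    qualifier_line = "FT" + " " * 19 + f'/gene="{name}"'
    return key_line + "\n" + qualifier_line


if __name__ == "__main__":
    # Coordinates for YAL044W-a as listed in Table 1; strand "+" is assumed
    # from the W (Watson) suffix of the systematic name.
    print(embl_cds_feature("YAL044W-a", [(57520, 57852)], "+"))
```

A file of such lines, appended to a sequence entry, is the kind of input that an annotation tool reading EMBL format would expect.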


Analysis procedure

A number of standard analysis tools were used to assist the interpretation of the sequence data (as applied to the Sz. pombe genome) (Xiang et al., 2000). Searches were performed against public databases (SWISS-PROT and TrEMBL (Bairoch and Apweiler, 1999), EMBL (Stoesser et al., 1999), Pfam (Bateman et al., 1999), and PROSITE (Bairoch, 1994)) using standard software (BLAST (Altschul et al., 1990), MSPcrunch (Sonnhammer and Durbin, 1994), tRNAscan (Lowe and Eddy, 1997), FASTA (Pearson and Lipman, 1988) and Genewise (Birney et al., 1996)), to complete a series of automated analyses. This enabled annotated DNA and protein features to be confirmed. Other elements not included in the SGD annotation (experimentally identified snoRNAs and other cellular RNAs, omitted LTRs, and protein domains), were also mapped onto the sequence using in-house Perl scripts. De novo gene predictions were not performed as part of this analysis.

New genes

In the Sz. pombe genome, more than 300 genes have been identified which are conserved at the protein level in other organisms, but absent from the S. cerevisiae dataset (manuscript in preparation). Some of these were small genes (70–150 amino acids); TBLASTN searches were conducted to determine whether these small genes had been omitted from the initial S. cerevisiae gene predictions.

New gene coordinates

Within the annotation tool Artemis (Rutherford et al., 2000), FASTA alignments were performed on existing gene predictions, to assess their accuracy. Overlapping ORFs were subject to systematic manual inspection to determine whether the correction of frameshifts or sequencing errors could extend homology, by merging existing genes or increasing their length.
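Merged features of this kind are recorded as comma-separated coordinate ranges (the style used in Table 1). The sketch below, which is illustrative and not part of the original analysis, parses such a string and reports how many bases are skipped at each junction. The interpretation in `classify_gap` is an assumption made for the example: a skipped multiple of three keeps the reading frame (for instance a bypassed stop codon), whereas any other gap or an overlap changes it.

```python
from typing import List, Tuple

Segment = Tuple[int, int]


def parse_segments(coords: str) -> List[Segment]:
    """Parse 'a..b,c..d' into [(a, b), (c, d)]."""
    segments = []
    for part in coords.split(","):
        start, end = part.split("..")
        segments.append((int(start), int(end)))
    return segments


def junction_gaps(segments: List[Segment]) -> List[int]:
    """Bases skipped between consecutive segments (negative means overlap)."""
    return [nxt[0] - prev[1] - 1 for prev, nxt in zip(segments, segments[1:])]


def classify_gap(gap: int) -> str:
    """Hypothetical labelling: does the junction preserve the reading frame?"""
    return "in frame" if gap % 3 == 0 else "frame change"


if __name__ == "__main__":
    # Coordinates of the YAL065C+YAL064C-A entry in Table 1.
    coords = "11566..13173,13177..13362,13367..13744"
    for gap in junction_gaps(parse_segments(coords)):
        print(gap, classify_gap(gap))
```

A summary of this sort could help prioritise which proposed merges most need resequencing, although the manual inspection described above remains the deciding step.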

Disregarded spurious ORFs, overlapping with real genes

ORFs which have all, or the majority of their translation overlapping with other annotated features, were individually assessed for similarity to all organisms, as described in New gene coordinates above, together with experimental data if available.

For ORFs to be considered as spurious, they had to meet all of the following criteria:

1. Small size (35–250 amino acids).

2. Absence of similarity to known proteins.

3. Absence of functional data which could not have been generated by the real overlapping gene.

4. Greater than 25% overlap at the N-terminus or 50% overlap at the C terminus with another coding feature; overlap with another feature at both ends; or ORF containing a tRNA.

Transposon fragments were also removed.
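The four criteria above can be expressed as a single predicate. The following is a minimal sketch only: in the study each ORF was assessed individually by manual inspection, and the record type, field names and the reading of "50% overlap" as "greater than 50%" are assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class OrfEvidence:
    """Hypothetical summary of the evidence collected for one ORF."""
    length_aa: int
    has_similarity: bool                   # similarity to known proteins
    has_independent_functional_data: bool  # data not explained by the overlapping gene
    n_terminal_overlap: float              # fraction of the ORF overlapped at the N-terminus
    c_terminal_overlap: float              # fraction of the ORF overlapped at the C-terminus
    contains_trna: bool = False


def is_spurious(orf: OrfEvidence) -> bool:
    """True only if all four criteria are met."""
    small = 35 <= orf.length_aa <= 250
    overlapping = (
        orf.n_terminal_overlap > 0.25
        or orf.c_terminal_overlap > 0.50
        or (orf.n_terminal_overlap > 0 and orf.c_terminal_overlap > 0)
        or orf.contains_trna
    )
    return (
        small
        and not orf.has_similarity
        and not orf.has_independent_functional_data
        and overlapping
    )


if __name__ == "__main__":
    candidate = OrfEvidence(120, False, False, 0.40, 0.0)
    print(is_spurious(candidate))
```

Requiring all criteria together is conservative: an ORF with any similarity or independent functional evidence is retained regardless of how extensively it overlaps another feature.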

Very hypothetical ORFs

In Sz. pombe, 177 ORFs which are considered unlikely to be coding but cannot yet be dismissed as spurious have been assigned as very hypothetical according to the following criteria:

1. Small size (100–250 amino acids).

2. Absence of similarity to other known proteins.

3. Overlap with other features, particularly at the N-terminus, where they might interfere with promoters (the overlaps in these cases are smaller than those observed in disregarded ORFs).

4. Extreme GC content.

The annotation of Sz. pombe adequately discriminates between very hypothetical proteins and real genes and this approach has been applied to a re-annotation of the S. cerevisiae genome.

Results

New genes identified

Three new genes were identified: (1) YBL071W-a, a hypothetical conserved protein (simultaneously identified by Blandin et al.); (2) YAL044W-a, the homologue of Sz. pombe uvi31; and (3) YDL085C-a, the homologue of the human 4F5S disease-associated gene. The new genes and coordinates are listed in Table 1.

New coordinates (merged or extended genes)

The complete list of 46 proposed alterations to gene coordinates is presented in Table 1. Some of these changes have already been confirmed experimentally and deposited in the SWISS-PROT database (Bairoch and Apweiler, 1999) but may correspond to mutations in the sequenced strain. However, fragments pertaining to the same sequence should be represented as a single feature in the public databases. In addition to increased homology, data from YPD indicate identical phenotypes and expression patterns for some of these proposed merges. For example, PRM7+YDL038+YDL037 have the same transcript profile (repressed by methylmethanesulphonate). Some of these proposed gene extensions make previously predicted ORFs spurious.

Table 1. New, merged or extended genes

Chr.  CDS                       Comment     New coordinates
I     YAL044W-a                 New gene    57520..57852
II    YBL071W-a                 New gene    89973..90221
IV    YDL085C-a                 New gene    302463..302669
I     YAL065C+YAL064C-A         1           11566..13173,13177..13362,13367..13744
I     YAR066W+YAR068W           1           221035..221643,221647..222930
I     IMD1                      1 or 2      227728..228844,228846..229303
II    VMA2                      extended    491228..492781
III   YCR099C+YCR100C+YCR101C   1           300825..301292,301294..302463,302478..303023
III   YCL001W-A                 1 or 2      113074..113532,113549..113641,113644..114018
III   YCL069W                   1 or 2      9427..9459,9463..11082
IV    PRM7+YDL038C+YDL037C      1 or 2      81982..382327,382330..384596,384600..385583
IV    YDR134C                   1           721064..721474
IV    YDR474C+YDR475C           1 or 2      1407453..1409475,1409475..1409661,1409661..1410081
IV    TTR1                      trimmed     1471122..1471451
V     YER066W                   1 or 2      289637..290797
V     HVG1+YER039C-A            1 or 2      228455..229480
V     YEL077C+YEL076C+YEL075C   1           264..4095,4097..4553,4553..5117,5134..5418,5420..5875
V     YER189W+YRF1-2            1           571150..571463,571465..576520
V     KHS1                      1           565667..565792,565796..566398
VI    AAD6+YFL057C              1 or 2      14305..14919,14919..15431
VI    BLM3+YFL006W              1 or 2      123474..128885,128885..129904
VI    YFL064C+65C+66C           1           1..1516,1437..3008,3033..3338,3340..3846
VI    YFR012W                   1 or 2      167880..168488,168487..169113,169116..169301
VI    YFL042C+YFL043C           1 or 2      45720..46157,46156..47745
VIII  YHR218W+YHR219W           1 or 2      557819..557854,557857..558606,558712..559820,559822..560167,560169..562043
VIII  YHL049C+YHL050C           1 or 2      445..2283,2285..3700,3728..4540
VIII  YHR214W                   1 or 2      541503..542255,542259..543542
IX    SDL1+YIL167W              1 or 2      29032..29412,29416..30048
IX    NIT1+YIL165C              1 or 2      33718..34086,34090..34686
IX    YIR043C+YIR044C           1 or 2      437040..437990,437994..438176
IX    YIL177C                   1           483..4988,4987..6147
IX    HXT12                     2           19515..19805,19808..21220
X     PRM10+YJL107C             1 or 2      217402..218554,218554..219713
XI    PKT1+YKT9                 1 or 2      68270..69079,69081..70220
XI    YKL002W                   new splice  437416..437475,437544..438182
XI    YKL033W-A                 1 or 2      74144..374305,374308..374853
XII   AQY2+YLL053C              1 or 2      35502..36044,36043..36360
XII   YRF1-4+YLR462W+YLR464W    1           1065951..1066556,1066567..1067079,1067082..1071230
XII   SDC25+YLL017W             2           112233..112502,112504..115992
XIII  YMR084W+YMR085W           1 or 2      436627..437412,437415..438788
XV    YOL162W+YOL163W           1 or 2      9596..10102,10106..10765
XV    VPS5                      extended    453769..453801,453804..455795
XV    YOL048C                   new splice  240202..240945,241024..241308
XV    ABP140                    2           784855..785685,785687..786742
XVI   YPL275W+YPL276W           1 or 2      17948..18382,18386..18416,18418..19079
XVI   YPL277C+YPL278C           1 or 2      15053..15492,15494..16868

1 Pseudogene
2 Possible sequencing error (frameshift or stop codon) or real mutation

Disregarded (spurious) ORFs

Using the criteria described in Methods, 370 ORFs were disregarded (Table 2 and see http://www.sanger.ac.uk/Projects/S_cerevisiae/spurious.shtml). In agreement with Blandin et al. (2000), the ORFs which correspond to SAGE tags within LTRs have been reclassified as spurious.

Orphans—very hypothetical

The discrimination between S. cerevisiae very hypothetical proteins and orphans which are more likely to be coding suggests 193 S. cerevisiae CDS should be described as very hypothetical ORFs (after the removal of ORFs which should be disregarded). Of these, 72 exhibit an overlap with another CDS (Table 3 and see http://www.sanger.ac.uk/Projects/S_cerevisiae/veryhypothet.shtml).

The range and average G+C content were calculated for the fully partitioned ORFs on chromosomes I–V. ORFs were partitioned as: Real (characterised or well-conserved) = R; Sequence Orphans (possibly coding) = O; and Very Hypothetical (unlikely coding) = V. The mean G+C contents for the partitioned ORF sets R : O : V are 40.24 : 40.37 : 38.84 respectively, which indicates there may be compositional differences between them. Even though the range of G+C for the very hypothetical proteins is smaller than for real (23.29 vs. 27.07), the sample standard deviation is greater (V = 5.06; R = 3.47).
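The summary statistics quoted here (mean, range and sample standard deviation of G+C content per ORF class) can be computed as in the following sketch. It is an illustration rather than the code used for the analysis; the function names are hypothetical and the input values in the usage example are invented, not data from this study.

```python
from statistics import mean, stdev
from typing import Dict, List


def gc_percent(sequence: str) -> float:
    """G+C content of a DNA sequence as a percentage of unambiguous bases."""
    sequence = sequence.upper()
    gc = sum(1 for base in sequence if base in "GC")
    acgt = sum(1 for base in sequence if base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / acgt


def summarise(gc_values: List[float]) -> Dict[str, float]:
    """Mean, range (max minus min) and sample standard deviation."""
    return {
        "mean": mean(gc_values),
        "range": max(gc_values) - min(gc_values),
        "sd": stdev(gc_values),  # sample standard deviation (n - 1 denominator)
    }


if __name__ == "__main__":
    # Invented example values for one partition of ORFs.
    example = [38.0, 40.0, 42.0]
    print(summarise(example))
```

Comparing range and standard deviation side by side, as in the text, is useful because the range reflects only the two most extreme ORFs whereas the standard deviation reflects the spread of the whole set.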

Discussion

Novel genes

The three novel genes predicted by this analysis have now been incorporated into the MIPS database (M. Muensterkoetter, MIPS, pers comm).

Blandin et al. predicted 49 additional novel genes using interspecies sequence conservation, but some of these proposed new genes are spurious and others could be labelled very hypothetical using the criteria outlined in Methods. Some of these are predicted due to other non-CDS features. Others are extensions to existing genes. For example, YMR013wa is overlapped completely by a cellular RNA, YGL258w is part of VPS5, and YER039ca is part of HVG1. Other gene predictions from this dataset extend beyond the newly proposed coding region, and may correspond to regulatory regions, or to as yet undiscovered cellular RNAs. For example, YDL159wa is predicted to code for a 43 amino acid peptide (129 base pairs corresponding to the largest ORF) but the region of high similarity extends over 391 base pairs. Some predictions are derived from translations between 28 and 99 amino acids in length, and correspond to low complexity DNA sequence, often with only one species homologue. There are attendant risks in defining a CDS solely from an ORF and a statistically significant BLAST score (particularly with closely related organisms), as this may not always be biologically significant, or may pertain to a non-CDS feature. These predicted ORFs have been added to the Sanger annotation as miscellaneous features and will require further analysis before inclusion in the protein set.

Merged and extended genes

Of the 46 alterations we propose, eight belong to subtelomeric duplicated elements and are possibly pseudogenes. For the remainder, those sequences not already confirmed or corrected by the sequencing of the genomic DNA will require resequencing for verification. However, frameshifts may still persist in the sequenced strain due to mutations.

Disregarded spurious ORFs

Of the 370 genes proposed here to be disregarded ORFs, 227 were also predicted as unlikely to be coding by Zhang and Wang (2000). However, the Zhang and Wang analysis did not adequately differentiate between coding and non-coding sequences when applied to ORFs which were not in the questionable category of the MIPS database. Here, 18 of the 46 ORFs predicted to be non-coding for chromosomes I and II are now either functionally characterised (YPD) or conserved in distantly related organisms.

Malpertuy et al. (2000) propose that 91 of the ORFs annotated by MIPS as questionable (because they largely overlap other features) are actually real, based on similarity to the recently sequenced hemiascomycetes. We propose all of these ORFs should be disregarded as they will generate apparently


Table 2. Spurious ORFs within other CDS or sequence features

Spurious ORF  Overlapping feature  Spurious ORF  Overlapping feature  Spurious ORF  Overlapping feature
YAL004W       SSA1                 YDL034W       GPR1                 YER119C-A     SCS2
YAL035C-A     MTW1                 YDL041W       SIR2                 YER138W-A     LTR
YAL043C-A     ERV46                YDL050C       LHP1                 YER181C       LTR
YAL045C       new UVI31            YDL062W       YDL063C              YFL013W-A     YFR055W
YAL058C-A     CNE2                 YDL068W       CBS1                 YFL067W       YFL013C
YAL064W-B     YAL065C              YDL094C       PMT5                 YFL068W       YFL066C
YAR009C       ty fragment          YDL096C       PMT1                 YFR024C       YFL066C
YAR010C       ty fragment          YDL118W       YDL119C              YFR056C       YFR024C-A
YBL012C       SCT1                 YDL151C       RPC53                YGL024W       PGD1
YBL053W       SAS3                 YDL152W       SAS10                YGL042C       DST1
YBL062W       SKT5                 YDL158C       STE7                 YGL069C       YGL068W
YBL065W       SEF1                 YDL163W       CDC9                 YGL072C       HSF1
YBL070C       AST1                 YDL187C       YDL186W              YGL074C       HSF1
YBL073W       AAR2                 YDL221W       CDC13                YGL102C       RPL28
YBL077W       ILS1                 YDL228C       SSB1                 YGL109W       YGL108C
YBL083C       RHK1                 YDR008C       TRP1                 YGL132W       YGL131C
YBL094C       YBL095W              YDR034C-A     LTR                  YGL152C       CUP2 PEX14
YBL096C       YBL095W              YDR048C       YDR049W              YGL165C       CUP2
YBL100C       ATP1                 YDR053W       DBF4                 YGL168W       PMR1
YBL107W-A     LTR                  YDR094W       YDR093W              YGL199C       YGL198W
YBR051W       REG2                 YDR112W       YDR111C              YGL214W       SKI8
YBR064W       YBR063C              YDR133C       YDR134C              YGL217C       KIP3
YBR089W       POL30                YDR149C       NUM1                 YGL218W       YGL219c
YBR090C       NHP6B                YDR154C       CHP1                 YGL235W       MTO1
YBR099C       MMS4                 YDR187C       CCT6                 YGL239C       CSE1
YBR113W       CYC8                 YDR193W       NUP42                YGR011W       YGR010W
YBR116C       TKL2                 YDR199W       YDR200C              YGR018C       YGR017W
YBR124W       TFC1                 YDR203W       RAV2                 YGR022C       MLT1
YBR174C       YBR175W              YDR230W       COX20                YGR064W       SPT4
YBR206W       KTR3                 YDR241W       LTR (29.16GC)        YGR073C       SMD1
YBR219C       YBR220C              YDR269C       CCC2                 YGR114C       SPT6
YBR224W       YBR223C              YDR271C       CCC2                 YGR115C       SPT6
YBR226C       YBR225W              YDR278C       tRNA                 YGR122C-A     LTR
YBR232C       PBP2                 YDR290W       RTT103               YGR137W       YGR136W
YBR266C       YBR267W              YDR327W       SKP1                 YGR151C       RSR1
YBR277C       DPB3                 YDR340W       tRNA and LTR         YGR160W       NSR1
YCL022C       KCC4                 YDR355C       NUF1                 YGR164W       tRNA
YCL023C       KCC4                 YDR360W       VID21                YGR176W       ATF2
YCL041C       PDI1                 YDR366C       LTR                  YGR190C       HIP1
YCL042W       GLK1                 YDR396W       NCB2                 YGR219W       MRPl9
YCL046W       YCL045C              YDR401W       DIT2                 YGR226C       AMA1
YCL074W       ty fragment          YDR413C       YDR412W              YGR228W       SMI1
YCL075W       ty fragment          YDR417C       RPL12B               YGR242W       YAP1802
YCL076W       ty fragment          YDR426C       YDR425W              YGR259C       TNA1
YCR013C       PGK1                 YDR431W       YDR430C              YGR265W       MES1
YCR018C-A     LTR                  YDR433W       NPL3                 YGR290W       MAL11
YCR041W       MATALPHA1            YDR442W       SSN2                 YHR125W       LTR
YCR049C       ARE1                 YDR445C       YDR444W              YHR145C       LTR
YCR050C       ARE1                 YDR455C       NHX1                 YHR214W-A     YHR214
YCR064C       YCR063W              YDR467C       YDR466W              YIL060W       delta LTR
YCR087W       YCR087C-A            YDR509W       YDR509C              YIL080W       YIL082W
YDL009C       YDL010W              YDR521W       YDR520C              YJL007C       tRNA and LTR
YDL011C       YDL010W              YDR526C       YDR527W              YJL009W       CCT8
YDL016C       CDC7                 YDR537C       PAD1                 YJL018W       YJL019W
YDL023C       GPD1                 YEL076C-A     YEL076C              YJL022W       PET130
YDL026W       YDL027C              YEL076W-C     YEL077C              YJL032W       BET4
YDL032W       YDL033C              YER066C-A     YER067W              YJL075C       NET1

148 V. Wood et al.

Copyright © 2001 John Wiley & Sons, Ltd. Comp Funct Genom 2001; 2: 143–154.

Table 2. Continued

Spurious ORF | Overlapping feature | Spurious ORF | Overlapping feature | Spurious ORF | Overlapping feature
YJL086C | YJL087C | YLR261C | YPT6 | YNL170W | PDS1
YJL119C | YJL118W | YLR279W | YLR281C | YNL171C | APC1
YJL142C | YAK1 | YLR280C | YLR281C | YNL174W | NOP13
YJL152W | YJL151C | YLR282C | YLR283W | YNL184C | MRPL19
YJL169W | SET2 | YLR317W | TAD3 | YNL203C | SPS19
YJL175W | SWI3 | YLR322W | SFH1 | YNL205C | SPS18
YJL188C | RPL39 | YLR331C | MID2 | YNL226W | YNL227C
YJL195C | CDC6 | YLR334C | LTR | YNL228W | YN227
YJL202C | PRP21 | YLR338W | VRP1 | YNL235C | SIN4
YJL211C | PEX2 | YLR339C | RPP0 | YNL266W | IST1
YJL220W | FSP2 | YLR349W | DIC1 | YNL276C | MET2
YJR018W | ESS1 | YLR358C | RSC2 | YNL285W | LTR and Trna
YJR020W | TES1 | YLR374C | STP1 | YNL296W | MON2
YJR023C | LSM8 | YLR379W | SEC61 | YNL319W | HXT14
YJR037W | HUL4 | YLR428C | CRN1 | YNL324W | FIG4
YJR038C | YJR039w | YLR434C | YLR435W | YNR005C | VPS27
YJR071W | YJR070C | YLR444C | ECM7 | YNR042W | COQ2
YJR079W | YJR080C | YLR458W | NBP1 | YOL013W-A | LTR
YJR087W | YJR088C | YLR463C | YRF1-4 | YOL035C | YOL036W
YJR128W | ZMS2 | YLR465C | YRF1-4 | YOL037C | YOL036W
YKL030W | MAE1 | YML010C-B | SPT5 | YOL046C | YOL045W
YKL036C | YKL037W UGP1 | YML010W-A | SPT5 | YOL050C | GAL11
YKL053W | YKL052C | YML013C-A | YML013W | YOL079W | REX4
YKL076C | YKL075C | YML035C-A | SRC2 | YOL099C | PKH2
YKL083W | YKL082C | YML048W-A | PRM6 | YOL106W | LTR
YKL111C | ABF1 | YML058C-A | CMP2 | YOL134C | HRT1
YKL118W | VPH2 | YML095C-A | GIM5 | YOL150C | GRE2
YKL123W | SSH4 | YML100W-A | ARG81 | YOR053W | YOR054C
YKL131W | RMA1 | YML102C-A | CAC2 | YOR055W | YOR054C
YKL136W | ALP2 | YML117W-A | YML117W | YOR068C | new VPS5
YKL147C | YKL146W | YMR046W-A | LTR | YOR082C | YOR083W
YKL153W | GPM1 | YMR052C-A | FAR3 and STB2 | YOR102W | OST2
YKL169C | MRPL38 | YMR075C-A | YMR075W | YOR105W | YOR104W
YKL177W | STE3 | YMR119W-A | YMR119W | YOR121C | GCY1
YKL202W | MNN4 | YMR135W-A | YMR135C | YOR135C | IDH2
YKR012C | PYR2 | YMR153C-A | NUP53 | YOR139C | SFL1
YKR033C | DAL80 | YMR158C-B | LTR | YOR146W | YOR145C
YKR035C | FTI1 | YMR158W-A | APG16 | YOR169C | GLN4
YKR047W | NAP1 | YMR172C-A | HOT1 | YOR170W | LCB4
YLL020C | KNS1 | YMR173W-A | DDR48 | YOR200W | PET56
YLL037W | PRP19 | YMR193C-A | YMR194W | YOR203W | HIS3 and DED1
YLL044W | RPL8B | YMR244C-A | YMR245W | YOR218C | RFC1
YLL047W | RNP1 | YMR290W-A | HAS1 | YOR225W | ISU2
YLR041W | YLR040C | YMR294W-A | YMR295C | YOR248W | SRL1
YLR062C | RPL22-A | YMR306C-A | FKS3 | YOR263C | YOR264W
YLR076C | RPL10 | YMR316C-A | DIA1 | YOR277C | CAF20
YLR101C | ERG27 | YMR316C-B | YMR317W | YOR282W | PLP2
YLR140W | RRN5 | YNL013C | YNL014W | YOR300W | BUD7
YLR169W | APS1 | YNL017C | Trna | YOR309C | NOP58
YLR171W | APS1 | YNL043C | YIP3 | YOR325W | YOR324C
YLR198C | SIK1 | YNL057W | YNL058C | YOR331C | VMA4
YLR202C | YLR201C | YNL089C | RHO2 | YOR333C | MRS2
YLR217W | CPR6 | YNL105W | INP52 | YOR345C | REV1
YLR230W | CDC42 | YNL109W | YNL108C | YOR364W | YOR365C
YLR232W | YLR231C | YNL114C | RPC19 | YOR366W | YOR365C
YLR235C | TOP3 | YNL120C | YNL119W | YOR379C | YOR378W
YLR252W | YLR251W | YNL140C | RLR1 | YPL035C | YPL034W

Reannotation of the S. cerevisiae genome 149


significant, but spurious, TBLASTX matches to alternative frames of the real gene (anti-sense or sense-different reading frame).

Of the ORFs previously defined as questionable but now proposed to be coding by Malpertuy et al. (2000), and retrievable from the Genolevures website (http://cbi.labri.u-bordeaux.fr/Genolevures/Genolevures.php3), at least 107 out of 136 occur in overlapping pairs. These pairs have two significant TBLASTX hits when the ascomycete DNA is compared to the S. cerevisiae predicted protein set: the best score belongs to the real coding sequence, and a lower score is generated by the overlap with the spurious ORF and an alternative translation of the closely related organism's DNA. This is illustrated by the three pairs of overlapping genes YGR220C/YGR219w, YOR054c/YOR055w, and YDR443C/YDR442w in Table 4 (data from the Genolevures website). The correct reading frame should also be apparent if levels of synonymous and nonsynonymous nucleotide substitution are calculated for the aligned regions.
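The substitution check suggested above can be sketched roughly as follows. This is only an illustration, not the procedure used in the analysis: it counts aligned codon pairs that differ synonymously or nonsynonymously in a chosen frame, using the standard genetic code. The function name `count_codon_changes` and the two short sequences are hypothetical, and a proper analysis would also normalise by the number of synonymous and nonsynonymous sites.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_codon_changes(seq_a, seq_b, frame=0):
    """Count (synonymous, nonsynonymous) codon differences between two
    gap-free aligned DNA sequences, read in the given frame (0, 1 or 2)."""
    syn = nonsyn = 0
    for i in range(frame, min(len(seq_a), len(seq_b)) - 2, 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue  # identical codons carry no information here
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue  # skip codons with ambiguous bases
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


# Hypothetical aligned fragments: all differences are silent in frame 0.
seq_a = "ATGGCTAAAGGT"
seq_b = "ATGGCCAAGGGA"
for frame in range(3):
    print(frame, count_codon_changes(seq_a, seq_b, frame))
```

In a genuine coding frame, synonymous changes would be expected to dominate; in the frame of a spurious overlapping ORF the same nucleotide differences tend to appear as amino acid replacements.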

After the merging of sequences identified in Merged and extended genes (Table 1), only eight genes of known or inferred function in the entire S. cerevisiae genome remain overlapping. The overlaps and their orientations are listed in Table 5. The longest overlaps observed were 55 and 34 amino acids, which are possibly attributable to sequencing anomalies, or deletions; the other six are 10 amino acids or less, and predominantly C-terminal.
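As a rough sketch of how such overlaps can be measured, the snippet below converts the shared nucleotide span of two CDS features into amino acids and flags overlaps above 10 amino acids, the cut-off mentioned in the text. The function names and coordinates are hypothetical, and the sketch assumes unspliced features given as 1-based inclusive (start, end) positions on the same sequence.

```python
def overlap_in_amino_acids(cds_a, cds_b):
    """Return the overlap between two (start, end) CDS features,
    expressed in amino acids (nucleotides / 3)."""
    start = max(cds_a[0], cds_b[0])
    end = min(cds_a[1], cds_b[1])
    overlap_nt = max(0, end - start + 1)  # zero when the features are disjoint
    return overlap_nt / 3


def is_large_overlap(cds_a, cds_b, threshold_aa=10):
    """True when the overlap exceeds the chosen amino acid threshold."""
    return overlap_in_amino_acids(cds_a, cds_b) > threshold_aa


# Hypothetical coordinates: a 30 nt overlap and a 165 nt overlap.
print(overlap_in_amino_acids((100, 400), (371, 700)))
print(is_large_overlap((100, 400), (236, 700)))
```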

Overlapping CDS features are also rare in Sz. pombe. Of 4189 genes which are characterised or conserved, only three pairs have an overlap greater than 10 amino acids in length, none of which were at the N terminus. Moreover, since the completion of the S. cerevisiae genome, no function or biologically significant similarity to any other sequenced organism has been observed for any of the largely overlapping ORFs designated here as spurious. This is despite the major efforts of EUROFAN and other functional genomics studies to determine the function of every yeast gene, and the exponential increase in protein sequences deposited in the public databases.

Considering the rarity of overlapping genes in both yeasts, and the absence of unequivocal functional evidence in support of the coding integrity of any of the spurious ORFs which are wholly or largely overlapping real genes, the likelihood that any encode proteins is minimal. Therefore, it would be prudent to remove them completely from the genome totals and label them accordingly in the public databases.

Very hypothetical proteins

One advantage of discriminating between sequence orphans likely to be coding, and very hypothetical orphans, is that these regions of DNA can be easily partitioned as a subset, facilitating the identification of other features by bioinformatics analyses.

Despite over 1000 Sz. pombe experimental gene characterisations, only three of the 177 Sz. pombe ORFs annotated as 'very hypothetical protein' have so far been shown to be protein coding. One of these, git11, is a 76 amino acid protein, and is below the threshold imposed on length, but appeared to have coding potential. The size distribution and splicing frequency of Sz. pombe orphans, when compared to genes of known or inferred function, suggests that a larger proportion may not be real (manuscript in preparation).

Table 2. Continued

Spurious ORF | Overlapping feature | Spurious ORF | Overlapping feature | Spurious ORF | Overlapping feature
YPL044C | NOP4 | YPL251W | YAH1 | YPR087W | SRP54
YPL073C | UBP16 | YPR002C-A | LTR | YPR099C | YPR0100W
YPL102C | YPL101W | YPR038W | YPR037C | YPR123C | CTR1
YPL114W | YPL113C | YPR039W | YPR037C | YPR126C | YPR125W
YPL136W | YPL137W | YPR044C | RPL43A | YPR130C | SCD6
YPL142C | RPL33A | YPR050C | MAK3 | YPR136C | RRP9
YPL182C | YPL181W | YPR053C | NHP6A | YPR177C | PRP4
YPL185W | YPL186c | YPR059C | YMC1 | YPR197C | SGE1
YPL197C | RPL7B | YPR076W | OPY2 | |
YPL238C | SUI3 | YPR077C | YPR078C | |


Implications for post genomics

Many of the spurious overlapping ORFs included in the public databases, and proposed as disregarded ORFs by this analysis, have associated functional genomics data which could be artefacts. The original yeast microarrays (using PCR products) were not strand specific with respect to the probes (DeRisi et al., 1997), and opposite strand transcripts could hybridise to these array spots (D. Vetrie, pers comm).

Table 3. Very hypothetical proteins with no homology and low coding potential

CDS | CDS | CDS | CDS
YAL069W | YEL045C | YIL100W | YLR385C
YAL066W | YEL033W | YIL086C | YLR400W
YAR029W | YEL010W | YIL054W | YLR402W
YAR030C | YER053C-A | YIL025C | YLR415C
YAR047C | YER066C-A | YIL012W | YLR416C
YAR053W | YER071C | YIL059C | YLR184W
YAR060C | YER084W | YIL058W | YML108W
YAR064W | YER091C-A | YIL032W | YML090W
YAR069C | YER092W | YIR020C | YML089C
YAR070C | YER093C-A | YIR020W-B | YML084W
YBL071C | YER097W | YIR044C | YMR031W-A
YBL048W | YER121W | YJL199C | YMR151W
YBL129C-A | YER181C | YJL182C | YMR057C
YBR012C | YER187W | YJL150W | YMR086C-A
YBR013C | YER189W | YJL135W | YMR194C-A
YBR027C | YFL015C | YJL120W | YMR245C
YBR032W | YFL019C | YJL027C | YMR304C-A
YBR209W | YGL261C | YJL015C OR YJL016W | YMR320W
YBR300C | YGL260W | YJR114W OR YJR113C | YMR324C
YBR134W | YGL204C | YJR146W | YNL338W
YBR178W | YGL193C | YJL067W OR YJL066C | YNL337W
YBR190W | YGL188C | YJL065C OR YJL064W | YNL303W
YCL021W-A | YGL182C | YJL062W-A | YNL300W
YCR006C | YGL177W | YJL052-A | YNL226W
YCR022C | YGL149W | YKL162C-A | YNL198C
YCR025C | YGL052W | YKL102C | YNL179C
YCR043C | YGL051W | YKL044W | YNL149C OR YNL150W
YCR085W | YGL041C | YKL031W | YNL146W
YCR102W-A | YGR025W | YKL115C | YNL028W
YDL242W | YGR050C | YKR032W | YNR025C OR YNR024W
YDL196W | YGR051C | YKR041W OR YKR040C | YNR001W-A
YDL172C OR YDL173W | YGR069W | YKR073C | YOL166C
YDL162C | YGR139W | YKL106C-A | YOL026C
YDL157C | YGR182C | YKL018C-A | YOR012W OR YOR013W
YDR114C | YGR269W | YLL030C | YOR024W
YDR029W | YGR291C | YLR111W | YOR041C OR YOR042W
YDR157W | YGR293C | YLR112W | YOR225W
YDR209C | YGR161W-A | YLR123C OR YLR122C | YOR235W
YDR220C | YGR271C-A | YLR161W | YOR268C
YDR320C-A | YHL045W | YLR162W | YOR304C-A
YDR504C | YHL005C | YLR255C | YOR343C
YDR491C | YHR217C | YLR269C | YPL261C
YDR034W-B | YHR130C | YLR294C | YPL205C
YDR070C | YHR139C-A | YLR311C | YPR012W
YEL074W | YHR173C | YLR302C | YPR092W
YEL073C | YIL174W | YLR346C | YPR142C OR YPR143W
YEL068C | YIL163C | YLR365W | YPR146C
YEL059W | YIL141W | YLR366W | YPR150W OR YPR151C
 | | | YPR074W-A


Positive signals may also result from overlapping UTRs. Not unexpectedly, many of the disregarded ORFs have transcript profiles similar to the overlapping characterised gene. Gene knockouts of spurious ORFs may give phenotypes, particularly if they affect overlapping strand ORFs, promoters, or other cellular RNAs. It has been observed that some of the knockouts of overlapping ORFs have essentially the same, or similar phenotype to the real adjacent gene. These transcript and phenotype artefacts, attached to the database entries, lend these predictions false credibility as proteins. The inclusion of spurious ORFs may therefore affect the accuracy of any previous global analysis of transcription or redundancy.

New gene number estimate

Our analysis provided a new estimate of gene number for each S. cerevisiae chromosome. These are provided in Table 6.

When the S. cerevisiae genome was first published, 6275 ORFs were predicted; 390 of these were proposed to be spurious, giving a probable gene number of 5885 (Goffeau et al., 1996). The data used for our analysis (SGD) consisted of 6282 ORFs, of which 370 have been disregarded, giving a new maximum upper limit of 5804. The removal of 42 pseudo or frame-shifted sequences, and 193 very hypothetical proteins further reduces this total to 5570. This is likely to be closer to the true upper limit, because the criteria used for the determination of very hypothetical proteins are quite conservative. There is a possibility that a small number of the very hypothetical proteins may eventually be determined to be coding, but size distribution (unpublished) indicates that we may still be overestimating the number of small ORFs.

Malpertuy et al. predicted a gene number of 5651. In addition, using two different statistical methods, they estimated that the actual number of protein coding ORFs should be either 5542 or 5552, but do not account for the differences between their predicted number of 5651 and the statistical calculations. The statistical calculations are closer to the number of genes predicted by our analysis (5570 or fewer). The discrepancies could be due to the inclusion of novel genes which are in fact spurious (see Discussion, Novel Genes), or the inclusion of genes previously defined as questionable, but proposed by this analysis to be disregarded (see Discussion, Spurious ORFs).

What are the remaining orphans?

Data obtained by Gaillardin et al. (2000) demonstrated that ascomycete specific genes are highly represented in the functional classes of cell wall organisation, extracellular/secreted proteins, and transcriptional regulators, suggesting that they diverge more rapidly than other classes of genes. In Sz. pombe, many remaining orphans are low complexity or repetitive proteins, e.g. serine-rich with low similarity to alpha-agglutinins and other cell surface proteins, or proteins with basic charged regions which may correspond to transcription factors. It may be that most orphans correspond to genes which have diverged so much that they are unrecognisable, rather than novel genes. It is therefore possible that the majority of orphans are genes which have diverged more rapidly and that the number of truly species specific genes is very small.

A comparison of the refined orphan sets of

Table 4. Real/false pairs

Ascomycete DNA | Real gene | Length | P value | Blast score | Disregarded ORF | Length | P value | Blast score | Overlap length
AR0AA003F06TP1 | YGR220C MRPL9 | 269 | 7e-69 | 72 | YGR219w | 113 | 0.01 | 31 | 100
AR0AA005F12TP1 | YOR054c | 674 | 2e-09 | 52 | YOR055w | 144 | 2e-10 | 49 | 127
AR0AA008C07CP1 | YDR443C SSN2 | 1420 | 2e-16 | 130 | YDR442w | 130 | 3e-12 | 66 | 120
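The real/false distinction in Table 4 rests on the higher BLAST score belonging to the real gene. A minimal sketch of that comparison is given below, using the scores transcribed from Table 4; the data layout and the helper name `pick_real_orf` are assumptions for illustration, not part of the original analysis.

```python
# (ascomycete DNA clone, (ORF, BLAST score), (ORF, BLAST score)), from Table 4.
PAIRS = [
    ("AR0AA003F06TP1", ("YGR220C", 72), ("YGR219w", 31)),
    ("AR0AA005F12TP1", ("YOR054c", 52), ("YOR055w", 49)),
    ("AR0AA008C07CP1", ("YDR443C", 130), ("YDR442w", 66)),
]


def pick_real_orf(orf_a, orf_b):
    """Return the name of the ORF with the higher BLAST score."""
    return max((orf_a, orf_b), key=lambda orf: orf[1])[0]


for clone, orf_a, orf_b in PAIRS:
    print(clone, "->", pick_real_orf(orf_a, orf_b))
```

A score comparison alone is a weak criterion when the two scores are close, as in the second pair, which is why the text also points to substitution patterns in the aligned regions.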

Table 5. Real S. cerevisiae genes with overlaps

Gene 1 | Gene 2 | Overlap (AA) | Orientation
BUD5 | MATALPHA2 | 0.3 | N/C
AUA1 | YFL010C | 55 | C/C
AMD1 | PRP38 | 1 | C/C
ECM12 | YHR022C | 34 | C/C
VPS38 | YLR361C | 10 | C/C
YML096W | RAD10 | 1 | C/C
YNL246W | YNL245C | 0.3 | C/C
CTF19 | YPL017C | 8 | C/C

N=N terminus, C=C terminus, AA=Amino acid length.


Sz. pombe and S. cerevisiae will aid the detection of subtle homologies and physical similarities between the sequence orphans themselves, or orphans and previously characterised genes. For example, the final subunit of Sz. pombe RNA polymerase III (the homologue of S. cerevisiae RPC31) was identified due to similarity in amino acid length and the presence of an acidic C terminus, despite a low similarity score (Richard Maraia, NICHD, NIH, Bethesda and George Shpakovski, Russian Academy of Sciences, Moscow, pers comm).

Global comparison of the remaining orphans will facilitate definition of the sets of genes necessary for unicellular eukaryotic life. However, to do this effectively, it is important that a distinction is first made between orphans and spurious ORFs (Malpertuy et al., 2000).

Conclusions

A substantial proportion of small orphans are probably not protein coding, yet may define other genome features (regulatory regions, cellular RNAs or even gene-free regions which may be involved in higher order chromosome structure). These may contain spurious ORFs which, if defined as CDS, appear to generate matches at the protein level. Spurious gene predictions, with associated artefactual functional genomics data, will exclude these regions of DNA from being inspected for non-CDS features. Attaching a suitable annotation to these would facilitate the detection of authentic features.

It is important to differentiate firstly between orphans and disregarded spurious ORFs, and secondly, between likely real orphans and very hypothetical orphans. Refinement of the orphan sets of sequenced genomes will enable the detection of more subtle homologies and other physical similarities between the real orphans.

As the number of orphans is gradually eroded by the removal of non-coding ORFs and the detection of distant homologues, it will become easier to determine how many truly species specific genes exist in the Sz. pombe and S. cerevisiae genomes.

The annotation of an ORF's status within the public datasets is important for both functional genomics and bioinformatics. The costs of reagents, labour and curation of 370 ORFs which should be disregarded in functional genomics analyses are not trivial; they account for roughly 5% of the total effort. Bioinformatics on proteome data to examine amino acid composition, charge, etc. requires accurate datasets (perhaps with different confidence levels attributed). Integration of contextual information on a gene-by-gene basis to determine status will enable the targeting of future research toward genes which are more likely to be coding.

As more analyses are performed, we should get closer to the absolute gene number. Taken in combination, previous analyses and the interpretation of the biological context of the ORF should enable better estimates of probable gene and orphan number for this yeast.

Data availability

Updated EMBL format sequences (containing nearly 12000 annotations) which can be examined in Artemis and a one-gene one-protein FASTA format protein translations database are available from the Sanger Centre ftp site (ftp://ftp/pub/yeast/SCreannotation).

The EMBL entries will continue to be maintained (and will be resubmitted to EMBL with permission

Table 6. Predicted S. cerevisiae gene numbers, by chromosome

Chromosome number | I | II | III | IV | V | VI | VII | VIII | IX | X | XI | XII | XIII | XIV | XV | XVI | Total
ORF No. (Goffeau et al. 1996) | 110 | 422 | 172 | 812 | 291 | 135 | 572 | 288 | 231 | 387 | 334 | 547 | 487 | 421 | 569 | 497 | 6275
Questionable (Goffeau et al. 1996) | 3 | 30 | 12 | 65 | 13 | 5 | 57 | 12 | 11 | 29 | 20 | 41 | 30 | 23 | 3 | 36 | 390
SGD 2000 ORFs | 107 | 428 | 173 | 819 | 288 | 134 | 571 | 284 | 224 | 387 | 335 | 547 | 491 | 421 | 573 | 500 | 6282
Spurious non coding | 8 | 27 | 16 | 61 | 7 | 3 | 39 | 3 | 2 | 25 | 19 | 38 | 27 | 27 | 38 | 28 | 368
Sanger new | 98 | 397 | 154 | 743 | 275 | 122 | 525 | 275 | 215 | 354 | 317 | 497 | 455 | 391 | 527 | 459 | 5804
Pseudo or frameshift | 2 | 2 | 3 | 1 | 2 | 2 | 2 | 3 | 5 | 5 | 2 | 4 | 3 | 1 | 3 | 2 | 42
Sanger very hypothetical | 10 | 13 | 6 | 16 | 16 | 1 | 21 | 7 | 12 | 14 | 11 | 19 | 12 | 13 | 11 | 10 | 192
Final | 86 | 382 | 145 | 726 | 257 | 119 | 502 | 265 | 198 | 335 | 304 | 474 | 440 | 377 | 513 | 447 | 5570
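The per-chromosome arithmetic behind the 'Final' row of Table 6 can be checked with a short script. The sketch below assumes, from the row labels, that each final count is the 'Sanger new' count minus the pseudo or frameshifted sequences and the very hypothetical proteins; the variable and function names are illustrative only, and the values are transcribed from Table 6 (chromosomes I to XVI).

```python
SANGER_NEW = [98, 397, 154, 743, 275, 122, 525, 275,
              215, 354, 317, 497, 455, 391, 527, 459]
PSEUDO_OR_FRAMESHIFT = [2, 2, 3, 1, 2, 2, 2, 3, 5, 5, 2, 4, 3, 1, 3, 2]
VERY_HYPOTHETICAL = [10, 13, 6, 16, 16, 1, 21, 7,
                     12, 14, 11, 19, 12, 13, 11, 10]
FINAL = [86, 382, 145, 726, 257, 119, 502, 265,
         198, 335, 304, 474, 440, 377, 513, 447]


def derive_final(new, pseudo, very_hypothetical):
    """Subtract the two excluded classes from the new ORF count per chromosome."""
    return [n - p - v for n, p, v in zip(new, pseudo, very_hypothetical)]


derived = derive_final(SANGER_NEW, PSEUDO_OR_FRAMESHIFT, VERY_HYPOTHETICAL)
print(derived == FINAL)   # per-chromosome agreement with the 'Final' row
print(sum(SANGER_NEW), sum(PSEUDO_OR_FRAMESHIFT), sum(VERY_HYPOTHETICAL), sum(derived))
```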


of the original authors). Further refinement of the datasets described will include:

1. New similarity information from BLASTX.
2. EST/cDNA mappings.
3. Regulatory region identification/mapping.
4. Inclusion of other annotated features (keys and qualifiers) from individual GenBank/EMBL entries.

References

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic Local Alignment Search Tool. J Mol Biol 215: 403–410.

Bairoch PBA. 1994. A generalized profile syntax for biomolecular sequences motifs and its function in automatic sequence interpretation. In ISMB-94; Proceedings 2nd International Conference on Intelligent Systems for Molecular Biology. AAAI Press; 53–61.

Bairoch PBA, Apweiler R. 1999. The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1999. Nucleic Acids Res 27(1): 49–54.

Bateman A, Birney E, Durbin R, Eddy S, Finn R, Sonnhammer E. 1999. Pfam 3.1: 1313 multiple alignments and profile HMMs match the majority of proteins. Nucleic Acids Res 27(1): 260–262.

Berbee ML, Taylor JW. 1993. Dating the evolutionary radiations of the true fungi. Can J Bot 71: 1114–1127.

Birney E, Thompson J, Gibson T. 1996. PairWise and SearchWise: finding the optimal alignment in a simultaneous comparison of a protein profile against all DNA translation frames. Nucleic Acids Res 24(14): 2730–2739.

Blandin G, Durrens P, et al. 2000. The genome of Saccharomyces cerevisiae revisited. FEBS Lett 1–6.

DeRisi JL, Vishwanath R, Brown P. 1997. Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278: 681–686.

Dujon B. 1996. The yeast genome project: what did we learn? Trends Genet 12(7): 263–270.

Gaillardin C, Duchateau-Nguyen G, Tekaia F, et al. 2000. Genomic exploration of the hemiascomycetous yeasts: 21. Comparative functional classification of genes. FEBS Lett 134–149.

Goffeau A, Barrell B, Bussey H, et al. 1996. Life with 6000 genes. Science 274: 546–567.

Hieter P, Boguski M. 1997. Functional genomics: it's all how you read it. Science 278: 601–602.

Lowe T, Eddy S. 1997. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 25: 955.

Mackiewicz P, Kowalczuk M, Gierlik A, Dudek MR, Cebrat S. 1999. Origin and properties of non-coding ORFs in the yeast genome. Nucleic Acids Res 27: 3503–3509.

Malpertuy A, Tekaia F, Casaregola S, et al. 2000. Genomic exploration of the hemiascomycetous yeasts: 19. Ascomycetes-specific genes. FEBS Lett 113–121.

Mewes H, Albermann K, Bahr M, et al. 1997. Overview of the yeast genome. Nature 387. supplement.

Oliver S. 1996. From DNA sequence to biological function. Nature 379: 597–600.

Oliver SG, van der Aart QJM, Agostoni-Carbone ML, et al. 1992. The complete DNA sequence of yeast chromosome III. Nature 357: 38–46.

Pearson W, Lipman D. 1988. Improved tools for biological sequence comparison. Proc Natl Acad Sci 85: 2444–2448.

Rutherford K, Parkhill J, Crook J, Horsnell T, Rice P, Rajandream M, Barrell B. 2000. Artemis: sequence visualization and annotation. Bioinformatics 16(10): 944–945.

Sharp P, Cowe E. 1991. Synonymous codon usage in Saccharomyces cerevisiae. Yeast 7: 657–678.

Sonnhammer E, Durbin R. 1994. A workbench for large scale sequence homology analysis. Comput Appl Biosci 10: 301–307.

Stoesser G, Tuli M, Lopez R, Sterk P. 1999. The EMBL Nucleotide Sequence Database. Nucleic Acids Res 27(1): 18–24.

Xiang Z, Moore K, Wood V, et al. 2000. Analysis of 114 kb of DNA sequence from fission yeast chromosome 2 immediately centromere-distal to his5. Yeast 16: 1405–1411.

Zhang C-T, Wang J. 2000. Recognition of protein coding genes in the yeast genome at better than 95% accuracy based on the Z curve. Nucleic Acids Res 28: 2804–2814.


Submitted Publication

3. GeneDB: a resource for prokaryotic and eukaryotic organisms.

GeneDB: a resource for prokaryotic and eukaryotic organisms

Christiane Hertz-Fowler*, Chris S. Peacock, Valerie Wood, Martin Aslett, Arnaud Kerhornou, Paul Mooney, Adrian Tivey, Matthew Berriman, Neil Hall, Kim Rutherford, Julian Parkhill, Alasdair C. Ivens, Marie-Adele Rajandream and Bart Barrell

The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK

A#&#2B#: >*0*+, <C3 D==EF >&&#6,#: >*0*+, D=3 D==E

ABSTRACT

GeneDB (http://www.genedb.org/) is a genome database for prokaryotic and eukaryotic organisms. The resource provides a portal through which data generated by the Pathogen Sequencing Unit at the Wellcome Trust Sanger Institute and other collaborating sequencing centres can be made publicly available. It combines data from finished and ongoing genome and expressed sequence tag (EST) projects with curated annotation, that can be searched, sorted and downloaded, using a single web based resource. The current release stores 11 datasets of which six are curated and maintained by biologists, who review and incorporate information from the scientific literature, public databases and the respective research communities.

INTRODUCTION

The Pathogen Sequencing Unit (PSU) at the Wellcome Trust Sanger Institute sequences a large number of diverse prokaryotic and eukaryotic genomes (http://www.sanger.ac.uk/Projects/). In recent years, new sequencing and assembly technologies and collaborations between sequencing institutes have dramatically increased both the output and quality of sequence data. Maintenance and dissemination of such data required the development of an integrated, publicly accessible database.

During GeneDB's development, four key points have been taken into consideration. GeneDB must be capable of storing and frequently updating sequences and annotations, irrespective of the status of the sequencing project. The resource should therefore support both the mining of preliminary datasets for gene discovery and the viewing of finished sequence data. Secondly, an intuitive user interface, which provides rapid access, visualization, searching and downloading of data, must be shared between the datasets. Thirdly, the database architecture should allow integration of diverse biological datasets with the sequence. Lastly, the use of structured vocabularies would ensure standardization, facilitating querying and comparison between species.

Currently, GeneDB houses the sequences and associated annotation of 11 organisms, including members of the bacteria, fungi, protozoa and arthropods. Of these, six are finished genomes and five are ongoing sequencing projects (Table 1).

DATA ANALYSES AND SYSTEM ARCHITECTURE

GeneDB stores sequence data and analyses generated via automated annotation pipelines, prior to manual annotation. The analysis pipelines include gene finding algorithms, protein feature predictions, BLAST and/or FASTA searches against nucleotide, protein and customized databases, protein domain and/or family search results and electronically inferred and manually revised gene ontology associations (GO) (1). Search results are continually reviewed during the curation process and complemented by additional datasets (Fig. 1).

The sequence and annotation files are processed by the GeneDB mining code, generating both the GeneDB Java objects and standardized files used to populate a Genomics Unified Schema (GUS) database (http://www.gusdb.org/). A set of data files including FASTA sequence files for third-party tools, such as BLAST, is also produced. Both the mining code and the GeneDB object layer take advantage of available code from the BioJava project. Access to the GeneDB data through the GeneDB website is provided by a set of servlets and Java Server Pages (JSP).

DATA CONTENT AND DISPLAY

The GeneDB homepage supplies links to the individual organism homepages. From these, researchers can take advantage of numerous ways to retrieve data and construct searches according to individual preferences and requirements. Clickable chromosome and contig maps, searchable text indices and browsable catalogues [GO assignments (1), descriptions, products, domains] provide fast and easy access. An additional query interface supports a wide range of queries

W!' A"'4 -'55#6<'):#)-# 6"',3: 8# %::5#66#:B !#3? XYY OZZ[ Y\Y\]]^ S%Q? XYY OZZ[ Y\Y\O\^ _4%.3? -"9`6%)(#5B%-B,=

The authors wish it to be known that, in their opinion, the first three authors should be regarded as joint First Authors

Nucleic Acids Research, 2004, Vol. 32, Database issue D339–D343 DOI: 10.1093/nar/gkh007

Nucleic Acids Research, Vol. 32, Database issue © Oxford University Press 2004; all rights reserved

Figure 1. Analysis pipelines complemented by user-feedback, information gained from the literature and other public resources, generate data for GeneDB. Data are parsed through the mining code and serialized in binary files as well as stored in the GUS relational database (see accompanying text for further information).

Table 1. Details of the genomes either currently available in GeneDB or in preparation for addition to the database

Species | Genera | Genome size (haploid) (kb) | Status | Curated | Data type
Salmonella typhi | Bacteria | 4800 | Finished (15) | No | WGS
Schizosaccharomyces pombe | Fungi | 14 000 | Finished (16) | Yes | Clone based
Saccharomyces cerevisiae | | 12 069 | Finished (17) | No | Clone based
Aspergillus fumigatus | | 35 000 | In progress | No | WGS
Plasmodium falciparum | Protozoa | 22 900 | Completed (18) | Yes | WCS
Plasmodium chabaudi | | 30 000 | In progress | Yes | WGS
Plasmodium berghei | | 26 000 | In progress | Yes | WGS
Leishmania major | | 33 600 | Completed | Yes | WCS
Leishmania infantum (a) | | ~34 000 | In progress | No | WGS
Trypanosoma brucei | | 35 000 | In progress | Yes | WCS/B
Trypanosoma congolense (a) | | ~35 000 | In progress | No | WGS
Trypanosoma vivax (a) | | ~35 000 | In progress | No | WGS
Trypanosoma cruzi (a) | | ~43 000 | Completed | No | WGS
Dictyostelium discoideum | | 34 000 | Completed | No | WCS/Y
Glossina morsitans | Arthropoda | Unknown | In progress | No | EST

In the 'Status' column, 'Finished' refers to published genomes without sequencing gaps and 'Completed' refers to genomes that are shotgun complete but still require gap closure. WGS = whole genome shotgun; clone based = sequenced on a clone by clone basis using physical maps; WCS/B = whole chromosome and BAC shotgun; WCS/Y = whole chromosome and YAC shotgun; EST = expressed sequence tag.
(a) Datasets to be added shortly.


on sequences and (curated) annotations stored in GUS, with the ability to combine searches with the Boolean operators AND and OR. For example, users can select all proteins of a specified length range with a specified number of introns. Other query options include GO assignments, keywords, chromosome, protein domains and predicted protein sequence features. The queries in each session are tracked via a history page, allowing further refinement of searches and downloading of results as a nucleotide or amino acid FASTA file. Furthermore, a variety of sequence similarity search facilities are available through GeneDB. In addition to WU-BLAST, GeneDB also supports omniBLAST, which permits searching across a set of selectable databases. An iterative BLAST (PSI-BLAST) search suited to the identification of distant homologues is envisaged to be available shortly. Peptide sequences can be searched with either user-specified motifs or using the peptide mass identification tool EMOWSE, part of the suite of EMBOSS open-source software tools (2). An alternative approach for accessing genes of interest is to use the official browser of the GO consortium, AmiGO. Several different methods are available for querying the data both externally (http://www.godatabase.org/) and internally via GeneDB, all of which include direct links to the gene pages.
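The kind of Boolean query combination described above (for example, proteins within a length range AND with a given number of introns) can be sketched as composable predicates. This is a hypothetical in-memory illustration only: the `Protein` record, the helper names and the example entries are invented for the sketch and do not reflect the actual GeneDB or GUS query code.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Protein:
    name: str
    length: int    # amino acids
    introns: int   # introns in the encoding gene


Predicate = Callable[[Protein], bool]


def length_between(low: int, high: int) -> Predicate:
    return lambda p: low <= p.length <= high


def intron_count(n: int) -> Predicate:
    return lambda p: p.introns == n


def AND(a: Predicate, b: Predicate) -> Predicate:
    return lambda p: a(p) and b(p)


def OR(a: Predicate, b: Predicate) -> Predicate:
    return lambda p: a(p) or b(p)


def run_query(records: List[Protein], predicate: Predicate) -> List[str]:
    """Return the names of all records matching the combined predicate."""
    return [p.name for p in records if predicate(p)]


# Hypothetical records, for illustration only.
RECORDS = [
    Protein("geneA", 297, 4),
    Protein("geneB", 120, 0),
    Protein("geneC", 310, 0),
]

print(run_query(RECORDS, AND(length_between(250, 350), intron_count(0))))
print(run_query(RECORDS, OR(length_between(0, 150), intron_count(4))))
```

Keeping each condition as a small function means a session history of queries could be refined by wrapping an earlier predicate in a further AND or OR.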

Feature pages, generated for coding sequences, display basic location information and a context map. The results of protein feature prediction algorithms [SignalP V2.0 (3), TMHMM v2.0 (4), GPI anchor predictions (http://129.194.185.165/dgpi/index_en.html)] and the manual annotation and curation processes are provided in both a graphical display and text format (Fig. 2). This information is complemented by the results of similarity searches, including the display of predicted and experimentally characterized orthologues and paralogues. Additional sequence features, both at the DNA level (e.g. polymorphisms, introns, UTRs, splice donor and acceptor sequences) and protein level (e.g. peptide domains), can be viewed in the context of the annotated sequence via an Artemis applet (5) (Fig. 2). The selected region can also be downloaded either in FASTA or annotated EMBL file format. Sequence data, either of the predicted coding sequence or the clustered ESTs, are accessible via a secondary page.

Extensive cross-referencing supports retrieval of related information from external resources, allowing rapid transfer

Figure 2. The upper half of the S. pombe GeneDB feature page (http://www.genedb.org/genedb/Search?name=SPAC16E8.09&organism=pombe). Tools, a glossary and a feedback form, inviting comments and updates on a particular gene's annotation, are accessible via the navigation bar. During the curation process, statements, extracted from public resources and the research community, are captured in structured syntax, referenced to the relevant database. Additional sequence features and annotation can be viewed in Artemis, accessible via the link above the gene neighbourhood map.


between databases. This includes reciprocal links to numerous databases housing nucleotide and protein sequences [e.g. EMBL (6), Swiss-Prot/TrEMBL (7)], pathways [KEGG (8)], protein families [e.g. SCOP (9), Pfam (10), InterPro (11)], ontologies [e.g. GO (1)], expression data [e.g. microarray (http://www.sanger.ac.uk/perl/SPGE/geexview)], strain information [FYSSION (http://pombe.biols.susx.ac.uk)] and phenotype data [e.g. the Trypanosoma brucei RNAi project (http://www.TrypanoFAN.org/)]. Links to databases housing the same genome at different sites [e.g. SGD (12), TIGR (http://www.tigr.org/tdb/e2k1/tba1/tba1.shtml)] are also provided. These links to external resources are validated and updated on a monthly basis by the GeneDB mining code. Annotators and curators are automatically alerted to inconsistencies in the datasets and changed GO identifiers.

DATA CURATION

Experienced biologists curate data for six of the organisms in GeneDB (Table 1). Such curation involves several aspects, all aiming to facilitate data querying and retrieval. First, as a number of organisms are sequenced by more than one sequencing centre, it ensures consistent annotation across the whole of the respective genome. Secondly, sequences and their annotation are updated according to new submissions to public databases, publications and contributions by the wider scientific community (Fig. 1). Public information is used not only to verify existing gene models and annotations but also to add value, enabling users to retrieve groups of genes/proteins not possible by purely computational methodologies. Wherever possible, controlled vocabularies such as GO (1) are used. In the absence of such vocabularies, statements are captured in structured syntax either in the description lines (Fig. 2) or in a dedicated curation field, providing a concise summary of the major aspects of a gene's biology. Links are provided to PubMed records and other resources used to compile the statements. Text indices point to other products sharing the same description line.

Curators regularly exchange information and updates with the public databases, aiming to synchronize these datasets globally. Finally, the curation of related species (e.g. Plasmodium species, the Kinetoplastida) at one site enables extensive cross-referencing and comparative analyses between species, such as the inclusion of experimentally verified and predicted orthologues.

For three organisms (Schizosaccharomyces pombe, Leishmania major and Trypanosoma brucei), GeneDB curators are also involved in implementing nomenclature guidelines (13) and resolving nomenclature conflicts, ensuring accurate and complete retrieval of information.

FUTURE DEVELOPMENTS

GeneDB code development will continue to concentrate on integrating GeneDB with the GUS schema. Part of this development is a collaboration with the GUS team at the Computational Biology and Informatics Laboratory (CBIL, University of Pennsylvania) to design a common web interface architecture, permitting the creation of customized web pages to suit individual database requirements [e.g. GeneDB, PlasmoDB (14), AllGenes (http://www.allgenes.org/)].

Curation will be extended to integrate expression, phenotypic and interaction data. To this extent, GeneDB curators and developers have collaborated with the GUS team to implement modifications to the GUS schema in preparation for the incorporation of these large-scale biological data. Also, with the increasing emphasis on genomics projects of related organisms, the GeneDB team are already designing tools for comparative analyses that can be readily displayed via the web.

Furthermore, it is intended to substantially expand the available bacterial datasets to include all the bacterial genomes completed and published by the PSU (see http://www.sanger.ac.uk/Projects/Microbes/ for a comprehensive list).

ACKNOWLEDGEMENTS

We would like to thank the CBIL team, in particular Jonathan Crabtree, Steve Fischer, Jonathan Schug and Chris Stoeckert. We would also like to thank collaborating sequencing centres, in particular, Najib El-Sayed at The Institute for Genomic Research, Peter Myler and Ken Stuart at the Seattle Biomedical Research Institute, Adam Kuspa at Baylor College of Medicine, Angelika Noegel at the University of Cologne, Michel Veron at the Institut Pasteur, and Mike Lehane at the University of Wales, Bangor for sharing unpublished data as well as the numerous researchers who have contributed to the annotation of datasets in GeneDB. GeneDB is funded by the Wellcome Trust through its support of the Sanger Institute. GUS was developed by CBIL. GeneDB and the Centre for Tropical and Emerging Global Diseases, University of Georgia have made significant contributions to it as part of an ongoing collaborative effort with CBIL to further develop the schema.

REFERENCES

1. Ashburner,M., Ball,C.A., Blake,J.A., Botstein,D., Butler,H., Cherry,J.M., Davis,A.P., Dolinski,K., Dwight,S.S., Eppig,J.T. et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature Genet., 25, 25–29.

2. Rice,P., Longden,I. and Bleasby,A. (2000) EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet., 16, 276–277.

3. Nielsen,H. and Krogh,A. (1998) Prediction of signal peptides and signal anchors by a hidden Markov model. Proc. Int. Conf. Intell. Syst. Mol. Biol., 6, 122–130.

4. Sonnhammer,E.L., von Heijne,G. and Krogh,A. (1998) A hidden Markov model for predicting transmembrane helices in protein sequences. Proc. Int. Conf. Intell. Syst. Mol. Biol., 6, 175–182.

5. Rutherford,K., Parkhill,J., Crook,J., Horsnell,T., Rice,P., Rajandream,M.A. and Barrell,B. (2000) Artemis: sequence visualization and annotation. Bioinformatics, 16, 944–945.

6. Stoesser,G., Baker,W., Van Den Broek,A., Garcia-Pastor,M., Kanz,C., Kulikova,T., Leinonen,R., Lin,Q., Lombard,V., Lopez,R. et al. (2003) The EMBL Nucleotide Sequence Database: major new developments. Nucleic Acids Res., 31, 17–22.

7. Boeckmann,B., Bairoch,A., Apweiler,R., Blatter,M.C., Estreicher,A., Gasteiger,E., Martin,M.J., Michoud,K., O'Donovan,C., Phan,I. et al. (2003) The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res., 31, 365–370.

8. Kanehisa,M., Goto,S., Kawashima,S. and Nakaya,A. (2002) The KEGG databases at GenomeNet. Nucleic Acids Res., 30, 42–46.

9. Gough,J., Karplus,K., Hughey,R. and Chothia,C. (2001) Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure. J. Mol. Biol., 313, 903–919.


10. Bateman,A., Birney,E., Cerruti,L., Durbin,R., Etwiller,L., Eddy,S.R., Griffiths-Jones,S., Howe,K.L., Marshall,M. and Sonnhammer,E.L. (2002) The Pfam protein families database. Nucleic Acids Res., 30, 276–280.

11. Mulder,N.J., Apweiler,R., Attwood,T.K., Bairoch,A., Barrell,D., Bateman,A., Binns,D., Biswas,M., Bradley,P., Bork,P. et al. (2003) The InterPro Database, 2003 brings increased coverage and new features. Nucleic Acids Res., 31, 315–318.

12. Weng,S., Dong,Q., Balakrishnan,R., Christie,K., Costanzo,M., Dolinski,K., Dwight,S.S., Engel,S., Fisk,D.G., Hong,E. et al. (2003) Saccharomyces Genome Database (SGD) provides biochemical and structural information for budding yeast proteins. Nucleic Acids Res., 31, 216–218.

13. Clayton,C., Adams,M., Almeida,R., Baltz,T., Barrett,M., Bastien,P., Belli,S., Beverley,S., Biteau,N., Blackwell,J. et al. (1998) Genetic nomenclature for Trypanosoma and Leishmania. Mol. Biochem. Parasitol., 97, 221–224.

14. Bahl,A., Brunk,B., Crabtree,J., Fraunholz,M.J., Gajria,B., Grant,G.R., Ginsburg,H., Gupta,D., Kissinger,J.C., Labo,P. et al. (2003) PlasmoDB: the Plasmodium genome resource. A database integrating experimental and computational data. Nucleic Acids Res., 31, 212–215.

15. Parkhill,J., Dougan,G., James,K.D., Thomson,N.R., Pickard,D., Wain,J., Churcher,C., Mungall,K.L., Bentley,S.D., Holden,M.T. et al. (2001) Complete genome sequence of a multiple drug resistant Salmonella enterica serovar Typhi CT18. Nature, 413, 848–852.

16. Wood,V., Gwilliam,R., Rajandream,M.A., Lyne,M., Lyne,R., Stewart,A., Sgouros,J., Peat,N., Hayles,J., Baker,S. et al. (2002) The genome sequence of Schizosaccharomyces pombe. Nature, 415, 871–880.

17. Goffeau,A., Aert,R., Agostini-Carbone,M.L., Ahmed,A., Aigle,M., Alberghina,L., Albermann,K., Albers,M., Aldea,M., Alexandraki,D. et al. (1997) The Yeast Genome Directory. Nature, 387, 1–105.

18. Gardner,M.J., Hall,N., Fung,E., White,O., Berriman,M., Hyman,R.W., Carlton,J.M., Pain,A., Nelson,K.E., Bowman,S. et al. (2002) Genome sequence of the human malaria parasite Plasmodium falciparum. Nature, 419, 498–511.


Submitted Publication

4. The Gene Ontology (GO) database and informatics resource.

The Gene Ontology (GO) database and informatics resource

Gene Ontology Consortium*

GO-EBI, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

Received August 21, 2003; Revised and Accepted September 12, 2003

ABSTRACT

The Gene Ontology (GO) project (http://www.geneontology.org/) provides structured, controlled vocabularies and classifications that cover several domains of molecular and cellular biology and are freely available for community use in the annotation of genes, gene products and sequences. Many model organism databases and genome annotation groups use the GO and contribute their annotation sets to the GO resource. The GO database integrates the vocabularies and contributed annotations and provides full access to this information in several formats. Members of the GO Consortium continually work collectively, involving outside experts as needed, to expand and update the GO vocabularies. The GO Web resource also provides access to extensive documentation about the GO project and links to applications that use GO data for functional analyses.

INTRODUCTION

The era of genome-scale biology has seen the accumulation of vast amounts of biological data, accompanied by the widespread proliferation of biology-oriented databases. To make the best use of biological databases and the knowledge they contain, different kinds of information from different sources must be integrated in ways that make sense to biologists.

A major component of the integration effort is the development and use of annotation standards such as ontologies (1–4). Ontologies provide conceptualizations of domains of knowledge and facilitate both communication between researchers and the use of domain knowledge by computers for multiple purposes.

The Gene Ontology (GO) project is a collaborative effort to address two aspects of information integration: providing consistent descriptors for gene products, in different databases; and standardizing classifications for sequences and sequence features. The project began in 1998 as a collaboration between three model organism databases: FlyBase (Drosophila), the Saccharomyces Genome Database (SGD) and the Mouse Genome Informatics (MGI) project. Since then, the GO Consortium has grown to include many databases, including several of the world’s major repositories for plant, animal and microbial genomes (a current list of member organizations is included as Supplementary Material).

THE GO PROJECT

The GO project has three major goals: (i) to develop a set of controlled, structured vocabularies, known as ontologies, to describe key domains of molecular biology, including gene product attributes and biological sequences; (ii) to apply GO terms in the annotation of sequences, genes or gene products in biological databases; and (iii) to provide a centralized public resource allowing universal access to the ontologies, annotation data sets and software tools developed for use with GO data.

Ontologies

The GO project provides ontologies to describe attributes of gene products in three non-overlapping domains of molecular biology. Within each ontology, terms have free text definitions and stable unique identifiers. The vocabularies are structured in a classification that supports ‘is-a’ and ‘part-of’ relationships. The scope and structure of the GO vocabularies are described in more detail in references (5–7). In the current research environment, where new genome sequences are being rapidly generated, and where comparative genome analysis requires the integration of data from multiple sources, it is especially germane to provide rigorous ontologies that can be shared by the community.

*Correspondence should be addressed to GO-EBI, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK. Tel. +44 1223 494667; Fax: +44 1223 494468; Email: midori@ebi.ac.uk

*Current members of the GO Consortium are: M. A. Harris, J. Clark, A. Ireland, J. Lomax (GO-EBI, Hinxton, UK); M. Ashburner, R. Foulger (FlyBase, Department of Genetics, University of Cambridge, Cambridge, UK); K. Eilbeck, S. Lewis, B. Marshall, C. Mungall, J. Richter, G. M. Rubin (BDGP, UC-Berkeley, Berkeley, CA, USA); J. A. Blake, C. Bult, M. Dolan, H. Drabkin, J. T. Eppig, D. P. Hill, L. Ni, M. Ringwald (MGI, Jackson Laboratory, Bar Harbor, ME, USA); R. Balakrishnan, J. M. Cherry, K. R. Christie, M. C. Costanzo, S. S. Dwight, S. Engel, D. G. Fisk, J. E. Hirschman, E. L. Hong, R. S. Nash, A. Sethuraman, C. L. Theesfeld (SGD, Department of Genetics, Stanford University, Stanford, CA, USA); D. Botstein, K. Dolinski, B. Feierbach (Genomics Institute, Princeton University, Princeton, NJ, USA); T. Berardini, S. Mundodi, S. Y. Rhee (TAIR, Carnegie Institution, Department of Plant Biology, Stanford, CA, USA); R. Apweiler, D. Barrell, E. Camon, E. Dimmer, V. Lee (GOA database, UniProt, EBI, Hinxton, UK); R. Chisholm, P. Gaudet, W. Kibbe (DictyBase, Northwestern University, Chicago, IL, USA); R. Kishore, E. M. Schwarz, P. Sternberg (WormBase, California Institute of Technology, Pasadena, CA, USA); M. Gwinn, L. Hannick, J. Wortman (Institute for Genome Research, Rockville, MD, USA); M. Berriman, V. Wood (Wellcome Trust Sanger Institute, Hinxton, UK); N. de la Cruz, P. Tonellato (RGD, Medical College of Wisconsin, Milwaukee, WI, USA); P. Jaiswal (Gramene, Department of Plant Breeding, Cornell University, Ithaca, NY, USA); T. Seigfried (Maize DB, Iowa State University, Ames, IA, USA); R. White (Incyte Genomics, Palo Alto, CA, USA).

D258–D261 Nucleic Acids Research, 2004, Vol. 32, Database issue DOI: 10.1093/nar/gkh036

Nucleic Acids Research, Vol. 32, Database issue © Oxford University Press 2004; all rights reserved

Molecular Function (MF) describes activities, such as catalytic or binding activities, at the molecular level. GO molecular function terms represent activities rather than the entities (molecules or complexes) that perform the actions, and do not specify where, when or in what context the action takes place. Examples of individual molecular function terms are the broad concept ‘kinase activity’ and the more specific ‘6-phosphofructokinase activity’, which represents a subtype of kinase activity.

Biological Process (BP) describes biological goals accomplished by one or more ordered assemblies of molecular functions. High-level processes such as ‘cell death’ can have both subtypes, such as ‘apoptosis’, and subprocesses, such as ‘apoptotic chromosome condensation’.

Cellular Component (CC) describes locations, at the levels of subcellular structures and macromolecular complexes. Examples of cellular components include ‘nuclear inner membrane’, with the synonym ‘inner envelope’, and the ‘ubiquitin ligase complex’, with several subtypes of these complexes represented.

The recent development of the Sequence Ontology (SO) permits the classification and standard representation of sequence features. Defined sequence features include terms such as ‘exon’, whose meaning is widely accepted, and the more problematic term ‘pseudogene’, for which several different usages have yet to be resolved. Although the SO is a relatively new vocabulary, and is still undergoing refinement, it is already being used for genome annotation projects in Drosophila and Caenorhabditis elegans.

Annotations

Collaborating databases provide data sets comprising links between database objects and GO terms, with supporting documentation. Every annotation must be attributed to a source, which may be a literature reference, another database or a computational analysis; furthermore, the annotation must indicate the type of evidence the cited source provides to support the association between the gene product and the GO term. A standard set of evidence codes qualifies annotations with respect to different types of experimental determinations. For example, a direct assay to determine the function of the exact gene product being annotated is more reliable than a sequence architecture comparison.
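The requirement that every annotation carries a source and an evidence code can be pictured as a small record type. The following is only a minimal sketch under stated assumptions, not the Consortium's implementation: the class name, field names and the `is_experimental` helper are hypothetical, and of the three evidence codes shown only IEA is named in this paper (IDA and ISS are assumed here as examples of assay-based and similarity-based evidence).

```python
from dataclasses import dataclass

# Evidence codes used for illustration. IEA is named in the paper; IDA and ISS
# are assumed here as examples of assay-based and similarity-based evidence.
EVIDENCE_CODES = {
    "IDA": "inferred from direct assay",
    "ISS": "inferred from sequence or structural similarity",
    "IEA": "inferred from electronic annotation",
}


@dataclass(frozen=True)
class Annotation:
    gene_product: str  # identifier of the database object (hypothetical)
    go_id: str         # GO term identifier
    source: str        # literature reference, database or analysis
    evidence: str      # must be a key of EVIDENCE_CODES

    def __post_init__(self):
        # Reject annotations that lack a source or use an unknown code.
        if not self.source:
            raise ValueError("an annotation must be attributed to a source")
        if self.evidence not in EVIDENCE_CODES:
            raise ValueError(f"unknown evidence code: {self.evidence!r}")


def is_experimental(annotation: Annotation) -> bool:
    """True when the annotation rests on a direct assay (IDA in this sketch)."""
    return annotation.evidence == "IDA"
```

A record such as `Annotation("geneX", "GO:0016301", "PMID:0000000", "IDA")` (placeholder gene product and reference) would pass validation, whereas an empty source or an unrecognised code raises `ValueError`.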

High-quality GO annotations, normally based on curatorial review of published literature and supported by experimental evidence, are now available for gene products in many model organisms. In addition, large sets of annotations made using automated methods cover both model organisms and less experimentally tractable organisms, including human. A number of different automatic methods have been applied (e.g. 8–12), all of which are represented by the evidence code IEA (‘inferred from electronic annotation’). Table 1 provides a snapshot of current annotations in the GO database; a more detailed table is maintained on the web at http://www.geneontology.org/doc/GO.current.annotations.shtml. Additional information on GO annotations can be found in references (5–8) and (13).

The SO is being used by the collaborating databases for genomic feature annotation. Like GO annotations, SO annotations are curated using both manual work by experts and purely computational methodologies.

GO slims

For many purposes, in particular reporting the results of GO annotation of a genome or cDNA collection, it is very useful to have a high-level view of each of the three ontologies. These subsets of the GO have become known as ‘GO slims’, the first of which was constructed for the annotation of the Drosophila genome (13). An example of a GO slim analysis is shown in Figure 1.

The shared use of GO slims makes comparisons of summary GO term distributions very easy. Different applications, however, may require different GO slim sets tailored to the specific needs of an analysis. To address this, the GO Consortium makes both generic and specific GO slim files available. The generic GO slim file is kept up to date with respect to the full ontologies, and specific GO slim files that have been used in particular publications or analyses are archived.
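A GO slim summary of the kind described here can be sketched as a roll-up: each annotation is mapped to whichever slim terms lie at or above the annotated term, and gene products are counted once per slim term. The sketch below is an illustration only. The tiny ontology, the gene product names and the function names are hypothetical, and the parent links are assumed for the example rather than taken from the GO files.

```python
# Hypothetical toy ontology: child term -> set of parent terms.
PARENTS = {
    "nuclear inner membrane": {"nucleus"},
    "nucleus": {"cellular component"},
    "ubiquitin ligase complex": {"cellular component"},
}


def ancestors(term, parents):
    """Return every term reachable by following parent links upwards."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def slim_counts(annotations, slim_terms, parents):
    """Count distinct gene products per slim term.

    annotations is an iterable of (gene_product, term) pairs.
    """
    products = {term: set() for term in slim_terms}
    for gene_product, term in annotations:
        reachable = ancestors(term, parents) | {term}
        for slim in reachable & set(slim_terms):
            products[slim].add(gene_product)
    return {term: len(found) for term, found in products.items()}
```

Counting sets of gene products rather than raw annotations is a design choice of this sketch: a product annotated to both a slim term and one of its descendants contributes once to that slim term.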

THE GO DATABASE

The GO database consists of a MySQL database that captures GO content and a Perl object model and Application Programmer Interface (API) to simplify database access and help programmers write tools that use the GO data. The GO relational database is released monthly in several versions: termdb includes the ontologies, definitions and cross-references to other databases; assocdb includes all data in termdb plus associations to gene products; and seqdb adds protein sequences for annotated gene products (where available). A fourth version, seqdblite, is equivalent to seqdb without the IEA-based associations; this version is used by the AmiGO browser (see below).

The GO database schema models generic graphs, including the GO structure (a directed acyclic graph, or DAG) relationally. At the core of the schema are two relational tables for capturing all terms (also called nodes) and term–term relationships (arcs). The two relationship types, ‘is-a’ and ‘part-of,’ are represented as a ‘relationship type’ attribute in the relationship table.
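The two-table core described above (one table of terms, one of term–term arcs carrying a relationship type) can be illustrated with a few lines of SQL. This is a hedged sketch, not the actual GO schema: the table and column names, the `DEMO:` accessions and both helper functions are invented for the example, and SQLite is used in place of MySQL so that the block is self-contained.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a term table and an arc table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE term (
            id   INTEGER PRIMARY KEY,
            acc  TEXT UNIQUE NOT NULL,
            name TEXT NOT NULL
        );
        CREATE TABLE term_relationship (
            child_id          INTEGER NOT NULL REFERENCES term(id),
            parent_id         INTEGER NOT NULL REFERENCES term(id),
            relationship_type TEXT NOT NULL
                CHECK (relationship_type IN ('is-a', 'part-of'))
        );
        """
    )
    # Two nodes taken from the molecular function examples in the paper;
    # the accessions are placeholders, not GO identifiers.
    conn.executemany(
        "INSERT INTO term (acc, name) VALUES (?, ?)",
        [
            ("DEMO:1", "kinase activity"),
            ("DEMO:2", "6-phosphofructokinase activity"),
        ],
    )
    # One arc: the specific activity is a subtype of the broad one.
    conn.execute(
        """
        INSERT INTO term_relationship (child_id, parent_id, relationship_type)
        SELECT c.id, p.id, 'is-a' FROM term AS c, term AS p
        WHERE c.acc = 'DEMO:2' AND p.acc = 'DEMO:1'
        """
    )
    conn.commit()
    return conn


def parents_of(conn, acc):
    """Return (parent name, relationship type) rows for one term accession."""
    return conn.execute(
        """
        SELECT p.name, r.relationship_type
        FROM term_relationship AS r
        JOIN term AS c ON c.id = r.child_id
        JOIN term AS p ON p.id = r.parent_id
        WHERE c.acc = ?
        ORDER BY p.name
        """,
        (acc,),
    ).fetchall()
```

Storing the relationship type as an attribute of the arc, rather than keeping one table per relationship, means that further relationship types would need only a wider constraint and no new tables.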

Table 1. Status of the GO vocabularies

Totals    July 1, 2000    July 1, 2003

N## 5!#&0 +('1%! __`Y TY_TUK('1% ?&+, 0(D"&+&."% UX^ TTT^XK('1% ?&+, %$"."$1% Y^T URTYK('1% ?&+, 06 3'.%%F'(/('("3(% T^_U TUYTaN%%.3&!+&."%6 Y^EX_ aaRT`X_<("( 2'.0*3+% TY^TE TX_`UYEL()*("3(% ^ UT`TEH!+,%3 Y^`_T YT_RRE

a Excludes obsolete terms.
b Individual associations between any gene product and any GO term.
c Parent–child relationships traced from any GO term to the root (molecular function, biological process or cellular component).


GO RESOURCES

Access to ontologies and annotations in all formats

The output of the GO project (vocabularies, annotations, database and accompanying tools) is in the public domain and is readily accessible via the GO web pages at http://www.geneontology.org/. The GO Consortium gives permission for any of its products to be used without license, in accordance with its redistribution and citation policy. Highlights of that policy are:

(i) that the Gene Ontology Consortium is clearly acknowledged as the source of the product;

(ii) that any GO Consortium file(s) displayed publicly include the revision number(s) and/or date(s) of the relevant GO file(s);

(iii) that neither the content of a GO file(s) nor the logical relationships embedded within the GO file(s) be altered in any way.

The full GO Redistribution and Citation Policy document is available online at http://www.geneontology.org/doc/GO.cite.html. A list of useful URLs and addresses is included in the Supplementary Material.

The MySQL database described above can be downloaded locally, and Perl APIs are provided. The GO Consortium’s ontologies and annotations are also available as flat files (the most frequently updated format at the time of writing) and as RDF XML; the latter is available with or without annotation data included. The MySQL and XML formats are released monthly. The flat files are updated continually, and monthly snapshots are archived. Current and archival releases of all three formats can be downloaded from the GO web site.

Documentation

The GO web resource includes an extensive set of documentation pages (see http://www.geneontology.org/doc/GO.contents.doc.html). Topics include an overview of the GO project and the ontologies, guides to editorial style, file formats and annotation practices, and frequently asked questions (FAQ).

Software tools

A variety of browsers that provide visualization and query capabilities for the GO are available. For example, the AmiGO browser (developed by the GO software group at Berkeley; see http://www.godatabase.org/cgi-bin/go.cgi) provides a web interface for searching and displaying the ontologies, term definitions and associated annotated gene products for the entire spectrum of contributing organism databases represented in the GO database. AmiGO easily allows users to browse a tree-like view of the GO structure and to search for terms using a variety of different keys such as a name, synonym, definition, numerical identifier or cross-referenced entry in an external database. The summary view presents the list of gene products associated with each term. The results may be constrained by the evidence code used in the association or by the organization that submitted the association. Representative amino acid sequences are available for most genes, and these can be selected and downloaded as FASTA files. Using GOst, the GO BLAST server, users may submit a query sequence and retrieve the sequences and GO annotations of all similar gene products in the GO database.

Figure 1. Application of a GO slim set in genome annotation. The number of gene products annotated to each term in each of four model organism genomes is shown for a GO slim set taken from the cellular component ontology (data as of August 1, 2003).

The GO software group has also developed DAG-Edit, a tool that provides a graphical interface to browse, query and edit GO or any other vocabulary that has a DAG data structure. GO curators use DAG-Edit to manage the GO vocabularies. The tool has also been used by other groups to build ontologies for a wide range of biological subjects, such as anatomies and developmental timelines for several model organisms, human diseases and plant growth environment. DAG-Edit is an open source Java application that is installed locally. A user guide is available within the application and on the web (http://www.geneontology.org/doc/dagedit_userguide/dagedit.html).

DAG-Edit is updated regularly to add features and improve performance; the current version can be downloaded from http://sourceforge.net/project/showfiles.php?group_id=36855.

The GO Software web page (http://www.geneontology.org/doc/GO.tools.html) provides a catalogue of GO-related tools developed by members of the GO Consortium or by GO users. In addition to AmiGO, there are several more applications for browsing and searching the GO vocabularies and annotations. Other available software includes applications for correlating data from the GO project and other sources (including, but not limited to, microarray data), as well as tools that are not specific to, but can be used in conjunction with, GO data.

OTHER RESOURCES

Literature collection. The GO project maintains a bibliography of peer-reviewed publications (124 as of August 2003) relevant to the development and use of the GO vocabularies and annotation sets at http://www.geneontology.org/doc/GO.biblio.html. Many of the publications document the curation and display of GO annotations within a wide variety of databases, whereas others make use of GO terms and gene product annotations in the interpretation of large-scale experimental results. Still other papers describe novel uses of GO terms (e.g. in text mining), software that uses GO data and integration of the GO with other ontological resources.

Community input. The GO effort is greatly enriched by input from its user community. Several routes are available for users to comment on various aspects of the GO. Comments and suggestions for changes and updates to the ontologies can be submitted via a GO project page at the SourceForge site (http://sourceforge.net/projects/geneontology), whereupon each suggestion is evaluated by GO Consortium members. Different ‘trackers’ available from the SourceForge site allow GO users to report problems or request features for the AmiGO browser, and to submit suggestions for additions and changes to the ontologies; items can be assigned to individuals or groups within the GO Consortium who have relevant expertise. This system allows the submitter to track the status of a suggestion, both online and by email, allows other users to see what changes are currently under consideration, and archives all entries and associated communications.

Mailing lists. GO also has several mailing lists, covering general questions and comments, the GO database and software, and summaries of changes to the ontologies. The lists are described at http://www.geneontology.org/GO_contacts.html. Any questions about contributing to the GO project should be directed to the main GO mailing list at go@geneontology.org.

SUMMARY

The GO project provides an ongoing example of community development of bioinformatics standards. Combining the expertise of biologists from multiple sub-disciplines, the computational expertise of artificial intelligence researchers, and input from multiple users of the system, the GO Consortium continues to develop and expand these classification systems for molecular biology.

SUPPLEMENTARY MATERIAL

Supplementary Material is available at NAR Online.

ACKNOWLEDGEMENTS

The Gene Ontology Consortium is supported by NIH/NHGRI grant HG02273, and by grants from the European Union RTD Programme ‘Quality of Life and Management of Living Resources’ (QLRI-CT-2001-00981 and QLRI-CT-2001-00015).

REFERENCES

1. Gruber,T.R. (1993) A translational approach to portable ontologies. Knowl. Acq., 5, 199–220.

2. Jones,D.M. and Paton,R.C. (1999) Toward principles for the representation of hierarchical knowledge in formal ontologies. Data Knowl. Eng., 31, 102–105.

3. Schulze-Kremer,S. (1998) Ontologies for molecular biology. Pac. Symp. Biocomput., 3, 695–706.

4. Stevens,R., Goble,C.A., and Bechhofer,S. (2000) Ontology-based knowledge representation for bioinformatics. Brief. Bioinform., 1, 398–414.

5. Blake,J.A. and Harris,M. (2003) The Gene Ontology Project: Structured vocabularies for molecular biology and their application to genome and expression analysis. In Baxevanis,A.D., Davison,D.B., Page,R., Stormo,G. and Stein,L. (eds), Current Protocols in Bioinformatics. Wiley and Sons, Inc., New York.

6. The Gene Ontology Consortium (2001) Creating the gene ontology resource: design and implementation. Genome Res., 11, 1425–1433.

7. The Gene Ontology Consortium (2000) Gene Ontology: tool for the unification of biology. Nature Genet., 25, 25–29.

8. Camon,E., Magrane,M., Barrell,D., Binns,D., Fleischmann,W., Kersey,P., Mulder,N., Oinn,T., Maslen,J., Cox,A. et al. (2003) The Gene Ontology Annotation (GOA) Project: Implementation of GO in SWISS-PROT, TrEMBL, and InterPro. Genome Res., 13, 662–672.

9. Mi,H., Vandergriff,J., Campbell,M., Narechania,A., Majoros,W., Lewis,S., Thomas,P.D. and Ashburner,M. (2003) Assessment of genome-wide protein function classification for Drosophila melanogaster. Genome Res., 13, 2118–2128.

10. Pouliot,Y., Gao,J., Su,Q.J., Liu,G.G. and Ling,X.B. (2001) DIAN: a novel algorithm for genome ontological classification. Genome Res., 11, 1766–1779.

11. Okazaki,Y., Furuno,M., Kasukawa,T., Adachi,J., Bono,H., Kondo,S., Nikaido,I., Osato,N., Saito,R. and Suzuki,H. et al. (2002) Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature, 420, 563–573.

12. Xie,H., Wasserman,A., Levine,Z., Novik,A., Grebinskiy,V., Shoshan,A. and Mintz,L. (2002) Large scale protein annotation through Gene Ontology. Genome Res., 12, 785–794.

13. Adams,M.D., Celniker,S.E., Holt,R.A., Evans,C.A., Gocayne,J.D., Amanatides,P.G., Scherer,S.E., Li,P.W., Hoskins,R.A., Galle,R.F. et al. (2000) The genome sequence of Drosophila melanogaster. Science, 287, 2185–2195.


Submitted Publication

5. Gene Ontology annotation status of the fission yeast genome: preliminary coverage approaches 100%.

Yeast 2006; 23: 913–919. Published online in Wiley InterScience (www.interscience.wiley.com) DOI: 10.1002/yea.1420

Report

Gene Ontology annotation status of the fission yeast genome: preliminary coverage approaches 100%

Martin Aslett and Valerie Wood*

Wellcome Trust Sanger Institute, Cambridge CB10 1HH, UK

*Correspondence to: Valerie Wood, Wellcome Trust Sanger Institute, Cambridge CB10 1HH, UK. E-mail: [email protected]

Received: 16 August 2006
Accepted: 29 August 2006

Abstract

In this review, we present an overview of the Gene Ontology (GO) structure and describe how the GO is implemented for Sz. pombe and made available via Sz. pombe GeneDB (http://www.genedb.org/genedb/pombe/). We give a detailed progress report of Sz. pombe GO annotation, providing the current status of both manual and automatic annotations. Fission yeast has at least one GO annotation for 98.3% of its genes (excluding annotations to ‘unknown’ terms), greater than the current percentage coverage for any other organism. Approximately 65% (3225 gene products) have at least one annotation to each of the three ontologies (biological process, cellular component and molecular function). Approximately 30% (1443 gene products) have GO terms derived directly from small-scale experiments in fission yeast, supporting the validity of fission yeast as a model eukaryote and a reference organism. Copyright © 2006 John Wiley & Sons, Ltd.

Keywords: gene ontology; Schizosaccharomyces pombe; annotation; curation; reference genome; fission yeast

Introduction

The accumulation of biological data produced by genome-scale biology has required a revolution in the approaches used to describe, integrate and retrieve this huge volume of diverse information. Numerous attributes of gene products can be recorded during the annotation or literature curation process but the molecular activity (function), biological process and cellular localization (component) are generally considered the most immediately useful information to describe an organism’s biology. Any robust system to describe these features of gene products has the following requirements or desirable attributes:

1. The ability to describe gene products consistently and unambiguously, so that similar characteristics are grouped (including the grouping of gene products for which no functional data is available).

2. The ability to support the inherent pleiotropy of the data, recognizing that genes may have multiple functions and locations and participate in multiple processes.

3. The ability to describe gene products using different levels of granularity, depending on how much is known or can be inferred (hierarchical).

4. Mechanisms to qualify annotations with different levels of confidence, and to support the annotations with a method or citation.

5. Sophisticated consistency checks to maintain the integrity of the data.

6. Be readily and rapidly extensible to incorporate new biological concepts.

7. Support for negative annotations.

8. Be species-independent, to support inter-organism queries.

9. Enable researchers to retrieve specified groups of genes or to identify candidate gene products for specific functions.

The annotation standards provided by controlled vocabularies and more sophisticated ‘ontologies’ are now crucial to the annotation process for most genomes. These define ‘terms’ to describe aspects of a gene product’s biology, which can be interpreted identically both within and between organisms, by both biologists and computers. The most vital resource for maintaining consistent annotation of genes and gene products is provided by the Gene Ontology (GO) Consortium (http://www.geneontology.org), which fulfils all of the nine requirements above and is the annotation system of choice for the majority of model organism databases (MODs) (The Gene Ontology Consortium 2004). The GO Consortium is a collaborative open source project to develop shared controlled vocabularies, which are continually refined and expanded to reflect accumulating biological knowledge (Ashburner et al., 2000). GO provides three ontologies to describe the orthogonal biological domains of biological process, cellular component and molecular function in a species-independent manner.

Gene Ontology structure

GO terms are arranged so that broader parents have more specific children. The relationships are represented in the form of a directed acyclic graph (DAG), which is similar to a hierarchy, except that it captures biological relationships more realistically by allowing individual child terms to have many parent terms. At present, two types of relationship are implemented in GO (‘is a’ and ‘part of’), although it is conceivable that other relationship types will be added in the future. For example, the cellular component term ‘cell cortex of cell tip (GO:0051285)’ has two parents: it is ‘part of’ ‘cell tip (GO:0051286)’ and ‘is a’ ‘cell cortex (GO:0005938)’ (see Figure 1 for a screenshot of the cellular component term ‘cell cortex of cell tip’ in the ‘AmiGO’ GO browser; this shows all the parent terms, together with the numbers of Sz. pombe gene products associated with each term). Every GO term must obey the ‘true path rule’; this means every possible path from any term back to the root (most general term) must be biologically accurate. When a gene product is annotated to a term, it is therefore automatically annotated to all of the parent terms. For example, a gene product annotated to ‘inner plaque of spindle pole body (GO:0005822)’ is ‘part of’ the ‘spindle pole body (GO:0005816)’, which is ‘part of’ the ‘spindle pole (GO:0000922)’ and so forth, back to the root node ‘cellular component (GO:0005575)’. If a path back to the root node is incorrect for a valid annotation, a ‘true path violation’ occurs and the ontology must be revised. The DAG structure allows curators to assign properties at different levels of granularity, depending on how much is known, or can be inferred, about a gene product. Multiple associations (ontology terms) can be applied to a single gene product, reflecting the fact that a gene product may have several functions, be present in different locations, participate in different processes and interact with numerous other proteins.
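The propagation of an annotation to all parent terms can be illustrated with a minimal sketch. It uses only the term identifiers and relationships quoted above; the dictionary layout and the function names are assumptions made for illustration and are not part of any GO software.

```python
# Toy fragment of the cellular component ontology, restricted to the
# terms and relationships quoted in the text (child -> list of parents).
PARENTS = {
    "GO:0051285": [("part_of", "GO:0051286"), ("is_a", "GO:0005938")],  # cell cortex of cell tip
    "GO:0005822": [("part_of", "GO:0005816")],  # inner plaque of spindle pole body
    "GO:0005816": [("part_of", "GO:0000922")],  # spindle pole body
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links upwards."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in parents.get(current, []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def implied_annotations(term):
    """A direct annotation implies the term itself plus all of its ancestors."""
    return {term} | ancestors(term)
```

Because a child may list several parents, the same traversal works for a DAG as for a strict hierarchy; the path onwards to the root is omitted here because the intermediate terms are not given in the text.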

Dynamic aspects of the Gene Ontology and the associated annotations

GO is a dynamic resource. Changes to the ontologies are frequently made to correct legacy terms and relationships, to improve consistency and to add new terms and relationships as advances are made in biology. Literature curation is not a passive process and necessarily includes contributing to the development of the GO by identifying missing relationships, refining existing term definitions and extending the vocabularies by identifying new terms. New terms added recently to describe biological phenomena studied in Sz. pombe include the cellular component terms ‘medial ring (GO:0031097)’ and ‘lateral element (GO:0000800)’, the biological process terms ‘sister chromatid biorientation (GO:0031134)’ and ‘horsetail nuclear movement (GO:0030989)’ and the molecular function terms ‘ornithine N5-monooxygenase activity (GO:0031172)’ and ‘glucan endo-1,3-alpha glucosidase activity (GO:0051118)’.

Gene Ontology implementation

GO collaborators use the GO schema to annotate individual gene products. These annotations are maintained in a common file format (the gene association file: see http://www.geneontology.org/GO.annotation.shtml?all#file), which is incorporated into the contributing database (GeneDB in the case of Sz. pombe; http://www.genedb.org/) and submitted to the Gene Ontology consortium.

A comprehensive set of GO annotations (gene associations) is provided for Sz. pombe within GeneDB. These associations are derived from a number of non-redundant sources, and are continually refined and updated. There are currently 23 243 manual gene associations for 4757 gene products. Of these, 6017 are derived from experimental data via literature curation for 1443 genes (1529 publications) and are supported by the appropriate evidence code for the type of experiment (see Table 1 legend for a list of evidence codes). A further 8649 are ‘inferred from sequence similarity’ (ISS) based on manual inspection of sequence alignments to characterized proteins, 485 are derived from ‘non-traceable author statements’ (NAS) and a further 958 are ‘inferred by the curator’ (IC), based on other GO annotations.

Figure 1. A screenshot of the cellular component term ‘cell cortex of cell tip’ and its parent terms in the ‘AmiGO’ GO browser. The numbers in parentheses show the number of Sz. pombe gene products associated to each term

Fission yeast has recently benefited from a whole-genome localization study (Matsuyama et al., 2006). Although essentially a high-throughput study, this provides high-quality annotations to be followed up in small-scale experiments in fission yeast and may provide functional clues for completely unstudied genes. This study has provided a further 7134 experimentally supported associations for 4302 genes.

The manual gene associations are supplemented by ‘inferred from electronic annotation’ (IEA) associations and additional manual associations generated from the Gene Ontology Annotation (GOA) database and UniProt (Camon et al., 2004; Apweiler et al., 2004). These include:

1. A keyword mapping from the primary Sz. pombe annotation to GO terms (GOC:pombekw2GO).

2. A mapping of enzyme commission (EC) numbers assigned to GO terms (GOC:ec2go).

3. A mapping of InterPro families and domains to GO terms (GOA:interpro).

4. A mapping of UniProt (formerly SwissProt and TrEMBL) keywords to GO (GOA:spkw).

Table 1. Sources of the non-redundant GO data for Sz. pombe in GeneDB, by evidence code

Evidence code    GeneDB      GOA/UniProt
IEA              1925 (1)    3084 (3)
IEA              156 (2)     1935 (4)
IMP              1828
IDA              8551
IEP              75
IGI              847
IPI              755
ISS              8649
RCA              17
IC               958
NAS              485
TAS              1078
Total            25 324      5019

(1) GOC:pombekw2GO (a mapping of keywords from the curated GeneDB annotations to GO terms).
(2) GOC:ec2go (a mapping of EC numbers to GO terms).
(3) GOA:interpro (a mapping of GO terms which apply to all members of specific protein families to GO terms).
(4) GOA:spkw (a mapping of SwissProt keywords to GO terms).
IEA, inferred from electronic annotation; IMP, inferred from mutant phenotype; IDA, inferred from direct assay; IEP, inferred from expression pattern; IGI, inferred from genetic interaction; IPI, inferred from physical interaction; ISS, inferred from sequence similarity; RCA, reviewed computational analysis; IC, inferred by curator; NAS, non-traceable author statement; TAS, traceable author statement. Also see the online GO evidence documentation: http://www.geneontology.org/GO.evidence.shtml?all

Within the GeneDB database, redundant electronically-inferred GO annotation is prevented by presenting IEA data only when they are more granular (specific) than existing manual associations. This provides 7100 additional associations, giving a total of 30 343. The sources of these associations and their distribution between the various evidence codes are summarized in Table 1. One future aim of the curation strategy is to process the literature backlog, in order to convert IEA associations to experimentally supported evidence codes where applicable, or ISS codes supported by a manually assessed alignment to a characterized protein or protein family if direct experimental results are not available.
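One way to read the redundancy rule is that an electronic annotation is dropped when its term is identical to, or a less specific ancestor of, a term already assigned manually to the same gene product. The sketch below illustrates that reading only; the term names are placeholders rather than real GO terms, and the actual GeneDB implementation is not described in the text.

```python
# Hypothetical child -> parents map; the term names are placeholders.
PARENTS = {
    "specific_kinase": ["kinase"],
    "kinase": ["catalytic_activity"],
    "catalytic_activity": [],
}


def ancestors(term):
    """Collect all less specific terms above `term`."""
    seen = set()
    stack = list(PARENTS.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def non_redundant_iea(manual_terms, iea_terms):
    """Drop IEA terms already covered by a manual term or by one of its ancestors."""
    covered = set(manual_terms)
    for term in manual_terms:
        covered |= ancestors(term)
    return [term for term in iea_terms if term not in covered]
```

Under this reading, a gene product manually annotated to `kinase` would keep an electronic annotation to `specific_kinase` but lose electronic annotations to `kinase` and `catalytic_activity`.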

Three qualifiers (NOT, contributes to and co-localizes with) are available within GO to modify the interpretation of the annotation. The ‘NOT’ qualifier is used to support negative annotations. This would normally be used if experimental evidence has shown a particular assignment not to be true, but where an association might otherwise be made based on other evidence (e.g. sequence similarity). The ‘contributes to’ qualifier is used when a complex has an activity but the individual subunits do not, e.g. the subunits of RNA polymerases. The ‘co-localizes with’ qualifier is used when gene products are associated transiently or peripherally with a cellular component.
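Because a ‘NOT’ annotation reverses the meaning of an association, any tool that reads gene association files needs to treat it separately. The sketch below assumes the tab-delimited gene association layout in which the third, fourth, fifth and seventh columns hold the gene symbol, qualifier, GO identifier and evidence code; the example rows, gene symbols and identifiers are invented for illustration.

```python
from collections import namedtuple

Annotation = namedtuple("Annotation", "gene qualifiers go_id evidence")


def parse_line(line):
    """Split one tab-delimited association row into the fields used here."""
    cols = line.rstrip("\n").split("\t")
    qualifiers = set(cols[3].split("|")) if cols[3] else set()
    return Annotation(cols[2], qualifiers, cols[4], cols[6])


def positive_annotations(lines):
    """Yield annotations, skipping comment lines and negated (NOT) rows."""
    for line in lines:
        if line.startswith("!"):
            continue
        annotation = parse_line(line)
        if "NOT" not in annotation.qualifiers:
            yield annotation


# Hypothetical rows: database, object ID, symbol, qualifier, GO ID, reference, evidence.
EXAMPLE_ROWS = [
    "!comment line",
    "ExampleDB\tID0001\tabc1\t\tGO:0005515\tREF:1\tIPI",
    "ExampleDB\tID0002\txyz2\tNOT\tGO:0005515\tREF:2\tIDA",
]
```

Calling `list(positive_annotations(EXAMPLE_ROWS))` keeps only the row for `abc1`.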

Within Sz. pombe GeneDB, additional qualifiers are applied to GO associations to increase their informational content further. Examples include ‘domain’ qualifiers to specify regions of a protein to which ‘protein binding GO:0005515’ and its child terms are applied. ‘Phase’ qualifiers may be used to specify the life cycle or cell cycle stage when a particular localization is observed, or process occurs, which is especially useful for pleiotropic genes. A number of qualifiers are also used in conjunction with the ‘inferred from genetic interaction’ (IGI) evidence code to establish the type of genetic interaction (epistasis, acts upstream of, parallel pathway, etc.). This provides information about the position in the genetic hierarchy, the directionality of the interaction or whether the gene product acts in the same or a different pathway and is pertinent to the reconstruction of genetic networks.

Sz. pombe annotation progress, coverage and comparison with Saccharomyces cerevisiae

Of the 4969 known and predicted protein coding genes, 4886 are assigned to at least one GO term (Figure 2). This includes 3976 with at least one biological process term, 4801 with at least one cellular component term and 3471 with at least one molecular function term. There are 3225 gene products assigned to at least one term from each of the three ontologies. Only 83 genes considered likely to be protein-coding have no known or predicted component, process or function. In contrast, S. cerevisiae has more gene products assigned to at least one term in all three ontologies (3460), but also a greater number of genes with unknown function, process and component (596). (Note: S. cerevisiae figures were provided by SGD. Annotations to non-coding RNA genes, ‘cellular component unknown’, ‘biological process unknown’ and ‘molecular function unknown’ were removed from both organisms.)

Figure 2. Gene Ontology association coverage for Sz. pombe gene products for each of the three ontologies (biological process, molecular function, cellular component) and their overlap. Inclusion in a set shows that at least one annotation is present for the ontology aspect, although there may be more than one. The total refers to the total of gene products (rather than individual annotations); 3225 products have at least one annotation to each of the three ontologies, 83 gene products have no annotations to any of the three ontologies. The corresponding figures for S. cerevisiae are shown in parentheses when available

The total of 30 343 associations (Figure 3) is a 111.2% increase since manual GO annotation of Sz. pombe began in January 2004. As stated, there are now 23 243 manual GO annotations, representing 76.6% of all GO associations for Sz. pombe. Also, since January 2004, the number of non-redundant automatic (IEA) associations has fallen from 13 775 to 7100 as manual annotations have replaced those done automatically.
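The totals quoted here can be cross-checked against the per-evidence-code counts given in Table 1. The short calculation below is only a consistency check on the published numbers; the variable names are illustrative.

```python
# Manual annotation counts per evidence code, as listed in Table 1.
MANUAL_COUNTS = {
    "IMP": 1828, "IDA": 8551, "IEP": 75, "IGI": 847, "IPI": 755,
    "ISS": 8649, "RCA": 17, "IC": 958, "NAS": 485, "TAS": 1078,
}
TOTAL_ASSOCIATIONS = 30343  # overall total quoted in the text

manual_total = sum(MANUAL_COUNTS.values())
iea_total = TOTAL_ASSOCIATIONS - manual_total
manual_share = round(100 * manual_total / TOTAL_ASSOCIATIONS, 1)

print(manual_total, iea_total, manual_share)
```

The manual codes sum to the 23 243 manual associations stated above, leaving 7100 electronically inferred associations and a manual share of 76.6%.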

Searching and accessing GO

GO is becoming increasingly powerful for the identification of candidate genes in biological domains of interest. For this process, annotations inferred from sequence similarity, automatic mappings and other bioinformatics predictions are important for provision of the most comprehensive annotation coverage (and therefore search results) in the absence of experimental data. However, the GO structure allows annotations to be made at different levels of granularity, depending on what is known or can be inferred about a gene product, and predicted annotations are likely to be more conservative (less specific or granular). This raises an important consideration for searching: it is important to perform searches at the correct level of granularity. However, one important consequence of the GO structure for searching is that retrieving annotations to a GO term also retrieves annotations to its children. Therefore, annotations to parent terms can identify candidate genes for more specific child terms. For instance, a researcher looking for a specific activity, such as dolichyl-phosphate beta-glucosyltransferase activity (GO:0004581), may retrieve all genes which are inferred by sequence similarity or electronic annotation to have the activity of the broader parent terms, UDP-glucosyltransferase activity (GO:0035251), or glucosyltransferase activity (GO:0046527), or galactosyl transferase activity (GO:0046527) as candidates.

Figure 3. Graph of GO annotation number increase over time for Sz. pombe. Both manual and electronically inferred (IEA) annotations are shown, both individually and combined in the overall total. S. cerevisiae totals (from SGD) are included for comparison (it should be noted that SGD do not include electronically inferred annotation)
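The effect of granularity on search results can be sketched as follows. The three term identifiers are those quoted above, arranged here as a simple parent-to-child chain; the gene names and their annotations are hypothetical.

```python
# Parent -> children links for the glucosyltransferase terms named in the text.
CHILDREN = {
    "GO:0046527": ["GO:0035251"],  # glucosyltransferase activity
    "GO:0035251": ["GO:0004581"],  # UDP-glucosyltransferase activity
    "GO:0004581": [],              # dolichyl-phosphate beta-glucosyltransferase activity
}

# Hypothetical direct annotations (gene -> set of terms).
ANNOTATIONS = {
    "geneA": {"GO:0004581"},  # e.g. experimentally characterized
    "geneB": {"GO:0035251"},  # e.g. inferred from sequence similarity
    "geneC": {"GO:0046527"},  # e.g. electronic annotation
}


def descendants(term):
    """Return all more specific terms below `term`."""
    found = set()
    stack = list(CHILDREN.get(term, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(CHILDREN.get(current, []))
    return found


def genes_for(term):
    """Genes annotated to `term` or to any of its descendants."""
    wanted = {term} | descendants(term)
    return sorted(gene for gene, terms in ANNOTATIONS.items() if terms & wanted)
```

Searching at the most specific term returns only `geneA`, whereas searching at the broadest parent also returns `geneB` and `geneC`, which may be candidates for the specific activity.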

The Sz. pombe gene associations can be accessed via the GO consortium website and the GeneDB website using the AmiGO Gene Ontology browser (http://www.godatabase.org/cgi-bin/amigo/go.cgi), (http://www.genedb.org/amigo/perl/go.cgi) (B. Marshall, S. ShengQiang, S. Carbon, S. Lewis, unpublished software). AmiGO allows the browsing of GO terms and the relationships between them and the retrieval of gene products associated with those terms, or all the terms associated with specific gene products. It also allows searching of the ontology by term name and of the annotations by gene name, sequence, evidence code or species.

GeneDB also supports a Boolean query facility, enabling GO annotations to any term to be combined using AND or OR (http://www.genedb.org/gusapp/servlet?page=boolq). These queries can also be combined with other biological attributes, including protein domains, keywords, protein length and mass, presence of transmembrane regions, signal peptides, exon number and chromosomal location. The results can be saved to a query history, combined with previous queries (added, subtracted and intersected) and downloaded in a number of formats (e.g. as gene names, description, protein or nucleotide sequence) (Hertz-Fowler et al., 2004).
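Combining query results in this way corresponds to ordinary set operations. The sketch below is a generic illustration with invented gene names and query results; it does not reproduce the GeneDB interface.

```python
# Hypothetical query results, each a set of gene names.
annotated_to_term = {"gene1", "gene2", "gene3"}
has_transmembrane_region = {"gene2", "gene3", "gene4"}
previous_query = {"gene3"}


def combine(operation, left, right):
    """Combine two result sets: 'and' intersects, 'or' adds, 'not' subtracts."""
    if operation == "and":
        return left & right
    if operation == "or":
        return left | right
    if operation == "not":
        return left - right
    raise ValueError(f"unknown operation: {operation}")


both = combine("and", annotated_to_term, has_transmembrane_region)
either = combine("or", annotated_to_term, has_transmembrane_region)
refined = combine("not", both, previous_query)
```

Here `refined` holds the genes that satisfy both criteria but were not returned by the earlier query.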

The gene association files are available for download from the GO consortium and the Wellcome Trust Sanger Institute (WTSI) websites (http://www.geneontology.org/GO.current.annotations.shtml), (ftp://ftp.sanger.ac.uk/pub/yeast/pombe/Gene ontology). The WTSI files also include the non-redundant IEA associations.

Conclusions

The curation process is improved by GO, far beyond the provision of controlled vocabularies to consistently describe biological phenomena. The GO also provides a framework for quality control of both data input and the subsequent revisions or extensions to biological knowledge which affect the description and implementation of concepts within the ontologies. In addition, it provides a mechanism for the identification of relevant terms, either by the application of curated mappings (e.g. InterPro to GO, or EC to GO) or by the consideration of commonly co-annotated terms from orthogonal ontologies. These associations, in turn, provide robust datasets for inter-species comparisons, and facilitate uniform queries based on shared biological roles. At present, the use of GO by biologists falls into two main categories. First, GO annotation is used to identify statistically over-represented GO terms in large datasets (e.g. looking for biologically significant groups of genes in microarray or proteomics data). Second, GO is frequently used for assessing the distribution of genes annotated to groups of high-level terms (known as binning). Increasingly, GO is being used by biologists to identify interesting gene products, and has the potential to identify areas and genes which are relatively unstudied. All of these applications become increasingly powerful as the annotations are refined and the ontologies become more complete.
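Over-representation of a GO term in a gene list is commonly assessed with a one-sided hypergeometric test, although the text does not name a specific method. The sketch below assumes that test; the function and parameter names are illustrative, and a real analysis would also correct for the many terms tested.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """Probability of seeing at least `hits` annotated genes in a random
    sample of `sample` genes, when `annotated` of the `population` genes
    carry the term (one-sided hypergeometric test)."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return favourable / total
```

For example, `enrichment_p_value(4969, 100, 50, 5)` would estimate how surprising it is to find at least five genes annotated to a term in a list of 50 when 100 of the 4969 protein coding genes carry that term; the counts 100, 50 and 5 are invented for illustration.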

Since the completion of the fission yeast genome sequence, the rate of gene characterization has increased rapidly. This is reflected by the observation that 1443 (30%) of gene products have GO annotations supported by direct evidence from small-scale experiments in fission yeast. The substantial coverage reported here shows that the majority of gene products have at least some minimal functional information attached to them, whether from experiment, sequence similarity or high-throughput studies. However, coverage is only one measure of annotation progress and curation necessarily requires long-term investment to make manual annotations and increase the breadth and depth of annotation.

It should not be overlooked that a substantial number of fission yeast genes are conserved in higher eukaryotes but absent from S. cerevisiae (Lespinet et al., 2002; Wood, 2006). For these genes, fission yeast is frequently an important model, providing functional insights for human orthologues (Wood, 2006). Furthermore, conserved genes are often first characterized in Sz. pombe because for some areas of biology it provides a more attractive model. Sz. pombe also has a smaller gene set than S. cerevisiae, a consequence of fewer duplication events. Because duplication is frequently accompanied by divergence, this means that Sz. pombe gene sequences are often more similar than the corresponding S. cerevisiae sequences to their higher eukaryotic orthologues. The smaller gene set and reduced divergence mean that the cellular content of fission yeast more closely resembles that of the common ancestor, making it an increasingly attractive model for the study of conserved cellular processes. Therefore, if budding yeast is the first organism for which we know something about the basic function of every gene, fission yeast will almost certainly not be very far behind.

Acknowledgements

We would like to thank the GO editors (Midori Harris, Jane Lomax, Amelia Ireland and Jennifer Clark, GO editorial office, EBI, Hinxton, UK); the curators at Saccharomyces Genome Database (SGD, Stanford, USA); the GOA group (Evelyn Camon and Daniel Barrell, EBI, Hinxton, UK); the Sz. pombe UniProt curators (Viv Junker, SwissProt, Geneva and Kati Laiho, EBI, Hinxton, UK); and the GeneDB developers (Adrian Tivey and Paul Mooney, Pathogen Sequencing Unit, Wellcome Trust Sanger Institute, Hinxton, UK) for technical and curatorial support. Additional thanks to Midori Harris, Jurg Bahler (Fission Yeast Functional Genomics group, Wellcome Trust Sanger Institute) and Matt Berriman (Pathogen Sequencing Unit, Wellcome Trust Sanger Institute) for proofreading comments.

This review is adapted and updated from Wood (2006); the updated version of this text and figures was published with the kind permission of Springer Science and Business Media.

References

Apweiler R, Bairoch A, Wu CH, et al. 2004. UniProt: the universal protein knowledgebase. Nucleic Acids Res 32: D115–119.

Ashburner M, Ball CA, Blake JA, et al. 2000. Gene Ontology: tool for the unification of biology. Nat Genet 25: 25–29.

Camon E, Magrane M, Barrell D, et al. 2004. The Gene Ontology Annotation (GOA) database: sharing knowledge in Uniprot with gene ontology. Nucleic Acids Res 32: D262–266.

Hertz-Fowler C, Peacock CS, Wood V, et al. 2004. GeneDB: a resource for prokaryotic and eukaryotic organisms. Nucleic Acids Res 32: D339–343.

Lespinet O, Wolf YI, Koonin EV, Aravind L. 2002. The role of lineage-specific gene family expansion in the evolution of eukaryotes. Genome Res 12: 1048–1059.

Matsuyama A, Arai R, Yashiroda Y, et al. 2006. ORFeome cloning and global analysis of protein localization in the fission yeast Schizosaccharomyces pombe. Nat Biotechnol 24: 841–847.

The Gene Ontology Consortium. 2004. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res 32: D258–261.

Wood V. 2006. Schizosaccharomyces pombe comparative genomics; from sequence to systems. Comparative Genomics Using Fungi as Models, Section 5, Curation, Sunnerhagen P, Piskur J (eds). Springer-Verlag: Heidelberg.

Copyright © 2006 John Wiley & Sons, Ltd. Yeast 2006; 23: 913–919. DOI: 10.1002/yea


Submitted Publication

6. Schizosaccharomyces pombe comparative genomics; from sequence to systems.

Topics in Current Genetics, Vol. 15 P. Sunnerhagen, J. Piškur (Eds.): Comparative Genomics DOI 10.1007/4735_97 / Published online: 2 August 2005 © Springer-Verlag Berlin Heidelberg 2005

Schizosaccharomyces pombe comparative genomics; from sequence to systems

Valerie Wood

Abstract

The fission yeast Schizosaccharomyces pombe is becoming increasingly important as a model for the characterization and study of many globally conserved genes, second only in importance to the budding yeast Saccharomyces cerevisiae. This chapter provides an updated inventory of gene number and genome contents for fission yeast compared to budding yeast. Functional and comparative genomics studies, and the insights these have provided into how the different genome contents of these two yeasts are manifested in their individual biologies, are reviewed. Phylogenetic analysis, comparative genomics and experimental research support the choice of S. pombe as a model for the dissection of many biological processes, which are often more similar to the analogous processes in higher eukaryotes than those of the Saccharomycotina. The review underlines the advantages of exploiting this organism through the integration of bench science, functional genomics, phylogenomics and systems biology in order to identify and interpret the minimal requirements for a eukaryotic cell.

1 Introduction

Schizosaccharomyces pombe, or fission yeast, is a simple unicellular archiascomycete fungus. It was established as a model organism by the influential work which culminated in a universal model for control of the cell cycle (reviewed in Nurse 2000). The fission yeast and its distant relative, budding yeast (Saccharomyces cerevisiae), are estimated to have diverged 330-420 million years ago, in comparison to the metazoan split, which is estimated to have occurred 1000-1200 million years ago (Berbee and Taylor 1993; Lum et al. 1996). Other estimates propose a radical adjustment of these figures to 1144 and 1600 million years ago respectively (Heckman et al. 2001). Despite the variation in the predicted time of divergence, phylogenetic analyses and anecdotal evidence indicate that S. pombe gene sequences are often more similar to their mammalian counterparts than the equivalent S. cerevisiae genes (reviewed in Sipiczki 2001).

Completion of the S. cerevisiae genome sequence in 1996 was a landmark that changed the nature of experimental biology for this organism (Goffeau et al. 1996). The availability of the genome sequence of S. pombe has similarly revolutionised research for the expanding fission yeast community and made possible the first global comparative genomics of two free living fungal species (Wood et al. 2002). The completed S. pombe genome, coupled with the features which have made it a popular experimental model (sophisticated technologies for molecular and cell biology and well developed genetic techniques), also make it an attractive target organism for functional genomics and global systems approaches.

The evolutionary distance between these two yeasts allows their differing genome contents to be usefully compared and evaluated, not only to interpret their individual evolutionary histories in terms of functionality, but often to extrapolate these findings to higher eukaryotic systems. For, despite the length of time since fission yeast and budding yeast shared a common ancestor with humans, both organisms provide excellent experimental models for many essential eukaryotic processes because the majority of genes from both yeasts have predicted homologs in multicellular eukaryotes (Wood et al. 2002)¹. A previous comparison between S. cerevisiae and Caenorhabditis elegans, using different thresholds, predicted that a minimum of 40% of budding yeast genes had a homolog in multicellular eukaryotes (Chervitz et al. 1998)². Chervitz and colleagues also proposed that most core biological functions are carried out by orthologous pairs of conserved genes. Furthermore, they demonstrated that orthologs could usually be reliably identified on a genome-wide basis by simple sequence comparisons, even within families of highly similar proteins with many members. Initial comparisons using S. pombe also showed that genes which were highly conserved between the animal and plant kingdoms were also almost always conserved in both yeasts (Wood et al. 2002). These observations continue to be supported by the characterisation of many conserved genes involved in processes fundamental to the maintenance of all eukaryotic cells. Significantly, but not surprisingly, many universally conserved genes are required for genome stability, and their mutated forms are often implicated in human cancers.

Perhaps unexpectedly, considering its smaller proteome, substantial numbers of broadly conserved proteins are completely absent from S. cerevisiae but are present in S. pombe (Aravind et al. 2000). Consequently, when gene products conserved in higher eukaryotes are absent from the budding yeast but present in the fission yeast, the fission yeast processes display closer functional correspondence to those of more complex organisms. These processes include centromere structure and function (Kniola et al. 2001; Appelgren et al. 2003), RNA interference and heterochromatin formation (Volpe et al. 2002; Hall et al. 2002), nuclear mRNA splicing (Käufer and Potashkin 2000; Kuhn and Käufer 2003; Webb and Wise 2004), certain aspects of cell cycle progression (Mundt et al. 1999), and telomere function (Kanoh and Ishikawa 2003). However, because of the subtle nature of the variations in many of the regulatory circuits controlling these processes, ultimately both the similarities and differences between these two yeasts will continue to be informative for the understanding of basic biological phenomena (Forsburg 1999).

¹ using BLASTP with a cut-off E-value of 0.001
² using BLASTP with a cut-off P-value of 10^-10

S. pombe has lower protein redundancy than S. cerevisiae (inferred from its fewer duplicate genes). This partially explains the apparent closer similarity of S. pombe to higher eukaryotes, because duplication is often accompanied by divergence (Langkjaer 2003; Kellis et al. 2004). Significantly, the evolution of some duplicated S. cerevisiae genes appears to have played a direct role in the transition to a fermentative lifestyle (Piskur 2001). Although S. cerevisiae will continue to be the most intensively studied because of its enormous industrial importance, S. pombe is more likely to resemble the cellular content of the common ancestor and may prove to be more suitable for the functional analysis of certain genes.

This chapter provides an updated inventory of the gene number and genome content of the fission yeast, S. pombe, as compared to the budding yeast, S. cerevisiae, and emphasises the importance of continual sequence analysis for the refinement of the primary data. Genome features and contents are interpreted in the context of published experimental research. The available functional and comparative genomics studies, and the associated insights into how the differing genome contents of these two yeasts are manifested in their individual biologies, are reviewed. These include studies of transposon content, gene organization and regulation, microarray expression studies, proteome comparisons and orthology mapping. Finally, an overview of the current status of genome annotation and literature curation using Gene Ontology (GO) descriptors and a summary of the global similarities and differences between the ‘high level’ biological processes of these two important model yeasts are presented.

Phylogenetic analyses, comparative genomics and experimental research into chromosome structure and organization support the choice of S. pombe for the dissection of many processes which appear to be more similar to analogous processes in higher eukaryotes than to those of the Saccharomycotina. Drawing the cumulative body of research to date into a single unified review emphasises the advantages of exploiting this organism by the integration of bench science, functional genomics, phylogenomics and systems biology approaches in order to identify and interpret the minimal requirements for a single eukaryotic cell.

2 Genome features

2.1 Genome size and sequencing status

The S. pombe genome size was estimated to be 13.8 Mb by restriction mapping, compared to the 13.0 Mb genome of S. cerevisiae (Fan et al. 1988; Smith et al. 1987). Although the genome sizes are similar, S. pombe has only three chromosomes compared to S. cerevisiae’s 16; their sizes being 5.7, 4.6 and 3.5 Mb for chromosomes I, II and III respectively. The smallest S. pombe chromosome is therefore over twice the length of the longest S. cerevisiae chromosome (1.5 Mb). For S. cerevisiae the increased chromosome number and smaller size is a consequence of the proposed whole genome duplication events in some yeast lineages. Most of the species which lie on the deeper branches of the ascomycete phylogeny have haploid chromosome numbers between six and eight. This implies an approximate doubling in the Saccharomyces (sensu stricto) group (Wolfe and Shields 1997; Keogh et al. 1998). Duplication appears to be accompanied by downsizing through deletion, because although chromosome number is often increased, total genome size is broadly similar. For S. pombe the lower chromosome number and larger size may indicate an absence of whole genome duplication events.

The contiguated fission yeast sequence is 12 571 419 bases, arranged in seven contigs with four sequence gaps (two centromeric and two telomeric). The published genome sequence excludes the ribosomal DNA (rDNA) repeats which are present in two tandem arrays on chromosome III. These arrays are estimated to be 1225 kb and 240 kb in size for the sequenced strain (972 h-), although dramatic length polymorphisms between closely related strains are reported for these regions (Pasero and Marilley 1993). The unsequenced subtelomeric regions for chromosomes I and II are approximately 80 kb +/- 20 kb (R. Hyppa and G. Smith, personal communication). The centromeric gaps are estimated to be less than 36 kb and are restricted to known repeats.

The sequenced genome size, together with estimated sizes of the unsequenced elements, is 14.1 Mb, and compares well with the 13.8 Mb calculated earlier from Not I fragment sizes (see above). The estimated sizes of chromosome I and II are almost identical to earlier approximations. The majority of the observed size difference is between the chromosome III totals and may be due to the variable nature of the rDNA repeats.
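The way the components add up can be sketched with a short calculation. It rests on two assumptions that the text does not settle: the 80 kb subtelomeric estimate is treated as a combined figure, and the centromeric gaps are entered at their stated 36 kb upper bound, so the result is an upper estimate rather than a reproduction of the published value.

```python
# Component sizes taken from the text; the variable names are illustrative.
sequenced_bp = 12_571_419    # contiguated sequence
rdna_arrays_kb = (1225, 240)  # two rDNA tandem arrays on chromosome III
subtelomeric_kb = 80          # assumption: combined estimate (+/- 20 kb)
centromeric_gaps_kb = 36      # assumption: stated upper bound used as-is

unsequenced_kb = sum(rdna_arrays_kb) + subtelomeric_kb + centromeric_gaps_kb
estimated_total_mb = (sequenced_bp + 1000 * unsequenced_kb) / 1_000_000

print(f"{estimated_total_mb:.2f} Mb")
```

Under these assumptions the total comes out slightly above 14.1 Mb, which is in line with the figure quoted above given the uncertainty in the gap estimates.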

The composite sequence is expected to be missing only repetitive regions, there should therefore be little, if any, unique sequence excluded from the present data3. Efforts are continuing to sequence the remaining centromeric and telomeric gaps, and the sequence status is continually updated at http://www.sanger.ac.uk/Pro-jects/ S_pombe/status.shtml

2.2 Centromeres

The basic structure of the centromeres and their approximate sizes were deter-mined prior to complete genome sequencing by Southern blotting and partial se-quencing (Chikashige et al. 1989; Clarke and Baum 1990; Murakami et al. 1991). Centromeres 1, 2 and 3 were estimated at 40, 69 and 110 kb, respectively. These sizes are inversely proportional to the length of the chromosomes at 5.7, 4.6 and 3.5 Mb, and their structure was verified by the genome sequence. The centromere structure comprises a non-conserved central core sequence (cnt) flanked by in-verted repeats (ImrL and ImrR) that display sequence identity with each other (Takahashi et al. 1992). These central elements are flanked by variable numbers of outer repeats (otrL and otrR). Initial studies showed that the central core is essen-tial, but not sufficient for centromeric function, and at least a portion of the outer

3 Based on the assumption that no unique protein coding genes exist at the telomeres of chromosome III proximal to the rDNA repeats.

Schizosaccharomyces pombe comparative genomics; from sequence to systems 237

repeat is required (Takahashi et al. 1992). These repeats contain a highly conserved region, dg or K, which was found to be critical, and additional repeats were shown to have a positive effect on minichromosome stability (Baum et al. 1994). The complex and diffuse centromere structure of fission yeast is more reminiscent of higher organisms than of the 125 base pair structurally conserved element that is sufficient for centromere function in S. cerevisiae (Fitzgerald-Hayes et al. 1982).

Work on centromere and kinetochore function in fission yeast is beginning to dissect the biological basis for these structural differences. Several centromeric proteins have been identified in S. pombe which are conserved in mammals but are absent from S. cerevisiae, including Swi6 and Chp1 (Lorentz et al. 1994; Ekwall et al. 1995; Doe et al. 1998). These proteins have, like their mammalian counterparts, been linked with distinct structural and functional domains. Recently, an important link has been made between the formation of silent centromeric heterochromatin and the RNA interference (RNAi) machinery, which is absent from S. cerevisiae but conserved in plants, insects and mammals (Volpe et al. 2003). It is proposed that small interfering RNAs (siRNAs) are generated from centromeric double-stranded RNAs by the RNAi machinery. These siRNAs induce the formation of heterochromatin in the centromeric regions by targeting repetitive DNA and directing histone methylation at these sites.

The conservation of features including size, structure and multilayered organization has led to the suggestion that the fission yeast centromere represents the basic modular structure of complex centromeric DNA in higher eukaryotes (Kniola et al. 2001; Appelgren et al. 2003). These common features and the involvement of the RNAi components (essential for heterochromatin formation in vertebrate cells) are making fission yeast a valuable model for eukaryotic chromatin remodelling and centromere function.

2.3 Subtelomeric regions

Approximately 50-60 kb of the region immediately proximal to the telomeric repeats of all four of the sequenced subtelomeric regions is highly similar (~99% sequence identity for most of the regions). This is consistent with the observation that the telomeres of fission yeast and other eukaryotes cluster at meiotic prophase (Chikashige et al. 1994; Scherthan et al. 1994), because telomere clustering may promote the more frequent exchange of genetic information which appears to occur in these regions (reviewed in Scherthan 2001).

One striking feature is a large (6.3 kb) open reading frame (ORF), SPAC212.11, with homology to RecQ helicases, present at the ends of the two fully sequenced chromosome arms immediately proximal to the degenerate telomeric repeats. This helicase has recently been shown to be highly expressed in rare survivors of crisis in telomerase mutants (Mandell et al. 2004; Mandell et al. 2005). There are also 19 highly conserved, telomere associated Y’ elements in S. cerevisiae, containing a predicted helicase domain, which have similarly been implicated in the maintenance of telomeres in telomerase defective populations of S. cerevisiae (Louis and Haber 1998; Yamada et al. 1998; Maxwell et al. 2004).

238 Valerie Wood

This is the only example of a conserved protein function and genomic location between the two yeasts. The S. pombe RecQ helicase appears to be partially transcriptionally regulated by RNAi, suggesting that this mechanism also operates at the telomeres (Mandell et al. 2004).

The subtelomeric regions of S. pombe appear to contain an increased density of species-specific predicted cell-surface glycoprotein families relative to the whole genome (Wood et al. 2002). Similarly, the S. cerevisiae Seripauperin and TIP or PAU family (26 members), the COS/DUP family (24 members) and the flocculin family (6 members), which are also cell-surface molecules of unknown function, are typically telomerically encoded (Goffeau et al. 1996). It is possible that the subtelomeric regions of both yeasts may favour duplication and that this may result in the generation of novel, organism specific genes important for cell identity (Wood et al. 2002; Kellis et al. 2003). One feature of telomeric regions, which may be significant in providing a potential reservoir for surface variation, is that these regions are usually transcriptionally silent (Nimmo et al. 1998). A novel form of epigenetic regulation at the telomeres has recently been identified in an S. cerevisiae strain where only FLO11 of the glycosylphosphatidylinositol (GPI) anchored flocculin family is normally expressed. In some mutants, the loss of Sir2 induced transcriptional silencing increases switching frequency and turns on silenced genes (Halme et al. 2004). The observed redundancy may therefore not exist solely to provide protection against mutation, but instead, to provide a reservoir of contingency genes whose advantageous features can be positively selected for in response to novel or rare environmental conditions. Such a positional preference is already well documented for contingency genes involved in immune evasion of parasitic protozoa (reviewed in Barry et al. 2003). Under these circumstances it would be beneficial for essential housekeeping genes to concentrate away from the highly plastic subtelomeric regions. Intriguingly, such a positional preference has also already been reported for C. elegans based on correlations between chromosome location and lethality, and chromosome location and sequence similarity (Kamath et al. 2003; The C. elegans Sequencing Consortium 1998).

Genome wide expression studies in S. pombe have identified the telomeres as chromosomal regions enriched for meiotic genes induced in response to nitrogen starvation, leading to the suggestion that spatial arrangement has a role in the activation of genes required for this process (Mata et al. 2002; see also section 4.2). More recently, Hansen and colleagues used microarrays to assay the global effects of the silencing mutants in histone deacetylases (Clr3 and Clr6) and the histone methyltransferase (Clr4) (Hansen et al. 2005). Many genes repressed by the Clr proteins cluster in extended regions close to the telomeres, and these largely overlap with those shown previously to be expressed in response to nitrogen starvation (Mata et al. 2002). Hansen and colleagues also observed that the telomeric regions contained genes, including transporters, whose expression in response to nutrient depletion may facilitate survival. A similar histone dependent repression of environmentally responsive genes in subtelomeric regions is observed in S. cerevisiae (Robyr et al. 2002).

Finally, Kellis and colleagues reported that the majority of the 18 species-specific genes which were present in S. cerevisiae but absent from syntenic positions in the closely related Saccharomyces (sensu stricto) strains were at subtelomeric locations (Kellis et al. 2003). Therefore, although subtelomeric duplicated ORFs are highly similar within a species, between species they appear to be rapidly diverging.

In S. pombe, functional categories of genes implicated in adaptations to environmental stresses appear to be frequently overrepresented among subtelomerically encoded genes. The observed changes in the expression of these genes when silencing factors are mutated, coupled with their frequent duplication and rapid divergence, suggest that subtelomeric regions may provide the ideal genomic environment to create, test and select for novel genes, a mechanism which could be applicable to all fungi, or even eukaryotes in general. Future studies using refined datasets and annotations will allow this hypothesis to be tested fully.

2.4 Gene density, GC composition and gene structure

Protein coding gene density is similar for chromosomes I and II, with one gene every 2462 and 2495 base pairs respectively, but lower for chromosome III which has one gene every 2766 base pairs. The reason for the substantially lower gene density on chromosome III is not known, but is not due to a difference in average gene length which is similar for all three chromosomes (1405-1444 base pairs). There are other notable differences between chromosome III and the other two chromosomes, including the maintenance of the tandem rDNA repeats and the more repetitive structure of its centromere. Chromosome III has also been shown to harbour an increased density of the remnants of transposable elements (Bowen et al. 2003; see also 2.11). It is possible that all of these observations are due to the different physical environment of some regions of this chromosome which may contribute to an enhanced capacity for the retention of duplicated sequence and indirectly, the lower gene density.

Protein coding genes are absent from the centromeres and gene density is lower than average at the telomeres. Overall gene density is one gene every 2528 base pairs compared with only one gene every 2088 base pairs for S. cerevisiae. This may reflect more complex regulatory structures, as average gene length (excluding introns) is approximately equivalent (1424/1460) but S. pombe intergenic regions are correspondingly larger (Wood et al. 2002).
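The overall density figure can be checked with a one-line calculation. The sketch below assumes that the density is simply the sequenced genome length (section 2.1) divided by the current count of 4973 protein coding genes (section 2.5); the function name is hypothetical.

```python
def bases_per_gene(sequence_length_bp: int, gene_count: int) -> float:
    """Average number of base pairs of sequence per protein coding gene."""
    if gene_count <= 0:
        raise ValueError("gene_count must be positive")
    return sequence_length_bp / gene_count


# Assumption: sequenced length and gene count as quoted elsewhere in the text.
pombe_spacing = bases_per_gene(12_571_419, 4973)
cerevisiae_spacing = 2088  # quoted directly in the text, not recomputed here

print(f"S. pombe: one gene every {pombe_spacing:.0f} bp")
print(f"ratio to S. cerevisiae: {pombe_spacing / cerevisiae_spacing:.2f}")
```

With these inputs the calculation reproduces the quoted value of one gene every 2528 base pairs.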

Protein coding sequence accounts for 57% of the S. pombe genome, compared to 70.5% for S. cerevisiae. The overall GC composition is very similar for the two yeasts (36% and 38.3% for S. pombe and S. cerevisiae, respectively), and for the protein coding portion it is identical at 39.6%.

Introns are present in 2260 (46%) of fission yeast protein coding genes, and a total of 4722 have so far been identified. Intron length varies from 28 to 819 nucleotides with a mean of 82 nucleotides, and the largest number found within a single gene is 15. Introns are much rarer in S. cerevisiae, with only 301 identified in 5% of protein coding genes, although curiously, the mean length of S. cerevisiae introns is substantially longer at 216 base pairs (Dolinski et al. 2002, ftp://ftp.yeastgenome.org/yeast 12th July 2002).


Most S. pombe introns have GT donors and AG acceptors (only three confirmed introns have a GC donor). The branch site is also well defined, with 95% of introns having a consensus YTRAY. Four additional branch sites, related to the consensus, are experimentally confirmed but used with decreased frequency. Fewer than 50 confirmed or predicted introns do not have a verified branch site within 6-34 bases of the acceptor. At publication, 638 introns were experimentally confirmed by mRNA and EST data. This number has now increased to 722, although many more are supported by the absence of gaps across splice sites when aligned with related proteins.
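A search for the YTRAY consensus can be sketched with a regular expression. This is a minimal illustration, not the method used for the annotation: it expands the IUPAC codes (Y = C or T, R = A or G) and assumes that the 6-34 base distance is counted from the last base of the motif to the 3’ end of the intron, which the text does not define. The function name is hypothetical.

```python
import re

# IUPAC expansion: Y = C or T, R = A or G, so YTRAY becomes [CT]T[AG]A[CT].
# The lookahead allows overlapping candidate sites to be reported.
BRANCH_SITE = re.compile(r"(?=([CT]T[AG]A[CT]))")


def find_branch_sites(intron: str, min_dist: int = 6, max_dist: int = 34):
    """Return (start, motif, distance_to_intron_end) for consensus matches.

    Assumption: distance is measured from the base after the motif to the
    end of the intron (which finishes with the AG acceptor).
    """
    intron = intron.upper()
    hits = []
    for match in BRANCH_SITE.finditer(intron):
        start = match.start(1)
        motif = match.group(1)
        distance = len(intron) - (start + len(motif))
        if min_dist <= distance <= max_dist:
            hits.append((start, motif, distance))
    return hits


# Toy intron: GT donor, filler, a CTAAC branch site, filler, AG acceptor.
toy_intron = "GTAAGT" + "A" * 20 + "CTAAC" + "T" * 10 + "AG"
print(find_branch_sites(toy_intron))
```

The four non-consensus branch sites mentioned above would need additional patterns, which are not given in the text and are therefore not included.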

For genes with one to six introns, a 5’ bias has been observed relative to the values expected if introns were evenly distributed within genes (Wood et al. 2002). A similar bias was observed previously in S. cerevisiae, where it was hypothesised to be due to in vivo reverse transcription generating cDNAs which then replaced the original chromosomal gene (Fink 1987). Because cDNAs are extended from their 3’ ends, 5’ introns would have a reduced tendency to be removed. In addition, the number of genes with a specified number of introns decreases exponentially as intron number increases from two to six (614 have two introns, 324 have three introns, 148 have four introns, 70 have five introns and 40 have six introns; Wood et al. 2002). Both of these observations may be of relevance to the speculation concerning the mechanism of intron removal.
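The exponential decrease can be examined by fitting a straight line to the logarithm of the gene counts. The sketch below uses an ordinary least-squares fit written out by hand; the counts are those quoted above, while the function and variable names are hypothetical.

```python
import math

# Genes with exactly n introns, n = 2..6 (Wood et al. 2002, as quoted above).
GENES_BY_INTRON_COUNT = {2: 614, 3: 324, 4: 148, 5: 70, 6: 40}


def log_linear_fit(counts):
    """Least-squares fit of ln(count) = intercept + slope * n."""
    xs = list(counts)
    ys = [math.log(counts[n]) for n in xs]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


slope, intercept = log_linear_fit(GENES_BY_INTRON_COUNT)
ratio = math.exp(slope)  # multiplicative change per additional intron
print(f"each additional intron multiplies the gene count by ~{ratio:.2f}")
```

On these five counts the fitted ratio is close to one half, that is, the number of genes roughly halves with each additional intron.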

The substantially larger intron number in fission yeast may provide a greater potential for post transcriptional regulation of biological processes via the controlled regulation of intron processing. It has also been proposed that the splicing machinery in S. pombe is closer to higher eukaryotes in both similarity and content (Käufer and Potashkin 2000; Kuhn and Käufer 2003). In support of this, recent studies have shown that some components of the splicing machinery are conserved from fission yeast to humans but absent from S. cerevisiae and that these appear to play a role in the splicing of particular subsets of genes (Webb and Wise 2004).

2.5 Proteome complement

A central goal of biological research is to describe fully the information encoded in a genome and how this is integrated into the orchestrated collections of processes and functions which combine to produce living cells. Towards this goal, continual refinement of the gene structures and gene complements of sequenced genomes is necessary to provide the most accurate ‘parts list’ possible. Such a list is a prerequisite for a summary of an organism’s functional capabilities, to partition the non-coding portion of the genome, and for accurate orthology mappings. Gene prediction in the relatively densely packed genomes of single celled fungi is substantially easier than for higher eukaryotes. However, the presence of splicing, and the difficulty in distinguishing short genes from short spurious ORFs, mean that even the basic statistic of gene number is not trivial to obtain. Gene structures are revised primarily by the incorporation of new information from both similarity searches and experimental data. Gene complement is refined by: (i) the identification of new genes; (ii) ‘partitioning’ of dubious ORFs which are unlikely to be protein coding; (iii) detection of distant orthologs or other signals which provide evidence for the biological significance of a predicted translation; and (iv) experimental verification.

The publication of the S. cerevisiae genome in 1996 reported 6275 ORFs but estimated that only around 5800 were likely to be coding, based on the predicted number of small but spurious ORFs which would be included by chance due to the 100 amino acid cut-off threshold (Goffeau et al. 1996). Early efforts to establish the absolute protein coding gene complement were thwarted by the absence of: (i) homologous sequences in the public databases; (ii) adequate tools for gene discovery; and (iii) available experimental data. Subsequent re-analyses based on additional data and the annotation methods implemented for S. pombe, and comparisons with the partial shotgun sequence from 13 hemiascomycetes, predicted similar protein complements (a maximum of 5570 ‘real’ ORFs over 100 codons, and a minimum of 5600 ‘real’ ORFs including ORFs under 100 codons, respectively; Wood et al. 2001; Blandin et al. 2000). Both of these studies also provided improved gene coordinates and status calls for individual ORFs.

Recently, detailed comparisons with the genomes of four closely related syntenic Saccharomyces species (sensu stricto) and the slightly more distantly related filamentous ascomycete Ashbya gossypii have provided an increasingly refined gene complement (Cliften et al. 2003; Kellis et al. 2003; Brachat et al. 2003). Modifications included improved gene structure coordinates, the identification of small genes and improved distinction between dubious and verified ORFs using reading frame conservation. The changes reported by these and other analyses (affecting approximately 10% of the genome) have been reviewed and incorporated into the Saccharomyces Genome Database (SGD). This database currently reports a total of 6606 protein coding genes, 829 of which are dubious (using numerous criteria), giving a likely total of 5777, which includes 309 coding sequences under 100 amino acids (SGD http://www.yeastgenome.org/ 9th Nov 2004).

Publication of the fission yeast genome in 2002 recorded an upper estimate of 4940 protein coding sequences (including 11 mitochondrial proteins and 116 dubious ORFs), the smallest number for a sequenced free-living eukaryote at publication. This number has since increased to 4973 through the addition of 22 genes in sequenced gaps, and 14 genes which were missed during first pass annotation because they were either below the threshold size of 100 codons, or highly spliced. These are documented at http://www.genedb.org/genedb/pombe/newgenes.jsp. Due to stricter annotation criteria, only 90 genes are now reported as dubious, so the present protein coding gene count is 4883.
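The ‘likely total’ figures for the two yeasts follow the same simple bookkeeping: annotated protein coding genes minus those flagged as dubious. A minimal sketch, using only the counts quoted in this section (the class and field names are hypothetical):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneTally:
    organism: str
    annotated: int  # all annotated protein coding genes
    dubious: int    # ORFs flagged as unlikely to be protein coding

    @property
    def likely_coding(self) -> int:
        return self.annotated - self.dubious


tallies = [
    GeneTally("S. cerevisiae (SGD, Nov 2004)", 6606, 829),
    GeneTally("S. pombe (current annotation)", 4973, 90),
]
for tally in tallies:
    print(f"{tally.organism}: {tally.likely_coding} likely protein coding genes")
```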

The initial gene predictions were performed using GeneFinder trained on experimentally verified S. pombe genes (Green and Hillier, unpublished software). These preliminary gene structures were refined by multiple rounds of manual inspection within the Artemis analysis and annotation tool (Rutherford et al. 2000). When applicable, the results of sequence similarity searches using BLAST, FASTA and Genewise against the UniProt (formerly Swissprot and TrEMBL), EMBL and Pfam databases were incorporated to extend gene predictions and to correct intron/exon boundaries (Altschul et al. 1990; Pearson and Lipman 1988; Birney et al. 1996; Apweiler et al. 2004; Kulikova et al. 2004; Bateman et al. 2004). Intron boundaries were refined using EST data mapped onto the genome data using EST_GENOME, and by a Hidden Markov Model trained on S. pombe intron sequences using HMMER (Mott 1997; Hughey and Krogh 1996). All splice sites were manually inspected for ungapped homology across intron/exon boundaries and the presence of a consensus branch site, and adjusted when necessary. These integrated methods, coupled with manual intervention, have provided highly accurate gene structures for the fission yeast.

Since publication, additional experimental data (sequenced mRNAs) and homology have resulted in updated structures for 31 of the original gene predictions. These are documented at http://www.genedb.org/genedb/pombe/coordChanges.jsp. The changes include: (i) the addition of small N- or C-terminal exons; (ii) changes to the N-terminal methionine (sequence extended or reduced); (iii) replaced N-terminal exons; (iv) alteration of intron boundaries; (v) gene splits (two); (vi) additional in-frame splice (one); (vii) single base deletion (one). It is likely that 5777 and 4883 are close to the actual protein coding totals for S. cerevisiae and S. pombe respectively, although undoubtedly further small or highly spliced genes remain to be discovered.

2.6 Non coding RNA complement

Non coding RNAs (ncRNAs) include all RNAs other than mRNA and are central to a wide range of biological processes including transcription, translation, gene regulation and splicing. The number of known ncRNAs is expanding, but in the absence of an obviously detectable signal for many ncRNAs, especially those present in low copy number, their computational identification is still difficult (Eddy 2002).

In addition to the ~5000 protein coding genes there are ~600 known or predicted genes for various cellular RNAs (more than 10% of the gene content). At present, 170 transfer RNAs (tRNAs; 195 including mitochondrially encoded tRNAs) are reported, compared to 288 in S. cerevisiae. This is likely to encompass the complete tRNA complement for the S. pombe genome and reflects the relative ease and accuracy with which tRNAs can be predicted by tRNAscan-SE (Lowe and Eddy 1997).

The 5.8S, 18S and 26S ribosomal RNAs (rRNAs) are present in tandem arrays of which there are an estimated 100-120 copies (Schaak et al. 1982; Barnitz et al. 1982). The genome sequence has a couple of representative copies of this repeat from the beginning of each tandem array. The 5S rRNAs are present in 32 copies dispersed throughout the genome, in contrast to the 100-200 present in the S. cerevisiae rDNA repeats (Mao et al. 1982; Aarstad and Oyen 1975).

The spliceosomal RNAs (U1-U6), together with 34 small nucleolar RNAs (snoRNAs), are dispersed throughout the genome⁴. The snoRNAs cannot be detected by similarity alone and are difficult to predict computationally, although there have been advances in methods for their detection (Lowe and Eddy 1999). Based on the number of snoRNAs identified in S. cerevisiae to date (68), at least 30 additional snoRNAs are likely to be present in S. pombe (T. Lowe, personal communication).

4 U3 has 2 copies in S. pombe, U5 has 2 copies in S. cerevisiae.

Besides the major classes of RNA, 8 ncRNAs have been identified experimentally: RNase P K-RNA (Krupp et al. 1986), sme2-meiRNA (Watanabe and Yamamoto 1994), 7SL-RNA (Ribes et al. 1998), meu3, meu11, meu16, meu19 and meu20 (Watanabe et al. 2001). An additional 124 loci have been annotated as potential RNA genes (for example transcripts with no detectable open reading frame; Watanabe et al. 2002). It is likely that some of the 68 uncharacterised prl loci (which correspond to cDNAs apparently lacking long open reading frames, and often overlap with previously identified transcripts), and tos1-3, which are antisense to rec7, have regulatory roles (Watanabe et al. 2002; Molnar et al. 2001). Inevitably, many more unidentified RNA genes (antisense, structural and catalytic) will play important roles in fission yeast and other organisms. The complete RNA complement can be accessed from http://www.genedb.org/shortcuts.jsp⁵.

2.7 Intergenic regions

Intergenic regions are larger, on average, between divergent genes, which contain two promoters (1341 bp), than between convergent genes, which contain two downstream regions and are therefore promoterless (558 bp). Intergenic regions between tandem genes, which contain one promoter and one downstream region, show an intermediate length distribution (955 bp; Wood et al. 2002)⁶. All mean intergenic distances for S. pombe are larger than the corresponding mean distances for S. cerevisiae, although the difference for divergent genes is larger and the difference for convergent genes is smaller. Intergene regions for S. pombe have a mean of 952 bp, compared to the S. cerevisiae mean of 515 bp. Several explanations can account for this observation. The untranslated regions (UTRs) may be systematically longer in S. pombe than in S. cerevisiae. Mean lengths of identified 5’ UTRs are 178 nucleotides and 95 nucleotides, and 3’ UTRs 225 and 180 nucleotides, for S. pombe and S. cerevisiae respectively (S. pombe data, ftp://ftp.sanger.ac.uk/pub/yeast/pombe/UTRs; S. cerevisiae data, E. Hurowitz, personal communication)⁷. Although S. pombe UTRs are apparently longer, this difference would not account for the species differences for intergenic length. In addition, the 5’ > 3’ bias cannot be attributed to longer 5’ UTRs, as 3’ UTRs appear to be, on average, longer. The promoter regions may be more complex and therefore longer in S. pombe, although there is no evidence to support this at present. However, there is evidence that classes of promoter proximal mammalian transcription activation domain, which are non functional in S. cerevisiae, are functional in a proximal promoter context in S. pombe, suggesting there may be a closer relationship with higher eukaryotic promoters (Remacle et al. 1997). Replication origins are known to be more extended in S. pombe than in S. cerevisiae (see section 2.8 below). There are also annotated examples of extended low complexity gene free regions in S. pombe (around 10 per chromosome) which, at 4-8 kb, fall outside the normal distribution of lengths associated with average intergenic regions (Wood et al. 2002). These gene free tracts are usually flanked by divergently oriented genes and exhibit a (G-C) / (G+C) base compositional bias which switches strand in the centre of the gene free region. One such region in cosmid c4G8 corresponds to a prominent meiotic DNA break site (Young et al. 2002). No such gene free regions have been identified in S. cerevisiae. Intergenic regions are also more AT rich (69.4%) than the genome average (64%; Dai et al. 2005).

5 The numbers reported here exclude the small complementary microRNAs for centromeric function (Volpe et al. 2002).

6 Intergene distance is calculated from the stop and/or start codons between adjacent genes.

7 The S. cerevisiae average sizes were obtained from RACE-PCR experiments which have higher success rates for genes with shorter UTRs, so the average reported here may be lower than the true genome average.
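The (G-C) / (G+C) compositional bias described for the gene free tracts can be computed in sliding windows, and a strand switch then appears as a change of sign. The sketch below is illustrative only; the window size, step and function names are assumptions and are not taken from the original analysis.

```python
def gc_skew(seq: str) -> float:
    """(G - C) / (G + C) for one sequence; 0.0 when it contains no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def windowed_gc_skew(seq: str, window: int, step: int):
    """Yield (window_start, skew) for successive windows along seq."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, gc_skew(seq[start:start + window])


def skew_switch_points(seq: str, window: int, step: int):
    """Window starts at which the skew changes sign relative to the
    previous window with a non-zero skew."""
    switches, previous = [], 0.0
    for start, skew in windowed_gc_skew(seq, window, step):
        if skew and previous and (skew > 0) != (previous > 0):
            switches.append(start)
        if skew:
            previous = skew
    return switches


# Toy tract: G-rich on the left half, C-rich on the right half.
toy_tract = "GGGA" * 5 + "CCCA" * 5
print(skew_switch_points(toy_tract, window=8, step=4))
```

For real 4-8 kb tracts a window of a few hundred bases would be a more sensible starting point than the toy values used here.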

Publicly available EST data and mRNAs in the EMBL database have been mapped on to the genome sequence using EST_GENOME (Morimyo et al. 1997; Kulikova et al. 2004; Mott 1997). When sequence quality was sufficient to determine the transcriptional start or end, these have been manually curated to create features for untranslated regions. This dataset provides 370 5’ UTRs and 742 3’ UTRs which are available to download from http://www.sanger.ac.uk/Projects/S_pombe/DNA_download.shtml. These features provide a preliminary dataset of truly transcribed regions for a subset of genes by providing delimiters between gene boundaries and truly intergenic regions.
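The three orientation classes used throughout this section (divergent, convergent and tandem) can be assigned from the strands of adjacent genes, with the intergene distance taken between the coding sequence boundaries. The sketch below is a simplified illustration under that assumption; the `Gene` record, its field names and the toy coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # 1-based leftmost coordinate of the coding sequence
    end: int     # 1-based inclusive rightmost coordinate
    strand: str  # "+" or "-"


def classify_intergenic(genes):
    """Return (left, right, length_bp, orientation) for each adjacent pair."""
    ordered = sorted(genes, key=lambda g: g.start)
    regions = []
    for left, right in zip(ordered, ordered[1:]):
        length = right.start - left.end - 1
        if length <= 0:
            continue  # overlapping or abutting genes: no intergenic region
        if left.strand == "-" and right.strand == "+":
            orientation = "divergent"   # two promoters
        elif left.strand == "+" and right.strand == "-":
            orientation = "convergent"  # two downstream regions
        else:
            orientation = "tandem"      # one promoter, one downstream region
        regions.append((left.name, right.name, length, orientation))
    return regions


# Toy chromosome whose spacings mirror the mean lengths quoted in the text.
toy_genes = [
    Gene("a", 1, 1000, "-"),
    Gene("b", 2342, 3000, "+"),
    Gene("c", 3559, 4000, "-"),
    Gene("d", 4956, 5200, "-"),
]
for region in classify_intergenic(toy_genes):
    print(region)
```

Grouping the resulting lengths by orientation and averaging them would give per-class means of the kind reported above.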

2.8 Replication origins

DNA replication origins (ORIs) are specific sites within a DNA molecule where DNA replication is initiated. Researchers would usually include in this definition any ‘cis acting’ sequences which affect origin function by binding the machinery that initiates and regulates replication (Masukata et al. 2003). Replication origins have been identified in a variety of organisms including mammals, but are best studied in the two yeasts. Replication origins in S. cerevisiae are as short as 75 base pairs with an 11 base pair consensus and a number of partially redundant elements with varying distribution (Broach et al. 1983; Theis and Newlon 1997; Theis and Newlon 2001). Recent approaches based on chromatin immunoprecipitation and density labelling have predicted the distribution of 400 putative ORIs in S. cerevisiae (Wyrick et al. 2001; Raghuraman et al. 2001). In comparison, S. pombe replication origins are substantially larger and have a modular structure, possibly because more protein-DNA interactions are involved in replication initiation (Dubey et al. 1996). They require a minimum length of 0.5-1 kb and have no recognisable consensus, although they do contain asymmetric and non-asymmetric A-T stretches (Maundrell et al. 1988; Clyne and Kelly 1995). Like mammalian replication origins, they appear to be located preferentially upstream of RNA Polymerase II promoters (Gomez and Antequera 1999).


The first genome wide survey of potential replication origins in fission yeast showed that 90% of A+T rich islands colocalised with active ORIs (Segurado et al. 2003). The mean genomic frequency of the 384 A+T rich islands is one every 33 kb, and these all map to intergenic regions. A bias was also observed for their location in divergent transcription units, although this may be due to the larger size of these regions (see section 2.7 above). A similar number and distribution has also been observed using microarrays (C. Heichinger, personal communication). There are significant clusters of ‘replication origin associated’ AT rich islands in the centromeres, and in the subtelomeric regions of chromosomes I and II and the mating-type locus (fourfold higher than the genome average), although the significance of this is not known.

Dai and colleagues recently reported that the relative origin activity of an intergene in S. pombe is a function of its length and AT content rather than of a specific nucleotide sequence, and that sequence properties ascribed to origins are therefore general characteristics of intergenic regions (Dai et al. 2005). They propose that the intergenes which function as origins form a broad continuum, and demonstrate that any intergenic region over ~900 bp in length and greater than 70% AT (close to the intergene average) is likely to have origin activity. A stochastic model is proposed, in which the binding affinity of the origin recognition complex (ORC) subunit Orc4 is dependent on both AT content and length, in a departure from the classical model which predicts binding to a small number of sites with high specificity. This model explains the observation that the origins studied so far in S. pombe are not used in every cell cycle (because the number of potential origins greatly exceeds the number of ORC molecules), and may also explain some features of origins in metazoans.
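The length and AT content criteria can be expressed as a simple filter over intergenic sequences. The sketch below is a deliberately crude stand-in for the stochastic model: it applies hard thresholds, and it takes the length threshold as roughly 900 base pairs, an assumption consistent with the mean intergene length of 952 bp given in section 2.7. The function names and default values are hypothetical.

```python
def at_fraction(seq: str) -> float:
    """Fraction of A and T bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq) if seq else 0.0


def likely_origin(seq: str, min_length: int = 900, min_at: float = 0.70) -> bool:
    """True if an intergenic sequence exceeds both thresholds.

    Assumption: hard cut-offs are used for illustration only; the model
    described in the text is a continuum, not a yes/no classification.
    """
    return len(seq) > min_length and at_fraction(seq) > min_at


candidates = {
    "long_at_rich": "AT" * 400 + "GC" * 60,  # 920 bp, about 87% AT
    "short_at_rich": "AT" * 300,             # 600 bp, 100% AT
    "long_balanced": "ATGC" * 300,           # 1200 bp, 50% AT
}
for name, seq in candidates.items():
    print(name, likely_origin(seq))
```

A version closer to the proposed model would return a score that increases with both length and AT content instead of a boolean.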

Although the numbers of predicted ORIs in S. pombe (385) and S. cerevisiae (400) are very similar, they do not appear to be similar in composition. S. pombe ORIs are more similar to mammalian ORIs in their lack of consensus sequences, presence of multiple dispersed partially redundant elements, and preference for association with promoter regions. These preliminary global analyses of replication will provide a framework to study the contribution of replication origin structure and function to replication dynamics and for the dissection of organismal similarities and differences.

2.9 Mitochondrial genome

The mitochondrial genome of fission yeast is considerably smaller than that of budding yeast (20 kb versus 85.8 kb) and contains a smaller number of protein coding genes (11 versus 28; Lang et al. 1987; Foury et al. 1998). However, in S. cerevisiae, 9 of these appear to be complete orphan genes of small size (<134 amino acids) and are likely to be spurious ORFs. The remainder of the non-conserved genes are involved in intron metabolism and are absent from some close relatives of S. cerevisiae. Therefore, the ‘ancient’ coding portion of the mitochondrial genome is almost identical between the two yeasts.


2.10 Pseudogenes

The incidence of pseudogenes is relatively low for both yeasts. The fission yeast genome database (GeneDB S. pombe, http://www.genedb.org) reports a total of 47 pseudogenes (9 of which are transposon or wtf related) compared to 22 pseudogenes reported by SGD for the S288C strain of S. cerevisiae (http://www.yeastgenome.org/ 14th July 2004). In S. pombe, the majority of genes designated as pseudogenes have more than one frameshift; some are extremely degraded and were only identified as former coding sequences by BLASTX sequence similarity searches. It is not presently possible to identify genes which may be pseudogenes due to inactivated promoters. It is also possible that some genes reported as pseudogenes may in fact be sequencing errors resulting from spontaneous mutations in the clone libraries. Apparently frameshifted genes (for example spa1 in S. pombe) may also have valid translations due to ribosomal frameshifting mechanisms (Ivanov et al. 1998; Zhu et al. 2000). A number of S. pombe annotated pseudogenes have been shown to be transcribed (Mata et al. 2002; Chen et al. 2003), and in human the RNA of an expressed pseudogene has been shown to have a regulatory function (Hirotsune et al. 2003). The current inventories of pseudogenes for both species should therefore be evaluated with caution.

2.11 Transposable elements

LTR (long terminal repeat) retrotransposons and endogenous retroviruses constitute variable proportions of their host genomes, and genome sequencing has revealed a diverse range of organismal transposon content. The availability of the complete fission yeast genome sequence has provided the opportunity to perform a comprehensive analysis of the entire complement of transposable elements with respect to their chromosomal distribution, insertion site preferences and evolution (Bowen et al. 2003). Only two families of transposons (Tf1 and Tf2) belonging to the Ty3/Gypsy group were known to exist in S. pombe (Levin et al. 1990; Levin 1995). Homology based methods confirmed that the S. pombe sequenced strain contained only 13 full length copies of a single family of active transposon (Tf2) and that there were no Tf1 elements in the laboratory strain. The transposon complement is therefore substantially lower than the 50 LTR-retrotransposons reported for budding yeast (Kim et al. 1998). It has been speculated that this difference may be due to the loss of the RNAi machinery from S. cerevisiae because of the involvement of RNAi in the removal of duplicated sequence (Aravind et al. 2000). In addition, 274 intact and 75 fragmented (<200 base pairs) solo LTRs and five transposon fragments, marking the sites of former transposition events, were identified. The intact LTRs were classified into at least three large groups: (i) those closely related to Tf2 (35); (ii) those closely related to Tf1 (28); and (iii) many more distantly related small families (111). Some of these more distant lineages were identical or highly similar to each other. Close examination revealed that these were all subtelomerically located and that their similarity was a result of telomeric duplications. This is consistent with the increased sequence similarity at these locations (see section 2.3). In total, transposon derived sequences account for ~133,000 base pairs or 1.1% of the sequenced portion of the genome compared to 2.4% for S. cerevisiae.

Experimental studies of insertion site preference in S. pombe have shown that the Tf1 element has a significant preference for insertion into intergenic sequence within 300 nucleotides of the 5’ end of a coding sequence (CDS; Behrens et al. 2000; Singleton and Levin 2002). Bowen and colleagues provide complementary studies using a bioinformatics approach to support the previous experimental data for integration site preference (Bowen et al. 2003). Analysis of the 186 intact transposons and LTRs revealed that all insertions were exclusively intergenic. The frequency of insertion into intergenic regions proximal to CDS in tandem, divergent or convergent orientation was analysed. A positive correlation was detected between the number of expected transposon insertions and the number of expected RNA polymerase II promoters, in different spatial contexts. Insertions into intergenic regions between convergent genes, which contain no promoters, were found to be statistically under-represented (incorporating corrections for size differences). Furthermore, the distance between each insertion and the end of the nearest ORF was significantly biased for insertions associated with the 5’ end of genes, the majority clustering between 100 and 400 base pairs of the 5’ end of the neighbouring CDS. Therefore, in contrast to S. cerevisiae, where transposons appear to target upstream of RNA polymerase III transcribed genes by specifically interacting with a component of the RNA polymerase III transcription machinery (Chalker and Sandmeyer 1992; Yieh et al. 2000), S. pombe transposon insertion sites appear to show an increased preference for RNA polymerase II promoters. S. cerevisiae is reported to contain 344 transposition-derived insertions or their remnants (Kim et al. 1998), so the overall numbers of transposons, or transposon footprints, are similar for these two yeasts.
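The distance measurement described above can be sketched in a few lines of Python. This is a minimal illustration only, not the method used by Bowen et al.: the function name `nearest_gene_end`, the tuple layout used for genes and all coordinates are hypothetical, and the insertion is assumed to lie in intergenic sequence.

```python
def nearest_gene_end(insertion, genes):
    """Return (distance, end_type, gene_name) for the gene end closest to `insertion`.

    `genes` is a list of (name, start, end, strand) tuples with start < end in
    chromosome coordinates and strand "+" or "-" (a hypothetical layout).
    """
    candidates = []
    for name, start, end, strand in genes:
        # On the reverse strand the 5' end is the larger coordinate.
        five_prime, three_prime = (start, end) if strand == "+" else (end, start)
        candidates.append((abs(insertion - five_prime), "5'", name))
        candidates.append((abs(insertion - three_prime), "3'", name))
    # Tuples compare by distance first, so min() picks the closest gene end.
    return min(candidates)


# Toy example: two convergent genes (hypothetical coordinates).
genes = [("geneA", 1000, 2000, "+"), ("geneB", 3000, 4000, "-")]
upstream_of_a = nearest_gene_end(800, genes)
upstream_of_b = nearest_gene_end(4200, genes)
between_convergent = nearest_gene_end(2500, genes)
```

Collecting the distance and end type for every insertion would give the kind of distribution in which a 5’ bias, or a deficit between convergent genes, could then be tested.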

During sequencing and annotation, a novel species specific high copy number family was identified (Wood et al. 2002). Its members were named wtf (for with Tf) because many of them were flanked by Tf2-type LTRs. There are 25 sequences related to the wtf family, which was identified as the largest family of S. pombe specific genes in an analysis of lineage specific gene expansions (Lespinet et al. 2002). The only experimental data available have shown these genes to be upregulated up to 100 fold during meiosis (Watanabe et al. 2001; Mata et al. 2002). Surprisingly, 23 of the 25 copies were located on chromosome III.

Bowen et al. also analysed the genome wide distribution of insertion elements and showed that chromosome III contained almost twice as many insertions as the other two chromosomes. Further investigation revealed that the association of wtfs with LTRs was responsible for 80% of the over-representation of LTRs on this chromosome. The mechanism of expansion of the wtf family is currently unclear, but it now appears that the targeted integration of Tf elements and subsequent duplications have contributed to their association with LTRs. It is interesting to speculate whether the higher transcription level of the wtfs may have contributed to the accumulation of nearby Tf insertions, analogous to the reported preference of HIV-1 integrations for actively transcribed genes (Bowen et al. 2003; Schroder et al. 2002). Furthermore, the integrase of Tf1 and Tf2 contains a chromodomain, which is implicated in chromatin remodeling via its interactions with histones (Malik and Eickbush 1999). It is therefore possible that the insertional preference of Tf elements for actively transcribed genes is mediated by this chromodomain (Bowen et al. 2003). It appears that, despite the low abundance of transposable elements, the study of transposition mechanisms and insertion site preference in S. pombe will continue to be informative regarding the contribution of transposition to the shaping of genome content.

2.12 Genome features summary

A summary of the genome features and contents described here is presented in Table 1. Data are accessible via the GeneDB database (http://www.genedb.org/S_pombe/; Hertz-Fowler et al. 2004), or the S. pombe project page at the Wellcome Trust Sanger Institute (http://www.sanger.ac.uk/Projects/S_pombe/; WTSI).

3 Genome and proteome sequence comparisons

3.1 Introduction

Genome and proteome sequence comparisons provide insights into the functional similarities and differences, and evolutionary relationships, between the species compared. To fully elucidate the events operating on evolutionary timescales, it is necessary to compare sequences with different degrees of evolutionary relatedness. Distantly related genomes reveal ancient events and relatively slow changes, whereas more closely related genomes reveal recent and more rapid changes. Comparison of genomes identifies genes and other functional elements, regions of genome duplication and syntenic regions with other organisms. S. pombe is too divergent from currently available fungal genomes for direct genome comparisons to be informative in terms of genome rearrangements or content. However, the availability of the predicted proteomes of these two eukaryotic models has allowed the comparison of their protein complements to assess the similarities and differences in both size and content. Preliminary proteome comparisons, using pairwise sequence similarity, provide an overview of the potential conserved and species specific components of an organism. More specific classification of proteins, according to their potential evolutionary relationships, provides a natural framework for comparative genomics, functional annotation and evolutionary analysis. In this section, a summary of the initial global genome and proteome comparisons, and an overview of the more granular classification of orthologs, is presented.


Table 1. Comparative genome features and contents of the S. pombe and S. cerevisiae genomes

Feature                                     S. pombe                            S. cerevisiae
Genome size (sequenced/total)               12.5 Mb (~14.1 Mb)                  12.1 Mb (~13.0 Mb)
Chromosome number                           3                                   16
Chromosome size range                       3.5-5.7 Mb                          0.2-1.5 Mb
Centromere size                             35-110 kb                           ~0.15 kb
Gene density (average bp/gene)              ~2,530 bp                           2,090 bp
Average gene length                         ~1,430 bp                           1,460 bp
Overall GC content                          36.0%                               38.3%
GC content in protein coding sequence       39.6%                               39.6%
GC content in intergenic sequence           30.6%                               -
Intron number                               ~4,730                              ~272
Genes with introns                          2260 (46%)                          257 (5%)
Average intron length                       82 bp (29 bp-819 bp)                216 bp
Maximum number of introns/gene              15                                  3
Gene number (protein coding)                4,973                               6606
Gene number (ex dubious)                    4883                                5777
tRNA genes                                  195                                 288
5.8S, 18S, 26S rRNA genes                   100-120 tandem repeats (2 arrays)   ~150 tandem repeats (1 array)
5S rRNA genes                               32 dispersed genes                  1-200 in rDNA repeats
Small nuclear RNA genes (snRNAs)            7                                   7
Small nucleolar RNA genes (snoRNAs)         34                                  68
Other RNA encoding genes                    8                                   4
Inter-gene regions (mean/median)            952 bp/423 bp                       515 bp/200 bp
Mean distance between divergent genes       1341 bp                             570 bp
Mean distance between tandem genes          955 bp                              586 bp
Mean distance between convergent genes      558 bp                              339 bp
UTR length 3’                               225                                 180
UTR length 5’                               178                                 95
Replication origins                         ~400                                ~400
Replication origin sizes                    0.5-1 kb                            75-150 bp
Mitochondrial genome                        20 kb (11 genes)                    85.8 kb (28 genes)
Pseudogenes (excluding wtf)                 39                                  22
Tf type transposons                         13/2 pseudo                         59
Long terminal repeats (LTRs) solo intact    274                                 268
wtf elements (with tf2 type LTRs)           25/9 pseudo                         0


3.2 Genome sequence comparisons

Numerous tracts of co-linear duplicated genes are detected in S. cerevisiae and were proposed to be the remnants of a whole genome duplication event in its evolutionary history (Wolfe and Shields 1997). The availability of the genomes of syntenic species, which diverged both before and after the proposed split, has provided irrefutable evidence for this event (Wong et al. 2002; Kellis et al. 2004; Dietrich et al. 2004). Similar searches for tracts of conserved gene order did not reveal evidence for large scale genome duplications in S. pombe (Keogh et al. 1998; Wood et al. 2002). Synteny is not detectable between fission yeast and any other available fungal genomes at the time of writing; any relationships have been obscured by chromosomal rearrangements, gene duplications and losses. However, a small number of segmental duplications are detectable in S. pombe, as blocks of intra genome conserved gene order at the sequenced subtelomeric regions of chromosomes I and II (see section 2.3). Thirty two tandemly repeated genes are also recorded.

3.3 Proteome sequence comparisons

Preliminary proteome comparisons between S. pombe, S. cerevisiae and C. elegans indicated that around 4050 (83%) S. pombe genes were common between the two yeasts (3281, or 67%, of these also common to C. elegans; Wood et al. 2002). A small number (145, or 3%) were reported as present in C. elegans but not S. cerevisiae, and 681 (14%) were unique to S. pombe. Reciprocal comparisons revealed that a larger number (4523) of S. cerevisiae proteins were conserved in S. pombe (3605 also in C. elegans) and 1104 were unique to S. cerevisiae. The number of genes conserved only between the two fungal species was greater in S. cerevisiae than in S. pombe (918 versus 769). These differences can only be explained by a greater number of duplicated genes being present in S. cerevisiae. The number of unique genes was also greater for S. cerevisiae than S. pombe (1104 versus 681). This difference is primarily due to an increased number of duplicates in S. cerevisiae. However, it is possible that an increased number of newly evolved genes not generated by a duplication event, or a larger number of horizontally transferred genes, are also contributory factors (see footnote 8).
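The figures quoted in this paragraph can be cross-checked with simple arithmetic. The sketch below uses only the counts given in the text; the dictionary keys are illustrative, and the denominator for the S. pombe percentages is an assumption (the sum of the three disjoint categories), since the text does not state it.

```python
# Counts quoted in the text (Wood et al. 2002); the dictionary keys are illustrative.
pombe = {"shared_with_cerevisiae": 4050, "of_which_also_in_worm": 3281,
         "worm_only": 145, "unique": 681}
cerevisiae = {"shared_with_pombe": 4523, "of_which_also_in_worm": 3605,
              "unique": 1104}

# Genes conserved between the two yeasts but not found in C. elegans.
pombe_fungal_only = pombe["shared_with_cerevisiae"] - pombe["of_which_also_in_worm"]
cerevisiae_fungal_only = (cerevisiae["shared_with_pombe"]
                          - cerevisiae["of_which_also_in_worm"])

# Assumption: the S. pombe percentages are taken over the sum of the three
# disjoint categories (shared with S. cerevisiae, C. elegans only, unique).
pombe_total = pombe["shared_with_cerevisiae"] + pombe["worm_only"] + pombe["unique"]
pombe_percent = {key: round(100 * count / pombe_total) for key, count in pombe.items()}
```

Under that assumption the rounded percentages agree with those quoted above, and the two subtractions reproduce the 769 and 918 'fungal only' genes.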

Further analysis based on protein clustering estimated the numbers of multi-member families versus singletons in both yeasts. This showed that S. cerevisiae has around 716 protein coding genes belonging to multi-member families but S. pombe has only around 361, supporting the conclusion that more duplicated genes are present in S. cerevisiae. These observations, and the absence of any co-linear duplicated segments, indicate that S. pombe is unlikely to have undergone any whole genome duplication events since it separated from the Saccharomyces lineage, an estimated 300-400 My ago.

8 ‘Unique’ is used here only with respect to the two species compared.


3.4 Orthologous groups

The concept of homologs (genes descended from a common evolutionary ancestor), and the implicit inference of evolutionary history which accompanies this concept, originated from the seminal work of Ohno in the 1970s (Ohno 1970). Homologs are further classified as orthologs (direct evolutionary counterparts by vertical descent, i.e. the same gene in a different species) and paralogs (genes which have arisen by duplication events within a genome after a speciation event; Fitch 1970). These concepts are now routinely used in global genome comparisons and annotation protocols (Tatusov et al. 1997; Chervitz et al. 1998).

The identification of candidate orthologs, and orthologous groups between species, is a prerequisite for the rigorous evaluation of the nature and frequency of the events in their evolution affecting protein number and type; specifically gene duplications, lineage specific gene loss, gene divergence and horizontal transfer. Accurate orthology mapping between S. pombe and S. cerevisiae provides a framework for the reconstruction of the evolutionary events giving rise to these two species.

Preliminary global analysis using BLAST with a threshold cut-off provided initial estimates of the level of protein conservation and redundancy between the two yeasts. Analyses of this type provide a useful overview but are unsuitable for the transfer of functional information because of a failure to detect many similarities (false negatives) and the inability to distinguish spurious matches (false positives). Functional transfer based on top scoring BLAST hits is only suitable for a proportion of any proteome, even when the alignment appears to be significant, and should be applied with extreme caution in any annotation pipeline. For robust functional annotation, orthologous relationships should ideally be identified by phylogenetic analysis of entire families, but evolutionary inferences of orthology can usually be made without phylogenetic methods. A number of resources for automated ortholog detection are available. The most commonly used are COGs/KOGs, Inparanoid and OrthoMCL (Tatusov et al. 2003; Koonin et al. 2004; Remm et al. 2001; Li et al. 2003). These are based on initial candidate ortholog identification using pairwise BLAST comparisons followed by different methods (clustering or reciprocal best hit identification) to generate orthologous groups. Differing output and coverage indicate that these methods are currently sub-optimal (Li et al. 2003).

Most algorithms are ultimately dependent on reciprocal best hits, which provide a good approximation of orthology. However, not all orthologs are reciprocal best hits, or even best hits. Extremely divergent proteins with lower levels of sequence conservation can often generate spurious matches, and obscure truly homologous relationships. The proportion (30%) of reported KOG orthologous clusters with unexpected phyletic patterns may be artificially high as a result of this restriction. Lineage specific gene losses can also complicate ortholog determination by generating false positives. Finally, a global threshold cut-off for candidate ortholog identification will impose an arbitrary restriction whereby extremely divergent orthologs will not be detected.
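The reciprocal best hit criterion, and the way it can miss a genuine ortholog, can be illustrated with a short Python sketch. This is a generic illustration rather than the implementation used by any of the resources named above; the function names, protein identifiers and scores are hypothetical, and the scores simply stand in for a similarity measure such as a BLAST bit score.

```python
def best_hits(scores):
    """Map each query to its single highest-scoring subject.

    `scores` maps (query, subject) pairs to a similarity score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy scores (hypothetical): a2 and b2 may be true orthologs, but b2 scores
# higher against a1, so the pair fails the reciprocal test.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 150, ("a2", "b2"): 90, ("a3", "b1"): 60}
b_vs_a = {("b1", "a1"): 198, ("b2", "a1"): 152, ("b2", "a2"): 88, ("b1", "a3"): 61}
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

In this toy data only a1/b1 survives; a2 is left without a partner even though b2 is its best and only hit, which is the kind of case that manual inspection is intended to recover.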


3.4.1 Establishing orthology

An orthology mapping between S. cerevisiae and S. pombe has been created based on manual inspection of pairwise alignments, multiple alignments and protein clusters, using alignment seeds from numerous algorithms (including BLAST, PSI-BLAST, FASTA and Pfam-B/Domainer; Altschul et al. 1990; Altschul et al. 1997; Pearson and Lipman 1988; Sonnhammer et al. 1997). Ambiguous relationships are inspected after clustering using CLUSTAL W (Thompson et al. 1994) and identified orthologs are corroborated by experimental evidence where available. This has a number of advantages over automated methods, including increased accuracy, increased specificity, greater coverage and the ability to combine data from multiple resources, including ortholog identification software (Wood et al. manuscript in preparation).

Firstly, accuracy is increased by manual curation through improved discrimination for multi domain proteins by the inspection of domain organization. In addition, ‘fusion proteins’ (a protein in one organism which maps independently to two unrelated proteins in another organism) can be identified. For example, S. pombe Pdf1 is a fusion between palmitoyl-protein thioesterase (PPT) and dolichyl pyrophosphate (Dol-P-P) phosphatase which is proteolytically cleaved after translation. The two mature proteins are functionally connected but the domain combination is not observed in other organisms, possibly indicating a recent fusion event. The PPT is the functional homolog of the neuronal ceroid lipofuscinosis (Batten disease) protein in humans and is absent from S. cerevisiae, although the Dol-P-P phosphatase is present. These complex patterns of conservation are difficult to unravel with automated methods, which usually rely on arbitrary thresholds for the length of the similarity hit and sequence identity when identifying candidates.

Not all similarities are due to homology, and unrelated proteins can sometimes generate reciprocal best hits. Manual inspection and experimental data can be used to distinguish non-orthologous sequences and increase accuracy. Granularity can be increased by detecting orthologous pairs within cluster members. Independent orthologs can also be detected for related proteins with promiscuous domains, particularly the WD, TPR, HEAT and LRR families of repeat containing proteins. For example, KOG0266 includes three S. cerevisiae and three S. pombe proteins. This cluster can form independent orthologous groups between S. cerevisiae Cps30 and S. pombe Swd3 and between S. cerevisiae Tup1 and S. pombe Tup1 and Tup11. Uncharacterised S. cerevisiae YGL004C is more distantly related to all of the other cluster members. The discrimination of independent orthologs is crucial for accurate functional transfer based on sequence similarity.

Most importantly, increased coverage can be obtained by distant ortholog detection. Orthologous proteins show a broad distribution of sequence similarity (evolutionary rate). Not all orthologs are significantly similar, and the inspection of individual pairwise or multiple alignments can often result in the detection of truly homologous relationships which are not necessarily best hits. For example, the S. pombe/S. cerevisiae orthologous pairs Orc6/Orc6p, Rpa34/Rpa34p, Pcp1/Spc110p, Ker1/Rpa14p, and Swi5/Sae3p are not BLAST reciprocal best hits, and are not detected by KOGs or Inparanoid.


For statistically insignificant short motifs, confidence in ortholog assignments can be increased by the consideration of:

i. conserved residue type; residue properties likely to have functional significance, for example rare or charged amino acids, especially when conserved in all cluster members

ii. spatial context of alignments; correspondence of the positions of the conserved region(s) in the protein, i.e. co-linear high scoring pairs (HSPs)

iii. spatial context and conservation of other protein features; transmembrane domains, signal sequences, predicted posttranslational-modification sites

iv. correspondence of protein length

v. phylogenetic distribution and copy number; especially if conserved in a single copy in all sequenced eukaryotes

vi. functional context; supporting experimental data, for example similar knockout phenotype, or missing member of conserved stoichiometric complex

Directed searches of the orphan (non conserved) protein set can be performed to detect candidate orthologs for less conserved proteins. In most cases, multiple lines of evidence will be used to support such a prediction. The S. cerevisiae ortholog of S. pombe DNA recombination/repair protein Swi5 was identified by a directed search and proposed adjustment to the S. cerevisiae gene prediction for SAE3 (Akamatsu et al. 2003; Young et al. 2004). The orthology prediction was supported by the conserved residue type and context, length, single copy distribution in sequenced eukaryotes, and a conserved recombination defective phenotype. S. pombe Sad1 spindle pole body component is predicted to be the ortholog of S. cerevisiae Mps3 on the basis of a reciprocal BLAST hit with low significance, but is supported by co-linear HSPs, a transmembrane region in similar sequence context, and similar cellular localization. Orthologs have also been detected for members of conserved complexes by targeted searches which identified small and highly spliced genes missed by the first pass annotation. These include potential orthologs for S. cerevisiae Pop8, Ost4, Sen15, Dad3 and Sus1.

The annotation procedures outlined here remove the biologically artificial restriction of a genome wide cut-off threshold for sequence similarity and match length, and the dependence on a single algorithm. Orthology assignments can be incorporated from multiple sources (both software and experimental results). For example, recent comparison of the remaining orphan set against KOGs identified predicted orthologs for 7 sequences (listed in footnote 9). Manual inspection determined three other KOG predictions, for SPAC1687.10/YOR058C, SPAP8A3.13/YGR066C/YBR105C, and SPAC1A6.07/YLR330W, to be false positives based on additional evidence. For example, S. cerevisiae YOR058C is a microtubule associated protein and its predicted ortholog is SPAPB1A10.09 (Pfam family PF03999).

A total of 3636 S. pombe proteins and 3842 S. cerevisiae proteins have curated orthologs in the other yeast (summarised in Table 2). The remaining 1235 S. pombe proteins and 1704 S. cerevisiae proteins have no predicted ortholog in the other yeast at present. However, a number of these have homologs in other species (498 and 346 for S. pombe and S. cerevisiae respectively; see section 3.5 and footnote 10). A number of proteins in both organisms (68 and 307 for S. pombe and S. cerevisiae respectively) have conserved domains, but their respective orthologs cannot be distinguished as multiple duplications and gene losses have obscured their evolutionary relationships. The majority of these are regulatory proteins, and include a high proportion of transcription factors and proteins with RNA binding motifs. Further work and additional sequenced genomes will allow the relationships between these to be resolved.

9 SPAC6F12.08c, SPCC1620.07c, SPCC736.12c, SPAC553.06, SPAC25B8.02, SPCC1289.09 and SPBC24C6.08.

3.4.2 Orthologous relationship type and function

Ortholog identification is complicated by duplication events and can only be described accurately by multiple relationship type mappings (Table 2). Most orthologous relationships have a ‘one to one’ mapping (2396) where a single S. pombe gene maps to a single S. cerevisiae gene and vice versa. One to one mappings are usually functionally equivalent, especially when universally conserved in a single copy in most or all eukaryotic genomes. These are predominantly core ‘informational’ proteins (those involved in processes related to genome stability and maintenance, transcription, translation and biosynthetic metabolism). The remaining mappings represent instances of duplication in either one or both organisms and will be discussed in this context.

Recently duplicated genes are likely to have the same function and the most likely fate is rapid loss of one duplicate. However, duplicate genes which are retained usually have one of two fates: (i) one copy will retain the original function and the other copy will evolve (often undergoing accelerated evolution) to gain a novel function or specificity (derived function, or neofunctionalization); or (ii) the existing function is partitioned between the duplicate copies, often by differential expression or compartmentation (subfunctionalization; reviewed in Prince and Pickett 2002).

S. cerevisiae genes which can be mapped to the most recent polyploidization event can be assumed to have formed simultaneously. Approximately 16% of the S. cerevisiae gene complement (~500 pairs) is estimated to be part of a duplicate pair dating from this whole genome duplication (reviewed in Wolfe 2004). The observation that S. pombe has 912 duplicated gene products in the conserved set (and additional duplicates in the non-conserved set) implies that both yeasts have been consistently prolific in generating and retaining duplicates since their divergence (see footnote 11).

10 The number of likely protein coding S. cerevisiae sequences reported here is 5546. This is 231 fewer than the current SGD total. Some of these discrepancies are due to gene merges not reported in SGD. The remainder are all under 100 amino acids, and some appear to be spurious as they are not reported in syntenic regions of the closely related yeasts (unpublished observation).

11 It should be noted that this does not represent the absolute total for duplicates for these species, as many members of the non conserved set are also duplicated.


Table 2. Distribution of the relationship types of conserved and non conserved proteins between S. pombe and S. cerevisiae. ‘Species specific’ comprises sequence orphans, genes duplicated in only one species, and characterised genes with no identifiable ortholog. ‘Ortholog cannot be distinguished’ refers to proteins with an identifiable domain but which cannot be assigned an ortholog in the other species. ‘Conserved but not in S. cerevisiae/S. pombe’ refers to lineage specific losses in the respective organisms. ‘One to one’, ‘One to many’ and ‘Many to many’ refer to the numbers of orthologous proteins mapped from and to in the respective species.

Relationship type                                 S. pombe    S. cerevisiae
Non conserved set
  Species specific                                669         1051
  Ortholog cannot be distinguished                68          307
  Conserved but not in S. cerevisiae/S. pombe     498         346
  Subtotal                                        1235        1704
Conserved set (ortholog relationship type)
  One to one                                      2396        2396
  One S. pombe to many S. cerevisiae              328         731
  One S. cerevisiae to many S. pombe              429         202
  Many to many                                    483         513
  Total with orthologs                            3636        3842
Total predicted protein (ex dubious)              4871        5546

The number of proteins which map from a single copy in one yeast to more than one copy in the other yeast (one to many mappings) is higher for S. pombe than S. cerevisiae (328 versus 202). However, S. cerevisiae has a larger number of proteins mapped to, which is consistent with the previously observed increased number of duplicates (S. pombe 429, S. cerevisiae 731). The duplicated proteins in the ‘one to many’ set often have related, or overlapping, functions. In some cases, subfunctionalization has occurred, either by altered expression, localization or specificity.
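The relationship types in Table 2 can be derived mechanically once ortholog pairs have been assigned. The sketch below is illustrative only: it assumes that pairs are grouped into clusters as connected components, which is a simplification of the manual curation described in this section, and the function name is hypothetical. The Swd3/Cps30 and Tup1/Tup11 pairings are taken from the text; the remaining identifiers are invented for the example.

```python
from collections import defaultdict


def classify_clusters(pairs):
    """Group ortholog pairs into clusters and label each relationship type.

    `pairs` is an iterable of (pombe_protein, cerevisiae_protein) tuples.
    Returns a list of (label, n_pombe, n_cerevisiae), one entry per cluster.
    """
    # Tag each name with its species so identical names do not collide.
    neighbours = defaultdict(set)
    for sp, sc in pairs:
        neighbours[("sp", sp)].add(("sc", sc))
        neighbours[("sc", sc)].add(("sp", sp))

    seen, clusters = set(), []
    for start in neighbours:
        if start in seen:
            continue
        # Depth-first walk to collect one connected component.
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(neighbours[node] - members)
        seen |= members
        n_sp = sum(1 for species, _ in members if species == "sp")
        n_sc = len(members) - n_sp
        if n_sp == 1 and n_sc == 1:
            label = "one to one"
        elif n_sp == 1:
            label = "one S. pombe to many S. cerevisiae"
        elif n_sc == 1:
            label = "one S. cerevisiae to many S. pombe"
        else:
            label = "many to many"
        clusters.append((label, n_sp, n_sc))
    return clusters


pairs = [
    ("Swd3", "Cps30"),                              # from the text
    ("Tup1", "Tup1"), ("Tup11", "Tup1"),            # from the text
    ("p1", "C1"), ("p1", "C2"),                     # hypothetical
    ("x1", "Y1"), ("x1", "Y2"), ("x2", "Y1"),       # hypothetical
]
clusters = classify_clusters(pairs)
```

Summing the per-species counts within each label gives the 'mapped from' and 'mapped to' columns of the table; for example, every 'one S. pombe to many S. cerevisiae' cluster contributes one protein to the S. pombe column and two or more to the S. cerevisiae column.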

Parallel duplications (those which appear to have duplicated independently since divergence in both lineages) account for 483 S. pombe proteins and 513 S. cerevisiae proteins belonging to 193 orthologous clusters (compared to 202 duplicated in S. pombe only, and 328 duplicated in S. cerevisiae only). These ‘many to many’ duplicates are predominantly involved in monitoring or responding to nutrients or specific environmental stresses, or are signalling pathway components. Specifically, cell surface glycoproteins implicated in the assimilation and catabolism of nutrients (proteases, glycosyl transferases, amylases etc.) and membrane transporters are the most highly represented. Although the functions of the parallel duplicates are usually related, they are sometimes involved in different processes. For example, members of the expanded glycosyl transferase 48 family in S. pombe are variously required for normal growth and sporulation. Some expanded families appear more likely to be reutilised in different contexts by species specific adaptations. Annotation transfer should therefore be more conservative when mappings are multiple.

After the removal of ribosomal proteins (44 clusters), histones (3 clusters) and translation elongation and initiation factors (5 clusters), informational proteins are almost wholly absent from this set of 193 parallel duplicated clusters. In addition, the informational duplicates are frequently highly similar, or even identical, whereas non-informational duplicates tend to be more divergent. The frequency of occurrence of duplicates, and lack of divergence for these particular gene products in most genomes, implies that mechanisms exist for the maintenance of copy number and similarity. Several lines of experimental evidence have been presented and mechanisms proposed to support this (Koszul et al. 2004; Prado et al. 2005; Pyne et al. 2005).

The S. pombe/S. cerevisiae curated ortholog mapping described here provides an inventory of potential orthologous sequences between these two species. By using a combination of methods, approximating to a natural classification, greater sensitivity and selectivity for the detection of orthologs and paralogs can be achieved to provide a rigorous and comprehensive inventory based on evolutionary relatedness. The nature of ortholog detection for divergent pairs (biological knowledge, multiple software and methods) makes automation difficult. However, novel protein families identified during ortholog detection are submitted to the Pfam protein family database, and the Hidden Markov Models (HMMs) created for these divergent gene families will be useful for the detection of candidate orthologs (in combination with other methods) in other genomes.

This dataset will continue to be extended by the identification of further distant orthologs and refined by the inclusion of intermediate species as they become available. However, it is already providing a rigorous dataset for applications including annotation by functional transfer, comparative analysis, evolutionary analysis and hypothesis development (see footnote 12). Future analysis of the nature of the evolutionary events shaping these two genomes will determine more fully how the biological capabilities of these two organisms are manifested in their respective protein complements.

3.5 Lineage Specific Gene Loss

Comparative analysis of S. pombe and S. cerevisiae identified lineage specific gene losses as a major contributor to the shaping of eukaryotic genome content (Aravind et al. 2000). This analysis identified approximately 300 genes which were either lost from, or diverged beyond expectation in, S. cerevisiae but present in S. pombe. A large number of these genes were also conserved in other non-fungal eukaryotes. Co-elimination of functionally connected groups in S. cerevisiae, including some subunits of the signalosome and the spliceosome and all components of the RNAi machinery, was recorded. Some of the proposed gene losses reported by Aravind and colleagues, including S. cerevisiae MEC3 and DDC1, which are the functional orthologs of S. pombe Hus1 and Rad9 respectively, would be more appropriately described as diverged beyond expectation (Sunnerhagen 2002). Manual inspection of alignments and protein clusters has since identified 498 protein coding genes which are absent from S. cerevisiae but conserved in S. pombe and other species, and 346 protein coding genes conserved in S. cerevisiae but absent from S. pombe. Sequences absent from S. pombe but present in S. cerevisiae are more often fungally conserved (and may therefore have evolved since the divergence of the two yeasts, or be rapidly evolving), while those absent from S. cerevisiae are frequently universally eukaryotically conserved (see footnote 13).

12 The pre-publication ortholog table is available on request.

3.6 Orphan and species-specific sequences

One of the most unexpected findings of the S. cerevisiae genome project was the sheer number of completely unstudied genes. Only 40%-50% of identified genes could be assigned a preliminary process or function from similarity or experimentation. A staggering ~30% of the gene set had remained elusive to genetic or biochemical techniques in S. cerevisiae and appeared to have no homolog in any other sequenced species; these became known as orphans (Oliver et al. 1992; Goffeau et al. 1996).

The existence of orphans can only be attributed to: (i) spurious ORFs which are not protein coding; (ii) the acquisition of novel species specific functions by the generation of de novo genes and proteins; or (iii) rapidly evolving proteins for which the sequence similarity between the available species is obscured, and species closely related enough to detect orthologs have yet to be sequenced (Wood et al. 2001).

Over the past decade the number of orphans in all species has decreased rapidly, either through experimentation, the detection of distant orthologs or the sequencing of additional species. This is illustrated by the sequencing of the Saccharomyces (sensu stricto) quartet which resulted in only 18 genes being identified as Saccharomyces cerevisiae specific (Kellis et al. 2003). However, many of these now form numerous largely Saccharomyces and hemiascomycete specific families.

There is an accumulating body of empirical and conjectural evidence that many apparent orphans or phylogenetically restricted genes are more rapidly evolving than broadly conserved genes (Copley et al. 2003). These genes also appear to be frequently implicated in processes which involve interacting with and monitoring external environmental signals. The sequence of the close S. cerevisiae relative K. lactis identified orthologs for previous orphans and showed that sequence similarity was, on average, lower for these than for more ubiquitously conserved genes (Ozier-Kalogeropoulos et al. 1998). Gaillardin and colleagues indicated that hemiascomycete specific proteins are highly represented in the functional classes of cell wall organization, extracellular and secreted proteins and transcriptional regulators, suggesting that these functional groups diverge more rapidly than other classes of protein (Gaillardin et al. 2000). Reports of rapid divergence of genes involved in taxon specific processes are not confined to fungi. Since the divergence of the mosquito Anopheles gambiae and the fruit fly Drosophila melanogaster, proteins involved in environmental defenses and signal transduction have evolved faster on average than those involved in catalysis and maintenance of cellular structural integrity (Zdobnov et al. 2002; Domazet-Loso and Tautz 2003). Similarly, in comparisons between pufferfish and human, genes related to immunity and gametogenesis were identified as rapidly evolving (Aparicio et al. 2002).

13 These lists are accessible via GeneDB http://www.genedb.org/shortcuts.jsp.

There are now fewer than 500 complete orphans (less than 10% of the protein complement) remaining in S. pombe for which experimentation has provided no clues about the process or function, and similarity has identified no orthologs or conserved domains. The majority of these are potential plasma membrane or cell surface molecules (based on sequence analysis of potential transmembrane domains, GPI anchors, N-terminal signal sequences and glycosylation sites), a class often identified as rapidly evolving and often involved in specific environmental adaptations.

Although many orphans are likely to be taxon specific adaptations, the detection of distant similarities between S. pombe and S. cerevisiae continues to reduce the S. pombe orphan set by identifying gene families which, although very divergent, are universally conserved ‘core’ genes. These include orthologous clusters containing S. pombe Rec10, Hop1, Spc24, Spc25, Sgo1 and Nse1 (Lorenz et al. 2004; Asakawa et al. 2005; Kitajima 2004; Fujioka et al. 2002). Often, these divergent orthologs are part of large proteinaceous complexes, for example those involved in chromosome synapsis and segregation. It is possible that the absence of interactions with invariable organic compounds (macromolecules, cofactors, substrates) reduces the selective pressures resulting in sequence conservation, because sequences can evolve via complementary mutations in interacting partners. Many components of these large complexes do not appear to be conserved. Detection of distant orthologs will almost certainly further reduce the orphan set to reveal the truly genus specific components of these species.

4 Comparative and functional genomics

4.1 Gene expression studies

The development of microarray technologies enabling the analysis of thousands of expression probes in parallel has provided a mechanism to derive and test broad hypotheses on a genome wide basis, through the study of global expression profiles for defined developmental or lifecycle stages or under specific environmental conditions (DeRisi et al. 1997). The effect of perturbations to these systems (either natural or induced) can also be evaluated. Moreover, the integrated analysis of microarray expression data not only provides insights into global transcription patterns, but may also provide insights into function, as co-expressed genes are likely to be involved in similar processes (Eisen et al. 1998).

S. pombe microarray data and analyses are now available for a number of biological processes fundamental to cell survival. These include sexual development and meiosis (Mata et al. 2002), stress responses (Chen et al. 2003), and the mitotic cell cycle (Rustici et al. 2004). The availability of complementary microarray datasets for S. cerevisiae, and a curated inventory of orthologous pairs, also allows the comparative analysis of these transcriptional programs.

During the fission yeast transcriptional program for sexual development, almost 2000 genes were significantly up-regulated in four temporal classes corresponding to the four main stages of sexual differentiation (Mata et al. 2002). Five chromosomal regions were highly enriched for meiotically induced genes. Significantly, four of these regions were close to the usually transcriptionally inactive regions at the telomeres. This raises the possibility that spatial arrangement has a role in the activation of clusters of genes in this process. Of all conditions studied, genes up-regulated during sexual development show a lower proportion conserved between the two yeasts (Mata and Bähler 2003; see also sections 3 and 4.3.3). Both of these results are consistent with the observation that the subtelomeric regions of S. pombe, and other eukaryotes, harbour an increased density of apparently species specific families (see also section 2.3). The observed up-regulation of telomerically encoded and species-specific genes at meiosis may therefore also be significantly correlated.

The evaluation of transcriptional responses to environmental stress defined a core environmental stress response (CESR) in S. pombe common to all, or most, stresses (Chen et al. 2003). A substantial overlap between these and the CESR genes of budding yeast was demonstrated, showing that many stress induced changes are evolutionarily conserved. Finally, comparisons of global expression data for the cell-cycle control of transcription have revealed conservation of transcription factors between fission yeast and budding yeast, yet major differences in regulatory circuits (Rustici et al. 2004). Periodic transcription appeared not to be conserved, except for a core set of ~40 genes expected to be critical for cell cycle control.

Transcriptional control may be the primary mechanism for gene regulation but this operates at multiple levels from the sequence level (i.e. recognition and binding of transcription factors), to the chromatin level (i.e. histone modification status) and the nuclear level (based on the 3D compartmentation of the genome in the nucleus; reviewed in van Driel et al. 2003). Gene expression is also controlled at additional levels: transcripts are regulated by their localization, processing and decay. Microarrays are being successfully exploited to evaluate various aspects of regulation by extensions to the original technology including chromatin immunoprecipitation (ChIP)-on-chip for the identification of binding sites for transcription factors and other DNA binding proteins (reviewed in Pollack and Iyer 2002). Other innovations include the analysis of polysome bound mRNA to determine global translation rates (Pradet-Balade et al. 2001), and combinatorial approaches to data analysis. The first S. pombe experiments correlating spatial genome expression patterns with specific chromatin modifiers have identified telomeric clustering of some of the target genes (Hansen et al. 2005). Experiments using ChIP-on-chip and polysome bound RNA are also underway (personal communication, J Bähler) and will provide a wealth of data for the reconstruction of regulatory networks.

Microarray experiments using the yeast models will undoubtedly continue to be informative in terms of the biology of unicellular eukaryotes, and in providing a framework for evaluating what can be successfully achieved using microarray analysis for the understanding of the gene expression programmes of more complex organisms.

4.2 Regulatory sequences

The complete understanding of an organism’s functional capabilities will depend not only on the analysis of individual gene products and their interactions, but also on the concurrent identification of shared regulatory motifs in the genome. Although the prediction of regulatory motifs is substantially more difficult than gene prediction, pattern discovery methods have been used with some success to identify potential regulatory patterns in the S. cerevisiae genome (Brazma et al. 1998; Ettwiller et al. 2003). Comparative genomics approaches relying on synteny using closely related yeasts have also been successful for the Saccharomyces genus (Cliften 2001; Kellis et al. 2003). However, the lack of any sequenced yeast displaying synteny with fission yeast precludes analyses of this type at present.

There are currently around 55 experimentally verified transcription factor binding site motifs reported for S. cerevisiae (Kellis et al. 2003). However, fewer than a dozen transcription factor binding site motifs have so far been experimentally identified in fission yeast (K. Kivinen, personal communication). Despite a similar genome size, the intergenic regions are significantly larger for S. pombe than for S. cerevisiae, which may be indicative of more complex regulatory mechanisms (see section 2.7). The availability of the genome sequences of these two yeast species provides an opportunity to assess the similarities and differences by the comparison of pattern discovery methods and assessment of the number and type of motifs found by applying the same procedures to evolutionarily distant yeasts.

Additional information can be extracted from microarray data by targeted pattern discovery, based on the assumption that genes involved in the same biological processes, and genes with similar expression patterns, are more likely to share regulatory mechanisms. S. pombe and S. cerevisiae data clustered by sequence similarity, co-annotation or co-expression, coupled with evaluation of pattern significance, were evaluated for over-represented motifs (K Kivinen, PhD thesis; http://www.sanger.ac.uk/Info/theses/). Initial comparisons confirmed expectations that the two yeasts were too divergent for comparative genomics approaches using pairwise alignments of predicted regulatory regions of orthologous sequences. However, analyses based on the comparison of functionally connected genes and co-expressed genes have provided a comprehensive set of sequence patterns, many of which are likely to have regulatory roles in one, or both yeasts. Firstly, analysis of co-annotated clusters of genes identified all but two of the known regulatory sites in fission yeast, and novel regulatory sites (both upstream and downstream) were identified for both yeasts (Kivinen et al. manuscript in preparation). Secondly, co-expressed clusters from microarray data for meiotic differentiation, stress response and the mitotic cell cycle were studied using the published datasets for both organisms (Chu et al. 1998; Mata et al. 2002; Chen et al. 2003; Gasch et al. 2000; Rustici et al. 2004; Spellman et al. 1998). This approach also identified most known regulatory sites including the patterns common to the two yeasts and many novel potential regulatory motifs (Chen et al. 2003; Rustici et al. 2004). Additional observations from these studies include:

i. The identification of unstudied, but shared motifs (K Kivinen, PhD thesis; http://www.sanger.ac.uk/Info/theses/; Kivinen et al. manuscript in preparation).

ii. An extended functional role for the FLEX site which is conserved from yeast to man (previously identified as meiosis specific) through a likely involvement in both meiotic and mitotic cell cycles (Rustici et al. 2004).

iii. Approximately 50% of known budding yeast and fission yeast regulatory sites show a spatial bias relative to translation start sites (K Kivinen, PhD thesis; http://www.sanger.ac.uk/Info/theses/; Kivinen et al. manuscript in preparation).

iv. A set of genes containing a downstream motif in the 3’ UTR was identified. This motif, an AU-rich element (ARE), is involved in the mRNA stability of interferons, cytokines and proto-oncogenes (reviewed in Chen and Shyu 1995). The same element has recently been implicated in the stability of the periodically abundant cyclin dependent kinase (CDK) inhibitor rum1 mRNA in fission yeast (Daga et al. 2003).

v. Sets of genes containing novel downstream motifs which appear to have a functional role (Groocock et al. manuscript in preparation; K Kivinen, PhD thesis; http://www.sanger.ac.uk/Info/theses/; Kivinen et al. manuscript in preparation).

It is known that some regulatory patterns have survived millions of years of evolution with no apparent change, for example, the shared regulatory sites MCB (Lowndes et al. 1992), and ATF/CRE (Jones and Jones 1989). For others, sequence patterns have diverged but the functional role has been retained. However, in the majority of cases, these two model yeasts have diverged so far from each other that their regulatory regions appear to be unrelated (K. Kivinen, personal communication). Extension of the complementary approaches of co-expression and co-annotation, for the identification of regulatory regions, will have enormous potential as the expression datasets increase in coverage; annotation increases in specificity; and analysis tools for identifying similarities in expression pattern and sequence improve. Future developments will undoubtedly support the use of pattern discovery as a predictive tool for suggesting functional links between groups of genes.
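One simple way to evaluate whether a pattern is over-represented in a cluster of co-expressed or co-annotated genes is a hypergeometric test: given how many upstream regions in the whole set contain the pattern, how surprising is the number found in the cluster? The sketch below is illustrative only and is not the method used in the studies cited above. The sequences are invented, ACGCGT is used simply as an example hexamer, and the function names are hypothetical.

```python
from math import comb


def hypergeom_tail(k, K, n, N):
    """P(X >= k) when n sequences are drawn from N, of which K contain the pattern."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def enrichment(pattern, cluster, background):
    """Return (hits in cluster, hits in background, p-value).

    `cluster` is assumed to be a subset of `background`.
    """
    k = sum(pattern in seq for seq in cluster)
    K = sum(pattern in seq for seq in background)
    return k, K, hypergeom_tail(k, K, len(cluster), len(background))


# Invented upstream regions: three clustered genes that share the example
# hexamer, and seven other genes that do not.
CLUSTER = ["TTACGCGTAA", "GGACGCGTCC", "ACGCGTTTTT"]
OTHERS = ["TTTTTTTTTT", "GGGGCCCCAA", "ATATATATAT", "CCCCGGGGTT",
          "AAGGTTCCAA", "TGCATGCATG", "CAGTCAGTCA"]
BACKGROUND = CLUSTER + OTHERS

if __name__ == "__main__":
    hits, total_hits, p_value = enrichment("ACGCGT", CLUSTER, BACKGROUND)
    print(hits, total_hits, p_value)
```

In practice such a test would be applied to every candidate pattern, so the resulting p-values would also need correcting for multiple testing.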

4.3 Integrative comparative studies

Complete genomes and their associated data are providing the opportunity to systematically examine the connections between the determinants of evolutionary history and other quantifiable characteristics of genes and proteins. Global correlations between different types of data (genome wide experimental observations, computationally derived data, or genome wide functional annotations) can be assessed. Preliminary comparisons are beginning to provide insights into the relative contributions of these quantifiable characteristics to the biological constraints and selective pressures which determine genome content.

4.3.1 Dispensability and divergence

A pilot gene deletion project was undertaken to estimate the percentage of essential genes in fission yeast, investigating 100 contiguous CDS (Decottignies et al. 2003). The percentage of essential genes was found to be 17.5%, almost identical to the 17.8% that are essential for S. cerevisiae growth on a rich medium (Garrels 2002). Amongst the 81 S. pombe genes with a predicted homolog in S. cerevisiae, 88% (71 genes) showed the same deletion phenotype in both yeasts. Of the 15 essential fission yeast genes, only 10 (67%) are also essential for budding yeast growth. Therefore, despite the absolute percentage of essential genes being almost identical between the two yeasts, only two-thirds of these appear to overlap. This did not appear to be due to gene duplication and functional redundancy for any of the genes studied. A correlation was observed between the likely time of origin of a gene and dispensability, leading to the conclusion that more ancient genes (maintained in all eukaryotic, or all eukaryotic and prokaryotic species sequenced) are more likely to be essential, and yeast specific genes are less likely to be essential. Previous analyses of both C. elegans and S. cerevisiae revealed similar conclusions (Fraser et al. 2000; Garrels 2002).

A relationship between evolutionary rate and fitness has proven difficult to detect, but Hirsh and Fraser demonstrated that there is a highly significant correlation between protein dispensability and evolutionary rate (based on the number of substitutions per amino acid site using S. cerevisiae) which is not always detectable from categorical comparisons of essential and non-essential proteins (Hirsh and Fraser 2001). The relationship is apparently obscured because proteins with small but measurable fitness effects can be considered essential in evolutionary terms. It is likely that many highly conserved proteins involved in central processes are not lethal because biological systems make extensive use of ‘fail-safe’ mechanisms.

4.3.2 Correlations with gene loss

Krylov and colleagues explored the connection between the propensity of a gene to be lost in evolution, protein sequence divergence, dispensability, the number of protein-protein interactions and expression level for genes in clustered orthologous groups for seven fully sequenced eukaryotic genomes including S. pombe (Krylov et al. 2003). Significant correlations were detected between the potential for a gene to be lost and all other categories. Genes with a lower propensity to be lost accumulate fewer changes, and tend to be essential, highly expressed and have many interaction partners. However, in this analysis no appreciable correlation was found between evolutionary rate and dispensability.

4.3.3 Correlations with expression level

Fission yeast gene expression levels were compared to the degree of species conservation, by integrating expression data with core eukaryotic genes (present in worm, budding yeast and fission yeast), yeast specific genes (present in budding and fission yeast) and S. pombe specific genes (Mata and Bähler 2003). In vegetatively growing cells, S. pombe specific genes tended to be expressed at a lower level and a disproportionate number of core conserved genes were highly expressed. These results support the hypothesis that core genes carry out basic functions, and are globally expressed in all conditions. Conversely, in sexually differentiating cells, although many core genes were still expressed, the bias was weaker, and many S. pombe specific genes became highly expressed. This enrichment of expression of S. pombe specific genes supports the hypothesis that organism-specific genes function in specialised processes (see also sections 3.4.2 and 3.6). Organism specific genes were over-represented at all stages of sexual differentiation but the trend was most prevalent for genes in the cluster involved in chromosome pairing and recombination (meiotic prophase). This is consistent with observations that meiotic structural proteins are poorly conserved across eukaryotes (Villeneuve and Hillers 2001; see also section 3.6). It is speculated that differences in the chromosome pairing machinery may help to prevent fruitful meiosis between closely related organisms and drive the separation between species.

4.3.4 Conservation level and interaction number

Theoretical arguments propose that proteins evolve more slowly if they participate in many interactions. In addition, structural analysis has shown that amino acid residues at protein interfaces are generally more conserved than the average for all proteins (reviewed in Teichmann 2002). In order to investigate globally the constraints protein-protein interactions place on sequence variation, the sequence similarities of S. cerevisiae proteins were compared to their S. pombe orthologs and evaluated with respect to interaction type (Teichmann 2002). The large variation in sequence conservation between orthologs (>20-<90% identity) was used to demonstrate that stable complexes were, on average, more conserved than proteins involved in transient interactions. However, the trend for complexes to be more highly conserved than transient interactions, which are in turn more conserved than monomers, was found to be independent of whether a protein is involved in informational activities (transcription, translation, and replication) or not. This trend was also independent of protein dispensability. In contrast, Jordan et al. identified only a weak relationship between the number of protein interactions and evolutionary rate (estimating evolutionary rate from S. cerevisiae/S. pombe comparisons, and using S. cerevisiae interaction data), and concluded that only the most prolific interactors showed a reduction in evolutionary rate (Jordan et al. 2003). Two further studies have subsequently identified a significant positive correlation between the number of protein-protein interactions in S. cerevisiae and evolutionary distance to other organisms including S. pombe (Fraser et al. 2003; Pagel et al. 2004). A preference for interacting proteins to be conserved together was also identified, but no bias was detected with respect to functional roles (Pagel et al. 2004). Inevitably studies of this type will be difficult to perform and interpret with current interaction datasets which are biased for well studied genes, error prone and incomplete.

4.3.5 Dispensability, distribution and interaction number

The availability of global datasets for genetic and physical interactions has made biology amenable to the application of techniques and theories governing the formation, behaviour, and development of networks. It has been proposed that most biological networks have a scale-free “small world” topology (Jeong et al. 2000). That is, most ‘nodes’ have a small number of connections, but a few highly connected nodes (or hubs) hold the network together. It was subsequently shown that centrality in the network (i.e. highly connected proteins) correlated positively with lethality (Jeong et al. 2001). Kunin and colleagues used S. cerevisiae protein interaction data to trace the origin of proteins in the interaction network and to evaluate the evolution of a scale-free topology (Kunin et al. 2004). They did not detect a direct correlation between connectivity and age as expected by the ‘preferential attachment model’ whereby older nodes should display higher connectivity, and proposed that this is due to the functional heterogeneity of the protein interaction network. Instead, it was found that proteins which evolved after the split which led to the fungi, and those which evolved after the split from fission yeast, displayed on average, reduced connectivity. Surprisingly however, the proteins of oldest origin did not show the highest connectivity. The majority of the most highly connected proteins are found to have emerged during the eukaryotic radiation which seems to reflect the emergence of many highly connected proteins involved in eukaryotic cellular organization, such as cytoskeleton components, transcription complexes and the nuclear pore. They observed that different functional classes display different average connectivity. Specifically, proteins involved in cell wall organization and biogenesis appear to be the least connected, followed by proteins involved in transport, binding and metabolism. Proteins of unknown function also have lower levels of connectivity. Conversely, proteins involved in transcription, replication, cellular processes and regulatory functions have, on average, almost twice as many binding partners. The age of a protein also correlates well with what is known about its function; far fewer ancient proteins are uncharacterised, which is expected because phylogenetically extended families tend to be well studied. It is proposed that protein function determines the types of binding partner, degree of connectivity and the time of emergence in the network.

4.4 Section summary

Further genome wide studies integrating information from function, physical interactions, lethality, sequence conservation, duplication and phylogenetic distribution will continue to define factors affecting the evolution and characteristics of a eukaryotic cell, and to assess their relative contributions to genome content. These will become more accurate as annotation, curation and methods for comparison improve, providing the potential to propose and test numerous evolutionary hypotheses on a genome-wide scale. An accurate and comprehensive model will also be a powerful predictive tool to determine which genes are likely to be involved in core eukaryotic processes, or species-specific adaptations based on phylogenetic distribution, copy number, evolutionary rate and network position.

5 Curation

The accumulation of biological data produced by genome-scale biology has required a revolution in the approaches used to describe, integrate and retrieve this huge volume of diverse information. Numerous attributes of gene products can be recorded during annotation or literature curation but the molecular activity (function), biological process and cellular localization (component) are generally considered the most immediately useful information to describe an organism’s biology. Any robust system to capture these features of gene products has the following necessary or desirable requirements:

1. The ability to describe gene products consistently and unambiguously so that similar characteristics are grouped, (including the grouping of gene products for which no functional data is available)

2. To support the inherent pleiotropy of the data, recognising that gene products may have multiple functions, and locations, and participate in multiple processes

3. The ability to describe gene products using different levels of granularity (different levels of detail) depending on how much is known or can be inferred (hierarchical)

4. Mechanisms to qualify annotations with different levels of confidence, and to support the annotations with a method or citation

5. Sophisticated consistency checks to maintain the integrity of the data

6. Be readily and rapidly extensible to incorporate new biological concepts

7. To support negative annotations

8. Be species independent, to support inter-organism queries

9. To enable researchers to retrieve specified groups of genes or to identify candidate gene products for specific functions

The annotation standards provided by controlled vocabularies, and more sophisticated ‘ontologies’ are now crucial to the annotation process for most genomes. These define ‘terms’ to describe aspects of a gene product’s biology, which can be interpreted identically both within, and between organisms, by both biologists and computers. The most vital resource for maintaining consistent annotation of genes and gene products is provided by the Gene Ontology (GO) Consortium, which fulfils all of the nine requirements above, and is the annotation system of choice for the majority of model organism databases (MODs; http://www.geneontology.org; The Gene Ontology Consortium 2004). The GO Consortium is a collaborative open source project to develop shared controlled vocabularies, which are continually refined and expanded to reflect accumulating biological knowledge (Ashburner et al. 2000). The GO provides three ontologies to describe the orthogonal biological domains of biological process, cellular component and molecular function, in a species-independent manner.

5.1 Gene Ontology structure

Gene Ontology terms are arranged so that broader parents give rise to more specific children. The relationships are represented in the form of a directed acyclic graph (DAG), which is similar to a hierarchy, except that it captures biological relationships more realistically by allowing individual child terms to have many parent terms. At present, two types of relationship are implemented in GO (‘is_a’ and ‘part_of’), although it is conceivable that other relationship types will be added in the future. For example, the cellular component term ‘nuclear pore (GO:0005643)’ has two parents: it is ‘part_of’ ‘nuclear membrane (GO:0005635)’, and ‘is_a’ ‘pore complex (GO:0046930)’ (Figure 1 shows a screenshot of the cellular component term ‘nuclear pore’ in the ‘Amigo’ GO browser. This view shows the term ‘nuclear pore’ with all its parent terms, together with the numbers of S. pombe gene products associated with each term). Every GO term must obey the ‘true path rule’; this means every possible path from any term back to the root (most general term) must be biologically accurate. When a gene product is annotated to a term, it is therefore automatically annotated to all of the parent terms. For example, a gene product annotated to ‘inner plaque of spindle pole body (GO:0005822)’ is ‘part_of’ the ‘spindle pole body (GO:0005816)’ which is ‘part_of’ the ‘spindle pole (GO:0000922)’ and so forth back to the root node ‘cellular component (GO:0005575)’. If a path back to the root node is incorrect for a valid annotation a ‘true path violation’ occurs and the ontology must be revised. This structure allows curators to assign properties at different levels of granularity depending on how much is known, or can be inferred, about a gene product. Multiple associations (ontology terms) can be applied to a single gene product, reflecting the fact that a gene product may have several functions, be present in different locations, participate in different processes and interact with numerous other proteins.
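The way an annotation propagates to every parent term can be sketched in a few lines of Python. This is a minimal illustration rather than the GO Consortium's implementation. The `PARENTS` table holds only the terms and relationships named in the text, plus three direct links to the root that are assumptions standing in for the intermediate terms the text summarises as ‘and so forth’. The function names are hypothetical.

```python
ROOT = "GO:0005575"  # cellular component

# Each term maps to a list of (relationship, parent) pairs.
PARENTS = {
    # Relationships stated in the text:
    "GO:0005643": [("part_of", "GO:0005635"),   # nuclear pore -> nuclear membrane
                   ("is_a", "GO:0046930")],     # nuclear pore -> pore complex
    "GO:0005822": [("part_of", "GO:0005816")],  # inner plaque of SPB -> spindle pole body
    "GO:0005816": [("part_of", "GO:0000922")],  # spindle pole body -> spindle pole
    # Assumed shortcuts to the root (intermediate terms omitted):
    "GO:0005635": [("is_a", ROOT)],
    "GO:0046930": [("is_a", ROOT)],
    "GO:0000922": [("is_a", ROOT)],
    ROOT: [],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links from `term`."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relationship, parent in parents[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def implied_annotations(term, parents=PARENTS):
    """Terms a gene product is annotated to, directly or through its parents."""
    return {term} | ancestors(term, parents)


if __name__ == "__main__":
    print(sorted(implied_annotations("GO:0005822")))
```

Because a term may have several parents, the traversal records the terms it has already visited, so a parent shared by two paths is counted only once.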

Fig. 1. A screenshot of the cellular component term nuclear pore and its parent terms in the ‘Amigo’ GO browser (B. Marshall and S. Lewis, unpublished software). The numbers in parentheses show the number of S. pombe gene products associated to each term.

5.2 Gene Ontology implementation

GO collaborators use the GO schema to annotate individual gene products. These annotations are maintained in a common file format (the gene association file: see http://www.geneontology.org/GO.annotation.shtml?all#file), which is incorporated into the contributing database (GeneDB http://www.genedb.org/ in the case of S. pombe) and submitted to the Gene Ontology consortium.
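As an illustration of how such a file might be read, the sketch below parses tab-delimited records. It assumes that lines beginning with ‘!’ are comments and that the first nine columns hold the database, object identifier, symbol, qualifier, GO identifier, reference, evidence code, ‘with/from’ field and ontology aspect. These column assumptions, the field names and the example record (gene identifier, symbol and reference) are made up for illustration; the format documentation cited above remains the authority.

```python
from collections import namedtuple

# Field names are our own labels for the first nine columns (an assumption).
Association = namedtuple(
    "Association",
    "db object_id symbol qualifier go_id reference evidence with_from aspect",
)


def parse_gene_associations(lines):
    """Yield one Association per data line, skipping comments and blank lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("!"):
            continue
        fields = line.split("\t")
        if len(fields) < 9:
            raise ValueError(f"expected at least 9 columns, got {len(fields)}")
        yield Association(*fields[:9])


# A made-up record: the identifier, symbol and reference are placeholders.
EXAMPLE = [
    "!example comment line\n",
    "\t".join([
        "GeneDB_Spombe", "SPXX0000.00", "xyz1", "", "GO:0005643",
        "PMID:0000000", "IDA", "", "C", "hypothetical protein", "",
        "gene", "taxon:4896", "20050101", "GeneDB_Spombe",
    ]) + "\n",
]

if __name__ == "__main__":
    for record in parse_gene_associations(EXAMPLE):
        print(record.object_id, record.go_id, record.evidence)
```

Keeping the evidence code and reference alongside each GO identifier is what allows annotations to be filtered later by how they were derived.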

A comprehensive set of GO annotations (gene associations) is provided for S. pombe within GeneDB. These associations are derived from a number of non-redundant sources, and are continually refined and updated. There are currently 4889 manual gene associations for 1300 gene products. Of these, 2013 are derived from experimental data via literature curation (1008 publications) and are supported by the appropriate evidence code for the type of experiment (see Table 3 legend for a list of evidence codes). A further 2876 are ‘inferred from sequence similarity’ (ISS evidence code) based on manual inspection of sequence alignments to characterised proteins.

Table 3. Sources of the non-redundant GO data for S. pombe in GeneDB, by evidence code. Key: [1] GOC:pombekw2GO, [2] GOC:ec2go, [3] GOA:interpro, [4] GOA:spkw, [5] GOA:spec.

Evidence code    GeneDB       GOA/UniProt
IEA              7732 [1]     4162 [3]
                 364 [2]      1530 [4]
                              46 [5]
IMP              571          12
IDA              408          23
IEP              5            0
IGI              149          1
IPI              249          4
ISS              2876         6
IC               294          9
NAS              16           9
TAS              321          1
Total            12985        5738

IEA = inferred from electronic annotation, IMP = inferred from mutant phenotype, IDA = inferred from direct assay, IEP = inferred from expression profile, IGI = inferred from genetic interaction, IPI = inferred from physical interaction, ISS = inferred from sequence similarity, IC = inferred by curator, NAS = non traceable author statement, TAS = traceable author statement. Also see the online GO Evidence documentation http://www.geneontology.org/GO.evidence.shtml?all

The manual gene associations are supplemented by electronically inferred annotations (IEA evidence code) from S. pombe primary annotation, the Gene Ontology Annotation (GOA) database (Camon 2004) and UniProt (Apweiler 2004). These include:

i. A keyword mapping from the primary S. pombe annotation to GO terms (GOC:pombekw2GO)

ii. A mapping of enzyme commission (EC) numbers assigned to GeneDB entries (GOC:ec2go) and Uniprot entries (GOA:spec) to GO terms

iii. A mapping of Interpro families and domains to GO terms (GOA:interpro) iv. A mapping of Uniprot keywords to GO (GOA:spkw)

Within the GeneDB database, redundancy of GO mappings is prevented by presenting IEA mappings only when they are more granular than manual associations. These provide 13834 additional associations, giving a total of 18723 non-redundant associations. The sources of these associations and their distribution between the various evidence codes are summarised in Table 3.
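The redundancy rule can be sketched in Python under one possible reading: an IEA mapping is dropped when the same gene already has a manual association to the same term or to a more specific descendant of it, and is kept otherwise. This is an assumption about the rule, not a description of the GeneDB implementation. The three-term is_a fragment is a toy ontology built from the glucosyltransferase terms discussed in this section, and the gene names are invented.

```python
from typing import Dict, Set

# Toy is_a fragment (child -> parents); assumed for illustration only.
PARENTS: Dict[str, Set[str]] = {
    "GO:0004581": {"GO:0035251"},
    "GO:0035251": {"GO:0046527"},
    "GO:0046527": set(),
}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return every term reachable from `term` by following parent links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def non_redundant_iea(
    manual: Dict[str, Set[str]],
    iea: Dict[str, Set[str]],
    parents: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Keep IEA terms that are not already implied by a manual term of the same gene."""
    kept: Dict[str, Set[str]] = {}
    for gene, terms in iea.items():
        covered: Set[str] = set()
        for manual_term in manual.get(gene, set()):
            covered.add(manual_term)
            covered |= ancestors(manual_term, parents)
        remaining = terms - covered
        if remaining:
            kept[gene] = remaining
    return kept


manual = {"geneA": {"GO:0035251"}, "geneC": {"GO:0035251"}}
iea = {
    "geneA": {"GO:0046527", "GO:0004581"},
    "geneB": {"GO:0046527"},
    "geneC": {"GO:0035251"},
}
kept = non_redundant_iea(manual, iea, PARENTS)
```

In this toy case geneA keeps only the more specific electronic term, geneB keeps its sole electronic term, and geneC contributes nothing because its electronic term duplicates a manual one.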

Presenting automated mappings and integrating bioinformatics predictions with the manual annotations provides greater annotation coverage to the S. pombe research community in the absence of manual curation. For instance, a researcher looking for a specific activity like dolichyl-phosphate beta-glucosyltransferase activity (GO:0004581) may retrieve all genes which are inferred by sequence similarity or electronic annotation to have the activity of the broader parent terms UDP-glucosyltransferase activity (GO:0035251) or glucosyltransferase activity (GO:0046527). Inferred annotations not only allow researchers to identify groups of candidate genes, but are also beneficial to the curation process, as they alert the curator to relevant terms which may have been overlooked, or to missing relationships in the ontologies. Moreover, assessing the output of global mapping resources from the perspective of individual gene products identifies erroneous mappings and allows them to be corrected, which can radically reduce false positive mappings for the automated annotation of other organisms. One future aim of the curation strategy for S. pombe is to process the literature backlog, in order to convert IEA associations to experimentally supported evidence codes where applicable, or to ISS codes supported by a manually assessed alignment to a characterised protein or protein family if direct experimental results are not available.
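The candidate search described above, in which a query for a specific activity also returns genes annotated only to its broader parent terms, can be sketched as follows. The is_a links between the three terms are taken from the text; the gene names, their annotations and the function names are hypothetical.

```python
from typing import Dict, List, Set

# Toy is_a fragment (child -> parents), following the terms named in the text.
IS_A: Dict[str, Set[str]] = {
    "GO:0004581": {"GO:0035251"},  # dolichyl-phosphate beta-glucosyltransferase activity
    "GO:0035251": {"GO:0046527"},  # UDP-glucosyltransferase activity
    "GO:0046527": set(),           # glucosyltransferase activity
}


def term_and_ancestors(term: str, is_a: Dict[str, Set[str]]) -> Set[str]:
    """Return the term itself plus all of its broader (ancestor) terms."""
    found = {term}
    stack = [term]
    while stack:
        for parent in is_a.get(stack.pop(), set()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def candidate_genes(
    target: str,
    annotations: Dict[str, Set[str]],
    is_a: Dict[str, Set[str]],
) -> Dict[str, List[str]]:
    """Map each gene annotated to the target or a broader term to the terms that matched."""
    wanted = term_and_ancestors(target, is_a)
    hits: Dict[str, List[str]] = {}
    for gene, terms in annotations.items():
        matched = sorted(wanted & terms)
        if matched:
            hits[gene] = matched
    return hits


# Invented annotations for four placeholder genes.
ANNOTATIONS = {
    "geneA": {"GO:0004581"},
    "geneB": {"GO:0035251"},
    "geneC": {"GO:0046527"},
    "geneD": {"GO:0031097"},
}
candidates = candidate_genes("GO:0004581", ANNOTATIONS, IS_A)
```

Reporting which term matched lets the researcher separate direct hits from genes that are only candidates by virtue of a broader annotation.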

Three qualifiers (NOT, contributes_to and colocalizes_with) are available within GO to modify the interpretation of the annotation. The ‘NOT’ qualifier is used to support negative annotations. This would normally be used if experimental evidence has shown a particular assignment not to be true, but where an association might otherwise be made based on other evidence. The ‘contributes_to’ qualifier is used when a complex has an activity but the individual subunits do not, for example the subunits of RNA polymerases. The ‘colocalizes_with’ qualifier is used when gene products are associated transiently or peripherally with a cellular component, or where the resolution is inconclusive.
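A consumer of the annotations has to take these qualifiers into account, most importantly by keeping ‘NOT’ associations out of any positive gene list. The sketch below assumes that multiple qualifiers in one field are separated by '|'; the rows, gene names and function names are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Row = Tuple[str, str, str]  # (gene, qualifier field, GO ID)


def split_qualifiers(field: str) -> Set[str]:
    """Split a qualifier field such as 'NOT|contributes_to' into its parts."""
    return {q.strip() for q in field.split("|") if q.strip()}


def separate_by_not(rows: Iterable[Row]):
    """Return (positive, negative) annotation maps keyed by gene.

    Positive entries keep any remaining qualifiers alongside the GO ID so that
    'contributes_to' and 'colocalizes_with' are not silently lost.
    """
    positive: Dict[str, Set[Tuple[str, Tuple[str, ...]]]] = defaultdict(set)
    negative: Dict[str, Set[str]] = defaultdict(set)
    for gene, qualifier_field, go_id in rows:
        qualifiers = split_qualifiers(qualifier_field)
        if "NOT" in qualifiers:
            negative[gene].add(go_id)
        else:
            positive[gene].add((go_id, tuple(sorted(qualifiers))))
    return dict(positive), dict(negative)


# Invented rows for four placeholder genes.
ROWS = [
    ("geneA", "", "GO:0004581"),
    ("geneB", "NOT", "GO:0004581"),
    ("geneC", "contributes_to", "GO:0046527"),
    ("geneD", "colocalizes_with", "GO:0031097"),
]
positive, negative = separate_by_not(ROWS)
```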

Within S. pombe GeneDB, additional qualifiers are applied to GO associations to increase their informational content further. Examples include ‘phase’ qualifiers, used to specify the life cycle or cell cycle stage when a particular localization is observed or a process occurs, which is especially useful for pleiotropic gene products. A selection of qualifiers is also used in conjunction with the ‘inferred from genetic interaction’ (IGI) evidence code to establish the type of genetic interaction (epistasis, localization_dependency, acts_upstream_of, parallel_pathway etc.). These qualifiers provide information about the position in the genetic hierarchy, the directionality of the interaction, or whether the gene product is in the same or a different pathway, and will be pertinent to the reconstruction of genetic networks.

5.3 Dynamic aspects of the Gene Ontology and the associated annotations

The Gene Ontology is a dynamic resource. Changes to the ontologies are frequently made to correct legacy terms and relationships, to improve consistency, and to add new terms and relationships as advances are made in biology. Literature curation is not a passive process and necessarily includes contributing to the development of the GO by identifying missing relationships, extending vocabularies, refining existing term definitions and identifying new terms. New terms added recently to describe biological phenomena studied in S. pombe include the cellular component terms ‘medial ring (GO:0031097)’ and ‘linear element (GO:0030998)’, the biological process terms ‘sister chromatid biorientation (GO:0031134)’ and ‘horsetail movement (GO:0030989)’ and the molecular function terms ‘ornithine N5-monooxygenase activity (GO:0031172)’ and ‘glucan endo-1,3-alpha-glucosidase activity (GO:0051118)’.

[Figure 2: diagram with regions labelled Function, Process and Component. Region counts as extracted, S. pombe with S. cerevisiae in parentheses: 645 (620), 818 (162), 50 (155), 1989 (3224), 293 (622), 143 (33), 274 (77); Unassigned 668 (860); Total 4880 (5977).]

Fig. 2. Gene Ontology association coverage for S. pombe showing the number of gene products with at least one association to each of the three ontologies: molecular function, biological process and cellular component. The corresponding figures for S. cerevisiae are shown in parentheses.

5.4 S. pombe gene associations, coverage and comparison with S. cerevisiae

Of the 4880 known and predicted protein coding genes, 4215 are assigned to at least one GO term (Figure 2). This includes 3726 with at least one biological process term, 2977 with at least one cellular component term and 3000 with at least one molecular function term. Only 668 genes considered likely to be protein coding have no known or predicted component, process or function. In contrast, S. cerevisiae has more gene products assigned to at least one term in all three ontologies (3224), but also a greater number of genes with unknown function, process or component (860) (footnote 14).

[Figure 3: bar chart comparing S. pombe and S. cerevisiae. Vertical axis ticks: 0 to 1000 in steps of 100. Categories: cell cycle; cytokinesis; amino acid metabolism; lipid metabolism; nitrogen compound metabolism; phosphate metabolism; DNA metabolism; protein modification; nuclear org. & biogen.; translation; carbohydrate metabolism; nucleotide metabolism; transcription; catabolism; cell budding; cell wall org. & biogen.; transport; regulation of transcription; energy pathways; response to stress; other process. Legend: pombe, cerevisiae.]

Fig. 3. A comparative overview of the distribution of ‘high level’ GO biological process annotations for S. pombe vs. S. cerevisiae. The S. cerevisiae totals are derived from SGD manual annotations supplemented by GOA mappings. Annotations to ‘unknown’ process, function and component terms, and annotations for non-protein coding genes, have been filtered. The terms are not mutually exclusive, as terms may belong to more than one category or, in the case of transcription and transcriptional regulation, one may be a complete subset (child) of another.

A comparative overview of the distribution of ‘high level’ annotations for the process ontology annotations of the two yeasts is presented in Figure 3. This distribution of annotations corresponds with what is known about the broad biology of these two organisms based on the accumulation of literature and comparative analysis reviewed earlier in this chapter (see section 3).

The GO terms which have annotations in approximately equal numbers in both yeasts are biased towards universally conserved proteins involved in informational processes. The GO terms which have annotations in increased numbers for S. cerevisiae have a larger number of species-specific genes (those without an apparent ortholog) and genes which are commonly duplicated (many to many), and are processes implicated more often in interactions with the environment (nutrient acquisition, toxicity modulation and aspects of regulation; see also section 3).

Footnote 14: S. cerevisiae figures were derived from a non-redundant set combining SGD manual assignments and IEA annotations derived from GOA, as described for S. pombe in the text. S. cerevisiae annotations to noncoding RNAs were removed, as these have not yet been implemented for S. pombe. Annotations to ‘cellular_component_unknown’, ‘biological_process_unknown’ and ‘molecular_function_unknown’ were removed for both organisms.

5.5 Searching and accessing GO

The S. pombe gene associations can be accessed via the Gene Ontology Consortium website and the GeneDB website using the AmiGO Gene Ontology browser (http://www.godatabase.org/cgi-bin/amigo/go.cgi; http://www.genedb.org/amigo/perl/go.cgi; B. Marshall and S. Lewis, unpublished software). AmiGO allows the browsing of GO terms and the relationships between them, and the retrieval of gene products associated with those terms or of all the terms associated with specific gene products. It also allows searching of the ontology by term name and of the annotations by gene name, sequence, evidence code or species.

GeneDB also supports a query facility with Boolean capability, which allows the results of queries to any GO term to be combined using AND or OR. These queries can also be combined with other biological attributes including protein domains, keywords, protein length and mass, presence of transmembrane regions, signal peptides, exon number, and chromosomal location. The results can be saved to a query history, combined with previous queries (added, subtracted and intersected) and downloaded in a number of formats (for example, as gene names, description, protein or nucleotide sequence; http://www.genedb.org/gusapp/servlet?page=boolq; Hertz-Fowler et al. 2004).
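The Boolean combination of saved query results corresponds to ordinary set operations on gene lists. The following Python sketch is only a conceptual model of a query history, not the GeneDB implementation; the class, the gene names and the attribute sets are all hypothetical.

```python
from typing import List, Set


class QueryHistory:
    """Store query results as gene sets and combine them by index."""

    def __init__(self) -> None:
        self.results: List[Set[str]] = []

    def save(self, genes: Set[str]) -> int:
        """Save a result set and return its position in the history."""
        self.results.append(set(genes))
        return len(self.results) - 1

    def add(self, i: int, j: int) -> int:
        """OR: genes found by either query."""
        return self.save(self.results[i] | self.results[j])

    def intersect(self, i: int, j: int) -> int:
        """AND: genes found by both queries."""
        return self.save(self.results[i] & self.results[j])

    def subtract(self, i: int, j: int) -> int:
        """Genes found by query i but not by query j."""
        return self.save(self.results[i] - self.results[j])


# Invented result sets for three placeholder queries.
history = QueryHistory()
q_transferase = history.save({"geneA", "geneB", "geneC"})   # a GO term query
q_medial_ring = history.save({"geneB", "geneD"})            # another GO term query
q_signal_peptide = history.save({"geneA", "geneD"})         # a non-GO attribute

q_both_terms = history.intersect(q_transferase, q_medial_ring)
q_either_term = history.add(q_transferase, q_medial_ring)
q_no_signal = history.subtract(q_either_term, q_signal_peptide)
```

Because every combination is itself saved, a result such as q_no_signal can be reused as the input to a later query.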

The gene association files are available for download from the GO Consortium and the Wellcome Trust Sanger Institute (WTSI) websites (http://www.geneontology.org/GO.current.annotations.shtml; ftp://ftp.sanger.ac.uk/pub/yeast/pombe/Gene_ontology). The WTSI files also include the non-redundant associations from other sources.

5.6 Curation summary

The curation process is improved by GO far beyond the provision of controlled vocabularies to consistently describe biological phenomena. The GO also provides a framework for quality control, both of data input and of the subsequent revisions or extensions to biological knowledge which affect the description and implementation of concepts within the ontologies. In addition, it provides a mechanism for the identification of relevant terms, either by the application of curated mappings (e.g. InterPro to GO, or EC to GO) or by the consideration of commonly co-annotated terms from orthogonal ontologies. The resulting associations, in turn, provide robust datasets for inter-species comparisons, and facilitate uniform queries based on shared biological roles. Increasingly, GO is being used by biologists to identify interesting gene products, and has the potential to identify areas and genes which are relatively unstudied. All of these applications become increasingly powerful as the annotations are refined and the ontologies become more complete.
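The use of commonly co-annotated terms from orthogonal ontologies to prompt a curator can be sketched as a simple co-occurrence count. The approach, the threshold and the function names below are assumptions made for illustration, and the annotations are invented; only the GO identifiers come from the terms mentioned in this chapter.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Set

Annotations = Dict[str, Dict[str, Set[str]]]  # gene -> aspect -> GO IDs


def coannotation_counts(annotations: Annotations, aspect_a: str, aspect_b: str) -> Counter:
    """Count how many genes carry each (term in aspect_a, term in aspect_b) pair."""
    counts: Counter = Counter()
    for by_aspect in annotations.values():
        terms_a = sorted(by_aspect.get(aspect_a, set()))
        terms_b = sorted(by_aspect.get(aspect_b, set()))
        for pair in product(terms_a, terms_b):
            counts[pair] += 1
    return counts


def suggest_terms(term: str, counts: Counter, min_count: int = 2) -> List[str]:
    """List aspect_b terms co-annotated with `term` in at least `min_count` genes."""
    return [b for (a, b), n in counts.most_common() if a == term and n >= min_count]


# Invented annotations: P = biological process, C = cellular component.
TOY: Annotations = {
    "geneA": {"P": {"GO:0030989"}, "C": {"GO:0030998"}},
    "geneB": {"P": {"GO:0030989"}, "C": {"GO:0030998"}},
    "geneC": {"P": {"GO:0030989"}, "C": {"GO:0031097"}},
    "geneD": {"P": {"GO:0031134"}},
}
counts = coannotation_counts(TOY, "P", "C")
suggestions = suggest_terms("GO:0030989", counts)
```

A suggestion produced this way is only a prompt for the curator to check the literature; it is not itself evidence for an association.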

6 Future prospects

The availability of the genome sequence has revolutionised experimental research for S. pombe. When genome sequencing began, the number of studied genes was around 200; around 1400 genes now have some degree of published experimental characterization. Fission yeast is therefore no longer only the bastion of cell cycle research. Its efficacy as a general eukaryotic model is now promoting research in areas of cell biology that were traditionally more confined to S. cerevisiae. Despite the advances made by the S. pombe research community and the enormous potential of S. pombe as a eukaryotic model organism, the published genome-wide functional interrogations are currently limited to microarray analyses. Genome-wide datasets for deletion and localization are in progress and are therefore eagerly anticipated.

Functional and comparative genomics initiatives and the emerging field of systems biology are intimately dependent on an accurate parts list, and continued primary sequence analysis is therefore paramount. The availability of genome sequences for close S. cerevisiae relatives has been instrumental in refining gene structures and identifying missing genes for this organism, resulting in alterations to more than 10% of the gene complement (Kellis et al. 2003; Brachat et al. 2003; Cliften et al. 2001). S. pombe will benefit similarly from the availability of the genomes of S. japonicus, S. octosporus and S. kambucha, which have recently been approved for sequencing as part of the Whitehead Institute Fungal Genomes Initiative (http://www.broad.mit.edu/annotation/fungi/fgi/candidates.html).

Whole genome comparisons are now central to the development and testing of hypotheses relating to the mechanisms of evolution. Accurate inventories of orthologs and partitioning of conserved, non-conserved and dubious proteins will not only support accurate functional transfer, but will also benefit integrative studies by providing a framework for the dissection of species similarities and differences. The identification of further factors that determine, or correlate strongly with, the rate of duplication, divergence and loss of proteins will continue to reveal the prevailing trends in protein evolution. Additional data partitions based on biology (e.g. metabolic versus non-metabolic, nuclear versus cytoplasmic) are likely to reveal more subtle correlations and evolutionary constraints.

Sequencing projects need a commitment to consistent curation to make meaningful computational comparisons based on functional roles a realistic prospect. However, data curation remains a major bottleneck for comparative analysis. The Gene Ontology (GO) schema provides a workable framework to make accurate and consistent curation a feasible goal. As the annotation becomes more complete, and GO is refined and extended in coverage, the possibilities for in silico research will increase in parallel. Integration of bioinformatics predictions with experimental data will in turn provide testable hypotheses and models for bench scientists.

Ultimately, the parts lists provided by genome sequencing and curation, and the data generated by functional genomics experiments, are creating a platform for systems biology and network approaches for the elucidation of biological function. Systems biology aims to describe the global organization of genes and proteins in the control and maintenance of cells and organisms. Ultimately, systems approaches will go far beyond the mere description of a network’s connectivity and its global dynamics. However, to explore fully the nature of the relationships within and between the identified modules will require new approaches for obtaining, organizing and analysing data (Nurse 2003). Integration of functional genomics datasets is paramount; integration will corroborate statistically significant data and improve functional predictions. Fraser and Marcotte have recently outlined some considerations for these systems and begun to assess how this might be achieved (Fraser and Marcotte 2004). A complete description of cellular networks is a realistic goal for the post-genomics era, and S. pombe is an exemplary organism to pioneer systems level research.

One of the challenges of biology is to identify the fundamental requirements for a functioning eukaryotic cell. This will be achieved by the integrated efforts of individual bench scientists and genome-wide studies. Despite a general correlation in proteome size between fission yeast and budding yeast, it appears that fission yeast is more similar in protein complement to higher eukaryotes than any other single-celled organism sequenced so far. To quote Mitsuhiro Yanagida in the review ‘S. pombe the model eukaryotic organism’: “Researchers who are seriously interested in the evolution and establishment of eukaryotic organisms must consider fission yeast as a premier organism for study” (Yanagida 2002).

Acknowledgements

The author would like to thank M Aslett, J Bähler and M Harris for proofreading comments; M Aslett and the GeneDB programmers for technical support; L Groocock for mitochondrial proteome reannotation; and the staff at SGD and the GO editorial office.

References

Aarstad K, Oyen TB (1975) On the distribution of 5s RNA cistrons on the genome of Saccharomyces cerevisiae. FEBS Lett 51:227-231

Akamatsu Y, Dziadkowiec D, Ikeguchi M, Shinagawa H, Iwasaki H (2003) Two different Swi5-containing protein complexes are involved in mating-type switching and recombination repair in fission yeast. Proc Natl Acad Sci 100:15770-15775

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403-410

Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389-3402

Aparicio S, Chapman J, Stupka E, Putnam N, Chia JM, Dehal P, Christoffels A, Rash S, Hoon S, Smit A, Gelpke MD, Roach J, Oh T, Ho IY, Wong M, Detter C, Verhoef F, Predki P, Tay A, Lucas S, Richardson P, Smith SF, Clark MS, Edwards YJ, Doggett N, Zharkikh A, Tavtigian SV, Pruss D, Barnstead M, Evans C, Baden H, Powell J, Glusman G, Rowen L, Hood L, Tan YH, Elgar G, Hawkins T, Venkatesh B, Rokhsar D, Brenner S (2002) Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science 297:1301-1310

Appelgren H, Kniola B, Ekwall K (2003) Distinct centromere domain structures with separate functions demonstrated in live fission yeast cells. J Cell Sci 116:4035-4042

Apweiler R, Bairoch A, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Natale DA, O'Donovan C, Redaschi N, Yeh LS (2004) UniProt: The universal protein knowledgebase. Nucleic Acids Res 32:D138-D141

Aravind L, Watanabe H, Lipman DJ, Koonin EV (2000) Lineage-specific gene loss and divergence of functionally linked genes in eukaryotes. Proc Natl Acad Sci 97:11319-11324

Asakawa H, Hayashi A, Haraguchi T, Hiraoka Y (2005) Dissociation of the Nuf2-Ndc80 complex releases centromeres from the spindle-pole body during meiotic prophase in fission yeast. Mol Biol Cell 16:2325-2538

Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G (2000) Gene Ontology: tool for the unification of biology. Nat Genet 25:25-29

Bähler J, Wyler T, Loidl J, Kohli J (1993) Unusual nuclear structures in meiotic prophase of fission yeast: a cytological analysis. J Cell Biol 121:241-256

Barry JD, Ginger ML, Burton P, McCulloch R (2003) Why are parasitic contingency genes often associated with telomeres? Int J Parasitol 33:29-45

Barnitz JT, Cramer JH, Rownd RH, Cooley L, Soll D (1982) Arrangement of the ribosomal RNA genes in Schizosaccharomyces pombe. FEBS Lett 143:129-132

Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S, Sonnhammer EL, Studholme DJ, Yeats C, Eddy SR (2004) The Pfam protein families database. Nucleic Acids Res 32:D138-D141

Baum M, Ngan VK, Clarke L (1994) The centromeric K-type repeat and the central core are together sufficient to establish a Schizosaccharomyces pombe centromere. Mol Biol Cell 5:747-761

Behrens R, Hayles J, Nurse P (2000) Fission yeast retrotransposon Tf1 integration is targeted to the 5’ ends of open reading frames. Nucleic Acids Res 28:4709-4716

Berbee ML, Taylor JW (1993) Dating the evolutionary radiations of the true fungi. Can J Bot 71:1114-1127

Birney E, Thompson JD, Gibson TJ (1996) PairWise and SearchWise: finding the optimal alignment in a simultaneous comparison of a protein profile against all DNA translation frames. Nucleic Acids Res 24:2730-2739

Blandin G, Durrens P, Tekaia F, Aigle M, Bolotin-Fukuhara M, Bon E, Casaregola S, de Montigny J, Gaillardin C, Lepingle A, Llorente B, Malpertuy A, Neuveglise C, Ozier-Kalogeropoulos O, Perrin A, Potier S, Souciet J, Talla E, Toffano-Nioche C, Wesolowski-Louvel M, Marck C, Dujon B (2000) The genome of Saccharomyces cerevisiae revisited. FEBS Lett 487:31-36

Bowen NJ, Jordan IK, Epstein JA, Wood V, Levin HL (2003) Retrotransposons and their recognition of pol II promoters: A comprehensive survey of the transposable elements from the complete genome sequence of Schizosaccharomyces pombe. Genome Res 13:1984-1997

Brachat S, Dietrich FS, Voegeli S, Zhang Z, Stuart L, Lerch A, Gates K, Gaffney T, Philippsen P (2003) Reinvestigation of the Saccharomyces cerevisiae genome annotation by comparison to the genome of a related fungus: Ashbya gossypii. Genome Biol 4:R45

Brazma A, Jonassen I, Vilo J, Ukkonen E (1998) Predicting gene regulatory elements in silico on a genomic scale. Genome Res 8:1202-1215

Broach JR, Li YY, Feldman J, Jayaram M, Abraham J, Nasmyth KA, Hicks JB (1983) Localization and sequence analysis of yeast origins of DNA replication. Cold Spring Harb Symp Quant Biol 47 Pt2:1165-1173

Camon E, Magrane M, Barrell D, Binns D, Fleischmann W, Kersey P, Mulder N, Oinn T, Maslen J, Cox A, Apweiler R (2004) The Gene Ontology Annotation (GOA) database: sharing knowledge in Uniprot with gene ontology. Nucleic Acids Res 32:D262-66

Chalker DL, Sandmeyer SB (1992) Ty3 integrates within the region of RNA polymerase III transcription initiation. Genes Dev 6:117-128

Chen CY, Shyu AB (1995) AU-rich elements: characterization and importance in mRNA degradation. Trends Biochem Sci 20:465-470

Chen D, Toone WM, Mata J, Lyne R, Burns G, Kivinen K, Brazma A, Jones N, Bähler J (2003) Global responses of fission yeast to environmental stress. Mol Biol Cell 14:214-229

Chervitz SA, Aravind L, Sherlock G, Ball CA, Koonin EV, Dwight SS, Harris MA, Dolinski K, Mohr S, Smith T, Weng S, Cherry JM, Botstein D (1998) Comparison of the complete protein sets of worm and yeast: Orthology and divergence. Science 282:2022-2028

Chikashige Y, Kinoshita N, Nakaseko Y, Matsumoto T, Murakami S, Niwa O, Yanagida M (1989) Composite motifs and repeat symmetry in S. pombe centromeres: Direct analysis by integration of NotI restriction sites. Cell 57:739-751

Chikashige Y, Ding DQ, Funabiki H, Haraguchi T, Mashiko S, Yanagida M, Hiraoka Y (1994) Telomere-led premeiotic chromosome movement in fission yeast. Science 264:270-273

Chu S, DeRisi J, Eisen M, Mulholland J, Botstein D, Brown PO, Herskowitz I (1998) The transcriptional program of sporulation in budding yeast. Science 282:699-705

Clarke L, Baum MP (1990) Functional analysis of a centromere from fission yeast: a role for centromere-specific repeated DNA sequences. Mol Cell Biol 10:1863-1872

Cliften PF, Hillier LW, Fulton L, Graves T, Miner T, Gish WR, Waterston RH, Johnston M (2001) Surveying Saccharomyces genomes to identify functional elements by comparative DNA sequence analysis. Genome Res 11:1175-1186

Cliften P, Sudarsanam P, Desikan A, Fulton L, Fulton B, Majors J, Waterston R, Cohen BA, Johnston M (2003) Finding functional features in Saccharomyces genomes by phylogenetic footprinting. Science 301:71-76

Clyne RY, Kelly TJ (1999) Genetic analysis of an ARS element from the fission yeast Schizosaccharomyces pombe. EMBO J 14:6348-6357

Copley R, Goodstadt L, Ponting C (2003) Eukaryotic domain evolution inferred from genome comparisons. Curr Opin Genet Dev 13:623-628

Dai J, Chuang R-Y, Kelly T (2005) DNA replication origins in the Schizosaccharomyces pombe genome. PNAS 102:337-342

Daga RR, Bolanos P, Moreno S (2003) Regulated mRNA stability of the Cdk inhibitor Rum1 links nutrient status to cell cycle progression. Curr Biol 13:2015-2024

Davis JC, Petrov DA (2004) Preferential duplication of conserved proteins in eukaryotic genomes. PLoS Biol 2:E55

Decottignies A, Sanchez-Perez I, Nurse P (2003) Schizosaccharomyces pombe essential genes: A pilot study. Genome Res 13:399-406

DeRisi JL, Iyer VR, Brown PO (1997) Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278:680-686

Dietrich FS, Voegeli S, Brachat S, Lerch A, Gates K, Steiner S, Mohr C, Pohlmann R, Luedi P, Choi S, Wing RA, Flavier A, Gaffney TD, Philippsen P (2004) The Ashbya gossypii genome as a tool for mapping the ancient Saccharomyces cerevisiae genome. Science 304:304-307

Doe CL, Wang G, Chow C, Fricker MD, Singh PB, Mellor EJ (1998) The fission yeast chromodomain encoding gene chp1(+) is required for chromosome segregation and shows a genetic interaction with alpha-tubulin. Nucleic Acids Res 26:4222-4229

Dolinski K, Balakrishnan R, Christie KR, Costanzo MC, Dwight SS, Engel SR, Fisk DG, Hirschman JE, Hong EL, Nash R, Oughtred R, Theesfeld CL, Binkley G, Lane C, Schroeder M, Sethuraman A, Dong S, Weng S, Miyasato S, Andrada R, Botstein D, Cherry JM "Saccharomyces Genome Database" http://www.yeastgenome.org/

Domazet-Loso T, Tautz D (2003) An evolutionary analysis of orphan genes in Drosophila. Genome Res 13:2213-2219

Dubey DD, Kim SM, Todorov IT, Huberman JA (1996) Large, complex modular structure of a fission yeast DNA replication origin. Curr Biol 6:467-473

Eddy SR (2002) Computational genomics of noncoding RNA genes. Cell 109:137-140

Ekwall K, Javerzat JP, Lorentz A, Schmidt H, Cranston G, Allshire R (1995) The chromodomain protein Swi6: a key component of fission yeast centromeres. Science 269:1429-1431

Eisen MB, Spellman PT, Brown PO, Botstein D (1998) Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci 95:14863-14864

Ettwiller LM, Rung J, Birney E (2003) Discovering novel cis-regulatory motifs using functional networks. Genome Res 13:883-895

Fan JB, Chikashige Y, Smith CL, Niwa O, Yanagida M, Cantor CR (1988) Construction of a Not I restriction map of the fission yeast Schizosaccharomyces pombe. Nucleic Acids Res 17:2801-2818

Fink GR (1987) Pseudogenes in yeast? Cell 49:5-6

Fitch WM (1970) Distinguishing homologs from analogous proteins. Berlin-Heidelberg-New York, Springer-Verlag

Fitzgerald-Hayes M, Clarke L, Carbon J (1982) Nucleotide sequence comparisons and functional analysis of yeast centromere DNAs. Cell 29:235-244

Forsburg SL (1999) The best yeast? Trends Genet 15:340-344

Foury F, Roganti T, Lecrenier N, Purnelle B (1998) The complete sequence of the mitochondrial genome of Saccharomyces cerevisiae. FEBS Lett 440:325

Fraser AG, Kamath RS, Zipperlen P, Martinez-Campos M, Sohrmann M, Ahringer J (2000) Functional genomic analysis of C. elegans chromosome I by systematic RNA interference. Nature 408:325-330

Fraser AG, Marcotte EM (2004) A probabilistic view of gene function. Nat Genet 36:559-564

Fraser HB, Wall DP, Hirsh AE (2003) A simple dependence between protein evolution rate and the number of protein-protein interactions. BMC Evol Biol 3:11

Fujioka Y, Kimata Y, Nomaguchi K, Watanabe K, Kohno K (2002) Identification of a novel non-structural maintenance of chromosomes (SMC) component of the SMC5-SMC6 complex involved in DNA repair. J Biol Chem 277:21585-21591

Gaillardin C, Duchateau-Nguyen G, Tekaia F, Llorente B, Casaregola S, Toffano-Nioche C, Aigle M, Artiguenave F, Blandin G, Bolotin-Fukuhara M, Bon E, Brottier P, de Montigny J, Dujon B, Durrens P, Lepingle A, Malpertuy A, Neuveglise C, Ozier-Kalogeropoulos O, Potier S, Saurin W, Termier M, Wesolowski-Louvel M, Wincker P, Souciet J, Weissenbach J (2000) Genomic exploration of the hemiascomycetous yeasts: 21 Comparative functional classification of genes. FEBS Lett 487:134-149

Garrels JI (2002) Yeast genomic databases and the challenge of the post-genomic era. Funct Integr Genomics 2:212-237

Gasch AP, Spellman PT, Kao CM, Carmel-Harel O, Eisen MB, Storz G, Botstein D, Brown PO (2000) Genomic expression programs in the response of yeast cells to environmental changes. Mol Biol Cell 11:4241-4257

Goffeau A, Barrell BG, Bussey H, Davis RW, Dujon B, Feldmann H, Galibert F, Hoheisel JD, Jacq C, Johnston M, Louis EJ, Mewes HW, Murakami Y, Philippsen P, Tettelin H, Oliver SG (1996) Life with 6000 genes. Science 274:546-567

Gomez M, Antequera F (1999) Organization of DNA replication origins in the fission yeast genome. EMBO J 18:5683-5690

Halme A, Bumgarner S, Styles C, Fink GR (2004) Genetic and epigenetic regulation of the FLO gene family generates cell-surface variation in yeast. Cell 116:405-415

Hall IM, Shankaranarayana GD, Noma K, Ayoub N, Cohen A, Grewal SI (2002) Establishment and maintenance of a heterochromatin domain. Science 297:2215-2218

Hansen KR, Burns G, Mata J, Volpe TA, Martienssen RA, Bähler J, Thon G (2005) Global effects on gene expression in fission yeast by silencing and RNA interference machin-eries. Mol Cell Biol 25:590-601

Heckman DS, Geiser DM, Eidell BR, Stauffer RL, Kardos NL, Hedges SB (2001) Molecular evidence for the early colonization of land by fungi and plants. Science 293:1129-1133

Hertz-Fowler C, Peacock CS, Wood V, Aslett M, Kerhornou A, Mooney P, Tivey A, Berriman M, Hall N, Rutherford K, Parkhill J, Ivens AC, Rajandream MA, Barrell B (2004) GeneDB: a resource for prokaryotic and eukaryotic organisms. Nucleic Acids Res 32:D339-D343

Hirsh A, Fraser HB (2001) Protein dispensability and rate of evolution. Nature 411:1046-1049

Hirotsune S, Yoshida N, Chen A, Garrett L, Sugiyama F, Takahashi S, Yagami K, Wynshaw-Boris A, Yoshiki A (2003) An expressed pseudogene regulates the messenger-RNA stability of its homologous coding gene. Nature 423:91-96

Hughey R, Krogh A (1996) Hidden Markov models for sequence analysis: extensions and analysis of the basic method. Comput Appl Biosci 12:95-107

Ivanov IP, Gesteland RF, Matsufuji S (1998) Programmed frameshifting in the synthesis of mammalian antizyme is +1 in mammals, predominantly +1 in fission yeast, but -2 in budding yeast. RNA 4:1230-1238

Jeong H, Tombor B, Albert R, Oltvai ZN, Barabasi AL (2000) The large scale organization of metabolic networks. Nature 407:651-654

Jeong H, Mason SP, Barabasi AL, Oltvai ZN (2001) Lethality and centrality in protein networks. Nature 411:41-42

Jones RH, Jones NC (1989) Mammalian cAMP-responsive element can activate transcription in yeast and binds a yeast factor(s) that resembles mammalian transcription factor ATF. Proc Natl Acad Sci 86:2176-2180

Jordan IK, Wolf YI, Koonin EV (2003) No simple dependence between protein evolution rate and the number of protein-protein interactions: only the most prolific interactors tend to evolve slowly. BMC Evol Biol 3:1

Kamath RS, Fraser AG, Dong Y, Poulin G, Durbin R, Gotta M, Kanapin A, Le Bot N, Moreno S, Sohrmann M, Welchman DP, Zipperlen P, Ahringer J (2003) Systematic functional analysis of the Caenorhabditis elegans genome using RNAi. Nature 421:231-237

Kanoh J, Ishikawa F (2003) Composition and conservation of the telomeric complex. Cell Mol Life Sci 60:2295-2302

Käufer NF, Potashkin J (2000) Analysis of the splicing machinery in fission yeast: a comparison with budding yeast and mammals. Nucleic Acids Res 28:3003-3010

Kellis M, Birren B, Lander ES (2004) Proof and evolutionary analysis of ancient genome duplication in yeast Saccharomyces cerevisiae. Nature 428:617-624

Kellis M, Patterson N, Endrizzi M, Birren B, Lander ES (2003) Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423:241-254

Keogh RS, Seoighe C, Wolfe KH (1998) Evolution of gene order and chromosome number in Saccharomyces, Kluyveromyces and related fungi. Yeast 14:443-457

Kim JM, Vanguri S, Boeke JD, Gabriel A, Voytas DF (1998) Transposable elements and genome organization: A comprehensive survey of retrotransposons revealed by the complete Saccharomyces cerevisiae genome sequence. Genome Res 8:464-478

Kitajima TS, Kawashima SA, Watanabe Y (2004) The conserved kinetochore protein shugoshin protects centromeric cohesion during meiosis. Nature 427:510-517

Kniola B, O’Toole E, McIntosh JR, Mellone B, Allshire R, Mengarelli S, Hultenby K, Ekwall K (2001) The domain structure of centromeres is conserved from fission yeast to humans. Mol Biol Cell 12:2767-2775

Koonin EV, Fedorova ND, Jackson JD, Jacobs AR, Krylov DM, Makarova KS, Mazumder R, Mekhedov SL, Nikolskaya AN, Rao BS, Rogozin IB, Smirnov S, Sorokin AV, Sverdlov AV, Vasudevan S, Wolf YI, Yin JJ, Natale DA (2004) A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes. Genome Biol 5:R7

Koszul R, Caburet S, Dujon B, Fischer G (2004) Eucaryotic genome evolution through the spontaneous duplication of large chromosomal segments. EMBO J 23:234-243

Kunin V, Pereira-Leal JB, Ouzounis CA (2004) Functional evolution of the yeast protein interaction network. Mol Biol Evol 21:1711-1716

Krupp G, Cherayil B, Frendewey D, Nishikawa S, Soll D (1986) Two RNA species co-purify with RNase P from the fission yeast S. pombe. EMBO J 5:1697-703

Krylov DM, Wolf YI, Rogozin IB, Koonin EV (2003) Gene loss, protein sequence divergence, gene dispensability, expression level, and interactivity are correlated in eukaryotic evolution. Genome Res 10:2229-2235

Kuhn AN, Käufer NF (2003) Pre-mRNA splicing in Schizosaccharomyces pombe. Curr Genet 42:241-251

Kulikova T, Aldebert P, Althorpe N, Baker W, Bates K, Browne P, van den Broek A, Cochrane G, Duggan K, Eberhardt R, Faruque N, Garcia-Pastor M, Harte N, Kanz C, Leinonen R, Lin Q, Lombard V, Lopez R, Mancuso R, McHale M, Nardone F, Silventoinen V, Stoehr P, Stoesser G, Tuli MA, Tzouvara K, Vaughan R, Wu D, Zhu W, Apweiler R (2004) The EMBL nucleotide sequence database. Nucleic Acids Res 32:D115-D119

Lang BF, Cedergren R, Gray MW (1987) The mitochondrial genome of the fission yeast, Schizosaccharomyces pombe. Sequence of the large-subunit ribosomal RNA gene, comparison of potential secondary structure in fungal mitochondrial large-subunit rRNAs and evolutionary considerations. Eur J Biochem 169:527-537

Langkjaer RB, Cliften P, Johnston M, Piskur J (2003) Yeast genome duplication was followed by asynchronous differentiation of duplicated genes. Nature 421:848-852

Lespinet O, Wolf YI, Koonin EV, Aravind L (2002) The role of lineage-specific gene family expansion in the evolution of eukaryotes. Genome Res 12:1048-1059

Levin HL (1995) A novel mechanism of self-primed reverse transcription defines a new family of retroelements. Mol Cell Biol 15:3310-3317

Levin H, Weaver DC, Boeke JD (1990) Two related families of retrotransposons from Schizosaccharomyces pombe. Mol Cell Biol 10:6791-6798

Li L, Stoeckert CJ, Roos DS (2003) OrthoMCL: Identification of ortholog groups for eukaryotic genomes. Genome Res 13:2178-2189

Lorentz A, Ostermann K, Fleck O (1994) Switching gene swi6, involved in repression of silent mating-type loci in fission yeast, encodes a homologue of chromatin-associated proteins from Drosophila and mammals. Gene 143:139-143

Lorenz A, Wells JL, Pryce DW, Novatchkova M, Eisenhaber F, McFarlane RJ, Loidl J (2004) S. pombe linear elements contain proteins related to synaptonemal complex components. J Cell Sci 117:3345-3351

Lowe TM, Eddy SR (1997) tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 25:955-964

Lowe TM, Eddy SR (1999) A computational screen for methylation guide snoRNAs in yeast. Science 283:1168-1171

Lowndes NF, McInerny CJ, Johnson AL, Fantes PA, Johnston LH (1992) Control of the DNA synthesis genes in fission yeast by the cell-cycle gene cdc10+. Nature 355:449-453

Lum PY, Edwards S, Wright R (1996) Molecular, functional and evolutionary characterization of the gene encoding HMG-CoA reductase in the fission yeast Schizosaccharomyces pombe. Yeast 12:1107-1124

Malik HS, Eickbush TH (1999) Modular evolution of the integrase domain in the Ty3/Gypsy class of LTR retrotransposons. J Virol 73:5186-5190

Mandell J, Goodrich KJ, Bähler J, Cech TR (2004) Expression of a RecQ helicase homolog affects progression through crisis in fission yeast lacking telomerase. J Biol Chem 280:5249-5257

Mandell JG, Bähler J, Volpe TA, Martienssen RA, Cech TR (2005) Global expression changes resulting from loss of telomeric DNA in fission yeast. Genome Biol 6:R1

Mao J, Appel B, Schaack J (1982) The 5S RNA genes of Schizosaccharomyces pombe. Nucleic Acids Res 10:487-500

Masakuto H, Huberman JA, Frattini MG, Kelly TJ (2004) DNA replication in S. pombe. In: The molecular biology of Schizosaccharomyces pombe (Egel R, Ed). Springer-Verlag Heidelberg, pp73-99

Mata J, Lyne R, Burns G, Bähler J (2002) The transcriptional program of meiosis and sporulation in fission yeast. Nat Genet 32:143-147

Mata J, Bähler J (2003) Correlations between gene expression and gene conservation in fission yeast. Genome Res 13:2686-2690

Maundrell K, Hutchison A, Shall S (1988) Sequence analysis of ARS elements in fission yeast. EMBO J 7:2203-2209

Maxwell PH, Coombes C, Kenny AE (2004) Ty1 mobilizes subtelomeric Y’ elements in telomerase-negative Saccharomyces cerevisiae survivors. Mol Cell Biol 24:9887-9898

Molnar M, Parisi S, Kakihara Y (2001) Characterization of rec7, an early meiotic recombination gene in Schizosaccharomyces pombe. Genetics 2:519-532

Morimyo M, Mita K, Hongo E, Higashi T, Sugaya K, Ajimura M, Yamauchi M, Tsuji S, Park W.-Y, Sasanuma S, Nohata J, Kimura T, Inoue H, Ishihara Y (1998) cDNA catalog of fission yeast (Schizosaccharomyces pombe) and its application for cloning of mammalian DNA repair gene. In: Biodefence mechanisms against environmental stress (Ozawa T, Hori T, Tatsumi K Eds), Springer Verlag Tokyo, Heidelberg, pp 115-123

Mott R (1997) EST_GENOME: a program to align spliced DNA sequences to unspliced genomic DNA. Comput Appl Biosci 4:477-478

Mundt KE, Porte J, Murray JM, Brikos C, Christensen PU, Caspari T, Hagan IM, Millar JB, Simanis V, Hofmann K, Carr AM (1999) The COP9/signalosome complex is conserved in fission yeast and has a role in S phase. Curr Biol 9:1427-1430

Murakami S, Matsumoto T, Niwa O, Yanagida M (1991) Structure of the fission yeast centromere cen3: direct analysis of the reiterated inverted region. Chromosoma 101:214-221

Nimmo ER, Pidoux AL, Perry PE, Allshire RC (1998) Defective meiosis in telomere silencing mutants of Schizosaccharomyces pombe. Nature 392:825-828

Nurse P (2000) A long twentieth century of the cell cycle and beyond. Cell 100:71-78

Nurse P (2003) Understanding cells. Nature 424:883

Ohno S (1970) Evolution by gene duplication. Springer-Verlag, Berlin-Heidelberg-New York

Oliver SG, van der Aart QJ, Agostoni-Carbone ML, Aigle M, Alberghina L, Alexandraki D, Antoine G, Anwar R, Ballesta JP, Benit P, et al. (1992) The complete DNA sequence of yeast chromosome III. Nature 357:38-46

Ozier-Kalogeropoulos O, Malpertuy A, Boyer J, Tekaia F, Dujon B (1998) Random exploration of the K. lactis genome and comparison to that of S. cerevisiae. Nucleic Acids Res 26:5511-5524

Pagel P, Mewes H-W, Frishman D (2004) Conservation of protein-protein interactions – lessons from ascomycota. Trends Genet 20:72-76

Pasero P, Marilley M (1993) Size variation of rDNA clusters in the yeasts Saccharomyces cerevisiae and Schizosaccharomyces pombe. Mol Gen Genet 236:448-452

Pearson W, Lipman DJ (1988) Improved tools for biological sequence comparison. Proc Natl Acad Sci 85:2444-2448

Piskur J (2001) Origin of the duplicated regions in the yeast genomes. Trends Genet 16:302-303

Pollack JR, Iyer VR (2002) Characterizing the physical genome. Nat Genet Suppl 32:515-521

Pradet-Balade B (2001) Translation control: bridging the gap between genomics and proteomics? Trends Biochem Sci 26:225-229

Prado F, Aguilera A (2005) Partial depletion of histone H4 increases homologous recombination-mediated genetic instability. Mol Cell Biol 24:1526-1536

Prince VE, Pickett (2002) Splitting pairs: The diverging fates of duplicated genes. Nat Rev Genet 3:827-837

Pyne S, Skiena S, Futcher B (2005) Copy correction and concerted evolution in the conservation of yeast genes. PLoS Biol, in press

Raghuraman MK, Winzeler EA, Collingwood D, Hunt S, Wodicka L, Conway A, Lockhart DJ, Davis RW, Brewer BJ, Fangman WL (2001) Replication dynamics of the yeast genome. Science 294:115-121

Remacle JE, Albrecht G, Brys R, Braus GH, Huylebroeck D (1997) Three classes of mammalian transcription activation domain stimulate transcription in Schizosaccharomyces pombe. EMBO J 16:5722-5729

Remm M, Storm CE, Sonnhammer EL (2001) Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J Mol Biol 314:1041-1052

Ribes V, Dehoux P, Tollervey D (1988) 7SL RNA from S. pombe is encoded by a single copy essential gene. EMBO J 7:231-237

Robyr D, Suka Y, Xenarios I, Kurdisatani SK, Wang A, Suka N, Grunstein M (2002) Microarray deacetylation maps determine genome-wide functions for yeast histone deacetylases. Cell 1009:437-466

Rustici G, Mata J, Kivinen K, Lio P, Penkett CJ, Burns G, Hayles J, Brazma A, Nurse P, Bähler J (2004) Periodic gene expression program of the fission yeast cell cycle. Nat Genet 36:809-817

Rutherford K, Parkhill J, Crook J, Horsnell T, Rice P, Rajandream MA, Barrell B (2000) Artemis: sequence visualization and annotation. Bioinformatics 16:944-945

Schaak J, Mao J, Söll D (1982) The 5.8S RNA gene sequence and the ribosomal repeat of S. pombe. Nucleic Acids Res 10:2851-2864

Scherthan H, Bähler J, Kohli J (1994) Dynamics of chromosome organization and pairing during meiotic prophase in fission yeast. J Cell Biol 127:273-285

Scherthan H (2001) A bouquet makes ends meet. Nat Rev Mol Cell Biol 2:621-627

Schroder AR, Shinn P, Chen H, Berry C, Ecker JR, Bushman F (2002) HIV-1 integration in the human genome favors active genes and local hotspots. Cell 110:521-529

Segurado M, de Luis A, Antequera F (2003) Genome-wide distribution of DNA replication origins at A+T rich islands in Schizosaccharomyces pombe. EMBO reports 4:1048-1053

Singleton TL, Levin HL (2002) A long terminal repeat retrotransposon of fission yeast has strong preferences for specific sites of insertion. Eukaryot Cell 1:44-55

Sipiczki M (2001) Where does fission yeast sit on the tree of life? Genome Biol 1:1011.1-1011.4

Smith CL, Matsumoto T, Niwa O, Klco S, Fan JB, Yanagida M, Cantor CR (1987) An electrophoretic karyotype for Schizosaccharomyces pombe by pulsed field gel electrophoresis. Nucleic Acids Res 15:4481-4491

Sonnhammer EL, Eddy SR, Durbin R (1997) Pfam: a comprehensive database of protein domain families based on seed alignments. Proteins 3:405-420

Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, Futcher B (1998) Comprehensive identification of cell-cycle regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Biol Cell 9:3273-3297

Sunnerhagen P (2002) Prospects for functional genomics in Schizosaccharomyces pombe. Curr Genet 42:73-84

Takahashi K, Murakami S, Chikashige Y, Funabiki H, Niwa O, Yanagida M (1992) A low copy number central sequence with strict symmetry and unusual chromatin structure in the fission yeast centromere. Mol Biol Cell 3:819-835

Tatusov RL, Koonin EV, Lipman DJ (1997) A genomic perspective on global families. Science 278:631-637

Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN, Rao BS, Smirnov S, Sverdlov AV, Vasudevan S, Wolf YI, Yin JJ, Natale DA (2003) The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4:41

Teichmann SA (2002) The constraints protein-protein interactions place on sequence divergence. J Mol Biol 324:399-407

The C. elegans sequencing consortium (1998) Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 282:2012-2018

The Gene Ontology Consortium (2004) The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res 32:D258-D261

Theis JF, Newlon CS (1997) The ARS309 chromosomal replicator of Saccharomyces cerevisiae depends on an exceptional ARS consensus sequence. Proc Natl Acad Sci USA 94:10786-10791

Theis JF, Newlon CS (2001) Two compound replication origins in Saccharomyces cerevisiae contain redundant origin complex binding sites. Mol Cell Biol 21:2790-2801

Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucl Acids Res 22:4673-4680

van Driel R (2003) The eukaryotic genome: a system regulated at different hierarchical levels. J Cell Sci 116:4067-4075

Villeneuve AM, Hillers KJ (2001) Whence meiosis? Cell 106:647-650

Volpe TA, Kidner C, Hall IM, Teng G, Grewal SI, Martienssen RA (2002) Regulation of heterochromatin silencing and histone H3 lysine-9 methylation by RNAi. Science 297:1833-1837

Volpe T, Schramke V, Hamilton, White SA, Teng G, Martienssen RA, Allshire RC (2003) RNA interference is required for normal centromere function in fission yeast. Chromosome Res 11:137-146

Watanabe Y, Yamamoto M (1994) S. pombe mei2+ encodes an RNA-binding protein essential for premeiotic DNA synthesis and meiosis I, which cooperates with a novel RNA species meiRNA. Cell 78:487-498

Watanabe T, Miyashita K, Saito TT (2001) Comprehensive isolation of meiosis-specific genes identifies novel proteins and unusual non-coding transcripts in Schizosaccharomyces pombe. Nucleic Acids Res 29:327-337

Watanabe T, Miyashita K, Saito TT, Nabeshima K, Nojima H (2002) Abundant poly (A)-bearing RNAs that lack open reading frames in S. pombe. DNA Res 9:209-215

Webb CJ, Wise JA (2004) The splicing factor U2AF small subunit is functionally conserved between fission yeast and humans. Mol Cell Biol 10:4229-4240

Wood V, Rutherford K, Ivens A, Rajandream M-A, Barrell B (2001) A re-annotation of the Saccharomyces cerevisiae genome. Comp Funct Genom 2:143-154

Wood V, Gwilliam R, Rajandream MA, Lyne M, Lyne R, Stewart A, Sgouros J, Peat N, Hayles J, Baker S, Basham D, Bowman S, Brooks K, Brown D, Brown S, Chillingworth T, Churcher C, Collins M, Connor R, Cronin A, Davis P, Feltwell T, Fraser A, Gentles S, Goble A, Hamlin N, Harris D, Hidalgo J, Hodgson G, Holroyd S, Hornsby T, Howarth S, Huckle EJ, Hunt S, Jagels K, James K, Jones L, Jones M, Leather S, McDonald S, McLean J, Mooney P, Moule S, Mungall K, Murphy L, Niblett D, Odell C, Oliver K, O'Neil S, Pearson D, Quail MA, Rabbinowitsch E, Rutherford K, Rutter S, Saunders D, Seeger K, Sharp S, Skelton J, Simmonds M, Squares R, Squares S, Stevens K, Taylor K, Taylor RG, Tivey A, Walsh S, Warren T, Whitehead S, Woodward J, Volckaert G, Aert R, Robben J, Grymonprez B, Weltjens I, Vanstreels E, Rieger M, Schafer M, Muller-Auer S, Gabel C, Fuchs M, Fritzc C, Holzer E, Moestl D, Hilbert H, Borzym K, Langer I, Beck A, Lehrach H, Reinhardt R, Pohl TM, Eger P, Zimmermann W, Wedler H, Wambutt R, Purnelle B, Goffeau A, Cadieu E, Dreano S, Gloux S, Lelaure V, Mottier S, Galibert F, Aves SJ, Xiang Z, Hunt C, Moore K, Hurst SM, Lucas M, Rochet M, Gaillardin C, Tallada VA, Garzon A, Thode G, Daga RR, Cruzado L, Jimenez J, Sanchez M, del Rey F, Benito J, Dominguez A, Revuelta JL, Moreno S, Armstrong J, Forsburg SL, Cerrutti L, Lowe T, McCombie WR, Paulsen I, Potashkin J, Shpakovski GV, Ussery D, Barrell BG, Nurse P (2002) The genome sequence of Schizosaccharomyces pombe. Nature 415:871-880

Wolfe KH, Shields DC (1997) Molecular evidence for an ancient duplication of the entire yeast genome. Nature 387:708-713

Wolfe K (2004) Evolutionary genomics: Yeast accelerate beyond BLAST. Curr Biol 14: R392-R394

Wong S, Butler G, Wolfe KH (2002) Gene order evolution and paleopolyploidy in hemiascomycete yeasts. Proc Natl Acad Sci 14:9272-9277

Wyrick JJ, Aparicio JG, Chen T, Barnett JD, Jennings EG, Young RA, Bell SP, Aparicio OM (2001) Genome-wide distribution of ORC and NCN proteins in S. cerevisiae: high resolution mapping of replication origins. Science 294:2357-2360

Yamanda M, Hayatsu N, Matsuura A, Ishikawa F (1998) Y’-Help1, a DNA helicase encoded by the yeast subtelomeric Y’ element, is induced in survivors defective for telomerase. J Biol Chem 273:33360-33366

Yanagida M (2002) The model unicellular eukaryote, Schizosaccharomyces pombe. Genome Biol 3:COMMENT2003.1-2003.4

Yieh L, Kassavetis G, Geiduscheck EP, Sandmeyer SB (2000) The Brf and TATA-binding proteins subunits of the RNA polymerase III transcription factor IIIB mediate position specific integration of the gypsy-like element, Ty3. J Biol Chem 275:29800-29807

Young JA, Schreckhise RW, Steiner WW, Smith GR (2002) Meiotic recombination remote from prominent break sites in S. pombe. Mol Cell 9:253-263

Young JA, Hyppa RW, Smith GR (2004) Swi5 acts in meiotic DNA joint molecule forma-tion in Schizosaccharomyces pombe. Genetics 167:593-605

Zdobnov EM, von Mering C, Letunic I, Bork P (2002) Comparative genome and proteome analysis of Anopheles gambiae and Drosophila melanogaster. Science 298:149-159

Zhu C, Karplus K, Grate L, Coffino P (2000) A homolog of mammalian antizyme is present in fission yeast Schizosaccharomyces pombe but not detected in budding yeast Saccharomyces cerevisiae. Bioinformatics 16:478-481

Wood, Valerie

Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK [email protected]

Submitted Publication

7. Analysis of a genome-wide set of gene deletions in the fission yeast Schizosaccharomyces pombe.

©2010 Nature America, Inc. All rights reserved.

NATURE BIOTECHNOLOGY VOLUME 28 NUMBER 6 JUNE 2010 617

R E S O U R C E

Systematic genome-wide gene deletion collections of eukaryotic organisms provide powerful tools for investigating molecular mechanisms in basic biology and for identifying pathways that can be targeted in bioengineering or medical applications, as shown by pioneering studies with the budding yeast Saccharomyces cerevisiae1–5. The construction of systematic gene deletion collections is difficult, although RNA interference (RNAi) provides a popular alternative approach to ablate gene activity in many organisms. However, RNAi approaches suffer from drawbacks such as partial knockdown of gene expression and off-target effects. For example, RNAi screens in fly and human cells revealed only a 10–38% overlap in genes identified as being required for the cell cycle between these two organisms6.

We have constructed a genome-wide gene deletion set for the fission yeast Schizosaccharomyces pombe. Fission and budding yeast are not closely related and differ in a number of aspects including organization of the cell cycle, heterochromatin, complexity of centromeres and DNA replication origins and the prevalence of introns7, which makes their comparison valuable for defining genes and processes required more generally in eukaryotes. Here, we have identified similarities and differences in gene dispensability between the two yeasts and have used growth fitness profiling to identify genes haploinsufficient or haploproficient for growth.

RESULTS

Deletion construction and gene dispensability

We have constructed 4,836 heterozygous deletions covering 98.4% of the 4,914 protein coding open reading frames (ORFs) based on the annotated genome sequence7 (http://www.genedb.org/genedb/pombe, 01/04/08) (Online Methods and Supplementary Table 1; for all the PCR primer sets and the mapping data, see Supplementary Data 1 and 2; also available at http://pombe.kaist.ac.kr/nbtsupp/). In addition, we have deleted 9 Tf2 transposons, 39 dubious genes8 and 48 pseudogenes (Supplementary Table 2). Each gene was deleted and replaced using homologous recombination by a ‘deletion cassette’ containing the KanMX marker gene9 (Supplementary Data 3) flanked by a pair of unique molecular bar codes (Fig. 1a, Supplementary Fig. 1 and Supplementary Table 1). Several pilot scale deletion studies have been carried out10–12 and it was suggested that 40–80 bp of homology is not always sufficient for the recombination required for the systematic deletion of genes in fission yeast12. Both block PCR

Analysis of a genome-wide set of gene deletions in the fission yeast Schizosaccharomyces pombe

Dong-Uk Kim1,15, Jacqueline Hayles2,15, Dongsup Kim3,15, Valerie Wood2,4,15, Han-Oh Park5,15, Misun Won1,15, Hyang-Sook Yoo1,15, Trevor Duhig2, Miyoung Nam1, Georgia Palmer2, Sangjo Han3, Linda Jeffery2, Seung-Tae Baek1, Hyemi Lee1, Young Sam Shim1, Minho Lee3, Lila Kim1, Kyung-Sun Heo1, Eun Joo Noh1, Ah-Reum Lee1, Young-Joo Jang6, Kyung-Sook Chung1, Shin-Jung Choi1, Jo-Young Park1, Youngwoo Park1, Hwan Mook Kim7, Song-Kyu Park7, Hae-Joon Park5, Eun-Jung Kang5, Hyong Bai Kim8, Hyun-Sam Kang9, Hee-Moon Park10, Kyunghoon Kim11, Kiwon Song12, Kyung Bin Song13, Paul Nurse2,14 & Kwang-Lae Hoe1,6

We report the construction and analysis of 4,836 heterozygous diploid deletion mutants covering 98.4% of the fission yeast genome providing a tool for studying eukaryotic biology. Comprehensive gene dispensability comparisons with budding yeast—the only other eukaryote for which a comprehensive knockout library exists—revealed that 83% of single-copy orthologs in the two yeasts had conserved dispensability. Gene dispensability differed for certain pathways between the two yeasts, including mitochondrial translation and cell cycle checkpoint control. We show that fission yeast has more essential genes than budding yeast and that essential genes are more likely than nonessential genes to be present in a single copy, to be broadly conserved and to contain introns. Growth fitness analyses determined sets of haploinsufficient and haploproficient genes for fission yeast, and comparisons with budding yeast identified specific ribosomal proteins and RNA polymerase subunits, which may act more generally to regulate eukaryotic cell growth.

1Integrative Omics Research Centre, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Yuseong, Daejeon, Korea. 2Cancer Research UK, The London Research Institute, London, UK. 3Department of Bio and Brain Engineering, Korea Advanced Institute of Science & Technology (KAIST), Yuseong, Daejeon, Korea. 4Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. 5Bioneer Corp., Daedeok, Daejeon, Korea. 6Laboratory of Cell Cycle & Signal Transduction, WCU Department of NanoBioMedical Science, Institute of Tissue Regeneration Engineering, Dankook University, Cheonan, Korea. 7Bioevaluation Centre, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Ochang, Chungcheongbuk-do, Korea. 8Department of Bioinformatics & Biotechnology, Korea University, Jochiwon, Chungnam, Korea. 9School of Biological Sciences, Seoul National University, Seoul, Korea. 10Department of Microbiology, Chungnam National University, Yuseong, Daejeon, Korea. 11Division of Life Sciences, Kangwon National University, Chuncheon, Kangwon-do, Korea. 12Department of Biochemistry, Yonsei University, Seoul, Korea. 13Department of Food and Nutrition, Chungnam National University, Yuseong, Daejeon, Korea. 14The Rockefeller University, New York, New York, USA. 15These authors contributed equally to this work. Correspondence and requests for material should be addressed to K.-L.H. ([email protected]).

Received 6 January; accepted 30 March; published online 16 May 2010; corrected after print 7 December 2010; doi:10.1038/nbt.1628

and total gene synthesis methods13 were developed to overcome this problem by increasing the length of homology from ~80 bp to ~350 bp (Fig. 1b and Supplementary Figs. 2–4). We confirmed that the deletion mutants were correctly replaced with the KanMX marker using PCR and dideoxy sequencing (Supplementary Fig. 5). For some genes constraints on primer selection for block PCR resulted in <100% of the ORF being deleted (for the amount of ORF deleted see Supplementary Table 1, column C KO%). Of the 4,836 genes deleted, at least 4,328 genes (87.6%) have >80% of their ORFs removed. In addition we carried out Southern blot analysis to determine the frequency with which the deletion cassette integrated elsewhere in the genome and estimated it to be <1% (Supplementary Fig. 6).

We determined the essentiality of 4,836 genes by sporulating each heterozygous deletion diploid strain and then observing the germinating haploid spores microscopically. Essentiality was confirmed by tetrad analysis for all genes initially characterized as essential. We found that 26.1% of fission yeast genes (1,260/4,836) were essential and 73.9% (3,576/4,836) were nonessential for viability of haploid cells in the growth conditions we used. This analysis determined the dispensability for 3,626 genes that had not been deleted previously (Fig. 1c). Comparisons with published data for 1,210 genes revealed that the dispensability data for 98.4% of our deletions are similar or our data are more likely to be correct, leaving 1.6% as the maximum estimate of the error rate in our study (Supplementary Table 3).

These results contrast with budding yeast where 17.8% (1,033/5,776) of genes are essential for viability (http://www.yeastgenome.org/). Fission yeast therefore has 227 more essential genes (1,260–1,033) than budding yeast despite having fewer genes in total (4,836 versus 5,776). Fission yeast has fewer duplicated genes than budding yeast14,15. It is therefore possible that there are more essential genes in fission yeast, because duplication in budding yeast is masking potential essentiality. We examined this possibility by identifying all of the essential genes for each organism with duplications in the other organism (orthologous relationships Sp|Sc, one|many, many|one and many|many). We then identified the cases where these essential genes had orthologs in the other organism that were both nonessential and duplicated (Supplementary Table 1 and Online Methods). This revealed only 67 essential genes in fission yeast and 32 essential genes in budding yeast where essentiality of the orthologs in the other organism could be masked by redundancy. Thus redundancy could account for maximally only 35 (67–32) of the 227 extra essential genes in fission yeast. We conclude that redundancy is not the major reason for the additional essential genes in fission yeast.
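The redundancy argument in this paragraph is plain arithmetic on the reported counts, and it can be retraced in a few lines. The sketch below is illustrative only: the variable names are not from the paper, and only the counts are taken from the text.

```python
# Counts quoted in the text; all variable names are illustrative assumptions.
pombe_essential, pombe_total = 1260, 4836
cerevisiae_essential, cerevisiae_total = 1033, 5776

# Essential and nonessential percentages for fission yeast.
pombe_essential_pct = 100 * pombe_essential / pombe_total
pombe_nonessential_pct = 100 * (pombe_total - pombe_essential) / pombe_total

# Excess of essential genes in fission yeast relative to budding yeast.
extra_essential = pombe_essential - cerevisiae_essential

# Essential genes whose orthologs in the other yeast are both
# nonessential and duplicated (candidates for masking by redundancy).
maskable_from_pombe = 67       # essential in fission yeast
maskable_from_cerevisiae = 32  # essential in budding yeast

# Upper bound on the excess that redundancy could explain.
max_explained = maskable_from_pombe - maskable_from_cerevisiae

print(f"fission yeast essential: {pombe_essential_pct:.1f}%")
print(f"extra essential genes in fission yeast: {extra_essential}")
print(f"maximum explained by redundancy: {max_explained}")
```

Keeping the two masking counts separate makes it clear that the bound of 35 is a net figure, since masking can operate in both directions.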

Analysis of gene dispensability

Essential and nonessential genes are distributed evenly throughout the fission yeast genome except within 100 kb of the telomeres on chromosomes 1 and 2 (Fig. 2a). As in other organisms4 genes in the subtelomeric regions showed low essentiality (1.2%) compared to a genome average of 26.1%. These regions are enriched for paralogs (68.7%, 79/115) and we have shown that duplicated genes are less likely to be essential than single-copy genes (Supplementary Table 4). These regions are also enriched for nonessential species-specific genes related to meiosis and the response to nitrogen starvation, which are less likely to be essential under these assay conditions15.

In fission yeast ~46% of genes have introns7 and we found that the essentiality of genes with one or more introns is significantly higher than that of genes lacking introns (33% versus 21%, P < 10^-14) (Fig. 2b). One possible explanation is that essential genes are less likely to be rapidly regulated, given that it has already been shown that rapidly regulated stress response–related genes are less likely to contain introns16. Alternatively, if introns arose early during eukaryotic evolution, this may be reflected as a bias toward intron-containing essential genes because essential genes are more likely to be ancient than nonessential genes12.

The relationship between our gene essentiality data and previously published ORFeome localization17 was also analyzed for ten different cellular locations (Fig. 2c). As in budding yeast3,18 we found that the greatest percentage of essential gene products was localized to the nucleolus, nuclear envelope and the spindle pole body.

As previously shown for budding yeast4, essential genes in fission yeast were more likely to be unique, with 93.1% of essential genes (1,173/1,260) being present in single copy compared to 73.9% of nonessential genes (2,643/3,576). In contrast nonessential genes were more likely to be duplicated or species-specific. Comparison of Gene Ontology (GO) term enrichment between the two yeasts revealed that the essential gene sets for both yeasts were significantly (P < 10^-2) enriched for core cellular processes, such as macromolecular (DNA, RNA, protein and lipid) metabolism and cellular biosynthesis (transcription initiation and/or translation and ribosome assembly) (Fig. 2d and Supplementary Table 5). In contrast, nonessential genes were significantly (P < 10^-2) enriched for regulatory functions (control of gene expression and cell communication) (Fig. 2d and Supplementary Table 6). Nonessential genes were also enriched for conditional or life-cycle specific processes, such as stress response, transmembrane transport and meiosis or sexual reproduction, together with processes that are less likely to be essential in the rich medium and mitotic growth used in our assay conditions. Genes of unknown function were also highly enriched (93%) in the nonessential genes. We predict that many of these genes are involved in biological regulation or condition-specific processes and are not directly involved in primary processes.
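The single-copy comparison at the start of this paragraph amounts to a 2x2 table of essentiality against copy number. As a rough illustration, the sketch below rebuilds that table from the quoted counts and computes an odds ratio. The odds ratio is not reported in the paper and is added here only as one possible way to summarize the association; all names are illustrative.

```python
# 2x2 table from the counts in the text; names are illustrative assumptions.
essential_single, essential_total = 1173, 1260
nonessential_single, nonessential_total = 2643, 3576

# Genes that are not single copy (duplicated or otherwise not counted as single copy).
essential_other = essential_total - essential_single
nonessential_other = nonessential_total - nonessential_single

# Percentage of each class present in single copy.
essential_single_pct = 100 * essential_single / essential_total
nonessential_single_pct = 100 * nonessential_single / nonessential_total

# Odds ratio: odds of being single copy among essential genes
# divided by the same odds among nonessential genes.
odds_ratio = (essential_single * nonessential_other) / (
    essential_other * nonessential_single
)

print(f"single copy, essential: {essential_single_pct:.1f}%")
print(f"single copy, nonessential: {nonessential_single_pct:.1f}%")
print(f"odds ratio: {odds_ratio:.2f}")
```

An odds ratio above 1 is consistent with the statement that essential genes are more likely to be single copy; a formal significance test would need a dedicated statistical routine and is left out of this sketch.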

[Figure 1 panels. (a) Schematic of the deletion cassette: KanMX4 flanked by UPTAG and DNTAG bar codes and by RHG regions, replacing the ORF in chromosomal DNA. (b) Construction method: serial PCR (1,515), block PCR (3,058), gene synthesis (263), not made (78). (c) Dispensability: nonessential, this study (2,729); essential, this study (897); essential, known (363); nonessential, known (847); ND (78).]

Figure 1 Deletion construction and gene dispensability. (a) Gene deletion cassette containing the KanMX4 gene flanked by unique bar codes (UPTAG/DNTAG) and regions of homology to the gene of interest (RHG). The cassette replaced the ORF of interest by homologous recombination at the RHG regions. (b) Construction of deletion mutants. All 4,836 protein coding genes were deleted using serial extension PCR (31.3%), block PCR (63.2%) or total gene synthesis (5.4%). The remaining 78 genes could not be confirmed as deleted owing to ambiguous sequencing results, recombination failure or inviability of the heterozygous diploids. (c) Dispensability of 4,836 protein coding genes. For 3,626 (2,729 + 897) genes the dispensability was previously unknown. ND, not done.
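The counts in the Figure 1 panels and the percentages in the caption can be cross-checked against each other. The following sketch does that bookkeeping; the dictionary keys and variable names are illustrative, and only the counts come from the figure and caption.

```python
# Panel b: number of deletions made by each construction method.
methods = {
    "serial extension PCR": 1515,
    "block PCR": 3058,
    "total gene synthesis": 263,
}
not_made = 78

deleted = sum(methods.values())   # genes successfully deleted
annotated = deleted + not_made    # all annotated protein coding ORFs
coverage_pct = 100 * deleted / annotated

# Share of the deleted set contributed by each method.
method_pct = {name: 100 * count / deleted for name, count in methods.items()}

# Panel c: dispensability, split by whether it was already known.
dispensability = {
    ("nonessential", "this study"): 2729,
    ("essential", "this study"): 897,
    ("essential", "known"): 363,
    ("nonessential", "known"): 847,
}
newly_determined = sum(
    count for (_, source), count in dispensability.items() if source == "this study"
)
previously_known = sum(
    count for (_, source), count in dispensability.items() if source == "known"
)

for name, pct in method_pct.items():
    print(f"{name}: {pct:.1f}%")
print(f"coverage: {coverage_pct:.1f}% of {annotated} ORFs")
print(f"newly determined: {newly_determined}, previously known: {previously_known}")
```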

Species distribution of essential genes

The dispensability profiles for the 4,836-deletion gene set were classified by their gene copy numbers according to their relationship with budding yeast genes (Supplementary Table 4 and x axis in Fig. 3) and into five categories by their species distribution (Supplementary Table 4 and y axis in Fig. 3). In a comparison of the entire deletion gene set (4,836) there are 2,841 single-copy genes (n = 1, m ≥ 1) (n and m are gene copy number in fission yeast and budding yeast, respectively), 855 duplicated genes in fission yeast that are conserved in budding yeast (n > 1, m ≥ 1) and 1,140 genes found in fission yeast but not conserved in budding yeast (n ≥ 1, m = 0). The 1,260 essential genes were distributed across species as follows: (i) 883 genes conserved only in eukaryotes including humans, (ii) 207 conserved in both bacteria and eukaryotes, including humans, (iii) 91 genes found only in fungi, (iv) 39 genes found with a variable distribution throughout the phyla and (v) 40 fission yeast–specific genes. Essential genes were more likely than nonessential genes to be single copy and to be conserved broadly across species. Of the 1,260 essential fission yeast genes, 1,173 were single copy and only 87 have duplicates (Supplementary Tables 1 and 4). From the total of 974 (883 + 91) essential genes found only

[Figure 2 panels: (a) chromosome maps for Chr 1 (5.6 M, Cen1), Chr 2 (4.5 M, Cen2) and Chr 3 (2.5 M, Cen3); (b) essential genes (%) versus number of introns; (c) essential genes (%) versus cellular localization; (d) number of genes per biological process. Enriched nonessential: unknown, meiotic cell cycle, response to stress, transmembrane transport, cell communication, regulation of gene expression. Enriched essential: cellular component organization, cellular macromolecule biosynthesis, nucleotide/nucleic acid metabolism, RNA processing, ribosome biogenesis, protein localization, translation, mitotic cell cycle, DNA replication, general transcription.]

Figure 2 Analysis of gene dispensability. (a) Chromosome distribution of gene dispensability. Essential genes (tall bars) and nonessential genes (short bars) are distributed randomly throughout the genome except within 100 kb of the telomeres (gray boxes), where nonessential genes are enriched. Upper bars represent genes transcribed left to right and lower bars represent genes transcribed right to left. Filled circles in orange represent centromeres. (b) Percentage of essential genes versus number of introns. Percentage of essential genes was plotted against the number of introns within genes. In fission yeast, the percentage of essential genes containing introns is significantly (P < 10^−14) higher than the percentage of those lacking introns. The dotted line represents the average percentage of essential genes in the total gene set (26.1%). (c) Percentage of essential genes versus ORFeome localization. The percentage of essential genes was plotted against ten different cellular locations in fission yeast. The dotted line represents the average percentage of essential genes for the total gene set (26.1%). The number of essential gene products localized to the nucleolus, spindle pole body and nuclear envelope is higher than average. The number of essential genes compared to the total for each location is: (i) cytoplasm 564/2,113; (ii) nucleus 601/2,068; (iii) mitochondrion 128/450; (iv) ER 98/436; (v) cell periphery 55/326; (vi) nucleolus 89/217; (vii) Golgi 27/224; (viii) spindle pole body 69/181; (ix) nuclear envelope 29/76; and (x) microtubule 20/71. (d) Comparison of GO analyses of fission yeast and budding yeast genes. Bar chart shows a selection of broad, biologically informative GO terms significantly (P ≤ 0.01) enriched for essential and nonessential genes in fission yeast and budding yeast. For the complete list of processes and for methods used to extract these data, see Supplementary Tables 5 and 6.

[Figure 3: essential and nonessential genes plotted by gene copy number (x axis: n = 1, m ≥ 1; n > 1, m ≥ 1; n ≥ 1, m = 0) and species distribution (y axis: eukaryotes; eukaryotes + bacteria; fungi; variable phyla; fission yeast specific).]

Figure 3 Comparative analysis of gene dispensability profiles of fission yeast. Gene dispensability profiles of 4,836 deletion mutants by gene copy number of fission yeast orthologs compared to budding yeast (x axis) and species distribution (y axis). Compared to budding yeast, fission yeast genes consist of 2,841 single-copy genes (n = 1, m ≥ 1), 855 duplicated genes (n > 1, m ≥ 1) and 1,140 genes found in fission yeast but not in budding yeast (n ≥ 1, m = 0), where ‘n’ is the number of genes in fission yeast and ‘m’ is the number of genes in budding yeast. The term ‘eukaryotes’ includes human and the term ‘variable phyla’ includes plants. The area of each circle represents the numbers of genes, where essential and nonessential genes are represented by yellow and blue, respectively.


in eukaryotes, 59 are probably related to genes found in Archaea (Supplementary Table 7). The remaining 915 genes (72.6% of all essential genes) are likely to have arisen within the eukaryotic lineage. This implies that many essential novel gene functions arose with the evolution of the eukaryotic cell. The fidelity of cell division in ancestral unicellular eukaryotes may have been very low, which could be tolerated in evolutionary terms as long as there was overall population growth. However, a multicellular eukaryote requires greater fidelity at each cell division than a unicellular eukaryote, because even moderate levels of random cell death would lead to poor survival of a multicellular organism. It has been estimated that it took around 500 million years for multicellular organisms to arise from an ancestral unicellular eukaryote (ref. 19), and we propose that during this period there was considerable genomic innovation to generate a unicellular eukaryote with sufficient fidelity at cell division to allow the evolution of multicellularity. Essential genes broadly conserved both in bacteria and eukaryotes were significantly (207 genes, P < 10^−2) enriched for respiratory function and primary metabolism of low molecular weight molecules, such as nucleotide or glucose metabolism (Supplementary Table 8). Of 445 fission yeast–specific genes, only 40 were essential for viability (Supplementary Table 9). Some of these genes are implicated in aspects of mitotic and meiotic chromosome segregation (ref. 10) and such species-specific genes may have played a role in speciation by reinforcing reproductive isolation (ref. 20). As the majority of essential genes are broadly conserved, it is possible that distant orthologs exist in other eukaryotes, including budding yeast, if some of these apparently species-specific genes are rapidly diverging. To investigate this possibility we re-interrogated the nonconserved essential genes from both yeasts using the same criteria used to build the manual ortholog data set, but relaxing thresholds for candidates to generate seed alignments and building alignments starting from the budding yeast genes rather than the fission yeast genes. This revealed a further four potential orthologs (Supplementary Table 10). This indicates that more in-depth comparisons of the essential nonconserved gene sets may reveal further distant evolutionary relationships and functions.
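The species-distribution tallies quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the reported counts; the dictionary keys and variable names are illustrative and not part of the original analysis.

```python
# Species distribution of the 1,260 essential genes, as reported in the text.
essential_by_distribution = {
    "eukaryotes only": 883,
    "bacteria and eukaryotes": 207,
    "fungi only": 91,
    "variable phyla": 39,
    "fission yeast specific": 40,
}
total_essential = sum(essential_by_distribution.values())

# Essential genes found only in eukaryotes (this includes the fungi-only class).
eukaryote_only = (essential_by_distribution["eukaryotes only"]
                  + essential_by_distribution["fungi only"])

# Genes reported as probably related to archaeal genes.
archaea_related = 59

# Remaining genes, interpreted in the text as having arisen within eukaryotes.
eukaryotic_origin = eukaryote_only - archaea_related
fraction = eukaryotic_origin / total_essential

print(total_essential, eukaryote_only, eukaryotic_origin, f"{fraction:.1%}")
# prints: 1260 974 915 72.6%
```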

Dispensability comparison of orthologous pairs from the two yeasts

Access to deletion collections for both fission yeast and budding yeast allows a robust comparative analysis of dispensability between two evolutionarily distant eukaryotic organisms. To eliminate any complications due to functionally redundant paralogous genes, 2,438 single-copy orthologous pairs (one to ones) for which deletion data are available in both organisms were used for this analysis (Supplementary Table 11). Overall 83% of these genes (2,027/2,438) had the same dispensability in both yeasts (Fig. 4a), suggesting that conserved orthologs in other organisms may also have conserved dispensability. GO enrichment of the conserved one-to-one essential genes in fission and budding yeasts was similar to that of all essential genes (compare Supplementary Table 12 with Supplementary Table 5), whereas the nonessential one-to-one pairs (compare Supplementary Table 13 with Supplementary Table 6) were enriched for additional GO terms, such as DNA damage, Golgi and/or endoplasmic reticulum (ER)-related processes and catabolic processes. As conserved genes can be expected to be under positive selection, these single-copy nonessential genes are likely to contribute to overall cell fitness. For example, the inability to repair nonlethal DNA damage will reduce cell fitness. It is also likely that some processes still take place in the absence of certain components, albeit less efficiently, because of flexibility and plasticity in the processes concerned (ref. 21). The Golgi/ER-related processes may be complemented by different but related membrane trafficking pathways or components substituting one for the other.

The remaining 17% of orthologous pairs (411/2,438) differ in essentiality between the two yeasts; of these, 268 are essential only in fission yeast and 143 are essential only in budding yeast (Fig. 4a). Therefore, there are 125 extra essential genes (268–143) in fission yeast

[Figure 4 panels: (a) dispensability of orthologous pairs (fission yeast:budding yeast): E:E 715 (29%), NE:NE 1,312 (54%), E:NE 268 (11%), NE:E 143 (6%). (b) Number of genes with different dispensability in fission yeast and budding yeast, by biological process: actin cytoskeleton related; tubulin-specific chaperone; purine/pyrimidine metabolism; heme metabolism; other amino acid metabolism; ergosterol metabolism; tryptophan metabolism; Met/Thr/Glu metabolism; proteolysis (peptidases); neddylation associated; SUMOylation associated; proteasome/ubiquitin associated; V-type ATPase; glycosylation/ER associated; mitotic/SIN signaling; DNA replication checkpoint; DNA recombination/repair; spindle/kinetochore associated; other processes; iron-sulphur cluster assembly; other mitochondrial function; mitochondrial translation.]

Figure 4 Dispensability comparison of orthologous pairs from the two yeasts. (a) The essentiality of 2,438 nonredundant orthologous pairs was compared between the two yeasts. Eighty-three percent of orthologs show conserved dispensability and the remaining 17% show different dispensability. E, essential; NE, nonessential. (b) Functional distribution of orthologs with different dispensability. The 17% of the orthologous pairs with different dispensability were allocated to one of 31 biological terms, 22 of which are shown here. For the complete list of processes and genes, see Supplementary Table 14. Note that genes annotated to mitochondrial functions, certain amino acid metabolic pathways and protein degradation pathways such as neddylation and sumoylation are mostly essential in one yeast and nonessential in the other yeast, whereas other categories show essential genes (although the specific genes are different) in both yeasts under the conditions used in this study. Because there are some differences in the constituents of the standard rich media used for each organism, it is possible that in a few cases the different dispensability between the two organisms is due to these differences.


compared to budding yeast in this category, making the difference in dispensability of one-to-one orthologs a major reason why overall there are 227 more essential genes in fission yeast than in budding yeast. To analyze these differences further we identified a set of broad biological processes that encompass the entire set and sorted each gene pair into the most biologically relevant group (Fig. 4b and Supplementary Table 14). The most striking difference was for mitochondrial function (95 orthologous pairs). Of these, 89 genes were essential in fission yeast and only six in budding yeast. Many of these genes encode components of the mitochondrial translation machinery (69 genes), which is required for mitochondrial DNA (mtDNA) stability. Loss of mtDNA is lethal in fission yeast but not in budding yeast, where ‘petite’ mutants lacking mtDNA are viable and mitochondrial translation is not essential (ref. 22). Conversely, the DNA replication checkpoint genes rad3, rad26 and cds1 are nonessential during normal growth, whereas their respective budding yeast orthologs are essential because of the requirement for degradation of a ribonucleotide reductase inhibitor (ref. 23). This inhibitor can be degraded by a second checkpoint-independent pathway in fission yeast (refs. 24,25) and other eukaryotes but not in budding yeast. Other examples of differential essentiality include the biological processes relating to RNA processing and/or export pathways, Golgi/ER transport, spindle/kinetochore/centromere, transcription and/or other chromatin-associated processes, and glycosylation and/or other ER-associated processes. These differences may reflect dissimilarities in the numbers of introns (ref. 7), centromere structure (ref. 7), the organization of the Golgi network (refs. 26,27) and membrane trafficking. Although 83% of the orthologous pairs have conserved dispensability, different essentiality of specific biological processes and defined complexes in 17% of gene pairs may represent life-style differences between these distantly related yeasts.
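As a worked check of the ortholog comparison, the four pair counts reported for Figure 4a can be tallied as follows. This is a minimal sketch using only numbers stated in the text; the tuple keys (fission yeast state, budding yeast state) and variable names are an illustrative convention, not the authors' data format.

```python
# Single-copy orthologous pairs keyed as (fission yeast, budding yeast);
# E = essential, NE = nonessential. Counts are those reported for Fig. 4a.
pairs = {
    ("E", "E"): 715,
    ("NE", "NE"): 1312,
    ("E", "NE"): 268,
    ("NE", "E"): 143,
}

total = sum(pairs.values())
conserved = sum(n for (sp, sc), n in pairs.items() if sp == sc)
different = total - conserved

# Net excess of essential genes in fission yeast among one-to-one orthologs.
extra_in_fission = pairs[("E", "NE")] - pairs[("NE", "E")]

print(total, conserved, different, extra_in_fission)
# prints: 2438 2027 411 125
print(f"{conserved / total:.0%} conserved, {different / total:.0%} different")
# prints: 83% conserved, 17% different
```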

Growth profiling of diploids

All fission yeast deletion mutants constructed in this study have been bar-coded (Supplementary Table 1), enabling the strains to be examined as an entire set in pooled experiments. Parallel analysis for changes in the growth rate of heterozygous deletion diploid strains has been used in budding yeast to identify potentially rate-limiting steps for cellular growth (refs. 2,28,29). Using a similar methodology (ref. 30) (Online Methods and Supplementary Figs. 7–9), we examined the growth rates in yeast extract medium for 4,334 fission yeast heterozygous deletion diploids (Supplementary Table 15; for the microarray raw data see Supplementary Data 4 and 5) and we further examined the growth rate of the ten slowest haploinsufficient mutants as a proof-of-principle experiment (Supplementary Fig. 10). The growth rates of these ten mutants were found to be comparable to the relative fitness results from the microarray parallel analysis.

Comparisons were also made for the haploinsufficient (slower growth) and haploproficient (faster growth) genes in fission yeast and budding yeast (Fig. 5). There were considerably more haploinsufficient genes in fission yeast than in budding yeast (455 versus 356) when using a growth rate cut-off of <0.97 (Fig. 5 and Supplementary Table 16), whereas there were a similar number of haploproficient genes in both yeasts. The budding yeast life cycle is predominantly diploid and so reduced expression of potentially haploinsufficient genes in diploid cells is likely to have been subject to strong negative selection; this would not be the case for the predominantly haploid fission yeast.
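The cut-off-based classification can be illustrated with a short sketch. It assumes relative growth rates are available as a mapping from gene name to a value normalized to wild type (1.00). The function names, the gene names and the example values are hypothetical, and the way ties or rounding were handled in the original analysis is not stated in the text.

```python
def haploinsufficient(rates, cutoff=0.97):
    """Return genes whose relative growth rate is below the cut-off."""
    return sorted(gene for gene, rate in rates.items() if rate < cutoff)


def slowest(rates, fraction=0.03):
    """Return the slowest-growing fraction of genes (at least one gene)."""
    n = max(1, round(len(rates) * fraction))
    return sorted(rates, key=rates.get)[:n]


# Hypothetical relative growth rates (wild type = 1.00); not real data.
example_rates = {
    "geneA": 0.89,
    "geneB": 0.96,
    "geneC": 0.97,
    "geneD": 1.01,
    "geneE": 1.04,
}

print(haploinsufficient(example_rates))      # ['geneA', 'geneB']
print(slowest(example_rates, fraction=0.2))  # ['geneA']
```

A strict inequality is used so that a strain growing at exactly 0.97 is not counted, matching the "<0.97" wording.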

To make a more direct comparison between the fission and budding yeasts, we compared the fastest 3% of haploproficient genes (136 versus 183) and the slowest 3% of haploinsufficient genes (138 versus 184) from each organism (Table 1 and Supplementary Table 17). In fission yeast the haploproficient gene set showed GO enrichment for macromolecule biosynthesis (P < 2.1 × 10^−19), particularly ribosomal proteins (Table 1a and Supplementary Table 18). The TOR pathway genes (tor2, tsc1 and mip1) and genes encoding Rab-GTPase activating proteins were also found to be enriched in the haploproficient gene set. The loss of heterozygosity in TSC1 (ref. 31), RAB-GTPases (ref. 32) and also certain ribosomal proteins (ref. 33) has been implicated in certain human cancers. None of these haploproficient genes showed any enrichment in the budding yeast haploproficient gene set. If fission yeast evolved in a nutrition-poor niche, then these pathways may have evolved to fine-tune optimal growth in these conditions, which may result in a sub-maximal growth rate in rich media.

Haploinsufficient genes from budding yeast showed a significant GO enrichment for ribosomal-related function (ref. 29), whereas those from fission yeast did not (Supplementary Table 18). We reasoned that any genes common to both haploinsufficient gene sets are likely to be important for regulating growth in both yeasts. A comparison of these gene sets in the two yeasts (138 versus 184) revealed 14 common orthologous groups and 15 genes (Table 1b). These included three genes encoding small subunit ribosomal proteins (S3, S6 and S7), five genes encoding large subunit ribosomal proteins (L6, redundant L13, L35 and L39) and another five genes involved in transcriptional functions including a predicted transcription factor TFIID complex subunit A/SAGA complex subunit (taf12), DNA-directed RNA polymerase II–specific subunits (rpb3 and rpb7) and DNA-directed RNA polymerase subunits (rpb6 and rpc10), which are common to DNA-directed RNA polymerases I, II and III. Because the haploinsufficiency of these genes has been conserved between two distantly related organisms, it is likely that the amount of protein encoded by them is particularly important for the growth rate of the cell. It is therefore possible that their dosage is also important for the regulation of growth in other eukaryotes.
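One way to see why the number of common orthologous groups (14) can be smaller than the number of fission yeast genes (15) is to intersect the two gene sets through an ortholog table. The sketch below assumes a group counts as "common" when at least one member is haploinsufficient in each yeast; that rule, the function name and the example haploinsufficient sets are illustrative assumptions. The systematic IDs are taken from Table 1b.

```python
def common_groups(groups, hi_sc, hi_sp):
    """Return ortholog groups with a haploinsufficient member in both yeasts."""
    return [(sc, sp) for sc, sp in groups if sc & hi_sc and sp & hi_sp]


# (budding yeast IDs, fission yeast IDs) for three groups from Table 1b.
groups = [
    ({"YNL178W"}, {"SPBC16G5.14c"}),                         # S3, 1:1
    ({"YDL082W", "YMR142C"}, {"SPAC664.05", "SPBC839.13c"}),  # L13, 2:2
    ({"YPL235W"}, {"SPBC83.08"}),                             # Rvb2, 1:1
]

# Hypothetical haploinsufficient sets, for illustration only.
hi_sc = {"YNL178W", "YDL082W"}
hi_sp = {"SPBC16G5.14c", "SPAC664.05", "SPBC839.13c"}

shared = common_groups(groups, hi_sc, hi_sp)
n_groups = len(shared)
n_sp_genes = sum(len(sp & hi_sp) for _, sp in shared)

print(n_groups, n_sp_genes)  # 2 3
```

In this toy input the duplicated L13 group contributes one group but two fission yeast genes, which mirrors the 14-group, 15-gene result reported above.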

[Figure 5: number of genes versus relative growth rate in rich media for budding yeast and fission yeast, with an expanded view of the 0.88–0.97 region.]

Figure 5 A comparison of the relative growth rates for the total set of heterozygous deletion diploids in fission yeast (4,334 genes) and budding yeast (5,921 genes). In fission yeast there are more haploinsufficient genes with a relative growth rate of <0.97 compared to budding yeast (455 versus 356), as shown in the expanded region 0.88–0.97 (Supplementary Table 16).


DISCUSSION

Fission yeast is an important model eukaryotic organism and the availability of a genome-wide deletion collection will facilitate further studies such as genetic interaction assays, phenotypic analysis (ref. 4), comparative genomics, gene dispensability analysis of higher eukaryotes and drug-induced haploinsufficiency screening (ref. 34). For example, a partial collection of the viable haploid deletions has been distributed to ~25 laboratories and studies from two laboratories have shown that there is considerable conservation of synthetic lethal genetic interactions with budding yeast as well as rewiring of some functionally conserved modules (refs. 35,36).

Our comparisons of orthologous gene pairs between budding and fission yeast showed that 83% had the same dispensability despite being distantly related. This high level of conservation in dispensability will be helpful for the interpretation of more complex RNAi data from other organisms (refs. 6,37–39). We have also shown that there is a relationship between gene essentiality and the presence of introns, which may indicate that essential genes are less likely to be rapidly regulated (ref. 16). There are orthologs for 3,492 fission yeast genes in other eukaryotes, including humans. Of these genes, 454 are not conserved in budding yeast, suggesting that fission yeast may be a valuable alternative organism to budding yeast for certain experiments, for example, optimization of drug screening protocols. However, there are ~3,038 genes conserved in both yeasts and other eukaryotes including humans, which encourages us in the view that conclusions drawn from analyses in the two yeasts concerning molecular and cell biology will be relevant to, and improve our understanding of, metazoan cells.

We have also identified a small set of genes required for translation and transcription, including genes encoding specific ribosomal proteins and RNA polymerase subunits that are haploinsufficient for growth in both the yeasts. These specific gene products may play a critical role in regulating the growth of eukaryotic cells. The identification of genes encoding elements of the TOR pathway, Rab-GTPase activating proteins and ribosomal proteins as haploproficient is also of interest given the involvement of these gene products in cancer (refs. 31–33). The availability of a near-complete, genome-wide deletion collection for fission yeast provides a useful tool for the functional studies of eukaryotic molecular and cell biology and for biotechnological applications.

METHODS

Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturebiotechnology/.

Note: Supplementary information is available on the Nature Biotechnology website.

ACKNOWLEDGMENTS

We thank members of our laboratories for their participation in the construction and analysis of the deletion mutants, particularly H.-R. Hwang, H.-S. Ahn,

Table 1 Haploinsufficient and haploproficient genes in the two yeasts

(a)

GO term: Translation & ribosome biogenesis (GO:0006412)
  Haploproficient (HP) genes, 60S (28 genes): rpl301, rpl501, rpl702, rpl801, rpl901, rpl902, rpl1001, rpl1101, rpl1701, rpl1801, rpl2001, rpl2002, rpl1901, rpl2101, rpl2102, rpl2301, rpl2502, rpl2802, rpl3001, rpl3201, rpl3202, rpl3401, rpl3601, rpl3602, rpl3702, rpl4301, rpl3801, rpp201
  Haploproficient (HP) genes, 40S (26 genes): rps001, rps002, rps401, rps402, rps403, rps502, rps801, rps802, rps901, rps1001, rps1002, rps1101, rps1102, rps1201, rps13, rps1501, rps1502, rps1602, rps1701, rps1702, rps1801, rps1902, rps23, rps2302, rps2402, rps2802
  HP gene annotation: 54; total gene annotation: 316; P-value (uncorrected): 2.10 × 10^−19

GO term: Regulation of Rab GTPase activity (GO:0032313)
  Haploproficient (HP) genes: gyp1, gyp7, gyp51, SPAC1952.17c
  HP gene annotation: 4; total gene annotation: 13; P-value (uncorrected): 5.30 × 10^−4

GO term: TOR signaling pathway (GO:0031929)
  Haploproficient (HP) genes: tor2, tsc1, mip1, tco89, gad8
  HP gene annotation: 5; total gene annotation: 14; P-value (uncorrected): 8.55 × 10^−3

Of the genes deleted in the 136 haploproficient mutants with the fastest growth rates, 54 genes (39.7%) encode ribosomal subunit proteins and nine genes encode Rab GAP and TOR pathway-related proteins. For GO enrichment of the haploproficient genes in fission yeast, see Supplementary Tables 17 and 18.

(b)

Gene product              Gene category          Budding yeast ID    Fission yeast ID          Sc:Sp
Ribosomal protein S3      Ribosomal subunit      YNL178W             SPBC16G5.14c              1:1
Ribosomal protein S6      Ribosomal subunit      YPL090C             SPAPB1E7.12               1:1
Ribosomal protein S7      Ribosomal subunit      YNL096C|YOR096W     SPAC18G6.14c              2:1
Ribosomal protein L6      Ribosomal subunit      YLR448W|YML073C     SPCC622.18                2:1
Ribosomal protein L13     Ribosomal subunit      YDL082W|YMR142C     SPAC664.05|SPBC839.13c    2:2
Ribosomal protein L35     Ribosomal subunit      YDL136W|YDL191W     SPCC613.05c               2:1
Ribosomal protein L39     Ribosomal subunit      YJL189W             SPCC663.04                1:1
TFIID subunit A (Taf12)   Transcription          YDR145W             SPAC15A10.02              1:1
RNA pol II Rpb3           Transcription          YIL021W             SPCC1442.10c              1:1
RNA pol Rpb6              Transcription          YPR187W             SPCC1020.04c              1:1
RNA pol II Rpb7           Transcription          YDR404C             SPACUNK4.06c              1:1
RNA pol Rpc10             Transcription          YHR143W-A           SPBC19C2.03               1:1
U3 snoRNP Utp4            RNA processing         YDR324C             SPBC19F5.02c              1:1
ATPase Rvb2               Chromatin remodeling   YPL235W             SPBC83.08                 1:1

Genes common to the haploinsufficient gene sets of both fission yeast and budding yeast (Supplementary Tables 17 and 18). Of these 15 genes, 13 (86.7%) are involved in transcription or translation.


Y.-D. Kim, S. Park, H.-J. Lee, J.-H. Ahn, Y.-S. Kil, S.-Y. Park, J.-H. Lim, J.-H. Song, Y.-K. Ryoo, J.-Y. Kim, M.-J. Oh, S. Kong, J. Ahn, N. Sun, N. Peat, R. Mandeville and J.-J. Li. We also thank J.-H. Roe and W.-K. Huh for reading this manuscript and for their insightful comments and O. Nielsen for his patience with the many requests for pON177. This work was supported by the intramural research program of KRIBB (Mission 2007), the Chemical Genomics Research Program and the 21st Century Frontier Research Program from the Ministry of Education, Science and Technology (MOEST) of Korea. This work was also supported by Bioneer Corp., The Wellcome Trust, Cancer Research UK, The Breast Cancer Research Foundation (BCRF) and The Rockefeller University.

AUTHOR CONTRIBUTIONS

D.-U.K., J.H., H.-O.P., M.W., H.-S.Y., P.N. and K.-L.H. conceived the project; D.-U.K., J.H., D.K., V.W., M.W., T.D., M.N., G.P., S.H., L.J., S.-T.B., H.L., Y.S.S., M.L., L.K., K.-S.H., E.J.N., A.-R.L., Y.-J.J., K.-S.C., S.-J.C., J.-Y.P., Y.P., H.M.K., S.-K.P., H.B.K., H.-S.K., H.-M.P., K.K., K.S. and K.B.S. performed experiments and data analysis; D.K., H.-J.P., E.-J.K. and H.-M.P. performed primer design; D.K. and V.W. performed bioinformatics; D.-U.K., J.H., D.K., V.W., P.N. and K.-L.H. wrote the paper.

COMPETING FINANCIAL INTERESTS

The authors declare no competing financial interests.

Published online at http://www.nature.com/naturebiotechnology/. Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.

1. Jorgensen, P. et al. High-resolution genetic mapping with ordered arrays of Saccharomyces cerevisiae deletion mutants. Genetics 162, 1091–1099 (2002).

2. Hillenmeyer, M.E. et al. The chemical genomic portrait of yeast: uncovering a phenotype for all genes. Science 320, 362–365 (2008).

3. Giaever, G. et al. Functional profiling of the Saccharomyces cerevisiae genome. Nature 418, 387–391 (2002).

4. Winzeler, E.A. et al. Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis. Science 285, 901–906 (1999).

5. Entian, K.D. & Kotter, P. Methods in Microbiology 36, edn. II. 629–666 (Elsevier, 2007).

6. Kittler, R. et al. Genome-scale RNAi profiling of cell division in human tissue culture cells. Nat. Cell Biol. 9, 1401–1412 (2007).

7. Wood, V. et al. The genome sequence of Schizosaccharomyces pombe. Nature 415, 871–880 (2002).

8. Fisk, D.G. et al. Saccharomyces cerevisiae S288C genome annotation: a working hypothesis. Yeast 23, 857–865 (2006).

9. Wach, A., Brachat, A., Pohlmann, R. & Philippsen, P. New heterologous modules for classical or PCR-based gene disruptions in Saccharomyces cerevisiae. Yeast 10, 1793–1808 (1994).

10. Gregan, J. et al. Novel genes required for meiotic chromosome segregation are identified by a high-throughput knockout screen in fission yeast. Curr. Biol. 15, 1663–1669 (2005).

11. Martin-Castellanos, C. et al. A large-scale screen in S. pombe identifies seven novel genes required for critical meiotic events. Curr. Biol. 15, 2056–2062 (2005).

12. Decottignies, A., Sanchez-Perez, I. & Nurse, P. Schizosaccharomyces pombe essential genes: a pilot study. Genome Res. 13, 399–406 (2003).

13. Smith, H.O., Hutchison, C.A. III, Pfannkoch, C. & Venter, J.C. Generating a synthetic genome by whole genome assembly: phiX174 bacteriophage from synthetic oligonucleotides. Proc. Natl. Acad. Sci. USA 100, 15440–15445 (2003).

14. Sipiczki, M. Where does fission yeast sit on the tree of life? Genome Biol. 1, reviews 1011.1–1011.4 (2000).

15. Wood, V. Schizosaccharomyces pombe comparative genomics; from sequence to systems, in Comparative Genomics: Using Fungi as Models (eds. Sunnerhagen, P. & Piskur, J.), 233–285 (Springer Berlin, Heidelberg, 2006).

16. Jeffares, D.C., Penkett, C.J. & Bahler, J. Rapidly regulated genes are intron poor. Trends Genet. 24, 375–378 (2008).

17. Matsuyama, A. et al. ORFeome cloning and global analysis of protein localization in the fission yeast Schizosaccharomyces pombe. Nat. Biotechnol. 24, 841–847 (2006).

18. Huh, W.K. et al. Global analysis of protein localization in budding yeast. Nature 425, 686–691 (2003).

19. Benton, M.J. & Ayala, F.J. Dating the tree of life. Science 300, 1698–1700 (2003).

20. Hoskin, C.J., Higgie, M., McDonald, K.R. & Moritz, C. Reinforcement drives rapid allopatric speciation. Nature 437, 1353–1356 (2005).

21. Harrison, R., Papp, B., Pal, C., Oliver, S.G. & Delneri, D. Plasticity of genetic interactions in metabolic networks of yeast. Proc. Natl. Acad. Sci. USA 104, 2307–2312 (2007).

22. Chiron, S., Suleau, A. & Bonnefoy, N. Mitochondrial translation: elongation factor tu is essential in fission yeast and depends on an exchange factor conserved in humans but not in budding yeast. Genetics 169, 1891–1901 (2005).

23. Choi, D.H., Oh, Y.M., Kwon, S.H. & Bae, S.H. The mutation of a novel Saccharomyces cerevisiae SRL4 gene rescues the lethality of rad53 and lcd1 mutations by modulating dNTP levels. J. Microbiol. 46, 75–80 (2008).

24. Ralph, E., Boye, E. & Kearsey, S.E. DNA damage induces Cdt1 proteolysis in fission yeast through a pathway dependent on Cdt2 and Ddb1. EMBO Rep. 7, 1134–1139 (2006).

25. Liu, C. et al. Cop9/signalosome subunits and Pcu4 regulate ribonucleotide reductase by both checkpoint-dependent and -independent mechanisms. Genes Dev. 17, 1130–1140 (2003).

26. Preuss, D., Mulholland, J., Franzusoff, A., Segev, N. & Botstein, D. Characterization of the Saccharomyces Golgi complex through the cell cycle by immunoelectron microscopy. Mol. Biol. Cell 3, 789–803 (1992).

27. Ayscough, K., Hajibagheri, N.M., Watson, R. & Warren, G. Stacking of Golgi cisternae in Schizosaccharomyces pombe requires intact microtubules. J. Cell Sci. 106, 1227–1237 (1993).

28. Roemer, T. et al. Large-scale essential gene identification in Candida albicans and applications to antifungal drug discovery. Mol. Microbiol. 50, 167–181 (2003).

29. Deutschbauer, A.M. et al. Mechanisms of haploinsufficiency revealed by genome-wide profiling in yeast. Genetics 169, 1915–1925 (2005).

30. Pierce, S.E. et al. A unique and universal molecular barcode array. Nat. Methods 3, 601–603 (2006).

31. Jozwiak, J., Jozwiak, S. & Wlodarski, P. Possible mechanisms of disease development in tuberous sclerosis. Lancet Oncol. 9, 73–79 (2008).

32. Cheng, K.W., Lahad, J.P., Gray, J.W. & Mills, G.B. Emerging role of RAB GTPases in cancer and human disease. Cancer Res. 65, 2516–2519 (2005).

33. McGowan, K.A. et al. Ribosomal mutations cause p53-mediated dark skin and pleiotropic effects. Nat. Genet. 40, 963–970 (2008).

34. Lum, P.Y. et al. Discovering modes of action for therapeutic compounds using a genome-wide screen of yeast heterozygotes. Cell 116, 121–137 (2004).

35. Roguev, A. et al. Conservation and rewiring of functional modules revealed by an epistasis map in fission yeast. Science 322, 405–410 (2008).

36. Dixon, S.J. et al. Significant conservation of synthetic lethal genetic interaction networks between distantly related eukaryotes. Proc. Natl. Acad. Sci. USA 105, 16653–16658 (2008).

37. Kamath, R.S. et al. Systematic functional analysis of the Caenorhabditis elegans genome using RNAi. Nature 421, 231–237 (2003).

38. Dietzl, G. et al. A genome-wide transgenic RNAi library for conditional gene inactivation in Drosophila. Nature 448, 151–156 (2007).

39. Ravi, D. et al. A network of conserved damage survival pathways revealed by a genomic RNAi screen. PLoS Genet. 5, e1000527 (2009).


NATURE BIOTECHNOLOGY doi:10.1038/nbt.1628

currently being remade (Supplementary Table 1, column U for the list of heterozygous diploid strains that still contain the ts mutation).

Redundancy and essentiality. To assess the effect of redundancy on masking essentiality and its contribution to the extra essential genes in fission yeast, we identified all genes in the one|many, many|one and many|many categories where data were available for both organisms (Supplementary Table 1). We eliminated all orthologous groups with an equal number of essential genes in each organism (e.g., ev|ev) and those where redundancy could not contribute to the difference in essentiality (e.g., vv|vv). The remaining essential genes where redundancy could mask essentiality in one or the other organism were counted for both yeasts. There were 67 essential genes in fission yeast and 35 essential genes in budding yeast where redundancy in the other yeast could potentially be masking essentiality.
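The filtering rule described here can be sketched in code under explicit assumptions. The sketch encodes each orthologous group as a string such as "e|vv", with one letter per gene (e = essential, v = viable). It assumes the left side is fission yeast and the right side budding yeast, and it counts the excess essential genes on one side when the other side has more than one gene and at least one viable copy. This encoding, the side order and the counting rule are one possible reading of the text, not the authors' exact procedure.

```python
def masked_essentials(groups):
    """Count essential genes whose essentiality might be masked by redundancy
    in the other yeast. Returns (fission yeast count, budding yeast count)."""
    sp_count = sc_count = 0
    for group in groups:
        sp, sc = group.split("|")
        e_sp, e_sc = sp.count("e"), sc.count("e")
        if e_sp == e_sc:
            # Equal numbers of essential genes (e.g. ev|ev, vv|vv): skip.
            continue
        if e_sp > e_sc and len(sc) > 1 and "v" in sc:
            # Redundant, viable copies in budding yeast could mask essentiality.
            sp_count += e_sp - e_sc
        elif e_sc > e_sp and len(sp) > 1 and "v" in sp:
            # Redundant, viable copies in fission yeast could mask essentiality.
            sc_count += e_sc - e_sp
    return sp_count, sc_count


# Hypothetical groups, for illustration only.
example_groups = ["e|vv", "ev|ev", "vv|vv", "vv|e", "e|ee"]
print(masked_essentials(example_groups))  # (1, 1)
```

In the example, "e|ee" is not counted because the single fission yeast gene has no paralog that could mask essentiality.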

Data source and URLs. DNA and protein information for fission yeast was taken from the S. pombe GeneDB database (ftp://ftp.sanger.ac.uk/pub/yeast/pombe/Mappings/OLD/allNames.txt_27Aug2008), and the budding yeast data set from http://www.yeastgenome.org/. Budding yeast deletion data (ref. 3) were from http://downloads.yeastgenome.org/literature_curation/archive/phenotypes.tab.20080202.gz. Interspecies comparisons used the manually curated species distribution from GeneDB on 24/06/2008 and Version 13 of the manually curated fission yeast/budding yeast ortholog table.

Distant ortholog detection. The detection of distant orthologs used all essential S. pombe and all S. cerevisiae proteins that were not already members of an existing orthologous group, based on the manually curated S. cerevisiae/S. pombe ortholog table version 13 (ref. 15). Ortholog candidate detection used PSI-BLAST, and the criteria as described (ref. 15) were used to support orthologous cluster predictions. Individual multiple alignments are provided in Supplementary Table 10. One predicted ortholog pair, SPAC1006.42/YPR085C, has since been confirmed experimentally (PMID 19040720).

GO analysis. GO enrichment analysis used the Princeton implementation of GO TermFinder (ref. 44) (http://go.princeton.edu/cgi-bin/GOTermFinder) with gene association files from November 2008. GO TermFinder calculates P-values using the hypergeometric distribution, and the Bonferroni method is used for multiple-hypothesis correction. The analysis used a P-value cutoff of 0.01, and all evidence codes except RCA (reviewed computational analysis) were included. The whole-genome comparison in Figure 2d and Supplementary Tables 5 and 6 used the total protein-coding data sets for fission yeast (4,836) and budding yeast (5,776). Some biologically uninformative terms were omitted from the results (that is, when parent and child terms showed identical enrichment, only the child term was included). GO process enrichment of essential genes that are conserved in single copy (Supplementary Table 12) versus nonessential genes conserved in single copy (Supplementary Table 13) used fission yeast annotations and the fission yeast background set.
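The enrichment test described above can be illustrated with a minimal sketch. It is not the GO TermFinder code: the function names and the toy term counts are hypothetical, and the Bonferroni factor is assumed here to be simply the number of terms tested.

```python
from math import comb


def hypergeom_sf(k, background, annotated, sample):
    """P(X >= k): chance of seeing at least k annotated genes when `sample`
    genes are drawn without replacement from `background` genes, of which
    `annotated` carry the term."""
    upper = min(annotated, sample)
    total = comb(background, sample)
    hits = sum(
        comb(annotated, i) * comb(background - annotated, sample - i)
        for i in range(k, upper + 1)
    )
    return hits / total


def enriched_terms(term_counts, background, sample, alpha=0.01):
    """term_counts maps term -> (annotated in sample, annotated in background).
    Returns the terms whose Bonferroni-corrected P-value is <= alpha."""
    n_tests = len(term_counts)  # assumption: one test per term supplied
    results = {}
    for term, (k, annotated) in term_counts.items():
        p_adj = min(1.0, hypergeom_sf(k, background, annotated, sample) * n_tests)
        if p_adj <= alpha:
            results[term] = p_adj
    return results


# Toy numbers only (hypothetical): 10 background genes, 5 in the sample.
toy = {"term_a": (5, 5), "term_b": (3, 5)}
print(enriched_terms(toy, background=10, sample=5))
```

In a real analysis the background would be the protein-coding set for the organism (4,836 genes for fission yeast in this study) and the counts would come from the gene association file.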

Parallel analysis using microarray. The custom-made GeneChip (48 K) was designed and manufactured according to the Affymetrix GeneChip guide (KRIBBSP2, Part No. 520506). Construction of mutant library pools, sampling, PCR amplification of probes, hybridization and washes were carried out following modified budding yeast protocols (refs. 29,30). Genomic DNA was prepared from frozen cell stocks using a kit (Zymo Research ZR-Fungal/Bacterial DNA kit). For each sample, 10~20 OD600 corresponding to 2~4 × 10⁸ cells/ml was used for the genomic DNA preparation. To amplify and label the tags, the following sets of primers were used for PCR with 0.2 µg genomic DNA as a template: uptag, forward (5′U-2) 5′-GCTCCCGCCTTACTTCGCAT-3′, reverse (biotin-Kan5′U-2) 5′-biotin-CGGGGACGAGGCAAGCTAA-3′; downtag, forward (DN3-F-biotin) 5′-biotin-GCCGCCATCCAGTGTCG-3′, reverse (DN3-R) 5′-TTGCGTTGCGTAGGGGGG-3′. For growth profiling, data were collected from six independent experiments using two different pool sets. For details, see Supplementary Figures 7–9.

Analysis of microarray results. Out of 4,441 mutants in the deletion pool, 3,523 mutants were represented by both up-tag and down-tag, and 811 mutants were represented by at least one of two tags. Therefore, at least one of the tags

ONLINE METHODS

Construction of genome-wide deletion mutants. Heterozygous deletion mutants of 4,836 protein-coding genes in fission yeast were constructed using a method based on homologous recombination of a deletion cassette containing a pair of unique molecular bar codes (up-tag and down-tag in Supplementary Table 1) and the KanMX marker gene (ref. 9). The sequences of the bar codes were generated using a BioPerl-based computer program to meet the following criteria: melting temperature (Tm) = 60 °C, no cross-hybridization, no secondary structures and no similarities to genomic sequences. RNAfold and mfold freeware (http://rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi) were used for checking secondary structure, and the BLAST program was used for checking similarity with genomic sequence. Deletion cassettes were generated by a modified PCR-based strategy. For one-third of deletion cassettes, the conventional serial-extension PCR method (refs. 3,4) was used. For the remaining two-thirds, the block PCR method or an innovative gene synthesis method (ref. 13) was employed, which increased the length of the homologous recombination regions from ~80 bp to 250~450 bp. Oligonucleotides used in construction of the deletion cassettes were supplied by Bioneer Corporation. The deletion cassettes were transformed into SP286 (ade6-M210/ade6-M216, leu1-32/leu1-32, ura4-D18/ura4-D18 h+/h+) using a lithium acetate method (ref. 40), and then incubated for 5 d to select positive colonies on YES agar containing 100 µg/ml G418 (Duchefa Biochemie).
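The bar-code criteria listed above can be illustrated with a small screening sketch. This is not the BioPerl program used in the study: it substitutes the Wallace rule for the Tm calculation and uses shared k-mers as a crude stand-in for the RNAfold/mfold and BLAST checks. All function names, the k-mer length and the example sequences are assumptions made for illustration.

```python
def wallace_tm(seq):
    """Rough melting temperature in °C by the Wallace rule: 2(A+T) + 4(G+C)."""
    seq = seq.upper()
    return 2 * (seq.count("A") + seq.count("T")) + 4 * (seq.count("G") + seq.count("C"))


def reverse_complement(seq):
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def shares_kmer(a, b, k):
    """True if sequences a and b have at least one k-mer in common."""
    kmers = {a[i:i + k] for i in range(len(a) - k + 1)}
    return any(b[i:i + k] in kmers for i in range(len(b) - k + 1))


def select_barcodes(candidates, target_tm=60, k=8):
    """Keep candidates that hit the target Tm, are not self-complementary,
    and share no k-mer with an already accepted tag or its reverse complement."""
    accepted = []
    for cand in candidates:
        cand = cand.upper()
        if wallace_tm(cand) != target_tm:
            continue
        if shares_kmer(cand, reverse_complement(cand), k):
            continue  # proxy for secondary structure (self-pairing)
        if any(
            shares_kmer(cand, tag, k) or shares_kmer(cand, reverse_complement(tag), k)
            for tag in accepted
        ):
            continue  # proxy for cross-hybridization between tags
        accepted.append(cand)
    return accepted


# Hypothetical candidate tags, not sequences from the deletion collection.
candidates = ["AAAAACCCCCAAAAACCCCC", "ATGCATGCATGCATGCATGC", "AAAAAGGGGGAAAAAGGGGG"]
print(select_barcodes(candidates))
```

A production design would replace each proxy with the real tool (a nearest-neighbour Tm model, a folding program and a genome-wide similarity search), as the text describes.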

Confirmation of genome-wide deletion mutants. To verify the integration of deletion cassettes at the correct locus, colony PCR was carried out. Dideoxy sequencing of the PCR product from each successful deletion mutant was carried out to confirm the sequences of up- and down-tags as well as the junctions to accurately define the deleted region. To estimate how often the deletion cassette integrated at additional sites in the genome, Southern blot analysis of chromosomal DNA from 61 different deletion strains was carried out using KanMX4 as a probe. All the strains and check-PCR primers described here are available from Bioneer (http://pombe.bioneer.co.kr).

Determination of essentiality. General growth conditions and media were used as described (ref. 41). Essentiality was determined by microscopic observation of the colony-forming ability of spores on YES (yeast extract medium supplemented with adenine, leucine, uracil and histidine at 250 mg/l) at 25 °C and 32 °C. The spores were derived from corresponding heterozygous diploid deletion strains transformed with the pON177 plasmid (ref. 42) using a modified version of the PLATE method (ref. 43). About 5% of the heterozygous deletion diploids could not be transformed using this high-throughput method, and these were repeated using a standard transformation protocol (ref. 40). Briefly, four batches each of 48 heterozygous diploid strains were patched on to YE (yeast extract medium supplemented with leucine and uracil at 250 mg/l) + G418 agar plates in two 96-well microtiter plates (each strain is represented four times) and left to grow for 2~3 d at 32 °C. Cells were inoculated into 200 µl YE + G418 and left to grow into stationary phase. The cells were harvested and transformed with pON177 (ref. 42), plated on minimal agar + leucine (250 mg/l) and incubated for a week at 32 °C. Transformants were inoculated into minimal media lacking nitrogen and left for 2~3 d at 25 °C to induce sporulation. The asci were treated with helicase (Bio Sepra) diluted 1 in 250 to eliminate vegetative cells, washed with water and the haploid spores were plated on YES agar at 25 °C and 32 °C. Essentiality was determined by microscopic observation of the germinating spores on plates after 1 and 2 d before replica plating to YES + 100 µg/ml G418 to confirm that the deletion phenotype was associated with G418 resistance. Essential genes were further analyzed by tetrad analysis. Briefly, cells harboring pON177 were left to germinate for 4~5 d on minimal plates. Using a Singer MSM microscope, spores were dissected on YES plates for 4~5 d at 30 °C. Viable colonies were patched onto YES plates + 100 µg/ml G418 to confirm that viability was linked to G418 sensitivity (Supplementary Methods). While analyzing gene dispensability, we found that a subset of the deletion collection harbored a recessive temperature-sensitive (ts) mutation unrelated to the gene deletion. This ts mutation was removed from the entire nonessential haploid deletion library after sporulation of the diploid heterozygous deletion strains of nonessential genes. Originally, 416 of the 1,260 essential heterozygous deletion diploid strains harbored the ts mutation. Of these 416 strains, 364 have been remade and the remaining 52 are

© 2010 Nature America, Inc. All rights reserved.

NATURE BIOTECHNOLOGY doi:10.1038/nbt.1628

from 4,334 strains was detectable by chip analysis. The remaining 107 tags were removed from the analysis, as they had intensities less than fourfold that of background. For the analysis of microarray results, the analysis of covariance (ANCOVA) model was used as a statistical tool. Each array signal was normalized by a mean intensity (that is, 2,500 arbitrary units) and interpreted by ANCOVA as a linear regression corresponding to a multiple-regression model on time (measured in generations and treated as a quantitative predictor) and replicate series (treated as a categorical predictor) simultaneously. This analysis provides estimates of statistical significance (P-values) using the F-statistic (Supplementary Methods).
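The normalization and ANCOVA model described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: it assumes a common slope on generations with one intercept per replicate series, the function names and toy values are hypothetical, and the P-value step (looking up F in the F distribution) is left out.

```python
def normalize(signals, target_mean=2500.0):
    """Scale one array so that its mean intensity equals target_mean."""
    scale = target_mean / (sum(signals) / len(signals))
    return [s * scale for s in signals]


def ancova_common_slope(series):
    """Fit value = intercept[replicate] + slope * generations.

    `series` maps a replicate label to a list of (generations, value) pairs.
    Returns (slope, F), where F tests slope = 0 on (1, n - g - 1) degrees of
    freedom for n points and g replicate series."""
    sxx = sxy = syy = 0.0
    n = 0
    for points in series.values():
        xs = [x for x, _ in points]
        ys = [y for _, y in points]
        mean_x = sum(xs) / len(xs)
        mean_y = sum(ys) / len(ys)
        # Centre within each replicate: this absorbs the categorical predictor.
        sxx += sum((x - mean_x) ** 2 for x in xs)
        sxy += sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        syy += sum((y - mean_y) ** 2 for y in ys)
        n += len(points)
    slope = sxy / sxx
    ss_reduced = syy              # residual SS with replicate intercepts only
    ss_full = syy - slope * sxy   # residual SS after adding the common slope
    df_resid = n - len(series) - 1
    f_stat = (ss_reduced - ss_full) / (ss_full / df_resid)
    return slope, f_stat


# Toy tag intensities for one mutant in two replicate series (hypothetical).
toy = {
    "series_1": [(0, 10.0), (1, 8.0), (2, 7.0)],
    "series_2": [(0, 12.0), (1, 11.0), (2, 8.0)],
}
print(ancova_common_slope(toy))
```

A negative slope with a large F value indicates a tag whose signal declines consistently over generations in both replicate series, which is the pattern expected for a slow-growing mutant.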

40. Bähler, J. et al. Heterologous modules for efficient and versatile PCR-based gene targeting in Schizosaccharomyces pombe. Yeast 14, 943–951 (1998).

41. Moreno, S., Klar, A. & Nurse, P. Molecular genetic analysis of fission yeast Schizosaccharomyces pombe. Methods Enzymol. 194, 795–823 (1991).

42. Styrkarsdottir, U., Egel, R. & Nielsen, O. The smt-0 mutation which abolishes mating-type switching in fission yeast is a deletion. Curr. Genet. 23, 184–186 (1993).

43. Elble, R. A simple and efficient procedure for transformation of yeasts. Biotechniques 13, 18–20 (1992).

44. Boyle, E.I. et al. GO:TermFinder–open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes. Bioinformatics 20, 3710–3715 (2004).

1308 VOLUME 28 NUMBER 12 DECEMBER 2010 NATURE BIOTECHNOLOGY


Corrigendum: Analysis of a genome-wide set of gene deletions in the fission yeast Schizosaccharomyces pombe

Dong-Uk Kim, Jacqueline Hayles, Dongsup Kim, Valerie Wood, Han-Oh Park, Misun Won, Hyang-Sook Yoo, Trevor Duhig, Miyoung Nam, Georgia Palmer, Sangjo Han, Linda Jeffery, Seung-Tae Baek, Hyemi Lee, Young Sam Shim, Minho Lee, Lila Kim, Kyung-Sun Heo, Eun Joo Noh, Ah-Reum Lee, Young-Joo Jang, Kyung-Sook Chung, Shin-Jung Choi, Jo-Young Park, Youngwoo Park, Hwan Mook Kim, Song-Kyu Park, Hae-Joon Park, Eun-Jung Kang, Hyong Bai Kim, Hyun-Sam Kang, Hee-Moon Park, Kyunghoon Kim, Kiwon Song, Kyung Bin Song, Paul Nurse & Kwang-Lae Hoe

Nat. Biotechnol. 28, 617–623 (2010); published online 16 May 2010; corrected after print 7 December 2010

In the version of this article initially published, the address of one of the authors, Young-Joo Jang, was incorrect. The correct address is Laboratory of Cell Cycle & Signal Transduction, WCU Department of NanoBioMedical Science, Institute of Tissue Regeneration Engineering, Dankook University, Cheonan, Korea. The error has been corrected in the HTML and PDF versions of the article.

ERRATA AND CORRIGENDA


Submitted Publication

8. PomBase: A comprehensive online resource for fission yeast.

PomBase: a comprehensive online resource for fission yeast

Valerie Wood1,2,3,*, Midori A. Harris1,2,*, Mark D. McDowall4, Kim Rutherford1,2, Brendan W. Vaughan4, Daniel M. Staines4, Martin Aslett5, Antonia Lock6, Jürg Bähler6, Paul J. Kersey4 and Stephen G. Oliver1,2,*

1Cambridge Systems Biology Centre, 2Department of Biochemistry, University of Cambridge, Sanger Building, 80 Tennis Court Road, Cambridge CB2 1GA, 3Cell Cycle Laboratory, Cancer Research UK, London Research Institute, 44 Lincoln’s Inn Fields, London UK WC2A 3LY, 4European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, 5Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA and 6Department of Genetics, Evolution and Environment, and UCL Cancer Institute, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK

Received September 7, 2011; Accepted September 24, 2011

ABSTRACT

PomBase (www.pombase.org) is a new model organism database established to provide access to comprehensive, accurate, and up-to-date molecular data and biological information for the fission yeast Schizosaccharomyces pombe to effectively support both exploratory and hypothesis-driven research. PomBase encompasses annotation of genomic sequence and features, comprehensive manual literature curation and genome-wide data sets, and supports sophisticated user-defined queries. The implementation of PomBase integrates a Chado relational database that houses manually curated data with Ensembl software that supports sequence-based annotation and web access. PomBase will provide user-friendly tools to promote curation by experts within the fission yeast community. This will make a key contribution to shaping its content and ensuring its comprehensiveness and long-term relevance.

INTRODUCTION

The fission yeast Schizosaccharomyces pombe is a well-studied eukaryotic model organism that has been used since the 1950s to obtain valuable insights into diverse eukaryotic biological processes including the cell growth and division cycle, genome organization and maintenance, cell morphology and cytokinesis, signaling and stress responses, chromatin and gene regulation and meiotic differentiation (1). Moreover, since the completion of its genome sequence in 2002 (2), fission yeast has emerged as a prime model for the characterization of processes relevant to human disease and cell biology. A large and active community engages in biological and biomedical research using this model system, routinely applying molecular genetic, cell biological and biochemical techniques. Data from small- and large-scale projects are accumulating rapidly and will increase substantially over the next few years; the literature corpus currently exceeds 9000 publications and grows by about 500 publications per year.

PomBase (http://www.pombase.org) has recently been established to provide user-friendly and standardized access to genomic features and annotations, enabling scientists to assimilate novel findings into their research programs, improve experimental design, support the interpretation of genetic screens, and facilitate the interpretation of functional genomics and systems biology experiments. An accurate and comprehensive set of manual annotations of gene products based on published data lies at the centre of this database, and is supplemented by automatic annotation and information about non-genic features. The PomBase project aims to provide three key resources to the fission yeast community:

• Comprehensive and deep curation of the scientific literature;

• A software infrastructure to support curation activities; and

• A computational infrastructure to integrate data curated from the literature with the genomic sequence, high-throughput data sets, and data from other fungal genomes.

*To whom correspondence should be addressed. Tel: +44 1223 746961; Fax: +44 1223 766002; Email: [email protected]
Correspondence may also be addressed to Midori A. Harris. Tel: +44 1223 761211; Fax: +44 1223 766002; Email: [email protected]
Correspondence may also be addressed to Stephen G. Oliver. Tel: +44 1223 333667; Fax: +44 1223 766002; Email: [email protected]

Published online 28 October 2011. Nucleic Acids Research, 2012, Vol. 40, Database issue D695–D699. doi:10.1093/nar/gkr853

© The Author(s) 2011. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

DATA TYPES AND SOURCES

Historically, functional information about the genome and biology of S. pombe has been maintained in a repository hosted by the GeneDB project at the Wellcome Trust Sanger Institute (WTSI) (http://old.genedb.org/genedb/pombe/). This resource is now superseded by PomBase, which has not only inherited the data from GeneDB, but also includes additional curated data types and high-throughput data sets. The major data types available in PomBase are listed in Table 1.

Sequence feature annotation

DNA and protein features are annotated using the Sequence Ontology (SO) (4). Currently, 31 DNA feature terms are used in 25 650 annotations (examples include gene, exon, tRNA, centromere) and 22 protein feature terms for 919 annotations (examples include nuclear localization signal, ER retention signal, DDB box).

Gene Ontology annotation

Gene Ontology (GO) (5,6) terms are assigned to gene products to represent their molecular functions, cellular components (including complexes), and biological processes. PomBase GO annotation data comprise over 33 500 manual annotations and approximately 3500 automatically assigned annotations, using 3800 unique GO terms.

Phenotype annotation

To support the comprehensive and detailed representation of phenotypes, we are developing the Fission Yeast Phenotype Ontology (FYPO), a formal ontology of phenotypes observed in fission yeast. FYPO is a modular ontology that uses several existing ontologies from the Open Biological and Biomedical Ontologies (OBO) collection (7) as building blocks, including the phenotypic quality ontology PATO (8), GO, and Chemical Entities of Biological Interest (ChEBI) (9). Over 7000 existing annotations have been converted from the legacy GeneDB controlled vocabulary to FYPO terms; these annotations will support sophisticated querying, computational analysis, and comparison between different experiments and even between different species.

Genetic and physical interactions

Annotation of genetic and physical interactions is supported using the BioGRID (http://thebiogrid.org) (10,11) annotation format. Existing annotations curated by BioGRID are imported into PomBase, and newly created annotations will be exchanged with BioGRID.

Genome-scale data

PomBase incorporates a wide variety of data sets that can be mapped to the genome, obtained either from internal sources or via externally loaded URLs or data files. Examples of supported data sets include whole genome re-sequencing data, RNA-seq and ChIP-seq data, various microarray data and other high-throughput data types.

USER ACCESS TO POMBASE

A web portal has been developed for access to the PomBase data, which provides pages describing the current state of annotation of the S. pombe genome, items of interest to the community, and, most importantly, a ‘Gene Overview Page’ that summarizes key information about each gene. Embedded in this portal is a genome browser, providing access to genomic context, sequence-based analyses and high-throughput data.

Table 1. Summary of data types available in PomBase

Data type                                   Distinct descriptors   Total annotations
Sequence Ontology
  DNA features                              31                     25 650
  Protein features                          22                     919
Gene Ontology
  Biological process                        1861                   14 501
  Molecular function                        1405                   9149
  Cellular component                        543                    16 101
Fission yeast phenotype ontology            342                    7043
Gene product descriptions                   4331                   7048
Disease associations                        134                    378
PSI-MOD (3) protein modification            22                     1861
EC numbers                                  520                    837
Name descriptions                           186                    513
Annotation extensions                       254                    340
Annotation status                           7                      5142
BioGRID
  Genetic interactions                      N/A                    13 147
  Physical interactions                     N/A                    5131
Curated fission–budding yeast orthologs     N/A                    5210


Gene Overview Pages

Gene Overview pages organize gene-specific information, including the gene type, product description, sequence features, phenotypes, Gene Ontology annotation and protein modifications as well as physical and genetic interactions. These pages are central to PomBase (see Figure 1).

Searching

A simple search is available on every PomBase page. The Advanced Search allows users to perform queries on multiple feature types including GO annotation, protein domain, characterization status, species distribution, protein length, etc. A query history summarizes queries and allows them to be edited or combined using union or intersection.
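Combining saved queries by union or intersection amounts to set operations on gene lists. The sketch below illustrates the idea only and is not PomBase code; the query names, gene identifiers and the combine function are hypothetical.

```python
def combine(history, name, left, right, how):
    """Store and return the union or intersection of two saved query results."""
    if how == "union":
        result = history[left] | history[right]
    elif how == "intersection":
        result = history[left] & history[right]
    else:
        raise ValueError(f"unsupported operation: {how}")
    history[name] = result
    return result


# Hypothetical query history: each query maps to a set of gene identifiers.
history = {
    "go_term_query": {"gene_a", "gene_b", "gene_c"},
    "length_query": {"gene_b", "gene_c", "gene_d"},
}
print(sorted(combine(history, "both", "go_term_query", "length_query", "intersection")))
print(sorted(combine(history, "either", "go_term_query", "length_query", "union")))
```

Because each combined result is stored back in the history under its own name, it can itself be used as an input to a later combination.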

Genome browser

The PomBase genome browser has been implemented using software developed by the Ensembl project (12). The Ensembl genome browser is a powerful system, offering support for the visualization of sequence, functional annotations, alignments, comparative data and polymorphisms. Many of these features are already exploited by PomBase, and others will be used as and when required for the incorporation of new data types. The use of standard technology readily supports comparative analyses with the genomes of other species that are accessible via Ensembl (also see ‘Implementation’ section, below). Comparative analyses with other fungal genomes and with a range of taxonomically diverse genomes are provided using data generated by the Ensembl Genomes project (13). The Ensembl API enables users interested in comparative analyses to retrieve all data of interest from multiple species present in Ensembl.

IMPLEMENTATION

The implementation of PomBase harnesses and integrates three complementary, well-supported and mature technologies: Chado (14), Ensembl and Drupal (http://www.drupal.org). Chado provides an environment for manual curation and the management of curated data, while Ensembl provides end-user access via the web portal and display of sequence-based features.

Figure 1. PomBase Gene Overview Page highlights. (A) Top of page showing essential information and genomic region. (B) Phenotype and GO annotation. (C) Additional curated data, including orthologs, interactions and external references. (D) Literature pertinent to the gene.

Chado

Curated PomBase data are stored in a PostgreSQL database using the Generic Model Organism Database (GMOD)-compliant Chado schema. Chado supports the management and storage of sequence annotation and literature curation using any combination of available ontologies, and is therefore easily extensible to new data types. Sequence features are curated in Chado using Artemis (15,16). New annotations produced by the PomBase curation team and the fission yeast community, and external data from BioGRID and UniProt/GOA (17), are loaded at regular intervals.
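The way an ontology-driven schema links features to terms can be illustrated with a heavily simplified sketch. It borrows the Chado table names feature, cvterm and feature_cvterm but omits most columns and related tables (cv, dbxref, pub, organism), and it uses an in-memory SQLite database rather than PostgreSQL. The gene names, the term rows and the helper functions are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create a toy, Chado-like database in memory (simplified assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cvterm (
            cvterm_id INTEGER PRIMARY KEY,
            name      TEXT NOT NULL
        );
        CREATE TABLE feature (
            feature_id INTEGER PRIMARY KEY,
            uniquename TEXT NOT NULL UNIQUE,
            type_id    INTEGER NOT NULL REFERENCES cvterm (cvterm_id)
        );
        CREATE TABLE feature_cvterm (
            feature_cvterm_id INTEGER PRIMARY KEY,
            feature_id INTEGER NOT NULL REFERENCES feature (feature_id),
            cvterm_id  INTEGER NOT NULL REFERENCES cvterm (cvterm_id)
        );
    """)
    # Feature types and annotation terms live in the same term table.
    conn.executemany("INSERT INTO cvterm VALUES (?, ?)",
                     [(1, "gene"), (2, "cytokinesis"), (3, "nucleus")])
    conn.executemany("INSERT INTO feature VALUES (?, ?, ?)",
                     [(1, "demo_gene_1", 1), (2, "demo_gene_2", 1)])
    conn.executemany("INSERT INTO feature_cvterm VALUES (?, ?, ?)",
                     [(1, 1, 2), (2, 1, 3), (3, 2, 3)])
    return conn


def features_annotated_with(conn, term_name):
    """Return the unique names of features annotated with the given term."""
    rows = conn.execute(
        """
        SELECT f.uniquename
        FROM feature AS f
        JOIN feature_cvterm AS fc ON fc.feature_id = f.feature_id
        JOIN cvterm AS t ON t.cvterm_id = fc.cvterm_id
        WHERE t.name = ?
        ORDER BY f.uniquename
        """,
        (term_name,),
    )
    return [row[0] for row in rows]


conn = build_demo_db()
print(features_annotated_with(conn, "nucleus"))
```

Because annotations are rows that point at terms, supporting a new ontology requires new term rows rather than new tables, which is the extensibility the paragraph above refers to.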

Ensembl

Ensembl is a generic software platform for the automatic annotation, analysis and display of genomes, in use for over a decade. In PomBase, Ensembl provides public access to the integrated fission yeast data, and has been extended to display the deep literature curation managed in Chado. Gene model and annotation data are retrieved from the Chado database using the Bio::Chado::Schema Perl API, and loaded into an Ensembl MySQL database using the Ensembl Perl API. Data in the Ensembl MySQL database are accessible directly or via the Ensembl Perl API, and provide the content served both in the Genome Browser and in the Gene Overview pages. Data from the Ensembl database are also loaded into a BioMart data warehouse (18,19), which supports data mining via web and programmatic interfaces and also provides support for the advanced search (see below).

Drupal

Drupal is a content management system that has been used to provide the web-based portal for PomBase. The use of Drupal has allowed the creation of a clean and intuitive user interface to access information about S. pombe and the PomBase project, and to support community-based functionality including wikis and discussion forums. To support the PomBase interface, two custom Drupal modules have been developed. The Gene Overview module is responsible for generating the Gene Overview pages by retrieving data about specific genes from a custom web service running on the Ensembl web server, which in turn uses the Ensembl Perl API to query the Ensembl MySQL databases. The Query Builder module supports the advanced search interface, and generates and submits custom queries to the BioMart web service to find genes matching specified criteria.

INTEGRATION OF USER DATA

PomBase welcomes contributions from the community to improve the coverage and accuracy of its data. Users wanting to add or modify data in PomBase can directly contact the curation staff (E-mail: [email protected]).

Following a successful pilot project conducted in 2008, a generic web-based curation environment is being developed to support the launch of a comprehensive community curation initiative early in 2012 (Rutherford et al., in preparation). This will allow expert users to directly contribute annotations based on their publications, and will enhance the efforts of core curation staff and contribute to the sustainability of the curation effort in the face of increasing volumes of highly specialized published data.

Additionally, there are many ways that users can directly visualize sequence-based data within the context of the web browser, including access to a Distributed Annotation System, data upload to a private area of the site, and dynamic integration of locally stored BAM files (using standard protocols such as HTTP, allowing users to directly visualize large-scale experimental results in the context of the reference annotation). Producers of mature data sets of any scale that are ready for full integration in PomBase and public dissemination should contact PomBase at the address above.

AVAILABILITY AND DATA PROPAGATION

All of the tools, protocols and workflows developed by PomBase are publicly available (http://www.pombase.org/downloads) and can be implemented by other research communities to create analogous organism-specific databases either in collaboration with the Ensembl Genomes project or independently. Sequence, features and other annotation are available for bulk download via FTP, while subsets of data can be selectively downloaded using the PomBase BioMart.

FUTURE DIRECTIONS

PomBase will continue to incorporate large-scale data sets and curate new data types. We will also incorporate sequence data, automatic annotation, and high-throughput data sets available for other species in the Schizosaccharomyces genus (20).

We anticipate that usage of PomBase will extend beyond the S. pombe community to encompass evolutionary biologists studying genome variations and the evolution of yeasts, fungi, and the eukaryota in general; researchers seeking well-studied orthologs of genes of interest in human and other species; curators from other databases; and bioinformaticians and theoretical biologists requiring programmatic access to fission yeast data in order to construct and test novel hypotheses.

ACKNOWLEDGEMENTS

The authors thank members of the Ensembl and Ensembl Genomes teams for contributions to the PomBase data pipelines. We also thank Chris Mungall for helpful discussions on Chado and GO, and we thank members of the S. pombe research community whose feedback has helped establish priorities for PomBase development.

FUNDING

Wellcome Trust [WT090548MA to SGO]. Funding for open access charge: Wellcome Trust.

Conflict of interest statement. None declared.

REFERENCES

1. Egel,R. (ed.), (2004) The Molecular Biology of Schizosaccharomyces pombe. Springer, Berlin, Germany.

2. Wood,V., Gwilliam,R., Rajandream,M.-A., Lyne,M., Lyne,R., Stewart,A., Sgouros,J., Peat,N., Hayles,J., Baker,S. et al. (2002) The genome sequence of Schizosaccharomyces pombe. Nature, 415, 871–880.

3. Montecchi-Palazzi,L., Beavis,R., Binz,P.-A., Chalkley,R.J., Cottrell,J., Creasy,D., Shofstahl,J., Seymour,S.L. and Garavelli,J.S. (2008) The PSI-MOD community standard for representation of protein modification data. Nat. Biotechnol., 26, 864–866.

4. Eilbeck,K., Lewis,S.E., Mungall,C.J., Yandell,M., Stein,L., Durbin,R. and Ashburner,M. (2005) The Sequence Ontology: a tool for the unification of genome annotations. Genome Biol., 6, R44.

5. The Gene Ontology Consortium (2000) Gene Ontology: tool for the unification of biology. Nat. Genet., 25, 25–29.

6. The Gene Ontology Consortium (2010) The Gene Ontology in 2010: extensions and refinements. Nucleic Acids Res., 38, D331–D335.

7. Smith,B., Ashburner,M., Rosse,C., Bard,J., Bug,W., Ceusters,W., Goldberg,L.J., Eilbeck,K., Ireland,A., Mungall,C.J. et al. (2007) The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat. Biotechnol., 25, 1251–1255.

8. Gkoutos,G.V., Green,E.C.J., Mallon,A.-M., Hancock,J.M. and Davidson,D. (2005) Using ontologies to describe mouse phenotypes. Genome Biol., 6, R8.

9. de Matos,P., Alcantara,R., Dekker,A., Ennis,M., Hastings,J., Haug,K., Spiteri,I., Turner,S. and Steinbeck,C. (2010) Chemical Entities of Biological Interest: an update. Nucleic Acids Res., 38, D249–D254.

10. Stark,C., Breitkreutz,B.-J., Reguly,T., Boucher,L., Breitkreutz,A. and Tyers,M. (2006) BioGRID: a general repository for interaction datasets. Nucleic Acids Res., 34, D535–D539.

11. Stark,C., Breitkreutz,B.-J., Chatr-Aryamontri,A., Boucher,L., Oughtred,R., Livstone,M.S., Nixon,J., Auken,K.V., Wang,X., Shi,X. et al. (2011) The BioGRID Interaction Database: 2011 update. Nucleic Acids Res., 39, D698–D704.

12. Flicek,P., Amode,M.R., Barrell,D., Beal,K., Brent,S., Chen,Y., Clapham,P., Coates,G., Fairley,S., Fitzgerald,S. et al. (2011) Ensembl 2011. Nucleic Acids Res., 39, D800–D806.

13. Kersey,P.J., Lawson,D., Birney,E., Derwent,P.S., Haimel,M., Herrero,J., Keenan,S., Kerhornou,A., Koscielny,G., Kahari,A. et al. (2010) Ensembl Genomes: extending Ensembl across the taxonomic space. Nucleic Acids Res., 38, D563–D569.

14. Mungall,C.J., Emmert,D.B. and FlyBase Consortium (2007) A Chado case study: an ontology-based modular schema for representing genome-associated biological information. Bioinformatics, 23, i337–i346.

15. Rutherford,K., Parkhill,J., Crook,J., Horsnell,T., Rice,P., Rajandream,M.A. and Barrell,B. (2000) Artemis: sequence visualization and annotation. Bioinformatics, 16, 944–945.

16. Carver,T., Berriman,M., Tivey,A., Patel,C., Bohme,U., Barrell,B.G., Parkhill,J. and Rajandream,M.-A. (2008) Artemis and ACT: viewing, annotating and comparing sequences stored in a relational database. Bioinformatics, 24, 2672–2676.

17. Barrell,D., Dimmer,E., Huntley,R.P., Binns,D., O’Donovan,C. and Apweiler,R. (2009) The GOA database in 2009–an integrated Gene Ontology Annotation resource. Nucleic Acids Res., 37, D396–D403.

18. Smedley,D., Haider,S., Ballester,B., Holland,R., London,D., Thorisson,G. and Kasprzyk,A. (2009) BioMart–biological queries made easy. BMC Genomics, 10, 22.

19. Kinsella,R.J., Kahari,A., Haider,S., Zamora,J., Proctor,G., Spudich,G., Almeida-King,J., Staines,D., Derwent,P., Kerhornou,A. et al. (2011) Ensembl BioMarts: a hub for data retrieval across taxonomic space. Database, 2011 (doi: 10.1093/database/bar030; epub ahead of print).

20. Rhind,N., Chen,Z., Yassour,M., Thompson,D.A., Haas,B.J., Habib,N., Wapinski,I., Roy,S., Lin,M.F., Heiman,D.I. et al. (2011) Comparative functional genomics of the fission yeasts. Science, 332, 930–936.



Submitted Publication 9. FYPO: The fission yeast phenotype ontology.

FYPO: The Fission Yeast Phenotype Ontology

Midori A. Harris 1,*, Antonia Lock 2, Jürg Bähler 2, Stephen G. Oliver 1 and Valerie Wood 1,*

1 Cambridge Systems Biology Centre and Department of Biochemistry, University of Cambridge, Sanger Building, 80 Tennis Court Road, Cambridge CB2 1GA, UK
2 Department of Genetics, Evolution & Environment and UCL Genetics Institute, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK

ABSTRACT

Motivation: To provide consistent, computable descriptions of phenotype data, PomBase is developing a formal ontology of phenotypes observed in fission yeast.

Results: The Fission Yeast Phenotype Ontology (FYPO) is a modular ontology that uses several existing ontologies from the Open Biological and Biomedical Ontologies (OBO) collection as building blocks, including the phenotypic quality ontology PATO, the Gene Ontology, and Chemical Entities of Biological Interest (ChEBI). Modular ontology development facilitates partially automated, effective organization of detailed phenotype descriptions with complex relationships to each other and to underlying biological phenomena. As a result, FYPO supports sophisticated querying, computational analysis, and comparison between different experiments and even between species.

Availability: FYPO releases are available from the Subversion repository at the PomBase SourceForge project page (https://sourceforge.net/p/pombase/code/HEAD/tree/phenotype_ontology/). The current version of FYPO is also available on the OBO Foundry web site (http://obofoundry.org/).

Contact: [email protected]

1 INTRODUCTION

The fission yeast Schizosaccharomyces pombe is a eukaryotic model organism that has been used since the 1950s to study diverse biological processes including the cell division cycle, genome organization and maintenance, cell morphology and cytokinesis, signaling and stress responses, chromatin, gene regulation and meiotic differentiation (Egel, 2004). A large and active research community uses a wide variety of molecular genetic, cell biological and biochemical techniques to study S. pombe. With the completion of its genome sequence in 2002 (Wood et al., 2002), fission yeast has also become amenable to genome-scale experimentation, and has emerged as a reliable model for studying processes involved in human disease and cell biology.

PomBase (http://www.pombase.org) has recently been established as a comprehensive model organism database that provides centralized access to information relevant to S. pombe (Wood et al., 2012). PomBase encompasses a core of manual literature curation that provides detailed, accurate curation of phenotypes, Gene Ontology annotations, genetic and physical interactions, protein modifications, and many other types of data describing genes and their products. Manually curated data are supplemented by automatic gene annotation and large-scale data sets, and information about additional sequence feature types.

*To whom correspondence should be addressed.

We define a phenotype as an observable characteristic, or set of characteristics, of an organism that results from the interaction of its genotype with a given environment. Extensive genetics research has been carried out using S. pombe over several decades, and a comprehensive set of high-quality curated phenotype data is in high demand in the S. pombe research community. A survey of S. pombe researchers conducted in 2007 identified phenotype annotation as the most requested feature not then available in a fission yeast database.

In response to community demand, we have developed the Fission Yeast Phenotype Ontology (FYPO), a formal ontology of phenotypes observed in fission yeast that will allow PomBase to provide consistent, computable descriptions of phenotype data. Using FYPO, we have begun to curate accurate and detailed annotations of mutant allele phenotypes, with the aim of providing comprehensive coverage of phenotypes reported in the fission yeast literature. FYPO annotations are available on PomBase gene pages, and we envisage that the availability of genome-scale phenotype datasets will make new types of data analysis possible. Facilitated by the formal structure of FYPO, phenotype annotations can be shared and integrated with additional data, including other types of data obtained in fission yeast as well as phenotype data from other species.

Associate Editor: Dr. Janet Kelso
Bioinformatics Advance Access published May 8, 2013

2 APPROACH

The application of ontologies to biological curation has become widespread, and is best illustrated by the Gene Ontology (GO) project (http://www.geneontology.org; The Gene Ontology Consortium, 2000, 2012), a collaborative effort to construct and use controlled vocabularies to support functional annotation of genes and their products in a wide variety of organisms. Ontologies facilitate consistent, unambiguous descriptions of biological concepts, and can accommodate content at different levels of taxon specificity. Ontologies allow annotations at different levels of granularity, depending on what is known or what can be inferred, and provide mechanisms for quality control, consistency checking, and error correction using collected data (both within and between ontologies). We sought to make these advantages available to curators and database users of PomBase phenotype annotations.

GeneDB S. pombe (http://old.genedb.org/genedb/pombe/), the predecessor database to PomBase, offered an extensive set of Gene Ontology annotations, but did not use ontologies to capture other data types. GeneDB provided minimal phenotype annotation using a small, manually constructed, controlled vocabulary. The phenotype vocabulary was a flat list of roughly 200 text descriptions, with no connections between the different descriptions. Furthermore, because the GeneDB phenotype vocabulary was designed and used exclusively for S. pombe annotations, it did not support any data sharing or integration between species or databases.

The launch of PomBase presented an opportunity to create an improved system for phenotype description, starting with a “blank slate” and unconstrained by the limitations of the GeneDB vocabulary and annotation system.

Ontology design considerations

The entity–quality model

We have constructed FYPO as a modular ontology that uses several existing ontologies from the Open Biological and Biomedical Ontologies (OBO) collection (Smith et al., 2007) as building blocks to support the creation and maintenance of an extensive set of pre-coordinated phenotype descriptors. Terms from OBO ontologies, including the phenotypic quality ontology PATO (Gkoutos et al., 2009), the Gene Ontology, the Cell Ontology (Meehan et al., 2011), and Chemical Entities of Biological Interest (ChEBI; de Matos et al., 2010), are used to construct logical definitions for FYPO terms. For a phenotype, a logical definition follows the entity–quality (EQ) model (Mabee et al., 2007): the entity is what is affected, and can be the whole cell, a population of cells, a part of a cell (corresponding to a GO cellular component), or an event such as a molecular function or biological process (represented by GO terms). An entity specification can be further refined with additional details using GO or ChEBI terms. The quality describes how the entity is affected, and is captured by a PATO term.
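As an illustration only, an EQ logical definition can be pictured as a small record that pairs a PATO quality with the affected entity. The sketch below is not part of the FYPO tooling: the class name `EQDefinition` and its method are hypothetical, and the rendering merely imitates the two `intersection_of` lines that an OBO-format stanza uses for such a definition. The IDs are those quoted elsewhere in this paper for ‘abnormal actomyosin contractile ring assembly’ (FYPO:0000161).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EQDefinition:
    """Hypothetical container for an entity-quality (EQ) logical definition."""

    quality: str                   # PATO term ID: how the entity is affected
    entity: str                    # GO, CL or other OBO term ID: what is affected
    relation: str = "inheres_in"   # relation linking the quality to the entity

    def to_obo_lines(self) -> list[str]:
        """Render the definition as OBO-style intersection_of lines."""
        return [
            f"intersection_of: {self.quality}",
            f"intersection_of: {self.relation} {self.entity}",
        ]


# 'abnormal' (PATO:0000460) inhering in GO:0000915, as quoted for FYPO:0000161.
ring_assembly = EQDefinition(quality="PATO:0000460", entity="GO:0000915")
print("\n".join(ring_assembly.to_obo_lines()))
```

Keeping the quality and the entity as separate fields mirrors the point made above: each half of the definition refers to a term maintained in another ontology.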

Evaluation of available ontologies and term composition

Prior to commencing FYPO development, we examined the phenotype ontologies listed with the OBO Foundry, of which the Ascomycete Phenotype Ontology (APO; http://www.yeastgenome.org/cache/PhenotypeTree.html), developed for Saccharomyces cerevisiae (budding yeast) (Engel et al., 2010) and other fungi, most closely matches FYPO in scope and intended application. Our evaluation was guided by the requirements of our highest-priority phenotype ontology applications. Most importantly, we require an extensive set of pre-composed phenotype terms for community annotation and querying.

Existing phenotype ontologies typically use one of two approaches. In a pre-coordinated (or pre-composed) ontology, such as the Mammalian Phenotype Ontology (Smith and Eppig, 2009), phenotype descriptions are composed in advance (i.e. separately from the annotation procedure). In other systems, such as those used for Dictyostelium discoideum (Fey et al., 2009) and Danio rerio (zebrafish) (Bradford et al., 2011), phenotype descriptions are post-coordinated (post-composed) at the time of annotation; a curator chooses an entity and quality, and in some cases additional details, in parallel.

Phenotype annotation is one of the key features of PomBase’s newly developed community curation system (Rutherford et al., manuscript in preparation), which allows researchers to contribute annotations from their publications directly to the database. For use by bench biologists, the simpler procedure of annotating to a single pre-composed term is more intuitive than the parallel annotation process required with post-composition. We also anticipate that biologists will wish to annotate to highly specific terms, making the reasoning supported by logical EQ definitions essential for ontology maintenance.

Annotations using APO terms, however, fall into the post-composed category: terms representing qualities and “observables” are combined by curators as part of the annotation procedure. Thus, although phenotype descriptions using APO are conceptually compatible with the EQ model, entity–quality combinations are not incorporated into APO itself, nor do APO terms include logical definitions. Finally, curators using APO often add details drawn from other sources, including separate controlled vocabularies, meaning that much of the specific information captured in APO annotations is not incorporated into the ontology.

An additional ontology design consideration reflects distinctive features of fission yeast biology. S. pombe represents an early-diverging lineage within the Ascomycota (Taphrinomycota, formerly also known as Archiascomycetes; James et al., 2006). To accurately and consistently describe fission yeast phenotypes, we could reasonably expect to need specific terms that would not apply to the other ascomycete fungi (Saccharomycotina and Pezizomycotina) that have been annotated using APO. A new ontology offers maximal freedom to fit fission yeast-specific terms into a more general framework, without extensively restructuring an already-deployed vocabulary.

For these reasons, we have opted to develop FYPO independently despite the similarity in scope to APO.

Ontology content

High-level organization

At the broadest level of classification, FYPO organizes terms along three axes. One axis distinguishes normal from abnormal phenotypes, where ‘normal’ is operationally defined as indistinguishable from characteristics of cells isogenic to the sequenced wild type strain (972 h–), and ‘abnormal’ as detectably different from wild type, under the conditions in which a phenotype is assessed in a particular experiment.

A second axis classifies phenotypes by the entity affected; the broad categories correspond to effects on biological processes (as defined in GO), molecular functions (GO), or cellular structures (corresponding to GO cellular components). The third axis distinguishes phenotypes relevant at the level of a cell from those that can be observed only in a population of cells.

Table 1 shows the top-level classifications in FYPO, with the numbers of is_a descendants and cumulative annotations for each term.


Table 1. Top-level terms in FYPO. For each term, the name and unique ID are shown, along with the number of terms that are its is_a descendants. The final column shows the number of individual annotations to the term or any of its descendants. (Data as of April 22, 2013.)

Term name                      ID             is_a descendants   Annotations
abnormal phenotype             FYPO:0001985               1413          2546
normal phenotype               FYPO:0000257                348           735
cell phenotype                 FYPO:0000002               1749         10323
cell population phenotype      FYPO:0000003                316          7584
biological process phenotype   FYPO:0000300               1248          3604
molecular function phenotype   FYPO:0000652                201           195

Representing common types of phenotype

Further classification of the phenotype terms in FYPO, and their logical definitions, reflects several general categories into which phenotypes fall.

Some phenotypes, such as cell morphology, affect the entire cell (represented by the root of the Cell Ontology, CL:0000000). Morphological changes are also observed at the sub-cellular level, corresponding to GO cellular component terms. Phenotypes that affect cell size or shape refer to morphology qualities from PATO, as do phenotypes involving aberrant subcellular structures.

The largest category of phenotypes comprises those that affect (or inhere in) an entity corresponding to a GO biological process, i.e. phenotypes in which a cellular process does not proceed exactly as in wild-type cells. A smaller, but conceptually similar, group includes phenotypes that affect GO molecular functions such as binding or enzymatic activities. Phenotypes that affect biological processes and molecular functions refer to the corresponding GO terms, combined with PATO terms describing the alteration, e.g. ‘abolished’, ‘delayed’, ‘advanced’ (onset), or increased or decreased rate or frequency of occurrence.

Growth of cells on plates or in liquid medium is often evaluated, and changes in growth under specific conditions are taken to represent sensitivity (as in the case of cell death or decreased growth rate or yield) or resistance (unchanged or increased growth rate or yield) to a stimulus. Sensitivity to various chemicals can be modeled by combining the PATO term ‘increased sensitivity of a process’ with the GO biological process ‘vegetative growth of a single-celled organism’ and a ChEBI term representing the substance. Resistance to a chemical follows the same model, using PATO ‘decreased sensitivity of a process’. Sensitivity and resistance to stimuli other than chemical substances follow a similar pattern, but refer to GO terms for cellular responses to the stimuli. For example, ‘sensitive to osmotic stress’ (FYPO:0000270) refers to the GO term ‘cellular response to osmotic stress’ (GO:0071470).
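The sensitivity and resistance pattern described above can be sketched as a pair of small template functions. This is a minimal sketch under stated assumptions, not the FYPO implementation: the function names and dictionary keys are hypothetical, term labels are used in place of IDs, ‘hydroxyurea’ is only a placeholder chemical, and for non-chemical stimuli we assume the same pair of PATO qualities applies, since the text says only that the pattern is similar.

```python
# GO biological process label used for growth-based chemical phenotypes.
GROWTH = "vegetative growth of a single-celled organism"


def chemical_phenotype(chemical: str, resistant: bool = False) -> dict[str, str]:
    """Compose a sensitivity/resistance definition for a chemical (ChEBI label)."""
    direction = "decreased" if resistant else "increased"
    return {
        "quality": f"{direction} sensitivity of a process",  # PATO label
        "entity": GROWTH,                                     # GO label
        "chemical": chemical,                                 # ChEBI label
    }


def stimulus_phenotype(go_response: str, resistant: bool = False) -> dict[str, str]:
    """Same idea for non-chemical stimuli: the entity is a GO response term.

    Reusing the PATO sensitivity qualities here is an assumption of this sketch.
    """
    direction = "decreased" if resistant else "increased"
    return {
        "quality": f"{direction} sensitivity of a process",
        "entity": go_response,
    }


sensitive_to_drug = chemical_phenotype("hydroxyurea")
resistant_to_drug = chemical_phenotype("hydroxyurea", resistant=True)
osmotic = stimulus_phenotype("cellular response to osmotic stress")
print(sensitive_to_drug, resistant_to_drug, osmotic, sep="\n")
```

Because every chemical phenotype differs only in the ChEBI term and the direction of the quality, a template of this kind is what makes partially automated term creation practical.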

In addition to PATO, GO and ChEBI, FYPO draws on the Sequence Ontology (SO; Eilbeck et al., 2005) for a small number of terms that refer to specific DNA or RNA sequence regions, on a small number of Cell Ontology (CL) terms to distinguish phenotypes that affect vegetatively growing cells or spores, and on a single term from the BRENDA Tissue Ontology (BTO; Gremse et al., 2011), which is used for phenotypes that depend on, or affect, the growth medium.

Table 2 summarizes the usage of OBO ontology terms in FYPO.

Phenotype modeling challenges

FYPO also includes a number of terms that do not fit the simple logical models described above. The principal types are complex phenotypes, which encompass more than one quality, and phenotypes that affect cell populations.

Complex phenotypes can be represented as having simpler phenotypes as parts. To illustrate, Figure 1 shows the portion of FYPO describing ‘mitotic catastrophe’ phenotypes, which arise when defects in mitotic chromosome segregation lead to cell death. The most general mitotic catastrophe term (FYPO:0001047) is defined as the combination of ‘inviable’ (FYPO:0000049) with ‘abnormal mitotic sister chromatid segregation’ (FYPO:0000141). Because mitotic catastrophe may occur with one or more additional features such as altered cell shape or size or a ‘cut’ phenotype (i.e., septation despite abnormal chromosome segregation), several more specific terms are included, and their logical definitions specify the additional parts. ‘Mitotic catastrophe with cut’ (FYPO:0001048), for example, has the additional parts ‘cut’ (FYPO:0000229) and ‘mistimed mitosis’ (FYPO:0001204), whereas ‘mitotic catastrophe, elongated cells’ (FYPO:0001051) adds FYPO:0001204 and ‘elongated vegetative cells’ (FYPO:0001122). All biologically relevant combinatorial possibilities can be built, including ‘mitotic catastrophe with cut, elongated cells’ (FYPO:0001054).
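To make the part-based reasoning concrete, the following sketch treats each mitotic catastrophe term as a set of parts and derives the classification by set inclusion. It is a deliberate simplification of what an ontology reasoner does, and the function names are hypothetical. The parts of the first three terms are taken from the text above; the parts listed for FYPO:0001054 are an assumption (the union of the parts of FYPO:0001048 and FYPO:0001051), since its definition is not spelled out here.

```python
# Parts of each complex phenotype term, keyed by FYPO ID.
BASE = {"FYPO:0000049", "FYPO:0000141"}  # inviable + abnormal sister chromatid segregation
PARTS = {
    "FYPO:0001047": BASE,                                              # mitotic catastrophe
    "FYPO:0001048": BASE | {"FYPO:0000229", "FYPO:0001204"},           # ... with cut
    "FYPO:0001051": BASE | {"FYPO:0001204", "FYPO:0001122"},           # ... elongated cells
    # Assumed for illustration: union of the two more specific definitions.
    "FYPO:0001054": BASE | {"FYPO:0000229", "FYPO:0001204", "FYPO:0001122"},
}


def ancestors(term: str) -> set[str]:
    """Terms whose parts are a proper subset of this term's parts."""
    return {other for other, parts in PARTS.items() if parts < PARTS[term]}


def direct_parents(term: str) -> set[str]:
    """Ancestors that are not themselves more general than another ancestor."""
    candidates = ancestors(term)
    return {
        parent
        for parent in candidates
        if not any(PARTS[parent] < PARTS[other] for other in candidates)
    }


print(sorted(ancestors("FYPO:0001054")))
print(sorted(direct_parents("FYPO:0001054")))
```

Under these assumptions the most specific term ends up with two direct parents, which is the kind of multiple-path classification that Figure 1 depicts.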

Although most fission yeast phenotypes can be represented as properties of a cell (including events taking place in a cell), some phenotypes can only be observed at the level of a population. These cell population phenotypes reflect properties of what cells do in groups, and pose particular challenges for logical modeling because they do not represent characteristics of a single organism. Some examples are colony morphology (‘abnormal colony morphology’ FYPO:0000150), flocculation (‘flocculating cells’ FYPO:0000155), and filament morphology. Some cellular processes can also be studied in cell populations, giving rise to population-level phenotype observations; one example is septation, for which the “septation index”, i.e. the proportion of cells in a population observed undergoing septation under given conditions, can be measured. Although FYPO:0000155 is defined as ‘increased occurrence’ (PATO:0002051) of ‘flocculation’ (GO:0000128), most cell population phenotype terms are among the small fraction in FYPO that do not yet have logical definitions.

Accurately modeling some phenotypes has required the introduction of a few relations that are not defined in the OBO Relations Ontology (RO; http://code.google.com/p/obo-relations/) or the Basic Formal Ontology (BFO; http://www.ifomis.org/bfo/) at present. Some, such as during and its subtypes exists_during and happens_during, which are used to link process or structural phenotypes to time periods such as cell cycle phases, are borrowed from a set of relations developed by the GO Consortium for its annotation extensions (Huntley et al., manuscript in preparation). Others, such as includes_cells_with_phenotype, which links cell population phenotypes with cell-level phenotypes of cells within the population, have been created specifically for FYPO and will be submitted as candidates for addition to RO.

Fig. 1. Several specific types of mitotic catastrophe have been defined based on whether cell size or shape is affected, and whether the cells undergo septation despite the failure of chromosome segregation (‘cut’ phenotype). As the different specific mitotic catastrophe phenotypes support different interpretations of the underlying biology, the distinctions among these related phenotypes are valuable for downstream applications of phenotype annotations. A. Graphical view of terms and is_a relationships, which classify the terms. More specific terms build upon less specific terms by addition of differentiating features. A complex phenotype such as ‘mitotic catastrophe with cut, elongated cells’ has multiple paths to the root (most general term) of the ontology via different parents, allowing annotations at any level of specificity. Also note that the paths in FYPO parallel the paths describing mitosis and the cell cycle in GO as well as those in the cell morphology area of FYPO. B. OBO stanza defining FYPO:0001054 ‘mitotic catastrophe with cut, elongated cells’. Note that the two is_a relationships shown are manually asserted. The logical definition is specified by the intersection_of lines, and the def line provides a human-readable definition.

Table 2. Usage of ontology terms in FYPO logical definitions. Of 2010 total FYPO terms (as of April 22, 2013), 1802 have logical definitions. The table shows the external ontologies used in FYPO logical definitions. ‘Unique external ontology terms’ denotes the number of different terms from the indicated ontology that are used; ‘FYPO terms’ indicates the number of FYPO terms that have a logical definition using one or more terms from the indicated ontology.

Ontology                                           Unique external ontology terms   FYPO terms
BRENDA tissue/enzyme (BTO)                                                      1           91
Chemical Entities of Biological Interest (ChEBI)                              196          480
Cell Ontology (CL)                                                              3          236
Gene Ontology (GO)                                                            570         1880
Phenotypic quality (PATO)                                                      88         1709
Sequence Ontology (SO)                                                         10           31

Current advantages of FYPO usage

FYPO’s modular structure and formal logical definitions confer a number of advantages, as specified below.

In ontology development, it is feasible to manage a large set of terms, to define phenotypes precisely, and to represent phenotype descriptions with complex relationships to each other and to underlying biological phenomena. Reasoning software can use FYPO’s logical definitions to infer links between terms and to detect redundancy and other errors, which streamlines ontology development. Furthermore, because the definitions refer directly and specifically to terms from other OBO ontologies, reasoning over FYPO also keeps its structure consistent with external ontologies such as GO, ChEBI, and PATO. Text details are also more easily managed in FYPO than in a flat, manually managed list. For example, synonymous words and phrases can be included to aid querying. Minor inconsistencies, such as misspellings and duplications, are easily avoided.

In addition to facilitating ontology development and quality control, FYPO supports much more effective manual curation than the legacy vocabulary from GeneDB. With many more specific terms, annotators can capture much richer, more detailed phenotype information. The text and logical definitions help annotators maintain accuracy and consistency in using a large set of ontology terms.

Both the increased specificity and the structure of the ontology also support sophisticated querying and computational analysis.

Terms and annotations relevant to cytokinesis phenotypes illustrate many of the improvements that FYPO has facilitated. This topic is one of a number in which the GeneDB vocabulary had a general descriptor such as “phenotype, cytokinesis defects” included as a substring of more specific entries such as “phenotype, cytokinesis defects, contractile ring, absent”, but the terms were not otherwise related. Although a text search for the more general string would find both terms, a search for genes annotated to the general term would not retrieve genes annotated to the more specific term. In contrast, the FYPO term ‘abnormal actomyosin contractile ring assembly’ (FYPO:0000161) has a logical definition that states that the quality ‘abnormal’ (PATO:0000460) inheres in the process of actomyosin contractile ring assembly during cytokinesis (GO:0000915, ‘cytokinesis, actomyosin contractile ring assembly’). (The inheres_in relation formally states that the PATO quality is an attribute of the GO process or other affected entity.) Figure 2 shows the logical definition for FYPO:0000161 in Manchester syntax and OBO format. The classification of the phenotype term in FYPO parallels that of the biological process term in GO, in which ‘actomyosin contractile ring assembly’ is both a type of ‘actin cytoskeleton organization’ and a part of ‘cytokinesis’. Any mutant alleles annotated to FYPO:0000161 can therefore be retrieved by queries for mutations that affect the actin cytoskeleton as well as those affecting cytokinesis.
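The retrieval behaviour described above can be sketched as a query over the ancestor closure of each annotated term. The following is a toy example, not PomBase code: the parent term labels are hypothetical stand-ins for the FYPO terms that parallel the GO classification, `geneA` and `geneB` are placeholder gene names, and the function names are our own.

```python
# Toy phenotype hierarchy: each term maps to its more general parent terms.
PARENTS = {
    "abnormal actomyosin contractile ring assembly": {
        "abnormal actin cytoskeleton organization",  # hypothetical label
        "abnormal cytokinesis",                       # hypothetical label
    },
    "abnormal actin cytoskeleton organization": set(),
    "abnormal cytokinesis": set(),
}

# Placeholder annotations: gene -> phenotype terms it is directly annotated to.
ANNOTATIONS = {
    "geneA": {"abnormal actomyosin contractile ring assembly"},
    "geneB": {"abnormal cytokinesis"},
}


def with_ancestors(term: str) -> set[str]:
    """Return the term together with every term reachable via PARENTS."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def genes_for(query: str) -> set[str]:
    """Genes annotated to the query term or to any of its descendants."""
    return {
        gene
        for gene, terms in ANNOTATIONS.items()
        if any(query in with_ancestors(term) for term in terms)
    }


print(sorted(genes_for("abnormal cytokinesis")))
print(sorted(genes_for("abnormal actin cytoskeleton organization")))
```

A flat vocabulary corresponds to the case where `PARENTS` is empty, so a query for the general term would miss `geneA`; the hierarchy is what makes both queries find it.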

Normal phenotypes are represented in FYPO using the same logical structures, and at the same level of detail, as abnormal phenotypes. Fission yeast is particularly amenable to normal phenotype annotation because the commonly used laboratory strains are all isogenic, making unambiguous recognition of normal, and therefore also abnormal, characteristics straightforward. Annotation of normal phenotypes allows curators to document mutations that cause no phenotypic changes with respect to certain assays, or under standard growth conditions. This provides important, albeit negative, information about gene function, and makes the total set of fission yeast phenotypes more comprehensive. For example, deletion of the small GTPase Ras1 causes defects in conjugation (mating) and sporulation (Hughes et al., 1990; Herskowitz, 1995; Papadaki et al., 2002). Deletion of the Ras1-activating guanyl nucleotide exchange protein Ste6, however, causes defects only in conjugation. The normal sporulation phenotype observed in the ste6 null mutant indicates that Ras1 must have other regulators and downstream effectors besides Ste6.

Fig. 2. Representation of FYPO:0000161, ‘abnormal actomyosin contractile ring assembly’, and its logical definition. A. Manchester syntax. B. OBO format.

3 METHODS

FYPO is built using OBO-Edit (Day-Richter et al., 2007). Logical definitions are constructed in OBO-Edit for each term that can be represented as described in the PATO XP best practices (http://obofoundry.org/wiki/index.php/PATO:XP_Best_Practice). In particular, FYPO uses inheres_in for qualities of both processes and physical entities, as is common in other related efforts (Mungall et al., 2010). It has since transpired that future versions of BFO may prohibit this usage, in which case we will either modify the pattern we use, or use a broader relation, which will be incorporated in RO (C. J. Mungall, personal communication). The initial set of FYPO terms was based on a set of 208 free-text descriptors used to annotate deletion (null) phenotypes in GeneDB. Additional terms were generated by combining a PATO quality with GO terms frequently used in S. pombe annotations supported by phenotypic evidence (using the evidence code “inferred from mutant phenotype”, IMP). In ongoing FYPO development, terms are added or modified as needed to describe phenotypes in published literature accurately and precisely. Term requests may come from PomBase curators or community researchers. Regular releases of FYPO are generated using the OBO Ontology Release Tool (Oort; http://code.google.com/p/owltools/wiki/Oort), and include OBO and OWL formats. Reasoning uses the ELK reasoner (Kazakov et al., 2011) as part of the Oort release process. Links inferred by the reasoner during the Oort release are reviewed periodically, and any ontology errors that cause anomalous inferences are corrected.
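The term-generation step described above (combining a PATO quality with GO terms that are frequently used in IMP-supported annotations) might look roughly like the sketch below. It is illustrative only: the function name, the input layout of (GO term label, evidence code) pairs, the example annotations and the frequency threshold `min_count` are all assumptions, as the text does not state how “frequently used” was determined.

```python
from collections import Counter


def candidate_terms(annotations, quality="abnormal", min_count=2):
    """Propose phenotype term names from GO terms often annotated with IMP.

    annotations: iterable of (go_term_label, evidence_code) pairs.
    min_count:   hypothetical cut-off for what counts as frequently used.
    """
    counts = Counter(go for go, evidence in annotations if evidence == "IMP")
    return [
        f"{quality} {go}"
        for go, n in counts.most_common()  # most frequently used first
        if n >= min_count
    ]


# Made-up annotations for illustration.
example = [
    ("cytokinesis", "IMP"),
    ("cytokinesis", "IMP"),
    ("cytokinesis", "IDA"),  # not phenotypic evidence, so ignored
    ("conjugation", "IMP"),
    ("mitotic sister chromatid segregation", "IMP"),
    ("mitotic sister chromatid segregation", "IMP"),
    ("mitotic sister chromatid segregation", "IMP"),
]
proposed = candidate_terms(example)
print(proposed)
```

Candidates produced this way would still need review by a curator before being added to the ontology, in line with the manual development process described above.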

4 DISCUSSION

We have developed a formal ontology of phenotypes observed in fission yeast, which now includes over 1900 terms, to support phenotype curation in PomBase.

Applications of FYPO

The primary application of FYPO is to provide the phenotype information demanded by the S. pombe research community, initially in the form of annotations displayed on PomBase gene pages, detailing alleles, type of supporting evidence, and literature citations as well as FYPO terms. At present all fission yeast phenotype annotation is supported by published experimental data that have been manually curated. To date, over 6000 legacy annotations have been converted from the GeneDB controlled vocabulary to FYPO terms, and a comparable number of new annotations have been curated.

Enhanced phenotype description using FYPO supports curation of both classical low-throughput and emerging high-throughput experiments. The latter will become increasingly important as researchers use the genome-wide deletion collection that has recently become available; genome-wide viability data are published (Kim et al., 2010) and many more comprehensive phenotype screens are possible. We also include phenotype curation in the new community curation tool (Rutherford et al., manuscript in preparation), which enables us to incorporate phenotype annotations, along with supporting data on alleles and experimental conditions, directly from expert researchers. Moreover, because users can request new phenotype terms, community contributions offer substantial benefits to the phenotype ontology itself as well as the collection of S. pombe phenotype annotations.

As high-throughput experiments become more common, and manual curation of phenotypes from small-scale experiments becomes more complete, the body of S. pombe phenotype data will become sufficiently comprehensive to support enrichment analyses analogous to those routinely performed using GO annotations (for example, see Xue-Franzen et al., 2006; Shimanuki et al., 2007; Helmlinger et al., 2008; Deshpande et al., 2009; Kim et al., 2010; Marguerat et al., 2012). We anticipate that comprehensive phenotype annotation, and analyses thereof, will complement GO annotation data. PomBase curators have begun reviewing GO annotations based on mutant phenotypes, to remove those that are known to represent indirect “downstream” effects. Many experimenters, however, will likely want to include both direct and indirect effects when analysing processes over- or under-represented in gene sets. The use of phenotype annotations to capture indirect effects, combined with GO annotations representing direct effects, allows us to maintain the direct–indirect distinction while supporting comprehensive enrichment analyses. As an example, a number of genes, such as the ER calcium-transporting ATPase Cta4, the pantothenate transporter Liz1, the mitochondrial DNA polymerase Pog1 and the DNA replication factor A subunit Ssb1, have annotations to FYPO terms describing cytokinesis defects, but are not annotated to cytokinesis in GO. FYPO annotations will also provide access to statistical over-representation of gene lists for cellular phenomena that fall outside the scope of GO (such as drug sensitivity, cell shape defects, cell lysis etc.). We further speculate that sets of mutants will emerge with the same phenotypic signatures but distinct GO categories, which would suggest that distinct sets of proteins may be involved (directly or indirectly) in the same cellular processes; such features would not be immediately evident from GO annotation alone.
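For readers unfamiliar with enrichment analysis, the over-representation test commonly applied to GO annotations, and applicable in the same way to phenotype annotations, is a one-sided hypergeometric test. The sketch below is a generic illustration rather than a method used by PomBase; the function name is our own and the numbers in the usage example are invented.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int,
                           sample: int, hits: int) -> float:
    """P(X >= hits) for a gene list of size `sample` drawn without replacement
    from `population` genes, of which `annotated` carry the phenotype term."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return favourable / total


# Invented numbers: 10 genes in total, 4 annotated to the term,
# a gene list of 3 genes, 2 of which carry the annotation.
p_value = hypergeom_enrichment_p(population=10, annotated=4, sample=3, hits=2)
print(p_value)
```

In practice such a test would be run for every term, counting annotations to a term and all of its descendants, and the resulting p-values would be corrected for multiple testing.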

Logical structure and data integration

We have opted to pre-compose FYPO terms, primarily to simplify the annotation process. Although pre- and post-composed terms may be semantically equivalent, the parallel annotation process required with post-composition is not well suited to community curation. The simpler procedure of annotating to a single pre-composed term is more intuitive for bench biologists.

Because over 90% of FYPO terms have logical EQ definitions, however, we can also realize the benefits of explicit references to other OBO ontologies and reasoning. Notably, FYPO is compatible with the Cell Phenotype Ontology (CPO; Hoehndorf et al., 2012), a species-neutral ontology of morphological and physiological phenotypic characteristics of cells, cell components and cellular processes that supports automated synchronization with GO and integration of cellular phenotype data across species. Like CPO, FYPO defines many phenotypes at the cellular level, in terms of cellular processes or structures (both referring to GO) and how they are affected (referring to PATO). The shared aspects of phenotype representation mean that FYPO will be able to take advantage of CPO’s automated synchronization to maintain consistency with GO and PATO. Conversely, the inclusion of FYPO and its associated fission yeast phenotype annotations provides CPO with a set of high-quality data representing an important model organism, enriching its integrated data sets.

On a related note, the developers of the Ontology of Microbial Phenotypes (OMP; http://microbialphenotypes.org/) are taking a similar approach to construct EQ-based descriptions of phenotypes observed in microorganisms, especially in E. coli. As considerable overlap in scope is likely between OMP and FYPO, the common underlying ontology structure will facilitate possible future integration of ontology terms or annotation data. Because EQ model-based phenotype integration methods can be used to align pre- and post-composed phenotype terms (Mungall et al., 2010), we can also explore ways to align APO with FYPO to improve sharing of phenotype descriptions and annotation data.

Future work

As manual curation of phenotypes continues, we will add terms to FYPO as required, and we will explore ways to improve formal phenotype representations. For example, we anticipate that the recently launched Population and Community Ontology (PCO; http://code.google.com/p/popcomm-ontology/) will provide terms that can be incorporated into logical definitions for FYPO population phenotypes.

We also envision extending FYPO to accommodate high-throughput experiments. For example, we will add complex phenotypes such as whole-transcriptome signatures used for eQTL mapping. High-throughput screens will also capture quantitative data associated with phenotypes such as growth rates, survival rates following stress, or cell size and shape. Few of the challenges of modeling quantitative phenotypes have been met among the broader community of ontology developers working on phenotype representation, but we will work with both S. pombe researchers and ontology developers to meet emerging community needs.

ACKNOWLEDGEMENTS

We thank George Gkoutos and Robert Hoehndorf for advice on constructing logical phenotype definitions, Heiko Dietze and Chris Mungall for helpful discussions and assistance with Oort, and Jacky Hayles for contributions to several phenotype definitions. We also thank all PomBase project participants not listed as authors: Kim Rutherford, Mark McDowall, Dan Staines, and Paul Kersey.

Funding: This work was supported by the Wellcome Trust [WT090548MA to SGO].

REFERENCES

Bradford, Y., Conlin, T., Dunn, N., Fashena, D., Frazer, K., Howe, D. G., Knight, J., Mani, P., Martin, R., Moxon, S. A. T., Paddock, H., Pich, C., Ramachandran, S., Ruef, B. J., Ruzicka, L., Schaper, H. B., Schaper, K., Shao, X., Singer, A., Sprague, J., Sprunger, B., Slyke, C. V., and Westerfield, M. (2011). ZFIN: enhancements and updates to the Zebrafish Model Organism Database. Nucleic Acids Res, 39(Database issue), D822–829.

Day-Richter, J., Harris, M. A., Haendel, M., The Gene Ontology OBO-Edit Working Group, and Lewis, S. (2007). OBO-Edit–an ontology editor for biologists. Bioinformatics, 23(16), 2198–2200.

de Matos, P., Alcantara, R., Dekker, A., Ennis, M., Hastings, J., Haug, K., Spiteri, I., Turner, S., and Steinbeck, C. (2010). Chemical Entities of Biological Interest: an update. Nucleic Acids Res, 38(Database issue), D249–254.

Deshpande, G. P., Hayles, J., Hoe, K.-L., Kim, D.-U., Park, H.-O., and Hartsuiker, E. (2009). Screening a genome-wide S. pombe deletion library identifies novel genes and pathways involved in genome stability maintenance. DNA Repair (Amst), 8(5), 672–679.

Egel, R., editor (2004). The Molecular Biology of Schizosaccharomyces pombe. Springer-Verlag, Berlin, Germany.

Eilbeck, K., Lewis, S. E., Mungall, C. J., Yandell, M., Stein, L., Durbin, R., and Ashburner, M. (2005). The Sequence Ontology: a tool for the unification of genome annotations. Genome Biol, 6(5), R44.

Engel, S. R., Balakrishnan, R., Binkley, G., Christie, K. R., Costanzo, M. C., Dwight, S. S., Fisk, D. G., Hirschman, J. E., Hitz, B. C., Hong, E. L., Krieger, C. J., Livstone, M. S., Miyasato, S. R., Nash, R., Oughtred, R., Park, J., Skrzypek, M. S., Weng, S., Wong, E. D., Dolinski, K., Botstein, D., and Cherry, J. M. (2010). Saccharomyces Genome Database provides mutant phenotype data. Nucleic Acids Res, 38(Database issue), D433–436.

Fey, P., Gaudet, P., Curk, T., Zupan, B., Just, E. M., Basu, S., Merchant, S. N., Bushmanova, Y. A., Shaulsky, G., Kibbe, W. A., and Chisholm, R. L. (2009). dictyBase–a Dictyostelium bioinformatics resource update. Nucleic Acids Res, 37(Database issue), D515–519.

Gkoutos, G. V., Mungall, C., Dolken, S., Ashburner, M., Lewis, S., Hancock, J., Schofield, P., Kohler, S., and Robinson, P. N. (2009). Entity/quality-based logical definitions for the human skeletal phenome using PATO. Conf Proc IEEE Eng Med Biol Soc, 2009, 7069–7072.

Gremse, M., Chang, A., Schomburg, I., Grote, A., Scheer, M., Ebeling, C., and Schomburg, D. (2011). The BRENDA Tissue Ontology (BTO): the first all-integrating ontology of all organisms for enzyme sources. Nucleic Acids Res, 39(Database issue), D507–513.

Helmlinger, D., Marguerat, S., Villen, J., Gygi, S. P., Bähler, J., and Winston, F. (2008). The S. pombe SAGA complex controls the switch from proliferation to sexual differentiation through the opposing roles of its subunits Gcn5 and Spt8. Genes Dev, 22(22), 3184–3195.

Herskowitz, I. (1995). MAP kinase pathways in yeast: for mating and more. Cell, 80(2), 187–197.

Hoehndorf, R., Harris, M. A., Herre, H., Rustici, G., and Gkoutos, G. V. (2012). Semantic integration of physiology phenotypes with an application to the Cellular Phenotype Ontology. Bioinformatics, 28(13), 1783–1789.


Hughes, D. A., Fukui, Y., and Yamamoto, M. (1990). Homologous activators of ras in fission and budding yeast. Nature, 344(6264), 355–357.

James, T. Y., Kauff, F., Schoch, C. L., Matheny, P. B., Hofstetter, V., Cox, C. J., Celio, G., Gueidan, C., Fraker, E., Miadlikowska, J., Lumbsch, H. T., Rauhut, A., Reeb, V., Arnold, A. E., Amtoft, A., Stajich, J. E., Hosaka, K., Sung, G.-H., Johnson, D., O’Rourke, B., Crockett, M., Binder, M., Curtis, J. M., Slot, J. C., Wang, Z., Wilson, A. W., Schussler, A., Longcore, J. E., O’Donnell, K., Mozley-Standridge, S., Porter, D., Letcher, P. M., Powell, M. J., Taylor, J. W., White, M. M., Griffith, G. W., Davies, D. R., Humber, R. A., Morton, J. B., Sugiyama, J., Rossman, A. Y., Rogers, J. D., Pfister, D. H., Hewitt, D., Hansen, K., Hambleton, S., Shoemaker, R. A., Kohlmeyer, J., Volkmann-Kohlmeyer, B., Spotts, R. A., Serdani, M., Crous, P. W., Hughes, K. W., Matsuura, K., Langer, E., Langer, G., Untereiner, W. A., Lucking, R., Budel, B., Geiser, D. M., Aptroot, A., Diederich, P., Schmitt, I., Schultz, M., Yahr, R., Hibbett, D. S., Lutzoni, F., McLaughlin, D. J., Spatafora, J. W., and Vilgalys, R. (2006). Reconstructing the early evolution of Fungi using a six-gene phylogeny. Nature, 443(7113), 818–822.

Kazakov, Y., Krotzsch, M., and Simančík, F. (2011). Concurrent classification of EL ontologies. In L. Aroyo, C. Welty, H. Alani, J. Taylor, A. Bernstein, L. Kagal, N. Noy, and E. Blomqvist, editors, Proceedings of the 10th International Semantic Web Conference (ISWC’11), volume 7032 of LNCS. Springer.

Kim, D.-U., Hayles, J., Kim, D., Wood, V., Park, H.-O., Won, M., Yoo, H.-S., Duhig, T., Nam, M., Palmer, G., Han, S., Jeffery, L., Baek, S.-T., Lee, H., Shim, Y. S., Lee, M., Kim, L., Heo, K.-S., Noh, E. J., Lee, A.-R., Jang, Y.-J., Chung, K.-S., Choi, S.-J., Park, J.-Y., Park, Y., Kim, H. M., Park, S.-K., Park, H.-J., Kang, E.-J., Kim, H. B., Kang, H.-S., Park, H.-M., Kim, K., Song, K., Song, K. B., Nurse, P., and Hoe, K.-L. (2010). Analysis of a genome-wide set of gene deletions in the fission yeast Schizosaccharomyces pombe. Nat Biotechnol, 28(6), 617–623.

Mabee, P. M., Ashburner, M., Cronk, Q., Gkoutos, G. V., Haendel, M., Segerdell, E., Mungall, C., and Westerfield, M. (2007). Phenotype ontologies: the bridge between genomics and evolution. Trends Ecol Evol, 22(7), 345–350.

Marguerat, S., Schmidt, A., Codlin, S., Chen, W., Aebersold, R., and Bähler, J. (2012). Quantitative analysis of fission yeast transcriptomes and proteomes in proliferating and quiescent cells. Cell, 151(3), 671–683.

Meehan, T. F., Masci, A. M., Abdulla, A., Cowell, L. G., Blake, J. A., Mungall, C. J., and Diehl, A. D. (2011). Logical development of the Cell Ontology. BMC Bioinformatics, 12, 6.

Mungall, C. J., Gkoutos, G. V., Smith, C. L., Haendel, M. A., Lewis, S. E., and Ashburner, M. (2010). Integrating phenotype ontologies across multiple species. Genome Biol, 11(1), R2.

Papadaki, P., Pizon, V., Onken, B., and Chang, E. C. (2002). Two ras pathways in fission yeast are differentially regulated by two ras guanine nucleotide exchange factors. Mol Cell Biol, 22(13), 4598–4606.

Shimanuki, M., Chung, S.-Y., Chikashige, Y., Kawasaki, Y., Uehara, L., Tsutsumi, C., Hatanaka, M., Hiraoka, Y., Nagao, K., and Yanagida, M. (2007). Two-step, extensive alterations in the transcriptome from G0 arrest to cell division in Schizosaccharomyces pombe. Genes Cells, 12(5), 677–692.

Smith, B., Ashburner, M., Rosse, C., Bard, J., Bug, W., Ceusters, W., Goldberg, L. J., Eilbeck, K., Ireland, A., Mungall, C. J., OBI Consortium, Leontis, N., Rocca-Serra, P., Ruttenberg, A., Sansone, S.-A., Scheuermann, R. H., Shah, N., Whetzel, P. L., and Lewis, S. (2007). The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat Biotechnol, 25(11), 1251–1255.

Smith, C. L. and Eppig, J. T. (2009). The Mammalian Phenotype Ontology: enabling robust annotation and comparative analysis. Wiley Interdiscip Rev Syst Biol Med, 1(3), 390–399.

The Gene Ontology Consortium (2000). Gene Ontology: tool for the unification of biology. Nat Genet, 25(1), 25–29.

The Gene Ontology Consortium (2012). The Gene Ontology: enhancements for 2011. Nucleic Acids Res, 40(Database issue), D559–564.

Wood, V., Gwilliam, R., Rajandream, M.-A., Lyne, M., Lyne, R., Stewart, A., Sgouros, J., Peat, N., Hayles, J., Baker, S., Basham, D., Bowman, S., Brooks, K., Brown, D., Brown, S., Chillingworth, T., Churcher, C., Collins, M., Connor, R., Cronin, A., Davis, P., Feltwell, T., Fraser, A., Gentles, S., Goble, A., Hamlin, N., Harris, D., Hidalgo, J., Hodgson, G., Holroyd, S., Hornsby, T., Howarth, S., Huckle, E. J., Hunt, S., Jagels, K., James, K., Jones, L., Jones, M., Leather, S., McDonald, S., McLean, J., Mooney, P., Moule, S., Mungall, K., Murphy, L., Niblett, D., Odell, C., Oliver, K., O’Neil, S., Pearson, D., Quail, M. A., Rabbinowitsch, E., Rutherford, K., Rutter, S., Saunders, D., Seeger, K., Sharp, S., Skelton, J., Simmonds, M., Squares, R., Squares, S., Stevens, K., Taylor, K., Taylor, R. G., Tivey, A., Walsh, S., Warren, T., Whitehead, S., Woodward, J., Volckaert, G., Aert, R., Robben, J., Grymonprez, B., Weltjens, I., Vanstreels, E., Rieger, M., Schafer, M., Muller-Auer, S., Gabel, C., Fuchs, M., Dusterhoft, A., Fritzc, C., Holzer, E., Moestl, D., Hilbert, H., Borzym, K., Langer, I., Beck, A., Lehrach, H., Reinhardt, R., Pohl, T. M., Eger, P., Zimmermann, W., Wedler, H., Wambutt, R., Purnelle, B., Goffeau, A., Cadieu, E., Dreano, S., Gloux, S., Lelaure, V., Mottier, S., Galibert, F., Aves, S. J., Xiang, Z., Hunt, C., Moore, K., Hurst, S. M., Lucas, M., Rochet, M., Gaillardin, C., Tallada, V. A., Garzon, A., Thode, G., Daga, R. R., Cruzado, L., Jimenez, J., Sanchez, M., del Rey, F., Benito, J., Domínguez, A., Revuelta, J. L., Moreno, S., Armstrong, J., Forsburg, S. L., Cerutti, L., Lowe, T., McCombie, W. R., Paulsen, I., Potashkin, J., Shpakovski, G. V., Ussery, D., Barrell, B. G., Nurse, P., and Cerrutti, L. (2002). The genome sequence of Schizosaccharomyces pombe. Nature, 415(6874), 871–880.

Wood, V., Harris, M. A., McDowall, M. D., Rutherford, K., Vaughan, B. W., Staines, D. M., Aslett, M., Lock, A., Bähler, J., Kersey, P. J., and Oliver, S. G. (2012). PomBase: a comprehensive online resource for fission yeast. Nucleic Acids Res, 40(Database issue), D695–699.

Xue-Franzen, Y., Kjaerulff, S., Holmberg, C., Wright, A., and Nielsen, O. (2006). Genomewide identification of pheromone-targeted transcription in fission yeast. BMC Genomics, 7, 303.


Submitted Publication

10. Canto: An online tool for community literature curation.

BIOINFORMATICS APPLICATIONS NOTE, Vol. 30 no. 12 2014, pages 1791–1792, doi:10.1093/bioinformatics/btu103

Databases and ontologies. Advance Access publication February 25, 2014

Canto: an online tool for community literature curation

Kim M. Rutherford1,2,*, Midori A. Harris1,2, Antonia Lock3, Stephen G. Oliver1,2 and Valerie Wood1,2

1Cambridge Systems Biology Centre, 2Department of Biochemistry, University of Cambridge, Sanger Building, 80 Tennis Court Road, Cambridge CB2 1GA and 3Department of Genetics, Evolution and Environment, and UCL Cancer Institute, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK

Associate Editor: Jonathan Wren

ABSTRACT

Motivation: Detailed curation of published molecular data is essential for any model organism database. Community curation enables researchers to contribute data from their papers directly to databases, supplementing the activity of professional curators and improving coverage of a growing body of literature. We have developed Canto, a web-based tool that provides an intuitive curation interface for both curators and researchers, to support community curation in the fission yeast database, PomBase. Canto supports curation using OBO ontologies, and can be easily configured for use with any species.

Availability: Canto code and documentation are available under an Open Source license from http://curation.pombase.org/. Canto is a component of the Generic Model Organism Database (GMOD) project (http://www.gmod.org/).

Contact: [email protected]

Received on December 17, 2013; revised on February 12, 2014; accepted on February 13, 2014

1 INTRODUCTION

The major activity of any model organism database (MOD) is the manual curation of gene-specific information from peer-reviewed research articles, a time- and labour-intensive process that involves reading publications and associating novel biological information with genes or other biological features. Several factors now motivate databases to develop alternative curation strategies to supplement the efforts of professional curators to maintain comprehensive annotation. Most pressingly, continuing growth in both the number of papers published, and the amount and complexity of information contained in a typical paper, threatens to outstrip the capacity of database staff. In addition, curators’ biological knowledge tends towards breadth rather than depth; a curator may annotate a paper on an unfamiliar topic in less than optimal detail, or make errors that experts would avoid.

PomBase (Wood et al., 2012), the MOD for the fission yeast Schizosaccharomyces pombe, has introduced a community curation initiative that engages researchers in direct curation of their publications, addressing issues of both literature volume and specialized knowledge simultaneously. To support this, we have developed Canto, a web-based tool that enables professional curators and publication authors to capture detailed biological knowledge accurately and consistently, using ontologies from the OBO Foundry collection (Smith et al., 2007). Canto can be configured to use gene (or gene product) identifiers for any species, as well as any of several ontologies, and can therefore be readily adapted for diverse uses.

2 CURATION INTERFACE

Canto provides a simple, intuitive annotation interface that requires no specialized training for use. The user is guided step-by-step through the annotation procedure, ensuring that all essential, and any optional, data required by a particular MOD are collected.

In Canto, annotation is organized at the level of an individual publication. For any paper, the first curation step is to specify the genes (or gene products) to be annotated. For each gene, the user then selects a type of data to curate. The types of identifiers allowed and the available data types are determined by configuration (see Section 4). Subsequent annotation steps are specific to the data type.

User documentation is provided as web pages and mouse-over tooltips. A destination for user requests, such as a helpdesk address, can also be configured.

2.1 Curation using ontology terms

Most curation types in Canto use terms from bio-ontologies. Current Canto implementations use the Gene Ontology (GO) (The Gene Ontology Consortium, 2013) for function, process and component annotations and PSI-MOD for protein modifications (Montecchi-Palazzi et al., 2008). The PomBase Canto instance uses the Fission Yeast Phenotype Ontology (Harris et al., 2013) for phenotype annotation, but any other ontology of precomposed phenotypes can be substituted. To simplify ontology navigation for novice users, details of complex ontology structure are hidden. Instead, the user types familiar search strings, and then selects a relevant general term from a list of matching term names and synonyms. The user is then directed to the most specific applicable child term. Links to external ontology browsers (such as AmiGO and QuickGO) provide access to the ancestry and context of the term.

The interface then guides the user through subsequent steps that gather evidence and additional supporting data. For example, all ontology annotations require evidence, selected from options tailored for the specific ontology. Phenotype annotations capture details of alleles, expression levels and experimental conditions. Finally, annotations can be transferred from one gene to another, streamlining the curation process. Because annotations are made using precise ontology terms without free text input, format and syntax errors are avoided. Users can, however, provide comments pertaining to individual annotations or to the whole article.

*To whom correspondence should be addressed.

© The Author 2014. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
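The term search described in Section 2.1, in which a typed string is matched against term names and synonyms, can be illustrated with a small in-memory sketch. This is not Canto's implementation (Section 3 states that Canto uses a CLucene index); it is a plain substring match, and the term IDs, names and synonyms below are placeholders chosen for illustration.

```python
from typing import NamedTuple


class Term(NamedTuple):
    term_id: str
    name: str
    synonyms: tuple


# Illustrative terms only; IDs, names and synonyms are placeholders.
TERMS = [
    Term("EX:0000001", "cell cycle", ("cell-division cycle",)),
    Term("EX:0000002", "cytokinesis", ("cell division",)),
    Term("EX:0000003", "DNA replication", ()),
]


def search(query, terms=TERMS):
    """Return (term_id, matched_text) for names or synonyms containing the query."""
    q = query.strip().lower()
    hits = []
    if not q:
        return hits
    for term in terms:
        # Check the primary name first, then each synonym.
        for text in (term.name, *term.synonyms):
            if q in text.lower():
                hits.append((term.term_id, text))
                break  # report each term once
    return hits


print(search("division"))
```

Returning the matched text alongside the identifier lets an interface show the user why a term was suggested, which matters when the match came from a synonym rather than the term name.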

2.2 Interaction curation

In addition to assigning ontology terms to genes, users can curate genetic and physical interactions. Starting from one gene, the user selects an interaction type (physical or genetic), an interacting gene and an experiment type. Canto is configured to use BioGRID experiment types by default (Chatr-Aryamontri et al., 2013).
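An interaction annotation of this kind has four parts: a gene, an interacting gene, an interaction type and an experiment type. The sketch below shows one way such a record could be validated. The class and field names are assumptions made for illustration, the gene names are placeholders, and the experiment types are a small subset written in BioGRID style rather than Canto's configured list.

```python
from dataclasses import dataclass

# Experiment types grouped by interaction type. A small illustrative
# subset in BioGRID style, not the full configured vocabulary.
EXPERIMENT_TYPES = {
    "physical": {"Two-hybrid", "Affinity Capture-MS"},
    "genetic": {"Synthetic Lethality", "Synthetic Rescue"},
}


@dataclass(frozen=True)
class Interaction:
    gene: str
    interactor: str
    interaction_type: str  # "physical" or "genetic"
    experiment_type: str

    def __post_init__(self):
        allowed = EXPERIMENT_TYPES.get(self.interaction_type)
        if allowed is None:
            raise ValueError(f"unknown interaction type: {self.interaction_type}")
        if self.experiment_type not in allowed:
            raise ValueError(
                f"{self.experiment_type!r} is not a {self.interaction_type} experiment type"
            )


# Placeholder gene names; a real record would use database identifiers.
record = Interaction("geneA", "geneB", "genetic", "Synthetic Lethality")
print(record)
```

Restricting each field to a controlled list is what allows the interface to avoid free-text input for these annotations.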

2.3 Literature and curation management

Canto includes an administrator interface that supports literature- and curation-management tasks. Papers are retrieved from PubMed according to administrator-specified criteria, such as organism or publication date. Administrators can then use the literature triage function to classify papers by type (e.g. curatable, review, methods) and prioritize for curation. Administrators can select and curate papers, or invite authors to curate publications. Users can also select their own papers for curation via a publication search. Administrators can monitor curation progress, amend annotations in any active session and flag curation sessions as approved for public release.

3 METHODS

Canto is implemented in Perl using the Catalyst web framework and other widely used Perl packages, and has been engineered to ensure that new annotation types can be added easily. In its standard mode of operation, Canto has no external dependencies, although it can be configured to use web services to retrieve gene and publication details. All data is stored locally using the SQLite library. A CLucene (http://clucene.sourceforge.net/) index of ontology term names and synonyms supplies suggestions to the search autocomplete feature. A small amount of JavaScript is used on the browser side to make the application more responsive.

Canto can export in JSON format for loading into databases that use the Chado schema (Mungall et al., 2007), or for archiving or other applications. Curated GO data can be exported in Gene Association File format (Balakrishnan et al., 2013).
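To make the Gene Association File export more concrete, the following sketch writes one annotation as a tab-delimited line. It assumes the 17-column layout of GAF 2.x; the dictionary keys are names chosen here for readability, the gene identifier, symbol, reference and date are placeholders, and none of this is taken from Canto's own exporter.

```python
# Column order of a GAF 2.x line (17 tab-separated fields).
# The key names are labels chosen for this sketch.
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type", "taxon",
    "date", "assigned_by", "annotation_extension", "gene_product_form_id",
]


def to_gaf_line(annotation):
    """Serialize one annotation dict; missing columns become empty fields."""
    return "\t".join(str(annotation.get(col, "")) for col in GAF_COLUMNS)


# Placeholder values for illustration; only the GO ID (cell cycle) and
# the taxon ID (Schizosaccharomyces pombe) refer to real entries.
example = {
    "db": "PomBase",
    "db_object_id": "GENE_ID_1",
    "db_object_symbol": "gene1",
    "go_id": "GO:0007049",
    "db_reference": "PMID:0000000",
    "evidence_code": "IMP",
    "aspect": "P",
    "db_object_type": "protein",
    "taxon": "taxon:4896",
    "date": "20140101",
    "assigned_by": "PomBase",
}

print(to_gaf_line(example))
```

Keeping the column order in a single list means optional fields, such as the annotation extension column, are emitted as empty strings rather than shifting later columns.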

4 CURRENT IMPLEMENTATIONS

The original implementation of Canto supports community curation for S. pombe literature, as part of the PomBase project. Because many aspects of Canto, such as supported ontologies, and gene/gene product identifiers, are fully configurable, Canto can be easily deployed for other organisms, with or without a dedicated organism-specific database. We have set up two additional Canto installations, illustrating its flexibility. In a species-specific example, literature triage for the yeast Komagataella pastoris (formerly Pichia pastoris) has been completed, and annotation is planned (D. Dikicioglu et al., manuscript in preparation). A species-independent version of Canto supports GO annotation using UniProtKB protein accessions. Current Canto installations, including a demonstration tool, are accessible on the Canto home page (http://curation.pombase.org/).

5 FUTURE DEVELOPMENT

Canto will be enhanced to support ontology subsets, taxon restrictions (Deegan et al., 2010) and annotation extensions (R.P. Huntley et al., manuscript in preparation). We will also incorporate semantic checks for logical consistency and comprehensive annotation. To improve efficiency, we will enable Canto to link to TermGenie (http://termgenie.org; H. Dietze et al., manuscript in preparation), which streamlines the creation of new GO terms. To increase interoperability, we plan to provide functionality to export to GPAD (The Gene Ontology Consortium, 2013) and other useful formats as needed.

ACKNOWLEDGEMENTS

We thank Chris Brown, University of Otago, New Zealand and Chris Mungall, Lawrence Berkeley National Laboratory, U.S.A. for help and advice during the development of Canto.

Funding: Wellcome Trust (grant WT090548MA to S.G.O.).

Conflict of Interest: none declared.

REFERENCES

Balakrishnan, R. et al. (2013) A guide to best practices for Gene Ontology (GO) manual annotation. Database (Oxford).

Chatr-Aryamontri, A. et al. (2013) The BioGRID Interaction Database: 2013 update. Nucleic Acids Res., 41, D816–D823.

Deegan, J. et al. (2010) Formalization of taxon-based constraints to detect inconsistencies in annotation and ontology development. BMC Bioinform., 11, 530.

Harris, M. et al. (2013) FYPO: the fission yeast phenotype ontology. Bioinformatics, 29, 1671–1678.

Montecchi-Palazzi, L. et al. (2008) The PSI-MOD community standard for representation of protein modification data. Nat. Biotechnol., 26, 864–866.

Mungall, C. et al. (2007) A Chado case study: an ontology-based modular schema for representing genome-associated biological information. Bioinformatics, 23, i337–i346.

Smith, B. et al. (2007) The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat. Biotechnol., 25, 1251–1255.

The Gene Ontology Consortium. (2013) Gene Ontology annotations and resources. Nucleic Acids Res., 41, D530–D535.

Wood, V. et al. (2012) PomBase: a comprehensive online resource for fission yeast. Nucleic Acids Res., 40, D695–D699.


Submitted Publication

11. A genome-wide resource of cell cycle and cell shape genes of fission yeast.

rsob.royalsocietypublishing.org

Research

Cite this article: Hayles J, Wood V, Jeffery L, Hoe K-L, Kim D-U, Park H-O, Salas-Pino S, Heichinger C, Nurse P. 2013 A genome-wide resource of cell cycle and cell shape genes of fission yeast. Open Biol 3: 130053. http://dx.doi.org/10.1098/rsob.130053

Received: 21 March 2013
Accepted: 30 April 2013

Subject Area: genomics/bioinformatics/cellular biology

Keywords: genome-wide gene deletion resource, cell cycle, cell shape, fission yeast

Author for correspondence: Jacqueline Hayles, e-mail: [email protected]

†These authors contributed equally to this study.

Electronic supplementary material is available at http://dx.doi.org/10.1098/rsob.130053.

A genome-wide resource of cell cycle and cell shape genes of fission yeast

Jacqueline Hayles1,†, Valerie Wood1,2,†, Linda Jeffery1,†, Kwang-Lae Hoe3,†, Dong-Uk Kim4,†, Han-Oh Park5,†, Silvia Salas-Pino6,7, Christian Heichinger6,8 and Paul Nurse1,6

1Cell Cycle Laboratory, Cancer Research UK, London Research Institute, 44 Lincoln’s Inn Fields, London WC2A 3LY, UK
2Cambridge Systems Biology Centre and Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK
3Department of New Drug Discovery and Development, Chungnam National University, Yusong, Daejeon, South Korea
4Aging Research Center, Korea Research Institute of Bioscience and Biotechnology, Yusong, Daejeon, South Korea
5Bioneer Corporation, Daedeok, Daejeon, South Korea
6Laboratory of Yeast Cell Biology and Genetics, Rockefeller University, 1230 York Avenue, New York, NY 10021-6399, USA
7Centro Andaluz de Biología del Desarrollo, CSIC/Junta de Andalucia/Universidad Pablo de Olavide, Carretera de Utrera, km 1, 41013 Sevilla, Spain
8Department of Developmental Genetics, Institute of Plant Biology, University of Zurich, Zollikerstrasse 107, 8008 Zurich, Switzerland

1. Summary

To identify near complete sets of genes required for the cell cycle and cell shape, we have visually screened a genome-wide gene deletion library of 4843 fission yeast deletion mutants (95.7% of total protein encoding genes) for their effects on these processes. A total of 513 genes have been identified as being required for cell cycle progression, 276 of which have not been previously described as cell cycle genes. Deletions of a further 333 genes lead to specific alterations in cell shape and another 524 genes result in generally misshapen cells. Here, we provide the first eukaryotic resource of gene deletions, which describes a near genome-wide set of genes required for the cell cycle and cell shape.

2. Introduction

Understanding how cells reproduce and how they generate their shape are two major goals in eukaryotic cell biology. These two processes are related because, during the cell cycle, cells duplicate cellular components and reproduce their cell structure in space to generate two daughter cells. A key function of the cell cycle is to ensure accurate replication and segregation of the genome, because errors in genetic transmission can cause mutations and chromosomal rearrangements that may lead to cell death or disease. Failure to accurately reproduce and maintain cell shape can disrupt tissue architecture or influence cell motility and may also lead to cell death or disease. Given the importance of these two processes for cell biology, we have generated a genome-wide resource, cataloguing the genes that when deleted disrupt the cell cycle or cell shape in the fission yeast Schizosaccharomyces pombe. This is the first such resource that qualitatively describes a near complete set of genes required for these processes in a eukaryotic organism.

© 2013 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/3.0/, which permits unrestricted use, provided the original author and source are credited.

Downloaded from http://rsob.royalsocietypublishing.org/ on August 24, 2016

Fission yeast is very amenable for investigating the cell cycle and cell shape [1–3] and has been used extensively for cell cycle studies for many years [4]. It is a rod-shaped, unicellular eukaryote that grows by apical extension and divides by medial fission and septation. This regular cell shape has made fission yeast a very useful organism to identify genes involved in the cell cycle and the generation and maintenance of cell shape. Mutants can easily be identified by visually screening for cells that divide at a longer or shorter length compared with wild-type (WT) (cell cycle defect), or that do not have a rod-shape (cell shape defect). A long cell phenotype is generated if cells are blocked or delayed in cell cycle progression because they continue to grow but fail to divide and thus become elongated. However, not all genes required for the cell cycle show a long cell phenotype when deleted. For example, genes encoding checkpoint proteins that are not required during a normal cell cycle may have a WT deletion phenotype and genes required for mitosis often arrest with a more irregular, non-elongated cell shape [5]. The long cell phenotype is easily identified by visual screening and is definitive for cells blocked or delayed in cell cycle progression through the G1, S, G2 and cytokinesis phases of the cell cycle [6]. We have therefore focused on identifying all genes with this phenotype when deleted, to determine the majority of genes required for progression through interphase or cytokinesis in fission yeast.

Genes important for the generation of cell shape are also easily identified, because mutants deleted for these genes lose the normal rod-shape. These cells exhibit a range of phenotypes, including rounded or stubby, curved, branched, skittle-shaped or more generally misshapen [2,3]. The penetrance of these mutant phenotypes can be quite variable; in some cases, the majority of cells have the same altered cell shape, whereas in others the phenotype is of lower penetrance with fewer of the cells exhibiting the phenotype.

Fission yeast currently has 5059 annotated protein coding genes, and a genome-wide deletion collection has been constructed with 4836 genes deleted [7]. In this study, we have systematically visually screened the deletion collection to identify genes required during interphase of the cell cycle, cytokinesis and for cell shape. This resource complements and extends earlier studies using gene deletions in budding yeast [8,9] and RNAi-based gene knockdowns in metazoa [10–15]. Although the budding yeast gene deletion collection has been extensively investigated, it has not been subjected to a systematic screen for cell cycle genes such as that carried out here, while metazoan RNAi cell cycle studies have shown only limited reproducibility [12]. Given that 3397 (67.14%) of the fission yeast protein coding genes have identifiable orthologues in metazoa (http://www.pombase.org), the genome-wide resource provided here will help to identify cell cycle and cell shape genes in other eukaryotes, including humans.

3. Results

3.1. Screening of the haploid deletion mutants

We have microscopically examined and described the deletion phenotypes of 4843 haploid gene deletion mutants of both essential and non-essential genes, after sporulating diploid heterozygous deletion mutants and germinating haploid spores on rich medium plates (see §5.1 for details of the screen and electronic supplementary material 1, tables S1 (column G)–S3). Mutants were classified to one of 11 cell shape phenotypes together with three additional categories, namely: WT; arrested as normal spores (spores); or arrested as normal germinated spores (germination) (figures 1 and 2a and table 1; electronic supplementary material 1, table S1, column H), making 14 phenotype categories in total (see §5.1). These phenotypes were mapped to terms in the Fission Yeast Phenotype Ontology (http://www.obofoundry.org/cgi-bin/detail.cgi?id=fypo) (see the electronic supplementary material 1, table S4), and the gene list for each category can be found in the electronic supplementary material 1, table S5a–n. All 4843 deletions were assigned to only one of these 14 categories for analysis. The variation in the phenotype for mutants in a particular category can be found in the phenotype description in the electronic supplementary material 1, table S1, column G. For example, some long mutants may be slightly curved, and T-shaped cells were observed in a subset of curved cells.

Gene Ontology (GO) term enrichment of biological processes and cellular components for these 14 phenotype categories is shown in tables 2–4 (see the electronic supplementary material 1, table S6a–n and table S7a–n for the complete results). There are 643/4843 genes with no GO process annotation. Of these 643 ‘unknowns’, 574 (89.2%) have a WT deletion phenotype. This means that most genes showing one or more of the 13 other deletion phenotypes are assigned a biological process either by inference from other organisms or because they have been partially characterized in fission yeast. However, their cellular shape is often not part of that characterization.
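As a generic illustration of what a term enrichment calculation involves, the sketch below computes a one-sided hypergeometric tail probability: the chance of seeing at least a given number of annotated genes in a phenotype category if genes were drawn at random from the screened set. This excerpt does not state which statistical method or software the study used, so the function is an assumption for illustration only, and the example counts passed to it are invented apart from the 4843 screened deletions and the 346 long HP genes.

```python
from math import comb


def enrichment_p(population, annotated, sample, hits):
    """P(X >= hits) for a hypergeometric draw without replacement.

    population: genes screened
    annotated:  genes in the population carrying the GO term
    sample:     genes in the phenotype category
    hits:       genes in the category carrying the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 100 genes annotated to some term, 20 of which
# fall among the 346 long HP genes out of 4843 screened.
print(enrichment_p(4843, 100, 346, 20))
```

In practice an analysis over many GO terms would also need a correction for multiple testing, which this sketch leaves out.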

To identify genes required for the cell cycle and cell shape, we focused on the 11 cell shape categories. These phenotypes and their relevance for the cell cycle and cell shape are described in the following sections. To demonstrate the use of this resource, we describe a more detailed cytological analysis of a subset of mutants altered in cell shape. In addition, we screened the library of haploid viable deletion mutants for hydroxyurea (HU) sensitivity and identified new genes implicated in the DNA checkpoint preventing entry into mitosis when DNA replication is incomplete or DNA is damaged.

3.2. Gene deletion mutants with an elongated cell phenotype

A long cell phenotype identifies cells blocked in progression through interphase of the cell cycle or cytokinesis (see §2). Gene deletion mutants with this phenotype were assigned to three categories, long high penetrance (long HP; 346/4843), long low penetrance (long LP; 136/4843) and long branched (long Br; 31/4843) (figure 1 and table 1 and electronic supplementary material 1, tables S4 and S5g–i). These three categories totalled 513 genes required for cell cycle progression, and we conclude that 10 per cent of fission yeast genes are required directly or indirectly for progression through interphase or cytokinesis (513/4843). Four hundred and sixty-seven genes had a strong elongated deletion phenotype and a further 46 genes were included that were only weakly elongated when deleted (see §5.1). Of the 513 genes in the long categories, 66.5 per cent (341 genes) were essential for viability compared with 26.2 per cent for all genes, and 85.57 per cent (439 genes) were conserved in human. This indicates that many of the genes identified in this study are likely to be important for understanding the cell cycle in other more complex eukaryotes.
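The proportions quoted above follow directly from the counts in table 1. As a minimal sketch (the variable names are illustrative and not part of the original analysis), the totals and the relative excess of essential genes among the long categories can be recomputed as follows.

```python
# (essential, non-essential) gene counts for the long categories, from table 1.
LONG_CATEGORIES = {
    "long HP": (215, 131),
    "long LP": (106, 30),
    "long Br": (20, 11),
}
SCREENED = 4843        # all deletion mutants analysed
ESSENTIAL_ALL = 1267   # essential genes among them (table 1)

long_total = sum(e + n for e, n in LONG_CATEGORIES.values())
long_essential = sum(e for e, _ in LONG_CATEGORIES.values())

long_share = long_total / SCREENED
essential_in_long = long_essential / long_total
essential_overall = ESSENTIAL_ALL / SCREENED
# How much more often a long gene is essential than an average screened gene.
fold_enrichment = essential_in_long / essential_overall

print(f"long genes: {long_total} ({long_share:.1%} of screened)")
print(f"essential among long: {essential_in_long:.1%}; overall: {essential_overall:.1%}")
print(f"fold enrichment of essential genes: {fold_enrichment:.2f}")
```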

All three long categories were enriched for nuclear localization (table 3). The long Br set was enriched for cytokinesis and transcription and included genes encoding subunits of the RNA polymerase II holoenzyme, mediator and SAGA complexes (tables 2 and 4). As expected, the long HP and long LP sets were both overrepresented for genes involved in DNA metabolism and the regulation of mitotic cell cycle (table 2). The long HP set was also enriched for processes and complexes involved in mRNA metabolism (particularly splicing), RNA biogenesis, transcription and DNA replication initiation, and 6/6 genes encoding subunits of the MCM complex (see table 4 and legend for details). By contrast, the long LP set was enriched for chromosome segregation and for genes encoding subunits of kinetochore complexes, including Mis6-SIM4 (8/14 subunits), Ndc-Mis-Spc (6/10 subunits), condensin (4/5 subunits) and APC (8/13 subunits) (tables 2 and 4). These differences in enrichment indicate that long HP is a good classifier for genes involved in progress through interphase, whereas long LP is more specific for genes associated with progress through mitosis (table 2 footnotes g,h). Mutants that block in mitosis usually display an irregular, less defined cell shape but are not elongated, whereas an elongated phenotype is characteristic of an interphase block, so one possibility is that the long LP gene set is enriched for a subset of mitotic genes that are also required for interphase progression, with some cells arresting in mitosis and other cells in interphase. Together, these three long phenotype categories (long HP, long LP and long Br) define gene sets important for progression through interphase or cytokinesis. The long LP gene set is in addition enriched for a set of mitotic genes, which may also be required during interphase.

3.3. Cell cycle mutants without a long phenotype

Not all genes previously known to be required during interphase have an elongated phenotype when deleted. We found that DNA replication genes cdc23, ssb1, pol1, cdc18, cdt1, rad4 and four genes encoding core subunits of the RFC (rfc2, rfc3, rfc4 and rfc5) were all annotated to the misshapen essential set (miss E; table 1 and electronic supplementary material 1, table S5d). Some of these genes are known to be required for both DNA replication and the DNA checkpoint, and mutant strains enter mitosis with incompletely replicated DNA [16–20]. The miss E category also contains genes required for mitosis, for example, the kinetochore protein Nnf1 and pericentrin Pcp1. In total, 57 genes were annotated to one or more cell cycle process (table 2 and electronic supplementary material 1, table S6d). We conclude that the miss E phenotypic class will be a good starting point to screen for further genes required for both DNA replication and the DNA checkpoint as well as for genes required during mitosis.

Figure 1. Cell shape categories. Examples of the 11 major cell shape phenotype categories described in this study. For each deletion, mutant cells are shown as observed during the screen. Definitions of each phenotype can be found in the electronic supplementary material 1, table S4. Scale bar (shown in WT) = 10 µm.
[Panels: WT, miss E, miss V, miss weak V, long HP, long LP, long Br, rounded, stubby, curved, small, skittle.]

Another group of previously known cell cycle genes exhibits a wee phenotype, with viable cells dividing at a shorter length and a smaller cell volume than WT cells. A partial gene deletion library of viable haploid mutants has already been screened, and 18 wee mutants identified [21]. In this study, we identified by visual examination 25 genes with a small cell deletion phenotype; this small set consisted of 11 non-essential genes and 14 essential genes (see the electronic supplementary material 1, table S5m). The 11 non-essential genes included nine of the 18 previously identified wee genes and were mainly those with a stronger wee deletion phenotype [21]. The two additional genes encode a predicted 26S proteasome non-ATPase regulatory subunit (SPCC18.17c), the deletion mutant of which has been found (F. Navarro and P. Nurse 2012, personal communication) to be shorter but also wider and so to divide at a WT cell volume, and a mitochondrial inheritance GTPase (dml1), which is potentially a new wee gene. The 14 essential genes were enriched for tRNA metabolism (see table 2 for summary and the electronic supplementary material 1, tables S6m and S7m for details), including genes encoding subunits of the RNase P and mitochondrial RNase P, which have roles in RNA processing [22], tRNA 2’-O-ribose methyltransferase, tRNA-specific adenosine deaminase (2/2 subunits) and the tRNA-specific splicing endonuclease (2/4 subunits). It is possible that these non-viable small mutants, like other similar small size viable mutants identified in budding yeast that affect growth [23], may only indirectly affect cell cycle progression.

3.4. New cell cycle genes

Our genome-wide screen has identified 513 genes with a long cell deletion phenotype and thus required for the cell cycle in fission yeast. Previously, 158 fission yeast genes

Figure 2. Distribution of cell shape genes. (a) Distribution of 1395 genes with a cell shape deletion phenotype among the 11 cell shape categories. (b) Overlap of long gene set identified in this study (513 genes, green circle) with a set of previously published genes with a long deletion phenotype (158 genes, red circle). For further details see the electronic supplementary material 1, table S8. (c) Overlap between (i) 521 fission yeast orthologues of human genes with an RNAi cell cycle phenotype (blue circle), (ii) 614 genes with a mitotic cell cycle annotation in fission yeast (pink circle) and (iii) 276 new cell cycle genes from this study (orange circle). For further details see the electronic supplementary material 1, tables S8 and S11.
[Panel (a): bar chart of gene number (axis 0–400) by category: skittle, small, curved, stubby, rounded, long Br, long LP, long HP, miss weak V, miss V, miss E. Panel (b): Venn diagram with regions 366, 147 and 11. Panel (c): Venn diagram with regions 501, 113, 365, 43 and 233.]

Table 1. Summary of phenotype categories. Fourteen broad phenotype categories were used for analysis, and the number and dispensability of genes in the different categories is shown. Each gene is classified by a single phenotype category based on the most penetrant or strongest deletion phenotype. Cells were only described as wild-type (WT) when no other phenotype was observed. The classifier for each gene can be found in the electronic supplementary material, table S1, column H and gene dispensability in table S1, column I. All genes are included only once and cell phenotype terms from the electronic supplementary material 1, table S4 that are not included as separate phenotypes are subsets of one of these categories, for example, a T-shaped mutant is included in the curved category as curved is the most penetrant phenotype for this type of cell shape mutant.

phenotype categories for analysis   total genes   essential genes   non-essential genes
WT                                  3041          0                 3041
spores                              184           184               0
germination                         223           223               0
misshapen essential                 302           302               0
misshapen viable                    31            0                 31
misshapen weak viable               191           0                 191
long high penetrance                346           215               131
long low penetrance                 136           106               30
long branched                       31            20                11
rounded                             89            55                34
stubby                              52            8                 44
curved                              50            11                39
small                               25            14                11
skittle                             142           129               13
total genes analysed                4843          1267              3576


Table 2. GO cellular processes for all phenotype categories. A summary of the GO analysis to identify genes annotated to cellular processes enriched within particular phenotype categories. The enrichment results were mapped to ‘GO slim’ (high level) terms covering most biological processes observed in fission yeast to give a broad view of the ontology content of the genome-wide gene deletion dataset. For details see §5.5.2 and the electronic supplementary material 1, table S6a–n and table S14. The total dataset is 4843 genes. Footnotes are denoted by a–n. Red colour denotes enriched p ≤ 0.001; orange denotes moderately enriched p ≤ 0.01; light orange denotes weakly enriched p ≤ 0.1; light blue denotes no enrichment p ≤ 1; blank denotes number of genes is 0.

Columns (cellular processes): 1, total gene numbers; 2, transcription; 3, cytoplasmic translation; 4, ribosome biogenesis; 5, mRNA metabolism; 6, nucleocytoplasmic transport; 7, nucleobase, -side, -tide metabolism; 8, tRNA metabolism; 9, signalling; 10, DNA replication; 11, DNA recombination; 12, DNA repair; 13, regulation of mitotic cell cycle; 14, chromosome segregation; 15, cytokinesis; 16, mitochondrion organization; 17, cell wall organization; 18, cytoskeleton organization; 19, cell polarity; 20, carbohydrate metabolism; 21, lipid metabolism; 22, transmembrane transport; 23, vesicle-mediated transport; 24, other; 25, no process annotation.

category           1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25
WT                 3041 241  151  117  85   53   230  47   233  32   60   80   92   75   39   92   70   90   24   154  117  230  177  730  574
spores             184  19   37   35   7    16   34   16   12   3    1    1    9    4    5    7    4    17   3    6    10   12   23   11   2
germination        223  14   16   34   21   5    28   13   9    3    0    2    25   7    3    6    3    6    2    15   20   13   31   21   2
miss E             302  50   12   39   22   7    36   11   21   13   4    7    25   19   18   3    7    21   5    21   35a  5    40   26   10
miss V             31   4    1    1    1    0    2    0    2    0    0    1    0    1    6    0    2    4    3    2    2    2    5    2    5
miss weak V        191  14   19   12   6    8    17   2    24   1    1    5    6    16b  15   2    14   17   3    13   16   10   18   32   20
long HP            346  66c  13   43   66d  18   45   13e  35f  46   23   36   74g  26   24   2    3    19   7    7    3    0    5    23   11
long LP            136  16   1    20   12   7    14   3    6    20   12   17   27h  37   2    2    0    14i  0    0    3    2    1    3    7
long Br            31   21   0    1    0    0    2    0    2    1    0    6    3    1    10   0    2    3    0    2    1    0    0    1    0
rounded            89   4    4    2    2    0    10   3    13j  0    0    0    9    3    10   6    14   7    14   13   8    5    7    16   0
stubby             52   4    1    1    1    1    6    1    10   0    0    1    4    1    11   1    10   10k  8    11   3    2    9    5    1
curved             50   14   1    2    2    1    6    4    3    0    0    0    4    3    7    1    0    14l  10   0    1    0    1    5    2
small              25   5    1    4    3    0    2    10   6m   0    0    1    5    1    2    1    0    0    0    1    0    0    0    4    0
skittle            142  3    0    2    0    0    9    19n  0    1    1    1    0    0    0    118  0    1    0    2    3    11   0    11   9
total (deletions)  4843 475  257  313  228  116  441  142  376  120  102  158  283  194  152  241  129  223  79   247  222  292  317  890  643
total (genome)     5059 526  259  317  232  116  452  144  379  120  106  159  283  198  154  253  132  228  79   250  225  304  321  914  763

a Includes 15/29 genes annotated to glycosylphosphatidylinositol (GPI) anchor biosynthesis, a descendent of lipid metabolism (p = 1.75 × 10^-8).
b Includes 10/41 genes annotated to attachment of spindle to microtubules, a descendent of chromosome segregation (p = 0.001).
c Includes 11/26 genes annotated to histone deacetylation, a descendent of transcription (p = 0.00054).
d Includes 53/118 genes annotated to nuclear mRNA splicing, via spliceosome, a descendent of mRNA metabolism (p = 3.7 × 10^-27).
e Includes 5/6 subunits of the elongator complex involved in tRNA wobble uridine modification.
f Includes 11/25 genes annotated to the septation initiation signalling cascade (p = 0.00013), and 5/15 genes annotated to the stress-activated protein kinase signalling cascade, 4/19 genes annotated to TOR signalling and 3/17 genes annotated to cAMP-mediated signalling (none enriched), all descendents of signalling.
g Includes 27/79 genes annotated to regulation of interphase, a descendent of regulation of the mitotic cell cycle (p = 1.23 × 10^-9).
h Includes 18/67 genes annotated to regulation of mitosis (p = 4.88 × 10^-11) and 6/27 genes annotated to attachment of spindle microtubules to kinetochore (p = 0.0351), descendents of regulation of the mitotic cell cycle.
i Includes 13/115 genes annotated to microtubule cytoskeleton, a descendent of cytoskeleton organization (p = 0.00706).
j Includes 4/9 genes annotated to Cdc42 signal transduction, a descendent of signalling (p = 0.006).
k Includes 7/52 genes annotated to actin cytoskeleton organization, a descendent of cytoskeleton organization (p = 8.09 × 10^-5).
l Includes 13/115 genes annotated to microtubule cytoskeleton organization, a descendent of cytoskeleton organization (p = 2.12 × 10^-8) and 5/5 genes annotated to gamma tubulin complex localization, a descendent of microtubule cytoskeleton organization (p = 3.11 × 10^-8).
m Includes 3/11 genes annotated to carbon catabolite repression of transcription, a descendent of signalling (p = 0.00484).
n All 19 genes involved in mitochondrial tRNA metabolism.


Table 3. GO cellular components and complexes for all phenotype categories. Summary of the GO analysis for cellular components enriched within particular phenotype categories. For details see §5.5.2 and electronic supplementary material 1, tables S7a–n and S14. For further details, see table 2 legend. Footnotes are denoted by a–u. Red colour denotes enriched p ≤ 0.001; orange denotes moderately enriched p ≤ 0.01; light orange denotes weakly enriched p ≤ 0.1; light blue denotes no enrichment p ≤ 1; blank denotes number of genes is 0.

Columns (cellular locations and organelles): 1, total gene numbers; 2, mitochondrion; 3, ribosome; 4, cytoplasm; 5, nucleolus; 6, nucleus/nuclear part; 7, microtubule cytoskeleton; 8, actin cytoskeleton; 9, Golgi/ER; 10, cell tip; 11, cell division site/part; 12, membrane; 13, cell surface/cell wall; 14, vacuole; 15, macromolecular complex; 16, unknown.

category           1     2     3     4     5     6     7     8     9     10    11    12    13    14    15    16
WT                 3041  428   122   2474  135   1475  125   36    497   105   152   871a  99    109   688   97
spores             184   14    12    151b  32    101   9     3     33    4     4     50    0     8     98    0
germination        223   16    8     183   31    136   11    0     52c   4     8     67    1     11d   124   2
miss E             302   16    5     241   38    205e  26    9f    65g   12    28    59    2     7     176   3
miss V             31    1     1     26    2     17    4     1     10h   3     4     10    1     1     14    1
miss weak V        191   27    15i   158   14    86    10    6     41    6     15    60    7     6     82    4
long HP            346   13    8     235j  44    290e  40k   5     12    8     13    23    3     0     235   5
long LP            136   4     1     84    21    125e  23l   0     6     0     5     14    0     1     101   0
long Br            31    0     0     17    3     26e   2     1     0     0     2     2     1     0     25    0
rounded            89    14m   1     83n   2     42    7     4     23o   12    27    28    3     4     33    0
stubby             52    3     0     44    2     22    2     4     18p   8     10    21    4     6     16    2
curved             50    2     0     41    3     37    15q   0     1     6r    8     2     0     0     26    2
small              25    4     0     21    6(5)  23    4s    1     1     0     3     5     0     0     17    0
skittle            142   135   61t   142   2     19    0     0     0     1     2     22u   0     0     72    0
total (deletions)  4843  677   234   3901  336   2604  278   70    759   169   281   1234  121   153   1707  116
total (genome)     5059  739   241   3992  340   2647  283   70    776   170   248   1280  127   156   1801  213

a Includes 164/193 genes annotated to plasma membrane (p = 1.74 × 10^-9) and 661/914 genes annotated to intrinsic to membrane (p = 5.45 × 10^-9), both descendents of membrane.
b Includes 7/8 subunits of chaperonin-containing T-complex (p = 1.29 × 10^-7) and 4/5 subunits of eukaryotic translation initiation factor 2B complex (p = 0.001).
c Includes 5/8 subunits of COPI coated vesicle membrane (p = 0.001).
d Includes 8/14 subunits of vacuolar proton-transporting V-type ATPase complex (p = 7.26 × 10^-6).
e See table 4 for breakdown of nuclear complexes.
f Includes 5/8 subunits of Arp2/3 protein complex (p = 0.00922).
g Includes 5/7 subunits of oligosaccharyltransferase complex (p = 0.003), 3/3 subunits of glycosylphosphatidylinositol-N-acetylglucosaminyltransferase (GPI-GnT) (p = 0.05068), 6/11 subunits of TRAPP complex (p = 0.00417) and 4/4 subunits GARP complex (p = 0.00313).
h Includes 2/4 subunits of AP-1 adaptor complex (p = 0.02077).
i Includes 12/81 genes annotated to cytosolic large ribosomal subunit (p = 0.00936), a descendent of ribosome.
j Includes 5/6 subunits of elongator holoenzyme complex (p = 0.002) and 5/5 subunits of RNA cap binding complex (p = 0.0004).
k Includes 32/211 genes annotated to spindle pole body (p = 0.006), a descendent of microtubule cytoskeleton.
l Includes 20/211 genes annotated to spindle pole body (p = 0.00014), a descendent of microtubule cytoskeleton.
m Includes 2/4 subunits of mitochondrial sorting and assembly machinery complex (p = 0.22496).
n Includes 2/2 subunits of eRF1 methyltransferase complex (p = 0.03840).
o Includes 4/8 subunits of mannosyltransferase complex (p = 0.00081).
p Includes 8/194 genes annotated to ER membrane (p = 0.15), a descendent of Golgi/ER.
q Includes 4/10 genes annotated to equatorial MTOC (p = 0.00015) and 8/51 genes annotated to spindle pole body (p = 0.1), descendents of microtubule cytoskeleton.
r Includes 2/2 subunits of tea1 cell end complex (p = 0.008).
s Includes 4/211 genes annotated to spindle pole body (p = 1), a descendent of microtubule cytoskeleton.
t All 61 genes encode subunits of mitochondrial ribosome (61/70 subunits, p = 3.98 × 10^-88).
u Includes 20/173 genes annotated to mitochondrial membrane (p = 6.38 × 10^-6), a descendent of membrane.


that generate elongated cells when deleted have been reported and annotated in PomBase (http://www.pombase.org/) [24]. To validate this qualitative visual approach to identify new cell cycle genes, we compared these 158 genes (see the electronic supplementary material 1, table S8a) with the 513 fission yeast cell cycle genes from this screen (see the electronic supplementary material 1, table S5g–i) and found that 147 of the 158 genes were also identified in our screen (see figure 2b and electronic supplementary material 1, table S9). The 366 genes not previously reported as elongated when deleted included 90 genes with an existing cell cycle GO annotation and 276 genes with no previously known cell cycle role (see the electronic supplementary material 1, tables S8b and S8c). The majority of these 276 genes (230/276) are annotated to GO processes which have previously been linked to the cell cycle, suggesting that the 276 genes are true positives and identify new cell cycle genes. The 230 new genes involved in cell-cycle-related processes included genes required for ribosome biogenesis, splicing and nucleotide metabolism (table 5). For example, seven genes (dfr1, adk1, hpt1, dea2, dut1, dcd1, tmp1) are concerned with various nucleotide metabolism pathways. Two previously identified cell cycle genes, budding yeast CDC8 (tmp1 orthologue) [25], and the fission yeast cdc22 (ribonucleotide reductase) [26] are also required for nucleotide metabolism. Given the cell cycle role of these two genes, the other genes identified here may also be important for maintaining the nucleotide levels needed for cell cycle progression.
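The bookkeeping behind this comparison is a two-set overlap followed by a partition of the newly identified genes. The sketch below works from the published counts only (the underlying gene lists are in the electronic supplementary material and are not reproduced here); the helper `venn_two` is a hypothetical name introduced for illustration.

```python
def venn_two(size_a, size_b, shared):
    """Return (only_a, shared, only_b) for two sets of known size and overlap."""
    if not 0 <= shared <= min(size_a, size_b):
        raise ValueError("overlap must lie between 0 and the smaller set size")
    return size_a - shared, shared, size_b - shared

# Counts from the text: 513 long genes in this screen, 158 published, 147 shared.
only_screen, both, only_published = venn_two(513, 158, 147)

WITH_CELL_CYCLE_GO = 90   # new long genes already annotated to the cell cycle
NOVEL = 276               # new long genes with no known cell cycle role
LINKED_PROCESS = 230      # novel genes in GO processes linked to the cell cycle

recovered_fraction = both / 158
other_novel = NOVEL - LINKED_PROCESS

print(only_screen, both, only_published)
print(f"published long genes recovered: {recovered_fraction:.1%}")
print(f"novel genes outside cell-cycle-linked processes: {other_novel}")
```

Under these counts, 11 previously published long genes were not recovered in the screen, and the 366 newly reported genes split exactly into the 90 annotated and 276 novel genes.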

The remaining 46 genes include 17 genes unstudied in any organism (nine of which are conserved in humans), and 29 genes that have existing GO annotations to processes or pathways not previously linked to the cell cycle. Of these, 13 genes are involved in a number of different metabolic pathways, including amino acid, carbohydrate and phospholipid metabolism. We investigated whether any of these genes had genetic or physical interactions with genes implicated in the cell cycle using the BioGRID Interaction database [27]. Several genes showed such interactions (see the electronic supplementary material 1, table S10). For example, a predicted pyruvate decarboxylase SPAC1F8.07c interacts with a wee gene zfs1 [21,28]. It is possible that these 13 metabolic cell cycle genes may act as regulatory links between small molecule biosynthesis pathways and the cell cycle.

Table 4. A summary of the GO analysis for nuclear complexes enriched within the 3 long phenotype categories. For details, see §5.5.2 and electronic supplementary material 1, table S14. For further details, see table 2 legend. Footnotes are denoted by a–d. Red colour denotes enriched p ≤ 0.001; orange denotes moderately enriched p ≤ 0.01; light orange denotes weakly enriched p ≤ 0.1; light blue is no enrichment p ≤ 1; blank denotes number of genes is 0. In the original layout the complexes are grouped as replication-related, chromosome segregation, kinetochore-related, transcription and nucleolar.

enriched nuclear complexes   long HP   long LP   long Br   total subunits
replication fork             24        10b       0         64
MCM complex                  6         0         0         6
DNA polymerase               6         0         0         10
Ctf18-RFC                    6         0         0         13
replication preinit          11        9         0         24
ORC                          1         5         0         6
APC                          2         8         0         13
proteasome                   2         2         0         40
Smc5-6 & loading             3         7         0         10
condensin                    0         4         0         5
cohesin                      1         3         0         9
kinetochore                  2         20        0         66
Mis6-Sim4                    0         8         0         14
NMS/Ndc80                    0         6         0         10
DASH complex                 0         0         0         10
RNA polymerase               2         1         0         17
RNA pol II holo.             9         1         5d        55
mediator complex             1         0         5         22
SAGA complex                 0         0         4         19
Set1 complex                 4         0         0         8
Rpd3S complex                10        0         0         16
MBF complex                  5         0         0         7
Swr1 complex                 0         1         0         15
Ino80 complex                1         1         0         15
ASTRA complex                0         0         0         7
spliceosomal                 39a       7c        0         90
preribosome                  8         7         1         63
RNase P (nucleolar)          1         0         0         9

a Includes 13/26 subunits of U4/U6 × U5 tri-snRNP complex (p = 9.18 × 10^-7), 6/7 subunits of U6 snRNP (p = 0.00017) and 6/6 subunits U2 snRNP (p = 0.0747).
b Includes 4/4 subunits of GINS complex (p = 7.14 × 10^-5).
c Includes 4/15 subunits of U1 snRNP (p = 0.07660), 3/6 subunits of t-UTP complex (p = 0.04886) and 6/7 subunits of U6 snRNP (p = 0.00017).
d All five genes encode subunits of holo TFIIH complex (5/10 subunits, p = 1.08 × 10^-7).

Table 5. GO process annotations for 276 novel cell cycle genes. These 276 genes were not previously known to be involved in the cell cycle in fission yeast. Two hundred and thirty genes are annotated to other GO processes that have accepted links to the cell cycle in fission yeast and 29 genes had a GO process annotation not related to the cell cycle. Only 17 genes were of completely unknown function.

process in fission yeast                             no. genes
GO processes cell-cycle-related                      230
  nucleocytoplasmic transport                        13
  mRNA metabolic process and splicing                73
  ribosome biogenesis and cytoplasmic translation    37
  transcription                                      78
  DNA repair, recombination, telomere maintenance    14
  vesicle-mediated transport                         5
  modification by small molecule conjugation         3
  nucleotide metabolism                              7
GO processes not cell-cycle-related                  29
  small molecule metabolic pathways                  13
  miscellaneous                                      16
unknown process                                      17
  fission-yeast-specific                             4
  fungal-specific                                    4
  conserved to humans                                9

3.5. Comparison with a human cell cycle gene set

To examine the overlap between cell cycle genes in fission yeast and human, we identified a set of 521 human genes proposed to be involved in the cell cycle [12], and which have a fission yeast orthologue (see the electronic supplementary material 1, table S11a and §5.5.3). The 521 fission yeast orthologues of these human genes were compared with 614 fission yeast genes with an existing mitotic cell cycle annotation, including all genes so far annotated to the mitotic cell cycle either by inference or experiment (see the electronic supplementary material 1, table S11b and §5.5.1; http://www.pombase.org/). There were 113 genes common to both gene sets (see figure 2c and electronic supplementary material 1, table S11c). We also compared the 521 gene set with the 276 new cell cycle genes from this study and identified a further 43 genes in common (see figure 2c and electronic supplementary material 1, table S11d). Therefore, in total, 156 of the 521 conserved genes were involved in the cell cycle in both organisms (29.9%), a similar level to other inter-species comparisons, human/worm at 36 per cent and human/fly at 38 per cent [12,14,29,30]. Possible reasons why these inter-species comparisons in a variety of studies show such a low overlap are considered in §4.
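The three-way comparison can be represented as a small table of Venn regions, from which each set size and the 29.9 per cent figure are recovered. This is a sketch under assumptions: the region counts are those labelled in figure 2c (and are consistent with the totals in the text), the set names `human`, `annotated` and `novel` are illustrative labels, and the 276 novel genes are taken not to overlap the 614 annotated genes, as their definition implies.

```python
# Venn regions for figure 2c, keyed by the gene sets each region belongs to.
# "human": 521 fission yeast orthologues of human cell cycle genes;
# "annotated": 614 genes with a mitotic cell cycle annotation;
# "novel": 276 new cell cycle genes from this study.
REGIONS = {
    frozenset({"human"}): 365,
    frozenset({"annotated"}): 501,
    frozenset({"novel"}): 233,
    frozenset({"human", "annotated"}): 113,
    frozenset({"human", "novel"}): 43,
}

def set_size(name):
    """Total size of one gene set: the sum of every region that contains it."""
    return sum(n for members, n in REGIONS.items() if name in members)

# Human orthologues that also have a cell cycle role in fission yeast.
shared_with_human = sum(
    n for members, n in REGIONS.items() if "human" in members and len(members) > 1
)
conserved_fraction = shared_with_human / set_size("human")
print(f"{shared_with_human} of {set_size('human')} ({conserved_fraction:.1%})")
```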

3.6. Cell cycle checkpoint genes

To identify new DNA checkpoint genes, we screened 2983 viable gene deletion mutants for those that failed to block the cell cycle in the presence of the ribonucleotide reductase inhibitor HU [31–33]. We further screened these HU-sensitive mutants for those with a cut phenotype [34], where cells fail to block cell cycle progress and enter mitosis generating chromosome segregation defects. We identified 132 mutants sensitive to at least 5 mM HU (see §5.2 and electronic supplementary material 1, table S12). In the presence of HU, deletion mutants of eight genes had greater than 60 per cent cells with a cut phenotype (see the electronic supplementary material 1, table S12, column O). Of these, deletion mutants of hus1, rad1, rad3, rad26, rad17 and rad9, which have previously been shown to be required for establishment of the DNA checkpoint, did not elongate prior to entering mitosis [33]. Deletion mutants of the remaining two genes (ddb1, lem2) initially became elongated but eventually entered mitosis and displayed a cut phenotype (figure 3a). Ddb1 is necessary for stabilizing DNA replication forks and is involved in regulating the replication checkpoint kinase Cds1 [35], and a lem2 mutant has previously been shown to be sensitive to HU [36]. Mutants of a further 24 genes showed a lower level of cut cells (between 20 and 60%) after 10 h in HU (see the electronic supplementary material 1, table S12, column P). Of these, nine genes (nup132, nup40, did4, spc34, vps24, ubr1, mde4, utp16 and ers1) are newly identified as HU-sensitive genes with a cut phenotype.

In this study, we have identified one new S phase checkpoint gene, lem2, which has a strong cut deletion phenotype, and nine genes with a lower penetrance cut deletion phenotype, which may influence maintenance of the S phase checkpoint (figure 3b).
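The two penetrance classes used here (greater than 60 per cent cut cells, and between 20 and 60 per cent) can be expressed as a simple threshold rule. The sketch below applies it to the values listed in figure 3b; the function name `penetrance_class` is hypothetical, and treating both 20 and 60 per cent as belonging to the lower class is an assumption about how the boundaries were handled.

```python
# (systematic ID, % cut cells after 10 h in HU), as listed in figure 3b.
CUT_PERCENT = [
    ("SPAC18G6.10", 88), ("SPAC1805.04", 26), ("SPAC19E9.01c", 27),
    ("SPAC4F8.01", 27), ("SPAC8C9.17c", 23), ("SPAC9E9.14", 28),
    ("SPBC19C7.02", 24), ("SPBC6B1.04", 28), ("SPBP8B7.10c", 22),
    ("SPCC1393.05", 20),
]

def penetrance_class(percent_cut):
    """Classify a mutant by the thresholds quoted in the text (boundaries assumed)."""
    if percent_cut > 60:
        return "high"
    if percent_cut >= 20:
        return "low"
    return "below threshold"

# Group systematic IDs by class, preserving the order of figure 3b.
by_class = {}
for systematic_id, percent in CUT_PERCENT:
    by_class.setdefault(penetrance_class(percent), []).append(systematic_id)

for label, ids in by_class.items():
    print(label, len(ids), ids)
```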

Figure 3. New HU-sensitive cut genes. (a) lem2 deletion mutant cells stained with DAPI after growing for 8–10 h in the presence of HU. Examples of anucleated cells can be seen (white arrow) and cells with unequally segregated chromatin (red arrow). (b) New checkpoint genes identified in this study.

systematic ID    new checkpoint gene    % cut after 10 h in HU
SPAC18G6.10      lem2                   88
SPAC1805.04      nup132                 26
SPAC19E9.01c     nup40                  27
SPAC4F8.01       did4                   27
SPAC8C9.17c      spc34                  23
SPAC9E9.14       vps24                  28
SPBC19C7.02      ubr1                   24
SPBC6B1.04       mde4                   28
SPBP8B7.10c                             22
SPCC1393.05      ers1                   20

3.7. Cell shape mutants

Cell shape mutants other than those with a long cell phenotype have been used to identify genes important for generating the normal rod-shape of a fission yeast cell. Previous studies in fission yeast have identified orb mutants that are spherical because cells fail to grow in a polarized manner, ban (banana) mutants that have a curved or bent cell phenotype because cells no longer orientate the growth zones at 180° along the long axis of the cell, and tea (tip elongation aberrant) mutants, which form a new growth zone in the wrong place often at 90° to the long axis of the cell [2,3]. Other cell shape mutants included bottle- or skittle-shaped cells and more generally misshapen cells [37]. We screened the deletion mutants for shape defects using the following seven categories to describe the mutant phenotypes (see figure 1 and electronic supplementary material 1, table S5d–f,j–l,n for gene lists). (i) rounded, which includes the typical orb mutants and also mutants that are more rounded than WT but not completely spherical. (ii) stubby, which look shorter and wider than WT but are mainly rod-shaped. These two categories showed a degree of overlap, with some mutants showing both phenotypes. (iii) curved, which includes the ban mutants and the tea mutants. During vegetative growth, tea mutants have only a low level of T-shaped cells with most cells having a curved phenotype. (iv) skittle: one end of the cell is wider than the other end. A total of 333 genes (6.9% of total genes) showed these specific alterations in cell shape when deleted and are thus important for the generation of normal cell shape. A further less well-defined group called misshapen showed an ill-defined potato-like shape or a mixture of other shapes. These fell into three further subgroups: (v) viable misshapen mutants (miss V), (vi) viable misshapen mutants, which have a weak phenotype (miss weak V) and (vii) essential misshapen mutants (miss E). There were 524 genes with these more general misshapen deletion phenotypes. In total, 857/4843 genes (17.7%) showed altered cell shape when deleted (table 1) and 668 of these are conserved in humans (77.9%). No additional cell shape phenotypes were identified compared with earlier work, suggesting that there may only be a restricted number of shapes that a fission yeast cell can adopt.
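The totals quoted for the seven shape categories can be reproduced from the per-category counts in table 1. The following is a minimal sketch with illustrative variable names; it simply sums the specific and general (misshapen) categories and expresses them as fractions of the 4843 screened genes.

```python
# Total genes per cell shape category, from table 1.
SPECIFIC_SHAPES = {"rounded": 89, "stubby": 52, "curved": 50, "skittle": 142}
MISSHAPEN = {"miss E": 302, "miss V": 31, "miss weak V": 191}
SCREENED = 4843
CONSERVED_IN_HUMAN = 668   # altered-shape genes conserved in humans (from the text)

specific = sum(SPECIFIC_SHAPES.values())
misshapen = sum(MISSHAPEN.values())
altered = specific + misshapen

print(f"specific shape changes: {specific} ({specific / SCREENED:.1%})")
print(f"misshapen: {misshapen}")
print(f"all altered shape: {altered} ({altered / SCREENED:.1%})")
print(f"conserved in humans: {CONSERVED_IN_HUMAN / altered:.1%}")
```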

The rounded, stubby and curved sets are all enriched for genes implicated in cell polarity and for localization at the cell tip (tables 2 and 3), rounded and stubby sets for cell wall organization, and the stubby set for cytokinesis and actin cytoskeleton organization (table 2, footnote k). All 14 genes in the curved set annotated to the cytoskeleton organization category are involved in microtubule cytoskeleton organization (see table 2, footnote l) and are also enriched at the cell tip (table 3). Therefore, we predict that unknown genes with a curved deletion phenotype are likely to be involved in microtubule-related processes. Similarly, the stubby phenotype is likely to be associated with genes that affect actin processes. Genes which generate a skittle phenotype when deleted were enriched for mitochondrial organization; nearly 50 per cent (118/241) of the total genes annotated to mitochondrion organization were in the skittle category. We found that 19 genes were required for mitochondrial tRNA metabolism (table 2) and 61 genes for the mitochondrial ribosome, suggesting that mitochondrial translation underlies the skittle phenotype. The miss E category was enriched for genes required for lipid metabolism (35 genes), 15 of which are involved in glycosylphosphatidylinositol (GPI) anchor biosynthesis (table 2, footnote a; electronic supplementary material 1, table S6d). GPI anchor proteins affect cell wall integrity, and loss of these proteins can result in a misshapen cell phenotype [38]. In higher eukaryotes, GPI anchor proteins have been implicated in the sorting of membrane proteins important for cell polarization [39], and so, in fission yeast, GPI anchor proteins could also be implicated in cell polarization.

To investigate whether gene deletions that cause cell shape changes also have defects in the cytoskeleton, bipolar growth pattern or the cell wall, we analysed 54 previously uncharacterized viable cell shape mutants. These were a subset of 352 non-essential genes in the miss V, miss weak V, rounded, stubby, curved and skittle categories (table 1). We found that a total of 35 strains had defects in the cytoskeleton, the cell wall, bipolar growth or cell separation; these included 26 mutants with actin or microtubule defects (see figure 4 and electronic supplementary material 1, table S13 and electronic supplementary material 2 for details). These results indicate that further screening of viable deletion mutants, even those with a weak phenotype, will be a useful approach to identify and characterize genes involved in the cytoskeleton.

4. Discussion

We have visually screened 4843 gene deletion mutants in fission yeast, 95.7 per cent of all protein coding genes, and have identified near genome-wide sets of the genes required for the cell cycle and cell shape, the first systematic description for a eukaryote. The long cell phenotype in fission yeast defines cells blocked in cell cycle progression during interphase or cytokinesis and so is an effective way to identify genes required for these stages of the cell cycle. GO enrichment analysis has shown that the long HP and long Br categories were enriched for genes previously identified as being required during interphase or cytokinesis, respectively. The long LP set included genes previously known to be required during mitosis and we suggest that these genes may represent a subgroup of mitotic genes that have additional roles during interphase. Several genes required for the cell cycle were found in the miss E gene set, suggesting that further analysis of this set may also identify new genes required during interphase or mitosis.

We identified 513 cell cycle genes in total, 276 of which were not previously known to have a role in the cell cycle. Of these new genes, 230 were annotated to GO processes previously implicated in the cell cycle, thus identifying new links between the cell cycle and these processes. These included genes required for nucleocytoplasmic transport, mRNA metabolic process (specifically splicing), ribosome biogenesis and nucleotide metabolism (table 5). There were 46 new cell cycle genes not annotated to a process previously associated with the cell cycle, 13 of which are involved in small molecule metabolism. Frequently, only one or two genes were identified for a specific metabolic pathway. For example, ect1 is predicted to encode ethanolamine-phosphate cytidylyltransferase, which is rate-limiting for synthesis of CDP-ethanolamine, an important step in phospholipid biosynthesis [40]. We speculate that these genes may encode proteins linking different aspects of metabolism, such as metabolite levels or flux, to the cell cycle.

Studies using RNAi in metazoan organisms have identified sets of genes required for cell cycle progression. However, these overlap only to a limited extent between various intra- and inter-species comparisons, ranging from 10 to 38 per cent [12]. Our analysis comparing cell cycle genes in fission yeast and human identified 156/521 orthologous gene pairs (29.9%) involved in the cell cycle in both organisms, a similar percentage overlap to that found in other inter-species studies. A possible reason for the rather limited overlap of cell cycle genes in a wide range of studies is that gene knockdowns using RNAi can result in varying levels of gene product and thus more variation in cellular phenotype. Our analysis is based on gene deletions that generally eliminate the entire gene function, thus reducing phenotypic variability. Inter-species comparisons may also be limited because different phenotypes may be used in different organisms to infer a specific cell cycle defect, and these are not always directly comparable. A cell cycle defect during interphase in fission yeast produces an easily identifiable, highly consistent long cell phenotype, which can be used to reliably identify interphase cell cycle genes. Many of the cell cycle genes identified in fission yeast have a conserved cell cycle role in other eukaryotes, and so the long cell cycle gene set identified in this study is likely to be useful to uncover additional cell cycle genes conserved across species.

To catalogue genes required to generate and maintain the correct cell shape, we identified categories of deletion mutants with specific cell shape defects. GO analysis showed that genes within these groups could be used to identify genes implicated in the actin cytoskeleton (stubby), the microtubule cytoskeleton (curved) and mitochondrial function (skittle). The skittle category suggests that there is an uncharacterized mechanism influencing cell shape, which is affected by defects in mitochondrial organization and translation. The distribution of mitochondria in a cell is dependent on microtubules [41], and binding of mitochondria to microtubules can moderate microtubule dynamics [42], raising the possibility that defective mitochondria may affect microtubules, thus leading to cell shape changes. In humans, a number of diseases including deafness and muscle pathologies are linked to defects in mitochondrial protein synthesis [43]. The link we have identified in this study between mitochondria and cell shape suggests that cell shape changes could be an underlying cause of some of these pathologies in humans. We also showed that some genes from the miss weak V set, although only exhibiting a mild shape change when deleted, have cytoskeletal defects. This suggests that genes with this deletion phenotype will be a good source of new genes affecting the actin and microtubule cytoskeleton.

Figure 4. Cytological analysis of novel cell shape mutants. Examples of viable mutants from four cell shape phenotype categories analysed for defects in the cytoskeleton, growth pattern or cell wall. (a) Wild-type. (b) meu29Δ. (c) yaf9Δ. (d) spc2Δ. (e) tlg2Δ. Panel rows: wild-type, rounded, curved, stubby, miss weak; panel columns: DIC, CF, actin, tubulin; scale bars, 5 µm. DIC, differential interference contrast; CF, calcofluor used to stain the cell wall and septum. For details, see the electronic supplementary material 2 and electronic supplementary material 1, table S13.

A limited range of defined cell shapes was identified in the genome-wide gene deletion screen. It appears that only a restricted range of cell shapes is possible for the fission yeast cell, perhaps reflecting topological constraints in cellular organization, related to the cell wall or the cytoskeleton, for example. Future comparisons with other organisms that have been screened for cell shape defects [10,11,13] will help identify the genes and processes required to generate or maintain eukaryotic cell shape.

Our work provides the first near genome-wide sets of gene deletions that influence the eukaryotic cell cycle and cell shape. This qualitative classification of genes according to cell shape phenotypes, based on deletion mutants, provides a resource that will be a good starting point for further studies in fission yeast and for the identification of equivalent gene functions in other eukaryotic organisms.

5. Material and methods

5.1. Phenotype analysis of the genome-wide set of gene deletions

The 4843 deletion strain collection used for this analysis consists of 4825 strains described by Kim et al. [7] plus 18 additional gene deletion mutants as shown in the electronic supplementary material 1, table S2. Changes in gene dispensability from Kim et al. [7] for nine reconstructed strains and 12 re-analysed strains are shown in the electronic supplementary material 1, table S3. All growth conditions and media were used as described by Moreno et al. [44], unless otherwise stated. Spores were generated as described for gene dispensability analysis [7]. All strains were coded, and a blind analysis was conducted. Between two and four isolates for each heterozygous diploid deletion mutant were independently sporulated. Following free spore analysis, the phenotypes of both G418-resistant deletion spores and G418-sensitive WT spores were examined visually 1 and 2 days after plating on non-selective YES plates at 25°C and 32°C. The presence of both G418-sensitive and -resistant spores allowed a comparison of the deletion mutant with WT. Any phenotypic differences from WT that could be detected by eye were described as the putative deletion phenotype. After 2 days, colonies were replica plated onto YES plates containing G418 (Sigma) at 25°C and 32°C to confirm the gene deletion phenotype by linkage to the G418-resistant phenotype.

The final deletion phenotype categories for a genome-wide set of genes were generated as follows. A GO analysis of the long group was compared with a GO analysis of the subdivisions long HP, long LP and long Br. These subdivisions formed biologically significant subgroups of the long group and so these three categories were used for further analysis. The same type of GO analysis was used for the rounded and stubby groups as these formed biologically distinct categories, although there is also overlap between the phenotypes of these two categories. The misshapen group was divided into essential and non-essential genes, because viable mutants may be more useful for identifying genes required for cell shape, whereas the misshapen phenotype observed in the essential group may be less specific for a cell shape defect given that the cells are dying or dead. The remaining categories WT, spores, germination, skittle, curved and small could not be usefully further subdivided by their phenotype.

To estimate the minimal cell length increase detectable, we measured the cell length of 34 viable gene deletion mutants described as long after visual screening on plates and which had not previously been implicated in the cell cycle (see the electronic supplementary material 1, table S1 column J). We could detect cells that were at least 10 per cent longer than WT (approx. 15.6 µm), and so a 10 per cent or greater increase in cell length was used as the criterion for a long phenotype. The cut-off between high penetrance and low penetrance was 30 per cent long cells. This was estimated using inviable mutants that formed microcolonies showing a mixture of long and WT/short cells. For these mutants, 30 per cent or less of the cells had a long phenotype.
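The two thresholds in this paragraph can be written down as a small classifier. The following is only a minimal sketch, not the procedure used in the screen, which was visual: the function names and the "not long" label for mutants with no long cells are assumptions made for illustration, and the wild-type length is the approximate value of 15.6 µm quoted above.

```python
WT_LENGTH_UM = 15.6         # approximate wild-type length quoted in the text
LONG_FOLD_THRESHOLD = 1.10  # a cell counts as long if at least 10 per cent longer than WT
PENETRANCE_CUTOFF = 0.30    # 30 per cent long cells or fewer counts as low penetrance


def is_long(cell_length_um, wt_length_um=WT_LENGTH_UM):
    """Return True if a single cell meets the 10 per cent length criterion."""
    return cell_length_um >= wt_length_um * LONG_FOLD_THRESHOLD


def penetrance_class(cell_lengths_um, wt_length_um=WT_LENGTH_UM):
    """Classify a mutant from the measured lengths of its cells.

    The labels are hypothetical shorthand: "long HP" (high penetrance),
    "long LP" (low penetrance) and "not long" (no long cells observed).
    """
    if not cell_lengths_um:
        raise ValueError("at least one cell length is required")
    n_long = sum(is_long(length, wt_length_um) for length in cell_lengths_um)
    long_fraction = n_long / len(cell_lengths_um)
    if long_fraction == 0:
        return "not long"
    return "long LP" if long_fraction <= PENETRANCE_CUTOFF else "long HP"
```

For example, a mutant in which 2 of 10 measured cells exceed the length threshold would fall in the low-penetrance class under these assumptions.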

To validate this approach, we compared the 513 genes from the three long categories with 158 cell cycle genes reported in PomBase as long. We identified 147/158 (93%) of these genes, suggesting that the remainder of the genes in our long category are also likely to be involved in the cell cycle. Furthermore, only 46/513 were not annotated to a GO category previously linked to the cell cycle.

Cells showing different cell shape defects were photographed using a Zeiss Axioskop microscope with a CF plan ×50/0.55 objective and a Panasonic DMC-LX2 camera. Spores from representative strains were plated on to YES solid medium and allowed to germinate or form small colonies before being photographed.

5.2. Screen for new DNA checkpoint genes

The growth of 2983 viable deletion mutants from Bioneer version 1 was screened on YES agar plates for 24–48 h either in the presence of 2.75 or 5.5 mM HU or without HU and scored on a scale of strong (+++), medium (++), weak (+) or no sensitivity, depending on their ability to grow on different HU concentrations compared with no HU (see the electronic supplementary material 1, table S12). To check whether the 132 HU-sensitive mutants were also involved in the DNA checkpoint preventing mitosis, cells were grown in liquid cultures with 11 mM HU and screened for a cut phenotype using 4′,6-diamidino-2-phenylindole (DAPI) to visualize the nucleus.

5.3. Cell length measurements

Cells were grown to mid-exponential growth (2 × 10⁶ to 1 × 10⁷ cells ml⁻¹) in YES liquid medium at 32°C (or 25°C where appropriate) and photographed using a Zeiss Axioplan microscope with 100× objective and a COHU CCD camera. Cell lengths of 30 septated cells were measured using IMAGEJ.

5.4. Phenotypic analysis and cytoskeleton analysis of viable shape mutants

For the initial characterization, cells were grown at 18°C, 25.5°C, 29°C and 34°C on minimal and YES medium plates, and cell morphology analysed by differential interference contrast (DIC) microscopy. For further characterization, strains showing morphological defects were grown in liquid rich medium at 25°C to mid-log phase or in the conditions at which each strain showed the strongest phenotype by DIC microscopy. Septa and pattern of cell growth were visualized with 35 µg ml⁻¹ calcofluor staining (fluorescent brightener; Sigma). For WT cells, the septation index was 15 per cent (n = 500 cells) and 30.1 per cent of cells showed monopolar growth (n = 300 cells). Nuclei were visualized with 0.2 µg ml⁻¹ DAPI (Sigma) or 100 µm ml⁻¹ IP (Sigma) staining. Actin staining was as described by Pelham & Chang [45] using AlexaFluor 488-phalloidin (Molecular Probes). For anti-tubulin immunofluorescence, cells were fixed in methanol at −80°C and further processed as described by Hagan & Hyams [46]. Primary antibodies were anti-tubulin (TAT-1; 1 : 80 dilution) followed by Alexa 488 goat anti-mouse secondary antibody (Molecular Probes). Microscopy was performed at 23–25°C, either with an Axioplan 2 microscope (Carl Zeiss, Inc.) equipped with a CoolsnapHQ camera (Roper Scientific) or with a Leica TCS SL confocal microscope. Data were acquired using the 100× objective taking seven z-sections with 0.5 µm spacing.

5.5. Bioinformatics analysis

5.5.1. Identification of genes already implicated in mitotic cell cycle processes

To identify genes involved in the mitotic cell cycle in fission yeast, we used fission yeast GO data from 26 September 2011 (http://www.pombase.org/), and selected the set of protein coding genes annotated to:

GO:0000278 mitotic cell cycle,
GO:0000910 cytokinesis,
GO:0006261 DNA-dependent DNA replication, and
GO:0000075 cell cycle checkpoint.

Minor adjustments were made to this dataset to remove three known false positives (SPAC343.17c, SPBC19F8.02, SPAC5D6.08c) and add three known false negatives (SPAC23H4.11c, SPAC26A3.03c, SPAC23H4.18c). The complete list is provided in the electronic supplementary material 1, table S11b.
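The selection and adjustment steps described here amount to a few set operations. The sketch below is illustrative only: the function name and the shape of the `annotations` input (a mapping from systematic gene ID to the set of GO IDs the gene is annotated to, assumed to already include ancestor terms) are assumptions, and only the GO IDs and the six systematic IDs come from the text.

```python
# GO terms used in the text to define genes already implicated in the mitotic cell cycle.
CELL_CYCLE_TERMS = {
    "GO:0000278",  # mitotic cell cycle
    "GO:0000910",  # cytokinesis
    "GO:0006261",  # DNA-dependent DNA replication
    "GO:0000075",  # cell cycle checkpoint
}

# Manual adjustments listed in the text.
FALSE_POSITIVES = {"SPAC343.17c", "SPBC19F8.02", "SPAC5D6.08c"}
FALSE_NEGATIVES = {"SPAC23H4.11c", "SPAC26A3.03c", "SPAC23H4.18c"}


def known_cell_cycle_genes(annotations):
    """Return the adjusted set of genes annotated to any of the four terms.

    `annotations` is a hypothetical input: dict of gene ID -> set of GO IDs,
    assumed to be propagated up the ontology beforehand.
    """
    selected = {
        gene for gene, terms in annotations.items() if terms & CELL_CYCLE_TERMS
    }
    return (selected - FALSE_POSITIVES) | FALSE_NEGATIVES
```

Removing the false positives before adding the false negatives keeps the result correct even if a gene were ever to appear in both adjustment lists by mistake, in which case it would be retained.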

5.5.2. Gene Ontology enrichment analysis

GO enrichments were performed using GO term finder (http://go.princeton.edu/cgi-bin/GOTermFinder with ontology and annotations from 26 September 2011). Threshold p-values of 0.001, 0.01 and 0.1 were used to identify specific enrichments. Bonferroni correction was used.

GO slim categories presented in tables 2, 3 and 4 refer to the GO IDs in the electronic supplementary material 1, table S14. All GO terms and p-values for each phenotype set are provided in the electronic supplementary material 1, tables S6 and S7. Genes where number of annotations = 1 were obtained using GO term Mapper (http://go.princeton.edu/cgi-bin/GOTermMapper).

The background set was the 4843 gene set used in this study.
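To make the logic of an enrichment test against a fixed background concrete, the following sketch computes a Bonferroni-corrected p-value per term. It assumes a one-sided hypergeometric test, a common choice for this kind of analysis; it is not the GO term finder implementation, and the function names and input shapes are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(n, K)
    # Sum exact integer counts first, then divide once to limit rounding error.
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)


def bonferroni(p_value, n_tests):
    """Bonferroni correction: multiply by the number of tests, capped at 1."""
    return min(1.0, p_value * n_tests)


def enrichment(study, term_to_genes, background):
    """Return {term: corrected p-value} for a study set against a background.

    `term_to_genes` is a hypothetical mapping of GO ID -> annotated gene IDs.
    Genes outside the background are ignored, mirroring the use of the
    screened gene set as the background.
    """
    background = set(background)
    study = set(study) & background
    n, N = len(study), len(background)
    results = {}
    for term, genes in term_to_genes.items():
        annotated = set(genes) & background
        k = len(study & annotated)
        p = hypergeom_upper_tail(k, n, len(annotated), N)
        results[term] = bonferroni(p, len(term_to_genes))
    return results
```

A corrected value can then be compared with whichever threshold is in use (0.001, 0.01 or 0.1 in the text).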

5.5.3. Comparison with human cell cycle genes

A list of 1351 human cell cycle genes was extracted from a study by Kittler et al. [12] and mapped to current Ensembl IDs (79 of the human identifiers had been retired and were no longer linked to extant genes). The remainder were mapped to 521 fission yeast orthologues using Ensembl Compara 10/11/2011 (http://genome.cshlp.org/content/19/2/327.long).
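The comparison can be pictured as a mapping step followed by an overlap count. The sketch below is an illustration under stated assumptions: the orthologue table is a plain dictionary from human gene ID to a set of fission yeast gene IDs (a stand-in for an Ensembl Compara export), and all function names and identifiers are hypothetical.

```python
def orthologous_pairs(human_genes, human_to_yeast):
    """List (human, yeast) pairs for human genes that have yeast orthologues."""
    return [
        (human, yeast)
        for human in sorted(human_genes)
        for yeast in sorted(human_to_yeast.get(human, ()))
    ]


def shared_cell_cycle_pairs(pairs, yeast_cell_cycle_genes):
    """Keep the pairs whose yeast member is also a cell cycle gene."""
    return [(human, yeast) for human, yeast in pairs if yeast in yeast_cell_cycle_genes]


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)
```

With the counts reported in the Discussion, `percent(156, 521)` gives 29.9, matching the quoted overlap.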

6. Acknowledgements

This work was supported by Cancer Research UK, the Wellcome Trust, Breast Cancer Research Foundation, KRIBB, and NRF grants (nos. 2012M3A9D1054666 and 2011-0016688) from the Korea Ministry of Science, ICT & Future Planning (MSIP). We are very grateful to Midori Harris, Mark McDowall, Kim Rutherford and Juan-Juan Li for providing help with the analysis. We are also indebted to all members of the Cell Cycle Laboratory, particularly Francisco Navarro, for reading the manuscript and for helpful suggestions. There are no conflicts of interest resulting from this work.

References

1. Gomez EB, Forsburg SL. 2004 Analysis of the fission yeast Schizosaccharomyces pombe cell cycle. Methods Mol. Biol. 241, 93–111. (doi:10.1385/1-59259-646-0:93)

2. Snell V, Nurse P. 1994 Genetic analysis of cell morphogenesis in fission yeast: a role for casein kinase II in the establishment of polarized growth. EMBO J. 13, 2066–2074.

3. Verde F, Mata J, Nurse P. 1995 Fission yeast cell morphogenesis: identification of new genes and analysis of their role during the cell cycle. J. Cell Biol. 131, 1529–1538. (doi:10.1083/jcb.131.6.1529)

4. Mitchison JM. 1957 The growth of single cells. I. Schizosaccharomyces pombe. Exp. Cell Res. 13, 244–262. (doi:10.1016/0014-4827(57)90005-8)

5. Toda T, Umesono K, Hirata A, Yanagida M. 1983 Cold-sensitive nuclear division arrest mutants of the fission yeast Schizosaccharomyces pombe. J. Mol. Biol. 168, 251–270. (doi:10.1016/S0022-2836(83)80017-5)

6. Nurse P, Thuriaux P, Nasmyth K. 1976 Genetic control of the cell division cycle in the fission yeast Schizosaccharomyces pombe. Mol. Gen. Genet. 146, 167–178. (doi:10.1007/BF00268085)

7. Kim DU et al. 2010 Analysis of a genome-wide set of gene deletions in the fission yeast Schizosaccharomyces pombe. Nat. Biotechnol. 28, 617–623. (doi:10.1038/nbt.1628)

8. Giaever G et al. 2002 Functional profiling of the Saccharomyces cerevisiae genome. Nature 418, 387–391. (doi:10.1038/nature00935)

9. Winzeler EA et al. 1999 Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis. Science 285, 901–906. (doi:10.1126/science.285.5429.901)

10. Kiger AA, Baum B, Jones S, Jones MR, Coulson A, Echeverri C, Perrimon N. 2003 A functional genomic analysis of cell morphology using RNA interference. J. Biol. 2, 27. (doi:10.1186/1475-4924-2-27)

11. King IN, Qian L, Liang J, Huang Y, Shieh JT, Kwon C, Srivastava D. 2011 A genome-wide screen reveals a role for microRNA-1 in modulating cardiac cell polarity. Dev. Cell 20, 497–510. (doi:10.1016/j.devcel.2011.03.010)

12. Kittler R et al. 2007 Genome-scale RNAi profiling of cell division in human tissue culture cells. Nat. Cell Biol. 9, 1401–1412. (doi:10.1038/ncb1659)

13. Liu T, Sims D, Baum B. 2009 Parallel RNAi screens across different cell lines identify generic and cell type-specific regulators of actin organization and cell morphology. Genome Biol. 10, R26. (doi:10.1186/gb-2009-10-3-r26)

14. Mukherji M et al. 2006 Genome-wide functional analysis of human cell-cycle regulators. Proc. Natl Acad. Sci. USA 103, 14 819–14 824. (doi:10.1073/pnas.0604320103)


15. Neumann B et al. 2010 Phenotypic profiling of the human genome by time-lapse microscopy reveals cell division genes. Nature 464, 721–727. (doi:10.1038/nature08869)

16. Gray FC, MacNeill SA. 2000 The Schizosaccharomyces pombe rfc3+ gene encodes a homologue of the human hRFC36 and Saccharomyces cerevisiae Rfc3 subunits of replication factor C. Curr. Genet. 37, 159–167. (doi:10.1007/s002940050514)

17. Hofmann JF, Beach D. 1994 cdt1 is an essential target of the Cdc10/Sct1 transcription factor: requirement for DNA replication and inhibition of mitosis. EMBO J. 13, 425–434.

18. Nishitani H, Nurse P. 1995 p65cdc18 plays a major role controlling the initiation of DNA replication in fission yeast. Cell 83, 397–405. (doi:10.1016/0092-8674(95)90117-5)

19. Reynolds N, Fantes PA, MacNeill SA. 1999 A key role for replication factor C in DNA replication checkpoint function in fission yeast. Nucleic Acids Res. 27, 462–469. (doi:10.1093/nar/27.2.462)

20. Saka Y, Fantes P, Sutani T, McInerny C, Creanor J, Yanagida M. 1994 Fission yeast cut5 links nuclear chromatin and M phase regulator in the replication checkpoint control. EMBO J. 13, 5319–5329.

21. Navarro FJ, Nurse P. 2012 A systematic screen reveals new elements acting at the G2/M cell cycle control. Genome Biol. 13, R36. (doi:10.1186/gb-2012-13-5-r36)

22. Esakova O, Krasilnikov AS. 2011 Of proteins and RNA: the RNase P/MRP family. RNA 16, 1725–1747. (doi:10.1261/rna.2214510)

23. Jorgensen P, Nishikawa JL, Breitkreutz B-J, Tyers M. 2002 Systematic identification of pathways that couple cell growth and division in yeast. Science 297, 395–400. (doi:10.1126/science.1070850)

24. Wood V et al. 2012 PomBase: a comprehensive online resource for fission yeast. Nucleic Acids Res. 40, D695–D699. (doi:10.1093/nar/gkr853)

25. Jong AY, Campbell JL. 1984 Characterization of Saccharomyces cerevisiae thymidylate kinase, the CDC8 gene product. General properties, kinetic analysis, and subcellular localization. J. Biol. Chem. 259, 14 394–14 398.

26. Fernandez Sarabia MJ, McInerny C, Harris P, Gordon C, Fantes P. 1993 The cell cycle genes cdc22+ and suc22+ of the fission yeast Schizosaccharomyces pombe encode the large and small subunits of ribonucleotide reductase. Mol. Gen. Genet. 238, 241–251.

27. Stark C et al. 2011 The BioGRID interaction database: 2011 update. Nucleic Acids Res. 39, D698–D704. (doi:10.1093/nar/gkq1116)

28. Beltraminelli N, Murone M, Simanis V. 1999 The S. pombe zfs1 gene is required to prevent septation if mitotic progression is inhibited. J. Cell Sci. 112, 3103–3114.

29. Bjorklund M, Taipale M, Varjosalo M, Saharinen J, Lahdenpera J, Taipale J. 2006 Identification of pathways regulating cell size and cell-cycle progression by RNAi. Nature 439, 1009–1013. (doi:10.1038/nature04469)

30. Sonnichsen B et al. 2005 Full-genome RNAi profiling of early embryogenesis in Caenorhabditis elegans. Nature 434, 462–469. (doi:10.1038/nature03353)

31. Enoch T, Carr AM, Nurse P. 1992 Fission yeast genes involved in coupling mitosis to completion of DNA replication. Genes Dev. 6, 2035–2046. (doi:10.1101/gad.6.11.2035)

32. Al-Khodairy F, Fotou E, Sheldrick KS, Griffiths DJ, Lehmann AR, Carr AM. 1994 Identification and characterization of new elements involved in checkpoint and feedback controls in fission yeast. Mol. Biol. Cell 5, 147–160.

33. Parrilla-Castellar ER, Arlander SJ, Karnitz L. 2004 Dial 9-1-1 for DNA damage: the Rad9-Hus1-Rad1 (9-1-1) clamp complex. DNA Repair (Amst.) 3, 1009–1014. (doi:10.1016/j.dnarep.2004.03.032)

34. Yanagida M. 1998 Fission yeast cut mutations revisited: control of anaphase. Trends Cell Biol. 8, 144–149. (doi:10.1016/S0962-8924(98)01236-7)

35. Bondar T, Mirkin EV, Uckers DS, Walden WE, Mirkin SM, Raychaudhuri P. 2003 Schizosaccharomyces pombe Ddb1 is functionally linked to the replication checkpoint pathway. J. Biol. Chem. 278, 37 006–37 014. (doi:10.1074/jbc.M303003200)

36. Han TX et al. 2010 Global fitness profiling of fission yeast deletion strains by barcode sequencing. Genome Biol. 11, R60. (doi:10.1186/gb-2010-11-6-r60)

37. Wiley DJ et al. 2008 Bot1p is required for mitochondrial translation, respiratory function, and normal cell morphology in the fission yeast Schizosaccharomyces pombe. Eukaryot. Cell 7, 619–629. (doi:10.1128/EC.00048-07)

38. Yada T et al. 2001 Its8, a fission yeast homolog of Mcd4 and Pig-n, is involved in GPI anchor synthesis and shares an essential function with calcineurin in cytokinesis. J. Biol. Chem. 276, 13 579–13 586.

39. Rollason R et al. 2009 A CD317/tetherin-RICH2 complex plays a critical role in the organization of the subapical actin cytoskeleton in polarized epithelial cells. J. Cell Biol. 184, 721–736. (doi:10.1083/jcb.200804154)

40. Grantham J, Brackley KI, Willison KR. 2006 Substantial CCT activity is required for cell cycle progression and cytoskeletal organization in mammalian cells. Exp. Cell Res. 312, 2309–2324. (doi:10.1016/j.yexcr.2006.03.028)

41. Weir BA, Yaffe MP. 2004 Mmd1p, a novel, conserved protein essential for normal mitochondrial morphology and distribution in the fission yeast Schizosaccharomyces pombe. Mol. Biol. Cell 15, 1656–1665. (doi:10.1091/mbc.E03-06-0371)

42. Fu C et al. 2011 mmb1p binds mitochondria to dynamic microtubules. Curr. Biol. 21, 1431–1439. (doi:10.1016/j.cub.2011.07.013)

43. Nunnari J, Suomalainen A. 2012 Mitochondria: in sickness and in health. Cell 148, 1145–1159. (doi:10.1016/j.cell.2012.02.035)

44. Moreno S, Klar A, Nurse P. 1991 Molecular genetic analysis of fission yeast Schizosaccharomyces pombe. Methods Enzymol. 194, 795–823. (doi:10.1016/0076-6879(91)94059-L)

45. Pelham Jr RJ, Chang F. 2001 Role of actin polymerization and actin cables in actin-patch movement in Schizosaccharomyces pombe. Nat. Cell Biol. 3, 235–244. (doi:10.1038/35060020)

46. Hagan IM, Hyams JS. 1988 The use of cell division cycle mutants to investigate the control of microtubule distribution in the fission yeast Schizosaccharomyces pombe. J. Cell Sci. 89, 343–357.



Submitted Publication

12. A method for increasing expressivity of Gene Ontology

annotations using a compositional approach.

METHODOLOGY ARTICLE Open Access

A method for increasing expressivity of Gene Ontology annotations using a compositional approach

Rachael P Huntley¹, Midori A Harris², Yasmin Alam-Faruque¹, Judith A Blake³, Seth Carbon⁴, Heiko Dietze⁴, Emily C Dimmer¹, Rebecca E Foulger¹, David P Hill³, Varsha K Khodiyar⁵, Antonia Lock², Jane Lomax¹, Ruth C Lovering⁵, Prudence Mutowo-Meullenet¹, Tony Sawford¹, Kimberly Van Auken⁶, Valerie Wood² and Christopher J Mungall⁴*

Abstract

Background: The Gene Ontology project integrates data about the function of gene products across a diverse range of organisms, allowing the transfer of knowledge from model organisms to humans, and enabling computational analyses for interpretation of high-throughput experimental and clinical data. The core data structure is the annotation, an association between a gene product and a term from one of the three ontologies comprising the GO. Historically, it has not been possible to provide additional information about the context of a GO term, such as the target gene or the location of a molecular function. This has limited the specificity of knowledge that can be expressed by GO annotations.

Results: The GO Consortium has introduced annotation extensions that enable manually curated GO annotations to capture additional contextual details. Extensions represent effector–target relationships such as localization dependencies, substrates of protein modifiers and regulation targets of signaling pathways and transcription factors as well as spatial and temporal aspects of processes such as cell or tissue type or developmental stage. We describe the content and structure of annotation extensions, provide examples, and summarize the current usage of annotation extensions.

Conclusions: The additional contextual information captured by annotation extensions improves the utility of functional annotation by representing dependencies between annotations to terms in the different ontologies of GO, external ontologies, or an organism’s gene products. These enhanced annotations can also support sophisticated queries and reasoning, and will provide curated, directional links between many gene products to support pathway and network reconstruction.

Keywords: Gene Ontology, Functional annotation, Annotation extension, Manual curation

Huntley et al. BMC Bioinformatics 2014, 15:155
http://www.biomedcentral.com/1471-2105/15/155

* Correspondence: [email protected] Berkeley National Laboratory, Genomics Division, Berkeley, CA 94720, USA
Full list of author information is available at the end of the article

© 2014 Huntley et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Background

Comprehensive representation of the roles of gene products, individually and in combination, is essential to the understanding and modeling of biological systems. In addition to a gene product’s intrinsic activity, aspects of the context in which it acts, such as the gene products it acts upon, subcellular location of the activity, distribution in cell or tissue types, or temporal restrictions to a cell cycle phase or developmental stage, must be described in order to obtain a full description of its biological role.

The Gene Ontology (GO) is a bioinformatics resource that uses structured controlled vocabularies (ontologies) to describe the molecular functions or activities of a gene product, the biological processes in which a gene product is involved and the cellular components in which a gene product is located. Associations or ‘annotations’ can be made between ontology terms and specific genes or gene products using a variety of manual or algorithmic methods that rely upon experimental evidence or sequence similarity, for example, to support the assertion [1-4].

While the ontological rigor of the GO vocabularies has been enriched over the years through the use of expressive formalisms that permit logical reasoning and interaction with external ontologies [5], the annotations themselves have, until now, remained simple declarative statements. Each GO annotation is essentially a pair, combining a single gene product with a single GO term, plus supporting metadata such as the evidence for the association [6]. Furthermore, although any gene product can be associated with many GO terms, and likewise any GO term can be used to annotate any number of gene products, the annotations thus coded remain independent. The simplicity of this core GO annotation model has facilitated the population of large annotation datasets, but it has also left the model unable to capture the interconnections between multiple annotations to multiple genes, limiting the granularity and connectivity of the information that could be captured. Figure 1 illustrates this by showing a subset of Molecular Function and Cellular Component annotations to several gene products, including microsomal glutathione transferase 1 [7]. While the annotations can describe which activities the gene products can perform, and in which components they are located, there is no way of combining this information to convey which activities are performed in which locations.

Guidelines for pre-composing ontology terms

In the core GO annotation model, adding new terms to the ontology, or “pre-composing” terms, has traditionally captured additional biological detail. However, we have set limits on how specific terms may be differentiated from one another. For example, generally we do not add new terms for activities or processes that are identical apart from which specific genes or gene products they affect.

To illustrate this, consider the two terms ‘regulation of transcription from RNA polymerase II promoter’ (GO:0006357) and ‘regulation of Sonic hedgehog transcription from RNA polymerase II promoter’. The second term would not be added to the Biological Process ontology because the only difference between it and its parent is the target of the regulation; the core process represented by the term is mechanistically no different from analogous processes governing transcription of other genes. Although for the purpose of understanding the biology it is important to capture information about gene products that specifically regulate the transcription of Sonic hedgehog, it would not be practical to create a specific transcription regulation term in the Biological Process ontology for every regulation target in a genome.

We also want to avoid pre-composing GO terms that combine many concepts, or whose term label is very long, making it difficult for humans to easily interpret their meaning.

To enable curators to create flexible, meaningful GO annotations at the time of annotation that represent a more complete picture of gene product roles in their biological context, we have introduced annotation extensions to the GO annotation model. Curators can add detail to GO annotations using controlled vocabularies (either GO or external ontologies, such as Cell Type Ontology (CL) [8]; Uber Anatomy Ontology (Uberon) [9] or Plant Ontology (PO) [10]) and biological entities such as genes or their products. GO annotations with extensions thus incorporate an increased level of detail and biological integration, supporting more sophisticated querying and analysis. We have applied this model to the curation of gene products from species such as mouse, human and fission yeast and are proceeding to implement it throughout the GO Consortium.

Here we describe how annotation extensions have been incorporated into the GO annotation system, summarize the relationship types we use for extensions, and provide examples of how extensions can be displayed and applied, using a corpus of annotations we have developed.

Results

Extending basic annotations with relationships

We extended the core GO annotation model to accommodate annotation extensions. The annotation extension model is described formally in terms of the Web Ontology Language (OWL) in the ‘Methods’ section. Conceptually, we take existing GO terms such as ‘protein kinase activity’ (GO:0004672) or ‘nucleus’ (GO:0005634) and describe a more specific subtype through the use of one or more formal relationships to other entities (such as the protein that is the target of the kinase, or the cell type which the nucleus is a part of). This is logically equivalent to creating a new term for the subtype in the ontology.

An extended annotation is an annotation to a GO term followed by one or more relational expressions (extensions). Each relational expression is written as Relation(Entity), where Relation is a label denoting a relationship type, and Entity is an identifier for a database object or ontology term. Each such expression can be thought of as refining the core GO term used. For example, the Entity identifier for ‘keratinocyte’ (CL:0000312) from the Cell Type Ontology (CL) can be combined with the Relation ‘part_of’ to create the expression “part_of(CL:0000312)”, which, when combined with the GO term ‘nucleus’ (GO:0005634), describes a gene product that localizes to the nucleus of a keratinocyte.
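The Relation(Entity) notation can be pictured as a small data structure. The following is a minimal sketch, not part of any GO Consortium tool: the class names `RelationalExpression` and `ExtendedAnnotation` and the gene product identifier `EXAMPLE:gene1` are hypothetical, and joining expressions with a comma anticipates the convention given in the ‘Methods’ section.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class RelationalExpression:
    """One Relation(Entity) pair, e.g. part_of(CL:0000312)."""
    relation: str  # label of the relationship type
    entity: str    # identifier of an ontology term or database object

    def __str__(self) -> str:
        return f"{self.relation}({self.entity})"


@dataclass
class ExtendedAnnotation:
    """A core annotation (gene product + GO term) plus optional extensions."""
    gene_product: str
    go_term: str
    extensions: List[RelationalExpression] = field(default_factory=list)

    def extension_field(self) -> str:
        # Expressions refining the same annotation are joined with ","
        return ",".join(str(e) for e in self.extensions)


# 'nucleus' (GO:0005634) refined to the nucleus of a keratinocyte (CL:0000312).
# "EXAMPLE:gene1" is a placeholder identifier, not a real gene product.
annotation = ExtendedAnnotation(
    gene_product="EXAMPLE:gene1",
    go_term="GO:0005634",
    extensions=[RelationalExpression("part_of", "CL:0000312")],
)
print(annotation.extension_field())
```

An annotation with an empty `extensions` list corresponds to an annotation in the core model.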

Huntley et al. BMC Bioinformatics 2014, 15:155 Page 2 of 11 http://www.biomedcentral.com/1471-2105/15/155

Relations

We created an application ontology that extends the OBO (Open Biomedical Ontologies) Relations Ontology (RO) [11] with a set of relations created explicitly for use in GO annotation extensions. These were selected and defined for practical use by iterative discussion among curators, and are collected in a file maintained in OBO format [12], and also available in OWL format [13]. To enable curators to select the appropriate relation we have created a graphical web view (Figure 2; [14]), and organized relations into subsets by entity type (for example, all relations where a chemical entity can be specified are grouped in the ‘chemical’ subset).

The set of relations used falls into two broad categories: molecular relations, which take an entity such as a gene, gene product, complex or chemical as an argument; and contextual relations, which take an entity such as a cell type, anatomy term, developmental stage or a GO term as an argument. Table 1 lists the most frequently used annotation extension relations with examples of their usage.

Figure 1 Representation of annotations made using the core GO annotation model. Gene products can be annotated to several GO terms, and any GO term can be used to annotate any number of gene products, but the annotations remain independent. The stars indicate an annotation of a gene product to the GO term and each colour represents a single gene product. Using this simple GO annotation model, it is not clear from the annotations shown in which Cellular Component each of the protein activities are performed. For example, microsomal glutathione S-transferase 1 is represented by the red star and can perform two activities: glutathione transferase activity and glutathione peroxidase activity. It is found in three Cellular Components, the mitochondrion, endoplasmic reticulum and peroxisomal membrane, but from these annotations the knowledge that the glutathione transferase activity is performed in the mitochondrion [7] cannot be found. For clarity not all annotations of each gene product are shown nor all terms between the specified terms and the root node.

Entities

Identifiers used for the entities in annotation extensions can reference GO or another ontology or database. Each identifier must have a prefix found in the GO Database Abbreviations file [15], for example “UniProtKB” (protein database) [16], “CHEBI” (chemical database) [17], “CL” (cell type ontology) [8], “Uberon” (metazoan anatomy ontology) [9], or “PO” (plant anatomy ontology) [10]. A gene product identifier used in an annotation extension should be interpreted in the context of the primary GO term used. For example, the inclusion of the gene identifier SGD:S000004660 in the annotation extension field associated with the GO term ‘protein phosphorylation’ should be interpreted as “the protein product of SGD:S000004660 is phosphorylated”.

Combining multiple extensions

In this new system, a single GO annotation can have multiple relational expressions associated with it, where each expression uses a single relation and a single entity. Multiple expressions using the same relation are permitted. For example, if a gene product can carry out its activity in multiple locations or during various processes, multiple Relation(Entity) pairs may be added as separate annotation extensions.

To illustrate, consider a gene product that has its activity in a neuron of the hippocampus. Here it would be appropriate to make an extension combining two expressions for both the cell type (neuron) and the gross anatomical structure (hippocampus). If this gene product also had the same activity in an epithelial cell, this expression could be combined in the same annotation extension.

A description of the semantics of multiple extensions used in the annotation extension model can be found in the ‘Methods’ section.

Figure 2 Graphical web view of the annotation extension relations. (a) A graphical view of the relations was created to assist curators in selecting the appropriate relation for curation [14]; (b) A user can zoom into a particular area of the graph; (c) Relations can be clicked to view information, such as which GO terms the relation can be used with and which identifiers may be used with the relation.

Table 1 Most commonly used relationships for annotation extension statements and examples of their usage

Contextual relationships   Example (gene product; primary GO term; annotation extension)
part_of                    C. elegans psf-1; nucleus; part_of(WBbt:0006804 body wall muscle cell)
occurs_in                  Mouse opsin-4; G-protein coupled photoreceptor activity; occurs_in(CL:0000740 retinal ganglion cell)
happens_during             S. pombe wis4; stress-activated MAPK cascade; happens_during(GO:0071470 cellular response to osmotic stress)

Molecular relationships    Example (gene product; primary GO term; annotation extension)
has_regulation_target      Human suppressor of fused homolog SUFU; negative regulation of transcription factor import into nucleus; has_regulation_target(UniProtKB:P08151 zinc finger protein GLI1)
has_input                  S. pombe rlf2; protein localization to nucleus; has_input(PomBase:SPAC26H5.03 pcf2)
has_direct_input           Human WNK4; chloride channel inhibitor activity; has_direct_input(UniProtKB:Q7LBE3 Solute carrier family 26 member 9)

Molecular relations take an entity such as a gene, gene product, complex or chemical as an argument; contextual relations take an entity such as a cell type, anatomy term, developmental stage or a GO term as an argument. Entity names shown after the identifiers are included for clarity and are not part of the annotation extension format.

Annotation extensions in curation to specify molecular targets

Schizosaccharomyces pombe protein Nep1 illustrates how annotation extensions can be used to represent the multiple targets of a gene product’s enzymatic activity. Nep1 is a protease that can deneddylate proteins modified by Nedd8 [18]. It has been shown to deneddylate three cullin proteins, Cul1, Cul3 and Pcu4 (Figure 3a).

Using the core GO annotation model described above, it was not possible to record the cullins as the targets of the deneddylation activity of Nep1. The annotation would be:

‘NEDD8-specific protease activity’ (GO:0019784) with the evidence code ‘Inferred from Mutant Phenotype’ (IMP) (Figure 3b).

Using annotation extensions the annotation can be enriched as follows: Nep1 is annotated to the term ‘NEDD8-specific protease activity’ (GO:0019784) with the evidence code IMP, and with several Relation(Entity) pairs specifying the gene product targets of the activity:

has_direct_input(PomBase:SPAC17G6.12)|has_direct_input(PomBase:SPAC24H6.03)|has_direct_input(PomBase:SPAC3A11.08)

(See ‘Methods’ section for a description of the semantics used in the annotation extension model).

This annotation means that a nep1 mutant phenotype (IMP evidence in [18]) indicates that Nep1 executes NEDD8-specific protease activity and can deneddylate Cul1 (SPAC17G6.12), Cul3 (SPAC24H6.03) and Pcu4 (SPAC3A11.08) (Figure 3c). We use the relation has_direct_input here with a Molecular Function term to indicate the effector–substrate relationship between the gene product and its target protein. The PomBase display of the Nep1 annotation is shown in Figure 4; note that the annotation extension relation names have been translated to more human-readable text [19].

Annotation extensions in curation to specify locational context

To illustrate how annotation extensions may be used to specify locational context, we use the example of the rat signaling complex subunit, mAKAP. mAKAP has been shown by immunocytochemical assay to be located on the nuclear envelope of cardiomyocytes [20].

With the core annotation model, we are only able to capture the cellular compartment in which mAKAP is located: ‘nuclear envelope’ (GO:0005635) with the evidence code ‘Inferred from Direct Assay’ (IDA).

Using the annotation extension model, we can capture the cellular and anatomical context of the location of mAKAP: mAKAP is annotated to the term ‘nuclear envelope’ (GO:0005635) with the evidence code IDA and with two Relation(Entity) pairs specifying the cell and tissue locations of the nuclear envelope:

part_of(CL:0002495), part_of(UBERON:0002082)

This annotation means that a direct assay (immunocytochemical assay in [20]) has shown that rat mAKAP is located at the nuclear envelope of ‘fetal cardiomyocytes’ (CL:0002495) of the ‘cardiac ventricle’ (UBERON:0002082).

Figure 3 The deneddylation activity of S. pombe Nep1. (a) The experimental data reported in [18] is interpreted as: Nep1 is capable of deneddylating the cullins Cul1, Cul3 and Pcu4. (b) and (c) Graphical representation of Nep1 annotations using (b) the core GO annotation model and (c) the extended GO annotation model.

Interconversion of core GO annotations and annotation extensions

As annotation extensions are a relatively new feature of GO curation, we have described and implemented methods that allow legacy tools (i.e. those that do not have support for extensions in their data models) to use extended annotations without loss of specificity [21]. We have also implemented reverse methods that allow the conversion of basic GO annotations to extended annotations. We informally call these methods ‘folding’ and ‘unfolding’ respectively, and these make use of the OWL formalization of the GO (see ‘Methods’ section for details). The folding operation creates a new application ontology on the fly, with each extended annotation materializing a new GO term. An OWL reasoner is used to automatically construct the graph in this new ontology. Application of this method can be seen as a stopgap to allow continued use of existing tools; the resulting application ontology, whilst logically complete, may be unwieldy for querying and browsing. The unfolding method takes annotations to existing highly specific GO terms, and replaces them with an annotation to a more basic GO term, with the equivalent additional information now expressed as extensions. Unfolding annotations is useful for reducing the complexity of GO terms when querying or browsing.

Discussion

Practical application of annotation extensions

Several member groups of the GO Consortium are now producing extended annotations to enrich their dataset. A summary of the numbers of extended annotations categorized by species is shown in Table 2.

Currently there are few applications, databases or browsers that make use of, or display, extended annotations. In addition to their inclusion in the annotation files, extended annotations are currently displayed in the GO Consortium browser, AmiGO 2 [22] and on the PomBase gene information pages [23] and there are plans to display them in UniProt-GOA’s GO browser, QuickGO [24], and on WormBase gene pages [25]. Outside of the GO Consortium, Ensembl Genomes [26] now display annotation extensions for S. pombe genes and these can be used for querying annotation sets in the Ensembl Fungi BioMart [27].

Figure 4 Display of annotation extension data in PomBase for S. pombe Nep1 gene product. Annotation of the observation that Nep1 deneddylates the three cullins Cul1, Cul3 and Pcu4 [18] requires one annotation with three separate expressions in the annotation extension. Note that more human-readable text has been substituted for the annotation extension relation names for display purposes in PomBase. The underlying data retain the relation names, and the mapping between relation names and display text is available on the PomBase website [19].

Table 2 Extended annotations categorized by species

Species                     Total no. manual annotations   No. extended annotations   % extended annotations
Mus musculus                409098                         25209                      6.2
Homo sapiens                219258                         9042                       4.1
Saccharomyces cerevisiae    53750                          2713                       5.0
Schizosaccharomyces pombe   29049                          1902                       6.5
Caenorhabditis elegans      27488                          1102                       4.0
Arabidopsis thaliana        101936                         503                        0.5
Rattus norvegicus           72280                          477                        0.7
Escherichia coli            11658                          426                        3.7
Dictyostelium discoideum    19278                          228                        1.2
Drosophila melanogaster     109886                         214                        0.2

The number of extended annotations is shown compared to the total number of manual annotations for each species. Calculated with the statistics from the UniProt-GOA database [3] on 21 November 2013.

As extension data becomes more widely available, querying for functional information can become more sophisticated. Users of the GO will be able to query the annotations for a wealth of specific information, including connections between a gene product and other entities and processes, or the locations (at the subcellular level as well as cell and tissue types) where a gene product performs specific roles. For example, a user could query for all targets of a particular protein kinase, or compose a more specific query to find all the proteins that are involved in blood vessel remodeling during retina vasculature development in the camera-type eye. Annotation extensions capturing effector-target relationships at the cellular level will provide a rich source of directional information for regulatory network reconstruction. For instance, the has_input and has_direct_input relations can be used to connect signal transducing components of signaling pathways or to link DNA binding regulatory transcription factors with their specific target genes. The inherent directionality encoded in the extension can also be used to increase the information content of existing interaction-based networks. Annotation extensions can also assist with improving the interpretations of pathway analysis. Currently pathway analysis, which uses methods such as term enrichment and pathway topology, is hampered by the lack of functional annotation with associated contextual aspects such as cell or tissue type or dependencies on other gene products or substances [28]. GO has the potential to enable great advances in pathway analysis by providing this contextual information in annotation extensions.
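As a rough illustration of a target query of this kind, the sketch below collects the entities linked to one gene product through has_input or has_direct_input. The record layout and the function name `targets_of` are assumptions made for this example and do not correspond to an existing GO tool; the Nep1 records reuse the identifiers given in the Results, and `EXAMPLE:geneX` is a placeholder.

```python
from typing import List, Sequence, Tuple

# One record: (gene product, GO term, [(relation, entity), ...]).
Record = Tuple[str, str, List[Tuple[str, str]]]

RECORDS: List[Record] = [
    # Nep1 targets as listed in the Results section.
    ("nep1", "GO:0019784", [("has_direct_input", "PomBase:SPAC17G6.12")]),
    ("nep1", "GO:0019784", [("has_direct_input", "PomBase:SPAC24H6.03")]),
    ("nep1", "GO:0019784", [("has_direct_input", "PomBase:SPAC3A11.08")]),
    # Placeholder record, included only to show that other genes are skipped.
    ("EXAMPLE:geneX", "GO:0005634", [("part_of", "CL:0000312")]),
]


def targets_of(
    gene_product: str,
    records: Sequence[Record],
    relations: Sequence[str] = ("has_input", "has_direct_input"),
) -> List[str]:
    """Collect entities linked to a gene product by input-type relations."""
    found = set()
    for gene, _go_term, extensions in records:
        if gene != gene_product:
            continue
        for relation, entity in extensions:
            if relation in relations:
                found.add(entity)
    return sorted(found)


print(targets_of("nep1", RECORDS))
```

Because each extension names the target explicitly, the result is directional: it lists what Nep1 acts on, not merely what it interacts with.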

Pre- vs. post-composition of GO terms

As described above, increased specificity of GO annotations has historically been achieved by adding new, more specific ontology terms. However, new term addition cannot accommodate every detail that would be desirable to capture in GO annotations.

Using annotation extensions to increase annotation specificity is logically equivalent to creating new terms in the ontology (see ‘Methods’ section), but allows a more streamlined approach for information capture at the time of annotation. Extended annotations can be ‘folded’ to create a logical equivalent of a GO term, regardless of whether the term is included in the ontology. GO terms that are included in the ontology are said to be ‘pre-composed’, whereas the combination of terms and annotation extensions effectively ‘post-compose’ a term. It is also possible to perform the inverse and ‘unfold’ pre-composed GO terms into the equivalent extended annotation expression (see ‘Methods’ section). Whether terms are pre- or post-composed during the annotation process is thus not critical because it is possible to interconvert seamlessly between the two. Identical information can thus be captured by either of two routes, creation of a new pre-composed term or during the recording of an annotation.

Although many details captured in annotation extensions will remain outside the scope of GO terms indefinitely, GO developers will investigate systems by which annotation extensions can be automatically converted to pre-composed terms when certain criteria are met, for example where a certain number of annotations have identical extensions and the pre-composed term is in scope. The new terms will be added to the ontology using logical definitions that make them equivalent to the post-compositional annotation. Annotations made previously using post-composition can be processed to the new pre-composed terms.

In the future, maintaining a good balance between pre- and post-composition will be assisted by automated methods to reason over annotations enhanced with annotation extensions to ensure the annotations are consistently grouped by an appropriate common ancestor GO term.

Impact on users of Gene Ontology annotation

The GO Consortium will provide annotation extension data as unfolded annotations, i.e. in the annotation files, the annotation extension will be kept in a separate field to the primary annotation. Consumers of annotation data can therefore choose to be unaffected by annotation extensions by simply ignoring the additional field. However, we do hope that users and tool developers will incorporate the extensions into their tools and workflows to provide additional specificity to their queries and tools. For example, a term enrichment tool provider might provide an option to fold the annotation extensions into pre-composed terms before a user performs term enrichment. A GO browser could be extended to include an option to search folded annotation extensions as well as regular GO terms, e.g. it would be possible to search for all gene products that are involved in epithelial cell differentiation, whether the cell type was curated using the specific GO term or in the annotation extension with the more general GO term ‘cell differentiation’. A basic query for a GO term will necessarily find the annotations to that term (and its child terms) with and without extensions; the user may choose whether or not to use the extension data.

We encourage users and tool developers to contact us with specific questions so we can assist them with using this data.

Future developments

A longer-term goal of the GO Consortium is to link annotations together to fully describe the directionality and dependencies in a whole pathway or process. Although annotation extensions are not sufficient to represent complete biological pathways, they provide a valuable set of data that future work can build upon. A more expressive annotation system is now under development within the GO Consortium, which will allow curators to join annotations sourced from different publications and with different supporting evidence to describe entire pathways or sub-processes. The annotation extensions currently being captured will feed directly into the new modular annotation system [29].

Conclusions

GO annotation extensions have been introduced to enhance the depth and utility of annotation data by capturing specific contextual information regarding a gene product’s function or location. Curators can now create, on-the-fly, complex GO annotations that describe dependencies and consequences of a gene product’s function or location more completely than was previously possible. Data curated using annotation extensions provides a repository for experimentally verified regulation targets for a wide range of gene products, including transcription factors and microRNAs, information that is currently not captured by other standardized annotation approaches. A large corpus of annotations now makes use of annotation extensions, and this number is growing rapidly as groups make use of powerful curation tools such as UniProt-GOA’s Protein2GO [30]. Extensive annotation enhancement makes GO data more informative for a biologist’s understanding of a gene or process of interest, and provides additional value to the data which can be used by GO analysis tool providers to enhance the interpretation of high-throughput datasets, such as those created by next generation sequencing, transcriptomic and proteomic studies.

Methods

Annotation Extension Model

Annotation extensions are a means of dynamically referring to subtypes of existing GO terms, by means of sets of relation-value pairs, connected via either “and” or “or” operators (represented in GO annotation files using “,” and “|”, respectively).

We present a formal treatment of the GO annotation extension model and the syntax used to write extensions. This formal underpinning is necessary to clarify the semantics of annotation extensions and to enable the use of automated reasoners to perform useful computations. However, the details of the formal underpinnings can be hidden in tools used by curators and end-users, and instead presented in intuitive ways.

Formalization

We formalize the annotation extension model in terms of Description Logics, and in particular the Web Ontology Language (OWL) [31]. The GO is already heavily axiomatized in OWL [32]. In the core GO annotation model, an annotation is an association between a gene or gene product G and an OWL Class C. C is restricted to be a class from one of the three sub-ontologies of the GO: Molecular Function, Biological Process, or Cellular Component. The meaning of the association varies depending on which of these three sub-ontologies is used. There are a number of ways of formalizing this in OWL; however, we do not provide details here as this is not in the scope of the extensions provided in this manuscript.

The GO annotation extension model is formally a relaxation of the core model, in that it allows the annotation to be to any OWL Class Expression that conforms to the following profile.

ClassExpression ::= Class | ObjectIntersectionOf(Class RelationalExpression+)

RelationalExpression ::= ObjectSomeValuesFrom(ObjectProperty Class)

For a description of the constructs used in the above, please see the OWL2 syntax and semantics document [33]. The main language constructs used are (1) intersections, which are interpreted as set-intersection; (2) existential restrictions (“some values from”), which correspond to standard relationships such as those found in the GO; and (3) object properties, also known as relations.

It can be seen that annotation extensions form a subset of the EL++ profile [34], which thus allows the use of fast reasoners such as Elk [35]. This is important for the GO, which contains large numbers of annotations.

One consequence of this model is that the external entities, being related, must be modelled as OWL classes rather than OWL individuals. In practice this is not a limitation, as molecular entities such as proteins are typically modelled as classes [36].

Syntax

Annotation extensions can be expressed in a backwards and forwards compatible extension to existing exchange formats such as Gene Association Format (GAF); GAF 2.0 extends GAF 1.0 by providing an additional column (position 16) in which to write a set of relational expressions, as defined above. This column is optionally filled with a disjunctive expression conforming to the following Backus Normal Form (BNF) grammar:

AnnotationExtension ::= RelationalExpressionConjunction { "|" RelationalExpressionConjunction }

RelationalExpressionConjunction ::= RelationalExpression { "," RelationalExpression }

RelationalExpression ::= RelationSymbol "(" ClassID ")"
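The grammar is small enough to parse with string splitting. The function below is a minimal sketch rather than the parser used by GO Consortium software: the name `parse_extension` is hypothetical, and it assumes that class identifiers never contain “,”, “|”, parentheses or whitespace.

```python
import re
from typing import List, Tuple

RelExpr = Tuple[str, str]  # (relation symbol, class ID)

# RelationalExpression ::= RelationSymbol "(" ClassID ")"
_REL_EXPR = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\(\s*([^()\s]+)\s*\)\s*$")


def parse_extension(text: str) -> List[List[RelExpr]]:
    """Parse an annotation extension field.

    Returns a list of conjunctions; each conjunction is a list of
    (relation, class_id) pairs. "|" separates conjunctions and ","
    separates the expressions within one conjunction.
    """
    if not text.strip():
        return []  # the column is optional and may be empty
    result = []
    for conjunction in text.split("|"):
        expressions = []
        for part in conjunction.split(","):
            match = _REL_EXPR.match(part)
            if match is None:
                raise ValueError(f"not a RelationalExpression: {part!r}")
            expressions.append((match.group(1), match.group(2)))
        result.append(expressions)
    return result


# Two expressions in one conjunction (the mAKAP example from the Results).
print(parse_extension("part_of(CL:0002495), part_of(UBERON:0002082)"))
```

Returning a list of lists keeps the two operators apart: the outer list holds the “|”-separated alternatives and each inner list holds the “,”-separated expressions that apply together.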


A disjunction is equivalent to multiple independent annotations each consisting of a conjunctive expression.

The conjunctive expression is translated to an OWL intersection expression whose elements are the main GO class being annotated together with all relational expressions in the conjunction. Each relational expression is translated to an OWL existential restriction (“some values from”). The RelationSymbol is translated to an Object Property from the Relations Ontology, and the ClassID is translated to an OWL class, both according to the mapping provided in the OBO format document [37]. To precisely specify the semantics of multiple extensions in output files, the annotation formats provided by the GO Consortium force the use of either the comma character (“,”) or the pipe character (“|”) to separate each expression, where the comma indicates conjunction (AND) and the pipe indicates disjunction (OR).

For example, an annotation to the term ‘nuclear envelope’ (GO:0005635) with an extension field filled with:

part_of(CL:0002495), part_of(UBERON:0002082)

(where the CL identifier denotes cardiomyocyte and the Uberon identifier denotes cardiac ventricle) is translated to be an annotation to the OWL class expression:

GO_0005635 and (BFO_0000050 some CL_0002495) and (BFO_0000050 some UBERON_0002082)

(where the BFO (Basic Formal Ontology) identifier denotes the part_of relation).

These expressions can be used by OWL reasoners to return guaranteed valid and complete answers to queries such as “find all annotations to classes that are part of a cell nucleus and part of a heart”.

The syntax does not allow nesting of expressions, but the use of parentheses in the grammar allows for the introduction of nesting in the future.
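The translation of one conjunction into a class expression can be sketched as follows. This is an illustration under stated assumptions, not the OWLTools implementation: the function names are hypothetical, the identifier rewrite (replacing the colon with an underscore) is a simplification of the mapping in the OBO format document [37], and the relation table contains only the part_of entry given in the text.

```python
from typing import List, Tuple

# Relation label -> object property ID. Only part_of is given in the text;
# further entries would have to come from the Relations Ontology.
RELATION_IDS = {"part_of": "BFO_0000050"}


def to_owl_id(prefixed_id: str) -> str:
    """Rewrite a prefixed ID such as GO:0005635 as GO_0005635."""
    prefix, local = prefixed_id.split(":", 1)
    return f"{prefix}_{local}"


def class_expression(go_term: str, conjunction: List[Tuple[str, str]]) -> str:
    """Build 'term and (relation some entity) and ...' for one conjunction."""
    parts = [to_owl_id(go_term)]
    for relation, entity in conjunction:
        parts.append(f"({RELATION_IDS[relation]} some {to_owl_id(entity)})")
    return " and ".join(parts)


print(class_expression(
    "GO:0005635",
    [("part_of", "CL:0002495"), ("part_of", "UBERON:0002082")],
))
```

A disjunctive extension field would be handled by calling `class_expression` once per conjunction, giving one independent annotation each.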

Property Chains

The set of object properties used can be primitive relations (such as part_of, occurs_in or regulates) or relations defined via an object property chain. This effectively allows for a limited level of nesting in the annotated OWL class expression, extending the profile described above to:

RelationalExpression ::= ObjectSomeValuesFrom(ObjectProperty ClassOrRelationalExpression)

ClassOrRelationalExpression ::= Class | RelationalExpression

For example, if a relation expression of regulates_occurs_in(CL:0000540) is used, this is equivalent to an OWL class expression

regulates some (occurs_in some CL_0000540)

based on the definition regulates_occurs_in <-> regulates o occurs_in.

These chains can be expanded in user-views; for example, AmiGO 2 will show the expression above as “regulates . occurs_in : neuron”.
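Expanding a chained relation into its nested form can be sketched in a few lines. The table `PROPERTY_CHAINS` and the function `expand` are hypothetical names for this illustration, and the table holds only the one chain defined in the text.

```python
from typing import Dict, List

# Chained relation -> ordered list of primitive relations (outermost first).
PROPERTY_CHAINS: Dict[str, List[str]] = {
    "regulates_occurs_in": ["regulates", "occurs_in"],
}


def expand(relation: str, entity: str) -> str:
    """Return the nested existential restriction for a relation and entity."""
    chain = PROPERTY_CHAINS.get(relation, [relation])
    # Start from the innermost restriction and wrap outwards.
    expression = f"{chain[-1]} some {entity}"
    for outer in reversed(chain[:-1]):
        expression = f"{outer} some ({expression})"
    return expression


print(expand("regulates_occurs_in", "CL_0000540"))
```

Primitive relations pass through unchanged, so the same function serves both kinds of relation.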

Automated validation using reasoning

We use the Elk reasoner to reason over annotation class expressions in order to make sure they are logically coherent according to constraints encoded in the OWL version of the GO, the Relations Ontology (RO; [11]), and external ontologies. For example, an annotation to a nonsense class expression that contains occurs_in some apoptosis is flagged because the reasoner computes that this expression is unsatisfiable, due to the constraint that the range of occurs_in is a continuant (i.e. non-process).

We also use reasoning to automatically deepen annotations to class expressions to the Most Specific Class (MSC) in the ontology. For example, if a gene product is annotated to ‘postsynaptic density’ (GO:0014069) and has the extension field filled with “part_of(CL:0000127)”, this is directly translated to the class expression ‘postsynaptic density’ and part_of some astrocyte, which is inferred to have the MSC GO:0097483 (‘glial cell postsynaptic density’) based on equivalence axioms in the GO [5]: ‘glial cell postsynaptic density’ EquivalentTo ‘postsynaptic density’ and part_of some ‘glial cell’, and the axiom ‘astrocyte’ SubClassOf ‘glial cell’ inferred from the Cell Type Ontology.

These reasoner checks and deepening procedures are performed by the GO Continuous Integration server [38]. We translate Gene Association Files into OWL using OWLTools [39].
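The deepening step can be imitated on a toy scale without a reasoner. The sketch below is a deliberately simplified stand-in for what Elk computes: it uses class labels instead of identifiers, knows only the two axioms quoted in the example, assumes a single-parent hierarchy, and returns the first matching named class rather than searching for the most specific one. All names in it are hypothetical.

```python
from typing import Dict, Set, Tuple

# Toy subclass links from the example in the text (child -> parent).
SUBCLASS_OF: Dict[str, str] = {"astrocyte": "glial cell"}

# Equivalence axioms: named class -> (genus, relation, filler).
EQUIVALENT_TO: Dict[str, Tuple[str, str, str]] = {
    "glial cell postsynaptic density": ("postsynaptic density", "part_of", "glial cell"),
}


def self_and_ancestors(cls: str) -> Set[str]:
    """Return the class together with everything above it in SUBCLASS_OF."""
    seen = {cls}
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        seen.add(cls)
    return seen


def deepen(term: str, relation: str, filler: str) -> str:
    """Return a named class that subsumes 'term and relation some filler'.

    Falls back to the original term when no equivalence axiom applies.
    """
    for named, (genus, rel, fill) in EQUIVALENT_TO.items():
        if (
            genus in self_and_ancestors(term)
            and rel == relation
            and fill in self_and_ancestors(filler)
        ):
            return named
    return term


print(deepen("postsynaptic density", "part_of", "astrocyte"))
```

The match succeeds because astrocyte is a subclass of glial cell, so the annotated expression falls under the definition of the named class.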

Annotation folding and unfolding procedure

We define a process of annotation folding that takes as input the GO plus a set of supporting ontologies together with a set of extended annotations and generates as output an additional ontology plus a set of basic annotations, where the input and output are logically equivalent [21]. For each extended annotation a to a term t and extension expression e, we replace this with an annotation a’ to a term tA, where tA is added to the application ontology, with an equivalence axiom tA EquivalentTo (t and e). A fast OWL reasoner such as Elk is used to automatically classify the application ontology. The completeness of the classification is related to the proportion of classes in the core GO ontology that have equivalence axioms.
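The bookkeeping part of folding can be sketched as below. This is an illustration only: it mints identifiers with a made-up `FOLDED:` prefix, treats the extension expression as an opaque string, and leaves out the reasoner classification step entirely. The gene labels in the usage lines are placeholders apart from mAKAP, which reuses the example from the Results.

```python
from typing import Dict, List, Tuple

# (gene product, GO term t, extension expression e; "" when there is none)
ExtendedAnnotation = Tuple[str, str, str]


def fold(
    annotations: List[ExtendedAnnotation],
) -> Tuple[List[Tuple[str, str]], Dict[str, str]]:
    """Replace each (t, e) pair by a generated term tA.

    Returns the basic annotations and the equivalence axioms
    {tA: "t and e"} that would go into the application ontology.
    """
    generated: Dict[Tuple[str, str], str] = {}  # (t, e) -> tA
    basic: List[Tuple[str, str]] = []
    for gene, term, extension in annotations:
        if not extension:
            basic.append((gene, term))  # already a basic annotation
            continue
        key = (term, extension)
        if key not in generated:
            # Hypothetical ID scheme; identical (t, e) pairs share one term.
            generated[key] = f"FOLDED:{len(generated) + 1:07d}"
        basic.append((gene, generated[key]))
    axioms = {new: f"{t} and {e}" for (t, e), new in generated.items()}
    return basic, axioms


basic, axioms = fold([
    ("mAKAP", "GO_0005635",
     "(BFO_0000050 some CL_0002495) and (BFO_0000050 some UBERON_0002082)"),
    ("EXAMPLE:geneX", "GO_0005634", ""),
])
print(basic)
print(axioms)
```

In a real pipeline the generated axioms would be loaded alongside the GO and classified so that each new term finds its place in the graph.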


The converse procedure of annotation unfolding takes as input the GO plus a set of supporting ontologies together with a set of basic annotations and generates as output a simplified GO plus a set of extended annotations. For each annotation a to a term t, if the term t has an equivalence axiom in the GO to an expression (t’ and e), where t’ is a GO term and e conforms to an extension expression, then replace a with a new annotation a’, where t is replaced by t’ and the extension field is filled with e.
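Unfolding is the mirror image and can be sketched in the same style. The axiom table below holds only the ‘glial cell postsynaptic density’ example from the validation section; the glial cell entity is written as a label because the text does not give its identifier, and the gene names are placeholders.

```python
from typing import Dict, List, Tuple

# Equivalence axioms in the GO: t -> (t', e).
EQUIVALENCE_AXIOMS: Dict[str, Tuple[str, str]] = {
    # 'glial cell postsynaptic density' EquivalentTo
    # 'postsynaptic density' and part_of some 'glial cell'
    "GO:0097483": ("GO:0014069", "part_of(glial cell)"),
}


def unfold(
    annotations: List[Tuple[str, str]],
    axioms: Dict[str, Tuple[str, str]],
) -> List[Tuple[str, str, str]]:
    """Rewrite basic annotations (gene, t) as (gene, t', e) where possible."""
    unfolded = []
    for gene, term in annotations:
        if term in axioms:
            parent, extension = axioms[term]
            unfolded.append((gene, parent, extension))
        else:
            unfolded.append((gene, term, ""))  # no axiom: leave unchanged
    return unfolded


print(unfold(
    [("EXAMPLE:geneA", "GO:0097483"), ("EXAMPLE:geneB", "GO:0005634")],
    EQUIVALENCE_AXIOMS,
))
```

Terms without an equivalence axiom pass through with an empty extension field, which matches the condition stated in the procedure.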

Curation procedures

Annotation extensions are created as part of the manual curation process [6]. This involves biological database curators reading full text, peer-reviewed articles to obtain information about gene product functions, the processes in which they are involved and their subcellular locations [1-4]. Curators choose GO terms that describe these aspects of a gene product and assign an evidence code that is appropriate for the type of supporting experiment or statement in the paper. The GO annotations and any annotation extension information are entered into the annotating groups’ curation tool for inclusion in their database and/or display on their website. On a periodic basis, each group submits their file(s) of annotations for display on the GO Consortium website [40] and ftp site [41].

Annotation extensions are formatted as Relation(Entity), where ‘Entity’ is an identifier in an ontology or database, expressed as ‘DB:ID’, in the current GO annotation file format (GAF2.0, column 16) [42] and in the new format Gene Product Association Data (GPAD, column 11) [43]. The DB prefix must be listed in the GO Database Abbreviations collection [15].
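As an illustration of the Relation(Entity) format, the following Python sketch parses the content of an annotation extension field into its parts. It rests on assumptions that the text above does not state: that a comma joins extensions which apply together and a pipe separates independent alternatives (our reading of the GAF 2.0 guide), and that relation names consist of letters, digits and underscores. The function name and the example string are hypothetical.

```python
import re
from typing import List, Tuple

# One extension expression: relation(DB:ID)
EXTENSION = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\(([^():\s]+):([^()\s]+)\)\s*$")


def parse_extension_field(field: str) -> List[List[Tuple[str, str, str]]]:
    """Parse an annotation extension field into groups of (relation, db, id).

    Assumed separators: '|' between independent alternatives,
    ',' between extensions that apply together.
    """
    if not field.strip():
        return []
    groups = []
    for alternative in field.split("|"):
        group = []
        for expression in alternative.split(","):
            match = EXTENSION.match(expression)
            if match is None:
                raise ValueError(f"not a Relation(DB:ID) expression: {expression!r}")
            group.append((match.group(1), match.group(2), match.group(3)))
        groups.append(group)
    return groups


# Illustrative field value, not taken from a real annotation file.
example = "part_of(CL:0000057),has_input(PomBase:SPBC11B10.09)|occurs_in(GO:0005634)"
parsed = parse_extension_field(example)
```

A production parser would additionally check each DB prefix against the GO Database Abbreviations collection, which this sketch does not attempt.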

Data availability and resources

Annotation extensions can be represented in the two GO Consortium-supported annotation formats, GAF 2.0 [42] and GPAD [43]. These files are housed on the Gene Ontology Consortium website [40].

Annotation extension data is available in AmiGO2 [22] and for S. pombe genes is additionally displayed on the PomBase gene pages [44] and in the Ensembl Fungi BioMart [27].

Further documentation on annotation extensions can be found on the GO Consortium website [45].

Abbreviations

BFO: Basic formal ontology; CHEBI: Chemical entities of biological interest; CL: Cell type ontology; GAF: Gene association file; GO: Gene ontology; GPAD: Gene product association data; IDA: Inferred from direct assay; IMP: Inferred from mutant phenotype; OBO: Open biomedical ontologies; OWL: Web ontology language; PO: Plant ontology; RO: Relations ontology; UBERON: Uber anatomy ontology.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

All authors were involved in initial discussions on implementation of annotation extension and relationships. CJM, MAH, RPH, VW, ED, REF, DPH, RCL, YA-F, PM-M and JL defined the set of annotation extension relations currently in use. RPH, MAH, YA-F, ED, REF, DPH, VKK, AL, RCL, PM-M, KV-A and VW contributed annotations with extensions. TS was responsible for the development of Protein2GO to allow for curation of annotation extensions and for the graphical visualization of annotation extension relations. RPH coordinated the writing of the paper. RPH, MAH, JAB, DPH, JL, RCL, CJM, KV-A and VW contributed to the writing of the paper. SJC is responsible for the development of AmiGO2. HD worked on the OWLTools code for folding/unfolding. All authors read and approved the final manuscript.

Acknowledgements

We thank Mark McDowall and Kim Rutherford for producing the PomBase annotation display. We also thank Mais Ammari, Rama Balakrishnan, Lionel Breuza, Leonardo Briganti, Fiona Broackes-Carter, Nancy Campbell, Karen Christie, Gayatri Chavali, Carol Chen, Maria Costanzo, Janos Demeter, Paul Denny, Robert Dodson, Harold Drabkin, Margaret Duesbury, Marine Dumousseau, Selina Dwight, Stacia Engel, Petra Fey, Dianna Fisk, Reija Hieta, Ursula Hinz, Marta Iannuccelli, Diane Inglis, Sruthi Jagannathan, Jyoti Khadake, Astrid Lagreid, Luana Licata, Paul Lloyd, Birgit Meldal, Anna Melidoni, Mila Milagros, Robert Nash, Li Ni, Sandra Orchard, Livia Perfetto, Pablo Porras Millan, Arathi Raghunath, Silvie Ricard-Blum, Bernd Roechert, Kim van Roey, Aleksandra Shypitsyna, Dmitry Sitnikov, Marek Skrzypek, Andre Stutz, Michael Tognolli and Edith Wong for additional contributions to the currently available annotation extension data set.

The Gene Ontology Consortium is supported by National Human Genome Research Institute (NHGRI) U41 grant HG22073 to PIs JA Blake, JM Cherry, S Lewis, PW Sternberg and P Thomas. This grant supports all authors except those listed hereafter. RC Lovering, VK Khodiyar and T Sawford: British Heart Foundation grants SP/07/007/23671 and RG/13/5/30112. MA Harris, V Wood and A Lock: Wellcome Trust grant WT090548MA. Y Alam-Faruque: Kidney Research UK [RP26/2008] and European Molecular Biology Laboratory core funding. T Sawford and P Mutowo-Muellenet: NIH grant 4U41HG006104-04 to UniProt. K Van Auken: US National Human Genome Research Institute [U41-HG002223] and British Medical Research Council [G070119]. The work performed by H Dietze, S Carbon and CJ Mungall was additionally supported by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. The article processing charge was funded by National Human Genome Research Institute (NHGRI) U41 grant HG22073.

Author details

1 European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
2 Department of Biochemistry, Cambridge Systems Biology Centre, University of Cambridge, Sanger Building, 80 Tennis Court Road, Cambridge CB2 1GA, UK.
3 The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA.
4 Lawrence Berkeley National Laboratory, Genomics Division, Berkeley, CA 94720, USA.
5 Centre for Cardiovascular Genetics, Institute of Cardiovascular Science, University College London, London, UK.
6 California Institute of Technology, Division of Biology 156-29, Pasadena, CA 91125, USA.

Received: 4 March 2014. Accepted: 15 May 2014. Published: 21 May 2014.

References

1. Li D, Berardini TZ, Muller RJ, Huala E: Building an efficient curation workflow for the Arabidopsis literature corpus. Database (Oxford) 2012, 2012:bas047.
2. Drabkin HJ, Blake JA: Manual Gene Ontology annotation workflow at the Mouse Genome Informatics Database. Database (Oxford) 2012, 2012:bas045.
3. Dimmer EC, Huntley RP, Alam-Faruque Y, Sawford T, O’Donovan C, Martin MJ, Bely B, Browne P, Mun Chan W, Eberhardt R, Gardner M, Laiho K, Legge D, Magrane M, Pichler K, Poggioli D, Sehra H, Auchincloss A, Axelsen K, Blatter M-C, Boutet E, Braconi-Quintaje S, Breuza L, Bridge A, Coudert E, Estreicher A, Famiglietti L, Ferro-Rojas S, Feuermann M, Gos A, et al: The UniProt-GO Annotation database in 2011. Nucleic Acids Res 2011, 40:D565–D570.


4. Pillai L, Chouvarine P, Tudor CO, Schmidt CJ, Vijay-Shanker K, McCarthy FM: Developing a biocuration workflow for AgBase, a non-model organism database. Database (Oxford) 2012, 2012:bas038.
5. Mungall CJ, Bada M, Berardini TZ, Deegan J, Ireland A, Harris MA, Hill DP, Lomax J: Cross-product extensions of the Gene Ontology. J Biomed Inform 2011, 44:80–86.
6. Balakrishnan R, Harris MA, Huntley R, Van Auken K, Cherry JM: A guide to best practices for Gene Ontology (GO) manual annotation. Database 2013, 2013:bat054.
7. Johansson K, Järvliden J, Gogvadze V, Morgenstern R: Multiple roles of microsomal glutathione transferase 1 in cellular protection: a mechanistic study. Free Radic Biol Med 2010, 49:1638–1645.
8. Meehan TF, Masci AM, Abdulla A, Cowell LG, Blake JA, Mungall CJ, Diehl AD: Logical development of the cell ontology. BMC Bioinforma 2011, 12:6.
9. Mungall CJ, Torniai C, Gkoutos GV, Lewis SE, Haendel MA: Uberon, an integrative multi-species anatomy ontology. Genome Biol 2012, 13:R5.
10. Avraham S, Tung C-W, Ilic K, Jaiswal P, Kellogg EA, McCouch S, Pujar A, Reiser L, Rhee SY, Sachs MM, Schaeffer M, Stein L, Stevens P, Vincent L, Zapata F, Ware D: The Plant Ontology Database: a community resource for plant structure and developmental stages controlled vocabulary and annotations. Nucleic Acids Res 2008, 36:D449–D454.
11. OBO Relations Ontology. http://obo-relations.googlecode.com.
12. GO Annotation Extension Relations OBO file. http://purl.obolibrary.org/obo/go/extensions/gorel.obo.
13. GO Annotation Extension Relations OWL file. http://purl.obolibrary.org/obo/go/extensions/gorel.owl.
14. GO Annotation Extension Relations graph. http://www.ebi.ac.uk/QuickGO/AnnotationExtensionRelations.html.
15. GO Database Abbreviations file. http://www.geneontology.org/doc/GO.xrf_abbs.
16. UniProt Consortium: Activities at the Universal Protein Resource (UniProt). Nucleic Acids Res 2014, 42:D191–D198.
17. Hastings J, de Matos P, Dekker A, Ennis M, Harsha B, Kale N, Muthukrishnan V, Owen G, Turner S, Williams M, Steinbeck C: The ChEBI reference database and ontology for biologically relevant chemistry: enhancements for 2013. Nucleic Acids Res 2013, 41:D456–D463.
18. Zhou L, Watts FZ: Nep1, a Schizosaccharomyces pombe deneddylating enzyme. Biochem J 2005, 389:307–314.
19. PomBase annotation extension relation display. http://www.pombase.org/documentation/gene-page-annotation-extension-relation-display.
20. Kapiloff MS, Jackson N, Airhart N: mAKAP and the ryanodine receptor are part of a multi-component signaling complex on the cardiomyocyte nuclear envelope. J Cell Sci 2001, 114:3167–3176.
21. Annotation Extension Folding. http://code.google.com/p/owltools/wiki/AnnotationExtensionFolding.
22. AmiGO 2. http://amigo2.berkeleybop.org/cgi-bin/amigo2/amigo.
23. Wood V, Harris MA, McDowall MD, Rutherford K, Vaughan BW, Staines DM, Aslett M, Lock A, Bähler J, Kersey PJ, Oliver SG: PomBase: a comprehensive online resource for fission yeast. Nucleic Acids Res 2012, 40:D695–D699.
24. QuickGO. http://www.ebi.ac.uk/QuickGO.
25. Yook K, Harris TW, Bieri T, Cabunoc A, Chan J, Chen WJ, Davis P, de la Cruz N, Duong A, Fang R, Ganesan U, Grove C, Howe K, Kadam S, Kishore R, Lee R, Li Y, Muller H-M, Nakamura C, Nash B, Ozersky P, Paulini M, Raciti D, Rangarajan A, Schindelman G, Shi X, Schwarz EM, Ann Tuli M, Van Auken K, Wang D, et al: WormBase 2012: more genomes, more data, new website. Nucleic Acids Res 2012, 40:D735–D741.
26. Kersey PJ, Lawson D, Birney E, Derwent PS, Haimel M, Herrero J, Keenan S, Kerhornou A, Koscielny G, Kähäri A, Kinsella RJ, Kulesha E, Maheswari U, Megy K, Nuhn M, Proctor G, Staines D, Valentin F, Vilella AJ, Yates A: Ensembl Genomes: extending Ensembl across the taxonomic space. Nucleic Acids Res 2010, 38:D563–D569.
27. Ensembl Fungi BioMart. http://fungi.ensembl.org/biomart/martview/.
28. Khatri P, Sirota M, Butte AJ: Ten years of pathway analysis: current approaches and outstanding challenges. PLoS Comput Biol 2012, 8:e1002375.
29. Thomas P: Building biological function modules from molecules to populations. http://viewvc.geneontology.org/viewvc/GO-SVN/trunk/experimental/lego/docs/PThomaslego-Whitepaper-2010-03.pdf.
30. Barrell D, Dimmer E, Huntley RP, Binns D, O’Donovan C, Apweiler R: The GOA database in 2009–an integrated Gene Ontology Annotation resource. Nucleic Acids Res 2009, 37:D396–D403.
31. OWL Web Ontology Language. http://www.w3.org/TR/owl-guide/.
32. Hill DP, Adams N, Bada M, Batchelor C, Berardini TZ, Dietze H, Drabkin HJ, Ennis M, Foulger RE, Harris MA, Hastings J, Kale NS, de Matos P, Mungall CJ, Owen G, Roncaglia P, Steinbeck C, Turner S, Lomax J: Dovetailing biology and chemistry: integrating the Gene Ontology with the ChEBI chemical ontology. BMC Genomics 2013, 14:513.
33. OWL2 syntax and semantics document. http://www.w3.org/TR/owl2-syntax/.
34. OWL 2 EL profile. http://www.w3.org/TR/owl2-profiles/#OWL_2_EL.
35. ELK reasoner. http://code.google.com/p/elk-reasoner/.
36. Natale DA, Arighi CN, Barker WC, Blake J, Chang T-C, Hu Z, Liu H, Smith B, Wu CH: Framework for a protein ontology. BMC Bioinformatics 2007, 8(Suppl 9):S1.
37. OBO format document. https://code.google.com/p/oboformat/.
38. Continuous Integration of Open Biological Ontology Libraries. http://bio-ontologies.knowledgeblog.org/405.
39. OWL Tools. http://code.google.com/p/owltools/wiki/OortGAFs.
40. Gene Ontology Consortium Annotation File Download. http://geneontology.org/GO.downloads.annotations.shtml.
41. GO Consortium Gene Association ftp Downloads. ftp://ftp.geneontology.org/pub/go/gene-associations/.
42. Gene Association File Format 2.0 guide. http://www.geneontology.org/GO.format.gaf-2_0.shtml.
43. Gene Product Association Data File Format. http://www.geneontology.org/GO.format.gpad.shtml.
44. PomBase website. http://www.pombase.org/.
45. Annotation Extension documentation. http://www.geneontology.org/GO.annotation.extension.shtml.

doi:10.1186/1471-2105-15-155
Cite this article as: Huntley et al.: A method for increasing expressivity of Gene Ontology annotations using a compositional approach. BMC Bioinformatics 2014 15:155.


Submitted Publication

13. PomBase 2015: updates to the fission yeast database.

D656–D661 Nucleic Acids Research, 2015, Vol. 43, Database issue
Published online 31 October 2014
doi: 10.1093/nar/gku1040

PomBase 2015: updates to the fission yeast database

Mark D. McDowall1,*, Midori A. Harris2, Antonia Lock3, Kim Rutherford2, Daniel M. Staines1, Jürg Bähler3, Paul J. Kersey1, Stephen G. Oliver2,* and Valerie Wood2,*

1 European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
2 Cambridge Systems Biology and Department of Biochemistry, University of Cambridge, Sanger Building, 80 Tennis Court Road, Cambridge, Cambridgeshire CB2 1GA, UK
3 Research Department of Genetics, Evolution and Environment, and UCL Cancer Institute, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK

Received September 12, 2014; Revised October 10, 2014; Accepted October 10, 2014

ABSTRACT

PomBase (http://www.pombase.org) is the model organism database for the fission yeast Schizosaccharomyces pombe. PomBase provides a central hub for the fission yeast community, supporting both exploratory and hypothesis-driven research. It provides users easy access to data ranging from the sequence level, to molecular and phenotypic annotations, through to the display of genome-wide high-throughput studies. Recent improvements to the site extend annotation specificity, improve usability and allow for monthly data updates. Both in-house curators and community researchers provide manually curated data to PomBase. The genome browser provides access to published high-throughput data sets and the genomes of three additional Schizosaccharomyces species (Schizosaccharomyces cryophilus, Schizosaccharomyces japonicus and Schizosaccharomyces octosporus).

INTRODUCTION

The fission yeast Schizosaccharomyces pombe is a unicellular eukaryote that has been used as a model organism for studying a diverse array of biological processes, from the cell cycle to signaling, for over 60 years (1). It was the sixth eukaryotic organism to have its genome completely sequenced (2). With a thriving community generating data from small and large-scale projects, a central hub to curate and integrate information is vital to facilitate data interpretation and hypothesis generation, and to guide further research.

PomBase (http://www.pombase.org) was launched in 2011 as the model organism database for fission yeast (3). The PomBase portal provides centralized access to gene- and genome-scale information, emphasizing data acquired by manual literature curation. In a novel community curation initiative, fission yeast researchers now contribute significantly to gene annotation, using the Canto online curation tool (4).

PomBase presents information in gene-specific pages that include summary data on each gene and its product, such as its biological functions, cellular localization, phenotype data, modifications, interactions, regulation and gene expression.

PomBase offers a customized Ensembl Genome browser (5) to provide access to the genome sequence and features, and to visualize high-throughput data sets in a genomic context.

BIOLOGICAL DATA

PomBase curators focus on extracting data from historical papers and on providing help and guidance to researchers who curate their own papers using Canto. The inclusion of genome-scale datasets has resulted in a large increase in the volume of data curated.

High-throughput datasets

PomBase gene pages typically include data from various types of large-scale experiments, such as gene expression data (6,7), phenotypic analysis (8,9) and interaction data (10). Within the genome browser, PomBase hosts sequence-based datasets from a variety of high-throughput experimental techniques, such as nucleosome positioning (11), transcriptomic data (see Figure 1A) (6,11–12), replication profiling (13), polyadenylation sites (14,15) and chromatin binding (16). The datasets included to date are those requested by the fission yeast community, and for which the publication authors have provided data to PomBase.

Additional species

In addition to S. pombe, the genomes of Schizosaccharomyces cryophilus, Schizosaccharomyces japonicus and Schizosaccharomyces octosporus (12) are now accessible via the genome browser, as the result of collaboration with the Ensembl Genomes project (17). All-against-all DNA alignments between the four Schizosaccharomyces species can also be displayed in the PomBase browser (see Figure 1B). The new Schizosaccharomyces species are among the 52 fungal species included in a protein multiple sequence alignment (see Figure 1C). S. pombe also represents the fission yeast clade in protein multiple sequence alignments of a further set of 178 species covering a broad taxonomic range including human and bacteria, which are visible in the genome browser.

Figure 1. Views available in the genome browser. (A) Region display with two tracks enabled, displaying RNA-Seq coverage data (12). The two tracks show the forward (top) and reverse (bottom) strand reads along with the genes that are mapped to the region. (B) Region comparison display showing the regions of alignment between Schizosaccharomyces pombe and Schizosaccharomyces japonicus (top) and Schizosaccharomyces octosporus (bottom) in green. (C) Gene tree view generated using the Compara framework, with the gene of interest, pom1 (SPAC2F7.03c), highlighted in red.

*To whom correspondence should be addressed. Tel: +44 1223 494589; Fax: +44 1223 494468; Email: [email protected]
Correspondence may also be addressed to Valerie Wood. Tel: +44 1223 746961; Fax: +44 1223 766002; Email: [email protected]
Correspondence may also be addressed to Stephen G. Oliver. Tel: +44 1223 333667; Fax: +44 1223 766002; Email: [email protected]

© The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

INFRASTRUCTURE AND PROCEDURAL IMPROVEMENTS

Manually curated data is stored within a Chado relational database (18). During the release procedure, a snapshot of the curation database is created and the annotations are imported into an Ensembl schema on a MySQL database. The Ensembl schema provides the back-end architecture for the genome browser and houses the annotations for the PomBase site.

The import pipeline that transfers data from the PomBase Chado curation database to the Ensembl database has been altered to accommodate increasing annotation complexity (as described below). Update procedures have been improved to implement data consistency checks on the database and to use the Selenium testing framework (http://www.seleniumhq.org) on the web interface. This more robust infrastructure has enabled PomBase to implement a monthly release cycle. Improved back-end data storage and retrieval has reduced the gene page loading time, even as the amount of data presented has increased, enabling users to navigate through multiple genes with minimal delays.

INTEGRATING AND VISUALIZING DATA

To maintain a readable display of increasingly complex data, and incorporate new data types, there have been major improvements to the organization and presentation of data on the gene pages. The most extensive changes have affected three key regions of the gene page: the displays of Gene Ontology (GO) (19,20) annotations, Fission Yeast Phenotype Ontology (FYPO) annotations (21) and gene expression data. More subtle changes have also been introduced throughout the gene pages.

Ontology annotations and extensions

The most significant change affecting annotation complexity in PomBase is the introduction of ‘annotation extensions’ that increase the expressivity of annotations to ontology terms. With active participation by PomBase curators, the GO Consortium introduced annotation extensions in 2013 (22) to enable curators to capture additional contextual details such as effector–target relationships and temporal or spatial aspects of biological processes. Whereas previously each GO annotation combined a single gene product with a single GO term, and was independent from any other GO annotations, extended annotations can capture interconnections between multiple annotations as well as links to additional ontologies.

Each annotation extension consists of a relation and an identifier that refers to another ontology term (GO, SO (23), ChEBI (24), PSI-MOD (25), etc.) or another gene. An annotation may have one or more extensions, each with its own relationships and sources, and ‘compound’ extensions can be made by combining single extensions.

To date, PomBase curators have added extensions to over 2000 GO annotations. PomBase has also adopted the annotation extension model for phenotype (FYPO) and gene expression annotations, as described below. Table 1 shows the total number of annotations in PomBase as of August 2014, and the number that have extensions, for four ontologies plus gene expression.

To accommodate annotation extensions, PomBase has adapted its Chado and Ensembl relational database schemata and loading procedures and enhanced the gene page ontology annotation displays. On PomBase gene pages, annotation extensions are shown in rows below the ontology term, with the relevant evidence code and annotation source. Identifiers and relation strings are converted to human-friendly text, such as a gene name or ontology term, wherever possible. For example, Figure 2A shows annotations to GO:0045944 from the ste11 (SPBC32C12.02) gene page. Annotations without extensions are displayed first, followed by those with extensions, and the bottom row shows a compound annotation extension (26).
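The display rules described above (extension-free annotations first, compound extensions last, identifiers and relations turned into readable text) can be sketched as follows. This is an illustrative Python outline only, not the PomBase web code: the relation names, the `EX:` identifier, the `REF:` sources and the label lookup are placeholders invented for the example.

```python
from typing import Dict, List, Tuple

Extension = Tuple[str, str]  # (relation, identifier)


def render_extension(ext: Extension, labels: Dict[str, str]) -> str:
    """Turn e.g. ('has_regulation_target', 'PomBase:SPBC11B10.09') into readable text."""
    relation, identifier = ext
    # Fall back to the raw identifier when no friendly label is known.
    return f"{relation.replace('_', ' ')} {labels.get(identifier, identifier)}"


def display_rows(annotations: List[dict], labels: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Order annotations for one ontology term and render each as a row.

    Rows without extensions come first and compound extensions last;
    sorted() is stable, so ties keep their input order.
    """
    ordered = sorted(annotations, key=lambda a: len(a["extensions"]))
    return [
        (
            ", ".join(render_extension(e, labels) for e in a["extensions"]),
            a["evidence"],
            a["source"],
        )
        for a in ordered
    ]


# Placeholder data: relation names, EX: identifier and REF: sources are hypothetical.
labels = {"PomBase:SPBC11B10.09": "cdc2", "EX:0000001": "example phase"}
annotations = [
    {"extensions": [("has_regulation_target", "PomBase:SPBC11B10.09"),
                    ("happens_during", "EX:0000001")],
     "evidence": "IMP", "source": "REF:2"},
    {"extensions": [], "evidence": "IDA", "source": "REF:1"},
    {"extensions": [("has_regulation_target", "PomBase:SPBC11B10.09")],
     "evidence": "IDA", "source": "REF:3"},
]
rows = display_rows(annotations, labels)
```

Sorting by the number of extensions is one simple way to obtain the ordering described in the text; the production display may well use a different rule.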

Phenotype annotations have also considerably increased both in number and complexity. FYPO annotations include allele details, expression level, experimental conditions, the evidence and source, and annotation extensions that represent penetrance and severity. Phenotype annotation extensions also capture specific genes used in assays for phenotypes such as protein localization or gene expression level. On each gene page, FYPO annotations are grouped by whether the phenotype is relevant at the level of a cell population or an individual cell, and display annotation extensions similarly to the GO tables. Figure 2B shows alleles of cdc2 (SPBC11B10.09) that have been annotated to ‘decreased protein binding’ (FYPO:0001645), affecting several other gene products.

Targets

A new table on the gene pages, ‘Target Of’, reports effects of other genes on the gene of interest, such as modification or regulation. ‘Target Of’ annotations are the reciprocal of GO and FYPO annotation extensions. The ‘Target Of’ display includes the relevant gene, a relationship and the annotation source. For example, cdc2 (SPBC11B10.09) is annotated as the substrate of csk1 (SPAC1D4.06c) protein kinase activity (27) and cdc25 (SPAC24H6.05) protein phosphatase activity (28).
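Because ‘Target Of’ entries are the reciprocal of annotation extensions, they can in principle be derived by inverting extensions whose identifier is a gene. The Python sketch below illustrates that idea with the cdc2 example from the text; the relation name `has_substrate`, its inverse label and the tuple layout are assumptions made for the illustration and do not describe the PomBase loading code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical mapping from an extension relation to its reciprocal wording.
INVERSE = {"has_substrate": "substrate of"}

# (annotated gene, ontology term name, relation, extension identifier, source)
ExtendedAnnotation = Tuple[str, str, str, str, str]


def target_of(
    annotations: Iterable[ExtendedAnnotation], gene_ids: Set[str]
) -> Dict[str, List[Tuple[str, str, str]]]:
    """Build a 'Target Of' table: target gene -> [(relationship, acting gene, source)]."""
    table: Dict[str, List[Tuple[str, str, str]]] = defaultdict(list)
    for gene, _term, relation, identifier, source in annotations:
        if identifier not in gene_ids:
            continue  # extension points at an ontology term, not at a gene
        relationship = INVERSE.get(relation, f"target of {relation}")
        table[identifier].append((relationship, gene, source))
    return dict(table)


# Systematic IDs are those quoted in the text; "ref 27"/"ref 28" stand for the cited papers.
genes = {"SPBC11B10.09", "SPAC1D4.06c", "SPAC24H6.05"}
extended = [
    ("SPAC1D4.06c", "protein kinase activity", "has_substrate", "SPBC11B10.09", "ref 27"),
    ("SPAC24H6.05", "protein phosphatase activity", "has_substrate", "SPBC11B10.09", "ref 28"),
    ("SPAC1D4.06c", "protein kinase activity", "occurs_in", "GO:0005634", "ref 27"),
]
targets = target_of(extended, genes)
```

Here the cdc2 entry lists csk1 and cdc25 as the acting genes, while the extension that refers to an ontology term is ignored.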

Gene expression

Quantitative gene expression data have been imported from two large datasets covering the expression of 3175 (7) and 7016 (6) gene products. Gene expression annotations may include extensions indicating that the expression level was measured in a specific phase of the cell cycle or under specific growth conditions. Further qualitative data have also been manually imported into PomBase from the literature. When available, this information is displayed on the gene pages, providing details of the experimental conditions, evidence, scale and source. Figure 2C shows the display of gene expression data for clr3 (SPBC800.03).

Figure 2. PomBase gene page views for example annotations: (A) GO (gene ste11 - SPBC32C12.02); (B) FYPO (gene cdc2 - SPBC11B10.09); (C) Gene expression (gene clr3 - SPBC800.03). These examples highlight the display of annotation extensions and their use within the context of different ontologies.

Data visualization in genome browser

In the PomBase genome browser, data tracks are presented with curated metadata, links to the relevant publication via the Europe PMC portal (29) and, where appropriate, links to external source databases. Users have the option of viewing their own data privately within the context of the genome browser, or submitting data to be hosted by PomBase for public viewing.


Table 1. Summary of annotations and extensions in PomBase as of September 2014

Curated data type                 Annotation count   Annotation extensions   Gene coverage
Gene expression                   40 403             40 403                  7017
Phenotype (FYPO)                  36 382             11 722                  4942
Gene Ontology (GO)                37 224             2149                    5301
Modifications (MOD)               11 265             7255                    2009
Protein sequence features (SO)    943                N/A                     764

Annotation count: total number of annotations of each type, including those with extensions; Annotation extensions: number of annotations that have one or more extensions apiece; Gene coverage: number of genes that have at least one annotation of the given type.
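To put the counts in Table 1 in proportion, the share of annotations that carry at least one extension can be computed directly from the table. The short Python calculation below uses only the figures in Table 1; the variable and function names are ours, and the annotations-per-gene figure is a simple ratio of two columns rather than a statistic reported in the paper.

```python
from typing import Optional

# (annotation count, annotations with extensions, gene coverage), copied from Table 1
TABLE_1 = {
    "Gene expression": (40403, 40403, 7017),
    "Phenotype (FYPO)": (36382, 11722, 4942),
    "Gene Ontology (GO)": (37224, 2149, 5301),
    "Modifications (MOD)": (11265, 7255, 2009),
    "Protein sequence features (SO)": (943, None, 764),  # extensions: N/A
}


def extension_fraction(count: int, extended: Optional[int]) -> Optional[float]:
    """Fraction of annotations that have one or more extensions (None if N/A)."""
    return None if extended is None else extended / count


for name, (count, extended, genes) in TABLE_1.items():
    fraction = extension_fraction(count, extended)
    shown = "N/A" if fraction is None else f"{100 * fraction:.1f}%"
    print(f"{name:32s} {shown:>6s} with extensions, {count / genes:.1f} annotations per covered gene")
```

On these figures, every gene expression annotation has an extension, whereas only a small minority of GO annotations do, which is consistent with extensions having been introduced for GO only recently.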

OTHER IMPROVEMENTS

PomBase now offers a motif finder that can retrieve lists of genes that match a particular protein sequence pattern. In the PomBase advanced search, the interfaces for constructing custom queries and retrieving results have been enhanced.

User-experience testing conducted after the initial PomBase release identified several opportunities for usability improvements. Accordingly, changes to the navigation and organization of the gene pages, such as collapsible intra-page menus, now make data more intuitively visible. Interfaces requiring user interaction are now also more intuitive.
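The general idea of a motif search, retrieving the genes whose protein sequence matches a pattern, can be illustrated with a regular expression scan. This Python sketch is not the PomBase motif finder and makes no claim about its pattern syntax; the function name, the toy sequences and the gene names are invented for the example.

```python
import re
from typing import Dict, List


def find_motif(pattern: str, proteins: Dict[str, str]) -> Dict[str, List[int]]:
    """Return, for each matching gene, the 1-based start positions of the motif.

    `pattern` is a Python regular expression over one-letter amino acid codes;
    matches are non-overlapping, as produced by re.finditer.
    """
    regex = re.compile(pattern)
    hits: Dict[str, List[int]] = {}
    for gene, sequence in proteins.items():
        positions = [match.start() + 1 for match in regex.finditer(sequence)]
        if positions:
            hits[gene] = positions
    return hits


# Toy sequences, not real S. pombe proteins.
toy_proteins = {"geneA": "MASPAKLL", "geneB": "MKKLLV"}
# [ST]P.[KR]: serine or threonine, proline, any residue, then lysine or arginine.
hits = find_motif(r"[ST]P.[KR]", toy_proteins)
```

With the toy data, only `geneA` is returned, with a single match starting at residue 3.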

OUTREACH AND USAGE

PomBase includes documentation for all gene page sections, links to Ensembl documentation for the genome browser and a Frequently Asked Questions section. Various web forms offer convenient links for users to contact curators to ask questions or submit high-throughput datasets to be included in the genome browser or on the gene pages. PomBase curators invite all authors of new fission yeast publications to curate their own papers using Canto. PomBase also sends announcements and help to a dedicated mailing list and to various social media outlets including Twitter (@PomBase), LinkedIn (http://www.linkedin.com/company/pombase) and Google+ (+PombaseOrg).

FUTURE DIRECTIONS

Canto and the gene pages will be extended to support the curation and display, respectively, of multiple-gene phenotypes (double mutants, triple mutants, etc.). Work has also begun to create pages for non-gene sequence features, such as the centromeres, which at present can only be viewed in the genome browser.

ACKNOWLEDGEMENTS

The authors thank the members of the Ensembl Genomes and Ensembl teams for contributions to the PomBase project. We also thank Nick Rhind for providing the images for the three new Schizosaccharomyces species and we thank members of the Gene Ontology Consortium for helpful discussions on the usage and display of annotation extensions. Finally, we thank the fission yeast community for their contributions to literature curation and for their ongoing support and feedback.

FUNDING

Wellcome Trust [WT090548MA to S.G.O.]. Funding for open access charge: Wellcome Trust [WT090548MA to S.G.O.].

Conflict of interest statement. None declared.

REFERENCES

1. Egel,R. (2000) Fission yeast on the brink of meiosis. Bioessays, 22, 854–860.
2. Wood,V., Gwilliam,R., Rajandream,M., Lyne,M., Lyne,R., Stewart,A., Sgouros,J., Peat,N., Hayles,J., Baker,S. et al. (2002) The genome sequence of Schizosaccharomyces pombe. Nature, 415, 871–880.
3. Wood,V., Harris,M.A., McDowall,M.D., Rutherford,K., Vaughan,B.W., Staines,D.M., Aslett,M., Lock,A., Bähler,J., Kersey,P.J. et al. (2012) PomBase: a comprehensive online resource for fission yeast. Nucleic Acids Res., 40, D695–D699.
4. Rutherford,K.M., Harris,M.A., Lock,A., Oliver,S.G. and Wood,V. (2014) Canto: an online tool for community literature curation. Bioinformatics, 30, 1791–1792.
5. Flicek,P., Amode,M.R., Barrell,D., Beal,K., Billis,K., Brent,S., Carvalho-Silva,D., Clapham,P., Coates,G., Fitzgerald,S. et al. (2014) Ensembl 2014. Nucleic Acids Res., 42, D749–D755.
6. Marguerat,S., Schmidt,A., Codlin,S., Chen,W., Aebersold,R. and Bähler,J. (2012) Quantitative analysis of fission yeast transcriptomes and proteomes in proliferating and quiescent cells. Cell, 151, 671–683.
7. Carpy,A., Krug,K., Graf,S., Koch,A., Popic,S., Hauf,S. and Macek,B. (2014) Absolute proteome and phosphoproteome dynamics during the cell cycle of Schizosaccharomyces pombe (Fission Yeast). Mol. Cell. Proteomics, 13, 1925–1936.
8. Hayles,J., Wood,V., Jeffery,L., Hoe,K.-L., Kim,D.-U., Park,H.-O., Salas-Pino,S., Heichinger,C. and Nurse,P. (2013) A genome-wide resource of cell cycle and cell shape genes of fission yeast. Open Biol., 3, 130053.
9. Sun,L.-L., Li,M., Suo,F., Liu,X.-M., Shen,E.-Z., Yang,B., Dong,M.-Q., He,W.-Z. and Du,L.-L. (2013) Global analysis of fission yeast mating genes reveals new autophagy factors. PLoS Genet., 9, e1003715.
10. Chatr-Aryamontri,A., Breitkreutz,B.-J., Heinicke,S., Boucher,L., Winter,A., Stark,C., Nixon,J., Ramage,L., Kolas,N., O’Donnell,L. et al. (2013) The BioGRID interaction database: 2013 update. Nucleic Acids Res., 41, D816–D823.
11. Soriano,I., Quintales,L. and Antequera,F. (2013) Clustered regulatory elements at nucleosome-depleted regions punctuate a constant nucleosomal landscape in Schizosaccharomyces pombe. BMC Genomics, 14, 813.
12. Rhind,N., Chen,Z., Yassour,M., Thompson,D.A., Haas,B.J., Habib,N., Wapinski,I., Roy,S., Lin,M.F., Heiman,D.I. et al. (2011) Comparative functional genomics of the fission yeasts. Science, 332, 930–936.
13. Xu,J., Yanagisawa,Y., Tsankov,A.M., Hart,C., Aoki,K., Kommajosyula,N., Steinmann,K.E., Bochicchio,J., Russ,C., Regev,A. et al. (2012) Genome-wide identification and characterization of replication origins by deep sequencing. Genome Biol., 13, R27.


14. Mata,J. (2013) Genome-wide mapping of polyadenylation sites in fission yeast reveals widespread alternative polyadenylation. RNA Biol., 10, 1407–1414.
15. Schlackow,M., Marguerat,S., Proudfoot,N.J., Bähler,J., Erban,R. and Gullerova,M. (2013) Genome-wide analysis of poly(A) site selection in Schizosaccharomyces pombe. RNA, 19, 1617–1631.
16. Woolcock,K.J., Gaidatzis,D., Punga,T. and Buhler,M. (2011) Dicer associates with chromatin to repress genome activity in Schizosaccharomyces pombe. Nat. Struct. Mol. Biol., 18, 94–99.
17. Kersey,P.J., Allen,J.E., Christensen,M., Davis,P., Falin,L.J., Grabmueller,C., Hughes,D.S.T., Humphrey,J., Kerhornou,A., Khobova,J. et al. (2014) Ensembl Genomes 2013: scaling up access to genome-wide data. Nucleic Acids Res., 42, D546–D552.
18. Mungall,C.J., Emmert,D.B. and FlyBase Consortium (2007) A Chado case study: an ontology-based modular schema for representing genome-associated biological information. Bioinformatics, 23, i337–i346.
19. Gene Ontology Consortium, Blake,J., Dolan,M., Drabkin,H., Hill,D., Li,N., Sitnikov,D., Bridges,S., Burgess,S., Buza,T. et al. (2013) Gene Ontology annotations and resources. Nucleic Acids Res., 41, D530–D535.
20. Ashburner,M., Ball,C., Blake,J., Botstein,D., Butler,H., Cherry,J., Davis,A., Dolinski,K., Dwight,S., Eppig,J. et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet., 25, 25–29.
21. Harris,M.A., Lock,A., Bähler,J., Oliver,S.G. and Wood,V. (2013) FYPO: the fission yeast phenotype ontology. Bioinformatics, 29, 1671–1678.
22. Huntley,R.P., Harris,M.A., Alam-Faruque,Y., Blake,J.A., Carbon,S., Dietze,H., Dimmer,E.C., Foulger,R.E., Hill,D.P., Khodiyar,V.K. et al. (2014) A method for increasing expressivity of Gene Ontology annotations using a compositional approach. BMC Bioinformatics, 15, 155.
23. Eilbeck,K., Lewis,S.E., Mungall,C.J., Yandell,M., Stein,L., Durbin,R. and Ashburner,M. (2005) The Sequence Ontology: a tool for the unification of genome annotations. Genome Biol., 6, R44.
24. Hastings,J., de Matos,P., Dekker,A., Ennis,M., Harsha,B., Kale,N., Muthukrishnan,V., Owen,G., Turner,S., Williams,M. et al. (2013) The ChEBI reference database and ontology for biologically relevant chemistry: enhancements for 2013. Nucleic Acids Res., 41, D456–D463.
25. Montecchi-Palazzi,L., Beavis,R., Binz,P.-A., Chalkley,R.J., Cottrell,J., Creasy,D., Shofstahl,J., Seymour,S.L. and Garavelli,J.S. (2008) The PSI-MOD community standard for representation of protein modification data. Nat. Biotechnol., 26, 864–866.
26. Sugimoto,A., Iino,Y., Maeda,T., Watanabe,Y. and Yamamoto,M. (1991) Schizosaccharomyces pombe ste11+ encodes a transcription factor with an HMG motif that is a critical regulator of sexual development. Genes Dev., 5, 1990–1999.
27. Lee,K., Saiz,J., Barton,W. and Fisher,R. (1999) Cdc2 activation in fission yeast depends on Mcs6 and Csk1, two partially redundant Cdk-activating kinases (CAKs). Curr. Biol., 9, 441–444.
28. Moreno,S., Hayles,J. and Nurse,P. (1989) Regulation of p34cdc2 protein kinase during mitosis. Cell, 58, 361–372.
29. McEntyre,J.R., Ananiadou,S., Andrews,S., Black,W.J., Boulderstone,R., Buttery,P., Chaplin,D., Chevuru,S., Cobley,N., Coleman,L.-A. et al. (2011) UKPMC: a full text article resource for the life sciences. Nucleic Acids Res., 39, D58–D65.