
Insights into evolution of multicellular fungi from the assembled chromosomes of the mushroom Coprinopsis cinerea (Coprinus cinereus)

Jason E. Stajich a,b,c,d,1, Sarah K. Wilke e, Dag Ahrén f, Chun Hang Au g, Bruce W. Birren h, Mark Borodovsky i, Claire Burns j, Björn Canbäck f, Lorna A. Casselton k, C.K. Cheng g, Jixin Deng e,2, Fred S. Dietrich d,l, David C. Fargo m,3, Mark L. Farman n, Allen C. Gathman o, Jonathan Goldberg h, Roderic Guigó p, Patrick J. Hoegger q,4, James B. Hooker e, Ashleigh Huggins e, Timothy Y. James r, Takashi Kamada s, Sreedhar Kilaru q,5, Chinnapa Kodira h, Ursula Kües q, Doris Kupfer t, H.S. Kwan g, Alexandre Lomsadze i, Weixi Li n, Walt W. Lilly o, Li-Jun Ma h, Aaron J. Mackey u,6, Gerard Manning v, Francis Martin w, Hajime Muraguchi x, Donald O. Natvig y, Heather Palmerini j, Marilee A. Ramesh z, Cathy J. Rehmeyer n,7, Bruce A. Roe t, Narmada Shenoy h, Mario Stanke aa, Vardges Ter-Hovhannisyan bb, Anders Tunlid f, Rajesh Velagapudi d,q,8, Todd J. Vision e, Qiandong Zeng h, Miriam E. Zolan j, and Patricia J. Pukkila e,1

a Department of Plant Pathology and Microbiology, University of California, Riverside, CA 92521; b Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720; c University Program in Genetics and Genomics and d Department of Molecular Genetics and Microbiology, Duke University, Durham, NC 27710; e Department of Biology, University of North Carolina at Chapel Hill, NC 27599; f Department of Microbial Ecology, Lund University, S-223 62, Lund, Sweden; g Department of Biology, The Chinese University of Hong Kong, Hong Kong, China; h Broad Institute, Cambridge MA 02142; i Wallace H. Coulter Department of Biomedical Engineering, Division of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332; j Department of Biology, Indiana University, Bloomington, IN 47405; k Department of Plant Sciences, University of Oxford, Oxford OX1 3RB, United Kingdom; l Institute for Genome Sciences and Policy, Duke University, Durham, NC 27708; m Center for Bioinformatics, University of North Carolina at Chapel Hill, Chapel Hill NC 27599; n Department of Plant Pathology, University of Kentucky, Lexington, KY 40546; o Department of Biology, Southeast Missouri State University, Cape Girardeau, MO 63701; p Centre for Genomic Regulation, 08003 Barcelona, Spain; q Division of Molecular Wood Biotechnology and Technical Mycology, Büsgen-Institute, Georg-August-University Goettingen, D-37077 Goettingen, Germany; r Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109; s Graduate School of Natural Science and Technology, Okayama University, Okayama 700-8530, Japan; t Advanced Center for Genome Technology, University of Oklahoma, Norman, OK 73019; u Penn Genomics Institute, University of Pennsylvania, Philadelphia, PA 19104; v Salk Institute for Biological Studies, La Jolla, CA 92037; w Unité Mixte de Recherche 1136, Institut National de la Recherche Agronomique (INRA)-Nancy Université, Interactions Arbres/Microorganismes, INRA-Nancy, 54280 Champenoux, France; x Department of Biotechnology, Akita Prefectural University, Akita 010-0195, Japan; y Department of Biology, University of New Mexico, Albuquerque, NM 87131; z Department of Biology, Roanoke College, Salem, VA 24153; aa Department of Bioinformatics, University of Göttingen, 37077 Göttingen, Germany; and bb School of Biology, Georgia Institute of Technology, Atlanta, GA 30332

Edited by Joan Wennstrom Bennett, Rutgers University, New Brunswick, NJ, and approved April 26, 2010 (received for review March 18, 2010)

The mushroom Coprinopsis cinerea is a classic experimental model for multicellular development in fungi because it grows on defined media, completes its life cycle in 2 weeks, produces some 10⁸ synchronized meiocytes, and can be manipulated at all stages in development by mutation and transformation. The 37-megabase genome of C. cinerea was sequenced and assembled into 13 chromosomes. Meiotic recombination rates vary greatly along the chromosomes, and retrotransposons are absent in large regions of the genome with low levels of meiotic recombination. Single-copy genes with identifiable orthologs in other basidiomycetes are predominant in low-recombination regions of the chromosome. In contrast, paralogous multicopy genes are found in the highly recombining regions, including a large family of protein kinases (FunK1) unique to multicellular fungi. Analyses of P450 and hydrophobin gene families confirmed that local gene duplications drive the expansions of paralogous copies and the expansions occur in independent lineages of Agaricomycotina fungi. Gene-expression patterns from microarrays were used to dissect the transcriptional program of dikaryon formation (mating). Several members of the FunK1 kinase family are differentially regulated during sexual morphogenesis, and coordinate regulation of adjacent duplications is rare. The genomes of C. cinerea and Laccaria bicolor, a symbiotic basidiomycete, share extensive regions of synteny. The largest syntenic blocks occur in regions with low meiotic recombination rates, no transposable elements, and tight gene spacing, where orthologous single-copy genes are overrepresented. The chromosome assembly of C. cinerea is an essential resource in understanding the evolution of multicellularity in the fungi.

basidiomycete | dikaryon formation | gene families | kinase | meiotic recombination

Author contributions: J.E.S., A.C.G., H.S.K., W.W.L., H.M., T.J.V., M.E.Z., and P.J.P. designed research; J.E.S., S.K.W., C.H.A., C.B., C.K.C., F.S.D., A.C.G., P.J.H., J.B.H., A.H., T.K., S.K., U.K., D.K., H.S.K., W.W.L., H.M., H.P., M.A.R., R.V., and P.J.P. performed research; M.B., R.G., T.K., A.L., W.L., A.J.M., G.M., F.M., H.M., C.J.R., M.S., and V.T.-H. contributed new reagents/analytic tools; B.W.B. supervised the Genome Project; T.K. also contributed the laboratory strain that was sequenced; C.K. supervised the Genome Annotation Team; D.K., D.O.N., and B.A.R. contributed to the initial EST dataset; F.M. contributed L. bicolor sequence; J.E.S., D.A., C.H.A., C.B., B.C., L.A.C., C.K.C., J.D., F.S.D., D.C.F., M.L.F., A.C.G., J.G., T.Y.J., C.K., U.K., A.L., W.W.L., L.-J.M., G.M., H.M., M.A.R., C.J.R., N.S., V.T.-H., A.T., R.V., and Q.Z. analyzed data; and J.E.S., S.K.W., D.A., C.H.A., C.B., L.A.C., J.D., A.C.G., J.G., T.Y.J., H.M., G.M., U.K., M.A.R., and P.J.P. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Freely available online through the PNAS open access option.

Data deposition: Assembly accession: AACS02000000; Expression data in GEO: GSE20628; EST library accessions: DR774668-DR775517, DN591505-DN593171, DN593172-DN593841, DR752715-DR753264, DN593842-DN593917, DR753265-DR753301, DR753303-DR753936, FG068230-FG068291, DR421062-DR421601, DR752151-DR752714, DR753937-DR753939, DR907568-DR908072.

See Commentary on page 11655.

1 To whom correspondence may be addressed. E-mail: [email protected] or pukkila@unc.edu.

2 Present address: Human Genome Sequencing Center, Houston, TX 77030.

3 Present address: Library and Information Services, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC, 27709.

4 Present address: Syngenta Crop Protection AG, 4332 Stein, Switzerland.

5 Present address: School of Bioscience, University of Exeter, Exeter EX4 4QD, United Kingdom.

6 Present address: Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908.

7 Present address: Pikeville College School of Osteopathic Medicine, Pikeville, KY 41501.

8 Present address: Department of Medicine, University of Massachusetts Medical School, Worcester, MA 01605.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1003391107/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1003391107 PNAS | June 29, 2010 | vol. 107 | no. 26 | 11889–11894

GENETICS | SEE COMMENTARY

Although the number of fungal genome-sequencing projects has increased dramatically over the last few years, there is a surprising lack of complete chromosome assemblies from these projects. Sequencing and analysis of chromosome assemblies from ascomycete fungi have revealed blocks of shared ancestral gene order (1, 2), the consequences and sorting of gene pairs after whole-genome duplication (3), and the preferential location of large gene families near chromosome ends (4, 5). Comparable complete-genome analyses in basidiomycete fungi are essential complements to whole-genome shotgun (WGS) assemblies to increase the utility of the sequence information and to learn if unique features of genome organization are associated with the increased complexity of multicellular fungi.

Coprinus cinereus (now known as Coprinopsis cinerea) (6) was chosen by the Fungal Genome Initiative (http://www.broadinstitute.org/science/projects/fungal-genome-initiative) as a key species in a cohesive genome-sequencing strategy designed to increase our understanding of the biology, evolution, and biomedical implications of the entire fungal kingdom. Studies on this fungus using tools such as DNA-mediated transformation (7) and RNAi silencing (8) have provided important insights into the regulation of multicellular development, mushroom fruiting, mating pheromone, and receptor signaling pathways (9–11). Haploid homokaryotic hyphae of basidiomycete fungi fuse to form dikaryons if different alleles are present at both A and B mating-type loci. Dikaryons form mushroom fruiting bodies under appropriate light input, temperature, and CO₂ levels. The major controlling elements of the mating cascade have been uncovered with genetic tools, but the full complement of genes that respond during these conditions still is not well understood.

Studies using C. cinerea also have provided important insights into meiotic chromosome behavior because of several experimental advantages. Meiosis occurs with a high degree of synchrony, and meiotic mutants have been obtained and analyzed using a variety of cytological and molecular tools. Pioneering studies exploited the relatively small nuclear volume of C. cinerea meiocytes and used serial sectioning and three-dimensional reconstruction to describe the karyotype, the sites of synaptic initiation, the dynamics of recombination nodule assembly, and chiasmata formation (12). More recently, meiotic mutants have been obtained and analyzed using a variety of cytological and molecular tools including indirect immunofluorescence (13).

We report here the complete genome sequence and chromosome assembly of C. cinerea (SI Text). The genome was assembled into 13 chromosomes of 36 megabases (Mb) (AACS02000000; Dataset S1, Tables S1 and S2) consistent with cytological (12) and genetic (14) evidence and supported by BAC fingerprinting (Dataset S1, Table S1). We used the assembly to investigate rates of meiotic recombination along the chromosomes and the distribution, expression, and evolutionary conservation of single-copy genes and gene families in this model basidiomycete.

Results

High-Resolution Genetic Map. Cytological evidence has indicated that meiotic exchanges are highly enriched in subtelomeric regions of the 13 chromosomes in C. cinerea (12), suggesting that recombination rates might be non-uniform across the genome. To examine crossover distribution, we used 133 simple sequence repeat (SSR) markers evenly distributed across the genome and four additional markers to construct a genetic map (Fig. 1, SI Text, Fig. S1, and Dataset S1, Table S3). Examination of the marker genotypes of the progeny revealed regions of average, high, and low recombination. The total genetic map length of the 31-Mb physical genome that could be mapped is 948 centimorgans (cM), indicating an average frequency of exchange of 33 kb/cM. However, the “hot” regions (8% of the genome) exhibit an elevated rate of recombination (6 kb/cM on average), whereas the “cold” regions (44% of the genome) exhibit very little recombination (198 kb/cM).

The hot regions are located predominantly in subtelomeric regions (16/18 are within the 15% of the nearest telomere; Fisher’s exact test, P = 0.0002) (Fig. S1). Although the frequency of chiasmata (cross-overs) per bivalent ranges from 1 to 12 in both fungi and other organisms (reviewed in ref. 15), species with both low and high levels of exchange exhibit an elevated rate of recombination in subtelomeric regions (16–18). This elevation may reflect associations between the initiation of chromosome synapsis at subterminal regions and chiasma formation in these species (15, 19).
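The average exchange frequency quoted in this section follows directly from the mapped physical length and the genetic map length. The short sketch below reproduces that arithmetic and expresses the hot and cold rates relative to the genome average. The function and variable names are illustrative assumptions, and only figures stated in the text are used.

```python
def kb_per_cm(physical_mb: float, genetic_cm: float) -> float:
    """Return the average physical distance per centimorgan, in kb/cM."""
    if physical_mb <= 0 or genetic_cm <= 0:
        raise ValueError("lengths must be positive")
    return physical_mb * 1000.0 / genetic_cm


# Values quoted in the text: 31 Mb of mapped sequence, 948 cM of genetic map.
genome_average = kb_per_cm(31.0, 948.0)

# Region-class rates quoted in the text (kb/cM). A smaller value means more
# recombination per unit of physical distance.
HOT_KB_PER_CM = 6.0
COLD_KB_PER_CM = 198.0

hot_fold = genome_average / HOT_KB_PER_CM    # hot regions vs. genome average
cold_fold = COLD_KB_PER_CM / genome_average  # genome average vs. cold regions

print(f"genome average: {genome_average:.1f} kb/cM")
print(f"hot regions recombine about {hot_fold:.1f}x more often than average")
print(f"cold regions recombine about {cold_fold:.1f}x less often than average")
```

The computed genome average is about 32.7 kb/cM, which the text rounds to 33 kb/cM.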

Retrotransposon Distribution. Transposon sequences (2.5% of genomic sequence) are not distributed uniformly across the genome (Fig. 1, Fig. S1, and Dataset S1, Tables S4a and S4b). Each chromosome contains a distinctive internal transposon cluster whose position is highly correlated (R² = 0.89) with the cytological centromere (12) on the nine chromosomes that extend telomere to telomere (Fig. S1). These transposon clusters (20% of all transposon-related sequences) could represent sequence-independent centromeres that are common in other fungi such as Neurospora crassa and Cryptococcus neoformans (20, 21). Many of these transposon clusters lie within regions that are cold for meiotic recombination. With the exception of the transposon clusters, the cold regions lack retrotransposon-related sequences. They contain only 3 of the 44 full-length retrotransposons (χ² = 23.7, P < 0.001), and all of the cold regions contain extensive stretches (over 1 Mb on the larger chromosomes) that lack any retrotransposon-related sequences (Fig. S1). The complexity of factors influencing the genome-wide distribution of transposons has been noted (22, 23), but the pattern we see in C. cinerea with an exclusion of retrotransposons from most but not all regions of low recombination has not been reported previously. Retrotransposons outnumber DNA transposons by a factor of 10 in C. cinerea (SI Text). Retrotransposon-related sequences are found either in the presumed centromere clusters or in regions of average or high recombination, whereas the distribution of DNA transposons is more uniform.
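The depletion of full-length retrotransposons from cold regions can be illustrated with a one-degree-of-freedom goodness-of-fit calculation. The sketch below is not the authors' procedure: it assumes that the expected count is simply proportional to the 44% of the genome classified as cold, and the function name is hypothetical. Under that assumption the statistic comes out near 24.7, in the same range as the reported χ² of 23.7 and likewise significant at P < 0.001; the exact expected proportions used in the paper are not stated in this section.

```python
import math


def chi_square_two_bins(observed_in: int, total: int, expected_fraction: float):
    """Goodness-of-fit test for counts split into 'inside' and 'outside' a region.

    Returns the chi-square statistic (1 degree of freedom) and its P value.
    """
    if not 0.0 < expected_fraction < 1.0:
        raise ValueError("expected_fraction must lie strictly between 0 and 1")
    expected_in = total * expected_fraction
    expected_out = total - expected_in
    observed_out = total - observed_in
    chi2 = ((observed_in - expected_in) ** 2 / expected_in
            + (observed_out - expected_out) ** 2 / expected_out)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# From the text: 3 of 44 full-length retrotransposons lie in cold regions.
# Assumption: cold regions are expected to hold 44% of them (their genome share).
chi2, p_value = chi_square_two_bins(observed_in=3, total=44, expected_fraction=0.44)
print(f"chi-square = {chi2:.1f}, P < 0.001: {p_value < 0.001}")
```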

[Fig. 1 image not reproduced. Panel titles, top to bottom, with vertical scales: Telomere and Centromere location; Transposable elements / 50 kb (0–37); tRNA genes / 50 kb (0–7); Recombination rate; Genes / 50 kb (0–32); Orphan genes / genes / 50 kb (0–1); Orthologous genes / genes / 50 kb (0–1); Paralogous genes / genes / 50 kb (0–1); Paralogs 1/dS (0–1); FISH synteny blocks. Horizontal axis: position in Mb (0–3).]

Fig. 1. Summary plots of chromosome II of C. cinerea. The plot shows the location of (Top panel) telomeres in red and centromere as black oval; (Second panel) the density of transposable elements (brown); (Third panel) tRNA genes (light green); (Fourth panel) recombination rates (the position of the SSR markers is indicated by vertical black bars, white is unmapped, red is high recombination, gray is average recombination, blue is low recombination); (Fifth panel) the density of all genes (orange); (Sixth panel) the density of orphan genes (light orange); (Seventh panel) the density of orthologous genes (blue); (Eighth panel) the density of paralogous genes (red); (Ninth panel) similarity of paralog families represented as 1/dS; (10th panel) syntenic regions (all regions of synteny between C. cinerea and L. bicolor are indicated in green, and blocks with >15 anchors are indicated in dark green). Vertical scales are defined for each bar in the bar title. Horizontal scale is Mb.


Ortholog and Paralog Distribution. To predict 13,342 protein-coding genes, 267 tRNA genes, and 10 snRNA genes (Dataset S1, Table S5), we used computational tools in combination with evidence that included proteins from related species and 5,612 ESTs (SI Text). Gene calls were confirmed further by 5′ serial analysis of gene expression (SAGE) from two tissue types (5,130 gene models; Dataset S1, Table S6). Comparative analyses of the available basidiomycete genomes reveal a dramatic increase in gene number from fewer than 7,000 genes in C. neoformans and Ustilago maydis to more than 13,000 in the sequenced Agaricomycete fungi including C. cinerea (Dataset S1, Table S7). To understand better the mechanism of the observed gene increase and to ask if expansions of gene families are equally likely at different chromosomal locations, we constructed gene families based on sequence similarity using TribeMCL (24) (SI Text and Dataset S1, Table S8). We determined which C. cinerea genes are orphan (no homologs), orthologous (single-copy in C. cinerea and at least one Laccaria bicolor ortholog), or paralogous (multicopy in C. cinerea). We plotted the distribution of these three categories along the chromosomes (Fig. 1 and Fig. S1). It is striking that, whereas the distribution of orphan genes is relatively uniform, orthologous single-copy genes are overrepresented in regions with low rates of meiotic recombination, and paralogous multicopy genes are found primarily in regions with average or high rates of meiotic recombination (Table 1). We conclude that several factors, including high recombination rates, tolerance of transposable elements, telomere proximity, and chromatin structure, potentially could contribute positively to the creation and maintenance of duplicated paralogous genes in the discrete regions that we observe in C. cinerea.
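The three gene categories defined above can be expressed as a simple rule over gene-family membership. The sketch below is illustrative only and is not the authors' pipeline. It assumes that families (for example, clusters of the kind TribeMCL produces) are available as a mapping from a family identifier to (species, gene) pairs. The function name, the gene identifiers, and the "unclassified" label for single-copy genes whose only homologs are outside L. bicolor are hypothetical.

```python
def classify_genes(families, focal="C. cinerea", reference="L. bicolor"):
    """Label each focal-species gene as orphan, orthologous, or paralogous.

    families: dict mapping a family id to a list of (species, gene_id) tuples.
    """
    labels = {}
    for members in families.values():
        focal_genes = [gene for species, gene in members if species == focal]
        if not focal_genes:
            continue  # family has no gene from the focal species
        has_reference = any(species == reference for species, _ in members)
        if len(focal_genes) > 1:
            label = "paralogous"    # multicopy in the focal species
        elif len(members) == 1:
            label = "orphan"        # no homologs at all
        elif has_reference:
            label = "orthologous"   # single copy plus a reference-species homolog
        else:
            label = "unclassified"  # case not covered by the definitions in the text
        for gene in focal_genes:
            labels[gene] = label
    return labels


# Hypothetical toy families; the identifiers are placeholders, not real gene models.
toy_families = {
    "fam1": [("C. cinerea", "cc_g1")],
    "fam2": [("C. cinerea", "cc_g2"), ("L. bicolor", "lb_g7")],
    "fam3": [("C. cinerea", "cc_g3"), ("C. cinerea", "cc_g4"), ("L. bicolor", "lb_g9")],
    "fam4": [("C. cinerea", "cc_g5"), ("P. chrysosporium", "pc_g2")],
}
labels = classify_genes(toy_families)
print(labels)
```

Counting the resulting labels in fixed windows (for example, 50 kb) along each chromosome would give per-window densities of the kind plotted in Fig. 1.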

Paralogous Gene Families. Inspection of Dataset S1, Table S8 reveals that the largest paralogous expansion involves genes encoding proteins with protein kinase domains. Because kinases in many organisms mediate sophisticated control mechanisms essential for complex structure and developmental patterns, we focused on this family in more detail. BLAST and hidden Markov model (HMM) screening of the C. cinerea transcriptome indicates the presence of 380 kinases (SI Text and Dataset S1, Tables S9a and S9b), including 12 classes not present in Saccharomyces cerevisiae, three of which have not been previously observed outside the Metazoa (Dataset S1, Table S9c). The largest family, with 133 members, FunK1, has unusual modifications in conserved kinase motifs, expanding the documented diversity of this important catalytic domain (Fig. 2). Catalytic residues D166, N171, and D184 are conserved [1ATP.pdb numbering (25)], suggesting that FunK1 family members use a catalytic mechanism similar to that of conventional protein kinases and are enzymatically active. There are some notable changes in the FunK1 motifs. A highly conserved lysine corresponding to K168 is replaced by an invariant serine in FunK1. This lysine, which donates a hydrogen bond to the transferred gamma phosphate group, presumably is replaced by a basic residue from elsewhere in the FunK1 sequence if efficient catalysis is to be retained. Several very highly conserved residues without direct catalytic roles also are missing from FunK1, including H158, F185, and G186. However, conservation of R165 and K189, which form intramolecular interactions with phosphorylated amino acids in some protein kinases, suggests that members of the FunK1 family are regulated by phosphorylation (25). The FunK1 family has homologs in the Agaricomycotina and Pezizomycotina, but not in other fungi, suggesting a potential link between this kinase family and the multicellularity of these fungi. We observed that the left subterminal region of chromosome IX contains 59 FunK1 kinases in all orientations (head to head, head to tail, and tail to tail) with only two transposon clusters and very few interspersed nonkinase genes. In contrast, the distribution of protein kinases from families with widely distributed orthologs is more scattered, with the vast majority occurring in regions with low rates of meiotic recombination (Fig. S2).

There also are copy-number changes within conserved fungal kinase families. Phylogenetic analysis of the MAPK cascade revealed Basidiomycota-specific duplications in the MAPK genes (S. cerevisiae FUS3/KSS1) which are involved in the pheromone response, whereas HOG1, a p38 kinase, is single copy throughout the sampled fungi (Fig. S3).

We examined two additional gene families of significant size (cytochrome P450 and hydrophobins) to ask if duplicated paralogs from these families also are found in restricted chromosome regions. The P450 gene family (125 genes contain the Pfam domain) includes genes with metabolic roles in monooxygenase metabolism, and the family is implicated in the degradation capabilities of Phanerochaete chrysosporium on substrates ranging from lignin to diesel fuel (26, 27). Phylogenetic analysis shows that the expansion of the family was independent in P. chrysosporium and C. cinerea (SI Text and Fig. S4). Although scattered members of the P450 and other expanded families are found within cold regions, tandem repeats of these genes are found exclusively in genomic regions with higher recombination rates. In contrast to the FunK1 family, many P450 gene paralogs are found in head-to-tail orientation with a maximum of four copies in any one chromosomal location. A similar pattern is observed for the hydrophobin gene family (34 genes). Hydrophobins are small, secreted proteins that self-assemble at hydrophilic–hydrophobic interfaces to form amphipathic films (28, 29). These films help hyphae emerge from moist substrates to form aerial structures and also line internal cavities in the fruiting body (28). We have found that C. cinerea has the largest described hydrophobin gene family for any fungus. Phylogenetic analysis (Fig. S5) shows independent expansions of

Table 1. Non-uniform gene distribution in chromosome regions with different rates of meiotic recombination

              Chromosomal region*          KS comparison, P value
Gene class    Hot    Average    Cold       Hot − cold    (Hot + average) − cold
Orphans       5.5    5.5        5.0        0.6512        0.175
Orthologs     5.6    7.1        11.4       1.32E-12      2.20E-16
Paralogs      7.7    6.2        3.9        3.08E-9       2.55E-12

KS, Kolmogorov-Smirnov.
*Values reported are the mean number of features/50 kb in each type of chromosome region as defined by recombination rate categories.
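The KS comparisons in Table 1 ask whether per-window feature counts from two classes of chromosome region are drawn from the same distribution. As a rough illustration of the statistic itself, the sketch below computes the two-sample KS distance between two lists of counts. The window counts are invented toy values, not data from the paper, and the function name is hypothetical. A P value would normally be obtained from a statistics library such as SciPy (`scipy.stats.ks_2samp`), which is not used here so that the example stays self-contained.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample_a), sorted(sample_b)
    distance = 0.0
    for value in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, value) / len(a)  # fraction of a that is <= value
        cdf_b = bisect_right(b, value) / len(b)  # fraction of b that is <= value
        distance = max(distance, abs(cdf_a - cdf_b))
    return distance


# Hypothetical ortholog counts per 50-kb window (toy values for illustration only).
cold_windows = [12, 11, 10, 13, 9, 12]
hot_windows = [5, 6, 7, 4, 6, 5]

print(ks_statistic(cold_windows, hot_windows))
```

A statistic near 1 means the two sets of windows barely overlap, and a statistic near 0 means their distributions are nearly indistinguishable.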

[Fig. 2 image not reproduced. Panels A and B are HMM logos plotting relative entropy (vertical axis, 0–6) against sequence position (horizontal axis, approximately positions 155–190).]

Fig. 2. HMM logos of protein kinase active site regions. (A) Conventional protein kinases. (B) FunK1 protein kinases. Conserved residue positions in both conventional and FunK1 kinases are indicated with blue circles. Residue positions that are nearly universally conserved in conventional ePK domains but are altered in FunK1 kinases are indicated with red circles.


the gene family in P. chrysosporium, L. bicolor, and C. cinerea. The hydrophobin paralogs often are found in head-to-tail orientation with a maximum of seven genes in one cluster. These clusters are found in regions with high rates of recombination, whereas hydrophobins that are unique within their contigs occur in regions with low rates of recombination.

The patterns of gene duplication have important implications for adaptation and evolution in some species where pathogenic factors like adhesins and cell-surface variation genes tend to be found in the highly recombining regions near chromosome ends (4, 30). In C. cinerea, gene ontology (GO) terms are correlated with local recombination rates along the chromosomes (SI Text and Dataset S1, Table S10). In addition, we have found that the age of gene duplicates, as measured by calculating synonymous substitution rates (dS), correlates well with the recombination rate classes. We identified 3,796 duplicated gene pairs and found that pairs residing in cold regions are significantly older (P < 1E-16, Kolmogorov-Smirnov test; median dS 2.2) than the pairs that reside in regions of average or elevated recombination (median dS 1.95), as illustrated in Fig. 1 and Fig. S1. The genomic orientation of paralogous gene pairs also is highly nonrandom. Overall, 49% of all adjacent genes are in tandem orientation, as expected from a random distribution, whereas 88% of adjacent paralogous genes are in tandem orientation (P < 1E-16; Fisher’s exact test). The increased age in the cold regions of the genome indicates a lower generation rate of gene duplicates and confirms that an unequal rate of sequence evolution is a general property of all gene families in C. cinerea.
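The orientation categories used above (tandem or head to tail, head to head, and tail to tail) follow from the strands of two neighboring genes. The sketch below is a minimal illustration under the assumption that each gene is described only by its strand, "+" or "-", listed in chromosomal order; the function names and the toy strand list are hypothetical. If strands were assigned independently and with equal probability, half of all adjacent pairs would share a strand, which is consistent with the 49% tandem baseline reported for all adjacent genes.

```python
def pair_orientation(left_strand: str, right_strand: str) -> str:
    """Classify two adjacent genes, given in chromosomal order, by strand."""
    for strand in (left_strand, right_strand):
        if strand not in ("+", "-"):
            raise ValueError(f"unknown strand: {strand!r}")
    if left_strand == right_strand:
        return "tandem"        # head to tail: both genes on the same strand
    if left_strand == "+":
        return "tail-to-tail"  # convergent: the 3' ends face each other
    return "head-to-head"      # divergent: the 5' ends face each other


def tandem_fraction(strands) -> float:
    """Fraction of adjacent gene pairs that are in tandem orientation."""
    pairs = list(zip(strands, strands[1:]))
    if not pairs:
        raise ValueError("need at least two genes")
    tandem = sum(pair_orientation(left, right) == "tandem" for left, right in pairs)
    return tandem / len(pairs)


# Hypothetical strand assignments for six consecutive genes (toy values).
toy_strands = ["+", "+", "-", "-", "+", "-"]
print([pair_orientation(a, b) for a, b in zip(toy_strands, toy_strands[1:])])
print(tandem_fraction(toy_strands))
```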

Transcriptional Program Associated with Mating Behavior. Coordinated gene expression of adjacent FunK1 kinases, P450 genes, and hydrophobins might provide a selective advantage that contributes to the maintenance of the clusters. Alternatively, individual members of these paralogous gene families may play important roles at discrete stages in C. cinerea development. The ease of culture and synchronous development (under the control of light and nutritional cues) of C. cinerea greatly facilitates studies of transcriptional regulation. Accordingly, we designed a 13,230-feature microarray that includes at least one 70-mer oligonucleotide for each known and predicted gene, EST, and repeated element. We used the validated arrays to investigate the transcriptional program that occurs during mating (SI Text). In the typical basidiomycete life cycle, nuclear fusion does not occur immediately after mating cell fusion. Instead, a nucleus from each mating partner is maintained in a common cytoplasm (the “dikaryotic cell”), and all tissues (except for the multinucleate stipe cells) of the highly differentiated mushroom fruiting bodies of C. cinerea (Fig. 3A) are composed of dikaryotic cells. The formation of the dikaryotic mycelium in mushrooms is initiated by fusion of undifferentiated hyphal cells and is maintained by a complex cell division in which both nuclei divide in synchrony in the tip cell, and daughter nuclei then are partitioned equally into the new tip and subterminal cells. Partitioning involves the formation of a structure known as the “clamp connection” (Fig. 3D) through which one of the daughter nuclei must pass. The steps of dikaryon formation are controlled by two sets of unlinked, multiallelic mating-type genes called “A” and “B” (Fig. S6).
Many details of the process, including the different steps that are under the control of the A genes (which encode homeodomain proteins that heterodimerize in a compatible mating) and the B genes (which encode pheromones and receptors that activate each other in a compatible mating), are understood from classical genetics and more recent molecular studies (10).

We examined mycelia in which targets of the A locus (the “A-on” strain; Fig. 3C) and the B locus (the “B-on” strain, which has a morphology similar to the strain shown in Fig. 3B) were expressed separately and compared these with transcripts expressed in the dikaryon (Fig. 3D) and in the unmated monokaryotic strain (Fig. 3B). We observed 877 transcripts with significant differences in expression levels in one or more of these conditions (SI Text and Dataset S1, Table S11). Of particular interest was a FunK1 kinase (CC1G_04033) with an ortholog in L. bicolor, which was significantly up-regulated in the A-on mycelium, along with a FunK1 paralog (CC1G_13267), suggesting that these may have a unique role in cell signaling in the A-regulated part of the pathway that requires synchronization of nuclear division. Other differentially regulated genes of interest include the previously characterized genes clp1 and pcc1, 12 transcription factors, five additional kinases including S. cerevisiae HOG1 and RCK2 orthologs, STE3 receptors, major facilitator superfamily transporters, and many genes involved in cell-cycle regulation, the cytoskeleton, and cell wall biogenesis (SI Text and Dataset S1, Table S11). A MAPK signaling complex, which includes the HOG1 homolog Fus3, plays a central role in the yeast pheromone response (31). However, the Fus3 MAPK cascade does not occur outside of the Saccharomycetales (32). We did observe that HOG1, a p38 MAPK, and RCK2, a calcium/calmodulin-dependent protein kinase, are both strongly down-regulated in A-on and B-on cells. Orthologs of these protein kinases are involved in the hyperosmotic response of yeast (33, 34). HOG1 is down-regulated in S. cerevisiae a1/α2 cells (35), but in contrast to what we observe in C. cinerea, its target, RCK2 (34), is not (35). It has been suggested that when FUS3 transcription is shut down in a1/α2 cells, HOG1 is down-regulated so it is not spuriously activated in the site vacated by its homolog (35). In C. cinerea, down-regulation of HOG1 and its target RCK2 must serve a different purpose, because Fus3 is absent (32).
We conclude that our high-throughput approach to identifying potential regulators and targets in the A-on and B-on pathways is essential for understanding this complex morphogenetic process, especially because very few of the downstream targets of the mating factors have been identified in any basidiomycete (36).

Overall, we found very few examples of coordinated regulation of adjacent paralogous gene duplicates in these experiments. Despite the presence of 59 adjacent FunK1 family members in the genome, only scattered members of this cluster were coordinately regulated during dikaryon formation. We did observe significantly coordinated expression of a single paralogous tandem array of hydrophobins and of a paralogous tandem array of P450 genes during dikaryon formation. However, examination of expression patterns of all paralogous pairs indicated that adjacent pairs were no more likely to be coordinately regulated than nonadjacent pairs. We conclude that adjacent gene duplicates have diverged in expression timing and perhaps function.

Fig. 3. Photograph and micrographs of C. cinerea. (A) Mature C. cinerea fruiting body that is shedding spores. The upper surface of the cap has loosely adhering “veil cells.” The lower surface of the cap is composed of “gills” which support the basidia (meiotic cells). The cap is elevated several centimeters above the Petri dish by the “stipe” (stalk). (B) Simple septum between two cells in a monokaryotic hypha. (C) “False clamp” between two cells in an “A-on” hypha. (D) True clamp connection between two cells in a dikaryotic hypha (“A-on B-on”). Magnification in B–D is the same.

11892 | www.pnas.org/cgi/doi/10.1073/pnas.1003391107 Stajich et al.

Synteny. Although there are larger numbers of genes in the currently sampled Agaricomycotina genomes (~10,000–15,000) than in Cryptococcus or Ustilaginomycotina (~6,000), there is no evidence for a whole-genome duplication, because no substantial region of duplication was identified through dot-plot or gene-based self-vs.-self comparisons of the C. cinerea genome. A comparative approach also was used to ask if the key genomic features present in C. cinerea, particularly the potential for similar large genomic clusters with limited meiotic recombination, are represented in other Agaricomycetes. The genome of L. bicolor (37) provides an appropriate test case, because these Agaricomycetes last shared a common ancestor 200 Mya (SI Text). L. bicolor is an important ectomycorrhizal symbiont of hardwood and conifer species, although it also can adopt a transient saprotrophic lifestyle similar to that of C. cinerea. The genome of L. bicolor is 1.8 times larger than the C. cinerea genome, contains 1.6 times the number of predicted gene models, and is estimated to contain 13.65 Mb of transposons and transposon relics (in contrast to the 0.86 Mb in the assembled C. cinerea genome).

To identify blocks of synteny between these species, we employed the program FISH (for “Fast Identification of Segmental Homology”), because the Manhattan distance metric it employs allows very asymmetric intervals between the syntenic “anchors” (38) (SI Text). We found that 39% of the assembled C. cinerea genome is syntenic with L. bicolor (Fig. S1 and Dataset S1, Table S12a). To estimate the total number of chromosomal rearrangement events that have occurred since their split from a common ancestor, we fit our data to the Nadeau-Taylor model (39, 40), which assumes that genes and chromosomal rearrangement breakpoints are uniformly distributed at random along the chromosomes. We calculate that 3.5–4.5 chromosomal rearrangements per million years have accumulated along each lineage since separation. This rate is at the high end of the range described previously for eukaryotes (41) and is approximately 3-fold higher than in S. cerevisiae (42).

Despite the prevalence of rearrangements in these lineages, we observed 10 blocks with more than 15 anchors (SI Text and Dataset S1, Table S12b). Because these are highly unlikely (P < 0.0016) if rearrangements are tolerated equally, it was of interest to determine the nature of these chromosomal regions that are unusually refractory to rearrangement in mushrooms. These regions (3.4 Mb) are found primarily in genomic regions with low meiotic recombination rates on the five largest chromosomes (Fig. 1 and Fig. S1). GO analysis of these regions revealed that they are enriched (P < 0.0005 to P < 0.01) in genes annotated to basic structures and processes such as nitrogen metabolism, the cytoskeleton, and metabolic regulation, as well as in particular G protein-coupled receptors (Dataset S1, Table S13). These regions contain 2.4 times the expected number of transcription factors (χ² = 67.7, P < 0.001) and lack transposable elements. Interestingly, the 1,378 genes in these blocks are spaced on average only 872 bp apart, in contrast to the average gene spacing in the genome (1,261 bp) and in sharp contrast to the average gene spacing in regions that display elevated rates of recombination (1,655 bp).
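To see why a Manhattan distance tolerates asymmetric intervals between syntenic anchors, consider the following minimal Python sketch. It is not the implementation used by the synteny program cited in this section; the `Anchor` type, the gene-index coordinates, and the toy values are assumptions made for illustration. Each anchor is treated as a point whose coordinates are the ordinal positions of a homologous gene pair in the two genomes, and the distance between two anchors is the sum of the absolute coordinate differences.

```python
import math
from typing import NamedTuple


class Anchor(NamedTuple):
    """Hypothetical anchor: gene-order index in each of two genomes."""
    pos_a: int  # index along a chromosome of genome A
    pos_b: int  # index along a chromosome of genome B


def manhattan(p: Anchor, q: Anchor) -> int:
    """Sum of absolute coordinate differences between two anchors."""
    return abs(p.pos_a - q.pos_a) + abs(p.pos_b - q.pos_b)


def euclidean(p: Anchor, q: Anchor) -> float:
    """Straight-line distance, shown only for contrast."""
    return math.hypot(p.pos_a - q.pos_a, p.pos_b - q.pos_b)


start = Anchor(100, 200)
symmetric = Anchor(105, 205)   # 5 genes apart in each genome
asymmetric = Anchor(101, 209)  # 1 gene apart in A, 9 genes apart in B

for label, other in (("symmetric", symmetric), ("asymmetric", asymmetric)):
    print(label, manhattan(start, other), round(euclidean(start, other), 2))
```

Under the Manhattan metric the two toy gaps score identically, whereas a Euclidean metric would penalize the asymmetric gap more heavily. This is one way to read the statement that the metric allows very asymmetric intervals.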

Discussion

Our gap closure, telomere and centromere identification, and sequence anchoring to the genetic map have increased the value of the draft C. cinerea genome sequence. The revised gene calls supported by ESTs and 5′SAGE profiling reported here (SI Text) enabled accurate annotation and comparative analyses. We conclude that regions of the genome that have persisted intact over evolutionary time exhibit low rates of meiotic recombination and also lack sequences such as tandemly repeated members of large gene families and transposable elements that could promote ectopic recombination. The presence of transcription factors in the syntenic blocks may be an important developmental control that is interdependent and cannot easily be disrupted. Studies focusing on the phenotypes of knockdowns or knockouts of these genes now can be undertaken to determine if they play important roles in mushroom development. Further profiling of C. cinerea gene expression in specific tissues and at specific developmental time points, enabled by the completion and annotation of the genome, will help address these questions.

We also observed that paralogous gene families are overrepresented near chromosome ends, in regions of average or high meiotic recombination. The finding of independent expansions in several Agaricomycotina genomes, likely driven by tandem duplications in both the P450 and hydrophobin families, suggests a common mechanism for rapid family expansions. The diversification of these families may indicate acquisition of new or partitioned function (43) and may indicate an increased importance of these families in the cellular processes of Agaricomycotina fungi. The further identification of protein kinase FunK1 family (a family that is only found in multicellular fungi) expansions in C. cinerea suggests experimental approaches to the identification of signaling pathways that control cellular development and differentiation processes specific to multicellular fungi.

Our expression studies to date suggest an important role for specific FunK1 family members in C. cinerea development. They also provide no evidence for coordinate regulation of adjacent FunK1 family members or for the vast majority of duplicated paralogous genes in the C. cinerea genome. Because many aspects of genome organization are open to experimental manipulation in C. cinerea, the potential influence of chromosome position on expression of key members of this important family can be evaluated systematically.

Materials and Methods

Genome Sequencing, Assembly, and Annotation. The haploid Okayama 7 #130 strain was sequenced by WGS sequencing to a level of 10× coverage of the predicted 36-Mb genome and assembled with Arachne (44). Genome finishing into chromosomes was undertaken by manual inspection of the assembly, PCR from ends of WGS contigs, identification of telomeric repeats, and incorporation of BAC mapping data (SI Text). The protein-coding genes (13,342 sequences) were identified and confirmed using a combination of gene prediction and evidence-based tools including ESTs and 5′SAGE (SI Text). The initial sequence, assembly, and annotation can be accessed at http://www.broadinstitute.org/annotation/genome/coprinus_cinereus.

Dikaryon Formation (Mating) and Microarray Analysis. Four strains were used to examine genes under the control of the A pathway (“A-on”; AmutB43), the B pathway (“B-on”; A43Bmut), and both pathways (“A-on, B-on”; AmutBmut) in comparison with the reference (“A-off, B-off”; A43B43). The 70-mer oligonucleotide arrays include 13,230 probes designed for all predicted genes and ESTs. Methods for hybridization, data capture, and analysis are described in SI Text, and data have been deposited in GEO (GSE20628). Annotations, ESTs, SAGE tags, SSRs, linkage data, and oligonucleotides on the microarray platform can be accessed at http://genome.semo.edu/ccin.

ACKNOWLEDGMENTS. We thank the Center for Genomics and Bioinformatics at Indiana University for microarray production and Susan Whitfield for assistance with the figures. This article benefited greatly from input from C. D. Jones, J. Heitman, G. May, G. R. Fink, J. W. Taylor, and members of the J. W. Taylor laboratory. Support to finish, annotate, and map the C. cinerea genome was provided to P.J.P., F.S.D., A.C.G., W.W.L., S.K.W., J.B.H., and A.H. by National Science Foundation (NSF) Grant EF 0412016. J.D. and T.J.V. were supported by NSF Grant DBI0227314 (to T.J.V.). M.L.F. was supported by NSF Grant MCB 0135462, and C.J.R. was supported by an NSF graduate fellowship. J.E.S. was supported by an NSF graduate fellowship and by a postdoctoral fellowship from the Miller Institute for Basic Research in Science. M.B., A.L., and V.T.-H. were supported by National Institutes of Health (NIH) Grant HG00783 (to M.B.). G.M. was supported by NIH Grant HG004164. M.E.Z., C.B., and H.P. were supported by NIH Grant GM43930 (to M.E.Z.). T.K. was supported by a Grant-in-Aid for Scientific Research from the Ministry of Education, Science, Sports, and Culture of Japan. U.K., P.J.H., S.K., and R.V. were supported by the Deutsche Bundesstiftung Umwelt (to U.K.). H.S.K., C.H.A., and C.K.C. were supported by the Research Grants Council of Hong Kong SAR, China, reference number 466608 (to H.S.K.). F.M. was supported by an Institut National de la Recherche Agronomique Grant “AIP BioResources.” Support for BAC end-sequencing was provided to H.M. by the Biotechnology Center, Akita Prefectural University. D.A., B.C., and A.T. were funded by the Swedish Research Council. Array construction was supported by the Indiana METACyt Initiative of Indiana University, funded in part through a major grant from the Lilly Endowment, Inc. (to P.J.P. and M.E.Z.). The University of North Carolina at Chapel Hill’s Libraries provided support for open access publication, and page charges were paid by startup funds to J.E.S. from the University of California, Riverside.

1. Wong S, Butler G, Wolfe KH (2002) Gene order evolution and paleopolyploidy in hemiascomycete yeasts. Proc Natl Acad Sci USA 99:9272–9277.

2. Galagan JE, et al. (2005) Sequencing of Aspergillus nidulans and comparative analysis with A. fumigatus and A. oryzae. Nature 438:1105–1115.

3. Scannell DR, et al. (2007) Independent sorting-out of thousands of duplicated gene pairs in two yeast species descended from a whole-genome duplication. Proc Natl Acad Sci USA 104:8397–8402.

4. Butler G, et al. (2009) Evolution of pathogenicity and sexual reproduction in eight Candida genomes. Nature 459:657–662.

5. Fabre E, et al. (2005) Comparative genomics in hemiascomycete yeasts: Evolution of sex, silencing, and subtelomeres. Mol Biol Evol 22:856–873.

6. Redhead SA, Vilgalys R, Moncalvo J, Johnson J, Hopple JS (2001) Coprinus Persoon and the disposition of Coprinus species sensu lato. Taxon 50:203–241.

7. Binninger DM, Skrzynia C, Pukkila PJ, Casselton LA (1987) DNA-mediated transformation of the basidiomycete Coprinus cinereus. EMBO J 6:835–840.

8. Wälti MA, et al. (2006) Targeted gene silencing in the model mushroom Coprinopsis cinerea (Coprinus cinereus) by expression of homologous hairpin RNAs. Eukaryot Cell 5:732–744.

9. Kamada T (2002) Molecular genetics of sexual development in the mushroom Coprinus cinereus. Bioessays 24:449–459.

10. Casselton LA, Kües U (2007) Sex in Fungi: Molecular Determination and Evolutionary Implications, eds Heitman J, Kronstad JW, Taylor JW, Casselton LA (American Association for Microbiology Press, Washington), pp 283–300.

11. Kües U (2000) Life history and developmental processes in the basidiomycete Coprinus cinereus. Microbiol Mol Biol Rev 64:316–353.

12. Holm P, Rasmussen S, Zickler D, Lu B, Sage J (1981) Chromosome pairing, recombination nodules and chiasma formation in the basidiomycete Coprinus cinereus. Carlsberg Res Commun 46:305–346.

13. Acharya SN, et al. (2008) Coprinus cinereus rad50 mutants reveal an essential structural role for Rad50 in axial element and synaptonemal complex formation, homolog pairing and meiotic recombination. Genetics 180:1889–1907.

14. Muraguchi H, Ito Y, Kamada T, Yanagi SO (2003) A linkage map of the basidiomycete Coprinus cinereus based on random amplified polymorphic DNAs and restriction fragment length polymorphisms. Fungal Genet Biol 40:93–102.

15. Burns C, Pukkila PJ, Zolan ME (2010) Cellular and Molecular Biology of Filamentous Fungi, eds Borkovich KA, Ebbole DJ (American Society of Microbiology Press, Washington), pp 81–95.

16. Barnes TM, Kohara Y, Coulson A, Hekimi S (1995) Meiotic recombination, noncoding DNA and genomic organization in Caenorhabditis elegans. Genetics 141:159–179.

17. Jensen-Seaman MI, et al. (2004) Comparative recombination rates in the rat, mouse, and human genomes. Genome Res 14:528–538.

18. Barton AB, Pekosz MR, Kurvathi RS, Kaback DB (2008) Meiotic recombination at the ends of chromosomes in Saccharomyces cerevisiae. Genetics 179:1221–1235.

19. Zickler D (2006) From early homologue recognition to synaptonemal complex formation. Chromosoma 115:158–174.

20. Galagan JE, et al. (2003) The genome sequence of the filamentous fungus Neurospora crassa. Nature 422:859–868.

21. Loftus BJ, et al. (2005) The genome of the basidiomycetous yeast and human pathogen Cryptococcus neoformans. Science 307:1321–1324.

22. Charlesworth B, Sniegowski P, Stephan W (1994) The evolutionary dynamics of repetitive DNA in eukaryotes. Nature 371:215–220.

23. Duret L, Marais G, Biémont C (2000) Transposons but not retrotransposons are located preferentially in regions of high recombination rate in Caenorhabditis elegans. Genetics 156:1661–1669.

24. Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30:1575–1584.

25. Zheng J, et al. (1993) 2.2 Å refined crystal structure of the catalytic subunit of cAMP-dependent protein kinase complexed with MnATP and a peptide inhibitor. Acta Crystallogr D Biol Crystallogr 49:362–365.

26. Martinez D, et al. (2004) Genome sequence of the lignocellulose degrading fungus Phanerochaete chrysosporium strain RP78. Nat Biotechnol 22:695–700.

27. Doddapaneni H, Chakraborty R, Yadav JS (2005) Genome-wide structural and evolutionary analysis of the P450 monooxygenase genes (P450ome) in the white rot fungus Phanerochaete chrysosporium: Evidence for gene duplications and extensive gene clustering. BMC Genomics 6:92.

28. Wessels JG (1997) Hydrophobins: Proteins that change the nature of the fungal surface. Adv Microb Physiol 38:1–45.

29. Wösten HA (2001) Hydrophobins: Multipurpose proteins. Annu Rev Microbiol 55:625–646.

30. Verstrepen KJ, Reynolds TB, Fink GR (2004) Origins of variation in the fungal cell surface. Nat Rev Microbiol 2:533–540.

31. Elion EA (2000) Pheromone response, mating and cell biology. Curr Opin Microbiol 3:573–581.

32. Mody A, Weiner J, Ramanathan S (2009) Modularity of MAP kinases allows deformation of their signalling pathways. Nat Cell Biol 11:484–491.

33. O’Rourke SM, Herskowitz I, O’Shea EK (2002) Yeast go the whole HOG for the hyperosmotic response. Trends Genet 18:405–412.

34. Teige M, Scheikl E, Reiser V, Ruis H, Ammerer G (2001) Rck2, a member of the calmodulin-protein kinase family, links protein synthesis to high osmolarity MAP kinase signaling in budding yeast. Proc Natl Acad Sci USA 98:5625–5630.

35. Galgoczy DJ, et al. (2004) Genomic dissection of the cell-type-specification circuit in Saccharomyces cerevisiae. Proc Natl Acad Sci USA 101:18069–18074.

36. Kahmann R, Schirawski J (2007) Sex in Fungi: Molecular Determination and Evolutionary Implications, eds Heitman J, Kronstad JW, Taylor JW, Casselton LA (American Association for Microbiology Press, Washington).

37. Martin F, et al. (2008) The genome of Laccaria bicolor provides insights into mycorrhizal symbiosis. Nature 452:88–92.

38. Calabrese PP, Chakravarty S, Vision TJ (2003) Fast identification and statistical evaluation of segmental homologies in comparative maps. Bioinformatics 19(Suppl 1):i74–i80.

39. Nadeau JH, Taylor BA (1984) Lengths of chromosomal segments conserved since divergence of man and mouse. Proc Natl Acad Sci USA 81:814–818.

40. Sankoff D, Ferretti V, Nadeau JH (1997) Conserved segment identification. J Comput Biol 4:559–565.

41. Eichler EE, Sankoff D (2003) Structural dynamics of eukaryotic chromosome evolution. Science 301:793–797.

42. Gordon JL, Byrne KP, Wolfe KH (2009) Additions, losses, and rearrangements on the evolutionary route from a reconstructed ancestor to the modern Saccharomyces cerevisiae genome. PLoS Genet 5:e1000485.

43. Ohno S (1970) Evolution by Gene Duplication (Springer-Verlag, Heidelberg).

44. Batzoglou S, et al. (2002) ARACHNE: A whole-genome shotgun assembler. Genome Res 12:177–189.


Supporting Information
Stajich et al. 10.1073/pnas.1003391107

SI Text

Genome Assembly. The haploid Okayama 7 #130 strain, a common laboratory strain, was sequenced by whole-genome shotgun (WGS) sequencing of 4-kb and 10-kb plasmids and 40-kb fosmids to a level of 10× coverage of the predicted 36 megabase (Mb) genome. The initial assembly using Arachne (1) was made public July 2003 and consisted of 431 WGS contigs with an N50 of 218,423 nt. The contigs were assembled into 106 scaffolds by mate-pair information. One scaffold comprised the mitochondrial DNA (42,448 nt). Genome finishing was performed by PCR from ends of WGS contigs predicted to be adjacent based on the scaffold assignment and on BAC mapping. There were 157 successful PCR reactions yielding single products spanning two contigs, providing the primary evidence for closing 96 gaps. Re-analysis of the original fosmid reads using Consed (2) and a limited number of additional PCR reactions gave evidence closing 60 gaps between contigs. Alignment of the WGS contigs showed 88 cases in which contigs overlapped. In 91 cases, neighboring contigs were joined in the current assembly by abutting them without a known joining sequence. TERMINUS (3) was used to identify telomeric repeats on contig ends. The chromosomes range in size from 0.98 Mb to 4.15 Mb (Fig. S1), and the genome also contains 1.2 Mb of tandemly repeated rDNA sequences at one end of chromosome VI. Most (9 of 13) chromosomes extend telomere to telomere, three chromosomes are missing one telomere, and one chromosome (chromosome VI) has telomere-linked rDNA sequences at one end and is missing the other telomere (Fig. S1).

The remaining 54 WGS contigs, comprising 337,367 nt or 0.94% of the total sequence, were not placed on chromosomes. These contigs contained transposon sequences (11 contigs) and/or were linked to telomere repeats (12 contigs) using TERMINUS (3). The finished chromosome assembly containing 13 chromosomes and 54 unplaced contigs has an N50 of 3.47 Mb. The initial sequence, assembly, and annotation can be accessed at http://www.broadinstitute.org/annotation/genome/coprinus_cinereus/. The data in this paper, including annotations, ESTs, serial analysis of gene expression (SAGE) tags, SSRs, linkage data, and oligonucleotides on the microarray platform can be accessed at http://genome.semo.edu/ccin.
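N50 is quoted above for both the initial contigs and the finished assembly. As a reminder of what the statistic measures, the following Python sketch computes it under the common definition: the length L such that sequences of length at least L contain at least half of the total assembled bases. The function name and the toy lengths are hypothetical and are not the C. cinerea contig lengths.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths.

    Lengths are sorted from longest to shortest and accumulated; the N50 is
    the length of the sequence at which the running total first reaches half
    of the summed length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths supplied")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-negative lengths")


# Toy example (made-up lengths in kb, not real data).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))
```

Because the statistic is weighted by length, joining contigs into chromosomes raises N50 sharply even when the total amount of sequence barely changes.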

BAC Map Construction. A library of BACs was constructed after partial digestion of Okayama 7 #130 DNA with HindIII and ligation to the pBACTZ vector as described (4). We fingerprinted 1,728 clones using FingerPrinted Contig (FPC) analysis [tolerance 7 and cutoff of 1E-8 (5, 6)]. The FPC output was analyzed using BACFinder (7), which positioned 696 BACs (Dataset S1, Table S1). An additional 303 BACs that could not be placed via FPC were end-sequenced and placed on the scaffolds (Dataset S1, Table S1). The 27 BACs with ends mapping in different scaffolds were used to confirm 17 links between scaffolds (Dataset S1, Table S2). Contigs containing a subtelomeric region were identified using the TERMINUS program (3), which allowed the identification of 21 scaffold ends linked to telomere sequences (Fig. S1 and Dataset S1, Table S2).

Genetic Linkage Map Construction. RepeatMasker (8) was used to identify 537 genomic regions with a minimum of seven perfect trinucleotide repeats or six perfect tetranucleotide repeats. Primers were designed to produce 200- to 250-nt fragments flanking 404 of these simple sequence repeats (SSRs) and were used to amplify DNAs including the SSRs from Okayama 7 #130 (the sequenced strain, A43B43 ade3) and HT 14.01 #172 (the mapping partner, A6B6 trp1). Polymorphisms ranging in size from 3 to 30 nt were identified for 133 regions, which were used as markers to construct the linkage map. Forward primers with BigDye fluorophores added to the 5′ end were used to amplify DNAs from the two parental strains and 46 random spore progeny in 96-well format. PCR reactions (20 μL total) contained 8 ng genomic DNA, 20 mM Tris-HCl (pH 8.0), 50 mM KCl, 2 mM MgCl2, 0.25 μM sense primer, 0.25 μM antisense primer, 0.8 mM dNTPs, and 0.1 U Taq polymerase (Invitrogen). Cycling conditions were 95 °C for 3 min followed by 10 cycles of 94 °C (30 s), 62–52 °C (30 s; 1 °C decline each cycle), 72 °C (45 s), then followed by 29 cycles of 94 °C (30 s), 52 °C (30 s), 72 °C (45 s) followed by a 20-min hold at 72 °C. Fragments were resolved using an ABI 3730XL Genetic Analyzer and scored using GeneMapper software (Applied Biosystems). In addition to the polymorphic SSRs, we included the two mating-type loci (A and B) and two nutritional markers (ade3 and trp1) in the map. The alleles at the mating-type loci were determined by crossing each random spore to four mating-type tester strains (A43B43, A6B6, A43B6, A6B43) and observing clamp connections (Fig. 3D) in the compatible mating (e.g., a segregant with the A43B6 mating type forms clamp connections only with the A6B43 tester). The alleles at ade3 and trp1 were determined by growth on minimal medium, minimal medium plus adenine, minimal medium plus tryptophan, and minimal medium plus both adenine and tryptophan.

Segregation patterns for all the markers were examined, and none exhibited segregation distortion. We used MAPMAKER software (9) to construct the linkage map. The “group” command with a minimum LOD linkage of 3.0 and maximum distance of 60 centimorgans (cM) produced 13 linkage groups that corresponded to the 13 chromosomes, with two exceptions. First, we observed pseudolinkage of the right arm of chromosome IV and chromosome XIII. Contour-clamped homogeneous field electrophoresis revealed that in strain HT14.01 #172, chromosome IV is smaller and chromosome XIII is larger than their corresponding chromosomes in strain Okayama 7 #130, consistent with a terminal translocation. Four segregants exhibited two alleles for the terminal SSR on chromosome IV, indicating that they inherited a normal chromosome and a translocated chromosome. Second, the markers on chromosome VIII fell into two groups. Because linkage between these two groups was well supported by the sequence assembly, we assumed a map distance of 50 cM for this interval. The chromosome positions of the SSRs and other markers, map distances, and calculated kb/cM are shown in Dataset S1, Table S6. Although a higher marker density in some genomic intervals might reveal additional hotspots (if crossover distribution is non-uniform in these intervals), our results are in good agreement with the linkage map produced using 219 random amplified polymorphic DNA (RAPD) markers (10). Every chromosome in the RAPD map displayed an internal cluster of RAPD markers (40% of all markers) that failed to recombine, as predicted if the RAPD markers are randomly distributed in the Coprinopsis cinerea genome and if 45% of the genome undergoes very little meiotic recombination. The linkage map was supported further by the analysis of four tetrads, which placed 60 crossovers. Only four of these crossovers occurred in cold regions, whereas 22 occurred in hot regions. The linkage map also supported 11 links between scaffolds (Dataset S1, Table S2).
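The LOD threshold used for grouping can be made concrete with a two-point calculation. The sketch below assumes the textbook two-point LOD score for haploid progeny: log10 of the likelihood at recombination fraction θ relative to the likelihood at θ = 0.5, evaluated at the maximum-likelihood estimate θ = k/n (capped at 0.5). It is not MAPMAKER's code, and the function name and recombinant counts are hypothetical; only the progeny number (46) comes from the text.

```python
import math


def two_point_lod(recombinants: int, progeny: int) -> float:
    """Two-point LOD score at the maximum-likelihood recombination fraction."""
    if progeny <= 0 or not 0 <= recombinants <= progeny:
        raise ValueError("need progeny > 0 and 0 <= recombinants <= progeny")
    k, n = recombinants, progeny
    theta = min(k / n, 0.5)  # fractions above 0.5 carry no linkage information

    def log10_term(p: float, count: int) -> float:
        # Treat 0 * log10(0) as 0 so that k = 0 is handled cleanly.
        return 0.0 if count == 0 else count * math.log10(p)

    log_l_theta = log10_term(theta, k) + log10_term(1.0 - theta, n - k)
    log_l_null = n * math.log10(0.5)
    return log_l_theta - log_l_null


# Hypothetical marker pairs scored in 46 random spore progeny.
for k in (0, 5, 12, 23):
    print(k, round(two_point_lod(k, 46), 2))
```

Under these assumptions, a marker pair is grouped only when the observed recombinants make linkage at least 1,000 times more likely than independent assortment, which is what a minimum LOD of 3.0 expresses.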

Transposon Identification. Retrotransposons (class I transposable elements) were identified using PILER (11), RepeatScout (12), and RepeatRunner (13). PILER generated a library of 29 repeat families. BLASTX (14) analysis of those families showed that five families had strong hits to retrotransposons, five families had weak hits to various proteins, and 19 families had no significant hits. Four of the five families (0.27, 0.28, 23.4, and 6.21) showed similarity to Gypsy elements, whereas one family (10.17) showed similarity to Copia elements. Twelve additional Copia elements were found using a keyword search of the C. cinerea annotations. RepeatMasker (8) was used to determine the copy number and chromosome distribution of these families (Dataset S1, Tables S3a and S3b). A total of 834 kb is found in the five retrotransposon families. No DNA transposons (class II transposable elements) were identified through the previous analysis. A search of Pfam domains and gene predictions of the C. cinerea annotations was conducted to identify any potential DNA transposons. Initially, 26 potential elements were identified by this search method. The elements fall into three classes of DNA transposons: Activator (hAT), Enhancer (En/spm), and Mariner (Tc1). A total of 70 kb is found in the three DNA transposon families. The copy number and chromosome distribution of these families are shown in Dataset S1, Tables S3a and S3b.

Stajich et al. www.pnas.org/cgi/content/short/1003391107 1 of 6

Genome Annotation. The protein-coding genes (13,342 sequences) were identified using a combination of gene prediction and evidence-based tools. The v1 annotations generated required extensive training because of limited curated gene models for Coprinopsis or any Agaricomycete fungus at the time (2002). The initial gene set was produced on the first version of the assembly (NZ_AACS00000000.1) and used ab initio predictors SNAP (15), AUGUSTUS (16), Twinscan (17), and GeneZilla (18), which were trained using a set of gene models predicted from protein-to-genome alignments generated by GeneWise (19). Twinscan models were predicted with the basidiomycete Phanerochaete chrysosporium as an informant genome. The comparative, computational, and experimental data sets were combined into a final gene call using the tool GLEAN (20), which uses a latent classification scheme to scale the contribution of each type of evidence and to assemble the exons into the largest ORF. We constrained GLEAN to limit the size of the introns to 300 nt.

Additional gene calling was performed on the second version of the assembly (AACS02000000) with the Broad Institute gene-calling pipeline using additional EST data, GeneMark.hmm+ES version 2 (21), FGENESH (22), and GENEID (23). Previously predicted genes were used to identify genes that may have been missed and potential splits and merges. Mitochondrial genes were predicted using GeneWise (19) and by computational or manual identification of ORFs based on BLAST (24) and Pfam (25) homology. Repeat regions were identified based on multiple alignments of transcripts to the whole genome and overlap with BLAST and Pfam hits from transposon proteins or domains. Genes flagged by these screens were excluded from the final gene set based on manually determined cutoff points. Predicted proteins without supporting evidence were excluded from the final gene set if their coding sequence (CDS)/transcript length ratio was less than 0.33 or their CDS length was less than 80 residues.

The 267 tRNA genes were identified using tRNAScan-SE. These genes are distributed randomly on all of the chromosomes and show no tendency to be colocalized with transposable elements. Nuclear and mitochondrial ribosomal RNA genes were identified based on comparison with highly similar ribosomal RNA sequences from Coniophora puteana (GenBank accession AM946631), Agrocybe aegerita (AAU54637), and Suillus sinuspaulianus (L47585), supplemented with predictions from RNAmmer (26) and Rfam (27).
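The final exclusion rule for unsupported gene models (CDS/transcript length ratio below 0.33, or CDS shorter than 80 residues) can be illustrated with a minimal sketch. The `GeneModel` record, its field names, and the reading of "residues" as amino acids (CDS nucleotides divided by three) are assumptions made for illustration; they are not taken from the annotation pipeline itself.

```python
from dataclasses import dataclass

MIN_CDS_TRANSCRIPT_RATIO = 0.33  # threshold stated in the text
MIN_CDS_RESIDUES = 80            # threshold stated in the text


@dataclass
class GeneModel:
    """Hypothetical record for one predicted gene."""
    gene_id: str
    cds_nt: int          # CDS length in nucleotides
    transcript_nt: int   # transcript length in nucleotides
    has_evidence: bool   # True if any supporting evidence exists


def keep_model(model: GeneModel) -> bool:
    """Return True if the model passes the filter.

    Models with supporting evidence are always kept. Unsupported models
    are dropped when the CDS/transcript ratio is < 0.33 or the CDS is
    < 80 residues (assumed here to mean amino acids, cds_nt // 3).
    """
    if model.has_evidence:
        return True
    if model.transcript_nt <= 0:
        return False
    ratio = model.cds_nt / model.transcript_nt
    residues = model.cds_nt // 3
    return ratio >= MIN_CDS_TRANSCRIPT_RATIO and residues >= MIN_CDS_RESIDUES


if __name__ == "__main__":
    models = [
        GeneModel("supported_short", 150, 1000, True),
        GeneModel("unsupported_low_ratio", 300, 1000, False),
        GeneModel("unsupported_ok", 600, 900, False),
    ]
    print([m.gene_id for m in models if keep_model(m)])
```

Only the two thresholds come from the text; how "supporting evidence" was scored in practice is not specified there and is reduced to a boolean in this sketch.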

ESTs. Transcript data from 5,612 ESTs were employed to support gene calls. We used nine different growth conditions for our EST libraries. Libraries CCFBM and CCK+6 were obtained from a dikaryon constructed from strains backcrossed to Okayama 7 #130 for five generations (J6, 5–4 #409 × J6, 5–5 #410). The dikaryon was grown on yeast extract/malt extract/glucose (YMG) medium (28) at 37 °C until confluent and then was transferred to 25 °C with a 16-h light/8-h dark cycle.

CCFBM: Fruit body caps were harvested either 1 h before or 1 h after karyogamy. Poly-A+ RNA was isolated from both groups and pooled to make cDNA, which was cloned into pBluescript II SK- phagemids. There are 1,175 accessions in the GenBank EST database from library CCFBM (DR774668–DR775517).

CCK+6: Fruit body caps were harvested 6 h after karyogamy. Library construction was performed as for CCFBM. There are 1,667 accessions in the GenBank EST database from library CCK+6 (DN591505–DN593171).

All other libraries were constructed using C. cinerea strain Okayama 7 #130 monokaryotic mycelia and, except as noted, were cultured at 37 °C for 3 d on minimal medium before harvest or additional treatment. Harvested mycelia were frozen in liquid nitrogen and stored at −80 °C before RNA extraction. RNA was extracted using the Qiagen RNeasy plant mini kit per kit instructions. Libraries were constructed using the Stratagene Bluescript II XR cDNA library construction kit per kit instructions. To minimize repeated sequencing of clones from highly expressed mRNAs in these libraries, we chose 47 predicted genes that had the highest number of ESTs in the CCFBM, CCK+6, and CCMIN libraries (which were not subtracted). Oligonucleotide probes were designed for these genes (which included hydrophobins, ribosomal proteins, and many other “housekeeping” genes) and were used to screen the remaining libraries. This method reduced the redundant sequencing from 10% in the initial unsubtracted libraries to 2% in the subtracted libraries.

CCMIN: Mycelia were cultured at 37 °C for 3 d on minimal medium (29) before harvest. There are 670 accessions in the GenBank EST database from library CCMIN (DN593172–DN593841).

CCYMG: Mycelia were cultured at 37 °C for 3 d on YMG medium before harvest. There are 550 accessions in the GenBank EST database from library CCYMG (DR752715–DR753264).

CCRAP (rapamycin): Mycelia were cultured at 37 °C for 3 d on minimal medium and then were transferred to minimal medium with 100 mM rapamycin for 1 h before harvest. There are 747 accessions in the GenBank EST database from library CCRAP (DN593842–DN593917, DR753265–DR753301, and DR753303–DR753936).

CCOS (osmotic shock): Mycelia were cultured at 37 °C for 3 d on minimal medium and then were transferred to minimal medium with 1 M sorbitol for 1 h before harvest. There are 62 accessions in the GenBank EST database from library CCOS (FG068230–FG068291).

CCHS (heat shock): Mycelia were cultured at 37 °C for 3 d on minimal medium and then were transferred to prewarmed minimal medium and incubated at 42 °C for 1 h before harvest. There are 540 accessions in the GenBank EST database from library CCHS (DR421062–DR421601).

CCCN (complex carbon/nitrogen source): Mycelia were cultured at 37 °C for 3 d on minimal medium with 2% wt/vol cellobiose and 0.4% wt/vol gelatin substituting for glucose and L-asparagine. There are 560 accessions in the GenBank EST database from library CCCN (DR752151–DR752714 and DR753937–DR753939).

CCSEN (senescent): Mycelia were cultured at 37 °C on minimal medium. After 4 d, the extent of the mycelium was marked on the plate. After 9 d, the first 4-d growth was harvested. There are 560 accessions in the GenBank EST database from library CCSEN (DR907568–DR908072).
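The size of a GenBank accession range such as those listed above can be computed directly, which is a convenient check when retrieving a library. The sketch below assumes that a range is a contiguous run of accessions sharing one letter prefix and that every accession in the run belongs to the library; the helper names are hypothetical.

```python
import re
from typing import Iterable, Tuple

_ACCESSION = re.compile(r"([A-Z]+)(\d+)")


def accession_count(first: str, last: str) -> int:
    """Number of accessions in the inclusive range first..last.

    Both accessions must share the same letter prefix, e.g. DN591505 and
    DN593171. A contiguous range is assumed.
    """
    m_first, m_last = _ACCESSION.fullmatch(first), _ACCESSION.fullmatch(last)
    if m_first is None or m_last is None:
        raise ValueError("accessions must look like letters followed by digits")
    if m_first.group(1) != m_last.group(1):
        raise ValueError("accessions must share the same prefix")
    return int(m_last.group(2)) - int(m_first.group(2)) + 1


def library_size(ranges: Iterable[Tuple[str, str]]) -> int:
    """Total number of accessions over several ranges."""
    return sum(accession_count(a, b) for a, b in ranges)


if __name__ == "__main__":
    # CCK+6 range as given in the text.
    print(accession_count("DN591505", "DN593171"))
    # CCRAP is given as three ranges.
    print(library_size([
        ("DN593842", "DN593917"),
        ("DR753265", "DR753301"),
        ("DR753303", "DR753936"),
    ]))
```

For CCK+6 and CCRAP the computed sizes agree with the accession counts quoted in the text (1,667 and 747).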

SAGE Library Construction. To confirm the gene calls and to examine the regulatory complexity in C. cinerea, we prepared 5′ SAGE libraries from vegetative (dikaryotic) tissue and from fruit body primordia. A dikaryon using strains backcrossed to Okayama 7 #130 (J6, 5–4 #409 × J6, 5–5 #410) was cultivated on YMG medium (28). The mycelium was cultured on agar plates at 37 °C for about 7 d until the mycelium covered the whole agar surface. The primordium was induced by incubating the mycelial culture at 25 °C under a light/dark regime of 14/10 h. The incubator was kept at a relative humidity higher than 60%. Total RNA was extracted from mycelia when they grew over the whole agar surface and from stage 1 primordia when they grew to a height of about 5 mm. The RNAs were extracted from tissue frozen in liquid nitrogen using TRI reagent (Molecular Research Center, Inc.) followed by chloroform extraction and precipitation.

SAGE library construction and analysis were as described (30), with some modifications. Poly-A+ RNA was isolated using the PolyATract mRNA isolation system (Promega) following the manufacturer’s protocol. First-strand cDNA was synthesized using SuperScript III First-Strand Synthesis System for qRT-PCR (Invitrogen). Two separate first-strand synthesis and template-switching (TS) reactions were applied for each developmental stage (mycelium and primordium). The TS oligos were A: 5′-GGGATTTGCTGGTGCAGTACAGGATCCGACggg-3′; B: 5′-GCTGCTCGAATTCAAGCTTCTGGATCCGACggg-3′, where lowercase ‘g’ denotes a ribonucleotide. Second-strand cDNA synthesis was performed by low-cycle primer extension using Advantage 2 polymerase (Clontech). Oligos used were CDS primer: 5′-CAGTGGTATCAACGCAGAGTAC(dT)20VN-3′; Anchor primer A: 5′-GGGATTTGCTGGTGCAGTACAGGATCCGAC-3′; Anchor primer B: 5′-GCTGCTCGAATTCAAGCTTCTGGATCCGAC-3′. PCR cycling conditions were 72 °C for 5 min; 95 °C for 45 s; 95 °C for 10 s, 55 °C for 30 s, 68 °C for 4 min for five cycles; and 68 °C for 3 min. The PCR products then were purified using the QIAquick PCR purification kit (Qiagen). Samples were digested with MmeI (New England BioLabs), and 50-bp bands were recovered after acrylamide gel electrophoresis. The cDNAs were ligated to form 100-bp ditags and were amplified using anchor primers A and B with the following cycling conditions: 95 °C for 2 min; 95 °C for 30 s, 65 °C for 45 s, 72 °C for 20 s for 10 cycles; and 72 °C for 3 min. The ditags were purified by phenol/chloroform extraction and ethanol precipitation and were used for high-throughput pyrosequencing (454 Life Sciences GS20 sequencer).

Individual tags were extracted from the ditag and checked for sequence quality (Phred equivalent >20), and the unmapped starting G residues were removed. The resulting tags were aligned to the genome. Predicted genes with two or more tags <500 nt upstream of the start codon are recorded as “sense” in Dataset S1, Table S5; predicted genes with two or more tags <500 nt downstream of the stop codon or within the coding region are recorded as “antisense” in Dataset S1, Table S5. Strand-specific PCR confirmed the presence of four predicted antisense transcripts.
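The "sense" and "antisense" calls described above can be sketched as a small classifier over aligned tag positions. The 500-nt window and the two-tag minimum come from the text. The `Gene` and `Tag` records are hypothetical, and the requirement that sense tags lie on the gene strand and antisense tags on the opposite strand is an assumption; the text states only the positional criteria.

```python
from dataclasses import dataclass
from typing import List

WINDOW_NT = 500   # "<500 nt", from the text
MIN_TAGS = 2      # "two or more tags", from the text


@dataclass
class Gene:
    start: int    # lowest genomic coordinate of the coding region
    end: int      # highest genomic coordinate of the coding region
    strand: str   # "+" or "-"


@dataclass
class Tag:
    pos: int      # genomic coordinate of the aligned 5' tag
    strand: str   # "+" or "-"


def nt_upstream_of_start(gene: Gene, pos: int) -> int:
    """Positive if pos lies upstream of the start codon, in gene orientation."""
    return gene.start - pos if gene.strand == "+" else pos - gene.end


def nt_downstream_of_stop(gene: Gene, pos: int) -> int:
    """Positive if pos lies downstream of the stop codon, in gene orientation."""
    return pos - gene.end if gene.strand == "+" else gene.start - pos


def classify(gene: Gene, tags: List[Tag]) -> List[str]:
    """Return the labels ("sense", "antisense") supported by >= MIN_TAGS tags."""
    sense = antisense = 0
    for tag in tags:
        if tag.strand == gene.strand:
            # Assumed: sense tags are on the gene strand, just upstream of the start.
            if 0 < nt_upstream_of_start(gene, tag.pos) < WINDOW_NT:
                sense += 1
        else:
            # Assumed: antisense tags are on the opposite strand.
            in_coding_region = gene.start <= tag.pos <= gene.end
            if in_coding_region or 0 < nt_downstream_of_stop(gene, tag.pos) < WINDOW_NT:
                antisense += 1
    labels = []
    if sense >= MIN_TAGS:
        labels.append("sense")
    if antisense >= MIN_TAGS:
        labels.append("antisense")
    return labels


if __name__ == "__main__":
    gene = Gene(start=1000, end=2000, strand="+")
    tags = [Tag(900, "+"), Tag(700, "+"), Tag(1500, "-"), Tag(2100, "-")]
    print(classify(gene, tags))
```

A gene can receive both labels under this rule, since the two tag counts are independent.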

Gene Families. A similarity search of all protein-coding genes in C. cinerea, Laccaria bicolor (31), and P. chrysosporium (32, 33) was performed using National Center for Biotechnology Information (NCBI) BLASTP (24) with an E-value cutoff of 1E−10 as input for the clustering of proteins into protein families. TribeMCL (34) was run with default inflation value (1.5) and generated 7,433 protein families (at least two members in a family) and 5,044 singletons from C. cinerea, L. bicolor, and P. chrysosporium. The functional annotation using Pfam showed that families with hydrophobin and P450 domains are among the most expanded in C. cinerea (Dataset S1, Table S8).
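The step from filtered BLASTP hits to families and singletons can be illustrated as follows. This sketch deliberately substitutes single-linkage clustering (connected components via union-find) for TribeMCL: it is not the Markov clustering algorithm and will generally merge more aggressively than MCL does. Only the 1E−10 E-value cutoff and the "at least two members" definition of a family come from the text; the input layout (query, subject, E-value) is assumed.

```python
from typing import Dict, Iterable, List, Set, Tuple

E_VALUE_CUTOFF = 1e-10  # cutoff stated in the text


def cluster_proteins(
    proteins: Iterable[str],
    hits: Iterable[Tuple[str, str, float]],
    cutoff: float = E_VALUE_CUTOFF,
) -> Tuple[List[Set[str]], List[str]]:
    """Group proteins linked by hits with E-value <= cutoff.

    Returns (families, singletons). A family has at least two members.
    Every query and subject in `hits` must also appear in `proteins`.
    """
    proteins = list(proteins)
    parent: Dict[str, str] = {p: p for p in proteins}

    def find(x: str) -> str:
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, e_value in hits:
        if e_value <= cutoff and query != subject:
            parent[find(query)] = find(subject)

    groups: Dict[str, Set[str]] = {}
    for p in proteins:
        groups.setdefault(find(p), set()).add(p)

    families = [g for g in groups.values() if len(g) >= 2]
    singletons = [next(iter(g)) for g in groups.values() if len(g) == 1]
    return families, singletons


if __name__ == "__main__":
    prots = ["a", "b", "c", "d"]
    blast = [("a", "b", 1e-50), ("b", "c", 1e-5), ("d", "d", 0.0)]
    fams, singles = cluster_proteins(prots, blast)
    print(fams, singles)
```

In the example, the hit between b and c fails the cutoff and the self-hit of d is ignored, so c and d remain singletons.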

Protein Kinases. A preliminary set of C. cinerea protein kinases was delineated using hmmsearch (version 2.3.2) (35, 36) to screen the predicted expressed proteins with a hidden Markov model (HMM) derived from an alignment of the complete Dictyostelium discoideum kinome (37) using an E-value cutoff of 1. Additional protein kinases with E-values between 1E-1 and 1E-100 were identified based on the conservation of protein kinase subdomains (38) determined by manual inspection of the alignment to the HMM alignment. A six-frame translation of the genome also was screened with a library of atypical and divergent protein kinases (kin20.hmm, available at www.kinase.com). Translated protein kinases and their classifications, locations, and correspondence with the previous annotation release are presented in Dataset S1, Table S9a. Genes that were missed in initial annotation are denoted NewKin with a numeric designation.

Kinases were classified based on BLAST (24) and class-specific HMMs; ambiguous cases were resolved based on differential BLAST/HMM scores and manual examination of multiple sequence alignments. Novel C. cinerea kinases were clustered using OrthoMCL (39) to form seed groups, which were aligned and used to build custom HMMs. The C. cinerea genome, the nonredundant protein sequence database at NCBI, and fungal databases were screened against these custom HMMs to identify novel C. cinerea kinases missed in the initial screens and to identify novel kinases in other species. Subfamilies within the FunK1 family were delineated using a neighbor-joining tree (40). The numbers of family and subfamily members in C. cinerea are compared with those from a diverse set of species with complete or draft kinomes in Dataset S1, Table S9b. Differences between the C. cinerea kinome and the extensively characterized Saccharomyces cerevisiae kinome are highlighted in Dataset S1, Table S9c.

The FunK1 family members are of particular interest, because these kinases appear to be restricted to multicellular fungi. Differential transcription of several of these family members has been noted during specific steps of dikaryon formation (main text and Dataset S1, Table S11). To date, differential transcription of the two FunK1 family members that lie within the B locus (Fig. S6) has not been detected. A comparison of FunK1 and conventional kinases was performed to examine whether key regulatory residues were conserved (Fig. 3). The HMMlogo was constructed using HMMeditor (41) using structure 1ATP (42) numbering. C. cinerea has 16 members of the tyrosine kinase-like (TKL) kinase group, which are the likely progenitors of animal tyrosine kinases. This group of kinases is found in the Basidiomycetes but is entirely absent from the Saccharomycotina. In plants and D. discoideum, TKLs often function as receptor kinases, but no fungal members appear to be membrane linked. In fact, no conventional or atypical C. cinerea kinase is predicted to contain a transmembrane helix domain; if any of these kinases transduce intercellular signals, they presumably do so through a distinct set of nonkinase receptors.

L. bicolor protein kinases were manually curated previously (43), and the total counts per family are presented in Dataset S1, Table S9b.

A phylogenetic tree of MAPK also was constructed to check for the presence of FUS3 and HOG1 orthologs in C. cinerea (Fig. S3). Examination of the tree revealed that orthologs of both are found in C. cinerea and that several duplications of the FUS3 family have occurred in the Basidiomycetes.
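The six-frame translation used in the kinase screen above, which allows HMMs to be run against genomic DNA without relying on gene models, can be sketched in a few lines. This is an illustrative implementation rather than the tool used in the study; it assumes the standard nuclear genetic code (NCBI translation table 1) and reports codons containing ambiguous bases as "X".

```python
from typing import Dict

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    a + b + c: _AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from position 0; a trailing partial codon is dropped."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame(seq: str) -> Dict[str, str]:
    """Translations for frames +1..+3 (forward) and -1..-3 (reverse complement)."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames: Dict[str, str] = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frame("ATGGCC").items():
        print(name, protein)
```

Stop codons are kept as "*" so that downstream code can split each frame into open reading frames before searching.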

Phylogenetic Analysis of P450 and Hydrophobin Genes. The gene family expansions from the TRIBE-MCL clusters showed obvious expansions of the number of P450 domain-containing genes in C. cinerea. A large count of P450 family members has been observed previously in P. chrysosporium (44), but the extent to which the expansion in the Agaricales is independent has not been explored. To identify the total number of genes containing the P450 Pfam domain, genes were counted using hmmsearch and the P450 HMM profile. The Pfam count did not exactly match the number of C. cinerea genes in TRIBE-MCL clusters labeled P450, presumably because of some extraneous sequence additions, but the trend of large numbers is found by both methods. The hmmsearch identifies the full complement of P450 members, whereas the TRIBE-MCL approach can help identify subfamilies. For example, Cluster 2 is a large family in both C. cinerea (70 copies) and P. chrysosporium (62 copies) but is a smaller family in L. bicolor (27 copies).


To examine the physical clustering of genes in the same family, we computed the number of adjacent genes in each of the P450-containing TRIBE families. Cluster 2 has 70 total members in C. cinerea, with six pairs of immediately adjacent genes on chromosome II, three pairs on chromosome IX, two pairs on chromosome VI, and one pair on chromosome IV. Cluster 75 has 14 total genes from C. cinerea, with two sets of four adjacent genes on chromosome X (CC1G_01582–CC1G_01585; CC1G_01619–CC1G_01622). There are additional examples of adjacent gene pairs for cluster 114, with three separate clusters containing seven total genes on chromosome VIII, and a pair on chromosome V for cluster 31.

To study the evolutionary history of the families, we used MrBayes (45) to construct phylogenetic trees of many of the subfamilies of P450 as defined by TRIBE-MCL clusters (Fig. S4). The phylogenetic tree from cluster 75 shows the pattern of species-specific duplications where most of the members coalesce within a species, indicating the gene family diversified after divergence from the ancestor of P. chrysosporium and C. cinerea. Furthermore, the duplication has occurred independently in both the P. chrysosporium and C. cinerea lineages, because both species show the pattern of gene family radiations in separate clades. Many members of the families are found near each other on the chromosome in tandem arrays, indicating that in some cases local gene duplication drives the family expansion.

A similar analysis was performed for the hydrophobin gene family. These genes also showed patterns of genomic adjacency, indicating that local gene duplication drives the expansion of the hydrophobin family as well. Comparison of the copy number of hydrophobins across the Basidiomycetes also showed a large expansion in the Agaricomycetes fungi sampled relative to Ustilago maydis (one copy) (46) or Cryptococcus neoformans (no copies) (47). Family 29 contains most of the hydrophobins, with a total of 34 members from C. cinerea but only 17 and 15 for L. bicolor and P. chrysosporium, respectively. A phylogenetic tree of the relationships of these sequences can be seen in Fig. S5, showing that for the most part there are well-supported clades of species-specific gene duplications, indicating independent duplication and expansion of the family in each of the lineages represented by these genomes. Starred genes indicate groups of immediately adjacent genes. The group marked with stars indicates two sets of adjacent genes that are found on chromosome X; one is made up of five genes from CC1G_02181–CC1G_02185, and nearby is a second pair of genes, CC1G_02174 (marked with an open star) and CC1G_02173. The close proximity of the two sets of genes indicates there was a hot spot for duplication of hydrophobins at 1.84–1.86 Mb on chromosome X.
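Counting physically adjacent family members, as done above for the P450 and hydrophobin families, reduces to a scan over the gene order on each chromosome. The sketch below is a minimal illustration: the input layout (chromosome name mapped to an ordered gene list), the function name, and the `max_intervening` parameter are assumptions. With `max_intervening=0` it reports immediately adjacent pairs; a value of 1 corresponds to the looser "separated at most by one gene" criterion used in the Fig. S5 legend.

```python
from typing import Dict, List, Set, Tuple


def adjacent_pairs(
    gene_order: Dict[str, List[str]],
    family: Set[str],
    max_intervening: int = 0,
) -> List[Tuple[str, str]]:
    """Pairs of consecutive family members lying close together on a chromosome.

    gene_order maps a chromosome name to its genes in positional order.
    Two consecutive family members form a pair when at most
    `max_intervening` non-family genes lie between them.
    """
    pairs: List[Tuple[str, str]] = []
    for genes in gene_order.values():
        member_positions = [i for i, gene in enumerate(genes) if gene in family]
        for left, right in zip(member_positions, member_positions[1:]):
            if right - left - 1 <= max_intervening:
                pairs.append((genes[left], genes[right]))
    return pairs


if __name__ == "__main__":
    # Gene identifiers below are made up for the example.
    order = {
        "X": ["g1", "g2", "g3", "g4", "g5"],
        "II": ["h1", "h2", "h3"],
    }
    members = {"g1", "g2", "g4", "h1", "h3"}
    print(adjacent_pairs(order, members))
    print(adjacent_pairs(order, members, max_intervening=1))
```

A run of four adjacent genes yields three pairs under this definition, so pair counts and set sizes should not be compared directly.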

Gene Ontology Analysis of Genes in Regions with Different Recombination Rates. We examined genes within 11 genomic regions with low rates of genomic recombination (>70 kb/cM, vs. the average 198 kb/cM) ranging in size from 0.8 to 2.4 Mb with 271–996 genes/region (Dataset S1, Table S6). We also examined the 737 genes found in genomic regions with high rates of genomic recombination (<10 kb/cM, vs. the average 6 kb/cM). These 11 “pseudoclusters” were examined using EASE analysis (48) running in the MeV platform (49) to determine potential enrichment of gene ontology (GO) classes for the 11 sets of linked genes and the set of genes found in regions with high rates of genetic exchange.

It has long been recognized that a reduced recombination rate would be advantageous when particular combinations of alleles at different loci provide a selective advantage and/or other combinations present a disadvantage (50–52). Evidence that genes under such “epistatic selection” are in regions that are cold for recombination also has been noted in the yeast S. cerevisiae (53, 54). We tested if genes involved in particular biological processes, cellular components, or molecular functions were found in specific genomic regions with low recombination and if the associated GO terms were distinct from GO terms found associated with genes in freely recombining regions. The cold regions were enriched (P < 0.000005 to P < 0.01) for a distinctive combination of functions annotated to basic cellular processes such as helicase activity, endoplasmic reticulum-to-Golgi transport, RNA processing, or nitrogen metabolism (Dataset S1, Table S10). In contrast, genes annotated to other processes such as defense responses and cell wall catabolic processes were enriched in the regions of the chromosomes that exhibit high levels of meiotic recombination (Dataset S1, Table S10).
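The enrichment test behind an EASE analysis can be sketched with the standard library alone. The code below computes a one-sided Fisher exact probability for over-representation of a GO term in a gene list, together with an EASE-style conservative variant in which one list hit is removed before the test, following the description by Hosack et al. (48). The exact construction of the 2 × 2 table is our reading of that description, and the function names are hypothetical; the sketch does not reproduce the MeV implementation.

```python
from math import comb
from typing import Tuple


def fisher_upper_tail(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact P for the table [[a, b], [c, d]].

    Probability of observing a or more in the top-left cell, with all
    margins fixed (hypergeometric upper tail).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denominator = comb(n, row1)
    numerator = sum(
        comb(col1, x) * comb(n - col1, row1 - x)
        for x in range(a, min(row1, col1) + 1)
    )
    return numerator / denominator


def enrichment(
    list_hits: int, list_total: int, pop_hits: int, pop_total: int
) -> Tuple[float, float]:
    """Return (Fisher P, EASE-style score) for one GO term.

    list_hits/list_total: genes with the term / all genes in the region.
    pop_hits/pop_total:   genes with the term / all genes in the background.
    The gene list is assumed to be a subset of the background population.
    """
    a = list_hits
    b = list_total - list_hits
    c = pop_hits - list_hits
    d = pop_total - list_total - c
    fisher = fisher_upper_tail(a, b, c, d)
    # EASE-style penalty: drop one list hit before testing (assumed form).
    ease = 1.0 if list_hits == 0 else fisher_upper_tail(a - 1, b, c, d)
    return fisher, ease


if __name__ == "__main__":
    # Made-up counts: 12 of 300 regional genes carry a term found in
    # 60 of 13,000 background genes.
    p_fisher, p_ease = enrichment(12, 300, 60, 13000)
    print(p_fisher, p_ease)
```

Because one supporting gene is removed, the EASE-style score is never smaller than the Fisher probability, which penalizes terms supported by very few genes.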

Dikaryon Formation (Mating) and Microarray Analysis. Mating compatibility in C. cinerea involves a complex program that is controlled by two sets of unlinked genes, A and B. To identify downstream targets of these factors, we examined transcripts expressed in mycelia in which the A- or B-controlled parts of the pathway are activated separately. Strain Okayama 7 #424 (A43 B43) was crossed to strain #425 [AmutBmut (55)], and strains #422 (AmutB43) and #423 (A43Bmut) were recovered. The four strains were cultured on YMG, and RNA was isolated as described for the EST libraries. First-strand cDNA synthesis, Alexa-Fluor labeling using the Superscript Indirect cDNA Labeling System (Invitrogen), and array hybridization were performed. The two-channel hybridizations compared a sample (AmutB43, A43Bmut, or AmutBmut) with the A43B43 reference, and four replicates for each sample were analyzed. Data capture for the 12 arrays was performed with a GenePix 4200A scanner (Molecular Devices), and spots were identified with GenePix Pro software (Molecular Devices) at the Indiana University Center for Genomics and Bioinformatics. Spots were flagged for omission using GenePix software if they were scored as manually flagged, spots not found, sum of medians <200, or spot pixels <40. Data for a given oligonucleotide were included if two or more of the four replicates contained data for both probes. Of the 13,230 array probes, 11,726 fulfilled these criteria for the AmutB43 arrays, 11,798 for the A43Bmut arrays, and 10,055 for the AmutBmut arrays. Significance analysis of microarrays (56) was used to determine significant differences in gene expression between the sample and the control.

Expression ratios were log2 transformed, and Dataset S1, Table S11 reports genes with median (sample/reference) log2 >2 [false-discovery rate (FDR) < 2%] in six categories (up-regulated in AmutB43, down-regulated in AmutB43, up-regulated in A43Bmut, down-regulated in A43Bmut, up-regulated in AmutBmut, and down-regulated in AmutBmut, with respect to the A43B43 reference in each case). As expected, pcc1 and clp1 were up-regulated in the AmutB43 category, although clp1 was significant at an FDR of 5% and thus does not appear in Dataset S1, Table S11. None of these categories included three or more genes in tandem, except for the two cases reported in the text.
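The replicate and fold-change criteria described above can be sketched as follows. A probe is scored only if at least two of its four replicate ratios are usable, and its median log2(sample/reference) is then compared with the threshold of 2. Treating down-regulation as a median below −2 is an assumption (the text states the threshold only as "log2 >2"), and the FDR estimate from significance analysis of microarrays is not reproduced here.

```python
from math import log2
from statistics import median
from typing import Optional, Sequence

MIN_REPLICATES = 2     # "two or more of the four replicates", from the text
LOG2_THRESHOLD = 2.0   # "log2 >2", from the text


def median_log2_ratio(ratios: Sequence[Optional[float]]) -> Optional[float]:
    """Median log2(sample/reference) over usable replicates.

    `None` marks a replicate whose spot was flagged for omission.
    Returns None when fewer than MIN_REPLICATES usable values remain.
    """
    usable = [log2(r) for r in ratios if r is not None and r > 0]
    if len(usable) < MIN_REPLICATES:
        return None
    return median(usable)


def call(ratios: Sequence[Optional[float]]) -> str:
    """Classify one probe as "up", "down", "unchanged", or "insufficient data"."""
    m = median_log2_ratio(ratios)
    if m is None:
        return "insufficient data"
    if m > LOG2_THRESHOLD:
        return "up"
    if m < -LOG2_THRESHOLD:   # symmetric threshold: an assumption
        return "down"
    return "unchanged"


if __name__ == "__main__":
    print(call([8.0, 16.0, None, None]))
    print(call([0.125, 0.0625, 0.125, None]))
    print(call([4.0, 4.0, 4.0, 4.0]))
    print(call([8.0, None, None, None]))
```

A fourfold change (log2 = 2) does not pass, because the comparison is strict.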

Divergence Time Estimation. The divergence time of 200 My between L. bicolor and C. cinerea was estimated using the application r8s (57) and a pruned version of the previously published fungal tree of life (58) with key basidiomycete lineages retained. The maximum age of the Basidiomycota was set to 550 Mya based on results from Taylor and Berbee (59), and the Penalized Likelihood model was applied. The resulting estimate for the divergence between C. cinerea and L. bicolor is 194 Mya, for the origin of the Agaricomycetes (C. cinerea, L. bicolor, and P. chrysosporium common ancestor) is 317 Mya, and for the origin of the Agaricomycotina (Agaricomycetes and C. neoformans common ancestor) is 515 Mya. U. maydis and Sporobolomyces roseus were included as additional outgroups in the Basidiomycetes. These time estimates are still useful comparison points for relative ages of fungi despite suffering from too little fossil data and incomplete sampling of lineages in the Basidiomycetes and outgroups. More detailed studies of fungal divergence times will help to establish robust date estimates.


Synteny. Syntenic regions were identified between C. cinerea and L. bicolor using FISH (for “Fast Identification of Segmental Homology”) (60) based on BLASTP searches with a cutoff of 1E-5. FISH was run with default parameters, except that we required the minimal block to contain at least four anchors (Dataset S1, Table S12a). Initially, we observed 14 blocks with more than 15 anchors in each (Dataset S1, Table S12b). However, in four cases (S1.1, S2.1, S5.2, and S5.3), two blocks with more than 15 anchors were nearly adjacent in the genome and were treated as a single block for the GO analysis. GO analysis was as described above.
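The selection of large synteny blocks and the merging of nearly adjacent ones can be illustrated with a short sketch. The `Block` record and the `max_gap` parameter are hypothetical: the text does not give a numeric definition of "nearly adjacent", so the gap must be supplied by the user. Summing anchor counts on merging assumes that the merged blocks share no anchors.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """Hypothetical record for one synteny block on a C. cinerea chromosome."""
    chrom: str
    start: int     # block start (bp)
    end: int       # block end (bp)
    anchors: int   # number of anchor gene pairs in the block


def large_blocks(blocks: List[Block], more_than: int = 15) -> List[Block]:
    """Blocks with more than `more_than` anchors (15 in the text)."""
    return [b for b in blocks if b.anchors > more_than]


def merge_nearby(blocks: List[Block], max_gap: int) -> List[Block]:
    """Merge blocks on the same chromosome separated by at most max_gap bp.

    max_gap is a user-chosen value, not one taken from the study.
    """
    merged: List[Block] = []
    for block in sorted(blocks, key=lambda b: (b.chrom, b.start)):
        last = merged[-1] if merged else None
        if last is not None and last.chrom == block.chrom and block.start - last.end <= max_gap:
            merged[-1] = Block(
                last.chrom,
                last.start,
                max(last.end, block.end),
                last.anchors + block.anchors,
            )
        else:
            merged.append(block)
    return merged


if __name__ == "__main__":
    # Coordinates and anchor counts are made up for the example.
    blocks = [
        Block("I", 100, 200, 20),
        Block("I", 250, 400, 18),
        Block("I", 5000, 6000, 16),
        Block("II", 100, 300, 10),
    ]
    print(merge_nearby(large_blocks(blocks), max_gap=100))
```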

1. Batzoglou S, et al. (2002) ARACHNE: A whole-genome shotgun assembler. Genome Res 12:177–189.

2. Gordon D, Abajian C, Green P (1998) Consed: A graphical tool for sequence finishing. Genome Res 8:195–202.

3. Li W, Rehmeyer CJ, Staben C, Farman ML (2005) TERMINUS—Telomeric End-Read Mining IN Unassembled Sequences. Bioinformatics 21:1695–1698.

4. Muraguchi H, Kamada T, Yanagi SO (2005) Construction of a bacterial artificial chromosome (BAC) library of Coprinus cinereus. Mycoscience 46:49–53.

5. Marra MA, et al. (1997) High throughput fingerprint analysis of large-insert clones. Genome Res 7:1072–1084.

6. Soderlund C, Humphray S, Dunham A, French L (2000) Contigs built with fingerprints, markers, and FPC V4.7. Genome Res 10:1772–1787.

7. Crowe ML, Rana D, Fraser F, Bancroft I, Trick M (2002) BACFinder: Genomic localisation of large insert genomic clones based on restriction fingerprinting. Nucleic Acids Res 30:e118.

8. Smith AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2010. http://repeatmasker.org.

9. Lander ES, et al. (1987) MAPMAKER: An interactive computer package for constructing primary genetic linkage maps of experimental and natural populations. Genomics 1:174–181.

10. Muraguchi H, Ito Y, Kamada T, Yanagi SO (2003) A linkage map of the basidiomycete Coprinus cinereus based on random amplified polymorphic DNAs and restriction fragment length polymorphisms. Fungal Genet Biol 40:93–102.

11. Edgar RC, Myers EW (2005) PILER: Identification and classification of genomic repeats. Bioinformatics 21(Suppl 1):i152–i158.

12. Price AL, Jones NC, Pevzner PA (2005) De novo identification of repeat families in large genomes. Bioinformatics 21(Suppl 1):i351–i358.

13. Smith CD, et al. (2007) Improved repeat identification and masking in Dipterans. Gene 389:1–9.

14. Gish W, States DJ (1993) Identification of protein coding regions by database similarity search. Nat Genet 3:266–272.

15. Korf I (2004) Gene finding in novel genomes. BMC Bioinformatics 5:59.

16. Stanke M, Waack S (2003) Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics 19(Suppl 2):ii215–ii225.

17. Korf I, Flicek P, Duan D, Brent MR (2001) Integrating genomic homology into gene structure prediction. Bioinformatics 17(Suppl 1):S140–S148.

18. Majoros WH, Pertea M, Salzberg SL (2004) TigrScan and GlimmerHMM: Two open source ab initio eukaryotic gene-finders. Bioinformatics 20:2878–2879.

19. Birney E, Clamp M, Durbin R (2004) GeneWise and genomewise. Genome Res 14:988–995.

20. Elsik CG, et al. (2007) Creating a honey bee consensus gene set. Genome Biol 8:R13.

21. Ter-Hovhannisyan V, Lomsadze A, Chernoff YO, Borodovsky M (2008) Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training. Genome Res 18:1979–1990.

22. Solovyev V, Salamov A (1997) The Gene-Finder computer tools for analysis of human and model organisms genome sequences. Proc Int Conf Intell Syst Mol Biol 5:294–302.

23. Guigó R, Knudsen S, Drake N, Smith T (1992) Prediction of gene structure. J Mol Biol 226:141–157.

24. Altschul SF, et al. (1997) Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res 25:3389–3402.

25. Finn RD, et al. (2008) The Pfam protein families database. Nucleic Acids Res 36(Database issue):D281–D288.

26. Lagesen K, et al. (2007) RNAmmer: Consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res 35:3100–3108.

27. Gardner PP, et al. (2009) Rfam: Updates to the RNA families database. Nucleic Acids Res 37(Database issue):D136–D140.

28. Rao PS, Niederpruem DJ (1969) Carbohydrate metabolism during morphogenesis of Coprinus lagopus (sensu Buller). J Bacteriol 100:1222–1228.

29. Moore D, Pukkila PJ (1985) Coprinus cinereus: An ideal organism for studies of genetics and developmental biology. J Biol Educ 19:31–40.

30. Zhang Z, Dietrich FS (2005) Mapping of transcription start sites in Saccharomyces cerevisiae using 5′ SAGE. Nucleic Acids Res 33:2838–2851.

31. Martin F, et al. (2008) The genome of Laccaria bicolor provides insights into mycorrhizal symbiosis. Nature 452:88–92.

32. Martinez D, et al. (2004) Genome sequence of the lignocellulose degrading fungus Phanerochaete chrysosporium strain RP78. Nat Biotechnol 22:695–700.

33. Vanden Wymelenberg A, et al. (2006) Computational analysis of the Phanerochaete chrysosporium v2.0 genome database and mass spectrometry identification of peptides in ligninolytic cultures reveal complex mixtures of secreted proteins. Fungal Genet Biol 43:343–356.

34. Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30:1575–1584.

35. Eddy SR. HMMER, version 2.3.2. 1995–2010. http://hmmer.janelia.org/.

36. Eddy SR (1998) Profile hidden Markov models. Bioinformatics 14:755–763.

37. Goldberg JM, et al. (2006) The Dictyostelium kinome—Analysis of the protein kinases from a simple model organism. PLoS Genet 2:e38.

38. Hanks SK, Hunter T (1995) Protein kinases 6. The eukaryotic protein kinase superfamily: Kinase (catalytic) domain structure and classification. FASEB J 9:576–596.

39. Li L, Stoeckert CJ, Jr, Roos DS (2003) OrthoMCL: Identification of ortholog groups for eukaryotic genomes. Genome Res 13:2178–2189.

40. Saitou N, Nei M (1987) The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406–425.

41. Dai J, Cheng J (2008) HMMEditor: A visual editing tool for profile hidden Markov model. BMC Genomics 9(Suppl 1):S8.

42. Zheng J, et al. (1993) 2.2 Å refined crystal structure of the catalytic subunit of cAMP-dependent protein kinase complexed with MnATP and a peptide inhibitor. Acta Crystallogr D Biol Crystallogr 49:362–365.

43. Rajashekar B, et al. (2009) Expansion of signal pathways in the ectomycorrhizal fungus Laccaria bicolor—evolution of nucleotide sequences and expression patterns in families of protein kinases and RAS small GTPases. New Phytol 183:365–379.

44. Doddapaneni H, Chakraborty R, Yadav JS (2005) Genome-wide structural and evolutionary analysis of the P450 monooxygenase genes (P450ome) in the white rot fungus Phanerochaete chrysosporium: Evidence for gene duplications and extensive gene clustering. BMC Genomics 6:92.

45. Ronquist F, Huelsenbeck JP (2003) MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19:1572–1574.

46. Kämper J, et al. (2006) Insights from the genome of the biotrophic fungal plant pathogen Ustilago maydis. Nature 444:97–101.

47. Loftus BJ, et al. (2005) The genome of the basidiomycetous yeast and human pathogen Cryptococcus neoformans. Science 307:1321–1324.

48. Hosack DA, Dennis G, Jr, Sherman BT, Lane HC, Lempicki RA (2003) Identifying biological themes within lists of genes with EASE. Genome Biol 4:R70.

49. Saeed AI, et al. (2003) TM4: A free, open-source system for microarray data management and analysis. Biotechniques 34:374–378.

50. Fisher R (1930) The Genetical Theory of Natural Selection (Clarendon, Oxford, UK).

51. Kimura M (1956) A model of a genetic system which leads to closer linkage by natural selection. Evolution 10:278–287.

52. Nei M (1967) Modification of linkage intensity by natural selection. Genetics 57:625–641.

53. Pál C, Hurst LD (2003) Evidence for co-evolution of gene order and recombination rate. Nat Genet 33:392–395.

54. Wong S, Wolfe KH (2005) Birth of a metabolic gene cluster in yeast by adaptive gene relocation. Nat Genet 37:777–782.

55. Swamy S, Uno I, Ishikawa T (1984) Morphogenic effects of mutations at the A and B incompatibility factors in Coprinus cinereus. J Gen Microbiol 130:3219–3224.

56. Tusher VG, Tibshirani R, Chu G (2001) Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA 98:5116–5121.

57. Sanderson MJ (2003) r8s: Inferring absolute rates of molecular evolution and divergence times in the absence of a molecular clock. Bioinformatics 19:301–302.

58. James TY, et al. (2006) Reconstructing the early evolution of Fungi using a six-gene phylogeny. Nature 443:818–822.

59. Taylor JW, Berbee ML (2006) Dating divergences in the Fungal Tree of Life: Review and new analyses. Mycologia 98:838–849.

60. Calabrese PP, Chakravarty S, Vision TJ (2003) Fast identification and statistical evaluation of segmental homologies in comparative maps. Bioinformatics 19(Suppl 1):i74–i80.


Fig. S1. Summary plots of the 13 chromosomes of C. cinerea. Each plot shows the location of (Top panel) telomeres (if identified) in red, and centromere as a black oval; (Second panel) the density of transposable elements (brown); (Third panel) tRNA genes (light green); (Fourth panel) recombination rates (the position of the SSR markers is indicated by vertical black bars, and white is unmapped; red is high recombination; gray is average recombination; blue is low recombination); (Fifth panel) the density of all genes (orange); (Sixth panel) the density of orphan genes (light orange); (Seventh panel) the density of orthologous genes (blue); (Eighth panel) the density of paralogous genes (red); (Ninth panel) similarity of paralog families represented as 1/dS (synonymous substitution rate); (10th panel) syntenic regions (all regions of synteny between C. cinerea and L. bicolor are indicated in green, and blocks with >15 anchors are indicated in dark green). Vertical scales are defined for each bar in the bar title. Horizontal scale is Mb (chromosome XIII is 0.982 Mb).

Other Supporting Information Files

Dataset S1 (XLS)

Fig. S2. Chromosomal distribution of protein kinases. Regions of high recombination (red), low recombination (blue), and extensive synteny (light gray) as inFig. S1. Green bars indicate protein kinases from widely conserved groups. Orange bars indicate FunK1 protein kinases.

Fig. S2.

Fig. S3. Phylogenetic analysis of the MAPK genes across nine fungi. Arrows indicate Basidiomycete-specific duplications. The prefixes of genes in the treeindicate species: AFUA, Aspergillus fumigatus; BDEG, Batrachochytrium dendrobatidis; CC, C. cinerea; CNAG, C. neoformans serotype A; LACBI, L. bicolor;SCCOM, Schizophyllum commune; SP, Schizosaccharomyces pombe; UM, U. maydis; Y, S. cerevisiae. Numbers at the internal nodes represent the posteriorprobability for the clade from MrBayes.

Fig. S3.

Fig. S4. Phylogenetic analysis of TRIBE-MCL cluster 75 containing a subfamily of the P450 gene family. The tree shows that both local duplication (stars andflorets indicate sets of adjacent gene pairs) and independent lineage-specific duplications have shaped the evolution of this family. Numbers at the internalnodes represent the posterior probability for the clade from MrBayes with thickened branches scaled to indicate more significant support.

Fig. S5. Phylogenetic analysis of the hydrophobin gene family. The MrBayes-computed tree depicts the hydrophobin domain containing genes in C. cinerea (prefixed by CC1G), L. bicolor (prefixed by Lbic), and P. chrysosporium (prefixed by Pchr) rooted with the U. maydis hydrophobin gene UM05010. Stars, crosses, and florets indicate clusters of adjacent genes (separated at most by one gene), with the two sets found on chromosome X indicated by a labeled vertical bar. Numbers at the internal nodes represent the posterior probability for the clade from MrBayes with thickened branches scaled to indicate more significant support.
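The adjacency criterion in this caption (family members separated by at most one gene) can be made concrete with a short sketch. This is an illustration under stated assumptions, not the procedure used in the paper: each family member is represented by a hypothetical ordinal position in the gene order of one chromosome, and the function name and example positions are invented.

```python
def adjacent_clusters(family_positions, max_intervening=1):
    """Group family members lying within `max_intervening` genes of their nearest neighbor.

    `family_positions` holds ordinal gene-order indices on one chromosome
    (an assumed representation). Singletons are not reported as clusters.
    """
    clusters, current = [], []
    for pos in sorted(family_positions):
        # A gap of more than max_intervening + 1 in gene order closes the current group.
        if current and pos - current[-1] > max_intervening + 1:
            if len(current) > 1:
                clusters.append(current)
            current = []
        current.append(pos)
    if len(current) > 1:
        clusters.append(current)
    return clusters


# Hypothetical gene-order indices of family members on one chromosome.
example = adjacent_clusters([10, 11, 13, 20, 40, 42])
```

With these example positions, genes 10, 11, and 13 form one cluster and genes 40 and 42 another, while gene 20 stands alone.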

Fig. S6. Structure of the A and B mating type loci. (A) The A locus. (B) The B locus. In both A and B, conserved sequence regions are indicated by a bold line; unique sequences of the Aα, Aβ, and B subloci are indicated by a thin line. Exons are shown in light blue. The 5′ UTR regions are shown in green, and the 3′ UTR regions are shown in red. Numbers indicate gene start and stop positions. Direction of transcription and gene names are indicated below the line. Paralogous duplications are shown in the same color.

Stajich et al. www.pnas.org/cgi/content/short/1003391107 6 of 6

[Fig. S1 image: one plot per chromosome (I to XIII). Panel titles, top to bottom, with the vertical scale taken from each title: Telomere and Centromere location; Transposable elements / 50 kb (0 to 37); tRNA genes / 50 kb (0 to 7); Recombination rate; Genes / 50 kb (0 to 32); Orphan genes / genes / 50 kb (0 to 1); Orthologous genes / genes / 50 kb (0 to 1); Paralogous genes / genes / 50 kb (0 to 1); Paralogs 1/dS (0 to 1); FISH synteny blocks. Horizontal axis: position in Mb.]

[Fig. S2 image. Horizontal axis: Chromosome (I to XIII). Vertical axis: Position (Mb), with ticks from 0 to 5.]

[Fig. S3 image: phylogenetic tree of MAPK genes with a scale bar labeled 0.2. The C. cinerea tips are CC1G_07620T0, CC1G_00577T0, CC1G_14865T0, CC1G_03633T0, and CC1G_02007T0. Tree topology and the placement of node support values are not recoverable from the extracted text.]

[Fig. S4 image: phylogenetic tree of the P450 subfamily with a scale bar labeled 0.2. Tips are genes from C. cinerea (CC1G_), L. bicolor (Lbic_), and P. chrysosporium (Pchr_). Tree topology, node support values, and marker placement are not recoverable from the extracted text.]

[Fig. S5 image: phylogenetic tree of hydrophobin genes with a scale bar labeled 0.2. Tips are genes from C. cinerea (CC1G_), L. bicolor (Lbic_), and P. chrysosporium (Pchr_), plus the U. maydis gene UM05010; a vertical bar is labeled "Chromosome X". Tree topology, node support values, and marker placement are not recoverable from the extracted text.]

[Fig. S6 image: gene maps of the A and B mating type loci (panels A and B). Labels recoverable from the figure include, for the B locus, pheromone genes (phb1, phb2.1 to phb2.3, phb3.1 to phb3.3) and receptor genes (rcb1, rcb2, rcb3), and for the A locus, the Aα sublocus (a gene pair: HD1 gene a1, HD2 gene a2), the Aβ sublocus (b and d gene pairs: HD1 genes b1 and d1, HD2 gene b2), mip1, and β-fg. Individual gene coordinates are not reliably recoverable from the extracted text.]