Infection, Genetics and Evolution 9 (2009) 493–500

Distribution of mutations distinguishing the most prevalent disease-causing Candida albicans genotype from other genotypes§

Ningxin Zhang a, Jenine E. Upritchard b, Barbara R. Holland c, Lauren E. Fenton b,1, Martin M. Ferguson d, Richard D. Cannon b, Jan Schmid a,*

a Institute for Molecular Biosciences, Massey University, Palmerston North, New Zealand

b Department of Oral Sciences, University of Otago, Dunedin, New Zealand

c Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North, New Zealand

d Department of Oral Diagnostic and Surgical Sciences, University of Otago, Dunedin, New Zealand

ARTICLE INFO

Article history:

Received 29 October 2008

Received in revised form 15 January 2009

Accepted 16 January 2009

Available online 30 January 2009

Keywords:

Candida albicans

General-purpose genotype

Fitness

Genomic comparisons

Prevalence

ABSTRACT

Candida albicans is a major opportunistic pathogen of humans. Previous work has demonstrated the existence of a general-purpose genotype (GPG; equivalent to clade 1 as defined by multi-locus sequence typing data) that is more frequent than other genotypes as an agent of human disease and commensal colonization. We undertook a genomic screen which indicated that a large number of mutations differentiate GPG strains from other strains and that such mutations are scattered throughout the genome. GPG-specific mutations are non-synonymous more frequently than expected by chance, and are not randomly distributed across functional and structural gene categories. Our analysis has identified three categories of genes in which GPG-specific mutations are over-represented, namely genes for which expression changes during the yeast-hyphal transition, genes for which expression changes as a result of exposure to antifungal agents and repeat-containing ORFs. Although we have no direct evidence that the individual polymorphisms identified confer selective advantages to GPG strains, the results support our contention that the high prevalence of GPG strains is not merely due to genetic drift but that GPG strains have reached a high prevalence because they possess a multitude of fitness-enhancing traits. They also indicate that the distribution of genes marked by GPG-specific mutations across functional and structural categories could identify physiological traits that are of particular importance to the success of GPG strains in their interactions with the human host.

© 2009 Elsevier B.V. All rights reserved.

journal homepage: www.elsevier.com/locate/meegid

§ Nucleotide sequence data reported in this paper are available in GenBank, EMBL and DDBJ databases under accession numbers DQ465825–DQ465906.

* Corresponding author. Tel.: +64 6 350 5171; fax: +64 6 350 5688. E-mail address: [email protected] (J. Schmid).

1 Present address: ESR, Christchurch Science Centre, Christchurch, New Zealand.

1567-1348/$ – see front matter © 2009 Elsevier B.V. All rights reserved.

doi:10.1016/j.meegid.2009.01.007

1. Introduction

The dimorphic fungus Candida albicans is a major opportunistic pathogen of humans that causes a wide variety of superficial infections (vaginitis, oral thrush), and life-threatening systemic and disseminated diseases (Odds, 1988; Sternberg, 1994; Morgan, 2005). Using DNA fingerprinting we had earlier identified a cluster of strains of very similar genotypes, comprising 37% of 266 infection-causing isolates in a global collection (Schmid et al., 1999). The remaining strains fell into 37 groups of equally similar genotypes and the prevalence of each of these groups among epidemiologically unrelated isolates ranged between 0.3 and 4.5%. Overall, the large cluster thus had a prevalence as an etiological agent of disease 10–100 times higher than each of the remaining groups. This increased prevalence was also observed when strains from different geographical regions, patient types and different types of infection were analyzed separately. In addition, strains belonging to the cluster were more prevalent as commensal colonizers than any other group. There is also indirect evidence that strains belonging to the cluster may be able to replace other strains in compromised patients (Schmid et al., 1999). Subsequent studies have confirmed these observations. The cluster identified by us is largely identical to clade 1 as defined by MLST typing (Odds et al., 2007) which in turn largely overlaps with the largest groups of genetically similar isolates found by other methodologies in other sets of isolates (Lott et al., 1999; Soll and Pujol, 2003; Odds et al., 2007). In the largest study of this kind, Odds et al. (2007) found clade 1 to be the predominant clade in all geographical regions, to cause all types of infections more often than any other clade and to be the most frequently encountered clade among commensal isolates.

On the basis of our results we proposed that this large cluster constitutes a general-purpose genotype (GPG) (Schmid et al., 1999), owing its ubiquitous high prevalence to the possession of traits which confer selective advantages over other strains (Schmid et al., 1995, 1999; Giblin et al., 2001). Others have also suggested that mutations specific to these strains may confer selective advantages (Lott and Effat, 2001). It is, however, also possible that the high prevalence of GPG strains is merely a result of bottlenecks and genetic drift.

If fitness-enhancing mutations rather than genetic drift explain the high prevalence of GPG strains, genomic comparisons between GPG and other strains may reveal the reason for their success and the success of C. albicans in the human host. However, since GPG strains can be distinguished from other strains by a variety of genetic markers and appear to be successful under a wide variety of circumstances, the total number of distinguishing mutations and the number of fitness-enhancing mutations could be large, with each of the latter only making a small contribution to fitness. This would make it difficult to identify the contribution of individual mutations to fitness. However, if some biological attributes contribute more to the success of GPG strains than others, advantageous mutations affecting these functions should be more likely to become fixed because they can confer larger fitness benefits than mutations affecting functions less important to successful interaction with the host. From such non-random distribution of mutations it may thus be possible to infer which physiological traits are of particular importance in C. albicans’ interaction with the human host.

We undertook the current study to estimate the extent of genetic differences between GPG strains and other strains, to determine their distribution across the genome and to determine if their distribution, in terms of functional and structural categories of genes affected, is non-random.

2. Materials and methods

2.1. Characterization of polymorphisms

Two partially overlapping sets of strains (Supplementary Table A), selected in a previous study (Zhang et al., 2003), were screened. Set 1 best represented features that distinguished GPG strains and non-GPG strains. Set 2 best represented the entire spectrum of GPG cluster strains and the entire spectrum of non-GPG strains, respectively, including strains of uncertain affiliation (Zhang et al., 2003). AFLP (amplified fragment length polymorphism) was carried out as described previously (Zhang et al., 2003) using MseI linkers only or MseI/EcoRI linker combinations (see Supplementary Table B for primer sequences). If, in at least one set of tester strains, the frequency of an AFLP product in GPG strains minus its frequency in other strains was less than −0.75 or larger than 0.75 it was characterized further. Such AFLP products were sequenced as described previously (Zhang et al., 2003). Sequence differences between GPG strains and other strains causing AFLPs were identified by amplifying the AFLP-containing regions using primers corresponding to flanking sequences (for each AFLP, 2–27 different regions from at least two different strains were sequenced; Supplementary Figure A). Some AFLP products specific to GPG strains and some AFLP products specific to non-GPG strains contained a restriction site which should have been cut by the restriction enzyme used for AFLP. Since these products were consistently produced by strains belonging to one group of strains and consistently absent in the other, this was interpreted as an indication of GPG-specific methylation differences at that site.
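The selection rule for AFLP products (frequency in GPG strains minus frequency in other strains below −0.75 or above 0.75) can be illustrated with a minimal sketch. The function name, the 0/1 presence encoding and the example calls are assumptions made for illustration; they are not taken from the study.

```python
def frequency_difference(gpg_calls, other_calls, threshold=0.75):
    """Return (difference, is_group_specific) for one AFLP product.

    gpg_calls / other_calls: sequences of 0/1 presence calls, one per strain
    (hypothetical encoding). The threshold of 0.75 is the one quoted in the text.
    """
    freq_gpg = sum(gpg_calls) / len(gpg_calls)
    freq_other = sum(other_calls) / len(other_calls)
    diff = freq_gpg - freq_other
    # A product qualifies when the difference is below -0.75 or above 0.75.
    return diff, abs(diff) > threshold


# Hypothetical product present in 9 of 10 GPG strains and in no other strain.
example_diff, example_flag = frequency_difference([1] * 9 + [0], [0] * 10)
```

Under the rule described above, a product would be followed up if this flag is true in at least one of the two tester sets.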

2.2. Reverse transcriptase-PCR on RNA samples from oral lesions

The lesions of HIV-negative patients presenting with oral candidiasis at University of Otago School of Dentistry clinics were wiped with a sterile swab, total RNA was isolated and reverse transcriptase-PCR was carried out, in duplicate, as described previously (Zhang et al., 2003). RNA was DNase treated and samples without reverse transcription were PCR amplified as a control to ensure that RNA samples were not contaminated with DNA.

2.3. Analysis of distribution of polymorphisms throughout the genome

To assess if polymorphisms were over-represented on particular chromosomes, 1000 simulations were carried out, in which 23 markers were randomly distributed throughout the genome, divided into chromosomes (sizes were obtained from http://candida.bri.nrc.ca/candida/index.cfm) and the number of markers on each chromosome was counted. The chi-squared statistic was calculated for the observed polymorphism distribution and for each of the 1000 simulations (a randomization test was used rather than simply looking up a table of critical values as some of the expected values were <5). It was then determined if the number of polymorphisms observed for each chromosome was higher than the number of markers on this chromosome in 950 simulations. A similar approach was used to detect if polymorphisms were associated with particular regions of a chromosome more often than others. A thousand simulations were carried out in which a number of markers, equivalent to the number of polymorphisms found on a given chromosome, were randomly distributed across that chromosome. It was then assessed if the observed distribution was more extreme than 950 simulated distributions, in terms of the smallest distance between polymorphisms, the sum of the two smallest distances between polymorphisms, or the variance of the distances between polymorphisms. It was also tested if the number of polymorphisms that overlapped with particular regions, such as telomeres, was greater than that expected by chance.
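The chromosome-level randomization test described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the observed counts are tallied from the chromosomal locations listed in Table 1, whereas the chromosome sizes are placeholder values (the study took sizes from the database cited above) and the function names are hypothetical.

```python
import random


def chi_squared(counts, expected):
    """Pearson chi-squared statistic for observed vs expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(counts, expected))


def simulate_counts(sizes, n_markers, rng):
    """Drop n_markers uniformly on the genome; count markers per chromosome."""
    chosen = rng.choices(range(len(sizes)), weights=sizes, k=n_markers)
    counts = [0] * len(sizes)
    for index in chosen:
        counts[index] += 1
    return counts


def randomization_p_value(observed, sizes, n_sims=1000, seed=0):
    """Fraction of simulations whose chi-squared is at least the observed one."""
    rng = random.Random(seed)
    n_markers = sum(observed)
    total = sum(sizes)
    expected = [n_markers * size / total for size in sizes]
    observed_stat = chi_squared(observed, expected)
    hits = sum(
        chi_squared(simulate_counts(sizes, n_markers, rng), expected) >= observed_stat
        for _ in range(n_sims)
    )
    return hits / n_sims


# Polymorphisms per chromosome (1-7, then R), tallied from Table 1.
OBSERVED = [3, 5, 4, 3, 2, 1, 0, 5]
# Placeholder relative sizes, NOT the values used in the study.
PLACEHOLDER_SIZES = [3.2, 2.2, 1.8, 1.6, 1.2, 1.0, 0.9, 2.3]
p_value = randomization_p_value(OBSERVED, PLACEHOLDER_SIZES)
```

Comparing simulated statistics with the observed one avoids relying on the chi-squared table, which is the reason the text gives for using a randomization test when expected counts fall below 5.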

2.4. Statistical analysis of frequency of non-synonymous mutations

The proportion of non-synonymous changes in the GPG-specific mutations was determined using the pS and pN values defined by Nei and Gojobori (1986). The pN/pS ratio was used as the test statistic (note that using dN and dS is not appropriate here, as it is not possible to correct for multiple changes on the basis of a single codon difference). In two cases, two different non-GPG codons had replaced the GPG codon; in these cases the average of the values of the two transitions was used. To see if the observed value of the pN/pS ratio was unusually high, it was compared with values obtained for random sets of codon differences from sequences among fastPHASE (Scheet and Stephens, 2006)-derived haplotypes in the C. albicans MLST database (http://test1.mlst.net/). The test statistic was computed for these codon differences, and it was determined how often values were obtained which were higher than those observed for GPG-specific codon differences.
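As a rough illustration of how pN and pS can be computed for a set of codon differences in the manner of Nei and Gojobori (1986), consider the sketch below. It rests on several assumptions that are not stated in the text: changes to stop codons are counted as non-synonymous when counting sites, pathways through stop codons are skipped, CUG is decoded as serine (as in C. albicans), and all function names and example codons are hypothetical.

```python
from itertools import permutations, product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
CODE["CTG"] = "S"  # assumption: C. albicans decodes CUG as serine, not leucine


def syn_sites(codon):
    """Number of synonymous sites s in a codon (non-synonymous sites = 3 - s)."""
    s = 0.0
    for i in range(3):
        for base in BASES:
            if base != codon[i]:
                mutant = codon[:i] + base + codon[i + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1 / 3
    return s


def count_differences(c1, c2):
    """Synonymous and non-synonymous differences, averaged over shortest paths."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    paths = []
    for order in permutations(positions):
        current, sd, nd, ok = c1, 0, 0, True
        for i in order:
            nxt = current[:i] + c2[i] + current[i + 1:]
            if CODE[nxt] == "*":  # skip pathways that pass through a stop codon
                ok = False
                break
            if CODE[nxt] == CODE[current]:
                sd += 1
            else:
                nd += 1
            current = nxt
        if ok:
            paths.append((sd, nd))
    if not paths:
        return 0.0, 0.0
    return (sum(p[0] for p in paths) / len(paths),
            sum(p[1] for p in paths) / len(paths))


def pn_ps(codon_pairs):
    """Return (pN, pS) pooled over a list of (codon, codon) pairs."""
    syn_total = non_total = syn_diff = non_diff = 0.0
    for c1, c2 in codon_pairs:
        s = (syn_sites(c1) + syn_sites(c2)) / 2
        syn_total += s
        non_total += 3 - s
        sd, nd = count_differences(c1, c2)
        syn_diff += sd
        non_diff += nd
    p_s = syn_diff / syn_total if syn_total else float("nan")
    p_n = non_diff / non_total if non_total else float("nan")
    return p_n, p_s


# Hypothetical codon pairs, one synonymous and one non-synonymous difference.
example_pn, example_ps = pn_ps([("TTT", "TTC"), ("AAA", "GAA")])
```

Because only codons that differ enter the calculation, the individual proportions are not meaningful on their own; as in the text, it is the pN/pS ratio compared against random sets of codon differences that serves as the test statistic.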

2.5. Analysis of distribution of polymorphism-marked ORFs between gene categories

Fig. 1. Distribution of AFLP products between GPG strains and other strains. The presence or absence of 132 individual AFLP products obtained with 6 different primer sets was scored in each of the 42 strains in set 1 and the frequency of each product was calculated for GPG cluster strains and other strains. The size of the circles in the figure indicates the number of fragments represented by the circle (n = 1 to n = 56). Shaded areas: the frequency of an AFLP product in cluster strains minus its frequency in non-cluster strains is either less than −0.75 or greater than 0.75.

To assess over-representation of particular types of genes among those marked by GPG-specific polymorphisms using the GO annotation, initially it was determined which of the genes adjacent to GPG-specific polymorphisms had been assigned GO terms. These reduced sets were then used to query the GO term finder at the C. albicans genome database (http://www.candidagenome.org/) for significant shared GO terms, using the default settings. To assess if GPG polymorphism-labelled ORFs were represented across all screens or in individual screens shown in Fig. 4 more often than expected by chance a randomization test was used, because there was overlap in terms of biological functions between the screens, and a Bonferroni correction was inappropriate (inflated type II error if multiple comparisons are not independent). Ten thousand sets of 22 ORFs were chosen at random (each set was chosen without replacement) from a list of all C. albicans ORFs obtained from the C. albicans genome database (http://www.candidagenome.org/). It was then determined how often these sets were over-represented across all screens and how many random sets were represented to the same degree or to a higher degree than GPG polymorphism-labelled ORFs in individual screens. This was used to calculate a one-tailed mid-point p-value for each screen. A one-sided test was used because only statistically significant over-representation was to be assessed; under-representation would be difficult or impossible to demonstrate for many of the screens. Mid-point p-values were considered the most appropriate as the test statistics are integers and only vary over a small range. The selection of screens used in Fig. 4 was based on a Medline search for screens available at the time the analysis was carried out. The statistical analysis was carried out after the panel of screens had been established, and no screens were added to, or withdrawn from, the panel after the statistical analysis. Three GPG-specific AFLPs apparently caused by methylation were included in these analyses, because GPG-specific methylation of a gene indicates GPG-specific differences in its expression (van Driel et al., 2003) and thus GPG-specific differences in the function of the gene.
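The per-screen randomization test with a one-tailed mid-point p-value can be sketched as below. The sketch assumes the usual definition of a mid-p value, P(X > observed) + 0.5 × P(X = observed), and uses hypothetical ORF names and a hypothetical screen; only the set size of 22 and the 10,000 random sets come from the text.

```python
import random


def mid_p_overrepresentation(marked, screen_hits, all_orfs, n_sets=10000, seed=0):
    """One-tailed mid-point p-value for over-representation in one screen.

    marked: ORFs marked by polymorphisms; screen_hits: set of ORFs reported
    by the screen; all_orfs: list of every ORF in the genome.
    """
    rng = random.Random(seed)
    observed = sum(orf in screen_hits for orf in marked)
    greater = equal = 0
    for _ in range(n_sets):
        # Each random set is drawn without replacement, as in the text.
        sample = rng.sample(all_orfs, len(marked))
        overlap = sum(orf in screen_hits for orf in sample)
        if overlap > observed:
            greater += 1
        elif overlap == observed:
            equal += 1
    return (greater + 0.5 * equal) / n_sets


# Hypothetical genome of 6000 ORFs and a screen reporting 300 of them.
ALL_ORFS = [f"orf{i}" for i in range(6000)]
SCREEN = set(ALL_ORFS[:300])
MARKED = ALL_ORFS[:5] + ALL_ORFS[1000:1017]  # 22 ORFs, 5 of them in the screen
example_mid_p = mid_p_overrepresentation(MARKED, SCREEN, ALL_ORFS, n_sets=2000)
```

Counting ties at half weight is what makes the mid-point p-value less conservative than the ordinary one-tailed value when the overlap can only take a few integer values.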

The sources of the data used for the analysis in Fig. 4 were: stress response (Enjalbert et al., 2003) (microarrays; ‘‘three stressors’’ refers to altered expression in response to heat, osmotic and oxidative stress); dimorphism A (Nantel et al., 2002); dimorphism B (Kadosh and Johnson, 2005) (expression altered during hypha formation; microarrays); dimorphism C (Uhl et al., 2003) (genes required for hypha formation; haploinsufficiency screen); response to blood (Fradin et al., 2003) (cDNA subtraction); pH-dependent expression (Bensen et al., 2004) (microarrays); response to a mating factor (Bennett et al., 2003) (microarrays); response to adhesion to polystyrene (Marchais et al., 2005) (microarrays); gene expression in biofilms (Garcia-Sanchez et al., 2004) (microarrays); response to ≥1 of several antifungals (Liu et al., 2005) (microarrays); repeat-containing ORFs (Braun et al., 2005) (whole-genome analysis). For several arrays (Bensen et al., 2004; Garcia-Sanchez et al., 2004; Kadosh and Johnson, 2005; Liu et al., 2005; Marchais et al., 2005) the number of polymorphism-marked ORFs included in the array was estimated based on the size of the array, relative to the total number of ORFs in the genome (Braun et al., 2005).

3. Results

3.1. Mutations causing 23 GPG-specific AFLPs are randomly distributed through the genome

We screened for GPG-specific AFLPs among 41 GPG and 23 non-GPG infection-causing isolates (described in Supplementary Table A), from our worldwide strain collection (Schmid et al., 1999). The isolates were selected to represent optimally the genetic differences between GPG strains and other strains, as well as the entire spectrum of strains belonging to both groups (see Zhang et al. (2003) for more detailed explanation of selection rationale and method). Two borderline strains with uncertain affiliation were also included in the screen.

We found no AFLP products that were present in all GPG strains while absent in all other strains or vice versa (Fig. 1); most strains had at least one marker not typical of the group to which they belonged (Supplementary Table A). This is expected because C. albicans’ population structure is not exclusively clonal (Tibayrenc, 1997; Lott and Effat, 2001; Fundyga et al., 2002; Holland et al., 2002; Odds et al., 2007). Twenty-three AFLP products (approximately 10% of all products that gave strong bands in acrylamide gels), however, were highly specific to either group (frequency in GPG strains minus frequency in other strains less than −0.75 or greater than 0.75) and were characterized further by sequencing and mapping.

Sequencing of the AFLP products and surrounding regions in multiple strains enabled us to identify and map the mutations that caused the 23 AFLPs (Table 1; Fig. 2a). Polymorphisms were distributed among chromosomes as expected based on chromosome size. On individual chromosomes, polymorphisms did not cluster in the vicinity of telomeres or in any other regions. Polymorphisms were not preferentially associated with the highly heterozygous regions defined by Jones et al. (2004); actually none were located in these regions. The GC content of 200 bp regions surrounding the mutations that caused the GPG-specific polymorphisms was within the range of GC contents of randomly chosen areas elsewhere in the genome, an indication that they were not acquired by horizontal gene transfer. Because the repetitive element MU13-4 (Giblin et al., 2001) was associated with one polymorphism, P17, we analyzed the 200 bp regions surrounding each polymorphism and found that no other polymorphisms had been generated through MU13-4 or a limited number of other mobile repetitive elements.

In summary, mutations appeared randomly distributed through the genome and to have arisen independently.

3.2. Genes marked by GPG-specific polymorphisms

Through sequencing the DNA surrounding the polymorphisms in multiple strains, we identified the genes closest to the mutations that caused GPG-specific AFLPs and the position of GPG-specific mutations relative to these genes (Table 1; Fig. 2b). Eleven AFLPs were caused by mutations within open reading frames; two AFLPs by indels and nine AFLPs by a total of 14 point mutations. For two additional polymorphisms the restriction site that was affected also lay within an open reading frame, but there was no sequence difference between GPG and non-GPG strains. The fact that there was nevertheless a consistent AFLP difference between GPG strains and non-GPG strains linked to these sites indicates methylation differences between GPG strains and other strains at these two sites. Five additional mutations lay within 700 bp upstream of the nearest ORFs and four within 200 bp downstream of the nearest ORF. One mutation (P6 in Table 1) affected a ribosomal RNA gene.

Table 1. Location of GPG-specific polymorphisms, effect on sequence of closest ORFs and description of closest ORFs.

| Polymorphism | AFLP product a | Chromosomal location of polymorphism b | Type of polymorphism; effect on closest ORF; position relative to closest ORF (bp) c | Closest adjacent C. albicans ORF; locus name d | Description of ORF e |
|---|---|---|---|---|---|
| P1 | MC0-700 | CHR 2: 1158231–1158454 | D&I; ns; 1133–1346 | orf19.687 (orf19.14161) PNG2 | Similar to glycoamidase gene |
| P2 | MC5-c | CHR 3: 1448823–1450443, 1451236–1452130 | D&I&P; ns; 1311–2931 and 3724–4618 | orf19.7400 ALS7 | Agglutinin-like protein; hypermutable contingency gene; member of ALS family |
| P3 | MC6-c | CHR 4: 1315492–1315870 | D; n/a; -684–306 | orf19.11882 (orf19.4404) PGA49 | Putative GPI-anchored protein of unknown function |
| P4 | MC11-c | CHR 2: 1001942 | P; n/a; −15 | orf19.188 (orf19.9443) | Similar to yeast MNN4, involved in mannosylphosphorylation of N-linked oligosaccharides |
| P5 | MC12-2-n | CHR 3: 1713650 | P; s; 436 | orf19.6738 (orf19.14030) VAN1 | Member of Mnn9p family; similar to S. cerevisiae Van1p which is involved in the first step of mannan synthesis |
| P6 | MC14-c | CHR R: 1896670–1897409 | MG*; n/a; n/a | CA25SRRN | C. albicans gene for 25S rRNA |
| P7 | MC16-c | CHR R: 322674 | P; n/a; +122 | orf19.10053 (orf19.2517) | Similar to yeast HOL1; putative plasma membrane transporter |
| P8 | MCEA 1-c | CHR 1: 2557819–2558112 | I; n/a; +200–+493 | orf19.874 (orf19.1153) GAD1 | Glutamate decarboxylase activated under stress |
| P9 | MG0-n | CHR 5: 3304–3694 | P; n/a; −305 and −695 | orf19.5698 | Putative ribosomal protein; ortholog of yeast MRPL1 |
| P10 | MG1-c | CHR R: 224518 | P; n/a; +207 | orf19.7637 YHB4 | Related to flavohemoglobins with possible function in stress response |
| P11 | MG3-c | CHR 3: 1646545–1646552 | I; n/a; −708–−701 | orf19.6782 (orf19.14074) | Hypothetical protein |
| P12 | MG7-c | CHR 1: 1162395 | P; ns; 557 | orf19.412 (orf19.8042) | Ortholog of yeast SSH1 involved in cotranslational pathway of protein translocation |
| P13 | MG12-n | CHR 2: 1331194 | P; n/a; −298 | orf19.7702 (orf19.31) | Potential cell wall protein |
| P14 | MGEA 2-n | CHR 1: 1393408, 1393731, 1393733 | P; ns&s; 693 and 694 and 1017 | orf19.13639 (orf19.6260) | Ortholog of yeast UBP12, encoding ubiquitin-specific protease in nucleus and cytoplasm, cleaving ubiquitin from ubiquitinated proteins |
| P15 | MGEA 4-c | CHR 2: 1165937, 1165938 | P; ns; 452 and 453 | orf19.6882 (orf19.14171) OSM1 | Putative flavoprotein subunit of fumarate reductase; yeast ortholog required for anaerobic growth |
| P16 | MGEA 9-c | CHR 4: 726941, 726940, 726935 | P; ns; 495 and 496 and 501 | orf19.3374 (orf19.20174) ECE1 | Encodes repeat-containing protein; hypha-specific expression |
| P17 | MGEA15-c | CHR 3: 3732–2867 | I&P; n/a; +96–+961 | orf19.5474 (orf19.12929) | Possibly spurious ORF, transcriptionally activated by Mnl1p under weak acid stress |
| P18 | MGEA24-c | CHR 6: 914421 | P; ns; 1924 | orf19.76 (orf19.7727) | Ortholog of yeast SPB1, encoding methyltransferase, involved in rRNA processing and 60S ribosomal subunit maturation |
| P19 | MGEA25-n | CHR 2: 1961657, 1961734 | MG; n/a; 876 and 953 | orf19.1396 (19.8974) | Ortholog of yeast AGE2, encoding an ADP-ribosylation factor GTPase-activating protein effector, involved in trans-Golgi network transport |
| P20 | MGEA28-c | CHR R: 1856129, 1856133 | MG; n/a; 798–792 | orf19.13792 (orf19.6434) | Ortholog of yeast PEX19, encoding chaperone and import receptor for newly-synthesized class I peroxisomal membrane proteins |
| P21 | MGEA29-c | CHR 4: 1360355–1360406, 1360500 | P; ns&s; 330–381 and 475 | orf19.4415 (orf19.11893) | Predicted ORF; possibly spurious |
| P22 | MGEA33-c | CHR 5: 385695 | P; ns; 1537 | orf19.3206 (orf19.10718) | Ortholog of yeast CCT7, encoding subunit of the cytosolic chaperonin Cct ring complex, related to Tcp1p, required for the assembly of actin and tubulins in vivo |
| P23 | MGEA37-c | CHR R: 1108518 | P; s; 3285 | orf19.643 (orf19.8257) | Putative ORF with predicted role in chromosome segregation |

a Capital letters at the beginning of the name of AFLP products refer to AFLP linkers and bases used in pre-selective amplification (M = MseI, E = EcoRI); -c indicates AFLP products over-represented in GPG cluster strains, -n indicates products under-represented in GPG cluster strains.

b Based on http://www.candidagenome.org.

c P: point mutation, D: deletion in GPG strains, I: insertion in GPG strains; MG: methylation in GPG strains (*: methylation adjacent to restriction enzyme site; the restriction enzyme site only contains A and T); ns: non-synonymous, s: synonymous mutation. Numbers indicate position within ORF; + and − signs in front of numbers indicate polymorphisms located downstream and upstream of the ORF, respectively. In some cases combinations of several sequence alterations and/or types of sequence alterations caused the AFLP.

d Number of ORF closest to the polymorphism in BLAST search of assembly 19. Number in brackets: second allele of ORF in SC5314 genome. Underlined number is the main number assigned to the ORF in the C. albicans genome database (http://www.candidagenome.org/); the database is also the source of the loci names.

e Based on annotation in GenBank and in the C. albicans genome database (http://www.candidagenome.org/).

Fig. 2. (a) Distribution of GPG-specific polymorphisms across C. albicans chromosomes; and (b) nature of the polymorphisms and the genes they affect. (a) Polymorphisms, named P1–P23, were mapped using the Candida Genome database (http://www.candidagenome.org/). Chromosome numbers are shown to the left. Centromeric regions (Sanyal et al., 2004) are shown as grey dots. In (b) a brief description of the function of the closest ORF to each polymorphism, the nature of the polymorphism and its position relative to the nearest ORF (represented by a black arrow) is given. Unfilled triangles indicate point mutations outside ORFs or synonymous point mutations within ORFs, grey triangles indicate non-synonymous point mutations in ORFs, black triangles indicate insertions and deletions and grey diamonds indicate methylation. For polymorphisms outside an ORF, their positions in the figure reflect the actual distances to those ORFs. Some AFLPs were caused by multiple mutations, and in this case only the one with the greatest potential impact (the one closest to the ORF for polymorphisms involving upstream and downstream regions) is shown. P6, located in a rRNA gene, is not included in the figure. More detailed information on the polymorphisms is given in Table 1.

The distribution of AFLP-causing mutations between ORFs and non-coding regions was comparable to that expected based on the overall percentage of coding regions in the C. albicans genome, 61.5% (Braun et al., 2005). However, the ratio of non-synonymous/synonymous mutations was not. We calculated a pN/pS ratio (Nei and Gojobori, 1986) of 0.54 for the fourteen GPG-specific point mutations in ORFs. We then compared this to the pN/pS ratios observed in 1000 randomly chosen sets of 14 codon differences between ORF sequences in the C. albicans MLST database (http://test1.mlst.net/). Nine hundred and eighty-three had a lower pN/pS ratio than the 14 codons affected by GPG-specific mutations. There are thus significantly (P = 0.017) more non-synonymous changes in the GPG-specific set of mutations than could be expected by chance.
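The reported P value follows directly from the counts given above: 983 of the 1000 random sets had a lower ratio, leaving 17 that did not, and 17/1000 = 0.017. A minimal sketch of that calculation is shown below; the simulated ratios are placeholders chosen only to reproduce the 983/17 split and are not data from the study.

```python
def empirical_p(observed, simulated):
    """Fraction of simulated values that are not lower than the observed value."""
    not_lower = sum(value >= observed for value in simulated)
    return not_lower / len(simulated)


# Placeholder ratios: 983 below and 17 above the observed pN/pS ratio of 0.54.
placeholder_ratios = [0.1] * 983 + [0.9] * 17
p_nonsynonymous = empirical_p(0.54, placeholder_ratios)
```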

Functions (or predicted functions) of the genes closest to GPG-specific polymorphisms fell into numerous categories, including adhesion, ion transport, ER to Golgi transport, sterol synthesis, the synthesis, degradation, modification and export of proteins, anaerobic and hyphal growth, stress response and DNA repair, chromosome segregation, a putative cell wall protein and the ECE1 gene (up-regulated in hyphae (Birse et al., 1993; Braun and Johnson, 2000)). For two ORFs associated with polymorphisms, no putative function was deducible.

We also assessed if the marked genes were expressed during oral candidiasis by measuring their transcription in clinical samples from six patients (Fig. 3), using reverse transcriptase-PCR with primers which would amplify their transcripts in both GPG and in non-GPG strains. All but one of the genes were expressed at a detectable level, indicating potential functionality during disease (the exception was ORF 19.76. marked by P18; controls in which samples were amplified without a preceding reverse transcriptase step were all negative).

3.3. Distribution of polymorphisms across gene categories

We wanted to determine if polymorphisms marked particular functional or structural categories of genes more often than others; as explained in the introduction this could potentially highlight gene categories of above average importance to the success of GPG strains. This statistical approach does not require that the mutational markers themselves have a functional impact or that all of the markers distinguishing the populations to be compared are close to genes that determine the trait under investigation, although markers which are not close to genes that contribute to the trait introduce statistical noise (Ardlie et al., 2002; Wu and Lin, 2006).

When comparing GPG polymorphism-marked genes with all C. albicans genes using the C. albicans Genome Database Gene Ontology (GO) Finder (http://www.candidagenome.org/), we identified one significant shared GO term (i.e. a GO term represented among GPG-polymorphism-marked genes more frequently than expected based on its frequency across all GO-mapped C. albicans genes): Process: Single-species biofilm formation (2 out of 19 GPG polymorphism-labelled and GO-mapped genes, ALS7 and ECE1, compared to 25 of all 6334 GO-mapped genes; P = 0.0982).
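The kind of over-representation test that underlies a GO term search can be illustrated with a hypergeometric tail probability. The sketch below is only an illustration using the counts quoted above (2 of 19 marked genes versus 25 of 6334 GO-mapped genes). It computes an uncorrected probability, whereas the P value reported in the text comes from the GO Finder and presumably reflects that tool's own treatment of multiple testing (an assumption on our part), so the two numbers are not expected to agree. The function name is ours.

```python
from math import comb

def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which carry the annotation of interest."""
    total = comb(N, n)
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total

# Counts quoted in the text: 2 of 19 marked genes vs 25 of 6334 GO-mapped genes.
p_uncorrected = hypergeom_tail(k=2, n=19, K=25, N=6334)
print(f"uncorrected P(X >= 2) = {p_uncorrected:.4g}")
```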

The GO annotation of the C. albicans genome is not yet complete. Also, aside from mistakes introduced by automated annotation, the distribution of GO terms only partially reflects the percentages of all genes in the genome involved in particular processes and functions or related to particular cellular components. The frequency of a GO term is also influenced by the degree of interest of the scientific community in the underlying process, function and cellular component; more research activity means that a higher percentage of genes involved will have been identified and the associated GO terms will be over-represented. Thus the above analysis is not an entirely accurate test of non-random distribution of GPG-specific polymorphisms in terms of functions, processes and cellular components. A more objective test would be a comparison of distribution of GPG-specific polymorphisms against data obtained in screens less susceptible to investigator bias.

Fig. 3. Expression of ORFs marked by GPG-specific polymorphisms during oral candidiasis. Total RNA was extracted from oral swabs from 6 patients and reverse transcriptase-PCR (RT-PCR) was carried out, as described in Section 2, to amplify the ORFs associated with the polymorphisms listed on the left. Images of the resulting bands are shown together with a band intensity score (+++, very intense; ++, medium intensity; +, low intensity; −, absent). Because levels of RT-PCR products are influenced by both cell number and levels of expression, expression of two housekeeping genes, EFB1 and ACT1, was also assessed for every sample. For each patient, PCR was carried out on two samples of cDNA independently reverse transcribed from the mRNA preparation, and the two PCR amplifications gave similar results.

Fig. 4. Representation of GPG-specific polymorphism-marked ORFs in published datasets. Black bars show the percentage of all ORFs in the genome responding to the stimuli, or fulfilling the structural criteria listed on the left, grey bars show the percentage of GPG-specific polymorphism-labelled ORFs (in expression screens these percentages include both up-regulated and down-regulated genes). Where over-representation of polymorphism-marked ORFs was statistically significant, P values and names of polymorphism-labelled ORFs are shown. The number of all ORFs, and of polymorphism-marked ORFs included in the screen, is shown beneath the name of each screen. See Section 2 for source data and details of their interpretation for this analysis.

We therefore compared our set of 23 genes with sets of genes identified in a selection of published genome-wide screens investigating specific aspects of C. albicans biology; mainly expression screens aimed at identifying genes responding to external stimuli or the yeast–hyphal transition. GPG polymorphism-marked ORFs were over-represented in nine of fourteen categories assessed in the selection of screens we used (Fig. 4), which is significantly more often than expected by chance (P < 0.03; calculated by randomization test; see Section 2). This confirmed that the features that distinguish GPG strains and other strains are not randomly distributed among gene categories. Use of P < 0.10 as an indicator of statistical significance, consistent with the cut-off point for identifying significant shared terms in Gene Ontology term searches (http://www.candidagenome.org/), identified five functional categories where over-representation of polymorphism-marked ORFs is most likely to be statistically significant, namely repeat-containing ORFs and ORFs differentially expressed in response to antifungal drugs, hyphal formation, blood, the α mating factor, and changes in pH (Fig. 4).
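The randomization test referred to above is specified in Section 2, which is not reproduced here. The following is therefore only a sketch of one plausible form such a test could take, not the authors' procedure: draw random ORF sets of the same size as the marked set, count in how many categories each random set is over-represented, and report the share of draws that reach the observed count. All names, the over-representation criterion and the toy data are hypothetical.

```python
import random

def count_overrepresented(marked, categories, genome_size):
    """Count categories whose share among the marked ORFs exceeds their genome-wide share."""
    count = 0
    for members in categories.values():
        marked_share = len(marked & members) / len(marked)
        genome_share = len(members) / genome_size
        if marked_share > genome_share:
            count += 1
    return count

def randomization_p(marked, categories, genome, draws=1000, seed=0):
    """Share of random ORF sets with at least as many over-represented categories as `marked`."""
    rng = random.Random(seed)
    observed = count_overrepresented(marked, categories, len(genome))
    hits = 0
    for _ in range(draws):
        sample = set(rng.sample(genome, len(marked)))
        if count_overrepresented(sample, categories, len(genome)) >= observed:
            hits += 1
    return observed, hits / draws

# Toy data, purely hypothetical: 200 ORF ids, three made-up categories, six marked ORFs.
genome = list(range(200))
categories = {
    "category_A": set(range(0, 20)),
    "category_B": set(range(20, 60)),
    "category_C": set(range(100, 110)),
}
marked = {0, 1, 2, 3, 25, 101}

observed, p_value = randomization_p(marked, categories, genome)
print(observed, p_value)
```

In a real analysis `genome` would list all ORFs, `categories` would hold the ORF sets from each published screen, and `marked` would be the polymorphism-marked ORFs.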

4. Discussion

Our results confirm that the high prevalence of GPG strains is not the result of genetic drift. GPG polymorphisms in ORFs are non-synonymous more often than expected by chance and are not randomly distributed across gene categories. Both results indicate that many GPG-specific polymorphisms are under selection. Our results also indicate that the success of GPG strains has a broad multigenic basis. Given that 10% of all our AFLPs were GPG-specific, the 23 we characterized were distributed evenly through the genome, each affecting a different locus and nine causing alterations in the amino acid sequence of proteins, there could be hundreds of polymorphisms that affect fitness. If so, each probably has a minuscule effect and at least some of these effects would be expected to be epistatic. As a result, demonstrating the impact of GPG-specific alleles at individual loci on fitness experimentally, by inserting them into tester strains and assessing fitness differences in animal models, would be extremely challenging. Complex trait mapping (Ardlie et al., 2002; Wu and Lin, 2006) may be more likely to identify crucial polymorphisms, but would still be difficult if their number is large. Possibly the most useful next step is to carry out genomic comparisons on a larger scale, by resequencing the C. albicans genome in representative (Holland and Schmid, 2005) sets of GPG and non-GPG strains. This will provide an estimate of the number of polymorphisms separating GPG strains and other strains. It will also provide a much larger source of polymorphisms from which to identify traits that are of above average importance to the success of GPG strains, as gene categories marked by such polymorphisms more often than expected by chance.

Our initial statistical evidence suggests three categories of genes important to the success of GPG strains. The first category is repeat-containing ORFs. Such repeat-containing genes rapidly generate new alleles, and in Plasmodium such genes are important in its interaction with the host (Rich and Ayala, 2000). GPG-specific alleles have been observed for several repeat-containing C. albicans ORFs (Lott et al., 1999; Zhang et al., 2003; Oh et al., 2005; Zhao et al., 2007), even though their inherent mutability would work against their fixation. While the extent of functional differences between the proteins encoded by the different alleles remains to be elucidated, the existence of clade-specific alleles of these highly mutable ORFs thus strongly suggests that these alleles are fitness determinants.

The second category is genes involved in dimorphism, and this is also supported by other evidence. Hyphal formation has long been considered important to C. albicans success in the human host (Odds, 1988; Odds et al., 2001; Stokes et al., 2007), and the propensity of GPG strains for forming hyphae exceeds that of other genotypes (Hunter and Schmid, unpublished results). Note that the over-representation of GPG polymorphism-marked genes in response to blood, the α mating factor and pH can be explained as a secondary effect of over-representation among genes responding to hyphal formation. The genes ECE1 and GAD1 that contribute to the over-representation of GPG-polymorphism-labelled genes in the sets identified in these three screens are also differentially expressed in hyphal formation, and cell elongation occurred in all of the screens (Bennett et al., 2003; Fradin et al., 2003; Bensen et al., 2004). If ECE1 and GAD1 are omitted from the analysis, GPG-specific polymorphisms are no longer over-represented among genes responding to blood, the mating factor and pH, but are still over-represented among genes reacting to hypha formation.

The third category involves genes responding to antifungals. Physiological data (Schmid et al., 1995) indicate that this could reflect a greater resilience of GPG strains to detrimental substances or to stress in general rather than a resilience specific to antifungal therapeutics (Schmid et al., 1995). The latter is also unlikely, because exposure of C. albicans to antifungal therapeutics is rare, as the yeast mainly exists as a commensal (see below), and their introduction is a recent event; exposure to antifungal therapeutics is thus unlikely to have caused clade-wide differences in the response to these agents.

Our evidence does not support biofilm formation as important to the success of GPG strains because ORFs responding to biofilm formation are not over-represented among ORFs carrying GPG-specific mutations (Fig. 4), although mutations in genes GO annotated as involved in biofilm formation are over-represented among the mutations. This discrepancy may be due to our small sample size, deficiencies in the GO annotation or temporally and spatially restricted roles of genes marked by GPG-specific polymorphisms in biofilm formation which may make them difficult to detect in expression screens.

Since GPG strains are the most common agents of disease, can genomic comparisons between GPG strains and other strains tell us anything about virulence? Probably not directly, because selection of features that set GPG strains apart from others is likely to be driven entirely by competition of strains during commensal colonization. There is no evidence for specialized pathogenic lineages in C. albicans, and thus a given strain will spend the vast majority of its existence as a commensal (Schmid et al., 1995, 1999; Xu et al., 1999). Also, the ability to cause disease does not confer a readily apparent net benefit: a temporary increase in cell numbers may increase chances of transmission to other hosts but successful treatment or death of the patient may eradicate the strain in the original host. Thus it would appear unlikely that GPG strains have developed features which primarily function to enhance their ability to cause disease (Schmid et al., 1995; Sokurenko et al., 1999; Tekaia and Latge, 2005). However, there could still be considerable overlap between traits enhancing commensalism and those that enhance virulence. Many challenges that C. albicans needs to meet when causing disease are identical to those faced during commensalism (such as the need to evade the immune system and to adhere to epithelia). GPG strains do have a higher propensity towards causing at least some types of infections, seem capable of replacing other strains in infection (Schmid et al., 1999; Odds et al., 2007) and may cause higher patient mortality (Schmid, Bretagne, Zhang, Bendall, Desnos-Ollivier, and the yeasts group: differences in mortality between patients infected with different C. albicans genotypes. 9th ASM Conference on Candida and Candidiasis, Jersey City, NJ, USA, 2008): at least some of the traits that make GPG strains better colonizers may thus also make them more virulent. Virulence also resembles the features that determine the success of GPG strains, in that it seems to be a complex trait. According to the pathogen-host interactions database (http://www.phi-base.org/), disruption of 121 of 126 C. albicans genes tested to date attenuates virulence in animal models.

The large number of virulence genes not only supports the idea of a similarity between virulence traits and the traits that distinguish GPG strains from other strains but also highlights that it may be difficult to identify key virulence genes by studying each gene’s function in isolation and without additional information. Thus investigating the basis of the success of GPG strains by genomic comparisons may be an indirect but necessary step towards understanding C. albicans virulence.

Acknowledgements

We thank Richard Bennett for supplying unpublished details on microarray analyses, Joyce Leung for calculating the frequency of amino acid-altering polymorphisms in genes used for multi-locus sequence typing and Beatrice Magee, Jeremy Hyams and Justin O’Sullivan for comments on early versions of the manuscript. We thank Frank Odds for bringing to our attention the high percentage of gene disruptions that attenuate virulence. Sequence data for C. albicans SC5314 were obtained from the Candida Genome database at http://www.candidagenome.org/. This work was supported by Marsden grant MAU 902 from The Royal Society of New Zealand to JS and RDC.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.meegid.2009.01.007.

References

Ardlie, K.G., Kruglyak, L., Seielstad, M., 2002. Patterns of linkage disequilibrium in the human genome. Nat. Rev. Genet. 3, 299–309.

Bennett, R.J., Uhl, M.A., Miller, M.G., Johnson, A.D., 2003. Identification and characterization of a Candida albicans mating pheromone. Mol. Cell. Biol. 23, 8189–8201.

Bensen, E.S., Martin, S.J., Li, M., Berman, J., Davis, D.A., 2004. Transcriptional profiling in Candida albicans reveals new adaptive responses to extracellular pH and functions for Rim101p. Mol. Microbiol. 54, 1335–1351.

Birse, C.E., Irwin, M.Y., Fonzi, W.A., Sypherd, P.S., 1993. Cloning and characterization of ECE1, a gene expressed in association with cell elongation of the dimorphic pathogen Candida albicans. Infect. Immun. 61, 3648–3655.

Braun, B.R., Johnson, A.D., 2000. TUP1, CPH1 and EFG1 make independent contributions to filamentation in Candida albicans. Genetics 155, 57–67.

Braun, B.R., van het Hoog, M., d’Enfert, C., Martchenko, M., Dungan, J., Kuo, A., Inglis, D.O., Uhl, M.A., Hogues, H., Berriman, M., et al., 2005. A human-curated annotation of the Candida albicans genome. PLoS Genet. 1, 36–57.

Enjalbert, B., Nantel, A., Whiteway, M., 2003. Stress-induced gene expression in Candida albicans: absence of a general stress response. Mol. Biol. Cell 14, 1460–1467.

Fradin, C., Kretschmar, M., Nichterlein, T., Gaillardin, C., d’Enfert, C., Hube, B., 2003. Stage-specific gene expression of Candida albicans in human blood. Mol. Microbiol. 47, 1523–1543.

Fundyga, R., Lott, T.J., Arnold, J., 2002. Population structure of Candida albicans, a member of the human flora, as determined by microsatellite loci. Infect. Genet. Evol. 2, 57–68.

Garcia-Sanchez, S., Aubert, S., Iraqui, I., Janbon, G., Ghigo, J.M., d’Enfert, C., 2004. Candida albicans biofilms: a developmental state associated with specific and stable gene expression patterns. Eukaryot. Cell 3, 536–545.

Giblin, L., Edelmann, A., Zhang, N., Maltzahn, N.B.V., Cleland, S.B., Sullivan, P.A., Schmid, J., 2001. A DNA polymorphism specific to Candida albicans strains exceptionally successful as human pathogens. Gene 272, 157–164.

Holland, B.R., Huber, K.T., Dress, A., Moulton, V., 2002. Delta-plots: a tool for analyzing phylogenetic distance data. Mol. Biol. Evol. 19, 2051–2059.

Holland, B.R., Schmid, J., 2005. Selecting representative model micro-organisms. BMC Microbiology 5, paper 26.

Jones, T., Federspiel, N.A., Chibana, H., Dungan, J., Kalman, S., Magee, B.B., Newport, G., Thorstenson, Y.R., Agabian, N., Magee, P.T., et al., 2004. The diploid genome sequence of Candida albicans. Proc. Natl. Acad. Sci. U.S.A. 101, 7329–7334.

Kadosh, D., Johnson, A.D., 2005. Induction of the Candida albicans filamentous growth program by relief of transcriptional repression: a genome-wide analysis. Mol. Biol. Cell 16, 2903–2912.

Liu, T.T., Lee, R.E., Barker, K.S., Wei, L., Homayouni, R., Rogers, P.D., 2005. Genome-wide expression profiling of the response to azole, polyene, echinocandin, and pyrimidine antifungal agents in Candida albicans. Antimicrob. Agents Chemother. 49, 2226–2236.

Lott, T.J., Effat, M.M., 2001. Evidence for a more recently evolved clade within a Candida albicans North American population. Microbiology 147, 1687–1692.

Lott, T.J., Holloway, B.P., Logan, D.A., Fundyga, R., Arnold, J., 1999. Towards understanding the evolution of the human commensal yeast Candida albicans. Microbiology 145, 1137–1143.

Marchais, V., Kempf, M., Licznar, P., Lefrancois, C., Bouchara, J.P., Robert, R., Cottin, J., 2005. DNA array analysis of Candida albicans gene expression in response to adherence to polystyrene. FEMS Microbiol. Lett. 245, 25–32.

Morgan, J., 2005. Global trends in candidemia: review of reports from 1995–2005. Curr. Infect. Dis. Rep. 7, 429–439.

Nantel, A., Dignard, D., Bachewich, C., Harcus, D., Marcil, A., Bouin, A.P., Sensen, C.W., Hogues, H., van het Hoog, M., Gordon, P., et al., 2002. Transcription profiling of Candida albicans cells undergoing the yeast-to-hyphal transition. Mol. Biol. Cell 13, 3452–3465.

Nei, M., Gojobori, T., 1986. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3, 418–426.

Odds, F.C., 1988. Candida and Candidosis, 2nd ed. Bailliere Tindall, London.

Odds, F.C., Bougnoux, M.E., Shaw, D.J., Bain, J.M., Davidson, A.D., Diogo, D., Jacobsen, M.D., Lecomte, M., Li, S.Y., Tavanti, A., et al., 2007. Molecular phylogenetics of Candida albicans. Eukaryot. Cell 6, 1041–1052.

Odds, F.C., Gow, N.A., Brown, A.J., 2001. Fungal virulence studies come of age. Genome Biol. 2, reviews1009.1–reviews1009.4.

Oh, S.-H., Cheng, G., Nuessen, J.A., Jajko, R., Yeater, K.M., Zhao, X., Pujol, C., Soll, D.R., Hoyer, L.L., 2005. Functional specificity of Candida albicans Als3p proteins and clade specificity of ALS3 alleles discriminated by the number of copies of the tandem repeat sequence in the central domain. Microbiology 151, 673–681.

Rich, S.M., Ayala, F.J., 2000. Population structure and recent evolution of Plasmodium falciparum. Proc. Natl. Acad. Sci. U.S.A. 97, 6994–7001.

Sanyal, K., Baum, M., Carbon, J., 2004. Centromeric DNA sequences in the pathogenic yeast Candida albicans are all different and unique. Proc. Natl. Acad. Sci. U.S.A. 101, 11374–11379.

Scheet, P., Stephens, M., 2006. A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am. J. Hum. Genet. 78, 629–644.

Schmid, J., Herd, S., Hunter, P.R., Cannon, R.D., Yasin, M.S.M., Samad, S., Carr, M., Parr, D., McKinney, W., Schousboe, M., et al., 1999. Evidence for a general-purpose genotype in Candida albicans, highly prevalent in multiple geographic regions, patient types and types of infection. Microbiology 145, 2405–2414.

Schmid, J., Hunter, P.R., White, G.C., Nand, A.K., Cannon, R.D., 1995. Physiological traits associated with success of Candida albicans strains as commensal colonisers and pathogens. J. Clin. Microbiol. 33, 2920–2926.

Sokurenko, E.V., Hasty, D.L., Dykhuizen, D.E., 1999. Pathoadaptive mutations: gene loss and variation in bacterial pathogens. Trends Microbiol. 7, 191–195.

Soll, D.R., Pujol, C., 2003. Candida albicans clades. FEMS Immunol. Med. Microbiol. 39, 1–7.

Sternberg, S., 1994. The emerging fungal threat. Science 266, 1632–1634.

Stokes, C., Moran, G.P., Spiering, M.J., Cole, G.T., Coleman, D.C., Sullivan, D.J., 2007. Lower filamentation rates of Candida dubliniensis contribute to its lower virulence in comparison with Candida albicans. Fungal Genet. Biol. 22, 920–931.

Tekaia, F., Latge, J.-P., 2005. Aspergillus fumigatus: saprophyte or pathogen? Curr. Opin. Microbiol. 8, 385–392.

Tibayrenc, M., 1997. Are Candida albicans natural populations subdivided? TIM 5, 253–257.

Uhl, M.A., Biery, M., Craig, N., Johnson, A.D., 2003. Haploinsufficiency-based large-scale forward genetic analysis of filamentous growth in the diploid human fungal pathogen C. albicans. EMBO J. 22, 2668–2678.

van Driel, R., Fransz, P.F., Verschure, P.J., 2003. The eukaryotic genome: a system regulated at different hierarchical levels. J. Cell Sci. 116, 4067–4075.

Wu, R., Lin, M., 2006. Functional mapping - how to map and study the genetic architecture of dynamic complex traits. Nat. Rev. Genet. 7, 229–237.

Xu, J., Boyd, C.M., Livingston, E., Meyer, W., Madden, J.F., Mitchell, T.G., 1999. Species and genotypic diversities and similarities of pathogenic yeasts colonizing women. J. Clin. Microbiol. 37, 3835–3843.

Zhang, N., Harrex, A.L., Holland, B.R., Fenton, L.E., Cannon, R.D., Schmid, J., 2003. Sixty alleles of the ALS7 open reading frame in Candida albicans: ALS7 is a hypermutable contingency locus. Genome Res. 13, 2005–2017.

Zhao, X., Oh, S.H., Jajko, R., Diekema, D.J., Pfaller, M.A., Pujol, C., Soll, D.R., Hoyer, L.L., 2007. Analysis of ALS5 and ALS6 allelic variability in a geographically diverse collection of Candida albicans isolates. Fungal Genet. Biol. 44, 1298–1309.