Title: Host-jump drives rapid and recent ecological speciation of the emergent fungal pathogen

Colletotrichum kahawae

Authors: Diogo Nuno Silva (1,2), Pedro Talhinhas (1), Lei Cai (3), Luzolo Manuel (4), Elijah K. Gichuru (5),

Andreia Loureiro (1), Vítor Várzea (1), Octávio Salgueiro Paulo (2) and Dora Batista (1).

1 CIFC/IICT - Centro de Investigação das Ferrugens do Cafeeiro/ Instituto de Investigação

Científica Tropical, Quinta do Marquês, 2784-505 Oeiras, Portugal.

2 Computational Biology and Population Genomics group, Centro de Biologia Ambiental,

Faculdade de Ciências da Universidade de Lisboa, Campo Grande, 1749-016 Lisboa, Portugal

3 State Key Laboratory of Mycology, Institute of Microbiology, Chinese Academy of Sciences,

No.10, North 4th Ring Rd West, Beijing 100190, People’s Republic of China

4 Instituto Nacional do Café de Angola, Luanda, Angola

5 Coffee Research Foundation, P.O.Box 4-00232, Ruiru, Kenya

Abstract: Ecological speciation through host-shift has been proposed as a major route for the

appearance of novel fungal pathogens. The growing awareness of their negative impact on global

economies and public health created an enormous interest in identifying the factors that are most

likely to promote their emergence in nature. In this work, a combination of pathological,

molecular and geographic data was used to investigate the recent emergence of the fungus

Colletotrichum kahawae. C. kahawae emerged as a specialist pathogen causing Coffee Berry

Disease in Coffea arabica, due to its unparalleled adaptation of infecting green coffee berries.

Contrary to current hypotheses, our results suggest that a recent host-jump underlay the

speciation of C. kahawae from a generalist group of fungi seemingly harmless to coffee berries.

We posit that immigrant inviability and a predominantly asexual behavior could have been

instrumental in driving speciation by creating pleiotropic interactions between local adaptation

and reproductive patterns. Moreover, we estimate that C. kahawae began its diversification less

than 2,200 years ago, leaving a very short time frame since the divergence from its sibling lineage

(∼5,600 years ago), during which a severe drop in C. kahawae’s effective population size occurred.

This further supports a scenario of recent introduction and subsequent adaptation to C. arabica.

Phylogeographic data revealed low levels of genetic polymorphism but provided the first

geographically consistent population structure of C. kahawae, inferring the Angolan population

as the most ancestral and the East African populations as the most recently derived. Altogether,

these results highlight the significant role of host specialization and asexuality in the emergence

of fungal pathogens through ecological speciation.

Short title: Host-jump speciation of C. kahawae

Keywords: Adaptation, BEAST, Coffee Berry Disease, Coffea arabica, Divergence population

genetics, Plant pathogen

Introduction

Ecological speciation is being increasingly recognized as an important mechanism driving the

origin of species, in which the contributions of ecology and natural selection are fundamental

(Rundle & Nosil 2005; Schluter 2009). By definition, ecological speciation occurs when

reproductive isolation between populations arises as a consequence of ecologically-based

divergent selection promoting the fixation of different advantageous alleles in each of the

contrasting environments (Schluter & Conte 2009). Host specialization can be a powerful driver

of divergent selection, promoting ecological speciation via host shift or jump, depending on

whether the involved hosts are genetically similar or distant, respectively. Evidence is

accumulating from disparate taxa, such as phytophagous insects (Winkler et al. 2009), coral

dwelling fish (Munday et al. 2004) and pathogenic fungi (Giraud et al. 2010), showing that

specialization to different hosts can effectively lead to population sub-division, adaptation and

divergence.

Host-shifts may represent a way by which incipient pathogen species can escape from direct

resource competition, thus experiencing relatively high fitness even if they are initially poorly

adapted to the new environment (Dieckmann et al. 2004). The adaptation to the new habitat can

then drive the evolution of partial reproductive isolation within a few generations (Hendry et al.

2007). During this period, adaptive divergence itself is likely to be of key importance in

restricting gene flow, thereby allowing the completion of speciation regardless of the geographic

distribution of the diverging populations (Dieckmann et al. 2004). Indeed, verbal and

mathematical models suggest that when host specialization and local adaptation are coupled with

assortative mating, gene flow can be severely or totally restricted even in the absence of extrinsic

barriers (Nosil et al. 2005; Schluter & Conte 2009; Gladieux et al. 2011). Moreover, if one

considers the effects of pleiotropy, the genetic associations between disruptively selected traits

and the mechanism of reproductive isolation do not depend exclusively upon the establishment of

linkage, rendering ecological speciation likely and swift (Via 2001, Servedio et al. 2011).

Elucidating the biotic factors and environmental circumstances that are most likely to promote

ecological speciation in nature can be challenging using contemporaneous data, since time often

obscures the conditions at the moment of speciation (Coyne & Orr 2004). In this regard,

addressing emergent fungal diseases can be particularly insightful, since their recent origins

provide a unique opportunity to study the mechanisms underlying ecological speciation

(Stukenbrock & McDonald 2008). Fungal diseases have been emerging at an increasing rate on a

wide range of host plants, posing tremendous threats to global economy and food safety (Jones et

al. 2008). The growing awareness of their worldwide impact has been accompanied by the need to

understand the events leading to the emergence of such species (Stukenbrock & McDonald

2008). In addition, they also represent interesting models of speciation mechanisms for two other

reasons. First, as human activities led to dramatic changes in the ecosystems at a global scale,

with vast areas cultivated with single crops/genotypes and with the global exchange of

germplasm (Kareiva et al. 2007), diverse ecological opportunities have been created for the rapid

emergence and transmission of novel fungal diseases (Stukenbrock & McDonald 2008). Second,

fungi possess several exclusive and remarkable features that facilitate rapid ecological speciation,

especially through host-shift (Giraud et al. 2010). In some examples of local adaptation to their

hosts, fungi have proven capable of: (i) undergoing frequent asexual reproduction with rare events of

sexual recombination; (ii) generating a large number of spores, which increases adaptive

variation input by mutation; or (iii) mating only within the host, creating pleiotropic interactions

between host specialization and assortative mating (Giraud et al. 2010 and references therein).

Herein, we aimed to elucidate the speciation event of an emergent fungal pathogen,

Colletotrichum kahawae, from within the generalist and cosmopolitan C. gloeosporioides

complex. C. kahawae emerged as a specialist pathogen causing Coffee Berry Disease (which produces

anthracnose symptoms, with sunken necroses leading to fruit drop or mummification) on Arabica

coffee (Coffea arabica), due to its unique adaptation of infecting green berries, an ecological

niche previously unoccupied by other fungi (Firman & Waller 1977). Although C. arabica is

grown in South America, Africa and Asia, C. kahawae is still restricted to Africa where it was

first reported in 1922 in Kenya (McDonald 1926). However, despite the African origin of C.

arabica, cultivation of this crop on the continent is also fairly recent, with the first attempts in

the 18th century but most dating from the second half of the 19th century (Bigger 2006). The

Coffee Berry Disease pathogen currently occurs in nearly all African regions where C. arabica is

grown, particularly above 1400 m, ravaging plantations and causing up to 80% yield losses

annually. Nonetheless, very little is known about its origin besides the information provided by

historical data and field reports. According to these sources, C. kahawae would have originated

in Kenya from C. gloeosporioides sensu lato populations inhabiting Coffea spp. hosts (frequently

isolated from ripe or damaged coffee berries) either by: (i) mutation from a mildly parasitic form

in C. arabica (Nutman & Roberts 1960); (ii) hybridization between C. gloeosporioides strains

from other Coffea spp. hosts (Robinson 1976); or (iii) shifting from another Coffea spp., where it

would be present as a harmless fungal strain (Robinson 1974). Currently, these hypotheses

remain untested and the relationship between C. kahawae and C. gloeosporioides s.l. isolates has

been poorly investigated, so that it is unclear how the specialization to the new niche was

accomplished. Some studies have attempted to provide insights by investigating the genetic

variability and population structure of C. kahawae, albeit with little success. Concordant with the

putative recent origin and predominantly asexual reproduction, genetic variability of C. kahawae

was found to be very low, preventing inferences about its population structure, origin and spread

(Bridge et al. 2008; Manuel et al. 2009).

The present study intends to assess the current hypotheses for the emergence and spread of C.

kahawae and to investigate the process of host specialization, using information from a multi-

locus sequencing approach. We used an extensive sampling of C. kahawae’s populations, as well

as C. gloeosporioides s.l. isolates from Coffea spp. and several other hosts worldwide. In this

endeavor, we made use of novel statistical techniques on the inference of phylogenetic

relationships among species, estimation of divergence times and reconstruction of past

demographic history. These analyses were coupled with information from geographic modeling

and pathogenicity tests to provide a multidisciplinary view of this speciation event.

Methods

Fungal material

A total of 102 isolates from the C. gloeosporioides complex were obtained from culture

collections and field samples, comprising three main groups: (i) 29 C. gloeosporioides s.l.

isolates collected from Coffea spp. hosts throughout plantations in South America, Africa and

Asia; (ii) 14 C. gloeosporioides s.l. isolates collected from eleven host species (other than Coffea

spp.) and five geographic locations, previously selected based on preliminary phylogenetic

analyses and culture availability; and (iii) 59 C. kahawae isolates obtained from infected green C.

arabica berries in ten African countries, covering most of its current range (Table S1). Ex-

holotypes (cultures obtained from the taxonomical type strain) from three recently described

species within the C. gloeosporioides complex from C. arabica hosts in Thailand were included:

Colletotrichum siamense, Colletotrichum asianum and Colletotrichum fructicola (Prihastuti et al.

2009). Isolates from Citrus limon and Olea europaea hosts were also used, representing the C.

gloeosporioides sensu stricto epitype (Cannon et al. 2008) based on BLAST results of the

complete rDNA Internal Transcribed Spacer (ITS) (GenBank accession number EU371022 [E

value = 0, Max ident. = 98%]) and β-tubulin 2 (β-tub2) (GenBank accession number FJ907445 [E

value = 0, Max ident. = 99%]). Five isolates of Colletotrichum fragariae were selected as

outgroup taxa for the phylogenetic analyses. Culturing and DNA extraction from fungal isolates

were performed as previously described (Silva et al. 2012).

Molecular data

Six nuclear markers previously described in detail by Silva et al. (2012) were employed for this

study: the ITS region, β-tub2, Apn25L, MAT1-2-1, ApMAT and MAT5L. Primers and PCR

conditions for ApMAT, MAT5L, Apn25L and MAT1-2-1 were as described by Silva et al.

(2012). PCR amplification of the ITS region was carried out with primers ITS1Ext/ITS4Ext

(Brown et al. 1996) and that of β-tub2 with primers T1/T2 (O’Donnell & Cigelnik 1997). PCR products

yielding one clear band were purified using SureClean (BioLine). When products presented a

multi-band profile, the band of the expected size was excised and purified using the Silica Bead

DNA Gel Extraction kit (Fermentas). Sequencing reactions were performed using the BigDye

version 3.1 chemistry (Applied Biosystems) on an ABI prism 310 automated sequencer.

Amplicons were sequenced in both directions and chromatograms were manually checked for

errors in SEQUENCHER v4.0.5 (Gene Codes Corporation).

Phylogenetic analyses

Multiple sequence alignments were constructed for each nuclear sequence dataset in MAFFT

v6.717b (Katoh & Toh 2009), using the L-INS-i method. Individual datasets were concatenated

into a combined matrix using the CONCATENATOR v1.1.0 software (Pina-Martins & Paulo

2008). We used Maximum Likelihood (ML) and a Bayesian framework with Markov chain

Monte Carlo (BMCMC) to reconstruct phylogenies from the separate and combined datasets. For

ML analyses, MODELTEST v3.7 (Posada & Crandall 1998) was used to select the best fit model

of DNA sequence evolution, under the Akaike Information Criterion (AIC). ML heuristic

searches were performed in PAUP* v4.0d99 (Swofford 2003) with 100 replicates, random

sequence addition and a Tree-Bisection Reconnection (TBR) branch swapping algorithm.

Nonparametric bootstrap was conducted using 1 000 pseudoreplicates with 10 random additions

and TBR branch swapping. The BMCMC analysis was run in MRBAYES v3.1.2 (Ronquist &

Huelsenbeck 2003) with the optimal model of sequence evolution selected under the AIC, as

implemented in MRMODELTEST v2.3 (Nylander 2004). The MAT1-2-1 dataset was partitioned

into codon positions because different substitution models were estimated for the combined first and

second positions and for the third position. Posterior probabilities were generated from 1 × 10^7

generations, sampling every 1 000th iteration, and the analysis was run three times with one

cold and three incrementally heated Metropolis-coupled Monte Carlo Markov chains, starting

from random trees. The achievement of the stationary phase was checked using Tracer v1.4

(Rambaut & Drummond 2007) and 1 × 10^6 generations were discarded as burn-in. Trees from

different runs were then combined and summarized in a majority rule 50% consensus tree.
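
As a quick check on the sampling scheme above, the following minimal Python sketch works out how many sampled trees each run retains after burn-in. The function name retained_samples is hypothetical, and the arithmetic assumes that the starting state (generation 0) is not counted; it illustrates the bookkeeping only and is not MrBayes code.

```python
def retained_samples(generations: int, sample_every: int, burnin_generations: int) -> int:
    """Number of sampled states kept after discarding the burn-in.

    Assumes one sample is written every `sample_every` generations and that
    the starting state is not counted (an assumption of this sketch).
    """
    total = generations // sample_every
    discarded = burnin_generations // sample_every
    return total - discarded


# Settings quoted in the text: 1 x 10^7 generations, sampling every 1 000th,
# 1 x 10^6 generations of burn-in, three independent runs.
per_run = retained_samples(10_000_000, 1_000, 1_000_000)
combined = 3 * per_run
print(per_run, combined)
```

Under these assumptions, the consensus tree summarizes the pooled post-burn-in samples from the three runs.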

Since species assignment for most C. gloeosporioides s.l. isolates was initially uncertain, we

diagnosed potentially distinct species in a phylogenetic context using the multi-locus

genealogical concordance approach (Taylor et al. 2000). Phylogenetic species were required to

meet the following criteria: (i) monophyly, (ii) strong phylogenetic support (e.g. bootstrap,

posterior probabilities) in the multi-locus analysis and (iii) genealogical concordance, as conflict

between gene trees could be interpreted as recombination among individuals within a species

(Taylor et al. 2000).

Divergence dating of the species tree and demographic analysis

The Bayesian MCMC analysis implemented in *BEAST (Heled & Drummond 2010), an

extension of the BEAST v.1.6.1 software package (Drummond & Rambaut 2007), was used to

jointly estimate the species tree and divergence times from our multi-locus dataset. Preliminary

simulations detected substantial heterogeneity in the substitution rates of the different

sequenced regions, making a single clock model inappropriate. Thus, we

analyzed eight partition schemes of increasing complexity, ranging from a single partition of

substitution and clock models, to the partitioning of the entire dataset in 8 unlinked substitution

models and 6 unlinked clock models, according to gene/intergenic positions (Table S2). The best

fit model of sequence evolution for each partition was estimated using MODELTEST v3.7

(Posada & Crandall 1998), under the AIC. We used a Birth and Death process as species tree

prior and a Piecewise linear and constant root model for population size. Preliminary runs using

an uncorrelated lognormal relaxed clock revealed that the σr (“CoefficientOfVariation”)

parameter value for the ITS partition was consistently < 0.3 with a frequency histogram abutting

0, suggesting that this partition does not significantly deviate from a strict clock. Therefore,

where applicable in our partition schemes, we used a strict clock for the ITS partition and

uncorrelated lognormal molecular clocks for all other partitions.

Genetic distances were converted into geological time units using an averaged substitution rate of

8.8 × 10^-9 per bp per year, corresponding to the median of estimates for several nuclear genomic

regions obtained by Kasuga et al. (2002). We fixed the substitution rate of ITS with this value,

since it was the only genomic region from our dataset included in the calibration study, and

estimated the rates for all other partitions using uniform priors [0, 1]. The final results from the

*BEAST MCMC analyses were obtained by combining two independent runs with 1 × 10^8

generations each, sampling parameters every 10 000 iterations, using LOGCOMBINER v1.6.1

(Drummond & Rambaut 2007). Tracer v1.4 (Rambaut & Drummond 2007) was used to assess

convergence and mixing for all parameters by visually inspecting the trace log and estimating the

Effective Sample Size (ESS) for each parameter; all ESS values were > 200 indicating adequate

mixing. A conservative 10% of each analysis was discarded as burn-in.
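
The rate calibration described above amounts to dividing a branch length, in substitutions per site, by a rate in substitutions per site per year. The short Python sketch below illustrates the unit conversion for the strict-clock case only. The function name branch_length_to_years and the example branch length are hypothetical and are not values from this study.

```python
ITS_RATE = 8.8e-9  # substitutions per bp per year, the rate quoted in the text


def branch_length_to_years(subs_per_site: float, rate: float = ITS_RATE) -> float:
    """Convert a strict-clock branch length (substitutions per site) to years."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return subs_per_site / rate


# Hypothetical branch length, chosen only to illustrate the units.
example_years = branch_length_to_years(8.8e-6)
print(round(example_years))
```

Partitions under a relaxed clock have branch-specific rates, so this simple division does not apply to them directly.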

In order to objectively select the most appropriate partition, we tested our partition schemes with

a Bayes Factors (BF) analysis. Following a previous study (Brown & Lemmon 2007), we

considered values for the test statistic 2ln(BF12) between model 1 and model 2 above 10 to be

evidence of significant support for model 2, values between 10 and -10 to indicate ambiguity and

values below -10 to indicate strong support for model 1. When faced with ambiguity, we opted for

the simpler model. BF were determined by calculating the marginal likelihood for each scheme

using Tracer v1.4 (Rambaut & Drummond 2007).
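
The threshold rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function name choose_scheme is hypothetical, the statistic is defined here as twice the difference in log marginal likelihoods (complex minus simple, so positive values favour the more complex scheme), and the log marginal likelihoods in the usage lines are made-up numbers, not results from this study.

```python
def choose_scheme(ln_ml_complex: float, ln_ml_simple: float, threshold: float = 10.0):
    """Apply a 2ln(BF) threshold rule to two log marginal likelihoods.

    Sign convention of this sketch: the statistic is
    2 * (ln_ml_complex - ln_ml_simple), so positive values favour the
    more complex partition scheme.
    """
    two_ln_bf = 2.0 * (ln_ml_complex - ln_ml_simple)
    if two_ln_bf > threshold:
        verdict = "complex"
    elif two_ln_bf < -threshold:
        verdict = "simple"
    else:
        # Ambiguous comparison: fall back on the simpler scheme.
        verdict = "ambiguous, keep simple"
    return two_ln_bf, verdict


# Hypothetical log marginal likelihoods, for illustration only.
print(choose_scheme(-1000.0, -1012.0))
print(choose_scheme(-1000.0, -1003.0))
```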

The demographic history of C. kahawae’s populations was reconstructed using the Gaussian

Markov random fields (GMRF) Bayesian Skyride (Minin et al. 2008), implemented in BEAST

v1.6.1 (Drummond & Rambaut 2007). The GMRF Bayesian Skyride was only specified for the

most appropriate partition previously selected using BF. The analysis was run twice for 1 × 10^8

generations, sampling at every 10 000 generations after an initial burn-in of 10%. As in the

*BEAST analysis, the performance of the MCMC procedure was evaluated in Tracer v1.4

(Rambaut & Drummond 2007), which was also used to create the Bayesian Skyride plot with the

median and corresponding credibility intervals of the estimated demographic parameters.

Population genetics analyses in C. kahawae and UG1

According to the phylogenetic analyses described above, a group of C. gloeosporioides s.l.

isolates very close to C. kahawae was detected (Fig. 1), here named Undescribed group 1 (UG1).

To better understand their relationship, median-joining haplotype networks were constructed for

each locus using the program NETWORK v4.6.0.0 (Bandelt et al. 1999). DNA polymorphism

statistics, measures of divergence using the net divergence statistic (Da, Nei 1987) and analyses of

recombination based on the four-gamete test (Hudson and Kaplan 1985) were computed in

DnaSP v5.0 (Rozas et al. 2003). To assess the degree of genetic differentiation, F-statistics were

calculated for all molecular markers, as implemented in ARLEQUIN (Excoffier et al. 2005), and

the Snn statistic (Hudson 2000) was estimated after 1 000 permutations.
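
To make two of the statistics above concrete, the following Python sketch computes the net divergence Da = Dxy - (πx + πy)/2 (Nei 1987) and applies the four-gamete test to a pair of sites. It is a simplified illustration, not the DnaSP implementation: it assumes aligned, gap-free sequences, uses uncorrected pairwise differences, and runs on toy sequences invented for the example. All function names are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, gap-free sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_within(seqs) -> float:
    """Average pairwise distance within one group (nucleotide diversity, pi)."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def mean_between(xs, ys) -> float:
    """Average pairwise distance between two groups (Dxy)."""
    return sum(p_distance(a, b) for a in xs for b in ys) / (len(xs) * len(ys))


def net_divergence(xs, ys) -> float:
    """Da = Dxy - (pi_x + pi_y) / 2."""
    return mean_between(xs, ys) - (mean_within(xs) + mean_within(ys)) / 2.0


def four_gamete_violation(seqs, i: int, j: int) -> bool:
    """True if sites i and j together show all four two-site haplotypes.

    Only biallelic site pairs are considered; under an infinite-site model,
    all four combinations imply at least one recombination event.
    """
    if len({s[i] for s in seqs}) != 2 or len({s[j] for s in seqs}) != 2:
        return False
    return len({(s[i], s[j]) for s in seqs}) == 4


# Toy sequences invented for illustration.
group_x = ["AAAAAAAAAA", "AAAAAAAAAT"]
group_y = ["AACCAAAAAA", "AACCAAAAAA"]
print(net_divergence(group_x, group_y))
print(four_gamete_violation(["AA", "AT", "TA", "TT"], 0, 1))
```

Subtracting the mean within-group diversity removes the polymorphism already present within each group, so Da reflects only the differences accumulated between them.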

Isolation with Migration analyses

Before investigating the history of isolation between these two groups, the standard neutral model

was tested for each marker using Tajima’s D and Fu and Li’s D* and F*, as implemented in

DnaSP v5.0. We used the program isolation-with-migration (IMa) (Hey and Nielsen 2004) to

assess whether an isolation with migration model would fit the data significantly better than a

strict isolation model and to estimate population mutation parameters (θ_Ck, θ_UG1 and ancestral θ_A),

migration rates (m_Ck>UG1 and m_UG1>Ck) and the time since the divergence of the ancestral population

of C. kahawae and UG1. Since IMa supports multi-locus datasets with different molecular rates,

the concatenated dataset of 6 nuclear markers was used to carry out the estimation of parameters.

Given the low sequence polymorphism and absence of sites with multiple substitutions, we

applied the infinite-site model of sequence evolution for all loci. Multiple preliminary runs were

performed to determine the appropriate prior upper bounds for the parameters and to assess the

most efficient heating scheme for the Metropolis coupled MCMC run. After these pilot runs, we

performed four independent runs with 100 Metropolis coupled chains with geometric heating

(increment parameters: g1 = 0.95; g2 = 0.5) for 5 × 10^5 steps, sampling at every 10th iteration,

after a burn-in period of 1 × 10^6 generations. Overall, 2 × 10^5 generations were saved and

combined for the final estimation of parameters by executing the program in L-mode. The L-

mode of IMa was also used to estimate joint parameter distributions and to examine nested

models with a reduced number of parameters, representing alternative hypotheses of gene flow

patterns. Hypothesis testing was performed by comparing the ln-likelihood ratio (LLR) statistic

between two models to a χ² distribution with k degrees of freedom, where k is the difference in

the number of parameters between models (Hey and Nielsen 2007). To convert parameter

estimates into biologically meaningful units, we used the per locus substitution rate of the ITS

partition (1.369 × 10^-6), obtained from the same averaged substitution rate per site per year as in

the *BEAST analyses.
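
The nested-model comparison described above can be illustrated with a small Python sketch of a likelihood-ratio test against a χ² distribution. It assumes the statistic is twice the difference in log-likelihood between the full and the nested model; the function names and the log-likelihood values in the usage line are hypothetical, and the tail probability is computed with the standard closed-form series for integer degrees of freedom rather than with IMa itself.

```python
import math


def chi2_sf(x: float, k: int) -> float:
    """Upper-tail probability of a chi-square variable with integer k degrees of freedom."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    if x <= 0:
        return 1.0
    if k % 2 == 0:
        # Even k: exp(-x/2) * sum_{r=0}^{k/2-1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, k // 2):
            term *= (x / 2.0) / r
            total += term
        return math.exp(-x / 2.0) * total
    # Odd k: normal tail plus a finite series in odd powers of sqrt(x).
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (k - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    tail = math.erfc(chi / math.sqrt(2.0))
    return tail + math.sqrt(2.0 / math.pi) * math.exp(-x / 2.0) * total


def llr_test(ln_l_full: float, ln_l_nested: float, k: int):
    """Return the test statistic and its chi-square p-value with k degrees of freedom."""
    stat = 2.0 * (ln_l_full - ln_l_nested)
    return stat, chi2_sf(stat, k)


# Hypothetical log-likelihoods for a full model and a nested model
# that fixes one parameter (k = 1).
print(llr_test(-10.0, -12.0, 1))
```

A small p-value indicates that the nested model, with fewer parameters, fits significantly worse than the full model.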

Phylogeography of C. kahawae

The relationships and structure of C. kahawae’s populations were assessed by ML and BMCMC

methods as described above. For each of the polymorphic sites within C. kahawae, the ancestral

state was estimated using SNAP Workbench (Aylor et al. 2006). Relevant alternative topologies

were tested to assess the robustness of the unconstrained inference of population relationships,

using the topological test (SH) of Shimodaira & Hasegawa (1999). One thousand replicates were

performed by re-sampling the partial likelihoods for each site (RELL model). Using the IDRISI

15 Andes software (Eastman 2006), a topographical map of a partial region of the African

continent was modeled, in order to highlight regions above 1400 m in altitude and to assess their

correlation with C. kahawae’s haplotype distribution.

Pathogenicity assays

To assess the pathogenicity of Colletotrichum isolates from other hosts on Arabica coffee fruits,

inoculations of fungal strains were carried out on green and ripe detached berries. Based on the

phylogenetic analyses conducted above, all isolates from UG1 and Undescribed group 2 (UG2)

phylogenetic groups (Fig. 1) were included, plus four C. kahawae isolates as positive controls

(Mal2, Cam1, Ang29 and Que2). For each isolate, spore suspensions of 10^6 spores mL^-1 were

obtained from 7-day-old sporulating cultures in Malt Extract Agar medium (Cultimed, Panreac

Quimica). Berries were collected from coffee plants maintained at CIFC greenhouses, washed

three times with distilled water and blotted dry with a sterilized paper. Two trials were then

undertaken: (i) 10 Arabica green coffee berries of genotype CIFC-7963 and (ii) 10 Arabica ripe

coffee berries of genotype CIFC-H420/2 susceptible to Coffee Berry Disease were inoculated

with 10 µl droplets of conidia suspension/berry from each fungal isolate. For both trials, an

additional 10-berry set was mock-inoculated with 10 µl sterile water/berry as a negative control.

After inoculation, berries were incubated for 24 h in the dark in a moisture chamber at 25 °C and

then maintained under these conditions but with a 12 h photoperiod. Symptoms were scored at

the 7th and 15th days after inoculation, and berries were classified either as showing Coffee Berry

Disease symptoms (1), or as asymptomatic (0).

Results

Evolutionary relationships of Colletotrichum spp. from coffee and other hosts

The evolutionary relationships between C. kahawae and C. gloeosporioides s.l. isolates were

established by constructing a multi-locus concatenated phylogeny of 102 samples of

Colletotrichum from coffee and other hosts collected worldwide. Using a six marker (ITS, β-

tub2, ApMAT, Apn25L, MAT1-2-1 and MAT5L) sequencing approach, we achieved a substantial

resolution of species relationships. A summary of the phylogenetic information for each

individual and combined molecular markers is provided in Table 1. Parsimony informative

content within individual nuclear sequences ranged from low (ITS: 3%; β-tub2: 13%) to

moderate (Apn25L: 20%; MAT5L: 18%; MAT1-2-1: 17%) and high (ApMAT: 37%). The

combined dataset consisted of 3783 bp of sequence data, with 764 parsimony informative sites

(20%).
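
For readers unfamiliar with the measure, a site is parsimony-informative when at least two character states each occur in at least two sequences. The Python sketch below applies that definition to a toy alignment and reproduces the percentage reported for the combined dataset. The function names and the toy alignment are hypothetical, and gaps and ambiguity codes are simply ignored, which is a simplification of this sketch.

```python
from collections import Counter


def is_parsimony_informative(column: str) -> bool:
    """True if at least two states each occur in at least two sequences.

    Only A, C, G and T are counted; gaps and ambiguity codes are ignored.
    """
    counts = Counter(c for c in column.upper() if c in "ACGT")
    return sum(1 for n in counts.values() if n >= 2) >= 2


def informative_sites(alignment):
    """Return the number and fraction of parsimony-informative columns."""
    columns = ["".join(col) for col in zip(*alignment)]
    n_informative = sum(is_parsimony_informative(c) for c in columns)
    return n_informative, n_informative / len(columns)


# Toy alignment invented for illustration (four sequences, four sites).
toy = ["AAGT", "AAGT", "ACGA", "ACCA"]
print(informative_sites(toy))

# Totals reported in the text for the combined dataset.
combined_percent = round(100 * 764 / 3783)
print(combined_percent)
```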

Phylogenetic reconstructions of the concatenated dataset using both ML and BMCMC methods

yielded the same topology with minor discrepancies in the support for some nodes (Fig. 1). The

phylogeny was mostly resolved with high statistical support and revealed a pattern of divergent

evolution between three main lineages derived from two ancestral splits. Since the split from C.

fragariae, the first lineage to diverge included the C. kahawae clade, along with two closely

related groups of isolates (Da = 0.53%) from several hosts other than Coffea spp. (UG1 and

UG2). These two groups include fungi comprising a very broad range of hosts and geographic

origins, as they were isolated from fruits, twigs and leaf lesions or as leaf endophytes from eleven

host species belonging to different taxonomic divisions and orders fairly distant from the

taxonomic placement of Coffea spp., in three continents (Fig. 1; Table S1). The phylogenetic

proximity was particularly high between C. kahawae and isolates from the UG1 group.

According to the genealogical concordance criteria alone, both groups should be regarded as a

single species, considering that UG1 could not be recovered as a distinct monophyletic group

with sufficient statistical support in either single- or multi-locus phylogenetic reconstructions

and that C. kahawae could not be completely distinguished from some UG1 isolates in four out

of six single-locus analyses (Fig. S1). However, since UG1 and C. kahawae clearly represent

distinct and mostly differentiated biological entities (see Results, Pathogenicity tests; Genetic

divergence and differentiation between C. kahawae and UG1) they will be considered hereafter

as two separate groups. Moreover, using the combined dataset, all C. kahawae isolates were

monophyletic and clearly distinguishable with high bootstrap and posterior probability values

(100/1). The remaining C. gloeosporioides s.l. isolates clustered in a large monophyletic group

comprising the two remaining lineages. The most anciently diverged lineage encompassed

representative isolates of C. gloeosporioides s.s. from lemon and olive. The third main lineage

was exclusively composed of isolates from coffee hosts belonging to at least four different

species, which revealed a fairly high divergence from C. kahawae (Da = 6.14%). In this lineage,

most of the unassigned C. gloeosporioides s.l. strains grouped with the C. siamense ex-holotype

in a large and highly supported monophyletic clade, within which a high degree of genealogical

discordance was observed (Fig. S1). We also found two clonal samples (Ang100 and Ang101)

that proved to be a distinct and well supported new monophyletic lineage (Undescribed group 3;

UG3). Even though a formal recognition of species boundaries within C. gloeosporioides s.l. on

coffee hosts is beyond the scope of this study, all of the diagnosed groups in Fig. 1 are in

accordance with the genealogical concordance phylogenetic species recognition guidelines. Only

the ITS and MAT5L partitions were unable to recover clear species boundaries, most likely due

to low nucleotide variability and small sequence size, respectively.

Divergence time estimates of the species tree and demographic analysis

From the eight partitioning schemes performed in *BEAST, the BF analysis favored the second

most complex model in which all six nuclear sequences presented distinct substitution and clock

models, with the MAT1-2-1 dataset further partitioned into intron and exon regions with two

codon partitioning, over similar models with three codon (2lnBF = 7) or without codon

partitioning (2lnBF = 37) (Table S3). The generated maximum clade credibility tree (Fig. 2)

presented a similar topology to that produced by the ML and BMCMC analyses of concatenated

data for the two first main lineages. However, species relationships within the third main lineage

were largely incongruent between methods with respect to the relative phylogenetic position of

taxa and statistical support. Unlike the well supported phylogeny obtained from the concatenated

analyses, the *BEAST species tree revealed a much weaker support for the relationships of these

species with C. fructicola, C. siamense and UG3 sharing a terminal and nearly polytomic

relationship, while C. asianum was the most anciently diverged species.
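
As a reading aid for the 2lnBF values reported above, the sketch below computes 2lnBF from two marginal log-likelihoods and labels the result with the Kass & Raftery (1995) rule-of-thumb scale. That scale is an assumption introduced here and is not cited in the original analysis; the function names and the example log-likelihoods are hypothetical.

```python
def two_ln_bf(ln_ml_a, ln_ml_b):
    """Return 2lnBF of model A over model B from marginal log-likelihoods."""
    return 2.0 * (ln_ml_a - ln_ml_b)


def interpret_2lnbf(value):
    """Label a 2lnBF value using the (assumed) Kass & Raftery 1995 scale."""
    if value < 0:
        return "favours the alternative model"
    if value < 2:
        return "not worth more than a bare mention"
    if value < 6:
        return "positive"
    if value <= 10:
        return "strong"
    return "very strong"


# Hypothetical marginal log-likelihoods chosen only to reproduce 2lnBF = 7.
example = two_ln_bf(-1000.0, -1003.5)
print(example, interpret_2lnbf(example))
print(37, interpret_2lnbf(37))
```

Under this assumed scale, the reported 2lnBF = 7 would read as strong support and 2lnBF = 37 as very strong support for the favored partitioning scheme.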

The Bayesian MCMC procedure implemented in *BEAST also allowed co-estimating the

posterior distribution of the time to the most recent common ancestor (TMRCA), including 95%

credibility intervals [Highest Posterior Density (HPD)], of any group of taxa. Thus, to construct a

time line of events, we estimated the TMRCA of all C. kahawae isolates as well as the

divergence time between C. kahawae and three taxa sets of particular interest: i) all C.

gloeosporioides s.l. isolates from Coffea spp. hosts; ii) all isolates from UG1; and iii) the closest

phylogenetic representative of UG1, isolate Cg432 (Table 2). The TMRCA of C. kahawae was

estimated to be between 320 and 4 784 years (95% HPD) with a median of 2 219 years. This

species most likely diverged from the sampled C. gloeosporioides s.l. species from coffee hosts

around 403 610 years (95% HPD, 164 000 to 686 000), contrasting with the relatively shallow

divergence of 11 547 years (95% HPD, 3 574 to 21 735) from UG1. However, given the lack of

monophyly for all isolates in this group and some heterogeneity in their phylogenetic distance to

C. kahawae (Da = 0.19% - 0.69%), we also estimated the TMRCA between C. kahawae and the

UG1 isolate Cg432. This allowed a better approximation of the divergence time between C.

kahawae and the closest non-C. kahawae relative, which was estimated to be 7 840 years (95%

HPD, 2 020 to 15 245).
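
The 95% HPD intervals quoted above are the shortest intervals containing 95% of the posterior samples. A minimal sketch of that calculation follows, assuming the posterior is available as a plain list of sampled values; the function name `hpd_interval` and the toy samples are hypothetical, and this is not the routine used by BEAST itself.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Return the shortest interval that contains `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")
    # Number of consecutive sorted samples the interval must cover.
    k = max(1, math.ceil(mass * n))
    # Slide a window of k samples and keep the narrowest one.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]


# Toy posterior sample (hypothetical values, in years).
toy = [0, 10, 11, 12, 13, 14, 30, 40, 50, 60]
print(hpd_interval(toy, mass=0.5))
```

Because the window is chosen by width rather than by equal tail areas, the interval can be asymmetric around the median, as in the estimates reported here.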

The demographic history of C. kahawae’s populations is depicted in Fig. 3 using a Bayesian

Skyride plot reconstruction. Setting the GMRF Bayesian Skyride as a tree prior in the BEAST

analysis resulted in older estimations for the TMRCA of C. kahawae (mean: 10 598 years; 95%

HPD, 3 207 to 19 889) and its divergence time from the closest non-C. kahawae relative (mean:

26 578 years; 95% HPD, 13 735 to 45 698). The time lag relative to the *BEAST estimates was

expected due to methodological differences between concatenation and species tree methods

(Heled & Drummond 2010) and thus, these time estimates should be interpreted in relative terms.

The historical demography of C. kahawae revealed a very recent and sharp drop in population

size, without significant evidence of recovery until the present day. Notably, the onset of the

downfall in C. kahawae’s population size (~25 000 years) roughly matches the estimated time of

divergence from the closest non-C. kahawae relative, pointing to the occurrence of a bottleneck

only after their separation (Fig. 3). To assess the potentially biasing effect of C. kahawae’s

population structure in the reconstruction of its demographic history, we also performed the same

GMRF Bayesian skyride analysis for each haplotypic group separately (Angola, Cameroon and

East Africa). The results were similar across all datasets, suggesting that our demographic

estimates were robust to the violation of the panmictic population assumption due to the genetic

structure of C. kahawae’s populations.

Genetic divergence and differentiation between C. kahawae and UG1

Haplotype networks constructed for each locus revealed a close relationship between C. kahawae

and UG1 (Fig. 4). Full haplotypes were shared between some UG1 isolates and C. kahawae in

several individual markers, although both groups could be completely distinguished in the

Apn25L and MAT1-2-1 datasets. Table 3 summarizes several interspecific divergence and

differentiation statistics. Consistent with a very recent separation, the net divergence was very

low across all individual loci (Da = 0.03-0.35). Intriguingly, out of the 38 polymorphic sites

among C. kahawae and UG1, there was an absence of shared polymorphisms and three

polymorphic sites were fixed in the Apn25L and MAT1-2-1 datasets. This suggests that while

both groups have separated quite recently, current levels of gene flow should be low.

Differentiation indexes, such as FST and Snn estimates, reflect the same pattern as they consistently

show high and significant levels of differentiation across all loci. Most of the polymorphic sites

were exclusive of the UG1 group (32), while only a small fraction (3) was exclusive of C.

kahawae, which further supports the effective population decline of the latter.
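
The net divergence statistic used above is conventionally defined as Da = dxy − (πx + πy)/2, i.e. the mean between-group distance minus the average within-group diversity. The sketch below illustrates that definition with uncorrected p-distances on toy alignments; the sequences and function names are hypothetical, and the published values may have been computed with a substitution-model correction.

```python
from itertools import combinations, product


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_within(seqs):
    """Mean pairwise distance within one group (0 if fewer than 2 sequences)."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def mean_between(group_x, group_y):
    """Mean pairwise distance between two groups (dxy)."""
    total = sum(p_distance(a, b) for a, b in product(group_x, group_y))
    return total / (len(group_x) * len(group_y))


def net_divergence(group_x, group_y):
    """Da = dxy - (pi_x + pi_y) / 2."""
    within = (mean_within(group_x) + mean_within(group_y)) / 2
    return mean_between(group_x, group_y) - within


# Toy groups: an invariant group and a more polymorphic one.
clonal = ["AAAAAAAAAA", "AAAAAAAAAA"]
diverse = ["AAAAAAAATT", "AAAAAAAATA"]
print(net_divergence(clonal, diverse))
```

Subtracting the within-group term is what keeps Da low for recently separated groups even when one of them, like UG1 here, carries most of the polymorphism.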

Isolation with migration analyses

For each marker dataset, neither Fu and Li’s D* and F* nor Tajima’s D deviated significantly

from 0 (P > 0.05; Table S4) and no recombination events were observed within loci.

Therefore, the hypotheses of selective neutrality and no recombination within loci, both

assumptions of the IMa models, could not be rejected.
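
For readers unfamiliar with the neutrality tests mentioned above, Tajima's D contrasts the mean number of pairwise differences (k) with the number of segregating sites scaled by a1 (S/a1); values near 0 are consistent with neutrality. The sketch below implements the standard formula on a toy alignment. The sequences are hypothetical and no significance test is included, so it only illustrates the statistic, not the analysis reported in Table S4.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's D for a list of aligned sequences of equal length."""
    n = len(seqs)
    if n < 3 or any(len(s) != len(seqs[0]) for s in seqs):
        raise ValueError("need at least 3 aligned sequences of equal length")
    # S: number of segregating (polymorphic) sites.
    s = sum(len(set(column)) > 1 for column in zip(*seqs))
    if s == 0:
        raise ValueError("D is undefined without segregating sites")
    # k: mean number of pairwise differences.
    pairs = list(combinations(seqs, 2))
    k = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Toy alignment in which every variant is a singleton (excess of rare alleles).
singletons = ["AAAA", "TAAA", "ATAA", "AATA"]
print(tajimas_d(singletons))
```

An excess of rare variants pulls D below zero, whereas variants at intermediate frequency push it above zero.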

The approximate posterior density curves for the isolation with migration model parameters that

resulted from the IMa analyses are shown in Fig. 5 and their maximum-likelihood estimates with

respective 95% Highest Posterior Density (HPD) intervals are provided in Table 4. Even though

there was insufficient information in the data to estimate the time since the split of the ancestral

population of C. kahawae and UG1, all other parameters had stronger signals in the

data and revealed a single major peak in their posterior distributions. The limited information in

the data as well as the lack of shared polymorphisms between populations may have also led to

nonzero tails in the posterior distribution of θUG1, θA and mUG1>Ck that prevented a reliable

estimation of the credibility intervals for these parameters. Despite these limitations, the four

replicate runs gave similar maximum likelihood estimates and posterior distributions for these

parameters, suggesting that the program was sampling from the stationary phase of the MCMC

run and that parameter values are robust and representative given their priors. To test the potential

impact of C. kahawae’s population structure on the estimation of IMa parameters, we have

performed additional runs for each haplotypic group separately. As in the BEAST GMRF Skyride

analysis, the results were similar across datasets suggesting a limited impact of C. kahawae’s

population structure on the estimation of parameters.

Estimates of current and ancestral effective population sizes revealed a decline of circa 40-fold in

the population size of C. kahawae compared to the ancestral population size, suggesting that this

species underwent a population bottleneck since its separation from UG1. Estimates of gene flow

were highly asymmetrical. No gene flow was detected from UG1 to C. kahawae but moderate

and significant gene exchange was detected from C. kahawae to UG1. However, it is important to

note that the full model did not fit the data significantly better than any of the nested models

compared, including the strict isolation model (Table S5). Thus, even though the full model

reveals an intriguing pattern of gene flow between C. kahawae and UG1, our data could not

reject the simpler hypothesis that both groups have been isolated since their divergence.
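
The fold decline quoted above follows directly from the ratio of the population mutation parameters. As a worked relation, assume that both populations share the same per-locus mutation rate u and the same ploidy-dependent constant c (for example 4 for diploids, 2 for haploids). The symbol θ_Ck for the C. kahawae parameter is notation introduced here for illustration.

```latex
\[
\theta_i = c\,N_i\,u
\quad\Longrightarrow\quad
\frac{N_A}{N_{Ck}}
  = \frac{\theta_A/(c\,u)}{\theta_{Ck}/(c\,u)}
  = \frac{\theta_A}{\theta_{Ck}}
  \approx 40
\]
```

Because c and u cancel, the relative decline does not depend on the mutation rate calibration, although absolute population sizes do.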

Phylogeography of C. kahawae

Our results confirmed the extremely low genetic variability within the C. kahawae clade (π =

0.00076; S = 3). However, three divergent haplotypes could be distinguished, named after their

geographic location: Angola, Cameroon and East Africa (Fig. 6). For the three polymorphic sites

within C. kahawae, ancestral states were inferred from the C. gloeosporioides sampling (Fig. 6a).

The Angola haplotype presented a nucleotide sequence identical to the inferred ancestral state,

while the Cameroon haplotype diverged by one non-synonymous mutation at the MAT1-2-1 gene,

which replaces a serine for a proline residue. The East Africa haplotype shared the same non-

synonymous mutation and diverged from the other haplotypes by two additional intronic

mutations at the β-tub2 marker. To assess the robustness of this inference, we tested an alternative

topology, in which we constrained the East African population as the most ancestral lineage and

the Angola population as the most derived. The likelihood score of the resulting tree was worse

than the unconstrained topology, with a marginally significant P-value (SH Test, P = 0.056).

Moreover, we could not detect the presence of migrant haplotypes within our C. kahawae

sampling. We further extended our phylogeographical analysis by modeling a topographical map

of a partial region of Africa, in order to highlight all regions above 1400 m of altitude, and the

result was overlaid on Fig. 6b as rough gray areas. East African regions revealed a larger

extension of highlands, which coincides with the Great Rift Valley area. Outside this area,

highland regions are sparsely distributed, mainly through South Africa, Namibia, Angola and

Cameroon, and isolated by long distances of lowland regions.
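
The highland overlay described above amounts to thresholding an elevation grid at 1400 m. A minimal sketch of that step follows, assuming the elevation model is available as a nested list of heights in metres. The grid values and function names are hypothetical, and this is not the GIS workflow used to produce Fig. 6b.

```python
THRESHOLD_M = 1400  # altitude cut-off taken from the text


def highland_mask(elevation_grid, threshold=THRESHOLD_M):
    """Return a grid of booleans: True where elevation exceeds the threshold."""
    return [[cell > threshold for cell in row] for row in elevation_grid]


def highland_fraction(elevation_grid, threshold=THRESHOLD_M):
    """Fraction of grid cells lying above the threshold."""
    mask = highland_mask(elevation_grid, threshold)
    cells = [flag for row in mask for flag in row]
    return sum(cells) / len(cells)


# Toy 2 x 2 elevation grid in metres (hypothetical values).
toy_grid = [[200, 1500],
            [1400, 2300]]
print(highland_mask(toy_grid))
print(highland_fraction(toy_grid))
```

Cells at exactly 1400 m are treated as lowland here, following the wording "above 1400 m".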

Pathogenicity assays

In the pathogenicity test with green berries, only C. kahawae isolates produced necrotic and

sunken lesions characteristic of Coffee Berry Disease symptoms on all 10 inoculated berries

(10/10 berries scored “1”). Green berries inoculated with isolates from UG1 and UG2 revealed

no necroses or any other symptoms (10/10 berries scored “0”). Conversely, neither C. kahawae

nor isolates from UG1 and UG2 groups were able to successfully colonize ripe berries.

Discussion

To our knowledge, this study describes the first comprehensive analysis of the evolutionary

relationships of C. kahawae with C. gloeosporioides s.l. isolates from diverse hosts worldwide,

aiming to shed light on the underlying speciation event. One of the most striking findings was that

none of the initial hypotheses asserting that C. kahawae would have emerged from a C.

gloeosporioides s.l. gene pool from Coffea spp. hosts could be supported by our results.

According to our phylogenetic analyses, C. kahawae appears to have been diverging from all C.

gloeosporioides s.l. isolates from coffee hosts for approximately 403 000 years, which is

inconsistent with a very recent origin by mutation and adaptation (Nutman & Roberts 1960) or

hybridization (Robinson 1976) from these populations. The most notable inconsistency, however,

derives from the existence of a taxonomically unclassified phylogenetic group sampled from

several hosts other than Coffea spp. (UG1) that is genetically similar to C. kahawae. Although

unexpected, the close relationship of UG1 with C. kahawae as well as its association with

genetically distant hosts clearly suggests an alternative scenario in which a host-jump preceded

the outbreak of Coffee Berry Disease epidemics, though the diversity of hosts that this group

covers hampers any confident inference on the host sources of this event. In Colletotrichum,

host-shifts have not been previously documented as a speciation driver, but there is recent evidence

that local adaptation and host specialization have structured populations of some species, such as

C. cereale (Crouch et al. 2009). Notwithstanding, a large amount of evidence has accumulated

showing that host-shift speciation, a particular case of ecological speciation, is one of the main

routes for the emergence of fungal pathogens (Couch et al. 2005, Stukenbrock & McDonald

2008, Zaffarano et al. 2008, Giraud et al. 2010).

“Rapid ecological speciation through host jump” hypothesis

Considering the onset of C. kahawae’s diversification and its divergence time from the closest

non-C. kahawae relative, C. kahawae and UG1 have only been separated for an average of 5 600

years. This represents a remarkably short period of time for speciation to occur and is in

agreement with the observation that crop pathogens can emerge rather swiftly in recent

timescales (Stukenbrock & McDonald 2008). However, the estimated time period for the origin

of C. kahawae is still older than expected from historical data (Firman & Waller 1977) and from

the onset of the cultivation of C. arabica in Africa (Bigger 2006). While C. kahawae could have

begun adapting to C. arabica in wild populations or on an intermediate host, such as another

Coffea spp., these scenarios are not supported by our data. A more plausible explanation could be

a downward bias in our molecular rate calibration resulting from the time dependency of

mutation rates (Ho et al. 2005). It has been recently demonstrated that mutation rates in recent

lineages accelerate exponentially towards the present, biasing divergence time estimates to higher

values when old calibrations are employed (Ho et al. 2011). This would leave room for fitting the

more recent historical time estimates in this molecular-derived scenario.

The close phylogenetic relationship between C. kahawae and UG1 also raises the question of

whether they represent good species or simply structured populations from a single species. Even

though we lacked a solid basis to consider both groups as separate species solely from a

phylogenetic standpoint, our results from the pathogenicity tests showed that, unlike C. kahawae,

isolates from UG1 were unable to cause Coffee Berry Disease on detached green berries and

neither group was able to infect ripe berries. Green Arabica coffee berries thus seem to present an

ecological source of divergent selection to which C. kahawae successfully adapted, while isolates

from UG1 appear to suffer a sharp fitness decrease that greatly compromises their survival. At

the molecular level, C. kahawae and UG1 also seem to represent well differentiated groups

according to a combination of significant and elevated differentiation indexes across all studied

loci and a complete segregation of polymorphic sites. Under the isolation with migration model,

migration estimates revealed an absence of gene flow to C. kahawae but, unexpectedly, a

moderate amount of gene flow into UG1 was detected. The transfer of genetic material from C.

kahawae to UG1 after their divergence is intriguing because it implies the occurrence of a sexual

stage of C. kahawae in nature as well as the existence of a suitable alternate host or substrate

where mating could occur, both of which are yet to be reported (Firman & Waller 1977, Bridge

et al. 2008). Nevertheless, the migration signal in our data was not strong enough to allow the

rejection of a strict isolation model, though future work with a larger sampling of the UG1 group

and more variable loci is still required. Overall, it is reasonable to conclude that C. kahawae and

UG1 represent ecologically distinct and isolated groups that evolved significantly different

pathogenic abilities in a short period of time, of which only C. kahawae should trigger major

disease control and biosecurity measures on C. arabica crops.

During the brief period that C. kahawae and UG1 were separated, historical demographic

reconstructions and patterns of DNA polymorphism consistently showed a sudden and severe

drop in the effective population size of C. kahawae, as might be expected following a host-jump.

For such a host-jump to occur, isolates from the UG1 group must have been within the cruising

range of C. arabica during the initial stages of C. kahawae speciation, even if the geographic

distribution of both groups became disjunct in later stages. In such a scenario, the pattern of low

genetic diversity in C. kahawae could have been produced by the exertion of strong disruptive

selection during the first stages of adaptation to C. arabica, coupled with the ability of the fungus

to undergo repeated cycles of asexual propagation. Asexual reproduction could greatly amplify

new advantageous mutations to very high frequencies along with the entire genome by

hitchhiking (Andolfatto 2001). This would eliminate polymorphisms and maintain only the intact

genome of those individuals in the population having the favored mutations, evidencing the

strong genetic bottleneck and clonal pattern observed in C. kahawae as well as the lack of shared

polymorphisms with UG1.

Adaptation to a new host is generally most efficient when gene flow of ancestral genes into the

adapting population is severely reduced or absent (Giraud et al. 2010), which may be difficult to

achieve when the source population lies within cruising range. Nonetheless, two significant

intrinsic barriers to gene flow may have indeed evolved between C. kahawae and UG1 that can

explain their high differentiation and low levels of gene flow. First, the process of host

specialization itself can give rise to an isolation mechanism from the earliest stages of divergence

in fungal populations (Gladieux et al. 2010). Because most pathogenic Ascomycetes

(Colletotrichum included; Cisar & TeBeest 1999) can only mate on their host after mycelial

development, any genetic variation favoring an adaptation to C. arabica can pleiotropically cause

assortative mating and restrict gene flow, as any unfit immigrant will be unable to grow and

reproduce in such environment (Giraud 2006; Giraud et al. 2010). This means that the differential

survivorship of ill-adapted UG1 and adapted C. kahawae isolates on C. arabica constitutes a

legitimate and significant pre-zygotic reproductive barrier, recently named immigrant inviability

(Nosil et al. 2005). The unique adaptation of C. kahawae could be viewed as a “magic trait”

scenario (Gavrilets 2004), where assortative mating arises as a by-product of host-specialization.

Under strong selection, this barrier has already been shown to quickly and significantly prevent

gene flow in other fungal pathogens undergoing adaptation to their hosts, such as Venturia

inaequalis’s sympatric populations from apple varieties with and without the Vf resistance gene

(Gladieux et al. 2011). Second, the transition of C. kahawae to a predominantly asexual

reproductive strategy could have been a major reproductive barrier, allowing for multiple

generations of selection for local adaptation without the pernicious effect of recombination

(Zeigler 1998, Giraud et al. 2010). Altogether, it is reasonable to expect that the combination of

immigrant inviability and the predominantly asexual behavior of C. kahawae would have been

rather effective in keeping populations separated during the early stages of divergence. However,

additional work with a strategic and focused sampling of the UG1 group will be required to better

understand not only the geography of the later stages in C. kahawae speciation but also the

mechanisms underlying the adaptation of this fungal pathogen to C. arabica.

Phylogeography of C. kahawae

The phylogeographic knowledge of C. kahawae can provide important information that may help

pinpoint potential regions for the occurrence of its source population. In fact, our genetic

analysis revealed the first consistent and unambiguous population structure of C. kahawae. Based

on single nucleotide polymorphisms, three slightly divergent haplotypes highly correlated with

their geographic distribution were found (Angola, Cameroon and East Africa), confirming

previous indications of an incipient geographic structuring (Bridge et al. 2008). Strikingly, the

ancestral state inference suggests that the Angola haplotype is the most ancestral among the

studied isolates, while those from Kenya and the remaining East African countries clustered

together as the most derived lineage. This provides an alternative view for the geographic origin

of C. kahawae, as compared to the current understanding, which follows the premise that C.

kahawae co-evolved with coffee hosts and is based on disease reports and field observations.

Arabica coffee occupies only a small fraction of the plantations in Angola, mainly in the central

plateau, while 98% of coffee production derives from C. canephora varieties. The first

introductions of C. arabica in this country occurred in the 18th century (A. Mendes Gaspar,

personal communication), and the first reports of Coffee Berry Disease date back to 1930

(Beynon et al. 1995), just eight years after its discovery in Kenya. However, we stress that

information obtained from historical data can be flawed because it is biased towards regions

where disease incidence is higher, and where substantial scientific effort has been focused on

monitoring plant diseases. Moreover, as we suggested above, the ancestral population of C.

kahawae most likely emerged from hosts other than Coffea spp., which may circumvent the

difficulties in reconciling our results with historical data, since speciation via host-shift can be

rather swift. Adding the fact that the transport of infected plant material has reached an

unprecedented global scale (Stukenbrock & McDonald 2008), this greatly increases the

likelihood for the hypothesis of a host-jump of the ancestral population of C. kahawae into the

newly arrived C. arabica plants in Angola.

The population structure of C. kahawae also reveals no evidence of migration between the

geographical locations of each haplotype. This may be explained by rare sequential

introductions with subsequent geographic isolation. Arabica coffee growing areas in Angola,

Cameroon and East Africa are separated by extensive lowland areas, which are not suitable for

the pathogen or the host (Firman & Waller 1977), thus representing a potentially effective barrier

to migration. In such a scenario, bottleneck events during these rare introductions can generate drift

pulses in each of the introduced populations, which become genetically differentiated from each

other whilst retaining their source-introduction relationship (Estoup & Guillemaud 2010).

Accordingly, our results suggest that after a hypothetical origin of the Angola population, an

introduction into Cameroon followed, and from there into the East African countries, while each of

the established populations remained isolated in their respective highland areas. However, in

invasion biology, evolutionary scenarios are often characterized by small divergence times, which

may decrease the likelihood of identifying the true source-sink relationship between populations

due to the stochasticity of the process (Estoup & Guillemaud 2010). Moreover, given the vastness

of coffee plantations in the East African countries, particularly in Ethiopia where C. arabica also

occurs naturally, we cannot exclude the presence of unsampled ancestral haplotypes in these

regions that could alter our inferences. However, in such a scenario, the excessive number of

additional steps required to explain the current phylogeographic pattern renders this hypothesis

unlikely. Thus, even though our dataset suggests an Angolan origin for C. kahawae, sampling

more isolates and polymorphic loci will certainly provide a much more reliable and robust insight

on the evolution of C. kahawae’s populations.

Conclusions

Altogether, our work represents an important step in the understanding of the evolutionary and

speciation history of C. kahawae that will certainly have implications and launch new directions

on its future research. Having found very little support for the current understanding of these

events, we postulate an alternative hypothesis of rapid ecological speciation by a very recent

host-jump and subsequent host specialization to explain the emergence of C. kahawae on Arabica

coffee. Two far-reaching and intriguing implications of this hypothesis are that new and severe

fungal pathogens are indeed able to successfully adapt and speciate in a remarkably short time

span, particularly when driven by divergent natural selection, and that intrinsic and common

biological traits of fungal populations may greatly facilitate their emergence. Our results also

highlight the power and value of molecular data when inferring dissemination patterns of

emerging pathogens. Unlike the limited information retrieved from historical data, the population

structure of C. kahawae revealed an alternative and more objective center of origin and that the

topography of the African continent may have had a pivotal role in shaping and limiting its

dispersal.

Acknowledgments

At FCUL we thank our colleagues, Ana Vieira and Tiago Jesus, for discussions and constructive

criticisms during the elaboration of this manuscript. At CIFC/IICT, we appreciate the technical

support provided by Sandra Sousa Emídio. For supplying additional and valuable isolates for this

work we are also indebted to Ana Paula Ramos, at ISA/UTL, Portugal, and Peter Johnston and

Bevan Weir, at Landcare Research, New Zealand. Lei Cai acknowledges grants CAS-KSCX2-

EW-J-6/NSFC31070020. This work was financially supported by the Portuguese Foundation for

Science and Technology (FCT) and IICT, Ministério da Ciência, Tecnologia e Ensino Superior,

Portugal.

References

Andolfatto P (2001) Adaptive hitchhiking effects on genome variability. Current Opinion in

Genetics & Development, 11, 635–641.

Aylor DL, Price EW, Carbone I (2006) SNAP: combine and map modules for multilocus

population genetic analysis. Bioinformatics, 22, 1399–1401.

Bandelt HJ, Forster P, Röhl A (1999) Median-joining networks for inferring intraspecific

phylogenies. Molecular Biology and Evolution, 16, 37–48.

Beynon S, Coddington A, Lewis BG, Várzea V (1995) Genetic variation in the Coffee Berry

Disease pathogen, Colletotrichum kahawae. Physiological and Molecular Plant Pathology, 46,

457–470.

Bigger M (2006) The dissemination of coffee cultivation throughout the world. Tropical

Agriculture Association Newsletter, 26, 15–19.

Bridge PD, Waller JM, Davies D, Buddie AG (2008) Variability of Colletotrichum kahawae in

relation to other Colletotrichum species from tropical perennial crops and the development of

diagnostic techniques. Journal of Phytopathology, 156, 274–280.

Brown A, Sreenivasaprasad S, Timmer L (1996) Molecular characterization of slow-growing

orange and key lime anthracnose strains of Colletotrichum from citrus as C. acutatum.

Phytopathology, 86, 523–527.

Brown JM, Lemmon AR (2007) The importance of data partitioning and the utility of Bayes

factors in Bayesian phylogenetics. Systematic Biology, 56, 643–655.

Cannon PF, Buddie AG, Bridge PD (2008) The typification of Colletotrichum gloeosporioides.

Mycotaxon, 104, 189–204.

Cisar C, TeBeest D (1999) Mating system of the filamentous ascomycete, Glomerella cingulata.

Current Genetics, 35, 127–133.

Couch BC, Fudal I, Lebrun M-H, Tharreau D, Valent B, van Kim P, Nottéghem J-L, Kohn LM

(2005) Origins of host-specific populations of the blast pathogen Magnaporthe oryzae in crop

domestication with subsequent expansion of pandemic clones on rice and weeds of rice.

Genetics, 170, 613–630.

Coyne JA, Orr HA (2004) Speciation. Sinauer Associates, Inc., Sunderland, Massachusetts.

Crouch JA, Tredway L, Clarke B, Hillman B (2009) Phylogenetic and population genetic

divergence correspond with habitat for the pathogen Colletotrichum cereale and allied taxa across

diverse grass communities. Molecular Ecology, 18, 123–135.

Dieckmann U, Doebeli M, Metz JAJ, Tautz D (2004) Adaptive Speciation. Cambridge University

Press, Cambridge.

Drummond AJ, Rambaut A (2007) BEAST: Bayesian evolutionary analysis by sampling trees.

BMC Evolutionary Biology, 7, 214.

Eastman JR (2006) IDRISI Andes, Clark University, Worcester, Massachusetts.

Estoup A, Guillemaud T (2010) Reconstructing routes of invasion using genetic data: why, how

and so what? Molecular Ecology, 19, 4113–4130.

Excoffier L, Laval G, Schneider S (2005) Arlequin ver. 3.0: An integrated software package for

population genetics data analysis. Evolutionary Bioinformatics Online, 1, 47–50.

Firman I, Waller J (1977) Coffee Berry Disease and other Colletotrichum diseases of coffee.

Phytopathological Papers, 20, 1–53.

Gavrilets S (2004) Fitness Landscapes and the Origin of Species. Princeton University Press,

Princeton, New Jersey.

Giraud T (2006) Selection against migrant pathogens: the immigrant inviability barrier in

pathogens. Heredity, 97, 316–318.

Giraud T, Gladieux P, Gavrilets S (2010) Linking the emergence of fungal plant diseases with

ecological speciation. Trends in Ecology and Evolution, 25, 387–395.

Gladieux P, Caffier V, Devaux M, Le Cam B (2010) Host-specific differentiation among

populations of Venturia inaequalis causing scab on apple, pyracantha and loquat. Fungal

Genetics and Biology, 47, 511–521.

Gladieux P, Guérin F, Giraud T, Caffier V, Lemaire C, Parisi L, Didelot F, Le Cam B (2011)

Emergence of novel fungal pathogens by ecological speciation: importance of the reduced

viability of immigrants. Molecular Ecology, 20, 4521–4532.

Heled J, Drummond AJ (2010) Bayesian inference of species trees from multilocus data.

Molecular Biology and Evolution, 27, 570–580.

Hendry AP, Nosil P, Rieseberg LH (2007) The speed of ecological speciation. Functional

Ecology, 21, 455–464.

Hey J, Nielsen R (2004) Multilocus methods for estimating population sizes, migration rates and

divergence time, with application to the divergence of Drosophila pseudoobscura and D.

persimilis. Genetics, 167, 747–760.

Hey J, Nielsen R (2007) Integration within the Felsenstein equation for improved Markov chain

Monte Carlo methods in population genetics. Proceedings of the National Academy of Sciences

USA, 104, 2785–2790.

Ho SYW, Phillips MJ, Cooper A, Drummond AJ (2005) Time dependency of molecular rate

estimates and systematic overestimation of recent divergence times. Molecular Biology and

Evolution, 22, 1561–1568.

Ho SYW, Lanfear R, Bromham L, Phillips MJ, Soubrier J, Rodrigo AG, Cooper A (2011) Time-dependent rates of molecular evolution. Molecular Ecology, 20, 3087–3101.

Hudson RR (2000) A new statistic for detecting genetic differentiation. Genetics, 155, 2011–

2014.

Hudson RR, Kaplan N (1985) Statistical properties of the number of recombination events in the

history of a sample of DNA sequences. Genetics, 111, 147–164.

Jones KE, Patel NG, Levy MA, Storeygard A, Balk D, Gittleman JL, Daszak P (2008) Global

trends in emerging infectious diseases. Nature, 451, 990–993.

Kareiva P, Watts S, McDonald R, Boucher T (2007) Domesticated nature: shaping landscapes

and ecosystems for human welfare. Science, 316, 1866–1869.

Kasuga T, White TJ, Taylor JW (2002) Estimation of nucleotide substitution rates in

Eurotiomycete fungi. Molecular Biology and Evolution, 19, 2318–2324.

Katoh K, Toh H (2009) Recent developments in the MAFFT multiple sequence alignment

program. Briefings in Bioinformatics, 9, 286–298.

Manuel L, Talhinhas P, Várzea V, Neves-Martins J (2010) Characterization of Colletotrichum

kahawae isolates causing coffee berry disease in Angola. Journal of Phytopathology, 158, 310–

313.

McDonald J (1926) A preliminary account of a disease of green coffee berries in Kenya.

Transactions of the British Mycological Society, 11, 145–154.

Minin VN, Bloomquist EW, Suchard MA (2008) Smooth skyride through a rough skyline:

Bayesian coalescent-based inference of population dynamics. Molecular Biology and Evolution,

25, 1459–1471.

Munday PL, Herwerden LV, Dudgeon CL (2004) Evidence for sympatric speciation by host shift

in the sea. Current Biology, 14, 1498–1504.

Nei M (1987) Molecular Evolutionary Genetics. Columbia University Press, New York.

Nosil P, Vines TH, Funk DJ (2005) Reproductive isolation caused by natural selection against

immigrants from divergent habitats. Evolution, 59, 705–719.

Nutman F, Roberts F (1960) Investigations on a disease of Coffea arabica caused by a form of

Colletotrichum coffeanum Noack. I. Some factors affecting infection by the pathogen.

Transactions of the British Mycological Society, 43, 489–505.

Nylander JAA (2004) MrModeltest v2. Program distributed by the author. Evolutionary Biology

Center, Uppsala University.

O’Donnell K, Cigelnik E (1997) Two divergent intragenomic rDNA ITS2 types within a

monophyletic lineage of the fungus Fusarium are nonorthologous. Molecular Phylogenetics and

Evolution, 7, 103–116.

Pina-Martins F, Paulo OS (2008) Concatenator: sequence data matrices handling made easy.

Molecular Ecology Resources, 8, 1254–1255.

Posada D, Crandall KA (1998) Modeltest: testing the model of DNA substitution. Bioinformatics,

14, 817–818.

Prihastuti H, Cai L, Chen H, McKenzie E, Hyde KD (2009) Characterization of Colletotrichum

species associated with coffee berries in northern Thailand. Fungal Diversity, 39, 89–109.

Rambaut A, Drummond AJ (2007) Tracer v1.4. Available from http://beast.bio.ed.ac.uk/Tracer.

Robinson RA (1974) Terminal report of the FAO coffee pathologist to the government of

Ethiopia. FAO, Rome AGO/74/443, 16pp.

Robinson RA (1976) Plant Pathosystems. Springer-Verlag, Berlin.

Ronquist F, Huelsenbeck JP (2003) MrBayes 3: Bayesian phylogenetic inference under mixed

models. Bioinformatics, 19, 1572–1574.

Rozas J, Sánchez-DelBarrio JC, Messeguer X, Rozas AR (2003) DnaSP: DNA polymorphism

analyses by the coalescent and other methods. Bioinformatics, 19, 2496–2497.

Rundle HD, Nosil P (2005) Ecological speciation. Ecology Letters, 8, 336–352.

Schluter D (2009) Evidence for ecological speciation and its alternative. Science, 323, 737–741.

Schluter D, Conte GL (2009) Genetics and ecological speciation. Proceedings of the National

Academy of Sciences USA, 106, 9955–9962.

Servedio MR, Doorn GSV, Kopp M, Frame AM, Nosil P (2011) Magic traits in speciation:

“magic” but not rare? Trends in Ecology and Evolution, 26, 389–397.

Shimodaira H, Hasegawa M (1999) Multiple comparisons of loglikelihoods with applications to

phylogenetic inference. Molecular Biology and Evolution, 16, 1114–1116.

Silva DN, Talhinhas P, Cai L, Várzea V, Paulo OS, Batista D (2012) Application of the

Apn2/MAT locus to improve the systematics of the Colletotrichum gloeosporioides complex: An

example from coffee (Coffea spp.) hosts. Mycologia, DOI:10.3852/11-145.

Stukenbrock E, McDonald B (2008) The origins of plant pathogens in agro-ecosystems. Annual

Review of Phytopathology, 46, 75–100.

Swofford DL (2003) PAUP*: Phylogenetic Analysis Using Parsimony (* and Other Methods),

Version 4. Sinauer Associates, Sunderland, MA.

Taylor JW, Jacobson DJ, Kroken S, Kasuga T, Geiser DM, Hibbett DS, Fisher MC (2000)

Phylogenetic species recognition and species concepts in fungi. Fungal Genetics and Biology, 31,

21–32.

Via S (2001) Sympatric speciation in animals: the ugly duckling grows up. Trends in Ecology

and Evolution, 16, 381–390.

Winkler IS, Mitter C, Scheffer SJ (2009) Repeated climate-linked host shifts have promoted

diversification in a temperate clade of leaf-mining flies. Proceedings of the National Academy of

Sciences USA, 106, 18103–18108.

Zaffarano PL, McDonald BA, Linde CC (2008) Rapid speciation following recent host shifts in

the plant pathogenic fungus Rhynchosporium. Evolution, 62, 1418–1436.

Zeigler RS (1998) Recombination in Magnaporthe grisea. Annual Review of Phytopathology, 36,

249–275.

Data accessibility

List of isolates, natural host, country and geographic region: Table S1 in supporting information.

DNA sequences were deposited in the EMBL database with the accession numbers presented in

Table S1.

Multiple sequence alignments for each nuclear marker were archived in the Dryad repository: doi:

*BEAST, GMRF Skyride and IMa input files were archived in the Dryad repository: doi:

Figure legends:

Fig. 1 50% majority-rule Bayesian tree based on the concatenated six-marker dataset, illustrating the evolutionary relationships between Colletotrichum kahawae and C. gloeosporioides s.l. from coffee and other hosts. Isolates from Coffea spp. hosts are followed by a black bar, while isolates from other hosts are followed by a grey bar and a code, for which a key is provided on the right. The geographic origin of each isolate is underlined. The tree was rooted with C. fragariae. For clarity, C. kahawae is represented by a single clade; the detailed phylogenetic reconstruction is provided in Fig. 6a.

Fig. 2 Time-calibrated Bayesian species tree resulting from the *BEAST analysis based on the

six-marker dataset and on the best partitioning scheme selected from the BF analysis. Posterior

probabilities are given above the respective node. Scale numbers correspond to years before

present.

Fig. 3 GMRF Bayesian Skyride plot depicting fluctuations in the population size of

Colletotrichum kahawae through time. The x-axis is in units of years before present and the y-

axis is equal to Neτ. The thick black line is the median estimate and the gray dashed lines delimit

the 95% Highest Posterior Density. The vertical dashed line represents the median of the time to

the most recent common ancestor between C. kahawae and the closest non-C. kahawae relative.

Fig. 4 Median-joining haplotype networks depicting the relationship between Colletotrichum

kahawae and Undescribed group 1 (UG1). Haplotypes are presented as pie charts with size

proportional to the number of individuals.

Fig. 5 The posterior probabilities of parameter estimates of the isolation with migration model

(IMa).

Fig. 6 Phylogeographic patterns of Colletotrichum kahawae. (a) Detailed phylogenetic reconstruction of population relationships within the C. kahawae clade, using the combined six-marker dataset. Bootstrap and posterior probability values are provided above each branch. The three-nucleotide combination provided below each branch represents the mutational events that occurred along the evolution of the three populations. The combination of the Angola population was inferred as the ancestral state. Mutations are highlighted with an asterisk and the source gene is provided. KA, non-synonymous mutation; Ki, intronic mutation. (b) Geographic distribution of the three divergent C. kahawae populations, with the respective location key provided in the upper left corner. Countries are highlighted with different colors corresponding to the existing haplotypes. Rough orange areas on the map represent regions above 1,400 m of altitude.

Table 1 Summary of the phylogenetic information for the individual and combined nuclear regions used in this study

Nuclear region  FLᵃ   Indels  NCᵇ   PIᶜ  %PIᵈ  Vᵉ  Modelᶠ           Modelᵍ
ITS             489   2       487   14   3%    1   TrNef+I          SYM+I
β-tub2          575   34      541   76   13%   5   SYM+G            SYM+G
Apn25L          883   36      847   172  20%   32  GTR+G            GTR+G
ApMAT           782   84      698   287  37%   29  HKY+G            HKY+G
MAT5L           213   6       194   38   18%   11  TrN+G            HKY+G
MAT1-2-1ʰ       843   1       842   141  17%   18  K81+I/GTR/SYM+G  GTR+I/GTR/SYM
Combined        3783  130     3606  764  20%   72  -                -

ᵃ Fragment length

ᵇ Nucleotide characters (bp)

ᶜ Parsimony informative characters (bp)

ᵈ % of parsimony informative characters

ᵉ Variable uninformative characters (bp)

ᶠ Best-fit model of DNA evolution, under the AIC, implemented in ModelTest

ᵍ Best-fit model of DNA evolution, under the AIC, implemented in MrModelTest

ʰ Models of DNA evolution correspond to 1st and 2nd codon position/3rd codon position/intron

Table 2 Bayesian estimates of the time (in years) to the most recent common ancestor (TMRCA) of four taxa sets of interest. Values in parentheses are 95% Highest Posterior Density intervals. The divergence estimates were calculated using *BEAST v1.6.1 with the best partitioning scheme previously selected with a BF analysis

Taxa set                  TMRCA                   SDᵃ   ESSᵇ
Ck                        2,219 (320; 4,784)      64    428
Ck & UG1                  11,547 (3,574; 21,735)  133   1470
Ck & Cg432                7,840 (2,020; 15,245)   142   743
Ck & Cg from Coffea spp.  403k (149k; 647k)       2.6k  2811

ᵃ Standard deviation of mean

ᵇ Effective Sample Size

Table 3 Divergence and differentiation between Undescribed group 1 (UG1) and Colletotrichum kahawae

Locus     Daᵃ (%)  Vᵇ  VFᶜ  VSᵈ  PCkᵉ  PUG1ᵉ  FSTᶠ      Snnᶠ        Rmᵍ
ApMAT     0.09     7   0    0    0     7      0.536***  0.88596***  0
Apn25L    0.12     13  1    0    0     12     0.868***  0.93525***  0
ITS       0.03     4   0    0    0     4      0.634**   0.88596***  0
MAT1-2-1  0.31     7   2    0    1     4      0.845***  1***        0
MAT5L     0.35     2   0    0    0     2      0.913***  0.96610***  0
β-tub2    0.13     5   0    0    2     3      0.425**   0.84394***  0
Total     0.14     38  3    0    3     32     0.757***  1***

ᵃ Net divergence (Nei 1987)

ᵇ Total number of polymorphic sites

ᶜ Number of fixed polymorphisms

ᵈ Number of shared polymorphisms

ᵉ Number of exclusive polymorphisms for C. kahawae and UG1

ᶠ Differentiation statistics measured by FST (Excoffier et al. 2005) and Snn (Hudson 2000)

ᵍ Minimum number of recombination events

Statistical significance: ** P < 0.05; *** P < 0.01

Table 4 Maximum-likelihood estimates (MLE) and respective 95% Highest Posterior Density (HPD) intervals for parameters of the isolation with migration model for the clade composed of Undescribed group 1 (UG1) and Colletotrichum kahawae (Ck)

               θA     θCk   θUG1   mUG1>Ck  mCk>UG1
MLE            6.38   0.15  5.45   0.88     1.45
Lower 95% HPD  0      0.05  1.09   0        0
Upper 95% HPD  13.97  0.29  12.29  2.73     2.59

[Fig. 1 image: Bayesian tree; scale bar 0.02. Clade labels: C. kahawae, UG1, UG2, UG3, C. gloeosporioides s.s., C. siamense, C. asianum, C. fructicola, C. fragariae. Branch support legend: posterior probability > 0.95 with bootstrap > 70; posterior probability > 0.95 with bootstrap < 70. Host key: Coffea spp. (Angiosperm; Gentianales); non-Coffea spp.: Oe, Olea europaea (Angiosperm; Asterids); Cl, Citrus limon (Angiosperm; Rosids); Ca, Camellia sp. (Angiosperm; Asterids); Ke, Kunzea ericoides (Angiosperm; Rosids); Hp, Hypericum perforatum (Angiosperm; Rosids); Pa, Persea americana (Angiosperm; Magnoliids); Mi, Mangifera indica (Angiosperm; Rosids); Pt, Podocarpus totara (Gymnosperms; Pinopsida); Vl, Vitex lucens (Angiosperm; Asterids); Co, Coprosma sp. (Angiosperm; Asterids). Tip labels by country of origin: Angola: Ang40, Ang52, Ang84, Ang91, Ang95, Ang96, Ang97, Ang99, Ang100, Ang101; Australia: C1262.12; Brazil: Bra5, Bra8, Bra9; China: Chi4; Colombia: Col1, Col2; Germany: C1275.8; Kenya: CCA3, CCA4, CCM5, CCM6, CCM7; Malawi: Mal5; New Zealand: C880.1, C1206.3, C1252.12, C1252.22, C1282.3, C1282.4, C1288.1; Portugal: CR21, Cg432, PR220, PT111; Thailand: BPD-I2, BPD-I4, BPD-I16, Thai1, Thai2, Thai3, Thai4; USA: C1291.]

[Fig. 2 image: time-calibrated species tree. Time axis (years before present): 0 to 500,000. Taxa: C. siamense, UG3, C. kahawae, C. gloeosporioides, UG1, C. asianum, C. fragariae, C. fructicola, UG2. Node posterior probabilities shown: 1, 1, 0.53, 0.89, 1, 1, 0.94.]

[Fig. 3 image: GMRF Skyride plot. x-axis: Time (yrs), 0 to 100,000; y-axis: Population Size, log scale from 1.E4 to 1.E8.]

[Fig. 4 image: one haplotype network per locus. Panels: ApMAT, Apn25L, ITS, MAT1-2-1, MAT5L, β-tub2.]

[Fig. 5 image: posterior probability curves (y-axis: Probability). Population size panels: θA, θUG1, θCk. Migration panels: mCk>UG1, mUG1>Ck.]

[Fig. 6 image: (a) C. kahawae clade with Angola, Cameroon and East African populations; branch support values 81/1, 54/.98 and 100/1; nucleotide combinations ..T..T..T.., ..C..T..T.. and ..C..C..A..; ancestral state marked; asterisked mutations in MAT1-2-1 and β-tub2, labelled KA or Ki. (b) Map with location key: Cameroon, Angola, Zimbabwe, Malawi, Tanzania, Kenya, Ethiopia, Uganda, Rwanda, Burundi.]