Title: Host-jump drives rapid and recent ecological speciation of the emergent fungal pathogen
Colletotrichum kahawae
Authors: Diogo Nuno Silva1,2, Pedro Talhinhas1, Lei Cai3, Luzolo Manuel4, Elijah K. Gichuru5,
Andreia Loureiro1, Vítor Várzea1, Octávio Salgueiro Paulo2 and Dora Batista1.
1 CIFC/IICT - Centro de Investigação das Ferrugens do Cafeeiro/ Instituto de Investigação
Científica Tropical, Quinta do Marquês, 2784-505 Oeiras, Portugal.
2 Computational Biology and Population Genomics group, Centro de Biologia Ambiental,
Faculdade de Ciências da Universidade de Lisboa, Campo Grande, 1749-016 Lisboa, Portugal
3 State Key Laboratory of Mycology, Institute of Microbiology, Chinese Academy of Sciences,
No.10, North 4th Ring Rd West, Beijing 100190, People’s Republic of China
4 Instituto Nacional do Café de Angola, Luanda, Angola
5 Coffee Research Foundation, P.O.Box 4-00232, Ruiru, Kenya
Abstract: Ecological speciation through host-shift has been proposed as a major route for the
appearance of novel fungal pathogens. The growing awareness of their negative impact on global
economies and public health has created enormous interest in identifying the factors that are most
likely to promote their emergence in nature. In this work, a combination of pathological,
molecular and geographic data was used to investigate the recent emergence of the fungus
Colletotrichum kahawae. C. kahawae emerged as a specialist pathogen causing Coffee Berry
Disease in Coffea arabica, owing to its unparalleled adaptation to infecting green coffee berries.
Contrary to current hypotheses, our results suggest that a recent host-jump underlay the
speciation of C. kahawae from a generalist group of fungi seemingly harmless to coffee berries.
We posit that immigrant inviability and a predominantly asexual behavior could have been
instrumental in driving speciation by creating pleiotropic interactions between local adaptation
and reproductive patterns. Moreover, we estimate that C. kahawae began its diversification less
than 2,200 years ago, leaving a very short time frame since its divergence from its sibling lineage
(∼5,600 years ago), during which a severe drop in C. kahawae’s effective population size occurred.
This further supports a scenario of recent introduction and subsequent adaptation to C. arabica.
Phylogeographic data revealed low levels of genetic polymorphism but provided the first
geographically consistent population structure of C. kahawae, inferring the Angolan population
as the most ancestral and the East African populations as the most recently derived. Altogether,
these results highlight the significant role of host specialization and asexuality in the emergence
of fungal pathogens through ecological speciation.
Short title: Host-jump speciation of C. kahawae
Keywords: Adaptation, BEAST, Coffee Berry Disease, Coffea arabica, Divergence population
genetics, Plant pathogen
Introduction
Ecological speciation is being increasingly recognized as an important mechanism driving the
origin of species, in which the contributions of ecology and natural selection are fundamental
(Rundle & Nosil 2005; Schluter 2009). By definition, ecological speciation occurs when
reproductive isolation between populations arises as a consequence of ecologically based
divergent selection promoting the fixation of different advantageous alleles in each of the
contrasting environments (Schluter & Conte 2009). Host specialization can be a powerful driver
of divergent selection, promoting ecological speciation via host shift or jump, depending on
whether the involved hosts are genetically similar or distant, respectively. Evidence is
accumulating from disparate taxa, such as phytophagous insects (Winkler et al. 2009),
coral-dwelling fish (Munday et al. 2004) and pathogenic fungi (Giraud et al. 2010), showing that
specialization to different hosts can effectively lead to population sub-division, adaptation and
divergence.
Host-shifts may represent a way by which incipient pathogen species can escape from direct
resource competition, thus experiencing relatively high fitness even if they are initially poorly
adapted to the new environment (Dieckmann et al. 2004). The adaptation to the new habitat can
then drive the evolution of partial reproductive isolation within a few generations (Hendry et al.
2007). During this period, adaptive divergence itself is likely to be of key importance in
restricting gene flow, thereby allowing the completion of speciation regardless of the geographic
distribution of the diverging populations (Dieckmann et al. 2004). Indeed, verbal and
mathematical models suggest that when host specialization and local adaptation are coupled with
assortative mating, gene flow can be severely or totally restricted even in the absence of extrinsic
barriers (Nosil et al. 2005; Schluter & Conte 2009; Gladieux et al. 2011). Moreover, if one
considers the effects of pleiotropy, the genetic associations between disruptively selected traits
and the mechanism of reproductive isolation do not depend exclusively upon the establishment of
linkage, rendering ecological speciation likely and swift (Via 2001; Servedio et al. 2011).
Elucidating the biotic factors and environmental circumstances that are most likely to promote
ecological speciation in nature can be challenging using contemporaneous data, since time often
obscures the conditions at the moment of speciation (Coyne & Orr 2004). In this regard,
addressing emergent fungal diseases can be particularly insightful, since their recent origins
provide a unique opportunity to study the mechanisms underlying ecological speciation
(Stukenbrock & McDonald 2008). Fungal diseases have been emerging at an increasing rate on a
wide range of host plants, posing tremendous threats to the global economy and food safety (Jones et
al. 2008). The growing awareness of their worldwide impact has been accompanied by the need to
understand the events leading to the emergence of such species (Stukenbrock & McDonald
2008). In addition, emergent fungal pathogens represent interesting models of speciation for two further
reasons. First, as human activities have led to dramatic changes in ecosystems at a global scale,
with vast areas cultivated with single crops/genotypes and with the global exchange of
germplasm (Kareiva et al. 2007), diverse ecological opportunities have been created for the rapid
emergence and transmission of novel fungal diseases (Stukenbrock & McDonald 2008). Second,
fungi possess several exclusive and remarkable features that facilitate rapid ecological speciation,
especially through host-shift (Giraud et al. 2010). In some examples of local adaptation to their
hosts, fungi have proven capable of: (i) undergoing frequent asexual reproduction with rare events of
sexual recombination; (ii) producing large numbers of spores, which increases the input of adaptive
variation by mutation; or (iii) mating only within the host, creating pleiotropic interactions
between host specialization and assortative mating (Giraud et al. 2010 and references therein).
Herein, we aimed to elucidate the speciation event of an emergent fungal pathogen,
Colletotrichum kahawae, from within the generalist and cosmopolitan C. gloeosporioides
complex. C. kahawae emerged as a specialist pathogen causing Coffee Berry Disease (which produces
anthracnose symptoms, with sunken necroses leading to fruit drop or mummification) on Arabica
coffee (Coffea arabica), owing to its unique adaptation to infecting green berries, an ecological
niche previously unoccupied by other fungi (Firman & Waller 1977). Although C. arabica is
grown in South America, Africa and Asia, C. kahawae is still restricted to Africa where it was
first reported in 1922 in Kenya (McDonald 1926). However, despite the African origin of C.
arabica, the cultivation of this crop on the continent is also relatively recent, with the first attempts in
the 18th century but most dating from the second half of the 19th century (Bigger 2006). The
Coffee Berry Disease pathogen currently occurs in nearly all African regions where C. arabica is
grown, particularly above 1400 m, ravaging plantations and causing up to 80% yield losses
annually. Nonetheless, very little is known about its origin besides the information provided by
historical data and field reports. According to these sources, C. kahawae would have originated
in Kenya from C. gloeosporioides sensu lato populations inhabiting Coffea spp. hosts (frequently
isolated from ripe or damaged coffee berries) either by: (i) mutation from a mildly parasitic form
in C. arabica (Nutman & Roberts 1960); (ii) hybridization between C. gloeosporioides strains
from other Coffea spp. hosts (Robinson 1976); or (iii) a shift from another Coffea sp., on which it
would be present as a harmless fungal strain (Robinson 1974). Currently, these hypotheses
remain untested and the relationship between C. kahawae and C. gloeosporioides s.l. isolates has
been poorly investigated, so that it is unclear how the specialization to the new niche was
accomplished. Some studies have attempted to provide insights by investigating the genetic
variability and population structure of C. kahawae, albeit with little success. Concordant with the
putative recent origin and predominantly asexual reproduction, genetic variability of C. kahawae
was found to be very low, preventing inferences about its population structure, origin and spread
(Bridge et al. 2008; Manuel et al. 2009).
The present study intends to assess the current hypotheses for the emergence and spread of C.
kahawae and to investigate the process of host specialization, using information from a multi-
locus sequencing approach. We used an extensive sampling of C. kahawae’s populations, as well
as C. gloeosporioides s.l. isolates from Coffea spp. and several other hosts worldwide. In this
endeavor, we made use of novel statistical techniques for the inference of phylogenetic
relationships among species, estimation of divergence times and reconstruction of past
demographic history. These analyses were coupled with information from geographic modeling
and pathogenicity tests to provide a multidisciplinary view of this speciation event.
Methods
Fungal material
A total of 102 isolates from the C. gloeosporioides complex were obtained from culture
collections and field samples, comprising three main groups: (i) 29 C. gloeosporioides s.l.
isolates collected from Coffea spp. hosts throughout plantations in South America, Africa and
Asia; (ii) 14 C. gloeosporioides s.l. isolates collected from eleven host species (other than Coffea
spp.) and five geographic locations, previously selected based on preliminary phylogenetic
analyses and culture availability; and (iii) 59 C. kahawae isolates obtained from infected green C.
arabica berries in ten African countries, covering most of its current range (Table S1). Ex-
holotypes (cultures obtained from the taxonomical type strain) from three recently described
species within the C. gloeosporioides complex from C. arabica hosts in Thailand were included:
Colletotrichum siamense, Colletotrichum asianum and Colletotrichum fructicola (Prihastuti et al.
2009). Isolates from Citrus limon and Olea europaea hosts were also included as representatives of
C. gloeosporioides sensu stricto, based on BLAST comparison with epitype sequences (Cannon et al. 2008) of the
complete rDNA Internal Transcribed Spacer (ITS) (GenBank accession number EU371022 [E
value = 0, Max ident. = 98%]) and β-tubulin 2 (β-tub2) (GenBank accession number FJ907445 [E
value = 0, Max ident. = 99%]). Five isolates of Colletotrichum fragariae were selected as
outgroup taxa for the phylogenetic analyses. Culturing and DNA extraction from fungal isolates
were performed as previously described (Silva et al. 2012).
Molecular data
Six nuclear markers previously described in detail by Silva et al. (2012) were employed for this
study: the ITS region, β-tub2, Apn25L, MAT1-2-1, ApMAT and MAT5L. Primers and PCR
conditions for ApMAT, MAT5L, Apn25L and MAT1-2-1 were as described by Silva et al.
(2012). PCR amplification of the ITS region was carried out with primers ITS1Ext/ITS4Ext
(Brown et al. 1996) and that of β-tub2 with primers T1/T2 (O’Donnell & Cigelnik 1997). PCR products
yielding one clear band were purified using SureClean (BioLine). When products presented a
multi-band profile, the band of the expected size was excised and purified using the Silica Bead
DNA Gel Extraction kit (Fermentas). Sequencing reactions were performed using the BigDye
version 3.1 chemistry (Applied Biosystems) on an ABI PRISM 310 automated sequencer.
Amplicons were sequenced in both directions and chromatograms were manually checked for
errors in SEQUENCHER v4.0.5 (Gene Codes Corporation).
Phylogenetic analyses
Multiple sequence alignments were constructed for each nuclear sequence dataset in MAFFT
v6.717b (Katoh & Toh 2009), using the L-INS-i method. Individual datasets were concatenated
into a combined matrix using the CONCATENATOR v1.1.0 software (Pina-Martins & Paulo
2008). We used Maximum Likelihood (ML) and a Bayesian framework with Markov chain
Monte Carlo (BMCMC) to reconstruct phylogenies from the separate and combined datasets. For
ML analyses, MODELTEST v3.7 (Posada & Crandall 1998) was used to select the best fit model
of DNA sequence evolution, under the Akaike Information Criterion (AIC). ML heuristic
searches were performed in PAUP* v4.0d99 (Swofford 2003) with 100 replicates, random
sequence addition and a Tree-Bisection Reconnection (TBR) branch swapping algorithm.
Nonparametric bootstrap was conducted using 1 000 pseudoreplicates with 10 random additions
and TBR branch swapping. The BMCMC analysis was run in MRBAYES v3.1.2 (Ronquist &
Huelsenbeck 2003) with the optimal model of sequence evolution selected under the AIC, as
implemented in MRMODELTEST v2.3 (Nylander 2004). The MAT1-2-1 dataset was partitioned
into codon positions because two different substitution models were estimated, one for the first and
second positions combined and another for the third position. Posterior probabilities were generated from 1 × 10⁷
generations, sampling every 1 000th generation, and the analysis was run three times, each with one
cold and three incrementally heated Metropolis-coupled Markov chains, starting
from random trees. Convergence to the stationary distribution was checked using Tracer v1.4
(Rambaut & Drummond 2007), and the first 1 × 10⁶ generations were discarded as burn-in. Trees from
different runs were then combined and summarized in a 50% majority-rule consensus tree.
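The burn-in, run-pooling and consensus steps above can be sketched as follows. This is a minimal illustration, not MrBayes code: it assumes each sampled tree has already been reduced to its set of clades (each clade a frozenset of taxon labels), discards the first 10% of each run (1 × 10⁶ of 1 × 10⁷ generations), pools the remaining samples, and keeps clades present in more than 50% of them; a clade's frequency is its posterior probability. The function and taxon names are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set

Clade = FrozenSet[str]  # a clade is the set of taxon labels below a node


def majority_rule_clades(runs: List[List[Set[Clade]]],
                         burnin_frac: float = 0.1) -> Dict[Clade, float]:
    """Pool post-burn-in trees from several runs and keep clades found in more
    than half of them; each clade's frequency is its posterior probability."""
    pooled: List[Set[Clade]] = []
    for samples in runs:
        n_burn = int(len(samples) * burnin_frac)  # drop the first 10% of each run
        pooled.extend(samples[n_burn:])
    # each tree is a set, so a clade is counted at most once per sampled tree
    counts = Counter(clade for tree in pooled for clade in tree)
    total = len(pooled)
    return {clade: n / total for clade, n in counts.items() if n / total > 0.5}
```

Clades that each occur in more than half of the trees are necessarily compatible with one another, which is why they can always be assembled into a single consensus tree.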
Since species assignment for most C. gloeosporioides s.l. isolates was initially uncertain, we
diagnosed potentially distinct species in a phylogenetic context using the multi-locus
genealogical concordance approach (Taylor et al. 2000). Phylogenetic species were required to
meet the following criteria: (i) monophyly, (ii) strong phylogenetic support (e.g. bootstrap,
posterior probabilities) in the multi-locus analysis and (iii) genealogical concordance, as conflict
between gene trees could be interpreted as recombination among individuals within a species
(Taylor et al. 2000).
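A simplified reading of the three criteria can be written as a check on a candidate group. The sketch assumes clades are given as frozensets of isolate labels mapped to support values (here posterior probabilities), and judges both "strong support" and "conflict" against a hypothetical threshold of 0.95; the function names and thresholds are illustrative assumptions, not values used in this study.

```python
from typing import Dict, FrozenSet, List

Clade = FrozenSet[str]


def conflicts(a: Clade, b: Clade) -> bool:
    """Two clades conflict if they overlap but neither contains the other."""
    return bool(a & b) and not (a <= b or b <= a)


def passes_concordance(group: Clade,
                       combined_support: Dict[Clade, float],
                       gene_trees: List[Dict[Clade, float]],
                       min_support: float = 0.95,
                       conflict_support: float = 0.95) -> bool:
    # Criteria (i) and (ii): the group is a clade of the multi-locus tree
    # with strong support (absent clades have support 0).
    if combined_support.get(group, 0.0) < min_support:
        return False
    # Criterion (iii): no single-locus tree contains a well-supported clade
    # that conflicts with the group.
    for clades in gene_trees:
        for clade, support in clades.items():
            if support >= conflict_support and conflicts(clade, group):
                return False
    return True
```

Only well-supported conflicting clades are counted here, so weakly supported discordance in a single gene tree does not reject the group.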
Divergence dating of the species tree and demographic analysis
The Bayesian MCMC analysis implemented in *BEAST (Heled & Drummond 2010), an
extension of the BEAST v.1.6.1 software package (Drummond & Rambaut 2007), was used to
jointly estimate the species tree and divergence times from our multi-locus dataset. Preliminary
analyses detected substantial heterogeneity in the substitution rates of the different sequenced
regions, making a single clock model inappropriate. Thus, we
analyzed eight partition schemes of increasing complexity, ranging from a single partition of
substitution and clock models to the partitioning of the entire dataset into 8 unlinked substitution
models and 6 unlinked clock models, according to gene/intergenic positions (Table S2). The best
fit model of sequence evolution for each partition was estimated using MODELTEST v3.7
(Posada & Crandall 1998), under the AIC. We used a birth-death process as the species tree
prior and a piecewise linear and constant root population size model. Preliminary runs using
an uncorrelated lognormal relaxed clock revealed that the σr (“CoefficientOfVariation”)
parameter value for the ITS partition was consistently < 0.3 with a frequency histogram abutting
0, suggesting that this partition does not significantly deviate from a strict clock. Therefore,
where applicable in our partition schemes, we used a strict clock for the ITS partition and
uncorrelated lognormal molecular clocks for all other partitions.
Genetic distances were converted into units of time (years) using an averaged substitution rate of
8.8 × 10⁻⁹ substitutions per site per year, corresponding to the median of estimates for several nuclear genomic
regions obtained by Kasuga et al. (2002). We fixed the substitution rate of ITS to this value,
since it was the only genomic region from our dataset included in the calibration study, and
estimated the rates for all other partitions using uniform priors [0, 1]. The final results from the
*BEAST MCMC analyses were obtained by combining two independent runs of 1 × 10⁸
generations each, sampling parameters every 10 000 iterations, using LOGCOMBINER v1.6.1
(Drummond & Rambaut 2007). Tracer v1.4 (Rambaut & Drummond 2007) was used to assess
convergence and mixing for all parameters by visually inspecting the trace log and estimating the
Effective Sample Size (ESS) for each parameter; all ESS values were > 200 indicating adequate
mixing. A conservative 10% of each analysis was discarded as burn-in.
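The ESS check can be illustrated with a simple autocorrelation-based estimator, ESS = N / (1 + 2 Σ ρₖ), where ρₖ is the lag-k autocorrelation of the post-burn-in trace and the sum is truncated at the first non-positive autocorrelation. This is a hedged sketch of the general idea, not Tracer's implementation (whose estimator may differ in detail); the function name is hypothetical.

```python
import random
from typing import Sequence


def effective_sample_size(trace: Sequence[float], burnin_frac: float = 0.1) -> float:
    """ESS = N / (1 + 2 * sum_k rho_k) after discarding the burn-in fraction.

    Autocorrelations are summed from lag 1 until the first non-positive value,
    a simple truncation rule chosen here for illustration."""
    x = list(trace[int(len(trace) * burnin_frac):])
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    var = sum(d * d for d in dev) / n
    if var == 0.0:
        raise ValueError("constant trace: ESS is undefined")
    rho_sum = 0.0
    for lag in range(1, n):
        rho = sum(dev[i] * dev[i + lag] for i in range(n - lag)) / (n * var)
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


if __name__ == "__main__":
    rng = random.Random(1)
    chain = [0.0]
    for _ in range(9999):  # a strongly autocorrelated AR(1) chain
        chain.append(0.95 * chain[-1] + rng.gauss(0.0, 1.0))
    ess = effective_sample_size(chain)
    print(f"ESS = {ess:.0f}; adequate (> 200): {ess > 200}")
```

A well-mixing parameter yields an ESS close to the number of retained samples, whereas a slowly mixing one yields far fewer effectively independent draws.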
To objectively select the most appropriate partition scheme, we compared our partition schemes
using Bayes Factors (BF). Following a previous study (Brown & Lemmon 2007), we
considered values of the test statistic 2ln(BF₁₂) between model 1 and model 2 above 10 to be
evidence of significant support for model 2, values between -10 and 10 to indicate ambiguity, and
values below -10 to indicate strong support for model 1. When faced with ambiguity, we opted for
the simpler model. BF were determined by calculating the marginal likelihood for each scheme
using Tracer v1.4 (Rambaut & Drummond 2007).
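The decision rule above can be written out as a small function. It is a sketch under stated assumptions: it takes the two log marginal likelihoods as input, follows the sign convention as worded here (positive 2ln(BF₁₂) favours model 2), and uses a hypothetical parameter count as the measure of which scheme is simpler.

```python
from typing import Tuple


def compare_partition_schemes(ln_ml_1: float, ln_ml_2: float,
                              n_params_1: int, n_params_2: int,
                              threshold: float = 10.0) -> Tuple[float, int]:
    """Return (2ln(BF12), chosen model) under the +/-10 rule described above.

    Sign convention (an assumption matching the wording of the text):
    positive values favour model 2, negative values favour model 1."""
    stat = 2.0 * (ln_ml_2 - ln_ml_1)
    if stat > threshold:
        return stat, 2
    if stat < -threshold:
        return stat, 1
    # Ambiguous zone: keep the simpler scheme (fewer parameters; ties go to model 1).
    return stat, 1 if n_params_1 <= n_params_2 else 2
```

For example, a difference of 3.5 log units gives 2ln(BF₁₂) = 7, which falls in the ambiguous zone, so the simpler scheme is retained.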
The demographic history of C. kahawae’s populations was reconstructed using the Gaussian
Markov random fields (GMRF) Bayesian Skyride (Minin et al. 2008), implemented in BEAST
v1.6.1 (Drummond & Rambaut 2007). The GMRF Bayesian Skyride was specified only for the
partition scheme previously selected using BF. The analysis was run twice for 1 × 10⁸
generations, sampling every 10 000 generations, after discarding an initial 10% as burn-in. As in the
*BEAST analysis, the performance of the MCMC procedure was evaluated in Tracer v1.4
(Rambaut & Drummond 2007), which was also used to create the Bayesian Skyride plot with the
median and corresponding credibility intervals of the estimated demographic parameters.
Population genetics analyses in C. kahawae and UG1
Based on the phylogenetic analyses described above, a group of C. gloeosporioides s.l.
isolates closely related to C. kahawae was detected (Fig. 1), here named Undescribed group 1 (UG1).
To better understand their relationship, median-joining haplotype networks were constructed for
each locus using the program NETWORK v4.6.0.0 (Bandelt et al. 1999). DNA polymorphism
statistics, measures of divergence using the net divergence statistic (Da, Nei 1987) and analyses of
recombination based on the four-gamete test (Hudson & Kaplan 1985) were computed in
DnaSP v5.0 (Rozas et al. 2003). To assess the degree of genetic differentiation, F-statistics were
calculated for all molecular markers, as implemented in ARLEQUIN (Excoffier et al. 2005), and
the Snn statistic (Hudson 2000) was estimated, with significance assessed by 1 000 permutations.
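For concreteness, Da and Snn can be sketched as follows, assuming aligned sequences of equal length and uncorrected p-distances with no special gap handling (DnaSP may treat gaps, ambiguities and distance corrections differently). Da follows Nei (1987), Da = Dxy − (πx + πy)/2, where Dxy is the mean between-group distance and πx, πy are the mean within-group distances. Snn follows Hudson (2000): for each sequence, the fraction of its nearest neighbours that belong to the same group, averaged over all sequences, with significance assessed by permuting group labels. All function names are hypothetical.

```python
import random
from itertools import combinations
from typing import List, Sequence, Tuple


def p_distance(a: str, b: str) -> float:
    """Uncorrected proportion of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_within(seqs: List[str]) -> float:
    """Mean pairwise distance within a group (pi); needs >= 2 sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def net_divergence(pop_x: List[str], pop_y: List[str]) -> float:
    """Nei's (1987) net divergence: Da = Dxy - (pi_x + pi_y) / 2."""
    dxy = sum(p_distance(a, b) for a in pop_x for b in pop_y) / (len(pop_x) * len(pop_y))
    return dxy - (mean_within(pop_x) + mean_within(pop_y)) / 2


def snn(seqs: List[str], labels: Sequence[str]) -> float:
    """Hudson's (2000) nearest-neighbour statistic."""
    n = len(seqs)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        dists = [p_distance(seqs[i], seqs[j]) for j in others]
        d_min = min(dists)
        # all sequences tied at the minimum distance count as nearest neighbours
        nearest = [j for j, d in zip(others, dists) if d == d_min]
        total += sum(labels[j] == labels[i] for j in nearest) / len(nearest)
    return total / n


def snn_permutation_test(seqs: List[str], labels: Sequence[str],
                         n_perm: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Observed Snn and the proportion of label permutations with Snn >= observed."""
    rng = random.Random(seed)
    observed = snn(seqs, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if snn(seqs, shuffled) >= observed:
            hits += 1
    return observed, hits / n_perm
```

Snn approaches 1 when nearest neighbours almost always come from the same group (strong differentiation) and approaches the group-frequency expectation when the groups are intermixed.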
Isolation with Migration analyses
Before investigating the history of isolation between these two groups, the standard neutral model
was tested for each marker using Tajima’s D and Fu and Li’s D* and F*, as implemented in
DnaSP v5.0. We used the program isolation-with-migration (IMa) (Hey and Nielsen 2004) to
assess whether an isolation with migration model would fit the data significantly better than a
strict isolation model and to estimate population mutation parameters (θCk, θUG1 and ancestral θA),
migration rates (mCk>UG1 and mUG1>Ck) and the time since the divergence of the ancestral population
of C. kahawae and UG1. Since IMa supports multi-locus datasets with different molecular rates,
the concatenated dataset of 6 nuclear markers was used to carry out the estimation of parameters.
Given the low sequence polymorphism and absence of sites with multiple substitutions, we
applied the infinite-site model of sequence evolution for all loci. Multiple preliminary runs were
performed to determine the appropriate prior upper bounds for the parameters and to assess the
most efficient heating scheme for the Metropolis coupled MCMC run. After these pilot runs, we
performed four independent runs with 100 Metropolis-coupled chains with geometric heating
(increment parameters: g1 = 0.95; g2 = 0.5) for 5 × 10⁵ steps, sampling every 10th step,
after a burn-in period of 1 × 10⁶ steps. Overall, 2 × 10⁵ sampled genealogies were saved and
combined for the final estimation of parameters by executing the program in L-mode. The L-mode
of IMa was also used to estimate joint parameter distributions and to examine nested
models with a reduced number of parameters, representing alternative hypotheses of gene flow
patterns. Hypothesis testing was performed by comparing the ln-likelihood ratio (LLR) statistic
between two models to a χ2 distribution with k degrees of freedom, where k is the difference in
the number of parameters between models (Hey and Nielsen 2007). To convert parameter
estimates into biologically meaningful units, we used the per-locus substitution rate of the ITS
partition (1.369 × 10⁻⁶ per year), obtained from the same averaged per-site, per-year substitution rate used in
the *BEAST analyses.
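The likelihood-ratio comparison and the conversion of a scaled IMa time estimate can be sketched as below. Assumptions: the LLR statistic is taken as twice the difference in log-likelihoods between the full and nested models; the χ² upper-tail probability is evaluated in closed form for integer degrees of freedom; and the IMa time parameter is treated as t = T·u, so that T = t/u years with u = 1.369 × 10⁻⁶ per locus per year. The function names are hypothetical.

```python
import math
from typing import Tuple


def chi2_sf(x: float, k: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer k >= 1 df."""
    if x <= 0.0:
        return 1.0
    if k % 2 == 0:
        # even df: exp(-x/2) * sum_{i=0}^{k/2-1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, k // 2):
            term *= (x / 2.0) / i
            total += term
        return math.exp(-x / 2.0) * total
    # odd df: erfc(sqrt(x/2)) + 2*phi(sqrt(x)) * sum_{r=1}^{(k-1)/2} x^((2r-1)/2) / (1*3*...*(2r-1))
    total = math.erfc(math.sqrt(x / 2.0))
    phi = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)  # standard normal density at sqrt(x)
    term, acc = math.sqrt(x), 0.0
    for r in range(1, (k - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        acc += term
    return total + 2.0 * phi * acc


def llr_test(lnl_full: float, lnl_nested: float, k: int) -> Tuple[float, float]:
    """2 * (lnL_full - lnL_nested), compared with chi-square on k df."""
    stat = 2.0 * (lnl_full - lnl_nested)
    return stat, chi2_sf(stat, k)


ITS_RATE_PER_LOCUS_PER_YEAR = 1.369e-6  # per-locus rate quoted in the text


def scaled_time_to_years(t: float, u: float = ITS_RATE_PER_LOCUS_PER_YEAR) -> float:
    """Convert a mutation-scaled time (t = T * u) into years."""
    return t / u
```

For example, a hypothetical scaled estimate t = 0.01 corresponds to roughly 7 305 years, and a log-likelihood drop of 3 units for a nested model with one parameter fewer gives a statistic of 6 with p ≈ 0.014.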
Phylogeography of C. kahawae
The relationships and structure of C. kahawae’s populations were assessed by ML and BMCMC
methods as described above. For each of the polymorphic sites within C. kahawae, the ancestral
state was estimated using SNAP Workbench (Aylor et al. 2006). Relevant alternative topologies
were tested to assess the robustness of the unconstrained inference of population relationships,
using the Shimodaira-Hasegawa (SH) test (Shimodaira & Hasegawa 1999). One thousand replicates were
generated by resampling the per-site log-likelihoods (RELL method). Using the IDRISI
15 Andes software (Eastman 2006), a topographic map of part of the African
continent was modeled to highlight regions above 1400 m altitude and to assess their
correlation with C. kahawae’s haplotype distribution.
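The RELL-based SH test mentioned above can be sketched as follows. The sketch assumes per-site log-likelihoods for each candidate topology have already been computed on the same alignment. It resamples sites with replacement, re-sums the per-site values without re-optimising the trees (RELL), centres each tree's bootstrap distribution, and compares each tree's observed log-likelihood deficit relative to the best tree with its bootstrap distribution. The names are hypothetical, and this is not the code used in the study.

```python
import random
from typing import Dict, List


def sh_test(site_lnl: Dict[str, List[float]], n_boot: int = 1000,
            seed: int = 0) -> Dict[str, float]:
    """SH test with RELL resampling; returns a p-value per tree."""
    rng = random.Random(seed)
    names = list(site_lnl)
    n_sites = len(site_lnl[names[0]])
    observed = {t: sum(site_lnl[t]) for t in names}
    best = max(observed.values())
    delta = {t: best - observed[t] for t in names}  # observed deficit of each tree

    # RELL: resample site indices and re-sum the stored per-site log-likelihoods
    boot = {t: [] for t in names}
    for _ in range(n_boot):
        idx = [rng.randrange(n_sites) for _ in range(n_sites)]
        for t in names:
            boot[t].append(sum(site_lnl[t][i] for i in idx))

    # centre each tree's bootstrap distribution on zero
    for t in names:
        m = sum(boot[t]) / n_boot
        boot[t] = [v - m for v in boot[t]]

    pvals = {}
    for t in names:
        hits = 0
        for b in range(n_boot):
            best_b = max(boot[u][b] for u in names)
            if best_b - boot[t][b] >= delta[t]:
                hits += 1
        pvals[t] = hits / n_boot
    return pvals
```

In this sketch the maximum-likelihood topology always receives p = 1, and alternative topologies with small p-values are rejected as significantly worse.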
Pathogenicity assays
To assess the pathogenicity of Colletotrichum isolates from other hosts on Arabica coffee fruits,
inoculations of fungal strains were carried out on green and ripe detached berries. Based on the
phylogenetic analyses conducted above, all isolates from the UG1 and Undescribed group 2 (UG2)
phylogenetic groups (Fig. 1) were included, plus four C. kahawae isolates as positive controls
(Mal2, Cam1, Ang29 and Que2). For each isolate, spore suspensions of 10⁶ spores mL⁻¹ were
prepared from 7-day-old sporulating cultures on Malt Extract Agar medium (Cultimed, Panreac
Quimica). Berries were collected from coffee plants maintained at CIFC greenhouses, washed
three times with distilled water and blotted dry with sterile paper. Two trials were then
undertaken: (i) 10 Arabica green coffee berries of genotype CIFC-7963 and (ii) 10 Arabica ripe
coffee berries of genotype CIFC-H420/2 susceptible to Coffee Berry Disease were inoculated
with 10 µl of conidial suspension per berry from each fungal isolate. For both trials, an
additional 10-berry set was mock-inoculated with 10 µl of sterile water per berry as a negative control.
After inoculation, berries were incubated for 24 h in the dark in a moisture chamber at 25 °C and
then maintained under these conditions but with a 12 h photoperiod. Symptoms were scored at
the 7th and 15th days after inoculation, and berries were classified either as showing Coffee Berry
Disease symptoms (1), or as asymptomatic (0).
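Assuming a single 10 µl droplet per berry, as the protocol above describes, the inoculum dose per berry works out to roughly 10⁴ conidia:

```latex
10^{6}\ \text{spores mL}^{-1} \times 10\ \mu\text{L}
  = 10^{6}\ \text{spores mL}^{-1} \times 0.01\ \text{mL}
  = 10^{4}\ \text{spores per berry}
```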
Results
Evolutionary relationships of Colletotrichum spp. from coffee and other hosts
The evolutionary relationships between C. kahawae and C. gloeosporioides s.l. isolates were
established by constructing a multi-locus concatenated phylogeny of 102 samples of
Colletotrichum from coffee and other hosts collected worldwide. Using a six-marker (ITS,
β-tub2, ApMAT, Apn25L, MAT1-2-1 and MAT5L) sequencing approach, we achieved a substantial
resolution of species relationships. A summary of the phylogenetic information for each
individual marker and for the combined dataset is provided in Table 1. Parsimony-informative
content within individual nuclear sequences ranged from low (ITS: 3%; β-tub2: 13%) to
moderate (Apn25L: 20%; MAT5L: 18%; MAT1-2-1: 17%) and high (ApMAT: 37%). The
combined dataset consisted of 3783 bp of sequence data, with 764 parsimony-informative sites
(20%).
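The parsimony-informative counts above follow the usual definition: a site is informative when at least two character states each occur in at least two sequences. A minimal sketch, assuming an aligned list of equal-length sequences and ignoring gaps and ambiguity codes (an assumption, since programs differ in how they treat them); function names are hypothetical:

```python
from collections import Counter
from typing import List


def is_parsimony_informative(column: str) -> bool:
    """At least two states, each present in at least two sequences (gaps/ambiguities ignored)."""
    counts = Counter(c for c in column.upper() if c in "ACGT")
    return sum(1 for n in counts.values() if n >= 2) >= 2


def count_parsimony_informative(alignment: List[str]) -> int:
    """Number of parsimony-informative columns in an alignment of equal-length sequences."""
    length = len(alignment[0])
    return sum(
        is_parsimony_informative("".join(seq[i] for seq in alignment))
        for i in range(length)
    )
```

For the combined matrix, 764 informative sites out of 3783 corresponds to 764/3783 ≈ 20.2%, the value rounded to 20% above.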
Phylogenetic reconstructions of the concatenated dataset using both ML and BMCMC methods
yielded the same topology with minor discrepancies in the support for some nodes (Fig. 1). The
phylogeny was mostly resolved with high statistical support and revealed a pattern of divergent
evolution between three main lineages derived from two ancestral splits. After the split from C.
fragariae, the first lineage to diverge included the C. kahawae clade, along with two closely
related groups of isolates (Da = 0.53%) from several hosts other than Coffea spp. (UG1 and
UG2). These two groups include fungi comprising a very broad range of hosts and geographic
origins, as they were isolated from fruits, twigs and leaf lesions or as leaf endophytes from eleven
host species belonging to different taxonomic divisions and orders fairly distant from the
taxonomic placement of Coffea spp., in three continents (Fig. 1; Table S1). The phylogenetic
proximity was particularly high between C. kahawae and isolates from the UG1 group.
According to the genealogical concordance criteria alone, both groups should be regarded as a
single species, given that UG1 could not be recovered as a distinct monophyletic group
with sufficient statistical support in either single- or multi-locus phylogenetic reconstructions
and that C. kahawae could not be completely distinguished from some UG1 isolates in four out
of six single-locus analyses (Fig. S1). However, since UG1 and C. kahawae clearly represent
distinct and largely differentiated biological entities (see Results: Pathogenicity tests; Genetic
divergence and differentiation between C. kahawae and UG1), they will be considered hereafter
as two separate groups. Moreover, using the combined dataset, all C. kahawae isolates were
monophyletic and clearly distinguishable with high bootstrap and posterior probability values
(100/1). The remaining C. gloeosporioides s.l. isolates clustered in a large monophyletic group
comprising the two remaining lineages. The most anciently diverged lineage encompassed
representative isolates of C. gloeosporioides s.s. from lemon and olive. The third main lineage
was composed exclusively of isolates from coffee hosts belonging to at least four different
species, which revealed a fairly high divergence from C. kahawae (Da = 6.14%). In this lineage,
most of the unassigned C. gloeosporioides s.l. strains grouped with the C. siamense ex-holotype
in a large and highly supported monophyletic clade, within which a high degree of genealogical
discordance was observed (Fig. S1). We also found two clonal samples (Ang100 and Ang101)
that formed a distinct, well-supported new monophyletic lineage (Undescribed group 3;
UG3). Even though a formal recognition of species boundaries within C. gloeosporioides s.l. on
coffee hosts is beyond the scope of this study, all of the diagnosed groups in Fig. 1 are in
accordance with the genealogical concordance phylogenetic species recognition guidelines. Only
the ITS and MAT5L partitions were unable to recover clear species boundaries, most likely due
to low nucleotide variability and small sequence size, respectively.
Divergence time estimates of the species tree and demographic analysis
Of the eight partitioning schemes tested in *BEAST, the BF analysis favored the second
most complex model, in which all six nuclear sequences had distinct substitution and clock
models and the MAT1-2-1 dataset was further partitioned into intron and exon regions, with the
exons split into two codon-position classes. This model was preferred over similar models with three
codon-position classes (2lnBF = 7) or without codon partitioning (2lnBF = 37) (Table S3). The
generated maximum clade credibility tree (Fig. 2) presented a similar topology to that produced by
the ML and BMCMC analyses of concatenated data for the first two main lineages. However, species
relationships within the third main lineage
were largely incongruent between methods with respect to the relative phylogenetic position of
taxa and statistical support. Unlike the well supported phylogeny obtained from the concatenated
analyses, the *BEAST species tree revealed a much weaker support for the relationships of these
species with C. fructicola, C. siamense and UG3 sharing a terminal and nearly polytomic
relationship, while C. asianum was the most anciently diverged species.
The Bayesian MCMC procedure implemented in *BEAST also allowed us to co-estimate the
posterior distribution of the time to the most recent common ancestor (TMRCA), including 95%
credibility intervals [Highest Posterior Density (HPD)], of any group of taxa. Thus, to construct a
time line of events, we estimated the TMRCA of all C. kahawae isolates as well as the
divergence time between C. kahawae and three taxa sets of particular interest: (i) all C.
gloeosporioides s.l. isolates from Coffea spp. hosts; (ii) all isolates from UG1; and (iii) the closest
phylogenetic representative of UG1, isolate Cg432 (Table 2). The TMRCA of C. kahawae was
estimated to be between 320 and 4 784 years ago (95% HPD), with a median of 2 219 years. This
species most likely diverged from the sampled C. gloeosporioides s.l. species from coffee hosts
around 403 610 years ago (95% HPD, 164 000 to 686 000), contrasting with the relatively shallow
divergence of 11 547 years (95% HPD, 3 574 to 21 735) from UG1. However, given that the isolates
in this group are not monophyletic and show some heterogeneity in their phylogenetic distance to
C. kahawae (Da = 0.19% to 0.69%), we also estimated the TMRCA between C. kahawae and the
UG1 isolate Cg432. This allowed a better approximation of the divergence time between C.
kahawae and the closest non-C. kahawae relative, which was estimated to be 7 840 years (95%
HPD, 2 020 to 15 245).
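Each age above is summarized as a median plus a 95% HPD, i.e. the shortest interval containing 95% of the posterior samples. A minimal sketch of that summary, assuming a plain list of post-burn-in samples for one node age and a unimodal posterior (for multimodal posteriors the single shortest interval can be misleading); `hpd_interval` and `summarise` are illustrative names, not functions of BEAST or Tracer.

```python
import math
import statistics
from typing import Dict, Sequence, Tuple


def hpd_interval(samples: Sequence[float], mass: float = 0.95) -> Tuple[float, float]:
    """Shortest interval containing `mass` of the posterior samples.

    Slides a window spanning ceil(mass * n) sorted samples and keeps the
    narrowest one (ties go to the lower interval).
    """
    if not 0 < mass <= 1:
        raise ValueError("mass must be in (0, 1]")
    s = sorted(samples)
    n = len(s)
    k = max(1, math.ceil(mass * n))  # number of samples the interval must contain
    best_lo, best_hi = s[0], s[k - 1]
    for i in range(1, n - k + 1):
        if s[i + k - 1] - s[i] < best_hi - best_lo:
            best_lo, best_hi = s[i], s[i + k - 1]
    return best_lo, best_hi


def summarise(samples: Sequence[float], mass: float = 0.95) -> Dict[str, float]:
    """Median and HPD interval, the two summaries reported in the text."""
    lo, hi = hpd_interval(samples, mass)
    return {"median": statistics.median(samples), "hpd_low": lo, "hpd_high": hi}
```

Unlike an equal-tailed interval, the HPD excludes a long right tail: for the toy samples `[0, 1, ..., 8, 100]` at 85% mass it returns `(0, 8)`.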
The demographic history of C. kahawae’s populations is depicted in Fig. 3 using a Bayesian
Skyride plot reconstruction. Setting the GMRF Bayesian Skyride as a tree prior in the BEAST
analysis resulted in older estimates for the TMRCA of C. kahawae (mean: 10 598 years; 95%
HPD, 3 207 to 19 889) and its divergence time from the closest non-C. kahawae relative (mean:
26 578 years; 95% HPD, 13 735 to 45 698). The time lag relative to the *BEAST estimates was
expected due to methodological differences between concatenation and species tree methods
(Heled & Drummond 2010) and thus, these time estimates should be interpreted in relative terms.
The historical demography of C. kahawae revealed a very recent and sharp drop in population
size, without significant evidence of recovery to the present day. Notably, the onset of the
decline in C. kahawae’s population size (~25 000 years ago) roughly matches the estimated time of
divergence from the closest non-C. kahawae relative, pointing to a bottleneck occurring
only after their separation (Fig. 3). To assess the potential bias introduced by C. kahawae’s
population structure into the reconstruction of its demographic history, we also performed the same
GMRF Bayesian Skyride analysis for each haplotypic group separately (Angola, Cameroon and
East Africa). The results were similar across all datasets, suggesting that our demographic
estimates were robust to the violation of the panmixia assumption caused by the genetic
structure of C. kahawae’s populations.
Genetic divergence and differentiation between C. kahawae and UG1
Haplotype networks constructed for each locus revealed a close relationship between C. kahawae
and UG1 (Fig. 4). Full haplotypes were shared between some UG1 isolates and C. kahawae in
several individual markers, although the two groups could be completely distinguished in the
Apn25L and MAT1-2-1 datasets. Table 3 summarizes several interspecific divergence and
differentiation statistics. Consistent with a very recent separation, net divergence was very
low across all individual loci (Da = 0.03-0.35). Intriguingly, of the 38 polymorphic sites
found among C. kahawae and UG1, none was a shared polymorphism, and three were fixed
differences located in the Apn25L and MAT1-2-1 datasets. This suggests that, although both groups
separated quite recently, current levels of gene flow between them are likely to be low.
Differentiation indices such as FST and Snn reflect the same pattern, consistently showing
high and significant levels of differentiation across all loci. Most of the polymorphic sites
were exclusive to the UG1 group (32), while only a small fraction (3) was exclusive to C.
kahawae, which further supports the effective population decline of the latter.
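The site categories used above (exclusive to one group, shared, fixed) and the net divergence Da can be computed directly from two aligned samples. The sketch below is illustrative, not the DnaSP implementation the study may have used: it assumes clean, equal-length, biallelic alignments, ignores gaps and ambiguity codes, needs at least two sequences per group, and reports Da per site following Nei (1987), Da = Dxy − (πA + πB)/2.

```python
from itertools import combinations
from typing import Dict, Optional, Sequence


def classify_sites(pop_a: Sequence[str], pop_b: Sequence[str]) -> Dict[str, int]:
    """Count sites polymorphic only in A, only in B, in both, or fixed between groups."""
    counts = {"exclusive_a": 0, "exclusive_b": 0, "shared": 0, "fixed": 0}
    for col in range(len(pop_a[0])):
        a = {s[col] for s in pop_a if s[col] in "ACGT"}  # gaps/ambiguities skipped
        b = {s[col] for s in pop_b if s[col] in "ACGT"}
        poly_a, poly_b = len(a) > 1, len(b) > 1
        if poly_a and poly_b:          # biallelic sites assumed
            counts["shared"] += 1
        elif poly_a:
            counts["exclusive_a"] += 1
        elif poly_b:
            counts["exclusive_b"] += 1
        elif a and b and a != b:       # monomorphic in each group, different states
            counts["fixed"] += 1
    return counts


def mean_pairwise_diff(xs: Sequence[str], ys: Optional[Sequence[str]] = None) -> float:
    """Average differences per site, within one sample or between two samples."""
    pairs = list(combinations(xs, 2)) if ys is None else [(x, y) for x in xs for y in ys]
    diffs = sum(sum(c1 != c2 for c1, c2 in zip(p, q)) for p, q in pairs)
    return diffs / (len(pairs) * len(xs[0]))


def net_divergence(pop_a: Sequence[str], pop_b: Sequence[str]) -> float:
    """Nei's net divergence Da = Dxy - (pi_a + pi_b) / 2, per site."""
    dxy = mean_pairwise_diff(pop_a, pop_b)
    return dxy - (mean_pairwise_diff(pop_a) + mean_pairwise_diff(pop_b)) / 2
```

With hypothetical sequences `pop_a = ["AAAA", "AAAC", "AATA"]` and `pop_b = ["GAAA", "GAAA"]`, the first column is a fixed difference, two columns are polymorphic only in `pop_a`, and Da = 5/12 − (1/3)/2 = 0.25.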
Isolation with migration analyses
For each marker dataset, neither Fu and Li’s D* and F* nor Tajima’s D deviated significantly
from 0 (P > 0.05; Table S4), and no recombination events were detected within loci.
Therefore, the hypotheses of selective neutrality and no recombination within loci, both
assumptions of the IMa models, could not be rejected.
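For reference, Tajima's D compares the mean number of pairwise differences (π) with the Watterson estimate S/a1, scaled by its expected variance under neutrality. The sketch below computes only the statistic from an aligned sample; the significance values in Table S4 would come from coalescent simulations or tables, which this does not reproduce. It assumes clean alignments, at least four sequences (the variance terms vanish for n = 3), and counts each site with more than one state as one segregating site.

```python
import math
from itertools import combinations
from typing import Sequence


def segregating_sites(seqs: Sequence[str]) -> int:
    """Number of alignment columns with more than one state."""
    return sum(len(set(col)) > 1 for col in zip(*seqs))


def mean_pairwise_differences(seqs: Sequence[str]) -> float:
    """Average number of differences between all pairs of sequences (pi, per sequence)."""
    pairs = list(combinations(seqs, 2))
    return sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)


def tajimas_d(seqs: Sequence[str]) -> float:
    """Tajima's (1989) D; NaN when there are no segregating sites."""
    n = len(seqs)
    if n < 4:
        raise ValueError("Tajima's D needs at least 4 sequences")
    s = segregating_sites(seqs)
    if s == 0:
        return float("nan")
    pi = mean_pairwise_differences(seqs)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

A single singleton among four sequences gives a negative D (excess of rare variants), while a single 2/2 split gives a positive D; values near 0, as reported here, are what neutral, constant-size expectations predict.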
The approximate posterior density curves for the isolation with migration model parameters that
resulted from the IMa analyses are shown in Fig. 5 and their maximum-likelihood estimates with
respective 95% Highest Posterior Density (HPD) intervals are provided in Table 4. Although the
data contained insufficient information to estimate the time since C. kahawae and UG1 split from
their ancestral population, all other parameters had stronger signals in the
data and showed a single major peak in their posterior distributions. The limited information in
the data, as well as the lack of shared polymorphisms between populations, may also have led to
nonzero tails in the posterior distributions of θUG1, θA and mUG1>Ck, which prevented a reliable
estimation of the credibility intervals for these parameters. Despite these limitations, the four
replicate runs gave similar maximum-likelihood estimates and posterior distributions for these
parameters, suggesting that the program was sampling from the stationary distribution of the MCMC
run and that parameter values are robust and representative given their priors. To test the potential
impact of C. kahawae’s population structure on the estimation of IMa parameters, we
performed additional runs for each haplotypic group separately. As in the BEAST GMRF Skyride
analysis, the results were similar across datasets, suggesting a limited impact of C. kahawae’s
population structure on parameter estimation.
Estimates of current and ancestral effective population sizes revealed a decline of circa 40-fold in
the population size of C. kahawae compared to the ancestral population size, suggesting that this
species underwent a population bottleneck since its separation from UG1. Estimates of gene flow
were highly asymmetrical. No gene flow was detected from UG1 to C. kahawae but moderate
and significant gene exchange was detected from C. kahawae to UG1. However, it is important to
note that the full model did not fit the data significantly better than any of the nested models
compared, including the strict isolation model (Table S5). Thus, even though the full model
reveals an intriguing pattern of gene flow between C. kahawae and UG1, our data could not
reject the simpler hypothesis that both groups have been isolated since their divergence.
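Two pieces of arithmetic underlie this paragraph. Because θ is proportional to Ne·μ and μ is shared, the fold decline in effective size equals the ratio θA/θCk. Nested-model comparisons of the kind summarized in Table S5 use a likelihood-ratio statistic, 2ΔlogL, referred to a chi-square distribution with degrees of freedom equal to the number of constrained parameters. The sketch below makes both explicit using only the standard library. It is an illustration, not IMa's own code. When a nested model fixes a parameter at the edge of its range (e.g. m = 0), the plain chi-square reference is conservative.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square(df) variable, integer df >= 1.

    Uses closed forms of the regularised upper incomplete gamma Q(df/2, x/2).
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Q(m, y) = exp(-y) * sum_{i=0}^{m-1} y^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Q(k + 1/2, y) = erfc(sqrt(y)) + exp(-y) * sum_{i=1}^{k} y^(i - 1/2) / Gamma(i + 1/2)
    total = math.erfc(math.sqrt(y))
    term = math.sqrt(y) / math.gamma(1.5)
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * term
        term *= y / (i + 0.5)
    return total


def likelihood_ratio_test(loglik_full: float, loglik_nested: float, df: int):
    """Return (2 * delta log L, chi-square p-value) for nested models."""
    stat = 2.0 * (loglik_full - loglik_nested)
    return stat, chi2_sf(stat, df)


def fold_decline(theta_ancestral: float, theta_current: float) -> float:
    """theta_A / theta_current; mu cancels, so this equals N_A / N_current."""
    return theta_ancestral / theta_current
```

A nested model is retained, as the strict isolation model was here, whenever the p-value from `likelihood_ratio_test` stays above the chosen threshold.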
Phylogeography of C. kahawae
Our results confirmed the extremely low genetic variability within the C. kahawae clade (π =
0.00076; S = 3). However, three divergent haplotypes could be distinguished, named after their
geographic location: Angola, Cameroon and East Africa (Fig. 6). For the three polymorphic sites
within C. kahawae, ancestral states were inferred from the C. gloeosporioides sampling (Fig. 6a).
The Angola haplotype presented a nucleotide sequence identical to the inferred ancestral state,
while the Cameroon haplotype diverged by one non-synonymous mutation at the MAT1-2-1 gene,
which replaces a serine for a proline residue. The East Africa haplotype shared the same non-
synonymous mutation and diverged from the other haplotypes by two additional intronic
mutations at the β-tub2 marker. To assess the robustness of this inference, we tested an alternative
topology, in which we constrained the East African population as the most ancestral lineage and
the Angola population as the most derived. The likelihood score of the resulting tree was worse
than that of the unconstrained topology, although the difference was marginally non-significant (SH test, P = 0.056).
Moreover, we could not detect the presence of migrant haplotypes within our C. kahawae
sampling. We further extended our phylogeographic analysis by modeling a topographic map
of part of Africa in order to highlight all regions above 1 400 m altitude, and the
result was overlaid on Fig. 6b as rough gray areas. East African regions show a larger
extent of highlands, coinciding with the Great Rift Valley area. Outside this area,
highland regions are sparsely distributed, mainly across South Africa, Namibia, Angola and
Cameroon, and are isolated from one another by long stretches of lowland.
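The ancestral-state reasoning above can be made explicit by listing, for each haplotype, the sites where it departs from the inferred ancestral sequence. The sketch below uses hypothetical three-site sequences: the nucleotides are invented, and only the pattern of 0, 1 and 3 derived changes mirrors the text. It then checks whether the haplotypes form a nested, stepwise series. Ordering by number of derived changes is a parsimony-style reading, not a substitute for the SH test reported above.

```python
from typing import Dict, List, Tuple

Change = Tuple[int, str, str]  # (site index, ancestral state, derived state)


def derived_changes(ancestral: str, haplotypes: Dict[str, str]) -> Dict[str, List[Change]]:
    """Sites at which each haplotype differs from the inferred ancestral sequence."""
    out = {}
    for name, seq in haplotypes.items():
        if len(seq) != len(ancestral):
            raise ValueError(f"{name}: length differs from the ancestral sequence")
        out[name] = [(i, a, d) for i, (a, d) in enumerate(zip(ancestral, seq)) if a != d]
    return out


def order_by_derivedness(changes: Dict[str, List[Change]]) -> List[str]:
    """Haplotypes sorted from fewest to most derived changes."""
    return sorted(changes, key=lambda name: len(changes[name]))


def is_stepwise_series(changes: Dict[str, List[Change]], order: List[str]) -> bool:
    """True if every haplotype carries all derived changes of the one before it."""
    return all(set(changes[a]) <= set(changes[b]) for a, b in zip(order, order[1:]))


if __name__ == "__main__":
    # Hypothetical sites: 0 = MAT1-2-1 non-synonymous site, 1-2 = beta-tub2 intronic sites.
    ancestral = "TGC"
    haps = {"East Africa": "CAT", "Angola": "TGC", "Cameroon": "CGC"}
    ch = derived_changes(ancestral, haps)
    order = order_by_derivedness(ch)
    print(order, is_stepwise_series(ch, order))
```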
Pathogenicity assays
In the pathogenicity test with green berries, only C. kahawae isolates produced necrotic and
sunken lesions characteristic of Coffee Berry Disease symptoms on all 10 inoculated berries
(10/10 berries scored “1”). Green berries inoculated with isolates from UG1 and UG2 revealed
no necroses or any other symptoms (10/10 berries scored “0”). In contrast, neither C. kahawae
nor isolates from the UG1 and UG2 groups were able to successfully colonize ripe berries.
Discussion
To our knowledge, this study describes the first comprehensive analysis of the evolutionary
relationships of C. kahawae with C. gloeosporioides s.l. isolates from diverse hosts worldwide,
aiming to shed light on the underlying speciation event. One of the most striking findings was that
none of the initial hypotheses asserting that C. kahawae would have emerged from a C.
gloeosporioides s.l. gene pool from Coffea spp. hosts could be supported by our results.
According to our phylogenetic analyses, C. kahawae appears to have diverged from all C.
gloeosporioides s.l. isolates from coffee hosts approximately 403 000 years ago, which is
inconsistent with a very recent origin by mutation and adaptation (Nutman & Roberts 1960) or
hybridization (Robinson 1976) from these populations. The most notable inconsistency, however,
derives from the existence of a taxonomically unclassified phylogenetic group sampled from
several hosts other than Coffea spp. (UG1) that is genetically similar to C. kahawae. Although
unexpected, the close relationship of UG1 with C. kahawae, as well as its association with
genetically distant hosts, suggests an alternative scenario in which a host-jump preceded
the outbreak of Coffee Berry Disease epidemics, although the diversity of hosts covered by this
group hampers any confident inference on the host source of this event. In Colletotrichum, host
shifts have not previously been documented as a speciation driver, but there is recent evidence
that local adaptation and host specialization have structured populations of some species, such as
C. cereale (Crouch et al. 2009). Notwithstanding, a large amount of evidence has accumulated
showing that host-shift speciation, a particular case of ecological speciation, is one of the main
routes for the emergence of fungal pathogens (Couch et al. 2005, Stukenbrock & McDonald
2008, Zaffarano et al. 2008, Giraud et al. 2010).
“Rapid ecological speciation through host jump” hypothesis
Considering the onset of C. kahawae’s diversification and its divergence time from the closest
non-C. kahawae relative, C. kahawae and UG1 have only been separated for an average of 5 600
years. This is a remarkably short period of time for speciation to occur and agrees with the
observation that crop pathogens can emerge rather swiftly over recent
timescales (Stukenbrock & McDonald 2008). However, the estimated time period for the origin
of C. kahawae is still older than expected from historical data (Firman & Waller 1977) and from
the onset of the cultivation of C. arabica in Africa (Bigger 2006). While C. kahawae could have
begun adapting to C. arabica in wild populations or on an intermediate host, such as another
Coffea species, these scenarios are not supported by our data. A more plausible explanation could be
a downward bias in our molecular rate calibration resulting from the time dependency of
molecular rate estimates (Ho et al. 2005). It has recently been shown that estimated rates in recent
lineages increase exponentially towards the present, biasing divergence time estimates to higher
values when old calibrations are employed (Ho et al. 2011). This would leave room for fitting the
more recent historical time estimates in this molecular-derived scenario.
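The text does not state how the 5 600-year separation was obtained. One reading consistent with the *BEAST estimates reported in the Results is the gap between the divergence from the closest non-C. kahawae relative (isolate Cg432, 7 840 years) and the onset of C. kahawae's own diversification (median TMRCA, 2 219 years). Because it subtracts two summary values, this figure carries no credibility interval of its own.

```latex
\Delta t \;\approx\; T_{\mathrm{div}}(\textit{C. kahawae},\ \mathrm{Cg432}) \;-\; T_{\mathrm{MRCA}}(\textit{C. kahawae})
\;\approx\; 7\,840 - 2\,219 \;=\; 5\,621 \;\approx\; 5\,600\ \text{years}
```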
The close phylogenetic relationship between C. kahawae and UG1 also raises the question of
whether they represent good species or simply structured populations from a single species. Even
though we lacked a solid basis to consider both groups as separate species solely from a
phylogenetic standpoint, our results from the pathogenicity tests showed that, unlike C. kahawae,
isolates from UG1 were unable to cause Coffee Berry Disease on detached green berries and
neither group was able to infect ripe berries. Green Arabica coffee berries thus seem to present an
ecological source of divergent selection to which C. kahawae successfully adapted, while isolates
from UG1 appear to suffer a sharp fitness decrease that greatly compromises their survival. At
the molecular level, C. kahawae and UG1 also seem to represent well differentiated groups
according to a combination of significant and elevated differentiation indexes across all studied
loci and a complete segregation of polymorphic sites. Under the isolation with migration model,
migration estimates revealed an absence of gene flow into C. kahawae but, unexpectedly, a
moderate amount of gene flow into UG1. The transfer of genetic material from C.
kahawae to UG1 after their divergence is intriguing because it implies the occurrence of a sexual
stage of C. kahawae in nature as well as the existence of a suitable alternate host or substrate
where mating could occur, neither of which has yet been reported (Firman & Waller 1977, Bridge
et al. 2008). Nevertheless, the migration signal in our data was not strong enough to reject
a strict isolation model, and future work with a larger sampling of the UG1 group
and more variable loci will be required to resolve this question. Overall, it is reasonable to conclude that C. kahawae and
UG1 represent ecologically distinct and isolated groups that evolved significantly different
pathogenic abilities in a short period of time, from which only C. kahawae should trigger major
disease control and biosecurity measures on C. arabica crops.
During the brief period that C. kahawae and UG1 were separated, historical demographic
reconstructions and patterns of DNA polymorphism consistently showed a sudden and severe
drop in the effective population size of C. kahawae, as might be expected following a host-jump.
For such a host-jump to occur, isolates from the UG1 group must have been within the cruising
range of C. arabica during the initial stages of C. kahawae speciation, even if the geographic
distribution of both groups became disjunct in later stages. In such a scenario, the pattern of low
genetic diversity in C. kahawae could have been produced by strong disruptive
selection during the first stages of adaptation to C. arabica, coupled with the ability of the fungus
to undergo repeated cycles of asexual propagation. Asexual reproduction could rapidly drive
new advantageous mutations to very high frequencies, carrying the entire genome along by
hitchhiking (Andolfatto 2001). This would eliminate polymorphisms and retain only the intact
genomes of those individuals carrying the favored mutations, which would account for the
strong genetic bottleneck and clonal pattern observed in C. kahawae as well as the lack of shared
polymorphisms with UG1.
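The hitchhiking argument can be illustrated with a toy haploid Wright-Fisher model without recombination or new mutation, in which one genome acquires a beneficial mutation. The sketch is not an analysis from this study: population size, number of neutral sites and selection coefficient are hypothetical. It shows that once the mutation fixes, every genome descends from the single mutant, so all standing neutral variation is lost.

```python
import random
from typing import List, Tuple

Genome = Tuple[int, Tuple[int, ...]]  # (beneficial allele 0/1, neutral sites)


def clonal_sweep(pop_size: int = 200, n_neutral: int = 20, s: float = 0.5,
                 seed: int = 1, max_attempts: int = 1000) -> List[Genome]:
    """Simulate asexual selection until a single new beneficial mutation fixes.

    The run restarts if the mutation is lost by drift (likely while it is rare).
    """
    rng = random.Random(seed)
    for _ in range(max_attempts):
        # Standing neutral variation: random 0/1 alleles at every neutral site.
        pop = [(0, tuple(rng.randint(0, 1) for _ in range(n_neutral)))
               for _ in range(pop_size)]
        pop[0] = (1, pop[0][1])  # one genome acquires the advantageous mutation
        while 0 < sum(g[0] for g in pop) < pop_size:
            weights = [1.0 + s if g[0] else 1.0 for g in pop]
            # Asexual reproduction: each offspring copies one whole parental genome.
            pop = rng.choices(pop, weights=weights, k=pop_size)
        if pop[0][0] == 1:  # fixed rather than lost
            return pop
    raise RuntimeError("beneficial mutation never fixed")


def neutral_haplotypes(pop: List[Genome]) -> int:
    """Number of distinct neutral haplotypes in the population."""
    return len({g[1] for g in pop})


if __name__ == "__main__":
    swept = clonal_sweep()
    print("distinct neutral haplotypes after sweep:", neutral_haplotypes(swept))
```

With recombination, neutral sites could escape the sweep, which is why the text links the genome-wide loss of diversity to predominantly asexual propagation.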
Adaptation to a new host is generally most efficient when gene flow of ancestral genes into the
adapting population is severely reduced or absent (Giraud et al. 2010), which may be difficult to
achieve when the source population lies within cruising range. Nonetheless, two significant
intrinsic barriers to gene flow may have indeed evolved between C. kahawae and UG1 that can
explain their high differentiation and low levels of gene flow. First, the process of host
specialization itself can give rise to an isolation mechanism from the earliest stages of divergence
in fungal populations (Gladieux et al. 2010). Because most pathogenic Ascomycetes
(Colletotrichum included; Cisar & TeBeest 1999) can only mate on their host after mycelial
development, any genetic variation favoring an adaptation to C. arabica can pleiotropically cause
assortative mating and restrict gene flow, as any unfit immigrant will be unable to grow and
reproduce in such environment (Giraud 2006; Giraud et al. 2010). This means that the differential
survivorship of ill-adapted UG1 and adapted C. kahawae isolates on C. arabica constitutes a
legitimate and significant pre-zygotic reproductive barrier, recently named immigrant inviability
(Nosil et al. 2005). The unique adaptation of C. kahawae could be viewed as a “magic trait”
scenario (Gavrilets 2004), where assortative mating arises as a by-product of host-specialization.
Under strong selection, this barrier has already been shown to quickly and significantly prevent
gene flow in other fungal pathogens undergoing adaptation to their hosts, such as sympatric
populations of Venturia inaequalis on apple varieties with and without the Vf resistance gene
(Gladieux et al. 2011). Second, the transition of C. kahawae to a predominantly asexual
reproductive strategy could have been a major reproductive barrier, allowing for multiple
generations of selection for local adaptation without the pernicious effect of recombination
(Zeigler 1998, Giraud et al. 2010). Altogether, it is reasonable to expect that the combination of
immigrant inviability and the predominantly asexual behavior of C. kahawae would have been
rather effective in keeping populations separated during the early stages of divergence. However,
additional work with a strategic and focused sampling of the UG1 group will be required to better
understand not only the geography of the latter stages in C. kahawae speciation but also the
mechanisms underlying the adaptation of this fungal pathogen to C. arabica.
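The immigrant inviability argument can be reduced to a one-generation calculation. If mating happens only after mycelial growth on the host, an immigrant genome enters the mating pool in proportion to its survival there. The sketch below assumes a simple model in which a fraction m of arrivals are immigrants with relative survival w_immigrant against residents' w_resident. All parameter values are hypothetical and none are estimated in this study.

```python
def immigrant_contribution(m: float, w_immigrant: float, w_resident: float = 1.0) -> float:
    """Fraction of the mating pool made up of immigrant genomes after selection on the host.

    m: proportion of immigrants among arrivals, before selection.
    w_immigrant, w_resident: relative survival/growth on the host before mating.
    """
    if not 0.0 <= m <= 1.0:
        raise ValueError("m must be between 0 and 1")
    immigrants = m * w_immigrant
    residents = (1.0 - m) * w_resident
    total = immigrants + residents
    return immigrants / total if total > 0 else 0.0


if __name__ == "__main__":
    for w in (1.0, 0.1, 0.0):  # equal fitness, poor growth, inability to grow
        print(f"w_immigrant={w}: immigrant share of matings = {immigrant_contribution(0.1, w):.4f}")
```

With equal fitness the immigrant share equals m. As immigrant survival falls toward zero, so does their genetic contribution, with no separate mating-preference mechanism required.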
Phylogeography of C. kahawae
Knowledge of the phylogeography of C. kahawae can provide important information to help
pinpoint potential regions where its source population occurred. In fact, our genetic
analysis revealed the first consistent and unambiguous population structure of C. kahawae. Based
on single nucleotide polymorphisms, three slightly divergent haplotypes highly correlated with
their geographic distribution were found (Angola, Cameroon and East Africa), confirming
previous indications of an incipient geographic structuring (Bridge et al. 2008). Strikingly, the
ancestral state inference suggests that the Angola haplotype is the most ancestral among the
studied isolates, while those from Kenya and the remaining East African countries clustered
together as the most derived lineage. This provides an alternative view for the geographic origin
of C. kahawae, as compared to the current understanding, which follows the premise that C.
kahawae co-evolved with coffee hosts and is based on disease reports and field observations.
Arabica coffee occupies only a small fraction of the plantations in Angola, mainly in the central
plateau, while 98% of coffee production derives from C. canephora varieties. The first
introductions of C. arabica into this country occurred in the 18th century (A. Mendes Gaspar,
personal communication), and the first reports of Coffee Berry Disease date back to 1930
(Beynon et al. 1995), just eight years after its discovery in Kenya. However, we stress that
information obtained from historical data can be flawed because it is biased towards regions
where disease incidence is higher, and where substantial scientific effort has been focused on
monitoring plant diseases. Moreover, as we suggested above, the ancestral population of C.
kahawae most likely emerged from hosts other than Coffea spp., which may circumvent the
difficulties in reconciling our results with historical data, since speciation via host-shift can be
rather swift. Moreover, given that the transport of infected plant material has reached an
unprecedented global scale (Stukenbrock & McDonald 2008), a host-jump of the ancestral
population of C. kahawae onto the newly arrived C. arabica plants in Angola becomes all the
more plausible.
The population structure of C. kahawae also reveals no evidence of migration between the
geographical locations of each haplotype. This may be explained by rare sequential
introductions followed by geographic isolation. Arabica coffee growing areas in Angola,
Cameroon and East Africa are separated by extensive lowland areas, which are not suitable for
the pathogen or the host (Firman & Waller 1977), thus representing a potentially effective barrier
to migration. In such a scenario, bottleneck events during these rare introductions can generate drift
pulses in each of the introduced populations, which become genetically differentiated from each
other while retaining their source-introduction relationship (Estoup & Guillemaud 2010).
Accordingly, our results suggest that, after a hypothetical origin in Angola, the pathogen was
introduced into Cameroon and from there into the East African countries, while each of
the established populations remained isolated in its respective highland areas. However, in
invasion biology, evolutionary scenarios are often characterized by short divergence times, which
may decrease the likelihood of identifying the true source-sink relationship between populations
due to the stochasticity of the process (Estoup & Guillemaud 2010). Moreover, given the vastness
of coffee plantations in the East African countries, particularly in Ethiopia where C. arabica also
occurs naturally, we cannot exclude the presence of unsampled ancestral haplotypes in these
regions that could alter our inferences. However, in such a scenario, the large number of
additional steps required to explain the current phylogeographic pattern renders this hypothesis
unlikely. Thus, even though our dataset suggests an Angolan origin for C. kahawae, sampling
more isolates and polymorphic loci will certainly provide a much more reliable and robust insight
into the evolution of C. kahawae’s populations.
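The drift-pulse argument for serial introductions can be quantified under a standard simplification. In a haploid Wright-Fisher population of size N, expected neutral heterozygosity decays as H_t = H_0 (1 − 1/N)^t, ignoring mutation and migration. The sketch below chains this over successive phases, e.g. a source population followed by small founding populations. All sizes and durations are hypothetical, and the calculation gives only expectations. It does not capture the stochasticity that, as noted above, can obscure source-sink relationships.

```python
from typing import Iterable, Tuple


def expected_heterozygosity(h0: float, phases: Iterable[Tuple[int, int]]) -> float:
    """Expected neutral heterozygosity after drift through successive phases.

    Each phase is (N, t): t generations at haploid population size N, using
    H_t = H_0 * (1 - 1/N)^t (no mutation, no migration).
    """
    h = h0
    for n, t in phases:
        if n < 1 or t < 0:
            raise ValueError("N must be >= 1 and t must be >= 0")
        h *= (1.0 - 1.0 / n) ** t
    return h


if __name__ == "__main__":
    # Hypothetical serial introductions: source, first founding, second founding.
    source = [(10_000, 100)]
    first = source + [(10, 5)]
    second = first + [(10, 5)]
    for label, phases in [("source", source), ("1st intro", first), ("2nd intro", second)]:
        print(f"{label}: H = {expected_heterozygosity(0.5, phases):.4f}")
```

Each founding event removes a further share of diversity, so later introductions in a chain are expected to carry a subset of the variation of earlier ones.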
Conclusions
Altogether, our work represents an important step in understanding the evolutionary and
speciation history of C. kahawae, with implications for, and new directions in, future research
on this pathogen. Having found very little support for the current understanding of these
events, we postulate an alternative hypothesis of rapid ecological speciation by a very recent
host-jump and subsequent host specialization to explain the emergence of C. kahawae on Arabica
coffee. Two far-reaching and intriguing implications of this hypothesis are that new and severe
fungal pathogens are indeed able to successfully adapt and speciate in a remarkably short time
span, particularly when driven by divergent natural selection, and that intrinsic and common
biological traits of fungal populations may greatly facilitate their emergence. Our results also
highlight the power and value of molecular data when inferring dissemination patterns of
emerging pathogens. Unlike the limited information retrieved from historical data, the population
structure of C. kahawae revealed an alternative and more objective center of origin and that the
topography of the African continent may have had a pivotal role in shaping and limiting its
dispersal.
Acknowledgments
At FCUL we thank our colleagues, Ana Vieira and Tiago Jesus, for discussions and constructive
criticisms during the elaboration of this manuscript. At CIFC/IICT, we appreciate the technical
support provided by Sandra Sousa Emídio. For supplying additional and valuable isolates for this
work we are also indebted to Ana Paula Ramos, at ISA/UTL, Portugal, and Peter Johnston and
Bevan Weir, at Landcare Research, New Zealand. Lei Cai acknowledges grants CAS-KSCX2-
EW-J-6/NSFC31070020. This work was financially supported by the Portuguese Foundation for
Science and Technology (FCT) and IICT, Ministério da Ciência, Tecnologia e Ensino Superior,
Portugal.
References
Andolfatto P (2001) Adaptive hitchhiking effects on genome variability. Current Opinion in
Genetics & Development, 11, 635–641.
Aylor DL, Price EW, Carbone I (2006) SNAP: combine and map modules for multilocus
population genetic analysis. Bioinformatics, 22, 1399–1401.
Bandelt HJ, Forster P, Röhl A (1999) Median-joining networks for inferring intraspecific
phylogenies. Molecular Biology and Evolution, 16, 37–48.
Beynon S, Coddington A, Lewis BG, Várzea V (1995) Genetic variation in the Coffee Berry
Disease pathogen, Colletotrichum kahawae. Physiological and Molecular Plant Pathology, 46,
457–470.
Bigger M (2006) The dissemination of coffee cultivation throughout the world. Tropical
Agriculture Association Newsletter, 26, 15–19.
Bridge PD, Waller JM, Davies D, Buddie AG (2008) Variability of Colletotrichum kahawae in
relation to other Colletotrichum species from tropical perennial crops and the development of
diagnostic techniques. Journal of Phytopathology, 156, 274–280.
Brown A, Sreenivasaprasad S, Timmer L (1996) Molecular characterization of slow-growing
orange and key lime anthracnose strains of Colletotrichum from citrus as C. acutatum.
Phytopathology, 86, 523–527.
Brown JM, Lemmon AR (2007) The importance of data partitioning and the utility of Bayes
factors in Bayesian phylogenetics. Systematic Biology, 56, 643–655.
Cannon PF, Buddie AG, Bridge PD (2008) The typification of Colletotrichum gloeosporioides.
Mycotaxon, 104, 189–204.
Cisar C, TeBeest D (1999) Mating system of the filamentous ascomycete, Glomerella cingulata.
Current Genetics, 35, 127–133.
Couch BC, Fudal I, Lebrun M-H, Tharreau D, Valent B, van Kim P, Nottéghem J-L, Kohn LM
(2005) Origins of host-specific populations of the blast pathogen Magnaporthe oryzae in crop
domestication with subsequent expansion of pandemic clones on rice and weeds of rice.
Genetics, 170, 613–630.
Coyne JA, Orr HA (2004) Speciation. Sinauer Associates, Sunderland, Massachusetts.
Crouch JA, Tredway L, Clarke B, Hillman B (2009) Phylogenetic and population genetic
divergence correspond with habitat for the pathogen Colletotrichum cereale and allied taxa across
diverse grass communities. Molecular Ecology, 18, 123–135.
Dieckmann U, Doebeli M, Metz JAJ, Tautz D (2004) Adaptive Speciation. Cambridge University
Press, Cambridge.
Drummond AJ, Rambaut A (2007) BEAST: Bayesian evolutionary analysis by sampling trees.
BMC Evolutionary Biology, 7, 214.
Eastman JR (2006) IDRISI Andes, Clark University, Worcester, Massachusetts.
Estoup A, Guillemaud T (2010) Reconstructing routes of invasion using genetic data: why, how
and so what? Molecular Ecology, 19, 4113–4130.
Excoffier L, Laval G, Schneider S (2005) Arlequin ver. 3.0: An integrated software package for
population genetics data analysis, Evolutionary Bioinformatics Online, 1, 47–50.
Firman I, Waller J (1977) Coffee Berry Disease and other Colletotrichum diseases of coffee.
Phytopathological Papers, 20, 1–53.
Gavrilets S (2004) Fitness Landscapes and the Origin of Species. Princeton University Press,
Princeton, New Jersey.
Giraud T (2006) Selection against migrant pathogens: the immigrant inviability barrier in
pathogens. Heredity, 97, 316–318.
Giraud T, Gladieux P, Gavrilets S (2010) Linking the emergence of fungal plant diseases with
ecological speciation. Trends in Ecology and Evolution, 25, 387–395.
Gladieux P, Caffier V, Devaux M, Le Cam B (2010) Host-specific differentiation among
populations of Venturia inaequalis causing scab on apple, pyracantha and loquat. Fungal
Genetics and Biology, 47, 511–521.
Gladieux P, Guérin F, Giraud T, Caffier V, Lemaire C, Parisi L, Didelot F, Le Cam B (2011)
Emergence of novel fungal pathogens by ecological speciation: importance of the reduced
viability of immigrants. Molecular Ecology, 20, 4521–4532.
Heled J, Drummond AJ (2010) Bayesian inference of species trees from multilocus data.
Molecular Biology and Evolution, 27, 570–580.
Hendry AP, Nosil P, Rieseberg LH (2007) The speed of ecological speciation. Functional
Ecology, 21, 455–464.
Hey J, Nielsen R (2004) Multilocus methods for estimating population sizes, migration rates and
divergence time, with application to the divergence of Drosophila pseudoobscura and D.
persimilis. Genetics, 167, 747–760.
Hey J, Nielsen R (2007) Integration within the Felsenstein equation for improved Markov chain
Monte Carlo methods in population genetics. Proceedings of the National Academy of Sciences
USA, 104, 2785–2790.
Ho SYW, Phillips MJ, Cooper A, Drummond AJ (2005) Time dependency of molecular rate
estimates and systematic overestimation of recent divergence times. Molecular Biology and
Evolution, 22, 1561–1568.
Ho SYW, Lanfear R, Bromham L, Phillips MJ, Soubrier J, Rodrigo AJ, Cooper A (2011) Time-
dependent rates of molecular evolution. Molecular Ecology, 20, 3087–3101.
Hudson RR (2000) A new statistic for detecting genetic differentiation. Genetics, 155, 2011–
2014.
Hudson RR, Kaplan N (1985) Statistical properties of the number of recombination events in the
history of a sample of DNA sequences. Genetics, 111, 147–164.
Jones KE, Patel NG, Levy MA, Storeygard A, Balk D, Gittleman JL, Daszak P (2008) Global
trends in emerging infectious diseases. Nature, 451, 990–993.
Kareiva P, Watts S, McDonald R, Boucher T (2007) Domesticated nature: shaping landscapes
and ecosystems for human welfare. Science, 316, 1866–1869.
Kasuga T, White TJ, Taylor JW (2002) Estimation of nucleotide substitution rates in
Eurotiomycete fungi. Molecular Biology and Evolution, 19, 2318–2324.
Katoh K, Toh H (2009) Recent developments in the MAFFT multiple sequence alignment
program. Briefings in Bioinformatics, 9, 286–298.
Manuel L, Talhinhas P, Várzea V, Neves-Martins J (2010) Characterization of Colletotrichum
kahawae isolates causing coffee berry disease in Angola. Journal of Phytopathology, 158, 310–
313.
McDonald J (1926) A preliminary account of a disease of green coffee berries in Kenya.
Transactions of the British Mycological Society, 11, 145–154.
Minin VN, Bloomquist EW, Suchard MA (2008) Smooth skyride through a rough skyline:
Bayesian coalescent-based inference of population dynamics. Molecular Biology and Evolution,
25, 1459–1471.
Munday PL, van Herwerden L, Dudgeon CL (2004) Evidence for sympatric speciation by host shift
in the sea. Current Biology, 14, 1498–1504.
Nei M (1987) Molecular Evolutionary Genetics. Columbia University Press, New York.
Nosil P, Vines TH, Funk DJ (2005) Reproductive isolation caused by natural selection against
immigrants from divergent habitats. Evolution, 59, 715–719.
Nutman F, Roberts F (1960) Investigations on a disease of Coffea arabica caused by a form of
Colletotrichum coffeanum Noack. I. Some factors affecting infection by the pathogen.
Transactions of the British Mycological Society, 43, 489–505.
Nylander JAA (2004) MrModeltest v2. Program distributed by the author. Evolutionary Biology
Center, Uppsala University.
O’Donnell K, Cigelnik E (1997) Two divergent intragenomic rDNA ITS2 types within a
monophyletic lineage of the fungus Fusarium are nonorthologous. Molecular Phylogenetics and
Evolution, 7, 103–116.
Pina-Martins F, Paulo OS (2008) Concatenator: sequence data matrices handling made easy.
Molecular Ecology Resources, 8, 1254–1255.
Posada D, Crandall KA (1998) Modeltest: testing the model of DNA substitution. Bioinformatics,
14, 817–818.
Prihastuti H, Cai L, Chen H, McKenzie E, Hyde KD (2009) Characterization of Colletotrichum
species associated with coffee berries in northern Thailand. Fungal Diversity, 39, 89–109.
Rambaut A, Drummond AJ (2007) Tracer v1.4. Available from http://beast.bio.ed.ac.uk/Tracer.
Robinson RA (1974) Terminal report of the FAO coffee pathologist to the government of
Ethiopia. FAO, Rome AGO/74/443, 16pp.
Robinson RA (1976) Plant Pathosystems. Springer-Verlag, Berlin.
Ronquist F, Huelsenbeck JP (2003) MrBayes 3: Bayesian phylogenetic inference under mixed
models. Bioinformatics, 19, 1572–1574.
Rozas J, Sánchez-DelBarrio JC, Messeguer X, Rozas AR (2003) DnaSP: DNA polymorphism
analyses by the coalescent and other methods. Bioinformatics, 19, 2496–2497.
Rundle HD, Nosil P (2005) Ecological speciation. Ecology Letters, 8, 336–352.
Schluter D (2009) Evidence for ecological speciation and its alternative. Science, 323, 737–741.
Schluter D, Conte GL (2009) Genetics and ecological speciation. Proceedings of the National
Academy of Sciences USA, 106, 9955–9962.
Servedio MR, Doorn GSV, Kopp M, Frame AM, Nosil P (2011) Magic traits in speciation:
“magic” but not rare? Trends in Ecology and Evolution, 26, 389–397.
Shimodaira H, Hasegawa M (1999) Multiple comparisons of log-likelihoods with applications to
phylogenetic inference. Molecular Biology and Evolution, 16, 1114–1116.
Silva DN, Talhinhas P, Cai L, Várzea V, Paulo OS, Batista D (2012) Application of the
Apn2/MAT locus to improve the systematics of the Colletotrichum gloeosporioides complex: An
example from coffee (Coffea spp.) hosts. Mycologia, DOI:10.3852/11-145.
Stukenbrock E, McDonald B (2008) The origins of plant pathogens in agro-ecosystems. Annual
Review of Phytopathology, 46, 75–100.
Swofford DL (2003) PAUP*: Phylogenetic Analysis Using Parsimony (* and Other Methods),
Version 4. Sinauer Associates, Sunderland, MA.
Taylor JW, Jacobson DJ, Kroken S, Kasuga T, Geiser DM, Hibbett DS, Fisher MC (2000)
Phylogenetic species recognition and species concepts in fungi. Fungal Genetics and Biology, 31,
21–32.
Via S (2001) Sympatric speciation in animals: the ugly duckling grows up. Trends in Ecology
and Evolution, 16, 381–390.
Winkler IS, Mitter C, Scheffer SJ (2009) Repeated climate-linked host shifts have promoted
diversification in a temperate clade of leaf-mining flies. Proceedings of the National Academy of
Sciences USA, 106, 18103–18108.
Zaffarano PL, McDonald BA, Linde CC (2008) Rapid speciation following recent host shifts in
the plant pathogenic fungus Rhynchosporium. Evolution, 62, 1418–1436.
Zeigler RS (1998) Recombination in Magnaporthe grisea. Annual Review of Phytopathology, 36,
249–275.
Data accessibility
List of isolates, natural host, country and geographic region: Table S1 in supporting information.
DNA sequences were deposited in the EMBL database with the accession numbers presented in
Table S1.
Multiple sequence alignments for each nuclear marker were archived in the Dryad repository:
doi:
*BEAST, GMRF Skyride and IMa input files were archived in the Dryad repository: doi:
Figure legends:
Fig. 1 50% majority-rule Bayesian tree of the concatenated six-marker dataset, illustrating the
evolutionary relationships between Colletotrichum kahawae and C. gloeosporioides s.l. from
coffee and other hosts. Isolates from Coffea spp. hosts are followed by a black bar, while isolates
from other hosts are followed by a grey bar and a host code, for which a key is provided on the
right. The geographic origin of each isolate is also given (underlined). The tree was rooted with
C. fragariae. For clarity, C. kahawae is collapsed into a single clade; its detailed phylogenetic
reconstruction is provided in Fig. 6a.
Fig. 2 Time-calibrated Bayesian species tree resulting from the *BEAST analysis of the
six-marker dataset under the best partitioning scheme selected by the Bayes factor (BF) analysis.
Posterior probabilities are given above each node. Scale numbers correspond to years before
present.
Fig. 3 GMRF Bayesian Skyride plot depicting fluctuations in the population size of
Colletotrichum kahawae through time. The x-axis is in years before present and the y-axis is
Neτ (effective population size multiplied by generation time). The thick black line is the median
estimate and the grey dashed lines delimit the 95% highest posterior density (HPD) interval. The
vertical dashed line marks the median time to the most recent common ancestor of C. kahawae
and its closest non-C. kahawae relative.
Fig. 4 Median-joining haplotype networks depicting the relationship between Colletotrichum
kahawae and Undescribed group 1 (UG1). Haplotypes are presented as pie charts with size
proportional to the number of individuals.
Fig. 5 Posterior probability distributions of the parameters estimated under the
isolation-with-migration model (IMa).
Fig. 6 Phylogeographic patterns of Colletotrichum kahawae. (a) Detailed phylogenetic
reconstruction of population relationships within the C. kahawae clade, using the combined
six-marker dataset. Bootstrap and posterior probability values are given above each branch. The
three-nucleotide combination below each branch represents the mutations that occurred during
the evolution of the three populations. The combination of the Angola population was inferred
as the ancestral state. Mutations are marked with an asterisk and their source gene is indicated.
KA, non-synonymous mutation; Ki, intronic mutation. (b) Geographic distribution of the three
divergent C. kahawae populations, with the location key in the upper left corner. Countries are
shaded in different colors according to the haplotypes present. Orange areas on the map
roughly indicate regions above 1,400 m altitude.
Table 1 Summary of the phylogenetic information for the individual and combined nuclear
regions used in this study

Nuclear region  FL(a)  Indels  NC(b)  PI(c)  %PI(d)  V(e)  Model(f)          Model(g)
ITS             489    2       487    14     3%      1     TrNef+I           SYM+I
β-tub2          575    34      541    76     13%     5     SYM+G             SYM+G
Apn25L          883    36      847    172    20%     32    GTR+G             GTR+G
ApMAT           782    84      698    287    37%     29    HKY+G             HKY+G
MAT5L           213    6       194    38     18%     11    TrN+G             HKY+G
MAT1-2-1(h)     843    1       842    141    17%     18    K81+I/GTR/SYM+G   GTR+I/GTR/SYM
Combined        3783   130     3606   764    20%     72    -                 -

(a) Fragment length
(b) Nucleotide characters (bp)
(c) Parsimony-informative characters (bp)
(d) Percentage of parsimony-informative characters
(e) Variable uninformative characters (bp)
(f) Best-fit model of DNA evolution under the AIC, selected with ModelTest
(g) Best-fit model of DNA evolution under the AIC, selected with MrModelTest
(h) Models of DNA evolution correspond to 1st and 2nd codon positions / 3rd codon position / intron
Table 2 Bayesian estimates of the time (in years) to the most recent common ancestor (TMRCA)
of four taxon sets of interest. Values in parentheses are 95% highest posterior density (HPD)
intervals. Divergence times were estimated with *BEAST v1.6.1 under the best partitioning
scheme previously selected by Bayes factor (BF) analysis

Taxon set                  TMRCA                    SD(a)   ESS(b)
Ck                         2,219 (320; 4,784)       64      428
Ck & UG1                   11,547 (3,574; 21,735)   133     1470
Ck & Cg432                 7,840 (2,020; 15,245)    142     743
Ck & Cg from Coffea spp.   403k (149k; 647k)        2.6k    2811

(a) Standard deviation of the mean
(b) Effective sample size
Ck, Colletotrichum kahawae; UG1, Undescribed group 1
Table 3 Divergence and differentiation between Undescribed group 1 (UG1) and
Colletotrichum kahawae

Locus      Da(a) (%)  V(b)  VF(c)  VS(d)  PCk(e)  PUG1(e)  FST(f)     Snn(f)       Rm(g)
ApMAT      0.09       7     0      0      0       7        0.536***   0.88596***   0
Apn25L     0.12       13    1      0      0       12       0.868***   0.93525***   0
ITS        0.03       4     0      0      0       4        0.634**    0.88596***   0
MAT1-2-1   0.31       7     2      0      1       4        0.845***   1***         0
MAT5L      0.35       2     0      0      0       2        0.913***   0.96610***   0
β-tub2     0.13       5     0      0      2       3        0.425**    0.84394***   0
Total      0.14       38    3      0      3       32       0.757***   1***

(a) Net divergence (Nei 1987)
(b) Total number of polymorphic sites
(c) Number of fixed polymorphisms
(d) Number of shared polymorphisms
(e) Number of polymorphisms exclusive to C. kahawae (PCk) and to UG1 (PUG1)
(f) Differentiation statistics FST (Excoffier et al. 2005) and Snn (Hudson 2000)
(g) Minimum number of recombination events
Statistical significance: ** P < 0.05; *** P < 0.01
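The two headline statistics in Table 3 can be illustrated with a small sketch. The code below is a minimal, hypothetical example, not the procedure used to produce the table: it assumes aligned, gap-free toy sequences and uncorrected p-distances (the group names `ck` and `ug1` and all sequences are invented), and computes Nei's (1987) net divergence, Da = Dxy − (Dx + Dy)/2, together with Hudson's (2000) nearest-neighbour statistic Snn. The permutation tests behind the significance levels are omitted.

```python
"""Toy illustration of Nei's net divergence (Da) and Hudson's Snn.

Assumptions (hypothetical, for illustration only): sequences are aligned,
gap-free and of equal length; distances are uncorrected p-distances.
"""
from itertools import combinations, product


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_within(group: list[str]) -> float:
    """Average pairwise distance within one group (nucleotide diversity)."""
    if len(group) < 2:
        raise ValueError("need at least two sequences per group")
    pairs = list(combinations(group, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def mean_between(g1: list[str], g2: list[str]) -> float:
    """Average pairwise distance between sequences of two groups (Dxy)."""
    return sum(p_distance(a, b) for a, b in product(g1, g2)) / (len(g1) * len(g2))


def net_divergence(g1: list[str], g2: list[str]) -> float:
    """Nei's (1987) net divergence: Da = Dxy - (Dx + Dy) / 2."""
    return mean_between(g1, g2) - (mean_within(g1) + mean_within(g2)) / 2


def snn(g1: list[str], g2: list[str]) -> float:
    """Hudson's (2000) Snn: mean share of each sequence's nearest
    neighbours that belong to its own group (ties are split evenly)."""
    labelled = [(s, 0) for s in g1] + [(s, 1) for s in g2]
    total = 0.0
    for i, (seq_i, lab_i) in enumerate(labelled):
        dists = [(p_distance(seq_i, seq_j), lab_j)
                 for j, (seq_j, lab_j) in enumerate(labelled) if j != i]
        d_min = min(d for d, _ in dists)
        nearest = [lab for d, lab in dists if d == d_min]
        total += sum(lab == lab_i for lab in nearest) / len(nearest)
    return total / len(labelled)


# Hypothetical toy data: two groups separated by two fixed differences.
ck = ["AAAAAAAAAA", "AAAAAAAAAT"]
ug1 = ["CCAAAAAAAA", "CCAAAAAAAG"]

if __name__ == "__main__":
    # Dxy = 0.275 and Dx = Dy = 0.1 for these toy groups.
    print(f"Da  = {net_divergence(ck, ug1):.3f}")
    print(f"Snn = {snn(ck, ug1):.2f}")
```

An Snn value of 1, as reported for MAT1-2-1 and for the combined data, means that every sequence's nearest neighbour belongs to its own group; values near 0.5 (for equal group sizes) would be expected if the two groups were not differentiated.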
Table 4 Maximum-likelihood estimates (MLE) and respective 95% highest posterior density
(HPD) intervals for parameters of the isolation-with-migration model for the clade composed of
Undescribed group 1 and Colletotrichum kahawae

                 θA      θCk     θUG1    mUG1>Ck   mCk>UG1
MLE              6.38    0.15    5.45    0.88      1.45
Lower 95% HPD    0       0.05    1.09    0         0
Upper 95% HPD    13.97   0.29    12.29   2.73      2.59
[Figure 1 image: Bayesian tree (scale bar 0.02) showing C. kahawae, UG1, UG2, UG3, C. gloeosporioides s.s., C. siamense, C. asianum, C. fructicola and the outgroup C. fragariae; node support symbols indicate posterior probability > 0.95 with bootstrap > 70 or < 70. Host key for non-Coffea isolates: Oe, Olea europaea; Cl, Citrus limon; Ca, Camellia sp.; Ke, Kunzea ericoides; Hp, Hypericum perforatum; Pa, Persea americana; Mi, Mangifera indica; Pt, Podocarpus totara; Vl, Vitex lucens; Co, Coprosma sp.]
[Figure 2 image: species tree of C. kahawae, UG1, UG2, UG3, C. gloeosporioides, C. siamense, C. asianum, C. fructicola and C. fragariae, with a time axis from 0 to 500,000 years before present]
[Figure 5 image: posterior probability curves for the population size parameters θA, θUG1 and θCk and for the migration parameters mCk > UG1 and mUG1 > Ck]