Advances in Coffea Genomics

This chapter was originally published in the book Advances in Botanical Research (Volume 53). The copy attached is provided by Elsevier for the author’s benefit and for the benefit of the author’s institution, for non-commercial research, and educational use. This includes without limitation use in instruction at your institution, distribution to specific colleagues, and providing a copy to your institution’s administrator. All other uses, reproduction and distribution, including without limitation commercial reprints, selling or licensing copies or access, or posting on open internet sites, your personal or institution’s website or repository, are prohibited. For exceptions, permission may be sought for such use through Elsevier's permissions site at: http://www.elsevier.com/locate/permissionusematerial From Alexandre de Kochko, Advances in Coffea Genomics. In: Jean-Claude Kader and Michel Delseny, editors, Advances in Botanical Research (Volume 53) Academic Press, 2010, p. 23. ISBN: 978-0-12-380872-1 © Copyright 2010, Elsevier Ltd. Academic Press. Provided for non-commercial research and educational use only. Not for reproduction, distribution or commercial use.


Author's personal copy

Advances in Coffea Genomics

ALEXANDRE DE KOCHKO,*,1 SELASTIQUE AKAFFOU,† ALAN C. ANDRADE,‡ CLAUDINE CAMPA,* DOMINIQUE CROUZILLAT,§ ROMAIN GUYOT,* PERLA HAMON,* RAY MING,|| LUKAS A. MUELLER,¶ VALERIE PONCET,* CHRISTINE TRANCHANT-DUBREUIL* AND SERGE HAMON*

*UMR DIAPC, GECOFA; Centre IRD de Montpellier, BP64501, 34394 Montpellier Cedex, France

†URES Daloa, B150, Daloa, Côte d’Ivoire

‡Laboratory of Molecular Genetics (LGM-NTBio), Embrapa Genetic Resources and Biotechnology, CP 02372, 70770-900 Brasília-DF, Brazil

§Nestle R&D Tours, 101 Av. G. Eiffel, Notre Dame d’Oé, BP 49716, 37097 Tours, Cedex 2, France

||Department of Plant Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA

¶Boyce Thompson Institute for Plant Research, Tower Road, Ithaca, NY 14853-1801, USA

I. Introduction

II. Molecular Markers, Genetic Maps and Cytogenetics

A. Molecular Markers and Genetic Diversity

B. Genetic Maps

C. QTL Identification

1Corresponding author: E-mail: [email protected]

Advances in Botanical Research, Vol. 53. Copyright 2010, Elsevier Ltd. All rights reserved.

0065-2296/09 $35.00. DOI: 10.1016/S0065-2296(10)53002-7


III. Genomic Resources

A. Coffea Genome Size and Cytogenetics

B. Expressed Sequence Tags in Coffea

C. BAC Libraries in Coffea

D. Genes and Metabolism

E. Bioinformatics: Coffee Genomic Resources Available on the World Wide Web

IV. Toward the Whole Genome Sequencing of Coffee

Acknowledgements

References

ABSTRACT

Coffee is the second most valuable commodity exported by developing countries. The Coffea genus comprises at least 103 species, but coffee production throughout the tropics relies on only two of them: Coffea canephora, better known as Robusta, which is self-sterile and diploid, and C. arabica, which is self-fertile and tetraploid. With the arrival of new analytical technologies and the start of genome sequencing projects, it was clearly time to review the state of the art of coffee genetics and genomics.

In the first part of this chapter, we present the main results concerning genetic diversity and phylogeny, the most advanced fields. These results are based on large molecular marker sets, such as random amplified polymorphic DNAs (RAPDs), amplified fragment length polymorphisms (AFLPs), intersimple sequence repeats (ISSRs), simple sequence repeats (SSRs) or the conserved orthologue set (COS), which are mainly polymerase chain reaction (PCR)-based. These markers also enable the construction of genetic maps and the identification of quantitative trait loci (QTLs) for both morphological and biochemical traits.

In the second part, after reviewing current knowledge on variation in coffee genome size and insights into cytogenetics, we focus on currently available genomic resources and web facilities. Large sets of expressed sequence tags (ESTs) and bacterial artificial chromosome (BAC) libraries have been obtained for both C. canephora and C. arabica, along with information on genes and specific metabolic pathways.

In the final section, we describe recently designed tools and their ultimate goal, which is to facilitate the sequencing, assembly and annotation of the first Coffea genome. We are at the gate of a new era of scientific approaches to coffee that should lead to a better understanding of phylogenetic relationships and genome evolution within the genus. Finally, taken together, this information should help develop improved varieties to meet the new challenges represented by ongoing radical changes in the environment.

I. INTRODUCTION

Coffee is the fourth most valuable traded agricultural commodity (FAO Statistics, 2004) and the second most valuable commodity exported by developing countries. More than 75 million people depend on coffee for all or most of their livelihood (Pendergrast, 2009). Coffee is currently produced throughout the tropics, and the two main cultivated species are Coffea canephora (better known as Robusta) and C. arabica (Fig. 1).

The Coffea genus belongs to the Rubiaceae family, one of the largest flowering plant families, comprising about 650 genera and 13,000 species (Rova et al., 2002). All Coffea species are native to Africa, Madagascar and the Mascarenes. Chevalier (1947) and Coste (1955) formerly divided the genus into three botanical sections: Eucoffea (West and Central African species), Mozambicoffea (East African species) and Mascarocoffea (Malagasy and Mascarene species). It is now organized into at least 103 species (Maurin et al., 2007). All Coffea species except C. arabica (an amphidiploid with 2n = 4x = 44 chromosomes) are diploid (2n = 2x = 22 chromosomes). Their diploid genome size, estimated by flow cytometry (Cros et al., 1995; Noirot et al., 2003), varies from 1.03 to 1.76 pg of DNA per nucleus.

From the early 1970s to the late 1980s, wild Coffea species were collected in contrasting habitats in Madagascar and in African tropical forests, with two main objectives: to preserve their genetic resources and to produce interspecific hybrids for breeding programs. The genus is impressively diverse, with extreme phenotypes ranging from small shrubs in East African dry forests to 20-m-high trees in West African tropical rain forests. This makes it an ideal model for exploring diversification and adaptation processes.

During the 1980s, plant geneticists and breeders mostly focused on the construction of genetic linkage maps and the identification of quantitative trait loci (QTLs). Bernatzky and Tanksley (1986) first demonstrated that efficient linkage mapping strategies can be applied to plants with a high number of polymorphic loci. For coffee, breeders were mostly interested in acquiring a genetic map of C. arabica. However, Arabica poses two main problems: its cultivated forms come from a very narrow genetic base (founder effect), and the Ethiopian genetic resources are not available to geneticists. C. canephora, by contrast, was not a priority for breeders because of its lower coffee quality. This was despite its diploid status, its naturally huge diversity, and the fact that it is the main source of instant coffee. This explains why coffee genetics was not at the cutting edge during this period.

Fig. 1. Coffee tree: Coffea canephora. (A) Tree carrying fruits. (B) Flowers. (C) Mature fruits.

During the 1990s, rapid progress in plant molecular science led to an explosion in Arabidopsis research. The initial objectives were the free availability of most resources and the completion of the Arabidopsis genome sequence, and they rapidly stimulated projects focused on other types of plants. In parallel, the synteny concept was extended to address questions of chromosome homeology (mostly within the grass family). This made it possible to extrapolate results from one species to a related species (Devos et al., 1995). Rice genetic maps were extrapolated to sorghum, sugarcane, etc., and Arabidopsis data were extrapolated within the Brassicaceae (Cruciferae) family, specifically to cultivated Brassica species. Similarly, Bonierbale et al. (1988) described a strong colinearity between tomato and potato chromosomes (Solanaceae family). In the 1990s, about 10 years after these pioneers, coffee researchers were only just beginning to develop their own molecular markers.

In the early years of the 21st century, new sequencing possibilities changed our vision of genomics, which has become crucial for coffee, a plant that, like tomato, can serve as a model. The SOLanaceae genome project (SOL network) chose a strategy based on sequencing 220 Mb of euchromatin. This fraction was believed to contain most genes, particularly those involved in traits such as fleshy fruit formation, which are not found in Arabidopsis or rice. Today, whole genome sequencing (WGS) is routinely performed. With the rapid emergence of new technologies, an increasing number of species, including C. canephora, will be sequenced in the next few years. In contrast to previous approaches, these new technologies offer access to all the genes of a genome, provide information about global genome structure, and allow analysis of regulatory regions, transposable elements and noncoding sequences. Coupled with highly automated tools, these approaches have created unprecedented opportunities for generating and analyzing large biological data sets. The opportunity to systematically compare complete nucleic acid sequences from very different organisms has fundamentally modified the way biologists undertake genome studies.


For coffee, an important goal remains the sequencing of the economically predominant species, C. arabica. However, its allotetraploid nature makes it difficult to assemble its genome sequence properly. The scientific community therefore agreed to concentrate its efforts first on C. canephora, the diploid cultivated species. This decision was facilitated by the availability of dihaploid genotypes of C. canephora, obtained using the technique developed by Couturon (1986). These dihaploid genotypes, combined with the rise of new technologies, suddenly opened the way to sequencing a whole Coffea genome. In this chapter, we present an up-to-date review of the main advances in coffee genomics, with special emphasis on structural genomics, functional genomics, bioinformatics and WGS strategies.

II. MOLECULAR MARKERS, GENETIC MAPS AND CYTOGENETICS

Initial coffee molecular genetic data were obtained using isozymes from 18 populations of wild species (Berthou et al., 1980). Restriction fragment length polymorphisms (RFLPs) were used only for a short period, because these markers require large amounts of DNA and yielded few markers. During the 1990s, the development of polymerase chain reaction (PCR)-based technologies paved the way for a great number of molecular markers, such as random amplified polymorphic DNAs (RAPDs) and amplified fragment length polymorphisms (AFLPs). The real change, however, came with the advent of simple sequence repeat (SSR), single nucleotide polymorphism (SNP) and conserved ortholog set (COS) markers.

In coffee, as in other crops, molecular markers were mostly used to (i) assess the genetic diversity of the species, (ii) construct genetic maps, and (iii) identify QTLs.

A. MOLECULAR MARKERS AND GENETIC DIVERSITY

The first paper reporting a molecular analysis of coffee DNA was published by Berthou et al. (1983). Using RFLP analysis, they compared chloroplast and mitochondrial DNA from nine species or taxa of coffee trees to assess their phylogenetic relationships. Three types of chloroplast DNA (cpDNA) were detected, indicating the following relationships: (i) C. arabica, C. eugenioides; (ii) C. canephora, C. congensis, ‘nana’ taxon; and (iii) C. liberica. The mitochondrial DNA (mtDNA) separated into five types: (i) C. arabica, C. eugenioides, C. congensis; (ii) C. canephora, ‘nana’ taxon; (iii) C. excelsa; (iv) C. liberica; and (v) Paracoffea ebracteolata. The divergence in organelle DNAs agreed with the phylogenetic relationships deduced using the conventional methods of the time. Restriction patterns of the cpDNA and mtDNA from a clone of C. arabusta (a hybrid between C. canephora and C. arabica) were compared with those of its parents, and both organelle genomes were found to be maternally inherited.

RFLPs, which rely on DNA–DNA hybridization with homologous and/or homeologous probes, were the first molecular markers used to build plant genetic linkage maps (Bernatzky and Tanksley, 1986; McCouch et al., 1988). In coffee, they were also used to construct the first molecular linkage map (Paillard et al., 1996). PCR-based markers such as RAPDs allowed researchers to get around the problem of DNA quantity. Despite the low RAPD reproducibility often reported in the literature, these markers are still used today to assess genetic diversity within C. arabica (Masumbuko and Bryngelsson, 2006; Sera, 2001; Silveira et al., 2003) and C. canephora (Ferrao et al., 2009; Tshilenge et al., 2009).

The development of AFLPs by Vos et al. (1995) made it possible to obtain numerous reproducible and informative dominant markers in only a few experiments, even if reproducibility between laboratories was not ideal. In addition, these markers are generally widely distributed throughout the genome. Mostly in C. arabica, they have proved useful to (i) identify genetic diversity among cultivars (Dessalegn et al., 2008); (ii) detect introgression in cultivars derived from natural interspecific hybrids (Prakash et al., 2004; Steiger et al., 2002); (iii) analyze the genetics of resistance (Gichuru et al., 2008; Herrera et al., 2009); (iv) construct a genetic map of C. arabica (Pearl et al., 2004); and (v) clarify taxonomic debates between C. liberica var. liberica and C. liberica var. dewevrei, in combination with the analysis of morphological traits and the study of male fertility in interspecific F1 hybrids (N’Diaye et al., 2005; Poncet et al., 2005). AFLPs are often used to analyze diversity (and for genetic mapping), but very little information is available on their sequence characteristics. In a single Coffea genome (C. pseudozanguebariae), species-specific sequences associated with clustered or nonclustered AFLP loci of known genetic position have been analyzed. These AFLP markers were then converted into sequence-characterized amplified region (SCAR) anchor markers. This made it possible to determine how well these sequences are conserved across Coffea species in relation to species relatedness (Poncet et al., 2005).

More recently, intersimple sequence repeat (ISSR) and inverse sequence-tagged repeat (ISTR) markers have been used in several studies (Aga and Bryngelsson, 2006; Masumbuko and Bryngelsson, 2006; Ruas et al., 2003). In coffee, however, these kinds of markers are not widely used. Nevertheless, they confirmed the low genetic diversity within cultivated C. arabica in Ethiopia (Aga and Bryngelsson, 2006; Aga et al., 2003, 2005; Zeltz et al., 2005).

SSRs, or microsatellite markers, appeared later with the development of new technologies, especially for DNA sequencing. They are highly polymorphic and codominant, so all alleles present at a given locus can be detected, and they opened the way for very informative molecular markers. Hundreds of markers were obtained from microsatellite libraries, bacterial artificial chromosome (BAC) end sequences, or expressed sequence tag (EST) libraries of both C. arabica (Baruah et al., 2003; Rovelli et al., 2000) and C. canephora (Dufour et al., 2001; Hendre et al., 2008; Leroy et al., 2005; Poncet et al., 2007). Primers derived from these sequences generally exhibit broad cross-species transferability (Baruah et al., 2003; Combes et al., 2000; Moncada and McCouch, 2004; Poncet et al., 2004). Even with their high resolving power, SSRs confirmed that C. arabica (cultivated and wild forms) has fewer alleles per locus than C. canephora (Cubry et al., 2008; Moncada and McCouch, 2004; Silvestrini et al., 2007). They also confirmed the considerable diversity of C. canephora and, more generally, of wild diploid species (Cubry et al., 2008; Gomez et al., 2009; Prakash et al., 2005). New SSR markers have also been successfully developed for diversity studies (Cubry et al., 2008) and for BAC–FISH (fluorescent in situ hybridization) experiments (Guyot et al., 2009; Herrera et al., 2007).
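The SSR markers above were mined from microsatellite libraries, BAC-end sequences and EST sequences. The Python sketch below gives a rough idea of how candidate microsatellites can be located in such sequences. It uses regular expressions to scan for perfect di-, tri- and tetranucleotide tandem repeats. The minimum repeat counts in `MIN_REPEATS` and the function names are illustrative assumptions, not the thresholds or tools used in the cited coffee studies. The sketch also ignores imperfect and compound repeats and primer design.

```python
import re

# Hypothetical minimum number of tandem copies per motif length
# (illustrative only; not the thresholds used in the cited studies).
MIN_REPEATS = {2: 6, 3: 5, 4: 5}


def _is_lower_order(motif: str) -> bool:
    """True if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    n = len(motif)
    return any(n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n))


def find_ssrs(seq: str, min_repeats: dict[int, int] = MIN_REPEATS) -> list[tuple[int, str, int]]:
    """Return (0-based start, motif, copy number) for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for unit, min_copies in min_repeats.items():
        # Capture one motif of `unit` bases, then require at least
        # (min_copies - 1) further exact copies immediately after it.
        pattern = re.compile(rf"([ACGT]{{{unit}}})\1{{{min_copies - 1},}}")
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if _is_lower_order(motif):
                continue  # homopolymers and e.g. 'ATAT' are not reported at this length
            hits.append((match.start(), motif, len(match.group(0)) // unit))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real coffee EST.
    est = "TTT" + "AG" * 8 + "CC" + "CAT" * 6 + "GG"
    print(find_ssrs(est))  # [(3, 'AG', 8), (21, 'CAT', 6)]
```

A regular-expression scan is used here because it is short and transparent. Dedicated SSR-mining tools also handle imperfect repeats and the design of flanking primers.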

COS markers are a very promising new source of markers (i.e., SSRs or SNPs) (Fulton et al., 2002; Wu et al., 2006). They are generated by using bioinformatics to compare characterized gene sequences with those of other, unrelated species, and they are already being used in many studies. One study aimed to develop widely applicable gene markers for phylogenetic reconstruction at low taxonomic levels and tested low-copy nuclear COS genes (Li et al., 2008). The markers were found to be highly informative for reconstructing the phylogeny of congeneric species. At this level, introns provide a higher proportion of parsimony-informative sites than traditional phylogenetic markers such as the internal transcribed spacer (ITS) and matK. At greater phylogenetic distances, where only coding regions could be aligned, the polymorphism levels of the COS ranged between those of ndhF and matK.
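The comparison above rests on counting parsimony-informative sites. Under the standard definition, an alignment column is parsimony-informative if at least two character states each occur in at least two sequences. The Python sketch below counts such columns in a toy alignment. Treating '-', '?' and 'N' as missing data is an assumption, and the sequences are hypothetical, not taken from Li et al. (2008).

```python
from collections import Counter

# Symbols treated as missing data and excluded from the count (assumption).
MISSING = set("-?N")


def is_parsimony_informative(column: str) -> bool:
    """At least two character states, each present in at least two sequences."""
    states = Counter(c for c in column.upper() if c not in MISSING)
    return sum(1 for n in states.values() if n >= 2) >= 2


def count_informative_sites(alignment: list[str]) -> int:
    """Count parsimony-informative columns in a list of aligned sequences."""
    if len({len(seq) for seq in alignment}) > 1:
        raise ValueError("sequences must all have the same aligned length")
    return sum(is_parsimony_informative("".join(col)) for col in zip(*alignment))


if __name__ == "__main__":
    # Hypothetical toy alignment of four taxa.
    toy = ["ACGTA", "ACGTT", "AGGAA", "AGGAT"]
    print(count_informative_sites(toy))  # 3
```

Dividing this count by the alignment length gives the proportion of informative sites. That proportion is the quantity on which introns outperform ITS and matK at low taxonomic levels.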

The first paper describing the use of COSII markers for comparative mapping between Solanaceae (tomato) and Rubiaceae (coffee) was published by Wu et al. (2006). The results indicate that some coffee genome regions correspond to specific segments of the tomato genome, and imply that COS genes can be used for comparative mapping among Euasterid plant families.

B. GENETIC MAPS

In C. canephora, genetic linkage mapping started at the same time as the development of molecular markers. A progeny of C. canephora doubled haploids was the first to be mapped, using RFLPs and RAPDs (Paillard et al., 1996). Two genetic maps of C. canephora based on SSRs, SNPs and COSII markers are currently nearing completion, one by the Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD) and one by Nestle (T. Leroy and D. Crouzillat, personal communication). Fig. 2 shows the C. canephora genetic map developed by Nestle (Lefebvre-Pautigny et al., submitted). A comparison of the two maps will be used to build a C. canephora consensus map, the future reference for the ongoing coffee genome sequencing project. In addition, in collaboration with the Indonesian Coffee and Cocoa Research Institute (ICCRI), a progeny will be made available to the scientific community, as recommended by the International Coffee Genomics Network (ICGN; http://www.coffeegenome.org/).

Interspecific crosses between C. canephora and wild species have also been undertaken, but they frequently showed segregation distortion (Ky et al., 2000). Most coffee interspecific genetic linkage maps were first constructed to identify QTLs involved in the highly contrasting traits found in several wild species (Coulibaly et al., 2002, 2003a, b; Ky et al., 2000; N’Diaye et al., 2007). Now, comparative mapping of anchor markers provides valuable information on the organization of genomes of different sizes. These anchor markers include genes (Bustamante-Porras et al., 2007a; Campa et al., 2003; Mahesh et al., 2006), SSRs and EST–SSRs (Coulibaly et al., 2003b). Nestle and the Institut de Recherche pour le Développement (IRD) are constructing genetic maps with COSII markers for different interspecific progenies. These maps will soon allow comparison between the two closely related families Solanaceae (tomato) and Rubiaceae (coffee).

The first C. arabica genetic linkage map was constructed with AFLPs on a small pseudo-F2 population (Pearl et al., 2004). As mentioned above, C. arabica has low genetic diversity, so 288 AFLP primer combinations generated only 464 markers usable for mapping. De Oliveira et al. (2007) constructed a partial map of C. arabica based on a backcross population and RAPD markers. Of the 178 markers evaluated, only the 134 that segregated 1:1 (P > 0.05) were used to build the map. Seventeen markers were not linked, while 117 formed 11 linkage groups covering a genome distance of 1803.2 cM. The maximum distance between adjacent markers was 26.9 cM, and only seven intervals exceeded 20 cM. The markers were also used to assist the selection of the plants closest to the recurrent parent, in order to accelerate the introgression of rust resistance genes in the coffee breeding program. These results testify to the considerable efforts invested in acquiring a saturated C. arabica genetic map.

Fig. 2. Genetic map of Coffea canephora developed by Nestle (panels A–K). The map contains 682 loci (479 SSRs, 199 SNPs (unigenes) and 4 BACs) and covers 1306 cM.
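De Oliveira et al. (2007) retained only markers whose segregation did not depart from the expected 1:1 ratio (P > 0.05). The usual way to check this for a dominant marker in a backcross is a chi-square goodness-of-fit test with one degree of freedom. The exact procedure used in that study is not described here, so the Python sketch below only illustrates that assumption. The band counts are hypothetical, and the function names are illustrative.

```python
import math


def chi_square_1to1(n_a: int, n_b: int) -> tuple[float, float]:
    """Chi-square goodness-of-fit test of a 1:1 segregation ratio.

    Returns (chi-square statistic, P-value) with 1 degree of freedom.
    """
    n = n_a + n_b
    if n == 0:
        raise ValueError("no individuals scored")
    expected = n / 2
    chi2 = ((n_a - expected) ** 2 + (n_b - expected) ** 2) / expected
    # For 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def keep_marker(n_present: int, n_absent: int, alpha: float = 0.05) -> bool:
    """Retain a dominant backcross marker only if 1:1 segregation is not rejected."""
    return chi_square_1to1(n_present, n_absent)[1] > alpha


if __name__ == "__main__":
    # Hypothetical band presence/absence counts on 100 backcross plants.
    for counts in [(55, 45), (60, 40)]:
        chi2, p = chi_square_1to1(*counts)
        print(counts, round(chi2, 2), round(p, 4), keep_marker(*counts))
```

With these counts, 55:45 gives χ² = 1.0 (P ≈ 0.317, marker kept), while 60:40 gives χ² = 4.0 (P ≈ 0.046, marker discarded). At α = 0.05, about 1 in 20 markers that truly segregate 1:1 would also be discarded by chance.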

C. QTL IDENTIFICATION

In Coffea, the genetic control of useful traits (monofactorial or quantitative), such as S alleles, fruiting cycle length, biochemical contents and morphological traits, has mostly been studied using interspecific progenies.

The S-locus was mapped using a C. canephora progeny derived from a doubled haploid (Lashermes et al., 1996) and an interspecific cross between C. canephora (self-incompatible) and C. heterocalyx (self-compatible) (Coulibaly et al., 2002, 2003a). In the latter progeny, three significant QTLs were also detected for pollen viability.

In C. canephora, C. arabica and C. liberica, the period between flowering and ripening is very long (10–11 months). Much shorter fruiting cycles (2–3 months) have been identified in wild species native to eastern Africa, which makes the control of fruiting time (FT) and cycle length very interesting characters to study. Using an interspecific backcross between C. pseudozanguebariae (2 months FT) and C. liberica Hiern var. dewevrei de Wild (10–11 months FT), Akaffou et al. (2003) showed that FT is an additive trait. The bimodal distribution of the full growth period suggests the involvement of a major gene, Ft1. This gradient overlaps those of caffeine and chlorogenic acid (CGA) contents, suggesting that long FT could control caffeine and CGA contents in coffee beans. There is a similar relationship between FT and seed weight.

The genetic control of several biochemical compounds was studied using the same progeny. Bean caffeine content and the quantity of an undetermined heteroside appear to be oligogenic (Barre et al., 1998). One major gene with two alleles could be involved in the control of the biosynthesis of both compounds: the absence of caffeine could be controlled by a recessive allele and heteroside content by a codominant allele. Feruloylquinic acid (FQA) isomer content appears to be controlled by one major gene, with the dominant allele leading to the absence of 3-FQA (Ky et al., 1999). For trigonelline, nucleocytoplasmic inheritance with one nuclear QTL was found (Ky et al., 2001).

Morphological quantitative traits were also studied. N’Diaye et al. (2007) evaluated the phenotypic (16 quantitative traits) and genetic differentiation between two related Coffea species, C. liberica Hiern and C. canephora Pierre. Eight QTLs were identified and mapped for petiole length, leaf area, number of flowers per inflorescence, fruit shape, fruit disc diameter, seed shape and seed length. However, in all these cases, the available populations were small, so only major genes or QTLs with strong effects could be detected.

III. GENOMIC RESOURCES

In the past few years, there has been a dramatic increase in coffee genomic tools and molecular resources for cultivated Coffea species. It is now possible to understand the composition, structure and evolution of the coffee genome. The Coffea research community has produced basic genomic resources, such as large sets of EST sequences and genomic inserts in BAC libraries, for both C. canephora and C. arabica. Here, we describe (i) current knowledge on coffee genome size and cytogenetics, (ii) Coffea EST resources, (iii) available BAC libraries, (iv) genes and metabolism, and (v) web facilities.

A. COFFEA GENOME SIZE AND CYTOGENETICS

Coffee genome sizes were estimated using flow cytometry, and four main conclusions were drawn:

(i) the genome size of diploid coffee species varies from 1.03 pg (C. racemosa) to 1.76 pg (C. humilis) (Cros et al., 1995, 1998; Hamon et al., 2009; Noirot et al., 2003);

(ii) species native to dry areas (mostly in East Africa) have a smaller genome (< 1.3 pg) than those native to evergreen forest (1.3–1.76 pg);

(iii) a difference in genome size greater than 0.25 pg is associated with high rates of failure in crosses and marked sterility of hybrids;

(iv) the 1C nuclear DNA content of C. canephora and C. arabica was evaluated by coupling flow cytometry with image cytometry, and the results confirmed the true allotetraploidy of C. arabica (Clarindo and Carvalho, 2008).
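The genome sizes above are given as DNA mass. The Python sketch below puts them on a more familiar scale by converting picograms to megabase pairs with the widely used factor of about 978 Mb per pg. Whether each quoted value is a 1C or 2C amount is not restated here, so the sketch simply converts the amounts as quoted. It also encodes the 0.25 pg observation in (iii) as a simple screening flag for planned crosses. That is an illustrative reading of an association, not a predictive rule, and the function names are hypothetical.

```python
# Widely used conversion factor: 1 pg of DNA corresponds to about 978 Mb.
MB_PER_PG = 978.0

# Difference quoted in the text above which crosses often failed and hybrids
# were markedly sterile (an observed association, used here only as a flag).
MAX_SIZE_DIFF_PG = 0.25


def pg_to_mb(dna_pg: float) -> float:
    """Convert a DNA amount in picograms to megabase pairs."""
    return dna_pg * MB_PER_PG


def cross_flagged(size_a_pg: float, size_b_pg: float,
                  threshold: float = MAX_SIZE_DIFF_PG) -> bool:
    """Flag a cross whose parental genome sizes differ by more than the threshold."""
    return abs(size_a_pg - size_b_pg) > threshold


if __name__ == "__main__":
    racemosa, humilis = 1.03, 1.76  # extremes quoted in the text (pg)
    print(f"C. racemosa ~{pg_to_mb(racemosa):.0f} Mb, C. humilis ~{pg_to_mb(humilis):.0f} Mb")
    print("flag racemosa x humilis:", cross_flagged(racemosa, humilis))
```

The two extremes convert to roughly 1007 Mb and 1721 Mb of DNA. Their 0.73 pg difference is well above the 0.25 pg level associated with cross failure.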

Early observations of chromosome morphology clearly showed that diploid coffee chromosomes (2n = 2x = 22) are small and metacentric or submetacentric (Sybenga, 1960). For a long time, the small number of metaphases in root meristem cells strongly limited coffee cytogenetic investigations. More recently, however, pachytene chromosome analysis (Pinto-Maglio and Da Cruz, 1987, 1998) and improved methods of chromosome preparation (Clarindo and Carvalho, 2008) were developed. They provided an overview of the distribution of heterochromatin versus euchromatin along the chromosomes and led to karyotypes of both C. arabica and C. canephora.

FISH and heterochromatin staining techniques have been used for coffee (Barre et al., 1998; Hamon et al., 2009; Raina et al., 1998). These techniques improved the resolution of the physical mapping of ribosomal genes and of AT- or GC-rich heterochromatin regions. Hamon et al. (2009) extended this approach to a large set of Coffea species and found a correlation between the following: the number of secondary constrictions (one or two satellite chromosomes (SAT-chromosomes)), the number of rDNA sites (5S and 18S), the geographic origin of the species (West and Central Africa vs. East Africa) and the genome size. However, it was not possible to establish a causal relationship between these traits and rDNA distribution patterns.

FISH techniques also made it possible to locate BAC clones visually on C. arabica (Herrera et al., 2007) and C. canephora chromosomes (Guyot et al., 2009). BAC–FISH on pachytene chromosomes is a very promising tool for complementary studies of the organization of coffee tree genomes. It should also help compare the genetic relationships of species and their physical maps.

Genomic in situ hybridization (GISH) was used to study the genome organization of interspecific hybrids. In interspecific F1 hybrids, Barre et al. (1998) demonstrated a linear relationship between the number of chromosomes from one parental species and the nuclear DNA content. GISH also proved helpful for analyzing the DNA content of backcross-derived hybrids and for monitoring the evolution of C. arabica. Lashermes et al. (1999) studied the origin of tetraploid Arabica and concluded that the species is of allopolyploid origin. Hamon et al. (2009) provided clear cytogenetic evidence that one C. arabica progenitor is native to East Africa and the other to West or Central Africa. More detailed analyses of the organization of the coffee genome will certainly be undertaken in the very near future by combining BAC–FISH, DNA fiber FISH and genetic maps.

The organization of gene-rich regions along C. canephora chromosomes remains unknown. However, earlier cytological observations of C. canephora and C. arabica chromosomes by Pinto-Maglio and Da Cruz (1987, 1998) revealed intensely and lightly stained patterns in some regions. These patterns indicate an overall chromosome organization into condensed heterochromatin and decondensed euchromatin regions. As in Medicago and tomato chromosomes (Kulikova et al., 2001; Lin et al., 2005), C. canephora heterochromatin is mainly located around the centromeres, while euchromatin forms the distal parts of the chromosomes. As in the other plants studied, gene-rich regions are expected to lie in the distal euchromatin of the C. canephora genome.

B. EXPRESSED SEQUENCE TAGS IN COFFEA

ESTs are high-throughput, single-pass sequences produced from cDNA clones. These cDNA clones are generally organized in large libraries that provide a picture of gene expression in a specific tissue or organ under particular physiological conditions. Despite their short length and relatively low quality, ESTs are valuable resources that can be exploited in different ways. In the absence of a complete genome sequence, large-scale EST collections are used to (i) discover gene content, (ii) provide the gene repertoire of a given species and (iii) study gene expression in particular tissues or under specific conditions (Van der Hoeven et al., 2002). EST sequences can also be used for comparative genomic applications, including the identification of genes conserved between genomes in order to (i) perform phylogenetic studies (Vandepoele and Van de Peer, 2005), (ii) discover conserved ortholog set (COS) markers for comparative genetic mapping (Fulton et al., 2002) and (iii) study gene and genome duplications. More recently, the availability of large-scale EST sequences from different genotypes of the same organism has facilitated the detection of new allelic variants such as SNPs and InDels (Tang et al., 2006). Finally, ESTs are an invaluable complement to sequenced genomes for validating gene predictions, identifying coding and noncoding transcribed regions and identifying alternative splicing of genes (Rudd, 2003).
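Identifying genes conserved between two genomes, for example when screening for COS marker candidates, is often approached with reciprocal best hits: a coffee unigene and a gene from another species are paired only if each is the other's highest-scoring match. The sketch below illustrates that idea under simplifying assumptions: the similarity searches have already been run and reduced to (query, subject, bit score) tuples, and all identifiers are hypothetical. It is not presented as the exact procedure of Fulton et al. (2002).

```python
from typing import Dict, List, Tuple

# One similarity-search hit: (query id, subject id, bit score).
Hit = Tuple[str, str, float]


def best_hits(hits: List[Hit]) -> Dict[str, str]:
    """Map each query to its highest-scoring subject."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: List[Hit], b_vs_a: List[Hit]) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    a_to_b = best_hits(a_vs_b)
    b_to_a = best_hits(b_vs_a)
    return sorted((a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a)


# Hypothetical coffee unigenes (cc*) searched against hypothetical tomato genes (sl*).
coffee_vs_tomato = [
    ("cc1", "sl1", 250.0), ("cc1", "sl2", 90.0),
    ("cc2", "sl2", 180.0),
    ("cc3", "sl2", 120.0),
]
tomato_vs_coffee = [
    ("sl1", "cc1", 248.0),
    ("sl2", "cc2", 175.0), ("sl2", "cc3", 118.0),
]

print(reciprocal_best_hits(coffee_vs_tomato, tomato_vs_coffee))
# [('cc1', 'sl1'), ('cc2', 'sl2')]
```

Here cc3 is excluded: its best hit is sl2, but sl2 matches cc2 better, so the pair is not reciprocal.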

As of July 2009, more than 62 million ESTs are publicly available in the dbEST database (GenBank; http://www.ncbi.nlm.nih.gov/dbEST/dbEST_summary.html) (Boguski et al., 1994). The number of EST sequences has increased dramatically, by more than 20 million during the past 2 years (3 million between November 2008 and February 2009). Table I lists the top seven (T1–T7) plant species with the largest number of available ESTs in GenBank, as well as other model plant species. Together, these top seven plant species represent more than 8.3 million ESTs.

TABLE I
Summary of ESTs of a Selection of Plant Organisms Available in GenBank (as of July 2009)

Species                              Number of EST sequences
Top seven
  T1 – Zea mays (maize)                            2,018,634
  T2 – Arabidopsis thaliana                        1,527,298
  T3 – Glycine max (soybean)                       1,386,618
  T4 – Oryza sativa (rice)                         1,248,995
  T5 – Triticum aestivum (wheat)                   1,067,014
  T6 – Brassica napus (oilseed rape)                 632,344
  T7 – Hordeum vulgare (barley)                      501,366
Other examples
  Physcomitrella patens subsp. patens                362,131
  Vitis vinifera (wine grape)                        353,941
  Solanum lycopersicum (tomato)                      291,209
  Medicago truncatula (barrel medic)                 260,238
Coffea species
  Coffea canephora (Coffee robusta)                   55,694
  Coffea arabica (Coffee arabica)                     43,562

Source: http://www.ncbi.nlm.nih.gov/dbEST/dbEST_summary.html.
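The "more than 8.3 million" total can be checked directly from Table I. The short snippet below sums the top-seven counts and, for perspective, the two Coffea collections; all numbers are those printed in the table.

```python
# EST counts copied from Table I (GenBank, July 2009).
top_seven = {
    "Zea mays": 2_018_634,
    "Arabidopsis thaliana": 1_527_298,
    "Glycine max": 1_386_618,
    "Oryza sativa": 1_248_995,
    "Triticum aestivum": 1_067_014,
    "Brassica napus": 632_344,
    "Hordeum vulgare": 501_366,
}
coffea = {"Coffea canephora": 55_694, "Coffea arabica": 43_562}

top_total = sum(top_seven.values())
coffea_total = sum(coffea.values())

print(f"{top_total:,}")                   # 8,382,269
print(f"{coffea_total:,}")                # 99,256
print(f"{coffea_total / top_total:.2%}")  # 1.18%
```

The two Coffea species together therefore hold only about 1% as many public ESTs as the top seven plant species combined.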

Different research groups have produced large-scale sets of Coffea EST sequences. However, the number of publicly available ESTs remains very low because most of these sequences are proprietary. Some institutions (the Brazilian Coffee Genome Project, CENICAFE) decided to keep their resources confidential for a time, whereas others (Nestle, IRD) made theirs freely available.

The Brazilian Coffee Genome Project has generated 130,792, 12,381 and 10,566 EST sequences from C. arabica, C. canephora and C. racemosa, respectively, assembled into 33,000 unigenes (Vieira et al., 2006). The CENICAFE research group produced 32,961 EST sequences from three different tissues (leaves, 31-week-old fruits and flowers) of C. arabica (cv. Caturra), assembled into 10,799 unigenes (Montoya and Vuong, 2006). Neither project has yet released its sequences to public databases¹. Different research groups have produced large sets of EST sequences in C. canephora. At the French IRD, 10,420 EST sequences (assembled into 5534 potential unigenes) were produced from C. canephora fruit and leaf cDNA libraries (Poncet et al., 2006). Together with the 47,000 ESTs, representing 13,175 unigenes, published by Nestle and Cornell University (Lin et al., 2005), a total of 55,694 C. canephora sequences are currently available, constituting the main public resource for the scientific community (Table I). From two C. arabica cultivars (red Catai and red Bourbon), 1587 EST sequences were produced to develop a cDNA microarray containing 1506 ESTs from leaves and embryonic roots (De Nardi et al., 2006). These sequences are available at the coffeeDNA database (http://www.coffeedna.net/). This considerable number of sequences represents a valuable resource for establishing an exhaustive gene catalog for the Coffea genus. Interestingly, sequence analysis showed that 22% of the sequences had no similarity to known protein sequences in GenBank (BLASTX, E-value threshold of 10^-5). A significant fraction of these may represent noncoding transcribed sequences, such as untranslated regions (UTRs), or parts of transposable elements (TEs), which are the main component of plant genomes and one of the major forces driving their structure and evolution.
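A figure such as "22% of sequences with no similarity" is simply the fraction of query sequences lacking any hit at or below the E-value cutoff. A minimal sketch of that computation follows, assuming BLAST+ tabular output (`-outfmt 6`, whose 11th column is the E-value) and hypothetical sequence identifiers; it illustrates the calculation and is not the pipeline used in the original analysis.

```python
from typing import Iterable, Set

EVALUE_COLUMN = 10  # 0-based index of the E-value in BLAST+ -outfmt 6 output


def queries_with_hits(tabular_lines: Iterable[str], max_evalue: float = 1e-5) -> Set[str]:
    """IDs of queries with at least one hit at or below max_evalue."""
    hits: Set[str] = set()
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        if float(fields[EVALUE_COLUMN]) <= max_evalue:
            hits.add(fields[0])
    return hits


def fraction_without_hit(query_ids: Iterable[str], tabular_lines: Iterable[str],
                         max_evalue: float = 1e-5) -> float:
    """Share of queries that have no significant hit (queries absent from the
    BLAST output count as having no hit)."""
    ids = set(query_ids)
    with_hits = queries_with_hits(tabular_lines, max_evalue) & ids
    return 1 - len(with_hits) / len(ids)


# Hypothetical BLASTX results for four ESTs (est4 returned no hit at all).
blastx_out = [
    "est1\tprotA\t85.0\t120\t18\t0\t1\t360\t5\t124\t1e-30\t210.0",
    "est2\tprotB\t40.0\t60\t36\t1\t10\t190\t2\t61\t0.01\t35.0",
    "est3\tprotC\t70.0\t90\t27\t0\t4\t273\t1\t90\t2e-6\t60.0",
]
print(fraction_without_hit(["est1", "est2", "est3", "est4"], blastx_out))  # 0.5
```

Note that est2 has a hit, but one too weak to pass the 1e-5 cutoff, so it is counted among the sequences without similarity.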

GenBank offers access to 1577 ESTs for C. arabica and 55,694 ESTs for C. canephora. An EST database was generated at Cornell University (http://www.sgn.cornell.edu/content/coffee.pl), grouping 47,000 ESTs from five C. canephora cDNA libraries organized by tissue type, with particular attention to seed development (Lin et al., 2005). Following clustering and assembly, 13,175 unigenes were identified and compared with the gene repertoires of Arabidopsis and tomato (Solanum lycopersicum). C. canephora appeared to be more closely related to tomato (both in the Euasterid clade) than to Arabidopsis (Eurosid clade): computational sequence comparison indicated better conservation of the gene catalogs between C. canephora and tomato than between C. canephora and Arabidopsis. This conservation of the gene repertoire, together with a similar genome size, karyotype and chromosome architecture, promoted the use of tomato as a genomic model for Coffea species. More recently, the C. canephora EST sequences proved valuable in another application, the annotation of Coffea genomic sequences. In the absence of robust gene prediction software specific to Coffea, EST alignments were used to validate and correct gene models predicted with algorithms trained on Eurosid genes (Guyot et al., 2009).

¹ Since the first submission of this manuscript, CENICAFE has released 41,985 ESTs, bringing the total number of C. arabica ESTs to 43,562.
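The validation procedure of Guyot et al. (2009) is not detailed here. As an illustrative assumption, the sketch below shows one common first check when ESTs are used to validate gene models: whether each intron predicted by the model is confirmed exactly by at least one spliced EST alignment. Coordinates are hypothetical, and producing the spliced alignments themselves (with a dedicated EST-to-genome aligner) is outside the scope of the sketch.

```python
from typing import Dict, List, Set, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive genomic coordinates


def introns(exons: List[Interval]) -> Set[Interval]:
    """Introns implied by a set of exons on the same strand."""
    ordered = sorted(exons)
    return {
        (left_end + 1, right_start - 1)
        for (_, left_end), (right_start, _) in zip(ordered, ordered[1:])
    }


def intron_support(model_exons: List[Interval],
                   est_alignments: List[List[Interval]]) -> Dict[Interval, bool]:
    """For each predicted intron, report whether any spliced EST shows it exactly."""
    observed: Set[Interval] = set()
    for est_exons in est_alignments:
        observed |= introns(est_exons)
    return {intron: intron in observed for intron in sorted(introns(model_exons))}


# Hypothetical gene model with three exons, and two spliced EST alignments.
model = [(100, 200), (301, 400), (501, 650)]
ests = [
    [(150, 200), (301, 380)],   # supports the first intron (201-300)
    [(350, 400), (511, 600)],   # uses a different acceptor site (401-510)
]
print(intron_support(model, ests))
# {(201, 300): True, (401, 500): False}
```

An unsupported intron, like the second one here, flags a splice site in the predicted model that may need manual correction in light of the EST evidence.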

Besides these traditional uses, ESTs from the Brazilian Coffee Genome Project and the publicly available C. canephora sequences were screened for the presence of TE insertions (Lopes et al., 2008), although the broader impact of such elements on the Coffea genus has not yet been investigated. In that work, 140 transcripts from 39,312 Coffea unigenes were found to contain TE insertions, mainly long terminal repeat (LTR) retrotransposons, within protein-coding regions; these were called ‘TE-cassettes’. A total of 26 putative TE-encoded sequences were identified, suggesting that gene structures in Coffea may be modified by TE insertion as part of their molecular evolution (Lopes et al., 2008).
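At its core, the screening step amounts to asking whether a TE-derived segment overlaps the coding region of a transcript. The generic interval-overlap sketch below illustrates this with hypothetical transcripts and TE hits. It assumes the TE hits were found beforehand, for instance by a similarity search against a TE library, and it does not reproduce the actual pipeline of Lopes et al. (2008).

```python
from typing import Dict, List, Tuple

# Hypothetical inputs: CDS coordinates on each transcript, and TE similarity hits
# (transcript id, start, end, TE class) assumed to come from a prior search.
cds: Dict[str, Tuple[int, int]] = {"t1": (50, 400), "t2": (10, 300), "t3": (1, 900)}
te_hits: List[Tuple[str, int, int, str]] = [
    ("t1", 300, 500, "LTR retrotransposon"),  # overlaps the CDS of t1
    ("t2", 350, 600, "LTR retrotransposon"),  # lies downstream of the CDS of t2
]


def overlaps(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """True if two closed intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def te_in_cds(cds_coords: Dict[str, Tuple[int, int]],
              hits: List[Tuple[str, int, int, str]]) -> List[str]:
    """Transcripts with at least one TE hit overlapping their coding region."""
    flagged = {tid for tid, start, end, _ in hits
               if tid in cds_coords and overlaps(cds_coords[tid], (start, end))}
    return sorted(flagged)


print(te_in_cds(cds, te_hits))  # ['t1']

# Proportion reported by Lopes et al. (2008): 140 of 39,312 unigenes.
print(f"{140 / 39_312:.2%}")  # 0.36%
```

The final line puts the published figures in proportion: TE-cassettes were found in well under 1% of the Coffea unigenes examined.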

C. BAC LIBRARIES IN COFFEA

Large-insert genomic BAC libraries have become central and cost-effective tools in genomic research. BAC libraries have been constructed for many plant species, including Arabidopsis (Mozo et al., 1998), rice (Wang et al., 1995), tomato (Budiman et al., 2000), papaya (Ming et al., 2001), grape (Adam-Blondon et al., 2005) and bread wheat (Safar et al., 2004). The availability of high-quality BAC libraries in plants has enabled a wide variety of genomic applications, including (i) the development of new genetic markers through BAC end sequencing, both for comparative genomics and to saturate genetic maps (Lai et al., 2006; Paux et al., 2006; Shultz et al., 2007); (ii) the construction of physical maps through BAC end sequencing (Lai et al., 2006; Mao et al., 2000) and high-throughput fingerprinting of BAC clones (Chen et al., 2002; Moroldo et al., 2008; Mozo et al., 1999; Paux et al., 2008); (iii) the development of BAC-based FISH (BAC–FISH), allowing the integration of cytogenetic and physical maps (Cheng et al., 2001a, b); and (iv) new strategies to facilitate chromosome-based and whole-genome sequencing (WGS) projects (AGI, 2000; IRGSP, 2005; Paux et al., 2008).

In the Coffea genus, five BAC libraries have been constructed to date, all from the cultivated species C. arabica (two libraries) and C. canephora (three libraries) (Table II).

The first C. arabica BAC library was constructed from the C. arabica cultivar IAPAR-59 (Agronomic Institute of Parana, Brazil) (Noir et al., 2004). This inbred commercial line, popular in South America, was selected for useful agronomic traits such as resistance to root-knot nematodes (Meloidogyne exigua) and to leaf rust (Hemileia vastatrix) (Sera, 2001). The library comprises 80,813 BAC clones (average insert size 130 kb) and represents approximately eight equivalents of the 1300 Mb C. arabica genome (Table II). Recently, this library was used to construct a physical map spanning the SH3 leaf rust resistance locus (Mahe et al., 2007). In addition, synteny between the SH3 physical contig and the Arabidopsis reference genome was assessed, revealing relatively conserved synteny between C. arabica and four different chromosomal segments in Arabidopsis. In the SH3 region, Arabidopsis genomic sequences appeared to be a valuable tool for marker saturation approaches.
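The coverage figures in Table II follow from number of clones × average insert size ÷ genome size. The standard Clarke–Carbon formula, P = 1 − (1 − f)^N, where f is the fraction of the genome carried by one insert and N the number of clones, additionally estimates the probability that any given locus is represented in the library. The sketch below applies both to the IAPAR-59 library. It assumes clones are distributed randomly and uniformly over the genome, which real libraries only approximate because restriction-site distribution and sequence composition bias representation.

```python
import math


def genome_equivalents(n_clones: int, insert_kb: float, genome_mb: float) -> float:
    """Library coverage: total cloned DNA divided by genome size."""
    return n_clones * insert_kb / (genome_mb * 1_000)


def prob_locus_present(n_clones: int, insert_kb: float, genome_mb: float) -> float:
    """Clarke-Carbon probability that a given locus is in at least one clone."""
    f = insert_kb / (genome_mb * 1_000)  # fraction of the genome in one insert
    return 1 - (1 - f) ** n_clones


def clones_needed(prob: float, insert_kb: float, genome_mb: float) -> int:
    """Clarke-Carbon: number of clones required to reach a target probability."""
    f = insert_kb / (genome_mb * 1_000)
    return math.ceil(math.log(1 - prob) / math.log(1 - f))


# IAPAR-59 library (Table II): 80,813 clones, 130 kb inserts, 1300 Mb genome.
print(round(genome_equivalents(80_813, 130, 1_300), 2))  # 8.08
print(round(prob_locus_present(80_813, 130, 1_300), 4))  # 0.9997
print(clones_needed(0.99, 130, 1_300))
```

The same functions can be applied to the other rows of Table II, using the 700 Mb genome size given there for C. canephora.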

The second C. arabica BAC library was constructed from the small-bean, high-cup-quality C. arabica cv. Tall Mokka (Jones et al., 2006). In parallel, Tall Mokka was crossed with the large-bean, low-cup-quality Arabica variety Catimor to generate a segregating population for QTL identification. This BAC library consists of 52,416 clones with an average insert size of 94 kb, providing about 4× genome coverage (Table II). Its main purpose is coffee improvement through the cloning of genes controlling disease resistance and other economically important traits. Beyond applied research, the library is currently being used for comparative orthologous sequence analysis to understand how the polyploid C. arabica genome evolved from its two diploid progenitor genomes (R. Ming, Personal Communication). Future work in C. arabica genomics includes the development of a very high coverage (>10×) large-insert genomic library (>100,000 clones) that could become the reference tool for physical mapping of this large, complex genome.

The construction of an integrated physical and genetic map will enable map-based gene cloning and genome sequencing. In this process, complete physical maps of C. canephora and C. eugenioides, the diploid progenitors of C. arabica, will be valuable resources for completing the physical map of C. arabica. The presence in the C. arabica genome of two highly similar subgenomes has greatly limited the use of chromosome walking and positional cloning approaches. To overcome this complexity and to access a wider range of genetic diversity, resource development has also focused on the cultivated diploid progenitor C. canephora.

TABLE II
Available Coffea BAC Libraries (as of May 2009)

Species       Genotype/cultivar     No. of clones  Avg. clone size  Cloning site  Coverage*  Authors/references
C. arabica    IAPAR 59              80,813         130 kb           HindIII       ~8×        Noir et al. (2004)
C. arabica    Hybrid Mokka–Catimor  52,416         94 kb            HindIII       ~3.8×      Jones et al. (2006)
C. canephora  IF 126                55,296         135 kb           HindIII       ~10.6×     Leroy et al. (2005)
C. canephora  IF200 D.H.            36,864         150 kb           HindIII       ~7.9×      de Kochko A., Lashermes P. and Wing R. A.
C. canephora  IF200 D.H.            36,864         121 kb           BstYI         ~6.9×      de Kochko A., Lashermes P. and Wing R. A.

*Calculated for an average genome size of 1300 Mb for C. arabica and 700 Mb for C. canephora. D.H., doubled haploid.

Several large-insert C. canephora BAC libraries have been produced. The first was constructed in 2005 from clone IF126, a hybrid between two distinct genetic groups within C. canephora, Congolese and Guinean (Moschetto et al., 1996). This library contains 55,296 clones (average insert size 135 kb), providing coverage of about 10 genome equivalents (Leroy et al., 2005). It was initially used to investigate the genomic organization of genes involved in sugar metabolism (Leroy et al., 2005). Later, the characterization of the CcEIN4 and CcCCoAOMT genes, encoding an ethylene receptor and a caffeoyl-CoA O-methyltransferase, respectively, demonstrated its value for identifying genes of agronomic interest in C. canephora (Bustamante-Porras et al., 2007; Chabrillange et al., 2006; Guyot et al., 2009). The genomic sequence of the C. canephora CcEIN4 region was compared with that of several sequenced dicotyledonous genomes (including Arabidopsis, Medicago truncatula, tomato and grape) covering the Euasterid and Eurosid clades. Extensive conservation was demonstrated between C. canephora and most of the genomes studied, with locally highly conserved gene content and order. The highest degree of microcollinearity was found between C. canephora and V. vinifera, which belong, respectively, to the Euasterids and Eurosids, two clades that diverged more than 114 million years ago (Guyot et al., 2009).
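Microcollinearity can be quantified in several ways. One simple proxy, used here purely for illustration and not presented as the procedure of Guyot et al. (2009), is the largest set of genes in a region whose orthologs appear in the same order in the compared genome, found with a longest-increasing-subsequence search. Gene names and positions below are hypothetical.

```python
from bisect import bisect_left
from typing import Dict, List


def longest_increasing_run(values: List[int]) -> int:
    """Length of the longest strictly increasing subsequence (patience method)."""
    tails: List[int] = []
    for v in values:
        i = bisect_left(tails, v)
        if i == len(tails):
            tails.append(v)
        else:
            tails[i] = v
    return len(tails)


def microcollinearity(query_genes: List[str], ortholog_rank: Dict[str, int]) -> float:
    """Fraction of query genes whose orthologs keep the same relative order.

    ortholog_rank gives the position of each gene's ortholog in the compared
    region; genes without an ortholog are simply absent. Inverted segments
    are handled by also testing the reversed order.
    """
    if not query_genes:
        return 0.0
    ranks = [ortholog_rank[g] for g in query_genes if g in ortholog_rank]
    forward = longest_increasing_run(ranks)
    reverse = longest_increasing_run([-r for r in ranks])
    return max(forward, reverse) / len(query_genes)


# Hypothetical five-gene coffee region compared with two other genomes.
coffee_region = ["g1", "g2", "g3", "g4", "g5"]
genome_x = {"g1": 1, "g2": 2, "g3": 5, "g4": 3, "g5": 4}  # one gene moved
genome_y = {"g1": 10, "g2": 11, "g4": 12}                 # two genes missing
print(microcollinearity(coffee_region, genome_x))  # 0.8
print(microcollinearity(coffee_region, genome_y))  # 0.6
```

Both gene loss and local rearrangement lower the score, which is why a single number like this only approximates the gene-by-gene comparisons used in published microcollinearity studies.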

The two other C. canephora BAC libraries were constructed from the genomic DNA of a doubled haploid (DH) line derived from clone IF200. One library was prepared with the restriction enzyme HindIII and the other with BstYI; each contains 36,864 clones (Table II). Together, the two libraries represent about 16 genome equivalents and provide the basic resources for an international initiative to construct a physical map of C. canephora. Sequencing of the 72,000 BAC ends was recently initiated at Genoscope (France).

Finally, the construction of a BAC library of C. eugenioides (the putative maternal parent of C. arabica) was recently funded (http://www.fontagro.org/), with the aim of sequencing the BAC ends and constructing a physical map. Altogether, these genomic resources will rapidly promote comparative analyses of homeologous sequences between C. arabica and its diploid progenitors. Comparative sequence mapping could answer several questions concerning C. arabica, such as the consequences of interspecific hybridization for the reshaping and evolution of its genomes.

D. GENES AND METABOLISM

Research on coffee plants has mostly focused on agronomic improvement. Breeders are especially interested in selecting C. arabica plants with better disease resistance, or C. canephora plants whose green beans provide a high-quality coffee beverage. Breeding for complex metabolic compounds is, for the moment, less of a priority.

Coffee aroma formation is a very complex process involving the Maillard and Strecker reactions (Maillard, 1913) as well as thermal degradation during roasting (De Maria et al., 1994). Some aroma precursors, such as sucrose and trigonelline, yield products with desirable flavor (Clifford, 1985; Dart and Nursten, 1985; De Maria et al., 1996; Feldman et al., 1969), while others, such as CGAs and caffeine, increase bitterness (Leloup et al., 1995; Voilley et al., 1977). Enhancing Robusta cup quality would thus require increasing sucrose and trigonelline contents while decreasing CGA and caffeine contents.

Improving coffee cup quality first requires an understanding of the mechanisms governing the accumulation in beans of the precursors that generate coffee aroma and taste. Of the 20 or so families of compounds involved in coffee cup quality (Flament, 1991), only four have been studied with respect to their biosynthetic pathways and regulation: purine alkaloids (caffeine), CGAs (whose biosynthesis depends on the phenylpropanoid pathway), lipids and sugars.

Owing to its commercial importance, caffeine biosynthesis is the most widely studied alkaloid biosynthetic pathway in the coffee plant. This purine alkaloid is produced in a variety of plants, including tea, kola nuts, guarana berries, yerba mate and cacao beans (Ashihara and Crozier, 1999). In coffee plants, caffeine (1,3,7-trimethylxanthine) is synthesized from xanthosine in three methylation steps catalyzed by S-adenosyl-L-methionine-dependent N-methyltransferases, plus one step that removes the ribose residue (Ashihara and Crozier, 1999). Structural studies of xanthosine methyltransferase (XMT) and 1,7-dimethylxanthine methyltransferase (DXMT) revealed several elements that appear to be critical for substrate selectivity. Serine-316 in XMT appears to play a major role in the recognition of xanthosine (XR). Likewise, the change from glutamine-161 in XMT to histidine-160 in DXMT may have catalytic consequences. A change from phenylalanine-266 to isoleucine-266 in DXMT is also likely to be crucial for discriminating between mono- and dimethyltransferases in coffee (McCarthy and McCarthy, 2007). Several genes of this pathway have been cloned and characterized (Mizuno et al., 2003a, b; Ogawa et al., 2001; Uefuji et al., 2003). In vitro, recombinant methyltransferases obtained by heterologous expression of these genes converted xanthosine into caffeine (Uefuji et al., 2003). Silencing and overexpression approaches have been used to explore metabolic engineering of the caffeine biosynthetic pathway (Ogita et al., 2004), and tobacco plants (naturally caffeine-free) that simultaneously expressed the three methylation genes produced caffeine (Uefuji et al., 2005).
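The steps described above can be laid out as a linear route. The sketch below follows the main pathway commonly described for coffee (xanthosine → 7-methylxanthosine → 7-methylxanthine → theobromine → caffeine), in which XMT, 7-methylxanthine methyltransferase (MXMT) and DXMT catalyze the three N-methylations, with DXMT acting here on theobromine (3,7-dimethylxanthine). Assigning the ribose-removal step to a separate N-methylnucleosidase is a simplifying assumption. The code checks that the route is connected and that the methylated nitrogens add up to the 1,3,7 pattern of caffeine.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Step:
    substrate: str
    product: str
    enzyme: str
    methylated_n: Optional[int]  # purine nitrogen receiving a methyl; None if no methylation


MAIN_ROUTE: List[Step] = [
    Step("xanthosine", "7-methylxanthosine", "XMT", 7),
    Step("7-methylxanthosine", "7-methylxanthine", "N-methylnucleosidase (assumed)", None),
    Step("7-methylxanthine", "theobromine", "MXMT", 3),
    Step("theobromine", "caffeine", "DXMT", 1),
]


def walk(route: List[Step]) -> Tuple[str, FrozenSet[int]]:
    """Follow the route, checking that each step consumes the previous product."""
    current = route[0].substrate
    methyl_positions = set()
    for step in route:
        if step.substrate != current:
            raise ValueError(f"{step.enzyme} expects {step.substrate}, got {current}")
        if step.methylated_n is not None:
            methyl_positions.add(step.methylated_n)
        current = step.product
    return current, frozenset(methyl_positions)


product, positions = walk(MAIN_ROUTE)
n_methylations = sum(step.methylated_n is not None for step in MAIN_ROUTE)
print(product, sorted(positions), n_methylations)  # caffeine [1, 3, 7] 3
```

Representing each step explicitly also makes it easy to ask which enzyme to target: removing any one methylation step from the list breaks the route before caffeine is reached.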

CGA biosynthesis is also the subject of active research. In coffee plants, among the many genes involved in the phenylpropanoid pathway, CcCCoAOMT was the first described as being involved in CGA biosynthesis (Campa et al., 2003; Lepelley et al., 2007). It encodes a methyltransferase that catalyzes an early step of lignin biosynthesis. A gene encoding phenylalanine ammonia lyase (PAL), which catalyzes the first step of the phenylpropanoid pathway, was then described (Mahesh et al., 2006). The isolation of two C3′H genes encoding hydroxylases, together with one HQT and one HCT gene encoding transferases, all of which are involved in the last steps of CGA formation, showed that two routes of CGA biosynthesis coexist in coffee plants (Lepelley et al., 2007; Mahesh et al., 2007).

Reducing sugars (mainly glucose) and amino acids (free or bound in proteins) are the precursors most actively involved in forming the aromatic (volatile) compounds that arise during the Maillard reactions and contribute to the coffee cup. Despite the importance of these compounds for coffee cup quality, genomic studies of them have begun only recently. The first coffee cDNA sequence related to sugar metabolism was cloned by Zhu and Goldstein (1994); the corresponding gene encodes an α-galactosidase that degrades galactomannans, complex cell wall polysaccharides, during seed germination. More recent studies have focused on the sucrose metabolism pathway. Using available ESTs from the Brazilian Coffee Genome Project (http://www.lge.ibi.unicamp.br/cafe/), Geromel et al. (2006) isolated two full-length cDNAs (CaSUS1 and CaSUS2) expressed during C. arabica fruit development. These cDNAs encode sucrose synthase isoforms, and their contrasting expression patterns in perisperm, endosperm and pericarp tissues point to the central role of these enzymes in sugar metabolism during sucrose accumulation in the coffee cherry. Furthermore, Privat et al. (2008) identified the complete set of genes encoding enzymes involved in sucrose synthesis and degradation in coffee beans; transcriptomic and enzymatic analyses also revealed the important role of vacuolar invertases in sucrose accumulation.

Beverage quality also depends on the length of the fruiting cycle, which influences the amount of compounds that accumulate. C. arabica, at least, behaves as a climacteric plant, showing a sudden increase in ethylene production and respiration during fruit ripening (Pereira et al., 2005). Some trials attempted to influence coffee fruit ripening by applying exogenous ethylene (Rao, 1978; Rao et al., 1978; Winston et al., 1992), but until recently no molecular studies had been carried out on ethylene biosynthesis or perception in coffee plants. Bustamante-Porras et al. (2005) were the first to describe an ethylene-responsive gene encoding a transcription factor-like protein, whose transcripts were detected in leaves and mature fruits. A more extensive survey was then conducted to identify and characterize genes encoding ethylene receptors in C. canephora. To date, three such genes have been characterized (Bustamante-Porras et al., 2007a, b) and named CcETR1, CcEIN4 and CcETR2 on the basis of their homology with the corresponding Arabidopsis genes (O’Malley et al., 2005). The deduced proteins show the expected features of ethylene receptors, with many conserved domains and more variable regions. Young transgenic Arabidopsis seedlings overexpressing the CcEIN4 cDNA showed a loss of gravitropic regulation. Genetic mapping of CcETR1 and CcEIN4 revealed that the two genes are located on different linkage groups in wild Coffea species (Bustamante-Porras et al., 2007a). BAC–FISH experiments indicated that the genome region carrying CcEIN4 is present as a single copy in the C. canephora genome (Guyot et al., 2009).

Expression studies using conventional techniques such as Northern blot or real-time quantitative PCR (qPCR) have also been performed on coffee (Barsalobres-Cavallari et al., 2009; Cruz et al., 2009). Generally, these studies identified genes involved in (i) plant responses to biotic and abiotic stresses such as infection by the rust fungus H. vastatrix (Andrade, 2008; Fernandez et al., 2004; Ganesh et al., 2006; Petitot et al., 2008), the coffee leaf miner (Mondego et al., 2005) and drought (Marraccini et al., 2008); (ii) fruit development and maturation (Bustamante-Porras et al., 2007a; Hinniger et al., 2006; Salmona et al., 2008; Simkin et al., 2008); and (iii) particular biosynthetic pathways such as those of sugars (Geromel et al., 2006, 2008a, b; Privat et al., 2008), caffeine (Koshiro et al., 2006), carotenoids (Simkin et al., 2008), storage proteins and galactomannans (Marraccini et al., 2001; Pre et al., 2008).
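qPCR results of this kind are often reported as fold changes relative to a calibrator sample. A widely used way to compute them is the 2^−ΔΔCt method, sketched below under its usual assumption of near-100% amplification efficiency for both the target and the reference gene. The Ct values, tissues and genes are hypothetical, and the studies cited above may have used other normalization schemes.

```python
from statistics import mean
from typing import Sequence


def relative_expression(target_sample: Sequence[float], ref_sample: Sequence[float],
                        target_calibrator: Sequence[float],
                        ref_calibrator: Sequence[float]) -> float:
    """Fold change by the 2^-ddCt method (assumes ~100% PCR efficiency)."""
    d_ct_sample = mean(target_sample) - mean(ref_sample)              # normalize to reference gene
    d_ct_calibrator = mean(target_calibrator) - mean(ref_calibrator)
    dd_ct = d_ct_sample - d_ct_calibrator                             # compare with calibrator
    return 2 ** (-dd_ct)


# Hypothetical Ct triplicates: a target gene and a reference gene,
# measured in ripe fruit (sample) and in leaves (calibrator).
fold = relative_expression(
    target_sample=[22.0, 22.5, 21.5], ref_sample=[18.0, 18.25, 17.75],
    target_calibrator=[25.0, 25.5, 24.5], ref_calibrator=[18.0, 18.0, 18.0],
)
print(fold)  # 8.0
```

Because each PCR cycle roughly doubles the product, a target that reaches threshold three cycles earlier (after normalization) corresponds to about 2^3 = 8 times more starting transcript.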

Isolation of full-length cDNA sequences enables the heterologous expression of key proteins and the determination of their structure and function. To this end, some enzymes of the phenylpropanoid pathway (Lepelley et al., 2007; Mahesh et al., 2007) and of caffeine biosynthesis (McCarthy and McCarthy, 2007) have been characterized.

A genome-wide survey of gene expression levels will give a better understanding of how transcriptional networks are interconnected to program different biological processes. In recent years, techniques such as microarrays and qPCR have been used for transcript profiling. However, building a large EST data set is a prerequisite for developing microarrays (Alba et al., 2004; Mitreva and Mardis, 2009). Initiatives to generate such arrays have already begun, and some results have started to appear (De Nardi et al., 2006; Privat et al., 2008). Microarrays based on available coffee EST sequences were recently developed in France by a Nestle/IRD/CIRAD scientific consortium funded by Genoplante. This project, named ‘PUCE CAFE’, is partly dedicated to large-scale transcriptomic analysis during grain development of C. canephora grown in different countries (Ecuador, French Guiana, Reunion Island). At the end of the project, the arrays should be made available to the international scientific community working on coffee.

To reconstruct the metabolic pathways involved in the biosynthesis of the main coffee seed storage compounds, Joet et al. (2009) conducted integrated transcriptome and metabolite analyses, combining real-time RT-PCR on 137 selected genes (79 of which had not previously been characterized in Coffea) with metabolite profiling.
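A common first step when integrating transcript and metabolite profiles is to correlate each gene's expression with the content of a metabolite across developmental stages. The sketch below computes Pearson correlation coefficients and ranks genes by the strength of the association. Gene names and values are hypothetical, and this is not a description of the statistical treatment used by Joet et al. (2009).

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank_genes(expression: Dict[str, List[float]],
               metabolite: List[float]) -> List[Tuple[str, float]]:
    """Genes sorted by the absolute strength of their correlation with the metabolite."""
    scores = {gene: pearson(levels, metabolite) for gene, levels in expression.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical relative transcript levels over four seed development stages,
# and the content of one storage compound at the same stages.
expression = {
    "geneA": [1.0, 2.0, 4.0, 8.0],
    "geneB": [8.0, 4.0, 2.0, 1.0],
    "geneC": [3.0, 3.5, 2.5, 3.0],
}
metabolite = [10.0, 20.0, 38.0, 82.0]

for gene, r in rank_genes(expression, metabolite):
    print(f"{gene}\t{r:+.2f}")
```

With only a few developmental stages, correlations like these can flag candidate genes, but they need confirmation by other evidence before a gene is assigned a role in a pathway.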

Next-generation sequencing technologies might also offer an interesting approach to transcriptome profiling in coffee. As in serial analysis of gene expression (SAGE), shotgun libraries derived from mRNA or small RNAs are deeply sequenced with these novel technologies, and the counts (tags) corresponding to individual genes can be used for quantification (Mardis, 2008; Shendure and Ji, 2008). However, meaningful transcript profiling with these new technologies would depend on the availability of a reference sequence for the coffee genome. Although generating a high-quality reference sequence of the coffee genome would have been prohibitively costly only a few years ago, it can now be achieved rapidly and cheaply using hybrid assemblies of different sequencing technologies (Darby and Hall, 2008). Initial efforts toward this goal are underway in the coffee scientific community.
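The tag-counting idea can be sketched as follows. Each sequenced tag is assumed to have already been assigned to a gene, a step that in practice requires the reference sequence discussed above. The counts are then scaled by library size so that samples sequenced to different depths can be compared. Counts per million (CPM) is one simple normalization chosen for illustration, and the gene names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def counts_per_million(assignments: Iterable[Optional[str]]) -> Dict[str, float]:
    """Tag counts per gene scaled to counts per million assigned tags.

    Each element is the gene a sequenced tag was assigned to, or None if the
    tag could not be assigned (e.g. it did not map to the reference).
    """
    counts = Counter(gene for gene in assignments if gene is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


# Hypothetical assignments of 11 tags to three genes (one tag unassigned).
tags = ["geneA"] * 6 + ["geneB"] * 3 + ["geneC"] + [None]
print(counts_per_million(tags))
# {'geneA': 600000.0, 'geneB': 300000.0, 'geneC': 100000.0}
```

CPM corrects only for sequencing depth. When libraries are built from full-length transcripts rather than one tag per molecule, measures such as RPKM or TPM are often used instead, because they also correct for transcript length.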

E. BIOINFORMATICS: COFFEE GENOMIC RESOURCES AVAILABLE ON THE WORLD WIDE WEB

The volume of information related to molecular biology has grown exponentially over the years owing to rapid developments in genomic and molecular research technologies. Bioinformatics plays a key role both in deciphering the genomic, transcriptomic and proteomic data generated by high-throughput technologies and in managing this information. Falling computing costs, wider Internet access, and constantly improving database technologies and computational methods make it easier to access, integrate and retrieve an ever-increasing amount of data.

In the past few years, genomic information has accumulated rapidly on Rubiaceae species, especially those belonging to the Coffea genus, and a number of bioinformatics resources have been developed for coffee. However, most of these resources are not accessible to the general public, as they are proprietary and confined to the projects that fund them. Nevertheless, there are some notable exceptions, with data sets placed in the public domain. The most important coffee resources on the Web are summarized in Table III.

The University of Trieste (Italy) has developed a coffee DNA database (http://coffeedna.net), a user-friendly site containing genomic information on coffee, with a particular focus on Arabica, including ESTs, marker information, coffee germplasm and transposable elements.

IRD (France) has developed MoccaDB (http://moccadb.mpl.ird.fr/), an

interactive online database that manages annotated and/or mapped micro-

satellite markers in Rubiaceae (Plechakova et al., 2009). In its current

release, the database stores 638 markers, which were defined from 259

ESTs and 379 genomic sequences (Poncet et al., 2006, 2007). Marker infor-

mation was retrieved from 11 published works and completed with original

data on 132 microsatellite markers validated in the respective laboratories.

DNA sequences were derived from three Coffea species/hybrids. Microsa-

tellite markers were checked for redundancy, in vitro tested for cross-

amplification and diversity/polymorphism status in up to 38 Rubiaceae

species belonging to the Cinchonoideae and Rubioideae subfamilies. Func-

tional annotation was provided along with a number of associated markers

to describe metabolic pathways. Users can search the database for markers,

sequences, maps, or information on diversity through multioption query

forms. The retrieved data can be browsed and downloaded, along with the

protocols used, using a standard web browser. MoccaDB also includes

bioinformatics tools (CMap viewer and local BLAST) and hyperlinks to

related external data sources (NCBI GenBank and PubMed, SOL Genomics

Network (SGN) database). SGN (http://sgn.cornell.edu/) is mainly devoted

to Solanaceae genomics and includes other closely related Euasterids species

such as coffee and snapdragon (Mueller et al., 2005). It currently provides

the most complete public unigene builds of coffee ESTs from over 55,000

Nestle ESTs and almost 9000 ESTs from IRD (both from C. canephora). The

unigene set is thoroughly annotated using BLAST against Arabidopsis and

GenBank datasets, InterproScan protein domains and Gene Ontology

terms. SGN also maintains a database of all loci that have been experimen-

tally characterized among Euasterids. The database stores locus symbols and

aliases, free text descriptions, relation to mutants (with mutant images),

references to the literature, Gene and Plant Ontology annotations and

other information (Menda et al., 2008). Currently, there are almost 6000

loci for all Euasterids in the database but only 112 are coffee loci. Research-

ers from the community can request editor privileges for a locus that

46 A. DE KOCHKO ET AL.

Author's personal copy

TABLE III Coffee Resources Available on the Web

| Name | Description | Data type | Reference |
|---|---|---|---|
| The Cenicafe coffee databases (CENICAFE, Colombia) | An EST database containing 32,000 coffee (C. arabica, C. liberica) ESTs. It also includes 6000 Beauveria bassiana ESTs and 4000 Hypothenemus hampei (coffee berry borer) ESTs. Login required for access to data. http://bioinformatics.cenicafe.org/ | EST | Cristancho et al. (2006) |
| TropGENE DB (CIRAD, France) | A crop information system created to store genetic and genomic information about tropical crops. A coffee module has been implemented that manages data about the C. canephora BAC library and SSR markers (55,296 and 253, respectively). http://tropgenedb.cirad.fr/en/coffee.html | BAC ends, SSR | Leroy et al. (2005); Ruiz et al. (2004) |
| Ccmb coffee database (CCMB, India) | A database for the molecular characterization of the available coffee gene pool in India and for generating basic materials (molecular markers, mapping populations) as a prelude to a molecular (DNA) marker-based coffee breeding program. Still under construction. http://www.ccmb.res.in/coffeegermplasm/index.htm | Molecular markers, diversity, genetic maps | – |
| The Brazilian Coffee Genome EST project (Brazilian Coffee Research and Development Consortium, CBP&D-Cafe, Brazil) | The database and interface were developed to support the Brazilian Coffee Genome EST project. It manages 130,792, 12,381 and 10,566 sequences for C. arabica, C. canephora and C. racemosa, respectively (37 cDNA libraries), assembled into 33,000 unigenes. Login required for access to data. http://www.lge.ibi.unicamp.br/cafe/ | EST, SSR, SNP, transposable elements | Vieira et al. (2006) |
| CoffeeDNA (University of Trieste, Italy) | A database for coffee genomics (13,686 ESTs, 266 microsatellites, 43 retrotransposons, taxa). Login required for access to private information and certain functions. http://www.coffeedna.net/ | EST, SSR, retrotransposons | – |
| MoccaDB (IRD, France) | A comprehensive web resource for researchers working on the Coffea genus, the Rubiaceae family or related species. It manages information about EST-SSR and SSR markers (to date, 638 markers). Markers were checked for redundancy and tested in vitro for cross-amplification and diversity over up to 38 Rubiaceae species. MoccaDB includes CMap and BLAST tools and links to other related databases (e.g., SGN, NCBI). http://moccadb.mpl.ird.fr/ | SSR, DNA sequences, genetic map, diversity data | Plechakova et al. (2009) |
| The SOL Genomics Network (SGN) (Cornell University, USA) | A genomics information resource for the Solanaceae family and related families in the Euasterid clade, with the aim of building a comparative bioinformatics platform. SGN currently houses map and marker data for Solanaceae species, a large expressed sequence tag collection with computationally derived unigene sets, an extensive database of phenotypic information for a mutagenized tomato population, and associated tools such as real-time quantitative trait locus (QTL) analysis. About 47,000 coffee (C. canephora var. robusta) EST sequences released by Cornell University and Nestle S.A. are freely accessible. http://sgn.cornell.edu/ | EST, COS, map, phenotypes | Mueller et al. (2005) |


interests them and log in to edit and expand the annotations with simple web-based tools. While there are almost 100 annotators for more than 250 tomato loci, coffee researchers still need to come forward to claim their loci of interest.

CoffeaCyc (http://solcyc.sgn.cornell.edu/) is a pathway database for Coffea, maintained at SGN as part of the SolCyc system and based on the SGN unigene assemblies and annotation. The database is currently based solely on C. canephora and comprises 200 pathways and 781 distinct compounds. Although the database as a whole has received little manual curation, some important coffee-specific pathways, such as caffeine biosynthesis, have been curated manually. The system is based on the Pathway Tools software (Paley and Karp, 2006), and the most popular Pathway Tools features are available, such as the overview diagram, the Omics Viewer (which can be used to overlay expression and other data on the diagram), and the advanced search and browsing functions.
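The kind of overlay the Omics Viewer performs, painting expression values onto pathway members, can be illustrated with a small sketch. It does not use Pathway Tools or any CoffeaCyc interface; the pathway memberships, unigene identifiers (U1 to U5), expression ratios and color thresholds are all hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional

# Hypothetical pathway membership (pathway name -> unigene IDs); not CoffeaCyc data.
PATHWAYS: Dict[str, List[str]] = {
    "caffeine biosynthesis": ["U1", "U2", "U3"],
    "sucrose degradation": ["U4", "U5"],
}

# Hypothetical log2 expression ratios (e.g., grain vs. leaf); U5 was not measured.
EXPRESSION: Dict[str, float] = {"U1": 2.0, "U2": 1.5, "U3": -0.5, "U4": -1.0}


def color_for(log2_ratio: float) -> str:
    """Bin a log2 ratio into a coarse color class, as a diagram legend might."""
    if log2_ratio >= 1.0:
        return "red"      # at least 2-fold up
    if log2_ratio <= -1.0:
        return "blue"     # at least 2-fold down
    return "yellow"       # roughly unchanged


def overlay(pathways: Dict[str, List[str]],
            expression: Dict[str, float]) -> Dict[str, dict]:
    """For each pathway, color its measured members and summarize them."""
    result = {}
    for name, genes in pathways.items():
        measured = {g: expression[g] for g in genes if g in expression}
        mean_ratio: Optional[float] = mean(list(measured.values())) if measured else None
        result[name] = {
            "colors": {g: color_for(v) for g, v in measured.items()},
            "mean_log2": mean_ratio,
            "coverage": len(measured) / len(genes),  # fraction of members with data
        }
    return result


if __name__ == "__main__":
    for pathway, summary in overlay(PATHWAYS, EXPRESSION).items():
        print(pathway, summary)
```

Reporting the fraction of members with data alongside the color matters in practice, because a pathway summary built from one measured gene out of ten is much weaker evidence than one built from all ten.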

IV. TOWARD THE WHOLE GENOME SEQUENCING OF COFFEE

The coffee plant is one of the major commodities in many tropical countries, but for several reasons its genetics and genomics have not been on the cutting edge. One reason was its perennial status and the need to wait at least 4 years from seed to seed. Another was the combination of Arabica's very high commercial value and its low diversity. Native to Ethiopia, this species underwent two successive genetic bottlenecks: one arising from its genetic origin (amphiploidy) and one created by human agriculture, i.e., the limited number of plants used to establish the early plantations. The final consequence was a very low level of polymorphism available for breeding.

In coffee, the development of molecular markers (the first step in coffee genomics) was essential to (i) assess genetic diversity within the two main cultivated species, C. canephora and C. arabica; (ii) analyze the diversity of wild related species and detail phylogenetic relationships within the genus; (iii) detect introgressions; (iv) identify QTLs; and (v) characterize major genes of interest. The main result was the significant difference in genetic diversity between the two cultivated species. C. canephora, a diploid with a wide east-west geographic distribution (from Uganda to Guinea, and southward to Angola), has a high level of genetic diversity. By contrast, the narrow genetic base of cultivated C. arabica, an amphidiploid, appears clearly with all types of markers, although diversity is higher among wild genotypes. This is due to the bottleneck constituted by its genetic origin, an interspecific hybridization involving unreduced gametes or followed by a chromosome

ADVANCES IN COFFEA GENOMICS 49


doubling (Lashermes et al., 1999). Molecular markers and cytogenetic data have confirmed the early hypothesis of a hybridization involving two species related to the current C. canephora and C. eugenioides. Molecular markers were also used to construct genetic maps and identify QTLs.
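The diversity contrast between the two cultivated species is normally expressed as a statistic computed from marker allele frequencies. One common choice (the studies summarized here may use others) is Nei's expected heterozygosity, He = 1 − Σ p_i², averaged over loci. The minimal sketch below computes it without sample-size correction for hypothetical SSR allele calls; the two panels and their fragment sizes are invented to mimic a diverse and a narrow gene pool.

```python
from collections import Counter
from typing import Hashable, Sequence


def expected_heterozygosity(alleles: Sequence[Hashable]) -> float:
    """Nei's gene diversity at one locus: He = 1 - sum(p_i ** 2), uncorrected."""
    counts = Counter(alleles)
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def mean_he(loci: Sequence[Sequence[Hashable]]) -> float:
    """Average He over loci; each locus is the list of allele calls observed."""
    return sum(expected_heterozygosity(locus) for locus in loci) / len(loci)


# Hypothetical SSR allele calls (fragment sizes in bp), 20 alleles per locus.
diverse_panel = [[150, 152, 154, 156] * 5, [201, 203, 205, 207] * 5]
narrow_panel = [[150] * 18 + [152] * 2, [201] * 20]  # second locus is monomorphic

if __name__ == "__main__":
    print("diverse:", mean_he(diverse_panel))  # four equally frequent alleles per locus
    print("narrow:", mean_he(narrow_panel))
```

A monomorphic locus contributes He = 0, which is why a narrow gene pool such as cultivated Arabica drags the mean down even when a few loci remain polymorphic.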

A sequence-tagged genetic map is essential for genome assembly and for tagging target genes of interest. A preliminary linkage map of C. arabica has been constructed using AFLP markers (Pearl et al., 2004). Linkage mapping in coffee requires more effort and is more costly than in annual crops because of the longer generation time, the low polymorphism rate (particularly in Arabica coffee), and the absence of a large collection of DNA markers and genomic sequences. A high-density C. canephora map will be available soon and will help to integrate the genetic and physical maps and to assemble the genome sequence. When a draft sequence of the C. canephora genome is available, SSR markers from this diploid genome will be used to map the Arabica genome. A high-density map of C. eugenioides is also needed to assemble the Arabica genome. The species need to be compared to ensure good transferability of markers from one species to another.

Perfect transferability across the Coffea genus, whatever the type of PCR-based molecular marker used, has already been demonstrated (Poncet et al., 2007). It is possible to extend this transferability to the Rubiaceae family. In fact, most of the Rubiaceae genomic data available in public databases derive from the Coffea genus. This easy transferability of Coffea markers to other Rubiaceae genera makes Coffea a model genus for the whole Rubiaceae family. Recently, a comparison was made with related families such as Solanaceae using COS markers, and to a certain extent it should also be possible to extrapolate to more distant species. Evaluating genome evolution and conservation requires transferring information from model species used as references to 'orphan' genomes lacking resources, which could facilitate the identification of genes of interest through map-based cloning strategies. For years, it was assumed that synteny decreased progressively and then disappeared as species diverged. In coffee, however, it was shown that in some regions of importance, such as the CcEIN4 gene region, genomic organization can remain more conserved than expected over long evolutionary periods. An unexpected microcollinearity was found for the region containing this ethylene receptor gene between Coffea and Vitis, two genera not considered to display strong genetic similarities (Guyot et al., 2009). Meanwhile, with the rapid advance of genomic and transcriptomic projects, large amounts of sequence information are now available, and plant genomicists have been experimenting with alternative approaches to identify the genes underlying all types of traits and biological processes.
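The notion of microcollinearity can be made concrete: given the gene order in two genomic regions and a set of ortholog pairs, collinear blocks are runs of orthologs that keep the same (or exactly reversed) order in both regions. The sketch below is a deliberately simplified chaining rule, not the method used by Guyot et al. (2009); the gene names, the gap tolerance and the minimum block length are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def collinear_blocks(order_a: List[str], order_b: List[str],
                     orthologs: Dict[str, str],
                     min_len: int = 3, max_gap: int = 1) -> List[List[Pair]]:
    """Find runs of ortholog pairs whose order is conserved (or exactly reversed).

    order_a / order_b: gene IDs along each region; orthologs: gene in A -> gene in B.
    max_gap: number of unmatched genes tolerated between neighbors in region B.
    """
    pos_b = {g: i for i, g in enumerate(order_b)}
    # Walk region A in order, keeping only genes whose ortholog lies in region B.
    pairs = [(a, orthologs[a]) for a in order_a
             if a in orthologs and orthologs[a] in pos_b]

    blocks: List[List[Pair]] = []
    current: List[Pair] = []
    direction = 0  # +1 same orientation, -1 inverted, 0 not yet known
    for a, b in pairs:
        if current:
            step = pos_b[b] - pos_b[current[-1][1]]
            same_dir = direction == 0 or (step > 0) == (direction > 0)
            if step != 0 and abs(step) <= max_gap + 1 and same_dir:
                direction = 1 if step > 0 else -1
                current.append((a, b))
                continue
            # Order broken: keep the current run only if it is long enough.
            if len(current) >= min_len:
                blocks.append(current)
            current, direction = [], 0
        current.append((a, b))
    if len(current) >= min_len:
        blocks.append(current)
    return blocks


if __name__ == "__main__":
    region_a = ["a1", "a2", "a3", "a4", "a5", "a6"]
    region_b = ["b1", "b2", "x", "b3", "b6", "b5", "b4"]  # 'x' has no ortholog in A
    orth = {f"a{i}": f"b{i}" for i in range(1, 7)}
    for block in collinear_blocks(region_a, region_b, orth):
        print(block)
```

In this example the first three genes form a block in the same orientation despite the extra gene in region B, and the last three form an inverted block. Genes of region A that lack an ortholog are skipped silently, so this toy rule tolerates insertions in A without limit.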

50 A. DE KOCHKO ET AL.


Over the last decade, the rate of generation of genome sequence data has far outstripped our ability to ascertain gene function. At the gene level, several genes have been isolated and characterized and, more globally, several studies have focused on quality-related genes and their involvement in the biosynthesis and accumulation of storage compounds during endosperm development. Next-generation sequencing technologies should advance functional analysis beyond model systems and greatly accelerate our ability to assign biological roles to genes. Further characterization of gene networks in coffee should, in turn, help to identify new targets for manipulating physiological, biochemical and developmental processes in this very important crop species. The Coffea genome community now has the skills and capacity it needs to tackle metabolic pathways. The resources developed so far have provided basic knowledge about Coffea genomes that is essential for the ongoing large-scale C. canephora genomic projects and for comparative genomics.

Comparative genomic studies are essential to investigate the conservation of gene order between closely and distantly related plant species. Large-insert genomic libraries are primary resources for positional cloning, physical mapping, integration of genetic and physical maps, and genome sequencing. The most efficient physical mapping approach is direct fingerprinting of BAC clones, followed by anchoring of mapped ESTs or other types of DNA markers to confirm the contig maps and integrate the genetic and physical maps. The creation of a high-coverage C. canephora BAC library from a doubled haploid (DH) genotype, and its forthcoming BAC end sequencing, will (i) provide support for the whole C. canephora genome sequencing project and (ii) produce a valuable dataset for comparative genomics within the Rubiaceae family, since few non-coffee data are available today. Nevertheless, in the absence of a complete reference sequence, ESTs remain a key resource for understanding the function and evolution of Coffea genomes and for developing plant-breeding approaches.
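Fingerprint-based contig building can be sketched in a few lines. Dedicated BAC fingerprinting software such as FPC scores clone overlaps with a probabilistic (Sulston) score; this sketch swaps in a simpler rule, linking two clones when they share at least a minimum number of restriction fragments whose sizes agree within a tolerance, and then merging linked clones into contigs with a union-find structure. Clone names, fragment sizes, the tolerance and the threshold are all hypothetical.

```python
from itertools import combinations
from typing import Dict, List


def shared_bands(f1: List[int], f2: List[int], tol: int = 2) -> int:
    """Count one-to-one matches between two fingerprints (fragment sizes, bp) within +/- tol."""
    a, b = sorted(f1), sorted(f2)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol:
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return shared


def build_contigs(fingerprints: Dict[str, List[int]],
                  min_shared: int = 3, tol: int = 2) -> List[List[str]]:
    """Group clones into contigs: clones sharing >= min_shared bands are joined."""
    parent = {clone: clone for clone in fingerprints}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for c1, c2 in combinations(fingerprints, 2):
        if shared_bands(fingerprints[c1], fingerprints[c2], tol) >= min_shared:
            parent[find(c1)] = find(c2)

    groups: Dict[str, List[str]] = {}
    for clone in fingerprints:
        groups.setdefault(find(clone), []).append(clone)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    fps = {
        "BAC_A": [1000, 1500, 2200, 3100, 4000],
        "BAC_B": [1501, 2199, 3102, 4000, 5200],
        "BAC_C": [5201, 6000, 7100, 3101],
        "BAC_D": [6001, 7099, 8000, 5199],
    }
    print(build_contigs(fps))
```

Because linkage is transitive, a single false overlap can merge two unrelated contigs; this is one reason anchoring mapped markers is used to confirm the contig maps.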

Further efforts are still required to develop resources and tools in Coffea, including (i) the generation of large sets of ESTs and full-length cDNAs in cultivated and wild Coffea species; (ii) the development of genomic BAC libraries in C. eugenioides and closely related wild Coffea species; and (iii) the whole genome sequencing (WGS) of C. canephora as a model to understand genome structure and evolution in the Coffea genus and the Rubiaceae family. Unfortunately, despite extensive efforts to generate large sets of ESTs in the past few years, only a limited number of sequences are publicly available so far.


Although the genomic data available on coffee plants are rapidly increasing, they are often isolated, and very few sequence resources are freely accessible. The SGN is an example of a database that is rapidly developing into a comprehensive resource for comparative biology among members of the Solanaceae family. This resource includes a large amount of data of many different types. Owing to the relative genetic proximity of the Solanaceae and Rubiaceae families (both in the Euasterid clade), as reported earlier, data from SOL can readily be transferred to coffee trees. With the increasing use of next-generation sequencing technologies, the amount of biological information available from multiple web resources will continue to explode, and the nature of the data is becoming increasingly diverse. Although each of these resources is highly informative individually, efficiently integrating and comparing data from a range of heterogeneous sources has become crucial to accelerate genomic research. The management and integration of these resources will require increasingly sophisticated electronic mechanisms. One solution to facilitate the cross-referencing of data sources is the use of controlled structured vocabularies (e.g., Gene Ontology, Plant Ontology) and standardized data formats. With this new generation of sequencing technologies, bioinformatics in general, and more specifically in coffee with the coffee genome sequencing project, faces new challenges to better manage, process and analyze these large quantities of biological information.
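Cross-referencing through a controlled vocabulary can be as simple as comparing the GO identifiers attached to genes from two sources. The sketch below reads Gene Ontology annotation (GAF) lines, in which column 2 holds the gene product identifier, column 4 the qualifier and column 5 the GO ID, and intersects the terms of hypothetical coffee/tomato gene pairs. The database names, gene identifiers, GO IDs and pairing are invented for illustration, and the example lines are truncated to the first nine of the 17 GAF columns.

```python
import io
from collections import defaultdict
from typing import Dict, Iterable, Set, TextIO, Tuple

# Hypothetical, truncated GAF 2.1 lines (tab-separated; '!' starts a header line).
COFFEE_GAF = (
    "!gaf-version: 2.1\n"
    "CoffeeDB\tCc_U1\tgeneA\t\tGO:0008168\tREF:1\tIEA\t\tF\n"
    "CoffeeDB\tCc_U1\tgeneA\t\tGO:0009820\tREF:1\tIEA\t\tP\n"
    "CoffeeDB\tCc_U2\tgeneB\tNOT\tGO:0005515\tREF:1\tIEA\t\tF\n"
)
TOMATO_GAF = (
    "!gaf-version: 2.1\n"
    "TomatoDB\tSl_G1\tgeneC\t\tGO:0008168\tREF:2\tIEA\t\tF\n"
)


def go_terms_by_gene(gaf: TextIO) -> Dict[str, Set[str]]:
    """Collect GO IDs per gene product (column 2 -> column 5), skipping NOT annotations."""
    terms: Dict[str, Set[str]] = defaultdict(set)
    for line in gaf:
        if line.startswith("!") or not line.strip():
            continue  # header/comment or blank line
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue  # malformed line
        if "NOT" in cols[3].split("|"):
            continue  # a negative annotation is not evidence for the term
        terms[cols[1]].add(cols[4])
    return dict(terms)


def shared_terms(a: Dict[str, Set[str]], b: Dict[str, Set[str]],
                 pairs: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], Set[str]]:
    """GO terms common to each (gene in source a, gene in source b) pair."""
    return {(x, y): a.get(x, set()) & b.get(y, set()) for x, y in pairs}


if __name__ == "__main__":
    coffee = go_terms_by_gene(io.StringIO(COFFEE_GAF))
    tomato = go_terms_by_gene(io.StringIO(TOMATO_GAF))
    print(shared_terms(coffee, tomato, [("Cc_U1", "Sl_G1"), ("Cc_U2", "Sl_G1")]))
```

The comparison only works because both sources use the same identifiers for the same concepts; free-text annotations would first need to be mapped onto the vocabulary.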

The genomes of C. canephora and C. arabica are the targets of the coffee genome sequencing project. Given that C. canephora and C. eugenioides are the progenitors of C. arabica, an ideal approach would have been to sequence these two diploid genomes first. However, even with the reduced cost of next-generation sequencing technologies, sequencing and annotating a plant genome is not a trivial project and is still costly. Ultimately, the genome of C. eugenioides will be sequenced for a better assembly and annotation of the C. arabica genome. This project will enable a better understanding of the dynamics of genome evolution after the hybridization event between the two progenitors. The entire BAC libraries can be fingerprinted using an automated high-throughput fingerprinting technique. Sequencing the ends of BAC clones is an important step for any genome sequencing project. The paired BAC end sequences are critical for building scaffolds from the whole genome shotgun sequences of the coffee genomes. Physical mapping information (contig and chromosomal location) for each BAC will be combined with BAC end sequence data to construct a comprehensive assembly of the coffee genomes. Sequencing a large set of ESTs from various organs and tissues, at different developmental stages and under stress conditions, is still the most cost-effective way to validate expressed genes before the first release of the C. canephora genome.
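How paired BAC end sequences help scaffold shotgun contigs can be shown with a deliberately reduced sketch: if the two ends of the same BAC clone are placed in different contigs, those contigs are probably neighbors in the genome, and independent clones supporting the same pair strengthen the link. Real scaffolders also use end orientation and the expected insert size to order and space contigs; this sketch ignores both. Clone and contig names and the support threshold are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

EndHits = Dict[str, Tuple[Optional[str], Optional[str]]]


def contig_links(end_hits: EndHits, min_support: int = 2) -> Dict[Tuple[str, str], int]:
    """Count BAC clones whose two end sequences fall in two different contigs.

    end_hits maps clone -> (contig of forward end, contig of reverse end), or None
    when an end could not be placed. Returns contig pairs with >= min_support clones.
    """
    counts: Counter = Counter()
    for fwd, rev in end_hits.values():
        if fwd is None or rev is None or fwd == rev:
            continue  # unplaced end, or both ends inside one contig: no new link
        counts[tuple(sorted((fwd, rev)))] += 1  # unordered contig pair
    return {pair: n for pair, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    hits: EndHits = {
        "BAC1": ("ctg1", "ctg2"),
        "BAC2": ("ctg2", "ctg1"),
        "BAC3": ("ctg2", "ctg3"),
        "BAC4": ("ctg3", "ctg3"),
        "BAC5": ("ctg4", None),
    }
    print(contig_links(hits))
```

Requiring support from more than one clone is a simple guard against chimeric clones or mis-placed ends, which would otherwise join unrelated contigs.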


The currently available Roche 454 Titanium sequencing technology makes it feasible to sequence the diploid C. canephora genome using the whole genome shotgun approach. With financial support from funding agencies, sequencing the tetraploid C. arabica genome is also achievable. The emerging SMRT sequencing technology from Pacific Biosciences is expected to further facilitate the sequencing of all three target coffee genomes. The BAC-by-BAC genome sequencing approach is not suitable for sequencing the medium-sized genomes of Robusta (~700 Mb) and Arabica (~1.2 Gb), because of the high cost and the long time required to complete them. In practice, genomic DNA can be isolated from young leaf nuclei of the selected Robusta coffee genotype, thereby reducing contamination by organellar DNA. To provide near-saturated genome coverage at reduced cost, it is assumed that 20× genome coverage from regular Roche 454 Titanium runs with 400 bp reads, together with 10× genome coverage from paired-end 454 runs with 200 bp reads, is sufficient for a reasonable genome assembly and annotation. The whole genome shotgun sequence can be assembled using recently developed public domain software packages (ARACHNE, MIT; PHUSION, Sanger Center; JAZZ, JGI; and GS De Novo Assembler, Roche 454). Annotation of the whole genome shotgun sequences focuses on the identification of genes but also includes searches for uncharacterized transposable elements. Coffee unigenes from cDNA will be aligned with the unmasked genome assembly, and these alignments can be used to train ab initio gene prediction software. Finally, only 10 years after the genome sequence of Arabidopsis, the first plant genome to be sequenced, the coffee community is ready for a new challenge: entering true coffee genomics.
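The coverage targets above translate into read numbers through the usual relation c = N·L/G (coverage c, number of reads N, read length L, genome size G), and the Lander-Waterman model gives e^(−c) as the expected fraction of bases left uncovered under idealized random sampling. The sketch below applies these formulas to the ~700 Mb Robusta and ~1.2 Gb Arabica genome sizes quoted in the text. It ignores repeats, sequencing errors and coverage biases, which is why real assemblies need more than the idealized numbers suggest.

```python
import math


def reads_needed(genome_bp: float, coverage: float, read_len: int) -> int:
    """Reads N required for coverage c, from c = N * L / G."""
    return math.ceil(coverage * genome_bp / read_len)


def uncovered_fraction(coverage: float) -> float:
    """Lander-Waterman expectation of the fraction of bases hit by no read: exp(-c)."""
    return math.exp(-coverage)


ROBUSTA_BP = 700e6   # ~700 Mb, as quoted in the text
ARABICA_BP = 1.2e9   # ~1.2 Gb, as quoted in the text

if __name__ == "__main__":
    single = reads_needed(ROBUSTA_BP, 20, 400)  # 20x with 400 bp reads
    paired = reads_needed(ROBUSTA_BP, 10, 200)  # 10x with 200 bp reads
    print(f"Robusta: {single:,} + {paired:,} reads")  # 35,000,000 each
    print(f"Arabica, 20x at 400 bp: {reads_needed(ARABICA_BP, 20, 400):,} reads")
    print(f"Uncovered fraction at 30x (ideal model): {uncovered_fraction(30):.1e}")
```

Both Robusta read sets come to the same number of reads, because halving the read length exactly offsets halving the target coverage.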

ACKNOWLEDGEMENTS

The authors express their thanks to Drs. Maud Lepelley, James McCarthy and Isabelle Privat for their critical reading of the manuscript. A. C. Andrade acknowledges FAPEMIG, CNPq and FINEP for financial support.

REFERENCES

Adam-Blondon, A.F., Bernole, A., Faes, G., Lamoureux, D., Pateyron, S., Grando, M.S., et al., 2005. Construction and characterization of BAC libraries from major grapevine cultivars. Theor. Appl. Genet. 110, 1363–1371.

Aga, E., Bekele, E., Bryngelsson, T., 2005. Inter-simple sequence repeat (ISSR) variation in forest coffee trees (Coffea arabica L.) populations from Ethiopia. Genetica 124, 213–221.


Aga, E., Bryngelsson, T., 2006. Inverse sequence-tagged repeat (ISTR) analysis of genetic variability in forest coffee (Coffea arabica L.) from Ethiopia. Genet. Resour. Crop Evol. 53, 721–728.

Aga, E., Bryngelsson, T., Bekele, E., Salomon, B., 2003. Genetic diversity of forest arabica coffee (Coffea arabica L.) in Ethiopia as revealed by random amplified polymorphic DNA (RAPD) analysis. Hereditas 138, 36–46.

AGI, 2000. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796–815.

Akaffou, D.S., Ky, C.-L., Barre, P., Hamon, S., Louarn, J., Noirot, M., 2003. Identification and mapping of a major gene (ft1) involved in fructification time in the interspecific cross Coffea pseudozanguebariae x C. liberica var. Dewevrei: impact on caffeine content and seed weight. Theor. Appl. Genet. 106, 1486–1490.

Alba, R., Fei, Z.J., Payton, P., Liu, Y., Moore, S.L., Debbie, P., et al., 2004. ESTs, cDNA microarrays, and gene expression profiling: tools for dissecting plant physiology and development. Plant J. 39, 697–714.

Andrade, A.C., 2008. The coffee genome project and plant disease resistance. Trop. Plant Pathol. 33, 22–25.

Ashihara, H., Crozier, A., 1999. Biosynthesis and metabolism of caffeine and related purine alkaloids in plants. Adv. Bot. Res. 30, 117–205.

Barre, P., Layssac, M., D’hont, A., Louarn, J., Charrier, A., Hamon, S., et al., 1998. Relationship between parental chromosomic contribution and nuclear DNA content in the coffee interspecific hybrid C. pseudozanguebariae x C. liberica var ‘dewevrei’. Theor. Appl. Genet. 96, 301–305.

Barsalobres-Cavallari, C., Severino, F., Maluf, M., Maia, I., 2009. Identification of suitable internal control genes for expression studies in Coffea arabica under different experimental conditions. BMC Mol. Biol. 10, 1.

Baruah, A., Naik, P., Hendre, S., Rajkumar, R., Rajendrakumar, P., Aggarwal, R.K., 2003. Isolation and characterization of nine microsatellite markers from Coffea arabica L., showing wide cross-species amplifications. Mol. Ecol. Notes 3, 647–650.

Bernatzky, R., Tanksley, S.D., 1986. Toward a saturated linkage map in tomato based on isozymes and random cDNA sequences. Genetics 112, 887–898.

Berthou, F., Mathieu, C., Vedel, F., 1983. Chloroplast and mitochondrial DNA variation as indicator of phylogenetic relationships in the genus Coffea L. Theor. Appl. Genet. 65, 77–84.

Berthou, F., Trouslot, P., Hamon, S., Vedel, F., Quetier, F., 1980. Analyse en electrophorese du polymorphisme biochimique des cafeiers: variation enzymatique dans dix-huit populations sauvages. Cafe Cacao The 24, 313–326.

Boguski, M.S., Tolstoshev, C.M., Bassett, D.E., Jr., 1994. Gene discovery in dbEST. Science 265, 1993–1994.

Bonierbale, M., Plaisted, R., Tanksley, S.D., 1988. RFLP maps based on a common set of clones reveal modes of chromosomal evolution in potato and tomato. Genetics 120, 1095–1103.

Budiman, M.A., Mao, L., Wood, T.C., Wing, R.A., 2000. A deep-coverage tomato BAC library and prospects toward development of an STC framework for genome sequencing. Genome Res. 10, 129–136.

Bustamante-Porras, J., Campa, C., Poncet, V., Noirot, M., Leroy, T., Hamon, S., et al., 2007a. Molecular characterization of an ethylene receptor gene (CcETR1) in coffee trees, its relationship with fruit development and caffeine content. Mol. Genet. Genomics 277, 701–712.


Bustamante-Porras, J., Noirot, M., Campa, C., Hamon, S., de Kochko, A., 2005. Isolation and characterization of a Coffea canephora ERF-like cDNA. Afr. J. Biotechnol. 4, 157–159.

Bustamante-Porras, J., Poncet, V., Campa, C., Noirot, M., Hamon, S., de Kochko, A., 2007b. Characterization of three ethylene receptor genes in Coffea canephora Pierre. In: Ramina, A., Chang, C., Giovannoni, J., Klee, H., Perata, P., Woltering, E. (Eds.), Advances in Plant Ethylene Research. Springer, Dordrecht.

Campa, C., Noirot, M., de Kochko, A., Bourgeois, M., Pervent, M., Ky, C.-L., et al., 2003. Genetic mapping of a caffeoyl-coenzyme A 3-O-methyltransferase gene in coffee trees. Impact on chlorogenic acid content. Theor. Appl. Genet. 107, 751–756.

Chabrillange, N., Talamond, P., Moreau, C., Le Gal, L., Bourgeois, M., Hamon, S., et al., 2006. Isolation and first characterization of two O-methyltransferase genes involved in phenylpropanoid pathway in Coffea canephora. 21st International Conference on Coffee Science. ASIC, Montpellier, France.

Chen, M., Presting, G., Barbazuk, W.B., Goicoechea, J.L., Blackmon, B., Fang, G., et al., 2002. An integrated physical and genetic map of the rice genome. Plant Cell 14, 537–545.

Cheng, Z., Buell, C.R., Wing, R.A., Gu, M., Jiang, J., 2001a. Toward a cytological characterization of the rice genome. Genome Res. 11, 2133–2141.

Cheng, Z., Presting, G.G., Buell, C.R., Wing, R.A., Jiang, J., 2001b. High-resolution pachytene chromosome mapping of bacterial artificial chromosomes anchored by genetic markers reveals the centromere location and the distribution of genetic recombination along chromosome 10 of rice. Genetics 157, 1749–1757.

Chevalier, A., 1947. Les cafeiers du globe. III Systematique des cafeiers et faux cafeiers. Maladies et insectes nuisibles. Paris, Paul Lechevalier.

Clarindo, W.R., Carvalho, C.R., 2008. First Coffea arabica karyogram showing that this species is a true allotetraploid. Plant Syst. Evol. 274, 237–241.

Clifford, M.N., 1985. Chemical and physical aspects of green coffee and coffee products. In: Clifford, M.N., Willson, K.C. (Eds.), Coffee: Botany, Biochemistry and Production of Beans and Beverage. Croom Helm Ltd, London.

Combes, M.C., Andrzejewski, S., Anthony, F., Bertrand, B., Rovelli, P., Graziosi, G., et al., 2000. Characterization of microsatellite loci in Coffea arabica and related coffee species. Mol. Ecol. 9, 1178–1180.

Coste, R., 1955. Les cafeiers et le cafe dans le monde. Paris, Larose.

Coulibaly, I., Louarn, J., Lorieux, M., Charrier, A., Hamon, S., Noirot, M., 2003a. Pollen viability restoration in a Coffea canephora P. and C. heterocalyx Stoffelen backcross. QTL identification for marker-assisted selection. Theor. Appl. Genet. 106, 311–316.

Coulibaly, I., Noirot, M., Lorieux, M., Charrier, A., Hamon, S., Louarn, J., 2002. Introgression of self-compatibility from Coffea heterocalyx to the cultivated species Coffea canephora. Theor. Appl. Genet. 105, 994–999.

Coulibaly, I., Revol, B., Noirot, M., Poncet, V., Lorieux, M., Carasco-Lacombe, C., et al., 2003b. AFLP and SSR polymorphism in a Coffea interspecific backcross progeny [(C. heterocalyx x C. canephora) x C. canephora]. Theor. Appl. Genet. 107, 1148–1155.

Couturon, E., 1986. The early sorting of spontaneous haploid plants of Coffea canephora Pierre. Cafe Cacao The 30, 171–176.

Cristancho, M., Rivera, L., Orozco, C., Chalarca, A., Mueller, L., 2006. Development of a Bioinformatics Platform at the Colombia National Coffee Research Center. Association Scientifique Internationale du Cafe (ASIC), Montpellier, France.


Cros, J., Combes, M.C., Chabrillange, N., Duperray, C., Monnot Des Angles, A., Hamon, S., 1995. Nuclear DNA content in the subgenus Coffea (Rubiaceae): inter- and intra-specific variation in African species. Can. J. Bot. 73, 14–20.

Cros, J., Combes, M.C., Trouslot, P., Anthony, F., Hamon, S., Charrier, A., et al., 1998. Phylogenetic analysis of chloroplast DNA variation in Coffea L. Mol. Phylogenet. Evol. 9, 109–117.

Cruz, F., Kalaoun, S., Nobile, P., Colombo, C., Almeida, J., Barros, L.M.G., et al., 2009. Evaluation of coffee reference genes for relative expression studies by quantitative real-time RT-PCR. Mol. Breed. 23, 607–616.

Cubry, P., Musoli, P., Legnate, H., Pot, D., De Bellis, F., Poncet, V., et al., 2008. Diversity in coffee assessed with SSR markers: structure of the genus Coffea and perspectives for breeding. Genome 51, 50–63.

Darby, A.C., Hall, N., 2008. Fast forward genetics. Nat. Biotechnol. 26, 1248–1249.

Dart, S.K., Nursten, H.E., 1985. Volatile components. In: Clarke, R.J., Macrae, R. (Eds.), Coffee. Elsevier Applied Science, London.

De Maria, C.A.B., Trugo, L.C., Aquino Neto, F.R., Moreira, R.F.A., 1994. Arabinogalactan as a potential furfural precursor in roasted coffee. Int. J. Food Sci. Technol. 29, 559–562.

De Maria, C.A.B., Trugo, L.C., Neto, F.R.A., Moreira, R.F.A., Alviano, C.S., 1996. Composition of green coffee water-soluble fractions and identification of volatiles formed during roasting. Food Chem. 55, 203–207.

De Nardi, B., Dreos, R., Del Terra, L., Martellossi, C., Asquini, E., Tornincasa, P., et al., 2006. Differential responses of Coffea arabica L. leaves and roots to chemically induced systemic acquired resistance. Genome 49, 1594–1605.

De Oliveira, A.C.B., Sakiyama, N.S., Caixeta, E.T., Zambolim, E.M., Rufino, R.J.N., Zambolim, L., 2007. Partial map of Coffea arabica L. and recovery of the recurrent parent in backcross progenies. Crop Breed. Appl. Biotechnol. 7, 196–203.

Dessalegn, Y., Herselman, L., Labuschagne, M.T., 2008. AFLP analysis among Ethiopian arabica coffee genotypes. Afr. J. Biotechnol. 7, 3193–3199.

Devos, K.M., Moore, G., Gale, M.D., 1995. Conservation of marker synteny during evolution. Euphytica 85, 367–372.

Dufour, M., Hamon, P., Noirot, M., Risterucci, A.M., Brottier, P., Vico, V., et al., 2001. Potential use of SSR markers for Coffea spp. genetic mapping. In: ASIC (Ed.), 19th International Scientific Colloquium on Coffee. Trieste, Italy.

FAO Statistical Yearbook, 2004. Vol. 1/1 Table C.10: Most important imports and exports of agricultural products (in value terms) (2004). FAO Statistics Division. http://www.fao.org/statistics/yearbook

Feldman, J.R., Ryder, W.S., Kung, J.T., 1969. Importance of nonvolatile compounds in the flavor of coffee. J. Agric. Food Chem. 17, 733–739.

Fernandez, D., Santos, P., Agostini, C., Bon, M.-C., Petitot, A.-S., Silva, C., et al., 2004. Coffee (Coffea arabica L.) genes early expressed during infection by the rust fungus (Hemileia vastatrix). Mol. Plant Pathol. 5, 527–536.

Ferrao, M.A.G., Da Fonseca, A.F.A., Ferrao, R.G., Barbosa, W.M., Souza, E.M.R., 2009. Genetic divergence in conilon coffee revealed by RAPD markers. Crop Breed. Appl. Biotechnol. 9, 67–74.

Flament, I., 1991. Volatile compounds. In: Maarse, H. (Ed.), Foods and Beverages. Dekker, New York.

Fulton, T.M., Van Der Hoeven, R., Eannetta, N.T., Tanksley, S.D., 2002. Identification, analysis, and utilization of conserved ortholog set markers for comparative genomics in higher plants. Plant Cell 14, 1457–1467.


Ganesh, D., Petitot, A.S., Silva, M.C., Alary, R., Lecouls, A.C., Fernandez, D., 2006. Monitoring of the early molecular resistance responses of coffee (Coffea arabica L.) to the rust fungus (Hemileia vastatrix) using real-time quantitative RT-PCR. Plant Sci. 170, 1045–1051.

Geromel, C., Ferreira, L.P., Bottcher, A., Pot, D., Pereira, L.F.P., Leroy, T., et al., 2008a. Sucrose metabolism during fruit development in Coffea racemosa. Ann. Appl. Biol. 152, 179–187.

Geromel, C., Ferreira, L.P., Davrieux, F., Guyot, B., Ribeyre, F., Scholz, M.B.D., et al., 2008b. Effects of shade on the development and sugar metabolism of coffee (Coffea arabica L.) fruits. Plant Physiol. Biochem. 46, 569–579.

Geromel, C., Ferreira, L.P., Guerreiro, S.M.C., Cavalari, A.A., Pot, D., Pereira, L.F.P., et al., 2006. Biochemical and genomic analysis of sucrose metabolism during coffee (Coffea arabica) fruit development. J. Exp. Bot. 57, 3243–3258.

Gichuru, E.K., Agwanda, C.O., Combes, M.C., Mutitu, E.W., Ngugi, E.C.K., Bertrand, B., et al., 2008. Identification of molecular markers linked to a gene conferring resistance to coffee berry disease (Colletotrichum kahawae) in Coffea arabica. Plant Pathol. 57, 1117–1124.

Gomez, C., Dussert, S., Hamon, P., Hamon, S., de Kochko, A., Poncet, V., 2009. Current genetic differentiation of Coffea canephora Pierre ex A. Froehn in the Guineo-Congolian African zone: cumulative impact of ancient climatic changes and recent human activities. BMC Evol. Biol. 9, 167.

Guyot, R., De La Mare, M., Viader, V., Hamon, P., Coriton, O., Bustamante-Porras, J., et al., 2009. Microcollinearity in an ethylene receptor coding gene region of the Coffea canephora genome is extensively conserved with Vitis vinifera and other distant dicotyledonous sequenced genomes. BMC Plant Biol. 9, 22.

Hamon, P., Siljak-Yakovlev, S., Srisuwan, S., Robin, O., Poncet, V., Hamon, S., et al., 2009. Physical mapping of rDNA and heterochromatin in chromosomes of 16 Coffea species: a revised view of species differentiation. Chromosome Res. 17, 291–304.

Hendre, P.S., Phanindranath, R., Annapurna, V., Lalremruata, A., Aggarwal, R.K., 2008. Development of new genomic microsatellite markers from robusta coffee (Coffea canephora Pierre ex A. Froehner) showing broad cross-species transferability and utility in genetic studies. BMC Plant Biol. 8, 51.

Herrera, P.J.C., Alvarado, A.V., Cortina, G.H.A., Combes, M.C., Romero, G.G., Lashermes, P., 2009. Genetic analysis of partial resistance to coffee leaf rust (Hemileia vastatrix Berk & Br.) introgressed into the cultivated Coffea arabica L. from the diploid C. canephora species. Euphytica 167, 57–67.

Herrera, J.C., D’hont, A., Lashermes, P., 2007. Use of fluorescence in situ hybridization as a tool for introgression analysis and chromosome identification in coffee (Coffea arabica L.). Genome 50, 619–626.

Hinniger, C., Caillet, V., Michoux, F., Ben Amor, M., Tanksley, S., Lin, C.W., et al., 2006. Isolation and characterization of cDNA encoding three dehydrins expressed during Coffea canephora (Robusta) grain development. Ann. Bot. 97, 755–765.

IRGSP, 2005. The map-based sequence of the rice genome. Nature 436, 793–800.

Joet, T., Laffargue, A., Salmona, J., Doulbeau, S., Descroix, F., Bertrand, B., et al., 2009. Metabolic pathways in tropical dicotyledonous albuminous seeds: Coffea arabica as a case study. New Phytol. 182, 146–162.

Jones, M.R., Byers, A., Skelton, R.L., Yu, Q., Nagai, C., Moore, P.H., et al., 2006. Construction of an arabica coffee BAC library for molecular dissection of an allotetraploid genome. 21st International Conference on Coffee Science (ASIC), B211, p. 49.


Koshiro, Y., Zheng, X.Q., Wang, M.L., Nagai, C., Ashihara, H., 2006. Changes in content and biosynthetic activity of caffeine and trigonelline during growth and ripening of Coffea arabica and Coffea canephora fruits. Plant Sci. 171, 242–250.

Kulikova, O., Gualtieri, G., Geurts, R., Kim, D.J., Cook, D., Huguet, T., et al., 2001. Integration of the FISH pachytene and genetic maps of Medicago truncatula. Plant J. 27, 49–58.

Ky, C.-L., Doulbeau, S., Guyot, B., Akaffou, S., Charrier, A., Hamon, S., et al., 2000. Inheritance of coffee bean sucrose content in the interspecific cross Coffea pseudozanguebariae x Coffea liberica ‘dewevrei’. Plant Breed. 119, 165–168.

Ky, C.L., Louarn, J., Dussert, S., Guyot, B., Hamon, S., Noirot, M., 2001. Caffeine, trigonelline, chlorogenic acids and sucrose diversity in wild Coffea arabica L. and C. canephora P. accessions. Food Chem. 75, 223–230.

Ky, C.L., Louarn, J., Guyot, B., Charrier, A., Hamon, S., Noirot, M., 1999. Relations between and inheritance of chlorogenic acid contents in an interspecific cross between Coffea pseudozanguebariae and C. liberica var Dewevrei. Theor. Appl. Genet. 98, 628–637.

Lefebvre-Pautigny et al., in press. Tree Genet. Genomes. DOI: 10.1007/s11295-010-0272-3.

Lai, C.W., Yu, Q., Hou, S., Skelton, R.L., Jones, M.R., Lewis, K.L., et al., 2006.Analysis of papaya BAC end sequences reveals first insights into the orga-nization of a fruit tree genome. Mol. Genet. Genomics 276, 1–12.

Lashermes, P., Combes, M.C., Robert, J., Trouslot, P., D’hont, A., Anthony, F.,et al., 1999. Molecular characterization and origin of the Coffea arabica L.genome. Mol. Gen. Genet. 261, 259–266.

Lashermes, P., Couturon, E., Moreau, N., Paillard, M., Louarn, J., 1996. Inheritanceand genetic mapping of self-incompatibility in Coffea canephora Pierre.Theor. Appl. Genet. 93, 458–462.

Leloup, V., Louvrier, A., Liardon, R., 1995. Degradation mechanisms of chlorogenicacids during roasting. 16th International Conference on Coffee Science(ASIC). Kyoto, Japan.

Lepelley, M., Cheminade, G., Tremillon, N., Simkin, A., Caillet, V., McCarthy, J.,2007. Chlorogenic acid synthesis in coffee: an analysis of CGA content andreal-time RT-PCR expression of HCT, HQT, C3H1, and CCoAOMT1genes during grain development in C. canephora. Plant Sci. 172, 978–996.

Leroy, T., Marraccini, P., Dufour, M., Montagnon, C., Lashermes, P., Sabau, X., et al., 2005. Construction and characterization of a Coffea canephora BAC library to study the organization of sucrose biosynthesis genes. Theor. Appl. Genet. 111, 1032–1041.

Li, M.G., Wunder, J., Bissoli, G., Scarponi, E., Gazzani, S., Barbaro, E., et al., 2008. Development of COS genes as universally amplifiable markers for phylogenetic reconstructions of closely related plant species. Cladistics 24, 727–745.

Lin, C., Mueller, L.A., McCarthy, J., Crouzillat, D., Petiard, V., Tanksley, S.D., 2005. Coffee and tomato share common gene repertoires as revealed by deep sequencing of seed and cherry transcripts. Theor. Appl. Genet. 112, 114–130.

Lopes, F.R., Carazzolle, M.F., Pereira, G.A.G., Colombo, C.A., Carareto, C.M.A., 2008. Transposable elements in Coffea (Gentianales: Rubiacea) transcripts and their role in the origin of protein diversity in flowering plants. Mol. Genet. Genomics 279, 385–401.

Mahesh, V., Million-Rousseau, R., Ullmann, P., Chabrillange, N., Bustamante, J., Mondolot, L., et al., 2007. Functional characterization of two p-coumaroyl ester 3′-hydroxylase genes from coffee tree: evidence of a candidate for chlorogenic acid biosynthesis. Plant Mol. Biol. 64, 145–159.

Mahesh, V., Rakotomalala, J.-J., Le Gal, L., Vigne, H., de Kochko, A., Hamon, S., et al., 2006. Isolation and genetic mapping of a Coffea canephora phenylalanine ammonia-lyase gene CcPAL1 and its involvement in the accumulation of caffeoyl quinic acids. Plant Cell Rep. 25, 986–992.

Mahe, L., Combes, M.C., Lashermes, P., 2007. Comparison between a coffee single copy chromosomal region and Arabidopsis duplicated counterparts evidenced high level synteny between the coffee genome and the ancestral Arabidopsis genome. Plant Mol. Biol. 64, 699–711.

Maillard, L.C., 1913. Genèse des matières humiques et des matières protéiques. Paris, Masson.

Mao, L., Wood, T.C., Yu, Y., Budiman, M.A., Tomkins, J., Woo, S., et al., 2000. Rice transposable elements: a survey of 73,000 sequence-tagged-connectors. Genome Res. 10, 982–990.

Mardis, E.R., 2008. Next-generation DNA sequencing methods. Ann. Rev. Genomics Hum. Genet. 9, 387–402.

Marraccini, P., Rogers, W.J., Allard, C., Andre, M.L., Caillet, V., Lacoste, N., et al., 2001. Molecular and biochemical characterization of endo-beta-mannanases from germinating coffee (Coffea arabica) grains. Planta 213, 296–308.

Marracini, P.R., Vieira, H.J.O., Ferrao, L.G.E., Da Silva, M.A.G., Taquita, F.R., Bloch, J.A., Jr., et al., 2008. Study of Drought-Tolerance Mechanisms in Coffee Plants by an Integrated Analysis. 22nd International Conference on Coffee Science. Campinas, SP, Brazil. Association Scientifique Internationale du Café (ASIC), Paris, France.

Masumbuko, L.I., Bryngelsson, T., 2006. Inter simple sequence repeat (ISSR) analysis of diploid coffee species and cultivated Coffea arabica L. from Tanzania. Genet. Resour. Crop Evol. 53, 357–366.

Maurin, O., Davis, A.P., Chester, M., Mvungi, E.F., Jaufeerally-Fakim, Y., Fay, M.F., 2007. Towards a phylogeny for Coffea (Rubiaceae): identifying well-supported lineages based on nuclear and plastid DNA sequences. Ann. Bot. 100, 1565–1583.

McCarthy, A.A., McCarthy, J.G., 2007. The structure of two N-methyltransferases from the caffeine biosynthetic pathway. Plant Physiol. 144, 879–889.

McCouch, S.R., Kochert, G., Yu, Z.H., Wang, Z.Y., Khush, G.S., Coffman, W.R., et al., 1988. Molecular mapping of rice chromosomes. Theor. Appl. Genet. 76, 815–829.

Menda, N., Buels, R.M., Tecle, I., Mueller, L.A., 2008. A community-based annotation framework for linking Solanaceae genomes with phenomes. Plant Physiol. 147, 1788–1799.

Ming, R., Moore, P.H., Zee, F., Abbey, C.A., Ma, H., Paterson, A.H., 2001. Construction and characterization of a papaya BAC library as a foundation for molecular dissection of a tree-fruit genome. Theor. Appl. Genet. 102, 892–899.

Mitreva, M., Mardis, E.R., 2009. Large-scale sequencing and analytical processing of ESTs. Methods Mol. Biol. 533, 153–187.

Mizuno, K., Kato, M., Irino, F., Yoneyama, N., Fujimura, T., Ashihara, H., 2003a. The first committed step reaction of caffeine biosynthesis: 7-methylxanthosine synthase is closely homologous to caffeine synthases in coffee (Coffea arabica L.). FEBS Lett. 547, 56–60.

Mizuno, K., Okuda, A., Kato, M., Yoneyama, N., Tanaka, H., Ashihara, H., et al., 2003b. Isolation of a new dual-functional caffeine synthase gene encoding an enzyme for the conversion of 7-methylxanthine to caffeine from coffee (Coffea arabica L.). FEBS Lett. 534, 75–81.

Moncada, P., McCouch, S., 2004. Simple sequence repeat diversity in diploid and tetraploid Coffea species. Genome 47, 501–509.

Mondego, J.M.C., Guerreiro-Filho, O., Bengtson, M.H., Drummond, R.D., Felix, J.D., Duarte, M.P., et al., 2005. Isolation and characterization of Coffea genes induced during coffee leaf miner (Leucoptera coffeella) infestation. Plant Sci. 169, 351–360.

Montoya, G., Vuong, H., Cristancho, M., Moncada, P., Yepes, M., 2006. Sequence analysis from leaves, flowers and fruits of Coffea arabica var. Caturra. 21st International Conference on Coffee Science (ASIC). Montpellier, France.

Moroldo, M., Paillard, S., Marconi, R., Fabrice, L., Canaguier, A., Cruaud, C., et al., 2008. A physical map of the heterozygous grapevine ‘Cabernet Sauvignon’ allows mapping candidate genes for disease resistance. BMC Plant Biol. 8, 66.

Moschetto, D., Montagnon, C., Guyot, B., Perriot, J.J., Leroy, T., Eskes, A., 1996. Studies on the effect of genotype on cup quality of Coffea canephora. Trop. Sci. 36, 18–31.

Mozo, T., Dewar, K., Dunn, P., Ecker, J.R., Fischer, S., Kloska, S., et al., 1999. A complete BAC-based physical map of the Arabidopsis thaliana genome. Nat. Genet. 22, 271–275.

Mozo, T., Fischer, S., Shizuya, H., Altmann, T., 1998. Construction and characterization of the IGF Arabidopsis BAC library. Mol. Gen. Genet. 258, 562–570.

Mueller, L.A., Solow, T.H., Taylor, N., Skwarecki, B., Buels, R., Binns, J., et al., 2005. The SOL genomics network. A comparative resource for Solanaceae biology and beyond. Plant Physiol. 138, 1310–1317.

Noir, S., Patheyron, S., Combes, M.C., Lashermes, P., Chalhoub, B., 2004. Construction and characterisation of a BAC library for genome analysis of the allotetraploid coffee species (Coffea arabica L.). Theor. Appl. Genet. 109, 225–230.

N’Diaye, A., Noirot, M., Hamon, S., Poncet, V., 2007. Genetic basis of species differentiation between Coffea liberica Hiern and C. canephora Pierre: analysis of an interspecific cross. Genet. Resour. Crop Evol. 54, 1011–1021.

N’Diaye, A., Poncet, V., Louarn, J., Hamon, S., Noirot, M., 2005. Genetic differentiation between Coffea liberica var. liberica and C. liberica var. Dewevrei and comparison with C. canephora. Plant Syst. Evol. 253, 95–104.

Noirot, M., Poncet, V., Barre, P., Hamon, P., Hamon, S., de Kochko, A., 2003. Genome size variations in diploid African Coffea species. Ann. Bot. 92, 709–714.

Ogawa, M., Herai, Y., Koizumi, N., Kusano, T., Sano, H., 2001. 7-Methylxanthine methyltransferase of coffee plants – Gene isolation and enzymatic properties. J. Biol. Chem. 276, 8213–8218.

Ogita, S., Uefuji, H., Morimoto, M., Sano, H., 2004. Application of RNAi to confirm theobromine as the major intermediate for caffeine biosynthesis in coffee plants with potential for construction of decaffeinated varieties. Plant Mol. Biol. 54, 931–941.

O’Malley, R.C., Rodriguez, F.I., Esch, J.J., Binder, B.M., O’Donnell, P., Klee, H.J., et al., 2005. Ethylene-binding activity, gene expression levels, and receptor system output for ethylene receptor family members from Arabidopsis and tomato. Plant J. 41, 651–659.

Paillard, M., Lashermes, P., Petiard, V., 1996. Construction of a molecular linkage map in coffee. Theor. Appl. Genet. 93, 41–47.

Paley, S.M., Karp, P.D., 2006. The Pathway Tools cellular overview diagram and Omics Viewer. Nucleic Acids Res. 34, 3771–3778.

Paux, E., Roger, D., Badaeva, E., Gay, G., Bernard, M., Sourdille, P., et al., 2006. Characterizing the composition and evolution of homoeologous genomes in hexaploid wheat through BAC-end sequencing on chromosome 3B. Plant J. 48, 463–474.

Paux, E., Sourdille, P., Salse, J., Saintenac, C., Choulet, F., Leroy, P., et al., 2008. A physical map of the 1-gigabase bread wheat chromosome 3B. Science 322, 101–104.

Pearl, H.M., Nagai, C., Moore, P.H., Steiger, D.L., Osgood, R.V., Ming, R., 2004. Construction of a genetic map for arabica coffee. Theor. Appl. Genet. 108, 829–835.

Pendergrast, M., 2009. Coffee second only to oil? Tea Coffee Trade J. April, 38–41.

Pereira, L.F.P., Galvao, R.M., Kobayashi, A.K., Cacao, S.M.B., Vieira, L.G.E., 2005. Ethylene production and ACC oxidase gene expression during fruit ripening of Coffea arabica L. Braz. J. Plant Physiol. 17, 283–289.

Petitot, A.S., Lecouls, A.C., Fernandez, D., 2008. Sub-genomic origin and regulation patterns of a duplicated WRKY gene in the allotetraploid species Coffea arabica. Tree Genet. Genomes 4, 379–390.

Pinto-Maglio, C.A.F., Da Cruz, N.D., 1987. Pachytene chromosome morphology in Coffea L. I. Nucleolar chromosomes. Caryologia 40, 7–23.

Pinto-Maglio, C.A.F., Da Cruz, N.D., 1998. Pachytene chromosome morphology in Coffea L. II. C. arabica L. complement. Caryologia 51, 19–35.

Plechakova, O., Tranchant-Dubreuil, C., Benedet, F., Couderc, M., Tinaut, A., Viader, V., et al., 2009. MoccaDB – An integrative database for functional, comparative and diversity studies in the Rubiaceae family. BMC Plant Biol. 9, 123.

Poncet, V., Dufour, M., Hamon, P., Hamon, S., de Kochko, A., Leroy, T., 2007. Development of genomic microsatellite markers in Coffea canephora and their transferability to other coffee species. Genome 50, 1156–1161.

Poncet, V., Hamon, P., De Saint Marc, M.B.S., Bernard, T., Hamon, S., Noirot, M., 2005. Base composition of Coffea AFLP sequences and their conservation within the genus. J. Hered. 96, 59–65.

Poncet, V., Hamon, P., Minier, J., Carasco, C., Hamon, S., Noirot, M., 2004. SSR cross-amplification and variation within coffee trees (Coffea spp.). Genome 47, 1071–1081.

Poncet, V., Rondeau, M., Tranchant, C., Cayrel, A., Hamon, S., de Kochko, A., et al., 2006. SSR mining in coffee tree EST databases: potential use of EST-SSRs as markers for the Coffea genus. Mol. Genet. Genomics 276, 436–449.

Prakash, N., Combes, M.C., Dussert, S., Naveen, S., Lashermes, P., 2005. Analysis of genetic diversity in Indian robusta coffee genepool (Coffea canephora) in comparison with a representative core collection using SSRs and AFLPs. Genet. Resour. Crop Evol. 52, 333–343.

Prakash, N.S., Marques, D.V., Varzea, V.M.P., Silva, M.C., Combes, M.C., Lashermes, P., 2004. Introgression molecular analysis of a leaf rust resistance gene from Coffea liberica into C. arabica L. Theor. Appl. Genet. 109, 1311–1317.

Pre, M., Caillet, V., Sobilo, J., McCarthy, J., 2008. Characterization and expression analysis of genes directing galactomannan synthesis in coffee. Ann. Bot. 102, 207–220.

Privat, I., Foucrier, S., Prins, A., Epalle, T., Eychenne, M., Kandalaft, L., et al., 2008. Differential regulation of grain sucrose accumulation and metabolism in Coffea arabica (Arabica) and Coffea canephora (Robusta) revealed through gene expression and enzyme activity analysis. New Phytol. 178, 781–797.

Raina, S.N., Mukai, Y., Yamamoto, M., 1998. In situ hybridization identifies the diploid progenitor species of Coffea arabica. Theor. Appl. Genet. 97, 1204–1209.

Rao, G.S., 1978. Stimulation of bean growth in coffee by exogenous application of ethylene. Turrialba 28, 157–158.

Rao, G.S., Venkataramanan, D., Partha, T.S., Rao, K.N., 1978. Ethylene-induced changes in chemical composition of coffee mucilage. Turrialba 28, 153–155.

Rova, J.H.E., Delprete, P.G., Andersson, L., Albert, V.A., 2002. A trnL-F cpDNA sequence study of the Condamineeae-Rondeletieae-Sipaneeae complex with implications on the phylogeny of the Rubiaceae. Am. J. Bot. 89, 145–159.

Rovelli, P., Mettulio, R., Anthony, F., Anzueto, F., Lashermes, P., Graziosi, G., 2000. Microsatellites in Coffea arabica L. In: Sera, T., Soccol, C.R., Pandey, A., Roussos, S. (Eds.), Coffee Biotechnology and Quality. Kluwer Academic Publishers, the Netherlands.

Ruas, P.M., Ruas, C.F., Rampim, L., Carvalho, V.P., Ruas, E.A., Sera, T., 2003. Genetic relationship in Coffea species and parentage determination of interspecific hybrids using ISSR (inter-simple sequence repeat) markers. Genet. Mol. Biol. 26, 319–327.

Rudd, S., 2003. Expressed sequence tags: alternative or complement to whole genome sequences? Trends Plant Sci. 8, 321–329.

Ruiz, M., Rouard, M., Raboin, L.M., Lartaud, M., Lagoda, P., Courtois, B., 2004. TropGENE-DB, a multi-tropical crop information system. Nucleic Acids Res. 32, D364–D367.

Safar, J., Bartos, J., Janda, J., Bellec, A., Kubalakova, M., Valarik, M., et al., 2004. Dissecting large and complex genomes: flow sorting and BAC cloning of individual chromosomes from bread wheat. Plant J. 39, 960–968.

Salmona, J., Dussert, S., Descroix, F., de Kochko, A., Bertrand, B., Joet, T., 2008. Deciphering transcriptional networks that govern Coffea arabica seed development using combined cDNA array and real-time RT-PCR approaches. Plant Mol. Biol. 66, 105–124.

Sera, T., 2001. Coffee Genetic Breeding at IAPAR. Crop Breed. Appl. Biotechnol. 1, 179–199.

Shendure, J., Ji, H.L., 2008. Next-generation DNA sequencing. Nat. Biotechnol. 26, 1135–1145.

Shultz, J.L., Samreen Kaz, S., Bashir, B., Afzal, J.A., Lightfoot, D.A., 2007. The development of BAC-end sequence-based microsatellite markers and placement in the physical and genetic maps of soybean. Theor. Appl. Genet. 114, 1081–1090.

Silveira, S.R., Ruas, P.M., Ruas, C.D.F., Sera, T., Carvalho, V.D.P., Coelho, A.S.G., 2003. Assessment of genetic variability within and among coffee progenies and cultivars using RAPD markers. Genet. Mol. Biol. 26, 329–336.

Silvestrini, M., Junqueira, M.G., Favarin, A.C., Guerreiro, O., Maluf, M.P., Silvarolla, M.B., et al., 2007. Genetic diversity and structure of Ethiopian, Yemen and Brazilian Coffea arabica L. accessions using microsatellites markers. Genet. Resour. Crop Evol. 54, 1367–1379.

Simkin, A.J., Moreau, H., Kuntz, M., Pagny, G., Lin, C.W., Tanksley, S., et al., 2008. An investigation of carotenoid biosynthesis in Coffea canephora and Coffea arabica. J. Plant Physiol. 165, 1087–1106.

Steiger, D.L., Nagai, C., Moore, P.H., Morden, C.W., Osgood, R.V., Ming, R., 2002. AFLP analysis of genetic diversity within and among Coffea arabica cultivars. Theor. Appl. Genet. 105, 209–215.

Sybenga, J., 1960. Genetics and Cytology of Coffee. Bibliographia Genetica, Wageningen.

Tang, J., Vosman, B., Voorrips, R.E., Van Der Linden, C.G., Leunissen, J.A., 2006. QualitySNP: a pipeline for detecting single nucleotide polymorphisms and insertions/deletions in EST data from diploid and polyploid species. BMC Bioinformatics 7, 438.

Tshilenge, P., Nkongolo, K.K., Mehes, M., Kalonji, A., 2009. Genetic variation in Coffea canephora L. (Var. Robusta) accessions from the founder gene pool evaluated with ISSR and RAPD. Afr. J. Biotechnol. 8, 380–390.

Uefuji, H., Ogita, S., Yamaguchi, Y., Koizumi, N., Sano, H., 2003. Molecular cloning and functional characterization of three distinct N-methyltransferases involved in the caffeine biosynthetic pathway in coffee plants. Plant Physiol. 132, 372–380.

Uefuji, H., Tatsumi, Y., Morimoto, M., Kaothien-Nakayama, P., Ogita, S., Sano, H., 2005. Caffeine production in tobacco plants by simultaneous expression of three coffee N-methyltransferases and its potential as a pest repellant. Plant Mol. Biol. 59, 221–227.

Van Der Hoeven, R., Ronning, C., Giovannoni, J., Martin, G., Tanksley, S., 2002. Deductions about the number, organization, and evolution of genes in the tomato genome based on analysis of a large expressed sequence tag collection and selective genomic sequencing. Plant Cell 14, 1441–1456.

Vandepoele, K., Van De Peer, Y., 2005. Exploring the plant transcriptome through phylogenetic profiling. Plant Physiol. 137, 31–42.

Vieira, L.G.E., Andrade, A.C., Colombo, C.A., Araujo, A.H., Metha, A., Oliveira, A.C., et al., 2006. Brazilian coffee genome project: an EST-based genomic resource. Braz. J. Plant Physiol. 18, 95–108.

Voilley, A., Sauvageot, F., Durand, D., 1977. Influence sur l’amertume d’un café boisson de quelques paramètres d’extraction. In: ASIC (Ed.), Huitième Colloque de l’ASIC. Abidjan, Côte d’Ivoire.

Vos, P., Hogers, R., Bleeker, M., Reijans, M., Van De Lee, T., Hornes, M., et al., 1995. AFLP: a new technique for DNA fingerprinting. Nucleic Acids Res. 23, 4407–4414.

Wang, G.L., Holsten, T.E., Song, W.Y., Wang, H.P., Ronald, P.C., 1995. Construction of a rice bacterial artificial chromosome library and identification of clones linked to the Xa-21 disease resistance locus. Plant J. 7, 525–533.

Winston, E.C., Hoult, M., Howitt, C.J., Shepherd, R.K., 1992. Ethylene-induced fruit ripening in arabica coffee (Coffea arabica L.). Aust. J. Exp. Agric. 32, 401–408.

Wu, F., Mueller, L.A., Crouzillat, D., Petiard, V., Tanksley, S.D., 2006. Combining bioinformatics and phylogenetics to identify large sets of single-copy orthologous genes (COSII) for comparative, evolutionary and systematic studies: a test case in the euasterid plant clade. Genetics 174, 1407–1420.

Zeltz, P., Schneider, S., Volkmann, J., Willmund, R., 2005. Traceability of wild-growing coffee from Ethiopian rainforest by genetic fingerprinting. Dtsch. Lebensmitt. Rundschau 101, 89–92.

Zhu, A., Goldstein, J., 1994. Cloning and functional expression of a cDNA encoding coffee bean alpha-galactosidase. Gene 140, 227–231.