Bacterial origin of a diverse family of UDP-glycosyltransferase genes in the Tetranychus urticae...

Accepted Manuscript

Bacterial origin of a diverse family of UDP-glycosyltransferase genes in the Tetranychus urticae genome

Seung-Joon Ahn, Wannes Dermauw, Nicky Wybouw, David G. Heckel, Thomas Van Leeuwen

PII: S0965-1748(14)00067-8
DOI: 10.1016/j.ibmb.2014.04.003
Reference: IB 2573
To appear in: Insect Biochemistry and Molecular Biology
Received Date: 10 February 2014
Revised Date: 28 March 2014
Accepted Date: 1 April 2014

Please cite this article as: Ahn, S.-J., Dermauw, W., Wybouw, N., Heckel, D.G., Van Leeuwen, T., Bacterial origin of a diverse family of UDP-glycosyltransferase genes in the Tetranychus urticae genome, Insect Biochemistry and Molecular Biology (2014), doi: 10.1016/j.ibmb.2014.04.003.

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


[Graphical abstract: cladogram of the Arthropoda showing Chelicerata (Arachnida: Acari (Acariformes, Parasitiformes), Araneae, Scorpiones) and Mandibulata (Myriapoda, Hexapoda, Crustacea). Labels: animal-type UGTs; UGT lost; no UGT; bacteria; UGT gained (horizontal gene transfer); bacterial-type UGTs; two-spotted spider mite's UGT gene family.]

[Figure: phylogenetic tree of the T. urticae UGTs. Each tip is labeled with a gene ID and UGT name (for example tetur01g07060-UGT203A1), clades are labeled UGT201 to UGT207, support values are shown on branches, and the scale bar is 0.2. Some tip labels carry an asterisk or a trailing 'p' (for example tetur04g07710*-UGT201B4p).]


Bacterial origin of a diverse family of UDP-glycosyltransferase genes in the Tetranychus urticae genome

Seung-Joon Ahn (1, 2, *), Wannes Dermauw (3), Nicky Wybouw (3), David G. Heckel (1), Thomas Van Leeuwen (3, 4)

1 Department of Entomology, Max Planck Institute for Chemical Ecology, 07745 Jena, Germany
2 National Institute of Horticultural and Herbal Science, Rural Development Administration, 441-440 Suwon, Korea
3 Department of Crop Protection, Faculty of Bioscience Engineering, Ghent University, B-9000 Ghent, Belgium
4 Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, 1098 XH Amsterdam, The Netherlands

* Corresponding author: Seung-Joon Ahn (S.-J. Ahn)
Department of Entomology, Max Planck Institute for Chemical Ecology, 07745 Jena, Germany
Tel.: +49 3641 571555; fax: +49 3641 571502. E-mail address: [email protected]


Abstract

UDP-glycosyltransferases (UGTs) catalyze the conjugation of a variety of small lipophilic molecules with uridine diphosphate (UDP) sugars, altering them into more water-soluble metabolites. Thereby, UGTs play an important role in the detoxification of xenobiotics and in the regulation of endobiotics. Recently, the genome sequence was reported for the two-spotted spider mite, Tetranychus urticae, a polyphagous herbivore damaging a number of agricultural crops. Although various gene families implicated in xenobiotic metabolism have been documented in T. urticae, UGTs so far have not. We identified 80 UGT genes in the T. urticae genome, the largest number of UGT genes in a metazoan species reported so far. Phylogenetic analysis revealed that lineage-specific gene expansions increased the diversity of the T. urticae UGT repertoire. Genomic distribution, intron-exon structure and structural motifs in the T. urticae UGTs were also described. In addition, expression profiling after host-plant shifts and in acaricide-resistant lines supported an important role for UGT genes in xenobiotic metabolism. Expanded searches of UGTs in other arachnid species (Subphylum Chelicerata), including a spider, a scorpion, two ticks and two predatory mites, unexpectedly revealed the complete absence of UGT genes. However, a centipede (Subphylum Myriapoda) and a water flea and a crayfish (Subphylum Crustacea) contain UGT genes in their genomes similar to insect UGTs, suggesting that the UGT gene family might have been lost early in the Chelicerata lineage and subsequently re-gained in the tetranychid mites. Sequence similarity of T. urticae UGTs and bacterial UGTs and their phylogenetic reconstruction suggest that spider mites acquired UGT genes from bacteria by horizontal gene transfer. Our findings show a unique evolutionary history of the T. urticae UGT gene family among other arthropods and provide important clues to its functions in relation to detoxification and thereby host adaptation.

Keywords: Tetranychus urticae; UDP-glycosyltransferase; Detoxification; Horizontal gene transfer; Arthropoda; Chelicerata


Abbreviations: aaID, amino acid identity; CDS, coding sequences; EGT, ecdysteroid UDP-glycosyltransferase; GT, glycosyltransferase; HGT, horizontal gene transfer; TM, transmembrane domain; TSA, transcriptome shotgun assembly; UDP, uridine diphosphate; UGT, UDP-glycosyltransferases; WGS, whole-genome shotgun contigs


1. Introduction

Glycosyltransferases (GTs) (EC 2.4.x.y) are ubiquitous across all kingdoms of life and catalyze the transfer of sugar moieties from activated donor molecules to a variety of acceptor molecules, such as carbohydrates, proteins, lipids, nucleic acids, antibiotics and other small molecules (Lairson et al., 2008). As of March 2014, 95 families of GTs have been identified (GT1-GT95) and classified hierarchically according to the stereochemistry of the substrates and reaction products (http://www.cazy.org) (Lombard et al., 2013). Among these, GT1, often referred to as UDP-glycosyltransferases (UGTs), is the largest family containing the majority of GT genes. In Arabidopsis thaliana, Caenorhabditis elegans and Drosophila melanogaster, they account for more than 25, 29 and 24% of the total documented GT genes, respectively (Yonekura-Sakakibara and Hanada, 2011).

UGTs are a gene family of GT1 enzymes that catalyze the conjugation of a variety of small lipophilic molecules with uridine diphosphate (UDP) sugars, increasing their solubility in water. Glycosylation by UGTs therefore plays an important role not only in the detoxification of xenobiotics, but also in the biosynthesis, storage and transport of secondary metabolites. The protein structure is commonly divided into two main parts: the N-terminal domain, which binds the aglycone substrate, and the C-terminal domain, which binds the UDP-sugar donor (Meech et al., 2012).

UGTs are common in all living organisms including viruses, bacteria, plants and animals. Most baculovirus genomes encode the enzyme ecdysteroid UDP-glycosyltransferase (EGT), which regulates the development of the host insect by glycosylating and inactivating ecdysteroid hormones (Hughes, 2013; O'Reilly, 1995). The plant endornaviruses also contain UGTs in their genomes (Hacker et al., 2005; Song et al., 2013). Bacterial UGTs are involved in the glycosylation of various natural products including antibiotics, and their engineering has been encouraged for pharmacological and industrial applications for many years (Erb et al., 2009; Luzhetskyy and Bechthold, 2008). In vertebrates, UGTs are regarded as major phase II drug-metabolizing enzymes, conjugating a large number of xenobiotics and endobiotics, including many drugs, with UDP-glucuronic acid as a sugar donor (Bock, 2003). Vertebrate UGTs contain an N-terminal signal peptide that is removed following insertion of the proteins into the endoplasmic reticulum (ER), and a C-terminal transmembrane (TM) domain that anchors the protein to the ER membrane with catalytic sites facing the lumen and a tail exposed to the cytosol (Magdalou et al., 2010). In plants, a variety of UGTs play an important role in the biosynthesis and modification of secondary metabolites, thereby enhancing their solubility and stability, and determining their bioactivity. Plant UGTs lack a signal peptide and a TM domain and are thus localized in the cytosol (Bowles et al., 2005). In insects, the significance of glycosylation of small hydrophobic compounds has been overlooked for many years, as it was often regarded as a minor mechanism of enzymatic detoxification compared to others such as cytochrome P450 monooxygenases (P450s), glutathione-S-transferases (GSTs) and carboxyl/cholinesterases (CCEs) (Brattsten, 1988; Després et al., 2007; Smith, 1962). However, recent biochemical and functional studies revealed that insect UGTs are responsible for the detoxification and sequestration of a variety of plant allelochemicals and insecticides (Ahn et al., 2011; Daimon et al., 2010; Kojima et al., 2010; Lee et al., 2006; Sasai et al., 2009). Recent genome sequencing identified a large collection (>300 genes) of insect UGTs, revealing diverse features of this gene family such as lineage-specific gene diversification between insect orders and a family (UGT50) that is conserved across holometabolous insects (Ahn et al., 2012). However, the UGT family has so far not been studied in arthropods other than insects.

The two-spotted spider mite, Tetranychus urticae (Subphylum Chelicerata, Order Trombidiformes), is one of the most polyphagous herbivores known, and has been documented to feed on more than 1,100 plant species that belong to more than 140 different plant families, including many plants that produce toxic compounds (Jeppson et al., 1975; Migeon and Dorkeld, 2013). In addition, spider mites are major agricultural pests and are the 'resistance champion' among arthropods, as they have the most documented instances of resistance to diverse pesticides (Van Leeuwen et al., 2010). The molecular mechanisms underlying the spider mite's resistance to xenobiotics (pesticides and plant secondary metabolites) are, however, less well understood than those of insects (Van Leeuwen et al., 2010; Yang et al., 2002).

Recently, a draft genome of T. urticae was reported, the first published genome sequence of a chelicerate (Grbić et al., 2011). The availability of the genome sequence has provided a unique opportunity to study the role of gene families involved in xenobiotic metabolism in the spider mite (Dermauw et al., 2013b; Van Leeuwen et al., 2012a; Van Leeuwen et al., 2012b). Characterization of gene families associated with detoxification of xenobiotics is the first step towards a better understanding of how the spider mite copes with noxious compounds (Van Leeuwen et al., 2012b). So far, P450s, GSTs, CCEs, and ATP-binding cassette (ABC) transporters have been studied from a genome-wide perspective (Dermauw et al., 2013a; Grbić et al., 2011), and the importance of these gene families was documented in both insecticide resistance and adaptation to novel hosts (Dermauw et al., 2013b). However, the UGT gene family has not been studied so far, in spite of its potential importance in the biology of the spider mite.

In this report, we provide a comprehensive analysis of the UGT gene family in T. urticae, which is the first genome-wide characterization among non-insect arthropods. All UGT sequences were annotated in the T. urticae genome and classified according to the current nomenclature system (Mackenzie et al., 2005). Phylogenetic analysis with closely and distantly related organisms revealed that the spider mite UGTs are closely related to bacterial sequences, suggesting horizontal gene transfer. Amino acid sequence alignment and structure prediction further support the bacterial origin. The gene searches were expanded to a wide range of arthropod species to gain an overall insight into the evolution of this gene family. Transcriptome analyses provided a wealth of information on gene expression profiles related to host plant challenge or pesticide resistance status. This study provides not only a baseline that will facilitate functional studies on the roles of T. urticae UGTs in metabolism, detoxification, resistance and host plant adaptation, but also an evolutionary perspective on this gene family in an arthropod-wide context.


2. Materials and Methods

2.1. Identification of UGT genes in the genome of T. urticae

UGT amino acid sequences from insects were used as queries in tBLASTn searches (Altschul et al., 1997) against the T. urticae genome sequence assembly available at the ORCAE genome portal, http://bioinformatics.psb.ugent.be/orcae/overview/Tetur. All hits with an E-value below 10^-1 were extracted for analysis, and the gene models and EST sequences identified were aligned with genomic scaffolds (London strain) to annotate complete gene structures using Sequencher (Gene Codes Corporation, MI, USA). In most cases the predicted gene models yielded full-length sequences, but some incomplete or split gene models had to be annotated manually. UGT sequences identified in this study were deposited in GenBank and their accession numbers can be found in Table S5.
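The hit-filtering step can be illustrated with a short sketch. It assumes, hypothetically, that the tBLASTn results were exported in the 12-column tabular layout of BLAST+ (`-outfmt 6`), in which the eleventh column holds the E-value. The study itself ran its searches through the ORCAE portal, and the function name and example rows below are invented for illustration only.

```python
def filter_hits(tabular_lines, max_evalue=0.1):
    """Keep hits whose E-value is below max_evalue (10^-1 in the text).

    Assumes the 12-column BLAST+ tabular layout: column 1 is the query ID,
    column 2 the subject ID, and column 11 the E-value.
    """
    kept = []
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        evalue = float(fields[10])
        if evalue < max_evalue:
            kept.append((fields[0], fields[1], evalue))
    return kept


# Hypothetical rows, not real search results.
example_rows = [
    "insect_UGT_query\tscaffold_A\t34.2\t410\t250\t9\t12\t420\t1500\t2720\t3e-45\t170",
    "insect_UGT_query\tscaffold_B\t25.0\t60\t40\t2\t300\t359\t880\t1059\t2.5\t28.1",
]
hits = filter_hits(example_rows)
```

A permissive threshold such as this one retains weak hits, which is why each retained hit is then checked manually against the gene models.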

2.2. UGT genes in other species

The T. urticae UGTs identified in this study and the insect UGTs available from NCBI were used as queries to perform tBLASTn searches against the arthropod and bacterial genome databases, including the whole-genome shotgun contigs (WGS) as well as transcriptome shotgun assemblies (TSA) in NCBI, restricted to the following subphyla of arthropods: Chelicerata (mites and ticks, scorpions, spiders), Myriapoda (centipede) and Crustacea (water flea, crayfish). Repeated BLAST searches were also conducted in other databases, such as VectorBase (https://www.vectorbase.org; Megy et al., 2012), BCM-HGSC (https://www.hgsc.bcm.edu/arthropods/i5k-pilot), and Ensembl Metazoa (http://metazoa.ensembl.org/index.html). In addition, a transcriptome (SRA) data set of Panonychus citri (Family Tetranychidae, the citrus red mite), deposited in EMBL-EBI (http://www.ebi.ac.uk/ena/data/view/ERP000885) (Liu et al., 2011), was assembled with CLC Genomics Workbench (CLC Bio, Qiagen, Denmark) to make it 'BLASTable' for orthologous UGT searches. Accession numbers of UGT protein sequences used in this study can be found in Table S5.


2.3. Nomenclature

According to the current UGT nomenclature guidelines (Mackenzie et al., 2005), the UGT genes were named using the following elements: the gene symbol UGT, a family number, a subfamily letter, and an individual gene number. Families are defined as sharing 40% or more amino acid sequence identity (aaID) and subfamilies as sharing 60% aaID or more. Multiple sequence alignment was performed with ClustalW and adjusted manually. Preliminary grouping was done using the program CD-HIT (Li and Godzik, 2006) with 60% and 40% sequence identity as cut-off values, and preliminary family and subfamily names were assigned on this basis. A maximum likelihood tree was constructed from the sequence alignment using MEGA 5.2 (Tamura et al., 2011) and plotted using the preliminary names. Groups were examined for consistency, and groups on the borderline of 40% or 60% were examined using pairwise p-distances calculated by MEGA 5.2. In a few cases, the family criterion of 40% was difficult to apply because some pairwise comparisons were 41-42% while others were 38-39%, and the family criterion was relaxed to 37-39% if doing so created a coherent group after maximum likelihood phylogenetic analysis. Preliminary names were re-assigned and the entire process was repeated. Partial sequences were examined to ensure that they were not incorrectly grouped. The nomenclature reported here was approved by the UGT Nomenclature Committee and is recorded on the Committee's website (http://www.flinders.edu.au/medicine/sites/clinicalpharmacology/ugt-homepage.cfm).
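The 40% and 60% identity criteria can be sketched as a small classifier. This is an illustrative sketch only: the function names are hypothetical, columns containing a gap in either sequence are skipped (an assumption, comparable to pairwise deletion), and the case-by-case relaxation of the family criterion to 37-39% described above is not modeled. Percent identity computed this way corresponds to 100 × (1 − p-distance).

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # assumption: gapped columns are ignored
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def relationship(aa_id, family_cutoff=40.0, subfamily_cutoff=60.0):
    """Apply the nomenclature thresholds to one pairwise identity value."""
    if aa_id >= subfamily_cutoff:
        return "same subfamily"
    if aa_id >= family_cutoff:
        return "same family"
    return "different family"


# Toy aligned fragments, not real UGT sequences.
toy_identity = percent_identity("ACDE-", "ACDF-")
```

In practice a single pairwise value is not decisive; as the text notes, borderline groups were resolved by checking coherence on the maximum likelihood tree.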

2.4. Phylogenetic analysis

Deduced amino acid sequences were aligned using MUSCLE (Edgar, 2004). Model selection was done with ProtTest 2.4 (Abascal et al., 2005). According to the Akaike information criterion, the models LG+I+G+F, LG+I+G+F, and LG+G+F were optimal for phylogenetic analysis of T. urticae (Fig. 2), Tetranychidae (Fig. S1), and arthropod and bacterial UGTs (Fig. 5), respectively. Maximum-likelihood analyses were performed using Treefinder (Jobb et al., 2004) with edge support calculated from 500 pseudoreplicates (LR-ELW). Resulting phylogenetic trees were visualized using MEGA 5.2 (Tamura et al., 2011) and further edited with Adobe Illustrator CS2 (Adobe Systems, USA).

2.5. Primary structure prediction

Multiple alignments of deduced protein sequences were performed with ClustalW, and structural domains such as the UGT signature motif were detected by comparison with other sequences whose primary structures have been characterized. Graphical logos for the signature motif of different groups were generated using the WebLogo application available at http://weblogo.berkeley.edu (Crooks et al., 2004). An N-terminal signal peptide and a C-terminal transmembrane domain were searched for with SignalP 4.1 (Petersen et al., 2011) and TMHMM 2.0 (http://www.cbs.dtu.dk/services/TMHMM), respectively.

2.6. Genomic distribution of T. urticae UGTs

Genomic scaffolds harboring the UGT genes were collected from 640 scaffolds and the genes were mapped on each scaffold, describing gene orientation and relative position. The genomic location for the UGTs identified was resolved by the alignment of their coding sequences (CDS) with scaffold sequences.

2.7. Intron mapping

Intron positions of the T. urticae UGTs were identified by aligning each CDS to its corresponding genomic scaffold. Splice site phases are defined as follows: a phase 0 splice site lies between two codons, a phase 1 site lies one base inside a codon in the 3' direction, and a phase 2 site lies two bases inside a codon in the 3' direction.
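The phase definition above reduces to the number of coding bases preceding the intron, modulo 3. A minimal sketch, assuming the coding lengths of the exons are already known from the CDS-to-scaffold alignment (the function name and the example lengths are hypothetical):

```python
def intron_phases(cds_exon_lengths):
    """Return the phase (0, 1 or 2) of each intron in a gene.

    cds_exon_lengths lists the coding length of each exon in 5'-to-3'
    order. The intron after exon i has phase equal to the cumulative
    coding length up to and including exon i, modulo 3.
    """
    phases = []
    cumulative = 0
    for length in cds_exon_lengths[:-1]:  # no intron follows the last exon
        cumulative += length
        phases.append(cumulative % 3)
    return phases


# Hypothetical three-exon gene: 100, 50 and 30 coding bases.
example_phases = intron_phases([100, 50, 30])
```

Here the first intron follows 100 coding bases (33 full codons plus one base, so phase 1) and the second follows 150 coding bases (phase 0).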

2.8. Expression profiling of UGT genes

Expression of T. urticae UGT genes in relation to acaricide resistance, host plant transfer and diapause induction was assessed using previously published dual-color whole-genome gene expression microarrays (Bryon et al., 2013; Dermauw et al., 2013b; Zhurov et al., 2013). Prior to analysis, probe sequences were remapped to the latest annotations of the T. urticae UGT genes using Bowtie2-2.1.0 (Langmead et al., 2009) with default parameters. Seventy-seven UGTs were present on both T. urticae array designs. Limma (Smyth, 2005) was used for final analysis of the dual-color data. Per array design, background intensities were corrected ('normexp' method with an offset of 50) (Ritchie et al., 2007), followed by within- and between-array normalization ('loess' and 'Aquantile' methods, respectively). Intraspot correlations were implemented for the linear modeling of the data with the 033850 array design (Smyth and Altman, 2013). In the design on which the linear models were fitted, all data were compared to one common reference: the T. urticae London strain on bean at standard laboratory conditions (25°C, 60% RH). Significant differential expression was assessed by an empirical Bayes approach with cut-offs of 0.05 for the Benjamini-Hochberg corrected p-values and 1 for log2FC. The RNA-Seq dataset consisted of replicated RNA-Seq libraries of spider mites (larvae) feeding on different host plants (Arabidopsis, tomato and bean) for 12 hr and a single RNA-Seq library for each developmental stage of spider mites (embryo, larvae, nymph and adult). Experimental details can be found in Grbić et al. (2011) and the RNA-Seq data are available via Gene Expression Omnibus under reference GSE32342. To ensure the best possible alignment of RNA-Seq reads to our manually annotated UGT gene models, we re-mapped the RNA-Seq reads to the spider mite genome as previously described (Dermauw et al., 2013a). Expression quantification was performed as described in Grbić et al. (2011).
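The two significance cut-offs can be sketched in plain Python. This is not the limma workflow used in the study: it only shows the Benjamini-Hochberg adjustment and the combined filter, and it assumes that the log2FC cut-off applies to the absolute value and is inclusive. All p-values and fold changes below are made-up numbers for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def differentially_expressed(pvalues, log2fc, p_cut=0.05, fc_cut=1.0):
    """Indices passing both cut-offs (adjusted p < p_cut, |log2FC| >= fc_cut)."""
    adjusted = benjamini_hochberg(pvalues)
    return [
        i for i, (p, fc) in enumerate(zip(adjusted, log2fc))
        if p < p_cut and abs(fc) >= fc_cut
    ]


# Made-up values for four hypothetical probes.
raw_p = [0.01, 0.04, 0.03, 0.5]
fold = [2.0, 3.0, -1.5, 0.2]
adj_p = benjamini_hochberg(raw_p)
selected = differentially_expressed(raw_p, fold)
```

Requiring both cut-offs means that a probe with a large fold change is still rejected if its adjusted p-value is at or above 0.05, as happens for the second and third probes in this toy input.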


3. Results and discussion

3.1. Identification and phylogenetic analysis of the T. urticae UGTs

The T. urticae genome contains a total of 80 putative UGT genes including five pseudogenes (Table 1). This is the largest UGT repertoire found in any animal genome sequenced so far, including several insects (Ahn et al., 2012), vertebrates (Bock, 2003; Huang and Wu, 2010) and other non-insect arthropods (this study; Fig. 1).

3.1.1. Nomenclature of T. urticae UGT genes. The UGT Nomenclature Committee has assigned systematic names to UGT families: families UGT1-50 are for animals; UGT51-70 for yeasts and fungi; UGT71-100 for plants; and UGT101-200 for bacteria (Mackenzie et al., 1997, 2005; Ross et al., 2001). According to the list of UGT sequences posted by the nomenclature committee (https://www.flinders.edu.au/medicine/sites/clinical-pharmacology/ugt-homepage.cfm), the animal UGTs are further divided as follows: families UGT1-8 are used for mammals; UGT9-27 for a nematode (Caenorhabditis elegans); UGT31-50 for insects or insect viruses. All of the T. urticae UGTs identified in this study were assigned to seven new families, UGT201 to 207, as approved by the Committee. Since they are different from any animal UGTs, including those of insects, but closer to a group of bacterial UGTs (UGT108s) (average aaID = 30%, Table S1), we further considered bacterial genes for comparisons. Among many other bacterial UGTs deposited in NCBI, 18 UGTs were identified that clustered with the T. urticae UGTs, but had not yet received official names. On the Committee homepage, there are 21 bacterial UGTs that had already been given official names in families UGT101-107 (http://www.flinders.edu.au/medicine/sites/clinical-pharmacology/ugt-homepage.cfm). None of them, however, clustered with the 17 unnamed bacterial UGTs closely related to the T. urticae UGTs. Thus, these bacterial UGTs were grouped into a new family named UGT108.
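The family-number ranges listed above can be written as a simple lookup, which makes it easy to see why UGT108 counts as bacterial while UGT201-207 fall outside every previously assigned range. This is an illustrative sketch: the function and table names are hypothetical, and families 28-30 are labeled only as "animals" because the text does not assign them to a narrower group.

```python
import re

# (first family, last family, group), taken from the ranges quoted in the text.
FAMILY_RANGES = [
    (1, 8, "animals: mammals"),
    (9, 27, "animals: nematode (Caenorhabditis elegans)"),
    (28, 30, "animals"),
    (31, 50, "animals: insects or insect viruses"),
    (51, 70, "yeasts and fungi"),
    (71, 100, "plants"),
    (101, 200, "bacteria"),
    (201, 207, "Tetranychus urticae (new families, this study)"),
]


def family_group(ugt_name):
    """Map a UGT name such as 'UGT203A1' to the group of its family number."""
    match = re.match(r"UGT(\d+)", ugt_name)
    if match is None:
        raise ValueError(f"not a UGT name: {ugt_name!r}")
    family = int(match.group(1))
    for first, last, group in FAMILY_RANGES:
        if first <= family <= last:
            return group
    return "unassigned"


example_groups = [family_group(n) for n in ("UGT50", "UGT108", "UGT203A1")]
```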


3.1.2. Phylogenetic analysis. A phylogenetic analysis supports the classification of the T. urticae UGTs into the seven distinct families, UGT201-UGT207 (Fig. 2). UGT201, the largest family, is composed of 36 UGTs, further divided into 8 subfamilies according to the 60% aaID rule from the nomenclature guidelines (Mackenzie et al., 2005). This gene family is not only the largest, but also the most diversified one, especially with the recently expanded subfamilies UGT201A (8 genes) and UGT201B (14 genes) (Fig. 2, Table 1). Three out of the five pseudogenes identified among the T. urticae UGTs are found in these two subfamilies (UGT201A3p, UGT201A7p, UGT201B4p), disrupted by transposable elements (TEs) or repetitive sequences. Subfamilies UGT201C, UGT201D, UGT201F and UGT201G are composed of 3, 2, 3, and 3 duplicated genes, respectively, whereas UGT201E and UGT201H are single-gene subfamilies. In addition, there are three partial sequences (UGT201A2v1, UGT201A2v2 and UGT201A8) that could not be completely annotated due to genomic gaps on the scaffolds. As UGT201A2v1 and UGT201A2v2 are currently located within different scaffolds, we decided to regard them as separate, but variant, sequences until the two scaffolds are combined. The other partial sequence (UGT201A8) lacks only 22 nucleotides at the 3' end of the coding sequence (the C-terminal end of the protein) due to a short gap on the scaffold. Furthermore, UGT201F2 contains a nonsense mutation resulting in premature termination of translation in the middle of the gene, and hence in production of a truncated protein of 261 aa. However, in the genome of the Montpellier strain, another T. urticae strain that was resequenced (Grbić et al., 2011), the CDS is restored by a GAA codon (coding glutamate) instead of TAA, suggesting that the two forms are variants among different populations rather than that the gene is a pseudogene. The truncated protein lacks the predicted UDP-sugar binding domain at the C-terminus, but whether it can bind the aglycone substrate is unknown.

UGT202, the second largest family, consists of 17 UGT genes classified into two subfamilies, UGT202A and UGT202B. Recent lineage-specific gene expansion appears to have occurred in the UGT202A subfamily, which has diversified into 16 closely related sequences including two pseudogenes, UGT202A13p and UGT202A14p. The former is interrupted by several mutations causing frame shifts, whereas the latter is disrupted by a 405 bp inserted sequence in the middle of exon 2. On the other hand, UGT202A6 contains a nonsense mutation resulting in a premature stop codon (TAA) in the middle of exon 2 (producing a truncated protein of 271 aa). In the Montpellier strain, however, the intervening termination codon is replaced by a TCA codon (coding serine), which putatively leads to an intact protein sequence, as in the case of UGT201F2 above. A non-synonymous mutation site is also found, suggesting that the two forms are variants derived from different populations. UGT202B is composed of a single gene, UGT202B1, distantly related to the others in this family.
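The effect of the strain-specific codons described for UGT201F2 (TAA versus GAA) and UGT202A6 (TAA versus TCA) can be illustrated by translating a toy coding sequence with the standard genetic code. The sequences below are invented and far shorter than the real genes; only the three codons at the variable site are taken from the text.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate a CDS and stop at the first stop codon, if any."""
    protein = []
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    for pos in range(0, usable, 3):
        amino_acid = CODON_TABLE[cds[pos:pos + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Toy sequences differing only at the third codon.
truncated = translate("ATGAAATAAGGT")      # premature stop codon (TAA)
restored_glu = translate("ATGAAAGAAGGT")   # GAA, glutamate
restored_ser = translate("ATGAAATCAGGT")   # TCA, serine
```

A single-base difference at the stop codon is enough to switch between a truncated and a full-length product, which is consistent with treating these genes as population variants rather than pseudogenes.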

The UGT203, UGT204 and UGT205 families are composed of 11, 8, and 6 genes, respectively. UGT203 is diversified into 7 subfamilies, whereas UGT204 and UGT205 each consist of 3 subfamilies. UGT204C1, which is assigned to UGT204 with a low bootstrap value (<60), is the most divergent member of this family, suggesting that it may possess a unique function.

Finally, UGT206 and UGT207 are single-gene families that are positioned on separate branches in the phylogenetic tree, suggesting they have followed evolutionarily independent paths without recent diversification. Although they are given different family names according to the nomenclature rule (i.e. <40% aaID), they seem to share a common ancestor with the UGT203 subfamilies, as shown in the phylogenetic analysis (Fig. 2).

3.2. Characterization of sequence structures

3.2.1. Signature sequence. The UGT signature motif is a hallmark of the UGT superfamily in all kingdoms of life and is thought to be involved in binding the UDP moiety of the sugar donor (Mackenzie et al., 1997). Multiple alignments of the amino acid sequences of all the T. urticae UGTs revealed that the common UGT signature sequence is located in the C-terminal domain (Fig. S2(B)), where highly conserved amino acids are found at positions 6 (Q), 13-14 (VD), 17-23 (ITHGGNN), 27 (E), 32-34 (GKP), 36-37 (IV), 39 (P) and 43-44 (DQ) (Fig. S2(A)). The signature motif, composed of 44 amino acid residues, is regarded as donor binding domain 1 (DBD1) according to the human UGT model (Miley et al., 2007), in which specific amino acid interactions with UDP-glucuronic acid (a sugar donor in mammalian UGT systems) have been identified. For example, 27 (E) forms a hydrogen bond with an oxygen of ribose; 18 (T) and 19 (H) interact with phosphates; 43 (D) and 44 (Q) form hydrogen bonds with oxygens of the sugar moiety (Radominska-Pandya et al., 2010). The consensus residues described in the plant UGT model are likewise found in the signature motif of T. urticae UGTs (Caputi et al., 2012).

However, in spite of such a high degree of conservation in the signature sequence, there seems to be some 279

variation among different animal taxa. Comparison of the graphical alignments restricted to the 44 amino 280

acids from bacterial UGT108s, insects, the crustacean Daphnia pulex, the centipede Strigamia maritime 281

and T. urticae UGTs showed not only sequence conservation among different taxa, but also the presence 282

of taxon-specific residues in the signature motif (Fig. S3). The motif of T. urticae UGTs showed higher 283

similarity with that of bacteria UGT108s than other arthropods, representing bacteria-spider mite 284

consensus residues particularly at 6 (Q), 13 (V), 14 (D), 16 (V), 22 (N), 23 (N), 27 (E) and 33 (K), 285

whereas insect UGTs are closer to water flea and centipede in this motif. 286
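
The conserved positions listed above lend themselves to a simple check. The sketch below encodes the residues and 1-based positions exactly as given in the text and reports where a 44-residue motif deviates from them. The helper names and the placeholder motif (conserved residues in place, "X" elsewhere) are hypothetical and serve only to exercise the code; no real UGT sequence is reproduced here.

```python
# Conserved residues of the 44-residue UGT signature motif, keyed by their
# 1-based start position, as listed in Section 3.2.1.
CONSERVED_RUNS = {6: "Q", 13: "VD", 17: "ITHGGNN", 27: "E",
                  32: "GKP", 36: "IV", 39: "P", 43: "DQ"}


def conserved_positions():
    """Expand the runs into a {position: residue} map (1-based positions)."""
    return {start + offset: residue
            for start, run in CONSERVED_RUNS.items()
            for offset, residue in enumerate(run)}


def motif_mismatches(motif):
    """Return (position, expected, found) for each deviating conserved site."""
    if len(motif) != 44:
        raise ValueError("signature motif must be 44 residues long")
    return [(pos, expected, motif[pos - 1])
            for pos, expected in sorted(conserved_positions().items())
            if motif[pos - 1] != expected]


def build_placeholder(fill="X"):
    """Hypothetical motif: conserved residues in place, `fill` elsewhere."""
    residues = [fill] * 44
    for pos, residue in conserved_positions().items():
        residues[pos - 1] = residue
    return "".join(residues)
```

Running `motif_mismatches` over the motif extracted from each aligned UGT would give a per-gene list of departures from the consensus, which is one way to tabulate taxon-specific residues.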

3.2.2. Absence of signal peptide and transmembrane domain in T. urticae UGTs. In animals, the N-terminal end of the UGT contains a signal peptide that mediates the integration of the protein precursor into the endoplasmic reticulum (ER) compartment. The signal peptide is subsequently cleaved and the protein is further N-glycosylated. The mature protein is retained in the ER membrane by its hydrophobic transmembrane (TM) domain at the C-terminal end, followed by a short cytoplasmic tail (Magdalou et al., 2010). However, no such signal peptide was detected in the T. urticae UGTs examined by SignalP 4.1 (Petersen et al., 2011), suggesting that the proteins are probably not oriented within the ER. In addition, the TM domain and the subsequent tail at the C-terminal end were also not found in the T. urticae UGTs, as predicted by TMHMM 2.0 (http://www.cbs.dtu.dk/services/TMHMM), except for UGT202A15 and UGT202A16. These latter two predicted TM regions were located not at the C-terminal end, but in the middle of the sequences: the predicted TM was found at AA positions 271 (or 272) to 293 (or 294) in UGT202A15 (or UGT202A16), which is unusual when compared to membrane-bound UGTs in animals. In addition, since the predicted TM cannot span the membrane unless the protein contains a signal peptide at its N-terminus, these two regions with hydrophobic properties similar to a TM helix are likely not physically inserted into the membrane. The absence of true TM domains indicates that the T. urticae UGTs are cytosolic enzymes like bacterial UGTs (Ross et al., 2001). These structural differences of T. urticae UGTs suggest that they have followed a different evolutionary pathway compared to the UGTs of other animals.

3.2.3. Intron-exon structure. Leaving the three partial sequences (UGT201A2v2, UGT201A3p and UGT201A7p) out of consideration, most of the T. urticae UGTs are composed of 2 exons, except for UGT202B1 with 3 exons and UGT201G3 with one (Table 1). The intron locations are strongly conserved across all of the 75 two-exon UGTs in position as well as in splicing phase. Each intron is inserted between [A/G/S] and [X]-[G]-[H]-[I/L/V/F]-[N/H/Q/L], very early in the N-terminal end. All of the splicing sites are phase 2, such that the intron lies two bases inside the codon of [A/G/S] in the 3' direction (Fig. 6). Such a conserved intron-exon organization of T. urticae UGTs strongly suggests recent divergence of the gene family after an intron gain in a common ancestor UGT gene. UGT202B1 has 2 introns, as confirmed by RNA-Seq and EST data (http://bioinformatics.psb.ugent.be/orcae/annotation/Tetur/current/tetur10g02090). The position of the first intron of UGT202B1 is identical to that of the single-intron T. urticae UGTs, while its second intron is located in the middle of exon 2, suggesting a more recent gain of the second intron. UGT201G3 is the sole intronless gene among the T. urticae UGTs. Compared to the other two genes in the same subfamily (UGT201G1 and UGT201G2), UGT201G3 lacks the consensus intron, which may have been secondarily lost. It is noteworthy that these three UGT201Gs are spread over three different genomic scaffolds (Fig. 4), indicating that they were not produced by tandem gene duplication.
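
The splicing phase referred to above follows from simple arithmetic: the phase of an intron is the number of coding bases upstream of it, modulo 3, so a phase 2 intron sits after the second base of a codon. The sketch below assumes a gene model given as a list of coding-exon lengths in transcript order. The function name and the exon lengths in the usage note are hypothetical and are not taken from Table 1.

```python
def intron_phases(exon_cds_lengths):
    """Phase (0, 1 or 2) of each intron, given coding lengths of the exons.

    An intron follows every exon except the last one; its phase is the
    cumulative number of coding bases upstream of it, modulo 3.
    """
    phases = []
    upstream = 0
    for length in exon_cds_lengths[:-1]:
        upstream += length
        phases.append(upstream % 3)
    return phases
```

For a hypothetical two-exon model with coding lengths of 59 and 1240 bp, `intron_phases([59, 1240])` returns `[2]`: 19 complete codons plus two bases of the twentieth codon lie upstream of the intron.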

3.3. Genomic distribution of T. urticae UGTs

The 80 T. urticae UGT genes were mapped onto 22 different genomic scaffolds, showing an uneven distribution of UGTs in the genome (Fig. 4), although a chromosome-wide assembly of the genome is not yet available for T. urticae (Grbić et al., 2011). About 60% of UGTs are arranged in tandem, with 31 UGTs concentrated in 6 different clusters containing more than 3 genes. For example, the largest cluster is found on scaffold 22, where 11 genes of the UGT202 family are positioned in the same orientation except one (UGT202A11), suggesting that several gene duplication events have occurred in this cluster (Fig. 4). The other clusters are also composed of tandemly duplicated genes. Two large clusters on scaffold 5 are composed of multiple genes belonging to the UGT204A and UGT201B subfamilies, respectively. Similarly, two other UGT201B clusters are found on scaffolds 4 and 7. In contrast, 34 UGTs are present as singletons distributed over 20 scaffolds. Each of the five pseudogenes is located on a different scaffold, and two of them (UGT201A3p and UGT201B4p) are found in different gene clusters, suggesting that they were probably generated by gene duplication in their own clusters and subsequently lost their functions. The intronless UGT gene (UGT201G3) and the two-intron UGT gene (UGT202B1) are located on different scaffolds devoid of other UGT gene clusters, reflecting the uniqueness of their gene structure (see above, Table 1). The features of this genomic distribution could become clearer when the small scaffolds are further assembled and a chromosome map is completed. The mapping of this large gene family on the T. urticae genomic scaffolds will be useful in assembling the genomes of closely related species in the future.

3.4. Expression profiling of T. urticae UGTs

We studied the expression profile of UGT genes across development in the T. urticae London reference strain, as well as in larvae feeding on a benign host (bean, Phaseolus vulgaris) and two more challenging hosts (Arabidopsis thaliana and tomato, Solanum lycopersicum), using existing RNA-Seq reads (Grbić et al., 2011). Using previously published microarray data (Bryon et al., 2013; Dermauw et al., 2013b; Zhurov et al., 2013), we further studied in more detail the expression profiles of UGT genes in adult mite host transplant experiments, in strains that are resistant to multiple pesticides (multi-resistant strains MR-VP and MAR-AB), and after diapause induction. For both expression approaches, we recalculated UGT gene expression based on the UGT gene models that were manually corrected and validated as part of this study (see Materials and Methods section for details).

3.4.1. UGT expression analysis using RNA-Seq. As assessed by RNA-Seq expression quantification, the majority (81%) of UGT genes were found to be expressed, i.e. 65 of the 80 T. urticae UGT sequences had an RPKM of >1 in at least one of the spider mite life stages or on one of the plant hosts (Fig. 7). In contrast, all 5 pseudogenes showed extremely low or no expression. Most full-length T. urticae UGT genes for which we did not detect expression belonged to either the UGT201 (7 genes), UGT202 (7) or UGT203 (1) families. Whether these T. urticae UGT genes are expressed at low levels in highly restricted expression domains, or alternatively are only expressed under specific environmental conditions (e.g., host plants), remains to be determined. As shown in Fig. 7, almost half of the T. urticae UGT genes (39 genes, or 49%), belonging to multiple families, were expressed across all life stages analyzed (embryos, larvae, nymphs and adults). Furthermore, 8 genes (UGT201G1, UGT202B1, UGT203A3, UGT203E1, UGT205A3, UGT205B1, UGT206A1, UGT207A1) showed very high expression in all stages (RPKM>10) (Fig. 7). However, the larval stage had the highest number of expressed UGTs (58 genes), followed by the nymphal (55), adult (46) and embryonic (46) stages. All members of the UGT205 family consistently showed relatively high expression levels across developmental stages. Most of the 46 UGTs expressed in embryos maintained high expression levels in the following stages. However, a few UGTs which were not expressed in the embryo considerably increased their expression level in other stages, for example UGT201B1, UGT201D1 and UGT201E1 in the larval stage, and UGT201A1, UGT201C3 and UGT204A3 in the nymphal stages.
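
The expression criterion used above (RPKM > 1 in at least one stage or host) can be sketched as follows. RPKM is the standard "reads per kilobase of transcript per million mapped reads" measure. The function names, gene identifiers and read counts below are hypothetical, and the sketch does not reproduce the actual quantification pipeline described in the Materials and Methods.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def expressed_genes(rpkm_by_gene, threshold=1.0):
    """Genes whose RPKM exceeds `threshold` in at least one sample.

    `rpkm_by_gene` maps a gene identifier to its RPKM values across
    samples (life stages or host plants).
    """
    return sorted(gene for gene, values in rpkm_by_gene.items()
                  if max(values) > threshold)


# Hypothetical values for two placeholder genes across two samples.
example = {"geneA": [0.2, 3.5], "geneB": [0.0, 0.9]}
```

Here `expressed_genes(example)` keeps only `geneA`. The percentages in the text follow from the same counts: 65 of 80 genes is 81.25%, reported as 81%, and 39 of 80 is 48.75%, reported as 49%.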

3.4.2. UGT expression profile in xenobiotic metabolism. Fig. 8 shows the transcriptional response of UGTs in adult spider mites when challenged by alternative hosts (response to plant allelochemicals) as well as after developing resistance to acaricides (Dermauw et al., 2013b; Zhurov et al., 2013). As was first suggested by RNA-Seq analysis (Fig. 7), the microarray-derived expression heat map (Fig. 8) reveals that UGT expression changes considerably in relation to xenobiotic metabolism and resistance in adult T. urticae mites. To further confirm the putative role of T. urticae UGTs in xenobiotic metabolism, significant differential expression was assessed using an absolute log2 fold change (FC) ≥ 1 and a Benjamini-Hochberg-corrected p-value < 0.05 as cut-offs. Microarray analysis revealed significant differential expression of 9 and 10 UGT genes in the multi-resistant strains MR-VP and MAR-AB, respectively (Table S2). Of these UGTs, 7 were differentially expressed in both multi-resistant strains and showed a positive correlation in expression. Members of the UGT204 family (UGT204B1 and UGT204B2) showed the highest up-regulation in the two resistant T. urticae strains (Fig. 8, Table S2). Interestingly, three of the five pseudogenes (UGT201A3p, UGT201A7p and UGT202A14p) were transcriptionally active and were differentially expressed in a number of resistant strains (Fig. 8), with UGT201A7p notably showing high up-regulation in MAR-AB (Table S2). This suggests that these are probably not pseudogenes in all strains. Furthermore, the induction of UGT transcription in adults after transfer from bean to the more challenging hosts, tomato and Arabidopsis, is evident from the overview presented in Fig. 8. This induction was confirmed by the significance analysis (Table S3). Nine UGTs significantly altered their transcription when T. urticae adults were transferred to tomato for 12 hrs. Of these, 7 UGTs remained differentially expressed after T. urticae was grown on tomato for 5 consecutive generations, and UGT204 family members (UGT204A2 and UGT204A5) were again the most up-regulated. The total number of differentially expressed UGTs increased to 22 after 5 generations on tomato and included members of the UGT201, UGT202, UGT203, UGT204 and UGT205 families (Fig. 8, Table S3). UGTs also represented a large proportion of the transcriptional response when adult mites were transferred for 24 hrs from bean to different Arabidopsis lines (wild-type Col-0, mutants qKO and atr1D), which differ in glucosinolate content (Zhurov et al., 2013) (Fig. 8, Table S3). Glucosinolates are thioglucose-based plant defense compounds that release toxic nitrile- or isothiocyanate-based products upon herbivore feeding (Lambrix et al., 2001). The qKO mutant line lacks glucosinolates, and 5 mite UGTs were significantly down-regulated upon transfer to it from bean. In contrast, when mites were transferred to Arabidopsis lines containing glucosinolates, the overall expression level of UGTs increased (Fig. 8, Table S3). Thirteen UGTs changed expression levels when mites were put on Col-0 or on the glucosinolate-overproducing atr1D mutant line, and of those, 8 UGTs increased transcription upon each transfer. As determined by Zhurov et al. (2013), a considerable number of UGTs (UGT201A8, UGT204A5, UGT201A2v2, UGT202A4, UGT201A5, UGT204B1 and UGT202A1) showed a dose-dependent relationship between glucosinolate content in the Arabidopsis line and UGT expression level in T. urticae. In line with previous results, members of the UGT204 family responded most strongly. Last, microarray analysis of facultative reproductive diapause revealed that a large number of UGTs were significantly down-regulated (Fig. 8, Table S4). This response was specific to the non-feeding diapause state, as feeding, non-diapausing mites under the same environmental conditions did not exhibit this major down-regulation (Bryon et al., 2013). Among a total of 35 UGTs differentially expressed in the diapausing stage, the vast majority were down-regulated, while only 5 genes (UGT201B1, UGT201B2, UGT202A4, UGT203D1 and UGT203E1) were slightly up-regulated (Fig. 8, Table S4). As argued by Bryon et al. (2013), this down-regulation probably reflects the fact that diapausing females do not feed. This pattern was also found for P450 mono-oxygenases and other well-known detoxification enzymes and indirectly supports the role of UGTs in xenobiotic metabolism.
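
The two cut-offs used for the microarray data in this section (an absolute log2 fold change of at least 1 and a Benjamini-Hochberg-corrected p-value below 0.05) can be illustrated with a short sketch. It implements the standard Benjamini-Hochberg step-up adjustment in plain Python. The function names, gene identifiers and values are hypothetical, and in a real analysis the correction would presumably be applied across all probes on the array rather than a handful of genes.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def differentially_expressed(results, lfc_cutoff=1.0, alpha=0.05):
    """Filter {gene: (log2_fc, raw_p)} by |log2 FC| and adjusted p-value."""
    genes = list(results)
    adjusted = benjamini_hochberg([results[g][1] for g in genes])
    return {gene: (results[gene][0], q)
            for gene, q in zip(genes, adjusted)
            if abs(results[gene][0]) >= lfc_cutoff and q < alpha}


# Hypothetical log2 fold changes and raw p-values for four placeholder genes.
example = {"geneA": (2.5, 0.005), "geneB": (0.4, 0.01),
           "geneC": (-1.2, 0.03), "geneD": (1.1, 0.2)}
```

In this toy input, `geneB` fails the fold-change cut-off and `geneD` fails the significance cut-off, so only `geneA` and `geneC` are retained. Note that the filter keeps both up- and down-regulated genes, which matches the way the text counts differentially expressed UGTs.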

3.5. Loss of UGT gene family in early Chelicerata lineage

Comparative phylogenetic analysis of gene families across different taxa can provide important evolutionary insights. tBLASTn searches against the genome databases of other arthropods revealed that (1) an ancient loss of the complete gene set seems to have occurred in the early Chelicerata lineage, (2) the T. urticae UGT gene set belongs to a lineage distinct from other arthropod UGTs, and (3) the UGT gene family has diversified along the evolutionary course of the other arthropod subphyla, such as Myriapoda, Crustacea, and Hexapoda.

3.5.1. Subphylum Chelicerata. As of March 2014, draft genome sequences were available in NCBI for four Acari species: Ixodes scapularis (black-legged tick) and Rhipicephalus microplus (southern cattle tick) (Order Ixodida); and Metaseiulus occidentalis (western orchard predatory mite) and Varroa destructor (honeybee mite) (Order Mesostigmata). Interestingly, no UGT sequences could be detected in any of these genome databases (Fig. 1). Two transcriptome shotgun assembly (TSA) databases of Ixodes ricinus (castor bean tick) and Rhipicephalus pulchellus (a cattle tick) were additionally searched by BLAST, but no UGT sequences were detected. However, two UGT-like EST sequences (EW856022 and EW856023, 861 and 848 bp long, respectively) were retrieved from the EST library of I. scapularis. These two EST sequences did not map to any scaffolds of the I. scapularis genome, which consists of 369,492 contigs totaling 1.76 Gbp and covering 84% of the estimated genome size (2.1 Gbp) (Hill, 2010). In addition, sequences homologous to I. scapularis EW856022 and EW856023 were also not detected in the TSA of a congeneric species, I. ricinus. Altogether, this might indicate that these two UGT-like EST sequences either map to not yet sequenced genomic regions of I. scapularis or represent contamination from not yet sequenced bacteria.


Nevertheless, such a complete absence of UGTs in the non-phytophagous Acari species motivated us to extend our searches to other groups of Arachnida, including a scorpion (Mesobuthus martensii) (Order Scorpiones) and a spider (Parasteatoda tepidariorum) (Order Araneae). Repeated BLAST searches against the two genome databases failed to identify any UGT-like traces in their large genome assemblies of 1.13 Gbp (86% coverage) and 1.44 Gbp (84% coverage), respectively (GenBank Assembly IDs: GCA_000484575.1 and GCA_000365465.1) (Fig. 1). In contrast, 51 UGT-like contigs were identified in the transcriptome of Panonychus citri (citrus red mite), a closely related mite species belonging to the same family, Tetranychidae, as T. urticae. Alignment and phylogenetic analysis of the protein sequences from both T. urticae and P. citri revealed clear orthologous relationships (Fig. S1). In conclusion, we could detect UGT sequences only in T. urticae and P. citri, but not in other Arachnida species within Chelicerata, suggesting a loss of the UGT gene family in the early Chelicerata lineage and a later gain in the spider mites, at least before the divergence of Tetranychidae (see Section 3.6 for horizontal gene transfer).

3.5.2. Subphylum Myriapoda. In Myriapoda, another subphylum of Arthropoda, Strigamia maritima (a centipede) is the only species for which a draft genome sequence (176 Mbp) is available (GenBank Assembly ID: GCA_000239455.1), which accounts for about 60% of the estimated genome size of 293 Mbp. We identified 20 complete UGT sequences, including one pseudogene, in the S. maritima genome (Table S5). They are animal-type UGTs (Fig. 1), more similar to insect UGTs than to mite UGTs (Fig. 1 and Fig. 5).
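
The coverage figures quoted for the I. scapularis (Section 3.5.1) and S. maritima assemblies are simply the assembled length divided by the estimated genome size. The sketch below, with a hypothetical helper name, reproduces that arithmetic from the sizes given in the text. It matters for interpreting a negative BLAST result, since a gene family could in principle sit in the unassembled fraction.

```python
def assembly_coverage(assembled_bp, estimated_genome_bp):
    """Percentage of the estimated genome represented in the assembly."""
    return 100.0 * assembled_bp / estimated_genome_bp


# Assembly and estimated genome sizes as quoted in the text.
coverage = {
    "Ixodes scapularis": assembly_coverage(1.76e9, 2.1e9),
    "Strigamia maritima": assembly_coverage(176e6, 293e6),
}
```

Rounded to whole percentages, these evaluate to 84% and 60%, in agreement with the values stated above.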

3.5.3. Subphylum Crustacea. The draft genome sequence of Daphnia pulex (a water flea) contains 25 UGT genes, including 2 pseudogenes, which are all of the animal type (Fig. 1) and show relatively higher similarity to insect UGTs (e.g. 30-37% aaID with T. castaneum) (see Table S5 for accession numbers of the D. pulex UGT sequences). Similarly, the crayfish Pontastacus leptodactylus also contains multiple animal-type UGT sequences, as identified from a transcriptome shotgun assembly (TSA) available in NCBI (Manfrin et al., 2013).

3.5.4. Subphylum Hexapoda. A diverse array of insect UGTs has been previously reported in several insect species, as described in Ahn et al. (2012), all of which are of the animal type.

In summary, Myriapoda, Crustacea and Hexapoda contain animal UGTs, with varying numbers of gene members (Fig. 1). Their gene structures (intron/exon) and protein sequence motifs share common features of animal UGTs. In contrast, members of the Chelicerata lack the complete UGT gene set, except for two phytophagous mites, including T. urticae, that have a distinct UGT repertoire closer to the bacterial type. This comparative analysis with diverse species in Arthropoda shows that the arthropod UGTs might have been lost early in the Chelicerata lineage (between 700 and 450 mya), whereas the Mandibulata lineage seems to have further diversified its UGT gene family as species of Myriapoda, Crustacea and Hexapoda evolved. In Chelicerata, however, two ancient lineages, Pycnogonida (sea spiders) and Xiphosura (horseshoe crabs), branched off before Arachnida (not shown in Fig. 1). When molecular information from these two marine chelicerates becomes available, it will allow a better understanding of exactly when the loss occurred in the Chelicerata lineage. Another open question is whether UGTs had already formed a multigene family in the last common ancestor of metazoan animals before their loss, or whether UGTs underwent lineage-specific expansions in each separate taxonomic group after their loss. In the first scenario, the absence of UGTs in Chelicerata (excluding spider mites) requires multiple losses of all UGT gene members in several chelicerate lineages (Araneae, Scorpiones, Ixodida, etc.), which seems unlikely. Instead, as postulated in the second scenario, it is more likely that an ancestral gene(s) of this gene family did not diversify prior to the loss. This is supported by the observation that insect UGTs diversified in a lineage-specific manner at Order level (Ahn et al., 2012). Assuming that such lineage-specific UGT diversification is also found in arthropods outside the Hexapoda, the loss of UGTs in early Chelicerata might have occurred before the multigene family was formed. This ‘most likely evolutionary scenario’ has important implications for our understanding of the evolution and current distribution of UGTs in arthropods.

Unlike other chelicerates, the spider mites have a large number of unique UGTs which are more similar to bacterial ones. This unusual presence of bacterial-type UGTs in the spider mites led us to propose the ‘horizontal gene transfer (HGT)’ hypothesis, which is further discussed in the next section.

3.6. Horizontal gene transfer of T. urticae UGTs from bacteria

Horizontal gene transfer (HGT; also called lateral gene transfer) is the process by which genes move across species boundaries by asexual mechanisms, and is key to understanding the evolution of prokaryotes and eukaryotes (Boucher et al., 2003). In contrast to the abundant examples of HGT within prokaryotes (Boucher et al., 2003), fewer cases have been reported in eukaryotes, especially animals (Andersson, 2005). This is presumably because the reproductive biology of animals is fundamentally different; however, many such events may have been overlooked due to the routine procedure of weeding out prokaryote-like sequences considered to be contaminants in the analysis of eukaryotic genomes (Acuña et al., 2012). Nevertheless, the number of examples of bacteria-to-animal HGT is rapidly increasing as more animal genomes are being sequenced (Dunning Hotopp, 2011; Schönknecht et al., 2014).

3.6.1. Phylogenetic analysis. In phylogenetic analyses including arthropod and bacterial UGTs, none of the T. urticae UGT protein sequences clustered with those of evolutionarily closer species such as a centipede, a water flea or insects. Instead, they clustered within groups of bacterial phyla, such as Actinobacteria and Chloroflexi (Fig. 5). We identified a single gene most similar to the T. urticae UGTs from each of 15 Actinobacteria and 2 Chloroflexi genomes by BLAST searches in NCBI, and constructed a phylogenetic sub-tree together with seven representative T. urticae UGTs, showing a close relationship between the T. urticae UGTs and the bacterial groups supported by a high bootstrap value (Fig. S4).


Actinobacteria constitute one of the largest phyla among Bacteria. They are well known for the production of secondary metabolites and are of high pharmacological and commercial interest. For example, Streptomyces is the largest genus in the Actinobacteria, including over 580 species, mostly living in soil, and it is especially known as a source of secondary metabolites, such as antibiotics, antifungals, antiparasitics and anticancer agents (Zhou et al., 2012). Glycosylation, mostly performed by UGTs, is regarded as one of the important metabolic pathways to produce and further modify such secondary compounds. From this perspective, the polyphagous spider mite might have obtained such a potent glycosylation tool-kit from a bacterial source and then increased the diversity of UGTs by gene duplication, resulting in enhanced adaptability to environmental conditions, such as various host plants. Interestingly, a recent genome analysis of a plant pathogenic fungus, Botrytis cinerea, revealed that a UGT gene was obtained from plants by HGT, suggesting it may contribute to the evolution of phytopathogenicity in B. cinerea (Zhu et al., 2012). In addition, an evolutionary scenario similar to that of the UGT genes in T. urticae can be found in plant-parasitic nematodes, in which enzymes for the degradation of the plant cell wall have been acquired via HGT and have undergone massive duplications (Danchin et al., 2010; Paganini et al., 2012). In these cases, the enzymes apparently play an important role in the biology of the species. Although no direct role in detoxification has been reported yet for any T. urticae UGT, the pattern of diversification suggests significant ecological consequences of this large gene family in the spider mite.

3.6.2. Protein structure comparison. Multiple alignments of T. urticae UGT201A1, representative of the majority of T. urticae UGTs, with UGT1A1 from Homo sapiens (vertebrate), UGT40A1 from Bombyx mori (insect) and UGT108A2 from Streptomyces violaceusniger (Actinobacteria) showed a greater structural similarity of the T. urticae UGTs to bacterial proteins (Fig. 3). The common UGT signature motif is positioned in the middle of the C-terminal domain consistently across different organisms. However, the T. urticae UGT lacks an N-terminal signal peptide, which is required to guide the peptide into the ER. In addition, a TM domain and cytoplasmic tail at the C-terminus could not be detected (see Section 3.2.2). Instead, the primary structure of the T. urticae UGT, as well as the mature protein size, more closely resembles that of bacteria (Fig. 3), supporting the notion that the T. urticae UGTs originated by HGT from a bacterial ancestor.

3.6.3. Other HGT examples in T. urticae. UGTs are not the only examples of horizontal gene transfer in T. urticae; recent genome sequencing has revealed several other cases (Grbić et al., 2011). For example, intradiol ring-cleavage dioxygenases (ID-RCDs) have been suggested to originate from fungi (Dermauw et al., 2013b), and a cobalamin-independent methionine synthase (MetE) gene and two β-fructofuranosidase genes might have been transferred from Bacteria (Grbić et al., 2011). In addition, two clusters of carotenoid biosynthesis genes (Grbić et al., 2011; Bryon et al., 2013) and a cyanase gene (Wybouw et al., 2012) are also highly likely to have been laterally transferred, but it is still unclear from which organisms they were obtained. These examples, including the UGTs in this study, indicate that HGT has occurred quite frequently and diversely in T. urticae, putatively conferring selective advantages after the gene transfer, such as effective detoxification or higher host plant adaptability. As suggested in a previous report (Wybouw et al., 2012) and reinforced by this study (see Section 3.5.1), these ancient gene transfers might have occurred before the divergence of the family Tetranychidae, as was also shown in the case of UGTs.

To summarize, an unexpectedly large number of UGT genes were detected in T. urticae, and these genes most likely originate from an ancestral HGT event from bacteria, a very rare event in eukaryotes (Andersson, 2005). Orthologues identified in a mite relative, P. citri, confirmed that HGT occurred at least before the diversification of the family Tetranychidae. Furthermore, the structural similarity of T. urticae and bacterial UGT protein sequences also supports this hypothesis.

3.7. Conclusions

In this study, we annotated the UGT gene family of a non-insect arthropod, the spider mite T. urticae. Transcriptome analysis after host-plant shifts and in acaricide-resistant lines has provided abundant information on the dynamics of expression, supporting an important role for UGT genes in xenobiotic metabolism. Comparative analysis of this gene family in Arthropoda revealed that an ancient loss of all animal-like UGT gene families has probably occurred in the Chelicerata, and that an ancestor of the spider mite regained a bacterial-like gene family, probably by horizontal gene transfer. This unusual mode of evolution is partly responsible for the remarkable adaptability of this group of organisms.

Acknowledgements

We thank Dr. H. Vogel for his technical assistance in the transcriptome assembly and two anonymous reviewers for their valuable comments and suggestions to improve the manuscript. This work was supported by RDA Grant PJ009365 (to SA) and by the Max-Planck-Gesellschaft (to SA and DGH). TVL and WD are post-doctoral fellows of the Fund for Scientific Research Flanders (FWO). This work was supported by FWO grants 3G061011 and 3G009312 and a Ghent University Special Research Fund grant 01J13711, the Government of Canada through Genome Canada, and the Ontario Genomics Institute OGI-046 (to TVL). NW is supported by the Institute for the Promotion of Innovation by Science and Technology in Flanders (IWT, grant IWT/SB/101451).

Competing interests

The authors declare they have no competing interests.

Authors' contributions

SA and TVL designed the research. SA performed annotation and genomic analysis. WD and NW analyzed RNA-Seq and microarray data. DGH provided evolutionary analyses. SA and TVL wrote the manuscript with input from WD, NW and DGH. All authors read and approved the final manuscript.

References

Abascal, F., Zardoya, R., Posada, D., 2005. ProtTest: Selection of best-fit models of protein evolution. Bioinformatics 21, 2104-2105.

Acuña, R., Padilla, B.E., Flórez-Ramos, C.P., Rubio, J.D., Herrera, J.C., Benavides, P., Lee, S.-J., Yeats, T.H., Egan, A.N., Doyle, J.J., Rose, J.K.C., 2012. Adaptive horizontal transfer of a bacterial gene to an invasive insect pest of coffee. Proc. Natl. Acad. Sci. U.S.A. http://dx.doi.org/10.1073/pnas.1121190109.

Ahn, S.-J., Badenes-Pérez, F.R., Reichelt, M., Svatoš, A., Schneider, B., Gershenzon, J., Heckel, D.G., 2011. Metabolic detoxification of capsaicin by UDP-glycosyltransferase in three Helicoverpa species. Arch. Insect Biochem. Physiol. 78, 104-118.

Ahn, S.-J., Vogel, H., Heckel, D.G., 2012. Comparative analysis of the UDP-glycosyltransferase multigene family in insects. Insect Biochem. Mol. Biol. 42, 133-147.

Altincicek, B., Kovacs, J.L., Gerardo, N.M., 2012. Horizontally transferred fungal carotenoid genes in the two-spotted spider mite Tetranychus urticae. Biol. Lett. 8, 253-257.

Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J., 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389-3402.

Andersson, J.O., 2005. Lateral gene transfer in eukaryotes. Cell. Mol. Life Sci. 62, 1182-1197.

Bock, K.W., 2003. Vertebrate UDP-glucuronosyltransferases: functional and evolutionary aspects. Biochem. Pharmacol. 66, 691-696.

Boucher, Y., Douady, C.J., Papke, R.T., Walsh, D.A., Boudreau, M.E.R., Nesbø, C.L., Case, R.J., Doolittle, W.F., 2003. Lateral gene transfer and the origins of prokaryotic groups. Annu. Rev. Genet. 37, 283-328.

Bowles, D., Isayenkova, J., Lim, E.-K., Poppenberger, B., 2005. Glycosyltransferases: managers of small molecules. Curr. Opin. Plant. Biol. 8, 254-263.


Brattsten, L.B., 1988. Enzymic adaptations in leaf-feeding insects to host-plant allelochemicals. J. Chem. Ecol. 14, 1919-1939.

Bryon, A., Wybouw, N., Dermauw, W., Tirry, L., Van Leeuwen, T., 2013. Genome wide gene-expression analysis of facultative reproductive diapause in the two-spotted spider mite Tetranychus urticae. BMC Genomics 14, 815.

Caputi, L., Malnoy, M., Goremykin, V., Nikiforova, S., Martens, S., 2012. A genome-wide phylogenetic reconstruction of family 1 UDP-glycosyltransferases revealed the expansion of the family during the adaptation of plants to life on land. Plant J. 69, 1030-1042.

Crooks, G.E., Hon, G., Chandonia, J.-M., Brenner, S.E., 2004. WebLogo: a sequence logo generator. Genome Res. 14, 1188-1190.

Daimon, T., Hirayama, C., Kanai, M., Ruike, Y., Meng, Y., Kosegawa, E., Nakamura, M., Tsujimoto, G., Katsuma, S., Shimada, T., 2010. The silkworm Green b locus encodes a quercetin 5-O-glucosyltransferase that produces green cocoons with UV-shielding properties. Proc. Natl. Acad. Sci. U.S.A. 107, 11471-11476.

Danchin, E.G.J., Rosso, M.-N., Vieira, P., de Almeida-Engler, J., Coutinho, P.M., Henrissat, B., Abad, P., 2010. Multiple lateral gene transfers and duplications have promoted plant parasitism ability in nematodes. Proc. Natl. Acad. Sci. U.S.A. 107, 17651-17656.

Dermauw, W., Osborne, E., Clark, R., Grbić, M., Tirry, L., Van Leeuwen, T., 2013a. A burst of ABC genes in the genome of the polyphagous spider mite Tetranychus urticae. BMC Genomics 14, 317.

Dermauw, W., Wybouw, N., Rombauts, S., Menten, B., Vontas, J., Grbić, M., Clark, R.M., Feyereisen, R., Van Leeuwen, T., 2013b. A link between host plant adaptation and pesticide resistance in the polyphagous spider mite Tetranychus urticae. Proc. Natl. Acad. Sci. U.S.A. 110, E113-E122.

Després, L., David, J.-P., Gallet, C., 2007. The evolutionary ecology of insect resistance to plant chemicals. Trends Ecol. Evol. 22, 298-307.


Dunning Hotopp, J.C., 2011. Horizontal gene transfer between bacteria and animals. Trends Genet. 27, 157-163.

Edgar, R.C., 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792-1797.

Erb, A., Weiß, H., Härle, J., Bechthold, A., 2009. A bacterial glycosyltransferase gene toolbox: Generation and applications. Phytochemistry 70, 1812-1821.

Grbić, M., Van Leeuwen, T., Clark, R.M., Rombauts, S., Rouzé, P., Grbić, V., Osborne, E.J., Dermauw, W., Thi Ngoc, P.C., Ortego, F., Hernandez-Crespo, P., Diaz, I., Martinez, M., Navajas, M., Sucena, É., Magalhães, S., Nagy, L., Pace, R.M., Djuranović, S., Smagghe, G., Iga, M., Christiaens, O., Veenstra, J.A., Ewer, J., Villalobos, R.M., Hutter, J.L., Hudson, S.D., Velez, M., Yi, S.V., Zeng, J., Pires-daSilva, A., Roch, F., Cazaux, M., Navarro, M., Zhurov, V., Acevedo, G., Bjelica, A., Fawcett, J.A., Bonnet, E., Martens, C., Baele, G., Wissler, L., Sanchez-Rodriguez, A., Tirry, L., Blais, C., Demeestere, K., Henz, S.R., Gregory, T.R., Mathieu, J., Verdon, L., Farinelli, L., Schmutz, J., Lindquist, E., Feyereisen, R., Van de Peer, Y., 2011. The genome of Tetranychus urticae reveals herbivorous pest adaptations. Nature 479, 487-492.

Hacker, C.V., Brasier, C.M., Buck, K.W., 2005. A double-stranded RNA from a Phytophthora species is related to the plant endornaviruses and contains a putative UDP glycosyltransferase gene. J. Gen. Virol. 86, 1561-1570.

Hedges, S.B., Kumar, S., 2009. Discovering the timetree of life. Oxford University Press, New York.

Hill, C.A., 2010. Genome analysis of major tick and mite vectors of human pathogens. https://www.vectorbase.org/sites/default/files/ftp/documents/ixs_sequencing_proposal.pdf.

Huang, H., Wu, Q., 2010. Cloning and comparative analyses of the zebrafish Ugt repertoire reveal its evolutionary diversity. PLoS ONE 5, e9144.

Hughes, A.L., 2013. Origin of ecdysosteroid UDP-glycosyltransferases of baculoviruses through horizontal gene transfer from Lepidoptera. Coevolution 1, 1-7.


Jeppson, L.R., Keifer, H.H., Baker, E.W., 1975. Mites Injurious to Economic Plants. University of California Press, Berkeley, Los Angeles.

Jobb, G., Haeseler, A.V., Strimmer, K., 2004. TREEFINDER: a powerful graphical analysis environment for molecular phylogenetics. BMC Evol. Biol. 4, 18.

Kojima, W., Fujii, T., Suwa, M., Miyazawa, M., Ishikawa, Y., 2010. Physiological adaptation of the Asian corn borer Ostrinia furnacalis to chemical defenses of its host plant, maize. J. Insect Physiol. 56, 1349-1355.

Lairson, L.L., Henrissat, B., Davies, G.J., Withers, S.G., 2008. Glycosyltransferases: structures, functions, and mechanisms. Annu. Rev. Biochem. 77, 521-555.

Lambrix, V., Reichelt, M., Mitchell-Olds, T., Kliebenstein, D.J., Gershenzon, J., 2001. The Arabidopsis epithiospecifier protein promotes the hydrolysis of glucosinolates to nitriles and influences Trichoplusia ni herbivory. Plant Cell 13, 2793-2807.

Langmead, B., Trapnell, C., Pop, M., Salzberg, S., 2009. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25.

Lee, S.-W., Ohta, K., Tashiro, S., Shono, T., 2006. Metabolic resistance mechanisms of the housefly (Musca domestica) resistant to pyraclofos. Pestic. Biochem. Physiol. 85, 76-83.

Li, W., Godzik, A., 2006. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658-1659.

Liu, B., Jiang, G., Zhang, Y., Li, J., Li, X., Yue, J., Chen, F., Liu, H., Li, H., Zhu, S., Wang, J., Ran, C., 2011. Analysis of transcriptome differences between resistant and susceptible strains of the citrus red mite Panonychus citri (Acari: Tetranychidae). PLoS ONE 6, e28516.

Lombard, V., Golaconda Ramulu, H., Drula, E., Coutinho, P.M., Henrissat, B., 2013. The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Res. 42, D490-D495.

Luzhetskyy, A., Bechthold, A., 2008. Features and applications of bacterial glycosyltransferases: current state and prospects. Appl. Microbiol. Biotechnol. 80, 945-952.


Mackenzie, P.I., Owens, I.S., Burchell, B., Bock, K.W., Bairoch, A., Belanger, A., Gigleux, S.F., Green, M., Hum, D.W., Iyanagi, T., Lancet, D., Louisot, P., Magdalou, J., Roy Chowdhury, J., Ritter, J.K., Tephly, T.R., Schachter, H., Tephly, T., Tipton, K.F., Nebert, D.W., 1997. The UDP glycosyltransferase gene superfamily: recommended nomenclature update based on evolutionary divergence. Pharmacogenetics 7, 255-269.

Mackenzie, P.I., Walter Bock, K., Burchell, B., Guillemette, C., Ikushiro, S.-i., Iyanagi, T., Miners, J.O., Owens, I.S., Nebert, D.W., 2005. Nomenclature update for the mammalian UDP glycosyltransferase (UGT) gene superfamily. Pharmacogenet. Genomics 15, 677-685.

Magdalou, J., Fournel-Gigleux, S., Ouzzine, M., 2010. Insights on membrane topology and structure/function of UDP-glucuronosyltransferases. Drug Metab. Rev. 42, 159-166.

Manfrin, C., Tom, M., De Moro, G., Gerdol, M., Guarnaccia, C., Mosco, A., Pallavicini, A., Giulianini, P.G., 2013. Application of D-crustacean hyperglycemic hormone induces peptidases transcription and suppresses glycolysis-related transcripts in the hepatopancreas of the crayfish Pontastacus leptodactylus - Results of a transcriptomic study. PLoS ONE 8, e65176.

Meech, R., Miners, J.O., Lewis, B.C., Mackenzie, P.I., 2012. The glycosidation of xenobiotics and endogenous compounds: versatility and redundancy in the UDP glycosyltransferase superfamily. Pharmacol. Ther. 134, 200-218.

Megy, K., Emrich, S.J., Lawson, D., Campbell, D., Dialynas, E., Hughes, D.S.T., Koscielny, G., Louis, C., MacCallum, R.M., Redmond, S.N., Sheehan, A., Topalis, P., Wilson, D., the VectorBase, C., 2012. VectorBase: improvements to a bioinformatics resource for invertebrate vector genomics. Nucleic Acids Res. 40, D729-D734.

Migeon, A., Dorkeld, F., 2013. Spider Mites Web: a comprehensive database for the Tetranychidae. http://www.montpellier.inra.fr/CBGP/spmweb.


Miley, M.J., Zielinska, A.K., Keenan, J.E., Bratton, S.M., Radominska-Pandya, A., Redinbo, M.R., 2007. Crystal structure of the cofactor-binding domain of the human phase II drug-metabolism enzyme UDP-glucuronosyltransferase 2B7. J. Mol. Biol. 369, 498-511.

O'Reilly, D.R., 1995. Baculovirus-encoded ecdysteroid UDP-glucosyltransferases. Insect Biochem. Mol. Biol. 25, 541-550.

Paganini, J., Campan-Fournier, A., Da Rocha, M., Gouret, P., Pontarotti, P., Wajnberg, E., Abad, P., Danchin, E.G.J., 2012. Contribution of lateral gene transfers to the genome composition and parasitic ability of root-knot nematodes. PLoS ONE 7, e50875.

Petersen, T.N., Brunak, S., von Heijne, G., Nielsen, H., 2011. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat. Methods 8, 785-786.

Radominska-Pandya, A., Bratton, S.M., Redinbo, M.R., Miley, M.J., 2010. The crystal structure of human UDP-glucuronosyltransferase 2B7 C-terminal end is the first mammalian UGT target to be revealed: the significance for human UGTs from both the 1A and 2B families. Drug Metab. Rev. 42, 133-144.

Ritchie, M.E., Silver, J., Oshlack, A., Holmes, M., Diyagama, D., Holloway, A., Smyth, G.K., 2007. A comparison of background correction methods for two-colour microarrays. Bioinformatics 23, 2700-2707.

Ross, J., Li, Y., Lim, E.-K., Bowles, D., 2001. Higher plant glycosyltransferases. Genome Biol. 2, reviews3004.3001-reviews3004.3006.

Sasai, H., Ishida, M., Murakami, K., Tadokoro, N., Ishihara, A., Nishida, R., Mori, N., 2009. Species-specific glucosylation of DIMBOA in larvae of the rice Armyworm. Biosci. Biotechnol. Biochem. 73, 1333-1338.

Schönknecht, G., Weber, A.P.M., Lercher, M.J., 2014. Horizontal gene acquisitions by eukaryotes as drivers of adaptive evolution. BioEssays 36, 9-20.

Smith, J.N., 1962. Detoxication Mechanisms. Annu. Rev. Entomol. 7, 465-480.


Smyth, G., Altman, N., 2013. Separate-channel analysis of two-channel microarrays: recovering inter-spot information. BMC Bioinformatics 14, 165.

Smyth, G.K., 2005. limma: Linear Models for Microarray Data, in: Gentleman, R., Carey, V., Huber, W., Irizarry, R., Dudoit, S., Smyth, G.K. (Eds.), Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer New York, pp. 397-420.

Song, D., Cho, W.K., Park, S.-H., Jo, Y., Kim, K.-H., 2013. Evolution of and horizontal gene transfer in the Endornavirus genus. PLoS ONE 8, e64270.

Tamura, K., Peterson, D., Peterson, N., Stecher, G., Nei, M., Kumar, S., 2011. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol. Biol. Evol. 28, 2731-2739.

Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F., Higgins, D.G., 1997. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25, 4876-4882.

Van Leeuwen, T., Demaeght, P., Osborne, E.J., Dermauw, W., Gohlke, S., Nauen, R., Grbić, M., Tirry, L., Merzendorfer, H., Clark, R.M., 2012a. Population bulk segregant mapping uncovers resistance mutations and the mode of action of a chitin synthesis inhibitor in arthropods. Proc. Natl. Acad. Sci. U.S.A. 109, 4407-4412.

Van Leeuwen, T., Dermauw, W., Grbić, M., Tirry, L., Feyereisen, R., 2012b. Spider mite control and resistance management: does a genome help? Pest Manag. Sci. 69, 156-159.

Van Leeuwen, T., Vontas, J., Tsagkarakou, A., Dermauw, W., Tirry, L., 2010. Acaricide resistance mechanisms in the two-spotted spider mite Tetranychus urticae and other important Acari: a review. Insect Biochem. Mol. Biol. 40, 563-572.

Wybouw, N., Balabanidou, V., Ballhorn, D.J., Dermauw, W., Grbić, M., Vontas, J., Van Leeuwen, T., 2012. A horizontally transferred cyanase gene in the spider mite Tetranychus urticae is involved in cyanate metabolism and is differentially expressed upon host plant change. Insect Biochem. Mol. Biol. 42, 881-889.

Yang, X., Buschman, L.L., Zhu, K.Y., Margolies, D.C., 2002. Susceptibility and detoxifying enzyme activity in two spider mite species (Acari: Tetranychidae) after selection with three insecticides. J. Econ. Entomol. 95, 399-406.

Yonekura-Sakakibara, K., Hanada, K., 2011. An evolutionary view of functional diversity in family 1 glycosyltransferases. Plant J. 66, 182-193.

Zhou, Z., Gu, J., Li, Y.-Q., Wang, Y., 2012. Genome plasticity and systems evolution in Streptomyces. BMC Bioinformatics 13, S8.

Zhu, B., Zhou, Q., Xie, G., Zhang, G., Zhang, X., Wang, Y., Sun, G., Li, B., Jin, G., 2012. Interkingdom gene transfer may contribute to the evolution of phytopathogenicity in Botrytis cinerea. Evol. Bioinformatics 8, 105-117.

Zhurov, V., Navarro, M., Bruinsma, K.A., Arbona, V., Santamaria, E.M., Cazaux, M., Wybouw, N., Osborne, E.J., Ens, C., Rioja, C., Vermeirssen, V., Rubio-Somoza, I., Krishna, P., Diaz, I., Schmid, M., Gómez-Cadenas, A., Van de Peer, Y., Grbić, M., Clark, R.M., Van Leeuwen, T., Grbić, V., 2014. Reciprocal responses in the interaction between Arabidopsis and the cell-content-feeding chelicerate herbivore spider mite. Plant Physiol. 164, 384-399.


Figure legends

Figure 1. Evolutionary relationships and UGT gene numbers of T. urticae and related arthropods. The tree shows divergence times for arthropod lineages, with the phylum Nematoda as an outgroup, as indicated by TimeTree (http://www.timetree.org) (Hedges and Kumar, 2009). The numbers of arthropod UGT genes are taken from Ahn et al. (2012) and this study. The C. elegans UGT gene number is based on the data table on the UGT Nomenclature Committee website (http://www.flinders.edu.au/medicine/sites/clinical-pharmacology/ugt-homepage.cfm). Databases (DB) refers to whole-genome shotgun contigs (WGS), transcriptome shotgun assembly (TSA) or the Nomenclature Committee's data collection (NCD).

Figure 2. Phylogenetic analysis of T. urticae UGTs. A full set of T. urticae UGT protein sequences was aligned using MUSCLE (Edgar, 2004) and subjected to a maximum-likelihood analysis using Treefinder (Jobb et al., 2004). The resulting tree was midpoint rooted. Only bootstrap values higher than 80 are shown. The scale bar represents 0.2 substitutions per site. Information and accession numbers of T. urticae UGT protein sequences can be found in Table 1 and Table S5.

Figure 3. Comparison of T. urticae UGT protein sequences to human, insect and bacterial UGTs. (A) Comparison of primary structures of the UGT protein sequences with emphasis on important domains. (B) Representative sequences from the organisms were aligned by ClustalW; asterisks (*) indicate positions that have a single, fully conserved residue, and colons (:) and periods (.) indicate conservation between groups of strongly and weakly similar chemical properties, respectively.
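The consensus symbols described in the Figure 3 legend can be illustrated with a minimal Python sketch. It is not the ClustalW implementation: the function name `conservation_line` is hypothetical, and the strong and weak residue groups are the ones commonly documented for Clustal consensus lines, taken here as an assumption. Columns containing a gap are left blank.

```python
# Residue groups as commonly documented for Clustal consensus lines.
# They are an assumption of this sketch, not taken from the manuscript.
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def conservation_line(aligned_seqs):
    """Return one symbol per alignment column: '*', ':', '.' or ' '."""
    if len({len(seq) for seq in aligned_seqs}) != 1:
        raise ValueError("sequences must be non-empty and of equal aligned length")
    symbols = []
    for column in zip(*aligned_seqs):
        residues = set(column)
        if "-" in residues:
            symbols.append(" ")   # a gap breaks conservation
        elif len(residues) == 1:
            symbols.append("*")   # single, fully conserved residue
        elif any(residues <= set(group) for group in STRONG_GROUPS):
            symbols.append(":")   # all residues fall in one strong group
        elif any(residues <= set(group) for group in WEAK_GROUPS):
            symbols.append(".")   # all residues fall in one weak group
        else:
            symbols.append(" ")
    return "".join(symbols)
```

For the toy alignment `["MKST", "MRSA", "MKSV"]` the function marks the first and third columns as fully conserved, the second as strongly similar (K/R) and the fourth as weakly similar (T/A/V).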

Figure 4. Distribution of the UGT genes on the T. urticae genomic scaffolds. Only scaffolds harboring UGT genes are depicted, with UGT genes mapped at their relative physical location. Arrows indicate gene orientation, and the same color in gene names represents the same gene family. Accession numbers of the UGT protein sequences used can be found in Table S5.

Figure 5. Phylogenetic analysis of arthropod and bacterial UGTs. A set of arthropod and bacterial UGT protein sequences was aligned using MUSCLE (Edgar, 2004) and subjected to a maximum-likelihood analysis using Treefinder (Jobb et al., 2004). Only bootstrap values higher than 80 are shown. The scale bar represents 0.5 substitutions per site.

Figure 6. Alignment of N-terminal amino acid sequences of 7 representative T. urticae UGTs and 5 Actinobacteria UGTs. The intron-exon splicing site is indicated by a red line, and conserved amino acids near the splicing site are highlighted in the same colors. Accession numbers of the UGT protein sequences used can be found in Table S5.

Figure 7. Expression heat map of T. urticae UGT genes based on RNA-Seq data. RPKM values were calculated from RNA-Seq data derived from larvae after feeding on several host plants (from bean to tomato and Arabidopsis) and from different life stages of T. urticae (embryo, larvae, nymph and adult).
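As a reminder of what the RPKM values in Figure 7 measure, the following Python sketch applies the standard definition (reads per kilobase of transcript per million mapped reads). The function name `rpkm` and the example numbers are illustrative assumptions; the sketch does not reproduce the read-mapping pipeline used in the study.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    # Dividing by (length / 1e3) and by (library size / 1e6) is the same
    # as multiplying the read count by 1e9 and dividing by their product.
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical gene: 500 reads on a 2,000 bp coding sequence
# in a library of 10 million mapped reads.
example_rpkm = rpkm(500, 2000, 10_000_000)
```

Normalizing by both gene length and library size is what makes values comparable across genes of different length and across samples sequenced to different depths.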

Figure 8. Microarray-derived expression heat map of T. urticae UGT genes. The heat map depicts the log2 fold change (log2FC) of 77 UGT genes after host plant transfer (from bean to tomato and Arabidopsis), in multi-resistant strains (MR-VP and MAR-AB), and after diapause induction, relative to the susceptible London strain on bean under standard laboratory conditions (25°C, 60% RH). The expression profile of UGTs after transfer to tomato was assessed after 2 hrs, 12 hrs and 5 generations. The transcriptional response to Arabidopsis was investigated 24 hrs after transfer on different lines (qKO, Col-0, atr1D) with different allelochemical (glucosinolate) content, relative to the same strain feeding on the ancestral, benign host plant, bean. The qKO mutant line lacks glucosinolates and the atr1D line overproduces glucosinolates compared to the wild type, Col-0.
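The log2FC scale used in Figure 8 can be made concrete with a short Python sketch. It only shows the arithmetic of a log2 fold change between a treatment and a reference intensity; the function name `log2_fold_change` is hypothetical, and the sketch is not the statistical microarray analysis performed in the study.

```python
import math


def log2_fold_change(treatment, reference):
    """log2 of the treatment/reference expression ratio."""
    if treatment <= 0 or reference <= 0:
        raise ValueError("expression values must be positive")
    return math.log2(treatment / reference)


# A four-fold induction gives +2, a four-fold repression gives -2.
up = log2_fold_change(8.0, 2.0)
down = log2_fold_change(1.0, 4.0)
```

On this scale, equal up- and down-regulation are symmetric around zero, which suits a diverging color key.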


Table 1. Characterization of a full set of T. urticae UGT sequences.

| UGT family | Name¹ | Tetur ID² | Length (aa)³ | Scaffold⁴ | Strand | No. exons⁵ |
|---|---|---|---|---|---|---|
| 201 | UGT201A1 | tetur01g03820 | 455 | scaffold 1 | + | 2 |
| 201 | UGT201A2v1 | tetur60g00080 | 402 | scaffold 60 | - | 2 |
| 201 | UGT201A2v2 | tetur02g02480 | (278) | scaffold 2 | + | (1) |
| 201 | UGT201A3p† | tetur02g02770 | (103) | scaffold 2 | + | (1) |
| 201 | UGT201A4 | tetur05g09325 | 448 | scaffold 5 | + | 2 |
| 201 | UGT201A5 | tetur08g00190 | 444 | scaffold 8 | + | 2 |
| 201 | UGT201A6 | tetur19g00440 | 456 | scaffold 19 | + | 2 |
| 201 | UGT201A7p | tetur21g01400* | (166) | scaffold 21 | + | (1) |
| 201 | UGT201A8 | tetur184g00030* | 444 | scaffold 184 | + | 2 |
| 201 | UGT201B1 | tetur01g05690 | 438 | scaffold 1 | + | 2 |
| 201 | UGT201B2 | tetur01g05700 | 443 | scaffold 1 | + | 2 |
| 201 | UGT201B3 | tetur04g07630 | 438 | scaffold 4 | - | 2 |
| 201 | UGT201B4p | tetur04g07710* | 437 | scaffold 4 | - | 2 |
| 201 | UGT201B5 | tetur04g07770 | 437 | scaffold 4 | - | 2 |
| 201 | UGT201B6 | tetur04g07780 | 437 | scaffold 4 | - | 2 |
| 201 | UGT201B7 | tetur05g05020 | 437 | scaffold 5 | + | 2 |
| 201 | UGT201B8 | tetur05g05030 | 437 | scaffold 5 | + | 2 |
| 201 | UGT201B9 | tetur05g05050 | 437 | scaffold 5 | + | 2 |
| 201 | UGT201B10 | tetur05g05060 | 436 | scaffold 5 | + | 2 |
| 201 | UGT201B11 | tetur07g06390 | 435 | scaffold 7 | + | 2 |
| 201 | UGT201B12 | tetur07g06420 | 435 | scaffold 7 | + | 2 |
| 201 | UGT201B13 | tetur07g06430 | 435 | scaffold 7 | + | 2 |
| 201 | UGT201B14 | tetur07g06450 | 435 | scaffold 7 | + | 2 |
| 201 | UGT201C1 | tetur01g11870 | 441 | scaffold 1 | + | 2 |
| 201 | UGT201C2 | tetur05g04680 | 436 | scaffold 5 | - | 2 |
| 201 | UGT201C3 | tetur05g04690 | 440 | scaffold 5 | - | 2 |
| 201 | UGT201D1 | tetur04g04300 | 459 | scaffold 4 | - | 2 |
| 201 | UGT201D2 | tetur04g04350 | 459 | scaffold 4 | + | 2 |
| 201 | UGT201E1 | tetur05g05710 | 438 | scaffold 5 | + | 2 |
| 201 | UGT201F1 | tetur02g01310 | 438 | scaffold 2 | + | 2 |
| 201 | UGT201F2 | tetur06g02410* | 436 | scaffold 6 | + | 2 |
| 201 | UGT201F3 | tetur06g02430 | 438 | scaffold 6 | + | 2 |
| 201 | UGT201G1 | tetur08g07460 | 457 | scaffold 8 | - | 2 |
| 201 | UGT201G2 | tetur12g00360 | 436 | scaffold 12 | - | 2 |
| 201 | UGT201G3 | tetur02g10390 | 438 | scaffold 2 | + | 1 |
| 201 | UGT201H1 | tetur08g03000 | 442 | scaffold 8 | - | 2 |
| 202 | UGT202A1 | tetur15g00340 | 436 | scaffold 15 | - | 2 |
| 202 | UGT202A2 | tetur22g00270 | 437 | scaffold 22 | + | 2 |
| 202 | UGT202A3 | tetur22g00310 | 435 | scaffold 22 | + | 2 |
| 202 | UGT202A4 | tetur22g00330 | 437 | scaffold 22 | + | 2 |
| 202 | UGT202A5 | tetur22g00350 | 437 | scaffold 22 | + | 2 |
| 202 | UGT202A6 | tetur22g00360* | 433 | scaffold 22 | + | 2 |
| 202 | UGT202A7 | tetur22g00380 | 430 | scaffold 22 | + | 2 |
| 202 | UGT202A8 | tetur22g00420 | 432 | scaffold 22 | + | 2 |
| 202 | UGT202A9 | tetur22g00460 | 437 | scaffold 22 | + | 2 |
| 202 | UGT202A10 | tetur22g00480 | 439 | scaffold 22 | + | 2 |
| 202 | UGT202A11 | tetur22g00510 | 437 | scaffold 22 | - | 2 |
| 202 | UGT202A12 | tetur30g00390 | 434 | scaffold 30 | + | 2 |
| 202 | UGT202A13p | tetur30g02050* | 434 | scaffold 30 | - | 2 |
| 202 | UGT202A14p | tetur139g00010* | 437 | scaffold 139 | + | 2 |
| 202 | UGT202A15 | tetur22g00440 | 436 | scaffold 22 | + | 2 |
| 202 | UGT202A16 | tetur22g00970 | 439 | scaffold 22 | + | 2 |
| 202 | UGT202B1 | tetur10g02090 | 434 | scaffold 10 | + | 3 |
| 203 | UGT203A1 | tetur01g07060 | 425 | scaffold 1 | - | 2 |
| 203 | UGT203A2 | tetur04g02350 | 426 | scaffold 4 | + | 2 |
| 203 | UGT203A3 | tetur06g06100 | 426 | scaffold 6 | + | 2 |
| 203 | UGT203B1 | tetur09g00220 | 504 | scaffold 9 | - | 2 |
| 203 | UGT203B2 | tetur09g01650 | 529 | scaffold 9 | - | 2 |
| 203 | UGT203B3 | tetur09g01660 | 461 | scaffold 9 | + | 2 |
| 203 | UGT203C1 | tetur05g05090 | 446 | scaffold 5 | - | 2 |
| 203 | UGT203D1 | tetur10g05770 | 428 | scaffold 10 | - | 2 |
| 203 | UGT203E1 | tetur16g02300 | 436 | scaffold 16 | - | 2 |
| 203 | UGT203F1 | tetur36g00340 | 428 | scaffold 36 | - | 2 |
| 203 | UGT203G1 | tetur36g01060 | 433 | scaffold 36 | - | 2 |
| 204 | UGT204A1 | tetur02g03300 | 435 | scaffold 2 | + | 2 |
| 204 | UGT204A2 | tetur05g00060 | 433 | scaffold 5 | + | 2 |
| 204 | UGT204A3 | tetur05g00070 | 434 | scaffold 5 | + | 2 |
| 204 | UGT204A4 | tetur05g00080 | 432 | scaffold 5 | + | 2 |
| 204 | UGT204A5 | tetur05g00090 | 434 | scaffold 5 | + | 2 |
| 204 | UGT204B1 | tetur02g09830 | 438 | scaffold 2 | - | 2 |
| 204 | UGT204B2 | tetur02g09850 | 438 | scaffold 2 | - | 2 |
| 204 | UGT204C1 | tetur11g06460 | 486 | scaffold 11 | + | 2 |
| 205 | UGT205A1 | tetur11g01230 | 440 | scaffold 11 | + | 2 |
| 205 | UGT205A2 | tetur11g01250 | 441 | scaffold 11 | + | 2 |
| 205 | UGT205A3 | tetur32g01240 | 451 | scaffold 32 | - | 2 |
| 205 | UGT205B1 | tetur32g01230 | 578 | scaffold 32 | - | 2 |
| 205 | UGT205B2 | tetur32g01250 | 501 | scaffold 32 | - | 2 |
| 205 | UGT205C1 | tetur11g01830 | 479 | scaffold 11 | - | 2 |
| 206 | UGT206A1 | tetur08g05390 | 445 | scaffold 8 | - | 2 |
| 207 | UGT207A1 | tetur08g02490 | 465 | scaffold 8 | - | 2 |

¹ Official names are assigned according to the guideline recommended by the UGT Nomenclature Committee (Mackenzie et al., 2005).

² Tetur IDs originate from the latest version of the gene models in the dataset T. urticae_CDS_20130730.tfa, accessible at the ORCAE genome portal, http://bioinformatics.psb.ugent.be/orcae/overview/Tetur. Gene models marked with an asterisk (*) were modified (mostly improved) by manual annotation, so their sequences in the dataset can differ from those in this article.

³ The deduced protein length is based on the coding region; parentheses indicate a partial sequence.

⁴ Scaffolds used in this study are basically from the London strain.

⁵ Parentheses indicate that the exon number is uncertain because the sequence is partial.

† The p after the gene name denotes a pseudogene.

[Figure 1 image. Time-calibrated tree of arthropod lineages (Chelicerata, Myriapoda, Crustacea, Hexapoda) with Phylum Nematoda as outgroup; x-axis: approximate divergence time (mya), 0 to 800. Columns beside the species names give the UGT gene number, the database (WGS, TSA or NCD) and the UGT type (animal, bacteria or n/a). Marked events: loss of UGT; gain of UGT from bacteria.]

[Figure 2 image. Maximum-likelihood tree of the T. urticae UGTs, with tips labeled by Tetur ID and UGT name and clades labeled UGT201, UGT202, UGT203, UGT204, UGT205, UGT206 and UGT207; bootstrap values at nodes; scale bar 0.2.]

[Figure 3 image. (A) Schematic primary structures (N- to C-terminus, scale in aa) of Human UGT1A1, Insect UGT41A1, T. urticae UGT201A1 and Bacterial UGT108A2, marking the signal peptide, UGT signature motif, transmembrane domain and cytoplasmic tail. (B) ClustalW alignment of UGT1A1, UGT41A1, UGT201A1 and UGT108A2 with consensus symbols.]

[Figure 4 image. Positions and orientations of the UGT genes on T. urticae scaffolds 1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 16, 19, 21, 22, 30, 32, 36, 60, 139 and 184; vertical scale in Mbp.]

[Figure 5 image. Maximum-likelihood tree of arthropod and bacterial UGTs with clades labeled centipede (Myriapoda), Daphnia (Crustacea), Insect UGT50, Insects (Hexapoda), T. urticae (Acari) and Actinobacteria/Chloroflexi bacteria; bootstrap values at nodes; scale bar 0.5.]

[Figure 6 image. N-terminal alignment; the splicing site (phase 2) marker and color highlighting are not recoverable from the text.]

UGT201A1 -------MNNKPKYRFLISSIDAFGHINCALAIGEILASNGHE
UGT202A1 -------MDEKGALKVLMTSMIGLGHLHACIGIGALLRKRGHE
UGT203A1 -------------MKIFFLPMDGHGHINACIGLARMLRDYNHE
UGT204A1 --------MNSNPIKVLVTSINGYGHFNAALGVASLLANRGND
UGT205A1 ----------MGHLKILLSAMSGQGHVNAILGLAAILKQHNHE
UGT206A1 MEESNLKGHRHKKNKIMLLPMDGAGHVLSLLGLAFYFKQRFHG
UGT207A1 ---------MAKSIKILIIPTNGYGHLNIGLGLADMLTKHGHK
UGT203G1 ---------MPKKYHFLITATNSYGPINAALGFGELLQRQGHH
UGT108A1 -----------MSLTILFMPESAYGPTNNCIGIGDILRKRGHR
UGT108A2 ---------MSGPLTILFAPESAYGPTNNCVGIGDVLRRRGHR
UGT108A3 ---------MSEPLTILFAPESAYGPTNNCVGIGDVLRRRGHR
UGT108A4 ---MSSTRGSLRTLTVLFAPESAYGPTNNCVGIGDVLRRRGHR
UGT108A5 ---MSPTGRSLRPLTILFAPESAYGPTNNCVGIGGVLRRRGHR


[Figure: heat map of expression levels of T. urticae UGT genes (UGT201A1 to UGT207A1, grouped by family UGT201 to UGT207). Columns: developmental stages (embryo, larvae, nymph, adult) and host plants (bean, tomato, Arabidopsis). Key: RPKM = 0−1, 1−2, 2−4, 4−8, 8−32, 32−64, 64−128, 128−256.]


[Figure: heat map of T. urticae UGT genes (UGT201A1 to UGT207A1, grouped by family UGT201 to UGT207). Columns: Host plant transfer (tomato: 2 hrs, 12 hrs, 5 gen; Arabidopsis: qKO, Col-0, atr1D), Resistance (MR-VP, MAR-AB) and Diapause (cold, diapause). Color key: value from −4 to 4.]


Highlights

• Annotation and phylogenetic analysis of a UGT gene family in Tetranychus urticae.

• Transcriptome analyses suggest an important role of UGTs in xenobiotic metabolism.

• Comparative analysis revealed an ancient loss of the UGT gene family in Chelicerata.

• Spider mites acquired UGT genes from bacteria by horizontal gene transfer.


Supplementary figures and tables

Figure S1. Phylogenetic tree of T. urticae and P. citri UGTs. A set of 80 T. urticae and 51 P. citri UGT protein sequences was aligned using MUSCLE (Edgar, 2004) and subjected to a maximum-likelihood analysis using Treefinder (Jobb et al., 2004). The resulting tree is depicted as a "circle tree" and only the topology is shown. Bootstrap values lower than 80 are not shown. Accession numbers of T. urticae and P. citri UGT protein sequences can be found in Table S5.


Table S1. Amino acid sequence identity (aaID, %) between T. urticae UGTs and UGTs from bacteria or insects. Each of the seven representative T. urticae UGTs was compared with 17 bacterial UGT108s and 65 insect UGTs.

Taxa UGT name UGT201A1 UGT202A1 UGT203A1 UGT204A1 UGT205A1 UGT206A1 UGT207A1

Bacteria UGT108A1 29.3 31.4 32.9 28.6 36.9 30.0 35.0

Bacteria UGT108A2 28.5 28.9 32.7 28.2 35.0 28.9 33.6

Bacteria UGT108A3 28.2 29.2 32.7 28.7 34.5 28.0 32.9

Bacteria UGT108A4 29.4 28.9 32.5 28.5 36.1 28.7 32.6

Bacteria UGT108A5 29.5 29.9 32.9 28.1 35.3 30.6 33.9

Bacteria UGT108A6 26.1 24.8 30.8 26.9 32.7 28.2 27.7

Bacteria UGT108A7 26.3 24.9 32.2 26.6 32.6 28.2 31.4

Bacteria UGT108A8 25.2 28.4 35.5 26.4 36.2 29.4 31.4

Bacteria UGT108A9 24.6 28.9 34.1 27.8 32.3 25.0 32.3

Bacteria UGT108A10 28.1 31.1 35.6 28.3 35.9 30.2 34.0

Bacteria UGT108A11 28.7 29.9 33.4 26.4 35.5 30.8 33.4

Bacteria UGT108A12 29.5 29.9 32.9 28.1 35.3 30.6 33.9

Bacteria UGT108A13 25.4 28.3 32.9 27.7 32.3 26.0 32.7

Bacteria UGT108B1 26.1 24.3 30.4 25.3 31.1 29.4 31.2

Bacteria UGT108B2 28.1 29.2 31.8 26.9 31.5 31.7 30.3

Bacteria UGT108C1 25.5 26.2 29.4 23.2 28.6 26.9 27.1

Bacteria UGT108D1 27.5 28.0 28.9 27.3 31.8 30.4 29.2

Insect UGT31A1 8.1 7.1 9.2 11.5 11.4 9.4 8.6

Insect UGT32A1 9.7 9.9 6.8 9.7 11.6 10.8 10.9

Insect UGT33A1 11.9 13.3 10.6 8.7 8.0 10.8 13.6

Insect UGT34A1 7.5 10.1 12.7 15.4 8.6 9.2 12.3

Insect UGT35A1 12.1 10.8 7.8 10.6 13.6 10.1 8.2

Insect UGT36A1 13.9 13.1 9.2 14.0 11.8 9.4 13.3

Insect UGT37A1 13.0 12.2 10.8 12.0 10.2 11.9 12.3

Insect UGT38A1 15.6 12.6 10.4 13.3 14.3 11.7 11.2

Insect UGT39B1 11.2 11.2 12.2 12.6 15.5 11.0 6.7

Insect UGT40A1 14.3 10.3 17.2 14.3 15.2 15.1 11.8

Insect UGT41A1 13.6 11.9 13.2 13.3 10.9 11.2 13.6

Insect UGT42A1 5.9 15.6 12.0 8.7 11.8 12.1 9.3

Insect UGT43A1 8.6 14.9 15.1 14.9 10.0 10.1 14.8

Insect UGT44A1 13.4 13.1 12.2 10.8 13.4 10.8 10.1

Insect UGT46A1 12.3 11.7 12.2 13.1 8.9 10.3 7.5

Insect UGT47A1 12.3 13.5 15.5 9.0 15.5 16.4 12.9

Insect UGT48A1 12.5 10.6 8.2 9.2 9.8 14.4 11.6


Insect UGT49B1 11.9 13.3 6.4 9.4 13.6 11.2 12.5

Insect UGT50A1 13.4 14.2 10.1 11.3 13.0 6.5 8.6

Insect UGT301D1 8.8 9.6 10.8 8.7 12.1 11.2 7.7

Insect UGT302C1 15.0 12.8 13.2 16.8 15.9 10.3 14.2

Insect UGT303A1 9.5 11.9 10.8 10.8 13.4 11.7 12.9

Insect UGT304A1 9.2 11.0 10.8 8.5 11.6 11.9 11.6

Insect UGT307A1 12.5 9.2 14.4 9.0 13.2 14.4 11.0

Insect UGT311A1 13.9 14.5 10.1 9.2 13.9 13.0 7.7

Insect UGT312A1 16.5 13.5 11.1 14.3 13.6 11.9 11.6

Insect UGT316A1 13.0 6.4 13.4 6.7 12.3 10.6 14.4

Insect UGT317A1 7.0 11.0 8.7 15.2 14.6 14.2 11.8

Insect UGT318A1 11.7 13.8 10.8 11.0 10.5 10.6 8.6

Insect UGT319A1 13.0 14.5 8.7 6.7 10.9 14.4 11.0

Insect UGT320A1 13.9 11.5 13.4 9.7 12.5 13.5 14.4

Insect UGT321A1 10.6 15.4 10.6 10.3 16.6 9.2 8.4

Insect UGT322A1 15.4 10.3 12.0 11.7 11.4 9.7 8.0

Insect UGT323A1 13.4 11.5 12.2 15.4 8.4 10.8 13.1

Insect UGT324A1 13.4 14.9 12.9 12.4 12.3 13.7 12.3

Insect UGT325A1 16.7 11.7 6.8 13.6 11.8 10.8 10.3

Insect UGT326A1 13.9 11.2 13.4 13.1 13.0 13.5 12.5

Insect UGT327A1 15.8 11.9 13.4 11.3 8.6 14.6 10.3

Insect UGT328A1 11.7 11.9 9.4 11.0 12.1 12.6 12.5

Insect UGT329A1 14.3 11.5 14.8 10.1 13.0 13.7 13.3

Insect UGT330A1 11.9 11.0 13.7 7.1 10.9 15.1 17.4

Insect UGT331A1 14.5 14.0 13.2 9.7 13.6 12.8 12.5

Insect UGT332A1 13.0 15.4 9.9 14.9 10.2 11.0 11.0

Insect UGT333A1 16.5 10.1 12.2 12.4 13.2 11.7 13.3

Insect UGT334A1 10.6 10.3 12.5 11.0 9.8 13.3 12.7

Insect UGT335A1 14.3 12.6 9.4 7.1 13.0 15.3 10.3

Insect UGT336A1 15.8 11.7 10.1 12.0 14.3 11.7 10.8

Insect UGT337A1 10.1 9.2 10.1 11.7 7.5 8.3 13.3

Insect UGT338A1 9.0 9.4 7.5 11.3 8.6 8.5 5.8

Insect UGT339A1 10.8 14.0 12.0 12.4 10.7 8.3 9.0

Insect UGT340C1 10.3 13.1 12.0 8.5 6.8 10.6 10.1

Insect UGT341A1 9.5 15.8 11.1 12.4 12.3 12.8 15.1

Insect UGT342A1 15.6 8.3 7.3 14.3 11.8 11.5 9.9

Insect UGT343A1 12.1 11.9 15.5 13.3 10.7 13.7 13.1


Insect UGT344A1 12.5 11.0 9.7 9.2 10.2 12.6 7.3

Insect UGT345A1 7.9 14.0 13.2 11.0 11.8 9.7 11.2

Insect UGT346A1 10.8 11.9 14.6 9.7 12.1 11.0 8.6

Insect UGT347A1 11.2 12.6 9.2 12.6 10.9 14.6 12.0

Insect UGT348A1 17.1 11.7 13.9 12.6 10.9 14.8 10.5

Insect UGT349A1 10.6 11.5 9.9 14.3 11.6 8.1 15.3

Insect UGT350A1 14.5 12.8 8.7 10.8 15.2 11.0 8.4

Insect UGT351A1 15.0 10.6 12.9 11.5 9.8 15.3 9.3
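The aaID values in Table S1 are percentages of identical residues between pairs of aligned protein sequences. The table does not state which alignment positions enter the denominator, so the Python sketch below makes an explicit assumption: only columns where neither sequence has a gap are compared. The function name `percent_identity` is hypothetical and is not taken from the study.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identical residues between two aligned sequences.

    Assumption (not from the paper): columns in which either sequence
    has a gap are excluded from both numerator and denominator.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0  # columns where both sequences carry a residue
    matches = 0   # compared columns with identical residues
    for res_a, res_b in zip(seq_a, seq_b):
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        if res_a == res_b:
            matches += 1

    # Two sequences with no shared residue columns have no defined identity;
    # return 0.0 here rather than dividing by zero.
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # Toy aligned fragments: four of five residues agree.
    print(percent_identity("MSGPL", "MSEPL"))
```

Other conventions (for example, dividing by the full alignment length or by the shorter sequence) give different percentages, so values computed this way need not reproduce Table S1.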


Figure S2. Multiple alignment of the C-terminal domain of T. urticae UGTs. (A) The most frequent amino acid residues of the signature motif are shown as graphical logos generated using the WebLogo application available at http://weblogo.berkeley.edu (Crooks et al., 2004). (B) Alignment of the deduced amino acid sequences of the C-terminal domain of T. urticae UGTs. A slash (/) at some C-terminal ends marks a position where extended sequences are omitted.


Figure S3. Graphical representation of the UGT signature motif of different taxa. The overall height of each stack

indicates the sequence conservation at that position (measured in bits) and the height of symbols within the stack

reflects the relative frequency of the corresponding amino or nucleic acid at that position. Logos were generated

using the WebLogo application available at http://weblogo.berkeley.edu (Crooks et al., 2004).
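The stack heights described in the caption of Figure S3 can be illustrated with a short Python sketch. It computes, for one alignment column, the information content in bits as log2(20) minus the Shannon entropy of the observed residues, and scales each residue by its relative frequency. This is a simplified version under stated assumptions: it uses a 20-letter amino acid alphabet, ignores gaps, and omits the small-sample correction that logo software may apply. The function names are hypothetical.

```python
import math
from collections import Counter


def column_information(column: str, alphabet_size: int = 20) -> float:
    """Information content (bits) of one alignment column.

    Simplified: log2(alphabet_size) minus Shannon entropy, with gaps
    ignored and no small-sample correction (an assumption of this sketch).
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy


def symbol_heights(column: str, alphabet_size: int = 20) -> dict:
    """Height of each residue in the stack: relative frequency times column information."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return {}
    total = len(residues)
    info = column_information(column, alphabet_size)
    return {res: (n / total) * info for res, n in Counter(residues).items()}


if __name__ == "__main__":
    # A fully conserved column reaches the maximum, log2(20) bits.
    print(column_information("GGGG"))
    # A column split evenly between two residues loses exactly one bit.
    print(symbol_heights("GGAA"))
```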


Table S2. The log2FC of the significantly differentially expressed T. urticae UGT genes in two multi-resistant strains (MR-VP and MAR-AB) relative to the susceptible London strain.

T. urticae UGT MR-VP MAR-AB

UGT201A5 1,03 1,53

UGT201A7p 1,49

UGT201B8 -1,81 -2,77

UGT201C2 -1,49 -2,18

UGT201C3 -1,05

UGT201F3 1,06

UGT202A1 1,26 1,44

UGT204B1 2,28 3,58

UGT204B2 1,43 2,20

UGT205A1 -2,55 -1,18

UGT205A2 -1,13

UGT205C1 -1,74

Total No. 9 10
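The log2FC values in Table S2 are base-2 logarithms of an expression ratio between a test condition and a reference, so a value of 1 corresponds to a two-fold increase and a value of -1 to a two-fold decrease. The Python sketch below shows only this arithmetic, plus a helper for reading the decimal commas used in the table. The function names and the 1.0 cut-off are assumptions made for illustration; the statistical criteria used in the study to call a gene significant are not reproduced here.

```python
import math


def log2_fold_change(test_expr: float, reference_expr: float) -> float:
    """Base-2 log of the ratio test/reference (both must be positive)."""
    if test_expr <= 0 or reference_expr <= 0:
        raise ValueError("expression values must be positive")
    return math.log2(test_expr / reference_expr)


def parse_decimal_comma(text: str) -> float:
    """Convert a value written with a decimal comma, e.g. '-1,81', to a float."""
    return float(text.strip().replace(",", "."))


def passes_cutoff(log2fc: float, cutoff: float = 1.0) -> bool:
    """Hypothetical filter: keep genes changing at least 2**cutoff-fold either way."""
    return abs(log2fc) >= cutoff


if __name__ == "__main__":
    # A four-fold higher signal in the test condition gives log2FC = 2.
    print(log2_fold_change(8.0, 2.0))
    # Reading one table entry written with a decimal comma.
    value = parse_decimal_comma("2,28")
    print(value, passes_cutoff(value))
```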


Table S3. The log2FC of significantly differentially expressed T. urticae UGT genes at different times after host plant transfer to tomato (2 hrs, 12 hrs, or 5 generations), or after transfer to different Arabidopsis lines (qKO, Col-0, atr1D) with different allelochemical (glucosinolate) content, relative to the same strain feeding on the ancestral, benign host plant, bean. The qKO mutant line lacks glucosinolates and the atr1D line overproduces glucosinolates compared to the wild type, Col-0.

tomato

Arabidopsis

T. urticae UGT 2 hrs 12 hrs 5 gen

qKO Col-0 atr1D

UGT201A1

-1,20

-2,13 -2,16 -1,99

UGT201A2v1

1,19

1,10 1,92

UGT201A2v2

1,77

1,72 2,49

UGT201A4

-1,66

UGT201A7p

1,12

UGT201A8

1,71

1,51 2,52

UGT201B1

-1,03

UGT201B5

2,94

1,43 1,65

UGT201B7

-2,02 -2,61

UGT201B8

-2,41 -4,29

-2,01 -1,80 -2,17

UGT201B9

-2,09 -3,22

UGT201B10

-2,10 -4,31

UGT201B12

1,23

UGT201B13 1,17

UGT201C1 -1,09 -1,00 -1,54

UGT201C2 -1,34

UGT201F2 -1,50

UGT201F3 -1,72

UGT202A2

2,51

UGT202A4

1,59

UGT202A5

-1,03

UGT203G1 1,10 1,76

UGT204A1 1,32 1,22

UGT204A2 2,47 4,83 1,64 1,41

UGT204A3 2,29 1,35 1,38

UGT204A4 1,23 1,09

UGT204A5 1,16 1,85 1,51 2,07

UGT205A1

-1,79

UGT205C1

-1,84 -1,94 -2,77

Total No. 0 9 22

5 13 13


Table S4. The log2FC of significantly differentially expressed T. urticae UGT genes in spider mites living under diapause-inducing conditions, with a distinction between feeding mites (cold) and mites in the non-feeding diapause state (diapause), relative to the London strain living under standard laboratory conditions.

T. urticae UGT cold diapause

UGT201A1 -1,72 -2,82

UGT201A2v2

-1,26

UGT201A4 -1,31 -2,65

UGT201A6

-1,27

UGT201B1 1,77 1,23

UGT201B2 1,87 1,58

UGT201B4p 1,06

UGT201B5 -2,31 -2,28

UGT201B7

-1,72

UGT201B8 -2,24 -2,80

UGT201B9

-1,56

UGT201B10

-1,82

UGT201B12 1,05

UGT201B13 1,32

UGT201B14 1,07

UGT201C1 -1,73

UGT201C2 -1,42 -2,49

UGT201C3 -1,86

UGT201F1 -1,08

UGT201F3 2,18 -1,27

UGT201G1 -1,01

UGT202A2

-1,50

UGT202A4 1,24 2,81

UGT202A5

-1,85

UGT202A6

-1,54

UGT202A7 -1,05 -1,06

UGT202A8

-1,60

UGT202A15 -1,55

UGT202A16 -1,77

UGT203C1 -1,30 -1,55

UGT203D1 2,24

UGT203E1 1,11 1,29

UGT203G1 1,22

UGT204A2 -2,33


UGT204A3 -1,48

UGT204A4 -1,06

UGT204A5 -1,13

UGT205A1 -1,09

UGT205B1 -1,78

UGT205C1

-3,49

Total No. 17 35


Figure S4. Phylogenetic analysis of bacterial UGTs and selected T. urticae UGTs. A maximum-likelihood tree was constructed with MEGA 5.2 (Tamura et al., 2011) using an insect UGT (Bombyx mori UGT41A1) as the outgroup. The seven representative T. urticae UGT protein sequences (one from each family, UGT201 to UGT207) are embedded within bacterial UGTs (from Actinobacteria or Chloroflexi). The scale bar represents 0.5 substitutions per site. Accession numbers of the UGT protein sequences used can be found in Table S5.


Table S5. Accession numbers of UGT protein sequences used in this study.

Taxa Species UGT name GenBank accession number BOGAS accession number (for T. urticae)

Animal Homo sapiens UGT1A1 NP_000454.1

Animal Bombyx mori UGT41A1 AEW43175.1

Animal Tetranychus urticae UGT201A1 KJ584711 tetur01g03820

Animal Tetranychus urticae UGT201A2v1 KJ584712 tetur60g00080

Animal Tetranychus urticae UGT201A2v2 KJ584713 tetur02g02480

Animal Tetranychus urticae UGT201A3p KJ584714 tetur02g02770

Animal Tetranychus urticae UGT201A4 KJ584715 tetur05g09325

Animal Tetranychus urticae UGT201A5 KJ584716 tetur08g00190

Animal Tetranychus urticae UGT201A6 KJ584717 tetur19g00440

Animal Tetranychus urticae UGT201A7p KJ584718 tetur21g01400*

Animal Tetranychus urticae UGT201A8 KJ584719 tetur184g00030*

Animal Tetranychus urticae UGT201B1 KJ584720 tetur01g05690

Animal Tetranychus urticae UGT201B2 KJ584721 tetur01g05700

Animal Tetranychus urticae UGT201B3 KJ584722 tetur04g07630

Animal Tetranychus urticae UGT201B4p KJ584723 tetur04g07710*

Animal Tetranychus urticae UGT201B5 KJ584724 tetur04g07770

Animal Tetranychus urticae UGT201B6 KJ584725 tetur04g07780

Animal Tetranychus urticae UGT201B7 KJ584726 tetur05g05020

Animal Tetranychus urticae UGT201B8 KJ584727 tetur05g05030

Animal Tetranychus urticae UGT201B9 KJ584728 tetur05g05050

Animal Tetranychus urticae UGT201B10 KJ584729 tetur05g05060

Animal Tetranychus urticae UGT201B11 KJ584730 tetur07g06390

Animal Tetranychus urticae UGT201B12 KJ584731 tetur07g06420

Animal Tetranychus urticae UGT201B13 KJ584732 tetur07g06430

Animal Tetranychus urticae UGT201B14 KJ584733 tetur07g06450

Animal Tetranychus urticae UGT201C1 KJ584734 tetur01g11870

Animal Tetranychus urticae UGT201C2 KJ584735 tetur05g04680

Animal Tetranychus urticae UGT201C3 KJ584736 tetur05g04690

Animal Tetranychus urticae UGT201D1 KJ584737 tetur04g04300

Animal Tetranychus urticae UGT201D2 KJ584738 tetur04g04350

Animal Tetranychus urticae UGT201E1 KJ584739 tetur05g05710

Animal Tetranychus urticae UGT201F1 KJ584740 tetur02g01310

Animal Tetranychus urticae UGT201F2 KJ584741 tetur06g02410*

Animal Tetranychus urticae UGT201F3 KJ584742 tetur06g02430

Animal Tetranychus urticae UGT201G1 KJ584743 tetur08g07460

Animal Tetranychus urticae UGT201G2 KJ584744 tetur12g00360

Animal Tetranychus urticae UGT201G3 KJ584745 tetur02g10390

Animal Tetranychus urticae UGT201H1 KJ584746 tetur08g03000

Animal Tetranychus urticae UGT202A1 KJ584747 tetur15g00340

Animal Tetranychus urticae UGT202A2 KJ584748 tetur22g00270

Animal Tetranychus urticae UGT202A3 KJ584749 tetur22g00310


Animal Tetranychus urticae UGT202A4 KJ584750 tetur22g00330

Animal Tetranychus urticae UGT202A5 KJ584751 tetur22g00350

Animal Tetranychus urticae UGT202A6 KJ584752 tetur22g00360*

Animal Tetranychus urticae UGT202A7 KJ584753 tetur22g00380

Animal Tetranychus urticae UGT202A8 KJ584754 tetur22g00420

Animal Tetranychus urticae UGT202A9 KJ584755 tetur22g00460

Animal Tetranychus urticae UGT202A10 KJ584756 tetur22g00480

Animal Tetranychus urticae UGT202A11 KJ584757 tetur22g00510

Animal Tetranychus urticae UGT202A12 KJ584758 tetur30g00390

Animal Tetranychus urticae UGT202A13p KJ584759 tetur30g02050*

Animal Tetranychus urticae UGT202A14p KJ584760 tetur139g00010*

Animal Tetranychus urticae UGT202A15 KJ584761 tetur22g00440

Animal Tetranychus urticae UGT202A16 KJ584762 tetur22g00970

Animal Tetranychus urticae UGT202B1 KJ584763 tetur10g02090

Animal Tetranychus urticae UGT203A1 KJ584764 tetur01g07060

Animal Tetranychus urticae UGT203A2 KJ584765 tetur04g02350

Animal Tetranychus urticae UGT203A3 KJ584766 tetur06g06100

Animal Tetranychus urticae UGT203B1 KJ584767 tetur09g00220

Animal Tetranychus urticae UGT203B2 KJ584768 tetur09g01650

Animal Tetranychus urticae UGT203B3 KJ584769 tetur09g01660

Animal Tetranychus urticae UGT203C1 KJ584770 tetur05g05090

Animal Tetranychus urticae UGT203D1 KJ584771 tetur10g05770

Animal Tetranychus urticae UGT203E1 KJ584772 tetur16g02300

Animal Tetranychus urticae UGT203F1 KJ584773 tetur36g00340

Animal Tetranychus urticae UGT203G1 KJ584774 tetur36g01060

Animal Tetranychus urticae UGT204A1 KJ584775 tetur02g03300

Animal Tetranychus urticae UGT204A2 KJ584776 tetur05g00060

Animal Tetranychus urticae UGT204A3 KJ584777 tetur05g00070

Animal Tetranychus urticae UGT204A4 KJ584778 tetur05g00080

Animal Tetranychus urticae UGT204A5 KJ584779 tetur05g00090

Animal Tetranychus urticae UGT204B1 KJ584780 tetur02g09830

Animal Tetranychus urticae UGT204B2 KJ584781 tetur02g09850

Animal Tetranychus urticae UGT204C1 KJ584782 tetur11g06460

Animal Tetranychus urticae UGT205A1 KJ584783 tetur11g01230

Animal Tetranychus urticae UGT205A2 KJ584784 tetur11g01250

Animal Tetranychus urticae UGT205A3 KJ584785 tetur32g01240

Animal Tetranychus urticae UGT205B1 KJ584786 tetur32g01230

Animal Tetranychus urticae UGT205B2 KJ584787 tetur32g01250

Animal Tetranychus urticae UGT205C1 KJ584788 tetur11g01830

Animal Tetranychus urticae UGT206A1 KJ584789 tetur08g05390

Animal Tetranychus urticae UGT207A1 KJ584790 tetur08g02490

Animal Daphnia pulex UGT208A1 KJ584791

Animal Daphnia pulex UGT208A2 KJ584792

Animal Daphnia pulex UGT208A3 KJ584793


Animal Daphnia pulex UGT208B1 KJ584794

Animal Daphnia pulex UGT208B2 KJ584795

Animal Daphnia pulex UGT208C1 KJ584796

Animal Daphnia pulex UGT208C2 KJ584797

Animal Daphnia pulex UGT208D1 KJ584798

Animal Daphnia pulex UGT208E1 KJ584799

Animal Daphnia pulex UGT208F1p KJ584800

Animal Daphnia pulex UGT209A1 KJ584801

Animal Daphnia pulex UGT209A2 KJ584802

Animal Daphnia pulex UGT209A3 KJ584803

Animal Daphnia pulex UGT209A4 KJ584804

Animal Daphnia pulex UGT209B1 KJ584805

Animal Daphnia pulex UGT209B2 KJ584806

Animal Daphnia pulex UGT209C1 KJ584807

Animal Daphnia pulex UGT209D1 KJ584808

Animal Daphnia pulex UGT209E1 KJ584809

Animal Daphnia pulex UGT209F1 KJ584810

Animal Daphnia pulex UGT210A1 KJ584811

Animal Daphnia pulex UGT210B1 KJ584812

Animal Daphnia pulex UGT210C1 KJ584813

Animal Daphnia pulex UGT210D1 KJ584814

Animal Daphnia pulex UGT210E1 KJ584815

Animal Strigamia maritima UGT211A1 KJ584816

Animal Strigamia maritima UGT211A2 KJ584817

Animal Strigamia maritima UGT211B1 KJ584818

Animal Strigamia maritima UGT211B2p KJ584819

Animal Strigamia maritima UGT211C1 KJ584820

Animal Strigamia maritima UGT211C2 KJ584821

Animal Strigamia maritima UGT211D1 KJ584822

Animal Strigamia maritima UGT211E1 KJ584823

Animal Strigamia maritima UGT211F1 KJ584824

Animal Strigamia maritima UGT211G1 KJ584825

Animal Strigamia maritima UGT211H1 KJ584826

Animal Strigamia maritima UGT212A1 KJ584827

Animal Strigamia maritima UGT212A2 KJ584828

Animal Strigamia maritima UGT212A3 KJ584829

Animal Strigamia maritima UGT212B1 KJ584830

Animal Strigamia maritima UGT213A1 KJ584831

Animal Strigamia maritima UGT213B1 KJ584832

Animal Strigamia maritima UGT214A1 KJ584833

Animal Strigamia maritima UGT215A1 KJ584834

Animal Strigamia maritima UGT216A1 KJ584835

Bacteria Streptomyces antibioticus UGT101A1 CAA80301.1

Bacteria Streptomyces lividans UGT101A2 AAA26780.1


Bacteria Pantoea agglomerans UGT102A1 AAA64979.1

Bacteria Pantoea ananatis UGT102B1 BAA14125.1

Bacteria Pseudomonas aeruginosa UGT103 AAA62129.1

Bacteria Mycobacterium tuberculosis UGT104A1 NP_216042.1

Bacteria Mycobacterium avium UGT104A2 AAD44208.1

Bacteria Mycobacterium avium UGT104A3 AAD44209.1

Bacteria Mycobacterium leprae UGT104A4 NP_302527.1

Bacteria Mycobacterium tuberculosis UGT104A5 NP_216040.1

Bacteria Amycolatopsis orientalis UGT105A1 CAA11775.1

Bacteria Amycolatopsis orientalis UGT105A2 AAB49299.1

Bacteria Amycolatopsis orientalis UGT105A3 CAA11776.1

Bacteria Amycolatopsis orientalis UGT105A4 AAB49298.1

Bacteria Amycolatopsis orientalis UGT105A5 CAA11774.1

Bacteria Amycolatopsis balhimycina UGT105A6 CAA76551.1

Bacteria Amycolatopsis balhimycina UGT105A7 CAA76552.1

Bacteria Amycolatopsis balhimycina UGT105A8 CAA76553.1

Bacteria Bacillus subtilis UGT106A1 NP_390075.1

Bacteria Staphylococcus aureus UGT106B1 CAA74741.1

Bacteria Bacillus subtilis UGT107A1 CAA05612.1

Bacteria Streptosporangium roseum DSM 43021 UGT108A1 YP_003337539

Bacteria Streptomyces violaceusniger Tu 4113 UGT108A2 YP_004814675

Bacteria Streptomyces rapamycinicus NRRL 5491 UGT108A3 WP_020865434

Bacteria Streptomyces gancidicus UGT108A4 WP_006134531

Bacteria Streptomyces ghanaensis ATCC 14672 UGT108A5 WP_004978524

Bacteria Frankia sp. EuI1c UGT108A6 YP_004014445

Bacteria Frankia sp. CN3 UGT108A7 WP_007512641

Bacteria Mycobacterium smegmatis str. MC2 155 UGT108A8 YP_006569926

Bacteria Mycobacterium vaccae UGT108A9 WP_003931471.1

Bacteria Nonomuraea coxensis DSM 45129 UGT108A10 WP_020546554

Bacteria Actinoplanes globisporus DSM 43857 UGT108A11 WP_020512288

Bacteria Streptomyces viridosporus T7A UGT108A12 WP_016823039

Bacteria Mycobacterium sp. VKM Ac 1815D UGT108A13 WP_019511661

Bacteria Sphaerobacter thermophilus DSM 20745 UGT108B1 YP_003320072.1

Bacteria Thermomicrobium roseum DSM 5159 UGT108B2 YP_002522616

Bacteria Saccharopolyspora erythraea NRRL2338 UGT108C1 WP_009943080

Bacteria Mycobacterium smegmatis JS623 UGT108D1 YP_007294515
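The BOGAS accession numbers listed for T. urticae in Table S5 follow a regular pattern, for example tetur01g03820 or tetur184g00030*. The Python sketch below splits such an identifier into its two numeric parts. Reading the first number as the genome scaffold is an assumption of this sketch, based on the form of the identifiers; Table S5 does not define it, and the meaning of the trailing asterisk is not given either. The function name `parse_bogas_id` is hypothetical.

```python
import re

# 'tetur', digits, 'g', digits, optional trailing asterisk (as printed in Table S5).
_BOGAS_ID = re.compile(r"^tetur(\d+)g(\d+)\*?$")


def parse_bogas_id(identifier: str) -> tuple:
    """Split a tetur identifier into (first number, second number as text).

    Assumption: the first number is the scaffold on which the gene lies.
    The second part is kept as a string to preserve leading zeros.
    """
    match = _BOGAS_ID.match(identifier.strip())
    if match is None:
        raise ValueError(f"not a tetur identifier: {identifier!r}")
    return int(match.group(1)), match.group(2)


if __name__ == "__main__":
    for bogas_id in ("tetur01g03820", "tetur184g00030*"):
        print(bogas_id, parse_bogas_id(bogas_id))
```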