Accepted Manuscript
Bacterial origin of a diverse family of UDP-glycosyltransferase genes in the Tetranychus urticae genome
Seung-Joon Ahn, Wannes Dermauw, Nicky Wybouw, David G. Heckel, Thomas Van Leeuwen
PII: S0965-1748(14)00067-8
DOI: 10.1016/j.ibmb.2014.04.003
Reference: IB 2573
To appear in: Insect Biochemistry and Molecular Biology
Received Date: 10 February 2014
Revised Date: 28 March 2014
Accepted Date: 1 April 2014
Please cite this article as: Ahn, S.-J., Dermauw, W., Wybouw, N., Heckel, D.G., Van Leeuwen, T., Bacterial origin of a diverse family of UDP-glycosyltransferase genes in the Tetranychus urticae genome, Insect Biochemistry and Molecular Biology (2014), doi: 10.1016/j.ibmb.2014.04.003.
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
[Graphical abstract: cladogram of Arthropoda showing Chelicerata (Arachnida: Acari, with Acariformes and Parasitiformes; Araneae; Scorpiones) and Mandibulata (Myriapoda, Hexapoda, Crustacea), together with Bacteria. Annotated events: "UGT lost" and "UGT gained (horizontal gene transfer)". Legend: no UGT; animal-type UGTs; bacterial-type UGTs. The two-spotted spider mite's UGT gene family is shown as a phylogenetic tree of the T. urticae UGT genes (tetur gene IDs paired with UGT names, with node support values), grouped into families UGT201, UGT202, UGT203, UGT204, UGT205, UGT206 and UGT207; scale bar 0.2.]
Bacterial origin of a diverse family of UDP-glycosyltransferase genes in the Tetranychus urticae genome

Seung-Joon Ahn1, 2, *, Wannes Dermauw3, Nicky Wybouw3, David G. Heckel1, Thomas Van Leeuwen3, 4

1 Department of Entomology, Max Planck Institute for Chemical Ecology, 07745 Jena, Germany
2 National Institute of Horticultural and Herbal Science, Rural Development Administration, 441-440 Suwon, Korea
3 Department of Crop Protection, Faculty of Bioscience Engineering, Ghent University, B-9000 Ghent, Belgium
4 Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, 1098 XH Amsterdam, The Netherlands
* Corresponding author: Seung-Joon Ahn (S.-J. Ahn)
Department of Entomology, Max Planck Institute for Chemical Ecology, 07745 Jena, Germany
Tel.: +49 3641 571555; fax: +49 3641 571502. E-mail address: [email protected]

Abstract
UDP-glycosyltransferases (UGTs) catalyze the conjugation of a variety of small lipophilic molecules with uridine diphosphate (UDP) sugars, altering them into more water-soluble metabolites. Thereby, UGTs play an important role in the detoxification of xenobiotics and in the regulation of endobiotics. Recently, the genome sequence was reported for the two-spotted spider mite, Tetranychus urticae, a polyphagous herbivore damaging a number of agricultural crops. Although various gene families implicated in xenobiotic metabolism have been documented in T. urticae, UGTs so far have not. We identified 80 UGT genes in the T. urticae genome, the largest number of UGT genes in a metazoan species reported so far. Phylogenetic analysis revealed that lineage-specific gene expansions increased the diversity of the T. urticae UGT repertoire. Genomic distribution, intron-exon structure and structural motifs in the T. urticae UGTs were also described. In addition, expression profiling after host-plant shifts and in acaricide resistant lines supported an important role for UGT genes in xenobiotic metabolism. Expanded searches of UGTs in other arachnid species (Subphylum Chelicerata), including a spider, a scorpion, two ticks and two predatory mites, unexpectedly revealed the complete absence of UGT genes. However, a centipede (Subphylum Myriapoda) and a water flea and a crayfish (Subphylum Crustacea) contain UGT genes in their genomes similar to insect UGTs, suggesting that the UGT gene family might have been lost early in the Chelicerata lineage and subsequently re-gained in the tetranychid mites. Sequence similarity of T. urticae UGTs and bacterial UGTs and their phylogenetic reconstruction suggest that spider mites acquired UGT genes from bacteria by horizontal gene transfer. Our findings show a unique evolutionary history of the T. urticae UGT gene family among other arthropods and provide important clues to its functions in relation to detoxification and thereby host adaptation.
Keywords: Tetranychus urticae; UDP-glycosyltransferase; Detoxification; Horizontal gene transfer; Arthropoda; Chelicerata
Abbreviations: aaID, amino acid identity; CDS, coding sequences; EGT, ecdysteroid UDP-glycosyltransferase; GT, glycosyltransferase; HGT, horizontal gene transfer; TM, transmembrane domain; TSA, transcriptome shotgun assembly; UDP, uridine diphosphate; UGT, UDP-glycosyltransferases; WGS, whole-genome shotgun contigs

1. Introduction
Glycosyltransferases (GTs) (EC 2.4.x.y) are ubiquitous across all kingdoms of life and catalyze the transfer of sugar moieties from activated donor molecules to a variety of acceptor molecules, such as carbohydrates, proteins, lipids, nucleic acids, antibiotics and other small molecules (Lairson et al., 2008). As of March 2014, 95 families of GTs have been identified (GT1-GT95) and classified hierarchically according to the stereochemistry of the substrates and reaction products (http://www.cazy.org) (Lombard et al., 2013). Among these, GT1, often referred to as UDP-glycosyltransferases (UGTs), is the largest family containing the majority of GT genes. In Arabidopsis thaliana, Caenorhabditis elegans and Drosophila melanogaster, they account for more than 25, 29 and 24% of the total documented GT genes, respectively (Yonekura-Sakakibara and Hanada, 2011).
UGTs are a gene family of GT1 enzymes that catalyze the conjugation of a variety of small lipophilic molecules with uridine diphosphate (UDP) sugars, increasing their solubility in water. Therefore, glycosylation by UGTs plays an important role not only in the detoxification of xenobiotics, but also in the biosynthesis, storage and transport of secondary metabolites. The protein structure is commonly divided into two main parts: the N-terminal domain for aglycone substrate binding and the C-terminal domain for UDP-sugar donor binding (Meech et al., 2012).
UGTs are common in all living organisms including viruses, bacteria, plants and animals. Most baculovirus genomes encode the enzyme ecdysteroid UDP-glycosyltransferase (EGT), which regulates the development of the host insect by glycosylating and inactivating ecdysteroid hormones (Hughes, 2013; O'Reilly, 1995). The plant endornaviruses also contain UGTs in their genomes (Hacker et al., 2005; Song et al., 2013). Bacterial UGTs are involved in the glycosylation of various natural products including antibiotics, and their engineering has been encouraged for pharmacological and industrial applications for many years (Erb et al., 2009; Luzhetskyy and Bechthold, 2008). In vertebrates, UGTs are regarded as a major member of the phase II drug metabolizing enzymes, conjugating a large number of xenobiotics and endobiotics, including many drugs, with UDP-glucuronic acid as a sugar donor (Bock, 2003). Vertebrate UGTs contain an N-terminal signal peptide that is removed following insertion of the proteins into the endoplasmic reticulum (ER), and a C-terminal transmembrane (TM) domain that anchors the protein to the ER membrane with catalytic sites facing the lumen and a tail exposed to the cytosol (Magdalou et al., 2010). In plants, a variety of UGTs play an important role in the biosynthesis and modification of secondary metabolites, thereby enhancing their solubility and stability, and determining their bioactivity. Plant UGTs lack a signal peptide and a TM domain and are thus localized in the cytosol (Bowles et al., 2005). In insects, the significance of glycosylation of small hydrophobic compounds has been overlooked for many years, as it was often regarded as a minor mechanism of enzymatic detoxification compared to others such as cytochrome P450 monooxygenases (P450s), glutathione-S-transferases (GSTs) and carboxyl/cholinesterases (CCEs) (Brattsten, 1988; Després et al., 2007; Smith, 1962). However, recent biochemical and functional studies revealed that the insect UGTs are responsible for the detoxification and sequestration of a variety of plant allelochemicals and insecticides (Ahn et al., 2011; Daimon et al., 2010; Kojima et al., 2010; Lee et al., 2006; Sasai et al., 2009). Recent genome sequencing identified a large collection (>300 genes) of insect UGTs, revealing diverse features of this gene family such as lineage-specific gene diversifications between different insect orders and a gene family (UGT50) conserved throughout the evolution of holometabolous insects (Ahn et al., 2012). However, the UGT family has so far not been studied in arthropods other than insects.
The two-spotted spider mite, Tetranychus urticae (Subphylum Chelicerata, Order Trombidiformes), is one of the most polyphagous herbivores known, and has been documented to feed on more than 1,100 plant species that belong to more than 140 different plant families, including many plants that produce toxic compounds (Jeppson et al., 1975; Migeon and Dorkeld, 2013). In addition, spider mites are major agricultural pests and are the ‘resistance champion’ among arthropods, as they have the most documented instances of resistance to diverse pesticides (Van Leeuwen et al., 2010). The molecular mechanisms underlying the spider mite’s resistance to xenobiotics (pesticides and plant secondary metabolites) are, however, less well understood than those of insects (Van Leeuwen et al., 2010; Yang et al., 2002).
Recently, a draft genome of T. urticae was reported, the first published genome sequence of a chelicerate (Grbić et al., 2011). The availability of the genome sequence has provided a unique opportunity to study the role of gene families involved in xenobiotic metabolism in the spider mite (Dermauw et al., 2013b; Van Leeuwen et al., 2012a; Van Leeuwen et al., 2012b). Characterization of gene families associated with detoxification of xenobiotics is the first step towards a better understanding of how the spider mite copes with noxious compounds (Van Leeuwen et al., 2012b). So far, P450s, GSTs, CCEs, and ATP-binding cassette (ABC) transporters have been studied from a genome-wide perspective (Dermauw et al., 2013a; Grbić et al., 2011), and the importance of these gene families was documented in both insecticide resistance and adaptation to novel hosts (Dermauw et al., 2013b). However, the UGT gene family has not been studied so far, despite its potential importance in the biology of the spider mite.
In this report, we provide a comprehensive analysis of the UGT gene family in T. urticae, which is the first genome-wide characterization among non-insect arthropods. All UGT sequences were annotated in the T. urticae genome and classified according to the current nomenclature system (Mackenzie et al., 2005). Phylogenetic analysis with closely and distantly related organisms revealed that the spider mite UGTs are closely related to bacterial sequences, suggesting horizontal gene transfer. Amino acid sequence alignment and structure prediction further support the bacterial origin. The gene searches were expanded to a wide range of arthropod species to gain an overall insight into the evolution of this gene family. Transcriptome analyses provided a wealth of information on gene expression profiles related to host plant challenge or pesticide resistance status. This study provides not only a baseline that will facilitate functional studies on the roles of T. urticae UGTs in metabolism, detoxification, resistance and host plant adaptation, but also an evolutionary perspective on this gene family in an arthropod-wide context.
2. Materials and Methods
2.1. Identification of UGT genes in the genome of T. urticae
UGT amino acid sequences from insects were used as queries in tBLASTn searches (Altschul et al., 1997) against the T. urticae genome sequence assembly available at the ORCAE genome portal, http://bioinformatics.psb.ugent.be/orcae/overview/Tetur. All hits with an E-value below the threshold of 10⁻¹ were extracted for analysis, and the gene models and EST sequences identified were aligned with genomic scaffolds (London strain) to annotate the complete gene structure using Sequencher (Gene Codes Corporation, MI, USA). In most cases, the predicted gene models were sufficient to recover full-length sequences, but in some cases incomplete or split gene models had to be annotated manually. UGT sequences identified in this study were deposited in GenBank and their accession numbers can be found in Table S5.
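The hit-extraction step can be illustrated with a minimal sketch. It assumes the search results were exported in the standard 12-column BLAST tabular layout, in which the eleventh column holds the E-value; the export format actually used by the authors is not stated, and the function name `filter_hits` and the example rows are hypothetical.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-1  # threshold stated in the text (E-value < 10^-1)


def filter_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Return (query, subject, evalue) tuples for hits with E-value below cutoff.

    Assumption: 12-column BLAST tabular output, E-value in column 11.
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        evalue = float(row[10])
        if evalue < cutoff:
            hits.append((row[0], row[1], evalue))
    return hits


# Hypothetical rows for illustration only (not real search results).
EXAMPLE = (
    "queryUGT\tscaffold_1\t35.2\t400\t250\t5\t1\t400\t1000\t2200\t1e-40\t160\n"
    "queryUGT\tscaffold_7\t24.0\t50\t38\t0\t10\t59\t300\t449\t2.5\t30\n"
)

if __name__ == "__main__":
    for query, subject, evalue in filter_hits(EXAMPLE):
        print(query, subject, evalue)
```

A permissive cutoff such as this keeps weak hits for manual inspection, which fits the manual annotation step described above.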
2.2. UGT genes in other species
The T. urticae UGTs identified in this study and the insect UGTs available from NCBI were used as queries to perform tBLASTn searches against the arthropod and bacteria genome databases, including the whole-genome shotgun contigs (WGS) as well as transcriptome shotgun assemblies (TSA) in NCBI restricted to the following subphyla of arthropods: Chelicerata (mites and ticks, scorpions, spiders), Myriapoda (centipede) and Crustacea (water flea, crayfish). Repeated BLAST searches were also conducted in other databases, such as VectorBase (https://www.vectorbase.org; Megy et al., 2012), BCM-HGSC (https://www.hgsc.bcm.edu/arthropods/i5k-pilot), and Ensembl Metazoa (http://metazoa.ensembl.org/index.html). In addition, a transcriptome (SRA) data set of Panonychus citri (Family Tetranychidae, the citrus red mite), deposited in EMBL-EBI (http://www.ebi.ac.uk/ena/data/view/ERP000885) (Liu et al., 2011), was assembled with CLC Genomics Workbench (CLC Bio, Qiagen, Denmark) to make it ‘BLASTable’ for orthologous UGT searches. Accession numbers of UGT protein sequences used in this study can be found in Table S5.
2.3. Nomenclature
According to the current UGT nomenclature guidelines (Mackenzie et al., 2005), the UGT genes were named with the following criteria: the gene symbol UGT, a family number, a subfamily letter, and an individual gene number. Families are defined as sharing 40% or more amino acid sequence identity (aaID) and subfamilies are defined at 60% aaID or greater. Multiple sequence alignment was performed with ClustalW and adjusted manually. Preliminary grouping was done using the program CD-HIT (Li and Godzik, 2006) at 60% and 40% sequence identity as cut-off values, and preliminary family and subfamily names were assigned on this basis. A maximum likelihood tree was constructed from the sequence alignment using MEGA 5.2 (Tamura et al., 2011) and plotted using the preliminary names. Groups were examined for consistency, and groups on the borderline of 40% or 60% were examined using pairwise p-distances calculated by MEGA 5.2. In a few cases, the family criterion of 40% was difficult to apply due to some pairwise comparisons being 41-42% while others were 38-39%, and the family criterion was relaxed to 37-39% if doing so created a coherent group after maximum likelihood phylogenetic analysis. Preliminary names were re-assigned and the entire process was repeated. Partial sequences were examined to ensure that they were not incorrectly grouped. The nomenclature reported here was approved by the UGT Nomenclature Committee and is recorded on the Committee’s website (http://www.flinders.edu.au/medicine/sites/clinicalpharmacology/ugt-homepage.cfm).
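The 40% and 60% aaID thresholds can be illustrated with a short sketch. It is not the CD-HIT or MEGA procedure used in the study: it simply computes percent identity over the gap-free columns of two pre-aligned sequences (equivalent to 100 × (1 − p-distance) under that gap treatment, which is an assumption) and applies the two cut-offs. The function names and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def relationship(aaid, family_cutoff=40.0, subfamily_cutoff=60.0):
    """Apply the family (>= 40%) and subfamily (>= 60%) identity criteria."""
    if aaid >= subfamily_cutoff:
        return "same subfamily"
    if aaid >= family_cutoff:
        return "same family"
    return "different family"


if __name__ == "__main__":
    # Toy aligned fragments, for illustration only.
    identity = percent_identity("ACDE-G", "ACDFHG")
    print(identity, relationship(identity))
```

Borderline values, such as the 38-39% comparisons mentioned above, still need the manual review against the phylogenetic tree that the text describes; a fixed threshold alone would split such groups.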
2.4. Phylogenetic analysis
Deduced amino acid sequences were aligned using MUSCLE (Edgar, 2004). Model selection was done with ProtTest 2.4 (Abascal et al., 2005). According to the Akaike information criterion, the models LG+I+G+F, LG+I+G+F, and LG+G+F were optimal for phylogenetic analysis of T. urticae (Fig. 2), Tetranychidae (Fig. S1), and arthropod and bacterial UGTs (Fig. 5), respectively. Maximum-likelihood analyses were performed using Treefinder (Jobb et al., 2004) with edge support calculated by 500 pseudoreplicates (LR-ELW). Resulting phylogenetic trees were visualized using MEGA 5.2 (Tamura et al., 2011) and further edited with Adobe Illustrator CS2 (Adobe Systems, USA).
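As a reminder of how the Akaike information criterion ranks candidate models, the sketch below uses the standard definition AIC = 2k − 2 ln L, where k is the number of free parameters and ln L the maximized log-likelihood; the model with the lowest AIC is preferred. The log-likelihoods and parameter counts in the example are invented for illustration and are not values from ProtTest or from this study.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def best_model(fits):
    """Return the model name with the lowest AIC.

    fits maps a model name to a (log_likelihood, n_params) tuple.
    """
    return min(fits, key=lambda name: aic(*fits[name]))


# Invented numbers, for illustration only.
EXAMPLE_FITS = {
    "LG+G+F": (-1000.0, 30),
    "LG+I+G+F": (-995.0, 31),
}

if __name__ == "__main__":
    for name, (lnl, k) in EXAMPLE_FITS.items():
        print(name, aic(lnl, k))
    print("preferred:", best_model(EXAMPLE_FITS))
```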
2.5. Primary structure prediction
Multiple alignments of deduced protein sequences were performed with ClustalW, and structural domains such as the UGT signature motif were detected by comparison with other sequences whose primary structures have been characterized. Graphical logos for the signature motif of different groups were generated using the WebLogo application available at http://weblogo.berkeley.edu (Crooks et al., 2004). An N-terminal signal peptide and a C-terminal transmembrane domain were searched for with SignalP 4.1 (Petersen et al., 2011) and TMHMM 2.0 (http://www.cbs.dtu.dk/services/TMHMM), respectively.
2.6. Genomic distribution of T. urticae UGTs
Genomic scaffolds harboring the UGT genes were collected from 640 scaffolds and the genes were mapped on each scaffold, describing gene orientation and relative position. The genomic location for the UGTs identified was resolved by the alignment of their coding sequences (CDS) with scaffold sequences.
2.7. Intron mapping
Intron positions of the T. urticae UGTs were identified by aligning each CDS to the corresponding genomic scaffold. The splicing site phases are defined as follows: a phase 0 splicing site lies between two codons, a phase 1 site lies one base inside a codon in the 3’ direction, and a phase 2 site lies two bases inside a codon in the 3’ direction.
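The phase definition above reduces to simple arithmetic: the phase of an intron equals the number of coding bases upstream of it, modulo 3. The sketch below assumes the exon lengths count coding bases only and that the CDS starts at the first base of the start codon; the function name and the example lengths are hypothetical and are not taken from any gene in this study.

```python
def intron_phases(coding_exon_lengths):
    """Return the phase (0, 1 or 2) of each intron in a gene.

    coding_exon_lengths lists the coding length of each exon in 5' to 3' order.
    An intron following N coding bases has phase N mod 3:
      0 -> between two codons,
      1 -> after the first base of a codon,
      2 -> after the second base of a codon.
    """
    phases = []
    coding_bases_so_far = 0
    for length in coding_exon_lengths[:-1]:  # no intron after the last exon
        coding_bases_so_far += length
        phases.append(coding_bases_so_far % 3)
    return phases


if __name__ == "__main__":
    # Hypothetical two-exon gene: 50 coding bases, then the intron.
    print(intron_phases([50, 1300]))
    # Hypothetical three-exon gene.
    print(intron_phases([30, 31, 20]))
```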
2.8. Expression profiling of UGT genes
Expression profiling of T. urticae UGT genes in acaricide resistance, after host plant transfer and diapause induction in spider mites, was assessed using previously published dual-color whole genome gene expression microarrays (Bryon et al., 2013; Dermauw et al., 2013b; Zhurov et al., 2013). Prior to analysis, probe sequences were remapped to the latest annotations of the T. urticae UGT genes using Bowtie2-2.1.0 (Langmead et al., 2009) with default parameters. Seventy-seven UGTs were present on both T. urticae array designs. Limma (Smyth, 2005) was used for final analysis of the dual-color data. Per array design, background intensities were corrected (‘normexp’-method with an offset of 50) (Ritchie et al., 2007), followed by within- and between-array normalization (‘loess’- and ‘Aquantile’-method, respectively). Intraspot correlations were implemented for the linear modeling of the data with the 033850 array design (Smyth and Altman, 2013). The design on which the linear models were fitted comprised all data being compared to one common reference: the T. urticae London strain on bean at standard laboratory conditions (25°C, 60% RH). Significant differential expression was assessed by an empirical Bayes approach with cut-offs for the Benjamini-Hochberg corrected p-values and log2FC at 0.05 and 1, respectively. The RNA-Seq dataset consisted of replicated RNA-Seq libraries of spider mites (larvae) feeding on different host plants (Arabidopsis, tomato and bean) for 12 hr and a single RNA-Seq library for different developmental stages of spider mites (embryo, larvae, nymph and adult). Experimental details can be found in Grbić et al. (2011) and the RNA-Seq data are available via Gene Expression Omnibus under reference GSE32342. To ensure the best possible alignment of RNA-Seq reads to our manually annotated UGT gene models, we re-mapped the RNA-Seq reads to the spider mite genome as previously described (Dermauw et al., 2013a). Expression quantification was performed as described in Grbić et al. (2011).

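The final thresholding step can be sketched as follows. This is not limma and does not reproduce its moderated statistics; it only shows a plain Benjamini-Hochberg adjustment followed by the two cut-offs quoted above. Treating the log2FC cut-off as an absolute value (|log2FC| ≥ 1) and the p-value cut-off as strict (< 0.05) are assumptions, and the gene labels and numbers are invented.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted values
    # are monotone non-decreasing in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differentially_expressed(genes, pvalues, log2fc, p_cutoff=0.05, fc_cutoff=1.0):
    """Genes with adjusted p < p_cutoff and |log2FC| >= fc_cutoff (assumed form)."""
    adjusted = benjamini_hochberg(pvalues)
    return [
        gene
        for gene, q, fc in zip(genes, adjusted, log2fc)
        if q < p_cutoff and abs(fc) >= fc_cutoff
    ]


# Invented values, for illustration only.
GENES = ["geneA", "geneB", "geneC", "geneD"]
PVALUES = [0.001, 0.02, 0.03, 0.5]
LOG2FC = [2.0, 0.5, -1.5, 3.0]

if __name__ == "__main__":
    print(benjamini_hochberg(PVALUES))
    print(differentially_expressed(GENES, PVALUES, LOG2FC))
```

In this toy example geneB passes the p-value cut-off but not the fold-change cut-off, while geneD shows a large fold change without statistical support, so neither is reported.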
3. Results and discussion
3.1. Identification and phylogenetic analysis of the T. urticae UGTs
The T. urticae genome contains a total of 80 putative UGT genes including five pseudogenes (Table 1). This is the largest UGT repertoire found in any animal genome sequenced so far, including several insects (Ahn et al., 2012), vertebrates (Bock, 2003; Huang and Wu, 2010) and other non-insect arthropods (this study; Fig. 1).
3.1.1. Nomenclature of T. urticae UGT genes. The UGT Nomenclature Committee has assigned systematic names to UGT families: families UGT1-50 are for animals; UGT51-70 for yeasts and fungi; UGT71-100 for plants; and UGT101-200 for bacteria (Mackenzie et al., 1997, 2005; Ross et al., 2001). According to the list of UGT sequences posted by the nomenclature committee (https://www.flinders.edu.au/medicine/sites/clinical-pharmacology/ugt-homepage.cfm), the animal UGTs are further divided as follows: families UGT1-8 are used for mammals; UGT9-27 for a nematode (Caenorhabditis elegans); and UGT31-50 for insects or insect viruses. All of the T. urticae UGTs identified in this study were assigned to seven new families, UGT201 to 207, as approved by the Committee. Since they are different from any animal UGTs including those of insects, but closer to a group of bacterial UGTs (UGT108s) (average aaID = 30%, Table S1), we further considered bacterial genes for comparisons. Among many other bacterial UGTs deposited in NCBI, 18 UGTs were identified that clustered with the T. urticae UGTs, but had not yet received official names. On the Committee homepage, there are 21 bacterial UGTs that had already been given official names in families UGT101-107 (http://www.flinders.edu.au/medicine/sites/clinical-pharmacology/ugt-homepage.cfm). None of them, however, clustered with the 17 unnamed bacterial UGTs closely related to the T. urticae UGTs. Thus, these bacterial UGTs were grouped into a new family named UGT108.
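The naming scheme and the family-number ranges listed above can be made concrete with a small sketch. The regular expression is an assumption based on the names that appear in this manuscript (for example UGT201B4p and UGT201A2v1, where a trailing "p" marks a pseudogene and "v1"/"v2" mark variants); it is not an official parser from the Nomenclature Committee, and the function names are hypothetical.

```python
import re

# Family-number ranges as listed in the text.
FAMILY_RANGES = [
    (1, 50, "animals"),
    (51, 70, "yeasts and fungi"),
    (71, 100, "plants"),
    (101, 200, "bacteria"),
]

# Assumed name layout: UGT + family number + subfamily letter(s) + gene number,
# optionally followed by a variant tag (v1, v2, ...) and a pseudogene "p".
NAME_PATTERN = re.compile(r"^UGT(\d+)([A-Z]+)(\d+)(v\d+)?(p?)$")


def parse_ugt_name(name):
    """Split a UGT name into family, subfamily, gene number, variant, pseudogene flag."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognised UGT name: {name!r}")
    family, subfamily, gene, variant, pseudo = match.groups()
    return {
        "family": int(family),
        "subfamily": subfamily,
        "gene": int(gene),
        "variant": variant,
        "pseudogene": pseudo == "p",
    }


def family_group(family):
    """Map a family number to the taxonomic group of the listed ranges."""
    for low, high, group in FAMILY_RANGES:
        if low <= family <= high:
            return group
    return "outside the listed ranges"


if __name__ == "__main__":
    for name in ("UGT201B4p", "UGT201A2v1", "UGT202A16"):
        parsed = parse_ugt_name(name)
        print(name, parsed, family_group(parsed["family"]))
    print(108, family_group(108))
```

Run on the names in this study, the sketch shows that families UGT201 to UGT207 fall outside all four listed ranges, which is consistent with their assignment as new families.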
3.1.2. Phylogenetic analysis. A phylogenetic analysis supports the classification of the T. urticae UGTs into the seven distinct families, UGT201-UGT207 (Fig. 2). UGT201, the largest family, is composed of 36 UGTs, further divided into 8 subfamilies according to the 60% aaID rule from the nomenclature guidelines (Mackenzie et al., 2005). This gene family is not only the largest, but also the most diversified one, especially with the recently expanded subfamilies UGT201A (8 genes) and UGT201B (14 genes) (Fig. 2, Table 1). Three out of the five pseudogenes identified among the T. urticae UGTs are found in these two subfamilies (UGT201A3p, UGT201A7p, UGT201B4p), disrupted by transposable elements (TEs) or repetitive sequences. Subfamilies UGT201C, UGT201D, UGT201F and UGT201G are composed of 3, 2, 3, and 3 duplicated genes, respectively, whereas UGT201E and UGT201H are single-gene subfamilies. In addition, there are three partial sequences (UGT201A2v1, UGT201A2v2 and UGT201A8) that could not be completely annotated due to genomic gaps on the scaffolds. As UGT201A2v1 and UGT201A2v2 are currently located within different scaffolds, we decided to regard them as separate, but variant, sequences until the two scaffolds are combined. The other partial sequence (UGT201A8) lacks only 22 nucleotides at the end encoding the C-terminus due to a short gap on the scaffold. Furthermore, UGT201F2 contains a nonsense mutation resulting in premature termination of translation in the middle of the gene, and hence in production of a 261 aa-long truncated protein. However, in the genome of the Montpellier strain, another T. urticae strain that was resequenced (Grbić et al., 2011), the CDS is restored by a GAA codon (coding for glutamate) instead of TAA, suggesting that these sequences are variants among different populations rather than a pseudogene. The truncated protein lacks the predicted UDP-sugar binding domain at the C-terminus, but whether it can bind the aglycone substrate is unknown.
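The effect of a single-base difference between a TAA stop codon and a GAA glutamate codon can be shown with a short translation sketch. The codon table is the standard genetic code written in the usual TCAG order; the two short sequences are made up for illustration and are not the UGT201F2 sequence.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order for each position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate a coding sequence, stopping at the first in-frame stop codon."""
    protein = []
    for start in range(0, len(cds) - len(cds) % 3, 3):
        amino_acid = CODON_TABLE[cds[start:start + 3].upper()]
        if amino_acid == "*":
            break  # premature or natural stop: translation ends here
        protein.append(amino_acid)
    return "".join(protein)


# Made-up sequences differing by one base (TAA versus GAA in the third codon).
WITH_STOP = "ATGGCTTAAGGT"
RESTORED = "ATGGCTGAAGGT"

if __name__ == "__main__":
    print(translate(WITH_STOP))  # truncated product
    print(translate(RESTORED))   # read-through product
```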
UGT202, the second largest family, consists of 17 UGT genes classified into two subfamilies, UGT202A and UGT202B. Recent lineage-specific gene expansion appears to have occurred in the UGT202A subfamily, which has diversified into 16 closely related sequences including two pseudogenes, UGT202A13p and UGT202A14p. The former is interrupted by several mutations causing frame shifts, whereas the latter is disrupted by a 405 bp-long inserted sequence in the middle of exon 2. On the other hand, UGT202A6 contains a nonsense mutation resulting in a premature stop codon (TAA) in the middle of exon 2 (producing a 271 aa-long truncated protein). In the Montpellier strain, however, the intervening termination codon is replaced by a TCA codon (coding for serine) and putatively leads to an intact protein sequence, as in the case of UGT201F2 above, and a non-synonymous mutation site is also found, suggesting that these sequences are variants derived from different populations. UGT202B is composed of a single gene, UGT202B1, which is distantly related to the others in this family.
The UGT203, UGT204 and UGT205 families are composed of 11, 8, and 6 genes, respectively. UGT203 is diversified into 7 subfamilies, whereas UGT204 and UGT205 each consist of 3 subfamilies. UGT204C1, which is classified into UGT204 with a low bootstrap value (<60), is the most divergent member of this family, suggesting that it may possess a unique function.
Finally, UGT206 and UGT207 are single-gene families that are positioned on separate branches in the phylogenetic tree, suggesting that they have followed evolutionarily independent paths without recent diversification. Although they are given different family names according to the nomenclature rule (i.e. <40% aaID), they seem to share a common ancestor with the UGT203 subfamilies, as shown in the phylogenetic analysis (Fig. 2).
3.2. Characterization of sequence structures
3.2.1. Signature sequence. The UGT signature motif is a hallmark of the UGT superfamily in all kingdoms of life and is thought to be involved in the binding of the UDP moiety of the sugar donor (Mackenzie et al., 1997). Multiple alignments of the amino acid sequences of all the T. urticae UGTs revealed that the common UGT signature sequence is positioned in the C-terminal domain (Fig. S2(B)), where some highly conserved amino acids are found in positions 6 (Q), 13-14 (VD), 17-23 (ITHGGNN), 27 (E), 32-34 (GKP), 36-37 (IV), 39 (P) and 43-44 (DQ) (Fig. S2(A)). The signature motif, composed of 44 amino acid residues, is regarded as donor binding domain 1 (DBD1) according to the human UGT model (Miley et al., 2007), in which specific amino acid interactions with UDP-glucuronic acid (a sugar donor in mammalian UGT systems) are identified. For example, 27 (E) forms a hydrogen bond with an oxygen of ribose; 18 (T) and 19 (H) interact with phosphates; 43 (D) and 44 (Q) form hydrogen bonds with oxygens of the sugar moiety (Radominska-Pandya et al., 2010). Similarly, the consensus residues identified in the plant UGT model are also found in the signature motif of T. urticae UGTs (Caputi et al., 2012).
However, in spite of such a high degree of conservation in the signature sequence, there seems to be some variation among different taxa. Comparison of the graphical alignments restricted to the 44 amino acids from bacterial UGT108s, insects, the crustacean Daphnia pulex, the centipede Strigamia maritima and T. urticae UGTs showed not only sequence conservation among different taxa, but also the presence of taxon-specific residues in the signature motif (Fig. S3). The motif of T. urticae UGTs showed higher similarity with that of bacterial UGT108s than with that of other arthropods, with bacteria-spider mite consensus residues particularly at 6 (Q), 13 (V), 14 (D), 16 (V), 22 (N), 23 (N), 27 (E) and 33 (K), whereas insect UGTs are closer to those of the water flea and centipede in this motif.
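The conserved positions reported for the 44-residue signature motif can be encoded directly and used to screen a candidate motif. In the sketch below the conserved residues are those listed in the text for the T. urticae UGTs; the placeholder motif (conserved residues with "x" at all other positions) and the single-residue mutant are artificial constructs for illustration, not real sequences.

```python
MOTIF_LENGTH = 44

# Conserved blocks from the text, keyed by their 1-based start position.
CONSERVED_BLOCKS = {
    6: "Q",
    13: "VD",
    17: "ITHGGNN",
    27: "E",
    32: "GKP",
    36: "IV",
    39: "P",
    43: "DQ",
}

# Expand the blocks to a position -> residue map (1-based positions).
CONSERVED = {
    start + offset: residue
    for start, block in CONSERVED_BLOCKS.items()
    for offset, residue in enumerate(block)
}


def mismatches(motif):
    """Return the 1-based conserved positions at which the motif deviates."""
    if len(motif) != MOTIF_LENGTH:
        raise ValueError("signature motif must be 44 residues long")
    return [pos for pos, residue in sorted(CONSERVED.items()) if motif[pos - 1] != residue]


# Artificial motif: conserved residues in place, 'x' as filler elsewhere.
PLACEHOLDER = "".join(CONSERVED.get(pos, "x") for pos in range(1, MOTIF_LENGTH + 1))
# Artificial mutant with the histidine at position 19 replaced by alanine.
MUTANT = PLACEHOLDER[:18] + "A" + PLACEHOLDER[19:]

if __name__ == "__main__":
    print(PLACEHOLDER)
    print(mismatches(PLACEHOLDER), mismatches(MUTANT))
```

A real comparison across taxa would use aligned motifs extracted from the full-length sequences, as in the sequence logos of Fig. S3.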
3.2.2. Absence of signal peptide and transmembrane domain in T. urticae UGTs In animals, the 287
N-terminal end of the UGT contains a signal peptide that mediates the integration of the protein precursor 288
into the endoplasmic reticulum (ER) compartment. The signal peptide is subsequently cleaved and the 289
protein is further N-glycosylated. The mature protein is retained in the ER membrane by its hydrophobic 290
transmembrane (TM) domain at the C-terminal end, followed by a short cytoplasmic tail (Magdalou et al., 291
2010). However, such a signal peptide was not detected in the T. urticae UGTs examined by SignalP 4.1 292
(Petersen et al., 2011), suggesting that the proteins are probably not oriented within the ER. In addition, 293
the TM domain and the subsequent tail at the C-terminal end were also not found in the T. urticae UGTs, 294
as predicted by TMHMM 2.0 (http://www.cbs.dtu.dk/services/TMHMM), except for UGT202A15 and 295
UGT202A16. These latter two predicted TM regions were located not at the C-terminal end, but in the 296
middle of the sequences: the predicted TM was found at AA positions 271 (or 272) to 293 (or 294) in 297
UGT202A15 (or UGT202A16), which is unusual when compared to membrane bound UGTs in animals. 298
In addition, since a predicted TM cannot span the membrane unless the protein carries a signal
peptide at its N-terminus, these two regions with hydrophobic properties similar to a TM helix are likely
not physically inserted into the membrane. The absence of true TM domains indicates that the T. urticae
UGTs are cytosolic enzymes like bacterial UGTs (Ross et al., 2001). These structural differences of T. 302
urticae UGTs suggest that they have experienced a different evolutionary pathway compared to UGTs of 303
other animals. 304
3.2.3. Intron-exon structure Leaving the three partial sequences (UGT201A2v2, UGT201A3p and 305
UGT201A7p) out of consideration, most of the T. urticae UGTs are composed of 2 exons, excepting only 306
UGT202B1 with 3 exons and UGT201G3 with one (Table 1). The intron locations are strongly conserved 307
across all of the 75 two-exon UGTs in position as well as splicing phase. Each intron is inserted between 308
[A/G/S] and [X]-[G]-[H]-[I/L/V/F]-[N/H/Q/L] very early in the N-terminal end. All of the splicing sites 309
are phase 2, such that the intron lies after the second base of the [A/G/S] codon (Fig. 6).
Such a conserved intron-exon organization of T. urticae UGTs strongly suggests recent divergence of the 311
gene family after an intron gain in a common ancestor UGT gene. UGT202B1 has 2 introns as confirmed 312
by RNA-Seq and EST-data 313
(http://bioinformatics.psb.ugent.be/orcae/annotation/Tetur/current/tetur10g02090). The position of the 314
first intron of UGT202B1 is identical to that of the single-intron T. urticae UGTs while its second intron 315
is located in the middle of exon 2, suggesting a more recent gain of the second intron. UGT201G3 is the 316
sole intronless gene among the T. urticae UGTs. Compared to the other two genes in the same subfamily 317
(UGT201G1 and UGT201G2), UGT201G3 lacks the consensus intron which may have been secondarily 318
lost. It is noteworthy that these three UGT201Gs are spread over three different genomic scaffolds (Fig. 319
4), indicating that they are not produced by tandem gene duplication. 320
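The splicing-phase convention used in this section can be illustrated with a short calculation: intron phase follows from the number of coding bases that precede the intron. The Python sketch below is not part of the original analysis. The function names and the example value of 62 upstream coding bases are hypothetical and serve only to show how a phase 2 intron interrupts a codon after its second base.

```python
def intron_phase(coding_bases_upstream: int) -> int:
    """Return the phase (0, 1 or 2) of an intron.

    Phase 0: the intron lies between two codons.
    Phase 1: the intron splits a codon after its first base.
    Phase 2: the intron splits a codon after its second base.
    """
    if coding_bases_upstream < 0:
        raise ValueError("number of coding bases must be non-negative")
    return coding_bases_upstream % 3


def codon_before_intron(coding_bases_upstream: int) -> int:
    """Return the 1-based number of the codon that contains the last
    coding base located upstream of the intron."""
    if coding_bases_upstream < 1:
        raise ValueError("at least one coding base must precede the intron")
    return (coding_bases_upstream - 1) // 3 + 1


if __name__ == "__main__":
    upstream = 62  # hypothetical number of coding bases in exon 1
    print("phase:", intron_phase(upstream))
    print("interrupted codon:", codon_before_intron(upstream))
```

With 62 upstream coding bases, the intron falls after the second base of codon 21, which corresponds to the phase 2 arrangement described for the [A/G/S] codon.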
3.3. Genomic distribution of T. urticae UGTs 322
The 80 T. urticae UGT genes were mapped onto 22 different genomic scaffolds showing an uneven 323
distribution of UGTs in the genome (Fig. 4), although chromosome-wide assembly of the genome is not 324
yet available for T. urticae (Grbić et al., 2011). About 60% of UGTs are arranged in a tandem manner, 325
with 31 UGTs concentrated in 6 different clusters containing more than 3 genes. For example, the largest 326
cluster is found on scaffold 22, where 11 genes of UGT202 family are positioned in the same orientation 327
except one (UGT202A11), suggesting that several gene duplication events have occurred in this cluster 328
(Fig. 4). The other clusters are also composed of tandemly duplicated genes. Two large clusters on
scaffold 5 are composed of multiple genes belonging to the UGT204A and UGT201B subfamilies,
respectively. Similarly, two other UGT201B clusters are found on scaffolds 4 and 7. In contrast, 34 UGTs
are present as singletons distributed over 20 scaffolds. Each of the five pseudogenes is distributed on a 332
different scaffold, and two of them (UGT201A3p and UGT201B4p) are found in different gene clusters, 333
suggesting that they have probably been generated by gene duplication in their own clusters and 334
subsequently lost their functions. The intronless UGT gene (UGT201G3) and the two-intron UGT gene 335
(UGT202B1) are located on different scaffolds devoid of other UGT gene clusters, reflecting the 336
uniqueness of their gene structure among the others (see above, Table 1). The features of this genomic 337
distribution could become clearer when the small scaffolds are further assembled and a chromosome map 338
is completed. The mapping of this large gene family on the T. urticae genomic scaffolds will be useful in 339
assembling the genomes of closely related species in the future. 340
3.4. Expression profiling of T. urticae UGTs 342
We studied the expression profile of UGT genes across development in the T. urticae London reference 343
strain, as well as in larvae feeding on a benign host (bean, Phaseolus vulgaris) and two more challenging 344
hosts (Arabidopsis thaliana and tomato, Solanum lycopersicum), using existing RNA-Seq reads (Grbić et 345
al., 2011). Using previously published microarray data (Bryon et al., 2013; Dermauw et al., 2013b; 346
Zhurov et al., 2013), we further studied in more detail the expression profiles of UGT genes in adult mite
host transplant experiments, in strains that are resistant to multiple pesticides (multi-resistant strains
MR-VP and MAR-AB), and after diapause induction. For both expression approaches, we recalculated UGT gene
expression based on the manually corrected and validated UGT gene models as part of this study (see 350
Materials and Methods section for details). 351
3.4.1. UGT expression analysis using RNA-Seq As assessed by RNA-Seq expression 352
quantification, the majority (81%) of UGT genes were found to be expressed, i.e. 65 of the 80 T. urticae
UGT sequences had an RPKM of >1 in at least one of the spider mite life stages or on one of the plant 354
hosts (Fig. 7). In contrast, all 5 pseudogenes showed extremely low or no expression. Most full-length T. 355
urticae UGT genes for which we did not detect expression belonged to either the UGT201 (7 genes), 356
UGT202 (7) or UGT203 (1) families. Whether these T. urticae UGT genes are expressed at low levels in 357
highly restricted expression domains, or alternatively are only expressed under specific environmental 358
conditions (e.g., on specific host plants), remains to be determined. As shown in Fig. 7, almost half of the T. urticae
UGT genes (39 genes, or 49%), belonging to multiple families, were expressed across all life stages
analyzed (embryos, larvae, nymphs and adults). Furthermore, 8 genes (UGT201G1, UGT202B1, 361
UGT203A3, UGT203E1, UGT205A3, UGT205B1, UGT206A1, UGT207A1) showed very high 362
expression in all stages (RPKM>10) (Fig. 7). However, the larval stage had the highest number of 363
expressed UGTs (58 genes) followed by nymphal (55), adult (46) and embryonic stage (46). All members 364
of the UGT205 family consistently showed relatively higher expression levels in developmental stages. 365
Most of the 46 UGTs expressed in embryos kept high expression levels at the following stages. However, 366
a few UGTs that were not expressed in the embryo considerably increased their expression level in
other stages, for example UGT201B1, UGT201D1 and UGT201E1 in the larval stage, and UGT201A1, UGT201C3
and UGT204A3 in the nymphal stage.
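The expression criterion used in Section 3.4.1 (RPKM > 1 in at least one life stage or host) can be sketched as follows. This is a minimal illustration using the standard RPKM definition (reads per kilobase of transcript per million mapped reads), not the pipeline used in this study. The function names, sample labels and read counts are hypothetical.

```python
def rpkm(read_count: float, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def is_expressed(rpkm_by_sample: dict, threshold: float = 1.0) -> bool:
    """A gene counts as expressed if its RPKM exceeds the threshold
    in at least one sample (life stage or host plant)."""
    return any(value > threshold for value in rpkm_by_sample.values())


if __name__ == "__main__":
    # Hypothetical gene of 1,500 bp in libraries of 20 million mapped reads.
    samples = {
        "embryo": rpkm(12, 1500, 20_000_000),
        "larva": rpkm(150, 1500, 20_000_000),
    }
    print(samples)
    print("expressed:", is_expressed(samples))
```

In this hypothetical case the gene falls below the threshold in embryos but passes it in larvae, so it would be counted among the expressed genes.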
3.4.2. UGT expression profile in xenobiotic metabolism Fig. 8 shows the transcriptional response 370
of UGTs in adult spider mites when challenged by alternative hosts (response to plant allelochemicals) as 371
well as after developing resistance to acaricides (Dermauw et al., 2013b; Zhurov et al., 2013). As was first 372
suggested by RNA-Seq analysis (Fig. 7), the microarray derived expression heat map (Fig. 8) reveals that 373
UGT expression changes considerably in association with xenobiotic challenge and acaricide resistance in adult T.
urticae mites. To further examine the putative role of T. urticae UGTs in xenobiotic metabolism,
significant differential expression was assessed using a log2 of the absolute fold change (FC) ≥ 1 and a
Benjamini-Hochberg-corrected p-value < 0.05 as cut-offs. Microarray analysis revealed a significant differential expression of
9 and 10 UGT genes in the multi-resistant strains MR-VP and MAR-AB, respectively (Table S2). Of 378
these UGTs, 7 were differentially expressed in both multi-resistant strains and showed a positive 379
correlation in expression. Members of the UGT204 family (UGT204B1 and UGT204B2) showed the 380
highest up-regulation in the two resistant T. urticae strains (Fig. 8, Table S2). Interestingly, three 381
pseudogenes (UGT201A3p, UGT201A7p and UGT202A14p) out of five were transcriptionally active and were
differentially expressed in a number of resistant strains (Fig. 8), with UGT201A7p notably showing
high up-regulation in MAR-AB (Table S2). This suggests that these are
probably not pseudogenes in all strains. Furthermore, the induction of UGT transcription in adults after
transfer from bean to the more challenging hosts, tomato and Arabidopsis, is clear as evidenced by the 386
overview presented in Fig. 8. This induction was confirmed by the significance analysis (Table S3). Nine 387
UGTs significantly altered their transcription when T. urticae adults were transferred to tomato for 12 hrs. 388
Of these, 7 UGTs remained differentially expressed after T. urticae was grown on tomato for 5 389
consecutive generations, and UGT204 family members (UGT204A2 and UGT204A5) were again the 390
most up-regulated. The total number of differentially expressed UGTs increased to 22 after 5 generations 391
on tomato and included members of the UGT201, UGT202, UGT203, UGT204 and UGT205 families (Fig.
8, Table S3). UGTs also represented a large proportion of the transcriptional response when adult mites 393
were transferred for 24 hrs from bean to different Arabidopsis lines (wild-type Col-0, mutants qKO and 394
atr1D), which differ in glucosinolate content (Zhurov et al., 2013) (Fig. 8, Table S3). Glucosinolates are 395
thioglucose-based plant defense compounds that release toxic nitrile- or isothiocyanate-based products 396
upon herbivore feeding (Lambrix et al., 2001). The qKO mutant line lacks glucosinolates, and 5 mite 397
UGTs were significantly down-regulated upon transfer to it from bean. In contrast, when mites were 398
transferred to Arabidopsis lines containing glucosinolates, the overall expression level of UGTs increased 399
(Fig. 8, Table S3). Thirteen UGTs changed expression levels when mites were put on Col-0 or on the 400
glucosinolate-overproducing atr1D mutant line, and of those, 8 UGTs increased transcription upon each
transfer. As determined by Zhurov et al. (2013), a considerable number of UGTs (UGT201A8, UGT204A5,
UGT201A2v2, UGT202A4, UGT201A5, UGT204B1 and UGT202A1) showed a dose-dependent relationship
between glucosinolate content in the Arabidopsis line and UGT expression level in T. urticae. In line with 404
previous results, members of the UGT204 family responded most strongly. Last, microarray analysis of 405
facultative reproductive diapause revealed that a large number of UGTs were significantly down-regulated
(Fig. 8, Table S4). This response was specific to the non-feeding diapause state, as feeding,
non-diapausing mites under the same environmental conditions did not exhibit this major down-regulation
(Bryon et al., 2013). Among a total of 35 UGTs differentially expressed in the diapausing stage,
the vast majority were down-regulated, while only 5 genes (UGT201B1, UGT201B2, UGT202A4,
UGT203D1 and UGT203E1) were slightly up-regulated (Fig. 8, Table S4). As argued by Bryon et al.
(2013), this down-regulation probably reflects the fact that diapausing females do not feed.
This pattern was also found for P450 mono-oxygenases and other well-known detoxification enzymes and 413
indirectly supports the role of UGTs in xenobiotic metabolism. 414
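The significance cut-offs used in Section 3.4.2 (a log2 fold change of at least 1 in either direction and a Benjamini-Hochberg-corrected p-value below 0.05) can be sketched in a few lines. The Python code below is a generic illustration of the Benjamini-Hochberg step-up adjustment and is not the microarray pipeline used in this study. The function names and the example values are hypothetical, and treating the fold-change criterion as |log2 FC| ≥ 1 is an assumption about how the cut-off was applied.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values stay monotone (each is the minimum of p * m / rank so far).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def differentially_expressed(log2_fc, pvalues, fc_cutoff=1.0, alpha=0.05):
    """Flag genes with |log2 FC| >= fc_cutoff and adjusted p < alpha."""
    adjusted = benjamini_hochberg(pvalues)
    return [abs(fc) >= fc_cutoff and p < alpha
            for fc, p in zip(log2_fc, adjusted)]


if __name__ == "__main__":
    # Hypothetical values for four genes.
    log2_fc = [2.0, -1.5, 0.5, 3.0]
    pvalues = [0.01, 0.04, 0.03, 0.20]
    print(benjamini_hochberg(pvalues))
    print(differentially_expressed(log2_fc, pvalues))
```

In this hypothetical set only the first gene passes both cut-offs: the second has a raw p-value below 0.05 but an adjusted p-value above it, which shows why the correction matters when many genes are tested at once.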
3.5. Loss of UGT gene family in early Chelicerata lineage 416
Comparative phylogenetic analysis of gene families across different taxa can provide important insights 417
with evolutionary implications. tBLASTn searches against the genome databases of other arthropods 418
revealed that (1) an ancient extinction of the complete gene set seems to have occurred in the early 419
Chelicerata lineage, and (2) the T. urticae UGT gene set belongs to a lineage distinguished from other 420
arthropod UGTs, whereas (3) the UGT gene family has diversified along the evolutionary course of the 421
other arthropod subphyla, such as Myriapoda, Crustacea, and Hexapoda. 422
3.5.1. Subphylum Chelicerata As of March 2014, draft genome sequences of four Acari species
were available in NCBI: Ixodes scapularis (black-legged tick) and Rhipicephalus microplus
(southern cattle tick) (Order Ixodida); and Metaseiulus occidentalis (western orchard predatory mite) and 425
Varroa destructor (honeybee mite) (Order Mesostigmata). Interestingly, no UGT sequences could be 426
detected in any of these genome databases (Fig. 1). Two transcriptome shotgun assembly (TSA) databases 427
of Ixodes ricinus (castor bean tick) and Rhipicephalus pulchellus (a cattle tick) were additionally searched 428
by BLAST, but no UGT sequences were detected. However, two UGT-like EST sequences
(EW856022 and EW856023, 861 and 848 bp long, respectively) were retrieved from the EST library of I.
scapularis. These two EST sequences did not map to any scaffolds of the I. scapularis genome assembly, which
consists of 369,492 contigs totaling 1.76 Gbp and covers 84% of the estimated genome size (2.1 Gbp)
(Hill, 2010). In addition, sequences homologous to I. scapularis EW856022 and EW856023 were
not detected in the TSA of a congeneric species, I. ricinus. Taken together, this might indicate that these two
UGT-like EST sequences either map to not yet sequenced genomic regions of I. scapularis or derive from
bacterial contaminants whose genomes have not yet been sequenced.
Nevertheless, such a complete absence of UGTs in the non-phytophagous Acari species motivated us to 437
extend our searches into other groups of Arachnida, including a scorpion (Mesobuthus martensii) (Order 438
Scorpiones) and a spider (Parasteatoda tepidariorum) (Order Araneae). Repeated BLAST searches
against the two genome databases failed to identify any UGT-like traces in their large genome
assemblies of 1.13 Gbp (86% coverage) and 1.44 Gbp (84% coverage), respectively (GenBank
Assembly IDs: GCA_000484575.1 and GCA_000365465.1) (Fig. 1). In contrast, 51 UGT-like contigs 442
were identified in the transcriptome of Panonychus citri (citrus red mite), a closely related mite species 443
belonging to the same family Tetranychidae as T. urticae. Alignment and phylogenetic analysis of the 444
protein sequences from both T. urticae and P. citri revealed clear orthologous relationships (Fig. S1). In 445
conclusion, we could detect UGT sequences only in T. urticae and P. citri, but not in other Arachnida 446
species within Chelicerata, suggesting a loss of the UGT gene family in the early Chelicerata lineage and a
later gain in the spider mites at least before the divergence of Tetranychidae (see Section 3.6 for 448
horizontal gene transfer). 449
3.5.2. Subphylum Myriapoda In Myriapoda, another subphylum of Arthropoda, Strigamia maritima (a 450
centipede) is the only species for which a draft genome sequence (176 Mbp) is available (GenBank
Assembly ID: GCA_000239455.1), which accounts for about 60% of the estimated genome size of 293 Mbp.
We identified 20 complete UGT sequences, including one pseudogene, in the S. maritima genome (Table
S5). These are animal-type UGTs (Fig. 1), more similar to insect UGTs than to mite UGTs
(Fig. 1 and Fig. 5). 455
3.5.3. Subphylum Crustacea The draft genome sequence of Daphnia pulex (a water flea) contains 25 456
UGT genes including 2 pseudogenes, all of the animal type (Fig. 1), showing relatively higher
similarity with insect UGTs (e.g. 30-37% aaID with T. castaneum) (see Table S5 for accession numbers
for D. pulex UGT sequences). Similarly, the crayfish Pontastacus leptodactylus also contains multiple
animal-type UGT sequences as identified from a transcriptome shotgun assembly (TSA) available in 460
NCBI (Manfrin et al., 2013). 461
3.5.4. Subphylum Hexapoda A diverse array of UGTs has been previously reported in several
insect species, as described by Ahn et al. (2012), all of which are of the animal type.
In summary, Myriapoda, Crustacea and Hexapoda contain animal UGTs, with varying numbers of gene
members (Fig. 1). Their gene structures (intron/exon) and protein sequence motifs share common features 465
of animal UGTs. In contrast, members of the Chelicerata lack the complete UGT gene set, except for two 466
phytophagous mites including T. urticae that have a distinct UGT repertoire closer to the bacterial type. 467
This comparative analysis with diverse species in Arthropoda shows that the arthropod UGTs might have 468
been once lost early in the Chelicerata lineage (between 700 and 450 mya), whereas the Mandibulata 469
lineage seems to have further diversified its UGT gene family, as species of Myriapoda, Crustacea and 470
Hexapoda have evolved. In Chelicerata, however, two ancient lineages, Pycnogonida (sea spiders) and
Xiphosura (horseshoe crabs), branched off before Arachnida (not shown in Fig. 1). When
molecular information from these two marine chelicerates becomes available, it will allow a better
understanding of exactly when the loss occurred in the Chelicerata lineage. Another open question
is whether UGTs had already formed a multigene family in the last common ancestor of metazoan
animals before their loss or whether UGTs underwent lineage-specific expansions in each separate
taxonomic group after their loss. In the first scenario, the absence of UGTs in Chelicerata (excluding 477
spider mites) requires multiple losses of all UGT gene members in several chelicerate lineages (Araneae, 478
Scorpiones, Ixodida, etc.) which seems unlikely. Instead, as postulated in the second scenario, it is more 479
likely that an ancestral gene(s) of this gene family did not diversify prior to the loss. This is supported by 480
the observation that insect UGTs diversified in a lineage-specific manner at Order level (Ahn et al., 2012). 481
Assuming that such a lineage-specific UGT diversification is also found in arthropods outside the 482
Hexapoda, the loss of UGTs in early Chelicerata might have occurred before the multigene family was 483
formed. This ‘most likely evolutionary scenario’ has important implications for our understanding of the 484
evolution and current distribution of UGTs in arthropods. 485
Unlike other chelicerates, the spider mites have a large number of unique UGTs which are more similar to 486
bacterial ones. This unusual existence of bacterial type of UGTs in the spider mites led us to propose the 487
‘horizontal gene transfer (HGT)’ hypothesis, which is further discussed in the next section. 488
3.6. Horizontal gene transfer of T. urticae UGTs from bacteria 489
Horizontal gene transfer (HGT; also called lateral gene transfer) is the process by which genes move 490
across species boundaries by asexual mechanisms, and is key to understanding the evolution of
prokaryotes and eukaryotes (Boucher et al., 2003). In contrast to the abundant examples of HGT within
prokaryotes (Boucher et al., 2003), fewer cases have been reported in eukaryotes, especially animals
(Andersson, 2005). This is presumably because the reproductive biology of animals is fundamentally
different; however, many such events may have been overlooked due to the routine procedure of weeding
out prokaryote-like sequences considered to be contaminants in the analysis of eukaryotic genomes 496
(Acuña et al., 2012). Nevertheless, the number of examples of bacteria-to-animal HGT is rapidly 497
increasing as more animal genomes are being sequenced (Dunning Hotopp, 2011; Schönknecht et al., 498
2014). 499
3.6.1. Phylogenetic analysis In phylogenetic analyses including arthropod and bacterial UGTs, none of the T.
urticae UGT protein sequences clustered with those of evolutionarily closer species such as a centipede, a
water flea or insects. Instead, they clustered within groups of bacterial phyla, such as Actinobacteria and 502
Chloroflexi (Fig. 5). We identified a single gene most similar to the T. urticae UGTs from each of 15 503
Actinobacteria and 2 Chloroflexi genomes by BLAST searches in NCBI, and constructed a phylogenetic 504
sub-tree together with seven representative T. urticae UGTs, showing a close relationship between the T. 505
urticae UGTs and the bacterial groups supported by a high bootstrap value (Fig. S4).
Actinobacteria constitute one of the largest phyla among Bacteria. They are well known for the 507
production of secondary metabolites and are of high pharmacological and commercial interest. For 508
example, Streptomyces is the largest genus in the Actinobacteria, including over 580 species, mostly
living in soil, and it is especially known as a source of secondary metabolites, such as antibiotics,
antifungals, antiparasitics and anticancer agents (Zhou et al., 2012). Glycosylation is also regarded as one 511
of the important metabolic pathways to produce and further modify such secondary compounds, mostly 512
performed by UGTs. In this perspective, the polyphagous spider mite might have obtained such a potent 513
glycosylation tool-kit from a bacterial source and then increased the diversity of UGTs by gene 514
duplication resulting in enhanced adaptability to environmental conditions, such as various host plants. 515
Interestingly, a recent genome analysis of a plant pathogenic fungus, Botrytis cinerea, revealed that a 516
UGT gene was obtained from plants by HGT, suggesting it may contribute to the evolution of 517
phytopathogenicity in B. cinerea (Zhu et al., 2012). In addition, a similar evolutionary scenario as for 518
UGT genes in T. urticae can be found in plant-parasitic nematodes in which enzymes for the degradation 519
of the plant cell wall have been acquired via HGT and have undergone massive duplications (Danchin et 520
al., 2010; Paganini et al., 2012). In these cases, the enzymes apparently play an important role in the 521
biology of the species. Although no direct role in detoxification of any T. urticae UGT has been reported 522
yet, the pattern of diversification suggests significant ecological consequences of this large gene family in 523
the spider mite. 524
3.6.2. Protein structure comparison Multiple alignments of T. urticae UGT201A1, representative of 525
the majority of T. urticae UGTs, with UGT1A1 from Homo sapiens (vertebrate), UGT40A1 from 526
Bombyx mori (insect) and UGT108A2 from Streptomyces violaceusniger (Actinobacteria) showed a 527
greater structural similarity of the T. urticae UGTs to bacterial proteins (Fig. 3). The common UGT 528
signature motif is positioned in the middle of the C-terminal domain consistently across different organisms.
However, the T. urticae UGT lacks an N-terminal signal peptide which is required to guide the peptide 530
into the ER. In addition, a TM domain and cytoplasmic tail at the C-terminus could not be detected (see
Section 3.2.2). Instead, the primary structure of the T. urticae UGT, as well as the mature protein size,
more closely resembles that of bacterial UGTs (Fig. 3), supporting the notion that the T. urticae UGTs
originated by HGT from a bacterial ancestor. 534
3.6.3. Other HGT examples in T. urticae UGTs are not the only examples of horizontal gene 535
transfer in T. urticae; recent genome sequencing has revealed several other cases (Grbić et al., 2011). For 536
example, intradiol ring-cleavage dioxygenases (ID-RCDs) have been suggested to originate from fungi 537
(Dermauw et al., 2013b), and a cobalamin-independent methionine synthase (MetE) gene and two
β-fructofuranosidase genes may have been transferred from Bacteria (Grbić et al., 2011). In addition, two
clusters of carotenoid biosynthesis genes (Grbić et al., 2011; Bryon et al., 2013) and a cyanase gene
(Wybouw et al., 2012) are also highly likely to have been laterally transferred, but it is still unclear from 541
which organisms they were obtained. These examples, including UGTs in this study, indicate that HGT 542
has occurred quite frequently and diversely in T. urticae, contributing putative selective advantages after 543
the gene transfer such as effective detoxification or higher host plant adaptability. As suggested in a
previous report (Wybouw et al., 2012) and reinforced by this study (see Section 3.5.1), these ancient gene transfers
might have occurred before the divergence of the family Tetranychidae, as was
also shown in the case of UGTs.
To summarize, an unexpectedly large number of UGT genes were detected in T. urticae and these genes 548
most likely originate from an ancestral HGT event from bacteria, a very rare event in eukaryotes 549
(Andersson, 2005). Orthologues identified in a mite relative, P. citri, confirmed that HGT occurred at 550
least before the diversification of the family Tetranychidae. Furthermore, structural similarity of T. 551
urticae and bacterial UGT protein sequences also supported this hypothesis. 552
3.7. Conclusions 554
In this study, we annotated the UGT gene family of a non-insect arthropod, the spider mite T. urticae. 555
Transcriptome analysis after host-plant shifts and in acaricide resistant lines has provided abundant 556
information on the dynamics of expression, supporting an important role for UGT genes in xenobiotic 557
metabolism. Comparative analysis of this gene family in Arthropoda revealed that an ancient loss of all 558
animal-like UGT gene families has probably occurred in the Chelicerata, and that an ancestor of the 559
spider mite regained a bacterial-like gene family probably by horizontal gene transfer. This unusual mode 560
of evolution is partly responsible for the remarkable adaptability of this group of organisms. 561
Acknowledgements 563
We thank Dr. H. Vogel for his technical assistance in the transcriptome assembly and two anonymous 564
reviewers for their valuable comments and suggestions to improve the manuscript. This work was 565
supported by RDA Grant PJ009365 (to SA) and by the Max-Planck-Gesellschaft (to SA and DGH). TVL 566
and WD are post-doctoral fellows of the Fund for Scientific Research Flanders (FWO). This work was
supported by FWO grants 3G061011 and 3G009312, a Ghent University Special Research Fund grant
01J13711, the Government of Canada through Genome Canada, and the Ontario Genomics Institute OGI-046
(to TVL). NW is supported by the Institute for the Promotion of Innovation by Science and
Technology in Flanders (IWT, grant IWT/SB/101451). 571
Competing interests 573
The authors declare they have no competing interests. 574
Authors' contributions 576
SA and TVL designed the research. SA performed annotation and genomic analysis. WD and NW 577
analyzed RNA-Seq and microarray data. DGH provided evolutionary analyses. SA and TVL wrote the 578
manuscript with input from WD, NW and DGH. All authors read and approved the final manuscript. 579
References 581
Abascal, F., Zardoya, R., Posada, D., 2005. ProtTest: Selection of best-fit models of protein evolution. 582
Bioinformatics 21, 2104-2105. 583
Acuña, R., Padilla, B.E., Flórez-Ramos, C.P., Rubio, J.D., Herrera, J.C., Benavides, P., Lee, S.-J., Yeats, 584
T.H., Egan, A.N., Doyle, J.J., Rose, J.K.C., 2012. Adaptive horizontal transfer of a bacterial gene to 585
an invasive insect pest of coffee. Proc. Natl. Acad. Sci. U.S.A. 586
http://dx.doi.org/10.1073/pnas.1121190109. 587
Ahn, S.-J., Badenes-Pérez, F.R., Reichelt, M., Svatoš, A., Schneider, B., Gershenzon, J., Heckel, D.G., 588
2011. Metabolic detoxification of capsaicin by UDP-glycosyltransferase in three Helicoverpa species. 589
Arch. Insect Biochem. Physiol. 78, 104-118. 590
Ahn, S.-J., Vogel, H., Heckel, D.G., 2012. Comparative analysis of the UDP-glycosyltransferase 591
multigene family in insects. Insect Biochem. Mol. Biol. 42, 133-147. 592
Altincicek, B., Kovacs, J.L., Gerardo, N.M., 2012. Horizontally transferred fungal carotenoid genes in the 593
two-spotted spider mite Tetranychus urticae. Biol. Lett. 8, 253-257. 594
Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J., 1997. 595
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic 596
Acids Res. 25, 3389-3402. 597
Andersson, J.O., 2005. Lateral gene transfer in eukaryotes. Cell. Mol. Life Sci. 62, 1182-1197. 598
Bock, K.W., 2003. Vertebrate UDP-glucuronosyltransferases: functional and evolutionary aspects. 599
Biochem. Pharmacol. 66, 691-696. 600
Boucher, Y., Douady, C.J., Papke, R.T., Walsh, D.A., Boudreau, M.E.R., Nesbø, C.L., Case, R.J., 601
Doolittle, W.F., 2003. Lateral gene transfer and the origins of prokaryotic groups. Annu. Rev. Genet. 602
37, 283-328. 603
Bowles, D., Isayenkova, J., Lim, E.-K., Poppenberger, B., 2005. Glycosyltransferases: managers of small 604
molecules. Curr. Opin. Plant. Biol. 8, 254-263. 605
Brattsten, L.B., 1988. Enzymic adaptations in leaf-feeding insects to host-plant allelochemicals. J. Chem. 606
Ecol. 14, 1919-1939. 607
Bryon, A., Wybouw, N., Dermauw, W., Tirry, L., Van Leeuwen, T., 2013. Genome wide gene-expression 608
analysis of facultative reproductive diapause in the two-spotted spider mite Tetranychus urticae. 609
BMC Genomics 14, 815. 610
Caputi, L., Malnoy, M., Goremykin, V., Nikiforova, S., Martens, S., 2012. A genome-wide phylogenetic 611
reconstruction of family 1 UDP-glycosyltransferases revealed the expansion of the family during the 612
adaptation of plants to life on land. Plant J. 69, 1030-1042. 613
Crooks, G.E., Hon, G., Chandonia, J.-M., Brenner, S.E., 2004. WebLogo: a sequence logo generator. 614
Genome Res. 14, 1188-1190. 615
Daimon, T., Hirayama, C., Kanai, M., Ruike, Y., Meng, Y., Kosegawa, E., Nakamura, M., Tsujimoto, G., 616
Katsuma, S., Shimada, T., 2010. The silkworm Green b locus encodes a quercetin 5-O-glucosyltransferase
that produces green cocoons with UV-shielding properties. Proc. Natl. Acad. Sci.
U.S.A. 107, 11471-11476.
Danchin, E.G.J., Rosso, M.-N., Vieira, P., de Almeida-Engler, J., Coutinho, P.M., Henrissat, B., Abad, P., 620
2010. Multiple lateral gene transfers and duplications have promoted plant parasitism ability in 621
nematodes. Proc. Natl. Acad. Sci. U.S.A. 107, 17651-17656. 622
Dermauw, W., Osborne, E., Clark, R., Grbić, M., Tirry, L., Van Leeuwen, T., 2013a. A burst of ABC 623
genes in the genome of the polyphagous spider mite Tetranychus urticae. BMC Genomics 14, 317. 624
Dermauw, W., Wybouw, N., Rombauts, S., Menten, B., Vontas, J., Grbić, M., Clark, R.M., Feyereisen, R., 625
Van Leeuwen, T., 2013b. A link between host plant adaptation and pesticide resistance in the 626
polyphagous spider mite Tetranychus urticae. Proc. Natl. Acad. Sci. U.S.A. 110, E113-E122. 627
Després, L., David, J.-P., Gallet, C., 2007. The evolutionary ecology of insect resistance to plant 628
chemicals. Trends Ecol. Evol. 22, 298-307. 629
Dunning Hotopp, J.C., 2011. Horizontal gene transfer between bacteria and animals. Trends Genet. 27, 157-163.
Edgar, R.C., 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792-1797.
Erb, A., Weiß, H., Härle, J., Bechthold, A., 2009. A bacterial glycosyltransferase gene toolbox: Generation and applications. Phytochemistry 70, 1812-1821.
Grbić, M., Van Leeuwen, T., Clark, R.M., Rombauts, S., Rouzé, P., Grbić, V., Osborne, E.J., Dermauw, W., Thi Ngoc, P.C., Ortego, F., Hernandez-Crespo, P., Diaz, I., Martinez, M., Navajas, M., Sucena, É., Magalhães, S., Nagy, L., Pace, R.M., Djuranović, S., Smagghe, G., Iga, M., Christiaens, O., Veenstra, J.A., Ewer, J., Villalobos, R.M., Hutter, J.L., Hudson, S.D., Velez, M., Yi, S.V., Zeng, J., Pires-daSilva, A., Roch, F., Cazaux, M., Navarro, M., Zhurov, V., Acevedo, G., Bjelica, A., Fawcett, J.A., Bonnet, E., Martens, C., Baele, G., Wissler, L., Sanchez-Rodriguez, A., Tirry, L., Blais, C., Demeestere, K., Henz, S.R., Gregory, T.R., Mathieu, J., Verdon, L., Farinelli, L., Schmutz, J., Lindquist, E., Feyereisen, R., Van de Peer, Y., 2011. The genome of Tetranychus urticae reveals herbivorous pest adaptations. Nature 479, 487-492.
Hacker, C.V., Brasier, C.M., Buck, K.W., 2005. A double-stranded RNA from a Phytophthora species is related to the plant endornaviruses and contains a putative UDP glycosyltransferase gene. J. Gen. Virol. 86, 1561-1570.
Hedges, S.B., Kumar, S., 2009. Discovering the timetree of life. Oxford University Press, New York.
Hill, C.A., 2010. Genome analysis of major tick and mite vectors of human pathogens. https://www.vectorbase.org/sites/default/files/ftp/documents/ixs_sequencing_proposal.pdf.
Huang, H., Wu, Q., 2010. Cloning and comparative analyses of the zebrafish Ugt repertoire reveal its evolutionary diversity. PLoS ONE 5, e9144.
Hughes, A.L., 2013. Origin of ecdysosteroid UDP-glycosyltransferases of baculoviruses through horizontal gene transfer from Lepidoptera. Coevolution 1, 1-7.
Jeppson, L.R., Keifer, H.H., Baker, E.W., 1975. Mites injurious to Economic Plants. University of California Press, Berkeley, Los Angeles.
Jobb, G., Haeseler, A.V., Strimmer, K., 2004. TREEFINDER: a powerful graphical analysis environment for molecular phylogenetics. BMC Evol. Biol. 4, 18.
Kojima, W., Fujii, T., Suwa, M., Miyazawa, M., Ishikawa, Y., 2010. Physiological adaptation of the Asian corn borer Ostrinia furnacalis to chemical defenses of its host plant, maize. J. Insect Physiol. 56, 1349-1355.
Lairson, L.L., Henrissat, B., Davies, G.J., Withers, S.G., 2008. Glycosyltransferases: structures, functions, and mechanisms. Annu. Rev. Biochem. 77, 521-555.
Lambrix, V., Reichelt, M., Mitchell-Olds, T., Kliebenstein, D.J., Gershenzon, J., 2001. The Arabidopsis epithiospecifier protein promotes the hydrolysis of glucosinolates to nitriles and influences Trichoplusia ni herbivory. Plant Cell 13, 2793-2807.
Langmead, B., Trapnell, C., Pop, M., Salzberg, S., 2009. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25.
Lee, S.-W., Ohta, K., Tashiro, S., Shono, T., 2006. Metabolic resistance mechanisms of the housefly (Musca domestica) resistant to pyraclofos. Pestic. Biochem. Physiol. 85, 76-83.
Li, W., Godzik, A., 2006. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658-1659.
Liu, B., Jiang, G., Zhang, Y., Li, J., Li, X., Yue, J., Chen, F., Liu, H., Li, H., Zhu, S., Wang, J., Ran, C., 2011. Analysis of transcriptome differences between resistant and susceptible strains of the citrus red mite Panonychus citri (Acari: Tetranychidae). PLoS ONE 6, e28516.
Lombard, V., Golaconda Ramulu, H., Drula, E., Coutinho, P.M., Henrissat, B., 2013. The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Res. 42, D490-D495.
Luzhetskyy, A., Bechthold, A., 2008. Features and applications of bacterial glycosyltransferases: current state and prospects. Appl. Microbiol. Biotechnol. 80, 945-952.
Mackenzie, P.I., Owens, I.S., Burchell, B., Bock, K.W., Bairoch, A., Belanger, A., Gigleux, S.F., Green, M., Hum, D.W., Iyanagi, T., Lancet, D., Louisot, P., Magdalou, J., Roy Chowdhury, J., Ritter, J.K., Tephly, T.R., Schachter, H., Tephly, T., Tipton, K.F., Nebert, D.W., 1997. The UDP glycosyltransferase gene superfamily: recommended nomenclature update based on evolutionary divergence. Pharmacogenetics 7, 255-269.
Mackenzie, P.I., Walter Bock, K., Burchell, B., Guillemette, C., Ikushiro, S.-i., Iyanagi, T., Miners, J.O., Owens, I.S., Nebert, D.W., 2005. Nomenclature update for the mammalian UDP glycosyltransferase (UGT) gene superfamily. Pharmacogenet. Genomics 15, 677-685.
Magdalou, J., Fournel-Gigleux, S., Ouzzine, M., 2010. Insights on membrane topology and structure/function of UDP-glucuronosyltransferases. Drug Metab. Rev. 42, 159-166.
Manfrin, C., Tom, M., De Moro, G., Gerdol, M., Guarnaccia, C., Mosco, A., Pallavicini, A., Giulianini, P.G., 2013. Application of D-crustacean hyperglycemic hormone induces peptidases transcription and suppresses glycolysis-related transcripts in the hepatopancreas of the crayfish Pontastacus leptodactylus - Results of a transcriptomic study. PLoS ONE 8, e65176.
Meech, R., Miners, J.O., Lewis, B.C., Mackenzie, P.I., 2012. The glycosidation of xenobiotics and endogenous compounds: versatility and redundancy in the UDP glycosyltransferase superfamily. Pharmacol. Ther. 134, 200-218.
Megy, K., Emrich, S.J., Lawson, D., Campbell, D., Dialynas, E., Hughes, D.S.T., Koscielny, G., Louis, C., MacCallum, R.M., Redmond, S.N., Sheehan, A., Topalis, P., Wilson, D., the VectorBase Consortium, 2012. VectorBase: improvements to a bioinformatics resource for invertebrate vector genomics. Nucleic Acids Res. 40, D729-D734.
Migeon, A., Dorkeld, F., 2013. Spider Mites Web: a comprehensive database for the Tetranychidae. http://www.montpellier.inra.fr/CBGP/spmweb.
Miley, M.J., Zielinska, A.K., Keenan, J.E., Bratton, S.M., Radominska-Pandya, A., Redinbo, M.R., 2007. Crystal structure of the cofactor-binding domain of the human phase II drug-metabolism enzyme UDP-glucuronosyltransferase 2B7. J. Mol. Biol. 369, 498-511.
O'Reilly, D.R., 1995. Baculovirus-encoded ecdysteroid UDP-glucosyltransferases. Insect Biochem. Mol. Biol. 25, 541-550.
Paganini, J., Campan-Fournier, A., Da Rocha, M., Gouret, P., Pontarotti, P., Wajnberg, E., Abad, P., Danchin, E.G.J., 2012. Contribution of lateral gene transfers to the genome composition and parasitic ability of root-knot nematodes. PLoS ONE 7, e50875.
Petersen, T.N., Brunak, S., von Heijne, G., Nielsen, H., 2011. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat. Methods 8, 785-786.
Radominska-Pandya, A., Bratton, S.M., Redinbo, M.R., Miley, M.J., 2010. The crystal structure of human UDP-glucuronosyltransferase 2B7 C-terminal end is the first mammalian UGT target to be revealed: the significance for human UGTs from both the 1A and 2B families. Drug Metab. Rev. 42, 133-144.
Ritchie, M.E., Silver, J., Oshlack, A., Holmes, M., Diyagama, D., Holloway, A., Smyth, G.K., 2007. A comparison of background correction methods for two-colour microarrays. Bioinformatics 23, 2700-2707.
Ross, J., Li, Y., Lim, E.-K., Bowles, D., 2001. Higher plant glycosyltransferases. Genome Biol. 2, reviews3004.3001-reviews3004.3006.
Sasai, H., Ishida, M., Murakami, K., Tadokoro, N., Ishihara, A., Nishida, R., Mori, N., 2009. Species-specific glucosylation of DIMBOA in larvae of the rice Armyworm. Biosci. Biotechnol. Biochem. 73, 1333-1338.
Schönknecht, G., Weber, A.P.M., Lercher, M.J., 2014. Horizontal gene acquisitions by eukaryotes as drivers of adaptive evolution. BioEssays 36, 9-20.
Smith, J.N., 1962. Detoxication Mechanisms. Annu. Rev. Entomol. 7, 465-480.
Smyth, G., Altman, N., 2013. Separate-channel analysis of two-channel microarrays: recovering inter-spot information. BMC Bioinformatics 14, 165.
Smyth, G.K., 2005. limma: Linear Models for Microarray Data, in: Gentleman, R., Carey, V., Huber, W., Irizarry, R., Dudoit, S., Smyth, G.K. (Eds.), Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer New York, pp. 397-420.
Song, D., Cho, W.K., Park, S.-H., Jo, Y., Kim, K.-H., 2013. Evolution of and horizontal gene transfer in the Endornavirus genus. PLoS ONE 8, e64270.
Tamura, K., Peterson, D., Peterson, N., Stecher, G., Nei, M., Kumar, S., 2011. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol. Biol. Evol. 28, 2731-2739.
Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F., Higgins, D.G., 1997. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25, 4876-4882.
Van Leeuwen, T., Demaeght, P., Osborne, E.J., Dermauw, W., Gohlke, S., Nauen, R., Grbić, M., Tirry, L., Merzendorfer, H., Clark, R.M., 2012a. Population bulk segregant mapping uncovers resistance mutations and the mode of action of a chitin synthesis inhibitor in arthropods. Proc. Natl. Acad. Sci. U.S.A. 109, 4407-4412.
Van Leeuwen, T., Dermauw, W., Grbić, M., Tirry, L., Feyereisen, R., 2012b. Spider mite control and resistance management: does a genome help? Pest Manag. Sci. 69, 156-159.
Van Leeuwen, T., Vontas, J., Tsagkarakou, A., Dermauw, W., Tirry, L., 2010. Acaricide resistance mechanisms in the two-spotted spider mite Tetranychus urticae and other important Acari: a review. Insect Biochem. Mol. Biol. 40, 563-572.
Wybouw, N., Balabanidou, V., Ballhorn, D.J., Dermauw, W., Grbić, M., Vontas, J., Van Leeuwen, T., 2012. A horizontally transferred cyanase gene in the spider mite Tetranychus urticae is involved in cyanate metabolism and is differentially expressed upon host plant change. Insect Biochem. Mol. Biol. 42, 881-889.
Yang, X., Buschman, L.L., Zhu, K.Y., Margolies, D.C., 2002. Susceptibility and detoxifying enzyme activity in two spider mite species (Acari: Tetranychidae) after selection with three insecticides. J. Econ. Entomol. 95, 399-406.
Yonekura-Sakakibara, K., Hanada, K., 2011. An evolutionary view of functional diversity in family 1 glycosyltransferases. Plant J. 66, 182-193.
Zhou, Z., Gu, J., Li, Y.-Q., Wang, Y., 2012. Genome plasticity and systems evolution in Streptomyces. BMC Bioinformatics 13, S8.
Zhu, B., Zhou, Q., Xie, G., Zhang, G., Zhang, X., Wang, Y., Sun, G., Li, B., Jin, G., 2012. Interkingdom gene transfer may contribute to the evolution of phytopathogenicity in Botrytis cinerea. Evol. Bioinformatics 8, 105-117.
Zhurov, V., Navarro, M., Bruinsma, K.A., Arbona, V., Santamaria, E.M., Cazaux, M., Wybouw, N., Osborne, E.J., Ens, C., Rioja, C., Vermeirssen, V., Rubio-Somoza, I., Krishna, P., Diaz, I., Schmid, M., Gómez-Cadenas, A., Van de Peer, Y., Grbić, M., Clark, R.M., Van Leeuwen, T., Grbić, V., 2014. Reciprocal responses in the interaction between Arabidopsis and the cell-content-feeding chelicerate herbivore spider mite. Plant Physiol. 164, 384-399.
Figure legends

Figure 1. Evolutionary relationships and UGT gene numbers of T. urticae and related arthropods. The tree shows divergence times for arthropod lineages, with the phylum Nematoda as an outgroup, as indicated by TimeTree (http://www.timetree.org) (Hedges and Kumar, 2009). The numbers of arthropod UGT genes are taken from Ahn et al. (2012) and this study. The C. elegans UGT gene number is based on the data table on the UGT Nomenclature Committee website (http://www.flinders.edu.au/medicine/sites/clinical-pharmacology/ugt-homepage.cfm). Database (DB) refers to whole-genome shotgun contigs (WGS), transcriptome shotgun assembly (TSA) or the Nomenclature Committee’s data collection (NCD).
Figure 2. Phylogenetic analysis of T. urticae UGTs. A full set of T. urticae UGT protein sequences was aligned using MUSCLE (Edgar, 2004) and subjected to a maximum-likelihood analysis using Treefinder (Jobb et al., 2004). The resulting tree was midpoint rooted. Only bootstrap values higher than 80 are shown. The scale bar represents 0.2 substitutions per site. Information and accession numbers of T. urticae UGT protein sequences can be found in Table 1 and Table S5.
Figure 3. Comparison of T. urticae UGT protein sequences to human, insect and bacterial UGTs. (A) Comparison of primary structures of the UGT protein sequences with emphasis on important domains. (B) Representative sequences from the organisms are aligned by ClustalW; asterisks (*) indicate positions that have a single, fully conserved residue, and colons (:) and periods (.) indicate conservation between groups of strongly and weakly similar chemical properties, respectively.
Figure 4. Distribution of the UGT genes on the T. urticae genomic scaffolds. Only scaffolds harboring UGT genes are depicted, with UGT genes mapped at their relative physical locations. Arrows indicate gene orientation, and gene names in the same color belong to the same gene family. Accession numbers of the UGT protein sequences used can be found in Table S5.
Figure 5. Phylogenetic analysis of arthropod and bacterial UGTs. A set of arthropod and bacterial UGT protein sequences was aligned using MUSCLE (Edgar, 2004) and subjected to a maximum-likelihood analysis using Treefinder (Jobb et al., 2004). Only bootstrap values higher than 80 are shown. The scale bar represents 0.5 substitutions per site.
Figure 6. Alignment of N-terminal amino acid sequences of 7 representative T. urticae UGTs and 5 Actinobacteria UGTs. The intron-exon splicing site is indicated by a red line, and conserved amino acids near the splicing site are highlighted in the same colors. Accession numbers of the UGT protein sequences used can be found in Table S5.
Figure 7. Expression heat map of T. urticae UGT genes based on RNA-Seq data. RPKM values were calculated from RNA-Seq data obtained from larvae after feeding on several host plants (from bean to tomato and Arabidopsis) and from different life stages of T. urticae (embryo, larvae, nymph and adult).
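As an illustration of how the RPKM values in Figure 7 relate to raw read counts, the following Python sketch applies the standard definition of RPKM (reads per kilobase of transcript per million mapped reads) and assigns a value to one of the intervals of the figure's colour key (0-1, 1-2, 2-4, 4-8, 8-32, 32-64, 64-128, 128-256). The function names, the example counts and the treatment of interval boundaries (lower bound inclusive) are assumptions made for illustration only; they are not taken from the analysis pipeline used in this study.

```python
from bisect import bisect_right

# Interval edges read from the colour key of Figure 7 (RPKM units).
BIN_EDGES = [0, 1, 2, 4, 8, 32, 64, 128, 256]


def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def rpkm_bin(value: float) -> str:
    """Return the colour-key interval for a value (assumption: lower bound inclusive)."""
    if value < 0:
        raise ValueError("RPKM cannot be negative")
    i = bisect_right(BIN_EDGES, value)  # number of edges <= value
    if i == len(BIN_EDGES):
        return ">=256"  # beyond the last interval shown in the key
    return f"{BIN_EDGES[i - 1]}-{BIN_EDGES[i]}"


# Hypothetical gene: 500 reads, 1,000 bp coding region, 10 million mapped reads.
example = rpkm(500, 1000, 10_000_000)
print(example, rpkm_bin(example))
```

Because RPKM divides by both gene length and library size, values are comparable across the UGT genes of different lengths in Table 1 and across the host-plant and life-stage libraries.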
Figure 8. Microarray-derived expression heat map of T. urticae UGT genes. The heat map depicts the log2FC of 77 UGT genes after host plant transfer (from bean to tomato and Arabidopsis), in multi-resistant strains (MR-VP and MAR-AB), and after diapause induction, relative to the susceptible London strain on bean under standard laboratory conditions (25°C, 60% RH). The expression profile of UGTs after transfer to tomato was assessed after 2 hrs, 12 hrs and 5 generations, while the transcriptional response to Arabidopsis was investigated on different lines (qKO, Col-0, atr1D) with different allelochemical (glucosinolate) content, 24 hrs after transfer, relative to the same strain feeding on the ancestral, benign host plant, bean. The qKO mutant line lacks glucosinolates and the atr1D line overproduces glucosinolates compared to the wild type, Col-0.
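The log2FC values in Figure 8 are log2-transformed expression ratios between a treatment and its reference. A minimal Python sketch of that transformation, together with clipping to the -4 to 4 range of the figure's colour key, is given below. The function names and the example intensities are hypothetical, and the sketch does not reproduce any normalisation or statistical modelling applied to the microarray data in this study.

```python
import math


def log2_fold_change(treatment: float, reference: float) -> float:
    """log2 of the treatment/reference expression ratio (both must be positive)."""
    if treatment <= 0 or reference <= 0:
        raise ValueError("expression values must be positive")
    return math.log2(treatment / reference)


def clip_to_color_key(log2fc: float, limit: float = 4.0) -> float:
    """Clamp a log2FC to [-limit, limit]; 4 matches the colour key of Figure 8."""
    return max(-limit, min(limit, log2fc))


# Hypothetical intensities: a gene four-fold higher after host plant transfer.
print(log2_fold_change(8.0, 2.0))   # positive value: up-regulated
print(log2_fold_change(1.0, 16.0))  # negative value: down-regulated
```

On this scale a value of 1 corresponds to a doubling and a value of -1 to a halving of expression relative to the reference, so up- and down-regulation are symmetric around 0.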
Table 1. Characterization of a full set of T. urticae UGT sequences.
UGT family  Name¹  Tetur ID²  Length (aa)³  Scaffold⁴  Strand  No. exons⁵
201 UGT201A1 tetur01g03820 455 scaffold 1 + 2
UGT201A2v1 tetur60g00080 402 scaffold 60 - 2
UGT201A2v2 tetur02g02480 (278) scaffold 2 + (1)
UGT201A3p† tetur02g02770 (103) scaffold 2 + (1)
UGT201A4 tetur05g09325 448 scaffold 5 + 2
UGT201A5 tetur08g00190 444 scaffold 8 + 2
UGT201A6 tetur19g00440 456 scaffold 19 + 2
UGT201A7p tetur21g01400* (166) scaffold 21 + (1)
UGT201A8 tetur184g00030* 444 scaffold 184 + 2
UGT201B1 tetur01g05690 438 scaffold 1 + 2
UGT201B2 tetur01g05700 443 scaffold 1 + 2
UGT201B3 tetur04g07630 438 scaffold 4 - 2
UGT201B4p tetur04g07710* 437 scaffold 4 - 2
UGT201B5 tetur04g07770 437 scaffold 4 - 2
UGT201B6 tetur04g07780 437 scaffold 4 - 2
UGT201B7 tetur05g05020 437 scaffold 5 + 2
UGT201B8 tetur05g05030 437 scaffold 5 + 2
UGT201B9 tetur05g05050 437 scaffold 5 + 2
UGT201B10 tetur05g05060 436 scaffold 5 + 2
UGT201B11 tetur07g06390 435 scaffold 7 + 2
UGT201B12 tetur07g06420 435 scaffold 7 + 2
UGT201B13 tetur07g06430 435 scaffold 7 + 2
UGT201B14 tetur07g06450 435 scaffold 7 + 2
UGT201C1 tetur01g11870 441 scaffold 1 + 2
UGT201C2 tetur05g04680 436 scaffold 5 - 2
UGT201C3 tetur05g04690 440 scaffold 5 - 2
UGT201D1 tetur04g04300 459 scaffold 4 - 2
UGT201D2 tetur04g04350 459 scaffold 4 + 2
UGT201E1 tetur05g05710 438 scaffold 5 + 2
UGT201F1 tetur02g01310 438 scaffold 2 + 2
UGT201F2 tetur06g02410* 436 scaffold 6 + 2
UGT201F3 tetur06g02430 438 scaffold 6 + 2
UGT201G1 tetur08g07460 457 scaffold 8 - 2
UGT201G2 tetur12g00360 436 scaffold 12 - 2
UGT201G3 tetur02g10390 438 scaffold 2 + 1
UGT201H1 tetur08g03000 442 scaffold 8 - 2
202 UGT202A1 tetur15g00340 436 scaffold 15 - 2
UGT202A2 tetur22g00270 437 scaffold 22 + 2
UGT202A3 tetur22g00310 435 scaffold 22 + 2
UGT202A4 tetur22g00330 437 scaffold 22 + 2
UGT202A5 tetur22g00350 437 scaffold 22 + 2
UGT202A6 tetur22g00360* 433 scaffold 22 + 2
UGT202A7 tetur22g00380 430 scaffold 22 + 2
UGT202A8 tetur22g00420 432 scaffold 22 + 2
UGT202A9 tetur22g00460 437 scaffold 22 + 2
UGT202A10 tetur22g00480 439 scaffold 22 + 2
UGT202A11 tetur22g00510 437 scaffold 22 - 2
UGT202A12 tetur30g00390 434 scaffold 30 + 2
UGT202A13p tetur30g02050* 434 scaffold 30 - 2
UGT202A14p tetur139g00010* 437 scaffold 139 + 2
UGT202A15 tetur22g00440 436 scaffold 22 + 2
UGT202A16 tetur22g00970 439 scaffold 22 + 2
UGT202B1 tetur10g02090 434 scaffold 10 + 3
203 UGT203A1 tetur01g07060 425 scaffold 1 - 2
UGT203A2 tetur04g02350 426 scaffold 4 + 2
UGT203A3 tetur06g06100 426 scaffold 6 + 2
UGT203B1 tetur09g00220 504 scaffold 9 - 2
UGT203B2 tetur09g01650 529 scaffold 9 - 2
UGT203B3 tetur09g01660 461 scaffold 9 + 2
UGT203C1 tetur05g05090 446 scaffold 5 - 2
UGT203D1 tetur10g05770 428 scaffold 10 - 2
UGT203E1 tetur16g02300 436 scaffold 16 - 2
UGT203F1 tetur36g00340 428 scaffold 36 - 2
UGT203G1 tetur36g01060 433 scaffold 36 - 2
204 UGT204A1 tetur02g03300 435 scaffold 2 + 2
UGT204A2 tetur05g00060 433 scaffold 5 + 2
UGT204A3 tetur05g00070 434 scaffold 5 + 2
UGT204A4 tetur05g00080 432 scaffold 5 + 2
UGT204A5 tetur05g00090 434 scaffold 5 + 2
UGT204B1 tetur02g09830 438 scaffold 2 - 2
UGT204B2 tetur02g09850 438 scaffold 2 - 2
UGT204C1 tetur11g06460 486 scaffold 11 + 2
205 UGT205A1 tetur11g01230 440 scaffold 11 + 2
UGT205A2 tetur11g01250 441 scaffold 11 + 2
UGT205A3 tetur32g01240 451 scaffold 32 - 2
UGT205B1 tetur32g01230 578 scaffold 32 - 2
UGT205B2 tetur32g01250 501 scaffold 32 - 2
UGT205C1 tetur11g01830 479 scaffold 11 - 2
206 UGT206A1 tetur08g05390 445 scaffold 8 - 2
207 UGT207A1 tetur08g02490 465 scaffold 8 - 2
¹ Official names are assigned according to the guidelines recommended by the UGT Nomenclature Committee (Mackenzie et al., 2005).
² Tetur IDs originate from the latest version of the gene models in the dataset T. urticae_CDS_20130730.tfa, accessible at the ORCAE genome portal, http://bioinformatics.psb.ugent.be/orcae/overview/Tetur. Gene models marked with an asterisk (*) were modified (mostly improved) by manual annotation; their sequences in the dataset may therefore differ from those in this article.
³ The deduced protein length is based on the coding region; parentheses indicate a partial sequence.
⁴ Scaffolds used in this study are essentially from the London strain.
⁵ Parentheses indicate that the exon number is uncertain because the sequence is partial.
† The p after the gene name is used to denote a pseudogene.
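The identifiers in Table 1 follow regular patterns that can be parsed programmatically. The Python sketch below splits a Tetur ID into scaffold number, locus digits and manual-annotation flag, and a UGT name into family, subfamily, gene number and suffix (p for pseudogene, v1/v2 for variants). The regular expressions, class names and function names are assumptions inferred from the rows of Table 1 (for example, tetur01g03820 lies on scaffold 1 and tetur184g00030* on scaffold 184); they are not an official ORCAE or UGT Nomenclature Committee specification.

```python
import re
from typing import NamedTuple, Optional

# Patterns inferred from Table 1 (illustrative, not an official specification).
TETUR_ID = re.compile(r"^tetur(\d+)g(\d+)(\*?)$")
UGT_NAME = re.compile(r"^UGT(\d+)([A-Z])(\d+)(v\d+|p)?$")


class TeturId(NamedTuple):
    scaffold: int            # digits before "g", e.g. 184
    locus: str               # digits after "g", kept as text to preserve zeros
    manually_curated: bool   # True when the ID carries an asterisk


class UgtName(NamedTuple):
    family: int              # e.g. 201
    subfamily: str           # e.g. "A"
    gene: int                # e.g. 2
    suffix: Optional[str]    # "p", "v1", "v2" or None


def parse_tetur_id(text: str) -> TeturId:
    match = TETUR_ID.match(text.strip())
    if match is None:
        raise ValueError(f"not a Tetur ID: {text!r}")
    return TeturId(int(match.group(1)), match.group(2), match.group(3) == "*")


def parse_ugt_name(text: str) -> UgtName:
    # The dagger is a footnote marker in Table 1, not part of the name.
    match = UGT_NAME.match(text.strip().rstrip("†"))
    if match is None:
        raise ValueError(f"not a UGT name: {text!r}")
    return UgtName(int(match.group(1)), match.group(2),
                   int(match.group(3)), match.group(4))


print(parse_tetur_id("tetur184g00030*"))
print(parse_ugt_name("UGT201A2v1"))
```

Keeping the locus digits as a string preserves the leading zeros, so an identifier can be reassembled exactly when cross-referencing the ORCAE dataset.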
[Figure 1 image: time-calibrated tree (axis: approximate divergence time, 0 to 800 mya) of phylum Arthropoda (Chelicerata, Myriapoda, Crustacea, Hexapoda) with phylum Nematoda as outgroup. Species shown: Strigamia maritima, Mesobuthus martensii, Ixodes scapularis, Rhipicephalus microplus, Varroa destructor, Metaseiulus occidentalis, Tetranychus urticae, Panonychus citri, Parasteatoda tepidariorum, Daphnia pulex, Pontastacus leptodactylus, Acyrthosiphon pisum, Apis mellifera, Nasonia vitripennis, Tribolium castaneum, Bombyx mori, Drosophila melanogaster, Anopheles gambiae, Aedes aegypti and Caenorhabditis elegans. Columns give the UGT gene number, the UGT type (animal, bacteria or n/a) and the database (WGS, TSA or NCD) for each species. Branch annotations: Loss of UGT; Gain of UGT from bacteria.]
[Figure 2 image: midpoint-rooted maximum-likelihood tree of T. urticae UGTs. Tips are labelled with Tetur ID and UGT name (e.g. tetur01g07060-UGT203A1); clades are labelled UGT201, UGT202, UGT203, UGT204, UGT205, UGT206 and UGT207; bootstrap values are shown at nodes; scale bar 0.2.]
[Figure 3 image]
(A) Primary structures (N to C terminus, scale in aa) of human UGT1A1, insect UGT41A1, T. urticae UGT201A1 and bacterial UGT108A2, with the signal peptide, UGT signature motif, transmembrane domain and cytoplasmic tail marked.
(B)
UGT1A1    MAVESQGGRPLVLGLLLCVLGPVVSHAGKILLIPVDG-SHWLSMLGAIQQLQQRGHEIVV 59
UGT41A1   -------MRCLGLLFFLVCVVTSARAYHVLCVFPIPSRSHNSLGKGIVDALLEAGHEVTW 53
UGT201A1  ---------------------MNNKPKYRFLISSIDAFGHINCALAIGEILASNGHEVTF 39
UGT108A2  -----------------------MSGPLTILFAPESAYGPTNNCVGIGDVLRRRGHRVVF 37

UGT1A1    LAPDASLYIRDG------AFYTLKTYPVPFQREDVKESFVSLGHNVFENDSFLQRVIKTY 113
UGT41A1   VTPYPPSELAKG----LKIVDVSATVSIS-KTVDMHEQRNSNTGVSFVKALAENITRVSL 108
UGT201A1  ANHAKHKSLAD-----RRDFKFIPFDEHHFKYLNPVVKWINGLLHRFRSDALSIFN--NW 92
UGT108A2  AAEASWKGRLAPLGFEEDLVDLAPPPEKPQEAGQFWKDFVRDTAPEFQKPTIEQLG--TW 95

UGT1A1    KKIKKDSAMLLSGCSHLLHNKELMASLAESSFDVMLTDPFLPCSPIVAQYLSLPTVFFLH 173
UGT41A1   ATPALQQAIVQGKYDAVITETFFNDAEAGYGAVLQVPWILMSSIAMMPQ---LEAIVDEV 165
UGT201A1  THDEIDFFGSIVEHCDAKNRALEQVLKENEDNFDMFIGDFMSVYPAFYR---TTLPWALV 149
UGT108A2  VRPVWEELVAGARYCEPRLKEIVGRVRP-----DVIVEDNVVCFPALTT---ADVPFVRI 147

UGT1A1    ALPCSLEFEATQCPNPFSYVPRPLSSHSDHMTFLQRVKNMLIAFSQNFLCDVVYSPYATL 233
UGT41A1   RSVTTIPLLFNNAPTPMG----------FWDRLKNVFLHSVMVISDWLDRPKTVAFYESL 215
UGT201A1  HSSNPICLYPE-GPPAWSGFSVKEKEPEKWEKFRALFGEASSVLREKMY-----SWWKSY 203
UGT108A2  VSCNPLEVKGEHIPPPFSGYPAGDRAG--WDEFRAEYDRTHRELWAEFN-----AWVVDQ 200

UGT1A1    ASEFLQR-----EVTVQDLLSSASVWLFRSDFVKDYPRPIMPNMVFVGGINCLHQNPLSQ 288
UGT41A1   FAPLATARGVALPPFEEALYN-VSVLLVNSHPAFAPPLSLPPNVVEIAGYHIDPKTPPLP 274
UGT201A1  EVPYPNIKDSRIENWWYSEPEQLGIYHYPELLDYHEVGPIRSDKWVRLDCAIRKPDNVEP 263
UGT108A2  GARP-------LPDLEFIHDGELNLYVYPEIADYAEARPLG-PAWHRLDSSVRETDEEFT 252

UGT1A1    EFEAYINASGEHGIVVFSLGSMV--SEIPEKKAMAIADALGKIPQTVLWR-YTGTRPSNL 345
UGT41A1   KDLQSILDSSPQGVVYFSMGSVLKSSKLSEQTRRELLDVFGSIPQTVLWK--FEEDLQDL 332
UGT201A1  FIIPESLKSLPGKLIYFSLGSLG---SAQVDLMKTFIKILSKCPHRFIVSKGPKGDQLEL 320
UGT108A2  LPPEWAERPEGSALIYFSLGSLG---SADVGLMRRIIASLARTPHRYIVSKGPLHDEIEL 309

UGT1A1    ANNTILVKWLPQNDLLGHPMTRAFITHAGSHGVYESICNGVPMVMMPLFGDQMDNAKRME 405
UGT41A1   PKNVHIRSWMPQSSILAHPNMKVFITHGGLLSILETLHYGVPILAVPVFGDQPSNANSAV 392
UGT201A1  GPNMWGENYVDQISVL--EVVDLVITHGGNNTFLETIYAAKPLIVIPFFMDQLDNAQRAV 378
UGT108A2  PPTMWGAEFLPQTRIL--PLVDLVITHGGNNTTTESLHFGKPMIVLPLFWDQYDNAQRIA 367

UGT1A1    TKGAGVTLNVLEMTSEDLENALKAVINDKSYKENIMRLSSLHKDRPVEPLDLAVFWVEFV 465
UGT41A1   RNGFAKSIEYKPDMAKDMKVALNEMLSDDSYYKRARYLSKIFGDKLVPPAKVISHYVKVA 452
UGT201A1  DCGIGSRINLHELDETKVLQTIEQTLSNPSYVEKIVKISDSMKSTKS--RENLVKKIETF 436
UGT108A2  ELGYGVRLDPYRFTDAQLHGAMAELLDDIRLRNRLTEASRTIRARDG--LRTAADLIERH 425

UGT1A1    MRHKGAPHLRPAAHDLTWYQYHSLDVIGFLLAVVLTVAFITFKCCAYGYRKCLGKKGRVK 525
UGT41A1   IETNGAYHLRSKSLLYPWYQRWLVDIIAALLLACLAVYVVARRVLCYLYTSVTG--GGCN 510
UGT201A1  LENHQKNVIKSNVLYHLKG----------------------------------------- 455
UGT108A2  GHERQET----------------------------------------------------- 432

UGT1A1    KAHKSKTH 533
UGT41A1   RSVKVKKN 518
UGT201A1  --------
UGT108A2  --------
[Figure 4 image: UGT genes mapped on scaffolds 1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 16, 19, 21, 22, 30, 32, 36, 60, 139 and 184 (position axes in Mbp, from 0 bp). An arrow next to each gene name marks its orientation.]
[Figure 5 image: maximum-likelihood tree of arthropod and bacterial UGTs with bootstrap values at nodes and scale bar 0.5. Clade labels: centipede (Myriapoda), Daphnia (Crustacea), Insect UGT50, Insects (Hexapoda), T. urticae (Acari), Actinobacteria/Chloroflexi, bacteria.]
[Figure 6 image; labels: N-terminal; Splicing site (phase 2)]
UGT201A1  -------MNNKPKYRFLISSIDAFGHINCALAIGEILASNGHE
UGT202A1  -------MDEKGALKVLMTSMIGLGHLHACIGIGALLRKRGHE
UGT203A1  -------------MKIFFLPMDGHGHINACIGLARMLRDYNHE
UGT204A1  --------MNSNPIKVLVTSINGYGHFNAALGVASLLANRGND
UGT205A1  ----------MGHLKILLSAMSGQGHVNAILGLAAILKQHNHE
UGT206A1  MEESNLKGHRHKKNKIMLLPMDGAGHVLSLLGLAFYFKQRFHG
UGT207A1  ---------MAKSIKILIIPTNGYGHLNIGLGLADMLTKHGHK
UGT203G1  ---------MPKKYHFLITATNSYGPINAALGFGELLQRQGHH
UGT108A1  -----------MSLTILFMPESAYGPTNNCIGIGDILRKRGHR
UGT108A2  ---------MSGPLTILFAPESAYGPTNNCVGIGDVLRRRGHR
UGT108A3  ---------MSEPLTILFAPESAYGPTNNCVGIGDVLRRRGHR
UGT108A4  ---MSSTRGSLRTLTVLFAPESAYGPTNNCVGIGDVLRRRGHR
UGT108A5  ---MSPTGRSLRPLTILFAPESAYGPTNNCVGIGGVLRRRGHR
[Figure 7 image: RNA-Seq expression heat map. Columns: bean, tomato, Arabidopsis (host plants) and embryo, larvae, nymph, adult (life stages). Rows: T. urticae UGT genes from UGT201A1 to UGT207A1, grouped by family (UGT201 to UGT207). Colour key intervals: RPKM = 0-1, 1-2, 2-4, 4-8, 8-32, 32-64, 64-128, 128-256.]
[Figure 8 image: microarray expression heat map. Colour key: value from −4 to 4. Column headings: Host plant transfer, Resistance, Diapause; columns: tomato (2 hrs, 12 hrs, 5 gen), Arabidopsis (qKO, Col-0, atr1D), MR-VP, MAR-AB, cold, diapause. Rows: T. urticae UGT genes from UGT207A1 to UGT201A1, grouped by family (UGT201 to UGT207).]
Highlights
• Annotation and phylogenetic analysis of a UGT gene family in Tetranychus urticae.
• Transcriptome analyses suggest an important role of UGTs in xenobiotic metabolism.
• Comparative analysis revealed an ancient loss of the UGT gene family in Chelicerata.
• Spider mites acquired UGT genes from bacteria by horizontal gene transfer.
Supplementary figures and tables
Figure S1. Phylogenetic tree of T. urticae and P. citri UGTs. A set of 80 T. urticae and 51 P. citri UGT protein sequences was aligned using MUSCLE (Edgar, 2004) and subjected to a maximum-likelihood analysis using Treefinder (Jobb et al., 2004). The resulting tree is depicted as a “circle tree” and only the topology is shown. Bootstrap values lower than 80 are not shown. Accession numbers of T. urticae and P. citri UGT protein sequences can be found in Table S5.
Table S1. Amino acid sequence identity (aaID) between T. urticae UGTs and UGTs from bacteria or insects. Each of the
seven representative UGTs of T. urticae was compared with 17 bacterial UGT108s and 65 insect UGTs.
Taxa UGT name UGT201A1 UGT202A1 UGT203A1 UGT204A1 UGT205A1 UGT206A1 UGT207A1
Bacteria UGT108A1 29.3 31.4 32.9 28.6 36.9 30.0 35.0
Bacteria UGT108A2 28.5 28.9 32.7 28.2 35.0 28.9 33.6
Bacteria UGT108A3 28.2 29.2 32.7 28.7 34.5 28.0 32.9
Bacteria UGT108A4 29.4 28.9 32.5 28.5 36.1 28.7 32.6
Bacteria UGT108A5 29.5 29.9 32.9 28.1 35.3 30.6 33.9
Bacteria UGT108A6 26.1 24.8 30.8 26.9 32.7 28.2 27.7
Bacteria UGT108A7 26.3 24.9 32.2 26.6 32.6 28.2 31.4
Bacteria UGT108A8 25.2 28.4 35.5 26.4 36.2 29.4 31.4
Bacteria UGT108A9 24.6 28.9 34.1 27.8 32.3 25.0 32.3
Bacteria UGT108A10 28.1 31.1 35.6 28.3 35.9 30.2 34.0
Bacteria UGT108A11 28.7 29.9 33.4 26.4 35.5 30.8 33.4
Bacteria UGT108A12 29.5 29.9 32.9 28.1 35.3 30.6 33.9
Bacteria UGT108A13 25.4 28.3 32.9 27.7 32.3 26.0 32.7
Bacteria UGT108B1 26.1 24.3 30.4 25.3 31.1 29.4 31.2
Bacteria UGT108B2 28.1 29.2 31.8 26.9 31.5 31.7 30.3
Bacteria UGT108C1 25.5 26.2 29.4 23.2 28.6 26.9 27.1
Bacteria UGT108D1 27.5 28.0 28.9 27.3 31.8 30.4 29.2
Insect UGT31A1 8.1 7.1 9.2 11.5 11.4 9.4 8.6
Insect UGT32A1 9.7 9.9 6.8 9.7 11.6 10.8 10.9
Insect UGT33A1 11.9 13.3 10.6 8.7 8.0 10.8 13.6
Insect UGT34A1 7.5 10.1 12.7 15.4 8.6 9.2 12.3
Insect UGT35A1 12.1 10.8 7.8 10.6 13.6 10.1 8.2
Insect UGT36A1 13.9 13.1 9.2 14.0 11.8 9.4 13.3
Insect UGT37A1 13.0 12.2 10.8 12.0 10.2 11.9 12.3
Insect UGT38A1 15.6 12.6 10.4 13.3 14.3 11.7 11.2
Insect UGT39B1 11.2 11.2 12.2 12.6 15.5 11.0 6.7
Insect UGT40A1 14.3 10.3 17.2 14.3 15.2 15.1 11.8
Insect UGT41A1 13.6 11.9 13.2 13.3 10.9 11.2 13.6
Insect UGT42A1 5.9 15.6 12.0 8.7 11.8 12.1 9.3
Insect UGT43A1 8.6 14.9 15.1 14.9 10.0 10.1 14.8
Insect UGT44A1 13.4 13.1 12.2 10.8 13.4 10.8 10.1
Insect UGT46A1 12.3 11.7 12.2 13.1 8.9 10.3 7.5
Insect UGT47A1 12.3 13.5 15.5 9.0 15.5 16.4 12.9
Insect UGT48A1 12.5 10.6 8.2 9.2 9.8 14.4 11.6
Insect UGT49B1 11.9 13.3 6.4 9.4 13.6 11.2 12.5
Insect UGT50A1 13.4 14.2 10.1 11.3 13.0 6.5 8.6
Insect UGT301D1 8.8 9.6 10.8 8.7 12.1 11.2 7.7
Insect UGT302C1 15.0 12.8 13.2 16.8 15.9 10.3 14.2
Insect UGT303A1 9.5 11.9 10.8 10.8 13.4 11.7 12.9
Insect UGT304A1 9.2 11.0 10.8 8.5 11.6 11.9 11.6
Insect UGT307A1 12.5 9.2 14.4 9.0 13.2 14.4 11.0
Insect UGT311A1 13.9 14.5 10.1 9.2 13.9 13.0 7.7
Insect UGT312A1 16.5 13.5 11.1 14.3 13.6 11.9 11.6
Insect UGT316A1 13.0 6.4 13.4 6.7 12.3 10.6 14.4
Insect UGT317A1 7.0 11.0 8.7 15.2 14.6 14.2 11.8
Insect UGT318A1 11.7 13.8 10.8 11.0 10.5 10.6 8.6
Insect UGT319A1 13.0 14.5 8.7 6.7 10.9 14.4 11.0
Insect UGT320A1 13.9 11.5 13.4 9.7 12.5 13.5 14.4
Insect UGT321A1 10.6 15.4 10.6 10.3 16.6 9.2 8.4
Insect UGT322A1 15.4 10.3 12.0 11.7 11.4 9.7 8.0
Insect UGT323A1 13.4 11.5 12.2 15.4 8.4 10.8 13.1
Insect UGT324A1 13.4 14.9 12.9 12.4 12.3 13.7 12.3
Insect UGT325A1 16.7 11.7 6.8 13.6 11.8 10.8 10.3
Insect UGT326A1 13.9 11.2 13.4 13.1 13.0 13.5 12.5
Insect UGT327A1 15.8 11.9 13.4 11.3 8.6 14.6 10.3
Insect UGT328A1 11.7 11.9 9.4 11.0 12.1 12.6 12.5
Insect UGT329A1 14.3 11.5 14.8 10.1 13.0 13.7 13.3
Insect UGT330A1 11.9 11.0 13.7 7.1 10.9 15.1 17.4
Insect UGT331A1 14.5 14.0 13.2 9.7 13.6 12.8 12.5
Insect UGT332A1 13.0 15.4 9.9 14.9 10.2 11.0 11.0
Insect UGT333A1 16.5 10.1 12.2 12.4 13.2 11.7 13.3
Insect UGT334A1 10.6 10.3 12.5 11.0 9.8 13.3 12.7
Insect UGT335A1 14.3 12.6 9.4 7.1 13.0 15.3 10.3
Insect UGT336A1 15.8 11.7 10.1 12.0 14.3 11.7 10.8
Insect UGT337A1 10.1 9.2 10.1 11.7 7.5 8.3 13.3
Insect UGT338A1 9.0 9.4 7.5 11.3 8.6 8.5 5.8
Insect UGT339A1 10.8 14.0 12.0 12.4 10.7 8.3 9.0
Insect UGT340C1 10.3 13.1 12.0 8.5 6.8 10.6 10.1
Insect UGT341A1 9.5 15.8 11.1 12.4 12.3 12.8 15.1
Insect UGT342A1 15.6 8.3 7.3 14.3 11.8 11.5 9.9
Insect UGT343A1 12.1 11.9 15.5 13.3 10.7 13.7 13.1
Insect UGT344A1 12.5 11.0 9.7 9.2 10.2 12.6 7.3
Insect UGT345A1 7.9 14.0 13.2 11.0 11.8 9.7 11.2
Insect UGT346A1 10.8 11.9 14.6 9.7 12.1 11.0 8.6
Insect UGT347A1 11.2 12.6 9.2 12.6 10.9 14.6 12.0
Insect UGT348A1 17.1 11.7 13.9 12.6 10.9 14.8 10.5
Insect UGT349A1 10.6 11.5 9.9 14.3 11.6 8.1 15.3
Insect UGT350A1 14.5 12.8 8.7 10.8 15.2 11.0 8.4
Insect UGT351A1 15.0 10.6 12.9 11.5 9.8 15.3 9.3
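Table S1 reports pairwise amino acid identities, but this section does not say how they were computed. The following is only a minimal sketch under stated assumptions: the two sequences are already aligned to equal length, identity is expressed as a percentage, and columns with a gap in either sequence are excluded from the denominator (other tools divide by the alignment length or by the shorter sequence length, which gives different numbers). The function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of a pairwise or multiple alignment.

    Assumption (not taken from the source): columns where either sequence
    has a gap are skipped, and the remaining columns form the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # ignore columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        return 0.0  # no overlapping residues to compare
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy aligned fragments, not real UGT sequences.
    # 5 gap-free columns, 4 identical residues.
    print(percent_identity("MKT-AYL", "MRTGA-L"))
```

Because the choice of denominator changes the result, values produced this way are comparable to each other but not necessarily to the values in Table S1.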
Figure S2. Multiple alignment of the C-terminal domain of T. urticae UGTs. (A) The frequent amino acid residues
of the signature motif are presented by graphical logos generated using the WebLogo application available at
http://weblogo.berkeley.edu (Crooks et al., 2004). (B) Deduced amino acid sequences of the C-terminal domain of
T. urticae UGTs are aligned. A slash (/) at a C-terminal end marks a position where the extended sequence is
omitted.
Figure S3. Graphical representation of the UGT signature motif of different taxa. The overall height of each stack
indicates the sequence conservation at that position (measured in bits) and the height of symbols within the stack
reflects the relative frequency of the corresponding amino or nucleic acid at that position. Logos were generated
using the WebLogo application available at http://weblogo.berkeley.edu (Crooks et al., 2004).
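The relationship between stack height and symbol height described in this caption can be illustrated with a short calculation. This is a minimal sketch, assuming a 20-letter amino acid alphabet and omitting the small-sample correction that logo tools may apply; the function names and the toy alignment columns are hypothetical and are not taken from the UGT alignments.

```python
import math
from collections import Counter


def column_information(column: str, alphabet_size: int = 20) -> float:
    """Information content (bits) of one alignment column.

    Computed as log2(alphabet_size) minus the Shannon entropy of the
    observed residue frequencies. Gap characters ('-') are ignored.
    """
    residues = [c for c in column.upper() if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    entropy = -sum((k / n) * math.log2(k / n) for k in counts.values())
    return math.log2(alphabet_size) - entropy


def symbol_heights(column: str, alphabet_size: int = 20) -> dict:
    """Height of each residue symbol: relative frequency times stack height."""
    residues = [c for c in column.upper() if c != "-"]
    if not residues:
        return {}
    n = len(residues)
    info = column_information(column, alphabet_size)
    return {res: (k / n) * info for res, k in Counter(residues).items()}


if __name__ == "__main__":
    # A fully conserved column reaches the maximum, log2(20) bits.
    print(column_information("GGGG"))
    # Two residues at equal frequency lose exactly one bit.
    print(column_information("AAGG"))
    print(symbol_heights("AAGG"))
```

A fully conserved position therefore gives the tallest possible stack, and each symbol within a stack is scaled by how often that residue occurs at the position.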
Table S2. The log2FC of the significantly differentially expressed T. urticae UGT genes in two multi-resistant strains
(MR-VP and MAR-AB) relative to the susceptible London strain.
T. urticae UGT MR-VP MAR-AB
UGT201A5 1,03 1,53
UGT201A7p 1,49
UGT201B8 -1,81 -2,77
UGT201C2 -1,49 -2,18
UGT201C3 -1,05
UGT201F3 1,06
UGT202A1 1,26 1,44
UGT204B1 2,28 3,58
UGT204B2 1,43 2,20
UGT205A1 -2,55 -1,18
UGT205A2 -1,13
UGT205C1 -1,74
Total No. 9 10
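The log2FC values in these tables are written with decimal commas. As a reading aid, the sketch below parses such a value and converts it to a linear fold change (2 raised to the log2FC). The helper names are hypothetical, and the cutoff of |log2FC| ≥ 1 used in `direction` is an assumption that is consistent with the values listed but is not stated in this section.

```python
def parse_log2fc(text: str) -> float:
    """Parse a log2 fold change written with a decimal comma, e.g. '-2,77'."""
    cleaned = text.strip().replace("\u2212", "-").replace(",", ".")
    return float(cleaned)


def fold_change(log2fc: float) -> float:
    """Convert a log2 fold change to a linear fold change."""
    return 2.0 ** log2fc


def direction(log2fc: float, threshold: float = 1.0) -> str:
    """Classify a value as 'up', 'down', or 'unchanged'.

    The threshold of 1.0 (a two-fold change) is an assumed cutoff.
    """
    if log2fc >= threshold:
        return "up"
    if log2fc <= -threshold:
        return "down"
    return "unchanged"


if __name__ == "__main__":
    # UGT204B1 row of Table S2: values 2,28 and 3,58.
    for raw in ("2,28", "3,58"):
        value = parse_log2fc(raw)
        print(raw, direction(value), round(fold_change(value), 2))
```

For example, a log2FC of 3,58 corresponds to roughly a 12-fold increase relative to the reference strain, and a log2FC of -1,00 to a halving.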
Table S3. The log2FC of significantly differentially expressed T. urticae UGT genes after host plant transfer to
tomato for different durations (2 hrs, 12 hrs, or 5 generations), or after transfer to different Arabidopsis lines
(qKO, Col-0, atr1D) with different allelochemical (glucosinolate) content, relative to the same strain feeding on the
ancestral, benign host plant, bean. The qKO mutant line lacks glucosinolates and the atr1D line overproduces
glucosinolates compared to the wild type, Col-0.
T. urticae UGT  tomato (2 hrs, 12 hrs, 5 gen)  Arabidopsis (qKO, Col-0, atr1D)
UGT201A1
-1,20
-2,13 -2,16 -1,99
UGT201A2v1
1,19
1,10 1,92
UGT201A2v2
1,77
1,72 2,49
UGT201A4
-1,66
UGT201A7p
1,12
UGT201A8
1,71
1,51 2,52
UGT201B1
-1,03
UGT201B5
2,94
1,43 1,65
UGT201B7
-2,02 -2,61
UGT201B8
-2,41 -4,29
-2,01 -1,80 -2,17
UGT201B9
-2,09 -3,22
UGT201B10
-2,10 -4,31
UGT201B12
1,23
UGT201B13 1,17
UGT201C1 -1,09 -1,00 -1,54
UGT201C2 -1,34
UGT201F2 -1,50
UGT201F3 -1,72
UGT202A2
2,51
UGT202A4
1,59
UGT202A5
-1,03
UGT203G1 1,10 1,76
UGT204A1 1,32 1,22
UGT204A2 2,47 4,83 1,64 1,41
UGT204A3 2,29 1,35 1,38
UGT204A4 1,23 1,09
UGT204A5 1,16 1,85 1,51 2,07
UGT205A1
-1,79
UGT205C1
-1,84 -1,94 -2,77
Total No. 0 9 22 5 13 13
Table S4. The log2FC of significantly differentially expressed T. urticae UGT genes in spider mites living under
diapause-inducing conditions, with a distinction between feeding mites and mites in the non-feeding diapause
state, relative to the London strain living under standard laboratory conditions.
T. urticae UGT cold diapause
UGT201A1 -1,72 -2,82
UGT201A2v2
-1,26
UGT201A4 -1,31 -2,65
UGT201A6
-1,27
UGT201B1 1,77 1,23
UGT201B2 1,87 1,58
UGT201B4p 1,06
UGT201B5 -2,31 -2,28
UGT201B7
-1,72
UGT201B8 -2,24 -2,80
UGT201B9
-1,56
UGT201B10
-1,82
UGT201B12 1,05
UGT201B13 1,32
UGT201B14 1,07
UGT201C1 -1,73
UGT201C2 -1,42 -2,49
UGT201C3 -1,86
UGT201F1 -1,08
UGT201F3 2,18 -1,27
UGT201G1 -1,01
UGT202A2
-1,50
UGT202A4 1,24 2,81
UGT202A5
-1,85
UGT202A6
-1,54
UGT202A7 -1,05 -1,06
UGT202A8
-1,60
UGT202A15 -1,55
UGT202A16 -1,77
UGT203C1 -1,30 -1,55
UGT203D1 2,24
UGT203E1 1,11 1,29
UGT203G1 1,22
UGT204A2 -2,33
UGT204A3 -1,48
UGT204A4 -1,06
UGT204A5 -1,13
UGT205A1 -1,09
UGT205B1 -1,78
UGT205C1
-3,49
Total No. 17 35
Figure S4. Phylogenetic analysis of bacterial UGTs and selected T. urticae UGTs. A maximum-likelihood tree
was constructed with MEGA 5.2 (Tamura et al., 2011) using an insect UGT (Bombyx mori UGT41A1) as the outgroup. The
seven representative T. urticae UGT protein sequences (one from each subfamily) are nested among bacterial UGTs
(from Actinobacteria or Chloroflexi). The scale bar represents 0.5 substitutions per site. Accession numbers of the UGT
protein sequences used can be found in Table S5.
Table S5. Accession numbers of UGT protein sequences used in this study.
Taxa Species UGT name GenBank accession number BOGAS accession number (for T. urticae)
Animal Homo sapiens UGT1A1 NP_000454.1
Animal Bombyx mori UGT41A1 AEW43175.1
Animal Tetranychus urticae UGT201A1 KJ584711 tetur01g03820
Animal Tetranychus urticae UGT201A2v1 KJ584712 tetur60g00080
Animal Tetranychus urticae UGT201A2v2 KJ584713 tetur02g02480
Animal Tetranychus urticae UGT201A3p KJ584714 tetur02g02770
Animal Tetranychus urticae UGT201A4 KJ584715 tetur05g09325
Animal Tetranychus urticae UGT201A5 KJ584716 tetur08g00190
Animal Tetranychus urticae UGT201A6 KJ584717 tetur19g00440
Animal Tetranychus urticae UGT201A7p KJ584718 tetur21g01400*
Animal Tetranychus urticae UGT201A8 KJ584719 tetur184g00030*
Animal Tetranychus urticae UGT201B1 KJ584720 tetur01g05690
Animal Tetranychus urticae UGT201B2 KJ584721 tetur01g05700
Animal Tetranychus urticae UGT201B3 KJ584722 tetur04g07630
Animal Tetranychus urticae UGT201B4p KJ584723 tetur04g07710*
Animal Tetranychus urticae UGT201B5 KJ584724 tetur04g07770
Animal Tetranychus urticae UGT201B6 KJ584725 tetur04g07780
Animal Tetranychus urticae UGT201B7 KJ584726 tetur05g05020
Animal Tetranychus urticae UGT201B8 KJ584727 tetur05g05030
Animal Tetranychus urticae UGT201B9 KJ584728 tetur05g05050
Animal Tetranychus urticae UGT201B10 KJ584729 tetur05g05060
Animal Tetranychus urticae UGT201B11 KJ584730 tetur07g06390
Animal Tetranychus urticae UGT201B12 KJ584731 tetur07g06420
Animal Tetranychus urticae UGT201B13 KJ584732 tetur07g06430
Animal Tetranychus urticae UGT201B14 KJ584733 tetur07g06450
Animal Tetranychus urticae UGT201C1 KJ584734 tetur01g11870
Animal Tetranychus urticae UGT201C2 KJ584735 tetur05g04680
Animal Tetranychus urticae UGT201C3 KJ584736 tetur05g04690
Animal Tetranychus urticae UGT201D1 KJ584737 tetur04g04300
Animal Tetranychus urticae UGT201D2 KJ584738 tetur04g04350
Animal Tetranychus urticae UGT201E1 KJ584739 tetur05g05710
Animal Tetranychus urticae UGT201F1 KJ584740 tetur02g01310
Animal Tetranychus urticae UGT201F2 KJ584741 tetur06g02410*
Animal Tetranychus urticae UGT201F3 KJ584742 tetur06g02430
Animal Tetranychus urticae UGT201G1 KJ584743 tetur08g07460
Animal Tetranychus urticae UGT201G2 KJ584744 tetur12g00360
Animal Tetranychus urticae UGT201G3 KJ584745 tetur02g10390
Animal Tetranychus urticae UGT201H1 KJ584746 tetur08g03000
Animal Tetranychus urticae UGT202A1 KJ584747 tetur15g00340
Animal Tetranychus urticae UGT202A2 KJ584748 tetur22g00270
Animal Tetranychus urticae UGT202A3 KJ584749 tetur22g00310
Animal Tetranychus urticae UGT202A4 KJ584750 tetur22g00330
Animal Tetranychus urticae UGT202A5 KJ584751 tetur22g00350
Animal Tetranychus urticae UGT202A6 KJ584752 tetur22g00360*
Animal Tetranychus urticae UGT202A7 KJ584753 tetur22g00380
Animal Tetranychus urticae UGT202A8 KJ584754 tetur22g00420
Animal Tetranychus urticae UGT202A9 KJ584755 tetur22g00460
Animal Tetranychus urticae UGT202A10 KJ584756 tetur22g00480
Animal Tetranychus urticae UGT202A11 KJ584757 tetur22g00510
Animal Tetranychus urticae UGT202A12 KJ584758 tetur30g00390
Animal Tetranychus urticae UGT202A13p KJ584759 tetur30g02050*
Animal Tetranychus urticae UGT202A14p KJ584760 tetur139g00010*
Animal Tetranychus urticae UGT202A15 KJ584761 tetur22g00440
Animal Tetranychus urticae UGT202A16 KJ584762 tetur22g00970
Animal Tetranychus urticae UGT202B1 KJ584763 tetur10g02090
Animal Tetranychus urticae UGT203A1 KJ584764 tetur01g07060
Animal Tetranychus urticae UGT203A2 KJ584765 tetur04g02350
Animal Tetranychus urticae UGT203A3 KJ584766 tetur06g06100
Animal Tetranychus urticae UGT203B1 KJ584767 tetur09g00220
Animal Tetranychus urticae UGT203B2 KJ584768 tetur09g01650
Animal Tetranychus urticae UGT203B3 KJ584769 tetur09g01660
Animal Tetranychus urticae UGT203C1 KJ584770 tetur05g05090
Animal Tetranychus urticae UGT203D1 KJ584771 tetur10g05770
Animal Tetranychus urticae UGT203E1 KJ584772 tetur16g02300
Animal Tetranychus urticae UGT203F1 KJ584773 tetur36g00340
Animal Tetranychus urticae UGT203G1 KJ584774 tetur36g01060
Animal Tetranychus urticae UGT204A1 KJ584775 tetur02g03300
Animal Tetranychus urticae UGT204A2 KJ584776 tetur05g00060
Animal Tetranychus urticae UGT204A3 KJ584777 tetur05g00070
Animal Tetranychus urticae UGT204A4 KJ584778 tetur05g00080
Animal Tetranychus urticae UGT204A5 KJ584779 tetur05g00090
Animal Tetranychus urticae UGT204B1 KJ584780 tetur02g09830
Animal Tetranychus urticae UGT204B2 KJ584781 tetur02g09850
Animal Tetranychus urticae UGT204C1 KJ584782 tetur11g06460
Animal Tetranychus urticae UGT205A1 KJ584783 tetur11g01230
Animal Tetranychus urticae UGT205A2 KJ584784 tetur11g01250
Animal Tetranychus urticae UGT205A3 KJ584785 tetur32g01240
Animal Tetranychus urticae UGT205B1 KJ584786 tetur32g01230
Animal Tetranychus urticae UGT205B2 KJ584787 tetur32g01250
Animal Tetranychus urticae UGT205C1 KJ584788 tetur11g01830
Animal Tetranychus urticae UGT206A1 KJ584789 tetur08g05390
Animal Tetranychus urticae UGT207A1 KJ584790 tetur08g02490
Animal Daphnia pulex UGT208A1 KJ584791
Animal Daphnia pulex UGT208A2 KJ584792
Animal Daphnia pulex UGT208A3 KJ584793
Animal Daphnia pulex UGT208B1 KJ584794
Animal Daphnia pulex UGT208B2 KJ584795
Animal Daphnia pulex UGT208C1 KJ584796
Animal Daphnia pulex UGT208C2 KJ584797
Animal Daphnia pulex UGT208D1 KJ584798
Animal Daphnia pulex UGT208E1 KJ584799
Animal Daphnia pulex UGT208F1p KJ584800
Animal Daphnia pulex UGT209A1 KJ584801
Animal Daphnia pulex UGT209A2 KJ584802
Animal Daphnia pulex UGT209A3 KJ584803
Animal Daphnia pulex UGT209A4 KJ584804
Animal Daphnia pulex UGT209B1 KJ584805
Animal Daphnia pulex UGT209B2 KJ584806
Animal Daphnia pulex UGT209C1 KJ584807
Animal Daphnia pulex UGT209D1 KJ584808
Animal Daphnia pulex UGT209E1 KJ584809
Animal Daphnia pulex UGT209F1 KJ584810
Animal Daphnia pulex UGT210A1 KJ584811
Animal Daphnia pulex UGT210B1 KJ584812
Animal Daphnia pulex UGT210C1 KJ584813
Animal Daphnia pulex UGT210D1 KJ584814
Animal Daphnia pulex UGT210E1 KJ584815
Animal Strigamia maritima UGT211A1 KJ584816
Animal Strigamia maritima UGT211A2 KJ584817
Animal Strigamia maritima UGT211B1 KJ584818
Animal Strigamia maritima UGT211B2p KJ584819
Animal Strigamia maritima UGT211C1 KJ584820
Animal Strigamia maritima UGT211C2 KJ584821
Animal Strigamia maritima UGT211D1 KJ584822
Animal Strigamia maritima UGT211E1 KJ584823
Animal Strigamia maritima UGT211F1 KJ584824
Animal Strigamia maritima UGT211G1 KJ584825
Animal Strigamia maritima UGT211H1 KJ584826
Animal Strigamia maritima UGT212A1 KJ584827
Animal Strigamia maritima UGT212A2 KJ584828
Animal Strigamia maritima UGT212A3 KJ584829
Animal Strigamia maritima UGT212B1 KJ584830
Animal Strigamia maritima UGT213A1 KJ584831
Animal Strigamia maritima UGT213B1 KJ584832
Animal Strigamia maritima UGT214A1 KJ584833
Animal Strigamia maritima UGT215A1 KJ584834
Animal Strigamia maritima UGT216A1 KJ584835
Bacteria Streptomyces antibioticus UGT101A1 CAA80301.1
Bacteria Streptomyces lividans UGT101A2 AAA26780.1
Bacteria Pantoea agglomerans UGT102A1 AAA64979.1
Bacteria Pantoea ananatis UGT102B1 BAA14125.1
Bacteria Pseudomonas aeruginosa UGT103 AAA62129.1
Bacteria Mycobacterium tuberculosis UGT104A1 NP_216042.1
Bacteria Mycobacterium avium UGT104A2 AAD44208.1
Bacteria Mycobacterium avium UGT104A3 AAD44209.1
Bacteria Mycobacterium leprae UGT104A4 NP_302527.1
Bacteria Mycobacterium tuberculosis UGT104A5 NP_216040.1
Bacteria Amycolatopsis orientalis UGT105A1 CAA11775.1
Bacteria Amycolatopsis orientalis UGT105A2 AAB49299.1
Bacteria Amycolatopsis orientalis UGT105A3 CAA11776.1
Bacteria Amycolatopsis orientalis UGT105A4 AAB49298.1
Bacteria Amycolatopsis orientalis UGT105A5 CAA11774.1
Bacteria Amycolatopsis balhimycina UGT105A6 CAA76551.1
Bacteria Amycolatopsis balhimycina UGT105A7 CAA76552.1
Bacteria Amycolatopsis balhimycina UGT105A8 CAA76553.1
Bacteria Bacillus subtilis UGT106A1 NP_390075.1
Bacteria Staphylococcus aureus UGT106B1 CAA74741.1
Bacteria Bacillus subtilis UGT107A1 CAA05612.1
Bacteria Streptosporangium roseum DSM 43021 UGT108A1 YP_003337539
Bacteria Streptomyces violaceusniger Tu 4113 UGT108A2 YP_004814675
Bacteria Streptomyces rapamycinicus NRRL 5491 UGT108A3 WP_020865434
Bacteria Streptomyces gancidicus UGT108A4 WP_006134531
Bacteria Streptomyces ghanaensis ATCC 14672 UGT108A5 WP_004978524
Bacteria Frankia sp. EuI1c UGT108A6 YP_004014445
Bacteria Frankia sp. CN3 UGT108A7 WP_007512641
Bacteria Mycobacterium smegmatis str. MC2 155 UGT108A8 YP_006569926
Bacteria Mycobacterium vaccae UGT108A9 WP_003931471.1
Bacteria Nonomuraea coxensis DSM 45129 UGT108A10 WP_020546554
Bacteria Actinoplanes globisporus DSM 43857 UGT108A11 WP_020512288
Bacteria Streptomyces viridosporus T7A UGT108A12 WP_016823039
Bacteria Mycobacterium sp. VKM Ac 1815D UGT108A13 WP_019511661
Bacteria Sphaerobacter thermophilus DSM 20745 UGT108B1 YP_003320072.1
Bacteria Thermomicrobium roseum DSM 5159 UGT108B2 YP_002522616
Bacteria Saccharopolyspora erythraea NRRL2338 UGT108C1 WP_009943080
Bacteria Mycobacterium smegmatis JS623 UGT108D1 YP_007294515