Expression and structure-function characterisation of ... - DIVA
-
Upload
khangminh22 -
Category
Documents
-
view
1 -
download
0
Transcript of Expression and structure-function characterisation of ... - DIVA
2
Doctoral thesis at Stockholm University
Department of Biochemistry and Biophysics
©Sue-Li Dahlroth, Stockholm 2008
ISBN 978-91-7155-755-1 pp 1-76
Printed in Sweden by Universitetsservice AB, Stockholm 2008
Distributor: Department of Biochemistry and Biophysics, Stockholm University
Papers I-III are reprinted with permission from the publisher.
3
Abstract
Human viruses coexist with their hosts, sometimes silently and sometimes
by causing a vast range of symptoms. To fully understand these seemingly
simple particles, how they have evolved, their pathogenesis, to be able to
develop new drugs and potentially new vaccines and diagnostic tools we
need to study the individual viral proteins both functionally and structurally.
In order to determine and study a protein structure, large amounts of it is
needed. The easiest way to obtain a protein is to recombinantly overexpress
it in the well-studied bacterium Escherichia coli. However, this expression
host has one major disadvantage, overexpressed proteins might not be folded
or be insoluble. Within the field of structural genomics, protein production
has become one of the most challenging problems and the recombinant
overexpression of viral proteins has in particular proven to be very difficult.
The first part of the thesis concerns the recombinant overexpression of
troublesome proteins in E. coli. A new method has been developed to screen
for soluble overexpression in E. coli at the colony level, making it suitable
for screening large gene collections. This method was used to successfully
screen deletion libraries of troublesome mammalian proteins as well as
complete ORFeomes from five herpesviruses. As a result soluble expression
of previously insoluble mammalian proteins was obtained as well as crystals
of three proteins from two oncogenic human herpesviruses, all linked to
DNA synthesis of the viral genome. The second part of the work presented
here concerns the structural studies of three herpesviral proteins. SOX from
Kaposi’s sarcoma associated herpesvirus is involved in processing and
maturation of the viral genome. Recently SOX has also been implicated in
host shutoff at the mRNA level. With this structure, we propose a substrate
binding site and a likely exonucleolytic mechanism. The holoenzyme
ribonucleotide reductase is solely responsible for the production of
deoxyribonucleotides and regulates the nucleotide pool of the cell. The small
subunit, R2, has been solved from both Epstein Barr virus and Kaposi’s
sarcoma associated herpesvirus. Both structures show disordered secondary
structure elements in their apo-and mono metal forms, located close to the
iron binding sites in similarity to the p53 induced R2 indicating that these
two R2 proteins might play a similar and important role.
4
Table of Contents
INTRODUCTION 8
HUMAN HERPESVIRUSES 11
HERPESVIRUS STRUCTURE 12 HERPESVIRUS SUBFAMILIES 13 ALPHA HERPESVIRUSES 14 BETA HERPESVIRUSES 15 GAMMA HERPESVIRUSES 15
STRUCTURAL GENOMICS 17
PROTEIN PRODUCTION 19
PHYSICAL PARAMETERS 20 FUSION PROTEINS AND PROTEIN TAGS 21 CONSTRUCT DESIGN 22
SCREENING FOR RECOMBINANT SOLUBLE OVEREXPRESSION
23
COLONY SCREENING METHODS 24 THE COFI BLOT (PAPER I AND III) 28
LIBRARY METHODS 31
DELETION LIBRARIES SCREENED WITH THE COFI BLOT (PAPER II) 32
EXPRESSION SCREENING COMPLETE GENOMES 36
STRUCTURAL GENOMICS AND HUMAN PATHOGENS 38
THE DAILY SCOOP (PAPER IV) 39 SCOOP AND OTHER HERPESVIRUS ORFEOMES 41
MOVING FURTHER DOWN THE PIPELINE 42
HOST SHUTOFF IN HHV 45
SOX 46 STRUCTURAL STUDIES OF THE SOX PROTEIN FROM KSHV (PAPER V) 48 SUBSTRATE BINDING 50 SOX AS AN RNASE? 51
NUCLEOTIDE SYNTHESIS 53
RIBONUCLEOTIDE REDUCTASE 53
5
STRUCTURAL STUDIES OF THE R2 SUBUNIT OF THE RIBONUCLEOTIDE
REDUCTASE FROM EBV AND KSHV (MANUSCRIPT IN PREPARATION) 54
FUTURE PROSPECTS 57
ACKNOWLEDGEMENTS 58
REFERENCES 60
6
List of papers
This thesis is based on the following papers, referred to in the text by their
roman numerals.
I. Cornvik, T., Dahlroth, S-L, Magnusdottir, A., Herman, M.D, Knaust,
R., Ekberg, M. and Nordlund P.
Colony-Filtration blot, A new screening method for soluble protein
expression in E. coli.
Nature Methods. 2005, 2(7):507-9.
II. Cornvik, T, Dahlroth, S-L., Magnusdottir, A., Flodin, S., Engvall, B.,
Lieu, V., Ekberg, M. and Nordlund, P.
An efficient and generic strategy for producing soluble human
proteins and domains in E.coli by screening construct libraries.
Proteins: Structure, Function and Bioinformatics. 2006, 1;65(2):266-
73.
III. Dahlroth, S-L., Nordlund, P. and Cornvik, T.
Colony filtration blot for screening soluble expression in Escherichia
coli.
Nature Protocols, 2006, 1(1):253-8.
IV. Dahlroth, S-L., Lieu., V, Haas, J. and Nordlund, P.
Screening Colonies of Pooled ORFeomes, SCOOP: A rapid and
efficient strategy for expression screening ORFeomes in E. coli.
Submitted
V. Dahlroth, S-L., Gurmu, D., Schmitzberger, F., Erlandsen, H. and
Nordlund, P.
Structure of the shutoff and exonuclease protein from the oncogenic
Kaposi’s sarcoma associated herpesvirus
Manuscript
7
Abbreviations
2D-gel Two-dimensional gel
AE Alkaline exonuclease
AIDS Acquired immune deficiency syndrome
CAT Chloramphenicol acetyltransferase
CMV Cytomegalovirus
CoFi blot Colony filtration blot
EBV Epstein Barr virus
GFP Green fluorescent protein
HHV Human herpesvirus
HIV Human immunodeficiency virus
HSV-1, 2 Herpes simplex 1 and 2
IMAC Immobilised metal affinity chromatography
IPTG Isopropyl -D-1-thiogalactopyranoside
KS Kaposi’s sarcoma
KSHV Kaposi’s sarcoma associated herpes virus
MCD Multicentric Castleman’s disease
mCMV Murine Cytomegalovirus
MS Mass spectrometry
NMR Nuclear magnetic resonance
ORF Open reading frame
ORFeome The complete collection of ORFs from one organism
PEL Pleural effusion lymphoma
RNR Ribonucleotide reductase
SAD Single wavelength anomalous dispersion
SCOOP Screening colonies of ORFeome pools
SG Structural genomics
SOX Shut off and exonuclease
VZV Varicella zooster virus
WHO World health organisation
UNICEF United nations children’s fund
8
a) b)
c) d)
e)
Introduction
Viruses are small biological entities, existing on the border of life as we
define it. It cannot survive on its own and behaves as a molecular parasite,
making use of its host’s cellular machinery to create infectious progeny.
They come in many different and diverse forms and have been around since
ancient times, evolving with its surroundings (1). Viruses are divided into
families, subfamilies, genus and strains and can vary in both shape and size.
The smallest known virus belongs to the Parvovirdae family with a size of
20-25 nm and the largest known virus is the Mimivirus of 400 nm (2). They
will carry their genetic material of only a few genes to hundreds as either
DNA or RNA, which is encapsulated in a protective layer of proteins
(capsid) and sometimes a membrane consisting of lipids and proteins.
Proteins in this outer shell will determine their mode of infection and which
hosts they infect, be it bacteria, plants, fungi, animals or humans (1, 2).
Figure 1
Pictures1 of different viruses taken with electron microscopy. a) Adenovirus (~90-100 nm) b)
Bacteriophages (~20-200 nm) c) Herpesvirus (~200 nm) d) Hepatitis C virus (50 nm) e) the
Ebola virus (~80 nm).
1 Pictures are part of the public domain and under no copyright restrictions. http://www.wikipedia.org
9
Certain types of viruses that infect plants can destroy entire harvests of
certain crops causing enormous economic damage each year (3). In humans
viral infections can be life long or temporary and can cause symptoms that
can range from anything like a common cold, diarrhoea, the flu, chicken
pox, and measles to hepatitis, polio, cancer, AIDS (acquired immune
deficiency syndrome), encephalitis, Ebola hemorrhagic fever and so on.
Viruses can roughly be divided into DNA or RNA viruses depending on how
they carry their genetic material. For both classes, the genome can be double
stranded (ds) single stranded (ss), circular or linear. The life cycle of a virus
(Figure 2) can be divided into several stages, attachment to the target cell,
entry (by endocytosis, fusion or genetic injection), replication and shedding
(the process when new viral particles leave the cell). Shedding occurs either
through lysis, budding, apoptosis or exocytosis (2).
After host cell entry, DNA viruses must move its genome into the host cells
nucleus, the site for DNA replication and transcription. The RNA, from
RNA viruses, can either remain in the cell cytoplasm, which will then be the
scene for its life cycle, or it can convert its RNA into DNA that will move
into the nucleus and fuse with the hosts’ genome. These latter types of RNA
viruses, of which the best known is HIV (human immunodeficiency virus)
are called retroviruses and cause lifelong infections. An interesting fact is
that up to 8% of the human genome is believed to be remnants of retroviral
infections (4) and although we carry what is referred to as proviruses in our
genome, they do not as far as we know cause disease.
It is however not only retroviruses that can cause lifelong infections. Many
additional elements determine weather or not a viral infection will persist,
such as the target cell, the individual’s immune system and how the viral
genome is maintained and replicated once inside its host. For instance
herpesviruses are relatively large dsDNA viruses infecting a wide range of
host cells. Herpesviral infections are life long due to target cell type, genome
maintenance and a cunning strategy to evade the immune system (5).
10
c)
d)
a) e)
Figure 2
The general life cycles of viruses from attachment to shedding. a) A DNA virus, that enters
through fusion. The DNA is exported to the nucleus where it is replicated and transcribed.
The mRNA is transported to the cytoplasm and translated into viral proteins. New virions are
shed through exocytosis. b) An RNA virus enters through endocytosis and is stripped. The
RNA is either c) translated directly into viral proteins and new virions are made or d) the
RNA is reversibly transcribed into DNA that enters the nucleus and fuses with the host
genome. This provirus is transcribed and translated into virions that are shed through budding.
e) The RNA genome is injected into the host cell and is translated into viral proteins, which
assemble into new infectious virions that are shed through host cell lysis.
Since viruses are the causative agents for numerous mild symptoms like
colds but also very brutal diseases like cervical cancer, Burkitt’s lymphoma,
liver cancer etc they are intensely studied. Their mode of infection,
pathogenesis and epidemiology as well as their molecular structure and
cellular interactions are of huge interest with the goal of developing
diagnostic tools, vaccines and antiviral drugs. As huge as the discovery and
development of penicillin and other antibiotics, as a treatment for bacterial
infections in the late 1920’s, is the discovery of vaccination (from the Latin
word vacca meaning cow) in the late 18th century. In 1796 Edward Jenner
used the cowpox virus to vaccinate humans, which resulted in protection
b)
11
against the two smallpox viruses that can cause blister-like scarring in the
face, blindness and even death. Smallpox was officially declared eradicated
in 1979. Polio, caused by the poliovirus was for a very long time a dreaded
childhood disease that can cause paralysis, meningitis and even death.
Vaccines against polio were developed by Jonas Salk in 1952 and Albert
Sabin in 1962. Since the start of a global vaccination effort in 1988 by
WHO, UNICEF and the Rotary Foundation, the number of reported cases
has dropped from hundreds of thousands to only thousands each year2.
Several continents are today declared as polio-free and a global eradication
has been proposed, although is still persists in some developing countries
(6). Even though many viral infections can be stopped by vaccinations there
are still many viruses, such as HIV, Hepatitis C, Dengue fever, for which
vaccines do not exist.
Human herpesviruses
An immense number of books and scientific articles have been written about
a wide range of topics concerning herpesviruses, from the overall virion
structure, the mode of transmission, prevalence in ethnic and social groups,
to the regulation of specific proteins in an infected cell. The main purpose of
this thesis is not to give an absolute introduction to herpesviruses, but just a
sneak peak and enough information about them for the reader to understand
the importance of this work as well as give a general grasp of the complex
relationship between these viruses and their hosts.
To date, hundreds of herpesviruses have been identified but only 8 are
known human pathogens and for the remainder of this thesis the focus will
be on the human herpesviruses and they will be referred to by their common
names or abbreviations (Table 1). All herpesviruses are large dsDNA viruses
that share a common overall structure and similar life cycle. Depending on
their mode of infection, the length of the life cycle and target cells, they are
further divided into subfamilies (2, 5). These subfamilies are further divided
into genera, although these will not be referred to or mentioned in this text.
2 http://www.who.int/en/
12
Table 1
A list of the 8 human herpesviruses, their subfamily, formal names, common names and
abbreviations. In the text, they will henceforth be referred to either by their common name or
abbreviations in column 5, except for HHV-6 and HHV-7.
Subfamily Formal name Abbrev Common name Abbrev
Alpha herpesvirus Human
herpesvirus 1
HHV-1 Herpes simplex virus-1 HSV-1
Alpha herpesvirus Human
herpesvirus 2
HHV-2 Herpes simplex virus-2 HSV-2
Alpha herpesvirus Human
herpesvirus 3
HHV-3 Varicella-zoster virus VZV
Gamma herpesvirus Human
herpesvirus 4
HHV-4 Epstein-Barr virus EBV
Beta herpesvirus Human
herpesvirus 5
HHV-5 Human
cytomegalovirus
HCMV
Beta herpesvirus Human
herpesvirus 6
HHV-6 - -
Beta herpesvirus Human
herpesvirus 7
HHV-7 - -
Gamma herpesvirus Human
herpesvirus 8
HHV-8 Kaposi’s sarcoma-
associated herpesvirus
KSHV
Most of what we know today about herpesviruses has been based on studies
of the herpes simplex virus-1 (HSV-1) due to its early identification and easy
cultivation in cell cultures. Recently, however, significant progress has been
made in understanding the structure, biology and pathogenesis of the other
human herpesviruses.
Herpesvirus structure
As already mentioned, the herpesvirus family is a group of dsDNA viruses,
which are ~200 nm in diameter with a genome size of ~130-250 kbp (70-170
genes). What also unites herpesviruses is the common architecture of the
infectious particles (Figure 3). In the infectious virion, the genome is linear
and is wrapped around a core of proteins. This genome is contained within
an icosahedral shell, the capsid, which is made up of two types of oligomeric
proteins, hexon and penton capsomers. Surrounding the capsid is the poorly
characterized tegument, an amorphous mass consisting of various essential
13
and non-essential proteins that are delivered to the cells at the very initial
stage if infection (5). Amino acid sequencing and MS analyses have been
carried out to determine the protein content of the tegument. Apart from
cellular proteins (that might be specifically or non-specifically packaged into
the virions) (7-9) the tegument contains more than 20 for HSV-1 and more
than 30 for HCMV virus-encoded proteins that aid in viral replication and
immune evasion (8, 10, 11). The tegument is surrounded by a membrane of
lipids and glycoproteins, (each of the herpesviruses encodes a set of 20-80
glycoproteins) used for target cell recognition, attachment and entry (5, 12,
13).
Figure 3
Schematic picture of the overall structure of a herpesvirus. The DNA core is surrounded by
the capsid, the tegument and a lipid envelope containing glycoproteins.
Herpesvirus subfamilies
All herpesviruses belong to one of three subfamilies, alpha, beta and gamma,
depending on their host range, length of reproductive cycle and target cells.
In addition to their overall structure, the three herpesvirus subfamilies share
the mode of infection and life cycle (5, 14). A herpesvirus life cycle has two
distinct phases, a latent and a lytic phase. After attachment to the target cell,
the lipid membrane is fused with the cellular membrane and the capsid and
tegument proteins are released into the cell. The capsid is dissolved and the
DNA is transported into the nucleus where is circularises into an episome.
The viral episome is replicated and maintained with the host genome. Only a
very small set of genes is expressed during the latent phase, and their
products block apoptotic pathways and aid in immune evasion and genome
maintenance. During the latent phase, the infected individual shows no
symptoms of infection. At a given signal, for instance a weakened immune
system due to a cold, the virus is reactivated and goes into lytic phase. The
14
molecular signals that cause reactivation of herpesviruses are still not
entirely known, but HSV reactivation has been ascribed to physical damage,
ultraviolet light, hormones, or even fever. In the lytic phase, the virus will
take over host gene expression and shuts it down. It will then start
replication of its own genome and subsequently produce viral proteins. New
infectious virions will assemble and leave the host cell (5). A common
characteristic of herpesvirus infections is that they rarely pose any real
threats to a healthy person and although infection is life long a person can go
through life without even knowing that they are infected. The real problem
occurs in people with weakened immune systems where infection can lead to
organ failure and consequently death (15-18).
Alpha herpesviruses
HSV-1 is the prototype of the alpha herpesvirus family. The alpha
herpesviruses HSV-1/2 causes cold sores and genital herpes while VZV
causes chicken pox and shingles (14). Symptoms of infection will show in
epithelial cells such as skin and mucosa and these cells will consequently be
targeted by the immune system. However, the target destination is sensory
neurons in the brain. Infection starts at the mucosal surface, where the virus
will undergo lytic replication in the surrounding epithelial cells. After this it
enters a nearby sensory neuron, where it will establish a lifelong albeit latent
infection. The capsid travels up the axon on microtubules to the nucleus
where the genome enters the nucleus and circularises into an episome (19,
20). Upon reactivation, the virus will travel back down the axon to epithelial
cells, where further lytic spread will result in symptoms and a hopeful
transmission to new hosts. Although about 90 % of the general population is
infected with HSV it is only very rarely that this will cause any symptoms
other than cold sores (5). However, alpha herpesvirus infections can result in
encephalitis most commonly in children, the elderly, and people with
weakened immune systems (i.e. those with HIV/AIDS or cancer) although
this is very rare and only occasionally fatal (21).
15
Beta herpesviruses
Beta herpesviruses, such as HCMV, HHV-6 and HHV-7 replicate more
slowly than alpha herpesviruses and establish latency in progenitor cells of
the bone marrow, monocytes and T-cells, which are all part of the immune
system (5).
The best characterised member in this subfamily is HCMV, which replicates
in a vast number of cells i.e. macrophages, dendritic cells, colonic and retinal
cells, endothelial cells (22). It has been estimated that >60% of the general
population carries CMV (5) and infection usually goes unnoticed in healthy
adults. It is only in immunocompromised individuals (like HIV patients,
organ transplant recipients as well as unborn babies) that serious conditions
such as pneumonia, encephalitis and retinitis arise (22, 23). HCMV infection
is considered the major cause of these conditions and the subsequent
mortality among immunosuppressed individuals (24, 25).
Gamma herpesviruses
There are two human herpesviruses that belong to the gamma herpesvirus
subfamily, EBV and KSHV. A key feature of these viruses is their capacity
to induce lymphoproliferation and cancers (5, 26). EBV’s major targets are
epithelial cells and B-cells where it also establishes latency. KSHV infects a
wide range of cells in vivo and in vitro for example endothelial cells, B-cells
epithelial cells and fibroblasts (27). After infection the linear genome
circularises into an episome, which tethers itself to the host chromosome by
a specific protein and replicates in concert with the hosts genome (5, 28, 29).
EBV causes infectious mononucleosis, better known as “kissing disease”
and it has been estimated that >90% of the worlds adult population carries
EBV3. EBV was the first human tumour virus discovered and is associated
with Burkitt’s lymphoma and nasopharyngeal carcinoma and to several other
malignancies for instance several types of Hodgkin’s lymphoma and gastric
carcinoma. In AIDS patients EBV causes a number of other lymphomas and
tumours (5, 30). In addition, there is a growing body of evidence, although
very controversial, that suggests a connection between EBV infection and
3 http://www.who.int/en/
16
liver and breast cancer as well as certain types of auto-immune diseases like
multiple sclerosis, rheumatoid arthritis and diabetes (31, 32)
Kaposi’s sarcoma (KS) was first described in 1872 as a rare purplish-
pigmented sarcoma of the skin typically found in elderly men of Jewish and
Mediterranean descent. However, during the onset of AIDS in the 1980’s
there was a noticeable increase of KS and in 1994 the cause was identified as
a new human herpesvirus subsequently called Kaposi’s sarcoma associated
herpesvirus (KSHV) (33).
Besides causing KS, KSHV has been associated with some rare but lethal
lymphomas, pleural effusion lymphoma (PEL) and multicentric Castleman’s
disease (MCD) (5, 34). In Europe and the US <3% of the general population
are infected with KSHV. However, in some sub-Saharan countries where KS
was almost unknown before HIV, >50% are carriers (35) and in these
countries KS has become one of the most clinically described neoplasms. As
with EBV, KSHV has also been linked to other more controversial
conditions like sarcoidosis (5, 36, 37) and multiple myeloma (38, 39).
What really sets KSHV aside from the rest of the human herpesviruses is the
amount of cellular genes that is has copied throughout evolution, in
something called molecular mimicry or molecular piracy, where more than a
dozen cellular genes have been copied. Furthermore, several of these genes
have potential tumour related functions, so called oncogenes, meaning that
they can affect the cell cycle, apoptosis and other types of cell signalling
(40-42). EBV, on the other hand, encodes several highly evolved
transcriptions factors and signalling proteins that induce many of the same
cellular genes that KSHV has pirated into its own genome (5).
Herpesviruses have coevolved with their hosts to establish lifelong
infections in various cell types (43, 44). They are the major cause of several
minor syndromes and major malignancies in humans. Whether or not the
symptoms are manageable with today’s treatments or more severe is
determined by the individual’s immune system and genetic predispositions.
In immunocompromised patients, such as organ transplant recipients, cancer
patients, and AIDS patients a herpesviral infection can cause major
complications which could result in death.
To fully understand these viruses the individual proteins, such as virion
proteins and proteins from various stages of the viral life cycle, can be
17
studied. However, some of these proteins can be hard to come by, since they
are not present or expressed in sufficient amounts in the virion particle or
target cells. The easiest way to obtain these proteins would therefore be to
recombinantly overexpress them. Recombinant proteins from herpesviruses
could help in creating vaccines, yield in high-resolution protein structures
that can help in understanding the viral life cycle, evolution, and
pathogenicity and might serve as potential drug targets.
Structural genomics
It is broadly accepted that the function of a proteins is dependant on the
amino acid sequence and how this chain of amino acids is folded in the
three-dimensional space. It is also widely accepted that the information for
the folding pattern of a protein resides in its linear DNA sequence. However,
this folding information is at present much too hidden from us and in order
to understand how a protein works at the molecular level we must study its
three-dimensional structure, preferably at high resolution.
In the wake of the genomics efforts, the effort to determine the complete
genomic sequence of all organisms (45-48), new emerging fields have risen
with aims to, on a-full-organism-scale determine patterns, trends, functions
and pathways among and within these genes and their corresponding gene
products, be it at the transcriptional, translational or degradation level. Huge
databases with massive amounts of information have become available for
searches, which in turn have yielded in vast amounts of new results and data.
Structural genomics (SG), the effort to structurally determine a large number
of proteins from one organism/genome, is such a field (49). In Table 2, the
approximate number of genes for some organisms and viruses are shown and
the number of unique protein structures for each of them4.
To date less than 5% of the human herpesviral proteins have been
structurally characterised and even less have been solved within the context
of SG (50, 51).
4 http://www.rcsb.org/pdb/home/home.do
18
Table 2
Unique structures submitted to PDB for a few organisms/viruses also shown is the
approximate number of genes5.
Organism ~No of genes Unique structures in PDB
Homo sapiens 20,000 4080
Saccharomyces
cerevisiae
7,700 533
Escherichia coli 4,400 1403
HIV 9 32 (6)
EBV(HHV-4) 80 12
KSHV (HHV-8) 88 9
Within structural genomics projects there usually exists a common approach
to which all target proteins must succumb. This approach/strategy is usually
referred to as a pipeline and has the general outline of target selection,
cloning, overexpression, purification, crystallisation and structure
determination. Even though these steps are common, the execution of them
may vary, for instance how the targets are chosen (what criteria they are
based on) how the cloning is done (i.e. digestion followed by ligation,
recombination cloning or ligase independent cloning), which expression
system is used (i.e. bacteria, yeast, insect cells) (52-60). But the common
denominator to all these steps within SG initiatives is the aim to achieve as
high output of target structures as possible. To be able to work with as many
target proteins as possible and to increase success rates new methods within
all steps have been developed and evolved. In the early days of SG it became
evident that one of the major obstacles was, and still is, the production of
suitable protein samples of sufficient amount and purity (61). Since then,
statistics for all the steps in the SG pipeline from various worldwide SG
efforts have been gathered. SG initiatives have together cloned more than
100,000 targets from various organisms and produced suitable protein
samples for downstream processing for about 1/3 of the cloned targets7.
5 http://www.genomesonline.org 6 For HIV, the number of structures exceed the number of genes. The HIV genes are subject
to splicing, leaky scanning and frame shifting and products of translation are subjected to
protease activity hence creating more proteins than genes. In addition, structures from
different strains of HIV have been solved. 7 http://targetdb.pdb.org/statistics/targetstatistics.html
19
Protein production
Since certain proteins are present only at very low levels and at certain time
points in a living organism, the most ethical and effective way of obtaining a
protein of interest is to recombinantly overexpress it. High-resolution
structures of biological macromolecules, such as proteins, can be determined
with several methods. The most convenient and efficient methods are X-ray
crystallography and NMR. For X-ray crystallography, the dominating
method, an absolute requirement is well diffracting protein crystals.
Numerous crystallisation trials and optimisations might be necessary to
produce a well diffracting crystal and therefore large amounts of pure and
soluble protein is needed.
The most widely used expression host to date for recombinant
overexpression is the well-studied bacterium E. coli. The major reason for
using E. coli is that with great ease and low cost, large amounts of biomass
can be generated (62-64). E. coli also possesses other advantages beneficial
for structural biologists. For instance, it will produce a very homogenous
protein sample since it lacks the machinery to create certain covalent
modifications like glycosylations.
In principal, E. coli has the same protein production machinery as any other
cell, although is differs in ways that might create problems. For instance,
when trying to overexpress a protein, the bacterium might form large
insoluble aggregates called inclusion bodies (62, 65, 66). Inclusion bodies
consist of unfolded or partly folded proteins and might take up a very large
volume of the cell (67, 68), although there have been situations where
correctly folded and active proteins have been found in inclusion bodies
(69). The already mentioned lack of covalent modifications, which might be
needed for proper protein folding and function, could be one reason for
inclusion body formation. E. coli may also lack certain important tRNAs
that could lead to halted translation as well as lacking certain folding
partners such as chaperones (67, 68, 70-73). Another problem when
overexpressing proteins is in vivo degradation by indigenous proteases or
that the target protein in itself is toxic to the bacterium (74). The
overexpression of eukaryotic proteins in E. coli is especially troublesome
(75), although even when trying to overexpress indigenous E. coli proteins,
in vivo proteolysis occurs as well as inclusion body formation (65, 66, 76).
20
In contrast to prokaryotic proteins, eukaryotic proteins tend to be, on an
average, larger and consist of more domains that are connected by flexible
linker regions (77) and it has been shown that the size of a protein correlates
with the success rate of its soluble expression in E. coli (53). As already
mentioned, the official success rates from SG efforts for soluble production
of prokaryotic proteins, is 50% and 30% for eukaryotic proteins when a
basic SG pipeline is used (53, 60). However, in these statistics there is no
information of the success rates in regard to full-length protein expression or
protein size.
To increase the likelihood of obtaining soluble protein, when using E. coli,
there are generally two approaches: i) either change physical parameters of
the experiment (like the bacterial strains, culture conditions, promoters or
fusion partners or ii) change the properties of the target protein.
To ensure high success rates and low costs, the soluble recombinant
expression should be screened before proceeding with large-scale expression
and purification.
Physical parameters
To compensate for rare codons, tRNAs can be co-expressed in E. coli (78,
79) and numerous strains are commercially available that can co-express
these tRNAs. Typically these strains also lack certain proteases that could
lead to protein degradation (80). Even strains that should be more resistant to
toxic proteins and strains that provide the right oxidizing conditions,
permitting disulfide bonds to be formed, have been created (81-83). Another
parameter that can influence the recombinant expression of a target protein
is the culture conditions (62). The growth medium can be changed as well as
the growth temperature. Although very few systematic studies have been
reported (84, 85), it is still believed that the expression medium could
influence soluble expression. In regard to expression temperatures, more
support exists for its influence on the expression (62, 72) than for the
expression medium. For several proteins it has been shown that by
decreasing the temperature, target proteins could be rescued from the fate of
inclusion bodies (62, 85-87). How the solubility of a protein in a bacterium
correlates with a decrease in expression temperature is not fully understood
and might be due to a combination of factors involved in the transcription
21
and translation as well as folding of the protein. When the transcription and
translation machineries slow down, due to decreased temperature, the
protein might have time to fold in a proper way. The attractive forces,
between hydrophobic parts, that could lead to protein aggregation are
potentially weaker at low temperatures (88). It has also been shown that the
expression of several indigenous chaperones are induced at lower
temperatures (89).
Although new bacterial strains have been created and culture conditions are
varied, the problem of inclusion bodies and proteolysis still persists,
especially for mammalian proteins.
Fusion proteins and protein tags
Fusion proteins are often large soluble proteins that are subcloned upstream
or downstream of the target protein. It has been shown that by adding a large
soluble fusion protein, the folding propensity and therefore the solubility of
the target protein itself can be increased (75, 90-94). The most widely used
fusion proteins are glutathione S-transferase (GST), maltose binding protein
(MBP) and thioredoxin (TrxA). Fusion proteins do not only serve as
solubilising factors but can also aid in purification and detection of the target
protein. Although there are obvious benefits of adding a large soluble
protein, there are some clear limitations to it. For instance, the fusion protein
should preferably be removed before crystallisation trials either by laborious
recloning or digestions. Large fusion partners might also alter the solubility
of the target protein in a negative way and removal could therefore result in
an unpleasant surprise, such as protein aggregation and precipitation (94-98).
Instead of adding a large fusion protein for detection and purification, that
could potentially alter the solubility of the target protein, small peptide tags
can be used instead. The most commonly used peptide tag is the His-tag. A
stretch of six histidines is added, in the cloning step either upstream or
downstream of the target protein, which has the ability to bind divalent
cations, typically Ni2+ or Co2+. If these cations are immobilised on a gel
resin, the target protein can be caught and separated from non-histidine-
tagged proteins. This method, which we today refer to as IMAC
22
(immobilised metal affinity chromatography), was first described in the late
1980’s (99) and has since then revolutionised recombinant protein
purification. Commercially available antibodies and probes, conjugated with
horseradish peroxidase (HRP) or alkaline phosphatase (AP), directed
towards His-tags have since then been generated and therefore a His-tag can
be used for target protein detection based on immunochemicals.
Construct design
The second approach to increase the chances of obtaining recombinant
soluble expression is to change the characteristics of the target protein. As
already mentioned the expression of prokaryotic proteins can be problematic
and success rates for soluble expression of such a protein is approximately
~50 %. However, for eukaryotic proteins the success rates drop significantly
to ~30%8. This could be due to the larger size of these proteins, the number
of domains and flexible linker regions (which might be protease sensitive),
the requirement of specific chaperones for folding and the requirement for
post-translational modifications. A natural reaction to these problems has
been the attempts to clone and express the individual domains of the target
protein, which has been proven to be very useful (57, 75, 100-103). This
strategy is based on the theory that if the full-length protein fails to express
or crystallise, perhaps its individual domains might. Domains can be
predicted either experimentally, with limited proteolysis coupled with MS
analysis (101, 102), deuterium exchange MS (104) or with special computer
programs (105).
The latter strategy, designing new constructs partly with help of domain
predictions, has successfully been employed within SG-initiatives where
several expression constructs for one target protein are generated (Figure 4).
It has been shown that by using this approach the probability of generating
soluble protein increased two-fold (100). Since these types of domain
prediction programs have a fair degree of uncertainty, several expression
constructs have to be designed that start close to the predicted domains.
8 http://targetdb.pdb.org/statistics/targetstatistics.html
23
b)
Figure 4
Construct design. a) A schematic picture of a result from a domain prediction program of a
multi-domain protein. b) New expression constructs are designed to define domain borders in
hope of finding a better expressing construct.
The solubility of a protein can also be increased by making random or
focused mutations or deletions. This approach is called in vitro evolution and
will be described later in the text. Whether new expression constructs are
generated in a focused or random approach generate, all of them have to be
screened for soluble expression.
Screening for recombinant soluble overexpression
The aim of a screen, no matter how it is executed, is to rapidly reduce a large
number of clones/targets to a more easily handled number.
The traditional approach when screening for soluble expression is usually
done with liquid cultures in individual vials and more recently in a 96-well
format. 1 ml cultures are grown and induced in parallel, cells are harvested,
lysed and soluble material is separated from insoluble by centrifugation
and/or purification. The soluble fraction is usually analysed by SDS-PAGE
gels (53, 60, 85, 106-108). Robots can perform certain steps in this process
while others still have to be done manually.
A couple of years ago an effort at genome wide expression screening was
attempted. Some 10,167 ORFs from the nematode Caenorhabditis elegans
was screened for soluble expression in E. coli in a 96-well format. Soluble
expression could be detected for 1,356 ORFs corresponding to a success rate
of 13% (109). This number is very much lower than more recently reported
success rates for eukaryotic proteins and it was later shown that many ORFs
a)
24
were wrongly annotated, had mispredicted gene boundaries and were out of
frame (110). Although only one vector and one expression strain was used
and some steps were automated, the workload of this effort is likely to have
been very large.
In our lab, we have previously developed a method that utilises filtration in
order to separate soluble material from insoluble called FiDo (filtration dot
blot) (111). Liquid cultures are grown and induced in a 96-well plate. A
small fraction of the culture is then transferred to another 96-well plate with
a low protein binding submicron filter in the bottom. The liquid media is
removed by vacuum and a bacterial pellet formed on top of the filter. The
pellet is either resuspended in a denaturing lysis buffer (solubilising all
proteins in the bacteria) or a native lysis buffer (only releasing the soluble
proteins). Vacuum is reapplied and the filtrate is collected in a collector
plate. The filtrate is then used to make dots on a membrane with a high
protein affinity, like nitrocellulose. The nitrocellulose is then blocked and
probed with an antibody or probe and developed like a Western blot.
The FiDo screen has also been modified to accommodate for an affinity
purification step to be able to determine the purifying ability of the target
protein (112).
In order to secure a high output of structures in an SG pipeline the best
strategy would be to generate multiple constructs subcloned with different
fusion proteins/tags, which would then be expressed in different strains and
at different temperatures. A quick calculation shows that working with 96
targets, creating 10 variants of each (based on domain predictions) cloned
with 2 different fusion proteins/tags and expressed in two different strains at
two different temperatures, would generate 7,680 different experiments and
although most of the work would be done in a 96-well format, it would be
quite labour intense. Therefore these types of combinatorial experiments are
currently not pursued within SG initiatives due to high costs and heavy
workload.
Colony screening methods
As already mentioned, solubility can be increased by adding a large fusion
protein. The solubility would then subsequently be screened based on the
physical characteristics of a soluble protein, such as its ability to be
25
separated from insoluble protein by centrifugation, filtration or purification
of liquid cultures.
However, solubility of a protein is intimately connected to its folding and
activity and could therefore be monitored by fusing the target protein to a
reporter protein, that when folded correctly would give rise to an easily
monitored phenotype. The theory relies on, that if the reporter protein is well
folded, hence soluble, the target protein should also be soluble and well
folded (Figure 5).
Figure 5
Schematic picture of a target protein (grey) fused to a reporter protein (black). a) If the target
protein is well folded and soluble, the reporter protein will fold and the phenotype can be
observed. b) If the target protein misfolds, the reporter protein will misfold and no phenotype
will be seen.
Solubility would not have to be screened in liquid cultures but instead at
colony level and would therefore lift the heavy burden of handling liquid
cultures since thousands of colonies, if they all carried different constructs,
could potentially be screened on one colony plate.
Waldo et al described in 1999 a method to monitor folding and therefore
solubility in colonies by fusing the target protein to GFP (green fluorescent
protein). Only bacterial colonies that fluoresce would have complete read-
through and express a well-folded and soluble target protein and vice versa
(113).
Another method relying on the same theory, presented by Maxwell et al, is
to fuse CAT (chloramphenicol acetyltransferase) to the protein of interest.
Only a bacterium that can grow in the presence of the antibiotic
chloramphenicol would express the target protein (114). Both these methods
are easy to use and allow thousands of colonies to be screened in one
experiment.
a) b)
26
a)
b)
Nevertheless, these types of methods have potential drawbacks. Firstly, the
reporter protein might affect the solubility of the target as already
mentioned. Secondly, false positives are seen for example when GFP is used
(115, 116), and thirdly, the reporter protein has to be removed before the
target protein can be used for structural studies.
In order to avoid the drawbacks of adding a large soluble protein, several
methods only relying on a small reporter peptide have been developed. In
practice, these methods rely on splitting a large reporter protein in two,
where none of the parts are active on their own. The target protein is fused to
a small peptide, corresponding to a vital part of the reporter protein. If the
target protein is soluble and well folded the phenotype should be detectable
when the “rest” of the reporter protein is added or co-expressed and vice
versa (Figure 6).
Figure 6
The theory behind a split reporter protein. The target (grey) is fused with a part of the reporter
protein (dark grey). If the target protein folds the tag will subsequently fold and when
combined with the rest of the reporter protein (black), the phenotype will show. If the target
protein is insoluble, the tag will be unfolded and no phenotype will be seen when adding the
rest of the reporter protein.
Wigley et al (117) used -galactosidase in this manner. -galactosidase was
split into a 52 amino acid -fragment, which was fused to different control
targets, and an -fragment corresponding to the rest of the protein. When the
control targets were expressed, colonies would turn blue or white depending
on the solubility. This method was effectively used to screen a hybrid gene
library of the human P450 for proper folding and solubility. Even though
this method seems to work well, especially when wanting to monitor slow
folding processes, a certain degree of false positives and negatives could be
27
observed. Additionally, in a review by G.S Waldo from 2003 it was claimed
that the -fragment could render proteins insoluble (118).
The developers of the previously described GFP-method have recently
developed it to better suit the demands of a smaller tag (119). Only part of
GFP, representing -strand 11, is fused to the target protein via a flexible
linker. This approach has successfully been used to screen mutation libraries
of proteins from Mycobacterium tuberculosis (115).
Both these methods have approached the problem, that reporter proteins can
remain active although it is fused to an insoluble target protein as well as
that they might alter protein solubility, by splitting the reporter protein in
two pieces. Both these methods have the advantage that they work in vivo
and in vitro making it easy to monitor solubility of the protein in a cell
lysate. A small complication is however, that in order for the rest of the
reporter protein to be present in the cell, two plasmids have to be used and
the protein has to be co-expressed.
A method that can directly monitor folding has been described by DeLisa et
al (120). It relies on that the twin-arginine translocation (Tat) pathway only
moves well-folded proteins across the inner membrane to the periplasmic
space of the E. coli cell (121). By fusing the target protein to a Tat
exporting-signal at the N-terminal and to -lactamase (which only confers
antibiotic resistance in the periplasm) at the C-terminal, the folding of the
target protein can be directly monitored. This method was tested on a
number of well-characterised proteins known to be soluble in the cytoplasm
(like GST, MBP, GFP, TrxA etc) as well as some cytoplasmic unstable
proteins and a good correlation could be seen. Potentially, -lactamase could
influence the folding in some unforeseen way and there were no reports on
any upper size-limit of the proteins that could be exported with the Tat-
pathway. In addition, before the target protein can be used the tag and/or the
-lactamase have to be removed either by recloning or digestions.
By fusing a gene to a C-terminal biotin acceptor peptide and therefore
enabling biotinylation in vivo by E. coli biotin ligase, BirA (122), detection
and affinity purification can be used based on the very strong binding of
biotin to the protein avidin (123).
Tarendeau et al (124) used this type of approach at the colony level.
Colonies carrying an expression construct with a C-terminal biotin acceptor
peptide (Avi-tag) are arrayed very closely, by a robot, on a nitrocellulose
28
membrane and induced for expression. The Avi-tag will be biotinylated in
vivo and after lysis, colonies expressing biotinylated proteins can be detected
on the nitrocellulose by probing with a fluorescent streptavidin conjugate. A
deletion library of almost 27,000 constructs (and with a seven-fold
oversampling!) of the influenza virus polymerase PB2 was successfully
screened in this manner and the approach was called ESPRIT (expression of
soluble proteins by incremental truncation). A major advantage is (at least in
theory) that since neither recloning nor digestions should be needed,
identified constructs can, if a plate replica were to be made, go directly in to
scale up experiments and subsequent affinity purifications based on biotins
affinity for avidin could be done. In addition no reports have so far surfaced
on any potential negative side effects by adding an Avi-tag, whether it
affects the solubility or if aggregated protein could potentially be
biotinylated and misinterpreted as soluble. Although this is a very powerful
method in its present implementation it still relies on expensive robotics,
which would not be part of standard laboratory equipment, and the use of
costly streptavidin-magnetic beads.
Several colony-based fusion protein or fusion tag screens have been
developed and evolved into well performing strategies. They are all elegant
methods that allow for a swift and easy way to identify targets that could be
suitable for large-scale protein production. Most importantly, however, is
that when it comes to colony-based screens they are best put to use when the
desire is to screen large collections of gene variants.
The CoFi blot (paper I and III)
We wanted to develop a method that works as a solubility screen at the
colony level but neither relies on a reporter protein that could potentially
affect the solubility of the target protein nor makes use of a tag that needs a
reporter protein to be co-expressed. Since we had very good experiences
with the previously described FiDo screen we adapted it in such a way so it
would allow us to screen soluble expression at the colony level.
In this method colonies are grown on a plate, called the master plate, and are
transferred to a submicron low protein-binding filter that can separate
inclusion bodies from soluble protein. Colonies on the master plate are
29
a) b)
c) d)
e) f)
g) h)
regrown and colonies on the filter are induced for expression by placing the
filter on a plate containing IPTG. After induction the filter is used to make a
filter sandwich (Figure 7). Upon lysis, soluble protein will diffuse through
the filter and attach to a high protein-binding membrane, such as
nitrocellulose. Detection is done by incubating the nitrocellulose with probes
or antibodies directed at the target protein and using standard
immunochemicals.
Figure 7
A schematic picture of the CoFi-blot method. a) Colonies are grown on a master plate, which
are b) transferred to a Durapore filter and c) expression is induced on a plate containing IPTG.
d) The filter with the colonies is then used to make a filter sandwich consisting of the
Durapore on top of a nitrocellulose and a Whatman paper with lysis buffer. e) Close up of
sandwich. f) Upon lysis, by repeated freeze thawing, soluble protein will diffuse through the
filter and bind to the nitrocellulose. g) The nitrocellulose is then blocked and incubated with
probes or antibodies directed at the target protein. h) The signals are detected by
chemiluminescence.
In our case we tend to use His-tag fusions, which efficiently can be probed
with Ni2+ conjugates as well as be used for purification. Colonies that give
rise to signals are picked from the master plate and can either go directly into
scale-up experiments or be further analysed. We chose to name this method
the CoFi blot (colony filtration blot).
30
In order to verify how well the CoFi blot works we decided to compare it to
the traditional method of growing liquid cultures and separating soluble
protein from insoluble by centrifugation.
32 eukaryotic and 24 E. coli proteins were subcloned in two different
expression vectors yielding either an N-terminal His- or FLAG-tag. Targets
were both grown and induced as colonies, which were subjected to the CoFi
blot, and as liquid cultures. The bacteria in the liquid cultures were
harvested, lysed and centrifuged. Dots were made on a nitrocellulose of both
total and soluble protein content and developed in the same way as for the
CoFi blot.
Figure 8
The CoFi blots performance was compared to traditional way of screening for soluble
expression, centrifugation. In the picture the results and correlation between the two methods
are shown for 32 eukaryotic proteins. Targets have been noted with (+), (-) or (0). (+) for
where the two methods are in agreement, (-) for disagreement and (0) for when there is no
total expression. Constructs with no total expression have been excluded from the statistics.
We have shown that the two methods are in 84% agreement with a fairly
good correlation between expression levels. We have also shown that the
CoFi blot is reproducible by re-screening clones in quadruplicates.
Differences between the two methods can be due to the different metabolic
states for a bacteria growing in a liquid culture as opposed to growing on a
solid support. Another factor that could explain the deviances is the affinity
of the filter for the proteins. Even though the filter is a low-protein binding
31
filter, some proteins might still stick and therefore the CoFi blot would not
work as a detection method for these particular proteins.
In summary, we have developed a colony expression screen, called the CoFi
blot, with good reproducibility and correlation to a more traditional
expression screen. The CoFi blot can be applied to any type of protein to
which antibodies have been generated or that contain a detectable tag, such
as a His-tag. Since it utilises standard molecular biology reagents and
equipment, it is a method suitable for any laboratory. Another advantage is
that the CoFi blot only utilises a small affinity tag for detection and not a
large fusion protein and therefore there is little risk that the tag will influence
the solubility/folding of the target protein. In addition, colonies containing
constructs that yield soluble protein, can be directly picked from the master
plate and be subjected to scale up experiments. One major advantage is that
the CoFi blot detects solubility after lysis and separation of cell debris,
meaning that the target protein has survived two additional steps required for
scale-up purification and is therefore, potentially, a better indicator of a
useful protein as compared to other methods where solubility is detected in
the cell before lysis. The CoFi blot also carries the same advantage of other
colony-based screens; the ability to screen thousands of colonies in a single
experiment.
Library methods
A successful strategy to make a protein soluble is by making mutations,
amino acid substitutions and deletions that could favour protein expression
and/or folding (115, 125). These types of alterations could be made in a
focused manner like changing the design of the expression construct based
on domain predictions (generating only a limited number of variants), or by
randomly generating thousands of variants (by error prone PCR, DNA
shuffling, truncations etc) creating a library. However, beneficial mutations
can be quite rare and one could potentially end up with a needle in a
haystack scenario. A colony screen is therefore an efficient mean to lift the
heavy burden of screening overexpression of such a library with traditional
methods.
32
As already mentioned, compared to prokaryotic proteins eukaryotic are often
larger and consist of more domains connected by flexible linker regions. As
discussed, a very common approach to increase a protein’s solubility is also
to change the expression construct by predicting domains. Domains are
predicted by computer programs, and expression constructs close to
predicted domain borders are cloned and tried for soluble expression. Even
though this approach has shown to be very successful, domain predictions
based on computer programs are not always accurate.
Domain borders can also be determined experimentally by limited
proteolysis coupled to MS, but this approach poses a kind of catch-22
situation since it relies on soluble expression from start. We wanted to
develop a strategy with which we could obtain or enhance soluble protein
expression and potentially map domain borders at the same time.
Deletion libraries screened with the CoFi blot (paper II)
We decided to produce deletion libraries of 21 human proteins that
previously, as determined in paper I, had been insoluble or did not express
protein at all to in an effort render them soluble and at the same time
investigate if we could experimentally map domain borders. Deletion
libraries were generated at DNA-level from the 5’ end of the target gene by a
method described by Henikoff et al (126) and is now available as a kit called
Erase-a-Base® (Promega). The target gene has to be cloned into a plasmid,
which is linearised and incubated with exonuclease III that removes
nucleotides from one strand of the DNA at a constant rate. By removing
timed aliquots a deletion library spanning the entire length of the gene can
be created.
In order for this method to work, two restriction sites have to be introduced
in front of the target gene to ensure unidirectional deletion. One to create a
5’-overhang that is susceptible to exonuclease III digestion and one that
generates a protecting 3’-overhang at the opposite end of the gene. The
remaining single strand is removed by S1 nuclease and the ends are flushed
by Klenow polymerase. The linear blunt-ended fragments are then re-ligated
and transformed into a cloning strain (Figure 9). Colonies are grown and
harvested and the library is extracted and transformed into an expression
33
strain. The library inside the expression strain is then screened for soluble
expression with the CoFi blot method.
Figure 9
Schematic picture of the Erase-a-base procedure. A plasmid containing appropriate
restrictions sites in front of the target gene is linearised. A deletion library, spanning the
entire gene, is generated by adding exonuclease III and removing timed aliquots. The
remaining single strand is removed by SI nuclease and blunt ends are generated by Klenow
polymerase. The plasmids are religated and a cloning strain is transformed.
21 human proteins were chosen, where 19 were successfully recloned into an
expression vector with appropriate restriction sites in front of the genes. For
detection and purification purposes we added a His-tag in front of the
restrictions sites. Deletion libraries were generated for all 19 genes and
positive colonies could be seen for 17 (Figure 10).
34
Figure 11
SDS-PAGE of constructs
identified with the CoFi blot
method a) The best expressing
constructs were purified by small
scale IMAC and the eluate was
run on SDS-PAGE. b) 6 different
constructs from two proteins
showing the difference in
construct length and their level
Figure 10
CoFi blots of 6 targets. Positive colonies can be seen as black spots on a, b, c, d, and f and
negative colonies are seen as very faint grey spots. Since this is a deletion library from the 5’-
end, only one third of the colonies can in theory be positive. e) Example of a completely
negative CoFi blot. Positive colonies were picked for analysis and identification purposes.
From each target, 24 positive colonies were picked and used to inoculate
small liquid cultures that were induced for expression. The soluble fraction
was purified by IMAC and the eluate was analysed as Dot blots. At this
stage 14 targets gave soluble expression and out of these, 11 could be seen
as a band on an SDS-PAGE gel (Figure 11). The remaining three proteins
with no soluble protein detected were either extracellular or membrane
associated.
35
Because we wanted to see if this approach could be used to experimentally
determine domain borders we ran two domain prediction programs, PFAM
and BIOZON9, and compared them with our translational start points. As can
be seen from (Figure 12), our well expressing constructs have translational
starts correlating well with predicted domain borders. Interestingly, for some
clones they even have a translational start within a predicted domain and
these constructs would therefore not have been identified using standard
domain predictions. It should also be noted that the two domain prediction
programs, determine domains based on different criteria, and therefore a
difference can be seen in their domain predictions, which shows the
importance in these type of experimental approaches compared to rational
construct design based on domain predictions.
Figure 12
Domains as predicted by PFAM and BIOZON and translational starts of constructs identified
using the CoFi blot method.
9 http://www.biozon.org/
36
Another very intriguing observation is that for most targets there are clones
that express soluble protein that is full length or close to full length. This
could be the result of favourable changes in these regions that affect the
transcription, translation and/or folding. This would be in concurrence with a
report that regions downstream of the initiation codon can act as
translational enhancers (127).
From the initial set used in paper I, where only 30% of the mammalian
proteins could be expressed we increased the success to 60%.
In retrospect we have developed an experimental method, by generating
random deletions to increase soluble expression and at the same time define
domain borders.
The CoFi blot is routinely used in our lab and has now also been adapted to
allow the screening of recombinant overexpression of integral membrane
proteins. It was verified by screening random mutation libraries of nine
integral membrane proteins (128) and has now been used to screen a very
large part of all human membrane proteins (Unpublished results).
Expression screening complete genomes
The main objective of an expression screen is to slim down the number of
samples from a larger set. It could be applied to a library of different variants
of the same gene, as earlier described, or a collection of genes representing
for example a protein family, a pathway or a whole genome. Particularly the
handling of an entire genome and the cloning into an expression vector could
become quite troublesome, and new strategies are needed to handle very
large gene collections for expression studies. A good start point, for such
studies are Gateway® adapted entry clones allowing for the easy subcloning
of a gene into an expression vector through recombination (Figure 13).
Today several ORFeomes (a collection of all ORFs from one
organism/genome) for a number of genomes are available as Gateway clones
(110, 129-132).
37
Figure 13
Schematic picture of Gateway® cloning from Invitrogen. A PCR product is generated flanked
by specific recombination sites that are complementary with a vector, pENTR. The PCR
product is moved through recombination into the pENTR creating an Entry clone with new
recombination sites. The Entry clone is incubated with an expression vector containing
recombination sites complementary to the new sites. An expression clone is created.
As earlier mentioned a genome wide expression screen was performed on
the C. elegans ORFeome v 1.1 with very low success rates. This screen was
done in the more traditional 96-well format. Even though all ORFs were
available as Gateway clones and some steps were automated, this work was
probably very labour intense. Another attempt at handling and screening a
part of the C .elegans ORFeome was reported by Gillette et al (133). This
strategy named POET (pooled ORFeome expression technology) was based
on pooling 688 C. elegans ORFs and in a single recombination reaction
transfer them into an expression vector with a His-tag. An expression strain
was subsequently transformed with this pooled ORFeome library. The
transformation reaction was then used to inoculate a larger volume to make a
large expression culture. This culture was induced for expression, harvested
and lysed. To separate soluble protein from insoluble, the lysate was purified
by IMAC. To analyse expression and separate the different overexpressed
proteins from each other, the eluate was run on a 2D-gel. Intense and deviant
spots were picked for MS analysis. 165 spots were picked and 50 were
identified as C. elegans proteins by MS. Out of these 50 proteins, 12 were
38
chosen for small-scale expression and 6 out of these for large-scale
expression.
This method acts as a rough sieve to identify strong expressers in a large
gene collection. It also makes the handling, from cloning to expression
screening simpler. However, individual DNA measurements and the pooling
and subpooling of equal amounts must have been tiresome. In addition, this
method might not be readily used in labs with more modest equipment,
referring to the access of 2D-gel electrophoresis or MS equipment. POET is
also haunted by some limiting factors, such as the step of converting a
transformation reaction into a large culture and the iso-electric focusing
range on the gel. In a culture containing several variants of a bacterial strain
there is always a risk that one or a few variants take over the entire culture
and several targets might be lost in this step. The iso-electric range of pH 4-7
will result in that only proteins with a pI within this range will be separated,
excluding a vast amount of potentially overexpressed proteins. In the end,
anything identified must be fished out from the original collection and
recloned for any downstream processes.
Structural genomics and human pathogens
In the beginning of this thesis, a general introduction was given to human
pathogens in the form of viruses. Although recombinant protein and protein
structures from viruses could aid in the development of vaccines and drugs,
relatively few protein structures of viral proteins have been solved10.
Especially from SG initiatives submission of viral protein structures in PDB
have been modest (53). This could partly be explained by that the focus,
within these initiatives, is on other types of proteins. In two articles (50, 51),
where the focus or part of the focus was on viral proteins, poor success rates
were reported. The main problem was low expression and low solubility in
E. coli. Success was predominantly achieved when each target was treated
individually or expressed in insect cells.
10 http://www.rcsb.org/pdb/home/home.do
39
The daily SCOOP (Paper IV)
We wanted to use the CoFi blot as a homing method to, within a genome,
identify easily expressed targets and targets in need of special care. We
designed a general strategy and applied it to the ORFeome of KSHV. KSHV
consist of approximately 90 ORFs, which were supplied to us as 113 entry
clones (132). ORFs coding for integral membrane proteins were cloned as
full-length constructs, as well as into separate non-transmembrane domains.
In similarity to POET, our approach also relied on pooling of Gateway®
adapted entry clones and a single recombination reaction. However, instead
of tedious DNA measurements we grew separate cultures of bacteria
containing the individual entry clones over night. OD600 measurements
showed similar density for all of them and equal amounts were pooled.
Plasmids from this pool were extracted and ORFs were moved, in one
recombination reaction, into an expression vector with an N-terminal His-
tag. A cloning strain was transformed with the reaction and plated. The
library was harvested and an expression strain was transformed and plated.
4,000 colonies were subjected to the CoFi blot procedure. Positive colonies
were analysed based on their purifying propensity with small-scale
purification. Identification was done through regular Sanger sequencing. 23
unique constructs could be identified as expressing soluble protein with this
approach. 11 out of these 23 constructs were randomly chosen for large-
scale expression and purification.
In order to benchmark this method we compared it to the more traditional
handling and screening; individual subcloning, expression screening and
subsequent affinity purification all in a 96-well format. With the more
traditional setup 25 constructs could be identified as soluble expressers.
When comparing our new approach, now called SCOOP (screening colonies
of ORFeome pools), with the more traditional approach 23 and 25
constructs, respectively, were identified as yielding soluble protein in E. coli.
20 out of these 23 and 25 were the same for the two experiments. To
elucidate why 5 constructs were missed with SCOOP, their expression at the
colony level was investigated using the CoFi blot. Soluble expression at the
colony level could not be detected for 3/5. The additional 3 constructs that
were only identified with the CoFi blot were grown and induced for
expression in liquid cultures. The soluble fraction was affinity purified and
40
the eluate was run on SDS-PAGE gels and very faint bands could be seen on
the gel, indicating that these constructs were expressed at low levels.
It has long been known that a bacterium growing on solid support in a
colony behaves differently from bacteria growing in a liquid culture (134,
135). In this context we therefore propose that either the metabolic state of
the colony, compared to bacteria growing in liquid, does not allow for
soluble expression of some ORFs and vice versa, or that the CoFi blot
simply does not work for these proteins. In addition, when making pools of
this kind, some targets are inevitable lost along the way, perhaps accounting
for the additional two not identified by the CoFi blot. Success rates will
correlate to the overall coverage/normalisation of the library and although
we find several constructs more often than others, we must conclude that our
library is well dispersed since we cover almost all soluble expressers.
Although 113 constructs were screened by SCOOP, sequencing at a very late
stage revealed empty vectors, mutations leading to frame shifts and early
stop codons for 31 constructs. A TMHMM11 search of the remaining
constructs predicted 8 ORFs to code for transmembrane proteins with at
least 2 transmembrane regions. Of the 113 constructs used in the original
experiment only 74 could be expected to produce proteins of the expected
size.
In a matter of days we have been able to screen the ORFeome from a human
pathogen for soluble recombinant expression in E. coli using SCOOP
(Figure 14). Once again we have shown the general applicability and
advantages of the CoFi blot method, by which constructs identified can
directly be moved further down the SG pipeline. A normalised library is
created through pooling of bacterial cultures instead of at DNA-level, thus
avoiding tedious DNA measurements. Subcloning into an expression vector
is done in one recombination reaction and the subsequent screening is done
with the CoFi blot. Identification of positive constructs is done by
sequencing. From this very small set of targets, soluble expression in one
strain and at one temperature was roughly about 30%, which was also
verified by SCOOP. This number is a bit lower than the SG statistics of
37%, although this corresponds to proteins from all types of viruses and
11 http://www.cbs.dtu.dk/services/TMHMM-2.0/
41
there is no information about full-length expression nor expression
conditions (53).
Figure 14
Flow scheme of SCOOP.
SCOOP has also been applied to deletion libraries of the KSHV ORFeome.
Pools were generated based on restriction enzyme compatibility for the
Erase-a-base® protocol. Deletion libraries were generated and expression
was screened using the CoFi blot. Initial results show that several other
targets with widely different functions can be made to express in E. coli with
this approach. We have also screened the overexpression of all integral
membrane proteins from KSHV using the membrane protein CoFi-blot.
SCOOP and other herpesvirus ORFeomes
As mentioned, structure determination of viral proteins has been suffering, in
part, due to poor soluble expression in E. coli and therefore these types of
proteins have been considered as “difficult”. As a result only 525 unique
virus protein structures have been submitted to the PDB, a number that
includes proteins from all types of viruses, from SARS and HIV to
42
herpesviruses. Out of these 525 structures only 31 have been submitted by
SG initiatives, worldwide. SCOOP was applied to four additional
ORFeomes from herpesviruses, HSV-1, VZV, EBV and mCMV available as
Gateway® adapted entry clones. By using one expression strain and one
temperature we were able to identify 75 unique constructs as expressing
soluble protein. These constructs corresponded to proteins of various
functions like DNA metabolism, immune evasion and tegument proteins as
well as proteins of unknown function. For 65 there was no protein structure
in PDB. Large-scale expression, purification and crystallisation trials have
been attempted for all 65. So far 3 protein structures have been solved.
Moving further down the pipeline
One area that has been intensely studied for human herpesviruses is DNA
metabolism due to the high degree of conservation among the proteins
involved and the importance of DNA synthesis (5, 136). In fact the most
effective drug, available on the market today, targets the viral infection at the
level of DNA synthesis and therefore proteins involved in DNA metabolism
would be effective as antiviral drug targets. Acyclovir is an acyclic
nucleoside analogue used to treat alpha herpesvirus infections, which targets
the viral thymidine kinase that phosphorylates acyclovir to acyclovir-
monophosphate. Downstream phosphorylations, by cellular enzymes,
convert this monophosphate to a triphosphate that when incorporated in to
the growing DNA chain of the virus, by the viral DNA polymerase, causes
chain termination. Acyclovir and its related compounds represent a success
story in the general treatment of HSV, VZV and CMV infections (25, 137).
It can be administered as a lotion on symptomatic tissue, orally or
intravenously depending on the severity of infection. However, resistant
strains of these viruses have been identified, especially in HIV-infected
persons and almost all significant resistance has been seen in
immunocompromised patients (25, 137, 138). For these patients more
aggressive antiviral remedies have to be used.
In short, all herpesviruses encode a core set of six DNA synthesis enzymes
(Table 4); a DNA polymerase, a DNA polymerase processivity subunit,
three helicase primase subunits and a single strand DNA binding protein
43
(aka ICP8). In addition to their importance in DNA synthesis they also
mediate homologous recombination during viral replication (5, 139).
A second set of proteins with links to DNA synthesis, more or less
conserved in the herpesvirus family includes the thymidine kinase, alkaline
endo-exonuclease, ribonucleotide reductase, uracil N-glycosylase and a
dUTPase (Table 4). These proteins are considered non-essential for
replication in cultured cells but several appear essential for “normal”
behaviour of the virus in animal models (5).
Table 4
The proteins involved in DNA replication and synthesis, their abbreviations and their gene
names in all the human herpesviruses. White rows are proteins directly part of DNA synthesis
and grey rows are proteins with links to DNA synthesis.
Enzyme HSV-1/-2 VZV HCMV HHV6/7 EBV KSHV
DNA polymerase,
POL
UL30 ORF28 UL54 U38 BALF5 ORF9
DNA polymerase
processivity subunit,
PPS
UL42 ORF16
UL44 U27 BMRF1 ORF59
Helicase-primase
ATPase subunit, HP1
UL5 ORF55 UL105 U77 BBLF4 ORF44
Helicase-primase
RNA pol subunit B,
HP2
UL52 ORF6 UL70 U43 BSLF1 ORF56
Helicase-primase
subunit C, HP3
UL8 ORF52 UL102 U74 BBLF2/3 ORF40/41
Single strand DNA
binding protein, ICP8
UL29 ORF29 UL57 U41 BALF2 ORF6
Thymidine kinase, TK UL23 ORF36 - - BXLF1 ORF21
Alkaline exonuclease,
AE
UL12 ORF48 UL98 U70 BGLF5 ORf37
Deoxyuridine
triphosphate, dUTPase
UL50 ORF8 UL72 U45 BLLF3 ORF54
Uracil-DNA
glycosidase, UNG
UL2 ORF59 UL114 U81 BKRF3 ORF46
Ribonucleotide
reductase, large
subunit, R1
UL39 ORF19 UL45 U28 BORF2 ORF61
Ribonucleotide
reductase, small
subunit, R2
UL40 ORF18 - - BaRF1 ORF60
44
The alkaline endo-exonuclase (AE) is conserved in all herpes viruses and is
involved in the maturation and packaging of viral DNA (140, 141). In HSV
this protein interacts with ICP8 (142). These two proteins can also work
together to promote strand exchange similar to recombination events in
bacteriophage (143). Recently it was shown that the AE from both EBV
and KSHV is involved in host shutoff at the mRNA level, an activity that
has not been shown for the other HHV AEs (140, 144, 145).
The well-known allosterically regulated enzyme ribonucleotide reductase
(RNR) is solely responsible for the de novo synthesis of all four
deoxyribonucleotides. RNR is a heterotetramer of two subunits, R1 and R2
(146). All HHV carries homologues to R1 and R2 except beta herpesviruses,
which lack the R2 (5). Interestingly, the RNR of HSV is not negatively
regulated by high TTP pools as their cellular counterparts and seems
necessary for viral growth in “resting” cells (147, 148). There are reports
that this viral enzyme has an accessory function to protect HSV-1 infected
cells against cytokine induced apoptosis (149).
Uracil N-glycosylase functions as a repair enzyme that cleaves mutated U
residues from the DNA backbone (150, 151).
The virally encoded dUTPase breaks down dUTP, thus preventing its
incorporation into viral DNA(152, 153).
More understanding of proteins involved in DNA synthesis and replication,
their mode of action and interactions, amongst themselves but also with
cellular proteins, could aid in the development of new drugs targeting this
important step of the viral life cycle.
The fundamental mission of all viruses is to replicate, spread and persist in
the host environment to which they have become adapted. Viruses differ
with respect to the mechanisms by which they accomplish their objectives.
The difference is reflected not only in the basic mechanisms of viral entry,
synthesis of proteins, nucleic acid synthesis, virion assembly and egress but
also with respect to the basic strategies by which they counteract the
enormous recourses of the host cell and of the multicellular organism aiming
at total viral extinction. In principal, as a virus, you want no reports to the
immune system or to the apoptotic machinery of your presence and you will
take necessary precautions to achieve this. However, the immune system and
the apoptotic machinery are not so easily tricked with many back up systems
45
to fend off and terminate any invader. The infected cell, must in order to
pass immune system control, show several signs of being “normal”,
therefore certain mechanisms need to be upregulated while others need to be
blocked. Anything remotely suspicious will result in an attack by the
immune system. Immune evasion in the host is a major feature of
herpesviruses. To persist and replicate in a host cell the virus has to launch
several anti-host functions. For herpesviruses these functions tend to fall,
more or less, within four categories, i) selectively block synthesis of new
proteins, ii) block the function of preexisting host proteins activated after
infection, iii) selectively degrade cellular proteins and iv) block signalling to
the host immune system indicating that the cell is infected.
Host shutoff in HHV
Viruses i.e. polio, influenza, SARS-CoV etc frequently induce host shutoff
to avoid competition from host cell transcripts during the mass-production of
their own proteins and to prevent expression of factors involved in
stimulating an immune response (154).
One way to take over a cell is to down regulate cellular protein synthesis by
degrading RNA (155, 156). Not only are host proteins not synthesised due to
low mRNA levels but pre-existing polyribosomes are also degraded and
splicing is impaired. Host shutoff is a consequence of infection of alpha
herpesviruses and gamma herpesviruses and occurs a few hours after
infection (5, 157). One of the consequences of host shutoff is a block in the
synthesis of HLA class I and II molecules, revealed by reduced levels of
these antigen presenting complexes at the surface of cells in the EBV lytic
phase. This effect could lead to escape of T-cell recognition and elimination
(157).
In alpha herpesviruses, mRNA degradation is a result of the tegument
protein vhs (virion host shutoff) protein encoded by UL41 (158). However,
although vhs functions as an RNase it cannot succeed in complete host
shutoff by itself and collaborates with ICP27, inhibiting splicing of mRNA
(5, 159). Intriguingly, it appears that some mRNAs are more resistant to
degradation than others (160). Vhs homologues are found in all alpha
herpesviruses but not in beta-or gamma herpesviruses (157).
46
Host shutoff at the mRNA level has also been observed in gamma
herpesviruses. However, this occurrence has been ascribed to another
protein, the already mentioned alkaline exonucleases (AEs) of these viruses
(140, 144, 145). A screen of KSHV genes for ability of their corresponding
gene products to, in vivo, diminish expression of a GFP reporter protein lead
to the identification of the gene product of ORF37 as a host shutoff factor.
Since this protein functions both as an exo- and endonuclease as well as a
shutoff protein it was named SOX (shutoff and exonuclease) (140).
SOX
SOX is conserved in all herpesviruses as an alkaline exonuclease involved in
DNA maturation and packaging, but it is only the gamma herpesvirus
homologues that display a shutoff activity in vivo. SOX expression initiates
8 hours after lytic reactivation, which precisely coincides with the beginning
of host shutoff (161). Unlike vhs, SOX is not a tegument protein (162) and
the co-expression with GFP resulted in loss of GFP fluorescence, which was
the result of the GFP mRNA degradation. This discovery was further
verified by siRNA knockdowns (140, 144).
In vitro studies of the herpesviral AEs exhibit a 5’-3’ and a 3’-5’ (10-fold
weaker) dsDNA exonuclease activity and an endonuclease activity (163)
They work best at an alkaline pH and have an absolute requirement for Mg2+
(although Mn2+ can work, equally well for HSV-1/2 AE at some
concentrations) (164-166). For the HSV null mutant in gene UL12, growth is
severely restricted but is still able to replicate and encapsidate considerable
amounts of viral DNA. Capsids, as a result from this mutant, are impaired in
the egress from the nucleus to the cytoplasm (167). In addition the HSV AE,
UL12, exhibits strand exchange activity, in vitro, and acts together with
ICP8 as a recombinase (143, 168). In contrast to UL12, which only localises
to the nucleus, SOX localises both to the nucleus and cytoplasm (144).
Complex sequence alignments showed that all herpesviral AEs belong to the
PD-(D/E)XK superfamily, which comprises endonucleases (i.e. EcoRI,
EcoRV BamHI etc), nicking enzymes and one exonuclease (169-172).
Typically, sequence similarity between these proteins is so low that it is only
upon structure determination that they are assigned to this superfamily.
Although low sequence similarity, it has been proposed that all these
47
proteins share a common core of a 4-5-stranded -sheet that brings together
the charged residues in the conserved motif PDX10-30(D/E)XK that is
involved in Mg2+ binding and catalysis (169).
When aligning all herpesviral AEs, seven conserved linear motifs can be
identified (173). Both random and focused mutations identified single and
double function mutants. Single function mutants lacked either DNase or
shutoff activity in vitro or in vivo respectively, whereas double function
mutants had neither. When linearly mapping these mutations it was revealed
that mutations that abolished shutoff activity located outside of these
conserved motifs (except one, which is not conserved) whereas double
function mutations and single function mutations affecting only DNase
activity located within the conserved linear motifs (Table 5, Figure 15)
(144).
Table 5
Summary of locations of mutations that affect SOX activity (140, 144)
Mutation DNase activity Shutoff activity Conserved motif Conserved
residue in HHV
AEs
T24I Yes No No No
A61T Yes No No No
Q129H No Yes I Yes
P176S Yes No No No
G217A No No II Yes
D221E No No II Yes
V369I Yes No VI No
D474N Yes No No No
Y477Stop Yes No No No
Figure 15
Schematic figure of the seven (I-VII) conserved motifs of SOX and the mutations affecting
the activity. Black box indicates a double function mutation. Grey box indicates single
function mutations abolishing DNase activity and white boxes indicate single function
mutations abolishing shutoff activity.
48
As a result it was proposed that SOX is a bifunctional protein that exhibits
DNase activity in the nucleus and shutoff activity in the cytoplasm, inducing
host shutoff. Because mutations that abolished shutoff activity located
outside of conserved linear motifs, it was proposed that this activity had
been acquired later in evolution (140). However, since the shutoff activity
only could be shown in vivo, it was suggested that SOX might need an
additional cellular or viral-binding partner to induce this action.
The EBV homologue, BGLF5, has been shown to possess the same features
as SOX by similar characterisations (145, 166, 174, 175). With an amino
acid sequence identity of 42% it has now been shown that both these
proteins down regulate expression of HLA molecules, which impairs the
recognition of infected cells, by T-cells and that this is not a feature of the
other herpesviral AEs (175).
Structural studies of the SOX protein from KSHV (Paper V)
One of the proteins identified as positive with SCOOP was the KSHV SOX
protein. Catalytically active SOX, both native and selenomethionine
labelled, was expressed with an N-terminal His-tag in E. coli BL21[DE3]
and purified to homogeneity with our generic two-step purification scheme
by IMAC and gel filtration. Well diffracting crystals were obtained through
vapour diffusion. Native crystals diffracted to 1.85 Å and phases were
obtained by SAD.
SOX consists of 487 residues with a molecular mass of 55 kDa. It consists of
two domains giving it a rough bowl shape. The two domains, the N-terminal
domain and the core domain are connected by a partly unmodelled region
called the bridge. The N-terminal domain consists of -helices and the core
domain consist of a 5 stranded -sheet flanked by several helices in
similarity to the rest of the members in the PD(D/E)-XK superfamily. The
two domains form a crevice where the bottom is made up of a U-shaped
loop, called the U-loop, stretching from core domain into the N-terminal
domain (Figure 16). This crevice is positively charged, consists of highly
conserved amino acids in all herpesviral AEs as well as the residues in the
already mentioned conserved sequence motif of the PD(D/E)-XK
superfamily.
49
U-loop
Figure 16
Overall model of SOX. The N-terminal domain in light and the core domain in dark. Three sulphates can be seen in the density and have therefore been modelled. Density for what we believe is a magnesium ion could be seen in the cleft and has therefore been modelled. The bridge connecting the N-terminal domain with the core domain is indicated by a dotted line
Densities can be seen for what we believe are three sulphates, where two are
coordinated by totally or semi-conserved residues proposed to be binding the
site for the substrate backbone, and one is involved in crystal contacts. A
magnesium ion has also been modelled in the crevice. This magnesium ion
is partly coordinated by residues in the PD(D/E)XK motif. A DALI search
revealed a surprisingly similar fold to the only known exonuclease within
the superfamily, the -exonuclease (Figure 17) (176).
Figure 17 Superposition of a monomer from the -exonuclease (dark) on SOX (light). -exonuclease is also a two-domain protein. The two domains in -exonuclease are connected by a completely modelled bridge. The partly modelled bridge in SOX is an ordered -helix and the U-loop has small -sheet in the -exonuclease. The bridge in -exonuclease is completely modelled and a sulphate in SOX is bound in the same location as a phosphate in -exonuclease. In addition, the magnesium ion in SOX is located in the same position as a manganese ion in -exonuclease (not shown).
50
Substrate binding
In a superposition of SOX and the -exonuclease, we can see that one of the
sulphates is located and coordinated in the same way as a phosphate in the -
exonuclease structure. The magnesium ion in SOX is also located and
coordinated as a manganese ion in the -exonuclease structure as well as
other divalent ions in other proteins belonging to the same superfamily.
Since this area in SOX is highly conserved and both the tertiary structure as
well as the aligning of ligands with the -exonuclease, we propose that this
is the site for substrate binding. However, although great similarities are
observed in both the overall fold of these two enzymes, dissimilarities can be
seen. For instance, SOX is larger than -exonuclease and where SOX has
partly unstructured and unmodelled regions (such as the bridge and the U-
loop) -exonuclease has an ordered secondary structure. We believe that the
-exonuclease structure represents what for SOX would be a substrate bound
state, where conformational changes lead to ordered secondary structures in
the bridge and the U-loop. Another major difference between these two
proteins is their quaternary structure, where the -exonuclease is active as a
trimer (176), which could explain its processivity. No prior reports or our
results indicate that SOX functions as anything but a monomer (166).
The second modelled sulphate in SOX, located at the beginning of the
crevice, is coordinated by totally or semi-conserved residues, which would
indicate that this is a true substrate backbone binding site. To elucidate
whether this sulphate could represent part of the substrate backbone, we
modelled a 4bp oligo by aligning the phosphates in the backbone to the two
sulphates in the model. As can be seen (Figure 18) this oligo fits nicely in
this highly conserved crevice and aligns very well with our sulphates.
The second phosphate in the oligo backbone places itself very close to the
modelled magnesium ion. It is possible that the phosphodiester bond is
cleaved by a nucleophilic attack by an activated water molecule close to the
magnesium, which stabilises negative charges in this area, cleaving off one
nucleotide at a time. Since the substrate of SOX can be both double stranded
and single stranded SOX must have a type of unwinding activity. We
therefore believe that SOX separates the strands of the dsDNA and the single
stranded substrate destined for digestion is guided into the active site by the
bridge and is stabilised by binding to conserved and partly conserved
51
aromatic residues in and around the crevice. The backbone is cleaved and
the free nucleotide is released on the opposite side of the “tunnel” created
under the bridge. The remaining free single strand of the substrate is directed
away from the protein surface.
Figure 18
Conserved residues in SOX as determined by sequence alignments with all other HHV AEs.
Totally conserved residues are depicted in red and least conserved residues are light blue. A
4bp oligo was modelled in the crevice based on the location of the two sulphates.
SOX as an RNase?
An intriguing question is whether SOX can mediate host shut off through an
intrinsic RNase activity. As already mentioned, functional characterisations
of SOX in vivo, show that it affects the total mRNA content of the host cell
and although host shutoff is observed in alpha herpesviruses, the other
herpesviral AE’s cannot induce this type of degradation.
Figure 19
Mapping mutations that affect the activity of SOX. Red deletes overall activity. Orange
deletes DNase activity and pink the shutoff activity.
52
In addition, single function mutations abolishing the shutoff activity of SOX
locate outside of conserved linear sequence motifs found in all herpesviral
AE’s. It has therefore been proposed that the host shutoff trait of SOX, and
BGLF5 from EBV, has been acquired later in evolution. It has also been
proposed that SOX might need an additional cellular binding partner or that
it does not have any intrinsic RNase activity and only activates some sort of
RNA decay pathway (140, 144).
With the structure in hand we can now map known mutations, which affect
the shutoff function. As can be seen (Figure 19), mutations that abolish
shutoff function are spread all over the protein, with predominance in the N-
terminal domain. The mutations locate both on the exposed surface and in
buried regions although they are mostly found close to the surface.
Since SOX has a nuclease type active site, with an extended pocket that
appear well suited for binding nucleic acids, it could be argued that both the
DNase and a potential shutoff RNase activity are carried out in the same
active site, since no other obvious binding or active site can be seen in the
structure. As support for this theory, both double-function mutations
(G217A, D221E), located in the U-loop, as well as one shutoff mutation
(P176S), located in the end of the unmodelled bridge are either directly
involved in metal binding or reside very close to the substrate binding and
active site. As in most mutational studies carried out in vivo, it cannot be
ruled out that these mutations only affect the folding and stability of SOX,
cancelling out activity all together. Even so, these mutations emphasize the
importance of this region. However, since RNase activity cannot be detected
for SOX in vitro, additional proteins are likely to be required enabling the
nuclease activity for mRNA degradation. Possibly this can be viral or host
factors mediating the interaction of SOX with conserved elements of
mRNA.
What would support the hypothesis of an additional cellular binding partner
for shutoff activity is the predominance of mutations affecting shutoff in the
N-terminal domain, especially in parts that are not present in the -
exonuclease. Perhaps this domain, which also carries a more overall positive
charge than the rest of the protein, will play a role in cellular protein binding.
In this scenario, whether or not it is SOX that carries out the RNA
degradation or the potential binding partner cannot be determined presently.
53
Nucleotide synthesis
For every living cell, cell division is a highly regulated process and even the
slightest stray could lead to serious consequences, where cancer is a classic
sign. In a multicellular organism, there has to be a strict balance between
cells that rest, divide and die. The cell cycle is a series of events that lead to
cell division and is under strict control of key enzymes called cyclins and
cyclin dependent kinases. The cell cycle is roughly divided into four phases;
G1-, S-, G2- and M-phase. During G1, cells synthesise RNA and proteins
that are needed for S-phase, which is characterised by DNA synthesis and
chromosome replication. In the G2-phase, proteins needed for cell division
are synthesised, which is followed by actual cell division during the M-
phase. An additional phase exists for cells, called G0, this phase is entered by
cells in G1 and is the state of cells that are non-dividing or are fully
differentiated. The actual time period a cell is present in G0 varies with cell
type some, i.e. neurons, never leave. The passage from one phase to another
is under rigorous regulation and has certain check points that need to be
cleared in order for safe cell division. Deoxyribonucleotides are the building
blocks of DNA and the synthesis of deoxyribonucleotides peaks during S-
phase. The enzyme responsible for the production of deoxyribonucleotides is
the extremely well studied and strictly regulated enzyme ribonucleotide
reductase (146). However, this enzyme is also important for the synthesis of
deoxyribonucleotides in resting cells, as a response to DNA damage. Both
alpha and gamma herpesviruses encode their own RNR while beta
herpesviruses only encode one of the RNR subunits (R1) (5).
Ribonucleotide reductase
RNR is an enzyme that can be found in all kingdoms of life as well as in
viruses. As already mentioned, RNR performs the chemically difficult
reduction of all four ribonucleotides to their corresponding
deoxyribonucleotides in a reaction involving a free radical (177). The
allosterically regulated RNR is the only enzyme that can perform this
reaction and it therefore controls the entire dNTP pool in the cell.
54
Fig 20
The reaction catalysed by RNR.
RNR’s have been divided based on their radical cofactors into three classes,
I, II and III, where class I further is divided into subclasses Ia and Ib. A
strictly balanced supply of deoxyribonucleotides is essential for accurate
DNA replication and repair (146). RNR could therefore be a potential drug
target for the treatment of different types of cancer and viral infections (178-
182).
RNR is a heterotetramer consisting of two homodimers, R1 and R2 and has
an absolute requirement for iron in the form of a di-iron site located in the
R2 subunit. A free radical is generated, by reductive cleavage of molecular
oxygen at the di-iron site of the R2 subunit, which is transported to the R1
subunit to activate the nucleotide substrate for catalysis (177).
RNR activity is under strict transcriptional and postranscriptional control
and in replicating mammalian cells enzyme activity is regulated by control
of R2 protein stability (146, 177). However, to produce
deoxyribonucleotides as a response to DNA damage in resting cells, the
expression of another cellular R2 is induced, which lies under the control of
the transcription factor and tumour suppressor p53, hence it has been named
p53R2 (183, 184). p53R2 also appears to play a role in providing
deoxyribonucleotides for mitochondrial DNA synthesis (188)
Structural studies of the R2 subunit of the ribonucleotide
reductase from EBV and KSHV (manuscript in preparation)
The EBV genome encodes for a class I RNR R2 subunit with low amino
acid sequence identity to bacterial and eukaryotic R2s. This R2, encoded by
the BaRF1 gene was identified as positive by SCOOP of the EBV ORFeome
and is different to R2s of other organisms in that it has a glutamate (Glu61)
in the position of a metal-coordinating aspartate common to most R2s. The
structure of EBV R2 was solved in its apo, mono metal and di-metal form
55
(Figure 21) at 1.9, 1.6 and 2.0 Å respectively and their radical generating
abilities were investigated using electron paramagnetic resonance (EPR).
The monomer of the EBV R2 is an -helix bundle in similarity to other R2s,
but with several interesting differences in the region around the metal site
(Figure 22). In one of the monomers in the apo and monometal structures,
residues 62-73 of -helix B are disordered and Glu61 is either disordered
or oriented away from the metal binding site. Additionally, there is no
traceable density for residues 120-143 in 2 and D indicating that these
elements too are disordered. These stretches seem to order themselves only
when the di-ferrous metal site is fully formed, where Glu61 participates in
metal coordination. This large conformational flexibility upon iron site
formation has not previously been described for any R2 subunits. Recently,
however, disorder in this particular region is found in three R2 structures
also recently available in PDB; of H. sapiens p53R2 (2VUX) in Bacillus
halodurans (2RCC), Plasmodium vivax (2O1Z) and S. cerevisiae (2RCC).
It is likely that the extra flexibility of helix B affects the rate of iron
binding and therefore the timing of the radical generation reaction. EPR
confirms that EBV R2 is slower in generating the radical and that the radical
site is more accessible to radical scavengers. These observed deviances in
the different metal bound forms of EBV R2, become all the more interesting
when put into a wider virus-host cell context. p53R2 also displays this
disordered region. In nonproliferating cells, DNA damage induces
expression of the p53R2 and also the R1 protein. Together they form an
active RNR complex supplying the resting cell with dNTPs for DNA repair
(183, 186). Since the host cell already encodes for its own RNR it is
therefore believed that the viral RNR supplies nucleotides in a resting host
cell (187).
The mobility of these regions in both EBV R2 and p53R2 may be a relevant
adaptation to particular chemical environments in a resting cell, and may
have a functional role in modulating iron and oxygen access, respectively to
the active site.
56
a) b)
c)
Figure 21
Model of the EBV R2 homodimer in its di-metal form. The subunits are coloured in light blue
and dark blue. Iron atoms are coloured in orange.
We have also solved the apo structure of KSHV R2 encoded by ORF60.
Initial results show that the apo form of KSHV R2 also has the unstructured
stretch in the -helix B, which additionally supports the hypothesised and
specific role of gamma herpesvirus R2s and p53R2. However, the chemical
basis for this role still remains to be elucidated.
Figure 22
Cartoon model of the a) apo-EBV R2 b)
mono metal and c) di-metal EBV R2. The
secondary structure elements described in the
text, which are disordered for one of the
monomers in the apo and monometal forms,
are coloured in cyan (helix B) and light
pink (helix 2 and D).
57
Future prospects
Needless to say, viral biology is an interesting field. To thoroughly
understand its evolution, pathogenesis, to develop new drugs and potentially
new vaccines and diagnostic assays etc we need to study the individual viral
proteins both functionally and structurally. Unfortunately, the recombinant
overexpression of viral proteins has so far proven to be very difficult and
therefore very few viral proteins have been structurally determined. We have
developed methods and strategies to speed up the structural genomics
pipeline and increase the throughput. Recombinant viral proteins are being
routinely pushed through our pipeline and protein structures are coming out
as a result. Three structures have so far been solved from proteins involved
in DNA metabolism, which is an area of both scientific as well as medical
interest.
However, to really comprehend a virus’ interaction with its host, structural
studies of protein complexes are necessary and this will in the future be
pursued. The CoFi blot and SCOOP are routinely used in our lab, screening
gene collections of various kinds. The CoFi blot is evolving and we are
finding new applications for it.
58
Acknowledgements When wanting to write the acknowledgements I realised that what I appreciate the most, with
all the people that have surrounded me during this journey, is their fantastic patience and their
never fading trust and support. Many people have been part and I would especially like to
thank a few of them.
Pär, för ditt förtoende. Inte bara när att du antog mig som student utan också för din
övertygelse om att mitt projekt skulle fungera. För att du lät mig åka ända till Kina. Du har
skapat en inspirerande miljö och jag har lärt mig mycket, både om forskning och mig själv.
Stefan Nordlund, för att du alltid lyssnar tålmodigt på en students predikament.
Alla på DBB, speciellt Anki, Kicki, Lotta och Ann, som hjälpt mig många gånger med
administrativa saker.
Past and present members of PN group especially: Martin A, Vickan, Susanne F, Helena,
Susanne VdB, Deb, Tove, Karl-Magnus, Agnes, Martin M, Martin Hä, Amin, Benita,
Anders T, Jessica A, Andreas, Herwig, Albert, Ken, Lina-Beth, Alberto, Marie, Pelle,
Hanna, Henrik, Anna, Florian, Elisabeth and Marina.
Monica, för alla visdomsord om livet.
Said, för ditt fantastiska tålamod och för att du tar hand om labbet!!
Ulrika, U som i ullig, L för labrador!!!! För att du är en fantastisk vän.
Audur, för sjukt roliga stunder både i och utanför labbet! Och all hjälp!!!!
Lola, for always sharing ranting experiences and listening to my ranting.
Heidi, för prat och stöd.
Martin, Stina, Pål och Emma, för upplyftande diskussioner som berör alla möjliga ämnen!
Daniel M-M, för att du alltid kan få mig att skratta.
Karin, för vänskap på labbet och utanför.
Daniel G, för din humor.
Damian, för att du tålmodigt lyssnar och hjälper mig med mina dataproblem.
Maria D, för alla leenden.
People at Beida, especially Xiao Dong, Erik, Nan Jie, Xiaoyan, Koh and Dr Gao! Xie xie.
Sunny “shi pa” and family, your generosity and curiosity overwhelm me. I am certain we
will meet again!
Mina Kina-kompisar, Anders K, Kim, Tove, Anna, Sara, Anders L, Ola och Maria!!!!
Stockholms biovetenskapliga forskarskola, med Eva och Camilla i spetsen.
Mina fosko-kompisar, speciellt Anton, Anna och Linda. Med era historier inser jag att jag
inte är själv!
59
Mia och Linda för att ni är helt utanför labblivet, men alltid är på min sida!
Jenny, för att du alltid är redo att luncha.
Crista and Jill, for your never seizing friendship although you are so far away.
Jasmine and Stephanie, for great times and really being there for me!
Familjerna Nolvi, Jansson/Umana, Tronström och Sundesson.
My entire family in Singapore, who means the world to me, see you soon!
Familjen Cornvik, för att ni alltid lyssnar på mig och mitt eviga tjatande om allt och inget.
Anna-Karin, det finns inga ord som beskriver vår vänskap. Du är världens bästa och ingen
kan ersätta dig!
Kim, världens bästa bror, som alltid ser till att ta ner mig på jorden och som alltid ställer upp
för mig.
Mamma och Pappa, för att ni helt utan tvekan står vid mig. Ni ger mig perspektiv och
försöker alltid förstå vad jag håller på med, både på labbet och med mitt liv.
Tobias, för den fantastiska person du är. För det du betytt, betyder och alltid kommer att
betyda för mig. Du gör mig komplett i alla avseenden. DOJ, DOJ!
60
References
1. Koonin, E. V., Senkevich, T. G., and Dolja, V. V. (2006) The ancient Virus World and evolution of cells, Biol Direct 1, 29.
2. Cann, A. J. (2005) Principles of Molecular Virology (Standard Edition), Fourth Edition.
3. Abo, M. E., and Sy, A. A. (1998) Rice Virus Diseases: Epidemiology and Management Strategies, J. Sust. Agri. 11, 113-134.
4. Khodosevich, K., Lebedev, Y., and Sverdlov, E. (2002) Endogenous retroviruses and human evolution, Comp. Funct. Genomics 3, 494-498.
5. Arvin, A., Campadelli-Fiume, G., Mocarski, E., Moore, P. S., Roizman, B., Whitley, R., and Yamanishi, K. (2007) Human
Herpesviruses, Biology, Therapy and Immunoprophylaxis, First Edition.
6. Dutta, A. (2008) Epidemiology of poliomyelitis-Options and update, Vaccine.
7. Zhu, F. X., Chong, J. M., Wu, L., and Yuan, Y. (2005) Virion proteins of Kaposi's sarcoma-associated herpesvirus, J. Virol. 79, 800-811.
8. Baldick, C. J., Jr., and Shenk, T. (1996) Proteins associated with purified human cytomegalovirus particles, J. Virol. 70, 6097-6105.
9. Johannsen, E., Luftig, M., Chase, M. R., Weicksel, S., Cahir-McFarland, E., Illanes, D., Sarracino, D., and Kieff, E. (2004) Proteins of purified Epstein-Barr virus, Proc. Natl. Acad. Sci. U. S.
A. 101, 16286-16291. 10. Loret, S., Guay, G., and Lippe, R. (2008) Comprehensive
characterization of extracellular herpes simplex virus type 1 virions, J. Virol. 82, 8605-8618.
11. Kalejta, R. F. (2008) Tegument proteins of human cytomegalovirus, Microbiol. Mol. Biol. Rev. 72, 249-265.
12. Heldwein, E. E., and Krummenacher, C. (2008) Entry of herpesviruses into mammalian cells, Cell. Mol. Life Sci. 65, 1653-1668.
13. Spear, P. G. (2004) Herpes simplex virus: receptors and ligands for cell entry, Cell. Microbiol. 6, 401-410.
14. Mori, I., and Nishiyama, Y. (2005) Herpes simplex virus and varicella-zoster virus: why do these human alphaherpesviruses behave so differently from one another?, Rev. Med. Virol. 15, 393-406.
15. Lin, A., Xu, H., and Yan, W. (2007) Modulation of HLA expression in human cytomegalovirus immune evasion, Cell. Mol. Immunol. 4, 91-98.
61
16. Laurent, C., Meggetto, F., and Brousset, P. (2008) Human herpesvirus 8 infections in patients with immunodeficiencies, Hum.
Pathol. 39, 983-993. 17. Klein, E., Kis, L. L., and Klein, G. (2007) Epstein-Barr virus
infection in humans: from harmless to life endangering virus-lymphocyte interactions, Oncogene 26, 1297-1305.
18. Griffiths, P. D. (2006) CMV as a cofactor enhancing progression of AIDS, J. Clin. Virol. 35, 489-492.
19. Sandri-Goldin, R. M. (2003) Replication of the herpes simplex virus genome: does it really go around in circles?, Proc. Natl. Acad. Sci.
U. S. A. 100, 7428-7429. 20. Deshmane, S. L., and Fraser, N. W. (1989) During latency, herpes
simplex virus type 1 DNA is associated with nucleosomes in a chromatin structure, J. Virol. 63, 943-947.
21. Kennedy, P. G., and Chaudhuri, A. (2002) Herpes simplex encephalitis, J. Neurol. Neurosurg. Psychiatry 73, 237-238.
22. Landolfo, S., Gariglio, M., Gribaudo, G., and Lembo, D. (2003) The human cytomegalovirus, Pharmacol. Ther. 98, 269-297.
23. De Bolle, L., Naesens, L., and De Clercq, E. (2005) Update on human herpesvirus 6 biology, clinical features, and therapy, Clin.
Microbiol. Rev. 18, 217-245. 24. Vancikova, Z., and Dvorak, P. (2001) Cytomegalovirus infection in
immunocompetent and immunocompromised individuals--a review, Curr. Drug Targets Immune Endocr. Metabol. Disord. 1, 179-187.
25. Mercorelli, B., Sinigalia, E., Loregian, A., and Palu, G. (2008) Human cytomegalovirus DNA replication: antiviral targets and drugs, Rev. Med. Virol. 18, 177-210.
26. Du, M. Q., Bacon, C. M., and Isaacson, P. G. (2007) Kaposi sarcoma-associated herpesvirus/human herpesvirus 8 and lymphoproliferative disorders, J. Clin. Pathol. 60, 1350-1357.
27. Barozzi, P., Potenza, L., Riva, G., Vallerini, D., Quadrelli, C., Bosco, R., Forghieri, F., Torelli, G., and Luppi, M. (2007) B cells and herpesviruses: a model of lymphoproliferation, Autoimmun. Rev.
7, 132-136. 28. Cotter, M. A., 2nd, and Robertson, E. S. (1999) The latency-
associated nuclear antigen tethers the Kaposi's sarcoma-associated herpesvirus genome to host chromosomes in body cavity-based lymphoma cells, Virology 264, 254-264.
29. Leight, E. R., and Sugden, B. (2000) EBNA-1: a protein pivotal to latent infection by Epstein-Barr virus, Rev Med Virol 10, 83-100.
30. Thorley-Lawson, D. A., Duca, K. A., and Shapiro, M. (2008) Epstein-Barr virus: a paradigm for persistent infection - for real and in virtual reality, Trends Immunol. 29, 195-201.
31. Lunemann, J. D., and Munz, C. (2007) Epstein-Barr virus and multiple sclerosis, Curr. Neurol. Neurosci. Rep. 7, 253-258.
62
32. Posnett, D. N. (2008) Herpesviruses and autoimmunity, Curr. Opin.
Investig. Drugs 9, 505-514. 33. Chang, Y., Cesarman, E., Pessin, M. S., Lee, F., Culpepper, J.,
Knowles, D. M., and Moore, P. S. (1994) Identification of herpesvirus-like DNA sequences in AIDS-associated Kaposi's sarcoma, Science 266, 1865-1869.
34. Verma, S. C., and Robertson, E. S. (2003) Molecular biology and pathogenesis of Kaposi sarcoma-associated herpesvirus, FEMS
Microbiol. Lett. 222, 155-163. 35. Dedicoat, M., and Newton, R. (2003) Review of the distribution of
Kaposi's sarcoma-associated herpesvirus (KSHV) in Africa in relation to the incidence of Kaposi's sarcoma, Br. J. Cancer 88, 1-3.
36. Haburchak, D. R., Thomason, J. W., Edelman, D. C., and Constantine, N. T. (2001) Negative human herpesvirus 8 serology in sarcoidosis, J. Hum. Virol. 4, 111.
37. Di Alberti, L., Piattelli, A., Artese, L., Favia, G., Patel, S., Saunders, N., Porter, S. R., Scully, C. M., Ngui, S. L., and Teo, C. G. (1997) Human herpesvirus 8 variants in sarcoid tissues, Lancet 350, 1655-1661.
38. Sjak-Shie, N. N., Vescio, R. A., and Berenson, J. R. (1999) The role of human herpesvirus-8 in the pathogenesis of multiple myeloma, Hematol. Oncol. Clin. North Am. 13, 1159-1167.
39. Zong, J. C., Arav-Boger, R., Alcendor, D. J., and Hayward, G. S. (2007) Reflections on the interpretation of heterogeneity and strain differences based on very limited PCR sequence data from Kaposi's sarcoma-associated herpesvirus genomes, J. Clin. Virol. 40, 1-8.
40. Choi, J., Means, R. E., Damania, B., and Jung, J. U. (2001) Molecular piracy of Kaposi's sarcoma associated herpesvirus, Cytokine Growth Factor Rev. 12, 245-257.
41. Fujimuro, M., Hayward, S. D., and Yokosawa, H. (2007) Molecular piracy: manipulation of the ubiquitin system by Kaposi's sarcoma-associated herpesvirus, Rev. Med. Virol. 17, 405-422.
42. Neipel, F., Albrecht, J. C., and Fleckenstein, B. (1997) Cell-homologous genes in the Kaposi's sarcoma-associated rhadinovirus human herpesvirus 8: determinants of its pathogenicity, J. Virol. 71, 4187-4192.
43. McGeoch, D. J., and Cook, S. (1994) Molecular phylogeny of the alphaherpesvirinae subfamily and a proposed evolutionary timescale, J. Mol. Biol. 238, 9-22.
44. Davison, A. (2002) Comments on the phylogenetics and evolution of herpesviruses and other large DNA viruses, Virus Res. 82, 127-132.
45. Fleischmann, R. D., Adams, M. D., White, O., Clayton, R. A., Kirkness, E. F., Kerlavage, A. R., Bult, C. J., Tomb, J. F., Dougherty, B. A., Merrick, J. M., McKenney, K., Sutton, G.,
63
FitzHugh, W., Fields, C., Gocayhe, J.D., Scott, J., Shirley, R., Liu, L-I., Glodek, A., Kelley, J.M, Weidman, J.F., Phillips, C.A., Spriggs, T., Hedblom, E., Cotton, M.D., Herback, T., Hanna, M.C., Nguyen, D.T., Saudek, D.M., Brandon, R.C., Fine, L.D., Frichtman, J.L., Fuhrmann, J.L., Geoghagen, N.S.M., Gnehm, C.C., McDonald, L.A., Small, K.V., Fraser, C.M., Smith, H.O., Venter, J.C. (1995) Whole-genome random sequencing and assembly of Haemophilus
influenzae Rd, Science 269, 496-512. 46. (2001) Initial sequencing and analysis of the human genome, Nature
409, 860-921. 47. Venter, J. C., Adams, M. D., Myers, E. W., Li, P. W., Mural, R. J.,
Sutton, G. G., Smith, H. O., Yandell, M., Evans, C. A., Holt, R. A., Gocayne, J. D., Amanatides, P., Ballew, R. M., Huson, D. H., Wortman, J. R., Zhang, Q., Kodira, C. D., Zheng, X. H., Chen, L., Skupski, M., Subramanian, G., Thomas, P. D., Zhang, J., Gabor Miklos, G. L., Nelson, C., Broder, S., Clark, A. G., Nadeau, J., McKusick, V. A., Zinder, N., Levine, A. J., Roberts, R. J., Simon, M., Slayman, C., Hunkapiller, M., Bolanos, R., Delcher, A., Dew, I., Fasulo, D., Flanigan, M., Florea, L., Halpern, A., Hannenhalli, S., Kravitz, S., Levy, S., Mobarry, C., Reinert, K., Remington, K., Abu-Threideh, J., Beasley, E., Biddick, K., Bonazzi, V., Brandon, R., Cargill, M., Chandramouliswaran, I., Charlab, R., Chaturvedi, K., Deng, Z., Francesco, V. D., Dunn, P., Eilbeck, K., Evangelista, C., Gabrielian, A. E., Gan, W., Ge, W., Gong, F., Gu, Z., Guan, P., Heiman, T. J., Higgins, M. E., Ji, R.-R., Ke, Z., Ketchum, K. A., Lai, Z., Lei, Y., Li, Z., Li, J., Liang, Y., Lin, X., Lu, F., Merkulov, G. V., Milshina, N., Moore, H. M., Naik, A. K., Narayan, V. A., Neelam, B., Nusskern, D., Rusch, D. B., Salzberg, S., Shao, W., Shue, B., Sun, J., Wang, Z. Y., Wang, A., Wang, X., Wang, J., Wei, M.-H., Wides, R., Xiao, C., Yan, C., Yao, A., Ye, J., Zhan, M., Zhang, W., Zhang, H., Zhao, Q., Zheng, L., Zhong, F., Zhong, W., Zhu, S. C., Zhao, S., Gilbert, D., Baumhueter, S., Spier, G., Carter, C., Cravchik, A., Woodage, T., Ali, F., An, H., Awe, A., Baldwin, D., Baden, H., Barnstead, M., Barrow, I., Beeson, K., Busam, D., Carver, A., Center, A., Cheng, M. L., Curry, L., Danaher, S., Davenport, L., Desilets, R., Dietz, S., Dodson, K., Doup, L., Ferriera, S., Garg, N., Gluecksmann, A., Hart, B., Haynes, J., Haynes, C., Heiner, C., Hladun, S., Hostin, D., Houck, J., Howland, T., Ibegwam, C., Johnson, J., Kalush, F., Kline, L., Koduru, S., Love, A., Mann, F., May, D., McCawley, S., McIntosh, T., McMullen, I., Moy, M., Moy, L., Murphy, B., Nelson, K., Pfannkoch, C., Pratts, E., Puri, V., Qureshi, H., Reardon, M., Rodriguez, R., Rogers, Y.-H., Romblad, D., Ruhfel, B., Scott, R., Sitter, C., Smallwood, M., Stewart, E., Strong, R., Suh, E., Thomas, R., Tint, N. N., Tse, S., Vech, C., Wang, G., Wetter, J., Williams, S.,
64
Williams, M., Windsor, S., Winn-Deen, E., Wolfe, K., Zaveri, J., Zaveri, K., Abril, J. F., Guigo, R., Campbell, M. J., Sjolander, K. V., Karlak, B., Kejariwal, A., Mi, H., Lazareva, B., Hatton, T., Narechania, A., Diemer, K., Muruganujan, A., Guo, N., Sato, S., Bafna, V., Istrail, S., Lippert, R., Schwartz, R., Walenz, B., Yooseph, S., Allen, D., Basu, A., Baxendale, J., Blick, L., Caminha, M., Carnes-Stine, J., Caulk, P., Chiang, Y.-H., Coyne, M., Dahlke, C., Mays, A. D., Dombroski, M., Donnelly, M., Ely, D., Esparham, S., Fosler, C., Gire, H., Glanowski, S., Glasser, K., Glodek, A., Gorokhov, M., Graham, K., Gropman, B., Harris, M., Heil, J., Henderson, S., Hoover, J., Jennings, D., Jordan, C., Jordan, J., Kasha, J., Kagan, L., Kraft, C., Levitsky, A., Lewis, M., Liu, X., Lopez, J., Ma, D., Majoros, W., McDaniel, J., Murphy, S., Newman, M., Nguyen, T., Nguyen, N., Nodell, M., Pan, S., Peck, J., Peterson, M., Rowe, W., Sanders, R., Scott, J., Simpson, M., Smith, T., Sprague, A., Stockwell, T., Turner, R., Venter, E., Wang, M., Wen, M., Wu, D., Wu, M., Xia, A., Zandieh, A., and Zhu, X. (2001) The Sequence of the Human Genome, Science 291, 1304-1351.
48. Venter, J. C., Remington, K., Heidelberg, J. F., Halpern, A. L., Rusch, D., Eisen, J. A., Wu, D. Y., Paulsen, I., Nelson, K. E., Nelson, W., Fouts, D. E., Levy, S., Knap, A. H., Lomas, M. W., Nealson, K., White, O., Peterson, J., Hoffman, J., Parsons, R., Baden-Tillson, H., Pfannkoch, C., Rogers, Y. H., and Smith, H. O. (2004) Environmental genome shotgun sequencing of the Sargasso Sea, Science 304, 66-74.
49. Kim, S. H. (1998) Shining a light on structural genomics, Nat.
Struct. Biol. 5 Suppl, 643-645. 50. Tarbouriech, N., Buisson, M., Geoui, T., Daenke, S., Cusack, S., and
Burmeister, W. P. (2006) Structural genomics of the Epstein-Barr virus, Acta Crystallogr. D. Biol. Crystallogr. 62, 1276-1285.
51. Fogg, M. J., Alzari, P., Bahar, M., Bertini, I., Betton, J. M., Burmeister, W. P., Cambillau, C., Canard, B., Corrondo, M. A., Coll, M., Daenke, S., Dym, O., Egloff, M. P., Enguita, F. J., Geerlof, A., Haouz, A., Jones, T. A., Ma, Q., Manicka, S. N., Migliardi, M., Nordlund, P., Owens, R. J., Peleg, Y., Schneider, G., Schnell, R., Stuart, D. I., Tarbouriech, N., Unge, T., Wilkinson, A. J., Wilmanns, M., Wilson, K. S., Zimhony, O., and Grimes, J. M. (2006) Application of the use of high-throughput technologies to the determination of protein structures of bacterial and viral pathogens, Acta Crystallogr. D. Biol. Crystallogr. 62, 1196-1207.
52. Marsden, R. L., and Orengo, C. A. (2008) Target selection for structural genomics: an overview, Meth. Mol. Biol. 426, 3-25.
53. Graslund, S., Nordlund, P., Weigelt, J., Hallberg, B. M., Bray, J., Gileadi, O., Knapp, S., Oppermann, U., Arrowsmith, C., Hui, R., Ming, J., dhe-Paganon, S., Park, H. W., Savchenko, A., Yee, A.,
65
Edwards, A., Vincentelli, R., Cambillau, C., Kim, R., Kim, S. H., Rao, Z., Shi, Y., Terwilliger, T. C., Kim, C. Y., Hung, L. W., Waldo, G. S., Peleg, Y., Albeck, S., Unger, T., Dym, O., Prilusky, J., Sussman, J. L., Stevens, R. C., Lesley, S. A., Wilson, I. A., Joachimiak, A., Collart, F., Dementieva, I., Donnelly, M. I., Eschenfeldt, W. H., Kim, Y., Stols, L., Wu, R., Zhou, M., Burley, S. K., Emtage, J. S., Sauder, J. M., Thompson, D., Bain, K., Luz, J., Gheyi, T., Zhang, F., Atwell, S., Almo, S. C., Bonanno, J. B., Fiser, A., Swaminathan, S., Studier, F. W., Chance, M. R., Sali, A., Acton, T. B., Xiao, R., Zhao, L., Ma, L. C., Hunt, J. F., Tong, L., Cunningham, K., Inouye, M., Anderson, S., Janjua, H., Shastry, R., Ho, C. K., Wang, D., Wang, H., Jiang, M., Montelione, G. T., Stuart, D. I., Owens, R. J., Daenke, S., Schutz, A., Heinemann, U., Yokoyama, S., Bussow, K., and Gunsalus, K. C. (2008) Protein production and purification, Nat. Methods 5, 135-146.
54. Lesley, S. A., Kuhn, P., Godzik, A., Deacon, A. M., Mathews, I., Kreusch, A., Spraggon, G., Klock, H. E., McMullan, D., Shin, T., Vincent, J., Robb, A., Brinen, L. S., Miller, M. D., McPhillips, T. M., Miller, M. A., Scheibe, D., Canaves, J. M., Guda, C., Jaroszewski, L., Selby, T. L., Elsliger, M. A., Wooley, J., Taylor, S. S., Hodgson, K. O., Wilson, I. A., Schultz, P. G., and Stevens, R. C. (2002) Structural genomics of the Thermotoga maritima proteome implemented in a high-throughput structure determination pipeline, Proc. Natl. Acad. Sci. U. S. A. 99, 11664-11669.
55. Bussow, K., Scheich, C., Sievert, V., Harttig, U., Schultz, J., Simon, B., Bork, P., Lehrach, H., and Heinemann, U. (2005) Structural genomics of human proteins--target selection and generation of a public catalogue of expression clones, Microb. Cell. Fact. 4, 21.
56. Heinemann, U., Bussow, K., Mueller, U., and Umbach, P. (2003) Facilities and methods for the high-throughput crystal structural analysis of human proteins, Acc. Chem. Res. 36, 157-163.
57. Banci, L., Bertini, I., Cusack, S., de Jong, R. N., Heinemann, U., Jones, E. Y., Kozielski, F., Maskos, K., Messerschmidt, A., Owens, R., Perrakis, A., Poterszman, A., Schneider, G., Siebold, C., Silman, I., Sixma, T., Stewart-Jones, G., Sussman, J. L., Thierry, J. C., and Moras, D. (2006) First steps towards effective methods in exploiting high-throughput technologies for the determination of human protein structures of high biomedical value, Acta Crystallogr. D.
Biol. Crystallogr. 62, 1208-1217. 58. Aricescu, A. R., Assenberg, R., Bill, R. M., Busso, D., Chang, V. T.,
Davis, S. J., Dubrovsky, A., Gustafsson, L., Hedfalk, K., Heinemann, U., Jones, I. M., Ksiazek, D., Lang, C., Maskos, K., Messerschmidt, A., Macieira, S., Peleg, Y., Perrakis, A., Poterszman, A., Schneider, G., Sixma, T. K., Sussman, J. L., Sutton, G., Tarboureich, N., Zeev-Ben-Mordehai, T., and Jones, E. Y.
66
(2006) Eukaryotic expression: developments for structural proteomics, Acta Crystallogr. D. Biol. Crystallogr. 62, 1114-1124.
59. Gong, W. M., Liu, H. Y., Niu, L. W., Shi, Y. Y., Tang, Y. J., Teng, M. K., Wu, J. H., Liang, D. C., Wang, D. C., Wang, J. F., Ding, J. P., Hu, H. Y., Huang, Q. H., Zhang, Q. H., Lu, S. Y., An, J. L., Liang, Y. H., Zheng, X. F., Gu, X. C., and Su, X. D. (2003) Structural genomics efforts at the Chinese Academy of Sciences and Peking University, J. Struct. Funct. Genomics 4, 137 - 139.
60. Alzari, P. M., Berglund, H., Berrow, N. S., Blagova, E., Busso, D., Cambillau, C., Campanacci, V., Christodoulou, E., Eiler, S., Fogg, M. J., Folkers, G., Geerlof, A., Hart, D., Haouz, A., Herman, M. D., Macieira, S., Nordlund, P., Perrakis, A., Quevillon-Cheruel, S., Tarandeau, F., van Tilbeurgh, H., Unger, T., Luna-Vargas, M. P., Velarde, M., Willmanns, M., and Owens, R. J. (2006) Implementation of semi-automated cloning and prokaryotic expression screening: the impact of SPINE, Acta Crystallogr. D.
Biol. Crystallogr. 62, 1103-1113. 61. Edwards, A. M., Arrowsmith, C. H., Christendat, D., Dharamsi, A.,
Friesen, J. D., Greenblatt, J. F., and Vedadi, M. (2000) Protein production: feeding the crystallographers and NMR spectroscopists, Nat. Struct. Biol. 7, 970-972.
62. Sorensen, H., and Mortensen, K. (2005) Soluble expression of recombinant proteins in the cytoplasm of Escherichia coli, Microb.l
Cell Fact. 4, 1-8. 63. Makrides, S. C. (1996) Strategies for achieving high-level
expression of genes in Escherichia coli, Microbiol. Rev. 60, 512-538.
64. Hannig, G., and Makrides, S. C. (1998) Strategies for optimizing heterologous protein expression in Escherichia coli, Trends
Biotechnol. 16, 54-60. 65. Prouty, W. F., and Goldberg, A. L. (1972) Fate of abnormal proteins
in E. coli accumulation in intracellular granules before catabolism, Nature New Biol. 240, 147-150.
66. Prouty, W. F., Karnovsky, M. J., and Goldberg, A. L. (1975) Degradation of abnormal proteins in Escherichia coli. Formation of protein inclusions in cells exposed to amino acid analogs, J. Biol.
Chem. 250, 1112-1122. 67. Bowden, G. A., Paredes, A. M., and Georgiou, G. (1991) Structure
and morphology of inclusion bodies in Escherichia coli,
Biotechnology. (N. Y). 9, 725-730. 68. Taylor, G., Hoare, M., Gray, D. R., and Martson, F. A. O. (1986)
Size and density of inclusion bodies, Biotechnology. (N. Y). 4, 553-557.
69. Garcia-Fruitos, E., Gonzalez-Montalban, N., Morell, M., Vera, A., Ferraz, R., Aris, A., Ventura, S., and Villaverde, A. (2005)
67
Aggregation as bacterial inclusion bodies does not imply inactivation of enzymes and fluorescent proteins, Microb. Cell Fact.
4, 27. 70. Carrio, M. M., Corchero, J. L., and Villaverde, A. (1998) Dynamics
of in vivo protein aggregation: building inclusion bodies in recombinant bacteria, FEMS Microb. Lett. 169, 9-15.
71. Carrio, M. M., and Villaverde, A. (2003) Role of molecular chaperones in inclusion body formation, FEBS Lett. 537, 215-221.
72. Baneyx, F., and Mujacic, M. (2004) Recombinant protein folding and misfolding in Escherichia coli, Nat. Biotechnol. 22, 1399-1408.
73. Baneyx, F., and Palumbo, J. L. (2003) Improving heterologous protein folding via molecular chaperone and foldase co-expression, Methods Mol. Biol. 205, 171-197.
74. Sorensen, H. P., and Mortensen, K. K. (2005) Advanced genetic strategies for recombinant protein expression in Escherichia coli, J.
Biotechnol. 115, 113-128. 75. Dyson, M. R., Shadbolt, S. P., Vincent, K. J., Perera, R. L., and
McCafferty, J. (2004) Production of soluble mammalian proteins in Escherichia coli: identification of protein features that correlate with successful expression, BMC Biotechnol 4, 32.
76. Vincentelli, R., Bignon, C., Gruez, A., Canaan, S., Sulzenbacher, G., Tegoni, M., Campanacci, V., and Cambillau, C. (2003) Medium-scale structural genomics: strategies for protein expression and crystallization, Acc. Chem. Res. 36, 165-172.
77. Brocchieri, L., and Karlin, S. (2005) Protein length in eukaryotic and prokaryotic proteomes, Nucleic Acids Res 33, 3390-3400.
78. Sorensen, H. P., Sperling-Petersen, H. U., and Mortensen, K. K. (2003) Production of recombinant thermostable proteins expressed in Escherichia coli: completion of protein synthesis is the bottleneck, J. Chromatogr. B Analyt. Technol. Biomed. Life. Sci.
786, 207-214. 79. Hua, Z., Wang, H., Chen, D., Chen, Y., and Zhu, D. (1994)
Enhancement of expression of human granulocyte-macrophage colony stimulating factor by argU gene product in Escherichia coli,
Biochem. Mol. Biol. Int. 32, 537-543. 80. Baneyx, F., and Georgiou, G. (1991) Construction and
characterization of Escherichia coli strains deficient in multiple secreted proteases: protease III degrades high molecular weight substrates in vivo, J. Bacteriol. 173, 2696-2703.
81. Studier, F. W., and Moffatt, B. A. (1986) Use of bacteriophage T7 RNA polymerase to direct selective high-level expression of cloned genes, J. Mol. Biol. 189, 113-130.
82. Miroux, B., and Walker, J. E. (1996) Over-production of proteins in Escherichia coli: Mutant hosts that allow synthesis of some
68
membrane proteins and globular proteins at high levels, J. Mol. Biol.
260, 289-298. 83. Bessette, P. H., Aslund, F., Beckwith, J., and Georgiou, G. (1999)
Efficient folding of proteins with multiple disulfide bonds in the Escherichia coli cytoplasm, Proc. Natl. Acad. Sci. USA 96, 13703-13708.
84. Abergel, C., Coutard, B., Byrne, D., Chenivesse, S., Claude, J.-B., Deregnaucourt, C. l., Fricaux, T., Gianesini-Boutreux, C., Jeudy, S., Lebrun, R. g., Maza, C., Notredame, C. d., Poirot, O., Suhre, K., Varagnol, M., and Claverie, J.-M. (2003) Structural genomics of highly conserved microbial genes of unknown function in search of new antibacterial targets, J. Struct. Funct. Genom. 4, 141-157.
85. Berrow, N. S., Bussow, K., Coutard, B., Diprose, J., Ekberg, M., Folkers, G. E., Levy, N., Lieu, V., Owens, R. J., Peleg, Y., Pinaglia, C., Quevillon-Cheruel, S., Salim, L., Scheich, C., Vincentelli, R., and Busso, D. (2006) Recombinant protein expression and solubility screening in Escherichia coli: a comparative study, Acta
Crystallogr. D. Biol. Crystallogr. 62, 1218-1226. 86. Vasina, J. A., and Baneyx, F. (1997) Expression of aggregation-
prone proteins at low temperatures: a comparative study of the E.
coli cspA and tac promoter systems, Protein Expr. Purif. 9, 211-218. 87. Vasina, J. A., and Baneyx, F. (1997) Expression of Aggregation-
Prone Recombinant Proteins at Low Temperatures: A Comparative Study of the Escherichia coli cspA and tac Promoter Systems, Protein Expr. Purif. 9, 211-218.
88. Kiefhaber, T., Rudolph, R., Kohler, H. H., and Buchner, J. (1991) Protein aggregation in vitro and in vivo: a quantitative model of the kinetic competition between folding and aggregation, Biotechnology. (N. Y). 9, 825-829.
89. Ferrer, M., Chernikova, T. N., Yakimov, M. M., Golyshin, P. N., and Timmis, K. N. (2003) Chaperonins govern growth of Escherichia coli at low temperatures, Nat. Biotechnol. 21, 1266-1267.
90. Kapust, R. B., and Waugh, D. S. (1999) Escherichia coli maltose-binding protein is uncommonly effective at promoting the solubility of polypeptides to which it is fused, Protein Sci. 8, 1668-1674.
91. Hammarstrom, M., Hellgren, N., Van den Berg, S., Berglund, H., and Hard, T. (2002) Rapid screening for improved solubility of small human proteins produced as fusion proteins in Escherichia
coli, Protein Sci. 11, 313-321. 92. Braun, P., Hu, Y., Shen, B., Halleck, A., Koundinya, M., Harlow,
E., and LaBaer, J. (2002) Proteome-scale purification of human proteins from bacteria, Proc. Natl. Acad. Sci. USA 99, 2654-2659.
93. Braun, P., and LaBaer, J. (2003) High throughput protein production for functional proteomics, Trends Biotechnol. 21, 383-388.
69
94. Waugh, D. S. (2005) Making the most of affinity tags, Trends
Biotechnol. 23, 316-320. 95. Nallamsetty, S., and Waugh, D. S. (2007) A generic protocol for the
expression and purification of recombinant proteins in Escherichia
coli using a combinatorial His6-maltose binding protein fusion tag, Nat. Protoc. 2, 383-391.
96. Jeon, W., Aceti, D., Bingman, C., Vojtik, F., Olson, A., Ellefson, J., McCombs, J., Sreenath, H., Blommel, P., Seder, K., Burns, B., Geetha, H., Harms, A., Sabat, G., Sussman, M., Fox, B., and Phillips, G. (2005) High-throughput Purification and Quality Assurance of Arabidopsis thaliana Proteins for Eukaryotic Structural Genomics, J. Struct. Funct. Genom. 6, 143-147.
97. Ashraf, S. S., Benson, R. E., Payne, E. S., Halbleib, C. M., and Gron, H. (2004) A novel multi-affinity tag system to produce high levels of soluble and biotinylated proteins in Escherichia coli, Protein Expr. Purif. 33, 238-245.
98. Nallamsetty, S., and Waugh, D. S. (2006) Solubility-enhancing proteins MBP and NusA play a passive role in the folding of their fusion partners, Protein Expr. Purif. 45, 175-182.
99. Hochuli, E., Bannwarth, W., Dobeli, H., Gentz, R., and Stuber, D. (1988) Genetic Approach to Facilitate Purification of Recombinant Proteins with a Novel Metal Chelate Adsorbent, Nat. Biotechnol. 6,
1321-1325. 100. Graslund, S., Sagemark, J., Berglund, H., Dahlgren, L. G., Flores,
A., Hammarstrom, M., Johansson, I., Kotenyova, T., Nilsson, M., Nordlund, P., and Weigelt, J. (2008) The use of systematic N- and C-terminal deletions to promote production and structural studies of recombinant proteins, Protein Expr. Purif. 58, 210-221.
101. Cohen, S. L. (1996) Domain elucidation by mass spectrometry, Structure 4, 1013-1016.
102. Gao, X., Bain, K., Bonanno, J., Buchanan, M., Henderson, D., Lorimer, D., Marsh, C., Reynes, J., Sauder, J., Schwinn, K., Thai, C., and Burley, S. (2005) High-throughput Limited Proteolysis/Mass Spectrometry for Protein Domain Elucidation, J. Struct. Funct.
Genom. 6, 129-134. 103. Bjornsson, A., Mottagui-Tabar, S., and Isaksson, L. A. (1996)
Structure of the C-terminal end of the nascent peptide influences translation termination, EMBO J. 15, 1696-1704.
104. Pantazatos, D., Kim, J. S., Klock, H. E., Stevens, R. C., Wilson, I. A., Lesley, S. A., and Woods, V. L., Jr. (2004) Rapid refinement of crystallographic protein construct definition employing enhanced hydrogen/deuterium exchange MS, Proc. Natl. Acad. Sci. U. S. A.
101, 751-756. 105. Bateman, A., Coin, L., Durbin, R., Finn, R. D., Hollich, V.,
Griffiths-Jones, S., Khanna, A., Marshall, M., Moxon, S.,
70
Sonnhammer, E. L. L., Studholme, D. J., Yeats, C., and Eddy, S. R. (2004) The Pfam protein families database, Nucleic Acids Res. 32, D138-D141.
106. Sauder, M. J., Rutter, M. E., Bain, K., Rooney, I., Gheyi, T., Atwell, S., Thompson, D. A., Emtage, S., and Burley, S. K. (2008) High throughput protein production and crystallization at NYSGXRC, Meth. Mol. Biol. 426, 561-575.
107. Puri, M., Robin, G., Cowieson, N., Forwood, J. K., Listwan, P., Hu, S. H., Guncar, G., Huber, T., Kellie, S., Hume, D. A., Kobe, B., and Martin, J. L. (2006) Focusing in on structural genomics: the University of Queensland structural biology pipeline, Biomol. Eng.
23, 281-289. 108. Peti, W., and Page, R. (2007) Strategies to maximize heterologous
protein expression in Escherichia coli with minimal cost, Protein
Expr. Purif. 51, 1-10. 109. Luan, C. H., Qiu, S., Finley, J. B., Carson, M., Gray, R. J., Huang,
W., Johnson, D., Tsao, J., Reboul, J., Vaglio, P., Hill, D. E., Vidal, M., Delucas, L. J., and Luo, M. (2004) High-throughput expression of C. elegans proteins, Genome Res. 14, 2102-2110.
110. Lamesch, P., Milstein, S., Hao, T., Rosenberg, J., Li, N., Sequerra, R., Bosak, S., Doucette-Stamm, L., Vandenhaute, J., Hill, D. E., and Vidal, M. (2004) C. elegans ORFeome version 3.1: increasing the coverage of ORFeome resources with improved gene predictions, Genome Res. 14, 2064-2069.
111. Knaust, R. K. C., and Nordlund, P. (2001) Screening for soluble expression of recombinant proteins in a 96-well format, Anal.
Biochem. 297, 79-85. 112. Eshaghi, S., Hedren, M., Nasser, M. I. A., Hammarberg, T.,
Thornell, A., and Nordlund, P. (2005) An efficient strategy for high-throughput expression screening of recombinant integral membrane proteins, Protein Sci. 14, 676-683.
113. Waldo, G. S., Standish, B. M., Berendzen, J., and Terwilliger, T. C. (1999) Rapid protein-folding assay using green fluorescent protein, Nat. Biotechnol. 17, 691-695.
114. Maxwell, K. L., Mittermaier, A. K., Forman-Kay, J. D., and Davidson, A. R. (1999) A simple in vivo assay for increased protein solubility, Protein Sci. 8, 1908-1911.
115. Cabantous, S., Pédelacq, J.-D., Mark, B. L., Naranjo, C., Terwilliger, T., and Waldo, G. (2005) Recent Advances in GFP Folding Reporter and Split-GFP Solubility Reporter Technologies. Application to Improving the Folding and Solubility of Recalcitrant Proteins from Mycobacterium tuberculosis, J. Struct. Funct. Genom.
6, 113-119. 116. Denis-Quanquin, S., Lamouroux, L., Lougarre, A., Maheo, S.,
Saves, I., Paquereau, L., Demange, P., and Fournier, D. (2007)
71
Protein expression from synthetic genes: selection of clones using GFP, J. Biotechnol. 131, 223-230.
117. Wigley, W. C., Stidham, R. D., Smith, N. M., Hunt, J. F., and Thomas, P. J. (2001) Protein solubility and folding monitored in
vivo by structural complementation of a genetic marker protein, Nat.
Biotechnol. 19, 131-136. 118. Waldo, G. S. (2003) Genetic screens and directed evolution for
protein solubility, Curr. Opin. Chem. Biol. 7, 33-38. 119. Cabantous, S., Terwilliger, T. C., and Waldo, G. S. (2005) Protein
tagging and detection with engineered self-assembling fragments of green fluorescent protein, Nat. Biotechnol. 23, 102-107.
120. DeLisa, M. P., Samuelson, P., Palmer, T., and Georgiou, G. (2002) Genetic analysis of the twin arginine translocator secretion pathway in bacteria, J. Biol. Chem. 277, 29825-29831.
121. DeLisa, M. P., Tullman, D., and Georgiou, G. (2003) Folding quality control in the export of proteins by the bacterial twin-arginine translocation pathway, Proc. Natl. Acad. Sci. U. S. A. 100, 6115-6120.
122. Beckett, D., Kovaleva, E., and Schatz, P. J. (1999) A minimal peptide substrate in biotin holoenzyme synthetase-catalyzed biotinylation, Protein Sci. 8, 921-929.
123. Scott, C. J., Martin, S. L., Wallace, A., Curran, M. D., and Walker, B. (2000) Characterization of the affinity of streptavidin toward a peptide sequence previously identified as a target substrate for biotinylation by the Escherichia coli biotin holoenzyme synthetase, BirA, Anal. Biochem. 284, 416-417.
124. Tarendeau, F., Boudet, J., Guilligay, D., Mas, P. J., Bougault, C. M., Boulo, S., Baudin, F., Ruigrok, R. W., Daigle, N., Ellenberg, J., Cusack, S., Simorre, J. P., and Hart, D. J. (2007) Structure and nuclear import function of the C-terminal domain of influenza virus polymerase PB2 subunit, Nat. Struct. Mol. Biol. 14, 229-233.
125. van den Berg, S., Lofdahl, P. A., Hard, T., and Berglund, H. (2006) Improved solubility of TEV protease by directed evolution, J.
Biotechnol. 121, 291-298. 126. Henikoff, S. (1984) Unidirectional digestion with exonuclease III
creates targeted breakpoints for DNA sequencing, Gene 28, 351-359.
127. Etchegaray, J. P., and Inouye, M. (1999) Translational enhancement by an element downstream of the initiation codon in Escherichia
coli, J. Biol. Chem. 274, 10079-10085. 128. Martinez Molina, D., Cornvik, T., Eshaghi, S., Haeggstrom, J. Z.,
Nordlund, P., and Sabet, M. I. (2008) Engineering membrane protein overproduction in Escherichia coli, Protein Sci. 17, 673-680.
129. Labaer, J., Qiu, Q., Anumanthan, A., Mar, W., Zuo, D., Murthy, T. V., Taycher, H., Halleck, A., Hainsworth, E., Lory, S., and Brizuela,
72
L. (2004) The Pseudomonas aeruginosa PA01 gene collection, Genome Res. 14, 2190-2200.
130. Lamesch, P., Li, N., Milstein, S., Fan, C., Hao, T., Szabo, G., Hu, Z., Venkatesan, K., Bethel, G., Martin, P., Rogers, J., Lawlor, S., McLaren, S., Dricot, A., Borick, H., Cusick, M. E., Vandenhaute, J., Dunham, I., Hill, D. E., and Vidal, M. (2007) hORFeome v3.1: a resource of human open reading frames representing over 10,000 human genes, Genomics 89, 307-315.
131. Gong, W., Shen, Y. P., Ma, L. G., Pan, Y., Du, Y. L., Wang, D. H., Yang, J. Y., Hu, L. D., Liu, X. F., Dong, C. X., Ma, L., Chen, Y. H., Yang, X. Y., Gao, Y., Zhu, D., Tan, X., Mu, J. Y., Zhang, D. B., Liu, Y. L., Dinesh-Kumar, S. P., Li, Y., Wang, X. P., Gu, H. Y., Qu, L. J., Bai, S. N., Lu, Y. T., Li, J. Y., Zhao, J. D., Zuo, J., Huang, H., Deng, X. W., and Zhu, Y. X. (2004) Genome-wide ORFeome cloning and analysis of Arabidopsis transcription factor genes, Plant
Physiol. 135, 773-782. 132. Uetz, P., Dong, Y. A., Zeretzke, C., Atzler, C., Baiker, A., Berger,
B., Rajagopala, S. V., Roupelieva, M., Rose, D., Fossum, E., and Haas, J. (2006) Herpesviral protein networks and their interaction with the human proteome, Science 311, 239-242.
133. Gillette, W. K., Esposito, D., Frank, P. H., Zhou, M., Yu, L. R., Jozwik, C., Zhang, X., McGowan, B., Jacobowitz, D. M., Pollard, H. B., Hao, T., Hill, D. E., Vidal, M., Conrads, T. P., Veenstra, T. D., and Hartley, J. L. (2005) Pooled ORF expression technology (POET): using proteomics to screen pools of open reading frames for protein expression, Mol. Cell. Proteomics 4, 1647-1652.
134. Shapiro, J. A. (1998) Thinking about bacterial populations as multicellular organisms, Annu. Rev. Microbiol. 52, 81-104.
135. Vlamakis, H., Aguilar, C., Losick, R., and Kolter, R. (2008) Control of cell fate by the formation of an architecturally complex bacterial community, Genes Dev. 22, 945-953.
136. Boehmer, P. E., and Lehman, I. R. (1997) Herpes simplex virus DNA replication, Annu. Rev. Biochem. 66, 347-384.
137. Wu, J. J., Brentjens, M. H., Torres, G., Yeung-Yue, K., Lee, P., and Tyring, S. K. (2003) Valacyclovir in the treatment of herpes simplex, herpes zoster, and other viral infections, J. Cutan. Med.
Surg. 7, 372-381. 138. Morfin, F., and Thouvenot, D. (2003) Herpes simplex virus
resistance to antiviral drugs, J. Clin. Virol. 26, 29-37. 139. Nishiyama, Y. (2004) Herpes simplex virus gene products: the
accessories reflect her lifestyle well, Rev. Med. Virol. 14, 33-46. 140. Glaunsinger, B., and Ganem, D. (2004) Lytic KSHV infection
inhibits host gene expression by accelerating global mRNA turnover, Mol. Cell 13, 713-723.
73
141. Martinez, R., Sarisky, R. T., Weber, P. C., and Weller, S. K. (1996) Herpes simplex virus type 1 alkaline nuclease is required for efficient processing of viral DNA replication intermediates, J. Virol.
70, 2075-2085. 142. Thomas, M. S., Gao, M., Knipe, D. M., and Powell, K. L. (1992)
Association between the herpes simplex virus major DNA-binding protein and alkaline nuclease, J. Virol. 66, 1152-1161.
143. Reuven, N. B., Staire, A. E., Myers, R. S., and Weller, S. K. (2003) The herpes simplex virus type 1 alkaline nuclease and single-stranded DNA binding protein mediate strand exchange in vitro, J.
Virol. 77, 7425-7433. 144. Glaunsinger, B., Chavez, L., and Ganem, D. (2005) The exonuclease
and host shutoff functions of the SOX protein of Kaposi's sarcoma-associated herpesvirus are genetically separable, J. Virol. 79, 7396-7401.
145. Rowe, M., Glaunsinger, B., van Leeuwen, D., Zuo, J., Sweetman, D., Ganem, D., Middeldorp, J., Wiertz, E. J., and Ressing, M. E. (2007) Host shutoff during productive Epstein-Barr virus infection is mediated by BGLF5 and may contribute to immune evasion, Proc.
Natl. Acad. Sci. U. S. A. 104, 3366-3371. 146. Nordlund, P., and Reichard, P. (2006) Ribonucleotide reductases,
Annu. Rev. Biochem. 75, 681-706. 147. Cohen, G. H. (1972) Ribonucleotide reductase activity of
synchronized KB cells infected with herpes simplex virus, J. Virol.
9, 408-418. 148. Daikoku, T., Yamamoto, N., Maeno, K., and Nishiyama, Y. (1991)
Role of viral ribonucleotide reductase in the increase of dTTP pool size in herpes simplex virus-infected Vero cells, J. Gen. Virol. 72 ( Pt 6), 1441-1444.
149. Langelier, Y., Bergeron, S., Chabaud, S., Lippens, J., Guilbault, C., Sasseville, A. M., Denis, S., Mosser, D. D., and Massie, B. (2002) The R1 subunit of herpes simplex virus ribonucleotide reductase protects cells against apoptosis at, or upstream of, caspase-8 activation, J. Gen. Virol. 83, 2779-2789.
150. Sekino, Y., Bruner, S. D., and Verdine, G. L. (2000) Selective inhibition of herpes simplex virus type-1 uracil-DNA glycosylase by designed substrate analogs, J. Biol. Chem. 275, 36506-36508.
151. Mullaney, J., Moss, H. W., and McGeoch, D. J. (1989) Gene UL2 of herpes simplex virus type 1 encodes a uracil-DNA glycosylase, J.
Gen. Virol. 70 ( Pt 2), 449-454. 152. Pyles, R. B., Sawtell, N. M., and Thompson, R. L. (1992) Herpes
simplex virus type 1 dUTPase mutants are attenuated for neurovirulence, neuroinvasiveness, and reactivation from latency, J.
Virol. 66, 6706-6713.
74
153. Preston, V. G., and Fisher, F. B. (1984) Identification of the herpes simplex virus type 1 gene encoding the dUTPase, Virology 138, 58-68.
154. Aranda, M., and Maule, A. (1998) Virus-induced host gene shutoff in animals and plants, Virology 243, 261-267.
155. Oroskar, A. A., and Read, G. S. (1989) Control of mRNA stability by the virion host shutoff function of herpes simplex virus, J. Virol.
63, 1897-1906. 156. Strom, T., and Frenkel, N. (1987) Effects of herpes simplex virus on
mRNA stability, J. Virol. 61, 2198-2207. 157. Smiley, J. R. (2004) Herpes simplex virus virion host shutoff
protein: immune evasion mediated by a viral RNase?, J. Virol. 78,
1063-1068. 158. Everly, D. N., Jr., Feng, P., Mian, I. S., and Read, G. S. (2002)
mRNA degradation by the virion host shutoff (Vhs) protein of herpes simplex virus: genetic and biochemical evidence that Vhs is a nuclease, J. Virol. 76, 8560-8571.
159. Hardwicke, M. A., and Sandri-Goldin, R. M. (1994) The herpes simplex virus regulatory protein ICP27 contributes to the decrease in cellular mRNA levels during infection, J. Virol. 68, 4797-4810.
160. Corcoran, J. A., Hsu, W. L., and Smiley, J. R. (2006) Herpes simplex virus ICP27 is required for virus-induced stabilization of the ARE-containing IEX-1 mRNA encoded by the human IER3 gene, J.
Virol. 80, 9720-9729. 161. Glaunsinger, B. A., and Ganem, D. E. (2006) Messenger RNA
turnover and its regulation in herpesviral infection, Adv. Virus Res.
66, 337-394. 162. Jenner, R. G., Alba, M. M., Boshoff, C., and Kellam, P. (2001)
Kaposi's sarcoma-associated herpesvirus latent and lytic gene expression as revealed by DNA arrays, J. Virol. 75, 891-902.
163. Hoffmann, P. J., and Cheng, Y. C. (1978) The deoxyribonuclease induced after infection of KB cells by herpes simplex virus type 1 or type 2. I. Purification and characterization of the enzyme, J. Biol.
Chem. 253, 3557-3562. 164. Hoffmann, P. J., and Cheng, Y. C. (1979) DNase induced after
infection of KB cells by herpes simplex virus type 1 or type 2. II. Characterization of an associated endonuclease activity, J. Virol. 32, 449-457.
165. Bronstein, J. C., and Weber, P. C. (1996) Purification and characterization of herpes simplex virus type 1 alkaline exonuclease expressed in Escherichia coli, J. Virol. 70, 2008-2013.
166. Stolzenberg, M. C., and Ooka, T. (1990) Purification and properties of Epstein-Barr virus DNase expressed in Escherichia coli, J. Virol.
64, 96-104.
75
167. Shao, L., Rapp, L. M., and Weller, S. K. (1993) Herpes simplex virus 1 alkaline nuclease is required for efficient egress of capsids from the nucleus, Virology 196, 146-162.
168. Reuven, N. B., and Weller, S. K. (2005) Herpes simplex virus type 1 single-strand DNA binding protein ICP8 enhances the nuclease activity of the UL12 alkaline nuclease by increasing its processivity, J. Virol. 79, 9356-9358.
169. Pingoud, A., Fuxreiter, M., Pingoud, V., and Wende, W. (2005) Type II restriction endonucleases: structure and mechanism, Cell.
Mol. Life Sci. 62, 685-707. 170. Bujnicki, J. M., and Rychlewski, L. (2001) The herpesvirus alkaline
exonuclease belongs to the restriction endonuclease PD-(D/E)XK superfamily: insight from molecular modeling and phylogenetic analysis, Virus Genes 22, 219-230.
171. Kosinski, J., Feder, M., and Bujnicki, J. M. (2005) The PD-(D/E)XK superfamily revisited: identification of new members among proteins involved in DNA metabolism and functional predictions for domains of (hitherto) unknown function, BMC Bioinformatics 6, 172.
172. Pingoud, A., and Jeltsch, A. (2001) Structure and function of type II restriction endonucleases, Nucleic Acids Res 29, 3705-3727.
173. Goldstein, J. N., and Weller, S. K. (1998) The exonuclease activity of HSV-1 UL12 is required for in vivo function, Virology 244, 442-457.
174. Liu, M. T., Hsu, T. Y., Lin, S. F., Seow, S. V., Liu, M. Y., Chen, J. Y., and Yang, C. S. (1998) Distinct regions of EBV DNase are required for nuclease and DNA binding activities, Virology 242, 6-13.
175. Zuo, J., Thomas, W., van Leeuwen, D., Middeldorp, J. M., Wiertz, E. J., Ressing, M. E., and Rowe, M. (2008) The DNase of gammaherpesviruses impairs recognition by virus-specific CD8+ T cells through an additional host shutoff function, J. Virol. 82, 2385-2393.
176. Kovall, R., and Matthews, B. W. (1997) Toroidal structure of lambda-exonuclease, Science 277, 1824-1827.
177. Eklund, H., Uhlin, U., Farnegardh, M., Logan, D. T., and Nordlund, P. (2001) Structure and function of the radical enzyme ribonucleotide reductase, Prog. Biophys. Mol. Biol. 77, 177-268.
178. Wnuk, S. F., and Robins, M. J. (2006) Ribonucleotide reductase inhibitors as anti-herpes agents, Antiviral Res. 71, 122-126.
179. Liuzzi, M., Deziel, R., Moss, N., Beaulieu, P., Bonneau, A. M., Bousquet, C., Chafouleas, J. G., Garneau, M., Jaramillo, J., Krogsrud, R. L., Lagacé, L., McCollum, R.S., Nawoot, S. and Guindon, Y. (1994) A potent peptidomimetic inhibitor of HSV
76
ribonucleotide reductase with antiviral activity in vivo, Nature 372, 695-698.
180. Crute, J. J., Grygon, C. A., Hargrave, K. D., Simoneau, B., Faucher, A. M., Bolger, G., Kibler, P., Liuzzi, M., and Cordingley, M. G. (2002) Herpes simplex virus helicase-primase inhibitors are active in animal models of human disease, Nat. Med. 8, 386-391.
181. Cerqueira, N. M., Pereira, S., Fernandes, P. A., and Ramos, M. J. (2005) Overview of ribonucleotide reductase inhibitors: an appealing target in anti-tumour therapy, Curr. Med. Chem. 12, 1283-1294.
182. Wang, X., Zhenchuk, A., Wiman, K. G., and Albertioni, F. (2008) Regulation of p53R2 and its role as potential target for cancer therapy, Cancer Lett.
183. Tanaka, H., Arakawa, H., Yamaguchi, T., Shiraishi, K., Fukuda, S., Matsui, K., Takei, Y., and Nakamura, Y. (2000) A ribonucleotide reductase gene involved in a p53-dependent cell-cycle checkpoint for DNA damage, Nature 404, 42-49.
184. Nakano, K., Balint, E., Ashcroft, M., and Vousden, K. H. (2000) A ribonucleotide reductase gene is a transcriptional target of p53 and p73, Oncogene 19, 4283-4289.
185. Koshland, D. E., Jr. (1993) Molecule of the year, Science 262, 1953. 186. Guittet, O., Hakansson, P., Voevodskaya, N., Fridd, S., Graslund,
A., Arakawa, H., Nakamura, Y., and Thelander, L. (2001) Mammalian p53R2 protein forms an active ribonucleotide reductase in vitro with the R1 protein, which is expressed both in resting cells in response to DNA damage and in proliferating cells, J. Biol. Chem.
276, 40647-40651. 187. Hakansson, P., Hofer, A., and Thelander, L. (2006) Regulation of
mammalian ribonucleotide reduction and dNTP pools after DNA damage and in resting cells, J. Biol. Chem. 281, 7834-7841.
188. Bourdon, A., Minai, L., Serre, V., Jais, J.P., Sarzi, E., Aubert, S., Chrétien, D., de Lonlay, P., Paquis-Flucklinger, V., Arakawa, H., Nakamura, Y., Munnich,. A and Rötig, A. (2007) Mutation of RRM2B, encoding p53-controlled ribonucleotide reductase (p53R2), causes severe mitochondrial DNA depletion. Nat. Genet. 39:703-704.