doi:10.1182/blood-2007-01-068759
Prepublished online May 16, 2007
Blood (print ISSN 0006-4971, online ISSN 1528-0020), is published weekly by the American Society of Hematology, 2021 L St, NW, Suite 900, Washington DC 20036.
Copyright 2011 by The American Society of Hematology; all rights reserved.
For personal use only. by guest on May 30, 2013. bloodjournal.hematologylibrary.orgFrom
Hot spots of retroviral integration
in human CD34+ hematopoietic cells
Claudia Cattoglio, Giulia Facchini, Daniela Sartori, Antonella Antonelli, Annarita Miccio,
Barbara Cassani, Manfred Schmidt, Christof von Kalle, Steve Howe, Adrian J. Thrasher,
Alessandro Aiuti, Giuliana Ferrari, Alessandra Recchia and Fulvio Mavilio.
From the IIT Unit of Molecular Neuroscience, Istituto Scientifico H. San Raffaele, Milan,
Italy; Department of Biomedical Sciences, University of Modena and Reggio Emilia,
Modena, Italy; San Raffaele Telethon Institute for Gene Therapy, Milan, Italy; Vita-Salute
San Raffaele University, Milan, Italy; National Center for Tumor Diseases, Heidelberg,
Germany; Molecular Immunology Unit, Institute of Child Health, London, United Kingdom.
Running title: Retroviral integration hot spots
Supported by grants from Telethon Italy (GGP06101 and TIGET), the European Commission
(VI FP, CONSERT) and Fondazione Cariplo.
Corresponding author: Fulvio Mavilio, Department of Biomedical Sciences, University of Modena and Reggio Emilia, Via Campi 287, 41100 Modena, Italy. Phone: +39-059-2055392; Fax: +39-059-2055410; e-mail: [email protected]
C. Cattoglio, A. Recchia and F. Mavilio designed research and wrote the paper;
C. Cattoglio, G. Facchini, D. Sartori, A. Antonelli, A. Miccio, S. Howe, B. Cassani performed
research and analyzed data; M. Schmidt, C. von Kalle, A. Thrasher, A. Aiuti and G. Ferrari
contributed vital reagents and data sets.
Blood First Edition Paper, prepublished online May 16, 2007; DOI 10.1182/blood-2007-01-068759
Copyright © 2007 American Society of Hematology
ABSTRACT
Insertional oncogenesis is a possible consequence of the integration of gamma-retroviral
(RV) or lentiviral (LV) vectors into the human genome. RV common insertion sites
(CIS) have been identified in hematopoietic malignancies and in the non-malignant
progeny of transduced hematopoietic stem/progenitor cells (HSCs), possibly as a
consequence of clonal selection in vivo. We have mapped a large number of RV and LV
integrations in human CD34+ HSCs, transduced in vitro and analyzed without selection.
Recurrent insertion sites (hot spots) account for >21% of the RV integration events,
while they are significantly less frequent in the case of LV vectors. RV but not LV hot
spots are highly enriched in proto-oncogenes, cancer-associated CIS, and growth-
controlling genes, indicating that at least part of the biases observed in the HSC progeny
in vivo are characteristics of RV integration, already present in non-transplanted cells.
Genes involved in hematopoietic and immune system development are targeted at high
frequency and enriched in hot spots, suggesting that the CD34+ gene expression
program is instrumental in directing RV integration. The lower propensity of LV
vectors for integrating in potentially dangerous regions of the human genome may be a
factor determining a better safety profile for gene therapy applications.
Introduction
Gene therapy of genetic blood disorders requires stable genetic modification of hematopoietic
stem cells. Gene transfer vectors derived from murine gamma-retroviruses, such as the
Moloney murine leukemia virus (MLV), have been used for more than a decade to transduce
human bone marrow-derived or mobilized hematopoietic stem/progenitor cells (HSCs) in a
clinical context. Retroviral vector-mediated gene transfer has recently achieved therapeutic
efficacy, allowing correction of life-threatening diseases such as severe combined
immunodeficiencies (SCID)1-3 or chronic granulomatous disease (CGD)4. MLV-derived
vectors, however, have also raised significant safety concerns for the genotoxic risk
potentially associated with their uncontrolled integration into the human genome5-7. Indeed,
insertional activation of a T-cell proto-oncogene has been correlated with the occurrence of
lymphoproliferative disorders in three patients treated with retrovirally-transduced
hematopoietic cells for X-linked SCID (X-SCID)1. Recent studies have shown that gamma-
retroviral vectors integrate preferentially within transcribed genes and around promoters and
CpG islands8, where insertion of the viral long terminal repeat (LTR) transcriptional enhancer
has a high probability to interfere with gene regulation9. Nevertheless, no adverse event
related to viral insertion was reported in other clinical trials for X-SCID3, adenosine
deaminase-deficient SCID (ADA-SCID)2, CGD4 or graft-versus-host disease9, suggesting the
existence of specific risk factors that are incompletely understood10.
Analysis of MLV integration patterns in natural or experimentally induced
leukemias/lymphomas showed the existence of insertion sites recurrently associated with a
malignant phenotype. These “common insertion sites” (CIS) include proto-oncogenes or other
genes associated with cell growth and proliferation, the activation or deregulation of which
has a causal relationship with the establishment and/or progression of neoplasia11. Some of
these sites, such as the EVI1-MDS1 locus, have been identified at relatively high frequency
also in the non-malignant progeny of transduced hematopoietic cells in mice12, non-human
primates13 and humans4, indicating that insertion into certain genes may cause clonal
amplification of transduced progenitors in vivo. From these studies, however, it is not clear
whether clonal dominance is entirely the result of in vivo selection, or is favored by the
existence of highly preferred regions of retroviral integration that make clonal amplification
more likely to occur. This issue is highly relevant in understanding the different outcomes of
different gene therapy clinical trials, in assessing the relative safety of using MLV-derived
vectors in specific clinical applications, and in comparing the safety profile of alternative
vectors (e.g., HIV-derived lentiviral vectors) or vector designs.
We report an analysis of gamma-retroviral (RV) and lentiviral (LV) vector integration
hot spots from large collections of integration sites obtained from human cord blood- and
bone marrow-derived CD34+ HSCs transduced in vitro and analyzed without selection. Hot
spots account for >20% of the MLV integration sites, while they are significantly less
frequent in the case of HIV-derived vectors. Integration sites associated with clonal
dominance and neoplasia in both mice and humans, including LMO2, are hot spots of
gamma-retroviral but not lentiviral integration in human hematopoietic cells.
Materials and Methods
Retroviral vectors
Cord blood (CB)-derived CD34+ cells were transduced with the previously described LGS∆N
and LGS∆N-∆CAAT RV vectors14, driving the expression of GFP under an intact or a U3-
deleted MLV LTR, and of ∆LNGFR under an internal SV40 promoter. CB-derived CD34+
cells were also transduced with the self-inactivating (SIN) pRRLsin-18.pptCMV-GFPwpre
LV vector15, containing a U3-deleted HIV-1 LTR and a CMV-driven GFP cassette16, or with
the pHR2pptCMV-GFPwpre or the pHR2pptGS∆N LV vectors, retaining HIV-1 wild-type
LTRs and driving the expression of GFP or ∆LNGFR under internal CMV or SV40
promoters. To generate the pHR2pptCMVGFPwpre construct, a pptCMVGFPwpre fragment
from the pRRLsin-18.pptCMVGFPwpre vector was cloned into ClaI-EcoRI sites of
pHR2MD-NGFR17. To obtain the pHR2pptGS∆N LV construct, the pHR2pptCMVGFPwpre
vector was digested with BamHI/EcoRI and ligated to a GFP-SV40∆LNGFR cassette. Bone
marrow (BM)-derived CD34+ cells were transduced with previously described RV vectors
expressing either the ADA (GIADA12) or the γc receptor3 cDNA.
RV vector supernatants were produced by transient transfection of the amphotropic
Phoenix packaging cell line. Infectious particle titer was determined on K562 cells. VSV-G
pseudotyped LV particles were prepared by transient co-transfection of 293T cells, collected
and concentrated as already described17, and titrated on 293T cells. Transduction efficiency
was evaluated by flow cytometry. Amphotropic or GaLV-pseudotyped ADA and γc receptor
RV vectors were titered as previously described2,3.
Transduction of human CD34+ cells
CD34+ HSCs were purified from cord blood Ficoll fractions by magnetic sorting
(MiniMACS; Miltenyi, Auburn, CA) and pre-stimulated for 24-48 hours in serum-free
Iscove’s modified Dulbecco’s medium (IMDM) supplemented with 20% BIT (Stem Cell
Technologies; Vancouver, BC), 20 ng/ml human thrombopoietin, 100 ng/ml Flt-3 ligand
(PeproTech; Rocky Hill, NJ), 20 ng/ml interleukin-6, and 100 ng/ml stem cell factor (R&D
Systems; Minneapolis, MN). RV transduction was performed by spinoculation (3 rounds at
1,500 rpm for 45 min) in the presence of 4 µg/ml polybrene. LV transduction was performed
by overnight incubation at a MOI of 200 in the presence of 4 µg/ml polybrene. Transduction
efficiency was evaluated by analysis of EGFP and/or ∆LNGFR expression by flow cytometry
using a mouse anti-human NGFR antibody (Becton Dickinson).
BM- or peripheral blood-derived CD34+ cells were purified from normal donors or
SCID patients by magnetic sorting, pre-stimulated for 24 hours in IMDM containing human
serum, or serum-free X-Vivo10 medium, and a cytokine cocktail (FLT3-ligand, SCF, TPO,
IL-3), and transduced by three cycles of exposure to the GIADA1 or the γc receptor RV vector
supernatant as previously described2,3.
Cloning and analysis of retroviral insertion sites
Integration sites were cloned by linker-mediated PCR (LM-PCR) or linear amplification-
mediated PCR (LAM-PCR), as described18,19. Briefly, genomic DNA was extracted from
0.5-5 × 10⁶ infected cells and digested with MseI and a second enzyme to prevent amplification of
internal 5’ LTR fragments (PstI for RV vectors and SacI/NarI for LV vectors). An MseI
double-stranded linker was then ligated and LM-PCR performed with nested primers specific
for the linker and the 3’ LTR (MLV: 5’-GACTTGTGGTCTCGCTGTTCCTTGG-3’ and
5’-GGTCTCCTCTGAGTGATTGACTACC-3’; HIV: 5’-AGTGCTTCAAGTAGTGTGTGCC-3’
and 5’-GTCTGTTGTGTGACTCTGGTAAC-3’). PCR products were shotgun-cloned
(TOPO TA cloning kit, Invitrogen; Carlsbad, CA) into libraries of integration junctions,
which were then sequenced to saturation. A valid integration contained the MLV or HIV
nested primer, the entire MLV or HIV genome up to a CA dinucleotide and the linker nested
primer. Sequences between the 3’ LTR and the linker primers were mapped onto the human
genome by the BLAT genome browser (UCSC Human Genome Project Working Draft, May
2004). Random genomic sequences originated by LM-PCR (genomic MseI-MseI, PstI-MseI,
NarI-MseI or SacI-MseI fragments) were used as controls. Sequences featuring a unique best
hit with ≥95% identity to the human genome were considered genuine integration sites, and
classified as intergenic when occurring at an arbitrarily chosen distance of >30 kb from any
Known Gene (UCSC definition), perigenic when ≤30 kb upstream or downstream of a Known
Gene, and intragenic when within the transcribed portion of at least one Known Gene. In case
of multiple transcript variants, the most represented and/or the longest isoform was chosen.
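The three-way classification described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Gene` record, the function name and the inclusive coordinate convention are hypothetical and are not taken from the UCSC tools used in the study; only the 30 kb distance and the three class names come from the text.

```python
from dataclasses import dataclass

PERIGENIC_WINDOW = 30_000  # bp; the arbitrarily chosen distance quoted in the text


@dataclass(frozen=True)
class Gene:
    """Hypothetical minimal gene record (one isoform per gene)."""
    name: str
    chrom: str
    tx_start: int  # first transcribed base (inclusive, assumed convention)
    tx_end: int    # last transcribed base (inclusive, assumed convention)


def classify_integration(chrom, pos, genes, window=PERIGENIC_WINDOW):
    """Return 'intragenic', 'perigenic' or 'intergenic' for one insertion site."""
    same_chrom = [g for g in genes if g.chrom == chrom]
    # Intragenic: inside the transcribed portion of at least one gene.
    if any(g.tx_start <= pos <= g.tx_end for g in same_chrom):
        return "intragenic"
    # Perigenic: within `window` bp upstream or downstream of a gene.
    for g in same_chrom:
        distance = g.tx_start - pos if pos < g.tx_start else pos - g.tx_end
        if distance <= window:
            return "perigenic"
    # Intergenic: farther than `window` bp from every gene.
    return "intergenic"
```

A linear scan over all genes is adequate for a few thousand insertions; an interval index would be the natural replacement for larger collections.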
Gene density analysis was performed using the Table Browser tool of the UCSC BLAT
genome browser. For each integration, the number of Known Genes (a single isoform in case
of multiple variants) contained in a range of 1 Megabase (Mb) around the insertion site was
calculated. For all pairwise comparisons, we applied a two-sample test for equality of
proportions with continuity correction (Rweb 1.03).
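For readers who want to reproduce the pairwise comparisons, the following is a minimal sketch of a two-sample test for equality of proportions with continuity correction, written to mirror the chi-squared formulation of R's `prop.test` for two groups. The function name is hypothetical, and the sketch assumes the pooled proportion is strictly between 0 and 1; it is not the Rweb code used by the authors.

```python
import math


def two_proportion_test(x1, n1, x2, n2, correct=True):
    """Chi-squared test (1 degree of freedom) that two proportions are equal.

    x1/n1 and x2/n2 are the observed successes/totals in the two samples.
    Returns (chi-squared statistic, two-sided p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    # Yates continuity correction, capped so it never exceeds the observed
    # difference between observed and expected counts.
    yates = 0.5 if correct else 0.0
    yates = min(yates, abs(p1 - p2) / (1 / n1 + 1 / n2))
    stat = 0.0
    for x, n in ((x1, n1), (x2, n2)):
        for observed, expected in ((x, n * pooled), (n - x, n * (1 - pooled))):
            stat += (abs(observed - expected) - yates) ** 2 / expected
    # Survival function of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

The inputs are raw counts, for example the number of intragenic integrations and the size of each collection, rather than percentages.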
A genomic region was defined as a “hot spot” for retroviral integration according to
criteria developed for defining cancer-related common insertion sites (CIS), with minor
modifications11,20. Cutoff values were set at 36 kb for 2 insertions, 56 kb for 3 insertions and
104 kb for 4 or more insertions.
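The window rule above can be illustrated with a short Python sketch. Several points are assumptions rather than the authors' procedure: the function and variable names are hypothetical, a window is taken to qualify when its span is less than or equal to the cutoff, and overlapping qualifying windows are merged into a single hot spot. Note also that the cutoffs used here are those quoted in this paragraph, whereas the Results quote <30, <50 and <100 kb, so they are exposed as a parameter.

```python
from collections import defaultdict

# (minimum number of insertions, window size in bp), as quoted in this paragraph.
RULES = ((2, 36_000), (3, 56_000), (4, 104_000))


def find_hot_spots(insertions, rules=RULES):
    """Group insertions into hot spots.

    insertions: iterable of (chromosome, position) pairs.
    Returns a list of (chromosome, start, end, n_insertions) tuples.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in insertions:
        by_chrom[chrom].append(pos)

    hot_spots = []
    for chrom, positions in sorted(by_chrom.items()):
        positions.sort()
        # Every run of k consecutive insertions that fits its window qualifies.
        windows = []
        for k, size in rules:
            for i in range(len(positions) - k + 1):
                if positions[i + k - 1] - positions[i] <= size:
                    windows.append((i, i + k - 1))
        # Assumption: windows sharing insertions are merged into one hot spot.
        windows.sort()
        merged = []
        for first, last in windows:
            if merged and first <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], last)
            else:
                merged.append([first, last])
        for first, last in merged:
            hot_spots.append(
                (chrom, positions[first], positions[last], last - first + 1)
            )
    return hot_spots
```

Because of the merging step, a chain of closely spaced pairs can produce a hot spot wider than any single cutoff; a stricter implementation would re-check the span of each merged group.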
Gene expression profiling
The expression profile of CD34+ cells was determined by microarray analysis. RNA was
isolated from 1-2 × 10⁶ CB- and BM-derived CD34+ cells stimulated with cytokines according
to the same protocols used for RV (CB- and BM-derived cells) or LV (CB-derived cells)
vector transduction, transcribed into biotinylated cRNA, hybridized to Affymetrix HG-
U133A Gene Chip arrays and analyzed as previously described9. To correlate retroviral
integration and gene activity, expression values from the CD34+ cell microarrays were
divided into four classes, i.e. absent, low (below the 25th percentile in a normalized
distribution), intermediate (between the 25th and the 75th percentile) and high (above the 75th
percentile).
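The four expression classes can be sketched as follows. This is an illustration under assumptions that the text does not settle: the percentiles are computed here over probesets with a "present" call only, the detection call is supplied as a boolean per probeset, and all names are hypothetical.

```python
import statistics


def classify_expression(signals, present_calls):
    """Assign each probeset to 'absent', 'low', 'intermediate' or 'high'.

    signals: dict mapping probeset ID to a normalized expression value.
    present_calls: dict mapping probeset ID to True if a transcript was detected.
    """
    # Assumption: quartiles are taken over detected probesets only.
    expressed = [v for p, v in signals.items() if present_calls.get(p, False)]
    q1, _median, q3 = statistics.quantiles(expressed, n=4)
    classes = {}
    for probeset, value in signals.items():
        if not present_calls.get(probeset, False):
            classes[probeset] = "absent"
        elif value < q1:
            classes[probeset] = "low"            # below the 25th percentile
        elif value > q3:
            classes[probeset] = "high"           # above the 75th percentile
        else:
            classes[probeset] = "intermediate"   # between the 25th and 75th
    return classes
```

Genes represented by several probesets would need an additional, explicitly chosen rule (for instance the highest class among their probesets), which is not specified in the text.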
Functional clustering analysis
Functional cluster analysis of genes targeted by retroviral integrations and from control
sequences was performed using the DAVID 2.1 Functional Annotation Tool21,22
(http://david.abcc.ncifcrf.gov). In the DAVID annotation system, a Fisher exact test corrected
for multiple comparisons (DAVID’s EASE score21) is adopted to measure the level of gene-
enrichment in Gene Ontology (GO) annotation terms with respect to a background
population, and GO categories are considered over-represented when yielding an EASE score
<0.05. A list of 417 cancer-associated CIS was obtained from the Mouse Retrovirus Tagged
Cancer Gene Database (http://rtcgd.ncifcrf.gov/), where murine genes were replaced with
human homologs. Genes were analyzed also by the network-based Ingenuity Pathways
Analysis tool (Ingenuity® Systems, www.ingenuity.com). Gene identifiers were uploaded
into the application, and mapped to their corresponding Focus Gene in the Ingenuity
Pathways Knowledge Base. Networks were algorithmically generated based on the direct or
indirect interaction between Focus Genes. The Functional Analysis of each network identified
the biological functions and/or diseases that were most significant to the genes in the network
(Fisher’s exact test). A list of 596 human proto-oncogenes was compiled from the UNSW
Embryology DNA-Tumor Suppressor and Oncogene Database
(http://embryology.med.unsw.edu.au) and the Tumor Gene Database (http://www.tumor-
gene.org).
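As a rough illustration of the enrichment statistic used for the GO analysis, the sketch below computes a one-tailed Fisher exact p-value and an EASE-style score. It assumes, following the description in DAVID's documentation as we understand it, that the EASE score is the same one-tailed test computed after removing one gene from the list hits, with the gene list and the background population forming the two columns of the 2x2 table. Function names are hypothetical and this is not DAVID's own code.

```python
from math import comb


def fisher_one_tailed(a, b, c, d):
    """Upper-tail Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Gives the probability of a top-left count >= a with all margins fixed.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)
    upper = min(row1, col1)
    numerator = sum(
        comb(row1, k) * comb(row2, col1 - k) for k in range(a, upper + 1)
    )
    return numerator / denominator


def ease_score(list_hits, list_total, pop_hits, pop_total):
    """EASE-style score: Fisher test after removing one gene from the hits."""
    reduced_hits = list_hits - 1
    if reduced_hits < 0:
        return 1.0
    return fisher_one_tailed(
        reduced_hits,             # list genes in the category, minus one
        pop_hits,                 # background genes in the category
        list_total - list_hits,   # list genes outside the category
        pop_total - pop_hits,     # background genes outside the category
    )
```

Removing one hit makes the score more conservative than the plain Fisher p-value, which mainly penalizes categories supported by very few genes.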
Results
Retroviral integration preferences in human CD34+ HSCs
Human CD34+ HSCs were purified from umbilical cord blood (CB) pools, bone marrow
(BM) from ADA-SCID and X-SCID patients, or peripheral blood (PB) from a normal donor.
CB CD34+ cells were transduced with MLV-derived gamma-retroviral (RV) or HIV-derived
lentiviral (LV) vectors carrying a GFP reporter gene and either a wild-type or a U3-deleted
(SIN) LTR. BM CD34+ cells were transduced with MLV-derived retroviral vectors
expressing either ADA2 or γc receptor3 from a wild-type LTR. PB CD34+ cells were
transduced with the vector expressing γc receptor3. Transduction efficiency ranged from 15%
(SIN-RV) to >90% (SIN-LV) depending on the vector and target cell type, and remained
stable throughout the culture period. DNA was obtained 1 to 12 days after infection, from
cells that underwent 1 (all BM and PB samples) to 5-6 (all CB samples) cell doublings in
culture. Vector-genome junctions were cloned and sequenced by a linker-mediated (LM) or
linear amplification-mediated (LAM) PCR approach adapted to the different vector types, and
mapped onto the human genome. Cumulatively, we mapped 1,030 RV and 849 LV
integrations in CB- or BM-derived CD34+ cells. 595 RV integrations were obtained from CB
cells transduced with wild-type (395) or SIN (200) LTR vectors expressing ∆LNGFR from an
internal promoter, and 435 from BM cells transduced with wild-type LTR vectors expressing
ADA (190) or γc receptor (245). All LV integrations were obtained from CB cells transduced
with wild-type (404) or SIN (445) LTR vectors expressing GFP or ∆LNGFR from an internal
promoter.
Among RV integrations, 172 (16.7%) were in an intergenic position, 566 (55.0%)
within the transcribed portion of at least one gene, and 292 (28.3%) at a distance of ≤30 kb
upstream or downstream of one or more genes (Table 1; the complete list of sequences is
available at GenBank with the accession number XXXX). Among LV integrations, 148
(17.4%) were in intergenic, 609 (71.7%) in intragenic, and 92 (10.9%) in perigenic position.
Conversely, a collection of 798 control sequences randomly cloned by LM-PCR contained
369 (46.2%) intergenic, 308 (38.6%) intragenic and 121 (15.2%) perigenic sequences.
Compared to controls, RV vectors showed a preference for intragenic (two-sample test for
equality of proportions with continuity correction, p<10⁻¹¹) and perigenic (p<10⁻¹⁰)
integration, while LV vectors showed a much higher preference for intragenic positions
(p<10⁻¹⁵).
The position of the integrated proviruses with respect to Known Genes is shown in
Figure 1, which considers the total number of vector-gene interactions in an interval of 30 kb
around each insertion site (1,517 and 1,241 for RV and LV vectors respectively). Compared to
randomly cloned or computer-generated23 control sequences, a significant clustering around
transcription start sites was observed for RV but not LV vectors. Overall, 29.3% of the total
RV vector-gene interactions were within ±10 kb from the +1 position of Known Genes,
compared to 16.1% for LV vectors (p<10⁻¹⁵) (Table 1, Figure 1). The RV general integration
preferences were similar in CD34+ and HeLa cells, as indicated by the analysis of 869
insertions from a previously published collection24 (Table 1).
In CD34+ cells, RV integrations showed a significant preference for gene-dense
regions: >60% of proviruses were found in genomic regions containing 6-20 genes per Mb,
with a peak of 35% at a density of 6-10 genes/Mb, while >60% of control sequences mapped
to regions with a gene density of <5 genes/Mb (p<10⁻⁴) (Figure 2A). On the contrary, LV
integrations followed a distribution within regions of different gene density more similar to
that of the control sequences and of the human genome, and different from that of RV
(p<10⁻⁴) (Figure 2B).
To correlate vector integration with gene activity, we determined the expression
profile of >16,000 genes by microarray analysis in CB- and BM-derived CD34+ cells
activated in culture in the same conditions used for RV and LV transduction. As shown in
Figure 3, ~60% of 1,571 probesets representing 866 genes hit by a RV vector detected a
transcript in activated CD34+ cells, and among them 13% were classified as lowly abundant,
30% as intermediately abundant and 17% as highly abundant, compared to a 45-47%
“present” call on the whole microarrays (percentages were slightly different between CB- and
BM-derived cells) and an 11-12%, 23% and 11-12% breakdown in the three abundance
classes. With the exception of the lowest expression class, all differences were statistically
significant (10⁻¹⁵<p<10⁻⁷), indicating that RV vectors integrate preferentially into genes active
in CD34+ cells at the time of transduction, and particularly in the fraction of genes expressed
at higher level. A similar correlation with gene activity was observed, as already reported in T
cells25, for genes hit by LV vectors: ~56% of 1,346 probesets representing 757 hit genes
detected a transcript in activated CD34+ cells, with a 13%, 31% and 12% breakdown in the
three abundance classes. Compared to the whole microarray, the fraction of probesets with a
present call was significantly higher (56 vs. 46%, p<10⁻¹²), but the difference was accounted
for essentially by the intermediately abundant transcripts (31 vs. 23%, p<10⁻¹⁰) (Figure 3),
indicating that LV vectors tend to integrate into active genes in CD34+ cells but have no
specific preference for genes expressed at high levels when compared to RV vectors (p<10⁻⁴).
Genes regulating cell growth and proliferation are preferred targets of retroviral
integration
A functional classification by the Gene Ontology (GO) criteria26 of genes hit by RV and LV
vectors in CD34+ cells (Supplementary Tables 1 and 2) showed statistically significant biases
towards several gene categories (Figure 4A). In particular, genes involved in the
establishment and/or maintenance of chromatin architecture, signal transduction and cell cycle
were significantly more represented in the collection of genes hit by RV integrations
compared to their expected frequency in the human genome (EASE score <0.005). Genes
involved in chromatin remodeling and phosphorylation were hit at a higher than expected
frequency also by LV vectors (EASE score <0.0005 and <0.005 respectively), particularly
those with serine/threonine kinase and GTPase activity (EASE score <0.0005). Two
additional categories, i.e., transcription and apoptosis, were over-represented in genes hit by
RV and/or LV vectors, although at less significant levels (EASE score <0.05). A different
analysis, carried out by the Ingenuity® network-based pathway analysis software, indicated
that genes involved in cell signaling, cell growth/proliferation, cell death, cancer and
hematopoietic system development were significantly over-represented in the collection of
RV and/or LV integrations with respect to genes annotated in the software Pathways
Knowledge Base (0.005<p<0.05). These categories were therefore chosen to carry out a direct
frequency comparison between RV and LV target genes and our control gene list
(Supplementary Tables 1-3). Genes involved in cell signaling, growth/proliferation and death
were over-represented in both RV and LV integrations with respect to control sequences
(10⁻⁹<p<10⁻², Figure 4B), while genes involved in hematopoietic and immune system
development, immune response and cancer were significantly over-represented only in RV
integrations (10⁻⁸<p<10⁻⁴, Figure 4B). The comparison was then extended to genes
specifically annotated in cancer-related databases (see methods for definitions and data
source). RV integrations hit 77 proto-oncogenes and 64 cancer-associated CIS, corresponding
to 7.5% and 6.2% respectively of the 1,030 integrations (Figure 5). Both categories were
significantly over-represented (p<10⁻³ and 10⁻⁴ respectively) compared to control sequences
(27 proto-oncogenes and 17 CIS out of 798 sequences). On the contrary, LV integrations hit
49 proto-oncogenes and 32 CIS out of 849 integrations (Figure 5), a borderline significant
difference compared to controls (p=0.03 and 0.07 respectively). Interestingly, HeLa cell
integrations show over-representation of proto-oncogenes but not CIS (not shown). This
finding is not surprising considering that CIS have been mostly defined in hematopoietic
malignancies.
Overall, these analyses show that both RV and LV vectors have a general tendency to
integrate into genes involved in the regulation of cell growth and proliferation, and that RV
integrations have a specific bias for genes associated with oncogenic transformation. An
Ingenuity network analysis confirmed these biases and showed, in addition, that a significant
number of genes hit by RV integrations are functionally linked in gene networks involved in
apoptosis (Supplementary Table 4; Figure 6A), signal transduction, transcriptional regulation
and cancer (Supplementary Table 4; Figure 6B).
RV but not LV vectors show a high frequency of integration hot spots
The RV and LV insertion site collections were analyzed for the presence of integrations at
recurrent sites (hot spots), using essentially the same criteria previously applied to the
definition of cancer-associated CIS (at least two independent insertions in <30 kb, three in
<50 kb and four in <100 kb11,20). Overall, 219 out of 1,030 (21.3%) RV insertion sites met
these criteria, identifying 97 hot spots in the genome of CD34+ cells (Supplementary Table 5).
109 out of 869 integrations (12.5%) met the same criteria in HeLa cells, defining 52 hot spots
(not shown). LV vectors showed a significantly lower propensity to integrate at recurrent
sites, with only 70 out of 849 (8.2%) integrations meeting the definition criteria, and
identifying 33 hot spots (Supplementary Table 5). Comparing the three collections, 1 hot spot
appeared to be a recurrent site for both RV (4 hits) and LV (3 hits) integration (Chr. 17q23.2:
55188652-55285672), while 3 hot spots were found in common between CD34+ and HeLa
cells (not shown). It is worth noting that 22 out of 798 (2.8%) control sequences also met the
hot spot definition criteria (Supplementary Table 5), defining a background level of false
positivity in the LM-PCR analysis. The different sub-groups of RV integrations contributed to
the hot spot list proportionally to their size, with no apparent bias related to the type of
transduced cell (CB, BM or PB), the vector used for transduction (wt-LTR or SIN-LTR), or
the number of cell doublings undergone in culture before harvesting (Supplementary Table 6).
In particular, non-expanded cell populations (BM- and PB-derived), which collectively
contributed less than half of the 1,030 total RV integrations, contributed with at least one
integration to 56 of the 97 (58%) RV hot spots (Supplementary Table 5).
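The hot spot frequencies quoted in this section follow directly from the reported counts; the short check below recomputes them and the fold differences between collections. Only the counts come from the text; the dictionary keys and variable names are illustrative.

```python
# (insertions meeting the hot spot criteria, total insertions), from the text.
COUNTS = {
    "RV CD34+": (219, 1030),
    "RV HeLa": (109, 869),
    "LV CD34+": (70, 849),
    "control": (22, 798),
}

# Fraction of each collection that falls into a hot spot.
fractions = {name: hits / total for name, (hits, total) in COUNTS.items()}
percentages = {name: round(100 * f, 1) for name, f in fractions.items()}

# Fold differences of RV integrations relative to controls and to LV integrations.
rv_vs_control = fractions["RV CD34+"] / fractions["control"]
rv_vs_lv = fractions["RV CD34+"] / fractions["LV CD34+"]

for name, pct in percentages.items():
    print(f"{name}: {pct}% of insertions in hot spots")
print(f"RV vs control: {rv_vs_control:.1f}-fold; RV vs LV: {rv_vs_lv:.1f}-fold")
```

These ratios describe relative frequencies only; the statistical significance of each difference would still need a test on the underlying counts.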
The position of RV hot spot integrations with respect to Known Genes reflected the
RV general integration preferences, with intragenic, perigenic, and gene-dense regions over-
represented to the same extent observed in the entire collection of RV integrations, and
clustering around TSS only slightly decreased (p=0.015) (Table 1, Figures 1A and 2A). On
the contrary, LV hot spots showed a higher frequency of integration in intragenic (81.4 vs.
71.7%) and gene-dense (65.7 vs. 35.6% in the >11 genes/Mb range) regions (Table 1 and
Figure 2B). Similarly, RV hot spots occurred in the same proportion of expressed genes
observed for all RV integrations (Figure 3A), while LV hot spots contained a significantly
higher proportion of expressed genes (73.2 vs. 55.9% p=0.003) (Figure 3B).
Interestingly, the maximum distance between independent integrations defining a hot
spot was significantly lower for RV vectors compared to LV vectors and control sequences
with hot spot characteristics. Overall, 52% and 67% of the RV hot spots in CD34+ and HeLa
cells span less than 10 kb, including those containing three or four independent integrations,
compared to 36% and 27% for LV and control sequences respectively (Figure 7). One fourth
(26.0%) of the RV hot spots in CD34+ cells and almost one half (40.4%) of those in HeLa
cells contained two independent integrations in less than 2 kb, compared to only 3% of the
LV hot spots.
Proto-oncogenes and cancer-associated CIS are hot spots of RV but not LV integration
The list of RV integration hot spots in CD34+ cells includes proto-oncogenes (e.g., LYL1,
MYB), cancer-associated CIS (e.g., FLI1, EVI2A, EVI2B, NF1), and genes involved in
chromosomal translocations in hematopoietic malignancies (e.g., LMO2, MKL1, ETV6)
(Table 2), all of them occurring at frequencies significantly higher than expected
(10⁻⁸<p<10⁻⁴) and higher than in the overall list of RV integrations (Figure 5). Interestingly, non-
expanded cell populations contributed with at least one integration to 9 of the 17 (53%) hot
spots containing a proto-oncogene or a cancer-associated CIS (Table 2), again indicating the
absence of biases related to the number of cell doublings in culture. On the contrary, LV hot
spots showed little enrichment for proto-oncogenes or CIS, although in this case low numbers
make comparisons poorly significant (Figure 5). Furthermore, RV but not LV hot spots
included a very high proportion of genes belonging to the intracellular signaling cascade
category (25.3%), which were significantly over-represented using either the human genome
or the total RV integrations as a background population in a GO analysis (EASE score:
1.2 × 10⁻⁶ and 2.2 × 10⁻⁴, respectively), despite their relatively small number (i.e., 22).
Interestingly, genes involved in hematopoietic and immune system development and in
immune response by Ingenuity® pathway analysis were further and significantly enriched in
RV hot spots with respect to the entire list of RV integrations (p<10⁻², Figure 4B).
Discussion
Retroviral integration preferences have significant consequences on the potential genotoxicity
of different families of vectors used to transfer genes into HSCs. The probability of dominant
activation of potentially cancer-causing genes, e.g., those involved in the control of stem cell
self-renewal, growth and differentiation, may in fact differ significantly between RV and LV
vectors simply based on the different frequency by which they may target those genes. Here
we report a detailed analysis of the RV and LV integration preferences in human CB-, PB-
and BM-derived CD34+ HSCs transduced in the same conditions used in clinical applications
and analyzed without selection. The general integration preferences of the two vector families
were similar to those previously described for other mammalian hematopoietic or non-
hematopoietic cells (reviewed in Bushman, 20058), and showed on average a two-fold higher
probability for RV vectors to target gene-dense regions, highly active genes and promoter-
proximal regions. However, RV but not LV integration occurs at high frequency (>20%) at
genomic locations (hot spots) that are significantly enriched in proto-oncogenes and genes
involved in the control of cell proliferation.
A high frequency of hot spots, defined by a statistical criterion previously applied to
define cancer-associated CIS11,20, appears to be a hallmark of RV integration in human CD34+
HSCs. We found that more than one fifth of the RV integrations meet the definition criteria, a
frequency more than 7-fold higher than expected from the analysis of a randomly cloned
collection of human DNA sequences, and almost 3-fold higher than that found in a collection
of LV integrations of comparable size. The average extension of RV hot spots, i.e., the
maximum distance between all insertions within each spot, was well within the definition
criteria, and significantly smaller than that of LV hot spots, spanning less than 10 kb in half of
the cases and less than 2 kb in one fourth of the cases. RV integration appears therefore to
have high preference for restricted genomic locations, which may exhibit specific chromatin
conformations or features that favor tethering of the pre-integration complexes (PICs) with
higher probability. These features do not include gene density, proximity to promoters or gene
expression per se, since hot-spot integrations show exactly the same preferences as those observed in
the entire collection of RV integrations. Interestingly, we observed that the frequency of hot
spots increased progressively during the study, following the increase of the sample size in an
almost linear fashion. This may suggest that by analyzing a much higher number of sequences
all RV integrations could be clustered in a defined subset of genomic regions, all having the
appropriate features recognized by the PICs. Unfortunately, the molecular bases of the
interactions between RV PICs and the mammalian chromatin are poorly understood, and it is
difficult to correlate our finding with any specific mechanism. The situation was completely
different in the case of LV hot spots, the frequency of which increased only slightly with the
increase in the sample size and appeared to plateau. More importantly, insertions in LV hot
spots showed strikingly different characteristics with respect to the general LV integration
preferences, and were greatly enriched in gene-dense regions and expressed genes. These data
suggest that LV integration may happen in a much wider portion of the HSC genome, and that
hot spots are generated at low frequency by locations that are more favorable than others to
PIC interaction, and are apparently those with a high density of expressed genes. This
explanation is consistent with the available evidence that LV PICs are tethered to the human
genome by widely distributed chromatin components loosely associated with gene activity,
such as chromatin-remodeling27 or DNA-repair28 complexes, HMG29 and Polycomb-group
proteins30, and LEDGF31,32.
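The hot-spot definition discussed above can be sketched as a simple window scan over sorted integration positions. The thresholds used here (2 hits within 30 kb, 3 within 50 kb, 4 within 100 kb) are an assumption of this sketch, modeled on the CIS-type criteria the text refers to; the ranges listed in Table 2 are compatible with them, but the exact values should be taken from the Methods. Function names and the example coordinates are hypothetical.

```python
from collections import defaultdict

# Assumed criteria: (number of hits, maximum span in bp). Parameters of this
# sketch, not values quoted from the surrounding text.
CRITERIA = ((2, 30_000), (3, 50_000), (4, 100_000))


def hot_spot_members(sites, criteria=CRITERIA):
    """Return the set of (chromosome, position) sites that belong to a hot spot.

    sites: iterable of (chromosome, position) pairs, one per independent hit.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in set(sites):  # keep independent integrations only
        by_chrom[chrom].append(pos)

    members = set()
    for chrom, positions in by_chrom.items():
        positions.sort()
        for k, max_span in criteria:
            # Any k consecutive sorted hits spanning <= max_span form a cluster.
            for i in range(len(positions) - k + 1):
                if positions[i + k - 1] - positions[i] <= max_span:
                    members.update((chrom, p) for p in positions[i:i + k])
    return members


def hot_spot_fraction(sites, criteria=CRITERIA):
    """Fraction of independent hits that fall inside a hot spot."""
    unique = set(sites)
    if not unique:
        return 0.0
    return len(hot_spot_members(unique, criteria)) / len(unique)


# Hypothetical coordinates: three clustered hits on one chromosome, two isolated hits.
example = [
    ("chr11", 33_850_000), ("chr11", 33_860_000), ("chr11", 33_895_000),
    ("chr6", 135_540_000), ("chr2", 1_000_000),
]
print(sorted(hot_spot_members(example)))
print(hot_spot_fraction(example))
```

Applying the same function to a library of randomly cloned control sequences gives the background rate of "false positive" hot spots against which the vector data sets can be compared.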
Previous studies carried out in patients4 as well as in animal models12,13,33 have
indicated that integrations in cancer-associated CIS and growth-controlling genes are enriched
in the progeny of RV-transduced, repopulating HSCs. The major conclusion of these studies
was that certain viral insertions lead to clonal selection of stem/progenitor cells in vivo.
However, the pre-transplantation frequency of these insertion events was never accurately
measured in the relevant cell population. Our analysis indicates that a bias towards integration
into or around certain categories of genes, i.e., those involved in signal transduction, cell cycle,
chromatin remodeling and transcription, is already present in non-transplanted hematopoietic
progenitors, and particularly in integration hot spots. In particular, proto-oncogenes and
cancer-associated CIS are enriched at three- to five-fold the expected frequency in RV hot
spots, indicating a specific preference for genomic locations containing these categories of
genes. These include proto-oncogenes expressed in CD34+ hematopoietic progenitors and
involved in hematopoietic cell neoplasia, such as LMO2 and EVI2-NF1, targeted at a
frequency of ~1:350, LYL1 and MYB targeted at a frequency of ~1:500, and others (see
Table 2). Importantly, there was no significant difference in the number of integrations
contributing to oncogene-containing hot spots between non-expanded (BM- and PB-derived)
or moderately-expanded (all CB-derived) cell populations, arguing against the likelihood of
clonal outgrowth generated in culture by insertional activation of growth-promoting genes.
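The approximate targeting frequencies quoted above can be checked with a short calculation, assuming (this is our reading, not stated explicitly in the text) that the denominator is the 1,030 RV integrations of Table 1 and that the numerators are the hit counts listed in Table 2. The helper name is hypothetical.

```python
RV_TOTAL_HITS = 1030  # total RV integrations (Table 1); assumed denominator

# Number of hits per hot spot as listed in Table 2
HITS_PER_LOCUS = {"LMO2": 3, "EVI2-NF1": 3, "LYL1": 2, "MYB": 2}


def one_in(n_hits, total_hits):
    """Express a targeting frequency as '1 in X' integrations."""
    return total_hits / n_hits


for locus, n_hits in HITS_PER_LOCUS.items():
    print(f"{locus}: ~1:{one_in(n_hits, RV_TOTAL_HITS):.0f}")
```

Under these assumptions, three hits correspond to roughly 1:343 and two hits to 1:515, in line with the rounded values of ~1:350 and ~1:500.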
A network-based pathway analysis indicates that a significant number of genes
targeted by RV integration are functionally linked in transcription-, signal transduction-,
apoptosis- and tumorigenesis-related networks. Interestingly, genes involved in hematopoietic
and immune system development are targeted at uniquely high frequency by RV integrations,
and further enriched in RV hot spots, suggesting that the gene expression program of a
cycling hematopoietic cell is at least in part instrumental in directing RV PICs to certain
regions of the genome. Consistent with this hypothesis, almost none of the genes present in
CD34+ cell hot spots are found in hot spots from HeLa cells, which most likely operate
different regulatory networks. Kustikova et al.33 reached similar conclusions in compiling
their “insertional dominance database” from the progeny of serially transplanted HSCs in
mice, although they explain the observed over-representation of certain gene categories and
functional networks with in vivo selection rather than with intrinsic properties of the RV
integration machinery. However, 18 to 34% of the genes present in the mouse database,
depending on the stringency of the comparison, are also present in our list, arguing against an
exclusive role for in vivo selection in determining most of the frequency biases. A notable
exception is the EVI1-MDS1 locus, which we found only once in non-transplanted cells while
it was found at exceedingly high frequencies in vivo in mice12,33, non-human primates13 and,
at least in one case, humans4. Insertional activation of the EVI1-MDS1 locus should therefore
be considered a factor favoring clonal amplification and/or selection in vivo independently
of the frequency with which it is targeted by RV integration before transplantation. It should
be noted, however, that our data come from a population of hematopoietic progenitors in
which the proportion of repopulating stem cells is admittedly low, leaving the possibility that
stem cell-specific hot spots went undetected. Unfortunately, an integration analysis in pre-
transplantation, long-term repopulating stem cells is currently impossible, and it is therefore
difficult to come to definitive conclusions as to what proportion of the biases detected in the
stem cell progeny in vivo is due to vector preferences and what proportion to in vivo
selection. We favor a predominant role of vector-specific factors, also based on our
experience with ADA-SCID patients in whom pre-transplantation and post-transplantation
integration preferences showed essentially overlapping patterns (A.A., B.C., A.R., F.M. et al.,
manuscript in preparation).
In conclusion, this study shows previously unrecognized features of RV and LV
integration into human HSCs that may have an impact on assessing the prospective genotoxic
risk of using either vector system for human gene therapy applications. In particular, the
frequency and characteristics of integration hot spots may be substantial factors in
determining a differential safety profile for RV and LV vectors of comparable design and
content.
References
1. Hacein-Bey-Abina S, Le Deist F, Carlier F, et al. Sustained correction of X-linked
severe combined immunodeficiency by ex vivo gene therapy. N Engl J Med. 2002;346:1185-1193.
2. Aiuti A, Slavin S, Aker M, et al. Correction of ADA-SCID by stem cell gene therapy
combined with nonmyeloablative conditioning. Science. 2002;296:2410-2413.
3. Gaspar HB, Parsley KL, Howe S, et al. Gene therapy of X-linked severe combined
immunodeficiency by use of a pseudotyped gammaretroviral vector. Lancet. 2004;364:2181-2187.
4. Ott MG, Schmidt M, Schwarzwaelder K, et al. Correction of X-linked chronic
granulomatous disease by gene therapy, augmented by insertional activation of MDS1-EVI1,
PRDM16 or SETBP1. Nat Med. 2006;12:401-409.
5. Baum C, Dullmann J, Li Z, et al. Side effects of retroviral gene transfer into
hematopoietic stem cells. Blood. 2003;101:2099-2114.
6. Baum C, Kustikova O, Modlich U, Li Z, Fehse B. Mutagenesis and oncogenesis by
chromosomal insertion of gene transfer vectors. Hum Gene Ther. 2006;17:253-263.
7. Nienhuis AW, Dunbar CE, Sorrentino BP. Genotoxicity of retroviral integration in
hematopoietic cells. Mol Ther. 2006;13:1031-1049.
8. Bushman F, Lewinski M, Ciuffi A, et al. Genome-wide analysis of retroviral DNA
integration. Nat Rev Microbiol. 2005;3:848-858.
9. Recchia A, Bonini C, Magnani Z, et al. Retroviral vector integration deregulates gene
expression but has no consequence on the biology and function of transplanted T cells. Proc
Natl Acad Sci U S A. 2006;103:1457-1462.
10. Fischer A, Cavazzana-Calvo M. Integration of Retroviruses: A Fine Balance between
Efficiency and Danger. PLoS Med. 2005;2:e10.
11. Wu X, Luke BT, Burgess SM. Redefining the common insertion site. Virology.
2006;344:292-295.
12. Kustikova O, Fehse B, Modlich U, et al. Clonal dominance of hematopoietic stem
cells triggered by retroviral gene marking. Science. 2005;308:1171-1174.
13. Calmels B, Ferguson C, Laukkanen MO, et al. Recurrent retroviral vector integration
at the Mds1/Evi1 locus in nonhuman primate hematopoietic cells. Blood. 2005;106:2530-2533.
14. Testa A, Lotti F, Cairns L, et al. Deletion of a negatively acting sequence in a chimeric
GATA-1 enhancer-long terminal repeat greatly increases retrovirally mediated erythroid
expression. J Biol Chem. 2004;279:10523-10531.
15. Follenzi A, Sabatino G, Lombardo A, Boccaccio C, Naldini L. Efficient gene delivery
and targeted expression to hepatocytes in vivo by improved lentiviral vectors. Hum Gene
Ther. 2002;13:243-260.
16. Zufferey R, Donello JE, Trono D, Hope TJ. Woodchuck hepatitis virus
posttranscriptional regulatory element enhances expression of transgenes delivered by
retroviral vectors. J Virol. 1999;73:2886-2892.
17. Dull T, Zufferey R, Kelly M, et al. A third-generation lentivirus vector with a
conditional packaging system. J Virol. 1998;72:8463-8471.
18. Schmidt M, Hoffmann G, Wissler M, et al. Detection and direct genomic sequencing
of multiple rare unknown flanking DNA in highly complex samples. Hum Gene Ther.
2001;12:743-749.
19. Schmidt M, Zickler P, Hoffmann G, et al. Polyclonal long-term repopulating stem cell
clones in a primate model. Blood. 2002;100:2737-2743.
20. Suzuki T, Shen H, Akagi K, et al. New genes involved in cancer identified by
retroviral tagging. Nat Genet. 2002;32:166-174.
21. Hosack DA, Dennis G, Jr., Sherman BT, Lane HC, Lempicki RA. Identifying
biological themes within lists of genes with EASE. Genome Biol. 2003;4:R70.
22. Dennis G, Jr., Sherman BT, Hosack DA, et al. DAVID: Database for Annotation,
Visualization, and Integrated Discovery. Genome Biol. 2003;4:P3.
23. Hematti P, Hong BK, Ferguson C, et al. Distinct genomic integration of MLV and SIV
vectors in primate hematopoietic stem and progenitor cells. PLoS Biol. 2004;2:e423.
24. Wu X, Li Y, Crise B, Burgess SM. Transcription start regions in the human genome
are favored targets for MLV integration. Science. 2003;300:1749-1751.
25. Schroder AR, Shinn P, Chen H, Berry C, Ecker JR, Bushman F. HIV-1 integration in
the human genome favors active genes and local hotspots. Cell. 2002;110:521-529.
26. Ashburner M, Ball CA, Blake JA, et al. Gene ontology: tool for the unification of
biology. The Gene Ontology Consortium. Nat Genet. 2000;25:25-29.
27. Kalpana GV, Marmon S, Wang W, Crabtree GR, Goff SP. Binding and stimulation of
HIV-1 integrase by a human homolog of yeast transcription factor SNF5. Science.
1994;266:2002-2006.
28. Mulder LC, Chakrabarti LA, Muesing MA. Interaction of HIV-1 integrase with DNA
repair protein hRad18. J Biol Chem. 2002;277:27489-27493.
29. Farnet CM, Bushman FD. HIV-1 cDNA integration: requirement of HMG I(Y) protein
for function of preintegration complexes in vitro. Cell. 1997;88:483-492.
30. Violot S, Hong SS, Rakotobe D, et al. The human polycomb group EED protein
interacts with the integrase of human immunodeficiency virus type 1. J Virol. 2003;77:12507-12522.
31. Llano M, Vanegas M, Fregoso O, et al. LEDGF/p75 determines cellular trafficking of
diverse lentiviral but not murine oncoretroviral integrase proteins and is a component of
functional lentiviral preintegration complexes. J Virol. 2004;78:9524-9537.
32. Ciuffi A, Llano M, Poeschla E, et al. A role for LEDGF/p75 in targeting HIV DNA
integration. Nat Med. 2005;11:1287-1289.
33. Kustikova OS, Geiger H, Li Z, et al. Retroviral vector insertion sites associated with
dominant hematopoietic clones mark "stemness" pathways. Blood. 2007;109:1897-1907.
Table 1. Retroviral integration site distribution in human CD34+ HSCs

                      Intergenic  Intragenic  Perigenic  Total  ±10 kb from  Total vector/gene
                      (%)         (%)         (%)        hits   TSS (%)      interactions*
CD34+ cells
  RV vectors          16.7        55.0        28.3       1,030  29.3         1,517
  LV vectors          17.4        71.7        10.9       849    16.1         1,241
  Control sequences   46.2        38.6        15.2       798    9.1          902
  RV hot spots        16.0        56.6        27.4       219    22.2         302
  LV hot spots        8.6         81.4        10.0       70     13.2         114
  Control hot spots   36.4        59.1        4.5        22     13.0         23
HeLa cells
  RV vectors          18.8        48.1        25.5       869    26.1         1,219
  RV hot spots        16.5        53.2        30.3       109    27.3         165

Distribution of RV and LV integration sites unambiguously mapped in unselected CB- and BM-derived CD34+ HSCs, and RV integrations in HeLa cells from a previously published collection24. Integrations (total hits) were classified as inside (intragenic), outside (intergenic), or at a distance of <30 kb upstream or downstream from (perigenic) Known Genes (UCSC annotation). Insertions at a distance of ±10 kb from transcription start sites (TSS) are indicated as percentage of the total vector/gene interactions. Control sequences were obtained from a randomly cloned library of PstI/MseI-restricted, LM-PCR-amplified human CD34+ cell DNA. *Total number of genes within 30 kb from individual hits + intergenic hits.
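The hot-spot frequencies and fold differences discussed in the text can be recomputed directly from the "Total hits" column of Table 1. The following is a minimal sketch; the variable names are ours, and the counts are those listed in the table for CD34+ cells.

```python
# "Total hits" from Table 1 (CD34+ cells)
total_hits = {"RV": 1030, "LV": 849, "Control": 798}
hot_spot_hits = {"RV": 219, "LV": 70, "Control": 22}

# Fraction of integrations that fall in a hot spot, per data set
fraction = {name: hot_spot_hits[name] / total_hits[name] for name in total_hits}

# Fold differences of the RV hot-spot frequency
rv_vs_control = fraction["RV"] / fraction["Control"]
rv_vs_lv = fraction["RV"] / fraction["LV"]

for name, value in fraction.items():
    print(f"{name}: {100 * value:.1f}% of hits in hot spots")
print(f"RV vs control: {rv_vs_control:.1f}-fold; RV vs LV: {rv_vs_lv:.1f}-fold")
```

With these counts, slightly more than one fifth of the RV hits fall in hot spots, more than 7-fold the control frequency and between 2.5- and 3-fold the LV frequency.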
Table 2. RV and LV hot spots containing at least one proto-oncogene and/or cancer-associated CIS.

Chromosome  Range (bp)  N° hits  Gene symbol                                        Origin*
RV hot spots
14q24.3     13882       4        C14orf43, PNMA1                                    CB-RV (2); CB-SIN-RV (2)
11p13       48661       3        AF116668, LMO2                                     BM-ADA (1); CB-RV (1); CB-SIN-RV (1)
17q11.2     7827        3        EVI2A, EVI2B, NF1, OMG                             BM-X-SCID (1); CB-RV (2)
10q25.2     1920        2        ADD3                                               BM-ADA (1); CB-SIN-RV (1)
11q23.2     22851       2        ZBTB16                                             BM-ADA (1); CB-RV (1)
11q24.3     14147       2        FLI1                                               BM-ADA (1); CB-RV (1)
12p13.2     7360        2        ETV6                                               CB-RV (2)
16p13.11    18559       2        ABCC1                                              BM-ADA (1)
19p13.13    137         2        BTBD14B, LYL1, NFIX, TRMT1                         BM-X-SCID (1); CB-RV (1)
20p12.3     136         2        PLCB1                                              CB-SIN-RV (2)
20q13.12    19100       2        C20orf121, PKIG, SERINC3                           CB-RV (2)
22q13.1     29588       2        AB051446, MKL1, RUTBC3                             BM-X-SCID (1); CB-SIN-RV (1)
2p11.2      779         2        CAPG, LOC284948, RBED1                             PB-ND (2)
2p21        975         2        AK025445, MGC40574, THADA, ZFP36L2                 CB-RV (2)
4p14        11999       2        N4BP2, RHOH                                        CB-RV (2)
6q23.3      9422        2        MYB                                                CB-RV (2)
6p24.3      1991        2        RREB1                                              CB-RV (1); CB-SIN-RV (1)
LV hot spots
9q34.3      31043       3        AK130247, C9orf163, INPP5E, NOTCH1, PMPCA, DCCAG3  CB-SIN-LV (2); CB-LV (1)
2p21        22106       2        THADA                                              CB-SIN-LV (1); CB-LV (1)
20p12.3     24132       2        PLCB1                                              CB-LV (2)
17p13.3     25818       2        RUTBC1, SMG6, SRR, TSR1                            CB-LV (2)
Controls
6q25.1      4561        2        ESR1

Range indicates the maximum distance between hits contained in each hot spot. Proto-oncogenes or CIS are shown in bold. For the complete list of hot spots see Supplementary Table 5. * CB-RV, CB-derived CD34+ cells transduced with wt-LTR RV; CB-SIN-RV, CB-derived CD34+ cells transduced with SIN-LTR RV; BM-ADA, BM-derived CD34+ cells from ADA-SCID patients transduced with wt-LTR RV; BM-X-SCID, BM-derived CD34+ cells from X-SCID patients transduced with wt-LTR RV; PB-ND, PB-derived CD34+ cells from normal donor transduced with wt-LTR RV; CB-LV, CB-derived CD34+ cells transduced with wt-LTR LV; CB-SIN-LV, CB-derived CD34+ cells transduced with SIN-LTR LV (the number in parentheses indicates the number of hits for each category).
Figure 1. Retroviral integrations and transcription start sites. Distribution of gamma-retroviral (RV) (A) and lentiviral (LV) (B) integration sites in human CD34+ cells within an interval of ±30 kb upstream or downstream from the transcription start site (TSS) of known genes (UCSC definition, considering only one isoform/gene). The bars show the % distribution in each 5-kb interval of retroviral insertions (white bars), insertion hot spots (black bars), and control sequences (grey bars). The line shows the distribution of 65,000 computer-generated random insertion sites23. n values indicate vector-gene interactions, i.e., the total number of genes within 30 kb from individual insertions plus the intergenic insertions.
[Figure 1 plot. Panels: A (RV), B (LV). x-axis: distance from TSS (kb), -30 to +30 in 5-kb intervals; y-axis: vector-gene interactions (%), 0 to 12. Series: Control (n = 902); RV all (n = 1,517); RV hot spots (n = 302); LV all (n = 1,241); LV hot spots (n = 114); Random (n = 65,000).]
Figure 2. Retroviral integration and gene density. Integration sites (white bars) and integration hot spots (black bars) of RV (A) and LV (B) vectors in CD34+ cells are plotted according to the number of Known Genes (UCSC definition, considering only one isoform/gene) contained in a range of 1 Mb around each insertion site, in intervals of 5 genes/Mb. The distribution of control sequences is indicated by light gray bars. The dark gray bars represent the frequency of 1-Mb segments in the human genome for each gene density interval. n values indicate the number of independent hits in each group.
[Figure 2 plot. Panels: A (RV), B (LV). x-axis: Known Genes / Mb (0-5, 6-10, 11-15, 16-20, 21-25, 26-30, >30); y-axis: integration sites (%), 0 to 70. Series: Control (n = 798); RV all (n = 1,030); RV hot spots (n = 219); LV all (n = 849); LV hot spots (n = 70); Genome.]
Figure 3. Correlation between retroviral integration and gene activity in CD34+ cells. The bars show the % distribution of expression values from Affymetrix HG-U133A microarrays of cytokine-stimulated CD34+ cells. To correlate retroviral integration and gene activity, expression values from the CD34+ cell microarrays were divided into four classes, i.e., absent (black), low (below the 25th percentile in a normalized distribution, blue), intermediate (between the 25th and the 75th percentile, yellow) and high (above the 75th percentile, red). (A) The first two bars (all genes) show the distribution of the >16,000 genes on the microarray of cord blood (CB)- or bone marrow (BM)-derived CD34+ cells activated under the same conditions used for transduction with RV vectors; the other two bars represent the expression values of genes targeted by all RV integrations (RV all) or by integration hot spots (RV hot spots), derived from a weighted mean of the CB and BM microarray values. (B) The first bar (all genes) shows the distribution of the >16,000 genes on the microarray of CB-derived CD34+ cells activated under the same conditions used for transduction with LV vectors; the other two bars represent the expression values of genes targeted by all LV integrations (LV all) or by integration hot spots (LV hot spots). The n values indicate the number of probesets analyzed for each group of genes.
[Figure 3 plot. Stacked bars. Panel A: CB all genes, BM all genes, RV all, RV hot spots. Panel B: CB all genes, LV all, LV hot spots. y-axis: probesets (%), 0 to 100. Legend: High, Intermediate, Low, Absent. n labels in the plot: 22,283 (all genes); 1,571; 195; 959; 65.]
[Figure 4 plot. Panel A: GO Biological Process categories (Cell cycle, Apoptosis, Signal transduction, Transcription, Phosphorylation, Chromatin architecture) and GO Molecular Function categories (Protein ser/thr kinase activity, GTPase regulator); x-axis: eligible genes (%); series: Genome, RV all, LV all, with legend n values of 14,959, 762 and 633 for one classification and 17,847, 866 and 712 for the other. Panel B: Diseases and Physiological System Development (Cancer, Hematopoietic system, Immune response, Immune/lymphatic system) and Cellular and Molecular Function (Cell growth and proliferation, Cell signaling, Cell death); x-axis: eligible genes (%); series: RV hot spots (n = 76), Control (n = 268), RV all (n = 637), LV all (n = 514), LV hot spots (n = 31).]
Figure 4. Genes regulating cell growth and proliferation are preferential targets of retroviral integration. (A) Gene Ontology (GO) analysis of integration target genes in CD34+ cells. Genes identified as targets for RV (black bars) and LV (white bars) integration were analyzed for significant functional clusters with the DAVID 2.1 software. Functional categories are derived from the GO-Biological Process (establishment and/or maintenance of chromatin architecture, phosphorylation, transcription, signal transduction, apoptosis, cell cycle) and the GO-Molecular Function (GTPase regulator activity, protein serine/threonine kinase activity) classifications. Bars indicate the number of integration target genes annotated within the given category out of n genes eligible for each analysis. Asterisks denote the significance level of over-representation of any given category with respect to the human genome (grey bars), used as background population (***, EASE score <0.0005, **, EASE score <0.005, *, EASE score <0.05). The number of gene identifiers annotated within each functional category is indicated in the bars. (B) Functional clustering analysis comparing integration target and control gene lists. Function/disease categories were those significantly over-represented in at least one integration target gene list (0.005<p<0.05) using the Ingenuity Pathways Knowledge Base as background population and the Ingenuity analysis software. Bars represent the percentage of integration target genes belonging to each category among n genes eligible for the analysis. Asterisks denote the probability that differences observed between integration data sets (RV, LV, RV hot spots and LV hot spots) and the control data set are due to chance alone (two-sample test for equality of proportions with continuity correction; ***, p<0.0005, **, p<0.005, *, p<0.05). The number of genes annotated within each category is indicated in the bars.
Figure 5. CIS and proto-oncogenes are over-represented in RV integrations and integration hot spots. Comparative analysis of the frequency of genes annotated in the CIS and proto-oncogene databases (see methods for definitions and data source) between integration target and control gene lists. Bars represent the % of RV and LV integrations, RV and LV integration hot spots, and control sequences targeting at least one proto-oncogene or CIS. The n values indicate the number of independent hits in each group. Asterisks denote the level of enrichment with respect to control data set (two-sample test for equality of proportions with continuity correction; ***, p<0.0005, *, p<0.05).
[Figure 5 plot. Categories: Proto-oncogenes, CIS. x-axis: integration sites (%), 0 to 12. Series: Control (798); RV all (1,030); LV all (849); RV hot spots (219); LV hot spots (70).]
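The figure legends refer to a two-sample test for equality of proportions with continuity correction. A minimal sketch of such a test is given below, assuming the standard chi-square formulation with Yates' correction on a 2x2 table (one degree of freedom); the function name is ours. The usage example compares the hot-spot counts of Table 1 and is not meant to reproduce the p-values shown in the figures.

```python
from math import erfc, sqrt


def two_proportion_test(x1, n1, x2, n2):
    """Chi-square test (1 df) for equality of two proportions with Yates'
    continuity correction. Returns (chi_square, p_value).

    x1, x2: number of "successes"; n1, n2: sample sizes.
    """
    a, b = x1, n1 - x1
    c, d = x2, n2 - x2
    n = n1 + n2
    col_yes, col_no = a + c, b + d
    if col_yes == 0 or col_no == 0:
        return 0.0, 1.0  # proportions are identical (all 0% or all 100%)
    # Yates' correction, never allowed to push the statistic below zero
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    chi_square = n * corrected ** 2 / (n1 * n2 * col_yes * col_no)
    # Upper tail of a chi-square distribution with one degree of freedom
    p_value = erfc(sqrt(chi_square / 2))
    return chi_square, p_value


# Example: hits in hot spots among RV integrations vs. control sequences (Table 1)
chi_square, p_value = two_proportion_test(219, 1030, 22, 798)
print(f"chi-square = {chi_square:.1f}, p = {p_value:.2e}")
```

For a single degree of freedom the chi-square tail probability reduces to a complementary error function, so no statistics library is needed for this sketch.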
Figure 6. Genes hit by retroviral integration are functionally linked in gene networks. Representative networks originated by Ingenuity analysis of RV target genes (see Supplementary Table 4 for a complete list). Both networks are made of 35 target genes, with an Ingenuity score ≥42. The color code indicates the most significant biological functions associated to each network (p<0.0005). (A) RV network 1; (B) RV network 4 (networks are identified in Supplementary Table 4).
[Figure 6 network diagrams. Function labels shown in the figure: Apoptosis (21 genes, p = 1.8 × 10⁻⁸); Transcription (25 genes, p = 2.0 × 10⁻¹⁴); Cell proliferation (17 genes, p = 1.5 × 10⁻⁵); Tumorigenesis (12 genes, p = 1.3 × 10⁻⁴).
Panel A genes: PMAIP1, TPT1, PARP1, CHGB, BCL2L1, MADD, CFLAR, RASGRP3, ITPR1*, TUBB, CRADD*, RTN4IP1, SLC1A6, ATXN1, AHCYL1, CASP10, TUBA8, PRKCQ, DYNLL1, RTN4, TUBB1, TNFRSF7, FAS, NOSIP, ALOX5AP, SNW1, ALK, MKL1*, MAD1L1, DDB2, BRE*, CASP8, BCL2*, MYB*, MLX*.
Panel B genes: CXXC5, KLF13, MAP3K14, RFX2, ATP2B4, ENG, KLF6, TNIP1, SND1, CDC27*, RUNX1, TCF7L2, JUND, EVI1, RPL30, RXRB, SPP1, FOXP1*, SMAD3, TOB1, PCAF, TGFBR1, SURB7, PPP3CA, FRAT1, TADA2L, RREB1*, HHEX, CTBP1, RFX1, E2F5, FOLR2, CCL18, RARG, NR1H3.]
Figure 7. Schematic representation of the maximum distance between individual hits within RV and LV hot spots. Symbols represent single hot spots originated from 2 (black), 3 (grey) or 4 (white) hits in the genome of CD34+ HSCs (1,030 RV and 849 LV integrations) and HeLa cells (869 RV integrations), plotted according to the maximum distance between individual integrations (in base pairs, log scale). Also shown are “false positive” hot spots generated by applying the definition criteria to a library of LM-PCR-amplified random sequences of human CD34+ DNA (798 sequences). 26.0% of the 97 RV hot spots in CD34+ cells and almost one half (40.4%) of the 52 RV hot spots in HeLa cells contained two independent integrations in less than 2 kb, compared to only one of the 33 LV hot spots.
[Figure 7 plot. x-axis: maximum distance between hits (bp, log scale, 100 to 100,000), with a marker at 2 kb. Rows: HeLa RV, CD34 RV, CD34 LV, Controls. Symbols: 2 hits, 3 hits, 4 hits. Percentages of hot spots below 2 kb, in row order: 40%, 26%, 3%, 9%.]