Revealing Domain Structure through Linker-Scanning Analysis of the Murine Leukemia Virus (MuLV)...

J. Virol. 2006, 80(19):9497. DOI: 10.1128/JVI.00856-06.

Revealing Domain Structure through Linker-Scanning Analysis of the Murine Leukemia Virus (MuLV) RNase H and MuLV and Human Immunodeficiency Virus Type 1 Integrase Proteins

Jennifer Puglia, Tan Wang, Christine Smith-Snyder, Marie Cote, Michael Scher, Joelle N. Pelletier, Sinu John, Colleen B. Jonsson and Monica J. Roth

Updated information and services can be found at: http://jvi.asm.org/content/80/19/9497

REFERENCES: This article cites 95 articles, 66 of which can be accessed free at: http://jvi.asm.org/content/80/19/9497#ref-list-1

Downloaded from http://jvi.asm.org/ on May 1, 2014 by guest


JOURNAL OF VIROLOGY, Oct. 2006, p. 9497–9510 Vol. 80, No. 19
0022-538X/06/$08.00+0 doi:10.1128/JVI.00856-06
Copyright © 2006, American Society for Microbiology. All Rights Reserved.

Revealing Domain Structure through Linker-Scanning Analysis of the Murine Leukemia Virus (MuLV) RNase H and MuLV and Human Immunodeficiency Virus Type 1 Integrase Proteins

Jennifer Puglia,¹† Tan Wang,²† Christine Smith-Snyder,¹ Marie Cote,¹ Michael Scher,¹ Joelle N. Pelletier,⁴ Sinu John,³ Colleen B. Jonsson,² and Monica J. Roth¹*

Department of Biochemistry, Robert Wood Johnson Medical School, University of Medicine and Dentistry of New Jersey, 675 Hoes Lane, Piscataway, New Jersey 08854¹; Department of Biochemistry and Molecular Biology, Southern Research Institute, 2000 9th Ave. S., Birmingham, Alabama 35205²; Graduate Program in Biochemistry and Molecular Genetics, University of Alabama at Birmingham, Birmingham, Alabama 35294³; and Departement de Chimie, Faculte des Arts et Sciences, et Departement de Biochimie, Faculte de Medecine, Universite de Montreal, C.P. 6128, Succursale Centre-Ville, Montreal, Quebec H3C 3J7, Canada⁴

Received 26 April 2006/Accepted 7 July 2006

Linker-scanning libraries were generated within the 3′ terminus of the Moloney murine leukemia virus (M-MuLV) pol gene encoding the connection-RNase H domains of reverse transcriptase (RT) as well as the structurally related M-MuLV and human immunodeficiency virus type 1 (HIV-1) integrase (IN) proteins. Mutations within the M-MuLV proviral vectors were Tn7 based and resulted in 15-bp insertions. Mutations within an HIV-1 IN bacterial expression vector were based on Tn5 and resulted in 57-bp insertions. The effects of the insertions were examined in vivo (M-MuLV) and in vitro (HIV-1). A total of 178 individual M-MuLV constructs were analyzed: 40 in-frame insertions within RT connection-RNase H, 108 in-frame insertions within IN, 13 insertions encoding stop codons within RNase H, and 17 insertions encoding stop codons within IN. For HIV-1 IN, 56 mutants were analyzed. In both M-MuLV and HIV-1 IN, regions were identified that functionally tolerate multiple linker insertions. For MuLV, these correspond to the RT-IN proteolytic junction, the junction between the IN core and C terminus, and the C terminus of IN. For HIV-1 IN, in addition to the junction between the IN core and C terminus and the C terminus of IN, insertions between the N-terminal and core domains maintained integration and disintegration activity. Of the 40 in-frame insertions within the M-MuLV RT connection-RNase H domains, only the three C-terminal insertions mapping to the RT-IN proteolytic junction were viable. These results correlate with deletion studies mapping the domain and subdomain boundaries of RT and IN. Importantly, these genetic footprints provide a means to identify nonessential regions within RT and IN for targeted gene therapy applications.

Methods have been developed for the comprehensive analysis of a gene by construction of a saturating or near-saturating library of mutants (5, 78, 83). This approach has defined domain boundaries, provided functional maps, and given insights into previously predicted unstructured loops (4, 5, 50, 71, 73, 78, 83). In this report, this method of insertional functional mapping is applied to three catalytically related domains: the Moloney murine leukemia virus (M-MuLV) RNase H domain of the reverse transcriptase (RT), and the M-MuLV and human immunodeficiency virus type 1 (HIV-1) integrase (IN) proteins. Inclusion of the HIV-1 IN protein assisted comparison and model building, since structural information is available (7, 18, 26, 34, 35, 37, 87). In the retroviral life cycle, the RNase H activity is required for viral replication during the conversion of the viral RNA (vRNA) into double-stranded DNA through the RNA-DNA intermediate. The IN protein is required for the insertion of the double-stranded DNA into the host chromosome, establishing the integrated provirus.

The replication and integration of retroviral particles are two distinct yet interrelated processes. Replicative complexes and preintegrative complexes have been purified and characterized from infected cells (6, 9, 17, 28–30, 39, 52, 53, 55, 56, 66, 67). Both within and between viral species, the composition of replicative complexes differs from that of preintegrative complexes. Interactions between RT and IN have also been reported (40, 69, 89, 90, 96), and multiple mutations of IN are known to alter viral replication (27, 58–60). Despite extensive efforts, the assembly of these complexes is not well understood. These efforts have been aided by structural studies. A structure of the M-MuLV RT has recently been reported (21), as have structures of related retroviral IN subdomains (8, 14, 18, 19, 26, 35, 43, 87, 94). However, to date, neither a structure of a complete retroviral IN protein nor one of a subdomain in complex with DNA has been obtained.

The ability of retroviral particles to stably integrate into the host genome is a great benefit for gene delivery, but the potential for insertional mutagenesis cannot be overlooked (15, 22, 38, 63). Schemes to target integration into alternative positions within the host chromosome to avoid this issue frequently involve generation of fusion proteins with novel targeting domains (10, 48, 84). The linker insertion genetic footprint provides a means to identify nonessential regions within proteins that can withstand insertions. Extending these studies to include the RNase H domain provides a parallel analysis of a protein containing a related catalytic core consisting of an acidic catalytic triad.

* Corresponding author. Mailing address: Department of Biochemistry, Robert Wood Johnson Medical School, University of Medicine and Dentistry of New Jersey, 675 Hoes Lane, Piscataway, NJ 08854. Phone: (732) 235-5048. Fax: (732) 235-4783.

† These authors contributed equally to the manuscript.

In this report, the 3′ terminus of the M-MuLV pol gene and the HIV-1 IN gene were subjected to random insertion mutagenesis. Individual constructs, selected from the library, were assayed for effects on virus viability in vivo or on IN functions in vitro. Using this complementary approach, four regions functionally tolerant of insertions were identified within RNase H-IN. These regions correlate with domain and protein junctions. No viable linker insertions were identified within any nonstructured regions of connection-RNase H.

MATERIALS AND METHODS

Generation of plasmids. Construction and analysis of pNCA-C, a viable, replication-competent M-MuLV proviral construct, have been previously described (31). The pNCA-C-XN-SU8 M-MuLV proviral construct was derived from pNCA-C (74, 80). This contains a NotI linker within the XbaI site at the 3′ terminus of the M-MuLV pol gene, yielding a 23-amino-acid C-terminal truncation of the MuLV IN protein, plus a suppressor tRNA in the 3′ long terminal repeat (LTR) (SU8) (57).

The 3′-terminal two-thirds of the pol gene was subcloned into a minimal plasmid backbone for mutagenesis. The amp gene and the origin of replication of pGEM-3Zf(+) (Promega) were PCR amplified using primer Hpatag/45314 (5′-GCCGTTAACACATGTGAGCAAAAGGCC-3′) and primer Bamtag/45315 (5′-CGGGATCCTTGAAAAAGGAAGAGTATG-3′) using a mixture of 5 U Taq DNA pol (Invitrogen) and 2.5 U cloned Pfu (Stratagene). The 1.7-kb PCR product was digested with BamHI and HpaI and ligated to the 2.3-kb BamHI/HpaI fragment from pNCA-C-XN-SU8. The resulting plasmid, pGEM-BH-XN-BH, was used for mutagenesis.

To facilitate the reconstruction of the insertional library into pNCA-C-XN-SU8, a deletion within MuLV IN was generated. pNCA-C-XN-SU8 was partially digested with XmnI, and the linear product was isolated and digested with PmlI. The 10-kb DNA fragment was isolated and ligated to yield pNCA-C-XN-SU8-ΔIN. This deletion within IN maintains the SalI and NotI sites required for reconstruction of the library. pNCA-C-XN-SU8-ΔIN and pGEM-BH-XN-BH plasmids were generated in HB101 Escherichia coli cells.

Mutagenesis. M-MuLV mutagenesis was performed on pGEM-BH-XN-BH (80 ng) using the GPS-LS linker scanning system kit (NEB). The method is based on random Tn7 transposition (5) introducing the chloramphenicol resistance gene (Cmʳ). DNA was introduced into ElectroMAX DH10B cells (Gibco BRL) by electroporation. Chloramphenicol-resistant colonies (10⁵) were selected on one 245- by 245-mm plate. Colonies were scraped off the plate and pooled; the mutagenized pGEM-BH-BH-chlor plasmids were isolated (Midi protocol; QIAGEN) and maintained as a library. This initial library was digested with PmeI to remove the chloramphenicol resistance gene, ligated, and electroporated into ElectroMAX DH10B cells (Gibco BRL). Ampicillin-resistant colonies were pooled and lysed to isolate the pGEM-BH-XN-BH-15 constructs, which contained the 15-bp linker insertion encoding a PmeI site.

The EZ:TN in-frame linker insertion kit (Epicenter Biotechnologies) was used to generate a library of mutants of HIV-1 IN with 19-amino-acid insertions within the target plasmid pINSD.His (NIH AIDS Research and Reference Reagent Program) by following the manufacturer's protocols. Mutants were screened using a PCR-based strategy. The pinsdBscreen primer (5′-CGG GCT TTG TTA GCA GCC GG-3′) and pinsdFscreen primer (5′-GGT GCC GCG CGG CAG CC-3′), which anneal to nucleotide positions 301 to 320 and 335 to 351 of the pET15b plasmid, respectively, were used to amplify the HIV-1 IN sequence. The PCR mixtures contained 1× PCR buffer from the Expand Long Template PCR system (Boehringer Mannheim), 2.25 mM MgSO4, 0.2 mM deoxynucleoside triphosphates, 2 μM primers, 2 U Taq polymerase, and a toothpick trace of the glycerol stocks stored at −80°C. PCR conditions were as follows: 94°C for 4 min, followed by 35 cycles of denaturing at 94°C for 30 s, annealing at 69°C for 30 s, and extension at 72°C for 2 min 15 s. The cycles were followed by a final extension at 72°C for 4 min and then a hold at 4°C. After PCR, the samples were loaded onto a 1.5% agarose gel and examined to determine which clones were positive for linker insertion within the IN gene. Clones with insertions were individually digested with NotI according to the manufacturer's recommendations.

Reconstruction of library into MuLV provirus. The 15-bp insertion library was reconstructed back into the pNCA-C-XN-SU8 provirus backbone. The 2,060-bp SalI-NotI fragment from the pGEM-BH-XN-BH-15 library was exchanged into pNCA-C-XN-SU8-ΔIN, which was digested with the same enzymes. Library DNA was introduced into chemically competent UltraMAX DH5α-FT (Tetʳ) (Gibco BRL) and maintained on one 245- by 245-mm plate. Since the Tn7 transposition was performed on the 2,281-bp BamHI-HpaI region of MuLV within pGEM-BH-XN-BH, the possibility remained that insertions occurred outside the SalI-NotI fragment used in the provirus reconstruction. To eliminate constructs in which the wild-type coding sequence was transferred, the library DNA was digested with PmeI, and the linear DNA was isolated and ligated to generate the final mutant library. This selected for pNCA-C-XN-SU8 plasmids bearing a PmeI linker insertion.

Single isolate mapping with MuLV. The positions of the PmeI sites within MuLV (pNCA-C-XN-SU8) were determined by size analysis of the SalI/PmeI fragment released from individual mutant plasmid library isolates. Automated sequencing was performed in the DNA core facility of Robert Wood Johnson Medical School (UMDNJ) using an appropriate primer chosen after restriction mapping. Alternatively, individual colonies were directly sequenced with primers spanning the MuLV RT/IN coding region to identify PmeI-containing sequences. Approximately 750 individual colonies were isolated and screened for insertions. DNA sequencing of HIV-1 clones was performed with the ABI PRISM BigDye Primer v3.0 cycle sequencing ready reaction kit with AmpliTaq DNA polymerase, FS (Applied Biosystems, Foster City, CA) to determine the site of the 19-codon insertion. Sequence data were analyzed with VectorNTI from InforMax Inc. (Frederick, MD).
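Mapping an insertion by the size of its SalI/PmeI fragment, and the PmeI digestion used to purge wild-type provirus from the library, both reduce to locating cut sites in a sequence. The sketch below is illustrative only: it assumes a linear sequence (a circular plasmid actually yields two fragments), uses the standard recognition sequences (SalI G^TCGAC, PmeI GTTT^AAAC), and the helper names and test sequences are hypothetical.

```python
# Recognition sequence and top-strand cut offset for each enzyme.
SALI = ("GTCGAC", 1)    # G^TCGAC
PMEI = ("GTTTAAAC", 4)  # GTTT^AAAC (blunt)


def cut_positions(seq, site):
    """Return the top-strand cut coordinate of every occurrence of a site."""
    motif, offset = site
    cuts = []
    i = seq.find(motif)
    while i != -1:
        cuts.append(i + offset)
        i = seq.find(motif, i + 1)
    return cuts


def sali_pmei_fragment(seq):
    """Length of the SalI-PmeI fragment of a linear sequence, or None unless
    each enzyme cuts exactly once (e.g. wild-type DNA with no PmeI linker)."""
    sal, pme = cut_positions(seq, SALI), cut_positions(seq, PMEI)
    if len(sal) != 1 or len(pme) != 1:
        return None
    return abs(pme[0] - sal[0])


# Hypothetical test sequences: same backbone with and without the PmeI linker.
wild_type = "AAGTCGACAA" + "C" * 100 + "GGATCCGG"
mutant = "AAGTCGACAA" + "C" * 100 + "GTTTAAACGG"
print(sali_pmei_fragment(wild_type))  # None
print(sali_pmei_fragment(mutant))     # 111
```

Returning None for a sequence without a PmeI site mirrors why linearization with PmeI removes wild-type constructs from the library.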

Cell culture. The generation and maintenance of canine D17 cells expressing MCAT, the receptor for ecotropic M-MuLV (pJET) (1), have been previously described (68). Individual PmeI-encoding MuLV proviral constructs (100 ng each) from the final library were transiently introduced into 2 × 10⁴ D17/pJET cells (15-mm wells) in the presence of 150 μg/ml DEAE-dextran (64). Upon confluence, supernatants were collected and cells were passaged into six-well (60-mm) plates for maintenance. Supernatants were collected on all subsequent days of confluence and assayed for RT activity (33). Viral DNA was isolated from RT-positive cultures using the method of Hirt (41).

PCR of MuLV viral DNA. Unintegrated MuLV viral DNA (41) was isolated from D17/pJET cells and used as a template for PCR in the presence of 100 pmol of primers JR6325L (5′-CAGTACTGACCCCTCTGAGCATC-3′) and JR4085R (5′-ATCAAGCAAGCTCTTCTAACTGCC-3′) using a mixture of Taq DNA pol (5 U; Invitrogen) and cloned Pfu (2.5 U; Stratagene). The amplified 2.2-kb product (bp 4085 to bp 6325 in the pNCA-C-XN-SU8 parental vector) was isolated from a 1% agarose gel using the QIAquick gel extraction kit protocol (QIAGEN) and subjected to automated DNA sequencing.

Expression of the M-MuLV IN C terminus. A directional deletion analysis was performed to select a stable MuLV IN C-terminus expression construct. The His6-thrombin-WTIN construct within a pET vector was digested with SphI and subjected to Bal31 digestion for five time points between 5 and 30 min. The DNA was digested with PstI, and the deletion fragments between 1 and 1.8 kbp were gel isolated. Plasmid pIN1-105 contains the His6-thrombin leader followed by IN 1-105, expressed downstream of an NdeI site (C�303) (91). pIN1-105 was digested with NdeI and blunt ended by filling in with Klenow polymerase. After PstI digestion, the 4,568-bp fragment was isolated and ligated with the Bal31 deletion fragments. Individual colonies (93 in total) were analyzed for the size of the deletion, and 20 were further selected for protein expression in E. coli BL21(DE3). Isolate 77 was subjected to DNA sequence analysis.

Expression and purification of HIV IN and mutants. Wild-type HIV-1 IN and insertion mutants were expressed in E. coli BL21(DE3) cells in 50 ml of medium and purified as hexahistidine-tagged fusion proteins as described previously (88). Purification from 50-ml cultures yielded approximately 2 mg of 90 to 95% homogeneous protein. The protein fraction refolded at a concentration of 5 mg/ml exhibited the greatest enzymatic activity. HIV-1 IN precipitated upon addition of buffer C (0.2 M NaCl). The precipitated protein was resuspended in buffer D (0.5 M NaCl) to a final concentration of 1 mg/ml.

In vitro integration and disintegration assays of HIV-1 IN. Strand transfer and disintegration reactions were performed as described previously (88). Reaction products were separated on a 20% polyacrylamide denaturing gel and subjected to autoradiography or exposed to PhosphorImager screens (Molecular Dynamics). Products were quantified with ImageQuant software (Molecular Dynamics). Oligonucleotides were purified on 20% denaturing polyacrylamide gels, ³²P labeled at the 5′ end with T4 polynucleotide kinase, and hybridized to complementary strands as previously described (47). Unincorporated radioactivity was removed from labeled integration and disintegration substrates with G-25 or G-50 Quick Spin columns (Boehringer, Mannheim, IN).


Molecular modeling of MuLV RT. A three-dimensional model of the M-MuLV RT was reconstructed using the 1RW3 crystal structure comprising the fingers, palm, thumb, and connection domains (21) and the preliminary RNase H ΔC crystal structure (54; Wayne Hendrickson, personal communication). A crude full model was generated using the O program (45) by positioning the RNase H ΔC domain into the diffuse electron density observed in the 1RW3 structure. Nonstructured regions were molecularly modeled using the O program and subjected to energy minimization using the AMBER suite of programs (70). Regions reconstructed include residues 327 to 334 (at the tip of the thumb), residues 475 to 504 (between connection and RNase H), and residues 592 to 603 and 633 to 642 (RNase H). Additional modifications included insertion of the C-helix from E. coli RNase H (1G15) (32) between the B- and D-helices of the M-MuLV RNase H domain, mutation of the inserted residues to the corresponding M-MuLV residues, and energy minimization of the final model. The corresponding figures were generated using MOLSCRIPT (49) and Raster3D (65).

Structural model of HIV-1 IN monomer. The structural model of the HIV-1 IN monomer was constructed from a combination of two X-ray crystal structures, represented by PDB codes 1k6y (the two-domain finger/core) and 1ex4 (the two-domain core/C terminus). The “A” molecule core region of 1k6y was superimposed onto the “A” molecule core region of 1ex4 using the program O. The 1k6y structure comprises residues 1 to 46, 56 to 139, and 149 to 210, and 1ex4 comprises residues 56 to 141 and 145 to 270. Thus, the superposition consisted of overlaying the Cα atoms of all common core residues (root mean square deviation, 0.83 Å). Where the model contained disordered regions (residues 47 to 55 and 142 to 144, inclusive), polyalanine segments containing the correct number of amino acids were created and moved into the appropriate linking positions in the model. The Ala residues were then changed to the proper residues, and the regions were subjected to least-squares minimization. Similarly, residues 271 to 288 (absent from 1ex4) were created as a polyalanine chain, mutated to the appropriate amino acid residues, and energy minimized.
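The superposition step can be sketched with the Kabsch least-squares algorithm. This is a generic illustration on synthetic coordinates, not a description of how program O performs the fit, and the function names are hypothetical. The residue ranges for 1k6y and 1ex4 are those listed above; their overlap gives the common core residues whose Cα atoms would be fitted.

```python
import numpy as np


def residues(*ranges):
    """Expand inclusive (start, end) residue ranges into a set."""
    return {r for start, end in ranges for r in range(start, end + 1)}


# Residues present in each crystal structure, as listed in the text.
res_1k6y = residues((1, 46), (56, 139), (149, 210))
res_1ex4 = residues((56, 141), (145, 270))
common = sorted(res_1k6y & res_1ex4)  # core residues shared by both structures


def kabsch_rmsd(P, Q):
    """Optimally rotate/translate P onto Q (matched N x 3 C-alpha coordinates)
    and return the root mean square deviation."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)  # remove translation
    Qc = Q - Q.mean(axis=0)
    H = Pc.T @ Qc  # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # avoid an improper rotation (reflection)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Synthetic check: Q is P rotated 30 degrees about z and shifted.
rng = np.random.default_rng(0)
P = rng.normal(scale=10.0, size=(len(common), 3))
c, s = np.cos(np.radians(30.0)), np.sin(np.radians(30.0))
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([5.0, -2.0, 1.0])
print(len(common), round(kabsch_rmsd(P, Q), 3))  # 146 0.0
```

With real structures, P and Q would hold the Cα coordinates of the common residues from each model, in matching order.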

RESULTS

Figure 1 outlines the series of steps used to generate the linker insertion library within the M-MuLV proviral construct pNCA-C-XN-SU8 (panel A) and the HIV-1 IN expression construct (panel B). For M-MuLV, the target fragment encoding the 3′-terminal two-thirds of the pol gene (2.3-kb BamHI/HpaI fragment) was first subcloned into a minimal plasmid encoding ori/amp, generating pGEM-BH-XN-BH. The Tn7 mutagenesis system results in the random insertion of the transposon encoding the chloramphenicol resistance gene throughout the plasmid. The use of a minimal plasmid biases the nonessential regions, and hence the recoverable insertions, toward the target viral insert. With the target sequence 2,281 bp in size, the generation of 10⁵ Cmʳ colonies was indicative of an extensive mutational library. The colonies were pooled, and the plasmid DNA was isolated as a population and digested with PmeI to remove the chloramphenicol resistance gene. After ligation, the remainder of the Tn7 element reconstitutes a 15-bp linker insertion encoding a PmeI site. This population of 5-amino-acid insertions was selected for Ampʳ, colonies were pooled, and the plasmid DNA was isolated as a population.
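A rough way to see why about 10⁵ colonies indicates an extensive library is a Poisson estimate. The sketch assumes, purely for illustration, that each colony carries one insertion placed uniformly at random in the roughly 4.0-kb plasmid (the 1.7-kb PCR product plus the 2.3-kb insert described in Materials and Methods). Because the insertion sites were in fact not randomly distributed, the result is an idealized upper bound, not a measurement; the function name is hypothetical.

```python
import math


def expected_coverage(n_colonies, plasmid_bp, target_bp):
    """Idealized Poisson estimate: each colony carries one insertion placed
    uniformly at random in the plasmid (an assumption, not a measurement)."""
    mean_hits = n_colonies / plasmid_bp  # average insertions per position
    p_missed = math.exp(-mean_hits)      # Poisson P(0 hits) at one position
    return target_bp * (1.0 - p_missed), mean_hits


covered, per_site = expected_coverage(1e5, 1700 + 2300, 2281)
print(f"{per_site:.1f} insertions per position; "
      f"{covered:.1f} of 2281 target positions expected to be hit")
```

Under the same assumption, even 10⁴ colonies would give a mean of 2.5 insertions per position and about 92% expected coverage (1 − e^−2.5).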

Reconstruction of the library back into a retroviral backbone utilized a proviral construct bearing a deletion of the IN gene, decreasing the possibility of wild-type (WT) sequences within the library. Reconstruction was facilitated by the presence of a unique NotI site introduced at an XbaI site at the C terminus of IN (75). This mutation truncates the C-terminal 23 amino acids of MuLV IN and maintains virus viability (75). The PmeI-bearing pGEM-BH-XN-BH plasmid library was digested with the unique restriction enzymes SalI and NotI, and the library was regenerated by fragment exchange into the pNCA-C-XN-SU8 proviral backbone. With this approach, a small number of WT sequences could be transferred if the initial transposition occurred either within the 170-bp region between the BamHI and SalI sites at the 5′ end or within the 50-bp region between the NotI and HpaI sites at the 3′ end. To eliminate these constructs from the library, the reconstructed pNCA-C-XN-SU8-PmeI library was digested with PmeI, and the linear DNA was isolated, religated, and transformed back into E. coli to generate the final library. WT MuLV does not encode a PmeI site and would be eliminated during this process.

FIG. 1. (A) Generation of the M-MuLV pol insertional library. The seven steps required to generate the insertional library within the retroviral proviral construct are outlined. The top figure schematically outlines the pNCA-C XN construct, containing the viral LTR and gag, pol, and env genes. The region of the pol gene encoding the RT, connection (C), and RNase H (R) domains and the IN protein subjected to Tn7 insertional mutagenesis (GPS-LS linker scanning system; NEB) is expanded. Restriction sites utilized and their positions within the M-MuLV viral RNA (82), where appropriate, are BamHI (B, 3535), SalI (S, 3705), XbaI (X, 5766), NotI (N), HpaI (H, 5816), and PmeI (P). (B) Generation of the HIV-1 insertional library. The five steps required to generate the insertional library are outlined. The region of the IN gene encoding the protein was subjected to Tn5 insertional mutagenesis; the transposon contains the kanamycin resistance gene between its short 19-bp mosaic end (ME) Tn5 transposase recognition sequences. NotI restriction sites flanking the ME also are shown.

The Tn5 mutagenesis system was used to create a library of mutants within HIV-1 IN. The generation of over 2,000 Kanʳ colonies was indicative of a large-scale mutational library. To make the screening process high throughput, each individual colony was picked and transferred into 96-well culture blocks, and PCR-based screening for insertions was conducted. In total, 1,056 colonies were analyzed for the presence of the Tn5 transposon insertion. One hundred eleven clones carried a single insertion within the HIV-1 IN gene; of these, 56 were unique.

Insertion sites of M-MuLV and HIV-1 individual isolates. The final pNCA-C-XN-SU8 PmeI insertion library for M-MuLV was characterized by analyzing individual isolates. Isolates of the final library were subjected to restriction mapping and sequencing analysis (summarized in Fig. 2 and Tables 1 and 2). The 15-bp insertion generated by the linker scanning system resulted in a 5-amino-acid insertion in 4/6 reading frames and a TAA stop codon in 2/6 reading frames. Sequencing and restriction mapping of isolates from the library demonstrated that insertions were distributed throughout the fragment. The insertion sites, however, were not randomly distributed; insertions clustered within the center of the fragment. This could indicate a preference of the transposase for a specific structure within the plasmid DNA or reflect inadequate sampling of the large population of constructs within the library. However, a large number of duplicate isolates were identified within the population examined, indicating that the sample size was representative. A total of 148 in-frame insertions were identified. The ratio of in-frame inserts to those with stop codons was as predicted. In the initial screen of 80 individual colonies, 67 had unique insertion sites that could be readily sequenced. Approximately 2/5 (29/67) of these constructs had mutations that resulted in stop codons; 37 constructs resulted in the insertion of 5 amino acids. One isolate bore a deletion of 20 essential amino acids within the core region of MuLV IN.

The composition of the 5-amino-acid inserts is determined by the target site selected and duplicated during the transposition process as well as by the sequences encoding the PmeI restriction site. Depending upon the reading frame, the in-frame insertions encode a core of either CLN or FKQ/H (Table 1). The insertions are therefore not simple aliphatic side chains but contain bulky and often reactive or charged species. Similarly, the TAA stop codon cannot be avoided, as it lies at the core of the PmeI site (GTTTAAAC).
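The 4/6 versus 2/6 split can be reproduced with a small simulation. We assume, by inference from the CLN and xFKQ/H cores in Table 1 rather than from an explicit statement in the text, that the 15-bp insertion consists of a 10-bp residue TGTTTAAACA (containing the PmeI site) plus a 5-bp duplication of the target sequence; the open reading frame and helper names are hypothetical. Because TGTTTAAACA is its own reverse complement, both transposon orientations give the same outcome, so the result depends only on the phase of the target position relative to the codons.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

LINKER_CORE = "TGTTTAAACA"  # assumed 10-bp residue; contains PmeI site GTTTAAAC
DUP = 5                     # assumed 5-bp target-site duplication


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons, starting at the first base."""
    return "".join(CODON[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def insert_linker(seq, pos):
    """Duplicate seq[pos:pos + 5] and place the linker core between the two copies."""
    return seq[:pos + DUP] + LINKER_CORE + seq[pos:]


def outcome(orf, pos):
    """'stop' if the insertion puts a stop codon in frame, else the mutant protein."""
    protein = translate(insert_linker(orf, pos))
    return "stop" if "*" in protein else protein


orf = "ATGGCTGCAGCTGCAGCTGCA"  # hypothetical ORF encoding MAAAAAA
for pos in range(3):
    print(pos, outcome(orf, pos))
# 0 stop
# 1 MACLNMAAAAAA
# 2 MAVFKQAAAAAA
```

One target phase in three places TAA in frame, while the other two add five residues with a CLN or xFKQ core, consistent with the pattern in Table 1.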

Of the 148 in-frame insertions, 40 were within MuLV RT (10 within the connection region and 30 within RNase H). The remaining 108 in-frame insertions mapped within the MuLV IN protein: 45 to the N-terminal zinc-binding domain (amino acids [aa] 1 to 105), 56 to the catalytic core (aa 106 to 286), and 7 to the C-terminal domain (aa 287 to 408).

FIG. 2. Functional mapping of the linker insertions on the MuLV pol gene products. The figure summarizes the linker insertions and their effects on retroviral viability. E, nonviable termination inserts; F, viable termination inserts; ƒ, nonviable in-frame insertions; �, viable in-frame insertions. Asterisks (*) indicate linker insertions previously characterized (74, 79). The insertion marked with a plus sign was temperature sensitive for replication and integration. Amino acid numbering within RT and IN is indicated at the left and right edges. The protease cleavage site marking the junction between MuLV RT and IN is indicated above the sequence. MuLV RT aa 515 is marked, indicating the N terminus of the domain homologous to E. coli RNase H (93). The HHCC N-terminal domain of MuLV IN corresponds to IN1-105 (91). The position of T287 of MuLV IN is indicated, marking the N terminus of the MuLV IN C-terminal domain (Fig. 7). The coding region subjected to mutagenesis in this study spans the sequence from DEKQ (N terminus) to GGPS (C terminus).

TABLE 1. Summary of in-frame insertions

MuLVᵃ aaᵇ      Insertᶜ  RTᵈ   MuLVᵃ aaᵇ      Insertᶜ  RTᵈ
3815  RT/W406  WCLNRWP  ?     4797  IN/L62   LMFKHLS  ?
3866  RT/A423  ACLNNAG  ?     4821  IN/L70   LLFKHLL  ?
3879  RT/T427  TMFKQTM  ?     4823  IN/L71   LCLNILE  ?
3899  RT/I434  ICLNIIL  ?     4832  IN/L74   SCLNRSH  ?
3903  RT/L435  LVFKHLA  ?     4845  IN/Y78   YLFKHYY  ?
3945  RT/D449  DLFKHDR  ?     4851  IN/M80   MLFKHML  ?
3947  RT/R450  RCLNNRW  ?     4854  IN/L81   LMFKQLN  ?
3948  RT/R450  RLFKHRW  ?     4856  IN/L82   NCLNMNR  ?
3965  RT/R456  RCLNTRM  ?     4865  IN/R85   RCLNNRT  ?
4023  RT/V475  VVFKQVV  ?     4875  IN/K88   KMFKHKN  ?
4100  RT/E501  ECLNTEA  ?     4895  IN/K95   KCLNSKA  ?
4113  RT/T505  TLFKQTR  ?     4902  IN/C97   CVFKHCA  ?
4149  RT/A517  AVFKHAD  ?     4911  IN/V100  VMFKQVN  ?
4275  RT/Q559  QLFKHQR  ?     4917  IN/A102  AMFKHAS  ?
4299  RT/T567  TLFKHTQ  ?     4923  IN/K104  KLFKHKS  ?
4311  RT/K571  KMFKQKM  ?     4934  IN/K108  KCLNIKQ  ?
4313  RT/M572  MCLNKMA  ?     4947  IN/R112  RVFKHRV  ?
4314  RT/M572  MVFKQMA  ?     4953  IN/R114  RVFKHRG  ?
4320  RT/E574  EVFKQEG  ?     4967  IN/R119  GCLNNGT  ?
4343  RT/T582  TCLNNTD  ?     4998  IN/I129  IMFKQIK  ?
4346  RT/D583  DCLNTDS  ?     5000  IN/K130  KCLNIKP  ?
4353  RT/R585  RMFKHRY  ?     5009  IN/L133  LCLNRLY  ?
4362  RT/F588  FVFKHFA  ?     5010  IN/L133  LLFKQLY  ?
4367  RT/T590  TCLNTTA  ?     5012  IN/Y134  YCLNMYG  ?
4394  RT/R600  RCLNNRR  ?     5013  IN/Y134  YVFKQYG  ?
4458  RT/L620  LLFKHLL  ?     5018  IN/Y136  YCLNSYK  ?
4460  RT/L621  LCLNILK  ?     5025  IN/Y138  YLFKQYL  ?
4461  RT/L621  LMFKQLK  ?     5045  IN/T145  TCLNNTF  ?
4463  RT/K622  KCLNIKA  ?     5049  IN/F146  FLFKHFS  ?
4464  RT/K622  KVFKQKA  ?     5057  IN/W149  WCLNSWI  ?
4473  RT/F625  FLFKHFL  ?     5064  IN/E151  EVFKQEA  ?
4503  RT/C635  CLFKHCP  ?     5132  IN/R174  RCLNTRF  ?
4518  RT/K640  KVFKQKG  ?     5138  IN/G176  GCLNIGM  ?
4520  RT/G641  GCLNKGH  ?     5165  IN/N185  NCLNNNG  ?
4538  RT/R647  RCLNTRG  ?     5174  IN/A188  ACLNTAF  ?
4541  RT/G648  GCLNRGN  ?     5175  IN/A188  AFFKHAF  ?
4559  RT/Q654  QCLNNQA  ?     5189  IN/V193  VCLNKVS  ?
4583  RT/T662  TCLNITE  ?     5217  IN/G202  GMFKQGI  ?
4604  RT/T669  TCLNTTL  ?     5243  IN/Y211  YCLNTYR  ?
4607  RT/L670  LCLNTLL  ?     5244  IN/Y211  YMFKQYR  ?
4614  IN/I1    IVFKHIE  ?     5250  IN/P213  PLFKQPQ  ?
4628  IN/P6    PCLNTPY  ?     5253  IN/Q214  QMFKHQS  ?
4629  IN/P6    PLFKQPY  ?     5255  IN/S215  SCLNKSS  ?
4638  IN/S9    SVFKHSE  ?     5256  IN/S215  SLFKQSS  ?
4640  IN/E10   ECLNTEH  ?     5261  IN/G217  GCLNTGQ  ?
4641  IN/E10   ELFKQEH  ?     5273  IN/R221  RCLNKRM  ?
4646  IN/F12   FCLNNFH  ?     5274  IN/R221  RMFKQRM  ?
4647  IN/F12   FLFKHFH  ?     5276  IN/M222  MCLNRMN  ?
4650  IN/H13   HLFKHHY  ?     5279  IN/N223  NCLNMNR  ?
4661  IN/R17   TCLNMTD  ?     5282  IN/R224  RCLNNRT  ?
4671  IN/K20   KVFKQKD  ?     5289  IN/I226  IMFKHIK  ?
4686  IN/L25   LVFKQLG  ?     5291  IN/K227  KCLNIKE  ?
4694  IN/I28   ICLNNIY  ?     5292  IN/K227  KVFKHKE  ?
4695  IN/I28   ILFKHIY  ?     5295  IN/E228  EMFKQET  ?
4697  IN/Y29   YCLNIYD  ?     5298  IN/T229  TLFKQTL  ?
4698  IN/Y29   YVFKHYD  ?     5300  IN/L230  LCLNTLT  ?
4707  IN/T32   TMFKQTK  ?     5309  IN/L233  LCLNKLT  ?
4724  IN/Y38   YCLNIYQ  ?     5310  IN/L233  LMFKQLT  ?
4728  IN/Q39   QVFKHQG  ?     5327  IN/S239  SCLNSSR  ?
4730  IN/G40   GCLNKGK  ?     5330  IN/R240  RCLNTRD  ?
4733  IN/K41   KCLNRKP  ?     5331  IN/R240  RVFKHRD  ?
4736  IN/P42   PCLNKPV  ?     5354  IN/L248  LCLNTLA  ?
4737  IN/P42   PVFKQPV  ?     5355  IN/L248  LVFKHLA  ?
4740  IN/M44   MFKHVNP  ?     5390  IN/H260  HCLNTHG  ?
4743  IN/M44   MLFKQMP  ?     5435  IN/L275  LCLNTLV  ?
4746  IN/P45   PVFKQPD  ?     5450  IN/D280  DCLNTDP  ?
4755  IN/F48   FMFKHFT  ?     5465  IN/R285  RCLNTRV  ?
4761  IN/F50   FVFKHFE  ?     5487  IN/L292  LLFKHLQ  ?
4767  IN/L52   LLFKQLL  ?     5535  IN/R308  RLFKQRP  ?
4775  IN/F55   FCLNNFL  ?     5574  IN/V322  VCLNTVV  ?
4779  IN/L56   LLFKHLH  ?     5634  IN/K341  KMFKHKN  ?
4785  IN/Q58   QLFKHQL  ?     5654  IN/K348  KCLNRKG  ?
4788  IN/L59   LMFKQLT  ?     5673  IN/L354  LLFKHLL  ?
4790  IN/T60   TCLNMTH  ?     5742  IN/A377  AVFKQAA  ?

ᵃ Position of insertion, based on the vRNA sequence (82).
ᵇ Amino acid position within RT or IN, as indicated, N-terminal to the insertion.
ᶜ Each entry shows the 5-aa insert flanked by two WT residues; the first and last amino acids are encoded by the WT pol gene product.
ᵈ +, positive, or −, negative, for reverse transcriptase activity released into the media after transient introduction of the MuLV provirus in D17/pJET cells (see Materials and Methods).
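Table 1 reports each insertion both as a vRNA nucleotide position (footnote a) and as the residue N-terminal to it (footnote b). The two offsets below are our own inference from Table 1 rows, not values given by the authors: RT residue ≈ (position − 2597) // 3 and IN residue ≈ (position − 4610) // 3, which implies a 671-residue RT. The conversion reproduces the rows checked here but not every entry (for example, in5574 comes out as residue 321 rather than V322), so it is a sketch rather than the authors' numbering scheme; the function name is hypothetical.

```python
# Offsets inferred from Table 1 rows (an assumption, not given in the text).
RT_OFFSET = 2597
IN_OFFSET = 4610
RT_LENGTH = (IN_OFFSET - RT_OFFSET) // 3  # 671 residues under these offsets


def residue_before_insertion(vrna_pos):
    """Map a Table 1 vRNA position to (protein, residue N-terminal to the insert)."""
    rt_residue = (vrna_pos - RT_OFFSET) // 3
    if rt_residue <= RT_LENGTH:
        return "RT", rt_residue
    return "IN", (vrna_pos - IN_OFFSET) // 3


for pos in (3815, 4583, 4607, 4614, 4650, 5742):
    print(pos, *residue_before_insertion(pos))
# 3815 RT 406
# 4583 RT 662
# 4607 RT 670
# 4614 IN 1
# 4650 IN 13
# 5742 IN 377
```

Under these offsets, in4583 falls 9 residues from the RT C terminus and in4650 follows IN residue 13, consistent with the viable junction region described in the Results.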

Sequencing of the HIV-1 IN library showed that the insertions were distributed throughout the HIV-1 IN gene (Fig. 3 and Table 3). The insertion sites, however, were not randomly distributed; insertions clustered within the C-terminal domain. Of the 111 insertions, 2 were within the N-terminal domain, 35 were within the catalytic core, and 74 were within the C-terminal domain. Of these, 56 clones had unique insertion positions and the correct sequence.

In vivo analysis of individual M-MuLV isolates. The viability of individual viral constructs was tested by passage of transiently expressed virus in tissue culture. Three series of viral constructs were analyzed. The first consisted of a random mixture of both in-frame and terminating codon insertions spanning the complete target sequence. Since the pol gene is expressed as a precursor protein containing protease, reverse transcriptase, and integrase, it was predicted that termination codons within RT would be lethal, resulting in loss of the MuLV IN protein. Twenty-nine termination codon insertions were included for analysis. The second series contained linker insertion mutations that mapped to the MuLV IN protein plus one termination codon at the C terminus of IN, and the third series included mutations within the MuLV RT connection and RNase H domains.

Plasmid DNA of the individual constructs from the final pNCA-C-XN-SU8 insertion library was transiently introduced into D17/pJET cells in the presence of DEAE-dextran. On days of confluence, cells were screened for the release of reverse transcriptase into the supernatant. Figure 4 is an autoradiograph of one RT assay performed on day 16. The insertions were arranged within the 96-well plate in linear order from the 5′ end to the 3′ end of the pol gene. Rows A to G contained 84 in-frame insertions; a single termination insertion within the C terminus of IN (in5743. IH) was included. The positive controls, pNCA-C-XN-SU8 (H11) and pNCA-C (H12), are clearly positive for RT activity at this time point. Remarkably, two regions of viable insertions are readily detected in this series. The first 10 isolates are all viable. These correspond to the linker insertions beginning at the extreme C terminus of RNase H and extending into the N terminus of MuLV IN: in4583, in4603, in4607, in4628, in4629, in4638, in4640, in4641, in4647, and in4650. This indicates that insertions within the first 14 amino acids of MuLV IN are tolerated, as are insertions within the terminal 9 aa of MuLV RNase H (Fig. 2). Within this region, one insertion from a separate assay series was not viable (Table 1, in4614). This insertion lies within the protease recognition sequence and substitutes the P2′ and P3′ positions, changing STLL/IEN to STLL/IVF.
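Because the isolates were laid out in 5′-to-3′ order, a well label can be converted to the isolate's rank along pol and back. The sketch below assumes row-by-row filling (A1 to A12, then B1, and so on), which is consistent with A1 to A10 holding the first 10 isolates but is not stated explicitly; the function names are hypothetical.

```python
import string

ROWS = string.ascii_uppercase[:8]  # rows A-H of a 96-well plate
COLS = 12                          # columns 1-12

def well_for_index(i: int) -> str:
    """Map a 0-based linear index (filled row by row) to a well label such as 'G7'."""
    if not 0 <= i < len(ROWS) * COLS:
        raise ValueError("index outside a 96-well plate")
    row, col = divmod(i, COLS)
    return f"{ROWS[row]}{col + 1}"

def index_for_well(well: str) -> int:
    """Inverse of well_for_index: 'A1' -> 0, 'H12' -> 95."""
    row = ROWS.index(well[0].upper())
    col = int(well[1:]) - 1
    return row * COLS + col

print(well_for_index(78), index_for_well("H11"))
```

Under this layout, rows A to G hold indices 0 to 83 (the 84 in-frame insertions), and well G7 (in5450) is the 79th isolate in the series.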

Of considerable interest are the three consecutive insertions in5450, in5465, and in5487 (Fig. 4, G7 to G9). These three insertions span a 12-aa region between the core and C-terminal domain, a region not previously explored in mutational analyses of IN. Using the homology of MuLV and HIV-1 IN defined by McClure et al. (62), the equivalent region of HIV-1 IN was identified. Figure 5A maps this region onto the two-domain structure (1EX4) of the HIV-1 IN core-C terminus (18). The region of HIV-1 IN homologous to MuLV IN spanning in5450 to in5487 is shown, as are the HIV-1 IN core domain and C terminus. The region connecting the C terminus and core consists of an extended alpha-helix containing a central bend. The homologous insertion-tolerant region maps within this alpha-helical domain, centered at the bend. The net result of a 5-amino-acid insertion would be to lengthen the distance between the two domains and/or increase the discontinuity of the extended alpha-helix.
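The residue numbers quoted for these insertions follow from their vRNA coordinates by counting codons. The sketch below assumes that IN codon n ends at vRNA position 4610 + 3n (so codon 1 spans 4611 to 4613). This offset is back-calculated from Table 1 and the text (for example in5450/D280, in5742/A377, and Thr287 at nucleotide 5471), not stated by the authors, and the helper name is hypothetical.

```python
# Assumed offset, inferred from Table 1 entries; not given in the text.
IN_OFFSET = 4610

def in_residue(vrna_position: int) -> int:
    """MuLV IN residue N-terminal to an insertion at vrna_position (Table 1,
    footnote b convention). Insertions after the last nucleotide of codon n,
    or inside codon n + 1, are reported as residue n."""
    return (vrna_position - IN_OFFSET) // 3

sites = {"in5450": 5450, "in5465": 5465, "in5487": 5487}
residues = {name: in_residue(pos) for name, pos in sites.items()}
print(residues)
print(max(residues.values()) - min(residues.values()))  # span in amino acids
```

For the three viable insertions this gives D280, R285, and L292, a 12-aa span, matching the region described above.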

A third nonessential region of the MuLV IN protein mapped to the extreme C terminus of MuLV IN. Interestingly, insertion in5742, resulting in the in-frame insertion of AVFKQAA (insertion shown in boldface type), was viable (Fig. 2), whereas the terminator insertion in5743, encoding AAV*TAA, was nonviable (Fig. 2). These studies more finely define the nonessential region at the C terminus of MuLV IN. Linker insertions and truncations mapping three amino acids upstream were previously reported to be nonviable (Fig. 2), whereas in-frame insertions and truncations mapping 2 amino acids downstream were viable (Fig. 2) (74). Of the terminator insertions, only one, in5764, was viable. This mapped within the region previously identified as nonessential (74).

The 16 viable viruses identified in this analysis appeared with

TABLE 2. Summary of terminations

MuLV(a)   aa(b)   Insert(c)   RT(d)
3730 RT/A377 ANV*TAK −
4099 RT/A500 ADV*TAE −
4159 RT/T520 TCV*TTW −
4312 RT/K571 KIV*TKM −
4318 RT/A573 ADV*TAE −
4363 RT/F588 FAV*TFA −
4369 RT/T590 TAV*TTA −
4387 RT/E596 EIV*TEI −
4411 RT/L604 LTV*TLT −
4462 RT/L621 LNV*TLK −
4474 RT/F625 FLV*TFL −
4483 RT/K628 KSV*TKR −
4544 RT/G648 GNV*TGN −
4627 IN/S5 SPV*TSP −
4672 IN/K20 KDV*TKD −
4687 IN/L25 LGV*TLG −
4726 IN/Y38 YHV*TYQ −
4867 IN/R85 RTV*TRT −
4876 IN/K88 KNV*TKN −
4999 IN/I129 INV*TIK −
5005 IN/P131 PGV*TPG −
5011 IN/L133 LYV*TLY −
5020 IN/Y136 YNV*TYK −
5104 IN/T164 TNV*TTK −
5257 IN/S215 SSV*TSS −
5347 IN/L245 LLV*TLL −
5458 IN/D282 DIV*TNM −
5554 IN/Y314 YHV*TYQ −
5743 IN/A377 AAV*TAA −
5764 IN/P384 PSV*TPS +

a Position of insertion, based on the vRNA sequence (82).
b Amino acid position within RT or IN, as indicated, N-terminal to the insertion.
c Sequences of the amino acid inserts are indicated in boldface type. Asterisks indicate stop codons. The first and last amino acids are encoded by the WT pol gene product.
d +, positive, or −, negative, for reverse transcriptase activity released into the media after transient introduction of the MuLV provirus in D17/pJET cells (see Materials and Methods).


a time course identical to that of the parental pNCA-C-XN virus. Each RT+ virus passaged in this study was used to isolate unintegrated viral DNA by the method of Hirt (41). The terminal two-thirds of the pol gene was PCR amplified from the viral DNA, and this PCR product was sequenced in its entirety. All of the viral constructs maintained the linker insertion sequence encoding the PmeI site. No additional second-site mutations within MuLV IN were identified.
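Confirming that a sequenced pol amplicon still carries the linker can be reduced to locating the PmeI recognition sequence, GTTTAAAC. Because this site is its own reverse complement, searching one strand suffices. The following minimal sketch assumes, without confirmation from this document, that the parental pol sequence contains no PmeI site of its own; the function names are hypothetical.

```python
PMEI_SITE = "GTTTAAAC"  # PmeI recognition sequence (palindromic)

def pmei_sites(seq: str) -> list[int]:
    """Return the 0-based start positions of every PmeI site in seq."""
    seq = seq.upper()
    hits = []
    start = seq.find(PMEI_SITE)
    while start != -1:
        hits.append(start)
        start = seq.find(PMEI_SITE, start + 1)
    return hits

def retains_single_linker(seq: str) -> bool:
    """True if the amplicon carries exactly one PmeI site. Assumes the
    parental sequence itself lacks PmeI sites (not verified here)."""
    return len(pmei_sites(seq)) == 1

print(pmei_sites("aaGTTTAAACtt"), retains_single_linker("ACGTACGT"))
```

Searching with `str.find` from `start + 1` also reports overlapping matches, which keeps the count honest if two sites abut.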

Mutations within MuLV RT-connection-RNase H. Of the 40 linker insertions within the C-terminal half of MuLV RT encoding the connection and RNase H domains, only the three extreme C-terminal in-frame insertions were viable (in4583, in4603, and in4607). These define the sequences at the MuLV RT-IN junction. These results are surprising, as the preliminary X-ray structure of MuLV RT contains unstructured or flexible loops in several regions within the connection and RNase H domains. Figure 6 shows a molecular model of MuLV RT based on the structure 1RW3 and the MuLV RNase H domain (54). Gaps in the structure were reconstructed and are indicated, including the regions between amino acids 327 and 334 (thumb), 475 and 504 (joining the connection to the RNase H domain), 592 and 603 (RNase H), and 633 and 642 (RNase H). The positions of the linker insertions are mapped onto this MuLV RT model (Fig. 6). Although several of these insertions map within structurally undefined regions, none of them were viable. These unstructured regions therefore impose stringent requirements for replication of the virus in vivo.
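Deciding whether an insertion falls inside one of the reconstructed, structurally undefined stretches is an interval lookup. The sketch below uses the residue ranges listed above; the region labels and function name are illustrative, and the example residues are RT positions taken from Table 2. Note that the residue is the one N-terminal to the insertion, so a site just after the last residue of a range is treated as outside it.

```python
# Structurally undefined stretches of the MuLV RT model listed in the text
# (residue numbers, inclusive at both ends).
UNDEFINED_REGIONS = {
    "thumb": (327, 334),
    "connection-RNase H linker": (475, 504),
    "RNase H 592-603": (592, 603),
    "RNase H 633-642": (633, 642),
}

def undefined_region(residue: int):
    """Return the name of the undefined region containing residue, or None."""
    for name, (first, last) in UNDEFINED_REGIONS.items():
        if first <= residue <= last:
            return name
    return None

# Residues N-terminal to several RT termination insertions in Table 2.
for residue in (500, 520, 596, 604, 628):
    print(residue, undefined_region(residue))
```

With these inputs, A500 and E596 fall inside undefined stretches while T520, L604, and K628 do not.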

Expression of MuLV IN C-terminal domain. The positioning of the viable insertions between nucleotides 5450 and 5487 suggests a domain boundary between the core and C terminus. To confirm this biochemically, a deletion analysis of the MuLV IN protein was performed to identify a stably expressed C-terminal domain. Of the 93 deletion constructs generated, 20 with deletions beyond position 5137 of MuLV (82) were further analyzed for protein expression. Because a random deletion end point preserves the reading frame only when it removes a multiple of 3 nucleotides, one in three N-terminal directional deletions was expected to be in frame. However, only one construct reproducibly yielded an abundant, stable MuLV C-terminal IN protein. Figure 7 shows the screening of five individual constructs, of which isolate 77 expressed a single 17-kDa protein. DNA sequence analysis identified the N terminus of this protein as TNSP, corresponding to Thr287 of MuLV IN (marked in Fig. 2). This maps to nucleotide 5471 (82), in the center of the region defined by the linker insertion analysis. Additional studies indicated that the IN 287 to 408 construct (no. 77) could be purified from soluble E. coli extracts by nickel-affinity chromatography (data not shown). These biochemical results confirm that the boundary between the core and C-terminal domain lies within this region.
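The one-in-three expectation can be checked by enumeration: for a fixed upstream junction, a deletion keeps the downstream reading frame only when the number of removed nucleotides is divisible by 3. This sketch assumes deletion end points are spread evenly across codon phases, which real nuclease-generated libraries need not satisfy; the function name is hypothetical.

```python
def fraction_in_frame(start: int, ends: range) -> float:
    """Fraction of deletion end points that remove a multiple of 3 nucleotides
    relative to a fixed start, i.e., that keep the downstream frame."""
    in_frame = sum(1 for end in ends if (end - start) % 3 == 0)
    return in_frame / len(ends)

# 300 consecutive end points: exactly 100 of them are in frame.
print(fraction_in_frame(0, range(1, 301)))
```

Any run of end points whose length is a multiple of 3 gives exactly 1/3, so the single stable product among 20 screened constructs reflects more than frame alone, for example protein stability.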

In vitro analysis of individual HIV-1 IN mutants. Fifty-six insertional mutant proteins were expressed, purified, and assessed for strand transfer and disintegration activity (Table 3 and Fig. 3). These activities varied and are summarized below according to the position of the mutation within the N-terminal, core, and C-terminal domains of the HIV-1 IN protein.

FIG. 3. Mutation functional map of insertions of HIV IN. The position of each insertion (indicated by an arrow) and its activity (color coded) in disintegration (circle) and strand transfer (square) assays are shown on the alignment of the HIV-1 and MuLV IN proteins. The amino acid sequence alignment of MuLV and HIV-1 IN was based on the method of Johnson et al. (44). Dots indicate alignment gaps/insertions. Numbering from the N terminus of MuLV IN includes alignment gaps. The GenBank accession number for the MuLV IN sequence is NC_001501. Known structural elements of HIV-1 IN, determined by crystallography of recombinant HIV-1 IN (18, 87), are also shown (bold horizontal lines) above the respective homologous segments. Their PDB accession numbers are 1K6Y and 1EX4, respectively. Core structural elements are labeled with a prime (′); C-terminal elements are labeled with a double prime (″). HHCC and DDE motifs are highlighted in red. Activity is relative to WT activity set to 100%: �, 0%; , 0 to 5%; �, 6 to 35%; ��, 36 to 75%; ���, 76 to 100%.


HIV-1 IN N-terminal domain mutants. The HIV-1 N-terminal domain consists of a three-helix bundle (Fig. 5B). Two insertions were identified at N27↓L, located at one end of the helix bundle in the loop connecting the second (α2) and third (α3) helices. The two mutants were at the same position but carried different inserted amino acid sequences. Both mutants retained full disintegration activity; however, integration activity was barely detectable. Two additional insertions, D55↓C and P58↓G, fall into the hinge region between the HIV-1 HHCC and core domains. In the two-domain crystal structure, this connecting region (residues 47 to 55) is disordered in all four monomers (87). These two insertions retained full disintegration activity and had moderate to full integration activity.

HIV-1 IN core mutants. In the HIV-1 core domain (aa 50 to 186) (12), all insertions disrupted strand transfer activity. The requirements for strand transfer are more stringent than those for disintegration, and three regions that displayed low levels of disintegration were identified. These include three insertions (I135↓K, N144↓P, and S147↓Q) located between �3 and �4 and a group of five mutations (D167↓Q through M178↓A) from the end of �4 into �5. Interestingly, six insertions distributed within α6′ (A196 to E212) showed a gradient of increasing disintegration activity toward the C-terminal end of the helix. Of considerable interest, insertion E212↓L maintained nearly full disintegration and integration activity. E212↓L lies within the region connecting the C terminus and core, which consists of an extended alpha-helix with a bend at the center.

HIV-1 IN C-terminal domain. In the HIV-1 C-terminal domain, four different regions of activity were identified, and the overall activity of each region increased toward the C terminus. In the first region, between 1″ and 2″, two insertions were identified (R228↓D and S230↓R) with barely detectable disintegration and no integration activity. The second region, which comprises 2″ through 4″, revealed 10 insertion sites that retained a higher level of disintegration than the first region but exhibited no integration activity. Mutant G247↓A, which lies just before 3″, was the exception, as it retained full integration and disintegration activity. Interestingly, the two insertions immediately before and after G247 had no integration activity and decreased disintegration activity. The third region, after 5″ (from I268 to V281), had similar levels of disintegration activity and retained moderate integration activity compared to wild-type HIV-1 IN. The fourth

FIG. 4. RT assay. RT assay of 85 individual isolates 16 days after transfection into D17/pJET cells (see Materials and Methods). RT-positive constructs are as follows: A1, in4583-15; A2, in4603-15; A3, in4607-15; A4, in4628-15; A5, in4629-15; A6, in4638-15; A7, in4640-15; A8, in4641-15; A9, in4647-15; A10, in4650-15; G7, in5450-15; G8, in5465-15; G9, in5487-15; H11, XN, parental vector pNCA-C-XN-SU8 (positive control); H12, WT, full-length pNCA-C M-MuLV proviral vector.

TABLE 3. Summary of HIV-1 IN insertions

Insertion position(a)   Insertion amino acid sequence(b)   DS(c,e)   ST(d,e)
N27↓L LSLVHILRPQDVYKRQDFN ††
N27↓L PVSCTHLAAARCVQETDFN †††
D55↓C LSLVHILRPQDVYKRQQVD ††† ††
D55↓C CLLYTSCGRKMCTRDRQVD ††† ††
P58↓G VSCTHLAAARCVQETDCSP ††† †††
D64↓C SVSCTHLAAARCVQETELD �
I73↓L LSLVHILRPQDVYKRQKVI � �
Y83↓I TVSCTHLAAARCVQETGGY � �
E96↓T LSLVHILRPQDVYKRQGQE � �
T115↓D CLLYTSCGRKMCTRDRVHT � �
D116↓N TVSCTHLAAARCVQETDTD � �
T125↓V AVSCTHLAAARCVQETGTT � �
A128↓A CLLYTSCGRKMCTRDRVKA � �
W131↓W LSLVHILRPQDVYKRQACW � �
I135↓K CLLYTSCGRKMCTRDRAGI †† �
N144↓P LSLVHILRPQDVYKRQPYN † �
S147↓Q CLLYTSCGRKMCTRDSPQS �
G149↓V AVSCTHLAAARCVQETGQG � �
V165↓R LSLVHILRPQDVYKRQGQV � �
D167↓Q CLLYTSCGRKMCTRDRVRD † �
A169↓E CLLYTSCGRKMCTRDRDQA � �
E170↓H PVSCTHLAAARCVQETEAE † �
K173↓T LSLVHILRPQDVYKRQHLK �
T174↓A CLLYTSCGRKMCTRDSLKT † �
M178↓A LSLVHILRPQDVYKRQVQM �
A196↓G CLLYTSCGRKMCTRDRYSA �
I200↓V CLLYTSCGRKMCTRDRERI † �
V201↓D AVSCTHLAAARCVQETGIV † �
A205↓T VSCTHLAAARCVQETDIIA † �
D207↓I LSLVHILRPQDVYKRQATD † �
E212↓L SVSCTHLAAARCVQETAKE †† ††
R228↓D AVSCTHLAAARCVQETDYR † �
S230↓R SCLLYTSCGRKMCTRDRDS † �
A239↓K TVSCTHLAAARCVQETGPA ††† �
K240↓L PVSCTHLAAARCVQETAAK †† �
W243↓K NCLLYTSCGRKMCTRDSLW †† �
G245↓E CLLYTSCGRKMCTRDSWG †† �
G247↓A LSLVHILRPQDVYKRQGEG ††† †††
A248↓V CLLYTSCGRKMCTRDSEGA †† �
V250↓I CLLYTSCGRKMCTRDRAVV †† �
V250↓I LSLVHILRPQDVYKRQAVV †† �
I251↓Q PVSCTHLAAARCVQETVVI ††† �
D253↓N LSLVHILRPQDVYKRQIQD ††† �
S255↓D CLLYTSCGRKMCTRDRDNS ††† �
V259↓V CLLYTSCGRKMCTRDSIKV ††† �
I268↓R SCLLYTSCGRKMCTRDRII ††† ††
R269↓D AVSCTHLAAARCVQETVIR †† ††
G272↓K CLLYTSCGRKMCTRDRDYG ††† ††
K273↓Q PVSCTHLAAARCVQETDGK ††† ††
D279↓C LSLVHILRPQDVYKRQGDD ††† ††
V281↓A CLLYTSCGRKMCTRDSDCV ††† ††
R284↓D PVSCTHLAAARCVQETASR ††† †††
Q285↓D AVSCTHLAAARCVQETGRQ ††† †††
E287↓D AVSCTHLAAARCVQETEDE ††† †††
D288↓ CLLYTSCGRKMCTRDRDED ††† †††

a Positions are based on the protein sequence of HIV-1 IN. Arrows mark the insertion between the two amino acids indicated. All isolates are independent insertions.
b Sequence of the 19-aa insertion.
c Disintegration assay.
d Strand-transfer assay.
e Activity is based on WT IN activity (set at 100%). Symbols: �, 0%; , 0 to 5%; †, 6 to 35%; ††, 36 to 75%; †††, 76 to 100%.
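The activity classes used in Fig. 3 and Table 3 are fixed bins of percent WT activity. A minimal sketch of that binning follows; the published ranges leave gaps (for example, between 5% and 6%), and this sketch places such values in the higher bin, which is an assumption rather than the authors' stated rule. The function name is hypothetical.

```python
def activity_class(percent_of_wt: float) -> str:
    """Bin an activity (percent of WT IN, WT = 100%) into the five ranges of
    Table 3. Values falling between published ranges (e.g., 5.5%) go to the
    higher bin, and values above 100% go to the top bin (both assumptions)."""
    if percent_of_wt < 0:
        raise ValueError("activity cannot be negative")
    if percent_of_wt == 0:
        return "0%"
    if percent_of_wt <= 5:
        return "0 to 5%"
    if percent_of_wt <= 35:
        return "6 to 35%"
    if percent_of_wt <= 75:
        return "36 to 75%"
    return "76 to 100%"

print([activity_class(p) for p in (0, 3, 20, 50, 100)])
```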


region, which comprised insertions from R284 to the C terminus, retained full disintegration and integration activity.

Overall summary of HIV and MuLV IN analysis. In summary, four regions retained full integration activity in this complementary study of M-MuLV IN in vivo and HIV-1 IN in vitro. These correspond to the first 14 amino acids of IN (MuLV), the hinge region connecting the N-terminal and core domains (HIV), the region within the α6′ helix connecting the core and C-terminal domains (MuLV and HIV), and the extreme C terminus of IN (MuLV and HIV).

DISCUSSION

The retroviral genome has evolved to encode multifunctional proteins expressed within polyproteins. These compact viral particles must assemble, infect, replicate, and integrate the viral genome using limited enzymatic functions. In this study, we used two parallel transposon-based mutational systems (Tn5 and Tn7), differing in the size of the insertion, to create functional maps of the M-MuLV and HIV-1 IN proteins. Studies in M-MuLV extend the region within the 3′

FIG. 5. (A) MuLV viable domain mapped onto the HIV-1 core-C terminus structure (1EX4). The 14-amino-acid region in MuLV IN spanning insertions in5450-15 through in5487-15 (DPDMTRVTNSPSLQ) was tolerant of 5-amino-acid insertions in vivo. This region corresponds to the HIV-1 IN sequence IATDIQFKELQKQI (44), which is highlighted in red (A204 to I217 of the A molecule in 1EX4, taken from the two-domain structure of the HIV-1 core-C terminus [18]). The HIV-1 core domain is colored blue; the C terminus is yellow. The C terminus ends at amino acid 271. The figure was generated in MOLSCRIPT V 2.0 (49). (B) A three-dimensional structural model of the HIV-1 monomer (aa 1 to 288). The locations of the insertion mutations and their effects on disintegration and strand transfer activity are shown using the color scheme of Fig. 3. Amino acid numbering within HIV-1 IN is shown in white. The large spheres denote disintegration activity, and the widened colored linear portions denote strand transfer activity.

FIG. 6. Position of the linker insertions within M-MuLV RT-RNase H. The figure shows two views, differing by 180°, of the molecular model of the M-MuLV RT. The individual subdomains are colored as follows: finger-palm, salmon; thumb, pink; connection, blue; and RNase H, green. The catalytic triad (D524, E562, and D583) is shown as space-filled orange spheres. The loop structures introduced into structurally undefined regions are yellow. The position of each individual linker insertion is shown in red. Amino acid positions within MuLV RT are shown in black. The figure was generated in MOLSCRIPT V 2.0.


terminus of the pol gene to include the connection and RNase H domains of RT. Analysis of 178 mutations in MuLV and 57 mutations in HIV-1 IN indicates limited nonessential regions tolerant of amino acid insertions. These regions localize to protein and domain boundaries: between RT and IN, between the N terminus and the core of IN, at the C terminus of IN, and between the core and C terminus of IN. Although the mutagenesis was not saturating, the data indicate functional conservation even within regions shown to be disordered in crystallographic structures.

Several systems have been developed for “genetic footprinting” of a gene based on generating a library of random inserts and screening those pools for selectable phenotypes. The systems are based on bacterial transposons, including Tn5, Tn7, and Mu, or on viruses (5, 42, 73, 78, 83). These systems have the potential to screen the entire population of insertions before and after a selection process through positional mapping of the inserts by PCR. The two systems used in this study instead characterized individual isolates rather than the population as a whole. For the in vitro studies, selection for IN function is complex, and a high-throughput approach was developed. For the Tn7 system, the unique sequence of the insertion is limited to a 10-nucleotide region, which is too short to direct specific hybridization of a PCR primer. Mapping insertions using a series of nested PCR products followed by PmeI digestion proved difficult, as the PCR products were not efficiently cleaved by PmeI. Because of this technical difficulty, this study focused on the analysis of individual isolates whose insertion sites could be determined prior to introduction into tissue culture for selection. This approach eliminated several additional complications, including limiting the number of termination insertions analyzed and decreasing the number of false positives resulting from complementation and/or recombination in mixed infections. Approximately 750 isolates were sequenced to identify the 178 unique isolates used in these studies. Within this population of 750,

duplicates were identified, indicating that the population ana-lyzed was representative of the library.
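Why one linker yields three kinds of insert (the CLN and FK motifs of Table 1 and the V*T terminators of Table 2) can be illustrated with a three-frame translation of the short unique region. Assumption: the 10-nt core is TGTTTAAACA. That sequence is not given in this text; it is chosen because it contains the PmeI site and its three frames reproduce the motifs seen in the tables. The complete insertion also includes the 5-bp target-site duplication generated by Tn7, so the full inserted peptide depends on the flanking codons.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code listed in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(seq: str) -> str:
    """Translate complete codons of seq; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))

def three_frames(seq: str) -> list[str]:
    """Translations of seq read from offsets 0, 1, and 2."""
    return [translate(seq[frame:]) for frame in range(3)]

LINKER_CORE = "TGTTTAAACA"  # assumed 10-nt core containing the PmeI site
print(three_frames(LINKER_CORE))
```

The output is CLN, V*T, and FK for frames 0, 1, and 2, so which kind of insert a clone carries depends only on the codon phase at which the transposon landed.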

The domain boundaries defined in these studies are in general agreement with previous biochemical studies. For MuLV RT, deletion studies in E. coli that identified a stable and active MuLV RT (pB6B15.23) (77) resulted in truncation of the seven terminal amino acids of RT/RNase H. This truncation lies within the 9-aa region at the C terminus of RT that tolerated linker insertions. Similarly, MuLV IN deletion constructs (p135-1) (76) lacking the N-terminal 8 amino acids bound DNA similarly to a full-length IN construct. The present studies indicate that the N-terminal 13 amino acids of MuLV IN tolerated insertions in vivo. The one exception was the insertion that altered the protease recognition site (in4614). It should be noted that the N terminus of MuLV IN encodes 45 amino acids not conserved in either HIV or avian sarcoma virus-related INs (92). The region tolerant of 5-aa insertions at the N terminus of MuLV IN maps within this nonconserved region. Previous studies indicated that the MuLV IN C terminus could be truncated by 28 amino acids while maintaining virus viability (74). These studies refine this region, demonstrating that truncation of 31 aa resulted in nonviable virus. Interestingly, the in-frame linker insertion at this coding region was viable, whereas insertions 3 amino acids upstream were not. These boundaries for IN function may assist in expressing minimized IN constructs for crystallization studies.
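The truncation lengths quoted here are simple arithmetic on the 408-residue MuLV IN: a stop codon placed after residue n removes 408 − n native residues. The sketch below ignores the one or two linker-encoded residues (for example, the AV in AAV*TAA) that precede the stop; the function name is hypothetical.

```python
MULV_IN_LENGTH = 408  # residues, per the domain boundaries given in the Results

def residues_removed(last_retained: int, length: int = MULV_IN_LENGTH) -> int:
    """Number of native C-terminal IN residues lost when a stop codon follows
    residue last_retained (linker-encoded residues before the stop ignored)."""
    if not 1 <= last_retained <= length:
        raise ValueError("residue outside the protein")
    return length - last_retained

print(residues_removed(377))  # in5743, after IN/A377
print(residues_removed(384))  # in5764, after IN/P384
```

This reproduces the 31-aa truncation of the nonviable in5743 and places the viable in5764 at a 24-aa truncation, inside the 28-aa window reported earlier (74).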

In the HIV-1 IN N terminus, only two insertions, both at N27, were obtained. These insertions retained disintegration but had barely detectable levels of integration. Relevant to these mutants, a monoclonal antibody that interacts with amino acids 27 to 29 has been shown to destabilize the N-terminal helical bundle and decrease the 3′ processing and strand transfer activities of HIV-1 IN in vitro (95). In addition, deletion of the N-terminal 39 aa is known to abolish integration activity (25). In the core domain of HIV-1 IN, using an extensive panel of mutants, we show that integration was abolished and disintegration was diminished with insertions between D64↓C and E212↓L inclusive. HIV-1 IN disintegration requires only the core domain (residues 50 to 186) (12). Importantly, this set of mutants demonstrates the compactness of IN and underscores the complexity of the intramolecular and intermolecular interactions that IN must maintain during the integration process. In our studies, it was anticipated that some of the loop regions within the core might be more amenable to mutation, given the solvent accessibility shown in the monomer and dimer structures, such as the loops between the N-terminal α3 and the core 1�, the core 5� and �4�, and the core �5� and �6�. While we did not expect integration activity per se, we expected disintegration, since this activity may not require a higher-order complex. However, insertions located at the core loops 5� and �4� and �4� and �5� all lost integration activity and had no or barely detectable disintegration activity. These two regions retaining minimal disintegration activity correspond to an extended loop (residues 137 to 156) and a flanking region (residues 161 to 173), which are protected from proteolysis upon metal binding (2, 3). Substitution of Gly140 and Gly149 with more constrained Ala residues impaired catalysis by HIV-1 IN, indicating a requirement for some degree of conformational flexibility for catalytic activity (37). These

FIG. 7. Expression of a stable C-terminal IN domain. Whole-cell E. coli extracts of individual N-terminal deletion constructs of MuLV IN were subjected to sodium dodecyl sulfate-polyacrylamide gel electrophoresis, followed by Coomassie blue staining. Five individual colonies are shown. Lane 1, isolate 1; lane 2, isolate 3; lane 3, isolate 74; lane 5, isolate 77; lane 6, isolate 35. The positions of the protein standards are indicated at the left. The arrow marks the stable C-terminal IN287-408 MuLV protein product of approximately 17 kDa (isolate 77).


two loops are believed to undergo significant movement to aid in the coordination of a metal ion by the catalytic triad (2, 3). Interestingly, residues 168 to 171 are also reported to contact the host factor LEDGF (20).

Previously, we and others showed that the C terminus of HIV-1 and M-MuLV IN can tolerate large C-terminal deletions and, similar to the core, still retain considerable disintegration activity (12, 25, 46, 61). Here, we identify four different regions in the HIV-1 IN C terminus with a gradient of increasing activity toward the carboxyl terminus. Insertional mutants after amino acid 239 in 2″ and in the loop between 3″ and 4″ lost strand transfer activity while exhibiting full or moderate levels of disintegration activity. G247↓A was an exception, as it retained full integration and disintegration activity. Interestingly, the insertions in 2″ and 3″, positioned before and after G247, had no integration activity and low disintegration activity. The context of G247 differs between two molecular models of an HIV IN tetramer (72, 87). In contrast to the Wang tetramer model (87), a 19-aa insertion in the Podtelezhnikov et al. model (72) could interfere with the binding of a putative LTR and sterically clash with the loop region (between 1� and 2�) of another core molecule. Our results are consistent with the latter tetramer model. Insertions after I268 and before Q284 had similar levels of disintegration activity and retained moderate integration activity compared to wild-type IN. The terminal region, comprising insertions after R284, retained full integration and disintegration activity.

It is of interest that, although functional complementation of MuLV IN was achieved in vitro using constructs that stably expressed the N-terminal zinc-binding domain (MuLV IN 1 to 105) together with the core-C terminus fragment (MuLV IN 106 to 404) (91), no viable linker insertion was identified in vivo at the junction of the HHCC domain and the core domain. However, in the in vitro HIV-1 IN mutational study, three 19-aa insertions at two positions (D55↓C and P58↓G) were identified at the transition between the HHCC and core domains, and these retained full disintegration and strand transfer activity. The D55/C56/S57 sequence is proposed to lie in close proximity to HIV LTR positions 1 to 4, based on a structural tetramer model (16).

Although it is possible that the linkers substitute for the natural amino acids at a given position, we did not observe, in either MuLV or HIV-1, instances where two in-frame insertions at the same position had different effects. Such differences might have been predicted, as the insertions frequently encode Cys, which could alter protein folding. Within MuLV IN, 6 of the 11 viable insertions encoded Cys. However, in both the HIV-1 and MuLV IN studies, different insertions at the same coding position behaved identically, indicating that there was no positive selection for a Cys residue to, for example, stabilize the region. For MuLV IN, this is exemplified by in4628 PCLNTPY and in4628 PLFKQPY; for HIV, by the two insertions at D55↓C, encoding LSLVHILRPQDVYKRQQVD and CLLYTSCGRKMCTRDRQVD, and those at V250↓I, encoding CLLYTSCGRKMCTRDRAVV and LSLVHILRPQDVYKRQAVV.

Insights into the boundaries defining the insertion-tolerant region between the core and C terminus were obtained in these comparative studies. In M-MuLV, this region, encoding DPDMTRVTNSPSLQ, corresponds to the HIV-1 IN sequence IATDIQFKELQKQI (Fig. 5A). At the 5′ end, the closest nonviable 5-amino-acid insertion in MuLV IN is 5 aa upstream. However, the closest insertion downstream of 5487 is at 5535, 16 aa C-terminal, so a more saturated library within this region would be required to define that boundary. The deletion study that identified a stable C-terminal construct mapped directly within this region, supporting this as a domain boundary. The 19-amino-acid insertions within HIV-1 IN provide additional insights into these boundaries. A panel of insertional mutants within HIV-1 IN α6′ showed a gradient of increasing disintegration activity, with E212↓L active for both disintegration and integration. Insertion E212↓L maps within the 12-aa region homologous to MuLV (IATDIQFKELQKQI, where EL is underlined) (18). C-terminal to the observed bend, insertions were tolerated in both systems: 19-amino-acid insertions in HIV-1 IN in vitro and 5-amino-acid insertions in MuLV IN in vivo. The 19-aa insertion D207↓I maps within the region homologous to MuLV IN (IATDIQFKELQKQI, with the DI insertion site underlined), yet it is not active for disintegration or strand-transfer activity. Thus, differences in the boundaries between HIV-1 and MuLV IN were identified. These may reflect the difference in insertion size (5 amino acids are tolerated, whereas 19 are not) or structural differences in the assembly of IN multimers.

In both the MuLV and HIV IN studies, the results indicate considerable flexibility in the linkage between the catalytic core and C-terminal domain, which tolerates lengthening of the distance between the two domains and/or increased discontinuity of the extended alpha-helix. It is not known whether insertions into the long α6′ helix that connects the core and C terminus present a favorable condition for the virus. In a related insertional study of the Cre recombinase, insertions into the M-N linker increased DNA-binding cooperativity (71). In that system, it was proposed that extending the length of the linker would lead to a smaller bend angle and thus stabilize partner Cre subunits bound to loxP. In a similar manner, extending the distance between the core and C terminus of IN may assist in the assembly of the synaptic complex consisting of the two viral termini plus the target DNA. The arrangement of the C-terminal domain relative to the catalytic core differs among the HIV-1, simian immunodeficiency virus type 1, and Rous sarcoma virus IN X-ray structures (18, 19, 94).

The results of the linker insertions into the MuLV RT connection and RNase H domains were unexpected, as no viable mutations outside the extreme C terminus were identified. Figure 6 contains a molecular model of MuLV RT, based on the structure of MuLV RT (1RW3; 443 residues, encoding through residue 474) plus the model of the MuLV RNase H ΔC domain (54). To assist in mapping the linker insertions, the structurally undefined and deleted regions were reconstructed into this model as tubes. These include the region within the thumb (residues 327 to 334), the region in the connection domain downstream of residue 474 through to the structurally unresolved region within RNase H (residues 475 to 504), the α-C helix of RNase H, the region homologous to the His loop (23) of HIV-1 RNase H (residues H634 to H642), and residues 592 to 603 of RNase H. The function of the large structurally undefined region between residues 474 and 504 is of


interest. Domain mapping using in vitro RT activities (85, 86) placed the N terminus of RNase H at position 4542 of the DNA provirus (4093 of the viral RNA) (82). Therefore, in4100 localizes within the structurally undefined N terminus of RNase H, and in4113 at the beginning of the structured region of RNase H. The in vivo data presented in this paper correlate with the in vitro data, indicating that the N terminus of RNase H, despite being structurally undefined, is essential for RNase H activity. In addition, in4023 maps within the structurally undefined region of the RT connection domain. In the molecular model, residues 475 to 504 were placed on the face of the RT molecule opposite the nucleic acid-binding site, and this region was therefore thought to be nonessential for RT. However, in4023 was found to be nonviable in vivo. Interestingly, insertions within this region (M38, H7, and H2) (85) were temperature sensitive for RT activity in vitro. Conformational changes within this region may be required for switching between the polymerase and RNase H activities or for steric access to the active sites. Similarly, in Cre, flexible loops that were not tolerant of insertions were identified, indicating a role in Cre function, possibly protein assembly or DNA binding. The function of these structurally uncharacterized loops in both RT and IN remains to be defined. The intrinsic flexibility of both enzymes may reflect the multifunctional activities and staged assembly steps required to specifically bind and recognize their cognate substrates (24, 51).

One aim of this mutational analysis was to identify sites within the IN protein that may tolerate small insertional tags whose function could alter the target site selection of the viral integrases. Protein domains and tags have been inserted into both the N terminus (11, 48, 84) and the C terminus (13, 36, 48, 80, 81, 84) of retroviral IN constructs. The finding that the region between the N terminus, the core, and the C terminus of IN remains functional in the presence of a variety of linker insertions strongly suggests that this region could serve as a third potential insertion site for short tags within the IN protein. Whether this site can function in alternative protein-protein or protein-DNA interactions depends on its accessibility within the synaptic complex. Further biochemical and structural studies are required to address this question.

ACKNOWLEDGMENTS

This work was supported by NIH grants R01 GM070837 to M.J.R. and GM07666-24 to C.B.J.

We thank Jennifer Jones and Naadira McClean for their assistance.

REFERENCES

1. Albritton, L. M., L. Tseng, D. Scadden, and J. M. Cunningham. 1989. A putative murine retrovirus receptor gene encodes a multiple membrane-spanning protein and confers susceptibility to virus infection. Cell 57:659–666.

2. Asante-Appiah, E., S. H. Seeholzer, and A. M. Skalka. 1998. Structural determinants of metal-induced conformational changes in HIV-1 integrase. J. Biol. Chem. 273:35078–35087.

3. Asante-Appiah, E., and A. Skalka. 1997. A metal-induced conformational change and activation of HIV-1 integrase. J. Biol. Chem. 272:16196–16205.

4. Auerbach, M., C. Shu, A. Kaplan, and I. Singh. 2003. Functional characterization of a portion of the Moloney murine leukemia virus gag gene by genetic footprinting. Proc. Natl. Acad. Sci. USA 100:11929–11930.

5. Biery, M. C., F. J. Stewart, A. E. Stellwagen, E. A. Raleigh, and N. L. Craig. 2000. A simple in vitro Tn7-based transposition system with low target site selectivity for genome and gene analysis. Nucleic Acids Res. 28:1067–1077.

6. Bowerman, B., P. O. Brown, J. M. Bishop, and H. E. Varmus. 1989. A nucleoprotein complex mediates the integration of retroviral DNA. Genes Dev. 3:469–478.

7. Bujacz, G., J. Alexandratos, Z. Qing, C. Clement-Mella, and A. Wlodawer. 1996. The catalytic domain of human immunodeficiency virus integrase: ordered active site in the F185H mutant. FEBS Lett. 398:175–178.

8. Bujacz, G., M. Jaskolski, J. Alexandratos, A. Wlodawer, G. Merkel, R. A. Katz, and A. M. Skalka. 1995. High-resolution structure of the catalytic domain of avian sarcoma virus integrase. J. Mol. Biol. 253:333–346.

9. Bukrinsky, M. I., N. Sharova, T. L. McDonald, T. Pushkarskaya, W. G. Tarpley, and M. Stevenson. 1993. Association of integrase, matrix, and reverse transcriptase antigens of human immunodeficiency virus type 1 with viral nucleic acids following acute infection. Proc. Natl. Acad. Sci. USA 90:6125–6129.

10. Bushman, F. 1995. Targeting retroviral integration. Science 267:1443–1444.

11. Bushman, F. D. 1994. Tethering human immunodeficiency virus 1 integrase to a DNA site directs integration to nearby sequences. Proc. Natl. Acad. Sci. USA 91:9233–9237.

12. Bushman, F. D., A. Engelman, I. Palmer, P. Wingfield, and R. Craigie. 1993. Domains of the integrase protein of human immunodeficiency virus type 1 responsible for polynucleotidyl transfer and zinc binding. Proc. Natl. Acad. Sci. USA 90:3428–3432.

13. Bushman, F. D., and M. D. Miller. 1997. Tethering human immunodeficiency virus type 1 preintegration complexes to target DNA promotes integration at nearby sites. J. Virol. 71:458–464.

14. Cai, M., R. Zheng, M. Caffrey, R. Craigie, G. M. Clore, and A. M. Gronenborn. 1997. Solution structure of the N-terminal zinc binding domain of HIV-1 integrase. Nat. Struct. Biol. 4:567–577.

15. Calmels, B., C. Ferguson, M. O. Laukkanen, R. Adler, M. Faulhaber, H.-J. Kim, S. Sellers, P. Hematti, M. Schmidt, C. von Kalle, K. Akagi, R. E. Donahue, and C. E. Dunbar. 2005. Recurrent retroviral vector integration at the MDS1-EVI1 locus in non-human primate hematopoietic cells. Blood 106:2530–2533.

16. Chen, A., I. T. Weber, R. W. Harrison, and J. Leis. 2006. Identification of amino acids in HIV-1 and avian sarcoma virus integrase subsites required for specific recognition of the long terminal repeat ends. J. Biol. Chem. 281:4173–4182.

17. Chen, H., and A. Engelman. 1998. The barrier-to-autointegration protein is a host factor for HIV type 1 integration. Proc. Natl. Acad. Sci. USA 95:15270–15274.

18. Chen, J. C.-H., J. Krucinski, L. J. W. Miercke, J. S. Finer-Moore, A. H. Tang, A. D. Leavitt, and R. M. Stroud. 2000. Crystal structure of the HIV-1 integrase catalytic core and C-terminal domains: a model for viral DNA binding. Proc. Natl. Acad. Sci. USA 97:8233–8238.

19. Chen, Z., Y. Yan, S. Munshi, Y. Li, J. Zugay-Murphy, B. Xu, M. Witmer, P. Felock, A. Wolfe, V. Sardana, E. A. Emini, D. Hazuda, and L. C. Kuo. 2000. X-ray structure of simian immunodeficiency virus integrase containing the core and C-terminal domain (residues 50–293): an initial glance of the viral DNA-binding platform. J. Mol. Biol. 296:521–533.

20. Cherepanov, P., A. Ambrosio, S. Rahman, T. Ellenberger, and A. Engelman. 2005. Structural basis for the recognition between HIV-1 integrase and transcriptional coactivator p75. Proc. Natl. Acad. Sci. USA 102:17308–17313.

21. Das, D., and M. Georgiadis. 2004. The crystal structure of the monomeric reverse transcriptase from Moloney murine leukemia virus. Structure (Cambridge) 12:819–829.

22. Dave, U. P., N. A. Jenkins, and N. G. Copeland. 2004. Gene therapy insertional mutagenesis insights. Science 303:333.

23. Davies, J. F., III, Z. Hostomska, Z. Hostomsky, S. R. Jordan, and D. A. Matthews. 1991. Crystal structure of the ribonuclease H domain of HIV-1 reverse transcriptase. Science 252:88–95.

24. Dayam, R., and N. Neamati. 2004. Active site binding modes of the beta-diketoacids: a multi-active site approach in HIV-1 integrase inhibitor design. Bioorg. Med. Chem. 12:6371–6381.

25. Drelich, M., R. Wilhelm, and J. Mous. 1992. Identification of amino acid residues critical for endonuclease and integration activities of HIV-1 IN protein in vitro. Virology 188:459–468.

26. Dyda, F., A. B. Hickman, T. M. Jenkins, A. Engelman, R. Craigie, and D. R. Davies. 1994. Crystal structure of the catalytic domain of HIV-1 integrase: similarity to other polynucleotidyl transferases. Science 266:1981–1986.

27. Engelman, A. 1999. In vivo analysis of retroviral integrase structure and function. Adv. Virus Res. 52:411–426.

28. Farnet, C. M., and W. A. Haseltine. 1991. Determination of viral proteins present in human immunodeficiency virus type 1 preintegration complex. J. Virol. 65:1910–1915.

29. Fassati, A., and S. P. Goff. 2001. Characterization of intracellular reverse transcription complexes of human immunodeficiency virus type 1. J. Virol. 75:3626–3635.

30. Fassati, A., and S. P. Goff. 1999. Characterization of intracellular reverse transcription complexes of Moloney murine leukemia virus. J. Virol. 73:8919–8925.

31. Felkner, R. H., and M. J. Roth. 1992. Mutational analysis of N-linked glycosylation sites of the SU protein of Moloney murine leukemia virus. J. Virol. 66:4258–4264.

32. Goedken, E., and S. Marqusee. 2001. Co-crystal of Escherichia coli RNase HI with Mn2+ ions reveals two divalent metals bound in the active site. J. Biol. Chem. 276:7266–7271.

33. Goff, S. P., P. Traktman, and D. Baltimore. 1981. Isolation and properties of Moloney murine leukemia virus mutants: use of a rapid assay for release of virion reverse transcriptase. J. Virol. 38:239–248.

34. Goldgur, Y., R. Craigie, G. H. Cohen, T. Fujiwara, T. Yoshinaga, T. Fujishita, H. Sugimoto, T. Endo, H. Murai, and D. R. Davies. 1999. Structure of the HIV-1 integrase catalytic domain complexed with an inhibitor: a platform for antiviral drug design. Proc. Natl. Acad. Sci. USA 96:13040–13043.

35. Goldgur, Y., F. Dyda, A. B. Hickman, T. M. Jenkins, R. Craigie, and D. R. Davies. 1998. Three new structures of the core domain of HIV-1 integrase: an active site that binds magnesium. Proc. Natl. Acad. Sci. USA 95:9150–9154.

36. Goulaouic, H., and S. A. Chow. 1996. Directed integration of viral DNA mediated by fusion proteins consisting of human immunodeficiency virus type 1 integrase and Escherichia coli LexA protein. J. Virol. 70:37–46.

37. Greenwald, J., V. Le, S. Butler, F. Bushman, and S. Choe. 1999. The mobility of an HIV-1 integrase active site loop is correlated with catalytic activity. Biochemistry 38:8892–8898.

38. Hacein-Bey-Abina, S., C. von Kalle, M. Schmidt, M. P. McCormack, N. Wulffraat, P. Leboulch, A. Lim, C. S. Osborne, R. Pawliuk, E. Morillon, R. Sorensen, A. Forster, P. Fraser, J. I. Cohen, G. de Saint Basile, I. Alexander, U. Wintergerst, T. Frebourg, A. Aurias, D. Stoppa-Lyonnet, S. Romana, I. Radford-Weiss, F. Gross, F. Valensi, E. Delabesse, E. Macintyre, F. Sigaux, J. Soulier, L. E. Leiva, M. Wissler, C. Prinz, T. H. Rabbitts, F. Le Deist, A. Fischer, and M. Cavazzana-Calvo. 2003. LMO2-associated clonal T cell proliferation in two patients after gene therapy for SCID-X1. Science 302:415–419.

39. Hansen, M. S., and F. D. Bushman. 1997. Human immunodeficiency virus type 2 preintegration complexes: activities in vitro and response to inhibitors. J. Virol. 71:3351–3356.

40. Hehl, E. A., P. Joshi, G. V. Kalpana, and V. R. Prasad. 2004. Interaction between human immunodeficiency virus type 1 reverse transcriptase and integrase proteins. J. Virol. 78:5056–5067.

41. Hirt, B. 1967. Selective extraction of polyoma DNA from infected mouse cell cultures. J. Mol. Biol. 26:365–371.

42. Hoffman, L., J. Jendrisak, R. Meis, I. Goryshin, and S. Reznikoff. 2000. Transposome insertional mutagenesis and direct sequencing of microbial genomes. Genetica 108:19–24.

43. Hyde, C. C., F. D. Bushman, T. C. Mueser, and Z.-N. Yang. 1999. Crystal structure of an active two-domain derivative of Rous sarcoma virus integrase. J. Mol. Biol. 296:535–538.

44. Johnson, M. S., M. A. McClure, D. F. Feng, J. Gray, and R. F. Doolittle. 1986. Computer analysis of retroviral pol genes: assignment of enzymatic functions to specific sequences and homologies with nonviral enzymes. Proc. Natl. Acad. Sci. USA 83:7648–7652.

45. Jones, T. A., J. Y. Zou, S. W. Cowan, and M. Kjeldgaard. 1991. Improved methods of building protein models in electron density maps and the location of errors in these models. Acta Crystallogr. A 47:110–119.

46. Jonsson, C. B., G. A. Donzella, E. Gaucan, C. M. Smith, and M. J. Roth. 1996. Functional domains of Moloney murine leukemia virus integrase defined by mutation and complementation analysis. J. Virol. 70:4585–4597.

47. Jonsson, C. B., and M. J. Roth. 1993. Role of the His-Cys finger of Moloney murine leukemia virus integrase protein in integration and disintegration. J. Virol. 67:5562–5571.

48. Katz, R. A., G. Merkel, and A. M. Skalka. 1996. Targeting of retroviral integrase by fusion to a heterologous DNA binding domain: in vitro activities and incorporation of a fusion protein into viral particles. Virology 217:178–190.

49. Kraulis, P. J. 1991. MOLSCRIPT: a program to produce both detailed and schematic plots of protein structures. J. Appl. Crystallogr. 24:946–950.

50. Laurent, L. C., M. N. Olsen, R. A. Crowley, H. Savilahti, and P. O. Brown. 2000. Functional characterization of the human immunodeficiency virus type 1 genome by genetic footprinting. J. Virol. 74:2760–2769.

51. Lee, M. C., J. Deng, J. M. Briggs, and Y. Duan. 2005. Large scale conformational dynamics of the HIV-1 integrase core domain and its catalytic loop mutants. Biophys. J. 88:3133–3146.

52. Lee, M. S., and R. Craigie. 1998. A previously unidentified host protein protects retroviral DNA from autointegration. Proc. Natl. Acad. Sci. USA 95:1528–1533.

53. Li, L., C. M. Farnet, W. F. Anderson, and F. D. Bushman. 1998. Modulation of activity of Moloney murine leukemia virus preintegrative complexes by host factors in vitro. J. Virol. 72:2125–2131.

54. Lim, D. 2001. Functional and structural analysis of the RNaseH domain of the Moloney murine leukemia virus reverse transcriptase. Ph.D. dissertation. Columbia University, New York, N.Y.

55. Lin, C.-W., and A. Engelman. 2003. The barrier-to-autointegration factor is a component of functional human immunodeficiency virus type 1 preintegration complexes. J. Virol. 77:5030–5036.

56. Llano, M., M. Vanegas, O. Fregoso, D. Saenz, S. Chung, M. Peretz, and E. M. Poeschla. 2004. LEDGF/p75 determines cellular trafficking of diverse lentiviral but not murine oncoretroviral integrase proteins and is a component of functional lentiviral preintegration complexes. J. Virol. 78:9524–9537.

57. Lobel, L. I., and S. P. Goff. 1984. Construction of mutants of Moloney murine leukemia virus by suppressor-linker insertion mutagenesis: positions of viable insertion mutations. Proc. Natl. Acad. Sci. USA 81:4149–4153.

58. Lu, R., H. Z. Ghory, and A. Engelman. 2005. Genetic analyses of conserved residues in the carboxyl-terminal domain of human immunodeficiency virus type 1 integrase. J. Virol. 79:10356–10368.

59. Lu, R., A. Limon, E. Devroe, P. A. Silver, P. Cherepanov, and A. Engelman. 2004. Class II integrase mutants with changes in putative nuclear localization signals are primarily blocked at a postnuclear entry step of human immunodeficiency virus type 1 replication. J. Virol. 78:12735–12746.

60. Lu, R., A. Limon, H. Z. Ghory, and A. Engelman. 2005. Genetic analyses of DNA-binding mutants in the catalytic core domain of human immunodeficiency virus type 1 integrase. J. Virol. 79:2493–2505.

61. Lutzke, R. A. P., and R. H. A. Plasterk. 1998. Structure-based mutational analysis of the C-terminal DNA-binding domain of human immunodeficiency virus type 1 integrase: critical residues for protein oligomerization and DNA binding. J. Virol. 72:4841–4848.

62. McClure, M. A., M. S. Johnson, D.-F. Feng, and R. F. Doolittle. 1988. Sequence comparisons of retroviral proteins: relative rates of change and general phylogeny. Proc. Natl. Acad. Sci. USA 85:2469–2473.

63. McCormack, M., A. Forster, L. Drynan, R. Pannell, and T. H. Rabbitts. 2003. The LMO2 T-cell oncogene is activated via chromosomal translocations or retroviral insertion during gene therapy but has no mandatory role in normal T-cell development. Mol. Cell. Biol. 23:9003–9013.

64. McCutchan, J. H., and J. S. Pagano. 1968. Enhancement of the infectivity of simian virus 40 deoxyribonucleic acid with diethylaminoethyl-dextran. J. Natl. Cancer Inst. 41:351–357.

65. Merritt, E. A., and D. J. Bacon. 1997. Raster3D: photorealistic molecular graphics. Methods Enzymol. 277:505–524.

66. Miller, M. D., C. M. Farnet, and F. D. Bushman. 1997. Human immunodeficiency virus type 1 preintegration complexes: studies of organization and composition. J. Virol. 71:5382–5390.

67. Nermut, M. V., and A. Fassati. 2003. Structural analyses of purified human immunodeficiency virus type 1 intracellular reverse transcription complexes. J. Virol. 77:8196–8206.

68. O’Reilly, L., and M. J. Roth. 2000. Second-site changes affect viability of amphotropic/ecotropic chimeric enveloped murine leukemia viruses. J. Virol. 74:899–913.

69. Oz-Gleenberg, I., O. Avidan, Y. Goldgur, A. Herschhorn, and A. Hizi. 2005. Peptides derived from the reverse transcriptase of human immunodeficiency virus type 1 as novel inhibitors of the viral integrase. J. Biol. Chem. 280:21987–21996.

70. Pearlman, D. A., D. A. Case, J. W. Caldwell, W. R. Ross, T. E. Cheatham III, S. DeBolt, D. Ferguson, G. Seibel, P. Kollman, and P. Amber. 1995. AMBER, a computer program for applying molecular mechanics, normal mode analysis, molecular dynamics and free energy calculations to elucidate the structure and energies of molecules. Comput. Phys. Commun. 91:1–41.

71. Petyuk, V., J. McDermott, M. Cook, and B. Sauer. 2004. Functional mapping of Cre recombinase by pentapeptide insertional mutagenesis. J. Biol. Chem. 279:37040–37048.

72. Podtelezhnikov, A., K. Gao, F. Bushman, and J. McCammon. 2003. Modeling HIV-1 integrase complexes based on their hydrodynamic properties. Biopolymers 68:110–120.

73. Quinonez, R., I. Sinha, I. R. Singh, and R. E. Sutton. 2003. Genetic footprinting of the HIV co-receptor CCR5: delineation of surface expression and viral entry determinants. Virology 307:98–115.

74. Roth, M. J. 1991. Mutational analysis of the carboxyl terminus of the Moloney murine leukemia virus integration protein. J. Virol. 65:2141–2145.

75. Roth, M. J., P. Schwartzberg, N. Tanese, and S. P. Goff. 1990. Analysis of mutations in the integration function of Moloney murine leukemia virus: effects on DNA binding and cutting. J. Virol. 64:4709–4717.

76. Roth, M. J., N. Tanese, and S. P. Goff. 1988. Gene product of Moloney murine leukemia virus required for proviral integration is a DNA-binding protein. J. Mol. Biol. 203:131–139.

77. Roth, M. J., N. Tanese, and S. P. Goff. 1985. Purification and characterization of murine retroviral reverse transcriptase expressed in Escherichia coli. J. Biol. Chem. 260:9326–9335.

78. Rothenberg, S. M., M. N. Olsen, L. C. Laurent, R. A. Crowley, and P. O. Brown. 2001. Comprehensive mutational analysis of the Moloney murine leukemia virus envelope protein. J. Virol. 75:11851–11862.

79. Schwartzberg, P., M. Roth, N. Tanese, and S. Goff. 1993. Analysis of a temperature-sensitive mutation affecting the integration protein of Moloney murine leukemia virus. Virology 192:673–678.

80. Seamon, J. A., M. Adams, S. Sengupta, and M. J. Roth. 2000. Differential effects of C-terminal molecular tagged integrase on replication competent Moloney-murine leukemia virus. Virology 274:412–419.

81. Seamon, J. A., C. Miller, K. S. Jones, and M. J. Roth. 2002. Inserting nuclear targeting signals onto a replication-competent M-MuLV affects viral export and is not sufficient for cell cycle independent infection. J. Virol. 76:8475–8484.

82. Shinnick, T. M., R. A. Lerner, and J. G. Sutcliffe. 1981. Nucleotide sequence of Moloney murine leukaemia virus. Nature 293:543–548.

83. Singh, I. R., R. A. Crowley, and P. O. Brown. 1997. High-resolution functional mapping of a cloned gene by genetic footprinting. Proc. Natl. Acad. Sci. USA 94:1304–1309.

84. Tan, W., K. Zhu, D. J. Segal, C. F. Barbas III, and S. A. Chow. 2004. Fusion proteins consisting of human immunodeficiency virus type 1 integrase and the designed polydactyl zinc finger protein E2C direct integration of viral DNA into specific sites. J. Virol. 78:1301–1313.

85. Tanese, N., and S. P. Goff. 1988. Domain structure of the Moloney murine leukemia virus reverse transcriptase: mutational analysis and separate expression of the DNA polymerase and RNase H activities. Proc. Natl. Acad. Sci. USA 85:1777–1781.

86. Tanese, N., A. Telesnitsky, and S. P. Goff. 1991. Abortive reverse transcription by mutants of Moloney murine leukemia virus deficient in the reverse transcriptase-associated RNase H function. J. Virol. 65:4387–4397.

87. Wang, J.-Y., H. Ling, W. Yang, and R. Craigie. 2001. Structure of a two-domain fragment of HIV-1 integrase: implications for domain organization in the intact protein. EMBO J. 20:7333–7343.

88. Wang, T., M. Balakrishnan, and C. B. Jonsson. 1999. Major and minor groove contacts in retroviral integrase-LTR interactions. Biochemistry 38:3624–3632.

89. Wilhelm, M., and F.-X. Wilhelm. 2005. Role of integrase in reverse transcription of the Saccharomyces cerevisiae retrotransposon Ty1. Eukaryot. Cell 4:1057–1065.

90. Wu, W., H. Lui, L. Xiao, J. A. Conway, E. Hehl, G. V. Kalpana, V. Prasad, and J. C. Kappes. 1999. Human immunodeficiency virus type 1 integrase protein promotes reverse transcription through specific interactions with the nucleoprotein reverse transcription complex. J. Virol. 73:2126–2135.

91. Yang, F., O. Leon, N. J. Greenfield, and M. J. Roth. 1999. Functional interactions of the HHCC domain of Moloney murine leukemia virus integrase revealed by non-overlapping complementation and zinc dependent dimerization. J. Virol. 73:1809–1817.

92. Yang, F., J. A. Seamon, and M. J. Roth. 2001. Mutational analysis of the N-terminus of Moloney murine leukemia virus integrase. Virology 291:32–45.

93. Yang, W., W. A. Hendrickson, R. J. Crouch, and Y. Satow. 1990. Structure of ribonuclease H phased at 2 Å resolution by MAD analysis of the selenomethionyl protein. Science 249:1398–1405.

94. Yang, Z. N., T. C. Mueser, F. D. Bushman, and C. C. Hyde. 2000. Crystal structure of an active two-domain derivative of Rous sarcoma virus integrase. J. Mol. Biol. 296:535–548.

95. Yi, J., J. Arthur, R. Dunbrack, Jr., and A. Skalka. 2000. An inhibitory monoclonal antibody binds at the turn of the helix-turn-helix motif in the N-terminal domain of HIV-1 integrase. J. Biol. Chem. 275:38739–38748.

96. Zhu, K., C. Dobard, and S. A. Chow. 2004. Requirement for integrase during reverse transcription of human immunodeficiency virus type 1 and the effect of cysteine mutations of integrase on its interactions with reverse transcriptase. J. Virol. 78:5045–5055.
