Secondary structure prediction and structure-specific sequence analysis of single-stranded DNA

3248–3257 Nucleic Acids Research, 2001, Vol. 29, No. 15 © 2001 Oxford University Press

Secondary structure prediction and structure-specificsequence analysis of single-stranded DNAFang Dong, Hatim T. Allawi, Todd Anderson, Bruce P. Neri and Victor I. Lyamichev*

Third Wave Technologies Inc., 502 South Rosa Road, Madison, WI 53719-1256, USA

Received March 28, 2001; Revised and Accepted June 12, 2001

ABSTRACT

DNA sequence analysis by oligonucleotide bindingis often affected by interference with the secondarystructure of the target DNA. Here we describe anapproach that improves DNA secondary structureprediction by combining enzymatic probing of DNAby structure-specific 5′-nucleases with an energyminimization algorithm that utilizes the 5′-nucleasecleavage sites as constraints. The method can iden-tify structural differences between two DNAmolecules caused by minor sequence variationssuch as a single nucleotide mutation. It also demon-strates the existence of long-range interactionsbetween DNA regions separated by >300 nt and theformation of multiple alternative structures by a244 nt DNA molecule. The differences in thesecondary structure of DNA molecules revealed by5′-nuclease probing were used to design structure-specific probes for mutation discrimination thattarget the regions of structural, rather thansequence, differences. We also demonstrate theperformance of structure-specific ‘bridge’ probescomplementary to non-contiguous regions of thetarget molecule. The structure-specific probes do notrequire the high stringency binding conditionsnecessary for methods based on mismatch forma-tion and permit mutation detection at temperaturesfrom 4 to 37°C. Structure-specific sequence analysisis applied for mutation detection in the Mycobacteriumtuberculosis katG gene and for genotyping of thehepatitis C virus.

INTRODUCTION

Sequence analysis of nucleic acids by oligonucleotide bindinghas traveled a long road from the pioneer works by Southern(1) and Wallace et al. (2) to an idea of large-scale sequenceanalysis with arrays of short oligonucleotides (3–5) and thedevelopment of arrays containing up to hundreds of thousandsof different oligonucleotides (6). The basic idea of thesemethods is that the efficiency of probe binding depends onsimilarity between the target and probe, with the most stableduplexes being formed by fully complementary sequences.Any variations in the target sequence, such as substitutions,

deletions or insertions, produce imperfect, less stable duplexes.This differential stability between the perfect and imperfectduplexes can be used as a means for DNA sequencing andmutation detection.

To achieve maximal mutation discrimination, the oligo-nucleotide binding should be performed under high stringencyconditions, so that the perfect duplex remains stable but themismatched duplex is unstable. Depending on the type ofmismatch, its position in the duplex and probe length, a singlemismatch will decrease the stability of the 16–22 nt probestypically used for mutation analysis by 3–10°C (7,8). There-fore, for successful mutation detection, any parametersaffecting duplex stability, such as temperature, pH or concen-trations of salt and destabilizing agents, must be carefullyselected and tightly controlled during the analysis.

The secondary structure of nucleic acids is another well-documented factor that affects probe binding for both DNAand RNA molecules (9–16). Formation of secondary structurecan reduce the binding constant of a specific probe by as muchas 105–106 (11). This effect can obscure the relation betweenprobe affinity and its similarity to the target. Although highstringency conditions reduce the effect of secondary structureson probe binding (9), decreasing the length of target moleculesby additional chemical, enzymatic or thermal fragmentation isusually required to minimize the probability of secondarystructure formation (6). Despite these measures, secondarystructure still remains an unknown parameter that complicatessequence analysis (17). Evaluating this parameter is possibleonly if detailed knowledge of secondary structures is available.

Most experimental methods for secondary structure analysisrely on the differential reactivity of single- and double-stranded regions of nucleic acids with chemical or enzymaticagents (18,19). However, deducing structural information fromthese data is complicated by the fact that such reactivity isusually sequence dependent and is averaged over an ensembleof possible structures adopted by the molecule. Among compu-tational methods, the comparative or phylogenetic methods forsecondary structure prediction (20) provide the most reliableinformation, although they apply only to families of sequenceswith the same function. Secondary structure prediction for anygiven sequence relies on energy minimization algorithms, suchas the well-known mfold program (21). The probability ofcorrectly predicting a secondary structure by these algorithmsis quite poor because of the limitations of the mathematicalmodel and the uncertainties in the thermodynamic parametersused in these methods. To partially overcome this problem,mfold predicts multiple suboptimal structures with close free

*To whom correspondence should be addressed. Tel: +1 608 663 7017; Fax: +1 608 273 8618; Email: [email protected]

Nucleic Acids Research, 2001, Vol. 29, No. 15 3249

energy values. To select likely actual candidate structures thusrequires further analysis.

Incorporating constraint parameters obtained from experi-mental data into computational methods used to predictsecondary structures can greatly improve the results (22), butthis approach is limited by the sensitivity of current structureprobing techniques. In this work, we have used enzymaticcleavage of single-stranded DNA with the 5′-nuclease TaqExoderived from Taq DNA polymerase I (23,24) to obtain infor-mation about hairpin structures formed by DNA molecules.TaqExo specifically recognizes hairpin structures with stemduplexes longer than 6 bp and cleaves them between the firsttwo base pairs at the 5′-end of the hairpin, thus creating apattern of fragments unique for each sequence. It has beenpreviously demonstrated (23) that TaqExo fragment patternsare extremely sensitive to small changes in the secondarystructure of DNA, thus providing a useful tool to detect thepoint mutations responsible for such changes.

We used TaqExo cleavage sites as mfold constraint parametersto determine the secondary structures of Mycobacteriumtuberculosis wild-type and mutant katG genes and hepatitisC virus (HCV) cDNAs. We demonstrate that a single mutationcan significantly alter the folding of DNA molecules, as hasbeen previously observed for RNA molecules (25), and thateven relatively short molecules can adopt multiple mutuallyexclusive conformations. The revealed structural differencesbetween similar DNA molecules were used to design structure-specific probes for mutation discrimination that target theconformationally different regions rather than the mismatchsites. We also designed structure-specific ‘bridge’ probes thatbind to two non-contiguous regions in the target moleculewhose relative positions are strongly affected by mutations.The bridge probes are similar to tethered (26,27) and stem-bridging oligonucleotides (28), which have been previouslydescribed for recognition of structured RNA and DNA molecules.The bridge probes we designed, based upon the TaqExo/mfoldsecondary structure prediction, showed a level of mutationdiscrimination similar to or exceeding that of linear mismatchdiscriminating probes. Finally, structure-specific probebinding requires no target fragmentation and can be performedunder low stringency conditions that favor secondary structureformation, e.g. room temperature, reducing the adverse effectsof temperature and binding buffer variations on mutationdiscrimination.

MATERIALS AND METHODS

Materials

Chemicals and buffers were from Fisher Scientific unlessotherwise noted. Restriction enzymes were purchased fromNew England Biolabs. PCR amplification was done using aGeneAmp kit with AmpliTaq DNA polymerase (PerkinElmer). TaqExo and MjaFEN 5′-nucleases were prepared asdescribed (23,29).

Oligonucleotide synthesis and purification

All oligonucleotides were synthesized on an Expedite 8909synthesizer (PerSeptive Biosystems) using standard phos-phoramidite chemistry including biotin, fluorescein (Fl), andtetrachlorofluorescein (TET) modifications (Glen Research)

and purified as described previously (30). Oligonucleotideconcentrations were determined by measuring absorption at260 nm and using specific extinction coefficients for A, T, Gand C (31).

Mycobacterium tuberculosis katG gene DNA fragments

Genomic DNAs isolated from wild-type and mutant isoni-azide-resistant strains of M.tuberculosis were the gift ofDr Cockerill (Mayo Clinic). A fragment of the catalase–peroxidase (katG) gene (GenBank accession no. U06263)corresponding to codons 302–507 was PCR amplified for eachgenomic DNA with sense and antisense strand primers 5′-AGCTCGTATGGCACCGGAAC and 5′-TTGACCTCCCA-CCCGACTTG, respectively, and cloned into a TA vector(Invitrogen). Presence of the G→C mutation at position 41(G41C) of the mutant fragment was confirmed by sequencing.The 379, 391, 423 and 504 bp katG DNA fragments weregenerated by PCR amplification of the cloned fragments withthe same sense strand primer labeled with Fl or TET at the 5′-endand one of the antisense strand primers 5′-CAAGGTATC-TGGCAAGGGGA, 5′-GGACCAGCGGCCCAAGGTAT, 5′-GACCGGATCCTGCCACAGCA or 5′-GACAGTCAATCC-CGATGCCC, respectively. For the wild-type and mutant391 bp katG DNA fragment carrying the C385G substitutionthe antisense strand primer 5′-GGACCACCGGCCCAAG-GTAT was used. The 423 bp katG DNA fragment, internallylabeled with dUTP-Fl (Roche Molecular Biochemicals), was PCRamplified as described above, except that a mixture of 150 µMdTTP and 50 µM dUTP-Fl was used instead of 200 µM dTTP.The PCR products were purified by denaturing gel electro-phoresis as described previously (23).

HCV 5′-untranslated region (5′-UTR) DNA fragments

The 244 bp DNA fragments of the 5′-UTR of HCV genotypes1a, 1b, 2a/c and 3a were PCR amplified with negative strandprimer 5′-CTCGCAAGCACCCTATCAGGCAGT labeledwith biotin or TET at the 5′-end, and positive strand primer 5′-GCAGAAAGCGTCTAGCCATGGCGT using in-house HCVcDNA clones. The PCR products were purified as describedabove.

Sequencing reactions

Sequencing reactions were performed with a Thermosequenasekit (Amersham Pharmacia Biotech) using 250–500 ng PCRproduct as template and 0.25 µM sequencing primers labeledwith TET at the 5′-end. The sequencing products were resolvedon an 8% denaturing polyacrylamide gel and scanned on anFMBIO-100 fluorescence scanner (Hitachi) using a 585 nmemission filter.

TaqExo probing of DNA secondary structure

This was performed as described previously (23). Briefly,100–500 fmol PCR product, with the analyzed strand labeledwith TET at the 5′-end, was heat denatured in 13 µl of 5 mMMOPS, pH 7.5, for 15 s at 95°C and then cooled to 55°C. Thereaction samples were mixed with 7 µl of a solution containing1 µl of 25 ng/µl TaqExo, 2 µl of 100 mM MOPS, pH 7.5, 0.5%Tween-20, 0.5% Nonidet-P40, 2 µl of 2 mM MnCl2, 2 µl ofwater and the mixtures were then incubated for 90 s at 55°C.The reactions were terminated by addition of 16 µl of 95%formamide, 10 mM EDTA, pH 8.0, 0.05% crystal violet

3250 Nucleic Acids Research, 2001, Vol. 29, No. 15

(Sigma). Aliquots of TaqExo cleavage products (5 µl) wereresolved on a 10% denaturing polyacrylamide gel and scannedon an FMBIO-100 fluorescence scanner (Hitachi) using a 585 nmemission filter. A 25 bp DNA ladder (Promega) labeled at the5′-end with a FluoroReporter Bodipy TMR-C5 oligonucleotidelabeling kit (Molecular Probes) was used as molecular weightmarkers.

Secondary structure prediction using mfold and TaqExoconstraints

DNA folding was performed at the DNA mfold server(bioinfo.math.rpi.edu/~mfold/dna/) using DNA free energyparameters (7). Each TaqExo cleavage site, selected as aconstraint in DNA folding, was encoded according to theconstraint options provided by the program. For example, acleavage site at position n was represented by parameters F n 0 2to force the nucleotides located directly 3′ and 5′ of thecleavage site to be base paired and P n – n + 1 1 – n + 16 toprohibit base pairing of nucleotides n and n + 1 with a region1 – n + 16, and therefore to ensure that in the predicted struc-tures the cleavage site n is located at the 5′-end of a hairpinwith an at least 7 bp duplex region. Compatibility betweenconstraints was investigated by a trial-and-error method andthe sets of compatible constraints were determined manually.

Oligonucleotide probe binding under low stringencyconditions

For each binding reaction, the 423 bp katG PCR product (10–30 nM), labeled with Fl either at the 5′-end of the target strandor internally, was denatured in 0.2 M NaOH, 5 mM EDTA for10 min. The denatured target (1 µl) was mixed with 1.5 pmolof one of the probes labeled with biotin in 150 µl of a buffercontaining 0.8 M NaCl, 45 mM NaH2PO4, pH 7.4, 4.5 mMEDTA, 0.2% Ultrapure BSA (Panvera) and 10 ng/µl tRNA(Sigma) and incubated at room temperature for 30 min before100 µl of the reaction mixture was transferred to a well of astreptavidin-coated microtiter plate (Boehringer Mannheim,catalog no. 1,734,784). After incubation for 20 min at roomtemperature, the plate was washed three times with TBS buffercontaining 25 mM Tris–HCl, pH 7.2, 0.15 M NaCl, 0.05 mg/mlNaN3, 0.1% Tween-20, using an Autostrip ELX50 (Bio-TekInstruments). Each well of the washed plate was incubated for20 min at room temperature with 100 µl of SuperBlocksolution (Pierce) in TBS buffer containing 0.015 U Anti-Fluorescein-AP (Boehringer Mannheim). The unboundconjugate was washed out as described above and 100 µl of 0.6mg/ml AttoPhos in AttoPhos buffer (JBL Scientific) wasadded to each well. After incubation for 30 min at 37°C, thechemiluminescence signal in each well was measured using aCytoFluor 4000 (PerSeptive Biosystems) equipped with a 450/50excitation filter and a 580/50 emission filter. For signalnormalization, the control katG probe, 5′-CGT CCT TGGCGG TGT ATT labeled at the 5′-end with biotin, was used.

The probe binding reactions at room temperature for the244 nt HCV 5′-UTR DNA fragments were performed asdescribed above, except that the DNA targets were 5′-labeledwith biotin and probes were 5′-labeled with Fl. For the probebinding experiments at 4 and 37°C, both the probe binding andmicrotiter plate binding steps were performed at the selectedtemperature and the subsequent steps were carried out at roomtemperature. The sequences of the HCV structure-specific

probes were 5′-TTG GGC GTT GCT TGT GGT (probe 3-2),5′-AGT GTC GTT TGG AAC CGG (probe 5-4), 5′-AGT GTCGTT TCT TGT GGT (probe 5-2) and 5′-GCA GAA AGT TCTTGC GAG (probe 6-1). The spacer dinucleotides of the structure-specific probes are shown in italic. For signal normalization,the control probe, 5′-GCG AAA GGC CTT GTG G labeled atthe 5′-end with Fl, was used.

RESULTS

Changes in secondary structure of single-stranded DNAinduced by a single mutation can be detected by TaqExoprobing

The ability of structure-specific 5′-nucleases to create a uniquepattern of cleavage products has been applied to mutation analysisof numerous DNA molecules (23,32–36). Because 5′-nucleasecleavage patterns are sensitive to single nucleotide substitu-tions, they can be used to identify structural differencesbetween closely related DNA sequences. An example of suchan analysis is shown in Figure 1 for wild-type and mutant DNAfragments of the M.tuberculosis katG gene that differ by only asingle G→C mutation at position 41 (G41C) (Fig. 1A). TheTaqExo cleavage patterns of the wild-type and mutant 504 ntfragments (Fig. 1B) show that the G41C mutation eliminatesthe 37 nt cleavage product present in the wild-type moleculewith little or no effect on the rest of the pattern (for brevity,

Figure 1. Analysis of structural changes in katG DNA by TaqExo probing andthe ‘PCR walk’ method. (A) Schematic presentation of the 504, 423, 391 and379 nt fragments of katG gene DNA. The fragments have identical sequencesat the 5′-ends labeled with TET (*). The TaqExo cleavage site at position 37and the G37C mutation are shown by short and long arrows, respectively.(B) The TaqExo cleavage products of the wild-type (WT) and mutant (G41C)katG DNA fragments shown in (A). The 37 nt product is shown by an arrow.M, size markers.


only the portion of the gel corresponding to cleavage productsthat are 25–125 nt long is shown). Thus, TaqExo probingsuggests that the G41C mutation destroys the hairpin structureresponsible for the 37 nt cleavage product of the wild-typekatG DNA fragment.

Long-range interactions in the secondary structure of thewild-type katG DNA

According to the substrate specificity of TaqExo (29,37), thecleavage site at position 37 defines the 5′-end of a hairpinstructure formed in the wild-type DNA molecule. Mfoldpredicts that nucleotide A37 of the 504 nt wild-type fragmentcan pair with either T254 or T389, if it is forced to be at the5′-end of a hairpin structure (data not shown). To determineexperimentally which of these two nucleotides defines the3′-end of the hairpin, we developed a technique called the‘PCR walk’. Using nested PCR primers, two sets of DNA frag-ments, which have identical 5′-ends but progressivelytruncated 3′-ends, were prepared for the wild-type and mutantkatG genes by PCR amplification, as shown in Figure 1A.Analysis of the TaqExo cleavage products demonstrates thattruncation of the wild-type DNA from 504 to 391 nucleotidesdoes not affect the cleavage site at position 37, however, afurther reduction from 391 to 379 nt causes the 37 nt product todisappear (Fig. 1B).

Disappearance of the 37 nt product was the only majorchange in the cleavage pattern of the 379 nt wild-type frag-ment, indicating that the length reduction did not affect otherelements of the secondary structure. Thus, the PCR walk analysissuggests that the 3′-end of the hairpin that gives rise to the37 nucleotides product is located between nucleotides 379 and391. This result is consistent with the mfold structure predic-tion that indicates formation of an A37·T389 base pair in apotential hairpin duplex consisting of regions 37–53 and 374–390 of the wild-type sequence, as shown in Figure 2A. The

G41C mutation would substitute the G41·C385 base pair in thewild-type structure with a C·C mismatch in the mutant struc-ture, which increases the calculated free energy of hairpinduplex formation from –17.3 to –11.7 kcal/mol.

To experimentally confirm the existence of long-range basepairing between nucleotides G41 and C385, a C→G substitu-tion was introduced at position 385 of the 391 nt wild-type andmutant katG DNA fragments. If the proposed hairpin structurewas correct, the C385G substitution would favor base pairingbetween nucleotides 41 and 385 in the mutant G41C molecule,but would also substitute a G·G mismatch for the G41·C385base pair in the wild-type DNA. In agreement with this predic-tion, TaqExo analysis showed that the C385G substitutionrestores the cleavage site at position 37 in the mutant fragment,but eliminates this site in the wild-type fragment (compareFigs 2B and 1B).

Mutation detection in the katG gene by structure-specificprobes

Using the proposed hairpin structure of the wild-type katGDNA, we designed four structure-specific probes for G41Cmutation discrimination (Fig. 3A). Two probes, 1 and 1a, arecomplementary to region 37–53 of the mutant and wild-typeDNAs, respectively, and their binding should be affected byboth mismatch formation and secondary structure. Probe 2 iscomplementary to region 374–389, which comprises the 3′-part of the hairpin duplex formed in the wild-type structure.Bridge probe 3 is complementary to two non-contiguousregions, 29–36 and 390–397, which are presumably broughtinto close proximity in the secondary structure of the wild-typeDNA (Fig. 3A). The target-specific regions of probe 3 arelinked by a 5′-CC dinucleotide to stabilize the potential three-way helical DNA junction formed by probe 3 and the wild-typekatG target (38). Because probes 2 and 3 are complementary toregions that do not include the G41C mutation, their bindingshould depend only on structural differences between the wild-type and mutant katG DNAs.

Binding of the designed structure-specific probes wasstudied by allowing complex formation between a FI-labeledsingle-stranded DNA target and a 5′-biotin-labeled probeunder low stringency conditions at room temperature and thencapturing the complex in streptavidin-coated microtiter plates.The relative binding of each probe was determined from theabsolute fluorescence signal of the captured DNA target,normalized to the signal of a control probe (see Materials andMethods). Figure 3B shows the binding affinities of probes 1,1a, 2 and 3 for the 423 nt fragments of the wild-type andmutant katG DNAs. Probe 1 shows a high level of discrimina-tion between the two targets, possibly due to the combinednegative effect of the mismatch and secondary structure on itsbinding with the wild-type fragment. Probe 1a binds weakly toboth targets, presumably due to C·C mismatch formation withthe mutant target and a competing secondary structure in theprobe-binding region of the wild-type target. Probe 2 bindsmore efficiently with the mutant target, in agreement with theassumption that the probe-binding region in the mutant targetis less structured compared to the wild-type target. Conversely,the presence of a hairpin in the wild-type target explains itshigh binding affinity for probe 3, probably due to formation ofa stable three-way junction. The high binding affinity of probe3 for the wild-type target and the high level of discrimination

Figure 2. Confirmation of the long-range interactions in wild-type katG DNAby compensatory substitutions. (A) Proposed hairpin structure of the wild-type(WT) 391 nt katG DNA fragment. The G41·C385 base pair is shown in bold.(B) The TaqExo cleavage products of the WT/C385G and G41C/C385G 391 ntkatG DNA fragments carrying the C385G substitution. The 37 nt cleavageproduct and its position in the structures are shown by arrows. M, size markers.The nucleotide positions discussed in the text are indicated.


between the two targets demonstrate that bridge probes can beat least as efficient as linear probes.

Target fragmentation affects binding of structure-specificprobes

Fragmentation of target DNA is often used to reduce theinterference of secondary structures with probe binding (6).Consequently, target fragmentation should decrease the abilityof structure-specific probes to discriminate mutations. Toinvestigate this hypothesis, we studied the effect of AvaI treat-ment of wild-type and mutant katG DNA targets on the bindingaffinities of probes 1, 1a, 2 and 3 (Fig. 3A). The 423 nt katGtargets, both internally labeled with FI, were cleaved with AvaIat positions 95 and 299, neither of which overlap with thebinding sites of either the structure-specific or control probes.The internal labeling, required for detection of the fragmentedDNA, did not significantly affect discrimination between theintact wild-type and mutant katG targets (compare Fig. 3B andC). However, the structure-specific probes were no longer ableto discriminate the G41C mutation when AvaI-treated targetswere used instead of intact ones (Fig. 3D). The binding affini-ties of bridge probe 3 were affected the most and decreasedessentially to a background level.

Structural analysis of the 5′-UTR of hepatitis C virus

Analysis of the katG DNA described above was limited to asingle hairpin structure. As a second example, we predicted thecomplete secondary structures of four DNA molecules corre-sponding to the 5′-UTR of HCV RNA. The 5′-UTR is highlyconserved compared to the overall sequence of the viral RNAand is commonly used for HCV genotyping (32,39).

Figure 4A shows the sequences of the 244 nt DNA fragmentscorresponding to the region from –31 to –274 of the type 1a,1b, 2a/c and 3a negative strand HCV RNAs chosen for thisstudy. TaqExo cleavage products obtained by partial digestionof the four DNA molecules labeled at their 5′-ends are shownin Figure 4B. The positions of the TaqExo major cleavage siteswere determined by sequencing gel analysis with singlenucleotide resolution (data not shown) and indicated by thearrowheads in Figure 4A. These sites were then used asconstraint parameters in mfold (21) to select only those struc-tures that are consistent with the TaqExo data, as described inMaterials and Methods. Briefly, each cleavage site was forcedto be located between the first two nucleotides at the 5′-end ofa hairpin structure with a duplex region of at least 7 bp. Forconvenience, we refer to the constraint imposed by a cleavagesite at position n as constraint n.

Figure 3. Detection of the G41C mutation in katG DNA by structure-specific probe binding under low stringency conditions. (A) Proposed structures of the wild-type (WT) and mutant (G41C) 423 nt fragments of katG DNA. Probes 1, 1a, 2 and 3 are shown by solid lines at the regions complementary to the targets. Non-complementary nucleotides in the probes are indicated. (B) Relative binding affinities of probes 1, 1a, 2 and 3 with WT and G41C targets labeled with fluoresceinat the 5′-end (5′-Fl). (C) Identical to (B) but with the targets internally labeled with fluorescein (Fl-Int). (D) Identical to (C) but with the internally labeled targetstreated with AvaI (Fl-Int-Ava I). The binding affinities for WT and G41C targets are shown by white and gray rectangles, respectively. Error bars indicate thestandard deviations obtained from triplicate measurements.


Figure 5A–C shows some of the structures predicted for thetype 1b DNA molecule using mfold with constraints 33, 55,62, 90, 118, 125, 161 and 173. Interestingly, the analysis couldnot predict any structure consistent with all constraints. Forexample, the structure shown in Figure 5A conforms toconstraints 33, 90, 161 and 173, but is not compatible withconstraints 55, 62, 118 and 125, because the positions ofcleavage sites 55, 62, 118 and 125 in this structure contradictthe substrate specificity of TaqExo (29). To satisfy constraints118 and 125, the hairpin structures defined by constraints 90,161 and 173 should be completely rearranged, resulting in thestructure shown in Figure 5B. The structure shown in Figure5C explains cleavage sites 55 and 62 and is compatible withconstraints 90 and 173, but not with constraints 33 and 161.The calculated free energies of the structures proposedin Figure 5A–C at 1 M NaCl and 37°C are –33.6, –26.3 and–30.3 kcal/mol, respectively. In comparison, the optimalsecondary structure predicted by mfold without any imposedconstraints (Fig. 5D) has a free energy of –34.6 kcal/mol and isvery similar to the structure shown in Figure 5A.

Some cleavage sites, e.g. sites 173 and 125 in Figure 5A andB, respectively, are located at internal positions of theproposed hairpin structures. Since TaqExo cannot cleaveduplexes internally, we assume that long hairpin structures canbe partially melted under the reaction conditions (55°C,10 mM MOPS, pH 7.5, 0.2 mM MnCl2) to generate truncatedsubstrates for TaqExo (24,37).

A similar analysis performed on types 1a, 2a/c and 3a DNAmolecules using TaqExo cleavage sites (Fig. 4A) also revealedmultiple alternative structures (data not shown), of which themost stable for each type are depicted in Figure 6, including thetype 1b structure shown in Figure 5A. All four structuresexhibit structurally conserved hairpins I, II and III. TaqExodoes not detect hairpin II, probably because this enzymecannot recognize hairpin duplexes shorter than 7–8 bp (29).However, the calculated free energy –6.1 kcal/mol of hairpin IIand the observed cleavage at site 141 of all four types of DNAmolecules with MjaFEN 5′-nuclease, which recognizes shorterhairpin duplexes (29), confirms the existence of hairpin II inthe proposed structures (data not shown). Besides theseconserved elements, the structures also exhibit structurally

Figure 4. TaqExo analysis of types 1a, 1b, 2 and 3 5′-UTR HCV DNA fragments. (A) Alignment of DNA sequences corresponding to nucleotides –31 to –274 ofthe 5′-UTR of type 1a, 1b, 2 and 3 negative strand HCV RNAs (40). Major cleavage sites are indicated by arrows. (B) TaqExo cleavage products of type 1a, 1b, 2and 3 DNAs shown in (A). M, size markers. The sizes of major cleavage products of type 1b DNA are indicated on the left.


variable regions. For example, the T→C substitution at posi-tion 69 that discriminates HCV types 1a and 1b results information of a stable hairpin structure in region 33–77 of type1b DNA (Fig. 6B). This difference in otherwise identical struc-tures is supported by the strong TaqExo cleavage at position 33 oftype 1b DNA, which is greatly reduced in type 1a DNA (Fig. 4B)and has not been used as a constraint for type 1a DNA structureprediction.

Structure-specific bridge probes for HCV genotyping

The proposed structures of the HCV cDNA molecules (Fig. 6)were used to design structure-specific bridge probes for HCVgenotyping. We selected six 8 nt regions, shown in Figure 6,which are likely to be accessible according to the proposedstructures. To exclude the effect of mismatch formation onprobe binding, the regions were selected from sequencesconserved among all four types. Each bridge probe consists of5′ and 3′ 8 nt regions, complementary to two of the six selectedregions, linked together by a spacer dinucleotide. The bridge

probes were classified according to the target regions used fortheir design. For example, a probe with 5′ and 3′ regionscomplementary to target regions 3 and 2, respectively, isreferred to as probe 3-2. Among 30 possible combinations, wetested 10 probes (data not shown) and finally selected aminimal set of four probes, 3-2, 5-4, 5-2 and 7-1, sufficient toidentify the 1a, 1b, 2a/c and 3a genotypes. Figure 7A demon-strates that these probes exhibit unique and highly reproduciblebinding profiles with the four DNA targets under low strin-gency conditions, with a coefficient of variation of <6%, thusensuring reliable discrimination of all four targets.

Sequence analysis by structure-specific probe bindingunder low stringency conditions is temperatureindependent

Because secondary structure is the major factor affectingbinding between target and structure-specific probes, theability of probes to discriminate targets should be temperatureindependent under conditions favoring the formation of

Figure 5. Predicted secondary structures of the type 1b DNA fragment. Alternative secondary structures predicted for type 1b DNA by mfold using constraints (A)33, 90, 161, 173, (B) 33, 118, 125 and (C) 55, 62, 90, 173. The positions of major cleavage sites are indicated by arrows. (D) Optimal structure of the type 1b DNAfragment predicted by mfold. The calculated free energies for each structure at 1 M NaCl and 37°C are shown.


secondary structure. We tested this hypothesis by comparingthe binding of probes 3-2, 5-4, 5-2 and 6-1 to type 1a and 1bDNA targets at room temperature, 4 and 37°C. Figure 7Bshows that probe 3-2, which binds with type 1b DNA morestrongly than with type 1a DNA at room temperature (Fig. 7A),exhibits the same level of discrimination at both 4 and 37°C,whereas the binding affinities of probes 5-4, 5-2 and 6-1 arepractically independent of type at all temperatures. Thus,structure-specific probes can be used for reliable sequenceanalysis over a temperature range of >30°C, including roomtemperature.

DISCUSSION

Combined use of TaqExo enzymatic probing with the energyminimization mfold algorithm (21) was used in this work toimprove the prediction of DNA secondary structures. Theunique specificity of TaqExo, which cleaves hairpin structuresprecisely at their 5′-ends, permits straightforward conversion

of TaqExo cleavage data into formal mfold constraint parameters.It can also be used with the PCR walk method described in thiswork. The power of TaqExo probing is demonstrated bydetection of structural changes caused by single nucleotidevariations in the katG gene DNA (Fig. 3A) and the HCV 5′-UTR cDNA (Fig. 6A and B) and by prediction of multiplealternative structures for the 244 nt type 1b HCV DNAfragment (Fig. 5). The latter example underscores the advan-tage of TaqExo probing when compared with other enzymaticand chemical methods by providing a superposition of discretecleavage patterns rather than an average reactivity of eachnucleotide over an ensemble of all alternative structures.

The prediction of alternative structures can be helpful inoptimizing the thermodynamic parameters used in energyminimization programs. For example, the calculated free ener-gies of two alternative structures adopted by type 1b DNA(Fig. 5A and B) differ by >7 kcal/mol. Such a difference wouldmake formation of the first structure >105-fold more probablethan formation of the second. Because this conclusion is not

Figure 6. Optimal secondary structures of type 1a, 1b, 2 and 3 5′-UTR DNA fragments predicted by mfold using TaqExo constraints. The structures were predictedusing constraints (A) 90, 161, 173 for type 1a, (B) 33, 90, 161, 173 for type 1b, (C) 33, 85, 89, 173 for type 2a/c and (D) 33, 90, 92, 98, 161, 173 for type 3a (seeFig. 4A) and conditions of 1 M NaCl and 37°C. Polymorphic substitution identifying each genotype are shown in bold. Conserved hairpin structures I–III areindicated. Regions used for designing bridge probes are shown by solid lines and denoted by Arabic numerals.


consistent with the fact that cleavage sites 118 and 125 areclearly detectable (Fig. 4B), additional interactions that are notaccounted for in mfold likely stabilize the structure shown inFigure 5B.

The ability to identify structural changes caused by a singlemutation opens new opportunities in designing oligonucleotideprobes for sequence analysis based on structural differencesrather than mismatch formation. Because structure-specificprobes are not restricted to the site of mutation, a large numberof these probes, especially bridge probes, can be designed, asopposed to mismatch discriminating probes, thus increasingthe probability of achieving the desired level of mutationdiscrimination. The structure-specific approach has the furtheradvantage of reducing the number of probes required forsequence analysis of multiple targets. For example, we usedonly four probes to reliably identify four HCV genotypes(Fig. 7A). Standard mismatch detecting methods wouldrequire at least twice as many probes for such an analysis.Potentially, structure-specific probes can substantiallyimprove mutation discrimination compared with probesdesigned solely on the mismatch formation principle. Forexample, binding of probe 3-2 with types 1a and 1b HCVtargets exhibits a very high discrimination factor of 25:1(Fig. 7A).

The predicted secondary structures of target molecules canprovide significant insight in designing highly selective, struc-ture-specific probes. For example, probe 3-2 binds to type 1b,2a/c and 3a HCV DNA molecules significantly more stronglythan to the type 1a target (Fig. 7A). This finding agrees withthe proposed hairpin structure in region 33–77 of type 1b, 2a/cand 3a molecules that brings regions 2 and 3 into closeproximity to each other (Fig. 6). On the other hand, proposedstructural differences among the HCV targets cannot explain in

such simple terms the differential binding of the other threeprobes, suggesting that more elaborate models are required formore accurate prediction of the correlation between nucleicacid structure and the binding affinity of structure-specificprobes.

An important feature of the structure-specific probes is theirability to detect mutations under low stringency conditions.We observed nearly identical levels of discrimination betweenthe type 1a and 1b 5′-UTR DNAs with probe 3-2 at 4°C, roomtemperature and 37°C (Fig. 7). This result supports the hypothesisthat structural differences between the targets remain practi-cally unchanged under these conditions. Absolute bindingefficiencies of the bridge probes, measured as the intensities ofthe fluorescence signals generated by the same amount of 1aand 1b 5′-UTR DNAs, were several-fold higher at roomtemperature than at 4 or 37°C (data not shown). Thus, temperature-induced variations in secondary structures can influencebinding of bridge probes, but have no apparent effect on theirability to discriminate mutations. The negligible effect oftemperature on the discrimination factor contributes to the highreproducibility of the results, permitting reliable sequenceanalysis even when the discrimination factor of structure-specific probes is only 2:1, e.g. the katG-specific probe 2(Fig. 3B). In contrast, the strong temperature effect on relativebinding of complementary and mismatched probes oftenadversely affects the reproducibility of mismatch discrimi-nating methods operating under high stringency conditions.

We have demonstrated in this work that available secondarystructure prediction programs, such as mfold, can benefit fromthe structural information provided by TaqExo 5′-nuclease,thus improving the probability of correct structure prediction.In addition, TaqExo data can be used to refine energy parametersfor certain structural elements, e.g. three-way junctions, hairpins

Figure 7. HCV genotype identification using structure-specific probe binding under low stringency conditions. (A) Relative binding affinities of bridge probes3-2, 5-4, 5-2 and 6-1 for type 1a, 1b, 2a/c and 3a 5′-UTR DNA targets at room temperature. (B) Identical to (A) but with the type 1a and 1b targets at 4 and 37°C.The binding affinities for the type 1a and 1b targets are shown by white and gray rectangles, respectively. Error bars indicate the standard deviations obtained fromtriplicate measurements.


and internal loops, eventually simplifying the design of highlyefficient structure-specific probes.

ACKNOWLEDGEMENTS

We thank Olke Uhlenbeck, James Dahlberg, Ted Ullman,Michael Zuker and John SantaLucia for discussions, PeggyEis, Kafryn Lieder and Andrew Lukowiak for critically readingthe manuscript, Frank Cockerill and Manual Altamirano forDNA samples and Natasha Lyamicheva for katG DNA clones.The work was supported by Cooperative Agreement70NANB7H3015 from the Department of Commerce AdvanceTechnology Program to Dr Lance Fors (Third Wave Technol-ogies).

REFERENCES

1. Southern,E.M. (1975) Detection of specific sequences among DNAfragments separated by gel electrophoresis. J. Mol. Biol., 98, 503–517.

2. Wallace,R.B., Shaffer,J., Murphy,R.F., Bonner,J., Hirose,T. andItakura,K. (1979) Hybridization of synthetic oligodeoxyribonucleotides tophi chi 174 DNA: the effect of single base pair mismatch. Nucleic AcidsRes., 6, 3543–3557.

3. Bains,W. and Smith,G.C. (1988) A novel method for nucleic acidsequence determination. J. Theor. Biol., 135, 303–307.

4. Drmanac,R., Labat,I., Brukner,I. and Crkvenjakov,R. (1989) Sequencingof megabase plus DNA by hybridization: theory of the method. Genomics,4, 114–128.

5. Southern,E.M. (1996) DNA chips: analysing sequence by hybridization tooligonucleotides on a large scale. Trends Genet., 12, 110–115.

6. Fodor,S.P., Rava,R.P., Huang,X.C., Pease,A.C., Holmes,C.P. andAdams,C.L. (1993) Multiplexed biochemical assays with biological chips.Nature, 364, 555–556.

7. SantaLucia,J.,Jr (1998) A unified view of polymer, dumbbell andoligonucleotide DNA nearest-neighbor thermodynamics. Proc. NatlAcad. Sci. USA, 95, 1460–1465.

8. Allawi,H.T. and SantaLucia,J.,Jr (1997) Thermodynamics and NMR ofinternal G.T mismatches in DNA. Biochemistry, 36, 10581–10594.

9. Gamper,H.B., Cimino,G.D. and Hearst,J.E. (1987) Solution hybridizationof crosslinkable DNA oligonucleotides to bacteriophage M13 DNA.Effect of secondary structure on hybridization kinetics and equilibria.J. Mol. Biol., 197, 349–362.

10. Fedorova,O.S., Podust,L.M., Maksakova,G.A., Gorn,V.V. andKnorre,D.G. (1992) The influence of the target structure on the efficiencyof alkylation of single-stranded DNA with the reactive derivatives ofantisense oligonucleotides. FEBS Lett., 302, 47–50.

11. Lima,W.F., Monia,B.P., Ecker,D.J. and Freier,S.M. (1992) Implication ofRNA structure on antisense oligonucleotide hybridization kinetics.Biochemistry, 31, 12055–12061.

12. Godard,G., Francois,J.C., Duroux,I., Asseline,U., Chassignol,M., Nguyen,T.,Helene,C. and Saison-Behmoaras,T. (1994) Photochemically and chemicallyactivatable antisense oligonucleotides: comparison of their reactivitiestowards DNA and RNA targets. Nucleic Acids Res., 22, 4789–4795.

13. Zarrinkar,P.P. and Williamson,J.R. (1994) Kinetic intermediates in RNAfolding. Science, 265, 918–924.

14. Parkhurst,K.M. and Parkhurst,L.J. (1995) Kinetic studies by fluorescenceresonance energy transfer employing a double-labeled oligonucleotide:hybridization to the oligonucleotide complement and to single-strandedDNA. Biochemistry, 34, 285–292.

15. Schwille,P., Oehlenschlager,F. and Walter,N.G. (1996) Quantitativehybridization kinetics of DNA probes to RNA in solution followed bydiffusional fluorescence correlation analysis. Biochemistry, 35, 10182–10193.

16. Tyagi,S. and Kramer,F.R. (1996) Molecular beacons: probes thatfluoresce upon hybridization. Nature Biotechnol., 14, 303–308.

17. Williams,J.C., Case-Green,S.C., Mir,K.U. and Southern,E.M. (1994) Studiesof oligonucleotide interactions by hybridisation to arrays: the influence ofdangling ends on duplex yield. Nucleic Acids Res., 22, 1365–1367.

18. Ehresmann,C., Bauldin,F., Mougel,M., Romby,P., Ebel,J.P. andEhresmann,B. (1987) Probing the structure of RNAs in solution. NucleicAcids Res., 15, 9109–9128.

19. Knapp,G. (1989) Enzymatic approaches to probing of RNA secondaryand tertiary structure. Methods Enzymol., 180, 192–212.

20. Michel,F. and Westhof,E. (1990) Modelling of the three-dimensionalarchitecture of group I catalytic introns based on comparative sequenceanalysis. J. Mol. Biol., 216, 585–610.

21. Zuker,M. (1989) On finding all suboptimal foldings of an RNA molecule.Science, 244, 48–52.

22. Gaspin,C. and Westhof,E. (1995) An interactive framework for RNAsecondary structure prediction with a dynamical treatment of constraints.J. Mol. Biol., 254, 163–174.

23. Brow,M.A., Oldenburg,M.C., Lyamichev,V., Heisler,L.M., Lyamicheva,N.,Hall,J.G., Eagan,N.J., Olive,D.M., Smith,L.M., Fors,L. and Dahlberg,J.E.(1996) Differentiation of bacterial 16s rRNA genes and intergenic regions andMycobacterium tuberculosis katG genes by structure-specific endonucleasecleavage. J. Clin. Microbiol., 34, 3129–3137.

24. Lyamichev,V., Brow,M.A., Varvel,V.E. and Dahlberg,J.E. (1999)Comparison of the 5′ nuclease activities of Taq DNA polymerase and itsisolated nuclease domain. Proc. Natl Acad. Sci. USA, 96, 6143–6148.

25. Shen,L.X., Basilion,J.P. and Stanton,V.P. (1999) Single-nucleotidepolymorphisms can cause different structural folds of mRNA. Proc. NatlAcad. Sci. USA, 96, 7871–7876.

26. Richardson,P.L. and Schepartz,A. (1991) Tethered oligonucleotideprobes. A strategy for the recognition of structured RNA. J. Am. Chem.Soc., 113, 5109–5111.

27. Cload,S.T., Richardson,P.L., Huang,Y.H. and Schepartz,A. (1993)Kinetic and thermodynamic analysis of RNA binding by tetheredoligonucleotide probes: alternative structures and conformationalchanges. J. Am. Chem. Soc., 115, 5005–5014.

28. Francois,J.C., Thuong,N.T. and Helene,C. (1994) Recognition andcleavage of hairpin structures in nucleic acids by oligodeoxynucleotides.Nucleic Acids Res., 22, 3943–3950.

29. Kaiser,M.W., Lyamicheva,N., Ma,W., Miller,C., Neri,B., Fors,L. andLyamichev,V.I. (1999) A comparison of Eubacterial and Archaealstructure-specific 5′-exonucleases. J. Biol. Chem., 274, 21387–21394.

30. Reynaldo,L.P., Vologodskii,A.V., Neri,B.P. and Lyamichev,V.I. (2000)The kinetics of oligonucleotide replacements. J. Mol. Biol., 297, 511–520.

31. Richards,E.G. (1975) In Fasman,G.P. (ed.) Handbook of Biochemistry andMolecular Biology, 3rd Edn. CRC Press, Cleveland, OH, Vol. 1, p. 597.

32. Marshall,D.J., Heisler,L.M., Lyamichev,V., Murvine,C., Olive,D.M.,Ehrlich,G.D., Neri,B.P. and de Arruda,M. (1997) Determination ofhepatitis C virus genotypes in the United States by cleavase fragmentlength polymorphism analysis. J. Clin. Microbiol., 35, 3156–3162.

33. Sreevatsan,S., Bookout,J.B., Ringpis,F.M., Mogazeh,S.L.,Kreiswirth,B.N., Pottathil,R.R. and Barathur,R.R. (1998) Comparativeevaluation of cleavase fragment length polymorphism with PCR-SSCPand PCR-RFLP to detect antimicrobial agent resistance in Mycobacteriumtuberculosis. Mol. Diagn., 3, 81–91.

34. Eisinger,F., Jacquemier,J., Charpin,C., Stoppa-Lyonnet,D., Bressac-dePaillerets,B., Peyrat,J.P., Longy,M., Guinebretiere,J.M., Sauvan,R.,Noguchi,T., Birnbaum,D. and Sobol,H. (1998) Mutations at BRCA1: themedullary breast carcinoma revisited. Cancer Res., 58, 1588–1592.

35. Killeen,A.A., Jiddou,R.R. and Sane,K.S. (1998) Characterization offrequent polymorphisms in intron 2 of CYP21: application to analysis ofsegregation of CYP21 alleles. Clin. Chem., 44, 2410–2415.

36. Oldenburg,M.C. and Siebert,M. (2000) New cleavase fragment lengthpolymorphism method improves the mutation detection assay.Biotechniques, 28, 351–357.

37. Lyamichev,V., Brow,M.A. and Dahlberg,J.E. (1993) Structure-specificendonucleolytic cleavage of nucleic acids by eubacterial DNApolymerases. Science, 260, 778–783.

38. Leontis,N.B., Kwok,W. and Newman,J.S. (1991) Stability and structureof three-way DNA junctions containing unpaired nucleotides. NucleicAcids Res., 19, 759–766.

39. Young,K.K., Resnick,R.M. and Myers,T.W. (1993) Detection of hepatitisC virus RNA by a combined reverse transcription-polymerase chainreaction assay. J. Clin. Microbiol., 31, 882–886.

40. Simmonds,P., McOmish,F., Yap,P.L., Chan,S.W., Lin,C.K., Dusheiko,G.,Saeed,A.A. and Holmes,E.C. (1993) Sequence variability in the 5′ non-coding region of hepatitis C virus: identification of a new virus type andrestrictions on sequence diversity. J. Gen. Virol., 74, 661–668.

Secondary structure prediction and structure-specific sequence analysis of single-stranded DNA

Documents

Transcript of Secondary structure prediction and structure-specific sequence analysis of single-stranded DNA