Computational design of RNAs with complex energy landscapes

14
Computational Design of RNAs with Complex Energy Landscapes Christian H¨ oner zu Siederdissen a , Stefan Hammer a , Ingrid Abfalter b , Ivo L. Hofacker a,c,d , Christoph Flamm a,, Peter F. Stadler e,f,g,a,d,h,a Department of Theoretical Chemistry, University of Vienna, W¨ ahringerstraße 17, A-1090 Wien, Austria b Research Support, Johannes Kepler University Linz, Altenberger Str. 69, 4040 Linz, Austria c Bioinformatics and Computational Biology Research Group, University of Vienna, A-1090 W¨ ahringerstraße 17, Vienna, Austria d Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg, Denmark e Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, H¨ artelstraße 16-18, D-04107, Leipzig, Germany f Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany g RNomics Group, Fraunhofer Institut f¨ ur Zelltherapie und Immunologie, Deutscher Platz 5e, D-04103 Leipzig, Germany h Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA Abstract RNA has become an integral building material in synthetic biology. Dominated by their secondary structures, which can be computed eciently, RNA molecules are amenable not only to in vitro and in vivo selection, but also to rational, computation-based design. While the inverse folding problem of constructing an RNA sequence with a prescribed ground-state structure has received considerable attention for nearly two decades, there have been few eorts to design RNAs that can switch between distinct prescribed conformations. We introduce a user-friendly tool for designing RNA sequences that fold into multiple target structures. The un- derlying algorithm makes use of a combination of graph coloring and heuristic local optimization to find sequences whose energy landscapes are dominated by the prescribed conformations. A flexible interface allows the specification of a wide range of design goals. We demonstrate that bi- and tri-stable “switches” can be designed easily with moder- ate computational eort for the vast majority of compatible combinations of desired target structures. RNAdesign is freely available under the GPL-v3 license. Keywords: RNA sequence design; inverse folding; multi-stable structures; graph coloring Corresponding authors Preprint submitted to Manuscript May 3, 2013

Transcript of Computational design of RNAs with complex energy landscapes

Computational Design of RNAs with Complex Energy Landscapes

Christian Honer zu Siederdissena, Stefan Hammera, Ingrid Abfalterb, Ivo L. Hofackera,c,d,Christoph Flamma,∗, Peter F. Stadlere,f,g,a,d,h,∗

aDepartment of Theoretical Chemistry, University of Vienna, Wahringerstraße 17, A-1090 Wien, AustriabResearch Support, Johannes Kepler University Linz, Altenberger Str. 69, 4040 Linz, Austria

cBioinformatics and Computational Biology Research Group,University of Vienna, A-1090 Wahringerstraße 17, Vienna,AustriadCenter for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg, Denmark

eBioinformatics Group, Department of Computer Science, andInterdisciplinary Center for Bioinformatics, Hartelstraße 16-18, D-04107, Leipzig,Germany

fMax Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, GermanygRNomics Group, Fraunhofer Institut fur Zelltherapie und Immunologie, Deutscher Platz 5e, D-04103 Leipzig, Germany

hSanta Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA

Abstract

RNA has become an integral building material in synthetic biology. Dominated by their secondary structures, whichcan be computed efficiently, RNA molecules are amenable not only toin vitro andin vivoselection, but also to rational,computation-based design. While the inverse folding problem of constructing an RNA sequence with a prescribedground-state structure has received considerable attention for nearly two decades, there have been few efforts to designRNAs that can switch between distinct prescribed conformations.

We introduce a user-friendly tool for designing RNA sequences that fold into multiple target structures. The un-derlying algorithm makes use of a combination of graph coloring and heuristic local optimization to find sequenceswhose energy landscapes are dominated by the prescribed conformations. A flexible interface allows the specificationof a wide range of design goals. We demonstrate that bi- and tri-stable “switches” can be designed easily with moder-ate computational effort for the vast majority of compatible combinations of desired target structures.RNAdesign isfreely available under the GPL-v3 license.

Keywords: RNA sequence design; inverse folding; multi-stable structures; graph coloring

∗Corresponding authors

Preprint submitted toManuscript May 3, 2013

1. Introduction

A wide variety of RNA elements requires transi-tions between two or more different spatial conforma-tions. A prime example are riboswitches. These reg-ulatory elements, which are abundant in prokaryotes,regulate mRNA transcription or translation in responseto metabolite concentrations, reviewed in Serganov andNudler (2013). Substrate binding or unbinding at theaptamer component of the riboswitch triggers a con-formational change of the molecule that is propagatedto the effector location, where it causes the formationor destruction of a terminator hairpin, or the expo-sure of sequestration of the Shine-Dalgarno sequence.RNA thermometers are a variation on this theme (Kort-mann and Narberhaus, 2012). Similar mechanisms havebeen reported in eukaryotic genome regulation for ele-ments in untranslated parts of mRNAs that respond toprotein binding (Rayet al., 2009). Major conforma-tional changes also play a crucial role in viroid pro-cessing (Baumstarket al., 1997), in the replication cy-cle of self-replicating RNA synthesized by Qβ replicase(Biebricheret al., 1992), the folding of rRNA after exci-sion of self-splicing introns (Cao and Woodson, 2000),and the functioning of the hok/sok host-killing system(Gultyaevet al., 1997).

Regulatory RNA elements that respond to externaltriggers are attractive components in synthetic biol-ogy (Wieland et al., 2009; Win et al., 2009; Toppand Gallivan, 2010), making the design of novel RNAcomponents an interesting task of practical importance(Isaacset al., 2006). Recent success in designing asynthetic riboswitch acting on transcription highlightsthe feasibility and usefulness of rational design ap-proaches for RNAs with distinct, prescribed confor-mations (Wachsmuthet al., 2013). Similar principleshave been used to construct a riboswitch based on IRESstructures (Ogawa, 2011).

The task of designing an RNA sequence with a pre-scribed secondary structure as its ground state is knownas the “inverse folding problem”. Although this com-binatorial problem is hard in general (Schnall-Levinet al., 2008), most instances of practical interest canbe solved by simple hill-climbing heuristics: An ini-tial random seed is progressively “mutated” to approachthe desired folding properties. This simple idea isthe basis ofRNAinverse (Hofackeret al., 1994) andlater, more efficient approaches such asRNA-SSD (An-dronescuet al., 2004; Aguirre-Hernandezet al., 2007),as well as the very efficient optimization algorithm(Zadehet al., 2011b) implemented inNUPACK (Zadehet al., 2011a).INFO-RNA (Busch and Backofen, 2006)

uses a dynamic programming approach to compute themost stable sequence for the prescribed secondary struc-ture as a starting point for a local search heuristic. Amulti-objective optimization approach considering thetrade-off between thermodynamic stability and struc-tural similarity is used inMODENA (Taneda, 2011). In-verse folding problems can also be solved by an exactbranch and bound algorithm (Burghardt and Hartmann,2007). An alternative, essentially enumerative approachthat covers certain classes of pseudoknots is describedin (Gaoet al., 2010). RNAexinv (Avihoo et al., 2011)includes some additional attributes and also the muta-tional robustness and the minimum free energy. As analternative to iterative improvement, a global samplingapproach was proposed in Levinet al. (2012).

Much less is known about the design problem formulti-stable RNAs. In this case, the design goals in-volve more complex properties of the energy landscapesuch as prescribed local optima and energy barriers. Aweb tool for this type of design problem isARDesigner(Shuet al., 2010), which implements many of the ideasdiscussed in Flammet al. (2001). The most salient dif-ference between the inverse folding problem for singleand multiple structural constraints is that a solution neednot exist in the latter case (Flammet al., 2001). Thus,computing feasible solutions as starting points for sub-sequent optimization steps, and —in particular— sam-pling these starting points so that biases can be avoided,becomes a non-trivial problem. Flammet al.(2001) de-scribe a uniform sampling procedure for two prescribedsecondary structures. In this case, a non-empty set offeasible solutions always exists (Reidyset al., 1997).

The general case has been discussed by Abfalteret al.(2003), but no corresponding software has becomeavailable. Lyngsøet al. (2012) implemented a muchsimpler, approximate sampling of initial conditions to-gether with a genetic algorithm in theirFrnakensteintool. Similar ideas have been used by (Ramlan andZauner, 2011). In the present contribution, we con-solidate and expand on our earlier computational ap-proaches to designing RNA sequences with multipleprescribed conformations that satisfy additional, com-plex constraints and provide withRNAdesign an imple-mentation for a wide variety of RNA design tasks.

2. Theory

2.1. Notation

LetA denote the alphabet of monomers and letB ⊂A×A be the set of allowed base pairs. We assume thatB is symmetric. For RNA, we haveA = {A,U,G,C}

2

(..)..(....).(....)..

(.....)(...).........

(......)(....).......

(..(....)......).....

(..(......)(...))....

...(.......(....).).. 1

5

10

15

20

Figure 1: The dependency graphG of M = 6 secondary structures. Tomake the example smaller, the minimum number of unpaired positionsin a hairpin is reduced to 2 here.

and B = {AU,UA,GC,CG,GU,UG}. We denote asequence consisting ofn monomersxi ∈ A by x =x1x2 . . . xn.

A secondary structureΘ on x is a set of pairs (i, j),1 ≤ i < j ≤ n such that for all (i, j), (k, l) ∈ Θ holds

1. (xi , x j) ∈ B

2. (i, j) = (k, l) or {i, j} ∩ {k, l} = ∅, i.e.,Θ is a match-ing on{1,2, . . . ,n}

3. If i < k < j or i < l < j, theni < k < l < j, i.e.,base pairs do not cross.

Given a secondary structureΘ, we write

C[Θ] ={

x ∈ An∣

∣(xi , x j) ∈ B for all (i, j) ∈ Θ}

for the set of sequences that can form the structureΘ.We say that a sequencex ∈ C[Θ] is compatible withΘ.

To every pair (x,Θ) of a sequencex and a secondarystructureΘ compatible withx, an energyf (x,Θ) can beassigned. In practice,f (x,Θ) is computed as the sumof energy contributions of stacked base pairs and loops,which in turn are derived from a large body of accuratethermodynamic measurements (Mathewset al., 2004).

The energy landscape for a fixed sequencex is de-fined by the functionfx : Θ 7→ f (x,Θ) together withan adjacency relation of the secondary structures. Asusual, we regard two secondary structures as adjacentif they differ by a single base pair. Later, we will useproperties offx in the specification of the design goals.

Now consider a collection{Θ1,Θ2, . . . ,ΘM} of M dis-tinct secondary structure of the same lengthn. Is there asequencex that is simultaneously consistent with all theΘi? If so, our task is to determinex such that the pre-scribedΘi features as prominently as possible amongthe structures formed byx. We first address the exis-tence question.

2.2. The Search SpaceC

Given {Θ1,Θ2, . . . ,ΘM}, the set of sequences simul-taneously consistent with all these secondary structuresis

C = C[Θ1] ∩ C[Θ2] ∩ · · · ∩ C[ΘM] (1)

Hence, the design problem is solvable ifC , ∅. Thisquestion is addressed in Flammet al. (2001).

The dependency graph G= G(Θ1,Θ2, . . . ,ΘM) hasn vertices corresponding to the sequence positions ofx.There is an edge connectingk ∈ V(G) with l ∈ V(G)if and only if (k, l) is a base pair in at least one of thesecondary structuresΘi , i.e.,

E(G) =M⋃

i=1

Θi (2)

see Fig. 1.Generalized Intersection Theorem. (Flammet al., 2001)SupposeB ⊆ A × A contains at least one symmetricbase pair, andG is the dependency graph of a set ofsecondary structures. Then:

1. C , ∅ if G is bipartite. For the RNA alphabet,bipartiteness ofG is also a necessary condition.

2. There are|C | =∏

componentsψ of G

F(ψ) sequences in

C , whereF(ψ) is the number of sequences that arecompatible with a connected componentψ of G.

A simple breadth-first-search coloring algorithm can beused to test whetherG is bipartite or not.

Even for M = 3 different secondary structures, it issimple to construct triples of structures with conflict-ing base pairs that lead to a triangle inG(Θ1,Θ2,Θ3).In order to estimate the probability of a non-emptyC ,we sampled secondary structures with uniform proba-bility as described by Tackeret al. (1996) and checkedwhether the dependency graph ofM-tuples of structuresis bipartite. ForM ≥ 3, we find an exponential decreasewith sequence length, see Fig. 2. However, the expo-nent is very small forM = 3, indicating that tri-stableswitches in particular should not be uncommon.

2.3. Sequence Design as Graph Coloring

In this section, we outline a dynamic programmingapproach that can be used to enumerate and uniformlysample fromC . To this end, we consider sequencesasA-colorings of the dependency graphG, that is, asmapsc : V(G) → A which obey the pairing rules, i.e.,(c(k), c(l)) ∈ B for all (k, l) ∈ E(G).

3

0 50 100 150 200chain length

-8

-6

-4

-2

0lo

g(fr

act

ion

of

bip

art

ite g

rap

hs)

2 3 4 5 6 7 8number of structures

-0.3

-0.2

-0.1

0.0

slo

pe

Figure 2: Statistics of the fraction of bipartite graphs versus sequencelength with different numbersM of prescribed structures generatedwith uniform distribution for the set of all secondary structures offixed chain length.• M = 3, ¥ M = 4, ¨ M = 5, N M = 6, andH M = 7. Data redrawn from (Abfalteret al., 2003).

The important observation for our purposes is thatcolorings can be obtained by combining partial color-ings: LetH be a subgraph ofG, and consider two vertixsetsU,W ⊆ V(H). A partial coloring ofU in H is a mapcU : U → A such that (c(u), c(v)) ∈ B for all u, v ∈ Uwith (u, v) ∈ E(H). Partial coloringscU andcW on UandW, respectively, are compatible if (i)cU(y) = cW(y)for all y ∈ U ∩W and (ii) (cU(u), cW(v)) ∈ B for u ∈ Uand v ∈ W with (u, v) ∈ E(H). Denote by∂(U,W)the set of vertices in whichU and W overlap or areadjacent. Denote byc(U,a) andc(W,b) the sets of allthose colorings onU andW that are fixed to some as-signmentsa andb on ∂(U,W). Then the set of color-ings in U ∪W consists exactly of the combinations ofcolorings onU and W for which a and b are consis-tent, i.e., identical onU ∩ W and satisfying the colorconstraints on adjacent vertices. For simplicity, writec(U ∪W) =

a,b c(U) ◦ c(U ∪W)The idea is to use this type of composition of the

set of all conflict-free colorings for the stepwise con-struction ofc(G). Graph coloring is a well-known NP-complete problem (Jensen and Toft, 1994). Of course,our approach cannot overcome this in general. We can,however, search for a decomposition ofG that allows usto concatenate partial colorings using as little resourceconsumption as possible.

It is particularly easy to compose colorings at cut ver-tices. In the first step, we therefore decompose (eachconnected component of)G into its blocks, i.e., the 2-connected components and those edges that are not con-tained in a cycle, Fig. 3. The blocks and cut-vertices,i.e., the vertices common to two or more blocks, can be

G1

G’2

G’’2G2

G3 G3

G1

x

y

x xx

y

xxx

yy

Figure 3: Decomposition of the dependency graph of Fig. 1 intoitsblocks. Colors need to be constrained only at the cut vertices x andy.The number of colorings in this example is

|c(G)| =X

cy

|c(G′′2 , cy)| ·X

cx

|c(G′2, cxcy)| · |c(G1, cx)| · |c(G3, cx)| .

determined in linear time. For each blockB, we thendetermine the sets of coloringsc(B,q) with fixed colorsq assigned to cut vertices ofG in B. The 2-connectedcomponents are arranged in a tree. Choosing an arbi-trary root, we can compose the colorings recursively bytraversing from the leafs up.

Since 2-connected components can be large, we needto decompose them further. While it might seem natu-ral to use successive higher-order connectivities for thispurpose, we explore here an alternative approach.

Let G be a 2-connected graph. An ear decompositionof G is a sequenceE = (P0,P1, . . . ) of paths whereG0 =

P0 is a single vertex,

Gk =

k⋃

i=0

Pi (3)

andGµ = G, with µ = |E|−|V|+1 being the dimension ofthe cycle space ofG. An ear decomposition is “open” ifPi , i ≥ 1, has two distinct end points inG, see Fig. 4 foran example. An open ear decomposition exists exactlyfor 2-connected graphs (Whitney, 1932). We use theEDS algorithm by Maonet al. (1986) to produce thedecomposition. For details, see Sec. 3.1.

With the ear decompositionE of G we associate asequence of subgraphs ofG for which we construct thecolorings:

Gk =

µ⋃

i=k+1

Pi (4)

By definition G0 = G andGµ = ∅, the empty graph.Further, we have

Gk = Pk+1 ∪Gk+1 (5)

The intersectionAk := Gk ∩ Gk is completely discon-nected for eachk and by construction forms a cut inG.We call these vertex sets theattachment pointsof Gk onGk.

4

G0G6 G5 G4 G3 G2 G1

G6 G5 G4 G2 G1 G0G3

P6P5

P4

P3P2

P1 P0

Figure 4: Graphs associated with an ear-decomposition: (top) Ear-decomposition of a block: In each step fromG6 to G0, a path (ear) is removeduntil a central cycle is left. (bottom) The correspondingGk of each step is shown. The attachment points of the ears are depicted by unfilledvertices. For more compact illustration, a non-bipartite graph is shown.

Our task is now to construct and evaluate the setsc(Gk,ak) of colorings of the graphGk with colors ak

fixed on the setAk of its attachment points. To this end,we start from the outer-most pathGµ−1, for which thecolorings are easily constructed and counted, and pro-ceed inwards until we reachG0 = G.

These setsc(Gk,ak) can be computed by combining,in the above sense, colorings of the pathPk+1 with col-orings of the subgraphGk+1, again with prescribed as-signmentsak+1 at its attachment pointsAk+1.

c(Gk,ak) = c(Pk+1,b) ◦ c(Gk+1,ak+1) (6)

SinceAk, Ak+1, andPk+1 are generally not disjoint, thecoloringsak, b, andak+1 at the sets of attachment pointsmust, of course, coincide at their intersections. How-ever, asPk+1 andGk+1 are not connected by any otheredges inG, the concatenation◦ of the coloring sets isconstrained only by the common vertices. In particular,the end points ofPk are attachment points inAk, andthe attachment points ofGk+1 are either contained in theinterior of Pk+1 (b andak+1 coincide onAk+1 \ Ak), orthey are attachment points ofGk+1, and thusak = ak+1

on these vertices.The pathPk+1 is subdivided by the interior attachment

points into|Ak+1 \ Ak| + 1 sub-paths. For any coloringconditionB, it is straightforward to compute the set ofcolorings on a path of lengthℓ with fixed colors at itsendpoints, see e.g. Flammet al. (2001). From these,colorings of longer paths with fixed colors at the attach-ment points are easily obtained.

This decomposition of the sets of colorings formsthe basis for the recursive enumeration of colorings bydynamic programming. In each decomposition stepk,we need to store the number of colorings|c(Gk,ak)| of

0 2 4 6 8 10 12 14

02

46

810

1214

β

α

Figure 5: Distribution of the exponentsα (memory requirements) andβ (CPU runtime) for a fixed graph withµ = 13. 1000 spanning treeswere generated by replacing randomly selected tree edges with non-tree-edges. Each vertex was used a root for the Maon/Schieber al-gorithm to compute the ear decomposition. The (rare) best spanningtrees yieldα = 4 andβ = 5 in this example.

Gk given a fixed coloring of the attachment vertices,i.e., |A||Ak| values. The maximumα = maxk |Ak| overthe steps of the ear decomposition thus determines thememory requirements.

The CPU time required to compute one entry in thismatrix is determined by the setAk+1 \ Ak of attachmentpoints ofGk+1 that are attached to the interior ofPk+1.The total effort to count all colorings is therefore|A|β

with β = maxk(|Ak|+ |Ak+1\Ak|). The exponentsα andβdepend explicitly on the spanning tree ofG used in the

5

construction of the ear decomposition. Fig. 5 shows thatα andβ can vary dramatically, depending on the choiceof the the spanning tree. Note thatα andβ are stronglycorrelated. The data suggest that|Ak+1 \ Ak| ∈ {0,1,2},i.e., in each step at most two earlier attachment pointsare consumed.

2.4. Generating Colorings

For small connected components ofC it is possible(and efficient) to explicitly enumerate all colorings. Forlarge components, however, this becomes inefficient,and we must resort to a sampling technique. To this end,we use the generic idea of stochastic backtracing in dy-namic programming, which is used in a similar context,for instance, to generate samples of secondary struc-tures (Tackeret al., 1996; Ding and Lawrence, 2001;Waldispuhl and Clote, 2007). Here, we set for eachkfrom 1 toµ the colorsak for the attachment points withprobabilities proportional to|c(Gk,ak)|. In the secondstep, we sample the colorings for the connecting paths,whose end points now have fixed colors, as describedin Flammet al. (2001). This simple procedure ensuresuniform sampling fromC and hence unbiased gener-ation of feasible solutions. Biased samples could, ofcourse, be generated with less effort, for example, bydepth first search search with a random vertex order. Inmany cases, however, it is desirable to minimizea priorisequence biases.

2.5. Local Moves and Optimization

All heuristics for RNA design use “local moves” tonavigateC in an attempt to further improve the se-quence. The most obvious move, i.e., changing thecolor of a single vertex ofG, however, will typicallynot be feasible as it destroys compatibility. Instead weneed to always replace all vertices belonging to one con-nected component of the dependency graph. For de-signs with a single target this reduces to mutating asingle unpaired base or replacing a base pair with an-other one. This type of structure-dependent moves isalso used, for instance, to explore neutral networks ofsequences folding into the same secondary structures(Schusteret al., 1994). As the number of target struc-tures grows the dependency graph will have larger butfewer connected components. This means that fractionof the sequence changed in a single “local move” be-comes larger and larger.

In the context of sequence design with design goalsspecified in terms of the energy landscape, locality interms of sequence is highly desirable. The RNA energymodel has the interesting property that the difference in

minimum free energy between two sequences that dif-fer by a single nucleotide is bounded by a constantc(Fontanaet al., 1993). This is a consequence of the ad-ditivity of the energy model, which limits the effect ofa mutation to the maximum energy difference betweentwo adjacent loops upon removing their separating basepair, in practice twice the maximum stacking energy. Asan immediate consequence

| f (x,Θ) − f (x′,Θ)| ≤ cdH(x, x′) (7)

wheredH( . , . ) denotes the Hamming distance. Smallchanges in the sequence therefore cause only moderatechanges in the Boltzmann distribution of structures andare thus less prone to destroying achievements of pastoptimization steps.

The design goals are represented by an objectivefunction Ξ : C → R that assigns a “fitness” to eachsequencex, i.e., a feasible coloring ofG. We use a sim-ple, Simulated Annealing-like strategy to optimizeΞ. Ineach step, a candidatex′ is generated by a local move inone of the components of G. We acceptx′ if

Ξ(x′) ≤ Ξ(x) + t, t ∼ exp(λ) (8)

The new candidate sequencex′ is always accepted if itis better according to the optimization criterionΞ thanits parentx. To avoid locally optimal traps, a candidatesequence is also accepted if the energy difference is lessthan an exponentially distributed random variate (drawnanew each time). The parameterλ controls the speedwith which local optima are left again.

2.6. Design Goals

This fitness functionΞ can combine many featuresof the energy landscape ofx that can be expressedin terms of the secondary structure model. Examplesof such building blocks are properties of the Boltz-mann ensemble of secondary structures ofx such asits partition functionZ(x), the ensemble free energyg(x) = −RT ln Z(c), the minimum free energyf (x) =minΘ f (x,Θ), the base-pairing probability matrixP(x),and the energy of a given structuref (x,Θ). All theseproperties are readily computed by RNA folding algo-rithms as implemented, for instance, in theVienna RNA

Package (Hofackeret al., 1994; Lorenzet al., 2011).An basic design task, on which we focus here, is

to construct RNA sequences for which the prescribedstructuresΘi have nearly the same folding energy andwhich together dominate the Boltzmann ensemble. TheΘi will thus correspond to the ground state and the mostimportant metastable states in the fitness landscape. The

6

simplest fitness function for this task aims at simultane-ously minimizing the energy of theΘi , for instance

Ξ(x) = maxi=1...M

f (x,Θi) (9)

Since optimization of equ.(9) forces an increase in thefraction of the most under-represented target structure,it leads to comparable abundances of all prescribedstructures. The advantage of this ansatz is that it canbe evaluated very efficiently, requiring only the deter-mination of the energy ofM individual secondary struc-tures and avoids the use of the computationally demand-ing RNA folding algorithm. The effort to evaluateΞ(x)is only O(Mn), compared to the cubic inn runtime ofRNA folding. A disadvantage, however, is the lack ofdirect control over the ground state and hence over theensemble in which theΘi are embedded.

Zadehet al. (2011b) argued that design fitness func-tions should not only contain the positive design goalsbut also encapsulate negative design goals, i.e., theyshould explicitly penalize unwanted structures in theBoltzmann ensemble. A good example is the ensembledefectd(x,Θ) defined as the expected base pair distanceof a random structure picked from the Boltzmann en-semble from the target structureΘ. It can be computedin quadratic time fromP(x) (Zadehet al., 2011b). Thesum of the ensemble defects is one of several conceiv-able generalizations of the multi-target design problem.

Flammet al. (2001) used a different form of the ob-jective for bistable structures, aiming directly at mini-mizing the difference between the energies of the indi-vidual structures and the ensemble free energy. ForMstructures, this approach yields

Ξ(x) =

(

K∑

k=1

f (x,Θk) − g(x)

)

+ γ

(

k<l

( f (x,Θk) − f (x,Θl))2

) (10)

The first part of equ. 10 minimizes the difference be-tween the energies of the target structures and the Gibbsfree energy of the ensemble, while the second partyields targets that have approximately the same energy.The weightγ allows us to favour one goal over the other.Fitness functions based on RNA folding are expensiveto evaluate but promise better designs. An appealingapproach is thus to first find a sequence using equ. 9,which is then used as the initial seed for further opti-mization using equ. 10 or another scheme.

Additional design goals can easily be included intoΞ. For instance, a prescribed sequence composition can

be approached by suitable penalty terms for sequencebias. In particular, a log-multinomial function is avail-able that allows penalizing mono-nucleotide distribu-tions that deviate from a user-selected probability vec-tor. More elaborate features of the fitness landscape,such as minimum heights of energy barriers, could alsobe included as discussed in Flammet al. (2001), albeitat high computational cost.

2.7. Summary of the Design Algorithm

The complete design algorithm consists of the follow-ing steps:

1. INPUT: a set of secondary structures{Θi |1 ≤ i ≤M} and the objective functionΞ.

2. Construct the dependency graphG(Θ1, . . . ,ΘM).

3. If G is not bipartite, stop since the design problemis unsolvable.

4. Decompose the graph first into its connected com-ponents, then further into the biconnected compo-nents, and finally construct an open ear decompo-sition for each block.

5. Compute the numbers|c(H,a)| of colorings forthe various subgraphs in the decomposition withfixed color assignments at their attachment and cutpoints.

6. Using these tables, generate sequences with uni-form distribution on the set of compatible se-quences.

7. Optimize these start sequences by local search withrespect to the desired cost functionΞ for the designproblem at hand.

8. OUTPUT: Optimized nucleic acid sequence com-patible with all predefined structures.

3. Results

3.1. Implementation

We opted to implement the algorithm describedabove in the functional programming language Haskell(The GHC Team, 1989–2013; Hudaket al., 2007).Haskell promotes a high-level style of programming andmakes it easy to separate the logically distinct facets ofan algorithm. In terms of implementation, the func-tional style of programming sometimes requires ex-pressing an algorithm differently than known from theimperative world (Okasaki, 1999; Bird, 2010). Here,this concerns in particular the graph decomposition al-gorithm and the evaluation of candidate sequences.

7

dot.ps

C C G C G C U C G G G G U U C C U G G G U U U G U U A U C G C U C A A G U G G G U G G U A G C U C A G A U U A G C G G U U A A A G C U A U U G C U U G U U C G U G C G G U A G U G G G U U C G G G A A G U U U C C G G G C C A C G G C

C C G C G C U C G G G G U U C C U G G G U U U G U U A U C G C U C A A G U G G G U G G U A G C U C A G A U U A G C G G U U A A A G C U A U U G C U U G U U C G U G C G G U A G U G G G U U C G G G A A G U U U C C G G G C C A C G G CCC

GC

GC

UC

GG

GG

UU

CC

UG

GG

UU

UG

UU

AU

CG

CU

CA

AG

UG

GG

UG

GU

AG

CU

CA

GA

UU

AG

CG

GU

UA

AA

GC

UA

UU

GC

UU

GU

UC

GU

GC

GG

UA

GU

GG

GU

UC

GG

GA

AG

UU

UC

CG

GG

CC

AC

GG

C

CC

GC

GC

UC

GG

GG

UU

CC

UG

GG

UU

UG

UU

AU

CG

CU

CA

AG

UG

GG

UG

GU

AG

CU

CA

GA

UU

AG

CG

GU

UA

AA

GC

UA

UU

GC

UU

GU

UC

GU

GC

GG

UA

GU

GG

GU

UC

GG

GA

AG

UU

UC

CG

GG

CC

AC

GG

C

CC

GC

GC

UC

GG

GGU U

CC

UG

GG

U U U GU

UA

UC

GC

UC

AA G

UG

GG

UG

GU

AG

C UC

AG

AUUA

GC

GG U

UA

AA

GC

UA

UU

GC U

UG

UUC

GU

GC

GG

UA

GUGG

GU

UC

GG

GA

AGU

U UC

CG

GG

CC

AC

GG

C

CCG

CGCUCGGGGUUCCUGGGUUUGUUAUCGC

UC

AAGUGGGUGGUAGCU

CA

GAUU

AG

CGGUU

AA

AGCUAUUGCUUGUU

CG

UGCGGUAGUGGGUUCGGGAAGUUUCCGGGC

CA

CGGC

11418

24

2120

16

6

975

25

1017

1215

3

411

2213

238

19

2

1 100 10000 1e+06 1e+08Time [arbitrary units]

0

0.2

0.4

0.6

0.8

1

Pop

ulat

ion

Pro

babi

lity

Figure 6: A designed sequence for the SV11 riboswitch.(Dotplot) The upper triangular matrix depicts the probability for eachindividual nucleotideto be paired. The lower triangular plot shows the two structural constraints.(Structures) In red, the lower-energy structure that forms a Y-shapedmulti-branched loop and two additional exterior loops, and in blue the single long helix structure. The red structure is almost equivalent to theminimum-free energy structure which forms a single additionalAU base pair, resulting in a three-nucleotide hairpin insteadof a five-nucleotidehairpin with an additional gain of 0.6 kcal/mol. Accordingly, the second structure in the suboptimal ensemble is the red structure. This is followedby the blue structure at the third position in the ensemble with a difference of 0.9 kcal/mol. (Barrier tree (left bottom) and folding kinetics (rightbottom)) The red and blue curve correspond to the target structures and are dominant in the kinetics. The dashed lines are structures that are verysimilar (base pair distance of five or less) to the target structures. As the energy distance to the open chain is too large toinclude in the barrier andkinetics calculations, we started from a structure (colored black) that is somewhat related to the red target.

The ear decomposition algorithm of Maonet al.(1986), which we use to handle complex components ofthe dependency graph, is implemented using the func-tional graph library (Erwig, 1997). The decompositionalgorithm by Maonet al. (1986) adapts well to a func-tional description as it is not described in terms of anexplicit graph coloring, but rather as a decomposition ofthe original graph into a spanning tree, tree edges, andnon-tree edges. The resulting ears are then colored bylegal assignments of base pairs. The laziness propertiesof algorithms implemented in Haskell make it possibleto handle assignments with a large number of legal as-

signments without having to explicitly store them.

The evaluation of candidate sequences is a poten-tial performance bottleneck, as it requires evaluationof the energy of sequence candidates given the struc-ture constraints. We make use of fusion (Hinzeet al.,2011), a compiler optimization technique aimed at re-moving intermediate data structures in functional pro-grams, which often yields executables with a runtimeperformance comparable to that ofC implementations.In particular, we use stream fusion (Couttset al., 2007;Leshchinskiy, 2009) during sequence sampling. Energyevaluations are performed in a functional version of the

8

Vienna RNA folding algorithms (Lorenzet al., 2011),which are also fused (Honer zu Siederdissen, 2012).

In order to facilitate the exploration of different ob-jective functions, the user can supplyΞ on the commandline as a function of the primitive features outlined inSect. 2.6. It is easy to extend both the design algorithmand the command line parser to include additional termsif necessary. The current implementation ofRNAdesign

uses equ.(10) as default objective function.In many cases it is important to enforce sequence

constraints. For example, the Shine-Dalgarno sequence,the start codon, and the sequence of the ligand-bindingaptamer are typically fixed in design problems for ri-boswitches. We therefore provide an option to restrictthe set of nucleotides that may be varied during the de-sign process. The current implementation allows the useto specify, for each sequence position, the set of allowednucleotides. It is important to note that sequence con-straints further shrinkC and may render a design prob-lem infeasible even if the prescribed target structures areconsistent.RNAdesign of course detects such cases.

In addition to the functional implementation, we aredeveloping a memory-efficient implementation in C toextend the range of applications to large complex prob-lems, i.e, very long sequences andM ≫ 3 independenttarget structures.

3.2. Artificial SV11-like Bistable RiboswitchesSV11, a 115 nt long RNA, is a recombinant of the

plus and minus strands of the phage-derived MNV-11 RNA. Both molecules are efficient substrates forQβ replicase and arise consistently in artificial selec-tion experiments (Biebricheret al., 1992). SV11 isfrequently used as an extreme example of an RNAwhose properties are determined by folding intermedi-ates rather than its thermodynamic ground state alone.Co-transcriptional folding results in a metastable con-formation consisting of a Y-shaped multibranched struc-ture and two additional exterior hairpin loops, which isreplicated by Qβ replicase. The ground state, in con-trast, is a single long helix structure with a hairpin whichno longer serves as a template for the Qβ replicase. Themetastable structure can spontaneously rearrange to theground state. This transition is effectively irreversiblebecause of an energy difference exceeding 30 kcal/mol,as computed byRNAeval (Lorenzet al., 2011). For thesame reason, the base pair probabilities in the equilib-rium ensemble give no indication of important structuralalternatives.

Because of its extreme properties, the SV11 structurepair has been used repeatedly as an example, includ-ing for design tasks (Lyngsøet al., 2012) whose goal

is to find a sequence that realizes the two conforma-tions with nearly equal energy. In Fig. 6, we show thatour software readily solves this computational problem.The sequence proposed by our algorithm using the de-fault optimization criterion of equ. 10 is almost opti-mal. Both structures differ by only 0.9 kcal/mol. Theminimum-free energy structure differs from the com-plex (red) structure by a singleAU base pair.

Using only the criterion in equ.(10), the algorithm re-quired 120 seconds on anIntel Core i5-3570K. Intotal, 50 candidates were created, with a thinning of 200(i.e., only every 200th candidate is retained) with an ini-tial burn-in period of 100 candidates. Of the 50 candi-dates that are returned, only the top-most was selected.Other, suboptimal, solutions are returned to provide al-ternatives that can be evaluated before running the algo-rithm again.

3.3. A Tri-stable Riboswitch

In Fig. 7 we present a small, artificial example of atri-stable system with three prescribed target structures(red, green, and blue). The computational design prob-lem is solved by our tool using the default fitness func-tion equ.(10) within 10 000 optimization steps, amount-ing to 45 seconds on standard PC hardware. The de-signed sequence readily folds into exactly these threestructures. The red structure is the minimum-free en-ergy structure, the green and the blue ones are the firsttwo sub-optimal local minima in the energy landscape.Alternative structures with non-negligible probabilityhave a small base pair distance from one of the targets.As indicated by the partition function dotplot, the tar-gets are very well represented in the structural ensem-ble. Base pairs that are not part of one of the threedesign goals are very rare. There are, however, some“mixed” structures that facilitate the transition betweenthe three local optima. We use a barrier tree to visualizethe landscape of the designed sequence. Simulated fold-ing kinetics, starting from the open chain, shows that thethree target structures are, again, the three most promi-nent structures.

3.4. A Large-scale Set of Multi-stable Targets

The example above shows the effectiveness of our al-gorithm in designing sequences for tri-stable targets. Todemonstrate that our method scales well to larger de-sign problems, we generated an ensemble of 100 de-sign problems, each produced from a random sequenceof length 100 as follows: We first usedRNAshapes(Reeder and Giegerich, 2005) to extract the three moststable coarse-grained structures and their most stable

9

dot.ps

G C A U U U U G G U G U C A A A G C U C C C C C G A G C G G U U A U G C

G C A U U U U G G U G U C A A A G C U C C C C C G A G C G G U U A U G CGC

AU

UU

UG

GU

GU

CA

AA

GC

UC

CC

CC

GA

GC

GG

UU

AU

GC

GC

AU

UU

UG

GU

GU

CA

AA

GC

UC

CC

CC

GA

GC

GG

UU

AU

GC

GCAUUU

UG G U G U

CA

A A GC

UCC C

CCG

AG

CGGU

UAUGC

GCAU

UU

U G GU

GU

CA

A A GC

UCC C

CCG

AG

CGG

UUAU

GC

GCAU

UUUGGUG U

C A A A GC

UCC C

CCG

AG

CGGUUA

UGC

1

2

3

4

5

6

7

8

9

10

11

12

13

14

1 100 10000 1e+06 1e+08Time [arbitrary units]

0

0.2

0.4

0.6

0.8

1

Pop

ulat

ion

Pro

babi

lity

Figure 7: Example of a tri-stable switch molecule.(Dotplot) All structures have great statistical weight within the thermodynamic equilibrium.Base pairs belonging to one structure are colored red, green, or blue, respectively. All structures share four base pairs, which are colored black.(Structures) The three structures are the three optimal structures in the set of suboptimal structures and have a difference of 0.0 (the top-rightstructure is the mfe structure), 0.2, and 0.3 kcal/mol to the minimum-free energy structure.(Barrier tree (left, bottom) and folding kinetics(right, bottom)) The desired structures of the tri-stable switch form the mostprominent local minima in the folding landscape. Kinetic curves inbrownish colors correspond to mixed conformations where compatible structural features of different local minima structures have been blendedinto a single structure.

fine-grained representatives (“shreps”). The SV11 se-quence, for example, has (at least) two shapes: the low-energy rod-like structures with shape[] and the high-energy complex structure[[][]][][] with a Y-shapedmulti-branched loop and two additional external stems.This way we ensure that the design problem is feasible.We solve the design problems using the fitness func-tion equ.(10), perform a single optimization run for eachproblem, and retain a single top-scoring sequence, asoutput.

In order to evaluate the quality of the design se-quences we investigated their energy landscapes in moredetail. UsingRNAsubopt (Lorenzet al., 2011) we pro-duced all suboptimal structures within 5 kcal/mol of theground state, and determined the suboptimal structureΘ′i within this energy band that is closest to the designgoalΘi . Ideally, the base-pairing distancedbp(Θi ,Θ

′i )

should vanish andf (Θi) should be very close to the

ground state for all three design targets. In 96% of thedesign problems, all three target structures were indeedcontained within the 5 kcal/mol energy band, and evenin the remaining few cases, a very similar structure wasobserved within this range. Fig. 8 shows the energy dif-ferences for the 100 designs as a scatter plot. In most ofthe designs, one of the three targets is the ground state.The mean and median energy differences between theground state and the worst of the three target structuresis only 1.0 and 1.5 kcal/mol, respectively. Overall, thesedata show that our approach produces close-to-optimaltri-stable designs reliably and efficiently.

To assess the increase in difficulty of designing se-quences for more than three target structures simultane-ously, we repeated the large-scale experiment, but thistime using four target structures instead of three. Theresults shown in the lower panel of Fig. 8 lead us to be-lieve that our approach does indeed scale to very com-

10

0

1

2

3

4

5∆E

[kca

l/mol

] target structures sorted by energyM=3

0

1

2

3

4

5

∆E [k

cal/m

ol]

design instances

target structures sorted by energyM=4

Figure 8: Performance of the sequence design algorithm on 100ran-domly generated instances.(Top) Tri-stable targets. Of the 300 struc-tures, the 288 that appear within the 5 kcal/mol window are shown.The remaining 12 structures have 1≤ dbp ≤ 14. The lowest-energystructure (red) is typically very close (mean 0.93, median 0.4 kcal/moldifference) to the mfe structure, the second (mean 1.17, median 0.7kcal/mol; green) and third (mean 1.41, median 0.9 kcal/mol; blue)structure are more distant. (A small amount of jitter was added tobetter separate data points).(Below) Four target structures. Only 24of the 400 target structures lie outside the 5 kcal/mol window. Theenergy differences to the ground state for the energy-sorted targets arered (mean 1.85, median 1.6 kcal/mol), green (mean 2.17, median 1.9kcal/mol), blue (mean 2.40, median 2.05 kcal/mol), and black (mean2.62, median 2.3 kcal/mol).

plex multi-stable targets. As the quality of the generatedsequences also degenerated slightly with median differ-ences from 1.6 to 2.3 kcal/mol, further investigation intocomplex multi-structure targets will be required. Nev-ertheless, these differences only amount to roughly 1 to3 stacked base pairs. Again, we automatically selectedthe top-scoring sequence for each target instead of try-ing the, say, best five sequences for each structure.

4. Discussion and Conclusions

We have shown that multi-stable RNA sequenceswith prescribed alternative secondary structures can beconstructed efficiently by means of a generic computa-tional approach. WithRNAdesign we provide an effi-cient implementation that combines an exact solution ofa graph coloring problem with the heuristic optimiza-tion of feasible solutions by local search. When morethan two target structures are prescribed, a combinato-rial consistency condition must be satisfied. For triplesof targets and moderate sequence length, the designproblem frequently has feasible solutions, although theprobability decreases exponentially with chain length.Randomly generated sets of four or more target struc-tures, however, typically cannot be realized by the samesequence. Since very few multi-stable RNAs have beendescribed in the literature, we resorted to artificial testcases to verify that our approach solves the computa-tional problem at hand.RNAdesign can accomodate a wide range of design

goals. Although our test cases focus on nearly equalenrichment of target structures in the Boltzmann ensem-ble, more complex features of the fitness landscapes caneasily be incorporated. As discussed by Flammet al.(2001) for the case of bi-stable structures, it is feasi-ble (albeit computationally demanding) to estimate fora candidate sequencex the energy barrierf ,(x,Θi ,Θ j)between target structuresΘi andΘ j . For moderate se-quence lengths, this can be computed exactly by usingRNAsubopt andbarriers, and for longer sequences apath-based heuristic provides at least an upper bound(Morgan and Higgs, 1998; Flammet al., 2001). On thisbasis, it is even possible to estimate kinetic parameterssuch as first passage times to target structures (Wolfin-geret al., 2004). It will be easy to extent theRNAdesignso that kinetic parameters of this type can be includedinto the design fitness functionΞ.

The current version ofRNAdesign already supportsinclusion of prescribed energy differences between thetarget conformations. This is desirable, for instance,for the rational design of riboswitches that are triggeredby ligand binding. In this case, the fitness landscape isdistorted by the binding energy of the ligand in certainstructures. This causes a re-folding of the molecule inwhich conformational changes in the ligand binding do-main is used to change adjacent structural domains. Ourrecent construction of a transcriptional riboswitch basedon the theophylline aptamer domain (Wachsmuthet al.,2013) shows that the RNA energy model is sufficientlyaccurate to capture such effects.

We can, therefore, argue that the relative ease with

11

which multistable structures can be designed reflects theevolutionary accessibility of such molecules. Our datasuggest, in particular, that RNA sequences with threeor four disparate local optima with energies close to theground state are abundant and can readily be optimizedby a local search in sequence space. A similar observa-tion has been made by Ramlan and Zauner (2011). Ifsuch structures provide a selective advantage, evolutionshould therefore be able to evolve themde novoin dif-ferent contexts. This immediately raises the question ofwhether multi-stable RNAs have arisen in the history oflife and how abundant they are in nature.

For the case of two alternative structures the an-swer is, of course, affirmative, as demonstrated by adiverse set of riboswitches for a wide variety of lig-ands (Serganov and Nudler, 2013) and several classes ofRNA thermometers (Kortmann and Narberhaus, 2012).Self-induced conformational switches (Nagel and Pleij,2002) act as a kind of timing device. Here, the moleculeis trapped in a metastable structure that either allows ofblocks the RNA’s function. Decay to the ground statethen flips the switch. Molecules undergoing such con-formational changes have also been observed as the out-comes of artificial selection experiments, for instance,selecting for suitability as a template for Qβ replicase(Biebricher and Luce, 1982; Biebricheret al., 1992).

For more than two structural alternatives, the sit-uation is less obvious. No self-induced or smallmetabolite-triggered RNA switch with three or morestructural alternatives has so far been characterized.Complex conformational changes, however, play a rolein splicing and the action of ribozymes including self-splicing introns and other allosteric nucleic acid cata-lysts (Joseet al., 2001; Soukup and Breaker, 2000). Awell-understood system that comes at least close to aself-induced tri-stable RNA switch is theHok/Soksys-tem of plasmid R1 inE. coli (Gultyaevet al., 1997;Møller-Jensenet al., 2001). Allosteric nucleic acid cat-alysts (Joseet al., 2001; Soukup and Breaker, 2000).Here, the binding of one effector causes a change inthe structure of the ribozyme molecule, which in turnallows the binding of a second effector necessary forthe final activation of the enzymatic function. An arti-ficial catalytic system consisting of two RNAs that cat-alyze their ligation with the help of transient hammer-head ribozyme structure relies on several coordinatedstructural rearrangements (Gwiazdaet al., 2012).

The possibility that multi-stable conformationalswitches are a common element beyond simpleON/OFF switches in RNA-based regulation leads tothe question of whether RNA-based circuits provide ancompact – and hard to disentangle – implementation of

complex regulatory programs. Beyond such intriguingperspectives on RNA biology, we encountered also sev-eral non-trivial computational problems that provide in-teresting avenues for future research on rational RNAdesign.

Instead of a “fair” starting sequence sampled uni-formly from C , one might want to “stack the deck”as much as possible in favour of a successful design.This invites the question of whether there are any ef-ficient dynamic programming algorithms that computethe sequence that minimizes the sum of free energies ona prescribed set of structures. A promising way to ad-dress this question is a generalization of theintaRNA

approach of Busch and Backofen (2006) to multiplestructures. Another obvious challenge is to improve thecoloring step using an explicit construction for ear de-compositions that guarantee small values ofα andβ.

Since the design goals for more than two sequencesare not feasible in general, one may be interested in aslight relaxation of the structure in{Θi}, i.e., in a set{Θ′i }that is as close as possible to the original and for whichthe design is feasible. A natural objective function forthis task is, for instance,

i dG(Θ′i ,Θi) for some graphedit distancedG( . , . ). A simpler, but maybe less natu-ral, approach is to directly edit the dependency graphG,i.e., by removing a minimal number of edges.

An alternative approach to relaxing the structuralconstraints is to allow a small number of non-canonicalbase pairs. TheCONTRAfold algorithm by Doet al.(2006) considers all 16 possible base pairings insteadof just the canonical six. Another solution is to usethe space of extended secondary structures (Honer zuSiederdissenet al., 2011), which also considers all 16possible base pairs, and, in addition, explicitly annotatesnucleotide pairings with the nucleotide edge engagedin pairing. As both of these models basically have noconstraints, the space of candidate sequences is unre-stricted. However, since canonical base pairs are morelikely than non-canonical base pairs, it makes sense toalways constrain the search space to those sequences forwhich canonical pairings predominate. Formally, thisequates to allowing some –but not too many– color con-flicts.

Finally, it is desirable to impose more sophisticatedconditions on sequence composition. We currentlyallow penalizing candidate sequences according to amono-nucleotide model. It seems feasible to exploredi-nucleotide distributions instead of the current mono-nucleotide model. Such models have already beenused in a gene prediction context (Gesell and Washietl,2008), and their impact on sequence design will be in-teresting to explore.

12

TheRNAdesign tool opens the door to the largely un-explored realm of tri-stable and even higher-level multi-stable structures which is of utmost interest for syntheticbiology. With small modifications of the energy modelour approch can easily be extended to interacting multi-stable RNA molecules, a topic that is of particular inter-est for the design of small trans-acting and multistableself-assembling RNAs.

Acknowledgements

This work has been funded, in part, by the AustrianFonds zur Forderung der wissenschaftlichen Forschung(FWF) Projektnummer F43“RNA regulation of thetranscriptome” and theDeutsche Forschungsgemein-schaft (STA 850/15-10).

Availability

The RNAdesign software can downloaded fromhttp://www.tbi.univie.ac.at/~choener/

sequencedesign-submission.tar.gz

References

Abfalter I, Flamm C, Stadler PF, 2003. Design of multi-stable nucleicacid sequences. In: Mewes HW, Heun V, Frishman D, Kramer S,editors, Proceedings of the German Conference on Bioinformat-ics. GCB 2003, vol. 1, (pp. 1–7). Munchen, D: belleville VerlagMichael Farin.

Aguirre-Hernandez R, Hoos HH, Condon A, 2007. ComputationalRNA secondary structure design: empirical complexity and im-proved methods. BMC Bioinformatics 8:34.

Andronescu M, Fejes AP, Hutter F, Hoos HH, Condon A, 2004. Anew algorithm for RNA secondary structure design. J Mol Biol336:607–624.

Avihoo A, Churkin A, Barash D, 2011.RNAexinv: An extendedinverse RNA folding from shape and physical attributes to se-quences. BMC Bioinformatics 12:319.

Baumstark T, Schroder AR, Riesner D, 1997. Viroid processing:Switch from cleavage to ligation is driven by a change from atetraloop to a loop E conformation. EMBO J 16:599–610.

Biebricher CK, Diekmann S, Luce R, 1992. In vitro recombina-tion and terminal elongation of RNA by Qβ replicase. EMBO J11:51129–5135.

Biebricher CK, Luce R, 1982. Structural analysis of self-replicatingRNA synthesized by Qβ replicase. J Mol Biol 154:629–648.

Bird R, 2010. Pearls of Functional Algorithm Design. CambridgeUniversity Press.

Burghardt B, Hartmann AK, 2007. RNA secondary structure design.Phys Rev E Stat Nonlin Soft Matter Phys 75:021920.

Busch A, Backofen R, 2006. INFO-RNA–a fast approach to inverseRNA folding. Bioinformatics 22:1823–1831.

Cao Y, Woodson S, 2000. Refolding of rRNA exons enhances disso-ciation of the tetrahymena intron. RNA 6(9):1248–1256.

Coutts D, Leshchinskiy R, Stewart D, 2007. Stream Fusion: FromLists to Streams to Nothing at All. In: Proceedings of the 12thACM SIGPLAN international conference on Functional program-ming, ICFP’07, (pp. 315–326). ACM.

Ding Y, Lawrence CE, 2001. Statistical prediction of single-strandedregions in RNA secondary structure and application to predictingeffective antisense target sites and beyond. Nucleic Acids Res29:1034–1046.

Do CB, Woods DA, Batzoglou S, 2006. CONTRAfold: RNA sec-ondary structure prediction without physics-based models.Bioin-formatics 22:e90.

Erwig M, 1997. Functional programming with graphs. ACM SIG-PLAN Notices 32:52–65.

Flamm C, Hofacker IL, Maurer-Stroh S, Stadler PF, Zehl M, 2001.Design of multi-stable RNA molecules. RNA 7:254–265.

Fontana W, Stadler PF, Bornberg-Bauer EG, Griesmacher T, Ho-facker IL, Tacker M, Tarazona P, Weinberger ED, Schuster P, 1993.RNA folding landscapes and combinatory landscapes. Phys RevE47:2083–2099.

Gao JZ, Li LY, Reidys CM, 2010. Inverse folding of RNA pseudoknotstructures. Alg Mol Biol 5:27.

Gesell T, Washietl S, 2008. Dinucleotide controlled null models forcomparative rna gene prediction. BMC bioinformatics 9:248.

Gultyaev A, Franch T, Gerdes K, 1997. Programmed cell death byhok/sok of plasmid R1: coupled nucleotide covariations reveal aphylogenetically conserved folding pathway in the hok family ofmRNAs. J Mol Biol 273(1):26–37.

Gwiazda S, Salomon K, Appel B, Muller S, 2012. RNA self-ligation:From oligonucleotides to full length ribozymes. Biochimie94:1457–1463.

Hinze R, Harper T, James DW, 2011. Theory and Practice of Fusion.Implementation and Application of Functional Languages (pp.19–37).

Hofacker IL, Fontana W, Stadler PF, Bonhoeffer LS, Tacker M, Schus-ter P, 1994. Fast folding and comparison of RNA secondary struc-tures. Monatsh Chem 125:167–188.

Honer zu Siederdissen C, 2012. Sneaking Around concatMap: Effi-cient Combinators for Dynamic Programming. In: Proceedingsof the 17th ACM SIGPLAN international conference on Func-tional programming, ICFP ’12, (pp. 215–226). New York, NY,USA: ACM. URL http://doi.acm.org/10.1145/2364527.

2364559.Honer zu Siederdissen C, Bernhart SH, Stadler PF, Hofacker IL,

2011. A folding algorithm for extended RNA secondary structures.Bioinformatics 27:129–136.

Hudak P, Hughes J, Peyton Jones S, Wadler P, 2007. A Historyof Haskell: Being Lazy with Class. In: Proceedings of thethird ACM SIGPLAN conference on History of programming lan-guages, HOPL III, (pp. 1–55). ACM.

Isaacs FJ, Dwyer D, Collins JJ, 2006. RNA synthetic biology.NatureBiotech 24:545–554.

Jensen TR, Toft B, 1994. Graph Coloring Problems. New York: JohnWiley & Sons.

Jose A, Soukup G, Breaker R, 2001. Cooperative binding of effectorsby an allosteric ribozyme. Nucleic Acids Res 29(70:1631–1637.

Kortmann J, Narberhaus F, 2012. Bacterial RNA thermometers:molecular zippers and switches. Nat Rev Microbiol 10:255–265.

Leshchinskiy R, 2009. Recycle Your Arrays! Practical Aspects ofDeclarative Languages (pp. 209–223).

Levin A, Lis M, Ponty Y, O’Donnell CW, Devadas S, Berger B,Waldispuhl J, 2012. A global sampling approach to designingand reengineering RNA secondary structures. Nucleic AcidsRes40:10041–10052.

Lorenz R, Bernhart SH, Honer zu Siederdissen C, Tafer H, Flamm C,Stadler PF, Hofacker IL, 2011. ViennaRNA Package 2.0. Algo-rithms for Molecular Biology 6.

Lyngsø RB, Anderson JW, Sizikova E, Badugu A, Hyland T, Hein J,2012.Frnakenstein: multiple target inverse RNA folding. BMCBioinformatics 13:260.

13

Maon Y, Schieber B, Vishkin U, 1986. Parallel ear decompositionsearch (EDS) and ST-numbering in graphs. Theor Comp Sci47:277–298.

Mathews DH, Disney MD, Childs JL, Schroeder SJ, Zuker M, TurnerDH, 2004. Incorporating chemical modification constraints into adynamic programming algorithm for prediction ofRNA secondarystructure. Proc Natl Acad Sci USA 101:7287–7292.

Møller-Jensen J, Franch T, Gerdes K, 2001. Temporal translationalcontrol by a metastable RNA structure. J Biol Chem 276:35707–35713.

Morgan SR, Higgs PG, 1998. Barrier heights between groundstates ina model of RNA secondary structure. J Phys A 31:3153–3170.

Nagel JH, Pleij CW, 2002. Self-induced structural switchesin RNA.Biochimie 84(9):913–23.

Ogawa A, 2011. Rational design of artificial riboswitches based onligand-dependent modulation of internal ribosome entry in wheatgerm extract and their applications as label-free biosensors. RNA17:478–488.

Okasaki C, 1999. Purely functional data structures. Cambridge Uni-versity Press.

Ramlan EI, Zauner KP, 2011. Design of interacting multi-stablenucleic acids for molecular information processing. Biosystems105:14–24.

Ray PS, Jia J, Yao P, Majumder M, Hatzoglou M, Fox PL, 2009. Astress-responsive RNA switch regulates VEGFA expression.Na-ture 457:915–919.

Reeder J, Giegerich R, 2005. Consensus shapes: an alternative to theSankoff algorithm for RNA consensus structure prediction. Bioin-formatics 21:3516–3523.

Reidys C, Stadler PF, Schuster P, 1997. Generic properties of combi-natory maps: Neutral networks of RNA secondary structures. BullMath Biol 59:339–397.

Schnall-Levin M, Chindelevitch L, Berger B, 2008. Inverting theViterbi algorithm: an abstract framework for structure design.In: Proceedings of the 25th international conference on Machinelearning, ICML ’08, (pp. 904–911). New York, NY, USA: ACM.

Schuster P, Fontana W, Stadler PF, Hofacker IL, 1994. From se-quences to shapes and back: A case study in RNA secondary struc-tures. Proc Roy Soc Lond B 255:279–284.

Serganov A, Nudler E, 2013. A decade of riboswitches. Cell 152:17–24.

Shu W, Liu M, Chen H, Bo X, Wang S, 2010.ARDesigner: a web-based system for allosteric RNA design. J Biotechnol 150:466–473.

Soukup G, Breaker R, 2000. Allosteric nucleic acid catalysts. CurrOpin Struct Biol 10(3):318–325.

Tacker M, Stadler PF, Bornberg-Bauer EG, Hofacker IL, Schuster P,1996. Algorithm independent properties of RNA structure predic-tion. Eur Biophy J 25:115–130.

Taneda A, 2011. MODENA: a multi-objective RNA inverse folding.Adv Appl Bioinform Chem 4:1–12.

The GHC Team, 1989–2013. The Glasgow Haskell Compiler (GHC).http://www.haskell.org/ghc/.

Topp S, Gallivan JP, 2010. Emerging applications of riboswitches inchemical biology. ACS Chem Biol 5:139–148.

Wachsmuth M, Findeiß S, Weissheimer N, Stadler PF, Morl M, 2013.De novodesign of a synthetic riboswitch that regulates transcrip-tion termination. Nucleic Acids Res 41:2541–2551.

Waldispuhl J, Clote P, 2007. Computing the partition function andsampling for saturated secondary structures of RNA, with respectto the Turner energy model. J Comput Biol 14:190–215.

Whitney H, 1932. Non-separable and planar graphs. Trans AmerMath Soc 34:339–362.

Wieland M, Benz A, Klauser B, Hartig JS, 2009. Artificial ribozymeswitches containing natural riboswitch aptamer domains. Angew

Chem Int Ed 48:2715–2718.Win MN, Liang JC, Smolke CD, 2009. Frameworks for programming

biological function through RNA parts and devices. Chem Biol16:298–310.

Wolfinger MT, Svrcek-Seiler WA, Flamm C, Hofacker IL, StadlerPF,2004. Exact folding dynamics of RNA secondary structures. JPhys A Math Gen 37:4731–4741.

Zadeh JN, Steenberg CD, Bois JS, Wolfe BR, Pierce MB, Khan AR,Dirks RM, Pierce NA, 2011a. NUPACK: Analysis and design ofnucleic acid systems. J Comput Chem 32:170–173.

Zadeh JN, Wolfe BR, Pierce NA, 2011b. Nucleic acid sequence de-sign via efficient ensemble defect optimization. J Comput Chem32:439–452.

14