Ribozyme-catalysed RNA synthesis using triplet building blocks

80
1 1 2 3 Ribozyme-catalysed RNA synthesis using triplet building blocks 4 5 6 7 Authors: James Attwater 1 , Aditya Raguram 1 †, Alexey S. Morgunov 1 , Edoardo Gianni 1 & 8 Philipp Holliger 1 * 9 10 Affiliations: 11 1 MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, 12 Cambridge, CB2 0QH, UK. 13 Current address: Harvard University, Department of Chemistry and Chemical Biology, 12 14 Oxford Street, Cambridge, MA 02138, USA. 15 *Correspondence to: Philipp Holliger ([email protected]). 16

Transcript of Ribozyme-catalysed RNA synthesis using triplet building blocks

1

1

2

3

Ribozyme-catalysed RNA synthesis using triplet building blocks 4

5

6

7

Authors: James Attwater1, Aditya Raguram

1†, Alexey S. Morgunov1, Edoardo Gianni

1 & 8

Philipp Holliger1*

9

10

Affiliations: 11

1 MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, 12

Cambridge, CB2 0QH, UK. 13

† Current address: Harvard University, Department of Chemistry and Chemical Biology, 12 14

Oxford Street, Cambridge, MA 02138, USA. 15

*Correspondence to: Philipp Holliger ([email protected]). 16

2

Abstract 17

RNA-catalyzed RNA replication is widely believed to have supported a primordial 18

biology. However, RNA catalysis is dependent upon RNA folding, and this yields structures that 19

can block replication of such RNAs. To address this apparent paradox we have re-examined the 20

building blocks used for RNA replication. We report RNA-catalysed RNA synthesis on structured 21

templates when using trinucleotide triphosphates (triplets) as substrates, catalysed by a general 22

and accurate triplet polymerase ribozyme that emerged from in vitro evolution as a mutualistic 23

RNA heterodimer. The triplets cooperatively invaded and unraveled even highly stable RNA 24

secondary structures, and support non-canonical primer-free and bidirectional modes of RNA 25

synthesis and replication. Triplet substrates thus resolve a central incongruity of RNA 26

replication, and here allow the ribozyme to synthesise its own catalytic subunit ‘+’ and ‘–’ strands 27

in segments and assemble them into a new active ribozyme. 28

3

Introduction 29

The premise that some RNA sequences can catalyse and template their own replication - 30

synthesizing both their own ‘+’ and ‘–’ strands - underpins current thinking about early genetic 31

systems (Crick, 1968; Orgel, 1968; Szostak et al., 2001). Any ancient ribozyme with such RNA 32

replicase capability seems to be lost, but efforts to recreate RNA self-replication in the laboratory 33

are ongoing (Martin et al., 2015) as a critical test of the “RNA world” hypothesis (Gilbert, 1986). 34

Early on, derivatives of naturally occurring self-splicing introns (Doudna et al., 1991; Green & 35

Szostak, 1992; Hayden & Lehman, 2006) as well as later in vitro evolved ligase ribozymes 36

(Lincoln & Joyce, 2009; Sczepanski & Joyce, 2014) were shown to be able to assemble one of 37

their own strands from cognate constituent RNA segments. A critical drawback of such systems is 38

their use of specific preformed building blocks of at least 8 nucleotides (nt) average length, 39

limiting their potential for open ended evolution, and precluding their replication from pools of 40

random-sequence oligonucleotide substrates (Green & Szostak, 1992; Doudna et al., 1993). 41

In a contrasting approach, RNA polymerase ribozymes (RPRs) have been developed that 42

can use general monomer building blocks (ribonucleoside 5’ triphosphates (NTPs)) in RNA-43

templated RNA synthesis (Johnston et al., 2001; Zaher & Unrau, 2007; Wochner et al., 2011; 44

Attwater et al., 2013b; Horning & Joyce, 2016), akin to the activity of modern proteinaceous 45

polymerases. However, even the most highly-evolved RPRs (Horning & Joyce, 2016) are 46

substantially impeded by template secondary structures. Such structures are ubiquituous in larger, 47

functional RNAs (including the RPRs themselves) and generally indispensable for function. The 48

strong inhibitory role of this central feature of RNA leads to an antagonism between the degree to 49

which an RNA sequence is able to fold into a defined three-dimensional structure to encode 50

function (such as catalysis) and the ease with which it can be replicated (Boza et al., 2014). This 51

ostensible “structure vs. replication” paradox would have placed stringent probability constraints 52

4

on the emergence of an RNA replicase and generally impeded the ability of RNA to function as 53

an early genetic polymer. 54

We wondered whether this paradox might be evaded through a re-consideration of 55

plausible building blocks for early RNA replication. Models of non-enzymatic polymerisation of 56

all four activated ribonucleotides – the presumed source of the first RNA sequences – yield pools 57

of di-, tri- and tetranucleotide- etc. length oligonucleotides (in decreasing abundance) dominating 58

the population alongside longer products (Monnard et al., 2003). Here we have examined 59

whether substrates of such lengths can support RNA-catalyzed RNA replication, by developing a 60

ribozyme capable of iterative templated ligation of 5’-triphosphorylated RNA trinucleotides 61

(henceforth called triplets). This heterodimeric triplet polymerase ribozyme demonstrated a 62

striking capacity to copy a wide range of RNA sequences, including highly structured, previously 63

intractable RNA templates, as well as its own catalytic domain and encoding template in 64

segments. Its characterization revealed emergent properties of triplet-based RNA synthesis, 65

including cooperative invasion and unraveling of stable RNA structures by triplet substrates, bi-66

directional (both 5’-γ’ and γ’-5’) and primer-free (triplet-initiated) RNA synthesis, and fidelity 67

augmented by systemic properties of the random triplet pools. 68

5

Results 69

In vitro evolution of triplet polymerase activity 70

We set out to explore the potential of short RNA oligonucleotides as substrates for RNA-71

catalyzed RNA replication. To do this we required a ribozyme capable of general, iterative RNA-72

templated oligonucleotide ligation. Previously-described RNA polymerase ribozymes such as the 73

‘Z’ RPR (Wochner et al., 2011) can use NTPs to iteratively extend a primer hybridized to an 74

RNA template, but do not accommodate oligonucleotides bound downstream of the primer or 75

accept them as substrates. However, we detected a weak templated ligation activity in a truncated 76

version of the Z RPR comprising its catalytic core domain (Zcore) (Figure 1a), which supported 77

incorporation of oligonucleotide substrates as short as 3 nt (Figure 1 - figure supplement 1), when 78

incubated in the eutectic phase of water ice (Attwater et al., 2010; Mutschler et al., 2015). 79

To be able to properly examine such RNA trinucleotide triphosphates (triplets) as 80

replication substrates, we first sought to convert Zcore into an effective triplet polymerase 81

ribozyme using in vitro evolution. We devised a selection strategy that required iterative 82

templated triplet ligation by ribozymes to achieve their in cis covalent linkage to a tagged primer 83

(Figure 1 - figure supplement 2). This enables their recovery, amplification and mutagenesis 84

before further rounds of selection to enrich the selection pool in improved triplet polymerase 85

ribozyme variants. 86

We initiated selections from a library of 1.5 × 1015

Zcore variants with a new random γ’ 87

N30 region under eutectic phase conditions that increase RNA half-life and enhance ribozyme 88

activity (Attwater et al., 2010). After 7 rounds of in-ice evolution (Attwater et al., 2013b) one-89

quarter of the selection pool comprised an improved ribozyme (type 0). Its core domain (0core, 90

Figure 1a) could catalyse the iterative polymerization of multiple triplets allowing us to begin to 91

investigate the properties of triplet-based RNA replication. 92

6

Significantly, we found that 0core could catalyze triplet polymerisation on a series of 93

structured templates, which had proven intractable to the parental Z RPR (Figure 1b). Here, 94

primer extension exhibited a steep sigmoidal dependence upon triplet concentrations (Figure 1c), 95

suggestive of a cooperative invasion and unraveling of template secondary structures by the 96

triplet substrates themselves. Although still inefficient, the fact that the nascent activity of the 97

0core ribozyme could already copy templates that had confounded an established RPR 98

encouraged us to continue to seek improved triplet polymerase ribozymes to leverage this 99

substrate behaviour. 100

101

Emergence of cooperativity and characterisation of a ribozyme heterodimer 102

We continued selections for a further 14 rounds. At this point, the type 0 ribozyme had 103

gone extinct, replaced by six new types of RNA each characterised by a unique γ’ domain (Figure 104

2a, Figure 2 - figure supplement 1). Type 1 RNAs were most abundant, comprising ~50% of pool 105

sequences but mysteriously were catalytically inactive with diverse mutations in their core 106

domains. In contrast, the type 2-6 RNAs all displayed triplet polymerase activity, but fell short of 107

the polyclonal activity of the selection pool (Figure 2 - figure supplement 2). To attempt to 108

explain this discrepancy, we explored potential interactions among the different pool lineages, 109

and found that addition of an equimolar amount of type 1 RNA substantially enhanced triplet 110

polymerase activity of all the other ribozyme types 2-6 (Figure 2b). 111

Dissecting type 1 RNA function, we found that 5’ truncation of the region that previously 112

contacted the primer/template duplex (Shechner et al., 2009) did not affect its cofactor activity 113

(Figure 3a, Figure 3 - figure supplement 1). As judged by gel mobility shift (Figure 3b) and 114

activity enhancement (Figure 3 - figure supplement 1), type 1 RNA appears to form a 1:1 115

heterodimeric complex directly with active triplet polymerase ribozymes. Our attention was 116

drawn to their selection construct-derived 5’ hairpin elements, which differed between active 117

7

triplet polymerases (‘cap+’, Figure 3a, Figure 2 - figure supplement 1) and the most common type 118

1 variants in the selection pool where this hairpin had acquired a mutation (yielding ‘cap–’, 119

Figure 3a). ‘cap–’ was dispensible for type 1’s cofactor activity, but when replacing ‘cap+’ in 120

active triplet polymerases it abolished both their activity enhancement by type 1 (Figure 3 - figure 121

supplement 1) and complex formation (Figure 3b). This points to the ‘cap+’ hairpin as the critical 122

site of interaction with type 1; ‘cap–’ in type 1 presumably served to deter its homodimerisation. 123

Indeed, transplanting the ‘cap+’ element could make the parental ribozymes (Zcore & Z 124

RPR) receptive to activity enhancement by type 1 RNA (Figure 2b, Figure 3 - figure supplement 125

2). The catalytically inert type 1 RNA thus represents a general, mutualistic RNA species. This 126

molecular symbiont appears to have emerged spontaneously during in vitro evolution by forming 127

a heterodimeric holoenzyme with triplet polymerase ribozymes, enhancing their activity and their 128

joint recovery prospects. 129

In complex with type 5 (the fastest enriching triplet polymerase ribozyme in the final 130

selection pool), type 1 boosts polymerization of triplets (or longer oligonucleotides) to enable 131

synthesis of long RNAs (Figure 3c). Here it became apparent that type 1 also obviates the need 132

for ribozyme-template tethering. Due to their poor affinity for primer/template duplex (Lawrence 133

& Bartel, 2003), RPRs generally depend upon such tethering to template (Attwater et al., 2010; 134

Wochner et al., 2011; Horning & Joyce, 2016), which enhances local ribozyme concentration 135

and promotes formation of the RPR-primer/template holoenzyme (Attwater et al., 2010; Attwater 136

et al., 2013a). In contrast, the triplet polymerase heterodimer appears to have a capacity for true 137

intermolecular, sequence-general interaction with primer-template duplexes, which enables 138

holoenzyme formation and copying of RNA templates without requiring specific ribozyme-139

template hybridization sites. 140

141

Secondary structure invasion by triplet substrates 142

8

We performed an additional five rounds of in vitro evolution to further evolve the type 5 143

triplet polymerase ribozyme (now in the presence of truncated type 1 RNA), diversifying the 144

previously-fixed γ’ domain reverse transcription primer binding sequence. This reselection 145

yielded a shorter final heterodimeric triplet polymerase holoenzyme, hereafter termed ‘t5+1’ 146

(Figure 4). This robust triplet polymerase activity now proved suitable for exploring the scope 147

and potential of triplet-based RNA replication. 148

As a first examination of t5+1

activity, we revisited triplet-based RNA synthesis on 149

structured templates. To provide a stringent test of template structure inhibition, we now 150

examined hairpin-containing templates (4S, 6S, 8S) with increasing RNA hairpin stability and 151

estimated TMs of up to 9γ˚C (8S). The latter had previously strongly arrested even the most 152

advanced mononucleotide RPRs at higher temperatures (Horning & Joyce, 2016). However, 153

using triplets as substrates t5+1

robustly copied all of these (Figure 5a), even when templates were 154

pre-folded allowing RNA secondary structures to form prior to triplet addition (Figure 5 - figure 155

supplement 1). The triplet concentration-dependent cooperative structure invasion and unraveling 156

(previously observed with the simpler 0core domain and partly wobble-paired RNA template 157

structures (Figure 1b, c)) was recapitulated with t5+1

and the highly stable 8S hairpin template 158

(Figure 5 - figure supplement 2). In contrast, dinucleotide triphosphate substrates yielded 159

extension only up to the structured region (Figure 5 - figure supplement 2). 160

We began to explore whether triplet-based RNA synthesis by t5+1

might exhibit the 161

generality required not just for synthesis of arbitrary structured sequences, but for replication of 162

functional sequences (requiring synthesis of both ‘+’ and ‘’ strands). Encouragingly, t5+1

could 163

synthesise both a functional fluorescent ‘+’ strand of the 52 nt Broccoli RNA aptamer (Filonov et 164

al., 2014) and its encoding ‘’ strand template from their 13 (+) & 12 () different constitutive 165

triplets (Figure 5b). 166

167

9

Ribozyme sequence self-synthesis and assembly 168

We next turned to the critical test of generality: could triplet substrates allow self-169

synthesis? As t5+1

currently lacks the efficiency to synthesize RNAs its own length, we divided 170

the catalytic t5 ribozyme into five segments , , , and . This segmentation strategy (akin to 171

that used by some RNA viruses e.g. influenza) could reduce tertiary structures (Doudna et al., 172

1991; Mutschler et al., 2015) and ease product separation during RNA replication (Szostak, 173

2012). Starting from ~8 nt RNA primers, t5+1

achieved synthesis of the +, +

, and + segments 174

from their constitutive triplets as well as all of the ‘’ strand segments -, -

, -, -

and -, but 175

required some triplets pre-linked (as e.g. hexanucleotides) for synthesis of full-length + and +

176

segments (Figure 6a). 177

Operating across 70 distinct ligation junctions in these reactions including AU-rich 178

sequences, t5+1

using triplet substrates demonstrates the sequence generality for self-synthesis. 179

Notably, the average extent of ligation per junction during synthesis of t5 ‘+’ and ‘’ strands 180

(78%) was similar to that observed when t5+1

used an unstructured model template (74%, Figure 181

3c) upon which the parental Z and other RPRs excel (Attwater et al., 2013b; Horning & Joyce, 182

2016). 183

At this point we tested whether the broad oligonucleotide ligation capacity of t5+1

(Figure 184

3c) might allow assembly of synthesised ‘+’ strand segments. Indeed, t5+1

could assemble these 185

into + and +

fragments, guided only by partially overlapping ‘’ strands (Figure 6b, Figure 6 186

- figure supplement 1). Through non-covalent association (Vaish et al., 2003; Mutschler et al., 187

2015), the ribozyme-synthesised + and +

fragments spontaneously reconstituted a new 188

catalytically active triplet polymerase ribozyme (with in vitro transcribed type 1 RNA). We found 189

that this synthesis product could regenerate fresh - segment using t5

+1 ribozyme-synthesised +

190

(left over from ribozyme assembly) as a template (Figure 6c), recapitulating elements of a self-191

10

replication cycle. However, while the t5+1

ribozyme displays a nascent capacity for templated 192

synthesis of its own catalytic domain ‘+’ strands (and ‘’ strands), efficiency of both segment 193

synthesis and assembly will need to be increased significantly to realise a full self-replication 194

cycle (which would also require synthesis and replication of the type 1 subunit). 195

196

Primer-free RNA synthesis 197

Templated ‘+’ strand self-synthesis is a central element of ribozyme self-replication. 198

However, a limitation of our above strategy in the context of triplet-based self-replication is the 199

continued requirement for some pre-synthesized longer oligonucleotides to act as primers and 200

occasional substrates (together providing here the equivalent of ~1/4 of triplet junctions pre-201

ligated). Specific oligonucleotide substrates were used in places to compete out inhibitory mutual 202

hybridisation between ‘’ strand template and corresponding ‘+’ strand unstructured elements in 203

the t5 ribozyme (Figure 6 - figure supplement 2). In vitro selections that stabilise the ribozyme 204

tertiary structure (Figure 6 - figure supplement 3) may contribute to attenuating this requirement. 205

Additionally, more concentrated triplet substrates can successfully compete with ribozyme 206

unstructured elements for hybridization to ‘’ strand templates (Figure 6 - figure supplement 2). 207

The majority of specific oligonucleotides, however, were provided as primers to initiate 208

syntheses, as required by all RPRs akin to the activity of replicative polymerases in biology. As a 209

consequence of this, the priming sequence would effectively be excluded from evolution. 210

Furthermore, RNA oligonucleotides able to act as specific primers are unlikely to be prevalent in 211

prebiotic substrate pools, and their depletion during successive replication cycles could lead to 212

sequence loss at genome ends. This ‘primer problem’ has previously been noted in the context of 213

nonenzymatic replication (Szostak, 2012) as one of the fundamental obstacles to RNA self-214

replication. 215

11

Unexpectedly, triplet substrates provide a route to bypass the ‘primer problem’. We 216

observed that t5+1

can extend primers bidirectionally, in both the canonical 5’-γ’ as well as the 217

reverse γ’-5’ directions (Figure 7a). This not only allows completion of RNA synthesis from 218

either template end but initiation from anywhere along a template potentially allowing non-219

classical hierarchical or distributive RNA replication schemes as previously proposed (Szostak, 220

2011, 2012). Given this flexibility, we wondered if t5+1

even had a requirement for a primer 221

oligonucleotide. Indeed, the t5+1

ribozyme could achieve ‘primer free’ RNA synthesis (whereby 222

synthesis is presumably initiated by ligation of adjacent triplets anywhere on the template), as 223

exemplified here for the + segment (Figure 7b), as well as ‘primer free’ RNA replication as 224

shown for the ‘+’ and ‘’ strands of the segment, which can be replicated using triplets alone 225

(Figure 7c). 226

Thus the capacity of triplet substrates to pre-organise themselves on a template not only 227

enables replication of structured templates but allows complete copying of some RNA sequences 228

exclusively from triplet building blocks, suggesting an alternative to the canonical end-primed 229

replication strategies inspired by PCR. Such a ribozyme operating in a more distributive 230

polymerisation mode might be able to replicate RNA sequences directly from the putative pools 231

of short random RNA oligonucleotides furnished by prebiotic chemistry. 232

233

Fidelity of triplet-based RNA synthesis 234

Next we investigated the consequences of using defined modern analogues of such 235

prebiotic pools as a source of substrates for the t5+1

triplet polymerase ribozyme. Random 236

sequence triplet pools (‘pppNNN’, comprising equimolar amounts of all 64 triplets) could be used 237

as substrates by t5+1

in segment syntheses in place of defined triplet sets (Figure 6 – figure 238

12

supplement 4). Furthermore, extension activity remained robust upon pool supplementation with 239

noncanonical dinucleotide and mononucleotide substrates (Figure 6 – figure supplement 5). 240

However, a replicase must incorporate the correct template-complementary substrate from 241

random sequence pools, or genetic information may become irretrievably corrupted during 242

replication (Eigen, 1971). Sequence fidelity is therefore a critical parameter of RNA replication. 243

The fidelity challenge is exacerbated in triplet-based RNA replication by the need to discriminate 244

between 64 distinct substrates; indeed, a previous investigation into the incorporation of 245

individual trinucleotides indicated that misincorporations could outstrip cognate incorporation for 246

some triplets (Doudna et al., 1993). 247

In order to assess the fidelity of triplet polymerase ribozymes of widely differing activity, 248

we identified the triplets incorporated from random ppp

NNN triplet pools using 12 different 249

compositionally representative NNN triplet sequences as templates in a consistent sequence 250

context (5’-GGG-NNN-GGG-3’), whose collation allowed an estimation of ribozyme 251

misincorporation tendencies. On average, the starting Zcore ribozyme exhibited ~91% fidelity per 252

position (Figure 8a), lower than that described for RPRs (92% (Horning & Joyce, 2016) – 97% 253

(Attwater et al., 2013b)). Furthermore, its accuracy exhibited a pronounced downward gradient 254

from the 1st (5’) to the 3

rd (γ’) triplet position, highlighting escalating risks to fidelity of synthesis 255

founded on longer building blocks. 256

To investigate if ribozymes could exhibit higher triplet incorporation fidelity, we had 257

included a persistent adaptive pressure for fidelity during in vitro evolution, spiking in an excess 258

of mispairing γ’-deoxy ‘terminator’ triplets from round 9 onwards, precluding recovery of 259

ribozymes that incorporated these mispairs (Figure 2 - source data 1, Figure 4 - source data 1, 260

Figure 2 - figure supplement 2). This yielded reshaped and improved fidelity profiles in the 261

‘surviving’ type 2-6 ribozymes (Figure 8a). Notably, the final t5+1

ribozyme achieves an average 262

positional fidelity of 97.4% using ppp

NNN in this sequence context, higher than the best RPR 263

13

fidelity with NTPs under comparable eutectic conditions (Attwater et al., 2013b). Deep 264

sequencing of internal triplet positions of a defined sequence (+ segment) synthesised by t5

+1 265

using ppp

NNN indicated similar fidelity could be achieved during longer product synthesis (Table 266

1). 267

268

Molecular basis of triplet polymerase ribozyme fidelity 269

Having established that accurate triplet-based copying is possible (in at least some 270

sequence contexts), we sought to understand how the triplet polymerase ribozyme achieves it. 271

Investigating the fidelity contributions of different t5+1

ribozyme components, we found that the 272

type 1 RNA cofactor did not contribute; rather, fidelity gains appeared to be mediated by the 273

newly-evolved t5 ‘’ γ’-domain, as its deletion (yielding the truncated ‘’ ribozyme) reverted 274

its fidelity profile towards that of Zcore (Figure 8 - figure supplement 1). Presence of the 275

domain did not uniformly increase fidelity, but selectively reduced the most acute errors at the 2nd

276

and 3rd

triplet positions (with over 10-fold reductions for some errors, Figure 8b, Figure 8 - figure 277

supplement 2). Overall error rates at the 2nd

and 3rd

triplet positions were reduced by 4-fold and 9-278

fold compared to Zcore (Figure 8a), though slightly increased (1.3-fold) at the 1st triplet position 279

due to a localised asymmetric tolerance of G:U wobble pairing (Figure 8b). The domain fidelity 280

function is contingent upon the presence of a downstream triplet, operating only with basal 281

fidelity for final triplet incorporation (Figure 8 - figure supplement 3). 282

Dissecting the molecular determinants of the fidelity phenotype, we found that using 283

triplet substrates modified at the 3rd

position with a 2-thiouracil in place of a uracil (disrupting 284

minor groove hydrogen bonding) rendered the fidelity domain unable to discriminate 285

mismatches (Figure 8c, Figure 8 - figure supplement 3). Previously, a similar replacement of a 286

uracil 2-keto group with a 2-thio modification had been shown to impair Z RPR activity when 287

14

present upstream in the primer/template region (Attwater et al., 2013a), where Z is thought to rely 288

upon sequence-general minor groove contacts through an ‘A-minor’ motif (Shechner et al., 289

2009). Modification at the 3rd

triplet position reverts ’s divergent effects on fidelity at the 290

adjacent 2nd

and the distal 1st triplet positions (Figure 8 - figure supplement 3); disruption of this 291

minor groove contact site thus abolishes overall fidelity domain operation. sensitivity to minor 292

groove composition may be critical to its recognition of cognate Watson-Crick base pairs, 293

reminiscent of Tetrahymena group I intron folding (Battle & Doudna, 2002) and the decoding 294

centre of the ribosome (which also tolerates wobble pairing at the analogous (5’) triplet position) 295

(Ogle et al., 2001). 296

297

Systems-level properties of triplet pools 298

An important contribution to triplet fidelity also appears to arise from unexpected 299

behaviours of the triplet substrates themselves. We observed that in some direct pair-mispair 300

triplet contests, inclusion of their complementary triplets caused a striking (~3-fold) drop in 301

misincorporation errors (Figure 9). A potential explanation may arise from differential formation 302

of triplet:anti-triplet dimers in the reaction: for example, more extensive ppp

GCC:ppp

GGC (than 303

pppACC:

pppGGU) dimer formation would selectively reduce the effective concentration of free 304

pppGCC vs.

pppACC upon inclusion of their complementary

pppGGC and

pppGGU. 305

These pairwise reductions were recapitulated in the presence of random ppp

NNN substrate 306

pools (Figure 9). Indeed, counterintuitively, raising ppp

NNN concentrations from 0.5 to 5 με each 307

almost halved the overall error rate (Figure 9 - figure supplement 1). Although diverse effects 308

upon individual misincorporations were observed, this fidelity enhancement was driven by 309

pronounced reductions in errors where the mismatched triplet has a high GC content compared to 310

the cognate triplet, including common G-U wobble mispairs (Figure 9 - figure supplement 1). 311

Dimer formation among ppp

NNN substrate pools would be expected to selectively buffer the free 312

15

concentrations of the more strongly-pairing GC-rich triplets, which could promote both fidelity 313

and sequence generality through normalization of triplet availability against template (and 314

complementary triplet) binding strength. 315

Indeed, more efficient, higher fidelity segment synthesis was observed when partially 316

mimicking this outcome using an ppp

NNN pool formulated with a reduced G content (Figure 6 – 317

figure supplement 4, Table 1). In a prebiotic scenario, substrate pool composition would have 318

been determined by the abundance and nontemplated polymerization tendencies of the different 319

nucleotides; large biases in these could skew triplet compositions or deplete a triplet (resulting in 320

mismatch incorporation). However, the potential for replication to proceed in different triplet 321

registers may provide a degree of resilience towards such biases. 322

16

Discussion 323

Here we describe the discovery and characterization of a ribozyme (t5+1

) with a robust 324

ability to polymerize RNA trinucleotide triphosphate (triplet) substrates. Unusually, this triplet 325

polymerase ribozyme comprises a heterodimer of a catalytic triplet polymerase subunit (t5) and a 326

non-catalytic RNA cofactor (type 1), which enhances triplet polymerase activity and abrogates 327

the need for template tethering. Such a quaternary structure - involving a heterodimer of a full-328

length and a truncated subunit - is reminiscent of the processivity factors of proteinaceous 329

polymerases such as the heterodimeric p66/p51 HIV reverse transcriptase holoenzyme (Huang et 330

al., 1992). There are multiple examples of dimerization in RNA evolution - such as the VS 331

ribozyme (Suslov et al., 2015), retroviral RNA genome dimerization (Paillart et al., 2004), in 332

vitro evolved heterodimeric RNA liposome binders (Vlassov et al., 2001), and recently the 333

homodimeric CORN fluorescent RNA aptamer (Warner et al., 2017). However, the spontaneous 334

emergence of a general, mutualistic RNA cofactor has not previously been observed for 335

ribozymes and may suggest an underappreciated dimension to the evolutionary dynamics of 336

ribozyme pools under stringent adaptive pressures. Indeed, the extinction of previously dominant 337

species in the selection that were unable to benefit from type 1 enhancement (e.g. type 0, see 338

Figure 3 - figure supplement 2) and succession with cooperative RNA species (Vaidya et al., 339

2012) illustrates the potential for such symbioses to shape RNA molecular ecologies. 340

The t5+1

ribozyme’s principal current shortcoming is its low catalytic efficiency. In the 341

optimal context for mononucleotide polymerase ribozymes, this triplet polymerase heterodimer 342

yields ~4-fold more unligated junctions than the RPR tC19Z (Attwater et al., 2013b), which itself 343

is 240-fold slower than the currently most advanced RPR 24-3 (Horning & Joyce, 2016). Yet 344

despite this modest catalytic power, t5+1

displays much enhanced generality in RNA synthesis and 345

now achieves both copying of previously intractable structured RNA templates, and templated 346

17

synthesis and assembly of an active ‘+’ strand copy of its catalytic domain, suggesting key 347

contributions of the triplet substrates themselves. 348

Indeed, one of the main findings of our work are the compelling advantages triplet 349

substrates appear to offer for sequence general RNA replication. For instance, when binding 350

templates, triplets incur a lower entropic cost per position compared to canonical 351

mononucleotides (thus aiding copying of sequences rich in weakly pairing A and U bases), with 352

particularly helpful stability contributions from intra-triplet base stacking (Eigen, 1971). 353

Furthermore, energetically favourable inter-triplet stacking interactions appear to instigate 354

cooperative binding and unfolding of even highly stable RNA template structures (Figures 1b, 5a) 355

upon reaching the required substrate concentration threshold. In our work, this process is aided by 356

the cold temperature and solute concentration effect of eutectic ice phase formation (Attwater et 357

al., 2010; Mutschler et al., 2015). Counterintuitively, a general solution to the copying of 358

structured RNAs arises not from conditions that disfavour base-pairing (which would also hinder 359

substrate binding), but rather from conditions that promote it. 360

Together these favourable molecular traits serve to pre-organize the template towards a 361

double-stranded RNA duplex with triplet junctions poised for ligation. A triplet/template duplex 362

presents a more ordered, regular target for sequence-general ribozyme docking (by e.g. the 363

domain) than a single stranded template (variably prone to secondary structure formation or 364

sequence-specific interactions with the ribozyme (Wochner et al., 2011)). Such general duplex 365

interactions also underlie other notable features observed in our triplet-based RNA synthesis such 366

as in trans template binding (Figure 3c) as well as the capacity for bidirectional (5’-γ’ / γ’-5’) and 367

primer-free RNA synthesis (Figure 7). 368

Contrary to expectations RNA-catalyzed triplet polymerisation can proceed with a fidelity 369

matching or exceeding even the best mononucleotide RNA polymerase ribozymes (Attwater et 370

al., 2013b; Horning & Joyce, 2016). t5+1

ribozyme fidelity is due to both a readout of cognate 371

18

minor groove interactions by the ribozyme domain (Figure 8) and an unanticipated fidelity 372

boost arising from systems-level properties of triplet pools, that appear to normalize the 373

availability of free triplet (and potentially longer oligonucleotide) substrates against their base-374

pairing strength (Figure 9). Though further work will be required to characterize triplet pool 375

properties, they likely involve formation of cognate or near-cognate triplet:anti-triplet interaction 376

networks, as formation of tRNA dimers via cognate anticodon:anticodon interactions has been 377

observed in a similar concentration range (Eisinger & Gross, 1975). 378

While phylogenetically unrelated, mechanistic analogies between the triplet polymerase 379

ribozyme and the ribosome are apparent. Both are RNA heterodimers that operate in a triplet 380

register along a single-stranded RNA template, whilst enforcing a minor-groove mediated pattern 381

of triplet / anticodon readout (including tolerance of 5’ wobble pairing), suggestive of convergent 382

adaptive solutions to the challenges of replication and decoding. It has long been speculated that 383

the decoding centre of the small ribosomal subunit might have had its origins in an ancestral RNA 384

replicase, but the implied triplet-based character of such a replicase was conspicuously discordant 385

with modern mononucleotide-based replication (Weiss & Cherry, 1993; Poole et al., 1998; 386

Noller, 2012). The utility of triplets as substrates for RNA synthesis and self-synthesis described 387

herein suggests that these early ideas deserve to be reconsidered. In the context of initial 388

uncorrelated evolution of the small and large ribosomal subunits (Petrov et al., 2015), an early 389

reliance upon triplets in RNA replication could have inadvertently supplied a decoding center for 390

translation. 391

In conclusion, the unexpected emergent properties of triplets – including cooperative 392

binding and unfolding of structured RNA templates, enhanced incorporation of AU-rich 393

substrates, and error attenuation (resulting from triplet pool interaction networks) – argue that 394

short RNA oligonucleotides represent predisposed substrates for RNA-catalyzed RNA 395

replication. Some of these benefits might also extend to anticodon behaviour in early translation, 396

19

and to the non-enzymatic replication of RNA (Szostak, 2012), where downstream trinucleotides 397

have recently been shown to enhance incorporation of preceding activated mononucleotides both 398

through stacking and positioning effects (Vogel et al., 2005; Zhang et al., 2018) and the 399

formation of a highly reactive intermediate (Prywes et al., 2016; O'Flaherty et al., 2018). Taken 400

together, the interaction of triplet substrate pools with RNA templates promotes uncoupling of an 401

RNA’s sequence (i.e. information content, and associated folding tendencies) from its 402

replicability, thereby enhancing RNA’s capacity to serve as an informational polymer. 403

20

Materials and Methods 404

Templated RNA-catalysed RNA synthesis 405

Standard ribozyme activity assays (modified where specified) comprise 5 pmol of each 406

ribozyme annealed in 2.5 μl water (80˚C 2 min, 17˚C 10 min), with 2 μl of 1 M MgCl2 and 0.5 μl 407

of 1 M tris•HCl pH 8.3 (at β5˚C, pH raised to 9.2 at 7˚C) then added on ice, and left for >5 min 408

to ensure folding. This was added to 5 pmol each of primer and template and 50 pmol of each 409

triplet pre-annealed in 5 μl water, then frozen on dry ice (10 min) and incubated at 7˚C in a R4 410

series TC120 refrigerated cooling bath (Grant) to allow eutectic phase formation and reaction. 411

Final pre-freezing concentrations of components are displayed throughout (in this 412

example, yielding 0.5 με ribozyme/primer/template, 5 με each triplet, 200 mM MgCl2, 50 mM 413

tris•HCl pH 8.3). Supercooled reactions (Figure 1c) remained liquid by omitting the dry-ice 414

freezing step, maintaining these concentrations. Ice crystal formation upon eutectic phase 415

equilibration, however, concentrates all solutes ~ 4-5 fold (Attwater et al., 2010) to their final 416

operational levels and cooling elevates tris-buffered pH to ~9.2. 417

Some substrate mixes (e.g. ppp

NNN) led to a higher final reaction volume, but eutectic 418

phase equilibration restored standard operational concentrations, also applicable to the four-fold-419

diluted extensions with the fragmented ribozyme (Figure 6c). These used 2 pmol each 420

ribozyme/fragment annealed in 3.25 μl 62 mM MgCl2, 15 mM tris•HCl pH 8.3 (γ7˚C 5 min, 421

ramped to 4˚C at 0.1˚C/s, 4˚C 10 min), with pre-annealed primer/template/substrate 422

(0.5/0.5/5 pmol) added in 0.75 μl water. These reactions, and preparative syntheses (Figure 6 - 423

figure supplement 1, Figure 7c), were supercooled at 7˚C followed by ice crystal addition for 424

quick freezing and optimal activity. 425

Figure 1c extensions were set up by adding buffer, then RNAs (preannealed together, 426

0.1 με final concentrations) to triplets. RNAs for + syntheses were chilled on ice instead of 427

annealing, with ribozyme/MgCl2/tris•HCl pH 8.3 mixed with the other RNAs at 7˚C. 428

21

Oligonucleotide substrates were added equimolar to template binding sites in the 429

primer/template/substrate anneal. NTPs, on the other hand, were added with the MgCl2/tris•HCl 430

pH 8.3 to the ribozyme polymerase. 431

432

Extension product separation 433

At the end of standard incubations, reactions were thawed and 2 μl aliquots added to stop 434

buffer (1 μl 0.44 M EDTA (pH 7.4), with urea to a 6 M final concentration and a 10-20 fold molar 435

excess over template of complementary competing oligonucleotide (see Supplementary file 3) to 436

prevent long product/template reannealing). Samples were denatured (94˚C 5 min) and RNAs 437

separated by 8 M urea 1 TBE denaturing PAGE. 438

To avoid using potentially confounding competing oligonucleotide when purifying 439

extension products, reactions with a biotinylated primer or template (stopped as above) could be 440

purified by bead capture using MyOne C1 (Invitrogen) streptavidin-coated paramagnetic 441

microbeads (using 5 μg pre-washed beads per pmol biotinylated RNA) in 0.5-0.8 bead buffer 442

(BB: 200 mM NaCl, 10 mM tris•HCl pH 7.4 (at β5˚C), 1 mM EDTA, 0.1% Tween-20). After 443

washing twice in BB to remove unbound components, beads were incubated (1 min) in 25 mM 444

NaOH, 1 mM EDTA, 0.05% Tween-20 to denature the duplexes (Horning & Joyce, 2016). To 445

recover biotinylated extension products (e.g. Figure 5b left panels, Figure 6 - figure supplement 1 446

+/+

, Figure 7c left panel) the supernatant was discarded, and beads were washed first in BB 447

with 200 mM tris•HCl pH 7.4, then in BB, then heated (94˚C 4 min) in 95% formamide, 10 mM 448

EDTA to release primers for urea-PAGE. To recover extension products bound to biotinylated 449

templates (e.g. Figure 6 - figure supplement 1 +, +

, +, Figure 7c right panel) the supernatant 450

was removed, neutralized with 500 mM tris•HCl pH 7.4, spin-concentrated using Ultracel 3K 451

filters (Amicon), recovered and denatured in 6M urea/10 mM EDTA before urea-PAGE. + 452

synthesised in Figure 6 - figure supplement 1 was not spin-concentrated, leading to a lower 453

22

recovery yield; + synthesis was denatured directly from the ligation reaction in 60% formamide 454

with excess EDTA. 455

For gel mobility shift assays (Figure 3b), ribozymes were mixed at 0.5 με, pre-annealed 456

and buffer added on ice as for extension reactions, then mixed with 5 loading buffer (50% 457

glycerol, 250 mM tris•HCl pH 8.3, 125 mM MgCl2) for separation by native PAGE (0.5 TB, 8% 458

59:1 acrylamide:bisacrylamide, 25 mM MgCl2, run in a Hoefer SE600 Chroma (upper chamber: 459

0.5TB 50 mM NaOAc, lower chamber: 0.5TB 25 mM Mg(OAc)2) kept at 4˚C in a circulator 460

bath for 6-8 h at 10 W), then SYBR Gold stained. 461

462

Extension product detection, quantification and purification 463

Fluorescent primer extension products were detected using the appropriate laser 464

wavelength on a Typhoon Trio scanner (GE); gel densitometry allowed quantification of RNA 465

synthesis efficiency. Gel contrasts in figures were linearly adjusted to optimize display of bands 466

of differing intensities. 467

The gel in Figure 5b (middle panel) was washed thrice (5 min) in water, incubated with 468

10 με DFHBI-1T ligand in buffer for 20 min to fold full-length broccoli aptamer (as in (Filonov 469

et al., 2015)) and scanned. The ligand was then eluted in three 1 TBE washes (leaving negligible 470

background fluoresence), and stained in 1 TBE with SYBR Gold (1:10000), washed again, and 471

re-scanned to detect all RNA products (left panel); scans were aligned via an adjacent Cy5-472

labelled primer extension lane (not shown). 473

Full-length product yields in the Figure 6 - figure supplement 1 plus-strand syntheses were 474

calculated by running samples of bead-eluted products (or raw reaction for +) alongside known 475

amounts of the positive controls indicated, followed by SYBR-Gold staining. To purify, bead-476

eluted products were run similarly, and excised using UV shadowing. Products were then eluted 477

from the gel fragments in 10 mM tris•HCl pH 7.4, and Spin-X column filtrate (Costar) 478

23

precipitated in 75% ethanol with 1 μl 1% glycogen carrier (omitted for +). Recovered full-length 479

product yields were calculated similarly to reaction yields for +/+

/+, or using A260s for +

, 480

+, +

. 481

482

Fragment sequencing 483

Purified ribozyme- and TGK-synthesized +/+

fragments were sequenced by first 484

ligating a γ’ adaptor (10 U/μl T4 RNA Ligase 2 truncated KQ in 1 RNA ligase buffer (NEB) 485

with 15% PEG-8000 and 2 με AdeHDVLig at 10˚C overnight). These reactions were bound to 486

MyOne C1 microbeads, washed with BB to remove unligated adaptor, and reverse transcribed 487

(50˚C 30 min) with 1 με HDVrec primer using Superscript III (Invitrogen). Beads were washed 488

again then PCR amplified (5 cycles with a 40˚C annealing step, then 20 cycles with a 50˚C 489

annealing step) using GoTaq HotStart master mix (Promega) and 0.8 με each of primers 490

P3HDV, and P5Xα8 or P5X 7, applying tags for high-throughput sequencing (Illumina MiSeq or 491

HiSeq) after PCR product agarose gel purification. + syntheses’ cDNAs were amplified with 492

P3HDV and P5X6. 493

494

Fidelity assay 495

To estimate RNA synthesis fidelity, ribozymes extended primers using ppp

NNN on 496

templates encoding CCC-XXX-CCC, where XXX were 12 different triplet sequences evenly 497

exploring base composition and distribution (see Supplementary file 3; for XXX = ACC, template 498

encodes CCC-ACC-UCC to avoid a terminal run of Gs). 499

Each primer/template pair (0.45/0.525 pmol per reaction) was annealed in 4 mM MgCl2, 500

1 mM tris•HCl pH 7.4 (80˚C 2 min, ramped to 4˚C at 0.1˚C/s, then kept on ice). The 12 pairs 501

were combined in 0.27M MgCl2/67 mM tris•HCl pH 8.3 on ice to discourage primer-template 502

assortment (of which sequencing later revealed negligible levels). 36 pmol of each triplet 503

24

(equivalent to 5 με final concentration after considering eutectic phase equilibration effects upon 504

this more dilute reaction) in ppp

NNN were added to a reaction vessel in 10.8 μl water, to which 505

5.4 μl of the primer/template/buffer mix was added followed by 7.2 pmol of ribozyme pre-506

annealed (80˚C 2 min 17˚C 10 min, ice >5 min) in 1.8 μl water (f.c. equivalent 1 με, in excess 507

over the 0.875 με template to which some ribozymes could tether to enhance extension). 508

Reactions were frozen and incubated (7 d at ˚C) as described above. 509

Reactions were stopped with 3.6 μl 0.44 M EDTA and 10.5 pmol of each template 510

competing oligonucleotide (migrating above product, with marker mutations to ensure exclusion), 511

denatured with 6 M urea, and urea-PAGE separated. After alignment with a fluorescence scan of 512

the gel, a region of the sample lane corresponding to primers extended by +4 to +14 nt was 513

excised (encompassing 2-4 triplet additions), and extension products were eluted, precipitated in 514

77% ethanol with 1 μl 1% glycogen carrier, washed in 85% ethanol and resuspended in water. 515

These extension products were γ’ adaptor ligated as for fragment sequencing. Products 516

were reverse transcribed (0.2 adaptor ligation reaction, 1 με HDVrec primer in Superscript III 517

reaction, 50˚C 30 min) and then PCR-amplified (1/30th

reverse transcription mix, 0.8 με each or 518

primers P3HDV and P5GGGX) for sequencing as above (yielding 2105-410

6 sequences per 519

ribozyme assay). 520

After processing and γ’ adaptor trimming, sequences corresponding to primer extended by 521

CCC + 1-3 additional triplets were collated for analysis. Variations in upstream primer sequences 522

(see Supplementary file 3) allowed the partner template to be identified for each sequenced 523

product; the triplet incorporated after the first CCC was counted. Separately, 10 μl extensions by 524

t5+1

of each primer/template alone with its encoded triplet and ppp

CCC (and ppp

UCC for the ACC 525

pair) were combined for purification and sequencing as above, to allow isolation of the ribozyme-526

mediated errors resulting from inclusion of the other 62 (61 for ACC) triplets in the reaction 527

(versus errors from sequencing, recombination etc.). The counts of cognate triplet (C) & each 528

25

error triplet (E) in the positive control (p) reduced error counts in the experimental samples (x) to 529

yield ribozyme-mediated error counts (Er) thusly: Er = Ex - Ep*(Cx/Cp) (not reducing Ex below 0, 530

and reallocating all reductions to Cr; ppp

CCC counts (and ppp

UCC for the ACC template) remained 531

uncorrected). 532

For each template, counts were then collated at the 1st/2

nd/3

rd positions to yield base-533

specific mutation rates for each position (Figure 8 - figure supplement 2, Figure 8 - source data 534

1). Across the 12 triplets, A, C, G, and U were encoded at each position three times; linear 535

averages were calculated to map the position’s error profile (Figure 8b) and geometric means of 536

the four nucleobases yielded the position’s overall fidelity (Figure 8a, Figure 8 - figure 537

supplement 1). 538

539

Triphosphorylated triplet synthesis 540

Triplets (and some other short oligonucleotides) were prepared from NTPs by T7 RNA 541

polymerase run-off transcription of a 5’ single-stranded DNA overhang downstream of a DNA 542

duplex T7 promoter sequence. In most cases the 5’ overhang encoded (was the reverse 543

complement of) the desired oligonucleotide. These oligonucleotides were short enough to 544

synthesise during the abortive initiation stage of transcription, attenuating sequence constraints on 545

the first bases of the transcript. However, T7 RNA polymerase exhibited tendencies to skip the 546

first (or even second) base (most severe for U>C>A>G before 2nd

position purines: encoding 547

CGU yielded some ppp

GU, encoding UAC yielded just ppp

AC) or use oligonucleotides generated 548

during transcription to re-initiate (e.g. encoding GAG yielded ppp

GAGAG, encoding AAA yielded 549

pppA6-9, encoding UCC yielded

pppCCC, encoding CGC yielded some

pppGCGC; this tendency was 550

most severe when the oligonucleotide could be accommodated opposite the final template bases 551

of the promoter). 552

26

These tendencies could be subverted by encoding additional first bases (usually without 553

providing the corresponding NTP). This initiated the oligonucleotide at the second position where 554

skipping tendencies were lower (e.g. encoding CUAG without CTP yielded ppp

UAG, encoding 555

UUAC yielded some ppp

UAC), and reduced recruitment as initiators of products with bases not 556

complementary to the introduced first position template base (e.g. encoding CGAG without CTP 557

yielded ppp

GAG, encoding CAA without CTP yielded ppp

AA and ppp

AAA, encoding AUCC 558

without ATP yielded ppp

UCC, encoding UCGC without UTP yielded ppp

CGC). 559

Each 30 μl transcription reaction contained 72 nmol of each desired product base as an 560

NTP (Roche) (e.g. for ppp

UCC, 72 nmol UTP, 144 nmol CTP) in 1 MegaShortScript kit buffer 561

with 1.5 μl MegaShortScript T7 enzyme (ThermoFisher). Also present were 15 pmol of each 562

DNA oligonucleotide forming the transcription duplex target (see Supplementary file 2). The 563

reactions were incubated overnight at γ7˚C, stopped with 3 μl 0.44 M EDTA and 17 μl 10 M 564

urea, and separated by electrophoresis (35 W, 4.5 h) on a 35180.15 cm 30% 19:1 565

acrylamide:bis-acrylamide 3 M urea tris-borate gel. Products were identified through their relative 566

migrations (reflecting overall composition, fastest to slowest: C>UA>G) by UV shadowing. 567

Triplet bands were excised and eluted overnight in 10 mM tris•HCl pH 7.4, and filtrate (Spin-X) 568

precipitated with 0.3 M sodium acetate pH 5.5 in 85% ethanol. Pellets were washed in 85% 569

ethanol, resuspended in water, and UV absorbances measured with a Nanodrop ND-1000 570

spectrophotometer. Oligocalc (Kibbe, 2007) was used to calculate sequence-specific 571

concentrations and yields. ppp

NNN was generated by combination of equal amounts of each of the 572

64 triplet stocks in a lo-bind microcentrifuge tube (Eppendorf). 573

γ’-deoxy triphosphorylated ‘terminator’ triplets were transcribed as above but using a γ’ 574

deoxynucleoside 5’ triphosphate (Trilink biotechnologies) for the last position, migrating faster 575

during PAGE than the equivalent all-RNA triplet. Triplets with 2-thiouridine residues were 576

transcribed as for their corresponding U, replacing UTP with U2S

TP (Jena Bioscience); 577

27

incorporation and migration were similar between the two, and their concentrations were 578

calculated from A260nm by comparison to the A260nm of mixtures of the component ribonucleotides 579

with UTP vs. U2S

TP. Triplets with β’-fluoro, β’-deoxy positions could also be transcribed, with 580

lower efficiency, by substituting the corresponding triphosphate (Trilink Biotechnologies). The 581

biotinylated ppp

GAU^^B triplet used in segment synthesis (Figure 7c) was transcribed as for 582

pppGAU, replacing UTP with biotin-16-aminoallyluridine-5’-triphosphate (Trilink 583

Biotechnologies), quantified via by comparison to the A290nm of mixtures of the component 584

ribonucleotides. 585

Longer triphosphorylated oligonucleotides used in ribozyme self-synthesis were generated 586

similarly, but using ~200 ng of fully double stranded DNA as a template. Candidate product 587

bands were purified and the desired oligonucleotide identified by ribozyme-catalysed in-frame 588

incorporation and, for some, fragment sequencing. 589

590

RNA oligonucleotide/ribozyme preparation 591

Transcriptions were performed on ~ 15 ng/μl dsDNA using MegaShortScript enzyme and 592

buffer (ThermoFisher) with 7.8 mM of each NTP, or, to yield a 5’ monophosphate on the product 593

to avoid aberrant ligation (‘GεP transcription’), 10 mM GMP and 2 mM of each NTP. 594

dsDNA templates for some of these (in Supplementary file 3) were generated (‘fill-in’) 595

using three cycles of mutual extension (GoTaq HotStart, Promega) between the associated DNA 596

oligonucleotide and 5T7 (or, where indicated, HDVrt for defined γ’ terminus formation (Schurer 597

et al., 2002)) followed by column purification (QiaQuick, Qiagen). 598

Some 5’ biotinylated RNAs were synthesized using the TGK polymerase (Cozens et al., 599

2012) (56 μg/ml, in 1 Thermopol buffer (NEB) supplemented with 3 mM MgCl2) to extend 5’ 600

biotinylated RNA primers (0.75 με) on DNA templates (1 με) using 2.5 mM of each NTP 601

28

(94˚C 30 s, 45˚C 2 min, 65˚C 30 min, 45˚C 2 min, 65˚C 30 min, then all repeated). Biotinylated 602

products were bead-purified as above. 603

γ’ biotinylation of RNAs was achieved in two stages: γ’ azidylation (at 2 με with 25 U/μl 604

yeast poly-A polymerase (ThermoFisher) and 0.5 mM β’-azido-β’-deoxycytidine triphosphate 605

(Trilink Biotechnologies) for 1 h at γ7˚C) with subsequent acidic phenol/chloroform extraction 606

and 75% ethanol precipitation, then copper-catalysed biotin-(PEG)4-alkyne (ThermoFisher) 607

cycloaddition (Winz et al., 2012) with subsequent 75% ethanol precipitation followed by 608

resuspension and buffer exchange in Ultracel 3K filters (Amicon) to remove residual biotin-609

alkyne. 610

611

Selection library synthesis 612

Round 1 libraries were synthesised by mutual extension of 4 nmol of oligonucleotides 613

1baN30 and 1GMPfo or 1GTPfo at 1 με each in 1 isothermal amplification buffer (NEB) with 614

250 με each dNTP, annealed (80˚C 3 min, 65 ˚C 5 min) before addition of 0.4 U/μl Bst 2.0 615

(NEB) and 30 min 65˚C incubation. 616

After purification, 375 μg of each DNA (~1.51015

molecules) were transcribed in 5 ml 617

transcription reactions (36 mM tris•HCl pH 7.9 (at 25˚C), 1.8 mM spermidine, 9 mM DTT, 618

10.8 mM MgCl2, 2 mM each NTP, 1% 10 MegaShortScript buffer, 2% 1:9 619

MegaShortScript:NEB T7 RNA polymerase, γ7˚C overnight). These were treated with DNase, 620

acid phenol/chloroform extracted and 73% ethanol precipitated prior to urea-PAGE purification, 621

elution, filtering (Spin-X) and re-precipitation, yielding the 1GTP Zcore selection construct 622

(Supplementary file 1). 10 mM GMP was present in transcriptions of the 1GMP construct, and for 623

future transcriptions of the GMP construct selection branch and rounds 8-18; round 19-21 and 624

reselection libraries were transcribed without GMP. Most subsequent selection rounds were 625

29

transcribed in 1/10th

scale transcriptions with 15 μg of DNA (~61013

molecules) derived from 626

amplification of recovered PCR products (see later). 627

For round 8, 700 pmol DNA was formed with Tri3CUUQ, amplifying round 7 merged 628

output (50 pmol), round 7 merged output recombined by StEP (Zhao & Zha, 2006) (200 pmol), 629

and 0core ribozyme with the starting γ’ 30N library domain added (450 pmol, but extinct at the 630

end of selection). DNA encoding type 5s amplified with AACAt5s was used to generate 631

reselection libraries by PCR amplification using primers TriGAA7GAAM and 632

T5ba13N/T5ba20N/T5ba28N; 5 pmol of the three dsDNA products were transcribed to generate 633

reselection constructs. 634

635

In vitro evolution cycle 636

An outline of the selection strategy is shown in Figure 1 - figure supplement 2, with 637

detailed lists of selection oligonucleotides and extension parameters in Figure 1 - source data 1, 638

Figure 2 - source data 1, Figure 4 - source data 1 and Supplementary file 3. Selection construct 639

was first annealed with equimolar dual-5’ biotinylated primer in water (80˚C 2-4 min, 17˚C 10 640

min), then chilled extension buffer and triplets were added before freezing and 7˚C incubation. 641

At the end of incubation the reaction was thawed on ice. To link the primer γ’ hydroxyl to 642

the 5’ monophosphate of GMP constructs, selection constructs were buffer-exchanged directly 643

after thawing using a PD-10 column (GE) in a cold room, into 3 ml ligation mix (optimised to 644

prevent ligation over gaps) (2 mM MgCl2, 50 mM tris•HCl pH 7.4, 0.1 mM ATP, 1 mM DTT, 645

2 με HO

GCG (Rounds 1-7) or 2 με HO

CUG (Rounds 8-18) with 30 U/ml T4 RNA Ligase 2 646

(NEB)). After incubation at 4˚C for 1 h, these were stopped with 2.2 mM EDTA and acid 647

phenol/chloroform treated. 648

Constructs were then precipitated with glycogen carrier and 0.3 M sodium acetate in 649

isopropanol (55%) before resuspension and denaturation (94˚C 4 min, in 6M urea 10 mM EDTA 650

30

with a 3 excess of competing oligonucleotide against the primer). For the reselection rounds, 651

constructs were then treated with polynucleotide kinase (NEB) before denaturation to resolve the 652

HDV-derived β’, γ’-cyclic phosphates and allow later adaptor ligation. 653

Constructs were urea-PAGE separated alongside FITC-labelled RNAs equivalent to 654

successfully ligated constructs. The gel region in the construct lane alongside these was excised to 655

exclude the bulk unreacted construct (judged by UV shadowing). Biotinylated (primer-linked) 656

constructs were eluted overnight into BB with 100 μg MyOne C1 beads. After 30 μm filtering 657

(Partec Celltrics) of the supernatant to remove gel fragments, the beads were washed in BB then 658

denaturing buffer (8 M urea, 50 mM tris•HCl pH 7.4, 1 mM EDTA, 0.1% Tween-20, 10 με 659

competing oligonucleotide, 60˚C 2 min) to confirm covalent linkage of construct to primer, 660

before further BB washing and transfer (to a fresh microcentrifuge tube to minimize downstream 661

contamination). At this stage in the reselection, γ’ adaptors were then ligated to bead-bound 662

constructs as above for 2 h (with buffer/enzyme added after bead resuspension in other reaction 663

components including 0.04% Tween-20), and beads BB washed and transferred again. 664

Bead-bound constructs were now reverse-transcribed using 1 με RTri (or HDVRec for 665

the reselection) by resuspension in a Superscript III reaction with added 0.02% Tween-20 (50˚C 666

30 min). Beads were BB washed and the RNA-bound cDNA γ’ end blocked by incubation with 667

terminal deoxynucleotidyl transferase (ThermoFisher) and 0.2 mM dideoxy-ATP (TriLink) with 668

0.02% added Tween-20 (γ7˚C 30 min), and beads were BB washed and transferred again. 669

cDNAs were eluted (10 μl 0.1 M NaOH 0.1% Tween-20 20 min), neutralized and plus 670

strands regenerated with 0.2 με rescue oligonucleotide in an IsoAmp II universal tHDA kit 671

(NEB) reaction (65˚C 60 min) to read through the structured product region. Whilst at this 672

temperature, reactions were stopped with 5 mM EDTA and 1 vol BB with 50-100 μg beads to 673

bind the nascent biotinylated plus strands at room temperature. These beads were then BB 674

31

washed, NaOH washed again to discard cDNAs (and recover only correctly-primed plus strands), 675

and washed and transferred again. 676

Each 50 μg beads were then subjected to plus strand recovery PCR in a 100 μl GoTaq 677

HotStart reaction with 0.5 με each RTri (or HDVrec for reselection) and RecInt (rounds 1-5) / 678

RecIntQ (rounds 6-9) / RecIntL (rounds 10-14, 19-21 & reselection) / RecIntQL (rounds 15-18). 679

The product was agarose size-purified, A260nm quantified and added to construct synthesis PCR in 680

3 molar amount of the anticipated recovered RNA (judged by test extensions) that yielded it. 681

This final PCR for construct transcription in the subsequent selection round used 1 με of 682

the indicated construct synthesis primer, plus 1 με RTri (or HDVrt for the reselection), in GoTaq 683

HotStart reactions or (where indicated in source data) the GeneMorph II kit for mutagenesis 684

(Agilent). 685

32

References and Notes 686

Attwater, J., Tagami, S., Kimoto, M., Butler, K., Kool, E. T., Wengel, J., Herdewijn, P., Hirao, I., 687

& Holliger, P. (2013a). Chemical fidelity of an RNA polymerase ribozyme. Chemical 688

Science. doi: 10.1039/C3SC50574J 689

Attwater, J., Wochner, A., & Holliger, P. (2013b). In-ice evolution of RNA polymerase ribozyme 690

activity. Nature Chemistry, 5 (12), 1011-1018. doi: 10.1038/nchem.1781 691

Attwater, J., Wochner, A., Pinheiro, V. B., Coulson, A., & Holliger, P. (2010). Ice as a 692

protocellular medium for RNA replication. Nat Commun, 1 (6), 76. doi: 693

10.1038/ncomms1076 694

Battle, D. J., & Doudna, J. A. (2002). Specificity of RNA-RNA helix recognition. Proc Natl Acad 695

Sci U S A, 99 (18), 11676-11681. doi: 10.1073/pnas.182221799 696

Boza, G., Szilagyi, A., Kun, A., Santos, M., & Szathmary, E. (2014). Evolution of the division of 697

labor between genes and enzymes in the RNA world. PLoS Comput Biol, 10 (12), 698

e1003936. doi: 10.1371/journal.pcbi.1003936 699

Cozens, C., Pinheiro, V. B., Vaisman, A., Woodgate, R., & Holliger, P. (2012). A short adaptive 700

path from DNA to RNA polymerases. Proc Natl Acad Sci U S A, 109 (21), 8067-8072. 701

doi: 10.1073/pnas.1120964109 702

Crick, F. H. (1968). The origin of the genetic code. J Mol Biol, 38 (3), 367-379. 703

Doudna, J. A., Couture, S., & Szostak, J. W. (1991). A multisubunit ribozyme that is a catalyst of 704

and template for complementary strand RNA synthesis. Science, 251 (5001), 1605-1608. 705

Doudna, J. A., Usman, N., & Szostak, J. W. (1993). Ribozyme-catalyzed primer extension by 706

trinucleotides: a model for the RNA-catalyzed replication of RNA. Biochemistry, 32 (8), 707

2111-2115. 708

Eigen, M. (1971). Selforganization of matter and the evolution of biological macromolecules. 709

Naturwissenschaften, 58 (10), 465-523. 710

Eisinger, J., & Gross, N. (1975). Conformers, Dimers, and Anticodon Complexes of Transfer-711

Rna-Glu2 (Escherichia-Coli). Biochemistry, 14 (18), 4031-4041. doi: Doi 712

10.1021/Bi00689a016 713

Filonov, G. S., Kam, C. W., Song, W., & Jaffrey, S. R. (2015). In-gel imaging of RNA processing 714

using broccoli reveals optimal aptamer expression strategies. Chem Biol, 22 (5), 649-660. 715

doi: 10.1016/j.chembiol.2015.04.018 716

Filonov, G. S., Moon, J. D., Svensen, N., & Jaffrey, S. R. (2014). Broccoli: rapid selection of an 717

RNA mimic of green fluorescent protein by fluorescence-based selection and directed 718

evolution. J Am Chem Soc, 136 (46), 16299-16308. doi: 10.1021/ja508478x 719

Gilbert, Walter. (1986). Origin of life: The RNA world. Nature, 319 (6055), 618-618. 720

Green, R., & Szostak, J. W. (1992). Selection of a ribozyme that functions as a superior template 721

in a self-copying reaction. Science, 258 (5090), 1910-1915. 722

Hayden, E. J., & Lehman, N. (2006). Self-assembly of a group I intron from inactive 723

oligonucleotide fragments. Chem Biol, 13 (8), 909-918. doi: 724

10.1016/j.chembiol.2006.06.014 725

33

Horning, D. P., & Joyce, G. F. (2016). Amplification of RNA by an RNA polymerase ribozyme. 726

Proc Natl Acad Sci U S A, 113 (35), 9786-9791. doi: 10.1073/pnas.1610103113 727

Huang, S. C., Smith, J. R., & Moen, L. K. (1992). Contribution of the p51 subunit of HIV-1 728

reverse transcriptase to enzyme processivity. Biochem Biophys Res Commun, 184 (2), 729

986-992. 730

Johnston, W. K., Unrau, P. J., Lawrence, M. S., Glasner, M. E., & Bartel, D. P. (2001). RNA-731

catalyzed RNA polymerization: accurate and general RNA-templated primer extension. 732

Science, 292 (5520), 1319-1325. 733

Kibbe, W. A. (2007). OligoCalc: an online oligonucleotide properties calculator. Nucleic Acids 734

Res, 35 (Web Server issue), W43-46. 735

Lawrence, M. S., & Bartel, D. P. (2003). Processivity of ribozyme-catalyzed RNA 736

polymerization. Biochemistry, 42 (29), 8748-8755. 737

Lincoln, T. A., & Joyce, G. F. (2009). Self-sustained replication of an RNA enzyme. Science, 323 738

(5918), 1229-1232. 739

Markham, N. R., & Zuker, M. (2005). DINAMelt web server for nucleic acid melting prediction. 740

Nucleic Acids Res, 33 (Web Server issue), W577-581. doi: 10.1093/nar/gki591 741

Martin, L. L., Unrau, P. J., & Muller, U. F. (2015). RNA synthesis by in vitro selected ribozymes 742

for recreating an RNA world. Life (Basel), 5 (1), 247-268. doi: 10.3390/life5010247 743

Monnard, P. A., Kanavarioti, A., & Deamer, D. W. (2003). Eutectic phase polymerization of 744

activated ribonucleotide mixtures yields quasi-equimolar incorporation of purine and 745

pyrimidine nucleobases. J Am Chem Soc, 125 (45), 13734-13740. 746

Mutschler, H., Wochner, A., & Holliger, P. (2015). Freeze-thaw cycles as drivers of complex 747

ribozyme assembly. Nature Chemistry, 7 (6), 502-508. doi: 10.1038/nchem.2251 748

Noller, H. F. (2012). Evolution of protein synthesis from an RNA world. Cold Spring Harb 749

Perspect Biol, 4 (4), a003681. doi: 10.1101/cshperspect.a003681 750

O'Flaherty, D. K., Kamat, N. P., Mirza, F. N., Li, L., Prywes, N., & Szostak, J. W. (2018). 751

Copying of Mixed-Sequence RNA Templates inside Model Protocells. J Am Chem Soc. 752

doi: 10.1021/jacs.8b00639 753

Ogle, J. M., Brodersen, D. E., Clemons, W. M., Jr., Tarry, M. J., Carter, A. P., & Ramakrishnan, 754

V. (2001). Recognition of cognate transfer RNA by the 30S ribosomal subunit. Science, 755

292 (5518), 897-902. doi: 10.1126/science.1060612 756

Orgel, L. E. (1968). Evolution of the genetic apparatus. J Mol Biol, 38 (3), 381-393. 757

Paillart, J. C., Shehu-Xhilaga, M., Marquet, R., & Mak, J. (2004). Dimerization of retroviral RNA 758

genomes: an inseparable pair. Nat Rev Microbiol, 2 (6), 461-472. doi: 10.1038/nrmicro903 759

Petrov, A. S., Gulen, B., Norris, A. M., Kovacs, N. A., Bernier, C. R., Lanier, K. A., Fox, G. E., 760

Harvey, S. C., Wartell, R. M., Hud, N. V., & Williams, L. D. (2015). History of the 761

ribosome and the origin of translation. Proc Natl Acad Sci U S A, 112 (50), 15396-15401. 762

doi: 10.1073/pnas.1509761112 763

Poole, A. M., Jeffares, D. C., & Penny, D. (1998). The path from the RNA world. J Mol Evol, 46 764

(1), 1-17. 765

Prywes, N., Blain, J. C., Del Frate, F., & Szostak, J. W. (2016). Nonenzymatic copying of RNA 766

templates containing all four letters is catalyzed by activated oligonucleotides. Elife, 5. 767

doi: 10.7554/eLife.17756 768

34

Rohatgi, R., Bartel, D. P., & Szostak, J. W. (1996). Kinetic and mechanistic analysis of 769

nonenzymatic, template-directed oligoribonucleotide ligation. J Am Chem Soc, 118 (14), 770

3332-3339. 771

Schurer, H., Lang, K., Schuster, J., & Morl, M. (2002). A universal method to produce in vitro 772

transcripts with homogeneous 3' ends. Nucleic Acids Res, 30 (12), e56. 773

Sczepanski, J. T., & Joyce, G. F. (2014). A cross-chiral RNA polymerase ribozyme. Nature, 515 774

(7527), 440-442. doi: 10.1038/nature13900 775

Shechner, D. M., Grant, R. A., Bagby, S. C., Koldobskaya, Y., Piccirilli, J. A., & Bartel, D. P. 776

(2009). Crystal structure of the catalytic core of an RNA-polymerase ribozyme. Science, 777

326 (5957), 1271-1275. 778

Suslov, N. B., DasGupta, S., Huang, H., Fuller, J. R., Lilley, D. M., Rice, P. A., & Piccirilli, J. A. 779

(2015). Crystal structure of the Varkud satellite ribozyme. Nat Chem Biol, 11 (11), 840-780

846. doi: 10.1038/nchembio.1929 781

Szostak, J. W. (2011). An optimal degree of physical and chemical heterogeneity for the origin of 782

life? Philos Trans R Soc Lond B Biol Sci, 366 (1580), 2894-2901. doi: 783

10.1098/rstb.2011.0140 784

Szostak, J. W. (2012). The Eightfold Path to non-enzymatic RNA replication. J. Syst. Chem., 3. 785

Szostak, J. W., Bartel, D. P., & Luisi, P. L. (2001). Synthesizing life. Nature, 409 (6818), 387-786

390. 787

Tagami, S., Attwater, J., & Holliger, P. (2017). Simple peptides derived from the ribosomal core 788

potentiate RNA polymerase ribozyme function. Nature Chemistry, 9 (4), 325-332. doi: 789

10.1038/nchem.2739 790

Vaidya, N., Manapat, M. L., Chen, I. A., Xulvi-Brunet, R., Hayden, E. J., & Lehman, N. (2012). 791

Spontaneous network formation among cooperative RNA replicators. Nature, 491 (7422), 792

72-77. doi: 10.1038/nature11549 793

Vaish, N. K., Jadhav, V. R., Kossen, K., Pasko, C., Andrews, L. E., McSwiggen, J. A., Polisky, 794

B., & Seiwert, S. D. (2003). Zeptomole detection of a viral nucleic acid using a target-795

activated ribozyme. Rna, 9 (9), 1058-1072. 796

Vlassov, A., Khvorova, A., & Yarus, M. (2001). Binding and disruption of phospholipid bilayers 797

by supramolecular RNA complexes. Proc Natl Acad Sci U S A, 98 (14), 7706-7711. doi: 798

10.1073/pnas.141041098 799

Vogel, S. R., Deck, C., & Richert, C. (2005). Accelerating chemical replication steps of RNA 800

involving activated ribonucleotides and downstream-binding elements. Chem Commun 801

(Camb) (39), 4922-4924. 802

Warner, K. D., Sjekloca, L., Song, W., Filonov, G. S., Jaffrey, S. R., & Ferre-D'Amare, A. R. 803

(2017). A homodimer interface without base pairs in an RNA mimic of red fluorescent 804

protein. Nat Chem Biol, 13 (11), 1195-1201. doi: 10.1038/nchembio.2475 805

Weiss, Robert, & Cherry, Joshua. (1993). 3 Speculations on the Origin of Ribosomal 806

Translocation. 807

Winz, M. L., Samanta, A., Benzinger, D., & Jaschke, A. (2012). Site-specific terminal and 808

internal labeling of RNA by poly(A) polymerase tailing and copper-catalyzed or copper-809

free strain-promoted click chemistry. Nucleic Acids Res, 40 (10), e78. doi: 810

10.1093/nar/gks062 811

35

Wochner, A., Attwater, J., Coulson, A., & Holliger, P. (2011). Ribozyme-catalyzed transcription 812

of an active ribozyme. Science, 332 (6026), 209-212. 813

Zaher, H. S., & Unrau, P. J. (2007). Selection of an improved RNA polymerase ribozyme with 814

superior extension and fidelity. Rna, 13 (7), 1017-1026. 815

Zhang, W., Tam, C. P., Zhou, L., Oh, S. S., Wang, J., & Szostak, J. W. (2018). Structural 816

Rationale for the Enhanced Catalysis of Nonenzymatic RNA Primer Extension by a 817

Downstream Oligonucleotide. J Am Chem Soc. doi: 10.1021/jacs.7b11750 818

Zhao, H., & Zha, W. (2006). In vitro 'sexual' evolution through the PCR-based staggered 819

extension process (StEP). Nat Protoc, 1 (4), 1865-1871. 820

Zuker, M. (2003). Mfold web server for nucleic acid folding and hybridization prediction. Nucleic 821

Acids Res, 31 (13), 3406-3415. 822

823

824

Acknowledgments: We thank S. Thomsen, M. McKie and M. Sharrock for important 825

discussions, and B.T. Porebski and S.L. James for assistance with sequencing analysis. This work 826

was supported by a Homerton College junior research fellowship (J.A.), a Harvard-Cambridge 827

summer fellowship (A.R.) and by the Medical Research Council (J.A., E.G., A.M., P.H.; 828

programme no. MC_U105178804). 829

830

Conditions for selections are included as source data 1 for Figures 1, 2 and 4. Numerical data for 831

Figures 1c and 3b are included in Figure 1 - source data 2 and Figure 3 - source data 1 832

respectively. Numerical data and calculations for Figure 8 and its supplements 1 and 2 are 833

supplied as Figure 8 - source data 1. Numerical data and calculations for Figure 8 - figure 834

supplement 3 are supplied as Figure 8 - source data 2. A more extensive selection of substrate 835

concentration-dependent error rates is supplied in Figure 9 - source data 1. Sequences of 836

ribozymes, triplet synthesis templates, and oligonucleotides used in this study are supplied in 837

Supplementary files 1, 2 and 3 respectively. 838

36

Figure legends 839

Figure 1. Monomer polymerisation and triplet polymerisation. 840

(A) Scheme outlining initial derivation of a triplet polymerase activity from a mononucleotide 841

polymerase ribozyme via directed evolution. Z RPR truncation effects are shown in Figure 1 - 842

figure supplement 1, the selection cycle is outlined in Figure 1 - figure supplement 2, and the 843

selection conditions of rounds 1-7 are listed in Figure 1 - source data 1. Below, modes of action 844

and secondary structures of the mononucleotide polymerase ribozyme (Z RPR) and a triplet 845

polymerase ribozyme (0core), both depicted surrounding primer (tan) / template (grey) duplexes 846

with a mononucleoside triphosphate (NTP) or trinucleotide triphosphate (triplet) substrate present 847

(red). Here the templates are hybridised to the ribozyme upstream of the primer binding site, 848

flexibly tethered to enhance local concentration and activity (via L repeats of an AACA sequence, 849

e.g. L = 5 in templates SR1-4 below). Z RPR residues comprising its catalytic core (Zcore) are 850

black; mutations in 0core arising from directed evolution of Zcore are in teal. 851

(B) Primer extension by the Z RPR using monomers (1 mM NTPs) or by 0core using triplets 852

(5 με ppp

AUA & ppp

CGC) on a series of 6-nucleotide repeat templates (SR1-4, examples below) 853

with escalating secondary structure potential that quenches Z RPR activity beyond the shortest 854

template SR1 (7˚C ice 17 d, 0.5 με/RNA). Extension by the triplet polymerase ribozyme 0core 855

can overcome these structure tendencies up to the longest template SR4. 856

(C) Triplet concentration dependence of extension using templates SR1 (grey circles) and SR3 857

(black triangles) by 0core (ppp

AUA & ppp

CGC, 0.1 με of primer A10, template and ribozyme, 858

7˚C supercooled 15 d, ± s.d., n = 3); shown below is a model of cooperative triplet-mediated 859

unfolding of template SR3 structure to explain the sigmoidal triplet concentration dependence 860

(red curve) of extension upon it. Numerical values are supplied in Figure 1 - source data 2. 861

Figure 1 - source data 1. Selection conditions of rounds 1-7. 862

Figure 1 - source data 2. Triplet concentration-dependent extension values. 863

864

Figure 2. Emergence of cooperativity during in vitro evolution. 865

(A) Composition of the round 21 selection pool as a % of total pool sequences; selection 866

conditions of rounds 8-21 are listed in Figure 2 - source data 1. Secondary structures of ribozyme 867

37

type 1-6 archetypes are shown in Figure 2 - figure supplement 1, and comparison of their 868

activities to that of the polyclonal selection pool is shown in Figure 2 - figure supplement 2. 869

(B) Primer extension with triplets by these emergent triplet polymerase ribozyme types 2-6 (‘Rz’, 870

alongside the starting Zcore ribozyme with ‘cap+’ sequence from selection, see Figure 3a), alone 871

or with added truncated type 1 (+1, see Figure 3a) which boosted their triplet polymerase 872

activities (0.5 με Rzs/A10 primer/SR3 template, 5 με ppp

AUA & ppp

CGC, 7˚C ice 16 h). 873

Figure 2 - source data 1. Selection conditions of rounds 8-21. 874

875

Figure 3. Heterodimer formation and behaviour. 876

(A) Secondary structures of the most common type 1 and type 5 clones from the selection, with 877

in vitro-selected γ’ domain and core mutations coloured orange (type 1) or blue (type 5). 878

5’ truncation of type 1 (faded, including its putative primer/template interacting region), yielding 879

the minimal type 1 variant ‘1’, maintained its enhancement activity (see Figure 3 - figure 880

supplement 1). The effects of transplanting the indicated 5’ hairpin ‘cap+’ element from type 5 to 881

other ribozymes are shown in Figure 3 - figure supplement 2. The inset shows type 5 ‘cap+’ 882

hairpin element alteration to ‘cap–’ (yielding type 5cap-

). 883

(B) Gel mobility shift characteristic of complex formation resulting from mixing of type 5 884

ribozyme with type 1 RNA (equimolar, or with the indicated equivalents). Type 5cap-

loses this 885

shift and its susceptibility to type 1 activity enhancement (Figure 3 - figure supplement 1). Below, 886

shifted band intensities with increasing type 1 addition are plotted (quantified relative to the 887

indicated type 5+1

lane intensities, n = 4 ± s.d.), signifying 1:1 heterodimer formation; numerical 888

values are supplied in Figure 3 - source data 1. 889

(C) Type 1 enhancement allows type 5 variants to synthesise long RNAs using triphosphorylated 890

oligonucleotide (11 nt) or short triplet (3 nt) substrates (Sub, 3.6 με or 5 με each, substrate 891

sequences in grey beside lanes; 0.4 με primer A11/template I-8, 2 με each Rz, 7˚C ice for 892

16 d). This activity is independent of template tethering (Wochner et al., 2011), as comparable 893

synthesis is achieved by versions of type 5 whose 5’ regions allow or avoid hybridisation to the 894

template (type 5cis

or type 5trans

respectively, schematic above, sequences in Supplementary file 895

1). The average extent of ligation at the end of the reaction amongst all junctions in a lane is 896

shown beneath each lane. 897

Figure 3 - source data 1. Relative intensities of shifted bands when varying type 1 equivalents. 898

38

899

Figure 4. A trans-acting heterodimeric triplet polymerase. 900

Top, scheme outlining derivation of the final t5+1

triplet polymerase archetype from the type 1 901

and type 5 RNAs (shown in Figure 3a) by reselection using the conditions in Figure 4 - source 902

data 1. Below, the secondary structure of this ribozyme heterodimer, 135 nt (1) & 153 nt (t5) 903

long, is depicted operating in trans on a non-tethered primer/template duplex. Type 5 γ’ domain 904

bases that re-emerged after randomisation during reselection are coloured black in the t5 γ’ 905

domain. Ribozyme development is summarized in Figure 4 - figure supplement 1; all ribozyme 906

sequences are listed in Supplementary file 1. 907

Figure 4 - source data 1. Selection conditions of rounds 1-5 of the reselection. 908

909

Figure 5. Triplet-mediated structured and functional template copying. 910

(A) Extension on three structured hairpin templates (4S, 6S, 8S) with increasing stability. Top, the 911

mfold-predicted (Markham & Zuker, 2005) structure and TM of the most stable 8S template; 912

below, primer extensions on these templates by t5+1

triplet polymerase (with 5 με each encoded 913

triplet) or type 1-enhanced Z polymerase ribozyme (Z RPRcap+ +1

, with 1 mM each NTP) (2 με 914

ribozyme, 0.5 με 4S, 6S or 8S template and primer A9, 7˚C ice 25 d). The self-complementary 915

region in each template is indicated between each pair of lanes (shaded by triplet), with the 916

encoded triplet substrate sequences at the left (in grey, with 5’ template overhangs in brackets). 917

Syntheses using different substrate compositions and concentrations are shown in Figure 5 - 918

figure supplement 1 and Figure 5 - figure supplement 2. While all hairpin templates are robustly 919

copied by t5+1

, synthesis by the Z RPR is completely arrested by the 6S and 8S hairpins. 920

(B) Synthesis of the broccoli aptamer. The native secondary structure is shown above (Tan: bases 921

from ‘+’ strand synthesis primer. Green: bases from triplets. Outlined green: primer binding site 922

for the ‘’ strand synthesis). Below left, t5+1

-catalysed synthesis of fluorescent broccoli aptamer, 923

run alongside standard (Broc+, synthesized by in vitro transcription), and stained for RNA with 924

SYBR Gold (magenta) or folded with DFHBI-1T ligand (green fluorescence) (2 με t5+1

, 925

1 με BBrc10/TBrc, 5 με each triplet (in grey), -7˚C ice 22 d). Below right, ‘’ strand synthesis 926

on Broc+ standard (0.5 με without ligand in ribozyme extension buffer, 0.5 με FBrcb6 primer, 927

2 με t5+1

, 5 με each triplet (in grey), 7˚C ice 38 d). t5+1

is able to synthesize both full-length 928

functional (fluorescent) Broccoli ‘+’ and encoding ‘–’ strands. 929

39

930

Figure 6. Ribozyme self-synthesis and assembly of its own catalytic domain. 931

(A) t5+1

-catalysed syntheses of the five catalytic domain ‘+’ and ‘−’ segments via triplet extension 932

of primers (grey) in 7˚C ice. Triplets are coloured by segment and shown alongside the lanes; 933

longer oligonucleotide substrates (faded) were provided for α+ and

+ syntheses to combat 934

ribozyme-template pairing as shown in Figure 6 - figure supplement 2 and Figure 6 - figure 935

supplement 3. The triplets were supplied at 5 με (α+ to

+), 10 με ( −

to −), or 20 με (α−

) each, 936

with 0.5 με primer/template (P/T; 1 με for F 6/T ) and oligonucleotides equimolar to template 937

sites. Use of substrates of more heterogenous compositions and lengths are shown in Figure 6 - 938

figure supplement 4 and Figure 6 - figure supplement 5 respectively. Densitometry gave yields of 939

full-length products (boxed, by % of total primer), and geometric mean of the final extents of 940

ligation (78%) across all 70 junctions in this self-synthesis context. These segment sequences 941

derive from t5b, a t5 variant with a neutral signature mutation (Supplementary file 1). 942

(B) Secondary structure representation of a t5 catalytic domain (+/+

, t5b sequence), formed 943

via non-covalent assembly of t5+1

-synthesised ‘+’ strand fragments in Figure 6 - figure 944

supplement 1, coloured by segment and synthesis substrate as in (A). 945

(C) Activity of ribozyme-synthesised +/+

(B), compared to protein-synthesised +/+

946

and full-length t5b equivalents. These were assayed for synthesis of a -

strand segment on a 947

ribozyme-synthesised + template, with added in vitro transcribed type 1 (2 με each Rz, 5 με 948

triplets, 0.5 με P/T, 7˚C 0.25× ice 10 d). The ribozyme-synthesized and assembled +/+

949

ribozyme is as active as in vitro transcribed equivalents, and can efficiently utilize ribozyme-950

synthesized RNA (+) as a template. 951

952

Figure 7. Triplet-initiated template sequence copying. 953

(A) Extension by t5+1

of fluorescein-labelled primers bound to either the γ’ (A10) or 5’ (pppba) 954

ends of a template (T8GAA, 10 με ppp

GAA, 0.5 με/RNA, 7˚C ice 69 h), demonstrating 955

extension in either 5’-γ’ (A10) or γ’-5’ (pppba) directions. 956

(B) Synthesis of + on T template via t5

+1-catalysed polymerisation of the substrates indicated 957

on the right (2 με t5+1

, 5 με each triplet, 0.5 με template (lane 2: 0.5 με hexanucleotide), 7˚C 958

ice 9 d). Extension products in lanes 2-5 were eluted from template, PAGE-separated and SYBR-959

40

Gold stained alongside in vitro transcribed full-length segment control (lane 1). Lane 3 shows 960

full-length synthesis of +

segment from triplets alone. 961

(C) Triplet based replication of the segment. Top left, synthesis of + from the indicated 962

substrates (1 μM t5+1

, 5 με each triplet, 2 με THP template (lane 2: 2 με Bio7 primer), 7˚C 963

ice 7 d). Biotinylated extension products in lanes 2-4 were isolated from template, PAGE-964

separated and SYBR-Gold stained alongside in vitro transcribed staining marker (M+m1, lane 1). 965

This indicated 10% yield (per template) of full-length synthesis of + segment from triplets alone 966

(the final band in lane 4), which was purified for use as a template in - segment synthesis 967

(bottom right, 1 με t5+1

, 5 με each triplet, 0.05 με template, with 0.05% Tween-20, 7˚C ice 968

27 d). Extension products in lanes 5 & 6 were eluted from template, PAGE-separated and SYBR-969

Gold stained alongside in vitro transcribed full-length segment control (M -m1, lane 7). This 970

indicated 6% yield (per template) of full-length synthesis of - segment from triplets alone (the 971

final band in lane 6). 972

973

Figure 8. Fidelity of ribozyme-catalysed triplet polymerisation. 974

(A) To estimate its fidelity, each ribozyme was provided with an equimolar mix of all 64 triplet 975

substrates (ppp

NNN at 5 με each) for primer extension using templates containing twelve 976

representative trinucleotide sequences (NNN). Deep sequencing of extension products 977

identified the triplets added opposite each template trinucleotide, yielding position-specific error 978

tendencies; the overall fidelity was calculated as a geometric mean of positional errors at each 979

triplet position (n and s.d. of this value shown for ribozymes assayed multiple times, see Figure 8 980

- source data 1 for analysis of collated errors). The triplet polymerases exhibit diverse fidelity 981

profiles; fidelity profiles of other type 5 variants are shown in Figure 8 - figure supplement 1. 982

(B) Collation of error rates by base type and position for type 5 with (t5+1

) and without (α +1) 983

the fidelity domain. Positional and overall fidelities are calculated as geometric means (see 984

Figure 8 - source data 1); individual positional fidelities are plotted in Figure 8 - figure 985

supplement 2. 986

(C) Schematic summary of effects of triplet minor groove modification upon the fidelity 987

phenotype. In the depicted trinucleotide RNA duplex segment, spheres represent minor groove 988

groups potentially available for hydrogen bonding in a sequence-general manner. For three of 989

these groups (highlighted in black), we assayed whether their modification in substrates (2F = β’ 990

41

fluoro, 2S = 2-thio) affected the fidelity domain’s mismatch discrimination capabilities (detailed 991

in Figure 8 - figure supplement 3, with data and calculations in Figure 8 - source data 2). These 992

groups are labelled with the fraction of fidelity phenotype retained () when discriminating 993

between the indicated modified substrates. Colour reflects the impact of that group’s modification 994

upon the fidelity phenotype, with red denoting a strong disruptive effect, and yellow weak or 995

negligible effects. 996

Figure 8 - source data 1. Analysis of collated errors by ribozymes in the fidelity assay. 997

Figure 8 - source data 2. Calculation of residual fidelity phenotypes in Figure 8c. 998

999

Figure 9. Substrate pool interactions improve triplet fidelity. 1000

Applying the fidelity assay (Figure 8a, using t5+1

) to single templates with only an encoded triplet 1001

and a mispairing one as substrates (at 5 με each), we observed that relative mispair incorporation 1002

was proportionally reduced (by 61% (left) and 73% (right)) upon introduction of complementary 1003

triplets. Using all 64 triplets (ppp

NNN) has an analogous effect upon these pair/mispair 1004

comparisons with fidelity progressively improved upon increasing overall ppp

NNN concentrations, 1005

with example effects on other triplets and overall fidelity presented in Figure 9 - figure 1006

supplement 1, and comprehensive error rates and ratios in Figure 9 - source data 1. 1007

Figure 9 - source data 1. Collated error rates and ratios at different substrate concentrations. 1008

42

Figure supplement legends 1009

Figure 1 - figure supplement 1. Templated ligase activity from a mononucleotide 1010

polymerase. 1011

Extension of primer A11 on template HTI by the Z RPR (Wochner et al., 2011) and a γ’ 1012

truncation of it (Zcore, Figure 1a) at 7˚C in ice for 8 d (0.5 με of each RNA), adding as 1013

substrates either a long oligonucleotide (‘11 nt’, pppGCGAAGCGUGU at 0.5 με), a triplet 1014

(‘γ nt’, pppGCG at 5 με), or NTPs (‘1 nt’, at 1 mM each). Gel densitometry was used to calculate 1015

the percentage of primer extended (shown below each lane). Z RPR has only a very limited 1016

template ligation activity (slightly exceeding background nonenzymatic ligation (Rohatgi et al., 1017

1996)). In contrast, γ’ truncation (which abolished mononucleotide polymerase activity) 1018

accommodated and enhanced templated ligation. 1019

1020

Figure 1 - figure supplement 2. Selection scheme for in vitro evolution of triplet polymerase 1021

activity. 1022

Illustration of the in vitro evolution strategy to enrich iterative templated ligase activity from 1023

libraries of selection constructs. Centre, an example selection construct is shown comprising 1024

Zcore with a γ’ 30 nt random domain followed by a fixed hairpin-forming reverse transcription 1025

site (1). This construct is flexibly tethered at its 5’ in cis (Tagami et al., 2017; Attwater et al., 1026

2010) (via L repeats (3-8) of a flexible AACA linker) to a template region (grey) bound to 5’-dual 1027

biotin modified primer (tan). This construct requires successive polymerisation of two ppp

UCG 1028

triplet substrates (red) in ˚C ice followed by self-ligation to its triphosphorylated 5’ terminus to 1029

attach itself to the biotinylated primer (2). Active triplet polymerases are then identified and size-1030

purified by denaturing PAGE to deplete unreacted ribozymes and primer (3), then captured on 1031

streptavidin beads and subjected to a denaturing wash to ensure covalent linkage to primer. 1032

Remaining dual-purified bead-linked ribozymes were then reverse transcribed (4). cDNAs were 1033

eluted with NaOH and neutralised, before acting as templates for ‘+’ strand DNA synthesis from 1034

5’ biotinylated ‘rescue’ oligonucleotides that only prime correctly on the complete synthesised 1035

sequence (5), ensuring that constructs that deviated from the specified templated triplet 1036

polymerization fail to prime on cDNA preventing their ‘+’ strand synthesis. ‘+’ strands were then 1037

captured on fresh streptavidin beads, stripped of cDNAs, reamplified and diversified to generate 1038

selection construct for a new round of selection (6). See Materials and Methods for protocol 1039

43

details and Figure 1 - source data 1, Figure 2 - source data 1, Figure 4 - source data 1 and 1040

Supplementary file 3 for oligonucleotides used in selections. 1041

1042

Figure 2 - figure supplement 1. Secondary structures of type 1-6 ribozymes. 1043

The archetypes shown are active truncations of the most common/enriched variants of each type. 1044

Types 2-6 are depicted hybridised to a flexibly tethered primer (tan) / template (grey) / substrate 1045

(red) duplex analogous to the selection construct (Figure 1 - figure supplement 2), supporting 1046

extension on the SR1-4 templates (Figure 2b). Coloured bases deviate from Zcore or derive from 1047

the random sequence region. Type 1, the most common type in the output, displayed a high 1048

degree of sporadic mutation in the catalytic core domain, consistent with a loss of catalytic 1049

function. Shown here are the four most common core mutations, which were each present in 1050

>45% of type 1 sequences. 1051

1052

Figure 2 - figure supplement 2. Clonal versus polyclonal activity. 1053

Type 1 & 2 RNAs, and the indicated polyclonal ribozyme pools late in the selection, were 1054

transcribed as selection constructs generated with Tri8AUAM (then annealed to primer A10). 1055

Their abilities to incorporate eight ppp

AUA triplets then self-ligate in this selection construct 1056

context were compared (left, 0.1 με each RNA, 2 με ppp

AUA, in 7˚C ice for 5 d; right, 0.2 με 1057

each RNA, 2 με ppp

AUA and ppp

AUGγ’d

, in 7˚C ice for 9 d). 1058

Left: Despite being the most active type (Figure 2b), type 2 activity fell far short of the round 19 1059

pool polyclonal activity. 1060

Right: The incorporation of wobble pairing γ’-deoxy ‘terminator’ triplet ppp

AUGγ’d

(see results 1061

section on fidelity, yielding a faster-migrating band than incorporation of the cognate ppp

AUA 1062

triplet) was decreased sharply after round 19, as calculated by densitometry and averaging 1063

amongst the ligation junctions in each lane, indicative of enrichment of higher fidelity triplet 1064

polymerases in the selection pool. 1065

1066

Figure 3 - figure supplement 1. Parameters of type 1 activity enhancement. 1067

(A) PAGE of primer extension reactions comprising combinations of type 5 and type 1 variants 1068

(0.2 με primer A10, template SR3, and type 5 variant, here annealed together and combined with 1069

44

0.2 με of separately-annealed type 1 variant after buffer addition; 7˚C in ice for 63 h, 2 με 1070

each ppp

AUA, ppp

AUGγ’d

, ppp

CGC). Full-length type 1 enhanced the activity of type 5 as did 5’ 1071

truncated type 1 (‘1’, Figure 3a) without its duplex-interacting A-minor motif (Shechner et al., 1072

2009) and its ‘cap’ sequence (originally present on some type 1 sequences in the selection). 1073

Replacing the type 5 ribozyme’s selected 5’ ‘cap+’ sequence with ‘cap-’ (type 5cap-

) abolished its 1074

enhancement by type 1 RNA. 1075

(B) Stoichiometry of type 1 enhancement of type 3 activity (0.5 με primer A10, template SR3, 1076

2 με each ppp

AUA, ppp

GUAγ’d

, ppp

CGC in -7˚C ice for 20 h); quantification of average extension 1077

per primer in each lane is consistent with the 1:1 stoichiometry implied by type 1’s ~50% 1078

abundance in the round 21 selection pool. 1079

1080

Figure 3 - figure supplement 2. Type 1 enhancement of parental ribozymes. 1081

Left, transplanting the conserved 5’ ‘cap+’ sequence from types 2-6 allows the Zcore but not the 1082

0core ribozyme triplet polymerase activity to be enhanced by type 1; shown are PAGE of 1083

extensions of primer A10 on template CCCMisAUG by the indicated ribozyme cores (5 με of 1084

pppAUG and

pppCCC, 0.5 με of each RNA with type 1 annealed and added separately, 7˚C in ice 1085

for 22 h). Right, RNA polymerase activity of Z RPR using NTPs is enhanced by type 1 addition, 1086

but only when modified with the 5’ ‘cap+’ sequence (Z RPRcap+

), extending primer A10 on 1087

tethered template HTI (0.5 με of each RNA, 4 mM of each NTP, at 4˚C for 68 h). 1088

1089

Figure 4 - figure supplement 1. Summary of ribozyme development in this work. 1090

Left, map of triplet polymerase selection and engineering. Ribozyme sequences are listed in 1091

Supplementary file 1. Right, secondary structures of the central ribozymes in this work. 1092

1093

Figure 5 - figure supplement 1. Substrates for structured template copying. 1094

Primer extension reactions are shown using sets of defined triplet substrates (above the lanes), or 1095

random pools of all 64 triplets (ppp

NNN), each at the indicated concentrations on three templates 1096

with progressively longer, more stable secondary structure elements (right, TMs estimated using 1097

(Zuker, 2003)) (0.5 με primer A9, 0.5 με 4S, 6S, 8S templates, 2 με t5+1

, 7˚C ice for 25 d). 1098

Structure invasion and primer extension across the structured template region (beginning at the 1099

45

dotted red line on the image) occurred in all cases (with increased efficiency at higher substrate 1100

concentrations for the 8S template (5µM)), including extensions where template secondary 1101

structures were allowed to form in the absence of triplets (*, triplets added last after 30 min 1102

supercooled at -7˚C). 1103

1104

Figure 5 - figure supplement 2. Substrate concentration dependence of structured template 1105

copying. 1106

Extension on the 8S structured template with a defined set of complementary triplets (left) or 1107

dinucleotides (right), at the indicated substrate concentrations (0.5 με primer A9, template 8S, 1108

2 με t5+1

, 7˚C ice for 25 d). Left, quantification of the extent of extension into the structured 1109

template region below each lane (measured by the fraction of third triplet addition to primers 1110

already extended by two) recapitulates a sigmoidal relationship with triplet concentration. Right, 1111

dinucleotides (like mononucleotides) exhibit negligible extension into the template hairpin 1112

structure. 1113

1114

Figure 6 - figure supplement 1. Ribozyme catalytic domain self-synthesis and assembly. 1115

Top, conditions and yields for self-synthesis and assembly of the catalytic t5 domain (as α + and 1116

+ fragments, Figure 6b) by t5

+1. 1117

To obtain maximal amounts of fully synthesized and assembled t5b ribozyme for activity testing 1118

(Figure 6c), we implemented t5 ‘+’ strand synthesis using sequence-specified triplet substrates for 1119

segment syntheses (as shown). The synthesis scale corresponds to the limiting component present. 1120

*: Unlike other segment ligation steps, the ligation was carried out using freeze-thaw cycling 1121

(Mutschler et al., 2015) (2.5 h at γ0˚C, 21 h at 7˚C, then 0.5 h at γ7˚C), which modestly 1122

improved yield. 1123

Below, scheme of t5+1

-catalysed t5 synthesis and assembly reactions. 1124

PAGE separations of syntheses (xt) alongside purified reference segments and fragments (in bold) 1125

were stained with SYBR Gold to quantify boxed full-length products. These were excised for use 1126

in subsequent assembly steps (illustrated by dashed black arrows). Synthetic schemes of colour-1127

coded segments are shown beside corresponding lanes, denoting ‘+’ strand synthesis (bold primer 1128

and dashes) on ‘–’ strand template. Fully synthesized segments are shown as bold lines. Assembly 1129

reactions involve use of fully synthesized segments as the final substrate in a primer extension 1130

46

reaction (α + , + ) or direct templated ligation of synthesized segments ( + ). The α + and 1131

+ fragments associate spontaneously to form an active triplet polymerase ribozyme (Figure 1132

6b), tested for activity in Figure 6c. The synthesised segment sequences were derived from the t5b 1133

variant of t5, which differs by one neutral signature mutation from t5 in the α segment, and by 1134

another neutral signature mutation from t5a in the segment (mutations highlighted in 1135

Supplementary file 1); these neutral mutations, not present in the t5 and t5a used to synthesise the 1136

segments (above), allowed verification of the synthetic origin of the product fragments by 1137

sequencing, which revealed the correct signature mutations in each fragment ruling out 1138

contamination of the synthesized ‘+’ strand products by the synthesizing (‘+’ strand) triplet 1139

polymerase ribozyme. 1140

1141

Figure 6 - figure supplement 2. Substrate competition attenuates inhibitory +/ -

pairing 1142

during self-synthesis. 1143

(A) Longer oligonucleotide substrates are required to maximise synthesis of full-length + 1144

segment (0.5 µM F 9 primer/T template/oligonucleotide substrate and 5 µM each triplet (left, 1145

omitted where equivalent oligonucleotide substrate was present), 13 d in 7˚C ice). Replacing 1146

two triplets with a preformed hexanucleotide substrate (ppp

UGAAUG) boosts full-length +

1147

product synthesis by t5+1

(left panel), but not by the type 6s+1

ribozyme - with a different 1148

accessory domain (see Figure 2 - figure supplement 1, sequence in Supplementary file 1) - which 1149

synthesizes + independent of this hexanucleotide (right panel, allowing some full-length

+ 1150

synthesis with only triplets). This likely reflects unfavourable competition between the t5 1151

ribozyme’s own + domain and triplet substrates for complementary pairing to the

- RNA used as 1152

template. Consistent with this hypothesis, increasing t5+1

concentrations with this template is 1153

counterproductive (middle panel). For both ribozymes, using a preformed nonanucleotide 1154

substrate ppp

UUUUUCAUG at the end of the template boosted full-length product over use of a 1155

pppUUCAUG hexanucleotide +

pppUUU triplet, or just the three constituent triplets (S=9 vs. 6/3 1156

vs. 3/3/3); this is not due to the individual triplet sequences as t5+1

can efficiently incorporate 1157

pppUUU and

pppUUC (see Figure 5b, Figure 6a). 1158

(B) Doubling triplet concentrations to 10 µM can attenuate t5+1’s requirement for the 1159

pppUGAAUG hexanucleotide substrate during

+ synthesis. This suggests that at higher triplet 1160

concentrations they successfully compete with this part of t5 for - template hybridisation (0.5 µM 1161

F 9 primer/T template/longer oligonucleotide substrates, 13 d in 7˚C ice). With a fourfold 1162

47

excess of primer/template duplex and substrates (right lane), the amount of full-length segment 1163

generated exceeds the ribozyme added, evidence of multiple turnover of full-length + product. 1164

1165

Figure 6 - figure supplement 3. Ribozyme stabilisation attenuates inhibitory +/ -

pairing 1166

during self-synthesis. 1167

Left, + synthesis using 5 με of each complementary triplet substrate (hex) or 1 με of 1168

pppUGACAU (+hex) replacing the final two triplets (0.25 με primer F 7/template HT , 0.5 με 1169

each ribozyme, 6 d in 7˚C ice). Type 5 with the initially isolated domain (type 5s) needs 1170

hexamer substrate for + synthesis, but with the reselected domain (t5

a) does not. See 1171

Supplementary file 1 for ribozyme sequences. 1172

Right, model of substrate/ribozyme competition for template binding. The region at the end of the 1173

segment exhibits irregular pairing in the initially isolated domain (above), and appears 1174

vulnerable to - template pairing during synthesis with triplet substrates. Reselection 1175

strengthened base-pairing in the corresponding domain stem (below), replacing a G-U wobble 1176

with a cognate G-C pair and a U.U mispair with cognate U-A pair, stabilizing the - domain 1177

junction and potentially reducing competition with triplets during segment synthesis. 1178

1179

Figure 6 - figure supplement 4. Ribozyme segment synthesis with random substrate pools. 1180

For maximum self-synthesis yield, we had used specific triplet substrate sets. Here we compare 1181

synthesis by 0.5 µM t5+1

of the five t5 ‘+’ segments (30 d in 7˚C ice) using specific triplets vs. 1182

random triplet pools or reduced G-content random triplet pools. Reactions included 5 µM of each 1183

triplet in specific triplet sets (‘tri’, as in Figure 6a ‘+’ syntheses), random ppp

NNN, or a low-G 1184

pppNNN. The low-G

pppNNN had the same overall triplet concentration as

pppNNN, but individual 1185

triplets were five-fold less common for each constituent G (~10/2/0.4/0.08 µM for 0/1/2/3 Gs per 1186

triplet; some primer/template (P/T) concentrations were reduced here to ensure excess substrate 1187

over template). Synthesis is compared with and without some longer oligonucleotide substrates 1188

(indicated below, equimolar to template sites, replacing the corresponding triplets in ‘tri’ substrate 1189

mixes). Importantly, for all ‘+’ strand segments, full-length products are generated using ppp

NNN 1190

with yields approaching those obtained when using specific triplets (calculated by densitometry, 1191

above the gel); intriguingly the low-G ppp

NNN pools often gave superior yields, possibly helped 1192

by the higher concentrations of weaker-binding AU-rich triplets therein. 1193

48

1194

Figure 6 - figure supplement 5. Ribozyme segment synthesis with mixed length substrate 1195

pools. 1196

+ (left) or

+ (right) syntheses were performed (using 0.5 µM each of t5

+1 and the 1197

primer/templates above, 15 d in 7˚C ice) with their constitutive triplet sets alone (*, see Figure 1198

6a) or the random-sequence substrate mixes indicated. Introduction of dinucleotide and 1199

mononucleotide substrates (ppp

NN, ppp

N i.e. NTPs) decreases full-length product band intensity 1200

but not ligations performed (here quantified whilst assuming each extension product was formed 1201

from the fewest substrates possible), with increasing numbers of extension products deviating 1202

from the starting triplet register. Dinucleotides and mononucleotides appear poor substrates in the 1203

absence of triplets. 1204

1205

Figure 8 - figure supplement 1. Fidelity of type 5 variants. 1206

Positional error rates and fidelities (determined as in Figure 8a) for type 5 ribozyme variants; the 1207

overall fidelity was calculated as a geometric mean of positional errors at each triplet position (n 1208

and s.d. of this value shown for ribozymes assayed multiple times, see Figure 8 - source data 1). 1209

Fidelity of the initial type 5 isolate (tethered to the assay templates) is modestly improved in the 1210

absence of type 1. However, with type 1, fidelity is improved when operating fully in trans 1211

(type5s +1

). Reselection and stabilisation of the domain yields a further fidelity improvement 1212

(t5+1

) but truncation (α +1) reverts the pattern of error tendencies along the triplet towards that 1213

of the starting core. See Supplementary file 1 for ribozyme sequences. 1214

1215

Figure 8 - figure supplement 2. Fidelity domain influence upon template- and position-1216

specific error rates. 1217

At each position in each of the twelve triplet templates (top) tested in the fidelity assay 1218

(Figure 8a), three different errors are possible after analysis of sequence composition of added 1219

triplets (see Figure 8 - source data 1). Errors are represented by discs, coloured by mutation (e.g. 1220

A to G : green; A to C : blue; G to C : blue). Disc size is proportional to error frequency without 1221

the fidelity domain (its occurrence using +1). The effect of the fidelity domain upon these 1222

error tendencies is plotted on the y-axis (by comparison of error rates between t5+1

and +1). 1223

Many of the most significant errors (using +1) exhibit the greatest proportional reductions 1224

49

with the fidelity domain, including substantial (>10-fold) reductions in several misincorporations 1225

at the third position. Improvements in fidelity are variable but focused on the 2nd

and 3rd

triplet 1226

positions, while the 1st triplet position shows an increase specifically in A to G mutations, 1227

indicative of a tolerance for G-U wobble pairing. 1228

1229

Figure 8 - figure supplement 3. Determination of residual fidelity phenotype when using 1230

minor groove-modified substrates. 1231

Top: framework for estimation of residual fidelity phenotype (). Shown are model logistic 1232

curves for incorporation of a triplet e.g. GAU (at a fixed concentration, opposite γ’-AUC-5’) 1233

versus increasing concentrations of a mismatching triplet e.g. GGU. A ribozyme ‘+’ with a 1234

fidelity function (e.g. t5a, blue curve) will incorporate equal amounts of the matched and 1235

mismatched triplets at a different concentration of the mismatched triplet compared to a ribozyme 1236

‘−’ without a fidelity function (e.g. α , red curve). x+ and x− represent the lns of these 1237

concentrations for + and −; their separation (x+ - x−) is a proxy for the strength of the ‘+’ fidelity 1238

phenotype. We measured the relative incorporation (W) of triplet pairs by the two triplet 1239

polymerases at test concentrations (t) of mismatched triplet, chosen to maximise the difference in 1240

the ribozymes’ resulting fractional incorporations (marked by a blue dot vs. a red dot on 1241

respective curves). If a triplet modification (*, right) interferes with mismatch discrimination by 1242

‘+’, it would shift the ‘+’ relative incorporation curve (blue) towards that of ‘−’ (red), reducing 1243

the difference in fractional incorporation. Assuming curve steepness (k) remains constant, the 1244

residual phenotype () for that modification is described by the new separation (x+* - x−*) as a 1245

proportion of the original separation (x+ - x−); numerical values and calculations from the 1246

measurements described below are supplied in Figure 8 - source data 2. 1247

Middle: measurements of ratios of incorporation vs. misincorporation (W) for unmodified and 1248

modified triplet pairs (at the indicated test concentrations) by the t5a +1

triplet polymerase 1249

compared to the fidelity domain-truncated α +1 (at 0.5 με each). The expected triplet 1250

additions are in grey along the left of each gel; the average W from n independent experiments, 1251

calculated via densitometry of products containing slower-migrating G misincorporations, is 1252

shown below each lane along with the average . Primers (P), templates (T) and additional triplets 1253

were at 0.5/0.5/5 με, incubated for three days in 7˚C ice. Left: presence of a 2-thiouracil (2SU) 1254

at the 3rd

position of the triplet abolishes the fidelity domain’s ability to discriminate against a 2nd

1255

position wobble pair (P: A10, T: CCCMisGAU, +5 με ppp

CCC). Centre-left: the same 1256

50

modification (2SU) also abolished the fidelity domain’s preference for G misincorporation at the 1257

1st position (P: F 7, T: T AGU, +5 με

HOCUG). Centre-right: in contrast, a 2SU at the 1

st 1258

position of the triplet exerted no clear influence upon 3rd

position wobble discrimination (P: A10, 1259

T: CCCMisUGA, +5 με ppp

CCC). Right: replacement of a β’ hydroxyl group with a β’ fluoro 1260

(β’F) at the 1st triplet position likewise had no effect upon 3

rd position mismatch discrimination 1261

(P: F 7, T: T AGU, +5 με HO

CUG). 1262

Below: Measurement of fidelity phenotype in the presence or absence of a downstream triplet. 1263

Residual phenotypes indicate attenuation or abolishment of 1st position (left panel, P F 7, T 1264

T AGU), 2nd position (middle panel, P F 7, T T GAU), and 3

rd position (right panel, P F 7, T 1265

T AGU) fidelity effects from downstream triplet absence. No effects are seen (upon 3rd

position 1266

discrimination) from the presence or absence of a 5’ triphosphate on the downstream triplet (ppp). 1267

In the absence of a triplet bound downstream on the template, the fidelity phenotype is severely 1268

compromised, suggesting this adjacent triplet:template duplex plays a critical role in positioning 1269

the fidelity domain relative to the incoming triplet. 1270

1271

Figure 9 - figure supplement 1. Influence of random triplet pool concentrations upon triplet 1272

misincorporation tendencies. 1273

Left: 0.2 µM t5+1

was tested in the fidelity assay at a series of ppp

NNN concentrations. The 1274

observed overall error rate (black circles) in these reactions halved as ppp

NNN concentrations 1275

were increased to 7 µM each (after which extension activity began to reduce and error rates 1276

increase). Diverse patterns were observed in the concentration dependence of individual triplet 1277

positional errors, of which a selection are displayed here (×) coloured according to whether they 1278

raise (red-yellow) or lower (blue) the GC content of the expected triplet, and coloured more 1279

intensely for higher GC content triplets. 1280

Right: For a broader set of errors with 0.5 µM t5+1

(those with >0.3% incidence at 5 µM ppp

NNN, 1281

x-axes) the changes in their incidences from raising concentrations of each ppp

NNN from 2 µM to 1282

10 µM substrate are plotted on the y-axes. The frequencies of several mutations that increase 1283

triplet GC content in GC-rich triplets are reduced at 10 µM ppp

NNN concentrations (top panel). 1284

Conversely, the frequencies of some mutations that reduce GC content are increased at 10 µM 1285

pppNNN concentrations (bottom panel); these suggest relative depletion of available GC rich 1286

triplets as overall triplet concentrations rise. Such pool behaviour can explain the reduction of 1287

51

overall error rates in ~5 µM ppp

NNN triplet pools, as misincorporations that increase GC content 1288

(e.g. G opposite U) are responsible for more errors than those that decrease it (bottom left). 1289

Values and ratios for a wider range of errors are supplied in Figure 9 - source data 1. 1290

52

G C C A U C A A A G C U U G A G A G C A U C U U Internal

triplets’ average:

10 με each tri: 93.3 99.4 99.5 97.8 99.8 99.4 99.6 97.9 99.1 99.3 99.5 99.1 99.8 98.9 99.5 98.2 96.0 96.3 98.79

10 με each PPPNNN: 92.8 97.0 98.8 99.2 99.5 99.3 99.1 97.8 98.6 99.4 99.1 98.7 94.3 81.4 96.6 97.4 59.8 42.3 96.65

10 με average, low-G PPPNNN: 97.3 98.2 99.5 97.3 99.7 99.4 99.6 97.8 99.0 99.3 98.9 98.8 97.6 97.3 98.8 97.4 73.2 43.2 98.56

Table 1. Sequencing of ribozyme-synthesised + segment. 1291

Shown are the individual base fidelities (%) along the + sequences (top) synthesised by t5

+1, 1292

using the six specific triplets (tri), or random (ppp

NNN) or compositionally-biased random (low-G 1293

pppNNN, see Figure 6 - figure supplement 4) substrate pools, from F6 primer (the first six 1294

positions at the left) with template T (1 με each RNA, 13 d 7˚C ice). For their sequencing, 1295

extension products were eluted from templates, and full-length products were gel-purified, ligated 1296

to adaptor, reverse-transcribed and PCR amplified. For compositional analysis, a small percentage 1297

of unrelated amplified products were excluded (those with >9 mutations vs. the expected + 1298

sequence; similar levels were excluded if a >6 mutation threshold was applied, 0.2%/0.2% & 1299

3.7%/4.2% & 3.7%/3.8% for tri & ppp

NNN & low-G ppp

NNN). These sequences mostly appeared 1300

to derive from off-target priming and extension of F6 on the ribozyme in the presence of 1301

pppNNN. 1302

The sequencing of products generated from specific triplets provides an estimate of background 1303

error arising from amplification and sequencing. The final triplet constitutes an error hot-spot - 1304

likely to mutate to a more mutationally stable triplet during self-replication - exacerbated in 1305

pppNNN samples by the inability of the fidelity domain to operate in the absence of a downstream 1306

triplet (Figure 8 - figure supplement 3). The geometric average of internal triplet position 1307

fidelities is used to gauge overall t5+1

fidelity during RNA synthesis. While overall fidelity drops 1308

from defined to random triplets (98.8% to 96.7%), much of this loss in fidelity can be recovered 1309

by adjusting the triplet composition to a low-G random pool, where reductions in G-U wobble 1310

pairing more than compensate for increases in rarer misincorporations opposite template C. 1311

1312

Supplementary file 1. Ribozyme sequences. 1313

All sequences are written in a 5’-to-γ’ direction, generated by GMP transcription of the 1314

corresponding PCR-generated dsDNA of the sequence downstream of 5T7 sequence duplex. All 1315

transcripts were PAGE-purified. 1316

53

HDV ribozyme sequences (blue) were transcribed in series with reselected type 5 ribozymes and 1317

cleave themselves off during transcription (Schurer et al., 2002) to yield precise γ’ ends with β’, 1318

γ’-cyclic phosphates; this presence of group did not affect type 5 activity. 1319

Sequences corresponding to the 5’ ‘cap+’ or ‘cap’ regions from the selection, presenting a target 1320

for type 1 interaction, are coloured light green, with alternative arbitrary inert 5’ hairpin-forming 1321

sequences in dark green. Single stranded sequences capable of hybridization with sites at the γ’ 1322

(or 5’, type 5cis

) ends of certain templates to endow flexible ribozyme-template duplex tethering 1323

(Wochner et al., 2011; Attwater et al., 2010) are coloured yellow. Accessory domains γ’ of the 1324

catalytic core are in bold. 1325

1326

Supplementary file 2. Transcription of triplets. 1327

Each target was transcribed from oligonucleotides 5T7 (Supplementary file 3) and 1328

5’-(var)-TATAGTGAGTCGTATTAATTTCGCGGGCGAGATCGATC-γ’, where the (var) 1329

overhang encodes (is the DNA reverse complement of) the sequence indicated below. Guide 1330

yields: >15 nmol = ***, 10-15 nmol = **, 5-10 nmol = *, <5 nmol = ~. †: These transcriptions 1331

yield one main product suitable for use as markers to identify comigrating triplets with a similar 1332

G/AU/C content. 1333

1334

Supplementary file 3. Oligonucleotide sequences. 1335

All sequences are written in a 5’-to-γ’ direction. DNA sequences are coloured grey. RNA 1336

sequences are coloured black, with the exception of hammerhead ribozyme (brown) and HDV 1337

ribozyme (blue) sequences transcribed in series that cleave themselves off during transcription 1338

(Schurer et al., 2002) to yield precise 5’ and γ’ ends respectively. All RNAs were denaturing 1339

PAGE-purified, and DNAs were not, unless otherwise noted (‘(non)-GP’). Competing 1340

oligonucleotides, listed by templates, were purified using RNeasy columns (Qiagen). Primer and 1341

oligonucleotide binding sites on templates are underlined. 1342

AAAC

CUU

GGAG

AC

UG

CC

GA

CG

G

GC

UU

CAAC G

AG

AA

UCU

CAUCU

UCG

GUAG

A

C

G

AAA

AA

U

CAGG

A

CG

UGG

ACC

GCA

CGC

GCG

GAU

A

G

A

GGUUGUCC

AACA( )L

3’

U

AACC

CC

C

GA

GGUUGGC

G

C

NNN

NNN

PPP

5’

5’

G

P5’

0core

C

G

C UU

G AA

AAA

G

CG

UCA

GCGA

C

U

AA3’5’

AAAC

CUU

GGAG

A

CG

UCA

CU

GC

C

GA

CG

GA

GC

UUA

AC G

AG

AA

UU

CU

CAUCU

UCG

GUAG

C

A

C

G

AAA

AA

U

CCA

GGU

ACA

UGU

GG

CC

C

G

C U U

G A A

AAA

G

G

U

AACC

CC

C

GA

GGUUGGC

G

C

NNN

AACA( )L

3’

3’

5’

A

CG

UGG

ACC

GCA

CGC

GCG

GAU

A

G

UAG

U

A

AC

UG

A

C

G

C G C

G C G

AA GC

CA

AA

AAA

GC

CC

CA

AU

CA

CC

GG

CGC

GCG

UUC

G

U

UAG

AUC

GGG

CCC

AAC

AUAC

UC

C

AG

G

AU

U

GC

P5’

Z RPR / Zcore

Z RPR Zcore Round 7library

+ 3’ N30

domaintype 0 0core

25%

selectiontruncation

A

B

Templates: SR1 SR1 SR2 SR3 SR4

Z RPR

noNTPs

Primer

Substrates: 1 mMNTPs

0core triplet polymerase

notriplets

SR1 SR1 SR2 SR3 SR4

5 μM pppAUA,5 μM pppCGC

[each triplet] (μM)

Superc

oole

d e

xte

nsio

n(t

riple

ts a

dded p

er

prim

er)

Template SR3

5’ P

CUA GU

GCG

G

U

AU

AU

U

G

GC

Template SR1

5’ PCUA GU G CUA GU G CUGAU CUG G AU G 5’ P

CGCAUA

PP

PP

PP

CGCAUA

PP

PP

PP

CGCAUA

PP

PP

PP+pppAUA

0.25

0.20

0.15

0.10

0.05

0

0 10 20 30 40

+pppCGC

C

NPPP

exte

nsio

n

Template SR1 ( )

Template SR3 ( )

5’

Nprimer

template

RPR

PPP N

PPPN

PPP

N

PPP

5’

NNN

tripletpolymerase

PPPNNN

PPPNNN

PPP

truncation

type 2 type 3 type 4 type 5 type 61 type 1

Round 7 library

Round 21 library

selection

53% 11% 0.5% 30% 6% 0.1%

truncation

Pri

Zcorecap+Rz+1 +1 +1 +1 +1

:

AUA

CGC

AUA

CGC

AUA

CGC

type 2 type 3 type 4 type 5 type 61+1

trip

lets

added

A

B

Type 1 / 1 (truncation)

AAG

U

GGAG

AC

C

G

CG

G

CAC G

AG

AA

C

CAUCU

UCG

GUAG

C

A

C

G

U

3’

A

CG

UGG

ACC

GC

CC

UA

CU

A

CGC

GCG

GAU

A

U

C

G

C

U

G

AAUGCAU

U

C

ACG G

CCACCAC

AUGAUCUGAAGAGCCUU

GG GUUUUUU

G

UCA

AG

UCA

CC

CU

C

AAA

AA

CCAAUAGG

P5’

U

G

C U G

G A CU

A

UU

UAC

A

AAAC

CUU

GGAG

C

A

A

UCA

CU

GC

C

GA

CG

G

GC

UU

CAAC G

AG

AG

UU

CU

CAUCU

UCG

GUAG

C

A

C

G

AAA

AA

U

CCAACAGG

A

CG

UGG

ACC

GCA

CGC

GCG

GAU

A

G

A3’CA G

U

U

UU

UG

UAC

U

GG AAAGUAA

UUG

GCC

G

UU

AU

GC

CG

GU

A AAA

C

G

CU

U

G

C

GGUUGUCC

AACA( )L

3’

U

AACC

CC

C

GA

GGUUGGC

G

C

NNN

NNN

PPP

5’

5’

G

P5’

U

G

C U G

G A CU

A

UU

UCC

Type 5

B

+1type 5

+1type 5cap-1

cap-

Rz: -

11 3 3 11 11

Pri

+88nt

C

0.1 72 74 66 71

type 5

cis

+1

type 5

trans +

1

type 5

cis

+1

type 5

trans +

1

GCG

AAG

CGU

GUG

CGA

AGC

GUG

UGC

GAA

GCG

UGU

GC

GA

AG

CG

UG

Ure

peate

d

repeate

d

substrate length (nt)

average ligationper junction (%)

cap+

GUAC

Type 5cap-

type 5 + 1

0.25 0.5 0.75 0.9

equivalents of 1:

1.1 1.25 1.5 2

5’

NNN

PPP

type 5

cis

+1

type 5

trans +

1

P

0 0.5 1 1.5 20

0.5

1.0

Equivalents of 1 added to type 5

Shifte

d b

and inte

nsity

- r

ela

tive to

Shift1

type 5

1

AAAC

CUU

GGAG

C

A

A

UCA

CU

GC

C

GA

CG

G

GC

UU

CAAC G

AG

AG

UU

CU

CAUCU

UCG

GUAG

C

A

C

G

AAA

AA

U

CAA

A

G

U

AACC

CC

C

GA

GGUUGGC

G

C

NNN

NNN

PPP

3’

5’

A

CG

UGG

ACC

GCA

CGC

GCG

GAU

A

G

t5UUUC

C

A

GA

UC

U

CU

AG

G

3’

CA G

U

U

CA

UG

UAC

U

GAGGUAA

UUG

GCC

G

UU

AU

GC

CG

GU

A AAA

C

UU

1

CAA G

CUU

GGAG

AC

UG

CC

GAC

GG

CGAG

AA

CCU

CAUCU

UCG

GUAG

A

C

G

U

A

C G

UGG

ACC

G CA

CGC

GCG

GAU

A

G

CG

UCA

AG

CU

CAA

C

U5’

C

3’

C

CU

CA

AUGCAU

U

C

ACG G

CCACCAC

AUGAUCUGAAGAGCCUU

GG GUUUUUU

5’

P

P

5’

type 5

1

type 1

Round 5library

t5

t5+1

engineering

in reselection+ mix of 3’ N

13/N

20/N

28

- 5’ template hybridisation site- 3’ 20 nt reverse transcription sitetruncation

GGU

GUG

GAG

CGG

ACUCCU

UCU

(C)

-

-

-

-

- (

( (

( -

- -

- ) ) )

)- -

- -

-

GGU

GUG

GAG

UGC

GGA

CAC

UCC

UUC

-

-

-

- -

(

( (

(

( (

- -

- -

))) )

)

) - - - - -(UC)

GGU

GUG

GAG

UGU

GCG

GAC

ACA

CUC

-

-

-

-

- (

(

( (

(

( (

( -

- -

- ) )

)

) ) ) ) )

- - -

- -

CUU

CUC

Pri Pri Pri

4S 6S 8S

B

Rz: t5+1 t5+1 t5+1

CUGC C A A C C5’

GACGGUUGG C C A C A3’

UA G

UGC

AC

U

CCG

GC

GA

GCU

C

5’ AA AG G

GGUPPP

P

TM ~ 93˚C

Template 8S

A

Broc+

Synth w/t5+1

SYBRGold

DFHBI-1Tligand

CGG

GUC

CAG

AUA

UUC

GUA

UCU

GUC

AGUGUG

UAGGAG

GGCUCC+1

CG

UAUA

UA

G

UC

U

GU

CA

CUG

G

GGG CUC

GU

A

CC

GG

C

GG A GG CU

GU GU

GA A

UG GA

5’

3’

DFHBI-1Tligand

t5+1

-tri t5+1

+tri

Pri

CCA

CAC

UCU

ACU

CGA

CAG

AUA

CGAAUAUCUGGACCCGACCGUCUC+1

Bro

c+

as tem

pla

te:

Broc+

Synth w/t5+1

‘+’ strandsynthesis

‘-’ strandsynthesis

Z RPRcap+ +1 Z RPRcap+ +1 Z RPRcap+ +1

GG

UU

UC

GC

CG

CA

GC

GG

AG

GG

C

GA

GUUUUUCAUG

UGAAUG

CG

UC

UU

GU

GC

GC

CU

UC

AU

GA

GU

GA

GC

UA

AA

GACAAA

AA

AGACAAA

UC

UC

GA

UC

U

UG

A C

AU

CC

AC

AA

UC

UC

GU

CA

AC

GC

GGAUGCA

GCAAAA

GCCAUC

GGAUCUUCU

GCGAUAG

days:

[t5+1](µM)

0 0 0 0 02 2 2 2 2 20.5 0.5 0.5 0.5

7 1 7 7 1 7 7 1 7 7 1 7 7 1 7

P/T: Fα9/TαHP Fβ6/Tβ Fγ7/TγHP Fδ7/Tδ Fε6/Tε

yield:(%)

27 19 17 10 15

Pri

α+ β+ γ+ δ+ ε+

:

A

GC

AU

CU

GC

CG

CU

UC

C

UC

AC

UC

GG

CU

UU

UC

A

UC

AU

UU

CG

CU

GG

CG

UG

AA

UG

AG

GU

GCCACCGAAG

CAUGAAAAA

AAGAUGCUC

AGAUUUGUCU

AUGUCAU

0 0 0 0 02 2 2 2 2 22 2 2 2

25 3 25 25 3 25 25 3 25 25 3 25 25 3 25

Cyα10ba/αβ Fβ9ba/β+ Cyγ10ba/Fγ Fδ7ba/δ+ Fε9ba/εHP

6.1 29 36 51 10

UU

GG

AU

GA

UC

GA

GA

AC

CC

GC

C

GA

UA

GC

UA

UC

GC

AC

GC

AU

UU

UA

AG

UG

CC

AC

GC

G

α− β− γ− δ− ε−

days:

[t5+1](µM)

P/T:

Pri

:

yield:(%)

B

AAAC

CUU

GGAG

C

A

A

UCA

CUGCC

GA

CG

G

GC

UU

CAAC G

AG

AG

UU

CU

CAUCU

UG

G

UAG

C

A

C

G

AAA

AA

U

CAG

A

3’

A

CG

UGG

ACC

GCA

CGC

GCG

GAU

A

G

UU

UC

C

A

GA

UC

U

CUAGG

CA GU

U

CA

U

G

UAC

U

GAGGUA

A

UUG

GCC

G

UU

AUGC

CG

GU

AAAA

C

UU

bio5’

bio5’3’

α+

β+

γ+

δ+

ε+

t5b

Pri

GGU

UGA

GAA

CGU

UGG

CGC

UAU

CGC

t5 variant:

t5+

1-s

ynth

esis

ed δ

+

C

δ−

αβ+,γδε+ none

protein t5+1synthesised by:

αβ+,γδε+

αβ+,γδε+

Lane: 1 2 3 4 5

Lane 1:

Lane 2:

Lane 3:

Lane 4:

Lane 5: No Rz,

pppGCCAUCAAAGCUUGAGAGCAUCUU alone (β+ sequence)

pppCUUpppCAUpppGAGpppUGApppGCUpppAAApppGCCAUC

pppCUUpppCAUpppGAGpppUGApppGCUpppAAApppGCC pppAUC

pppCUUpppCAUpppGAGpppUGApppGCUpppAAApppAUC

pppCUUpppCAUpppGAGpppUGApppGCUpppAAApppAUC

A10 PPPba

PP

P

3’

5’ 5’

Pri

Pri5’

5’

3’

5’

3’ e

xte

nsio

n

3’

5’ e

xte

nsio

n

Pri

Pri

A

B

PPPba

A10

C

Lane: 1 2 3 4

5 6 7

Lane 1:

Lane 2:

Lane 3:

Lane 4:

pppGAUGCAGAGGCGGCAGCCUUCGGUGGC>p alone (γ+ sequence)

pppGGUpppUUCpppGCCpppGCApppGCGpppGAGBio-GGAUGCA pppGGC

pppGGUpppUUCpppGCCpppGCApppGCGpppGAG pppGGCpppGCApGAU

Bio

pppGGUpppGCCpppGCApppGCGpppGAG pppGGCpppGCApGAU

Bio

Lane 5: pppUGCpppCUCpppCGCpppUGCpppGGCpppGAApppACCpppGCC

Bio

p

Lane 6: pppUGCpppCUCpppCGCpppUGCpppGGCpppGAA pppAUCpppACCpppGCC

Bio

p

Lane 7:pppGCCACCGAAGGCUGCCGCCUCUGCAUC alone (γ− sequence) Lane:

β+ s

ynth

esis

γ+ s

ynth

esis

γ− synth

esis

B

A

3.3 3.7 8.3 7.5 5.2 9.96.3

6.7 9.6 7.1 6.7 9.7 3.53.4

15.7 14.7 6.6 6.8 15.7 5.24.3

91.3 90.5 92.7 93.0 89.7 93.895.3

Zcore 0core t5+1

Average ribozyme error rates (%) per triplet position:

1st

2nd

3rd

4.4

1.8

1.7

97.4

s.d.(n): ±.73(2) ±.61(3)

Overall fidelities (%):

85.5 0.1 0.1 0.1

13.7 99.6 0.3 0.2

0.6 0.2 98.7 0.3

0.2 0.1 1.0 99.4

95.6

96.6 0.2 0.3 0.1

2.2 98.8 0.3 0.2

0.2 0.5 98.1 0.3

1.0 0.4 1.3 99.4

98.2

97.4 0.3 0.6 0.2

1.4 98.8 0.3 0.3

1.0 0.5 98.0 0.4

0.2 0.3 1.1 99.1

98.3

93.0 99.1 98.2 99.3

97.4

Base encoded:A G C U

t5+1

1st position fidelity (%):

3rd position fidelity (%):

A

G

C

U

% incorp

ora

ted

A

G

C

U

A

G

C

U

A G C U

92.5 0.1 0.1 0.2

6.6 99.7 0.1 0.3

0.7 0.1 98.9 0.4

0.2 0.0 0.9 99.1

97.5

88.5 0.7 1.3 2.2

9.0 98.3 0.8 7.7

1.9 0.7 90.0 2.9

0.6 0.2 7.8 87.2

90.9

A G C UBase encoded:

αβγδ+1 + ε domain

88.2 0.3 0.3 0.1

9.3 99.3 0.2 1.4

0.3 0.3 96.2 0.5

2.2 0.2 3.2 98.0

95.32nd position fidelity (%):

% incorp

ora

ted

% incorp

ora

ted

89.7 99.1 95.0 94.6

94.5Overall fidelity (%):

αβγδ+1 fidelity (%) for

A G C U

t5+1 fidelity (%) for

type 2+1 type 3+1

type 4+1 type 5+1 type 6+1

1st position fidelity (%):

3rd position fidelity (%):

A

G

C

U

A

G

C

U

A

G

C

U

2nd position fidelity (%):

trip

let positio

n:

Overall fidelity (%):

% incorp

ora

ted

% incorp

ora

ted

% incorp

ora

ted

GGGN'N'N'GGG

CCCPPP

CCCPPP

NNNCUGGCCAUA

UUUCGCGGUACG

UCACAUGAAAAC

UGG

3’ 5’3’

5’

C Triplet position:

1st 2nd 3rd

Template

5’ 3’

3’

5’

φ = 0.8,U(2S)GA

vs. U(2S)GG

φ = 1.0, A(2’F)GU

vs. A(2’F)GG

φ = -0.2,GAU(2S) vs. GGU(2S)

φ = 0.0,AGU(2S) vs. GGU(2S)

1 2 3triplet position:

PPP

2 μM pppNNN27%

pppACC, pppGCC31%

pppACCEncoded:

Mispair:

pppGAC

Incorporationof mispair vs.

encoded triplet

pppGCC pppGGC

pppACC, pppGCC,pppGGU, pppGGC

12%

5 μM pppNNN14%

10 μM pppNNN8%

2 μM pppNNN1.6%

pppGAC, pppGGC2.7%

pppGAC, pppGGC,pppGUC, pppGCC

0.71%

5 μM pppNNN0.74%

10 μM pppNNN0.69%

mispairpair + mispair

(%)

( )

Pri

Substrate (nt): 1 3 11 1 3 11 1 3

Ribozyme: none Z RPR Zcore

Extension (%): .01 .07 .07 35 .58 .22 .62 2.2

11

2.9

Gprimer

template

RPR

PPP G

PPPC

PPP

CGCUUCGCACCUAAGUAAC5’

U

PPP

A

PPP

GCGAAGCGUGU

PPP

CGCUUCGCACCUAAGUAAC5’

GCG

PPP

CGCUUCGCACCUAAGUAAC5’

1 nt substrates (NTPs)

3 nt substrate (triplet)

11 nt substrate (oligomer)

AAAC

CUU

GGAG

AC

UG

CC

GA

CG

G

GC

UU

CAAC G

AG

AA

UU

CU

CAUCU

UCG

GUAG

A

C

G

AAA

AA

U

CCAACAGG

A

CG

UGG

ACC

GCA

CGC

GCG

GAU

A

G

A

3’

GGUUGUCC

AACA( )L

U

AACC

CC

C

GA

GGUUGGC

G

C

PPP

G

Selection construct

CG

UCA

C

U

A

C U A

G A U

G

CU

UU

UC

UGAAGAGCCUU

GGGUUUUUU

NNN

NNNN

NNNNNNNNNN

NNNN

NNN N

NNN

NN

AGC

UCG

5’

AG

C

GAA

A

GG

UC

C

CC

AG

G

P

PP

3’

CU

GA

GAG

CUC

GCU

U

5’ dualbiotin

In-ice tripletself-ligation

Bead binding& washing

Annealing,eutectic phase formation

Sizepurification

On-beadreverse transcription

& cDNA elution

Sequence-selectiveplus strand synthesis

Bead binding,cDNA removal

In-nestrecovery PCR

Transcription,size purification

B5’

B5’

B5’

B

3’ 5’

3’ 5’

rt primer

rescueoligomer

5’B

5’B

3’ 5’5’B

3’5’

5’

T7 promoter

+recoveryprimer

B5’primer

triplets

Selection construct

RNA library triphosphate

construct synthesis primer

rt primer

then +

5’3’

3’

3’

3’

3’

Mutagenic constructgeneration PCR

then

1 2 3

4

5

6

AAAC

CUU

GGAG

C

A

A

UCA

CUGCC

GACGG

GC

UU

CAAC G

AG

AG

UU

CU

CAUCUUCG

GUAG

C

A

C

G

AAA

AA

U

CCAACAGG

A

CG

UGG

ACC

GCA

CGC

GCG

GAU

A

G

A3’CA GU

U

UU

UG

UAC

U

GG AAAGU

AA

UUG

GCCG

UU

AUGC

CG

GU

AAAA

C

G

CUU

G

C

GGUUGUCC

AACA( )L

3’

U

AACC

CC

C

GA

GGUUGGC

G

C

NNN

NNN

PPP

5’

5’

G

P5’

U

G

CU G

GA CU

A

UU

UCC

Type 5

GGUUGUCC

AACA( )L

3’

U

AACC

CC

C

GA

GGUUGGC

G

C

NNN

NNN

PPP

5’

5’

G AAAC

CUU

GGAG

A

CG

UCA

CUGUC

GACAG

AGC

UU

CAAC G

AG

AA

UU

UU

CAUCUUCG

GUAG

C

A

C

G

AAA

AA

U

CA

UAGG

3’

A

CG

UGG

ACC

GCA

CGC

GCG

GAU

A

G

C

G

CU G

GA CU

A

UU

UCC

AC

GAAUU

CG U

GUC

A

A AUC

GU

AUGGC

C

CU

U

C

A

GG

A

G

GA

GAA

UUU

C

G

A

P5’

Type 6

AAAC

CUU

GGAG

A

CG

UCA

CUGUC

GACAG

AGC

UU

CAAC G

AG

AG

UU

CU

CAUCUUCG

GUAG

C

A

C

G

AAA

AA

C

CCAGCAGG

A

CG

UGG

ACC

GCA

CGC

GCG

GAU

A

G

U

G

CC G

GG CU

A

UU

UCC

CC

G

GCG

C

AUU

UCCCC

ACU

3’

P5’

GGUUGUCC

AACA( )L

3’

U

AACC

CC

C

GA

GGUUGGC

G

C

NNN

NNN

PPP

5’

5’

G

Type 4

Type 2

AAAC

CUU

GGAG

A

CG

UCA

CUGUC

GACAG

AGC

UU

CAAC G

AG

AG

UU

CU

CAUCUUCG

GUAG

C

A

C

G

AAA

AA

U

CCAAUAGG

A

CG

UGG

ACC

GCA

CGC

GCG

GAU

A

G

UG

ACA

GGC

GGC

G

CA

U

AUUAG

UA

UAC

A3’

UA

GGUUGUCC

AACA( )L

3’

U

AACC

CC

C

GA

GGUUGGC

G

C

NNN

NNN

PPP

5’

5’

G

P5’

U

G

CU G

GA CU

A

UU

UCC

Type 3

AAAC

CUU

GGAG

A

CG

UCA

CUGAA

GACU

GAGC

UU

CAAC G

AG

AA

UU

CU

CAUCUUCG

GUAG

C

A

C

G

AAA

AA

C

CCA

GGU

AUA

UGU

GG

CC

AACA( )L

3’

A

CG

UGG

ACC

GCA

CGC

GCG

GCU

A

G

U

3’

AGU

C

G

AC A

UGC

GG

C

A

U

U

G

AU

UUAUAU

G

U

AACC

CC

C

GA

GGUUGGC

G

C

NNN

NNN

PPP

5’

5’

P5’

U

G

CU G

GA CU

A

UU

UCC

Type 1

AAG

U

GGAG

AC

C

G

CGG

CAC G

AG

AA

C

CAUCUUCG

GUAG

C

A

C

G

U

3’

A

CG

UGG

ACC

GC

CC

UA

CU

A

CGC

GCG

GAU

A

U

C

G

C

U

G

AAU

GCAU

U

C

ACG G

CCACCACAUGAUCUGAAGAGC

CUUGG GUUUUUU

G

UCA

AG

UCA

CC

CU

C

AAA

AA

CCAAUAGG

P5’

U

G

CU G

GA CU

A

UU

UAC

Pri

Polyclonal output : R1959

R2089

R2190

R21 no AUG3’dpppAUA incorp. (%):

Full length

self-ligation%: .07 .35 .30 .71

AUG3’dAUA

Pri

R19polyclonal

type 21Rz:

+1type 5

+type 1

type 5cap-

+type 1

B

.05 .5.250[type 1] (μM):

Pri

.044

.115

.136

.005

triplets addedper primer

A

Pri

[type 3] (μM): .25 .25 .25 .25

:

Zcore

cap+

+ty

pe 1

0core

Zcore

cap+

Zcore

0core

cap+

0core

cap+

+ty

pe 1

Pri

Z R

PR

cap+

+1

Z R

PR

+

1

Z R

PR

Z R

PR

cap+

Pri

Rz: Rz:

type 2 type 3 type 4 type 5 type 6

1

type 1

Round 21 library

Round 5library

t5

t5+1

selection(see Figure 2 -source data 1)

engineering

in reselection(see Figure 4 -source data 1)

53% 11% 0.5% 30% 6% 0.1%

5’ truncation(see Figure 3a)

Z RPR Zcore

Round 7 library

+ 3’ N30

domain(see Figure 1 - figure supplement 2)

type 0

0core

25%

selection(see Figure 1 -source data 1)

3’ truncation(see Figure 1a)

truncation(see Figure 1a)

+ mix of 3’ N13

/N20

/N28

- 5’ template hybridisation site- 3’ 20 nt reverse transcription site

(see Figure 2 -figure supplement 1)

association(see Figure 4)

AAAC

CUU

GGAG

AC

UG

CC

GA

CG

G

GC

UU

CAAC G

AG

AA

UU

CU

CAUCU

UCG

GUAG

A

C

G

AAA

AA

U

CCAACAGG

A

CG

UGG

ACC

GCA

CGC

GCG

GAU

A

G

A

3’

GGUUGUCC

AACA( )L

U

AACC

CC

C

GA

GGUUGGC

G

C

PPP

G

Zcore starting library

CG

UCA

C

U

A

C U A

G A U

G

CU

UU

UC

UGAAGAGCCUU

GGGUUUUUU

NNN

NNNN

NNNNNNNNNN

NNNN

NNN N

NNN

NN

AGC

UCG

5’

AG

C

GAA

A

GG

UC

C

CC

AG

G

P

PP

3’

CU

GA

GAG

CUC

GCU

U

5’ dualbiotin

AAAC

CUU

GGAG

AC

UG

CC

GA

CG

G

GC

UU

CAAC G

AG

AA

UCU

CAUCU

UCG

GUAG

A

C

G

AAA

AA

U

CAGG

A

CG

UGG

ACC

GCA

CGC

GCG

GAU

A

G

A

GGUUGUCC

AACA( )L

3’

U

AACC

CC

C

GA

GGUUGGC

G

C

NNN

NNN

PPP

5’

5’

G

P5’

0core

C

G

C UU

G AA

AAA

G

CG

UCA

GCGA

C

U

AA3’

AAAC

CUU

GGAG

C

A

A

UCA

CU

GC

C

GA

CG

G

GC

UU

CAAC G

AG

AG

UU

CU

CAUCU

UCG

GUAG

C

A

C

G

AAA

AA

U

CCAACAGG

A

CG

UGG

ACC

GCA

CGC

GCG

GAU

A

G

A3’CA G

U

U

UU

UG

UAC

U

GG AAAGUAA

UUG

GCC

G

UU

AU

GC

CG

GU

A AAA

C

G

CU

U

G

C

GGUUGUCC

AACA( )L

3’

U

AACC

CC

C

GA

GGUUGGC

G

C

NNN

NNN

PPP

5’

5’

G

P5’

U

G

C U G

G A CU

A

UU

UCC

Type 5

AAAC

CUU

GGAG

C

A

A

UCA

CU

GC

C

GA

CG

G

GC

UU

CAAC G

AG

AG

UU

CU

CAUCU

UCG

GUAG

C

A

C

G

AAA

AA

U

CAA

A

G

U

AACC

CC

C

GA

GGUUGGC

G

C

NNN

NNN

PPP

3’

5’

A

CG

UGG

ACC

GCA

CGC

GCG

GAU

A

G

t5UUUC

C

A

GA

UC

U

CU

AG

G

3’

CA G

U

U

CA

UG

UAC

U

GAGGUAA

UUG

GCC

G

UU

AU

GC

CG

GU

A AAA

C

UU

1

CAA G

CUU

GGAG

AC

UG

CC

GAC

GG

CGAG

AA

CCU

CAUCU

UCG

GUAG

A

C

G

U

A

C G

UGG

ACC

G CA

CGC

GCG

GAU

A

G

CG

UCA

AG

CU

CAA

C

U5’

C

3’

C

CU

CA

AUGCAU

U

C

ACG G

CCACCAC

AUGAUCUGAAGAGCCUU

GG GUUUUUU

5’

P

P

5’

Type 1

AAG

U

GGAG

AC

C

G

CG

G

CAC G

AG

AA

C

CAUCU

UCG

GUAG

C

A

C

G

U

3’

A

CG

UGG

ACC

GC

CC

UA

CU

A

CGC

GCG

GAU

A

U

C

G

C

U

G

AAUGCAU

U

C

ACG G

CCACCAC

AUGAUCUGAAGAGCCUU

GG GUUUUUU

G

UCA

AG

UCA

CC

CU

C

AAA

AA

CCAAUAGG

P5’

U

G

C U G

G A CU

A

UU

UAC

Pri

(μM): 5* 25 25 5* 25 25 5* 25 25

pppGGU pppGUGpppGAG pppUGUpppGCG pppGACpppACA pppCUC

pppCUU pppNNN

pppGGU pppGUGpppGAG pppUGCpppGGA pppCACpppUCC pppUUC pppNNN

pppGGU pppGUGpppGAG pppCGGpppACU pppCCU

pppUCU pppNNNSubstr

ate

s 4S 6S 8S

Structured template:

CUGC C A A C C5’

GACGGUUGG C C A C A3’

UA G

UGC

AC

U

CCG

GC

GA

GCU

C

5’ AA AG G

GGUPPP

P

TM ~ 93˚C

CUGC C A A C C5’

GACGGUUGG C C A C A3’

UA G

CU

CCG

GC

GA

GCU

C

5’ AA AG G

GGUPPP

P

TM ~ 89˚C

CUGC C A A C C5’

GACGGUUGG C C A C A3’

U

CCG

GC

GA

GCU

C

5’ AA AG G

GGUPPP

P

TM ~ 82˚C

Template 8S

Template 4S

Template 6S

0[each triplet]

(μM)0.5 21 3 105:

0 0.03 0.220.07 0.43 0.70.6extension ≥ 3extension ≥ 2 :

Pri

0 2 53 10 4020: 80[each dinucleotide]

(μM)

pppGG pppUG pppGA pppGUpppGC pppAC pppUC pppCU

pppGGU pppGUG pppGAGpppUGU pppGCG pppGACpppACA pppCUC pppCUU

Pri

Substrates:

Synthesis

Scale (pmol),

Volume (μl)

2000,1000

170,340

3000,6000

4000,2000

400,800

115,460

Ribozyme

2 μM t5+1

2 μM t5+1

0.5 μM t5+1

2 μM t5+1

0.5 μM t5a+1

0.5 μM t5a+1

Primer

2 μMpppGCCAUC

1 μMBioα9

0.5 μMpppGCAAAA

2 μMpppGCGAUAG

0.5 μMδ+

0.5 μMBioγ7

Incubation

time (d)

14

24

22

20

6*

11

Recovered

product (pmol)

183 β+

19 αβ+

521 ε+

432 δ+

115 δε+

5.2 γδε+

Triplets

10 μM pppAAA, pppGCU, pppUGA,pppGAG, pppCAU, pppCUU

10 μM pppCGA, pppUCU, pppAAA

5 μM pppCGC, pppGUG,pppCUU, pppCGU, pppGAG

5 μM pppCGC, pppCAA,pppCGU, pppUCU, pppCCA

none

5 μM pppGAG, pppGCG, pppGCA,pppGCC, pppUUC, pppGGU, pppGGC

βα+β

εδ

δεγ+δε

Template

2 μMTβ

1 μMTαL

0.5 μMTε

2 μMTδ

0.5 μMδεsplint

0.5 μMTγL

Oligomers

none

2 μM pppGACAAA0.5 μM β+

0.5 μM pppUGAAUG0.5 μM pppUUUUUCAUG

2 μMpppUGACAU

0.5 μMε+

0.25 μMδε+

Full-length

product (%)

31

14

21

15

36

8

αβ+ xt

β+ xt

31%

β+ 14%

αβ+ γδε+

xt ε+

21%

15%

xt δ+

8%

xt γδε+ xt δε+

36%

δ+

ε+

δε+

β α+β γ+δε δε δ ε

0.5 µM t5+1 2 µM t5+1 2 µM type 6s+1Ribozyme:

3’ terminalsubstrates

333

63 9

333

63 9

333

63 9

333

63 9

333

63 9

333

63 9

0.5 µMpppUGAAUG

0.5 µMpppUGAAUG

0.5 µMpppUGAAUG

0.5 µM t5+1

0.5 µMpppUGAAUG

9:

2 µM P, T, 9,pppUGAAUG

BA

[tri] (µM): 5 10 5 10 5

full-lengthproduct (%)

: 29 34 13 30 33

pppGUG

pppCUU

pppCGU

pppUGA

pppAUG

pppGAG

pppUUU

pppUUC

pppAUG

3’CA G

U

U

UU

UG

UAC

U

GCC

G

UU

AU

GC

CG

GU

A AAA

C

GG

5’core

5’ AUGUCAUGGUUGAGAACGUUGGCGCUAUCGC

UGAPPP

CAUPPP

UGACAUPPPC

CCAPPP

CCAPPP

δ template:

-hex: +hex:

Pri

Fulllength

CGC

CAA

CGU

UCU

CAA

UGACAU

CCA

CGC

CAA

CGU

UCU

CAA

UGACAU

CCA

-hex +hex

t5a +

1

type 5

s +

1

3’CA G

U

U

UG

UAC

U

GCC

G

UU

AU

GC

CG

GU

A AAA

C

5’core

5’ AUGUCAUGGUUGAGAACGUUGGCGCUAUCGC

UGAPPP

CAUPPP

UGACAUPPPC

CCAPPP

CCAPPP

δ template:

U

CA

-hex: +hex:

t5a +

1

type 5

s +

1

t5a +1

type 5s +1

Rz:

Full-length (%): 17 5.4 16 10 3.4 15 4.4 1.6 4.3 25 19 18 28 14 29 16 7.2 17 18 10 14 7.5 2.9 .76

tri

pppN

NN

pppN

NN

low

-G

tri

pppN

NN

pppN

NN

low

-G

tri

pppN

NN

pppN

NN

low

-G

tri

pppN

NN

pppN

NN

low

-G

tri

pppN

NN

pppN

NN

low

-G

tri

pppN

NN

pppN

NN

low

-G

tri

pppN

NN

pppN

NN

low

-G

tri

pppN

NN

pppN

NN

low

-G

Substrate:

P/T: Fα9/BTαHP Fβ6/Tβ Fγ7/BTγHP Fδ7/Tδ Fε6/Tε

Oligomers: pppGACAAA pppCAUCUU pppUGACAUpppUGAAUG

pppUUUUUCAUG

[P/T] (μM): .5 .5 .5 .35 .5 .5 .5.35 1 1 .35 .5 .5 .5 .5 .5 .5 .5 .5 .5 .5 .5.35 .35

Fγ7/TγHPFδ7/Tδ

[each pppNNN] (μM): *5 5 5 5 5 0 0

[each pppNN] (μM): 0 0 5 20 40 40 0

[each pppN] (μM): 0 0 5 80 320 320 0

*5 5 5 5 5 0 0

0 0 5 20 40 40 0

0 0 5 80 320 320 0

Average ligationsper primer 1.54 0.98 0.95 0.91 0.91 0.03 0 2.16 2.81 2.83 2.86 2.94 1.02 0:

Extension producttriplet register (%) 100 100 86 72 58 16 100 100 96 84 70 2:

Pri

6.3 5.5 5.9 4.4

3.4 2.1 2.7 1.8

4.3 3.7 3.1 1.7

95.3 96.2 96.1 97.4

type 5+1 t5+1

±.61(3)

type 5 type 5s +1

±1.1(3)±.73(2)s.d.(n):

αβγδ+1

2.5

4.7

9.1

94.5

±.31(4)

Overall fidelities (%):

Rz:

1st

2nd

3rd

trip

let base:

Average error rates (%) per triplet position:

10

1

0.1

0.01

1st position errors 2nd position errors 3rd position errors

A^^

Adr

A``

G^`

G`d

Gr^

Cdd

C`^

Crr

U^r

Ud`

Urd ^ A^

^ Gr

^ C`

d A`

d Cd

d U^

` Gd

` C^

` Ur

rAr

r G`

r Ud

^^A

^dU

^`C

d^C

d`G

drA

`dG

``A

`rU

r^U

rdC

rrGEncoded triplet:

t5+

1

αβγδ

+1ra

tio o

f m

isin

corp

ora

tion r

ate

s o

f=

A=

G=

C=

U

ε in

cre

ases

err

or

ε decre

ases

err

or

fractio

nal G

AU

incorp

ora

tion

ln [GGU]

xt

x− x+

0

1

fractional G

AU*

incorp

ora

tion

ln [GGU*]

xt

x−*x+*0

1

Modified triplets (*)Unmodified triplets

at = t, W = W* =GGU incorporation

GAU incorporation

GGU* incorporation

GAU* incorporation

[GGU]and

[GGU*]

ln = k(x+- x−)

W−W

+

( ) ln = k(x+*

- x−*) φ = = (x+*

- x−*)

(x+- x−)

If φ is , the modification * the fidelity phenotype (for GAU vs. GGU)

~0

<1

~1

>1

abolishesattenuatesdoesn’t affectenhances

ln(W−/W+)

ln(W−/W+)

* *( )**

W−W

+

+GGU

CCC

+GAU

CCC

GAU

t5a +1

Pri

αβγδ+1

GA

U2S

GG

U2S

Match (1 μM):

Mis- match (4 μM):

GA

U

GA

U2S

GA

U

GG

U2S

GG

U

GG

U

W, n=4: .66 2.4 1.7 1.9

+GGU+AGU

AGU

t5a +1

Pri

αβγδ+1

AG

U2S

GG

U2S

Match (1 μM):

Mis- match (4 μM):

AG

U

AG

U2S

AG

U

GG

U2S

GG

U

GG

U

W, n=3: 1.3 .79 .71 .77

φ = -0.2, ± 0.23 (s.d.) φ = 0.0, ± 0.18 (s.d.)

+UGG

CCC

+UGA

CCC

UGA

t5a +1

Pri

αβγδ+1

U2SG

AU

2SG

G

Match (1 μM):

Mis- match (4 μM):

UG

A

U2SG

A

UG

A

U2SG

G

UG

G

UG

G

W, n=4: .36 .27 .73 .45

φ = 0.8, ± 0.52 (s.d.)

+AGG+AGU

AGU

t5a +1

Pri

αβγδ+1

A2’FG

UA

2’FG

GMatch (1 μM):

Mis- match (9 μM):

AG

U

A2’FG

U

AG

U

A2’FG

G

AG

G

AG

G

W, n=3: 1.5 1.9 7.1 9.1

φ = 1.0, ± 0.19 (s.d.)

CUG CUG

t5a +1 αβγδ+1 t5a +1 αβγδ+1 t5a +1 αβγδ+1

Pri

CUG

AGU

+GGU+AGU

CUG

GAU

+GGU+GAU

CUG

AGU

+AGG+AGU

Pri Pri

Downstream triplet (5 μM):

HOC

UG

-

HOC

UG

-

HOC

UG

-

HOC

UG

-

HOC

UG

-

HOC

UG

-

pppC

UG

pppC

UG

W, n=3: 1.7 .84 1.1 .69 .65 4.7 3.0 5.3 2.3 2.4 9.4 11 9.5 14

φ = 0.5, ± 0.18 (s.d.) φ = 0.0, ± 0.54 (s.d.)

W, n=3:W, n=3:

φ = 0.3, ± 0.04 (s.d.) φppp = 0.9, ± 0.09 (s.d.)

1 μ

M G

AU

4 μ

M G

GU

1 μ

M A

GU

9 μ

M A

GG

1 μ

M A

GU

4 μ

M G

GU

vs.5’

AAC GU C

UC G

pppAGU pppAGG

OH

5’ AAC GU C

pppAGU pppAGG

0.1

1

10

0.3 30

2 to 3 1 to 2 0 to 1

Mutation incidence with 5 μM pppNNN (%)

Mutation effect upontriplet GC content:

Muta

tion incid

en

ce w

ith 1

0 μ

M p

ppN

NN

Muta

tion incid

ence w

ith 2

μM

pppN

NN

3

0.1

1

10

0.3 30

Mutation incidence with 5 μM pppNNN (%)

3

Mu

tation incid

en

ce

with 1

0 μ

M p

ppN

NN

Muta

tion incid

ence w

ith 2

μM

pppN

NN

Average error contribution (%)of classes of substitutionsby effect on GC content:

+GC =GC -GC

1.42 0.14 0.44

Err

or

occurr

ence (

%)

10

1

40

0.4

4

1 100.5 52

pppNNN concentration(μM of each triplet)

3 to 2 2 to 1 1 to 0Mutation effect upontriplet GC content:

: CC

G

A : GU

G

A : A A

G

A

: G G

U

C: A C

U

C : Overall error rate

0.5 μMpppNNN 2.78 0.28 0.59

5 μMpppNNN